Text generation method and electronic device
By using a parallel generative model to decompose the output pattern through multi-step learning, the problems of slow generation speed and insufficient accuracy of left-to-right model generation are solved, achieving fast and high-quality text generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING YOUZHUJU NETWORK TECH CO LTD
- Filing Date
- 2022-12-02
- Publication Date
- 2026-06-12
AI Technical Summary
Existing left-to-right text generation models are inefficient and can only utilize local information, resulting in slow generation speed and insufficient accuracy.
A parallel generative model is adopted, which decomposes multiple output modes through multi-step learning, gradually increases the learning objectives, and combines intermediate outputs and output data to construct learning objectives. The parallel generative model is then trained to achieve fast and accurate text generation.
It achieves rapid text generation while improving the accuracy and quality of the generated text, and can effectively utilize overall information rather than just local information.
Figure CN115994545B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates generally to the field of computers, and more specifically to text generation methods and electronic devices.
Background Technology
[0002] Artificial intelligence technology can be used to generate the required text through text generation models. Generally, text can be treated as a discrete sequence and obtained from left to right. For example, using machine learning, words (or characters) can be generated one by one from left to right to generate the entire sentence.
[0003] However, in left-to-right generation models, the generation efficiency is low because the generation of the next word (or character) is delayed until the previous word (or character) has finished generating. Furthermore, the generation of the next word (or character) can only rely on the words (or characters) already generated to its left, so the text generation process only utilizes local information from already generated words.
Summary of the Invention
[0004] According to an example embodiment of this disclosure, a text generation scheme based on a parallel generation model is provided.
[0005] In a first aspect of this disclosure, a text generation method is provided, comprising: acquiring a trained parallel generation model, wherein the trained parallel generation model includes an encoder and a decoder, wherein the decoder includes multi-step learning during training, wherein the learning objective of the first step of the multi-step learning corresponds to a first number of output patterns, and the learning objective of the second step of the multi-step learning after the first step corresponds to a second number of output patterns, and the first number is not greater than the second number; and inputting input text into the trained parallel generation model to obtain output text.
[0006] In a second aspect of this disclosure, an electronic device is provided, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions causing the electronic device to perform the method described in the first aspect of this disclosure when executed by the at least one processing unit.
[0007] In a third aspect of this disclosure, a computer-readable storage medium is provided having machine-executable instructions stored thereon, which, when executed by a device, cause the device to perform the method described in the first aspect of this disclosure.
[0008] In a fourth aspect of this disclosure, a computer program product is provided, including computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the method described in the first aspect of this disclosure.
[0009] In a fifth aspect of this disclosure, an electronic device is provided, comprising: a processing circuit configured to perform the method described in the first aspect of this disclosure.
[0010] The summary section is provided to introduce a series of concepts in a simplified form, which will be further described in the detailed description below. The summary section is not intended to identify key or essential features of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Attached Figure Description
[0011] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
[0012] Figure 1 shows a flowchart of a model training process according to some embodiments of the present disclosure;
[0013] Figure 2 shows a schematic diagram of a parallel generation model according to some embodiments of the present disclosure;
[0014] Figure 3 shows a schematic diagram of intermediate learning objectives during model training according to some embodiments of the present disclosure;
[0015] Figure 4 shows a schematic diagram of intermediate learning sampling during model training according to some embodiments of the present disclosure;
[0016] Figure 5 shows a flowchart of a text generation process according to some embodiments of the present disclosure;
[0017] Figure 6 shows a block diagram of an example apparatus according to some embodiments of the present disclosure; and
[0018] Figure 7 shows a block diagram of an example device that can be used to implement embodiments of the present disclosure.
Detailed Implementation
[0019] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0020] As mentioned earlier, left-to-right text generation models have a number of shortcomings. Parallel generation models, which can directly generate complete sequences, have therefore been proposed. Parallel generation models generally include iterative and non-iterative models. Non-iterative models can generate the entire sequence directly in one step. Compared to left-to-right models, parallel generation models generate text faster, but they do not offer an advantage in generation quality. A significant reason for this poor quality is that each input may have multiple correct output results. Since a parallel generation model does not condition each word (or character) on the words (or characters) to its left, it may mix fragments of different correct outputs and thereby produce an incorrect result. For example, "invent" and "create" are each correct on their own, but a mixture of the two (rendered here as "fa zao") is incorrect. In other words, combining parts of multiple correct output results within one sequence may produce errors.
[0021] To mitigate the problem of multiple outputs for a single input, iterative models can be used to generate the entire sequence in parallel through multiple iterations, with each iteration modifying the result of the previous one. Specifically, during the training of the iterative model, the sequence can be modified through operations such as replacement and deletion, and the model can learn how to reconstruct the target output based on the modified sequence. However, on the one hand, because the iterative model continuously learns the target output, the problem of multiple outputs for a single input still exists. On the other hand, the iterative model requires multiple iterations to generate text, resulting in a slow generation speed.
[0022] To address the aforementioned problems and other potential issues, embodiments of this disclosure provide a parallel generation model based on pattern decomposition. This parallel generation model can generate output text from input text. On one hand, it eliminates the need for multiple iterations and generates complete text sequences in parallel, thus achieving faster text generation. On the other hand, by considering learning objectives different from the target output at intermediate outputs, this model can solve the problem of multiple outputs corresponding to a single input.
[0023] It is understood that the embodiments of this disclosure can be applied to various text generation scenarios. Specifically, the text generation model in the embodiments of this disclosure can obtain output text based on input text. The input text and output text can be in the same language or different languages. In some examples, the solution of the embodiments of this disclosure can be used in machine translation scenarios, where the input text and output text can be in different languages. For example, the input text is Chinese, and the output text is English. Another example is that the input text is English, and the output text is French. In some examples, the solution of the embodiments of this disclosure can be used in article generation scenarios. For example, the input text is several keywords, and the output text is a paragraph. Another example is that the input text is an article, and the output text is an abstract. In some examples, the solution of the embodiments of this disclosure can be used in protein sequence modeling scenarios. For example, the input text is a part of a protein sequence, and the output text is the complete protein sequence. It should be noted that some of the scenarios listed here are only illustrative, and the embodiments of this disclosure can also be applied to other text generation scenarios, which will not be listed here one by one. For the sake of simplicity, the main embodiments below are illustrated using machine translation scenarios as an example.
[0024] In embodiments of this disclosure, the term "text" may also be referred to as a text sequence, discrete sequence, etc., which may include a sequence composed of multiple words, characters, etc.
[0025] Figure 1 shows a schematic flowchart of a model training process 100 according to some embodiments of the present disclosure. In block 110, a training dataset is constructed, which may include multiple data items, each of which may include input data and output data. In block 120, a trained parallel generative model is generated based on the training dataset.
[0026] In some embodiments, the training dataset can be constructed based on existing data or on other existing models. For example, in a machine translation scenario, the training dataset can be constructed based on existing human-translated data; or it can be constructed based on, for example, a left-to-right text generation model; or a combination of both can be used to construct the training dataset. For instance, in a machine translation scenario, the input data can be a sequence in a first language, and the output data can be a translated sequence in a second language.
[0027] Suppose a data item in the training dataset is represented as (X1, Y1). In one example, Y1 could be obtained by manually translating X1. In another example, X1 could be input into a left-to-right generative model trained for machine translation, and the output of the left-to-right generative model could be Y1.
[0028] Optionally, in the case of human translation, different human translators can be used. For example, data item (X21, Y21) in the training dataset might be translated by one person, while another data item (X22, Y22) might be translated by another. Optionally, in the case of using a trained left-to-right generative model for machine translation, multiple different left-to-right generative models can be used. For example, data item (X31, Y31) in the training dataset might be based on one trained left-to-right generative model, while another data item (X32, Y32) might be based on another trained left-to-right generative model. Thus, the training data items in the training dataset can include various different output patterns, which can improve the training accuracy of the model.
[0029] It is understood that the above-listed methods for constructing training datasets are merely illustrative. In practical applications, other methods can be used to construct training datasets, which will not be listed in this disclosure.
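Exemplarily, the dataset construction described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `build_training_dataset`, the round-robin assignment of teacher models, and the toy teacher functions are hypothetical and are not part of this disclosure.

```python
from typing import Callable, List, Tuple

DataItem = Tuple[str, str]  # (input data, output data)


def build_training_dataset(
    human_pairs: List[DataItem],
    source_sentences: List[str],
    teacher_models: List[Callable[[str], str]],
) -> List[DataItem]:
    """Combine human-translated pairs with pairs produced by teacher models.

    Assumption: teacher models (stand-ins for trained left-to-right models)
    are assigned to source sentences in round-robin order.
    """
    dataset = list(human_pairs)
    if not teacher_models:
        return dataset
    for i, x in enumerate(source_sentences):
        teacher = teacher_models[i % len(teacher_models)]
        dataset.append((x, teacher(x)))
    return dataset


# Toy stand-ins for two different left-to-right models (hypothetical).
def teacher_a(x: str) -> str:
    return x.upper()


def teacher_b(x: str) -> str:
    return x[::-1]


dataset = build_training_dataset(
    human_pairs=[("x1", "y1")],
    source_sentences=["ab", "cd"],
    teacher_models=[teacher_a, teacher_b],
)
print(dataset)
```

Because different items come from different translators or teacher models, the resulting dataset can contain several output patterns for similar inputs, which is the situation the multi-step learning is intended to handle.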
[0030] In some embodiments of this disclosure, the parallel generation model differs from the left-to-right generation model in that it can determine the output in parallel, for example, simultaneously obtaining the output sequence, and a word in the output sequence does not depend on the output of preceding words. Optionally, the parallel generation model may be referred to as a text parallel generation model in this disclosure, or simply as a text generation model or generation model, etc., and this disclosure is not limited thereto. It is understood that, compared to the left-to-right model, the parallel generation model in the embodiments of this disclosure has a faster text generation speed.
[0031] In some embodiments, the parallel generative model may include an encoder and a decoder, wherein the decoder may include multi-step learning, and the learning objective of the first step of multi-step learning corresponds to a first number of output patterns, the learning objective of the second step of multi-step learning after the first step corresponds to a second number of output patterns, and the first number is not greater than the second number. That is, during training, as the step index of the multi-step learning increases, a smaller number of output patterns is learned first, and a larger number of output patterns is learned gradually.
[0032] Figure 2 shows a schematic diagram of a parallel generation model 200 according to some embodiments of the present disclosure. As shown in Figure 2, the parallel generative model 200 includes an encoder 210 and a decoder 220, wherein the decoder 220 can obtain the output result through multi-step learning (such as the T steps shown in Figure 2). Optionally, the encoder 210 and / or decoder 220 may have multiple neural network layers.
[0033] For example, during training, input data 201 can be input to encoder 210 to obtain input vector 202, which is then input to decoder 220. Specifically, input vector 202 serves as the input for the 0th learning step of decoder 220. Further, the output of each learning step is used as the input for the next learning step. It is understood that the goal of training is for the output 203 of the T-th learning step to be the same as or close to the output data.
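Exemplarily, the chaining of learning steps described above, in which the output of each step is the input of the next, can be sketched as follows. The function `run_decoder_steps` and the arithmetic inside `toy_step` are hypothetical placeholders; a real decoder step would be a stack of neural network layers.

```python
from typing import Callable, List

Vector = List[float]


def run_decoder_steps(
    input_vector: Vector, steps: List[Callable[[Vector], Vector]]
) -> List[Vector]:
    """Feed the encoder output into step 0 and chain every later step.

    Returns the intermediate output of every step; the last entry is the
    output of the T-th step.
    """
    intermediates: List[Vector] = []
    current = list(input_vector)
    for step in steps:
        current = step(current)  # output of this step is the next step's input
        intermediates.append(current)
    return intermediates


def toy_step(h: Vector) -> Vector:
    # Hypothetical stand-in for one decoder learning step.
    return [0.5 * v + 1.0 for v in h]


T = 3
outputs = run_decoder_steps([0.0, 2.0], [toy_step] * (T + 1))  # steps 0..T
final_output = outputs[-1]
print(final_output)
```

Keeping every intermediate output, rather than only the last one, reflects that each step has its own learning objective during training.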
[0034] In some embodiments, the input to each step of multi-step learning is an intermediate input to the model, and the output of each step is an intermediate output of the model. For example, each step has a corresponding learning objective. Optionally, for the first step of learning, the learning objective can be determined based on the intermediate output of the next step of learning and the output data.
[0035] In some examples, a first probability distribution of the intermediate output of the next learning step can be determined; a first product between the first probability distribution and the output data can be determined; a second product of element-wise multiplication between the output data and the intermediate output of the first learning step can be determined; a target probability distribution for the first learning step can be determined based on the first probability distribution, the first product, and the second product; and a learning objective for the first learning step can be determined based on the target probability distribution for the first learning step. Optionally, the intermediate output and the data output can be represented in one-hot encoded form, respectively.
[0036] For ease of description, let's assume the first learning step is step t, and the next learning step after the first step is step t+1. For example, during training, for each data item, the input data can be fed into the encoder to obtain an input vector, which is then fed into the decoder. The learning objective of the decoder's step t is determined based on the intermediate output of step t+1 and the output data. The following describes, for example, the learning objective of step t in conjunction with Figure 3.
[0037] Figure 3 shows a schematic diagram of an intermediate learning objective 300 during model training according to some embodiments of the present disclosure. As shown in Figure 3, the text generation model may include an encoder 310 and a decoder 320, wherein the decoder 320 can obtain the output result through multi-step learning (such as the T-step learning shown in Figure 3).
[0038] Taking a data item in the training dataset as an example, assume that the input data of this data item is "Thanks a lot!" and the output data is "太感谢了!". During the training process, "Thanks a lot!" 301 can be input into the generation model, for example, input into the encoder 310, and the expected goal of training is that the output of the generation model is the same as or close to "太感谢了!" 302.
[0039] Specifically, the encoder 310 can obtain the vector representation of the input data 301 and use it as the input for the 0th step of learning of the decoder 320. Through the 0th step of learning of the decoder 320, an intermediate output can be obtained and used as the input for the 1st step of learning of the decoder 320, …, and in this way, the output of the Tth step of learning of the decoder 320 can be obtained.
[0040] Exemplarily, taking the t-th learning step as an example, the intermediate output of the t-th learning step can be represented as a sequence, and the probability distribution form of that intermediate output can be represented as an N×V matrix, where N represents the sequence length and V represents the total number of words in the vocabulary.
[0041] Exemplarily, each learning step has a corresponding learning objective, and the learning objective of each learning step is determined by interpolation based on the intermediate output of its next learning step and the output data. As shown in Figure 3, taking the t-th learning step as an example, the learning objective of the t-th learning step can be determined through interpolation 303 based on the intermediate output of the (t + 1)-th learning step and the output data (i.e., "太感谢了!", represented for example as Y*).
[0042] Specifically, assume that the output data is represented as Y*. Given the probability distribution of the intermediate output of the (t + 1)-th learning step and the intermediate output of the t-th learning step, the target probability distribution of the t-th learning step can be determined by the following formula (1):
[0043]
[0044] In formula (1), α_t, β_t and γ_t are predefined parameters, ⊙ represents element-wise multiplication, and Y* and the intermediate output of the t-th learning step are one-hot encoded representations. Further, the learning objective of the t-th learning step can be determined based on the target probability distribution. Specifically, the words with the highest probability at each position of the target probability distribution q_t form the learning objective for the t-th learning step, as shown in the following formula (2):
[0045]
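The images of formulas (1) and (2) are not preserved in this text. The following sketch is therefore only one possible reading, under the loudly stated assumption that the target distribution is a weighted sum of the three quantities named in paragraph [0035]: the next-step probability distribution, its element-wise product with the output data, and the element-wise product of the output data with the current intermediate output. The names `p_next`, `y_star`, `y_t`, and all functions are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # N rows (positions) by V columns (vocabulary)


def one_hot(tokens: List[int], vocab_size: int) -> Matrix:
    """One-hot encode a token-id sequence as an N x V matrix."""
    return [[1.0 if v == tok else 0.0 for v in range(vocab_size)] for tok in tokens]


def target_distribution(
    p_next: Matrix, y_star: Matrix, y_t: Matrix,
    alpha: float, beta: float, gamma: float,
) -> Matrix:
    """ASSUMED form: q_t = alpha*p_next + beta*(p_next * y_star) + gamma*(y_star * y_t),
    where * is element-wise. The patent's exact formula (1) may differ."""
    return [
        [alpha * p + beta * p * ys + gamma * ys * yt
         for p, ys, yt in zip(p_row, ys_row, yt_row)]
        for p_row, ys_row, yt_row in zip(p_next, y_star, y_t)
    ]


def learning_objective(q: Matrix) -> List[int]:
    """Formula (2) as described in the text: highest-probability word per position."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


V = 3
p_next = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]  # toy next-step distribution
y_star = one_hot([1, 1], V)                   # toy output data
y_t = one_hot([0, 1], V)                      # toy current intermediate output

model_only = learning_objective(target_distribution(p_next, y_star, y_t, 1.0, 0.0, 0.0))
interpolated = learning_objective(target_distribution(p_next, y_star, y_t, 1.0, 2.0, 1.0))
print(model_only, interpolated)
```

In this toy setting, using only the next-step distribution reproduces the model's own prediction, while increasing the weights on the output-data terms moves the objective toward the output data; this is the interpolation behaviour the text describes.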
[0046] As discussed above, an input may have multiple different output results, and multiple output results can correspond to multiple output modes. That is to say, different output modes can obtain different output results. However, it can be understood that too many output modes may cause the model output to be chaotic, so that a correct output cannot be accurately obtained. For this reason, the parallel generation model in the present disclosure does not learn multiple output modes from the beginning. On the contrary, during the training process, as the number of steps increases, the number of output modes corresponding to the learning objective gradually increases. Specifically, the embodiments of the present disclosure decompose multiple output modes, initially learning only a relatively small number of output modes, and then gradually learning more output modes based on the small number of output modes until all output modes are learned.
[0047] In view of formula (1) above, constructing an intermediate learning objective by combining the intermediate output of the model and the output data makes it possible to decompose multiple output modes. Since the intermediate output of the model has learned a part of the output modes, and the output data contains all the output modes, an intermediate learning objective whose number of output modes lies between the two can be obtained by interpolation.
[0048] For example, referring to Figure 3 , the learning objectives from the 0th step to the s-th step correspond to 1 output mode, the learning objectives from the (s + 1)-th step (where s < t) to the t-th step correspond to 2 output modes, the learning objectives from the (t + 1)-th step to the (T - 1)-th step correspond to 3 output modes, and the learning objective of the T-th step corresponds to 4 output modes.
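Exemplarily, the schedule in the preceding paragraph can be written out and checked against the requirement that an earlier step never corresponds to more output modes than a later step. The function names and the concrete values of s, t and T below are hypothetical.

```python
from typing import List


def modes_per_step(s: int, t: int, T: int) -> List[int]:
    """Number of output modes for steps 0..T, following the Figure 3 example:
    steps 0..s -> 1, steps s+1..t -> 2, steps t+1..T-1 -> 3, step T -> 4."""
    if not 0 <= s < t < T:
        raise ValueError("expected 0 <= s < t < T")
    schedule = []
    for step in range(T + 1):
        if step <= s:
            schedule.append(1)
        elif step <= t:
            schedule.append(2)
        elif step <= T - 1:
            schedule.append(3)
        else:
            schedule.append(4)
    return schedule


def is_non_decreasing(counts: List[int]) -> bool:
    """True when no earlier step has more output modes than a later step."""
    return all(a <= b for a, b in zip(counts, counts[1:]))


schedule = modes_per_step(s=1, t=3, T=6)
print(schedule, is_non_decreasing(schedule))
```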
[0049] In this way, the multi-step learning objectives of the text generation model can be obtained. By using this series of learning objectives to train the text generation model, it is possible to start learning a small number of output modes and gradually learn all output modes.
[0050] In some embodiments, the end of the training process can be determined based on the training objective. For example, a loss function can be constructed, and the trained parallel generation model can be obtained based on the loss function. Exemplarily, the overall training objective of the training process can be the sum of the training objectives of each step in the multi-step learning. For example, the overall training objective can be expressed as the following formula (3):
[0051] $L = \sum_{t} \lambda_t L_t$ (3)
[0052] In formula (3), λ_t is a preset coefficient, and L_t represents the training objective for the t-th learning step.
[0053] Alternatively, the training objective for each learning step can be determined by optimizing the divergence between the model output distribution and the target sequence distribution, where the divergence is, for example, the Kullback-Leibler (KL) divergence. For example, the training objective for each learning step can be determined based on the learning objective for each learning step. For instance, the training objective for the t-th learning step could be:
[0054]
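The exact per-step expression is not reproduced in this text. As a minimal sketch, assuming the divergence is taken from the target distribution to the model output distribution and summed over positions (the direction and the summation are assumptions), the per-step objective and the weighted sum over steps could look as follows. All function names are hypothetical.

```python
import math
from typing import List


def kl_divergence(target: List[float], model: List[float], eps: float = 1e-12) -> float:
    """KL(target || model) for one position; terms with target mass 0 contribute 0."""
    return sum(
        t * math.log(t / max(m, eps)) for t, m in zip(target, model) if t > 0.0
    )


def step_loss(target_rows: List[List[float]], model_rows: List[List[float]]) -> float:
    """Training objective of one learning step: sum of per-position divergences."""
    return sum(kl_divergence(t, m) for t, m in zip(target_rows, model_rows))


def total_loss(step_losses: List[float], lambdas: List[float]) -> float:
    """Overall objective: sum over steps of lambda_t * L_t."""
    return sum(lam * loss for lam, loss in zip(lambdas, step_losses))


# A one-hot target against a uniform two-word model distribution gives log 2.
l_t = step_loss([[1.0, 0.0]], [[0.5, 0.5]])
overall = total_loss([1.0, 2.0], [0.5, 0.25])
print(l_t, overall)
```

With a one-hot target, this divergence reduces to the negative log-probability that the model assigns to the target word, which is why it behaves like a cross-entropy loss in that case.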
[0055] It is understood that the above description of the training objective is merely illustrative. In real-world scenarios, loss functions can be constructed in other ways, and this disclosure does not limit this approach.
[0056] In some embodiments of this disclosure, the intermediate output of a certain step in multi-step learning can be used as the intermediate input for the next step. In other embodiments of this disclosure, the learning objective for the next step can be sampled based on the intermediate output of a certain step in multi-step learning, and then the intermediate input for the next step can be determined.
[0057] For example, the intermediate output of a learning step can be compared with the learning objective of the next learning step to identify different words. The intermediate output of the next learning step can also be compared with the learning objective of the next learning step to determine the number of samples. Then, different words in the learning objective of the next learning step can be sampled according to the number of samples and combined with the intermediate output of the first learning step to obtain the intermediate input of the next learning step. In some examples, different words in the learning objective of the next learning step can be sampled according to the number of samples to obtain sampled words. Then, the corresponding positions in the intermediate output of the first learning step (i.e., the positions of the sampled words) can be replaced with the sampled words to obtain the intermediate input of the next learning step.
[0058] For example, suppose the intermediate output of a certain learning step and the learning objective of the next step are both sequences of 5 words (the quoted strings are English renderings, so both read roughly as "Thank you so much!" here). Since the 3rd word and the 5th word "!" are the same in both sequences, the learning objective of the next step contains 3 different words, at the 1st, 2nd and 4th positions. Sampling can then be performed from these three different words, for example sampling two of them. Based on the intermediate output of the certain learning step and the sampled words, a sequence can be determined in which the sampled words replace the words at the corresponding positions, and this sequence can then be used as input for the next learning step.
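Exemplarily, the comparison, sampling and replacement described above can be sketched as follows. Uniform random sampling is used here purely as an assumption for illustration; the placeholder tokens and the function names are hypothetical.

```python
import random
from typing import List, Optional


def differing_positions(a: List[str], b: List[str]) -> List[int]:
    """Positions at which two equal-length sequences hold different words."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def build_next_input(
    prev_output: List[str],
    next_objective: List[str],
    num_samples: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace sampled differing positions of prev_output with objective words."""
    if rng is None:
        rng = random.Random()
    candidates = differing_positions(prev_output, next_objective)
    k = min(num_samples, len(candidates))
    chosen = rng.sample(candidates, k)  # assumption: uniform sampling
    next_input = list(prev_output)
    for i in chosen:
        next_input[i] = next_objective[i]
    return next_input


# Five-word toy sequences: the 3rd and 5th words agree, the others differ.
prev_output = ["a1", "a2", "c", "a4", "!"]
next_objective = ["b1", "b2", "c", "b4", "!"]

diff = differing_positions(prev_output, next_objective)
next_input = build_next_input(prev_output, next_objective, 2, random.Random(0))
print(diff, next_input)
```

The positions that already agree are left untouched, so the next learning step only has to recover the differing words that were not sampled.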
[0059] Figure 4 shows a schematic diagram of intermediate step learning sampling 400 during model training according to some embodiments of the present disclosure. Referring to Figure 4, the intermediate output of the t-th learning step is "Thank you very much!", which can be represented as the sequence H_t, that is, 410. Additionally, assume that the learning objective for the (t + 1)-th learning step is "I am extremely grateful!", and the intermediate output of the (t + 1)-th learning step is "Grateful thanks!".
[0060] By comparing "Thank you very much!" with "I am extremely grateful!", it can be determined that the first "thank" and the last "!" are the same. Subsequently, sampling can be performed from the three different words "extremely grateful" in the learning objective "I am extremely grateful!" for the (t + 1)-th step of learning. For example, residual sampling or other sampling methods can be used, and the present disclosure is not limited thereto.
[0061] By comparing "Grateful thanks!" with "I am extremely grateful!", it can be determined that the number of different words between the two is 2. Subsequently, the number of samples can be determined. As an example, the number of samples can be equal to the number of different words between the two, that is, 2. As another example, the number of samples can be determined based on the product of the number of different words between the two and a predefined coefficient, where the predefined coefficient is any value between 0 and 1, and the number of samples can be determined by rounding up or down the product. For example, assume that the predefined coefficient is equal to 0.5, then the number of samples can be determined to be equal to 1.
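Exemplarily, the determination of the number of samples described above can be sketched as follows. The function names and the toy token sequences are hypothetical; the coefficient range and the rounding options follow the paragraph above.

```python
import math
from typing import List


def count_differences(a: List[str], b: List[str]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def number_of_samples(
    next_output: List[str],
    next_objective: List[str],
    coefficient: float = 1.0,
    round_up: bool = True,
) -> int:
    """Differences between the (t+1)-th intermediate output and its objective,
    scaled by a predefined coefficient in [0, 1] and rounded up or down."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie between 0 and 1")
    product = count_differences(next_output, next_objective) * coefficient
    return math.ceil(product) if round_up else math.floor(product)


next_output = ["u", "v", "w", "!"]
next_objective = ["x", "y", "w", "!"]

print(number_of_samples(next_output, next_objective))       # coefficient 1.0
print(number_of_samples(next_output, next_objective, 0.5))  # coefficient 0.5
```

Tying the number of samples to the remaining differences means that fewer words are revealed to the model as its intermediate output approaches the learning objective.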
[0062] After that, sampling can be performed from the three different words "extremely grateful" in the learning objective "I am extremely grateful!" for the (t + 1)-th learning step. As an example, assume that one word (rendered as "not" in this translation) is sampled. Combining the sampled word with the intermediate output of the t-th learning step, that is, H_t 410, yields a sequence 420 that can be used as the intermediate input for the (t + 1)-th learning step.
[0063] Additionally, as shown in Figure 4, taking the sequence 420 as the intermediate input for the (t + 1)-th learning step, the intermediate output H_t+1, that is, 430, can be obtained.
[0064] In this way, through sampling, the decomposition of the learning objective can be achieved, which is beneficial for the model to continuously learn the remaining part during the training process, thereby improving the training efficiency.
[0065] As described above in conjunction with Figures 1 to 4, embodiments of this disclosure can obtain a parallel generative model through training. During training, intermediate learning objectives can be constructed from few output patterns to many, so that multiple output patterns are learned gradually. During training, sampling can be used to improve learning efficiency and accelerate training.
[0066] Figure 5 shows an example flowchart of a text generation process 500 according to some embodiments of the present disclosure. In block 510, a trained parallel generation model is obtained, wherein the trained parallel generation model includes an encoder and a decoder, and the decoder includes multiple steps. In block 520, input text is fed into the trained parallel generation model to obtain output text.
[0067] In some embodiments, the trained parallel generative model acquired in block 510 can be the parallel generative model generated through the training described above in conjunction with Figures 1 to 4. As previously mentioned, during training, the decoder involves multi-step learning. The learning objective of the first step in multi-step learning corresponds to a first number of output patterns, and the learning objective of the second step following the first step corresponds to a second number of output patterns, with the first number not exceeding the second number. As previously mentioned, each step in training has a learning objective. Furthermore, the learning objective of the first step is determined based on the output data and the intermediate output of the next step following the first step. As previously mentioned, during training, the intermediate input of the first step can be determined based on sampling the learning objective of the first step and on the intermediate output of the step preceding the first step.
[0068] For example, input text can be fed into an encoder to obtain a vector representation corresponding to the input text. This vector representation can then be fed into a decoder to obtain the decoder's output, which is the output text. For example, the decoder includes multiple steps, and the intermediate output of the first step can be used as the intermediate input of the next step. It is understood that intermediate outputs and intermediate inputs can also be represented in vector form.
[0069] In some examples, trained parallel generative models can be used for paragraph text generation. The input text can include multiple keywords or articles, and the output text can include a sentence or a paragraph; for example, the output text can represent a summary.
[0070] In other examples, trained parallel generative models can be used for machine translation. The input text can include a sequence in a first language, and the output text can include a translated sequence in a second language, where the second language differs from the first. For instance, assuming the input text is "many trees," the corresponding output text obtained by the trained parallel generative model could be, for example, "a lot of trees."
[0071] In this way, embodiments of this disclosure can generate output text based on input text using a parallel generation model. Compared to a left-to-right model, this approach generates text faster and utilizes overall information rather than just local information. Furthermore, since the decoder of this parallel generation model includes multiple steps, it can learn various output patterns from few to many during training, resulting in more accurate output text and high-quality generation capabilities. Moreover, during text generation, the output of the previous step is directly used as the input for the next step, thus eliminating additional decoding overhead.
[0072] It should be understood that in the embodiments of this disclosure, "first," "second," "third," etc., are only used to indicate that multiple objects may be different, but at the same time, it does not exclude that two objects are the same, and should not be interpreted as any limitation on the embodiments of this disclosure.
[0073] It should also be understood that the manner, situation, category and division of embodiments in the present disclosure are for the convenience of description only and should not constitute a special limitation. Various manners, categories, situations and features in the embodiments can be combined with each other where logically consistent.
[0074] It should also be understood that the foregoing is merely to help those skilled in the art better understand the embodiments of this disclosure, and is not intended to limit the scope of the embodiments of this disclosure. Those skilled in the art can make various modifications, variations, or combinations based on the foregoing. Such modifications, variations, or combinations are also within the scope of the embodiments of this disclosure.
[0075] It should also be understood that the above description focuses on highlighting the differences between the various embodiments. Similarities or commonalities can be referenced or learned from each other, and for the sake of brevity, they will not be repeated here.
[0076] Figure 6 shows a schematic block diagram of an example device 600 according to some embodiments of the present disclosure. Device 600 can be implemented by software, hardware, or a combination of both. As shown in Figure 6, the device 600 includes a model acquisition module 610 and an output text determination module 620.
[0077] The model acquisition module 610 is configured to acquire a trained parallel generative model, wherein the trained parallel generative model includes an encoder and a decoder. During training, the decoder includes multi-step learning. The learning objective of the first step in the multi-step learning corresponds to a first number of output patterns, and the learning objective of the second step after the first step corresponds to a second number of output patterns, wherein the first number is not greater than the second number. The output text determination module 620 is configured to input the input text into the trained parallel generative model to obtain the output text.
[0078] In some embodiments, the model acquisition module 610 may acquire a trained parallel generative model from another device different from the device 600. For example, the other device may have already obtained a trained parallel generative model through training. In other embodiments, the device 600 may have already obtained a trained parallel generative model through training, and the model acquisition module 610 may acquire the trained parallel generative model from, for example, the memory of the device 600.
[0079] For example, device 600 may include a dataset construction module and a training module. The dataset construction module is configured to construct a training dataset, which includes multiple data items, each of which includes input data and output data. The training module is configured to generate a trained parallel generative model based on the training dataset.
[0080] In some embodiments, the training module can be configured to, during training, for each data item: input the input data into the encoder to obtain an input vector; and input the input vector into the decoder, wherein the learning objective of the first step of learning of the decoder is determined based on the intermediate output of the next step of learning after the first step and on the output data.
[0081] Optionally, the training module can be configured to obtain the learning objective of the first step of learning by: determining a first probability distribution of the intermediate output of the next step of learning after the first step of learning; determining a first product between the first probability distribution and the output data; determining a second product of the element-wise multiplication between the output data and the intermediate output of the first step of learning; determining a target probability distribution for the first step of learning based on the first probability distribution, the first product, and the second product; and determining the learning objective of the first step of learning based on the target probability distribution for the first step of learning.
[0082] For example, both the output data and the intermediate output from the first learning step are represented in one-hot encoding form.
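To make the one-hot representation concrete, the following sketch computes the two products named in the preceding paragraph for a toy vocabulary of four tokens. The token ids, the probability values, and the helper names are hypothetical, and reading the first product as an element-wise product is an assumption. How the products are combined into the target probability distribution is not specified in this passage, so that combination is deliberately not shown.

```python
def one_hot(token_ids, vocab_size):
    """Encode each token id as a row with a single 1.0 at the token's index."""
    return [[1.0 if v == t else 0.0 for v in range(vocab_size)]
            for t in token_ids]

def elementwise_product(a, b):
    """Multiply two equally shaped matrices entry by entry."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def row_sums(matrix):
    """Collapse each position's row to a single number."""
    return [sum(row) for row in matrix]

# Hypothetical data: three output positions, vocabulary of four tokens.
output_data = one_hot([2, 0, 3], 4)   # reference output
step_output = one_hot([2, 1, 3], 4)   # intermediate output of the first step

# Second product: 1.0 where the step output already matches the reference.
agreement = row_sums(elementwise_product(output_data, step_output))
print(agreement)  # [1.0, 0.0, 1.0]

# First product (assumed element-wise): the probability that the next step
# assigns to the reference token at each position.
next_step_probs = [
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
reference_prob = row_sums(elementwise_product(next_step_probs, output_data))
print(reference_prob)
```

With one-hot rows, the element-wise product acts as a selector: it keeps only the entry at the reference token's index and zeroes everything else.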
[0083] In some examples, the training objective of the training process includes the sum of the training objectives of each step in multi-step learning, wherein the training objective of each step is determined based on the learning objective of each step.
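The summation described here can be sketched as follows. The passage states only that the per-step training objectives are summed. Using cross-entropy between each step's target distribution and its predicted distribution as the per-step objective is an assumption, as are the function names and the toy numbers.

```python
import math

def cross_entropy(target_dist, predicted_dist, eps=1e-12):
    """Mean cross-entropy over output positions (assumed per-step objective)."""
    total = 0.0
    for target_row, predicted_row in zip(target_dist, predicted_dist):
        total -= sum(t * math.log(p + eps)
                     for t, p in zip(target_row, predicted_row))
    return total / len(target_dist)

def total_training_objective(step_targets, step_predictions):
    """Sum the per-step objectives over all steps of the multi-step learning."""
    return sum(cross_entropy(target, predicted)
               for target, predicted in zip(step_targets, step_predictions))

# Hypothetical two-step example, one output position, vocabulary of two tokens.
step_targets = [[[1.0, 0.0]], [[0.0, 1.0]]]
step_predictions = [[[0.5, 0.5]], [[0.25, 0.75]]]
loss = total_training_objective(step_targets, step_predictions)
print(round(loss, 4))
```

Summing the objectives means a single backward pass can update the parameters used by every step, rather than training each step separately.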
[0084] In some embodiments, during training, the intermediate output of the first step of learning is used as the intermediate input of the next step of learning.
[0085] In some embodiments, during training, the intermediate input for the next step of learning after the first step is obtained by: comparing the intermediate output of the first step of learning with the learning objective of the next step of learning to determine the number of different words; sampling from the learning objective of the next step of learning according to the number of different words to obtain a sampled objective; and using the sampled objective as the intermediate input for the next step of learning.
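A minimal sketch of this sampling procedure follows. The rule that maps the number of different words to the number of sampled words is not given in this passage, so the `ratio` parameter, the rounding, and the function name are assumptions. Writing the sampled words into a copy of the intermediate output follows the combination recited in claim 8.

```python
import math
import random

def sample_intermediate_input(intermediate_output, next_objective,
                              ratio=0.5, rng=None):
    """Build the next step's input by revealing some target words.

    Returns the sampled objective and the number of different words.
    """
    rng = rng or random.Random()
    # Positions where the step's output disagrees with the next objective.
    diff_positions = [i for i, (out_word, target_word)
                      in enumerate(zip(intermediate_output, next_objective))
                      if out_word != target_word]
    # Assumed rule: reveal a fixed fraction of the differing words.
    num_samples = math.ceil(ratio * len(diff_positions))
    chosen = rng.sample(diff_positions, num_samples)
    sampled = list(intermediate_output)
    for i in chosen:
        sampled[i] = next_objective[i]  # copy the target word into the input
    return sampled, len(diff_positions)

# Hypothetical words; three of the four positions differ.
step_output = ["a", "b", "c", "d"]
objective = ["a", "x", "y", "z"]
sampled, num_different = sample_intermediate_input(
    step_output, objective, ratio=0.5, rng=random.Random(0))
print(num_different, sampled)
```

Under this assumed rule, the more the intermediate output differs from the next objective, the more target words are revealed to the next step.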
[0086] In some examples, the trained parallel generative model is used for machine translation. The input text consists of a sequence in a first language, and the output text consists of a translated sequence in a second language. In other examples, the trained parallel generative model is used for paragraph text generation. The input text consists of multiple keywords or an article, and the output text consists of a sentence or a paragraph.
[0087] The device 600 of Figure 6 can be used to implement the processes described above in conjunction with Figures 1 to 5. For the sake of brevity, they will not be repeated here.
[0088] The division of modules or units in the embodiments of this disclosure is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods. Furthermore, the functional units in the disclosed embodiments may be integrated into one unit, exist as separate physical entities, or two or more units may be integrated into one unit. The integrated unit described above can be implemented in hardware or as a software functional unit.
[0089] Figure 7 shows a block diagram of an example device 700 that can be used to implement embodiments of the present disclosure. It should be understood that the device 700 shown in Figure 7 is merely exemplary and should not be construed as limiting the functionality and scope of the implementations described herein. For example, device 700 can be used to perform the processes described above in conjunction with Figures 1 to 5.
[0090] As shown in Figure 7, device 700 is in the form of a general-purpose computing device. Components of computing device 700 may include, but are not limited to, one or more processors or processing units 710, memory 720, storage devices 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760. Processing unit 710 may be a physical or virtual processor and is capable of performing various processes according to programs stored in memory 720. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of computing device 700.
[0091] Computing device 700 typically includes multiple computer storage media. Such media can be any available media accessible to computing device 700, including but not limited to volatile and non-volatile media, removable and non-removable media. Memory 720 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 730 can be removable or non-removable media and may include machine-readable media, such as flash drives, disks, or any other media capable of storing information and/or data (e.g., training data for training) and accessible within computing device 700.
[0092] The computing device 700 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in Figure 7, disk drives for reading from or writing to removable, non-volatile disks (e.g., "floppy disks") and optical disk drives for reading from or writing to removable, non-volatile optical disks can be provided. In these cases, each drive can be connected to a bus (not shown) via one or more data media interfaces. Memory 720 may include computer program product 725 having one or more program modules configured to perform various methods or actions of various implementations of this disclosure.
[0093] The communication unit 740 enables communication with other computing devices via a communication medium. Additionally, the components of the computing device 700 can function as a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, the computing device 700 can operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or other network nodes.
[0094] Input device 750 can be one or more input devices, such as a mouse, keyboard, trackball, etc. Output device 760 can be one or more output devices, such as a monitor, speaker, printer, etc. Computing device 700 can also communicate as needed, via communication unit 740, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable user interaction with computing device 700, or with any device (e.g., network card, modem, etc.) that enables computing device 700 to communicate with one or more other computing devices. Such communication can be performed via an input/output (I/O) interface (not shown).
[0095] According to an exemplary implementation of this disclosure, a computer-readable storage medium is provided that stores computer-executable instructions thereon, wherein the computer-executable instructions are executed by a processor to implement the methods described above. According to an exemplary implementation of this disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, which are executed by a processor to implement the methods described above. According to an exemplary implementation of this disclosure, a computer program product is provided that stores a computer program thereon, which, when executed by a processor, implements the methods described above.
[0096] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatuses, devices, and computer program products implemented according to this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0097] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0098] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions that execute on the computer, other programmable data processing apparatus, or other device to perform the functions/actions specified in one or more blocks of a flowchart and/or block diagram.
[0099] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0100] Various implementations of this disclosure have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles, practical applications, or improvements to technology in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.
Claims
1. A text generation method, comprising: obtaining a trained parallel generative model, wherein the trained parallel generative model includes an encoder and a decoder, the decoder includes multi-step learning during the training process, the learning objective of a first step of the multi-step learning corresponds to a first number of output patterns, the learning objective of a second step of the multi-step learning after the first step corresponds to a second number of output patterns, and the first number is not greater than the second number; and inputting input text into the trained parallel generative model to obtain output text.
2. The method according to claim 1, further comprising: constructing a training dataset, wherein the training dataset includes multiple data items, each of which includes input data and output data; and generating the trained parallel generative model based on the training dataset.
3. The method according to claim 2, further comprising, during the training process, for each data item: inputting the input data into the encoder to obtain an input vector; and inputting the input vector into the decoder, wherein the learning objective of the first step of learning of the decoder is determined based on the intermediate output of the next step of learning after the first step of learning and on the output data.
4. The method according to claim 3, wherein the learning objective of the first step of learning is obtained by: determining a first probability distribution of the intermediate output of the next step of learning after the first step of learning; determining a first product between the first probability distribution and the output data; determining a second product of element-wise multiplication between the output data and the intermediate output of the first step of learning; determining a target probability distribution of the first step of learning based on the first probability distribution, the first product, and the second product; and determining the learning objective of the first step of learning based on the target probability distribution of the first step of learning.
5. The method according to claim 4, wherein the output data and the intermediate output of the first step of learning are both represented in one-hot encoded form.
6. The method according to claim 1, wherein the training objective of the training process includes the sum of the training objectives of each step of the multi-step learning, wherein the training objective of each step of the learning is determined based on the learning objective of each step of the learning.
7. The method according to claim 1, wherein during the training process, the intermediate output of the first step of learning is used as the intermediate input of the next step of learning after the first step of learning.
8. The method according to claim 1, wherein during the training process, the intermediate input for the next step of learning after the first step of learning is obtained by: identifying different words by comparing the intermediate output of the first step of learning with the learning objective of the next step of learning; determining a number of samples by comparing the intermediate output of the next step of learning with the learning objective of the next step of learning; obtaining sampled words by sampling from the different words according to the number of samples; obtaining a sampled target by combining the sampled words with the intermediate output of the first step of learning; and using the sampled target as the intermediate input for the next step of learning after the first step of learning.
9. The method according to any one of claims 1 to 8, wherein the trained parallel generative model is used for machine translation, the input text comprises a sequence in a first language, and the output text comprises a translated sequence in a second language.
10. The method according to any one of claims 1 to 8, wherein the trained parallel generative model is used for paragraph text generation, the input text includes multiple keywords or articles, and the output text includes a sentence or a paragraph.
11. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the electronic device to perform actions including: obtaining a trained parallel generative model, wherein the trained parallel generative model includes an encoder and a decoder, the decoder includes multi-step learning during the training process, the learning objective of a first step of the multi-step learning corresponds to a first number of output patterns, the learning objective of a second step of the multi-step learning after the first step corresponds to a second number of output patterns, and the first number is not greater than the second number; and inputting input text into the trained parallel generative model to obtain output text.
12. A text generation apparatus, comprising: a model acquisition module configured to acquire a trained parallel generative model, wherein the trained parallel generative model includes an encoder and a decoder, the decoder includes multi-step learning during the training process, the learning objective of a first step of the multi-step learning corresponds to a first number of output patterns, the learning objective of a second step of the multi-step learning after the first step corresponds to a second number of output patterns, and the first number is not greater than the second number; and an output text determination module configured to input input text into the trained parallel generative model to obtain output text.
13. A computer-readable storage medium having a computer program stored thereon, the program, when executed by a processor, implementing the method according to any one of claims 1 to 10.