Text generation method and device, computer equipment and storage medium
A text-generation and text-sequence technology applied in the computer field, which can solve problems such as inaccurate translation results
Active Publication Date: 2019-11-15
TENCENT TECH (SHENZHEN) CO LTD
Cites: 5 | Cited by: 6
AI-Extracted Technical Summary
Problems solved by technology
[0004] In view of this, the present invention provides a text generation method, device, computer eq...
Method used
As can be seen from Table 1, "combined sequence structure" denotes the method of determining a word's position information from the combined sequence structure of each word in the text sentence sequence, and "tree structure source text sequence structure" denotes the scheme disclosed in the embodiments of this application, which determines a word's position information from the tree-structured source text sequence; "×" means the method is not used, and "√" means it is used. With the absolute position information representation, the BLEU value in the second line (35.43) is 7.1 points higher than in the first line (28.33). After adopting the absolute position information representation of words based on the tree-structured source text sequence of this application, the BLEU value in the fourth line (44.84) is 0.53 points higher than in the third line (44.31). In general, an increase of more than 0.5 BLEU points is a significant improvement. Based on the above experimental data, the technical solution in this application clearly improves the accuracy of the translation results.
Because, in the embodiments of this application, a tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence can reflect the syntactic structure of the source text, the calculated absolute...
Abstract
The invention provides a text generation method and device, computer equipment and a storage medium, relating to natural language processing and machine learning technology in artificial intelligence. The method comprises the steps of: calculating the position vector of each word according to the structure of a tree-structured source text sequence; inputting the position vector corresponding to each word into a machine translation model; performing semantic encoding to obtain a semantic vector corresponding to each word; generating a source-end semantic vector corresponding to each word according to the position vector and the semantic vector; performing semantic decoding to obtain a target word corresponding to each word; and determining a combination sequence of the target words and splicing the target words to generate a target text. The tree-structured source text sequence can reflect the syntactic structure of the source text, so the calculated position vectors can reflect that syntactic structure; the influence of the syntactic structure of the source text on word semantics is thus considered when determining word semantics, and the accuracy of the translation result is improved.
Application Domain
Special data processing applications
Technology Topic
Machine translation, Computer equipment +8
Examples
- Experimental program(1)
Example Embodiment
[0047] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
[0048] The machine translation model training method provided by the embodiments of the present application involves natural language processing technology and machine learning technology in artificial intelligence. The artificial intelligence technology, natural language processing technology and machine learning technology will be described below.
[0049] Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
[0050] Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
[0051] Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, which is the language people use daily, so it is closely related to linguistic research. Natural language processing technologies usually include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs and other technologies.
[0052] Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all areas of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
[0053] The solutions provided in the embodiments of this application involve artificial intelligence text preprocessing, semantic understanding, and machine translation technologies, which are specifically illustrated by the following embodiments:
[0054] Because the attention mechanism mimics the internal process of biological observation behavior, it is a mechanism that aligns internal experience with external sensation to increase the fineness of observation in some areas. The attention mechanism can quickly extract important features of sparse data, so it is widely used in natural language processing, especially in the field of machine translation. The self-attention mechanism is an improvement of the attention mechanism, which reduces the dependence on external information and is better at capturing the internal correlation of data or features.
[0055] In the field of machine translation, the Transformer, based on the self-attention model, has become the mainstream architecture for neural machine translation, and its performance has surpassed traditional statistical machine translation methods on many language pairs. The main advantages of the Transformer, with the self-attention model as its core component, are:
[0056] 1. It is no longer necessary to define features manually and explicitly; implicit features are learned directly from the training data;
[0057] 2. The self-attention model in the Transformer architecture can better capture long-distance historical information;
[0058] 3. The self-attention model in the Transformer architecture can be trained in parallel, which greatly reduces model training time.
[0059] Based on this, the currently adopted machine translation model is the self-attention model. Compared with the recurrent neural network model in traditional neural machine translation, the self-attention model removes the structural constraints between the input sequence elements, so that the self-attention of each element can be computed independently of the others, improving training speed.
[0060] However, the inventor found through research that ignoring the structural constraints between the input sequence elements brings a new problem to natural language processing: natural language sentences are not simply piles of words, and the structure of a text sentence plays an important role in natural language understanding and natural language generation. Ignoring the text sentence structure leads to inaccurate sentence semantics, which greatly affects the accuracy of the machine translation model's translation results. If the structural information of the text sentence is considered during translation, the accuracy with which the machine translation model determines sentence semantics will improve, and so will the accuracy of the translation result.
[0061] In order to improve the accuracy of translation results, the following text sentence structure representation is currently used: the position of each word in the text sentence sequence represents the text sentence sequence structure, and the position of each word in the text sentence sequence can be expressed as an absolute position or a relative position. The absolute position is the actual position of each word in the text sentence sequence, in left-to-right order; the relative position is the position of each word relative to a reference word in the text sentence sequence. The reference word can be designated by a person skilled in the art, and this application does not specifically limit it.
[0062] The following are specific examples of absolute and relative positions:
[0063] Given a text sentence sequence "Bush held a talk with Sharon", and referring to the schematic diagram of position information shown in Figure 1, the absolute position representation indicates the absolute position serial number of each word in the text sentence sequence from left to right: the absolute position serial number of "Bush" is "0", that of "held" is "1", that of "a" is "2", that of "talk" is "3", that of "with" is "4", and that of "Sharon" is "5".
[0064] The relative position representation indicates the relative position serial number of each word in the text sentence sequence relative to a reference word: a target word in the sentence is selected as the reference word, and each word is given a piece of relative position information with respect to it. With "talk" as the reference word, the direction of words to the left of the reference word is negative, and the direction of words to the right is positive. Referring to the schematic diagram of position information in Figure 1, the relative position serial numbers of the words in the text sentence sequence are: the relative position serial number of "Bush" is "-3", that of "held" is "-2", that of "a" is "-1", that of "talk" is "0", that of "with" is "+1", and that of "Sharon" is "+2".
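The two sequential position representations described above can be sketched in a few lines; this is an illustrative sketch for the example sentence, not part of the claimed method.

```python
# Sketch of the two sequential position representations, using the
# example sentence "Bush held a talk with Sharon" from the description.
words = ["Bush", "held", "a", "talk", "with", "Sharon"]

# Absolute position: left-to-right index of each word in the sequence.
absolute = {w: i for i, w in enumerate(words)}

# Relative position: signed offset from a chosen reference word
# (negative to the left of the reference, positive to the right).
reference = "talk"
ref_index = words.index(reference)
relative = {w: i - ref_index for i, w in enumerate(words)}

print(absolute["Bush"], relative["Bush"])      # 0 -3
print(absolute["Sharon"], relative["Sharon"])  # 5 2
```

The computed serial numbers match those given for Figure 1 above.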
[0065] Furthermore, the inventor found through research that the current absolute position and relative position representation methods used above can only reflect the combined sequence structure of each word in the text sentence sequence, and cannot reflect the syntactic structure of the text sentence sequence. The syntactic structure can accurately reflect the structure of the text sentence, and the combined sequence structure of each word in the text sentence sequence cannot accurately reflect the structure of the text sentence. In other words, the current absolute position and relative position representation methods used above cannot accurately represent the structure of the text sentence.
[0066] In order to be able to accurately represent the structure of the text sentence, so as to consider the structure of the text sentence in the process of determining the meaning of the word, and to improve the accuracy of the translation result, the inventor of the present solution has proposed the following text generation method after research. The provided method can be used in all mainstream neural network machine translation systems and is suitable for translation tasks in all languages.
[0067] The method obtains the source text and, according to the dependency relationships between the words in the source text, obtains a tree-structured source text sequence; according to the structure of the tree-structured source text sequence, it calculates the position vector of each word in the source text within the tree-structured source text sequence, where the position vector represents the position of the word in that sequence; it inputs the position vector corresponding to each word in the source text into a pre-trained machine translation model; it uses the machine translation model to semantically encode each word in the source text to obtain the semantic vector corresponding to each word; it uses the machine translation model to generate the source-end semantic vector corresponding to each word according to that word's position vector and semantic vector; it uses the machine translation model to semantically decode the source-end semantic vector corresponding to each word to obtain the target word corresponding to each word; and it uses the machine translation model to determine the combination sequence of the target words corresponding to the words in the source text, splicing the target words according to that combination sequence to generate the target text corresponding to the source text.
In this application, the tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence can reflect the syntactic structure of the source text. Therefore, the position vector of each word in the source text, calculated from the tree-structured source text sequence, can also reflect that syntactic structure, and the machine translation model combines the position vector corresponding to each word when obtaining the source-end semantic vector corresponding to each word. The influence of the syntactic structure of the source text on word semantics is thus taken into account when determining word semantics, and the accuracy of the translation results is improved.
[0068] For ease of understanding, the system architecture to which the text generation method of the embodiments of the present application is applicable is introduced first. Figure 2 shows a schematic diagram of the composition of a text generation system to which the solution of the present application is applicable. In Figure 2, the text generation system may include a terminal 10 and a server 20. The terminal 10 may send the source text to the server 20; the server 20 performs the text translation process to obtain the target text and then returns the target text to the terminal 10. The terminal 10 may also execute the text translation method itself after obtaining the source text. The terminal 10 and the server 20 are connected through a network. The terminal 10 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer. The server 20 may be implemented as an independent server or as a server cluster composed of multiple servers.
[0069] The text generation method of the embodiments of the present application may be applied to a computer device, and the computer device may specifically be the terminal 10 or the server 20 in the foregoing embodiment. Figure 3 shows a schematic diagram of the composition of a computer device to which the solution of the present application is applicable. In Figure 3, the computer device may include: a processor 101 and a memory 102.
[0070] The computer device may further include: a communication interface 103, an input unit 104, a display 105, and a communication bus 106.
[0071] The processor 101, the memory 102, the communication interface 103, the input unit 104, and the display 105 all communicate with each other through the communication bus 106.
[0072] In the embodiment of the present application, the processor 101 may be a central processing unit (CPU), a field-programmable gate array (FPGA), or another programmable logic device.
[0073] The processor can call a program stored in the memory 102. Specifically, the processor can perform operations performed on the terminal side in the following method embodiments.
[0074] The memory 102 is used to store one or more programs, and the programs may include program codes, and the program codes include computer operation instructions. In this embodiment of the present application, the memory 102 stores at least programs for implementing the following functions:
[0075] Obtain the source text, and obtain the tree structure source text sequence according to the dependency relationship between each word in the source text;
[0076] According to the structure of the tree-structured source text sequence, calculate the position vector of each word in the source text within the tree-structured source text sequence, where the position vector indicates the position of that word in the tree-structured source text sequence;
[0077] Input the position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0078] Using the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text;
[0079] Using the machine translation model to generate a source semantic vector corresponding to each word in the source text according to the position vector and semantic vector corresponding to each word in the source text;
[0080] Semantically decoding the source semantic vector corresponding to each word in the source text by using the machine translation model to obtain the target word corresponding to each word in the source text;
[0081] The machine translation model is used to determine the combination sequence of the target words corresponding to each word in the source text, and the target words are spliced according to the combination sequence to generate the target text corresponding to the source text.
[0082] Figure 4 shows a schematic flow chart of an embodiment of a text generation method of the present application. This embodiment mainly takes the method as applied to a computer device as an example; the computer device may be the terminal 10 or the server 20 in Figure 2. Referring to Figure 4, the text generation method specifically includes the following steps:
[0083] S100. Obtain the source text, and obtain a tree structure source text sequence according to the dependency relationship between each word in the source text;
[0084] It should be noted that the source text in this application is the text to be translated. The source text can specifically be sentences, paragraphs, or chapters. The language type of the source text is not specifically limited in this application. The source text can be specifically Chinese text, English text, etc.
[0085] Optionally, after obtaining the source text, word segmentation processing may be performed on the source text to obtain a source text sequence composed of various words, and then the source text sequence is processed to obtain a tree structure source text sequence.
[0086] It should be noted that the process of processing the source text sequence to obtain the tree-structured source text sequence, as disclosed in this application, is as follows: determine the keyword in the source text sequence, and, according to the dependency relationship between the keyword and each of the other words in the source text, arrange the words in the source text sequence into a tree structure to obtain the tree-structured source text sequence.
[0087] Here, the keyword in the source text sequence can be the predicate verb of the source text sequence. Every word in the source text except the predicate verb has a direct or indirect dependency relationship with the predicate verb, and the dependency relationships between words can reflect their syntactic collocation relationships, which are in turn related to semantics. That is to say, the tree-structured source text sequence in this application can reflect the dependency relationships between the words in the source text sequence, and thereby reflect the semantics of those words.
[0088] It should be noted that the tree structure source text sequence in the present application is a sequence obtained by arranging each word in the source text sequence according to the tree structure, and the tree structure source text sequence in the present application may specifically be a dependency tree.
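The arrangement step can be sketched as follows. The dependency heads here are hypothetical annotations chosen to be consistent with the hop counts given later in paragraph [0112]; in practice they would come from a dependency parser.

```python
from collections import defaultdict

# Hypothetical dependency heads for "Bush held a talk with Sharon";
# "held" (the predicate verb) is the keyword at the root. The exact
# attachment of "with" is an assumption consistent with [0112].
heads = {
    "held": None,       # root / keyword
    "Bush": "held",
    "talk": "held",
    "Sharon": "held",
    "a": "talk",
    "with": "Sharon",
}

# Arrange the words into a tree-structured source text sequence
# (a dependency tree) by grouping each word under its head.
children = defaultdict(list)
for word, head in heads.items():
    if head is not None:
        children[head].append(word)

print(dict(children))
# {'held': ['Bush', 'talk', 'Sharon'], 'talk': ['a'], 'Sharon': ['with']}
```

The resulting mapping is the dependency tree: the keyword "held" at the root, with the remaining words attached directly or indirectly to it.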
[0089] S110: Calculate the position vector of each word in the source text in the tree structure source text sequence according to the structure of the tree structure source text sequence;
[0090] It should be noted that the position vector indicates the position of a word in the source text within the tree-structured source text sequence. Since the tree-structured source text sequence can reflect the syntactic structure of the source text, the position vector of each word, calculated according to the structure of that sequence, can also reflect the syntactic structure of the source text.
[0091] Optionally, in this application, the position of the word in the tree structure source text sequence may be represented by a vector of preset dimensions, so as to obtain the position vector of the word in the tree structure source text sequence. The preset dimensions can be set by those skilled in the art according to actual conditions, and this application does not make specific limitations.
[0092] S120: Input the position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0093] The machine translation model in this application uses the self-attention model, based on the encoder-decoder Transformer framework. The encoder reads the source text and outputs a semantic vector sequence after the self-attention mechanism and a feed-forward neural network; according to this semantic vector sequence, the decoder generates the target text word by word through the self-attention mechanism and a feed-forward neural network.
[0094] S130. Use the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text.
[0095] The semantics of each word in the source text can be represented by a semantic vector, each dimension of which carries part of the word's semantic information.
[0096] The dimension of the semantic vector can be set by those skilled in the art according to actual conditions; this application does not specifically limit it. In addition, the dimension of the semantic vector and the dimension of the position vector may be the same or different, which is also not specifically limited in this application. If they differ, the dimensions need to be unified before subsequent vector operations, for example by lifting the lower-dimensional vector: if the semantic vector dimension is 512 and the position vector dimension is 312, the position vector can be multiplied by a 312×512 matrix to transform it into a 512-dimensional position vector. This application may also apply dimensionality reduction to the higher-dimensional vector; the specific dimensionality reduction method will not be described in detail in the embodiments of this application.
[0097] The dimensions of the semantic vectors corresponding to different words can likewise be the same or different. If they differ, the dimensions need to be unified before subsequent vector operations, for example by lifting the lower-dimensional vector: if the semantic vector of one word has dimension 512 and that of another word has dimension 312, the latter can be multiplied by a 312×512 matrix to transform it into a 512-dimensional semantic vector. This application may also apply dimensionality reduction to the higher-dimensional vector; the specific dimensionality reduction method will not be described in detail in the embodiments of this application.
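The dimension-unification step above can be sketched with NumPy; the random matrix here is a stand-in for a learned 312×512 parameter matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 312-dimensional position vector, as in the example above; the
# projection matrix stands in for a learned 312x512 parameter matrix.
position_vec = rng.normal(size=312)
projection = rng.normal(size=(312, 512))

# Multiply by the 312x512 matrix to lift the vector to 512 dimensions,
# matching the 512-dimensional semantic vector.
position_vec_512 = position_vec @ projection

print(position_vec_512.shape)  # (512,)
```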
[0098] S140. Use the machine translation model to generate a source semantic vector corresponding to each word in the source text according to the position vector and semantic vector corresponding to each word in the source text;
[0099] In this application, the machine translation model can be used to perform vector operations on the position vector and semantic vector corresponding to a certain word in the source text to obtain the source semantic vector corresponding to a certain word in the source text.
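The description does not fix which vector operation is used, so the element-wise addition below is an illustrative assumption (it is the operation used for positional information in the standard Transformer).

```python
import numpy as np

rng = np.random.default_rng(1)

# Semantic and position vectors for one word, already unified to the
# same dimension (512 here, as in the earlier example).
semantic_vec = rng.normal(size=512)
position_vec = rng.normal(size=512)

# Element-wise addition as one plausible "vector operation" that
# combines the two into the source-end semantic vector; this choice
# is an assumption, not mandated by the description.
source_semantic_vec = semantic_vec + position_vec

print(source_semantic_vec.shape)  # (512,)
```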
[0100] S150: Use the machine translation model to perform semantic decoding on the source semantic vector corresponding to each word in the source text to obtain the target word corresponding to each word in the source text;
[0101] It should be noted that, in the embodiments of the present application, a machine translation model is used to semantically decode the source semantic vector corresponding to each word in the source text, and the target word corresponding to each word in the source text can be obtained.
[0102] The process of semantically decoding the source-end semantic vector corresponding to any word in the source text is: perform a similarity operation between the source-end semantic vector corresponding to the word and the semantic vectors of all candidate words stored in the database to obtain similarity results, and take the candidate word whose similarity result meets a preset similarity threshold as the target word corresponding to that word in the source text.
[0103] Specifically, the embodiment of the present application may take the dot product of the source-end semantic vector corresponding to any word in the source text with the semantic vectors of all candidate words stored in the database, treat the dot product results as the similarity results, and perform a probabilistic operation on the dot product results to obtain probability results. According to the probability results, a candidate word that meets a preset probability threshold is selected as the target word corresponding to that word in the source text.
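The dot-product-plus-probability step can be sketched as follows. The candidate words and their vectors are hypothetical toy data, and taking the argmax stands in for the preset probability threshold, which the description leaves unspecified.

```python
import numpy as np

def pick_target_word(source_vec, candidate_vecs, candidate_words):
    """Dot-product similarity followed by a softmax over candidates."""
    scores = candidate_vecs @ source_vec           # dot product per candidate
    scores = scores - scores.max()                 # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # probabilistic operation
    # Highest-probability candidate, standing in for the threshold test.
    return candidate_words[int(np.argmax(probs))]

# Toy 3-dimensional example with hypothetical target-language candidates.
candidates = ["会谈", "散步", "晚餐"]
candidate_vecs = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
source_vec = np.array([0.9, 0.1, 0.2])

print(pick_target_word(source_vec, candidate_vecs, candidates))  # 会谈
```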
[0104] S160. Use the machine translation model to determine the combination sequence of the target words corresponding to each word in the source text, and splice the target words according to the combination sequence to generate the target text corresponding to the source text .
[0105] The machine translation model in this application can combine the position vector of each word in the source text within the tree-structured source text sequence with the word-ordering method obtained by pre-training, to obtain the combination sequence of the target words corresponding to the words in the source text, and then splice the target words according to that combination sequence to generate the target text corresponding to the source text.
[0106] Since, in the embodiments of the present application, the tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence can reflect the syntactic structure of the source text, the position vector of each word, calculated from the structure of that sequence, can also reflect the syntactic structure of the source text, and the machine translation model combines the position vector corresponding to each word when obtaining the source-end semantic vector corresponding to each word. In this way, the influence of the syntactic structure of the source text on word semantics is taken into account when determining word semantics, and the accuracy of the translation result is improved.
[0107] In order to improve the accuracy of the translation results, this application also provides another embodiment of the text generation method; referring to the schematic flow chart in Figure 5, this embodiment includes:
[0108] S200. Obtain the source text, and obtain a tree structure source text sequence according to the dependency relationship between each word in the source text;
[0109] S210: Use the word at the root node position in the tree structure source text sequence as a keyword;
[0110] The word at the root node position in the tree structure source text sequence is the keyword.
[0111] S220. Use the number of hops from each word in the tree structure source text sequence to the keyword as the absolute position corresponding to each word in the tree structure source text sequence;
[0112] Given a text sentence sequence "Bush held a talk with Sharon", with the structure of the tree-structured source text sequence shown in Figure 6 and the schematic diagram of position information shown in Figure 7, the keyword is "held", and the numbers of hops from the words in the tree-structured source text sequence to the keyword are: "1", "0", "2", "1", "2", "1". Accordingly, the absolute position serial numbers corresponding to the words in the tree-structured source text sequence are: the absolute position serial number of "Bush" is "1", that of "held" is "0", that of "a" is "2", that of "talk" is "1", that of "with" is "2", and that of "Sharon" is "1".
[0113] Optionally, this application also provides another way to determine the absolute position of any word in the tree structure source text sequence: the level of the tree structure source text sequence at which the word is located is used as the absolute position sequence number corresponding to that word.
[0114] For example: "held" is at the 0th level of the tree structure source text sequence, so the absolute position sequence number of "held" is "0"; "Bush", "talk" and "Sharon" are at the first level, so their absolute position sequence numbers are "1"; "a" and "with" are at the second level, so their absolute position sequence numbers are "2".
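As a concrete illustration, the hop-counting of step S220 can be sketched as follows. This is only a sketch: the dependency heads below are hypothetical stand-ins chosen to reproduce the hop counts of Figure 7; in practice they would come from a dependency parser.

```python
# Hypothetical head (parent) table for "Bush held a talk with Sharon",
# chosen so that the hop counts match the example in the text.
heads = {
    "Bush": "held",
    "held": None,      # root node, i.e. the keyword
    "a": "talk",
    "talk": "held",
    "with": "talk",
    "Sharon": "held",
}

def absolute_position(word):
    """Number of hops from `word` up to the root-node keyword."""
    hops = 0
    while heads[word] is not None:
        word = heads[word]
        hops += 1
    return hops

positions = {w: absolute_position(w) for w in heads}
print(positions)
# {'Bush': 1, 'held': 0, 'a': 2, 'talk': 1, 'with': 2, 'Sharon': 1}
```

Counting hops to the root is equivalent to reading off the tree level of each word, which is why the two optional definitions of the absolute position coincide.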
[0115] S230: Map the respective absolute position of any word to a vector of preset dimensions to obtain the respective absolute position vector of each word;
[0116] It should be noted that the absolute position vector represents the absolute position, in the tree-structured source text sequence, of a word in the source text. Since the tree-structured source text sequence reflects the syntactic structure of the source text, the absolute position vector of each word, calculated from the structure of that sequence, also reflects the syntactic structure of the source text.
[0117] Optionally, in this application, the absolute position of the word in the tree structure source text sequence may be represented by a vector of preset dimensions, so as to obtain the absolute position vector of the word in the tree structure source text sequence. The preset dimensions can be set by those skilled in the art according to actual conditions, and this application does not make specific limitations.
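The mapping from an absolute position sequence number to a vector of a preset dimension can be sketched as an embedding-table lookup. The table below is randomly initialized purely for illustration; in the scheme described here it would be a learnable parameter trained jointly with the model, and the dimension `d` and bound `max_position` are arbitrary assumed values.

```python
import numpy as np

d = 8              # preset dimension, set here arbitrarily
max_position = 16  # assumed upper bound on hop counts in a sentence
rng = np.random.default_rng(0)

# Stand-in for a learned position-embedding table of shape (max_position, d).
position_table = rng.normal(size=(max_position, d))

def position_vector(abs_position):
    """D(p_i): map the absolute position sequence number to a d-dimensional vector."""
    return position_table[abs_position]

v = position_vector(2)   # e.g. the absolute position vector of the word "a"
print(v.shape)           # (8,)
```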
[0118] S240: Input the absolute position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0119] S250: Use the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text.
[0120] S260. Use the machine translation model to generate a source semantic vector corresponding to each word in the source text according to the absolute position vector and the semantic vector corresponding to each word in the source text;
[0121] Optionally, the machine translation model in this application may perform a bitwise (element-wise) addition of the absolute position vector and the semantic vector corresponding to each word in the source text, and use the result of the bitwise addition as the source semantic vector corresponding to that word. The specific calculation process may be:
[0122] For a word w_i with absolute position p_i, the source semantic vector r_i input into the self-attention model is:
[0123] r_i = f(E(w_i), D(p_i))
[0124] where E(w_i) is the semantic vector of the word, with dimension d; D(p_i) is the absolute position vector of the word, also with dimension d; f is the bitwise addition operation; and r_i, the result of the bitwise addition, is the source semantic vector, again of dimension d.
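The bitwise addition f above can be sketched with stand-in vectors; the numeric values are illustrative only, not trained embeddings.

```python
import numpy as np

d = 4  # preset dimension, illustrative

E_wi = np.array([0.1, 0.2, 0.3, 0.4])  # semantic vector E(w_i), dimension d
D_pi = np.array([0.5, 0.5, 0.5, 0.5])  # absolute position vector D(p_i), dimension d

# r_i = f(E(w_i), D(p_i)): element-wise addition, result keeps dimension d.
r_i = E_wi + D_pi
print(r_i.shape)  # (4,)
```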
[0125] S270: Use the machine translation model to perform semantic decoding on the source semantic vector corresponding to each word in the source text, to obtain a target word corresponding to each word in the source text;
[0126] S280. Use the machine translation model to determine the combination sequence of the target words corresponding to each word in the source text, and splice the target words according to the combination sequence to generate the target text corresponding to the source text .
[0127] It should be noted that step S200, step S250, and steps S270-S280 in this embodiment correspond respectively to step S100, step S130, and steps S150-S160 in the above embodiment. For details, please refer to the above embodiment; they are not repeated here.
[0128] In the embodiments of the present application, a tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence reflects the syntactic structure of the source text. The absolute position vector of each word, calculated from the structure of the tree-structured source text sequence, therefore also reflects that syntactic structure. The machine translation model combines the absolute position vector corresponding to each word to obtain the source semantic vector corresponding to each word, so that the influence of the syntactic structure of the source text on word semantics is taken into account in the process of determining the semantics of a word, which improves the accuracy of the translation result.
[0129] In order to improve the accuracy of the translation results, this application also provides a schematic flowchart of another embodiment of the text generation method; referring to Figure 8, this embodiment includes:
[0130] S300. Obtain the source text, and obtain a tree structure source text sequence according to the dependency relationship between each word in the source text;
[0131] S310: Determine a reference word from the tree structure source text sequence according to a preset reference word determination rule;
[0132] It should be noted that in this application, a person skilled in the art can determine the reference word from the tree structure source text sequence according to actual conditions, and this application does not specifically limit it.
[0133] S320. Determine whether any word in the tree structure source text sequence and the reference word are on the same dependency path; if so, perform step S330, otherwise, perform step S340;
[0134] Each path from the root node of the tree structure source text sequence to a different leaf node is a different dependency path.
[0135] S330. Use the absolute value of the difference between the absolute position of any word and the absolute position of the reference word as the relative position value of any word;
[0136] It should be noted that if a word and the reference word are on the same dependency path, the absolute value of the difference between the absolute position d_j of the word and the absolute position d_i of the reference word is used as the relative position value s_ij of the word: s_ij = |d_i − d_j|.
[0137] S340. Use the sum of the absolute position of any word and the absolute position of the reference word as the relative position value of any word;
[0138] It should be noted that if a word and the reference word are not on the same dependency path, the sum of the absolute position d_j of the word and the absolute position d_i of the reference word is used as the relative position value s_ij' of the word: s_ij' = d_i + d_j.
[0139] S350: Determine the relative position direction of any word according to the left-right position relationship between any word in the source text and the reference word;
[0140] It should be noted that, after the relative position value is determined, a relative position direction is introduced. The embodiment of the present application judges the left-right positional relationship between a word and the reference word in the source text sequence: if the word is on the left side of the reference word, the relative position direction is assigned a negative sign; if the word is on the right side of the reference word, the relative position direction is assigned a positive sign.
[0141] S360. Combine the relative position value of any word and the relative position direction of any word to obtain the relative position of any word;
[0142] Given the text sentence sequence "Bush held a talk with Sharon", referring to the schematic diagram of position information shown in Figure 7 and taking the word "talk" as the reference word: "Bush" is not on the same dependency path as "talk", and the relative position of "Bush" in the dependency structure is "-2"; "a" is on the same dependency path as "talk", and the relative position of "a" in the dependency structure is "-1".
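Steps S320-S360 can be sketched as follows for this example. The absolute positions and the root-to-leaf paths are restated from the Figure 6/7 example; the hard-coded path list is a hypothetical stand-in for a real dependency-path check.

```python
# Linear order of the sentence and the absolute positions from Figure 7.
order = ["Bush", "held", "a", "talk", "with", "Sharon"]
abs_pos = {"Bush": 1, "held": 0, "a": 2, "talk": 1, "with": 2, "Sharon": 1}

# Hypothetical root-to-leaf dependency paths consistent with the example tree.
paths = [("held", "Bush"), ("held", "talk", "a"),
         ("held", "talk", "with"), ("held", "Sharon")]

def on_same_dependency_path(word, ref):
    return any(word in p and ref in p for p in paths)

def relative_position(word, ref="talk"):
    d_i, d_j = abs_pos[ref], abs_pos[word]
    # S330/S340: |d_i - d_j| on the same path, d_i + d_j otherwise.
    value = abs(d_i - d_j) if on_same_dependency_path(word, ref) else d_i + d_j
    # S350: negative direction if the word is left of the reference word.
    sign = -1 if order.index(word) < order.index(ref) else 1
    return sign * value  # S360: combine value and direction

print(relative_position("Bush"))  # -2: different path, left of "talk"
print(relative_position("a"))     # -1: same path, left of "talk"
```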
[0143] S370: Map the relative position corresponding to any word to a vector of preset dimensions to obtain a relative position vector corresponding to each word;
[0144] S380: Input the relative position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0145] S390. Use the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text.
[0146] S391. Use the machine translation model to generate a source semantic vector corresponding to each word in the source text according to the relative position vector and semantic vector corresponding to each word in the source text;
[0147] Optionally, this embodiment of the application uses the machine translation model to generate the source-end semantics corresponding to each word in the source text according to the relative position vector and semantic vector corresponding to each word in the source text The vector process can be:
[0148] A1: Perform linear transformation processing on the semantic vector corresponding to each word in the source text, and convert the semantic vector corresponding to each word into a request vector sequence and a key-value pair vector sequence;
[0149] It should be noted that the semantic vector R corresponding to a word in the embodiment of the present application can be linearly transformed, by three different learnable parameter matrices, into a request vector sequence Q and a key-value pair vector sequence, where the key-value pair vector sequence contains the key vector sequence K and the value vector sequence V;
[0150] wherein the dimensions of the request vector sequence Q, the key vector sequence K, and the value vector sequence V are the same as the dimension of the semantic vector R.
[0151] A2: Use the request vector sequence corresponding to each word, the key vector sequence and the relative position vector in the key-value pair vector sequence to obtain the logical similarity vector between the request vector sequence and the key vector sequence of each word;
[0152] Optionally, this application may use the formula e_i = q_i(K + S^K)^T / √d to obtain the logical similarity vector between the request vector sequence and the key vector sequence of each word. In this formula, q_i is the request vector sequence Q, K is the key vector sequence, S^K is the relative position vector, d is the dimension of the model's hidden layer vector (the same as the dimensions of Q, K and S^K), and e_i is the logical similarity vector between the request vector sequence and the key vector sequence. The operation of the formula is as follows: the key vector sequence K and S^K are added bitwise; the result of the bitwise addition is transposed to obtain a transposed matrix; and the request vector sequence q_i is compared for similarity with each element of the transposed matrix, yielding the logical similarity vector between the request vector sequence and the key vector sequence.
[0153] A3: Perform normalization processing on the logical similarity vector between the request vector sequence and the key vector sequence of each word to obtain the weight vector corresponding to the logical similarity vector of each word;
[0154] Optionally, the embodiment of this application may use the formula α_i = softmax(e_i) to obtain the weight vector corresponding to the logical similarity vector of each word, where e_i is the logical similarity vector between the request vector sequence and the key vector sequence, softmax is the normalization function, and α_i is the weight vector. The value of each dimension of the normalized weight vector lies between 0 and 1, and the values of all dimensions sum to 1.
[0155] A4: Use the weight vector of each word, the value vector sequence in the key-value pair vector sequence of each word, and the relative position vector to obtain the source semantic vector corresponding to each word.
[0156] Optionally, the embodiment of this application may use the formula o_i = α_i(V + S^V) to obtain the source semantic vector corresponding to each word. In this formula, α_i is the weight vector, V is the value vector sequence, and S^V is the relative position vector. The calculation of the formula is as follows: the value vector sequence V and S^V are added bitwise, and the result of the bitwise addition is then combined with the weight vector by weighted summation, i.e., a dot product of the bitwise addition result with the weight vector, yielding the source semantic vector o_i.
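The data flow of steps A1-A4 can be sketched as follows. All matrices here are random stand-ins: the projection matrices and the relative-position terms S^K, S^V would be learned parameters in the described model, and the sentence length and hidden dimension are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                      # sentence length and hidden dimension (assumed)

R = rng.normal(size=(n, d))      # semantic vectors of the n words
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # stand-in learnable matrices
S_K = rng.normal(size=(n, d))    # relative position vectors added to the keys
S_V = rng.normal(size=(n, d))    # relative position vectors added to the values

Q, K, V = R @ Wq, R @ Wk, R @ Wv            # A1: linear transformations of R
E = Q @ (K + S_K).T / np.sqrt(d)            # A2: logical similarity e_i
A = np.exp(E) / np.exp(E).sum(-1, keepdims=True)  # A3: softmax -> weight vectors
O = A @ (V + S_V)                           # A4: source semantic vectors o_i

print(O.shape)         # (6, 8): one source semantic vector per word
print(A.sum(axis=-1))  # each weight vector sums to 1
```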
[0157] S392. Use the machine translation model to perform semantic decoding on the source semantic vector corresponding to each word in the source text to obtain the target word corresponding to each word in the source text;
[0158] S393. Use the machine translation model to determine the combination sequence of the target words corresponding to each word in the source text, and splice the target words according to the combination sequence to generate the target text corresponding to the source text .
[0159] It should be noted that step S300, step S390, and steps S392-S393 in this embodiment correspond respectively to step S100, step S130, and steps S150-S160 in the above-mentioned embodiment. For details, please refer to the above-mentioned embodiment; they are not repeated here.
[0160] In the embodiments of the present application, a tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence reflects the syntactic structure of the source text. The relative position vector of each word, calculated from the structure of the tree-structured source text sequence, therefore also reflects that syntactic structure. The machine translation model combines the relative position vector corresponding to each word to obtain the source semantic vector corresponding to each word, so that the influence of the syntactic structure of the source text on word semantics is taken into account in the process of determining the semantics of a word, which improves the accuracy of the translation result.
[0161] It should be noted that, in the technical solution of this application, in addition to using the absolute position vector or the relative position vector alone to determine the semantics of a word, the absolute position vector and the relative position vector can also be combined to determine word semantics, further improving the accuracy of the translation results. Accordingly, this application also provides a schematic flowchart of another embodiment of the text generation method; referring to Figure 9, this embodiment includes:
[0162] S400: Obtain the source text, and obtain a tree structure source text sequence according to the dependency relationship between each word in the source text;
[0163] S410. Use the word at the root node position in the tree structure source text sequence as a keyword;
[0164] S420. Use the number of hops from each word in the tree structure source text sequence to the keyword as the absolute position corresponding to each word in the tree structure source text sequence;
[0165] Given the text sentence sequence "Bush held a talk with Sharon", the structure of the tree structure source text sequence is shown in Figure 6, and its position information is shown in the schematic diagram of Figure 7. The keyword is "held", and the numbers of hops from the words in the tree structure source text sequence to the keyword are: "1", "0", "2", "1", "2", "1". Accordingly, the absolute position sequence numbers corresponding to the words in the tree structure source text sequence are: the absolute position sequence number of "Bush" is "1", that of "held" is "0", that of "a" is "2", that of "talk" is "1", that of "with" is "2", and that of "Sharon" is "1".
[0166] S430: Map the respective absolute position of any word to a vector of preset dimensions to obtain the respective absolute position vector of each word;
[0167] S440: Input the absolute position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0168] S450. Use the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text.
[0169] S460. Use the machine translation model to generate an initial semantic vector corresponding to each word in the source text according to the absolute position vector and semantic vector corresponding to each word in the source text;
[0170] S470: Determine a reference word from the tree structure source text sequence according to a preset reference word determination rule;
[0171] S480: Determine whether any word in the tree structure source text sequence and the reference word are on the same dependency path; if so, perform step S490, otherwise, perform step S491;
[0172] Each path from the root node of the tree structure source text sequence to a different leaf node is a different dependency path.
[0173] S490. Use the absolute value of the difference between the absolute position of any word and the absolute position of the reference word as the relative position value of any word;
[0174] It should be noted that if a word and the reference word are on the same dependency path, the absolute value of the difference between the absolute position d_j of the word and the absolute position d_i of the reference word is used as the relative position value s_ij of the word: s_ij = |d_i − d_j|.
[0175] S491. Use the sum of the absolute position of any word and the absolute position of the reference word as the relative position value of any word;
[0176] It should be noted that if a word and the reference word are not on the same dependency path, the sum of the absolute position d_j of the word and the absolute position d_i of the reference word is used as the relative position value s_ij' of the word: s_ij' = d_i + d_j.
[0177] S492: Determine the relative position direction of any word according to the left-right position relationship between any word in the source text and the reference word;
[0178] S493. Combine the relative position value of any word and the relative position direction of any word to obtain the relative position of any word;
[0179] S494: Map the relative position corresponding to any word to a vector of preset dimensions to obtain a relative position vector corresponding to each word;
[0180] S495. Input the relative position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0181] S496. Using the machine translation model, generate a source semantic vector corresponding to each word in the source text according to the relative position vector and the initial semantic vector corresponding to each word in the source text;
[0182] S497: Use the machine translation model to perform semantic decoding on the source semantic vector corresponding to each word in the source text, to obtain a target word corresponding to each word in the source text;
[0183] S498. Use the machine translation model to determine the combination sequence of the target words corresponding to each word in the source text, and splice the target words according to the combination sequence to generate the target text corresponding to the source text .
[0184] In the embodiments of the present application, a tree-structured source text sequence is obtained according to the dependency relationships between the words in the source text, and this sequence reflects the syntactic structure of the source text. The absolute position vector of each word, calculated from the structure of the tree-structured source text sequence, therefore reflects that syntactic structure, and because the absolute position vectors are used to generate the initial semantic vectors, the initial semantic vectors input into the machine translation model also reflect the syntactic structure of the source text. The relative position vectors are then used to adjust the initial semantic vectors, yielding the source semantic vector corresponding to each word in the source text, so that the source semantic vectors reflect the syntactic structure of the source text even more fully. In this way, the influence of the syntactic structure of the source text on word semantics is fully taken into account in the process of determining word semantics, which further improves the accuracy of the translation results.
[0185] It should be noted that the above scheme disclosed in the embodiments of this application was applied to a Chinese-English machine translation task for testing, and the test results are shown in Table 1 below:
[0186] Table 1
[0187]
Combined sequence structure | Tree structure source text sequence structure | BLEU
×                           | ×                                             | 28.33
×                           | √                                             | 35.43
√                           | ×                                             | 44.31
√                           | √                                             | 44.84
[0188] It can be seen from Table 1 that "combined sequence structure" refers to the method, mentioned in the specification of this application, of determining the position information corresponding to a word according to the combined sequence structure of the words in the text sentence sequence, while "tree structure source text sequence structure" refers to the scheme disclosed in the embodiments of this application of determining the position information corresponding to a word based on the tree structure source text sequence; "×" means not adopted, and "√" means adopted. After adopting the representation of the absolute position information of words determined according to the tree structure source text sequence in this application, the BLEU value in the second row, 35.43, is 7.1 points higher than the 28.33 in the first row; likewise, the BLEU value in the fourth row, 44.84, is 0.53 points higher than the 44.31 in the third row. Generally, an increase of more than 0.5 points is a significant improvement. Based on the above experimental data, the technical solution in this application clearly improves the accuracy of the translation results.
[0189] Below, this application also provides a schematic flowchart of the training process of the machine translation model; referring to Figure 10, this embodiment includes:
[0190] S500: Obtain the sample text sequence of the training sample in the sample set, the position vector corresponding to each word in the sample text, and the reference output text sequence;
[0191] Specifically, the sample set is a collection of a large amount of training data required for model training. The sample set includes the sample text sequence corresponding to each sample, the position vector corresponding to each word in the sample text, and the reference output text sequence.
[0192] S510: Input the sample text sequence of the training sample in the sample set and the position vector corresponding to each word in the sample text into the machine translation model for training, to obtain a predicted output text sequence;
[0193] Specifically, the sample text sequence of the training sample in the sample set and the position vector corresponding to each word in the sample text are input into the machine translation model, and the machine translation model is used to execute the above text generation method to obtain the predicted output text sequence.
[0194] S520: Obtain the objective function of the machine translation model by using the reference output text sequence and the predicted output text sequence;
[0195] Specifically, during the training process, the model parameters can be continuously adjusted in the direction of reducing the difference between the reference output text sequence and the predicted output text sequence. In this way, predicted output text sequences are obtained by continuously inputting sample sets, and the model parameters are adjusted according to the difference between the reference output text sequence and the predicted output text sequence, thereby training the machine translation model.
[0196] S530: Use the model parameters that maximize the objective function as the model parameters of the machine translation model, return to step S510 to continue training, and stop training when the training stop condition is met.
[0197] Wherein, the training stop condition is the condition for ending model training; it may be reaching a preset number of iterations, or the performance index of the machine translation model after parameter adjustment reaching a preset index, etc., and this application does not specifically limit it. Specifically, for the objective function corresponding to each sample sequence, the model parameters that maximize the objective function are taken as the model parameters of the machine translation model; the next sample set is then predicted based on these model parameters, and training continues until the training stop condition is met.
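The control flow of steps S500-S530 can be sketched as below. This is only a structural sketch under stated assumptions: the "model" (element-wise scaling by a parameter vector theta), the squared-error objective, and the gradient update are simple stand-ins, not the actual machine translation model or objective function of the text; only the predict/adjust/stop loop mirrors the described procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# S500: a stand-in sample set of (sample input, reference output) pairs.
samples = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(8)]

theta = np.zeros(4)              # model parameters
lr, max_iterations = 0.05, 200   # training stop condition: preset iteration count

def loss(theta):
    # Difference between predicted and reference outputs (to be reduced).
    return sum(np.sum((theta * x - y) ** 2) for x, y in samples)

initial_loss = loss(theta)
for step in range(max_iterations):
    for x, y_ref in samples:
        y_pred = theta * x                 # S510: predicted output text (stand-in)
        grad = 2 * (y_pred - y_ref) * x    # S520: gradient of the objective
        theta -= lr * grad                 # S530: adjust parameters, continue training

print(loss(theta) < initial_loss)          # the difference shrinks over training
```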
[0198] By using the above-mentioned model training method disclosed in the embodiments of the present application, an accurate machine translation model can be obtained, and the trained model can obtain accurate translation results.
[0199] Corresponding to the text generation method of this application, this application also provides a text generation device. Figure 11 shows a schematic diagram of the composition structure of a text generation device of the present application; the device may include:
[0200] The tree structure source text sequence obtaining unit 100 is configured to obtain the source text, and obtain the tree structure source text sequence according to the dependency relationship between each word in the source text;
[0201] The position vector calculation unit 110 is configured to calculate the position vector of each word in the source text in the tree structure source text sequence according to the structure of the tree structure source text sequence, and the position vector represents The position of the word in the source text in the tree structure source text sequence;
[0202] The position vector input unit 120 is configured to input the position vector corresponding to each word in the source text into a pre-trained machine translation model;
[0203] The semantic encoding unit 130 is configured to use the machine translation model to perform semantic encoding on each word in the source text to obtain a semantic vector corresponding to each word in the source text;
[0204] The source-end semantic vector generating unit 140 is configured to use the machine translation model to generate, according to the position vector and the semantic vector corresponding to each word in the source text, the source-end semantic vector corresponding to each word in the source text;
[0205] The semantic decoding unit 150 is configured to use the machine translation model to perform semantic decoding on the source semantic vector corresponding to each word in the source text to obtain the target word corresponding to each word in the source text;
[0206] The target text generating unit 160 is configured to use the machine translation model to determine the combination sequence of the target words corresponding to each word in the source text, and to splice the target words according to the combination sequence to generate the The target text corresponding to the source text.
[0207] Optionally, the position vector calculation unit includes:
[0208] The keyword determining unit is used to use the word at the root node position in the tree structure source text sequence as a keyword;
[0209] An absolute position determining unit, configured to use the number of hops from each word in the tree structure source text sequence to the keyword as the absolute position corresponding to each word in the tree structure source text sequence;
[0210] The absolute position mapping unit is used to map the respective absolute position of any word to a vector of preset dimensions to obtain the respective absolute position vector of each word.
[0211] Optionally, the position vector calculation unit includes:
[0212] The reference word determination unit is configured to determine the reference word from the tree structure source text sequence according to a preset reference word determination rule;
[0213] The first relative position value determining unit is configured to use, when any word in the tree structure source text sequence is on the same dependency path as the reference word, the absolute value of the difference between the absolute position of the word and the absolute position of the reference word as the relative position value of the word;
[0214] The relative position direction determining unit is used to determine the relative position direction of any word according to the left-right position relationship between any word in the source text and the reference word;
[0215] The first relative position determining unit is used to combine the relative position value of any word with the relative position direction of any word to obtain the relative position of any word;
[0216] The first relative position mapping unit is used to map the relative position corresponding to any word into a vector of preset dimensions to obtain the relative position vector corresponding to each word.
[0217] Optionally, the position vector calculation unit includes:
[0218] The reference word determination unit is configured to determine the reference word from the tree structure source text sequence according to a preset reference word determination rule;
[0219] The second relative position value determining unit is configured to use, when any word in the tree structure source text sequence is not on the same dependency path as the reference word, the sum of the absolute position of the word and the absolute position of the reference word as the relative position value of the word;
[0220] The relative position direction determining unit is used to determine the relative position direction of any word according to the left-right position relationship between any word in the source text and the reference word;
[0221] The second relative position determining unit is used to combine the relative position value of any word with the relative position direction of any word to obtain the relative position of any word;
[0222] The second relative position mapping unit is used to map the relative position corresponding to any word into a vector of preset dimension to obtain the relative position vector corresponding to each word.
[0223] Optionally, the source semantic vector generating unit includes:
[0224] The first source-end semantic vector generating subunit is used to use the machine translation model to perform a bitwise addition operation on the absolute position vector and the semantic vector corresponding to each word in the source text, and to use the result of the bitwise addition operation as the source semantic vector corresponding to each word in the source text.
[0225] Optionally, the source-end semantic vector generating unit includes:
[0226] The linear transformation processing unit is configured to perform a linear transformation on the semantic vector corresponding to each word in the source text, converting the semantic vector of each word into a query vector sequence and a key-value pair vector sequence;
[0227] The logical similarity vector determination unit is configured to obtain the logical similarity vector between the query vector sequence and the key vector sequence of each word, using the query vector sequence of each word, the key vector sequence in the key-value pair vector sequence, and the relative position vector;
[0228] The normalization unit is configured to normalize the logical similarity vector between the query vector sequence and the key vector sequence of each word, obtaining the weight vector corresponding to the logical similarity vector of each word;
[0229] The second source-end semantic vector generating subunit is configured to obtain the source-end semantic vector of each word, using the weight vector of each word, the value vector sequence in the key-value pair vector sequence of each word, and the relative position vector.
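Paragraphs [0226]-[0229] describe a self-attention computation in which relative position vectors enter both the similarity (key side) and the output (value side). The sketch below shows one common way such a computation can be arranged; the function name, shapes, scaling, and the einsum-based incorporation of the relative position tensors are assumptions, not the patented formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(x, Wq, Wk, Wv, rel_k, rel_v):
    """Self-attention with relative position vectors (illustrative).

    x:            (n, d) source-word semantic vectors
    Wq, Wk, Wv:   (d, d) linear transformations producing query/key/value
    rel_k, rel_v: (n, n, d) relative position vectors used on the key
                  side (similarity) and value side (output)
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # paragraph [0226]
    d = q.shape[-1]
    # Logical similarity: queries vs. keys plus relative positions ([0227]).
    logits = (q @ k.T + np.einsum('id,ijd->ij', q, rel_k)) / np.sqrt(d)
    weights = softmax(logits)                      # normalization ([0228])
    # Weighted sum of values plus relative positions ([0229]).
    return weights @ v + np.einsum('ij,ijd->id', weights, rel_v)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
rel_k = rng.normal(size=(4, 4, 8))
rel_v = rng.normal(size=(4, 4, 8))
out = relative_attention(x, Wq, Wk, Wv, rel_k, rel_v)
print(out.shape)  # (4, 8)
```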
[0230] The text generation device in this application further includes a machine translation model training unit, which is specifically configured to:
[0231] Obtain the sample text sequence of a training sample in the sample set, the position vector corresponding to each word in the sample text, and the reference output text sequence;
[0232] Input the sample text sequence of the training sample in the sample set and the position vector corresponding to each word in the sample text into the machine translation model for training, obtaining a predicted output text sequence;
[0233] Obtain the objective function of the machine translation model by using the reference output text sequence and the predicted output text sequence;
[0234] Take the model parameters at which the objective function is maximized as the model parameters of the machine translation model, and return to the step of inputting the sample text sequence of the training samples in the sample set and the position vector corresponding to each word in the sample text into the machine translation model for training to obtain the predicted output text sequence, continuing training until the training stop condition is met.
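The control flow of paragraphs [0231]-[0234] -- predict, score against the reference, keep the parameters that maximize the objective, and loop until a stop condition -- can be illustrated with a toy stand-in. The one-parameter "model", the squared-error-based objective, and the stop threshold below are all placeholders for the actual machine translation model and are not part of the patented method.

```python
def objective(reference, predicted):
    # Toy objective: higher is better (negative squared error).
    return -(reference - predicted) ** 2

def train(samples, steps=200, lr=0.1):
    """Toy training loop mirroring the described procedure.

    samples: list of (input, reference_output) pairs; the "model"
    is a single scalar parameter with predicted = param * input.
    """
    param = 0.0
    best_param, best_total = param, float('-inf')
    for _ in range(steps):
        total = 0.0
        for x, ref in samples:
            pred = param * x                       # predicted output
            total += objective(ref, pred)          # objective function
            # Gradient ascent: adjust parameters to maximize the objective.
            param += lr * 2 * (ref - pred) * x
        if total > best_total:                     # keep maximizing parameters
            best_param, best_total = param, total
        if -total < 1e-8:                          # training stop condition
            break
    return best_param

print(round(train([(1.0, 2.0), (2.0, 4.0)]), 3))   # converges near 2.0
```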
[0235] On the other hand, the present application also provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the above-described text generation method.
[0236] With the research and progress of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robotics, intelligent medical care, and intelligent customer service. It is believed that, as the technology develops, artificial intelligence will be applied in ever more fields and deliver increasingly important value.
[0237] The text generation method provided in the embodiments of the present application can be applied to any of the above fields.
[0238] The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the various embodiments may be cross-referenced. Since the device disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief, and the relevant parts may refer to the description of the method.
[0239] Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered as going beyond the scope of the present invention.
[0240] The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
[0241] The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined in this document can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention will not be limited to the embodiments shown in this document, but should conform to the widest scope consistent with the principles and novel features disclosed in this document.