Word alignment methods, devices, storage media, electronic devices and products

CN115293179B · Active · Publication Date: 2026-06-30 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2022-08-24
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, there is little interaction between two sentences during word alignment, resulting in poor word alignment accuracy.

Method used

By encoding the source and target sentences and obtaining their encoding features, cross-fusion processing is performed to obtain cross-language encoding features. Based on these features, alignment processing is then performed to generate word alignment results.

Benefits of technology

Cross-fusing the encoding features captures the deep interaction information between parallel sentence pairs and takes cross-language context into account, which effectively improves the accuracy of word alignment.

Abstract

This application discloses a word alignment method, apparatus, storage medium, electronic device, and product, relating to the field of artificial intelligence technology. This application can be applied to fields such as blockchain, cloud technology, and map-based vehicle networking. The method includes: obtaining a source sentence and a corresponding target sentence; encoding the source sentence to obtain source sentence encoding features; encoding the target sentence to obtain target sentence encoding features; cross-fusion processing the source sentence encoding features and the target sentence encoding features to obtain cross-language encoding features corresponding to the source sentence and the target sentence; and performing alignment processing based on the cross-language encoding features corresponding to the source sentence and the target sentence to obtain a word alignment result between the source sentence and the target sentence. This application can effectively improve the accuracy of word alignment.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, specifically to a word alignment method, apparatus, storage medium, electronic device, and product.

Background Technology

[0002] Word alignment is the task of aligning words in two sentences, such as aligning words with the same meaning in Chinese and English sentences.

[0003] Currently, when performing word alignment in relevant schemes, the two sentences to be aligned are usually encoded separately, and then aligned based on the encoding results of the two sentences. In this way, the two sentences to be aligned are isolated from each other and have little interaction during word alignment, resulting in poor word alignment accuracy.

[0004] Therefore, there is currently a problem with poor word alignment accuracy.

Summary of the Invention

[0005] This application provides a word alignment method and related apparatus, which can effectively improve the accuracy of word alignment.

[0006] To address the aforementioned technical problems, this application provides the following technical solutions:

[0007] According to one embodiment of this application, a word alignment method is provided, comprising: obtaining a source sentence and a target sentence corresponding to the source sentence; encoding the source sentence to obtain source sentence encoding features; encoding the target sentence to obtain target sentence encoding features; performing cross-fusion processing on the source sentence encoding features and the target sentence encoding features to obtain cross-language encoding features corresponding to the source sentence and the target sentence; and performing alignment processing based on the cross-language encoding features corresponding to the source sentence and the target sentence to obtain a word alignment result between the source sentence and the target sentence.

[0008] According to one embodiment of this application, a word alignment device includes: a statement acquisition module for acquiring a source statement and a target statement corresponding to the source statement; a source statement encoding module for encoding the source statement to obtain source statement encoding features; a target statement encoding module for encoding the target statement to obtain target statement encoding features; a cross-fusion module for cross-fusion processing the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; and an alignment processing module for performing alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain a word alignment result between the source statement and the target statement.

[0009] In some embodiments of this application, the cross-fusion module includes: at least one layer of cross-attention unit, used to perform at least one layer of cross-attention fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-attention fusion features corresponding to the source statement and the target statement generated at each layer; and a cross-language encoding feature extraction unit, used to obtain cross-language encoding features corresponding to the source statement and the target statement based on the cross-attention fusion features corresponding to the source statement and the target statement.

[0010] In some embodiments of this application, the at least one layer of cross-attention unit is configured to: calculate a first key feature, a first value feature, and a first query feature based on source features at each layer, wherein the source feature at the first layer is the source statement encoding feature, and the source feature at layers after the first layer is the cross-attention fusion feature corresponding to the source statement generated at the previous layer; calculate a second key feature, a second value feature, and a second query feature based on target features, wherein the target feature at the first layer is the target statement encoding feature, and the target feature at layers after the first layer is the cross-attention fusion feature corresponding to the target statement generated at the previous layer; perform attention operations based on the first query feature, the second key feature, and the second value feature to obtain a first operation result, and perform attention operations based on the second query feature, the first value feature, and the first key feature to obtain a second operation result; fuse the first operation result with the source features to generate the cross-attention fusion feature corresponding to the source statement at each layer; and fuse the second operation result with the target features to generate the cross-attention fusion feature corresponding to the target statement at each layer.
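The bidirectional cross-attention step described above can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention and fusion by addition; the scaling factor, the function names, and the weight dictionaries are illustrative assumptions rather than details taken from this application. Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax over each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Scaled dot-product attention (the scaling is an assumption)."""
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def add(a, b):
    """Element-wise addition, used here as the fusion operation."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def cross_attention_layer(src, tgt, w_src, w_tgt):
    """One layer: each sentence queries the other, then fuses with its own features."""
    q1, k1, v1 = (matmul(src, w_src[n]) for n in ("q", "k", "v"))  # first Q/K/V
    q2, k2, v2 = (matmul(tgt, w_tgt[n]) for n in ("q", "k", "v"))  # second Q/K/V
    first = attention(q1, k2, v2)    # source queries, target keys and values
    second = attention(q2, k1, v1)   # target queries, source keys and values
    return add(first, src), add(second, tgt)

# Hypothetical toy input: identity projections, two source words, one target word.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {"q": identity, "k": identity, "v": identity}
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[2.0, 3.0]]
new_src, new_tgt = cross_attention_layer(src, tgt, weights, weights)
```

With a single target word, every source position can only attend to that word, so each source row is shifted by the same target vector. Stacking several layers would feed `new_src` and `new_tgt` back in as the source and target features of the next layer.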

[0011] In some embodiments of this application, the at least one layer of cross-attention unit is configured to: fuse the first operation result with the source feature to obtain a first fusion result; normalize the first fusion result to obtain a first normalized result; perform a fully connected operation on the first normalized result to obtain a first fully connected operation result; combine the first fully connected operation result with the first normalized result to obtain a first combined result; and normalize the first combined result to obtain the cross-attention fusion feature corresponding to the source statement generated by each layer.

[0012] In some embodiments of this application, the at least one layer of cross-attention unit is used to: fuse the second operation result with the target feature to obtain a second fusion result; normalize the second fusion result to obtain a second normalized result; perform a fully connected operation on the second normalized result to obtain a second fully connected operation result; combine the second fully connected operation result with the second normalized result to obtain a second combined result; and normalize the second combined result to obtain the cross-attention fusion feature corresponding to the target statement generated by each layer.
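The fuse, normalize, fully connect, combine, normalize sequence in the two paragraphs above is the same for the source and the target side, so one helper can serve both. The sketch below assumes addition for both fusion and combination, layer normalization without learned gain or bias, and a single linear layer as the fully connected operation; all names are hypothetical.

```python
import math

def layer_norm(rows, eps=1e-6):
    """Normalize each row to zero mean and (approximately) unit variance."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def fully_connected(rows, weight, bias):
    """A single linear layer; weight is stored as d_in rows of d_out values."""
    return [[sum(x * w for x, w in zip(row, col)) + b
             for col, b in zip(zip(*weight), bias)] for row in rows]

def finish_layer(operation_result, features, weight, bias):
    fused = add(operation_result, features)      # fusion result
    normed = layer_norm(fused)                   # normalized result
    fc = fully_connected(normed, weight, bias)   # fully connected operation result
    combined = add(fc, normed)                   # combined result
    return layer_norm(combined)                  # cross-attention fusion feature

# Hypothetical toy input: one word with a three-dimensional feature vector.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = finish_layer([[0.5, 1.5, -1.0]], [[1.0, 2.0, 3.0]], identity3, [0.0, 0.0, 0.0])
```

Calling `finish_layer` once with the first operation result and the source features, and once with the second operation result and the target features, yields the two per-layer outputs.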

[0013] In some embodiments of this application, the source statement encoding module includes: at least one cascaded self-attention unit, used to perform at least one layer of self-attention encoding on the source statement to obtain attention encoding features of the source statement generated at each layer; and a source statement encoding feature extraction unit, used to obtain the source statement encoding features based on the attention encoding features corresponding to the source statement.

[0014] In some embodiments of this application, the source statement encoding module includes at least one cascaded self-attention unit, configured to: in each layer, calculate a third key feature, a third value feature, and a third query feature based on a first feature, wherein in the first layer, the first feature is the initial sentence vector of the source statement, and in layers after the first layer, the first feature is the attention encoding feature of the source statement generated in the previous layer; perform attention operations on the third key feature, the third value feature, and the third query feature to obtain a third operation result; fuse the third operation result with the first feature to obtain a third fusion result; and normalize the third fusion result to obtain the attention encoding feature of the source statement generated in each layer.

[0015] In some embodiments of this application, the target statement encoding module includes: at least one layer of cascaded self-attention units, used to perform at least one layer of self-attention encoding on the target statement to obtain attention encoding features of the target statement generated at each layer; and a target statement encoding feature extraction unit, used to obtain the target statement encoding features based on the attention encoding features corresponding to the target statement.

[0016] In some embodiments of this application, the target statement encoding module includes at least one cascaded self-attention unit, which is used in each layer to calculate a fourth key feature, a fourth value feature, and a fourth query feature based on a second feature; perform attention operations on the fourth key feature, the fourth value feature, and the fourth query feature to obtain a fourth operation result; fuse the fourth operation result with the second feature to obtain a fourth fusion result; and normalize the fourth fusion result to obtain the attention encoding feature of the target statement generated in each layer.

[0017] In some embodiments of this application, the alignment processing module is configured to: calculate the dot product of the cross-language encoding features corresponding to the source statement and the target statement to obtain a similarity matrix; generate a first probability matrix from the source statement to the target statement and a second probability matrix from the target statement to the source statement based on the similarity matrix; take the intersection of the first probability matrix and the second probability matrix to obtain an alignment matrix; and obtain the word alignment result between the source statement and the target statement based on the alignment matrix.
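As a rough illustration of this alignment step, the sketch below builds the similarity matrix from dot products, turns it into two probability matrices with softmax, and keeps a word pair only when both directions agree. The use of softmax and the threshold value of 0.5 are assumptions made for the example; this application only states that the two probability matrices are intersected.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def extract_alignment(src_feats, tgt_feats, threshold=0.5):
    """Return the set of (source index, target index) pairs judged aligned."""
    # Similarity matrix: dot product of every source vector with every target vector.
    sim = [[sum(a * b for a, b in zip(s, t)) for t in tgt_feats] for s in src_feats]
    # First probability matrix (source to target): softmax over target positions.
    s2t = [softmax(row) for row in sim]
    # Second probability matrix (target to source): softmax over source positions.
    t2s = [softmax(list(col)) for col in zip(*sim)]
    # Intersection: keep a pair only if both directions exceed the threshold.
    return {(i, j)
            for i, row in enumerate(s2t)
            for j, p in enumerate(row)
            if p > threshold and t2s[j][i] > threshold}

# Hypothetical cross-language features in which the word order is swapped.
pairs = extract_alignment([[4.0, 0.0], [0.0, 4.0]], [[0.0, 5.0], [5.0, 0.0]])
```

In this toy case source word 0 pairs with target word 1 and source word 1 with target word 0. Requiring agreement in both directions trades some recall for precision, since a pair that is likely in only one direction is dropped.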

[0018] According to another embodiment of this application, a word alignment method includes: randomly masking words in a source sentence sample and a target sentence sample to obtain a masked source sentence and a masked target sentence; using the masked source sentence and the masked target sentence, performing cross-language mask training on a preset cross-alignment model to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; selecting one layer in the cross-fusion module as an alignment layer, performing self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met; forming a trained cross-alignment model based on the alignment layer and the preceding layers in the pre-trained cross-alignment model, wherein the trained cross-alignment model is used in the word alignment method described in any of the foregoing embodiments.
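The random masking that opens this training procedure can be sketched as follows. The mask token string and the 15% masking rate are assumptions borrowed from common masked-language-model practice, not values given in this application.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_words(words, mask_prob=0.15, rng=None):
    """Randomly replace words with MASK and record what the model must recover."""
    rng = rng or random.Random()
    masked, labels = [], []
    for word in words:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(word)   # prediction target for this position
        else:
            masked.append(word)
            labels.append(None)   # position does not contribute to the loss
    return masked, labels

# Mask a source sample and a target sample independently (toy sentences).
rng = random.Random(0)
masked_src, src_labels = mask_words(["我", "喜欢", "猫"], rng=rng)
masked_tgt, tgt_labels = mask_words(["I", "like", "cats"], rng=rng)
```

Because both masked sentences are fed to the model together, a masked word on one side can in principle be recovered from its unmasked counterpart on the other side, which is what makes the mask training cross-language.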

[0019] According to another embodiment of this application, a word alignment device includes: a masking module, used to randomly mask words in a source sentence sample and a target sentence sample to obtain a masked source sentence and a masked target sentence; a cross-language masking training module, used to perform cross-language masking training on a preset cross-alignment model using the masked source sentence and the masked target sentence to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; and a self-supervised alignment training module, used to select one layer of the cross-fusion module as an alignment layer, perform self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met, and form a trained cross-alignment model based on the alignment layer and the preceding layers in the pre-trained cross-alignment model, wherein the trained cross-alignment model is used in the word alignment method described in any of the foregoing embodiments.

[0020] In some embodiments of this application, the self-supervised alignment training module is configured to: preprocess the source sentence samples and target sentence samples using the pre-trained cross-alignment model to extract word alignment labels based on the features output by the alignment layer during preprocessing; use the pre-trained cross-alignment model as the current model to reprocess the source sentence samples and target sentence samples to extract the probability matrix between the source sentence samples and the target sentence samples based on the features output by the alignment layer during reprocessing; and optimize the parameters of the alignment layer in the pre-trained cross-alignment model using a predetermined loss function, based on the word alignment labels and the probability matrix, until a predetermined training condition is met.
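This application leaves the loss function as "predetermined". Purely as an illustration, one possible choice is the mean negative log-probability of the labelled word pairs in both directions, sketched below; the function name, the use of two directional probability matrices, and the `eps` smoothing term are all assumptions.

```python
import math

def alignment_loss(labels, s2t, t2s, eps=1e-9):
    """Mean negative log-probability of the labelled pairs in both directions.

    labels: set of (source index, target index) pairs extracted in preprocessing.
    s2t[i][j]: probability that source word i aligns to target word j.
    t2s[j][i]: probability that target word j aligns to source word i.
    """
    if not labels:
        return 0.0
    total = 0.0
    for i, j in labels:
        total -= math.log(s2t[i][j] + eps) + math.log(t2s[j][i] + eps)
    return total / (2 * len(labels))

labels = {(0, 0), (1, 1)}
confident = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
low_loss = alignment_loss(labels, confident, confident)
high_loss = alignment_loss(labels, uniform, uniform)
```

The loss falls as the current model assigns more probability to the pairs in the extracted labels, so minimizing it over the alignment-layer parameters pushes the probability matrix toward those labels.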

[0021] According to another embodiment of this application, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a computer's processor, causes the computer to perform the methods described in the embodiments of this application.

[0022] According to another embodiment of this application, an electronic device includes: a memory storing a computer program; and a processor reading the computer program stored in the memory to execute the methods described in the embodiments of this application.

[0023] According to another embodiment of this application, a computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various optional implementations described in the embodiments of this application.

[0024] In the word alignment scheme of this application embodiment, a source sentence and a target sentence corresponding to the source sentence are obtained; the source sentence is encoded to obtain source sentence encoding features; the target sentence is encoded to obtain target sentence encoding features; the source sentence encoding features and the target sentence encoding features are cross-fused to obtain cross-language encoding features corresponding to the source sentence and the target sentence; and alignment processing is performed based on the cross-language encoding features corresponding to the source sentence and the target sentence to obtain the word alignment result of the source sentence and the target sentence.

[0025] In this way, the source and target sentences are encoded separately to obtain source sentence encoding features and target sentence encoding features. Furthermore, the source sentence encoding features and target sentence encoding features are cross-fused to fully capture the deep interaction information between parallel sentence pairs and obtain cross-language encoding features corresponding to the source and target sentences. Then, based on the cross-language encoding features, alignment processing can be performed to obtain word alignment results by fully considering the cross-language context between parallel sentence pairs, effectively improving word alignment accuracy.

Description of the Drawings

[0026] To illustrate the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those skilled in the art can obtain other drawings from them without creative effort.

[0027] Figure 1 shows a schematic diagram of a system to which embodiments of this application can be applied.

[0028] Figure 2 shows a flowchart of a word alignment method according to an embodiment of this application.

[0029] Figure 3 shows a flowchart of cross-alignment model training according to an embodiment of this application.

[0030] Figure 4 shows a framework diagram of a cross-alignment model according to an embodiment of this application.

[0031] Figure 5 shows a block diagram of a word alignment device according to an embodiment of this application.

[0032] Figure 6 shows a block diagram of a word alignment device according to another embodiment of this application.

[0033] Figure 7 shows a block diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0034] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0035] It is understood that in the specific implementation of this application, source statements and other related data are involved. When the embodiments in this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0036] Figure 1 shows a schematic diagram of a system 100 to which embodiments of this application can be applied. As shown in Figure 1, system 100 may include a server 101 and a terminal 102.

[0037] Server 101 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

[0038] Terminal 102 can be any device, including but not limited to a mobile phone, computer, smart voice interaction device, smart home appliance, in-vehicle terminal, VR/AR device, or smartwatch. In one embodiment, server 101 or terminal 102 can be a node device in a blockchain network or a map-based vehicle networking platform.

[0039] In one embodiment of this example, server 101 or terminal 102 may: obtain a source statement and a target statement corresponding to the source statement; encode the source statement to obtain source statement encoding features; encode the target statement to obtain target statement encoding features; perform cross-fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; and perform alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain word alignment results between the source statement and the target statement.

[0040] Figure 2 schematically shows a flowchart of a word alignment method according to an embodiment of this application. The entity performing this word alignment method can be any device, such as the server 101 or terminal 102 shown in Figure 1.

[0041] As shown in Figure 2, the word alignment method may include steps S210 to S250.

[0042] Step S210: Obtain the source sentence and the corresponding target sentence;
Step S220: Encode the source sentence to obtain source sentence encoding features;
Step S230: Encode the target sentence to obtain target sentence encoding features;
Step S240: Perform cross-fusion processing on the source sentence encoding features and the target sentence encoding features to obtain cross-language encoding features corresponding to the source sentence and the target sentence;
Step S250: Perform alignment processing based on the cross-language encoding features corresponding to the source sentence and the target sentence to obtain word alignment results between the source sentence and the target sentence.

[0043] The source statement and the target statement can be two statements with the same semantics but different languages. For example, the source statement can be a Chinese statement, and the target statement can be an English statement translated from the Chinese statement. The source statement and its corresponding target statement can be obtained according to the actual specified method.

[0044] The source sentence can be encoded using mechanisms such as self-attention or convolutional neural networks to obtain source sentence encoding features. Source sentence encoding features are the encoded sentence representation information that characterizes the source sentence; they can be composed of the encoded word vectors corresponding to each word in the source sentence.

[0045] The target sentence can be encoded using mechanisms such as self-attention or convolutional neural networks to obtain its encoded features. These encoded features are the sentence representation information that characterizes the target sentence, and can be composed of the encoded word vectors corresponding to each word in the target sentence.

[0046] Cross-fusion of the source and target sentence encoding features integrates feature information from the source sentence encoding features into the target sentence encoding features, yielding the cross-language encoding features corresponding to the target sentence, which can be composed of the cross-fused word vectors of each word in the target sentence. Likewise, feature information from the target sentence encoding features is integrated into the source sentence encoding features, yielding the cross-language encoding features corresponding to the source sentence. Both sets of cross-language encoding features therefore contain cross-language information.

[0047] Alignment processing based on the cross-language encoding features corresponding to the source and target sentences can align the cross-fused word vectors corresponding to words with the same meaning in the cross-language encoding features of the source and target sentences. Alignment information representing the alignment of words in the source and target sentences can be obtained from the feature level. Then, based on the alignment information, the word alignment result of the source and target sentences can be obtained.

[0048] In this way, based on steps S210 to S250, the source sentence and the target sentence are encoded separately to obtain source sentence encoding features and target sentence encoding features. Furthermore, the source sentence encoding features and target sentence encoding features are cross-fused to fully capture the deep interaction information between parallel sentence pairs and obtain cross-language encoding features corresponding to the source sentence and the target sentence. Then, based on the cross-language encoding features, alignment processing is performed to fully consider the cross-language context between parallel sentence pairs to obtain word alignment results, effectively improving word alignment accuracy.

[0049] The following describes other specific optional embodiments of the steps performed during word alignment in the embodiment of Figure 2.

[0050] In one embodiment, step S220 involves encoding the source statement to obtain source statement encoding features, including:

[0051] Perform at least one layer of self-attention encoding on the source statement to obtain the attention encoding features of the source statement generated at each layer; based on the attention encoding features corresponding to the source statement, obtain the source statement encoding features.

[0052] In this embodiment, encoding features of the source sentence are obtained based on a self-attention mechanism. Specifically, the initial sentence vector of the source sentence (including the initial word vector of each word in the source sentence) can be input into at least one layer of cascaded self-attention units. Each attention unit sequentially performs self-attention encoding based on the self-attention mechanism. Attention encoding features corresponding to the source sentence can be generated at each layer. The attention encoding features can be composed of the encoded word vectors of each word in the source sentence based on self-attention encoding.

[0053] Furthermore, from the attention encoding features of the source statement generated by the at least one layer, the attention encoding features generated at a certain layer can be selected as the source statement encoding features. In one implementation of this example, the attention encoding features generated at the last layer are selected as the source statement encoding features.

[0054] In one embodiment, the source statement is subjected to at least one layer of self-attention encoding to obtain the attention encoding features of the source statement generated at each layer, including:

[0055] At each layer, the third key feature, the third value feature, and the third query feature are calculated based on the first feature, where at the first layer the first feature is the initial sentence vector of the source sentence, and at the layers after the first layer the first feature is the attention encoding feature of the source sentence generated by the previous layer; an attention operation is performed on the third key feature, the third value feature, and the third query feature to obtain a third operation result; the third operation result is fused with the first feature to obtain a third fusion result; the third fusion result is normalized to obtain the attention encoding feature of the source sentence generated by each layer.
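The cascaded per-layer computation above can be sketched as a loop in which each layer consumes the previous layer's output. The sketch assumes single-head scaled dot-product attention, fusion by addition, and layer normalization without learned parameters; the function names and the weight layout are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def layer_norm(rows, eps=1e-6):
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def self_attention_layer(h, w):
    """Q, K and V all come from the same sentence features h."""
    q, k, v = matmul(h, w["q"]), matmul(h, w["k"]), matmul(h, w["v"])
    scale = math.sqrt(len(q[0]))  # scaling is an assumption
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    result = matmul(softmax_rows(scores), v)                             # operation result
    fused = [[a + b for a, b in zip(r, s)] for r, s in zip(result, h)]   # fusion result
    return layer_norm(fused)                                             # layer output

def encode(initial_vectors, layer_weights):
    """Run the cascaded layers and return the features generated at each layer."""
    features, per_layer = initial_vectors, []
    for w in layer_weights:
        features = self_attention_layer(features, w)
        per_layer.append(features)
    return per_layer

# Hypothetical toy input: three words, two dimensions, two identity-weight layers.
identity = [[1.0, 0.0], [0.0, 1.0]]
layer = {"q": identity, "k": identity, "v": identity}
per_layer = encode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [layer, layer])
encoding_features = per_layer[-1]  # select the last layer's output
```

Returning the output of every layer, rather than only the last, keeps open the option of selecting a different layer's features as the sentence encoding features.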

[0056] In this embodiment, specifically, the initial sentence vector of the source sentence can be input into at least one layer (m layers) of cascaded self-attention units, and each self-attention unit performs self-attention encoding in sequence based on the self-attention mechanism. Each self-attention unit can include a self-attention sub-layer and a fully connected feed-forward network (FFN).

[0057] In each l-th layer (1 ≤ l ≤ m), the third query feature Q3, the third key feature K3, and the third value feature V3 are calculated from the first feature $H3^{l-1}$ and the parameter matrices of the pre-trained self-attention unit, according to the formulas $Q3 = H3^{l-1} W_l^{Q}$, $K3 = H3^{l-1} W_l^{K}$, and $V3 = H3^{l-1} W_l^{V}$. At the first layer (i.e., if l = 1), the first feature $H3^{l-1}$ is the initial sentence vector of the source sentence; at the layers after the first layer (i.e., if 1 < l ≤ m), the first feature $H3^{l-1}$ is the attention encoding feature of the source sentence generated by the previous layer. $W_l^{Q}$, $W_l^{K}$, and $W_l^{V}$ are the parameter matrices of the pre-trained self-attention unit.

[0058] In each l-th layer, an attention operation is performed on the third key feature K3, the third value feature V3, and the third query feature Q3 to obtain the third operation result $A3^{l} = \mathrm{Attention}(Q3, K3, V3)$. According to the formula $F3^{l} = A3^{l} + H3^{l-1}$, the third operation result is fused with the first feature to obtain the third fusion result $F3^{l}$, where + represents fusion, and the fusion method can be addition, concatenation, etc.

[0059] In each l-th layer, the third fusion result can be further normalized to obtain the attention encoding feature of the source sentence generated by each l-th layer.

[0060] Furthermore, in an embodiment, normalizing the third fusion result to obtain the attention encoding feature of the source sentence generated by each layer may specifically include:

[0061] According to the formula $N3^{l} = \mathrm{LN}(F3^{l})$, the third fusion result $F3^{l}$ is normalized to obtain the third normalization result $N3^{l}$. A fully connected operation is then performed on the third normalization result $N3^{l}$ to obtain the third fully connected operation result $\mathrm{FFN}(N3^{l})$.

[0062] According to the formula $C3^{l} = \mathrm{FFN}(N3^{l}) + N3^{l}$, the third fully connected operation result is combined with the third normalization result to obtain the third combination result $C3^{l}$, where + indicates combination, which can be done by addition or concatenation.

[0063] Furthermore, according to the formula $H3^{l} = \mathrm{LN}(C3^{l})$, the third combination result $C3^{l}$ is normalized to obtain the attention encoding feature $H3^{l}$ of the source sentence generated by the l-th layer, where LN(·) is the normalization operation.

[0064] In one embodiment, step S230 involves encoding the target statement to obtain its encoding features, including:

[0065] Perform at least one layer of self-attention encoding on the target statement to obtain the attention encoding features of the target statement generated at each layer; based on the attention encoding features corresponding to the target statement, obtain the target statement encoding features.

[0066] In this embodiment, encoding features of the target sentence are obtained based on a self-attention mechanism. Specifically, the initial sentence vector of the target sentence (including the initial word vector of each word in the target sentence) can be input into at least one layer of cascaded self-attention units. Each attention unit sequentially performs self-attention encoding based on the self-attention mechanism. Attention encoding features corresponding to the target sentence can be generated at each layer. The attention encoding features can be composed of the encoded word vectors of each word in the target sentence based on self-attention encoding.

[0067] Furthermore, from the attention encoding features of the target statement generated by the at least one layer, the attention encoding features generated at a certain layer can be selected as the target statement encoding features. In one implementation of this example, the attention encoding features generated at the last layer are selected as the target statement encoding features.

[0068] Furthermore, in some implementations, at least one cascaded self-attention unit for self-attention encoding of the source statement and at least one cascaded self-attention unit for self-attention encoding of the target statement are set in parallel and share parameters, which can further improve the overall word alignment effect.

[0069] In one embodiment, at least one layer of self-attention encoding is performed on the target statement to obtain the attention encoding features of the target statement generated at each layer, including:

[0070] At each layer, based on the second feature, calculate the fourth key feature, the fourth value feature, and the fourth query feature; perform an attention operation on the fourth key feature, the fourth value feature, and the fourth query feature to obtain a fourth operation result; fuse the fourth operation result with the second feature to obtain a fourth fusion result; normalize the fourth fusion result to obtain the attention encoding features of the target statement generated at each layer.

[0071] In this embodiment, specifically, the initial sentence vector of the target statement can be input into at least one layer (m layers) of cascaded self-attention units, and each self-attention unit performs self-attention encoding based on the self-attention mechanism in turn. Among them, each self-attention unit can include a self-attention sublayer and a fully connected feed-forward network (FFN).

[0072] In the self-attention unit of each l-th layer (1 ≤ l ≤ m), the fourth key feature K4, the fourth value feature V4, and the fourth query feature Q4 are calculated from the second feature $H_4^{l-1}$ and the parameter matrices of the pre-trained self-attention unit. In the first layer (l = 1), the second feature $H_4^{l-1}$ is the initial sentence vector of the target sentence. In the layers after the first layer (1 < l ≤ m), the second feature $H_4^{l-1}$ is the attention encoding feature of the target sentence generated by the previous layer.

[0073] In each l-th layer, an attention operation is performed on the fourth key feature K4, the fourth value feature V4, and the fourth query feature Q4 to obtain a fourth operation result. The fourth operation result is then fused with the second feature to obtain a fourth fusion result. In the formula, + denotes fusion, which can be performed by addition, concatenation, or the like.

[0074] In each l-th layer, the fourth fusion result can be further normalized to obtain the attention encoding features of the target sentence generated by that layer.

[0075] Furthermore, in one embodiment, normalizing the fourth fusion result to obtain the attention encoding features of the target sentence generated by each layer may specifically include:

[0076] The fourth fusion result is normalized to obtain the fourth normalization result, where LN(·) is the normalization operation. A fully connected operation is then performed on the fourth normalization result to obtain the fourth fully connected operation result.

[0077] The fourth fully connected operation result is combined with the fourth normalization result to obtain the fourth combination result. The plus sign in the formula denotes combination, which can be performed by addition or concatenation.

[0078] Furthermore, the fourth combination result is normalized to obtain the attention encoding features $H_4^{l}$ of the target sentence at each layer l, where LN(·) is the normalization operation.
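The per-layer self-attention encoding described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the scaled dot-product form of the attention operation, the ReLU activation in the FFN, the use of addition for fusion and combination, the weight names (`W_q`, `W_k`, `W_v`, `W_1`, `W_2`), and the random weights standing in for pre-trained parameters are all assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LN(.): normalize every word vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_unit(h_prev, p):
    """One unit: self-attention sublayer followed by a feed-forward network."""
    # Query, key and value features are all projected from the same sentence.
    q, k, v = h_prev @ p["W_q"], h_prev @ p["W_k"], h_prev @ p["W_v"]
    # Attention operation (scaled dot-product form is an assumption).
    op_result = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    # Fusion by addition, then normalization.
    normed = layer_norm(op_result + h_prev)
    # Fully connected operation (ReLU activation is an assumption).
    fc_result = np.maximum(0.0, normed @ p["W_1"]) @ p["W_2"]
    # Combination by addition, then the final normalization.
    return layer_norm(fc_result + normed)

def encode(h0, layers):
    """Run the cascaded units and return the features produced at every layer."""
    per_layer, h = [], h0
    for p in layers:
        h = self_attention_unit(h, p)
        per_layer.append(h)
    return per_layer

def random_params(rng, d, d_ff):
    """Hypothetical random weights standing in for pre-trained parameters."""
    shapes = {"W_q": (d, d), "W_k": (d, d), "W_v": (d, d),
              "W_1": (d, d_ff), "W_2": (d_ff, d)}
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

rng = np.random.default_rng(0)
d, d_ff, m, num_words = 8, 16, 2, 5
layers = [random_params(rng, d, d_ff) for _ in range(m)]
h0 = rng.normal(size=(num_words, d))   # initial sentence vector, one row per word
per_layer = encode(h0, layers)
sentence_features = per_layer[-1]      # last layer chosen as the encoding features
```

Passing the same `layers` list to `encode` for both the source and the target sentence would correspond to the embodiment in which the two encoders share parameters.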

[0079] In one embodiment, step S240, which involves cross-fusion of the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement, may include: concatenating some features from the source statement encoding features to the target statement encoding features to obtain cross-language encoding features corresponding to the target statement; and concatenating some features from the target statement encoding features to the source statement encoding features to obtain cross-language encoding features corresponding to the source statement.

[0080] In one embodiment, step S240 involves cross-fusion of the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source and target statements, including:

[0081] The source sentence encoding features and the target sentence encoding features are subjected to at least one layer of cross-attention fusion processing to obtain the cross-attention fusion features corresponding to the source sentence and the target sentence generated at each layer; based on the cross-attention fusion features corresponding to the source sentence and the target sentence, the cross-language encoding features corresponding to the source sentence and the target sentence are obtained.

[0082] In this embodiment, cross-language encoding features are obtained based on a cross-attention mechanism. Specifically, the source sentence encoding features and the target sentence encoding features can be input together into at least one cascaded cross-attention unit. Each cross-attention unit sequentially performs cross-attention fusion processing based on the cross-attention mechanism. At each layer, cross-attention fusion features corresponding to the source sentence and the target sentence can be generated respectively. The cross-attention fusion feature corresponding to the source sentence can be composed of the cross-fused word vectors corresponding to each word in the source sentence, and the cross-attention fusion feature corresponding to the target sentence can be composed of the cross-fused word vectors corresponding to each word in the target sentence.

[0083] Furthermore, from the cross-attention fusion features generated by the at least one layer for the source and target sentences, the cross-attention fusion features generated at a particular layer can be selected as the cross-language encoding features corresponding to the source and target sentences. In one example, the cross-attention fusion features generated at the last layer are selected as the cross-language encoding features corresponding to the source and target sentences.

[0084] In one embodiment, the source statement encoding features and the target statement encoding features are subjected to at least one layer of cross-attention fusion processing to obtain cross-attention fusion features corresponding to the source statement and the target statement generated at each layer, including:

[0085] At each layer, a first key feature, a first value feature, and a first query feature are calculated based on the source features. In the first layer, the source feature is the source statement encoding feature, and in layers after the first layer, the source features are the cross-attention fusion features corresponding to the source statements generated in the previous layer. A second key feature, a second value feature, and a second query feature are calculated based on the target features. In the first layer, the target feature is the target statement encoding feature, and in layers after the first layer, the target features are the cross-attention fusion features corresponding to the target statements generated in the previous layer. Attention operations are performed based on the first query feature, the second key feature, and the second value feature to obtain a first operation result. Attention operations are also performed based on the second query feature, the first value feature, and the first key feature to obtain a second operation result. The first operation result is fused with the source features to generate the cross-attention fusion features corresponding to the source statements at each layer. The second operation result is fused with the target features to generate the cross-attention fusion features corresponding to the target statements at each layer.

[0086] In this embodiment, specifically, the source statement encoding features and the target statement encoding features can be input into at least one (n) cascaded cross-attention units. Each cross-attention unit sequentially performs cross-attention fusion processing based on the cross-attention mechanism. Each cross-attention unit can contain a cross-attention sublayer and a fully connected feedforward network (FFN).

[0087] In the cross-attention unit of each l-th layer (1 ≤ l ≤ n), the first key feature $K_y$, the first value feature $V_y$, and the first query feature $Q_x$ are calculated from the source feature and the parameter matrices of the pre-trained cross-attention unit. In the first layer (l = 1), the source feature is the source sentence encoding feature; in the layers after the first layer (1 < l ≤ n), the source feature is the cross-attention fusion feature corresponding to the source sentence generated by the previous layer. Likewise, the second key feature $K_x$, the second value feature $V_x$, and the second query feature $Q_y$ are calculated from the target feature. In the first layer (l = 1), the target feature is the target sentence encoding feature; in the layers after the first layer (1 < l ≤ n), the target feature is the cross-attention fusion feature corresponding to the target sentence generated by the previous layer.

[0088] Furthermore, an attention operation is performed on the first query feature $Q_x$, the second key feature $K_x$, and the second value feature $V_x$ to obtain the first operation result. In this way, the query $Q_x$ of the cross-attention comes from one language (the source sentence), while the key $K_x$ and the value $V_x$ come from the other language (the target sentence). Similarly, an attention operation is performed on the second query feature $Q_y$, the first key feature $K_y$, and the first value feature $V_y$ to obtain the second operation result. Here the query $Q_y$ of the cross-attention comes from one language (the target sentence), while the key $K_y$ and the value $V_y$ come from the other language (the source sentence).

[0089] Furthermore, the first operation result is fused with the source feature to generate the cross-attention fusion feature corresponding to the source sentence at each layer l, and the second operation result is fused with the target feature to generate the cross-attention fusion feature corresponding to the target sentence at each layer l.

[0090] In one embodiment, fusing the first operation result with the source features to generate the cross-attention fusion features corresponding to the source sentence at each layer may specifically include:

[0091] The first operation result is fused with the source features to obtain the first fusion result; the first fusion result is normalized to obtain the first normalized result; the first normalized result is subjected to a fully connected operation to obtain the first fully connected operation result; the first fully connected operation result is combined with the first normalized result to obtain the first combined result; the first combined result is normalized to obtain the cross-attention fusion feature corresponding to the source statement generated at each layer.

[0092] The first operation result is fused with the source feature to obtain the first fusion result, where + denotes fusion. The first fusion result is normalized to obtain the first normalization result, where LN(·) is the normalization operation. A fully connected operation is then performed on the first normalization result to obtain the first fully connected operation result.

[0093] Furthermore, the first fully connected operation result is combined with the first normalization result to obtain the first combination result, and the first combination result is normalized to obtain the cross-attention fusion feature corresponding to the source sentence generated at each layer l. Here, + denotes combination and LN(·) is the normalization operation.

[0094] In one embodiment, fusing the second operation result with the target features to generate the cross-attention fusion features corresponding to the target sentence at each layer may specifically include:

[0095] The second operation result is fused with the target feature to obtain the second fusion result; the second fusion result is normalized to obtain the second normalized result; the second normalized result is subjected to a fully connected operation to obtain the second fully connected operation result; the second fully connected operation result is combined with the second normalized result to obtain the second combined result; the second combined result is normalized to obtain the cross-attention fusion feature corresponding to the target statement generated at each layer.

[0096] The second operation result is fused with the target feature to obtain the second fusion result, where + denotes fusion. The second fusion result is normalized to obtain the second normalization result, where LN(·) is the normalization operation. A fully connected operation is then performed on the second normalization result to obtain the second fully connected operation result.

[0097] Furthermore, the second fully connected operation result is combined with the second normalization result to obtain the second combination result, and the second combination result is normalized to obtain the cross-attention fusion feature corresponding to the target sentence generated at each layer l. Here, + denotes combination and LN(·) is the normalization operation.
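One cross-attention unit, as described above, might be sketched as follows. The sketch is illustrative only: the scaled dot-product attention, the ReLU in the FFN, fusion by addition, the sharing of one set of projection matrices between the two directions, and all weight names are assumptions, and random weights stand in for pre-trained parameters. The variable names follow the subscript convention of the text: `q_x`, `k_y`, `v_y` come from the source features, and `q_y`, `k_x`, `v_x` come from the target features.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LN(.): normalize every word vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / (x.std(axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Attention operation (scaled dot-product form is an assumption)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def fuse_then_ffn(op_result, feature, p):
    """Fusion, normalization, fully connected operation, combination, normalization."""
    normed = layer_norm(op_result + feature)
    fc_result = np.maximum(0.0, normed @ p["W_1"]) @ p["W_2"]
    return layer_norm(fc_result + normed)

def cross_attention_unit(src, tgt, p):
    """Return the cross-attention fusion features for source and target."""
    # First query/key/value features are computed from the source features.
    q_x, k_y, v_y = src @ p["W_q"], src @ p["W_k"], src @ p["W_v"]
    # Second query/key/value features are computed from the target features.
    q_y, k_x, v_x = tgt @ p["W_q"], tgt @ p["W_k"], tgt @ p["W_v"]
    first = attend(q_x, k_x, v_x)    # source queries attend to target words
    second = attend(q_y, k_y, v_y)   # target queries attend to source words
    return fuse_then_ffn(first, src, p), fuse_then_ffn(second, tgt, p)

def cross_fuse(src, tgt, layers):
    """Run n cascaded cross-attention units and return the last layer's features."""
    for p in layers:
        src, tgt = cross_attention_unit(src, tgt, p)
    return src, tgt

rng = np.random.default_rng(0)
d, d_ff, n = 8, 16, 2
shapes = {"W_q": (d, d), "W_k": (d, d), "W_v": (d, d), "W_1": (d, d_ff), "W_2": (d_ff, d)}
layers = [{k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()} for _ in range(n)]
src_enc = rng.normal(size=(4, d))   # source sentence encoding features, 4 words
tgt_enc = rng.normal(size=(6, d))   # target sentence encoding features, 6 words
src_out, tgt_out = cross_fuse(src_enc, tgt_enc, layers)
```

Because each side's queries attend to the other side's keys and values, the source output depends on the target sentence and vice versa, which is the cross-language interaction the embodiment relies on.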

[0098] In one embodiment, step S250 involves performing alignment processing based on the cross-language encoding features corresponding to the source and target sentences to obtain word alignment results between the source and target sentences, including:

[0099] Calculate the dot product of the cross-language encoded features corresponding to the source and target sentences to obtain the similarity matrix; generate a first probability matrix from the source sentence to the target sentence and a second probability matrix from the target sentence to the source sentence based on the similarity matrix; take the intersection of the first probability matrix and the second probability matrix to obtain the alignment matrix; obtain the word alignment result between the source sentence and the target sentence based on the alignment matrix.

[0100] The cross-language encoding features corresponding to the source sentence can be represented as $s = \{s_1, s_2, \ldots, s_i, \ldots, s_I\}$, and the cross-language encoding features corresponding to the target sentence can be represented as $t = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$. Calculating the dot product of s and t yields the similarity matrix $S_{I \times J}$.

[0101] Furthermore, softmax normalization can be applied to the similarity matrix $S_{I \times J}$ by row and by column respectively, generating the first probability matrix $P^{f}$ from the source sentence to the target sentence and the second probability matrix $P^{b}$ from the target sentence to the source sentence.

[0102] Furthermore, the intersection of the first probability matrix $P^{f}$ and the second probability matrix $P^{b}$ can be taken according to the formula $G_{ij} = (P^{f}_{ij} > \tau) * (P^{b}_{ij} > \tau)$ to obtain the alignment matrix $G_{I \times J}$. Here τ is a preset threshold, $P^{f}_{ij}$ is an element of the first probability matrix, $P^{b}_{ij}$ is an element of the second probability matrix, $G_{ij}$ is the alignment information in the alignment matrix $G_{I \times J}$, and * denotes AND: if $P^{f}_{ij} > \tau$ and $P^{b}_{ij} > \tau$, then $G_{ij}$ equals 1; otherwise $G_{ij}$ equals 0. $G_{ij}$ equal to 1 indicates that the elements in the i-th row and j-th column of the first probability matrix and the second probability matrix are aligned.

[0103] Furthermore, the word alignment result between the source sentence and the target sentence is obtained from the alignment matrix $G_{I \times J}$: if the element of $G_{I \times J}$ corresponding to a word in the source sentence and a word in the target sentence is 1, then the two words are aligned.
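The alignment extraction described above can be sketched as follows. The function name, the toy feature vectors, and the threshold value 0.15 are hypothetical, and assigning the row-wise softmax to the source-to-target direction (and the column-wise softmax to the target-to-source direction) is an assumption about how "by row and by column respectively" is meant.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_alignment(s, t, tau=0.15):
    """Return the 0/1 alignment matrix G and the list of aligned (i, j) pairs."""
    sim = s @ t.T                    # similarity matrix S, shape (I, J)
    p_f = softmax(sim, axis=1)       # first probability matrix: each row sums to 1
    p_b = softmax(sim, axis=0)       # second probability matrix: each column sums to 1
    g = (p_f > tau) & (p_b > tau)    # intersection: both directions must exceed tau
    pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(g))]
    return g.astype(int), pairs

# Toy cross-language features: target word j is a copy of a permuted source word.
s = 3.0 * np.eye(3)                  # 3 source words
t = 3.0 * np.eye(3)[[1, 0, 2]]       # 3 target words, first two swapped
G, pairs = extract_alignment(s, t)
print(pairs)                         # [(0, 1), (1, 0), (2, 2)]
```

Requiring both directional probabilities to exceed τ keeps only mutually supported links, which tends to trade recall for precision as τ grows.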

[0104] Furthermore, in one embodiment of this application, a method for training a model in two stages is provided to obtain a trained cross-alignment model for word alignment. When the trained cross-alignment model implements the cross-alignment processing task in the word alignment method of the aforementioned embodiment, the overall word alignment accuracy can be further improved.

[0105] Specifically, referring to Figure 3, the word alignment method in this embodiment includes:

[0106] Step S310: Randomly mask the words in the source sentence sample and the target sentence sample to obtain a masked source sentence and a masked target sentence containing the mask; Step S320: Use the masked source sentence and the masked target sentence to perform cross-language mask training on the preset cross-alignment model to obtain a pre-trained cross-alignment model, which includes a cross-fusion module for cross-fusion processing; Step S330: Select one layer in the cross-fusion module as the alignment layer, and perform self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until the predetermined training conditions are met. Based on the alignment layer and the previous layers in the pre-trained cross-alignment model, a trained cross-alignment model is formed, which is used in the word alignment method of any of the foregoing embodiments.

[0107] The source sentence sample and the target sentence sample can be two sentence samples with the same semantics but different languages. For example, the source sentence sample can be a Chinese sentence, and the target sentence sample can be an English sentence translated from the Chinese sentence. The source sentence sample and the target sentence sample form a sentence sample pair used for model training. It can be understood that there can be multiple sentence sample pairs, such as 1 million sentence sample pairs.

[0108] In the first stage of training, the preset cross-alignment model is trained through the Translation Language Modeling (TLM) task to obtain the pre-trained cross-alignment model, so that the model first learns cross-language representations.

[0109] Specifically, the words in the source and target sentence samples are randomly masked. For example, 15% of the words (tokens) in the source and target sentence samples can be randomly selected. For each selected word (token), it is replaced with a mask ([MASK]token) with an 80% probability, replaced with a random word (random token) with a 10% probability, and the original word (token) is retained with a 10% probability. Thus, a masked source sentence and a masked target sentence containing the mask ([MASK]token) are obtained.
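The masking procedure above can be sketched as follows. The function name, the toy sentence, and the toy vocabulary are hypothetical, and selecting each token independently with probability 0.15 is an approximation of "randomly selecting 15% of the tokens".

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return the masked sequence and a dict {position: original token}."""
    masked = list(tokens)
    targets = {}
    for pos, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue                         # token not selected, left untouched
        targets[pos] = tok                   # the model must predict this original word
        roll = rng.random()
        if roll < 0.8:
            masked[pos] = MASK               # 80%: replace with the mask token
        elif roll < 0.9:
            masked[pos] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original word
    return masked, targets

rng = random.Random(0)
vocab = ["cat", "dog", "sat", "ran", "mat", "the", "on"]
source_tokens = "the cat sat on the mat".split() * 5
masked_source, source_targets = mask_tokens(source_tokens, vocab, rng)
```

The same function would be applied to the target sentence sample, so that the model sees a masked sentence pair and can use either language to recover a masked word.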

[0110] Using the masked source sentence and the masked target sentence, cross-language mask training is performed on the preset cross-alignment model. Specifically, the preset cross-alignment model is trained to predict the original word corresponding to each mask ([MASK] token) based on the bilingual context of the masked source sentence and the masked target sentence. The first-stage training objective is optimized, thereby optimizing the parameters of the preset cross-alignment model, until the predetermined first-stage training conditions are met (for example, the prediction accuracy for the original masked words is higher than a predetermined accuracy). At this point, the first-stage Translation Language Modeling (TLM) task is complete and the pre-trained cross-alignment model is obtained.

[0111] In some embodiments, the pre-trained cross-alignment model may include a cross-fusion module for cross-fusion processing, and may also include a source sentence encoding module and a target sentence encoding module for encoding. The first-stage training objective is expressed in terms of the masked source sentence, the masked target sentence, the source sentence sample, the target sentence sample, and the parameters $\theta_s$ and $\theta_c$ of the preset model, where one of $\theta_s$ and $\theta_c$ represents the parameters of the cross-fusion module and the other represents the parameters of the source sentence encoding module and the target sentence encoding module. In some embodiments, the cross-fusion module in the pre-trained cross-alignment model may specifically include at least one layer (X layers) of cascaded cross-attention units. In some embodiments, the source sentence encoding module may include at least one layer of cascaded self-attention units. In some embodiments, the target sentence encoding module may include at least one layer of cascaded self-attention units.

[0112] In the second stage, self-supervised alignment training is performed, based on the self-supervised alignment (SSA) task, on the alignment layer of the pre-trained cross-alignment model obtained in the first stage. That is, in the second stage only the alignment layer of the model is fine-tuned, while the other layers are frozen. Fine-tuning the alignment layer on the self-supervised alignment task keeps the training objective consistent with the inference objective. In related training schemes, a model is pre-trained in the first stage through a cross-language masked language model task, but at inference the sentence pair (the pair consisting of the source sentence and the target sentence) is encoded separately, so the training and inference objectives do not match. With the two-stage training of this embodiment, the training and inference objectives are aligned, and word alignment accuracy is effectively improved compared with the trained models obtained by related techniques.

[0113] In this process, one layer of the cross-fusion module is selected as the alignment layer. For example, the cross-fusion module may include 5 cascaded cross-attention units, and one of the 5 cross-attention units can be selected as the alignment layer.

[0114] Specifically, the trained cross-alignment model is formed from the alignment layer and the layers preceding it in the pre-trained cross-alignment model. For example, the pre-trained cross-alignment model includes 12 layers: layers 1 to 6 are cascaded self-attention units, and layers 7 to 12 are cascaded cross-attention units. If the alignment layer is the 10th layer (c = 10), which is a cross-attention unit, the trained cross-alignment model can be formed from the cascaded self-attention units of layers 1 to 6 and the cascaded cross-attention units of layers 7 to 10.
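The 12-layer example can be sketched as follows. Layers are represented by placeholder strings, and the function name and the `trainable` flags are hypothetical; the flags reflect the second-stage rule that only the alignment layer is fine-tuned while the other layers are frozen. The sketch does not check that the chosen layer lies inside the cross-fusion module.

```python
def build_trained_model(layers, alignment_layer):
    """Keep layers 1..alignment_layer (1-indexed) and mark which one is trainable."""
    if not 1 <= alignment_layer <= len(layers):
        raise ValueError("alignment_layer must be between 1 and len(layers)")
    kept = layers[:alignment_layer]
    # Only the alignment layer is fine-tuned; the layers before it stay frozen.
    trainable = [index == alignment_layer for index in range(1, alignment_layer + 1)]
    return kept, trainable

# Placeholder layer names: 6 self-attention units followed by 6 cross-attention units.
pretrained = ([f"self_attention_{i}" for i in range(1, 7)]
              + [f"cross_attention_{i}" for i in range(7, 13)])
kept, trainable = build_trained_model(pretrained, alignment_layer=10)
```

With `alignment_layer=10`, layers 11 and 12 are dropped, so inference runs only as deep as the layer whose outputs are used for alignment.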

[0115] Furthermore, each layer in the cross-fusion module can be selected sequentially as an alignment layer to perform the second-stage task. Then, the alignment effect of the trained cross-alignment model when each layer in the cross-fusion module is used as an alignment layer is compared, and the trained cross-alignment model with the best alignment effect is selected as the final trained cross-alignment model.

[0116] When this trained cross-alignment model performs word alignment according to the word alignment method of the foregoing embodiments (e.g., the embodiment corresponding to Figure 2), the trained cross-alignment model can: encode the source sentence to obtain source sentence encoding features; encode the target sentence to obtain target sentence encoding features; and perform cross-fusion processing on the source sentence encoding features and the target sentence encoding features to obtain cross-language encoding features corresponding to the source sentence and the target sentence.

[0117] In one embodiment, performing self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until predetermined training conditions are met may include:

[0118] A pre-trained cross-alignment model is used to preprocess the source and target sentence samples to extract word alignment labels based on the features output by the alignment layer during preprocessing. The pre-trained cross-alignment model is then used as the current model to reprocess the source and target sentence samples to extract the probability matrix between the source and target sentence samples based on the features output by the alignment layer during reprocessing. A predetermined loss function is used to optimize the parameters of the alignment layer in the pre-trained cross-alignment model based on the word alignment labels and the probability matrix until the predetermined training conditions are met.

[0119] Specifically, in the second stage, the source sentence samples and the target sentence samples are input into the pre-trained cross-alignment model trained in the first stage. The pre-trained cross-alignment model can preprocess the source sentence samples and the target sentence samples based on the model parameters trained in the first stage, and extract word alignment labels based on the features output by the alignment layer during the preprocessing process. That is, the alignment information generated by the alignment layer in the model trained in the first stage is used as the word alignment labels.

[0120] For example, the alignment layer can be a cross-attention unit. During preprocessing, the features output by this cross-attention unit can include the cross-attention fusion feature D1 corresponding to the source sentence sample and the cross-attention fusion feature D2 corresponding to the target sentence sample. Taking the dot product of D1 and D2 yields the sample similarity matrix $S1'_{I \times J}$. Softmax normalization can be applied to $S1'_{I \times J}$ by row and by column respectively, generating the first sample probability matrix $P^{f'}$ from the source sentence sample to the target sentence sample and the second sample probability matrix $P^{b'}$ from the target sentence sample to the source sentence sample. The intersection of the first and second sample probability matrices is then taken according to the formula $G'_{ij} = (P^{f'}_{ij} > \tau) * (P^{b'}_{ij} > \tau)$ to obtain the sample alignment matrix $G'_{I \times J}$. Here τ is a preset threshold, $P^{f'}_{ij}$ is an element of the first sample probability matrix, $P^{b'}_{ij}$ is an element of the second sample probability matrix, $G'_{ij}$ is the alignment information in $G'_{I \times J}$, and * denotes AND: if $P^{f'}_{ij} > \tau$ and $P^{b'}_{ij} > \tau$, then $G'_{ij}$ equals 1; otherwise $G'_{ij}$ equals 0. $G'_{ij}$ equal to 1 indicates that the elements in the i-th row and j-th column of the first and second sample probability matrices are aligned.

[0121] The extracted word alignment labels are the sample alignment matrix $G'_{I \times J}$. In this way, the alignment information generated by the alignment layer of the model trained in the first stage (i.e., the sample alignment matrix $G'_{I \times J}$) is used as the word alignment labels.

[0122] Furthermore, with the pre-trained cross-alignment model as the current model, the source and target sentence samples are processed again. Based on the features output by the alignment layer during reprocessing, a probability matrix between the source sentence sample and the target sentence sample can be extracted. This probability matrix can include a third sample probability matrix from the source sentence sample to the target sentence sample and a fourth sample probability matrix from the target sentence sample to the source sentence sample.

[0123] During reprocessing, the features output by the cross-attention unit can likewise include the cross-attention fusion feature D3 corresponding to the source sentence sample and the cross-attention fusion feature D4 corresponding to the target sentence sample. Taking the dot product of D3 and D4 yields the sample similarity matrix $S2'_{I \times J}$. Softmax normalization can be applied to $S2'_{I \times J}$ by row and by column respectively, generating the third sample probability matrix from the source sentence sample to the target sentence sample and the fourth sample probability matrix from the target sentence sample to the source sentence sample.

[0124] Using a predetermined loss function, the second-stage alignment objective is optimized based on the word alignment labels $G'_{I \times J}$ and the probability matrix, thereby optimizing the parameters of the alignment layer in the pre-trained cross-alignment model. All layers of the current model other than the alignment layer are frozen and only the parameters of the alignment layer are fine-tuned, until the predetermined second-stage training conditions are met (these conditions can be set according to the actual situation), which completes the second-stage self-supervised alignment task.

[0125] In one implementation, the predetermined loss function is a cross-entropy loss function: the cross-entropy loss is calculated from the word alignment labels $G'_{I \times J}$ and the probability matrix, and the alignment objective is optimized by minimizing this loss. The loss is computed over the elements in the i-th row and j-th column of the third sample probability matrix, of the fourth sample probability matrix, and of $G'_{I \times J}$ (the last written $G'_{ij}$).
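One plausible form of such a cross-entropy loss is sketched below. The exact formula is not reproduced in the text, so the specific choice made here, averaging the log-probabilities of the two directions over the entries labeled 1, is an assumption, as are the function name, the `eps` smoothing term, and the toy matrices.

```python
import numpy as np

def alignment_cross_entropy(labels, p_forward, p_backward, eps=1e-9):
    """Cross-entropy between 0/1 alignment labels and two directional probability matrices."""
    labels = np.asarray(labels, dtype=float)
    # Average the log-probabilities of the two directions (an assumed combination).
    log_prob = 0.5 * (np.log(p_forward + eps) + np.log(p_backward + eps))
    num_links = max(labels.sum(), 1.0)       # avoid division by zero with no links
    return float(-(labels * log_prob).sum() / num_links)

labels = np.eye(2)                           # toy word alignment labels G'
confident = np.array([[0.9, 0.1], [0.1, 0.9]])
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])
loss_confident = alignment_cross_entropy(labels, confident, confident)
loss_uncertain = alignment_cross_entropy(labels, uncertain, uncertain)
```

In a real training loop, the gradient of this loss would be applied only to the parameters of the alignment layer, with all other layers frozen.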

[0126] The foregoing embodiments are further described below in conjunction with a scenario where word alignment is performed by applying the foregoing embodiments of this application.

[0127] In this scenario, two-stage training is performed on the preset cross-alignment model to obtain a trained cross-alignment model, and word alignment is then performed with this trained model. Referring to Figure 4, which shows the model structure, the trained cross-alignment model includes a source sentence encoding module 410, a target sentence encoding module 420, and a cross-fusion module 430.

[0128] The source sentence encoding module 410 includes at least one layer of cascaded self-attention units 411 (only one layer is shown in Figure 4); each self-attention unit 411 contains a self-attention sublayer 4111 and a fully connected feedforward network (FFN) 4112. The target sentence encoding module 420 includes at least one layer of cascaded self-attention units 421 (only one layer is shown in Figure 4); each self-attention unit 421 contains a self-attention sublayer 4211 and a fully connected feedforward network (FFN) 4212.

[0129] The self-attention unit 411 in the source sentence encoding module 410 and the self-attention unit 421 in the target sentence encoding module 420 are arranged in parallel and share parameters.

[0130] The cross-fusion module 430 includes at least one layer of cascaded cross-attention units 431 (only one layer is shown in Figure 4); each cross-attention unit 431 contains a cross-attention sublayer 4311 and a fully connected feedforward network (FFN) 4312.

[0131] Continuing with Figure 4, in this scenario the process of applying the trained cross-alignment model for word alignment can include steps 1 to 3.

[0132] 1. Obtain the source sentence and its corresponding target sentence. The source sentence can be represented as $x = \{x_1, x_2, \ldots, x_i, \ldots, x_I\}$, and the target sentence can be represented as $y = \{y_1, y_2, \ldots, y_j, \ldots, y_J\}$.

[0133] 2. Input the source sentence and its corresponding target sentence into the trained cross-alignment model. In the trained cross-alignment model:

[0134] 2.1 In the source statement encoding module 410, the source statement is encoded to obtain the source statement encoding features; 2.2 In the target statement encoding module 420, the target statement is encoded to obtain the target statement encoding features; 2.3 In the cross-fusion module 430, the source statement encoding features and the target statement encoding features are cross-fused to obtain the cross-language encoding features corresponding to the source statement and the target statement.

[0135] In the source sentence encoding module 410, the source sentence is encoded to obtain source sentence encoding features, specifically including: through at least one layer of cascaded self-attention units 411, performing at least one layer of self-attention encoding on the source sentence to obtain the attention encoding features of the source sentence generated at each layer; based on the attention encoding features corresponding to the source sentence, obtaining the source sentence encoding features.

[0136] Through at least one layer of cascaded self-attention units 411, performing at least one layer of self-attention encoding on the source sentence to obtain the attention encoding features of the source sentence generated at each layer, including: at each layer, calculating the third key feature, the third value feature, and the third query feature based on the first feature, where in the first layer, the first feature is the initial sentence vector of the source sentence, and in the layers after the first layer, the first feature is the attention encoding feature of the source sentence generated in the previous layer; performing an attention operation on the third key feature, the third value feature, and the third query feature to obtain a third operation result; fusing the third operation result with the first feature to obtain a third fusion result; normalizing the third fusion result to obtain the attention encoding features of the source sentence generated at each layer.

[0137] Specifically, the initial sentence vector of the source sentence can be input into at least one layer (m layers) of cascaded self-attention units 411, and each self-attention unit performs self-attention encoding based on the self-attention mechanism in turn. Among them, each self-attention unit 411 can include a self-attention sublayer 4111 and a fully connected feed-forward network (FFN) 4112.

[0138] In the self-attention unit 411 of each layer l (1 ≤ l ≤ m), the self-attention sublayer 4111 calculates the third key feature $K_3$, the third value feature $V_3$, and the third query feature $Q_3$ from the first feature $H_3^{l-1}$ and the parameter matrices of the pre-trained self-attention unit 411. In the first layer (l = 1), the first feature $H_3^{l-1}$ is the initial sentence vector of the source sentence; in the layers after the first layer (1 < l ≤ m), the first feature $H_3^{l-1}$ is the attention encoding feature of the source sentence generated in the previous layer. An attention operation is then performed on the third key feature $K_3$, the third value feature $V_3$, and the third query feature $Q_3$ to obtain a third operation result.

[0139] In the self-attention unit 411 of each layer l (1 ≤ l ≤ m), the third operation result is fused with the first feature $H_3^{l-1}$ to obtain a third fusion result, where "+" in the fusion formula indicates fusion, and the fusion method can be addition, concatenation, or the like. The third fusion result is then normalized to obtain the attention encoding features of the source sentence generated at layer l.

[0140] Furthermore, in the self-attention unit 411, normalizing the third fusion result to obtain the attention encoding features of the source sentence generated at each layer l specifically includes: normalizing the third fusion result to obtain a third normalization result; performing a fully connected operation on the third normalization result through the fully connected feedforward network (FFN) 4112 to obtain a third fully connected operation result; combining the third fully connected operation result with the third normalization result to obtain a third combination result, where "+" indicates combination, which can be addition or concatenation; and normalizing the third combination result to obtain the attention encoding features $H_3^{l}$ of the source sentence at layer l, where LN(·) is the normalization operation.
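
As a non-limiting illustration, the processing of one self-attention unit described above can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: single-head scaled dot-product attention with a softmax, fusion and combination by addition, a ReLU feedforward network, and a normalization without learned scale and shift. The names `self_attention_unit`, `encode`, `init_params`, `W_q`, `W_k`, `W_v`, `W_1`, and `W_2` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(h, eps=1e-5):
    # LN(.) over the feature dimension (assumption: no learned gain or bias)
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def self_attention_unit(h_prev, p):
    """One unit: self-attention sublayer followed by a feedforward network."""
    q = h_prev @ p["W_q"]                      # query feature
    k = h_prev @ p["W_k"]                      # key feature
    v = h_prev @ p["W_v"]                      # value feature
    scale = np.sqrt(q.shape[-1])
    op_result = softmax(q @ k.T / scale) @ v   # operation result (assumed scaled dot-product)
    fused = op_result + h_prev                 # fusion result ("+" taken as addition)
    normed = layer_norm(fused)                 # normalization result
    ffn = np.maximum(0.0, normed @ p["W_1"]) @ p["W_2"]  # fully connected operation result
    combined = ffn + normed                    # combination result
    return layer_norm(combined)                # attention encoding feature of this layer

def encode(h0, layers):
    """Run m cascaded units and keep the last layer's output as the encoding feature."""
    h = h0
    for p in layers:
        h = self_attention_unit(h, p)
    return h

def init_params(rng, d_model, d_ff):
    # Random stand-ins for the pre-trained parameter matrices
    return {
        "W_q": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_k": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_v": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_1": rng.normal(scale=0.1, size=(d_model, d_ff)),
        "W_2": rng.normal(scale=0.1, size=(d_ff, d_model)),
    }

rng = np.random.default_rng(0)
d_model, d_ff, m = 8, 16, 2
layers = [init_params(rng, d_model, d_ff) for _ in range(m)]
h0 = rng.normal(size=(5, d_model))   # initial sentence vector for a 5-word sentence
M_x = encode(h0, layers)             # encoding feature taken from the last layer
```

In this sketch the combination step adds the feedforward output to the normalization result rather than to the fusion result, which follows the order of operations given in the paragraph above.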

[0141] Furthermore, the source sentence encoding features can be obtained from the attention encoding features corresponding to the source sentence by selecting the attention encoding features generated in the last layer as the source sentence encoding features $M_x$.

[0142] In the target statement encoding module 420, the target statement is encoded to obtain target statement encoding features. Specifically, this includes: performing at least one layer of self-attention encoding on the target statement through at least one layer of cascaded self-attention units 421 to obtain the attention encoding features of the target statement generated at each layer; and obtaining the target statement encoding features based on the attention encoding features corresponding to the target statement.

[0143] Through at least one layer of cascaded self-attention units 421, perform at least one layer of self-attention encoding on the target statement to obtain the attention encoding features of the target statement generated at each layer, specifically including: at each layer, calculate the fourth key feature, the fourth value feature, and the fourth query feature based on the second feature; perform an attention operation on the fourth key feature, the fourth value feature, and the fourth query feature to obtain a fourth operation result; fuse the fourth operation result with the second feature to obtain a fourth fusion result; normalize the fourth fusion result to obtain the attention encoding features of the target statement generated at each layer.

[0144] Specifically, the initial sentence vector of the target statement can be input into at least one layer (m layers) of cascaded self-attention units 421, and each self-attention unit performs self-attention encoding based on the self-attention mechanism in turn. Among them, each self-attention unit can include a self-attention sub-layer 4211 and a fully connected feed-forward network (FFN) 4212.

[0145] In the self-attention unit 421 of each layer l (1 ≤ l ≤ m), the self-attention sublayer 4211 calculates the fourth key feature $K_4$, the fourth value feature $V_4$, and the fourth query feature $Q_4$ from the second feature $H_4^{l-1}$ and the parameter matrices of the pre-trained self-attention unit. In the first layer (l = 1), the second feature $H_4^{l-1}$ is the initial sentence vector of the target sentence; in the layers after the first layer (1 < l ≤ m), the second feature $H_4^{l-1}$ is the attention encoding feature of the target sentence generated in the previous layer. An attention operation is performed on the fourth key feature $K_4$, the fourth value feature $V_4$, and the fourth query feature $Q_4$ to obtain a fourth operation result. Further, in the self-attention unit 421 of each layer l (1 ≤ l ≤ m), the fourth operation result is fused with the second feature to obtain a fourth fusion result, where "+" indicates fusion, and the fusion method can be addition, concatenation, or the like. The fourth fusion result is then normalized to obtain the attention encoding features of the target sentence generated at layer l.

[0146] In the self-attention unit 421, normalizing the fourth fusion result to obtain the attention encoding features of the target sentence generated at each layer specifically includes: normalizing the fourth fusion result to obtain a fourth normalization result, where LN(·) is the normalization operation; performing a fully connected operation on the fourth normalization result through the fully connected feedforward network (FFN) 4212 to obtain a fourth fully connected operation result; combining the fourth fully connected operation result with the fourth normalization result to obtain a fourth combination result, where "+" indicates combination, which can be addition or concatenation; and normalizing the fourth combination result to obtain the attention encoding features $H_4^{l}$ of the target sentence at layer l.

[0147] Furthermore, the target sentence encoding features can be obtained from the attention encoding features corresponding to the target sentence by selecting the attention encoding features generated in the last layer as the target sentence encoding features $M_y$.

[0148] 2.3 In the cross-fusion module 430, cross-fusion processing is performed on the source sentence encoding features $M_x$ and the target sentence encoding features $M_y$ to obtain the cross-language encoding features corresponding to the source sentence and the target sentence. Specifically, this includes: 2.3.1 performing at least one layer of cross-attention fusion processing on $M_x$ and $M_y$ through at least one layer of cascaded cross-attention units 431, to obtain the cross-attention fusion features corresponding to the source sentence and the target sentence generated at each layer; 2.3.2 obtaining the cross-language encoding features corresponding to the source sentence and the target sentence based on the cross-attention fusion features corresponding to the source sentence and the target sentence.

[0149] 2.3.1 Performing at least one layer of cross-attention fusion processing on the source sentence encoding features $M_x$ and the target sentence encoding features $M_y$ through at least one layer of cascaded cross-attention units 431, to obtain the cross-attention fusion features corresponding to the source sentence and the target sentence generated at each layer, includes:

[0150] At each layer, the first key feature, the first value feature, and the first query feature are calculated based on the source feature, where at the first layer the source feature is the source sentence encoding feature, and at the layers after the first layer the source feature is the cross-attention fusion feature corresponding to the source sentence generated by the previous layer; the second key feature, the second value feature, and the second query feature are calculated based on the target feature, where at the first layer the target feature is the target sentence encoding feature, and at the layers after the first layer the target feature is the cross-attention fusion feature corresponding to the target sentence generated by the previous layer; an attention operation is performed based on the first query feature, the second key feature, and the second value feature to obtain a first operation result, and an attention operation is performed based on the second query feature, the first value feature, and the first key feature to obtain a second operation result; the first operation result is fused with the source feature to generate the cross-attention fusion feature corresponding to the source sentence at each layer; the second operation result is fused with the target feature to generate the cross-attention fusion feature corresponding to the target sentence at each layer.

[0151] Specifically, the source sentence encoding feature $M_x$ and the target sentence encoding feature $M_y$ are input into at least one layer (n layers) of cascaded cross-attention units 431, and each cross-attention unit performs cross-attention fusion processing based on the cross-attention mechanism in sequence. Each cross-attention unit 431 may include a cross-attention sublayer 4311 and a fully connected feedforward network (FFN) 4312.

[0152] In the cross-attention unit 431 of each layer l (1 ≤ l ≤ n), the cross-attention sublayer 4311 calculates the first key feature $K_y$, the first value feature $V_y$, and the first query feature $Q_x$ from the source feature and the parameter matrices of the pre-trained cross-attention unit. In the first layer (l = 1), the source feature is the source sentence encoding feature $M_x$; in the layers after the first layer (1 < l ≤ n), the source feature is the cross-attention fusion feature corresponding to the source sentence generated by the previous layer. Likewise, the second key feature $K_x$, the second value feature $V_x$, and the second query feature $Q_y$ are calculated from the target feature and the parameter matrices. In the first layer (l = 1), the target feature is the target sentence encoding feature $M_y$; in the layers after the first layer (1 < l ≤ n), the target feature is the cross-attention fusion feature corresponding to the target sentence generated by the previous layer. Further, an attention operation is performed on the first query feature $Q_x$, the second key feature $K_x$, and the second value feature $V_x$ to obtain the first operation result, so that the cross-attention query $Q_x$ comes from one language (the source sentence) while the key $K_x$ and the value $V_x$ come from the other language (the target sentence). Similarly, an attention operation is performed on the second query feature $Q_y$, the first value feature $V_y$, and the first key feature $K_y$ to obtain the second operation result, so that the cross-attention query $Q_y$ comes from one language (the target sentence) while the key $K_y$ and the value $V_y$ come from the other language (the source sentence).

[0153] Furthermore, in the cross-attention unit 431, the first operation result is fused with the source feature to generate the cross-attention fusion feature corresponding to the source sentence at each layer l, and the second operation result is fused with the target feature to generate the cross-attention fusion feature corresponding to the target sentence at each layer l.

[0154] In the cross-attention unit 431, the first operation result is fused with the source features to generate cross-attention fusion features corresponding to the source statement at each layer. Specifically, this may include: fusing the first operation result with the source features to obtain a first fusion result; normalizing the first fusion result to obtain a first normalized result; performing a fully connected operation on the first normalized result to obtain a first fully connected operation result; combining the first fully connected operation result with the first normalized result to obtain a first combined result; and normalizing the first combined result to obtain the cross-attention fusion features corresponding to the source statement generated at each layer.

[0155] Specifically, in the cross-attention unit 431: the first operation result is fused with the source feature to obtain the first fusion result, where "+" indicates fusion; the first fusion result is normalized to obtain the first normalization result, where LN(·) is the normalization operation; a fully connected operation is performed on the first normalization result through the fully connected feedforward network (FFN) 4312 to obtain the first fully connected operation result; the first fully connected operation result is combined with the first normalization result to obtain the first combination result, where "+" indicates combination; and the first combination result is normalized to obtain the cross-attention fusion feature corresponding to the source sentence generated at each layer l.

[0156] In the cross-attention unit 431, the second operation result is fused with the target feature to generate cross-attention fusion features corresponding to the target statement at each layer. Specifically, this may include: fusing the second operation result with the target feature to obtain a second fusion result; normalizing the second fusion result to obtain a second normalized result; performing a full connection operation on the second normalized result to obtain a second full connection operation result; combining the second full connection operation result with the second normalized result to obtain a second combined result; and normalizing the second combined result to obtain the cross-attention fusion features corresponding to the target statement generated at each layer.

[0157] Specifically, in the cross-attention unit 431: the second operation result is fused with the target feature to obtain the second fusion result, where "+" indicates fusion; the second fusion result is normalized to obtain the second normalization result, where LN(·) is the normalization operation; a fully connected operation is performed on the second normalization result through the fully connected feedforward network (FFN) 4312 to obtain the second fully connected operation result; the second fully connected operation result is combined with the second normalization result to obtain the second combination result, where "+" indicates combination; and the second combination result is normalized to obtain the cross-attention fusion feature corresponding to the target sentence generated at each layer l.
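
As a non-limiting illustration, one cross-attention unit and its cascade can be sketched as follows. The subscript convention follows the description above: the query taken from one sentence attends to the key and value taken from the other sentence. The sketch assumes single-head scaled dot-product attention with a softmax, fusion and combination by addition, a ReLU feedforward network, and one set of parameter matrices applied to both sentences; none of these choices is stated in the text. The names `cross_attention_unit`, `fuse_and_normalize`, `s_feat`, `t_feat`, and the `W_*` keys are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(h, eps=1e-5):
    # LN(.) over the feature dimension (assumption: no learned gain or bias)
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def fuse_and_normalize(op_result, feature, p):
    fused = op_result + feature                # fusion result
    normed = layer_norm(fused)                 # normalization result
    ffn = np.maximum(0.0, normed @ p["W_1"]) @ p["W_2"]  # fully connected operation result
    return layer_norm(ffn + normed)            # combination, then normalization

def cross_attention_unit(src, tgt, p):
    # First key/value/query features, computed from the source feature
    K_y, V_y, Q_x = src @ p["W_k"], src @ p["W_v"], src @ p["W_q"]
    # Second key/value/query features, computed from the target feature
    K_x, V_x, Q_y = tgt @ p["W_k"], tgt @ p["W_v"], tgt @ p["W_q"]
    scale = np.sqrt(Q_x.shape[-1])
    first = softmax(Q_x @ K_x.T / scale) @ V_x    # source query over target key/value
    second = softmax(Q_y @ K_y.T / scale) @ V_y   # target query over source key/value
    return fuse_and_normalize(first, src, p), fuse_and_normalize(second, tgt, p)

rng = np.random.default_rng(0)
d_model, d_ff, n = 8, 16, 2
shapes = {"W_q": (d_model, d_model), "W_k": (d_model, d_model),
          "W_v": (d_model, d_model), "W_1": (d_model, d_ff), "W_2": (d_ff, d_model)}
# Random stand-ins for the pre-trained parameter matrices of n cascaded units
layers = [{k: rng.normal(scale=0.1, size=v) for k, v in shapes.items()} for _ in range(n)]

M_x = rng.normal(size=(4, d_model))   # source encoding feature, I = 4 words
M_y = rng.normal(size=(6, d_model))   # target encoding feature, J = 6 words
s_feat, t_feat = M_x, M_y
for p in layers:
    # Each layer consumes the fusion features produced by the previous layer
    s_feat, t_feat = cross_attention_unit(s_feat, t_feat, p)
```

Each output keeps the length of its own sentence, so the source-side feature has I rows and the target-side feature has J rows regardless of the number of cascaded layers.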

[0158] 2.3.2 Based on the cross-attention fusion features corresponding to the source and target sentences, the cross-language encoding features corresponding to the source and target sentences are obtained. This can be achieved by selecting the cross-attention fusion features generated by one of the at least one layer as the cross-language encoding features corresponding to the source and target sentences. The cross-language encoding features corresponding to the source sentence can be represented as $s = \{s_1, s_2, \ldots, s_i, \ldots, s_I\}$, and the cross-language encoding features corresponding to the target sentence can be represented as $t = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$.

[0159] 3. Alignment processing is performed based on the cross-language encoding features corresponding to the source and target sentences to obtain the word alignment results of the source and target sentences.

[0160] Alignment processing is performed based on the cross-language encoding features corresponding to the source and target sentences to obtain word alignment results between the source and target sentences. Specifically, this includes: calculating the dot product of the cross-language encoding features corresponding to the source and target sentences to obtain a similarity matrix; generating a first probability matrix from the source sentence to the target sentence and a second probability matrix from the target sentence to the source sentence based on the similarity matrix; taking the intersection of the first probability matrix and the second probability matrix to obtain an alignment matrix; and obtaining the word alignment results between the source and target sentences based on the alignment matrix.

[0161] The cross-language encoding features corresponding to the source sentence can be represented as $s = \{s_1, s_2, \ldots, s_i, \ldots, s_I\}$, and the cross-language encoding features corresponding to the target sentence can be represented as $t = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$. Calculating the dot product of s and t yields the similarity matrix $S_{I\times J}$.

[0162] Furthermore, softmax normalization can be applied to the similarity matrix $S_{I\times J}$ by row and by column respectively to generate the first probability matrix $P^f$ from the source sentence to the target sentence and the second probability matrix $P^b$ from the target sentence to the source sentence.

[0163] Furthermore, the intersection of the first probability matrix $P^f$ and the second probability matrix $P^b$ can be taken according to the formula $G_{ij} = (P^f_{ij} > \tau) * (P^b_{ij} > \tau)$ to obtain the alignment matrix $G_{I\times J}$, where τ is a preset threshold, $P^f_{ij}$ is an element of the first probability matrix, $P^b_{ij}$ is an element of the second probability matrix, $G_{ij}$ is the alignment information in the alignment matrix $G_{I\times J}$, and * indicates AND: if $P^f_{ij} > \tau$ and $P^b_{ij} > \tau$, then $G_{ij}$ equals 1; otherwise, $G_{ij}$ equals 0. $G_{ij}$ equal to 1 indicates that the elements in the i-th row and j-th column of the first and second probability matrices are aligned. The word alignment result between the source and target sentences is then obtained from the alignment matrix $G_{I\times J}$: if the element of $G_{I\times J}$ corresponding to two words, one in the source sentence and one in the target sentence, is 1, then the two words are aligned.
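
As a non-limiting illustration, the alignment processing of step 3 can be sketched as follows: dot product, softmax by row and by column, and thresholded intersection. The function name `extract_alignment`, the toy feature values, and the threshold value 0.5 are hypothetical; the text only states that τ is a preset threshold.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def extract_alignment(s, t, tau):
    """s: (I, d) source features, t: (J, d) target features, tau: preset threshold."""
    S = s @ t.T                        # similarity matrix, shape (I, J)
    P_f = softmax(S, axis=1)           # normalized by row: source to target
    P_b = softmax(S, axis=0)           # normalized by column: target to source
    G = ((P_f > tau) & (P_b > tau)).astype(int)   # intersection (logical AND)
    return G, P_f, P_b

# Toy features chosen so that source word 0 matches target word 1 and vice versa
s = np.array([[3.0, 0.0], [0.0, 3.0]])
t = np.array([[0.0, 3.0], [3.0, 0.0]])
G, P_f, P_b = extract_alignment(s, t, tau=0.5)

# Word pairs (i, j) whose entry in the alignment matrix equals 1
pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(G))]
print(G.tolist())   # [[0, 1], [1, 0]]
print(pairs)        # [(0, 1), (1, 0)]
```

Requiring both directions to exceed τ keeps only the pairs on which the source-to-target and target-to-source distributions agree, which is what the intersection in the formula expresses.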

[0164] Furthermore, in this scenario, the alignment information between the source and target sentences (i.e., the alignment matrix $G_{I\times J}$) is extracted by applying the trained cross-alignment model, and the word alignment result, which indicates whether words in the source and target sentences are aligned, is obtained based on that alignment information.

[0165] Furthermore, in this scenario, by using the two-stage training method described above in this application to obtain a trained cross-alignment model for word alignment, the cross-alignment model can further improve the overall word alignment accuracy when implementing the cross-alignment processing task in the word alignment method of the aforementioned embodiments.

[0166] Specifically: words in the source sentence sample and the target sentence sample are randomly masked to obtain a masked source sentence and a masked target sentence; using the masked source sentence and the masked target sentence, a pre-set cross-alignment model is trained using cross-language masking to obtain a pre-trained cross-alignment model, which includes a cross-fusion module for cross-fusion processing; one layer in the cross-fusion module is selected as the alignment layer, and the alignment layer in the pre-trained cross-alignment model is subjected to self-supervised alignment training until a predetermined training condition is met; a trained cross-alignment model is formed based on the alignment layer and the previous layers in the pre-trained cross-alignment model, and the trained cross-alignment model is used in the word alignment method of any of the aforementioned embodiments.

[0167] The source sentence sample and the target sentence sample can be two sentence samples with the same semantics but different languages. For example, the source sentence sample can be a Chinese sentence, and the target sentence sample can be an English sentence translated from the Chinese sentence. The source sentence sample and the target sentence sample form a sentence sample pair used for model training. It can be understood that there can be multiple sentence sample pairs, such as 1 million sentence sample pairs.

[0168] In the first stage of training, a pre-trained cross-alignment model is trained through the Translation Language Modeling (TLM) task, enabling the model to learn cross-language representations first.

[0169] Specifically, the words in the source and target sentence samples are randomly masked. For example, 15% of the words (tokens) in the source and target sentence samples can be randomly selected. Each selected word (token) is replaced with a mask ([MASK] token) with 80% probability, replaced with a random word (random token) with 10% probability, and kept as the original word (token) with 10% probability. This yields a masked source sentence and a masked target sentence containing the mask ([MASK] token).
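
As a non-limiting illustration, the random masking described above can be sketched as follows. The function name `mask_tokens`, the toy vocabulary, and the example sentences are hypothetical. The sketch selects each token independently with probability 0.15, which approximates selecting 15% of the tokens and is an assumption about how the selection is drawn.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (masked tokens, labels); labels hold the original token at selected positions."""
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:         # token is selected for masking
            labels.append(tok)                 # the model is trained to predict this word
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)            # 80%: replace with the mask token
            elif r < 0.9:
                masked.append(rng.choice(vocab))   # 10%: replace with a random token
            else:
                masked.append(tok)             # 10%: keep the original token
        else:
            labels.append(None)                # position not selected, nothing to predict
            masked.append(tok)
    return masked, labels

rng = random.Random(0)
vocab = ["cat", "dog", "sits", "runs", "mat", "the", "on"]
source = ["the", "cat", "sits", "on", "the", "mat"] * 20
target = ["le", "chat", "est", "assis", "sur", "le", "tapis"] * 20

masked_source, source_labels = mask_tokens(source, vocab, rng)
masked_target, target_labels = mask_tokens(target, vocab, rng)
```

The labels list records which positions the first-stage objective would score; unselected positions carry `None` and are left unchanged in the masked sentence.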

[0170] Using the masked source sentence and the masked target sentence, the preset cross-alignment model is trained with cross-language masking. Specifically, the preset cross-alignment model is trained to predict the original word behind each mask ([MASK] token) based on the cross-language bilingual context between the masked source sentence and the masked target sentence. The parameters of the preset cross-alignment model are optimized by optimizing the training objective of the first stage until the predetermined training condition of the first stage is met (for example, the prediction accuracy for the original masked words is higher than a predetermined accuracy). At this point, the first-stage cross-language masked language model (TLM) task is complete, and the pre-trained cross-alignment model is obtained.

[0171] The pre-trained cross-alignment model may include a cross-fusion module for cross-fusion processing, and may also include a source sentence encoding module and a target sentence encoding module for encoding. The training objective of the first stage is expressed in terms of the masked source sentence, the masked target sentence, the source sentence sample, the target sentence sample, and the parameters $\theta_s$ and $\theta_c$ of the preset model, where $\theta_s$ can represent the parameters of the cross-fusion module and $\theta_c$ can represent the parameters of the source sentence encoding module and the target sentence encoding module. Specifically, the cross-fusion module in the pre-trained cross-alignment model can include at least one layer of cascaded cross-attention units (X layers, where X is greater than or equal to the number of layers n in the aforementioned trained cross-alignment model), the source sentence encoding module can include at least one layer of cascaded self-attention units, and the target sentence encoding module can include at least one layer of cascaded self-attention units.

[0172] In the second stage, based on the self-supervised alignment (SSA) task, the alignment layer in the pre-trained cross-alignment model obtained in the first stage is subjected to self-supervised alignment training. That is, in the second stage, only the alignment layer in the model is fine-tuned while the other layers are frozen. Further fine-tuning the alignment layer on the self-supervised alignment task keeps the training objective consistent with the inference objective. In related training schemes, by contrast, the model is trained only through the cross-language masked language model task, whereas at the inference stage the sentence pairs (i.e., pairs consisting of a source sentence and a target sentence) are encoded separately, so the training and inference objectives are not consistent. With the two-stage training of this embodiment, the training and inference objectives are aligned, and the resulting trained cross-alignment model achieves higher word alignment accuracy than one obtained with the related techniques.

[0173] In this process, one layer of the cross-fusion module is selected as the alignment layer. For example, the cross-fusion module may include 5 cascaded cross-attention units, and one of the 5 cross-attention units can be selected as the alignment layer.

[0174] The trained cross-alignment model is formed based on the alignment layer and preceding layers in the pre-trained cross-alignment model. For example, the pre-trained cross-alignment model includes 12 layers: layers 1 to 6 are cascaded self-attention units, and layers 7 to 12 are cascaded cross-attention units. If the alignment layer is the c=10th layer of cross-attention units, the trained cross-alignment model can be formed based on the cascaded self-attention units of layers 1 to 6 and the cascaded cross-attention units of layers 7 to 10.
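
As a non-limiting illustration, forming the trained model from the alignment layer and the layers before it amounts to truncating the layer stack. The function name `truncate_model` and the string placeholders standing in for real layers are hypothetical.

```python
def truncate_model(layers, alignment_layer):
    """Keep layers 1..alignment_layer (1-indexed) and drop every later layer."""
    if not 1 <= alignment_layer <= len(layers):
        raise ValueError("alignment_layer must index an existing layer")
    return layers[:alignment_layer]

# 12-layer pre-trained model from the example: layers 1-6 self-attention, 7-12 cross-attention
pretrained = ([f"self_attn_{i}" for i in range(1, 7)]
              + [f"cross_attn_{i}" for i in range(7, 13)])

trained = truncate_model(pretrained, alignment_layer=10)   # c = 10
print(len(trained))   # 10
print(trained[-1])    # cross_attn_10
```

With c = 10 the trained model keeps the six self-attention layers and cross-attention layers 7 to 10, and layers 11 and 12 are not used at inference.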

[0175] Furthermore, each layer in the cross-fusion module can be selected sequentially as an alignment layer to perform the second-stage task. Then, the alignment effect of the cross-alignment model obtained when each layer in the cross-fusion module is used as an alignment layer for word alignment is compared, and the cross-alignment model with the best alignment effect after training is selected as the final cross-alignment model after training.

[0176] When this trained cross-alignment model is used to perform word alignment with the word alignment method of the aforementioned embodiments (e.g., the embodiment corresponding to Figure 2), the cross-alignment model can: encode the source sentence to obtain source sentence encoding features; encode the target sentence to obtain target sentence encoding features; and perform cross-fusion processing on the source sentence encoding features and the target sentence encoding features to obtain cross-language encoding features corresponding to the source sentence and the target sentence.

[0177] Furthermore, performing self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until predetermined training conditions are met can include: preprocessing the source and target sentence samples using the pre-trained cross-alignment model to extract word alignment labels based on the features output by the alignment layer during preprocessing; using the pre-trained cross-alignment model as the current model to reprocess the source and target sentence samples to extract the probability matrix between the source and target sentence samples based on the features output by the alignment layer during reprocessing; and optimizing the parameters of the alignment layer in the pre-trained cross-alignment model using a predetermined loss function based on the word alignment labels and the probability matrix until predetermined training conditions are met.

[0178] Specifically, in the second stage, the source sentence samples and the target sentence samples are input into the pre-trained cross-alignment model trained in the first stage. The pre-trained cross-alignment model can preprocess the source sentence samples and the target sentence samples based on the model parameters trained in the first stage, and extract word alignment labels based on the features output by the alignment layer during the preprocessing process. That is, the alignment information generated by the alignment layer in the model trained in the first stage is used as the word alignment labels.

[0179] The alignment layer can be a single cross-attention unit. During preprocessing, the features output by this cross-attention unit can include the cross-attention fusion feature D1 corresponding to the source sentence sample and the cross-attention fusion feature D2 corresponding to the target sentence sample. Taking the dot product of D1 and D2 yields the sample similarity matrix $S1'_{I\times J}$. Softmax normalization can be applied to $S1'_{I\times J}$ by row and by column respectively to generate the first sample probability matrix $P^{f'}$ from the source sentence sample to the target sentence sample and the second sample probability matrix $P^{b'}$ from the target sentence sample to the source sentence sample. Furthermore, the intersection of the first sample probability matrix and the second sample probability matrix can be taken according to the formula $G'_{ij} = (P^{f'}_{ij} > \tau) * (P^{b'}_{ij} > \tau)$ to obtain the sample alignment matrix $G'_{I\times J}$, where τ is a preset threshold, $P^{f'}_{ij}$ is an element of the first sample probability matrix, $P^{b'}_{ij}$ is an element of the second sample probability matrix, $G'_{ij}$ is the alignment information in $G'_{I\times J}$, and * indicates AND: if $P^{f'}_{ij} > \tau$ and $P^{b'}_{ij} > \tau$, then $G'_{ij}$ equals 1; otherwise, $G'_{ij}$ equals 0. $G'_{ij}$ equal to 1 indicates that the elements in the i-th row and j-th column of the first and second sample probability matrices are aligned.

[0180] The extracted word alignment labels are the sample alignment matrix $G'_{I\times J}$. In this way, the alignment information generated by the alignment layer of the model trained in the first stage (i.e., the sample alignment matrix $G'_{I\times J}$) is used as the word alignment labels.

[0181] Furthermore, with the pre-trained cross-alignment model as the current model, the source and target sentence samples are processed again, and a probability matrix between the source and target sentence samples can be extracted from the features output by the alignment layer during this reprocessing. This probability matrix can include a third sample probability matrix from the source sentence sample to the target sentence sample and a fourth sample probability matrix from the target sentence sample to the source sentence sample.

[0182] During reprocessing, the features output by the cross-attention unit can likewise include the cross-attention fusion feature D3 corresponding to the source sentence sample and the cross-attention fusion feature D4 corresponding to the target sentence sample. Taking the dot product of D3 and D4 yields the sample similarity matrix $S2'_{I \times J}$. Softmax normalization can then be applied to $S2'_{I \times J}$ by row and by column, respectively, to generate the third sample probability matrix from the source sentence sample to the target sentence sample and the fourth sample probability matrix from the target sentence sample to the source sentence sample.

[0183] Using a predetermined loss function, the alignment objective in the second stage is optimized based on the word alignment labels $G'_{I \times J}$ and the probability matrices, thereby optimizing the parameters of the alignment layer in the pre-trained cross-alignment model. This involves freezing all layers of the current model except the alignment layer and fine-tuning only the parameters of the alignment layer until the predetermined training conditions of the second stage are met (these conditions can be set according to the actual situation), thus completing the self-supervised alignment task of the second stage.

[0184] The predetermined loss function can be the cross-entropy loss function: the cross-entropy loss is calculated from the word alignment labels $G'_{I \times J}$ and the probability matrices, and the alignment objective is optimized by minimizing this loss. The loss is computed over the element in the i-th row and j-th column of the third sample probability matrix, the element in the i-th row and j-th column of the fourth sample probability matrix, and $G'_{ij}$, the element in the i-th row and j-th column of $G'_{I \times J}$.
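As a rough illustration of such a loss, the sketch below computes a cross-entropy between binary alignment labels and two directional probability matrices. The text does not give the exact form of the loss here, so the choices in this sketch are assumptions: only cells labeled 1 contribute, the two directions are summed, and the result is averaged over the number of labeled links. The name `alignment_cross_entropy` and the `eps` smoothing term are likewise hypothetical.

```python
import math

def alignment_cross_entropy(labels, p_fwd, p_bwd, eps=1e-9):
    """Cross-entropy of labeled links under both directional matrices.

    labels: I x J matrix of 0/1 word alignment labels.
    p_fwd:  I x J source-to-target probabilities.
    p_bwd:  I x J target-to-source probabilities.
    Assumed form: mean over labeled links of -(log p_fwd + log p_bwd).
    """
    total = 0.0
    links = 0
    for i, row in enumerate(labels):
        for j, label in enumerate(row):
            if label:
                total -= math.log(p_fwd[i][j] + eps)
                total -= math.log(p_bwd[i][j] + eps)
                links += 1
    return total / max(links, 1)

labels = [[1, 0], [0, 1]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = alignment_cross_entropy(labels, uniform, uniform)
print(round(loss, 4))
```

Minimizing a loss of this kind pushes probability mass in both directions toward the cells that the first-stage model marked as aligned.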

[0185] In this way, in this scenario, by applying the foregoing embodiments of this application, at least the following beneficial effects are achieved: on the one hand, the word alignment results obtained by fully considering the cross-linguistic context between parallel sentence pairs (i.e., source sentences and target sentences) effectively improve word alignment accuracy; on the other hand, through two-stage model training, the training and inference objectives can be aligned, and the word alignment accuracy can be further improved by using the cross-alignment model for word alignment after training.

[0186] In this scenario, the alignment error rate (AER) is used as the standard evaluation metric for word alignment. The AER of an alignment A is calculated using the following formula:

$$\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$$

[0187] Where S is the set of sure alignments and P is the set of possible alignments.
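For a concrete sense of the metric, the sketch below computes AER under its widely used definition, 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The function name, the representation of links as (source index, target index) pairs, and the toy link sets are assumptions made for illustration; the sketch also follows the common convention that every sure link counts as a possible link.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better."""
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # convention: sure links are also possible links
    if not a and not s:
        return 0.0  # nothing predicted and nothing expected
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Hypothetical links as (source index, target index) pairs.
predicted = {(0, 0), (1, 2)}
sure = {(0, 0), (1, 1)}
possible = {(1, 2)}
print(alignment_error_rate(predicted, sure, possible))  # 0.25
```

In this toy case one predicted link is sure and both are at least possible, so AER = 1 - (1 + 2) / (2 + 2) = 0.25.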

[0188] We conducted word alignment experiments on five publicly available datasets for this scenario, including German-English (De-En), English-French (En-Fr), Romanian-English (Ro-En), Chinese-English (Zh-En), and Japanese-English (Ja-En). The experimental results are shown in the table below:

[0189] Alignment error rate (AER) by method and language pair (lower is better):

Method                      De→En   En→Fr   Ro→En   Zh→En   Ja→En
First comparison scheme     18.8    7.6     27.2    21.6    46.6
Second comparison scheme    15.6    4.4     23.0    12.9    38.4
Solution in this scenario   13.6    3.4     20.9    10.1    35.4

[0190] The first comparison scheme encodes the source sentence and the target sentence separately during word alignment, without considering the context of the other sentence. The second comparison scheme uses an alignment model that takes cross-linguistic context as input during training but still encodes the two sentences separately during inference, so its training and inference objectives are inconsistent.

[0191] The experiments show that, in this scenario, word alignment using the scheme described in this application achieves the lowest alignment error rate on all five language pairs compared with the first and second comparison schemes, demonstrating that the embodiments of this application perform well in word alignment.
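The size of the gap can be read off the reported figures directly. The short sketch below only re-computes differences from the AER values reported in the experimental results for the three schemes; the dictionary keys and the helper name `aer_reduction` are illustrative, not part of the application.

```python
# AER values as reported in the experimental results.
pairs = ["De-En", "En-Fr", "Ro-En", "Zh-En", "Ja-En"]
aer = {
    "first": [18.8, 7.6, 27.2, 21.6, 46.6],
    "second": [15.6, 4.4, 23.0, 12.9, 38.4],
    "proposed": [13.6, 3.4, 20.9, 10.1, 35.4],
}

def aer_reduction(baseline, proposed):
    """Absolute AER reduction per language pair, rounded to one decimal."""
    return [round(b - p, 1) for b, p in zip(baseline, proposed)]

vs_first = aer_reduction(aer["first"], aer["proposed"])
vs_second = aer_reduction(aer["second"], aer["proposed"])
for pair, d1, d2 in zip(pairs, vs_first, vs_second):
    print(f"{pair}: -{d1} vs first scheme, -{d2} vs second scheme")
```

On these figures the absolute reduction relative to the second comparison scheme ranges from 1.0 (En-Fr) to 3.0 (Ja-En) AER points.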

[0192] To facilitate better implementation of the word alignment method provided in the embodiments of this application, the embodiments of this application also provide a word alignment device based on the above-described word alignment method. The terms have the same meanings as in the above-described word alignment method; for specific implementation details, please refer to the description in the method embodiments. Figure 5 shows a block diagram of a word alignment device according to an embodiment of this application.

[0193] As shown in Figure 5, the word alignment device 600 may include a sentence acquisition module 610, a source sentence encoding module 620, a target sentence encoding module 630, a cross-fusion module 640, and an alignment processing module 650.

[0194] The statement acquisition module 610 can be used to acquire a source statement and a target statement corresponding to the source statement; the source statement encoding module 620 can be used to encode the source statement to obtain source statement encoding features; the target statement encoding module 630 can be used to encode the target statement to obtain target statement encoding features; the cross-fusion module 640 can be used to cross-fusion the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; the alignment processing module 650 can be used to perform alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain word alignment results between the source statement and the target statement.

[0195] In some embodiments of this application, the cross-fusion module includes: at least one layer of cross-attention unit, used to perform at least one layer of cross-attention fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-attention fusion features corresponding to the source statement and the target statement generated at each layer; and a cross-language encoding feature extraction unit, used to obtain cross-language encoding features corresponding to the source statement and the target statement based on the cross-attention fusion features corresponding to the source statement and the target statement.

[0196] In some embodiments of this application, the at least one layer of cross-attention unit is configured to: calculate a first key feature, a first value feature, and a first query feature based on source features at each layer, wherein the source feature at the first layer is the source statement encoding feature, and the source feature at layers after the first layer is the cross-attention fusion feature corresponding to the source statement generated at the previous layer; calculate a second key feature, a second value feature, and a second query feature based on target features, wherein the target feature at the first layer is the target statement encoding feature, and the target feature at layers after the first layer is the cross-attention fusion feature corresponding to the target statement generated at the previous layer; perform attention operations based on the first query feature, the second key feature, and the second value feature to obtain a first operation result, and perform attention operations based on the second query feature, the first value feature, and the first key feature to obtain a second operation result; fuse the first operation result with the source features to generate the cross-attention fusion feature corresponding to the source statement at each layer; and fuse the second operation result with the target features to generate the cross-attention fusion feature corresponding to the target statement at each layer.
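The bidirectional exchange described above, where each side's queries attend over the other side's keys and values, can be sketched as follows. This is a simplified single-head illustration in plain Python, not the device's actual implementation: the scaling by the square root of the feature size, the use of element-wise addition as the "fuse" step, and all names (`matmul`, `attention`, `cross_attention_layer`, the weight dictionaries) are assumptions.

```python
import math

def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Scaled dot-product attention (the scaling is an assumption)."""
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cross_attention_layer(src, tgt, w_src, w_tgt):
    """src: I x H source features, tgt: J x H target features."""
    q1, k1, v1 = (matmul(src, w_src[name]) for name in ("q", "k", "v"))
    q2, k2, v2 = (matmul(tgt, w_tgt[name]) for name in ("q", "k", "v"))
    r1 = attention(q1, k2, v2)  # source queries over target keys/values
    r2 = attention(q2, k1, v1)  # target queries over source keys/values
    # "Fuse" is modeled here as a residual addition (assumption).
    return add(src, r1), add(tgt, r2)

identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {"q": identity, "k": identity, "v": identity}
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
fused_src, fused_tgt = cross_attention_layer(src, tgt, weights, weights)
print(fused_src, fused_tgt)
```

Stacking several such layers would feed `fused_src` and `fused_tgt` back in as the source and target features of the next layer, matching the layer-by-layer description.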

[0197] In some embodiments of this application, the at least one layer of cross-attention unit is configured to: fuse the first operation result with the source feature to obtain a first fusion result; normalize the first fusion result to obtain a first normalized result; perform a fully connected operation on the first normalized result to obtain a first fully connected operation result; combine the first fully connected operation result with the first normalized result to obtain a first combined result; and normalize the first combined result to obtain the cross-attention fusion feature corresponding to the source statement generated by each layer.

[0198] In some embodiments of this application, the at least one layer of cross-attention unit is used to: fuse the second operation result with the target feature to obtain a second fusion result; normalize the second fusion result to obtain a second normalized result; perform a fully connected operation on the second normalized result to obtain a second fully connected operation result; combine the second fully connected operation result with the second normalized result to obtain a second combined result; and normalize the second combined result to obtain the cross-attention fusion feature corresponding to the target statement generated by each layer.
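The five-step post-processing applied on each side (fuse, normalize, fully connected operation, combine, normalize) resembles the familiar add-and-normalize pattern around a feed-forward block. The sketch below shows it for a single position's feature vector. It assumes that "fuse" and "combine" are element-wise additions, that normalization is layer normalization without learned scale or shift, and that the fully connected operation is a two-layer ReLU network; the names `layer_norm`, `feed_forward`, and `fuse_position` are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def feed_forward(vec, w1, b1, w2, b2):
    """Two fully connected layers with a ReLU in between (assumed form)."""
    hidden = [max(0.0, sum(x * w for x, w in zip(vec, col)) + b)
              for col, b in zip(zip(*w1), b1)]
    return [sum(h * w for h, w in zip(hidden, col)) + b
            for col, b in zip(zip(*w2), b2)]

def fuse_position(feature, attn_result, w1, b1, w2, b2):
    fused = [f + a for f, a in zip(feature, attn_result)]  # fusion result
    normed = layer_norm(fused)                             # normalized result
    dense = feed_forward(normed, w1, b1, w2, b2)           # fully connected result
    combined = [n + d for n, d in zip(normed, dense)]      # combined result
    return layer_norm(combined)                            # cross-attention fusion feature

eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [0.0] * 4
out = fuse_position([1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -0.5, 1.0],
                    eye, zeros, eye, zeros)
print([round(x, 3) for x in out])
```

The same function would be applied to the source side with the first operation result and to the target side with the second operation result.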

[0199] In some embodiments of this application, the source statement encoding module includes: at least one cascaded self-attention unit, used to perform at least one layer of self-attention encoding on the source statement to obtain attention encoding features of the source statement generated at each layer; and a source statement encoding feature extraction unit, used to obtain the source statement encoding features based on the attention encoding features corresponding to the source statement.

[0200] In some embodiments of this application, the source statement encoding module includes at least one cascaded self-attention unit, configured to: in each layer, calculate a third key feature, a third value feature, and a third query feature based on a first feature, wherein in the first layer, the first feature is the initial sentence vector of the source statement, and in layers after the first layer, the first feature is the attention encoding feature of the source statement generated in the previous layer; perform attention operations on the third key feature, the third value feature, and the third query feature to obtain a third operation result; fuse the third operation result with the first feature to obtain a third fusion result; and normalize the third fusion result to obtain the attention encoding feature of the source statement generated in each layer.

[0201] In some embodiments of this application, the target statement encoding module includes: at least one layer of cascaded self-attention units, used to perform at least one layer of self-attention encoding on the target statement to obtain attention encoding features of the target statement generated at each layer; and a target statement encoding feature extraction unit, used to obtain the target statement encoding features based on the attention encoding features corresponding to the target statement.

[0202] In some embodiments of this application, the target statement encoding module includes at least one cascaded self-attention unit, which is used in each layer to calculate a fourth key feature, a fourth value feature, and a fourth query feature based on a second feature; perform attention operations on the fourth key feature, the fourth value feature, and the fourth query feature to obtain a fourth operation result; fuse the fourth operation result with the second feature to obtain a fourth fusion result; and normalize the fourth fusion result to obtain the attention encoding feature of the target statement generated in each layer.

[0203] In some embodiments of this application, the alignment processing module is configured to: calculate the dot product of the cross-language encoding features corresponding to the source statement and the target statement to obtain a similarity matrix; generate a first probability matrix from the source statement to the target statement and a second probability matrix from the target statement to the source statement based on the similarity matrix; take the intersection of the first probability matrix and the second probability matrix to obtain an alignment matrix; and obtain the word alignment result between the source statement and the target statement based on the alignment matrix.

[0204] Figure 6 shows a block diagram of a word alignment device according to another embodiment of this application.

[0205] As shown in Figure 6, the word alignment device 700 may include a masking module 710, a cross-language masking training module 720, and a self-supervised alignment training module 730.

[0206] The masking module 710 can be used to randomly mask words in the source sentence sample and the target sentence sample to obtain a masked source sentence and a masked target sentence containing the mask; the cross-language masking training module 720 can be used to perform cross-language masking training on a preset cross-alignment model using the masked source sentence and the masked target sentence to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; the self-supervised alignment training module 730 can be used to select one layer in the cross-fusion module as an alignment layer, perform self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met, and form a trained cross-alignment model based on the alignment layer and the layers before it in the pre-trained cross-alignment model, wherein the trained cross-alignment model is used in the word alignment method described in any of the foregoing embodiments.
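The random masking performed by the masking module can be illustrated with a short sketch. The mask token string "[MASK]", the masking rate of 0.3, independent per-word sampling, and the function name `mask_tokens` are all assumptions for illustration; the text only specifies that words in both samples are masked at random.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_tokens(tokens, mask_rate, rng):
    """Replace each token with MASK independently with probability mask_rate.

    Returns the masked sequence and the indices that were masked, so a
    training loop could ask the model to recover the original words.
    """
    masked, positions = [], []
    for index, token in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            positions.append(index)
        else:
            masked.append(token)
    return masked, positions

rng = random.Random(0)  # seeded only to make the example repeatable
source = ["das", "ist", "ein", "Haus"]
target = ["this", "is", "a", "house"]
masked_source, source_positions = mask_tokens(source, 0.3, rng)
masked_target, target_positions = mask_tokens(target, 0.3, rng)
print(masked_source, masked_target)
```

Because both sentences are masked, a model with cross-fusion layers can use context from the other language when predicting a masked word.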

[0207] In some embodiments of this application, the self-supervised alignment training module is configured to: preprocess the source sentence samples and target sentence samples using the pre-trained cross-alignment model to extract word alignment labels based on the features output by the alignment layer during preprocessing; use the pre-trained cross-alignment model as the current model to reprocess the source sentence samples and target sentence samples to extract the probability matrix between the source sentence samples and the target sentence samples based on the features output by the alignment layer during reprocessing; and optimize the parameters of the alignment layer in the pre-trained cross-alignment model using a predetermined loss function, based on the word alignment labels and the probability matrix, until a predetermined training condition is met.

[0208] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of this application, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0209] Furthermore, embodiments of this application also provide an electronic device, which can be a terminal or a server. Figure 7 shows a structural schematic diagram of the electronic device involved in the embodiments of this application. Specifically:

[0210] The electronic device may include components such as a processor 801 with one or more processing cores, a memory 802 with one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will understand that the electronic device structure shown in Figure 7 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or have a different component arrangement. Wherein:

[0211] The processor 801 is the control center of the electronic device. It connects to the various parts of the electronic device via various interfaces and lines. By running or executing software programs and/or modules stored in the memory 802, and by calling data stored in the memory 802, it performs the various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 801.

[0212] The memory 802 can be used to store software programs and modules. The processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.

[0213] The electronic device also includes a power supply 803 that supplies power to the various components. Preferably, the power supply 803 can be logically connected to the processor 801 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 803 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.

[0214] The electronic device may also include an input unit 804, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

[0215] Although not shown, the electronic device may also include a display unit, etc., which will not be described in detail here. Specifically, in this embodiment, the processor 801 in the electronic device loads the executable files corresponding to the processes of one or more computer programs into the memory 802 according to the following instructions, and the processor 801 runs the computer programs stored in the memory 802, thereby realizing the various functions of the foregoing embodiments of this application.

[0216] For example, processor 801 can perform the following operations: obtaining a source statement and a target statement corresponding to the source statement; encoding the source statement to obtain source statement encoding features; encoding the target statement to obtain target statement encoding features; performing cross-fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; and performing alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain word alignment results between the source statement and the target statement.

[0217] For example, processor 801 can perform the following: random masking of words in source sentence samples and target sentence samples to obtain masked source sentences and masked target sentences; using the masked source sentences and masked target sentences, perform cross-language masking training on a preset cross-alignment model to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; select one layer in the cross-fusion module as an alignment layer, perform self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met, and form a trained cross-alignment model based on the alignment layer and the layers before it in the pre-trained cross-alignment model, wherein the trained cross-alignment model is used in the word alignment method described in any of the foregoing embodiments.

[0218] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by a computer program, or by a computer program controlling related hardware. The computer program can be stored in a computer-readable storage medium and loaded and executed by a processor.

[0219] Therefore, embodiments of this application also provide a computer-readable storage medium storing a computer program that can be loaded by a processor to perform the steps in any of the methods provided in embodiments of this application.

[0220] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

[0221] Since the computer program stored in the computer-readable storage medium can execute the steps in any of the methods provided in the embodiments of this application, the beneficial effects that the methods provided in the embodiments of this application can achieve can be realized. For details, please refer to the previous embodiments, which will not be repeated here.

[0222] According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various optional implementations of the above embodiments of this application.

[0223] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein.

[0224] It should be understood that this application is not limited to the embodiments described above and shown in the accompanying drawings, but various modifications and changes can be made without departing from its scope.

Claims

1. A word alignment method, characterized in that it comprises: obtaining a source statement and a target statement corresponding to the source statement; encoding the source statement to obtain source statement encoding features; encoding the target statement to obtain target statement encoding features; performing cross-fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; and performing alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain a word alignment result between the source statement and the target statement, which includes: calculating the dot product of the cross-language encoding features corresponding to the source statement and the target statement to obtain a similarity matrix; normalizing the similarity matrix by rows and by columns to generate a first probability matrix from the source statement to the target statement and a second probability matrix from the target statement to the source statement; taking the intersection of the first probability matrix and the second probability matrix to obtain an alignment matrix; and obtaining the word alignment result between the source statement and the target statement based on the alignment matrix.

2. The method according to claim 1, characterized in that, The step of cross-fusion processing of the source statement encoding features and the target statement encoding features to obtain the cross-language encoding features corresponding to the source statement and the target statement includes: The source statement encoding features and the target statement encoding features are subjected to at least one layer of cross-attention fusion processing to obtain the cross-attention fusion features corresponding to the source statement and the target statement generated at each layer; Based on the cross-attention fusion features corresponding to the source statement and the target statement, cross-language encoding features corresponding to the source statement and the target statement are obtained.

3. The method according to claim 2, characterized in that the step of performing at least one layer of cross-attention fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-attention fusion features corresponding to the source statement and the target statement generated at each layer includes: in each layer, calculating a first key feature, a first value feature, and a first query feature based on source features, wherein in the first layer the source features are the source statement encoding features, and in layers after the first layer the source features are the cross-attention fusion features corresponding to the source statement generated in the previous layer; calculating a second key feature, a second value feature, and a second query feature based on target features, wherein in the first layer the target features are the target statement encoding features, and in layers after the first layer the target features are the cross-attention fusion features corresponding to the target statement generated in the previous layer; performing attention operations based on the first query feature, the second key feature, and the second value feature to obtain a first operation result, and performing attention operations based on the second query feature, the first value feature, and the first key feature to obtain a second operation result; fusing the first operation result with the source features to generate the cross-attention fusion features corresponding to the source statement at each layer; and fusing the second operation result with the target features to generate the cross-attention fusion features corresponding to the target statement at each layer.

4. The method according to claim 3, characterized in that the step of fusing the first operation result with the source features to generate cross-attention fusion features corresponding to the source statement at each layer includes: fusing the first operation result with the source features to obtain a first fusion result; normalizing the first fusion result to obtain a first normalized result; performing a fully connected operation on the first normalized result to obtain a first fully connected operation result; combining the first fully connected operation result with the first normalized result to obtain a first combined result; and normalizing the first combined result to obtain the cross-attention fusion features corresponding to the source statement generated at each layer.

5. The method according to claim 3, characterized in that the step of fusing the second operation result with the target features to generate cross-attention fusion features corresponding to the target statement at each layer includes: fusing the second operation result with the target features to obtain a second fusion result; normalizing the second fusion result to obtain a second normalized result; performing a fully connected operation on the second normalized result to obtain a second fully connected operation result; combining the second fully connected operation result with the second normalized result to obtain a second combined result; and normalizing the second combined result to obtain the cross-attention fusion features corresponding to the target statement generated at each layer.

6. The method according to claim 1, characterized in that, The process of encoding the source statement to obtain source statement encoding features includes: The source statement is subjected to at least one layer of self-attention encoding to obtain the attention encoding features of the source statement generated at each layer; Based on the attention encoding features corresponding to the source statement, the source statement encoding features are obtained.

7. The method according to claim 6, characterized in that the step of performing at least one layer of self-attention encoding on the source statement to obtain the attention encoding features of the source statement generated at each layer includes: in each layer, calculating a third key feature, a third value feature, and a third query feature based on a first feature, wherein in the first layer the first feature is the initial sentence vector of the source statement, and in layers after the first layer the first feature is the attention encoding feature of the source statement generated in the previous layer; performing an attention operation on the third key feature, the third value feature, and the third query feature to obtain a third operation result; fusing the third operation result with the first feature to obtain a third fusion result; and normalizing the third fusion result to obtain the attention encoding features of the source statement generated at each layer.

8. The method according to claim 1, characterized in that, The process of encoding the target statement to obtain its encoding features includes: The target statement is subjected to at least one layer of self-attention encoding to obtain the attention encoding features of the target statement generated at each layer; Based on the attention encoding features corresponding to the target statement, the encoding features of the target statement are obtained.

9. The method according to claim 8, characterized in that, The step of performing at least one layer of self-attention encoding on the target statement to obtain the attention encoding features of the target statement generated at each layer includes: In each of these layers, a fourth key feature, a fourth value feature, and a fourth query feature are calculated based on the second feature; Attention operations are performed on the fourth key feature, the fourth value feature, and the fourth query feature to obtain the fourth operation result; The fourth operation result is fused with the second feature to obtain the fourth fusion result; The fourth fusion result is normalized to obtain the attention encoding features of the target statement generated at each layer.

10. A word alignment method, characterized in that it comprises: randomly masking words in a source sentence sample and a target sentence sample to obtain a masked source sentence and a masked target sentence; performing cross-language masking training on a preset cross-alignment model using the masked source sentence and the masked target sentence to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; and selecting one layer in the cross-fusion module as an alignment layer, performing self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met, and forming a trained cross-alignment model based on the alignment layer and the layers before it in the pre-trained cross-alignment model, wherein the trained cross-alignment model is used in the word alignment method according to any one of claims 1 to 9.

11. The method according to claim 10, characterized in that performing self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until the predetermined training condition is met includes: preprocessing the source sentence sample and the target sentence sample using the pre-trained cross-alignment model, so as to extract word alignment labels based on the features output by the alignment layer during the preprocessing; reprocessing the source sentence sample and the target sentence sample using the pre-trained cross-alignment model as the current model, so as to extract a probability matrix between the source sentence sample and the target sentence sample based on the features output by the alignment layer during the reprocessing; and optimizing, using a predetermined loss function, the parameters of the alignment layer in the pre-trained cross-alignment model based on the word alignment labels and the probability matrix until the predetermined training condition is met.
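
The two passes of claim 11 can be illustrated with a small sketch that computes only the loss value. The claim leaves the "predetermined loss function" open; the mean negative log-probability used here is an assumption, as are the mutual-argmax rule for deriving labels, the random stand-in features, and the names `alignment_loss`, `src`, and `tgt`. The parameter update itself, which would normally be handled by an automatic-differentiation framework, is omitted.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def alignment_loss(labels, prob_matrix, eps=1e-9):
    # Assumed loss: mean negative log-probability of the labelled word pairs.
    aligned = labels.astype(bool)
    if not aligned.any():
        return 0.0
    return float(-np.log(prob_matrix[aligned] + eps).mean())

rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))   # stand-in alignment-layer features, 4 source words
tgt = rng.normal(size=(5, 8))   # stand-in alignment-layer features, 5 target words

# Preprocessing pass: derive word alignment labels from the alignment-layer
# features (assumed rule: a pair is labelled if it is the best match in both
# its row and its column).
similarity = src @ tgt.T
row_best = similarity == similarity.max(axis=1, keepdims=True)
col_best = similarity == similarity.max(axis=0, keepdims=True)
labels = (row_best & col_best).astype(int)

# Reprocessing pass: probability matrix from the current model's features.
prob_matrix = softmax(src @ tgt.T, axis=1)

# Loss that an optimizer would minimize with respect to the alignment layer.
loss = alignment_loss(labels, prob_matrix)
```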

12. A word alignment device, characterized in that it includes: a statement acquisition module, used to acquire a source statement and a target statement corresponding to the source statement; a source statement encoding module, used to encode the source statement to obtain source statement encoding features; a target statement encoding module, used to encode the target statement to obtain target statement encoding features; a cross-fusion module, used to perform cross-fusion processing on the source statement encoding features and the target statement encoding features to obtain cross-language encoding features corresponding to the source statement and the target statement; and an alignment processing module, used to perform alignment processing based on the cross-language encoding features corresponding to the source statement and the target statement to obtain a word alignment result between the source statement and the target statement, wherein the alignment processing includes: calculating the dot product of the cross-language encoding features corresponding to the source statement and the target statement to obtain a similarity matrix; normalizing the similarity matrix by rows and by columns to generate a first probability matrix from the source statement to the target statement and a second probability matrix from the target statement to the source statement; taking the intersection of the first probability matrix and the second probability matrix to obtain an alignment matrix; and obtaining the word alignment result between the source statement and the target statement based on the alignment matrix.
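
A minimal sketch of the alignment processing recited in claim 12 follows. Softmax as the row and column normalization, the reading of "intersection" as keeping a word pair only when both probabilities exceed a threshold, the threshold value of 0.3, and the names `extract_alignment`, `src_feat`, and `tgt_feat` are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_alignment(src_feat, tgt_feat, threshold=0.3):
    # Dot product of the cross-language encoding features: (m, n) similarity matrix.
    similarity = src_feat @ tgt_feat.T
    # Row normalization: first probability matrix, source -> target.
    p_src_to_tgt = softmax(similarity, axis=1)
    # Column normalization: second probability matrix, target -> source.
    p_tgt_to_src = softmax(similarity, axis=0)
    # Intersection (assumed): keep pairs that are confident in both directions.
    alignment = (p_src_to_tgt > threshold) & (p_tgt_to_src > threshold)
    pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(alignment))]
    return alignment, pairs

# Toy features: target word j is a copy of one source word, in shuffled order.
src_feat = 3.0 * np.eye(3)
tgt_feat = src_feat[[1, 0, 2]]
alignment, pairs = extract_alignment(src_feat, tgt_feat)
print(pairs)  # [(0, 1), (1, 0), (2, 2)]
```

Requiring agreement in both directions trades recall for precision: a pair that looks likely only from the source side, or only from the target side, is dropped.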

13. The device according to claim 12, characterized in that the cross-fusion module includes: at least one layer of cross-attention unit, used to perform at least one layer of cross-attention fusion processing on the source statement encoding features and the target statement encoding features to obtain the cross-attention fusion features corresponding to the source statement and the target statement generated at each layer; and a cross-language encoding feature extraction unit, used to obtain the cross-language encoding features corresponding to the source statement and the target statement based on the cross-attention fusion features corresponding to the source statement and the target statement.

14. The device according to claim 13, characterized in that the at least one layer of cross-attention unit is configured to: in each layer, calculate a first key feature, a first value feature, and a first query feature based on source features, wherein the source feature in the first layer is the source statement encoding feature, and the source feature in layers after the first layer is the cross-attention fusion feature corresponding to the source statement generated in the previous layer; calculate a second key feature, a second value feature, and a second query feature based on target features, wherein the target feature in the first layer is the target statement encoding feature, and the target feature in layers after the first layer is the cross-attention fusion feature corresponding to the target statement generated in the previous layer; perform attention operations based on the first query feature, the second key feature, and the second value feature to obtain a first operation result, and perform attention operations based on the second query feature, the first key feature, and the first value feature to obtain a second operation result; fuse the first operation result with the source features to generate the cross-attention fusion feature corresponding to the source statement in each layer; and fuse the second operation result with the target features to generate the cross-attention fusion feature corresponding to the target statement in each layer.
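
The bidirectional cross-attention of claim 14 might look roughly like the sketch below, in which source queries attend over target keys and values, and target queries attend over source keys and values. Scaled dot-product attention, residual addition as the "fusion", the two-layer stack, and all names (`cross_attention_layer`, `cross_fusion`, `make_params`) are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Assumed attention operation: scaled dot-product attention.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

def cross_attention_layer(src, tgt, p_src, p_tgt):
    # First key/value/query features from the source features.
    q1, k1, v1 = (src @ p_src[name] for name in ("q", "k", "v"))
    # Second key/value/query features from the target features.
    q2, k2, v2 = (tgt @ p_tgt[name] for name in ("q", "k", "v"))
    first_result = attention(q1, k2, v2)    # source attends to target
    second_result = attention(q2, k1, v1)   # target attends to source
    # Fusion (assumed: residual addition) with the layer's own input features.
    return src + first_result, tgt + second_result

def cross_fusion(src, tgt, layers):
    # Each layer consumes the fusion features produced by the previous layer.
    for p_src, p_tgt in layers:
        src, tgt = cross_attention_layer(src, tgt, p_src, p_tgt)
    return src, tgt

def make_params(rng, d):
    # Hypothetical random projection matrices for one side of one layer.
    return {name: rng.normal(size=(d, d)) * 0.1 for name in ("q", "k", "v")}

rng = np.random.default_rng(0)
d = 8
src = rng.normal(size=(4, d))   # source statement encoding features
tgt = rng.normal(size=(6, d))   # target statement encoding features
layers = [(make_params(rng, d), make_params(rng, d)) for _ in range(2)]
fused_src, fused_tgt = cross_fusion(src, tgt, layers)
```

Because each side's queries read the other side's keys and values, the output for every source word already mixes in target-side context, and vice versa.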

15. The device according to claim 14, characterized in that the at least one layer of cross-attention unit is used to: fuse the first operation result with the source feature to obtain a first fusion result; normalize the first fusion result to obtain a first normalized result; perform a fully connected operation on the first normalized result to obtain a first fully connected operation result; combine the first fully connected operation result with the first normalized result to obtain a first combined result; and normalize the first combined result to obtain the cross-attention fusion feature corresponding to the source statement generated at each layer.

16. The device according to claim 14, characterized in that the at least one layer of cross-attention unit is used to: fuse the second operation result with the target feature to obtain a second fusion result; normalize the second fusion result to obtain a second normalized result; perform a fully connected operation on the second normalized result to obtain a second fully connected operation result; combine the second fully connected operation result with the second normalized result to obtain a second combined result; and normalize the second combined result to obtain the cross-attention fusion feature corresponding to the target statement generated at each layer.
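
Claims 15 and 16 recite the same five-step sequence for the source side and the target side respectively: fuse, normalize, fully connected operation, combine, normalize. A hedged sketch of that sequence is given below. Residual addition for the "fuse" and "combine" steps, layer normalization, a two-layer ReLU network as the fully connected operation, the hidden width, and the name `fuse_and_refine` are assumptions for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Assumed normalization: per-token layer normalization without learned gain/bias.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def fuse_and_refine(operation_result, feature, w1, b1, w2, b2):
    # Step 1: fuse the attention operation result with the input feature.
    fusion_result = operation_result + feature
    # Step 2: normalize the fusion result.
    normalized = layer_norm(fusion_result)
    # Step 3: fully connected operation (assumed: two linear maps with ReLU between).
    hidden = np.maximum(0.0, normalized @ w1 + b1)
    fc_result = hidden @ w2 + b2
    # Step 4: combine the fully connected result with the normalized result.
    combined = fc_result + normalized
    # Step 5: normalize again to get the layer's cross-attention fusion feature.
    return layer_norm(combined)

rng = np.random.default_rng(0)
d, hidden_dim = 8, 16                        # hypothetical sizes
feature = rng.normal(size=(4, d))            # source or target feature
operation_result = rng.normal(size=(4, d))   # first or second operation result
w1 = rng.normal(size=(d, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, d)) * 0.1
b2 = np.zeros(d)
fusion_feature = fuse_and_refine(operation_result, feature, w1, b1, w2, b2)
```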

17. The device according to claim 12, characterized in that the source statement encoding module includes: at least one layer of cascaded self-attention units, used to perform at least one layer of self-attention encoding on the source statement to obtain the attention encoding features of the source statement generated at each layer; and a source statement encoding feature extraction unit, used to obtain the source statement encoding features based on the attention encoding features corresponding to the source statement.

18. The device according to claim 17, characterized in that the at least one layer of cascaded self-attention units is used to: in each layer, calculate a third key feature, a third value feature, and a third query feature based on a first feature, wherein the first feature in the first layer is the initial sentence vector of the source statement, and the first feature in layers after the first layer is the attention encoding feature of the source statement generated in the previous layer; perform attention operations on the third key feature, the third value feature, and the third query feature to obtain a third operation result; fuse the third operation result with the first feature to obtain a third fusion result; and normalize the third fusion result to obtain the attention encoding feature of the source statement generated in each layer.

19. The device according to claim 12, characterized in that the target statement encoding module includes: at least one layer of cascaded self-attention units, used to perform at least one layer of self-attention encoding on the target statement to obtain the attention encoding features of the target statement generated at each layer; and a target statement encoding feature extraction unit, used to obtain the target statement encoding features based on the attention encoding features corresponding to the target statement.

20. The device according to claim 19, characterized in that the at least one layer of cascaded self-attention units is used to: in each layer, calculate a fourth key feature, a fourth value feature, and a fourth query feature based on a second feature; perform attention operations on the fourth key feature, the fourth value feature, and the fourth query feature to obtain a fourth operation result; fuse the fourth operation result with the second feature to obtain a fourth fusion result; and normalize the fourth fusion result to obtain the attention encoding features of the target statement generated at each layer.

21. A word alignment device, characterized in that it includes: a masking module, used to randomly mask the words in a source sentence sample and a target sentence sample to obtain a masked source sentence and a masked target sentence; a cross-language masking training module, used to perform cross-language masking training on a preset cross-alignment model using the masked source sentence and the masked target sentence to obtain a pre-trained cross-alignment model, wherein the pre-trained cross-alignment model includes a cross-fusion module for cross-fusion processing; and a self-supervised alignment training module, used to select one layer in the cross-fusion module as an alignment layer, perform self-supervised alignment training on the alignment layer in the pre-trained cross-alignment model until a predetermined training condition is met, and form a post-trained cross-alignment model based on the alignment layer and the layers preceding it in the pre-trained cross-alignment model; wherein the post-trained cross-alignment model is used in the word alignment device according to any one of claims 12 to 20.

22. The device according to claim 21, characterized in that the self-supervised alignment training module is used to: preprocess the source sentence sample and the target sentence sample using the pre-trained cross-alignment model, so as to extract word alignment labels based on the features output by the alignment layer during the preprocessing; reprocess the source sentence sample and the target sentence sample using the pre-trained cross-alignment model as the current model, so as to extract a probability matrix between the source sentence sample and the target sentence sample based on the features output by the alignment layer during the reprocessing; and optimize, using a predetermined loss function, the parameters of the alignment layer in the pre-trained cross-alignment model according to the word alignment labels and the probability matrix until the predetermined training condition is met.

23. A computer-readable storage medium, characterized in that it stores a computer program that, when executed by a processor of a computer, causes the computer to perform the method according to any one of claims 1 to 11.

24. An electronic device, characterized in that it includes: a memory, which stores a computer program; and a processor, which reads the computer program stored in the memory to perform the method according to any one of claims 1 to 11.

25. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 11.