A neural machine translation method and system for unknown structures

By constructing a parallel bilingual dictionary to identify and replace unknown structures, the problem of insufficient unknown structure processing capability of neural machine translation models in low-resource scenarios is solved, thereby improving translation quality and robustness.

CN122197912A · Pending · Publication Date: 2026-06-12 · SUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SUZHOU UNIV
Filing Date
2026-02-28
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing data augmentation methods cannot actively identify and supplement unknown structures in the translation model during training. The quality of pseudo data is unstable and contains a lot of noise, which leads to a decline in the translation quality of neural machine translation models in low-resource scenarios.

Method used

A parallel bilingual dictionary is constructed and used to identify unknown structures in source sentences and replace each with a spliced form of the unknown structure and its target language structure. A set of candidate sentence pairs is generated and used to adjust the parameters of the neural translation model. The high-quality parallel bilingual dictionary provides interpretable bilingual structure mapping information, thereby enhancing the model's ability to understand unknown structures.

Benefits of technology

It significantly improves the translation quality and robustness of neural machine translation models in low-resource scenarios, reduces noise, enhances the model's ability to translate unknown structures, and avoids mislearning and training instability.



Abstract

The application relates to the technical field of natural language processing, in particular to a neural machine translation method and system for unknown structures. A word alignment operation is performed on a preprocessed training set to obtain a word alignment result; a parallel bilingual dictionary is constructed based on the word alignment result; candidate sentence pairs are generated based on the parallel bilingual dictionary; the parameters of a trained neural translation model are adjusted using the candidate sentence pair set to obtain a target neural translation model; and it is determined whether the source sentence to be translated contains a language structure that does not appear in the parallel bilingual dictionary. If such a structure exists, one or more unknown structures in the source sentence to be translated are each replaced by a spliced form of the unknown structure and its corresponding target language structure in a bilingual parallel corpus, an enhanced source sentence is generated, and a target sentence is output. The application effectively improves the translation quality of neural machine translation.

Description

Technical Field

[0001] This invention relates to the field of natural language processing technology, and in particular to a neural machine translation method and system for unknown structures.

Background Technology

[0002] Neural machine translation has become the mainstream technology in machine translation. Its performance relies heavily on large-scale, high-quality bilingual parallel corpora, and it has surpassed traditional statistical machine translation in resource-rich language pairs such as Chinese-English and German-English. However, large-scale parallel data is lacking for thousands of languages worldwide, while deep learning models require massive amounts of data to effectively learn the mapping patterns between languages. Therefore, standard neural machine translation models show a significant decline in performance in these low-resource language scenarios, even falling short of traditional models. Low-resource neural machine translation is a research direction that emerged to address the core challenge of scarce training data. Since neural machine translation became a mainstream technology, this field has developed rapidly, with various innovative methods emerging to address the data bottleneck.

[0003] Specifically, multilingual neural machine translation achieves knowledge sharing and transfer by jointly training a single model on multiple language pairs. The basic principle of this approach is to force the model to create a shared semantic representation space across different languages, thereby enabling positive cross-lingual parameter transfer. For low-resource language pairs, joint training with high-resource language pairs can effectively improve translation performance. Furthermore, multilingual models have enabled zero-shot translation, allowing translation between two languages that were not paired during training, even without direct parallel data. However, multilingual neural machine translation models still rely on large-scale, high-quality bilingual parallel corpora. Data augmentation techniques are therefore introduced in low-resource language scenarios as an important supplementary solution to alleviate data scarcity.

[0004] Data augmentation is a direct and effective means of addressing the problem of data scarcity, with back-translation being the most representative technique. This involves using monolingual data from the target language side to generate source sentences through a back-translation model, thus obtaining pseudo-parallel data. This method can significantly expand the amount of training data and effectively improve the performance of neural machine translation models. To address the issues of low initial back-translation model quality and poor pseudo-data generation, researchers have further proposed iterative back-translation techniques. Through multiple iterations, the forward and back-translation models are continuously optimized, gradually improving the quality of pseudo-parallel data.

[0005] Existing data augmentation methods can expand the scale of parallel data without increasing manual costs, making them widely used in low-resource scenarios. However, they also have significant limitations. Since back-translation models are often trained on scarce data, their inherent performance is limited, leading to unstable quality and high noise levels in the generated pseudo-data. Furthermore, these methods only expand the data quantity; they cannot actively identify and supplement the specific syntactic and phrase structures that the translation model struggles to learn or has never encountered during training, i.e., unknown structures. Because the augmentation process is decoupled from the model's inherent limitations, structures that neural machine translation models are not adept at handling remain scarce in the augmented data, preventing targeted optimization. Moreover, existing data augmentation methods commonly use pseudo-data obtained through random insertion, random replacement, or back-translation; these pseudo-sentence pairs often suffer from semantic biases, word order errors, and even inconsistencies with target language expression habits. This results in a high noise ratio in the augmented training set, causing neural machine translation models to mislearn knowledge during training, leading to instability and even performance degradation. In low-resource scenarios, the amount of raw data is already limited, and the negative impact of noise in pseudo-data is further amplified, ultimately leading to a high error rate and weak generalization ability in neural machine translation models.

Summary of the Invention

[0006] Therefore, the technical problem to be solved by the present invention is to overcome two shortcomings of existing data augmentation methods: they cannot actively identify and supplement structures unknown to the translation model during training, and the pseudo data they generate is of unstable quality and high in noise. Together these shortcomings lead to a high noise ratio and semantic bias in the augmented training set, which reduces the translation quality of the neural machine translation model.

[0007] To address the aforementioned technical problems, this invention provides a neural machine translation method for unknown structures, comprising:

Training a neural translation model using a preprocessed training set to obtain a trained neural translation model; the training set and validation set include multiple source sentences and their corresponding target sentences;

Performing word alignment on the preprocessed training set after word segmentation to obtain word alignment results, and constructing a parallel bilingual dictionary based on the word alignment results;

Taking the language structures in each source sentence that exist in the parallel bilingual dictionary as candidate language structures for that source sentence; the language structures include phrases and words;

For each candidate language structure, replacing it with a concatenation of the candidate language structure and its corresponding target language structure in the parallel bilingual dictionary to generate a corresponding candidate source sentence, and combining each candidate source sentence with its corresponding target sentence to obtain a candidate sentence pair set;

Adjusting the parameters of the trained neural translation model using the candidate sentence pair set to obtain a target neural translation model;

Extracting source language structures whose lengths fall within a preset range from the source sentence to be translated to obtain a set of candidate language structures;

Determining whether the source sentence to be translated contains any language structure that does not appear in the parallel bilingual dictionary; if not, inputting the source sentence to be translated into the target neural translation model and outputting its target sentence;

If so, identifying each candidate language structure that does not appear in the parallel bilingual dictionary as an unknown structure; selecting one or more unknown structures in the source sentence to be translated and selecting their corresponding target language structures from the bilingual parallel corpus; replacing each selected unknown structure with a concatenation of the unknown structure and its corresponding target language structure to generate an enhanced source sentence; and inputting the enhanced source sentence into the target neural translation model to output the target sentence of the source sentence to be translated.

[0008] Preferably, constructing the parallel bilingual dictionary based on the word alignment results includes: using the Moses tool to quantify and score the word alignment results, and selecting target word alignment results based on those scores; extracting source language structures of a preset length from the source sentences; obtaining the target language structure corresponding to each source language structure based on the target word alignment results; forming bilingual pairs from the source language structures and their corresponding target language structures; and filtering all bilingual pairs to construct the parallel bilingual dictionary.

[0009] Preferably, the method for screening all bilingual pairs and constructing a parallel bilingual dictionary includes: calculating the first preset evaluation index value for all bilingual pairs, filtering the bilingual pairs by comparing their values with the set threshold corresponding to the first preset evaluation index, and constructing the parallel bilingual dictionary from the filtered bilingual pairs.

[0010] Preferably, the first preset evaluation index is any one of, or a combination of, the following: direct translation probability, inverse translation probability, frequency of occurrence of the bilingual pair in the training set, and total length of the bilingual pair. When a bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability, and the inverse translation probability is the inverse phrase translation probability. When a bilingual pair is a bilingual word pair, the direct translation probability is the direct lexical weighting, and the inverse translation probability is the inverse lexical weighting.

[0011] Preferably, after obtaining the candidate sentence pair set, the method for filtering the candidate sentence pair set to obtain the target candidate sentence pair set includes: For each candidate sentence pair, the semantic similarity of the embedding vector between the candidate language structure and the corresponding target language structure in the parallel bilingual dictionary is calculated, and the first preset evaluation index value of the bilingual pair formed by the candidate language structure and the corresponding target language structure is obtained. Based on the above embedding vector semantic similarity and the first preset evaluation index value, each candidate sentence pair is screened to obtain the target candidate sentence pair set.

[0012] Preferably, the method of selecting one or more unknown structures in the source sentence to be translated and selecting the corresponding target language structure in the bilingual parallel corpus includes: searching the bilingual parallel corpus for the target language structures corresponding to the unknown structure, and forming unknown bilingual pairs by combining the unknown structure with each of its corresponding target language structures; calculating the target score of each unknown bilingual pair based on the semantic similarity of the embedding vectors of the pair and the second preset evaluation index value of the pair; and selecting the target language structure in the unknown bilingual pair with the highest target score as the target language structure in the bilingual parallel corpus for that unknown structure. The second preset evaluation index is any one of, or a combination of, the following: direct translation probability, inverse translation probability, frequency of occurrence of the unknown bilingual pair in the bilingual parallel corpus, and total length of the unknown bilingual pair. When the unknown bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability, and the inverse translation probability is the inverse phrase translation probability; when the unknown bilingual pair is a bilingual word pair, the direct translation probability is the direct lexical weighting, and the inverse translation probability is the inverse lexical weighting.

[0013] Preferably, when only one unknown structure in the source sentence to be translated is selected, that unknown structure is obtained through a structure selection network.

[0014] Preferably, the neural translation model is the mBART model.

[0015] Preferably, the structure selection network is trained based on the target neural translation model, and the training process includes: obtaining the parameters of the target neural translation model and using them as the initial parameters of the structure selection network; freezing the encoder layer parameters of the target neural translation model; and training the structure selection network, with the objective of minimizing the error between the unknown structure output by the structure selection network and the true unknown structure, to obtain a trained structure selection network.

[0016] The present invention also provides a neural machine translation system for unknown structures, comprising:

An initial training module, used to train the neural translation model using the preprocessed training set to obtain a trained neural translation model; the training set and validation set include multiple source sentences and their corresponding target sentences;

A dictionary construction module, used to perform word alignment on the preprocessed training set after word segmentation to obtain word alignment results, and to construct a parallel bilingual dictionary based on the word alignment results;

A candidate language structure generation module, used to select the language structures in each source sentence that exist in the parallel bilingual dictionary as candidate language structures for that source sentence; the language structures include phrases and words;

A candidate sentence pair generation module, used to replace each candidate language structure with a concatenation of the candidate language structure and its corresponding target language structure in the parallel bilingual dictionary to generate a corresponding candidate source sentence, and to combine each candidate source sentence with its corresponding target sentence to obtain a candidate sentence pair set;

A parameter adjustment module, used to adjust the parameters of the trained neural translation model using the candidate sentence pair set to obtain the target neural translation model;

A translation module, used to determine whether the source sentence to be translated contains language structures that do not appear in the parallel bilingual dictionary. If not, the source sentence to be translated is input into the target neural translation model, and its target sentence is output. If so, each candidate language structure that does not appear in the parallel bilingual dictionary is identified as an unknown structure; one or more unknown structures are selected from the source sentence to be translated, and their corresponding target language structures are selected from the bilingual parallel corpus; each selected unknown structure is replaced with a concatenation of the unknown structure and its corresponding target language structure to generate an enhanced source sentence; and the enhanced source sentence is input into the target neural translation model, which outputs the target sentence of the source sentence to be translated.

[0017] Compared with the prior art, the above-described technical solution of the present invention has the following advantages: This invention discloses a neural machine translation method and system for unknown structures. The method generates candidate source sentences by replacing each source-side candidate language structure with a concatenation of that structure and its corresponding target-side language structure from the parallel bilingual dictionary. The resulting candidate sentence pair set is then used to fine-tune the parameters of a trained neural translation model. Based on a high-quality parallel bilingual dictionary, the method provides interpretable bilingual structure mapping information to the model. The concatenated structure allows the target neural translation model to directly learn stable source-target language structure correspondences during training. By generating variants of candidate language structures within sentences, the model is exposed to multiple mixed bilingual syntactic expressions within the same semantic framework, which shifts it from word-to-word mapping toward structure-level understanding and strengthens its ability to model both overall sentence structure and local phrase constraints. Meanwhile, this process does not require collecting large-scale corpora or relying on complex pre-trained models. It can generate high-quality candidate sentences using only a bilingual dictionary built from limited parallel data. The generated sentence pairs have high semantic consistency and very low noise, so using them for fine-tuning does not cause mislearning or training instability. While preserving the original translation capabilities of the model, the method compensates for what the target neural translation model has failed to learn about common bilingual structures, and significantly improves the efficiency and robustness of model training in low-resource scenarios.

[0018] Before translation, this invention compares the language structures of the sentence to be translated with the parallel bilingual dictionary, explicitly defining any unrecorded language structures as unknown structures. It then retrieves the corresponding target-side language structure from the bilingual parallel corpus and constructs an enhanced source sentence by concatenating the unknown structure with the corresponding target-side structure. This enhanced source sentence is then input into the target model to complete the translation. On one hand, this does not change the overall semantics and structure of the sentence; it only provides explicit hints for local structures not seen by the model, thus avoiding the introduction of redundant noise. On the other hand, by injecting bilingual mapping information for unknown structures in real time during the inference stage, the model can use local structured hints to deconstruct and reconstruct complex sentences, moving from fuzzy translation that relies on global statistical information to translation based on structural understanding. This improves the translation quality of the target neural translation model on long sentences, complex phrase combinations, and cross-structural dependencies.

Attached Figure Description

[0019] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings, wherein:

[0020] Figure 1 is a flowchart of a neural machine translation method for unknown structures according to the present invention.

[0021] Figure 2 is a partial Chinese-English bilingual dictionary.

[0022] Figure 3 is the definition of an unknown structure.

[0023] Figure 4 is a flowchart of the unknown structure injection process.

[0024] Figure 5 is an example of injecting an unknown structure.

[0025] Figure 6 is a comparison of BLEU and COMET scores for different translation directions in simulated low-resource scenarios.

[0026] Figure 7 is a comparison of BLEU and COMET scores for different translation directions in real-world low-resource scenarios.

Detailed Implementation

[0027] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.

[0028] As shown in Figure 1, this embodiment provides a neural machine translation method and system for unknown structures. In recent years, the rise of large-scale pre-trained language models has brought a new paradigm to low-resource neural machine translation. Models such as Qianwen, Zhipu Qingyan, and Shusheng have learned rich linguistic knowledge through pre-training on massive amounts of multilingual monolingual data. These pre-trained models can serve as encoders or decoders for neural translation models, and can then be fine-tuned for specific low-resource translation tasks, thereby significantly improving translation quality. This pre-training and fine-tuning paradigm has become one of the mainstream methods for solving low-resource problems. The method of this embodiment includes the following steps:

[0029] Step S1: Train the neural translation model using the preprocessed training set to obtain a trained neural translation model; the training set and validation set include multiple source sentences and their corresponding target sentences. To obtain the preprocessed training set, this invention collects and organizes bilingual parallel corpora and cleans the original corpora to remove noisy samples.

[0030] Step S2: After performing word segmentation on the preprocessed training set, perform word alignment to obtain the word alignment results; based on the word alignment results, construct a parallel bilingual dictionary, as shown in Figure 2, which is a partial Chinese-English bilingual dictionary.

[0031] In this embodiment, the source sentences and their corresponding target sentences in the preprocessed training set are segmented into words, and word alignment is performed to obtain word alignment results. The Moses tool is used to quantify and score the word alignment results, and the target word alignment results are selected based on those scores. Source language structures of a preset length are extracted from each source sentence, and the target language structure corresponding to each source language structure is obtained based on the target word alignment results. Each source language structure and its corresponding target language structure form a bilingual pair; the language structures include phrases and words.

[0032] All bilingual pairs are filtered to construct the parallel bilingual dictionary.

[0033] In this embodiment, preferably, the method for screening all bilingual pairs and constructing a parallel bilingual dictionary includes: calculating the first preset evaluation index value for all bilingual pairs, filtering the bilingual pairs by comparing their values with the set threshold corresponding to the first preset evaluation index, and constructing the parallel bilingual dictionary from the filtered bilingual pairs.
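The threshold-based screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BilingualPair` type, the `build_dictionary` function, every threshold value, the direction of each comparison (probabilities and frequency at or above a minimum, total length at or below a maximum), and the rule of keeping the highest-probability target per source structure are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualPair:
    """Hypothetical container for one extracted bilingual pair."""
    source: tuple        # source language structure as a tuple of tokens
    target: tuple        # target language structure as a tuple of tokens
    direct_prob: float   # direct translation probability
    inverse_prob: float  # inverse translation probability
    frequency: int       # occurrences of the pair in the training set


def build_dictionary(pairs, min_direct=0.3, min_inverse=0.3,
                     min_freq=2, max_total_len=8):
    """Keep pairs that pass every threshold (all values are illustrative)."""
    dictionary = {}
    for pair in pairs:
        total_len = len(pair.source) + len(pair.target)
        passes = (pair.direct_prob >= min_direct
                  and pair.inverse_prob >= min_inverse
                  and pair.frequency >= min_freq
                  and total_len <= max_total_len)
        if not passes:
            continue
        # Assumption: when a source structure has several surviving targets,
        # keep the one with the highest direct translation probability.
        best = dictionary.get(pair.source)
        if best is None or pair.direct_prob > best.direct_prob:
            dictionary[pair.source] = pair
    return dictionary


pairs = [
    BilingualPair(("机器", "翻译"), ("machine", "translation"), 0.9, 0.8, 5),
    BilingualPair(("机器",), ("engine",), 0.1, 0.2, 1),
]
dictionary = build_dictionary(pairs)
```

In this toy run only the phrase pair survives; the word pair is dropped because its probabilities and frequency fall below the illustrative thresholds.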

[0034] The first preset evaluation index is any one or more combinations of the following: direct translation probability, inverse translation probability, frequency of occurrence of bilingual pairs in the training set, and total length of bilingual pairs; When a bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability, and the inverse translation probability is the inverse phrase translation probability. When a bilingual pair is a bilingual lexical pair, the probability of direct translation is calculated using direct lexical weighting, and the probability of inverse translation is calculated using inverse lexical weighting.

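[0035] The direct phrase translation probability of a bilingual phrase pair is calculated as:

$$P_{\text{dir}}(\bar{t} \mid \bar{s}) = \frac{\operatorname{count}(\bar{s}, \bar{t})}{\sum_{i} \operatorname{count}(\bar{s}, \bar{t}_i)}$$

and the inverse phrase translation probability is calculated as:

$$P_{\text{inv}}(\bar{s} \mid \bar{t}) = \frac{\operatorname{count}(\bar{s}, \bar{t})}{\sum_{j} \operatorname{count}(\bar{s}_j, \bar{t})}$$

In these formulas, $\bar{s}$ is the source phrase of the bilingual phrase pair and $\bar{t}$ is its target phrase; $\operatorname{count}(\bar{s}, \bar{t})$ is the number of times the bilingual phrase pair co-occurs in the training set; $\sum_{i} \operatorname{count}(\bar{s}, \bar{t}_i)$ is the number of times the source phrase $\bar{s}$ co-occurs with all of its corresponding target phrases; $\sum_{j} \operatorname{count}(\bar{s}_j, \bar{t})$ is the number of times the target phrase $\bar{t}$ co-occurs with all of its corresponding source phrases; $i$ is the index over the target phrases corresponding to $\bar{s}$; and $j$ is the index over the source phrases corresponding to $\bar{t}$.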
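As a worked illustration of the count-based definitions in this paragraph, the following sketch computes both probabilities from a table of co-occurrence counts. The function name `translation_probabilities` and the toy counts are hypothetical; the only thing taken from the text is that each probability is the pair's co-occurrence count divided by the total co-occurrence count of the source phrase (direct) or of the target phrase (inverse).

```python
from collections import Counter


def translation_probabilities(pair_counts):
    """Return (direct, inverse) probability tables from co-occurrence counts.

    pair_counts maps (source_phrase, target_phrase) -> co-occurrence count.
    """
    source_totals = Counter()  # count of each source phrase over all its targets
    target_totals = Counter()  # count of each target phrase over all its sources
    for (source, target), count in pair_counts.items():
        source_totals[source] += count
        target_totals[target] += count

    direct = {pair: count / source_totals[pair[0]]
              for pair, count in pair_counts.items()}
    inverse = {pair: count / target_totals[pair[1]]
               for pair, count in pair_counts.items()}
    return direct, inverse


# Toy counts, invented purely for the example.
counts = Counter({("猫", "cat"): 3, ("猫", "kitty"): 1, ("小猫", "kitty"): 1})
direct, inverse = translation_probabilities(counts)
```

Here "猫" co-occurs four times in total, three of them with "cat", so the direct probability of ("猫", "cat") is 3/4; "kitty" co-occurs twice in total, once with "猫", so the inverse probability of ("猫", "kitty") is 1/2.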

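[0036] The direct lexical weighting of a bilingual word pair is calculated as:

$$\operatorname{lex}_{\text{dir}}(w_t \mid w_s) = \frac{\operatorname{count}(w_s, w_t)}{\sum_{i} \operatorname{count}(w_s, w_{t,i})}$$

and the inverse lexical weighting is calculated as:

$$\operatorname{lex}_{\text{inv}}(w_s \mid w_t) = \frac{\operatorname{count}(w_s, w_t)}{\sum_{j} \operatorname{count}(w_{s,j}, w_t)}$$

In these formulas, $w_s$ is the source word of the bilingual word pair and $w_t$ is its target word; $\operatorname{count}(w_s, w_t)$ is the number of times the bilingual word pair co-occurs in the training set; $\sum_{i} \operatorname{count}(w_s, w_{t,i})$ is the number of times the source word $w_s$ co-occurs with all of its corresponding target words; $\sum_{j} \operatorname{count}(w_{s,j}, w_t)$ is the number of times the target word $w_t$ co-occurs with all of its corresponding source words; $i$ is the index over the target words corresponding to $w_s$; and $j$ is the index over the source words corresponding to $w_t$.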

[0037] The total length of a bilingual pair is $L(x, y) = |x| + |y|$, where $|x|$ is the length of the source language structure in the bilingual pair, $|y|$ is the length of the target language structure in the bilingual pair, $x$ is the source language structure of the bilingual pair, and $y$ is the target language structure of the bilingual pair.

[0038] Step S3: Select the language structures in the source sentence that exist in the parallel bilingual dictionary as candidate language structures for the source sentence.

Step S4: For each candidate language structure, replace it with the concatenation of the candidate language structure and the corresponding target language structure in the parallel bilingual dictionary to generate the corresponding candidate source sentence; combine each candidate source sentence with its corresponding target sentence to obtain a candidate sentence pair set.

In this embodiment, preferably, after the candidate sentence pair set is obtained, it is filtered to obtain the target candidate sentence pair set. For each candidate sentence pair, the semantic similarity of the embedding vectors of the candidate language structure and the corresponding target language structure in the parallel bilingual dictionary is calculated, and the first preset evaluation index value of the bilingual pair formed by the candidate language structure and the corresponding target language structure is obtained. Based on this embedding vector semantic similarity and the first preset evaluation index value, each candidate sentence pair is screened to obtain the target candidate sentence pair set.
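Steps S3 and S4 can be sketched as below. This is an illustrative reading rather than the patented implementation: the function names, the `max_len` limit on structure length, the representation of the dictionary as a mapping from token tuples to token tuples, and the choice to place the target language structure immediately after the source structure with no separator token are all assumptions.

```python
def candidate_structures(tokens, dictionary, max_len=4):
    """Step S3: spans (start, end) of the sentence that are dictionary entries."""
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            if tuple(tokens[start:end]) in dictionary:
                spans.append((start, end))
    return spans


def generate_candidate_pairs(source_tokens, target_sentence, dictionary, max_len=4):
    """Step S4: one candidate sentence pair per candidate language structure."""
    pairs = []
    for start, end in candidate_structures(source_tokens, dictionary, max_len):
        structure = tuple(source_tokens[start:end])
        # Concatenation: the source structure followed by its target structure.
        spliced = list(structure) + list(dictionary[structure])
        candidate_source = source_tokens[:start] + spliced + source_tokens[end:]
        pairs.append((candidate_source, target_sentence))
    return pairs


toy_dictionary = {
    ("机器", "翻译"): ("machine", "translation"),
    ("喜欢",): ("like",),
}
source = ["我", "喜欢", "机器", "翻译"]
candidate_pairs = generate_candidate_pairs(source, "I like machine translation",
                                           toy_dictionary)
```

Each candidate source sentence differs from the original in exactly one structure, so a single training sentence with several dictionary matches yields several mixed-language variants that share the same target sentence.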

[0039] In this embodiment, optionally, the method for filtering each candidate sentence pair includes: selecting the direct translation probability, the inverse translation probability, the frequency of occurrence of the bilingual pair in the training set, and the total length of the bilingual pair as the first preset evaluation index; and setting the weights of these four quantities to $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$, respectively. Based on the first preset evaluation index value of the bilingual pair formed by the candidate language structure in each candidate sentence pair and the corresponding target language structure in the parallel bilingual dictionary, the matching degree score of the bilingual pair corresponding to each candidate sentence pair is calculated as:

$$M = \lambda_1 P_{\text{dir}} + \lambda_2 P_{\text{inv}} + \lambda_3 F + \lambda_4 L$$

where $P_{\text{dir}}$ is the direct translation probability of the bilingual pair corresponding to the candidate sentence pair, $P_{\text{inv}}$ is the inverse translation probability of that bilingual pair, $F$ is the frequency of occurrence of that bilingual pair in the training set, and $L$ is the total length of that bilingual pair.

[0040] The weighted sum of the semantic similarity and matching scores of the embedding vectors of each candidate sentence pair and the corresponding bilingual pair is used as the screening score for each candidate sentence pair. In this embodiment, optionally, the semantic similarity and matching scores of the embedding vectors each account for half of the screening score for each candidate sentence pair.

[0041] The semantic similarity of the embedding vectors of the bilingual pair corresponding to each candidate sentence pair is calculated as:

$$S = \cos(\mathbf{v}_x, \mathbf{v}_y) = \frac{\mathbf{v}_x \cdot \mathbf{v}_y}{\lVert \mathbf{v}_x \rVert \, \lVert \mathbf{v}_y \rVert}$$

where $S$ is the semantic similarity of the embedding vectors of the bilingual pair, $\cos$ is the cosine similarity, and $\mathbf{v}_x$ and $\mathbf{v}_y$ are the embedding vectors of the source and target language structures in the bilingual pair, respectively. The embedding vectors are obtained by inputting the language structures into a neural translation model (such as mBART) and represent their semantic information.
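The screening score built from these two quantities can be sketched as follows. The cosine similarity is the standard definition, and the equal half-and-half weighting follows the optional setting stated in paragraph [0040]; the function names and the plain Python lists standing in for model embeddings are assumptions made for the example, and no real embedding model is called.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def screening_score(source_vec, target_vec, matching_score, sim_weight=0.5):
    """Weighted sum of embedding similarity and matching degree score."""
    similarity = cosine_similarity(source_vec, target_vec)
    return sim_weight * similarity + (1.0 - sim_weight) * matching_score


# Toy vectors standing in for embeddings of a source and a target structure.
score = screening_score([1.0, 0.0], [1.0, 0.0], matching_score=0.5)
```

With identical toy vectors the similarity is 1, so the score is 0.5 × 1 + 0.5 × 0.5 = 0.75.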

[0042] In this embodiment, in addition to cosine similarity, Euclidean distance, Manhattan distance, Pearson correlation coefficient, etc. can also be selected to calculate the semantic similarity of the embedded vectors.

[0043] Based on the screening scores of each candidate sentence pair, the candidate sentence pairs are screened to obtain the target candidate sentence pair set.

[0044] In this embodiment, optionally, the method for filtering candidate sentence pairs based on their screening scores includes: sorting all candidate sentence pairs in descending order of screening score and selecting the top-ranked candidate sentence pairs, up to a set rank threshold, as target candidate sentence pairs.

[0045] In this embodiment, optionally, the method for filtering candidate sentence pairs based on their screening scores includes: setting a filtering threshold and determining, for each candidate sentence pair, whether its screening score is greater than the filtering threshold. If it is, the pair is selected as a target candidate sentence pair; otherwise, it is discarded.
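The two optional filtering strategies, rank-based and threshold-based, can be sketched as follows. The tuple layout `ScoredPair` and the function names are hypothetical; the application does not prescribe a data structure.

```python
from typing import List, Tuple

# (candidate source sentence, target sentence, screening score)
ScoredPair = Tuple[str, str, float]


def select_top_k(pairs: List[ScoredPair], k: int) -> List[ScoredPair]:
    """Sort by screening score in descending order and keep the first k pairs."""
    ranked = sorted(pairs, key=lambda pair: pair[2], reverse=True)
    return ranked[:k]


def select_above_threshold(pairs: List[ScoredPair],
                           threshold: float) -> List[ScoredPair]:
    """Keep only pairs whose screening score is strictly greater than the threshold."""
    return [pair for pair in pairs if pair[2] > threshold]


# Illustrative data only.
candidates = [("src a", "tgt a", 0.91), ("src b", "tgt b", 0.42), ("src c", "tgt c", 0.77)]
top_two = select_top_k(candidates, k=2)
kept = select_above_threshold(candidates, threshold=0.5)
```

Rank-based selection fixes the size of the resulting set, while threshold-based selection fixes its minimum quality; which one fits better depends on how much augmented data is wanted.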

[0046] Step S5: Use the candidate sentence pair set to adjust the parameters of the trained neural translation model to obtain the target neural translation model;
Step S6: Extract source language structures whose lengths fall within the preset range from the source sentence to be translated to obtain a set of candidate language structures;
Step S7: Determine whether the source sentence to be translated contains any language structure that does not appear in the parallel bilingual dictionary. If not, input the source sentence to be translated into the target neural translation model and output the target sentence of the source sentence to be translated;
Step S8: If so, identify each candidate language structure that does not appear in the parallel bilingual dictionary as an unknown structure; select one or more unknown structures in the source sentence to be translated, select the corresponding target language structure in the bilingual parallel corpus, replace the unknown structure with a concatenation of the unknown structure and the corresponding target language structure, and generate an enhanced source sentence of the source sentence to be translated; input the enhanced source sentence into the target neural translation model, and output the target sentence of the source sentence to be translated.
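Steps S6 and S7 can be illustrated with the following sketch, which enumerates contiguous token spans within a length range and flags those absent from the dictionary. It assumes the sentence is already tokenized and that dictionary source entries are stored as space-joined token strings; the function names and the default length range are hypothetical.

```python
from typing import List, Set


def extract_structures(tokens: List[str],
                       min_len: int = 1,
                       max_len: int = 3) -> List[str]:
    """Step S6 (sketch): contiguous spans whose token length lies in [min_len, max_len]."""
    spans = []
    for n in range(min_len, max_len + 1):
        for start in range(len(tokens) - n + 1):
            spans.append(" ".join(tokens[start:start + n]))
    return spans


def find_unknown_structures(tokens: List[str],
                            dictionary_sources: Set[str],
                            min_len: int = 1,
                            max_len: int = 3) -> List[str]:
    """Step S7 (sketch): candidate structures that do not appear in the dictionary."""
    return [span for span in extract_structures(tokens, min_len, max_len)
            if span not in dictionary_sources]


# Illustrative data only.
unknown = find_unknown_structures(["a", "b", "c"], {"a", "b", "a b"}, max_len=2)
```

Plain span enumeration flags every unseen n-gram, so a practical system would likely add the length and grammatical constraints described elsewhere in the application before treating a span as an unknown structure.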

[0047] As shown in Figure 3, this invention defines phrases and words not found in the parallel bilingual dictionary as unknown structures.

[0048] This invention utilizes the fusion of a candidate language structure set with the original training set to train a neural machine translation model, enabling the model to effectively learn and adapt to injected unknown structural forms. During training, supervised learning guides the model to update parameters for newly added structures, significantly improving its modeling and generalization capabilities for unknown syntactic structures and phrase combinations while maintaining its original translation abilities.

[0049] In this embodiment, optionally, the method of selecting one or more unknown structures in the source sentence to be translated and selecting the corresponding target language structure in the bilingual parallel corpus includes: For the selected unknown structure, select any corresponding target language structure in the bilingual parallel corpus as the target language structure selected for the unknown structure in the bilingual parallel corpus.

[0050] In this embodiment, preferably, the method of selecting one or more unknown structures in the source sentence to be translated and selecting the corresponding target language structure in the bilingual parallel corpus includes: searching the bilingual parallel corpus for the target language structures corresponding to the unknown structure, and forming unknown bilingual pairs by combining the unknown structure with each of its corresponding target language structures. The target score of each unknown bilingual pair is calculated from the semantic similarity of the embedding vectors of the pair and the second preset evaluation index value of the pair. The second preset evaluation index is any one or a combination of the following: direct translation probability, inverse translation probability, frequency of occurrence of the unknown bilingual pair in the bilingual parallel corpus, and total length of the unknown bilingual pair. When the unknown bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability and the inverse translation probability is the inverse phrase translation probability; when the unknown bilingual pair is a bilingual word pair, the direct translation probability is the direct lexical weighting and the inverse translation probability is the inverse lexical weighting. The target language structure in the unknown bilingual pair with the highest target score is selected as the target language structure in the bilingual parallel corpus for that unknown structure.

[0051] Compared with the optional random selection method, this preferred method searches the bilingual parallel corpus for all target language structures corresponding to the unknown structure and forms unknown bilingual pairs from them, so that no optimal match is omitted. It then calculates the target score of each unknown bilingual pair by combining the semantic similarity of the embedding vectors with the second preset evaluation index value. The second preset evaluation index covers direct translation probability, inverse translation probability, frequency of occurrence, and total length, and applies separate criteria to bilingual phrase pairs and bilingual word pairs, so that the score reflects how well each unknown bilingual pair matches semantically. Selecting the target language structure in the unknown bilingual pair with the highest target score ensures that the selected target language structure is semantically compatible with the unknown structure and suited to the usage scenario, and provides a reliable basis for generating enhanced source sentences. This improves the translation accuracy of the target neural translation model on unknown structures, reduces the noise introduced during enhancement, and helps keep the translation output fluent and accurate. The method is particularly suitable for low-resource scenarios with many unknown structures and scarce corpora, where it makes full use of the limited bilingual parallel corpus.
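A minimal sketch of this preferred selection is given below. The application does not state how the semantic similarity and the second preset evaluation index value are combined into the target score, so the equal-weight sum used here is an assumption, as are the `UnknownPair` class, its field names, and the numeric values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnknownPair:
    source: str                 # the unknown structure
    target: str                 # one candidate target language structure
    semantic_similarity: float  # embedding-vector similarity of the pair
    index_score: float          # aggregated second preset evaluation index value


def target_score(pair: UnknownPair, semantic_weight: float = 0.5) -> float:
    """Assumed combination: weighted sum of the two component scores."""
    return (semantic_weight * pair.semantic_similarity
            + (1.0 - semantic_weight) * pair.index_score)


def select_best_target(pairs: List[UnknownPair]) -> str:
    """Return the target structure of the pair with the highest target score."""
    if not pairs:
        raise ValueError("no candidate target language structure found")
    return max(pairs, key=target_score).target


# Illustrative component scores only, not taken from the application.
pairs = [
    UnknownPair("认为", "think", semantic_similarity=0.9, index_score=0.7),
    UnknownPair("认为", "consider", semantic_similarity=0.6, index_score=0.8),
]
best = select_best_target(pairs)
```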

[0052] Methods for selecting one or more unknown structures in the source sentence to be translated include random injection, selective network injection, and traversal injection. As shown in Table 1, this embodiment takes the source sentence to be translated "他认为这本书对学生来说非常有帮助,尤其在英语学习上。" ("He thinks this book is very helpful for students, especially in English learning.") as an example, in which the unknown structure is "认为" ("thinks"). The different injection methods are introduced below.

[0053] Table 1

[0054] According to a preset number of injections, the random injection method randomly selects that number of unknown structures from the source sentence to be translated and injects them.

[0055] If the preset injection quantity is 1 and the unknown structure "认为" is randomly selected, then "consider" is randomly selected from the target language structures in the bilingual parallel corpus that correspond to "认为". The concatenation format is: unknown structure#target language structure corresponding to the unknown structure. The enhanced source sentence to be translated is then: 他认为#consider这本书对学生来说非常有帮助,尤其在英语学习上。
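The random injection strategy and the "unknown structure#target language structure" concatenation format can be sketched as follows. The helper names `inject` and `random_injection` and the dictionary layout are hypothetical, and only the first occurrence of each unknown structure is replaced, which is a simplifying assumption.

```python
import random
from typing import Dict, List, Optional


def inject(sentence: str, unknown: str, target: str, separator: str = "#") -> str:
    """Replace the first occurrence of `unknown` with `unknown#target`."""
    if unknown not in sentence:
        raise ValueError("unknown structure does not occur in the sentence")
    return sentence.replace(unknown, f"{unknown}{separator}{target}", 1)


def random_injection(sentence: str,
                     candidates: Dict[str, List[str]],
                     num_injections: int = 1,
                     rng: Optional[random.Random] = None) -> str:
    """Randomly pick unknown structures and, for each, a random target structure."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(candidates), k=min(num_injections, len(candidates)))
    for unknown in chosen:
        sentence = inject(sentence, unknown, rng.choice(candidates[unknown]))
    return sentence


# With one unknown structure and one candidate target, the result is deterministic.
enhanced = random_injection("他认为这本书对学生来说非常有帮助",
                            {"认为": ["consider"]})
```

Passing a seeded `random.Random` instance makes the selection reproducible across runs, which can help when comparing augmentation strategies.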

[0056] Selective network injection is applicable only when a single unknown structure in the source sentence to be translated is selected. The source sentence to be translated is processed by a structure selection network to obtain the unknown structure in the source sentence to be translated.

[0057] The traversal injection method (Traversal Injection) is applicable only when a single unknown structure in the source sentence to be translated is selected. First, the maximum target score of each unknown structure in the source sentence to be translated is calculated: for a single unknown structure, all corresponding target language structures are found in the bilingual parallel corpus and combined with it into unknown bilingual pairs, the target score of each unknown bilingual pair is calculated by combining the semantic similarity of the embedding vectors with the second preset evaluation index value, and the largest of these scores is taken as the maximum target score of that unknown structure. Then the maximum target scores of all unknown structures are compared, the unknown structure with the highest maximum target score is taken as the finally selected unknown structure, and the target language structure in its highest-scoring unknown bilingual pair is taken as the target language structure selected for that unknown structure in the bilingual parallel corpus.

[0058] If the preset injection quantity is 1, the target scores of the unknown bilingual pairs corresponding to "认为" are as follows: the pair composed of "认为" and "think" scores 0.82, the pair composed of "认为" and "believe" scores 0.78, and the pair composed of "认为" and "consider" scores 0.65. Therefore, the enhanced source sentence generated by the traversal injection method is: 他认为#think这本书对学生来说非常有帮助,尤其在英语学习上。

[0059] Random injection increases data diversity by randomly selecting translation segments, but may cause grammatical or semantic errors; selective network injection relies on the context information of the selection model to select the optimal unknown structure to be replaced, and the translation quality is relatively high; the traversal injection strategy traverses all candidate elements and selects the injection result with the highest target score, which can effectively guarantee the translation quality.
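The traversal selection in this example can be reproduced with the sketch below, which scans every (unknown structure, target structure) pair and keeps the one with the highest target score. The three scores are those stated above; the nested-dictionary layout and the function name `traversal_select` are hypothetical.

```python
from typing import Dict, Tuple


def traversal_select(scores: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """Return (unknown structure, target structure, score) with the highest score."""
    best = None
    for unknown, targets in scores.items():
        for target, score in targets.items():
            if best is None or score > best[2]:
                best = (unknown, target, score)
    if best is None:
        raise ValueError("no unknown bilingual pair to choose from")
    return best


scores = {"认为": {"think": 0.82, "believe": 0.78, "consider": 0.65}}
unknown, target, score = traversal_select(scores)

sentence = "他认为这本书对学生来说非常有帮助,尤其在英语学习上。"
# Replace the first occurrence with the "unknown#target" concatenation.
enhanced = sentence.replace(unknown, f"{unknown}#{target}", 1)
```

Taking the maximum over all pairs at once is equivalent to first taking each unknown structure's maximum target score and then comparing those maxima.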

[0060] In this embodiment, the neural translation model is based on mBART (Multilingual BART), which serves as the base machine translation model for training and for processing unknown structures. mBART is a pre-trained model based on the Transformer architecture, capable of multilingual processing and suitable for low-resource translation tasks. Pre-trained as a denoising autoencoder with a generative objective, it can handle multiple language pairs and effectively enhances the modeling ability for unknown language structures.

[0061] As a multilingual model, mBART has the following advantages: Multilingual ability: mBART can improve the translation quality of low-resource language pairs through shared cross-lingual representations.

[0062] Modeling capabilities for unknown structures: Because mBART's pre-training tasks include autoregressive language modeling, it is better able to handle unknown syntax and structures when generating translations.

[0063] This invention chooses mBART as the base model because of its powerful performance in multilingual learning and autoregressive generative tasks. mBART can learn general language representations from large-scale bilingual parallel data, and then adapt them to specific translation tasks through fine-tuning, especially showing good adaptability when facing low-resource language pairs.

[0064] To implement mBART training and decoding operations, this invention employs the Fairseq framework. Fairseq is an open-source toolkit from Meta, widely used for Transformer-based neural network tasks, including machine translation, language modeling, and text generation. This framework supports efficient multi-task training and decoding operations and can handle large-scale datasets.

[0065] The mBART-based neural translation model includes the following core modules: Encoder: The encoder of mBART is a bidirectional Transformer encoder that extracts contextual information from the source language sentence. Through a multi-layer self-attention mechanism, the encoder captures long-distance dependencies, which strengthens the model's understanding and generation capabilities, especially when translating complex syntax or unknown structures.

[0066] Decoder: The decoder employs the same Transformer architecture as the encoder, generating translations of the target language through a self-attention mechanism. The decoder not only receives the encoder's output but also generates the final translation based on contextual information from the target language. To enhance its ability to handle unknown structures, the decoder incorporates a structure enhancement module, ensuring better generation of translations with linguistic regularity when processing low-resource languages.

[0067] Output Layer: The output layer generates the word distribution of the target language through a softmax layer, thereby determining the final translation output.

[0068] In the pre-training phase of the neural translation model, the mBART model is first pre-trained using a large-scale bilingual parallel corpus (training set) to enable it to learn general language representation capabilities and machine translation abilities. Based on this pre-training, mBART is fine-tuned to specific low-resource language pairs. Furthermore, by utilizing a set of candidate sentence pairs, the neural translation model is optimized through knowledge distillation and structured enhancement to better handle unknown structures.

[0069] In this embodiment, the structure selection network dynamically chooses the optimal unknown structure for injection, enabling the translation model to better cope with the challenges of low resources and unknown structures. Targeted injection of unknown structures during the translation process improves translation quality and avoids the noise problems caused by random injection.

[0070] In this embodiment, the neural translation model is the mBART model, and the structure selection network is trained based on the target neural translation model. The training process includes: obtaining the parameters of the target neural translation model and using them as the initial parameters of the structure selection network; freezing the encoding layer parameters of the target neural translation model; and training the structure selection network, with the objective of minimizing the error between the unknown structure output by the structure selection network and the true unknown structure, to obtain a trained structure selection network.

[0071] The structure selection network is built on a fine-tuned neural machine translation model, specifically the mBART model. The encoding layer structure is kept unchanged and its parameters are frozen; only the decoding layer is trained. During model initialization, the encoding layer parameters of the neural translation model are used as the initial parameters of the network, so that the model retains the general language knowledge learned in the pre-training stage. The encoding layer does not participate in subsequent updates, while the other layers are optimized during training. The structure selection network is fine-tuned using the best unknown structure samples from the ground-truth annotations in the network training set as supervision signals. This allows the network to automatically learn and select the most suitable unknown structure for injection based on the context information and semantic features of the source sentence. The training objective is to minimize the selection error between the unknown structure output by the model and the ground-truth annotated unknown structure. This enables the model to automatically select the structure that is most beneficial to translation improvement when faced with previously unseen unknown structures, thereby improving the model's ability to handle unknown structures and the translation effect.

[0072] By using a selection network based on the mBART encoding layer, this invention can dynamically select the optimal injection structure for unknown structures during the translation process, thereby improving translation quality and avoiding the instability caused by random injection.

[0073] The structure selection network relies on the features of the model encoder to select unknown structures that need to be replaced and enhanced, rather than simply relying on dictionary matching to blindly replace them. This enables precise and targeted optimization of the model's weak points. At the same time, by splicing the selected unknown structures and injecting their corresponding target structures, it can directly fill in the scarce structures and cross-structure dependencies that the model has not mastered, without the need for additional large-scale corpus input, effectively optimizing the targeting of the enhancement process.

[0074] This invention first trains an initial neural translation model, and then trains a structure selection network based on this model (with initial parameters derived from the target neural translation model and the encoding layer parameters frozen). The structure selection network detects unknown structures (weak links in the model) in the source sentence to be translated. The unknown structures are concatenated with the corresponding target structures to generate enhanced samples, which are screened by a dual-score system and then used to fine-tune the model, resulting in the target neural translation model. The structure selection network is continuously optimized based on the target neural translation model, dynamically adapting to the current state of the model. In subsequent inference, it continues to detect new unknown structures, repeating the injection and fine-tuning process to form a complete loop.

[0075] The training and optimization of the structure selection network both rely on the parameter feedback of the target neural translation model. It can dynamically identify the weak structures of the model at different training stages, so that the augmentation strategy no longer depends on static dictionaries or manual rules, but is driven by the model's own weaknesses. It realizes a complete cycle of model-detection-injection-retraining-evaluation-model, which allows the augmentation strategy to dynamically adapt to the current state of the model, and optimize the weak links of the model in a targeted manner. It does not require a lot of ineffective augmentation, significantly improves the augmentation efficiency, and achieves significant performance improvement with a small amount of augmentation.

[0076] Therefore, this invention constructs a model feedback-driven closed-loop enhancement mechanism, which, through a cyclical process of model-detection-injection-retraining-evaluation-model, enables the enhancement strategy to dynamically adapt to the model's current state. This mechanism frees the enhancement process from relying on static dictionaries or manual rules, instead driving it based on the model's own weaknesses, and dynamically adjusts the injection structure according to the model's stage-by-stage performance, thereby significantly improving enhancement efficiency and achieving substantial performance improvements with minimal enhancements. This scheme effectively overcomes the shortcomings of existing technologies, such as the disconnect between the enhancement process and model performance, and low efficiency.

[0077] As shown in Figures 4 and 5, Figure 4 is a flowchart of the unknown structure injection process, and Figure 5 is an example of injecting an unknown structure.

[0078] During testing, this invention performs word-level and phrase-level segmentation on the source sentences in the test set and obtains a set of candidate language structures using the fast_align tool. At the same time, length and grammatical constraints are applied to the segmented language structures to ensure they conform to the grammatical rules of both the source and target languages. The segmented language structures are then matched against the parallel bilingual dictionary built during the training phase. Language structures that appear in the test sentences but are not included in the training dictionary are classified as unknown structures.

[0079] Based on the matching and alignment results, the unknown structures can be further divided into three categories: the first category consists of language structures that exist at the source end but lack reliable translations at the target end; the second category consists of language structures for which there is a corresponding translation at the target end but no matching segment at the source end; the third category consists of language structures that exist at both the source and target ends but have unstable alignment relationships and low translation confidence.
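The three categories can be expressed as a small classification routine. This is only a sketch: the boolean inputs, the `confidence` value, and the `min_confidence` threshold of 0.5 are assumptions, since the application does not specify how presence or confidence is measured.

```python
from enum import Enum
from typing import Optional


class UnknownCategory(Enum):
    SOURCE_ONLY = "present at source, no reliable target translation"
    TARGET_ONLY = "translation at target, no matching source segment"
    UNSTABLE_ALIGNMENT = "present at both ends, low-confidence alignment"


def categorize(has_source: bool,
               has_target: bool,
               confidence: float,
               min_confidence: float = 0.5) -> Optional[UnknownCategory]:
    """Map matching and alignment evidence to one of the three categories.

    Returns None when the structure is aligned with sufficient confidence
    (or absent at both ends), i.e. it is not treated as unknown here.
    """
    if has_source and not has_target:
        return UnknownCategory.SOURCE_ONLY
    if has_target and not has_source:
        return UnknownCategory.TARGET_ONLY
    if has_source and has_target and confidence < min_confidence:
        return UnknownCategory.UNSTABLE_ALIGNMENT
    return None
```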

[0080] By combining statistical information from the training set with preset evaluation indicators, the above language structures are scored, and structures with poor translation quality and unreliable alignment are selected, which together constitute the set of unknown structures.

[0081] Different processing methods are adopted for different types of unknown structures: For language structures that exist in the source but are missing in the target, the search scope is expanded using the structure as a template to obtain target translation fragments. If no fragments are available, target translation candidates are generated by a selection network. For language structures that exist in the target but are missing in the source, the optimal source fragment is matched in reverse based on the target structure, and the source statement is inserted under the premise of satisfying the grammatical constraints. For language structures that exist in both the source and target ends but whose alignment is unstable, the target translation with the highest score is selected for enhancement to improve the model's learning effect on stable mapping relationships.

[0082] After the unknown structures are identified, one of three strategies (random injection, selective network injection, or traversal injection) is used to enhance the source sentences in real time during the testing phase using a code-switching approach. The specific steps include: retrieving the optimal bilingual pair corresponding to the unknown structure from the parallel bilingual phrase dictionary; constructing an enhanced fragment in the form of source-side language structure#target-side language structure; locating injection positions within the source sentence that satisfy grammatical constraints; injecting the enhanced fragment through replacement or insertion to generate the structurally enhanced test sentence; and using evaluation metrics to filter the enhancement results, inputting high-quality enhanced sentences into the neural translation model to improve the model's translation performance on unknown structures in real-world testing and application scenarios.

[0083] Among them, selective network injection is based on the mBART model: the network is initialized with the parameters of the trained neural translation model, the encoding layer parameters are frozen, and the training objective is to minimize the error between the output unknown structure and the real unknown structure. This enables the model to select the optimal unknown structure for injection based on context information, thereby further improving the accuracy of data augmentation and the translation quality.

[0084] When constructing a parallel bilingual dictionary, this invention uses the Moses tool to quantify and score word alignment results and then filters them. It then calculates one or more combinations of the direct translation probability, inverse translation probability, frequency of occurrence, and total length of bilingual pairs as a first preset evaluation index value, and filters bilingual pairs through threshold filtering. When filtering candidate sentence pairs, it calculates the semantic similarity of the embedding vectors of the candidate language structure and the corresponding target structure, and filters candidate sentence pairs based on the first preset evaluation index value. When searching for the target structure corresponding to an unknown structure, it calculates the semantic similarity of the embedding vectors of unknown bilingual pairs, and combines the above-mentioned similar indicators as a second preset evaluation index value, calculates the target score, and selects the optimal unknown bilingual pair.

[0085] This invention uses the first and second preset evaluation indices to form a dictionary score and uses the embedding-vector semantic similarity to form a semantic score. The two are combined into a dual scoring mechanism, which screens augmented samples in the training phase and unknown bilingual pairs in the inference phase. This effectively filters out pseudo samples with inconsistent semantics and low matching degree, avoids the noise caused by random replacement, ensures the semantic consistency of augmented samples, improves their overall effectiveness, and prevents pseudo data from having a negative impact on model training.

[0086] Therefore, this invention introduces a dual-scoring mechanism combining semantic scoring and dictionary scoring to double-screen samples before augmentation data injection, effectively reducing the noise ratio caused by random replacements and ensuring semantic consistency. In low-resource scenarios, this mechanism can prevent pseudo-data from having a negative impact on model training and improve the average effectiveness of augmented samples, making the training process converge faster and more stably, thus solving the problems of excessive noise and increased error rate in existing augmentation methods.

[0087] This invention achieves significant improvements in translation performance across various low-resource language pairs through targeted structural enhancements. Specifically, this is manifested in increased BLEU scores and semantic metrics such as COMET, as well as a marked reduction in translation error rates for complex and difficult sentences. This performance improvement is not dependent on larger external models but is inherent to the technical solution itself, thus possessing reproducibility and engineering feasibility.

[0088] The method of this invention has high versatility and transferability, and can be widely applied to standard neural machine translation frameworks. Therefore, this invention can not only be directly applied to different language pairs, especially low-resource languages, but also adapt to various model structures, facilitating practical engineering deployment. It achieves efficient, low-noise, and highly targeted enhancement of translation models under low-resource conditions, significantly improving the model's translation capabilities on complex language structures, overcoming the shortcomings of traditional random enhancement, back-translation, or feedback-free enhancement methods, and has broad practical application value.

[0089] To verify the effectiveness of this invention in low-resource neural machine translation scenarios, experiments were conducted on simulated low-resource and real low-resource language tasks, and the performance of multiple enhancement methods was compared. The main evaluation metrics included BLEU (Bilingual Evaluation Understudy) and COMET (Crosslingual Optimized Metric for Evaluation of Translation) scores; the former measures the surface-level matching degree of the translation, while the latter measures semantics and fluency.

[0090] As shown in Figures 6 and 7, Figure 6 compares BLEU and COMET scores for different translation directions in simulated low-resource scenarios, and Figure 7 compares BLEU and COMET scores for different translation directions in real low-resource scenarios.

[0091] Based on the simulation results of Chinese-English and real low-resource scenarios, this invention uses a big data model (PCbig) and a small data model (PCsmall) as the basic system in two language directions: Chinese to English (Zh→En) and German to English (De→En). Different enhancement strategies are introduced, including random enhancement (+Random), boundary constraint enhancement (+Bounded), unbounded structure injection (+UnBounded), and the unknown structure network (+USNet) proposed in this invention.

[0092] On real low-resource language pairs such as Kyrgyz (Ky), Uzbek (Ug), and Kazakh (Kk), GPT-3.5-turbo, LLaMA-3-8B, the baseline, and various enhancement strategies were evaluated. The "+UnBounded" approach achieved the highest BLEU score across all tasks.

[0093] This embodiment verifies that the present invention can effectively compensate for the missing structure in training data in real-world scenarios with extremely low resources, and achieve a leap in translation performance by injecting the structure.

[0094] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0095] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0096] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0097] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0098] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.

Claims

1. A neural machine translation method for unknown structures, characterized in that it comprises:
training a neural translation model using a preprocessed training set to obtain a trained neural translation model, wherein the training set and a validation set each include multiple source sentences and their corresponding target sentences;
performing word segmentation on the preprocessed training set and then word alignment to obtain a word alignment result, and constructing a parallel bilingual dictionary based on the word alignment result;
taking the language structures in a source sentence that exist in the parallel bilingual dictionary as the candidate language structures of that source sentence, wherein the language structures include phrases and words;
for each candidate language structure, replacing it with a concatenation of the candidate language structure and its corresponding target language structure in the parallel bilingual dictionary to generate a corresponding candidate source sentence, and combining each candidate source sentence with its corresponding target sentence to obtain a set of candidate sentence pairs;
adjusting the parameters of the trained neural translation model using the set of candidate sentence pairs to obtain a target neural translation model;
extracting, from a source sentence to be translated, the source language structures whose lengths fall within a preset range to obtain a set of candidate language structures;
determining whether the source sentence to be translated contains any language structure that does not appear in the parallel bilingual dictionary;
if not, inputting the source sentence to be translated into the target neural translation model and outputting the target sentence of the source sentence to be translated;
if so, identifying each candidate language structure that does not appear in the parallel bilingual dictionary as an unknown structure, selecting one or more unknown structures from the source sentence to be translated, selecting the corresponding target language structure from a bilingual parallel corpus, replacing each selected unknown structure with a concatenation of the unknown structure and its corresponding target language structure to generate an enhanced source sentence, inputting the enhanced source sentence into the target neural translation model, and outputting the target sentence of the source sentence to be translated.
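As an illustration only, the splicing steps of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the `<sep>` separator token, the whitespace tokenization, the hypothetical `corpus_lexicon` mapping standing in for the lookup in the bilingual parallel corpus, and the policy of enhancing every unknown structure that has a lookup entry are all assumptions that do not appear in the claims.

```python
from typing import Dict, Iterator, List, Tuple

Span = Tuple[int, int, str]  # (start index, end index, surface form)


def spans(tokens: List[str], min_len: int, max_len: int) -> Iterator[Span]:
    """Yield every token span whose length lies in [min_len, max_len]."""
    for n in range(min_len, max_len + 1):
        for start in range(len(tokens) - n + 1):
            yield start, start + n, " ".join(tokens[start:start + n])


def splice(tokens: List[str], end: int, target: str, sep: str = "<sep>") -> List[str]:
    """Insert `sep target` right after the structure ending at `end`, so the
    structure is replaced by 'structure + sep + target' (sep is an assumption)."""
    return tokens[:end] + [sep] + target.split() + tokens[end:]


def candidate_pairs(src: str, tgt: str, dictionary: Dict[str, str],
                    min_len: int = 1, max_len: int = 3) -> List[Tuple[str, str]]:
    """Training side: one candidate sentence pair per dictionary structure in src."""
    tokens = src.split()
    pairs = []
    for _, end, phrase in spans(tokens, min_len, max_len):
        if phrase in dictionary:
            candidate_src = " ".join(splice(tokens, end, dictionary[phrase]))
            pairs.append((candidate_src, tgt))
    return pairs


def unknown_structures(src: str, dictionary: Dict[str, str],
                       min_len: int = 1, max_len: int = 3) -> List[Span]:
    """Inference side: candidate structures absent from the dictionary."""
    return [s for s in spans(src.split(), min_len, max_len) if s[2] not in dictionary]


def enhance(src: str, dictionary: Dict[str, str], corpus_lexicon: Dict[str, str],
            min_len: int = 1, max_len: int = 3) -> str:
    """Splice a target structure after each unknown structure that has an
    entry in the hypothetical corpus_lexicon; return src unchanged otherwise."""
    tokens = src.split()
    chosen = [s for s in unknown_structures(src, dictionary, min_len, max_len)
              if s[2] in corpus_lexicon]
    # Work right to left so earlier insertions do not shift later offsets.
    for _, end, phrase in sorted(chosen, key=lambda s: s[1], reverse=True):
        tokens = splice(tokens, end, corpus_lexicon[phrase])
    return " ".join(tokens)
```

In this sketch the enhanced sentence returned by `enhance` is what would be passed to the target neural translation model; a sentence with no unknown structure that has a lookup entry is returned unchanged.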

2. The neural machine translation method for unknown structures according to claim 1, characterized in that constructing the parallel bilingual dictionary based on the word alignment result comprises: quantitatively scoring the word alignment results using the Moses tool, and selecting target word alignment results based on their scores; extracting source language structures of a preset length from the source sentences; obtaining the target language structure corresponding to each source language structure based on the target word alignment results; forming bilingual pairs from the source language structures and their corresponding target language structures; and filtering all bilingual pairs to construct the parallel bilingual dictionary.
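The extraction step of claim 2 can be pictured with the following Python sketch. It assumes the word alignment links have already been scored and selected, represents them as a hypothetical set of (source index, target index) pairs, and uses the common alignment-consistency check from phrase-based translation as a stand-in, since the claim does not say how the target language structure is read off the alignment.

```python
from typing import List, Set, Tuple


def extract_bilingual_pairs(src: List[str], tgt: List[str],
                            alignment: Set[Tuple[int, int]],
                            max_len: int) -> List[Tuple[str, str]]:
    """Pair each source span of up to max_len tokens with the target span
    covered by its alignment links, keeping only consistent pairs."""
    pairs = []
    for n in range(1, max_len + 1):
        for i in range(len(src) - n + 1):
            j = i + n
            # Target positions linked to any source token in src[i:j].
            linked = [t for s, t in alignment if i <= s < j]
            if not linked:
                continue  # unaligned source span: no target structure
            lo, hi = min(linked), max(linked) + 1
            # Consistency: no target token in tgt[lo:hi] may be linked
            # to a source token outside src[i:j].
            if any(lo <= t < hi and not (i <= s < j) for s, t in alignment):
                continue
            pairs.append((" ".join(src[i:j]), " ".join(tgt[lo:hi])))
    return pairs
```

The pairs produced this way would still need to be filtered before they enter the dictionary.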

3. The neural machine translation method for unknown structures according to claim 2, characterized in that filtering all bilingual pairs to construct the parallel bilingual dictionary comprises: calculating a first preset evaluation index value for every bilingual pair, filtering the bilingual pairs by comparing each value with the threshold set for the first preset evaluation index, and constructing the parallel bilingual dictionary from the bilingual pairs that pass the filter.

4. The neural machine translation method for unknown structures according to claim 3, characterized in that the first preset evaluation index is any one or a combination of the following: direct translation probability, inverse translation probability, frequency of occurrence of the bilingual pair in the training set, and total length of the bilingual pair; when the bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability and the inverse translation probability is the inverse phrase translation probability; when the bilingual pair is a bilingual word pair, the direct translation probability is the direct lexical weighting and the inverse translation probability is the inverse lexical weighting.
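The threshold filtering of claims 3 and 4 might look like the Python sketch below. The threshold values, the direction of each comparison (probabilities and frequency must reach a minimum, total length must stay under a maximum), and the rule that a pair counts as a word pair when both sides are a single token are assumptions made for illustration; the claims only state that index values are compared with set thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BilingualPair:
    source: str
    target: str
    direct_phrase_prob: float
    inverse_phrase_prob: float
    direct_lex_weight: float
    inverse_lex_weight: float
    count: int  # occurrences of the pair in the training set

    @property
    def is_word_pair(self) -> bool:
        # Assumption: a word pair has exactly one token on each side.
        return len(self.source.split()) == 1 and len(self.target.split()) == 1

    @property
    def direct_prob(self) -> float:
        # Word pairs use the lexical weight, phrase pairs the phrase probability.
        return self.direct_lex_weight if self.is_word_pair else self.direct_phrase_prob

    @property
    def inverse_prob(self) -> float:
        return self.inverse_lex_weight if self.is_word_pair else self.inverse_phrase_prob

    @property
    def total_length(self) -> int:
        return len(self.source.split()) + len(self.target.split())


def filter_pairs(pairs: List[BilingualPair], min_direct: float = 0.3,
                 min_inverse: float = 0.3, min_count: int = 2,
                 max_total_length: int = 8) -> List[BilingualPair]:
    """Keep the pairs that pass every (hypothetical) threshold."""
    return [p for p in pairs
            if p.direct_prob >= min_direct
            and p.inverse_prob >= min_inverse
            and p.count >= min_count
            and p.total_length <= max_total_length]
```

Because the claim allows any one index or a combination, a real filter could apply only a subset of these four checks.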

5. The neural machine translation method for unknown structures according to claim 1, characterized in that, after the set of candidate sentence pairs is obtained, the set is filtered to obtain a target set of candidate sentence pairs, which comprises: for each candidate sentence pair, calculating the embedding-vector semantic similarity between the candidate language structure and its corresponding target language structure in the parallel bilingual dictionary, and obtaining the first preset evaluation index value of the bilingual pair formed by the candidate language structure and its corresponding target language structure; and screening each candidate sentence pair based on the embedding-vector semantic similarity and the first preset evaluation index value to obtain the target set of candidate sentence pairs.

6. The neural machine translation method for unknown structures according to claim 1, characterized in that selecting one or more unknown structures from the source sentence to be translated and selecting the corresponding target language structure from the bilingual parallel corpus comprises: searching the bilingual parallel corpus for the target language structures corresponding to the unknown structure, and forming unknown bilingual pairs by combining the unknown structure with each of its corresponding target language structures; calculating a target score for each unknown bilingual pair based on the embedding-vector semantic similarity between the two sides of the pair and the second preset evaluation index value of the pair, wherein the second preset evaluation index is any one or a combination of the following: direct translation probability, inverse translation probability, frequency of occurrence of the unknown bilingual pair in the bilingual parallel corpus, and total length of the unknown bilingual pair; when the unknown bilingual pair is a bilingual phrase pair, the direct translation probability is the direct phrase translation probability and the inverse translation probability is the inverse phrase translation probability; when the unknown bilingual pair is a bilingual word pair, the direct translation probability is the direct lexical weighting and the inverse translation probability is the inverse lexical weighting; and selecting the target language structure of the unknown bilingual pair with the highest target score as the target language structure of that unknown structure in the bilingual parallel corpus.
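The selection in claim 6 can be illustrated with a short Python sketch. The claim does not state how the embedding similarity and the second preset evaluation index value are combined into a target score, so the weighted sum with a hypothetical weight `alpha`, the use of cosine similarity, and the candidate tuple layout are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# (target structure, target embedding, second evaluation index value)
Candidate = Tuple[str, Sequence[float], float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def target_score(similarity: float, index_value: float, alpha: float = 0.5) -> float:
    """Assumed combination: weighted sum of similarity and index value."""
    return alpha * similarity + (1.0 - alpha) * index_value


def select_target(unknown_vec: Sequence[float], candidates: List[Candidate],
                  alpha: float = 0.5) -> str:
    """Return the target structure of the highest-scoring unknown bilingual pair."""
    best = max(candidates,
               key=lambda c: target_score(cosine(unknown_vec, c[1]), c[2], alpha))
    return best[0]
```

With `alpha` close to 1 the choice is driven by embedding similarity, and with `alpha` close to 0 by the evaluation index value.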

7. The neural machine translation method for unknown structures according to claim 1, characterized in that, when only one unknown structure is selected from the source sentence to be translated, the selection comprises: obtaining the unknown structure in the source sentence to be translated through a structure selection network.

8. The neural machine translation method for unknown structures according to claim 7, characterized in that the neural translation model is an mBART model.

9. The neural machine translation method for unknown structures according to claim 8, characterized in that the structure selection network is trained based on the target neural translation model, and the training process comprises: obtaining the parameters of the target neural translation model and using them as the initial parameters of the structure selection network; freezing the encoding-layer parameters of the target neural translation model; and training the structure selection network with the objective of minimizing the error between the unknown structure it outputs and the true unknown structure, to obtain a trained structure selection network.

10. A neural machine translation system for unknown structures, characterized in that it comprises:
an initial training module, used to train a neural translation model using a preprocessed training set to obtain a trained neural translation model, wherein the training set and a validation set each include multiple source sentences and their corresponding target sentences;
a dictionary construction module, used to perform word alignment on the preprocessed training set after word segmentation to obtain a word alignment result, and to construct a parallel bilingual dictionary based on the word alignment result;
a candidate language structure generation module, used to take the language structures in a source sentence that exist in the parallel bilingual dictionary as the candidate language structures of that source sentence, wherein the language structures include phrases and words;
a candidate sentence pair generation module, used to replace each candidate language structure with a concatenation of the candidate language structure and its corresponding target language structure in the parallel bilingual dictionary to generate a corresponding candidate source sentence, and to combine each candidate source sentence with its corresponding target sentence to obtain a set of candidate sentence pairs;
a parameter adjustment module, used to adjust the parameters of the trained neural translation model using the set of candidate sentence pairs to obtain a target neural translation model;
a translation module, used to determine whether a source sentence to be translated contains any language structure that does not appear in the parallel bilingual dictionary; if not, to input the source sentence to be translated into the target neural translation model and output the target sentence of the source sentence to be translated; if so, to identify each candidate language structure that does not appear in the parallel bilingual dictionary as an unknown structure, select one or more unknown structures from the source sentence to be translated, select the corresponding target language structure from a bilingual parallel corpus, replace each selected unknown structure with a concatenation of the unknown structure and its corresponding target language structure to generate an enhanced source sentence, input the enhanced source sentence into the target neural translation model, and output the target sentence of the source sentence to be translated.