A multilingual neural machine translation system and method utilizing language family information

By introducing a language family information module into the Transformer model, the performance degradation of multilingual neural machine translation models on high-resource languages is alleviated, and translation accuracy is improved, especially on low-resource languages.

CN115481621B · Active · Publication Date: 2026-06-16 · TIANJIN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TIANJIN UNIV
Filing Date
2022-09-13
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing multilingual neural machine translation models suffer from performance degradation in high-resource languages and struggle to fully utilize information from low-resource languages, resulting in poor translation performance.

Method used

A language family information module is added to the basic Transformer model, including a routing module, a gating module, a language family-related feedforward neural network module, and a global feedforward neural network module. Language family information is incorporated into the language family information module to improve translation performance.

Benefits of technology

The system alleviates the performance degradation problem in high-resource languages and improves translation performance on low-resource languages, significantly improving translation accuracy for medium-resource and low-resource languages.

Abstract

The application discloses a multilingual neural machine translation system using language family information. The system is constructed on a Transformer basic model, and a language family information module is added after each self-attention mechanism module and feedforward neural network module in the Transformer basic model; the language family information module takes language family information and the output of the preceding module as input, and outputs a vector in which the language family information is fused. The application further provides a multilingual neural machine translation method using language family information. The multilingual neural machine translation system using language family information provided by the application alleviates the performance decline of large-scale multilingual machine translation on high-resource languages, and further improves the translation performance on low-resource languages.

Description

Technical Field

[0001] This invention relates to a neural machine translation system, and more particularly to a multilingual neural machine translation system and method that utilizes language family information.

Background Technology

[0002] Currently, machine translation is one of the most important tasks in the field of natural language processing, holding a significant position in both academia and industry. In the 21st century, the internet is undoubtedly the most widely used tool and the most prevalent communication platform globally. The greatest convenience brought by the internet lies in facilitating communication between different countries and regions, and this cross-regional and cross-linguistic communication has led to a surge in demand for machine translation.

[0003] The research goal of Machine Translation (MT) is to use computer technology to translate text from one language to another while preserving as much of the information contained in the original language as possible. Early research in machine translation was mainly limited to Statistical Machine Translation (SMT) and Rule-Based Machine Translation (RBMT). However, with the significant increase in computing power brought about by the development of Graphics Processing Units (GPUs), neural networks have regained importance, leading to Neural Machine Translation (NMT). Current NMT models have far surpassed traditional statistical machine translation models on multiple bilingual language pairs and have become the mainstream machine translation approach. However, with over 5000 languages in existence, including dozens of commonly used languages, maintaining a separate bilingual machine translation system for each language pair would be extremely costly. Assuming the goal is to achieve mutual translation between N languages, $N^2$ neural machine translation systems would need to be trained and maintained, giving an overall complexity of $O(N^2)$. As N increases, the overall overhead grows quadratically, training time grows, and deployment and maintenance become difficult.

[0004] Multilingual Neural Machine Translation (MNMT) effectively addresses some of the problems associated with bilingual machine translation systems. The core idea of MNMT is to use a unified translation model to perform translation between multiple languages, thus solving the scalability issue of bilingual machine translation systems in practice. Currently, the industry is also adopting large-scale model approaches to improve the performance of multilingual machine translation, and many companies, including internet giants such as Baidu, Google, Microsoft, ByteDance, and DeepL, have deployed unified multilingual machine translation systems developed based on internal datasets.

[0005] Multilingual neural machine translation significantly reduces the cost of training and deploying a complete multilingual translation system. Furthermore, joint training allows for positive transfer from high-resource languages to low-resource languages, enabling the model to outperform bilingual models on low-resource languages. On the other hand, MNMT also leads to a decline in translation performance on high-resource languages. The performance of a neural network depends primarily on the amount of data and the number of model parameters; although MNMT models have sufficient data, the number of parameters in a single model remains limited. This makes it difficult to fully learn all the effective information contained in the dataset, resulting in a decline in translation performance on language pairs with ample data. Additionally, in the process of utilizing language information to improve the performance of machine translation models, the following problem arises during model generalization: multilingual neural machine translation models may exhibit transfer to low-resource languages or even zero-resource languages.

Summary of the Invention

[0006] This invention provides a multilingual neural machine translation system and method that utilizes language family information to solve the technical problems existing in the prior art.

[0007] The technical solution adopted by this invention to solve the technical problems existing in the prior art is: a multilingual neural machine translation system utilizing language family information. This system is built on the Transformer basic model, and a language family information module is added after each self-attention mechanism module and feedforward neural network module in the Transformer basic model. The language family information module takes language family information and the output of the preceding module as input, generates a vector incorporating language family information, and then outputs it.

[0008] Furthermore, the language family information module includes a routing module, a gating module, a language family-related feedforward neural network module, and a global feedforward neural network module; the output of the preceding module is input to the routing module, the gating module, and the global feedforward neural network module respectively; the routing module also receives language family information, combines the output of the preceding module with the language family information, and inputs the result to the language family-related feedforward neural network module; the outputs of the gating module, the global feedforward neural network module, and the language family-related feedforward neural network module are combined and used as the output of the language family information module.

[0009] Furthermore, the system includes an encoder component, which includes a multi-layer encoder. Each encoder layer includes a self-attention mechanism module A, a language family information module A, a normalization module A, a feedforward neural network module A, a language family information module B, and a normalization module B connected in sequence.

[0010] Furthermore, the input data for each encoder layer includes language family information for the current language.

[0011] Furthermore, the system also includes a decoder component, which comprises a multi-layer decoder. Each layer of the decoder includes a self-attention mechanism module B, a language family information module C, a normalization module C, a cross-attention mechanism module, a language family information module D, a normalization module D, a feedforward neural network module B, a language family information module E, and a normalization module E, which are connected in sequence.

[0012] Furthermore, the input data for each decoder layer includes language family information for the current language.

[0013] The present invention also provides a multilingual neural machine translation method that utilizes language family information in the above-mentioned multilingual neural machine translation system, classifying languages according to language family.

[0014] Furthermore, the training dataset for the Indo-European language family is divided into Germanic, Baltic, and Celtic training datasets according to language branch; the training dataset for the Afro-Asiatic language family contains 5 languages, 4 of which belong to the Semitic branch and the other 1 to the Chadic branch, and these 5 languages are grouped into one language family.

[0015] Furthermore, the training and test sets of the multilingual neural machine translation system utilizing language family information are derived from the OPUS-100 dataset; the training set includes 94 languages and 26 different language families.

[0016] Furthermore, when training the multilingual neural machine translation system using language family information, Adam is used as the optimizer, the label smoothing value is set to 0.1, and the learning rate scheduler adopts the inverse square root strategy.

[0017] The advantages and positive effects of this invention are: the multilingual neural machine translation system proposed in this invention, which utilizes language family information, alleviates the performance degradation problem of large-scale multilingual machine translation on high-resource languages and further improves the translation performance on low-resource languages.

Attached Figure Description

[0018] Figure 1 is a schematic diagram of the structure of the present invention.

[0019] Figure 2 is a heatmap illustrating the cosine similarity between language family-shared parameters at the decoder end in the one-to-many (O2M) translation direction.

[0020] In Figure 1, ×6 indicates the number of layers in the corresponding encoder and decoder.

[0021] In Figure 2:

[0022] BA stands for the Baltic language family.

[0023] CE stands for Celtic language family.

[0024] GE stands for the Germanic language family.

[0025] IA stands for Indo-Aryan language family.

[0026] IE stands for Greek and Albanian.

[0027] IR stands for the Iranian language family.

[0028] RO stands for Romance languages.

[0029] ES stands for East Slavic language family.

[0030] SS stands for the South Slavic language family.

[0031] WS stands for West Slavic language family.

[0032] SE stands for Semitic language family.

[0033] MP stands for Malayo-Polynesian.

[0034] AU stands for Austroasiatic language family.

[0035] DR stands for the Dravidian language family.

[0036] CON stands for Esperanto.

[0037] LI stands for language isolates.

[0038] JA stands for Japanese.

[0039] KO stands for Korean.

[0040] ST stands for Sino-Tibetan language family.

[0041] KA stands for Georgian.

[0042] UR stands for Uralic language family.

[0043] NC stands for Niger-Congo language family.

[0044] TK stands for the Tai-Kadai (Zhuang-Dong) language family.

[0045] OG stands for Oghuz language family.

[0046] KAL stands for Karluk language family.

[0047] KI stands for the Kipchak language family.

Detailed Implementation

[0048] To further understand the invention's content, features, and effects, the following embodiments are provided, along with detailed descriptions in conjunction with the accompanying drawings:

[0049] The English words, phrases, and abbreviations used in this invention and its accompanying drawings have the following meanings:

[0050] Baseline: Baseline model.

[0051] CLSR: Conditional Language-Specific Routing model.

[0052] Param: Model parameters.

[0053] Transformer: The Transformer model is a type of neural network that learns context and thus meaning by tracking relationships in sequential data.

[0054] High: High-resource language.

[0055] Medium: Medium-resource language.

[0056] Low: Low-resource language.

[0057] All: All languages.

[0058] BLEU: Bilingual Evaluation Understudy, a machine translation evaluation metric proposed by IBM in 2002.

[0059] OPUS-100: The OPUS-100 dataset.

[0060] GE: The Germanic language family.

[0061] RO: The Romance language family.

[0062] ST: The Sino-Tibetan language family.

[0063] Output: Output.

[0064] LB-specific FFN: Language family-related feedforward neural network module.

[0065] Global FFN (Shared by All Languages): Global feedforward neural network module (shared across all languages).

[0066] Router: Routing module.

[0067] Gate: Gating module.

[0068] LB token: Language family information.

[0069] Input: Input.

[0070] LayerNorm: Normalization module.

[0071] LBGM: Language Family Information Module.

[0072] Feed-Forward Network: A feedforward neural network module.

[0073] Self-Attention: The self-attention mechanism module.

[0074] Embedding: Word embedding module.

[0075] Source: Source language.

[0076] Cross-Attention: The cross-attention mechanism module.

[0077] Target: Target language.

[0078] Referring to Figures 1 and 2, a multilingual neural machine translation system utilizing language family information is proposed. This system is built on the Transformer basic model, and a language family information module is added after each self-attention mechanism module and feedforward neural network module in the Transformer basic model. The language family information module takes language family information and the output of the preceding module as input, generates a vector incorporating language family information, and then outputs it.

[0079] Preferably, the language family information module may include a routing module, a gating module, a language family-related feedforward neural network module, and a global feedforward neural network module; the output of the preceding module is input to the routing module, the gating module, and the global feedforward neural network module respectively; the routing module also receives language family information, combines the output of the preceding module with the language family information, and inputs the result to the language family-related feedforward neural network module; the outputs of the gating module, the global feedforward neural network module, and the language family-related feedforward neural network module are combined and used as the output of the language family information module.
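The data flow of the language family information module can be illustrated with a minimal pure-Python sketch. The patent does not specify how the router selects parameters or how the three outputs are combined, so the choices here are assumptions: the router is modeled as a lookup of a per-family feedforward network, the gate as a sigmoid over a learned projection, and the combination as a gated mix of the family-specific and global outputs. All class and function names are hypothetical.

```python
import math
import random

def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector x of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class FeedForward:
    """Two-layer feedforward network: d_model -> d_hidden -> d_model."""
    def __init__(self, d_model, d_hidden, rng):
        self.w_in = random_matrix(d_hidden, d_model, rng)
        self.w_out = random_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        return linear(self.w_out, relu(linear(self.w_in, x)))

class LanguageFamilyModule:
    """Hypothetical sketch: router + gate + family-specific FFNs + global FFN."""
    def __init__(self, d_model, d_hidden, families, seed=0):
        rng = random.Random(seed)
        self.global_ffn = FeedForward(d_model, d_hidden, rng)
        self.family_ffns = {f: FeedForward(d_model, d_hidden, rng) for f in families}
        self.gate_weights = random_matrix(1, d_model, rng)

    def route(self, family):
        # Router (assumed): pick the FFN that belongs to the language family token.
        return self.family_ffns[family]

    def __call__(self, x, family):
        gate = sigmoid(linear(self.gate_weights, x)[0])  # scalar in (0, 1)
        family_out = self.route(family)(x)
        global_out = self.global_ffn(x)
        # Combination (assumed): gated mix of the family and global branches.
        return [gate * f + (1.0 - gate) * g for f, g in zip(family_out, global_out)]

module = LanguageFamilyModule(d_model=4, d_hidden=8, families=["GE", "RO", "ST"])
hidden = [0.5, -0.2, 0.1, 0.9]   # output of the preceding module for one token
fused_ge = module(hidden, "GE")  # vector with Germanic family information fused in
```

Under this assumed design, the global network carries capacity shared by all languages, while the gate decides per token how much family-specific capacity to mix in.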

[0080] Preferably, the system may include an encoder component, which may include a multi-layer encoder. Each encoder layer may include a self-attention mechanism module A, a language family information module A, a normalization module A, a feedforward neural network module A, a language family information module B, and a normalization module B connected in sequence. The encoder component may include 6-8 layers of encoders.

[0081] Preferably, the input data for each encoder layer may include language family information of the current language.

[0082] Preferably, the system may further include a decoder component, which may include a multi-layer decoder. Each layer of the decoder may include a self-attention mechanism module B, a language family information module C, a normalization module C, a cross-attention mechanism module, a language family information module D, a normalization module D, a feedforward neural network module B, a language family information module E, and a normalization module E, connected in sequence. The decoder component may include 6-8 layers of decoders.

[0083] Preferably, the input data for each decoder layer may include language family information of the current language.
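The sublayer ordering described for the encoder and decoder layers can be written down as a small configuration sketch. The identifier names are hypothetical, and the layer count of 6 is taken from the ×6 annotation of Figure 1 (the text allows 6-8).

```python
# Sublayer order within one encoder layer, as listed in the description.
ENCODER_SUBLAYERS = [
    "self_attention_A", "language_family_module_A", "layer_norm_A",
    "feed_forward_A", "language_family_module_B", "layer_norm_B",
]

# Sublayer order within one decoder layer, as listed in the description.
DECODER_SUBLAYERS = [
    "self_attention_B", "language_family_module_C", "layer_norm_C",
    "cross_attention", "language_family_module_D", "layer_norm_D",
    "feed_forward_B", "language_family_module_E", "layer_norm_E",
]

def build_stack(sublayers, num_layers):
    """Return the flat list of (layer_index, sublayer_name) pairs for a stack."""
    return [(i, name) for i in range(num_layers) for name in sublayers]

def count_family_modules(stack):
    """Count how many language family information modules a stack contains."""
    return sum(1 for _, name in stack if name.startswith("language_family_module"))

encoder = build_stack(ENCODER_SUBLAYERS, num_layers=6)
decoder = build_stack(DECODER_SUBLAYERS, num_layers=6)
total = count_family_modules(encoder) + count_family_modules(decoder)
print(total)  # 6 * 2 + 6 * 3 = 30
```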

[0084] The first-layer decoder receives input from part of the output of the encoder component and part of the output of the last-layer decoder. The output of the last-layer decoder is then input to the first-layer decoder via the word embedding module B.

[0085] The decoders other than the first layer receive input from both the output of the encoder component and the output of the previous decoder layer. After processing by the final decoder layer, the data passes sequentially through a fully connected layer and an activation layer before being output as the target language output. The activation layer can use the softmax function.

[0086] The present invention also provides a multilingual neural machine translation method that utilizes language family information in the above-mentioned multilingual neural machine translation system, classifying languages ​​according to language family.

[0087] Furthermore, the training dataset for the Indo-European language family can be divided into Germanic, Baltic, and Celtic training sets according to language branch; the training dataset for the Afro-Asiatic language family contains 5 languages, 4 of which belong to the Semitic branch and the other to the Chadic branch. These 5 languages can be grouped into one language group. For example, the training dataset for the Afro-Asiatic language family can be called the Afro-Asiatic training set.

[0088] Preferably, the training and test sets of the multilingual neural machine translation system utilizing language family information can be derived from the OPUS-100 dataset; the training set may include 94 languages and 26 different language families.

[0089] Preferably, when training a multilingual neural machine translation system using language family information, Adam can be used as the optimizer, the label smoothing value can be set to 0.1, and the learning rate scheduler can adopt an inverse square root strategy.
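Two of the training settings named above can be sketched in plain Python. The label smoothing value of 0.1 comes from the text; the peak learning rate and warmup length are hypothetical values, since the patent does not state them. The smoothed loss uses one common formulation in which the smoothing mass is spread uniformly over the whole vocabulary.

```python
import math

def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_steps=4000):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    peak_lr and warmup_steps are assumed values, not taken from the patent.
    """
    step = max(step, 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * math.sqrt(warmup_steps / step)

def label_smoothed_nll(log_probs, target_index, smoothing=0.1):
    """Cross-entropy against a target distribution smoothed over the vocabulary."""
    vocab_size = len(log_probs)
    nll = -log_probs[target_index]            # loss against the one-hot target
    uniform = -sum(log_probs) / vocab_size    # loss against the uniform distribution
    return (1.0 - smoothing) * nll + smoothing * uniform

# The rate peaks at the end of warmup and halves when the step count quadruples.
rates = [inverse_sqrt_lr(s) for s in (2000, 4000, 16000)]
loss = label_smoothed_nll([math.log(0.7), math.log(0.2), math.log(0.1)], target_index=0)
```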

[0090] The functional modules, including the routing module, gating module, language family-related feedforward neural network module and global feedforward neural network module, word embedding module A, self-attention mechanism module A, language family information module A, normalization module A, cross-attention mechanism module, language family information module B, normalization module B, feedforward neural network module A, language family information module C and normalization module C, word embedding module B, self-attention mechanism module B, language family information module D, normalization module D, feedforward neural network module B, language family information module E and normalization module E, can be applicable functional modules in the prior art, or can be constructed using existing hardware and software and conventional technical means.

[0091] The addition of letter codes after modules such as word embedding, normalization, language family information, feedforward neural network, and self-attention mechanism indicates their different positions within the system. Functional modules with different letter codes can use the same structure.

[0092] The working principle of this invention: The multilingual neural machine translation model is an extension of bilingual neural machine translation, also built upon neural networks. The essence of neural networks in performing machine translation tasks still relies on the semantic information contained in the text. Multilingual machine translation typically uses a sequence-to-sequence neural network framework to construct an end-to-end translation model, directly modeling the process of translating from the source language to the target language. During encoding, the semantic information of the source language is extracted, and during training, this extracted semantic information is mapped to an appropriate vector space. In the decoding stage, the semantics of the source language are recovered and transformed into the target language. Therefore, how to better and more fully utilize the semantic information contained in language is also a research area that needs to be considered in multilingual machine translation.

[0093] Specifically, suppose there is a source language sentence $X = \{x_1, x_2, \ldots, x_n\}$ whose corresponding target language sentence is $Y = \{y_1, y_2, \ldots, y_m\}$. The goal of neural machine translation is to translate the source language sentence X into the corresponding target language sentence Y. Multilingual machine translation additionally adds language tokens, denoted $\mathrm{lang}_{src}$ and $\mathrm{lang}_{tgt}$, to the source language sentence and the target language sentence respectively. This transforms the original X and Y into $X = \{\mathrm{lang}_{src}, x_1, x_2, \ldots, x_n\}$ and $Y = \{\mathrm{lang}_{tgt}, y_1, y_2, \ldots, y_m\}$. This is how multilingual neural machine translation is achieved.
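The language-token step can be illustrated with a short sketch. The token format `<lang:xx>` and the function name are hypothetical; the patent only states that a language symbol is added to the source and target sentences.

```python
def add_language_token(tokens, lang):
    """Prepend a language token such as '<lang:de>' to a token sequence."""
    return [f"<lang:{lang}>"] + list(tokens)

source = add_language_token(["Guten", "Morgen"], "de")
target = add_language_token(["Good", "morning"], "en")
print(source)  # ['<lang:de>', 'Guten', 'Morgen']
```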

[0094] Similar to neural machine translation, multilingual machine translation employs an encoder-decoder architecture, in which language pairs are processed as follows:

[0095] $h = \mathrm{Encoder}([\mathrm{lang}_{src}, x^{(i)}])$

[0096]

[0097] Where $x^{(i)}$ represents the i-th source language sentence, and the decoder-side input (its symbol is missing here) represents the first j words of the corresponding i-th target language sentence.

[0098] Next, the decoded information is mapped using the softmax function to obtain the corresponding probability values, thus yielding the overall probability of the target language sentence, as shown below:

[0099]

[0100]
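The softmax mapping and the resulting sentence probability can be sketched as follows. This assumes the standard autoregressive factorization, in which the sentence probability is the product of per-position probabilities of the reference words; the logits are toy values and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_prob(step_logits, target_ids):
    """Sum of log-probabilities of the target word at each decoding position."""
    return sum(
        math.log(softmax(logits)[y]) for logits, y in zip(step_logits, target_ids)
    )

# Toy decoder outputs: two positions over a three-word vocabulary.
step_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
target_ids = [0, 2]
log_p = sequence_log_prob(step_logits, target_ids)
sentence_prob = math.exp(log_p)  # product of the per-position probabilities
```

Working in log space avoids numerical underflow when the per-position probabilities of a long sentence are multiplied together.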

[0101] Because multilingual machine translation models use datasets containing a large number of different languages during training, and these languages can often be classified according to language families, languages within the same language family have higher text similarity and more similar semantic information.

[0102] To fully utilize language family information and integrate it into the model, the languages in the dataset must first be classified by language family. The classification used in this invention is based on Ethnologue (Languages of the World), published by SIL International, and primarily follows the language branch to which each language belongs. Branches are chosen rather than top-level language families because a single family often contains many languages, and the relationships among them are not as close as those among languages within one branch. For example, the Indo-European language family contains a large number of languages, so they are classified according to the branch to which each language belongs, such as Germanic, Baltic, and Celtic. However, for the Afro-Asiatic language family, the dataset used in this invention contains only 5 languages, 4 of which belong to the Semitic branch and 1 to the Chadic branch. Classifying them by branch would not achieve the desired effect of this invention; therefore, these 5 languages are grouped into one category.
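The grouping rule described above can be sketched as a small function. The language list and its labels are illustrative examples rather than data from the patent, and the size threshold `min_family_size` is a hypothetical stand-in for the patent's informal criterion that a family with too few languages is kept as one group.

```python
from collections import defaultdict

# Illustrative (family, branch) labels; not taken from the patent text.
LANGUAGE_INFO = {
    "de": ("Indo-European", "Germanic"),
    "sv": ("Indo-European", "Germanic"),
    "lt": ("Indo-European", "Baltic"),
    "lv": ("Indo-European", "Baltic"),
    "ga": ("Indo-European", "Celtic"),
    "cy": ("Indo-European", "Celtic"),
    "ar": ("Afro-Asiatic", "Semitic"),
    "he": ("Afro-Asiatic", "Semitic"),
    "am": ("Afro-Asiatic", "Semitic"),
    "mt": ("Afro-Asiatic", "Semitic"),
    "ha": ("Afro-Asiatic", "Chadic"),
}

def assign_groups(language_info, min_family_size=6):
    """Group by branch, but keep small top-level families as a single group."""
    family_sizes = defaultdict(int)
    for family, _ in language_info.values():
        family_sizes[family] += 1
    groups = {}
    for lang, (family, branch) in language_info.items():
        # Assumed rule: a family below the threshold is not split into branches.
        groups[lang] = family if family_sizes[family] < min_family_size else branch
    return groups

groups = assign_groups(LANGUAGE_INFO)
```

With these toy labels, the six Indo-European languages are split into Germanic, Baltic, and Celtic groups, while the five Afro-Asiatic languages share one group.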

[0103] The model architecture used in this invention is based on the Transformer model. Improvements are made to the Transformer model by adding adapters at appropriate locations to realize the model of this invention.

[0104] This invention conducts translation experiments on the publicly available OPUS-100 dataset, involving 94 languages, 26 different language families, and 10 language clans. Through thorough experimental demonstration and analysis of the experimental results, it is demonstrated that the machine translation system proposed in this invention, which utilizes language family information, alleviates the performance degradation problem of large-scale multilingual machine translation on high-resource languages and further improves the translation performance on low-resource languages.

[0105] Table 1 presents the experimental results on the OPUS-100 dataset. Translation performance is measured by the BLEU metric, computed on the trained model's predictions for the test set.

[0106] Table 1: Scores of various models in the one-to-many (O2M) translation direction

[0107]

[0108] As shown in Table 1, the LBGM proposed in this invention improves upon the baseline and CLSR models across all language pairs, with even greater improvements for medium-resource and low-resource languages. The LBGM model improves upon the baseline model by 1.59 BLEU and 1.90 BLEU for medium-resource and low-resource languages, respectively. Compared to CLSR, it brings improvements of 0.87 BLEU and 1.46 BLEU for these two categories, while performance on high-resource languages remains on par without a significant decline.

[0109] Furthermore, this invention categorizes the language families used in the experiments according to the richness of languages within each family, resulting in three categories: language families containing high-resource, medium-resource, and low-resource languages (i.e., rich language families); language families containing only high-resource languages; and language families containing only one or two relatively independent languages. By comparing the experimental results across these three categories, more in-depth conclusions are drawn.

[0110] Table 2: Scores for language families containing rich languages in the O2M direction

[0111]

[0112] In Table 2:

[0113] da stands for Danish.

[0114] de stands for German.

[0115] is stands for Icelandic.

[0116] nl stands for Dutch.

[0117] no stands for Norwegian.

[0118] sv stands for Swedish.

[0119] nn stands for Norwegian Nynorsk.

[0120] af stands for Afrikaans.

[0121] nb stands for Norwegian Bokmål.

[0122] fy stands for West Frisian.

[0123] li stands for Limburgish.

[0124] yi stands for Yiddish.

[0125] Table 2 presents the experimental results for language families that contain a rich set of languages. Here, the Germanic and Romance branches of the Indo-European language family are selected for the results. As can be seen from Table 2, for all medium-resource and low-resource languages, the LBGM proposed in this invention shows significant improvements over the CLSR model and the baseline model. For high-resource languages, it achieves better results than the CLSR model in most languages, and in some languages, it achieves results comparable to CLSR, while also showing improvements over the baseline model.

[0126] Table 3: Scores for language families containing only high-resource languages in the O2M direction

[0127]

[0128] In Table 3:

[0129] cs stands for Czech.

[0130] pl stands for Polish.

[0131] sk stands for Slovak.

[0132] lt stands for Lithuanian.

[0133] lv stands for Latvian.

[0134] Table 3 illustrates the case where only high-resource languages exist within the same language family, focusing on the West Slavic and Baltic language families within the Indo-European language family. The results in Table 3 show that, when the language family contains only high-resource languages, the LBGM proposed in this invention achieves performance comparable to CLSR and shows improvement over the baseline system, alleviating the problem of insufficient training for high-resource languages during joint training.

[0135] Table 4: Scores for language families containing only isolated languages in the O2M direction

[0136]

[0137] In Table 4:

[0138] ja represents Japanese.

[0139] ko represents Korean.

[0140] eo stands for Esperanto.

[0141] Table 4 presents the translation performance in the last scenario described in this invention, focusing on isolated languages: Esperanto, Japanese, and Korean. The results show significant differences, which this invention attributes to the inherent characteristics of each language. For Japanese and Korean, the proposed model performs worse than the CLSR model. This is because Japanese and Korean are relatively isolated, lacking similarities with other languages, making it difficult to achieve sufficient training and positive transfer from other languages during training. However, the difference compared to the baseline model is small. Esperanto, being an artificial language based on the Indo-European language family, contains semantic information similar to other Indo-European languages, potentially leading to positive transfer from related languages and resulting in stronger performance when using LBGM.

[0142] Figure 2 displays the cosine similarity between the shared parameters of language families, presented as a heatmap. The horizontal and vertical axes represent the corresponding language family codes. Cosine similarity measures the relationship between two vectors by calculating the angle between them. A cosine similarity closer to 1 indicates that the angle between the two vectors is approximately 0, meaning the two vectors are more similar. The specific calculation formula is as follows:

[0143] $\cos(V_a, V_b) = \dfrac{V_a \cdot V_b}{\lVert V_a \rVert \, \lVert V_b \rVert}$

[0144] Here ||·|| denotes the L2 norm. This invention uses the weights of the language family-related parameters in the LBGM module to calculate the cosine similarity, where V denotes the vector obtained by reshaping the FFN weights of a language family-related parameter.
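The similarity computation behind the heatmap can be sketched in a few lines: each family's FFN weight matrix is flattened into a vector, and the pairwise cosine similarity is computed. The weight values below are toy numbers, and the function names are hypothetical.

```python
import math

def flatten(matrix):
    """Reshape a weight matrix (list of rows) into a single vector."""
    return [v for row in matrix for v in row]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(family_weights):
    """Pairwise cosine similarity between the flattened weights of each family."""
    names = list(family_weights)
    vectors = {name: flatten(family_weights[name]) for name in names}
    matrix = [
        [cosine_similarity(vectors[a], vectors[b]) for b in names] for a in names
    ]
    return names, matrix

# Toy family-specific FFN weights keyed by language family code.
toy_weights = {
    "ES": [[0.9, 0.1], [0.2, 0.8]],
    "WS": [[0.8, 0.2], [0.1, 0.9]],
    "JA": [[-0.5, 0.7], [0.6, -0.4]],
}
names, matrix = similarity_matrix(toy_weights)
```

Each row of `matrix` corresponds to one row of the heatmap, and the diagonal entries are 1 because every vector is compared with itself.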

[0145] In Figure 2, darker colors indicate greater similarity between the shared parameters of two language families. The most prominent areas are: the rectangle in the upper left corner bounded by BA to WS, which contains languages belonging to the Indo-European language family; the region in the middle corresponding to JA, KO, and ST, representing the CJK languages (Chinese, Japanese, and Korean); and the region in the lower right corner bounded by OG, KAL, and KI, representing the Turkic language family. Within the Indo-European region, the darker colors indicate that the language families within it share more similar parameters in the model. Specifically, ES, SS, and WS all belong to the Slavic language family, and the diagram shows greater similarity among them. For the CJK languages, although they do not belong to the same language family, their geographical proximity leads to more similar parameters learned in the model. The same pattern can be observed for the Turkic language family.

[0146] The embodiments described above are only used to illustrate the technical ideas and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly. The patent scope of the present invention should not be limited by these embodiments. That is, any equivalent changes or modifications made in accordance with the spirit disclosed in the present invention still fall within the patent scope of the present invention.

Claims

1. A multilingual neural machine translation system utilizing language family information, characterized in that, The system is built on the Transformer basic model. A language family information module is added after each self-attention mechanism module and each feedforward neural network module in the Transformer basic model. The language family information module takes language family information and the output of the preceding module as input, and generates and outputs a vector incorporating the language family information. The language family information module includes a routing module, a gating module, a language family-related feedforward neural network module, and a global feedforward neural network module. The output of the preceding module is input to the routing module, the gating module, and the global feedforward neural network module respectively. The routing module also takes the language family information as input, combines the output of the preceding module with the language family information, and inputs the result to the language family-related feedforward neural network module. The outputs of the gating module, the global feedforward neural network module, and the language family-related feedforward neural network module are combined and used as the output of the language family information module.

2. The multilingual neural machine translation system utilizing language family information according to claim 1, characterized in that, The system includes an encoder component, which includes a multi-layer encoder. Each encoder layer includes a self-attention mechanism module A, a language family information module A, a normalization module A, a feedforward neural network module A, a language family information module B, and a normalization module B connected in sequence.

3. The multilingual neural machine translation system utilizing language family information according to claim 2, characterized in that, The input data for each encoder layer includes language family information for the current language.

4. The multilingual neural machine translation system utilizing language family information according to claim 1, characterized in that, The system also includes a decoder component, which comprises a multi-layer decoder. Each layer of the decoder includes a self-attention mechanism module B, a language family information module C, a normalization module C, a cross-attention mechanism module, a language family information module D, a normalization module D, a feedforward neural network module B, a language family information module E, and a normalization module E, which are connected in sequence.

5. The multilingual neural machine translation system utilizing language family information according to claim 4, characterized in that, The input data for each layer of the decoder includes language family information for the current language.

6. A multilingual neural machine translation method utilizing language family information, based on the multilingual neural machine translation system utilizing language family information according to any one of claims 1 to 5, characterized in that, Languages are classified according to language families.

7. The multilingual neural machine translation method utilizing language family information according to claim 6, characterized in that, The training dataset for the Indo-European language family is divided into Germanic, Baltic, and Celtic languages. The training dataset for the Afro-Asiatic language family contains five languages, four of which belong to the Semitic language family and one to the Chadic language family. These five languages are grouped into one language family.

8. The method of claim 6, wherein, The training and test sets of the multilingual neural machine translation system utilizing language family information are derived from the OPUS-100 dataset; the training set includes 94 languages and 26 different language families.

9. The method of claim 6, wherein, The multilingual neural machine translation system utilizing language family information is trained using Adam as the optimizer, with a label smoothing value of 0.1, and a learning rate scheduler employing the inverse square root strategy.
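
The inverse square root strategy named in claim 9 is commonly implemented as a linear warm-up followed by a decay proportional to the inverse square root of the step number. The sketch below illustrates that common form only. The claim does not state a peak learning rate, a warm-up length, or an initial learning rate, so `peak_lr`, `warmup_steps`, and `init_lr` are hypothetical values chosen for illustration, as is the function name.

```python
import math


def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_steps=4000, init_lr=1e-7):
    """Learning rate at a given 1-based update step.

    All default values are assumptions for illustration; the claim only
    names the inverse square root strategy.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if step <= warmup_steps:
        # Linear warm-up from init_lr to peak_lr.
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    # After warm-up, decay proportionally to 1 / sqrt(step).
    return peak_lr * math.sqrt(warmup_steps / step)


# Learning rate at a few steps: rising during warm-up, then decaying.
schedule = [inverse_sqrt_lr(s) for s in (1, 2000, 4000, 16000)]
```

With these assumed values the rate peaks at step 4000 and has halved by step 16000, since quadrupling the step number halves the inverse square root term.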