A multilingual translation method, electronic device and storage medium
By using dynamic scoring and terminology integrity reward correction in the preference translation model, the problems of resource imbalance and terminology translation distortion in multilingual translation are solved, thereby improving translation accuracy and reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2026-01-28
- Publication Date
- 2026-06-19
AI Technical Summary
Existing multilingual translation technologies suffer from problems such as uneven resource processing, terminology translation distortion and evaluation defects, and inefficient construction of preference data, resulting in insufficient accuracy in multilingual translation.
A preference-based translation model is adopted, including a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic comparison preference optimization module. The translation accuracy is improved through dynamic scoring and terminology integrity reward correction.
It effectively overcomes the bottleneck of resource imbalance caused by multilingual unified training, prevents mistranslation or omission of professional terminology, and significantly improves the reliability and security of translation in professional fields.
Figure CN122242535A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the technical field of multilingual translation, and in particular to a multilingual translation method, an electronic device, and a storage medium.
Background Technology
[0002] Current multilingual translation technologies have achieved cross-language transfer learning through a shared parameter architecture, significantly improving the translation performance of low-resource languages.
[0003] However, current technical solutions suffer from several problems. First, resource imbalance is not handled: current preference learning applies a fixed strategy to all languages, such as uniformly generating n candidate translations and fixing the loss hyperparameters. In practice, high-resource languages (e.g., English-French) require exploration of diverse expressions to avoid mediocre outputs, while low-resource languages (e.g., Swahili) need to avoid noise interference and strengthen basic semantics. This static strategy over-perturbs low-resource languages during preference learning while leaving high-resource languages underexplored. Second, terminology translation is distorted and evaluation is deficient: translations of industry-specific terms often suffer from semantic stripping or conceptual confusion, and existing evaluation tools rely on flawed reference translations. Reference quality is especially low for low-resource languages, which distorts the model's learning objectives. Third, preference data construction is inefficient: current solutions do not allocate scoring resources dynamically according to language resources. Low-resource languages need strong models for accurate discrimination because their candidate quality is poor, whereas high-resource languages can rely on lightweight evaluators. The current static process wastes computational resources and produces unbalanced annotation quality, which reduces the accuracy of multilingual translation.
Summary of the Invention
[0004] The main technical problem addressed by this application is to provide a multilingual translation method, an electronic device, and a storage medium that can effectively perform preference translation on the text to be translated, thereby improving the accuracy of multilingual translation.
[0005] To address the aforementioned technical problems, this application provides a multilingual translation method, comprising: acquiring a text to be translated; inputting the text to be translated into a trained preference translation model to obtain a target language text, wherein the preference translation model includes a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic comparison preference optimization module; the ranking module is configured to determine the resource level of the text to be translated; the dynamic candidate translation generation module is configured to determine multiple initial scores for the text to be translated; the terminology coverage reward correction module is configured to perform terminology coverage reward correction on the initial scores to determine the preferred translation and non-preferred translation of the text to be translated; the triplet construction module is configured to construct triplet data of the text to be translated using the resource level, the preferred translation, and the non-preferred translation; the dynamic comparison preference optimization module is configured to perform terminology integrity reward correction using the triplet data to obtain the target language text; and outputting the target language text as the translation result of the text to be translated.
[0006] In some embodiments, the training process of the preference translation model is as follows: Training text is acquired, and the resource level of the training text is determined using the level classification module; multiple initial scores corresponding to the training text are acquired using various preset scoring methods of the dynamic candidate translation generation module, and the terminology coverage reward correction module is used to correct the terminology coverage reward of the multiple initial scores to determine the preference translation and non-preference translation of the training text; target triplet data for the training text is determined by the triplet data construction module using the preference translation, the non-preference translation, the training text, and the resource level; the dynamic comparison preference optimization module uses the target triplet data to perform terminology integrity reward correction, thereby training the basic translation model to obtain the trained preference translation model.
[0007] In some embodiments, obtaining training text and determining the resource level of the training text using the grading module includes: obtaining a preset multilingual training set as the training text; obtaining, using the grading module, the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality of the training text; and determining the resource level of the training text based on the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality.
[0008] In some embodiments, obtaining multiple initial scores corresponding to the training text using the various preset scoring methods of the dynamic candidate translation generation module, and performing terminology coverage reward correction on the multiple initial scores using the terminology coverage reward correction module to determine the preferred and non-preferred translations of the training text, includes: obtaining multiple initial scores corresponding to the training text using each preset scoring method in the dynamic candidate translation generation module; performing terminology coverage reward correction on each initial score using the terminology coverage reward correction module; and ranking the corrected scores, determining the translations corresponding to the top-ranked scores as the preferred translations and the translations corresponding to the bottom-ranked scores as the non-preferred translations.
[0009] In some embodiments, using the terminology coverage reward correction module to perform terminology coverage reward correction on each initial score, and ranking the corrected scores to determine the top-ranked translations as preferred translations and the bottom-ranked translations as non-preferred translations, includes: performing fine-grained terminology verification on the training text to determine the terminology recognition type of the training text; in response to the terminology recognition type being incorrect, inputting the training text into the terminology database to obtain a first total number of terms in the training text, and inputting the translation corresponding to the initial score into the terminology database to obtain the number of correctly translated terms and the number of correctly translated key terms; determining a first value using the first total number of terms, the number of correctly translated terms, and a first preset weight; determining a second value using the first total number of terms, the number of correctly translated key terms, and a second preset weight; determining the term reward score of the terminology coverage reward correction module using the first value and the second value; determining a third value using a third preset weight and the initial score; determining a fourth value using a fourth preset weight, the term reward score, and a scaling factor; determining the final translation score corresponding to the training text using the third value and the fourth value; and determining the preferred translation and the non-preferred translation of the training text according to the ranking of the final translation scores.
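The scoring steps in [0009] can be sketched in Python as follows. The paragraph names the inputs of each value but not the exact arithmetic, so this sketch assumes that the first and second values are weighted ratios of correctly translated terms (and correctly translated key terms) to the total number of terms, that the term reward score is their sum, and that the final translation score is the weighted initial score plus the weighted, scaled term reward. The class `TermStats`, the function names, and the default weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermStats:
    """Hypothetical container for the term counts named in [0009]."""
    total_terms: int        # first total number of terms found in the source text
    correct_terms: int      # terms the candidate translation renders correctly
    correct_key_terms: int  # key terms the candidate translation renders correctly


def term_reward(stats: TermStats, w1: float = 0.6, w2: float = 0.4) -> float:
    """Term reward score = first value + second value (assumed additive form)."""
    if stats.total_terms == 0:
        return 0.0  # assumption: a text with no database terms earns no reward
    first_value = w1 * stats.correct_terms / stats.total_terms
    second_value = w2 * stats.correct_key_terms / stats.total_terms
    return first_value + second_value


def final_translation_score(initial_score: float, stats: TermStats,
                            w3: float = 0.8, w4: float = 0.2,
                            scale: float = 1.0) -> float:
    """Final score = third value + fourth value (assumed additive form)."""
    third_value = w3 * initial_score
    fourth_value = w4 * scale * term_reward(stats)
    return third_value + fourth_value


# Example: 10 terms, 8 translated correctly, including 3 key terms.
stats = TermStats(total_terms=10, correct_terms=8, correct_key_terms=3)
print(round(term_reward(stats), 4))                   # 0.6
print(round(final_translation_score(0.9, stats), 4))  # 0.84
```

With this additive form, a candidate with a slightly lower initial score can overtake a competitor if it renders noticeably more terms correctly, which is the behaviour [0009] aims for.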
[0010] In some embodiments, determining the target triplet data of the training text using the triplet construction module based on the preferred translation, the non-preferred translation, the training text, and the resource level includes: determining, using the triplet construction module, the preferred translation, the non-preferred translation, and the training text as initial triplet data, wherein one set of initial triplet data is generated when the resource level is low, two sets are generated when the resource level is medium, and four sets are generated when the resource level is high; and obtaining an evaluation score for each set of initial triplet data and selecting the top-ranked sets by evaluation score as the target triplet data.
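A minimal sketch of the resource-dependent triplet construction in [0010], assuming the candidates are already sorted best-first by final translation score. The paragraph fixes only the number of sets per resource level and the top-ranked selection; the pairing rule (i-th best against i-th worst), the `evaluate` callable, and the cutoff `k` are assumptions made for illustration.

```python
from typing import Callable, List, NamedTuple, Sequence

# Number of initial triplet sets per resource level, as stated in [0010].
TRIPLETS_PER_LEVEL = {"low": 1, "medium": 2, "high": 4}


class Triplet(NamedTuple):
    source: str
    preferred: str
    non_preferred: str


def build_initial_triplets(source: str, ranked: Sequence[str], level: str) -> List[Triplet]:
    """Pair the i-th best candidate with the i-th worst (assumed pairing rule).

    `ranked` must be sorted best-first by final translation score.
    """
    wanted = TRIPLETS_PER_LEVEL[level]
    triplets: List[Triplet] = []
    for i in range(wanted):
        best, worst = i, len(ranked) - 1 - i
        if best >= worst:  # not enough distinct candidates left to form a pair
            break
        triplets.append(Triplet(source, ranked[best], ranked[worst]))
    return triplets


def select_target_triplets(triplets: Sequence[Triplet],
                           evaluate: Callable[[Triplet], float],
                           k: int) -> List[Triplet]:
    """Keep the k triplets with the highest evaluation score."""
    return sorted(triplets, key=evaluate, reverse=True)[:k]


candidates = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"]  # best-first
print(len(build_initial_triplets("src", candidates, "low")))   # 1
print(len(build_initial_triplets("src", candidates, "high")))  # 4
```

Generating fewer triplets for low-resource languages keeps noisy, closely scored pairs out of their preference data, while high-resource languages receive more pairs to explore diverse expressions, matching the motivation in [0003].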
[0011] In some embodiments, the preference translation model further includes a loss function module, and the method includes: obtaining a preference learning term and a negative log-likelihood term; determining a first loss value using the preferred translation, the non-preferred translation, the training text, and the preference learning term; determining a second loss value using the preferred translation, the training text, a fourth preset weight, and the negative log-likelihood term; and determining a preset loss function from the first loss value and the second loss value, the preset loss function constituting the loss function module.
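The paragraph does not specify the preference learning term. One plausible reading, given the ALMA and preference-optimization context in [0020], is a contrastive preference term of the kind used in CPO, −log σ(β(log π(y_w|x) − log π(y_l|x))), plus a weighted negative log-likelihood on the preferred translation. The sketch below assumes that form; `beta`, `nll_weight` (standing in for the fourth preset weight), and the length normalisation are assumptions, and sequence log-probabilities are taken as given rather than computed by a real model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(logp_preferred: float, logp_non_preferred: float,
                    beta: float = 0.1) -> float:
    """First loss value (assumed CPO-style): -log sigmoid(beta * log-prob margin)."""
    return -log_sigmoid(beta * (logp_preferred - logp_non_preferred))


def nll_loss(logp_preferred: float, num_tokens: int, weight: float = 1.0) -> float:
    """Second loss value: weighted, length-normalised NLL of the preferred translation."""
    return weight * (-logp_preferred / num_tokens)


def combined_loss(logp_preferred: float, logp_non_preferred: float, num_tokens: int,
                  beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """Preset loss function = first loss value + second loss value."""
    return (preference_loss(logp_preferred, logp_non_preferred, beta)
            + nll_loss(logp_preferred, num_tokens, nll_weight))


# Sequence log-probabilities under the model being trained (illustrative numbers).
print(combined_loss(logp_preferred=-12.0, logp_non_preferred=-20.0, num_tokens=12))
```

The NLL term keeps the model anchored to the preferred translation, so the preference term cannot lower the loss merely by pushing both candidates' probabilities down.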
[0012] In some embodiments, the method further includes: obtaining a second total number of terms involved in the translation of the training text and the number of correctly translated terms; determining a term error rate using the second total number of terms and the number of correctly translated terms; determining a penalty term using a penalty weight and the term error rate, wherein the penalty weight is negatively correlated with the resource level; determining a third loss value using the penalty weight, the term error rate, and the penalty term; and determining the preset loss function using the first loss value, the second loss value, and the third loss value.
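Assuming the term error rate is the fraction of incorrectly translated terms and the third loss value is simply the penalty weight times that rate, the penalty in [0012] reduces to a few lines. The per-level weights in `PENALTY_WEIGHT` are hypothetical values chosen only to satisfy the stated negative correlation with the resource level, and the additive combination of the three loss values is also an assumption.

```python
# Hypothetical per-level penalty weights: they decrease as the resource level rises,
# so terminology errors are penalised most heavily for low-resource languages.
PENALTY_WEIGHT = {"low": 1.0, "medium": 0.5, "high": 0.25}


def term_error_rate(total_terms: int, correct_terms: int) -> float:
    """Assumed definition: share of the second total number of terms translated incorrectly."""
    if total_terms == 0:
        return 0.0
    return (total_terms - correct_terms) / total_terms


def term_penalty_loss(total_terms: int, correct_terms: int, level: str) -> float:
    """Third loss value, assumed to be penalty weight x term error rate."""
    return PENALTY_WEIGHT[level] * term_error_rate(total_terms, correct_terms)


def preset_loss(first_loss: float, second_loss: float,
                total_terms: int, correct_terms: int, level: str) -> float:
    """Preset loss function = first + second + third loss values (assumed sum)."""
    return first_loss + second_loss + term_penalty_loss(total_terms, correct_terms, level)


print(preset_loss(0.5, 1.0, total_terms=4, correct_terms=2, level="medium"))  # 1.75
```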
[0013] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide an electronic device, the electronic device including a memory and a processor coupled to the memory, the memory storing at least one computer program, which, when loaded and executed by the processor, is used to implement the method as described above.
[0014] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide a computer-readable storage medium storing at least one program, which, when loaded and executed by a processor, implements the method described above.
[0015] Unlike current technologies, the multilingual translation method provided in this application includes: acquiring the text to be translated; inputting the text to be translated into a trained preference translation model to obtain the target language text of the text to be translated, wherein the preference translation model includes a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic alignment preference optimization module; the ranking module is configured to determine the resource level of the text to be translated; the dynamic candidate translation generation module is configured to determine multiple initial scores of the text to be translated; the terminology coverage reward correction module is configured to perform terminology coverage reward correction on the initial scores to determine the preference translation and non-preference translation of the text to be translated; the triplet construction module is configured to construct triplet data of the text to be translated using the resource level, preference translation, and non-preference translation; the dynamic alignment preference optimization module is configured to perform terminology integrity reward correction using the triplet data to obtain the target language text; and outputting the target language text as the translation result of the text to be translated. In this application, a preference translation model is obtained through training, and then the preference translation model is used to translate the text to be translated to obtain the translation result. This can overcome the bottleneck of high and low resource imbalance caused by multilingual unified training, and solve the technical defect of neglecting professional terms in preference learning. 
It can effectively improve translation accuracy, prevent mistranslation or omission of professional terminology, and significantly improve the reliability and security of translation in professional fields.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:
[0017] Figure 1 is a flowchart of an embodiment of the multilingual translation method of this application; Figure 2 is a schematic structural diagram of an embodiment of the electronic device of this application; Figure 3 is a schematic diagram of an embodiment of the computer-readable storage medium of this application.
Detailed Implementation
[0018] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be particularly noted that the following embodiments are for illustrative purposes only and do not limit the scope of the invention. Similarly, the following embodiments are only some, not all, embodiments of the present invention, and all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0019] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0020] Large-scale pre-trained models (such as mT5 and ALMA) combined with supervised fine-tuning (SFT) have become the mainstream paradigm. To further improve translation quality, preference optimization techniques (such as PPO and DPO) have been introduced into the training process, utilizing preference data generated by human preferences or strong models (such as GPT-4) to make the model output more in line with human values and language habits. However, existing methods have many problems. First, there is a lack of handling of resource imbalance: current preference learning adopts a fixed strategy for all languages, such as uniformly generating n candidate translations and fixing the loss hyperparameters. In practical applications, high-resource languages (such as English-French) need to explore diverse expressions to avoid mediocre outputs, while low-resource languages (such as Swahili) need to avoid noise interference and strengthen basic semantics. Static strategies lead to excessive perturbation of low-resource languages in preference learning, while high-resource languages are not explored sufficiently. Second, there are distortions in terminology translation and evaluation defects: the translation of industry terms (such as the cultural term "delivery person" and the medical term "insulin") often suffers from semantic stripping (such as literal translation as "takeaway guy") or conceptual confusion (such as mistranslation). Existing evaluation tools (such as BLEU and XCOMET) rely on flawed reference translations, especially for low-resource languages where the reference quality is poor (e.g., the Icelandic reference translation in FLORES-200 scores below 30 on BLEURT), leading to a distortion of the model's learning objectives. 
Furthermore, the construction of preference data is inefficient: current solutions do not dynamically allocate scoring resources based on language resources; low-resource languages require strong models (such as GPT-4) for accurate discrimination due to poor candidate quality, while high-resource languages can rely on lightweight evaluators (such as XCOMET). The current static workflow results in wasted computational resources (low-resource languages require an additional 4× candidates) and an imbalance in annotation quality, impacting the accuracy of multilingual translation.
[0021] Therefore, a multilingual translation method, electronic device, and storage medium are provided that can effectively perform preference translation on the text to be translated, thereby improving the accuracy of multilingual translation.
[0022] Please refer to Figure 1, which is a flowchart of an embodiment of the multilingual translation method of this application. It should be noted that, provided substantially the same result is obtained, the method of this application is not limited to the sequence of steps shown in Figure 1.
[0023] As shown in Figure 1, the multilingual translation method of this application may include the following operations.
[0024] S10. Obtain the text to be translated.
[0025] The text to be translated refers to the text that needs to be translated from the current language into another language.
[0026] Specifically, the original text to be translated is received; for example, the text to be translated input by the target object is received, and it may be in English, French, Chinese, or another language.
[0027] S20. Input the text to be translated into the trained preference translation model to obtain the target language text. The preference translation model includes a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic alignment preference optimization module. The ranking module is configured to determine the resource level of the text to be translated. The dynamic candidate translation generation module is configured to determine multiple initial scores of the text to be translated. The terminology coverage reward correction module is configured to perform terminology coverage reward correction on the initial scores to determine the preferred and non-preferred translations of the text to be translated. The triplet construction module is configured to construct triplet data of the text to be translated using the resource level, preferred translation, and non-preferred translation. The dynamic alignment preference optimization module is configured to perform terminology integrity reward correction using the triplet data to obtain the target language text.
[0028] The target language text refers to the translated text obtained by translating the text to be translated into the specified language.
[0029] Specifically, the basic translation model is trained using a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic comparison preference optimization module to obtain a trained preference translation model. The text to be translated is then input into the trained preference translation model to obtain the target language text of the text to be translated.
[0030] S30. Use the target language text as the translation result of the text to be translated and output it.
[0031] Specifically, after obtaining the target language text of the text to be translated, the target language text is used as the translation result of the text to be translated, and then the translation result is output.
[0032] In this embodiment, a preference translation model is obtained by training the base translation model through a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic comparison preference optimization module. The preference translation model is then used to translate the text to be translated, and the translation result is obtained. This can overcome the bottleneck of high and low resource imbalance caused by multilingual unified training, and solve the technical defect of neglecting professional terms in preference learning. It can effectively improve translation accuracy, prevent mistranslation or omission of professional terminology, and significantly improve the reliability and security of translation in professional fields.
[0033] In some embodiments, the training process of the preference translation model includes the following operations.
[0034] Obtain training texts and use the grading module to determine the resource level of the training texts.
[0035] Obtain multiple initial scores corresponding to the training text using the various preset scoring methods of the dynamic candidate translation generation module, and use the terminology coverage reward correction module to perform terminology coverage reward correction on the multiple initial scores to determine the preferred and non-preferred translations of the training text.
[0036] Use the triplet data construction module to determine the target triplet data of the training text from the preferred translation, the non-preferred translation, the training text, and the resource level.
[0037] Use the dynamic comparison preference optimization module to perform terminology integrity reward correction with the target triplet data, thereby training the base translation model to obtain the trained preference translation model.
[0038] The preference translation model includes at least a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet data construction module, and a dynamic comparison preference optimization module. Training text refers to the samples used to train the base translation model; these can be predefined and may include text in each existing language. Resource level refers to the grade assigned to a training text by the ranking module: a higher resource level indicates richer language resources and a larger user base, while a lower resource level indicates scarcer language resources and a smaller user base. The preset scoring methods refer to the several ways in which the dynamic candidate translation generation module scores the initial translation results; specifically, the dynamic candidate translation generation module may correspond to multiple translation models, each with its own score. The terminology coverage reward is a reward value measured by the number of correctly translated terms in the initial translation result. The preferred translation is the translation with the highest score, and the non-preferred translation is the translation with the lowest score. The target triplet data refers to the triplet data determined from the training text by the triplet data construction module. The terminology integrity reward is a reward value measured by the total number of terms matching the terminology database and the number of correctly translated terms. The base translation model refers to the initial multilingual translation model.
[0039] Specifically, after obtaining training texts that meet the training conditions, the resource level of the training texts is determined by the level classification module; the initial score corresponding to each preset scoring method of the training texts is obtained by the multiple preset scoring methods of the dynamic candidate translation generation module; the term coverage reward correction module is used to correct the term coverage reward of each initial score; and then the initial scores are sorted based on the corrected initial scores, with the translations corresponding to the initial scores in the top ranking being the preferred translations and the translations corresponding to the initial scores in the bottom ranking being the unpreferred translations.
[0040] It can be understood that both the preferred and the non-preferred translations may be good translations; they differ only in score.
[0041] In some embodiments, the base translation model can be obtained through the SFT (Supervised Fine-Tuning) stage. The SFT stage is the core transitional stage in which a Large Language Model (LLM) goes from a "general pre-trained model" to an "instruction-following model"; it is also the first step of the large-model alignment process, occurring after pre-training and before RM (Reward Model) training and RLHF (Reinforcement Learning from Human Feedback). The core objective of this stage is to use high-quality labeled instruction-response data so that the pre-trained model learns the mapping "understand the instruction intent → generate the expected response" while retaining the general knowledge learned during pre-training, thereby yielding the base translation model.
[0042] In some embodiments, obtaining training text and determining the resource level of the training text using a grading module may include the following operations: Obtain a pre-defined multilingual training set as training text.
[0043] The ranking module is used to obtain the number of parallel sentence pairs, terminology coverage, and monolingual linguistic quality of the training text.
[0044] The resource level of the training text is determined based on the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality.
[0045] Here, the preset multilingual training set refers to a predefined collection of texts in multiple languages; the number of parallel sentence pairs refers to the total number of sentence pairs in which a sentence in the current language and a sentence in the target language are semantically equivalent; the terminology coverage refers to the proportion of the professional terms in the terminology database that are covered. For example, if the medical terminology database contains 10,000 terms and the current language covers only 6,000 of them, the terminology coverage is 60%. The monolingual linguistic quality refers to monolingual fluency as evaluated by the perplexity (PPL) of a language model: the lower the perplexity, the higher the monolingual linguistic quality.
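The two less familiar statistics in [0045] can be computed as follows. This sketch assumes terms are compared as exact strings and that per-token natural-log probabilities from some language model are already available; it uses the standard definition PPL = exp(−(1/N) Σ log p(t_i)). The function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Set


def terminology_coverage(term_database: Set[str], covered_terms: Iterable[str]) -> float:
    """Share of terminology-database entries that the language covers."""
    if not term_database:
        return 0.0
    return len(term_database.intersection(covered_terms)) / len(term_database)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """PPL = exp(-mean natural-log probability per token); lower PPL = more fluent text."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# The medical example from [0045]: 6,000 of 10,000 database terms covered.
database = {f"term{i}" for i in range(10_000)}
print(terminology_coverage(database, (f"term{i}" for i in range(6_000))))  # 0.6
```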
[0046] Specifically, the training texts are determined from a preset multilingual training set. The grading module obtains the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality of the training texts, and the resource level of each training text is determined from these values. Because the monolingual linguistic quality is expressed as perplexity, a larger value indicates lower quality. If the number of parallel sentence pairs is less than a first preset value, or the terminology coverage is less than a first percentage, or the monolingual linguistic quality value is greater than a second preset value, the corresponding training text is classified as a low-resource language. If the number of parallel sentence pairs is greater than the first preset value but less than a third preset value, the terminology coverage is greater than the first percentage but less than a second percentage, and the monolingual linguistic quality value is less than the second preset value but greater than a fourth preset value, the corresponding training text is classified as a medium-resource language. If the number of parallel sentence pairs is greater than the third preset value, the terminology coverage is greater than the second percentage, and the monolingual linguistic quality value is less than the fourth preset value, the corresponding training text is classified as a high-resource language.
[0047] For example: low-resource languages have ≤ 1 million parallel sentence pairs, or terminology coverage ≤ 50%, or monolingual linguistic quality (perplexity) ≥ 80; medium-resource languages have 1 million < parallel sentence pairs ≤ 5 million, 50% < terminology coverage ≤ 80%, and 40 ≤ monolingual linguistic quality < 80; high-resource languages have > 5 million parallel sentence pairs, terminology coverage > 80%, and monolingual linguistic quality < 40. Accordingly, the first preset value may range from 100,000 to 2 million (e.g., 1 million); the second preset value from 60 to 100 (e.g., 80); the third preset value from 3 million to 10 million (e.g., 5 million); the fourth preset value from 10 to 50 (e.g., 40); the first percentage from 20% to 60% (e.g., 50%); and the second percentage from 70% to 90% (e.g., 80%).
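The example thresholds in [0047] translate into the classifier below. The source does not say how to grade a language that meets none of the three conditions (for instance, more than 5 million pairs and low perplexity but only 70% terminology coverage); this sketch assumes such mixed cases fall back to medium. The `Thresholds` field names mirror the preset values and percentages above and are otherwise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Example values from [0047]; field names are illustrative."""
    first_value: int = 1_000_000     # parallel pairs: low / medium boundary
    third_value: int = 5_000_000     # parallel pairs: medium / high boundary
    first_pct: float = 0.50          # terminology coverage: low / medium boundary
    second_pct: float = 0.80         # terminology coverage: medium / high boundary
    second_value: float = 80.0       # perplexity: low / medium boundary
    fourth_value: float = 40.0       # perplexity: medium / high boundary


def resource_level(pairs: int, coverage: float, ppl: float,
                   t: Thresholds = Thresholds()) -> str:
    # Low resource: any single deficiency is enough (logical OR).
    if pairs <= t.first_value or coverage <= t.first_pct or ppl >= t.second_value:
        return "low"
    # High resource: every indicator must clear the upper boundary (logical AND).
    if pairs > t.third_value and coverage > t.second_pct and ppl < t.fourth_value:
        return "high"
    # Medium band, plus mixed profiles the source leaves unspecified (assumed medium).
    return "medium"


# Plenty of parallel data but poor terminology coverage is still graded low.
print(resource_level(10_000_000, 0.30, 20.0))  # low
```

Checking the OR condition first means a single weak indicator is enough to pull a language down to low-resource, which is the extreme-case handling described in [0048].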
[0048] It can be understood that using a logical "OR" to handle extreme cases (for example, a language with sufficient parallel corpora but a severe terminology deficiency is still classified as low-resource) ensures that the resource grading reflects the real training difficulty. The thresholds support automatic quarterly calibration using BLEURT scores on a validation set (BLEURT is a semantic evaluation metric based on pre-trained Transformer language models). For example, the low-resource cap may be lowered from 1 million to 800,000 if model performance keeps declining within this range.
[0049] In some embodiments, obtaining multiple initial scores corresponding to the training text using the various preset evaluation methods of the dynamic candidate translation generation module, and using the terminology coverage reward correction module to perform terminology coverage reward correction on the multiple initial scores to determine the preferred and non-preferred translations of the training text, includes the following operations.
[0050] Multiple initial scores corresponding to the training text are obtained by using each preset evaluation method in the dynamic candidate translation generation module.
[0051] The terminology coverage reward correction module is used to apply terminology coverage reward correction to each initial score. The corrected initial scores are ranked; the top-ranked translation is determined as the preferred translation and the bottom-ranked translation as the non-preferred translation.
[0052] Among them, the dynamic candidate translation generation module refers to the model used to score the translation results; the initial score refers to the accuracy score of the translation results of the training text; the term coverage reward refers to the reward value determined by the number of correctly translated terms; and the ranking refers to the ranking of the corrected initial scores from high to low, that is, the first ranked is the preferred translation and the last ranked is the non-preferred translation.
[0053] Specifically, after the training text is translated by a basic translation model to obtain initial translation results, the translations are scored with the preset scoring methods of the dynamic candidate translation generation module to obtain the initial scores. There can be multiple preset scoring methods, each corresponding to a strong language model such as GPT-4, Qwen-235B, or Gemini, so that an initial score is obtained for the translation result of each strong language model. A terminology coverage reward is then applied to each initial score to obtain corrected scores, which are ranked. The translation result of the strong language model with the top-ranked corrected score is taken as the preferred translation, and that with the bottom-ranked corrected score as the non-preferred translation.
[0054] Furthermore, the following operations may also be included.
[0055] Fine-grained term validation is performed on the training text to determine the term recognition type of the training text.
[0056] In response to the term recognition type containing an error, the training text is input into the terminology database to obtain the first total number of terms in the training text, and the translated text corresponding to the initial score is input into the terminology database to obtain the corresponding number of correctly translated terms and number of correctly translated key terms.
[0057] The first value is determined by using the total number of terms, the number of correctly translated terms, and the first preset weight. The second value is determined by using the total number of terms, the number of correctly translated key terms, and the second preset weight. Then, the term reward score of the term coverage reward correction module is determined by using the first value and the second value.
[0058] The third value is determined using the third preset weight and the initial score, and the fourth value is determined using the fourth preset weight, the terminology bonus score, and the scaling factor.
[0059] The third and fourth values are used to determine the final translation score corresponding to the training text, and the preferred and non-preferred translations of the training text are determined by ranking the final translation scores.
[0060] The terminology database is a collection of professional terms from all fields. Fine-grained terminology verification matches a source term and its translation against the standard entries in the terminology database. The first total number of terms is the total number of terms involved in the current translation. The number of correctly translated terms is the number of terms in the current translation that are translated correctly according to the terminology database; the number of correctly translated key terms is the number of key terms (such as drug names and pathology names in the medical field) in the current translation that are translated correctly according to the terminology database. The first preset weight is the weight for basic terminology accuracy and applies to all terms; the second preset weight is the weight for the additional reward for key terms, which may be pre-annotated using a large model. The terminology reward score evaluates the correctly translated terms; a higher score indicates better terminology translation quality. The third preset weight is the weight of the initial score, and the fourth preset weight is the weight of the terminology reward score. The scaling factor adjusts the terminology reward differently for low-resource and high-resource languages.
[0061] Specifically, fine-grained terminology verification is performed on the training text to obtain its terminology recognition type. For example, a precise matching engine is built for recognition: when a source term in the training text and its translation both match the corresponding standard entry in the terminology database, the terminology recognition type is completely correct; when the edit distance (the character-level difference between two strings) between the translation and the standard entry is less than two, the type is a spelling error; when the translated term falls outside the concept category of the standard entry, the type is a conceptual error; and when the source term exists but its translation is missing, the type is an omission.
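The matching rules above can be sketched as follows. This is a minimal illustration, assuming the standard target-language entry for the source term has already been looked up in the terminology database and is passed in as `standard`; the function names are hypothetical. The text does not define how the "concept category" is checked, so this sketch simply assumes that any translation that is present, not exact, and not within edit distance 1 is a conceptual error.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def recognize_term(translation: Optional[str], standard: str) -> str:
    """Classify one term translation against its terminology-database entry."""
    if not translation:
        return "omission"          # source term present, translation missing
    if translation == standard:
        return "correct"           # exact match with the standard entry
    if edit_distance(translation, standard) < 2:
        return "spelling_error"    # edit distance below two
    # Assumption: every remaining mismatch is treated as a conceptual error.
    return "conceptual_error"
```

For instance, "insulim" against the standard entry "insulin" is classified as a spelling error, whereas a different drug name is classified as a conceptual error.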
[0062] Here, "containing an error" means that the terminology recognition type is a spelling error, a conceptual error, or an omission. When the terminology recognition type contains an error, the training text is compared with the standard entries in the terminology database to obtain the first total number of terms involved in the current translation of the training text, and the translated text corresponding to the initial score is compared with the standard entries to obtain the number of correctly translated terms and the number of correctly translated key terms in that translation.
[0063] The terminology reward score is calculated from the first value and the second value.
[0064] In this calculation, the first value is obtained from the first preset weight, the first total number of terms, and the number of correctly translated terms; the second value is obtained from the second preset weight, the first total number of terms, and the number of correctly translated key terms; and the two values together give the terminology reward score.
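The exact formula is not reproduced in this text, so the following is only a sketch under an assumed form: the first value is taken as the first preset weight times the fraction of correctly translated terms, the second value as the second preset weight times the fraction of correctly translated key terms, and the terminology reward score as their sum. The function name and the example weights 0.7 and 0.3 are hypothetical.

```python
def term_reward(n_terms: int, n_correct: int, n_key_correct: int,
                w1: float, w2: float) -> float:
    """Terminology reward score = first value + second value (assumed form).

    first value  = w1 * n_correct / n_terms      (basic terminology accuracy)
    second value = w2 * n_key_correct / n_terms  (extra reward for key terms)
    """
    if n_terms == 0:
        return 0.0  # assumption: no terms means no terminology reward
    first_value = w1 * n_correct / n_terms
    second_value = w2 * n_key_correct / n_terms
    return first_value + second_value
```

With 10 terms, 8 translated correctly (3 of them key terms), and weights 0.7 and 0.3, this sketch gives 0.56 + 0.09 = 0.65.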
[0065] Through the scaling factor, the error cost in multilingual translation can be amplified for low-resource languages and reduced for high-resource languages.
[0066] The final translation score is calculated from the third value and the fourth value.
[0067] Here, the third value is obtained from the third preset weight and the initial score, and the fourth value is obtained from the fourth preset weight, the terminology reward score, and the scaling factor. The third and fourth preset weights sum to 1.
[0068] Because the multiple preset strong language models yield multiple initial scores and hence multiple final translation scores, the preferred and non-preferred translations of the training text are determined by ranking the final translation scores: the translation with the highest final score is the preferred translation, and the one with the lowest final score is the non-preferred translation.
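The scoring and ranking step can be sketched as below. This assumes that the third value is the third preset weight times the initial score, that the fourth value is the product of the fourth preset weight, the scaling factor, and the terminology reward score, and that the final score is their sum; the text names these inputs but the exact combination is an assumption here. `Candidate` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    model: str            # e.g. "GPT-4", "Qwen-235B", "Gemini"
    initial_score: float  # score from a preset evaluation method
    term_reward: float    # terminology reward score


def final_score(c: Candidate, w3: float, w4: float, gamma: float) -> float:
    """Third value (w3 * initial score) + fourth value (w4 * gamma * term reward)."""
    if abs(w3 + w4 - 1.0) > 1e-9:
        raise ValueError("the third and fourth preset weights must sum to 1")
    return w3 * c.initial_score + w4 * gamma * c.term_reward


def preference_pair(cands: List[Candidate], w3: float, w4: float,
                    gamma: float) -> Tuple[Candidate, Candidate]:
    """Highest final score -> preferred; lowest final score -> non-preferred."""
    ranked = sorted(cands, key=lambda c: final_score(c, w3, w4, gamma),
                    reverse=True)
    return ranked[0], ranked[-1]
```

Raising the scaling factor `gamma` increases the influence of terminology on the ranking; with a large enough value, a candidate with a lower initial score but better terminology can become the preferred translation.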
[0069] In some embodiments, a dynamic weighting strategy can also be implemented: spelling errors and omissions are classified as severe and assigned their own weights (for example, 0.3 for spelling errors and 0.4 for omissions), which avoids the coarse granularity of traditional binary correct/incorrect judgments, while conceptual errors are classified as fatal and the affected candidate can be eliminated directly.
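One way to apply these weights is sketched below, assuming that a candidate's weighted terminology error is the sum of the weights of its per-term recognition types and that any conceptual error removes the candidate. How the weighted error feeds back into the scores is not specified in the text; the names here are hypothetical.

```python
from typing import Dict, List, Optional

# Example weights from the text; summing them per candidate is an assumption.
ERROR_WEIGHTS: Dict[str, float] = {
    "correct": 0.0,
    "spelling_error": 0.3,  # severe
    "omission": 0.4,        # severe
}


def weighted_term_error(types: List[str]) -> Optional[float]:
    """Weighted error for one candidate, or None if it must be eliminated."""
    if "conceptual_error" in types:
        return None  # fatal: the candidate is dropped outright
    return sum(ERROR_WEIGHTS[t] for t in types)


def filter_candidates(cands: Dict[str, List[str]]) -> Dict[str, float]:
    """Keep candidates without fatal errors, mapped to their weighted error."""
    kept: Dict[str, float] = {}
    for name, types in cands.items():
        err = weighted_term_error(types)
        if err is not None:
            kept[name] = err
    return kept
```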
[0070] In some embodiments, the target triplet data for the training text is determined using a triplet construction module based on preferred translation, unpreferred translation, training text, and resource level; including: The triplet construction module determines the preferred translation, non-preferred translation, and training text as the initial triplet data. Specifically, when the resource level is low, one set of initial triplet data is generated; when the resource level is medium, two sets of initial triplet data are generated; and when the resource level is high, four sets of initial triplet data are generated.
[0071] Obtain the evaluation score of each translation in each initial triple, and determine the target triplet data by ranking the three translations in the triple by evaluation score.
[0072] For texts with low resource levels, due to the scarcity of bilingual data, the model's understanding of semantics and terminology is weak, and excessively diversified decoding can easily introduce noise. Therefore, two strong language models can be invoked to generate only one high-quality translation each.
[0073] Ultimately, each low-resource training sample yields one triple $y = (y_{\text{GPT-4}}, y_{\text{Qwen235B}}, y_{\text{reference}})$, where $y_{\text{GPT-4}}$ and $y_{\text{Qwen235B}}$ are the translations generated by the GPT-4 model and the Qwen235B model, respectively, and $y_{\text{reference}}$ is the reference translation corresponding to the original training text.
[0074] For texts at the medium resource level, the model has a certain semantic understanding ability and can appropriately introduce diversity. Therefore, two strong language models can be called, each generating two translations: a high-quality version that reflects fidelity and accuracy, and a diversified version that reflects different sentence structures and synonym substitutions.
[0075] Ultimately, each training text generates 6 candidates, forming 2 triples.
[0076] For texts with high resource levels, the data is abundant, the basic model has strong understanding capabilities, and it is suitable for exploring the diversity of expressions. Two strong language models can be invoked, each generating four candidates: one most accurate translation and three reasonable variations with differences in style, word order, and vocabulary selection.
[0077] Ultimately, each training text yields 12 candidates, forming 4 triples.
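The per-level triple counts in paragraphs [0072] to [0077] can be sketched as follows. The text fixes only the counts (1, 2, and 4 candidates per model, giving 3, 6, and 12 candidates and 1, 2, and 4 triples); pairing the i-th candidate of each model with the reference translation, most accurate version first, is an assumption of this sketch, and the names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (GPT-4 candidate, Qwen235B candidate, reference)

# Candidates produced by each strong model per resource level (from the text).
CANDIDATES_PER_MODEL = {"low": 1, "medium": 2, "high": 4}


def build_triples(level: str, gpt_outputs: List[str], qwen_outputs: List[str],
                  reference: str) -> List[Triple]:
    """Pair the i-th candidate of each model with the reference translation."""
    k = CANDIDATES_PER_MODEL[level]
    if len(gpt_outputs) < k or len(qwen_outputs) < k:
        raise ValueError(f"{level}-resource text needs {k} candidates per model")
    return [(gpt_outputs[i], qwen_outputs[i], reference) for i in range(k)]
```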
[0078] All candidate translations are automatically scored using improved no-reference evaluation models (such as KIWI-XXL and XCOMET). On top of these no-reference models, this application introduces a terminology coverage reward correction module, which additionally weights exact matches of specialized terms in the score to alleviate terminology drift in low-resource languages.
[0079] For each triple $y = (y_{\text{GPT-4}}, y_{\text{Qwen235B}}, y_{\text{reference}})$, the three translations are scored by the two terminology-aware no-reference evaluation models, and the average of the two models' scores is taken as the final evaluation score of each translation: $s = (s_{\text{GPT-4}}, s_{\text{Qwen235B}}, s_{\text{reference}})$, where $s_{\text{GPT-4}}$ is the score of the GPT-4 translation, $s_{\text{Qwen235B}}$ is the score of the Qwen235B translation, and $s_{\text{reference}}$ is the score of the reference translation.
[0080] The highest-scoring translation is marked as the preferred translation $y_w$ and the lowest-scoring translation as the non-preferred translation $y_l$, i.e., $y_w = \arg\max_{y_i \in y} s_i$ and $y_l = \arg\min_{y_i \in y} s_i$; the remaining translation is discarded. Note that "non-preferred" does not necessarily mean wrong; rather, there is room for subtle improvement (e.g., slightly awkward phrasing or less precise terminology). For example, translating "machine learning" as "learning machine" (non-preferred) instead of "machine learning" (preferred) is understandable but not standard. Using such high-quality but suboptimal translations as negative samples lets the model learn finer-grained optimization signals.
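The scoring and selection in paragraphs [0079] and [0080] can be sketched as follows. The two scorers are stand-ins passed in as callables taking (source, translation); no real KIWI-XXL or XCOMET API is assumed here. Skipping a triple whose three scores all tie (no contrast signal) is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Callable, Optional, Tuple

Scorer = Callable[[str, str], float]  # (source, translation) -> quality score


def select_preference_pair(source: str, triple: Tuple[str, str, str],
                           scorer_a: Scorer, scorer_b: Scorer
                           ) -> Optional[Tuple[str, str]]:
    """Average two reference-free scores per translation; return (y_w, y_l)."""
    scores = [(scorer_a(source, y) + scorer_b(source, y)) / 2 for y in triple]
    best = max(range(len(triple)), key=lambda i: scores[i])
    worst = min(range(len(triple)), key=lambda i: scores[i])
    if best == worst:
        return None  # all scores tie: no contrast (assumed to be skipped)
    return triple[best], triple[worst]  # the middle translation is discarded
```

As the text notes, the reference translation is scored like any other candidate, so it is not guaranteed to be selected as $y_w$.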
[0081] For medium-resource and high-resource languages, each training text yields 2 and 4 triples, respectively. To balance the training data across high, medium, and low resources, all low-resource triplet data are included in the training set, while medium- and high-resource triplet data are selected with a sampling strategy. For example, for each high-resource training text, 0 to 4 of its triples are randomly selected as training data, so that the amounts of training data for high, medium, and low resources end up balanced. This keeps the data balanced, reduces overfitting in low-resource language translation, and lets the model perceive the characteristics of the different resource levels as fully as possible, preserving the diversity of high-resource translation and the accuracy of low-resource translation.
[0082] Furthermore, to prevent oversampling of training data due to the large number of candidates for high-resource languages, we employ dynamic sampling: Low-resource languages generate one triplet per data point, which is retained; Medium-resource languages generate two triplets per data point, with 1–2 triplets randomly sampled; High-resource languages generate four triplets per data point, with 0–4 triplets randomly sampled. Ultimately, the number of training triplets for high, medium, and low-resource languages is balanced, avoiding overfitting in low-resource models while preserving the diversity advantages of high-resource languages. Through differentiated generation and sampling, a triple balance of "quality, diversity, and data volume" is achieved, improving the model's robustness and adaptability in multilingual scenarios.
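The dynamic sampling rule can be sketched with Python's standard `random` module. The inclusive ranges come from the text; drawing the count uniformly within each range is an assumption, as are the function and table names.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Inclusive range of triples kept per training text (from the text).
SAMPLE_RANGE = {"low": (1, 1), "medium": (1, 2), "high": (0, 4)}


def sample_triples(level: str, triples: Sequence[T],
                   rng: random.Random) -> List[T]:
    """Randomly keep a level-dependent number of triples for one training text."""
    lo, hi = SAMPLE_RANGE[level]
    hi = min(hi, len(triples))  # never ask for more triples than exist
    lo = min(lo, hi)
    k = rng.randint(lo, hi)     # randint is inclusive on both ends
    return rng.sample(list(triples), k)
```

With uniform draws, the expected numbers of triples kept per training text are 1, 1.5, and 2 for low-, medium-, and high-resource languages, so the final balance across levels also depends on how many training texts each level contributes.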
[0083] In some embodiments, the preference translation model also includes a loss function module.
[0084] The preference learning term and the negative log-likelihood term are obtained.
[0085] The first loss value is determined using the preferred translation, the non-preferred translation, the training text, and the preference learning term. The second loss value is determined using the preferred translation, the training text, a fifth preset weight, and the negative log-likelihood term.
[0086] The first loss value and the second loss value are used to determine the preset loss function, thus determining the loss function module.
[0087] Among them, the preference learning term refers to the computational term that makes the model more inclined to generate preferred translations rather than non-preferred translations; the negative log-likelihood term refers to the standard supervised fine-tuning (SFT) loss, which is also known as the autoregressive negative log-likelihood (NLL) / cross-entropy loss.
[0088] Specifically, obtain the source language sentence $x$ of the training text, the corresponding preferred translation $y_w$ and non-preferred translation $y_l$, and the corresponding preference learning term, and denote the target model to be trained as $\pi_\theta$.
[0089] The calculation is as follows:
$$\mathcal{L}_{\text{prefer}} = -\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\left(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\right)\right], \qquad \mathcal{L}_{\text{NLL}} = -\mathbb{E}_{(x, y_w)}\left[\log \pi_\theta(y_w \mid x)\right], \qquad \mathcal{L} = \mathcal{L}_{\text{prefer}} + \lambda\, \mathcal{L}_{\text{NLL}}$$
[0090] Here, $\sigma$ is the sigmoid activation function, and $\mathcal{L}_{\text{prefer}}$ maximizes the log-probability margin of the preferred translation $y_w$ over the non-preferred translation $y_l$. $\mathcal{L}_{\text{NLL}}$ is the negative log-likelihood term, i.e., the standard SFT (supervised fine-tuning) loss; it performs MLE (maximum likelihood estimation) on the preferred translation so that the model generates the ground-truth target as accurately as possible. It acts as a regularization term that prevents the policy from drifting too far from the preference data distribution, ensuring that the model does not "go astray." In addition, $\beta$ controls the contrast strength between preferred and non-preferred outputs, and $\lambda$ (the fifth preset weight) weights the MLE loss: a larger $\lambda$ makes the model focus more on the accuracy of the generated translation, while a smaller $\lambda$ makes it focus more on the contrastive preference. $\lambda$ therefore balances preference contrast against translation ability.
[0091] That is, $\mathcal{L}_{\text{prefer}}$ is the first loss value and $\lambda\, \mathcal{L}_{\text{NLL}}$ is the second loss value.
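For a single example, the first and second loss values can be sketched in plain Python. This assumes sequence-level log-probabilities (sums of token log-probabilities under $\pi_\theta$) are already available. In practice the loss would be computed inside an automatic-differentiation framework over batches; the function names here are hypothetical.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def cpo_loss(logp_w: float, logp_l: float, beta: float, lam: float) -> float:
    """First loss value + second loss value for one (x, y_w, y_l) example.

    logp_w and logp_l are log pi_theta(y_w | x) and log pi_theta(y_l | x).
    """
    first = neg_log_sigmoid(beta * (logp_w - logp_l))  # preference learning term
    second = lam * (-logp_w)                           # lambda-weighted NLL on y_w
    return first + second
```

When the two translations are equally likely and $\lambda = 0$, the loss is $\log 2$; it decreases as the margin of $y_w$ over $y_l$ grows.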
[0092] Furthermore, a dynamic scheduling contrastive preference optimization algorithm considering professional terms is proposed to address two major shortcomings of existing preference learning methods in multilingual scenarios: (1) Static loss configuration cannot adapt to the heterogeneity of language resources, resulting in unstable training of low-resource languages and insufficient exploration of high-resource languages; (2) Traditional CPO loss lacks explicit modeling of the integrity of professional terms, and the model is prone to ignoring key entities.
[0093] To address these two shortcomings, this application proposes a dynamic loss scheduling mechanism. Specifically, the inverse temperature coefficient β of the preference learning term and the coefficient λ of the regularization term are dynamically adjusted according to the language resource level, to match the learning difficulty and preferences of different languages. Secondly, to prevent the model from omitting or mistranslating key terms (such as the medical term "insulin") during generation, a terminology integrity term is explicitly added to the DCPO loss. Together, these yield a dynamic scheduling contrastive preference optimization algorithm that takes specialized terminology into account.
[0094] β (beta) is the inverse temperature coefficient in the preference learning term and controls the smoothness of the contrast signal: it amplifies or shrinks the log-probability difference between the two translation outputs. The larger β is, the more this difference is amplified and the more sensitive the model is to preferred versus non-preferred translations; the smaller β is, the more the difference is compressed, resulting in a smoother gradient signal.
[0095] For example, low-resource languages (such as Zulu and Slovak) have weak initial model capability and require strong contrastive signals to drive the direction of parameter updates. Therefore, a high β is set (e.g., β = 0.6 to 1.0) to enhance the discrimination between preferred and non-preferred translations and to prevent the gradient signal from weakening due to data sparsity. For medium-resource languages (such as Thai and Czech), a medium β is set (e.g., β = 0.3 to 0.6) to balance training stability and exploration. For high-resource languages (such as French and German), a low β is set (e.g., β = 0 to 0.3) to reduce the contrast intensity, preserve output diversity, and avoid overfitting to the training preference pairs. This strategy is implemented through a predefined mapping table.
[0096] As shown in the formula below, β can be 0.8 for low-resource languages, 0.5 for medium-resource languages, and 0.2 for high-resource languages:
[0097]
$$\beta(r) = \begin{cases} 0.8, & r = \text{low} \\ 0.5, & r = \text{medium} \\ 0.2, & r = \text{high} \end{cases}$$
[0098] where $\beta(r)$ denotes the dynamically adjusted β, and $r$ denotes the language resource level output by the language resource dynamic routing module.
[0099] Next, a resource-aware scheduling strategy is defined for the dynamic SFT regularization coefficient λ. λ (lambda) is the weight coefficient of the NLL loss term, used to balance the contrastive preference learning term against standard maximum likelihood training (i.e., behavioral cloning). If λ = 0, only preference learning is performed, without MLE (maximum likelihood estimation) regularization, and the model may deviate from the original preference data distribution and generate unreasonable outputs. A smaller λ (e.g., 0.1 to 0.5) emphasizes preference learning while slightly constraining the model so that it does not "go astray." A larger λ (e.g., 1.0 or higher) behaves more like standard SFT (supervised fine-tuning), with weaker preference signals and reduced diversity. Pure contrastive learning (e.g., DPO or CPO without the NLL term) may lead to "extreme" outputs that maximize preference scores while ignoring fluency or factual consistency; adding the NLL term effectively controls the model's balance between the preference signal and the true translation distribution.
[0100] Here, a dynamic scheduling strategy is set according to the language resource level. The behavioral cloning term (i.e., the SFT regularization term) in the CPO (Contrastive Preference Optimization) loss prevents the model from deviating from the distribution of the preferred data.
[0101] Here, λ is the regularization strength coefficient. Because it directly determines the strength of the SFT regularization, it should be set dynamically according to the resource level rather than fixed.
[0102] Low-resource languages: a high value is assumed (e.g., λ = 1.0), because low-resource languages generalize poorly and need stronger supervisory signals to constrain the output space and prevent the generation of text that violates language structure or terminology norms. Medium-resource languages: λ = 0.6 is assumed. High-resource languages: λ = 0.3 is assumed, which gives the model more freedom to explore expression variants.
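[0103] That is, λ can be 1.0 for low-resource languages, 0.6 for medium-resource languages, and 0.3 for high-resource languages:
[0104]
$$\lambda(r) = \begin{cases} 1.0, & r = \text{low} \\ 0.6, & r = \text{medium} \\ 0.3, & r = \text{high} \end{cases}$$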
[0105] In other words, for low-resource languages, high-intensity regularization forces the model to faithfully reproduce high-quality reference translations; for high-resource languages, low-intensity regularization encourages the model to learn translation styles that are "semantically equivalent but expressed differently."
[0106] The method may further include the following operations.
[0107] Obtain the second total number of terms involved in the translation of the training text and the number of correctly translated terms.
[0108] Determine the terminology error rate using the second total number of terms and the number of correctly translated terms.
[0109] Determine the penalty term using the penalty weight and the terminology error rate, where the penalty weight is negatively correlated with the resource level.
[0110] Determine the third loss value using the penalty weight, the terminology error rate, and the penalty term.
[0111] The preset loss function is determined using the first loss value, the second loss value, and the third loss value.
[0112] To prevent the model from omitting or mistranslating key terms (such as the medical term "insulin") during generation, a terminology penalty term is explicitly added to the total loss. The terminology error rate is defined as:
$$\mathrm{TER} = \frac{N_{\text{term}} - N_{\text{correct}}}{N_{\text{term}}}$$
[0113] where $N_{\text{term}}$ is the second total number of terms involved in the current translation of the training text, which can be identified by matching against the terminology database, and $N_{\text{correct}}$ is the number of correctly translated terms, i.e., those that exactly match the standard translation.
[0114] A penalty term is introduced:
$$\mathcal{L}_{\text{term}} = \mu \cdot \mathrm{TER}$$
[0115] Here, μ is the penalty weight, which is tied to the resource level: for example, μ = 0.5 for low-resource languages, μ = 0.3 for medium-resource languages, and μ = 0.1 for high-resource languages, with the final values determined through iterative optimization.
[0116] Therefore, the preset loss function is calculated as follows:
$$\mathcal{L} = \mathcal{L}_{\text{prefer}} + \lambda\, \mathcal{L}_{\text{NLL}} + \mu \cdot \mathrm{TER}$$
where $\mathcal{L}_{\text{prefer}}$ is the first loss value (the preference learning term), $\lambda\, \mathcal{L}_{\text{NLL}}$ is the second loss value (the weighted negative log-likelihood term), and $\mu \cdot \mathrm{TER}$ is the third loss value.
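Combining the resource-level schedules with the three loss values gives the following per-example sketch. The coefficient tables use the example values from the text. Treating a sentence with no terms as having zero error rate is an assumption, and the names are hypothetical. Because the error rate comes from discrete terminology-database matching, in this sketch it enters as a per-example scalar; the text does not specify how gradients are obtained through it.

```python
import math

# Per-level coefficients taken from the example values in the text.
BETA = {"low": 0.8, "medium": 0.5, "high": 0.2}
LAMBDA = {"low": 1.0, "medium": 0.6, "high": 0.3}
MU = {"low": 0.5, "medium": 0.3, "high": 0.1}


def term_error_rate(n_terms: int, n_correct: int) -> float:
    """(N_term - N_correct) / N_term; zero when there are no terms (assumed)."""
    return (n_terms - n_correct) / n_terms if n_terms else 0.0


def preset_loss(level: str, logp_w: float, logp_l: float,
                n_terms: int, n_correct: int) -> float:
    beta, lam, mu = BETA[level], LAMBDA[level], MU[level]
    z = beta * (logp_w - logp_l)
    # First loss value: -log(sigmoid(z)) in a numerically stable form.
    first = math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
    second = lam * (-logp_w)                            # second loss value
    third = mu * term_error_rate(n_terms, n_correct)    # third loss value
    return first + second + third
```

For identical inputs, a low-resource example incurs a larger NLL weight and terminology penalty than a high-resource one, which is the intended direction of the schedule.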
[0117] In each training step, the model generates $y_w$ and $y_l$; the terminology database is then automatically invoked for matching, after which the gradient is calculated and backpropagated. The higher the terminology error, the larger the penalty.
[0118] In some embodiments, a training-evaluation-optimization closed-loop control mechanism is proposed for multilingual translation tasks. This mechanism continuously monitors key indicators, dynamically adjusts training strategies, and automatically terminates training after reaching a preset quality threshold, ultimately outputting a deployable model and resource adaptation suggestions.
[0119] Define a closed-loop validation metric set to simultaneously monitor professional accuracy and general fluency, avoiding bias from a single metric.
[0120] During each training cycle, the system automatically calculates two types of core metrics. Term Accuracy (TermAcc): the proportion of key terms in the validation set that the model translates correctly, determined by matching against the constructed terminology database:
[0121]
$$\mathrm{TermAcc} = \frac{N^{\text{val}}_{\text{correct}}}{N^{\text{val}}_{\text{term}}}$$
where $N^{\text{val}}_{\text{term}}$ is the total number of terms from key domains such as medicine and law that are deliberately included in the validation set, and $N^{\text{val}}_{\text{correct}}$ is the number of those terms translated completely correctly.
[0122] Generalization Score (GS): Measures the model's performance on unseen structures and expression variants. Existing evaluation metrics such as BLEURT can be used here.
[0123] In addition, an online calibration mechanism for the loss coefficient can be set up.
[0124] To avoid biases caused by a fixed scheduling table and to achieve online adaptive optimization, so that the loss configuration evolves with the training process, this module designs a dynamic calibration mechanism driven by validation set feedback, in which the validation set is evaluated every cycle. If terminology accuracy is low, the terminology error penalty is amplified; that is, when terminology errors are frequent, the supervisory signal is strengthened to achieve a dynamic gain. If the accuracy target is met, gradual annealing is performed; that is, once the performance target is reached, the learning intensity is slowly reduced to prevent overfitting and improve generalization. Training continues to iterate until a termination condition is met, such as term accuracy (TermAcc) ≥ 95% (key terms almost error-free) or a change of less than 1% between two consecutive evaluations.
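The closed loop can be sketched as three small functions. The 95% target and the 1% change threshold come from the text. Everything else is an assumption of this sketch: only the penalty weight μ is calibrated, the gain and annealing factors (1.2 and 0.9) are hypothetical, and "change" is read as the absolute difference in TermAcc between consecutive evaluations.

```python
from typing import List


def term_acc(n_correct: int, n_terms: int) -> float:
    """TermAcc = correctly translated key terms / key terms in the validation set."""
    return n_correct / n_terms if n_terms else 0.0


def calibrate_mu(mu: float, acc: float, target: float = 0.95,
                 gain: float = 1.2, anneal: float = 0.9) -> float:
    """Amplify the penalty weight while TermAcc is below target; anneal it otherwise."""
    return mu * gain if acc < target else mu * anneal


def should_stop(history: List[float], target: float = 0.95,
                tol: float = 0.01) -> bool:
    """Stop once TermAcc reaches the target or two consecutive evaluations differ by < tol."""
    if history and history[-1] >= target:
        return True
    return len(history) >= 2 and abs(history[-1] - history[-2]) < tol
```

In a training loop, each cycle would append the new TermAcc to `history`, update μ with `calibrate_mu`, and exit when `should_stop(history)` returns True.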
[0125] This application also provides an electronic device.
[0126] Refer to Figure 2, which is a schematic diagram of an embodiment of the electronic device in this application. The electronic device can perform the steps of the above method.
[0127] The electronic device 200 includes a memory 220, a processor 210 coupled to the memory, and at least one computer program stored in the memory 220 and executable on the processor 210. When the processor 210 loads and executes the at least one computer program, it implements the steps of the multilingual translation method described above. For related details, please refer to the detailed description in the above method; further elaboration will not be repeated here.
[0128] This application also includes a computer-readable storage medium.
[0129] Refer to Figure 3, which is a schematic diagram of an embodiment of a computer-readable storage medium in this application.
[0130] The computer-readable storage medium 300 stores at least one program 310, which, when loaded and executed by a processor, is used to implement the steps of the multilingual translation method described above. For related details, please refer to the detailed description in the above method; it will not be repeated here.
[0131] The above scheme uses triplet data determined with terminology coverage reward correction to train a basic translation model with a preset loss function composed of the preference learning term, the negative log-likelihood term, and a penalty term corresponding to the resource level, and then uses the resulting preference translation model to translate the text to be translated, yielding the translation result. This approach overcomes the bottleneck of high/low-resource imbalance caused by unified multilingual training and addresses the technical defect of neglecting specialized terminology in preference learning. It effectively improves translation accuracy, prevents mistranslation or omission of specialized terminology, and significantly enhances the reliability and security of translation in specialized fields.
[0132] In the several embodiments provided by this invention, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.
[0133] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.
[0134] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0135] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0136] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. A multilingual translation method, characterized by comprising: obtaining text to be translated; inputting the text to be translated into a trained preference translation model to obtain target language text, wherein the preference translation model includes a ranking module, a dynamic candidate translation generation module, a terminology coverage reward correction module, a triplet construction module, and a dynamic comparison preference optimization module; the ranking module is configured to determine the resource level of the text to be translated; the dynamic candidate translation generation module is configured to determine multiple initial scores for the text to be translated; the terminology coverage reward correction module is configured to perform terminology coverage reward correction on the initial scores to determine the preferred and non-preferred translations of the text to be translated; the triplet construction module is configured to construct triplet data for the text to be translated using the resource level, the preferred translation, and the non-preferred translation; and the dynamic comparison preference optimization module is configured to perform terminology integrity reward correction using the triplet data to obtain the target language text; and outputting the target language text as the translation result of the text to be translated.
2. The method according to claim 1, characterized in that, The training process of the preference translation model is as follows: Acquire training texts and determine the resource level of the training texts using the level classification module; Multiple initial scores corresponding to the training text are obtained using the various preset scoring methods of the dynamic candidate translation generation module, and the term coverage reward correction module is used to correct the multiple initial scores for term coverage reward to determine the preferred translation and non-preferred translation of the training text. The target triplet data for the training text is determined by the triplet data construction module using the preferred translation, the non-preferred translation, the training text, and the resource level. The dynamic comparison preference optimization module uses the target triple data to perform term integrity reward correction, and then trains the basic translation model to obtain the trained preference translation model.
3. The method according to claim 2, characterized in that the step of acquiring training texts and determining the resource level of the training texts using the level classification module comprises: obtaining a preset multilingual training set as the training texts; obtaining, using the grading module, the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality of the training text; and determining the resource level of the training text based on the number of parallel sentence pairs, the terminology coverage, and the monolingual linguistic quality.
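Claim 3 names the three indicators used for grading but does not state how they are combined. The following is a minimal sketch under that gap: it assumes a tier is awarded only when all three indicators clear that tier's cut-off, and the cut-off values (`HIGH_PAIRS`, `MEDIUM_PAIRS`, `HIGH_QUALITY`, `MEDIUM_QUALITY`) and the names `CorpusStats` and `resource_level` are hypothetical, not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical cut-offs (not from the source): the claim names the three
# indicators but does not say how they are combined into a level.
HIGH_PAIRS = 1_000_000
MEDIUM_PAIRS = 100_000
HIGH_QUALITY = 0.8
MEDIUM_QUALITY = 0.5


@dataclass
class CorpusStats:
    parallel_pairs: int   # number of parallel sentence pairs
    term_coverage: float  # share of glossary terms covered, in [0, 1]
    mono_quality: float   # monolingual linguistic quality, in [0, 1]


def resource_level(stats: CorpusStats) -> str:
    """Return 'low', 'medium' or 'high'; a tier requires all three indicators."""
    if (stats.parallel_pairs >= HIGH_PAIRS
            and stats.term_coverage >= HIGH_QUALITY
            and stats.mono_quality >= HIGH_QUALITY):
        return "high"
    if (stats.parallel_pairs >= MEDIUM_PAIRS
            and stats.term_coverage >= MEDIUM_QUALITY
            and stats.mono_quality >= MEDIUM_QUALITY):
        return "medium"
    return "low"
```

Requiring every indicator to pass means a language pair with abundant parallel data but poor terminology coverage would not be graded as high-resource; a weighted sum of normalized indicators would be an equally plausible reading of the claim.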
4. The method according to claim 2, characterized in that the step of obtaining multiple initial scores for the training text using the preset scoring methods of the dynamic candidate translation generation module, and performing terminology coverage reward correction on the multiple initial scores using the terminology coverage reward correction module to determine the preferred and non-preferred translations of the training text, comprises: obtaining multiple initial scores for the training text using each preset evaluation method in the dynamic candidate translation generation module; and performing terminology coverage reward correction on each initial score using the terminology coverage reward correction module, ranking the candidates by their corrected scores, and determining the top-ranked translations as preferred translations and the bottom-ranked translations as non-preferred translations.
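The selection step of claim 4 can be sketched as follows, assuming each candidate translation already carries a single terminology-corrected score (how several evaluator scores are merged into one is not specified). The function name `select_pair` is hypothetical.

```python
from typing import List, Sequence, Tuple


def select_pair(candidates: Sequence[str],
                corrected_scores: Sequence[float]) -> Tuple[str, str]:
    """Return (preferred, non_preferred) by ranking on corrected scores."""
    if len(candidates) != len(corrected_scores):
        raise ValueError("one corrected score is required per candidate")
    if len(candidates) < 2:
        raise ValueError("at least two candidates are needed to form a pair")
    # Indices sorted from highest to lowest corrected score.
    order: List[int] = sorted(range(len(candidates)),
                              key=lambda i: corrected_scores[i], reverse=True)
    return candidates[order[0]], candidates[order[-1]]
```

For example, `select_pair(["a", "b", "c"], [0.5, 0.9, 0.1])` yields `("b", "c")`: the highest-scoring candidate becomes the preferred translation and the lowest-scoring one the non-preferred translation.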
5. The method according to claim 4, characterized in that the step of performing terminology coverage reward correction on each initial score using the terminology coverage reward correction module, and determining, from the corrected initial scores, the top-ranked translations as preferred translations and the bottom-ranked translations as non-preferred translations, comprises: performing fine-grained terminology validation on the training text to determine the terminology recognition type of the training text; in response to the terminology recognition type being an error, inputting the training text into a terminology database to obtain a first total number of terms in the training text, and inputting the translated text corresponding to the initial score into the terminology database to obtain the corresponding number of correctly translated terms and number of correctly translated key terms; determining a first value using the first total number of terms, the number of correctly translated terms, and a first preset weight; determining a second value using the first total number of terms, the number of correctly translated key terms, and a second preset weight; determining the terminology reward score of the terminology coverage reward correction module using the first value and the second value; determining a third value using a third preset weight and the initial score, and determining a fourth value using a fourth preset weight, the terminology reward score, and a scaling factor; and determining the final translation score of the training text using the third value and the fourth value, and determining the preferred translation and the non-preferred translation of the training text according to the magnitude of the final translation scores.
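Claim 5 lists the inputs to each value but not the operators. A minimal sketch, assuming the first and second values are weighted coverage ratios over the first total number of terms N, so that the terminology reward is R = w1·C/N + w2·K/N (C correctly translated terms, K correctly translated key terms), and that the final score is the sum of the third and fourth values, S = w3·S_init + w4·λ·R. The default weights and the scaling factor λ (`scale`) are hypothetical, and the gating on the terminology recognition type is left to the caller.

```python
def term_reward(total_terms: int, correct_terms: int, correct_key_terms: int,
                w1: float = 0.7, w2: float = 0.3) -> float:
    """Terminology reward: first value + second value (assumed ratio form)."""
    if total_terms == 0:
        return 0.0  # no terms found, nothing to reward (assumed convention)
    first_value = w1 * correct_terms / total_terms
    second_value = w2 * correct_key_terms / total_terms
    return first_value + second_value


def final_score(initial_score: float, reward: float,
                w3: float = 0.8, w4: float = 0.2, scale: float = 1.0) -> float:
    """Final translation score: third value + fourth value (assumed additive)."""
    third_value = w3 * initial_score
    fourth_value = w4 * scale * reward
    return third_value + fourth_value
```

With 10 terms, 8 translated correctly and 2 key terms correct, the reward is 0.7·0.8 + 0.3·0.2 = 0.62; an initial score of 0.5 then becomes 0.8·0.5 + 0.2·0.62 = 0.524. Keeping the correction additive lets the scaling factor tune how strongly terminology coverage can reorder candidates without discarding the evaluator's original judgment.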
6. The method according to claim 2, characterized in that the step of determining, by the triplet data construction module, the target triplet data of the training text from the preferred translation, the non-preferred translation, the training text, and the resource level comprises: determining, by the triplet construction module, the preferred translation, the non-preferred translation, and the training text as initial triplet data, wherein one set of initial triplet data is generated when the resource level is low, two sets are generated when the resource level is medium, and four sets are generated when the resource level is high; and obtaining an evaluation score for each set of initial triplet data, sorting the sets by evaluation score, and determining the top-ranked sets as the target triplet data.
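The level-dependent triplet budget of claim 6 (one, two, or four sets) can be sketched as below. The claim does not say how many top-ranked sets are retained, so the `keep` parameter is hypothetical, as are the callables `generate` (produces the i-th initial triplet) and `evaluate` (returns its evaluation score).

```python
from typing import Callable, Dict, List, Tuple

# (training text, preferred translation, non-preferred translation)
Triplet = Tuple[str, str, str]

# Number of initial triplet sets per resource level, as stated in claim 6.
SETS_PER_LEVEL: Dict[str, int] = {"low": 1, "medium": 2, "high": 4}


def build_target_triplets(generate: Callable[[int], Triplet],
                          evaluate: Callable[[Triplet], float],
                          level: str,
                          keep: int = 1) -> List[Triplet]:
    """Generate the level's quota of initial triplets and keep the top `keep`."""
    initial = [generate(i) for i in range(SETS_PER_LEVEL[level])]
    # Highest evaluation score first; `keep` is a hypothetical parameter.
    initial.sort(key=evaluate, reverse=True)
    return initial[:keep]
```

Because a low-resource language only ever produces one set, it can never contribute more than one triplet regardless of `keep`, which matches the background's aim of limiting perturbation for low-resource languages while letting high-resource languages explore more candidates.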
7. The method according to claim 2, characterized in that the preference translation model further comprises a loss function module, and the method further comprises: obtaining a preference learning term and a negative log-likelihood term; determining a first loss value using the preferred translation, the non-preferred translation, the training text, and the preference learning term; determining a second loss value using the preferred translation, the training text, a fifth preset weight, and the negative log-likelihood term; and determining a preset loss function from the first loss value and the second loss value, thereby determining the loss function module.
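Claim 7 does not spell out the preference learning term. One plausible instance, assumed here, is the Direct Preference Optimization (DPO) objective, which is a swapped-in technique rather than one named in the source. In this sketch, x is the training text, y_w the preferred translation, y_l the non-preferred translation, π_θ the model being trained, π_ref a frozen reference model, β a temperature, and α the fifth preset weight; combining the two losses by summation is also an assumption.

```latex
\begin{aligned}
\mathcal{L}_{1} &= -\log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \\
\mathcal{L}_{2} &= \alpha \cdot \bigl(-\log \pi_\theta(y_w \mid x)\bigr) \\
\mathcal{L} &= \mathcal{L}_{1} + \mathcal{L}_{2}
\end{aligned}
```

The negative log-likelihood term anchors the model to the preferred translation itself, so the preference term cannot be minimized merely by lowering the likelihood of both translations.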
8. The method according to claim 7, characterized in that the method further comprises: obtaining a second total number of terms involved in the translation of the training text and the number of correctly translated terms; determining a term error rate using the second total number of terms and the number of correctly translated terms; determining a penalty term using a penalty weight and the term error rate, wherein the penalty weight is negatively correlated with the resource level; determining a third loss value using the penalty weight, the term error rate, and the penalty term; and determining the preset loss function using the first loss value, the second loss value, and the third loss value.
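The terminology penalty of claim 8 can be sketched as follows. Assumptions: the term error rate is the share of the second total number of terms that were not translated correctly; the third loss equals the penalty term, i.e. penalty weight times error rate (the claim's wording for the third loss is ambiguous); and the preset loss is the plain sum of the three losses. The weight table `PENALTY_WEIGHT` is hypothetical and only honors the stated constraint that the weight falls as the resource level rises.

```python
from typing import Dict

# Hypothetical penalty weights: claim 8 only requires that the weight
# decreases as the resource level increases.
PENALTY_WEIGHT: Dict[str, float] = {"low": 1.0, "medium": 0.5, "high": 0.25}


def term_error_rate(total_terms: int, correct_terms: int) -> float:
    """Share of terms that were not translated correctly."""
    if total_terms == 0:
        return 0.0  # no terms involved, so no terminology error (assumed)
    return (total_terms - correct_terms) / total_terms


def preset_loss(first_loss: float, second_loss: float,
                total_terms: int, correct_terms: int, level: str) -> float:
    """First + second + third loss, with third = penalty weight * error rate."""
    third_loss = PENALTY_WEIGHT[level] * term_error_rate(total_terms, correct_terms)
    return first_loss + second_loss + third_loss
```

With 7 of 10 terms correct (error rate 0.3) and first and second losses of 0.5 and 0.2, a low-resource sample incurs a total of 1.0, while a high-resource sample incurs 0.775, so terminology mistakes are penalized more heavily where the base model is least reliable.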
9. An electronic device, characterized in that the electronic device comprises a memory and a processor coupled to the memory, the memory storing at least one computer program which, when loaded and executed by the processor, implements the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one program which, when loaded and executed by a processor, implements the method according to any one of claims 1 to 8.