A machine translation quality evaluation method, device, equipment and storage medium

By using multiple evaluation indicators and language similarity fusion processing in machine translation quality assessment, the problem of accuracy differences caused by a single evaluation indicator is solved, and a comprehensive evaluation of translation quality and accuracy improvement are achieved.

CN115310460B · Active · Publication Date: 2026-06-12 · JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD
Filing Date
2022-08-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, machine translation quality assessment models can only assess the quality of a single aspect of the translated text, and cannot provide a comprehensive assessment. Furthermore, the same assessment method is used for translated texts in different languages, resulting in significant differences in assessment accuracy.

Method used

The translated text pair to be evaluated is acquired, the target text is evaluated based on at least two quality evaluation indicators, and the result for each evaluation indicator is determined. The evaluation weight of each indicator is determined based on the language similarity between the source language and the target language. Finally, the comprehensive evaluation result is obtained by fusing the evaluation results according to the evaluation weights.

Benefits of technology

It enables a comprehensive assessment of translation quality, avoids bias in assessment results, improves the accuracy of translation assessments for different language pairs, and enhances the robustness of the assessment.

✦ Generated by Eureka AI based on patent content.


Abstract

Embodiments of the present application disclose a machine translation quality evaluation method, device and equipment and a storage medium, which are applied to the technical field of natural language processing. The method comprises the following steps: obtaining a translation text pair to be evaluated, the translation text pair comprising a source text corresponding to a source language and a target text corresponding to a target language after translation; performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index; determining an evaluation weight corresponding to each quality evaluation index based on the similarity between the source language and the target language; and performing fusion processing on the evaluation results based on the evaluation weights, and determining a target evaluation result of the translation text pair. Through the technical scheme of the embodiments of the present application, the translation quality can be comprehensively evaluated, and the translation evaluation accuracy of different language pairs can be ensured.

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of natural language processing technology, and in particular to a method, apparatus, device and storage medium for evaluating the quality of machine translation.

Background Technology

[0002] With the rapid development of computer technology, it is often necessary to evaluate the quality of text translated using machine translation models.

[0003] Currently, translation quality assessment models can be used to evaluate the quality of translated text. For example, assessment models trained on sentence-level labeled data tend to evaluate the overall fluency of the translated text. Alternatively, assessment models trained on word-level labeled data tend to evaluate the fidelity of the translated text.

[0004] However, in the process of realizing this invention, the inventors discovered at least the following problems in the prior art:

[0005] The evaluation results obtained by each translation quality assessment model can only be biased towards assessing the quality of a single aspect of the translated text, such as the overall fluency or fidelity of the translated text. They cannot comprehensively assess the translation quality. Furthermore, the same evaluation method is used for translated texts of different language pairs, which leads to a large difference in the accuracy of translation evaluation for different language pairs.

Summary of the Invention

[0006] This invention provides a machine translation quality assessment method, apparatus, device, and storage medium to comprehensively assess translation quality and ensure the accuracy of translation assessment for different language pairs.

[0007] In a first aspect, embodiments of the present invention provide a machine translation quality assessment method, comprising:

[0008] Obtain the translation text pair to be evaluated, wherein the translation text pair includes the source text corresponding to the source language and the target text corresponding to the target language after translation;

[0009] The target text is evaluated for quality based on at least two quality assessment metrics and the source text, and the evaluation result corresponding to each quality assessment metric is determined.

[0010] Based on the language similarity between the source language and the target language, the evaluation weight corresponding to each quality evaluation indicator is determined.

[0011] Based on the respective evaluation weights, the evaluation results are fused to determine the target evaluation result for the translated text pair.

[0012] In a second aspect, embodiments of the present invention also provide a machine translation quality assessment device, comprising:

[0013] The translation text pair acquisition module is used to acquire translation text pairs to be evaluated, wherein the translation text pairs include source text corresponding to the source language and target text corresponding to the target language after translation;

[0014] The evaluation result determination module is used to perform quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation result corresponding to each of the quality evaluation indicators;

[0015] The evaluation weight determination module is used to determine the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language.

[0016] The evaluation result fusion module is used to fuse the evaluation results based on the evaluation weights to determine the target evaluation result of the translated text pair.

[0017] In a third aspect, embodiments of the present invention also provide an electronic device, the electronic device comprising:

[0018] One or more processors;

[0019] Memory, used to store one or more programs;

[0020] When the one or more programs are executed by the one or more processors, the one or more processors implement the machine translation quality assessment method provided in any embodiment of the present invention.

[0021] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the machine translation quality assessment method as provided in any embodiment of the present invention.

[0022] The embodiments of the above invention have the following advantages or beneficial effects:

[0023] By evaluating the quality of the target text based on at least two quality assessment indicators and the source text of the translation text pair to be evaluated, the evaluation result corresponding to each quality assessment indicator is determined. Based on the language similarity between the source and target languages, the evaluation weight corresponding to each quality assessment indicator is determined. Then, based on each evaluation weight, the evaluation results are fused to determine the target evaluation result for the translation text pair. This allows for the fusion of evaluation results corresponding to at least two different quality assessment indicators, comprehensively evaluating the translation quality and avoiding biased evaluation results. Furthermore, determining the evaluation weights based on the language similarity between the source and target languages takes into account the language differences between different language pairs, effectively avoiding significant differences in the accuracy of translation evaluations for different language pairs, thus ensuring the accuracy of translation evaluations for different language pairs.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a flowchart of a machine translation quality assessment method provided in an embodiment of the present invention;

[0026] Figure 2 is a flowchart of another machine translation quality assessment method provided in an embodiment of the present invention;

[0027] Figure 3 is a schematic diagram of the structure of a machine translation quality assessment device provided in an embodiment of the present invention;

[0028] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0029] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention, and not all of the structures.

[0030] Figure 1 is a flowchart illustrating a machine translation quality assessment method provided in an embodiment of the present invention. This embodiment is applicable to evaluating the quality of text translated by a machine translation model. The method can be executed by a machine translation quality assessment device, which can be implemented in software and/or hardware and integrated into an electronic device. As shown in Figure 1, the method specifically includes the following steps:

[0031] S110. Obtain the translation text pair to be evaluated. The translation text pair includes the source text corresponding to the source language and the target text corresponding to the target language after translation.

[0032] In this context, "source language" refers to the language to be translated from, and "target language" refers to the language to be translated into. "Source text" can refer to the original text expressed in the source language, i.e., the sentence to be translated. "Target text" can refer to the translated text that expresses the same meaning as the source text in the target language, i.e., the translated sentence.

[0033] Specifically, the source text can be input into a machine translation model for translation, and the target text output by the machine translation model can be obtained, thus obtaining the translated text pair to be evaluated.

[0034] S120. Based on at least two quality assessment indicators and the source text, conduct a quality assessment of the target text and determine the assessment result corresponding to each quality assessment indicator.

[0035] The quality assessment indicators can be used to evaluate the translation quality of the target text. Different types of quality assessment indicators are biased towards evaluating different aspects of the translation. For example, quality assessment indicators can be, but are not limited to, fluency assessment indicators that are biased towards evaluating the overall fluency of the target text, or fidelity assessment indicators that are biased towards evaluating the fidelity of the target text. Fluency assessment indicators can be used to characterize the overall fluency of the translated text, whether it conforms to expression habits, and other information. Fidelity assessment indicators can be used to characterize whether the details in the translated text faithfully reflect the meaning of the original text, that is, to judge details such as mistranslations, omissions, and emotional errors in the translation. Each quality assessment indicator can correspond to one or more quality assessment models, so that the assessment result corresponding to that quality assessment indicator can be determined using one or more quality assessment models. This embodiment can use a scoring method to indicate the assessment result. For example, the higher the score in the assessment result, the higher the quality level corresponding to that quality assessment indicator, such as higher fluency or higher fidelity.

[0036] Specifically, based on business needs, at least two different quality assessment indicators can be selected. For each quality assessment indicator, the target text can be quality assessed based on at least one corresponding quality assessment model to determine the assessment result for that quality assessment indicator. For example, if the quality assessment indicator corresponds to multiple quality assessment models, one model can be randomly selected from among them. The target text can then be quality assessed based on the selected model and the source text, and the obtained assessment result can be used as the assessment result for that quality assessment indicator. Alternatively, the target text can be quality assessed based on each quality assessment model and the source text, and the obtained assessment results can be averaged to obtain the average assessment result for that quality assessment indicator, thereby further improving the accuracy of the quality assessment.
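The two options described for an indicator backed by several models (pick one model at random, or average over all of them) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `QualityModel` callable interface, the `indicator_score` function, and the toy models with fixed scores are all hypothetical stand-ins.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence

# Assumption: a quality assessment model is a callable that maps
# (source_text, target_text) to a numeric score. This interface is hypothetical.
QualityModel = Callable[[str, str], float]


def indicator_score(
    models: Sequence[QualityModel],
    source_text: str,
    target_text: str,
    strategy: str = "average",
    rng: Optional[random.Random] = None,
) -> float:
    """Return the evaluation result for one quality assessment indicator."""
    if not models:
        raise ValueError("an indicator needs at least one quality assessment model")
    if strategy == "random":
        # Option 1: pick one model at random and use its score directly.
        chosen = (rng or random).choice(list(models))
        return chosen(source_text, target_text)
    if strategy == "average":
        # Option 2: score with every model and average the results.
        return mean(m(source_text, target_text) for m in models)
    raise ValueError(f"unknown strategy: {strategy}")


# Toy stand-ins for real models; they ignore their inputs and return fixed scores.
def toy_model_a(source: str, target: str) -> float:
    return 0.8


def toy_model_b(source: str, target: str) -> float:
    return 0.6
```

Calling `indicator_score([toy_model_a, toy_model_b], src, tgt)` averages the two toy scores, while `strategy="random"` returns one of them unchanged.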

[0037] S130. Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each quality evaluation indicator.

[0038] Language similarity can refer to the linguistic similarity between the source language and the target language in terms of language family, vocabulary, and grammatical structure.

[0039] Specifically, the language similarity between any two languages can be predetermined, allowing for direct acquisition of the language similarity between the source and target languages, or it can be determined in real-time. Based on the language similarity between the source and target languages, the optimal evaluation weights for different quality assessment indicators can be determined. In other words, different evaluation weights can be assigned to different language pairs, thus taking into account the language differences between them. This effectively avoids significant differences in translation evaluation accuracy between different language pairs, thereby ensuring the accuracy of translation evaluation across different language pairs and improving the universality of translation quality evaluation.

[0040] S140. Based on each evaluation weight, the evaluation results are integrated to determine the target evaluation result for the translated text pair.

[0041] Specifically, the evaluation results and corresponding evaluation weights of each quality assessment indicator can be multiplied together, and the results of each multiplication can be added together. The resulting weighted average is used as the target evaluation result. This allows for the integration of evaluation results from at least two quality assessment indicators to comprehensively evaluate the translation quality. This avoids bias in the evaluation of translated texts caused by a single evaluation indicator, thereby improving the accuracy and robustness of the quality assessment.
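The multiply-and-add fusion of step S140 can be sketched as below. The function name `fuse_results` and the indicator names used as dictionary keys are illustrative assumptions; the check that the weights sum to 1 is also an assumption, made so that the weighted sum is a weighted average.

```python
from typing import Mapping


def fuse_results(results: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Multiply each indicator's result by its weight and sum the products."""
    if set(results) != set(weights):
        raise ValueError("results and weights must cover the same indicators")
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        # Assumption: weights are normalised, so the sum is a weighted average.
        raise ValueError("evaluation weights are expected to sum to 1")
    return sum(results[name] * weights[name] for name in results)


# Illustrative values only, not taken from the patent.
target_result = fuse_results(
    {"fluency": 0.8, "fidelity": 0.6},
    {"fluency": 0.25, "fidelity": 0.75},
)
```

Because the function is keyed by indicator name, it works unchanged for more than two quality assessment indicators.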

[0042] The technical solution of this embodiment assesses the quality of the target text based on at least two quality assessment indicators and the source text in the translation text pair to be assessed. It determines the assessment result corresponding to each quality assessment indicator and, based on the language similarity between the source and target languages, determines the assessment weight corresponding to each quality assessment indicator. Then, it fuses the assessment results based on these weights to determine the target assessment result for the translation text pair. This allows for the fusion of assessment results corresponding to at least two different quality assessment indicators, comprehensively assessing the translation quality and avoiding biased assessment results. Furthermore, by determining the assessment weights based on the language similarity between the source and target languages, it takes into account the language differences between different language pairs, effectively avoiding significant differences in the accuracy of translation assessments for different language pairs, thereby ensuring the accuracy of translation assessments for different language pairs.

[0043] Based on the above technical solution, S130 may include: inputting the language similarity between the source language and the target language into a preset network model, wherein the preset network model is obtained by pre-training on translation sample pair data and label evaluation results; and determining the evaluation weight corresponding to each quality evaluation index according to the output of the preset network model.

[0044] The preset network model can be used to represent the mapping relationship between the language similarity and the optimal evaluation weights for each quality assessment indicator. This mapping relationship can be learned from translation sample pair data and the corresponding label evaluation results. For example, based on at least two quality assessment indicators and the source sample text in each translation sample pair, the quality of the target sample text in that pair can be evaluated to obtain the sample evaluation results corresponding to each translation sample pair. The language similarity of the sample language pair is then input into the preset network model to be trained, and the sample evaluation weights corresponding to each quality assessment indicator are determined based on the output of that model. The sample evaluation results are fused based on the sample evaluation weights to obtain the target sample evaluation result. The training error is determined based on the target sample evaluation result and the label evaluation result. The training error is then backpropagated to the preset network model to be trained, and the model parameters are adjusted until a preset convergence condition is met, such as the number of iterations reaching a preset number or the training error converging. At this point, the training of the preset network model is considered complete.
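The training loop described in this paragraph can be illustrated with a deliberately tiny model. The sketch below assumes two indicators (fluency and fidelity), a one-neuron logistic model `w = sigmoid(a * similarity + b)` standing in for the preset network model, a squared-error loss, and hand-derived gradients. The source does not specify any of these choices, and the sample data are invented for illustration.

```python
import math
from typing import List, Tuple

# One sample: (language similarity, fluency score, fidelity score, label score).
Sample = Tuple[float, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_squared_error(samples: List[Sample], a: float, b: float) -> float:
    """Average squared gap between the fused score and the label score."""
    total = 0.0
    for sim, r_flu, r_fid, label in samples:
        w = sigmoid(a * sim + b)                 # weight of the fluency indicator
        fused = w * r_flu + (1.0 - w) * r_fid    # target sample evaluation result
        total += (fused - label) ** 2
    return total / len(samples)


def train_weight_model(
    samples: List[Sample], lr: float = 0.5, max_iters: int = 2000, tol: float = 1e-10
) -> Tuple[float, float]:
    """Gradient descent on (a, b) until the iteration cap or the loss stops changing."""
    a, b = 0.0, 0.0
    prev_loss = mean_squared_error(samples, a, b)
    for _ in range(max_iters):
        grad_a = grad_b = 0.0
        for sim, r_flu, r_fid, label in samples:
            w = sigmoid(a * sim + b)
            fused = w * r_flu + (1.0 - w) * r_fid
            # Chain rule: d(loss)/dz through the fused score and the sigmoid.
            dz = 2.0 * (fused - label) * (r_flu - r_fid) * w * (1.0 - w)
            grad_a += dz * sim
            grad_b += dz
        a -= lr * grad_a / len(samples)
        b -= lr * grad_b / len(samples)
        loss = mean_squared_error(samples, a, b)
        if abs(prev_loss - loss) < tol:          # training error has converged
            break
        prev_loss = loss
    return a, b


# Invented toy data: labels follow a fluency weight of 0.75 at similarity 0.9
# and 0.25 at similarity 0.2.
toy_samples: List[Sample] = [
    (0.9, 0.8, 0.4, 0.7),
    (0.9, 0.6, 0.9, 0.675),
    (0.2, 0.8, 0.4, 0.5),
    (0.2, 0.6, 0.9, 0.825),
]
initial_loss = mean_squared_error(toy_samples, 0.0, 0.0)
a_fit, b_fit = train_weight_model(toy_samples)
final_loss = mean_squared_error(toy_samples, a_fit, b_fit)
```

A real preset network model would likely have more parameters and be trained with an automatic differentiation framework; the loop structure (forward pass, fusion, error against the label, parameter update, convergence check) is the part this sketch is meant to show.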

[0045] It should be noted that the network architecture of the preset network model can be configured based on business requirements. For example, the preset network model can directly output the evaluation weights corresponding to each quality assessment indicator, or it can output only the evaluation weights corresponding to one quality assessment indicator and determine the evaluation weights corresponding to other quality assessment indicators based on the output evaluation weights. For example, if there are two quality assessment indicators A and B, and the preset network model is used to output the evaluation weights corresponding to quality assessment indicator A, then since the sum of the evaluation weights corresponding to A and B is 1, the difference between 1 and the evaluation weight corresponding to indicator A can be determined as the evaluation weight corresponding to indicator B.

[0046] Based on the above technical solution, before S130, it may further include: based on a preset multilingual model, determining the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language according to the source corpus corresponding to the source language and the target corpus corresponding to the target language; and determining the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector.

[0047] The pre-defined multilingual model can be a model that performs language processing on texts in different languages. For example, the pre-defined multilingual model can be, but is not limited to, the XLM-RoBERTa model.

[0048] Specifically, based on the pre-defined multilingual model and the source corpus, a source language representation vector $v_i$ can be determined to represent the linguistic characteristics of the source language. Based on the pre-defined multilingual model and the target corpus, a target language representation vector $v_j$ can be determined to represent the linguistic characteristics of the target language. This embodiment can determine the cosine distance $\cos(v_i, v_j)$ between the source language representation vector $v_i$ and the target language representation vector $v_j$ as the language similarity between the source language and the target language.
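The similarity computation in this paragraph, the cosine of the angle between the two language representation vectors, can be sketched in plain Python. The function name `language_similarity` is hypothetical, and the vectors are assumed to be already available as equal-length lists of floats.

```python
import math
from typing import Sequence


def language_similarity(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """Return cos(v_i, v_j) for two language representation vectors."""
    if len(v_i) != len(v_j):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_i * norm_j)
```

Vectors pointing in the same direction give a value near 1, and orthogonal vectors give 0, so larger values indicate more similar languages under this measure.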

[0049] For example, determining, based on a pre-defined multilingual model, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language according to the source corpus corresponding to the source language and the target corpus corresponding to the target language may include:

[0050] Each source text in the source corpus corresponding to the source language is input into a preset multilingual model to determine the source language representation vector corresponding to each source text. Based on each source language representation vector, the source language representation vector corresponding to the source language is determined. Each target text in the target corpus corresponding to the target language is input into a preset multilingual model to determine the target language representation vector corresponding to each target text. Based on each target language representation vector, the target language representation vector corresponding to the target language is determined.

[0051] For example, determining the source language representation vector corresponding to the source language based on each source language representation vector may include: averaging each source language representation vector and determining the average vector as the source language representation vector corresponding to the source language.

[0052] For example, determining the target language representation vector corresponding to the target language based on each target language representation vector may include: averaging each target language representation vector and determining the average vector as the target language representation vector corresponding to the target language.

[0053] Specifically, each source text in the source corpus can be input into a pre-trained multilingual model, and the source language representation vector $R(x_{im})$ corresponding to each source text can be determined based on the output of the pre-trained multilingual model, where $i$ represents the source language and $m$ represents the $m$-th source text. The average vector obtained by averaging the source language representation vectors $R(x_{im})$ is determined as the source language representation vector $v_i$, that is,

$$v_i = \frac{1}{n_i} \sum_{m=1}^{n_i} R(x_{im})$$

where $n_i$ represents the number of source texts. Similarly, the target language representation vector $R(x_{jm})$ corresponding to each target text can be determined based on a pre-defined multilingual model, where $j$ represents the target language. The average vector obtained by averaging the target language representation vectors $R(x_{jm})$ is determined as the target language representation vector $v_j$, that is,

$$v_j = \frac{1}{n_j} \sum_{m=1}^{n_j} R(x_{jm})$$

where $n_j$ represents the number of target texts.
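The averaging step can be sketched as follows. The per-text vectors are assumed to have already been produced by a multilingual model such as XLM-RoBERTa; how they are extracted from the model is not specified in the source and is left out here. The function name `language_vector` is hypothetical.

```python
from typing import List, Sequence


def language_vector(text_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-text representation vectors into one language vector."""
    n = len(text_vectors)
    if n == 0:
        raise ValueError("the corpus must contain at least one text")
    dim = len(text_vectors[0])
    if any(len(v) != dim for v in text_vectors):
        raise ValueError("all representation vectors must share one dimension")
    # Component-wise mean over the n texts of the corpus.
    return [sum(v[k] for v in text_vectors) / n for k in range(dim)]


# Two invented 2-dimensional text vectors for one language.
v_source = language_vector([[1.0, 2.0], [3.0, 4.0]])
```

The same function would be applied once to the source corpus and once to the target corpus, giving the two vectors whose cosine is used as the language similarity.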

[0054] Figure 2 is a flowchart illustrating another machine translation quality assessment method provided by an embodiment of the present invention. Based on the above embodiments, this embodiment details the entire translation quality assessment process when the quality assessment indicators include fluency and fidelity indicators. Explanations of terms identical or corresponding to those in the above embodiments are not repeated here.

[0055] Referring to Figure 2, the other machine translation quality assessment method provided in this embodiment specifically includes the following steps:

[0056] S210. Obtain the translation text pair to be evaluated, which includes the source text corresponding to the source language and the target text corresponding to the target language after translation.

[0057] S220. Based on at least one preset fluency evaluation model and the source text, evaluate the fluency of the target text and determine the evaluation result corresponding to the fluency evaluation index.

[0058] Fluency evaluation metrics can be used to characterize the overall fluency of the translated text, whether it conforms to expression habits, and other information. Preset fluency evaluation models can be evaluation models biased towards assessing the overall fluency of the target text, in order to obtain the evaluation results corresponding to the fluency evaluation metrics. Preset fluency evaluation models can include, but are not limited to, at least one of the following: the COMET-MQM (Multidimensional Quality Metric) cross-lingual multidimensional quality model, the COMET-QE cross-lingual quality evaluation model, and the BLEURT (Bilingual Evaluation Understudy with Representations from Transformers) model. COMET (Crosslingual Optimized Metric for Evaluation of Translation) is a collective term for a series of translation evaluation models; COMET is a model framework, and its metrics are all trained on human evaluation data. MQM is a multi-dimensional, multi-level human evaluation method; the COMET-MQM model is obtained by training the COMET model on MQM data. QE (Quality Estimation) is a specific task in the field of translation evaluation; this task does not allow the use of reference translations and can only evaluate based on the source text. The COMET-QE model is obtained by training the COMET model on QE data. The BLEURT model is a translation evaluation model that scores a translation using representations from Transformer models.

[0059] Specifically, the fluency of the target text can be evaluated based on one or more preset fluency evaluation models to determine the evaluation result corresponding to the fluency evaluation index. For example, if multiple preset fluency evaluation models exist, one can be randomly selected from them, and the target text can be quality evaluated based on the selected model and the source text. The obtained evaluation result can then be used as the evaluation result corresponding to the fluency evaluation index. Alternatively, the target text can be quality evaluated based on each preset fluency evaluation model and the source text, and the obtained evaluation results can be averaged. This average result can then be used as the evaluation result corresponding to the fluency evaluation index to further improve the accuracy of the quality evaluation.

[0060] It should be noted that different preset fluency assessment models use different assessment methods, thus requiring different reference texts when evaluating the quality of the target text. For example, the COMET-MQM cross-lingual multidimensional quality model requires evaluation based on the source text and its corresponding reference translation to obtain the assessment results corresponding to the COMET-MQM fluency assessment indicators. The COMET-QE cross-lingual quality assessment model requires evaluation based on the source text to obtain the assessment results corresponding to the COMET-QE fluency assessment indicators. The BLEURT bilingual assessment alternative model requires evaluation based on the reference translation corresponding to the source text to obtain the assessment results corresponding to the BLEURT fluency assessment indicators.
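The input requirements listed in this paragraph can be captured in a small lookup table. The sketch below only assembles and validates the texts each model needs; it does not call any real COMET or BLEURT API, and the names `REQUIRED_INPUTS` and `build_model_inputs` are hypothetical.

```python
from typing import Dict, Optional

# Texts each model needs in addition to the target text, per the description above.
REQUIRED_INPUTS = {
    "COMET-MQM": ("source", "reference"),
    "COMET-QE": ("source",),
    "BLEURT": ("reference",),
}


def build_model_inputs(
    model_name: str,
    target_text: str,
    source_text: Optional[str] = None,
    reference_text: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the texts a given fluency model needs, failing early if one is missing."""
    if model_name not in REQUIRED_INPUTS:
        raise KeyError(f"unknown fluency evaluation model: {model_name}")
    available = {"source": source_text, "reference": reference_text}
    inputs = {"target": target_text}
    for field in REQUIRED_INPUTS[model_name]:
        value = available[field]
        if value is None:
            raise ValueError(f"{model_name} requires the {field} text")
        inputs[field] = value
    return inputs
```

Checking the inputs before scoring makes it explicit that a reference-free model such as COMET-QE can run on the translation text pair alone, while the other two need a reference translation to be supplied.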

[0061] S230. Based on at least one preset fidelity assessment model and the source text, perform fidelity assessment on the target text and determine the assessment result corresponding to the fidelity assessment indicator.

[0062] Fidelity assessment metrics can be used to characterize whether the details in the translated text faithfully reflect the meaning of the original text, that is, to judge details such as mistranslation, omission, and emotional errors in the translation. The preset fidelity assessment model can be an assessment model biased towards evaluating the fidelity of the target text, in order to obtain the assessment results corresponding to the fidelity assessment metrics. Preset fidelity assessment models can include, but are not limited to: the OpenKiwi (Open-Source Machine Translation Quality Estimation in PyTorch) assessment model and the YiSi-2 semantic assessment model. Both the OpenKiwi assessment model and the YiSi-2 semantic assessment model perform their evaluation based on the source text to obtain the assessment results corresponding to the OpenKiwi and YiSi-2 fidelity assessment metrics.

[0063] Specifically, the fidelity of the target text can be assessed based on one or more pre-defined fidelity assessment models to determine the assessment results corresponding to the fidelity assessment indicators. For example, if multiple pre-defined fidelity assessment models exist, one model can be randomly selected, and the target text can be quality-assessed based on the selected model and the source text. The obtained assessment result can then be used as the assessment result corresponding to the fidelity assessment indicator. Alternatively, the target text can be quality-assessed based on each pre-defined fidelity assessment model and the source text, and the obtained assessment results can be averaged. The average assessment result can then be used as the assessment result corresponding to the fidelity assessment indicator to further improve the accuracy of the quality assessment.

[0064] S240. Input the language similarity between the source language and the target language into the preset network model, and determine the evaluation weights corresponding to the fluency evaluation index and the fidelity evaluation index based on the output of the preset network model.

[0065] Specifically, the pre-defined network model can directly output the evaluation weights corresponding to the fluency evaluation index and the fidelity evaluation index, or it can output only the evaluation weights corresponding to either the fluency evaluation index or the fidelity evaluation index, and determine the evaluation weight of the other index based on the output evaluation weights. This embodiment, by determining the optimal evaluation weights for the fluency and fidelity evaluation indices based on linguistic similarity, can effectively solve the problem of different biases in evaluating fluency and fidelity for translated texts in different languages, further improving the robustness of quality assessment.

[0066] For example, S240 may include: determining the evaluation weight corresponding to the fluency evaluation index based on the output of the preset network model; and determining the evaluation weight corresponding to the fidelity evaluation index based on the evaluation weight corresponding to the fluency evaluation index.

[0067] Specifically, when the preset network model is used to predict the evaluation weights corresponding to the fluency evaluation metric, the weights output by the preset network model can be used as the evaluation weights corresponding to the fluency evaluation metric. Since the sum of the two evaluation weights corresponding to the fluency evaluation metric and the fidelity evaluation metric is 1, the difference between 1 and the evaluation weights corresponding to the fluency evaluation metric can be determined as the evaluation weights corresponding to the fidelity evaluation metric.
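The complementary relationship between the two weights can be sketched as below. The function name `evaluation_weights` is hypothetical, and clamping the model output to the range [0, 1] is an added assumption that the text does not state.

```python
from typing import Tuple


def evaluation_weights(model_output: float) -> Tuple[float, float]:
    """Return (fluency weight, fidelity weight) from the network output."""
    # Assumption: the output is clamped to [0, 1] so both weights stay valid.
    w_fluency = min(1.0, max(0.0, model_output))
    # The two weights sum to 1, so the fidelity weight is the remainder.
    w_fidelity = 1.0 - w_fluency
    return w_fluency, w_fidelity


w_flu, w_fid = evaluation_weights(0.25)
```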

[0068] S250. Based on each evaluation weight, the evaluation results are integrated to determine the target evaluation result for the translated text pair.

[0069] Specifically, the evaluation result and the evaluation weight corresponding to the fluency evaluation index can be multiplied together, and the evaluation result and the evaluation weight corresponding to the fidelity evaluation index can be multiplied together. The two products are then added, and the resulting weighted average is used as the target evaluation result. This yields a comprehensive evaluation that integrates fidelity and fluency, avoiding the bias towards fidelity or fluency that can occur when translated texts are evaluated with a single evaluation index, which in turn improves the accuracy and robustness of the quality assessment.
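As a worked illustration of this fusion step, the sketch below computes the weighted sum for an arbitrary set of indicators. The function name `fuse` and the example scores and weights are hypothetical values chosen for the illustration.

```python
from typing import Mapping


def fuse(results: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of the per-indicator evaluation results."""
    if results.keys() != weights.keys():
        raise ValueError("results and weights must cover the same indicators")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("evaluation weights must sum to 1")
    # Multiply each result by its weight, then add the products.
    return sum(weights[name] * results[name] for name in results)


# Hypothetical example values.
results = {"fluency": 0.8, "fidelity": 0.6}
weights = {"fluency": 0.25, "fidelity": 0.75}
target_result = fuse(results, weights)  # 0.25 * 0.8 + 0.75 * 0.6
```

With these values the fluency term contributes 0.2 and the fidelity term 0.45, so the fused score is approximately 0.65.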

[0070] The technical solution of this embodiment determines the optimal evaluation weights for fluency and fidelity evaluation indicators based on linguistic similarity, and performs fusion processing based on each evaluation weight. This effectively solves the problem of different biases in fluency and fidelity evaluation when evaluating translated texts in different languages, and further improves the accuracy and robustness of quality assessment.

[0071] The following are embodiments of the machine translation quality assessment device provided in this invention. This device and the machine translation quality assessment methods in the above embodiments belong to the same inventive concept. For details not described in detail in the embodiments of the machine translation quality assessment device, please refer to the embodiments of the above machine translation quality assessment methods.

[0072] Figure 3 is a schematic diagram of a machine translation quality assessment device provided in an embodiment of the present invention. This embodiment is applicable to situations where machine translation quality is assessed for pre-trained models, especially in fine-tuning scenarios where the downstream task is a cross-language task such as translation. As shown in Figure 3, the device specifically includes: a translation text pair acquisition module 310, an evaluation result determination module 320, an evaluation weight determination module 330, and an evaluation result fusion module 340.

[0073] The translation text pair acquisition module 310 is used to acquire the translation text pairs to be evaluated, which include the source text corresponding to the source language and the target text corresponding to the target language after translation; the evaluation result determination module 320 is used to evaluate the quality of the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation result corresponding to each quality evaluation indicator; the evaluation weight determination module 330 is used to determine the evaluation weight corresponding to each quality evaluation indicator based on the language similarity between the source language and the target language; and the evaluation result fusion module 340 is used to fuse the evaluation results based on each evaluation weight to determine the target evaluation result of the translation text pair.
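The division of the device into four modules can be pictured with the following sketch. It assumes each module is modeled as a plain callable; the class name `TranslationQualityDevice`, the field names, and the stub behaviors are hypothetical. In particular, the stub for module 330 simply reuses the similarity as the fluency weight and is only a placeholder for the preset network model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Scores = Dict[str, float]


@dataclass
class TranslationQualityDevice:
    acquire_pair: Callable[[], Tuple[str, str]]       # module 310
    determine_results: Callable[[str, str], Scores]   # module 320
    determine_weights: Callable[[float], Scores]      # module 330
    fuse_results: Callable[[Scores, Scores], float]   # module 340

    def evaluate(self, language_similarity: float) -> float:
        """Run the four modules in order and return the target result."""
        source, target = self.acquire_pair()
        results = self.determine_results(source, target)
        weights = self.determine_weights(language_similarity)
        return self.fuse_results(results, weights)


# Hypothetical stubs standing in for the real modules.
device = TranslationQualityDevice(
    acquire_pair=lambda: ("source text", "target text"),
    determine_results=lambda s, t: {"fluency": 0.8, "fidelity": 0.6},
    determine_weights=lambda sim: {"fluency": sim, "fidelity": 1.0 - sim},
    fuse_results=lambda r, w: sum(w[k] * r[k] for k in r),
)
target_result = device.evaluate(language_similarity=0.5)
```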

[0074] The technical solution of this embodiment assesses the quality of the target text based on at least two quality assessment indicators and the source text in the translation text pair to be assessed. It determines the assessment result corresponding to each quality assessment indicator and, based on the language similarity between the source and target languages, determines the assessment weight corresponding to each quality assessment indicator. Then, it fuses the assessment results based on these weights to determine the target assessment result for the translation text pair. This allows for the fusion of assessment results corresponding to at least two different quality assessment indicators, comprehensively assessing the translation quality and avoiding biased assessment results. Furthermore, by determining the assessment weights based on the language similarity between the source and target languages, it takes into account the language differences between different language pairs, effectively avoiding significant differences in the accuracy of translation assessments for different language pairs, thereby ensuring the accuracy of translation assessments for different language pairs.

[0075] Optionally, the quality assessment metrics include: fluency assessment metrics and fidelity assessment metrics; the assessment result determination module 320 is specifically used for:

[0076] Based on at least one preset fluency evaluation model and source text, the fluency of the target text is evaluated, and the evaluation results corresponding to the fluency evaluation indicators are determined; based on at least one preset fidelity evaluation model and source text, the fidelity of the target text is evaluated, and the evaluation results corresponding to the fidelity evaluation indicators are determined.

[0077] Optionally, the preset fluency assessment model includes at least one of the following: the COMET-MQM cross-lingual multidimensional quality model, the COMET-QE cross-lingual quality assessment model, and the BLEURT bilingual assessment alternative model;

[0078] The preset fidelity assessment models include: the OpenKiwi assessment model and the YiSi-2 semantic assessment model.

[0079] Optionally, the evaluation weight determination module 330 includes:

[0080] The language similarity input unit is used to input the language similarity between the source language and the target language into the preset network model, where the preset network model is obtained in advance by training on translation sample data and labeled evaluation results.

[0081] The evaluation weight determination unit is used to determine the evaluation weight corresponding to each quality evaluation index based on the output of the preset network model.
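One way to picture a network model that maps language similarity to an evaluation weight and is trained on labeled evaluation results is the toy sketch below. Everything in it is an assumption made for illustration: the text does not specify the architecture, loss, or training procedure, so a one-input logistic unit, a squared-error loss on the fused score, and plain gradient descent are used here, and the training samples are invented.

```python
import math
from typing import List, Tuple

# Each sample: (language similarity, fluency score, fidelity score, label).
Sample = Tuple[float, float, float, float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fluency_weight(similarity: float, a: float, b: float) -> float:
    """Hypothetical model: a single logistic unit with parameters a and b."""
    return sigmoid(a * similarity + b)


def loss(samples: List[Sample], a: float, b: float) -> float:
    """Mean squared error between the fused score and the labeled score."""
    total = 0.0
    for sim, s_flu, s_fid, label in samples:
        w = fluency_weight(sim, a, b)
        fused = w * s_flu + (1.0 - w) * s_fid
        total += (fused - label) ** 2
    return total / len(samples)


def train(samples: List[Sample], lr: float = 0.5,
          epochs: int = 2000) -> Tuple[float, float]:
    """Fit a and b by batch gradient descent on the loss above."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for sim, s_flu, s_fid, label in samples:
            w = fluency_weight(sim, a, b)
            fused = w * s_flu + (1.0 - w) * s_fid
            # Chain rule through the fused score and the sigmoid.
            dz = 2.0 * (fused - label) * (s_flu - s_fid) * w * (1.0 - w)
            grad_a += dz * sim
            grad_b += dz
        a -= lr * grad_a / len(samples)
        b -= lr * grad_b / len(samples)
    return a, b


# Invented training data, for illustration only.
samples = [(0.9, 0.8, 0.4, 0.7), (0.2, 0.9, 0.3, 0.4)]
a, b = train(samples)
w_fluency = fluency_weight(0.9, a, b)
```

Because the output passes through a sigmoid, the predicted fluency weight always lies strictly between 0 and 1, which keeps the complementary fidelity weight valid as well.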

[0082] Optionally, when the quality assessment indicators include fluency assessment indicators and fidelity assessment indicators, the assessment weight determination unit is specifically used for:

[0083] Based on the output of the preset network model, the evaluation weights corresponding to the fluency evaluation index are determined; based on the evaluation weights corresponding to the fluency evaluation index, the evaluation weights corresponding to the fidelity evaluation index are determined.

[0084] Optionally, the device further includes:

[0085] The language similarity determination module is used to: before the evaluation weight corresponding to each quality evaluation indicator is determined based on the language similarity between the source language and the target language, determine, based on a preset multilingual model and according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language; and determine the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector.

[0086] Optionally, the language similarity determination module is specifically used for:

[0087] Each source text in the source corpus corresponding to the source language is input into a preset multilingual model to determine the source language representation vector corresponding to each source text. Based on each source language representation vector, the source language representation vector corresponding to the source language is determined. Each target text in the target corpus corresponding to the target language is input into a preset multilingual model to determine the target language representation vector corresponding to each target text. Based on each target language representation vector, the target language representation vector corresponding to the target language is determined.

[0088] Optionally, the language similarity determination module is also specifically used to: average the source language representation vectors corresponding to the individual source texts, and determine the resulting average vector as the source language representation vector corresponding to the source language.
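The corpus-level representation and the similarity computation can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_encode` is a hypothetical stand-in for the preset multilingual model, and cosine similarity is assumed as the comparison between the two language vectors, since this passage does not name the similarity function.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def language_vector(corpus: Sequence[str],
                    encode: Callable[[str], Vector]) -> Vector:
    """Average the per-text vectors into one vector for the language."""
    vectors = [encode(text) for text in corpus]
    if not vectors:
        raise ValueError("corpus must contain at least one text")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure between two language vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def toy_encode(text: str) -> Vector:
    # Hypothetical encoder: character count and word count as two features.
    return [float(len(text)), float(text.count(" ") + 1)]


source_vec = language_vector(["ab", "abcd"], toy_encode)
target_vec = language_vector(["a b", "a b c"], toy_encode)
similarity = cosine_similarity(source_vec, target_vec)
```

The same averaging is applied to the target corpus here, mirroring the treatment of the source corpus.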

[0089] The machine translation quality assessment device provided in the embodiments of the present invention can execute the machine translation quality assessment method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the machine translation quality assessment method.

[0090] It is worth noting that in the embodiments of the machine translation quality assessment device described above, the various units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be achieved; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the scope of protection of the present invention.

[0091] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention. Figure 4 shows a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the present invention. The electronic device 12 shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present invention.

[0092] As shown in Figure 4, the electronic device 12 is represented in the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and bus 18 connecting different system components (including system memory 28 and processing unit 16).

[0093] Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0094] Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and non-volatile media, removable and non-removable media.

[0095] System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 4; usually referred to as a "hard drive"). Although not shown in Figure 4, a disk drive for reading and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 via one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.

[0096] A program / utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 42 typically perform the functions and / or methods described in the embodiments of the present invention.

[0097] Electronic device 12 can also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), and with one or more devices that enable a user to interact with electronic device 12, and / or with any device that enables electronic device 12 to communicate with one or more other computing devices (e.g., network card, modem, etc.). This communication can be performed via input / output (I / O) interface 22. Furthermore, electronic device 12 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with other modules of electronic device 12 via bus 18. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0098] Processing unit 16 executes various functional applications and data processing by running programs stored in system memory 28, such as implementing the steps of a machine translation quality assessment method provided in this embodiment, the method including:

[0099] Obtain the translation text pairs to be evaluated, which include the source text in the source language and the target text in the target language after translation;

[0100] Based on at least two quality assessment indicators and the source text, the target text is assessed for quality, and the assessment result corresponding to each quality assessment indicator is determined.

[0101] Based on the language similarity between the source language and the target language, the evaluation weight corresponding to each quality assessment indicator is determined.

[0102] Based on the various evaluation weights, the evaluation results are integrated to determine the target evaluation result for the translated text pair.

[0103] Of course, those skilled in the art will understand that the processor can also implement the technical solutions of the machine translation quality assessment method provided in any embodiment of the present invention.

[0104] This embodiment provides a computer-readable storage medium storing a computer program thereon. When executed by a processor, the program implements the steps of the machine translation quality assessment method provided in any embodiment of the present invention, the method comprising:

[0105] Obtain the translation text pairs to be evaluated, which include the source text in the source language and the target text in the target language after translation;

[0106] Based on at least two quality assessment indicators and the source text, the target text is assessed for quality, and the assessment result corresponding to each quality assessment indicator is determined.

[0107] Based on the language similarity between the source language and the target language, the evaluation weight corresponding to each quality assessment indicator is determined.

[0108] Based on the various evaluation weights, the evaluation results are integrated to determine the target evaluation result for the translated text pair.

[0109] The computer storage medium of this invention can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium can be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0110] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0111] Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0112] Computer program code for performing the operations of this invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0113] Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby allowing them to be stored in a storage device for execution by a computing device, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.

[0114] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.

Claims

1. A machine translation quality assessment method, characterized in that it comprises: obtaining a translation text pair to be evaluated, wherein the translation text pair includes a source text corresponding to a source language and a translated target text corresponding to a target language; performing quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determining the evaluation result corresponding to each of the quality evaluation indicators; determining, based on a preset multilingual model and according to a source corpus corresponding to the source language and a target corpus corresponding to the target language, a source language representation vector corresponding to the source language and a target language representation vector corresponding to the target language; determining the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector, wherein the language similarity refers to the linguistic similarity between the source language and the target language in terms of language family, vocabulary and grammatical structure; inputting the language similarity between the source language and the target language into a preset network model, wherein the preset network model is obtained in advance by training on translation sample data and labeled evaluation results; determining the evaluation weight corresponding to each of the quality evaluation indicators based on the output of the preset network model; and fusing the evaluation results based on the respective evaluation weights to determine the target evaluation result for the translation text pair.

2. The method according to claim 1, characterized in that the quality evaluation indicators include: a fluency evaluation indicator and a fidelity evaluation indicator; and the step of performing quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determining the evaluation result corresponding to each of the quality evaluation indicators, includes: evaluating the fluency of the target text based on at least one preset fluency evaluation model and the source text, and determining the evaluation result corresponding to the fluency evaluation indicator; and evaluating the fidelity of the target text based on at least one preset fidelity evaluation model and the source text, and determining the evaluation result corresponding to the fidelity evaluation indicator.

3. The method according to claim 1, characterized in that, when the quality evaluation indicators include a fluency evaluation indicator and a fidelity evaluation indicator, the step of determining the evaluation weight corresponding to each of the quality evaluation indicators based on the output of the preset network model includes: determining the evaluation weight corresponding to the fluency evaluation indicator based on the output of the preset network model; and determining the evaluation weight corresponding to the fidelity evaluation indicator based on the evaluation weight corresponding to the fluency evaluation indicator.

4. The method according to claim 1, characterized in that the step of determining, based on a preset multilingual model and according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language includes: inputting each source text in the source corpus corresponding to the source language into the preset multilingual model to determine the source language representation vector corresponding to each source text, and determining the source language representation vector corresponding to the source language based on each of these source language representation vectors; and inputting each target text in the target corpus corresponding to the target language into the preset multilingual model to determine the target language representation vector corresponding to each target text, and determining the target language representation vector corresponding to the target language based on each of these target language representation vectors.

5. The method according to claim 4, characterized in that the step of determining the source language representation vector corresponding to the source language based on each of the source language representation vectors includes: averaging the source language representation vectors, and determining the resulting average vector as the source language representation vector corresponding to the source language.

6. A machine translation quality assessment device, characterized in that it comprises: a translation text pair acquisition module, used to acquire a translation text pair to be evaluated, wherein the translation text pair includes a source text corresponding to a source language and a translated target text corresponding to a target language; an evaluation result determination module, used to perform quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation result corresponding to each of the quality evaluation indicators; an evaluation weight determination module, used to input the language similarity between the source language and the target language into a preset network model, wherein the preset network model is obtained in advance by training on translation sample data and labeled evaluation results, and to determine the evaluation weight corresponding to each of the quality evaluation indicators based on the output of the preset network model; and an evaluation result fusion module, used to fuse the evaluation results based on the evaluation weights to determine the target evaluation result of the translation text pair; the device further includes: a language similarity determination module, used to, before the evaluation weight corresponding to each quality evaluation indicator is determined based on the language similarity between the source language and the target language, determine, based on a preset multilingual model, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language, and determine the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector, wherein the language similarity refers to the linguistic similarity between the source language and the target language in terms of language family, vocabulary, and grammatical structure.

7. An electronic device, characterized in that the electronic device includes: one or more processors; and a memory used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the machine translation quality assessment method as described in any one of claims 1-5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the machine translation quality assessment method as described in any one of claims 1-5.