Model training method, text translation method and device

By introducing a rhetorical consistency reward mechanism and a relative advantage value loss function into the machine translation model, the model parameters are optimized, solving the problem of rhetorical device loss in traditional machine translation and improving the accuracy and quality of translation.

CN122242537A · Pending · Publication Date: 2026-06-19 · NEW ORIENTAL EDUCATION & TECH GRP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NEW ORIENTAL EDUCATION & TECH GRP CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-19


Abstract

This disclosure relates to a model training method, a text translation method, and an apparatus, belonging to the field of machine learning technology. The method includes: generating multiple candidate translations of a rhetorical source text using a first model; determining a reward value for each candidate translation based at least on the degree of consistency of rhetorical devices between the candidate translation and the source text, the reward value being used to quantify the translation quality of the candidate translation; determining the relative advantage value of each candidate translation among the multiple candidate translations based on the reward value; constructing a loss function based on the relative advantage value, the relative advantage value being used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality samples; and optimizing the parameters of the first model based on the loss function. The trained first model possesses the ability to translate rhetorical texts, especially Chinese-to-English translations, with high quality.

Description

Technical Field

[0001] This disclosure relates to the field of machine learning technology, and more specifically, to a model training method, a text translation method, and an apparatus.

Background Technology

[0002] With the continuous development of artificial intelligence technology, machine learning is being applied more and more widely in daily life. For example, in the field of text translation, with the acceleration of globalization, the demand for cross-language content in literature, advertising, film and television, and other fields through machine translation has surged.

[0003] Traditional machine translation, when processing texts containing rhetorical devices, often loses these devices through literal translation, which reduces the accuracy of the translation.

Summary of the Invention

[0004] The purpose of this disclosure is to provide a model training method, a text translation method, and an apparatus to at least solve the problem of low translation accuracy of models for original texts containing rhetorical devices.

[0005] To achieve the above objective, in a first aspect, this disclosure provides a model training method, the method comprising: generating multiple candidate translations of the original rhetorical text using a first model; determining a reward value for each candidate translation based at least on the degree of consistency of rhetorical devices between the candidate translation and the original text, the reward value being used to quantify the translation quality of the candidate translation; determining, based on the reward value, the relative advantage value of each candidate translation among the multiple candidate translations; constructing a loss function based on the relative advantage value, the relative advantage value being used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality samples; and optimizing the parameters of the first model based on the loss function.

[0006] In a second aspect, this disclosure provides a text translation method, comprising: obtaining an input sentence; and generating a translation result of the input sentence using a first model, wherein the first model is trained by the model training method described in the first aspect and any feasible implementation thereof.

[0007] In a third aspect, this disclosure provides a model training apparatus, the apparatus comprising: a first translation module, configured to generate multiple candidate translations of the original rhetorical text using the first model; a reward determination module, configured to determine the reward value of each candidate translation based at least on the degree of consistency of rhetorical devices between the candidate translation and the original text, wherein the reward value is used to quantify the translation quality of the candidate translation; an advantage value determination module, configured to determine, based on the reward value, the relative advantage value of each candidate translation among the multiple candidate translations; a loss function construction module, configured to construct a loss function based on the relative advantage value, the relative advantage value being used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality samples; and a parameter update module, configured to optimize the parameters of the first model according to the loss function.

[0008] In a fourth aspect, this disclosure provides a text translation apparatus, the apparatus comprising: an acquisition module, configured to acquire an input sentence; and a second translation module, configured to generate the translation result of the input sentence through the first model, wherein the first model is trained by the model training method described in the first aspect and any feasible implementation thereof.

[0009] In a fifth aspect, this disclosure provides an electronic device, comprising: a memory on which a computer program is stored; and a processor for executing the computer program in the memory to implement the method described in the first aspect.

[0010] In a sixth aspect, this disclosure provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the first or second aspect.

[0011] In a seventh aspect, this disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the method described in the first or second aspect.

[0012] The above technical solution designs a reward mechanism based on the consistency of rhetorical devices between the candidate translation and the original text, and converts the reward value into a relative advantage value within the group. A loss function for gradient update is constructed using the relative advantage value, thereby optimizing the parameters of the first model. This ensures that the trained first model retains the rhetorical devices of the original text in the translation, avoiding the low translation quality problem caused by literal translation. Thus, the problem of low rhetorical consistency before and after translation in traditional models can be solved, improving translation accuracy.

[0013] Other features and advantages of this disclosure will be described in detail in the following detailed description section.

Brief Description of the Drawings

[0014] The accompanying drawings are provided to further illustrate the present disclosure and form part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not constitute a limitation thereof. In the drawings: Figure 1 is a flowchart illustrating a model training method according to an exemplary embodiment of the present disclosure.

[0015] Figure 2 is a flowchart illustrating a GRPO training method according to an exemplary embodiment of the present disclosure.

[0016] Figure 3 is a flowchart illustrating a text translation method according to an exemplary embodiment of the present disclosure.

[0017] Figure 4 is a schematic diagram of a training process output according to an exemplary embodiment of the present disclosure.

[0018] Figure 5 is a block diagram of a model training apparatus according to an exemplary embodiment of the present disclosure.

[0019] Figure 6 is a block diagram illustrating a text translation apparatus according to an exemplary embodiment of the present disclosure.

[0020] Figure 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

[0021] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0022] It should be understood that the steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect.

[0023] The term "comprising" and its variations as used herein are open-ended inclusions, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.

[0024] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0025] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0026] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0027] Traditional machine translation methods include rule-based methods and neural network-based methods. Rule-based methods rely on manually constructed rhetoric databases, which have limited coverage and require continuous maintenance by experts, resulting in high update and maintenance costs. Neural methods, such as those based on neural machine translation (NMT) models, perform well on metrics like BLEU (Bilingual Evaluation Understudy) and TER (Translation Edit Rate), but handle rhetorical devices with low accuracy, making it difficult to produce high-quality translations of texts containing rhetorical devices.

[0028] To address the aforementioned issues, this disclosure provides a model training method that measures the translation quality of multiple candidate translations generated by a first model based at least on the consistency of rhetorical devices. It then increases the influence of high-quality samples on model parameter updates while suppressing the influence of low-quality samples, thereby training the first model. The trained first model is able to translate rhetorical texts, particularly from Chinese to English, with high quality.

[0029] Figure 1 is a flowchart illustrating a model training method according to an exemplary embodiment of this disclosure. As shown in Figure 1, the model training method may include steps S101 to S105.

[0030] Step S101: Generate multiple candidate translations of the original rhetorical text using the first model.

[0031] Here, the first model refers to the model to be trained. The rhetorical text refers to the text that uses rhetorical devices. The rhetorical text may include multiple input statements, at least one of which contains a rhetorical device. Rhetorical devices include metaphor, parallelism, personification, contrast, hyperbole, metonymy, rhetorical question, antithesis, repetition, etc.

[0032] For the same rhetorical text, the first model generates multiple distinct candidate translations through diversity sampling. For example, the first model generates multiple exploration branches for the same translation task through Top-p or Top-k sampling, with each exploration branch corresponding to one candidate translation.
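The diversity sampling mentioned above can be illustrated with a minimal sketch of Top-k and Top-p (nucleus) filtering applied to one decoding step. The function name `sample_token`, its parameters, and the use of raw score lists are assumptions made for illustration; the disclosure does not specify how the first model implements sampling.

```python
import math
import random

def sample_token(logits, top_k=0, top_p=1.0, rng=random):
    """Sample one token index after top-k and/or top-p (nucleus) filtering.

    Hypothetical helper: top_k=0 disables top-k, top_p=1.0 disables top-p.
    """
    # Softmax over the raw scores (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw one surviving token in proportion to its probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Repeating such a draw at every decoding step, once per exploration branch, would yield a group of differing candidate translations for the same source text.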

[0033] Step S102: Determine the reward value of the candidate translation based at least on the degree of consistency of the rhetorical devices between the candidate translation and the original text. The reward value is used to quantify the translation quality of the candidate translation.

[0034] For example, if both the original text and the candidate translation contain a metaphor, the rhetorical devices are consistent. Conversely, if the original text contains a metaphor but the candidate translation does not, the rhetorical devices are inconsistent. In some embodiments, the original text and the candidate translation may contain multiple rhetorical devices, and the degree of consistency indicates how far these rhetorical devices are preserved before and after translation. It can be understood that, if the original text contains a metaphor, then among the multiple candidate translations generated for it, those that contain the metaphor have a higher degree of consistency in rhetorical devices.

[0035] This step quantifies the translation quality of candidate translations based on the consistency of rhetorical devices before and after translation. If the consistency of rhetorical devices is low, the translation quality of the candidate translation is low, and the first model will not select the candidate translation. This allows the first model to be trained in the direction of high consistency of rhetorical devices, so that the trained first model can have the ability to translate rhetorical devices.

[0036] Step S103: Determine the relative advantage value of the candidate translation among multiple candidate translations based on the reward value.

[0037] This step converts the reward value into a relative advantage value. This means that if multiple candidate translations are considered as a group, the relative advantage value refers to whether the translation quality of a candidate translation is superior to that of other candidate translations within the group. In some embodiments, the translation quality of all candidate translations within the group can be averaged; in this case, the relative advantage value refers to whether the translation quality of a candidate translation is superior to the average translation quality.

[0038] Specifically, the larger the relative advantage value, the greater the advantage of the candidate translation among multiple candidate translations, and thus the higher the translation quality of the candidate translation; conversely, the smaller the relative advantage, the smaller the advantage of the candidate translation among multiple candidate translations, and thus the lower the translation quality of the candidate translation.
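As a rough illustration of step S103, the sketch below converts a group of reward values into relative advantage values by centring on the group mean and scaling by the group standard deviation. This particular normalisation is an assumption borrowed from common GRPO practice (Figure 2 refers to GRPO training); the function name and the `eps` guard are hypothetical.

```python
from statistics import mean, pstdev

def relative_advantages(rewards, eps=1e-8):
    """Mean-centre and scale the rewards of one group of candidate translations.

    Assumed normalisation: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this assumed normalisation an above-average candidate receives a positive value and a below-average one a negative value. A ratio-style normalisation (reward divided by the group mean) would instead place the dividing line at 1.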

[0039] Step S104: Construct a loss function based on the relative advantage value. The relative advantage value is used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality samples.

[0040] In machine learning, the goal of the loss function is to make the predicted value as close as possible to the true value. Therefore, when constructing the loss function, a relative advantage value is introduced, which transforms the relative advantage value of the candidate translation within the group into an update signal for the model parameters, driving the model to generate a translation with more consistent rhetoric before and after translation.

[0041] More specifically, the relative advantage value is added to the original loss function through mathematical calculation, such as adding the relative advantage value as a weight value to the original loss function to obtain the new loss function. Compared to the original loss function, the new loss function uses the consistency of rhetorical devices before and after translation in the candidate translation as an update condition. This allows the first model to learn how to translate rhetorical devices, thereby improving translation quality.

[0042] Step S105: Optimize the parameters of the first model according to the loss function.

[0043] Specifically, the training objective of the first model is to minimize the loss function, drive the model parameters to be updated in the direction of high reward, and preserve the rhetorical devices in the translation through gradient descent, thereby enabling the first model to have the ability to translate rhetorical devices and avoid the problem of low translation quality caused by literal translation.

[0044] Using the technical solutions described in steps S101 to S105, a reward mechanism is designed based on the consistency of rhetorical devices between the candidate translation and the original text, and the reward value is converted into a relative advantage value within the group. A loss function for gradient update is constructed using the relative advantage value, thereby optimizing the parameters of the first model. This ensures that the trained first model can retain the rhetorical devices of the original text in the translation, avoiding the problem of low translation quality caused by literal translation. Therefore, the problem of low rhetorical consistency before and after translation in traditional models can be solved, improving translation accuracy.

[0045] In some feasible embodiments, if the relative advantage value is greater than 1, it indicates that the translation quality of the candidate translation is higher than the average translation quality of multiple candidate translations; if the relative advantage value is less than 1, it indicates that the translation quality of the candidate translation is lower than the average translation quality of multiple candidate translations. Step S104 above includes: multiplying the relative advantage value by the policy probability ratio of the candidate translations in the original loss function to obtain the loss function, wherein the policy probability ratio is the ratio of the probability of the first model generating a candidate translation using the new policy to the probability of generating a candidate translation using the old policy.

[0046] This embodiment uses relative advantage values as weights to increase the influence of high-quality samples on model parameter updates in a weighted manner. By making only minor improvements to the original loss function, model parameters can be trained based on relative advantage values.
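The weighting described above can be sketched as follows, assuming sequence-level log-probabilities are available for each candidate under the new and old policies. The function name is hypothetical, and the negative sign (so that minimising the loss favours high-advantage candidates) and the absence of clipping or a KL term are simplifying assumptions, not details taken from the disclosure.

```python
import math

def advantage_weighted_loss(new_logprobs, old_logprobs, advantages):
    """Negative mean of (policy probability ratio x relative advantage).

    Illustrative only: practical GRPO-style objectives usually add
    ratio clipping and a KL regulariser, both omitted here.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(y|x) / pi_old(y|x)
        terms.append(ratio * adv)
    return -sum(terms) / len(terms)
```

In this sketch, raising the new-policy probability of a candidate with a large advantage lowers the loss, which is the sense in which high-quality samples dominate the parameter update.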

[0047] In some feasible embodiments, the model training method provided in this disclosure further includes: determining the matching value of rhetorical types and the difference value of the number of rhetorical devices between the original text and the candidate translation; and calculating the product of the matching value of rhetorical types and the difference value of the number of rhetorical devices to obtain the rhetorical consistency reward of the candidate translation. The rhetorical consistency reward represents the degree of consistency of rhetorical devices between the candidate translation and the original text.

[0048] Specifically, the rhetorical consistency reward can be determined by expression (4).

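[0049] $$R_{rh}(x,y) = \mathrm{Match}_{Type} \times \mathrm{Density}_{Control} \qquad (4)$$

In expression (4), $R_{rh}(x,y)$ represents the rhetorical consistency reward, $\mathrm{Match}_{Type}$ represents the matching value of the rhetorical types, and $\mathrm{Density}_{Control}$ represents the difference value of the number of rhetorical devices.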

[0050] Regarding rhetorical devices, the aforementioned metaphors, parallelism, personification, contrast, hyperbole, metonymy, rhetorical questions, interrogative sentences, antithesis, and repetition each represent a rhetorical device type. Continuing with the example of the original text containing a metaphor, if the candidate translation also contains a metaphor, it indicates a rhetorical device type match. In this embodiment, the matching value of the rhetorical device type quantifies the degree of matching. A higher matching value indicates a more accurate match; a lower matching value indicates a less accurate match.

[0051] Regarding the number of rhetorical devices, suppose the original text contains 1 rhetorical device. If the candidate translation contains 2, the numbers do not match; if it contains 3, the mismatch is greater; if it contains 1, the numbers match. Both the original text and the candidate translation may contain multiple rhetorical devices. In this embodiment, the degree of difference in the number of rhetorical devices is quantified by the difference value. The smaller the gap between the two counts, the higher the translation quality.

[0052] This embodiment considers the rhetorical consistency of the translation from both quantity and type perspectives, providing a more comprehensive approach and enabling the first model to have a better ability to translate rhetorical texts.

[0053] In this feasible embodiment, as an example, determining the matching value of rhetorical types and the difference value of the number of rhetorical devices between the original text and the candidate translation includes: dividing twice the number of rhetorical types shared by the original text and the candidate translation by the total number of rhetorical types to obtain the rhetorical type matching value, where the total number of rhetorical types is the sum of the number of rhetorical types in the original text and the number of rhetorical types in the candidate translation; and dividing the absolute value of the difference in the number of rhetorical devices between the original text and the candidate translation by the sum of the number of rhetorical devices in the original text and a smoothing term, then taking the natural exponential of the negative of that quotient to obtain the difference value of the number of rhetorical devices.

[0054] Specifically, the matching value of the rhetorical type can be determined by expression (5).

[0055] $$\mathrm{Match}_{Type} = \frac{2\,|R_x \cap R_y|}{|R_x| + |R_y|} \qquad (5)$$

In expression (5), $R_x$ represents the set of rhetorical types of the original text x to be translated, and $R_y$ represents the set of rhetorical types of the candidate translation y. $R_x$ and $R_y$ can be identified using a pre-trained rhetoric detector. $|R_x \cap R_y|$ indicates the number of rhetorical types shared by the original text x and the candidate translation y, $|R_x|$ indicates the number of rhetorical types in the original text, and $|R_y|$ indicates the number of rhetorical types in the candidate translation.

[0056] Specifically, the difference in rhetorical quantity can be determined by expression (6).

[0057] $$\mathrm{Density}_{Control} = \exp\left(-\frac{|N_x - N_y|}{N_x + k}\right) \qquad (6)$$

In expression (6), $N_x$ represents the number of rhetorical devices in the original text, $N_y$ represents the number of rhetorical devices in the candidate translation, and k represents the smoothing term used to avoid division by zero. The negative sign in expression (6) maps the difference in quantity to a decay factor; that is, when the difference is 0, $\mathrm{Density}_{Control}$ takes its maximum value of 1, and the greater the difference, the smaller the output value.

[0058] In this embodiment, the calculation formulas for quantity and type can both be understood as follows: the more consistent the rhetoric, the closer the value is to 1. Therefore, the rhetorical consistency reward obtained by multiplying the two is a value closer to 1, indicating better translation quality. This method of quantifying rhetorical consistency allows this parameter to be easily incorporated as a weight into the loss function.
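The calculation described in this embodiment (type matching value multiplied by the quantity difference value) can be sketched as below. The function names, the default smoothing term `k=1.0`, and the choice to return 0 when neither text contains any rhetorical type are assumptions for illustration; the rhetoric detector that would supply the type sets and counts is not modelled here.

```python
import math

def match_type(src_types, tgt_types):
    """Twice the number of shared rhetorical types over the total number of types."""
    src, tgt = set(src_types), set(tgt_types)
    total = len(src) + len(tgt)
    if total == 0:
        return 0.0  # assumption: no rhetorical types on either side gives no reward
    return 2 * len(src & tgt) / total

def density_control(n_src, n_tgt, k=1.0):
    """Decay factor exp(-|N_x - N_y| / (N_x + k)); equals 1 when the counts match."""
    return math.exp(-abs(n_src - n_tgt) / (n_src + k))

def rhetorical_reward(src_types, tgt_types, n_src, n_tgt, k=1.0):
    """Product of the type matching value and the quantity decay factor."""
    return match_type(src_types, tgt_types) * density_control(n_src, n_tgt, k)
```

For instance, a source and a candidate that both contain exactly one metaphor score 1, while a candidate that drops the metaphor scores 0 because no type is shared.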

[0059] In some feasible embodiments, step S102 above includes: determining the reward value of the candidate translation by combining the degree of consistency of rhetorical devices between the candidate translation and the original text, the semantic fidelity reward, the language fluency reward, and the emotional consistency reward.

[0060] The semantic fidelity reward is used to represent the degree of semantic fidelity between the candidate translation and the original rhetorical text. For example, if the translation changes the original meaning, the semantic fidelity is low, which is reflected in the semantic fidelity reward as low. Conversely, if the translation preserves the semantics of the original rhetorical text, the semantic fidelity is high, which is reflected in the semantic fidelity reward as high.

[0061] The language fluency reward is used to indicate how fluent the candidate translation is. For example, if grammatical errors such as a missing subject or verb appear in the translation, the translation will be stiff and unnatural, resulting in a low fluency reward. Conversely, if the translation is grammatically well formed, it will read fluently, resulting in a high fluency reward.

[0062] The emotional consistency reward is used to indicate the degree of emotional consistency between the candidate translation and the original rhetorical text. For example, if the translation changes the original emotion, the emotional consistency is low, which is reflected in a low emotional consistency reward. Conversely, if the translation retains the emotion of the original rhetorical text, the emotional consistency is high, which is reflected in a high emotional consistency reward.

[0063] In this embodiment, the translation possesses four advantages: rhetorical consistency, semantic fidelity, linguistic fluency, and emotional consistency. The first model trained in this way achieves higher accuracy in translating input sentences due to these multiple advantages.

[0064] Specifically, the overall reward function can be determined by expression (3).

[0065] $$R(x,y) = \omega_{rh} R_{rh}(x,y) + \omega_{sem} R_{sem}(x,y) + \omega_{flu} R_{flu}(y) + \omega_{emo} R_{emo}(x,y) \qquad (3)$$

In expression (3), x represents the original text, which can be a Chinese sentence; y represents the candidate translation given by the model, which can be an English sentence; $R_{rh}$ represents the rhetorical device consistency reward; $R_{sem}$ represents the semantic fidelity reward; $R_{flu}$ represents the language fluency reward; and $R_{emo}$ represents the emotional consistency reward.

[0066] Here, $\omega_{rh}$, $\omega_{sem}$, $\omega_{flu}$, and $\omega_{emo}$ represent the weights of the four rewards: rhetorical device consistency, semantic fidelity, language fluency, and emotional consistency. For example, all weights can be 0.25, and this application does not impose any restrictions on this. It can be understood that adjusting the weights adjusts the training direction of the model; for example, a higher weight for the rhetorical device consistency reward will enhance the model's ability to translate rhetorical devices.
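A minimal sketch of combining the four rewards with weights is given below, using the example weights of 0.25 mentioned in the text as defaults. The function name and argument order are hypothetical, and a plain weighted sum is assumed.

```python
def total_reward(r_rh, r_sem, r_flu, r_emo, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the rhetorical, semantic, fluency and emotional rewards."""
    w_rh, w_sem, w_flu, w_emo = weights
    return w_rh * r_rh + w_sem * r_sem + w_flu * r_flu + w_emo * r_emo
```

Passing, say, `weights=(0.4, 0.2, 0.2, 0.2)` would tilt training toward rhetorical consistency, in line with the remark on adjusting weights.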

[0067] In this feasible embodiment, as an example, the model training method further includes: determining the semantic similarity between the original rhetorical text and the candidate translation, and determining the BLEU value between the back-translated text of the candidate translation and the original rhetorical text; calculating the product of the BLEU value and the penalty factor for the difference in semantic distribution; and calculating the semantic fidelity reward of the candidate translation as a weighted sum of the semantic similarity and that product.

[0068] It can be understood that the back-translated text refers to the text generated when the candidate translation is translated back into the source language by the back-translation model.

[0069] In this embodiment, semantic fidelity rewards are used to ensure high semantic similarity before and after translation.

[0070] Specifically, the semantic fidelity reward can be determined by expression (7).

[0071] $$R_{sem}(x,y) = \omega_1\, S_{sim}(x,y) + \omega_2\, \mathrm{BLEU}(x, BT(y)) \cdot \phi_\gamma\big(D_{JS}(E_x(x) \,\|\, E_{BT}(y))\big) \qquad (7)$$

In expression (7), $\omega_1$ represents the first weight; $\omega_2$ represents the second weight; $S_{sim}(x,y)$ represents the semantic similarity between the original rhetorical text x and the candidate translation y; $\mathrm{BLEU}(x, BT(y))$ represents the BLEU value between the back-translated text and the original rhetorical text. Existing translation models can be used for back-translation, which will not be elaborated upon in this application. $D_{JS}(E_x(x) \| E_{BT}(y))$ represents the distribution difference between the back-translated text and the original text; the larger the distribution difference, the larger the value. $\phi_\gamma$ denotes the penalty factor for the difference in semantic distribution, whose closed form is not legible in this text, and γ represents its hyperparameter, whose value depends on the quality of the back-translation model.
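The structure of the semantic fidelity reward (a weighted sum of the semantic similarity and the penalised BLEU value) can be sketched as follows. The exponential form of the penalty factor, `exp(-gamma * d_js)`, is purely an assumption chosen because it shrinks as the distribution difference grows; the default weights and `gamma` are likewise hypothetical.

```python
import math

def semantic_fidelity(s_sim, bleu, d_js, w1=0.5, w2=0.5, gamma=1.0):
    """Weighted sum of semantic similarity and BLEU scaled by a divergence penalty.

    ASSUMPTION: the penalty factor is taken to be exp(-gamma * d_js);
    the source text does not state its exact form.
    """
    penalty = math.exp(-gamma * d_js)
    return w1 * s_sim + w2 * bleu * penalty
```

With this assumed form, a back-translation whose distribution drifts further from the original contributes less to the reward even when its BLEU value is unchanged.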

[0072] The following details how each term in expression (7) is calculated.

[0073] $S_{sim}(x,y)$: the original rhetorical text x and the candidate translation y can be embedded using the multilingual pre-trained model BGE-M3, and $S_{sim}(x,y)$ can be determined by expression (8).

[0074] (8) In expression (8), $e_x$ represents the embedding vector obtained by encoding the original text x with BGE-M3, and $e_y$ represents the embedding vector obtained by encoding the candidate translation y with BGE-M3.
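One common way to compare two embedding vectors is cosine similarity, sketched below. Whether expression (8) is exactly this measure is an assumption, and the vectors are passed in directly rather than produced by BGE-M3, whose API is not shown here.

```python
import math

def cosine_similarity(e_x, e_y):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(e_x, e_y))
    norm_x = math.sqrt(sum(a * a for a in e_x))
    norm_y = math.sqrt(sum(b * b for b in e_y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: a zero vector is treated as carrying no similarity
    return dot / (norm_x * norm_y)
```

Vectors pointing in the same direction score 1 and orthogonal vectors score 0, so a higher value would indicate closer semantics under this assumption.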

[0075] $\mathrm{BLEU}(x, BT(y))$: an existing translation model can be used as the back-translation model, and the value is determined by expression (9).

[0076] $$\mathrm{BLEU}(x, BT(y)) = BP \cdot \exp\left(\sum_{n=1}^{N} \omega_n \log P_n\right) \qquad (9)$$

In expression (9), BP is the brevity penalty, which penalizes a text that is too short (i.e., the candidate in the BLEU comparison, here the back-translated text); it can be determined by expression (10).

[0077] $$BP = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases} \qquad (10)$$

In expression (10), c represents the length of the back-translated text and r represents the length of the original rhetorical text. $P_n$ represents the n-gram precision, that is, the proportion of n-grams in the back-translated text that match the original rhetorical text.

[0078] $\omega_n$ represents the weight of the n-gram precision and can be taken as $\omega_n = 1/N$, where N is the largest n-gram order (e.g., N=4).
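The BLEU calculation with brevity penalty can be illustrated with the following unsmoothed, single-reference sketch over pre-tokenised text. The function names are hypothetical, clipped n-gram counts are used as in standard BLEU, and uniform weights 1/N are assumed; a production system would more likely rely on an established BLEU implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU of `candidate` (back-translation) against `reference`."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped matches: each candidate n-gram counts at most as often as in the reference.
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if matched == 0 or total == 0:
            return 0.0  # unsmoothed: a zero precision at any order gives BLEU 0
        log_precision += (1.0 / max_n) * math.log(matched / total)
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_precision)
```

Here `reference` plays the role of the original rhetorical text x and `candidate` the role of the back-translated text BT(y), both as lists of tokens.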

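[0079] $D_{JS}(E_x(x) \| E_{BT}(y))$: the Jensen-Shannon divergence can be used to measure the distributional difference between the back-translated text and the original rhetorical text, as shown in expression (11).

[0080] $$D_{JS}(P \,\|\, Q) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q \,\|\, M) \qquad (11)$$

Here $P = E_x(x)$ and $Q = E_{BT}(y)$; M is the mixture distribution $M = \tfrac{1}{2}(P + Q)$; and the KL divergence is defined as $D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$.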
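The Jensen-Shannon divergence between two discrete distributions can be computed as in the sketch below. How the representations of the original text and the back-translated text are turned into probability distributions is not specified in the text, so the inputs here are simply assumed to be two probability vectors of equal length.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats; terms with P(i) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q to their mixture M."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The result is symmetric, equals 0 for identical distributions, and is bounded above by ln 2 when natural logarithms are used.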

[0081] In this feasible embodiment, as an example, the model training method further includes: dividing the perplexity of the candidate translation by the temperature coefficient and subtracting the preset offset; using the difference as the exponent of the natural exponential function; and taking the reciprocal of 1 plus that exponential value as the language fluency reward of the candidate translation. The perplexity reflects the language fluency of the candidate translation in terms of the number of candidate words the first model weighs when predicting each word in the candidate translation. The temperature coefficient is a parameter that is adaptively adjusted based on the length of the candidate translation.

[0082] In this embodiment, a language fluency reward is used to ensure that the language of the translation remains fluent.

[0083] Specifically, the language fluency reward can be determined by expression (12).

[0084] (12) $R_{flu}(y) = \frac{1}{1 + \exp\left(\frac{PPL(y)}{T} - \beta\right)}$. In expression (12), PPL(y) is the perplexity (PPL) of the candidate translation y.

[0085] The temperature coefficient T is used to control PPL sensitivity. For example, T=30 for short texts (<15 words) and T=50 for long texts (≥15 words). By dynamically adjusting the value of the temperature coefficient T, the PPL inflation problem of long texts can be mitigated.

[0086] The offset β is the baseline threshold, for example fixed at 0.8, so that with T=30 the gate value reaches 0.5 when PPL≈24 (since 24/30 = 0.8).
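The language fluency reward described in paragraphs [0081] to [0086] can be sketched as follows. This is a minimal illustration that uses the example values given in the text (T=30 below 15 words, T=50 otherwise, β=0.8). The function name `fluency_reward` and the convention of passing in a precomputed perplexity and word count are assumptions.

```python
import math


def fluency_reward(ppl, num_words, beta=0.8):
    """Sigmoid-style gate on perplexity: 1 / (1 + exp(PPL / T - beta))."""
    # Length-adaptive temperature, using the example values from the text.
    temperature = 30.0 if num_words < 15 else 50.0
    return 1.0 / (1.0 + math.exp(ppl / temperature - beta))
```

For a short text, `fluency_reward(24, 10)` evaluates to approximately 0.5, matching the stated gate behaviour. For a long text the same midpoint is reached at PPL = 40, which is how the larger temperature offsets the perplexity inflation of long texts.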

[0087] In this feasible embodiment, as an example, the model training method further includes: multiplying the probability that the candidate translation and the original rhetorical text are consistent in emotional direction by the intensity difference attenuation factor, and then by the conflict penalty term, to obtain the emotional consistency reward. The intensity difference attenuation factor is the attenuation base raised to the power of the absolute value of the difference in emotional intensity between the original rhetorical text and the candidate translation. The conflict penalty term is calculated by subtracting the product of the polarity conflict indicator function and the conflict grading penalty from 1.

[0088] In this embodiment, the emotional consistency reward ensures that the emotions of the original text and the translated text remain consistent, thereby improving the translation accuracy of the first model.

[0089] Specifically, emotions can be divided into three types: positive, neutral, and negative, with corresponding emotion intensities of +1, 0, and -1, respectively. The reward for emotional consistency can be determined by expression (13).

[0090] (13) $R_{emo} = \Pr(y \in E_x) \cdot \alpha^{|S_x - S_y|} \cdot (1 - \lambda \cdot I_{conflict})$. In expression (13), Pr(y ∈ E_x) represents the probability that the candidate translation y and the original rhetorical text x are consistent in terms of emotional direction, where the emotional direction can be classified into three categories: positive, neutral, and negative.

[0091] S_x indicates the emotional intensity of the original rhetorical text, and S_y indicates the emotional intensity of the candidate translation, where the scalar value of emotional intensity can be: +1 for positive, 0 for neutral, and -1 for negative.

[0092] α represents the intensity attenuation base, α∈(0,1), and the default value of α is 0.6.

[0093] λ represents the conflict level penalty, with a default value of 0.3.

[0094] I_conflict is the polarity conflict indicator function, shown in expression (14): (14) In the emotional consistency reward calculation, if the emotion of the original rhetorical text is not neutral (that is, it is positive or negative) and the emotion of the candidate translation is completely opposite to it, the polarity conflict indicator is 1, which further reduces the reward value and thereby penalizes emotional conflict between the candidate translation and the original rhetorical text. Similarly, if the emotion of the original rhetorical text is not neutral and the emotion of the candidate translation is neutral, or if the emotion of the original rhetorical text is neutral and the emotion of the candidate translation is not, a penalty is applied that is less severe than in the case of opposite emotions. If the emotions are consistent, no penalty is applied.
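The emotional consistency reward described in paragraphs [0087] to [0094] can be sketched as follows, using the default values α=0.6 and λ=0.3 from the text. Because expression (14) is not legible here, the sketch assumes the indicator is binary (1 only when the two polarities are opposite), so that the milder penalty for a neutral versus non-neutral mismatch comes from the attenuation factor alone. The function name and argument layout are hypothetical.

```python
def emotion_reward(p_same_direction, s_x, s_y, alpha=0.6, lam=0.3):
    """Emotional consistency reward for intensities s_x, s_y in {+1, 0, -1}.

    p_same_direction: probability that source and candidate share an emotional direction.
    """
    # Assumed binary indicator: conflict only when polarities are strictly opposite.
    conflict = 1 if s_x * s_y < 0 else 0
    # Attenuation base raised to the absolute intensity difference (0, 1 or 2).
    attenuation = alpha ** abs(s_x - s_y)
    return p_same_direction * attenuation * (1 - lam * conflict)
```

Under these assumptions, with a direction probability of 1.0 the reward is 1.0 for matching emotions, 0.6 when exactly one side is neutral, and 0.36 × 0.7 = 0.252 for opposite emotions, which reproduces the graded ordering described above.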

[0095] In some feasible embodiments, step S103 above includes: obtaining the relative advantage value of a candidate translation by subtracting the mean of the reward values of the multiple candidate translations from the reward value of the candidate translation, and then dividing by the standard deviation of the reward values of the multiple candidate translations.

[0096] This embodiment calculates the relative advantage value without an additional value model. Instead, it estimates the value of the candidate translation relative to other outputs through simple calculation based on the reward value, which has the effect of being computationally simple.

[0097] Specifically, the relative advantage value can be determined by expression (15).

[0098] (15) $A_i = \frac{r_i - \mathrm{mean}(\{r_1, \dots, r_G\})}{\mathrm{std}(\{r_1, \dots, r_G\})}$. In expression (15), i denotes the i-th candidate translation, r_i represents the reward value of the i-th candidate translation, and G represents the total number of candidate translations. mean({r_1, ..., r_G}) represents the average reward value of the G candidate translations within the group, and std({r_1, ..., r_G}) represents the standard deviation of the reward values of the G candidate translations within the group.
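The group-relative advantage of paragraph [0095] can be sketched in a few lines of Python. The population standard deviation and the small constant `eps`, which avoids division by zero when all rewards in a group are equal, are assumptions of this sketch rather than details stated in the text.

```python
import statistics


def relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    # eps is an illustrative guard for groups whose rewards are all identical.
    return [(r - mean) / (std + eps) for r in rewards]
```

Candidates whose reward exceeds the group mean receive a positive advantage and the others a negative one, so no separate value model is needed to rank the outputs within a group.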

[0099] The above section explained the four reward methods. By constructing four reward functions for the first model, the first model gains advantages in four directions for translating sentences with rhetoric: rhetorical consistency, semantic fidelity, linguistic fluency, and emotional consistency. The following section explains the GRPO training of the first model.

[0100] Figure 2 is a flowchart illustrating a GRPO training method according to an exemplary embodiment of this disclosure. As shown in Figure 2, the GRPO training method includes steps S201 to S223.

[0101] Step S201: Initialization.

[0102] Step S202: Initialize the policy model π_θ.

[0103] In this step, a language model is initialized as the policy model π_θ, for example, Qwen-7B-Instruct is used as the policy model.

[0104] Step S203: Set the hyperparameters ε and β.

[0105] Step S204: Initialize the reference policy π_ref.

[0106] Step S205, Sampling stage.

[0107] Step S206: Sample the text q to be translated from the distribution P(Q).

[0108] In this step, q sentences (equivalent to the original text with rhetorical devices) are sampled from the distribution of texts to be translated containing rhetorical devices.

[0109] Step S207: Sample G outputs for each text q to be translated from the current policy π_θ_old.

[0110] In this step, for each text q, G distinct outputs (equivalent to the candidate translations mentioned above) are sampled from the current policy π_θ_old, denoted as O_1, O_2, ..., O_G. These outputs represent multiple translations of the same text q by the policy model.

[0111] Step S208: Calculate the reward for each output.

[0112] In this step, the rewards for the G outputs are denoted as r_1, r_2, ..., r_G. The reward mainly consists of four parts, as explained in steps S209 to S212.

[0113] Step S209: Calculate the rhetorical consistency reward R_rh.

[0114] Step S210: Calculate the semantic fidelity reward R_sem.

[0115] Step S211: Calculate the language fluency reward R_flu.

[0116] Step S212: Calculate the emotional consistency reward R_emo.

[0117] Step S213: Calculate R by weighted summation.

[0118] Step S214: Calculate the relative advantage value of each output.

[0119] Step S215: Calculate the final loss function.

[0120] In this step, the policy gradient loss is calculated, which consists of two parts: the PPO clipping objective, which limits the difference between the old and new policies; and KL divergence regularization, which prevents the policy from deviating too far from the reference policy.

[0121] The loss function is determined by expression (16).

[0122] (16) in,
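Expression (16) is not legible in this text. As a hedged illustration of the two parts named in paragraph [0120], the following sketch computes a commonly used GRPO-style loss at the sequence level: a clipped probability-ratio term weighted by the relative advantage, minus a KL penalty towards the reference policy. The sequence-level simplification, the particular KL estimator, the default values `eps=0.2` and `beta=0.04`, and the function name are all assumptions and may differ from the expression in the disclosure.

```python
import math


def grpo_loss(logp_new, logp_old, logp_ref, advantages, eps=0.2, beta=0.04):
    """Illustrative GRPO-style loss over one group of candidate translations.

    logp_new / logp_old / logp_ref: log-probability of each candidate under the
    current policy, the old (sampling) policy, and the reference policy.
    """
    terms = []
    for ln, lo, lr, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(ln - lo)                        # policy probability ratio
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # PPO-style clipping
        surrogate = min(ratio * adv, clipped * adv)
        # Non-negative estimator of KL(new || ref); zero when the two policies agree.
        log_diff = lr - ln
        kl = math.exp(log_diff) - log_diff - 1.0
        terms.append(surrogate - beta * kl)
    return -sum(terms) / len(terms)  # negated because the optimizer minimizes
```

When the three policies coincide, the ratio is 1 and the KL term is 0, so the loss reduces to the negative mean advantage. A candidate with a positive advantage then pushes its probability up, and the clipping caps how far a single update can move it.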

[0123] Step S216: Parameter update.

[0124] Step S217: Calculate the gradient.

[0125] Step S218: Apply the gradient to update the policy parameters θ.

[0126] Step S219: Determine convergence.

[0127] Step S220: Determine whether the maximum number of training steps has been reached.

[0128] If yes, proceed to step S221; otherwise, proceed to step S222.

[0129] Step S221: End training.

[0130] Step S222: Determine whether the performance has reached the target level.

[0131] If yes, proceed to step S221; otherwise, proceed to step S223.

[0132] Step S223: Update the old policy π_θ_old = π_θ.

[0133] Then proceed to step S205 for the next round of training.

[0134] Repeat steps S205 to S223 until the convergence condition is met.
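The control flow of steps S205 to S223 can be summarized by the following sketch. It only illustrates the order of the termination checks. The callables `train_round`, `evaluate`, and `sync_old_policy` are hypothetical placeholders for the sampling, reward, and update steps, the performance evaluation, and the old-policy update, respectively.

```python
def run_grpo(train_round, evaluate, sync_old_policy, max_steps, target_score):
    """Outer training loop: returns (number of rounds run, reason for stopping)."""
    step = 0
    while True:
        train_round()                      # S205 to S218: sample, reward, update
        step += 1
        if step >= max_steps:              # S220: maximum number of steps reached
            return step, "max_steps"
        if evaluate() >= target_score:     # S222: performance reached the target
            return step, "target_reached"
        sync_old_policy()                  # S223: pi_theta_old = pi_theta
```

Training therefore stops on whichever condition is met first, and the old policy is refreshed only when another round is actually going to run.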

[0135] The foregoing described the model training method provided in the embodiments of this disclosure, and the first model obtained through training can be used for text translation.

[0136] Figure 3 is a flowchart illustrating a text translation method according to an exemplary embodiment of this disclosure. As shown in Figure 3, the text translation method includes steps S301 and S306.

[0137] Step S301: Obtain the input statement.

[0138] Step S306: Generate the translation result of the input statement through the first model.

[0139] In this embodiment, the first model is trained using the method described above. Therefore, the translation results generated by the first model can retain the rhetorical devices used before translation, making the translation more accurate. Furthermore, multiple rewards are used to ensure the translated text possesses four advantages: rhetorical consistency, semantic fidelity, linguistic fluency, and emotional consistency. Thus, the first model trained in this way achieves higher accuracy in translating input sentences due to these combined advantages.

[0140] Continuing to refer to Figure 3, in some embodiments the text translation method further includes steps S302 to S305.

[0141] Step S302: Identify the rhetorical devices in the input statement.

[0142] Step S303: Search the knowledge base for the function of rhetorical devices.

[0143] Step S304: Combining the rhetorical devices, their functions, and the context of the input statement, perform semantic analysis on the input statement to obtain the semantically analyzed text.

[0144] Step S305: Input the semantically analyzed text and rhetorical devices into the first model.

[0145] In this embodiment, the original text input to the model includes the function of rhetorical devices and the context of the rhetorical text, enabling the first model to translate original texts in different contexts.

[0146] It should be noted that in the model training method provided in this embodiment, the input of the first model can also be determined by the method of recognizing rhetoric in the model application scenario described in steps S302 to S305 above. After step S305, as shown in Figure 4, steps S401 to S404 are executed.

[0147] Step S401: Construct a reward function for training the first model.

[0148] For details, please refer to the explanation of expressions (4) to (14) above.

[0149] Step S402: Train the first model using GRPO.

[0150] For details, please refer to the explanations of steps S201 to S223 above.

[0151] Step S403: Apply the first model.

[0152] Step S404: Output the translation results.

[0153] In some embodiments, step S302 includes: identifying the rhetorical devices in the input statement through at least one method, namely rule base matching or model recognition. This embodiment provides multiple methods for identifying rhetorical devices, thereby increasing the diversity of identification methods.

[0154] In the above embodiments, as an example, when combining rule base matching and model recognition to identify the rhetorical devices in the input statement, the text translation method further includes: For the first rhetorical device matched by the rule base, determine the rule matching strength between the input statement and the rule corresponding to the first rhetorical device in the rule base, where the rule matching strength is the number of rules matched by the first rhetorical device divided by the total number of rules corresponding to the first rhetorical device. For the first rhetorical device identified by the model, determine the model prediction probability of the first rhetorical device; The discriminant value is obtained by weighting and fusing the rule matching strength with the model prediction probability. If the discriminant value is greater than the threshold, the rhetorical devices used in the input statement are determined to include the first rhetorical device.

[0155] This embodiment, by combining multiple methods, can improve the accuracy of the identification results for rhetorical devices.

[0156] Specifically, the rule base can be pre-built and covers rules across multiple dimensions, including grammar, vocabulary, and context. Taking metaphor as an example, the rule base contains the types of metaphors (simile, metaphor, allegory), regular expressions for each type of metaphor, examples, and conditions. By matching sentences with rules in the rule base, the rhetorical devices used in the sentences can be determined.

[0157] For the ten rhetorical devices mentioned above, a rhetorical device recognition model is constructed, as shown in expression (1).

[0158] (1) Wherein, RhetoricCheck(X) represents the rhetoric recognition model; Ψ(X) represents the rhetoric type discrimination model, such as a model based on BERT pre-training and rule combination; τ represents the hyperparameter, that is, the rhetoric type predicted by the model is used only when the model prediction reaches a certain confidence level, otherwise it is considered that the input sentence does not contain rhetoric. The default value of the hyperparameter can be set to 0.9.

[0159] The model for identifying rhetorical devices is shown in expression (2).

[0160] (2) Expression (2) adopts a rule-and-model fusion method, where P_k(X) represents the rule matching strength for the k-th rhetorical device, which is a normalized value. The rule matching strength is measured by dividing the number of rules matched for each rhetorical device by the total number of rules for that device, where n_k represents the number of rules matched by the k-th rhetorical device and the denominator is the number of rules defined for the k-th rhetorical device.

[0161] Bert_k(X) represents the probability predicted by Bert for the k-th rhetorical device. Bert is an 11-class discriminant model based on fine-tuning of a pre-trained model.

[0162] α_k represents the fusion coefficient for the k-th category, which can be dynamically adjusted according to how accurately the rules detect each category. The default value of α_k can be 0.5.
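The rule-and-model fusion described in paragraphs [0154] and [0160] to [0162] can be sketched as follows. Expression (2) is not legible here, so the sketch assumes the weighted fusion is a convex combination α_k·P_k(X) + (1 - α_k)·Bert_k(X). The decision threshold is left as a required parameter because its value for the fused score is not stated. Both function names are hypothetical.

```python
def fused_score(matched_rules, total_rules, model_prob, alpha=0.5):
    """Fuse rule matching strength with the model's predicted probability."""
    # Rule matching strength: matched rules divided by rules defined for this device.
    strength = matched_rules / total_rules if total_rules else 0.0
    # Assumed form of the weighted fusion: a convex combination with coefficient alpha.
    return alpha * strength + (1.0 - alpha) * model_prob


def has_device(matched_rules, total_rules, model_prob, threshold, alpha=0.5):
    """Decide whether the input statement uses the rhetorical device."""
    return fused_score(matched_rules, total_rules, model_prob, alpha) > threshold
```

For example, with 2 of 4 rules matched and a model probability of 0.9, the fused score under the default α_k = 0.5 is 0.7, so the device is reported for any threshold below that value.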

[0163] Based on the same inventive concept, this disclosure also provides a model training device. As shown in Figure 5, the model training device 500 includes: a first translation module 501, used to generate multiple candidate translations of the rhetorical original text through the first model; a reward determination module 502, used to determine the reward value of the candidate translation based at least on the degree of consistency of the rhetorical devices between the candidate translation and the original text, the reward value being used to quantify the translation quality of the candidate translation; an advantage value determination module 503, used to determine the relative advantage value of a candidate translation among multiple candidate translations based on the reward value; a loss function construction module 504, used to construct a loss function based on the relative advantage value, which is used to amplify the gradient of high translation quality samples and suppress the gradient of low quality samples; and a parameter update module 505, used to optimize the parameters of the first model based on the loss function.

[0164] In the above technical solution, a reward mechanism is designed based on the consistency of rhetorical devices between the candidate translation and the original text, and the reward value is converted into a relative advantage value within the group. A loss function for gradient update is constructed using the relative advantage value, thereby optimizing the parameters of the first model. This ensures that the trained first model can retain the rhetorical devices of the original text in the translation, avoiding the problem of low translation quality caused by literal translation. Thus, the problem of low rhetorical consistency before and after translation in traditional models can be solved, improving translation accuracy.

[0165] Furthermore, the reward determination module 502 is also used to determine the matching value of the rhetorical type between the original rhetorical text and the candidate translation, as well as the difference value of the number of rhetorical devices; The product of the matching value of the rhetorical type and the difference value of the number of rhetorical devices is calculated to obtain the rhetorical consistency reward of the candidate translation. The rhetorical consistency reward is used to represent the degree of consistency of rhetorical devices between the candidate translation and the original text.

[0166] Furthermore, the reward determination module 502 is also used to divide twice the number of rhetorical types shared by the original text and the candidate translation by the total number of rhetorical types to obtain the rhetorical type matching value between the original text and the candidate translation, wherein the total number of rhetorical types is the sum of the number of rhetorical types in the original text and the number of rhetorical types in the candidate translation. The absolute value of the difference in the number of rhetorical devices between the original text and the candidate translation is divided by the sum of the number of rhetorical devices in the original text and the smoothing term. The negative value of the quotient is used as the exponential input to calculate the value of the natural exponential function, thus obtaining the difference in the number of rhetorical devices between the original text and the candidate translation.
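The rhetorical consistency reward described in paragraphs [0165] and [0166] can be sketched as follows. Each text is represented by a list of the rhetorical devices detected in it, with one entry per occurrence. The smoothing value of 1.0, the handling of the case where neither text contains a device, and the function name are assumptions of this sketch.

```python
import math


def rhetoric_reward(src_devices, tgt_devices, smoothing=1.0):
    """Product of the rhetorical type matching value and the count difference value."""
    src_types, tgt_types = set(src_devices), set(tgt_devices)
    total_types = len(src_types) + len(tgt_types)
    if total_types == 0:
        # Assumption: two texts without any rhetorical device count as consistent.
        type_match = 1.0
    else:
        # Twice the shared types divided by the sum of both type counts.
        type_match = 2 * len(src_types & tgt_types) / total_types
    # exp(-|n_x - n_y| / (n_x + smoothing)): equals 1 when the counts agree.
    count_gap = abs(len(src_devices) - len(tgt_devices))
    count_term = math.exp(-count_gap / (len(src_devices) + smoothing))
    return type_match * count_term
```

A candidate that keeps every device of the source scores 1, one that shares no device type scores 0, and one that keeps the type but drops an occurrence is discounted only by the exponential count term.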

[0167] Furthermore, the reward determination module 502 is used to determine the reward value of the candidate translation by combining the degree of consistency of rhetorical devices, semantic fidelity reward, language fluency consistency reward, and emotional consistency reward between the candidate translation and the original text.

[0168] Furthermore, the reward determination module 502 is also used to determine the semantic similarity between the original rhetorical text and the candidate translation; determine the BLEU value between the back-translated text of the candidate translation and the original rhetorical text; calculate the product of the BLEU value and the penalty factor for the difference in semantic distribution; and obtain the semantic fidelity reward of the candidate translation by weighted summation of the semantic similarity and the product.

[0169] Furthermore, the reward determination module 502 is also used to divide the perplexity of the candidate translation by the temperature coefficient and then subtract a preset offset. The difference is used as the exponential input to calculate the natural exponential function value. The reciprocal of the calculation result obtained by adding the natural exponential function value to 1 is used as the language fluency reward value of the candidate translation. Here, the perplexity is used to determine the language fluency of the candidate translation based on the number of candidate words when predicting each word in the candidate translation by the first model. The temperature coefficient represents a parameter that is adaptively adjusted based on the length of the candidate translation.

[0170] Furthermore, the reward determination module 502 is also used to multiply the probability that the candidate translation and the original rhetorical text are consistent in emotional direction by the intensity difference attenuation factor, and then by the conflict penalty term, to obtain the emotional consistency reward. The intensity difference attenuation factor is the attenuation base raised to the power of the absolute value of the difference in emotional intensity between the original rhetorical text and the candidate translation. The conflict penalty term is calculated by subtracting the product of the polarity conflict indicator function and the conflict grading penalty from 1.

[0171] Furthermore, the advantage value determination module 503 is used to subtract the mean of the reward values of multiple candidate translations from the reward value of the candidate translation, and then divide the result by the standard deviation of the reward values of multiple candidate translations, to obtain the relative advantage value of the candidate translation among multiple candidate translations.

[0172] Furthermore, the first translation module 501 is also used to identify the rhetorical devices in the original text; Search the knowledge base for the function of rhetorical devices; By combining the rhetorical devices, their functions, and the context of the original text, a semantic analysis of the original text is conducted to obtain a semantically analyzed text. Input the semantically analyzed text and rhetorical devices into the first model.

[0173] Furthermore, if the relative advantage value is greater than 0, it indicates that the translation quality of the candidate translation is higher than the average translation quality of multiple candidate translations; if the relative advantage value is less than 0, it indicates that the translation quality of the candidate translation is lower than the average translation quality of multiple candidate translations. This follows from the relative advantage value being the reward value minus the group mean, divided by the group standard deviation. The loss function construction module 504 is used to multiply the relative advantage value by the policy probability ratio of the candidate translation in the original loss function to obtain the loss function, where the policy probability ratio is the ratio of the probability of the first model generating a candidate translation through the new policy to the probability of generating the candidate translation through the old policy.

[0174] Based on the same inventive concept, this disclosure also provides a text translation device. As shown in Figure 6, the text translation device 600 includes: an acquisition module 601, used to acquire input statements; and a second translation module 602, used to generate the translation result of the input statement through the first model, wherein the first model is trained by the model training method provided above.

[0175] In the above technical solution, a reward mechanism is designed based on the consistency of rhetorical devices between the candidate translation and the original text, and the reward value is converted into a relative advantage value within the group. A loss function for gradient update is constructed using the relative advantage value, thereby optimizing the parameters of the first model. This ensures that the trained first model can retain the rhetorical devices of the original text in the translation, avoiding the problem of low translation quality caused by literal translation. Thus, the problem of low rhetorical consistency before and after translation in traditional models can be solved, improving translation accuracy.

[0176] Furthermore, the acquisition module 601 is also used to identify the rhetorical devices in the input statement; Search the knowledge base for the function of rhetorical devices; By combining the rhetorical devices, their functions, and the context of the input statement, semantic analysis is performed on the input statement to obtain the semantically analyzed text. Input the semantically analyzed text and rhetorical devices into the first model.

[0177] Furthermore, the acquisition module 601 is also used to identify the rhetorical devices of the input statement through at least one of rule base matching or model recognition.

[0178] Furthermore, when the rhetorical devices of the input statement are identified by combining rule base matching and model recognition, the acquisition module 601 is also used to determine the rule matching strength between the input statement and the rule corresponding to the first rhetorical device in the rule base for the first rhetorical device matched by the rule base. The rule matching strength is the number of rules matched by the first rhetorical device divided by the total number of rules corresponding to the first rhetorical device. For the first rhetorical device identified by the model, determine the model prediction probability of the first rhetorical device; The discriminant value is obtained by weighting and fusing the rule matching strength with the model prediction probability. If the discriminant value is greater than the threshold, the rhetorical devices used in the input statement are determined to include the first rhetorical device.

[0179] Figure 7 is a block diagram illustrating an electronic device 700 according to an exemplary embodiment. As shown in Figure 7, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

[0180] The processor 701 controls the overall operation of the electronic device 700 to complete all or part of the steps in the aforementioned model training method or text translation method. The memory 702 stores various types of data to support the operation of the electronic device 700. This data may include, for example, instructions for any application or method operating on the electronic device 700, and application-related data such as text images, target images, text information, etc. The memory 702 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and audio components. The screen may be, for example, a touchscreen, and the audio component is used to output and / or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in memory 702 or transmitted via communication component 705. The audio component also includes at least one speaker for outputting audio signals. I / O interface 704 provides an interface between processor 701 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual or physical buttons. Communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless communication may include Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination thereof; therefore, the corresponding communication component 705 may include a Wi-Fi module, a Bluetooth module, or an NFC module.

[0181] In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the model training method or text translation method described above.

[0182] In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, which, when executed by a processor, implement the steps of the model training method or text translation method described above. For example, the computer-readable storage medium may be the memory 702 including the program instructions described above, which may be executed by the processor 701 of the electronic device 700 to complete the model training method or text translation method described above.

[0183] In another exemplary embodiment, a computer program product is also provided, which includes a computer program executable by a processor, which, when executed by the processor, implements the steps of the model training method or text translation method described above.

[0184] The preferred embodiments of this disclosure have been described in detail above with reference to the accompanying drawings. However, this disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of this disclosure, various simple modifications can be made to the technical solutions of this disclosure, and these simple modifications all fall within the protection scope of this disclosure.

[0185] It should also be noted that the various specific technical features described in the above embodiments can be combined in any suitable manner without contradiction. To avoid unnecessary repetition, this disclosure will not describe the various possible combinations separately.

[0186] Furthermore, the various embodiments of this disclosure can be combined in any way; as long as such combinations do not depart from the spirit of this disclosure, they should also be regarded as content disclosed in this disclosure.

Claims

1. A model training method, characterized in that, The method includes: Multiple candidate translations of the original rhetorical text are generated using the first model; The reward value of the candidate translation is determined based at least on the degree of consistency of the rhetorical devices between the candidate translation and the original text. The reward value is used to quantify the translation quality of the candidate translation. Based on the reward value, determine the relative advantage value of the candidate translation among multiple candidate translations; A loss function is constructed based on the relative advantage value, which is used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality samples. The parameters of the first model are optimized based on the loss function.

2. The model training method according to claim 1, characterized in that, The method further includes: Determine the matching value of the rhetorical type and the difference value of the number of rhetorical devices between the original text and the candidate translation; The product of the matching value of the rhetorical type and the difference value of the number of rhetorical devices is calculated to obtain the rhetorical consistency reward of the candidate translation, wherein the rhetorical consistency reward is used to represent the degree of consistency of rhetorical devices between the candidate translation and the original text.

3. The model training method according to claim 2, characterized in that, The determination of the matching value of rhetorical type and the difference value of rhetorical quantity between the original text and the candidate translation includes: The matching value of the rhetorical types between the original text and the candidate translation is obtained by dividing twice the number of rhetorical types shared by the original text and the candidate translation by the total number of rhetorical types. The total number of rhetorical types is the sum of the number of rhetorical types in the original text and the number of rhetorical types in the candidate translation. The absolute value of the difference in the number of rhetorical devices between the original text and the candidate translation is divided by the sum of the number of rhetorical devices in the original text and the smoothing term. The negative value of the quotient is used as the exponential input to calculate the value of the natural exponential function, thus obtaining the difference value of the number of rhetorical devices between the original text and the candidate translation.

4. The model training method according to claim 1, characterized in that determining the reward value of the candidate translation based at least on the degree of consistency of rhetorical devices between the candidate translation and the original text comprises: determining the reward value of the candidate translation by combining the degree of consistency of rhetorical devices between the candidate translation and the original text, a semantic fidelity reward, a language fluency consistency reward, and an emotional consistency reward.

5. The model training method according to claim 4, characterized in that the method further comprises: determining the semantic similarity between the original rhetorical text and the candidate translation; determining the BLEU value between the back-translated text of the candidate translation and the original rhetorical text, and calculating the product of the BLEU value and a penalty factor for the semantic distribution difference; and obtaining the semantic fidelity reward of the candidate translation as a weighted sum of the semantic similarity and the product.
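
As a rough sketch of the weighted sum in claim 5, the following assumes that the semantic similarity, the back-translation BLEU value, and the distribution penalty factor have already been computed elsewhere and are passed in as numbers. The function name and the default weights `w_sim` and `w_bleu` are hypothetical; the claim does not specify their values.

```python
def semantic_fidelity_reward(similarity, back_translation_bleu,
                             distribution_penalty, w_sim=0.5, w_bleu=0.5):
    """Weighted sum of the similarity and the penalized back-translation BLEU."""
    # Product of the BLEU value and the semantic distribution penalty factor.
    penalized_bleu = back_translation_bleu * distribution_penalty
    # Weights are illustrative assumptions, not values from the claims.
    return w_sim * similarity + w_bleu * penalized_bleu
```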

6. The model training method according to claim 4, characterized in that the method further comprises: dividing the perplexity of the candidate translation by a temperature coefficient and subtracting a preset offset; using the difference as the exponent of the natural exponential function; and taking the reciprocal of the sum of the natural exponential function value and 1 as the language fluency reward value of the candidate translation, wherein the perplexity measures the language fluency of the candidate translation based on the number of candidate words predicted by the first model for each word in the candidate translation, and the temperature coefficient is a parameter that is adaptively adjusted based on the length of the candidate translation.
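
The fluency reward of claim 6 has the form 1 / (1 + exp(PPL / T - b)), which the sketch below evaluates directly. The helper `perplexity` uses the standard definition from per-token log-probabilities. The temperature is passed in as a plain argument because the claim does not state how it adapts to the translation length. All names and the overflow guard are illustrative assumptions.

```python
import math

def perplexity(token_logprobs):
    """Standard perplexity: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def fluency_reward(ppl, temperature, offset):
    """Reciprocal of 1 plus exp(ppl / temperature - offset)."""
    z = ppl / temperature - offset
    if z > 700.0:
        return 0.0  # assumption: guard against overflow in math.exp
    return 1.0 / (1.0 + math.exp(z))
```

Lower perplexity yields a reward closer to 1, and the offset sets the perplexity-to-temperature ratio at which the reward crosses 0.5.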

7. The model training method according to claim 4, characterized in that the method further comprises: multiplying the probability that the candidate translation and the original rhetorical text are consistent in emotional direction by an intensity difference attenuation factor, and then by a conflict penalty term, to obtain the emotional consistency reward, wherein the intensity difference attenuation factor is the value of a power whose base is an attenuation base and whose exponent is the absolute value of the difference in emotional intensity between the original rhetorical text and the candidate translation, and the conflict penalty term is obtained by subtracting from 1 the product of a polarity conflict indicator function and a conflict grading penalty.
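
The three factors of claim 7 can be written out as a short sketch. The parameter names, the default attenuation base of 0.5, and the representation of the polarity conflict indicator as a boolean are assumptions for illustration; the direction probability and intensities are taken as already estimated inputs.

```python
def emotional_consistency_reward(p_same_direction, intensity_src, intensity_cand,
                                 decay_base=0.5, polarity_conflict=False,
                                 conflict_penalty=0.0):
    """Direction probability times attenuation factor times conflict penalty term."""
    # Attenuation factor: decay_base raised to the absolute intensity difference.
    attenuation = decay_base ** abs(intensity_src - intensity_cand)
    # Conflict penalty term: 1 minus indicator times the graded penalty.
    indicator = 1.0 if polarity_conflict else 0.0
    conflict_term = 1.0 - indicator * conflict_penalty
    return p_same_direction * attenuation * conflict_term
```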

8. The model training method according to claim 1, characterized in that determining the relative advantage value of the candidate translation among the multiple candidate translations based on the reward value comprises: subtracting the mean of the reward values of the multiple candidate translations from the reward value of the candidate translation, and dividing the result by the standard deviation of the reward values of the multiple candidate translations, to obtain the relative advantage value of the candidate translation among the multiple candidate translations.
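
The normalization in claim 8 is a per-group standardization of the rewards, sketched below. The use of the population standard deviation and the small `eps` added to the denominator (to avoid division by zero when all rewards are equal) are assumptions; the claim specifies neither.

```python
from statistics import mean, pstdev

def relative_advantages(rewards, eps=1e-8):
    """(reward - group mean) / group standard deviation for each candidate."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps is an illustrative safeguard for groups with identical rewards.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, rewards of 0.2, 0.5, and 0.8 for three candidates of the same source text give a negative, a zero, and a positive advantage respectively.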

9. The model training method according to claim 1, characterized in that, before generating the multiple candidate translations of the original rhetorical text using the first model, the method further comprises: identifying the rhetorical devices used in the original text; searching a knowledge base for the functions of the rhetorical devices; performing semantic analysis on the original text by combining the rhetorical devices, their functions, and the context of the original text, to obtain semantically analyzed text; and inputting the semantically analyzed text and the rhetorical devices into the first model.

10. The model training method according to claim 1, characterized in that a relative advantage value greater than 1 indicates that the translation quality of the candidate translation is higher than the average translation quality of the multiple candidate translations, and a relative advantage value less than 1 indicates that the translation quality of the candidate translation is lower than the average translation quality of the multiple candidate translations; and constructing the loss function based on the relative advantage value comprises: multiplying the relative advantage value by the policy probability ratio of the candidate translation in the original loss function to obtain the loss function, wherein the policy probability ratio is the ratio of the probability that the first model generates the candidate translation under the new policy to the probability that it generates the candidate translation under the old policy.
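
A minimal numeric sketch of the loss in claim 10 follows. It computes the probability ratio from sequence log-probabilities and averages the ratio-weighted advantages over the group. The negation (so that minimizing the loss raises the probability of high-advantage candidates), the averaging, and all names are assumptions; the claim only recites the multiplication. An actual training run would compute this in an automatic differentiation framework so that gradients flow through the new log-probabilities.

```python
import math

def probability_ratio(logp_new, logp_old):
    """Probability under the new policy divided by probability under the old one."""
    return math.exp(logp_new - logp_old)

def surrogate_loss(logps_new, logps_old, advantages):
    """Negative mean of ratio * advantage over the candidate group (assumed form)."""
    terms = [
        probability_ratio(new, old) * adv
        for new, old, adv in zip(logps_new, logps_old, advantages)
    ]
    return -sum(terms) / len(terms)
```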

11. A text translation method, characterized in that the method comprises: obtaining an input statement; and generating a translation result of the input statement using a first model, wherein the first model is trained by the model training method of any one of claims 1 to 10.

12. The text translation method according to claim 11, characterized in that the method further comprises: identifying the rhetorical devices used in the input statement; searching a knowledge base for the functions of the rhetorical devices; performing semantic analysis on the input statement by combining the rhetorical devices, their functions, and the context of the input statement, to obtain semantically analyzed text; and inputting the semantically analyzed text and the rhetorical devices into the first model.

13. The text translation method according to claim 12, characterized in that identifying the rhetorical devices used in the input statement comprises: identifying the rhetorical devices in the input statement by at least one of rule base matching or model recognition.

14. The text translation method according to claim 13, characterized in that, when the rhetorical devices in the input statement are identified by combining rule base matching and model recognition, the method further comprises: for a first rhetorical device matched by the rule base, determining the rule matching strength between the input statement and the rules corresponding to the first rhetorical device in the rule base, wherein the rule matching strength is the number of rules matched for the first rhetorical device divided by the total number of rules corresponding to the first rhetorical device; for the first rhetorical device identified by the model, determining the model prediction probability of the first rhetorical device; obtaining a discriminant value by weighted fusion of the rule matching strength and the model prediction probability; and, if the discriminant value is greater than a threshold, determining that the rhetorical devices used in the input statement include the first rhetorical device.
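
The weighted fusion in claim 14 can be sketched as follows. The function names, the convex weighting (rule weight `w_rule` and model weight `1 - w_rule`), and the default threshold of 0.5 are illustrative assumptions; the claim states only that the two scores are fused with weights and compared with a threshold.

```python
def rule_match_strength(matched_rules, total_rules):
    """Matched rules for a device divided by all rules defined for that device."""
    return matched_rules / total_rules if total_rules else 0.0

def includes_device(matched_rules, total_rules, model_prob,
                    w_rule=0.5, threshold=0.5):
    """Return (decision, discriminant value) for one candidate rhetorical device."""
    strength = rule_match_strength(matched_rules, total_rules)
    # Assumed convex combination of rule evidence and model probability.
    score = w_rule * strength + (1.0 - w_rule) * model_prob
    return score > threshold, score
```

With these defaults, a device matching 3 of its 4 rules with a model probability of 0.9 is accepted, while one matching 1 of 4 rules with a probability of 0.2 is rejected.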

15. A model training device, characterized in that the device comprises: a first translation module, configured to generate multiple candidate translations of an original rhetorical text using a first model; a reward determination module, configured to determine a reward value of each candidate translation based at least on the degree of consistency of rhetorical devices between the candidate translation and the original text, wherein the reward value quantifies the translation quality of the candidate translation; an advantage value determination module, configured to determine, based on the reward value, a relative advantage value of the candidate translation among the multiple candidate translations; a loss function construction module, configured to construct a loss function based on the relative advantage value, wherein the relative advantage value is used to amplify the gradient of high-quality translation samples and suppress the gradient of low-quality translation samples; and a parameter update module, configured to optimize the parameters of the first model based on the loss function.

16. A text translation device, characterized in that the device comprises: an acquisition module, configured to obtain an input statement; and a second translation module, configured to generate a translation result of the input statement using a first model, wherein the first model is trained by the model training method of any one of claims 1 to 10.

17. An electronic device, characterized in that the electronic device comprises: a memory on which a computer program is stored; and a processor configured to execute the computer program in the memory to implement the steps of the method as claimed in any one of claims 1 to 10, or to implement the steps of the method as claimed in any one of claims 11 to 14.

18. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the method as claimed in any one of claims 1 to 10, or implements the steps of the method as claimed in any one of claims 11 to 14.

19. A computer program product, comprising a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the method as claimed in any one of claims 1 to 10, or implements the steps of the method as claimed in any one of claims 11 to 14.