Domain-specific language generation model training method and device, and computer device

By decomposing domain-specific language description information into multiple semantic modules and adjusting model parameters based on module quality scores, the problem of poor training performance caused by character-level consistency judgment in existing technologies is solved, thereby improving the accuracy and stability of the model.

CN122242632A · Pending · Publication Date: 2026-06-19 · KINGDEE SOFTWARE(CHINA) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: KINGDEE SOFTWARE(CHINA) CO LTD
Filing Date: 2026-03-18
Publication Date: 2026-06-19


IPC: G06N3/092; G06N3/096; G06N3/045; G06N3/0475; G06F40/58; G06F40/30

AI Tagging

Application Domain

Natural language translation; Semantic analysis


AI Technical Summary

Technical Problem

In existing domain-specific language generation model training methods, the binary reward mechanism only makes judgments based on character-level consistency, which affects the training effect of the model.

Method used

By inputting natural language description information into the initial domain-specific language generation model, domain-specific language description information is generated and decomposed into multiple semantic modules. Based on the semantic module description information and domain-specific language annotation information, the module quality score is determined. The reward score is determined by combining the quality scores of each module, and the model parameters are adjusted to improve accuracy and stability.

Benefits of technology

This improves the training performance of domain-specific language generation models, avoids the problem of a single overall score masking local semantic errors, and enhances the accuracy and stability of the model.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to a method, apparatus, and computer device for training a domain-specific language generation model. The method includes: inputting natural language description information into an initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information; determining semantic module description information corresponding to each semantic module based on the domain-specific language description information; for each semantic module, determining a module quality score based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module; determining a reward score corresponding to the domain-specific language description information based on each module quality score; and adjusting the initial domain-specific language generation model based on the reward score to obtain a target domain-specific language generation model. This method can improve the training effect of the domain-specific language generation model.


Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, computer device, storage medium, and computer program product for training a domain-specific language generation model.

Background Technology

[0002] With the development of computer technology, ChatBI (Chat-based Business Intelligence) is increasingly being applied to enterprise data analysis scenarios. ChatBI allows users to ask business questions in natural language and uses a domain-specific language (DSL) generation model to convert the natural language descriptions into DSL descriptions in order to generate corresponding queries.

[0003] In related technologies, a binary reward mechanism is typically used when training domain-specific language generation models. This mechanism performs a string-level exact match between the generated domain-specific language description and the pre-annotated domain-specific language annotation. A reward score of 1 is given if the two strings are completely identical, and 0 otherwise. However, because this binary reward mechanism judges only character-level consistency, it negatively impacts the training performance of the domain-specific language generation model.

Summary of the Invention

[0004] In view of the above technical problem, it is therefore necessary to provide a domain-specific language generation model training method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the training effect of domain-specific language generation models.

[0005] In a first aspect, this application provides a method for training a domain-specific language generation model. The method includes:

[0006] The natural language description information is input into the initial domain-specific language generation model to obtain the domain-specific language description information corresponding to the natural language description information;

[0007] Based on the domain-specific language description information, determine the semantic module description information corresponding to each semantic module;

[0008] For each semantic module, a module quality score is determined based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module;

[0009] Based on the quality scores of each module, the reward score corresponding to the domain-specific language description information is determined;

[0010] The initial domain-specific language generation model is adjusted based on the reward score to obtain the target domain-specific language generation model.

[0011] In a second aspect, this application also provides a training device for a domain-specific language generation model. The device includes:

[0012] The input module is used to input natural language description information into the initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information;

[0013] The determining module is used to determine the semantic module description information corresponding to each semantic module based on the domain-specific language description information;

[0014] The first calculation module is used to determine the module quality score of each semantic module based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module.

[0015] The second calculation module is used to determine the reward score corresponding to the domain-specific language description information based on the quality scores of each module.

[0016] The training module is used to adjust the initial domain-specific language generation model based on the reward score to obtain the target domain-specific language generation model.

[0017] In a third aspect, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of any of the methods described in the first aspect.

[0018] In a fourth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the methods described in the first aspect.

[0019] In a fifth aspect, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the methods described in the first aspect.

[0020] In the aforementioned domain-specific language generation model training method, apparatus, computer device, storage medium, and computer program product, natural language description information is input into an initial domain-specific language generation model to obtain the corresponding domain-specific language description information; the semantic module description information corresponding to each semantic module is determined based on the domain-specific language description information; for each semantic module, a module quality score is determined based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to that semantic module; a reward score corresponding to the domain-specific language description information is determined based on the module quality scores; and the initial domain-specific language generation model is adjusted based on the reward score to obtain the target domain-specific language generation model. Because the generated domain-specific language description information is decomposed into semantic module description information for multiple semantic modules, and each semantic module is scored against the domain-specific language annotation information, the reward score derives from independent quality evaluation results across multiple semantic module dimensions and comprehensively reflects the correctness of the domain-specific language description information in each semantic module. When the initial domain-specific language generation model is optimized with this reward score, its parameters can be adjusted jointly during the update to address quality deviations in different semantic modules, which avoids the problem of a single overall score masking local semantic errors. This improves the accuracy and stability of the target domain-specific language generation model and enhances its training effect.

Brief Description of the Drawings

[0021] Figure 1 is a diagram illustrating the application environment of a domain-specific language generation model training method in one embodiment;

[0022] Figure 2 is a flowchart illustrating a domain-specific language generation model training method in one embodiment;

[0023] Figure 3 is a schematic diagram illustrating domain-specific language description information in one embodiment;

[0024] Figure 4 is a flowchart illustrating the steps for determining the module quality score in one embodiment;

[0025] Figure 5 is a schematic diagram of the internal structure of a DSL quality assessment module in one embodiment;

[0026] Figure 6 is a schematic diagram of the overall framework of a domain-specific language generation model training method in one embodiment;

[0027] Figure 7 is a structural block diagram of a domain-specific language generation model training device in one embodiment;

[0028] Figure 8 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

[0029] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0030] The domain-specific language generation model training method provided in this application can be applied, for example, in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process; it can be integrated on server 104 or placed on a cloud or other network server. The terminal and the server can each execute the domain-specific language generation model training method provided in this embodiment independently, or they can execute it collaboratively. For example, terminal 102 inputs natural language description information into the initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information; based on the domain-specific language description information, it determines the semantic module description information corresponding to each semantic module; for each semantic module, based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module, it determines the module quality score of the semantic module; based on the module quality scores, it determines the reward score corresponding to the domain-specific language description information; and based on the reward score, it adjusts the initial domain-specific language generation model to obtain the target domain-specific language generation model. The terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet, IoT device, or portable wearable device. IoT devices can include smart speakers, smart TVs, smart air conditioners, and smart in-vehicle systems. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted devices. The server 104 can be implemented as a standalone server or as a server cluster consisting of multiple servers.

[0031] In one embodiment, as shown in Figure 2, a method for training a domain-specific language generation model is provided. This embodiment is described using the application of the method to a computer device as an example, and includes steps 202 to 210.

[0032] Step 202: Input the natural language description information into the initial domain-specific language generation model to obtain the domain-specific language description information corresponding to the natural language description information.

[0033] Natural language description information refers to business questions or query requirements in natural language text. Natural language description information can be training data in a training set, which is a collection of training data. A set of training data includes one piece of natural language description information and the corresponding domain-specific language annotation information.

[0034] A domain-specific language generation model is a machine learning model used to convert natural language description information into domain-specific language description information. Such a model can be used in ChatBI, intelligent reporting systems, or conversational analytics systems. The domain-specific language generation model can be a Large Language Model (LLM) that takes natural language description information as input and outputs domain-specific language description information. The initial domain-specific language generation model refers to the domain-specific language generation model that is to be trained.

[0035] Domain-specific language (DSL) description information refers to structured query statements used to express query logic for a specific business domain (such as finance, business analysis, or manufacturing). DSL description information represents the data query statements corresponding to the natural language description information. In the ChatBI scenario, DSL description information can adopt an expression form with structured query syntax features, such as a query structure similar to SQL (Structured Query Language). DSL description information can include multiple semantic functional modules, including but not limited to a date processing module, an information extraction and matching module, and a data lookup DSL module. The data lookup DSL module can include sub-modules such as QUERY (query field definition), FILTER (condition filtering), and AGGREGATE (aggregation and grouping). For example, as shown in Figure 3, the DSL description information includes a date processing module 302, an information extraction and matching module 304, and a data lookup DSL module 306. The data lookup DSL module 306 includes a QUERY sub-module, a FILTER sub-module, and an AGGREGATE sub-module.

[0036] For example, the computer device obtains training data from the training set, including natural language description information and domain-specific language annotation information corresponding to the natural language description information, inputs the natural language description information in the training data into the initial domain-specific language generation model, the initial domain-specific language generation model outputs the domain-specific language description information, and the computer device obtains the domain-specific language description information corresponding to the natural language description information.

[0037] Step 204: Based on the domain-specific language description information, determine the semantic module description information corresponding to each semantic module.

[0038] A semantic module refers to a functional structural unit derived from the semantic functions of a domain-specific language, used for modular decomposition of domain-specific language description information. Semantic modules can be configured according to actual business needs; for example, the semantic modules may include a time module (T), a dimension module (E), a metric module (M), a filtering logic module (F), a granularity module (G), and a structural validity module (S). The semantic module description information refers to the semantic expression content that is extracted from the domain-specific language description information and matches the corresponding semantic module.

[0039] For example, the computer device obtains semantic module description information corresponding to each semantic module from domain-specific language description information. Specifically, the computer device can obtain the semantic module description information corresponding to each semantic module from the domain-specific language description information based on a pre-written program.
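The decomposition in step 204 could be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of the DSL, its key names (`date`, `QUERY`, `FILTER`, `AGGREGATE`, `metrics`, `dimensions`), and the `decompose` helper are hypothetical. The source states only that a pre-written program extracts the description information for each semantic module.

```python
from typing import Any, Dict

# Hypothetical DSL description, represented as a nested dictionary.
# The key names are invented for illustration and are not from the source.
generated_dsl: Dict[str, Any] = {
    "date": {"start": "2024-01-01", "end": "2024-12-31", "granularity": "month"},
    "QUERY": {"metrics": ["revenue"], "dimensions": ["region"]},
    "FILTER": [{"field": "region", "op": "=", "value": "East"}],
    "AGGREGATE": {"func": "sum", "group_by": ["region"]},
}


def decompose(dsl: Dict[str, Any]) -> Dict[str, Any]:
    """Split a DSL description into per-module description information.

    The module letters follow the source: time (T), dimension (E),
    metric (M), filtering logic (F), granularity (G), structural validity (S).
    """
    query = dsl.get("QUERY", {})
    return {
        "T": dsl.get("date", {}),
        "E": query.get("dimensions", []),
        "M": query.get("metrics", []),
        "F": dsl.get("FILTER", []),
        "G": dsl.get("AGGREGATE", {}),
        # Assumed proxy for structure: the set of top-level sections present.
        "S": sorted(dsl.keys()),
    }


modules = decompose(generated_dsl)
for name, description in modules.items():
    print(name, description)
```

Applying the same helper to the annotated DSL would yield the per-module reference content against which each module is later scored.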

[0040] Step 206: For each semantic module, determine the module quality score based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module.

[0041] Domain-specific language annotation information refers to pre-annotated standard domain-specific language expressions that correspond to natural language description information. Module quality score refers to a continuous scoring result obtained by semantically comparing the semantic module description information with the corresponding domain-specific language annotation information for each semantic module.

[0042] For example, for each semantic module, the computer device obtains the semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information, and determines the module quality score of the semantic module based on the semantic module annotation information and the semantic module description information. Specifically, the module quality score of the semantic module can be determined based on the similarity between the semantic module annotation information and the semantic module description information.
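As a minimal sketch of a similarity-based module quality score, the snippet below compares the items of one semantic module using Jaccard similarity. The choice of Jaccard similarity and the function name `jaccard_similarity` are assumptions made for illustration; the source says only that the score can be determined from the similarity between the annotation and the generated description.

```python
from typing import Iterable


def jaccard_similarity(predicted: Iterable[str], annotated: Iterable[str]) -> float:
    """Return |intersection| / |union| of two item collections, in [0, 1]."""
    pred_set, gold_set = set(predicted), set(annotated)
    if not pred_set and not gold_set:
        # Both sides are empty: treat the module as fully matching.
        return 1.0
    return len(pred_set & gold_set) / len(pred_set | gold_set)


# Hypothetical metric-module contents, invented for illustration.
generated_metrics = ["revenue", "cost"]
annotated_metrics = ["revenue"]
print(jaccard_similarity(generated_metrics, annotated_metrics))
```

Unlike an exact string match, a score of this kind is continuous, so a partially correct module receives partial credit.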

[0043] Step 208: Based on the quality scores of each module, determine the reward score corresponding to the domain-specific language description information.

[0044] The reward score refers to the numerical evaluation result determined based on the quality scores of each module, used to guide the optimization of the parameters of the initial domain-specific language generation model. The reward score is used to characterize the comprehensive quality level of the domain-specific language description information across various semantic module dimensions. The higher the reward score, the higher the degree of matching between the domain-specific language description information and the domain-specific language annotation information in terms of semantic structure, logical consistency, and completeness of business expression.

[0045] For example, the computer device determines the reward score corresponding to the domain-specific language description information based on the module quality score and module weight corresponding to each semantic module.
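One way to combine module quality scores and module weights is a weighted average, sketched below. The weighted-average form, the function name `reward_score`, and the example weights are assumptions; the source does not specify how the scores and weights are combined.

```python
from typing import Dict


def reward_score(module_scores: Dict[str, float], module_weights: Dict[str, float]) -> float:
    """Weighted average of per-module quality scores (assumed combination rule)."""
    total_weight = sum(module_weights[name] for name in module_scores)
    if total_weight <= 0:
        raise ValueError("module weights must sum to a positive value")
    weighted_sum = sum(module_scores[name] * module_weights[name] for name in module_scores)
    return weighted_sum / total_weight


# Hypothetical scores: every module is correct except the metric module (M).
scores = {"T": 1.0, "E": 1.0, "M": 0.0, "F": 1.0, "G": 1.0, "S": 1.0}
# Hypothetical weights: the metric module counts double.
weights = {"T": 1.0, "E": 1.0, "M": 2.0, "F": 1.0, "G": 1.0, "S": 1.0}
print(reward_score(scores, weights))
```

Under an exact-match binary reward this output would score 0; here the reward reflects that five of the six modules are correct while still penalizing the wrong metric.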

[0046] Step 210: Adjust the initial domain-specific language generation model based on the reward score to obtain the target domain-specific language generation model.

[0047] Here, "adjustment" refers to the process of updating the model parameters of the initial domain-specific language generation model. The target domain-specific language generation model refers to the domain-specific language generation model obtained after training.

[0048] For example, the computer device adjusts the initial domain-specific language generation model based on the reward score to obtain an updated initial domain-specific language generation model, and then returns to execute steps 202 to 210 until the training stopping condition is met, thus obtaining the target domain-specific language generation model. Here, the training stopping condition refers to the condition for stopping training. The training stopping condition can be set according to actual needs, for example, the reward score is greater than a score threshold, or the number of training iterations equals a threshold.
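The iteration and stopping conditions described above can be sketched as a simple loop. This is a skeleton only: `train` and `step` are hypothetical names, and `step` stands in for one full pass of steps 202 to 210 (generation, decomposition, scoring, and parameter update), which the sketch does not implement.

```python
from typing import Callable


def train(step: Callable[[], float], score_threshold: float, max_iterations: int) -> int:
    """Repeat training steps until a stopping condition is met.

    `step` performs one pass of steps 202-210 and returns the reward score.
    Returns the number of iterations that were executed.
    """
    for iteration in range(1, max_iterations + 1):
        reward = step()
        if reward > score_threshold:
            # Stopping condition 1: the reward score exceeds the threshold.
            return iteration
    # Stopping condition 2: the iteration count reached its threshold.
    return max_iterations


# Toy stand-in for a real training step: replays a fixed reward sequence.
rewards = iter([0.20, 0.50, 0.95, 0.99])
iterations_used = train(lambda: next(rewards), score_threshold=0.9, max_iterations=10)
print(iterations_used)
```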

[0049] In the aforementioned domain-specific language generation model training method, natural language description information is input into the initial domain-specific language generation model to generate domain-specific language description information. This domain-specific language description information is then further decomposed into semantic module description information for multiple semantic modules. Module quality scores for each semantic module are determined based on domain-specific language annotation information, and reward scores are determined based on these module quality scores. This ensures that the reward scores derive from independent quality evaluation results across multiple semantic module dimensions, comprehensively reflecting the correctness of the domain-specific language description information across different semantic modules. By optimizing the initial domain-specific language generation model using reward scores, the model parameters can be collaboratively adjusted during updates to address quality deviations across different semantic modules. This avoids the problem of a single overall score masking local semantic errors, thereby improving the accuracy and stability of the target domain-specific language generation model and enhancing its training performance.

[0050] In one embodiment, as shown in Figure 4, determining, for each semantic module, the module quality score of the semantic module based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module includes:

[0051] Step 402: For each semantic module, obtain the semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information.

[0052] Here, semantic module annotation information refers to the standard semantic expression content that is extracted from the domain-specific language annotation information and matches the semantic module; it serves as the reference benchmark for evaluating the quality of the semantic module.

[0053] For example, for each semantic module, the computer device obtains the semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information.

[0054] Step 404: Based on the semantic module annotation information and semantic module description information corresponding to the semantic module, determine the sub-item score corresponding to each annotation sub-item in the semantic module.

[0055] Here, annotation sub-items are the finer-grained evaluation units obtained by further dividing a semantic module according to preset semantic evaluation dimensions; they are used to evaluate the quality of different components within the semantic module. The annotation sub-items of a semantic module are preset and can be configured according to the specific business scenario; no restriction is imposed here. For example, the time module (T) includes three annotation sub-items: whether the date range is correct, whether the upper and lower bounds are compatible, and whether the granularity is consistent. The dimension module (E) includes two annotation sub-items: whether the dimension is correctly identified, and whether the enumerated values match precisely. The metric module (M) includes two annotation sub-items: whether the metric name is correct, and whether the metric is incorrectly referenced or does not exist. The filtering logic module (F) includes two annotation sub-items: whether the logical expression is complete, and whether there is any ambiguity. The granularity module (G) includes one annotation sub-item: whether the aggregation method matches the user's query intent. The structural validity module (S) includes one annotation sub-item: whether the DSL structure conforms to the syntax rules. A sub-item score is the evaluation value determined, for a given annotation sub-item, by comparing the semantic module description information with the corresponding semantic module annotation information; it characterizes the degree of matching or the quality level of that annotation sub-item.

[0056] For example, the computer device determines the sub-item score corresponding to each annotation sub-item in the semantic module based on the semantic module annotation information and semantic module description information corresponding to the semantic module.

[0057] Step 406: Convert the sub-item score corresponding to the annotation sub-item into the sub-item penalty factor corresponding to the annotation sub-item.

[0058] Here, the sub-item penalty factor refers to the numerical factor obtained by mapping and transforming the sub-item score. It is used to adjust the contribution of the corresponding annotation sub-item to the module quality score of the semantic module and to characterize how strongly the annotation sub-item constrains overall semantic consistency.

[0059] For example, the computer device converts the sub-item score corresponding to the annotation sub-item into the sub-item penalty factor corresponding to the annotation sub-item.

[0060] In one embodiment, converting the sub-item score corresponding to the annotation sub-item into the sub-item penalty factor corresponding to the annotation sub-item includes: determining a candidate penalty factor for the annotation sub-item based on the sub-item score and the sub-item expected quality value corresponding to the annotation sub-item; comparing the candidate penalty factor with a lower-limit penalty factor; if the candidate penalty factor is less than the lower-limit penalty factor, determining the lower-limit penalty factor as the sub-item penalty factor corresponding to the annotation sub-item; and if the candidate penalty factor is equal to or greater than the lower-limit penalty factor, determining the candidate penalty factor as the sub-item penalty factor corresponding to the annotation sub-item.

[0061] In one embodiment, the sub-item score corresponding to the annotation sub-item is converted into the sub-item penalty factor corresponding to the annotation sub-item, using the following formula:

[0062] Formula (1)

[0063] In Formula (1), the left-hand side is the sub-item penalty factor of the annotation sub-item; s is the sub-item score of the annotation sub-item; and t is the expected quality value of the sub-item, i.e., the annotation threshold of the annotation sub-item. The adjustment factor is a preset very small value used to prevent the denominator from being zero. The penalty intensity controls how quickly the penalty factor decays when the sub-item score is lower than the expected quality value of the sub-item: the larger it is, the faster the penalty factor decreases as the sub-item score deviates from the expected sub-item quality, which strengthens the constraint on semantic deviation. The lower-limit penalty factor bounds the sub-item penalty factor from below, and the max operation takes the larger of the candidate penalty factor and the lower-limit penalty factor.
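Since the exact expression of Formula (1) is not reproduced in this text, the following is only a sketch under an assumed functional form: the candidate penalty factor is taken to be the ratio s / (t + eps), capped at 1 and raised to the power gamma, and the result is clamped from below by p_min. The power-law form, the cap at 1, and the names `gamma`, `p_min`, and `eps` (with their default values) are assumptions; only the roles of the quantities and the final max with the lower limit come from the source.

```python
def sub_item_penalty(s: float, t: float, gamma: float = 2.0,
                     p_min: float = 0.05, eps: float = 1e-8) -> float:
    """Map a sub-item score to a penalty factor in [p_min, 1] (assumed form).

    s: sub-item score; t: expected quality value (annotation threshold);
    gamma: penalty intensity; p_min: lower-limit penalty factor;
    eps: small adjustment factor that keeps the denominator non-zero.
    """
    # Ratio of achieved to expected quality, capped so that scores at or
    # above the threshold are not penalized.
    ratio = min(1.0, max(0.0, s) / (t + eps))
    candidate = ratio ** gamma
    # Clamp from below, as described for the lower-limit penalty factor.
    return max(p_min, candidate)


print(sub_item_penalty(0.9, 0.8))  # score above the threshold
print(sub_item_penalty(0.4, 0.8))  # score at half the threshold
print(sub_item_penalty(0.0, 0.8))  # score of zero falls back to p_min
```

In this sketch a larger `gamma` makes the penalty factor fall faster once the score drops below the threshold, and the lower limit keeps the factor strictly positive, which matters if the factors are later combined multiplicatively.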

[0064] Step 408: Determine the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each annotation sub-item.

[0065] Here, the sub-item weight refers to the weighting coefficient that characterizes the relative importance of each annotation sub-item in the quality assessment of the semantic module.

[0066] For example, the computer device obtains the sub-item weights corresponding to each annotation sub-item, and determines the module quality score of the semantic module based on the sub-item penalty factor and sub-item weights corresponding to each annotation sub-item.

[0067] In one embodiment, determining the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each annotation sub-item includes: adding the product of the sub-item penalty factor and the sub-item weight corresponding to each annotation sub-item to obtain a first statistical value; adding the sub-item penalty factors corresponding to each annotation sub-item to obtain a second statistical value; and dividing the first statistical value by the second statistical value to obtain the module quality score of the semantic module.

[0068] In this embodiment, semantic module annotation information corresponding to the semantic module is extracted from domain-specific language annotation information. Sub-item scores for each annotation sub-item are determined based on the semantic module annotation information and semantic module description information. These sub-item scores are then converted into sub-item penalty factors and combined with sub-item weights to determine the module quality score. This ensures that the quality evaluation of the semantic module is based on the independent evaluation results of each annotation sub-item. The sub-item penalty factors constrain and adjust deviations in sub-item scores, while sub-item weights regulate the influence of different annotation sub-items on the module quality calculation. This prevents the overall semantic module score from masking local semantic errors in annotation sub-items, resulting in a more objective, stable, and discriminative module quality score. This provides a more refined evaluation basis for the accurate determination of subsequent reward scores.

[0069] In one embodiment, determining the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each annotation sub-item includes:

[0070] Based on the sub-item weights of each annotation sub-item, the sub-item penalty factors corresponding to multiple annotation sub-items are aggregated to obtain the sub-item aggregate value; the sub-item weights of multiple annotation sub-items are statistically analyzed to obtain the sub-item weight statistical value; based on the sub-item aggregate value and the sub-item weight statistical value, the module quality score of the semantic module is determined.

[0071] Here, the sub-item aggregate value refers to the comprehensive value obtained by aggregating the sub-item penalty factors of multiple annotation sub-items according to their sub-item weights. The sub-item aggregate value represents the overall degree of constraint imposed by the penalty results of the annotation sub-items. The sub-item weight statistical value refers to the value obtained by statistically calculating the sub-item weights of multiple annotation sub-items.

[0072] For example, the computer device first aggregates the sub-item penalty factors corresponding to multiple annotation sub-items based on the sub-item weights of each annotation sub-item to obtain the sub-item aggregate value. Then, it statistically analyzes the sub-item weights of multiple annotation sub-items to obtain the sub-item weight statistical value. Finally, based on the sub-item aggregate value and the sub-item weight statistical value, it determines the module quality score of the semantic module.

[0073] In one embodiment, the formula for calculating the module quality score of a semantic module is as follows:

[0074] Formula (2)

[0075] In Formula (2), the symbols denote, in order: the module quality score of semantic module k; the sub-item penalty factor of the i-th annotation sub-item in semantic module k; the sub-item weight of the i-th annotation sub-item in semantic module k; the sub-item aggregate value; and the sub-item weight statistical value, which is used for normalization so that the module quality score of a semantic module is more stable and the module quality scores of multiple semantic modules are comparable. α is the semantic module strictness control parameter, which adjusts the strictness of the overall semantic module score and can be tuned dynamically according to business needs: α>1 is stricter, so low-scoring sub-items have a more pronounced impact; α=1 gives the ordinary geometric mean; and α<1 is more lenient.
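The formula image itself did not survive extraction. One form consistent with the definitions above is sketched below. The symbols S_k, p_{k,i}, w_{k,i}, A_k, and W_k are assumed names rather than the patent's own notation, and the placement of α in the exponent is inferred from the statement that α=1 yields the ordinary geometric mean.

```latex
A_k = \prod_{i} p_{k,i}^{\,w_{k,i}}, \qquad
W_k = \sum_{i} w_{k,i}, \qquad
S_k = A_k^{\,\alpha / W_k}
```

Under this reading, with every p_{k,i} in (0, 1], raising α above 1 lowers S_k and lowering α below 1 raises it, which matches the described strictness behavior.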

[0076] In this embodiment, multiple sub-item penalty factors are aggregated based on the sub-item weights of each annotation sub-item to obtain an aggregated sub-item value. Then, the sub-item weight statistics are obtained by statistically analyzing the multiple sub-item weights. Finally, the module quality score of the semantic module is determined based on the aggregated sub-item value and the sub-item weight statistics. This creates a dual-scale adjustment structure in the module quality score calculation process, simultaneously considering both the penalty result and the weight scale. This comprehensively reflects the degree of impact of each annotation sub-item penalty while performing scale correction on the differences in the number and weight distribution of annotation sub-items in different semantic modules. This avoids imbalances in module quality scores caused by changes in the number of annotation sub-items or differences in the total weight, improves the comparability and stability of score results between different semantic modules, and provides a unified and stable evaluation basis for the subsequent determination of reward scores.
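The aggregation described in this embodiment can be sketched as a weighted geometric mean raised to a strictness power. This is a minimal sketch under assumptions: the function name, the argument names, and the exact placement of the strictness parameter `alpha` are illustrative and are not confirmed by the source, whose formula image is unavailable.

```python
import math
from typing import Sequence


def module_quality_score(
    penalties: Sequence[float],
    weights: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Soft-AND weighted geometric aggregation of sub-item penalty factors.

    Assumed form (not confirmed by the source):
        (prod_i p_i ** w_i) ** (alpha / sum_i w_i)
    """
    if not penalties or len(penalties) != len(weights):
        raise ValueError("penalties and weights must be non-empty and equal in length")
    if any(p <= 0.0 for p in penalties):
        # Assumption: penalty factors lie in (0, 1], so the logarithm is defined.
        raise ValueError("penalty factors must be positive")
    weight_sum = sum(weights)  # sub-item weight statistical value
    if weight_sum <= 0.0:
        raise ValueError("weights must sum to a positive value")
    # Logarithm of the sub-item aggregate value; log space avoids underflow.
    log_aggregate = sum(w * math.log(p) for p, w in zip(penalties, weights))
    return math.exp(alpha * log_aggregate / weight_sum)
```

Dividing by the weight sum keeps the score on the same scale regardless of how many annotation sub-items a module has, which is the comparability property the paragraph describes.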

[0077] In one embodiment, determining the reward score corresponding to the domain-specific language description information based on the module quality scores includes:

[0078] A baseline score is obtained by linearly weighting the quality scores of multiple modules; a module aggregation value is determined based on the module weights and module quality scores corresponding to multiple semantic modules; and a reward score is determined based on the baseline score and the module aggregation value for the domain-specific language description information.

[0079] The baseline score is a comprehensive evaluation value obtained by linearly weighting the module quality scores of multiple semantic modules. It is used to characterize the overall basic quality level of domain-specific language description information across each semantic module dimension. Module weights are weighting coefficients used to characterize the differences in importance among different semantic modules during the reward score calculation. The module aggregation value is a comprehensive value obtained by aggregating the module weights and module quality scores of multiple semantic modules. It reflects the overall constraint result of each semantic module under weight adjustment.

[0080] For example, the computer device performs a linear weighted average of the quality scores of multiple modules to obtain a baseline score, determines a module aggregation value based on the module weights and module quality scores corresponding to multiple semantic modules, and multiplies the baseline score by the module aggregation value to obtain the reward score corresponding to the domain-specific language description information.

[0081] In one embodiment, when the semantic modules include a time module (T), a dimension module (E), an indicator module (M), a filtering logic module (F), a granularity module (G), and a structural validity module (S), the formula for calculating the baseline score is as follows:

[0082] Formula (3)

[0083] In Formula (3), the symbols denote, in order: the baseline score; the module weights of the time module, dimension module, indicator module, filtering logic module, granularity module, and structural validity module; and the module quality scores of the time module, dimension module, indicator module, filtering logic module, granularity module, and structural validity module.
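The linear weighting over the six modules can be sketched as follows. The module codes come from the text. The function name and the division by the weight sum are assumptions, with the division chosen so that the result is a weighted average, as paragraph [0080] describes, even when the weights do not sum to 1.

```python
from typing import Mapping

# Module codes from the text: time, dimension, indicator, filtering logic,
# granularity, structural validity.
MODULES = ("T", "E", "M", "F", "G", "S")


def baseline_score(weights: Mapping[str, float], scores: Mapping[str, float]) -> float:
    """Linear weighted average of the six module quality scores.

    Assumption: weights are normalized here; if they already sum to 1 this
    reduces to a plain weighted sum.
    """
    total_weight = sum(weights[m] for m in MODULES)
    if total_weight <= 0.0:
        raise ValueError("module weights must sum to a positive value")
    return sum(weights[m] * scores[m] for m in MODULES) / total_weight
```

Because this combination is linear, a single poor module lowers the baseline only in proportion to its weight.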

[0084] In one embodiment, the formula for calculating the reward score corresponding to the domain-specific language description information is as follows:

[0085] Formula (4)

[0086] Where R is the reward score corresponding to the domain-specific language description information; base is the baseline score; and the remaining symbol is the module aggregate value.
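Per paragraph [0080], the reward score is the product of the baseline score and the module aggregate value. Written out, with A as an assumed symbol for the module aggregate value (the patent's own symbol was lost in extraction):

```latex
R = \mathrm{base} \times A
```

For instance, base = 0.9 and A = 0.5 give R = 0.45.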

[0087] In this embodiment, a baseline score is obtained by linearly weighting the module quality scores of multiple semantic modules. The module aggregate value is then determined by combining the module weights and module quality scores of each semantic module. Finally, a reward score is determined based on the baseline score and the module aggregate value. This process of determining the reward score introduces an adjustment mechanism for the differences in the importance of semantic modules on the basis of the overall quality level evaluation of semantic modules. This avoids weakening the influence of key semantic modules, makes the contribution ratio of different semantic modules to the final reward score controllable, reduces the reward score deviation caused by changes in the number of modules or differences in weights, improves the stability and adjustability of the reward score calculation results, and enhances the constraint strength on key semantic modules during the training of the domain-specific language generation model.

[0088] In one embodiment, determining a module aggregation value based on the module weights and module quality scores corresponding to multiple semantic modules includes:

[0089] The module quality score corresponding to the semantic module is converted into the module penalty factor corresponding to the semantic module; based on the module weight of each semantic module, the module penalty factors corresponding to multiple semantic modules are aggregated to obtain the module aggregate value.

[0090] Here, the module penalty factor refers to the numerical factor obtained by converting the module quality score corresponding to the semantic module. It characterizes how far the module quality score of the semantic module deviates from the expected module quality value.

[0091] For example, the computer device converts the module quality score corresponding to a semantic module into a module penalty factor corresponding to the semantic module using a preset conversion formula. Then, based on the module weights of each semantic module, the module penalty factors corresponding to multiple semantic modules are aggregated to obtain a module aggregate value. Here, the preset conversion formula refers to a pre-set formula used to convert the module quality score corresponding to a semantic module into a module penalty factor corresponding to the semantic module.

[0092] In one embodiment, the preset conversion formula is as follows:

[0093] Formula (5)

[0094] In Formula (5), the symbols denote, in order: the module penalty factor corresponding to semantic module k; the module quality score corresponding to semantic module k; the expected module quality value; the adjustment factor, a preset very small value used to prevent the denominator from being zero; the penalty intensity, used to control the decay of the penalty factor when the module quality score is lower than the expected module quality value; and the lower limit penalty factor. The maximum operation returns the larger of the reference penalty factor and the lower limit penalty factor.

[0095] In one embodiment, when the semantic modules include a time module (T), a dimension module (E), an indicator module (M), a filtering logic module (F), a granularity module (G), and a structural validity module (S), the formula for determining the module aggregation value is as follows:

[0096] Formula (6)

[0097] In Formula (6), the symbols denote, in order: the module aggregate value; the module penalty factors of the time module, dimension module, indicator module, filtering logic module, granularity module, and structural validity module; the module weights of the same six modules; and the global strictness adjustment parameter. When the module quality score of a semantic module is significantly lower than the expected module quality value, its module penalty factor is significantly less than 1, which in turn pulls down the module aggregate value through aggregation.
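How a single weak module pulls down the module aggregate value can be illustrated with a weighted geometric aggregation of module penalty factors. This is a sketch under assumptions: the formula image is unavailable, so the geometric form, the name `beta` for the global strictness adjustment parameter, and its placement in the exponent are all illustrative.

```python
import math
from typing import Mapping


def module_aggregate_value(
    penalty_factors: Mapping[str, float],
    module_weights: Mapping[str, float],
    beta: float = 1.0,
) -> float:
    """Weighted geometric aggregation of module penalty factors.

    Assumed form (not confirmed by the source):
        (prod_k P_k ** w_k) ** (beta / sum_k w_k)
    """
    if not penalty_factors:
        raise ValueError("at least one module penalty factor is required")
    if any(p <= 0.0 for p in penalty_factors.values()):
        # Assumption: a positive lower limit penalty factor keeps every P_k > 0.
        raise ValueError("module penalty factors must be positive")
    weight_sum = sum(module_weights[m] for m in penalty_factors)
    if weight_sum <= 0.0:
        raise ValueError("module weights must sum to a positive value")
    log_sum = sum(module_weights[m] * math.log(p) for m, p in penalty_factors.items())
    return math.exp(beta * log_sum / weight_sum)
```

With six equally weighted modules, five penalty factors at 1 and one at 0.1 give roughly 0.68 at `beta=1`, and a larger `beta` lowers the value further.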

[0099] In this embodiment, by converting the module quality score corresponding to the semantic module into a module penalty factor, and aggregating multiple module penalty factors based on the module weight of each semantic module to obtain a module aggregate value, the aggregation method of the semantic module level is changed from directly weighting the module quality score to a constraint-type aggregation structure based on the penalty adjustment mechanism. This maintains the adjustment effect of module weights while introducing a nonlinear suppression mechanism for quality deviations, making the lower-quality semantic modules have a more obvious inhibitory effect during the aggregation process. This avoids the simple average masking of low-quality semantic modules by high-quality semantic modules, improves the sensitivity of the module aggregate value to local quality defects, and provides a more stable and constrained evaluation basis for the subsequent determination of reward scores.

[0100] In one embodiment, converting the module quality score corresponding to the semantic module into a module penalty factor corresponding to the semantic module includes:

[0101] Based on the module quality score and the expected value of the module quality corresponding to the semantic module, a reference penalty factor for the semantic module is determined; the reference penalty factor and the lower limit penalty factor are compared; if the reference penalty factor is less than the lower limit penalty factor, the lower limit penalty factor is determined as the module penalty factor for the semantic module; if the reference penalty factor is equal to or greater than the lower limit penalty factor, the reference penalty factor is determined as the module penalty factor for the semantic module.

[0102] The reference penalty factor is a penalty coefficient calculated from the relationship between the module quality score corresponding to the semantic module and the expected module quality value corresponding to the semantic module. It gives an initial characterization of the degree of quality deviation of the semantic module. The lower limit penalty factor is a preset penalty coefficient threshold used to limit the minimum value of the module penalty factor. For example, in formula (5), the two quantities compared by the maximum operation are the reference penalty factor and the lower limit penalty factor.

[0103] For example, the computer device determines a reference penalty factor for the semantic module based on the module quality score and the expected value of the module quality corresponding to the semantic module. It compares the reference penalty factor with the lower limit penalty factor. If the reference penalty factor is less than the lower limit penalty factor, the lower limit penalty factor is determined as the module penalty factor corresponding to the semantic module. If the reference penalty factor is equal to or greater than the lower limit penalty factor, the reference penalty factor is determined as the module penalty factor corresponding to the semantic module.
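The comparison step above amounts to clamping the reference penalty factor from below. A minimal sketch, with a hypothetical function name:

```python
def module_penalty_factor(reference: float, lower_limit: float) -> float:
    """Return the reference penalty factor unless it falls below the lower limit."""
    if reference < lower_limit:
        return lower_limit
    # Equal to or greater than the lower limit: keep the reference value.
    return reference
```

This is equivalent to `max(reference, lower_limit)`. The explicit branches mirror the two cases in the text.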

[0104] In one embodiment, the formula for calculating the reference penalty factor is as follows:

[0105] Formula (7)

[0106] In Formula (7), the symbols denote, in order: the reference penalty factor corresponding to semantic module k; the module quality score corresponding to semantic module k; the expected module quality value; the adjustment factor, a preset very small value used to prevent the denominator from being zero; and the penalty intensity, used to control the decay of the penalty factor when the module quality score is lower than the expected module quality value.
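Since the formula image is unavailable, the following is only one plausible reading of the definitions above: the ratio of the score to the expected value, capped at 1, raised to the penalty intensity. The ratio form, the cap at 1, the function and parameter names, and the default values of `gamma` and `eps` are all assumptions.

```python
def reference_penalty_factor(
    score: float,
    expected: float,
    gamma: float = 2.0,
    eps: float = 1e-8,
) -> float:
    """Reference penalty factor for one semantic module (assumed form).

    Assumed form: min(1, score / (expected + eps)) ** gamma, floored at 0.
    """
    # eps keeps the denominator away from zero when expected == 0.
    ratio = score / (expected + eps)
    # Cap at 1 so that scores at or above the expected value are not penalized,
    # and floor at 0 so that the power is well defined.
    ratio = max(0.0, min(1.0, ratio))
    return ratio ** gamma
```

Under this reading, a larger `gamma` makes the factor decay faster once the score drops below the expected value.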

[0107] In this embodiment, a reference penalty factor is determined based on the module quality score and the expected value of the module quality. The reference penalty factor is then compared with the lower limit penalty factor, and the final module penalty factor is determined based on the comparison result. This constrains the range of module penalty factors by a lower limit, thereby maintaining the quality deviation adjustment function while avoiding excessive contraction or decay of the module penalty factor due to extreme scoring conditions. This improves the numerical stability of the module penalty factor under different semantic module conditions, enhances the controllability of the module aggregation process, and provides a more stable and adjustable penalty basis for determining the subsequent module aggregation value and reward score.

[0108] In one exemplary embodiment, a training method for a domain-specific language generation model in the ChatBI scenario is proposed. Through a technical process of "semantic parsing - module decomposition - multi-dimensional annotation sub-item scoring - soft-AND geometric aggregation - global scoring - reinforcement learning feedback", the method achieves a structured, continuous, and interpretable quality assessment of the domain-specific language description information generated by the domain-specific language generation model, and uses the assessment result as a reward signal for reinforcement learning, thereby improving the accuracy and stability of the domain-specific language generation model.

[0109] Domain-specific language generation models can be machine learning models used in ChatBI, intelligent reporting systems, or conversational analytics systems to convert natural language descriptions into domain-specific language descriptions. The training method for these models combines large-scale model structured evaluation techniques with reinforcement learning training optimization techniques. This is a complete technical solution that can be directly applied to RLHF (Reinforcement Learning from Human Feedback) / DAPO (Direct Alignment Preference Optimization) training pipelines.

[0110] A domain-specific language generation model training method based on "multidimensional semantic annotation + geometric weighted aggregation" is proposed. During execution of the training method, a DSL quality assessment module determines the reward score corresponding to the domain-specific language description information. As shown in Figure 5, the DSL quality assessment module includes a DSL input module, a semantic parser, a semantic module decomposer, a multidimensional annotation scorer, a soft-AND geometric weighted aggregator, and a global DSL quality score generator.

[0111] (1) DSL input module

[0112] The DSL input module receives domain-specific language description information generated by the domain-specific language generation model. It does not participate in semantic processing; its sole responsibility is to pass the domain-specific language description information to the subsequent parsing module. The DSL input module ensures a unified input channel, enabling the system to be compatible with various domain-specific language description information generation methods (such as template-based, LLM-based, and hybrid constraint-based methods).

[0113] (2) Semantic parser

[0114] The semantic parser performs lexical and structural parsing of domain-specific language description information, including:

[0115] Recognizing keywords (such as QUERY, FILTER, and AGGREGATE);

[0116] Extracting semantic elements such as field names, enumeration values, indicator names, and time expressions.

[0117] The semantic parser outputs a structured intermediate representation, enabling subsequent modules to perform analysis based on semantic relationships rather than strings. This module ensures the stability and formal basis of DSL quality assessment, avoiding inconsistencies in scoring due to different string orders.

[0118] (3) Semantic module decomposer

[0119] In the ChatBI scenario, the semantic module decomposer automatically divides the structured intermediate representation into the following six semantic modules: time module (T), dimension module (E), indicator module (M), filtering logic module (F), granularity module (G), and structural validity module (S).
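The decomposition into six semantic modules can be sketched as routing the keys of the structured intermediate representation to module codes. The source does not specify the intermediate representation, so the dictionary layout, every key name in `KEY_TO_MODULE`, and the contents of the structural validity module below are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping from intermediate-representation keys to module codes.
KEY_TO_MODULE = {
    "time_range": "T",   # time module
    "dimensions": "E",   # dimension module
    "indicators": "M",   # indicator module
    "filters": "F",      # filtering logic module
    "aggregation": "G",  # granularity module
}


def decompose(ir: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split a parsed DSL intermediate representation into six semantic modules."""
    modules: Dict[str, Dict[str, Any]] = {code: {} for code in "TEMFGS"}
    unknown_keys = []
    for key, value in ir.items():
        code = KEY_TO_MODULE.get(key)
        if code is None:
            unknown_keys.append(key)
        else:
            modules[code][key] = value
    # Assumption: the structural validity module (S) records facts about the
    # structure as a whole, here just the keys that no other module claims.
    modules["S"] = {"unknown_keys": sorted(unknown_keys)}
    return modules
```

Because each module receives only its own fields, each annotation scorer can then evaluate its module independently of the others.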

[0120] (4) Multidimensional annotation scorer

[0121] The multidimensional annotation scorer includes a time module annotation scorer, a dimension module annotation scorer, an indicator module annotation scorer, a filter module annotation scorer, a granularity module annotation scorer, and a structure module annotation scorer.

[0122] Each semantic module k contains several annotation sub-items. Each annotation scorer outputs continuous scores for the multiple annotation sub-items within the corresponding semantic module.

[0123] The evaluation criteria of the annotation scorer are as follows:

[0124] Time module annotation scorer: whether the date range is correct, whether the upper and lower bounds are compatible, and whether the granularity is consistent.

[0125] Dimension module annotation scorer: whether the dimension is correctly identified and whether the enumeration value is exactly matched.

[0126] Indicator module annotation scorer: whether the indicator name is correct, whether an incorrect indicator is referenced, and whether a nonexistent indicator is referenced.

[0127] Filter module annotation scorer: whether the logical expression is complete and whether there is any ambiguity.

[0128] Granularity module annotation scorer: whether the aggregation method matches the user's query intent.

[0129] Structural module annotation scorer: Whether the DSL structure conforms to the syntax specification.

[0130] Each annotation scorer performs independent scoring through custom annotation logic, resulting in a set of scores for multiple annotation sub-items within the corresponding semantic module.
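As an illustration of one such scorer, the sketch below scores the two dimension-module criteria listed above (dimension identified, enumeration values exactly matched) on a continuous 0 to 1 scale. The source leaves the annotation logic as "custom", so the Jaccard overlap, the exact-match ratio, the sub-item names, and the input layout (dimension name mapped to its enumeration values) are all assumptions.

```python
from typing import Dict, Iterable, Mapping


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as a perfect match."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def score_dimension_module(
    predicted: Mapping[str, Iterable[str]],
    annotated: Mapping[str, Iterable[str]],
) -> Dict[str, float]:
    """Return continuous sub-item scores for the dimension module."""
    identified = jaccard(predicted.keys(), annotated.keys())
    if annotated:
        shared = set(predicted) & set(annotated)
        # Compare enumeration values as sets, so ordering does not matter.
        exact = sum(1 for d in shared if set(predicted[d]) == set(annotated[d]))
        enum_match = exact / len(annotated)
    else:
        enum_match = 1.0 if not predicted else 0.0
    return {"dimension_identified": identified, "enum_exact_match": enum_match}
```

Comparing sets rather than strings keeps the scores stable when the generated DSL lists the same dimensions or values in a different order.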

[0131] (5) Soft-AND geometric weighted aggregator

[0132] The soft-AND geometric weighted aggregator determines the module quality score of each semantic module.

[0133] (6) Global DSL quality score generator

[0134] The global DSL quality score generator determines the reward score corresponding to the domain-specific language description information.

[0135] As shown in Figure 6, the overall framework of the domain-specific language generation model training method includes:

[0136] Step 602: The computer device acquires the training set.

[0137] Step 604: The computer device obtains training data from the training set, including natural language description information and domain-specific language annotation information corresponding to the natural language description information.

[0138] Step 606: The computer device inputs the natural language description information from the training data into the initial domain-specific language generation model.

[0139] Step 608: The initial domain-specific language generation model outputs domain-specific language description information, and the computer device obtains the domain-specific language description information corresponding to the natural language description information.

[0140] Step 610: Semantic module decomposition.

[0141] The computer device obtains the semantic module description information corresponding to each semantic module from the domain-specific language description information. Specifically, the computer device can obtain the semantic module description information corresponding to each semantic module from the domain-specific language description information based on a pre-written program.

[0142] Step 612: Multidimensional annotation sub-item scoring.

[0143] For each semantic module, the computer device obtains the semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information; based on the semantic module annotation information and semantic module description information, it determines the sub-item score corresponding to each annotation sub-item in the semantic module.

[0144] Step 614: Soft-AND geometric aggregation.

[0145] Formula (1) is used to convert the sub-item scores corresponding to the annotation sub-items into sub-item penalty factors corresponding to the annotation sub-items; Formula (2) is used to perform soft-AND geometric aggregation on the sub-item penalty factors of multiple annotation sub-items of the semantic module to obtain the module quality score of the semantic module.

[0146] Step 616: Global DSL quality scoring.

[0147] The baseline score is obtained by linearly weighting the module quality scores of multiple semantic modules using formula (3).

[0148] Formula (5) is used to convert the module quality score corresponding to the semantic module into the module penalty factor corresponding to the semantic module; Formula (6) is used to aggregate the module penalty factors corresponding to multiple semantic modules to obtain the module aggregate value.

[0149] Using formula (4), the reward score corresponding to the domain-specific language description information is determined based on the baseline score and the module aggregation value.

[0150] Step 618: Reinforcement learning feedback.

[0151] The computer device adjusts the initial domain-specific language generation model based on the reward score to obtain an updated initial domain-specific language generation model, and returns to execute steps 604 to 618 until the training stopping condition is met to obtain the target domain-specific language generation model.
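The loop over steps 604 to 618 can be sketched as a skeleton that takes its components as callables. The source does not specify the generation model, the reinforcement learning update rule, or the stopping condition, so `generate`, `reward_fn`, `update`, and `should_stop` are hypothetical placeholders supplied by the caller.

```python
from typing import Callable, Iterable, Tuple

# (natural language description, domain-specific language annotation)
Sample = Tuple[str, str]


def train(
    samples: Iterable[Sample],
    generate: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    update: Callable[[str, str, float], None],
    should_stop: Callable[[int, float], bool],
) -> int:
    """Run steps 604-618 until the stopping condition holds; return steps taken."""
    step = 0
    for description, annotation in samples:   # step 604: fetch training data
        dsl = generate(description)           # steps 606-608: generate the DSL
        reward = reward_fn(dsl, annotation)   # steps 610-616: compute the reward
        update(description, dsl, reward)      # step 618: adjust the model
        step += 1
        if should_stop(step, reward):         # training stopping condition
            break
    return step
```

Keeping the reward computation behind a single callable lets the scoring pipeline be replaced or tuned without changing the loop.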

[0152] In the aforementioned domain-specific language generation model training method, natural language description information is input into the initial domain-specific language generation model to generate domain-specific language description information. This domain-specific language description information is then further decomposed into semantic module description information for multiple semantic modules. Module quality scores for each semantic module are determined based on domain-specific language annotation information, and reward scores are determined based on these module quality scores. This ensures that the reward scores derive from independent quality evaluation results across multiple semantic module dimensions, comprehensively reflecting the correctness of the domain-specific language description information across different semantic modules. By optimizing the initial domain-specific language generation model using reward scores, the model parameters can be collaboratively adjusted during updates to address quality deviations across different semantic modules. This avoids the problem of a single overall score masking local semantic errors, thereby improving the accuracy and stability of the target domain-specific language generation model and enhancing its training performance.

[0153] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0154] Based on the same inventive concept, this application also provides a domain-specific language generation model training device for implementing the domain-specific language generation model training method described above. The solution provided by this device is similar to the implementation described in the above method; therefore, the specific limitations in one or more embodiments of the domain-specific language generation model training device provided below can be found in the limitations of the domain-specific language generation model training method described above, and will not be repeated here.

[0155] In one embodiment, as shown in Figure 7, a domain-specific language generation model training device is provided, comprising: an input module 702, a determination module 704, a first calculation module 706, a second calculation module 708, and a training module 710, wherein:

[0156] The input module 702 is used to input natural language description information into the initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information.

[0157] The determination module 704 is used to determine the semantic module description information corresponding to each semantic module based on the domain-specific language description information.

[0158] The first calculation module 706 is used to determine the module quality score of each semantic module based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module.

[0159] The second calculation module 708 is used to determine the reward score corresponding to the domain-specific language description information based on the module quality scores.

[0160] The training module 710 is used to adjust the initial domain-specific language generation model based on the reward score to obtain the target domain-specific language generation model.

[0161] In one embodiment, the first calculation module 706 is further configured to: for each semantic module, obtain semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information; determine the sub-item score corresponding to each annotation sub-item in the semantic module based on the semantic module annotation information and the semantic module description information; convert the sub-item score corresponding to the annotation sub-item into the sub-item penalty factor corresponding to the annotation sub-item; and determine the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each annotation sub-item.

[0162] In one embodiment, the first calculation module 706 is further configured to: aggregate the sub-item penalty factors corresponding to multiple annotation sub-items based on the sub-item weights of each annotation sub-item to obtain a sub-item aggregate value; statistically analyze the sub-item weights of multiple annotation sub-items to obtain a sub-item weight statistical value; and determine the module quality score of the semantic module based on the sub-item aggregate value and the sub-item weight statistical value.

[0163] In one embodiment, the second calculation module 708 is further configured to: perform a linear weighted average of the module quality scores of multiple semantic modules to obtain a baseline score; determine a module aggregation value based on the module weights and module quality scores corresponding to multiple semantic modules; and determine a reward score corresponding to the domain-specific language description information based on the baseline score and the module aggregation value.

[0164] In one embodiment, the second calculation module 708 is further configured to: convert the module quality score corresponding to the semantic module into the module penalty factor corresponding to the semantic module; and aggregate the module penalty factors corresponding to multiple semantic modules based on the module weights of each semantic module to obtain a module aggregate value.

[0165] In one embodiment, the second calculation module 708 is further configured to: determine a reference penalty factor for the semantic module based on the module quality score corresponding to the semantic module and the expected value of the module quality corresponding to the semantic module; compare the reference penalty factor with the lower limit penalty factor; if the reference penalty factor is less than the lower limit penalty factor, determine the lower limit penalty factor as the module penalty factor corresponding to the semantic module; if the reference penalty factor is equal to or greater than the lower limit penalty factor, determine the reference penalty factor as the module penalty factor corresponding to the semantic module.

[0166] Each module in the aforementioned domain-specific language generation model training device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0167] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 8. The computer device includes a processor, memory, input/output interfaces, a communication interface, a display unit, and an input device. The processor, memory, and input/output interfaces are connected via a system bus, and the communication interface, display unit, and input device are also connected to the system bus via the input/output interfaces. The processor provides computational and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interfaces are used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When executed by the processor, the computer program implements a domain-specific language generation model training method. The display unit is used to form a visible image and can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen; buttons, a trackball, or a touchpad set on the casing of the computer device; or an external keyboard, touchpad, or mouse.

[0168] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of the portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different component arrangement.

[0169] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.

[0170] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0171] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0172] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0173] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc.

[0174] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0175] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A method for training a domain-specific language generation model, characterized in that the method includes: inputting natural language description information into an initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information; determining, based on the domain-specific language description information, semantic module description information corresponding to each semantic module; for each semantic module, determining a module quality score based on domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module; determining, based on the module quality scores, a reward score corresponding to the domain-specific language description information; and adjusting the initial domain-specific language generation model based on the reward score to obtain a target domain-specific language generation model.

2. The method according to claim 1, characterized in that determining, for each semantic module, the module quality score based on the domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module includes: for each semantic module, obtaining semantic module annotation information corresponding to the semantic module from the domain-specific language annotation information corresponding to the natural language description information; determining, based on the semantic module annotation information and the semantic module description information corresponding to the semantic module, a sub-item score corresponding to each annotation sub-item in the semantic module; converting the sub-item score corresponding to each annotation sub-item into a sub-item penalty factor corresponding to that annotation sub-item; and determining the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each of the annotation sub-items.

3. The method according to claim 2, characterized in that determining the module quality score of the semantic module based on the sub-item penalty factor and sub-item weight corresponding to each of the annotation sub-items includes: aggregating, based on the sub-item weight of each of the annotation sub-items, the sub-item penalty factors corresponding to the multiple annotation sub-items to obtain a sub-item aggregate value; computing a statistic over the sub-item weights of the multiple annotation sub-items to obtain a sub-item weight statistic; and determining the module quality score of the semantic module based on the sub-item aggregate value and the sub-item weight statistic.

4. The method according to claim 1, characterized in that determining the reward score corresponding to the domain-specific language description information based on the module quality scores includes: performing a linear weighted average of the module quality scores of the multiple semantic modules to obtain a benchmark score; determining a module aggregation value based on the module weights and module quality scores corresponding to the multiple semantic modules; and determining the reward score corresponding to the domain-specific language description information based on the benchmark score and the module aggregation value.

5. The method according to claim 4, characterized in that determining the module aggregation value based on the module weights and module quality scores corresponding to the multiple semantic modules includes: converting the module quality score corresponding to each semantic module into a module penalty factor corresponding to that semantic module; and aggregating, based on the module weight of each semantic module, the module penalty factors corresponding to the multiple semantic modules to obtain the module aggregation value.

6. The method according to claim 5, characterized in that converting the module quality score corresponding to the semantic module into the module penalty factor corresponding to the semantic module includes: determining a reference penalty factor for the semantic module based on the module quality score corresponding to the semantic module and the expected value of the module quality corresponding to the semantic module; comparing the reference penalty factor with a lower limit penalty factor; if the reference penalty factor is less than the lower limit penalty factor, determining the lower limit penalty factor as the module penalty factor corresponding to the semantic module; and if the reference penalty factor is equal to or greater than the lower limit penalty factor, determining the reference penalty factor as the module penalty factor corresponding to the semantic module.

7. A domain-specific language generation model training device, characterized in that the device includes: an input module, used to input natural language description information into an initial domain-specific language generation model to obtain domain-specific language description information corresponding to the natural language description information; a determining module, used to determine semantic module description information corresponding to each semantic module based on the domain-specific language description information; a first calculation module, used to determine, for each semantic module, a module quality score based on domain-specific language annotation information corresponding to the natural language description information and the semantic module description information corresponding to the semantic module; a second calculation module, used to determine a reward score corresponding to the domain-specific language description information based on the module quality scores; and a training module, used to adjust the initial domain-specific language generation model based on the reward score to obtain a target domain-specific language generation model.

8. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.