Method and apparatus for training a language model

By generating semantically similar positive examples and semantically opposite negative examples, and optimizing training using language model feature distance, the problem of low semantic feature quality in language model training is solved, semantic recognition and training efficiency is improved, and the performance of downstream tasks is enhanced.

CN116610949B · Active · Publication Date: 2026-06-16 · JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD
Filing Date: 2023-05-08
Publication Date: 2026-06-16

IPC: G06F18/214; G06F18/22; G06F18/25; G06F40/35; G06F40/284; G06F40/237

CPC: G06F18/214; G06F18/22; G06F18/253; G06F40/35; G06F40/284; G06F40/237

AI Tagging

Application Domain

Natural language data processing

AI Technical Summary

Technical Problem

Existing techniques do not optimize language model training effectively, which yields low-quality semantic features and leaves the model unable to distinguish textual similarity from semantic similarity.

Method used

Positive example statements that are semantically similar to a sample statement, and a first negative example statement that is semantically opposite to it, are generated. The language model extracts features for each of them, and a contrastive loss determined from the feature distances is used to train the language model.

Benefits of technology

It improves the semantic recognition performance and training efficiency of language models, enabling them to better distinguish between textual similarity and semantic similarity, thereby enhancing the performance of downstream natural language processing tasks.

Abstract

The application discloses a language model training method and device, and relates to the technical field of computers. A specific implementation of the method comprises the following steps: obtaining a sample sentence; generating a positive example sentence and a first negative example sentence of the sample sentence; wherein the positive example sentence is similar in semantics to the sample sentence, and the first negative example sentence is opposite in semantics to the sample sentence; obtaining a sample feature of the sample sentence, a positive example feature of the positive example sentence and a first negative example feature of the first negative example sentence by using a language model; determining a contrast loss according to the sample feature, the positive example feature and the first negative example feature; and training the language model according to the contrast loss. The implementation can effectively optimize the training of the language model, and thus obtain high-quality semantic features of a text sentence.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method and apparatus for training a language model.

Background Technology

[0002] A language model is a deep learning model that transforms text sentences into semantic features. Through a language model, the semantic features of a text sentence can be obtained, enabling the execution of various downstream natural language processing tasks. Therefore, the successful execution of downstream natural language processing tasks is closely related to the quality of the semantic features of the text sentence. How to effectively optimize the training of language models to obtain high-quality semantic features is a problem that urgently needs to be solved.

Summary of the Invention

[0003] In view of this, embodiments of the present invention provide a method and apparatus for training a language model, which can effectively optimize the training of the language model and thereby obtain high-quality semantic features of text sentences.

[0004] In a first aspect, embodiments of the present invention provide a method for training a language model, comprising:

[0005] Get sample statements;

[0006] Generate positive example statements and a first negative example statement of the sample statement; wherein the positive example statement has a similar semantics to the sample statement, and the first negative example statement has an opposite semantics to the sample statement;

[0007] Using a language model, the sample features of the sample statement, the positive features of the positive example statement, and the first negative feature of the first negative example statement are obtained respectively.

[0008] The contrast loss is determined based on the sample features, the positive example features, and the first negative example features;

[0009] The language model is trained based on the contrastive loss.

[0010] Optionally, determining the contrast loss based on the sample features, the positive example features, and the first negative example features includes:

[0011] Determine the positive distance between the sample features and the positive feature;

[0012] Determine the first negative example distance between the sample feature and the first negative example feature;

[0013] The contrast loss is determined based on the positive example distance and the first negative example distance.

[0014] Optionally, generating positive example statements for the sample statements includes:

[0015] Positive example statements of the sample statement are generated using a data augmentation method. The data augmentation methods include synonym substitution and back-translation through another language.

[0016] Optionally, generating the positive example statement and the first negative example statement of the sample statement includes:

[0017] The sample statement is segmented to generate at least one sample word segment;

[0018] Determine whether the antonym library contains any of the aforementioned sample word segments;

[0019] In response to the presence of any of the sample word segments in the antonym library, the target antonym corresponding to the target word segment in the antonym library is obtained; the target word segment in the sample statement is replaced with the target antonym to generate the first negative example statement;

[0020] In response to the fact that the antonym library does not contain any of the sample word segments, the first negative example statement is generated by inserting a negative word.

[0021] Optionally, after obtaining the sample statement, the method further includes:

[0022] Generate a second negative example statement from the sample statement; wherein the second negative example statement is randomly obtained from the corpus;

[0023] The language model is used to obtain the second negative example features of the second negative example statement;

[0024] The step of determining the contrast loss based on the sample features, the positive example features, and the first negative example features includes:

[0025] The contrast loss is determined based on the sample features, the positive example features, the first negative example features, and the second negative example features.

[0026] Optionally, determining the contrast loss based on the sample features, the positive example features, the first negative example features, and the second negative example features includes:

[0027] Determine the positive distance between the sample features and the positive feature;

[0028] Determine the first negative example distance between the sample feature and the first negative example feature;

[0029] Determine the second negative example distance between the sample feature and the second negative example feature;

[0030] The contrast loss is determined based on the positive example distance, the first negative example distance, and the second negative example distance.

[0031] Optionally, after obtaining the sample statement, the method further includes:

[0032] Generate a second negative example statement from the sample statement; wherein the second negative example statement is randomly obtained from the corpus;

[0033] The language model is used to obtain the second negative example features of the second negative example statement;

[0034] Based on the positive example features and the second negative example features, the fusion features of the sample statement are generated;

[0035] The fusion loss is determined based on the fusion distance between the sample features and the fusion features;

[0036] The step of training the language model based on the contrastive loss includes:

[0037] The total loss is determined based on the contrastive loss and the fusion loss;

[0038] The language model is trained based on the total loss.

[0039] In a second aspect, embodiments of the present invention provide a language model training apparatus, comprising:

[0040] The sample acquisition module is used to acquire sample statements;

[0041] A positive and negative example generation module is used to generate positive example statements and a first negative example statement for the sample statement; wherein the positive example statement has a similar semantics to the sample statement, and the first negative example statement has the opposite semantics to the sample statement;

[0042] The feature acquisition module is used to acquire, using a language model, the sample features of the sample statement, the positive features of the positive example statement, and the first negative feature of the first negative example statement, respectively.

[0043] The loss determination module is used to determine the contrast loss based on the sample features, the positive example features, and the first negative example features;

[0044] The model training module is used to train the language model based on the contrastive loss.

[0045] In a third aspect, embodiments of the present invention provide an electronic device, including:

[0046] One or more processors;

[0047] A storage device for storing one or more programs.

[0048] When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any of the above embodiments.

[0049] In a fourth aspect, embodiments of the present invention provide a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the method described in any of the above embodiments.

[0050] One embodiment of the above invention has the following advantages or beneficial effects: It generates positive example statements and a first negative example statement for the sample statements, where the positive example statements are semantically similar to the sample statements, and the first negative example statement has the opposite semantics to the sample statements. A contrastive loss is determined based on the sample features of the sample statements, the positive features of the positive example statements, and the first negative feature of the first negative example statement. Based on the contrastive loss, the language model's parameters are iterated to minimize the distance between the sample features and the positive feature, and maximize the distance between the sample features and the first negative feature. This enables the language model to perform contrastive learning based on the semantic information of the sample statements, thereby effectively optimizing the training of the language model.

[0051] Furthermore, randomly selecting negative examples from a corpus cannot guarantee the semantic correlation between the selected negative examples and the sample sentences. The solution of this invention constructs a first negative example with semantics opposite to the sample sentence, thereby improving the semantic recognition performance of the final language model and increasing the efficiency of language model training.

[0052] Further effects of the above non-conventional optional implementations will be explained below in conjunction with specific embodiments.

Description of the Drawings

[0053] The accompanying drawings are provided to better understand the invention and are not intended to unduly limit the scope of the invention. Wherein:

[0054] Figure 1 is a schematic flowchart of a language model training method according to an embodiment of the present invention;

[0055] Figure 2 is a schematic flowchart of a language model training method provided in another embodiment of the present invention;

[0056] Figure 3 is a schematic flowchart of a language model training method provided in yet another embodiment of the present invention;

[0057] Figure 4 is a schematic diagram of the structure of a language model training apparatus provided in one embodiment of the present invention;

[0058] Figure 5 is a schematic diagram of the structure of a computer system suitable for implementing terminal devices or servers of the present invention.

Detailed Description

[0059] The following description, in conjunction with the accompanying drawings, illustrates exemplary embodiments of the present invention, including various details to aid understanding. These details should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0060] It should be noted that the acquisition, storage, use, and processing of data in the technical solutions of this invention comply with the relevant provisions of national laws and regulations.

[0061] Figure 1 is a schematic flowchart of a language model training method according to an embodiment of the present invention. As shown in Figure 1, the method includes:

[0062] Step 101: Obtain sample statements.

[0063] Step 102: Generate positive example statements and the first negative example statement of the sample statement; wherein, the positive example statement has a similar semantics to the sample statement, and the first negative example statement has the opposite semantics to the sample statement.

[0064] Positive example statements of the sample statement can be generated using data augmentation techniques, including synonym substitution and back-translation through another language.

[0065] The implementation process of synonym replacement is as follows: A synonym database is pre-set in the system, containing multiple groups of synonyms, where the words in each group are semantically similar. The sample sentence is segmented to generate at least one sample word. It is then determined whether the synonym database contains any of these sample words. If it does, the target synonym corresponding to the target word in the database is obtained. The target word in the sample sentence is then replaced with the target synonym to generate a positive example sentence.
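
The synonym replacement procedure described above can be sketched as follows. This is a minimal illustration only: the synonym groups, the whitespace tokenizer standing in for word segmentation, and the function names are all assumptions made for this sketch and do not come from the patent.

```python
# Toy synonym library: each group holds words treated as semantically similar.
# The groups below are illustrative assumptions, not data from the patent.
SYNONYM_GROUPS = [{"happy", "glad"}, {"big", "large"}]


def segment(sentence):
    """Split a sentence into word segments (whitespace split as a stand-in)."""
    return sentence.split()


def find_synonym(word, groups):
    """Return another word from the group containing `word`, or None."""
    for group in groups:
        if word in group:
            others = sorted(group - {word})
            if others:
                return others[0]
    return None


def make_positive(sentence, groups=SYNONYM_GROUPS):
    """Replace the first word segment found in the synonym library.

    Returns None when no word segment has a synonym, so the caller can
    fall back to another augmentation method.
    """
    words = segment(sentence)
    for i, word in enumerate(words):
        synonym = find_synonym(word, groups)
        if synonym is not None:
            return " ".join(words[:i] + [synonym] + words[i + 1:])
    return None


positive = make_positive("I am happy today")
```

For a language without whitespace word boundaries, such as Chinese, `segment` would need to be replaced by a real word segmenter.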

[0066] Back-translation through another language is implemented as follows: first, a pivot language is chosen, which must be different from the target language used in the sample statement. For example, if the target language of the sample statement is Chinese, the pivot language could be English, Japanese, etc. Translation software then converts the sample statement into a first translated statement in the pivot language, and converts the first translated statement into a second translated statement in the target language. The second translated statement is the positive example statement of the sample statement.
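
The round trip through a second language can be sketched as below. The patent only says "translation software", so the translator is passed in as a callable; `toy_translate` and its lookup table are hypothetical stand-ins so that the sketch runs without any external service.

```python
def back_translate(sentence, translate, target_lang, pivot_lang):
    """Translate target -> pivot -> target and return the round-trip result.

    `translate(text, src, dst)` is any callable supplied by the caller;
    no particular translation product or API is implied.
    """
    if pivot_lang == target_lang:
        raise ValueError("pivot language must differ from the target language")
    first_translated = translate(sentence, target_lang, pivot_lang)
    second_translated = translate(first_translated, pivot_lang, target_lang)
    return second_translated


# Hypothetical stand-in translator backed by a tiny lookup table.
TOY_TABLE = {
    ("en", "fr", "You decide"): "C'est toi qui décides",
    ("fr", "en", "C'est toi qui décides"): "It is up to you",
}


def toy_translate(text, src, dst):
    return TOY_TABLE[(src, dst, text)]


positive = back_translate("You decide", toy_translate, "en", "fr")
```

The round trip usually changes the wording while keeping the meaning, which is what makes the result usable as a positive example.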

[0067] The first negative example statement can be generated as follows: the sample statement is segmented to generate at least one sample word segment. It is then determined whether the antonym library contains any of the sample word segments. If it does, the target antonym corresponding to the matching target word segment is obtained from the antonym library, and the target word segment in the sample statement is replaced with the target antonym to generate the first negative example statement. If the antonym library contains none of the sample word segments, the first negative example statement is generated by inserting a negation word.

[0068] An antonym library is pre-configured in the system. This library contains multiple antonym pairs, each consisting of two words that are the opposite of each other, such as <virtual, reality>, <gentle, rough>, etc. If the target word from the sample segmentation exists in the antonym library, its corresponding antonym is used to replace the target word in the sample sentence. For example, if the sample sentence is "This young lady is so gentle," then "gentle" will be replaced with "rough," resulting in the first negative example sentence being "This young lady is so rough."
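
A minimal sketch of the antonym lookup and replacement, reusing the "gentle" / "rough" pair from the example above. The index structure, the whitespace tokenizer and the function names are assumptions made for illustration.

```python
# Antonym pairs as described in the text; looked up in both directions.
ANTONYM_PAIRS = [("gentle", "rough")]


def build_antonym_index(pairs):
    """Map each word of a pair to its opposite."""
    index = {}
    for left, right in pairs:
        index[left] = right
        index[right] = left
    return index


def antonym_negative(sentence, index):
    """Replace the first word segment that has an antonym.

    Returns None when no word segment is in the antonym library, which
    signals the caller to fall back to negation-word insertion.
    """
    words = sentence.split()  # whitespace split stands in for word segmentation
    for i, word in enumerate(words):
        if word in index:
            return " ".join(words[:i] + [index[word]] + words[i + 1:])
    return None


index = build_antonym_index(ANTONYM_PAIRS)
first_negative = antonym_negative("This young lady is so gentle", index)
```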

[0069] Alternatively, the first negative example statement can be generated by adding or removing a negation word. Specifically, first determine whether the sample statement contains a negation word, such as "not" or "didn't". If the sample statement contains a negation word, remove the negation word to generate the first negative example statement.

[0070] If the sample statement does not contain a negative word, determine whether the sample statement contains an affirmative word, such as "is," "equivalent to," "equal to," "must," etc. If the sample statement contains an affirmative word, add a negative word before the affirmative word to generate the first negative example statement.

[0071] If the sample statement does not contain an affirmative word, determine whether it contains a verb or adjective. If the sample statement contains a verb or adjective, add a negative word before the verb or adjective to generate the first negative example statement. For example, if the sample statement is "The weather is really nice today," and it contains the adjective "nice," the first negative example statement is "The weather is really not nice today." If the sample statement is "I'm going to climb a mountain," and it contains the verb "go," the first negative example statement is "I'm not going to climb a mountain."
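
The three-step rule cascade in paragraphs [0069] to [0071] can be sketched as follows. The word lists are small illustrative assumptions, and the verb and adjective set stands in for a part-of-speech tagger. The negation word is placed before the matched word, as the text describes; that order suits the source language, so some English outputs read unnaturally.

```python
# Illustrative word lists (assumptions); a real system would use fuller
# lexicons and a part-of-speech tagger instead of VERBS_AND_ADJECTIVES.
NEGATION_WORDS = {"not", "didn't"}
AFFIRMATIVE_WORDS = {"is", "must"}
VERBS_AND_ADJECTIVES = {"going", "climb", "nice"}
NEGATOR = "not"


def insert_before(words, position, token):
    """Return a new word list with `token` inserted before `position`."""
    return words[:position] + [token] + words[position:]


def negate(sentence):
    """Apply the cascade: remove negation, else negate an affirmative word,
    else negate a verb or adjective. Returns None if no rule applies."""
    words = sentence.split()
    # Rule 1: the statement already contains a negation word, so remove it.
    for i, word in enumerate(words):
        if word in NEGATION_WORDS:
            return " ".join(words[:i] + words[i + 1:])
    # Rule 2: add a negation word before an affirmative word.
    for i, word in enumerate(words):
        if word in AFFIRMATIVE_WORDS:
            return " ".join(insert_before(words, i, NEGATOR))
    # Rule 3: add a negation word before a verb or adjective.
    for i, word in enumerate(words):
        if word in VERBS_AND_ADJECTIVES:
            return " ".join(insert_before(words, i, NEGATOR))
    return None


removed = negate("I did not go")
inserted = negate("I'm going to climb a mountain")
```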

[0072] Step 103: Using the language model, obtain the sample features of the sample statement, the positive features of the positive example statement, and the first negative feature of the first negative example statement.

[0073] The language model may be a neural language model, a pre-trained language model, etc. Using the language model, the semantic features of text sentences can be obtained.

[0074] The process involves determining the positive distance between the sample feature and the positive example feature; determining the first negative distance between the sample feature and the first negative example feature; and determining the contrast loss based on the positive and first negative distances. The distance between two semantic features can be calculated using cosine distance, exp distance, Euclidean distance, etc. This embodiment of the invention uses cosine distance as the distance between two semantic features for illustration.

[0075] Step 104: Determine the contrast loss based on the sample features, positive example features, and the first negative example features.

[0076] The loss function for contrastive loss can be in the following form:

[0077]

[0078] Here, e_i is the sample feature of the i-th sample statement; the other two feature symbols denote the positive example feature of the positive example statement of the i-th sample statement and the first negative example feature of the first negative example statement of the i-th sample statement; τ is a constant. The sample set includes multiple sample statements. The contrastive loss is calculated for each sample statement, and the per-sample contrastive losses are then summed to obtain the loss of the sample set.
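
The formula itself is not reproduced in this text, so the sketch below should be read as one plausible instance rather than the patent's exact loss. It assumes a softmax-style (InfoNCE-like) contrastive loss over cosine similarities scaled by the constant τ, with one positive example and one first negative example per sample statement. The function names and the default `tau` value are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(sample, positive, first_negative, tau=0.05):
    """Assumed form: -log(exp(s_pos/tau) / (exp(s_pos/tau) + exp(s_neg/tau)))."""
    pos_term = math.exp(cosine(sample, positive) / tau)
    neg_term = math.exp(cosine(sample, first_negative) / tau)
    return -math.log(pos_term / (pos_term + neg_term))


def sample_set_loss(triples, tau=0.05):
    """Sum the per-sample losses over (sample, positive, first_negative) triples."""
    return sum(contrastive_loss(s, p, n, tau) for s, p, n in triples)


# The loss is small when the positive is close and the first negative is far,
# and large when the roles are reversed.
good = contrastive_loss([1.0, 0.0], [0.9, 0.1], [-1.0, 0.0])
bad = contrastive_loss([1.0, 0.0], [-1.0, 0.0], [0.9, 0.1])
```

Minimizing a loss of this shape pulls the sample feature toward the positive example feature and pushes it away from the first negative example feature, which matches the behaviour the text attributes to the contrastive loss.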

[0079] By introducing the first negative example feature with the opposite semantics, the language model can effectively learn semantic features. The loss function pulls the semantic vector pair formed by a sample statement and its positive example statement close together in the feature space, and pushes the semantic vector pair formed by a sample statement and its negative example statement far apart in the feature space. Therefore, by learning through comparison between positive and negative example pairs, the language model can learn both the semantic similarity and the textual similarity of semantic vector pairs. Contrastive representation learning based on semantically opposite negative examples can thus effectively learn the textual and semantic similarity of sentences, and a language model trained in this way yields high-quality sentence vectors that reflect their corresponding texts.

[0080] Step 105: Train the language model based on the contrastive loss.

[0081] In this embodiment of the invention, positive example statements and a first negative example statement are generated from the sample statements. The positive example statements are semantically similar to the sample statements, while the first negative example statements are semantically opposite to the sample statements. A contrastive loss is determined based on the sample features of the sample statements, the positive features of the positive example statements, and the first negative example features of the first negative example statements. Based on the contrastive loss, the language model is iterated to minimize the distance between the sample features and the positive features, and maximize the distance between the sample features and the first negative example features, thereby effectively optimizing the training of the language model.

[0082] Language models often fail to effectively distinguish between textual similarity and semantic similarity. Textual similarity refers to two sentences containing many identical words, but their meanings are not necessarily similar. Semantic similarity refers to two sentences expressing the same meaning. For example, "I am happy today" and "I am unhappy today" are typical examples of textually similar but semantically dissimilar sentences. Conversely, "You decide" and "I'll listen to you" are typical examples of semantically similar but textually dissimilar sentences.
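
The gap between textual and semantic similarity can be made concrete with a simple word-overlap score. The sketch below uses Jaccard overlap of lower-cased words as a stand-in for "textual similarity"; that choice of measure is an assumption for illustration and is not named in the patent.

```python
def word_overlap(sentence_a, sentence_b):
    """Jaccard overlap of the lower-cased word sets of two sentences."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


# Textually similar, semantically opposite: 3 shared words out of 5 distinct.
opposite_pair = word_overlap("I am happy today", "I am unhappy today")

# Semantically similar, textually dissimilar: 1 shared word out of 5 distinct.
paraphrase_pair = word_overlap("You decide", "I'll listen to you")
```

Here the opposite-meaning pair scores 0.6 while the paraphrase pair scores only 0.2, so a model that leans on word overlap would rank the wrong pair as more similar.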

[0083] Normally, language models assume that the more words two sentences overlap, the more semantically similar they are. However, if negative examples are randomly selected, it's difficult to guarantee that they are semantically inconsistent with the sample sentences. This creates a false impression in contrastive learning that textual similarity equates to semantic similarity, thus hindering language model optimization. The solution in this invention constructs a first negative example sentence with semantically opposite meaning to the sample sentence, resulting in better semantic recognition performance of the final language model and improved training efficiency.

[0084] Figure 2 is a schematic flowchart of another language model training method provided in an embodiment of the present invention. As shown in Figure 2, the method includes:

[0085] Step 201: Obtain sample statements.

[0086] Step 202: Generate positive example statements and the first negative example statement of the sample statement; wherein, the positive example statement has a similar semantics to the sample statement, and the first negative example statement has the opposite semantics to the sample statement.

[0087] Step 203: Generate a second negative example statement of the sample statement; wherein the second negative example statement is randomly obtained from the corpus.

[0088] The second negative example statement is different from both the positive example statement and the first negative example statement. At least one second negative example statement can be randomly selected from the corpus.
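
Drawing second negative example statements at random from a corpus, while excluding the positive and first negative example statements, can be sketched as below. Also excluding the sample statement itself is an extra assumption of this sketch, and the function and variable names are illustrative.

```python
import random


def sample_second_negatives(corpus, exclude, count, rng):
    """Pick `count` distinct statements from `corpus` that are not in `exclude`."""
    candidates = [statement for statement in corpus if statement not in exclude]
    if count > len(candidates):
        raise ValueError("not enough candidate statements in the corpus")
    return rng.sample(candidates, count)


corpus = [
    "I am happy today",
    "I am glad today",
    "I am unhappy today",
    "The train leaves at noon",
    "Please close the window",
    "She bought three apples",
]
exclude = {"I am happy today", "I am glad today", "I am unhappy today"}
rng = random.Random(0)  # seeded only to make the sketch repeatable
second_negatives = sample_second_negatives(corpus, exclude, 2, rng)
```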

[0089] Step 204: Using the language model, obtain the sample features of the sample statement, the positive features of the positive example statement, the first negative feature of the first negative example statement, and the second negative feature of the second negative example statement.

[0090] Determine the positive distance between the sample feature and the positive example feature; determine the first negative distance between the sample feature and the first negative example feature; determine the second negative distance between the sample feature and the second negative example feature; determine the contrast loss based on the positive distance, the first negative distance, and the second negative distance. The distance between two semantic features can be calculated using cosine distance, exp distance, Euclidean distance, etc.

[0091] Step 205: Determine the contrast loss based on the sample features, positive example features, first negative example features, and second negative example features.

[0092] The loss function for contrastive loss can be in the following form:

[0093]

[0094] Here, e_i is the sample feature of the i-th sample statement; the other feature symbols denote the positive example feature of the positive example statement of the i-th sample statement, the first negative example feature of the first negative example statement of the i-th sample statement, and the second negative example feature of the j-th second negative example statement of the i-th sample statement. N second negative example statements are constructed, where N is a positive integer. τ is a constant. The sample set includes multiple sample statements. The contrastive loss is calculated for each sample statement, and the per-sample contrastive losses are then summed to obtain the loss of the sample set.
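
As the formula is not reproduced here, the following is a hedged sketch of one form consistent with the definitions above: a softmax-style contrastive loss whose denominator holds the positive term, the first negative term, and one term for each of the N second negative examples. The exact form, the function names and the default `tau` are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss_with_random_negatives(
    sample, positive, first_negative, second_negatives, tau=0.5
):
    """Assumed form: -log(pos / (pos + first_neg + sum_j second_neg_j))."""
    pos_term = math.exp(cosine(sample, positive) / tau)
    denominator = pos_term + math.exp(cosine(sample, first_negative) / tau)
    for negative in second_negatives:  # j = 1..N
        denominator += math.exp(cosine(sample, negative) / tau)
    return -math.log(pos_term / denominator)


sample, positive, first_negative = [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]
without_random = contrastive_loss_with_random_negatives(
    sample, positive, first_negative, []
)
with_random = contrastive_loss_with_random_negatives(
    sample, positive, first_negative, [[0.0, 1.0]]
)
```

Each extra negative adds a positive term to the denominator, so the loss can only stay equal or grow; the model lowers it by moving the sample feature away from every negative at once.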

[0095] Step 206: Train the language model based on the contrastive loss.

[0096] The contrastive loss function pulls the semantic vector pair formed by a sample statement and its positive example statement close together in the feature space, and pushes the semantic vector pair formed by a sample statement and its negative example statement far apart in the feature space. Therefore, by learning through contrastive comparison of positive and negative example pairs, the language model can learn both the semantic similarity and the textual similarity of semantic vector pairs. Contrastive representation learning based on semantically opposite negative examples can thus effectively learn the textual and semantic similarity of sentences, thereby training a high-quality language model.

[0097] The scheme in this embodiment of the invention differs from a contrastive learning strategy that merely selects one negative example statement at random. It constructs both a first negative example statement and at least one second negative example statement. The first negative example statement has the opposite semantics to the sample statement, while the second negative example statement is randomly selected. Using the semantically opposite first negative example feature together with the second negative example features, the language model can learn to recognize statements that are textually similar but semantically different. This effectively solves the problem that contrastive learning methods relying solely on randomly selected negative examples cannot distinguish textual similarity from semantic similarity.

[0098] Figure 3 is a schematic flowchart of another language model training method provided in an embodiment of the present invention. As shown in Figure 3, the method includes:

[0099] Step 301: Obtain sample statements.

[0100] Step 302: Generate positive example statements and the first negative example statement of the sample statement; wherein, the positive example statement has a similar semantics to the sample statement, and the first negative example statement has the opposite semantics to the sample statement.

[0101] Step 303: Generate a second negative example statement of the sample statement; wherein the second negative example statement is randomly obtained from the corpus.

[0102] The second negative example statement is different from the positive example statement. A second negative example statement can be randomly selected from the corpus.

[0103] Step 304: Using the language model, obtain the sample features of the sample statement, the positive features of the positive example statement, the first negative feature of the first negative example statement, and the second negative feature of the second negative example statement.

[0104] Step 305: Determine the contrast loss based on the sample features, positive example features, first negative example features, and second negative example features.

[0105] Step 306: Generate fused features of the sample statements based on the positive example features and the second negative example features.

[0106] Feature fusion is achieved by weighted fusion of the positive example feature and the second negative example feature: the fusion feature of the i-th sample statement is a weighted combination of the positive example feature of the i-th sample statement and the second negative example feature of the i-th sample statement, with weights a and b. In actual business scenarios, different weight constraints can be selected according to the scenario. For example, in some scenarios, a+b=1 can be set. In other scenarios, a and b may be set as learnable parameters, allowing the model to learn the specific values of a and b itself.
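
A minimal sketch of the weighted fusion, assuming a plain element-wise weighted sum in which `a` weights the positive example feature and `b` weights the second negative example feature. Which weight attaches to which feature is an assumption of this sketch, as are the function name and the optional constraint check.

```python
def fuse(positive_feature, second_negative_feature, a, b, require_unit_sum=False):
    """Element-wise weighted sum: a * positive + b * second_negative."""
    if len(positive_feature) != len(second_negative_feature):
        raise ValueError("features must have the same dimension")
    if require_unit_sum and abs(a + b - 1.0) > 1e-9:
        raise ValueError("this scenario requires a + b = 1")
    return [
        a * p + b * n
        for p, n in zip(positive_feature, second_negative_feature)
    ]


# With a + b = 1 the fusion feature lies on the segment between the two features.
fusion_feature = fuse([1.0, 0.0], [0.0, 1.0], a=0.7, b=0.3, require_unit_sum=True)
```

In the learnable variant mentioned in the text, `a` and `b` would be trainable parameters updated together with the language model rather than fixed arguments.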

[0107] Step 307: Determine the fusion loss based on the fusion distance between the sample features and the fusion features.

[0108] By introducing semantic fusion negative example features, the language model can learn deeper semantic features. Because the fused features contain semantic information from positive example statements, there is a semantic relationship between the sample features and the fused features, resulting in an abstract semantic region between them. However, because the fused features contain semantic information from second negative example statements, there is a semantic difference between the sample features and the fused features. Therefore, the semantic regions of the sample features and the fused features are not at a single point, but rather within a range. Introducing a semantic range fusion loss function confines the semantic information between the sample features and the fused features to a specific semantic range.

[0109] The parameters of the language model are continuously updated using the gradient backpropagation algorithm in deep learning, thereby continuously learning the semantic range, that is, learning the differences and relationships between sample features and fused features. Therefore, representation learning based on semantic fusion negative example features can further and effectively learn the semantic information between semantic vector pairs, enabling the language model to effectively distinguish different semantic vectors. The language model trained in this way can also obtain high-quality sentence vectors that fully reflect the corresponding text. The formula for calculating the fusion loss is as follows:

[0110] $Loss_C = \mathrm{ReLU}(\Delta - \alpha) + \mathrm{ReLU}(-\Delta - \beta)$

[0111] where Δ is the difference between the cosine similarities $\mathrm{cosine}(h_i, h_i^{f})$ and $\mathrm{cosine}(h_i, h_i^{n2})$, cosine() is the cosine similarity function, and α and β represent two semantic boundary similarities, which determine the similarity constraint on Δ. $h_i$ is the sample feature of the i-th sample statement, $h_i^{f}$ is the fusion feature of the i-th sample statement, and $h_i^{n2}$ is the second negative example feature of the i-th sample statement.

[0112] From the above formula, it can be seen that the purpose of Loss C is to restrict the difference in semantic similarity between the semantic pair formed by the sample feature and the fusion feature and the semantic pair formed by the sample feature and the second negative example feature to a certain semantic interval, namely Δ∈[-β,α], so that the language model can learn the differences and relationships between sample features and fused features. In this way, the language model achieves semantic boundary learning based on the fusion loss.
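
As a non-limiting illustration, a fusion loss that vanishes exactly when Δ lies in the interval [-β, α] can be sketched as follows. The sign convention for Δ (similarity to the fusion feature minus similarity to the second negative example feature), the particular hinge form, and all function names and toy vectors are assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def relu(x):
    return max(0.0, x)


def fusion_loss(sample, fused, second_negative, alpha, beta):
    """Hinge-style loss that is zero if and only if -beta <= delta <= alpha.

    Assumed convention: delta = cosine(sample, fused) - cosine(sample, second_negative).
    """
    delta = cosine(sample, fused) - cosine(sample, second_negative)
    return relu(delta - alpha) + relu(-delta - beta)


# Hypothetical 2-dimensional features, for illustration only.
sample = [1.0, 0.0]
fused = [1.0, 1.0]
second_negative = [0.0, 1.0]

inside = fusion_loss(sample, fused, second_negative, alpha=0.8, beta=0.1)
outside = fusion_loss(sample, fused, second_negative, alpha=0.5, beta=0.1)
print(inside, outside)
```

In this toy case Δ is about 0.707, so the loss is zero when α = 0.8 and positive when α = 0.5, which penalizes only similarity differences that fall outside the chosen interval.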

[0113] Step 308: Determine the total loss based on the contrastive loss and the fusion loss; train the language model based on the total loss.

[0114] The total loss can be the sum of the contrastive loss and the fusion loss, or it can be a weighted sum of the contrastive loss and the fusion loss. Based on the total loss, the language model's parameters are iterated to minimize the distance between sample features and positive example features, maximize the distance between sample features and the first negative example feature, and constrain the distance between sample features and the fused features within a certain semantic space. This allows the language model to perform contrastive learning and boundary learning based on the semantic information of sample sentences, thereby effectively optimizing the language model training.
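
As a non-limiting illustration, the two ways of combining the losses can be sketched as follows. The function name `total_loss`, the weight names, and the numeric values are assumptions made for this sketch.

```python
def total_loss(contrastive_loss, fusion_loss, w_contrastive=1.0, w_fusion=1.0):
    """Combine the two losses; the default weights of 1.0 give a plain sum."""
    return w_contrastive * contrastive_loss + w_fusion * fusion_loss


# Hypothetical loss values, for illustration only.
plain_sum = total_loss(0.4, 0.2)
weighted_sum = total_loss(0.4, 0.2, w_contrastive=1.0, w_fusion=0.5)
print(plain_sum, weighted_sum)
```

A smaller fusion weight lets the contrastive objective dominate the parameter updates, while a larger one emphasizes boundary learning.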

[0115] The solution in this embodiment of the invention constructs semantic fusion features to form corresponding semantic fusion negative example feature pairs. The semantic fusion features are obtained by weighted fusion of positive example features and randomly sampled second negative example features. By learning from the semantic fusion negative example feature pairs, the language model can learn both the differences between semantic fusion negative example feature pairs and the relationships between them, thereby further improving the representational ability of the language model.

[0116] The solution presented in this invention can handle various natural language processing tasks, exhibiting high methodological versatility and superiority. It is applicable to most natural language processing tasks in dialogue systems. This solution is significant for improving the understanding of user questions and enhancing user engagement, and it has high engineering application value. The solution provided in this invention has been effectively validated, and its performance on various tasks is significantly superior to other conventional sentence representation methods.

[0117] Taking the engineering application of a dialogue system as an example, the original system uses the BERT model, while the improved system uses an optimized BERT model. The optimized BERT model is trained using the scheme of this invention. First, users typically input a query in the dialogue system, such as "I want to buy an Apple phone". Then, the language model maps the user's question to semantic features, which are then input into different downstream tasks to obtain the return results of different tasks.

[0118] For entity recognition tasks, after obtaining the semantic features corresponding to the user's statement using a language model, these semantic features are input into the entity recognition model to obtain the corresponding entity words. The original system recognized two entity words: "apple" and "phone". The improved system recognizes only one entity word: "apple phone". It can be seen that the original system only recognized "apple" and "phone" as different entity words based on the meaning of the text. The improved system can learn not only text information but also semantic information to accurately recognize entities. In engineering, this can effectively avoid misidentification and fuzzy recognition, thus bringing better performance to the product.

[0119] For recommendation system tasks, after obtaining the semantic features output by the language model, these features are input into the recommendation system model. Combined with the results of intent recognition and entity recognition, the system ultimately obtains the specific product links recommended. The original system recommended two product links: "Red Fuji" and "XX Mobile Phone." Due to misidentification and fuzzy recognition, the original system identified "Apple" and "Mobile Phone" as entities respectively. At this point, the original system assumed the user wanted to purchase both "Apple" and "Mobile Phone" entities, thus recommending the links for "Red Fuji" and "XX Mobile Phone." The improved system recommends product links related to "Apple Mobile Phone." Therefore, the language model trained through this embodiment of the invention has better semantic recognition performance. In engineering applications, the optimized language model can accurately identify user intent and recommend corresponding product links to the user.

[0120] The above examples demonstrate that the quality of the language model directly impacts the input of downstream tasks in a dialogue system, thus affecting their execution performance. Therefore, a good language model plays a crucial role in improving the specific effectiveness of downstream tasks and enhancing the user experience. Furthermore, the solutions in this invention are not limited to dialogue systems but are also applicable to most natural language understanding tasks.

[0121] Figure 4 is a schematic diagram of the structure of a language model training device provided in one embodiment of the present invention. As shown in Figure 4, the device includes:

[0122] Sample acquisition module 401 is used to acquire sample statements;

[0123] The positive and negative example generation module 402 is used to generate positive example statements and a first negative example statement of the sample statement; wherein, the positive example statement has a similar semantics to the sample statement, and the first negative example statement has the opposite semantics to the sample statement;

[0124] The feature acquisition module 403 is used to acquire the sample features of the sample sentence, the positive features of the positive example sentence, and the first negative example feature of the first negative example sentence using the language model.

[0125] The loss determination module 404 is used to determine the contrastive loss based on the sample features, positive example features, and the first negative example features;

[0126] Model training module 405 is used to train the language model based on contrastive loss.
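
As a non-limiting illustration, the cooperation of modules 401 to 405 can be sketched as a composition of callables. The class name `TrainingDevice`, the method `run_once`, and the stub implementations passed in below are assumptions made for this sketch; in particular, the stub feature extractor and stub loss are placeholders and not a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class TrainingDevice:
    acquire_samples: Callable[[], Sequence[str]]                  # module 401
    generate_examples: Callable[[str], Tuple[str, str]]           # module 402
    extract_features: Callable[[str], Vector]                     # module 403
    contrastive_loss: Callable[[Vector, Vector, Vector], float]   # module 404
    train_step: Callable[[float], None]                           # module 405

    def run_once(self) -> List[float]:
        """Run one pass over the samples and return the per-sample losses."""
        losses = []
        for sample in self.acquire_samples():
            positive, first_negative = self.generate_examples(sample)
            h, h_pos, h_neg = (
                self.extract_features(s) for s in (sample, positive, first_negative)
            )
            loss = self.contrastive_loss(h, h_pos, h_neg)
            self.train_step(loss)
            losses.append(loss)
        return losses


# Stub modules, for illustration only.
recorded = []
device = TrainingDevice(
    acquire_samples=lambda: ["the phone is good"],
    generate_examples=lambda s: (s.replace("good", "great"), s.replace("good", "bad")),
    extract_features=lambda s: [float(len(s))],
    contrastive_loss=lambda h, p, n: max(0.0, abs(h[0] - p[0]) - abs(h[0] - n[0]) + 1.0),
    train_step=recorded.append,
)
losses = device.run_once()
print(losses)
```

Keeping each module behind a plain callable interface makes it straightforward to swap in a different generation strategy or loss without changing the surrounding flow.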

[0127] Optionally, the loss determination module 404 is specifically used for:

[0128] Determine the positive distance between sample features and positive example features;

[0129] Determine the first negative example distance between the sample feature and the first negative example feature;

[0130] The contrastive loss is determined based on the positive example distance and the first negative example distance.

[0131] Optionally, the positive and negative example generation module 402 is specifically used for:

[0132] Using data augmentation techniques, positive example statements of the sample statement are generated. The data augmentation techniques include synonym substitution or translation through another language.
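
As a non-limiting illustration, synonym substitution can be sketched as follows. The synonym table `SYNONYMS`, the function name `synonym_substitute`, and the whitespace-based word segmentation are assumptions made for this sketch; a practical system would use a proper word segmenter and a much larger synonym resource.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"buy": "purchase", "phone": "handset"}


def synonym_substitute(statement, synonyms=SYNONYMS):
    """Replace every word that has an entry in the synonym table."""
    words = statement.split()  # simple whitespace segmentation (assumption)
    return " ".join(synonyms.get(w.lower(), w) for w in words)


positive_example = synonym_substitute("I want to buy an Apple phone")
print(positive_example)  # I want to purchase an Apple handset
```

The generated statement keeps the meaning of the sample statement while changing its surface form, which is what makes it usable as a positive example.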

[0133] Optionally, the positive and negative example generation module 402 is specifically used for:

[0134] Perform word segmentation on the sample statement to generate at least one sample word;

[0135] Determine whether the antonym library contains any of the sample word segments;

[0136] In response to the presence of any sample word in the antonym library, obtain the target antonym corresponding to the target word in the antonym library; replace the target word in the sample sentence with the target antonym to generate the first negative example sentence;

[0137] In response to the fact that the antonym library does not contain any sample word segmentation, the first negative example statement is generated by inserting a negative word.
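
As a non-limiting illustration, the generation of the first negative example statement can be sketched as follows. The antonym table `ANTONYMS`, the negative word "do not", its insertion position, the replacement of only the first matching word, and the whitespace-based word segmentation are all assumptions made for this sketch.

```python
# Hypothetical antonym library, for illustration only.
ANTONYMS = {"like": "dislike", "good": "bad", "open": "close"}


def generate_first_negative(statement, antonyms=ANTONYMS,
                            negative_word="do not", insert_at=1):
    """Build a semantically opposite statement.

    If any segmented word has an antonym in the library, replace the first
    such word; otherwise insert a negative word at position `insert_at`.
    """
    words = statement.split()  # simple whitespace segmentation (assumption)
    for i, word in enumerate(words):
        antonym = antonyms.get(word.lower())
        if antonym is not None:
            return " ".join(words[:i] + [antonym] + words[i + 1:])
    # No word is covered by the antonym library: fall back to negation.
    return " ".join(words[:insert_at] + [negative_word] + words[insert_at:])


by_antonym = generate_first_negative("I like this phone")
by_negation = generate_first_negative("I want to buy an Apple phone")
print(by_antonym)   # I dislike this phone
print(by_negation)  # I do not want to buy an Apple phone
```

A practical system would choose the insertion position of the negative word from the sentence structure rather than from a fixed index.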

[0138] Optionally, the positive and negative example generation module 402 is also used for:

[0139] Generate a second negative example statement from the sample statement; wherein the second negative example statement is randomly obtained from the corpus;

[0140] Use a language model to obtain the second negative example features of the second negative example statement;

[0141] The loss determination module 404 is specifically used for:

[0142] The contrastive loss is determined based on the sample features, positive example features, first negative example features, and second negative example features.

[0143] Optionally, the loss determination module 404 is specifically used for:

[0144] Determine the positive distance between sample features and positive example features;

[0145] Determine the first negative example distance between the sample feature and the first negative example feature;

[0146] Determine the second negative example distance between the sample feature and the second negative example feature;

[0147] The contrastive loss is determined based on the positive example distance, the first negative example distance, and the second negative example distance.
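
As a non-limiting illustration, a contrastive loss built from the three distances can be sketched as follows. This section does not give the exact formula, so the margin-based hinge form used here, the use of cosine distance, the margin value, and all names and toy vectors are assumptions made for this sketch.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; smaller means semantically closer."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def contrastive_loss(sample, positive, first_negative, second_negative, margin=0.2):
    """Margin-based stand-in: pull the positive closer than both negatives."""
    d_pos = cosine_distance(sample, positive)
    d_neg1 = cosine_distance(sample, first_negative)
    d_neg2 = cosine_distance(sample, second_negative)
    return max(0.0, d_pos - d_neg1 + margin) + max(0.0, d_pos - d_neg2 + margin)


# Hypothetical 2-dimensional features, for illustration only.
sample = [1.0, 0.0]
well_separated = contrastive_loss(sample, [1.0, 0.1], [-1.0, 0.0], [0.0, 1.0])
poorly_separated = contrastive_loss(sample, [0.0, 1.0], [1.0, 0.1], [1.0, 0.1])
print(well_separated, poorly_separated)
```

The loss is zero when the positive example is already closer to the sample than both negative examples by at least the margin, and grows as that ordering is violated.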

[0148] Optionally, the positive and negative example generation module 402 is also used for:

[0149] Generate a second negative example statement from the sample statement; wherein the second negative example statement is randomly obtained from the corpus;

[0150] Use a language model to obtain the second negative example features of the second negative example statement;

[0151] The loss determination module 404 is also used for:

[0152] Based on the positive example features and the second negative example features, generate the fused features of the sample sentences;

[0153] The fusion loss is determined based on the fusion distance between sample features and fusion features;

[0154] Model training module 405 is specifically used for:

[0155] The total loss is determined based on the contrastive loss and the fusion loss;

[0156] The language model is trained based on the total loss.

[0157] This invention provides an electronic device, comprising:

[0158] One or more processors;

[0159] A storage device for storing one or more programs;

[0160] When the one or more programs are executed by the one or more processors, the one or more processors implement the methods of any of the above embodiments.

[0161] Reference is now made to Figure 5, which shows a schematic diagram of the structure of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in Figure 5 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of the present invention.

[0162] As shown in Figure 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 502 or programs loaded from storage section 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0163] The following components are connected to the I/O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 510 as needed so that computer programs read from it can be installed into the storage section 508 as needed.

[0164] In particular, according to the embodiments disclosed in this invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable medium 511. When the computer program is executed by central processing unit (CPU) 501, it performs the functions defined above in the system of this invention.

[0165] It should be noted that the computer-readable medium shown in this invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0166] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0167] The modules described in the embodiments of the present invention can be implemented in software or hardware. The described modules can also be provided in a processor; for example, a processor can be described as comprising a sample acquisition module, a positive and negative example generation module, a feature acquisition module, a loss determination module, and a model training module. The names of these modules do not necessarily limit the modules themselves; for example, the sample acquisition module can also be described as a "module for acquiring sample statements."

[0168] In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following:

[0169] Get sample statements;

[0170] Generate positive example statements and a first negative example statement of the sample statement; wherein the positive example statement has a similar semantics to the sample statement, and the first negative example statement has an opposite semantics to the sample statement;

[0171] Using a language model, the sample features of the sample statement, the positive features of the positive example statement, and the first negative feature of the first negative example statement are obtained respectively.

[0172] The contrast loss is determined based on the sample features, the positive example features, and the first negative example features;

[0173] The language model is trained based on the contrastive loss.

[0174] According to the technical solution of the present invention, positive example statements and first negative example statements of sample statements are generated. The positive example statements are semantically similar to the sample statements, and the first negative example statements are semantically opposite to the sample statements. Based on the sample features of the sample statements, the positive features of the positive example statements, and the first negative example features of the first negative example statements, a contrastive loss is determined. Based on the contrastive loss, the language model is iterated to minimize the distance between the sample features and the positive features, and maximize the distance between the sample features and the first negative example features, thereby effectively optimizing the training of the language model.

[0175] Furthermore, randomly selecting negative examples from a corpus cannot guarantee the semantic correlation between the selected negative examples and the sample sentences. The solution of this invention constructs a first negative example with semantics opposite to the sample sentence, thereby improving the semantic recognition performance of the final language model and increasing the efficiency of language model training.

[0176] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A method for training a language model, characterized in that it comprises: acquiring sample statements; generating positive example statements and a first negative example statement of the sample statement, wherein the positive example statement is semantically similar to the sample statement and the first negative example statement is semantically opposite to the sample statement; using a language model to obtain, respectively, the sample features of the sample statement, the positive example features of the positive example statement, and the first negative example features of the first negative example statement; determining a contrastive loss based on the sample features, the positive example features, and the first negative example features; and training the language model based on the contrastive loss; wherein, after acquiring the sample statements, the method further comprises: generating a second negative example statement of the sample statement, wherein the second negative example statement is randomly obtained from the corpus; and using the language model to obtain the second negative example features of the second negative example statement; and wherein determining the contrastive loss based on the sample features, the positive example features, and the first negative example features comprises: determining the contrastive loss based on the sample features, the positive example features, the first negative example features, and the second negative example features.

2. The method according to claim 1, characterized in that determining the contrastive loss based on the sample features, the positive example features, and the first negative example features comprises: determining the positive example distance between the sample features and the positive example features; determining the first negative example distance between the sample features and the first negative example features; and determining the contrastive loss based on the positive example distance and the first negative example distance.

3. The method according to claim 1, characterized in that generating the positive example statements of the sample statement comprises: generating the positive example statements of the sample statement using a data augmentation method, wherein the data augmentation method includes synonym substitution or translation through another language.

4. The method according to claim 1, characterized in that generating the positive example statements and the first negative example statement of the sample statement comprises: performing word segmentation on the sample statement to generate at least one sample word segment; determining whether the antonym library contains any of the sample word segments; in response to the antonym library containing any of the sample word segments, obtaining the target antonym corresponding to the target word segment in the antonym library, and replacing the target word segment in the sample statement with the target antonym to generate the first negative example statement; and in response to the antonym library not containing any of the sample word segments, generating the first negative example statement by inserting a negative word.

5. The method according to claim 1, characterized in that determining the contrastive loss based on the sample features, the positive example features, the first negative example features, and the second negative example features comprises: determining the positive example distance between the sample features and the positive example features; determining the first negative example distance between the sample features and the first negative example features; determining the second negative example distance between the sample features and the second negative example features; and determining the contrastive loss based on the positive example distance, the first negative example distance, and the second negative example distance.

6. The method according to claim 1, characterized in that, after acquiring the sample statements, the method further comprises: generating fusion features of the sample statement based on the positive example features and the second negative example features; and determining a fusion loss based on the fusion distance between the sample features and the fusion features; and wherein training the language model based on the contrastive loss comprises: determining a total loss based on the contrastive loss and the fusion loss; and training the language model based on the total loss.

7. A training device for a language model, characterized in that it comprises: a sample acquisition module, used to acquire sample statements; a positive and negative example generation module, used to generate positive example statements and a first negative example statement of the sample statement, wherein the positive example statement is semantically similar to the sample statement and the first negative example statement is semantically opposite to the sample statement; a feature acquisition module, used to obtain, using a language model, the sample features of the sample statement, the positive example features of the positive example statement, and the first negative example features of the first negative example statement, respectively; a loss determination module, used to determine a contrastive loss based on the sample features, the positive example features, and the first negative example features; and a model training module, used to train the language model based on the contrastive loss; wherein the positive and negative example generation module is further configured to: generate a second negative example statement of the sample statement, wherein the second negative example statement is randomly obtained from the corpus; and use the language model to obtain the second negative example features of the second negative example statement; and wherein the model training module is specifically used to: determine the contrastive loss based on the sample features, the positive example features, the first negative example features, and the second negative example features.

8. An electronic device, characterized in that it comprises: one or more processors; and a storage device for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.

9. A computer-readable medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-6 is implemented.