Pre-training method and device of language model, electronic equipment and storage medium

CN115965051B · Active · Publication Date: 2026-06-16 · BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD
Filing Date: 2021-10-18
Publication Date: 2026-06-16

Application Information

Patent Timeline

18 Oct 2021: Application
16 Jun 2026: Publication (CN115965051B)

IPC: G06N3/0455; G06N3/08

AI Tagging

Application Domain

Neural learning methods

Abstract

The application provides a pre-training method and device for a language model, an electronic device, and a storage medium. The method comprises: obtaining a training corpus and entity text annotated for the following text of the training corpus; inputting the training corpus into an encoder of the language model to obtain a hidden state encoding of the training corpus; determining position encodings according to the order of each character position to be decoded in the following text; combining the hidden state encoding and the position encodings to obtain the predicted character at each character position decoded by a decoder of the language model; and comparing the predicted characters with the entity text to pre-train the encoder and the decoder of the language model. Thus, when an entity is decoded during pre-training, the already decoded characters of the entity text are no longer used as input for the undecoded characters of that entity, so that the pre-trained model can learn the entity text completely, the dependence on the decoder-side language model is reduced, and the training effect of the language model is improved.

Description

Technical Field

[0001] This application relates to the field of natural language processing, and in particular, to a pre-training method, apparatus, electronic device, and storage medium for a language model.

Background Art

[0002] Pre-trained language models are an important branch in the field of computer technology. After learning the general mapping relationship between contexts through pre-training, they can perform the required tasks after a short adaptation training for the required tasks, such as: abstract generation tasks, question and answer tasks, keyword extraction tasks, and so on.

[0003] In the related art, the decoder of a conventional pre-trained language model usually generates text fragments character by character. For example, when generating "Beijing", the already decoded "Bei" is input into the decoder when "Jing" is generated. However, this lets the decoder generate text by relying mainly on the language model at the decoding end, which reduces the difficulty of decoding entities, so the pre-trained language model cannot completely learn all entity texts.

Summary of the Invention

[0004] This application proposes a pre-training method, apparatus, electronic device, and storage medium for a language model.

[0005] In a first aspect embodiment of this application, a pre-training method for a language model is proposed, including the following steps: obtaining training corpus and entity texts annotated for the following context of the training corpus; inputting the training corpus into an encoder of the language model to obtain hidden state encodings of the training corpus; determining position encodings according to the sorting of each character position to be decoded in the following context; inputting the hidden state encodings and the position encodings into a decoder of the language model for decoding to obtain predicted characters at each character position; and pre-training the encoder and decoder of the language model according to the differences between the predicted characters and the corresponding characters in the entity texts.

[0006] In an embodiment of this application, the obtaining of the training corpus includes: reading the corpus from a corpus collection; and masking some characters in the read corpus to obtain the training corpus.

[0007] In an embodiment of this application, the masking of some characters in the read corpus to obtain the training corpus includes: selecting multiple characters from the corpus according to a set ratio; and replacing the multiple characters in the corpus with target characters to obtain the training corpus.

[0008] In one embodiment of this application, determining the position encoding based on the order of the positions of the characters to be decoded in the following text includes: determining the number of characters contained in the entity text; normalizing the order of the positions of the characters in the following text based on the number of characters; and determining the position encoding of each character position based on the normalized order of the positions of the characters.

[0009] A second aspect of this application proposes a language model adaptation training method, the method comprising: acquiring a language model pre-trained as described in the first aspect; acquiring the preceding text and standard following text corresponding to the task to be adapted by the language model; inputting the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text; decoding the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text; and adjusting the parameters of the language model based on the difference between the standard following text and the predicted following text.

[0010] In one embodiment of this application, decoding the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text includes: inputting a set start character and the hidden state encoding of the preceding text into the decoder of the language model for decoding to obtain the first predicted character in the predicted following text; repeatedly executing the step of inputting the predicted character obtained from the previous decoding by the decoder and the hidden state encoding of the preceding text into the decoder for decoding to obtain subsequent predicted characters in the predicted following text, until the decoded predicted character is a set end character; and generating the predicted following text according to each of the predicted characters obtained in sequence.

[0011] This application proposes a pre-training method for a language model. Based on acquired training corpora and entity text annotated with the following text, the training corpora are input into the encoder of the language model to obtain the hidden state encoding of the training corpora. Simultaneously, positional encoding is determined according to the order of the positions of each character to be decoded in the following text. Combining the hidden state encoding and the positional encoding, the predicted characters at each character position for decoding by the decoder of the language model are obtained and compared with the entity text to pre-train the encoder and decoder of the language model. Thus, during the pre-training process of the encoder and decoder of the language model, when decoding entities, the characters in the decoded entity text are no longer used as input for the undecoded characters of that entity, thereby enabling the pre-trained model to learn the entity text completely, reducing the dependence on the decoder language model, and improving the training effect of the language model.

[0012] A third aspect of this application provides a pre-training apparatus for a language model, comprising: a first acquisition module for acquiring training corpus and entity text annotated with the following text of the training corpus; a first input module for inputting the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus; a determination module for determining the position encoding based on the order of the positions of the characters to be decoded in the following text; a first decoding module for inputting the hidden state encoding and the position encoding into the decoder of the language model for decoding to obtain the predicted characters at each of the character positions; and a pre-training module for pre-training the encoder and decoder of the language model based on the difference between the predicted characters and the corresponding characters in the entity text.

[0013] In one embodiment of this application, the first acquisition module includes: a reading unit for reading corpus from a corpus set; and a masking unit for masking a portion of the characters in the read corpus to obtain the training corpus.

[0014] In one embodiment of this application, the masking unit is specifically used for: selecting multiple characters from the corpus according to a set ratio; replacing the multiple characters in the corpus with target characters to obtain the training corpus.

[0015] In one embodiment of this application, the determining module is specifically used to: determine the number of characters contained in the entity text; normalize the order of each character position in the following text based on the number of characters; and determine the position code of each character position based on the normalized order of each character position.

[0016] A fourth aspect of this application provides a language model adaptation training apparatus, the apparatus comprising: a second acquisition module for acquiring a language model pre-trained by the apparatus described in the third aspect; a third acquisition module for acquiring the preceding text and standard following text corresponding to the task to be adapted by the language model; a second input module for inputting the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text; a second decoding module for decoding the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text; and an adjustment module for adjusting the parameters of the language model based on the difference between the standard following text and the predicted following text.

[0017] In one embodiment of this application, the second decoding module is specifically configured to: input a set start character and the hidden state encoding of the preceding text into the decoder of the language model for decoding to obtain the first predicted character in the predicted following text; repeatedly execute the step of inputting the predicted character obtained by the decoder in the previous decoding and the hidden state encoding of the preceding text into the decoder for decoding to obtain subsequent predicted characters in the predicted following text, until the decoded predicted character is a set end character; and generate the predicted following text according to each of the predicted characters obtained by sequential decoding.

[0018] This application proposes a pre-training device for a language model. Based on acquired training corpus and entity text annotated with the following text of the training corpus, the training corpus is input into the encoder of the language model to obtain the hidden state encoding of the training corpus. Simultaneously, based on the order of the positions of each character to be decoded in the following text, the position encoding is determined. Combining the hidden state encoding and the position encoding, the predicted characters at each character position for decoding by the decoder of the language model are obtained and compared with the entity text to pre-train the encoder and decoder of the language model. Thus, during the pre-training process of the encoder and decoder of the language model, when decoding entities, the characters in the decoded entity text are no longer used as input for the undecoded characters of that entity, thereby enabling the pre-trained model to learn the entity text completely, reducing the dependence on the decoder language model, and improving the training effect of the language model.

[0019] A fifth aspect of this application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the pre-training method of the language model in the embodiments of this application.

[0020] A sixth aspect of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the pre-training method for a language model in this application.

[0021] Other effects of the above-mentioned alternative methods will be described below in conjunction with specific embodiments.

Description of the Drawings

[0022] Figure 1 is a flowchart illustrating a language model pre-training method provided in an embodiment of this application;

[0023] Figure 2 is a flowchart illustrating another language model pre-training method provided in an embodiment of this application;

[0024] Figure 3 is a flowchart illustrating a language model adaptation training method provided in an embodiment of this application;

[0025] Figure 4 is a schematic diagram of the structure of a pre-training device for a language model provided in an embodiment of this application;

[0026] Figure 5 is a schematic diagram of the structure of another pre-training device for a language model provided in an embodiment of this application;

[0027] Figure 6 is a schematic flowchart of a language model adaptation training device provided in an embodiment of this application;

[0028] Figure 7 is a block diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0029] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.

[0030] The following description, with reference to the accompanying drawings, describes a language model pre-training method, apparatus, and electronic device according to embodiments of this application.

[0031] Figure 1 is a schematic flowchart illustrating a language model pre-training method provided in this embodiment. It should be noted that the execution entity of the language model pre-training method provided in this embodiment is a language model pre-training device, which can be implemented in software and/or hardware. In this embodiment, the language model pre-training device can be configured in an electronic device, which may include a server. This embodiment does not specifically limit the type of electronic device.

[0032] Figure 1 is a flowchart illustrating a language model pre-training method provided in an embodiment of this application.

[0033] As shown in Figure 1, the pre-training method for the language model may include:

[0034] Step 101: Obtain the training corpus and the entity text annotated for the following text of the training corpus.

[0035] In some embodiments, the training data obtained may be a given set of monolingual data, but is not limited thereto.

[0036] In some embodiments, the entity text annotated for the following text of the acquired training corpus may be a portion of the text of a sentence in the monolingual data, for example, 30% of the text of that sentence.

[0037] In some embodiments, an exemplary implementation of obtaining training corpus involves reading corpus data from a corpus set, masking a portion of the characters in the read corpus data, and obtaining training corpus data. For example, reading corpus data from a corpus set, selecting multiple characters from the corpus data according to a set ratio, and replacing these multiple characters with target characters to obtain training corpus data.
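The masking procedure described above can be sketched as follows. This is a minimal illustration only: the function name `mask_corpus`, the `[MASK]` string as the target character, the default ratio of 0.3, and uniform random selection of positions are all assumptions made for the example, since the application only states that characters are selected "according to a set ratio".

```python
import random

MASK = "[MASK]"  # assumed target character; the application only says "target characters"


def mask_corpus(chars, ratio=0.3, seed=None):
    """Replace a subset of characters with MASK according to a set ratio.

    Returns the masked character list (the training corpus) and the
    original characters that were replaced, in position order.
    """
    rng = random.Random(seed)
    # Number of characters to mask; at least one for a non-empty input.
    n_mask = max(1, round(len(chars) * ratio)) if chars else 0
    positions = set(rng.sample(range(len(chars)), n_mask))
    masked = [MASK if i in positions else c for i, c in enumerate(chars)]
    targets = [chars[i] for i in sorted(positions)]
    return masked, targets


masked, targets = mask_corpus(list("abcdefghij"), ratio=0.3, seed=0)
```

With ten input characters and a ratio of 0.3, three positions are replaced; which three depends on the seed.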

[0038] Step 102: Input the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus.

[0039] In some embodiments, after the training corpus is obtained, it is input into the encoder of the language model. The encoder encodes the training corpus to obtain the corresponding hidden vectors, and the hidden state encoding of the training corpus is obtained based on these hidden vectors.

[0040] Step 103: Determine the position encoding based on the order of the positions of the characters to be decoded in the following text.

[0041] In some embodiments, determining the position encoding based on the order of the positions of the characters to be decoded in the following text involves first determining the number of characters contained in the entity text, then normalizing the order of the positions of the characters in the following text based on the number of characters, and finally determining the position encoding of each character position based on the normalized order of the positions of the characters.

[0042] Step 104: Input the hidden state encoding and position encoding into the decoder of the language model for decoding to obtain the predicted character at each character position.

[0043] In some embodiments, the hidden state encoding is input into the decoder of the language model to obtain the relative position of the corresponding entity text, which is then combined with the position encoding to determine the predicted character at each character position.

[0044] Step 105: Based on the difference between the predicted character and the corresponding character in the entity text, pre-train the encoder and decoder of the language model.

[0045] In some embodiments, if the predicted character decoded is different from the obtained entity text when decoding the entity, then the encoder and decoder of the language model need to be pre-trained; otherwise, it means that the language model meets the requirements.

[0046] This application proposes a pre-training method for a language model. Based on acquired training corpora and entity text annotated with the following text, the training corpora are input into the encoder of the language model to obtain the hidden state encoding of the training corpora. Simultaneously, positional encoding is determined according to the order of the positions of each character to be decoded in the following text. Combining the hidden state encoding and the positional encoding, the predicted characters at each character position for decoding by the decoder of the language model are obtained and compared with the entity text to pre-train the encoder and decoder of the language model. Thus, during the pre-training process of the encoder and decoder of the language model, when decoding entities, the characters in the decoded entity text are no longer used as input for the undecoded characters of that entity, thereby enabling the pre-trained model to learn the entity text completely, reducing the dependence on the decoder language model, and improving the training effect of the language model.

[0047] Figure 2 is a flowchart illustrating another language model pre-training method provided in an embodiment of this application.

[0048] Step 201: Obtain the training corpus and the entity text annotated for the following text of the training corpus.

[0049] Step 202: Input the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus.

[0050] It should be noted that the specific implementation methods of steps 201 to 202 can be found in the relevant descriptions in the above embodiments.

[0051] Step 203: Determine the number of characters contained in the entity text.

[0052] Step 204: Normalize the order of each character's position in the following text based on the number of characters.

[0053] Optionally, the ratio of the order of each character position in the following text to the number of characters can be used as the normalized order, so that the normalized order maps to a decimal between (0, 1). Standardizing the value of the normalized order ensures that the range of this value does not fluctuate with the number of characters contained in the entity text, which facilitates model training.
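The normalization in step 204 can be sketched as follows. This is a minimal illustration, assuming character positions are numbered from 1 (as in the "Beijing" example of this application); the function name `normalized_order` is hypothetical.

```python
def normalized_order(num_chars):
    """Return the normalized order t / num_chars for positions t = 1..num_chars."""
    if num_chars <= 0:
        raise ValueError("entity text must contain at least one character")
    return [t / num_chars for t in range(1, num_chars + 1)]


# A two-character entity and a four-character entity share the same value range.
two = normalized_order(2)   # [0.5, 1.0]
four = normalized_order(4)  # [0.25, 0.5, 0.75, 1.0]
```

Under this numbering the last position maps to exactly 1, so the values stay within a fixed range regardless of how many characters the entity text contains.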

[0054] Step 205: Determine the position encoding of each character position based on the normalized order of the character positions. Based on the hidden state encoding $x_m$ corresponding to the training corpus, the decoder of the language model performs multiple decoding operations, each of which yields the predicted character at the corresponding character position. Here, $x$ denotes a piece of corpus in the dataset, and $x_m$ denotes the hidden state encoding obtained by deleting from $x$ a continuous segment comprising 30% of its text (including entities), replacing that segment with the character [MASK] to form the training corpus, and then encoding the training corpus. When decoding the predicted character at the first character position, the normalized first order is used as the position encoding; when decoding the predicted character at the second character position, the normalized second order is used as the position encoding; and so on.

[0055] For example, the position of "Bei" in the entity "Beijing" is 1, so when decoding "Bei", $p_t = p_1$. Similarly, when decoding "Jing", $p_t = p_2$, and so on.

[0056] Here, $p_t$ denotes the position encoding of the relative position of character position $t$ in the entity text annotated for the training corpus.
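The construction of the training corpus described for step 205 (deleting a continuous segment and replacing it with [MASK]) can be sketched as follows. The function name `mask_span` and the `start` parameter are hypothetical; how the start of the segment is chosen is not stated in the application and is left to the caller here.

```python
def mask_span(text, start, ratio=0.3):
    """Delete a continuous segment of about `ratio` of the text and put [MASK] in its place.

    Returns the masked training corpus and the removed segment, which serves
    as the annotated following text to be decoded.
    """
    span_len = max(1, round(len(text) * ratio))
    end = min(len(text), start + span_len)
    masked = text[:start] + "[MASK]" + text[end:]
    return masked, text[start:end]


masked, removed = mask_span("abcdefghij", start=2)
# masked == "ab[MASK]fghij", removed == "cde"
```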

[0057] Step 206, input the hidden state encoding and the position encoding into the decoder of the language model for decoding to obtain the predicted characters at each character position.

[0058] In a possible scenario, while the decoder of the language model performs multiple decodings based on the hidden state encoding $x_m$, the predicted character output at any character position $t-1$ is not used as the decoding input for the subsequent character position $t$; the vector $p_t$ is used instead. Thus, during decoding, the decoder does not rely on the characters it has already predicted, but relies more on the input hidden state encoding.

[0059] For example, after "Bei" is decoded at character position $t-1$, "Bei" is not used as the input for decoding at character position $t$; instead, the position encoding $p_2$ and $x_m$ are input into the decoder to obtain the predicted character at character position $t$.
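The difference between conventional character-by-character decoding and the scheme described above can be illustrated by listing what the decoder receives at each step, alongside the hidden state encoding. This is a sketch only: the function names, the `<s>` start symbol, and the string labels `p1`, `p2` standing in for position-encoding vectors are all assumptions made for the illustration.

```python
def teacher_forced_inputs(entity_chars, start="<s>"):
    """Conventional decoding: step t receives the character of step t-1."""
    return [start] + list(entity_chars[:-1])


def position_only_inputs(entity_chars):
    """Scheme described above: step t receives only its position code p_t."""
    return [f"p{t}" for t in range(1, len(entity_chars) + 1)]


entity = ["Bei", "Jing"]
conventional = teacher_forced_inputs(entity)  # ["<s>", "Bei"]
proposed = position_only_inputs(entity)       # ["p1", "p2"]
```

In the conventional case "Bei" is visible when "Jing" is decoded; in the position-only case no character of the entity ever appears in the decoder input.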

[0060] Step 207, pre-train the encoder and decoder of the language model according to the difference between the predicted character and the corresponding character in the entity text.

[0061] As a possible implementation, the value of the loss function $L_t$ is determined according to the difference between each predicted character and the corresponding character in the entity text. For example:

[0062] $$L_t = -\log P(x_t \mid x_m, p_t)$$

[0063] Here, $x_m$ is the hidden state encoding of the training corpus; $p_t$ denotes the relative position of character position $t$ within the annotated entity text, i.e., the position encoding; and $x_t$ is the $t$-th character to be predicted in the following text, that is, the decoder of the language model is expected to output, based on $x_m$ and $p_t$, the $t$-th character of the annotated entity text. $P$ indicates the difference between $x_t$ and the output of the decoder when its inputs are $x_m$ and $p_t$.

[0064] Based on the value of the loss function $L_t$, the model parameters of the encoder and decoder in the language model are adjusted so as to minimize the value of $L_t$.
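The per-position loss above can be computed as in the following sketch. It assumes the decoder's output at each character position is available as a dictionary mapping candidate characters to probabilities; that representation, the function names, and the summation over positions in `entity_loss` are assumptions for illustration, since the application only defines the loss for a single position.

```python
import math


def position_loss(probs, target):
    """Negative log-probability of the target character at one position.

    `probs` stands in for the decoder output given the hidden state
    encoding and the position encoding of this position.
    """
    return -math.log(probs[target])


def entity_loss(step_probs, targets):
    """Assumed aggregation: sum the per-position losses over the entity text."""
    return sum(position_loss(p, x) for p, x in zip(step_probs, targets))


step_probs = [{"Bei": 0.5, "Nan": 0.5}, {"Jing": 0.25, "Hai": 0.75}]
loss = entity_loss(step_probs, ["Bei", "Jing"])  # log(2) + log(4)
```

The loss falls to zero only when the decoder assigns probability 1 to every annotated character, which is the direction in which the parameter adjustment pushes the model.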

[0065] This application proposes a pre-training method for a language model. Based on acquired training corpora and entity text annotated with the following text of the training corpora, the training corpora are input into the encoder of the language model to obtain the hidden state encoding of the training corpora. The number of characters contained in the entity text and the order of each character position in the following text are normalized to determine the positional encoding of each character position. Combining the hidden state encoding and the positional encoding, the predicted characters at each character position for decoding by the decoder of the language model are obtained and compared with the entity text to pre-train the encoder and decoder of the language model. Thus, during the pre-training process of the encoder and decoder of the language model, when decoding entities, the characters in the decoded entity text are no longer used as input for the undecoded characters of that entity. This allows the pre-trained model to learn the entity text completely, reduces the dependence on the decoder language model, and improves the training effect of the language model.

[0066] Figure 3 is a flowchart illustrating a language model adaptation training method provided in an embodiment of this application.

[0067] Step 301: Obtain a language model pre-trained by the pre-training method of the language model described above.

[0068] The language model can be pre-trained using the language model pre-training method of the above embodiments.

[0069] Step 302: Based on the task to be adapted by the language model, obtain the preceding text and standard following text corresponding to the task.

[0070] In some embodiments, the preceding text corresponding to the task is the same as the training corpus in the above embodiments, which may be a given set of monolingual data, but is not limited thereto.

[0071] In some embodiments, the standard following text corresponding to the task is the same as the entity text annotated for the following text of the training corpus in the above embodiments, which may be a part of a sentence in the monolingual data.

[0072] Step 303: Input the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text.

[0073] In some embodiments, the preceding text is input into the encoder of the language model, where the language model may be based on the encoder-decoder framework of the Transformer model, and the encoder encodes the preceding text to generate the hidden state encoding of the preceding text.

[0074] Step 304: Decode the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text.

[0075] In some embodiments, once the hidden state encoding of the preceding text has been generated, the decoder generates the predicted following text corresponding to the preceding text based on that hidden state encoding.

[0076] In other embodiments, one implementation of decoding the hidden state encoding of the preceding text with the decoder of the language model is as follows. A set start character and the hidden state encoding of the preceding text are input into the decoder to obtain the first predicted character in the predicted following text. The step of inputting the predicted character obtained from the previous decoding, together with the hidden state encoding of the preceding text, into the decoder to obtain the next predicted character is then repeated until the decoded predicted character is the set end character. The predicted following text is generated from the predicted characters decoded in sequence.
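The iterative decoding used during adaptation training can be sketched as the loop below. This is a minimal illustration: `decoder_step` is a hypothetical callable standing in for the language model decoder, the `<s>` and `</s>` symbols are assumed start and end characters, and `max_len` is an added safety cap that the application does not mention.

```python
def greedy_decode(decoder_step, hidden, start="<s>", end="</s>", max_len=50):
    """Decode one character at a time until the end character is produced."""
    predicted = []
    prev = start
    for _ in range(max_len):
        # Each step sees the previous predicted character and the
        # hidden state encoding of the preceding text.
        char = decoder_step(prev, hidden)
        if char == end:
            break
        predicted.append(char)
        prev = char
    return predicted


def toy_step(prev, hidden):
    """Toy stand-in decoder: a fixed lookup that ignores `hidden`."""
    table = {"<s>": "Bei", "Bei": "Jing", "Jing": "</s>"}
    return table[prev]


following_text = greedy_decode(toy_step, hidden=None)  # ["Bei", "Jing"]
```

Unlike the pre-training stage, each step here does receive the previously decoded character as input.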

[0077] Step 305: Adjust the parameters of the language model based on the difference between the standard following text and the predicted following text.

[0078] In some embodiments, if the predicted following text obtained by decoding differs from the standard following text, the parameters of the language model need to be adjusted; otherwise, the language model meets the requirements.

[0079] This application proposes an adaptation training method for a language model. A language model pre-trained by the above pre-training method is obtained, and the preceding text and standard following text corresponding to the task to be adapted are obtained. The preceding text is input into the encoder of the language model to obtain its hidden state encoding, which is then decoded to obtain the predicted following text. Finally, the parameters of the language model are adjusted according to the difference between the standard following text and the predicted following text. Thus, the parameters of the pre-trained encoder and decoder are adjusted by comparing the standard and predicted following text, so that the language model learns entity text with less reliance on the decoder, which further improves the accuracy of the language model.

[0080] Figure 4 is a schematic diagram of the structure of a pre-training device for a language model provided in an embodiment of this application.

[0081] As shown in Figure 4, the pre-training device 400 for the language model includes:

[0082] The first acquisition module 401 is used to acquire training corpus and entity text annotated with the following text of the training corpus.

[0083] The first input module 402 is used to input the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus.

[0084] The determination module 403 is used to determine the position encoding based on the order of the positions of the characters to be decoded in the following text.

[0085] The first decoding module 404 is used to input the hidden state encoding and position encoding into the decoder of the language model for decoding, so as to obtain the predicted character at each character position.

[0086] The pre-training module 405 is used to pre-train the encoder and decoder of the language model based on the difference between the predicted character and the corresponding character in the entity text.

[0087] In the pre-training device for a language model proposed in this application, training corpus and entity text annotated with the following text of the training corpus are acquired, and the training corpus is input into the encoder of the language model to obtain the hidden state encoding of the training corpus. At the same time, the position encoding is determined based on the order of the positions of the characters to be decoded in the following text. The hidden state encoding and the position encoding are combined so that the decoder of the language model decodes a predicted character at each character position, and the predicted characters are compared with the entity text to pre-train the encoder and decoder of the language model. Thus, when an entity is decoded during pre-training, the characters of the entity text that have already been decoded are no longer used as input for the characters of that entity that are still to be decoded. This enables the pre-trained model to learn the entity text as a whole, reduces the dependence on the decoder's language modeling, and improves the training effect of the language model.

[0088] In one embodiment of this application, as shown in Figure 5, the first acquisition module 401 includes:

[0089] The reading unit 4011 is used to read corpus from the corpus set.

[0090] The masking unit 4012 is used to mask a portion of the characters in the read corpus to obtain the training corpus.

[0091] In one embodiment of this application, as shown in Figure 5, the masking unit 4012 is specifically used for:

[0092] Based on a set ratio, multiple characters are selected from the corpus.

[0093] The training corpus is obtained by replacing multiple characters in the corpus with the target characters.
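
The masking step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `mask_span`, the `[MASK]` target character, the default ratio, and the rule for extending the span around the entity are all hypothetical choices, not details given in this application. The sketch masks one continuous segment that covers the entity and returns the masked characters as the text to be predicted.

```python
def mask_span(chars, entity_start, entity_end, ratio=0.15, mask_token="[MASK]"):
    """Mask one contiguous span that covers chars[entity_start:entity_end].

    The span length is the larger of the entity length and
    round(ratio * len(chars)). Extra characters are taken to the right
    of the entity first, then to the left if the text ends too early.
    """
    n = len(chars)
    span_len = min(n, max(entity_end - entity_start, round(ratio * n)))
    end = min(n, entity_start + span_len)
    start = max(0, end - span_len)
    masked = chars[:start] + [mask_token] * (end - start) + chars[end:]
    target = chars[start:end]  # text the decoder is asked to reproduce
    return masked, target

# Entity "de" inside a 10-character corpus, with 40% of the characters masked.
masked, target = mask_span(list("abcdefghij"), 3, 5, ratio=0.4)
```

Here `masked` keeps the original length, so character positions in the training corpus still line up with the source corpus.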

[0094] In one embodiment of this application, as shown in Figure 5, the determination module 403 is specifically used for:

[0095] Determine the number of characters contained in the entity text.

[0096] Based on the number of characters, the order of each character's position in the following text is normalized.

[0097] The position encoding of each character position is determined based on the normalized order of that character position.
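
The three steps above can be sketched in Python as follows. Two assumptions are made here that this passage does not state: normalization is taken to mean dividing the 1-based order of each position by the number of characters in the entity text, and the mapping from a normalized position to a vector is taken to be sinusoidal. The names `normalized_positions` and `position_encoding` are hypothetical.

```python
import math

def normalized_positions(num_chars):
    """Scale the 1-based order of each character position into (0, 1]."""
    return [(i + 1) / num_chars for i in range(num_chars)]

def position_encoding(norm_pos, dim=8):
    """Assumed sinusoidal mapping of a normalized position to a vector.

    Even indices use sine and odd indices use cosine, with the
    frequency increasing by one for every pair of dimensions.
    """
    return [
        math.sin(norm_pos * math.pi * (k // 2 + 1)) if k % 2 == 0
        else math.cos(norm_pos * math.pi * (k // 2 + 1))
        for k in range(dim)
    ]

entity_text = "ABCD"                                 # 4 characters to decode
positions = normalized_positions(len(entity_text))   # [0.25, 0.5, 0.75, 1.0]
encodings = [position_encoding(p, dim=4) for p in positions]
```

With this kind of normalization, the last character of an entity always sits at position 1.0 regardless of the entity's length, which gives the decoder an implicit signal of how much of the entity remains.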

[0098] In the pre-training device for a language model proposed in this application, training corpus and entity text annotated with the following text of the training corpus are acquired, and the training corpus is input into the encoder of the language model to obtain the hidden state encoding of the training corpus. At the same time, the position encoding is determined based on the order of the positions of the characters to be decoded in the following text. The hidden state encoding and the position encoding are combined so that the decoder of the language model decodes a predicted character at each character position, and the predicted characters are compared with the entity text to pre-train the encoder and decoder of the language model. Thus, when an entity is decoded during pre-training, the characters of the entity text that have already been decoded are no longer used as input for the characters of that entity that are still to be decoded. This enables the pre-trained model to learn the entity text as a whole, reduces the dependence on the decoder's language modeling, and improves the training effect of the language model.

[0099] Figure 6 is a schematic diagram of the structure of a language model adaptation training device provided in an embodiment of this application.

[0100] The second acquisition module 601 is used to acquire the language model pre-trained by the language model pre-training device.

[0101] The third acquisition module 602 is used to acquire the preceding text and standard following text corresponding to the task to be adapted by the language model.

[0102] The second input module 603 is used to input the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text.

[0103] The second decoding module 604 is used to decode the hidden state encoding of the preceding text using the decoder of the language model, so as to obtain the predicted following text corresponding to the preceding text.

[0104] The adjustment module 605 is used to adjust the parameters of the language model based on the difference between the standard following text and the predicted following text.

[0105] In one embodiment of this application, as shown in Figure 6, the second decoding module 604 is specifically used for:

[0106] The set start character and the hidden state encoding of the preceding text are input into the decoder of the language model for decoding, to obtain the first predicted character in the predicted following text.

[0107] The predicted character obtained from the previous decoding step and the hidden state encoding of the preceding text are repeatedly input into the decoder to decode the next predicted character in the following text, until the decoded predicted character is the set end character.

[0108] The predicted following text is generated from the predicted characters obtained by sequential decoding.
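
The sequential decoding performed by the second decoding module can be sketched in Python as follows. This is a sketch under assumptions: `greedy_decode`, the `<s>` and `</s>` markers, and the `max_len` safety limit are hypothetical, and `toy_step` is a lookup table standing in for a real decoder. Unlike the pre-training stage, each step here takes the previously predicted character as input.

```python
def greedy_decode(decoder_step, hidden, start_char="<s>", end_char="</s>", max_len=50):
    """Decode one character at a time until the end character appears.

    decoder_step(prev_char, hidden) returns the next predicted character.
    max_len is an added guard against a decoder that never emits end_char.
    """
    output = []
    prev = start_char
    for _ in range(max_len):
        nxt = decoder_step(prev, hidden)
        if nxt == end_char:
            break
        output.append(nxt)
        prev = nxt  # feed the prediction back in for the next step
    return "".join(output)

# Hypothetical stand-in decoder: a fixed transition table that ignores
# the hidden state encoding of the preceding text.
table = {"<s>": "o", "o": "k", "k": "</s>"}

def toy_step(prev_char, hidden):
    return table[prev_char]

predicted_text = greedy_decode(toy_step, hidden=None)
```

The end character itself is not appended, so `predicted_text` contains only the characters of the predicted following text.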

[0109] In the pre-training device for a language model proposed in this application, training corpus and entity text annotated with the following text of the training corpus are acquired, and the training corpus is input into the encoder of the language model to obtain the hidden state encoding of the training corpus. At the same time, the position encoding is determined based on the order of the positions of the characters to be decoded in the following text. The hidden state encoding and the position encoding are combined so that the decoder of the language model decodes a predicted character at each character position, and the predicted characters are compared with the entity text to pre-train the encoder and decoder of the language model. Thus, when an entity is decoded during pre-training, the characters of the entity text that have already been decoded are no longer used as input for the characters of that entity that are still to be decoded. This enables the pre-trained model to learn the entity text as a whole, reduces the dependence on the decoder's language modeling, and improves the training effect of the language model.

[0110] Figure 7 is a block diagram of an electronic device according to an embodiment of this application.

[0111] As shown in Figure 7, the electronic device includes:

[0112] A memory 701, a processor 702, and computer instructions stored in the memory 701 and executable on the processor 702.

[0113] When the processor 702 executes the instructions, it implements the pre-training method for the language model provided in the above embodiments.

[0114] Furthermore, the electronic device also includes:

[0115] A communication interface 703, used for communication between the memory 701 and the processor 702.

[0116] The memory 701 is used to store computer instructions that can be executed on the processor 702.

[0117] The memory 701 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.

[0118] The processor 702 is used to implement the pre-training method of the language model in the above embodiments when executing the program.

[0119] If the memory 701, processor 702, and communication interface 703 are implemented independently, then the communication interface 703, memory 701, and processor 702 can be interconnected via a bus to communicate with one another. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus is shown in Figure 7 as a single thick line, but this does not mean that there is only one bus or one type of bus.

[0120] Optionally, in a specific implementation, if the memory 701, processor 702, and communication interface 703 are integrated on a single chip, then the memory 701, processor 702, and communication interface 703 can communicate with each other through an internal interface.

[0121] The processor 702 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application.

[0122] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0123] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0124] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.

Claims

1. A pre-training method for a language model, characterized in that it includes the following steps: reading corpus from a corpus set; masking a portion of the characters in the read corpus to obtain training corpus, wherein the portion of characters is a continuous text segment including entities; obtaining entity text annotated with the following text of the training corpus, wherein the entity text is used to annotate the masked portion of characters; inputting the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus; determining the position encoding based on the order of the positions of the characters to be decoded in the following text; inputting the hidden state encoding and the position encoding into the decoder of the language model for decoding to obtain the predicted character at each character position; and pre-training the encoder and decoder of the language model based on the difference between the predicted character and the corresponding character in the entity text.

2. The method according to claim 1, characterized in that masking a portion of the characters in the read corpus to obtain the training corpus includes: selecting multiple characters from the corpus based on a set ratio; and obtaining the training corpus by replacing the multiple characters in the corpus with the target characters.

3. The method according to any one of claims 1-2, characterized in that determining the position encoding based on the order of the positions of the characters to be decoded in the following text includes: determining the number of characters contained in the entity text; normalizing, based on the number of characters, the order of the positions of each character in the following text; and determining the position encoding of each character position based on the normalized order of each character position.

4. A method for adaptation training of a language model, characterized in that the method includes: obtaining the language model pre-trained by the method described in any one of claims 1-3; obtaining, based on the task to be adapted to by the language model, the preceding text and standard following text corresponding to the task; inputting the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text; decoding the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text; and adjusting the parameters of the language model based on the difference between the standard following text and the predicted following text.

5. The method according to claim 4, characterized in that decoding the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text includes: inputting the set start character and the hidden state encoding of the preceding text into the decoder of the language model for decoding to obtain the first predicted character in the predicted following text; repeatedly inputting the predicted character obtained from the previous decoding of the decoder and the hidden state encoding of the preceding text into the decoder for decoding to obtain the next predicted character in the predicted following text, until the predicted character obtained by decoding is the set end character; and generating the predicted following text based on the predicted characters obtained by sequential decoding.

6. A pre-training device for a language model, characterized in that it includes: a first acquisition module, used to acquire training corpus and entity text annotated with the following text of the training corpus; a first input module, used to input the training corpus into the encoder of the language model to obtain the hidden state encoding of the training corpus; a determination module, used to determine the position encoding based on the order of the positions of the characters to be decoded in the following text; a first decoding module, used to input the hidden state encoding and the position encoding into the decoder of the language model for decoding, so as to obtain the predicted character at each character position; and a pre-training module, used to pre-train the encoder and decoder of the language model based on the difference between the predicted character and the corresponding character in the entity text; wherein the first acquisition module includes: a reading unit, used to read corpus from a corpus set; and a masking unit, used to mask a portion of the characters in the read corpus to obtain the training corpus, wherein the portion of characters is a continuous text segment including entities, and the entity text is used to annotate the masked portion of characters.

7. The apparatus according to claim 6, characterized in that the masking unit is specifically used for: selecting multiple characters from the corpus based on a set ratio; and obtaining the training corpus by replacing the multiple characters in the corpus with the target characters.

8. The apparatus according to any one of claims 6-7, characterized in that the determination module is specifically used for: determining the number of characters contained in the entity text; normalizing, based on the number of characters, the order of the positions of each character in the following text; and determining the position encoding of each character position based on the normalized order of each character position.

9. A language model adaptation training device, characterized in that the device includes: a second acquisition module, used to acquire the language model pre-trained by the device as described in any one of claims 6-8; a third acquisition module, used to acquire the preceding text and standard following text corresponding to the task to be adapted to by the language model; a second input module, used to input the preceding text into the encoder of the language model to obtain the hidden state encoding of the preceding text; a second decoding module, used to decode the hidden state encoding of the preceding text using the decoder of the language model to obtain the predicted following text corresponding to the preceding text; and an adjustment module, used to adjust the parameters of the language model based on the difference between the standard following text and the predicted following text.

10. The apparatus according to claim 9, characterized in that the second decoding module is specifically used for: inputting the set start character and the hidden state encoding of the preceding text into the decoder of the language model for decoding to obtain the first predicted character in the predicted following text; repeatedly inputting the predicted character obtained from the previous decoding of the decoder and the hidden state encoding of the preceding text into the decoder for decoding to obtain the next predicted character in the predicted following text, until the predicted character obtained by decoding is the set end character; and generating the predicted following text based on the predicted characters obtained by sequential decoding.

11. An electronic device, characterized in that it includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the program, it implements the pre-training method for a language model as described in any one of claims 1-3 or 4-5.

12. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the pre-training method for the language model as described in any one of claims 1-3 or 4-5.