Language model pre-training method

A language model pre-training technology in the field of artificial intelligence. It addresses problems of the standard pre-training approach such as unbalanced samples, hindered transfer to Chinese downstream tasks, and randomness in the training outcome, achieving the effect of improving prediction accuracy and prediction results.

Inactive Publication Date: 2019-07-19
人立方智能科技有限公司

AI Technical Summary

Problems solved by technology

This pre-training method has the following problems: 1. It directly predicts the word that should appear at each position; because the vocabulary is huge and high-frequency words dominate every position, the training samples are unbalanced. 2. It models the sequence after word segmentation, which is unfriendly to Chinese, where segmentation is ambiguous, and hinders transfer to Chinese downstream applications. 3. When modeling the relationship between sentence pairs, the negative examples for the upper and lower sentences are constructed arbitrarily, which introduces randomness into the final training effect.



Examples


Embodiment 1

[0014] The company's internal job classification data is used as the corpus, and the goal is to predict the job category corresponding to each piece of work experience. In this example there are 1930 job categories. The network structure uses the Transformer from the BERT model as the backbone: a piece of work experience is input, the feature representation output by the Transformer is aggregated with an Attention mechanism, and a prediction over the 1930 classes is output. The training objective is optimized with cross-entropy, and the parameters of the Transformer are initialized with the values from the pre-trained BERT model. Three groups of experiments are compared: direct prediction, prediction after standard BERT pre-training, and prediction after pre-training with the method proposed in the present invention.
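As a rough illustration of the setup in [0014] (a sketch, not the patent's own code), the following Python snippet builds a 1930-way classifier on top of a pre-trained BERT Transformer, with attention pooling over the output features and a cross-entropy objective. The model name "bert-base-chinese", the additive attention pooling, and all layer sizes are illustrative assumptions.

# Illustrative sketch of Embodiment 1 (assumed details marked): attention pooling
# over BERT Transformer features, followed by a 1930-class classification head.
import torch
import torch.nn as nn
from transformers import BertModel

class JobClassifier(nn.Module):
    def __init__(self, num_classes=1930, bert_name="bert-base-chinese"):  # model name is an assumption
        super().__init__()
        # Transformer parameters initialized from a pre-trained BERT model
        self.encoder = BertModel.from_pretrained(bert_name)
        hidden = self.encoder.config.hidden_size
        # Simple additive attention scoring over token features (assumed form)
        self.attn_score = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_feats = out.last_hidden_state                  # (batch, seq, hidden)
        scores = self.attn_score(token_feats).squeeze(-1)    # (batch, seq)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * token_feats).sum(dim=1)          # (batch, hidden)
        return self.classifier(pooled)                       # (batch, 1930)

# Training objective: cross-entropy over the 1930 job categories
loss_fn = nn.CrossEntropyLoss()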

[0015] Here, the pre-training process carried out with the method proposed in this application is as follows:

[0016]...



Abstract

The invention discloses a language model pre-training method. The method comprises the following steps: carrying out word segmentation on the corpora in the model according to characters and sub-words; randomly selecting 15% of the generated tokens for position masking and calculating the masked semantic distribution; controlling the mixing of characters and sub-words in the model by using an independent gating unit; and training the semantic distribution and the prediction of the masked words synchronously. According to the method, the prediction result of the model after BERT pre-training can be obviously improved.
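The independent gating unit that controls the mixing of characters and sub-words is not detailed in the visible text; the following is a minimal Python sketch of one plausible form, in which a learned sigmoid gate interpolates between a character-level and a sub-word-level representation at each position. The class name, dimensions, and gate form are assumptions.

# Minimal sketch (assumed form, not the patent's exact design): a learned gate
# mixing character-level and sub-word-level embeddings position by position.
import torch
import torch.nn as nn

class CharSubwordGate(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        # Gate computed from the concatenated character and sub-word features
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, char_emb, subword_emb):
        # char_emb, subword_emb: (batch, seq_len, hidden)
        g = torch.sigmoid(self.gate(torch.cat([char_emb, subword_emb], dim=-1)))
        # Element-wise interpolation controlled by the gate
        return g * char_emb + (1.0 - g) * subword_emb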

Description

Technical field

[0001] The invention belongs to the field of artificial intelligence technology, and in particular relates to a language model pre-training method based on an improved BERT model that mixes characters and sub-words.

Background technique

[0002] Natural language processing is an important branch of artificial intelligence. Pre-trained language models have been proven to be quite effective in practice. A language model (Language Model) is the probability distribution of a sequence of words. Specifically, a language model determines a probability distribution P for a text of length m, indicating the likelihood that this text exists. The more commonly used language pre-training method is pre-training based on the BERT model, which includes the following steps: 1. Prepare a text corpus with upper and lower sentences; 2. Use BPE (byte pair encoding, a simple sub-word segmentation algorithm) to tokenize the text corpus; 3. Mask/replace 15% of the wo...
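To make the definition in [0002] concrete, the probability a language model assigns to a text of length m is conventionally factorized by the chain rule (a standard formulation, not quoted from the patent):

P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})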

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/35, G06F17/27
CPC: G06F16/35, G06F40/289, G06F40/30
Inventor: 陈瑶文
Owner: 人立方智能科技有限公司