
BERT model training method and system based on multiplier alternating direction method

A model training technology based on the multiplier alternating direction method (usually called the alternating direction method of multipliers, ADMM, in English), applied to BERT model training methods and systems. It addresses the large memory consumption of BERT training, with the effect of reducing the amount of computation and improving training efficiency and accuracy.

Pending Publication Date: 2022-07-29
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0006] 1. Vanishing and exploding gradients: Gradients in backpropagation depend strongly on one another from layer to layer, so gradient values that are too large or too small distort the parameter updates during training and degrade the final model. This is the problem of vanishing and exploding gradients. It is especially pronounced in applications such as event extraction (or text classification), where the model computes over the context of the input sequence: during the backward pass, the gradient of the target loss function is prone to these problems. As a result, the BERT model is difficult to train, performs poorly, and cannot handle long-text encoding.
[0007] 2. Limited GPU memory: The number of parameters in the BERT model depends on the scale of the matrix multiplications in the model. A BERT model for event extraction has a large number of parameters and consumes a large amount of memory. Once the parameter count reaches a certain value, current GPU devices cannot meet the memory requirements of training the BERT model.
[0008] 3. Parallelism is hard to implement: Solving the shortage of GPU memory requires distributed training, of which there are two main kinds: data parallelism and model parallelism. Data parallelism divides the data according to the number of computing nodes in the training system, and each computing node processes only its allocated data. Although this approach is simple to implement, it has to keep a complete copy of the model on every computing node, so training BERT this way still consumes a large amount of memory and the memory requirements still cannot be met. Model parallelism divides the neural network model and allocates the parts to the computing nodes in the training system, but it is difficult to achieve for the BERT model.
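
The first problem in the list above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that every layer scales the backpropagated gradient by the same constant factor; the function name `chained_gradient` and the chosen factors and depth are hypothetical and do not come from the patent.

```python
def chained_gradient(per_layer_factor: float, num_layers: int) -> float:
    """Magnitude of a gradient after it is passed back through num_layers layers.

    Simplifying assumption: each layer multiplies the gradient by the same
    constant factor, so the result is per_layer_factor ** num_layers.
    """
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_factor
    return grad


# A factor slightly below 1 shrinks the gradient towards zero (vanishing),
# while a factor slightly above 1 blows it up (exploding).
vanishing = chained_gradient(0.5, 50)
exploding = chained_gradient(1.5, 50)
print(f"factor 0.5 over 50 layers: {vanishing:.3e}")
print(f"factor 1.5 over 50 layers: {exploding:.3e}")
```

Because the chain rule multiplies these factors together, even modest deviations from 1 compound quickly with depth, which is the dependence between gradients that the first item describes.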

Embodiment Construction

[0057] The present invention will be further described below with reference to the accompanying drawings and specific preferred embodiments, but the protection scope of the present invention is not limited thereby.

[0058] As shown in Figure 1, the BERT model training method based on the multiplier alternating direction method in this embodiment comprises the following steps:

[0059] Step S1. Data input: take the sentences to be trained from the training set, extract their word vectors, and input the word vectors into the BERT model;

[0060] Step S2. Solution by the multiplier alternating direction method: when the BERT model trains on the input word vectors, the multiplier alternating direction method is used to solve the objective function. The objective function is determined from the representation of the BERT model by the Encoder module of the Transformer model, constraints are added to the input word vectors, and the augmented Lagrangian algorithm is used to transform the objective function i...
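
To give a feel for the alternating scheme named in Step S2, here is a minimal sketch of the multiplier alternating direction method (ADMM) on a toy problem. It is not the patent's objective function, which is truncated in this text. The toy problem, an assumption chosen because each update has a closed form, is to minimize 0.5*(x - b)^2 + lam*|z| elementwise subject to the constraint x = z. The names `soft_threshold` and `admm_l1_denoise` and the parameters `lam`, `rho`, and `iters` are hypothetical.

```python
def soft_threshold(v: float, kappa: float) -> float:
    """Proximal operator of kappa * |.|: shrink v towards zero by kappa."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(b, lam=1.0, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize 0.5*(x - b)^2 + lam*|z|  subject to x = z.

    Each iteration alternates three steps:
      1. x-update: minimize the augmented Lagrangian over x with z, u fixed
      2. z-update: minimize it over z with x, u fixed (soft thresholding)
      3. u-update: accumulate the constraint residual x - z in the scaled multiplier
    """
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled Lagrange multiplier
    for _ in range(iters):
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return x, z


x, z = admm_l1_denoise([3.0, 0.5, -2.0], lam=1.0)
print("x:", x)
print("z:", z)
```

No step in this loop backpropagates a gradient through a chain of layers: each variable block is updated by solving its own small subproblem, and the multiplier update gradually enforces the constraint. Since each element is updated independently here, the subproblems could also be computed in parallel.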

Abstract

The invention discloses a BERT model training method and system based on the multiplier alternating direction method. The method comprises: S1, taking the sentences to be trained from a training set, extracting word vectors, and inputting the word vectors into a BERT model; S2, when the BERT model trains on the input word vectors, solving an objective function with the multiplier alternating direction method, which involves determining the objective function, adding constraints to the input word vectors, converting the objective function into an augmented Lagrangian function, and solving that function to obtain the variable parameters of the objective function and the output of the BERT model; S3, updating the solved variable parameters until training is complete, and obtaining and outputting the final BERT model training result. The method avoids vanishing and exploding gradients during training, is easy to parallelize, and offers high training efficiency and good training performance.
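
For orientation, the conversion to an augmented Lagrangian function mentioned in the abstract can be written in its generic textbook form. The symbols below ($f$, $g$, $A$, $B$, $c$, $y$, $\rho$) are the standard ones for a two-block constrained problem and are assumptions for illustration, not the patent's own notation or objective.

```latex
% Generic two-block problem with a linear constraint
\min_{x,\,z} \; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c

% Augmented Lagrangian with multiplier y and penalty parameter \rho > 0
L_\rho(x, z, y) = f(x) + g(z) + y^{\top}(Ax + Bz - c) + \frac{\rho}{2}\,\lVert Ax + Bz - c \rVert_2^2

% Alternating updates at iteration k
x^{k+1} = \arg\min_{x} \; L_\rho(x, z^{k}, y^{k})
z^{k+1} = \arg\min_{z} \; L_\rho(x^{k+1}, z, y^{k})
y^{k+1} = y^{k} + \rho\,(Ax^{k+1} + Bz^{k+1} - c)
```

The quadratic penalty term is what distinguishes the augmented Lagrangian from the ordinary one, and the multiplier update is what drives the constraint residual towards zero over the iterations.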

Description

Technical field

[0001] The invention relates to the technical field of information extraction, in particular to a BERT model training method and system based on the multiplier alternating direction method.

Background

[0002] With the rapid development of the Internet in recent years, network media generate massive amounts of unstructured information every day, such as news, pushes, announcements, and notifications. Information extraction technology quickly and effectively captures the valuable parts of this information to support analysis and decision-making. Event extraction is a branch of information extraction technology that extracts the event information users care most about from unstructured natural language text and presents it in a structured form. Event extraction has a wide range of applications in many fields, such as build...

Application Information

IPC(8): G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/30, G06N3/084, G06N3/045, G06F18/241, G06F18/214
Inventor: 唐宇, 乔林波, 阚志刚, 梁鹏, 高翊夫, 韩毅, 李东升
Owner: NAT UNIV OF DEFENSE TECH