
Multi-task-oriented pre-trained language model automatic compression method and platform

A language-model compression technology, applied in the directions of inference methods, neural learning methods, biological neural network models, etc. It addresses the problems of increased estimation error, the infeasibility of manually designing distillation structures, and training that is easily affected by overfitting, achieving the effect of improved compression efficiency.

Active Publication Date: 2021-03-30
ZHEJIANG LAB

AI Technical Summary

Problems solved by technology

However, with small-sample data, training is easily affected by overfitting, and the estimation error increases significantly and propagates layer by layer.
Therefore, the core challenge of neural network compression with small samples is that the compressed model easily overfits the few-shot training instances, producing a large estimation error relative to the original network during inference.
Estimation errors may accumulate and propagate layer by layer, eventually corrupting the network output.
[0004] In addition, existing knowledge distillation methods rely mainly on data-driven sparsity constraints or manually designed distillation strategies. A BERT network typically has 12 layers of Transformer units, each containing an 8-head self-attention unit, so the self-attention units admit hundreds of millions of possible connection patterns. Given limited computing resources, it is practically impossible to design all possible distillation structures by hand and find the optimal one.
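As a rough illustration of the scale involved, the back-of-the-envelope count below uses assumptions of ours, not the patent's: each of the 12 layers may be kept or skipped, and, at finer granularity, any subset of each layer's 8 heads may be transferred.

```python
# Illustrative count of candidate distillation structures.
# Assumptions (ours, not the patent's): each of 12 Transformer layers
# may be kept or skipped; within a kept layer, any subset of its
# 8 self-attention heads may be transferred.
layers = 12
heads_per_layer = 8

layer_level = 2 ** layers                      # keep/skip per layer
head_level = (2 ** heads_per_layer) ** layers  # head subsets across all layers

print(f"layer-level structures:      {layer_level:,}")   # 4,096
print(f"layer+head-level structures: {head_level:.2e}")  # ~7.92e+28
```

Even at the coarsest granularity the choices multiply layer by layer, and at head level the count far exceeds the "hundreds of millions" the patent cites, which is why the invention automates the search rather than designing structures by hand.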



Embodiment Construction

[0040] Inspired by neural architecture search, and especially for the few-sample case, automatic machine learning can perform knowledge distillation automatically and iteratively through a feedback loop. This invention studies meta-learning-based knowledge distillation to generate a general compression architecture for multiple pre-trained language models. Specifically, the invention first constructs a knowledge distillation encoding vector based on Transformer-layer sampling, distilling the knowledge structure of the large model at different levels. A meta-network, the structure generator, is designed to generate the distilled structure model corresponding to the current input encoding vector. At the same time, a Bernoulli-distribution sampling method is proposed to train the structure generator: in each iteration, Bernoulli sampling selects the encoder units to be transferred, forming the corresponding encoding vector. By changing the input encoding vector of the structure generator and the mini-batch training data, the structure generator and the corresponding distilled structure are trained jointly, yielding a structure generator that can produce weights for different distillation structures.
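To make the loop in [0040] concrete, here is a minimal sketch assuming PyTorch; the layer shapes, sizes, and the structure generator's architecture are illustrative guesses, not the patent's actual design:

```python
# Sketch of the two components in [0040]: Bernoulli sampling of a
# layer-transfer encoding vector, and a toy "structure generator"
# meta-network mapping that vector to student weights.
import torch
import torch.nn as nn

NUM_LAYERS = 12   # Transformer layers in the BERT-style teacher
HIDDEN = 64       # kept small for the sketch; BERT-base uses 768

def sample_encoding_vector(p: float = 0.5) -> torch.Tensor:
    """Entry i is 1 if teacher layer i is transferred to the student."""
    return torch.bernoulli(torch.full((NUM_LAYERS,), p))

class StructureGenerator(nn.Module):
    """Meta-network: encoding vector -> weight matrix for one student layer."""
    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.hidden = hidden
        self.net = nn.Sequential(
            nn.Linear(NUM_LAYERS, 256),
            nn.ReLU(),
            nn.Linear(256, hidden * hidden),  # flat weights for one layer
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        return self.net(code).view(self.hidden, self.hidden)

# One iteration of the loop: sample a structure and generate its weights;
# joint training of the student and the generator on a mini-batch follows.
code = sample_encoding_vector()
weights = StructureGenerator()(code)
print(code.tolist(), weights.shape)  # e.g. [1.0, 0.0, ...] torch.Size([64, 64])
```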



Abstract

The invention discloses a multi-task-oriented pre-trained language model automatic compression method and platform. The method designs a meta-network, the structure generator, and constructs a knowledge distillation encoding vector based on Transformer-layer sampling; the structure generator produces the distilled structure model corresponding to the current input encoding vector. A Bernoulli-distribution sampling method is proposed to train the structure generator: in each iteration, Bernoulli sampling decides which encoder units to transfer, forming the corresponding encoding vector. By varying the input encoding vector of the structure generator and the mini-batch training data, the structure generator and the corresponding distilled structure are trained jointly, so the generator learns to produce weights for different distillation structures. Finally, on top of the trained meta-learning network, an evolutionary algorithm searches for the optimal compression structure, yielding an optimal general compression architecture for task-independent pre-trained language models.
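The evolutionary search stage can be sketched as a simple genetic loop over encoding vectors; everything here, including the `evaluate` fitness stand-in, is a hypothetical illustration rather than the platform's implementation:

```python
# Toy evolutionary search over layer-transfer encoding vectors.
# In the real platform, `evaluate` would score a candidate by running the
# generator-produced distilled model on held-out data; here it is a
# random placeholder with a slight penalty on model size.
import random

NUM_LAYERS = 12

def evaluate(code: list[int]) -> float:
    return random.random() - 0.05 * sum(code)  # placeholder fitness

def mutate(code: list[int], rate: float = 0.1) -> list[int]:
    return [1 - b if random.random() < rate else b for b in code]

def crossover(a: list[int], b: list[int]) -> list[int]:
    cut = random.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]

def evolve(pop_size: int = 20, generations: int = 10) -> list[int]:
    pop = [[random.randint(0, 1) for _ in range(NUM_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)   # rank candidates by fitness
        elite = pop[: pop_size // 2]           # keep the best half
        pop = elite + [mutate(crossover(random.choice(elite),
                                        random.choice(elite)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=evaluate)

print(evolve())  # best encoding vector found under the toy fitness
```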

Description

Technical Field

[0001] The invention belongs to the field of language model compression, and in particular relates to a multi-task-oriented pre-trained language model automatic compression method and platform.

Background Technique

[0002] Large-scale pre-trained language models have achieved excellent performance on both natural language understanding and generation tasks. However, deploying pre-trained language models with massive parameters to memory-limited devices remains a great challenge. In the field of model compression, existing language model compression methods all perform task-specific compression. Although knowledge distillation for a specific task is effective, fine-tuning and inference on large models remain time-consuming and computationally expensive: when facing other downstream tasks, a pre-trained model produced by task-specific distillation still requires re-fine-tuning the large model and generating related large-scale ...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/08, G06N5/04
CPC: G06N3/082, G06N5/04
Inventors: 王宏升, 胡胜健, 傅家庆, 杨非
Owner: ZHEJIANG LAB