Multi-task-oriented pre-trained language model automatic compression method and platform
A language-model compression technology applied to inference methods, neural learning methods, and biological neural network models. It addresses problems such as increased estimation error, the impracticality of manual design, and susceptibility to overfitting during training, achieving the effect of improved compression efficiency.
Embodiment Construction
[0040] Inspired by neural architecture search, automatic machine learning can perform knowledge distillation automatically and iteratively via a feedback loop, which is especially valuable when few samples are available. The present invention studies meta-learning-based knowledge distillation to generate a generic compression architecture for multiple pre-trained language models. Specifically, the invention first constructs a knowledge-distillation encoding vector based on Transformer layer sampling and distills the knowledge structure of the large model at different levels. A meta-network, the structure generator, is designed to generate the distilled structure model corresponding to the current input encoding vector. At the same time, a Bernoulli distribution sampling method is proposed to train the structure generator: in each iteration, Bernoulli sampling selects the encoder units to be transferred, forming the corresponding encoding vector. By changing…
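The following is a minimal sketch of the two mechanisms this paragraph describes: Bernoulli sampling of a layer-transfer encoding vector, and a structure-generator meta-network that maps the encoding vector to a distilled student configuration. It assumes PyTorch; the names `sample_layer_code`, `StructureGenerator`, `NUM_TEACHER_LAYERS`, and the placeholder loss are illustrative assumptions, not the patent's implementation.

```python
# Illustrative sketch only; not the patented implementation.
import torch
import torch.nn as nn

NUM_TEACHER_LAYERS = 12  # e.g., BERT-base Transformer layers (assumption)

def sample_layer_code(p: float = 0.5) -> torch.Tensor:
    """Draw a binary encoding vector: entry i == 1 means Transformer
    layer i of the teacher is transferred to the distilled student."""
    return torch.bernoulli(torch.full((NUM_TEACHER_LAYERS,), p))

class StructureGenerator(nn.Module):
    """Meta-network mapping a layer-sampling encoding vector to a
    parameter vector describing the corresponding distilled structure."""
    def __init__(self, code_dim: int = NUM_TEACHER_LAYERS,
                 hidden_dim: int = 64, weight_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, weight_dim),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        return self.net(code)

# One training loop: resample a code each iteration and update the
# generator, so a single meta-network serves many distilled structures.
generator = StructureGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(3):  # a few illustrative iterations
    code = sample_layer_code()        # Bernoulli-sampled encoding vector
    student_params = generator(code)  # weights for the distilled structure
    # Placeholder objective; a real setup would distill from the teacher
    # on task data using the student defined by student_params.
    loss = student_params.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: code={code.int().tolist()} loss={loss.item():.4f}")
```

The design point the paragraph makes is that resampling the encoding vector each iteration exposes the structure generator to many candidate distilled architectures, so it learns to produce a structure for any code rather than one fixed student.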