
Deep Transformer cascade neural network model compression algorithm

A neural network model compression technology in the field of deep Transformer cascaded neural networks, which achieves the effect of reduced model size and low computational cost.

Pending Publication Date: 2021-03-02
Applicant: 东南数字经济发展研究院

AI Technical Summary

Problems solved by technology

However, from the perspective of compressing model volume, the BERT-of-Theseus algorithm still leaves room for further improvement.




Detailed Description of the Embodiments

[0020] The technical solutions of the present invention are further described below through preferred embodiments in conjunction with the accompanying drawings, but the present invention is not limited to these embodiments.

[0021] Referring to Figure 1, a deep Transformer cascaded neural network model compression algorithm provided by an embodiment of the present invention comprises:

[0022] Step A: Pre-train the deep Transformer cascaded neural network on a text data set. Specifically, the deep Transformer cascaded neural network model undergoes self-supervised pre-training on an unlabeled text data set. The training tasks are masked-word prediction and prediction of the preceding and following text; the model's parameters are updated through the backpropagation algorithm and the gradient descent algorithm to obtain the pre-trained model.
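As a reading aid, here is a minimal sketch of the masked-word-prediction task in Step A, assuming PyTorch; the TransformerCascade class, vocabulary size, mask token id, masking rate, and model dimensions are illustrative assumptions, not values from the patent (the preceding/following-text task is omitted for brevity).

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, MASK_ID = 30000, 256, 103   # illustrative sizes, not from the patent

class TransformerCascade(nn.Module):
    """Toy stand-in for the deep Transformer cascaded neural network."""
    def __init__(self, n_layers=12):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlm_head = nn.Linear(D_MODEL, VOCAB)   # scores for masked-word prediction

    def forward(self, ids):
        return self.mlm_head(self.encoder(self.embed(ids)))

model = TransformerCascade()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)   # score only masked positions

ids = torch.randint(5, VOCAB, (8, 128))            # a batch of unlabeled text (toy data)
mask = torch.rand(ids.shape) < 0.15                # mask ~15% of the tokens
labels = ids.masked_fill(~mask, -100)              # targets exist only where masked
inputs = ids.masked_fill(mask, MASK_ID)

loss = loss_fn(model(inputs).view(-1, VOCAB), labels.view(-1))
loss.backward()        # backpropagation
optimizer.step()       # gradient-descent parameter update
```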

[0023] Step B: Divide the Transformer cascaded model into several modules in sequence. ...
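The text truncates before giving Step B's details, but the sequential division itself can be sketched as follows; the split_into_modules helper and the choice of two layers per module are hypothetical:

```python
def split_into_modules(layers, layers_per_module):
    """Group a cascade's Transformer layers into consecutive modules."""
    return [layers[i:i + layers_per_module]
            for i in range(0, len(layers), layers_per_module)]

cascade = [f"transformer_layer_{i}" for i in range(12)]   # stand-ins for real layers
modules = split_into_modules(cascade, layers_per_module=2)
print(len(modules))   # 12 layers -> 6 sequential modules
```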



Abstract

The invention provides a deep Transformer cascade neural network model compression algorithm, solving the problem that algorithms in the prior art still leave room for further compression. The method comprises the following steps: pre-training a deep Transformer cascade neural network on a text data set; dividing the Transformer cascade model into a plurality of modules in sequence; randomly selecting a certain layer of Transformers in the pre-trained deep Transformer cascade neural network as the replacement module, wherein the module is named Transformercomplication; fine-tuning the pre-trained model on a small data set; and compressing the model through gradual replacement of modules and parameter sharing among the modules. The invention has the advantage that model compression efficiency is further improved.
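As a reading aid only (not the patent's reference implementation), here is a minimal sketch of the scheme the abstract describes: during fine-tuning, each predecessor module is stochastically replaced by a successor module whose parameters are shared across all replacement slots, and the replacement probability is ramped toward 1. The TheseusCascade class, the linear schedule, and the toy shapes are assumptions in the spirit of BERT-of-Theseus.

```python
import torch
import torch.nn as nn

class TheseusCascade(nn.Module):
    """Predecessor modules are gradually swapped for one shared successor."""
    def __init__(self, predecessors, successor):
        super().__init__()
        self.predecessors = nn.ModuleList(predecessors)
        self.successor = successor      # single module shared across all slots
        self.p_replace = 0.0            # replacement probability, ramped during training

    def forward(self, x):
        for pred in self.predecessors:
            if self.training and torch.rand(()).item() < self.p_replace:
                x = self.successor(x)   # shared successor stands in for this module
            else:
                x = pred(x)
        return x

preds = [nn.Linear(16, 16) for _ in range(6)]          # stand-ins for Transformer modules
model = TheseusCascade(preds, successor=nn.Linear(16, 16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(1000):                               # fine-tuning on a small data set
    model.p_replace = min(1.0, step / 500)             # gradual replacement schedule
    loss = model(torch.randn(4, 16)).pow(2).mean()     # placeholder task loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Once p_replace reaches 1, only the shared successor executes, so the deployed model keeps a single module's parameters in place of all predecessors; under this reading, that sharing is where the compression beyond plain module replacement comes from.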

Description

Technical Field

[0001] The invention relates to the field of natural language processing, in particular to a deep Transformer cascaded neural network model compression algorithm.

Background Technique

[0002] In recent years, as deep learning has achieved great success in the field of images, it has also made breakthroughs in the field of natural language processing. Cascaded neural networks based on deep Transformers have achieved strong performance under the new paradigm of natural language processing, namely self-supervised pre-training plus supervised fine-tuning, and have repeatedly set new records on the GLUE leaderboard, becoming one of the new research hotspots in natural language processing. As one of the classic models, BERT has broad application prospects in text-based user profiling, sentiment analysis and public opinion analysis. However, such models are often large, with millions or even billions of parameters, resulting in high memory usage and d...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/08; G06N3/04
CPC: G06N3/082; G06N3/084; G06N3/047; G06N3/045
Inventors: 陈轶, 张文, 崔浩亮, 牛少彰, 王让定
Owner: 东南数字经济发展研究院