Language model compression method based on uncertainty estimation knowledge distillation

A technology involving uncertainty estimation and language models, applied in the field of pre-trained language model compression. It addresses problems such as low network compression rate, low efficiency, and large computational burden, and achieves the effects of reducing the number of parameters, improving training efficiency, and improving inference performance.

Pending Publication Date: 2022-07-29
XIDIAN UNIV
Cites: 0 · Cited by: 3
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

The scheme is verified on the Chinese named entity recognition (NER) task. Although the computational cost of that compression method is modest, the accuracy of the compressed network drops by 1-2 percentage points compared with the original network, and the network compression rate is low, which reduces the efficiency of model operation and wastes a large amount of computing resources.
[0006] The disadvantages of the above-mentioned...



Examples


Embodiment Construction

[0031] The present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.

[0032] Referring to figure 1, a pre-trained language model compression method based on uncertainty estimation knowledge distillation is implemented in the following steps:

[0033] Step 1. Obtain training and testing datasets.

[0034] The data sets are obtained from GLUE, the public General Language Understanding Evaluation benchmark. GLUE covers a variety of common natural language processing tasks and therefore provides a comprehensive test of a language model's performance.
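The patent does not name a data-loading tool; as one hedged sketch, the GLUE tasks can be fetched with the Hugging Face datasets library (an assumption for illustration, not part of the invention):

```python
# Sketch only: obtain GLUE data sets such as CoLA for the experiments.
# The use of the Hugging Face `datasets` library is an assumption, not the patent's requirement.
from datasets import load_dataset

cola = load_dataset("glue", "cola")              # single-sentence acceptability task
train_set, dev_set = cola["train"], cola["validation"]
print(train_set[0])                              # e.g. {'sentence': ..., 'label': 1, 'idx': 0}
```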

[0035] This embodiment uses the following four data sets from GLUE, on which the subsequent experimental test tasks are performed:

[0036] First, the Corpus of Linguistic Acceptability (CoLA), a single-sentence classification task whose corpus comes from books and journal articles on linguistic theory, with each word sequence labeled as grammatical or ungrammatical;

[0037] Secon...



Abstract

The invention discloses a language model compression method based on uncertainty-estimation knowledge distillation, and mainly solves the problems of high training cost, low speed, and noise interference in the knowledge distillation process of existing network compression techniques. According to the implementation scheme, the method comprises the steps of 1) performing half-and-half compression on the original language model to obtain a compressed neural network; 2) initializing the parameters of the compressed neural network from the original language model; 3) adding a parameter distillation loss function for the feed-forward network structure, and designing an uncertainty estimation loss function and a cross-entropy loss function for the natural language processing task; and 4) training the compressed neural network model with the designed loss functions. The method reduces the amount of computation in network compression training, improves the network compression rate, and increases the network inference speed; it can be widely applied to model deployment and model compression tasks, and provides a new model compression solution for application scenarios with scarce hardware resources.
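As a rough illustration of steps 1) to 4), the sketch below assumes a 12-layer BERT-style teacher loaded with the Hugging Face transformers library and PyTorch; the choice of kept layers, the exact loss forms, and the learned uncertainty weighting are illustrative assumptions rather than the patent's precise formulation.

```python
# Hedged sketch of the abstract's four steps; the libraries, kept layers and loss forms
# are assumptions for illustration, not the patent's exact method.
import copy
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification

teacher = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Step 1: "half-and-half" compression -- keep every second encoder layer (6 of 12).
keep = list(range(0, teacher.config.num_hidden_layers, 2))
student = copy.deepcopy(teacher)                                   # Step 2: initialise from the teacher
student.bert.encoder.layer = torch.nn.ModuleList(student.bert.encoder.layer[i] for i in keep)
student.config.num_hidden_layers = len(keep)

log_sigma = torch.nn.Parameter(torch.zeros(()))                    # learned uncertainty weight (assumed form)

def total_loss(student_out, teacher_out, labels):
    # Step 3a: parameter-distillation loss on the feed-forward (FFN) sublayers.
    ffn = sum(F.mse_loss(student.bert.encoder.layer[j].intermediate.dense.weight,
                         teacher.bert.encoder.layer[i].intermediate.dense.weight)
              for j, i in enumerate(keep))
    # Step 3b: cross-entropy loss of the downstream classification task.
    ce = F.cross_entropy(student_out.logits, labels)
    # Step 3c: teacher-student distillation term weighted by the learned uncertainty.
    kd = F.kl_div(F.log_softmax(student_out.logits, dim=-1),
                  F.softmax(teacher_out.logits, dim=-1), reduction="batchmean")
    return ffn + ce + torch.exp(-log_sigma) * kd + log_sigma

# Step 4: the student would then be trained on the task data with this combined loss
# (teacher frozen), e.g. inside a standard PyTorch training loop.
```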

Description

technical field

[0001] The invention belongs to the field of neural network compression, and in particular relates to a compression method for pre-trained language models, which can be used for model deployment, model compression, and alleviating the hardware burden of models.

Background technique

[0002] In recent years, the natural language processing research community has witnessed a revolution in pre-training and self-supervised models, with the research and application of large-scale pre-trained language models that are first pre-trained on large-scale text data and then transferred to downstream tasks; pre-training followed by fine-tuning has gradually become the basic paradigm for natural language processing solutions. The emergence of BERT has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are often computationally expensive and memory-intensive. These models typically have hundreds of millions of parame...

Claims


Application Information

IPC(8): G06N3/08, G06N3/04
CPC: G06N3/082, G06N3/048, G06N3/045
Inventor: 董伟生, 黄天瑜, 毋芳芳, 石光明
Owner XIDIAN UNIV