Compression method and system for multi-language BERT sequence labeling model

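Sequence labeling and multilingual technology, applied in the field of knowledge distillation for BERT-like models. It addresses two problems: existing methods do not take into account that the multilingual BERT model is not superior to the monolingual BERT model in all language categories, and a single student model cannot effectively fit the output of the teacher model.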

Active Publication Date: 2021-04-06
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

[0004] The existing technologies described above have the following technical defects. In related technologies, knowledge distillation for the multilingual BERT sequence labeling model adopts a one-to-one training method, that is, distillation from a multilingual BERT teacher model to a multilingual BERT student model. This type of method does not take into account that the multilingual BERT model is not superior to the monolingual BERT model in all language categories. In addition, because of the large difference in structural complexity between the student model and the teacher model, a single student model cannot efficiently fit the output of the teacher model.


Examples

Embodiment Construction

[0095] The present invention will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several changes and improvements without departing from the concept of the present invention. These all belong to the protection scope of the present invention.

[0096] An embodiment of the present invention provides a compression method for the multilingual BERT sequence labeling model. Referring to Figure 1, the method includes:

[0097] Step 1: Extract vocabulary from multilingual corpus based on Wordpiece algorithm;

[0098] Step 2: Pre-train the multilingual BERT teacher model and the multilingual BERT student model;

[0099] Step 3: Fine-tune the multilingual / monolingual BERT teacher models on manually labeled downstream task data;

[0100] Step 4: Use the multi...
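
Step 1 names the Wordpiece algorithm but this page does not show how the patent applies it. As a rough illustration only, the sketch below builds a toy WordPiece-style vocabulary from word counts by repeatedly merging the adjacent pair with the highest score `freq(pair) / (freq(first) * freq(second))`. The function name `train_wordpiece_vocab`, the `##` continuation prefix, and the tie-breaking rule are assumptions made for this example, not details taken from the patent.

```python
from collections import Counter


def train_wordpiece_vocab(word_counts, vocab_size):
    """Toy WordPiece-style vocabulary builder (illustrative sketch only).

    word_counts: dict mapping non-empty words to their corpus frequency.
    vocab_size:  stop once the vocabulary reaches this many pieces.
    """
    # Start from single characters; non-initial pieces carry a "##" prefix.
    splits = {
        w: [w[0]] + ["##" + c for c in w[1:]]
        for w in word_counts
        if w
    }
    vocab = {p for pieces in splits.values() for p in pieces}

    while len(vocab) < vocab_size:
        piece_freq, pair_freq = Counter(), Counter()
        for w, pieces in splits.items():
            n = word_counts[w]
            for p in pieces:
                piece_freq[p] += n
            for a, b in zip(pieces, pieces[1:]):
                pair_freq[(a, b)] += n
        if not pair_freq:
            break  # every word is already a single piece

        # Score each pair; ties are broken by sorted order (an assumption).
        best = max(
            sorted(pair_freq),
            key=lambda ab: pair_freq[ab] / (piece_freq[ab[0]] * piece_freq[ab[1]]),
        )
        # The second piece is never word-initial, so it always starts with "##".
        merged = best[0] + best[1][2:]
        vocab.add(merged)

        # Apply the merge to every word that contains the pair.
        for w, pieces in splits.items():
            out, i = [], 0
            while i < len(pieces):
                if i + 1 < len(pieces) and (pieces[i], pieces[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(pieces[i])
                    i += 1
            splits[w] = out
    return vocab


if __name__ == "__main__":
    counts = {"hug": 10, "pug": 5, "hugs": 5}  # hypothetical toy corpus
    print(sorted(train_wordpiece_vocab(counts, vocab_size=7)))
```

In a multilingual setting the word counts would be gathered across all languages in the corpus, so that frequent subword pieces are shared between languages.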

Abstract

The invention provides a compression method and system for a multilingual BERT sequence labeling model, and relates to the technical field of knowledge distillation of BERT models. The method comprises the following steps: 1, extracting a vocabulary from a multilingual corpus based on the Wordpiece algorithm; 2, pre-training the multilingual / monolingual BERT teacher models and the multilingual BERT student model; 3, fine-tuning the multilingual / monolingual BERT teacher models on manually labeled downstream task data; 4, performing residual knowledge distillation on the pre-trained multilingual BERT student model using the multilingual / monolingual BERT teacher models; and 5, fine-tuning the distilled multilingual BERT student model on manually labeled downstream task data. Through residual learning and many-to-one knowledge distillation, the accuracy and generalization of the student model are improved, and the hardware resources required to deploy a BERT-type sequence labeling model in a multilingual environment are reduced.
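
The abstract names residual learning and many-to-one distillation without giving formulas on this page, so the following is only a sketch under stated assumptions. It assumes that (a) the teacher for a sentence is the monolingual model for that language when one exists and the multilingual model otherwise, (b) the distillation loss is the per-token cross-entropy between temperature-softened teacher and student label distributions, and (c) the "residual" is the teacher's logits minus a first student's logits, which a further student could be trained to fit. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one token's label logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_teacher(language, monolingual_teachers, multilingual_teacher):
    """Many-to-one selection (assumption): prefer a monolingual teacher if available."""
    return monolingual_teachers.get(language, multilingual_teacher)


def token_distill_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean per-token cross-entropy between teacher soft labels and student output.

    Both arguments are lists of per-token logit lists (one row per token).
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # teacher soft labels
        q = softmax(s_row, temperature)  # student predictions
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(teacher_logits)


def residual_targets(teacher_logits, student_logits):
    """Residual a further student could fit: teacher logits minus student logits."""
    return [
        [t - s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_logits, student_logits)
    ]


if __name__ == "__main__":
    # Hypothetical logits for a 2-token sentence with 3 labels.
    teacher = [[2.0, 0.0, -1.0], [0.0, 3.0, 0.0]]
    student = [[1.0, 0.5, -0.5], [0.5, 1.5, 0.0]]
    print(token_distill_loss(teacher, student))
    print(residual_targets(teacher, student))
```

For sequence labeling the loss is averaged over tokens rather than computed once per sentence, because every token has its own label distribution.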

Description

Technical field

[0001] The present invention relates to the technical field of knowledge distillation of BERT models, and in particular to a compression method and system for multilingual BERT sequence labeling models.

Background

[0002] BERT is a large-scale pre-trained language model based on the Transformer encoder. In recent years, BERT has performed strongly on many downstream tasks. Sequence labeling is a class of tasks that classify each element in a sequence. Common sequence labeling tasks include named entity recognition, part-of-speech tagging, and so on. In a multilingual environment, using multiple monolingual BERT models to model texts in different languages at the same time takes up a lot of computing resources. Moreover, for language categories with scarce training corpora, both BERT and traditional models struggle to achieve good results. Multilingual BERT can simultaneously model hundreds of languages ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/126, G06F40/242, G06F40/289, G06F40/295
CPC: G06F40/126, G06F40/242, G06F40/289, G06F40/295
Inventor: 撖朝润, 李琦, 傅洛伊, 王新兵
Owner SHANGHAI JIAO TONG UNIV