
Language model training method and device based on knowledge distillation and text classification method and device

A language model training method technology, applied in semantic analysis, natural language data processing, character and pattern recognition, etc. It addresses problems such as decreased accuracy, the inability to accurately transfer the teacher model's sentence grammar and semantic representations, and the student model's transfer ability that cannot be guaranteed, and achieves the effect of meeting application requirements and improving transfer ability.

Pending Publication Date: 2021-04-30
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing pre-trained language model distillation methods usually use a distillation method that aligns the output scores and the intermediate layers. This approach can make the student model's output scores close to the teacher model's output scores on the data of a specific task. However, when the model is tested on data from a new domain, the transfer ability of the distilled student model cannot be guaranteed, and the teacher model's rich sentence grammar and semantic representations cannot be accurately transferred. As a result, the student model's accuracy is much lower than the teacher model's, and cross-domain application requirements cannot be met.
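As a rough, hypothetical sketch of the alignment-style distillation described above (not code from the patent), the loss typically combines a KL-divergence term on the output scores with a mean-squared-error term on paired intermediate layers; the function names, the student-to-teacher layer pairing, and the temperature/weighting below are all assumptions.

```python
import torch.nn.functional as F

def alignment_distillation_loss(student_logits, teacher_logits,
                                student_hiddens, teacher_hiddens,
                                temperature=2.0, alpha=0.5):
    """Hypothetical output-score + intermediate-layer alignment loss.

    student_hiddens / teacher_hiddens: lists of already-paired hidden states,
    each of shape (batch, seq_len, hidden_size); the layer pairing is an
    assumption, not specified by the text above.
    """
    # Output-score alignment: soften both distributions and match them with KL.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    output_loss = F.kl_div(log_soft_student, soft_teacher,
                           reduction="batchmean") * (temperature ** 2)

    # Intermediate-layer alignment: MSE over each paired layer.
    layer_loss = sum(F.mse_loss(s, t)
                     for s, t in zip(student_hiddens, teacher_hiddens))
    layer_loss = layer_loss / max(len(student_hiddens), 1)

    return alpha * output_loss + (1 - alpha) * layer_loss
```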




Embodiment Construction

[0030] The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

[0031] Terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the embodiments of this application and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise; "multiple" generally includes at least two, but does not exclude the case of including at least one.

[0032] It ...



Abstract

The invention discloses a language model training method and device based on knowledge distillation, and a text classification method and device. The language model training method comprises: inputting a training corpus into a first model and a second model for processing to obtain corresponding intermediate-layer data and output results; calculating first hidden-layer sentence content and second hidden-layer sentence content from the corresponding intermediate-layer data; constructing contrastive-learning positive and negative examples based on the first and second hidden-layer sentence content; training the second model using the contrastive-learning positive and negative examples, the corresponding intermediate-layer data, and the output results; and determining the trained second model as the language model. In this way, the rich sentence grammar and semantic representations of the first model can be transferred to the second model, so that the second model obtained through distillation has better transfer ability and cross-domain application requirements are met.
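To make the contrastive step in the abstract more concrete, here is a minimal, hypothetical sketch (not taken from the patent): sentence-level representations are pooled from the intermediate-layer data of the first (teacher) and second (student) models, the teacher-student pair for the same sentence is treated as a positive example, and the other sentences in the batch serve as negative examples in an InfoNCE-style loss. The pooling choice, function names, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def sentence_content(hidden_states, attention_mask):
    """Hypothetical pooling of intermediate-layer data into hidden-layer
    sentence content: mean over non-padding tokens -> (batch, hidden)."""
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

def contrastive_distillation_loss(teacher_hidden, student_hidden,
                                  attention_mask, temperature=0.1):
    """InfoNCE-style loss: the teacher/student representations of the same
    sentence are positives; other sentences in the batch are negatives."""
    t = F.normalize(sentence_content(teacher_hidden, attention_mask), dim=-1)
    s = F.normalize(sentence_content(student_hidden, attention_mask), dim=-1)

    logits = s @ t.t() / temperature                              # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positive pairs
    return F.cross_entropy(logits, targets)
```

In a full training loop this contrastive term would presumably be combined with the output-score and intermediate-layer alignment losses mentioned in the background; the exact construction of the positive and negative examples and the weighting of the terms are not given in the text shown here.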

Description

Technical field

[0001] The present application relates to the fields of natural language processing and model compression, and in particular to a language model training method based on knowledge distillation, a text classification method, and corresponding devices.

Background technique

[0002] Knowledge distillation is a teacher-student model compression method proposed by Hinton et al. in 2015. It realizes knowledge transfer by introducing a large-scale teacher model to guide the training of a small-scale student model: a teacher model is trained first, and then the teacher model's outputs together with the data's annotation labels are used to train the student model, so that the student model not only learns how to judge the correct sample category from the labeled data but also learns inter-class relations from the teacher model.

[0003] The existing pre-trained language model distillation methods usually use the distillation method of aligning the output scores and the middle layers...
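For reference, the Hinton-style distillation described in paragraph [0002] is commonly implemented as a weighted sum of a hard-label cross-entropy term and a soft-target term against the teacher's temperature-softened outputs. The sketch below is a generic illustration of that idea, not the patent's method; the weighting and temperature are assumptions.

```python
import torch.nn.functional as F

def hinton_distillation_loss(student_logits, teacher_logits, labels,
                             temperature=4.0, alpha=0.7):
    """Generic teacher-student distillation loss (soft targets + hard labels)."""
    # Soft-target term: match the teacher's temperature-softened distribution.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                         soft_targets, reduction="batchmean") * (temperature ** 2)

    # Hard-label term: ordinary cross-entropy against the annotation labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss
```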


Application Information

IPC(8): G06F40/30, G06F40/211, G06K9/62
CPC: G06F40/30, G06F40/211, G06F18/214, Y02D10/00
Inventors: 朱磊, 孙瑜声, 李宝善
Owner: IFLYTEK CO LTD