Knowledge distillation method and device based on self-attention and computer device

A self-attention-based knowledge distillation method in the field of artificial intelligence. It addresses the problem that existing distillation losses cannot support knowledge distillation training for models of different task types.

CN112365385A (Active), Publication Date: 2021-02-12, 深圳市友杰智新科技有限公司 (Shenzhen Youjie Zhixin Technology Co., Ltd.)

Embodiment Construction

[0047] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0048] Referring to Figure 1, a self-attention-based knowledge distillation method according to an embodiment of the present application includes the following steps:

[0049] S1: Input the input data into the first model to obtain a first feature matrix output by an intermediate layer of the first model, and input the same input data into the second model to obtain a second feature matrix output by an intermediate layer of the second model, where the first model is a trained teacher model, the second model is a student model to be trained, and the first feature matrix and the second feature matrix ...
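
The abstract's next step computes a self-attention weight distribution from each intermediate feature matrix. The sketch below shows why this allows a teacher layer and a student layer of different widths to be compared. It rests on an assumption, because the text shown here gives no formula: the distribution is taken to be plain scaled dot-product self-attention, softmax(F Fᵀ / √d), with no learned projections. Under that assumption, an n × d feature matrix always produces an n × n weight matrix, whatever the value of d. The helper names `softmax` and `self_attention_weights` and the feature values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(features: Matrix) -> Matrix:
    """Row-wise softmax(F F^T / sqrt(d)) for an n x d feature matrix F.

    Assumption: plain scaled dot-product self-attention with no learned
    query/key projections; the patent text shown here gives no formula.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for query in features:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in features]
        weights.append(softmax(scores))
    return weights


# Hypothetical intermediate-layer outputs for the same input of 3 positions:
# the teacher layer is 4 wide, the student layer only 2 wide.
teacher_features = [[0.2, 1.0, -0.5, 0.3],
                    [0.9, -0.1, 0.4, 0.0],
                    [-0.3, 0.5, 0.8, -1.2]]
student_features = [[0.1, 0.7],
                    [0.6, -0.2],
                    [-0.4, 0.9]]

teacher_attn = self_attention_weights(teacher_features)
student_attn = self_attention_weights(student_features)
print(len(teacher_attn), len(teacher_attn[0]))  # 3 3
print(len(student_attn), len(student_attn[0]))  # 3 3
```

Both weight matrices are 3 × 3, and each row is a probability distribution over positions. The two models therefore need to agree only on the number of positions, not on their feature width.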

Abstract

The invention relates to the field of artificial intelligence and discloses a self-attention-based knowledge distillation method comprising the following steps. Input data is fed into a first model to obtain a first feature matrix output by an intermediate layer of the first model, and the same input data is fed into a second model to obtain a second feature matrix output by an intermediate layer of the second model; the first model is a trained teacher model, and the second model is a student model to be trained. A first self-attention weight distribution for the teacher model is calculated from the first feature matrix, and a second self-attention weight distribution for the student model is calculated from the second feature matrix. The distribution difference between the first and second self-attention weight distributions is calculated and used as the knowledge distillation loss function between the teacher model and the student model. According to this loss function, the data mapping relationship of the teacher model's intermediate layer is transferred to the student model's intermediate layer. The method thereby supports knowledge distillation training for models of different task types.
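
The abstract's loss can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract says only "distribution difference". Here that difference is taken to be the KL divergence KL(teacher ‖ student), computed for each row of the attention maps and averaged over rows, and the attention map is again assumed to be softmax(F Fᵀ / √d). All function names and the feature values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(features: Matrix) -> Matrix:
    """Assumed form: row-wise softmax(F F^T / sqrt(d)), no learned projections."""
    scale = 1.0 / math.sqrt(len(features[0]))
    return [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in features])
            for q in features]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def attention_distillation_loss(teacher_features: Matrix, student_features: Matrix) -> float:
    """Mean row-wise KL(teacher || student) between the two self-attention maps.

    Hypothetical helper: the choice of KL divergence, its direction and the
    averaging over rows are assumptions, not taken from the patent.
    """
    if len(teacher_features) != len(student_features):
        raise ValueError("teacher and student must cover the same number of positions")
    t_attn = self_attention_weights(teacher_features)
    s_attn = self_attention_weights(student_features)
    return sum(kl_divergence(t, s) for t, s in zip(t_attn, s_attn)) / len(t_attn)


# Hypothetical intermediate-layer outputs: 3 positions, teacher width 4, student width 2.
teacher = [[0.2, 1.0, -0.5, 0.3], [0.9, -0.1, 0.4, 0.0], [-0.3, 0.5, 0.8, -1.2]]
student = [[0.1, 0.7], [0.6, -0.2], [-0.4, 0.9]]

print(f"distillation loss: {attention_distillation_loss(teacher, student):.4f}")
```

The loss is zero when the two attention maps coincide and positive otherwise. In actual training, it would be minimized with respect to the student's parameters using an automatic-differentiation framework. This sketch only evaluates the loss.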

Description

Technical Field

[0001] This application relates to the field of artificial intelligence, and in particular to a self-attention-based knowledge distillation method, device, and computer equipment.

Background

[0002] Knowledge distillation is a special form of transfer learning whose purpose is to compress the size of a trained model while preserving its performance. A trained teacher model guides the learning of a smaller student model to be trained, so that the small model acquires the knowledge of the large model through training. Compared with training the small model directly, this gives better results and is faster.

[0003] At present, the loss functions used for knowledge distillation are mostly designed for classification models. They require the large and small models to have the same number of categories, or the same network output feature dimension. This limits the range of application of knowledge distillation and cannot meet ...
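
The restriction described in [0003] can be made concrete with a classic logit-based soft-target loss: a temperature-scaled KL divergence between the teacher's and the student's class probabilities, multiplied by T². This is a widely used formulation, but the patent does not name it. It is shown here only as an example of the kind of loss [0003] refers to. `soft_target_kd_loss` and the temperature of 4.0 are illustrative choices.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kd_loss(teacher_logits: List[float], student_logits: List[float],
                        temperature: float = 4.0) -> float:
    """Classic logit-based distillation loss: T^2 * KL(p_teacher || p_student).

    The element-wise comparison only makes sense when both models predict the
    same set of classes, which is the restriction described in [0003].
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError(f"class count mismatch: teacher {len(teacher_logits)} "
                         f"vs student {len(student_logits)}")
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Same class count: the loss is defined.
print(soft_target_kd_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))

# Different class counts (e.g. a 3-class teacher and a 2-class student): rejected.
try:
    soft_target_kd_loss([2.0, 0.5, -1.0], [1.5, 0.7])
except ValueError as err:
    print(err)  # class count mismatch: teacher 3 vs student 2
```

Because the loss compares the two output distributions class by class, a teacher and a student with different output spaces cannot be paired. This is the limitation the self-attention approach is meant to remove.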

Application Information

Patent Timeline
12 Feb 2021
Publication
CN112365385A
IPC
G06Q50/20; G06Q10/06; G06N3/04; G06N20/00; G06F17/16
CPC
G06Q50/205; G06Q10/067; G06N20/00; G06F17/16; G06N3/045
Inventors
徐泓洋 (Xu Hongyang); 王广新 (Wang Guangxin)