
Model training method, device and electronic device based on knowledge distillation

A technology for model training and model recognition, applied in the computer field. It addresses the problems that large neural network models involve large data volumes, cannot be deployed on resource-limited devices, and are therefore limited in use, and it achieves reduced computation and a good compression effect.

Active Publication Date: 2022-08-05
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

However, to achieve a better learning effect, neural network models often have a large number of parameters and generally require substantial computing power for inference and deployment; that is, they occupy a large amount of computing resources during both the training and inference phases, so such large neural network models cannot be deployed on resource-limited devices.
In other words, although large-scale neural network models guarantee excellent performance, their large scale and large data volume impose high requirements on the deployment environment, which greatly limits the use of such models.


Image

  • Model training method, device and electronic device based on knowledge distillation (3 figures)

Detailed Description of Embodiments

[0025] Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding and should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

[0026] In the prior art, the Transformer model is a type of artificial intelligence model developed by a well-known Internet company. Recently, this model has been widely used in the field of computer vision (the CV field), where it has been shown to achieve excellent results. However, compared with other models (such as convolutional neural network models),...



Abstract

The present disclosure provides a model training method, apparatus, electronic device and storage medium based on knowledge distillation, relating to the computer field, in particular to artificial intelligence technologies such as computer vision and NLP. The specific implementation scheme is as follows: feature vectors obtained from training samples are input into a first coding layer and a second coding layer respectively, where the first coding layer belongs to a first model and the second coding layer belongs to a second model; the output of the first coding layer is aggregated to obtain a first feature vector; a second feature vector is determined from the output of the second coding layer; and the first feature vector and the second feature vector are distilled to obtain an updated first feature vector. This scheme is used for model-compression distillation training; it can be applied flexibly to any layer of the model and achieves a good compression effect. The compressed model can be used for image recognition and deployed on a variety of devices with limited computing power.
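The encode → aggregate → distill flow described in the abstract can be sketched as a toy example. This is a minimal illustration under assumed details (tanh "coding layers", mean pooling as the aggregation, an MSE distillation loss, plain gradient descent on the second model's layer); the patent text does not specify any of these, so every name and design choice below is hypothetical, not the patent's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 4, 8                      # tokens per sample, feature dimension
x = rng.normal(size=(T, D))      # feature vectors obtained from one training sample
w_first = rng.normal(size=(D, D)) / np.sqrt(D)    # first coding layer (first model)
w_second = rng.normal(size=(D, D)) / np.sqrt(D)   # second coding layer (second model)

def encode(x, w):
    """Toy 'coding layer': a linear map followed by tanh."""
    return np.tanh(x @ w)

def aggregate(h):
    """Aggregate per-token outputs into one feature vector (mean pooling)."""
    return h.mean(axis=0)

# First feature vector: aggregated output of the first coding layer (kept fixed).
f1 = aggregate(encode(x, w_first))

def loss_and_grad(w):
    """MSE between the two feature vectors, plus its analytic gradient."""
    a = np.tanh(x @ w)           # second coding layer activations, shape (T, D)
    f2 = a.mean(axis=0)          # second feature vector
    diff = f2 - f1
    loss = float(np.mean(diff ** 2))
    # d loss / d w[i, j] = (2/D) * diff[j] * mean_t x[t, i] * (1 - a[t, j]^2)
    grad = (x.T @ (1.0 - a ** 2)) / T * (2.0 / D) * diff
    return loss, grad

initial_loss, _ = loss_and_grad(w_second)
for _ in range(50):              # distillation: pull the second vector toward the first
    _, grad = loss_and_grad(w_second)
    w_second -= 0.5 * grad
final_loss, _ = loss_and_grad(w_second)
print(initial_loss, "->", final_loss)
```

With each step the second feature vector moves toward the first, so the distillation loss shrinks; in a real setup the same loss term would simply be added at whichever layer is being distilled.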

Description

technical field

[0001] The present disclosure relates to the field of computer technology, in particular to artificial intelligence technologies such as computer vision and NLP (Natural Language Processing), and more particularly to a model training method, apparatus, electronic device and storage medium based on knowledge distillation.

Background technique

[0002] With the development of information technology, neural network models are widely used in machine learning tasks such as computer vision, information retrieval, and information recognition. However, to achieve a better learning effect, neural network models often have a large number of parameters and generally require substantial computing power for inference and deployment. Such large neural network models cannot be deployed on resource-limited devices. That is to say, while ensuring excellent performance, due to the large scale of the model and the large amount of data, large neural network models often...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06V10/764; G06K9/62; G06V10/82; G06V10/774; G06N3/04; G06N3/08; G06N5/00
CPC: G06N5/00; G06N3/082; G06N3/047; G06F18/214; G06F18/2415; G06F18/2431; G06V10/82; G06V10/771; G06F18/2113; G06V10/50; G06V20/00; G06V10/72; G06V10/764
Inventor 李建伟
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD