Multi-module knowledge distillation method based on MoCo model

A multi-module distillation technique applied to neural network learning methods, character and pattern recognition, biological neural network models, and related fields, with the stated effects of improving accuracy, updating features effectively, and reducing errors.

Active Publication Date: 2022-07-22

CHINA UNIV OF MINING & TECH


AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a multi-module knowledge distillation method based on the MoCo model that solves the problem of training on large-scale datasets with limited memory, reducing the amount of computation and improving memory efficiency.
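The memory claim can be made concrete with the two mechanisms that the publicly described MoCo approach is known for: a fixed-capacity queue of key features and a momentum update of the key encoder. The following is a minimal sketch under stated assumptions, not the patented implementation; the names `FeatureQueue` and `momentum_update`, the queue capacity, and the momentum value are hypothetical choices for illustration.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style rule, element-wise: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class FeatureQueue:
    """Fixed-capacity FIFO of key features (hypothetical helper).

    Memory use is bounded by `capacity`, regardless of dataset size.
    """

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def enqueue(self, batch_features):
        # deque(maxlen=...) discards the oldest entries automatically.
        self.items.extend(batch_features)

    def __len__(self):
        return len(self.items)


queue = FeatureQueue(capacity=4)
for step in range(3):
    # Each illustrative batch contributes two 2-dimensional features.
    queue.enqueue([[float(step), 0.0], [float(step), 1.0]])

# Toy parameter vectors, assumed only for this example.
updated = momentum_update([1.0, 1.0], [0.0, 2.0], m=0.9)
```

Because the queue never grows beyond its capacity, the number of stored features stays constant while new batches keep replacing the oldest ones, which is one plausible reading of how the method keeps memory use limited.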


Examples


Embodiment 1

[0065] The multi-module knowledge distillation method based on the MoCo model of the present invention comprises the following steps:

[0066] Step S1: Randomly collect 5,000 labeled images from ImageNet, resize them to a uniform size, and then apply data augmentation to obtain 10,000 images of 256×256 pixels, which constitute the teacher network training set.

[0067] Step S2: Input the teacher network training set into the teacher network and pre-train it on this set to obtain a pre-trained teacher network.

[0068] Step S3: Randomly collect 50,000 images from Instagram, resize them to a uniform size, and then apply data augmentation to obtain 100,000 images of 256×256 pixels, which constitute the teacher-student network training set.

[0069] Step S4: Build a MoCo model for multi-module knowledge distillation:

[0070] The MoCo model includes a pre-training teacher network and a s...
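The data preparation in Steps S1 and S3 (resize to a uniform 256×256, then augment so that the image count doubles) can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not name the augmentation or the resizing method, so the horizontal flip, the nearest-neighbor resize, the one-augmented-copy-per-image rule (inferred from 5,000 becoming 10,000), and the function names are all hypothetical.

```python
def resize_nearest(image, size=256):
    """Resize a 2-D grid of pixel values to size x size (nearest neighbor)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def horizontal_flip(image):
    """Assumed augmentation: mirror each row left to right."""
    return [row[::-1] for row in image]


def build_training_set(images, size=256):
    """Return each resized image plus one augmented copy (2x the input count)."""
    out = []
    for img in images:
        resized = resize_nearest(img, size)
        out.append(resized)
        out.append(horizontal_flip(resized))
    return out


# Two tiny stand-in "images" of different sizes, used only for illustration.
raw_images = [
    [[1, 2, 3], [4, 5, 6]],
    [[0] * 4 for _ in range(4)],
]
training_set = build_training_set(raw_images)
```

With this rule, N collected images always yield 2N training images, which is consistent with the counts given in Steps S1 and S3.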


Abstract

The invention discloses a multi-module knowledge distillation method based on a MoCo model. Exploiting the similarity between features generated at intermediate stages, the method divides a teacher network and a student network into a plurality of corresponding modules, extracts the features generated by each module of the two networks through the MoCo model, calculates their similarity, and uses that similarity to distill knowledge from the teacher network to the student network, so that the teacher network guides the student network. The method can automatically and dynamically update sample features using only a small number of labels, is more memory efficient, and solves the problem of training on a large-scale dataset with limited memory. The student network trained under the teacher network's guidance is both robust and able to generalize.
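The module-wise comparison described in the abstract can be illustrated with a short sketch. It assumes cosine similarity as the similarity measure and an average of (1 - similarity) over modules as the distillation loss; the source states only that a similarity is calculated, so both choices and the function names are hypothetical. Feature vectors are assumed to be non-zero and of equal length per module pair.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def module_distillation_loss(teacher_feats, student_feats):
    """Average (1 - cosine similarity) over corresponding modules (assumed form)."""
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must have the same number of modules")
    losses = [1.0 - cosine_similarity(t, s)
              for t, s in zip(teacher_feats, student_feats)]
    return sum(losses) / len(losses)


# One toy feature vector per module; three modules per network.
teacher = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
student = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
loss = module_distillation_loss(teacher, student)
```

In this toy case the first two student modules already point in the same direction as the teacher's, so only the third module contributes to the loss. Minimizing such a loss would pull each student module's features toward those of the matching teacher module.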

Description

Technical field

[0001] The invention belongs to the field of model lightweighting technology, and in particular relates to a multi-module knowledge distillation method based on a MoCo model.

Background art

[0002] In recent years, machine learning and deep learning have made remarkable progress in computer vision, natural language processing, prediction, and audio processing. However, deploying these models on devices is difficult. In knowledge distillation, a large, cumbersome network (the teacher model) trained on a large dataset can transfer its learned knowledge to a smaller, lighter network (the student model).

[0003] Research on hint-based training of thin networks introduced a two-stage strategy for training deep networks, but achieved no significant speed improvement; deep mutual learning lets teacher and student networks learn from each other and update simultaneously, but it has difficulty extracting more detailed information, which leads to larger errors; in the regeneration ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06V10/774, G06V10/776, G06V10/778, G06V10/82, G06V10/74, G06V10/772, G06N3/08, G06N3/04, G06K9/62

CPC: G06N3/08, G06N3/084, G06N3/088, G06N3/045, G06F18/28, G06F18/2155, G06F18/217, G06F18/22, G06F18/214

Inventor: 王军, 袁静波, 刘新旺, 李玉莲, 李兵

Owner CHINA UNIV OF MINING & TECH
