
Image classification method based on linear self-attention Transformer

An image classification method using attention technology, applied in the field of computer vision. It addresses problems such as the large amount of computation required by existing models and their lack of translation invariance and locality, and achieves the effects of reducing computational complexity, reducing data dependence, and improving network performance.

Pending Publication Date: 2022-07-29
NANTONG UNIVERSITY

AI Technical Summary

Problems solved by technology

Therefore, when the input image resolution is too high, the amount of calculation required for image classification using ViT will be very large.
Moreover, compared with convolutional neural networks, ViT lacks inductive biases such as translation invariance and locality, so training a ViT model requires more data than training a convolutional neural network.
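The cost gap described above can be made concrete with rough multiply counts (illustrative arithmetic, not figures from the patent): for N tokens of dimension d, standard softmax attention costs on the order of N²·d, while a linear-attention variant costs on the order of N·d².

```python
# Illustrative arithmetic (not from the patent): why standard ViT
# self-attention becomes expensive at high input resolution.

def softmax_attention_flops(n_tokens: int, dim: int) -> int:
    """Rough multiply count for standard attention: Q@K^T plus A@V."""
    return 2 * n_tokens * n_tokens * dim

def linear_attention_flops(n_tokens: int, dim: int) -> int:
    """Rough multiply count for linear attention: K^T@V plus Q@(K^T@V)."""
    return 2 * n_tokens * dim * dim

if __name__ == "__main__":
    dim = 64
    for patches_per_side in (14, 28, 56):  # e.g. a 224px image at patch size 16/8/4
        n = patches_per_side ** 2
        quad = softmax_attention_flops(n, dim)
        lin = linear_attention_flops(n, dim)
        print(f"N={n:5d}: quadratic={quad:.3e}, linear={lin:.3e}, "
              f"ratio={quad / lin:.1f}x")
```

At N = 3136 tokens (a 56×56 grid), the quadratic variant is already about 49× more expensive than the linear one, and the gap grows with resolution.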


Image

Figure: Image classification method based on linear self-attention Transformer

Examples


Embodiment Construction

[0031] The present invention is described in detail below in conjunction with the accompanying drawings, so that those skilled in the art can better understand and implement it. The following embodiments are provided only to illustrate the invention and do not limit it.

[0032] As shown in Figure 1, in the image classification method based on a linear self-attention Transformer proposed by the present invention, the main body of the network model consists of four stages, each composed of an overlapping convolutional coding module and a Transformer module. The computational complexity of the Transformer module's attention mechanism is linear in the number of input tokens; compared with the ViT model and some of its variants, the computational complexity of the network model of the present invention is therefore significantly reduced. Information is modeled, the present invention use...
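The patent does not disclose its exact linear-attention formula in this excerpt, so the sketch below uses one common kernel-feature linearization, phi(Q)·(phi(K)ᵀV), whose cost is linear in the number of tokens N. All shapes and names are illustrative assumptions, not the patent's formulation.

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Positive feature map; elu(x) + 1 is one standard choice."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (N, d) arrays. Cost is O(N * d^2) instead of O(N^2 * d),
    because the (d, d) matrix K^T V is shared across all queries."""
    qf, kf = phi(q), phi(k)                        # (N, d)
    kv = kf.T @ v                                  # (d, d)
    z = qf @ kf.sum(axis=0, keepdims=True).T       # (N, 1) normalizer
    return (qf @ kv) / (z + eps)

rng = np.random.default_rng(0)
n, d = 196, 32                                     # e.g. a 14x14 token grid
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)                                   # (196, 32)
```

Note that no N×N attention matrix is ever materialized, which is the property that keeps the complexity linear in the token count.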



Abstract

The invention relates to the technical field of computer vision, in particular to an image classification method based on a linear self-attention Transformer. The method comprises the following steps: S1, sending a picture to the overlapping convolutional coding module of the first stage, which encodes the picture into picture tokens using a convolution operation; S2, sending the picture tokens into the Transformer module of that stage and extracting picture feature vectors; S3, sending the extracted picture feature vectors to the overlapping convolutional coding module of the next stage, which increases the feature-vector dimensionality while reducing the number of feature vectors; S4, repeating steps S2 and S3 until the Transformer module of the last stage produces the final output vector; and S5, converting the final output vector into a probability representation through a classifier unit, completing the image classification. The method can effectively extract image features, remarkably reduces the computational complexity of the Transformer module, and improves the model's ability to extract image features through the overlapping convolutional coding module and the convolutional feed-forward neural network module.
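The staged pipeline of steps S1–S5 can be traced as a shape walk-through. The concrete strides and channel widths below are illustrative assumptions (the abstract gives none); what matters is the pattern each stage repeats: fewer tokens, wider features.

```python
# Hedged shape walk-through of steps S1-S5; dimensions are assumed, not
# taken from the patent. Each stage's overlapping convolutional coding
# downsamples the token grid while widening the feature dimension (S1/S3),
# and a Transformer module then refines the tokens at that scale (S2).

def overlap_encode(grid, dim, stride=2):
    """Model the token-count / dimension effect of an overlapping conv
    encoder: spatial size shrinks by `stride`, channels become `dim`."""
    h, w, _ = grid
    return (h // stride, w // stride, dim)

def run_stages(image_hw=224, dims=(64, 128, 256, 512)):
    grid = (image_hw, image_hw, 3)                 # input RGB picture
    for i, dim in enumerate(dims):
        # Assume a stride-4 patch-like encoding first, stride 2 afterwards.
        grid = overlap_encode(grid, dim, stride=4 if i == 0 else 2)
        # ... the Transformer module runs here on h*w tokens of size dim ...
        print(f"stage {i + 1}: {grid[0] * grid[1]} tokens of dim {grid[2]}")
    return grid                                    # S4: last stage's output

final = run_stages()
# S5: a classifier head would map the final vector to class probabilities.
```

Under these assumed strides, a 224-pixel input passes through grids of 3136, 784, 196, and finally 49 tokens, mirroring the "reduce the number of feature vectors while increasing their dimensionality" behavior of step S3.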

Description

technical field

[0001] The invention relates to the technical field of computer vision, in particular to an image classification method based on a linear self-attention Transformer.

Background technique

[0002] Since the revolutionary performance of AlexNet in the ImageNet classification challenge, CNN architectures have developed rapidly. Deeper and more efficient convolutional network structures have since been proposed, further promoting the wave of deep learning in computer vision, such as VGG, GoogLeNet, ResNet, DenseNet, HRNet, and EfficientNet. CNNs and their variants have become the main backbone architectures for computer vision applications.

[0003] The Transformer was first proposed by the Google team in 2017 for translation tasks in natural language processing (NLP). It models long-range dependencies with a multi-head attention mechanism and improves computational efficiency through parallel computation. Therefore, the Transform...


Application Information

IPC(8): G06K9/62; G06N3/04; G06N3/08; G06V10/764; G06V10/82
CPC: G06N3/08; G06N3/045; G06F18/241
Inventor: 王则林, 徐昂, 陈方宁, 张玮业, 刘欣珂
Owner NANTONG UNIVERSITY