Multi-label image classification method based on multi-scale and cross-modal attention mechanism

A multi-label image classification method using multi-scale and cross-modal attention techniques, applicable to neural network learning methods, computer components, and character and pattern recognition. It addresses three problems: limited feature extraction, insufficient utilization of label features and image features, and the single angle from which image features and label features are fused. The stated effects are more angles of information utilization, greater richness, and improved prediction performance.

Active Publication Date: 2021-11-16
SOUTH CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] 1. Because the size of the images input to the model is fixed, feature extraction is limited.
[0006] 2. Within a single model, image features and label features are fused from only one angle, so neither the label features nor the image features are fully utilized.
[0007] When establishing the relationship between local image regions and label features, the representation ability of the label features is insufficient. There is also more to explore in how the learned semantic attention is used (CN2020111001588: a cross-modal-based fast multi-label image classification method and system).



Examples


Embodiment

[0072] A multi-label image classification method based on multi-scale and cross-modal attention mechanisms, as shown in Figure 1 and Figure 2, includes the following steps:

[0073] S1. Construct a label graph and learn label features through a graph convolutional neural network;

[0074] In this embodiment, the first training set, MS-COCO, is obtained, and the number of occurrences of each label class in it is counted. The conditional probability between any two label classes is calculated from these counts, and all the conditional probabilities together form a relationship matrix A. The label word-vector matrix H and the relationship matrix A are then input into the graph convolutional neural network (GCN) to obtain the co-occurrence relationship word-vector matrix W corresponding to all C label classes.
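
The construction described in [0074] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `cooccurrence_matrix` and `gcn_layer` are hypothetical, the entry A[i, j] is assumed to estimate P(label j | label i) from co-occurrence counts, and the symmetric normalization in the GCN step is the commonly used form, which the text above does not specify.

```python
import numpy as np

def cooccurrence_matrix(label_sets, num_classes):
    """Estimate A[i, j] ~ P(label j present | label i present) from counts.

    label_sets: iterable of per-image label index lists (assumed input format).
    """
    counts = np.zeros(num_classes)
    pair_counts = np.zeros((num_classes, num_classes))
    for labels in label_sets:
        unique = sorted(set(labels))
        for i in unique:
            counts[i] += 1
            for j in unique:
                if i != j:
                    pair_counts[i, j] += 1
    # Divide row i by the count of label i; rows for unseen labels stay zero.
    safe = np.where(counts > 0, counts, 1.0)
    return pair_counts / safe[:, None]

def gcn_layer(A, H, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H weight).

    The normalization is an assumption; the source only says A and H
    are fed to a GCN.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)                 # row degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ weight, 0.0)

# Toy data: 4 images, 3 label classes, 4-dimensional label word vectors.
label_sets = [[0, 1], [0, 2], [0, 1, 2], [1]]
A = cooccurrence_matrix(label_sets, num_classes=3)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                 # stand-in for label word vectors
W = gcn_layer(A, H, rng.normal(size=(4, 2)))
```

In this toy data, label 2 never appears without label 0, so A[2, 0] is 1, while A[0, 2] is only 2/3: the relationship matrix is generally asymmetric.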

[0075] Step S1 specifically includes the following steps:

[0076] S1.1. Count the number of occurre...



Abstract

The invention discloses a multi-label image classification method based on a multi-scale and cross-modal attention mechanism. The method comprises the following steps: constructing a label graph and learning label features through a graph convolutional neural network; obtaining an image to be classified and extracting image features with a pre-trained convolutional neural network; constructing a classification model and inputting the obtained label features and image features into an MSML-GCN module and a GCN-SGA module, respectively, for feature fusion; fusing the resulting predictions to obtain the final predicted labels, and iteratively training the classification model with a multi-label classification loss function to obtain a trained classification model; and inputting the extracted image features of the image to be classified into the trained classification model to obtain the multi-label classification result. The method addresses the technical problem that existing image classification methods classify poorly because they fail to fully learn the dependency relationships between labels.
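
The prediction fusion and training loss mentioned in the abstract can be illustrated with a short sketch. The abstract does not say how the two module outputs are fused or which multi-label loss is used, so both choices here are assumptions: a weighted average of the two branches' logits, and per-label binary cross-entropy. The names `fuse_predictions` and `multilabel_bce` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_predictions(logits_a, logits_b, weight_a=0.5):
    """Weighted average of two branches' logits (assumed fusion rule)."""
    return weight_a * logits_a + (1.0 - weight_a) * logits_b

def multilabel_bce(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over all images and labels (assumed loss)."""
    probs = sigmoid(logits)
    loss = -(targets * np.log(probs + eps)
             + (1.0 - targets) * np.log(1.0 - probs + eps))
    return loss.mean()

# Toy example: 2 images, 3 labels, one logit matrix per branch.
branch_a = np.array([[2.0, -1.0, 0.5], [-0.5, 0.0, 3.0]])
branch_b = np.array([[1.0, -2.0, 1.5], [0.5, -1.0, 2.0]])
targets = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]])

fused = fuse_predictions(branch_a, branch_b)
loss = multilabel_bce(fused, targets)
predicted = (sigmoid(fused) > 0.5).astype(int)  # threshold each label independently
```

Each label is thresholded on its own, so an image can receive any number of labels, which is what distinguishes the multi-label setting from single-label softmax classification.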

Description

Technical field

[0001] The invention relates to the field of multi-label image classification, in particular to a multi-label image classification method based on a multi-scale and cross-modal attention mechanism.

Background art

[0002] Multi-label image classification is now widely used in computer vision, including multi-target recognition, sentiment analysis, and medical diagnosis. Because each image contains multiple objects, effectively learning the relationships between these objects and fusing those relationships with image features remains challenging. For learning label features, the mainstream approaches are simple fully connected networks and the graph neural networks that have become popular in recent years. Fully connected network learning has a weak ability to represent label dependencies, while graph neural networks are not capable of label d...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045, G06F18/2431, G06F18/22, G06F18/214
Inventor: 余松森, 许飞腾, 梁军
Owner: SOUTH CHINA NORMAL UNIVERSITY