Multi-label image classification method based on multi-scale and cross-modal attention mechanism

A classification method using multi-scale technology, applied in neural learning methods, computer components, and character and pattern recognition. It addresses the restricted feature extraction, the insufficient use of label and image features, and the single perspective from which image and label features are fused. In doing so, it broadens the use of available information, improves prediction, and enriches the feature representation.

Active Publication Date: 2021-11-16

SOUTH CHINA NORMAL UNIVERSITY

Cites: 8; Cited by: 6

AI Technical Summary

Problems solved by technology

[0005] 1. Because the model's input image size is fixed, feature extraction is restricted.

[0006] 2. Within a single model, image features and label features are fused from only one perspective, so neither the label features nor the image features are fully utilized.

[0007] Methods that relate local image regions to label features suffer from insufficient label-feature representation ability. In addition, there is still room to explore how the learned semantic attention can be used (CN2020111001588: A cross-modal based state-of-the-art fast multi-label image classification method and system).

Embodiment

[0072] As shown in Figure 1 and Figure 2, a multi-label image classification method based on multi-scale and cross-modal attention mechanisms includes the following steps:

[0073] S1. Construct a label graph and learn label features through a graph convolutional neural network;

[0074] In this embodiment, MS-COCO is used as the first training set. The number of occurrences of each label in the first training set is counted, and the conditional probability between every pair of labels is computed from these counts. All conditional probabilities together form a relationship matrix A. The label word-vector matrix H and the relationship matrix A are then input into the graph convolutional neural network (GCN) to obtain the co-occurrence word-vector matrix W for all C label classes.
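The excerpt does not give the exact formulas for step S1, so the following Python sketch is only an assumption modeled on the common construction used in GCN-based multi-label classifiers: A[i, j] is the fraction of training images containing label i that also contain label j, and each GCN layer computes LeakyReLU(normalized(A + I) · H · weights). The row normalization, the LeakyReLU slope, the layer sizes, and the function names `conditional_probability_matrix` and `gcn_layer` are hypothetical; the patent may threshold or reweight A differently.

```python
import numpy as np


def conditional_probability_matrix(labels: np.ndarray) -> np.ndarray:
    """Build A with A[i, j] = P(label j | label i) from a binary label matrix.

    labels: (num_images, C) array, 1 if the image carries that label.
    """
    labels = labels.astype(np.float64)
    counts = labels.sum(axis=0)            # N_i: images containing label i
    co_occur = labels.T @ labels           # M_ij: images containing labels i and j
    np.fill_diagonal(co_occur, 0.0)        # self co-occurrence is not a relation
    with np.errstate(divide="ignore", invalid="ignore"):
        A = np.where(counts[:, None] > 0, co_occur / counts[:, None], 0.0)
    return A


def gcn_layer(A: np.ndarray, H: np.ndarray, weight: np.ndarray,
              slope: float = 0.2) -> np.ndarray:
    """One graph-convolution step (assumed form): LeakyReLU(D^-1 (A + I) H weight)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)   # row normalization
    out = A_norm @ H @ weight
    return np.where(out > 0, out, slope * out)          # LeakyReLU


# Toy example: 4 images, C = 3 labels, 4-dimensional word vectors.
labels = np.array([[1, 1, 0],
                   [1, 0, 1],
                   [1, 1, 0],
                   [0, 0, 1]])
A = conditional_probability_matrix(labels)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                # hypothetical label word vectors
W1 = rng.normal(size=(4, 8))               # hypothetical layer weights
W2 = rng.normal(size=(8, 5))
W = gcn_layer(A, gcn_layer(A, H, W1), W2)  # two stacked layers, shape (C, 5)
print(np.round(A, 3))
print(W.shape)
```

In this toy data, label 1 always appears together with label 0, so A[1, 0] is 1.0 while A[0, 1] is only 2/3; the asymmetry is why row normalization is used here instead of a symmetric one.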

[0075] Step S1 specifically includes the following steps:

[0076] S1.1. Count the number of occurre...

Abstract

The invention discloses a multi-label image classification method based on a multi-scale and cross-modal attention mechanism. The method comprises the following steps: constructing a label graph and learning label features through a graph convolutional neural network; obtaining an image to be classified and extracting its image features with a pre-trained convolutional neural network; constructing a classification model and inputting the obtained label features and image features into an MSML-GCN module and a GCN-SGA module, respectively, for feature fusion; fusing the two modules' prediction results to obtain the final predicted labels, and iteratively training the classification model with a multi-label classification loss function to obtain a trained classification model; and inputting the extracted image features of the image to be classified into the trained classification model to obtain the multi-label classification result. The method addresses the poor classification performance of existing image classification methods, which do not fully learn the dependency relationships between labels.
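The abstract names the two modules (MSML-GCN and GCN-SGA) but does not describe their internals here. The Python sketch below only illustrates the surrounding steps under stated assumptions: each branch scores labels by a dot product between image features and label features, the two branch predictions are fused by averaging their logits, and the multi-label classification loss is taken to be binary cross-entropy with logits. The function names, the averaging rule, and the choice of loss are hypothetical, and the two branches are stand-ins rather than the patented modules.

```python
import numpy as np


def branch_logits(image_feat: np.ndarray, label_feat: np.ndarray) -> np.ndarray:
    """Score each label by the dot product of image and label features.

    image_feat: (batch, D); label_feat: (C, D) -> logits of shape (batch, C).
    """
    return image_feat @ label_feat.T


def fuse(logits_a: np.ndarray, logits_b: np.ndarray) -> np.ndarray:
    """Hypothetical fusion rule: average the two branches' logits."""
    return (logits_a + logits_b) / 2.0


def multilabel_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Binary cross-entropy with logits, averaged over batch and labels."""
    # log(1 + exp(-x)) and log(1 + exp(x)) computed stably with logaddexp
    per_label = (targets * np.logaddexp(0.0, -logits)
                 + (1.0 - targets) * np.logaddexp(0.0, logits))
    return float(per_label.mean())


def predict(logits: np.ndarray) -> np.ndarray:
    """A label is predicted when sigmoid(logit) > 0.5, i.e. logit > 0."""
    return (logits > 0).astype(int)


rng = np.random.default_rng(0)
image_feat = rng.normal(size=(2, 16))      # stand-in for CNN image features
label_feat = rng.normal(size=(3, 16))      # stand-in for GCN label features
targets = np.array([[1, 0, 1], [0, 1, 0]])
# Two stand-in branches; the real MSML-GCN and GCN-SGA modules are not shown.
logits_a = branch_logits(image_feat, label_feat)
logits_b = branch_logits(image_feat * 0.5, label_feat)
final = fuse(logits_a, logits_b)
print(multilabel_loss(final, targets), predict(final))
```

Averaging logits is just one simple fusion choice; a learned weighted sum, or averaging sigmoid probabilities, would slot into `fuse` without changing the rest of the sketch.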

Description

Technical Field

[0001] The invention relates to the field of multi-label image classification, and in particular to a multi-label image classification method based on a multi-scale and cross-modal attention mechanism.

Background

[0002] Multi-label image classification is now widely used in computer vision, including multi-target recognition, sentiment analysis, and medical diagnosis. Because each image contains multiple objects, how to effectively learn the relationships between these objects and how to fuse those relationships with image features remain challenging. For learning label features, the mainstream approaches are simple fully connected networks and, more recently, graph neural networks. Fully connected networks have a weak ability to represent label dependencies, while graph neural networks are not capable of label d...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045, G06F18/2431, G06F18/22, G06F18/214

Inventors: 余松森 (Yu Songsen), 许飞腾 (Xu Feiteng), 梁军 (Liang Jun)

Owner: SOUTH CHINA NORMAL UNIVERSITY
