An Image Description Method Based on Conditional Random Fields and Intrinsic Semantic Attention

A conditional random field and image description technique, applied in the field of semantic description, addresses monotonous image descriptions, inaccurate results, and time-consuming processing. It achieves good results, ensures accuracy, and enhances the connection effect.

Active Publication Date: 2021-12-10

BEIJING INSTITUTE OF TECHNOLOGY



Problems solved by technology

[0003] Traditional image semantic description methods include template-filling methods and retrieval-based methods. Their results are not accurate enough, and they consume a lot of time and extra work.

In recent years, deep learning methods based on the encoder-decoder architecture have also been applied to image semantic description generation, but the inconsistency between the training process and the generation process makes the generated descriptions monotonous and not accurate enough.

However, current methods still cannot effectively solve the problems of inaccurate sentence structure and repeated phrases in the generated descriptions.



Embodiment

[0071] This embodiment describes training and application on the MS COCO dataset.

[0072] An image description method based on conditional random fields and internal semantic attention, as shown in Figure 1, includes the following steps:

[0073] Step 1: Process the training data. The MS COCO image description dataset is used as the training set. All descriptions in the dataset are preprocessed: each description is converted to lowercase, and the occurrences of each word are counted. Words that appear more than 5 times are saved to the dictionary; words that appear fewer than 5 times, as well as blank positions, are replaced with "UNK". This yields the dictionary Vocab. In addition, for images in the dataset with fewer than 5 reference descriptions, the existing reference descriptions are randomly copied so that each image has at least 5 corresponding descriptions. Then use the spaCy method to extr...
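The preprocessing in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names (`build_vocab`, `replace_rare`, `pad_references`) are hypothetical, whitespace splitting stands in for whatever tokenizer the authors used, and the threshold is read as "strictly more than 5 occurrences", which is one interpretation of the text.

```python
import random
from collections import Counter

UNK = "UNK"  # placeholder token named in the text


def build_vocab(captions, min_count=5):
    """Lowercase every caption, count words, keep words seen more than min_count times.

    Assumption: whitespace tokenization; the source does not specify the tokenizer.
    """
    counts = Counter(word for cap in captions for word in cap.lower().split())
    kept = sorted(word for word, n in counts.items() if n > min_count)
    return [UNK] + kept


def replace_rare(caption, vocab):
    """Map every out-of-vocabulary word in a caption to UNK."""
    known = set(vocab)
    return [w if w in known else UNK for w in caption.lower().split()]


def pad_references(refs, target=5, rng=None):
    """Randomly copy existing references until an image has at least `target` of them."""
    if not refs:
        raise ValueError("an image needs at least one reference description")
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    originals = list(refs)
    padded = list(originals)
    while len(padded) < target:
        padded.append(rng.choice(originals))
    return padded
```

Applied to a whole dataset, `build_vocab` would run once over all training captions, after which `replace_rare` and `pad_references` would run per caption and per image respectively.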


Abstract

The invention relates to an image description method based on conditional random fields and internal semantic attention, and belongs to the interdisciplinary field of computer vision and natural language processing. First, the training data is processed. Then the network model is designed: an existing convolutional neural network and an object detection network extract image features, and a recurrent neural network with an internal semantic attention mechanism and an attention feature residual structure generates the description corresponding to the image. Next, a combination of the cross-entropy loss function and the conditional random field loss function is used as the training objective, and the network model is trained on the processed training data to obtain a network that can generate image semantic descriptions. Finally, any image can be input to the network to obtain the corresponding description. Compared with the prior art, this method not only ensures the accuracy of the sentence structure of the generated description, but also solves the problem of repeated phrases, so that the generated description better captures the key information in the image.
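To make the combined training objective concrete, here is a small sketch of a cross-entropy term added to a conditional random field term. The excerpt does not say how the CRF is parameterized or how the two terms are weighted, so everything specific here is an assumption: a linear-chain CRF over word positions, a `transitions` matrix of word-to-word scores, and a hypothetical weight `crf_weight`.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cross_entropy(logits, targets):
    """Mean per-step negative log-likelihood under a softmax over each row of logits."""
    total = sum(logsumexp(row) - row[y] for row, y in zip(logits, targets))
    return total / len(targets)


def crf_nll(emissions, transitions, targets):
    """Negative log-likelihood of the target sequence under a linear-chain CRF.

    Assumption: emissions[t][j] scores word j at step t and
    transitions[i][j] scores word i followed by word j.
    """
    k = len(transitions)
    # Score of the reference (gold) path.
    score = emissions[0][targets[0]]
    for t in range(1, len(targets)):
        score += transitions[targets[t - 1]][targets[t]] + emissions[t][targets[t]]
    # Forward algorithm: log of the summed scores over all possible paths.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(k)]) + emissions[t][j]
            for j in range(k)
        ]
    return logsumexp(alpha) - score


def combined_loss(logits, transitions, targets, crf_weight=1.0):
    """Cross-entropy plus a weighted CRF term; crf_weight is a hypothetical knob."""
    return cross_entropy(logits, targets) + crf_weight * crf_nll(logits, transitions, targets)
```

In this formulation the CRF term scores the whole sequence at once, whereas the cross-entropy term treats each step independently. With an all-zero `transitions` matrix the CRF term reduces to the summed per-step cross-entropy, which is a convenient sanity check.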

Description

Technical field

[0001] The present invention relates to an image description method based on conditional random fields and internal semantic attention, and in particular to a deep network model based on conditional random fields that combines an internal semantic attention mechanism with an attention feature residual structure to generate the semantic description corresponding to an image. It belongs to the interdisciplinary field of computer vision and natural language processing.

Background

[0002] With the explosive growth of image data on the Internet, manually identifying and retrieving image semantic information has become unrealistic. The data structure of an image is relatively abstract, yet it contains a wealth of information. Using deep learning to generate descriptions for images and mine their semantic information has a wide range of application scenarios in image retrieval, AI question answering, intelligent rec...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/044, G06N3/045, G06F18/2411, G06F18/25

Inventor: 宋丹丹, 骆源

Owner: BEIJING INSTITUTE OF TECHNOLOGY
