Image understanding method based on deep residual network and LSTM

An image understanding technique based on a deep residual network, applied in the fields of deep learning and image semantic understanding, that addresses problems such as difficult implementation, low recognition rate, and poor generalization.

Active Publication Date: 2017-05-10
SOUTH CHINA UNIV OF TECH
4 Cites, 46 Cited by
AI Technical Summary

Problems solved by technology

In terms of algorithm implementation, the image understanding algorithms in common use today suffer from poor generalization, low robustness, strong local dependence, difficult implementation, and low recognition rates.

Examples

Embodiment

[0060] Figure 1 shows the flowchart of the method of the present invention, which comprises the following steps:

[0061] (1) Download the training datasets: download the public ImageNet and MS-COCO image datasets from http://www.image-net.org and http://mscoco.org, respectively. The ImageNet dataset is divided into a training image set and a test image set: the training image set contains 1,000 categories with 1,300 pictures per category, and the test image set contains 50,000 pictures. The MS-COCO dataset is likewise divided into a training image set of 82,783 pictures and a test image set of 40,504 pictures, and each picture comes with 5 natural language sentences describing its content.
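The dataset sizes quoted in step (1) imply the following totals. This is only a back-of-the-envelope tally: it assumes exactly 1,300 images in every ImageNet category and exactly 5 captions for every MS-COCO picture, as the paragraph states, and all variable names are illustrative.

```python
# Figures taken from paragraph [0061]; names are illustrative only.
IMAGENET_CATEGORIES = 1000
IMAGENET_IMAGES_PER_CATEGORY = 1300   # assumed uniform across categories
COCO_TRAIN_IMAGES = 82783
COCO_TEST_IMAGES = 40504
CAPTIONS_PER_IMAGE = 5                # assumed exact for every picture

# Total ImageNet training images implied by the per-category count.
imagenet_train_images = IMAGENET_CATEGORIES * IMAGENET_IMAGES_PER_CATEGORY

# Total caption sentences implied for each MS-COCO split.
coco_train_captions = COCO_TRAIN_IMAGES * CAPTIONS_PER_IMAGE
coco_test_captions = COCO_TEST_IMAGES * CAPTIONS_PER_IMAGE

print(imagenet_train_images, coco_train_captions, coco_test_captions)
```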

[0062] (2) Preprocessing:

[0063] For the ImageNet dataset: each image is scaled to 256×256, and then 5 standard-size 224×224 images are cropped from the top, middle, bottom, left,...
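As a rough illustration of taking five 224×224 crops from a 256×256 image, the sketch below computes crop boxes in pure Python. Because the paragraph above is truncated, the exact crop positions are not known; the sketch assumes the common "four corners plus center" layout, and the function name `five_crop_boxes` and its parameters are hypothetical.

```python
def five_crop_boxes(width=256, height=256, crop=224):
    """Return five (left, upper, right, lower) crop boxes.

    Assumption: the crops are the four corners plus the center.
    The source text is truncated, so the patent's actual crop
    positions may differ.
    """
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    max_x = width - crop           # largest valid left offset
    max_y = height - crop          # largest valid upper offset
    center_x, center_y = max_x // 2, max_y // 2
    offsets = {
        "top_left": (0, 0),
        "top_right": (max_x, 0),
        "bottom_left": (0, max_y),
        "bottom_right": (max_x, max_y),
        "center": (center_x, center_y),
    }
    return {name: (x, y, x + crop, y + crop)
            for name, (x, y) in offsets.items()}


boxes = five_crop_boxes()
for name, box in boxes.items():
    print(name, box)
```

Each box uses the (left, upper, right, lower) convention, so it could be handed to an image library's crop routine after the image has been resized to 256×256.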

Abstract

The invention discloses an image understanding method based on a deep residual network and an LSTM. The method comprises the following steps: first, a deep residual network model is built to extract abstract image features, and the features are stored as a feature matrix; next, a dynamic attention mechanism in the LSTM model dynamically forms a suitable feature vector from the feature matrix; finally, the LSTM model generates natural language (English) from the feature vector. The method combines the strength of the deep residual network in image feature extraction with the strength of the LSTM in time-sequence modeling. Together, the deep residual network and the LSTM model form an encoder-decoder framework that converts image content information into natural language, thereby extracting deep information from the image.
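The attention step described in the abstract, forming one feature vector from the feature matrix at each decoding step, can be sketched as follows. This is a minimal illustration and not the patent's formulation: the abstract does not give the scoring function, so dot-product scoring against the LSTM hidden state is assumed here, and the names `softmax`, `attend`, `features`, and `hidden` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(feature_matrix, hidden_state):
    """Weight the rows of the feature matrix and sum them.

    feature_matrix: list of region feature vectors (each of length D).
    hidden_state:   current decoder state (length D).
    Assumption: relevance is scored with a plain dot product.
    """
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in feature_matrix]
    weights = softmax(scores)
    dim = len(feature_matrix[0])
    context = [sum(w * region[d] for w, region in zip(weights, feature_matrix))
               for d in range(dim)]
    return weights, context


# Toy example: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(features, hidden)
print(weights, context)
```

In a full encoder-decoder setup, the context vector would be fed to the LSTM together with the previous word at every time step, so the weights change as the sentence is generated.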

Description

Technical field

[0001] The invention relates to the fields of image semantic understanding and deep learning, and in particular to an image understanding method based on a deep residual network and LSTM (Long Short-Term Memory).

Background technique

[0002] Image understanding refers to the understanding of image semantics. It takes the image as the object and knowledge as the core, and studies the location of the image, the relationship between the objects, and the scene of the image.

[0003] The input of image understanding is image data and the output is knowledge, which places it among the high-level topics in image processing research. Building on image target recognition, it further studies the nature of each target in the image and the relationships between targets, arrives at an understanding of the meaning of the image content and an interpretation of the original objective scene, and then guides and plans behavior.

[0004] Currently commonly used image understandin...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06K9/46
CPC: G06V10/40, G06F18/214
Inventor: 胡丹, 袁东芝, 余卫宇, 李楚怡
Owner: SOUTH CHINA UNIV OF TECH