Visual salience and semantic attribute based cross-modal image natural language description method

A technology combining semantic attributes with natural language generation, applied to cross-modal image natural language description based on visual salience and semantic attributes. It addresses the lack of focus and the low accuracy of target descriptions in existing methods by increasing the importance of salient content and reducing the contribution of less relevant regions, thereby improving description accuracy.

Active Publication Date: 2018-02-13
XIDIAN UNIV
Cites: 8 · Cited by: 51

AI Technical Summary

Problems solved by technology

[0003] To sum up, the problems existing in the prior art are: the current top-down image



Examples


Embodiment Construction

[0034] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.

[0035] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0036] As shown in Figure 1, the cross-modal image natural language description method based on visual salience and semantic attributes provided by an embodiment of the present invention includes the following steps (see the illustrative sketch after the steps):

[0037] S101: Divide the image into sub-regions and use a CNN to extract multi-scale deep visual features from the image;

[0038] S102: Input the multi-scale feature vectors extracted by the CNN into the pre-trained saliency model and regress the saliency score of each s...
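The following is a minimal sketch of steps S101 and S102 under assumed choices: a PyTorch/torchvision VGG-16 backbone, a fixed 4×4 region grid, two input scales, and a small MLP saliency regressor. The patent text does not specify these details, so they are illustrative placeholders rather than the patented configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms.functional as TF


class MultiScaleRegionEncoder(nn.Module):
    """S101: split an image into a grid of sub-regions and extract
    multi-scale deep visual features for each region with a CNN."""

    def __init__(self, grid=4, scales=(224, 160)):
        super().__init__()
        backbone = models.vgg16(weights=None)   # hypothetical backbone choice
        self.trunk = backbone.features          # convolutional layers only
        self.pool = nn.AdaptiveAvgPool2d(1)     # one vector per region and scale
        self.grid, self.scales = grid, scales

    def forward(self, image):                   # image: (3, H, W) float tensor
        _, H, W = image.shape
        h, w = H // self.grid, W // self.grid
        feats = []
        for i in range(self.grid):
            for j in range(self.grid):
                region = image[:, i * h:(i + 1) * h, j * w:(j + 1) * w]
                per_scale = []
                for s in self.scales:           # rescale the same region
                    r = TF.resize(region, [s, s], antialias=True)
                    f = self.pool(self.trunk(r.unsqueeze(0))).flatten(1)
                    per_scale.append(f)
                feats.append(torch.cat(per_scale, dim=1))
        return torch.cat(feats, dim=0)          # (grid*grid, D) region features


class SaliencyRegressor(nn.Module):
    """S102: regress a saliency score for each sub-region from its
    multi-scale feature; a pre-trained model would be loaded in practice."""

    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, region_feats):            # (R, D)
        return self.mlp(region_feats).squeeze(-1)  # (R,) scores in [0, 1]
```

In practice, the per-region scores regressed this way would be arranged back into a saliency map and used to weight the original image (or its region features) before semantic attribute detection, as the abstract describes.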



Abstract

The invention belongs to the technical field of computer vision and natural language processing, and discloses a visual salience and semantic attribute based cross-modal image natural language description method. The method comprises the following steps: multi-scale deep visual features of all regions are extracted with a convolutional neural network; a pre-trained saliency model regresses an image saliency map, which is used to weight the original image; a predefined dictionary is built to serve as the set of semantic attribute categories, and semantic attribute detection is conducted on the visually salient image; the semantic attributes are computed through multi-instance learning; the image features are weighted by the semantic attributes; the visual-salience-based semantic attribute features are decoded by a long short-term memory network to generate the image description. The method has the advantage of high accuracy and can be used for image retrieval in complex scenes, multi-object image semantic understanding, and the like.
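As a hedged illustration of the later stages summarized above, the sketch below shows one way the attribute detection (multi-instance learning with noisy-OR pooling over regions) and the LSTM decoding of attribute-weighted features could be wired up in PyTorch. The noisy-OR pooling, the fusion layer, and all dimensions are assumptions for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn


class AttributeDetector(nn.Module):
    """Score every sub-region against a predefined attribute dictionary and
    aggregate region scores with a noisy-OR, a common multi-instance
    learning pooling (an assumption, not necessarily the patented one)."""

    def __init__(self, feat_dim, num_attributes):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, num_attributes)

    def forward(self, region_feats):                         # (B, R, D)
        p_region = torch.sigmoid(self.scorer(region_feats))  # (B, R, A)
        # probability that at least one region exhibits each attribute
        return 1.0 - torch.prod(1.0 - p_region, dim=1)       # (B, A)


class CaptionDecoder(nn.Module):
    """Decode an attribute-weighted visual feature into a word sequence
    with a long short-term memory network (single layer, for illustration)."""

    def __init__(self, feat_dim, num_attributes, vocab_size,
                 embed_dim=512, hidden_dim=512):
        super().__init__()
        self.fuse = nn.Linear(feat_dim + num_attributes, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat, attr_probs, captions):
        # fold the attribute probabilities into the visual feature so that
        # the detected semantics re-weight what the decoder conditions on
        v = self.fuse(torch.cat([image_feat, attr_probs], dim=-1)).unsqueeze(1)
        tokens = self.embed(captions)                         # (B, T, E)
        h, _ = self.lstm(torch.cat([v, tokens], dim=1))       # visual step first
        return self.out(h)                                    # per-step vocabulary logits
```

During training, the per-step logits would be compared against the ground-truth caption with a cross-entropy loss; at inference, words would be sampled or beam-searched one step at a time from the same decoder.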

Description

Technical field

[0001] The invention belongs to the technical field of computer vision and natural language processing, and in particular relates to a cross-modal image natural language description method based on visual salience and semantic attributes.

Background technique

[0002] An automatic image description system can automatically generate accurate, fluent natural language descriptions, close to those written by humans, based on the interactive relationships between the objects and the environment in an image, so as to understand the semantics of the content in a visual scene. The system unifies image visual features and semantic information, makes the semantic information reflect the visual content more objectively, and uses the semantic information for high-level reasoning, large-scale image organization, and final image understanding. Compared with other popular directions in the field of computer vision, such as image retrieval and image segmentation, the essence...

Claims


Application Information

IPC(8): G06K9/62, G06N3/08
CPC: G06N3/084, G06F18/217, G06F18/214
Inventor 田春娜王蔚高新波李明郎君王秀美张相南刘恒袁瑾
Owner XIDIAN UNIV