Image description model training method and device

An image description model training technique, applied in neural learning methods, biological neural network models, character and pattern recognition, and related fields, that addresses problems such as the loss of detail from an image's visual scene.

Active Publication Date: 2020-09-08
苏州遐迩信息技术有限公司

AI Technical Summary

Problems solved by technology

[0004] In state-of-the-art encoder-decoder image captioning models, captions are usually generated from global features extracted from the image. Even when an attention mechanism is combined with the encoder-decoder architecture to extract region-of-interest features from the global features, and so focus on the regions of interest in the image, a large amount of detailed information in the image's visual scene is still lost during generation.

Examples

Embodiment Construction

[0025] The present invention will be described in detail below with reference to the embodiments shown in the accompanying drawings. However, this embodiment does not limit the present invention, and any structural, method, or functional changes made by those skilled in the art according to this embodiment are included in the protection scope of the present invention.

[0026] The following description and the accompanying drawings sufficiently illustrate specific embodiments herein to enable those skilled in the art to practice them. Portions and features of some embodiments may be included in or substituted for those of other embodiments. The scope of the embodiments herein includes the full scope of the claims, and all available equivalents of the claims. Herein, the terms "first", "second", etc. are only used to distinguish one element from another element without requiring or implying any actual relationship or order between these elements. In fact the first element can...

Abstract

The invention discloses a training method for an image description model. The method comprises the following steps: receiving a plurality of training images, and extracting a region-of-interest feature vector, a category feature word vector and an image entity feature vector corresponding to each training image; and creating an image description model, wherein the image description model comprises an encoding device comprising a plurality of layers of encoding modules, a decoding device comprising a plurality of layers of decoding modules, a self-attention mechanism feature fusion layer and a multi-dimensional convolution kernel feature extractor. Each encoding module comprises a multi-dimensional convolution kernel feature extractor, two self-attention feature extractors and a simple feed-forward network. Each decoding module comprises a multi-dimensional convolution kernel feature extractor, a masked multi-head attention feature extractor, two multi-head attention feature extractors and a simple feed-forward network, and the multi-head attention feature extractors are connected with the simple feed-forward network. The encoding devices are connected through the self-attention mechanism feature fusion layer. Cross-entropy loss training and reinforcement learning training are then performed on the image description model based on the plurality of training images. In this way, a training method for the image description model is provided.
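The masked attention extractor named in the decoding module can be illustrated with a minimal sketch. This is not the patented implementation: it assumes a single head, uses the input vectors directly as queries, keys and values (no learned projections), and assumes a causal mask in which each position attends only to itself and earlier positions. The function names `softmax` and `masked_self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    x is a list of token vectors (lists of floats of equal length).
    Assumption for illustration: Q = K = V = x, with no learned weights.
    Returns (outputs, attention_weights).
    """
    n = len(x)
    d = len(x[0])
    outputs = []
    all_weights = []
    for i, query in enumerate(x):
        # Causal mask: position i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(query, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Masked (future) positions receive exactly zero weight.
        weights = softmax(scores) + [0.0] * (n - i - 1)
        outputs.append(
            [sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)]
        )
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = masked_self_attention(tokens)
```

With the mask in place, the first position can only attend to itself, so its output equals its input; later positions mix information from everything before them. A multi-head variant would run several such attentions over different learned projections and concatenate the results.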

Description

Technical field

[0001] The invention relates to the technical field of image description, and in particular to a training method and device for an image description model.

Background art

[0002] The main purpose of image captioning is to generate a natural language description for an image; this description can then help applications understand the semantics expressed in the image's visual scene. For example, image description can transform image retrieval into text retrieval, which can be used to classify images and improve image retrieval results.

[0003] Early image description methods can be summarized as extracting objects and attributes from images and then filling them into predefined sentence templates. With the popularity of deep learning, modern image description methods mainly adopt an encoder-decoder architecture, in which a convolutional neural network (CNN) is usually used as an encoder for feature ex...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/32, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V40/23, G06V10/25, G06N3/047, G06N3/045, G06F18/2415, G06F18/241
Inventor: 罗轶凤, 王俊豪
Owner: 苏州遐迩信息技术有限公司