Image description generation method of multi-modal feature fusion network

A feature fusion and image description technology, applied in neural learning methods, biological neural network models, character and pattern recognition, and related fields. It addresses the insufficient extraction and utilization of image features and the semantic gap that remains in existing methods.

Active Publication Date: 2021-11-19
CHONGQING NORMAL UNIVERSITY
7 cites, 5 cited by

AI Technical Summary

Problems solved by technology

[0003] Existing representative models such as LSTM-A, Plstm-a-2, VS-LSTM, DAA, RFNet, Up-Down, and VSV-VRV-POS do not extract and use image features sufficiently, and a semantic gap still exists.

Examples

Embodiment Construction

[0034] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0035] Referring to Figures 1 to 4, the present invention provides an image description generation method based on a multimodal feature fusion network, comprising:

[0036] S101 constructing a multimodal feature fusion network;

[0037] The multimodal feature fusion network is formed by cascading multiple layers of feature fusion modules, each layer consisting of an attention module and a recurrent neural network. Each layer includes local feature information and global feature information, and the local feature information is ...
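The cascaded layer structure described in [0037] can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the recurrent network is assumed to be an LSTM cell, the attention module is assumed to be additive attention conditioned on the layer's own hidden state, the global feature is assumed to be the mean of the local region features, and passing each layer's hidden output to the next layer as its input is an assumed reading of "cascading". The names `FusionLayer` and `fuse` and all dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

class FusionLayer:
    """Hypothetical fusion layer: additive attention followed by one LSTM cell."""

    def __init__(self, d_in, d_v, d_h, d_a, rng, scale=0.05):
        # Attention projections (assumed additive form)
        self.W_v = rng.standard_normal((d_v, d_a)) * scale
        self.W_h = rng.standard_normal((d_h, d_a)) * scale
        self.w_a = rng.standard_normal(d_a) * scale
        # LSTM gate weights; input = [x, attended local, global, own hidden]
        self.W = rng.standard_normal((d_in + 2 * d_v + d_h, 4 * d_h)) * scale
        self.b = np.zeros(4 * d_h)

    def step(self, x, local_feats, global_feat, h, c):
        # Attention: weight the local features using this layer's hidden state
        scores = np.tanh(local_feats @ self.W_v + h @ self.W_h) @ self.w_a
        context = softmax(scores) @ local_feats
        # LSTM update on the concatenated local, global, and recurrent inputs
        z = np.concatenate([x, context, global_feat, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def fuse(x, local_feats, layers, states):
    """Run the cascade once; each layer's hidden output feeds the next layer."""
    global_feat = local_feats.mean(axis=0)  # assumed global feature
    new_states = []
    for layer, (h, c) in zip(layers, states):
        h, c = layer.step(x, local_feats, global_feat, h, c)
        new_states.append((h, c))
        x = h  # cascade: output of this layer is input to the next
    return x, new_states

rng = np.random.default_rng(0)
k, d_v, d_h, d_a, d_x, n_layers = 10, 16, 8, 8, 12, 3
V = rng.standard_normal((k, d_v))          # k local region features (illustrative)
layers = [FusionLayer(d_x if l == 0 else d_h, d_v, d_h, d_a, rng)
          for l in range(n_layers)]
states = [(np.zeros(d_h), np.zeros(d_h)) for _ in layers]
x0 = rng.standard_normal(d_x)              # e.g. previous-word embedding (assumed)
out, states = fuse(x0, V, layers, states)
```

Only the first layer receives the external input `x0`; deeper layers receive the hidden state of the layer below, which is one way to realize the hierarchical fusion the method describes.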

Abstract

The invention relates to the field of image data processing and discloses an image description generation method based on a multimodal feature fusion network. The method comprises the following steps: constructing the multimodal feature fusion network; designing a decoder on the basis of the Up-Down model architecture; integrating the multimodal feature fusion network into the decoder to form an image description generation model based on the multimodal feature fusion network; training this model; and inputting test images to verify its performance. The invention uses a recurrent neural network to build a hierarchical structure that fuses the encoded features, and uses an attention mechanism to weight the input information. As a result, the individual image features extracted by the encoder become associated with one another, feature interaction is strengthened, the association between hidden-layer vectors and object features is better exploited, and image description generation performance improves.
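The abstract states that an attention mechanism weights the input information so that hidden-layer vectors are related to object features. The sketch below shows one common form, additive attention conditioned on a recurrent hidden state; the abstract does not give the attention formula, so the scoring function, the weight shapes, and the helper name `attend` are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(local_feats, hidden, W_v, W_h, w_a):
    """Additive attention over k encoder features (hypothetical helper).

    local_feats: (k, d_v) image features from the encoder
    hidden:      (d_h,)   hidden-layer vector of the recurrent network
    W_v: (d_v, d_a), W_h: (d_h, d_a), w_a: (d_a,) learned projections
    """
    # Score each feature against the hidden state: w_a^T tanh(W_v v_i + W_h h)
    scores = np.tanh(local_feats @ W_v + hidden @ W_h) @ w_a  # (k,)
    alpha = softmax(scores)                                  # non-negative, sums to 1
    context = alpha @ local_feats                            # weighted sum, (d_v,)
    return alpha, context

rng = np.random.default_rng(0)
k, d_v, d_h, d_a = 36, 64, 32, 32   # illustrative sizes only
V = rng.standard_normal((k, d_v))
h = rng.standard_normal(d_h)
W_v = rng.standard_normal((d_v, d_a)) * 0.1
W_h = rng.standard_normal((d_h, d_a)) * 0.1
w_a = rng.standard_normal(d_a) * 0.1
alpha, context = attend(V, h, W_v, W_h, w_a)
```

Because the weights depend on the hidden state, the same encoder features are combined differently at each decoding step, which is one way the association between hidden-layer vectors and object features can be modeled.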

Description

Technical field

[0001] The invention relates to the field of image data processing, in particular to an image description generation method based on a multimodal feature fusion network.

Background art

[0002] Understanding an image depends largely on the image features obtained, and the techniques used to obtain features include traditional machine learning techniques and deep machine learning techniques. Traditional machine learning techniques extract hand-crafted features such as LBP, SIFT, and HOG and feed them, alone or in combination, into classifiers such as SVMs to determine object categories. This approach has two shortcomings. First, these hand-crafted features are task-oriented, so extracting them from large and diverse datasets is not feasible; second, real-world data is complex and admits different semantic interpretations. In contrast, deep machine learning techniques can learn features automatically from the training set and are suitable for processing l...

Claims

Application Information

Patent type & authority: Application (China)
IPC(8): G06K9/46, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045, G06F18/253, G06F18/214
Inventors: 杨有 (Yang You), 陈立志 (Chen Lizhi), 杨学森 (Yang Xuesen), 余平 (Yu Ping), 尚晋 (Shang Jin)
Owner: CHONGQING NORMAL UNIVERSITY