Multi-angle and multi-mode fused image description generation method and system

An image description technology that fuses multiple angles and modalities, applied in the field of image processing. It addresses the problems that existing image descriptions are written from a single angle, lack content, and cannot fully describe the image, and it achieves the effects of eliminating redundancy and improving learning ability.
CN110458282A · Active · Publication Date: 2019-11-15 · QILU UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
QILU UNIV OF TECH
Publication Date
2019-11-15

Figures: Figure 1, Figure 2, Figure 3 (images not reproduced)

Abstract

The invention discloses a multi-angle and multi-mode fused image description generation method and system. The method comprises the following steps: receiving an image to be described, extracting its global visual features and local visual features, and fusing the global and local visual features to obtain fused visual features; taking the fused visual features as input to a single-layer long short-term memory (LSTM) network to obtain a first image description sentence; generating a first-sentence semantic vector from the first image description sentence; and taking the local visual features and the first-sentence semantic vector as input to an attention-based LSTM language generation model to generate the next image description sentence, thereby obtaining a complete image description. By fusing two modalities, visual features and textual semantic features, and combining them with an attention mechanism, the method describes the image comprehensively from multiple angles.
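The two-stage pipeline in the abstract can be illustrated with a minimal, dependency-free sketch of a single decoding step per stage. The abstract does not say how the features are fused, how the semantic vector is built, or which attention form is used, so every concrete choice below is an assumption: fusion is taken to be concatenation of the global vector with the mean of the local region vectors, the sentence semantic vector is the mean of hypothetical word embeddings, attention is plain dot-product soft attention whose query is the decoder hidden state plus the semantic vector, and all weights are random. The function names (`fuse`, `lstm_step`, `sentence_vector`, `attend`) and the size `D = 4` are hypothetical and chosen only for illustration. Word prediction, vocabulary softmax and training are omitted.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse(global_feat, local_feats):
    """Assumed fusion: concatenate the global vector with the mean local vector."""
    n = len(local_feats)
    mean_local = [sum(col) / n for col in zip(*local_feats)]
    return global_feat + mean_local

def lstm_step(x, h, c, W):
    """One step of a standard LSTM cell; W maps [x; h] to 4*H gate pre-activations."""
    H = len(h)
    z = matvec(W, x + h)
    i = [sigmoid(v) for v in z[0:H]]          # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
    g = [math.tanh(v) for v in z[3 * H:]]     # candidate cell state
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def sentence_vector(word_vecs):
    """Assumed semantic vector: mean of the sentence's word embeddings."""
    n = len(word_vecs)
    return [sum(col) / n for col in zip(*word_vecs)]

def attend(local_feats, query):
    """Dot-product soft attention over local region features."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in local_feats]
    m = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(local_feats[0])
    context = [sum(w * v[k] for w, v in zip(weights, local_feats)) for k in range(dim)]
    return context, weights

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D = 4  # illustrative feature / hidden size
global_feat = [rng.random() for _ in range(D)]
local_feats = [[rng.random() for _ in range(D)] for _ in range(3)]  # 3 image regions

# Stage 1: fused visual features drive a single-layer LSTM (first sentence).
fused = fuse(global_feat, local_feats)                 # length 2*D
W1 = random_matrix(4 * D, 2 * D + D, rng)              # input [fused; h]
h, c = lstm_step(fused, [0.0] * D, [0.0] * D, W1)

# Stage 2: semantic vector of the first sentence guides attention over regions.
first_sentence_words = [[rng.random() for _ in range(D)] for _ in range(5)]
s = sentence_vector(first_sentence_words)
h2_prev, c2_prev = [0.0] * D, [0.0] * D                # separate decoder state
query = [hk + sk for hk, sk in zip(h2_prev, s)]        # assumed query form
context, weights = attend(local_feats, query)
W2 = random_matrix(4 * D, D + D + D, rng)              # input [context; s; h]
h2, c2 = lstm_step(context + s, h2_prev, c2_prev, W2)
```

Keeping the second decoder's state separate from the first reflects the abstract's description of two distinct networks; a real system would repeat `lstm_step` per word and map each hidden state to a vocabulary distribution.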

Description

Technical Field

[0001] The invention belongs to the technical field of image processing, and in particular relates to an image description generation method and system that integrates multiple angles and multiple modes.

Background Art

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] In recent years, the fields of natural language processing (NLP) and computer vision (CV) have made tremendous progress in analyzing and generating text and understanding images and videos. In daily work, there are many scenarios that require combining language and visual information, such as interpreting photos in the context of newspaper articles. In addition to this, the web provides a wealth of data combining linguistic and visual information: labeled photos, newspaper illustrations, videos with subtitles, and multimodal information on social media. In these scenarios, ...
