Multi-angle and multi-mode fused image description generation method and system

An image-description, multi-modal technology in the field of image processing. It addresses the problems that traditional image descriptions cover only a single viewpoint, lack content, and cannot fully describe what an image shows, and achieves the effects of eliminating redundancy and improving learning ability.

Active Publication Date: 2019-11-15
QILU UNIV OF TECH
AI Technical Summary

Problems solved by technology

Traditional image description methods describe image content from a single angle, lack detail, and cannot fully convey what the image shows.


Examples


Embodiment 1

[0030] As shown in Figure 1, one viewer may notice an adult wearing a blue shirt and a blue baseball cap, another a child holding a doll, another a red car next to the adult, and another a white car next to the red car. All of these scenes appear in the same image; only the viewing angles differ. Figures 1(a)-(d) show different objects identified in the image. Corresponding descriptive sentences for Figure 1 may include:

[0031] 1. a man in a blue shirt playing frisbee with a little boy in the park.

[0032] 2. a red car beside the man dressed in a blue shirt in the park.

[0033] 3. a little boy holding a toy in the park.

[0034] 4. a white car beside the tree in the park.

[0035] The purpose of this embodiment is to learn a complete image description from multiple perspectives by combining image and text modalities, so as to fully express the content contained in the image. Based on this, this embodiment disclos...

Embodiment 2

[0080] The purpose of this embodiment is to provide an image description generation system that integrates multiple angles and multiple modalities.

[0081] In order to achieve the above purpose, this embodiment provides a multi-angle and multi-modal image description generation system, including:

[0082] The visual feature extraction module receives the image to be described, extracts the global visual features and local visual features of the image and fuses them to obtain the fused visual features;
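One simple way to realize the fusion step in this module is to mean-pool the local region features, concatenate the result with the global feature, and project back to the feature dimension. The dimensions, the pooling choice, and the random stand-in weights below are illustrative assumptions, not the patent's specific design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 512-d global feature (e.g. a CNN's pooled output)
# and 36 local region features of 512-d each (e.g. detector regions).
D = 512
global_feat = rng.standard_normal(D)        # global visual feature
local_feats = rng.standard_normal((36, D))  # local (region) visual features

def fuse(global_feat, local_feats, W):
    """Mean-pool the local features, concatenate with the global
    feature, and apply a learned linear projection back to D dims."""
    pooled = local_feats.mean(axis=0)               # (D,)
    concat = np.concatenate([global_feat, pooled])  # (2D,)
    return W @ concat                               # (D,)

# W would be learned during training; random here for the sketch.
W = rng.standard_normal((D, 2 * D)) * 0.01
fused = fuse(global_feat, local_feats, W)
print(fused.shape)  # (512,)
```

The fused vector is what the sentence generation module below consumes as its initial input.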

[0083] The sentence generation module adopts a single-layer long-short-term memory network and takes the fused visual features as input to generate the first sentence of the image description;
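The single-layer LSTM generation step can be sketched as a cell update followed by greedy decoding over a vocabulary. The gate layout, dimensions, embedding scheme, and random weights below are assumptions standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wx, Wh, b):
    """One step of a single-layer LSTM cell; gates stacked as i, f, o, g."""
    z = Wx @ x + Wh @ h + b
    H = h.shape[0]
    i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

D, H, V, T = 64, 32, 100, 5   # input dim, hidden size, vocab size, max length
Wx = rng.standard_normal((4 * H, D)) * 0.1
Wh = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
W_out = rng.standard_normal((V, H)) * 0.1   # hidden state -> vocabulary logits
W_emb = rng.standard_normal((D, V)) * 0.1   # token embeddings (one per column)

fused = rng.standard_normal(D)  # fused visual features (stand-in)
h, c = np.zeros(H), np.zeros(H)
x = fused                       # the visual features are the first input
tokens = []
for _ in range(T):
    h, c = lstm_step(x, h, c, Wx, Wh, b)
    tok = int(np.argmax(W_out @ h))  # greedy decoding
    tokens.append(tok)
    x = W_emb[:, tok]                # feed the chosen token back in
```

With trained weights, the emitted token sequence would form the first sentence of the image description.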

[0084] The sentence regeneration module generates the first sentence semantic vector according to the first sentence image description; adopts the attention-based long-short-term memory network language generation model, and uses the local visual features and the first sentence semantic vecto...
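The attention step in the regeneration module can be sketched as scoring each local region against a query built from the decoder hidden state and the first-sentence semantic vector, then taking a softmax-weighted context. The scoring function and all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(local_feats, h, s, Wv, Wq):
    """Score each local region against a query built from the decoder
    hidden state h and the first-sentence semantic vector s; return the
    attention weights and the weighted context vector."""
    query = Wq @ np.concatenate([h, s])  # (A,)
    keys = local_feats @ Wv.T            # (R, A)
    scores = keys @ query                # (R,)
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha, alpha @ local_feats    # context vector, shape (D,)

R, D, H, S, A = 36, 128, 64, 64, 32  # regions, feature/hidden/semantic/attn dims
local_feats = rng.standard_normal((R, D))
h = rng.standard_normal(H)   # current LSTM hidden state
s = rng.standard_normal(S)   # first-sentence semantic vector
Wv = rng.standard_normal((A, D)) * 0.1
Wq = rng.standard_normal((A, H + S)) * 0.1

alpha, context = attend(local_feats, h, s, Wv, Wq)
print(alpha.shape, context.shape)
```

At each decoding step the context vector would be fed, together with the previous token, into the attention-based LSTM to produce the next description sentence.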

Embodiment 3

[0086] The purpose of this embodiment is to provide an electronic device.

[0087] An electronic device, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, when the processor executes the program, the following steps are implemented, including:

[0088] receiving the image to be described, extracting the global visual features and local visual features of the image and fusing them to obtain the fused visual features;

[0089] using a single-layer long-short-term memory network with the fused visual features as input to obtain the first sentence of the image description;

[0090] Generate the first sentence semantic vector according to the first sentence image description;

[0091] An attention-based long-short-term memory network language generation model is adopted, and the local visual features and the first sentence semantic vector are used as input to generate the next image description sentence, thereby obtaining a comple...



Abstract

The invention discloses a multi-angle and multi-mode fused image description generation method and system. The method comprises the following steps: receiving a to-be-described image, extracting the global visual features and local visual features of the image, fusing the global and local visual features, and obtaining fused visual features; using a single-layer long-short-term memory network with the fused visual features as input to obtain a first sentence of image description; generating a first sentence semantic vector according to the first sentence of image description; and generating the next image description sentence by adopting an attention-based long-short-term memory network language generation model, taking the local visual features and the first sentence semantic vector as input, thereby obtaining a complete image description. According to the method, the two modalities of visual features and text semantic features are fused and combined with an attention mechanism, so that a comprehensive multi-angle description of the image is realized.

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and in particular relates to an image description generation method and system that integrates multiple angles and multiple modes.

Background technique

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] In recent years, the fields of natural language processing (NLP) and computer vision (CV) have made tremendous progress in analyzing and generating text and understanding images and videos. In daily work, there are many scenarios that require combining language and visual information, such as interpreting photos in the context of newspaper articles. In addition to this, the web provides a wealth of data combining linguistic and visual information: labeled photos, newspaper illustrations, videos with subtitles, and multimodal information on social media. In these scenarios, ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/04, G06N3/08, G06K9/62, G06F17/27
CPC: G06N3/08, G06N3/044, G06N3/045, G06F18/253
Inventor: 杨振宇, 张姣
Owner: QILU UNIV OF TECH