
Video description text generation method based on multi-modal fusion

A video description text generation technology in the field of image processing. It addresses problems such as unstable semantic direction, failure to reflect a video's dynamic content and temporal information, and excessive divergence in the generated description text, thereby improving the accuracy and robustness of text generation.

Pending Publication Date: 2020-12-11
Xinhua Zhiyun Technology Co., Ltd. (新华智云科技有限公司)
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0004] The existing techniques described above extract frames from the video and output description text by treating each extracted image as an independent feature, but independent extracted frames cannot reflect the dynamic content and temporal information of the video. Moreover, generating natural-language description text requires the support of text-level information, which these techniques do not incorporate; as a result, the output description text diverges widely in content and its semantic direction is unstable.



Examples


Embodiment Construction

[0045] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0046] It should be noted that, in the case of no conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other.

[0047] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, but not as a limitation of the present invention.

[0048] The present invention includes a method for generating video description text based on multi-modal fusion ...



Abstract

The invention provides a video description text generation method based on multi-modal fusion. The method comprises the steps of: obtaining a to-be-described video comprising video frames, the to-be-described video having a corresponding video description statement; obtaining text theme information from the video description statement and assigning a text theme information code to each piece of text theme information; obtaining, respectively, the dynamic time-domain information codes, static information codes, and audio feature vector codes of the to-be-described video; fusing the dynamic time-domain information code, the static information code, and the audio feature vector code to obtain a fusion result; and inputting the fusion result together with the text theme information code into a first recurrent neural network for iterative processing, so as to determine the video content description text of the to-be-described video. The beneficial effect of the invention is that it generates a natural-language description of a video by fusing the video, audio, and text modalities, improving the accuracy and robustness of the generated text.
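The abstract above describes a pipeline: encode each modality, fuse the per-modality codes, then iterate a recurrent network over the fusion result plus the text theme code. The following is a minimal sketch of that data flow only, assuming concatenation as the fusion operation and a plain tanh recurrent cell; all names, dimensions, and the random stand-in encoders are illustrative assumptions, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the patent).
D_DYN, D_STAT, D_AUD, D_TOPIC, D_HID = 8, 8, 4, 4, 16

def fuse(dynamic_code, static_code, audio_code):
    """Fuse the three modality codes; concatenation is one common
    choice of multi-modal fusion (the patent does not specify one here)."""
    return np.concatenate([dynamic_code, static_code, audio_code])

class SimpleRNN:
    """Minimal recurrent step: h' = tanh(W_x x + W_h h)."""
    def __init__(self, d_in, d_hid):
        self.W_x = rng.normal(0, 0.1, (d_hid, d_in))
        self.W_h = rng.normal(0, 0.1, (d_hid, d_hid))

    def step(self, x, h):
        return np.tanh(self.W_x @ x + self.W_h @ h)

# Random stand-ins for the real per-video encoders.
dyn = rng.normal(size=D_DYN)      # dynamic time-domain information code
stat = rng.normal(size=D_STAT)    # static information code
aud = rng.normal(size=D_AUD)      # audio feature vector code
topic = rng.normal(size=D_TOPIC)  # text theme information code

fused = fuse(dyn, stat, aud)
rnn = SimpleRNN(fused.size + D_TOPIC, D_HID)

# Iterative processing of the fusion result and the theme code;
# a real decoder would emit one word of the description per step.
h = np.zeros(D_HID)
for _ in range(5):
    h = rnn.step(np.concatenate([fused, topic]), h)

print(fused.shape, h.shape)  # fused code size, hidden-state size
```

The sketch shows only the shapes flowing through the pipeline; the actual invention would replace the random codes with learned video, image, and audio encoders and map the hidden state to a vocabulary at each step.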

Description

Technical Field

[0001] The invention relates to the technical field of image processing, and in particular to a method for generating video description text based on multi-modal fusion.

Background

[0002] Video has become the most popular way for people to obtain information; especially since the emergence of video apps, watching videos every day has become an indispensable form of leisure and entertainment for many people. To better serve users, the core information of a video must be expressed in text form for recommendation and display. There must therefore be a method capable of outputting the core content information of a given video.

[0003] At present, video content description (video captioning) is usually performed on a video: given a piece of video, a passage of text describing its content is generated. The video content description needs to describe the video content in a ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/78
CPC: G06F16/7867
Inventor: Liu Hui (刘辉)
Owner: Xinhua Zhiyun Technology Co., Ltd. (新华智云科技有限公司)