Video generation model based on text description information and generative adversarial network

A technology for generating video from text description information, applied in the field of computer vision, that addresses the lack of generalization, the difficulty of training GANs, and the inflexibility of prior approaches, achieving good generalization ability.

Active Publication Date: 2018-09-28
SUN YAT SEN UNIV


Problems solved by technology

In 2016, M. Saito et al. began to apply GANs to video generation in the paper Temporal Generative Adversarial Nets, as did C. Vondrick et al. Because these models map all videos to the same latent space, they can only handle videos of a fixed length, which is relatively inflexible.
[0010] Existing GAN technology is mainly applied to image generation. Although generating video only adds the dimension of time, this makes the GAN very difficult to train. A video is a record of both space and time, so the model must not only learn the appearance of objects, as in image generation, but also learn the laws of their motion in order to generate realistic videos.
In addition, the time dimension introduces large variations: for example, the same person performing the same action at different speeds will be judged as a different video. The training mechanism proposed by MoCoGAN alleviates this problem to a certain extent, but that model only generates a few simple videos of human activities and lacks generalization.



Examples


Embodiment 1

[0023] The present invention mainly includes two parts: the processing of text information and the design of the generative adversarial model, corresponding to the parts indicated by (1) and (2) in Figure 1.

[0024] The first part is the effective processing of text information; the main purpose is to obtain word vectors that are closely related to videos and have generalization properties. The present invention refers to the objects2action model proposed by Mihir Jain et al., as shown in Figure 2. The model uses an image data set with text information (usually label information rather than a single word, typically 2 to 4 words, such as: brush hair, diving springboard 3m, etc.) as the training set, expressed as D ≡ {x, y}, where x is an image and y is its label information. Videos with label information serve as the test set, expressed as T ≡ {v, z}, where v is a video and z is the video's label information. The label sets y and z are completely disjoint.
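The data setup in [0024] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the labels and the embedding table are stand-ins (a real objects2action-style pipeline would use a pretrained word-vector model such as word2vec), and embedding multi-word labels as the mean of their word vectors is an assumption for illustration.

```python
import numpy as np

# Training set D = {x, y}: images with multi-word label information.
train_labels = ["brush hair", "diving springboard 3m"]
# Test set T = {v, z}: videos with label information disjoint from y.
test_labels = ["ride horse", "play violin"]

# Stand-in embedding lookup: maps each word to a small random vector.
rng = np.random.default_rng(0)
vocab = {w for lbl in train_labels + test_labels for w in lbl.split()}
embedding = {w: rng.standard_normal(4) for w in vocab}

def label_vector(label: str) -> np.ndarray:
    """Embed a multi-word label as the mean of its word vectors."""
    return np.mean([embedding[w] for w in label.split()], axis=0)

# The two label sets must be completely disjoint, as the patent requires.
assert set(train_labels).isdisjoint(test_labels)

vecs = {lbl: label_vector(lbl) for lbl in train_labels}
print({k: v.shape for k, v in vecs.items()})
```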

[0025] As shown in Figure 2, first, the model will use the ...



Abstract

The invention relates to a video generation model based on text description information and a generative adversarial network. The model uses videos with text description information as training data. A bootstrap sampling method is adopted to extract part of the videos in the training data, together with their corresponding text description information, and these are jointly input into an action recognition model for training; the remaining training data, with its corresponding text description information removed, is then input into the action recognition model, so that effective word vectors for the text description information in the training data are trained. Finally, the word vectors and the videos are input into the proposed generative adversarial network model, with the word vectors used as conditioning so that the generator in the model can generate videos.
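The bootstrap split described in the abstract can be sketched as below. This is an illustrative sketch only: the dataset, its size, and the sample size are assumptions, and the "models" are elided; it shows only how sampling with replacement partitions the data into a labeled part (video plus text) and an unlabeled part (video only).

```python
import random

# Training data: (video_id, text_description) pairs (illustrative values).
data = [(f"video_{i}", f"description {i}") for i in range(10)]

random.seed(42)
# Bootstrap sample: draw with replacement, same size as the dataset.
sample = [random.choice(data) for _ in range(len(data))]
sampled_ids = {vid for vid, _ in sample}

# Sampled videos keep their text and train the action-recognition model jointly.
labeled_part = [(vid, txt) for vid, txt in data if vid in sampled_ids]
# The remaining videos are fed in with their text description removed.
unlabeled_part = [vid for vid, _ in data if vid not in sampled_ids]

print(len(labeled_part), len(unlabeled_part))
```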

Description

technical field

[0001] The present invention relates to the technical field of computer vision, and more specifically to a video generation model based on text description information and a generative adversarial network.

Background technique

[0002] Image generation and video generation have always been a very important part of computer vision. In recent years, the use of generative models from machine learning to realize image generation has attracted much attention. Since Ian Goodfellow proposed the Generative Adversarial Network (GAN) in 2014, providing new ideas and methods for generative learning, both image generation and video generation technologies have been greatly improved.

[0003] The present invention mainly involves the design of two parts, namely the processing of text information and the design of the generative adversarial network.

[0004] The first is the processing of text information to obtain generalized and eff...
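The conditioning idea in [0003], where the generator of the adversarial network is constrained by a text word vector, can be illustrated with a toy sketch. Everything here is a stand-in assumption: a single random linear map plays the role of the generator (a real model would use deep spatio-temporal networks), and the dimensions and video shape are arbitrary. It shows only the structural point that noise and the word vector are combined as the generator's input.

```python
import numpy as np

rng = np.random.default_rng(1)

NOISE_DIM, WORD_DIM = 16, 8
FRAMES, H, W = 4, 8, 8          # tiny "video" shape for illustration

# Stand-in generator weights: map [noise ; word vector] -> video pixels.
W_gen = rng.standard_normal((NOISE_DIM + WORD_DIM, FRAMES * H * W)) * 0.1

def generate(noise: np.ndarray, word_vec: np.ndarray) -> np.ndarray:
    """Generate a toy video conditioned on the text's word vector."""
    cond_input = np.concatenate([noise, word_vec])   # conditioning by concat
    flat = np.tanh(cond_input @ W_gen)               # pixel values in [-1, 1]
    return flat.reshape(FRAMES, H, W)

video = generate(rng.standard_normal(NOISE_DIM), rng.standard_normal(WORD_DIM))
print(video.shape)
```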

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/62, G06K9/00, G06F17/27
CPC: G06F40/289, G06V40/20, G06F18/24, G06F18/214
Inventor: 吴贺俊, 练紫莹
Owner: SUN YAT SEN UNIV