Video action migration method

A video action transfer technology, applied in the field of computer vision, which can solve problems such as the lack of effective methods

Active Publication Date: 2019-09-03
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
Cites: 3 | Cited by: 13

AI Technical Summary

Problems solved by technology

[0007] Therefore, there is a lack of an effective...



Examples


Embodiment 1

[0029] The problem addressed by this application is the transfer of human actions in videos. Let V = {I_1, I_2, ..., I_N} denote an N-frame video in which a single person performs whole-body movements, such as dancing. To simplify the problem, it is assumed that both the viewpoint (camera) and the background are stationary; even so, this remains an unsolved and challenging problem. Given a source video V_S and a target action video V_T, the goal of action transfer is to migrate the actions of V_T onto V_S while preserving the appearance characteristics of V_S. In this way, both motion and appearance can be controlled in the generated target video V_O. A pre-trained 2D pose detection model is used to extract the action sequence P = {p_1, p_2, ..., p_N}, where each p_t denotes the pose of the t-th frame and is represented in the implementation as a heat map with M channels, where M = 14 is the number of key points. Denote the sou...
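As an illustration of the pose representation described above, the following sketch renders M = 14 key points as an M-channel heat map. The function name render_pose_heatmap, the Gaussian radius sigma, and the frame size are assumptions made for this example; the embodiment only states that each pose p_t is an M-channel heat map of key points.

```python
import numpy as np

def render_pose_heatmap(keypoints, height, width, sigma=6.0):
    """Render M key points (x, y) as an M-channel Gaussian heat map.

    `keypoints` is an (M, 2) array of pixel coordinates. The Gaussian
    radius `sigma` is an assumption; the embodiment only specifies that
    each pose p_t is a heat map with M = 14 channels.
    """
    num_keypoints = keypoints.shape[0]
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((num_keypoints, height, width), dtype=np.float32)
    for m, (x, y) in enumerate(keypoints):
        heatmap[m] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heatmap

# One pose p_t for a 256x256 frame with M = 14 key points.
p_t = render_pose_heatmap(np.random.rand(14, 2) * 256, 256, 256)
print(p_t.shape)  # (14, 256, 256)
```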

Embodiment 2

[0071] This application uses PSNR and VFID as evaluation metrics. To calculate the VFID, a pre-trained video classification model, I3D, is first used to extract video features, and the mean and covariance matrix of these features are then computed over all videos in the dataset. The final VFID is calculated by the formula:

[0072] VFID = ||μ_1 − μ_2||² + Tr(Σ_1 + Σ_2 − 2(Σ_1 Σ_2)^(1/2)), where μ_1, Σ_1 and μ_2, Σ_2 are the mean and covariance matrix of the I3D features of the real and generated videos, respectively.

[0073] VFID measures both visual effect and temporal continuity.
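As a sketch of how the VFID described in [0071] can be computed once the I3D features have been extracted, the snippet below evaluates the standard Fréchet distance between the feature statistics of the real and generated video sets. The function name and the use of scipy for the matrix square root are choices made for this example, not taken from the patent.

```python
import numpy as np
from scipy import linalg

def vfid(feats_real, feats_gen):
    """Fréchet distance between I3D feature statistics of two video sets.

    `feats_real` and `feats_gen` are (num_videos, feature_dim) arrays of
    I3D features; the formula is the standard Fréchet (FID) expression.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean.real))
```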

[0074] For transfer within the same video, the real video serves as the target video, so PSNR and VFID can be computed directly. For cross-video transfer, PSNR cannot be calculated because there is no ground-truth frame correspondence. At the same time, the reference value of VFID is also greatly reduced, because the appearance and the background strongly affect the features extracted by the I3D network. Therefore, only quantitative results for intra-video action transfer are provided.

[0075] Table 1 Quantitative results

[0076]

[0077] The above table shows the PSNR and VFID scores of different met...

Embodiment 3

[0087] This application also conducted a qualitative experiment, testing two scenarios: intra-video action transfer and cross-video action transfer. These two scenarios correspond to two different test subsets: i) the cross-video test set, in which the source character/background frames and the target action video come from different video sequences; ii) the intra-video test set, in which the source character/background frames and the target action video come from the same video sequence. For each set, 50 pairs of videos were randomly selected from the test set as a test subset. Note that in the intra-video test subset, it is ensured that the source and target sequences do not overlap.
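A minimal sketch of how an intra-video test pair could be drawn while guaranteeing that the source and target sequences do not overlap, as required above. The half-and-half split, the function name, and the clip length are assumptions for illustration; the patent does not specify the exact splitting rule.

```python
import random

def sample_intra_video_pair(num_frames, target_len, seed=0):
    """Pick non-overlapping source and target frame ranges from one video.

    The first half of the video is used as source frames and the target
    clip is drawn from the second half; this split is an assumption.
    """
    rng = random.Random(seed)
    source = list(range(0, num_frames // 2))
    start = rng.randrange(num_frames // 2, num_frames - target_len + 1)
    target = list(range(start, start + target_len))
    return source, target

src, tgt = sample_intra_video_pair(num_frames=200, target_len=60)
assert not set(src) & set(tgt)  # source and target sequences do not overlap
```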

[0088] Obvious blurring and unnatural artifacts can be observed in the results generated by the single-frame base model.

[0089] The results of the max-pooling fusion method tend to produce odd colors and shadows in the foreground and background, which is conjectured to be caused by the persistence effect...



Abstract

The invention provides a video action migration method. The video action migration method comprises the following steps: extracting the action sequences of a source video and a target action video, and generating a source pose and a target pose respectively; receiving an image input of the source video; performing preliminary feature extraction on the foreground and the background; respectively fusing the preliminary features of the background and the foreground to generate a fusion feature of the background and a fusion feature of the foreground; synthesizing the background from the fusion feature of the background; synthesizing the foreground and a foreground mask from the fusion feature of the foreground, and further obtaining a frame model of the target video after action migration at time t; and adding a loss function to the frame model, wherein the loss function comprises a content loss function and an adversarial loss function, the content loss function comprises a pixel-level error loss and a perceptual error loss, and the adversarial loss function comprises a spatial adversarial loss and a multi-scale temporal adversarial loss. An overall pipeline model with universality and flexibility is constructed.
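A minimal sketch of the final composition and loss structure implied by the abstract: the synthesized foreground is blended over the synthesized background with the foreground mask to obtain the frame at time t, and the total loss combines the content losses (pixel-level and perceptual) with the adversarial losses (spatial and multi-scale temporal). The alpha-blending rule, the function names, and the loss weights are assumptions for illustration; the abstract does not state them explicitly.

```python
import numpy as np

def compose_frame(foreground, background, mask):
    """Blend the synthesized foreground over the synthesized background
    using the foreground mask to obtain the output frame O_t.

    `foreground` and `background` are (H, W, 3) arrays and `mask` is an
    (H, W, 1) array in [0, 1]; standard alpha compositing is assumed.
    """
    return mask * foreground + (1.0 - mask) * background

def total_loss(pixel_loss, perceptual_loss, spatial_adv_loss, temporal_adv_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the content losses (pixel-level + perceptual) and the
    adversarial losses (spatial + multi-scale temporal); the weights are
    illustrative and not taken from the patent."""
    w_px, w_per, w_sp, w_tmp = weights
    return (w_px * pixel_loss + w_per * perceptual_loss
            + w_sp * spatial_adv_loss + w_tmp * temporal_adv_loss)
```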

Description

Technical field

[0001] The invention relates to the technical field of computer vision, in particular to a video action transfer method.

Background technique

[0002] Portrait video generation is a cutting-edge topic with a large number of application scenarios. It can be used to generate training data for higher-level vision tasks, such as human pose estimation, object detection and grouping, individual identification, and more. It also helps in the development of more powerful video editing tools. There are three main types of existing portrait video generation methods: unconditional video generation, video frame prediction, and video action transfer.

[0003] Unconditional video generation focuses on mapping multiple sets of 1D latent vectors to portrait videos, and this method relies on 1D latent vectors to simultaneously generate video appearance and motion information. After training, different generated videos can be obtained by randomly sampling in the...

Claims


Application Information

IPC(8): G06K9/00, G06K9/62
CPC: G06V20/46, G06F18/253
Inventor: 袁春, 成昆, 黄浩智, 刘威
Owner SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV