
Video prediction method based on time sequence correction convolution

A video prediction method and time-series correction technology in the field of computer vision. It addresses problems such as insufficient description of spatial features, large computational overhead, and fixed convolution kernels that hinder the model's depiction of spatial features, thereby enhancing the model's ability to capture long-term dependencies, to model inter-frame relationships, and to depict spatial features.

Active Publication Date: 2022-07-15
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0004] The shortcomings of the above methods are mainly manifested in three aspects: 1) the same convolution kernel is applied to every region of the video frame at every time step, but the spatial features of a video sequence change over time, so a kernel with fixed parameters at different times hinders the model's description of spatial features; 2) methods that adaptively adjust the convolution kernel parameters for each video frame (known as dynamic convolution) incur a large computational overhead, because a high-dimensional feature representation must be corrected to fit the current frame; 3) if the current frame contains context-independent targets (e.g., newly appearing objects), its appearance features are dissimilar to the spatio-temporal features of the historical frames, making those historical features difficult to exploit effectively.
Therefore, to alleviate the insufficient characterization of spatial features, the difficulty of effectively utilizing historical spatio-temporal features, and the high computational overhead of existing methods, a method is urgently needed that can adaptively learn convolution kernel parameters from the input video frames at each time step and can use historical spatio-temporal features more efficiently, so as to improve the clarity of the predicted video.
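For context on point 2), below is a minimal PyTorch sketch of dynamic convolution in the CondConv/DyConv style, where a gating network mixes several candidate kernels per input sample so the effective kernel adapts to each frame. All names are illustrative; this is the high-overhead baseline being criticized, not the invention's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of dynamic convolution: a gate mixes K candidate kernels
    per input sample, so kernel parameters adapt to the current frame."""
    def __init__(self, in_ch, out_ch, k=3, num_experts=4):
        super().__init__()
        # K candidate kernels ("experts") and a gate that scores them
        self.weight = nn.Parameter(torch.randn(num_experts, out_ch, in_ch, k, k) * 0.02)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, num_experts))
        self.k = k

    def forward(self, x):
        b = x.size(0)
        alpha = torch.softmax(self.gate(x), dim=1)                 # (B, K) mixing weights
        w = torch.einsum('bk,koihw->boihw', alpha, self.weight)    # per-sample kernel
        # grouped-conv trick: fold the batch into channels so each
        # sample is convolved with its own mixed kernel in one call
        x = x.reshape(1, -1, *x.shape[2:])
        w = w.reshape(-1, *w.shape[2:])
        y = F.conv2d(x, w, padding=self.k // 2, groups=b)
        return y.reshape(b, -1, *y.shape[2:])
```

Note that the gate and the per-sample kernel mixing must run for every frame, which is exactly the extra cost the paragraph above attributes to this family of methods.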




Detailed Description of the Embodiments

[0043] The present invention will be further described below with reference to the accompanying drawings.

[0044] As shown in Figure 1, a video prediction method based on time-series correction convolution first obtains the original video data set and then performs the following operations in turn: first, uniformly sample the original video to obtain a video frame sequence; build a time-series context fusion module to obtain fused appearance features and fused spatio-temporal coding features; then construct the time-series convolution correction module, which outputs a convolution correction tensor; next, input the fused appearance features, the fused spatio-temporal coding feature map, and the convolution correction tensor into the adaptive convolutional spatio-temporal encoder to obtain the predicted spatio-temporal coding feature map; finally, decode the predicted spatio-temporal coding feature maps into predicted video frames using a spatio-temporal memory decoder; this method uses th...
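To make the data flow between the four modules concrete, here is a hypothetical wiring of the pipeline described in [0044]. The class, argument names, and tensor shapes are assumptions for illustration, not the patent's implementation; each module is a stand-in with the interface implied by the text.

```python
import torch.nn as nn

class VideoPredictor(nn.Module):
    """Hypothetical end-to-end wiring of the pipeline in [0044]."""
    def __init__(self, context_fusion, conv_correction, st_encoder, st_decoder):
        super().__init__()
        self.context_fusion = context_fusion    # time-series context fusion module
        self.conv_correction = conv_correction  # time-series convolution correction module
        self.st_encoder = st_encoder            # adaptive convolutional spatio-temporal encoder
        self.st_decoder = st_decoder            # spatio-temporal memory decoder

    def forward(self, frames):                  # frames: (B, T, C, H, W), uniformly sampled
        # fused appearance features and fused spatio-temporal coding features
        app_feat, st_feat = self.context_fusion(frames)
        # convolution correction tensor for the current time steps
        correction = self.conv_correction(frames)
        # predicted spatio-temporal coding feature map
        pred_code = self.st_encoder(app_feat, st_feat, correction)
        # decode into the predicted video frame sequence
        return self.st_decoder(pred_code)
```

The ordering mirrors the text: fusion and correction run on the sampled frame sequence, the encoder consumes both outputs, and only the decoder produces pixels.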



Abstract

The invention discloses a video prediction method based on time-sequence correction convolution. The method comprises the following steps: sample and preprocess a given original video to obtain a video frame sequence; input the sequence into a time-sequence context fusion module to obtain a fused appearance feature map and a fused space-time coding feature map, and into a time-sequence convolution correction module to obtain a convolution correction tensor; then generate a predicted space-time coding feature map from the fused appearance feature map, the fused space-time coding feature map, and the convolution correction tensor through an adaptive convolutional space-time encoder; finally, decode the predicted space-time coding feature map through a space-time memory decoder and output a predicted video frame sequence. The method can not only correct the convolution kernel parameters according to the video frames at different moments, but also model, through the time-sequence context fusion strategy, the internal relation between the space-time coding features of the current video frame and those of the historical frames, thereby generating a predicted video frame sequence of higher visual quality.
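The abstract does not spell out how the correction tensor modifies the kernel. The sketch below assumes a simple additive update as one plausible reading, purely for illustration; the function name `corrected_conv` and the additive rule are assumptions, not the patent's disclosed formula.

```python
import torch
import torch.nn.functional as F

def corrected_conv(x, base_weight, correction):
    """Sketch of kernel correction: a per-time-step correction tensor
    perturbs a shared base kernel before convolution. The additive form
    is an assumption; the patent text quoted here does not fix the rule.
    x:           (B, C_in, H, W)      features at one time step
    base_weight: (C_out, C_in, k, k)  shared base kernel
    correction:  (C_out, C_in, k, k)  correction tensor for this step
    """
    w = base_weight + correction                       # time-specific corrected kernel
    return F.conv2d(x, w, padding=base_weight.shape[-1] // 2)

# Illustrative usage with made-up shapes:
# x  = torch.randn(2, 8, 16, 16)
# w0 = torch.randn(16, 8, 3, 3)
# dw = 0.1 * torch.randn_like(w0)
# y  = corrected_conv(x, w0, dw)   # -> (2, 16, 16, 16)
```

Compared with the dynamic-convolution sketch earlier, only a small correction tensor is produced per time step rather than an entirely new kernel mixture, which is consistent with the overhead argument in [0004].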

Description

Technical Field
[0001] The invention belongs to the technical field of computer vision, in particular to video prediction within video understanding, and relates to a video prediction method based on time-series correction convolution.
Background Technique
[0002] In recent years, with the rapid development of the mobile Internet and the widespread adoption of video sensing devices, massive amounts of video data are continuously generated by all kinds of terminals. How to predict the future from historical video data has become a problem of concern to researchers, namely the task of Video Prediction. This task aims to generate video frames of future time steps given the video frames of past time steps, and can be widely applied to weather forecasting, urban traffic situation prediction, robot action planning, and autonomous driving. For example, radar echo images can reflect local rainfall. The video prediction method can generate vid...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06V20/40, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08, H04N19/136
CPC: G06N3/08, H04N19/136, G06N3/044, G06N3/045, G06F18/253
Inventor: 李平, 张陈翰, 王涛
Owner: HANGZHOU DIANZI UNIV