Video description method based on space-time super-resolution and electronic equipment

A super-resolution and video description technology, applied in the fields of computer vision and natural language processing, that addresses the problems of reduced model running speed and high computing cost, achieving efficient operation with low computational overhead.

Pending Publication Date: 2022-05-27
TONGJI UNIV


Problems solved by technology

However, this approach ignores the information loss caused by frame sampling and image compression. At the same time, if frame sampling is not performed and the original high resolution of the images is kept for feature extraction, substantial computing costs are introduced and the running speed of the model drops significantly.




Embodiment Construction

[0032] The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. This embodiment is implemented on the premise of the technical solution of the present invention and provides a detailed implementation manner and a specific operation process, but the protection scope of the present invention is not limited to the following embodiments.

[0033] This embodiment proposes a video description method based on spatiotemporal super-resolution. As shown in Figure 1, the method is implemented based on a video description model and includes the following steps: acquiring an input video and sampling it to obtain a video frame sequence consisting of several compressed-size frames; performing multi-modal feature extraction and feature encoding on the video frame sequence, dynamically fusing the encoded multi-modal features, and gradually decoding to generate video description sentences. During the training of the video description model, the frame at the original resolution and the missing middle frame between adjacent sampled frames are reconstructed in the spatial and temporal dimensions, and a loss function is constructed from the reconstruction error and the decoding prediction error to realize model training.
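The inference pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the pixel-striding stand-ins for frame compression, and the toy mean/difference "features" are all hypothetical; a real model would use learned appearance and motion encoders with an attention-based caption decoder.

```python
import numpy as np

def sample_frames(video: np.ndarray, stride: int) -> np.ndarray:
    """Temporally sample every `stride`-th frame of a (T, H, W, C) video."""
    return video[::stride]

def compress(frames: np.ndarray, factor: int) -> np.ndarray:
    """Spatially downsample by pixel striding (stand-in for real resizing)."""
    return frames[:, ::factor, ::factor, :]

def extract_multimodal_features(frames: np.ndarray) -> dict:
    """Toy stand-in for appearance/motion encoders: one scalar per frame."""
    t = frames.shape[0]
    flat = frames.reshape(t, -1)
    appearance = flat.mean(axis=1, keepdims=True)                      # (t, 1)
    motion = np.diff(flat, axis=0, prepend=0).mean(axis=1, keepdims=True)
    return {"appearance": appearance, "motion": motion}

def fuse(features: dict) -> np.ndarray:
    """Placeholder for dynamic fusion: concatenate the modality features."""
    return np.concatenate([features["appearance"], features["motion"]], axis=1)

video = np.random.rand(32, 64, 64, 3)                  # toy (T, H, W, C) input
frames = compress(sample_frames(video, stride=4), factor=2)
fused = fuse(extract_multimodal_features(frames))
print(frames.shape, fused.shape)                       # (8, 32, 32, 3) (8, 2)
```

The fused per-frame features would then be fed to a decoder that generates the description sentence token by token.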



Abstract

The invention relates to a video description method based on space-time super-resolution and an electronic device. The method is realized based on a video description model and comprises the following steps: obtaining an input video, and sampling it to obtain a video frame sequence containing a plurality of compressed-size frames; performing multi-modal feature extraction and feature coding on the video frame sequence through the video description model, dynamically fusing the coded multi-modal features, and gradually decoding to generate a video description sentence. When the video description model is trained, the frame at the original resolution and the missing middle frame between adjacent sampled frames are reconstructed in the spatial and temporal dimensions, a loss function is constructed from the reconstruction error and the decoding prediction error, and model training is realized. Compared with the prior art, the method has the advantages of rich and accurate description, strong generalization ability, and low calculation overhead.
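The training objective described above can be sketched as follows: the total loss combines a spatial reconstruction error (super-resolved frames against the original-resolution frames), a temporal reconstruction error (interpolated frames against the true missing middle frames), and the decoding prediction error on the caption tokens. The toy tensors, the use of plain MSE and cross-entropy, and the unweighted sum are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared reconstruction error between two frame tensors."""
    return float(((a - b) ** 2).mean())

def cross_entropy(logits: np.ndarray, target_ids: np.ndarray) -> float:
    """Token-level decoding prediction error for the generated description."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))   # stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    return float(-np.log(probs[np.arange(len(target_ids)), target_ids]).mean())

rng = np.random.default_rng(0)
# Toy stand-ins for model outputs vs. ground truth.
sr_frames, hr_frames = rng.random((4, 8, 8)), rng.random((4, 8, 8))           # spatial SR
interp_frames, missing_frames = rng.random((3, 8, 8)), rng.random((3, 8, 8))  # temporal SR
logits = rng.random((5, 100))                    # decoder logits over a 100-word vocab
target = rng.integers(0, 100, size=5)            # ground-truth caption token ids

# Combined objective: reconstruction errors plus decoding prediction error.
loss = (mse(sr_frames, hr_frames)
        + mse(interp_frames, missing_frames)
        + cross_entropy(logits, target))
print(loss)
```

At inference time only the captioning branch is needed, so the super-resolution reconstruction acts purely as a training-time auxiliary signal, which is consistent with the low computational overhead claimed for the method.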

Description

technical field

[0001] The invention relates to the fields of computer vision and natural language, and in particular to a video description method and electronic device based on spatiotemporal super-resolution.

Background technique

[0002] In recent years, with the popularization of 5G networks, video as a medium of information interaction has spread widely in people's daily life, and it has also brought various new challenges, such as the automatic classification and retrieval of large-scale videos and video understanding tasks such as motion and event detection. As one of the key tasks of video understanding, video description aims to automatically generate a natural language description for a given video clip, which has very broad application prospects in human-computer interaction, infant teaching and visual impairment assistance. Due to the richness and complex temporal structure of video scenes, it is difficult to model video information. Compared with sta...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06T3/40; G06T9/00; G06F40/30; G06N3/04; G06N3/08
CPC: G06T3/4053; G06N3/08; G06T9/002; G06F40/30; G06T2207/10016; G06N3/045
Inventors: 王瀚漓, 曹铨辉
Owner: TONGJI UNIV