
Video encoding and decoding method and system

A video encoding/decoding and video-frame technology, applied in the field of video codec methods and systems. It addresses problems such as slow model convergence, high computational cost, and feature loss, and achieves video descriptions with fluent logic and coherent semantics.

Status: Pending
Publication Date: 2021-07-09
Applicant: CENT SOUTH UNIV (+1 other)

AI Technical Summary

Problems solved by technology

Traditional video understanding methods struggle to fully account for the temporal relevance of human behavior across the frames of a video and for the causality of events. The global temporal features they extract contain a large number of redundant frame features, which not only consumes enormous computing power but also makes the model converge too slowly during the training phase. Nor can such methods understand the development of events from a human perspective, with behavior as the clue, so the machine cannot understand the video intelligently.



Embodiment Construction

[0030] The present invention proposes a mixed 2D/3D convolutional network (Mixed 2D/3D Convolutional Networks, MCN) built as a dual-branch structure: the first branch uses a 2D convolutional network to generate per-frame features, and the second branch uses a 3D convolutional network to extract global feature information from all frames of the video.
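
The dual-branch structure can be sketched as follows. This is a minimal illustrative skeleton in PyTorch, not the patented implementation: both backbones, the feature dimension, and the pooling choices are stand-in assumptions (the Inception v3 backbone named in [0031] for the 2D branch is sketched separately below).

```python
import torch
import torch.nn as nn

class MCN(nn.Module):
    """Schematic dual-branch Mixed 2D/3D Convolutional Network."""

    def __init__(self, feat_dim=512):
        super().__init__()
        # Branch 1: a 2D CNN applied to each frame independently (stand-in backbone).
        self.branch2d = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        # Branch 2: a 3D CNN over the whole clip for global temporal information.
        self.branch3d = nn.Sequential(
            nn.Conv3d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        # Per-frame features: fold time into the batch, then restore it.
        per_frame = self.branch2d(frames.flatten(0, 1)).view(B, T, -1)  # (B, T, D)
        # Global clip feature: move channels before time for Conv3d.
        global_feat = self.branch3d(frames.transpose(1, 2))             # (B, D)
        return per_frame, global_feat
```

The key design point is that the 2D branch preserves one feature vector per frame while the 3D branch collapses the clip into a single global descriptor.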

[0031] Construct a deep fusion of video static and dynamic information: first, sample a fixed number of frames across the whole video to cover the long-range temporal structure needed to understand it; the sampled frames span the entire video regardless of its length. We therefore feed this constant-length frame sequence frame by frame into the 2D convolutional network branch to generate single-frame visual features v1, ..., vT, where T denotes the number of sampled frames. The 2D convolutional network here uses the Inception v3 network pre-trained on ImageNet as the backbone network to extract all...
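
A minimal sketch of this fixed-count sampling and per-frame feature extraction, assuming PyTorch/torchvision; the sample count T = 16, the function names, and the preprocessing expectations are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def sample_frame_indices(num_frames: int, T: int = 16) -> torch.Tensor:
    """Pick T indices spread evenly over the whole video, whatever its length."""
    return torch.linspace(0, num_frames - 1, T).long()

# Inception v3 pre-trained on ImageNet, with the classifier removed so it
# yields a 2048-d pooled feature per frame (inputs: 299x299, ImageNet-normalized).
backbone = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()  # used as a frozen feature extractor here

@torch.no_grad()
def frame_features(video: torch.Tensor, T: int = 16) -> torch.Tensor:
    """video: (N, 3, 299, 299) decoded frames -> (T, 2048) features v1..vT."""
    idx = sample_frame_indices(video.size(0), T)
    return backbone(video[idx])
```

Because linspace spreads the T indices evenly over the whole index range, the sampled frames cover the full video whether it is short or long, which is exactly the constant-length property this paragraph describes.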


Abstract

The invention discloses a video encoding and decoding method and system. First, the 2D features and the processed 3D features are superposed in temporal order, realizing a deep fusion of static and dynamic information. Then, an attention mechanism is introduced to encode the fused features at each moment t: normalized weights are obtained through a softmax function and assigned to the fused features, yielding new fusion features. In this way, people-oriented features are learned, which promotes the final language description related to human behaviors. Finally, the new fusion features are input into a long short-term memory (LSTM) network and decoded over time to obtain video description sentences. The video descriptions obtained by this method are more logically fluent, with coherent and clear semantics.
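
A hedged sketch of the decoding stage described above, assuming the 2D features and the processed 3D features have already been superposed into a single (batch, time, dim) tensor; the additive-style attention scorer, the dimensions, and the teacher-forced loop are assumptions rather than the patent's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnLSTMDecoder(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000, emb_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)      # scores one fused feature
        self.lstm = nn.LSTMCell(feat_dim + emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, fused, words):
        # fused: (B, T, feat_dim) = 2D features superposed with processed 3D features
        # words: (B, L) ground-truth caption tokens (teacher forcing)
        B, T, _ = fused.shape
        h = fused.new_zeros(B, self.lstm.hidden_size)
        c = fused.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for step in range(words.size(1)):
            # Score every fused feature against the current hidden state, then
            # normalize with softmax to get per-timestep attention weights.
            scores = self.attn(torch.cat(
                [fused, h.unsqueeze(1).expand(-1, T, -1)], dim=-1))
            alpha = F.softmax(scores, dim=1)                 # (B, T, 1)
            context = (alpha * fused).sum(dim=1)             # reweighted fusion feature
            h, c = self.lstm(
                torch.cat([context, self.embed(words[:, step])], dim=-1), (h, c))
            logits.append(self.out(h))                       # next-word distribution
        return torch.stack(logits, dim=1)                    # (B, L, vocab_size)
```

At each decoding step the softmax weights re-rank the fused features against the current LSTM state, so frames tied to the ongoing human action receive larger weights than redundant background frames.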

Description

Technical field

[0001] The invention relates to the field of machine learning, and in particular to a video encoding and decoding method and system.

Background technique

[0002] At present, deep learning algorithms in artificial intelligence can perform video description, readily converting video information into language content. For example, forming an accurate text summary of a large amount of video information before users watch it saves considerable time and cost, letting users quickly grasp how an event developed and what its impact was. Likewise, extracting the highlights from a two-hour movie and converting them into a text summary of the film gives users a better recommendation experience. However, such indiscriminate description of video information cannot fully reflect the imagination, curiosity and intelligence with which human beings understand things, and these natures hav...


Application Information

Patent Type & Authority: Application (China)
IPC (8): H04N19/172; H04N19/42; H04N19/44; G06N3/04; G06N3/08
CPC: H04N19/172; H04N19/42; H04N19/44; G06N3/08; G06N3/045; G06N3/044
Inventors: 郭克华, 申长春, 奎晓燕, 刘斌, 王凌风, 刘超
Owner: CENT SOUTH UNIV