Video Semantic Segmentation Using Multi-Frequency Dynamic Atrous Convolution

A video semantic segmentation technology in the field of computer vision that addresses problems such as poor fault tolerance, large memory usage, and bloated models, thereby reducing computational complexity, improving processing efficiency, and increasing segmentation speed.

Active Publication Date: 2022-06-24
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0004] Existing semantic segmentation methods still have many shortcomings: 1) spatial pyramid pooling considers both local and global spatio-temporal structure information, which makes segmentation results more reliable, but applying max and average pooling operations to high-resolution feature maps leads to poor fault tolerance, weak generalization ability, and high computational complexity; 2) attention mechanisms strengthen the long-range semantic dependencies between feature maps, but the resulting models are bloated and occupy a lot of memory, which is not conducive to real-time deployment; 3) the Transformer encoder, widely used as a feature extractor in natural language processing, takes the one-dimensional embedded feature sequence of a two-dimensional image as input and stacks self-attention and multi-layer perceptron layers to capture long-range dependencies, but the lack of weight sharing leads to a huge number of parameters, and the high computational complexity of self-attention makes real-time performance difficult to guarantee.
At the same time, most segmentation methods cannot effectively balance accuracy and real-time performance, so they fail to meet the needs of practical segmentation tasks.
Therefore, in view of the high computational complexity and poor generalization ability of existing segmentation models, there is an urgent need for a method that guarantees the real-time performance of the segmentation model while also achieving high semantic segmentation accuracy.
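To make the complexity point in item 3) concrete, the following minimal sketch (an illustration, not code from the patent; the frame resolution, patch size, and embedding dimension are assumed) flattens a two-dimensional frame into a one-dimensional token sequence and applies a single self-attention layer, whose cost grows with the square of the number of tokens:

```python
# Minimal sketch: flatten a 2-D frame into 1-D tokens and run self-attention.
# All sizes (512x1024 frame, 16x16 patches, 256-dim embedding) are assumptions
# chosen only to illustrate the quadratic cost; none come from the patent.
import torch
import torch.nn as nn

H, W, patch, dim = 512, 1024, 16, 256
num_tokens = (H // patch) * (W // patch)  # 32 * 64 = 2048 tokens per frame

patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

frame = torch.randn(1, 3, H, W)                          # one RGB video frame
tokens = patch_embed(frame).flatten(2).transpose(1, 2)   # (1, 2048, 256)
out, _ = attn(tokens, tokens, tokens)                    # 2048 x 2048 attention map
print(out.shape, f"~{num_tokens ** 2:,} pairwise attention scores per head")
```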

Method used


Examples


Embodiment Construction

[0041] The present invention will be further described below with reference to the accompanying drawings.

[0042] As shown in figure 1, the video semantic segmentation method using multi-frequency dynamic atrous convolution first samples the given video and inputs the sampled frames into an encoder composed of a convolutional neural network to obtain shallow visual feature maps of the video frames; a feature frequency separation module composed of a Fourier transform, a Gaussian filter, and an inverse Fourier transform then separates multi-frequency feature maps from the shallow visual feature maps; next, dynamic atrous convolution, composed of a weight calculator and multiple parallel atrous convolution kernels, processes the multi-frequency feature maps at different depths to obtain multi-frequency high-level semantic feature maps; finally, the multi-frequency high-level semantic feature maps are concatenated and input to the decoder for up-sampling to obtain the semantic segmentation result...
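A minimal sketch of how such a feature frequency separation step might look is given below, assuming a PyTorch implementation that applies a Gaussian low-pass mask in the Fourier domain and returns a low-frequency and a high-frequency component; the class name FrequencySeparation, the sigma parameter, and the two-band split are illustrative assumptions rather than details taken from the patent:

```python
# Illustrative sketch only: FFT -> Gaussian low-pass / high-pass masks -> inverse FFT.
# The class name, the sigma parameter, and the two-band split are assumptions.
import torch
import torch.nn as nn


class FrequencySeparation(nn.Module):
    """Split a feature map into low- and high-frequency components."""

    def __init__(self, sigma: float = 0.15):
        super().__init__()
        self.sigma = sigma  # width of the Gaussian low-pass mask (relative frequency units)

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W) shallow visual feature map from the encoder
        B, C, H, W = x.shape
        freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))

        # Centred Gaussian low-pass mask over the (H, W) frequency grid.
        fy = torch.linspace(-0.5, 0.5, H, device=x.device).view(H, 1)
        fx = torch.linspace(-0.5, 0.5, W, device=x.device).view(1, W)
        lowpass = torch.exp(-(fx ** 2 + fy ** 2) / (2 * self.sigma ** 2))  # (H, W)

        low_freq = freq * lowpass            # slowly varying visual regions
        high_freq = freq * (1.0 - lowpass)   # edges and fast-changing regions

        # Back to the spatial domain; the imaginary part is negligible for real inputs.
        low = torch.fft.ifft2(torch.fft.ifftshift(low_freq, dim=(-2, -1)), norm="ortho").real
        high = torch.fft.ifft2(torch.fft.ifftshift(high_freq, dim=(-2, -1)), norm="ortho").real
        return low, high


# Usage sketch: feats = encoder(frames); low, high = FrequencySeparation()(feats)
```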



Abstract

The invention discloses a video semantic segmentation method using multi-frequency dynamic atrous convolution. The method first augments the sampled frame images of the video data and extracts shallow visual feature maps through an encoder; it then constructs a feature frequency separation module to obtain the multi-frequency feature maps corresponding to each video frame and feeds them into a dynamic atrous convolution module to obtain the corresponding multi-frequency high-level semantic feature maps, after which the segmentation mask of each video frame is obtained through an up-sampling convolutional decoder; the model is trained iteratively with the stochastic gradient descent algorithm until convergence, and a new video is input into the trained model to obtain its semantic segmentation result. By separating the feature maps of video frames according to different frequencies to describe changes in different visual regions, the method reduces redundant information in the low-frequency visual space and lowers computational complexity; by adaptively expanding the receptive field of the multi-frequency feature maps through dynamic atrous convolution, it improves the ability to discriminate among different semantic categories in video, thereby obtaining better video semantic segmentation results.
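A sketch of what the dynamic atrous convolution module could look like is shown below, assuming a weight calculator built from global average pooling, a 1x1 convolution, and a softmax that softly mixes several parallel dilated convolution branches; the class name, the dilation rates, and the mixing scheme are illustrative assumptions, not the patented design:

```python
# Illustrative sketch only: a weight calculator predicts per-sample mixing weights
# for parallel atrous (dilated) convolution branches, adaptively enlarging the
# receptive field. Names and dilation rates are assumptions, not from the patent.
import torch
import torch.nn as nn


class DynamicAtrousConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Parallel 3x3 atrous convolution kernels with different dilation rates.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d) for d in dilations]
        )
        # Weight calculator: global context -> one mixing weight per branch.
        self.weight_calc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, len(dilations), kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) one frequency component of the feature map
        w = self.weight_calc(x)                                   # (B, K, 1, 1)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, out_ch, H, W)
        # Weighted sum over branches -> input-dependent effective receptive field.
        return (w.unsqueeze(2) * outs).sum(dim=1)                 # (B, out_ch, H, W)


# Usage sketch: y = DynamicAtrousConv(in_ch=64, out_ch=128)(high_frequency_features)
```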

Description

technical field
[0001] The invention belongs to the technical field of computer vision, in particular the field of semantic segmentation in video processing, and relates to a video semantic segmentation method using multi-frequency dynamic atrous convolution.
Background technique
[0002] With the increasing number of vehicles of all types, driving safety has become a matter of great concern to the government and the public. Generally speaking, long periods of continuous driving make people tired and distracted; at the same time, drivers of large vehicles are prone to visual blind spots, which poses great hidden dangers to driving safety. In recent years, autonomous driving technology has aroused strong interest in industry, and more and more research effort has been invested in this field. Efficient visual understanding can guarantee the safety of autonomous driving, and video semantic segmentation is one of its core technolo...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06T7/11; G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06T7/11; G06N3/08; G06T2207/10016; G06N3/045; G06F2218/12; G06F18/214
Inventor: 李平, 陈俊杰, 王然, 徐向华
Owner HANGZHOU DIANZI UNIV