Video target detection method based on multi-layer feature fusion

A technology of target detection and feature fusion, applied to instruments, biological neural network models, character and pattern recognition, etc. It addresses the problems of high computational cost, a large number of network parameters, and high model complexity, and achieves enhanced foreground features, strong robustness, and suppression of background features.

Active Publication Date: 2019-11-08
XIAMEN BICHI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0009] Current video target detection methods mainly use a two-stage detection model, which suffers from high model complexity, a large number of network parameters, and a high demand for computing resources.



Examples


Embodiment 1

[0031] With the popularity of camera equipment and the development of multimedia technology, the amount of video information in daily life is increasing day by day. How to understand and apply video content and find useful information in large collections of videos has become a hot research direction. Among these tasks, video object detection, as the basis of other tasks, is an important research direction. Compared with image target detection, the input of video target detection is a video, which provides additional inter-frame timing information and redundant information. At the same time, targets in video are prone to occlusion, deformation, blurring and other problems, so directly applying an image object detection method to video is not only less effective but also slow. Most current video target detection methods use a two-stage detection model and comprehensively utilize video information by introducing optical flow networks or trac...

Embodiment 2

[0042] The video target detection method based on multi-layer feature fusion is the same as in Embodiment 1. In step (1), the current-frame, previous-frame and rear-frame images are input into the improved convolutional neural network to extract the feature maps F_t, F_{t-}, F_{t+}, including the following steps:

[0043] (1a) Input the image into the improved convolutional neural network and add a shallow attention mechanism module after the convolutional layer at one third of the depth of the network; the module optimizes the shallow feature map extracted by that convolutional layer, and the optimized map is used as the input to the next convolutional layer. The feature map extracted at the one-third-depth position contains the texture and position information of the target, and this texture and position information is selectively enhanced by the attention mechanism module (see the sketch after step (1b)).

[0044] (1b) Add a middle-layer attention mechanism module after the convolutional layer at two thirds of the depth of the network...
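The exact layer composition of the shallow and middle attention mechanism modules is not given in this excerpt. The following PyTorch sketch therefore assumes a squeeze-and-excitation style channel attention and a generic stack of convolutional blocks; the class names ChannelAttention and ImprovedBackbone, the channel widths and the block count are illustrative choices, not the patent's.

```python
# Illustrative sketch of Embodiment 2, steps (1a)-(1b): attention modules
# inserted after the conv layers at roughly 1/3 and 2/3 of the network depth.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Assumed form of the shallow / middle attention mechanism module:
    re-weights feature channels so that informative (foreground) responses
    are enhanced and background responses are suppressed."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # optimized feature map, fed to the next convolutional layer


class ImprovedBackbone(nn.Module):
    """Generic convolutional backbone with attention modules added after the
    convolutional layers at about 1/3 and 2/3 of the network depth."""

    def __init__(self, channels=(3, 64, 128, 256, 256, 512, 512)):
        super().__init__()
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            ))
        self.blocks = nn.ModuleList(blocks)
        n = len(blocks)
        self.shallow_idx, self.middle_idx = n // 3, (2 * n) // 3
        self.shallow_att = ChannelAttention(channels[self.shallow_idx + 1])
        self.middle_att = ChannelAttention(channels[self.middle_idx + 1])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i == self.shallow_idx:      # 1/3 depth: texture / position information
                x = self.shallow_att(x)
            elif i == self.middle_idx:     # 2/3 depth: middle-layer semantics
                x = self.middle_att(x)
        return x  # feature map F_t (or F_{t-}, F_{t+}) for one input frame
```

In this reading, the same backbone is applied separately to the current, previous and rear frames to produce F_t, F_{t-} and F_{t+}.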

Embodiment 3

[0048] The video target detection method based on multi-layer feature fusion is the same as in Embodiments 1-2. The fusion network mentioned in step (1) fuses the feature-map information of the previous frame and the rear frame into the feature map of the current frame. The process includes:

[0049] (a) First concatenate the feature maps of the current frame, the previous frame and the subsequent frame along the first dimension and input them to the sampling network layer to obtain the sampling maps H_{t-}, H_{t+} of the previous-frame and subsequent-frame feature maps, which serve as the input when computing the sampling coefficients. The sampling network layer of the present invention consists of 5 convolutional layers whose kernel sizes are 5*5, 3*3, 1*1, 3*3 and 5*5 respectively; the structure of the 5 convolutional layers resembles a pyramid, and the sampling information of different resol...
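A minimal sketch of this fusion step under stated assumptions: the kernel sizes 5*5, 3*3, 1*1, 3*3, 5*5 follow the text, but the channel widths, the way the sampling maps H_{t-} and H_{t+} are turned into sampling coefficients, and the fusion formula itself are truncated in this excerpt, so a cosine-similarity coefficient normalized over the two neighbour frames and an additive fusion are assumed here.

```python
# Illustrative sketch of the fusion network in Embodiment 3 (step (a) onward).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SamplingFusion(nn.Module):
    def __init__(self, c: int = 512):
        super().__init__()
        # Five convolutional layers with kernel sizes 5, 3, 1, 3, 5 (pyramid-like),
        # taking the concatenated (F_t, F_{t-}, F_{t+}) maps and producing the
        # sampling maps for the previous and rear frames.
        kernels = (5, 3, 1, 3, 5)
        layers, c_in = [], 3 * c
        for k in kernels:
            layers += [nn.Conv2d(c_in, 2 * c, k, padding=k // 2), nn.ReLU(inplace=True)]
            c_in = 2 * c
        self.sampling_net = nn.Sequential(*layers)

    def forward(self, f_t, f_prev, f_next):
        # (a) concatenate the three feature maps along the channel dimension
        x = torch.cat([f_t, f_prev, f_next], dim=1)
        h = self.sampling_net(x)
        h_prev, h_next = torch.chunk(h, 2, dim=1)   # sampling maps H_{t-}, H_{t+}

        # assumed sampling coefficients: per-pixel cosine similarity between each
        # sampling map and F_t, normalized over the two neighbour frames
        w_prev = F.cosine_similarity(h_prev, f_t, dim=1, eps=1e-6)
        w_next = F.cosine_similarity(h_next, f_t, dim=1, eps=1e-6)
        w = torch.softmax(torch.stack([w_prev, w_next], dim=1), dim=1)  # B x 2 x H x W

        # assumed fusion formula: current-frame features enhanced by the
        # coefficient-weighted neighbour-frame features
        fused = f_t + w[:, 0:1] * f_prev + w[:, 1:2] * f_next
        return fused  # enhanced feature map of the current frame
```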



Abstract

The invention discloses a video target detection method based on multi-layer feature fusion, which solves the problems that existing detection methods do not utilize video timing information and therefore detect poorly. The technical scheme is: take one frame of the video as the current frame, select a front frame from the preceding 9 frames and a rear frame from the following 9 frames; input the three frames into an improved convolutional neural network to obtain three feature maps; input these into a sampling network to obtain sampling maps of the front-frame and rear-frame feature maps, and compute the sampling coefficients of the front-frame and rear-frame feature maps from the sampling maps; obtain an enhanced feature map of the current frame from the sampling coefficients according to a fusion formula; take the enhanced feature map as the input of the detection network, generate a candidate region set, and detect the final target category and position through the classification and regression network. The method uses video timing information, has low model complexity and few parameters, achieves a good detection effect, and can be used for traffic monitoring, security, target recognition and the like.
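For orientation, a minimal end-to-end sketch of the scheme described above. The backbone, fusion network, candidate-region generator and classification/regression head are placeholders for the components detailed in the embodiments; the helper name detect_frame and the random choice of the front/rear frame are illustrative assumptions, while the +/-9 frame window follows the text.

```python
# High-level flow of the method; boundary handling at the start/end of the
# video is simplified for brevity.
import random


def detect_frame(video, t, backbone, fusion, proposal_net, head):
    """video: tensor of shape (T, 3, H, W); t: index of the current frame."""
    T = video.shape[0]
    t_prev = random.randint(max(0, t - 9), max(0, t - 1))          # one of the front 9 frames
    t_next = random.randint(min(T - 1, t + 1), min(T - 1, t + 9))  # one of the rear 9 frames

    # three feature maps from the improved (attention-augmented) CNN
    f_t = backbone(video[t:t + 1])
    f_prev = backbone(video[t_prev:t_prev + 1])
    f_next = backbone(video[t_next:t_next + 1])

    # sampling network + fusion formula -> enhanced current-frame feature map
    enhanced = fusion(f_t, f_prev, f_next)

    # detection network: candidate regions, then classification and regression
    proposals = proposal_net(enhanced)
    classes, boxes = head(enhanced, proposals)
    return classes, boxes
```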

Description

Technical field

[0001] The invention belongs to the technical field of digital image processing and in particular relates to target detection in video images, more specifically to a video target detection method based on multi-layer feature fusion, which can be used for traffic monitoring, security and target recognition.

Background technique

[0002] As the basis of most computer vision tasks, image target detection uses digital image processing technology to perform category recognition and position detection of specific targets in images of complex scenes. Compared with image target detection, video target detection can use the context information and spatio-temporal information provided by the video to improve detection accuracy, especially for fast-moving targets. Target detection is widely used in intelligent transportation systems, intelligent monitoring systems, military target detection, and medical image auxiliary processing. In these applications, a...


Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04
CPC: G06V20/40, G06V2201/07, G06N3/045, G06F18/214, G06F18/24, G06F18/253
Inventors: 韩红岳欣李阳陈军如张照宇范迎春高鑫磊唐裕亮
Owner: XIAMEN BICHI INFORMATION TECH CO LTD