Video target detection method based on multi-layer feature fusion

A technology of target detection and feature fusion, applied to instruments, biological neural network models, character and pattern recognition, etc. It addresses the problems of high computational cost, a large number of network parameters, and high model complexity, and achieves enhanced foreground features, strong robustness, and suppression of background features.

Active Publication Date: 2019-11-08
XIAMEN BICHI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0009] Current video target detection methods mainly use a two-stage detection model, which suffers from high model complexity, a large number of network parameters, and a high demand for computing resources.



Examples


Embodiment 1

[0031] With the popularity of camera equipment and the development of multimedia technology, the amount of video information in daily life is increasing day by day. How to understand and apply video content and find useful information in large collections of videos has become a hot research direction. Among these tasks, video object detection, as the basis of other tasks, is an important research direction. Compared with image target detection, the input of video target detection is a video, which provides additional inter-frame timing information and redundant information. At the same time, targets in video are prone to occlusion, deformation, blurring and other problems, so directly applying an image object detection method to video is not only less effective but also slow. Most current video target detection methods use a two-stage detection model and comprehensively utilize video information by introducing optical flow networks or trac...

Embodiment 2

[0042] The video target detection method based on multi-layer feature fusion is the same as in Embodiment 1. In step (1), the current-frame, previous-frame and rear-frame images are input into the improved convolutional neural network to extract the feature maps F_t, F_{t-}, F_{t+}, including the following steps:

[0043] (1a) Input the image into the improved convolutional neural network and add a shallow attention mechanism module after the convolutional layer at one third of the depth of the network; the module optimizes the shallow feature map extracted by that convolutional layer, and the optimized map is used as the input to the next convolutional layer. The feature map extracted at the one-third-depth position contains the texture and position information of the target, and this texture and position information is selectively enhanced by the attention mechanism module (see the sketch after step (1b)).

[0044] (1b) Add a middle-layer attention mechanism module after the convolutional layer at two thirds of the depth of the network...
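The exact layer composition of the shallow and middle attention mechanism modules is not given in this excerpt. The following PyTorch sketch therefore assumes a squeeze-and-excitation style channel attention and a generic stack of convolutional blocks; the class names ChannelAttention and ImprovedBackbone, the channel widths and the block count are illustrative choices, not the patent's.

```python
# Illustrative sketch of Embodiment 2, steps (1a)-(1b): attention modules
# inserted after the conv layers at roughly 1/3 and 2/3 of the network depth.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Assumed form of the shallow / middle attention mechanism module:
    re-weights feature channels so that informative (foreground) responses
    are enhanced and background responses are suppressed."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # optimized feature map, fed to the next convolutional layer


class ImprovedBackbone(nn.Module):
    """Generic convolutional backbone with attention modules added after the
    convolutional layers at about 1/3 and 2/3 of the network depth."""

    def __init__(self, channels=(3, 64, 128, 256, 256, 512, 512)):
        super().__init__()
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            ))
        self.blocks = nn.ModuleList(blocks)
        n = len(blocks)
        self.shallow_idx, self.middle_idx = n // 3, (2 * n) // 3
        self.shallow_att = ChannelAttention(channels[self.shallow_idx + 1])
        self.middle_att = ChannelAttention(channels[self.middle_idx + 1])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i == self.shallow_idx:      # 1/3 depth: texture / position information
                x = self.shallow_att(x)
            elif i == self.middle_idx:     # 2/3 depth: middle-layer semantics
                x = self.middle_att(x)
        return x  # feature map F_t (or F_{t-}, F_{t+}) for one input frame
```

In this reading, the same backbone is applied separately to the current, previous and rear frames to produce F_t, F_{t-} and F_{t+}.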

Embodiment 3

[0048] The video target detection method based on multi-layer feature fusion is the same as in Embodiments 1-2. The fusion network mentioned in step (1) fuses the feature-map information of the previous frame and the rear frame into the feature map of the current frame. The process includes:

[0049] (a) First concatenate the feature maps of the current frame, the previous frame and the subsequent frame along the first dimension and input them to the sampling network layer to obtain the sampling maps H_{t-}, H_{t+} of the previous-frame and subsequent-frame feature maps, which serve as the input when computing the sampling coefficients. The sampling network layer of the present invention consists of 5 convolutional layers whose kernel sizes are 5*5, 3*3, 1*1, 3*3 and 5*5 respectively; the structure of the 5 convolutional layers resembles a pyramid, and the sampling information of different resol...
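A minimal sketch of this fusion step under stated assumptions: the kernel sizes 5*5, 3*3, 1*1, 3*3, 5*5 follow the text, but the channel widths, the way the sampling maps H_{t-} and H_{t+} are turned into sampling coefficients, and the fusion formula itself are truncated in this excerpt, so a cosine-similarity coefficient normalized over the two neighbour frames and an additive fusion are assumed here.

```python
# Illustrative sketch of the fusion network in Embodiment 3 (step (a) onward).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SamplingFusion(nn.Module):
    def __init__(self, c: int = 512):
        super().__init__()
        # Five convolutional layers with kernel sizes 5, 3, 1, 3, 5 (pyramid-like),
        # taking the concatenated (F_t, F_{t-}, F_{t+}) maps and producing the
        # sampling maps for the previous and rear frames.
        kernels = (5, 3, 1, 3, 5)
        layers, c_in = [], 3 * c
        for k in kernels:
            layers += [nn.Conv2d(c_in, 2 * c, k, padding=k // 2), nn.ReLU(inplace=True)]
            c_in = 2 * c
        self.sampling_net = nn.Sequential(*layers)

    def forward(self, f_t, f_prev, f_next):
        # (a) concatenate the three feature maps along the channel dimension
        x = torch.cat([f_t, f_prev, f_next], dim=1)
        h = self.sampling_net(x)
        h_prev, h_next = torch.chunk(h, 2, dim=1)   # sampling maps H_{t-}, H_{t+}

        # assumed sampling coefficients: per-pixel cosine similarity between each
        # sampling map and F_t, normalized over the two neighbour frames
        w_prev = F.cosine_similarity(h_prev, f_t, dim=1, eps=1e-6)
        w_next = F.cosine_similarity(h_next, f_t, dim=1, eps=1e-6)
        w = torch.softmax(torch.stack([w_prev, w_next], dim=1), dim=1)  # B x 2 x H x W

        # assumed fusion formula: current-frame features enhanced by the
        # coefficient-weighted neighbour-frame features
        fused = f_t + w[:, 0:1] * f_prev + w[:, 1:2] * f_next
        return fused  # enhanced feature map of the current frame
```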



Abstract

The invention discloses a video target detection method based on multi-layer feature fusion, which solves the problems that existing detection methods do not utilize video timing information and therefore detect poorly. The technical scheme is: take one frame of the video as the current frame, select a front frame from the preceding 9 frames and a rear frame from the following 9 frames; input the three frames into an improved convolutional neural network to obtain three feature maps; input these into a sampling network to obtain sampling maps of the front-frame and rear-frame feature maps, and compute the sampling coefficients of the front-frame and rear-frame feature maps from the sampling maps; obtain an enhanced feature map of the current frame from the sampling coefficients according to a fusion formula; take the enhanced feature map as the input of the detection network, generate a candidate region set, and detect the final target category and position through the classification and regression network. The method uses video timing information, has low model complexity and few parameters, achieves a good detection effect, and can be used for traffic monitoring, security, target recognition and the like.
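For orientation, a minimal end-to-end sketch of the scheme described above. The backbone, fusion network, candidate-region generator and classification/regression head are placeholders for the components detailed in the embodiments; the helper name detect_frame and the random choice of the front/rear frame are illustrative assumptions, while the +/-9 frame window follows the text.

```python
# High-level flow of the method; boundary handling at the start/end of the
# video is simplified for brevity.
import random


def detect_frame(video, t, backbone, fusion, proposal_net, head):
    """video: tensor of shape (T, 3, H, W); t: index of the current frame."""
    T = video.shape[0]
    t_prev = random.randint(max(0, t - 9), max(0, t - 1))          # one of the front 9 frames
    t_next = random.randint(min(T - 1, t + 1), min(T - 1, t + 9))  # one of the rear 9 frames

    # three feature maps from the improved (attention-augmented) CNN
    f_t = backbone(video[t:t + 1])
    f_prev = backbone(video[t_prev:t_prev + 1])
    f_next = backbone(video[t_next:t_next + 1])

    # sampling network + fusion formula -> enhanced current-frame feature map
    enhanced = fusion(f_t, f_prev, f_next)

    # detection network: candidate regions, then classification and regression
    proposals = proposal_net(enhanced)
    classes, boxes = head(enhanced, proposals)
    return classes, boxes
```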

Description

Technical field

[0001] The invention belongs to the technical field of digital image processing and in particular relates to target detection in video images, more specifically to a video target detection method based on multi-layer feature fusion, which can be used for traffic monitoring, security and target recognition.

Background technique

[0002] As the basis of most computer vision tasks, image target detection uses digital image processing technology to perform category recognition and position detection of specific targets in images of complex scenes. Compared with image target detection, video target detection can use the context information and spatio-temporal information provided by the video to improve detection accuracy, especially for fast-moving targets. Target detection is widely used in intelligent transportation systems, intelligent monitoring systems, military target detection, and medical image auxiliary processing. In these applications, a...


Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04
CPC: G06V20/40, G06V2201/07, G06N3/045, G06F18/214, G06F18/24, G06F18/253
Inventors: 韩红岳欣李阳陈军如张照宇范迎春高鑫磊唐裕亮
Owner: XIAMEN BICHI INFORMATION TECH CO LTD