An object detection method based on cross-modal and multi-scale feature fusion

An object detection technology based on multi-scale features, applied in the field of image recognition. It addresses the speed limitations of existing fusion methods and the lack of a large labeled depth dataset from which a general feature representation of depth information could be obtained directly. It achieves real-time detection speed and improves detection performance.

Active Publication Date: 2021-10-15
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

The industry lacks a large-scale, labeled depth image dataset that covers a sufficient number of categories, so a general feature representation of depth information cannot be obtained directly.
[0004] In addition, existing fusion-feature detection methods are limited in speed. They often need high-performance GPUs and lengthy computation to produce results, which cannot meet the strict real-time requirements of industrial systems.



Examples


Embodiment Construction

[0028] The technical solution of the present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments, and the following embodiments do not constitute a limitation of the present invention.

[0029] The general idea of the present invention is to fuse depth image and RGB image features across modalities, without relying on a large labeled depth image dataset, so that object recognition, localization, and detection are completed accurately, efficiently, and in real time. Training yields a fusion model that accepts cross-modal RGB and depth image input and outputs the location and category of multiple objects in real time. The solution requires cross-modal feature transfer: first initialize the depth map network from the RGB model parameters and train the depth map model; then initialize the feature extraction of the fusion network proposed by the present invention based on ...
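The initialization order described in [0029] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: weights are plain Python dictionaries, `train` is a hypothetical placeholder that only shifts values, and the layer names (`conv1`, `conv2`) and branch prefixes (`rgb.`, `depth.`) are invented for the example.

```python
import copy


def train(weights, step):
    """Hypothetical stand-in for training: shifts every weight by `step`."""
    return {name: [w + step for w in values] for name, values in weights.items()}


def init_depth_from_rgb(rgb_weights):
    """Start the depth-map network from a copy of the RGB network's parameters."""
    return copy.deepcopy(rgb_weights)


def init_fusion_branches(rgb_weights, depth_weights):
    """Seed the fusion network's two feature extractors from the two single-modality models."""
    fusion = {}
    for name, values in rgb_weights.items():
        fusion["rgb." + name] = list(values)      # assumed RGB branch prefix
    for name, values in depth_weights.items():
        fusion["depth." + name] = list(values)    # assumed depth branch prefix
    return fusion


# Step 1: parameters of an RGB detector (assumed to be trained already).
rgb_weights = {"conv1": [1.0, 2.0], "conv2": [3.0, 4.0]}

# Step 2: initialize the depth network from the RGB parameters, then train it.
depth_weights = train(init_depth_from_rgb(rgb_weights), step=0.5)

# Step 3: initialize both fusion branches; fusion training would follow here.
fusion_weights = init_fusion_branches(rgb_weights, depth_weights)
```

The point of the ordering is that the depth branch never starts from random weights: it inherits features learned from the more plentiful RGB data before being adapted to depth images.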


Abstract

The invention discloses an object detection method based on the fusion of cross-modal and multi-scale features. The depth map detection network model is first initialized with the network parameters of the RGB detection network model. The feature extraction weights of the fusion network model are then initialized from the resulting RGB detection network model and depth map detection network model, respectively. Finally, a fusion network model that fuses multi-scale and cross-modal features is trained. The invention does not rely on a large labeled depth image dataset; it fuses depth image and RGB image features across modalities and completes object recognition, localization, and detection accurately, efficiently, and in real time. The fusion network model needs only a consumer-grade graphics card and a CPU as hardware to reach real-time detection speed.
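One way to picture "fusing multi-scale and cross-modal features" is sketched below. The abstract does not state which fusion operator is used, so channel-wise concatenation at each scale is an assumption made purely for illustration. The helper names (`fuse_scale`, `fuse_multi_scale`, `make_feat`) and the toy feature-map sizes are likewise hypothetical.

```python
def make_feat(channels, size, value):
    """Build a toy feature map: a list of channels, each a size x size grid."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


def fuse_scale(rgb_feat, depth_feat):
    """Concatenate RGB and depth feature maps along the channel axis (assumed operator)."""
    same_height = len(rgb_feat[0]) == len(depth_feat[0])
    same_width = len(rgb_feat[0][0]) == len(depth_feat[0][0])
    if not (same_height and same_width):
        raise ValueError("RGB and depth maps must share spatial size at each scale")
    return rgb_feat + depth_feat


def fuse_multi_scale(rgb_pyramid, depth_pyramid):
    """Fuse the two modalities independently at every scale of the pyramid."""
    return [fuse_scale(r, d) for r, d in zip(rgb_pyramid, depth_pyramid)]


# Two scales per modality: a fine 4x4 map and a coarse 2x2 map.
rgb_pyramid = [make_feat(2, 4, 1.0), make_feat(4, 2, 1.0)]
depth_pyramid = [make_feat(2, 4, 0.5), make_feat(4, 2, 0.5)]

fused = fuse_multi_scale(rgb_pyramid, depth_pyramid)
shapes = [(len(f), len(f[0]), len(f[0][0])) for f in fused]  # (channels, height, width)
```

Each fused map keeps its spatial resolution and doubles its channel count, so a detection head attached at each scale would see both modalities.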

Description

Technical field

[0001] The present invention relates to the field of image recognition technology, in particular to an object detection method based on cross-modal multi-scale feature fusion, which simultaneously completes the detection, localization, and precise identification of objects in color-depth images (RGB-D images, which contain both color and depth information).

Background

[0002] Industry always has an urgent need for faster, more accurate, and more generalizable object detection methods. RGB images are severely affected in some special conditions, such as motion or glare, which degrade the image data. Detection that uses RGB image features alone often cannot reach the expected accuracy, so it is necessary to use information from other sensors, such as depth information, to improve object detection performance.

[0003] Since convolutional neural networks are used for object recognition and detection tasks, most high-prec...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06K9/46
CPC: G06V10/56, G06F18/253, G06F18/214
Inventor: 刘盛, 尹科杰, 刘儒瑜, 陈一彬, 沈康
Owner: ZHEJIANG UNIV OF TECH