RGB-D image semantic segmentation method based on multi-modal feature fusion

An RGB-D feature fusion technology applied in the field of computer vision, designed to cope well with the complexity and diversity of indoor scenes.

Pending Publication Date: 2022-05-27
ZHONGBEI UNIV

AI Technical Summary

Problems solved by technology

Although these methods have achieved good segmentation results, how to make full use of the complementarity and differences between the two modalities, and how to strengthen information interaction, information transfer, and feature extraction within the encoder, remain issues that require further research.



Examples


Embodiment 1

[0043] An RGB-D image semantic segmentation method based on multi-modal feature fusion, comprising the following steps:

[0044] Step 1, data preprocessing: convert the single-channel depth image into a three-channel HHA image, whose channels encode the horizontal disparity, the height above ground, and the angle between each pixel's local surface normal and the inferred direction of gravity;
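The patent does not give the conversion code, but the HHA encoding itself is standard (Gupta et al.). Below is a minimal numpy sketch of a simplified HHA-style encoding; the intrinsics and baseline values, the use of the camera's y-axis as the gravity direction, and the lowest visible point as the ground reference are all simplifying assumptions, since the full pipeline estimates the gravity direction iteratively.

```python
import numpy as np

def depth_to_hha_like(depth, fx=518.8, fy=519.5, baseline=0.075):
    """Simplified HHA-style encoding of a metric depth map (H x W, meters).

    Channels: horizontal disparity, height above the lowest visible point
    (a proxy for height above ground), and angle between the local surface
    normal and an assumed up direction (camera -y axis). The intrinsics
    and baseline are hypothetical placeholder values.
    """
    h, w = depth.shape
    d = np.where(depth > 0, depth, np.nan)  # mask invalid depth

    # Channel 1: horizontal disparity (inversely proportional to depth).
    disparity = baseline * fx / d

    # Back-project pixels to camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2) * d / fx
    y = (v - h / 2) * d / fy
    pts = np.stack([x, y, d], axis=-1)

    # Channel 2: height above the lowest visible point, assuming the
    # camera's +y axis (image down) approximates the gravity direction.
    height = np.nanmax(y) - y

    # Surface normals from finite differences of the point cloud.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8

    # Channel 3: angle between the normal and the assumed up direction.
    up = np.array([0.0, -1.0, 0.0])
    angle = np.degrees(np.arccos(np.clip(n @ up, -1.0, 1.0)))

    def norm255(c):  # rescale each channel to [0, 255]
        c = np.nan_to_num(c)
        return 255 * (c - c.min()) / (c.max() - c.min() + 1e-8)

    return np.stack([norm255(disparity), norm255(height),
                     norm255(angle)], axis=-1).astype(np.uint8)
```

After this rescaling, the HHA image can be consumed by the network exactly like an ordinary three-channel RGB image.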

[0045] Step 2, take the RGB and HHA images as input data and feed them into the attention-guided multi-modal cross-fusion segmentation network model (as shown in Figure 1). The model follows an encoder-decoder structure: the encoder extracts semantic features from the input, and the decoder restores the input resolution by upsampling, assigning a semantic class to each input pixel.
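As a concrete illustration of this encoder-decoder contract, here is a minimal PyTorch sketch; `model` stands for any network mapping an (RGB, HHA) pair to coarse class logits, so the function is a generic wrapper rather than ACFNet itself.

```python
import torch
import torch.nn.functional as F

def segment(model, rgb, hha):
    """Generic encoder-decoder inference: the encoder produces coarse
    feature maps, the decoder's logits are upsampled back to the input
    resolution, and each pixel is assigned its highest-scoring class.
    `model` is a hypothetical callable: (rgb, hha) -> (N, C, h, w) logits."""
    logits = model(rgb, hha)                       # coarse logits
    logits = F.interpolate(logits, size=rgb.shape[-2:],
                           mode="bilinear", align_corners=False)
    return logits.argmax(dim=1)                    # (N, H, W) class map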

[0046] The encoder uses asymmetric dual-stream branches for the RGB and HHA images, comprising an RGB encoder and a depth encoder, which use the ResNet-101 and ResNet-50 networks, respectively, as backbones ...
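A sketch of such an asymmetric dual-stream encoder built from the torchvision backbones named above; the stage-wise element-wise addition is a placeholder standing in for the ACFM fusion module, whose internal design is not spelled out at this point in the excerpt.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101, resnet50

class AsymmetricDualEncoder(nn.Module):
    """Asymmetric two-stream encoder: ResNet-101 for RGB, ResNet-50 for
    HHA, per the patent. The fusion here is plain addition, a stand-in
    for ACFM, which this excerpt does not detail."""

    def __init__(self):
        super().__init__()
        r, d = resnet101(weights=None), resnet50(weights=None)
        # Split each backbone into its stem and four residual stages.
        self.rgb_stages = nn.ModuleList([
            nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool),
            r.layer1, r.layer2, r.layer3, r.layer4])
        self.hha_stages = nn.ModuleList([
            nn.Sequential(d.conv1, d.bn1, d.relu, d.maxpool),
            d.layer1, d.layer2, d.layer3, d.layer4])

    def forward(self, rgb, hha):
        feats = []
        for rs, hs in zip(self.rgb_stages, self.hha_stages):
            rgb, hha = rs(rgb), hs(hha)
            feats.append(rgb + hha)   # placeholder fusion (ACFM in ACFNet)
        return feats                  # multi-scale fused features
```

Because both backbones are bottleneck ResNets, the two streams have matching channel widths (64, 256, 512, 1024, 2048) at every stage, which is what makes stage-wise fusion possible without extra projection layers.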

Embodiment 2

[0069] The segmentation network model proposed by the present invention is applied to the public RGB-D indoor dataset NYUD V2, and experiments demonstrate its effectiveness.

[0070] The NYUD V2 dataset contains 1449 labeled RGB-D image pairs from 464 different indoor scenes in 3 cities, of which 795 image pairs are used for training and 654 image pairs for testing; objects are classified into 40 categories. The effectiveness of the proposed segmentation network ACFNet is demonstrated through segmentation visualization comparisons. The network obtained by removing the feature fusion module ACFM and the global-local feature extraction module (GL) is recorded as the baseline network (Baseline). Figure 5 shows partial visualization results of ACFNet on the public dataset NYUD V2, where the first, second, and third columns show the RGB image, the HHA image, and the semantic label in turn, and the fourth and fifth columns show the ...
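Figure 5 itself cannot be reproduced here, but its described column layout is easy to assemble. The matplotlib sketch below assumes the images and label/prediction maps are already loaded as arrays, and the Baseline/ACFNet column titles follow the comparison setup described in the text.

```python
import matplotlib.pyplot as plt

def show_comparison(rows, titles=("RGB", "HHA", "Label",
                                  "Baseline", "ACFNet")):
    """Five-column comparison panel in the layout described for Figure 5.
    `rows` is a list of 5-tuples of HxWx3 images (or HxW label maps);
    the fourth/fifth column titles are an assumption based on the text."""
    fig, axes = plt.subplots(len(rows), 5, figsize=(15, 3 * len(rows)))
    axes = axes.reshape(len(rows), 5)
    for r, images in enumerate(rows):
        for c, img in enumerate(images):
            ax = axes[r, c]
            # Label/prediction maps get a categorical colormap.
            ax.imshow(img, cmap=None if img.ndim == 3 else "tab20")
            ax.set_axis_off()
            if r == 0:
                ax.set_title(titles[c])
    plt.tight_layout()
    plt.show()
```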



Abstract

The invention belongs to the field of computer vision, and particularly relates to an RGB-D image semantic segmentation method based on multi-modal feature fusion. Owing to the intrinsic differences between RGB and depth features, how to fuse the two kinds of features more effectively remains an open problem. To solve it, an attention-guided multi-modal cross-fusion segmentation network (ACFNet) is proposed. It adopts an encoder-decoder structure and encodes the depth map as an HHA image; an asymmetric dual-stream feature extraction network is designed in which the RGB encoder and the depth encoder use ResNet-101 and ResNet-50, respectively, as backbone networks, and a global-local feature extraction module (GL) is added to the RGB encoder. To fuse the RGB and depth features effectively, an attention-guided multi-modal cross-fusion module (ACFM) is proposed so that the fused, enhanced feature representations can be better exploited at multiple stages.
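The abstract names ACFM but this excerpt does not disclose its internal structure. Purely as one plausible reading, here is a minimal cross-modal fusion block in which each modality is gated by channel attention computed from the other; the SE-style gating is an assumption, not the patent's stated design.

```python
import torch
import torch.nn as nn

class AttentionGuidedFusion(nn.Module):
    """Hypothetical stand-in for the patent's ACFM: each modality is
    re-weighted by channel attention derived from the other modality
    before the two are summed. SE-style gating is an assumption; the
    actual ACFM design is not detailed in this excerpt."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid())
        self.rgb_gate, self.depth_gate = gate(), gate()

    def forward(self, rgb_feat, depth_feat):
        # Cross-guidance: depth features gate the RGB stream, and vice versa.
        rgb_out = rgb_feat * self.depth_gate(depth_feat)
        depth_out = depth_feat * self.rgb_gate(rgb_feat)
        return rgb_out + depth_out  # fused, enhanced representation
```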

Description

Technical field
[0001] The invention belongs to the field of computer vision, and in particular relates to an RGB-D image semantic segmentation method based on multi-modal feature fusion.
Background technique
[0002] Image semantic segmentation is a fundamental task in computer vision. Its purpose is to assign a class label to each pixel of an image, achieving pixel-level scene understanding, and it is widely used in medical imaging, autonomous driving, face recognition, object detection, and so on. In general, because indoor scenes are complex, with cluttered and small objects and severe occlusion, they place higher demands on segmentation algorithms and are more challenging. According to whether deep neural networks are applied, image semantic segmentation methods can be divided into traditional methods and deep-learning-based methods. However, traditional image segmentation methods are not suitable for segmentation tasks that require a large amount ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06T7/00; G06T7/11; G06T5/00; G06T3/40; G06N3/08; G06N3/04; G06K9/62; G06V10/44; G06V10/764; G06V10/82
CPC: G06T7/0002; G06T5/00; G06T7/11; G06T3/4007; G06N3/08; G06T2207/20221; G06T2207/10024; G06T2207/20081; G06N3/048; G06N3/045; G06F18/2415; G06F18/25
Inventors: 杨晓文, 靳瑜昕, 韩慧妍, 张元, 庞敏, 韩燮
Owner: ZHONGBEI UNIV