RGB-D image semantic segmentation method based on multi-modal feature fusion

An RGB-D feature fusion technology applied in the field of computer vision, designed to cope well with the complexity and diversity of indoor scenes.

Pending Publication Date: 2022-05-27
ZHONGBEI UNIV

AI Technical Summary

Problems solved by technology

Although these methods have achieved good segmentation results, how to make full use of the complementarity and differences between the two modalities, and how to strengthen information interaction, information transfer, and feature extraction within the encoder, remain issues that require further research.



Examples


Embodiment 1

[0043] An RGB-D image semantic segmentation method based on multi-modal feature fusion, comprising the following steps:

[0044] Step 1, data preprocessing: convert the single-channel depth image into a three-channel HHA image, whose channels encode the horizontal disparity, the height above ground, and the angle between each pixel's local surface normal and the inferred direction of gravity;
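The patent does not give the conversion code, but the HHA encoding itself is standard (Gupta et al.). Below is a minimal numpy sketch of a simplified HHA-style encoding; the intrinsics and baseline values, the use of the camera's y-axis as the gravity direction, and the lowest visible point as the ground reference are all simplifying assumptions, since the full pipeline estimates the gravity direction iteratively.

```python
import numpy as np

def depth_to_hha_like(depth, fx=518.8, fy=519.5, baseline=0.075):
    """Simplified HHA-style encoding of a metric depth map (H x W, meters).

    Channels: horizontal disparity, height above the lowest visible point
    (a proxy for height above ground), and angle between the local surface
    normal and an assumed up direction (camera -y axis). The intrinsics
    and baseline are hypothetical placeholder values.
    """
    h, w = depth.shape
    d = np.where(depth > 0, depth, np.nan)  # mask invalid depth

    # Channel 1: horizontal disparity (inversely proportional to depth).
    disparity = baseline * fx / d

    # Back-project pixels to camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2) * d / fx
    y = (v - h / 2) * d / fy
    pts = np.stack([x, y, d], axis=-1)

    # Channel 2: height above the lowest visible point, assuming the
    # camera's +y axis (image down) approximates the gravity direction.
    height = np.nanmax(y) - y

    # Surface normals from finite differences of the point cloud.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8

    # Channel 3: angle between the normal and the assumed up direction.
    up = np.array([0.0, -1.0, 0.0])
    angle = np.degrees(np.arccos(np.clip(n @ up, -1.0, 1.0)))

    def norm255(c):  # rescale each channel to [0, 255]
        c = np.nan_to_num(c)
        return 255 * (c - c.min()) / (c.max() - c.min() + 1e-8)

    return np.stack([norm255(disparity), norm255(height),
                     norm255(angle)], axis=-1).astype(np.uint8)
```

After this rescaling, the HHA image can be consumed by the network exactly like an ordinary three-channel RGB image.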

[0045] Step 2, take the RGB and HHA images as input data and feed them into the attention-guided multi-modal cross-fusion segmentation network model (as shown in Figure 1). The model follows an encoder-decoder structure: the encoder extracts semantic features from the input, and the decoder restores the input resolution by upsampling, assigning a semantic class to each input pixel.
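As a concrete illustration of this encoder-decoder contract, here is a minimal PyTorch sketch; `model` stands for any network mapping an (RGB, HHA) pair to coarse class logits, so the function is a generic wrapper rather than ACFNet itself.

```python
import torch
import torch.nn.functional as F

def segment(model, rgb, hha):
    """Generic encoder-decoder inference: the encoder produces coarse
    feature maps, the decoder's logits are upsampled back to the input
    resolution, and each pixel is assigned its highest-scoring class.
    `model` is a hypothetical callable: (rgb, hha) -> (N, C, h, w) logits."""
    logits = model(rgb, hha)                       # coarse logits
    logits = F.interpolate(logits, size=rgb.shape[-2:],
                           mode="bilinear", align_corners=False)
    return logits.argmax(dim=1)                    # (N, H, W) class map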

[0046] The encoder uses asymmetric dual-stream branches for the RGB and HHA images, comprising an RGB encoder and a depth encoder, which use the ResNet-101 and ResNet-50 networks, respectively, as backbones ...
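A sketch of such an asymmetric dual-stream encoder built from the torchvision backbones named above; the stage-wise element-wise addition is a placeholder standing in for the ACFM fusion module, whose internal design is not spelled out at this point in the excerpt.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101, resnet50

class AsymmetricDualEncoder(nn.Module):
    """Asymmetric two-stream encoder: ResNet-101 for RGB, ResNet-50 for
    HHA, per the patent. The fusion here is plain addition, a stand-in
    for ACFM, which this excerpt does not detail."""

    def __init__(self):
        super().__init__()
        r, d = resnet101(weights=None), resnet50(weights=None)
        # Split each backbone into its stem and four residual stages.
        self.rgb_stages = nn.ModuleList([
            nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool),
            r.layer1, r.layer2, r.layer3, r.layer4])
        self.hha_stages = nn.ModuleList([
            nn.Sequential(d.conv1, d.bn1, d.relu, d.maxpool),
            d.layer1, d.layer2, d.layer3, d.layer4])

    def forward(self, rgb, hha):
        feats = []
        for rs, hs in zip(self.rgb_stages, self.hha_stages):
            rgb, hha = rs(rgb), hs(hha)
            feats.append(rgb + hha)   # placeholder fusion (ACFM in ACFNet)
        return feats                  # multi-scale fused features
```

Because both backbones are bottleneck ResNets, the two streams have matching channel widths (64, 256, 512, 1024, 2048) at every stage, which is what makes stage-wise fusion possible without extra projection layers.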

Embodiment 2

[0069] The segmentation network model proposed by the present invention is applied to the public RGB-D indoor dataset NYUD V2, and experiments demonstrate its effectiveness.

[0070] The NYUD V2 dataset contains 1449 labeled RGB-D image pairs from 464 different indoor scenes in 3 cities, of which 795 image pairs are used for training and 654 image pairs for testing; objects are classified into 40 categories. The effectiveness of the proposed segmentation network ACFNet is demonstrated through segmentation visualization comparisons. The network obtained by removing the feature fusion module ACFM and the global-local feature extraction module (GL) is recorded as the baseline network (Baseline). Figure 5 shows partial visualization results of ACFNet on the public dataset NYUD V2, where the first, second, and third columns show the RGB image, the HHA image, and the semantic label in turn, and the fourth and fifth columns show the ...
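Figure 5 itself cannot be reproduced here, but its described column layout is easy to assemble. The matplotlib sketch below assumes the images and label/prediction maps are already loaded as arrays, and the Baseline/ACFNet column titles follow the comparison setup described in the text.

```python
import matplotlib.pyplot as plt

def show_comparison(rows, titles=("RGB", "HHA", "Label",
                                  "Baseline", "ACFNet")):
    """Five-column comparison panel in the layout described for Figure 5.
    `rows` is a list of 5-tuples of HxWx3 images (or HxW label maps);
    the fourth/fifth column titles are an assumption based on the text."""
    fig, axes = plt.subplots(len(rows), 5, figsize=(15, 3 * len(rows)))
    axes = axes.reshape(len(rows), 5)
    for r, images in enumerate(rows):
        for c, img in enumerate(images):
            ax = axes[r, c]
            # Label/prediction maps get a categorical colormap.
            ax.imshow(img, cmap=None if img.ndim == 3 else "tab20")
            ax.set_axis_off()
            if r == 0:
                ax.set_title(titles[c])
    plt.tight_layout()
    plt.show()
```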



Abstract

The invention belongs to the field of computer vision, and particularly relates to an RGB-D image semantic segmentation method based on multi-modal feature fusion. Owing to the intrinsic differences between RGB and depth features, how to fuse the two kinds of features more effectively remains an open problem. To solve it, an attention-guided multi-modal cross-fusion segmentation network (ACFNet) is proposed. It adopts an encoder-decoder structure and encodes the depth map as an HHA image; an asymmetric dual-stream feature extraction network is designed in which the RGB encoder and the depth encoder use ResNet-101 and ResNet-50, respectively, as backbone networks, and a global-local feature extraction module (GL) is added to the RGB encoder. To fuse the RGB and depth features effectively, an attention-guided multi-modal cross-fusion module (ACFM) is proposed so that the fused, enhanced feature representations can be better exploited at multiple stages.
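The abstract names ACFM but this excerpt does not disclose its internal structure. Purely as one plausible reading, here is a minimal cross-modal fusion block in which each modality is gated by channel attention computed from the other; the SE-style gating is an assumption, not the patent's stated design.

```python
import torch
import torch.nn as nn

class AttentionGuidedFusion(nn.Module):
    """Hypothetical stand-in for the patent's ACFM: each modality is
    re-weighted by channel attention derived from the other modality
    before the two are summed. SE-style gating is an assumption; the
    actual ACFM design is not detailed in this excerpt."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid())
        self.rgb_gate, self.depth_gate = gate(), gate()

    def forward(self, rgb_feat, depth_feat):
        # Cross-guidance: depth features gate the RGB stream, and vice versa.
        rgb_out = rgb_feat * self.depth_gate(depth_feat)
        depth_out = depth_feat * self.rgb_gate(rgb_feat)
        return rgb_out + depth_out  # fused, enhanced representation
```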

Description

Technical field
[0001] The invention belongs to the field of computer vision, and in particular relates to an RGB-D image semantic segmentation method based on multi-modal feature fusion.
Background technique
[0002] Image semantic segmentation is a fundamental task in computer vision. Its purpose is to assign a class label to each pixel of an image, achieving pixel-level scene understanding, and it is widely used in medical imaging, autonomous driving, face recognition, object detection, and so on. In general, because indoor scenes are complex, with cluttered and small objects and severe occlusion, they place higher demands on segmentation algorithms and are more challenging. According to whether deep neural networks are applied, image semantic segmentation methods can be divided into traditional methods and deep-learning-based methods. However, traditional image segmentation methods are not suitable for segmentation tasks that require a large amount ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06T7/00; G06T7/11; G06T5/00; G06T3/40; G06N3/08; G06N3/04; G06K9/62; G06V10/44; G06V10/764; G06V10/82
CPC: G06T7/0002; G06T5/00; G06T7/11; G06T3/4007; G06N3/08; G06T2207/20221; G06T2207/10024; G06T2207/20081; G06N3/048; G06N3/045; G06F18/2415; G06F18/25
Inventors: 杨晓文, 靳瑜昕, 韩慧妍, 张元, 庞敏, 韩燮
Owner: ZHONGBEI UNIV