Scene semantic segmentation method based on full convolution and long short-term memory units

A long short-term memory and semantic segmentation technology, applied in the fields of image semantic segmentation and deep learning. It addresses the problems of object over-segmentation and low accuracy in scene image segmentation, achieving the effect of improved segmentation accuracy.

Status: Inactive | Publication Date: 2017-12-15
UNIV OF ELECTRONIC SCI & TECH OF CHINA
Cites: 4 | Cited by: 49

Abstract

The invention discloses a scene semantic segmentation method based on full convolution and long short-term memory units, relating to the technical field of image processing. The method includes the following steps: S1, constructing a deep neural network based on full convolution, a pyramid pooling module, and a long short-term memory unit module; S2, comparing the predicted image with the annotated image, training with the Softmax loss as the objective function and stochastic gradient descent as the optimization method, and updating the weights of the deep neural network obtained in step S1; S3, repeating S2 until the loss can no longer decrease, at which point training is complete; and S4, inputting a new scene image into the trained deep neural network and bilinearly interpolating the output to the original image resolution to obtain the semantic segmentation result of the scene. The method addresses the problems that current scene image segmentation has low accuracy and that objects in the image are over-segmented or under-segmented.

Application Domain

Character and pattern recognition · Neural architectures

Technology Topic

Short duration · Image segmentation +10

Examples

  • Experimental program (1)

Example Embodiment

[0027] In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with specific embodiments and the accompanying drawings.
[0028] As shown in Figure 1, the specific steps of the scene semantic segmentation method based on full convolution and long short-term memory units in this embodiment are as follows:
[0029] S1: Build a deep neural network based on full convolution, multi-scale fusion, and long short-term memory units.
[0030] As shown in Figure 2, the basic structure of the front-end convolutional neural network module is modified from VGG-16, whose main components are 5 groups of convolutional layers, 3 fully connected layers, and 1 Softmax layer. In this embodiment, the front-end network uses the convolutional layers of the first 5 groups of VGG-16 and removes the pooling layers of the 4th and 5th groups as well as the last 3 fully connected layers. The first group (convolution module one) contains 2 convolutional layers with ReLU layers and 1 max pooling layer; the convolution kernel size is 3x3, the number of filters is 64, and the pooling stride is 2. The second group (convolution module two) contains 2 convolutional layers with ReLU layers and 1 max pooling layer; the kernel size is 3x3, the number of filters is 128, and the pooling stride is 2. The third group (convolution module three) contains 3 convolutional layers with ReLU layers and 1 max pooling layer; the kernel size is 3x3, the number of filters is 256, and the pooling stride is 2. The fourth group (convolution module four) contains 3 convolutional layers with ReLU layers; the kernel size is 3x3 and the number of filters is 512. The fifth group (convolution module five) contains 3 convolutional layers with ReLU layers; the kernel size is 3x3 and the number of filters is 512. In the above, the ReLU layer is a rectified linear unit layer. A sketch of this front end appears below.
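To make the structure concrete, here is a minimal PyTorch sketch of this truncated VGG-16 front end; PyTorch itself and the helper name make_frontend are assumptions of this illustration, not part of the patent.

```python
import torch.nn as nn

def make_frontend():
    """Truncated VGG-16 front end: five convolution groups, with the pooling
    layers of groups 4 and 5 and the three fully connected layers removed,
    so the output feature map is 1/8 of the input resolution."""
    cfg = [           # (number of 3x3 convs, filters, pool after the group?)
        (2, 64, True),
        (2, 128, True),
        (3, 256, True),
        (3, 512, False),
        (3, 512, False),
    ]
    layers, in_ch = [], 3
    for n_convs, out_ch, pool in cfg:
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        if pool:
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```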
[0031] As shown in Figure 3, CNN and Feature Map denote the front-end network and the feature map obtained from it. The pyramid pooling module consists of 4 pooling branches with different kernel sizes: the 4 branches pool with strides of 2, 4, 6, and 8 to obtain feature maps of different resolutions for targets of different scales, and a 3x3 convolution is added after each pooled feature map to improve the learning ability of the network. Because the different pooling strides yield feature maps of different resolutions, the method uses transposed convolutions to restore the resolution of the pooled feature maps: the feature maps pooled with strides 2, 4, 6, and 8 undergo transposed convolution operations with strides 2, 4, 6, and 8, respectively.
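A minimal sketch of this pyramid pooling module follows. The choice of average pooling and the channel counts are assumptions, since the patent does not fix them; each branch pools with stride s, applies the added 3x3 convolution, and restores the resolution with a transposed convolution of the same stride.

```python
import torch
import torch.nn as nn

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=512, branch_ch=128):
        super().__init__()
        # One branch per pooling stride: 2, 4, 6, 8.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AvgPool2d(kernel_size=s, stride=s),        # downsample by s
                nn.Conv2d(in_ch, branch_ch, 3, padding=1),    # added 3x3 conv
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(branch_ch, branch_ch,      # restore resolution
                                   kernel_size=s, stride=s),
            )
            for s in (2, 4, 6, 8)
        )

    def forward(self, x):
        # Input height and width must be divisible by 2, 4, 6, and 8
        # (e.g. 24 or 48) so the branch outputs line up for concatenation.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```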
[0032] As shown in Figure 3, the long short-term memory unit module scans the feature map in two directions: scanning the feature map obtained by the front-end network in the up-down direction yields vertical sequences, and scanning in the left-right direction yields horizontal sequences. In this method, each sequence so obtained is input into a long short-term memory unit whose number of hidden units equals the sequence length; the output sequence of the same length is finally restored to a two-dimensional feature map.
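The bidirectional scanning can be sketched as follows. This is a ReNet-style reading of the description; leaving the hidden size as a free parameter rather than tying it to the sequence length is a simplification of this illustration.

```python
import torch
import torch.nn as nn

class DirectionalLSTM(nn.Module):
    """Scans a 2-D feature map as sequences along one axis with a
    bidirectional LSTM and restores the outputs to a 2-D feature map."""
    def __init__(self, in_ch, hidden):
        super().__init__()
        self.lstm = nn.LSTM(in_ch, hidden, bidirectional=True, batch_first=True)

    def forward(self, x, vertical=True):
        b, c, h, w = x.shape
        if vertical:   # up-down scan: each column becomes a length-h sequence
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
            out, _ = self.lstm(seq)                      # (b*w, h, 2*hidden)
            out = out.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        else:          # left-right scan: each row becomes a length-w sequence
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
            out, _ = self.lstm(seq)                      # (b*h, w, 2*hidden)
            out = out.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        return out                                       # (b, 2*hidden, h, w)
```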
[0033] In the method of the present invention, the feature maps obtained by the three modules are finally concatenated, and an extra convolutional layer is added to obtain the classification features, which are input into the final Softmax layer.
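Concretely, the fusion step could look like the snippet below; the channel counts and the class count are placeholders of this illustration, not values from the patent.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps from the three modules, all at the same resolution.
frontend_feat = torch.randn(1, 512, 24, 24)
ppm_feat = torch.randn(1, 512, 24, 24)
lstm_feat = torch.randn(1, 256, 24, 24)

num_classes = 21
classifier = nn.Conv2d(512 + 512 + 256, num_classes, kernel_size=3, padding=1)

fused = torch.cat([frontend_feat, ppm_feat, lstm_feat], dim=1)  # concatenation
scores = classifier(fused)  # per-pixel class scores for the final Softmax layer
```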
[0034] S2: Input a scene image and perform one forward pass through the deep neural network to obtain the predicted image A; downscale the input annotated image to obtain an annotated image with the same resolution as the predicted image. With the Softmax loss as the objective function and stochastic gradient descent as the optimization method, update the deep neural network obtained in step S1. Annotated images and scene images belong to the construction of the training data set: as in image classification, the training set of the present invention consists of a series of pictures with corresponding category labels, which serve respectively as the original scene images and the annotated images. This is common knowledge in the art and is not discussed further in this invention.
[0035] In S2, the specific process of the weight update is as follows:
[0036] S21: Network initialization: the parameters of a VGG-16 network pre-trained on the ImageNet data set serve as the initial values of the front-end network; all convolutional layers in the pyramid pooling module are initialized from a standard Gaussian distribution; and the long short-term memory units are initialized from a standard uniform distribution.
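A sketch of this initialization, assuming the module names net.frontend, net.ppm, and net.lstm from the illustrations above (the pretrained VGG-16 comes from torchvision):

```python
import torch.nn as nn
from torchvision.models import vgg16

def init_network(net):
    # Copy ImageNet-pretrained VGG-16 conv weights into the front end,
    # convolution by convolution (the removed pooling layers shift layer
    # indices, so matching by conv order is the simple safe route).
    src = [m for m in vgg16(pretrained=True).features if isinstance(m, nn.Conv2d)]
    dst = [m for m in net.frontend if isinstance(m, nn.Conv2d)]
    for s, d in zip(src, dst):
        d.weight.data.copy_(s.weight.data)
        d.bias.data.copy_(s.bias.data)
    # Standard Gaussian for every convolution in the pyramid pooling module.
    for m in net.ppm.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.normal_(m.weight, mean=0.0, std=1.0)
    # Standard uniform for the long short-term memory unit parameters.
    for m in net.lstm.modules():
        if isinstance(m, nn.LSTM):
            for p in m.parameters():
                nn.init.uniform_(p, 0.0, 1.0)
```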
[0037] S22: Training: compare the predicted image with the annotated image, take the sum of the Softmax losses over all pixels as the objective function, use stochastic gradient descent as the optimization method with an initial learning rate of 0.001, and update the weights of the deep neural network.
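One such update step could be sketched as follows, reusing the network from the sketches above; the momentum value is an assumption of this illustration.

```python
import torch
import torch.nn.functional as F

def train_step(net, image, label, optimizer):
    # image: (B, 3, H, W) scene images; label: (B, H, W) integer class map.
    scores = net(image)                              # (B, classes, h, w)
    # Downscale the annotation to the prediction's resolution (image pyramid).
    small = F.interpolate(label[:, None].float(), size=scores.shape[2:],
                          mode="nearest").squeeze(1).long()
    # Sum of the per-pixel Softmax losses as the objective function.
    loss = F.cross_entropy(scores, small, reduction="sum")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```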
[0038] S3: Loop back to step S22, reducing the learning rate as appropriate according to how the loss falls, until the loss can no longer be reduced and training is complete.
[0039] S4: Input a new scene image into the trained deep neural network, and perform bilinear interpolation to the resolution of the original image to obtain the semantic segmentation result of the scene. During the training stage, this method downscales the image annotation in the manner of an image pyramid so that the annotated image has the same resolution as the feature map finally produced by the deep neural network; the Softmax loss of each pixel is computed against this annotation, and the sum of the losses of all pixels is optimized as the objective function.
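Inference can then be sketched as:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def segment(net, image):
    # image: (B, 3, H, W) new scene image; returns a (B, H, W) label map.
    scores = net(image)                              # low-resolution class scores
    scores = F.interpolate(scores, size=image.shape[2:],
                           mode="bilinear", align_corners=False)
    return scores.argmax(dim=1)                      # per-pixel class decision
```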
[0040] The technical solution of the present invention is not limited to the above specific embodiments; all technical modifications made according to the technical solution of the present invention fall within the protection scope of the present invention.

