Video semantic segmentation network training method, system, device, and storage medium

A training method for semantic segmentation networks, applied in the field of video analysis. It addresses three problems: the characteristics of video data are not fully mined, model generalization performance suffers, and unlabeled video data cannot be used effectively. The method improves generalization performance, alleviates inter-frame over-fitting, and achieves high segmentation accuracy.

Pending Publication Date: 2022-05-13

UNIV OF SCI & TECH OF CHINA



AI Technical Summary

Problems solved by technology

[0005] However, the above-mentioned mainstream semi-supervised learning methods are all designed based on image data, and do not fully exploit the characteristics of video data, so they cannot effectively utilize the existing large amount of unlabeled video data.

In addition, as shown in Figure 1, previous experiments found that video semantic segmentation methods over-fit between frames: there is an obvious gap in segmentation accuracy between the labeled frame images (Labeled Frames) and the unlabeled frame images (Unlabeled Frames) of the training video data (Training Video), which means that the generalization performance of the model suffers.

Method used


Examples


Embodiment 1

[0044] An embodiment of the present invention provides a training method for a video semantic segmentation network. Figure 2 shows the main flow of the method, and Figure 3 shows its general framework. The method mainly includes:

[0045] 1. Obtain training video data including several video clips.

[0046] There are two types of video clips, one is a video clip containing labeled frame images and unlabeled frame images, and the other is a video clip containing only unlabeled frame images, called unlabeled video clips.

[0047] In a video clip that contains both labeled and unlabeled frame images, the labeled frame can be chosen according to conventional practice. Taking the typical public dataset Cityscapes as an example, the 20th frame of every 30 frames is labeled. In Figure 3, the subscripts t2 and t1 of the unlabeled frame image and the labeled frame image represent different moments, which can be adjacent mom...
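The two clip types and the choice of a frame pair can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Clip` class, the `sample_frame_pair` function, the uniform random sampling, and the convention that t1 is the frame carrying supervision while t2 is an unlabeled frame are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Clip:
    """Hypothetical clip record; labeled_index is None for an unlabeled clip."""
    num_frames: int
    labeled_index: Optional[int] = None


def sample_frame_pair(clip: Clip, rng: random.Random) -> Tuple[int, int]:
    """Return (t1, t2): t1 carries the supervision, t2 is an unlabeled frame.

    Assumes the clip has at least two frames.
    """
    if clip.labeled_index is not None:
        t1 = clip.labeled_index
    else:
        # Unlabeled clip: the sampled frame will later receive a pseudo-label.
        t1 = rng.randrange(clip.num_frames)
    # Any other frame of the same clip can play the unlabeled role.
    candidates = [t for t in range(clip.num_frames) if t != t1]
    t2 = rng.choice(candidates)
    return t1, t2


# Cityscapes-style clip: 30 frames, the 20th frame (index 19) is labeled.
cityscapes_like = Clip(num_frames=30, labeled_index=19)
t1, t2 = sample_frame_pair(cityscapes_like, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when comparing training runs.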

Embodiment 2

[0106] The present invention also provides a training system for a video semantic segmentation network, which is mainly implemented based on the method provided in Embodiment 1. As shown in Figure 4, the system mainly includes:

[0107] The data acquisition unit is used to obtain training video data comprising several video clips. Each video clip contains either labeled and unlabeled frame images, or only unlabeled frame images. When a video clip contains only unlabeled frame images, a single image is sampled from the clip, a pseudo-label is obtained for it through feature extraction and classification, and that image is then used as the labeled frame image;

[0108] The category prototype generation unit is used to input the unlabeled frame image to the video semantic segmentation network to be trained for feature extraction and classification, the classification result is used as a pseudo-label, and the category prototype of...
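A rough sketch of this step is given below. The excerpt is truncated before it says how a category prototype is computed, so the code assumes one common choice: the prototype of a class is the mean feature vector of all pixels whose pseudo-label is that class. The function name `category_prototypes` and the array shapes are illustrative, and random or hand-written arrays stand in for the network's outputs.

```python
import numpy as np


def category_prototypes(features: np.ndarray, logits: np.ndarray):
    """Compute per-class prototypes from one unlabeled frame.

    features: (D, H, W) feature map from the segmentation network.
    logits:   (C, H, W) class scores from the network's classifier.
    Returns (prototypes, present): a (C, D) array and a (C,) boolean
    mask marking the classes that actually occur in the pseudo-label.
    """
    num_classes = logits.shape[0]
    dim = features.shape[0]
    # The classification result (per-pixel argmax) serves as the pseudo-label.
    pseudo_label = logits.argmax(axis=0).reshape(-1)   # (H*W,)
    flat_features = features.reshape(dim, -1).T        # (H*W, D)

    prototypes = np.zeros((num_classes, dim), dtype=features.dtype)
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        mask = pseudo_label == c
        if mask.any():
            # Assumed definition: mean feature of the pixels assigned to class c.
            prototypes[c] = flat_features[mask].mean(axis=0)
            present[c] = True
    return prototypes, present
```

Returning the `present` mask lets later steps skip classes that do not appear in the frame instead of treating their all-zero rows as real prototypes.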

Embodiment 3

[0116] The present invention also provides a processing device. As shown in Figure 5, it mainly includes: one or more processors; and memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the methods provided in the foregoing embodiments.

[0117] Further, the processing device further includes at least one input device and at least one output device; in the processing device, the processor, memory, input device, and output device are connected through a bus.

[0118] In the embodiment of the present invention, the specific types of the memory, input device and output device are not limited; for example:

[0119] The input device can be a touch screen, an image acquisition device, a physical button or a mouse, etc.;

[0120] The output device can be a display terminal;

[0121] The memory may be random access memory (Random Access Memory, RAM), or non-volatile memory...


Abstract

The invention discloses a training method, system, device, and storage medium for a video semantic segmentation network. It designs an inter-frame feature reconstruction scheme that exploits the inherent correlation of video data: labeled-frame features are reconstructed from category prototypes extracted from unlabeled-frame features, improving video semantic segmentation. The annotation is then used to supervise the reconstructed features, so that the single-frame annotation of the video data provides an accurate supervision signal for the unlabeled frames. Because different frames of the training video data are supervised by the same signal, their feature distributions are drawn closer together and robustness improves. This effectively alleviates inter-frame over-fitting and thereby improves the generalization performance of the model. Tests on a test set show that a video semantic segmentation network trained with this method achieves higher segmentation accuracy.
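The reconstruction idea in the abstract can be illustrated with a small sketch. The abstract does not specify how labeled-frame features are rebuilt from the prototypes, so everything below is an assumption: each labeled-frame pixel feature is replaced by a softmax-weighted combination of the prototypes (weights from cosine similarity, with a hypothetical `temperature`), the result is classified by a hypothetical linear classifier `classifier_w`, and cross-entropy against the annotation supervises it.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reconstruct(labeled_feat: np.ndarray, prototypes: np.ndarray,
                temperature: float = 0.1) -> np.ndarray:
    """Rebuild labeled-frame features from unlabeled-frame prototypes.

    labeled_feat: (N, D) pixel features of the labeled frame.
    prototypes:   (C, D) category prototypes from the unlabeled frame.
    Returns (N, D); each row is a convex combination of the prototypes.
    """
    f = labeled_feat / (np.linalg.norm(labeled_feat, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    weights = softmax(f @ p.T / temperature, axis=1)   # (N, C)
    return weights @ prototypes


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy; logits (N, C), integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def reconstruction_loss(labeled_feat: np.ndarray, labels: np.ndarray,
                        prototypes: np.ndarray, classifier_w: np.ndarray) -> float:
    """Supervise the reconstructed features with the labeled frame's annotation."""
    recon = reconstruct(labeled_feat, prototypes)
    return cross_entropy(recon @ classifier_w, labels)
```

Because the reconstructed features are built only from unlabeled-frame prototypes, the gradient of this loss would reach the unlabeled frame's features, which is how a single annotation could supervise both frames. In a real training loop the same computation would be written in an automatic-differentiation framework rather than NumPy.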

Description

Technical field

[0001] The present invention relates to the technical field of video analysis, in particular to a training method, system, device and storage medium for a video semantic segmentation network.

Background technique

[0002] With the development of video surveillance, transmission and storage technologies, a large amount of video data can be obtained conveniently and economically in practical application scenarios. How to finely identify scenes and target objects in video data has become the core requirement of many applications, and video semantic segmentation technology has therefore received more and more attention. The purpose of video semantic segmentation technology is to classify each pixel in the video clip, so as to realize the pixel-level analysis of the video scene. Different from image semantic segmentation, video semantic segmentation can mine the temporal correlation prior of video data, use the temporal correlation between adjacent frames to guid...

Claims


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06V20/40, G06V10/764, G06V10/778, G06V10/774, G06V20/70, G06K9/62

CPC: G06F18/217, G06F18/24, G06F18/214

Inventor: 王子磊, 庄嘉帆

Owner: UNIV OF SCI & TECH OF CHINA
