A video description method and system based on an information loss function

A technique combining an information loss function with video description, applied to neural learning methods, character and pattern recognition, and instruments. It addresses problems such as model learning difficulties, recognition errors, and errors in recognizing discriminative words.

Inactive Publication Date: 2019-04-26
INST OF COMPUTING TECH CHINESE ACAD OF SCI
View PDF · 7 Cites · 24 Cited by

AI Technical Summary

Problems solved by technology

[0005] While conducting research on visual description, the inventors found that the descriptions generated by existing video description methods suffer from missing details and recognition errors. These problems arise because the existing loss function is affected by the uneven distribution of words in the dataset, and because the visual features used by existing methods are not rich enough. The problem of uneven word distribution can be sim...

Method used




Embodiment Construction

[0074] The purpose of the present invention is to overcome the problems of semantic-word recognition errors and missing details in the descriptions generated by the above-mentioned existing video description methods, by proposing a video description method based on an information loss function. The method includes: 1) a learning strategy, termed the information loss function, that overcomes the description ambiguity caused by a biased word distribution in the training data; and 2) an optimized model framework, combining a hierarchical visual representation with a hierarchical attention mechanism, that fully exploits the potential of the information loss function.
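The excerpt does not give the information loss function in closed form. As a rough illustration of the stated idea of re-weighting words against a biased distribution, a common approach is to scale each word's negative log-likelihood by an informativeness weight derived from its corpus frequency. The sketch below is a minimal, hypothetical version of that idea (plain Python; the weighting scheme of -log relative frequency is an assumption, not the patent's exact formula):

```python
import math

def information_weights(word_counts, total):
    # Assumption: a word's informativeness is the negative log of its
    # relative frequency, so rare (discriminative) words get larger weights.
    return {w: -math.log(c / total) for w, c in word_counts.items()}

def weighted_nll(probs, target_words, weights):
    # Weighted negative log-likelihood over one sentence: each word's
    # -log p(word) is scaled by its informativeness weight, then averaged.
    return sum(weights[w] * -math.log(p)
               for p, w in zip(probs, target_words)) / len(target_words)
```

Under this scheme a frequent function word like "the" contributes little to the loss, while a rare content word dominates, which matches the patent's goal of not letting common words drown out discriminative ones during training.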

[0075] Specifically, as shown in Figure 5, the present invention discloses a video description method based on an information loss function, which includes:

[0076] Step 1: obtain the training video and input it to the target detection network, the convolutional neural network, and the action recognition network res...
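Step 1 feeds the same video through three feature extractors. A minimal sketch of how per-frame semantic information might be assembled is shown below; all three extractor functions are hypothetical stand-ins returning fixed-size dummy vectors, since the excerpt names the networks but not their implementations:

```python
# Hypothetical stand-ins for the three networks named in Step 1:
# a target detection network (per-frame object features), a CNN
# (overall frame feature), and an action recognition network
# (clip-level motion feature). A real system would use trained models.
def detect_targets(frame):
    return [[0.0] * 4 for _ in range(3)]   # e.g. 3 detected regions, 4-d each

def overall_feature(frame):
    return [0.0] * 8                       # e.g. pooled CNN frame feature

def motion_feature(frames):
    return [0.0] * 8                       # e.g. clip-level motion feature

def semantic_information(frames):
    # Per frame: target features + overall feature; the motion feature
    # is computed once for the clip and shared across frames.
    clip_motion = motion_feature(frames)
    return [{"targets": detect_targets(f),
             "overall": overall_feature(f),
             "motion": clip_motion} for f in frames]
```

The resulting list of per-frame dictionaries corresponds to the "semantic information" that the abstract says is passed to the description model.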



Abstract

The invention relates to a video description method and system based on an information loss function. The method comprises the steps of: obtaining a training video and the semantic information of each frame of the training video; inputting the semantic information of the training video into a hierarchical attention mechanism model combined with an LSTM to obtain a textual description of the training video; weighting the loss of each word in the description according to its importance for expressing the video content, thereby obtaining an information loss function, and using this function as the objective for back-propagation gradient optimization of the hierarchical attention mechanism model, yielding a video description model; obtaining a video to be described and inputting it into the target detection network, the convolutional neural network, and the action recognition network to obtain a set of target features, overall features, and motion features for each frame of the video as its semantic information; and inputting this semantic information into the video description model to obtain a textual description of the video.

Description

Technical Field

[0001] The invention relates to the technical field of computer vision and natural language processing, in particular to a video description method and system based on an information loss function, which can be applied to video description, human-computer interaction, and video retrieval tasks.

Background Technique

[0002] Video description model architectures. Current video description models are mainly divided into bottom-up and top-down structures. The bottom-up model first recognizes a limited set of semantic words from visual information, then connects these words into a sentence through language templates. Studies found that bottom-up generated sentences lack flexibility. Inspired by machine translation tasks, researchers proposed a top-down model based on long short-term memory networks (hereinafter, LSTM). The top-down model recognizes semantic words while generating the sentence, so this method can generate more diverse...
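The hierarchical attention mechanism of the top-down model is not detailed in this excerpt. As a generic illustration only: a top-down decoder typically computes a soft attention over the visual features at each word-generation step, producing a context vector that conditions the LSTM. A minimal dot-product attention sketch (plain Python; the scoring function and names are assumptions, not the patent's design):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, features):
    # Dot-product attention: score each feature vector against the query
    # (e.g. the decoder's hidden state), then return the weighted average.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    alphas = softmax(scores)
    dim = len(features[0])
    return [sum(a * feat[i] for a, feat in zip(alphas, features))
            for i in range(dim)]
```

In a hierarchical variant, one such attention layer would operate over target features within a frame and another over frames within the clip, but that composition is this sketch's assumption about the general technique, not a claim about the patent's exact structure.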

Claims


Application Information

IPC(8): G06K9/00, G06N3/04, G06N3/08
CPC: G06N3/08, G06V20/41, G06V20/46, G06N3/045
Inventors: 高科, 董嘉蓉, 陈潇凯, 郭俊波
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI