A video description method and system based on an information loss function

A technique combining an information loss function with video description, applied to neural learning methods, character and pattern recognition, and instruments. It addresses problems such as model learning difficulties, recognition errors, and errors in recognizing discriminative words.

Inactive Publication Date: 2019-04-26
INST OF COMPUTING TECH CHINESE ACAD OF SCI
View PDF · 7 Cites · 24 Cited by

AI Technical Summary

Problems solved by technology

[0005] While conducting research on visual description, the inventors found that the descriptions generated by existing video description methods suffer from missing details and recognition errors. These problems arise because the existing loss function is affected by the uneven distribution of words in the dataset, and because the visual features used by existing methods are not rich enough. The problem of uneven word distribution can be sim...

Method used




Embodiment Construction

[0074] The purpose of the present invention is to overcome the problems of semantic-word recognition errors and missing details in the descriptions generated by the above-mentioned existing video description methods, by proposing a video description method based on an information loss function. The method includes: 1) a learning strategy, termed the information loss function, that overcomes the description ambiguity caused by a biased word distribution in the training data; and 2) an optimized model framework, combining a hierarchical visual representation with a hierarchical attention mechanism, that fully exploits the potential of the information loss function.
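The excerpt does not give the information loss function in closed form. As a rough illustration of the stated idea of re-weighting words against a biased distribution, a common approach is to scale each word's negative log-likelihood by an informativeness weight derived from its corpus frequency. The sketch below is a minimal, hypothetical version of that idea (plain Python; the weighting scheme of -log relative frequency is an assumption, not the patent's exact formula):

```python
import math

def information_weights(word_counts, total):
    # Assumption: a word's informativeness is the negative log of its
    # relative frequency, so rare (discriminative) words get larger weights.
    return {w: -math.log(c / total) for w, c in word_counts.items()}

def weighted_nll(probs, target_words, weights):
    # Weighted negative log-likelihood over one sentence: each word's
    # -log p(word) is scaled by its informativeness weight, then averaged.
    return sum(weights[w] * -math.log(p)
               for p, w in zip(probs, target_words)) / len(target_words)
```

Under this scheme a frequent function word like "the" contributes little to the loss, while a rare content word dominates, which matches the patent's goal of not letting common words drown out discriminative ones during training.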

[0075] Specifically, as shown in Figure 5, the present invention discloses a video description method based on an information loss function, which includes:

[0076] Step 1: obtain the training video and input it to the target detection network, the convolutional neural network, and the action recognition network res...
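Step 1 feeds the same video through three feature extractors. A minimal sketch of how per-frame semantic information might be assembled is shown below; all three extractor functions are hypothetical stand-ins returning fixed-size dummy vectors, since the excerpt names the networks but not their implementations:

```python
# Hypothetical stand-ins for the three networks named in Step 1:
# a target detection network (per-frame object features), a CNN
# (overall frame feature), and an action recognition network
# (clip-level motion feature). A real system would use trained models.
def detect_targets(frame):
    return [[0.0] * 4 for _ in range(3)]   # e.g. 3 detected regions, 4-d each

def overall_feature(frame):
    return [0.0] * 8                       # e.g. pooled CNN frame feature

def motion_feature(frames):
    return [0.0] * 8                       # e.g. clip-level motion feature

def semantic_information(frames):
    # Per frame: target features + overall feature; the motion feature
    # is computed once for the clip and shared across frames.
    clip_motion = motion_feature(frames)
    return [{"targets": detect_targets(f),
             "overall": overall_feature(f),
             "motion": clip_motion} for f in frames]
```

The resulting list of per-frame dictionaries corresponds to the "semantic information" that the abstract says is passed to the description model.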



Abstract

The invention relates to a video description method and system based on an information loss function. The method comprises the steps of: obtaining a training video and the semantic information of each frame of the training video; inputting the semantic information of the training video into a hierarchical attention mechanism model combined with an LSTM to obtain a textual description of the training video; weighting the loss of each word in the description according to its importance for expressing the video content, thereby obtaining an information loss function, and using this function as the objective for back-propagation gradient optimization of the hierarchical attention mechanism model, yielding a video description model; obtaining a video to be described and inputting it into the target detection network, the convolutional neural network, and the action recognition network to obtain a set of target features, overall features, and motion features for each frame of the video as its semantic information; and inputting this semantic information into the video description model to obtain a textual description of the video.

Description

Technical Field

[0001] The invention relates to the technical field of computer vision and natural language processing, in particular to a video description method and system based on an information loss function, which can be applied to video description, human-computer interaction, and video retrieval tasks.

Background Technique

[0002] Video description model architectures. Current video description models are mainly divided into bottom-up and top-down structures. The bottom-up model first recognizes a limited set of semantic words from visual information, then connects these words into a sentence through language templates. Studies found that bottom-up generated sentences lack flexibility. Inspired by machine translation tasks, researchers proposed a top-down model based on long short-term memory networks (hereinafter, LSTM). The top-down model recognizes semantic words while generating the sentence, so this method can generate more diverse...
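The hierarchical attention mechanism of the top-down model is not detailed in this excerpt. As a generic illustration only: a top-down decoder typically computes a soft attention over the visual features at each word-generation step, producing a context vector that conditions the LSTM. A minimal dot-product attention sketch (plain Python; the scoring function and names are assumptions, not the patent's design):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, features):
    # Dot-product attention: score each feature vector against the query
    # (e.g. the decoder's hidden state), then return the weighted average.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    alphas = softmax(scores)
    dim = len(features[0])
    return [sum(a * feat[i] for a, feat in zip(alphas, features))
            for i in range(dim)]
```

In a hierarchical variant, one such attention layer would operate over target features within a frame and another over frames within the clip, but that composition is this sketch's assumption about the general technique, not a claim about the patent's exact structure.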

Claims


Application Information

IPC(8): G06K9/00, G06N3/04, G06N3/08
CPC: G06N3/08, G06V20/41, G06V20/46, G06N3/045
Inventors: 高科, 董嘉蓉, 陈潇凯, 郭俊波
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI