Video description method based on high-order low-rank multi-modal attention mechanism

A video description method using multi-modal technology, applied in the field of computer vision. It addresses the problem that existing approaches ignore the correlation information among multi-modal features, which reduces video description accuracy. The method improves efficiency and accuracy and has good application value.
CN110826397A · Active · Publication Date: 2020-02-21 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
ZHEJIANG UNIV
Publication Date
2020-02-21

Abstract

The invention discloses a video description method based on a high-order low-rank multi-modal attention mechanism, which generates a short and accurate description for a given video clip. The method comprises the following steps: obtaining a video data set for training a video description generation model, and defining the algorithm target; modeling the time-sequence multi-modal features in the video data set; establishing a high-order low-rank multi-modal attention mechanism on the decoder based on the time-sequence multi-modal features; and generating a description of an input video using the model. The method is suitable for video description generation in real video scenes, and is effective and robust under various complex conditions.
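The text above does not give the formulas of the attention mechanism, so the following is only a generic sketch of what a "high-order low-rank" attention score can look like, not the patented method. It assumes two hypothetical modalities (`visual` and `motion` features per time step), a decoder state `h`, and a third-order interaction factored at rank `R`: each input is projected to `R` dimensions and the projections are multiplied elementwise. All names, dimensions, and the random weights are illustrative assumptions.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def low_rank_attention(h, visual, motion, U_h, U_v, U_m, w):
    """Attend over time steps with a rank-R third-order score (assumed form).

    score_t = sum_r w[r] * (U_h h)[r] * (U_v v_t)[r] * (U_m m_t)[r]
    """
    ph = matvec(U_h, h)
    scores = []
    for v_t, m_t in zip(visual, motion):
        pv = matvec(U_v, v_t)
        pm = matvec(U_m, m_t)
        scores.append(sum(wr * a * b * c for wr, a, b, c in zip(w, ph, pv, pm)))
    alpha = softmax(scores)
    # Context vectors: attention-weighted sums of each modality over time.
    ctx_v = [sum(a * v_t[i] for a, v_t in zip(alpha, visual))
             for i in range(len(visual[0]))]
    ctx_m = [sum(a * m_t[i] for a, m_t in zip(alpha, motion))
             for i in range(len(motion[0]))]
    return alpha, ctx_v, ctx_m

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, d_h, d_v, d_m, R = 5, 4, 6, 3, 2  # illustrative sizes only

h = [rng.uniform(-1.0, 1.0) for _ in range(d_h)]
visual = random_matrix(T, d_v, rng)
motion = random_matrix(T, d_m, rng)
U_h = random_matrix(R, d_h, rng)
U_v = random_matrix(R, d_v, rng)
U_m = random_matrix(R, d_m, rng)
w = [rng.uniform(-1.0, 1.0) for _ in range(R)]

alpha, ctx_v, ctx_m = low_rank_attention(h, visual, motion, U_h, U_v, U_m, w)

# Parameter counts: full third-order tensor vs. its rank-R factorization.
full_params = d_h * d_v * d_m
low_rank_params = R * (d_h + d_v + d_m) + R
```

With these sizes, a full third-order weight tensor would need `d_h * d_v * d_m` parameters, while the factored form needs `R * (d_h + d_v + d_m) + R`; this saving is the usual motivation for a low-rank formulation.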

Description

Technical field

[0001] The invention belongs to the field of computer vision and relates in particular to a video description method based on a high-order low-rank multi-modal attention mechanism.

Background art

[0002] Video has become an indispensable and ubiquitous part of modern society, and this has driven substantial research on the semantic content of video. At present, most video research concentrates on lower-level tasks such as classification and detection. Thanks to the development of recurrent neural networks, the newer task of video description generation has also come into view: given a video clip, a trained network model automatically generates a sentence describing it. The task has wide real-world application. For example, about 100 hours of video are generated on YouTube every minute. If the generated video resources ar...
