
Video analysis method based on cross-modal hash learning

A video analysis method using cross-modal technology, applied in the field of video semantic analysis, achieving improved accuracy and efficient video localization

Active Publication Date: 2021-07-13
SHANDONG ARTIFICIAL INTELLIGENCE INST +3

AI Technical Summary

Problems solved by technology

Compared with single-modal retrieval, research on cross-modal video localization is both more meaningful and more challenging.



Examples


Embodiment 1

[0037] In step a), the k-th video data V_k is divided into units, taking 16 frames as the minimum unit.
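As a minimal sketch of the unit division in step a) (the function name, array shapes, and handling of a trailing remainder are illustrative assumptions, not taken from the patent):

import numpy as np

def divide_into_units(video_frames: np.ndarray, unit_len: int = 16) -> list:
    """Split a video (num_frames, H, W, C) into non-overlapping units of
    unit_len frames, the minimum unit used in step a). A trailing remainder
    shorter than unit_len is dropped here for simplicity."""
    num_units = len(video_frames) // unit_len
    return [video_frames[i * unit_len:(i + 1) * unit_len] for i in range(num_units)]

# Example: a 100-frame clip yields 6 units of 16 frames each.
video = np.zeros((100, 224, 224, 3), dtype=np.float32)
units = divide_into_units(video)
assert len(units) == 6 and units[0].shape[0] == 16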

Embodiment 2

[0039] To ensure that each bidirectional temporal convolution yields a video unit set of length R, padding information must be added for each bidirectional temporal convolution operation. The padding number p_i of the i-th layer is calculated by the formula p_i = (ε - 1)p.
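A minimal sketch of length-preserving padding for a temporal convolution (using PyTorch; the interpretation of ε as the temporal kernel size, the channel width, and the layer configuration are assumptions for illustration, not confirmed by the patent):

import torch
import torch.nn as nn

# Assumption: ε is the temporal kernel size. With stride 1, a total padding of
# ε - 1 (split across both ends) keeps the output sequence at its input
# length R, which is what each bidirectional temporal convolution requires.
kernel_size = 3                          # ε (illustrative value)
pad_total = kernel_size - 1              # ε - 1
conv = nn.Conv1d(in_channels=512, out_channels=512,
                 kernel_size=kernel_size,
                 padding=pad_total // 2)  # symmetric padding for odd ε

R = 32                                   # number of video units in the set
x = torch.randn(1, 512, R)               # (batch, feature_dim, R)
y = conv(x)
assert y.shape[-1] == R                  # length R is preserved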

Embodiment 3

[0041] In step e), the loss function Γ_1 of the fully connected neural network is calculated, where ‖·‖_F is the Frobenius norm, T is the transpose, and Y is the uniform dimension set for the multimodal features.
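The patent page does not reproduce the formula itself, so the following is only a generic sketch of a Frobenius-norm alignment loss of the kind described; the form Γ₁ = ‖F − Y‖_F² and all names and shapes are hypothetical, not the patent's actual definition:

import torch

def frobenius_alignment_loss(features: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss of the general form Γ₁ = ‖F − Y‖_F², where F is the
    fully connected network's output projected to the uniform dimension and
    Y is the target in that dimension."""
    diff = features - target
    return torch.norm(diff, p="fro") ** 2

F = torch.randn(8, 64, requires_grad=True)  # projected multimodal features
Y = torch.randn(8, 64)                      # uniform-dimension target
loss = frobenius_alignment_loss(F, Y)
loss.backward()                             # differentiable, usable in training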



Abstract

The invention discloses a video analysis method based on cross-modal hash learning, which maps and fuses multi-modal features into a common Hamming space and uses Hamming distance to efficiently retrieve video clip-query statement pairs with similar semantics. On one hand, a bidirectional temporal convolutional network model is introduced to deeply capture the context information of each video unit and the long-term semantic dependencies within a video; on the other hand, a text semantic understanding model based on a multi-head attention mechanism is introduced to effectively represent a given query statement, thereby improving the precision of video localization. The feature coding models are mutually independent, i.e., the generation of the video clip candidate set and the representation of the query statement feature set can run separately and independently. Therefore, once a candidate set has been generated for a given video, efficient video localization based on Hamming distance measurement can be performed repeatedly on that video according to the diverse requirements of different users.
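As a hedged illustration of the retrieval step the abstract describes (the binary codes, dimensions, and function names here are invented for the sketch; only the use of Hamming distance over a common space and the reuse of a per-video candidate set come from the abstract):

import numpy as np

def hamming_distance(codes: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hamming distance between each row of `codes` ({0,1} hash codes for the
    video clip candidate set) and a single query code of the same length."""
    return np.count_nonzero(codes != query, axis=1)

# Candidate clip codes are generated once per video; query codes can then be
# matched against them repeatedly, reflecting the independent encoders noted
# in the abstract.
rng = np.random.default_rng(0)
clip_codes = rng.integers(0, 2, size=(1000, 64))   # 1000 clips, 64-bit codes
query_code = rng.integers(0, 2, size=(64,))
ranked = np.argsort(hamming_distance(clip_codes, query_code))
top5 = ranked[:5]                                  # best-matching clip indices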

Description

Technical Field

[0001] The invention relates to the technical field of video semantic analysis, and in particular to a video analysis method based on cross-modal hash learning.

Background Technique

[0002] With the rapid development and mutual integration of the Internet, cloud computing, and big data technologies, video data has grown rapidly and is widely distributed across various application scenarios to meet people's different needs. As a result, video retrieval technology has received widespread attention. Current research on video retrieval is mainly divided into: (1) single-modal retrieval, i.e., using a given video's features to retrieve video data with similar features from a video database; (2) cross-modal retrieval, i.e., using a given natural-language description to retrieve video data that is "semantically similar" to it from a video database. Obviously, this kind of visual information retrieval based on natural language is not only a deepening...


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06V20/49; G06V20/41; G06N3/045; G06F18/213; G06F18/22; G06F18/253
Inventors: 贾永坡, 申培, 胡宇鹏, 甘甜, 吴建龙, 高赞, 聂礼强
Owner: SHANDONG ARTIFICIAL INTELLIGENCE INST