
Video analysis method based on cross-modal hash learning

A video analysis method using cross-modal technology, applied in the field of video semantic analysis, achieving improved accuracy and efficient video localization

Active Publication Date: 2021-07-13
SHANDONG ARTIFICIAL INTELLIGENCE INST +3

AI Technical Summary

Problems solved by technology

Compared with single-modal retrieval, research on cross-modal video localization is both more meaningful and more challenging.



Examples


Embodiment 1

[0037] In step a), the k-th video data V_k is divided into units, taking 16 frames as the minimum unit.
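As a minimal sketch of the unit division in step a) (the function name, array shapes, and handling of a trailing remainder are illustrative assumptions, not taken from the patent):

import numpy as np

def divide_into_units(video_frames: np.ndarray, unit_len: int = 16) -> list:
    """Split a video (num_frames, H, W, C) into non-overlapping units of
    unit_len frames, the minimum unit used in step a). A trailing remainder
    shorter than unit_len is dropped here for simplicity."""
    num_units = len(video_frames) // unit_len
    return [video_frames[i * unit_len:(i + 1) * unit_len] for i in range(num_units)]

# Example: a 100-frame clip yields 6 units of 16 frames each.
video = np.zeros((100, 224, 224, 3), dtype=np.float32)
units = divide_into_units(video)
assert len(units) == 6 and units[0].shape[0] == 16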

Embodiment 2

[0039] To ensure that each bidirectional temporal convolution yields a video unit set of length R, padding information must be added for each bidirectional temporal convolution operation. The padding number p_i of the i-th layer is calculated by the formula p_i = (ε - 1)p.
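A minimal sketch of length-preserving padding for a temporal convolution (using PyTorch; the interpretation of ε as the temporal kernel size, the channel width, and the layer configuration are assumptions for illustration, not confirmed by the patent):

import torch
import torch.nn as nn

# Assumption: ε is the temporal kernel size. With stride 1, a total padding of
# ε - 1 (split across both ends) keeps the output sequence at its input
# length R, which is what each bidirectional temporal convolution requires.
kernel_size = 3                          # ε (illustrative value)
pad_total = kernel_size - 1              # ε - 1
conv = nn.Conv1d(in_channels=512, out_channels=512,
                 kernel_size=kernel_size,
                 padding=pad_total // 2)  # symmetric padding for odd ε

R = 32                                   # number of video units in the set
x = torch.randn(1, 512, R)               # (batch, feature_dim, R)
y = conv(x)
assert y.shape[-1] == R                  # length R is preserved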

Embodiment 3

[0041] In step e), the loss function Γ_1 of the fully connected neural network is calculated, where ‖·‖_F is the Frobenius norm, T is the transpose, and Y is the uniform dimension set for the multimodal features.
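The patent page does not reproduce the formula itself, so the following is only a generic sketch of a Frobenius-norm alignment loss of the kind described; the form Γ₁ = ‖F − Y‖_F² and all names and shapes are hypothetical, not the patent's actual definition:

import torch

def frobenius_alignment_loss(features: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss of the general form Γ₁ = ‖F − Y‖_F², where F is the
    fully connected network's output projected to the uniform dimension and
    Y is the target in that dimension."""
    diff = features - target
    return torch.norm(diff, p="fro") ** 2

F = torch.randn(8, 64, requires_grad=True)  # projected multimodal features
Y = torch.randn(8, 64)                      # uniform-dimension target
loss = frobenius_alignment_loss(F, Y)
loss.backward()                             # differentiable, usable in training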



Abstract

The invention discloses a video analysis method based on cross-modal hash learning, which maps and fuses multi-modal features into a common Hamming space and uses Hamming distance to efficiently retrieve video clip-query statement pairs with similar semantics. On one hand, a bidirectional temporal convolutional network model is introduced to deeply capture the context information of each video unit and the long-term semantic dependencies within a video; on the other hand, a text semantic understanding model based on a multi-head attention mechanism is introduced to effectively represent a given query statement, thereby improving the precision of video localization. The feature coding models are mutually independent, i.e., the generation of the video clip candidate set and the representation of the query statement feature set can run separately and independently. Therefore, once a candidate set has been generated for a given video, efficient video localization based on Hamming distance measurement can be performed repeatedly on that video according to the diverse requirements of different users.
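As a hedged illustration of the retrieval step the abstract describes (the binary codes, dimensions, and function names here are invented for the sketch; only the use of Hamming distance over a common space and the reuse of a per-video candidate set come from the abstract):

import numpy as np

def hamming_distance(codes: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hamming distance between each row of `codes` ({0,1} hash codes for the
    video clip candidate set) and a single query code of the same length."""
    return np.count_nonzero(codes != query, axis=1)

# Candidate clip codes are generated once per video; query codes can then be
# matched against them repeatedly, reflecting the independent encoders noted
# in the abstract.
rng = np.random.default_rng(0)
clip_codes = rng.integers(0, 2, size=(1000, 64))   # 1000 clips, 64-bit codes
query_code = rng.integers(0, 2, size=(64,))
ranked = np.argsort(hamming_distance(clip_codes, query_code))
top5 = ranked[:5]                                  # best-matching clip indices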

Description

Technical Field

[0001] The invention relates to the technical field of video semantic analysis, and in particular to a video analysis method based on cross-modal hash learning.

Background Technique

[0002] With the rapid development and mutual integration of the Internet, cloud computing, and big data technologies, video data has grown rapidly and is widely distributed across various application scenarios to meet people's different needs. As a result, video retrieval technology has received widespread attention. Current research on video retrieval is mainly divided into: (1) single-modal retrieval, i.e., using a given video's features to retrieve video data with similar features from a video database; (2) cross-modal retrieval, i.e., using a given natural-language description to retrieve video data that is "semantically similar" to it from a video database. Obviously, this kind of visual information retrieval based on natural language is not only a deepening...


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06V20/49; G06V20/41; G06N3/045; G06F18/213; G06F18/22; G06F18/253
Inventors: 贾永坡, 申培, 胡宇鹏, 甘甜, 吴建龙, 高赞, 聂礼强
Owner: SHANDONG ARTIFICIAL INTELLIGENCE INST