Cross-modal retrieval method for querying video from complex text based on semantic tree enhancement

The invention applies semantic trees and cross-modal techniques to cross-modal retrieval, addressing information loss, poor video retrieval performance, and ineffective handling of complex text queries.

Active Publication Date: 2020-11-06

ZHEJIANG GONGSHANG UNIVERSITY

Cites: 5; cited by: 12


AI Technical Summary

Problems solved by technology

However, concept-based methods have the following shortcomings. First, they are usually ineffective for complex text queries: it is difficult to fully describe the semantic content of a complex text query with a limited set of visual concepts, which leads to information loss, and the semantic content of a complex text query is not simply an aggregation of extracted concepts.

Second, how to effectively train concept classifiers and select relevant concepts is also a very challenging problem.

Although this type of method handles longer text queries better than concept-based methods, it has the following shortcomings. First, the user's text query is represented by word vectors, which cannot effectively capture the user's intent; as a result, video retrieval performance on complex text queries is poor.

Second, this type of method lacks interpretability of the intermediate retrieval process.

Method used


Embodiment Construction

[0067] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0068] In order to solve the problem of cross-modal retrieval from complex text query to video, the present invention proposes a cross-modal retrieval method from complex text query to video based on semantic tree enhancement. The specific steps are as follows:

[0069] (1) Use a feature extraction method to extract features of the complex text query statement and obtain its leaf node features.

[0070] (1-1) A complex text query statement Q of length N can be expressed as:

[0071] $Q = \{w_1, w_2, \ldots, w_N\}$

[0072] where $w_1$ represents the first word in the complex text query sentence. First, each word in the complex text query sentence is encoded with one-hot encoding, giving the one-hot encoding vector sequence $\{w'_1, w'_2, \ldots, w'_N\}$, where w′ ...
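Paragraph [0072] breaks off right after the one-hot step, so the page does not show how the one-hot vectors become leaf node features. The Python sketch below illustrates the one-hot encoding of Q and, as an assumption, a common next step: multiplying each one-hot vector by an embedding matrix. The function names (`build_vocab`, `one_hot_encode`, `embed`) and the weight matrix are hypothetical illustrations, not taken from the patent.

```python
from typing import Dict, List, Sequence


def build_vocab(sentences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence:
            # len(vocab) is evaluated before insertion, so new words get the next free index.
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot_encode(query: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Map Q = {w_1, ..., w_N} to one-hot vectors {w'_1, ..., w'_N}.

    Each w'_i has length |vocab| with a single 1 at the index of w_i.
    Unknown words raise KeyError here; a real system would usually reserve
    an <unk> index instead (assumption).
    """
    encoded: List[List[int]] = []
    for word in query:
        vec = [0] * len(vocab)
        vec[vocab[word]] = 1
        encoded.append(vec)
    return encoded


def embed(
    one_hot: Sequence[Sequence[int]], weights: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Hypothetical follow-up step: multiply each one-hot vector by a |vocab| x d matrix.

    Since each w'_i contains a single 1, the product selects one row of `weights`.
    """
    dim = len(weights[0])
    return [
        [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(dim)]
        for vec in one_hot
    ]
```

Multiplying a one-hot vector by a matrix is equivalent to a row lookup, which is why deep-learning frameworks usually implement this step as an embedding table rather than an explicit matrix product.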


Abstract

The invention discloses a cross-modal retrieval method for querying video from complex text based on semantic tree enhancement. For a complex text query statement, its words are converted into leaf node representations, the relationships between child nodes are mined, and the two child nodes with the highest dependency are merged; repeating this recursively constructs a semantic tree structure of the query statement and yields a semantic-tree-enhanced query representation. To encode candidate videos, preliminary video features are obtained through a CNN, and temporal dependencies and semantic correlations in the videos are captured through a GRU and a self-attention module, yielding a robust video feature representation. The complex text query representation and the video feature representation are mapped into a common space, where the matching relationship between them is learned automatically, thereby realizing cross-modal retrieval from complex text queries to video. The method can explain the information components of complex text query statements, better understand user intent, and substantially improve retrieval performance.
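The semantic-tree construction described in the abstract can be illustrated with a small greedy sketch. The page does not specify how dependency between child nodes is scored, which node pairs are candidates, or how two children are combined into a parent, so this Python example assumes that only adjacent nodes are compared, that cosine similarity serves as the dependency score, and that a parent is the element-wise mean of its children; in the actual method these may well be learned components. The video encoder (CNN, GRU, self-attention) and the learned projections into the common space are omitted, and `rank_videos` simply assumes query and video vectors already share one space. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

# A tree is either a leaf word or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Hypothetical parent-node function: element-wise mean of two child vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def build_semantic_tree(
    words: Sequence[str], vectors: Sequence[Sequence[float]]
) -> Tuple[Tree, List[float]]:
    """Merge the adjacent node pair with the highest dependency score until one root remains.

    Dependency is approximated by cosine similarity (assumption). Returns the
    tree as nested tuples plus the root vector, used as the query representation.
    """
    if not words or len(words) != len(vectors):
        raise ValueError("need one vector per word and at least one word")
    nodes: List[Tree] = list(words)
    feats: List[List[float]] = [list(v) for v in vectors]
    while len(nodes) > 1:
        scores = [cosine(feats[i], feats[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Replace the two winning children with their parent node.
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
        feats[best:best + 2] = [compose(feats[best], feats[best + 1])]
    return nodes[0], feats[0]


def rank_videos(
    query_vec: Sequence[float], video_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return video indices sorted by cosine similarity to the query, best first.

    Assumes query and video vectors already lie in the same common space.
    """
    return sorted(
        range(len(video_vecs)),
        key=lambda i: cosine(query_vec, video_vecs[i]),
        reverse=True,
    )
```

With toy 2-D vectors [1.0, 0.0], [0.9, 0.1] and [0.0, 1.0] for "man", "plays" and "guitar", the first merge joins "man" and "plays" because their cosine similarity is highest, giving the tree (("man", "plays"), "guitar"). The nested-tuple output is what makes the query's structure inspectable, which mirrors the interpretability benefit the abstract claims.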

Description

Technical Field
[0001] The invention relates to the field of cross-modal retrieval from text queries to video, and in particular to a cross-modal retrieval method from complex text queries to video based on semantic tree enhancement.
Background Art
[0002] With the exponential growth of user-generated videos on the Internet, uploading videos and searching for videos of interest have become indispensable activities in people's daily lives. Cross-modal retrieval from text queries to video is one of the techniques for finding videos of interest. Early text-to-video retrieval methods were mainly based on text keywords and have been extensively researched and developed. However, such methods only allow the user to enter a few keywords as a query. As users demand more from Internet video search, keyword-based queries struggle to fully express their search intentions, thereby affecting search expe...

Claims


Application Information


IPC(8): G06F16/33; G06F16/783; G06F40/30; G06N3/04

CPC: G06F16/3344; G06F16/783; G06F40/30; G06N3/044; G06N3/045

Inventors: 董建锋 (Dong Jianfeng), 彭敬伟 (Peng Jingwei), 杨勋 (Yang Xun), 郑琪 (Zheng Qi), 王勋 (Wang Xun)

Owner: ZHEJIANG GONGSHANG UNIVERSITY
