Video natural language text retrieval method based on space time sequence characteristics

A natural language text retrieval technology for video, applied in the physical field, addressing problems such as the difficulty of accurately modeling the spatiotemporal semantic features of videos, which limits the accuracy of retrieving videos by natural language text.

Pending Publication Date: 2021-11-26
XIDIAN UNIV

Problems solved by technology

[0007] The purpose of the present invention is to address the deficiencies of the above-mentioned prior art by proposing a video natural language text retrieval method based on spatial and temporal features. The method aims to solve two problems that degrade the accuracy of video natural language text retrieval: the difficulty of accurately modeling the complex spatiotemporal semantic features of video, and the fact that the semantic features of different modal data follow heterogeneous underlying manifold distributions with differing semantic gaps.


Embodiment Construction

[0043] The present invention is described in further detail below with reference to Figure 1 and an embodiment.

[0044] Step 1, generate a sample set.

[0045] Select at least 6,000 multi-category dynamic-behavior videos to be retrieved, together with their corresponding natural language text annotations, to form a sample set. Each video carries at least 20 human-labeled natural language text annotations, each at most 30 characters long, so the sample set contains at least 120,000 video-text pairs (6,000 videos × 20 annotations each).

[0046] Step 2, use three neural networks to extract three levels of spatial and temporal features from the video samples.

[0047] Input the videos in the sample set into a trained deep residual network, ResNet-152, and extract a feature from every frame image of each video. Average the image features of all frames within a video and output the resulting 2048-dimensional frame-level feature as the first-level feature of that video.
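As a minimal sketch of this step (the patent does not name a framework), the following PyTorch/torchvision code extracts per-frame ResNet-152 features and averages them into a 2048-dimensional first-level feature. The helper name `frame_level_feature` and the assumption that frames arrive already decoded and normalized are illustrative, not from the patent.

```python
import torch
import torchvision.models as models

# Pretrained ResNet-152 with the final classification layer removed, so the
# 2048-d output of the global average pool is exposed as the frame feature.
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()
resnet.eval()

@torch.no_grad()
def frame_level_feature(frames: torch.Tensor) -> torch.Tensor:
    """frames: (num_frames, 3, 224, 224), already normalized.
    Returns one 2048-d video feature: per-frame ResNet-152 features
    averaged over all frames, as paragraph [0047] describes."""
    per_frame = resnet(frames)      # (num_frames, 2048)
    return per_frame.mean(dim=0)    # (2048,)
```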

[0048] Use the trained 3D conv...
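The excerpt breaks off here, but the second-level feature evidently comes from a trained 3D convolutional network applied to stacks of consecutive frames. The sketch below is a stand-in only: it uses torchvision's R3D-18 (the patent's actual 3D network is not visible in this excerpt) to pool clip-level features that capture short-range temporal dynamics.

```python
import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

# Stand-in 3D CNN; the classifier head is replaced so the 512-d clip
# feature from the final pooling layer is exposed directly.
r3d = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
r3d.fc = torch.nn.Identity()
r3d.eval()

@torch.no_grad()
def clip_level_feature(clips: torch.Tensor) -> torch.Tensor:
    """clips: (num_clips, 3, 16, 112, 112) stacks of consecutive frames.
    Returns one pooled feature summarizing short-range temporal dynamics."""
    per_clip = r3d(clips)           # (num_clips, 512)
    return per_clip.mean(dim=0)     # (512,)
```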


Abstract

The invention relates to a video text retrieval method based on spatiotemporal features, comprising the following steps: using three different types of neural networks to build a hierarchical, fine-grained, unified video representation of the spatiotemporal semantic information of a video; and constructing a video-text common semantic embedding network to bridge the semantic gap between cross-modal data, training the network with a contrastive ranking loss function. The method can be used for mutual retrieval between videos and natural language texts. The hierarchical feature extraction fully mines the more discriminative, complex spatiotemporal semantic information of the video modality, and the common semantic embedding network learns a common-space feature representation under which the semantic features of heterogeneous data from different modalities are identically distributed. Measuring the semantic association between high-order video features and natural language texts in this common space improves the retrieval precision of video natural language text.
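To make the embedding-and-loss idea concrete, here is a minimal PyTorch sketch. The dimensions (2048-d video features, 768-d text features, a 512-d common space) and the margin value are illustrative assumptions, and the "contrastive ranking loss" is rendered as the bidirectional max-margin ranking loss commonly used in video-text retrieval; none of these specifics are confirmed by the visible excerpt.

```python
import torch
import torch.nn.functional as F

# Linear projections mapping each modality into a shared 512-d space
# (sizes are assumptions for illustration).
video_proj = torch.nn.Linear(2048, 512)
text_proj = torch.nn.Linear(768, 512)

def ranking_loss(video_feats, text_feats, margin: float = 0.2):
    """video_feats: (B, 2048), text_feats: (B, 768); row i of each is a
    matched video-text pair. Pushes each matched pair's cosine similarity
    above every mismatched pair's by at least `margin`, in both retrieval
    directions (text->video and video->text)."""
    v = F.normalize(video_proj(video_feats), dim=1)
    t = F.normalize(text_proj(text_feats), dim=1)
    sim = v @ t.T                   # (B, B) cosine similarity matrix
    d = sim.diag()                  # matched-pair similarities
    # Hinge terms over all mismatched pairs, one per direction.
    cost_v2t = (margin + sim - d.unsqueeze(1)).clamp(min=0)
    cost_t2v = (margin + sim - d.unsqueeze(0)).clamp(min=0)
    mask = torch.eye(sim.size(0), dtype=torch.bool)  # ignore matched pairs
    return (cost_v2t.masked_fill(mask, 0).sum()
            + cost_t2v.masked_fill(mask, 0).sum())

# Usage with random stand-in features:
# loss = ranking_loss(torch.randn(8, 2048), torch.randn(8, 768))
# loss.backward()
```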

Description

Technical field

[0001] The invention belongs to the technical field of physics, and further relates to a video natural language text retrieval method based on spatial and temporal features within the technical field of image and data processing. The invention can be used for mutual semantic retrieval between the large-scale video modal data and natural language text modal data emerging on the Internet and social media, and for video topic detection and content recommendation in video applications.

Background technique

[0002] The emergence of a large number of user-generated videos on the Internet has increased the demand for video retrieval systems based on natural language text descriptions, and users' requirements for retrieval accuracy have brought unprecedented challenges to the precise retrieval of video content. Traditional methods mainly support concept-based retrieval for simple natural language text queries, which is ineffective for complex long natural language text qu...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/78; G06F16/783; G06F16/33; G06F40/30; G06K9/46; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/783; G06F16/7867; G06F16/3344; G06F40/30; G06N3/08; G06N3/044; G06N3/045; G06F18/241
Inventor: 王笛; 田玉敏; 罗雪梅; 丁子芮; 万波; 王义峰; 赵辉
Owner: XIDIAN UNIV