
A Text-to-Video Cross-Modal Retrieval Method Based on Multi-Level Coding

A cross-modal text-video retrieval technology, applied in the field of video cross-modal retrieval, which addresses the problems of information loss, the difficulty of building effective concept extractors, and low retrieval efficiency, achieving improved retrieval performance and efficiency.

Active Publication Date: 2022-03-25
ZHEJIANG GONGSHANG UNIVERSITY
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, this concept-based retrieval method has the following shortcomings. First, text and video have very rich content that is generally difficult to describe fully with a handful of concepts, resulting in a loss of information. Second, the performance of the retrieval model depends on the text and video concept extractors, yet building an effective concept extractor is not easy. Third, because this type of retrieval method relies on complex concept modeling and concept matching, its retrieval efficiency is relatively low.




Detailed Description of the Embodiments

[0038] In order to make the above objects, features and advantages of the present invention more clearly understood, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0039] Many specific details are set forth in the following description to facilitate a full understanding of the present invention, but the present invention can also be implemented in other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention; therefore, the present invention is not limited by the specific embodiments disclosed below.

[0040] The present invention proposes a text-to-video cross-modal retrieval method based on multi-level coding, including:

[0041] (1) Extract the features of the two modalities, video and text, using separate feature extraction methods (a minimal sketch of this step is given below).

[0042] (1-1) For a given vide...
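
As a concrete illustration of step (1), below is a minimal sketch of preliminary feature extraction, assuming a pretrained ResNet-152 as the frame-level extractor and a plain trainable embedding lookup for words; the excerpt does not name the actual extractors, so these choices and all dimensions are hypothetical.

```python
# Hypothetical preliminary feature extraction for step (1); ResNet-152 and
# the 500-d word embedding are illustrative stand-ins, not the patent's choices.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Video side: frame-level features from a pretrained CNN.
cnn = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
cnn.fc = nn.Identity()          # drop the classifier head, keep 2048-d features
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(frames):
    """frames: list of PIL images uniformly sampled from a video.
    Returns an (n_frames, 2048) tensor of preliminary video features."""
    batch = torch.stack([preprocess(f) for f in frames])
    return cnn(batch)

# Text side: per-word embeddings looked up from a trainable table.
vocab_size, word_dim = 10000, 500   # illustrative sizes
embedding = nn.Embedding(vocab_size, word_dim)

def word_features(token_ids):
    """token_ids: (seq_len,) LongTensor of vocabulary indices.
    Returns a (seq_len, 500) tensor of preliminary text features."""
    return embedding(token_ids)
```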



Abstract

The invention discloses a text-to-video cross-modal retrieval method based on multi-level coding. The method first obtains preliminary features of the video and the text, and then encodes the global, temporal, and local information of each modality through two multi-level encoding network branches; audio features are additionally extracted on the video side, and sentence features on the text side. Finally, the multiple encoded features are fused at multiple levels to obtain robust video and text representations. The features of the two modalities are mapped into a unified common space through fully connected layers, a common-space learning algorithm is used to learn the relationship between the two modalities, and the model is trained end to end so that it automatically learns the matching relationship between text and video, enabling cross-modal retrieval from text to video. The invention is a concept-free method that realizes cross-modal retrieval without complex concept detection operations, and it uses deep learning technology to greatly improve retrieval performance and efficiency.
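
To make the abstract concrete, here is a minimal sketch of one multi-level encoding branch together with a common hinge-based ranking objective for learning the common space. The three levels are interpreted as mean pooling (global), a bidirectional GRU (temporal), and 1-D convolutions over the GRU outputs (local); the level designs, the dimensions, and the loss are illustrative assumptions rather than the patent's specified design.

```python
# A sketch of one multi-level encoding branch plus a ranking loss; the level
# designs (mean pooling / biGRU / 1-D convolutions) and all sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelEncoder(nn.Module):
    """Encodes a feature sequence at global, temporal, and local levels,
    fuses the three codes, and projects them into the common space."""

    def __init__(self, feat_dim, hidden=512, common_dim=1536,
                 kernel_sizes=(2, 3, 4), n_filters=512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * hidden, n_filters, k, padding=k // 2)
            for k in kernel_sizes)
        fused_dim = feat_dim + 2 * hidden + n_filters * len(kernel_sizes)
        self.fc = nn.Linear(fused_dim, common_dim)  # fully connected mapping

    def forward(self, x):                 # x: (batch, seq_len, feat_dim)
        g = x.mean(dim=1)                 # level 1: global (mean pooling)
        h, _ = self.gru(x)                # level 2: temporal (biGRU states)
        t = h.mean(dim=1)
        hc = h.transpose(1, 2)            # (batch, 2*hidden, seq_len)
        l = torch.cat([conv(hc).relu().max(dim=2).values
                       for conv in self.convs], dim=1)  # level 3: local
        return self.fc(torch.cat([g, t, l], dim=1))     # common-space feature

def ranking_loss(video_emb, text_emb, margin=0.2):
    """Bidirectional hinge loss over cosine similarities, a standard choice
    for learning a joint space (the patent's exact objective is not shown)."""
    v = F.normalize(video_emb, dim=1)
    s = F.normalize(text_emb, dim=1)
    sim = v @ s.t()                       # (batch, batch) similarity matrix
    pos = sim.diag()
    cost_t = (margin + sim - pos.unsqueeze(1)).clamp(min=0)  # video as anchor
    cost_v = (margin + sim - pos.unsqueeze(0)).clamp(min=0)  # text as anchor
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return (cost_t.masked_fill(mask, 0).sum()
            + cost_v.masked_fill(mask, 0).sum())

# One branch per modality; both project into the same 1536-d common space.
video_encoder = MultiLevelEncoder(feat_dim=2048)   # frame features
text_encoder = MultiLevelEncoder(feat_dim=500)     # word embeddings
```

At retrieval time, cosine similarity between the sentence embedding and each video embedding in this common space directly produces the ranking, which is what makes a concept-free approach efficient compared with concept detection and matching.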

Description

Technical Field

[0001] The invention relates to the technical field of video cross-modal retrieval, and in particular to a text-to-video cross-modal retrieval method based on multi-level coding.

Background

[0002] In recent years, owing to the popularization of the Internet and mobile smart devices and the rapid development of communication and multimedia technologies, massive amounts of multimedia data are created and uploaded to the Internet every day at an ever-increasing rate, and these multimedia data have become the most important source of information for modern people. With the advent of the 5G era, features such as faster transmission speeds, larger bandwidth, and lower latency will further accelerate the growth of multimedia data, especially video data, which will become even easier for people to create and share. It is foreseeable that the amount of video data stored on the Internet will be enormous in the future. Faced with such a huge amount of multimedia data, how ...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/783; G06F16/33
CPC: G06F16/7834; G06F16/3344
Inventors: 董建锋 (Dong Jianfeng), 叶金德 (Ye Jinde), 章磊敏 (Zhang Leimin), 林昶廷 (Lin Changting), 王勋 (Wang Xun)
Owner: ZHEJIANG GONGSHANG UNIVERSITY