
Natural language video clip retrieval method based on multi-agent boundary perception network

Keywords: perception network, multi-agent technology. Applied in digital data information retrieval, special data processing applications, instruments, etc. Addresses the mismatch between query text and video clips, the lack of temporal inference ability over video, and the resulting limits on retrieval accuracy.

Pending Publication Date: 2020-05-26
TONGJI UNIV

AI Technical Summary

Problems solved by technology

Such complex semantic association requires a full understanding of the contextual information of video clips. However, traditional video clip retrieval technology focuses on the video as a whole and ignores the semantic associations between video clips, which often leads to cases where the query text does not match the retrieved video clip.
Methods that use an attention mechanism to relate video clips to text alleviate this problem to some extent, but they lack the ability to reason about video time and still cannot fully capture the structural correlations within a video, so their retrieval accuracy remains limited.



Examples


Detailed Description of the Embodiments

[0050] The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. This embodiment is implemented on the basis of the technical solution of the present invention, and a detailed implementation and specific operating procedure are given, but the protection scope of the present invention is not limited to the following embodiments.

[0051] The invention provides a natural language video segment retrieval method based on a multi-agent boundary perception network, which retrieves the corresponding target segment from a video given a one-sentence natural language description. The method decomposes the task into two sub-tasks, start point retrieval and end point retrieval, and uses boundary-aware agents (a start point agent and an end point agent) to iteratively adjust the temporal boundary in multiple directions and at multiple scales, so that the retrieval result approaches the target seg...
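The iterative adjustment described in [0051] can be illustrated with a minimal sketch. It is not the patented network: the action set `SHIFTS`, the function names, and the normalization of time to [0, 1] are assumptions made for illustration, and the learned start point and end point agents are replaced by a stand-in greedy policy that is told the target boundary. In the actual method the agents would choose their actions from video and query features rather than from a known target.

```python
from typing import Callable, Tuple

# Hypothetical multi-scale, two-direction action set: signed shifts expressed
# as a fraction of the video length. A shift of 0.0 means "stop".
SHIFTS = (-0.10, -0.05, -0.01, 0.0, 0.01, 0.05, 0.10)

# A policy maps the current (start, end) boundary to one shift from SHIFTS.
Policy = Callable[[float, float], float]


def clamp(x: float, lo: float, hi: float) -> float:
    """Keep x inside the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def greedy_policy(target: float, which: str) -> Policy:
    """Stand-in for a learned agent (assumption): pick the shift that brings
    the chosen boundary ("start" or "end") closest to a known target."""
    def policy(start: float, end: float) -> float:
        current = start if which == "start" else end
        return min(SHIFTS, key=lambda s: abs(current + s - target))
    return policy


def adjust_boundaries(
    start: float,
    end: float,
    start_policy: Policy,
    end_policy: Policy,
    max_steps: int = 20,
) -> Tuple[float, float]:
    """Iteratively move the start and end points until both agents stop."""
    for _ in range(max_steps):
        ds = start_policy(start, end)
        de = end_policy(start, end)
        if ds == 0.0 and de == 0.0:
            break  # both agents chose "stop"
        # Keep the boundary valid: 0 <= start <= end <= 1.
        start = clamp(start + ds, 0.0, end)
        end = clamp(end + de, start, 1.0)
    return start, end


if __name__ == "__main__":
    # Begin with the whole video and refine toward a segment at (0.32, 0.58).
    s, e = adjust_boundaries(
        0.0, 1.0, greedy_policy(0.32, "start"), greedy_policy(0.58, "end")
    )
    print(f"start={s:.2f}, end={e:.2f}")
```

The two policies act on separate boundaries in the same step, which mirrors the decomposition into start point retrieval and end point retrieval; the coarse shifts cover distance quickly and the fine shifts settle the final position.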


Abstract

The invention relates to a natural language video clip retrieval method based on a multi-agent boundary perception network. The method uses the multi-agent boundary perception network as its basic framework and iterates over the start point and the end point separately, in multiple directions and at multiple scales, adjusting the temporal boundary until the target segment is obtained. The multi-agent boundary perception network comprises an observation network, a start point agent, an end point agent and a limited supervision network. Compared with the prior art, the method achieves high-precision retrieval without a large increase in the number of parameters, and its boundary perception capability lets it better meet the retrieval requirements of video clips from the varied, complex scenes of real life.

Description

Technical field

[0001] The invention belongs to the technical field of video retrieval and relates to a natural language video segment retrieval method, in particular to a natural language video segment retrieval method based on a multi-agent boundary perception network.

Background art

[0002] In recent years, with the rapid development of the mobile Internet, video sites such as Douyin, bilibili, iQiyi, and Douyu have flourished and become an indispensable part of people's entertainment. At the same time, the state has increased investment in video surveillance, which places higher demands on video understanding. As a rapidly developing branch of video understanding, natural language video retrieval combines natural language processing and computer vision analysis, aiming to retrieve from a long video the segments that are semantically related to a given text query. It has important applications in video retrieval, intellig...


Application Information

IPC(8): G06F16/783
CPC: G06F16/783
Inventor: 王瀚漓, 孙晓阳
Owner: TONGJI UNIV