Method for solving polymorphic statement video positioning task by using space-time graph reasoning network

A video localization technique based on a spatio-temporal graph, applied in the field of natural language visual localization, that addresses the inability of prior methods to handle the multi-form sentence video localization task.

Active Publication Date: 2020-07-14
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0005] To address the inability of the prior art to handle the multi-form sentence video localization task, the present invention proposes a method that solves this task with a spatio-temporal graph reasoning network. First, the video is parsed into a spatio-temporal region graph. This graph contains not only implicit and explicit spatial subgraphs for each frame, but also a temporal dynamic subgraph across frames.
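The three kinds of subgraph named above can be illustrated with a small data-structure sketch. It is not the patent's construction: the names `RegionGraph` and `build_graph`, the choice of fully connected implicit edges, and the IoU threshold used to link regions across adjacent frames are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class RegionGraph:
    # Nodes are (frame_index, region_index) pairs.
    implicit: set = field(default_factory=set)  # unlabeled edges within a frame
    explicit: set = field(default_factory=set)  # labeled relation edges within a frame
    temporal: set = field(default_factory=set)  # edges between adjacent frames


def build_graph(frames, relations, link_iou=0.5):
    """frames[t] is a list of boxes; relations is a list of (t, i, label, j).

    Assumption: implicit edges connect every region pair in a frame, and
    temporal edges connect regions in adjacent frames whose IoU >= link_iou.
    """
    g = RegionGraph()
    for t, boxes in enumerate(frames):
        for i, j in combinations(range(len(boxes)), 2):
            g.implicit.add(((t, i), (t, j)))
    for t, i, label, j in relations:
        g.explicit.add(((t, i), label, (t, j)))
    for t in range(len(frames) - 1):
        for i, a in enumerate(frames[t]):
            for j, b in enumerate(frames[t + 1]):
                if iou(a, b) >= link_iou:
                    g.temporal.add(((t, i), (t + 1, j)))
    return g
```

In this sketch a region that barely moves between two frames receives a temporal edge, while a region with no overlapping counterpart in the next frame stays unlinked.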

Examples

Embodiment

[0114] The present invention adds sentence annotations to VidOR to build VidSTG, a large-scale spatio-temporal video localization dataset, and the method is validated on VidSTG. VidOR is the largest existing video dataset containing object relations. It contains 10,000 videos with fine-grained annotations of objects and the relations among them. VidOR annotates 80 object categories with dense bounding boxes and 50 relation predicate categories between objects (8 spatial relations and 42 action relations), and expresses relations as triplets. Each triplet is associated with a temporal boundary and with the spatio-temporal tubes to which its subject and object belong. Suitable triplets are selected from VidOR, and the subject or object of each is described with sentences in multiple forms. Using VidOR as the base dataset has several advantages. On the one hand, laborious annotation of bounding boxes can be avoided. On the other hand, relations in triplets can be simply incorporated into annotated ...
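As a rough illustration of the annotation unit described above, a triplet with its temporal boundary and the two tubes could be held in a record like the following. The class name, field names, and the inclusive frame-index convention are hypothetical and do not reflect the actual VidOR or VidSTG file format.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class RelationTriplet:
    # Hypothetical field names, chosen for illustration only.
    subject: str
    predicate: str
    obj: str
    start_frame: int  # temporal boundary, inclusive
    end_frame: int    # temporal boundary, inclusive
    subject_tube: dict[int, Box]  # frame index -> box of the subject
    object_tube: dict[int, Box]   # frame index -> box of the object

    def duration(self) -> int:
        """Number of frames covered by the temporal boundary."""
        return self.end_frame - self.start_frame + 1

    def target_tube(self, role: str) -> dict[int, Box]:
        """Boxes of the queried entity, restricted to the temporal boundary."""
        tube = self.subject_tube if role == "subject" else self.object_tube
        return {f: b for f, b in tube.items()
                if self.start_frame <= f <= self.end_frame}
```

A sentence that describes the subject would take `target_tube("subject")` as its ground truth, and a sentence that describes the object would take `target_tube("object")`.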

Abstract

The invention discloses a method for solving the multi-form sentence video localization task with a spatio-temporal graph reasoning network, and belongs to the field of natural language visual localization. First, the video is parsed into a spatio-temporal region graph, which contains not only implicit and explicit spatial subgraphs for each frame but also a temporal dynamic subgraph across frames. Next, textual clues are incorporated into the spatio-temporal region graph and multi-step cross-modal graph reasoning is performed; the multi-step process supports modeling of multi-order relations. A temporal locator then determines the temporal boundary of the tube, after which a spatial locator with a dynamic selection method locates the object in each frame and generates a smooth tube. The method does not require the video to be trimmed before natural language localization, which reduces the cost of video localization. It handles both interrogative and declarative sentences effectively, provides technical support for higher-level research that combines natural language processing and computer vision (such as video question answering), and has broad application prospects.
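The abstract does not spell out how the dynamic selection produces a smooth tube. As a hedged illustration of the general idea only, the sketch below swaps in a generic Viterbi-style linking step: within the frames already chosen by the temporal locator, it picks one candidate box per frame so that the sum of per-frame matching scores and consecutive-frame IoU is maximal. The function name `link_tube`, the `smooth_weight` parameter, and the scoring rule are assumptions, not the patented procedure.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tube(scores, boxes, smooth_weight=1.0):
    """Return one candidate index per frame (assumed linking rule).

    scores[t][i] is the matching score of candidate i in frame t and
    boxes[t][i] is its box. Assumes at least one frame and at least one
    candidate per frame.
    """
    best = [list(scores[0])]  # best[t][j]: best total score ending at j
    back = []                 # back[t-1][j]: predecessor of j in frame t-1
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for j, bj in enumerate(boxes[t]):
            # Reward overlap with the previous box to keep the tube smooth.
            cands = [best[t - 1][i] + smooth_weight * iou(bi, bj)
                     for i, bi in enumerate(boxes[t - 1])]
            k = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[k] + scores[t][j])
            ptr.append(k)
        best.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path
```

With `smooth_weight=0` the choice reduces to a per-frame argmax, so a distractor that scores slightly higher in a single frame can cause the tube to jump. A positive weight trades some per-frame score for continuity.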

Description

Technical field

[0001] The invention relates to the field of natural language visual localization, and in particular to a method for solving the multi-form sentence video localization task with a spatio-temporal graph reasoning network.

Background

[0002] Visual localization of natural language is a fundamental and crucial task in the field of visual understanding. The goal of this task is to locate, temporally and spatially, the object described by a given natural language expression in visual content. In recent years, researchers have begun to pay attention to the localization of natural language sentences in videos, including temporal localization and spatio-temporal localization. Temporal localization obtains the time segment in which the object appears in the video. Spatio-temporal localization additionally obtains the region in which the object appears within that segment; because these regions are continuous across frames, the result is also called a spatio-temporal tube.

[0003] At present, the methods implem...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06N3/04, G06N3/08, G06F17/11
CPC: G06N3/08, G06F17/11, G06V20/46, G06N3/045, Y02D10/00
Inventor: 赵洲, 张品涵, 张竹
Owner: ZHEJIANG UNIV