Method for solving polymorphic statement video positioning task by using space-time graph reasoning network

A video positioning technology based on a space-time graph, applied in the field of natural language visual positioning. It addresses the inability of prior methods to solve multi-morphic sentence video positioning tasks.

Active Publication Date: 2020-07-14

ZHEJIANG UNIV

4 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0005] To address the inability of the prior art to solve the video positioning task for multi-morphic sentences, the present invention proposes a method that solves this task using a space-time graph reasoning network. First, the video is parsed into a space-time region graph. This graph contains not only implicit and explicit spatial subgraphs for each frame, but also temporal dynamic subgraphs across frames.
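
The graph structure described in [0005] can be illustrated with a minimal Python sketch. It is not the patented construction: the excerpt does not say how edges are derived, so every rule here is an assumption made for illustration. Implicit edges connect all region pairs within a frame, explicit edges are a stand-in that links overlapping boxes, and temporal edges link regions in adjacent frames whose IoU reaches a hypothetical threshold `temporal_iou`. The names `Region`, `iou`, and `build_region_graph` are likewise hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Region:
    frame: int   # index of the frame the region belongs to
    index: int   # index of the region within its frame
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_region_graph(frames: List[List[Box]], temporal_iou: float = 0.5) -> Dict[str, list]:
    """Build three edge lists over the regions of a video.

    frames: one list of boxes per frame.
    All edge rules below are illustrative assumptions, not the patent's.
    """
    regions = [[Region(t, i, box) for i, box in enumerate(boxes)]
               for t, boxes in enumerate(frames)]
    graph: Dict[str, list] = {"implicit": [], "explicit": [], "temporal": []}

    # Spatial subgraphs: edges between regions of the same frame.
    for frame_regions in regions:
        for a, b in combinations(frame_regions, 2):
            graph["implicit"].append((a, b))      # assumed: every pair is connected
            if iou(a.box, b.box) > 0:
                graph["explicit"].append((a, b))  # assumed stand-in: boxes overlap

    # Temporal dynamic subgraph: edges between regions of adjacent frames.
    for prev, cur in zip(regions, regions[1:]):
        for a in prev:
            for b in cur:
                if iou(a.box, b.box) >= temporal_iou:
                    graph["temporal"].append((a, b))
    return graph
```

For two frames with three and one boxes respectively, `build_region_graph` returns three implicit edges for the first frame, and a temporal edge only where a box in the first frame overlaps the box in the second frame strongly enough.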

Embodiment

[0114] The present invention adds sentence annotations to VidOR to build VidSTG, a large-scale spatio-temporal video localization dataset, and validates the method on VidSTG. VidOR is the largest existing video dataset containing object relations, with 10,000 videos and fine-grained annotations of the objects and the relations among them. VidOR annotates 80 object categories with dense bounding boxes and 50 relation predicate categories (8 spatial relations and 42 action relations) between objects, expressing relations as triplets (subject, predicate, object). Each triplet is associated with a temporal boundary and with the spatio-temporal tubes to which the subject and object belong. Appropriate triplets are selected from VidOR, and the subject or object is described with multiple forms of sentences. Using VidOR as the base dataset has several advantages. On the one hand, laborious annotation of bounding boxes can be avoided. On the other hand, relations in triplets can be simply incorporated into annotated ...
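
To make the annotation structure in [0114] concrete, here is a small Python sketch of one relation instance and one sentence attached to it. The class names, field names, and the sample values in the usage note are hypothetical and are not taken from VidOR or VidSTG files. Only the overall shape follows the text: a triplet, a temporal boundary, a tube for the subject and for the object, and a sentence describing one of them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class RelationInstance:
    """One relation triplet with its temporal boundary and tubes (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    start_frame: int              # first frame of the temporal boundary
    end_frame: int                # last frame of the temporal boundary, inclusive
    subject_tube: Dict[int, Box]  # frame index -> bounding box of the subject
    object_tube: Dict[int, Box]   # frame index -> bounding box of the object

    def tubes_cover_boundary(self) -> bool:
        """True if both tubes have exactly one box per frame of the boundary."""
        frames = set(range(self.start_frame, self.end_frame + 1))
        return set(self.subject_tube) == frames and set(self.object_tube) == frames


@dataclass
class SentenceAnnotation:
    """A sentence describing the subject or the object of a relation instance."""
    relation: RelationInstance
    target: str  # "subject" or "object": which tube the sentence refers to
    form: str    # "declarative" or "interrogative"
    text: str

    def target_tube(self) -> Dict[int, Box]:
        """Return the tube the sentence should be localized to."""
        if self.target == "subject":
            return self.relation.subject_tube
        if self.target == "object":
            return self.relation.object_tube
        raise ValueError(f"unknown target: {self.target!r}")
```

A made-up example: a `RelationInstance` for (child, hold, toy) over frames 3 to 5 could carry a declarative sentence targeting the subject and an interrogative sentence such as "What is the child holding?" targeting the object. Both sentences reuse the bounding boxes already stored in the tubes, which reflects the advantage of avoiding new box annotation.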

Abstract

The invention discloses a method for solving the polymorphic statement video positioning task with a space-time graph reasoning network, and belongs to the field of natural language visual positioning. First, a video is parsed into a space-time region graph, which has not only implicit and explicit spatial sub-graphs for each frame but also a temporal dynamic sub-graph across frames. Next, textual clues are added to the space-time region graph and multi-step cross-modal graph reasoning is established; the multi-step process supports multi-order relation modeling. Then a temporal locator determines the temporal boundary of the spatio-temporal tube, a spatial locator with a dynamic selection method locates the object in each frame, and a smooth tube is generated. The method localizes natural language in video without requiring the video to be trimmed, which reduces the cost of video positioning. It handles both interrogative and declarative sentences effectively, provides technical support for higher-level research combining natural language processing and computer vision (such as video question answering), and has broad application prospects.
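
The final step, choosing one region per frame so that the resulting tube is smooth, can be sketched as follows. The abstract does not define the dynamic selection method, so this example substitutes a generic technique and should not be read as the patented one: a Viterbi-style dynamic program that maximizes the sum of per-region scores plus an IoU bonus between consecutive boxes. The function names, the `smooth_weight` parameter, and the score values are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tube(frames: List[List[Tuple[Box, float]]], smooth_weight: float = 1.0) -> List[Box]:
    """Pick one box per frame, trading off region score against smoothness.

    frames: for each frame inside the temporal boundary, a non-empty list
            of (box, score) candidates.
    Assumed objective: sum of scores + smooth_weight * IoU of consecutive boxes.
    """
    if not frames:
        return []
    # best[t][i]: best objective of any path that ends at candidate i of frame t
    best = [[score for _, score in frames[0]]]
    back = []  # back[t - 1][i]: predecessor index chosen for candidate i of frame t
    for t in range(1, len(frames)):
        row, ptr = [], []
        for box, score in frames[t]:
            cands = [best[t - 1][j] + smooth_weight * iou(prev_box, box)
                     for j, (prev_box, _) in enumerate(frames[t - 1])]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[j_best] + score)
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)

    # Trace the best path back from the last frame to the first.
    i = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]
```

With `smooth_weight=0` the program reduces to picking the highest-scoring box in each frame independently, which can make the tube jump between objects. A positive weight favors boxes that overlap their neighbors in time.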

Description

Technical field

[0001] The invention relates to the field of natural language visual positioning, in particular to a method for solving multi-morphic sentence video positioning tasks by using a space-time graph reasoning network.

Background

[0002] Visual localization of natural language is a fundamental and crucial task in the field of visual understanding. The goal of this task is to locate, temporally and spatially, the object described by a given natural language expression in visual content. In recent years, researchers have begun to pay attention to the positioning of natural language (sentences) in videos, including temporal positioning and space-time positioning. Temporal positioning obtains the time segment in which the object appears in the video; space-time positioning additionally obtains the area in which the object appears. Because these areas are continuous over time, the result is also called a space-time pipe (spatio-temporal tube).

[0003] At present, the methods implem...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/00, G06N3/04, G06N3/08, G06F17/11

CPC: G06N3/08, G06F17/11, G06V20/46, G06N3/045, Y02D10/00

Inventor: 赵洲 (Zhao Zhou), 张品涵 (Zhang Pinhan), 张竹 (Zhang Zhu)

Owner: ZHEJIANG UNIV
