Method for solving the multi-form sentence video localization task using a spatio-temporal graph reasoning network

A video localization technique based on a spatio-temporal graph, applied in the field of natural language visual localization, that addresses the inability of existing methods to handle video localization for multi-form sentences.
CN111414845A · Active · Publication Date: 2020-07-14 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: ZHEJIANG UNIV
Publication Date: 2020-07-14


Abstract

The invention discloses a method for solving the multi-form sentence video localization task with a spatio-temporal graph reasoning network, and belongs to the field of natural language visual localization. The method first parses a video into a spatio-temporal region graph, which contains implicit and explicit spatial subgraphs for each frame as well as a cross-frame temporal dynamic subgraph. Textual clues are then incorporated into the spatio-temporal region graph and multi-step cross-modal graph reasoning is performed; the multi-step process supports modeling of multi-order relations. Next, a temporal localizer determines the temporal boundaries of the tube, a spatial localizer with a dynamic selection method locates the object in each frame, and a smooth tube is generated. The method does not require the video to be trimmed before a natural language sentence is localized, which reduces the cost of video localization. It handles both interrogative and declarative sentences effectively, provides technical support for higher-level research combining natural language processing and computer vision (such as video question answering), and has broad application prospects.
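As a rough illustration of the first step in the abstract (parsing a video into a region graph with per-frame spatial subgraphs and a cross-frame temporal subgraph), the sketch below builds such a graph from bounding boxes. Everything specific here is an assumption made for illustration and is not taken from the patent: the names `Region`, `iou`, and `build_region_graph`, the use of a fully connected graph for the implicit spatial subgraph, a crude "left_of" relation for the explicit spatial subgraph, and IoU-based linking with a `min_iou` threshold for the temporal subgraph.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Region:
    """One detected region in one frame (hypothetical structure)."""
    frame: int
    index: int
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_region_graph(frames: List[List[Box]], min_iou: float = 0.3) -> Dict[str, list]:
    """Build three edge sets over the regions of a video.

    implicit: every unordered pair of regions within a frame (assumption).
    explicit: labeled pairs; here only a "left_of" relation from box centres (assumption).
    temporal: each region linked to its best-overlapping region in the next frame,
              if the overlap reaches min_iou (assumption).
    """
    regions = [[Region(t, i, box) for i, box in enumerate(boxes)]
               for t, boxes in enumerate(frames)]
    implicit, explicit, temporal = [], [], []

    for frame_regions in regions:
        for a, b in combinations(frame_regions, 2):
            implicit.append((a, b))
            centre_a = (a.box[0] + a.box[2]) / 2
            centre_b = (b.box[0] + b.box[2]) / 2
            if centre_a < centre_b:
                explicit.append((a, "left_of", b))
            elif centre_b < centre_a:
                explicit.append((b, "left_of", a))

    for prev, curr in zip(regions, regions[1:]):
        for a in prev:
            best = max(curr, key=lambda r: iou(a.box, r.box), default=None)
            if best is not None and iou(a.box, best.box) >= min_iou:
                temporal.append((a, best))

    return {"implicit": implicit, "explicit": explicit, "temporal": temporal}


# Two frames with two boxes each; only the first box persists across frames.
example_frames = [
    [(0, 0, 10, 10), (20, 0, 30, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
]
graph = build_region_graph(example_frames)
print(len(graph["implicit"]), len(graph["explicit"]), len(graph["temporal"]))
```

The text-guided reasoning steps and the temporal and spatial localizers would operate on top of a graph like this one; they are not sketched here because the abstract does not specify how they are computed.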

Description

Technical field

[0001] The invention relates to the field of natural language visual localization, and in particular to a method for solving the multi-form sentence video localization task using a spatio-temporal graph reasoning network.

Background art

[0002] Visual localization of natural language is a fundamental and crucial task in the field of visual understanding. The goal of this task is to locate, temporally and spatially, the object described by a given natural language sentence in visual content. In recent years, researchers have begun to pay attention to localizing natural language sentences in videos, including temporal localization and spatio-temporal localization. Temporal localization obtains the time segment in which the object appears in the video; spatio-temporal localization builds on temporal localization and additionally obtains the region in which the object appears. Because these regions are continuous across frames, the result is also called a spatio-temporal tube.
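To make the difference between the two outputs concrete, the following minimal sketch represents a spatio-temporal tube as a start frame plus one bounding box per consecutive frame. The temporal result is then just the time segment covered by the tube, while the spatio-temporal result also exposes the box in each frame. The class name `SpatioTemporalTube`, its fields, and the frame-rate conversion are illustrative assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class SpatioTemporalTube:
    """Hypothetical container: one box per consecutive frame, starting at start_frame."""
    start_frame: int
    boxes: List[Box]

    @property
    def end_frame(self) -> int:
        """Index of the last frame covered by the tube (inclusive)."""
        return self.start_frame + len(self.boxes) - 1

    def temporal_segment(self, fps: float) -> Tuple[float, float]:
        """Time segment in seconds; this is all that temporal localization returns."""
        return (self.start_frame / fps, (self.end_frame + 1) / fps)

    def box_at(self, frame: int) -> Optional[Box]:
        """Region of the object in a given frame, or None outside the segment."""
        if self.start_frame <= frame <= self.end_frame:
            return self.boxes[frame - self.start_frame]
        return None


tube = SpatioTemporalTube(start_frame=50,
                          boxes=[(0, 0, 10, 10), (1, 0, 11, 10), (2, 0, 12, 10)])
print(tube.temporal_segment(fps=25.0))  # time span covered by the tube
print(tube.box_at(51))                  # box inside the segment
print(tube.box_at(49))                  # None: frame lies before the segment
```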

[0003] At present, the methods implem...

Claims
