Cross-modal video temporal action localization method and system based on a temporal-spatial graph

A localization method based on temporal-spatial graph technology, applied in stereoscopic systems, image communication, and character and pattern recognition. It addresses the problems of fixed-length candidate segments, neglected object-interaction information, and low localization accuracy, and achieves the effects of enhanced feature representation and improved accuracy.

Active Publication Date: 2022-01-21
SHANDONG JIANZHU UNIV

AI Technical Summary

Problems solved by technology

Most existing cross-modal video temporal action localization methods generate candidate sets of temporal action segments with sliding windows or multi-scale anchor sampling. The resulting candidates are of a single, fixed length and have low overlap with the target segments, so the accuracy of the final localization result is low.
In addition, most existing methods represent video clips with global features (such as C3D or I3D), ignoring the interactions of objects within and across frames. This leads to an insufficient understanding of the video content, which in turn limits the localization accuracy for the target temporal action segments.
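
For illustration, here is a minimal sketch of what such an object-level temporal-spatial graph might look like, assuming objects have already been detected per frame and embedded as feature vectors. The function name, the fully connected spatial edges, and the cosine-similarity temporal edges are assumptions for exposition, not the patent's disclosed construction.

```python
import numpy as np

def build_temporal_spatial_graph(frame_objects, sim_threshold=0.5):
    """frame_objects: list of (num_objects_t, d) arrays, one per frame.
    Returns stacked node features X and an adjacency matrix A with
    spatial edges (object pairs in the same frame) and temporal edges
    (similar objects in adjacent frames)."""
    X = np.concatenate(frame_objects, axis=0)
    n = X.shape[0]
    A = np.zeros((n, n))
    offsets = np.concatenate([[0], np.cumsum([f.shape[0] for f in frame_objects])])
    for t in range(len(frame_objects)):
        s, e = offsets[t], offsets[t + 1]
        A[s:e, s:e] = 1.0  # spatial edges: every object pair within frame t
        if t + 1 < len(frame_objects):
            ns, ne = offsets[t + 1], offsets[t + 2]
            # temporal edges: link objects in frame t to similar objects in frame t+1
            num = X[s:e] @ X[ns:ne].T
            den = np.linalg.norm(X[s:e], axis=1)[:, None] * np.linalg.norm(X[ns:ne], axis=1)[None, :]
            cos = num / np.maximum(den, 1e-8)
            A[s:e, ns:ne] = (cos > sim_threshold).astype(float)
            A[ns:ne, s:e] = A[s:e, ns:ne].T
    return X, A

# toy usage: 3 frames with 2, 3, and 2 detected objects, 4-d features
rng = np.random.default_rng(0)
frames = [rng.standard_normal((k, 4)) for k in (2, 3, 2)]
X, A = build_temporal_spatial_graph(frames)
print(X.shape, A.shape)  # (7, 4) (7, 7)
```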



Examples


Embodiment 1

[0062] This embodiment provides a temporal-spatial graph-based method for localizing temporal actions in cross-modal video. As shown in Figure 1, the method includes the following steps:

[0063] Step (1): Receive video data and natural language query information;

[0064] Step (2): Determine a natural language query feature representation based on the natural language query information;

[0065] Step (3): Determine candidate video temporal action segment feature representations based on the video data;

[0066] Step (4): Based on the candidate video temporal action segment feature representations and the natural language query feature representation, predict the temporal offset of each candidate video temporal action segment and the relevance between each candidate segment and the natural language query information;

[0067] Step (5): Perform offset correction on the candidate video temporal action segment with the highest relevance score to obtain the final target video temporal action segment localization result (a sketch of how these five steps fit together follows below).
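
The following is a minimal, hedged sketch of how steps (1) through (5) could fit together. The encoders and the multiplicative-fusion prediction head are illustrative stand-ins; the patent realizes these with its temporal-spatial graph components, whose details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_query(query):
    # step (2): natural language query -> feature vector (stand-in encoder)
    return rng.standard_normal(128)

def encode_candidates(video):
    # step (3): enumerate candidate segments and embed them (stand-in encoder)
    segs = np.array([(s, s + l) for s in (0, 10, 20, 30) for l in (10, 20)], dtype=float)
    feats = rng.standard_normal((len(segs), 128))
    return segs, feats

def predict(seg_feats, q_feat):
    # step (4): per-candidate temporal offsets and relevance scores
    fused = seg_feats * q_feat                        # simple multiplicative fusion (assumption)
    scores = fused.mean(axis=1)                       # relevance to the query
    offsets = np.stack([fused[:, :64].mean(axis=1),   # start offset
                        fused[:, 64:].mean(axis=1)],  # end offset
                       axis=1)
    return offsets, scores

# step (1): receive the video and the query
query = "a person opens the refrigerator"
q = encode_query(query)
segs, feats = encode_candidates(video=None)
offsets, scores = predict(feats, q)

# step (5): offset-correct the highest-scoring candidate
best = int(np.argmax(scores))
start, end = segs[best] + offsets[best]
print(f"predicted segment: [{start:.2f}s, {end:.2f}s]")
```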

Embodiment 2

[0118] This embodiment provides a temporal-spatial graph-based temporal action localization system for cross-modal video.

[0119] A temporal-spatial graph-based cross-modal video temporal action localization system, comprising:

[0120] a data collection module configured to receive video data and natural language query information;

[0121] a language feature representation determination module configured to determine a natural language query feature representation based on the natural language query information;

[0122] a video feature representation determination module configured to determine candidate video temporal action segment feature representations based on the video data;

[0123] a feature representation analysis module configured to predict, based on the candidate video temporal action segment feature representations and the natural language query feature representation, the temporal offset of each candidate segment and the relevance between each candidate segment and the natural language query information (a structural sketch of these modules follows below).
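
Below is a structural sketch of how these four modules could be organized in code. The class and method names are illustrative assumptions, and each body is a trivial stand-in for the corresponding step of Embodiment 1.

```python
from typing import List, Tuple

class DataCollectionModule:
    def collect(self) -> Tuple[str, str]:
        # receives video data and natural language query information
        return "video.mp4", "a person opens the refrigerator"

class LanguageFeatureModule:
    def encode(self, query: str) -> List[float]:
        # stand-in query embedding: word lengths instead of a learned encoder
        return [float(len(w)) for w in query.split()]

class VideoFeatureModule:
    def encode(self, video: str) -> List[Tuple[float, float]]:
        # stand-in candidate segments; a real system would also return features
        return [(0.0, 10.0), (5.0, 20.0), (10.0, 30.0)]

class AnalysisModule:
    def predict(self, cands, q) -> Tuple[int, Tuple[float, float]]:
        # stand-in relevance scoring and offset prediction
        scores = [1.0 / (1.0 + abs(sum(q) - (e - s))) for s, e in cands]
        return scores.index(max(scores)), (0.3, -0.2)

video, query = DataCollectionModule().collect()
q = LanguageFeatureModule().encode(query)
cands = VideoFeatureModule().encode(video)
best, (ds, de) = AnalysisModule().predict(cands, q)
s, e = cands[best]
print(f"localized segment: [{s + ds:.1f}s, {e + de:.1f}s]")
```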



Abstract

The invention belongs to the technical field of data representation and provides a cross-modal video temporal action localization method and system based on a temporal-spatial graph. The method comprises the steps of: receiving video data and natural language query information; determining a natural language query feature representation based on the natural language query information; determining candidate video temporal action segment feature representations based on the video data; predicting, on the basis of the candidate segment feature representations and the natural language query feature representation, the temporal offset of each candidate video temporal action segment and the relevance between each candidate segment and the natural language query information; and performing offset correction on the candidate video temporal action segment with the highest relevance score to obtain the final target video temporal action segment localization result. By predicting the temporal offset and the relevance score of each candidate segment from the two feature representations, the method and system greatly improve the accuracy of video temporal action localization.
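
As a concrete instance of the final correction step (with made-up numbers, since the patent gives none here), the boundaries of the top-scoring candidate are simply shifted by the predicted temporal offsets:

```python
# Made-up numbers illustrating the final step: the boundaries of the
# highest-relevance candidate are shifted by the predicted offsets.
candidate = (12.0, 27.5)   # highest-relevance candidate segment (seconds)
offset = (0.8, -1.2)       # predicted start/end offsets (seconds)
final = (candidate[0] + offset[0], candidate[1] + offset[1])
print(final)               # (12.8, 26.3)
```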

Description

technical field

[0001] The invention belongs to the technical field of data representation, and in particular relates to a cross-modal video temporal action localization method and system based on a temporal-spatial graph.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] The rapid development of Internet technology and the increasing availability of image acquisition equipment have led to an exponential growth in the number of videos, which have become a mainstream media form in today's society. Faced with such large-scale video data, video temporal action localization has become a hot research topic in the field of video analysis. It aims to locate the start and end moments of all actions in a given video and, at the same time, predict the categories of these actions. However, current video temporal action localization methods can only detect and ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06V20/40; H04N13/327; H04N13/161
CPC: H04N13/327; H04N13/161
Inventor: 刘萌, 齐孟津, 田传发, 周迪, 郭杰, 马玉玲, 刘新锋
Owner: SHANDONG JIANZHU UNIV