Cross-modal video time sequence action positioning method and system based on time sequence-space diagram

A positioning method based on spatial map technology, applied in stereoscopic systems, image communication, character and pattern recognition, etc. It addresses candidate segments of a single length, neglect of interaction information, and low accuracy of positioning results, and it enhances feature representation and improves accuracy.

Active Publication Date: 2022-01-21

SHANDONG JIANZHU UNIV

18 Cites, 3 Cited by


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Most existing cross-modal video temporal action localization methods use sliding windows, multi-scale anchor sampling, and similar techniques to generate the candidate set of temporal action segments. The resulting candidates have a single length and low coverage of the target temporal action segment, so the accuracy of the final positioning result is low.

In addition, most existing methods use global representations (such as C3D or I3D) when representing video clips, ignoring the interaction information of objects within or between frames. This leads to insufficient understanding of the video content, which in turn lowers the localization accuracy of the target temporal action segment.
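The coverage problem with fixed-length candidates can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the window and stride values, the target segment, and the use of temporal intersection-over-union (IoU) as the coverage measure are all assumptions chosen for illustration.

```python
def sliding_window_candidates(duration, window, stride):
    """Return fixed-length (start, end) candidates inside [0, duration]."""
    candidates = []
    start = 0.0
    while start + window <= duration:
        candidates.append((start, start + window))
        start += stride
    return candidates


def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals, in seconds."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical 30-second video, 10-second windows, 5-second stride.
candidates = sliding_window_candidates(duration=30.0, window=10.0, stride=5.0)

# Hypothetical 4-second target action; every candidate is 10 seconds long,
# so even the best-aligned window overlaps it only loosely.
target = (12.0, 16.0)
best = max(candidates, key=lambda c: temporal_iou(c, target))
best_iou = temporal_iou(best, target)
```

With these assumed values the best window is (10.0, 20.0), which fully contains the target yet reaches an IoU of only 0.4, because the candidate length cannot adapt to the target length.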


Examples


Embodiment 1

[0062] This embodiment provides a method for locating time-series actions in cross-modal videos based on a time-spatial diagram. As shown in Figure 1, the method includes the following steps:

[0063] Step (1): Receive video data and natural language query information;

[0064] Step (2): Determine the natural language query feature representation based on the natural language query information;

[0065] Step (3): Based on the video data, determine the feature representation of the candidate video sequence action segment;

[0066] Step (4): Based on the feature representation of the candidate video sequence action segment and the natural language query feature representation, predict the timing offset of the candidate video sequence action segment and the relevance between the candidate video sequence action segment and the natural language query information;

[0067] Step (5): By performing offset correction on the candidate video timing ac...
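The selection and correction described in the abstract (take the candidate with the highest correlation score and apply its predicted offset) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` type, the `localize` function, the additive form of the offsets, and all numeric values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: float         # candidate start time, in seconds
    end: float           # candidate end time, in seconds
    start_offset: float  # predicted start correction (assumed additive)
    end_offset: float    # predicted end correction (assumed additive)
    relevance: float     # predicted relevance to the language query


def localize(candidates):
    """Pick the most relevant candidate and apply its predicted offsets."""
    if not candidates:
        raise ValueError("at least one candidate segment is required")
    best = max(candidates, key=lambda c: c.relevance)
    return (best.start + best.start_offset, best.end + best.end_offset)


# Hypothetical predictions for three candidate segments.
predicted = [
    Candidate(0.0, 10.0, 0.5, -1.0, 0.2),
    Candidate(10.0, 20.0, 2.0, -4.0, 0.9),
    Candidate(20.0, 30.0, 0.0, 0.0, 0.4),
]
result = localize(predicted)
```

Here the second candidate has the highest relevance, so its boundaries are shifted from (10.0, 20.0) to (12.0, 16.0). How the offsets and scores are actually predicted is not shown in the available text, so this sketch takes them as given inputs.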

Embodiment 2

[0118] This embodiment provides a temporal-spatial diagram-based time-series action location system for cross-modal videos.

[0119] A time-series action localization system for cross-modal videos based on time-spatial graphs, including:

[0120] A data collection module configured to receive video data and natural language query information;

[0121] A language feature representation determining module configured to determine a natural language query feature representation based on natural language query information;

[0122] The video feature representation determination module is configured to determine the feature representation of the candidate video timing action segment based on the video data;

[0123] The feature representation analysis module is configured to predict the timing offset of the candidate video sequence action segment and the difference between the candidate video sequence action segment and the natural language query feature representation based on the...
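One way to picture how the four modules listed above hand data to each other is the sketch below. It is only a structural illustration under stated assumptions: the `LocalizationSystem` class, its field names, the `Feature` and `Prediction` types, and the stub functions are hypothetical, and any module described after the truncated text is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Feature = Sequence[float]         # assumed feature type
Prediction = Tuple[float, float]  # assumed (timing offset, score) pair


@dataclass
class LocalizationSystem:
    # Each field stands in for one module from the list above.
    collect: Callable[[Any], Tuple[Any, str]]
    encode_query: Callable[[str], Feature]
    encode_segments: Callable[[Any], List[Feature]]
    analyze: Callable[[List[Feature], Feature], List[Prediction]]

    def run(self, request: Any) -> List[Prediction]:
        """Pass data through the modules in the order they are listed."""
        video, query = self.collect(request)
        query_feature = self.encode_query(query)
        segment_features = self.encode_segments(video)
        return self.analyze(segment_features, query_feature)


# Toy stand-ins so the wiring can be exercised; real modules would be models.
system = LocalizationSystem(
    collect=lambda req: (req["video"], req["query"]),
    encode_query=lambda q: [float(len(q.split()))],
    encode_segments=lambda v: [[float(x)] for x in v],
    analyze=lambda segs, q: [(0.0, s[0] * q[0]) for s in segs],
)
predictions = system.run({"video": [1, 2], "query": "a person opens a door"})
```

Keeping each module behind a plain callable means any one of them could be swapped (for example, a different video encoder) without touching the others; that is a design choice of this sketch, not something the source states.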


Abstract

The invention belongs to the technical field of data representation, and provides a cross-modal video time sequence action positioning method and system based on a time sequence-space diagram, and the method comprises the steps: receiving video data and natural language query information; determining a natural language query feature representation based on the natural language query information; determining candidate video time sequence action segment feature representations based on the video data; on the basis of the candidate video time sequence action segment feature representations and the natural language query feature representation, predicting the time sequence offset of the candidate video time sequence action segments and the correlation between the candidate video time sequence action segments and natural language query information; and performing offset correction on the candidate video time sequence action segment with the highest correlation score to obtain a final target video time sequence action segment positioning result. According to the method and the system, the time sequence offset and the correlation score of the corresponding candidate video time sequence action segment are predicted through the two feature representations, and the accuracy of video time sequence action positioning is greatly improved.

Description

Technical field

[0001] The invention belongs to the technical field of data representation, and in particular relates to a time-series action positioning method and system for cross-modal video based on a time-spatial diagram.

Background

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] The rapid development of Internet technology and the increasing popularity of image acquisition equipment have led to an exponential increase in the number of videos, and video has become a mainstream media form in today's society. Faced with such large-scale video data, video temporal action localization has become a hot research issue in the field of video analysis. It aims to locate the start and end moments of all actions in a given video and, at the same time, to predict the class of each action. However, current video temporal action localization methods can only detect and ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06V20/40, H04N13/327, H04N13/161

CPC: H04N13/327, H04N13/161

Inventor: 刘萌, 齐孟津, 田传发, 周迪, 郭杰, 马玉玲, 刘新锋

Owner: SHANDONG JIANZHU UNIV
