Method and system for cross-mode-based video time location, and storage medium

A localization method using cross-modal technology, applied in character and pattern recognition, instrumentation, computing, etc. It addresses the problem that the correlation between words and time segments has not been fully studied.

Active Publication Date: 2018-12-04
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

The correlation between words and time segments in textual queries has not been fully studied.


Embodiment Construction

[0084] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0085] It should be noted that the terminology used here is only for describing specific implementations, and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0086] As shown in Figure 1, as the first embodiment of the present invention, a video moment positioning m...

Abstract

The invention discloses a cross-modal video moment localization method, system, and storage medium, applied to the problem of locating a particular time segment in a video. The method comprises the following steps: establishing a language temporal model to extract the text information that is useful for moment localization and to extract features; using a multimodal fusion model to fuse text and visual features, generating enhanced moment representation features; using a multi-layer perceptron model to predict the matching degree between a moment and the text description, as well as the start time of the time segment; and training the model end to end on the training data. The method and system achieve higher accuracy than existing models on text-query-based moment localization.
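The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dot-product attention over words, the concatenation-plus-product fusion, the single hidden layer, the random weights, and every function and parameter name below are assumptions made for illustration only. The abstract states only that text features are extracted, fused with visual features, and passed to a multi-layer perceptron that predicts a matching degree and a start time.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_words(word_vecs, moment_vec):
    """Assumed language step: weight each word by its similarity to the moment."""
    weights = softmax([dot(w, moment_vec) for w in word_vecs])
    dim = len(word_vecs[0])
    text_vec = [sum(wt * w[i] for wt, w in zip(weights, word_vecs))
                for i in range(dim)]
    return text_vec, weights


def fuse(text_vec, moment_vec):
    """Assumed fusion: both vectors concatenated with their element-wise product."""
    return text_vec + moment_vec + [t * v for t, v in zip(text_vec, moment_vec)]


def linear(x, weights, bias):
    return [dot(row, x) + b for row, b in zip(weights, bias)]


def init_params(in_dim, hidden_dim, out_dim, rng):
    """Random, untrained weights; a real system would learn these end to end."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
            "w2": mat(out_dim, hidden_dim), "b2": [0.0] * out_dim}


def mlp(x, params):
    """One hidden ReLU layer followed by a linear output layer."""
    hidden = [max(0.0, h) for h in linear(x, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])


def score_moment(word_vecs, moment_vec, params):
    """Return the predicted matching degree and start time for one candidate moment."""
    text_vec, weights = attend_words(word_vecs, moment_vec)
    match_score, start_time = mlp(fuse(text_vec, moment_vec), params)
    return {"match_score": match_score, "start_time": start_time,
            "word_weights": weights}
```

With word and moment vectors of dimension d, the fused vector has dimension 3d, so `init_params(3 * d, hidden_dim, 2, rng)` produces weights of a compatible shape. Scoring every candidate moment with `score_moment` and keeping the one with the highest `match_score` would be one plausible way to use such a model at query time.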

Description

Technical field

[0001] The invention relates to a cross-modal video moment localization method, system, and storage medium.

Background technique

[0002] Much progress has been made in video retrieval, i.e., retrieving videos from a set of candidate videos to match a given language query. However, moment retrieval, i.e., finding a specific segment (a moment) in a video given a natural language description, remains largely unexplored. This task, also known as temporal moment localization, is gaining increasing attention in the field of computer vision. In particular, given a video and a text query such as "kid dancing next to other people", existing solutions usually use temporal bounding boxes (i.e., start and end time points) to locate the moment corresponding to the query.

[0003] In traditional video retrieval tasks, the query text is usually a simple keyword representing the desired action, object, or attribute. In contrast, in the tempora...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06K9/00, G06K9/62
CPC: G06F40/289, G06V20/41, G06V20/46, G06F18/22
Inventor: 刘萌, 聂礼强, 王翔, 宋雪萌, 甘甜, 陈宝权
Owner: SHANDONG UNIV