
Video moment location method, system and storage medium based on cross-modality

A video moment positioning method based on cross-modal technology, applied in video data retrieval, metadata-based video data retrieval, character and pattern recognition, etc., addressing the problem that the correlation between words and temporal segments has not been fully studied.

Active Publication Date: 2019-06-18
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

As can be seen, the correlation between individual words and temporal segments in textual queries has not been fully studied.



Examples


Embodiment Construction

[0084] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation to the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0085] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular forms are intended to include the plural forms. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

[0086] As shown in Figure 1, in a first embodiment of the present invention, a video moment positioning meth...



Abstract

The invention discloses a cross-modal video moment localization method, system and storage medium, applied to the problem of locating a specific temporal segment in a video. The method comprises the following steps: constructing a language temporal model to extract, from the text information, features beneficial to temporal localization; fusing text and visual features with a multi-modal fusion model to generate enhanced moment representation features; using a multi-layer perceptron model to predict the matching degree between a moment and the text description as well as the start and end times of the moment segment; and training the model end-to-end on training data. On the text-query-based moment localization problem, the accuracy of the invention is higher than that of existing models.
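The abstract's pipeline (text encoding, text-visual fusion, MLP prediction head) can be sketched as follows. This is a minimal, hypothetical illustration with random weights and toy dimensions; the patent does not publish its exact architecture, so the mean-pooling text encoder, the elementwise fusion, and all layer sizes here are assumptions, not the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; not taken from the patent).
D_TEXT, D_VIS, D_FUSE, N_CLIPS = 300, 500, 256, 8

def encode_text(word_vecs: np.ndarray) -> np.ndarray:
    """Stand-in for the language temporal model: mean-pool word vectors."""
    return word_vecs.mean(axis=0)

def fuse(text_feat, clip_feats, W_t, W_v):
    """Multi-modal fusion: project both modalities, combine elementwise."""
    t = np.tanh(W_t @ text_feat)       # (D_FUSE,) text projection
    v = np.tanh(clip_feats @ W_v.T)    # (N_CLIPS, D_FUSE) visual projection
    return v * t                       # enhanced moment representations

def mlp_head(fused, W1, W2):
    """Two-layer perceptron: per-clip matching score + (start, end) offsets."""
    h = np.maximum(fused @ W1.T, 0.0)            # ReLU hidden layer
    out = h @ W2.T                               # (N_CLIPS, 3)
    scores = 1.0 / (1.0 + np.exp(-out[:, 0]))    # matching degree in (0, 1)
    bounds = out[:, 1:]                          # start/end regression targets
    return scores, bounds

# Random weights and inputs stand in for a trained model and real features.
W_t = rng.normal(scale=0.05, size=(D_FUSE, D_TEXT))
W_v = rng.normal(scale=0.05, size=(D_FUSE, D_VIS))
W1 = rng.normal(scale=0.05, size=(128, D_FUSE))
W2 = rng.normal(scale=0.05, size=(3, 128))

words = rng.normal(size=(6, D_TEXT))       # embeddings for a 6-word query
clips = rng.normal(size=(N_CLIPS, D_VIS))  # features for 8 candidate clips

scores, bounds = mlp_head(fuse(encode_text(words), clips, W_t, W_v), W1, W2)
best = int(np.argmax(scores))              # clip best matching the query
```

In an end-to-end trained version, `scores` would feed a matching loss and `bounds` a boundary-regression loss, as the abstract describes.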

Description

Technical field

[0001] The invention relates to a video moment positioning method, system and storage medium based on cross-modality.

Background technique

[0002] Much progress has been made in video retrieval, i.e., retrieving videos from a set of candidate videos to match a given linguistic query. However, moment retrieval, i.e., finding specific segments (moments) in a video given a natural language description, remains largely unexplored. This task, also known as temporal moment localization, is gaining increasing attention in the field of computer vision. In particular, given a video and a text query such as "kid dancing next to other people", existing solutions usually use temporal bounding boxes (i.e., start and end time points) to locate the moment segment corresponding to the query.

[0003] In traditional video retrieval tasks, the query text is usually a simple keyword representing the desired action, object or attribute. In contrast, in the tempora...
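A temporal bounding box, as mentioned in paragraph [0002], is just a (start, end) pair in seconds, and localization quality is commonly measured by temporal intersection-over-union. The snippet below is a small illustrative helper, not part of the patent's claimed method:

```python
def temporal_iou(pred, gt):
    """IoU between two temporal bounding boxes given as (start, end) seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# 4 s overlap, 8 s union: IoU = 0.5
print(temporal_iou((4.0, 10.0), (6.0, 12.0)))
```

A predicted moment is typically counted as correct when its IoU with the ground-truth segment exceeds a threshold such as 0.5 or 0.7.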

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F16/783, G06F16/78, G06F17/27, G06K9/00, G06K9/62
CPC: G06F40/289, G06V20/41, G06V20/46, G06F18/22
Inventor 刘萌聂礼强王翔宋雪萌甘甜陈宝权
Owner SHANDONG UNIV