
Video moment location method, system and storage medium based on cross-modality

A video moment positioning method based on cross-modal technology, applied in video data retrieval, metadata-based video data retrieval, character and pattern recognition, etc., addressing the problem that the correlation between words and temporal segments has not been fully studied.

Active Publication Date: 2019-06-18
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

As can be seen, the correlation between individual words and temporal segments in textual queries has not been fully studied.



Examples


Embodiment Construction

[0084] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation to the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0085] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular forms are intended to include the plural forms. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

[0086] As shown in Figure 1, in a first embodiment of the present invention, a video moment positioning meth...



Abstract

The invention discloses a cross-modal video moment localization method, system and storage medium, applied to the problem of locating a specific temporal segment in a video. The method comprises the following steps: constructing a language temporal model to extract, from the text information, features beneficial to temporal localization; fusing text and visual features with a multi-modal fusion model to generate enhanced moment representation features; using a multi-layer perceptron model to predict the matching degree between a moment and the text description as well as the start and end times of the moment segment; and training the model end-to-end on training data. On the text-query-based moment localization problem, the accuracy of the invention is higher than that of existing models.
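The abstract's pipeline (text encoding, text-visual fusion, MLP prediction head) can be sketched as follows. This is a minimal, hypothetical illustration with random weights and toy dimensions; the patent does not publish its exact architecture, so the mean-pooling text encoder, the elementwise fusion, and all layer sizes here are assumptions, not the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; not taken from the patent).
D_TEXT, D_VIS, D_FUSE, N_CLIPS = 300, 500, 256, 8

def encode_text(word_vecs: np.ndarray) -> np.ndarray:
    """Stand-in for the language temporal model: mean-pool word vectors."""
    return word_vecs.mean(axis=0)

def fuse(text_feat, clip_feats, W_t, W_v):
    """Multi-modal fusion: project both modalities, combine elementwise."""
    t = np.tanh(W_t @ text_feat)       # (D_FUSE,) text projection
    v = np.tanh(clip_feats @ W_v.T)    # (N_CLIPS, D_FUSE) visual projection
    return v * t                       # enhanced moment representations

def mlp_head(fused, W1, W2):
    """Two-layer perceptron: per-clip matching score + (start, end) offsets."""
    h = np.maximum(fused @ W1.T, 0.0)            # ReLU hidden layer
    out = h @ W2.T                               # (N_CLIPS, 3)
    scores = 1.0 / (1.0 + np.exp(-out[:, 0]))    # matching degree in (0, 1)
    bounds = out[:, 1:]                          # start/end regression targets
    return scores, bounds

# Random weights and inputs stand in for a trained model and real features.
W_t = rng.normal(scale=0.05, size=(D_FUSE, D_TEXT))
W_v = rng.normal(scale=0.05, size=(D_FUSE, D_VIS))
W1 = rng.normal(scale=0.05, size=(128, D_FUSE))
W2 = rng.normal(scale=0.05, size=(3, 128))

words = rng.normal(size=(6, D_TEXT))       # embeddings for a 6-word query
clips = rng.normal(size=(N_CLIPS, D_VIS))  # features for 8 candidate clips

scores, bounds = mlp_head(fuse(encode_text(words), clips, W_t, W_v), W1, W2)
best = int(np.argmax(scores))              # clip best matching the query
```

In an end-to-end trained version, `scores` would feed a matching loss and `bounds` a boundary-regression loss, as the abstract describes.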

Description

Technical field

[0001] The invention relates to a video moment positioning method, system and storage medium based on cross-modality.

Background technique

[0002] Much progress has been made in video retrieval, i.e., retrieving videos from a set of candidate videos to match a given linguistic query. However, moment retrieval, i.e., finding specific segments (moments) in a video given a natural language description, remains largely unexplored. This task, also known as temporal moment localization, is gaining increasing attention in the field of computer vision. In particular, given a video and a text query such as "kid dancing next to other people", existing solutions usually use temporal bounding boxes (i.e., start and end time points) to locate the moment segment corresponding to the query.

[0003] In traditional video retrieval tasks, the query text is usually a simple keyword representing the desired action, object or attribute. In contrast, in the tempora...
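A temporal bounding box, as mentioned in paragraph [0002], is just a (start, end) pair in seconds, and localization quality is commonly measured by temporal intersection-over-union. The snippet below is a small illustrative helper, not part of the patent's claimed method:

```python
def temporal_iou(pred, gt):
    """IoU between two temporal bounding boxes given as (start, end) seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# 4 s overlap, 8 s union: IoU = 0.5
print(temporal_iou((4.0, 10.0), (6.0, 12.0)))
```

A predicted moment is typically counted as correct when its IoU with the ground-truth segment exceeds a threshold such as 0.5 or 0.7.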

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F16/783, G06F16/78, G06F17/27, G06K9/00, G06K9/62
CPC: G06F40/289, G06V20/41, G06V20/46, G06F18/22
Inventor 刘萌聂礼强王翔宋雪萌甘甜陈宝权
Owner SHANDONG UNIV