
Method and system for cross-mode-based video time location, and storage medium

A cross-modal video moment localization method, applied in character and pattern recognition, instrumentation, and computing, addressing the problem that the correlation between the words of a textual query and the temporal segments of a video has not been fully studied.

Active Publication Date: 2018-12-04
SHANDONG UNIV
Cites: 5 | Cited by: 35

AI Technical Summary

Problems solved by technology

The correlation between the words in a textual query and the temporal segments of a video has not been fully studied.




Embodiment Construction

[0084] It should be pointed out that the following detailed description is exemplary and is intended only to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0085] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural; it should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

[0086] As shown in Figure 1, in the first embodiment of the present invention, a video moment positioning m...



Abstract

The invention discloses a cross-modality-based video moment localization method, system, and storage medium, applied to the problem of locating a specific temporal segment within a video. The method comprises the following steps: establishing a language timing model to extract the textual information that benefits temporal localization and to extract its features; fusing textual and visual features with a multimodal fusion model to generate enhanced moment representation features; using a multi-layer perceptron model to predict the matching degree between a moment and the text description, as well as the start time of the temporal segment; and training the model end to end on the data. On the text-query-based moment localization problem, the method and system achieve higher accuracy than existing models.
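The abstract names three cooperating components: a language timing model, a multimodal fusion model, and a multi-layer perceptron predicting a matching degree and a start time. The following is a minimal sketch of how such a pipeline could be wired together, assuming a PyTorch-style implementation; all module names, layer sizes, and the attention-based word weighting are illustrative assumptions, not details disclosed by the patent.

import torch
import torch.nn as nn

class MomentLocalizer(nn.Module):
    """Hypothetical sketch of the abstract's three-stage pipeline."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, visual_dim=500):
        super().__init__()
        # "Language timing model": encodes the query and weights the words
        # that matter for temporal localization (the attention is an assumption).
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.word_attn = nn.Linear(hidden_dim, 1)
        # "Multimodal fusion model": fuses text and visual moment features
        # into an enhanced moment representation.
        self.fuse = nn.Linear(hidden_dim + visual_dim, hidden_dim)
        # "Multi-layer perception model": predicts the query-moment matching
        # degree and a start-time value for the candidate segment.
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # [matching score, start time]
        )

    def forward(self, query_tokens, moment_feats):
        # query_tokens: (B, T) word ids; moment_feats: (B, visual_dim)
        h, _ = self.text_rnn(self.embed(query_tokens))   # (B, T, H)
        attn = torch.softmax(self.word_attn(h), dim=1)   # (B, T, 1)
        text_vec = (attn * h).sum(dim=1)                 # (B, H)
        fused = torch.relu(
            self.fuse(torch.cat([text_vec, moment_feats], dim=-1))
        )
        out = self.mlp(fused)                            # (B, 2)
        return out[:, 0], out[:, 1]  # matching score, start time

Since the abstract says the model is trained end to end, all three stages would share one loss (e.g., a matching loss plus a start-time regression term) and be optimized jointly; the exact losses are not given in the text shown here.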

Description

Technical field

[0001] The invention relates to a cross-modality-based video moment localization method, system, and storage medium.

Background technique

[0002] Much progress has been made in video retrieval, i.e., retrieving videos from a set of candidates that match a given linguistic query. However, moment retrieval, i.e., finding a specific segment (a moment) within a video given a natural language description, remains largely unexplored. This task, also known as temporal moment localization, is gaining increasing attention in the field of computer vision. In particular, given a video and a text query such as "kid dancing next to other people", existing solutions usually use temporal bounding boxes (i.e., start and end time points) to locate the moment segment corresponding to the query.

[0003] In traditional video retrieval tasks, the query text is usually a simple keyword representing the desired action, object, or attribute. In contrast, in the tempora...
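For context on the temporal-bounding-box formulation mentioned above, the sketch below shows a common sliding-window way of enumerating candidate (start, end) segments that a scoring model can then rank against the query. The window lengths and stride are illustrative assumptions; the patent text shown here does not specify its own candidate generator.

def candidate_moments(video_duration, window_lengths=(5.0, 10.0, 20.0), stride=2.5):
    """Yield candidate temporal bounding boxes (start, end) in seconds."""
    for length in window_lengths:
        start = 0.0
        while start + length <= video_duration:
            yield (start, start + length)
            start += stride

# Example: a 30 s clip yields windows such as (0, 5), (2.5, 7.5), ...
# Each candidate is scored against the query (e.g., "kid dancing next to
# other people") and the best-scoring (start, end) pair is returned.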

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30; G06F17/27; G06K9/00; G06K9/62
CPC: G06F40/289; G06V20/41; G06V20/46; G06F18/22
Inventors: 刘萌, 聂礼强, 王翔, 宋雪萌, 甘甜, 陈宝权
Owner: SHANDONG UNIV