Language description guided video timing sequence positioning method

A language-description-guided localization technique, applied in the field of computer vision, that addresses problems such as low efficiency, inflexible bounding boxes, extra space consumption, and deviation from human perception mechanisms.

Active Publication Date: 2020-06-12

SUN YAT SEN UNIV

4 Cites, 7 Cited by


AI Technical Summary

Problems solved by technology

Existing methods rely on matching and ranking externally generated sliding windows, which is inefficient, produces inflexible bounding boxes, and consumes additional space. This approach also deviates from the human perception mechanism.
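The sliding-window matching and ranking criticized here can be sketched roughly as follows. This is a minimal illustration, not the method of any specific prior work: the function names `sliding_windows` and `rank_windows`, the window lengths, the stride ratio, and the scoring function are all assumptions introduced for the example.

```python
from typing import Callable, List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds


def sliding_windows(duration: float, lengths: List[float],
                    stride_ratio: float = 0.5) -> List[Window]:
    """Enumerate fixed-length candidate windows over a video (hypothetical helper)."""
    windows: List[Window] = []
    for length in lengths:
        stride = length * stride_ratio
        start = 0.0
        while start + length <= duration:
            windows.append((start, start + length))
            start += stride
    return windows


def rank_windows(windows: List[Window],
                 score: Callable[[Window], float]) -> List[Window]:
    """Rank every candidate by a matching score, best first."""
    return sorted(windows, key=score, reverse=True)


if __name__ == "__main__":
    candidates = sliding_windows(duration=10.0, lengths=[4.0, 8.0])
    # Toy score standing in for a language-video matching model:
    # closeness to a made-up target segment (2.5 s, 6.5 s).
    best = rank_windows(candidates,
                        lambda w: -(abs(w[0] - 2.5) + abs(w[1] - 6.5)))[0]
    print(len(candidates), best)
```

Every candidate has to be stored and scored, and the best window can only be as precise as the predefined lengths and strides allow, which illustrates the efficiency, space, and flexibility concerns stated above.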


Embodiment Construction

[0039] The implementation of the present invention is described below through specific examples and in conjunction with the accompanying drawings, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific examples, and various modifications and changes can be made to the details in this specification based on different viewpoints and applications without departing from the spirit of the present invention.

[0040] Figure 1 is a flow chart of the steps of the language-description-guided video temporal localization method of the present invention, and Figure 2 is a flow chart of language-description-guided video temporal localization in a specific embodiment of the present invention. As shown in Figures 1 and 2, a language-description-guided video temporal localization method of the pres...


Abstract

The invention discloses a language-description-guided video temporal localization method comprising the following steps: S1, a multi-modal feature encoding network extracts cross-modal features to learn the cross-modal information of the video and the natural language, yielding a multi-modal fused representation of the input language and visual modalities; S2, a hierarchical tree-structured policy decomposes the cross-modal information hierarchically; S3, a progressive reinforcement learning mechanism provides correct credit assignment through two task-oriented rewards and encourages the different policies in the tree structure to promote one another. By simulating the human coarse-to-fine decision process through a tree-structured progressive reinforcement learning framework, the method effectively decomposes complex action policies, reduces the number of search steps while enlarging the search space, and obtains more impressive results in a more reasonable way.
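The coarse-to-fine idea behind a tree-structured strategy (policy) can be illustrated with a small sketch. The abstract does not specify the branches, primitive actions, or score sources, so everything below is a hypothetical stand-in: the `ACTION_TREE` contents, the `tree_step` function, and the hand-written score dictionaries are assumptions, and the scores would come from learned policy networks in a real system.

```python
from typing import Dict, List, Tuple

Boundary = Tuple[float, float]  # (start, end) in seconds

# Hypothetical two-level action tree: each branch (coarse decision) owns a
# list of primitive actions, given as (delta_start, delta_end) in seconds.
ACTION_TREE: Dict[str, List[Tuple[float, float]]] = {
    "coarse_shift": [(-5.0, -5.0), (5.0, 5.0)],
    "fine_adjust": [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)],
}


def argmax(scores: List[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def tree_step(boundary: Boundary,
              root_scores: Dict[str, float],
              leaf_scores: Dict[str, List[float]],
              duration: float) -> Tuple[Boundary, str]:
    """One greedy decision: the root picks a branch, then a leaf picks an action."""
    branches = list(ACTION_TREE)
    branch = branches[argmax([root_scores[b] for b in branches])]
    d_start, d_end = ACTION_TREE[branch][argmax(leaf_scores[branch])]
    # Clamp the moved boundary to the video extent.
    start = min(max(boundary[0] + d_start, 0.0), duration)
    end = min(max(boundary[1] + d_end, 0.0), duration)
    if end <= start:  # reject moves that would produce an empty segment
        return boundary, branch
    return (start, end), branch


if __name__ == "__main__":
    leaf = {"coarse_shift": [0.2, 0.8], "fine_adjust": [0.1, 0.1, 0.1, 0.7]}
    b, branch = tree_step((10.0, 20.0),
                          {"coarse_shift": 0.9, "fine_adjust": 0.1}, leaf, 60.0)
    print(branch, b)
```

Splitting the decision in two keeps each choice small (two branches, then at most four primitives) while the combined action set covers both large shifts and fine adjustments. The rewards used for training are not modeled here.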

Description

Technical field

[0001] The invention relates to the technical field of computer vision, in particular to a language-description-guided video temporal localization method based on a tree structure and progressive reinforcement learning.

Background

[0002] Natural language video temporal localization is an emerging and challenging task in computer vision and video analysis. Its goal is to determine the temporal boundaries of the segment in an untrimmed video that corresponds to a given natural language description. The task is closely related to temporal action localization but is more challenging: 1) it has no predefined list of actions and labels, and the descriptions may be complex; 2) it requires the model to establish the relationship between the language modality and the visual modality, to model multimodal features, and to have a deep understanding of natural la...
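Since the task output is a pair of temporal boundaries, one common way to compare a predicted segment with an annotated one is temporal intersection over union. The visible text does not name a metric, so this is an assumption added for illustration, and `temporal_iou` is a hypothetical helper name.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap length divided by the combined extent of two segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A value of 1.0 means the predicted boundaries coincide with the annotation, and 0.0 means the segments do not overlap.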


Application Information


IPC(8): G06F16/73, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/73, G06N3/08, G06N3/045, G06F18/25, Y02T10/40

Inventor: 李冠彬, 吴捷, 林倞

Owner: SUN YAT SEN UNIV
