Language description guided video timing sequence positioning method

A language-description-guided positioning technology, applied in the field of computer vision, that addresses problems such as low efficiency, inflexible bounding boxes, extra space consumption, and deviation from human perception mechanisms.

Active Publication Date: 2020-06-12
SUN YAT SEN UNIV
Cites: 4. Cited by: 7.

AI Technical Summary

Problems solved by technology

However, this method relies on the matching and ranking of the external sliding window, resulting in inefficiency, inflexib

Method used



Examples


Embodiment Construction

[0039] The implementation of the present invention is described below through specific examples and in conjunction with the accompanying drawings, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific examples, and various modifications and changes can be made to the details in this specification based on different viewpoints and applications without departing from the spirit of the present invention.

[0040] Figure 1 is a flow chart of the steps of a language description guided video timing positioning method of the present invention, and Figure 2 is a flow chart of language description guided video timing positioning in a specific embodiment of the present invention. As shown in Figure 1 and Figure 2, a language description guided video timing positioning method of the pres...


Abstract

The invention discloses a language description guided video timing sequence positioning method comprising the following steps. Step S1: a multi-modal feature encoding network extracts cross-modal features so as to learn cross-modal information of videos and natural language, obtaining a multi-modal fusion representation of the input language and visual modalities. Step S2: a hierarchical tree-structured policy performs hierarchical decomposition on the cross-modal information. Step S3: a progressive reinforcement learning mechanism provides correct credit assignment through two task-oriented rewards and encourages mutual promotion among the different policies in the tree structure. Through this tree-structured progressive reinforcement learning framework, the method simulates the coarse-to-fine decision process of humans, effectively decomposes complex action policies, reduces the number of search steps while enlarging the search space, and obtains more impressive results in a more reasonable way.
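The hierarchical decomposition in step S2 can be pictured as a two-level choice: a root policy first picks a branch of the action tree, then a leaf policy picks a concrete boundary adjustment inside that branch. The following is a minimal sketch under stated assumptions, not the patented implementation: the branch names, the offset magnitudes in `ACTION_TREE`, and the helper names `apply_action` and `tree_step` are all hypothetical, and the policies are passed in as plain callables rather than learned networks.

```python
from typing import Callable, Dict, List, Tuple

# A temporal window as (start, end), both fractions of the video length.
Window = Tuple[float, float]
Delta = Tuple[float, float]

# Hypothetical two-level action tree: branch name -> list of (d_start, d_end)
# offsets. The branches and magnitudes are illustrative assumptions only.
ACTION_TREE: Dict[str, List[Delta]] = {
    "coarse_shift": [(-0.2, -0.2), (0.2, 0.2)],
    "scale": [(-0.1, 0.1), (0.1, -0.1)],
    "fine_adjust": [(-0.05, 0.0), (0.05, 0.0), (0.0, -0.05), (0.0, 0.05)],
}


def apply_action(window: Window, delta: Delta) -> Window:
    """Shift both boundaries by delta, clamped to [0, 1].

    If the result would be empty or inverted, the original window is kept.
    """
    start = min(max(window[0] + delta[0], 0.0), 1.0)
    end = min(max(window[1] + delta[1], 0.0), 1.0)
    if end <= start:
        return window
    return (start, end)


def tree_step(
    window: Window,
    state: object,
    root_policy: Callable[[object], str],
    leaf_policy: Callable[[object, str], int],
) -> Window:
    """One coarse-to-fine decision: choose a branch, then an action in it."""
    branch = root_policy(state)        # coarse decision (which kind of move)
    index = leaf_policy(state, branch) # fine decision (which move exactly)
    return apply_action(window, ACTION_TREE[branch][index])
```

In a learned system, `state` would be derived from the multi-modal fusion representation of step S1, and both policies would be trained with the rewards of step S3; here they are left abstract so that only the tree-structured selection is shown.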

Description

Technical field

[0001] The invention relates to the technical field of computer vision, in particular to a language description guided video timing positioning method based on a tree structure and progressive reinforcement learning.

Background technique

[0002] Natural language video temporal localization is an emerging and challenging task in the field of computer vision and video analysis. Its goal is to determine the temporal boundaries of the segment in an untrimmed video that corresponds to a given natural language description. The task is closely related to action temporal localization but is more challenging: 1) it has no predefined list of actions and labels, and the description may be complex; 2) it requires the model to establish the relationship between the language modality and the visual modality, to model multimodal features, and to have a deep understanding of natural la...
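Since the task outputs a pair of temporal boundaries, a predicted segment is naturally compared with an annotated one by how much the two intervals overlap. The sketch below computes temporal intersection-over-union, a commonly used overlap measure; the text above does not state which measure the invention uses, so the function name `temporal_iou` and its use here are assumptions for illustration only.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap of two time intervals divided by the length of their union."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    # Two zero-length segments have an empty union; treat that as no overlap.
    return intersection / union if union > 0 else 0.0
```

For example, a prediction of 0 s to 10 s against an annotation of 5 s to 15 s shares 5 s out of a 15 s union, giving a value of one third.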

Claims


Application Information

IPC(8): G06F16/73, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/73, G06N3/08, G06N3/045, G06F18/25, Y02T10/40
Inventors: 李冠彬, 吴捷, 林倞
Owner: SUN YAT SEN UNIV