A video positioning method and device, model training method and device

A video positioning technology, applied in video data querying, video data retrieval, metadata-based video data retrieval, etc. It addresses the problems of high computing requirements, time-consuming and labor-intensive annotation, and human subjective bias in the labeling information, and achieves good universality.

Active Publication Date: 2022-02-11
山东力聚机器人科技股份有限公司

AI Technical Summary

Problems solved by technology

[0003] Existing natural-language-based video positioning methods are mainly strongly supervised. They involve multiple independent networks, demand substantial computing resources, and require a large amount of manually provided annotation information for video clips. Producing this annotation is time-consuming, and the annotation carries human subjective bias.



Examples


Embodiment 1

[0076] This embodiment proposes a video positioning method. Figure 1 is a flow chart of the video positioning method provided by this embodiment of the present invention. As shown in Figure 1, the method includes steps S10-S70.

[0077] S10: Use a multi-scale time sliding window to divide the video to be positioned into segments, obtaining a plurality of video clips, where adjacent video clips overlap by a set ratio.

[0078] Optionally, a multi-scale time sliding window is used to divide the video into clips. The window lengths are [64, 128, 256, 512] frames, and adjacent video clips maintain an overlap of 80%. Take the sliding window with a size of 64 frames as an example: the first video clip runs from the 1st frame to the 64th frame, the second video clip runs from frame 12.8 to frame 75.8, and so on. This ensures that the overlap between adjacent video clips reaches 80%. It should be noted that the frame number is rounded up when t...
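The division in S10 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `sliding_window_clips`, the 0-based end-exclusive frame indexing, the integer `overlap_percent` parameter, and the policy of dropping windows that run past the end of the video are all assumptions. The source states only the window lengths, the 80% overlap, and (in a truncated sentence) that fractional frame positions are rounded up.

```python
def ceil_div(a, b):
    """Integer ceiling division; avoids floating-point rounding error."""
    return -(-a // b)


def sliding_window_clips(num_frames, window_sizes=(64, 128, 256, 512), overlap_percent=80):
    """Return (start, end) clip boundaries, 0-based with end exclusive."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in [0, 100)")
    clips = []
    for w in window_sizes:
        k = 0
        while True:
            # Ideal stride is w * (1 - overlap), e.g. 12.8 frames for w = 64.
            # Assumption: a fractional start position is rounded up.
            start = ceil_div(k * w * (100 - overlap_percent), 100)
            end = start + w
            if end > num_frames:
                break  # assumption: windows that overrun the video are dropped
            clips.append((start, end))
            k += 1
    return clips


clips = sliding_window_clips(100)
print(clips)
```

Because start positions are rounded to whole frames, the realized overlap between neighbours is only approximately 80% (51 of 64 frames in this example).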

Embodiment 2

[0137] This embodiment provides a model training method for training the video positioning model used by the video positioning method described in Embodiment 1. Figure 2 is a flow chart of the model training method according to an embodiment of the present invention. As shown in Figure 2, the method includes steps S01-S03.

[0138] S01: Build a training data set that includes multiple video-statement pairs. A video-statement pair in which the video and the query statement match is labeled as a positive sample, and a video-statement pair in which the video and the query statement do not match is labeled as a negative sample.

[0139] Optionally, for a video to be queried, if the supplied natural language query statement does not match the video, the corresponding video-statement pair is considered a negative sample; if the supplied natural language query statement matches the video, the corresponding video-statement pair is considered a positive sample.
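One way to assemble such a data set is sketched below. The source only says that matching and non-matching video-statement pairs receive different labels. Obtaining the non-matching pairs by reusing queries written for other videos, along with the names `build_pairs` and `annotations` and the 1/0 label values, are assumptions made for illustration.

```python
import random


def build_pairs(annotations, negatives_per_video=1, seed=0):
    """Build labeled video-statement pairs.

    annotations: dict mapping a video id to a query statement that matches it.
    Returns a list of (video_id, query, label); label 1 = match, 0 = mismatch.
    """
    rng = random.Random(seed)
    video_ids = sorted(annotations)
    pairs = []
    for vid in video_ids:
        # Matching pair: the video with its own query statement.
        pairs.append((vid, annotations[vid], 1))
        # Assumption: mismatched pairs reuse queries written for other videos.
        others = [v for v in video_ids
                  if v != vid and annotations[v] != annotations[vid]]
        for other in rng.sample(others, min(negatives_per_video, len(others))):
            pairs.append((vid, annotations[other], 0))
    return pairs


annotations = {
    "video_a": "a person opens the door",
    "video_b": "a dog runs across the yard",
    "video_c": "a car stops at the gate",
}
pairs = build_pairs(annotations)
for pair in pairs:
    print(pair)
```

No start or end times appear in these pairs, which is in line with the abstract's statement that the method does not depend on time tags.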

[0140] Optionally, during the training, the positive samples and...

Embodiment 3

[0155] Figure 4 is a structural diagram of a video positioning device provided by an embodiment of the present invention. The device is used to implement the video positioning method provided by the embodiment above and includes: a video division module 410, a feature extraction module 420, an inter-modal attention flow acquisition module 430, a first feature update module 440, an intra-modal attention flow acquisition module 450, a second feature update module 460, and a similarity calculation and positioning module 470.

[0156] The video division module 410 is used to divide the video to be positioned with a multi-scale time sliding window, obtaining a plurality of video clips, where adjacent video clips overlap by a set ratio.

[0157] The feature extraction module 420 is used to perform feature extraction on each video clip and on the query statement, and to decompose the resulting original feature R of each video clip into a key feature R_K, a query feature R_Q, and a value feature R_V. The original feature of each...


Abstract

The invention discloses a video positioning method and device, and a model training method and device. The video positioning method includes: using a multi-scale time sliding window to divide the video to be positioned into video clips; performing feature extraction on each video clip and on each word of the query statement; dynamically acquiring the attention flow between the video modality and the text modality; updating the features of each video clip and of each word based on the attention flow between the two modalities; dynamically acquiring the attention flow within the video modality and the attention flow within the text modality, and updating the features of each video clip and of each word again; calculating the similarity score between each video clip and the query statement; and selecting the video clip with the highest similarity score as the video positioning result. The present invention does not depend on time tags, can mine the interaction information between modalities more deeply, and has better universality.
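The final two steps, scoring each clip against the query and keeping the best one, can be illustrated with a small sketch. The abstract does not say how the similarity score is computed. Cosine similarity between a clip feature vector and a single pooled query vector is used here purely as a stand-in, and `locate_clip`, `cosine_similarity`, and the toy feature values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def locate_clip(clip_features, query_feature, clip_bounds):
    """Score every clip against the query and return the best clip's bounds."""
    scores = [cosine_similarity(f, query_feature) for f in clip_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return clip_bounds[best], scores


# Toy inputs: three candidate clips with 2-dimensional features (made up).
clip_bounds = [(0, 64), (13, 77), (26, 90)]
clip_features = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_feature = [0.5, 1.0]

best_bounds, scores = locate_clip(clip_features, query_feature, clip_bounds)
print(best_bounds, scores)
```

In the described method the features fed to this step would already have been updated by the inter-modal and intra-modal attention flows; the sketch skips those stages entirely.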

Description

Technical field

[0001] Embodiments of the present invention relate to the field of video positioning, and more particularly to a video positioning method and device, and a model training method and device.

Background

[0002] Locating video clips based on natural language is one of the basic problems of computer vision. The goal of this task is, given a natural language text description, to find the start time of the corresponding segment in the video. Unlike video retrieval tasks that search with pictures or short videos, this task uses natural language as the query index, which makes retrieval more convenient and accurate. Natural-language-based video clip positioning is of great significance for security detection work in many areas such as firefighting, criminal investigation, military, and transportation. Using this technology can achieve automated monitoring of the target fragment of t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/73, G06F16/732, G06F16/78
CPC: G06F16/73, G06F16/7343, G06F16/7867
Inventor: 房体品, 滕隽雅, 卢宪凯, 杨光远
Owner: 山东力聚机器人科技股份有限公司