Semantic decoupling-based no-proposal time sequence language positioning method

A localization method using semantic technology, applied in the field of cross-modal content retrieval, intended to improve localization accuracy and user experience

Active Publication Date: 2022-01-14
CHENGDU KOALA URAN TECH CO LTD
9 Cites · 6 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to solve the technical problems in the existing proposal-free localization mechanisms; to this end, the present invention provides a proposal-free temporal language localization method based on semantic decoupling.



Examples


Embodiment 1

[0090] A proposal-free temporal language localization method based on semantic decoupling, comprising the following steps:

[0091] Step 1: Select the training data set;

[0092] Step 2: Load the model parameters of the pre-trained 2D or 3D convolutional neural network, and extract the original video features from the dataset in step 1;
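Step 2 above can be sketched as splitting the untrimmed video into fixed-length clips and encoding each with a pre-trained backbone. The sketch below is a minimal illustration only: the random projection stands in for the pre-trained 2D/3D CNN (e.g. C3D or I3D), and the clip length, stride, and dimensions are assumptions, not the patent's values.

```python
import numpy as np

def extract_clip_features(video_frames, clip_len=16, stride=16, feat_dim=512, backbone=None):
    """Split an untrimmed video into fixed-length clips and encode each clip.

    `backbone` stands in for a pre-trained 2D/3D CNN; here a random
    projection over average-pooled frames is used as a placeholder so the
    sketch runs without model weights.
    """
    n_frames, frame_dim = video_frames.shape
    if backbone is None:
        rng = np.random.default_rng(0)
        w = rng.standard_normal((frame_dim, feat_dim)) / np.sqrt(frame_dim)
        backbone = lambda clip: clip.mean(axis=0) @ w  # average-pool frames, project
    feats = []
    for start in range(0, n_frames - clip_len + 1, stride):
        feats.append(backbone(video_frames[start:start + clip_len]))
    return np.stack(feats)  # (T, feat_dim): one feature vector per clip

# 128 frames, each flattened to a 256-dim frame descriptor (illustrative sizes)
video = np.random.default_rng(1).standard_normal((128, 256))
V = extract_clip_features(video)
print(V.shape)  # (8, 512)
```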

[0093] Step 3: For the dataset in step 1, given a natural language query Q containing a certain number of words, first use GloVe word embeddings to represent each word as a 300-dimensional vector, then feed the sequence through two layers of bidirectional gated recurrent units and combine the resulting forward and backward features to obtain the query text features;
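The text encoder in step 3 can be sketched as follows. Random vectors stand in for the GloVe embeddings (loading real GloVe vectors is assumed in practice), the GRU omits bias terms for brevity, and the hidden size is an assumption; only the structure (two stacked bidirectional layers over 300-d word vectors) follows the step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRU:
    """Minimal single-direction GRU (no biases), for illustration only."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz, self.Wr, self.Wh = (rng.uniform(-s, s, (in_dim, hid_dim)) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.uniform(-s, s, (hid_dim, hid_dim)) for _ in range(3))
        self.hid_dim = hid_dim
    def __call__(self, xs):
        h = np.zeros(self.hid_dim)
        out = []
        for x in xs:  # one word vector per time step
            z = sigmoid(x @ self.Wz + h @ self.Uz)            # update gate
            r = sigmoid(x @ self.Wr + h @ self.Ur)            # reset gate
            h_tilde = np.tanh(x @ self.Wh + (r * h) @ self.Uh)
            h = (1 - z) * h + z * h_tilde
            out.append(h)
        return np.stack(out)

def bigru_layer(xs, fwd, bwd):
    # concatenate forward-in-time and backward-in-time hidden states
    return np.concatenate([fwd(xs), bwd(xs[::-1])[::-1]], axis=1)

rng = np.random.default_rng(0)
n_words, glove_dim, hid = 7, 300, 128
words = rng.standard_normal((n_words, glove_dim))  # stand-in for GloVe vectors

# two stacked bidirectional layers, as in step 3
layer1 = bigru_layer(words, GRU(glove_dim, hid, rng), GRU(glove_dim, hid, rng))
H_q = bigru_layer(layer1, GRU(2 * hid, hid, rng), GRU(2 * hid, hid, rng))
print(H_q.shape)  # (7, 256): one feature per word
```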

[0094] Step 4: Decouple the original video features from step 2 according to their latent semantics into three semantic branches, obtaining three feature streams with different semantics;
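One plausible reading of step 4 is that each semantic branch applies its own learned projection to the shared video features. The sketch below uses a linear projection plus ReLU per branch; this is an assumption standing in for the patent's decoupling formulation, which is not given in this excerpt.

```python
import numpy as np

def semantic_decouple(V, branch_weights):
    """Project shared video features into per-semantic feature streams.

    Each branch is sketched as a learned linear projection followed by
    ReLU (an assumption, not the patented formulation).
    """
    return [np.maximum(V @ W, 0.0) for W in branch_weights]

rng = np.random.default_rng(0)
T, D, d_branch = 8, 512, 256
V = rng.standard_normal((T, D))                          # clip features from step 2
branch_weights = [rng.standard_normal((D, d_branch)) for _ in range(3)]
streams = semantic_decouple(V, branch_weights)           # three feature streams
print(len(streams), streams[0].shape)  # 3 (8, 256)
```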

[0095] Step 5: Perform feature interaction on the three feature streams from step 4 to obtain three different video context features C_S;

Embodiment 2

[0102] On the basis of Embodiment 1, step 5 further comprises the following sub-steps:

[0103] Step 5.1: Perform feature interaction on the three feature streams to obtain three different video context features C_S;

[0104] The video context feature C_S is obtained by the following formula:

[0105]

[0106]

[0107]

[0108] Step 5.2: Convert the word-level text features H_q into a highly discriminative cross-modal representation, merge it with the three different video context features C_S, and obtain three cross-modal contexts;

[0109] Said step 5.2 comprises, in more detail:

[0110] Step 5.21: Given the word-level textual features H_q of a semantic branch and the video context features C_S, quantify each word's contribution to each video context feature C_S, weight the original word-level text features accordingly, and obtain the updated text-modality features;

[0111] Compute the intensity matrix:

[0112] where each entry of the intensity matrix represents the contribution of the jth word aft...
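The intensity-matrix weighting in step 5.21 can be sketched as cross-modal attention from video contexts to words. The scaled dot-product similarity below is an assumption: the patent's exact intensity formula is not reproduced in this excerpt, so this only illustrates the stated idea of weighting word features by their contribution to each video context feature.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def update_text_features(H_q, C_S):
    """Weight word features by their contribution to each video context feature.

    The intensity matrix is sketched as scaled dot-product similarity
    (an assumption standing in for the patent's formula).
    """
    d = H_q.shape[1]
    S = C_S @ H_q.T / np.sqrt(d)   # (T, n_words): entry (i, j) ~ word j vs clip i
    A = softmax(S, axis=1)         # normalise contributions over words
    return A @ H_q                 # (T, d): text features re-weighted per clip

rng = np.random.default_rng(0)
H_q = rng.standard_normal((7, 256))   # word-level text features (step 3)
C_S = rng.standard_normal((8, 256))   # one semantic branch's video context
H_updated = update_text_features(H_q, C_S)
print(H_updated.shape)  # (8, 256)
```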


PUM

No PUM

Abstract

The invention discloses a proposal-free temporal language localization method based on semantic decoupling, relates to the field of cross-modal content retrieval, and solves the technical problems of existing proposal-free localization mechanisms. The method comprises the following steps: decouple the original video and decompose it into multi-level visual feature streams; within each visual feature stream, perform intra-modal and cross-modal context modeling on the basis of a candidate-proposal method, so that the advantages of the candidate-proposal method are retained; adopt a cross-semantic integration mechanism to integrate the multi-modal context features, after information interaction, into fine-grained features; and finally use a proposal-free localization mechanism to directly solve for the start and end positions of the target video clip. The boundary localization mechanism of the proposal-free method is thus utilized while intermediate features assist the content understanding of the video, overcoming the defects of existing proposal-free localization mechanisms.
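The final proposal-free step in the abstract, directly solving the start and end positions, can be sketched as scoring every clip as a candidate boundary and taking the most probable ordered (start, end) pair. The two linear scoring heads below are assumptions standing in for the patent's localization heads.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def locate_boundaries(fused, w_start, w_end):
    """Proposal-free localisation sketch: score every clip as a start/end
    boundary and take the most probable (start, end) pair with start <= end.

    `fused` is the fine-grained fused cross-modal feature sequence; the two
    linear heads are illustrative assumptions.
    """
    p_s = softmax(fused @ w_start)   # start-time distribution over clips
    p_e = softmax(fused @ w_end)     # end-time distribution over clips
    joint = np.triu(np.outer(p_s, p_e))   # joint score, enforcing start <= end
    s, e = np.unravel_index(joint.argmax(), joint.shape)
    return int(s), int(e)

rng = np.random.default_rng(0)
T, D = 8, 256
fused = rng.standard_normal((T, D))
start, end = locate_boundaries(fused, rng.standard_normal(D), rng.standard_normal(D))
print(start, end)
```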

Description

Technical field

[0001] The invention relates to the field of cross-modal content retrieval in multi-modal video understanding, in particular to a semantic decoupling-based proposal-free temporal language localization method.

Background technique

[0002] The development of the Internet in recent years has led to exponential growth of multimedia data, moving many applications a big step forward. Driven by this trend, both academia and industry have raised new demands for multimodal video understanding, which has attracted a large number of researchers in the past decade. Temporal language localization is one of the most challenging tasks in multimodal video understanding. Unlike cross-modal video-text retrieval, which retrieves short trimmed videos, temporal language localization locates precise start and end times in untrimmed videos containing multiple activities, based on a given language query describing a target moment.

[0003] Based on the multimodal research of ...

Claims


Application Information

IPC(8): G06F16/732; G06F16/783; G06F16/9032; G06N3/04; G06N3/08
CPC: G06F16/732; G06F16/783; G06F16/90332; G06N3/08; G06N3/047; G06N3/048; G06N3/045; G06N3/044
Inventor 沈复民, 蒋寻, 徐行, 申恒涛
Owner CHENGDU KOALA URAN TECH CO LTD