
Semantic decoupling-based no-proposal time sequence language positioning method

A localization method in the field of cross-modal content retrieval, aimed at improving accuracy and user experience

Active Publication Date: 2022-01-14
CHENGDU KOALA URAN TECH CO LTD


Problems solved by technology

[0008] The purpose of the present invention is to solve the technical problems of existing proposal-free localization mechanisms. The invention provides a proposal-free temporal language localization method based on semantic decoupling that combines the advantages of candidate-proposal-based schemes with those of the traditional proposal-free localization mechanism while overcoming their respective deficiencies, thereby improving the effect of temporal language localization.



Examples


Embodiment 1

[0090] A proposal-free temporal language localization method based on semantic decoupling, comprising the following steps:

[0091] Step 1: Select the training data set;

[0092] Step 2: Load the model parameters of the pre-trained 2D or 3D convolutional neural network, and extract the original video features from the dataset in step 1;

[0093] Step 3: For the dataset in step 1, given a natural language query Q consisting of n words, first use GloVe word embeddings to represent each word as a 300-dimensional vector, then pass the sequence through two layers of bidirectional gated recurrent units and combine the resulting forward and backward features to obtain the query text features H^q;

[0094] Step 4: Decouple the original video features from step 2 according to their latent semantics to obtain three semantic branches, yielding three feature streams with different semantics;

[0095] Step 5: Perform feature interaction on the three feature streams in step 4 to obtain three different video context features C^S.

Embodiment 2

[0102] On the basis of Embodiment 1, step 5 comprises, in more detail,

[0103] Step 5.1: Perform feature interaction on the three feature streams to obtain three different video context features C^S;

[0104] The video context features C^S are obtained by the following formulas:

[0105]-[0107] (the formulas appear only as images in the source and are not recoverable from this extraction)

[0108] Step 5.2: Convert the word-level text features H^q into a cross-modal specialized representation with strong discriminative power, and merge it with the three different video context features C^S to obtain three cross-modal contexts;

[0109] Step 5.2 comprises, in more detail,

[0110] Step 5.21: Given the word-level text features H^q of a semantic branch and the video context features C^S, quantify the contribution of each word to every video context feature in C^S, and weight the original word-level text features by these contributions to obtain the updated text-modality features;

[0111] Compute the intensity matrix (given in the source only as an image formula):

[0112] where each entry represents the contribution of the j-th word to the...



Abstract

The invention discloses a semantic decoupling-based no-proposal time sequence language positioning method, relates to the field of cross-modal content retrieval, and solves the technical problems of existing proposal-free positioning mechanisms. The method comprises the following steps: decouple an original video into multi-level visual feature streams; in each visual feature stream, perform intra-modal and cross-modal context modeling in the manner of candidate-proposal methods, so that the advantages of those methods are retained; adopt a cross-semantic integration mechanism to integrate the multi-modal context features, after information interaction, into fine-grained features; and finally, use a proposal-free positioning mechanism to directly solve for the starting and ending positions of the target video clip. The boundary-localization mechanism of the proposal-free method is retained while the intermediate features assist the content understanding of the video, thereby overcoming the defects of existing proposal-free positioning mechanisms.

Description

Technical field

[0001] The invention relates to the field of cross-modal content retrieval in multi-modal video understanding, and in particular to a semantic decoupling-based proposal-free temporal language localization method.

Background technique

[0002] The development of the Internet in recent years has led to exponential growth of multimedia data, pushing many applications a big step forward. Driven by this trend, both academia and industry have raised new demands for multimodal video understanding, which has attracted a large number of researchers over the past decade. Temporal language localization is one of the most challenging tasks in multimodal video understanding. Unlike cross-modal video-text retrieval, which retrieves short trimmed videos, temporal language localization locates precise start and end times in untrimmed videos containing multiple activities, based on a given language query describing a target moment.

[0003] Based on the multimodal research of ...


Application Information

IPC(8): G06F16/732; G06F16/783; G06F16/9032; G06N3/04; G06N3/08
CPC: G06F16/732; G06F16/783; G06F16/90332; G06N3/08; G06N3/047; G06N3/048; G06N3/045; G06N3/044
Inventors: 沈复民, 蒋寻, 徐行, 申恒涛
Owner CHENGDU KOALA URAN TECH CO LTD