
Video text similarity measurement method and system

A similarity measurement and video technology, applied in the field of video-text similarity measurement methods and systems. It addresses the problems that the compactness between feature points of the same modality needs strengthening and that processing the multimodal nature of video increases the burden on computer hardware, and achieves the effects of narrowing the semantic gap, relieving memory pressure, and accelerating processing.

Pending Publication Date: 2022-02-25
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

First, the compactness between different feature points within the same modality needs to be strengthened.
Second, there is no effective semantic alignment between the different modal features of a video.
Third, processing the multimodal nature of video increases the burden on computer hardware.



Examples


Embodiment 1

[0055] As shown in attached Figures 1 and 2, this embodiment discloses a video-text similarity measurement method, including:

[0056] Video feature processing network:

[0057] First, in this embodiment, a pre-trained feature extractor extracts the multimodal features of a video to obtain an initial video feature representation. Second, the initial features are fed into a coarse- and fine-grained parallel attention network to obtain intermediate representation features. Finally, the intermediate representation is input to the feature fusion module to obtain the final video multimodal feature representation.
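The patent does not publish the internals of the parallel attention network, so the following is only a minimal sketch of one plausible reading: a coarse-grained branch that re-weights frame features against a global (mean-pooled) context, a fine-grained branch using multi-head self-attention (the patent notes multi-head attention is used to relieve memory pressure, which head-splitting achieves by shrinking each per-head attention map), and a fusion step that concatenates and projects both branches. All shapes, the softmax scoring, and the projection matrix `w` are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coarse_attention(feats):
    """Coarse-grained branch (assumed form): score each frame-level
    feature against the mean-pooled video context, then re-weight."""
    context = feats.mean(axis=0)                                  # (d,)
    scores = softmax(feats @ context / np.sqrt(feats.shape[1]))   # (n,)
    return scores[:, None] * feats                                # (n, d)

def fine_attention(feats, num_heads=4):
    """Fine-grained branch (assumed form): multi-head scaled
    dot-product self-attention; splitting the feature dimension
    across heads keeps each attention map small."""
    n, d = feats.shape
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = feats[:, h * dh:(h + 1) * dh]                 # (n, dh)
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)
        heads.append(attn @ v)
    return np.concatenate(heads, axis=-1)                         # (n, d)

def fuse(coarse, fine, w):
    """Feature fusion (assumed form): concatenate both branches
    and project back to the original dimension."""
    return np.concatenate([coarse, fine], axis=-1) @ w            # (n, d)

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 64))      # 8 frames of 64-dim features
w = rng.standard_normal((128, 64)) * 0.1   # hypothetical fusion weights
video_repr = fuse(coarse_attention(frames), fine_attention(frames), w).mean(axis=0)
print(video_repr.shape)  # (64,)
```

A trained system would learn `w` (and per-branch projections) jointly with the retrieval objective; the mean over frames at the end is just one simple way to get a single video-level vector.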

[0058] Text feature processing network:

[0059] The input text is processed by the pre-trained BERT model to obtain a text feature representation. Gate embedding modules then generate the text feature representation vectors corresponding to the different modal features of the video.
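The patent names gate embedding but gives no formula here, so this is a hedged sketch of a common gating construction: a sigmoid gate selects which dimensions of the shared text feature are relevant to a given video modality, followed by a linear projection into that modality's space. The modality names, the 768/256 dimensions (768 matching a BERT base vector), and both weight matrices are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_embedding(text_feat, w_gate, w_proj):
    """Assumed gate embedding: an element-wise sigmoid gate decides how
    much of each text-feature dimension matters for one video modality,
    then a linear projection maps the gated feature into that modality's
    embedding space."""
    gate = sigmoid(text_feat @ w_gate)     # (d,) element-wise gate
    return (gate * text_feat) @ w_proj     # (d_mod,)

rng = np.random.default_rng(1)
text_feat = rng.standard_normal(768)       # e.g. a BERT sentence vector
modality_vecs = {
    m: gate_embedding(text_feat,
                      rng.standard_normal((768, 768)) * 0.02,   # hypothetical
                      rng.standard_normal((768, 256)) * 0.02)   # hypothetical
    for m in ("appearance", "motion", "audio")                  # assumed modalities
}
print({m: v.shape for m, v in modality_vecs.items()})
```

One gate per modality lets a single text encoding be matched separately against each video modality, which is what the per-modality comparison in the next step requires.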

[0060] Video-text cross-modal retrieval...
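Paragraph [0060] is truncated in this extract, so the scoring rule below is not taken from the patent; it is only a common baseline for this retrieval step: compute a cosine similarity per modality between the video and text representations, sum them into one score, and rank candidate videos for a text query by that score.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small epsilon for numerical safety."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieval_score(video_feats, text_feats):
    """One simple (assumed) fusion of per-modality matches: sum the
    cosine similarities over all shared modalities."""
    return sum(cosine(video_feats[m], text_feats[m]) for m in video_feats)

rng = np.random.default_rng(2)
modalities = ("appearance", "motion")                     # assumed
query_text = {m: rng.standard_normal(256) for m in modalities}
videos = [{m: rng.standard_normal(256) for m in modalities} for _ in range(5)]

# Rank the candidate videos for the text query by descending score.
ranking = sorted(range(len(videos)),
                 key=lambda i: retrieval_score(videos[i], query_text),
                 reverse=True)
print(ranking)
```

In practice the per-modality similarities are often combined with learned weights rather than an unweighted sum; the ranking step itself is the same either way.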

Embodiment 2

[0120] The purpose of this embodiment is to provide a computing device comprising a memory, a processor, and a computer program stored in the memory and operable on the processor; the processor implements the steps of the above method when executing the program.

Embodiment 3

[0122] The purpose of this embodiment is to provide a computer-readable storage medium.

[0123] A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above-mentioned method are performed.



Abstract

The invention provides a video-text similarity measurement method and system. The method comprises: extracting the multimodal features of a video with a pre-trained feature extractor to obtain an initial video feature representation; inputting the initial features into a coarse- and fine-grained parallel attention network to obtain intermediate representation features; inputting the intermediate representation into a feature fusion network to obtain the final video multimodal feature representation; extracting text features from the input text with a pre-trained model to obtain a text feature representation; generating, through gate embedding, text feature representation vectors corresponding to the different modal features of the video; and measuring the similarity between the video multimodal feature representation and the text feature representation. The coarse-grained attention network, the fine-grained attention network, and the feature fusion module are combined, and a multi-head attention network is integrated into the fine-grained attention network to relieve computer memory pressure, thereby accelerating the processing of the video's multimodal features.

Description

Technical field

[0001] The invention belongs to the technical field of video-text cross-modal retrieval, and in particular relates to a video-text similarity measurement method and system.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] In recent years, driven by the rapid development of video analysis and natural language processing, video-text cross-modal research has attracted intense attention. However, there are large semantic differences between the video and text modalities, and narrowing this gap between modalities remains a challenging task. Video-text cross-modal retrieval aims to retrieve a target video (text) given a text (video) query, and to rank the retrieved videos or texts by their retrieval similarity scores. Compared with image-text cross-modal retrieval, video-text cross-modal...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06V10/40; G06V10/74; G06V10/774; G06V10/80; G06V10/82; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045; G06F18/22; G06F18/253; G06F18/214
Inventor: 张化祥, 金明, 刘丽, 朱磊, 孙建德, 聂礼强, 金圣开
Owner: SHANDONG NORMAL UNIV