
Video text similarity measurement method and system

A similarity measurement and video technology, applied in the field of video-text similarity measurement methods and systems. It addresses the problems that the compactness between feature points of the same modality needs strengthening and that processing the multimodal nature of video increases the burden on computer hardware, and achieves the effects of narrowing the semantic gap, relieving memory pressure, and accelerating processing.

Pending Publication Date: 2022-02-25
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

First, the compactness between different feature points within the same modality needs to be strengthened.
Second, there is no effective semantic alignment between the different modal features of a video.
Third, processing the multimodal nature of video increases the burden on computer hardware.



Examples


Embodiment 1

[0055] As shown in attached Figures 1 and 2, this embodiment discloses a video-text similarity measurement method, including:

[0056] Video feature processing network:

[0057] First, in this embodiment, a pre-trained feature extractor extracts the multimodal features of a video to obtain an initial video feature representation. Second, the initial features are fed into a coarse- and fine-grained parallel attention network to obtain intermediate representation features. Finally, the intermediate representation is input to the feature fusion module to obtain the final video multimodal feature representation.
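The patent does not publish the internals of the parallel attention network, so the following is only a minimal sketch of one plausible reading: a coarse-grained branch that re-weights frame features against a global (mean-pooled) context, a fine-grained branch using multi-head self-attention (the patent notes multi-head attention is used to relieve memory pressure, which head-splitting achieves by shrinking each per-head attention map), and a fusion step that concatenates and projects both branches. All shapes, the softmax scoring, and the projection matrix `w` are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coarse_attention(feats):
    """Coarse-grained branch (assumed form): score each frame-level
    feature against the mean-pooled video context, then re-weight."""
    context = feats.mean(axis=0)                                  # (d,)
    scores = softmax(feats @ context / np.sqrt(feats.shape[1]))   # (n,)
    return scores[:, None] * feats                                # (n, d)

def fine_attention(feats, num_heads=4):
    """Fine-grained branch (assumed form): multi-head scaled
    dot-product self-attention; splitting the feature dimension
    across heads keeps each attention map small."""
    n, d = feats.shape
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = feats[:, h * dh:(h + 1) * dh]                 # (n, dh)
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)
        heads.append(attn @ v)
    return np.concatenate(heads, axis=-1)                         # (n, d)

def fuse(coarse, fine, w):
    """Feature fusion (assumed form): concatenate both branches
    and project back to the original dimension."""
    return np.concatenate([coarse, fine], axis=-1) @ w            # (n, d)

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 64))      # 8 frames of 64-dim features
w = rng.standard_normal((128, 64)) * 0.1   # hypothetical fusion weights
video_repr = fuse(coarse_attention(frames), fine_attention(frames), w).mean(axis=0)
print(video_repr.shape)  # (64,)
```

A trained system would learn `w` (and per-branch projections) jointly with the retrieval objective; the mean over frames at the end is just one simple way to get a single video-level vector.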

[0058] Text feature processing network:

[0059] The input text is processed by the pre-trained BERT model to obtain a text feature representation. Gate embedding modules then generate the text feature representation vectors corresponding to the different modal features of the video.
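The patent names gate embedding but gives no formula here, so this is a hedged sketch of a common gating construction: a sigmoid gate selects which dimensions of the shared text feature are relevant to a given video modality, followed by a linear projection into that modality's space. The modality names, the 768/256 dimensions (768 matching a BERT base vector), and both weight matrices are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_embedding(text_feat, w_gate, w_proj):
    """Assumed gate embedding: an element-wise sigmoid gate decides how
    much of each text-feature dimension matters for one video modality,
    then a linear projection maps the gated feature into that modality's
    embedding space."""
    gate = sigmoid(text_feat @ w_gate)     # (d,) element-wise gate
    return (gate * text_feat) @ w_proj     # (d_mod,)

rng = np.random.default_rng(1)
text_feat = rng.standard_normal(768)       # e.g. a BERT sentence vector
modality_vecs = {
    m: gate_embedding(text_feat,
                      rng.standard_normal((768, 768)) * 0.02,   # hypothetical
                      rng.standard_normal((768, 256)) * 0.02)   # hypothetical
    for m in ("appearance", "motion", "audio")                  # assumed modalities
}
print({m: v.shape for m, v in modality_vecs.items()})
```

One gate per modality lets a single text encoding be matched separately against each video modality, which is what the per-modality comparison in the next step requires.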

[0060] Video-text cross-modal retrieval...
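Paragraph [0060] is truncated in this extract, so the scoring rule below is not taken from the patent; it is only a common baseline for this retrieval step: compute a cosine similarity per modality between the video and text representations, sum them into one score, and rank candidate videos for a text query by that score.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small epsilon for numerical safety."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieval_score(video_feats, text_feats):
    """One simple (assumed) fusion of per-modality matches: sum the
    cosine similarities over all shared modalities."""
    return sum(cosine(video_feats[m], text_feats[m]) for m in video_feats)

rng = np.random.default_rng(2)
modalities = ("appearance", "motion")                     # assumed
query_text = {m: rng.standard_normal(256) for m in modalities}
videos = [{m: rng.standard_normal(256) for m in modalities} for _ in range(5)]

# Rank the candidate videos for the text query by descending score.
ranking = sorted(range(len(videos)),
                 key=lambda i: retrieval_score(videos[i], query_text),
                 reverse=True)
print(ranking)
```

In practice the per-modality similarities are often combined with learned weights rather than an unweighted sum; the ranking step itself is the same either way.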

Embodiment 2

[0120] The purpose of this embodiment is to provide a computing device comprising a memory, a processor, and a computer program stored in the memory and operable on the processor; the processor implements the steps of the above method when executing the program.

Embodiment 3

[0122] The purpose of this embodiment is to provide a computer-readable storage medium.

[0123] A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above-mentioned method are performed.



Abstract

The invention provides a video-text similarity measurement method and system. The method comprises: extracting the multimodal features of a video with a pre-trained feature extractor to obtain an initial video feature representation; inputting the initial features into a coarse- and fine-grained parallel attention network to obtain intermediate representation features; inputting the intermediate representation into a feature fusion network to obtain the final video multimodal feature representation; extracting text features from the input text with a pre-trained model to obtain a text feature representation; generating, through gate embedding, text feature representation vectors corresponding to the different modal features of the video; and measuring the similarity between the video multimodal feature representation and the text feature representation. The coarse-grained attention network, the fine-grained attention network, and the feature fusion module are combined, and a multi-head attention network is integrated into the fine-grained attention network to relieve computer memory pressure, thereby accelerating the processing of the video's multimodal features.

Description

Technical field

[0001] The invention belongs to the technical field of video-text cross-modal retrieval, and in particular relates to a video-text similarity measurement method and system.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] In recent years, driven by the rapid development of video analysis and natural language processing, video-text cross-modal research has attracted intense attention. However, there are large semantic differences between the video and text modalities, and narrowing this gap between modalities remains a challenging task. Video-text cross-modal retrieval aims to retrieve a target video (text) given a text (video) query, and to rank the retrieved videos or texts by their retrieval similarity scores. Compared with image-text cross-modal retrieval, video-text cross-modal...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06V10/40; G06V10/74; G06V10/774; G06V10/80; G06V10/82; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045; G06F18/22; G06F18/253; G06F18/214
Inventor: 张化祥, 金明, 刘丽, 朱磊, 孙建德, 聂礼强, 金圣开
Owner: SHANDONG NORMAL UNIV