
Video and subtitle fragment retrieval method based on multi-cross attention

A technique involving video segments and attention, applied in the field of video-language understanding. It addresses the problem that existing methods cannot effectively realize word-level cross-modal deep relational learning, which limits the retrieval performance for video and subtitle segments.

Pending Publication Date: 2022-05-03
CHONGQING UNIV
Cites: 0 · Cited by: 4

AI Technical Summary

Problems solved by technology

For this reason, cross-modal encoding learning has become the main focus of current research. However, current methods mainly perform shallow query relational modeling at the sentence level and cannot effectively achieve finer-grained, word-level cross-modal deep relational learning, which limits the performance of video and subtitle segment retrieval.
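The word-level relational learning described above can be contrasted with sentence-level modeling in a minimal sketch: instead of pooling the query into a single vector, each query word attends separately over the video features. This is an illustrative scaled dot-product cross-attention, not the patent's exact formulation; all shapes and names here are assumptions.

```python
import numpy as np

def cross_attention(Q, K, V):
    """Scaled dot-product cross-attention: each query row attends over the key/value rows."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (n_query, n_key) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the key axis
    return weights @ V                                   # (n_query, d) attended features

# Word-level modeling: every query word gets its own attended video representation,
# rather than one sentence vector attending once.
rng = np.random.default_rng(0)
Eq = rng.standard_normal((5, 16))    # 5 query words, 16-dim features (illustrative sizes)
Ev = rng.standard_normal((20, 16))   # 20 video time steps
q_attended = cross_attention(Eq, Ev, Ev)
print(q_attended.shape)              # one attended vector per query word
```

Sentence-level modeling would collapse `Eq` to `Eq.mean(axis=0, keepdims=True)` before attending, discarding the per-word relations that the finer-grained approach retains.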




Embodiment Construction

[0065] To further illustrate the various embodiments, the present invention provides accompanying drawings. These drawings form part of the disclosure, serve mainly to illustrate the embodiments, and can be read together with the relevant descriptions in the specification to explain the operating principles of the embodiments. With reference to these, those of ordinary skill in the art will understand other possible implementations and the advantages of the present invention. Components in the figures are not drawn to scale, and similar reference symbols generally denote similar components.

[0066] The present invention will now be further described with reference to the accompanying drawings and specific embodiments, as shown in Figures 1-4. The method for retrieving video and subtitle segments based on multiple cross-attention includes the following steps:

[0067] A method for retrieving video and subtitle segmen...



Abstract

The invention discloses a video and subtitle segment retrieval method based on multi-cross attention. The method comprises: obtaining the feature matrix Ev of a given video V; extracting the feature matrix Eq of the query sentence and the feature matrix Es of the subtitles corresponding to V; computing the association between Eq and Ev and the association between Eq and Es, summing the results and converting them into q(v) and q(s); computing the matching degree between each candidate segment and the query sentence; and training the model, then feeding new video data into the trained model to obtain the probabilities Pst and Ped, sorting by the probability values in Pst and Ped, and selecting the N segments with the largest values as the retrieved segments. The method can not only model the relationships among multiple modalities of data but also realize bidirectional attention guidance, thereby ensuring strong video-segment retrieval performance.
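The pipeline in the abstract can be sketched end to end under stated assumptions: query-guided attention over video and subtitles stands in for q(v) and q(s), and two hypothetical projection matrices (`Wst`, `Wed`, not from the patent) produce the start/end probabilities Pst and Ped, from which the top-N (start, end) pairs are ranked. This is an illustrative reconstruction, not the patented architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(Eq, Em):
    """Query-guided cross-attention over one modality's feature matrix Em."""
    return softmax(Eq @ Em.T / np.sqrt(Eq.shape[-1])) @ Em

def rank_segments(Eq, Ev, Es, Wst, Wed, n=3):
    # Sum the query's association with video and with subtitles
    # (a stand-in for combining q(v) and q(s) in the abstract).
    fused = attend(Eq, Ev) + attend(Eq, Es)
    ctx = fused.mean(axis=0)                  # pooled query context vector
    Pst = softmax(Ev @ (Wst @ ctx))           # start-boundary probability per time step
    Ped = softmax(Ev @ (Wed @ ctx))           # end-boundary probability per time step
    T = len(Pst)
    # Score every candidate (start, end) pair with start <= end; keep the n best.
    scored = sorted(((Pst[i] * Ped[j], i, j)
                     for i in range(T) for j in range(i, T)), reverse=True)
    return [(i, j) for _, i, j in scored[:n]]

rng = np.random.default_rng(1)
Eq = rng.standard_normal((6, 16))            # query-word features (illustrative sizes)
Ev = rng.standard_normal((12, 16))           # video time-step features
Es = rng.standard_normal((12, 16))           # subtitle features aligned to the video
Wst, Wed = rng.standard_normal((2, 16, 16))  # hypothetical learned boundary projections
segments = rank_segments(Eq, Ev, Es, Wst, Wed, n=3)
print(segments)                              # n best (start, end) index pairs
```

In a trained model `Wst` and `Wed` would be learned parameters and the features would come from pretrained encoders; here random values simply exercise the ranking logic.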

Description

Technical Field

[0001] The invention relates to the technical field of video-language understanding, and in particular to a video and subtitle segment retrieval method based on multi-cross attention.

Background

[0002] With the popularity of video shooting equipment, the number of videos on the Internet has exploded. For users interested in only a specific part of a video, browsing the entire video is time-consuming and laborious, so video segment retrieval has become an urgent need. Most existing methods focus on matching the sentence query against the video's visual information and make little use of textual information such as the subtitles attached to the video. Segment retrieval based on video and subtitles requires simultaneously understanding the relationships among three modalities (query, video, and subtitles), which is a very challenging task.

[0003] To accomplish the task of video and subtitle segment retrieval, a straightforward approach is to...

Claims


Application Information

Patent Timeline
No application data available
Patent Type & Authority: Application (China)
IPC (IPC8): G06F40/211; G06K9/62; G06N3/04; G06N3/08; G06V20/62
CPC: G06F40/211; G06N3/08; G06N3/044; G06N3/045; G06F18/2415
Inventors: 王洪星, 傅豪, 荆铭, 冯超, 张小洪
Owner: CHONGQING UNIV