Video and subtitle fragment retrieval method based on multi-cross attention
The method applies multi-cross attention to video and subtitle segment retrieval in the field of video-language understanding. It addresses the inability to effectively perform word-level cross-modal deep relational learning, which limits the retrieval performance for video and subtitle segments.
Embodiment Construction
[0065] To further illustrate the various embodiments, the present invention provides accompanying drawings. The drawings are part of the disclosure of the present invention, serve mainly to illustrate the embodiments, and can be read together with the relevant descriptions in the specification to explain the operating principles of the embodiments. With reference to these, those of ordinary skill in the art will understand other possible implementations and the advantages of the present invention. Components in the figures are not drawn to scale, and similar reference symbols are generally used to represent similar components.
[0066] The present invention will now be further described with reference to the accompanying drawings and specific embodiments. As shown in Figures 1-4, the method for retrieving video and subtitle segments based on multi-cross attention includes the following steps:
[0067] A method for retrieving video and subtitle segmen...