Method for performing multi-mode video question answering by using frame-subtitle self-supervision
A multi-modal, subtitle technology, applied in the field of video question answering, can solve the problems of expensive time tags, ignoring the corresponding relationship between frames and subtitles, etc.
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
Embodiment Construction
[0071] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.
[0072] The embodiment of the present invention proposes a method for multimodal video question answering using frame-subtitle self-supervision, refer to figure 1 shown, including the following steps:
[0073] S1: For the input video, question and answer text, and subtitle text, extract the video fra...
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com