Video description text generation method based on multi-modal fusion
A video description text generation technology, applied in the field of image processing, that addresses problems such as unstable semantic direction, failure to reflect the dynamic content and time-domain information of a video, and large divergence in the content of the description text, thereby improving accuracy and robustness.
Embodiment Construction
[0045] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.
[0046] It should be noted that, provided there is no conflict, the embodiments of the present invention and the features of those embodiments may be combined with one another.
[0047] The present invention is further described below with reference to the accompanying drawings and specific embodiments, which are not intended to limit the present invention.
[0048] The present invention includes a method for generating video description text based on mult...
