
Cross-modal image multi-style subtitle generation method and system

A multi-style, cross-modal technology, applied in neural learning methods, character and pattern recognition, biological neural network models, etc.

Active Publication Date: 2020-12-15
QILU UNIV OF TECH
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Another important problem is that existing technology struggles to balance consistency with the objective information of the image against the stylization of the subtitles.

Method used



Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0035] This embodiment provides a cross-modal image multi-style subtitle generation method.

[0036] A cross-modal image multi-style subtitle generation method, including:

[0037] S101: Acquiring an image of subtitles to be generated;

[0038] S102: Input the image for which subtitles are to be generated into the pre-trained multi-style subtitle generation model, and output the multi-style subtitles of the image. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network. The training step includes: first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.
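The two-phase schedule described in S102 can be sketched as follows. This is a minimal, dependency-free illustration; the function and dictionary names are hypothetical and stand in for the real generator/discriminator updates, which the patent does not detail here.

```python
def train_step(model, phase):
    """Hypothetical single training step; records which phase ran.
    In a real GAN setup, generator and discriminator losses would be
    computed and back-propagated here."""
    model["history"].append(phase)
    return model

def train_multi_style_captioner(objective_steps=3, style_steps=2):
    """Sketch of the two-phase training order from the patent:
    phase 1 fits objective image content, phase 2 fits style."""
    model = {"history": []}
    for _ in range(objective_steps):   # phase 1: objective image information
        train_step(model, "objective")
    for _ in range(style_steps):       # phase 2: stylized subtitle generation
        train_step(model, "style")
    return model

model = train_multi_style_captioner()
print(model["history"])  # all "objective" steps precede all "style" steps
```

The key point the sketch captures is ordering: the model is first anchored to objective image content before style is introduced, so stylization does not overwrite content fidelity.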

[0039] The cross-modality of this application refers to the mapping from the image modality to the text modality.
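The image-to-text mapping can be pictured as an encoder/decoder pipeline. The toy functions below are purely illustrative assumptions (a real system would use a CNN encoder and an RNN/Transformer decoder); they only show the shape of the cross-modal flow: image features in, styled text out.

```python
def encode_image(pixels):
    """Toy image encoder: reduces an image (a list of grayscale values
    in [0, 1]) to a single scalar feature. Stands in for a CNN."""
    return sum(pixels) / len(pixels)

def decode_caption(feature, style):
    """Toy decoder: maps an image feature plus a style tag to text.
    Stands in for the sequence decoder of a real captioning model."""
    content = "bright scene" if feature > 0.5 else "dark scene"
    return f"[{style}] {content}"

feature = encode_image([0.9, 0.8, 0.7])
print(decode_caption(feature, "romantic"))  # [romantic] bright scene
```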

[0040] In one or more embodiments, as shown in Figure 1, the generative adversarial network includes:

[0041] S...

Embodiment 2

[0107] This embodiment provides a cross-modal image multi-style subtitle generation system.

[0108] A cross-modal image multi-style subtitle generation system, including:

[0109] An acquisition module configured to: acquire an image of subtitles to be generated;

[0110] A generation module configured to: input the image for which subtitles are to be generated into a pre-trained multi-style subtitle generation model, and output the multi-style subtitles of the image. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network; the training step includes: first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.
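The two modules above mirror steps S101 and S102 of Embodiment 1. A minimal sketch of that module structure, with a hypothetical stand-in for the pre-trained model (all names here are illustrative, not from the patent):

```python
class AcquisitionModule:
    """Mirrors S101: acquire the image for which subtitles are generated."""
    def acquire(self, source):
        return {"image": source}

class GenerationModule:
    """Mirrors S102: run the pre-trained model to produce one subtitle
    per requested style."""
    def __init__(self, model):
        self.model = model

    def generate(self, image, styles):
        return {s: self.model(image["image"], s) for s in styles}

# Toy stand-in for a pre-trained model: echoes the image id with a style tag.
toy_model = lambda img, style: f"[{style}] caption for {img}"

acq = AcquisitionModule()
gen = GenerationModule(toy_model)
captions = gen.generate(acq.acquire("img_001"), ["factual", "humorous"])
print(captions["humorous"])  # [humorous] caption for img_001
```

The separation of acquisition from generation matches the claim structure: the acquisition module owns input handling, while the generation module owns the model call, so either can be replaced independently.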

[0111] It should be noted here that the above acquisition module and generation module correspond to steps S101 to S102 in the first embodiment, and the...

Embodiment 3

[0115] This embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs. The processor is connected to the memory, and the one or more computer programs are stored in the memory; when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so that the electronic device performs the method described in Embodiment 1 above.

[0116] It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, o...


Abstract

The invention discloses a cross-modal image multi-style subtitle generation method and system. The method comprises the following steps: an image for which subtitles are to be generated is acquired; the image is input into a pre-trained multi-style subtitle generation model, and multi-style subtitles for the image are output. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network; the training comprises first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.

Description

Technical field

[0001] The present application relates to the technical field of subtitle generation, and in particular to a method and system for generating cross-modal image multi-style subtitles.

Background technique

[0002] The statements in this section merely provide background related to the present application and do not necessarily constitute prior art.

[0003] The goal of traditional image captioning is to generate subtitles that are highly consistent with the objective information of the image. Compared with traditional image subtitles, stylized image subtitles have a wider range of applications: they not only require the generated subtitles to be consistent with the objective information of the image, but also to carry specific style factors.

[0004] Existing technologies are mainly divided into two types: single-style and multi-style subtitle generation methods. The single-style subtitle generation method is tha...

Claims


Application Information

IPC(8): G06K9/00; G06K9/46; G06N3/04; G06N3/08
CPC: G06N3/049; G06N3/084; G06V20/00; G06V10/40; G06N3/048; G06N3/045
Inventors: 杨振宇 (Yang Zhenyu), 刘侨 (Liu Qiao)
Owner QILU UNIV OF TECH