
Cross-modal image multi-style subtitle generation method and system

A multi-style, cross-modal technology, applied in neural learning methods, character and pattern recognition, biological neural network models, etc.

Active Publication Date: 2020-12-15
QILU UNIV OF TECH
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Another important problem is that existing technology struggles to balance consistency with the objective information of the image against the stylization of the subtitles.

Method used



Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0035] This embodiment provides a cross-modal image multi-style subtitle generation method.

[0036] A cross-modal image multi-style subtitle generation method, including:

[0037] S101: Acquiring an image of subtitles to be generated;

[0038] S102: Input the image for which subtitles are to be generated into the pre-trained multi-style subtitle generation model, and output the multi-style subtitles of the image. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network. The training step includes: first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.
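The two-phase schedule described in S102 can be sketched as follows. This is a minimal, dependency-free illustration; the function and dictionary names are hypothetical and stand in for the real generator/discriminator updates, which the patent does not detail here.

```python
def train_step(model, phase):
    """Hypothetical single training step; records which phase ran.
    In a real GAN setup, generator and discriminator losses would be
    computed and back-propagated here."""
    model["history"].append(phase)
    return model

def train_multi_style_captioner(objective_steps=3, style_steps=2):
    """Sketch of the two-phase training order from the patent:
    phase 1 fits objective image content, phase 2 fits style."""
    model = {"history": []}
    for _ in range(objective_steps):   # phase 1: objective image information
        train_step(model, "objective")
    for _ in range(style_steps):       # phase 2: stylized subtitle generation
        train_step(model, "style")
    return model

model = train_multi_style_captioner()
print(model["history"])  # all "objective" steps precede all "style" steps
```

The key point the sketch captures is ordering: the model is first anchored to objective image content before style is introduced, so stylization does not overwrite content fidelity.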

[0039] The cross-modality of this application refers to the mapping from the image modality to the text modality.
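The image-to-text mapping can be pictured as an encoder/decoder pipeline. The toy functions below are purely illustrative assumptions (a real system would use a CNN encoder and an RNN/Transformer decoder); they only show the shape of the cross-modal flow: image features in, styled text out.

```python
def encode_image(pixels):
    """Toy image encoder: reduces an image (a list of grayscale values
    in [0, 1]) to a single scalar feature. Stands in for a CNN."""
    return sum(pixels) / len(pixels)

def decode_caption(feature, style):
    """Toy decoder: maps an image feature plus a style tag to text.
    Stands in for the sequence decoder of a real captioning model."""
    content = "bright scene" if feature > 0.5 else "dark scene"
    return f"[{style}] {content}"

feature = encode_image([0.9, 0.8, 0.7])
print(decode_caption(feature, "romantic"))  # [romantic] bright scene
```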

[0040] In one or more embodiments, as shown in Figure 1, the generative adversarial network includes:

[0041] S...

Embodiment 2

[0107] This embodiment provides a cross-modal image multi-style subtitle generation system.

[0108] A cross-modal image multi-style subtitle generation system, including:

[0109] An acquisition module configured to: acquire an image of subtitles to be generated;

[0110] A generation module configured to: input the image for which subtitles are to be generated into a pre-trained multi-style subtitle generation model, and output the multi-style subtitles of the image. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network; the training step includes: first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.
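The two modules above mirror steps S101 and S102 of Embodiment 1. A minimal sketch of that module structure, with a hypothetical stand-in for the pre-trained model (all names here are illustrative, not from the patent):

```python
class AcquisitionModule:
    """Mirrors S101: acquire the image for which subtitles are generated."""
    def acquire(self, source):
        return {"image": source}

class GenerationModule:
    """Mirrors S102: run the pre-trained model to produce one subtitle
    per requested style."""
    def __init__(self, model):
        self.model = model

    def generate(self, image, styles):
        return {s: self.model(image["image"], s) for s in styles}

# Toy stand-in for a pre-trained model: echoes the image id with a style tag.
toy_model = lambda img, style: f"[{style}] caption for {img}"

acq = AcquisitionModule()
gen = GenerationModule(toy_model)
captions = gen.generate(acq.acquire("img_001"), ["factual", "humorous"])
print(captions["humorous"])  # [humorous] caption for img_001
```

The separation of acquisition from generation matches the claim structure: the acquisition module owns input handling, while the generation module owns the model call, so either can be replaced independently.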

[0111] It should be noted here that the above acquisition module and generation module correspond to steps S101 to S102 in the first embodiment, and the...

Embodiment 3

[0115] This embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs. The processor is connected to the memory, and the one or more computer programs are stored in the memory; when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so that the electronic device performs the method described in Embodiment 1 above.

[0116] It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, o...


Abstract

The invention discloses a cross-modal image multi-style subtitle generation method and system. The method comprises the following steps: an image for which subtitles are to be generated is acquired; the image is input into a pre-trained multi-style subtitle generation model, and multi-style subtitles for the image are output. The pre-trained multi-style subtitle generation model is obtained by training a generative adversarial network; the training comprises first training the model's ability to express objective image information, and then training its ability to generate stylized subtitles.

Description

Technical field

[0001] The present application relates to the technical field of subtitle generation, and in particular to a method and system for generating cross-modal image multi-style subtitles.

Background technique

[0002] The statements in this section merely provide background related to the present application and do not necessarily constitute prior art.

[0003] The goal of traditional image captioning is to generate subtitles that are highly consistent with the objective information of the image. Compared with traditional image subtitles, stylized image subtitles have a wider range of applications: they not only require the generated subtitles to be consistent with the objective information of the image, but also to carry specific style factors.

[0004] Existing technologies are mainly divided into two types: single-style and multi-style subtitle generation methods. The single-style subtitle generation method is tha...

Claims


Application Information

IPC(8): G06K9/00; G06K9/46; G06N3/04; G06N3/08
CPC: G06N3/049; G06N3/084; G06V20/00; G06V10/40; G06N3/048; G06N3/045
Inventors: 杨振宇 (Yang Zhenyu), 刘侨 (Liu Qiao)
Owner QILU UNIV OF TECH