Bidirectional text image generation method and system based on semantic consistency

A technique for semantically consistent text-to-image generation, applied in 2D image generation, semantic analysis, image data processing, and related fields. It addresses the semantic inconsistency between a generated image and its text, and the neglect of word-level local information.

Pending Publication Date: 2021-09-07

SHANDONG NORMAL UNIV

4 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

In addition, attention mechanisms are widely used in text-to-image generation, but the traditional attention mechanism attends only to the global vector of the whole sentence and ignores word-level local information.

[0005] The above analysis shows that current text-to-image generation focuses on the visual realism of the generated images, while the semantic inconsistency between the generated images and the text remains unresolved.

Examples

Embodiment 1

[0040] As shown in Figure 1, this embodiment provides a bidirectional text-to-image generation method based on semantic consistency. The method is illustrated here as applied to a server. It can be understood that the method can also be applied to a terminal, or to a system that includes a terminal and a server and is realized through interaction between the two. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be a smart phone, a tablet computer, a laptop computer, a desktop compute...

Embodiment approach

[0058] In one or more implementations, the word-level spatial channel attention mechanism includes a word-level attention mechanism and a spatial channel attention mechanism.

[0059] For example, to improve the realism of the details in the generated image, a word attention mechanism (see Figure 5) is used. It has two inputs: the word features w and the visual features f_i. At stage i, the attention mechanism takes the word features w and the visual features f_i as input, where H_i and W_i denote the height and width of the stage-i image, respectively. The perception layer P_i transforms the word features w into the common semantic space, that is, w' = P_i w. The result is then multiplied with the visual features f_i to obtain the attention matrix, which is normalized by the softmax function. The normalized matrix represents the correlation between the i-th channel in the visual features f_i and the ...
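The word attention step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patent's implementation: the source text does not give tensor shapes, so the sketch assumes w is a D x T matrix (T words), f_i is a C x N matrix with N = H_i * W_i flattened positions, and P_i is an N x D matrix. The softmax is assumed to run across words for each channel. The helper names `matmul`, `softmax`, and `word_attention` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(w, f_i, p_i):
    """Return a C x T matrix of channel-to-word attention weights.

    Assumed shapes (not stated in the source): w is D x T, f_i is C x N,
    p_i is N x D.
    """
    w_prime = matmul(p_i, w)        # w' = P_i w, shape N x T
    scores = matmul(f_i, w_prime)   # attention matrix before normalization, C x T
    return [softmax(r) for r in scores]  # normalize across words per channel


# Toy inputs: D=2, T=3 words, C=2 channels, N=4 positions (H_i = W_i = 2).
w = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
p_i = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 1.0],
       [0.0, 0.0]]
f_i = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]

alpha = word_attention(w, f_i, p_i)
for channel, weights in enumerate(alpha):
    print(channel, [round(v, 3) for v in weights])
```

Each row of `alpha` sums to 1, so it can be read as how strongly one channel of f_i responds to each word.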

Embodiment 2

[0091] This embodiment provides a bidirectional text image generation system based on semantic consistency.

[0092] A bidirectional text-to-image generation system based on semantic consistency, comprising:

[0093] An acquisition and encoding module configured to: acquire natural language, input the natural language into a text encoder, and extract word vectors and sentence vectors;

[0094] A text-to-image generation module configured to: input the sentence vector into the image generation network and the word vectors into the word-level spatial attention mechanism module; concatenate the image features generated at each stage of the generation network with the word vectors adjusted by the attention mechanism and use the result as the input of the next stage of the generation network; and, after multiple stages of refinement, generate the final image;

[0095] Here, in the text-to-image generation module, natural language is input into the text encoder, an...
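The data flow of the generation module (concatenate each stage's image features with the attention-adjusted word vectors, then feed the result to the next stage) can be sketched as a simple loop. This is a minimal illustration only: the functions `concat_channels`, `refine`, `average_channels`, and `no_word_context` are hypothetical stand-ins, feature maps are assumed to be stored as lists of channels, and the toy stage and attention functions do not represent the patent's networks.

```python
def concat_channels(image_feats, word_feats):
    """Concatenate two feature maps (lists of channels) along the channel axis."""
    if len(image_feats[0]) != len(word_feats[0]):
        raise ValueError("feature maps must share the same spatial size")
    return image_feats + word_feats


def refine(initial_feats, stages, attend):
    """Feed [image features ; attended word features] into each stage in turn."""
    feats = initial_feats
    for stage in stages:
        feats = stage(concat_channels(feats, attend(feats)))
    return feats


def average_channels(feats):
    """Toy stage (assumption): collapse all channels into one by averaging."""
    return [[sum(col) / len(feats) for col in zip(*feats)]]


def no_word_context(feats):
    """Toy attention (assumption): one all-zero channel of word context."""
    return [[0.0] * len(feats[0])]


result = refine([[1.0, 2.0]], [average_channels, average_channels], no_word_context)
print(result)
```

In a real model each stage would be a learned generator block and `attend` would be the word-level attention module; only the wiring between them is shown here.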

Abstract

The invention provides a bidirectional text-to-image generation method and system based on semantic consistency. The method comprises the following steps: acquiring natural language, inputting it into a text encoder, and extracting word vectors and sentence vectors; inputting the sentence vectors into an image generation network and the word vectors into a word-level spatial attention mechanism module, concatenating the image features generated at each stage of the generation network with the word vectors adjusted by the attention mechanism to serve as the input of the next stage, and generating an image after multi-stage refinement; inputting the generated image into an image encoder and extracting image features; inputting the image features into a long short-term memory network with a sentinel mechanism and outputting a re-described text; and, after the image and the re-described text have been optimized with two adversarial loss functions, introducing a cross-entropy-based semantic text reconstruction loss to further optimize the image until its re-described text is consistent with the natural language, and then outputting the image.
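As an illustration of the cross-entropy-based text reconstruction loss mentioned in the abstract, the sketch below scores a re-described sentence against the original tokens. It is a generic cross-entropy computation under stated assumptions, not the patent's exact formula: the function name `text_reconstruction_loss`, the per-token averaging, and the three-word toy vocabulary are all hypothetical.

```python
import math


def text_reconstruction_loss(pred_probs, target_ids, eps=1e-12):
    """Mean cross-entropy between predicted word distributions and target tokens.

    pred_probs: one probability distribution over the vocabulary per position.
    target_ids: index of the original word at each position.
    """
    if not target_ids or len(pred_probs) != len(target_ids):
        raise ValueError("need one predicted distribution per target token")
    total = 0.0
    for probs, target in zip(pred_probs, target_ids):
        total -= math.log(max(probs[target], eps))  # clamp to avoid log(0)
    return total / len(target_ids)


# Toy vocabulary (assumption): 0 = "a", 1 = "bird", 2 = "red".
target = [0, 2, 1]  # "a red bird"
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3

loss_confident = text_reconstruction_loss(confident, target)
loss_uniform = text_reconstruction_loss(uniform, target)
print(loss_confident < loss_uniform)
```

The loss shrinks as the re-described text assigns more probability to the words of the original description, which is the direction the optimization in the abstract pushes.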

Description

Technical field

[0001] The invention belongs to the technical field of cross-modal image generation, and in particular relates to a bidirectional text-to-image generation method and system based on semantic consistency.

Background

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] Text-to-image generation aims to generate high-resolution, visually realistic images that match a given natural language description. It has broad application prospects in virtual reality, entertainment, e-sports games, and computer-aided design. In recent years, generative adversarial networks (GANs) have made great progress in generating realistic images. Within the GAN framework, many text-to-image methods have been proposed to generate high-quality images. Significant progress has been made in text-to-image genera...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F40/205, G06F40/126, G06F40/30, G06K9/46, G06N3/04, G06T11/00

CPC: G06F40/205, G06F40/126, G06F40/30, G06T11/001, G06N3/044, G06N3/045

Inventor: 刘丽, 崔怀磊, 王泽康, 马跃, 张化祥

Owner: SHANDONG NORMAL UNIV
