Bidirectional text image generation method and system based on semantic consistency

A technology for image generation and consistency, applied in 2D image generation, semantic analysis, image data processing, etc. It addresses problems such as the semantic inconsistency between the generated image and the text, and the neglect of word-level local information.

Pending Publication Date: 2021-09-07
SHANDONG NORMAL UNIV
Cites: 4; Cited by: 9

AI Technical Summary

Problems solved by technology

In addition, the attention mechanism has been widely used in text-to-image generation, but the traditional attention mechanism attends only to the global vector of the whole sentence and ignores word-level local information.
[0005] From the above analysis, it can be concluded that current text-to-image generation focuses on ensuring the visual realism of the generated images, while the semantic inconsistency between the generated images and the text remains unresolved.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.


Examples


Embodiment 1

[0040] As shown in Figure 1, this embodiment provides a bidirectional text-to-image generation method based on semantic consistency. This embodiment takes the application of the method to a server as an example. It can be understood that the method can also be applied to a terminal, or to a system that includes a terminal and a server, in which case it is implemented through the interaction between the terminal and the server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be a smart phone, a tablet computer, a laptop computer, a desktop compute...

Embodiment approach

[0058] In one or more implementations, the word-level spatial channel attention mechanism includes a word-level attention mechanism and a spatial channel attention mechanism.

[0059] For example, to improve the realism of details in the generated image, a word attention mechanism is used (as shown in Figure 5). The word attention mechanism has two inputs: the word features $w$ and the visual features $f_i$. At stage $i$, the attention mechanism takes the word features $w$ and the visual features $f_i$ as input, where $H_i$ and $W_i$ denote the height and width of the image at the $i$-th stage, respectively. The word features $w$ are transformed into the common semantic space by the perception layer $P_i$, that is, $w' = P_i w$. The result is then multiplied with the visual features $f_i$ to obtain the attention matrix, which is normalized with the softmax function. The normalized matrix represents the correlation between the $i$-th channel of the visual features $f_i$ and the ...
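The tensor shapes in [0059] were lost in extraction, so the sketch below is a minimal illustration under stated assumptions, not the patent's implementation. It assumes a ControlGAN-style word-level channel attention: the visual features $f_i$ are a $C \times N$ matrix with $N = H_i W_i$ flattened positions, the word features $w$ are $D \times T$ (one column per word), and the perception layer $P_i$ is a plain linear map of shape $N \times D$, so that $f_i w'$ yields a $C \times T$ matrix whose row $j$ relates channel $j$ to every word. The helper names (`matmul`, `softmax_rows`, `word_channel_attention`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a times columns of b)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def word_channel_attention(f: Matrix, w: Matrix, p: Matrix) -> Tuple[Matrix, Matrix]:
    """
    f: visual features f_i, shape C x N (N = H_i * W_i, assumed flattened)
    w: word features, shape D x T (one column per word)
    p: perception-layer weights P_i, shape N x D (assumed linear, no bias)
    Returns (alpha, attended):
      alpha    C x T, row j is a softmax over the T words for channel j
      attended C x N, word-conditioned features alpha @ w'^T
    """
    w_prime = matmul(p, w)        # w' = P_i w, shape N x T
    scores = matmul(f, w_prime)   # attention matrix, shape C x T
    alpha = softmax_rows(scores)  # normalise over the words for each channel
    attended = matmul(alpha, transpose(w_prime))
    return alpha, attended


# Toy usage: 2 channels, 3 spatial positions, 2-dimensional word features, 2 words.
alpha, attended = word_channel_attention(
    f=[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
    w=[[1.0, 0.0], [0.0, 1.0]],
    p=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
```

Normalizing over words (rather than over channels) is the choice that makes each row of `alpha` read as "which words this channel responds to", matching the channel-to-word correlation described in [0059].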

Embodiment 2

[0091] This embodiment provides a bidirectional text image generation system based on semantic consistency.

[0092] A bidirectional text-to-image generation system based on semantic consistency, comprising:

[0093] An acquisition and encoding module configured to: acquire natural language, input the natural language into a text encoder, and extract word vectors and sentence vectors;

[0094] A text-to-image generation module configured to: input the sentence vector into the image generation network and the word vectors into the word-level spatial attention mechanism module; concatenate the image features generated at each stage of the generation network with the word vectors adjusted by the attention mechanism, and use the result as the input of the next stage of the generation network; and, after multiple stages of refinement, generate the final image;
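The stage-by-stage data flow in [0094] can be sketched as a loop in which each refinement stage receives the previous stage's image features concatenated with attention-adjusted word vectors. This is a minimal sketch: features are flat Python lists, and the stage networks and attention module are passed in as plain callables. The names `generate_multistage`, `initial_stage`, `refine_stages`, and `attend` are hypothetical, and the toy functions in the usage part stand in for real neural networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]
Attend = Callable[[Vector, Sequence[Vector]], Vector]


def generate_multistage(sentence_vec: Vector,
                        word_vecs: Sequence[Vector],
                        initial_stage: Stage,
                        refine_stages: Sequence[Stage],
                        attend: Attend) -> List[Vector]:
    """Run the multi-stage refinement and return every stage's image features."""
    features = initial_stage(sentence_vec)  # the first stage sees only the sentence vector
    history = [features]
    for stage in refine_stages:
        # Word vectors re-weighted by attention over the current image features.
        word_context = attend(features, word_vecs)
        # List "+" concatenates: [image features ; attended word context].
        features = stage(features + word_context)
        history.append(features)
    return history


def mean_words(features: Vector, words: Sequence[Vector]) -> Vector:
    # Toy attention: ignores the features and averages the word vectors.
    return [sum(col) / len(words) for col in zip(*words)]


def add_halves(v: Vector) -> Vector:
    # Toy stage: fold the word-context half back onto the feature half.
    half = len(v) // 2
    return [a + b for a, b in zip(v[:half], v[half:])]


history = generate_multistage([1.0, 2.0], [[1.0, 3.0], [3.0, 5.0]],
                              initial_stage=lambda s: [2 * x for x in s],
                              refine_stages=[add_halves, add_halves],
                              attend=mean_words)
# history == [[2.0, 4.0], [4.0, 8.0], [6.0, 12.0]]
```

Keeping every stage's output in `history` mirrors multi-stage generators in which each stage produces an image at a higher resolution; only the last entry corresponds to the final image.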

[0095] In the text-to-image generation module, the natural language is input into the text encoder, an...
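The excerpt does not specify the text encoder. Text-to-image GANs such as AttnGAN use a bidirectional LSTM whose per-position hidden states form the word features and whose final hidden states form the sentence feature. The sketch below assumes that scheme but swaps the LSTM for a much simpler element-wise tanh RNN with scalar weights, purely to show how word vectors and a sentence vector fall out of one bidirectional pass. The names `rnn_pass` and `encode_text`, the weights `w_x` and `w_h`, and the embedding table are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def rnn_pass(inputs: Sequence[Vector], w_x: float, w_h: float) -> List[Vector]:
    """Toy recurrent pass with scalar (diagonal) weights, applied element-wise:
    h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = [0.0] * len(inputs[0])
    states: List[Vector] = []
    for x in inputs:
        h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def encode_text(tokens: Sequence[str], embeddings: Dict[str, Vector],
                w_x: float = 1.0, w_h: float = 0.5) -> Tuple[List[Vector], Vector]:
    """Return (word_vectors, sentence_vector) from one bidirectional pass."""
    xs = [embeddings[t] for t in tokens]
    forward = rnn_pass(xs, w_x, w_h)
    backward = rnn_pass(xs[::-1], w_x, w_h)[::-1]  # re-align backward states with token order
    # Word vector: forward and backward states at the same position, concatenated.
    words = [f + b for f, b in zip(forward, backward)]
    # Sentence vector: last forward state + last backward state (which sits at position 0).
    sentence = forward[-1] + backward[0]
    return words, sentence


emb = {"a": [0.1, 0.2], "red": [0.9, -0.3], "bird": [0.4, 0.7]}
words, sentence = encode_text(["a", "red", "bird"], emb)
```

Each word vector has twice the embedding width because it carries context from both directions; in the full method, these vectors would feed the word-level attention module, while the sentence vector conditions the first generation stage.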



Abstract

The invention provides a bidirectional text-to-image generation method and system based on semantic consistency. The method comprises the following steps: obtaining a natural-language description, inputting it into a text encoder, and extracting word vectors and a sentence vector; inputting the sentence vector into an image generation network and the word vectors into a word-level spatial attention mechanism module, concatenating the image features generated at each stage of the generation network with the word vectors adjusted by the attention mechanism to serve as the input of the next stage, and, after multi-stage refinement, generating the final image; inputting the generated image into an image encoder and extracting image features; inputting the image features into a long short-term memory (LSTM) network with a sentinel mechanism and outputting a re-description text; and, after optimizing the image and the re-description text with two adversarial loss functions, introducing a cross-entropy-based semantic text reconstruction loss to further optimize the image until the re-description text of the image is consistent with the input natural language, and then outputting the image.
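The abstract names a cross-entropy-based semantic text reconstruction loss but does not give its formula. A common token-level form, assumed here, averages $-\log p_t(y_t)$ over the words $y_t$ of the original sentence, where $p_t$ is the re-description LSTM's predicted vocabulary distribution at step $t$ (teacher forcing is also assumed). The function name `text_reconstruction_loss` and the `eps` guard are hypothetical.

```python
import math
from typing import Sequence


def text_reconstruction_loss(step_probs: Sequence[Sequence[float]],
                             target_ids: Sequence[int],
                             eps: float = 1e-12) -> float:
    """Mean token-level cross-entropy between the re-described text and the input text.

    step_probs[t] is the decoder's probability distribution over the vocabulary
    at step t; target_ids[t] is the vocabulary index of the t-th original word.
    """
    if len(step_probs) != len(target_ids) or not target_ids:
        raise ValueError("need one distribution per target word, and at least one word")
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        total -= math.log(max(probs[target], eps))  # eps guards against log(0)
    return total / len(target_ids)


# A decoder that is uniform over a 4-word vocabulary scores log(4) per word.
uniform_loss = text_reconstruction_loss([[0.25] * 4, [0.25] * 4], [1, 3])
```

The loss approaches zero only when the re-description assigns high probability to every original word, which is the sense in which minimizing it pushes the generated image toward semantic consistency with the input text.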

Description

Technical field
[0001] The invention belongs to the technical field of cross-modal image generation, and in particular relates to a bidirectional text-to-image generation method and system based on semantic consistency.
Background
[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
[0003] Text-to-image generation aims to generate high-resolution, visually realistic images that match a given natural-language description. It has broad application prospects in virtual reality, entertainment, electronic sports games, and computer-aided design. In recent years, generative adversarial networks (GANs) have made great progress in generating realistic images, and many text-to-image methods built on the GAN framework can generate high-quality images. Significant progress has been made in text-to-image genera...

Claims


Application Information

Patent type & authority: Application (China)
IPC(8): G06F40/205; G06F40/126; G06F40/30; G06K9/46; G06N3/04; G06T11/00
CPC: G06F40/205; G06F40/126; G06F40/30; G06T11/001; G06N3/044; G06N3/045
Inventors: 刘丽 (Liu Li), 崔怀磊 (Cui Huailei), 王泽康 (Wang Zekang), 马跃 (Ma Yue), 张化祥 (Zhang Huaxiang)
Owner: SHANDONG NORMAL UNIV