Bidirectional text image generation method and system based on semantic consistency
A bidirectional text-to-image generation technique based on semantic consistency, applicable to 2D image generation, semantic analysis, and image data processing. It addresses semantic inconsistency between the generated image and the text, and the neglect of word-level local information.
Examples
Embodiment 1
[0040] As shown in Figure 1, this embodiment provides a bidirectional text-to-image generation method based on semantic consistency. The embodiment is described with the method applied to a server. It can be understood that the method can also be applied to a terminal, or to a system comprising a terminal and a server, where it is realized through interaction between the terminal and the server. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be a smart phone, a tablet computer, a laptop computer, a desktop compute...
Embodiment approach
[0058] In one or more implementations, the word-level spatial channel attention mechanism includes a word-level attention mechanism and a spatial channel attention mechanism.
[0059] For example, to improve the realism of details in the generated image, a word attention mechanism is used (see Figure 5). The word attention mechanism has two inputs: the word features w and the visual features f_i. At stage i, the attention mechanism takes the word features w and the visual features f_i as input, where H_i and W_i denote the height and width of the stage-i image, respectively. The word features w are mapped into the common semantic space by the perception layer P_i, that is, w' = P_i w. The result is then multiplied by the visual features f_i to obtain the attention matrix, which is normalized with the softmax function. The normalized matrix represents the correlation between the i-th channel in the visual features f_i and the ...
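The word attention step can be illustrated with a small pure-Python sketch. The excerpt does not state the tensor dimensions, so the shapes here are assumptions: w is D x L (embedding size by number of words), f_i is C x (H_i*W_i) (channels by flattened positions), and P_i is (H_i*W_i) x D. Normalizing each channel's scores over the words is likewise an assumption, and the function names are hypothetical rather than taken from the patent.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def word_attention(w, f_i, p_i):
    """Return a C x L matrix of channel-to-word attention weights.

    Assumed shapes: w is D x L, f_i is C x (H_i*W_i), p_i is (H_i*W_i) x D.
    """
    w_proj = matmul(p_i, w)        # w' = P_i w, shape (H_i*W_i) x L
    scores = matmul(f_i, w_proj)   # attention matrix f_i w', shape C x L
    return softmax_rows(scores)    # assumption: normalize over words

# Toy example: D=2, L=3 words, C=2 channels, H_i*W_i=4 positions.
w = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
p_i = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 1.0],
       [0.5, 0.5]]
f_i = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
alpha = word_attention(w, f_i, p_i)
```

In this toy input each row of `alpha` sums to 1, and the first channel puts its largest weight on the first word while the second channel favors the second word.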
Embodiment 2
[0091] This embodiment provides a bidirectional text image generation system based on semantic consistency.
[0092] A bidirectional text-to-image generation system based on semantic consistency, including:
[0093] An acquisition and encoding module configured to: acquire natural language, input the natural language into a text encoder, and extract word vectors and sentence vectors;
[0094] A text-to-image generation module configured to: input the sentence vector into the image generation network and input the word vectors into the word-level spatial attention mechanism module; at each stage of the generation network, concatenate the image features generated by that stage with the word vectors adjusted by the attention mechanism, and use the result as the input of the next stage of the generation network; after multiple stages of refinement, the image is finally generated;
[0095] Among them, in the text generation image module, natural language is input into the text encoder, an...
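The data flow of the text-to-image generation module can be sketched roughly as follows. This is only an illustration of the stage-by-stage concatenation described above: `refine`, the toy stage functions, and the toy `attend` function are hypothetical placeholders, not the patent's networks, and features are represented as plain lists of channels.

```python
def refine(sentence_vec, word_vecs, first_stage, later_stages, attend):
    """Run multi-stage refinement.

    first_stage maps the sentence vector to initial image features.
    Before each later stage, the attention-adjusted word vectors are
    concatenated with the current features along the channel axis.
    """
    feats = first_stage(sentence_vec)
    for stage in later_stages:
        attended = attend(word_vecs, feats)  # word vectors adjusted by attention
        feats = stage(feats + attended)      # list concatenation = channel concat
    return feats

# Toy placeholders (assumptions for illustration only).
def toy_first_stage(sentence_vec):
    return [list(sentence_vec)]              # one channel

def toy_attend(word_vecs, feats):
    width = len(feats[0])
    # One constant channel per word, holding the mean of that word vector.
    return [[sum(wv) / len(wv)] * width for wv in word_vecs]

def toy_stage(channels):
    # Collapse all channels into one by averaging at each position.
    return [[sum(col) / len(channels) for col in zip(*channels)]]

image_feats = refine(
    sentence_vec=[1.0, 2.0],
    word_vecs=[[0.0, 2.0], [4.0, 4.0]],
    first_stage=toy_first_stage,
    later_stages=[toy_stage, toy_stage],
    attend=toy_attend,
)
```

In a real system each stage would be a learned generator block and the final features would be converted to an image; the sketch only shows where the sentence vector and the word vectors enter the pipeline.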