Image generation method and apparatus, electronic device, and storage medium
By combining image generation models and generative adversarial networks, the problem of poor fusion between portrait and map images in portrait map art creation is solved, achieving efficient generation and high-quality output of portrait map images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INST OF AUTOMATION CHINESE ACAD OF SCI
- Filing Date
- 2022-06-21
- Publication Date
- 2026-06-16
AI Technical Summary
Existing style transfer methods based on classical deep learning are not effective in portrait map art creation, as they are difficult to effectively integrate portrait images and map images, resulting in low-quality generated portrait map images.
An image generation model is used to extract features from portrait and map images, fuse them based on feature correlation, and perform adversarial training through a generative adversarial network to generate portrait map images.
It improves the efficiency and accuracy of portrait map image generation, making the generated portrait map images closer to reality and enhancing the naturalness and realism of the images.
Figure CN115222634B_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to an image generation method, apparatus, electronic device, and storage medium.
Background Technology
[0002] In recent years, people have paid close attention to artistic creation and art analysis. As a result, algorithms related to artistic creation and art analysis have become increasingly important. In particular, with the development of convolutional neural networks and generative adversarial networks, many methods have emerged, ranging from image-to-image translation to image style transfer. These methods aim to render images, that is, to bridge the appearance gaps in content, color, and texture of images in order to render a given image with a new appearance.
[0003] Currently, image synthesis methods are mostly used in image processing. These methods can coordinate multiple objects or content from different sources in a single target scene. However, they are not suitable for portrait map art creation, because portrait map art uses a map as a canvas and gradually modifies the map with traditional materials, extracting facial features from elements such as roads, rivers, and mountain outlines to draw a portrait on the map. In this process, the two inputs are highly likely to have completely different styles. Furthermore, experimental results show that style transfer methods based on deep learning have little effect on portrait map art creation.
Summary of the Invention
[0004] This invention provides an image generation method, apparatus, electronic device, and storage medium to address the poor results that existing style transfer methods based on classical deep learning achieve in portrait map art creation.
[0005] This invention provides an image generation method, comprising:
[0006] Determine the portrait image and map image to be generated;
[0007] Based on the image generation model, features are extracted from the portrait image and the map image respectively. Based on the correlation between the portrait features and map features obtained from the feature extraction, the portrait features and map features are fused, and an image is generated based on the fused features to obtain a portrait map image.
[0008] The image generation model is obtained through adversarial training, jointly with a discrimination model, based on sample portrait images, sample map images, and sample portrait map images. The discrimination model is used to distinguish between the predicted portrait map image and the sample portrait map image. The predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0009] According to an image generation method provided by the present invention, extracting features from the portrait image and the map image respectively based on the image generation model, and fusing the portrait features and map features based on the correlation between the extracted features, includes:
[0010] Based on the feature extraction layer in the image generation model, feature extraction is performed on the portrait image and the map image respectively to obtain portrait features and map features;
[0011] Based on the feature fusion layer in the image generation model, as well as the portrait features and the map features, portrait attention features and map attention features are determined;
[0012] Based on the feature fusion layer in the image generation model, the portrait attention features and the map attention features are fused.
[0013] According to an image generation method provided by the present invention, determining portrait attention features and map attention features based on the feature fusion layer in the image generation model, as well as the portrait features and the map features, includes:
[0014] Based on the feature fusion layer in the image generation model, the weights of the portrait features and the map features are determined.
[0015] Based on the feature fusion layer in the image generation model, the weights of the portrait features, and the weights of the map features, portrait attention features and map attention features are determined.
[0016] According to an image generation method provided by the present invention, the steps for determining the image generation model and the discrimination model include:
[0017] Construct an initial image generation model and an initial discrimination model;
[0018] The sample portrait image and the sample map image are input into the initial image generation model to obtain the predicted portrait map image output by the initial image generation model;
[0019] The predicted portrait map image and the sample portrait map image are input into the initial discrimination model to obtain the discrimination result of the predicted portrait map image and the discrimination result of the sample portrait map image output by the initial discrimination model.
[0020] Based on the predicted portrait map image, the sample portrait map image, the discrimination result of the predicted portrait map image, and the discrimination result of the sample portrait map image, the parameters of the initial image generation model and the initial discrimination model are updated to obtain the image generation model and the discrimination model.
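The quantities that drive the parameter update in the steps above can be illustrated numerically. The following is a minimal sketch, assuming binary cross-entropy as the adversarial term and a mean absolute pixel difference as the term comparing predicted and sample portrait map images; the source does not specify either loss, and the function names and the `recon_weight` value are hypothetical.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single probability, clamped away from 0 and 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discrimination_loss(d_real, d_fake):
    """The discriminator wants sample images scored 1 and predicted images scored 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generation_loss(pred_img, sample_img, d_fake, recon_weight=10.0):
    """The generator wants its prediction scored 1 and close to the sample image."""
    l1 = sum(abs(p - s) for p, s in zip(pred_img, sample_img)) / len(pred_img)
    return bce(d_fake, 1.0) + recon_weight * l1

# Toy values: flattened images and discriminator outputs given as probabilities.
pred_img = [0.2, 0.8, 0.5, 0.1]     # predicted portrait map image
sample_img = [0.0, 1.0, 0.5, 0.0]   # sample portrait map image
d_real, d_fake = 0.9, 0.3           # discrimination results

loss_d = discrimination_loss(d_real, d_fake)
loss_g = generation_loss(pred_img, sample_img, d_fake)
```

In an actual training loop these two losses would typically be minimized in alternation, each with respect to its own model's parameters.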
[0021] According to an image generation method provided by the present invention, the step of determining the image generation model and the discrimination model further includes:
[0022] Construct an initial image decoupling model;
[0023] The predicted portrait map image is input into the initial image decoupling model to obtain the predicted portrait image and predicted map image output by the initial image decoupling model;
[0024] The predicted portrait image, the predicted map image, the sample portrait image, and the sample map image are input into the initial discrimination model to obtain the discrimination results of the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image output by the initial discrimination model.
[0025] Based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0026] The predicted image includes the predicted portrait map image, the predicted portrait image and the predicted map image, and the sample image includes the sample portrait map image, the sample portrait image and the sample map image.
[0027] According to an image generation method provided by the present invention, updating the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, to obtain the image generation model, the discrimination model, and the image decoupling model, comprises:
[0028] Based on the predicted portrait map image and the sample portrait map image, determine the image generation loss of the initial image generation model;
[0029] Based on the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image, determine the image decoupling loss of the initial image decoupling model;
[0030] Based on the discrimination results of the predicted image and the discrimination results of the sample image, the discrimination loss of the initial discrimination model is determined;
[0031] Based on the image generation loss, the image decoupling loss, and the discrimination loss, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0032] According to an image generation method provided by the present invention, updating the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model based on the image generation loss, the image decoupling loss, and the discrimination loss, to obtain the image generation model, the discrimination model, and the image decoupling model, includes:
[0033] Based on the image generation loss, the image decoupling loss, the discrimination loss, and the cycle consistency loss constraint, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0034] The cycle consistency loss constraint is used to constrain the difference between the image features corresponding to the input and output images of the initial image generation model and the initial image decoupling model, respectively.
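One way to read the cycle consistency constraint is as a penalty on the gap between the features of each generator input and the features of the image that the decoupling model recovers from the generated output. The sketch below follows that reading. It assumes a mean absolute difference as the distance and a weighted sum as the way the four terms are combined; neither is stated in the source, and the weights and function names are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(inputs, reconstructions):
    """Sum the feature gap between each input and its reconstruction.

    Both arguments are dicts keyed by "portrait" and "map". `inputs` holds the
    features of the generator inputs; `reconstructions` holds the features of
    the images the decoupling model recovers from the predicted portrait map.
    """
    return sum(l1_distance(inputs[k], reconstructions[k]) for k in inputs)

def total_loss(gen_loss, decouple_loss, disc_loss, cycle_loss,
               weights=(1.0, 1.0, 1.0, 10.0)):
    """Weighted sum of the four terms (the weights are illustrative only)."""
    w_gen, w_dec, w_disc, w_cyc = weights
    return (w_gen * gen_loss + w_dec * decouple_loss
            + w_disc * disc_loss + w_cyc * cycle_loss)

input_feats = {"portrait": [1.0, 2.0], "map": [0.0, 4.0]}
recon_feats = {"portrait": [1.5, 2.0], "map": [0.0, 3.0]}

cycle = cycle_consistency_loss(input_feats, recon_feats)
total = total_loss(0.4, 0.3, 0.2, cycle)
```

A single summed objective is a simplification: in practice the discrimination loss is usually optimized separately from the generator-side terms.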
[0035] The present invention also provides an image generation apparatus, comprising:
[0036] An image determination unit is used to determine the portrait image and map image to be generated;
[0037] The image generation unit is used to extract features from the portrait image and the map image respectively based on the image generation model, fuse the portrait features and the map features based on the correlation between the extracted portrait features and map features, and generate an image based on the fused features to obtain a portrait map image.
[0038] The image generation model is obtained through adversarial training, jointly with a discrimination model, based on sample portrait images, sample map images, and sample portrait map images. The discrimination model is used to distinguish between the predicted portrait map image and the sample portrait map image. The predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0039] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the image generation method as described above.
[0040] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the image generation method as described above.
[0041] The image generation method, apparatus, electronic device, and storage medium provided by this invention fuse portrait features and map features based on the correlation between them, so that the fused portrait map features carry both the facial contour information of the portrait image and the mountain and river orientation information of the map image. The fusion also adds subtle features that lie between the portrait and map images, thereby increasing the accuracy of image generation based on these portrait map features. This overcomes the poor results of traditional deep learning-based style transfer methods in portrait map art creation and improves both the generation efficiency and the accuracy of portrait map images. In addition, training the image generation model with a generative adversarial mechanism ensures the portrait map image generation capability of the trained model and makes the output portrait map images closer to real portrait map images. While achieving efficient generation of portrait map images, it also ensures their naturalness and realism, greatly improving their image quality.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0043] Figure 1 is a flowchart of the image generation method provided by the present invention;
[0044] Figure 2 is a flowchart of the image generation process and image decoupling process provided by the present invention;
[0045] Figure 3 is a schematic diagram of the image generation model provided by the present invention;
[0046] Figure 4 is a schematic diagram of the image generation apparatus provided by the present invention;
[0047] Figure 5 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0049] Portrait map art is a modern art form created by the British portrait painter John Fairburn. Artists use maps as canvases, employing traditional materials (such as ink, paint, and pencil) to progressively transform the map images, extracting facial features from road, river, and mountain outlines to create portrait images. This process was called topographic pointillism by Fairburn, a direct combination of topography and pointillism. However, the creation of portrait map art typically requires a considerable amount of time, which significantly limited the development of this art form.
[0050] In recent years, people have paid more and more attention to algorithms related to artistic creation and art analysis. In particular, with the development of convolutional neural networks and generative adversarial networks, many methods have emerged, ranging from image-to-image translation to image style transfer. These methods aim to render images, that is, to bridge the appearance gaps in content, color, and texture of images in order to render a given image with a new appearance.
[0051] Currently, image synthesis methods are mostly used when processing images. These methods can coordinate multiple objects or content from different sources in a target scene. However, they are not suitable for portrait map art creation because there is a high probability that two completely different styles of input will appear during the process of creating portrait map art. Furthermore, experimental results also show that style transfer methods based on classic deep learning have little effect on portrait map art creation.
[0052] To address the above issues, this invention provides an image generation method that formalizes the generation of portrait map images as an adaptive dual-to-single image transformation problem. Specifically, guided by a portrait image and a map image, it uses a convolutional neural network-based method to fuse them into a portrait map image in the style of a portrait map artwork, thereby achieving automatic generation of portrait map images and improving the efficiency of portrait map art creation. Figure 1 is a flowchart of the image generation method provided by the present invention. As shown in Figure 1, the method includes:
[0053] Step 110: Determine the portrait image and map image to be generated;
[0054] Specifically, before generating the image, it is necessary to first determine the basis for generating the portrait map image, namely the portrait image and map image to be generated. The portrait image and map image can be images of any style, which can be obtained from the Internet, newspapers, weather news, literary magazines, works of art, etc.
[0055] Furthermore, the image combination formed by the portrait image and the map image can be one or more, and this embodiment of the invention does not specifically limit this. When there are multiple image combinations, a portrait map image needs to be generated for each image combination. That is, a portrait map image with the style of a portrait map artwork needs to be generated, guided by the portrait image and map image in each image combination.
[0056] Step 120: Based on the image generation model, feature extraction is performed on the portrait image and the map image respectively. Based on the correlation between the portrait features and map features obtained from the feature extraction, the portrait features and map features are fused, and the image is generated based on the fused features to obtain a portrait map image.
[0057] The image generation model is obtained through adversarial training, jointly with a discrimination model, based on sample portrait images, sample map images, and sample portrait map images. The discrimination model is used to distinguish between the predicted portrait map image and the sample portrait map image. The predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0058] Specifically, after determining the portrait image and map image in step 110, step 120 can be executed. The execution process of step 120 includes the following steps:
[0059] First, feature extraction is performed on the portrait image and map image respectively using an image generation model to obtain the portrait features of the portrait image and the map features of the map image. This process can be implemented using a convolutional neural network in the image generation model. Specifically, the portrait image and map image to be generated are input into the convolutional neural network in the image generation model, and the convolutional neural network extracts features from the input portrait image and map image respectively, extracting the content features of the portrait image and map image. This can also be understood as extracting the facial contour information contained in the portrait image and the hidden mountain and river direction information in the map image through the convolutional neural network, and finally obtaining the portrait features of the portrait image and the map features of the map image output by the convolutional neural network.
[0060] Subsequently, considering the differences in the information represented by portrait features and map features, as well as their different roles in the generation of portrait map images, the two can be fused to integrate the facial contour information contained in the portrait features and the mountain and river orientation information hidden in the map image. The fusion process can be performed by an image generation model based on the correlation between portrait features and map features. That is, based on the correlation between portrait features and map features, the portrait features and map features are fused to obtain the fused features, namely, portrait map features.
[0061] It should be noted that the above-mentioned fusion based on the correlation between the two is in effect fusion based on an attention mechanism. Here, the attention mechanism used is a self-attention structure based on deep learning. The fused portrait map features contain not only the facial contour information of the portrait image but also the mountain and river direction information of the map image, and they also carry the basic style of portrait map art creation. By fusing based on the correlation between portrait features and map features, the fused portrait map features gain subtle features of portrait map art creation. These subtle features benefit the portrait map image generation process, making the generated portrait map image less abrupt, more vivid, and more realistic.
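The idea of fusing by correlation can be illustrated with dot-product attention. The following is a minimal sketch, not the patented feature fusion layer: it assumes that correlation is measured by a dot product between feature vectors, that the scores are normalized with a softmax, and that the attended map features are added to the portrait features. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product, used here as the correlation between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def correlation_fuse(portrait_feats, map_feats):
    """For each portrait position, attend over all map positions and add the result."""
    fused = []
    for p in portrait_feats:
        weights = softmax([dot(p, m) for m in map_feats])
        attended = [sum(w * m[c] for w, m in zip(weights, map_feats))
                    for c in range(len(p))]
        fused.append([pc + ac for pc, ac in zip(p, attended)])
    return fused

# Two spatial positions, two channels each (toy values).
portrait_feats = [[1.0, 0.0], [0.0, 1.0]]
map_feats = [[2.0, 0.0], [0.0, 2.0]]
fused = correlation_fuse(portrait_feats, map_feats)
```

Each portrait position draws most strongly on the map position it correlates with, so map content is pulled toward the places where it matches the face structure.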
[0062] Subsequently, based on these portrait map features, an image generation model can be applied to generate images, thereby obtaining portrait map images. It should be noted that the portrait map image generation process based on these portrait map features can completely overcome the shortcomings of traditional schemes where style transfer methods based on classic deep learning are not effective in portrait map artistic creation, achieving efficient generation of portrait map images while improving the accuracy of the generated images.
[0063] Before inputting the portrait image and map image to be generated into the image generation model, the model can be trained in advance using sample portrait images, sample map images, and the corresponding sample portrait map images. Specifically, when training the image generation model, sample portrait images and sample map images can be used as training samples, and sample portrait map images can be used as training labels for supervised training. This allows the image generation model to learn, during training, the mapping from a pair consisting of a sample portrait image and a sample map image to the corresponding sample portrait map image. The trained image generation model can then apply this mapping to generate the corresponding portrait map image for the input portrait image and map image, thereby achieving automatic generation of portrait map images.
[0064] However, in the process of training the image generation model using the above-mentioned supervised training method, the quantity and quality of the training data largely determine the performance of the trained model. Fairburn has a limited number of existing portrait map artworks, and a small amount of training data will prevent the image generation model from fully learning the above mapping relationship, resulting in poor performance of the image generation model. Consequently, the portrait map images generated by this image generation model are prone to distortion. Considering this situation, this embodiment of the invention adopts the idea of Generative Adversarial Networks (GAN) to conduct adversarial training on the image generation model and the discriminative model.
[0065] It's important to note that Generative Adversarial Networks (GANs) are primarily used in scenarios such as image synthesis and style transfer. They consist of a Generator network and a Discriminator network. The Generator generates images, while the Discriminator distinguishes between real and fake images. The Generator aims to generate images that are as realistic as possible and can fool the Discriminator, while the Discriminator aims to differentiate between the Generator-generated images and real images. The two networks work together to improve overall network performance and achieve better output results.
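For reference, this competition is commonly written as the minimax objective of the original GAN formulation. The version below adapts it to a generator with two inputs. The symbols $G$ (generator), $D$ (discriminator), $x_p$ (portrait image), $x_m$ (map image), and $y$ (real portrait map image) are notation assumed here, and the source does not state which adversarial loss it actually uses.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{y}\left[\log D(y)\right]
+ \mathbb{E}_{x_p, x_m}\left[\log\left(1 - D\left(G(x_p, x_m)\right)\right)\right]
```

The discriminator maximizes the expression by scoring real images near 1 and generated images near 0, while the generator minimizes it by producing images the discriminator scores near 1.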
[0066] In adversarial training, the image generation model acts as the generator and the discrimination model acts as the discriminator. The generator takes the input sample portrait image and sample map image and outputs a generated portrait map image, i.e., the predicted portrait map image. The discriminator determines whether an input portrait map image is a predicted image generated by the generator or a real sample portrait map image. In this process, the generator and discriminator compete against each other. The generator aims to output a predicted portrait map image that is as similar as possible to the sample portrait map image, making it difficult for the discriminator to tell the two apart. The discriminator, in turn, aims to output a discrimination result that matches the true source of the input portrait map image, so that its judgments become more accurate and reliable.
[0067] The image generation method provided by this invention uses the correlation between portrait features and map features as a benchmark, fusing the two to obtain portrait map features that combine facial contour information from the portrait image with the mountain and river orientation information from the map image. Furthermore, it adds subtle features between the portrait and map images, thereby increasing the accuracy of the image generation process based on these portrait map features. This overcomes the shortcomings of traditional deep learning-based style transfer methods in portrait map artistic creation, achieving a dual improvement in the generation efficiency and accuracy of portrait map images. In addition, the introduction of generative and adversarial mechanisms to train the image generation model ensures the portrait map image generation capability of the trained model. Moreover, the image generation model not only generates portrait map images but also makes the output portrait map images closer to real portrait map images. While achieving efficient generation of portrait map images, it also ensures the naturalness and realism of the images, greatly improving the image quality of the portrait map images.
[0068] Based on the above embodiments, in step 120, feature extraction is performed on the portrait image and the map image respectively based on the image generation model. Based on the correlation between the extracted portrait features and map features, the portrait features and map features are fused, including:
[0069] Based on the feature extraction layer in the image generation model, feature extraction is performed on portrait images and map images respectively to obtain portrait features and map features;
[0070] Based on the feature fusion layer in the image generation model, as well as portrait features and map features, portrait attention features and map attention features are determined;
[0071] Based on the feature fusion layer in the image generation model, portrait attention features and map attention features are fused.
[0072] In essence, a generative adversarial network extracts features from the input images and generates the target image (the portrait map image) through a deep "encoder-decoder" structure, and it improves the generation quality through the interplay of the generator and the discriminator.
[0073] Therefore, in step 120, the process of using the image generation model trained based on the idea of generative adversarial networks to extract features from the portrait image and map image to be generated, and then fusing the extracted features, may include the following steps:
[0074] First, the feature extraction layer in the image generation model can be used to extract features from portrait images and map images, thereby obtaining portrait features and map features. Specifically, the portrait image and map image to be generated can be input into the feature extraction layer in the image generation model, and the feature extraction layer can extract features from the input portrait image and map image respectively, and finally obtain the portrait features of the portrait image and the map features of the map image output by the feature extraction layer.
[0075] The feature extraction layer here is essentially a convolutional neural network (CNN), which is used to obtain feature maps of an input image. For example, for an input image $x$, the CNN obtains the feature map $F_j(x)$, where $F_j(x)$ is the feature map of the $j$-th convolutional layer in the VGG19 convolutional neural network, with a size of $C_j \times H_j \times W_j$. Here, $C_j$ represents the number of convolutional kernels in this convolutional layer, $H_j$ represents the height of the feature map, and $W_j$ represents the width of the feature map. The feature map serves as an object-based representation of the image. During network training, object detection is used as the training objective. The pre-trained convolutional neural network VGG19 possesses the ability to extract content features (image features) from the feature map.
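The channel, height, and width layout of a feature map can be seen with a single toy convolutional layer. The sketch below is not VGG19: it assumes a single-channel input, two hand-written 3x3 kernels, stride 1, and no padding, purely to show that the number of kernels fixes the channel count while the input size and kernel size fix the height and width. The function name is hypothetical.

```python
def conv2d_valid(image, kernels):
    """Apply each k x k kernel to a 2-D image (stride 1, no padding).

    Returns a feature map indexed as [channel][row][col].
    """
    k = len(kernels[0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for kernel in kernels:
        channel = [[sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
                    for c in range(out_w)]
                   for r in range(out_h)]
        feature_map.append(channel)
    return feature_map

# 5 x 5 toy image whose pixel value grows by 1 per column and by 5 per row.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernels = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],      # identity: copies the centre pixel
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],   # horizontal gradient
]

F = conv2d_valid(image, kernels)
C, H, W = len(F), len(F[0]), len(F[0][0])   # channels, height, width
```

Here two kernels give two channels, and a 5x5 input with 3x3 kernels gives a 3x3 spatial size.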
[0076] Subsequently, based on portrait features and map features, attention features of portrait images (i.e., portrait attention features) and attention features of map images (i.e., map attention features) can be calculated on the basis of the feature fusion layer in the image generation model. Specifically, since the feature fusion layer of the image generation model can learn the weights of the feature maps of the source domain (portrait domain and map domain) during training, the weights of the feature maps of the portrait domain and the map domain can be obtained through the trained image generation model, and the portrait attention features and map attention features can be calculated based on these two.
[0077] Subsequently, the feature fusion layer in the image generation model can be applied to fuse the portrait attention features and map attention features to obtain portrait map features. The fusion method here can be concatenation, addition, cascading, etc., and the embodiments of the present invention do not specifically limit this. The portrait map features determined by the above process include not only facial contour information and mountain and river direction information, but also subtle features that lie between the portrait image and the map image, which can greatly improve the accuracy and realism of the portrait map image generated from these portrait map features.
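Two of the fusion options named above, splicing (read here as channel-wise concatenation) and addition, differ mainly in the shape of the result. The sketch below assumes feature maps stored as nested lists indexed [channel][row][col]; the function names are hypothetical and the source does not say which option the trained model uses.

```python
def fuse_concat(portrait_att, map_att):
    """Channel-wise concatenation: the result has C_p + C_m channels."""
    return portrait_att + map_att

def fuse_add(portrait_att, map_att):
    """Element-wise addition: both inputs must have identical shapes."""
    return [[[p + m for p, m in zip(p_row, m_row)]
             for p_row, m_row in zip(p_ch, m_ch)]
            for p_ch, m_ch in zip(portrait_att, map_att)]

# One channel of size 2 x 2 for each domain (toy values).
portrait_att = [[[1.0, 2.0], [3.0, 4.0]]]
map_att = [[[10.0, 20.0], [30.0, 40.0]]]

concat_result = fuse_concat(portrait_att, map_att)
add_result = fuse_add(portrait_att, map_att)
```

Concatenation keeps the two sources separable for later layers at the cost of more channels, while addition keeps the channel count fixed but mixes the sources immediately.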
[0078] Furthermore, regarding the feature fusion of the two mentioned above, the fusion of portrait features and map features can add subtle features to the portrait map features based on information from both aspects, thereby more completely reflecting the relevant information and stylistic features of the portrait map image. This completely overcomes the shortcomings of traditional style transfer methods based on classic deep learning in the artistic creation of portrait maps, and provides strong support for improving the generation efficiency and accuracy of portrait map images.
[0079] Based on the above embodiments, in step 120, based on the feature fusion layer in the image generation model, and portrait features and map features, portrait attention features and map attention features are determined, including:
[0080] Based on the feature fusion layer in the image generation model, the weights of portrait features and map features are determined.
[0081] Based on the feature fusion layer in the image generation model, the weights of portrait features, and the weights of map features, portrait attention features and map attention features are determined.
[0082] Specifically, the process of determining portrait attention features and map attention features based on the feature fusion layer in the image generation model, as well as portrait features and map features, includes the following steps:
[0083] First, the weights of portrait features and map features can be determined based on the feature fusion layer in the image generation model. Specifically, since the feature fusion layer of the image generation model can learn the weights of feature maps in the portrait and map domains during training, the weights of feature maps in the portrait domain and map domain can be obtained through the trained image generation model, and the weights of portrait features and map features can be determined based on this.
[0084] Subsequently, based on the weights of portrait features and map features, the feature fusion layer in the image generation model can be used to calculate portrait attention features and map attention features. Specifically, average pooling can be performed on the feature map of the portrait domain to maintain the overall structure, and max pooling can be performed on the feature map of the map domain to capture detailed information. On this basis, the weights of portrait features and map features are combined to calculate portrait attention features and map attention features. This process can increase the weights of portrait and map features that are related to the portrait and map domains and weaken the weights that are not related. This allows the image generation model to focus on the regions that can constitute the portrait map image when generating images, thus contributing to the improvement of the accuracy of the portrait map image.
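The pooling and weighting step described above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: it assumes a class-activation-style scheme in which each encoded feature map is scaled by one learned weight, and the function name, array shapes, and random stand-in weights are all hypothetical.

```python
import numpy as np

def attention_features(feature_maps, weights, pooling):
    """Scale each of the n encoded feature maps by its learned weight.

    feature_maps: array of shape (n, H, W), the encoder output for one image.
    weights: array of shape (n,), hypothetical per-feature-map weights that
        the auxiliary classifier is assumed to have learned during training.
    pooling: "avg" (portrait domain) or "max" (map domain).
    Returns (attention, logit): the weighted feature maps and the
    auxiliary-classifier score computed from the pooled features.
    """
    if pooling == "avg":
        pooled = feature_maps.mean(axis=(1, 2))  # global average pooling
    elif pooling == "max":
        pooled = feature_maps.max(axis=(1, 2))   # global max pooling
    else:
        raise ValueError("pooling must be 'avg' or 'max'")
    logit = float(pooled @ weights)              # domain score before any sigmoid
    attention = feature_maps * weights[:, None, None]
    return attention, logit

# Random stand-ins for encoder outputs and learned weights (assumptions).
rng = np.random.default_rng(0)
portrait_feats = rng.standard_normal((4, 8, 8))
map_feats = rng.standard_normal((4, 8, 8))
w_portrait = rng.standard_normal(4)
w_map = rng.standard_normal(4)

portrait_att, portrait_logit = attention_features(portrait_feats, w_portrait, "avg")
map_att, map_logit = attention_features(map_feats, w_map, "max")
```

In this sketch the pooled features only feed the auxiliary classifier score, while the same weights rescale the full-resolution feature maps, so feature maps with larger weights contribute more to the attention features.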
[0085] Based on the above embodiments, the feature fusion layer in the image generation model consists of two pooling layers and an auxiliary classifier, wherein the output of the auxiliary classifier represents the probability that the portrait image comes from the portrait domain and the probability that the map image comes from the map domain. The two pooling layers are average pooling for the portrait domain and max pooling for the map domain, respectively.
[0086] Once trained, the feature fusion layer has learned the weights of the feature maps in the portrait domain and the weights of the feature maps in the map domain. Specifically, in this embodiment of the invention, average pooling can be performed on the feature map of the portrait domain to maintain the overall structure, while max pooling can be performed on the feature map of the map domain to capture detailed information. Based on this, the weights of the portrait features and the map features are combined to calculate the portrait attention features and the map attention features.
[0087] It is worth noting that the feature fusion layer here is essentially an autoencoder structure in a convolutional neural network.
[0088] The formulas for calculating portrait attention features and map attention features are as follows:
[0089]
[0090]
[0091] In these formulas, the quantities are: the attention feature map of the portrait image; the attention feature map of the map image; the activation maps of the auxiliary classifier for portrait images; the activation maps of the auxiliary classifier for map images; the weight of the k-th feature map in the portrait domain; the weight of the k-th feature map in the map domain; the k-th activation map of the auxiliary classifier for portrait images; the k-th activation map of the auxiliary classifier for map images; and n, the number of encoded feature maps. Each attention feature map combines the k-th activation maps with the corresponding k-th weights.
[0092] Therefore, the image generation model can be represented as G_t(a_s(x)), where a_s(x) comprises the portrait attention features and the map attention features, and t is the subscript denoting the portrait map domain.
[0093] The fusion process of portrait attention features and map attention features can be represented as follows:
[0094]
[0095] Here, η_s(x1, x2) represents the fused features, i.e., the portrait map features, and σ represents the fusion operation, i.e., connecting the portrait attention features and the map attention features. The remaining two terms denote the values at position (i, j) of the corresponding portrait-domain and map-domain quantities, respectively.
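One way to picture the fusion by connecting the two attention features is the sketch below. It is an illustration under stated assumptions only: the attention maps are concatenated along the channel axis and then mixed by a learned matrix applied at every spatial position, which stands in for the fully connected layer; the function name `fuse` and the matrix `projection` are hypothetical.

```python
import numpy as np

def fuse(portrait_att, map_att, projection):
    """Concatenate two attention maps on the channel axis, then project.

    portrait_att, map_att: arrays of shape (n, H, W).
    projection: array of shape (n_out, 2 * n), a hypothetical learned matrix
        applied independently at every spatial position.
    Returns an array of shape (n_out, H, W).
    """
    stacked = np.concatenate([portrait_att, map_att], axis=0)  # (2n, H, W)
    return np.einsum("oc,chw->ohw", projection, stacked)

# Toy inputs: constant maps make the result easy to check by hand.
n, h, w = 3, 4, 4
portrait_att = np.ones((n, h, w))
map_att = 2.0 * np.ones((n, h, w))
# Assumed projection that adds each portrait channel to its map counterpart.
projection = np.hstack([np.eye(n), np.eye(n)])
fused = fuse(portrait_att, map_att, projection)
```

With this particular projection the sketch reduces to element-wise addition, which shows how concatenation followed by a learned projection can cover the addition variant of fusion as a special case.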
[0096] Based on the above embodiments, the steps for determining the image generation model and the discrimination model include:
[0097] Construct an initial image generation model and an initial discrimination model;
[0098] Input the sample portrait image and sample map image into the initial image generation model to obtain the predicted portrait map image output by the initial image generation model;
[0099] The predicted portrait map image and the sample portrait map image are input into the initial discrimination model to obtain the discrimination result of the predicted portrait map image and the discrimination result of the sample portrait map image output by the initial discrimination model.
[0100] Based on the predicted portrait map image, the sample portrait map image, the discrimination result of the predicted portrait map image, and the discrimination result of the sample portrait map image, the parameters of the initial image generation model and the initial discrimination model are updated to obtain the image generation model and the discrimination model.
[0101] Specifically, the process of determining the image generation model and the discrimination model includes the following steps:
[0102] First, determine the initial image generation model and the initial discrimination model. Specifically, the initial image generation model and the initial discrimination model can be constructed based on the generative adversarial network. The initial image generation model can be regarded as the generator of the generative adversarial network, and the initial discrimination model can be regarded as the discriminator.
[0103] Subsequently, the pre-collected sample portrait images and sample map images are input into the initial image generation model. The initial image generation model, which acts as the generator, can generate images based on the input sample portrait images and sample map images, thereby outputting predicted portrait map images of the sample portrait images and sample map images.
[0104] Subsequently, the predicted portrait map image output by the initial image generation model and the pre-determined sample portrait map image can be input into the initial discrimination model. As a discriminator, the initial discrimination model can distinguish between the input predicted portrait map image and the sample portrait map image, that is, distinguish whether the input portrait map image is the predicted portrait map image generated by the initial image generation model (generator) or the real sample portrait map image, and thus output the discrimination result of the predicted portrait map image and the discrimination result of the sample portrait map image.
[0105] Subsequently, both the predicted portrait map image and the sample portrait map image can be used as a benchmark. Based on the discrimination results of the predicted and sample portrait map images, the parameters of the initial image generation model and the initial discrimination model are adjusted to obtain the image generation model and the discrimination model. Specifically, during the adjustment, for the initial image generation model, the goal is to output a predicted portrait map image that is as similar as possible to the sample portrait map image, aiming to make it difficult for the initial discrimination model to distinguish between the sample portrait map image and the predicted portrait map image. For the initial discrimination model, the goal is to ensure that the output discrimination result is consistent with the actual situation of the input portrait map image, aiming to achieve a more accurate and reliable discrimination effect.
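The opposing goals of the two models can be sketched numerically. This is a minimal illustration that assumes the common log-loss form of adversarial training and a non-saturating generator objective; the function names are hypothetical, and the discriminator outputs are taken to be probabilities that an input is a real sample portrait map image.

```python
import numpy as np

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))

def discriminator_loss(d_real, d_fake):
    """Low when sample images are scored near 1 and predicted images near 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Low when the discriminator scores predicted images near 1."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs for a batch of two images each.
d_real = np.array([0.9, 0.8])   # scores for sample portrait map images
d_fake = np.array([0.2, 0.1])   # scores for predicted portrait map images
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

In an actual training loop the two losses would be minimized alternately, each with respect to its own model's parameters, so that improving one model raises the loss of the other.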
[0106] Based on the above embodiments, the steps for determining the image generation model and the discrimination model further include:
[0107] Construct an initial image decoupling model;
[0108] The predicted portrait map image is input into the initial image decoupling model to obtain the predicted portrait image and predicted map image output by the initial image decoupling model.
[0109] The predicted portrait image, predicted map image, sample portrait image, and sample map image are input into the initial discrimination model to obtain the discrimination results of the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image output by the initial discrimination model.
[0110] Based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0111] The predicted images include predicted portrait map images, predicted portrait images, and predicted map images, while the sample images include sample portrait map images, sample portrait images, and sample map images.
[0112] Specifically, the process of determining the image generation model and the discrimination model described above may also include the following steps:
[0113] First, an initial image decoupling model is determined. This initial image decoupling model can be regarded as a new generator, which is used to decouple the predicted portrait map image generated by the initial image generation model to obtain the predicted portrait image and the predicted map image. The decoupling process of the initial image decoupling model is the inverse process of the generation process of the initial image generation model, aiming to obtain the predicted portrait image and the predicted map image from the predicted portrait map image.
[0114] Subsequently, the predicted portrait map image generated by the initial image generation model can be input into the initial image decoupling model. The initial image decoupling model, as the generator, can obtain the distribution characteristics of the input predicted portrait map image and decouple the predicted portrait map image based on these distribution characteristics, thereby outputting the predicted portrait image and the predicted map image.
[0115] Subsequently, the predicted portrait image, predicted map image output by the initial image decoupling model, as well as the pre-determined sample portrait image and sample map image, can be input into the initial discrimination model. As a discriminator, the initial discrimination model can discriminate the input predicted portrait image, predicted map image, sample portrait image, and sample map image, that is, distinguish whether the input portrait image and map image are the predicted portrait image and predicted map image obtained by the initial image decoupling model, or the real sample portrait image and sample map image, thereby outputting the discrimination results of the predicted portrait image, the discrimination results of the predicted map image, the discrimination results of the sample portrait image, and the discrimination results of the sample map image.
[0116] Subsequently, based on the predicted image, sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model can be adjusted to obtain the image generation model, the discrimination model, and the image decoupling model. Specifically, the adjustment process for the initial image generation model has been detailed above and will not be repeated here. For the initial image decoupling model, the goal is to output predicted portrait images and predicted map images that are as similar as possible to the sample portrait images and sample map images, so that the initial discrimination model cannot easily distinguish between the sample portrait images and the predicted portrait images, nor between the sample map images and the predicted map images. For the initial discrimination model, the goal is to ensure that the output discrimination result is consistent with the actual situation of the corresponding input portrait image, map image, or portrait-map image, in order to achieve a more accurate and reliable discrimination effect.
[0117] It should be noted that the predicted images here include predicted portrait map images, predicted portrait images, and predicted map images, while the sample images include sample portrait map images, sample portrait images, and sample map images.
[0118] Based on the above embodiments, based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model, including:
[0119] Based on the predicted portrait map image and the sample portrait map image, determine the image generation loss of the initial image generation model;
[0120] Based on the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image, determine the image decoupling loss of the initial image decoupling model;
[0121] Based on the discrimination results of the predicted image and the discrimination results of the sample image, the discrimination loss of the initial discrimination model is determined;
[0122] Based on image generation loss, image decoupling loss, and discrimination loss, the parameters of the initial image generation model, initial discrimination model, and initial image decoupling model are updated to obtain the image generation model, discrimination model, and image decoupling model.
[0123] Specifically, the process of updating the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image to obtain the image generation model, the discrimination model, and the image decoupling model includes the following steps:
[0124] First, the difference between the sample portrait map image and the predicted portrait map image output by the initial image generation model can be determined. Then, based on this difference, the loss of the initial image generation model in the process of generating the predicted portrait map image can be determined, that is, the image generation loss of the initial image generation model.
[0125] The differences between the sample portrait image and the predicted portrait image output by the initial image decoupling model, as well as the differences between the sample map image and the predicted map image output by the initial image decoupling model, can be determined. Based on these two differences, the loss of the initial image decoupling model in the process of decoupling to obtain the predicted portrait image and the predicted map image can be determined, i.e., the image decoupling loss of the initial image decoupling model; the image decoupling loss here includes portrait image decoupling loss and map image decoupling loss.
[0126] At the same time, the discrimination loss of the initial discrimination model can be determined based on the discrimination results of the predicted image output by the discrimination model and the discrimination results of the sample image. Specifically, the differences between the discrimination results of the predicted image and the discrimination results of the sample image and the actual situation can be judged respectively, and the discrimination loss of the initial discrimination model can be determined based on the judgment results.
[0127] It should be noted that the discrimination loss here consists of three parts: portrait image discrimination loss, map image discrimination loss, and portrait-map image discrimination loss.
[0128] Subsequently, adversarial training can be performed on the initial image generation model, the initial image decoupling model, and the initial discrimination model, using the image generation loss, the image decoupling loss, and the discrimination loss as benchmarks. This involves updating the parameters of the initial image generation model, the initial image decoupling model, and the initial discrimination model to obtain the image generation model, the image decoupling model, and the discrimination model. This process has been explained in detail above and will not be repeated here.
[0129] Based on the above embodiments, the loss function for adversarial training can be expressed as:
[0130] The formula for calculating the image generation loss of the initial image generation model is as follows:
[0131]
[0132] Here, the formula defines the image generation loss; x ~ X_t indicates that x is a sample portrait map image drawn from the distribution X_t, where X_t is the portrait map domain; x1 is a sample portrait image drawn from the distribution of the portrait domain; x2 is a sample map image drawn from the distribution of the map domain; E denotes the corresponding expectation; D_t denotes the portrait map image discrimination loss; and the remaining two terms denote the initial image generation model and the predicted portrait map image generated by the initial image generation model.
[0133] The formula for calculating the image decoupling loss of the initial image decoupling model is as follows:
[0134]
[0135]
[0136] Here, the two formulas define, respectively, the portrait image decoupling loss, i.e., the loss in the process of decoupling the predicted portrait map image to obtain the predicted portrait image, and the map image decoupling loss, i.e., the loss in the process of decoupling the predicted portrait map image to obtain the predicted map image. The sampled input is a predicted portrait map image drawn from the distribution of the predicted portrait map domain generated by the initial image generation model, and E denotes the corresponding expectation. The initial image decoupling model is composed of two components, and further terms denote the predicted portrait image and the predicted map image obtained after decoupling. The remaining two terms denote the portrait image discrimination loss and the map image discrimination loss.
[0137] Based on the above embodiments, the initial image generation model, initial discrimination model, and initial image decoupling model are updated with parameters based on image generation loss, image decoupling loss, and discrimination loss to obtain the image generation model, discrimination model, and image decoupling model, including:
[0138] Based on image generation loss, image decoupling loss, discrimination loss, and cycle consistency loss constraints, the parameters of the initial image generation model, initial discrimination model, and initial image decoupling model are updated to obtain the image generation model, discrimination model, and image decoupling model.
[0139] The cycle consistency loss constraint is used to constrain the differences between the image features corresponding to the input and output images of the initial image generation model and the initial image decoupling model, respectively.
[0140] Specifically, when the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated based on the image generation loss, the image decoupling loss, and the discrimination loss, a cycle consistency loss constraint can also be added. This constraint keeps the content features (image features) of the input and output images of the initial image generation model and the initial image decoupling model as consistent as possible, preventing mode collapse. That is, the parameters of the three initial models can be updated based on the image generation loss, the image decoupling loss, the discrimination loss, and the cycle consistency loss constraint to obtain the image generation model, the discrimination model, and the image decoupling model.
[0141] It should be noted that the cycle consistency loss constraint here is specifically used to constrain the difference between the predicted portrait map image output by the initial image generation model and the content features of the sample portrait image and sample map image input to the model, as well as to constrain the difference between the predicted portrait image and predicted map image output by the initial image decoupling model and the content features of the predicted portrait map image input to the model.
[0142] Based on the above embodiments, the formula for calculating the cycle consistency loss constraint is as follows:
[0143] The cycle consistency loss constraint for the initial image generation model can be expressed as:
[0144]
[0145] Here, the formula defines the cycle consistency loss constraint on the process by which the initial image generation model generates predicted portrait map images from sample portrait images and sample map images. G denotes the generator that couples sample portrait images and sample map images to produce predicted portrait map images, i.e., the initial image generation model. F denotes the generator that decouples the predicted portrait map image into the predicted portrait image and the predicted map image, i.e., the initial image decoupling model. x denotes the real sample portrait image and sample map image, i.e., the collective name for sample portrait image x1 and sample map image x2. ‖·‖1 denotes the L1 norm.
[0146] The cycle consistency loss constraint for the initial image decoupling model can be expressed as:
[0147]
[0148]
[0149] Here, the two formulas define, respectively, the cycle consistency loss constraint on the process by which the initial image decoupling model obtains the predicted portrait image, and the cycle consistency loss constraint on the process by which it obtains the predicted map image. F denotes the generator that decouples the predicted portrait map image into the predicted portrait image and the predicted map image, i.e., the initial image decoupling model, which is composed of two components; the remaining two terms denote the predicted portrait image and the predicted map image obtained after decoupling.
[0150] Cycle consistency loss constraints can be used to maintain the consistency of content features in the style transfer of images across different domains, preventing mode collapse.
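The idea of a cycle consistency constraint between a two-to-one generator and a one-to-two decoupler can be sketched as follows. This is a hedged illustration that assumes the standard round-trip form (couple then decouple, and decouple then couple) measured with a mean absolute difference, which is the L1 norm divided by the number of elements; `toy_G`, `toy_F`, and `lossy_F` are hypothetical stand-ins, not the trained networks.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))

def cycle_losses(G, F, x1, x2, xt):
    """Round-trip errors for the two-to-one and one-to-two workflows."""
    r1, r2 = F(G(x1, x2))               # couple, then decouple
    forward = l1(r1, x1) + l1(r2, x2)
    backward = l1(G(*F(xt)), xt)        # decouple, then couple
    return forward, backward

def toy_G(x1, x2):
    """Stand-in generator: joins the two inputs along the last axis."""
    return np.concatenate([x1, x2], axis=-1)

def toy_F(xt):
    """Stand-in decoupler: splits the last axis back into two halves."""
    half = xt.shape[-1] // 2
    return xt[..., :half], xt[..., half:]

def lossy_F(xt):
    """Stand-in decoupler that damages the first output, breaking the cycle."""
    a, b = toy_F(xt)
    return 0.5 * a, b

x1 = np.arange(6, dtype=float).reshape(2, 3) + 1.0   # stands in for a portrait image
x2 = np.ones((2, 3))                                 # stands in for a map image
xt = np.arange(12, dtype=float).reshape(2, 6)        # stands in for a portrait map image

forward, backward = cycle_losses(toy_G, toy_F, x1, x2, xt)
forward_lossy, _ = cycle_losses(toy_G, lossy_F, x1, x2, xt)
```

Because `toy_F` exactly inverts `toy_G`, both round-trip errors vanish, while the lossy decoupler yields a positive forward error; this is the signal a cycle consistency term would penalize during training.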
[0151] Based on the above embodiments, Figure 2 is a flowchart of the image generation process and the image decoupling process provided by the present invention. As shown in Figure 2, sample portrait images from the portrait domain and sample map images from the map domain can be passed through the initial image generation model, regarded as the generator G, to obtain predicted portrait map images in the predicted portrait map domain. The predicted portrait map images can then be decoupled by the initial image decoupling model, regarded as a new generator F, to obtain predicted portrait images and predicted map images.
[0152] Similarly, a sample portrait map image x_t from the portrait map domain X_t can be decoupled by the initial image decoupling model to obtain an image in the predicted portrait domain and an image in the predicted map domain. The images decoupled by the initial image decoupling model can then be input into the initial image generation model to obtain the predicted portrait map image output by the initial image generation model.
[0153] In this process, the predicted portrait map image generated by generator G, together with the predicted portrait image and the predicted map image obtained by decoupling with the new generator F, can be input into the initial discrimination model, regarded as the discriminator D, to obtain the discrimination result of the predicted portrait map image, the discrimination result of the predicted portrait image, and the discrimination result of the predicted map image output by the initial discrimination model. These results are used to determine the discrimination loss of the corresponding image. The discriminator terms for the portrait domain and the map domain denote the portrait image discrimination loss and the map image discrimination loss, respectively, and D_t denotes the portrait map image discrimination loss.
[0154] The two mapping workflows provided in this embodiment of the invention, namely two-to-one and one-to-two, link the image generation model and the image decoupling model in an asymmetric cyclic workflow. During adversarial training jointly with the discrimination model, the models learn to obtain the predicted portrait map image by coupling a portrait image and a map image, and to obtain the portrait image and the map image by decoupling the predicted portrait map image. Thus, the learned mapping relationship can be directly applied in the application process to generate a portrait map image based on the portrait image and the map image.
[0155] Based on the above embodiments, Figure 3 is a schematic diagram of the image generation model provided by the present invention. As shown in Figure 3, the portrait image and the map image can first be input into the feature extraction layer of the image generation model, respectively. The feature extraction layer extracts features from the input portrait image and map image, thereby obtaining the portrait features of the portrait image and the map features of the map image output by the feature extraction layer.
[0156] It should be noted that the feature extraction layer here consists of a convolutional neural network including multiple convolutional layers (Convolution1, Convolution2, and Convolution3) and a deep residual network (Residual Neural Network Block); wherein Convolution1 has 64 channels, Convolution2 has 128 channels, and Convolution3 has 256 channels; each layer in the ResNet block has 256 channels.
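To make the channel progression concrete, the sketch below traces tensor shapes through the three convolutional layers. Only the channel counts (64, 128, 256) come from the description; the kernel sizes, strides, paddings, and the 256 x 256 input size are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, output channels, kernel, stride, padding); the last three are assumed.
LAYERS = [
    ("Convolution1", 64, 7, 1, 3),
    ("Convolution2", 128, 3, 2, 1),
    ("Convolution3", 256, 3, 2, 1),
]

def trace_shapes(height, width, layers=LAYERS):
    """Return (name, channels, height, width) after each layer in order."""
    shapes = []
    for name, channels, kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        shapes.append((name, channels, height, width))
    return shapes

shapes = trace_shapes(256, 256)
for name, channels, height, width in shapes:
    print(f"{name}: {channels} x {height} x {width}")
```

Under these assumed settings the residual blocks that follow would operate on 256-channel feature maps at a quarter of the input resolution; different kernel or stride choices would change the spatial sizes but not the channel counts stated in the description.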
[0157] Subsequently, portrait features and map features are input into the feature fusion layer of the image generation model. The feature fusion layer performs average pooling for the portrait domain to maintain the overall structure and max pooling for the feature map of the map domain to capture detailed information. Based on this, the weights (W) of the portrait features obtained after passing through the fully connected layer (fc) and the weights (W) of the map features are combined to calculate the portrait attention map and the map attention map. The number of channels in the fully connected layer here is 256.
[0158] Subsequently, in the feature fusion layer of the image generation model, the portrait attention features and the map attention features can be fused by concatenating the attention maps, and the portrait map features are obtained through a fully connected layer (fc) with 256 channels.
[0159] Subsequently, the portrait map image can be generated based on the fused portrait map features. This part also consists of a deep residual network (resnet block) and a convolutional neural network including multiple convolutional layers (Convolution3, Convolution2 and Convolution1). However, unlike the feature extraction layer, the features input to this part are passed through the resnet block, Convolution3, Convolution2 and Convolution1 in sequence to finally obtain the portrait map image.
[0160] The image generation apparatus provided by the present invention will be described below. The image generation apparatus described below can be referred to in correspondence with the image generation method described above.
[0161] Figure 4 is a schematic diagram of the image generation apparatus provided by the present invention. As shown in Figure 4, the apparatus includes:
[0162] Image determination unit 410 is used to determine the portrait image and map image to be generated;
[0163] The image generation unit 420 is used to extract features from the portrait image and the map image respectively based on the image generation model, fuse the portrait features and the map features based on the correlation between the extracted portrait features and map features, and generate an image based on the fused features to obtain a portrait map image.
[0164] The image generation model is obtained by adversarial training, jointly with a discrimination model, based on sample portrait images, sample map images, and sample portrait map images. The discrimination model is used to distinguish between the predicted portrait map image and the sample portrait map image. The predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0165] The image generation apparatus provided by this invention uses the correlation between portrait features and map features as a benchmark, fusing the two to obtain portrait map features that possess both facial contour information of the portrait image and mountain and river orientation information of the map image. Furthermore, it adds subtle features between the portrait and map images, thereby increasing the accuracy of the image generation process based on these portrait map features. This overcomes the shortcomings of traditional deep learning-based style transfer methods in portrait map artistic creation, achieving a dual improvement in the generation efficiency and accuracy of portrait map images. In addition, the introduction of generative and adversarial mechanisms to train the image generation model ensures the portrait map image generation capability of the trained model. Moreover, the image generation model not only generates portrait map images but also makes the output portrait map images closer to real portrait map images. While achieving efficient generation of portrait map images, it also ensures the naturalness and realism of the images, greatly improving the image quality of the portrait map images.
[0166] Based on the above embodiments, the image generation unit 420 is used for:
[0167] Based on the feature extraction layer in the image generation model, feature extraction is performed on the portrait image and the map image respectively to obtain portrait features and map features;
[0168] Based on the feature fusion layer in the image generation model, as well as the portrait features and the map features, portrait attention features and map attention features are determined;
[0169] Based on the feature fusion layer in the image generation model, the portrait attention features and the map attention features are fused.
[0170] Based on the above embodiments, the image generation unit 420 is used for:
[0171] Based on the feature fusion layer in the image generation model, the weights of the portrait features and the map features are determined.
[0172] Based on the feature fusion layer in the image generation model, the weights of the portrait features, and the weights of the map features, portrait attention features and map attention features are determined.
[0173] Based on the above embodiments, the device further includes a model determination unit, used for:
[0174] Construct an initial image generation model and an initial discrimination model;
[0175] The sample portrait image and the sample map image are input into the initial image generation model to obtain the predicted portrait map image output by the initial image generation model;
[0176] The predicted portrait map image and the sample portrait map image are input into the initial discrimination model to obtain the discrimination result of the predicted portrait map image and the discrimination result of the sample portrait map image output by the initial discrimination model;
[0177] Based on the predicted portrait map image, the sample portrait map image, the discrimination result of the predicted portrait map image, and the discrimination result of the sample portrait map image, the parameters of the initial image generation model and the initial discrimination model are updated to obtain the image generation model and the discrimination model.
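The role of the two discrimination results in the parameter update can be sketched as follows. The description does not state the form of the adversarial objective; the binary cross-entropy formulation below is the standard generative adversarial network objective and is used here only as an assumption. Each discrimination result is assumed to be a probability in (0, 1) that the input is a sample (real) portrait map image, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(prob, target):
    """Cross-entropy between a predicted probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(result_sample, result_predicted):
    """Low when the sample image is scored near 1 and the predicted
    image is scored near 0."""
    return (binary_cross_entropy(result_sample, 1.0)
            + binary_cross_entropy(result_predicted, 0.0))


def generator_adversarial_loss(result_predicted):
    """Low when the discrimination model scores the predicted portrait
    map image as if it were a sample image."""
    return binary_cross_entropy(result_predicted, 1.0)
```

Under this assumption, the initial discrimination model is updated to reduce `discriminator_loss`, while the initial image generation model is updated to reduce `generator_adversarial_loss`, so the two models are trained against each other.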
[0178] Based on the above embodiments, the model determination unit is further configured to:
[0179] Construct an initial image decoupling model;
[0180] The predicted portrait map image is input into the initial image decoupling model to obtain the predicted portrait image and predicted map image output by the initial image decoupling model;
[0181] The predicted portrait image, the predicted map image, the sample portrait image, and the sample map image are input into the initial discrimination model to obtain the discrimination results of the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image output by the initial discrimination model;
[0182] Based on the predicted image, the sample image, the discrimination result of the predicted image, and the discrimination result of the sample image, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0183] The predicted image includes the predicted portrait map image, the predicted portrait image and the predicted map image, and the sample image includes the sample portrait map image, the sample portrait image and the sample map image.
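The data flow through the three initial models can be sketched as below. The three `toy_*` functions are hypothetical stand-ins that operate on flat lists of pixel values; they are not the networks of the invention and exist only so that the flow can be executed. The dictionary keys are likewise illustrative names for the three kinds of image that make up the predicted image and the sample image.

```python
import math


def toy_generator(portrait, map_image):
    """Hypothetical stand-in: average the two inputs pixel by pixel."""
    return [(p + m) / 2 for p, m in zip(portrait, map_image)]


def toy_decoupler(portrait_map):
    """Hypothetical stand-in: return two copies of the input."""
    return list(portrait_map), list(portrait_map)


def toy_discriminator(image):
    """Hypothetical stand-in: sigmoid of the mean pixel value."""
    mean = sum(image) / len(image)
    return 1.0 / (1.0 + math.exp(-mean))


def predicted_images(generator, decoupler, sample_portrait, sample_map):
    """Generate the predicted portrait map image, then decouple it."""
    portrait_map = generator(sample_portrait, sample_map)
    portrait, map_image = decoupler(portrait_map)
    return {"portrait_map": portrait_map, "portrait": portrait, "map": map_image}


def discriminate(discriminator, predicted, sample):
    """Score each predicted image and its matching sample image."""
    return {
        kind: {
            "predicted": discriminator(predicted[kind]),
            "sample": discriminator(sample[kind]),
        }
        for kind in predicted
    }
```

A call such as `discriminate(toy_discriminator, predicted_images(toy_generator, toy_decoupler, portrait, map_image), sample)` yields one pair of discrimination results for each of the portrait map, portrait, and map images.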
[0184] Based on the above embodiments, the model determination unit is further configured to:
[0185] Based on the predicted portrait map image and the sample portrait map image, determine the image generation loss of the initial image generation model;
[0186] Based on the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image, determine the image decoupling loss of the initial image decoupling model;
[0187] Based on the discrimination results of the predicted image and the discrimination results of the sample image, the discrimination loss of the initial discrimination model is determined;
[0188] Based on the image generation loss, the image decoupling loss, and the discrimination loss, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
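One way the three losses might be computed and combined is sketched below. The description does not give their formulas, so the mean absolute (L1) difference and the weighted sum are assumptions, and the weights `w_gen`, `w_dec`, and `w_dis` are hypothetical hyperparameters. Images are represented as flat lists of pixel values for simplicity.

```python
def l1(image_a, image_b):
    """Mean absolute difference between two equal-length images."""
    return sum(abs(a - b) for a, b in zip(image_a, image_b)) / len(image_a)


def image_generation_loss(predicted, sample):
    """Compare the predicted and sample portrait map images."""
    return l1(predicted["portrait_map"], sample["portrait_map"])


def image_decoupling_loss(predicted, sample):
    """Compare the decoupled portrait and map with their samples."""
    return (l1(predicted["portrait"], sample["portrait"])
            + l1(predicted["map"], sample["map"]))


def total_loss(generation, decoupling, discrimination,
               w_gen=1.0, w_dec=1.0, w_dis=1.0):
    """Weighted sum of the three losses (weights are assumptions)."""
    return w_gen * generation + w_dec * decoupling + w_dis * discrimination
```

The single sum is a simplification: in adversarial training the discrimination model and the generation model are usually updated in alternating steps with opposing adversarial terms, and only the bookkeeping of the loss terms is shown here.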
[0189] Based on the above embodiments, the model determination unit is further configured to:
[0190] Based on the image generation loss, the image decoupling loss, the discrimination loss, and the cycle consistency loss constraint, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
[0191] The cycle consistency loss constraint is used to constrain the difference between the image features corresponding to the input and output images of the initial image generation model and the initial image decoupling model, respectively.
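The cycle consistency loss constraint can be read in more than one way. The sketch below follows one possible reading: the images input into the initial image generation model (the sample portrait image and sample map image) are compared in feature space with the images output by the initial image decoupling model (the predicted portrait image and predicted map image). The feature extractor `mean_pool_features` is a hypothetical stand-in, and the absolute-difference distance is an assumption.

```python
def mean_pool_features(image, window=2):
    """Hypothetical feature extractor: mean over non-overlapping windows."""
    chunks = [image[i:i + window] for i in range(0, len(image), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def feature_distance(feats_a, feats_b):
    """Mean absolute difference between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(feats_a, feats_b)) / len(feats_a)


def cycle_consistency_loss(sample_portrait, sample_map,
                           predicted_portrait, predicted_map,
                           feature_fn=mean_pool_features):
    """Penalise feature drift between the generator inputs and the
    images recovered by the decoupling model."""
    portrait_term = feature_distance(feature_fn(sample_portrait),
                                     feature_fn(predicted_portrait))
    map_term = feature_distance(feature_fn(sample_map),
                                feature_fn(predicted_map))
    return portrait_term + map_term
```

The loss is zero when decoupling the generated portrait map image recovers images whose features match those of the original inputs, and it grows as the recovered features drift away.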
[0192] Figure 5 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute an image generation method, which includes: determining a portrait image and a map image to be generated; extracting features from the portrait image and the map image respectively based on an image generation model; fusing the portrait features and the map features based on the correlation between the extracted portrait features and map features; and generating an image based on the fused features to obtain a portrait map image. The image generation model is obtained through adversarial training jointly with a discriminant model, based on sample portrait images, sample map images, and sample portrait map images. The discriminant model is used to distinguish between the predicted portrait map image and the sample portrait map image, and the predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0193] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0194] In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to execute the image generation method provided above. The method includes: determining a portrait image and a map image to be generated; extracting features from the portrait image and the map image respectively based on an image generation model; fusing the portrait features and the map features based on the correlation between the extracted portrait features and map features; and generating an image based on the fused features to obtain a portrait map image; wherein the image generation model is obtained through adversarial training jointly with a discriminant model, based on sample portrait images, sample map images, and sample portrait map images, the discriminant model is used to distinguish between a predicted portrait map image and the sample portrait map image, and the predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0195] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the image generation method provided above. The method includes: determining a portrait image and a map image to be generated; extracting features from the portrait image and the map image respectively based on an image generation model; fusing the portrait features and the map features based on the correlation between the extracted portrait features and map features; and generating an image based on the fused features to obtain a portrait map image; wherein the image generation model is obtained through adversarial training jointly with a discriminant model, based on sample portrait images, sample map images, and sample portrait map images, the discriminant model is used to distinguish between a predicted portrait map image and the sample portrait map image, and the predicted portrait map image is determined by the image generation model based on the sample portrait image and the sample map image.
[0196] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0197] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0198] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An image generation method, characterized in that the method includes: determining the portrait image and map image to be generated; and, based on the image generation model, extracting features from the portrait image and the map image respectively, fusing the portrait features and map features based on the correlation between the portrait features and map features obtained from the feature extraction, and generating an image based on the fused features to obtain a portrait map image. The image generation model is obtained by adversarial training jointly with an initial discriminant model, based on sample portrait images, sample map images, and sample portrait map images, as well as predicted portrait images, predicted map images, and predicted portrait map images. The initial discriminant model is used to distinguish between the predicted portrait map image and the sample portrait map image, between the predicted portrait image and the sample portrait image, and between the predicted map image and the sample map image. The predicted portrait map image is determined by the initial image generation model based on the sample portrait image and the sample map image, and the predicted portrait image and the predicted map image are obtained by decoupling the predicted portrait map image.
2. The image generation method according to claim 1, characterized in that, The image generation model extracts features from the portrait image and the map image respectively. Based on the correlation between the extracted portrait features and map features, the portrait features and map features are fused, including: Based on the feature extraction layer in the image generation model, feature extraction is performed on the portrait image and the map image respectively to obtain portrait features and map features; Based on the feature fusion layer in the image generation model, as well as the portrait features and the map features, portrait attention features and map attention features are determined; Based on the feature fusion layer in the image generation model, the portrait attention features and the map attention features are fused.
3. The image generation method according to claim 2, characterized in that, The determination of portrait attention features and map attention features based on the feature fusion layer in the image generation model, as well as the portrait features and the map features, includes: Based on the feature fusion layer in the image generation model, the weights of the portrait features and the map features are determined. Based on the feature fusion layer in the image generation model, the weights of the portrait features, and the weights of the map features, portrait attention features and map attention features are determined.
4. The image generation method according to any one of claims 1 to 3, characterized in that, The steps for determining the image generation model and the discrimination model include: Construct the initial image generation model and the initial discrimination model; The sample portrait image and the sample map image are input into the initial image generation model to obtain the predicted portrait map image output by the initial image generation model; The predicted portrait map image and the sample portrait map image, the predicted portrait image and the sample portrait image, and the predicted map image and the sample map image are input into the initial discrimination model to obtain the discrimination results of the predicted portrait map image and the sample portrait map image, the discrimination results of the predicted portrait image and the sample portrait image, and the discrimination results of the predicted map image and the sample map image output by the initial discrimination model. Based on the predicted portrait map image and the sample portrait map image, the predicted portrait image and the sample portrait image, the predicted map image and the sample map image, and the discrimination results of the predicted portrait map image and the sample portrait map image, the discrimination results of the predicted portrait image and the sample portrait image, and the discrimination results of the predicted map image and the sample map image, the parameters of the initial image generation model and the initial discrimination model are updated to obtain the image generation model and the discrimination model.
5. The image generation method according to claim 4, characterized in that, The step of updating the parameters of the initial image generation model and the initial discrimination model to obtain the image generation model and the discrimination model includes: Construct an initial image decoupling model; The predicted portrait map image is input into the initial image decoupling model to obtain the predicted portrait image and the predicted map image output by the initial image decoupling model; Based on the predicted portrait map image and the sample portrait map image, the predicted portrait image and the sample portrait image, the predicted map image and the sample map image, and the discrimination results of the predicted portrait map image and the sample portrait map image, the discrimination results of the predicted portrait image and the sample portrait image, the discrimination results of the predicted map image and the sample map image, the parameters of the initial image generation model, the initial discrimination model and the initial image decoupling model are updated to obtain the image generation model, the discrimination model and the image decoupling model.
6. The image generation method according to claim 5, characterized in that, The method involves updating the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model based on the predicted portrait map image and the sample portrait map image, the predicted portrait image and the sample portrait image, the predicted map image and the sample map image, and the discrimination results of the predicted portrait map image and the sample portrait map image, the discrimination results of the predicted portrait image and the sample portrait image, and the discrimination results of the predicted map image and the sample map image, to obtain the image generation model, the discrimination model, and the image decoupling model, including: Based on the predicted portrait map image and the sample portrait map image, determine the image generation loss of the initial image generation model; Based on the predicted portrait image, the predicted map image, the sample portrait image, and the sample map image, determine the image decoupling loss of the initial image decoupling model; Based on the discrimination results of the predicted portrait map image and the sample portrait map image, the discrimination results of the predicted portrait image and the sample portrait image, and the discrimination results of the predicted map image and the sample map image, the discrimination loss of the initial discrimination model is determined. Based on the image generation loss, the image decoupling loss, and the discrimination loss, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model.
7. The image generation method according to claim 6, characterized in that, The process of updating the parameters of the initial image generation model, the initial discriminant model, and the initial image decoupling model based on the image generation loss, the image decoupling loss, and the discriminant loss to obtain the image generation model, the discriminant model, and the image decoupling model includes: Based on the image generation loss, the image decoupling loss, the discrimination loss, and the cycle consistency loss constraint, the parameters of the initial image generation model, the initial discrimination model, and the initial image decoupling model are updated to obtain the image generation model, the discrimination model, and the image decoupling model. The cycle consistency loss constraint is used to constrain the difference between the image features corresponding to the input and output images of the initial image generation model and the initial image decoupling model, respectively.
8. An image generation apparatus, characterized in that the apparatus includes: an image determination unit, used to determine the portrait image and map image to be generated; and an image generation unit, used to extract features from the portrait image and the map image respectively based on the image generation model, fuse the portrait features and the map features based on the correlation between the extracted portrait features and map features, and generate an image based on the fused features to obtain a portrait map image. The image generation model is obtained by adversarial training jointly with an initial discriminant model, based on sample portrait images, sample map images, and sample portrait map images, as well as predicted portrait images, predicted map images, and predicted portrait map images. The initial discriminant model is used to distinguish between the predicted portrait map image and the sample portrait map image, between the predicted portrait image and the sample portrait image, and between the predicted map image and the sample map image. The predicted portrait map image is determined by the initial image generation model based on the sample portrait image and the sample map image, and the predicted portrait image and the predicted map image are obtained by decoupling the predicted portrait map image.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the image generation method as described in any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the image generation method as described in any one of claims 1 to 7.