Model training method, image generation method, device, electronic equipment, computer readable storage medium and computer program product

By performing multiple masking processes and parameter adjustments on the generated images, the problems of mode collapse and instability during the training process of generative adversarial networks are solved, achieving efficient training of image generation models and high-quality, diverse image generation.

CN122244577A · Pending · Publication Date: 2026-06-19 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2024-12-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Generative adversarial networks (GANs) are prone to mode collapse and training instability, which causes the generator to output fixed images and reduces the diversity and quality of the generated images.

Method used

By performing multiple masking processes on the generated image, multiple masked images are obtained. The model parameters of the image generation model are then adjusted based on the masked images and training target information to suppress some features in the generated image and avoid overtraining.

Benefits of technology

It improves the training efficiency of image generation models and the diversity and realism of generated images, ensuring that the model can fully learn the features of sample images and generate high-quality and diverse images.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244577A_ABST

Abstract

This application provides an image generation model training method, an image generation method, an apparatus, an electronic device, a computer program product, and a computer-readable storage medium. The image generation model training method includes: acquiring multiple sample images and an image generation model to be trained; inputting the sample images into the image generation model to be trained to generate a generated image corresponding to each sample image; performing multiple masking processes on each generated image to obtain multiple masked images corresponding to the generated image; and adjusting the model parameters of the image generation model to be trained based on the masked images and the training target information of the masked images to obtain a trained image generation model. This application can improve the training efficiency of image generation models and the realism of generated images.

Description

Technical Field

[0001] This application relates to computer technology, and more particularly to an image generation model training method, an image generation method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background Technology

[0002] Image generation models are artificial intelligence models capable of automatically generating images. These models typically employ generative adversarial networks (GANs), consisting of a generator and a discriminator. The generator aims to produce images as realistic as possible, while the discriminator distinguishes between the generator's images and real images. Through this adversarial process, the generator gradually learns to generate high-quality images. However, during the training process, the generator's output is suppressed by the discriminator, which can easily lead to mode collapse. This can cause the generator to output only low-quality and fixed images, negatively impacting the overall image generation quality.

Summary of the Invention

[0003] This application provides an image generation model training method, an image generation method, an apparatus, a computer-readable storage medium, and a computer program product, which can improve the training efficiency of the image generation model and the realism of the generated images.

[0004] The technical solution of this application embodiment is implemented as follows:

[0005] This application provides an image generation model training method, the method comprising:

[0006] Acquire multiple sample images and an image generation model to be trained;

[0007] The sample images are input into the image generation model to be trained to generate a generated image corresponding to each sample image.

[0008] For each generated image, the generated image is subjected to multiple masking processes to obtain multiple masked images corresponding to the generated image;

[0009] Based on the masked image and the training target information of the masked image, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model.

[0010] This application provides an image generation method, the method comprising:

[0011] When the time for image generation is reached, a trained image generation model is obtained. The trained image generation model is obtained by training using the image generation model training method provided in the embodiments of this application. The trained image generation model includes a trained generator.

[0012] Using the trained generator, at least one target image is generated;

[0013] Output the at least one target image.

[0014] This application provides an image generation model training device, including:

[0015] The first acquisition module is used to acquire multiple sample images and the image generation model to be trained.

[0016] The first generation module is used to input the sample image into the image generation model to be trained, and generate a generated image corresponding to each sample image;

[0017] The model training module is used to perform multiple masking processes on each generated image to obtain multiple masked images corresponding to the generated image.

[0018] The model training module is further configured to adjust the model parameters of the image generation model to be trained based on the masked image and the training target information of the masked image, so as to obtain the trained image generation model.

[0019] This application provides an image generation apparatus, the apparatus comprising:

[0020] The second acquisition module is used to acquire the trained image generation model when the image generation time is reached. The trained image generation model is obtained by training using the image generation model training method provided in the embodiments of this application. The trained image generation model includes a trained generator.

[0021] The second generation module is used to generate at least one target image using the trained generator;

[0022] An output module is used to output the at least one target image.

[0023] This application provides an electronic device, the electronic device comprising:

[0024] Memory is used to store executable instructions or computer programs.

[0025] The processor, when executing computer-executable instructions or computer programs stored in the memory, implements the image generation model training method or image generation method provided in the embodiments of this application.

[0026] This application provides a computer-readable storage medium storing a computer program or computer-executable instructions, which, when executed by a processor, implements the image generation model training method or image generation method provided in this application.

[0027] This application provides a computer program product, including a computer program or computer executable instructions. When the computer program or computer executable instructions are executed by a processor, they implement the image generation model training method or image generation method provided in this application.

[0028] The embodiments of this application have the following beneficial effects:

[0029] In this embodiment, multiple sample images and an image generation model to be trained are acquired. The sample images are input into the image generation model to generate a generated image corresponding to each sample image. Multiple masking processes are performed on each generated image to obtain multiple masked images. Based on the masked images and training target information, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model. In this way, by masking the generated images, some features in the generated images can be suppressed at each masking stage. When training the image generation model based on the masked images and training target information, the model parameters are adjusted only based on a portion of the features of the generated images, avoiding overtraining that could lead to the model only generating fixed types of images, thereby improving the diversity of images generated by the image generation model. Furthermore, different masking processes can suppress different features in the generated images, improving the diversity of generated images while ensuring that the image generation model can fully learn the features of the sample images, thus improving the quality and realism of the generated images. The training process provided in this application is simple and efficient, and can ensure that the trained image generation model can generate high-quality and diverse images, thereby improving the training efficiency of the image generation model and the realism of the generated images.

Attached Figure Description

[0030] Figure 1 is a schematic diagram of the model training system architecture provided in the embodiments of this application;

[0031] Figure 2A is a schematic diagram of the structure of server 200-1 provided in an embodiment of this application;

[0032] Figure 2B is a schematic diagram of the structure of server 200-2 provided in the embodiments of this application;

[0033] Figure 3A is a flowchart illustrating the image generation model training method provided in an embodiment of this application;

[0034] Figure 3B is a schematic diagram of the masking process provided in the embodiments of this application;

[0035] Figure 3C is a schematic diagram of the model parameter adjustment process provided in the embodiments of this application;

[0036] Figure 3D is a schematic diagram of the process for determining the total loss value provided in an embodiment of this application;

[0037] Figure 3E is a schematic diagram of another masking process provided in an embodiment of this application;

[0038] Figure 4 is a schematic flowchart of the image generation method provided in the embodiments of this application;

[0039] Figure 5 is a schematic flowchart of the image generation model training method in the related technology provided in the embodiments of this application;

[0040] Figure 6 is another schematic diagram of the image generation model training method provided in the embodiments of this application;

[0041] Figure 7 is another flowchart illustrating the image generation model training method provided in the embodiments of this application;

[0042] Figure 8A is a schematic diagram of the image generation effect provided in the embodiments of this application;

[0043] Figure 8B is another schematic diagram of the image generation effect provided in the embodiments of this application.

Detailed Implementation

[0044] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations on this application. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0045] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

[0046] In the following description, the terms "first, second, third" are used merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first, second, third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.

[0047] In this application embodiment, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0048] Unless otherwise defined, all technical and scientific terms used in the embodiments of this application have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the embodiments of this application is for the purpose of describing the embodiments of this application only and is not intended to limit this application.

[0049] In the implementation of this application, the collection and processing of relevant data should strictly comply with the requirements of relevant laws and regulations, obtain the informed consent or separate consent of the personal information subject, and carry out subsequent data use and processing within the scope of laws and regulations and the authorization of the personal information subject.

[0050] Before providing a further detailed description of the embodiments of this application, the nouns and terms involved in the embodiments of this application will be explained, and the nouns and terms involved in the embodiments of this application shall be interpreted as follows.

[0051] 1) Generative Adversarial Networks (GANs): A deep learning model consisting of two parts: a generator and a discriminator. Through competition between the two, they continuously iterate and update. The generator gradually learns to generate higher quality and more realistic samples, while the discriminator gradually learns to make more accurate judgments.

[0052] 2) Generator: Responsible for generating data. It takes a random noise vector as input, processes it through a series of neural network layers, and outputs a sample similar to the training dataset. This sample is usually random and unrealistic at the beginning, but it will gradually become more realistic as training progresses.

[0053] 3) Discriminator: Responsible for determining whether the input data is a real sample or a fake sample generated by the generator. It receives samples from the real dataset and the generator's output, and then classifies these samples through a neural network layer.

[0054] 4) Overfitting: This refers to a model learning the training data too thoroughly, so much so that it not only learns the effective patterns in the data, but also the noise and details, resulting in a decrease in the model's generalization ability, that is, its performance on new datasets is far worse than on the training set.

[0055] In related technologies, the training process of an image generation model using a generative adversarial network is shown in Figure 5, which is a schematic flowchart of the image generation model training method in the related technology. The generator 51 generates a generated image 53 corresponding to the sample image 54 and inputs it into the discriminator 52. The discriminator 52 discriminates the generated image 53 based on the generated image 53 and the sample image 54 to obtain a prediction result 55. Then, the prediction result 55 is compared with the training target information 56 to determine the total loss value 57. The total loss value 57 includes a first loss value 571 and a second loss value 572. The generator 51 is trained using backpropagation through the first loss value 571, and the discriminator 52 is trained using backpropagation through the second loss value 572.
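The two loss values in this related-art flow can be illustrated with a minimal NumPy sketch. It assumes the common binary cross-entropy GAN objective with a non-saturating generator loss, which the text does not specify; the function names and the example discriminator scores are hypothetical.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between scores in (0, 1) and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def gan_losses(d_real, d_fake):
    """Return (first_loss, second_loss) for the generator and discriminator.

    d_real: discriminator scores on sample (real) images.
    d_fake: discriminator scores on generated images.
    Assumption: standard non-saturating GAN losses, not taken from the source.
    """
    # First loss value: the generator wants generated images scored as real (1).
    first_loss = bce(d_fake, np.ones_like(d_fake))
    # Second loss value: the discriminator wants real -> 1 and generated -> 0.
    second_loss = (bce(d_real, np.ones_like(d_real))
                   + bce(d_fake, np.zeros_like(d_fake)))
    return first_loss, second_loss

d_real = np.array([0.9, 0.8])   # hypothetical scores on sample images
d_fake = np.array([0.2, 0.1])   # hypothetical scores on generated images
first_loss, second_loss = gan_losses(d_real, d_fake)
```

With these example scores the discriminator is confident and correct, so the generator's loss is the larger of the two, which is the situation in which the discriminator suppresses the generator's output.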

[0056] The training process of image generation models in related technologies has the following problems:

[0057] 1. Mode collapse: Because generative adversarial networks require the generator and discriminator to compete against each other during training to generate realistic images or data, the generator's output will be suppressed by the discriminator. In this case, the generator cannot fully learn the distribution characteristics of the sample images. In order to minimize the training loss, the generator can only output certain fixed images, such as images of a single category or images that are almost exactly the same.

[0058] 2. Unstable training: The discriminator is prone to overfitting during training, which causes the data distribution generated by the generator to not overlap with the real data distribution. This results in poor generation quality or limited synthesis diversity, failing to achieve the expected results.

[0059] This application provides an image generation model training method, image generation method, apparatus, device, computer-readable storage medium, and computer program product, which can improve the efficiency and accuracy of model training. The following describes exemplary applications of the electronic devices provided in this application. These devices can be implemented as various types of terminals such as laptops, tablets, desktop computers, set-top boxes, smartphones, smart speakers, smartwatches, smart TVs, and in-vehicle terminals, or as servers. The following describes exemplary applications when the device is implemented as a server.

[0060] See Figure 1, which is a schematic diagram of the architecture of the image generation system provided in the embodiments of this application. Figure 1 involves a database 100, a server 200, a network 300, and a terminal 400. For example, the terminal 400 can be a smartphone, and the terminal 400 can connect to the server 200 via the network 300. The image generation model can be stored in the database 100, which can be independent of the server 200 or deployed on the server 200; Figure 1 shows, by way of example, database 100 independent of server 200. Network 300 can be a wide area network (WAN), a local area network (LAN), or a combination of both. Server 200 can acquire multiple sample images and an image generation model to be trained; the sample images are input into the image generation model to generate a generated image corresponding to each sample image; for each generated image, multiple masking processes are performed to obtain multiple masked images; based on the masked images and the training target information of the masked images, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model, which is then stored in database 100. When image generation is needed, server 200 retrieves the image generation model from database 100 to generate the target image and sends the target image to terminal 400 for display on the terminal's interface.

[0061] Alternatively, the terminal 400 can acquire multiple sample images and an image generation model to be trained. The sample images are input into the image generation model to generate a generated image for each sample image. For each generated image, multiple masking processes are performed to obtain multiple masked images. Based on the masked images and the training target information of the masked images, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model. Then, when image generation is needed, the terminal 400 calls the image generation model to generate the target image and displays it through the terminal 400's display interface.

[0062] Taking the case where server 200 performs model training as an example, see Figure 2A, which is a schematic diagram of the structure of server 200-1 provided in an embodiment of this application. The server 200-1 shown in Figure 2A includes at least one processor 210-1, memory 230-1, and at least one network interface 220-1. The various components in server 200-1 are coupled together via a bus system 240-1. It is understood that the bus system 240-1 is used to implement communication between these components. In addition to a data bus, the bus system 240-1 also includes a power bus, a control bus, and a status signal bus. However, for clarity, all buses are labeled as bus system 240-1 in Figure 2A.

[0063] The processor 210-1 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor, etc.

[0064] The memory 230-1 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, optical disk drives, etc. The memory 230-1 may optionally include one or more storage devices physically located away from the processor 210-1.

[0065] Memory 230-1 may include volatile memory or non-volatile memory, or both. Non-volatile memory may be read-only memory (ROM), and volatile memory may be random access memory (RAM). The memory 230-1 described in this application embodiment is intended to include any suitable type of memory.

[0066] In some embodiments, memory 230-1 is capable of storing data to support various operations, examples of which include programs, modules, and data structures or subsets or supersets thereof, as illustrated below.

[0067] Operating system 231-1 includes system programs for handling various basic system services and performing hardware-related tasks, such as the framework layer, core library layer, driver layer, etc., for implementing various basic business functions and handling hardware-based tasks;

[0068] The network communication module 232-1 is used to reach other electronic devices via one or more (wired or wireless) network interfaces 220-1; exemplary network interfaces 220-1 include Bluetooth, WiFi, and Universal Serial Bus (USB).

[0069] In some embodiments, the apparatus provided in this application can be implemented in software. Figure 2A shows an image generation model training device 233 stored in memory 230-1. This device can be software in the form of programs and plug-ins, and includes the following software modules: a first acquisition module 2331, a first generation module 2332, and a model training module 2333. These modules are logically connected and can therefore be arbitrarily combined or further separated according to their implemented functions. The functions of each module will be described below.

[0070] Taking the case where server 200 performs image generation as an example, see Figure 2B, which is a schematic diagram of the structure of server 200-2 provided in the embodiments of this application. The server 200-2 shown in Figure 2B includes at least one processor 210-2, memory 230-2, and at least one network interface 220-2. The various components in server 200-2 are coupled together via a bus system 240-2. It is understood that the bus system 240-2 is used to implement communication between these components. In addition to a data bus, the bus system 240-2 also includes a power bus, a control bus, and a status signal bus. However, for clarity, all buses are labeled as bus system 240-2 in Figure 2B. For detailed descriptions of processor 210-2 and memory 230-2, please refer to the above text; they will not be repeated here.

[0071] In some embodiments, the apparatus provided in this application can be implemented in software. Figure 2B shows an image generation apparatus 234 stored in memory 230-2. This apparatus can be software in the form of programs and plug-ins, and includes the following software modules: a second acquisition module 2341, a second generation module 2342, and an output module 2343. These modules are logically connected and can therefore be arbitrarily combined or further separated according to their implemented functions. The functions of each module will be described below.

[0072] In other embodiments, the apparatus provided in this application can be implemented in hardware. For example, the apparatus provided in this application can be a processor in the form of a hardware decoding processor, which is programmed to execute the image generation model training method or image generation method provided in this application. For example, the processor in the form of a hardware decoding processor can be one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

[0073] The image generation model training method provided in this application will be described in conjunction with exemplary applications and implementations of the terminals provided in the embodiments of this application.

[0074] The image generation model training method provided in the embodiments of this application will be described below. As mentioned above, the electronic device implementing the image generation model training method of the embodiments of this application can be a terminal, a server, or a combination of both. Therefore, the executing entity of each step will not be described again below.

[0075] It should be noted that the model training examples below are illustrated using an image generation model. Those skilled in the art can apply the image generation model training method provided in the embodiments of this application to the training of other types of models based on their understanding of the following text.

[0076] See Figure 3A, which is a flowchart illustrating the image generation model training method provided in this application embodiment. The method will be described with reference to the steps shown in Figure 3A.

[0077] In step 101, multiple sample images and an image generation model to be trained are obtained.

[0078] Here, the sample images are real image data used to train the image generation model. They are images collected from the real world or a specific domain, and can be various images collected from the natural environment, such as landscapes, portraits, and animals. Multiple sample images can be obtained from publicly available datasets, such as the Animal Faces-HQ Cat dataset (AFHQ-Cat) and the Flickr-Faces-HQ dataset (FFHQ). AFHQ-Cat is a dataset specifically for cat face images, designed to provide high-quality, high-resolution cat face images for machine learning models. FFHQ is a high-quality human face image dataset containing a large number of high-resolution human face images. The image generation model to be trained is a generative adversarial network (GAN), consisting of a generator and a discriminator. It can create new, style-similar images by learning the features and patterns of the sample data. The generator is responsible for generating images, while the discriminator is responsible for distinguishing between generated and real images. The two compete with each other, continuously improving the quality of the generated images.

[0079] In step 102, the sample images are input into the image generation model to be trained to generate the generated image corresponding to each sample image.

[0080] Here, the generated images are realistic and diverse in style, used to simulate real sample images. The generated images are of the same type as the sample images; for example, if the sample image is a landscape image, the generated image will also be a landscape image; if the sample image is a person image, the generated image will also be a person image; if the sample image is an animal image, the generated image will also be an animal image. The image generation model to be trained learns the features and patterns of the sample data, enabling it to generate new images of the same type as the sample images. Therefore, by inputting a sample image into the image generation model to be trained, the corresponding generated image can be generated.

[0081] In step 103, for each generated image, the generated image is subjected to multiple masking processes to obtain multiple masked images corresponding to the generated image.

[0082] Here, masking is used to hide, isolate, or highlight specific regions in the generated image. The generated image can be masked using a mask image corresponding to the mask data. The mask image is a binary image in which the pixel values of non-masked areas are typically 1 and the pixel values of masked areas are 0. Multiplying the mask image pixel-by-pixel with the generated image covers the masked regions in the generated image, resulting in a masked image. The pixel values of the masked regions in the masked image are therefore 0, and those regions no longer carry the original pixel information; in other words, some areas of the masked image are covered. The training target information for the masked image indicates whether the masked image is a real image or a generated image; alternatively, the training target information for the masked image can be the prediction result output by the discriminator when performing prediction processing on the sample image. The training target information is either 1 or 0, where 1 indicates that the masked image is a real image and 0 indicates that it is a generated image.
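The pixel-by-pixel multiplication described above can be sketched in a few lines of NumPy. This is only an illustration: the image size, the masked region, and the names apply_mask, REAL, and GENERATED are hypothetical and not taken from the application.

```python
import numpy as np

def apply_mask(image, mask):
    """Cover masked regions by pixel-wise multiplication with a binary mask.

    image: H x W x C array (the generated image).
    mask:  H x W array; 1 = keep the pixel, 0 = cover the pixel.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    # Broadcast the mask over the channel axis; covered pixels become 0.
    return image * mask[..., np.newaxis]

generated = np.full((4, 4, 3), 0.5)   # hypothetical 4x4 RGB generated image
mask = np.ones((4, 4))
mask[1:3, 1:3] = 0                    # cover the central 2x2 block
masked = apply_mask(generated, mask)

REAL, GENERATED = 1, 0                # training target information values
target = GENERATED                    # this masked image came from the generator
```

Pixels under a 0 entry of the mask lose their original values, while pixels under a 1 entry pass through unchanged.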

[0083] In some embodiments, each generated image is used to train the image generation model to be trained for N iterations, where N is an integer greater than 2, and the generated image is masked once in each training iteration.

[0084] Here, N is preset based on the developers' experience and can be an integer such as 3, 5, or 10. For example, if N is 4, then the image generation model to be trained will undergo 4 training iterations using each generated image. Within each training iteration, the masked image is fixed. Before the next training iteration, the generated image undergoes a new masking process that covers other regions of the generated image, producing a new masked image, which is then used for that training iteration. In this way, the image generation model to be trained can focus on the feature information of different locations in the generated image in different training iterations.

[0085] As shown in Figure 3B, step 103, "performing multiple masking processes on the generated image to obtain multiple masked images corresponding to the generated image," can be implemented through steps 1031 and 1032:

[0086] In step 1031, for the k-th training iteration, the feature image corresponding to the generated image and the k-th mask data corresponding to the k-th training iteration are obtained.

[0087] Here, a feature image is obtained by extracting features from the generated image. The feature image includes important features of the generated image, such as edges, textures, colors, and shapes. Mask data is a set of binary data used to identify specific regions in the generated image; it can be an array of numbers or a matrix. The mask data can be a binary mask consisting of two values, 1 and 0, where 1 marks regions of the generated image that do not require masking and 0 marks regions that require masking. The mask data can be converted into a mask image, which is a binary image with the same size as the feature image. Pixel values in non-masked regions of the mask image are typically 1, while pixel values in masked regions are 0. k is an integer. When k = 1, it indicates the first training iteration; in this case, the first mask data is either preset or randomly generated. When k = 2, 3, ..., N, the k-th mask data is obtained by randomly updating the (k-1)-th mask data corresponding to the (k-1)-th training iteration. For example, the 7th mask data is obtained by randomly updating the 6th mask data corresponding to the 6th training iteration. Alternatively, mask data that is preset to update on a timer can be used. Such mask data is time-consistent with the training iterations; that is, the interval between mask data updates equals the time required for one training iteration. For example, if the mask data is updated every 2 seconds and a training iteration also takes 2 seconds, then the first training iteration ends 2 seconds after it starts, at which point the mask data has been updated for the first time, and so on. Thus, the mask data is updated exactly at the end of each training iteration. At that point, the generated image is masked using the latest mask data to obtain the masked image needed for the next training iteration, and so on, until all training iterations are completed.
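The mask schedule described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of this application: the function names `random_mask` and `update_mask`, the `mask_ratio` and `flip_prob` parameters, and the rule of flipping each entry with a fixed probability are all assumptions, since the text only states that the k-th mask data is obtained by randomly updating the (k-1)-th mask data.

```python
import random


def random_mask(height, width, mask_ratio, rng):
    """Return a binary mask as a list of rows: 0 = masked, 1 = kept."""
    return [[0 if rng.random() < mask_ratio else 1 for _ in range(width)]
            for _ in range(height)]


def update_mask(prev_mask, flip_prob, rng):
    """Randomly update the previous mask (assumed rule: flip each entry
    with probability flip_prob)."""
    return [[1 - v if rng.random() < flip_prob else v for v in row]
            for row in prev_mask]


rng = random.Random(0)
num_iterations = 4  # N training iterations

# k = 1: the first mask data is randomly generated (it could also be preset).
masks = [random_mask(4, 4, mask_ratio=0.25, rng=rng)]

# k = 2..N: each mask data is derived from the previous one.
for k in range(2, num_iterations + 1):
    masks.append(update_mask(masks[-1], flip_prob=0.2, rng=rng))
```

Each entry of `masks` is used for exactly one training iteration, so the mask stays fixed within an iteration and changes only between iterations.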

[0088] In step 1032, the feature image is masked based on the k-th mask data to obtain the masked image corresponding to the k-th training iteration.

[0089] Here, masking is achieved by multiplying the mask data with the pixel values in the feature image: each pixel value in the feature image is multiplied by the corresponding value in the mask. Pixel values in the masked region are multiplied by 0, turning them black, while pixel values in the non-masked region are multiplied by 1, leaving the original pixel unchanged. Each training iteration produces a corresponding masked image; therefore, the masked image for the k-th training iteration is used only in the k-th training iteration. Since N training iterations are performed in total, the same generated image is masked N times using N mask data, resulting in N masked images.
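The element-wise multiplication described above can be illustrated with a small sketch. The function name `apply_mask` and the sample values are hypothetical; a single-channel feature image stored as nested lists is assumed for simplicity.

```python
def apply_mask(feature_image, mask):
    """Multiply each feature value by the corresponding mask value (0 or 1)."""
    same_shape = len(feature_image) == len(mask) and all(
        len(f_row) == len(m_row) for f_row, m_row in zip(feature_image, mask)
    )
    if not same_shape:
        raise ValueError("mask must have the same size as the feature image")
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_image, mask)]


feature_image = [[0.5, 1.2, 0.3],
                 [0.9, 0.1, 0.7]]
mask = [[1, 0, 1],
        [0, 1, 1]]

# Masked positions become 0; all other positions keep their original value.
masked_image = apply_mask(feature_image, mask)
```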

[0090] In this embodiment, each generated image is used for N training iterations. In each training iteration, the generated image is masked once. For the k-th training iteration, the feature image corresponding to the generated image and the k-th mask data corresponding to the k-th training iteration are obtained. Based on the k-th mask data, the feature image is masked to obtain the masked image corresponding to the k-th training iteration. Thus, the masked image remains unchanged within the same training iteration, but is dynamically updated in different training iterations. This relatively dynamic masking method enables the image generation model to focus on the features of different local regions in the generated image over time, allowing the model to better adapt to the image features and thereby improving the realism of the images generated by the trained model.

[0091] In step 104, based on the masked image and the training target information of the masked image, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model.

[0092] Here, the training target information of the masked image is used to measure the difference between the output of the image generation model to be trained and the expected result. As one example, the training target information of the masked image is the prediction result that the discriminator outputs for a sample image, that is, 1. In this case, the discriminator first performs prediction processing on the masked image to obtain a prediction result, and a loss function is then used to calculate the total loss value between the prediction result of the masked image and the training target information. The total loss value characterizes the difference between the generated image and the sample image in the unmasked region. The total loss value is then backpropagated through the image generation model to be trained, thereby adjusting the model parameters and improving the model's performance. As another example, the training target information of the masked image is the feature image of the sample image corresponding to the generated image. Since the masked image is also a feature image, a loss function can be used to calculate the total loss value between the masked image and the feature image of the sample image. The total loss value is then backpropagated through the image generation model to be trained, thereby adjusting the model parameters and resulting in the trained image generation model.

[0093] In some embodiments, as shown in Figure 3C, step 104 can be implemented through steps 1041 to 1043:

[0094] In step 1041, for the k-th training iteration, the image generation model to be trained is used to perform M prediction processes based on the k-th masked image, resulting in M prediction results.

[0095] Here, prediction processing refers to the classification of the masked image by the discriminator in the image generation model to be trained, with a prediction result of 1 or 0. A prediction result of 1 indicates that the discriminator considers the input data to be a real image, while a prediction result of 0 indicates that the discriminator considers the input data to be a fake image. Alternatively, the prediction result can be a probability value between 0 and 1, such as 0.4, 0.75, etc. The closer the prediction result is to 1, the higher the probability that the discriminator considers the input data to be a real image; the closer the prediction result is to 0, the higher the probability that the discriminator considers the input data to be a fake image. M is an integer, such as 3, 9, 11, etc., which is preset by the developers based on experience. In each training iteration, the discriminator in the image generation model to be trained performs prediction processing M times on the masked image in this training iteration, obtaining M prediction results.

[0096] In step 1042, the total loss value is determined based on the M prediction results and the training target information of the masked image.

[0097] Here, the training target information of the masked image is the prediction result output by the discriminator when it performs prediction processing on the sample image, and the training target information is 1. The total loss value is used to measure the difference between the generated image output by the image generation model and the sample image. The total loss value can include the loss value used to train the generator and the loss value used to train the discriminator.

[0098] In step 1043, the model parameters of the image generation model to be trained are adjusted based on the total loss value.

[0099] Here, the model parameters of the image generation model to be trained are gradually adjusted using the gradient descent algorithm based on the total loss value in order to reduce the total loss value.

[0100] Specifically, after the k-th training iteration of the image generation model to be trained is completed, the k-th mask data is randomly updated to obtain the (k+1)-th mask data corresponding to the (k+1)-th training iteration. In this way, the mask data is automatically updated at the end of each training iteration, thereby providing the masked image needed for the next training iteration and improving model training efficiency.

[0101] In this embodiment, for each training iteration, the image generation model to be trained performs M prediction processes based on the k-th masked image, obtaining M prediction results. The total loss value is determined based on the M prediction results and the training target information of the masked image. The model parameters of the image generation model to be trained are then adjusted based on the total loss value. In this way, a more accurate total loss value can be determined from multiple prediction results, allowing more precise adjustment of the model parameters of the image generation model to be trained and improving model training efficiency and accuracy.
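As an illustration of steps 1041 to 1043, the sketch below combines M prediction results into a single total loss value against a training target of 1. The text does not specify the loss function or how the M results are combined; binary cross-entropy averaged over the M predictions is assumed here, and the function name `total_loss` is hypothetical.

```python
import math


def total_loss(predictions, target=1.0, eps=1e-12):
    """Mean binary cross-entropy between M predictions and the target (assumed form)."""
    losses = []
    for p in predictions:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(target * math.log(p)
                        + (1.0 - target) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# M = 3 probability-valued prediction results for one masked image.
predictions = [0.4, 0.75, 0.6]
loss = total_loss(predictions)
```

The closer the predictions are to the target of 1, the smaller the resulting loss value, which is the quantity that gradient descent then reduces.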

[0102] In some embodiments, the image generation model to be trained includes a generator and a discriminator. Therefore, when generating the generated image corresponding to each sample image, the generator can be used directly; when performing prediction processing, the discriminator can be used directly to perform M prediction processes based on the k-th masked image to obtain M prediction results.

[0103] In some embodiments, as shown in Figure 3D, step 1042 can be implemented through steps 421 to 423:

[0104] In step 421, the first loss function corresponding to the generator and the second loss function corresponding to the discriminator are obtained.

[0105] For example, the first loss function is shown in Equation (1).

[0106]

[0107] Here, $L_G$ is the loss function of the generator, which measures the quality of the generated image; z is the input of the generator G, x is the sample image, G(z) is the generated image, and D(G(z)) is the prediction result. $\mathbb{E}$ is the expected value operator, which takes the expected value over the random variable z.

[0108] The second loss function is shown in equation (2).

[0109]

[0110] Here, $L_D$ is the loss function of the discriminator, which measures the discriminator's ability to distinguish between sample images and generated images; z is the input of the generator G, x is the sample image, G(z) is the generated image, and D(G(z)) is the prediction result. $\mathbb{E}$ is the expected value operator, which takes the expected value over the random variable z.
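For reference, one widely used pair of adversarial losses that matches the symbol definitions above is sketched below. This form (a non-saturating generator loss together with the standard discriminator loss) is an assumption for illustration only; the first and second loss functions of this embodiment may differ.

```latex
\begin{aligned}
L_G &= -\mathbb{E}_{z}\left[\log D(G(z))\right] \\
L_D &= -\mathbb{E}_{x}\left[\log D(x)\right]
       - \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right]
\end{aligned}
```

Under this assumed form, $L_G$ decreases as the discriminator assigns higher probability to generated images, and $L_D$ decreases as the discriminator scores sample images near 1 and generated images near 0.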

[0111] In step 422, the first loss value is determined based on the M prediction results, the training target information of the masked image, and the first loss function.

[0112] Here, the M prediction results are taken as D(G(z)), and the training target information of the masked image is taken as x. They are substituted into formula (1) for calculation, and the calculation result is determined as the first loss value.

[0113] In step 423, a second loss value is determined based on the M prediction results, the training target information of the masked image, and the second loss function.

[0114] Here, the M prediction results are taken as D(G(z)), and the training target information of the masked image is taken as x. They are substituted into formula (2) for calculation, and the calculation result is determined as the second loss value.

[0115] Correspondingly, when adjusting the model parameters of the image generation model to be trained based on the total loss value, the model parameters of the generator and the discriminator can be adjusted alternately based on the first loss value and the second loss value.

[0116] Here, the generator's model parameters are first adjusted using gradient descent based on the first loss value, then the discriminator's model parameters are adjusted using gradient descent based on the second loss value, and then the generator's model parameters are adjusted using gradient descent based on the first loss value again... This process is repeated until the generator can generate sufficiently realistic generated images, making it difficult for the discriminator to distinguish between sample images and generated images.

[0117] In this embodiment, the image generation model to be trained includes a generator and a discriminator. The generator generates a generated image corresponding to each sample image, and the discriminator performs prediction processing to obtain a prediction result. Based on the prediction result, the training target information of the masked image, a first loss function, and a second loss function, a first loss value and a second loss value are determined. The model parameters of the generator and the discriminator are alternately adjusted based on the first loss value and the second loss value. In this way, through adversarial training, the generator can generate high-quality, realistic generated images, and the discriminator can improve its discrimination ability, thereby improving the image generation effect and the ability to distinguish images of the trained image generation model.
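The alternating adjustment of the generator and the discriminator can be illustrated with a toy one-dimensional example. Everything in this sketch is an assumption made for illustration: scalar "images", a one-parameter generator G(z) = theta + z, a logistic discriminator D(v) = sigmoid(w * v + b), the loss forms named in the comments, and the learning rate. Gradients are derived by hand so that the example needs only the standard library.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


rng = random.Random(0)
theta = 0.0      # generator parameter: G(z) = theta + z
w, b = 0.1, 0.0  # discriminator parameters: D(v) = sigmoid(w * v + b)
lr = 0.05        # learning rate (assumed)

for step in range(200):
    x = rng.gauss(2.0, 0.5)  # sample "image"
    z = rng.gauss(0.0, 0.5)  # generator input

    # Generator step: gradient descent on the first loss, L_G = -log D(G(z)).
    g = theta + z
    d_g = sigmoid(w * g + b)
    theta -= lr * (-(1.0 - d_g) * w)

    # Discriminator step: gradient descent on the second loss,
    # L_D = -log D(x) - log(1 - D(G(z))).
    g = theta + z
    d_x = sigmoid(w * x + b)
    d_g = sigmoid(w * g + b)
    grad_w = -(1.0 - d_x) * x + d_g * g
    grad_b = -(1.0 - d_x) + d_g
    w -= lr * grad_w
    b -= lr * grad_b
```

In a real model the two steps would update network weights through backpropagation, and the masked feature image would be part of the discriminator's forward pass; the loop structure of one generator update followed by one discriminator update is the point of the sketch.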

[0118] In some embodiments, as shown in Figure 3E, step 103, "masking the generated image to obtain the masked image," can also be implemented through steps 1131 to 1133:

[0119] In step 1131, feature extraction is performed on the generated image to obtain a feature image.

[0120] Here, the feature image includes certain important features of the generated image, such as edges, texture, color, and shape. When the image generation model to be trained has a discriminator, the discriminator can be used directly to extract features from the generated image.

[0121] In step 1132, multiple preset mask data are obtained.

[0122] Here, the preset mask data is mask data set in advance, which can be a binary mask. A 1 indicates a region in the generated image that does not need to be masked, and a 0 indicates a region in the generated image that needs to be masked. The preset mask data can be converted into a mask image with the same size as the feature image. Each preset mask data corresponds to a different masked region, which developers can customize based on experience. For example, with three preset mask data, the first can mask the upper left corner of the generated image, the second can mask the middle area of the generated image, and the third can mask the right side of the generated image.

[0123] In step 1133, for each preset mask data, the feature image is masked based on the preset mask data to obtain the masked image.

[0124] Here, the feature image is masked using each preset mask data to obtain multiple masked images, each with a different masked region. Then, using each masked image and the training target information of the masked images, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model.

[0125] In this embodiment, feature images are obtained by feature extraction from the generated image, and multiple preset mask data are acquired. The feature images are then masked using each preset mask data to obtain a masked image. In this way, the preset mask data can be customized according to scenario requirements, allowing for targeted masking of the generated images, thereby improving model training efficiency.
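The three example preset masks (upper left corner, middle area, right side) can be sketched as rectangular binary masks. The helper name `rect_mask`, the 4 x 6 feature image size, and the exact region boundaries are hypothetical choices for illustration.

```python
def rect_mask(height, width, top, left, bottom, right):
    """Binary mask that is 0 inside rows [top, bottom) and columns
    [left, right), and 1 elsewhere."""
    return [[0 if top <= r < bottom and left <= c < right else 1
             for c in range(width)]
            for r in range(height)]


H, W = 4, 6
preset_masks = [
    rect_mask(H, W, 0, 0, H // 2, W // 2),                    # upper left corner
    rect_mask(H, W, H // 4, W // 4, 3 * H // 4, 3 * W // 4),  # middle area
    rect_mask(H, W, 0, W // 2, H, W),                         # right side
]

# A toy single-channel feature image with distinct values per position.
feature_image = [[float(r * W + c) for c in range(W)] for r in range(H)]

# One masked image per preset mask, obtained by element-wise multiplication.
masked_images = [
    [[f * m for f, m in zip(f_row, m_row)]
     for f_row, m_row in zip(feature_image, mask)]
    for mask in preset_masks
]
```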

[0126] In this embodiment, multiple sample images and an image generation model to be trained are acquired. The sample images are input into the image generation model to generate a corresponding generated image for each sample image. Multiple masking processes are performed on each generated image to obtain multiple masked images. Based on the masked images and training target information, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model. In this way, by masking the generated images, some features in the generated images can be suppressed at each masking stage. When training the image generation model based on the masked images and training target information, the model parameters are adjusted only based on a portion of the features of the generated images, avoiding overtraining that could lead to the model only generating fixed types of images, thereby improving the diversity of images generated by the image generation model. Furthermore, different masking processes can suppress different features in the generated images, improving the diversity of generated images while ensuring that the image generation model can fully learn the features of the sample images, thus improving the quality and realism of the generated images. The training process provided in this application is simple and efficient, and can ensure that the trained image generation model can generate high-quality and diverse images, thereby improving the training efficiency of the image generation model and the realism of the generated images.

[0127] Figure 4 is a flowchart illustrating the image generation method provided in the embodiments of this application. The method is described below with reference to the steps shown in Figure 4.

[0128] In step 201, when the image generation timing is reached, the trained image generation model is obtained.

[0129] Here, the image generation timing can be determined based on a received instruction, such as determining the image generation timing upon receiving an instruction to start image generation. Alternatively, the image generation timing can also be determined based on a preset time, such as ten minutes later, in which case the image generation timing is determined to be reached after ten minutes; or the preset time can be a specific point in time, in which case the image generation timing is determined to be reached when the current time reaches that specific point in time. The trained image generation model is obtained using the image generation model training method provided in the embodiments of this application, and the trained image generation model includes the trained generator.

[0130] In step 202, at least one target image is generated using the trained generator.

[0131] Here, at least one target image is generated using the trained generator in the trained image generation model. The target image is a realistic generated image of a certain type, such as a landscape, portrait, or animal image.

[0132] In step 203, at least one target image is output.

[0133] Here, the target image can be displayed on the device's display interface to achieve output, or the target image can be sent to a designated device to achieve output, or the target image can be stored and output when needed.

[0134] In this embodiment, when the image generation timing is reached, at least one target image is generated and output using the trained generator in the trained image generation model. Thus, the trained image generation model in this embodiment can generate high-quality target images, improving the realism of the generated images.

[0135] The following will describe an exemplary application of the embodiments of this application in a real-world model training scenario.

[0136] This application proposes a method to improve the performance of generative adversarial networks (GANs) based on an attention-synchronous suppression discriminator. In each training iteration of the discriminator of the GAN model, a time-consistent mask is used to mask the feature image so as to suppress the feature units in the masked region of the feature image, thereby achieving synchronous suppression of the discriminator's attention. This prompts the discriminator to focus on feature information at different locations in the generated image in different training iterations, thereby further improving the generator's image synthesis capability.

[0137] In this embodiment, attention suppression is added to the discriminator of the generative adversarial network in related technologies. Figure 6 is another schematic diagram of the image generation model training method provided in this application embodiment. As shown in Figure 6, generator 61 generates a generated image 63 corresponding to the sample image 64 and inputs it into discriminator 62. Discriminator 62 discriminates the generated image 63 based on the generated image 63 and the sample image 64. During the discrimination process, attention suppression is added so that discriminator 62 focuses on feature information at different locations in the generated image 63 in different training iterations, resulting in multiple prediction results 65 (only one is shown in Figure 6). The multiple prediction results 65 are then compared with the training target information 66 to determine the total loss value 67, which includes a first loss value 671 and a second loss value 672. The generator 61 is trained by backpropagation using the first loss value 671, and the discriminator 62 is trained by backpropagation using the second loss value 672. In this way, training stability is improved and the image synthesis performance of the generator is enhanced.

[0138] Figure 7 is another flowchart illustrating the image generation model training method provided in this application. As shown in Figure 7, the generated image 71 is first input into the discriminator, and feature image 73 is obtained by extracting image features. A time-consistent mask is then used to suppress attention in feature image 73. Specifically, in the first training iteration, the first mask data is used to suppress attention in the first fixed region of feature image 73, resulting in masked image 1. Prediction processing is performed on masked image 1 to obtain prediction result 1. In the second training iteration, the first mask data is updated to second mask data, and the second mask data is used to suppress attention in the second fixed region of feature image 73, resulting in masked image 2. Prediction processing is performed on masked image 2 to obtain prediction result 2. In the third training iteration, the second mask data is updated to third mask data, and the third mask data is used to suppress attention in the third fixed region of feature image 73, resulting in masked image 3. Prediction processing is performed on masked image 3 to obtain prediction result 3. The generator and discriminator are trained using prediction results 1, 2, and 3 to obtain the trained image generation model, which generates the target image 72.

[0139] During training, let the generator be G and the discriminator be D. The generator G is trained to generate images that can fool the discriminator, and the discriminator D is trained to distinguish whether the generated images are real or not. Its loss function is as follows:

[0140]

[0141] Where z is the input to the generator G, and x is a real image sample.

[0142] The discriminator D typically contains N convolutional layers. The i-th layer of the discriminator operates on the feature image output by the previous layer or, for the first layer, on the entire input image:

[0143] $F_i = L_i(F_{i-1})$ (3)

[0144] Here, $L_i$ is the i-th convolutional layer, and $F_{i-1}$ is the feature image output by the previous layer or the generated image. For the i-th layer of the discriminator, different binary masks of 0s and 1s are injected during training to suppress the features of some regions in the feature image of this layer.

[0145] The mask data in the t-th training iteration has the same size as the feature image. Using this time-consistent mask data, the computation of the i-th layer of the discriminator is formulated as follows:

[0146]

[0147] Here, ⊙ denotes the Hadamard (element-wise) matrix product, applied between the mask data and the feature image in the t-th training iteration. The mask data is fixed within the same training iteration and is randomly updated only when the training iteration t changes. This relatively dynamic switching and updating method prompts the discriminator to focus on the features of different local regions in the feature images over time, allowing the discriminator to better adapt to the distribution of generated images.
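The layer-wise computation with an injected mask can be sketched as follows. This is a minimal illustration under stated assumptions: a 3 x 3 mean filter (`box_filter`) stands in for a learned convolutional layer, the mask is applied to the output of the chosen layer (it could equally be applied to that layer's input), and the names `discriminator_features` and `masked_layer` are hypothetical.

```python
def box_filter(image):
    """Stand-in for a convolutional layer: 3x3 mean filter with zero padding."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += image[rr][cc]
            out[r][c] = total / 9.0
    return out


def discriminator_features(image, layers, mask, masked_layer):
    """Compute F_i = L_i(F_{i-1}) layer by layer and multiply the output of
    layer `masked_layer` element-wise by the binary mask."""
    feature = image
    for i, layer in enumerate(layers, start=1):
        feature = layer(feature)
        if i == masked_layer:
            feature = [[f * m for f, m in zip(f_row, m_row)]
                       for f_row, m_row in zip(feature, mask)]
    return feature


generated = [[1.0] * 4 for _ in range(4)]
mask_t = [[1, 1, 0, 0] for _ in range(4)]  # mask data for iteration t
features = discriminator_features(
    generated, [box_filter, box_filter], mask_t, masked_layer=2
)
```

Within one training iteration the same `mask_t` would be reused for every forward pass; only when t changes would a new mask be supplied.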

[0148] Below, the performance of the embodiments of this application on the publicly available datasets AFHQ-Cat and FFHQ is evaluated. See Table 1, which compares the performance of the embodiments of this application on the FFHQ dataset with related technologies:

[0149] Table 1

[0150]

| Method | FID score |
| --- | --- |
| Style-GANV2 (CVPR'2020) | 3.862 |
| Style-GANV2 (Re-Impl.) | 3.810 |
| Adaptive Dropout | 4.160 |
| LC-Reg | 3.933 |
| Dynamic D-Decreasing | 3.740 |
| ADA | 3.880 |
| APA | 3.678 |
| AdaptiveMixup | 3.623 |
| zCR | 3.450 |
| Image generation model in this application embodiment | 3.048 |

[0151] In Table 1, the FID score represents the Fréchet Inception Distance (FID), a metric used to evaluate the performance of generative models. It is calculated by comparing the statistical characteristics between the generated images and the real images. Generally, the lower the FID score, the closer the distribution of the generated images is to that of the real images, indicating better generator performance. As can be seen from Table 1, the embodiment of this application achieved the lowest FID score of 3.048 on the FFHQ dataset, significantly lower than other methods in related technologies. Therefore, the generated images generated by the embodiment of this application on the FFHQ dataset have the closest distribution to the real images.
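For reference, the standard definition of the Fréchet Inception Distance is given below. The symbols are introduced here for illustration and do not appear in the original text: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean vectors and covariance matrices of Inception-network features computed over the real images and the generated images, respectively.

```latex
\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The score is 0 when the two feature distributions have identical means and covariances, which is why lower values indicate generated images that are statistically closer to the real ones.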

[0152] See Table 2, which shows the comparison results of the embodiments of this application on the AFHQ-Cat dataset and related technologies:

[0153] Table 2

[0154]

| Method | FID score |
| --- | --- |
| Style-GANV2 (CVPR'2020) | 7.737 |
| Style-GANV2 (Re-Impl.) | 7.924 |
| LC-Reg | 6.699 |
| Dynamic D-Decreasing | 5.410 |
| ADA | 6.053 |
| APA | 4.876 |
| AdaptiveMixup | 4.477 |
| Image generation model in this application embodiment | 4.459 |

[0155] As shown in Table 2, the embodiment of this application also obtained the lowest FID score, 4.459, on the AFHQ-Cat dataset, which is lower than that of the other methods in related technologies (the closest being AdaptiveMixup at 4.477). Therefore, the distribution of the images generated by the embodiment of this application on the AFHQ-Cat dataset is the closest to the distribution of the real images.
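To make the margins in Tables 1 and 2 concrete, the relative FID reduction over the next-best listed method can be computed directly from the table values. The helper name `relative_reduction` is hypothetical; the numbers are taken from the tables.

```python
def relative_reduction(baseline, ours):
    """Relative FID reduction of `ours` with respect to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


# FFHQ (Table 1): next-best listed method is zCR.
ffhq_gain = relative_reduction(3.450, 3.048)

# AFHQ-Cat (Table 2): next-best listed method is AdaptiveMixup.
afhq_gain = relative_reduction(4.477, 4.459)

print(f"FFHQ: {ffhq_gain:.2f}%  AFHQ-Cat: {afhq_gain:.2f}%")
```

The margin over the next-best method is considerably larger on FFHQ than on AFHQ-Cat.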

[0156] Figure 8A is a schematic diagram illustrating the image generation effect provided in this application embodiment. In Figure 8A, images 81 to 84 are generated images produced by other methods in related technologies: in image 81, the person's glasses are incomplete; in image 82, there is an extra portion in the lower left corner of the generated image; in image 83, there is an extra portion in the upper right corner of the generated image; and in image 84, there is an extra portion in the person's eyebrows. Correspondingly, images 85 to 88 are generated images produced by the trained image generation model in this application embodiment: in image 85, the person's glasses are complete; in image 86, there is no extra portion in the lower left corner of the generated image; in image 87, there is no extra portion in the upper right corner of the generated image; and in image 88, there is no extra portion in the person's eyebrows. It can therefore be seen that the quality of the images produced by the trained image generation model in this application embodiment is superior to that of other methods in related technologies, and that the model can repair flawed and unrealistic parts of the generated image, thereby improving the realism and image quality of the generated image.

[0157] Figure 8B is another schematic diagram of the image generation effect provided in the embodiments of this application. Figure 8B shows eight different generated images produced by the image generation model trained in this embodiment of the application. As can be seen from Figure 8B, the image generation model trained in this embodiment of the application can generate realistic images.

[0158] This application proposes a method to improve the performance of generative adversarial networks (GANs) based on an attention-synchronous suppression discriminator, overcoming problems such as mode collapse, training instability, and limited generation diversity in current GANs. In this application, a portion of the feature units in the feature image are synchronously suppressed at the same location within a training iteration, and the location of the features to be suppressed is changed only in the next training iteration. By dynamically adjusting the attention region at the image feature level during GAN training, the generator is encouraged to synthesize higher-quality images, thus improving the training stability of the GAN. This application improves the training efficiency of image generation models and the realism of generated images, thereby producing higher-quality generated images.

[0159] The following describes an exemplary structure in which the image generation model training device 233 in the server 200-1 provided in this application embodiment is implemented as software modules. In some embodiments, as shown in Figure 2A, the software modules of the image generation model training device 233 stored in the memory 230-1 may include:

[0160] The first acquisition module 2331 is used to acquire multiple sample images and an image generation model to be trained. The first generation module 2332 is used to generate a generated image corresponding to each of the sample images using the image generation model to be trained. The model training module 2333 is used to perform the following process for each generated image to obtain a trained image generation model: masking the generated image to obtain a masked image, and acquiring the training target information of the masked image; adjusting the model parameters of the image generation model to be trained based on the masked image and the training target information of the masked image.

[0161] In some embodiments, the model training module 2333 is further configured to obtain the feature image corresponding to the generated image, and the k-th mask data corresponding to the k-th training iteration, wherein the k-th mask data is obtained by randomly updating the (k-1)-th mask data corresponding to the (k-1)-th training iteration, and k = 2, 3, ..., N; and to perform masking processing on the feature image based on the k-th mask data to obtain the masked image corresponding to the k-th training iteration.

[0162] In some embodiments, the model training module 2333 is further configured to, for the k-th training iteration, use the image generation model to be trained to perform M prediction processes based on the k-th masked image to obtain M prediction results; determine the total loss value based on the M prediction results and the training target information of the masked image; and adjust the model parameters of the image generation model to be trained based on the total loss value.

[0163] In some embodiments, the first generation module 2332 is further configured to generate a generated image corresponding to each of the sample images using the generator; the model training module 2333 is further configured to perform M prediction processes based on the k-th masked image using the discriminator to obtain M prediction results.

[0164] In some embodiments, the model training module 2333 is further configured to obtain a first loss function corresponding to the generator and a second loss function corresponding to the discriminator; determine a first loss value based on the M prediction results, the training target information of the masked image, and the first loss function; determine a second loss value based on the M prediction results, the training target information of the masked image, and the second loss function; and alternately adjust the model parameters of the generator and the model parameters of the discriminator based on the first loss value and the second loss value.

[0165] In some embodiments, the model training module 2333 is further configured to randomly update the k-th mask data to obtain the (k+1)-th mask data corresponding to the (k+1)-th training iteration.

[0166] In some embodiments, the model training module 2333 is further configured to extract features from the generated image to obtain a feature image; acquire multiple preset mask data, each preset mask data corresponding to a different mask region; and perform masking processing on the feature image based on each preset mask data to obtain the masked image.

[0167] The following describes an exemplary structure in which the image generation device 234 in the server 200-2 provided in this application embodiment is implemented as software modules. In some embodiments, as shown in Figure 2B, the software modules of the image generation device 234 stored in the memory 230-2 may include:

[0168] The second acquisition module 2341 is used to acquire the trained image generation model when the image generation time is reached. The trained image generation model is obtained by training using the image generation model training method provided in the embodiments of this application. The trained image generation model includes a trained generator. The second generation module 2342 is used to generate at least one target image using the trained generator. The output module 2343 is used to output the at least one target image.

[0169] This application provides a computer program product, which includes a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of an electronic device reads the computer-executable instructions from the computer-readable storage medium and executes the computer-executable instructions, causing the electronic device to perform the image generation model training method or image generation method described in this application.

[0170] This application provides a computer-readable storage medium storing computer-executable instructions or a computer program. When the computer-executable instructions or the computer program are executed by a processor, the processor performs the image generation model training method or image generation method provided in this application, for example, the image generation model training method shown in Figure 3A, or the image generation method shown in Figure 4.

[0171] In some embodiments, the computer-readable storage medium may be a memory such as RAM, ROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or it may be a variety of devices including one or any combination of the above-mentioned memories.

[0172] In some embodiments, computer-executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.

[0173] As an example, computer-executable instructions may, but do not necessarily, correspond to files in a file system. They may be stored in a portion of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple co-located files (e.g., a file that stores one or more modules, subroutines, or code sections).

[0174] As an example, computer-executable instructions can be deployed to execute on a single electronic device, or on multiple electronic devices located at one location, or on multiple electronic devices distributed across multiple locations and interconnected via a communication network.

[0175] In summary, in this embodiment, multiple sample images and an image generation model to be trained are obtained. The sample images are input into the image generation model to generate a generated image corresponding to each sample image. Multiple masking processes are performed on each generated image to obtain multiple masked images. Based on the masked images and training target information, the model parameters of the image generation model to be trained are adjusted to obtain the trained image generation model. Thus, by masking the generated images, some features in the generated images can be suppressed at each masking stage. When training the image generation model based on the masked images and training target information, the model parameters are adjusted only based on a portion of the features of the generated images, avoiding overtraining that could lead to the model only generating fixed types of images, thereby improving the diversity of images generated by the image generation model. Furthermore, different masking processes can suppress different features in the generated images, improving the diversity of generated images while ensuring that the image generation model can fully learn the features of the sample images, thereby improving the quality and realism of the generated images. The training process provided in this application is simple and efficient, and can ensure that the trained image generation model can generate high-quality and diverse images, thereby improving the training efficiency of the image generation model and the realism of the generated images.

[0176] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, and improvements made within the spirit and scope of this application are included within the scope of protection of this application.

Claims

1. A method for training an image generation model, characterized in that the method includes: acquiring multiple sample images and an image generation model to be trained; inputting the sample images into the image generation model to be trained to generate a generated image corresponding to each sample image; for each generated image, performing multiple masking processes on the generated image to obtain multiple masked images corresponding to the generated image; and adjusting the model parameters of the image generation model to be trained based on the masked image and the training target information of the masked image, to obtain the trained image generation model.

2. The method according to claim 1, characterized in that, Each generated image is used for N training iterations, where N is an integer greater than 2. During each training iteration, the generated image is masked once. The step of performing multiple masking processes on the generated image to obtain multiple masked images corresponding to the generated image includes: For the k-th training iteration, the feature image corresponding to the generated image and the k-th mask data corresponding to the k-th training iteration are obtained. The k-th mask data is obtained by randomly updating the (k-1)-th mask data corresponding to the (k-1)-th training iteration, where k = 2, 3, ..., N. The feature image is masked based on the k-th mask data to obtain the masked image corresponding to the k-th training iteration.

3. The method according to claim 2, characterized in that, The step of adjusting the model parameters of the image generation model to be trained based on the masked image and the training target information of the masked image includes: For the k-th training iteration, the image generation model to be trained is used to perform M prediction processes based on the image after the k-th mask, resulting in M prediction results; Based on the M prediction results and the training target information of the masked image, the total loss value is determined; The model parameters of the image generation model to be trained are adjusted based on the total loss value.

4. The method according to claim 3, characterized in that, The image generation model to be trained includes a generator and a discriminator. The step of generating a generated image corresponding to each of the sample images using the image generation model to be trained includes: generating a generated image corresponding to each of the sample images using the generator; The step of using the image generation model to be trained to perform M prediction processes based on the image after the k-th mask to obtain M prediction results includes: using the discriminator to perform M prediction processes based on the image after the k-th mask to obtain M prediction results.

5. The method according to claim 4, characterized in that, The determination of the total loss value based on the M prediction results and the training target information of the masked image includes: Obtain the first loss function corresponding to the generator and the second loss function corresponding to the discriminator; The first loss value is determined based on the M prediction results, the training target information of the masked image, and the first loss function; The second loss value is determined based on the M prediction results, the training target information of the masked image, and the second loss function; The step of adjusting the model parameters of the image generation model to be trained based on the total loss value includes: alternately adjusting the model parameters of the generator and the model parameters of the discriminator based on the first loss value and the second loss value.

6. The method according to claim 3, characterized in that, After completing the k-th training iteration of the image generation model to be trained, the method further includes: The k-th mask data is randomly updated to obtain the (k+1)-th mask data corresponding to the (k+1)-th training iteration.

7. The method according to claim 1, characterized in that, The step of performing multiple masking processes on the generated image to obtain multiple masked images corresponding to the generated image includes: Feature extraction is performed on the generated image to obtain a feature image; Acquire multiple preset mask data, each of which corresponds to a different mask region; For each of the preset mask data, the feature image is masked based on the preset mask data to obtain the masked image.

8. An image generation method, characterized in that, The method includes: When the image generation time is reached, a trained image generation model is obtained. The trained image generation model is obtained by training using the image generation model training method according to any one of claims 1 to 7. The trained image generation model includes a trained generator. Using the trained generator, at least one target image is generated; The at least one target image is output.

9. An image generation model training device, characterized in that, The device includes: The first acquisition module is used to acquire multiple sample images and the image generation model to be trained. The first generation module is used to input the sample image into the image generation model to be trained, and generate a generated image corresponding to each sample image; The model training module is used to perform multiple masking processes on each generated image to obtain multiple masked images corresponding to the generated image. The model training module is further configured to adjust the model parameters of the image generation model to be trained based on the masked image and the training target information of the masked image, so as to obtain the trained image generation model.

10. An image generation apparatus, characterized in that, The apparatus includes: The second acquisition module is used to acquire the trained image generation model when the image generation time is reached. The trained image generation model is obtained by training using the image generation model training method according to any one of claims 1 to 7. The trained image generation model includes a trained generator. The second generation module is used to generate at least one target image using the trained generator; An output module is used to output the at least one target image.

11. An electronic device, characterized in that, The electronic device includes: A memory, used to store computer-executable instructions or computer programs; A processor, used to implement, when executing the computer-executable instructions or computer programs stored in the memory, the image generation model training method according to any one of claims 1 to 7, or the image generation method according to claim 8.

12. A computer-readable storage medium storing computer-executable instructions or a computer program, characterized in that, When the computer-executable instructions or computer program are executed by a processor, they implement the image generation model training method according to any one of claims 1 to 7, or the image generation method according to claim 8.

13. A computer program product comprising computer-executable instructions or a computer program, characterized in that, When the computer-executable instructions or computer program are executed by a processor, they implement the image generation model training method according to any one of claims 1 to 7, or the image generation method according to claim 8.