[0061] Next, the technical solutions in the embodiments of the present invention will be described with reference to the drawings of the embodiments. It should be understood that the described embodiments are merely some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative labor fall within the scope of the present invention.
[0062] In order to make the above objects, features, and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
[0063] As shown in Figure 1, this embodiment provides an image generation method, including:
[0064] S1, acquiring a training data set, the training data set including several first images X and several second images Y, wherein the first image X is an original image and the second image Y is an image of the type to be generated;
[0065] S2, establishing a neural network model based on CycleGAN and VAE;
[0066] The principles of CycleGAN and VAE are as follows:
[0067] A GAN includes a generator (G for short) and a discriminator (D for short). G is used to generate data, and D is used to distinguish real data from generated data; the two are trained simultaneously. G is responsible for mapping the original image as convincingly as possible into a realistic image containing the target features, while D distinguishes the generated images from the real images as accurately as possible, so that G and D form a game. Through the game, the generation capability of G is enhanced and the discrimination capability of D is improved. When D can no longer distinguish whether an image is real or generated by G, the adversarial process reaches Nash equilibrium; at this point the adversarial process is considered to end, G has obtained the optimal generation capability, and D has obtained the strongest discrimination capability.
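As an illustration of this game, the following is a minimal PyTorch sketch of the alternating updates, assuming hypothetical networks G and D where D outputs a probability in (0, 1); it shows the general GAN training principle described above, not the specific networks of this application:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z):
    """One alternating update of discriminator D and generator G."""
    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    fake = G(z).detach()                      # detach so G is not updated here
    d_real = D(real)
    d_fake = D(fake)
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: push D(G(z)) toward 1, i.e. try to fool the discriminator.
    d_fake = D(G(z))
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```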
[0068] A traditional GAN includes a generator G_AB that generates a fake image in domain B from domain A, a generator G_BA that restores the fake image in domain B to a reconstructed image in domain A, and a domain-B discriminator D_B, as shown in Figure 2. A traditional GAN is one-way; training this one-way GAN requires two loss functions: the generator's reconstruction loss L and the discriminator's adversarial loss L_GAN, calculated as follows:
[0069] $L(G_{AB}, G_{BA}, A, B) = \mathbb{E}_{a \sim A}\left[\left\| G_{BA}(G_{AB}(a)) - a \right\|_1\right]$
[0070] $L_{GAN}(G_{AB}, D_B, A, B) = \mathbb{E}_{b \sim B}[\log D_B(b)] + \mathbb{E}_{a \sim A}[\log(1 - D_B(G_{AB}(a)))]$
[0071] In the formulas, $\mathbb{E}[\cdot]$ denotes the expected value over the distribution, $a \sim A$ denotes that sample $a$ belongs to domain A, $b \sim B$ denotes that sample $b$ belongs to domain B, and $\|\cdot\|_1$ denotes the L1 norm. The purpose of the reconstruction loss $L$ is to make the reconstructed image $G_{BA}(G_{AB}(a))$ as similar as possible to the original image $a$. The purpose of the adversarial loss $L_{GAN}$ is to make the image produced by the generator network conform to the feature distribution of the target image.
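A direct transcription of these two formulas follows, as a sketch assuming G_AB, G_BA, and D_B are callable networks with D_B outputting a probability; eps guards the logarithm:

```python
import torch

def reconstruction_loss(G_AB, G_BA, a):
    # Formula [0069]: E_a[ || G_BA(G_AB(a)) - a ||_1 ], averaged over the batch
    return (G_BA(G_AB(a)) - a).abs().mean()

def adversarial_loss(G_AB, D_B, a, b, eps=1e-8):
    # Formula [0070]: E_b[ log D_B(b) ] + E_a[ log(1 - D_B(G_AB(a))) ]
    return (torch.log(D_B(b) + eps).mean()
            + torch.log(1 - D_B(G_AB(a)) + eps).mean())
```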
[0072] CycleGAN is essentially a ring network composed of two mirror-symmetric adversarial networks. As shown in Figure 3, CycleGAN has two discriminators, D_X and D_Y, and two generators, G and F; the two adversarial networks share the two generators, but the discriminators are independent of each other. One advantage of CycleGAN is that it can be trained on two unpaired image sets, overcoming the shortcoming of the Pix2Pix method, which requires strictly paired images. CycleGAN works by learning, through training, a mapping from the original data set to the generated data set while ensuring that there is a meaningful association between the input image and the generated image. As shown in Figure 3, CycleGAN takes an image x from domain A and inputs it into generator G to obtain an image in target domain B; that image is then restored to an image in domain A by generator F. Similarly, there is a symmetric process from domain B via y.
[0073] In order to make the generated image meet the requirements, a loss function must be used to constrain the image generation process. Let G be the mapping function from domain-A images to domain-B images, F be the mapping function from domain-B images to domain-A images, and D_X and D_Y be the discriminators of the GAN networks in domain A and domain B, respectively. In CycleGAN, the loss function is defined as follows:
[0074] $L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, A, B) + L_{GAN}(F, D_X, B, A) + \lambda L_{cyc}(G, F)$
[0075] $L_{GAN}(G, D_Y, A, B) = \mathbb{E}_{y \sim P_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_Y(G(x)))]$
[0076] $L_{GAN}(F, D_X, B, A) = \mathbb{E}_{x \sim P_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim P_{data}(y)}[\log(1 - D_X(F(y)))]$
[0077] $L_{cyc}(G, F) = \mathbb{E}_{x \sim P_{data}(x)}\left[\| F(G(x)) - x \|_1\right] + \mathbb{E}_{y \sim P_{data}(y)}\left[\| G(F(y)) - y \|_1\right]$
[0078] where $\|\cdot\|_1$ denotes the L1 norm; $\mathbb{E}$ denotes expectation; $P_{data}(x)$ and $P_{data}(y)$ are the real probability distributions of the data set over the X samples and the Y samples, respectively; and $\lambda$ is a weight parameter. $L(G, F, D_X, D_Y)$ is the loss function of the entire GAN; the loss functions $L_{GAN}(G, D_Y, A, B)$ and $L_{GAN}(F, D_X, B, A)$ are used to ensure that the generated images approach the real images of domain A or domain B; $L_{cyc}(G, F)$ is the cycle-consistency loss of the GAN, ensuring that the mapping from domain A to domain B is not many-to-one, thereby preventing many elements of the A space from being mapped onto a single element of the B space.
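A sketch of this combined loss, assuming G, F_map, D_X, and D_Y are callable networks with D_X and D_Y outputting probabilities (F is renamed F_map only to avoid clashing with common module aliases):

```python
import torch

def cyclegan_loss(G, F_map, D_X, D_Y, x, y, lam=10.0, eps=1e-8):
    """Total loss of paragraphs [0074]-[0077]; G maps A->B, F_map maps B->A."""
    # Adversarial terms: generated images should look like the target domain.
    l_gan_g = (torch.log(D_Y(y) + eps).mean()
               + torch.log(1 - D_Y(G(x)) + eps).mean())
    l_gan_f = (torch.log(D_X(x) + eps).mean()
               + torch.log(1 - D_X(F_map(y)) + eps).mean())
    # Cycle-consistency term: A -> B -> A (and B -> A -> B) must return home,
    # which keeps the mapping from collapsing many inputs onto one output.
    l_cyc = ((F_map(G(x)) - x).abs().mean()
             + (G(F_map(y)) - y).abs().mean())
    return l_gan_g + l_gan_f + lam * l_cyc
```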
[0079] A VAE generates images by constructing an intermediate hidden variable: the image is first encoded into the hidden variable, which is then input into the generation network. There is no adversarial process when a VAE generates an image, so reaching Nash equilibrium is not required for image generation. Since no reliable method has been found to guarantee reaching Nash equilibrium, VAE training is more stable than GAN training. The most important feature of a VAE is that it imitates the prediction mechanism of learning, performing encoding and decoding through measurable functions. Its most important idea is based on a mathematical fact: for a target probability distribution, given any probability distribution, there always exists a differentiable measurable function that maps it to another probability distribution arbitrarily close to the target probability distribution. An important philosophy of the VAE is to follow graphical models, constructing the generated samples from certain hidden variables. The VAE uses a Gaussian distribution as the measurable distribution of the hidden variables, and then turns the learning problem into learning a measurable function that maps the hidden variables to the desired samples; this process is the decoding process. Using the encoder, the noise distribution corresponding to the encoding of the input image can be obtained; thereafter, the generated image can be controlled by selecting the noise distribution, i.e., the desired image can be obtained by selecting the noise. In the VAE encoding process, the generation of a target category can be restricted by the selection of the noise, such that the feature vector output by the encoder is subject to a standard normal distribution. By selecting appropriate noise subject to the standard normal distribution as the input to the decoder network, the noise is restored to the desired image through the deconvolution computation of the decoder. This process does not require an input image; only noise subject to a standard normal distribution is required to generate the desired image. Since there is no adversarial process judging whether the generated image is real or fake, the VAE model only needs to compute the mean and variance of the generated image and the original image to complete training, which leads to its generated images being more blurred than those of a GAN. The working principle of the VAE is shown in Figure 4.
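A minimal sketch of this encode-sample-decode process, assuming a hypothetical encoder that returns the mean and log-variance of the hidden variable, using the standard reparameterization trick and the closed-form KL divergence against the standard normal:

```python
import torch

def vae_forward(encoder, decoder, img):
    """Encode to (mu, logvar), sample by reparameterization, decode."""
    mu, logvar = encoder(img)                # distribution of the hidden variable
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)     # reparameterization trick
    recon = decoder(z)
    # Closed-form KL( N(mu, var) || N(0, I) ): pulls the code toward the
    # standard normal, so new images can later be decoded from pure noise.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    recon_loss = (recon - img).pow(2).mean() # mean/variance-style L2 term
    return recon, recon_loss + kl
```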
[0080] In the present application, the CycleGAN comprises a first discriminator DX, a second discriminator DY, a first generator G1, and a second generator G2; the VAE comprises a first encoder E1, a second encoder E2, a first classifier CX, and a second classifier CY. The first encoder E1, the first generator G1, the second encoder E2, and the second generator G2 are connected in sequence; one end of the second generator G2 is connected to the second encoder E2, and the other end is connected to the first discriminator DX and the first classifier CX; the second discriminator DY and the second classifier CY are connected between the first generator G1 and the second encoder E2, as shown in Figure 5 and sketched schematically after the component descriptions below.
[0081] The first encoder E1 is configured to input the first image X or the first output image X', is further configured to input the image category XC corresponding to the first image X, and outputs a first encoding ZX;
[0082] The first generator G1 is configured to input the first encoding ZX and the image category YC corresponding to the second image Y, and outputs a second output image Y';
[0083] The second encoder E2 is configured to input the second image Y or the second output image Y', is further configured to input the image category YC corresponding to the second image Y, and outputs a second encoding ZY;
[0084] The second generator G2 is configured to input the second encoding ZY and the image category XC corresponding to the first image X, and outputs a first output image X';
[0085] The first classifier CX is configured to input the first image X or the first output image X', and outputs the category to which the first image X or the first output image X' belongs;
[0086] The second classifier CY is configured to input the second image Y or the second output image Y', and outputs the category to which the second image Y or the second output image Y' belongs;
[0087] The first discriminator DX is configured to input the first image X or the first output image X', and outputs the probability that the first image X or the first output image X' is real;
[0088] The second discriminator DY is configured to input the second image Y or the second output image Y', and outputs the probability that the second image Y or the second output image Y' is real.
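A schematic sketch of this connection, assuming callable modules with the input/output signatures described in paragraphs [0081]-[0088]; the signatures are an illustrative reading of Figure 5, not the exact implementation:

```python
import torch

def forward_cycle(E1, G1, E2, G2, DX, DY, CX, CY, x, xc, yc):
    """One pass around the ring of Figure 5."""
    zx = E1(x, xc)          # first encoding ZX from first image X and category XC
    y_out = G1(zx, yc)      # second output image Y' from ZX and category YC
    zy = E2(y_out, yc)      # second encoding ZY from Y' and category YC
    x_out = G2(zy, xc)      # first output image X' from ZY and category XC
    # Side heads: authenticity probabilities and category predictions.
    d_x, d_y = DX(x_out), DY(y_out)
    c_x, c_y = CX(x_out), CY(y_out)
    return y_out, x_out, zx, zy, d_x, d_y, c_x, c_y
```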
[0089] S3, training the neural network model with the training data set, the trained neural network model being used for image generation;
[0090] In this step, in the process of training the neural network model with the training data set, the loss function comprises six parts, namely: discrimination loss, category loss, divergence loss, generation loss, generation-discrimination loss, and generation-category loss. These six parts impose three restrictions on the generated image, on its category, on the distribution of the intermediate hidden variables, and on the authenticity of the generated image, so that the generated image lies in the same space as the target image. Introducing classifiers into the above network structure imposes category restrictions on the image generation network, making the generated images more realistic.
[0091] The loss function L is calculated according to Formula (1):
[0092] $L = L_{DX} + L_{DY} + L_{CX} + L_{CY} + \lambda_1 L_{KL} + \lambda_2 (L_{GX} + L_{GY}) + \lambda_3 (L_{GDX} + L_{GDY}) + \lambda_4 (L_{GCX} + L_{GCY})$ ……………………(1)
[0094] where $L_{DX}$ and $L_{DY}$ are the discrimination loss functions of the first discriminator DX and the second discriminator DY, respectively; $L_{CX}$ and $L_{CY}$ are the category loss functions of the first classifier CX and the second classifier CY, respectively; $L_{KL}$ is the divergence loss function; $L_{GX}$ and $L_{GY}$ are the generation loss functions of the first generator G1 and the second generator G2, respectively; $L_{GDX}$ and $L_{GDY}$ are the generation-discrimination loss functions of the first discriminator DX and the second discriminator DY, respectively; $L_{GCX}$ and $L_{GCY}$ are the generation-category loss functions of the first classifier CX and the second classifier CY, respectively.
[0095] $L_{DX}$, $L_{DY}$, $L_{CX}$, $L_{CY}$, $L_{KL}$, $L_{GX}$, $L_{GY}$, $L_{GDX}$, $L_{GDY}$, $L_{GCX}$, and $L_{GCY}$ are calculated according to Formulas (2) to (12) below:
[0096]-[0106] Formulas (2) to (12), presented as images in the original document, define $L_{DX}$, $L_{DY}$, $L_{CX}$, $L_{CY}$, $L_{KL}$, $L_{GX}$, $L_{GY}$, $L_{GDX}$, $L_{GDY}$, $L_{GCX}$, and $L_{GCY}$, respectively.
[0107] In the formulas, $\|\cdot\|_2$ denotes the L2 norm; $P_{data}(A)$ denotes the real probability distribution of the data set over A, and $\mathbb{E}_{a \sim P_{data}(A)}$ denotes the expectation over samples $a$ drawn from $P_{data}(A)$, where $a \in \{x, y, zx, zy\}$; $p(\cdot)$ denotes probability; $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are all weight parameters; $\mu_{ZX}$ and $\mu_{ZY}$ are the means of ZX and ZY, respectively; $\varepsilon_{ZX}$ and $\varepsilon_{ZY}$ are the variances of ZX and ZY, respectively.
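Since the formula images for (2) to (12) are unavailable, the following sketch only composes Formula (1) from the eleven named terms and shows one plausible form of the divergence term $L_{KL}$ consistent with the symbols defined above; the form of the KL term is an assumption, not the patent's formula:

```python
import torch

def total_loss(t, lam=(10.0, 10.0, 10.0, 10.0)):
    """Formula (1): compose the eleven loss terms with weights λ1..λ4."""
    l1, l2, l3, l4 = lam
    return (t['L_DX'] + t['L_DY'] + t['L_CX'] + t['L_CY']
            + l1 * t['L_KL']
            + l2 * (t['L_GX'] + t['L_GY'])
            + l3 * (t['L_GDX'] + t['L_GDY'])
            + l4 * (t['L_GCX'] + t['L_GCY']))

def kl_term(mu_zx, var_zx, mu_zy, var_zy):
    # Assumed form of L_KL: KL divergence of N(mu, var) from N(0, I),
    # summed over both encodings ZX and ZY (the original formula image
    # is unavailable, so this is an illustrative reconstruction).
    def kl(mu, var):
        return -0.5 * torch.sum(1 + torch.log(var) - mu.pow(2) - var, dim=1).mean()
    return kl(mu_zx, var_zx) + kl(mu_zy, var_zy)
```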
[0108] In addition, in Figure 5, f_c(X) and f_c(X') denote that X and X' are input into the first classifier CX for classification, the classification results being C(X) and C(X'); f_c(Y) and f_c(Y') denote that Y and Y' are input into the second classifier CY for classification, the classification results being C(Y) and C(Y'); f_d(X) and f_d(X') denote that X and X' are input into the first discriminator DX for discrimination, the discrimination results being D(X) and D(X'); f_d(Y) and f_d(Y') denote that Y and Y' are input into the second discriminator DY for discrimination, the discrimination results being D(Y) and D(Y').
[0109] The training termination condition is realized by setting a maximum number of training iterations: when the number of training iterations reaches the maximum, training is completed. Using all the images of step S1 as input once constitutes one training iteration.
[0110] After the training of the neural network model is completed, the method further comprises the following step:
[0111] S4, acquiring a first image, inputting the first image into the trained neural network model, and outputting the generated second image.
[0112] Further, due to the symmetry of the network structure, step S4 may also comprise: acquiring a second image, inputting the second image into the trained neural network model, and outputting the generated first image.
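A minimal inference sketch for step S4 and its symmetric counterpart, assuming the trained modules of Figure 5 with the signatures used in the wiring sketch above:

```python
import torch

@torch.no_grad()
def generate_second_image(E1, G1, x, xc, yc):
    """Step S4: first image in, generated second image out."""
    zx = E1(x, xc)
    return G1(zx, yc)

@torch.no_grad()
def generate_first_image(E2, G2, y, yc, xc):
    """Symmetric direction: second image in, generated first image out."""
    zy = E2(y, yc)
    return G2(zy, xc)
```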
[0113] In the following, the image generation method of the present application is described in detail using the example of surface-vessel image recognition. The first image X is a clear image of a surface vessel captured by a camera, and the second image Y is a blurred vessel image; because a vessel is subject to the fluctuation of seawater, the images captured by the camera exhibit motion blur, so the second images are obtained by collecting vessel images with motion blur. 500 first images and 500 second images were collected.
[0114] The network structure of the first encoder E1 and the second encoder E2 is shown in Figure 6; the network structure of the first generator G1 and the second generator G2 is shown in Figure 7; the network structure of the first discriminator DX and the second discriminator DY is shown in Figure 8. In Figures 6 to 8, Conv is a convolution layer, GLU (Gated Linear Unit) is a gated linear unit, IN (Instance Normalization) is an instance normalization layer, AdaIN (Adaptive Instance Normalization) is an adaptive instance normalization layer, and ResBlock is a residual block whose network structure is shown in Figure 9. The first classifier CX and the second classifier CY use the standard ResNet50 network structure pre-trained on the ImageNet data set.
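A sketch of how such a classifier might be obtained with torchvision; num_categories is a hypothetical parameter for the number of image categories, and replacing the final layer is one common way to adapt the pre-trained network:

```python
import torch.nn as nn
from torchvision import models

def build_classifier(num_categories):
    """ImageNet-pre-trained ResNet50 with its head replaced so that it
    predicts the image categories used here (num_categories is assumed)."""
    net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, num_categories)
    return net
```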
[0115] The networks of the first discriminator DX, the second discriminator DY, the first generator G1, the second generator G2, the first encoder E1, the second encoder E2, the first classifier CX, and the second classifier CY are all trained with the Adam optimization method; the batch size is set to 1; the weight parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are all set to 10; the loss function is calculated according to Formulas (1) to (12); the initial learning rate of all networks is set to 0.0002 and is kept constant until half of the maximum number of training iterations, after which it decays linearly to 0 at the maximum number of training iterations, where the maximum number of training iterations is set to 1000.
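A sketch of this optimization setup, assuming PyTorch's Adam and LambdaLR; the momentum parameters (betas) are a common GAN choice and are not specified in the text:

```python
import torch

def make_optimizer(params, max_iters=1000, lr=0.0002):
    """Adam with the schedule described above: constant learning rate for
    the first half of training, then linear decay to 0 at max_iters."""
    # betas=(0.5, 0.999) is a common GAN choice; the text does not specify it.
    opt = torch.optim.Adam(params, lr=lr, betas=(0.5, 0.999))
    half = max_iters // 2
    def lr_lambda(it):
        if it < half:
            return 1.0
        return max(0.0, 1.0 - (it - half) / float(max_iters - half))
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched
```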
[0116] Inputting a clear sea-surface vessel image into the well-trained neural network model generates a blurred sea-surface vessel image. These newly generated blurred vessel images can be used in image target detection or classification tasks to improve the robustness of target detection or classification.
[0117] In order to further verify the effect of the present invention, the images generated by the method of the present invention are compared with the images generated by the standard CycleGAN method; the comparison is given in Figure 10. It can be seen from Figure 10 that the images generated by the method of the present invention have better clarity and detail.
[0118] The embodiments described above merely describe preferred embodiments of the present invention and do not limit the scope of the invention. Various modifications and improvements made to the technical solutions of the present invention by those of ordinary skill in the art without departing from the spirit of the invention shall fall within the scope of protection determined by the claims of the present invention.