[0079] The present invention is described in further detail below in conjunction with the accompanying drawings and a specific embodiment:
[0080] The invention provides a face illumination processing method based on Retinex decomposition and a generative adversarial network. The model proposed by the method includes an illumination decomposition module, a face reconstruction module, a discriminator module and a face verification module. The illumination decomposition module extracts the reflection component and illumination component of the face image; the face reconstruction module adjusts the illumination level of the input face image; the discriminator module uses generative adversarial learning to ensure the realism of the synthesized face image; and the face verification module preserves the identity information of the synthesized face image.
[0081] Referring to the process shown in Figure 1, this embodiment provides a face illumination processing method based on Retinex decomposition and a generative adversarial network. The specific implementation steps are as follows:
[0082] Step 1: Construct the face illumination processing dataset using the CAS-PEAL dataset published by the Institute of Computing Technology, Chinese Academy of Sciences. As shown in Figure 2, the dataset contains a total of 1666 face images under 10 lighting conditions, of which 1498 face images (180 face identities) are training samples and the remaining 198 face images (20 face identities) are test samples.
[0083] Step 2: Construct the illumination decomposition module. This module consists of a convolutional neural network that performs Retinex illumination decomposition of the face image and outputs its reflection component and illumination component;
[0084] Step 201: For a given input face image S_in and target face image S_tar, the input of the illumination decomposition module is the image pair {S_in, S_tar}. The module consists of 6 convolutional layers, where the first convolutional layer uses a 9×9 convolution kernel to learn the global information of the face image, and the remaining convolutional layers use 3×3 convolution kernels with ReLU activation functions. Finally, a Sigmoid activation function normalizes the pixel values of the reflection component R and the illumination component I output by the network to the [0,1] interval. The network contains no pooling layers and all convolutions use a stride of 1, so that the size of the reflection component R and the illumination component I is consistent with the size of the input image S. The operation of the illumination decomposition module can be expressed as:
[0085] R_in, I_in, R_tar, I_tar = Dec(S_in, S_tar)    (1)
[0086] where Dec(·) represents the illumination decomposition module, and R_in, I_in, R_tar, I_tar are the Retinex illumination decomposition results of the input face image S_in and the target face image S_tar.
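For reference, the following is a minimal PyTorch sketch of such a decomposition network. The class name DecompositionNet, the channel widths, and the split of the output into a 3-channel reflection component and a 1-channel illumination component are illustrative assumptions; in this sketch the two images of the pair are decomposed independently by the same weight-shared network.

```python
import torch
import torch.nn as nn

class DecompositionNet(nn.Module):
    """Predicts a reflection component R and an illumination component I for a face image (Eq. 1)."""
    def __init__(self, channels=64):
        super().__init__()
        # First layer: 9x9 kernel to capture global facial information.
        layers = [nn.Conv2d(3, channels, kernel_size=9, stride=1, padding=4)]
        # Four intermediate 3x3 layers with ReLU; stride 1 and no pooling,
        # so the output spatial size matches the input image.
        for _ in range(4):
            layers += [nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]
        # Sixth layer outputs 4 channels: 3 for the reflection R, 1 for the illumination I (assumed split).
        layers += [nn.Conv2d(channels, 4, kernel_size=3, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, s):
        out = torch.sigmoid(self.net(s))            # normalize pixel values to [0, 1]
        r, i = out[:, :3, :, :], out[:, 3:, :, :]
        return r, i

# Usage: decompose both images of the training pair with the same network.
dec = DecompositionNet()
s_in, s_tar = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
r_in, i_in = dec(s_in)
r_tar, i_tar = dec(s_tar)
```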
[0087] Step 202: The illumination decomposition module performs unsupervised learning through the intrinsic constraints of the face image pair; its objective function consists of the following parts.
[0088] (1) Reflection consistency loss: According to Retinex theory, the reflection components of the input image and the target image are approximately consistent, and the main difference between the two images lies in the illumination component. The reflection consistency loss constrains the distance between the reflection component R_in of the input image and the reflection component R_tar of the target image; the loss function can be expressed as:
[0089] L_rc = ||R_in − R_tar||_1    (2)
[0090] where L_rc represents the reflection consistency loss of the illumination decomposition module, and the L1 norm distance is used to measure the degree of similarity between R_in and R_tar.
[0091] (2) Pixel regression loss: The input face image and the target face image can be reconstructed by element-wise multiplication of the reflection components {R_in, R_tar} and the illumination components {I_in, I_tar}; the loss function can be defined as:
[0092] L_recon^Dec = Σ_{i,j∈{in,tar}} α_ij ||R_i ⊙ I_j − S_j||_1    (3)
[0093] where L_recon^Dec represents the pixel regression loss of the illumination decomposition module, ⊙ denotes element-wise multiplication, the summation covers the input image, the target image and the cross-reconstructed images, and α_ij weights the pixel regression loss of the different images.
[0094] (3) Smoothing loss: A total variation model is used to smooth the illumination components {I_in, I_tar} output by the illumination decomposition module and to filter noise; the loss can be expressed as:
[0095] L_smooth^Dec = λ_g (||∇I_in||_1 + ||∇I_tar||_1)    (4)
[0096] where L_smooth^Dec represents the smoothing loss of the illumination decomposition module, involving the input face image and the target face image, ||∇·||_1 represents the total variation value of the image, and λ_g is a weight parameter that adjusts the smoothness of the image.
[0097] The objective loss function of the illumination decomposition module is a weighted combination of the losses of the different learning tasks; the final loss function can be expressed as:
[0098] L_Dec = L_rc + λ_recon L_recon^Dec + λ_smooth L_smooth^Dec    (5)
[0099] where λ_recon and λ_smooth respectively represent the weight parameters of the different losses in the illumination decomposition module.
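The decomposition objective of Eqs. (2)–(5) can be assembled as in the following hedged sketch; the loss weights, the α_ij values, and the exact total-variation formulation are illustrative assumptions rather than values taken from the embodiment.

```python
import torch.nn.functional as F

def total_variation(img):
    # Mean absolute difference between neighbouring pixels (the TV term of Eq. 4).
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def decomposition_loss(r_in, i_in, r_tar, i_tar, s_in, s_tar,
                       alpha=None, lambda_g=0.1, w_recon=1.0, w_smooth=0.1):
    # (1) Reflection consistency loss (Eq. 2).
    loss_rc = F.l1_loss(r_in, r_tar)
    # (2) Pixel regression loss over direct and cross reconstructions (Eq. 3).
    alpha = alpha or {('in', 'in'): 1.0, ('tar', 'tar'): 1.0,
                      ('in', 'tar'): 0.1, ('tar', 'in'): 0.1}   # placeholder weights
    comps = {'in': (r_in, i_in, s_in), 'tar': (r_tar, i_tar, s_tar)}
    loss_recon = 0.0
    for i_key, (r_i, _, _) in comps.items():
        for j_key, (_, i_j, s_j) in comps.items():
            loss_recon = loss_recon + alpha[(i_key, j_key)] * F.l1_loss(r_i * i_j, s_j)
    # (3) Smoothing loss on both illumination components (Eq. 4).
    loss_smooth = lambda_g * (total_variation(i_in) + total_variation(i_tar))
    # Weighted combination (Eq. 5).
    return loss_rc + w_recon * loss_recon + w_smooth * loss_smooth
```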
[0100] Step 3: Construct the face reconstruction module. This module is composed of an encoder-decoder convolutional neural network that reconstructs the illumination component of the face image and adjusts the illumination component of a low-light face image to the target illumination level.
[0101] Step 301: The face reconstruction module uses a U-NET encoder-decoder network to reconstruct the face illumination component. Its input consists of three parts: the reflection component R_in of the face image, the illumination component I_in of the face image, and the target illumination label l_tar, where the illumination label adopts one-hot encoding. The output of the face reconstruction module is the adjusted face illumination component I_rec. In the encoder network, 3×3 convolution kernels extract the illumination-invariant information of the face image, and the decoder network uses deconvolution to upsample the feature maps. A skip-connection strategy between the encoder and the decoder preserves facial detail information. The operation of the face reconstruction module can be expressed as:
[0102] I_rec = Rec(R_in, I_in | l_tar)    (6)
[0103] where Rec(·) represents the face reconstruction module, R_in, I_in and l_tar respectively represent the decomposed reflection component, the decomposed illumination component and the target illumination label, and I_rec is the reconstructed face illumination component.
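A compact sketch of a U-NET style reconstruction network conditioned on the one-hot illumination label is given below. The depth, channel counts, and the strategy of tiling the label and concatenating it with R_in and I_in are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Adjusts the illumination component to the level given by the one-hot label (Eq. 6)."""
    def __init__(self, num_light_labels=10, base=64):
        super().__init__()
        in_ch = 3 + 1 + num_light_labels          # R_in, I_in and the tiled label l_tar
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, stride=2, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(True))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(True))
        self.dec1 = nn.ConvTranspose2d(base * 2, 1, 4, stride=2, padding=1)   # skip connection doubles channels

    def forward(self, r_in, i_in, l_tar):
        # Tile the one-hot illumination label to the spatial size of the inputs.
        b, _, h, w = r_in.shape
        label_map = l_tar.view(b, -1, 1, 1).expand(-1, -1, h, w)
        x = torch.cat([r_in, i_in, label_map], dim=1)
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))   # skip connection preserves facial detail
        return torch.sigmoid(d1)                      # adjusted illumination component I_rec

# Usage: i_rec = ReconstructionNet()(r_in, i_in, torch.eye(10)[[3]])
```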
[0104] Step 302: The face reconstruction module combines pixel regression learning and generative adversarial learning to reconstruct the face illumination component; its objective function consists of the following parts.
[0105] (1) Pixel regression loss: The reconstructed face image S_rec is obtained by element-wise multiplication of the illumination component I_rec output by the face reconstruction module and the reflection component R_in. The L1 norm distance between the reconstructed face image and the target image S_tar is the pixel regression loss of the face reconstruction module, which can be defined as:
[0106] L_pix^Rec = ||R_in ⊙ I_rec − S_tar||_1    (7)
[0107] where L_pix^Rec represents the pixel regression loss of the face reconstruction module, and R_in, I_rec and S_tar respectively represent the reflection component of the input image, the illumination component of the reconstructed image, and the target image.
[0108] (2) Cycle consistency loss: The face reconstruction module uses a closed-loop structure to preserve the content information of the reconstructed image. Specifically, the adjusted face illumination component I_rec is fed back into the face reconstruction module, and under the guidance of the illumination label l_in the illumination component is adjusted back to the illumination level of the input image; the loss function can be expressed as:
[0109] L_cyc = ||Rec(R_in, I_rec | l_in) − I_in||_1    (8)
[0110] where L_cyc represents the cycle consistency loss of the face reconstruction module, I_in, R_in and I_rec respectively represent the illumination component of the input image, the reflection component of the input image and the illumination component of the reconstructed image, and l_in represents the illumination label of the input image.
[0111] (3) Smoothing loss: The total variation model is also used to smooth the illumination component output by the face reconstruction module; the loss function can be expressed as:
[0112] L_smooth^Rec = λ_g ||∇I_rec||_1    (9)
[0113] where L_smooth^Rec denotes the smoothing loss of the reconstructed face illumination component, ||∇·||_1 represents the total variation value of the image, and λ_g is a weight parameter that adjusts the smoothness of the image.
[0114] (4) Adversarial loss: The face reconstruction module uses generative adversarial learning to synthesize a face image S_rec such that the discriminator module cannot judge the authenticity of the synthesized face image S_rec; the loss function can be expressed as:
[0115] L_adv^Rec = E[(D_src(S_rec) − 1)^2]    (10)
[0116] where L_adv^Rec denotes the adversarial loss of the face reconstruction module, S_rec represents the synthesized face image, D(·) represents the discriminator module, D_src(·) outputs the probability that S_rec is discriminated as a real face image, and the least-squares distance is used to improve the stability of generative adversarial learning.
[0117] (5) Label classification loss: The face reconstruction module uses the target illumination label as guidance to synthesize a face image with a specific illumination level, so that the discriminator module can correctly classify its illumination level; the loss function can be defined as:
[0118] L_cls^Rec = E[−log D_cls(l_tar | S_rec)]    (11)
[0119] where L_cls^Rec denotes the label classification loss of the face reconstruction module, l_tar denotes the target illumination level, and D_cls(l_tar | S_rec) denotes the probability that the synthesized face image S_rec is correctly classified as the target illumination level.
[0120] (6) Perceptual loss: The synthesized face image S_rec output by the face reconstruction module is fed into the face verification module to ensure that the synthesized face image S_rec carries the same face identity information as the input image S_in and the target image S_tar; the loss function can be defined as:
[0121] L_per = ||φ(S_rec) − φ(S_in)||_2 + ||φ(S_rec) − φ(S_tar)||_2    (12)
[0122] where L_per represents the perceptual loss of the face reconstruction module, φ(·) represents the identity feature vector output by the face verification module, and the L2 norm distance is used to measure the similarity between φ(S_rec), φ(S_in) and φ(S_tar).
[0123] The objective loss function of the face reconstruction module is a weighted combination of the losses of the different learning tasks; the final loss function can be expressed as:
[0124] L_Rec = L_pix^Rec + λ_cyc L_cyc + λ_smooth L_smooth^Rec + λ_adv L_adv^Rec + λ_cls L_cls^Rec + λ_per L_per    (13)
[0125] where λ_cyc, λ_smooth, λ_adv, λ_cls and λ_per respectively represent the weight parameters of the different losses in the face reconstruction module.
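The generator-side objective of Eqs. (7)–(13) can be assembled as in the following sketch, assuming a discriminator with separate real/fake and classification heads (see Step 4) and a frozen feature extractor phi for the perceptual term (see Step 5); all loss weights are placeholders, not values from the embodiment.

```python
import torch.nn.functional as F

def reconstruction_loss(rec_net, disc, phi, r_in, i_in, i_rec, s_in, s_tar,
                        l_in, l_tar, lambda_g=0.1, weights=None):
    w = weights or dict(cyc=1.0, smooth=0.1, adv=1.0, cls=1.0, per=0.1)   # placeholder weights
    s_rec = r_in * i_rec                                   # synthesized face image
    # (1) Pixel regression loss (Eq. 7).
    loss_pix = F.l1_loss(s_rec, s_tar)
    # (2) Cycle consistency loss: adjust I_rec back to the input level under l_in (Eq. 8).
    loss_cyc = F.l1_loss(rec_net(r_in, i_rec, l_in), i_in)
    # (3) Smoothing (total variation) loss on I_rec (Eq. 9).
    loss_smooth = lambda_g * ((i_rec[:, :, 1:, :] - i_rec[:, :, :-1, :]).abs().mean()
                              + (i_rec[:, :, :, 1:] - i_rec[:, :, :, :-1]).abs().mean())
    # (4) Least-squares adversarial loss (Eq. 10).
    src_out, cls_out = disc(s_rec)
    loss_adv = ((src_out - 1.0) ** 2).mean()
    # (5) Illumination label classification loss (Eq. 11).
    loss_cls = F.cross_entropy(cls_out, l_tar.argmax(dim=1))
    # (6) Perceptual identity loss via the frozen face verification network (Eq. 12).
    f_rec, f_in, f_tar = phi(s_rec), phi(s_in), phi(s_tar)
    loss_per = F.pairwise_distance(f_rec, f_in).mean() + F.pairwise_distance(f_rec, f_tar).mean()
    # Weighted combination (Eq. 13).
    return (loss_pix + w['cyc'] * loss_cyc + w['smooth'] * loss_smooth
            + w['adv'] * loss_adv + w['cls'] * loss_cls + w['per'] * loss_per)
```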
[0126] Step 4: Construct the discriminator module. The discriminator module learns to distinguish target face images from synthesized face images through generative adversarial learning, and classifies the illumination level of face images. The module consists of a convolutional neural network whose objective function consists of the following parts.
[0127] (1) Adversarial loss: The discriminator module takes the target face image and the synthesized face image as input and judges the authenticity of this pair of images; the loss function can be defined as:
[0128] L_adv^D = E[(D_src(S_tar) − 1)^2] + E[D_src(S_rec)^2]    (14)
[0129] where L_adv^D denotes the adversarial loss of the discriminator module, S_rec represents the synthesized face image, and S_tar represents the target face image.
[0130] (2) Label classification loss: The discriminator module takes the target face image as input and classifies its illumination level; the loss function can be defined as:
[0131] L_cls^D = E[−log D_cls(l_tar | S_tar)]    (15)
[0132] where L_cls^D denotes the label classification loss of the discriminator module, l_tar denotes the target illumination level, and D_cls(l_tar | S_tar) denotes the probability that the target face image S_tar is correctly classified as the target illumination level.
[0133] The objective loss function of the discriminator is a weighted combination of the losses of the different learning tasks; the final loss function can be expressed as:
[0134] L_D = L_adv^D + λ_cls L_cls^D    (16)
[0135] where λ_cls represents the weight parameter of the label classification loss in the discriminator module.
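A sketch of a discriminator with a real/fake output D_src and an illumination-classification output D_cls, together with its objective from Eqs. (14)–(16), is given below. The layer configuration assumes 128×128 inputs and is purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, num_light_labels=10, base=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True))
        self.src = nn.Conv2d(base * 4, 1, 3, padding=1)              # real vs. synthesized (patch output)
        self.cls = nn.Conv2d(base * 4, num_light_labels, 16)         # illumination level (assumes 128x128 input)

    def forward(self, x):
        h = self.features(x)
        return self.src(h), self.cls(h).flatten(1)

def discriminator_loss(disc, s_tar, s_rec, l_tar, w_cls=1.0):
    src_real, cls_real = disc(s_tar)
    src_fake, _ = disc(s_rec.detach())
    loss_adv = ((src_real - 1.0) ** 2).mean() + (src_fake ** 2).mean()   # Eq. 14 (least squares)
    loss_cls = F.cross_entropy(cls_real, l_tar.argmax(dim=1))            # Eq. 15
    return loss_adv + w_cls * loss_cls                                    # Eq. 16
```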
[0136] Step 5: Construct the face verification module, which consists of a pre-trained VGGFace network, to guarantee that the synthesized face image S_rec has the same face identity information as the input image S_in and the target image S_tar. This module is only used to extract face identity features and propagate the perceptual loss; its parameters are not updated.
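Because VGGFace weights are not distributed with torchvision, the sketch below substitutes torchvision's pre-trained VGG16 purely to illustrate the frozen feature extractor; in the embodiment the pre-trained VGGFace network would be loaded instead. Parameters are frozen, but gradients still flow through the network to the synthesized image for the perceptual loss.

```python
import torch.nn as nn
from torchvision import models

class FaceVerifier(nn.Module):
    """Frozen feature extractor standing in for the pre-trained VGGFace network (assumption)."""
    def __init__(self):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)   # VGG16 as a stand-in for VGGFace
        self.features = backbone.features
        for p in self.parameters():
            p.requires_grad = False          # no parameter update for this module
        self.eval()

    def forward(self, x):
        return self.features(x).flatten(1)   # identity feature vector phi(x)
```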
[0137] Step 6: Model training. The integrated deep neural network model is built with the PyTorch open-source library and the face illumination processing model is trained on an NVIDIA TITAN X GPU under the Ubuntu 18.04 operating system;
[0138] Step 601: Train the illumination decomposition module separately, so that the module can decompose the input face image into a reflection component and an illumination component, as shown in Figure 3.
[0139] Step 602: Train the entire face illumination processing framework, including the illumination decomposition module, the face reconstruction module, the discriminator module and the face verification module. The overall schematic diagram of the framework is shown in Figure 4.
[0140] Step 7: Use the trained model to test the illumination processing results. Given an input face image and a target illumination level label, the model outputs an illumination-processed synthesized face image, as shown in Figure 5.
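A possible inference procedure for this step is sketched below, reusing the DecompositionNet and ReconstructionNet classes from the earlier sketches; the checkpoint file names, image file name and the 128×128 input size are illustrative assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])

dec, rec = DecompositionNet(), ReconstructionNet()
dec.load_state_dict(torch.load('decomposition.pth'))    # hypothetical checkpoint names
rec.load_state_dict(torch.load('reconstruction.pth'))
dec.eval(); rec.eval()

with torch.no_grad():
    s_in = to_tensor(Image.open('input_face.png').convert('RGB')).unsqueeze(0)
    r_in, i_in = dec(s_in)                  # Retinex decomposition of the input face
    l_tar = torch.eye(10)[[0]]              # one-hot target illumination label
    i_rec = rec(r_in, i_in, l_tar)          # illumination component adjusted to the target level
    s_rec = r_in * i_rec                    # synthesized, illumination-processed face image
```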
[0141] The above is only a preferred embodiment of the present invention and is not intended to limit the present invention in any other form; any modification or equivalent change made according to the technical essence of the present invention still falls within the scope of protection claimed by the present invention.