Face image super-resolution reconstruction method based on an attribute description generative adversarial network

A super-resolution reconstruction technology for face images, applied in the field of digital image/video signal processing. It addresses the difficulty of reconstructing the true attribute information of a face identity, and achieves the effects of improving learning ability, promoting generation ability, and enhancing image quality.

Pending Publication Date: 2019-04-12
BEIJING UNIV OF TECH
Cites: 5 · Cited by: 66

AI-Extracted Technical Summary

Problems solved by technology

Therefore, generative models can easily produce face images of faces that do not actually exist.
The purpose of these methods is mainly to generate face images with ...

Abstract

The invention discloses a face image super-resolution reconstruction method based on an attribute description generative adversarial network, and belongs to the field of digital image/video signal processing. The training stage comprises three parts: training sample preparation, network structure design and network training. The network structure adopts a generative adversarial network framework composed of a generation network and a discrimination network. The generation network comprises a face attribute encoding/decoding module and a super-resolution reconstruction module; the discrimination network comprises an attribute classification module, an adversarial module and a perception module. Network training is carried out by alternately training the generation network and the discrimination network in an adversarial manner. In the reconstruction stage, the LR face image and the attribute description information are taken as input, and image encoding, attribute addition, image decoding and image reconstruction are realized through the trained generation network. The invention can enhance the face information of a low-resolution face image and improve the accuracy of low-resolution face recognition.


Examples

  • Experimental program(1)

Example Embodiment

[0052] The following describes embodiments of the present invention in detail with reference to the accompanying drawings of the specification:
[0053] A face image super-resolution reconstruction method based on an attribute description generative adversarial network is divided into a training phase and a reconstruction phase; the overall flow chart is shown in accompanying Figure 1, and the overall structure of the generative adversarial network is shown in accompanying Figure 2.
[0054] (1) In preprocessing the training data, in order to reduce errors caused by the background and pose of the face images, the present invention builds the training sample library in three stages. In the first stage, considering that the widely used face data sets CelebA and LFW are collected under real, unconstrained conditions and have strong generality and important value for experimental comparison, the present invention uses the data set CelebA, containing 202,599 face images, as the training samples, and the data set LFW, containing 13,300 face images, as the test samples. The CelebA training data set has complete attribute annotation labels, which can be used directly by the present invention. In the second stage, the CelebA and LFW data sets are preprocessed with the MTCNN network, which jointly handles face detection and face alignment: the key face regions are first obtained through face detection, the processed face images are then obtained through facial key point alignment, and finally the images are uniformly normalized to 96×96 pixels and used as the HR training samples Y_i. In the final stage, face degradation is applied to the HR training samples: the bicubic interpolation (Bicubic) method is used for down-sampling by the factor D, with D set to 4, to obtain the LR training samples X_i with an image size of 24×24 pixels. The image degradation process is shown in formula (1).
[0055] X=D(Y), (1)
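The degradation of formula (1) can be sketched as follows. Bicubic interpolation, the ×4 factor and the 96×96 → 24×24 sizes follow the text above; the use of torch.nn.functional.interpolate and the assumption that pixel values lie in [0, 1] are illustrative choices, not details from the patent.

```python
import torch
import torch.nn.functional as F

def degrade(hr_batch: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Bicubic down-sampling operator D from formula (1): X = D(Y).

    hr_batch: HR face images of shape (N, 3, 96, 96), values assumed in [0, 1].
    Returns LR images of shape (N, 3, 96 // scale, 96 // scale).
    """
    return F.interpolate(hr_batch, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)

# Example: one 96x96 HR sample becomes a 24x24 LR sample.
hr = torch.rand(1, 3, 96, 96)
lr = degrade(hr)          # shape (1, 3, 24, 24)
print(lr.shape)
```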
[0056] (2) Image encoding, attribute addition and image decoding: first, the degraded LR face image X is passed through the encoder E_enc, where a series of down-sampling operations turn it into a set of latent feature vectors z, as shown in formula (2). We select important attributes that affect face recognition, including facial features, gender, age, etc. The present invention selects five typical attributes that are beneficial to face recognition, "mouth", "nose", "eyes", "face shape" and "gender", set to "big mouth", "big nose", "narrow eyes", "melon-seed (oval) face" and "male", and the attribute vector is represented by [1 1 1 1 1]. The latent image features are concatenated with the attribute vector e along the channel dimension. The concatenated tensor is then fed into the deconvolution layers to jointly learn the common features of the image and the attributes, and during feature learning it passes in turn through the decoder E_dec, where a series of up-sampling operations generate a face image Z_e carrying these attributes, as shown in formula (3). Such a generation network can help correct the attribute detail defects in the input image while adding more facial details, generating a more realistic high-resolution face image. Accompanying Figure 3 shows the subjective experimental results of the method of the present invention on the LFW data set after the attribute description is added. From the subjective visual effect, the attribute description helps correct the attribute detail defects in the input image and makes the attribute characteristics more obvious, while more facial details are added to generate a more realistic high-resolution face image.
[0057] z = E_enc(X),  (2)
[0058] Z_e = E_dec(z, e),  (3)
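The encode → attribute-concatenate → decode path of formulas (2) and (3) can be sketched as below. Only the structure follows the description above (strided down-sampling to a latent tensor z, channel-wise concatenation of the 5-dimensional attribute vector e, transposed-convolution up-sampling to Z_e); the layer counts, channel widths and output activation are assumptions.

```python
import torch
import torch.nn as nn

class AttributeEncoderDecoder(nn.Module):
    """Sketch of E_enc / E_dec with channel-wise attribute injection."""
    def __init__(self, n_attr: int = 5):
        super().__init__()
        # E_enc: down-sample the LR face X into a latent feature map z.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.01),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.01),
        )
        # E_dec: deconvolution layers that jointly learn image and attribute features.
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128 + n_attr, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.01),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
        z = self.enc(x)                                  # formula (2): z = E_enc(X)
        # Broadcast the attribute vector e over the spatial grid and
        # concatenate it with z along the channel dimension.
        e_map = e.view(e.size(0), -1, 1, 1).expand(-1, -1, z.size(2), z.size(3))
        return self.dec(torch.cat([z, e_map], dim=1))    # formula (3): Z_e = E_dec(z, e)

x = torch.rand(2, 3, 24, 24)                  # LR faces
e = torch.tensor([[1., 1., 1., 1., 1.]] * 2)  # big mouth, big nose, narrow eyes, oval face, male
z_e = AttributeEncoderDecoder()(x, e)         # same spatial size as x: (2, 3, 24, 24)
```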
[0059] (3) Extraction of LR face image features: the decoded LR face image Z_e with the added attribute description is taken as input. First, the convolution filter Conv is used to extract image features, removing noise from the image and extracting effective information at the image edges; then a non-linear activation function is applied to the convolved image to mine its latent features. Finally, the high-frequency information of the LR image is obtained through layer-by-layer feature transformation. The activation function used in the present invention is the Leaky Rectified Linear Unit (LReLU), as shown in formula (4). Compared with the Sigmoid, Tanh and ReLU functions, stochastic gradient descent with LReLU converges faster and does not require many complicated operations. In the present invention, a is a non-zero number and is set to 0.01.
[0060] g_i(Z_e) = max(0, Conv(Z_e)) + a × min(0, Conv(Z_e)),  (4)
[0061] The present invention adopts Batch Normalization (BN), applied after the convolutional layer and before the activation function. BN normalizes the input of the current layer so that its mean is 0 and its variance is 1. It speeds up convergence, reduces the influence of CNN weight initialization, has good stability, and helps prevent vanishing gradients.
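One feature-extraction unit matching formula (4) and the BN placement described above can be sketched as follows; the a = 0.01 slope follows the text, while the kernel size and channel counts are assumptions.

```python
import torch.nn as nn

def feature_block(in_ch: int = 64, out_ch: int = 64) -> nn.Sequential:
    """One g_i(.) unit: Conv -> BN -> LReLU(a = 0.01), as in formula (4)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),          # BN after the conv layer, before the activation
        nn.LeakyReLU(negative_slope=0.01),
    )
```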
[0062] (4) LR image residual learning and high-frequency information fusion: features of the LR image with added attribute information are extracted layer by layer to obtain the i-th layer high-frequency information g_i(Z_e); the LR image Z_e and the i-th layer high-frequency information g_i(Z_e) are added to obtain the LR high-frequency fusion image I_LR. The layer-by-layer LR feature extraction is shown in formula (5), and the high-frequency information fusion is shown in formula (6).
[0063] g_i(Z_e) = g_{i-1}(g_{i-2}(g_{i-3}(…(g_1(Z_e))))),  (5)
[0064] I_LR = g_i(Z_e) + Z_e,  (6)
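A sketch of the layer-by-layer composition of formula (5) and the skip-connection fusion of formula (6) is given below; the network depth and the first/last layers that map between the 3-channel image and the feature space are assumptions.

```python
import torch
import torch.nn as nn

class HighFreqExtractor(nn.Module):
    """g_i = g_{i-1}(g_{i-2}(...)) followed by the residual fusion I_LR = g_i(Z_e) + Z_e."""
    def __init__(self, depth: int = 8, ch: int = 64):
        super().__init__()
        layers = [nn.Conv2d(3, ch, 3, padding=1), nn.LeakyReLU(0.01)]
        for _ in range(depth):
            layers += [nn.Conv2d(ch, ch, 3, padding=1),
                       nn.BatchNorm2d(ch),
                       nn.LeakyReLU(0.01)]
        layers += [nn.Conv2d(ch, 3, 3, padding=1)]   # back to image space
        self.body = nn.Sequential(*layers)

    def forward(self, z_e: torch.Tensor) -> torch.Tensor:
        return self.body(z_e) + z_e       # formula (6): I_LR = g_i(Z_e) + Z_e

i_lr = HighFreqExtractor()(torch.rand(1, 3, 24, 24))   # shape (1, 3, 24, 24)
```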
[0065] (5) Sub-pixel convolutional layer image enlargement: through step (4), the high-frequency fusion image I_LR is obtained; its feature map contains r² feature channels (r is the target magnification of the image). The r² channels of each pixel are rearranged into an r×r area, corresponding to a sub-block of size r×r in the high-resolution image, so the feature image I_LR of size r²×H×W is rearranged into a high-resolution image I_SR of size 1×rH×rW. Sub-pixel convolution not only enlarges the image size, but also synthesizes multiple feature maps into an image with richer detail. The sub-pixel convolutional layer can be calculated by formula (7):
[0066] f_L(I_LR) = SP(W_L * f_{L-1}(I_LR) + b_L),  (7)
[0067] In formula (7), f_{L-1}(I_LR) represents the feature map of layer L-1, W_L represents the weight parameters of layer L, b_L is the bias connected to layer L, and f_L(I_LR) represents the feature map of layer L obtained after sub-pixel convolution by the SP layer.
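Formula (7) corresponds to a convolution producing r² channels per output colour channel followed by the periodic-shuffling rearrangement SP; in PyTorch that rearrangement is available as PixelShuffle. The sketch below assumes 3-channel images and r = 2 for a single stage; the kernel size is an assumption.

```python
import torch
import torch.nn as nn

class SubPixelUpsampler(nn.Module):
    """f_L(I_LR) = SP(W_L * f_{L-1}(I_LR) + b_L): conv to r^2 channels per colour, then rearrange."""
    def __init__(self, r: int = 2, ch: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch * r * r, kernel_size=3, padding=1)  # W_L, b_L
        self.shuffle = nn.PixelShuffle(r)  # rearranges r^2 channels into an r x r block

    def forward(self, i_lr: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.conv(i_lr))

sr = SubPixelUpsampler(r=2)(torch.rand(1, 3, 24, 24))   # shape (1, 3, 48, 48)
```

Two such ×2 stages cascaded give the ×4 factor used in the experiments, as described in step (6) below.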
[0068] (6) Cascade magnification: the image size can be enlarged through step (5). When the LR image needs to be enlarged by a larger factor, the present invention takes the sub-pixel convolution result of step (5) as the input of step (2) and repeats steps (3)-(5), i.e. LR feature extraction, high-frequency information fusion and sub-pixel convolution, to finally complete the image enlargement (see the sketch below). Cascade magnification enlarges the image gradually and reduces the loss of detail information during reconstruction. In addition, the cascade module simplifies the network structure design, which is conducive to network training.
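A minimal, self-contained sketch of the cascade: each stage below is a stand-in for the full steps (3)-(5) (in practice each stage would reuse the feature-extraction and sub-pixel blocks sketched above); the layer widths are assumptions.

```python
import torch
import torch.nn as nn

def sr_stage(ch: int = 3, r: int = 2) -> nn.Sequential:
    """One cascade stage: a small feature-extraction body followed by x`r` sub-pixel up-sampling."""
    return nn.Sequential(
        nn.Conv2d(ch, 64, 3, padding=1), nn.LeakyReLU(0.01),
        nn.Conv2d(64, ch * r * r, 3, padding=1),
        nn.PixelShuffle(r),
    )

# Cascade two x2 stages to reach the x4 factor used in the experiments.
cascade = nn.Sequential(sr_stage(), sr_stage())
sr = cascade(torch.rand(1, 3, 24, 24))        # shape (1, 3, 96, 96)
```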
[0069] (7) Network training and model acquisition: the present invention establishes training mechanisms for the generation network and the discrimination network respectively, learns the mapping between LR and HR end-to-end, and conducts adversarial training such as feature comparison and attribute classification. The purpose of the GAN is to maximize the accuracy with which the discrimination network D classifies real and generated samples, while reducing the difference between real samples and the samples produced by the generation network G, as shown in formula (8):
[0070] min_G max_D f(G, D),  (8)
[0071] In the generation network, the LR face image X_i first passes through the generation network G to obtain the HR face image Z_i, and then the network output Z_i and the real image Y_i form the image pair {Z_i, Y_i}.
[0072] Z_i = G(X_i),  (9)
[0073] In order to judge whether the attribute label of the reconstructed image Z_e produced by the generation network is consistent with the real attribute label, the present invention uses the attribute classification module C to constrain the generated image Z_e to carry the attributes we described. The input of the attribute classification module C is {Z_e, e_i}, and the attribute loss is L_att, where e_i represents the true attribute label of the i-th image and e represents the five face attributes selected in the present invention. The loss function L_att is shown in formulas (10) and (11); ℓ(Z_e, e) is the binary cross-entropy loss of the attributes. The attribute classification module C is trained jointly with the encoder E_enc and the decoder E_dec.
[0074] L_att = ℓ(Z_e, e),  (10)
[0075] ℓ(Z_e, e) = -Σ_k [ e_k log C_k(Z_e) + (1 - e_k) log(1 - C_k(Z_e)) ],  (11)
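Assuming the attribute classifier C outputs one logit per attribute, the binary cross-entropy attribute loss described above can be sketched with PyTorch's built-in criterion. The classifier architecture below is purely illustrative, not the patent's module.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # binary cross-entropy over the 5 attribute labels

def attribute_loss(classifier: nn.Module, z_e: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
    """L_att: does the reconstructed face Z_e carry the requested attributes e (0/1 labels)?"""
    logits = classifier(z_e)          # shape (N, 5), one score per attribute
    return bce(logits, e)

# Hypothetical classifier C: conv features followed by a 5-way head.
C = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.01),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.01),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 5),
)
loss = attribute_loss(C, torch.rand(2, 3, 96, 96), torch.ones(2, 5))
```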
[0076] In the discriminant network, in order to ensure that the generated image Z_i is more similar in characteristics to the real image Y_i, the present invention uses the perception module to measure the differences in color, texture, shape, etc. between the generated image and the real image. The input image pair of the perception module is {Z_i, Y_i}, and its loss function is the perceptual loss L_per. In calculating the perceptual loss, the Gram matrices are first obtained for the five stages a-e of the perception module, the Euclidean distances between the corresponding layers are then calculated, and finally the Euclidean distances of the different layers are weighted to obtain the perceptual loss. As shown in formula (12), j denotes the j-th layer of the perception module, and C_j, H_j, W_j respectively denote the number of channels, height and width of the j-th layer feature map; multiplying the three gives the size of the feature map, and the Gram matrix of the j-th layer is computed by the pairwise inner products of the j-th layer features H_j(X)_{h,w,c}. In formula (13), G_j(Z) and G_j(Y) respectively denote the Gram matrices of the reconstructed image and the real image at the j-th layer of the perception module, and the Euclidean distance between the two at the j-th layer is computed. Finally, the perceptual loss function L_per shown in formula (14) is obtained.
[0077] G_j(X)_{c,c'} = (1 / (C_j H_j W_j)) Σ_{h=1}^{H_j} Σ_{w=1}^{W_j} H_j(X)_{h,w,c} · H_j(X)_{h,w,c'},  (12)
[0078] l_j = || G_j(Z) - G_j(Y) ||_2²,  (13)
[0079] L_per = Σ_j λ_j l_j,  (14)
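A sketch of the Gram-matrix perceptual loss of formulas (12)-(14). Which backbone supplies the five stages a-e of the perception module is not spelled out here, so the example feeds dummy feature tensors; `gram` and `perceptual_loss` are hypothetical helper names.

```python
import torch

def gram(feat: torch.Tensor) -> torch.Tensor:
    """Formula (12): pairwise inner products of a layer's feature maps,
    normalised by C_j * H_j * W_j."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(feats_fake, feats_real, weights=None) -> torch.Tensor:
    """Formulas (13)-(14): squared Euclidean distance between Gram matrices per layer,
    weighted and summed over the chosen layers."""
    if weights is None:
        weights = [1.0] * len(feats_fake)
    return sum(w * torch.sum((gram(zf) - gram(yf)) ** 2)
               for w, zf, yf in zip(weights, feats_fake, feats_real))

# Example with two dummy "layers" of features.
z_feats = [torch.rand(1, 64, 48, 48), torch.rand(1, 128, 24, 24)]
y_feats = [torch.rand(1, 64, 48, 48), torch.rand(1, 128, 24, 24)]
loss = perceptual_loss(z_feats, y_feats)
```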
[0080] In the discrimination network, in order to distinguish whether the reconstructed image Z_i produced by the generation network is an image f_fake generated by the algorithm or a real image f_real, the present invention uses the adversarial module to judge whether Z_i is real or fake. The input image pair of the adversarial module is {Z_i, Y_i}, with 0/1 labels. The module consists of two loss functions, the real and fake loss functions L_real and L_fake. When the label s is 0, the loss function L_fake is shown in formula (15); when the label s is 1, the loss function L_real is shown in formula (16).
[0081] L_fake = -log(D(G(X_s))), s = 0,  (15)
[0082] L_real = log(D(Y_s)), s = 1,  (16)
[0083] In formulas (15) and (16), D represents the adversarial module, G represents the generation network, X_s represents the LR image, and Y_s represents the real image. The training goal of the adversarial module is to classify real and fake images: make the output for real samples close to 1 and the output for fake samples obtained from the generation network close to 0.
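The real/fake terms of formulas (15) and (16) follow directly from the discriminator output D(·), assumed here to be a probability in (0, 1) (e.g. after a sigmoid); the small epsilon is a numerical-safety addition, not part of the patent text.

```python
import torch

def adversarial_terms(d_fake: torch.Tensor, d_real: torch.Tensor):
    """d_fake = D(G(X_s)) for generated faces (label 0),
    d_real = D(Y_s) for real faces (label 1)."""
    eps = 1e-8                                   # numerical safety only
    l_fake = -torch.log(d_fake + eps).mean()     # formula (15)
    l_real = torch.log(d_real + eps).mean()      # formula (16)
    return l_fake, l_real

l_fake, l_real = adversarial_terms(torch.rand(4, 1), torch.rand(4, 1))
```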
[0084] In network training, the generation network and the discrimination network are trained alternately. When the generation network is fixed, the discrimination network is trained; conversely, when the discrimination network is fixed, the generation network is trained. In the latter case the discrimination network does not update its parameters, and only its error is propagated back to the generation network. The total loss function of the generation network is as follows:
[0085] L_G = L_per + L_fake,  (17)
[0086] The total loss function of the discriminant network is as follows:
[0087] L_D = L_att + L_adv,  (18)
[0088] L_adv = M - L_fake + L_real,  (19)
[0089] As shown in formula (18), the loss function L_D of the discriminant network is the combination of the attribute loss function L_att of the attribute classification module and the adversarial loss function L_adv of the adversarial module. As shown in formula (19), L_adv is the equilibrium adversarial loss, which aims to balance L_fake and L_real to complete the training of the network; its equilibrium term M is set to 20. The base learning rate of the discriminant network is set to 0.01, and the Adam optimizer (a stochastic gradient descent method) is used to compute the network error and adjust the network parameters. In order to make the GAN training converge and to accelerate training, the batch size of the training samples of the present invention is set to 16. After repeated iterations, training stops when the preset maximum number of iterations (100,000) is reached, and the generation model for image restoration is obtained.
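One possible organisation of the alternating training described above is sketched below. The Adam optimizer, learning rate 0.01, batch size 16 and M = 20 follow the text; `G`, `D`, `C`, `feat` (the perception-module feature extractor) and the overall step structure are placeholders and assumptions, not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

M = 20.0   # equilibrium term of formula (19)

def gram(f):
    n, c, h, w = f.shape
    f = f.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def train_step(G, D, C, feat, opt_G, opt_D, x_lr, y_hr, e):
    """One alternating round: update the discrimination side (D and C), then the generator G.
    e is a float tensor of 0/1 attribute labels; feat returns a list of feature maps."""
    # --- discrimination-network step (generation network fixed) ---
    with torch.no_grad():
        z_fake = G(x_lr, e)
    l_att = F.binary_cross_entropy_with_logits(C(z_fake), e)   # formulas (10)-(11)
    d_fake, d_real = D(z_fake), D(y_hr)
    l_fake = -torch.log(d_fake + 1e-8).mean()                  # formula (15)
    l_real = torch.log(d_real + 1e-8).mean()                   # formula (16)
    l_D = l_att + (M - l_fake + l_real)                        # formulas (18)-(19)
    opt_D.zero_grad(); l_D.backward(); opt_D.step()

    # --- generation-network step (only G's parameters are updated) ---
    z_fake = G(x_lr, e)
    l_fake = -torch.log(D(z_fake) + 1e-8).mean()
    l_per = sum(torch.sum((gram(a) - gram(b)) ** 2)
                for a, b in zip(feat(z_fake), feat(y_hr)))     # formulas (12)-(14)
    l_G = l_per + l_fake                                       # formula (17)
    opt_G.zero_grad(); l_G.backward(); opt_G.step()
    return float(l_G), float(l_D)

# Optimisers following the text (batch size 16 per iteration):
# opt_D = torch.optim.Adam(list(D.parameters()) + list(C.parameters()), lr=0.01)
# opt_G = torch.optim.Adam(G.parameters(), lr=0.01)
```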
[0090] (8) The reconstructed face image is used for face recognition: accompanying Figure 4 compares the subjective results of the method of the present invention with typical SR methods on the LFW data set. Compared with the other methods, the reconstructed image of the present invention enhances the detail information of the face image, and the edge information is sharper. Through step (7), the reconstructed result image is obtained. In order to verify that face reconstruction is beneficial to face recognition, the present invention first inputs the reconstructed image of step (7) into the face recognition model, so that the face image is mapped into a Euclidean space and the similarity between the face image and the label information is calculated; the two images are then judged to be of the same individual or of different individuals. The face recognition model of the present invention is tested on the LFW data. The data set provides 6,000 pairs of face images as evaluation data: 3,000 pairs belong to the same person and 3,000 pairs belong to different people. In the test phase, the similarity of each pair of pictures is calculated to obtain a similarity score (0-1), which is compared with a given threshold (set to 0.7 empirically) to yield 6,000 decisions, from which the face recognition accuracy is obtained. The main evaluation indicators are structural similarity (SSIM) and face recognition accuracy (Accuracy). Accompanying Figure 5 compares the face recognition accuracy of the method of the present invention with typical SR methods on the LFW data set; compared with the other methods, the recognition accuracy of the present invention is the highest. Figure 6 compares the structural similarity of the method of the present invention with typical SR methods on the LFW data set; compared with the other methods, the structural similarity of the present invention is also the highest.
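The verification protocol above (pairwise similarity in a Euclidean embedding space against a 0.7 threshold over the 6,000 LFW pairs) reduces to a simple accuracy computation. The embedding model and the exact similarity measure are not specified here, so cosine similarity mapped to [0, 1] is an assumption.

```python
import numpy as np

def verification_accuracy(emb_a: np.ndarray, emb_b: np.ndarray,
                          same_person: np.ndarray, threshold: float = 0.7) -> float:
    """emb_a/emb_b: (N, d) embeddings of the two faces in each pair,
    same_person: (N,) boolean ground truth, threshold: set to 0.7 empirically."""
    # Cosine similarity mapped to [0, 1] after normalising embeddings (an assumption).
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = 0.5 * (1.0 + np.sum(a * b, axis=1))
    predicted_same = sim > threshold
    return float(np.mean(predicted_same == same_person))

# Example with random embeddings for the 6,000 evaluation pairs.
acc = verification_accuracy(np.random.rand(6000, 128), np.random.rand(6000, 128),
                            np.random.rand(6000) > 0.5)
```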