CNN-based portrait beautification method and system, and storage medium

A CNN-based portrait beautification method predicts the deformation region using a portrait mask and a multi-scale loss function, then fills that region with content found in a nearby image search region. This solves the background distortion that existing technologies introduce after portrait beautification and yields a more natural and realistic result.

CN116051422B · Active · Publication Date: 2026-06-26 · XIAMEN MEITUZHIJIA TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XIAMEN MEITUZHIJIA TECH
Filing Date
2023-02-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing portrait beautification methods distort the background pixels around the portrait after beautification, resulting in an unrealistic appearance.

Method used

The patent adopts a CNN-based portrait beautification method that separates the portrait from the image using a portrait mask, predicts the deformation region, and performs image inpainting. It uses a multi-scale loss function and a U-shaped network for optical flow prediction, and fills the deformation region from an image search region to ensure a natural texture transition.

Benefits of technology

It improves the realism and robustness of portrait beautification results, avoids the influence of surrounding textures, and ensures that the filled area blends naturally with the background texture, resulting in high prediction accuracy.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a CNN-based portrait beautification method and system, and a storage medium. The method comprises the following steps: obtaining an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask; inputting the second portrait image into a CNN model for beautification optical flow prediction to obtain a beautification optical flow; performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain a deformation region; segmenting the deformation region to obtain a region to be filled; and performing image inpainting on the deformation region based on the region to be filled to obtain a portrait beautification result. Obtaining the second portrait image through preprocessing makes separate processing convenient and effectively avoids the influence of portrait beautification on the texture around the portrait. Furthermore, because image inpainting is performed on the deformation region based on the region to be filled, only the blank content of the deformation region needs to be repaired, so the portrait beautification result is more realistic, more natural, and robust.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a CNN-based portrait beautification method, a system for applying the method, and a computer-readable storage medium.

Background Technology

[0002] Portrait beautification, also known as body liquefaction, refers to the use of image processing techniques to adjust the torso of a person in an image to achieve aesthetic effects such as slimming, leg slimming, and swan-like neck shaping. For image portrait beautification applications, existing automated portrait beautification methods can be broadly categorized into the following two types:

[0003] 1. Deep learning combined with traditional algorithms: This type of method first uses deep learning to detect key points of the human body, and then adjusts the image of the corresponding points according to the key point offset template;

[0004] 2. Optical flow-based deep learning methods: These methods first scale the portrait image to a fixed size, input the scaled image into a deep learning model to predict a beautification optical flow map, then scale the optical flow map back to the original image size and apply it to the original image to obtain the beautified result.

[0005] Both of these methods essentially adjust the position of pixels in the image and do not create new pixels. Therefore, when part of the torso is deformed (such as the arms becoming thinner), the content of the changed part can only be filled by stretching the surrounding pixels. This causes the pixels in the background of the beautified image to become distorted, giving a sense of "spatial distortion" and making the beautified image look unrealistic.

Summary of the Invention

[0006] The main objective of this invention is to provide a CNN-based portrait beautification method, system, and storage medium, aiming to solve the technical problem that existing portrait beautification methods distort the pixels of the background around the image after beautification, giving people a "spatial distortion" visual effect and making the beautified image look unrealistic.

[0007] To achieve the above objectives, this invention provides a CNN-based portrait beautification method, comprising the following steps: acquiring an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask; inputting the second portrait image into a CNN model for beautification optical flow prediction to obtain beautification optical flow; based on the beautification optical flow, the second portrait image, and the second portrait mask, predicting the deformation region to obtain the deformation region; segmenting the deformation region to obtain the region to be filled; and performing image restoration on the deformation region based on the region to be filled to obtain the portrait beautification result.

[0008] Optionally, the image to be beautified is acquired and preprocessed to obtain a second portrait image and a second portrait mask. Specifically, this includes: acquiring the image to be beautified and predicting the portrait region in the image to be beautified based on a portrait segmentation algorithm to obtain a first portrait mask; calculating the bounding rectangle of the portrait based on the first portrait mask and expanding the width and height of the bounding rectangle based on a preset ratio to obtain a cropping rectangle of the portrait; cropping the image to be beautified and the first portrait mask according to the cropping rectangle of the portrait to obtain a first portrait image and a second portrait mask respectively; and downsampling the first portrait image to obtain a second portrait image of a preset size.

[0009] Optionally, performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain the deformation region specifically includes: enlarging the beautification optical flow to the resolution of the first portrait image to obtain an enlarged result; applying the enlarged result to the first portrait image and the second portrait mask respectively to obtain the first portrait beautification image and the second portrait beautification mask respectively; removing the portrait from the image to be beautified based on the first portrait mask to obtain an image without a portrait; placing the first portrait beautification image into the image without a portrait based on the second portrait beautification mask to obtain the second portrait beautification image; calculating the difference between the second portrait mask and the second portrait beautification mask, and taking the part with a difference greater than a preset difference as the deformation region; and placing the deformation region into the second portrait beautification image based on the cropping rectangle, and filling the area to be filled with 0.

[0010] Optionally, segmenting the deformation region to obtain the region to be filled, and performing image restoration on the deformation region based on the region to be filled to obtain the portrait beautification result, specifically includes the following steps: Step a. Segment the deformation region into several square regions to be filled with side length $w_e$, each represented by a Patch, and select from all Patches one that contains background of the image to be beautified as the current Patch for filling; Step b. Calculate the background region in the current Patch based on the second portrait beautification mask, denote it as B, and calculate the texture gradient of B; Step c. Using the top-left corner $(x_e, y_e)$ of the current Patch as a reference point, traverse the rectangular search region $x \in [x_e - R,\ x_e + R + w_e]$, $y \in [y_e - R,\ y_e + R + w_e]$, excluding the current Patch, where R is the search range, and cut several squares with side length $w_e$ out of the search region, each serving as a candidate region; Step d. Calculate the background region in each candidate region, denote it as B', and calculate the texture gradient of B'; Step e. Calculate the difference between the background region B' in each candidate region and the background region B in the current Patch, and select the candidate region with the smallest difference to fill into the target filling region, which is the part of the current Patch other than the background of the image to be beautified; Step f. Repeat steps b to e until all deformation regions are filled to obtain the portrait beautification result.

[0011] Optionally, the CNN model has four output channels, namely the optical flow map $(f_x, f_y)$ and the attention heatmap $(h_x, h_y)$. During the training phase, the CNN model uses a multi-scale loss to supervise the different resolutions of the model separately, gradually approaching the preset target from low resolution to high resolution. The loss is calculated only for the parts of the optical flow map and attention heatmap that are greater than 0. Because different body parts in the second portrait image deform differently in the x and y directions, the x and y directions are supervised separately.

[0012] Optionally, the multi-scale loss specifically employs losses at N scales. The optical flow loss at the k-th scale, $L_{of}^{k}$, is defined from the optical flow loss in the x-direction and the optical flow loss in the y-direction at the current scale, and the final optical flow loss $L_{of}$ is the average of the optical flow losses across the N scales, i.e. $L_{of} = \frac{1}{N}\sum_{k=1}^{N} L_{of}^{k}$. Likewise, the heatmap loss at the k-th scale, $L_{heat}^{k}$, is defined from the heatmap loss in the x-direction and the heatmap loss in the y-direction at the current scale, and the final optical flow heatmap loss $L_{heat}$ is the average of the heatmap losses across the N scales, i.e. $L_{heat} = \frac{1}{N}\sum_{k=1}^{N} L_{heat}^{k}$ (the per-scale formulas are not reproduced in this text).

[0013] Optionally, the loss during the training of the CNN model is: $Loss = \gamma_1 L_{of} + \gamma_2 L_{heat} + \gamma_3 L_{pec}$, where $L_{pec}$ is the perceptual loss, M is the number of layers selected from the VGG model, H is the height of the corresponding output, W is the width of the corresponding output, G is the depth of the corresponding output, $VGG_k(\cdot)$ represents the output of the k-th layer of the VGG model, X represents the image to be beautified, Y represents the beautification result, $\varphi(\cdot)$ is the optical flow transformation function, $\gamma_1$ is the weight of the optical flow loss $L_{of}$, $\gamma_2$ is the weight of the optical flow heatmap loss $L_{heat}$, and $\gamma_3$ is the weight of the perceptual loss $L_{pec}$, with $\gamma_1 \ge 0$, $\gamma_2 \ge 0$, and $\gamma_3 \ge 0$.

[0014] Optionally, the CNN model employs a U-shaped network based on an encoder-decoder. The decoder consists of stacked optical flow super-resolution units (OFLs). Each OFL includes at least a feature super-resolution module and a corresponding convolutional structure. The processing of the OFL includes at least the following: inputting the preceding feature map into the feature super-resolution module for first processing to obtain the following feature map, and then passing it to the data stream of the next layer. The first processing includes at least stacked convolution, downsampling convolution, amplification convolution, and channel stacking. The preceding feature map is obtained based on feature extraction from the encoder, or based on the preceding OFL of the current OFL. Based on the following feature map, difference prediction is performed through a set of stacked convolutional structures to obtain the optical flow difference at the current scale. The preceding beautified optical flow is input into the feature super-resolution module for optical flow amplification. Based on the optical flow difference obtained from the difference prediction, difference compensation is performed on the optical flow amplification result to obtain the current beautified optical flow.

[0015] Corresponding to the aforementioned CNN-based portrait beautification method, this invention provides a CNN-based portrait beautification system, comprising: an image preprocessing module for acquiring and preprocessing the image to be beautified to obtain a second portrait image and a second portrait mask; a beautification optical flow prediction module for inputting the second portrait image into a CNN model to predict beautification optical flow, thereby obtaining beautification optical flow; a deformation region prediction module for predicting deformation regions based on beautification optical flow, the second portrait image, and the second portrait mask, thereby obtaining deformation regions; and an image inpainting module for performing image inpainting on the deformation regions based on the regions to be filled, thereby obtaining the portrait beautification result.

[0016] In addition, to achieve the above objectives, the present invention also provides a computer-readable storage medium storing a CNN-based portrait beautification program, which, when executed by a processor, implements the steps of the CNN-based portrait beautification method described above.

[0017] The beneficial effects of this invention are:

[0018] (1) The second portrait image is obtained through preprocessing, which makes it easier to process it separately and effectively avoids the influence of portrait beautification on the texture around the portrait; and, by predicting the deformation area, the area to be filled is obtained, and the image is repaired in the deformation area. Only the blank content of the deformation area needs to be repaired, which is more realistic and the portrait beautification result is more natural and robust.

[0019] (2) By calculating the difference between the second portrait mask and the second portrait beautification mask, the blank area in the second portrait beautification image is located, which makes it easier to determine the area to be filled.

[0020] (3) Matching pixel blocks are searched for in the image search region, and the target filling region is filled patch by patch. Each patch is a square area with side length $w_e$, and filling spreads from patches that contain background to patches that do not, making the filled texture blend more naturally with the surrounding texture and further improving the realism of the portrait beautification result.

[0021] (4) In addition to predicting the optical flow of the model, this invention further predicts the region where the deformation occurs (i.e., the deformation region), allowing the optical flow-related branches to focus on the optical flow prediction of the deformation region. Compared with the full-image evaluation method in the prior art, this invention innovates the loss function of the CNN model and adopts multi-scale loss to supervise the model at different resolutions separately, gradually approaching the preset target from low resolution to high resolution. Moreover, the model only calculates the loss for the optical flow map and the attention heatmap that are greater than 0. In addition, for the differences in deformation of different body parts in the x and y directions in the second portrait image, the x and y directions are supervised separately, resulting in better performance.

[0022] (5) By adding perceptual loss to the training part of the CNN model, the prediction results of the optical flow are closer to the target image, thus improving the prediction accuracy.

[0023] (6) The preceding beautified optical flow is amplified to obtain a higher-resolution subsequent beautified optical flow, and, combined with the preceding feature map, the detail information required at the higher resolution is predicted. The resulting current beautified optical flow facilitates subsequent supervision of the optical flow prediction results at different resolutions.

Attached Figure Description

[0024] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this invention, illustrate exemplary embodiments of the invention and are used to explain the invention, but do not constitute an undue limitation of the invention. In the drawings:

[0025] Figure 1 is a simplified flowchart of the CNN-based portrait beautification method of the present invention;

[0026] Figure 2 is a simplified schematic diagram of the area to be filled and the search area in this invention;

[0027] Figure 3 is a simplified schematic diagram of the optical flow super-resolution unit structure of the present invention.

Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0029] As shown in Figure 1, the present invention provides a CNN-based portrait beautification method, which includes the following steps: acquiring an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask; inputting the second portrait image into a CNN model to perform beautification optical flow prediction to obtain beautification optical flow; based on the beautification optical flow, the second portrait image, and the second portrait mask, performing deformation region prediction to obtain deformation regions; segmenting the deformation regions to obtain regions to be filled; and performing image restoration on the deformation regions based on the regions to be filled to obtain the portrait beautification result.

[0030] This invention obtains a second portrait image through preprocessing, which facilitates separate processing and effectively avoids the influence of portrait beautification on the texture around the portrait. Furthermore, by predicting the deformation area, the area to be filled is obtained, and image restoration is performed on the deformation area. Only the blank content of the deformation area needs to be repaired, resulting in higher realism, more natural portrait beautification results, and good robustness.

[0031] In this embodiment, acquiring and preprocessing the image to be beautified to obtain a second portrait image and a second portrait mask specifically includes: acquiring the image to be beautified and predicting the portrait region in the image to be beautified based on a portrait segmentation algorithm to obtain a first portrait mask; calculating the bounding rectangle of the portrait based on the first portrait mask and expanding the width and height of the bounding rectangle based on a preset ratio to obtain a cropping rectangle of the portrait; cropping the image to be beautified and the first portrait mask according to the cropping rectangle of the portrait to obtain a first portrait image and a second portrait mask respectively; and downsampling the first portrait image to obtain a second portrait image of a preset size. Preferably, the preset ratio can be a default ratio, or it can be set or changed according to actual needs.
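The preprocessing in this embodiment can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the symmetric way the rectangle is expanded, the default ratio, and the nearest-neighbour downsampling are all assumptions made for illustration, and the portrait segmentation algorithm itself is taken as given (the first portrait mask is an input).

```python
import numpy as np

def crop_rect_from_mask(mask, expand_ratio=0.1):
    """Bounding rectangle of the non-zero mask pixels, expanded by expand_ratio.

    Returns (x0, y0, x1, y1) with exclusive right/bottom edges.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no portrait pixels")
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    # Assumption: width and height are expanded symmetrically about the box.
    dx = int(round((x1 - x0) * expand_ratio / 2))
    dy = int(round((y1 - y0) * expand_ratio / 2))
    h, w = mask.shape
    # Clamp the expanded rectangle to the image bounds.
    return max(x0 - dx, 0), max(y0 - dy, 0), min(x1 + dx, w), min(y1 + dy, h)

def downsample_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (a stand-in for any downsampling method)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows][:, cols]

def preprocess(image, first_mask, expand_ratio=0.1, size=(256, 192)):
    """Crop image and mask to the portrait, then downsample to size=(H_in, W_in)."""
    x0, y0, x1, y1 = crop_rect_from_mask(first_mask, expand_ratio)
    first_portrait = image[y0:y1, x0:x1]
    second_mask = first_mask[y0:y1, x0:x1]
    second_portrait = downsample_nearest(first_portrait, size[0], size[1])
    return first_portrait, second_mask, second_portrait, (x0, y0, x1, y1)
```

The cropping rectangle is returned alongside the images because later steps need it to place results back into the full image.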

[0032] In this embodiment, the preset size is $W_{in} \times H_{in}$, where $W_{in}$ represents the width of the second portrait image and $H_{in}$ represents the height of the second portrait image.

[0033] In this embodiment, deformation region prediction is performed based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain the deformation region. Specifically, this includes: enlarging the beautification optical flow to the resolution of the first portrait image to obtain an enlarged result; applying the enlarged result to the first portrait image and the second portrait mask respectively to obtain the first beautification image and the second beautification mask respectively; removing the portrait from the image to be beautified based on the first portrait mask to obtain an image without a portrait; placing the first beautification image into the image without a portrait based on the second beautification mask to obtain the second beautification image; calculating the difference between the second portrait mask and the second beautification mask, and taking the part with a difference greater than a preset difference as the deformation region; placing the deformation region into the second beautification image based on the cropping rectangle, and filling the area to be filled with 0.

[0034] Preferably, the preset difference value is 0.
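The deformation region step can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the flow is stored as per-pixel (dx, dy) offsets and applied by backward warping with nearest-neighbour sampling, the mask difference is taken as the signed difference "original minus beautified", and all function names are hypothetical. The source does not specify the warping convention.

```python
import numpy as np

def warp_nearest(img, flow):
    """Backward-warp img: output[y, x] = img[y + dy, x + dx] (assumed convention)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

def deformation_region(second_mask, beautified_mask, preset_difference=0.0):
    """Pixels covered by the original mask but no longer by the beautified mask."""
    diff = second_mask.astype(float) - beautified_mask.astype(float)
    return diff > preset_difference

def blank_region(image, region):
    """Return a copy of image with the region to be filled set to 0."""
    out = image.copy()
    out[region] = 0
    return out
```

With the preferred preset difference of 0, the region is exactly the set of pixels that the portrait vacated, which are the holes the later filling steps must repair.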

[0035] Existing optical flow-based portrait beautification methods directly apply optical flow to the original image. Therefore, the portrait beautification process affects the surrounding pixels, causing the surrounding texture to deform and resulting in a "spatial distortion" problem in the final effect, which looks unrealistic.

[0036] This invention first uses a portrait mask to extract the portrait from the image to be reshaped and process it separately, avoiding the impact of portrait reshaping on the surrounding texture. Then, addressing the issue of pixel holes appearing in the original portrait area after reshaping, this invention locates these hole areas by comparing a second portrait mask with a second reshaping mask, facilitating the determination of the areas to be filled, and then filling them using image inpainting techniques.

[0037] Compared to existing deep learning-based image inpainting or image generation methods, this invention can directly use high-resolution images without worrying about blurring of the background / filled-in areas during image inpainting, while keeping the computational load under control.

[0038] Furthermore, compared to existing image restoration techniques, portrait beautification only requires repairing the missing content in the deformation area, which is very close to the pixels surrounding the original portrait. This invention makes use of this principle of locality, as follows. Figure 2 is a simplified schematic of the area to be filled and the search area of the present invention. In the diagram, the dark gray area represents the background area of the image to be beautified, the white area represents the second portrait beautification mask, and the light gray area represents the difference between the background area and the second portrait beautification mask, i.e., the deformation area. In this embodiment, the deformation area in Figure 2 (light gray part) is segmented to obtain the area to be filled, and image restoration is performed on the deformation area based on the area to be filled to obtain the portrait beautification result, which includes the following steps:

[0039] Step a. Divide the deformation region into several square regions to be filled with side length $w_e$, each represented by a Patch. For example, in the area indicated by the arrow in Figure 2, all the square grid cells that overlap with the deformation region are Patches. From all the Patches, select one containing background of the image to be beautified as the current Patch to fill (i.e., the bolded small square in Figure 2);

[0040] Step b. Calculate the background region in the current patch based on the second portrait beautification mask, and denote it as B. Then calculate the texture gradient of B using the following formula:

[0041] $G_x(x,y) = p(x+1,y) - p(x-1,y)$,

[0042] $G_y(x,y) = p(x,y+1) - p(x,y-1)$,

[0043]

[0044]

[0045] Among them, $[G_x(x,y),\ G_y(x,y)]$ represents the magnitude of the gradient at coordinates (x, y), $p(x+1,y)$ represents the intensity at point (x+1, y), $p(x-1,y)$ represents the intensity at point (x-1, y), $p(x,y+1)$ represents the intensity at point (x, y+1), $p(x,y-1)$ represents the intensity at point (x, y-1), and $\alpha(x,y)$ represents the gradient direction at coordinates (x, y).
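As an illustration of step b, the texture gradient can be computed with central differences in x and y, following the intensity terms defined above. The formulas for the magnitude and direction are not reproduced in this text, so the sketch below assumes the standard definitions (Euclidean norm and `arctan2`); leaving the border pixels at 0 is a further assumption, and the function name is hypothetical.

```python
import numpy as np

def texture_gradient(p):
    """Central-difference gradient of a 2-D intensity array indexed as p[y, x].

    Returns (gx, gy, magnitude, direction). Border pixels are left at 0.
    """
    p = p.astype(float)
    gx = np.zeros_like(p)
    gy = np.zeros_like(p)
    gx[:, 1:-1] = p[:, 2:] - p[:, :-2]   # p(x+1, y) - p(x-1, y)
    gy[1:-1, :] = p[2:, :] - p[:-2, :]   # p(x, y+1) - p(x, y-1)
    magnitude = np.hypot(gx, gy)         # assumed: sqrt(gx^2 + gy^2)
    direction = np.arctan2(gy, gx)       # assumed: angle of (gx, gy)
    return gx, gy, magnitude, direction
```

For a horizontal ramp such as `np.tile(np.arange(5), (5, 1))`, the interior pixels have `gx = 2`, `gy = 0`, and direction 0.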

[0046] Step c. Using the top-left corner $(x_e, y_e)$ of the current Patch as a reference point, traverse the rectangular search region $x \in [x_e - R,\ x_e + R + w_e]$, $y \in [y_e - R,\ y_e + R + w_e]$, excluding the current Patch, where R is the search range, and cut several squares with side length $w_e$ out of the search region (i.e., all small squares in the search area except the current Patch), treating each square as a candidate region;

[0047] Step d. Calculate the background region in each candidate region and denote it as B', then calculate the texture gradient of B';

[0048] Step e. Calculate the difference between the background region B' in each candidate region and the background region B in the current patch, and select the candidate region with the smallest difference to fill it into the target filling region. The target filling region is the part of the image to be beautified in the current patch other than the background.

[0049] Specifically, the gradient $G_t$ of B in the current Patch and the gradient $G_c$ of B' in the candidate region are calculated according to the formula in step b. Furthermore, the gradient direction of the current Patch is denoted $\alpha_t$ and the gradient direction of the candidate region is denoted $\alpha_c$. The difference D between the candidate region and the current Patch can be calculated using the following formula:

[0050]

[0051]

[0052]

[0053] $D = \lambda_1 D_G + \lambda_2 D_\alpha + \lambda_3 D_p$

[0054] where the terms in the formulas above denote the pixel value of the current Patch at (x, y), the pixel value of the candidate Patch at (x, y), and a sign (indicator) function that is 1 for background regions (B or B') contained within the Patch and 0 otherwise. $\lambda_1$, $\lambda_2$, and $\lambda_3$ control the proportions of the different parts: $\lambda_1$ is the weight parameter of $D_G$, with $\lambda_1 \ge 0$; $\lambda_2$ is the weight parameter of $D_\alpha$, with $\lambda_2 \ge 0$; and $\lambda_3$ is the weight parameter of $D_p$, with $\lambda_3 \ge 0$.

[0055] Step f. Repeat steps b to e until all deformation regions are filled to obtain the portrait beautification result.
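Steps b to e for a single Patch can be sketched as below. This is a hedged illustration, not the patented procedure: the individual formulas for $D_G$, $D_\alpha$, and $D_p$ are not reproduced in this text, so the sketch assumes each is a mean absolute difference taken over the background pixels B of the current Patch. It also assumes a single-channel float image, considers only candidates that are fully known, and scans candidates at every pixel offset. The names `fill_patch` and `known` are hypothetical.

```python
import numpy as np

def gradients(p):
    """Central-difference gradient magnitude and direction (borders left at 0)."""
    gx = np.zeros_like(p)
    gy = np.zeros_like(p)
    gx[:, 1:-1] = p[:, 2:] - p[:, :-2]
    gy[1:-1, :] = p[2:, :] - p[:-2, :]
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def fill_patch(image, known, xe, ye, w, R, lambdas=(1.0, 1.0, 1.0)):
    """Fill the unknown pixels of the patch at (xe, ye) from the best candidate.

    image: 2-D float array, modified in place. known: bool array, True where
    valid background exists. Returns the chosen candidate's top-left (x, y),
    or None if no candidate qualified.
    """
    h, width = image.shape
    mag, ang = gradients(image)
    cur = (slice(ye, ye + w), slice(xe, xe + w))
    b = known[cur].copy()                      # background B of the current patch
    if not b.any():
        raise ValueError("current patch must contain some background")
    best, best_d = None, np.inf
    for y in range(max(ye - R, 0), min(ye + R, h - w) + 1):
        for x in range(max(xe - R, 0), min(xe + R, width - w) + 1):
            cand = (slice(y, y + w), slice(x, x + w))
            # Assumption: skip the patch itself and any candidate with holes.
            if (x, y) == (xe, ye) or not known[cand].all():
                continue
            d_p = np.abs(image[cur] - image[cand])[b].mean()
            d_g = np.abs(mag[cur] - mag[cand])[b].mean()
            # Wrap the angle difference into [-pi, pi] before taking |.|.
            d_a = np.abs(np.angle(np.exp(1j * (ang[cur] - ang[cand]))))[b].mean()
            d = lambdas[0] * d_g + lambdas[1] * d_a + lambdas[2] * d_p
            if d < best_d:
                best, best_d = (x, y), d
    if best is not None:
        src = image[best[1]:best[1] + w, best[0]:best[0] + w].copy()
        patch = image[cur]                     # view into image
        patch[~b] = src[~b]                    # fill only the non-background part
        known[cur] = True
    return best
```

Step f would call `fill_patch` repeatedly, each time choosing a Patch that already contains some known background, so that filling spreads outward from the background into the hole.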

[0056] In simple terms, the deformation area filling described in this invention is performed step by step in multiple stages. By searching the area, a suitable image (i.e., the candidate area with the smallest difference) is searched from the background each time and filled into the target filling area. The target filling area is the part of the image to be reshaped in the current patch that is outside the background. The search process is to find an image block in the background near this rectangular search area that connects most naturally with O and paste it into the target filling area.

[0057] This invention searches for matching pixel blocks within an image search region and fills the target filling region patch by patch. Each patch is a square area with side length $w_e$, and filling spreads from patches that contain background to patches that do not, making the filled texture blend more naturally with the surrounding texture and further improving the realism of the portrait beautification result.

[0058] Human figure reshaping is concentrated in areas such as the arms, legs, shoulders, neck, and waist. The specific areas requiring reshaping become even more concentrated for different portraits. Therefore, focusing the model's attention on these areas can achieve better results than full-image supervision. To address this characteristic of human figure reshaping, this invention innovates the model's loss function, abandoning the existing method of full-image evaluation. In addition to predicting optical flow, it further predicts the regions where deformation occurs, allowing the optical flow-related branch to focus on optical flow prediction in the deformation regions.

[0059] Specifically, in this embodiment, the CNN model has four output channels, namely the optical flow map $(f_x, f_y)$ and the attention heatmap $(h_x, h_y)$. During the training phase, the CNN model uses a multi-scale loss to supervise the different resolutions of the model separately, gradually approaching the preset target from low resolution to high resolution. The loss is calculated only for the parts of the optical flow map and attention heatmap that are greater than 0. Because different body parts in the second portrait image deform differently in the x and y directions, the x and y directions are supervised separately.

[0060] In this embodiment, the multi-scale loss specifically refers to the use of losses at N scales, with the optical flow loss at the k-th scale, $L_{of}^{k}$, given by:

[0061]

[0062] where:

[0063]

[0064]

[0065] For ease of description, * is used below to represent x (or y). The terms above are the optical flow loss in the * direction at the current scale, H is the height of the corresponding output, and W is the width of the corresponding output; the remaining symbols denote the target optical flow, the predicted optical flow, and the target heatmap;

[0066] The heatmap loss at the k-th scale, $L_{heat}^{k}$, is:

[0067]

[0068] where:

[0069]

[0070]

[0071] where the additional symbol denotes the predicted heatmap.

[0072] Finally, the optical flow loss $L_{of}$ over the N scales is the average of the per-scale losses: $L_{of} = \frac{1}{N}\sum_{k=1}^{N} L_{of}^{k}$

[0073] The final optical flow heatmap loss $L_{heat}$ over the N scales is likewise the average: $L_{heat} = \frac{1}{N}\sum_{k=1}^{N} L_{heat}^{k}$

[0074] In addition to predicting the optical flow of the model, this invention further predicts the region where deformation occurs (i.e., the deformation zone), allowing the optical flow-related branches to focus on the optical flow prediction of the deformation region. Compared with the full-image evaluation method in the prior art, this invention innovates the loss function of the CNN model by using multi-scale loss to supervise the model separately at different resolutions, gradually approaching the preset target from low resolution to high resolution. Moreover, the model only calculates the loss for the parts of the optical flow map and attention heatmap that are greater than 0. In addition, for the differences in deformation of different body parts in the x and y directions in the second portrait image, the x and y directions are supervised separately, resulting in better performance.
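The structure of the multi-scale optical flow loss can be sketched as follows. The per-scale formulas are not reproduced in this text, so the concrete form here is an assumption: an absolute (L1) error between predicted and target flow, counted only where the target heatmap is greater than 0, normalised by H × W, computed separately for x and y and summed per scale. Only the final averaging over N scales is taken directly from the description. All names are hypothetical.

```python
import numpy as np

def directional_flow_loss(pred_flow, target_flow, target_heat):
    """Assumed per-direction loss: masked L1 error divided by H * W."""
    h, w = target_flow.shape
    mask = target_heat > 0                     # supervise only the deformed area
    return float(np.sum(np.abs(pred_flow - target_flow) * mask) / (h * w))

def multi_scale_flow_loss(preds, targets, heats):
    """Average the per-scale losses over N scales.

    Each list holds one (H_k, W_k, 2) array per scale; channel 0 is the
    x direction and channel 1 is the y direction.
    """
    per_scale = []
    for p, t, m in zip(preds, targets, heats):
        loss_x = directional_flow_loss(p[..., 0], t[..., 0], m[..., 0])
        loss_y = directional_flow_loss(p[..., 1], t[..., 1], m[..., 1])
        per_scale.append(loss_x + loss_y)      # x and y supervised separately
    return sum(per_scale) / len(per_scale)
```

Masking with the target heatmap is what lets the flow branch ignore the large parts of the image where no deformation is wanted.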

[0075] In this embodiment, the loss during the training of the CNN model is:

[0076] $Loss = \gamma_1 L_{of} + \gamma_2 L_{heat} + \gamma_3 L_{pec}$,

[0077]

[0078] Among them, $L_{pec}$ is the perceptual loss, M is the number of layers selected from the VGG model, H is the height of the corresponding output, W is the width of the corresponding output, G is the depth of the corresponding output, $VGG_k(\cdot)$ represents the output of the k-th layer of the VGG model, X represents the image to be beautified, Y represents the beautification result, $\varphi(\cdot)$ is the optical flow transformation function, $\gamma_1$ is the weight of the optical flow loss $L_{of}$, $\gamma_2$ is the weight of the optical flow heatmap loss $L_{heat}$, and $\gamma_3$ is the weight of the perceptual loss $L_{pec}$, with $\gamma_1 \ge 0$, $\gamma_2 \ge 0$, $\gamma_3 \ge 0$.
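The weighted combination of the three loss terms is simple enough to state as code. The sketch below assumes the three terms have already been computed as scalars; the function name and the choice to reject negative weights with an exception are illustrative only.

```python
def total_loss(l_of, l_heat, l_pec, gammas=(1.0, 1.0, 1.0)):
    """Loss = gamma1 * L_of + gamma2 * L_heat + gamma3 * L_pec, all gammas >= 0."""
    if any(g < 0 for g in gammas):
        raise ValueError("loss weights must be non-negative")
    g1, g2, g3 = gammas
    return g1 * l_of + g2 * l_heat + g3 * l_pec
```

Setting a weight to 0 disables the corresponding term, for example `gammas=(1.0, 0.5, 0.0)` trains without the perceptual loss.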

[0079] The present invention incorporates a perceptual loss into the training of the CNN model, so that the result produced by the predicted optical flow more closely resembles the target image, which improves prediction accuracy.

[0080] In this embodiment, the CNN model adopts a U-shaped network based on encoder-decoder, wherein the decoder progressively enlarges the resolution, with each resolution being a scale. Therefore, the CNN model described in this invention can output optical flow predictions at multiple scales (i.e., the present invention uses N scale losses as described above).

[0081] Different people need beautification in different areas of their portraits: some mainly need leg slimming, others waist slimming, and so on. These differences show up as sparsity in the optical flow distribution; that is, only some areas of an image have optical flow.

[0082] Moreover, optical flow is essentially a per-pixel offset signal and is therefore scalable. In terms of model design, this invention accordingly adopts a U-shaped network based on encoder-decoder and exploits these characteristics of portrait beautification optical flow to design an "optical flow super-resolution decoder". The decoder is composed of stacked optical flow super-resolution units, which enables the model to capture the above differences accurately.

[0083] Figure 3 is a simplified schematic of the structure of the optical flow super-resolution unit of this invention. Each optical flow super-resolution unit is in effect a small image super-resolution module, whose main function is to magnify the optical flow predicted by the previous stage and supplement it with more detailed information.

[0084] The inputs to the optical flow super-resolution unit are the preceding beautification optical flow and the preceding feature map, and its outputs are the subsequent feature map and the current beautification optical flow. Since the optical flow super-resolution unit is a component of the decoder, feature extraction is performed in the encoder. Therefore, for the initial optical flow super-resolution unit (i.e., the unit directly connected to the encoder), the preceding feature map at its input comes from the encoder, and the preceding beautification optical flow at its input is all 0.

[0085] Each optical flow super-resolution unit includes at least a feature super-resolution module and a convolutional structure corresponding to the feature super-resolution module. The processing performed by the unit, that is, the step of inputting the second portrait image into the CNN model for beautification optical flow prediction, includes at least the following. The preceding feature map is input into the feature super-resolution module for first processing to obtain the subsequent feature map, which is passed on to the data stream of the next layer; preferably, the first processing includes at least stacked convolution, downsampling convolution, magnification convolution, and channel stacking. The preceding feature map is obtained either from the feature extraction performed by the encoder or from the preceding optical flow super-resolution unit. Based on the subsequent feature map, difference prediction is performed through a set of stacked convolutional structures to obtain the optical flow difference at the current scale. The preceding beautification optical flow is input into the feature super-resolution module for optical flow magnification, and the magnified optical flow is compensated with the optical flow difference obtained from the difference prediction to obtain the current beautification optical flow.

[0086] Specifically, the preceding optical flow map is upscaled to obtain a higher-resolution optical flow map. Because the resolution of the preceding optical flow map is lower than that of the current layer, upscaling it directly would blur it (much as directly enlarging an image does). The current layer therefore predicts, from the preceding feature map, the difference that must be added during upscaling (i.e., difference prediction) and superimposes this prediction onto the upscaled optical flow map. Starting from a low-resolution optical flow map, the CNN model of this invention adds detail step by step (the preceding feature map is used to predict the detail needed at the higher resolution) until it obtains an optical flow map with the same resolution as the model input. The current optical flow output at each layer is what is supervised at the corresponding resolution.
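The "upscale, then add the predicted difference" step can be sketched in NumPy as below. This is only an illustration under stated assumptions: nearest-neighbour upscaling stands in for the learned magnification, the scale factor of 2 and the multiplication of offsets by that factor are assumptions of the sketch, the predicted difference is passed in as an array rather than produced by stacked convolutions, and the names `upscale_flow` and `flow_sr_step` are hypothetical.

```python
import numpy as np

def upscale_flow(flow, factor=2):
    """Nearest-neighbour upscaling of a flow map shaped (2, H, W).

    Assumption: offsets are multiplied by `factor`, because a displacement
    measured in pixels grows with the resolution of the map.
    """
    up = flow.repeat(factor, axis=1).repeat(factor, axis=2)
    return up * factor

def flow_sr_step(prev_flow, predicted_difference, factor=2):
    """Current flow = upscaled preceding flow + difference predicted at this scale."""
    up = upscale_flow(prev_flow, factor)
    if up.shape != predicted_difference.shape:
        raise ValueError("difference must match the upscaled flow shape")
    return up + predicted_difference

# The initial unit starts from an all-zero flow, as stated in [0084];
# the differences below are stand-ins for network outputs.
flow = np.zeros((2, 4, 4))
for size in (8, 16):
    flow = flow_sr_step(flow, np.full((2, size, size), 0.1))
```

After the loop, `flow` has the resolution of the last scale, mirroring how the final unit's output serves as the model's predicted flow.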

[0087] Since the current beautification optical flow is obtained by superimposing the currently predicted difference on the magnified beautification optical flow of the previous stage, the current beautification optical flow output by the optical flow super-resolution unit of the last layer is the beautification optical flow obtained by inputting the second portrait image into the CNN model for beautification optical flow prediction.

[0088] This invention uses a U-shaped encoding / decoding module (feature super-resolution module) to gradually improve the resolution and precision of predicted optical flow.

[0089] Corresponding to the aforementioned CNN-based portrait beautification method, this invention provides a CNN-based portrait beautification system, comprising: an image preprocessing module for acquiring and preprocessing the image to be beautified to obtain a second portrait image and a second portrait mask; a beautification optical flow prediction module for inputting the second portrait image into a CNN model to predict beautification optical flow, thereby obtaining beautification optical flow; a deformation region prediction module for predicting deformation regions based on beautification optical flow, the second portrait image, and the second portrait mask, thereby obtaining deformation regions; and an image inpainting module for performing image inpainting on the deformation regions based on the regions to be filled, thereby obtaining the portrait beautification result.
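Two of the simpler operations behind these modules, the cropping rectangle computed during preprocessing and the mask difference used for deformation region prediction (both spelled out in the claims), can be sketched as follows. The function names, the default `ratio` and `threshold` values, the choice to pad each side by `ratio` times the box size, and the use of a signed difference (mask before minus mask after) are all assumptions made for illustration.

```python
import numpy as np

def crop_rect_from_mask(mask, ratio=0.1):
    """Bounding rectangle of the portrait mask, expanded by `ratio` and clipped.

    Returns (top, left, bottom, right) with bottom and right exclusive.
    Assumption: each side is padded by ratio * box height (or width).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no portrait pixels")
    top, bottom = int(ys.min()), int(ys.max()) + 1
    left, right = int(xs.min()), int(xs.max()) + 1
    pad_y = int(round((bottom - top) * ratio))
    pad_x = int(round((right - left) * ratio))
    h, w = mask.shape
    return (max(0, top - pad_y), max(0, left - pad_x),
            min(h, bottom + pad_y), min(w, right + pad_x))

def deformation_region(mask_before, mask_after, threshold=0.5):
    """Pixels where the mask difference exceeds a preset threshold.

    Assumption: the difference is taken as before minus after, so the result
    marks pixels the portrait occupied before warping but not afterwards.
    """
    diff = mask_before.astype(float) - mask_after.astype(float)
    return diff > threshold
```

A variant using the absolute difference would also flag pixels newly covered by the portrait; which of the two is intended is not stated in this text.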

[0090] This invention also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the memory described in the above embodiments, or a standalone computer-readable storage medium not assembled into a device. The computer-readable storage medium stores at least one instruction, which is loaded and executed by a processor to implement the CNN-based portrait beautification method illustrated in Figure 1. The computer-readable storage medium may be a read-only memory, a hard disk, an optical disk, or the like.

[0091] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the device embodiments, equipment embodiments, and storage medium embodiments, since they are basically similar to the method embodiments, the descriptions are relatively simple, and relevant parts can be referred to the descriptions in the method embodiments.

[0092] Furthermore, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0093] The foregoing description illustrates and describes preferred embodiments of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept by means of the foregoing teachings or techniques or knowledge in related fields. Any modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.

Claims

1. A CNN-based portrait beautification method, characterized in that it comprises the following steps:
acquiring an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask, which specifically comprises: acquiring the image to be beautified and predicting the portrait region in the image based on a portrait segmentation algorithm to obtain a first portrait mask; calculating the bounding rectangle of the portrait based on the first portrait mask and expanding its width and height according to a preset ratio to obtain a cropping rectangle for the portrait; cropping the image to be beautified and the first portrait mask based on the cropping rectangle to obtain a first portrait image and the second portrait mask, respectively; and downsampling the first portrait image to obtain the second portrait image of a preset size;
inputting the second portrait image into a CNN model for beautification optical flow prediction to obtain a beautification optical flow;
performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain a deformation region;
cutting the deformation region to obtain regions to be filled;
performing image inpainting on the deformation region based on the regions to be filled to obtain a portrait beautification result;
wherein cutting the deformation region to obtain the regions to be filled, and performing image inpainting on the deformation region based on the regions to be filled to obtain the portrait beautification result, comprises the following steps:
Step a. Divide the deformation region into several square regions to be filled with a preset side length, each region to be filled being denoted as a Patch; from all Patches, select one Patch that contains background of the image to be beautified as the current Patch to be filled;
Step b. Calculate the background region in the current Patch based on the second portrait beautification mask and denote it as B, then calculate the texture gradient of B;
Step c. Taking the top-left corner of the current Patch as a reference point, traverse a rectangular search region that does not contain the current Patch, the extent of the search region being set by the search-range parameters; cut the search region into several squares with a preset side length, each square being a candidate region;
Step d. Calculate the background region in each candidate region and denote it as B', then calculate the texture gradient of B';
Step e. Calculate the difference between the background region B' in each candidate region and the background region B in the current Patch, select the candidate region with the smallest difference, and fill it into the target filling region, the target filling region being the part of the current Patch other than the background of the image to be beautified;
Step f. Repeat steps b to e until all deformation regions are filled, obtaining the portrait beautification result;
wherein the CNN model has four output channels, namely the optical flow maps and the attention heatmaps in the x and y directions; in the training phase, the CNN model uses a multi-scale loss to supervise the model separately at different resolutions, approaching the preset target step by step from low resolution to high resolution; the model calculates the loss only for the parts of the optical flow map and attention heatmap that are greater than 0; and, to account for the differences in deformation of different body parts in the x and y directions in the second portrait image, the x and y directions are supervised separately.

2. The CNN-based portrait beautification method according to claim 1, characterized in that performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain the deformation region specifically comprises: enlarging the beautification optical flow to the resolution of the first portrait image to obtain an enlarged result; applying the enlarged result to the first portrait image and the second portrait mask respectively to obtain a first portrait beautification image and a second portrait beautification mask; removing the portrait from the image to be beautified based on the first portrait mask to obtain a portrait-free image; placing the first portrait beautification image into the portrait-free image based on the second portrait beautification mask to obtain a second portrait beautification image; calculating the difference between the second portrait mask and the second portrait beautification mask, and taking the part where the difference is greater than a preset difference as the deformation region; and, based on the cropping rectangle, placing the deformation region into the second portrait beautification image and filling the region to be filled with 0.

3. The CNN-based portrait beautification method according to claim 1, characterized in that the multi-scale loss specifically refers to the use of losses at N scales; the optical flow loss at the k-th scale is defined from the optical flow loss in the x direction at the current scale and the optical flow loss in the y direction at the current scale, and the final optical flow loss is the average of the optical flow losses over the N scales; the heatmap loss at the k-th scale is defined from the heatmap loss in the x direction at the current scale and the heatmap loss in the y direction at the current scale, and the final optical flow heatmap loss is the average of the heatmap losses over the N scales.

4. The CNN-based portrait beautification method according to claim 3, characterized in that the loss during training of the CNN model is Loss = γ1·L_of + γ2·L_heat + γ3·L_pec; where L_pec is the perceptual loss, M is the number of layers selected from the VGG model, H is the height of the corresponding output, W is the width of the corresponding output, G_k(...) obtains the output of the k-th layer of the VGG model, X denotes the image to be beautified, Y denotes the portrait beautification result, φ(...) is the optical flow transformation function, γ1 is the weight of the optical flow loss, γ2 is the weight of the optical flow heatmap loss, and γ3 is the weight of the perceptual loss, with γ1≥0, γ2≥0, γ3≥0.

5. A CNN-based portrait beautification method, characterized in that it comprises the following steps:
acquiring an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask, which specifically comprises: acquiring the image to be beautified and predicting the portrait region in the image based on a portrait segmentation algorithm to obtain a first portrait mask; calculating the bounding rectangle of the portrait based on the first portrait mask and expanding its width and height according to a preset ratio to obtain a cropping rectangle for the portrait; cropping the image to be beautified and the first portrait mask based on the cropping rectangle to obtain a first portrait image and the second portrait mask, respectively; and downsampling the first portrait image to obtain the second portrait image of a preset size;
inputting the second portrait image into a CNN model for beautification optical flow prediction to obtain a beautification optical flow;
performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain a deformation region;
cutting the deformation region to obtain regions to be filled;
performing image inpainting on the deformation region based on the regions to be filled to obtain a portrait beautification result;
wherein cutting the deformation region to obtain the regions to be filled, and performing image inpainting on the deformation region based on the regions to be filled to obtain the portrait beautification result, comprises the following steps:
Step a. Divide the deformation region into several square regions to be filled with a preset side length, each region to be filled being denoted as a Patch; from all Patches, select one Patch that contains background of the image to be beautified as the current Patch to be filled;
Step b. Calculate the background region in the current Patch based on the second portrait beautification mask and denote it as B, then calculate the texture gradient of B;
Step c. Taking the top-left corner of the current Patch as a reference point, traverse a rectangular search region that does not contain the current Patch, the extent of the search region being set by the search-range parameters; cut the search region into several squares with a preset side length, each square being a candidate region;
Step d. Calculate the background region in each candidate region and denote it as B', then calculate the texture gradient of B';
Step e. Calculate the difference between the background region B' in each candidate region and the background region B in the current Patch, select the candidate region with the smallest difference, and fill it into the target filling region, the target filling region being the part of the current Patch other than the background of the image to be beautified;
Step f. Repeat steps b to e until all deformation regions are filled, obtaining the portrait beautification result;
wherein the CNN model uses a U-shaped network based on encoder-decoder; the decoder is composed of stacked optical flow super-resolution units; each optical flow super-resolution unit includes at least a feature super-resolution module and a convolutional structure corresponding to the feature super-resolution module; and the processing performed by the optical flow super-resolution unit includes at least the following:
inputting the preceding feature map into the feature super-resolution module for first processing to obtain the subsequent feature map, which is passed on to the data stream of the next layer, the first processing including at least stacked convolution, downsampling convolution, magnification convolution, and channel stacking, and the preceding feature map being obtained either from the feature extraction performed by the encoder or from the optical flow super-resolution unit preceding the current optical flow super-resolution unit;
performing, based on the subsequent feature map, difference prediction through a set of stacked convolutional structures to obtain the optical flow difference at the current scale;
inputting the preceding beautification optical flow into the feature super-resolution module for optical flow magnification, and compensating the magnified optical flow with the optical flow difference obtained from the difference prediction to obtain the current beautification optical flow.

6. The CNN-based portrait beautification method according to claim 5, characterized in that performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain the deformation region specifically comprises: enlarging the beautification optical flow to the resolution of the first portrait image to obtain an enlarged result; applying the enlarged result to the first portrait image and the second portrait mask respectively to obtain a first portrait beautification image and a second portrait beautification mask; removing the portrait from the image to be beautified based on the first portrait mask to obtain a portrait-free image; placing the first portrait beautification image into the portrait-free image based on the second portrait beautification mask to obtain a second portrait beautification image; calculating the difference between the second portrait mask and the second portrait beautification mask, and taking the part where the difference is greater than a preset difference as the deformation region; and, based on the cropping rectangle, placing the deformation region into the second portrait beautification image and filling the region to be filled with 0.

7. A CNN-based portrait beautification system, using the CNN-based portrait beautification method according to any one of claims 1-6, characterized in that it comprises: an image preprocessing module for acquiring an image to be beautified and preprocessing it to obtain a second portrait image and a second portrait mask; a beautification optical flow prediction module for inputting the second portrait image into the CNN model for beautification optical flow prediction to obtain a beautification optical flow; a deformation region prediction module for performing deformation region prediction based on the beautification optical flow, the second portrait image, and the second portrait mask to obtain a deformation region; and an image inpainting module for cutting the deformation region to obtain regions to be filled and performing image inpainting on the deformation region based on the regions to be filled to obtain the portrait beautification result.

8. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a CNN-based portrait beautification program, which, when executed by a processor, implements the steps of the CNN-based portrait beautification method as described in any one of claims 1 to 6.