Method and device for obtaining opacity map in portrait matting process
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUAZHONG UNIV OF SCI & TECH
- Filing Date
- 2024-03-05
- Publication Date
- 2026-06-23
Figure CN118229724B_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of image processing technology and, more specifically, relates to a method and apparatus for obtaining an opacity map during portrait matting.
Background Technology
[0002] Portrait matting, which separates the foreground image of a person from the background image to obtain its corresponding opacity map, is widely used in film and television production, post-production photography, advertising design, education, and training. Commonly used portrait matting models are primarily deep learning models, whose learning and generalization abilities heavily depend on the quality and size of the dataset. The design of the annotation tool directly affects the quality of the dataset and the efficiency of the annotators; therefore, designing an efficient and user-friendly annotation tool is crucial.
[0003] The annotation produced by such a tool can be an opacity map. Currently, there are two common methods for obtaining the ground truth (referred to herein as baseline real data, i.e., the opacity map) for portrait matting: one is to manually separate the foreground and background of the portrait using the tools and techniques provided by Adobe Photoshop; the other is to precisely control the shooting conditions, keeping the foreground and camera position unchanged while switching between several monochrome backgrounds and taking multiple shots, and then to obtain the baseline real data through triangulation techniques.
[0004] Both of the above methods are complex to operate and demand a high level of expertise from the annotators, making it very difficult to collect baseline real data for portrait matting.
Summary of the Invention
[0005] In view of the shortcomings of related technologies, the purpose of this application is to provide a method and apparatus for obtaining opacity maps during portrait cutout, aiming to solve the problem of high difficulty in collecting real reference data for portrait cutout.
[0006] In a first aspect, embodiments of this application provide a method for obtaining an opacity map during portrait cutout, including:
[0007] Obtain the original image to be cut out;
[0008] Perform portrait segmentation and binarization on the original image to obtain the first binary mask; perform hair segmentation and binarization on the original image to obtain the second binary mask;
[0009] Perform adaptive morphological processing on the first and second binary masks to obtain the initial trimap (a three-region image, also called a tripartite image) of the portrait foreground;
[0010] Execute an iterative process, which includes: using the initial trimap or the trimap obtained in the previous iteration as prior information, inputting it together with the original image into the trained portrait matting network model to obtain the opacity map output by the portrait matting network model; segmenting and binarizing the opacity map to obtain the third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap of the current iteration;
[0011] When the opacity map output by the portrait matting network model meets the baseline real data standard, or when the maximum number of iterations is reached, output the opacity map of the last iteration.
[0012] In some embodiments, the portrait matting network model includes: an encoder module, a decoder module, a bridging block embedded between the encoder module and the decoder module, and a propagation refinement module;
[0013] Obtain the opacity map output by the portrait matting network model, including:
[0014] Prior information and the original image are input into the encoder for feature extraction to obtain the first output, which includes the extracted low-level detail features and high-level semantic features.
[0015] The first output is input to the bridging block, which links the encoder module and the decoder module through a skip connection, and the second output is obtained.
[0016] The first and second outputs are concatenated and input to the decoder for upsampling, and the initial opacity map is output.
[0017] The original image and the initial opacity map are input into the propagation refinement module for iterative refinement, and the refined opacity map is output.
[0018] In some embodiments, the propagation refinement module includes at least two propagation units, each of which includes two ResBlock sub-units and one convolutional LSTM sub-unit.
[0019] In some embodiments, the method further includes:
[0020] Receive a first input from the user, which is used to fine-tune the trimap serving as prior information; and / or,
[0021] Receive a second input from the user, which is used to fine-tune the opacity map output by the portrait matting network model.
[0022] In some embodiments, the joint training loss of the portrait matting network model includes:
[0023] Prediction loss is used to characterize the absolute difference between the true opacity map and the predicted opacity in uncertain regions.
[0024] Laplacian loss is used to characterize the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity in uncertain regions.
[0025] Synthesis loss is used to characterize the absolute difference between the original image and the synthesized image, which is generated based on predicted opacity, foreground image, and background image.
[0026] In some embodiments, adaptive morphological processing includes:
[0027] A distance map is obtained by performing a distance transformation on the target binary mask, and the maximum value of the distance map is used as the human image size parameter; the target binary mask can be a first binary mask, a second binary mask, or a third binary mask;
[0028] Adaptive parameters are obtained based on human portrait size parameters, including region dilation parameters and erosion parameters;
[0029] Based on adaptive parameters, the target binary mask is dilated and eroded to obtain the fourth binary mask after dilation and erosion.
[0030] The foreground region of the three-part image is determined based on the target binary mask, and the uncertain region of the three-part image is determined based on the target binary mask and the fourth binary mask. The region other than the foreground region and the uncertain region is used as the background region of the three-part image.
[0031] In a second aspect, embodiments of this application also provide a device for obtaining an opacity map during portrait matting, comprising:
[0032] The original image acquisition unit is used to acquire the original image to be cut out;
[0033] The segmentation and binarization processing unit is used to perform portrait segmentation and binarization processing on the original image to obtain a first binary mask; and to perform hair segmentation and binarization processing on the original image to obtain a second binary mask.
[0034] The morphological processing unit is used to perform adaptive morphological processing on the first binary mask and the second binary mask to obtain the initial tripartite image of the portrait foreground.
[0035] The iteration unit is used to execute the iterative process, which includes: using the initial trimap or the trimap obtained in the previous iteration as prior information, inputting it together with the original image into the trained portrait matting network model to obtain the opacity map output by the portrait matting network model; segmenting and binarizing the opacity map to obtain the third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap of the current iteration.
[0036] The opacity map output unit is used to output the opacity map of the last iteration when the opacity map output by the portrait matting network model meets the baseline real data standard or reaches the maximum number of iterations.
[0037] In a third aspect, embodiments of this application provide an electronic device, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor performs the method described in the first aspect or any possible implementation of the first aspect.
[0038] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation thereof.
[0039] In a fifth aspect, embodiments of this application provide a computer program product that, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation thereof.
[0040] The method and apparatus for obtaining opacity maps during portrait matting provided in the embodiments of this application perform portrait segmentation and hair segmentation, each followed by binarization, on the original image to obtain a first binary mask and a second binary mask. Adaptive morphological processing is then performed on the binary masks to obtain an initial trimap, which provides more accurate prior information. A pre-trained portrait matting network model is then used to obtain the opacity map, iterating until the stopping condition is met, after which the opacity map is output as the baseline real data. This enables automated, accurate, and fast acquisition of the baseline real data.
Attached Figure Description
[0041] To more clearly illustrate the technical solutions in this application or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0042] Figure 1 is the first flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application;
[0043] Figure 2 is the second flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application;
[0044] Figure 3 is the third flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application;
[0045] Figure 4 is a schematic diagram of the structure of the device for obtaining the opacity map during portrait matting provided in the embodiments of this application;
[0046] Figure 5 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0047] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0048] In this article, the term "and / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The symbol " / " in this article indicates that the related objects are in an "or" relationship; for example, A / B means A or B.
[0049] The terms "first" and "second," etc., used in the specification and claims herein are used to distinguish different objects, not to describe a specific order of objects. For example, "first response message" and "second response message," etc., are used to distinguish different response messages, not to describe a specific order of response messages.
[0050] Figure 1 is a flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application. As shown in Figure 1, the method includes at least the following steps:
[0051] S101. Obtain the original image to be cut out.
[0052] S102. Perform portrait segmentation and binarization on the original image to obtain the first binary mask; perform hair segmentation and binarization on the original image to obtain the second binary mask.
[0053] Specifically, image binarization sets the grayscale value of each pixel in an image to either 0 or 255, so that the entire image shows only black and white. The purpose of binarization is to separate the target of interest from the background. In this embodiment, the targets of interest are the portrait region and the hair region in the original image. Portrait segmentation and binarization, as well as hair segmentation and binarization, can be implemented using any general-purpose segmentation network model. In practical applications, the portrait segmentation model and the hair segmentation model can be continuously updated to obtain more accurate initial trimaps.
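As an illustration only, the binarization described above can be sketched in Python with NumPy. The function name `binarize`, the assumption that the segmentation model outputs a soft map in [0, 1], and the threshold of 0.5 are all hypothetical and are not specified in this application.

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Map a soft segmentation map in [0, 1] to a 0/255 uint8 mask.

    The threshold value is an assumption made for this sketch.
    """
    return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)

# A 2x2 soft map standing in for the output of a segmentation model.
soft = np.array([[0.1, 0.7], [0.5, 0.49]])
mask = binarize(soft)
```

The same helper could be applied to the portrait segmentation output and to the hair segmentation output to obtain the first and second binary masks.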
[0054] S103. Perform adaptive morphological processing on the first binary mask and the second binary mask to obtain the initial tripartite image of the portrait foreground.
[0055] Specifically, by performing adaptive morphological processing on the first and second binary masks, each pixel in the original image can be divided into the foreground region, the uncertain region, and the background region to obtain the initial three-part image of the portrait foreground.
[0056] S104. Execute the iterative process, which includes: using the initial tri-image or the tri-image obtained in the previous iteration as prior information, inputting it together with the original image into the trained portrait matting network model to obtain the opacity map output by the portrait matting network model; segmenting and binarizing the opacity map to obtain the target binary mask; performing adaptive morphological processing on the target binary mask to obtain the tri-image of the current iteration.
[0057] Specifically, for the first iteration, the initial three-part image is used as prior information and input together with the original image into the trained portrait matting network model. For subsequent iterations, the three-part image obtained in the previous iteration is used as prior information and input together with the original image into the trained portrait matting network model to obtain the opacity map output by the model in the current iteration. The opacity map output by the model is segmented and binarized to obtain the target binary mask, and then adaptive morphological processing is performed to finally output the three-part image, completing one complete iteration process.
[0058] S105. When the opacity map output by the portrait matting network model meets the baseline real data standard or reaches the maximum number of iterations, output the opacity map of the last iteration process.
[0059] Specifically, when the opacity map output by the model during the iterative process meets the baseline real data annotation standard, or when the maximum number of iterations is reached, the opacity map output by the model in the last iteration is output; it serves as both the image matting result and the baseline real data annotation result.
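The control flow of S104 and S105 can be sketched as follows. This is a minimal sketch, not the implementation of this application: `matting_model`, `to_trimap`, and `is_good_enough` are hypothetical callables standing in for the trained portrait matting network model, the binarization plus adaptive morphological processing, and the baseline real data check, and the default of 5 iterations is an assumption.

```python
def refine_opacity(image, initial_trimap, matting_model, to_trimap,
                   is_good_enough, max_iters=5):
    """Iterate matting and trimap regeneration until a stop condition holds.

    Returns the opacity map of the last iteration and the iteration count.
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    trimap = initial_trimap
    alpha = None
    for iteration in range(1, max_iters + 1):
        # Prior information (trimap) plus the original image go into the model.
        alpha = matting_model(image, trimap)
        if is_good_enough(alpha):
            break
        # Binarize the opacity map and rebuild the trimap for the next pass.
        trimap = to_trimap(alpha)
    return alpha, iteration

# Toy stand-ins: the "model" adds 1 to the prior, and quality means alpha >= 3.
result = refine_opacity(None, 0, lambda img, t: t + 1, lambda a: a,
                        lambda a: a >= 3)
capped = refine_opacity(None, 0, lambda img, t: t + 1, lambda a: a,
                        lambda a: a >= 3, max_iters=2)
```

Passing the three stages as callables keeps the loop independent of any particular network or morphology routine.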
[0060] The method for obtaining opacity maps during portrait matting provided in the embodiments of this application performs portrait segmentation and hair segmentation, each followed by binarization, on the original image to obtain a first binary mask and a second binary mask. Adaptive morphological processing is then performed on the binary masks to obtain an initial trimap, which provides more accurate prior information. A pre-trained portrait matting network model is then used to obtain the opacity map, iterating until the stopping condition is met, after which the opacity map is output as the baseline real data. This enables automated, accurate, and fast acquisition of the baseline real data.
[0061] In some embodiments, the method for obtaining the opacity map during portrait cutout further includes:
[0062] Receive a first input from the user, which is used to fine-tune the trimap serving as prior information; and / or,
[0063] Receive a second input from the user, which is used to fine-tune the opacity map output by the portrait matting network model.
[0064] Specifically, before inputting the tri-image, which serves as prior information, into the portrait matting network model, the category labels of local pixels in the tri-image can be adjusted by the annotator to obtain more accurate prior information.
[0065] It can be understood that, before the opacity map output by the portrait matting network model is segmented and binarized, the annotator can fine-tune the opacity map to change the semantic information of local pixels.
[0066] Fine-tuning operations can specifically involve adding annotations in the form of polygons, rectangles, circles, polylines, line segments, or points to the pixels of a local area in order to change their semantic category, for example correcting foreground pixels that were misclassified as background, or modifying the pixel range of the uncertain region.
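A rectangle-shaped correction of this kind could look like the sketch below. The label encoding (0 for background, 128 for the uncertain region, 255 for foreground) and the function name `relabel_rectangle` are assumptions made for illustration; this application does not specify how the trimap labels are stored.

```python
import numpy as np

# Assumed label encoding; not specified in this application.
BACKGROUND, UNCERTAIN, FOREGROUND = 0, 128, 255

def relabel_rectangle(trimap, top, left, bottom, right, label):
    """Return a copy of the trimap with rows top:bottom, cols left:right set to label."""
    if label not in (BACKGROUND, UNCERTAIN, FOREGROUND):
        raise ValueError("unknown trimap label")
    edited = trimap.copy()
    edited[top:bottom, left:right] = label
    return edited

trimap = np.zeros((4, 4), dtype=np.uint8)
trimap[1:3, 1:3] = FOREGROUND
# The annotator marks the whole first row as uncertain.
fixed = relabel_rectangle(trimap, 0, 0, 1, 4, UNCERTAIN)
```

Returning a copy leaves the original trimap untouched, so an annotation tool could offer undo by keeping the earlier array.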
[0067] The method for obtaining opacity maps during portrait matting provided in this application combines deep learning algorithms and prediction results with simple manual fine-tuning to quickly obtain baseline real data, reducing the difficulty of annotation work during the matting process and improving annotation efficiency.
[0068] Figure 2 is the second flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application. As shown in Figure 2, the portrait matting network model specifically includes: an encoder, a decoder, a bridging block between the encoder and the decoder, and a propagation refinement module; obtaining the opacity map output by the portrait matting network model in S104 specifically includes:
[0069] Prior information and the original image are input into the encoder for feature extraction to obtain the first output, which includes the extracted low-level detail features and high-level semantic features.
[0070] The first output is input to the bridging block, which links the encoder module and the decoder module through a skip connection, and the second output is obtained.
[0071] The first and second outputs are concatenated and input to the decoder for upsampling, and the initial opacity map is output.
[0072] The original image and the initial opacity map are input into the propagation refinement module for iterative refinement, and the refined opacity map is output.
[0073] Specifically, the portrait matting network is an encoder-decoder network containing skip connections. It predicts the opacity map of the portrait foreground from the original image and the prior information. The portrait matting network model includes an encoder module, a decoder module, a bridging block embedded between the encoder module and the decoder module, and a propagation refinement module.
[0074] The encoder module is responsible for extracting rich semantic and detail features from the portrait image samples and the prior information (i.e., the trimap) to obtain the first output; the decoder module upsamples the first output of the encoder module, retains the necessary local detail information, and predicts the preliminary opacity map; the propagation refinement module consists of three propagation units, which further refine the output of the decoder module to obtain a more detailed prediction.
[0075] The input to the encoder module is transformed into downsampled feature maps through successive convolutional and max-pooling layers. Specifically, the encoder module has 14 convolutional layers and 5 max-pooling layers.
[0076] The decoder module uses unpooling layers, which reverse the max-pooling operations, together with convolutional layers to upsample the feature maps and output a coarse opacity map. The decoder module uses a smaller structure than the encoder module to reduce the number of parameters and speed up training. Specifically, the decoder module has 6 convolutional layers, 5 unpooling layers, and a final alpha prediction layer.
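To make the effect of the 5 max-pooling layers and 5 unpooling layers concrete, the following sketch traces the spatial size of the feature maps. It assumes that each pooling layer halves the height and width, that each unpooling layer doubles them, and that the convolutional layers preserve spatial size; none of these details are stated in this application, and the function name is hypothetical.

```python
def encoder_decoder_sizes(height, width, num_pools=5):
    """Trace spatial size through 2x pooling stages and the mirrored unpooling stages."""
    if height % 2 ** num_pools or width % 2 ** num_pools:
        raise ValueError("height and width must be divisible by 2**num_pools")
    # Encoder: the input size followed by the size after each pooling layer.
    down = [(height >> k, width >> k) for k in range(num_pools + 1)]
    # Decoder: each unpooling layer restores the size of the matching encoder stage.
    up = down[-2::-1]
    return down, up

down, up = encoder_decoder_sizes(512, 512)
```

Under these assumptions a 512 x 512 input is reduced to 16 x 16 at the bottleneck and restored to 512 x 512 before the alpha prediction layer.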
[0077] A bridging block is inserted between the encoder and decoder modules to exploit local context at different receptive fields. For example, as shown in Figure 2, the bridging block consists of three dilated convolutional layers. The features output by the encoder module and the bridging block are concatenated and input to the decoder module. Following the style of U-Net, this embodiment uses skip connections between the encoder and decoder modules to preserve fine details.
[0078] The propagation refinement module contains at least two propagation units. Each propagation unit consists of two ResBlocks and one convolutional LSTM subunit, and its input is the original image and the output of the decoder module. For example, as shown in Figure 2, the propagation refinement module contains three propagation units that, in each iteration, take the input image, the fused features, and the previous opacity propagation result as input. The ResBlocks extract features from the input, while the convolutional LSTM retains memory between propagation steps. The propagation units progressively refine the predicted opacity map, producing a final result with more accurate edge details and fewer undesirable artifacts.
[0079] In some embodiments, the joint training loss of the portrait matting network model includes:
[0080] Prediction loss is used to calculate the absolute difference between the true opacity and the predicted opacity in the uncertain region of the input trimap.
[0081] Laplacian loss is used to calculate the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity in the uncertain region of the input trimap.
[0082] Synthesis loss is used to calculate the absolute difference between the original image and the synthesized image, which is generated based on predicted opacity, foreground image, and background image.
[0083] Specifically, the joint loss function used in training the portrait matting network model is the sum of the prediction loss $L_T$, the Laplacian loss $L_{lap}$, and the synthesis loss $L_{com}$:
[0084] $L = L_T + L_{lap} + L_{com}$
[0085] The prediction loss $L_T$ is used to characterize the absolute difference between the true opacity and the predicted opacity $\alpha_p$ in uncertain regions, as follows:
[0086]
[0087] where $i$ represents the pixel index and $W_i^T \in \{0, 1\}$ indicates whether pixel $i$ belongs to the uncertain region. For numerical stability, $\epsilon = 10^{-6}$.
[0088] The Laplacian loss $L_{lap}$ is used to characterize the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity $\alpha_p$ in uncertain regions, as follows:
[0089]
[0090] where $Lap_k$ denotes the $k$-th level of the Laplacian pyramid.
[0091] The synthesis loss $L_{com}$ is used to characterize the absolute difference between the input image $I$ and the composite image generated from the predicted opacity $\alpha_p$, the foreground image $F$, and the background image $B$, as follows:
[0092] $L_{com} = \lVert \alpha_p F + (1 - \alpha_p) B - I \rVert$
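The joint loss can be sketched in NumPy as follows. Because the formulas for the prediction loss and the Laplacian loss are not reproduced in the text, the forms used here are assumptions: a smoothed absolute difference averaged over the uncertain region, and an unweighted mean L1 distance per pyramid level. The pyramid itself is a simplified one built from 2x2 average pooling and nearest-neighbor upsampling. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # stability constant from the text

def prediction_loss(alpha_pred, alpha_true, uncertain):
    """Assumed form: smoothed absolute difference averaged over uncertain pixels."""
    diff = np.sqrt((alpha_pred - alpha_true) ** 2 + EPS ** 2)
    return float((diff * uncertain).sum() / (uncertain.sum() + EPS))

def _down(x):
    # 2x2 average pooling (a stand-in for Gaussian blur plus subsampling).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def _up(x):
    # Nearest-neighbor upsampling by a factor of 2.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(x, levels=3):
    pyramid = []
    for _ in range(levels):
        low = _down(x)
        pyramid.append(x - _up(low))  # detail lost by downsampling
        x = low
    pyramid.append(x)  # coarsest residual
    return pyramid

def laplacian_loss(alpha_pred, alpha_true, uncertain, levels=3):
    pred = laplacian_pyramid(alpha_pred * uncertain, levels)
    true = laplacian_pyramid(alpha_true * uncertain, levels)
    return float(sum(np.abs(p - t).mean() for p, t in zip(pred, true)))

def synthesis_loss(alpha_pred, fg, bg, image):
    a = alpha_pred[..., None]
    return float(np.abs(a * fg + (1 - a) * bg - image).mean())

def joint_loss(alpha_pred, alpha_true, uncertain, fg, bg, image):
    return (prediction_loss(alpha_pred, alpha_true, uncertain)
            + laplacian_loss(alpha_pred, alpha_true, uncertain)
            + synthesis_loss(alpha_pred, fg, bg, image))

rng = np.random.default_rng(0)
alpha_true = rng.uniform(0.1, 0.9, (8, 8))
fg, bg = rng.uniform(size=(8, 8, 3)), rng.uniform(size=(8, 8, 3))
image = alpha_true[..., None] * fg + (1 - alpha_true[..., None]) * bg
uncertain = np.ones((8, 8))
loss_perfect = joint_loss(alpha_true, alpha_true, uncertain, fg, bg, image)
loss_bad = joint_loss(np.zeros((8, 8)), alpha_true, uncertain, fg, bg, image)
```

With a perfect prediction only the stability constant contributes, so `loss_perfect` stays near zero while `loss_bad` is clearly larger.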
[0093] In some embodiments, adaptive morphological processing specifically includes:
[0094] A distance transformation is performed on the target binary mask to obtain a distance map, and the maximum value of the distance map is used as the portrait size parameter; the target binary mask can be a first binary mask, a second binary mask, or a third binary mask;
[0095] Adaptive parameters are obtained based on human portrait size parameters, including region dilation parameters and erosion parameters;
[0096] Based on adaptive parameters, the target binary mask is dilated and eroded to obtain the fourth binary mask after dilation and erosion.
[0097] The foreground region of the three-part image is determined based on the target binary mask, and the uncertain region of the three-part image is determined based on the target binary mask and the fourth binary mask. The region other than the foreground region and the uncertain region is used as the background region of the three-part image.
[0098] Specifically, adaptive morphological processing of a binary mask to obtain a three-part image of the portrait foreground refers to generating the three-part image by adaptively dilating and eroding the binary mask. That is, appropriate morphological parameters are calculated based on the size of the foreground in the image, and region-specific dilation and erosion are then performed on the portrait or hair.
[0099] First, a distance map is obtained by performing a distance transform on the binary mask of the target. The maximum value of the distance map is used as the portrait size parameter. The distance transform describes the distance between a pixel in the image and a certain region. Pixels in a region have a value of 0, pixels in neighboring regions have smaller values, and pixels farther away from a region have larger values.
[0100] Adaptive parameters are obtained based on the portrait size parameter, denoted D. These adaptive parameters specifically include region dilation parameters and an erosion parameter. The head region dilation parameter satisfies:
[0101] head_dilate_intersize = D * head_parameter / 100
[0102] where head_parameter is an adjustable head region dilation coefficient, which is generally set to 3.5.
[0103] The body region dilation parameter satisfies:
[0104] body_dilate_intersize = D * body_parameter / 100
[0105] where body_parameter is an adjustable body region dilation coefficient, which is generally set to 1.5.
[0106] The erosion parameter satisfies:
[0107] erode_intersize = D * body_parameter / 100
[0108] Furthermore, based on the aforementioned adaptive parameters, the binary masks for hair and portrait are subjected to dilation and erosion processing respectively to obtain a fourth binary mask after dilation and erosion processing.
[0109] The original binary mask is used as the foreground region of the tri-image. The result of subtracting the original binary mask from the fourth binary mask after dilation and erosion is used as the uncertain region of the tri-image. The remaining pixels are used as the background region of the tri-image.
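The adaptive morphological processing can be sketched in NumPy as follows. Several choices here are assumptions made for illustration and are not taken from this application: the distance map maximum is approximated by the chessboard distance (counting 3x3 erosions until the mask is empty), the structuring element is a square, "dilated and eroded" is read as dilation followed by erosion, sizes are rounded up to whole pixels, and the labels 0, 128, and 255 encode background, uncertain region, and foreground. All function names are hypothetical.

```python
import math
import numpy as np

def _morph(mask, radius, op):
    """Apply a 3x3 square dilation (logical_or) or erosion (logical_and) radius times."""
    out = mask.copy()
    h, w = mask.shape
    for _ in range(radius):
        p = np.pad(out, 1, constant_values=False)  # outside counts as background
        windows = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
        out = op.reduce(windows)
    return out

def dilate(mask, radius):
    return _morph(mask, radius, np.logical_or)

def erode(mask, radius):
    return _morph(mask, radius, np.logical_and)

def portrait_size(mask):
    """Maximum chessboard distance from a foreground pixel to the background."""
    size, current = 0, mask.copy()
    while current.any():
        current = erode(current, 1)
        size += 1
    return size

def make_trimap(mask, dilate_parameter, erode_parameter):
    mask = mask.astype(bool)
    d_max = portrait_size(mask)                       # D in the text
    dilate_size = math.ceil(d_max * dilate_parameter / 100)
    erode_size = math.ceil(d_max * erode_parameter / 100)
    fourth = erode(dilate(mask, dilate_size), erode_size)
    trimap = np.zeros(mask.shape, dtype=np.uint8)     # background
    trimap[fourth & ~mask] = 128                      # uncertain region
    trimap[mask] = 255                                # foreground
    return trimap

mask = np.zeros((120, 120), dtype=bool)
mask[30:90, 30:90] = True            # a 60x60 square standing in for a hair mask
trimap = make_trimap(mask, 3.5, 1.5)  # head dilation 3.5, erosion 1.5
```

Because the sizes scale with D, a larger portrait receives a proportionally wider uncertain band, which is the point of making the parameters adaptive.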
[0110] Optionally, the binarization rules during the iteration process are as follows:
[0111]
[0112] where m is the binarized portrait mask and α is the (fine-tuned) opacity map. A trimap is generated from this portrait mask through dilation and erosion operations, and the adaptive parameters are the body region dilation and erosion parameters obtained in S102.
[0113] The technical solution provided in the embodiments of this application will be further illustrated below through a specific example.
[0114] Figure 3 is the third flowchart of the method for obtaining the opacity map during portrait matting provided in the embodiments of this application. As shown in Figure 3, the method includes at least:
[0115] Step a: Obtain the image to be labeled;
[0116] Step b: Perform portrait segmentation and hair segmentation on the image to obtain the relevant binary masks for the portrait and hair;
[0117] Step c: Perform morphological processing on the two segmented outputs to generate a tripartite image of the foreground of the portrait;
[0118] Step d: The annotator makes appropriate minor adjustments to the trimap (this step can be omitted);
[0119] Step e: Input the trimap and the original image into the portrait matting network to obtain the opacity map predicted by the algorithm;
[0120] Step f: The annotator makes minor adjustments to the opacity map (this step can be omitted);
[0121] Step g: Binarize the fine-tuned opacity map to obtain the corresponding binary mask, and then generate the corresponding trimap. Repeat steps e to g until the opacity map meets the accuracy required of the baseline real data, at which point the annotation is complete.
[0122] In this example, the training dataset used to train the portrait matting network model consists of portrait image samples. Each original portrait image may contain one or more portrait foregrounds. After the original image is acquired, portrait segmentation and hair segmentation are performed to obtain a portrait binary mask and a hair binary mask. Next, adaptive morphological processing, including adaptive dilation and erosion, is applied to the two binary masks to obtain a trimap of the portrait foreground. The annotator can fine-tune this trimap to change the semantic information of the pixels and then input it into the portrait matting network to predict the opacity map. The annotator can also fine-tune the opacity map, or directly use the predicted opacity map, to derive the prior information that is re-input into the portrait matting network for prediction, repeating steps e to g until the accuracy requirements of the baseline real data are met.
[0123] Figure 4 is a schematic diagram of the structure of the device for obtaining the opacity map during portrait matting provided in the embodiments of this application. As shown in Figure 4, the device includes at least:
[0124] The original image acquisition unit 401 is used to acquire the original image to be cut out;
[0125] The segmentation and binarization processing unit 402 is used to perform portrait segmentation and binarization processing on the original image to obtain a first binary mask; and to perform hair segmentation and binarization processing on the original image to obtain a second binary mask.
[0126] The morphological processing unit 403 is used to perform adaptive morphological processing on the first binary mask and the second binary mask to obtain the initial tripartite image of the portrait foreground.
[0127] The iteration unit 404 is used to execute the iteration process, which includes: using the initial tri-image or the tri-image obtained in the previous iteration as prior information, inputting it together with the original image into the trained portrait matting network model to obtain the opacity map output by the portrait matting network model; performing segmentation and binarization processing on the opacity map to obtain the third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the tri-image of the current iteration.
[0128] The opacity map output unit 405 is used to output the opacity map of the last iteration when the opacity map output by the portrait matting network model meets the baseline real data standard or reaches the maximum number of iterations.
[0129] In some embodiments, the portrait matting network model includes: an encoder module, a decoder module, a bridging block embedded between the encoder module and the decoder module, and a propagation refinement module; the iteration unit 404 is specifically used for the following:
[0130] Prior information and the original image are input into the encoder for feature extraction to obtain the first output, which includes the extracted low-level detail features and high-level semantic features.
[0131] The first output is input to the bridging block, which links the encoder module and the decoder module through a skip connection, and the second output is obtained.
[0132] The first and second outputs are concatenated and input to the decoder for upsampling, and the initial opacity map is output.
[0133] The original image and the initial opacity map are input into the propagation refinement module for iterative refinement, and the refined opacity map is output.
[0134] In some embodiments, the propagation refinement module includes at least two propagation units, each of which includes two ResBlock sub-units and one convolutional LSTM sub-unit.
[0135] In some embodiments, the device further includes a user input receiving unit for:
[0136] Receiving a first input from the user, which is used to fine-tune the trimap that serves as prior information; and / or,
[0137] Receiving a second input from the user, which is used to fine-tune the opacity map output by the portrait matting network model.
[0138] In some embodiments, the joint training loss of the portrait matting network model includes:
[0139] Prediction loss is used to characterize the absolute difference between the true opacity and the predicted opacity in uncertain regions.
[0140] Laplacian loss is used to characterize the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity in uncertain regions.
[0141] Synthesis loss is used to characterize the absolute difference between the original image and the synthesized image, which is generated based on predicted opacity, foreground image, and background image.
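As an illustration of two of these terms, the prediction loss and the synthesis loss can be sketched on flattened pixel lists. This is a sketch under assumptions rather than the loss used in this application: averaging (instead of summing) the absolute differences is an assumption, the function names are hypothetical, and the synthesized image is taken to follow the standard compositing relation I = αF + (1 − α)B. The Laplacian pyramid term is omitted for brevity.

```python
def prediction_loss(alpha_true, alpha_pred, unknown):
    """Mean absolute opacity error over pixels flagged as uncertain."""
    diffs = [abs(t - p)
             for t, p, u in zip(alpha_true, alpha_pred, unknown) if u]
    # An empty uncertain region contributes no loss.
    return sum(diffs) / len(diffs) if diffs else 0.0


def composite(alpha, foreground, background):
    """Blend foreground over background: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1 - a) * b
            for a, f, b in zip(alpha, foreground, background)]


def synthesis_loss(image, alpha_pred, foreground, background):
    """Mean absolute error between the original and the re-synthesized image."""
    synthesized = composite(alpha_pred, foreground, background)
    return sum(abs(i - s) for i, s in zip(image, synthesized)) / len(image)


# Example with four single-channel pixels; the middle two are uncertain.
alpha_true = [1.0, 0.5, 0.0, 0.0]
alpha_pred = [1.0, 0.25, 0.5, 0.0]
unknown = [0, 1, 1, 0]
foreground = [1.0, 1.0, 1.0, 1.0]
background = [0.0, 0.0, 0.0, 0.0]
image = composite(alpha_true, foreground, background)

loss_pred = prediction_loss(alpha_true, alpha_pred, unknown)
loss_synth = synthesis_loss(image, alpha_pred, foreground, background)
```

How the three terms are weighted in the joint loss is not stated in this section, so no weighting is shown.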
[0142] In some embodiments, adaptive morphological processing includes:
[0143] A distance map is obtained by performing a distance transformation on the target binary mask, and the maximum value of the distance map is used as the portrait size parameter; the target binary mask can be the first binary mask, the second binary mask, or the third binary mask;
[0144] Adaptive parameters are obtained based on the portrait size parameter; the adaptive parameters include a region dilation parameter and an erosion parameter;
[0145] Based on the adaptive parameters, the target binary mask is dilated and eroded to obtain a fourth binary mask;
[0146] The foreground region of the trimap is determined based on the target binary mask, and the uncertain region of the trimap is determined based on the target binary mask and the fourth binary mask. The region other than the foreground region and the uncertain region is used as the background region of the trimap.
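The four steps above can be sketched in plain Python on a small binary mask. Several details are assumptions made only for illustration and are not taken from this application: the city-block distance metric, the square structuring element, the ratios that map the portrait size parameter to the dilation and erosion radii, the order "dilate, then erode", the rule that the uncertain region is the part of the fourth mask outside the target mask, and the 255 / 128 / 0 trimap encoding. The names `distance_map`, `_morph`, and `build_trimap` are hypothetical.

```python
from collections import deque

FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0  # assumed trimap encoding


def distance_map(mask):
    """City-block distance from every pixel to the nearest background pixel."""
    h, w = len(mask), len(mask[0])
    far = h + w  # larger than any reachable distance; used if no background
    dist = [[0 if mask[y][x] == 0 else far for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x] == 0)
    while queue:  # breadth-first search outward from all background pixels
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] > dist[y][x] + 1:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def _morph(mask, radius, combine):
    """Square-window morphology: combine=any dilates, combine=all erodes."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = (mask[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1)))
            row.append(1 if combine(window) else 0)
        out.append(row)
    return out


def build_trimap(mask, dilate_ratio=0.5, erode_ratio=0.25):
    """Build a trimap whose uncertain band scales with the portrait size."""
    # Step 1: the maximum of the distance map is the portrait size parameter.
    size = max(max(row) for row in distance_map(mask))
    # Step 2: adaptive parameters derived from the size (assumed ratios).
    dilate_radius = max(1, int(dilate_ratio * size))
    erode_radius = int(erode_ratio * size)
    # Step 3: dilate, then erode, to obtain the fourth binary mask.
    fourth = _morph(_morph(mask, dilate_radius, any), erode_radius, all)
    # Step 4: target mask -> foreground; fourth mask minus target -> uncertain.
    return [[FOREGROUND if m else UNKNOWN if f else BACKGROUND
             for m, f in zip(mask_row, fourth_row)]
            for mask_row, fourth_row in zip(mask, fourth)]
```

Because both radii are derived from the maximum of the distance map, a larger portrait receives a proportionally wider uncertain band, which is the point of making the morphology adaptive.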
[0147] It is understood that the detailed functional implementation of each of the above units / modules can be found in the description in the aforementioned method embodiments, and will not be repeated here.
[0148] It should be understood that the above-described device is used to execute the methods in the above embodiments. The implementation principle and technical effect of the corresponding program modules in the device are similar to those described in the above methods. The working process of the device can be referred to the corresponding process in the above methods, and will not be repeated here.
[0149] Based on the methods described in the above embodiments, this application provides an electronic device. The device may include at least one memory for storing a program and at least one processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor performs the methods described in the above embodiments.
[0150] Figure 5 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application. As shown in Figure 5, the electronic device may include a processor 501, a communications interface 502, a memory 503, and a communication bus 504. The processor 501, communications interface 502, and memory 503 communicate with each other via the communication bus 504. The processor 501 can call software instructions stored in the memory 503 to execute the methods described in the above embodiments.
[0151] Furthermore, the logical instructions in the aforementioned memory 503 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to related technologies, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods in the various embodiments of this application.
[0152] Based on the methods in the above embodiments, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to execute the methods in the above embodiments.
[0153] Based on the methods in the above embodiments, this application provides a computer program product that, when run on a processor, causes the processor to execute the methods in the above embodiments.
[0154] It is understood that the processor in the embodiments of this application can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. A general-purpose processor can be a microprocessor or any conventional processor.
[0155] The method steps in this application embodiment can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, portable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and the storage medium can reside in an ASIC.
[0156] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the embodiments of this application is generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
[0157] It is understood that the various numerical designations used in the embodiments of this application are merely for the convenience of description and are not intended to limit the scope of the embodiments of this application.
[0158] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A method for obtaining an opacity map during portrait cutout, characterized in that it includes:
obtaining the original image to be cut out;
performing portrait segmentation and binarization processing on the original image to obtain a first binary mask;
performing hair segmentation and binarization processing on the original image to obtain a second binary mask;
performing adaptive morphological processing on the first binary mask and the second binary mask to obtain the initial trimap of the portrait foreground;
executing an iteration process, which includes: using the initial trimap or the trimap obtained in the previous iteration as prior information, inputting it together with the original image into a trained portrait matting network model to obtain an opacity map output by the portrait matting network model; performing segmentation and binarization processing on the opacity map to obtain a third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap of the current iteration;
when the opacity map output by the portrait matting network model meets the baseline real data standard, or when the maximum number of iterations is reached, outputting the opacity map of the last iteration;
wherein the portrait matting network model includes: an encoder module, a decoder module, a bridging block embedded between the encoder module and the decoder module, and a propagation refinement module;
the step of obtaining the opacity map output by the portrait matting network model includes: inputting the prior information and the original image into the encoder module for feature extraction to obtain a first output, which includes the extracted low-level detail features and high-level semantic features; inputting the first output to the bridging block, which provides a skip connection between the encoder module and the decoder module, to obtain a second output; concatenating the first output and the second output and inputting them to the decoder module for upsampling to output an initial opacity map; and inputting the original image and the initial opacity map into the propagation refinement module for iterative refinement to output the refined opacity map;
the adaptive morphological processing includes: performing a distance transformation on a target binary mask to obtain a distance map, and using the maximum value of the distance map as the portrait size parameter, the target binary mask being the first binary mask, the second binary mask, or the third binary mask; obtaining adaptive parameters based on the portrait size parameter, the adaptive parameters including a region dilation parameter and an erosion parameter; dilating and eroding the target binary mask based on the adaptive parameters to obtain a fourth binary mask; and determining the foreground region of the trimap based on the target binary mask, determining the uncertain region of the trimap based on the target binary mask and the fourth binary mask, and using the region other than the foreground region and the uncertain region as the background region of the trimap.
2. The method for obtaining the opacity map during portrait cutout according to claim 1, characterized in that the propagation refinement module includes at least two propagation units, each of which includes two ResBlock sub-units and one convolutional LSTM sub-unit.
3. The method for obtaining the opacity map during portrait cutout according to claim 1, characterized in that the method further includes: receiving a first input from the user, which is used to fine-tune the trimap that serves as prior information; and / or, receiving a second input from the user, which is used to fine-tune the opacity map output by the portrait matting network model.
4. The method for obtaining the opacity map during portrait cutout according to claim 1, characterized in that the joint training loss of the portrait matting network model includes: a prediction loss, used to characterize the absolute difference between the true opacity and the predicted opacity in uncertain regions; a Laplacian loss, used to characterize the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity in uncertain regions; and a synthesis loss, used to characterize the absolute difference between the original image and the synthesized image, which is generated based on the predicted opacity, the foreground image, and the background image.
5. A device for obtaining an opacity map during portrait cutout, characterized in that the device is used to execute the method for obtaining the opacity map during portrait cutout as described in any one of claims 1-4, and includes: an original image acquisition unit, used to acquire the original image to be cut out; a segmentation and binarization processing unit, used to perform portrait segmentation and binarization processing on the original image to obtain a first binary mask, and to perform hair segmentation and binarization processing on the original image to obtain a second binary mask; a morphological processing unit, used to perform adaptive morphological processing on the first binary mask and the second binary mask to obtain the initial trimap of the portrait foreground; an iteration unit, used to execute an iteration process, which includes: using the initial trimap or the trimap obtained in the previous iteration as prior information, inputting it together with the original image into a trained portrait matting network model to obtain an opacity map output by the portrait matting network model; performing segmentation and binarization processing on the opacity map to obtain a third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap of the current iteration; and an opacity map output unit, used to output the opacity map of the last iteration when the opacity map output by the portrait matting network model meets the baseline real data standard or when the maximum number of iterations is reached.
6. An electronic device, characterized in that it includes: at least one memory for storing a computer program; and at least one processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor performs the method as described in any one of claims 1-4.
7. A computer-readable storage medium storing a computer program, characterized in that when the computer program is run on a processor, it causes the processor to perform the method as described in any one of claims 1-4.
8. A computer program product, characterized in that when the computer program product is run on a processor, it causes the processor to perform the method as described in any one of claims 1-4.