Training method of image inpainting model and related equipment

By combining image fusion and deformation processing with image restoration model training, the problem of unrealistic image restoration effects in existing technologies has been solved, achieving high-quality facial fusion and automatic removal of target elements.

CN115953324B · Active · Publication Date: 2026-06-16 · BEIJING DAJIA INTERNET INFORMATION TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
Filing Date: 2023-01-28
Publication Date: 2026-06-16

Patent Timeline

28 Jan 2023: Application
16 Jun 2026: Publication (CN115953324B)

IPC: G06T5/77; G06T5/50

CPC: Y02T10/40

AI Tagging

Application Domain

Image enhancement; Internal combustion piston engines

Technical Efficacy Phrases

improve accuracy; increase authenticity


AI Technical Summary

Technical Problem

In image restoration, existing technologies directly remove target elements, resulting in an overall unrealistic and unnatural image quality, making it difficult to achieve high-quality facial fusion and automatic removal of target elements.

Method used

By acquiring facial images of the first object without the target element and the second object with the target element, image fusion processing is performed to determine deformation parameters. After segmentation and deformation processing, a target fused image is generated. The initial image restoration model is then trained, and the model parameters are adjusted until a predicted image consistent with the original image is generated.

Benefits of technology

It improves the accuracy and realism of image restoration models, making the generated fused images more realistic, and can automatically remove target elements while keeping other parts unchanged.

✦ Generated by Eureka AI based on patent content.


Abstract

Embodiments of the present disclosure provide a method for training an image inpainting model and related equipment. The method comprises: obtaining a first facial image of a first object and a second facial image of a second object, the first facial image not having a target element, and the second facial image having the target element; performing image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object having the target element; inputting the first fused image into an initial image inpainting model to obtain a predicted image in which the target element is removed; training the initial image inpainting model according to the predicted image and the first facial image, and determining the trained initial image inpainting model as the image inpainting model. The method can improve the fusion effect of the face of the first object and the target element in the first fused image, and generate a more realistic fused image; and can automatically remove the target element in the input image while keeping other parts unchanged, thereby improving the authenticity of the image inpainting model.


Description

Technical Field

[0001] This disclosure relates to the field of computer technology, and more specifically, to a training method for an image restoration model, an image restoration method, a training device for an image restoration model, an image restoration device, an electronic device, and a computer-readable storage medium.

Background Technology

[0002] With the rapid development of computer vision technology, image inpainting has become one of the key research areas in the field. Special effects often involve scenes that require the removal or editing of parts of a person's face, such as removing hair or beard; after editing, appropriate restoration is needed, such as adding back bald scalp or missing chin.

[0003] In related technologies, the common approach is to directly remove the area where the target element is located in the image, and then either cover that area with a texture or fill it with background content taken from near the area. An image obtained in this way looks artificial overall and has poor realism.

Summary of the Invention

[0004] This disclosure provides a training method for an image restoration model, an image restoration method, a training device for an image restoration model, an image restoration device, an electronic device, and a computer-readable storage medium. On the one hand, this method can improve the fusion effect of the face of the first object and the target element in the first fused image, making the generated first fused image more realistic. On the other hand, it can automatically remove the target element in the input image while keeping other parts unchanged, thereby improving the accuracy and realism of the image restoration model.

[0005] This disclosure provides a method for training an image restoration model. The method includes: acquiring a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain a target element, and the second facial image contains the target element; performing image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object containing the target element; inputting the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; and training the initial image restoration model based on the predicted image and the first facial image, so as to determine the trained initial image restoration model as the image restoration model.

[0006] In some exemplary embodiments of this disclosure, performing image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object having the target element includes: determining deformation parameters from the second facial image to the first facial image based on the first facial image and the second facial image; performing segmentation processing on the second facial image to obtain a target image of the region where the target element is located; performing deformation processing on the target image based on the deformation parameters to obtain a target deformed image; and performing fusion processing on the first facial image and the target deformed image to obtain a first fused image of the first object having the target element.

[0007] In some exemplary embodiments of this disclosure, determining deformation parameters from the second facial image to the first facial image based on the first facial image and the second facial image includes: performing keypoint matching on the first facial image and the second facial image to obtain a plurality of first keypoints in the first facial image and a plurality of second keypoints in the second facial image, wherein the plurality of first keypoints and the plurality of second keypoints correspond one-to-one; and determining deformation parameters from the second facial image to the first facial image based on the coordinates of the plurality of first keypoints and the coordinates of the plurality of second keypoints.
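The keypoint-based step above can be illustrated with a minimal sketch. The disclosure does not state which family of transforms the deformation parameters belong to; this example assumes a 2D similarity transform (scale, rotation, translation) fitted by least squares, with points represented as complex numbers. The names `fit_similarity` and `apply_similarity` are hypothetical and not taken from the patent.

```python
def fit_similarity(second_pts, first_pts):
    """Least-squares similarity transform mapping second_pts onto first_pts.

    Points are (x, y) tuples and second_pts[i] corresponds to first_pts[i].
    Returns complex (a, b) with first approximately equal to a * second + b:
    a encodes scale and rotation, b encodes translation.
    Assumption: a similarity transform is used; the patent does not specify one.
    """
    if len(second_pts) != len(first_pts) or len(second_pts) < 2:
        raise ValueError("need at least two one-to-one matched keypoints")
    z = [complex(x, y) for x, y in second_pts]
    w = [complex(x, y) for x, y in first_pts]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    denom = sum(abs(zi - z_mean) ** 2 for zi in z)
    if denom == 0:
        raise ValueError("source keypoints must not all coincide")
    # Closed-form solution of min over (a, b) of sum |w_i - a * z_i - b|^2.
    a = sum((wi - w_mean) * (zi - z_mean).conjugate()
            for zi, wi in zip(z, w)) / denom
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(params, point):
    """Map one (x, y) point from the second image into the first image."""
    a, b = params
    p = a * complex(*point) + b
    return (p.real, p.imag)
```

A richer model, such as a full affine or piecewise warp, would need more parameters; the similarity transform is chosen here only because it has a short closed-form solution.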

[0008] In some exemplary embodiments of this disclosure, inputting the first fused image into an initial image inpainting model to obtain a predicted image with the target element removed includes: inputting the first fused image into the initial image inpainting model, wherein the initial image inpainting model performs completion processing on the region where the target element is located in the first fused image and outputs a predicted image with the target element removed; training the initial image inpainting model based on the predicted image and the first facial image, and determining the trained initial image inpainting model as the image inpainting model, includes: adjusting the model parameters of the initial image inpainting model based on the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image inpainting model and the first facial image meets a preset condition, completing the training of the initial image inpainting model, and using the trained initial image inpainting model as the image inpainting model.

[0009] In some exemplary embodiments of this disclosure, the method further includes: acquiring a third facial image of a third object, wherein the third facial image has the target element, and the representation of the target element in the third facial image is different from the representation of the target element in the second facial image; performing image fusion processing on the first facial image and the third facial image to obtain a second fused image of the first object having the target element, and using the first facial image and the second fused image as paired data to train the image restoration model.
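As a rough sketch of the idea above, several donor images carrying different representations of the target element can each be fused with the same first facial image, yielding several training pairs that share one label. The helper `build_pairs` and the `fuse_fn` callable are hypothetical placeholders for the fusion pipeline and are not named in the disclosure.

```python
def build_pairs(first_image, donor_images, fuse_fn):
    """Return (model_input, label) pairs, one per donor image.

    fuse_fn(first_image, donor) stands in for the fusion pipeline; the label
    is always the first facial image, which lacks the target element.
    """
    return [(fuse_fn(first_image, donor), first_image) for donor in donor_images]
```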

[0010] This disclosure provides an image restoration method, comprising: acquiring a facial image to be processed containing a target element; segmenting the facial image to be processed to obtain a target region where the target element is located; filling the target region in the facial image to be processed with a preset color; and inputting the filled facial image to be processed into an image restoration model trained by any of the above methods to obtain a facial image without the target element.
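The inference flow above can be sketched as follows, assuming an RGB image stored as a 2D list of tuples and a boolean segmentation mask. The gray fill value, the function names, and the `segment_fn` and `model_fn` callables are illustrative assumptions; the disclosure only says that a preset color is used.

```python
def fill_region(image, mask, fill_color=(128, 128, 128)):
    """Return a copy of image with masked pixels replaced by fill_color.

    image: 2D list of (r, g, b) tuples.
    mask: 2D list of booleans, True where the target element was segmented.
    fill_color: assumed preset color; the patent does not give a value.
    """
    return [
        [fill_color if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def remove_target_element(image, segment_fn, model_fn, fill_color=(128, 128, 128)):
    """Segment the target region, fill it, then run the restoration model."""
    mask = segment_fn(image)
    return model_fn(fill_region(image, mask, fill_color))
```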

[0011] This disclosure provides a training apparatus for an image restoration model, comprising: an acquisition module configured to acquire a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain a target element, and the second facial image contains the target element; an obtaining module configured to perform image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object containing the target element; and a training module configured to input the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; the training module is further configured to train the initial image restoration model based on the predicted image and the first facial image, so as to determine the trained initial image restoration model as the image restoration model.

[0012] In some exemplary embodiments of this disclosure, the obtaining module is configured to perform: determining deformation parameters from the second facial image to the first facial image based on the first facial image and the second facial image; performing segmentation processing on the second facial image to obtain a target image of the region where the target element is located; performing deformation processing on the target image based on the deformation parameters to obtain a target deformed image; and performing fusion processing on the first facial image and the target deformed image to obtain a first fused image of the first object having the target element.

[0013] In some exemplary embodiments of this disclosure, the obtaining module is configured to perform: key point matching on the first facial image and the second facial image to obtain a plurality of first key points in the first facial image and a plurality of second key points in the second facial image, wherein the plurality of first key points and the plurality of second key points correspond one-to-one; and determine deformation parameters from the second facial image to the first facial image based on the coordinates of the plurality of first key points and the coordinates of the plurality of second key points.

[0014] In some exemplary embodiments of this disclosure, the training module is configured to perform: inputting the first fused image into the initial image restoration model, wherein the initial image restoration model performs completion processing on the region where the target element is located in the first fused image and outputs a predicted image with the target element removed; adjusting the model parameters of the initial image restoration model according to the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image restoration model and the first facial image meets a preset condition, thereby completing the training of the initial image restoration model, and using the trained initial image restoration model as the image restoration model.

[0015] In some exemplary embodiments of this disclosure, the acquisition module is further configured to acquire a third facial image of a third object, wherein the third facial image has the target element, and the representation of the target element in the third facial image is different from the representation of the target element in the second facial image; the obtaining module is further configured to perform image fusion processing on the first facial image and the third facial image to obtain a second fused image of the first object having the target element, and to use the first facial image and the second fused image as paired data to train the image restoration model.

[0016] This disclosure provides an image restoration apparatus, comprising: an acquisition module configured to acquire a face image to be processed containing a target element; an obtaining module configured to perform segmentation processing on the face image to be processed to obtain a target region where the target element is located; and a filling module configured to fill the target region in the face image to be processed with a preset color; the obtaining module is further configured to input the filled face image to be processed into an image restoration model trained by any of the above methods to obtain a face image without the target element.

[0017] This disclosure provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement a training method for an image restoration model as described above or an image restoration method as described above.

[0018] This disclosure provides a computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform a training method for an image restoration model as described above or an image restoration method as described above.

[0019] This disclosure provides a computer program product, including a computer program that, when executed by a processor, implements a training method for an image restoration model as described above, or an image restoration method as described above.

[0020] The image restoration model training method provided in this disclosure includes, on the one hand, obtaining a first fused image of the first object with target elements based on a first facial image of the first object without target elements and a second facial image of the second object with target elements, which can improve the fusion effect of the face of the first object with target elements in the first fused image, making the generated first fused image more realistic; on the other hand, inputting the first fused image of the first object with target elements into an initial image restoration model to obtain a predicted image with target elements removed; using the first facial image of the first object without target elements as a training label, training the initial image restoration model based on the predicted image and the first facial image, so that the trained image restoration model can automatically remove target elements from the input image while keeping other parts unchanged, thereby improving the accuracy and realism of the image restoration model.

[0021] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0022] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0023] Figure 1 is a schematic diagram of an exemplary system architecture to which the training method for an image restoration model or the image restoration method of embodiments of the present disclosure can be applied.

[0024] Figure 2 is a flowchart illustrating a training method for an image restoration model according to an exemplary embodiment.

[0025] Figure 3 is a flowchart illustrating another method for training an image restoration model according to an exemplary embodiment.

[0026] Figure 4 is a schematic diagram illustrating how training data for an image restoration model is determined, according to an example.

[0027] Figure 5 is a flowchart illustrating another method for training an image restoration model according to an exemplary embodiment.

[0028] Figure 6 is a flowchart illustrating an image restoration method according to an exemplary embodiment.

[0029] Figure 7 is a schematic diagram showing the input and output images of an image restoration model, according to an example.

[0030] Figure 8 is a block diagram illustrating a training apparatus for an image restoration model according to an exemplary embodiment.

[0031] Figure 9 is a block diagram illustrating an image restoration apparatus according to an exemplary embodiment.

[0032] Figure 10 is a schematic diagram illustrating the structure of an electronic device suitable for implementing exemplary embodiments of the present disclosure.

Detailed Implementation

[0033] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the embodiments set forth herein; rather, they are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted.

[0034] The features, structures, or characteristics described in this disclosure can be combined in any suitable manner in one or more embodiments. Numerous specific details are provided in the following description to give a thorough understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure can be practiced with one or more specific details omitted, or other methods, components, apparatuses, steps, etc., can be employed. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring various aspects of this disclosure.

[0035] The accompanying drawings are merely illustrative of this disclosure, and the same reference numerals in the drawings denote the same or similar parts, thus omitting repeated descriptions of them. Some block diagrams shown in the drawings do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in at least one hardware module or integrated circuit, or in different network and / or processor devices and / or microcontroller devices.

[0036] The flowchart shown in the accompanying drawings is merely illustrative and does not necessarily include all content and steps, nor does it require execution in the described order. For example, some steps may be broken down, while others may be combined or partially combined; therefore, the actual execution order may change depending on the specific circumstances.

[0037] In this specification, the terms “a,” “an,” “the,” “said,” and “at least one” are used to indicate the presence of one or more elements, components, etc.; the terms “comprising,” “including,” and “having” are used in an open-ended sense and mean that there may be other elements, components, etc. in addition to those listed; the terms “first,” “second,” “third,” etc. are used only as labels and do not limit the number of objects.

[0038] Figure 1 shows a schematic diagram of an exemplary system architecture to which the training method for an image restoration model or the image restoration method of embodiments of the present disclosure can be applied.

[0039] As shown in Figure 1, the system architecture may include server 101, network 102, terminal device 103, terminal device 104, and terminal device 105. Network 102 is the medium that provides communication links between server 101 and terminal devices 103, 104, and 105. Network 102 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.

[0040] Server 101 can be a server that provides various services, such as a back-end management server that supports the operations users perform on terminal devices 103, 104, or 105. The back-end management server can analyze and process received requests and other data, and feed back the processing results to terminal devices 103, 104, or 105.

[0041] Terminal devices 103, 104, and 105 can be smartphones, tablets, laptops, desktop computers, smart speakers, wearable smart devices, virtual reality devices, augmented reality devices, etc., but are not limited to these.

[0042] In this embodiment of the disclosure, server 101 may: acquire a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain the target element and the second facial image contains the target element; perform image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object containing the target element; input the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; and train the initial image restoration model based on the predicted image and the first facial image to determine the trained initial image restoration model as the image restoration model.

[0043] In this embodiment of the present disclosure, server 101 can obtain a facial image to be processed containing target elements from a terminal device; perform segmentation processing on the facial image to be processed to obtain the target region where the target elements are located; fill the target region in the facial image to be processed with a preset color; input the filled facial image to be processed into an image restoration model to obtain a facial image with the target elements removed; server 101 can return the obtained facial image to the terminal device.

[0044] It should be understood that the numbers of terminal devices 103, 104, 105, network 102, and server 101 in Figure 1 are merely illustrative. Server 101 can be a single physical server, a server cluster consisting of multiple servers, or a cloud server. Depending on actual needs, there may be any number of terminal devices, networks, and servers.

[0045] The training steps of the image inpainting model in the exemplary embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings and examples. The method provided in the embodiments of this disclosure can be executed by any electronic device, such as the server and/or terminal devices in Figure 1 described above; this disclosure is not limited in this respect.

[0046] Figure 2 is a flowchart illustrating a training method for an image restoration model according to an exemplary embodiment. As shown in Figure 2, the training method for the image restoration model provided in this embodiment may include the following steps.

[0047] In step S210, a first facial image of the first object and a second facial image of the second object are obtained, wherein the first facial image does not contain the target element and the second facial image contains the target element.

[0048] In this embodiment of the disclosure, both the first object and the second object can be people, animals, toys, etc. The first facial image refers to an image including the face of the first object, and the second facial image refers to an image including the face of the second object. For example, the first facial image of the first object is the face image of person A, and the second facial image of the second object is the face image of person B.

[0049] In this embodiment of the disclosure, the target element refers to an element that is present on the face or worn on the face. The target element may be one or more of the following: hair element, eyebrow element, glasses element, beard element, and accessory element (such as scarf element, earring element, hat element). In the following example, the target element is described as hair element, but this disclosure is not limited to this.

[0050] In this embodiment of the disclosure, referring to Figure 4, the first facial image does not contain the target element; for example, the first facial image is image 401 of A's bald head. The second facial image contains the target element; for example, the second facial image is image 402 of B with hair.

[0051] In step S220, the first facial image and the second facial image are subjected to image fusion processing to obtain a first fused image of the first object with the target element.

[0052] In this embodiment of the disclosure, a first facial image without target elements can be processed based on a second facial image with target elements to obtain a first fused image of a first object with target elements; wherein the first fused image refers to the image after the face of the first object is fused with the target elements.

[0053] For example, the image of A's bald head 401 can be processed based on the image of B with hair 402 to obtain the image of A with hair 407.

[0054] Specifically, based on the first facial image and the second facial image, deformation parameters for deforming the second facial image to the first facial image can be determined; the second facial image can be segmented to obtain a target image of the region where the target element is located in the second facial image; the target image can be deformed based on the deformation parameters to obtain a target deformed image; the first facial image and the target deformed image can be fused to obtain a first fused image of the first object with the target element.

[0055] For example, based on image 401 of A's bald head and image 402 of B with hair, the deformation parameters from image 402 of B with hair to image 401 of A's bald head are determined; image 402 of B with hair is segmented to obtain image 405 of the region where the hair is located; image 405 of the hair region is deformed based on the deformation parameters to obtain image 406 of the deformed hair region; and image 401 of A's bald head and image 406 of the deformed hair region are fused to obtain image 407 of A with hair.
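The final fusion step of this example can be sketched as a per-pixel blend. The disclosure does not specify the blending rule; this sketch assumes simple alpha blending of grayscale images stored as 2D lists, with a soft mask that has already been deformed together with the hair region. The function name `fuse` and its arguments are hypothetical.

```python
def fuse(first_image, warped_target, warped_mask):
    """Alpha-blend the deformed target-element image onto the first facial image.

    All arguments are equally sized 2D lists of numbers. warped_mask holds
    values in [0, 1]: 1 where the target element (e.g. hair) is, 0 elsewhere.
    Assumption: alpha blending; the patent does not name a fusion rule.
    """
    fused = []
    for row_a, row_t, row_m in zip(first_image, warped_target, warped_mask):
        fused.append([m * t + (1.0 - m) * a
                      for a, t, m in zip(row_a, row_t, row_m)])
    return fused
```

With a mask value of 0 the pixel of the first facial image passes through unchanged, which is what keeps the rest of the face identical between the fused image and its training label.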

[0056] In step S230, the first fused image is input into the initial image inpainting model to obtain a predicted image with the target element removed.

[0057] In this embodiment of the disclosure, the first facial image of the first object without the target element and the first fused image of the first object with the target element are used as a pair of training data, with the first facial image serving as the training label for the image restoration model. As a result, when the first fused image with the target element is input into the image restoration model, the output predicted image is basically consistent with the first facial image. That is, the trained image restoration model can automatically remove the target element from the input image.

[0058] For example, image 401 of A's bald head and image 407 of A with hair are used as a pair of training data, with image 401 of A's bald head as the training label. The image restoration model is trained so that, when image 407 of A with hair is input, the output predicted image is basically consistent with image 401 of A's bald head. In other words, the trained image restoration model can automatically remove hair elements from the input image.

[0059] In this embodiment of the disclosure, the initial image inpainting model can be any deep learning model applicable to image inpainting, such as an inpainting model or other types of deep learning models; this disclosure does not limit this. Using inpainting techniques for local image inpainting allows the portion of the input image outside the target region to remain unchanged.

[0060] In an exemplary embodiment, a first fused image can be input into an initial image inpainting model, which performs inpainting processing on the region where the target element is located in the first fused image and outputs a predicted image with the target element removed.

[0061] Specifically, the first fused image of the first object containing the target element is input into the initial image inpainting model. The initial image inpainting model processes the first fused image, fills in the region where the target element is located, and outputs the inpainted predicted image.

[0062] For example, the image 407 of person A with hair is input into the image restoration model 408. The image restoration model 408 completes the hair region in the image 407 of person A with hair and outputs the completed predicted image.

[0063] In step S240, an initial image restoration model is trained based on the predicted image and the first facial image, so that the trained initial image restoration model is determined as the image restoration model.

[0064] In this embodiment of the disclosure, the difference between the predicted image and the first facial image of the first object without the target element can be compared. The model parameters are iteratively optimized by the gradient descent algorithm until the initial image restoration model can correctly generate the first facial image, thus completing the training of the initial image restoration model. The trained initial image restoration model is then used as the image restoration model.
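One way to write the objective described above, assuming a mean squared pixel difference (the disclosure does not name a specific loss). Here $G_\theta$ denotes the initial image restoration model with parameters $\theta$, $x$ the first fused image, $y$ the first facial image, $N$ the number of pixels, and $\eta$ a learning rate; all of these symbols are introduced here for illustration only.

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( G_\theta(x)_i - y_i \bigr)^2,
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta)
```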

[0065] For example, the predicted image after completion is compared with the bald image 401 of A, and the model parameters are iteratively optimized through the gradient descent algorithm until the initial image restoration model can correctly generate the bald image 401 of A.

[0066] In an exemplary embodiment, the model parameters of the initial image restoration model are adjusted according to the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image restoration model and the first facial image meets the preset conditions, thereby completing the training of the initial image restoration model, and the trained initial image restoration model is used as the image restoration model.

[0067] Specifically, a loss function is constructed based on the difference between the predicted image and the first facial image. When the difference does not meet the preset condition (e.g., the difference is too large), the model parameters of the initial image restoration model are adjusted. The adjusted initial image restoration model then performs completion on the first fused image again to obtain an updated predicted image, and the difference between the updated predicted image and the first facial image is checked against the preset condition again. This is repeated until the difference between the generated predicted image and the first facial image meets the preset condition, which completes the training of the initial image restoration model.
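
The adjust-and-recheck loop described above can be sketched as follows. This is only an illustration under stated assumptions: the two-parameter linear "model", the synthetic arrays `fused` and `label`, the mean squared error loss, the learning rate, and the threshold are all hypothetical stand-ins chosen so the example runs on its own. The disclosure does not specify the loss function or the preset condition, and the actual initial image restoration model is a deep learning model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): `fused` plays the role of the first fused
# image, `label` plays the role of the first facial image (the training label).
fused = rng.random((32, 32))
label = 0.5 * fused + 0.2

# Toy two-parameter "model": predicted = a * fused + b.
a, b = 1.0, 0.0
learning_rate = 0.5
threshold = 1e-6   # assumed preset condition: mean squared difference below this
max_steps = 5000   # safety cap so the loop always terminates

for step in range(max_steps):
    predicted = a * fused + b
    diff = predicted - label
    loss = float(np.mean(diff ** 2))
    if loss < threshold:
        break  # the difference meets the preset condition: training is complete
    # Gradient of the mean squared error with respect to a and b.
    grad_a = 2.0 * np.mean(diff * fused)
    grad_b = 2.0 * np.mean(diff)
    # Gradient descent update of the model parameters.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

final_loss = float(np.mean((a * fused + b - label) ** 2))
```

The structure of the loop (predict, compare with the label, stop if the condition is met, otherwise update the parameters) is the part that mirrors the text; everything else is a placeholder.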

[0068] In this embodiment of the disclosure, the model parameters of the initial image restoration model are adjusted according to the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image restoration model and the first facial image meets the preset conditions, thereby improving the accuracy and realism of the trained image restoration model.

[0069] In this embodiment of the disclosure, a first fused image is input to an initial image inpainting model. The initial image inpainting model can predict the semantic segmentation of a first object in the first fused image, so as to obtain a predicted image with the target element removed based on the semantic segmentation of the first object. Predicting the semantic segmentation of the first object in the first fused image may include predicting whether similar points in the first fused image belong to the target part of the first object (e.g., a shoulder), thereby identifying the target part of the first object in the first fused image, making the target part of the first object in the generated predicted image more natural and realistic.

[0070] The image restoration model training method provided in this disclosure includes, on the one hand, obtaining a first fused image of the first object with target elements based on a first facial image of the first object without target elements and a second facial image of the second object with target elements, which can improve the fusion effect of the face of the first object and target elements in the first fused image, making the generated first fused image more realistic; on the other hand, inputting the first fused image of the first object with target elements into an initial image restoration model to obtain a predicted image with target elements removed; using the first facial image of the first object without target elements as a training label, training the initial image restoration model based on the predicted image and the first facial image, so that the trained image restoration model can automatically remove target elements from the input image while keeping other parts unchanged, thereby improving the accuracy and realism of the image restoration model.

[0071] Figure 3 is a flowchart illustrating another method for training an image restoration model according to an exemplary embodiment. Figure 3 shows the specific steps for performing image fusion processing on a first facial image and a second facial image to obtain a first fused image of a first object with target elements.

[0072] In the embodiment of Figure 3, step S220 of the Figure 2 embodiment above may further include the following steps.

[0073] In step S221, the deformation parameters from the second facial image to the first facial image are determined based on the first facial image and the second facial image.

[0074] In this embodiment of the disclosure, the deformation parameter refers to the deformation parameter from the second facial image to the first facial image; the deformation parameter can be determined by the least squares method based on the coordinates of key points in the first and second facial images.

[0075] In an exemplary embodiment, determining the deformation parameters from the second face image to the first face image based on the first face image and the second face image includes: performing key point matching on the first face image and the second face image to obtain a plurality of first key points in the first face image and a plurality of second key points in the second face image, wherein the plurality of first key points and the plurality of second key points correspond one-to-one; and determining the deformation parameters from the second face image to the first face image based on the coordinates of the plurality of first key points and the coordinates of the plurality of second key points.

[0076] Specifically, keypoint matching is performed on the first and second facial images to obtain multiple first keypoints in the first facial image and multiple second keypoints in the second facial image that match the multiple first keypoints one-to-one. The first and second keypoints can be, for example, keypoints for the left eye, right eye, nose, mouth, and eyebrows. Then, the horizontal and vertical coordinates of each first and second keypoint are determined. Based on these coordinates, deformation parameters are determined from the second facial image to the first facial image. These deformation parameters can be the six parameters representing translation, scaling, and rotation in the affine transformation formula. Specifically, the horizontal and vertical coordinates of each first and second keypoint can be substituted into the affine transformation formula, and the deformation parameters can be calculated using the least squares method.
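
As a minimal sketch of the least squares step, assume the affine transformation is written as x' = a·x + b·y + c and y' = d·x + e·y + f, so the six deformation parameters are (a, b, c, d, e, f). The function name `estimate_affine` and the key point coordinates below are hypothetical and chosen only for illustration; the sketch uses NumPy's `np.linalg.lstsq`, which is one possible solver and is not named in the disclosure.

```python
import numpy as np

def estimate_affine(second_pts, first_pts):
    """Least squares fit of a 2x3 affine matrix mapping second_pts onto first_pts."""
    second_pts = np.asarray(second_pts, dtype=float)
    first_pts = np.asarray(first_pts, dtype=float)
    # Each row of the design matrix is (x, y, 1) for one second key point.
    design = np.hstack([second_pts, np.ones((len(second_pts), 1))])
    # Solve design @ params ≈ first_pts in the least squares sense.
    params, *_ = np.linalg.lstsq(design, first_pts, rcond=None)
    # Transpose so the rows are (a, b, c) and (d, e, f).
    return params.T

# Hypothetical key points (x, y) in the second facial image.
second_pts = np.array([[30, 40], [70, 40], [50, 60], [50, 80], [35, 30]], dtype=float)
# Synthesize matching first key points from a known transform to check the fit.
true_M = np.array([[1.1, 0.1, 5.0],
                   [-0.1, 1.1, -3.0]])
first_pts = second_pts @ true_M[:, :2].T + true_M[:, 2]

M = estimate_affine(second_pts, first_pts)
```

With at least three non-collinear key point pairs the system is determined; with more pairs, as is typical for facial landmarks, the least squares solution averages out landmark noise.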

[0077] For example, referring to Figure 4, after the bald head image 401 of A and the hairy head image 402 of B are obtained, key point matching is performed on the bald head image 401 of A and the hairy head image 402 of B to obtain the bald head image 403, which includes multiple key points of A, and the hairy head image 404, which includes multiple key points of B; the deformation parameters are determined based on the coordinates of each key point in the bald head image 403 and the hairy head image 404.

[0078] In this embodiment of the disclosure, key point matching is performed on the first facial image and the second facial image. Based on the coordinates of multiple first key points and multiple second key points, deformation parameters from the second facial image to the first facial image are determined. Since the deformation parameters are the deformation parameters from the second facial image to the first facial image, the target deformed image obtained after deformation processing of the target image corresponding to the second object based on the deformation parameters can fit the first object in the first facial image better.

[0079] In step S222, the second facial image is segmented to obtain the target image of the region where the target element is located.

[0080] The target image of the region where the target element is located refers to an image that includes the outline of the region where the target element is located.

[0081] For example, continuing to refer to Figure 4, the image 402 of B with hair is segmented to obtain the image 405 of the area where the hair is located.

[0082] In step S223, the target image is deformed based on the deformation parameters to obtain the target deformed image.

[0083] In this embodiment of the disclosure, the coordinates of multiple pixels in the target image can be obtained, and the coordinates of the multiple pixels and the deformation parameters can be substituted into the affine transformation formula for calculation to obtain the coordinates of each pixel after deformation; the target deformed image can be obtained based on the coordinates of each pixel after deformation.
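
The coordinate substitution described above can be illustrated with a forward mapping, in which every pixel of the target image is moved to its transformed position. The function name `warp_forward`, the nearest-pixel rounding, and the zero background are assumptions made for this sketch. Forward mapping can leave holes when the transform enlarges the image, so practical implementations often use inverse mapping with interpolation instead.

```python
import numpy as np

def warp_forward(target, M, out_shape):
    """Move each pixel of `target` to the position given by the 2x3 affine matrix M."""
    h, w = target.shape[:2]
    ys, xs = np.indices((h, w))
    # Homogeneous pixel coordinates, one column per pixel: rows are x, y, 1.
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Transformed coordinates, rounded to the nearest pixel.
    new_x, new_y = np.rint(M @ coords).astype(int)
    # Keep only pixels that land inside the output image.
    inside = (new_x >= 0) & (new_x < out_shape[1]) & (new_y >= 0) & (new_y < out_shape[0])
    out = np.zeros(tuple(out_shape) + target.shape[2:], dtype=target.dtype)
    flat = target.reshape(h * w, *target.shape[2:])
    out[new_y[inside], new_x[inside]] = flat[inside]
    return out

# Usage: a single marked pixel shifted 2 pixels right and 1 pixel down.
target = np.zeros((4, 5), dtype=np.uint8)
target[1, 1] = 7
shift = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])
deformed = warp_forward(target, shift, (4, 5))
```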

[0084] In this embodiment of the disclosure, since the deformation parameter is the deformation parameter from the second facial image to the first facial image, after the target image corresponding to the second object is deformed based on the deformation parameter, the obtained target deformed image fits the first object in the first facial image better.

[0085] For example, continuing to refer to Figure 4, the image 405 of the hair region is deformed based on the deformation parameters to obtain the deformed hair region image 406, in which the hair better fits the head shape of A.

[0086] In step S224, the first facial image and the target deformed image are fused to obtain a first fused image of the first object with the target elements.

[0087] In this embodiment of the disclosure, the first facial image and the target deformed image are fused (or combined) and the target deformed image is superimposed on the first facial image to obtain a first fused image of the first object with the target element.
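
A minimal sketch of the superimposition, assuming a binary mask marks which pixels of the target deformed image belong to the target element. The disclosure does not state how the overlay is computed; the hard mask and the function name `fuse` are hypothetical, and a soft (feathered) mask would be an equally plausible choice.

```python
import numpy as np

def fuse(first_face, target_deformed, target_mask):
    """Overlay target-element pixels onto the first facial image using a binary mask."""
    mask = np.asarray(target_mask, dtype=bool)
    if first_face.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over the color channels
    # Inside the mask take the deformed target; elsewhere keep the original face.
    return np.where(mask, target_deformed, first_face)

# Usage with synthetic 4x4 RGB images: the top two rows are the "hair" region.
first_face = np.full((4, 4, 3), 200, dtype=np.uint8)
target_deformed = np.zeros((4, 4, 3), dtype=np.uint8)
target_deformed[:2] = 30
target_mask = np.zeros((4, 4), dtype=bool)
target_mask[:2] = True

first_fused = fuse(first_face, target_deformed, target_mask)
```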

[0088] For example, continuing to refer to Figure 4, the deformed hair region image 406 and the bald head image 401 of A are combined to obtain the image 407 of A with hair.

[0089] In this embodiment, deformation parameters are determined based on a first facial image and a second facial image; the second facial image is segmented to obtain a target image of the region where the target element is located; the target image is deformed based on the deformation parameters to obtain a target deformed image; and the first facial image and the target deformed image are fused to obtain a first fused image of the first object containing the target element. Since the deformation parameters are determined based on the second facial image and the first facial image, the target deformed image obtained by deforming the target image corresponding to the second object fits the first object in the first facial image better. As a result, the target element and the first object fit together better in the first fused image, which improves the accuracy of subsequent image restoration model training and makes the image output by the image restoration model more realistic.

[0090] Figure 5 is a flowchart illustrating another method for training an image restoration model according to an exemplary embodiment.

[0091] Referring to Figure 5, on the basis of the Figure 2 embodiment above, the training method for the image restoration model provided in this disclosure may further include the following steps.

[0092] In this embodiment of the disclosure, a first facial image of a first object can be paired with facial images of multiple other objects having target elements, thereby generating a large amount of pairing data for training an image restoration model.

[0093] The facial images of other objects may include the second facial image of the second object in the above embodiments and the third facial image of the third object described below, and may also include facial images of multiple other objects; in the following examples, the other objects include the second object and the third object as examples. In practical applications, there may be a large number of other objects and facial images of other objects, and this disclosure does not limit this.

[0094] In step S510, a third facial image of the third object is obtained, wherein the third facial image has target elements, and the representation of the target elements in the third facial image is different from that in the second facial image.

[0095] In this embodiment of the disclosure, when the target element is a hair element, the representation of the target element in the third facial image being different from that in the second facial image means that the hairstyle in the third facial image is different from the hairstyle in the second facial image.

[0096] For example, the third facial image of the third object is the facial image of C, where the second object B has long curly hair and the third object C has short straight hair.

[0097] In step S520, the first facial image and the third facial image are subjected to image fusion processing to obtain a second fused image of the first object with the target elements, and the first facial image and the second fused image are used as paired data to train the image restoration model.

[0098] In this embodiment of the present disclosure, the process of "obtaining a second fused image of a first object having target elements based on a first facial image and a third facial image" is similar to the process of "obtaining a first fused image of a first object having target elements based on a first facial image and a second facial image" in step S220 above. The process of "training an image restoration model using the first facial image and the second fused image as paired data" is similar to the process of "training an image restoration model using the first facial image and the first fused image as paired data" in step S230 above. For details, please refer to the textual description of the above embodiments.

[0099] Specifically, the deformation parameters (which can be called the second deformation parameters) for transforming the third facial image into the first facial image can be determined based on the first facial image and the third facial image; the third facial image can be segmented to obtain a target image (which can be called the second target image) of the region where the target element of the third facial image is located; the second target image can be deformed based on the second deformation parameters to obtain a second target deformed image; the first facial image and the second target deformed image can be fused to obtain a second fused image of the first object with the target element; the first facial image and the second fused image can be used as paired data (which can be called the second paired data) to train the image restoration model.

[0100] For example, a small number of bald images and a large number of images with hair can be paired and combined to generate a large amount of reasonable training data, avoiding the problem of insufficient training data caused by a small number of bald images.
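
The growth in training data can be illustrated by enumerating the pairings. The identifiers below are hypothetical; each resulting pair stands for one fused image, generated by the fusion steps described above, together with its bald training label.

```python
from itertools import product

# Hypothetical identifiers: a few images without the target element,
# and a larger pool of images with the target element.
bald_ids = ["A1", "A2"]
hair_ids = ["B", "C", "D"]

# Every bald image can be paired with every image that has hair.
pairs = list(product(bald_ids, hair_ids))
num_pairs = len(pairs)  # len(bald_ids) * len(hair_ids)
```

With n images lacking the target element and m images containing it, up to n × m paired samples can be generated in this way.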

[0101] In this embodiment of the disclosure, the first facial image of a first object that does not have the target element can be paired with facial images of multiple other objects that have the target element (including the second facial image of the second object and the third facial image of the third object), which can generate a large amount of pairing data (including the first pairing data generated based on the first facial image and the second facial image, and the second pairing data generated based on the first facial image and the third facial image) for training the image restoration model. This avoids the problem of insufficient training data due to the limited number of image data that does not have the target element, and improves the accuracy of image restoration model training.

[0102] Figure 6 is a flowchart illustrating an image restoration method according to an exemplary embodiment. Figure 6 illustrates the application process of the image restoration model after it has been trained using the method provided in the above embodiments.

[0103] As shown in Figure 6, the image restoration method provided in this embodiment of the disclosure may include the following steps.

[0104] In step S610, a facial image to be processed containing the target elements is acquired.

[0105] For example, referring to Figure 7, a facial image 701 with hair elements is obtained; a facial image 703 with beard elements is obtained; and a facial image 705 with headscarf elements is obtained.

[0106] In step S620, the facial image to be processed is segmented to obtain the target region where the target element is located.

[0107] For example, the face image 701 containing hair elements is segmented to obtain the area where the hair is located; as another example, the face image 703 containing beard elements is segmented to obtain the area where the beard is located; and as yet another example, the face image 705 containing headscarf elements is segmented to obtain the area where the headscarf is located.

[0108] In step S630, the target area in the face image to be processed is filled with a preset color.

[0109] The preset color can be set according to actual needs; for example, the preset color is gray.

[0110] For example, gray fills the area where the hair is in the face image 701 to be processed; gray fills the area where the beard is in the face image 703 to be processed; and gray fills the area where the headscarf is in the face image 705 to be processed.
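
The filling step can be sketched as a masked assignment. The function name `fill_region` and the RGB value (128, 128, 128) for gray are assumptions; the disclosure only states that the preset color can be, for example, gray.

```python
import numpy as np

def fill_region(image, region_mask, color=(128, 128, 128)):
    """Return a copy of `image` with the masked target region set to a preset color."""
    filled = image.copy()
    filled[np.asarray(region_mask, dtype=bool)] = color
    return filled

# Usage with a synthetic 4x4 RGB image: the top row is the segmented target region.
image = np.zeros((4, 4, 3), dtype=np.uint8)
region_mask = np.zeros((4, 4), dtype=bool)
region_mask[0] = True

filled = fill_region(image, region_mask)
```

Copying the input leaves the original facial image untouched, so the same image can be reused with a different mask or fill color.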

[0111] In step S640, the filled facial image to be processed is input into the image inpainting model trained according to the method provided in any of the above embodiments to obtain a facial image with the target elements removed.

[0112] In this embodiment of the disclosure, the filled facial image to be processed is input into an image inpainting model. The image inpainting model can automatically repair the target region in the facial image to be processed and generate a facial image without the target element.

[0113] For example, the filled facial image 701 is input into the image restoration model to obtain a facial image 702 with hair removed; the filled facial image 703 is input into the image restoration model to obtain a facial image 704 with beard removed; and the filled facial image 705 is input into the image restoration model to obtain a facial image 706 with headscarf removed.

[0114] The image restoration method provided in this disclosure can automatically segment the face image after acquiring the face image containing the target element, obtain the target region where the target element is located, fill the target region in the face image with a preset color, and then automatically restore the target region in the face image to generate a face image without the target element. This method can automatically remove the target element in the input image while keeping other parts unchanged, and automatically restore the target region in the face image to generate a more realistic face image.

[0115] It should also be understood that the above is only to help those skilled in the art better understand the embodiments of this disclosure, and is not intended to limit the scope of the embodiments of this disclosure. Those skilled in the art can obviously make various equivalent modifications or changes based on the examples given above. For example, some steps in the above methods may be unnecessary, or new steps may be added, etc. Alternatively, any combination of any two or more of the above embodiments may be used. Such modifications, changes, or combinations also fall within the scope of the embodiments of this disclosure.

[0116] It should also be understood that the above description of the embodiments of this disclosure focuses on highlighting the differences between the various embodiments. For identical or similar parts that are not mentioned, the embodiments can be referred to each other, and for the sake of brevity, they will not be repeated here.

[0117] It should also be understood that the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this disclosure.

[0118] It should also be understood that, in the various embodiments of this disclosure, unless otherwise specified or in case of logical conflict, the terminology and / or descriptions between different embodiments are consistent and can be referenced by each other, and the technical features in different embodiments can be combined to form new embodiments according to their inherent logical relationships.

[0119] The foregoing has detailed examples of training methods for the image restoration model provided in this disclosure. It is understood that, in order to implement the above functions, the computer device includes corresponding hardware structures and / or software modules for performing each function. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein, this disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this disclosure.

[0120] The following are embodiments of the apparatus disclosed herein, which can be used to execute embodiments of the method disclosed herein. For details not disclosed in the apparatus embodiments of this disclosure, please refer to the embodiments of the method disclosed herein.

[0121] Figure 8 is a block diagram illustrating a training apparatus for an image restoration model according to an exemplary embodiment. Referring to Figure 8, the apparatus 800 may include an acquisition module 810, an obtaining module 820, and a training module 830.

[0122] The acquisition module 810 is configured to acquire a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain the target element and the second facial image contains the target element; the obtaining module 820 is configured to perform image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object containing the target element; the training module 830 is configured to input the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; the training module 830 is further configured to train the initial image restoration model based on the predicted image and the first facial image, so as to determine the trained initial image restoration model as the image restoration model.

[0123] In some exemplary embodiments of this disclosure, the obtaining module 820 is configured to perform: determining deformation parameters from the second facial image to the first facial image based on the first facial image and the second facial image; performing segmentation processing on the second facial image to obtain a target image of the region where the target element is located; performing deformation processing on the target image based on the deformation parameters to obtain a target deformed image; and performing fusion processing on the first facial image and the target deformed image to obtain a first fused image of the first object having the target element.

[0124] In some exemplary embodiments of this disclosure, the obtaining module 820 is configured to perform: key point matching on the first facial image and the second facial image to obtain a plurality of first key points in the first facial image and a plurality of second key points in the second facial image, wherein the plurality of first key points and the plurality of second key points correspond one-to-one; and determine deformation parameters from the second facial image to the first facial image based on the coordinates of the plurality of first key points and the coordinates of the plurality of second key points.

[0125] In some exemplary embodiments of this disclosure, the training module 830 is configured to perform: inputting the first fused image into the initial image restoration model, wherein the initial image restoration model performs completion processing on the region where the target element is located in the first fused image and outputs a predicted image with the target element removed; adjusting the model parameters of the initial image restoration model according to the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image restoration model and the first facial image meets a preset condition, completing the training of the initial image restoration model, and using the trained initial image restoration model as the image restoration model.

[0126] In some exemplary embodiments of this disclosure, the acquisition module 810 is configured to acquire a third facial image of a third object, wherein the third facial image has the target element, and the representation of the target element in the third facial image is different from the representation of the target element in the second facial image; the obtaining module 820 is configured to perform image fusion processing on the first facial image and the third facial image to obtain a second fused image of the first object having the target element, and to use the first facial image and the second fused image as paired data to train the image restoration model.

[0127] Figure 9 is a block diagram illustrating an image restoration apparatus according to an exemplary embodiment. Referring to Figure 9, the apparatus 900 may include an acquisition module 910, an obtaining module 920, and a filling module 930.

[0128] The acquisition module 910 is configured to acquire a facial image to be processed containing a target element; the obtaining module 920 is configured to perform segmentation processing on the facial image to be processed to obtain the target region where the target element is located; the filling module 930 is configured to fill the target region in the facial image to be processed with a preset color; the obtaining module 920 is further configured to input the filled facial image to be processed into an image inpainting model trained by any of the above methods to obtain a facial image with the target element removed.

[0129] It should be noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor terminal devices and / or microcontroller terminal devices.

[0130] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0131] An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to Figure 10. The electronic device 1000 shown in Figure 10 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments disclosed herein.

[0132] As shown in Figure 10, the electronic device 1000 takes the form of a general-purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting different system components (including storage unit 1020 and processing unit 1010), and a display unit 1040.

[0133] The storage unit stores program code, which can be executed by the processing unit 1010 to perform the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of this disclosure. For example, the processing unit 1010 can perform the steps shown in Figure 2.

[0134] For example, the electronic device can implement the steps shown in Figure 2.

[0135] Storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 1021 and / or a cache memory unit 1022, and may further include a read-only memory unit (ROM) 1023.

[0136] Storage unit 1020 may also include a program / utility 1024 having a set of (at least one) program modules 1025. Such program modules 1025 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

[0137] Bus 1030 can represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

[0138] Electronic device 1000 can also communicate with one or more external devices 1070 (e.g., keyboard, pointing device, Bluetooth device, etc.), one or more devices that enable a user to interact with electronic device 1000, and / or any device that enables electronic device 1000 to communicate with one or more other computing devices (e.g., router, modem, etc.). This communication can be performed via input / output (I / O) interface 1050. Furthermore, electronic device 1000 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 1060. As shown, network adapter 1060 communicates with other modules of electronic device 1000 via bus 1030. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0139] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0140] In an exemplary embodiment, a computer-readable storage medium including instructions is also provided, such as a memory including instructions that can be executed by a processor of the device to perform the described method. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

[0141] In an exemplary embodiment, a computer program product is also provided, including a computer program / instructions, which, when executed by a processor, implement the training method of the image restoration model in the above embodiments.

[0142] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.

[0143] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. A training method for an image restoration model, characterized in that the method comprises: acquiring a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain a target element and the second facial image contains the target element; performing image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object having the target element; inputting the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; and training the initial image restoration model based on the predicted image and the first facial image, and determining the trained initial image restoration model as the image restoration model; wherein performing image fusion processing on the first facial image and the second facial image to obtain the first fused image of the first object having the target element comprises: determining, based on the first facial image and the second facial image, deformation parameters for deforming the second facial image to the first facial image; segmenting the second facial image to obtain a target image of the region where the target element is located; deforming the target image based on the deformation parameters to obtain a target deformed image; and fusing the first facial image and the target deformed image to obtain the first fused image of the first object having the target element.

2. The training method for the image restoration model according to claim 1, characterized in that determining, based on the first facial image and the second facial image, the deformation parameters for deforming the second facial image to the first facial image comprises: performing key point matching on the first facial image and the second facial image to obtain multiple first key points in the first facial image and multiple second key points in the second facial image, wherein the multiple first key points and the multiple second key points correspond one-to-one; and determining, based on the coordinates of the multiple first key points and the coordinates of the multiple second key points, the deformation parameters for deforming the second facial image to the first facial image.

3. The training method for the image restoration model according to claim 1, characterized in that inputting the first fused image into the initial image restoration model to obtain the predicted image with the target element removed comprises: inputting the first fused image into the initial image restoration model, which performs completion processing on the region where the target element is located in the first fused image and outputs the predicted image with the target element removed; and training the initial image restoration model based on the predicted image and the first facial image, and determining the trained initial image restoration model as the image restoration model, comprises: adjusting the model parameters of the initial image restoration model based on the difference between the predicted image and the first facial image until the difference between the predicted image generated by the initial image restoration model and the first facial image meets a preset condition, thereby completing the training of the initial image restoration model, and using the trained initial image restoration model as the image restoration model.

4. The training method for the image restoration model according to claim 1, characterized in that the method further comprises: obtaining a third facial image of a third object, wherein the third facial image contains the target element, and the representation of the target element in the third facial image is different from the representation of the target element in the second facial image; and performing image fusion processing on the first facial image and the third facial image to obtain a second fused image of the first object having the target element, wherein the first facial image and the second fused image are used as paired data to train the image restoration model.

5. An image restoration method, characterized in that the method comprises: obtaining a facial image to be processed that contains a target element; segmenting the facial image to be processed to obtain a target region where the target element is located; filling the target region in the facial image to be processed with a preset color; and inputting the filled facial image to be processed into an image restoration model trained by the method according to any one of claims 1-4 to obtain a facial image with the target element removed.

6. A training device for an image restoration model, characterized in that the device comprises: an acquisition module configured to acquire a first facial image of a first object and a second facial image of a second object, wherein the first facial image does not contain a target element and the second facial image contains the target element; an obtaining module configured to perform image fusion processing on the first facial image and the second facial image to obtain a first fused image of the first object having the target element; and a training module configured to input the first fused image into an initial image restoration model to obtain a predicted image with the target element removed; wherein the training module is further configured to train the initial image restoration model based on the predicted image and the first facial image, and to determine the trained initial image restoration model as the image restoration model; and the obtaining module is further configured to perform the following operations: determining, based on the first facial image and the second facial image, deformation parameters for deforming the second facial image to the first facial image; segmenting the second facial image to obtain a target image of the region where the target element is located; deforming the target image based on the deformation parameters to obtain a target deformed image; and fusing the first facial image and the target deformed image to obtain the first fused image of the first object having the target element.

7. An image restoration device, characterized in that the device comprises: an acquisition module configured to acquire a facial image to be processed that contains a target element, the acquisition module being further configured to segment the facial image to be processed to obtain a target region where the target element is located; a filling module configured to fill the target region in the facial image to be processed with a preset color; and an obtaining module configured to input the filled facial image to be processed into an image restoration model trained by the method according to any one of claims 1-4, to obtain a facial image with the target element removed.

8. An electronic device, characterized in that the device comprises: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the executable instructions to implement the training method for the image restoration model according to any one of claims 1 to 4 or the image restoration method according to claim 5.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the training method for the image restoration model according to any one of claims 1 to 4 or the image restoration method according to claim 5.
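
Illustrative sketch (not part of the claims): the fusion steps recited in claims 1 and 2 can be pictured with a small pure-Python example. It assumes the deformation parameters are a 2D affine transform fitted by least squares to the matched key points; the claims do not specify the transform family. It also assumes a binary mask already marks the target-element region. The names `fit_affine` and `fuse`, the grayscale list-of-lists image layout, and the forward-mapping paste are all hypothetical choices made for brevity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) key point coordinates
Image = List[List[int]]              # grayscale image, indexed image[row][col]
Mask = List[List[bool]]              # True where the target element is located
Params = Tuple[List[float], List[float]]  # ((a, b, c), (d, e, f)) affine rows


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve the 3x3 linear system m @ p = rhs with Cramer's rule."""
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("key points are degenerate (e.g. collinear)")
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(_det3(replaced) / det)
    return solution


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Params:
    """Least-squares affine parameters mapping src key points onto dst.

    Assumption: x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 one-to-one matched key points")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T target, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * d[0] for r, d in zip(rows, dst)) for i in range(3)]
    aty = [sum(r[i] * d[1] for r, d in zip(rows, dst)) for i in range(3)]
    return _solve3(ata, atx), _solve3(ata, aty)


def fuse(first: Image, second: Image, mask: Mask, params: Params) -> Image:
    """Warp the masked pixels of `second` and paste them onto a copy of `first`.

    Forward mapping is used for simplicity; it can leave holes when the
    transform enlarges the region, which a real pipeline would have to handle.
    """
    (a, b, c), (d, e, f) = params
    fused = [row[:] for row in first]
    for y, mask_row in enumerate(mask):
        for x, inside in enumerate(mask_row):
            if not inside:
                continue
            u = round(a * x + b * y + c)   # destination column
            v = round(d * x + e * y + f)   # destination row
            if 0 <= v < len(fused) and 0 <= u < len(fused[0]):
                fused[v][u] = second[y][x]
    return fused
```

In this sketch the second image's key points are passed as `src` and the first image's key points as `dst`, so the fitted parameters carry the target element into the first image's coordinate frame. The unmodified first image and the fused result then form one training pair.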

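Illustrative sketch (not part of the claims): the stopping rule of claim 3 and the preprocessing of claim 5 can be outlined as follows. The mean absolute pixel difference, the threshold value, the green fill color, and the helper names `fill_region`, `mean_abs_diff`, `train_until`, and `restore` are assumptions made for illustration; the claims only require "a preset condition" and "a preset color". The model is represented by a plain callable so that no particular framework is implied.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]         # (R, G, B)
Image = List[List[Pixel]]            # indexed image[row][col]
Mask = List[List[bool]]              # True where the target element is located

PRESET_COLOR: Pixel = (0, 255, 0)    # assumed fill color, not given in the source


def fill_region(image: Image, mask: Mask, color: Pixel = PRESET_COLOR) -> Image:
    """Return a copy of `image` with every masked pixel replaced by `color`."""
    return [[color if inside else px for px, inside in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def mean_abs_diff(pred: Image, target: Image) -> float:
    """Mean absolute per-channel difference between two same-sized images."""
    total = 0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += sum(abs(a - b) for a, b in zip(p, t))
            count += len(p)
    return total / count


def train_until(step: Callable[[], float], threshold: float,
                max_steps: int = 1000) -> int:
    """Call `step` (one parameter update that returns the current difference)
    until the difference is at or below `threshold`; return the step count."""
    for n in range(1, max_steps + 1):
        if step() <= threshold:
            return n
    raise RuntimeError("difference did not meet the preset condition")


def restore(image: Image, mask: Mask, model: Callable[[Image], Image]) -> Image:
    """Fill the target region with the preset color, then run the model."""
    return model(fill_region(image, mask))
```

Here `step` would wrap one optimizer update on a fused-image and first-image pair and return `mean_abs_diff(predicted, first_image)`. Any other difference measure could be substituted without changing the control flow.
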
Citation Information

Patent Citations

Information extraction method and device, equipment and medium
CN114782696A
