Image processing device, image processing method, and program

The image processing device enhances image synthesis by determining the front-to-back relationship between clothing and human body parts, addressing issues of unnaturalness and accuracy in existing techniques, resulting in high-quality composite images without manual intervention.

JP7880759B2 · Active · Publication Date: 2026-06-26 · NTT DATA JAPAN CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: NTT DATA JAPAN CORP
Filing Date: 2022-07-08
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing image synthesis techniques face challenges in creating a large number of images with high accuracy and naturalness, particularly when combining 3D clothing data with 2D human images, often requiring manual corrections due to issues like invisible clothing areas and unnatural back-of-the-neck lining.

Method used

An image processing device that generates 3D clothing avatars, determines the front-to-back relationship between clothing and human body parts using edge information, and synthesizes 2D clothing images with 2D human data, incorporating shadow images to enhance naturalness and accuracy.

Benefits of technology

Reduces the need for manual corrections by improving the naturalness and accuracy of composite images, ensuring that clothing areas are accurately positioned and visible, while eliminating unnatural artifacts.



Abstract

PROBLEM TO BE SOLVED: To eliminate the need for manual correction by reducing unnaturalness in a composite image. SOLUTION: An image processing device divides an image of the entire face and hands of a model image for synthesis into regions enclosed by edges, based on edge information of a mask image of the face and hands of the model image for synthesis and edge information of a clothing image for synthesis; calculates, for each divided region, the agreement rate of corresponding parts between a second 3D clothing avatar and the divided image of the entire face and hands; and generates a face and hand image for synthesis in which the front-to-back relationship between the clothing and the human body has been determined. The image processing device generates a shadow image by rendering a first 3D clothing avatar with the setting data associated with the model image for synthesis reflected, superimposes the clothing image for synthesis on the model image for synthesis to output a first clothing model image, superimposes the face and hand image for synthesis on the first clothing model image, further superimposes the shadow image, and generates a final clothing model image. SELECTED DRAWING: Figure 2


Description

Technical Field

[0001] The present invention relates to an image processing apparatus, an image processing method, and a program. More specifically, the present invention relates to an image processing apparatus, an image processing method, and a program that dress a 3D avatar with 3D clothing data created by a user in a 3D CG environment, convert only the clothing image into 2D data therefrom, automatically synthesize the 2D data of the clothing image and the 2D data of a human, and output the synthesized image as a 2D data wearing image.

Background Art

[0002] Conventionally, in the apparel industry, advertising industry, etc., a real model has been dressed in clothes and photographed, and the image has been used for promotion and marketing. However, such a method has a high human, financial, and time load when generating one image and is not suitable for generating many images.

[0003]

[0004] Therefore, technologies such as 3D avatars and GANs (Generative Adversarial Networks) are being used. An avatar is a character used as a user's alter ego on the network, and there are 2D (two-dimensional) image avatars and 3D (three-dimensional) avatars. A GAN is a method of generating images by training two neural networks. With GAN technology, a computer can generate an endless number of similar images as if it had photographed a real object.

Prior Art Documents

Patent Documents

[0005] [Patent Document 1] Japanese Patent Publication No. 2011-186774

Summary of the Invention

Problems to Be Solved by the Invention

[0006] Because there are limitations to the number of images that can be created using methods that rely on human hands for image synthesis, many companies are researching image synthesis techniques that automatically dress human images in 3D clothing data.

[0007] The technique of Patent Document 1 is known as an image synthesis technique that combines clothing data with mannequin images. However, even with this technique, unnaturalness remains in the image obtained by combining a clothing image with a model image. For example, when an image of a human body from the neck up is combined with an image of clothing, the lining at the back of the neck is displayed.

[0008] Furthermore, when processing 3D clothing data to automatically dress a 2D image of a person, conventional technologies frequently encountered accuracy problems, such as the clothing area that should be on top of the person becoming invisible. This presented technical challenges, requiring manual corrections by humans in the end.

[0009] The present invention was made to solve these problems, and aims to provide an image processing device, an image processing method, and a program that dress a 3D avatar in a 3DCG environment using 3D clothing data created by a user, convert only the clothing image into 2D data, automatically synthesize the 2D clothing image data with 2D human data, and output a 2D image of the clothing being worn.

Means for Solving the Problem

[0010] An image processing apparatus according to one aspect of the present invention is configured to include the following:
a means for generating a first 3D clothing avatar, a second 3D clothing avatar, and a clothing image for synthesis using setting data, a 3D avatar, and 3D clothing data associated with a model image for synthesis;
a means for generating a human body part image for synthesis in which the front-to-back relationship between the clothing and the human body has been determined, by dividing the image of the entire human body part of the model image for synthesis into regions enclosed by edges, based on the edge information of the mask image of the human body part associated with the model image for synthesis and the edge information of the clothing image for synthesis, and by calculating, for each region, the agreement rate of the corresponding parts between the second 3D clothing avatar and the divided image of the entire human body part;
a means for generating, as a shadow image, the shadow produced when rendering is performed on the first 3D clothing avatar with the setting data associated with the model image for synthesis reflected;
a means for outputting a first clothing model image by superimposing the clothing image for synthesis onto the model image for synthesis; and
a means for generating a final clothing model image by superimposing the human body part image for synthesis (the image of the exposed human body parts) onto the first clothing model image, and further superimposing the shadow image.

Effects of the Invention

[0011] According to the present invention, by using two types of edge images, 3D image data, and a model image, the unnaturalness that occurred in conventional composite images is reduced, eliminating the need for manual correction.

Brief Description of the Drawings

[0012] A detailed understanding of the embodiments disclosed herein can be obtained from the following description illustrated in relation to the accompanying drawings.
[Figure 1] An overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention.
[Figure 2] A diagram for explaining an outline of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to an embodiment of the present invention.
[Figure 3] A system configuration diagram of the image processing device 10 according to an embodiment of the present invention.
[Figure 4] A diagram showing an example of a data structure of live-action shooting data 106 according to an embodiment of the present invention.
[Figure 5] A diagram showing an example of a data structure of a 3D avatar 108 according to an embodiment of the present invention.
[Figure 6] A diagram showing an example of a data structure of a 3D clothing avatar 109 according to an embodiment of the present invention.
[Figure 7] A diagram showing an example of a data structure of a composite clothing image 110 according to an embodiment of the present invention.
[Figure 8] A diagram showing a processing flow in which the image processing device 10 generates a 3D clothing avatar and a composite clothing image.
[Figure 9] A diagram showing a processing flow in which the image processing device 10 generates a face and hand image for synthesis for which the front-to-back relationship between the clothing and the human body has been determined.
[Figure 10] A diagram showing a processing flow in which the image processing device 10 generates a shadow image.
[Figure 11] A diagram showing a processing flow in which the image processing device 10 generates a final clothing model image.
[Figure 12] (a) is a diagram showing images for extracting edge information from a face and hand mask image and a composite clothing image, and (b) is a diagram showing an image obtained by combining the two pieces of edge information.
[Figure 13] (a) is a diagram showing an example in which the hand portion of a model image for synthesis is divided into several regions surrounded by edges, and (b) is a diagram comparing the hand image in the generated face and hand image for synthesis between the prior art and the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

[0013] (Overall Configuration) FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing apparatus 10, a user terminal 11, a 3D scanner 12, and an imaging apparatus 13 according to an embodiment of the present invention. The image processing apparatus 10 is communicably connected to the user terminal 11 and the imaging apparatus 13 via a network 14. The user terminal 11 is communicably connected to the 3D scanner 12 via an arbitrary network such as a LAN or a WAN. Although FIG. 1 shows only one image processing apparatus 10, one user terminal 11, one 3D scanner 12, and one imaging apparatus 13 for simplicity of explanation, a plurality of these may exist.

[0014] In this specification, the image processing apparatus 10 is described as one apparatus or system, but various processes executed by the image processing apparatus 10 may be configured to be distributed and executed by a plurality of apparatuses or systems.

[0015] The image processing apparatus 10 executes the image synthesis process described in this specification. More specifically, the image processing apparatus 10 dresses a 3D avatar with 3D clothing data created by a user in a 3DCG environment, converts only the clothing image into 2D data therefrom, automatically synthesizes the 2D data of the clothing image and the 2D data of a human, and outputs a worn image of the 2D data.

[0016] The user terminal 11 can be any type of device (for example, a PC, a tablet terminal, etc.) that can operate in a wired or wireless environment used by the user, but is not limited to a specific apparatus or device. The user terminal 11 can generate 3D clothing data using a third-party application and generate 3D avatar data using a 3D scanner 12 or the like. The user terminal 11 can transmit the 3D clothing data and the 3D avatar data to the image processing apparatus 10, transmit various instructions regarding the image synthesis process via an application provided by the image processing apparatus 10, and receive a synthesis result from the image processing apparatus 10.

[0017] The 3D scanner 12 is a device that has the function of generating 3D avatar data in response to instructions from the user terminal 11.

[0018] The imaging device 13 is a device for taking live-action photographs of a real model, and is a device that captures images of the model using one or more cameras, and may include any studio equipment. In order to make the captured image data of the model easier to identify, the floor, walls, etc. of the imaging location may be any background such as a blue screen or a green screen.

[0019] Network 14 is any communication network responsible for communication between the image processing device 10, the user terminal 11, the 3D scanner 12, and the imaging device 13, and includes, but is not limited to, the Internet, an intranet, a dedicated line, or any network system.

[0020] (Functional configuration of the image processing device 10) The image processing device 10 uses images of real models and various setting data to dress a 3D avatar in 3DCG using 3D clothing data created by the user, then converts only the clothing image into 2D data, automatically combines the 2D clothing image data with 2D human data, and outputs a 2D image of the avatar wearing the clothing.

[0021] The various functions provided by the image processing device 10 will be described below with reference to Figure 2. Figure 2 is a diagram illustrating the overview of the processing performed by the image processing device 10, user terminal 11, and imaging device 13 according to an embodiment of the present invention. In Figure 2, S1 is performed by the imaging device 13, S2 is performed by the user terminal 11, and S3 to S6 are performed by the image processing device 10. Although the embodiments described herein use tops (upper garments) as an example of clothing, it should be understood that the present invention is also applicable to other types of clothing (for example, bottoms).

[0022] (S1: Processing details when imaging a real model using the imaging device 13) The user uses the imaging device 13 to photograph a real-world model. The model image of the real-world model is used as the composite model image in the image synthesis process described later. The composite model image is 2D image data.

[0023] The imaging device 13 transmits the composite model image, along with the camera setting data (camera angle, distance, etc.) and lighting setting data (brightness, etc.) from the time of imaging, to the image processing device 10. The image processing device 10 stores the composite model image, camera setting data, and lighting setting data received from the imaging device 13 in the live-action shooting data 106.

[0024] (S2: Processing details of 3D clothing data and 3D avatar generation by user terminal 11) The user terminal 11 uses a third-party application to generate 3D clothing data, and communicates with the 3D scanner 12 to generate a 3D avatar in the same pose as the real-world model. The 3D avatar's pose can be the same as the real-world model's at this stage, or it can be set to a basic pose and changed to the same pose during the S3 processing described later.

[0025] The user terminal 11 transmits 3D clothing data and a 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in 3D clothing data 107 and 3D avatar 108, respectively.

[0026] (S3: Process for generating 3D clothing avatars and clothing images for synthesis) The image processing device 10 provides the user terminal 11 with an application for generating 3D clothing avatars and clothing images for synthesis. In response to instructions from the user terminal 11, the image processing device 10 outputs a 2D image of only the clothing (clothing image for synthesis) and two types of 3D clothing avatars for use in the synthesis process.

[0027] The image processing device 10 reads 3D clothing data from 3D clothing data 107 and reads a 3D avatar from 3D avatar 108 in response to instructions from the user terminal 11. The image processing device 10 can also read a model image from live-action shooting data 106 in response to instructions from the user terminal 11 and change the pose of the 3D avatar to the same pose as the read model image. The image processing device 10 overlays the 3D clothing data onto the 3D avatar in 3DCG (computer graphics) space in response to instructions from the user terminal 11, performs predetermined position calculations to adjust the size and position of the 3D clothing data, and places the 3D clothing data in the appropriate position on the 3D avatar. Through this process, the 3D clothing data is made to appear as clothing on the 3D avatar.

[0028] In response to instructions from the user terminal 11, the image processing device 10 performs a cloth simulation of the 3D clothing data in the 3DCG space on the 3D avatar wearing the 3D clothing data, according to the body shape and pose of the 3D avatar, and stores the result as the first 3D clothing avatar in the 3D clothing avatar 109.

[0029] The image processing device 10 applies camera setting data and lighting setting data read from live-action shooting data 106 to the first 3D clothed avatar in the 3DCG space, performs 3D rendering processing using predetermined shader setting parameters, and stores the second 3D clothed avatar in the 3D clothed avatar 109.

[0030] The image processing device 10 generates a 2D clothing image (also called a "composite clothing image") based on the 3D clothing data excluding the 3D avatar, and stores it in the composite clothing image 110.

[0031] (S4: Generation of face and hand images for synthesis) The user generates face and hand mask images from the composite model image mechanically or manually using any application, such as by binarization. That is, the image processing device 10 generates face and hand mask images from the composite model image received from the imaging device 13 in response to a mask image generation instruction from the user terminal 11. The image processing device 10 extracts edge information of the pre-generated face and hand mask images using any filter. The embodiments described are explained using a top as an example of the type of clothing, so the parts of the human body that are exposed when worn (exposed body parts) are the face and / or hands, but please note that if other types of clothing are worn, the exposed body parts may change depending on the type (for example, in the case of bottoms, the feet and / or ankles).

[0032] The image processing device 10 extracts keypoints of the face and hands from the model image for synthesis and sets a search area using the extracted keypoints as bounding boxes. The image processing device 10 searches for edges while moving from one side of the 3D face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts images of the entire face and hands from the range of edges searched.

[0033] The image processing device 10 performs a depth information extraction process to extract depth information from the first 3D clothing avatar.

[0034] The image processing device 10 extracts edge information from the garment image for synthesis using an arbitrary filter. If the image processing device 10 were to extract the edges of the garment image for synthesis directly, wrinkles and other textures would become noise in the edges, so it can extract edge information using depth information.
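As an illustration of paragraph [0034], the following minimal Python sketch shows one way edge information could be taken from depth information rather than from garment color, so that wrinkles and texture do not register as edges. The specification does not name the filter; the function name `depth_edges`, the neighbor-difference test, and the `threshold` value are assumptions made purely for illustration.

```python
from typing import List

Grid = List[List[float]]


def depth_edges(depth: Grid, threshold: float) -> List[List[bool]]:
    """Mark a pixel as an edge when depth jumps by more than `threshold`
    between it and its right or lower neighbor (hypothetical rule)."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            here = depth[y][x]
            if x + 1 < w and abs(here - depth[y][x + 1]) > threshold:
                edges[y][x] = True
            if y + 1 < h and abs(here - depth[y + 1][x]) > threshold:
                edges[y][x] = True
    return edges


# Toy depth map: a garment surface with wrinkle-scale ripples (about 1.0)
# in front of a distant background (5.0). Values are made up.
depth = [
    [5.0, 5.0, 5.0, 5.0],
    [5.0, 1.0, 1.1, 5.0],
    [5.0, 1.1, 1.0, 5.0],
    [5.0, 5.0, 5.0, 5.0],
]
edges = depth_edges(depth, threshold=0.5)
```

In this toy input the small ripples inside the garment stay below the threshold, so only the garment outline is marked, which is the behavior the paragraph describes as the motivation for using depth information.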

[0035] The image processing device 10 combines the edge information of the face and hand mask images with the edge information of the garment image for synthesis, and divides the entire face and hands of the model image for synthesis into several images based on the combined image. For each divided region of the entire face and hands of the model image for synthesis, the image processing device 10 calculates the degree of agreement with the corresponding region in the second 3D clothing avatar, and finally extracts the face and hand images (face and hand images for synthesis) to be synthesized.
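The region division and agreement-rate calculation of paragraph [0035] can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the two edge maps are combined with a logical OR, regions are found by a 4-connected flood fill, and the "agreement rate" is taken to be the fraction of a region's pixels at which a reference mask derived from the second 3D clothing avatar (here the hypothetical `body_in_front`) shows the body in front of the clothing. The function names and the 0.5 threshold are likewise assumptions.

```python
from collections import deque
from typing import List, Set, Tuple

Mask = List[List[bool]]
Pixel = Tuple[int, int]  # (row, column)


def combine_edges(mask_edges: Mask, clothing_edges: Mask) -> Mask:
    """A pixel is an edge if either edge map marks it."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_edges, clothing_edges)]


def split_regions(body: Mask, edges: Mask) -> List[Set[Pixel]]:
    """Group non-edge body pixels into 4-connected regions."""
    h, w = len(body), len(body[0])
    seen: Set[Pixel] = set()
    regions: List[Set[Pixel]] = []
    for y in range(h):
        for x in range(w):
            if (y, x) in seen or not body[y][x] or edges[y][x]:
                continue
            region: Set[Pixel] = set()
            queue = deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                region.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and (ny, nx) not in seen and body[ny][nx] and not edges[ny][nx]:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def agreement_rate(region: Set[Pixel], body_in_front: Mask) -> float:
    """Fraction of region pixels where the reference shows the body in front."""
    return sum(1 for y, x in region if body_in_front[y][x]) / len(region)


def select_front_regions(regions: List[Set[Pixel]], body_in_front: Mask,
                         threshold: float = 0.5) -> List[Set[Pixel]]:
    """Keep regions whose agreement rate reaches the (assumed) threshold."""
    return [r for r in regions if agreement_rate(r, body_in_front) >= threshold]


# Toy 1x5 strip of a hand: a cuff edge at column 2 separates the exposed
# fingers (columns 0-1) from the part hidden under the sleeve (columns 3-4).
body = [[True, True, True, True, True]]
mask_edges = [[False, False, False, False, False]]
clothing_edges = [[False, False, True, False, False]]
body_in_front = [[True, True, True, False, False]]

edges = combine_edges(mask_edges, clothing_edges)
regions = split_regions(body, edges)
rates = [agreement_rate(r, body_in_front) for r in regions]
kept = select_front_regions(regions, body_in_front)
```

Deciding per region rather than per pixel means a whole finger or cuff-side patch is kept or dropped together, which is one plausible reason for dividing the image along the combined edges before comparing it with the rendered avatar.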

[0036] (S5: Processing to generate a shadow image) The image processing device 10 generates, as a shadow image, the shadow information produced when rendering is performed on the first 3D clothing avatar read from the 3D clothing avatar 109 with the camera setting data and lighting setting data used when generating the composite model image reflected, and stores the shadow image in the 3D clothing avatar 109.

[0037] (S6: Clothing synthesis process to generate the final clothing image) The image processing device 10 performs a first clothing synthesis process, which outputs a first clothing model image by superimposing a clothing image for synthesis onto a model image for synthesis. The image processing device 10 then performs a second clothing image generation process, which generates a final clothing model image by superimposing a face and hand image for synthesis onto the first clothing model image, and further superimposing a shadow image onto it.
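The layering order described in paragraph [0037] can be illustrated with a short Python sketch. The specification states only that the images are superimposed in this order; the use of ordinary alpha blending, the representation of the shadow as semi-transparent black, and the names `over` and `compose_final` are assumptions for illustration.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def over(base: List[List[RGB]], layer: List[List[RGBA]]) -> List[List[RGB]]:
    """Blend an RGBA layer over an opaque RGB base (standard alpha blending)."""
    out = []
    for base_row, layer_row in zip(base, layer):
        row = []
        for (br, bg, bb), (lr, lg, lb, la) in zip(base_row, layer_row):
            row.append((lr * la + br * (1 - la),
                        lg * la + bg * (1 - la),
                        lb * la + bb * (1 - la)))
        out.append(row)
    return out


def compose_final(model, clothing, face_hand, shadow):
    """Apply the three superimposition steps in the order given in the text."""
    first_clothing_model = over(model, clothing)        # first clothing synthesis
    with_body = over(first_clothing_model, face_hand)   # face and hands back on top
    return over(with_body, shadow)                      # shadow image last


# Toy 1x2 image with made-up colors: skin everywhere in the model image,
# opaque blue clothing on both pixels, a hand in front at pixel 0,
# and a half-strength shadow at pixel 1.
model = [[(0.8, 0.6, 0.5), (0.8, 0.6, 0.5)]]
clothing = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
face_hand = [[(0.8, 0.6, 0.5, 1.0), (0.0, 0.0, 0.0, 0.0)]]
shadow = [[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.5)]]

final = compose_final(model, clothing, face_hand, shadow)
```

Pixel 0 ends up showing the hand because the face and hand layer is applied after the clothing, while pixel 1 shows darkened clothing because the shadow is applied last.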

[0038] (System configuration of the image processing device 10) Next, the system configuration of the image processing device 10 will be described. Figure 3 is a system configuration diagram of the image processing device 10 according to an embodiment of the present invention. The image processing device 10 may be located on a cloud system or on an internal network. As shown in Figure 3, the image processing device 10 includes a control unit 101, a main memory unit 102, an auxiliary storage unit 103, an interface (IF) unit 104, and an output unit 105, which are interconnected by a bus 120 or the like, as in a general computer. The auxiliary storage unit 103 stores programs that implement each function of the image processing device 10, and data handled by those programs. The auxiliary storage unit 103 includes, in the form of files or databases, live-action shooting data 106, 3D clothing data 107, 3D avatar 108, 3D clothing avatar 109, and composite clothing image 110. The image processing device 10 can read or update information stored in the live-action shooting data 106, 3D clothing data 107, 3D avatar 108, 3D clothing avatar 109, and composite clothing image 110. Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.

[0039] The control unit 101, also known as the central processing unit (CPU), controls each component of the image processing device 10 and performs data calculations. It also reads various programs stored in the auxiliary storage unit 103 into the main memory unit 102 and executes them. The main memory unit 102, also known as the main memory, stores various received data, computer-executable instructions, and data after calculations performed by those instructions. The auxiliary storage unit 103 is a storage device such as a hard disk drive (HDD) or solid state drive (SSD) that stores data and programs for the long term.

[0040] Figure 3 describes an embodiment in which the control unit 101, main memory unit 102, and auxiliary storage unit 103 are located inside the same computer. However, in other embodiments, the image processing device 10 can be configured to achieve parallel distributed processing across multiple computers by using multiple control units 101, main memory units 102, and auxiliary storage units 103. In another embodiment, it is also possible to set up multiple servers for the image processing device 10 and have the servers share a single auxiliary storage unit 103.

[0041] The IF unit 104 acts as an interface for sending and receiving data with other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator. The output unit 105 provides a display screen for displaying the processed data and printing means for printing the data.

[0042] Components similar to the control unit 101, main memory unit 102, auxiliary storage unit 103, IF unit 104, and output unit 105 are also present in the user terminal 11 and imaging device 13.

[0043] The live-action shooting data 106 stores model images (2D image data) of a real model, mask images of the real model's face and hands, and camera setting data and lighting setting data used during live-action shooting. Figure 4 shows an example of the data structure of the live-action shooting data 106 according to an embodiment of the present invention. The live-action shooting data 106 may include, but is not limited to, a live-action shooting ID 401, model image 402, mask image 403, camera setting data 404, and lighting setting data 405, and may also include other data items.

[0044] The live-action shooting ID 401 is an identifier that identifies the model and the data associated with that model during live-action shooting. The model image 402 is 2D model image data of the real model, also called the "model image for compositing". The mask image 403 is a mask image of the model's face and hands generated from the model image for compositing. The camera setting data 404 indicates the camera setting data during live-action shooting, such as camera angle and distance. The lighting setting data 405 indicates the lighting setting data during live-action shooting, such as brightness.
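For readers who prefer to see the record layout of Figure 4 as code, the following Python sketch models one entry of the live-action shooting data 106. The field names, the choice to store images as file paths, and the units (degrees, meters) are assumptions; the specification lists only the data items 401 to 405 and gives camera angle, distance, and brightness as examples.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CameraSettings:
    angle_deg: float    # camera angle (unit assumed)
    distance_m: float   # camera-to-model distance (unit assumed)


@dataclass
class LightingSettings:
    brightness: float   # lighting brightness (scale assumed)


@dataclass
class LiveActionShootingRecord:
    shooting_id: str              # live-action shooting ID 401
    model_image_path: str         # model image 402 (model image for compositing)
    mask_image_path: str          # mask image 403 (face and hands)
    camera: CameraSettings        # camera setting data 404
    lighting: LightingSettings    # lighting setting data 405


# A hypothetical in-memory store keyed by the live-action shooting ID.
store: Dict[str, LiveActionShootingRecord] = {}

record = LiveActionShootingRecord(
    shooting_id="shoot-001",
    model_image_path="model_001.png",
    mask_image_path="mask_001.png",
    camera=CameraSettings(angle_deg=0.0, distance_m=3.0),
    lighting=LightingSettings(brightness=0.8),
)
store[record.shooting_id] = record

# Other records (3D avatar, 3D clothing avatar) hold the same ID, so the
# camera and lighting settings can be looked up when rendering.
settings = store["shoot-001"].camera
```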

[0045] Returning to Figure 3, the 3D clothing data 107 stores user-generated 3D clothing data. The 3D clothing data may be stored associated with attribute information (e.g., clothing category, color, shape, etc.) to facilitate image selection.

[0046] The 3D avatar 108 stores data for a 3D avatar generated by the user. The 3D avatar is created by the user so that it is in the same pose as the model image used during live-action shooting. Figure 5 shows an example of the data structure of the 3D avatar 108 according to an embodiment of the present invention. The 3D avatar 108 may include, but is not limited to, a 3D avatar ID 501, 3D avatar data 502, and a live-action shooting ID 401, and may also include other data items.

[0047] The 3D avatar ID 501 is an identifier that identifies the 3D avatar. The 3D avatar data 502 indicates the data of the 3D avatar. The live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D avatar. The live-action shooting ID 401 associates the pose, mask image, and camera and lighting setting data of the corresponding live-action model with the 3D avatar.

[0048] Returning to Figure 3, the 3D clothing avatar 109 stores image data of the 3D clothing avatar obtained by overlaying 3D clothing data onto the 3D avatar and performing a predetermined process. Figure 6 shows an example of the data structure of the 3D clothing avatar 109 according to an embodiment of the present invention. The 3D clothing avatar 109 may include a 3D clothing avatar ID 601, a first 3D clothing avatar 602, a second 3D clothing avatar 603, shadow information 604, shadow image 605, a 3D avatar ID 501, and a live-action shooting ID 401, but is not limited to these data items and may include other data items as well.

[0049] 3D clothing avatar ID 601 is an identifier that identifies the 3D clothing avatar generated by the image processing device 10. The first 3D clothing avatar 602 shows image data of a 3D clothing avatar that has undergone 3D cloth simulation. The second 3D clothing avatar 603 shows image data of a 3D clothing avatar that has undergone 3D rendering processing by reflecting camera setting data and lighting setting data to the first 3D clothing avatar. Shadow information 604 and shadow image 605 show shadow information and shadow image, respectively, that are generated when rendering is performed by reflecting the camera setting data and lighting setting data used when generating a composite model image for the first 3D clothing avatar. 3D avatar ID 501 is an identifier for identifying the 3D avatar from which the 3D clothing avatar was generated, and live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D avatar. 3D avatar ID 501 and live-action shooting ID 401 make it easier to obtain various setting data from real-world model shooting.

[0050] Returning to Figure 3, the composite clothing image 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothing avatar. Figure 7 shows an example of the data structure of the composite clothing image 110 according to an embodiment of the present invention. The composite clothing image 110 may include, but is not limited to, a composite clothing image ID 701, a composite clothing image 702, a live-action shooting ID 401, and a 3D clothing avatar ID 601, and may include other data items as well.

[0051] The composite clothing image ID 701 is an identifier that identifies the 2D clothing image data used in the image synthesis process according to the embodiment of the present invention. The composite clothing image 702 indicates the 2D clothing image data used in the image synthesis process. The live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D clothing avatar from which the composite clothing image was generated. The 3D clothing avatar ID 601 is an identifier of the 3D clothing avatar associated with the 3D clothing data that is the source data for the composite clothing image.

[0052] (Explanation of various flows) Referring to Figures 8 to 11, the processing flow of the image processing device 10 generating the final clothing model image using the clothing image for synthesis (2D), the model image for synthesis (2D), various setting data, the 3D avatar, and the 3D clothing data will be explained. Figures 8 to 11 show the processing contents of S3 to S6 in Figure 2, respectively. Processing S4 and S5 may be performed in any order.

[0053] (S3: Process for generating 3D clothing avatars and clothing images for synthesis) Figure 8 shows the processing flow in which the image processing device 10 generates a 3D clothing avatar and a composite clothing image using the data generated by the processing of S1 and S2 described above with reference to Figure 2, namely the composite model image, various setting data, 3D avatar, and 3D clothing data.

[0054] In the process shown in Figure 8, the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothing avatar image and performs processing based on user instructions received via the user terminal 11.

[0055] In S801, the user terminal 11 selects the composite model image, 3D clothing data, and 3D avatar to be processed through the provided application and sends a selection instruction to the image processing device 10. In response to the selection instruction from the user terminal 11, the image processing device 10 reads the selected model image from the live-action shooting data 106, reads the 3D clothing data from the 3D clothing data 107, and reads the selected 3D avatar from the 3D avatar 108. The image processing device 10 can change the pose of the 3D avatar to match the pose of the read model image. Through this process, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar become associated, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.

[0056] The user terminal 11 sends a placement instruction to the image processing device 10 to position 3D clothing data in the appropriate location on the 3D avatar within the application's 3DCG space. In response to the placement instruction from the user terminal 11, the image processing device 10 overlays the 3D clothing data onto the 3D avatar in the 3DCG space, performs a predetermined position calculation, adjusts the size and position of the 3D clothing data, and positions the 3D clothing data in the appropriate location on the 3D avatar. Through this process, the 3D clothing data is "dressed" on the 3D avatar.

[0057] In S802, the user terminal 11 sends a simulation instruction to the image processing device 10 to perform a cloth simulation on a 3D avatar wearing 3D clothing data. Cloth simulation refers to a technology that physically simulates the movement of cloth such as clothing. For example, it performs physical calculations of cloth, such as simulating the state of wrinkles in clothing that form when a 3D avatar wears it.

[0058] In response to the simulation instruction from the user terminal 11, the image processing device 10 performs a cloth simulation of the 3D clothing data worn by the 3D avatar in the 3DCG space, according to the body shape and pose of the 3D avatar, and stores the cloth-simulated 3D clothing avatar in the first 3D clothing avatar 602 of the 3D clothing avatar 109.
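As an illustration of what a cloth simulation computes, the following is a minimal sketch of a generic particle-based technique (Verlet integration with distance constraints) applied to a single hanging strip. It is not the simulator used by the image processing device 10; the function name `simulate_strip` and all parameters are assumptions introduced only for this example.

```python
def simulate_strip(n=5, rest=1.0, steps=200, dt=0.02, gravity=-9.8,
                   damping=0.99, iters=20):
    """Simulate a strip of n particles pinned at particle 0 (illustrative only)."""
    # Particles start in a horizontal row; each entry is [x, y].
    pos = [[i * rest, 0.0] for i in range(n)]
    prev = [p[:] for p in pos]
    for _ in range(steps):
        # Verlet step for every free particle (particle 0 stays pinned).
        for i in range(1, n):
            x, y = pos[i]
            px, py = prev[i]
            prev[i] = [x, y]
            pos[i] = [x + (x - px) * damping,
                      y + (y - py) * damping + gravity * dt * dt]
        # Relax distance constraints so neighbours stay `rest` apart.
        for _ in range(iters):
            for i in range(n - 1):
                ax, ay = pos[i]
                bx, by = pos[i + 1]
                dx, dy = bx - ax, by - ay
                dist = (dx * dx + dy * dy) ** 0.5
                if dist == 0:
                    continue
                corr = (dist - rest) / dist
                if i == 0:
                    # The pinned particle does not move; correct only its neighbour.
                    pos[i + 1] = [bx - dx * corr, by - dy * corr]
                else:
                    pos[i] = [ax + dx * corr * 0.5, ay + dy * corr * 0.5]
                    pos[i + 1] = [bx - dx * corr * 0.5, by - dy * corr * 0.5]
    return pos


if __name__ == "__main__":
    for x, y in simulate_strip():
        print(round(x, 3), round(y, 3))
```

A production cloth simulator would additionally handle a 2D mesh, collisions with the 3D avatar, and fabric parameters, none of which are modeled in this sketch.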

[0059] In S803, the image processing device 10 reads camera setting data and lighting setting data associated with the model image selected in S801 from the live-action shooting data 106. The image processing device 10 applies the read camera setting data and lighting setting data to the cloth-simulated 3D clothing avatar (first 3D clothing avatar) in the application's 3DCG space, performs 3D rendering using predetermined shader setting parameters, and stores the rendered 3D clothing avatar (second 3D clothing avatar) in the second 3D clothing avatar 603 of the 3D clothing avatar 109.

[0060] In step S804, the image processing device 10 extracts 3D clothing data by removing the 3D avatar from the 3D rendered 3D clothing avatar (second 3D clothing avatar). Based on the extracted 3D clothing data, the image processing device 10 generates a 2D clothing image (referred to as a "composite clothing image" in this specification) and stores it in the composite clothing image 110.

[0061] (S4: Process for generating face and hand images for synthesis) Figure 9 shows the processing flow in which the image processing device 10 generates composite face and hand images with the front-to-back relationship between the clothing and the human body determined, using face and hand mask images, composite clothing images, and a 3D clothing avatar. It is assumed that, as a prerequisite for this processing flow, the image processing device 10 communicates with the user terminal 11 through an arbitrary application to generate face and hand mask images from the composite model images. Furthermore, in this specification, the term "hand" may refer to the area from the shoulder to the fingertips, or to the wrist, the palm, or the fingers; which of these applies may vary depending on the design of the clothing.

[0062] In S901, the image processing device 10 extracts keypoints of the face and hands from the composite model image associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted keypoints as a bounding box. The image processing device 10 searches for edges while moving from one side of the face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts images of the entire face and hands of the composite model image from the searched edge range.
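The search-range setup in S901 can be sketched as follows. This is a minimal illustration under stated assumptions: keypoints are given as (x, y) pixel coordinates, the `margin` parameter is hypothetical, and the row scan in `row_extent` is only one possible reading of the side-to-side search; the specification does not fix the implementation.

```python
def bounding_box(keypoints, margin, width, height):
    """Return (left, top, right, bottom) around the keypoints, clamped to the image."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (max(min(xs) - margin, 0), max(min(ys) - margin, 0),
            min(max(xs) + margin, width - 1), min(max(ys) + margin, height - 1))


def row_extent(body_row, left, right):
    """Scan one row from the left side to the right side of the search range.

    Returns (first, last) x positions where the body part is present,
    or None if the row contains no body pixel.
    """
    hits = [x for x in range(left, right + 1) if body_row[x]]
    return (hits[0], hits[-1]) if hits else None


if __name__ == "__main__":
    print(bounding_box([(10, 20), (30, 5)], margin=5, width=32, height=32))
    print(row_extent([0, 0, 1, 1, 0, 1, 0], 0, 6))
```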

[0063] The image processing device 10 extracts edge information from pre-generated face and hand mask images using an arbitrary filter. The upper part of Figure 12(a) shows an image of extracting edge information from face and hand mask images.
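Because the filter is left arbitrary, the following sketch shows just one simple possibility for a binary mask: a mask pixel is treated as an edge when at least one of its four neighbours lies outside the mask. The function name `mask_edges` and the nested-list image representation are assumptions made for this example.

```python
def mask_edges(mask):
    """Return a binary edge map for a binary mask (lists of 0/1 rows)."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # A neighbour outside the image or outside the mask marks an edge.
                if ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


if __name__ == "__main__":
    square = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
              for y in range(5)]
    for row in mask_edges(square):
        print(row)
```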

[0064] In step S902, the image processing device 10 reads the first 3D clothing avatar 602 from the 3D clothing avatar 109 and performs depth information extraction processing on the read first 3D clothing avatar 602. Through this process, the image processing device 10 can obtain depth information for each position of the first 3D clothing avatar 602. The depth information makes it possible to distinguish between wrinkles and outlines in the clothing.

[0065] In S903, the image processing device 10 extracts edge information from the garment image for synthesis using an arbitrary filter. Since directly extracting edge information from the garment image for synthesis could result in noise such as wrinkles caused by the texture of the clothing, the image processing device 10 can extract edge information from the garment image for synthesis using the depth information acquired in S902. The lower part of Figure 12(a) shows an image of extracting edge information from the garment image for synthesis.
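One way depth information could suppress wrinkle noise is to mark an edge only where the depth changes sharply between neighbouring pixels, since shallow wrinkles produce small depth changes while garment outlines produce large ones. The sketch below assumes a per-pixel depth map and a hypothetical `min_jump` parameter; the specification does not prescribe this rule.

```python
def depth_edges(depth, min_jump):
    """Mark pixels whose depth differs from the right or lower neighbour by >= min_jump."""
    h, w = len(depth), len(depth[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(depth[y][x] - depth[y][x + 1]) if x + 1 < w else 0.0
            down = abs(depth[y][x] - depth[y + 1][x]) if y + 1 < h else 0.0
            if max(right, down) >= min_jump:
                edges[y][x] = 1
    return edges


if __name__ == "__main__":
    # Small variations (wrinkles) on the left, a large jump (outline) before x = 3.
    print(depth_edges([[1.0, 1.02, 1.01, 3.0, 3.0]], min_jump=0.5))
```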

[0066] There is no particular restriction on the order of processing S901 relative to processing S902 and S903. That is, S902 and S903 may be performed after S901, S901 may be performed after S902 and S903, or both may be performed simultaneously.

[0067] In S904, the image processing device 10 combines the edge information of the garment image for synthesis with the edge information of the face and hand mask images. Figure 12(b) shows an image combining the two sets of edge information.

[0068] In S905, the image processing device 10 divides the images of the face and hands of the composite model image extracted in S901 into regions enclosed by edges, based on the edge information combined in S904. Figure 13(a) shows an example of dividing the hand portion of the composite model image into several regions enclosed by edges.
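The combination in S904 and the division in S905 can be sketched as a per-pixel OR of the two edge maps followed by a flood fill that labels each group of connected non-edge pixels inside the body part. This is an illustrative sketch only; the function names and the choice of 4-connectivity are assumptions.

```python
from collections import deque


def combine_edges(edges_a, edges_b):
    """Per-pixel OR of two binary edge maps of the same size."""
    return [[1 if (a or b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(edges_a, edges_b)]


def label_regions(part_mask, edges):
    """Label 4-connected non-edge pixels inside part_mask.

    Returns (labels, count); label 0 means edge or outside the body part.
    """
    h, w = len(part_mask), len(part_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not part_mask[y][x] or edges[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and part_mask[ny][nx]
                            and not edges[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    hand = [[1] * 5 for _ in range(3)]
    garment_edge = [[0, 0, 1, 0, 0] for _ in range(3)]
    no_edge = [[0] * 5 for _ in range(3)]
    labels, count = label_regions(hand, combine_edges(no_edge, garment_edge))
    print(count)
    for row in labels:
        print(row)
```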

[0069] In S906, the image processing device 10 reads the second 3D clothing avatar 603 from the 3D clothing avatar 109 and, for each divided region, compares the corresponding parts (for example, the left thumb in both) of the read second 3D clothing avatar 603 and the images of the face and hands of the composite model image that were divided into regions. The image processing device 10 calculates the agreement rate between the two and determines that parts whose agreement rate is equal to or greater than a predetermined threshold (X) are actually visible. Based on the determination result for each region, the image processing device 10 designates each region of the face and hands of the composite model image as a visible part or an invisible part and extracts the final composite face and hand images. In the example in Figure 13(b), the left thumb has an agreement rate below the threshold (X), so the image processing device 10 excludes the image of this thumb from the final composite face and hand images.
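The specification does not define how the agreement rate is computed. As a hedged sketch, the example below assumes it is the fraction of pixels in each region at which the rendered second 3D clothing avatar also shows the body part rather than clothing. The `avatar_visible` map, the label map, and both function names are hypothetical inputs introduced for illustration.

```python
def agreement_rates(labels, count, avatar_visible):
    """Return {region label: fraction of region pixels where the avatar shows the body}."""
    totals = [0] * (count + 1)
    hits = [0] * (count + 1)
    for label_row, visible_row in zip(labels, avatar_visible):
        for label, visible in zip(label_row, visible_row):
            if label:
                totals[label] += 1
                if visible:
                    hits[label] += 1
    return {label: hits[label] / totals[label]
            for label in range(1, count + 1) if totals[label]}


def visible_regions(rates, threshold):
    """Regions whose agreement rate is equal to or greater than the threshold (X)."""
    return {label for label, rate in rates.items() if rate >= threshold}


if __name__ == "__main__":
    labels = [[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 2]]
    avatar_visible = [[1, 1, 0, 0, 0],
                      [1, 0, 0, 1, 0]]
    rates = agreement_rates(labels, 2, avatar_visible)
    print(rates, visible_regions(rates, threshold=0.5))
```

In this toy input, region 2 plays the role of the thumb in Figure 13(b): its rate falls below the threshold, so it would be left out of the composite face and hand image.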

[0070] The threshold (X) can be changed based on depth information. Therefore, the image processing device 10 can change the threshold (X) for each position based on the depth information for each position acquired in S902. As a result, the value of the threshold (X) can change for each divided region enclosed by edges.
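The specification does not state how the threshold (X) depends on depth. Purely as an illustration, the sketch below assigns each region a threshold that varies linearly with the region's mean depth and is clamped to the range 0 to 1. The linear rule and the parameters `base`, `slope`, and `ref_depth` are assumptions, not part of the disclosed method.

```python
def region_thresholds(labels, count, depth, base, slope, ref_depth):
    """Return {region label: threshold} from the mean depth of each region.

    Hypothetical rule: threshold = base + slope * (mean_depth - ref_depth),
    clamped to [0, 1].
    """
    sums = [0.0] * (count + 1)
    totals = [0] * (count + 1)
    for label_row, depth_row in zip(labels, depth):
        for label, d in zip(label_row, depth_row):
            if label:
                sums[label] += d
                totals[label] += 1
    thresholds = {}
    for label in range(1, count + 1):
        if totals[label]:
            mean_depth = sums[label] / totals[label]
            value = base + slope * (mean_depth - ref_depth)
            thresholds[label] = min(1.0, max(0.0, value))
    return thresholds


if __name__ == "__main__":
    labels = [[1, 1, 2, 2]]
    depth = [[1.0, 1.0, 3.0, 3.0]]
    print(region_thresholds(labels, 2, depth, base=0.5, slope=0.1, ref_depth=1.0))
```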

[0071] In S907, the image processing device 10 generates a composite face and hand image based on the final composite face and hand images for each region extracted in S906. Figure 13(b) compares a hand image generated by the conventional method with one generated by the present invention. The conventional general image synthesis process does not perform the agreement judgment described above, so the thumb is visible. In the actual pose, however, the thumb should be hidden behind the folds of the clothing, so the result is an unnatural image. In contrast, when the agreement judgment according to the present invention is performed, the image processing device 10 determines that the agreement rate for this thumb is less than the threshold (X) and therefore excludes the thumb from the composite face and hand image as an invisible part. As a result, the thumb is hidden behind the folds of the clothing and is not visible.

[0072] (S5: Processing to generate a shadow image) Figure 10 shows the processing flow of the image processing device 10, which generates a shadow image based on the shadow information generated when rendering is performed on a cloth-simulated 3D clothing avatar (first 3D clothing avatar) by reflecting the camera setting data and lighting setting data used when generating the composite model image.

[0073] In S1001, the image processing device 10 reads the first 3D clothing avatar 602 from the 3D clothing avatar 109 based on the live-action shooting ID 401 to be processed. The image processing device 10 also queries the live-action shooting data 106 based on the live-action shooting ID 401 and reads the corresponding camera setting data 404 and lighting setting data 405.

[0074] In step S1002, the image processing device 10 applies the corresponding camera setting data 404 and lighting setting data 405 to the read-out first 3D clothing avatar 602 and performs rendering, in which it calculates whether or not light reaches each part of the avatar and performs shading.

[0075] In step S1003, the image processing device 10 generates a shadow image based on the shadow information, which is the result of the shading calculation. The image processing device 10 stores the shadow information and the shadow image in the shadow information 604 and shadow image 605 of the 3D clothing avatar 109, respectively.
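As a rough sketch of how a shading result might be turned into a shadow layer, the example below uses a Lambertian term (the dot product of a unit surface normal and a unit light direction) and converts the unlit fraction into the opacity of a black RGBA pixel. The Lambertian model, the `strength` parameter, and the RGBA layer format are assumptions; the actual renderer and shader settings are not specified here.

```python
def lambert(normal, light_dir):
    """Diffuse light term for unit vectors: 1.0 when facing the light, 0.0 when facing away."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


def shadow_image(normals, light_dir, strength=0.6):
    """Return an RGBA layer of black pixels whose alpha grows where less light arrives."""
    return [[(0.0, 0.0, 0.0, strength * (1.0 - lambert(normal, light_dir)))
             for normal in row]
            for row in normals]


if __name__ == "__main__":
    normals = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
    print(shadow_image(normals, light_dir=(0.0, 0.0, 1.0)))
```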

[0076] (S6: Clothing synthesis process to generate the final clothing image) Figure 11 shows the processing flow for generating a final clothing model image by performing a first clothing synthesis process, which outputs a first clothing model image by superimposing a clothing image for synthesis onto a model image for synthesis associated with the live-action shooting ID 401 to be processed, and a second clothing synthesis process, which generates a final clothing model image by superimposing a face and hand image for synthesis onto the first clothing model image, which is the output of the first clothing synthesis process, and then superimposing a shadow image.

[0077] In S1101, the image processing device 10 performs a first clothing synthesis process. More specifically, the image processing device 10 reads a model image 402 from the live-action shooting data 106 based on the live-action shooting ID 401 to be processed, and uses the live-action shooting ID 401 to query the clothing image for synthesis 110 and reads a clothing image for synthesis 702. The image processing device 10 overlays the clothing image for synthesis onto the model image for synthesis to generate a first clothing model image.

[0078] In S1102, the image processing device 10 performs a second clothing synthesis process. More specifically, the image processing device 10 generates a final clothing model image by superimposing a face and hand image for synthesis onto the generated first clothing model image, and then superimposing a shadow image onto it. The image processing device 10 provides the generated final clothing model image to the user terminal 11.
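The layering order of S1101 and S1102 can be sketched with the standard alpha "over" operator on RGBA pixels. This is an illustrative sketch: the specification says only that the images are superimposed, so the blend formula, the float RGBA representation (non-premultiplied, values from 0 to 1), and the function names are assumptions.

```python
def over(src, dst):
    """Composite one non-premultiplied RGBA pixel `src` over `dst`."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(s, d):
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)


def superimpose(top, bottom):
    """Superimpose the image `top` onto the image `bottom`, pixel by pixel."""
    return [[over(s, d) for s, d in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(top, bottom)]


def final_clothing_model_image(model, clothing, face_hand, shadow):
    first = superimpose(clothing, model)        # S1101: first clothing model image
    with_body = superimpose(face_hand, first)   # S1102: face and hand image on top
    return superimpose(shadow, with_body)       # S1102: shadow image last


if __name__ == "__main__":
    skin = (1.0, 0.8, 0.6, 1.0)
    blue = (0.0, 0.0, 1.0, 1.0)
    clear = (0.0, 0.0, 0.0, 0.0)
    model = [[skin, skin]]
    clothing = [[blue, blue]]
    face_hand = [[clear, skin]]                 # the body part is visible only at pixel 1
    shadow = [[(0.0, 0.0, 0.0, 0.5), clear]]
    print(final_clothing_model_image(model, clothing, face_hand, shadow))
```

In the toy input, pixel 0 ends up as shadowed clothing and pixel 1 as the visible body part drawn in front of the clothing.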

[0079] (Advantages of the present invention) The above-described process enables the image processing device 10 to perform image synthesis processing while more precisely estimating the front-to-back relationship between the person and the clothing. According to the present invention, problems such as the difficulty in determining the front-to-back relationship between the person and the clothing, and the resulting low accuracy of the output image, such as the loss of clothing areas that should be on top of the person, are resolved.

[0080] (Other embodiments) Although the above explanation has used humans as an example, the principle of the present invention is also applicable to animals other than humans. Recently, clothing for animals kept as pets has become available for sale. By using the principle of the present invention to create composite images for such pet clothing, these composite images can be used in advertising and marketing.

[0081] Furthermore, while the human face and hands have been used as examples above, depending on the type of clothing, the principles of the present invention can be applied to parts of the human body other than the face and hands to generate composite images. For example, the parts of the human body that are exposed when wearing clothing differ depending on whether the clothing is a top or bottoms. In the case of a top, the exposed parts of the human body may be the face and/or hands, and in the case of bottoms, the exposed parts of the human body may be the feet and/or ankles. In this specification, "exposed human body parts" refers to the face, hands, feet, ankles, etc., depending on the type of clothing.

[0082] Furthermore, although the principle of the present invention was explained above using the human body and clothing as objects, the objects are not limited to the human body and clothing. For example, the objects may be a human body and a vehicle (car, motorcycle, bicycle, etc.). Moreover, the number of objects may be three or more. In relation to the above example, it is also possible to generate a composite image with small items such as accessories and bags as the third object. In other words, the present invention makes it possible to increase the accuracy of the output composite image even when the spatial relationships of multiple objects are intricately intertwined.

[0083] Although the principles of the present invention have been described above with reference to exemplary embodiments, those skilled in the art will understand that various embodiments with modifications in configuration and details can be realized without departing from the spirit of the invention. That is, the present invention can take the form of, for example, a system, apparatus, method, program, or storage medium.

[Explanation of Symbols]

[0084]
1 Image processing system
10 Image processing device
11 User terminals
12 3D scanners
13 Imaging device
14 Networks
101 Control unit
102 Main memory
103 Auxiliary storage
104 Interface (IF) section
105 Output section
106 Live-action shooting data
107 3D clothing data
108 3D avatars
109 3D clothing avatars
110 Clothing images for synthesis

Claims

1. An image processing device comprising: means for generating a first 3D clothing avatar, a second 3D clothing avatar, and a clothing image for synthesis using setting data, a 3D avatar, and 3D clothing data associated with a model image for synthesis; means for generating a composite image of an exposed human body part in which the front-to-back relationship between the clothing and the human body has been determined, by dividing an image of the entire exposed human body part of the model image for synthesis into regions enclosed by edges, based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, and by calculating, for each divided region, an agreement rate of corresponding parts between the second 3D clothing avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions; means for generating a shadow image by performing rendering on the first 3D clothing avatar with the setting data associated with the model image for synthesis reflected; means for outputting a first clothing model image by superimposing the clothing image for synthesis onto the model image for synthesis; and means for generating a final clothing model image by superimposing the composite image of the exposed human body part onto the first clothing model image and further superimposing the shadow image.

2. The image processing device according to claim 1, wherein the means for generating the composite image of the exposed human body part, in which the front-to-back relationship between the clothing and the human body has been determined, comprises means for determining, when the calculated agreement rate is equal to or greater than a threshold, that the corresponding portion of the image of the entire exposed human body part of the model image for synthesis divided into the regions is a visible portion, the visible portion being a portion visible over the clothing.

3. The image processing device according to claim 2, further comprising means for acquiring depth information of the first 3D clothing avatar, wherein the edge information of the clothing image for synthesis is extracted using the depth information.

4. The image processing device according to claim 3, wherein the threshold associated with each region enclosed by the edges varies based on the depth information.

5. The image processing device according to claim 1, further comprising: means for extracting key points of the exposed human body part from the model image for synthesis and setting a search range using the extracted key points as a bounding box; means for searching for edges in the model image for synthesis while moving from one side of the exposed human body part to the other side, and continuing the edge search until reaching the other side of the exposed human body part and turning back; and means for extracting an image of the entire exposed human body part of the model image for synthesis from the range of the searched edges.

6. The image processing device according to claim 1, wherein, when the type of clothing is a top, the exposed human body part is the face and/or hands, and when the type of clothing is bottoms, the exposed human body part is the feet and/or ankles.

7. An image processing method performed by an image processing device, the method comprising: generating a first 3D clothing avatar, a second 3D clothing avatar, and a clothing image for synthesis using setting data, a 3D avatar, and 3D clothing data associated with a model image for synthesis; dividing an image of the entire exposed human body part of the model image for synthesis into regions enclosed by edges, based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, calculating, for each divided region, an agreement rate of corresponding parts between the second 3D clothing avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions, and thereby generating a composite image of the exposed human body part in which the front-to-back relationship between the clothing and the human body has been determined; generating a shadow image by performing rendering on the first 3D clothing avatar with the setting data associated with the model image for synthesis reflected; outputting a first clothing model image by superimposing the clothing image for synthesis onto the model image for synthesis; and generating a final clothing model image by superimposing the composite image of the exposed human body part onto the first clothing model image and further superimposing the shadow image.

8. A program that, when executed, causes an image processing device to perform the method described in claim 7.