Deep learning method based on UV-space triangle reprojection for error detection and correction of three-dimensional face scan mesh
The method enhances error detection in 3D face scans by normal mapping and reprojecting triangles into a rectangle for U-net learning, addressing inefficiencies in existing deep learning methods and achieving efficient error correction and restoration in UV space.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- KOREA ELECTRONICS TECH INST
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-18
AI Technical Summary
Existing deep learning-based methods struggle with inefficient training and error detection in high-resolution texture and normal data defined in UV space of 3D meshes, particularly in 3D face scans, due to irregular error distribution and high empty data ratios, making it difficult to create natural-looking data similar to actual captured data.
A method involving normal mapping, triangle reprojection into a predefined rectangle, and using a U-net deep learning network to classify and correct errors in UV space by learning from feature maps, enabling efficient training and restoration of missing or erroneous data.
Improves error detection performance by reducing training time and naturally reconstructing or restoring empty parts, allowing for fine-tuning on images of the same size and distinguishing between valid and incorrect data effectively.
Abstract
Description
UV Space Triangle Reprojection-Based Deep Learning Method for Error Detection and Correction of 3D Face Scan Meshes
[0001] The present invention relates to a machine learning method capable of detecting errors in mesh data, and specifically, to a deep learning method capable of improving error detection performance through learning by reprojecting triangles in UV space that have been normal-mapped to the mesh.
[0002] This invention results from research conducted by the Korea Electronics Technology Institute under the research project "Development of Sound-based Photorealistic 3D Face Generation Technology" of the "Development of Core Technologies for Immersive Content" program, with support from the Ministry of Science and ICT (Project No. 2022-0-00058-001, Unique Project No. 1711160477).
[0003] Normal mapping is a method of expressing detailed undulations or textures within a polygon by representing the values of texels applied to the polygon surface as normal vectors. Referring to FIG. 1, U and V vector values are provided along axes named T (Tangent) and B (Binormal), which correspond to the object space represented by XYZ location coordinates (or local coordinates). In the illustrated example, the vertices at three locations P1, P2, and P3 have texture coordinate values (U1, V1), (U2, V2), and (U3, V3), and the two edges E1 and E2 of the triangle can be expressed using the differences ΔU1, ΔU2, ΔV1, and ΔV2. The XYZ local space and the tangent space (UV space) can be converted to each other by a linear transformation expressed as a matrix.
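As an illustration of the conversion described in paragraph [0003], the following minimal sketch solves the 2×2 linear system that relates the triangle edges to the T and B axes. It assumes E1 = P2 − P1 and E2 = P3 − P1, with (ΔU1, ΔV1) taken along E1 and (ΔU2, ΔV2) along E2; the source does not state which edges are meant, and the function name `tangent_basis` is hypothetical.

```python
def tangent_basis(p1, p2, p3, uv1, uv2, uv3):
    """Return (T, B) such that E1 = dU1*T + dV1*B and E2 = dU2*T + dV2*B."""
    # Edge vectors in object (XYZ) space; the choice of edges is an assumption.
    e1 = [q - p for p, q in zip(p1, p2)]
    e2 = [q - p for p, q in zip(p1, p3)]
    # Matching texture-coordinate differences along each edge.
    du1, dv1 = uv2[0] - uv1[0], uv2[1] - uv1[1]
    du2, dv2 = uv3[0] - uv1[0], uv3[1] - uv1[1]
    det = du1 * dv2 - du2 * dv1
    if abs(det) < 1e-12:
        raise ValueError("degenerate UV triangle")
    # Invert the 2x2 matrix [[dU1, dV1], [dU2, dV2]] and apply it to (E1, E2).
    tangent = [(dv2 * a - dv1 * b) / det for a, b in zip(e1, e2)]
    binormal = [(du1 * b - du2 * a) / det for a, b in zip(e1, e2)]
    return tangent, binormal


# A right triangle whose UV axes coincide with the X and Y axes.
t, b = tangent_basis((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0), (1, 0), (0, 1))
print(t, b)
```

In this example the tangent comes out along X and the binormal along Y, which is what one would expect when the UV axes are aligned with the object-space axes.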
[0004] When generating texture and normal data in a defined format within a mesh-based UV space using 3D reconstruction technology on images of faces scanned by a multi-view camera, highly detailed and realistic data that is nearly photographic can be obtained. However, errors may exist in some data due to errors in the captured data or errors in merging multi-view data. Although interpolation using adjacent pixel data is generally used to correct these errors, there was a problem in that it was difficult to create natural-looking data that is similar to data created based on actual captured data.
[0005] Existing deep learning-based image segmentation and inpainting can produce results quickly by learning from large amounts of data, but there were some tricky or inefficient aspects when applied to tasks such as finding or correcting errors within texture and normal data defined in the UV space of a 3D mesh.
[0006] First, in the case of photorealistic texture and normal data, the image resolution is very high, some spaces contain empty data, and the ratio of valid texture and normal data to error data is very high, so it is inefficient to train the network by feeding the entire dataset as input at once.
[0007] In addition, after the scanning and 3D reconstruction process, it is impossible to know where errors will occur, and because the range and shape of the error data generated in UV space are irregular, a large amount of actual scan data had to be secured to build the dataset for training the deep learning network.
[0008] The present invention originates from the above understanding, and the objective of the present invention is to provide a method that can improve error detection performance through learning by reprojecting triangles in UV space that have been normal-mapped to a mesh.
[0009] To achieve the above objective, a machine learning method for error detection of data modeled with a mesh according to an embodiment of the present invention comprises the steps of: preparing normal and texture data by normal mapping the mesh; selecting a triangle in UV space according to the normal mapping; generating transformed data in which the normal and texture data are normalized into a predefined rectangle by reprojecting the selected triangle and three adjacent triangles into the rectangle; and inputting the transformed data into a deep learning network.
[0010] Here, the transformation data may include 3-channel normal data reprojected into the rectangle, 3-channel texture data, 1-channel triangle mask data identifying triangles, and 1-channel position mask data representing parts of the mesh.
[0011] Here, the deep learning network may be a U-net.
[0012] And, the above mesh may be a mesh that has been 3D modeled by scanning a face.
[0013] In this case, the machine learning method can learn the classification of valid or incorrect normal data from the feature map extracted at the bottleneck at the end of the contracting path of the U-net.
[0014] And, the step of inputting the transformed data into a deep learning network includes the step of deleting the normal data of the selected triangle of the transformed data and inputting it into the deep learning network, and the deleted normal data can be learned from the output of the last layer of the expansion path of the U-net.
[0015] Meanwhile, a method for determining and restoring errors in a 3D face scan mesh using an AI model trained by the machine learning method described above, according to one embodiment of the present invention, comprises the steps of: inputting, into the AI model, transformation data in which one triangle and three adjacent triangles are reprojected into a predefined rectangle from the normal-mapped normal and texture data of the 3D face scan mesh; and estimating validity or error from a feature map extracted from the deepest layer (bottleneck) of the AI model.
[0016] Furthermore, the above method may include the steps of deleting normal data of a target triangle estimated as an error by the AI model and inputting it into the AI model, estimating normal data of the target triangle by the AI model, and generating a restored image by inserting the estimated normal data.
[0017] Meanwhile, a machine learning computing system for error detection of data modeled with a mesh according to one embodiment of the present invention includes an input unit that receives normal and texture data obtained by normal mapping the mesh, and a processor that selects a triangle in UV space according to the normal mapping, reprojects the selected triangle and three adjacent triangles into a predefined rectangle to generate transformed data in which the normal and texture data are normalized into the rectangle, and inputs the transformed data into a deep learning network to perform machine learning.
[0018] In this case, the transformation data may include 3-channel normal data reprojected into the rectangle, 3-channel texture data, 1-channel triangle mask data identifying the triangle, and 1-channel position mask data representing the part of the mesh.
[0019] In addition, the deep learning network is a U-net, and the mesh may be a mesh that has been 3D modeled by scanning a face.
[0020] In this case, the processor can perform an operation to learn the classification of valid or incorrect normal data from a feature map extracted at the end of the contracting path of the U-net, delete the normal data of the selected triangle of the transformed data, input it into the deep learning network, and perform an operation to learn the deleted normal data from the output of the last layer of the expansion path of the U-net.
[0021] The present invention improves classification performance by reprojecting input data of a deep learning network into a square of a fixed size to distinguish between valid and incorrect data, and enables fine-tuning by utilizing a model trained on different images of the same size, thereby reducing training time.
[0022] Furthermore, the present invention can delete error data and very naturally reconstruct or restore empty or lost parts from surrounding information.
[0023] FIG. 1 is a diagram illustrating conventional normal mapping;
[0024] FIG. 2 is a block diagram showing the configuration of a computing system according to an embodiment of the present invention;
[0025] FIG. 3 is a block diagram showing a specific configuration of a computing system according to an embodiment of the present invention;
[0026] FIG. 4 is a block diagram showing a software module stored in the storage unit of the computing system of FIG. 3;
[0027] FIG. 5 is a diagram illustrating the input data transformation of a deep learning network according to an embodiment of the present invention;
[0028] FIG. 6 is a diagram illustrating the learning of a deep learning network according to an embodiment of the present invention;
[0029] FIG. 7 is a drawing for explaining two embodiments utilizing a model trained with a deep learning network according to one embodiment of the present invention;
[0030] FIG. 8 is a flowchart illustrating a machine learning method for error detection of data modeled with a mesh according to an embodiment of the present invention; and,
[0031] FIG. 9 is a flowchart illustrating a method for determining and restoring errors in a 3D face scan mesh using an AI model trained by a machine learning method.
[0032] The present invention will be described in more detail below with reference to the drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the essence of the invention. Additionally, the terms described below are defined in consideration of their functions in the present invention, and they may vary depending on the intentions or customs of the user or operator. Therefore, their definitions should be based on the content throughout this specification.
[0033]
[0034] FIG. 2 is a block diagram showing the configuration of a computing system according to one embodiment of the present invention.
[0035] Referring to FIG. 2, a machine learning computing system (100) for error detection of data modeled as a mesh includes an input unit (110) and a processor (120).
[0036] The input unit (110) receives normal and texture data that has been normal-mapped from a mesh. The input unit (110) may be configured as a communication unit connected to another device of the computing system (100) or a camera scanner. Alternatively, the input unit (110) may be a serial port that can be connected to a storage driver. The computing system (100) may be a single computer or a number of connected computers.
[0037] A mesh may be a set of triangular polygons, each connecting three points (vertices), to which the normal mapping method is applied.
[0038] The processor (120) controls the components of the computing system (100).
[0039] The processor (120) performs operations for a rectangle transformation on a target triangle. Specifically, the processor (120) selects a triangle in UV space by normal mapping. The processor (120) reprojects the selected triangle and three adjacent triangles into a predefined rectangle. Through this, the processor (120) generates transformation data in which normal and texture data are normalized into a rectangle.
[0040] Referring to FIG. 5, FIG. 5a shows the normal texture map of the mesh, and FIG. 5b shows a triangle selected on one cheek of the mesh. As shown in the enlarged view of FIG. 5c, the selected triangle ABC and the three adjacent triangles ABD, BCE, and ACF, each of which shares one of its three sides (edges), are projected together into the rectangle of FIG. 5d. In the illustrated embodiment, the rectangle is a square, points D and F are projected onto the centers of two sides of the rectangle, and point E is projected onto a vertex. An appropriate projection matrix can be used for the vector space transformation of the triangle-to-rectangle reprojection. Because this mapping transforms irregular triangles of different sizes into a rectangle of uniform size, mapping the four triangles into the rectangular space can be described as normalizing the normal and texture data into a uniform rectangular image.
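One way the per-triangle reprojection of paragraph [0040] might be carried out is with barycentric coordinates: for each pixel of the rectangle that falls inside a destination triangle, the corresponding point in the source UV triangle is found and sampled. The sketch below is only an illustration under stated assumptions. The coordinates are hypothetical (only the placement of E on a vertex of the square follows the text), and the helper names `barycentric` and `dst_to_src` are not from the source.

```python
def barycentric(p, tri):
    """Barycentric weights of 2D point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def dst_to_src(p_dst, dst_tri, src_tri):
    """Map a point of the destination triangle back into the source UV triangle."""
    w = barycentric(p_dst, dst_tri)
    u = sum(wi * v[0] for wi, v in zip(w, src_tri))
    v = sum(wi * v[1] for wi, v in zip(w, src_tri))
    return u, v


# Hypothetical data: adjacent triangle BCE in UV space and its destination
# in the unit square, with E placed on the square's vertex (1, 1).
src_bce = ((0.42, 0.50), (0.40, 0.47), (0.45, 0.46))
dst_bce = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))

corner = dst_to_src((1.0, 1.0), dst_bce, src_bce)          # maps to source E
centroid = dst_to_src((2 / 3, 2 / 3), dst_bce, src_bce)    # maps to source centroid
print(corner, centroid)
```

A pixel lies inside the destination triangle when all three weights are non-negative, so the same weights can serve both as the inside test and as the interpolation coefficients.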
[0041] The processor (120) can perform machine learning by inputting the transformed data into a deep learning network. For machine learning, the processor (120) may include a processor dedicated to machine learning. Alternatively, the machine learning may be performed by a processor of a dedicated computer included in the computing system (100).
[0042] Referring to FIG. 6, the input data or transformation data for machine learning consists of a total of 8 channels and includes 3 channels of normal data reprojected into a rectangle, 3 channels of texture data, 1 channel of triangle mask data identifying a triangle, and 1 channel of position mask data representing a part of the mesh.
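The 8-channel layout of paragraph [0042] can be sketched as a channels-first nested list. The channel order used here (normal, texture, triangle mask, position mask) and the function name `make_sample` are assumptions for illustration; the source specifies only the channel counts.

```python
def make_sample(normal, texture, triangle_mask, position_mask):
    """Stack per-pixel maps into a channels-first list of shape [8][H][W]."""
    if len(normal) != 3 or len(texture) != 3:
        raise ValueError("normal and texture must each have 3 channels")
    # Assumed order: 0-2 normal, 3-5 texture, 6 triangle mask, 7 position mask.
    channels = list(normal) + list(texture) + [triangle_mask, position_mask]
    h, w = len(triangle_mask), len(triangle_mask[0])
    for ch in channels:
        if len(ch) != h or any(len(row) != w for row in ch):
            raise ValueError("all channels must share the same H x W")
    return channels


def blank(h, w):
    """An H x W map filled with zeros, used here as placeholder data."""
    return [[0.0] * w for _ in range(h)]


sample = make_sample(
    [blank(4, 4) for _ in range(3)],   # reprojected normal data
    [blank(4, 4) for _ in range(3)],   # reprojected texture data
    blank(4, 4),                       # triangle mask
    blank(4, 4),                       # position mask
)
print(len(sample), len(sample[0]), len(sample[0][0]))
```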
[0043] The deep learning network that performs machine learning is U-net. U-net is a deep learning network that is a more advanced form of CNN and has excellent performance in image segmentation. U-net includes a contracting path that downsamples (encodes) images using convolution blocks, an expanding path that upsamples (decodes) the downsampled images, and a bottleneck, which is a transition section (bridge) connecting the two paths.
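To make the contracting path, bottleneck, and expanding path of paragraph [0043] concrete, the following sketch traces feature-map shapes through a U-net-style network. The depth, base channel count, factor-of-two resampling, and 3-channel output are typical choices assumed here for illustration; the source does not specify them, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(in_channels=8, size=64, base=32, depth=3, out_channels=3):
    """List (stage, channels, spatial size) for a U-net-style network."""
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")
    shapes = [("input", in_channels, size)]
    ch = base
    for level in range(depth):
        shapes.append((f"contract{level}", ch, size))
        size //= 2      # downsampling halves the resolution
        ch *= 2         # channel count doubles (assumed convention)
    shapes.append(("bottleneck", ch, size))
    for level in reversed(range(depth)):
        size *= 2       # upsampling restores the resolution
        ch //= 2
        shapes.append((f"expand{level}", ch, size))
    shapes.append(("output", out_channels, size))
    return shapes


shapes = unet_shapes()
for stage in shapes:
    print(stage)
```

Because every rectangle has the same fixed size, the bottleneck feature map also has a fixed shape, which is convenient for attaching a classifier at that point.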
[0044] The original mesh input for machine learning may be a mesh that has been 3D modeled or normal mapped by scanning a face. However, depending on the purpose of the invention and the target of error detection, other original data may be input into the deep learning network.
[0045] The classification of normal data as valid or invalid can be learned from the feature map extracted from the bottleneck at the end of the contracting path of the U-net. For this learning, a layer that selects one of two classes, such as a softmax layer, can be added after the bottleneck.
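A two-class head of the kind mentioned in paragraph [0045] might look like the sketch below. The global average pooling, the single linear layer, and the class order (valid, error) are assumptions made for illustration; the source states only that a layer such as a softmax can follow the bottleneck.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_bottleneck(feature_map, weights, bias):
    """feature_map: [C][H][W]; weights: [2][C]; bias: [2].

    Returns [p_valid, p_error] (class order is an assumption).
    """
    # Global average pooling reduces each channel to one value.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # One linear layer produces a logit per class.
    logits = [sum(w * x for w, x in zip(w_row, pooled)) + b
              for w_row, b in zip(weights, bias)]
    return softmax(logits)


# Toy bottleneck with two 2x2 channels and hand-picked weights.
features = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
probs = classify_bottleneck(features, [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
print(probs)
```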
[0046] If the normal data of any triangle is lost or erroneous, training can be performed to estimate the normal data of the target triangle from adjacent triangles in order to delete the erroneous data and reconstruct or restore the empty normal data.
[0047] To this end, the normal data of a valid triangle is deleted from the transformed data, the result is input into the deep learning network, and the network can be trained to estimate the deleted normal data from the output of the last layer of the expansion path of the U-net.
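The training setup of paragraph [0047] can be sketched as follows: the normal channels of the target triangle are blanked in the input, and the original values serve as the training target. The channel order, the use of mask value 1 for the target triangle, zero filling as the "deletion", and the L1 loss are all assumptions for illustration; the source does not specify them.

```python
def delete_target_normals(sample, target_id=1):
    """Blank the normal channels (assumed 0-2) where the triangle mask
    (assumed channel 6) equals target_id.

    Returns (masked_sample, target_normals, target_pixels).
    """
    tri_mask = sample[6]
    pixels = [(y, x) for y, row in enumerate(tri_mask)
              for x, value in enumerate(row) if value == target_id]
    masked = [[row[:] for row in ch] for ch in sample]      # deep copy
    target = [[row[:] for row in ch] for ch in sample[:3]]  # original normals
    for c in range(3):
        for y, x in pixels:
            masked[c][y][x] = 0.0
    return masked, target, pixels


def masked_l1(pred, target, pixels):
    """Mean absolute error over the deleted pixels only (assumed loss)."""
    if not pixels:
        return 0.0
    total = sum(abs(pred[c][y][x] - target[c][y][x])
                for c in range(3) for y, x in pixels)
    return total / (3 * len(pixels))


# Toy 2x2 sample: all data 0.5, target triangle covers pixel (0, 0).
sample = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(8)]
sample[6] = [[1, 0], [0, 0]]
masked, target, pixels = delete_target_normals(sample)
loss = masked_l1(masked[:3], target, pixels)  # error of an untrained "prediction"
print(pixels, loss)
```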
[0048] Subsequently, the processor (120) can use the AI model generated by the learning described above to input transformation data into the AI model, in which one triangle and three adjacent triangles are re-projected into a predefined rectangle from the normal mapped normal and texture data of a 3D face scan mesh, and perform an operation to estimate validity or error from the feature map extracted from the deepest layer of the AI model, i.e., the bottleneck.
[0049] And, the processor (120) can delete the normal data of the target triangle determined to be an error and input it into an AI model, estimate the normal data of the target triangle by the AI model, and perform an operation to generate a restored image by inserting the estimated normal data.
[0050]
[0051] FIG. 3 is a block diagram showing the specific configuration of a computing system according to one embodiment of the present invention.
[0052] Referring to FIG. 3, the computing system (100) includes an input unit (110), a processor (120), and a storage unit (130). Here, the description of the input unit (110) and the processor (120) is based on the description of the same components previously described in FIG. 2, and redundant descriptions are omitted.
[0053] The processor (120) controls the components of the computing system (100) overall.
[0054] The processor (120) includes RAM (121), ROM (122), a main CPU (123), a graphics processing unit (124), first to n-th interfaces (125-1 to 125-n), and a bus (126).
[0055] The RAM (121), ROM (122), main CPU (123), graphics processing unit (124), first to n-th interfaces (125-1 to 125-n), etc. can be connected to each other via the bus (126).
[0056] The first to n interfaces (125-1 to 125-n) are connected to various components (110, 120, 130, 150, 160, 170). One of the interfaces may be a network interface connected to an external device via a network. For example, the input unit (110) may be connected via short-range wireless transmission communication or a Bluetooth link, and the storage unit (130) may be provided to a cloud server via broadband communication.
[0057] The main CPU (123) accesses the storage unit (130) and performs booting using the O/S stored in the storage unit (130). Then, it performs various operations for automatic focus adjustment and content control of the computing system (100) using various programs, content, data, etc. stored in the storage unit (130).
[0058] A set of instructions for booting the system is stored in the ROM (122). When a turn-on command is input and power is supplied, the main CPU (123) copies the O/S stored in the storage unit (130) to the RAM (121) according to the instructions stored in the ROM (122), and executes the O/S to boot the system. When booting is complete, the main CPU (123) copies various application programs stored in the storage unit (130) to the RAM (121), and executes the application programs copied to the RAM (121) to perform various operations.
[0059] The graphics processing unit (124) generates a screen containing various objects such as icons, images, and text using a calculation unit (not shown) and a rendering unit (not shown). The calculation unit (not shown) calculates attribute values such as coordinate values, shape, size, and color for each object to be displayed according to the layout of the screen based on a received control command. The rendering unit (not shown) generates a screen of various layouts containing objects based on the attribute values calculated by the calculation unit (not shown).
[0060] In particular, the graphics processing unit (124) can implement objects generated by the main CPU (123) into a GUI (Graphic User Interface), icon, user interface screen, etc.
[0061] Meanwhile, the storage unit (130) stores at least one software module for controlling the computing system (100).
[0062] FIG. 4 is a diagram showing the configuration of a storage unit (130) in which a software module for realizing the function of the computing system (100) of FIG. 3 is stored.
[0063] Referring to FIG. 4, the storage unit (130) includes a deep learning network module (131), a triangle-to-rectangle transformation module (132), an error detection module (133), and a restoration module (134).
[0064] The deep learning network module (131) can be executed by the processor (120) to provide the learning algorithm of the deep learning network described above. For example, the deep learning network module (131) can call the triangle-to-rectangle transformation module (132) and provide learning to classify the validity or error of the normal data of the target triangle or to estimate the normal data of the target triangle.
[0065] The triangle-to-rectangle transformation module (132) is executed by the processor (120) and can provide the transformation of the data to be supplied to the learning network (U-net) provided by the deep learning network module (131), that is, the reprojection from triangles to a rectangle.
[0066] The error detection module (133) is executed by the processor (120) and enables the detection of validity or error in the normal data of a triangle using an AI model generated according to the deep learning network module (131). In one example, the validity or error in the normal data of a target triangle can be estimated from the feature map of the deepest layer of the U-net.
[0067] The restoration module (134) is executed by the processor (120) and can output estimated normal data that fills empty normal data of the target triangle or replaces error data using an AI model generated according to the deep learning network module (131).
[0068] FIG. 7 is a diagram illustrating two embodiments utilizing a model trained with a deep learning network according to one embodiment of the present invention.
[0069] Referring to FIG. 7, in network 1 of FIG. 7a, the input is transformation data of a rectangle, and the target triangle has normal data. Network 1 outputs the validity or error of the target triangle of the corresponding input.
[0070] Network 2 of FIG. 7b can follow Network 1. Network 2 receives as input the transformed data of a rectangle from which the normal data determined to be an error has been deleted. From this input, Network 2 outputs a rectangle in which the empty target triangle is filled with normal data.
[0071]
[0072] FIG. 8 is a flowchart illustrating a machine learning method for error detection in data modeled with a mesh according to an embodiment of the present invention.
[0073] Referring to FIG. 8, a machine learning method for error detection of data modeled with a mesh includes a step (S810) of preparing normal and texture data by normal mapping the mesh. The preparation step (S810) may vary depending on the target for learning, and in one example, the preparation step (S810) may include preprocessing that reconstructs a 3D image of a large amount of face scan data using a normal mapping method.
[0074] The machine learning method includes a step (S820) of selecting a triangle in UV space by normal mapping. The selection step (S820) may select two or more triangles for parallel processing depending on the performance of the processor for high-speed processing.
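For the selection step (S820), the three adjacent triangles of a selected triangle can be found from shared edges. The sketch below is one possible approach, assuming the mesh faces are given as triples of vertex indices; the function name `adjacent_triangles` and the example indices are hypothetical.

```python
from collections import defaultdict


def adjacent_triangles(faces, index):
    """Return indices of the triangles that share an edge with faces[index]."""
    # Map every undirected edge to the faces that contain it.
    edge_to_faces = defaultdict(list)
    for f, (a, b, c) in enumerate(faces):
        for edge in ((a, b), (b, c), (c, a)):
            edge_to_faces[frozenset(edge)].append(f)
    a, b, c = faces[index]
    neighbors = []
    for edge in ((a, b), (b, c), (c, a)):
        neighbors.extend(f for f in edge_to_faces[frozenset(edge)] if f != index)
    return neighbors


# Vertices A..F are numbered 0..5; faces are ABC, ABD, BCE, and ACF.
faces = [(0, 1, 2), (0, 3, 1), (1, 4, 2), (0, 2, 5)]
neighbors = adjacent_triangles(faces, 0)
print(neighbors)
```

A triangle on the boundary of the mesh has fewer than three edge neighbors, so a practical implementation would need a rule for such cases; the source does not describe one.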
[0075] Next, the machine learning method includes the step (S830) of generating transformed data in which the normal and texture data are normalized into the rectangle by reprojecting the selected triangle and three adjacent triangles into the predefined rectangle.
[0076] The mesh may be a mesh that has been 3D modeled by scanning a face, and the transformation data may include 3-channel normal data reprojected into the rectangle, 3-channel texture data, 1-channel triangle mask data identifying triangles, and 1-channel position mask data indicating parts of the mesh.
[0077] And, the machine learning method includes the step (S840) of inputting the transformed data into a deep learning network.
[0078] The machine learning method can be performed using U-net, and can learn the classification of valid or incorrect normal data from the feature map extracted at the bottleneck at the end of the contracting path of U-net.
[0079] Meanwhile, the input step (S840) may include the step of deleting the normal data of the selected triangle of the transformed data and inputting it into the deep learning network. In this case, the machine learning method can learn the deleted normal data from the output of the last layer of the expansion path of the U-net.
[0080]
[0081] FIG. 9 is a flowchart illustrating a method for determining and restoring errors in a 3D face scan mesh using an AI model trained by a machine learning method.
[0082] Referring to FIG. 9, the error detection and restoration method for a 3D face scan mesh first includes the step (S910) of preparing a trained AI model. Here, the AI model is generated by the machine learning method of FIG. 8.
[0083] Next, the error determination and restoration method for a 3D face scan mesh includes the step (S920) of inputting transformation data into the AI model, in which one triangle and three adjacent triangles are re-projected into a predefined rectangle from the normal mapped normal and texture data of the 3D face scan mesh.
[0084] And, the error determination and restoration method for a 3D face scan mesh includes a step (S930) of estimating validity or error from a feature map extracted from the deepest layer (bottleneck) of the AI model.
[0085] When the target triangle of the input data is estimated to be an error, a step of generating restoration data for the erroneous, damaged, lost, or deleted normal data of the target triangle region may follow.
[0086] Specifically, the method for error detection and restoration of a 3D face scan mesh may further include the steps of deleting normal data of a target triangle estimated as an error by the AI model and inputting it into the AI model, estimating normal data of the target triangle by the AI model, and generating a restored image by inserting the estimated normal data.
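The detection and restoration flow of paragraphs [0084] to [0086] can be sketched as a single function. The two models are passed in as plain callables and replaced by stubs in the example; the decision threshold of 0.5, the zero filling, and the channel order (normal data in channels 0 to 2) are assumptions, and `detect_and_restore` is a hypothetical name.

```python
def detect_and_restore(sample, classifier, restorer, target_pixels, threshold=0.5):
    """sample: [8][H][W]; classifier(sample) -> error probability;
    restorer(sample) -> [3][H][W] estimated normals.

    Returns (sample_or_restored, was_restored).
    """
    # Step 1: estimate validity or error for the target triangle.
    if classifier(sample) < threshold:
        return sample, False
    # Step 2: delete the normal data of the target triangle.
    masked = [[row[:] for row in ch] for ch in sample]
    for c in range(3):
        for y, x in target_pixels:
            masked[c][y][x] = 0.0
    # Step 3: estimate the deleted normals from the surrounding data.
    estimated = restorer(masked)
    # Step 4: insert the estimate into a copy of the original sample.
    restored = [[row[:] for row in ch] for ch in sample]
    for c in range(3):
        for y, x in target_pixels:
            restored[c][y][x] = estimated[c][y][x]
    return restored, True


# Stub models standing in for the trained networks.
sample = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(8)]
always_error = lambda s: 0.9
constant_fill = lambda s: [[[0.25, 0.25], [0.25, 0.25]] for _ in range(3)]
restored, changed = detect_and_restore(sample, always_error, constant_fill, [(0, 0)])
print(changed, restored[0][0])
```

Only the pixels of the target triangle are overwritten, so the normal data of the three adjacent triangles and the texture and mask channels are left as captured.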
[0087]
[0088] Meanwhile, a non-transitory computer-readable medium may be provided that stores a program for sequentially performing the machine learning method for detecting errors in data modeled with a mesh according to the present invention and the method for determining and restoring errors in a 3D face scan mesh using an AI model trained by the machine learning method.
[0089] A non-transitory readable medium refers to a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short moment, such as a register, cache, or memory. Specifically, the various applications or programs described above may be stored and provided on non-transitory readable media such as CDs, DVDs, hard disks, Blu-ray discs, USBs, memory cards, and ROMs.
[0090] Furthermore, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Various modifications can be made by those skilled in the art without departing from the essence of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or perspective of the present invention.
Claims
1. A machine learning method for error detection of data modeled as a mesh, the method comprising: a step of preparing normal and texture data by normal mapping the mesh; a step of selecting a triangle in UV space according to the normal mapping; a step of generating transformed data in which the normal and texture data are normalized into a predefined rectangle by reprojecting the selected triangle and three adjacent triangles into the rectangle; and a step of inputting the transformed data into a deep learning network.
2. The machine learning method of claim 1, wherein the transformed data includes 3-channel normal data reprojected into the rectangle, 3-channel texture data, 1-channel triangle mask data identifying a triangle, and 1-channel position mask data representing a region of the mesh.
3. The machine learning method of claim 2, wherein the deep learning network is a U-net.
4. The machine learning method of claim 3, wherein the mesh is a mesh that has been 3D modeled by scanning a face.
5. The machine learning method of claim 4, wherein the classification of valid or incorrect normal data is learned from a feature map extracted at the end of the contracting path of the U-net.
6. The machine learning method of claim 5, wherein the step of inputting the transformed data into the deep learning network includes a step of deleting the normal data of the selected triangle from the transformed data and inputting the result into the deep learning network, and the deleted normal data is learned from the output of the last layer of the expansion path of the U-net.
7. A method for determining and restoring errors in a 3D face scan mesh using an AI model trained by the machine learning method of claim 6, the method comprising: a step of inputting, into the AI model, transformation data in which one triangle and three adjacent triangles are reprojected into a predefined rectangle from the normal-mapped normal and texture data of the 3D face scan mesh; and a step of estimating validity or error from a feature map extracted from the deepest layer (bottleneck) of the AI model.
8. The method of claim 7, further comprising: a step of deleting the normal data of a target triangle estimated as an error by the AI model and inputting the result into the AI model; a step of estimating the normal data of the target triangle by the AI model; and a step of generating a restored image by inserting the estimated normal data.
9. A machine learning computing system for error detection of data modeled as a mesh, the computing system comprising: an input unit that receives normal and texture data obtained by normal mapping the mesh; and a processor that selects a triangle in UV space according to the normal mapping, generates transformed data in which the normal and texture data are normalized into a predefined rectangle by reprojecting the selected triangle and three adjacent triangles into the rectangle, and inputs the transformed data into a deep learning network to perform machine learning.
10. The computing system of claim 9, wherein the transformed data includes 3-channel normal data reprojected into the rectangle, 3-channel texture data, 1-channel triangle mask data identifying triangles, and 1-channel position mask data representing parts of the mesh.
11. The computing system of claim 10, wherein the deep learning network is a U-net, and the mesh is a mesh that has been 3D modeled by scanning a face.
12. The computing system of claim 11, wherein the processor performs an operation to learn the classification of valid or erroneous normal data from the feature map extracted at the bottleneck at the end of the contracting path of the U-net, deletes the normal data of the selected triangle from the transformed data and inputs the result into the deep learning network, and performs an operation to learn the deleted normal data from the output of the last layer of the expansion path of the U-net.