System and method for 3D image scanning

The structured light 3D scanner system addresses the inability of existing 2D camera systems to capture dynamic 3D surface changes in real time and with high precision. It achieves high-speed, high-resolution 3D surface image capture suitable for skin cancer detection and stereotactic body radiation therapy (SBRT).

CN116057348B · Active · Publication Date: 2026-06-23 · THE RES FOUND OF STATE UNIV OF NEW YORK

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: THE RES FOUND OF STATE UNIV OF NEW YORK
Filing Date: 2021-04-12
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing 2D camera systems cannot capture dynamic 3D surface changes in real time and with high precision, resulting in low efficiency, high cost, and insufficient accuracy in early detection of skin cancer and in stereotactic body radiation therapy (SBRT).

Method used

A structured light-based 3D scanner system uses a digital projector to project structured light and a digital camera system to capture the illuminated object. Computational algorithms process the resulting stripe images, thereby achieving automatic surface registration and high-precision reconstruction of 3D geometry and texture.

Benefits of technology

It achieves high-speed, high-resolution 3D surface image capture, which is suitable for skin cancer detection and stereotactic body radiation therapy (SBRT), improving detection accuracy and treatment safety.

Abstract

Systems and methods for 3D image scanners for real-time dynamic 3D surface imaging are disclosed. Embodiments describe a system and method comprising a first and/or second camera, a projector, and a processor. The projector projects structured light with a fringe pattern onto a 3D object; the processor is configured to extract a phase map and a texture image from the captured image and to calculate depth information from the phase map. Embodiments further describe methods and systems that determine the wrapped phase from the image using a Hilbert transform, generate the absolute phase from the wrapped phase using one or a combination of a quality-guided path-following algorithm, a two-wavelength phase unwrapping algorithm, and a Markov random field method, and generate a phase map from the absolute phase to determine depth information of the 3D object. Algorithms using conformal mapping, optimal transport mapping, and Teichmüller mapping are used to register and track the captured 3D geometry surfaces.

Description

[0001] Cross-references to related applications

[0002] This application relates to and claims priority to U.S. Patent Application No. 63/008,268, filed April 10, 2020, the entire disclosure of which is incorporated herein by reference.

[0003] Government funding

[0004] This invention was made with government support under grants CCF-0448399 and DMS-1418255 from the National Science Foundation. The government has certain rights in this invention.

Background Technology

[0005] A. Technical Field of the Invention

[0006] This invention generally relates to a method and system for a 3D image scanner for real-time dynamic 3D surface imaging, which uses projected structured light to reconstruct the depth and texture information of an object.

[0007] B. Description of the Related Art

[0008] 3D surface imaging is well-known in the industry. However, limitations in rendering quality, speed, and cost currently restrict the practical application of 3D surface imaging.

[0009] Skin cancer is the most common type of cancer in the United States, with more than 5.5 million new cases diagnosed in 2019. Approximately one in five Americans will develop skin cancer in their lifetime. Skin cancer incidence is steadily rising, while incidence of other cancers has declined over the same period. In particular, melanoma is the deadliest form of skin cancer, with its incidence more than doubling in the past 30 years. In a 2015 study comparing cancer treatment costs between 2007–2011 and 2002–2006, researchers from the CDC and the National Cancer Institute found that the average annual total cost of treating skin cancer increased by 126%, while the cost of treating other cancers increased by 25%.

[0010] Skin cancer is treatable if detected early. Studies have shown that sequential whole-body scans are an effective method for early detection of skin cancer, which can save lives, improve treatment outcomes, and reduce healthcare costs. Dermatologists are advised to have patients undergo whole-body scans every three, six, or twelve months, depending on risk factors. Only when patients adhere to this guideline can the chances of early detection of skin cancer be significantly increased. Dermatologists can effectively identify high-risk sites in the skin by comparing scans of the same patient taken at different times and detecting changes in these sites in sequential scans. However, existing digital imaging products using 2D cameras are inefficient and costly in achieving these goals, resulting in low adoption rates among dermatologists and their patients.

[0011] Stereotactic body radiation therapy (SBRT) is a high-dose cancer treatment method targeting tumors. The goal is to deliver the highest possible dose to kill cancer cells while minimizing exposure to healthy organs. Because extremely high radiation doses can be harmful to patients if cancer cells are not precisely targeted, SBRT requires patients to remain in the same position for each treatment session, and the target area must not move during treatment. Since each treatment session lasts from 30 minutes to 1 hour, this requirement presents a significant challenge for both the patients and the clinicians responsible for continuously monitoring patient position to ensure their safety.

[0012] Several technologies attempt to address this problem. Unfortunately, they all have drawbacks. Some cancer clinics have set up video monitors in treatment rooms and rely entirely on therapists to identify any real-time video movement. Therapists often need to monitor multiple patients receiving treatment simultaneously, which distracts them and further reduces the effectiveness of this approach. X-rays have been used to check overall alignment by matching skeletal anatomy. This results in increased radiation exposure for patients and cannot be used for continuous monitoring during treatment. Radiation oncologists also use lasers to identify skin marks or tattoos on patients. Studies have shown that skin marks or tattoos are unreliable in determining body position for patients with loose skin. Furthermore, depending on the area to be treated, patients may resist permanent marks / tattoos on their skin due to concerns about cancer-related stigma or aesthetic appeal.

[0013] Optical surface imaging is becoming increasingly popular in radiotherapy for patient setup and monitoring. It provides real-time feedback on the patient's position relative to a reference surface captured during treatment planning, allowing clinicians to assess and readjust the patient's setup within the room without the need for radiation or skin markers. However, currently available optical surface imaging systems use 2D cameras to acquire images, failing to capture dynamic 3D surface changes in the human body in real-time with high precision. Undetected alignment inconsistencies, such as hip or upper body rotation (e.g., in prostate or breast cancer treatment) or minute movements in the treatment area (e.g., in brain tumors), can lead to increased patient dose, prolonged setup time, and, most seriously, damage to healthy organs.

[0014] The widespread adoption of whole-body sequential imaging requires practical solutions to two technical challenges. First, it necessitates real-time, high-precision capture of the dynamic skin surface. While currently available high-resolution 2D cameras can capture the color and texture of human skin, they lack depth information about the skin surface. The second challenge for dermatologists is accurately identifying suspicious lesions and examining changes in lesion characteristics using images from 2D camera systems—a time-consuming and labor-intensive process. Due to the high demand for dermatological care from patients, most dermatologists do not have time to examine images generated by these imaging systems in a single scan, let alone compare them with previous scans. Developing reliable image registration methods to accurately compare sequential images captured at different times remains a significant technical challenge due to the 2D nature of these images.

[0015] Improved 3D image scanning systems and methods are beneficial in these and other applications, where high-speed, high-quality 3D imaging can overcome the limitations of previous systems.

Summary of the Invention

[0016] To address the shortcomings of the conventional methods described in the background section, exemplary embodiments of this system and method provide a 3D scanner method and system for real-time dynamic 3D surface imaging. This system and method achieve automatic surface registration and allow for the measurement and alignment of 3D objects with higher accuracy.

[0017] Embodiments of the systems and methods described herein provide systems and methods for 3D facial scanning. These systems and methods are high-speed, high-resolution 3D facial scanning systems capable of capturing geometry and texture through dynamic facial expressions. They are portable and easy to use, and feature accurate and robust geometry processing tools. These systems and methods are suitable for facial expression tracking in film and games, VR / AR content generation, and are particularly useful for melanoma detection, orthodontics, and plastic surgery.

[0018] This system and method encompasses hardware and software for the medical field, applicable to dermatologists, dentists, plastic surgeons, and others. It can also be used for security purposes, by government officials or police. Furthermore, it can be applied to facial expression capture systems, film / game research, VR / AR producers, and digital artists.

[0019] This system and method are based on structured light and include a digital camera system, a digital projector, and a computer programmed to operate the system in a novel manner. The projector projects a stripe pattern onto a 3D object, and the camera system captures an image of the object illuminated by the structured light. Each isophase line in the projected stripe pattern is distorted into a curve on the 3D object and projected onto the camera image. Based on the distortion of the isophase lines and the relative geometric relationships between the projector, camera, and the world, computational algorithms process the camera image to reconstruct the 3D geometry and texture.

[0020] The captured stripe image is processed to extract phase and texture images. The algorithm can calculate depth information from the phase image and recover the object's geometry. This 3D scanning system can capture facial surfaces, including those with dynamic expressions, at high resolution and high speed.
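The extraction of phase and texture from captured stripe images can be illustrated with a minimal sketch. It assumes the common three-step phase-shifting model, in which three images are captured with phase shifts of -2π/3, 0, and +2π/3; the text does not state the shift values, so these, the function name `decode_three_step`, and the definition of the texture as ambient plus modulation are assumptions made for illustration only.

```python
import numpy as np

def decode_three_step(i0, i1, i2):
    """Decode three fringe images I_k = A + B*cos(phi + shift_k).

    Assumed shifts (not stated in the source): -2*pi/3, 0, +2*pi/3.
    Returns the wrapped phase, ambient (A), modulation (B), and a texture image.
    """
    i0, i1, i2 = (np.asarray(a, dtype=float) for a in (i0, i1, i2))
    num = np.sqrt(3.0) * (i0 - i2)        # equals 3 * B * sin(phi)
    den = 2.0 * i1 - i0 - i2              # equals 3 * B * cos(phi)
    wrapped_phase = np.arctan2(num, den)  # wrapped into (-pi, pi]
    ambient = (i0 + i1 + i2) / 3.0        # the cosine terms cancel in the mean
    modulation = np.hypot(num, den) / 3.0
    texture = ambient + modulation        # assumed texture definition
    return wrapped_phase, ambient, modulation, texture

# Synthetic check: build three images from a known phase and decode them.
phi_true = np.linspace(-3.0, 3.0, 50)
images = [0.5 + 0.3 * np.cos(phi_true + s)
          for s in (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)]
phase, ambient, modulation, texture = decode_three_step(*images)
```

The wrapped phase obtained this way is ambiguous by multiples of 2π, so a phase unwrapping step is still needed before depth can be computed.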

[0021] This system and method allow for high-speed 3D surface image capture, which is useful in many applications, including scanning faces with dynamic expressions. The geometry processing software used in this system and method is more accurate and robust than traditional systems.

[0022] Exemplary embodiments provide computer-implemented systems and methods for 3D scanning. The system may include a projector configured to project structured light onto a 3D object. A grayscale camera may be provided and configured to capture a striped image of the object. The system may also provide a color camera configured to capture a color image of the object. A processor is preferably configured to process the striped image to extract a phase map and a texture image, to calculate depth information from the phase map, and to perform 3D surface reconstruction based on the depth information and the texture image.

[0023] According to an exemplary embodiment of the system, images are captured by a first camera for capturing a striped image of a three-dimensional object and a second camera for capturing a color texture image of the object. Preferably, the exposure cycles of the first and second cameras are synchronized. In one example, the first camera is triggered to capture an image in each shutdown cycle, while the second camera is triggered to capture an image in every three cycles.

[0024] According to a further exemplary embodiment, the structured light consists of sinusoidal stripe patterns, each stripe pattern having a channel; the stripe patterns are defocused patterns; the stripe patterns have a quality of less than 8 bits.
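As a rough illustration of sinusoidal stripe patterns carried in separate channels at a reduced bit depth, the sketch below packs three phase-shifted patterns into the red, green, and blue channels of one image. The phase shifts of -2π/3, 0, and +2π/3, the one-shift-per-channel layout, and the function name `make_fringe_rgb` are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def make_fringe_rgb(width, height, wavelength, bits=8):
    """Return a (height, width, 3) uint8 image of vertical sinusoidal fringes.

    Each color channel holds one phase-shifted pattern (assumed layout).
    `bits` is the pattern bit depth, at most 8 so values fit in uint8.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    x = np.arange(width)
    levels = 2 ** bits - 1  # largest gray level at this bit depth
    channels = []
    for shift in (-2 * np.pi / 3, 0.0, 2 * np.pi / 3):
        row = 0.5 + 0.5 * np.cos(2 * np.pi * x / wavelength + shift)
        quantized = np.round(row * levels).astype(np.uint8)
        channels.append(np.tile(quantized, (height, 1)))
    return np.stack(channels, axis=-1)

# Example: a 4-bit pattern with a 45-pixel wavelength.
pattern = make_fringe_rgb(width=90, height=4, wavelength=45, bits=4)
```

Lower bit depths leave fewer gray levels per fringe period, which is the trade-off behind using patterns of less than 8 bits for faster projection.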

[0025] According to a further exemplary embodiment, the processor generates a phase map by considering the intensity bias (ambient) component, the modulation component, and the wrapped phase of the fringe image. The processor determines the unwrapped phase from the wrapped phase and, using the Hilbert transform, can reproduce a smooth geometric surface, such as a face, from a single image. The processor can determine the unwrapped phase using a quality-guided path-following algorithm by repeating the following steps: selecting a first pixel; determining the wrapped phase Φ(x,y) of the first pixel; placing the pixels adjacent to the first pixel into a priority queue; and selecting from the priority queue the second pixel with the highest quality. The processor can also determine the unwrapped phase using a dual-wavelength phase unwrapping algorithm, in which the projector projects a first fringe pattern with a first wavelength λ1 and a second fringe pattern with a second wavelength λ2, where λ1 < λ2, and the processor determines the unwrapped phase from the two wrapped phases, one for each wavelength. Furthermore, the processor can use a Markov random field method to determine the unwrapped phase. Additionally, the dual-wavelength phase unwrapping algorithm can be combined with the Markov random field method to improve the quality of the unwrapped phase.
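The quality-guided path-following steps can be sketched as a flood fill driven by a priority queue. This is a minimal illustration rather than the patented implementation: the function name `quality_guided_unwrap`, the 4-neighbor connectivity, and the rule of unwrapping each pixel relative to the already-unwrapped neighbor that queued it are assumptions.

```python
import heapq
import numpy as np

def quality_guided_unwrap(wrapped, quality, seed):
    """Unwrap a 2D wrapped phase map, visiting high-quality pixels first."""
    h, w = wrapped.shape
    unwrapped = np.zeros((h, w), dtype=float)
    done = np.zeros((h, w), dtype=bool)
    unwrapped[seed] = wrapped[seed]
    done[seed] = True
    heap = []  # entries: (-quality, y, x, parent_y, parent_x)

    def push_neighbors(y, x):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not done[ny, nx]:
                # heapq is a min-heap, so negate quality to pop the best first.
                heapq.heappush(heap, (-float(quality[ny, nx]), ny, nx, y, x))

    push_neighbors(*seed)
    while heap:
        _, y, x, py, px = heapq.heappop(heap)
        if done[y, x]:
            continue
        # Wrap the difference to the parent into [-pi, pi], then accumulate.
        diff = wrapped[y, x] - wrapped[py, px]
        diff -= 2 * np.pi * np.round(diff / (2 * np.pi))
        unwrapped[y, x] = unwrapped[py, px] + diff
        done[y, x] = True
        push_neighbors(y, x)
    return unwrapped

# Synthetic check on a smooth phase ramp that spans many 2*pi periods.
yy, xx = np.mgrid[0:20, 0:30]
phase_true = 0.3 * xx + 0.2 * yy
phase_wrapped = np.angle(np.exp(1j * phase_true))
result = quality_guided_unwrap(phase_wrapped, np.ones((20, 30)), seed=(0, 0))
```

In practice the quality map would come from the data (for example, the fringe modulation), so that unreliable pixels are unwrapped last and their errors do not propagate.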

[0026] According to a further exemplary embodiment, the texture image is used by a processor to locate facial feature points and perform facial feature extraction using deep learning-based computer vision algorithms, such as face detection and facial landmark detection using a Single Shot Detector (SSD) structured network. A quality map and a mask of the facial skin region are generated from the texture image, and the quality map and mask are input by the processor into a phase unwrapping algorithm to determine the unwrapped phase.

[0027] According to a further exemplary embodiment, the processor converts the world coordinates of a point into camera coordinates. The processor can then convert the camera coordinates into camera projection coordinates. The processor can further convert the camera projection coordinates into distorted camera projection coordinates, and then convert the distorted camera projection coordinates into camera image coordinates.
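The chain of coordinate conversions can be sketched as follows. The sketch assumes a standard pinhole model with a radial and tangential lens distortion model of the kind commonly used in camera calibration; the parameter names (`fx`, `fy`, `cx`, `cy`, `k1`, `k2`, `p1`, `p2`) and the function name `project_point` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def project_point(xw, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Map a 3D world point to pixel coordinates (u, v)."""
    # 1) World coordinates -> camera coordinates (rigid motion).
    xc = R @ xw + t
    # 2) Camera coordinates -> normalized projection coordinates.
    x, y = xc[0] / xc[2], xc[1] / xc[2]
    # 3) Projection coordinates -> distorted projection coordinates.
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # 4) Distorted projection coordinates -> image (pixel) coordinates.
    return np.array([fx * xd + cx, fy * yd + cy])

# Example with an identity pose and no distortion.
uv = project_point(np.array([0.1, -0.2, 1.0]), np.eye(3), np.zeros(3),
                   fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

Calibration estimates the rotation `R`, translation `t` (extrinsic parameters) and the focal lengths, principal point, and distortion coefficients (intrinsic parameters) that appear in this chain.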

[0028] According to a further exemplary embodiment, the extrinsic and intrinsic parameters of the camera are calibrated using a target board. The target board may include a star-planet pattern comprising multiple larger circular stars, each surrounded by smaller circular planets, where each planet is a solid or hollow circle. The camera's extrinsic and intrinsic parameters can be calibrated as part of an optimization process. In one example, Zhang's algorithm and gradient descent are used to calibrate the camera's extrinsic and intrinsic parameters. The calibration of the camera's extrinsic and intrinsic parameters may consider the center position of each of the multiple stars as a variable in the optimization process.

[0029] According to a further exemplary embodiment, the distortion parameters are determined by the processor using Heikkilä's formula.

[0030] According to a further exemplary embodiment, the processor generates at least one point cloud based on the depth information and processes the point cloud to form a high-quality triangular mesh. The processor may further execute conformal geometry methods for image and shape analysis and real-time tracking applications. During the generation of the at least one point cloud, the ambient, modulation, and projector parameters can be used to estimate surface normal information. Persistent homology algorithms can also be used to compute handle loops and tunnel loops for topological denoising. Furthermore, conformal parameterization is performed, and Delaunay triangulation and/or centroidal Voronoi tessellation is applied to the output of the conformal parameterization to generate a high-quality triangular mesh.
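The first step of this pipeline, turning depth information into a point cloud, can be sketched by back-projecting each pixel of a depth map through a pinhole camera model. This covers only point cloud generation, not the meshing, denoising, or conformal parameterization steps; the function name `depth_to_point_cloud` and the intrinsic parameter names are assumptions for illustration.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (N, 3) array of 3D points.

    Pixels with non-finite or non-positive depth are skipped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack([x, y, z])

# Example: a flat surface 2 units away, with one missing measurement.
depth_map = np.full((4, 6), 2.0)
depth_map[0, 0] = np.nan
points = depth_to_point_cloud(depth_map, fx=100.0, fy=100.0, cx=2.5, cy=1.5)
```

Because the points inherit the pixel grid of the camera, neighboring pixels give a natural starting connectivity for later triangulation.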

[0031] According to yet another exemplary embodiment, images are captured from two different viewpoints to obtain stereo depth information, wherein the processor uses a Markov random field method to: i) determine the absolute phase of each pixel to determine depth information from the fringe pattern, and ii) perform a stereo pairing method to obtain stereo depth information. Furthermore, the depth information and stereo depth information are used as input to generate at least one point cloud.

[0032] According to a further exemplary embodiment, a first stripe image is captured at a first time and used by the processor to perform a first 3D surface reconstruction, and a second stripe image is captured at a second time and used by the processor to perform a second 3D surface reconstruction. The first and second 3D reconstructions are registered for comparison. The second 3D reconstruction is registered to the first 3D reconstruction using conformal geometry, by mapping each surface to a plane and comparing the resulting planar images. The comparison is determined using at least one optimal transport map. A Fast Fourier Transform (FFT) is applied to the at least one optimal transport map. Texture features and geometric features are extracted from the first and second stripe images. The comparison uses Teichmüller maps to enhance the alignment of the features extracted from the first and second stripe images and to reduce distortion.

[0033] According to a further exemplary embodiment, at least one prism is used to change the optical path of at least one of the projector or the cameras.

[0034] According to a further exemplary embodiment, the phase-height mapping is modeled as a polynomial function at each pixel of the camera, and the coefficients of the polynomial are estimated using an optimization algorithm during camera-projector calibration. Furthermore, the polynomial representation of the phase height is stored as a configuration file.
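A per-pixel polynomial phase-height mapping can be sketched as follows. The sketch assumes calibration data consisting of the phase measured at every pixel for several reference planes at known heights, and it uses ordinary linear least squares as a stand-in for the unspecified optimization algorithm. The function names `fit_phase_height` and `phase_to_height` and the default polynomial degree are assumptions.

```python
import numpy as np

def fit_phase_height(phases, heights, degree=2):
    """Fit height = sum_d c_d * phase**d independently at each pixel.

    phases:  (K, H, W) phase maps of K calibration planes.
    heights: (K,) known heights of those planes.
    Returns coefficients of shape (H, W, degree + 1), lowest order first.
    """
    powers = np.arange(degree + 1)
    # Per-pixel Vandermonde matrices, shape (H, W, K, degree + 1).
    vander = np.moveaxis(phases, 0, -1)[..., None] ** powers
    # Batched least squares via the pseudo-inverse (a stand-in optimizer).
    return np.linalg.pinv(vander) @ heights

def phase_to_height(phase, coeffs):
    """Evaluate the per-pixel polynomial on a phase map of shape (H, W)."""
    powers = np.arange(coeffs.shape[-1])
    return np.sum(coeffs * phase[..., None] ** powers, axis=-1)

# Synthetic calibration: each pixel has its own linear phase-height relation.
rng = np.random.default_rng(0)
offset = rng.uniform(-1.0, 1.0, (2, 3))
slope = rng.uniform(0.5, 1.5, (2, 3))
plane_heights = np.linspace(0.0, 10.0, 6)
plane_phases = (plane_heights[:, None, None] - offset) / slope
coeffs = fit_phase_height(plane_phases, plane_heights, degree=2)
```

The coefficient array could then be saved (for example with `np.save`) and reloaded at scan time, in the spirit of storing the polynomial representation as a configuration file.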

[0035] These and other advantages will be further described in the detailed description below.

Description of the Drawings

[0036] To gain a more complete understanding of the invention and its objects and advantages, reference is now made to the following description in relation to the accompanying drawings, wherein:

[0037] Figure 1 is a simplified flowchart illustrating a high-level operational example of the present system and method for 3D scanning.

[0038] Figure 2 is a simplified diagram depicting an exemplary embodiment of a 3D scanning system, showing a high-level system layout.

[0039] Figure 3 depicts the internal layout of an exemplary scanning system according to an exemplary embodiment of the present system.

[0040] Figure 4 depicts a side view of the internal layout of an exemplary scanning system according to an exemplary embodiment of the present system.

[0041] Figure 5 depicts a front view of the container of an exemplary scanning system, facing an object to be scanned, according to an exemplary embodiment of the present system.

[0042] Figure 6 is a diagram schematically illustrating the optical path of an exemplary scanner system according to an exemplary embodiment of the present system.

[0043] Figure 7 depicts a bottom view of the container of an exemplary scanning system, oriented towards an object to be scanned, according to an exemplary embodiment of the present system.

[0044] Figure 8 depicts an example of a stripe pattern that may be projected in an exemplary embodiment of the present system.

[0045] Figure 9 is a timing diagram illustrating an example of exposure time synchronization between a first camera and a second camera in an exemplary embodiment of the system.

[0046] Figure 10 is a timing diagram illustrating an example of synchronization of a first camera, a second camera, and a projector in an exemplary embodiment of the present system.

[0047] Figure 11 shows a camera coordinate system used for processing images in an exemplary embodiment of the system and method.

[0048] Figures 12A-12C depict examples of a striped image, a phase map, and a grayscale texture image generated according to exemplary embodiments.

[0049] Figures 13A-13C depict an example of three original stripe images of a stripe pattern with a wavelength of λ1 = 45, which can be used to reconstruct one frame of a 3D surface.

[0050] Figures 14A-14C depict an example of three original stripe images of a stripe pattern with a wavelength of λ2 = 48, which can be used to reconstruct one frame of a 3D surface.

[0051] Figures 15A-15C depict an example of an ambient image, a modulation image, and a texture image generated from the original striped images with a stripe pattern wavelength of λ1 = 45.

[0052] Figures 15D-15F depict an example of an ambient image, a modulation image, and a texture image generated from the original striped images with a stripe pattern wavelength of λ2 = 48.

[0053] Figures 16A-16H depict various images generated during different stages of the phase unwrapping process in an exemplary embodiment of the system and method.

[0054] Figures 17A-17C depict an example of images generated according to an exemplary embodiment of the system and method, including: a) a wrapped phase Φ1 with λ1 = 45, b) a wrapped phase Φ2 with λ2 = 48, and c) an unwrapped (absolute) phase obtained using a dual-wavelength phase unwrapping algorithm.

[0055] Figures 18A-18C depict examples of a reconstructed 3D geometry, a geometry with grayscale texture mapping, and a geometry with color texture mapping generated according to exemplary embodiments of the system and method.

[0056] Figure 19 depicts an example of a reconstructed facial surface with texture mapping, viewed from different angles, generated according to an exemplary embodiment of the system and method.

[0057] Figures 20A-20C depict an example of face detection and facial feature point extraction from a captured color texture image using the Ensemble of Regression Trees (ERT) algorithm, according to exemplary embodiments of the present system and method.

[0058] Figure 21 depicts an example of a geometric surface with a color texture generated according to an exemplary embodiment of the system and method.

[0059] Figures 22A-22C respectively depict examples of detecting a human facial region using an SSD structured network, finding facial landmarks, and performing facial feature extraction using the Ensemble of Regression Trees (ERT) algorithm in computer vision, according to exemplary embodiments of the system and method.

[0060] Figures 23A-23C depict an example of a geometric surface generated during the reconstruction of a facial skin surface using phase information and processed by a hole-filling algorithm, according to an exemplary embodiment of the present system and method.

[0061] Figure 24 is a flowchart depicting the 3D acquisition process of an exemplary embodiment of the present system and method.

[0062] Figure 25 shows a mathematical model of the pinhole camera used in an exemplary embodiment of the system and method.

[0063] Figure 26 depicts an example of a target board used for calibration in an exemplary embodiment of the present system and method.

[0064] Figure 27 illustrates the imaging relationship between the camera and the projector in an exemplary embodiment of the present system and method.

[0065] Figure 28 depicts a schematic diagram of an exemplary system for phase-height mapping calibration.

[0066] Figure 29 is a flowchart illustrating a camera calibration and point cloud generation process of an exemplary embodiment of the present system and method.

[0067] Figure 30 is a flowchart illustrating an exemplary surface reconstruction process in an exemplary embodiment of the system and method.

[0068] Figure 31 is a flowchart illustrating an exemplary shape analysis process in an exemplary embodiment of the present system and method.

Detailed Description

[0069] The present systems and methods for efficient, high-speed 3D image scanning are generally described with reference to the flowchart of Figure 1. In block 105, the system projects structured light onto a 3D object and acquires image data from the illuminated object. The acquired image data typically includes at least one grayscale stripe image and preferably also a color image. Based on the acquired image data, a phase map is generated in block 110, and a texture image is generated in block 115. This is illustrated, for example, in Figure 12, in which Figure 12A shows an exemplary striped image, Figure 12B shows an exemplary phase map, and Figure 12C shows an exemplary grayscale texture image. Returning to Figure 1, in block 120, the phase map and texture image, along with calibration data from the camera and projector systems, can be used to generate a textured point cloud representing the object in three-dimensional space. Based on the generated point cloud, surface reconstruction can be performed to generate a surface mesh in block 125. For example, Figure 19 illustrates a reconstructed facial surface, with its texture mapping, viewed from different angles, generated according to an exemplary embodiment of the system and method. In block 130 of Figure 1, shape analysis, including dynamic shape tracking, image analysis, and real-time tracking, can be applied as needed. Each process is described in further detail below.

[0070] Hardware. Figure 2 shows a simplified hardware layout of an exemplary embodiment of a structured light-based 3D scanning system. The system comprises a digital camera system 201, a digital projector 202, and a computer 203. The projector 202 projects structured light, such as a stripe pattern 205, onto a 3D object 204, and the camera system 201 captures an image of the object illuminated by the stripe pattern 205. As shown by 205, the projector can generate three different color channels (red, green, and blue) in each projection cycle. As will be described in further detail, although Figure 2 shows a single camera, the camera system 201 may include multiple cameras with different acquisition attributes and spatial orientations. For example, the camera system may include a grayscale camera and a color camera, two grayscale cameras for stereo imaging, or various combinations thereof.

[0071] As shown in reference 207, each isophase line 206 in the stripe pattern 205 is distorted into a curve on the 3D object 204 and projected onto a curve on the camera image 208. Based on the distortion of the isophase lines and the relative geometric relationships between the projector 202, the camera 201, and the world, the computer 203 digitally processes the camera image using computational algorithms to reconstruct the 3D geometry and texture of the object, as described below.

[0072] The devices and settings described in the exemplary embodiments herein are exemplary only and are not intended to be exhaustive. Other combinations of devices and device settings may be substituted, as will be apparent to those skilled in the art. For example, exemplary embodiments of the system and method may use a LightCrafter 4500 DLP projector to project a stripe pattern and a Basler acA640-750um camera to capture the stripe pattern. For example, the camera may be a grayscale camera with a frame rate of at least 180 frames per second (fps), a resolution of 640×480, and a maximum frame rate of 750 fps. The camera's pixel size may be, for example, 4.8 μm × 4.8 μm. Depth accuracy may be 0.2 mm or better. As will be apparent to those skilled in the art, other types of projectors and cameras may be used, including other types of digital micromirror devices (DMDs). The projector may use a visible light source, infrared light (to avoid disturbing the object being imaged, e.g., a patient in a medical application), or other types of electromagnetic radiation. Exemplary embodiments of this system and method may use an IEEE 1394 PCIe card for multi-camera systems and USB 3.0 for single-camera systems to ensure sufficient data transmission bandwidth. Furthermore, exemplary embodiments of this system and method may use solid-state drives (SSDs) to guarantee disk I/O speed and capacity.

[0073] When the projector projects the striped pattern, the camera used in the exemplary embodiment can be triggered once per shutdown pulse. In the exemplary embodiment, an 8-bit striped pattern image can be advantageously used; in this case, the system and method can capture up to 40 3D frames per second. The exemplary embodiment can instead advantageously use a 4-bit mode to capture moving objects; in this case, the system and method can capture up to 200 3D frames per second.

[0074] Exemplary embodiments may advantageously use a second (color) camera to enable color textures or vertex colors in the generated 3D mesh. The second camera does not need to be as fast as the first camera, which can be a monochrome camera, since generating the 3D mesh requires only one texture image; as described in the exemplary embodiments herein, only three striped grayscale images are needed to generate the 3D mesh. The second camera is preferably calibrated together with the scanning system. In exemplary embodiments of the system and method, for example, a Basler acA1300-200uc with a Computar M1614-MP2 F1.4 f16mm lens can be used as the second camera.

[0075] Exemplary embodiments of this system and method may use a second grayscale camera to combine stereo vision and structured light. Structured light may be based on a dual-wavelength phase-shifting method of interferometry. Stereo vision pairs two images captured from different viewpoints to obtain depth, which is much faster (at least three times faster) than structured light, but less accurate. Structured light encodes phase information by intensity and recovers depth from the phase information, which is slower than stereo vision but more accurate. Conventional 3D acquisition methods use either stereo vision or structured light. Leveraging the power of both methods, the scanner described in the exemplary embodiments can improve both speed and accuracy.
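The dual-wavelength idea can be illustrated with a minimal sketch that recovers the absolute phase from two wrapped phases. It assumes the wavelengths are measured in projector pixels, uses the example values λ1 = 45 and λ2 = 48 from the figure descriptions, and assumes the absolute phase starts at zero at the first pixel. The function name `unwrap_dual_wavelength` is illustrative, and noise handling (for example by a Markov random field) is omitted.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def unwrap_dual_wavelength(phi1, phi2, lam1, lam2):
    """Recover the absolute phase of the lam1 pattern (requires lam1 < lam2).

    phi1, phi2: wrapped phases of the two patterns.
    The phase difference behaves like a fringe with the longer equivalent
    wavelength lam_eq = lam1 * lam2 / (lam2 - lam1), which is unambiguous
    over lam_eq pixels.
    """
    lam_eq = lam1 * lam2 / (lam2 - lam1)
    phi_eq = np.mod(phi1 - phi2, TWO_PI)
    # Coarse absolute-phase estimate scaled to lam1, used to pick the fringe order.
    fringe_order = np.round((phi_eq * lam_eq / lam1 - phi1) / TWO_PI)
    return phi1 + TWO_PI * fringe_order

# Synthetic check over 640 pixels (less than lam_eq = 720).
x = np.arange(640)
abs_phase = TWO_PI * x / 45.0
wrapped1 = np.mod(abs_phase, TWO_PI)
wrapped2 = np.mod(TWO_PI * x / 48.0, TWO_PI)
recovered = unwrap_dual_wavelength(wrapped1, wrapped2, 45.0, 48.0)
```

With these values the equivalent wavelength is 45 × 48 / 3 = 720 pixels, so the two short-wavelength patterns together cover a 640-pixel-wide image without ambiguity.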

[0076] Exemplary embodiments of this system and method may further utilize one or more prisms to alter the optical path of one or both cameras and/or the projector, in order to reduce the thickness of the device containing the components that implement exemplary embodiments of this system and method. Prisms can be advantageously used to create physically compact scanner systems that may include the various components (e.g., projectors and cameras) used in the exemplary embodiments of the system and method described herein. Figures 3-7 show an exemplary scanner system. Figure 3 depicts the internal layout of an exemplary scanning system. 301 refers to a projector, 302 to a color camera, 303 to a grayscale camera, 304 to a prism that modulates the light emitted from the projector 301, 305 to a prism that modulates the light before it is captured by the color camera 302, and 306 to a prism that modulates the light before it is captured by the grayscale camera 303. In this example, prisms 305 and 306 are at a 45° angle. The optical axis of prism 304 is 186 mm away from the optical axis of prism 305, and the optical axis of prism 304 is 60 mm away from the optical axis of prism 306. Although these dimensions may vary depending on the given application, the spacing of the components must be known for calibration and image reconstruction, as described herein.

[0077] Figure 4 shows a side view of the internal layout of an exemplary scanning system. 401 refers to the projector's power connection, 402 refers to the projector's interface (e.g., an output trigger), 403 refers to the color camera's interface (e.g., an input trigger), 404 refers to the grayscale camera's interface (e.g., an input trigger), and 405 refers to the bevel of the projector's prism (shown as 304 in Figure 3). The bevel angle shown in Figure 4 is 57.53°. This bevel angle is merely exemplary, and other bevel angles of the projector's prism may also be used.

[0078] Figure 5 depicts a front view of the container of an exemplary scanning system, facing the object to be scanned. 501 refers to an opening in the container that allows a striped image to be projected onto the object from the projector 301, 502 refers to an opening in the container that allows light reflected from the object's surface to be captured by the color camera 302, and 503 refers to an opening in the container that allows light reflected from the object's surface to be captured by the grayscale camera 303.

[0079] Figure 6 is a schematic diagram illustrating the optical paths of an exemplary scanner system. 601 refers to the striped image generated by projector 301, which passes through the projector's prism 304 and then through opening 501 to illuminate object 604. 602 refers to the light captured by color camera 302, which reflects off the surface of object 604 and passes through opening 502 and the color camera's prism 305. 603 refers to the light captured by grayscale camera 303, which reflects off the surface of object 604 and passes through opening 503 and the grayscale camera's prism 306.

[0080] Figure 7 depicts a bottom view of the container of an exemplary scanning system, facing the object to be scanned. 701 refers to the power connection of the projector 301, 702 refers to the USB interface connection of the projector 301, 703 refers to the I/O connection of the color camera 302, 704 refers to the USB I/O connection of the grayscale camera 303, 705 refers to the threaded mount for attaching the exemplary scanning system to a fixture or surface, and 706 refers to the bevel of the prism 304 of the projector 301, which Figure 7 depicts as 57.53°. As previously stated, the bevel angle shown in the exemplary scanning system is merely exemplary, and other bevel angles can be used with the projector's prism.

[0081] A scanning system such as that shown in Figures 3-7 can be expanded by replicating the system and sharing system time during image acquisition. In this arrangement, there is a first scanning system with a projector 301, a color camera 302, and a grayscale camera 303, along with corresponding optical components, and a second scanning system with a projector 301', a color camera 302', and a grayscale camera 303', along with corresponding optical components. The processor then uses the first and second scanning systems in a time-sharing manner to improve the system's acquisition capability. An exemplary embodiment can project a single-wavelength, three-step phase-shifted fringe pattern onto an object to capture 3D information.

[0082] In exemplary embodiments of the system and method, the scanner can project a sinusoidal pattern (i.e., a fringe pattern) onto a target surface in a very short time. Figure 8 depicts an example of a possible projected fringe pattern. The color image has three channels (red, green, and blue), each carrying one fringe pattern. Depending on the primary geometry of the object being imaged, various orientations of the fringe pattern can be used, such as vertical or horizontal.

[0083] For example, patterns can be generated using the Digital Fringe Generation Technique. For instance, using a three-step phase-shift algorithm, each fringe pattern can be generated as an 8-bit grayscale image, and the pattern can be mathematically represented as:

[0084]

[0085]

[0086]

[0087] In these equations, λ represents the wavelength, i.e., the number of pixels per fringe period (the fringe spacing), and (i,j) represents the pixel index. For example, λ = 45 means that one fringe period occupies 45 pixels on the projector screen. This is typically a physical property of the projector screen.
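A minimal sketch of three-step fringe generation follows. It assumes the commonly used form I_k(i,j) = 127.5 [1 + cos(2πj/λ + δ_k)] with shifts δ_k of -2π/3, 0, and 2π/3; the exact expression, the ordering of the shifts, and the choice of fringes that vary along the column index j are assumptions, because the pattern equations are not reproduced in this text. The function name `make_fringe_patterns` is hypothetical.

```python
import math


def make_fringe_patterns(width, height, wavelength):
    """Return three 8-bit fringe patterns (lists of rows), shifted by 2*pi/3 each.

    Assumed form (not taken from the source equations):
        I_k(i, j) = 127.5 * (1 + cos(2*pi*j / wavelength + delta_k))
    """
    shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)  # assumed ordering
    patterns = []
    for shift in shifts:
        # Vertical fringes: the intensity depends only on the column index j.
        row = [
            int(round(127.5 * (1.0 + math.cos(2.0 * math.pi * j / wavelength + shift))))
            for j in range(width)
        ]
        patterns.append([row[:] for _ in range(height)])
    return patterns


# One fringe period spans 45 projector pixels, as in the lambda = 45 example.
patterns = make_fringe_patterns(width=90, height=4, wavelength=45)
```

With λ = 45 and a 90-pixel-wide pattern, each row contains exactly two fringe periods, and at every pixel the three patterns sum to roughly 3 × 127.5 because the three cosine terms cancel.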

[0088] For example, a DLP projector (as opposed to an LCD or LCoS projector) can be advantageously used to project a fringe pattern onto a target surface. A DLP projector generates three distinct color channels (red, green, and blue) in each projection cycle, allowing three digital fringe projection (DFP) patterns to be combined into a single color image, with each pattern stored in one channel, thus increasing projection speed by a factor of three. However, DLP projectors can exhibit relatively large phase errors when using the red channel, likely due to its longer turn-off time. The camera is triggered with each projector turn-off pulse: if the red channel's turn-off time is longer than that of the blue and green channels, more ambient light enters the camera, reducing the signal-to-noise ratio (SNR) and potentially degrading the captured phase quality. To address this, exemplary embodiments of this system and method can use a defocused pattern instead of a conventional DFP pattern, significantly improving phase quality. Furthermore, the use of a DLP projector provides an alternative that trades a slight loss in reconstruction quality for a significant increase in scanning speed. The present systems and methods can use lower-bit-depth fringe patterns (e.g., 4-bit) so that all six patterns can be stored in a single color image, which speeds up the system by reducing buffering and projection time. In fact, using 4-bit fringe patterns can make the system 5-6 times faster than using 8-bit fringe patterns.
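The idea of storing six 4-bit patterns in one 24-bit color image can be sketched as follows. The layout used here (two patterns per 8-bit channel, one in the high nibble and one in the low nibble) is only one possible assumption; the text does not specify how the patterns are arranged, and the helper names are hypothetical.

```python
def quantize_to_4bit(pattern_8bit):
    """Reduce an 8-bit pattern (values 0..255) to 4 bits (values 0..15)."""
    return [[value >> 4 for value in row] for row in pattern_8bit]


def pack_six_patterns(patterns_4bit):
    """Pack six 4-bit patterns into one RGB image of (r, g, b) tuples.

    Assumed layout: channel ch holds pattern 2*ch in its high nibble and
    pattern 2*ch + 1 in its low nibble.
    """
    if len(patterns_4bit) != 6:
        raise ValueError("expected exactly six patterns")
    height, width = len(patterns_4bit[0]), len(patterns_4bit[0][0])
    image = []
    for r in range(height):
        row = []
        for c in range(width):
            row.append(tuple(
                (patterns_4bit[2 * ch][r][c] << 4) | patterns_4bit[2 * ch + 1][r][c]
                for ch in range(3)
            ))
        image.append(row)
    return image


def unpack_six_patterns(image):
    """Invert pack_six_patterns."""
    height, width = len(image), len(image[0])
    patterns = [[[0] * width for _ in range(height)] for _ in range(6)]
    for r in range(height):
        for c in range(width):
            for ch, value in enumerate(image[r][c]):
                patterns[2 * ch][r][c] = value >> 4
                patterns[2 * ch + 1][r][c] = value & 0x0F
    return patterns


# Synthetic 8-bit test patterns (arbitrary values, for illustration only).
six_patterns = [
    quantize_to_4bit([[(17 * p + 31 * c) % 256 for c in range(8)] for _ in range(2)])
    for p in range(6)
]
packed = pack_six_patterns(six_patterns)
restored = unpack_six_patterns(packed)
```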

[0089] Camera synchronization. As previously described, in order to include color textures or vertex colors within the 3D mesh generated in the exemplary embodiments of this system and method, the scanning system may use a color camera in addition to a grayscale camera. To eliminate fringe interference in the texture, the exposure time of the second camera should cover a complete cycle of the sinusoidal patterns projected onto the object. An exemplary embodiment may advantageously use a three-step phase-shift method. In one example, the color camera is triggered every three turn-off pulses, such that the exposure time of the second camera is three times the exposure time of the first camera (which may be a grayscale camera). Both the color and grayscale cameras can be triggered by the same projector, so the camera exposure cycles are automatically synchronized. Figure 9 shows an exemplary embodiment of the system and method in which the exposure time of a first camera (e.g., a "grayscale camera") 9001 is synchronized with that of a second camera (e.g., a "color camera") 9002. The second camera is triggered once (9006) for every three triggers of the first camera (9003, 9004, 9005). Figure 10 shows an example of synchronization of a first camera, a second camera, and a projector in an exemplary embodiment of the system and method.

[0090] Geometry and phase. Figure 11 graphically illustrates the camera coordinate system, which exemplary embodiments can use for image processing. In the camera coordinate system, z (11001) represents depth and S (11002) represents the surface of an object. The surface S (11002) is represented as a depth function:

[0091] z(x,y)=h(x,y), (2)

[0092] where (x, y) represents the spatial coordinates of the camera image. In the example, a sinusoidal fringe pattern is projected onto the surface, and the spatial wavelength of the fringe pattern is λ (11003). The angle between the projector's optical axis and the camera's optical axis (the z-axis) is θ (11004). Therefore, the wavelength of the projected fringe pattern on the plane z = 0 is λ_x, defined as:

[0093]

[0094] Fix a point p_o (11007) on the surface S (11002). p_o and p_2 (11005) share the same (x,y) coordinates, and p_o and p_1 (11006) lie on the same phase line. With p_1 = (x1, y1, 0), p_2 = (x2, y2, 0), and the depth of p_o equal to h(x2,y2), the following relationship can be derived:

[0095]

[0096] where the phase value at p_k is the absolute phase, k = 1, 2. The following equation can then be derived:

[0097]

[0098] as well as

[0099] Therefore, this system and method can calculate the depth information h(x,y) from the phase.
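As a worked sketch of how depth follows from phase, the relations below assume parallel projection, a reference plane z = 0, and fringes that vary along x. The symbol φ_k for the absolute phase at p_k and the sign convention are assumptions, because the original equations are not reproduced in this text.

```latex
\lambda_x = \frac{\lambda}{\cos\theta}, \qquad
\varphi_k = \frac{2\pi x_k}{\lambda_x}, \quad k = 1, 2,
\\[4pt]
x_2 - x_1 = h(x_2, y_2)\,\tan\theta
\;\Longrightarrow\;
h(x_2, y_2)
  = \frac{\lambda_x\,(\varphi_2 - \varphi_1)}{2\pi\,\tan\theta}
  = \frac{\lambda\,(\varphi_2 - \varphi_1)}{2\pi\,\sin\theta}.
```

Under these assumptions, the depth at a camera pixel is proportional to the difference between the phase of the reference plane and the phase observed on the object at that pixel.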

[0100] Phase shift: fringe image. Exemplary embodiments of this system and method can reconstruct 3D information from fringe images captured by a camera using a phase shift method. Figure 12A depicts an exemplary fringe image. Figure 12B depicts an exemplary phase map. Figure 12C depicts an exemplary grayscale texture image. Based on the phase map, the algorithms described herein can calculate depth information and recover the geometric coordinates of the object.

[0101] The following basic formula is used to model the stripe image:

[0102]

[0103] where x and y are spatial coordinates, I'(x,y) is the intensity bias (generally the ambient light), and I"(x,y) is half of the peak-to-valley intensity modulation, which is the intensity of the light from the projector. Φ(x,y) is the phase difference that controls the sinusoidal variation relative to the reference wavefront. If x and y are fixed, there are three unknowns in equation (5) to be solved. For each pixel of the fringe image, the intensity is given directly by the gray value. Therefore, since there are three unknowns in equation (5), three fringe images are sufficient to reconstruct a 3D frame. The speed and resolution of the 3D frames are thus determined by the speed and accuracy of the camera and projector.

[0104] In other words, the target object should be nearly static because x and y are fixed, meaning either that the object is not moving or that the projector and camera are operating at very high frame rates. In practice, exemplary embodiments can use denoising and pixel tracking methods to reduce the impact of moving objects. Figures 13A-13C depict an example of raw fringe images with three fringe patterns at wavelength λ1 = 45, which can be used to reconstruct a 3D frame of the aforementioned 3D surface, and Figures 14A-14C depict an example of raw fringe images with three fringe patterns at wavelength λ2 = 48, which can likewise be used to reconstruct a 3D frame of the aforementioned 3D surface.

[0105] Ambient light, modulation, and texture. Exemplary embodiments of this system and method may use a three-step phase-shifting algorithm and two different wavelengths to unwrap the phase and recover the absolute phase.

[0106] If the phase shift is δ = 2π / 3, then the three fringe patterns can be defined as:

[0107]

[0108] For convenience, I1, I2, and I3 refer to I1(x,y), I2(x,y), and I3(x,y), respectively. Φ(x,y) can be solved as:

[0109]

[0110] The average intensity can be calculated as follows:

[0111]

[0112] Furthermore, data modulation can be calculated as follows:

[0113]

[0114] Finally, a texture without stripes can be generated as follows:

[0115] I_t(x,y) = I'(x,y) + I"(x,y). (10)
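The per-pixel quantities above can be sketched as follows. The sketch assumes the common three-step convention I_1 = I' + I" cos(Φ - 2π/3), I_2 = I' + I" cos(Φ), I_3 = I' + I" cos(Φ + 2π/3); the ordering of the shifts is an assumption, since the pattern equations are not reproduced in this text, and the function name `three_step_decode` is hypothetical.

```python
import math


def three_step_decode(i1, i2, i3):
    """Decode one pixel of three phase-shifted fringe images.

    Assumed convention: the shifts are -2*pi/3, 0, +2*pi/3 for i1, i2, i3.
    Returns (wrapped phase, ambient I', modulation I'', gamma, texture I_t).
    """
    ambient = (i1 + i2 + i3) / 3.0                 # average intensity I'
    sin_term = math.sqrt(3.0) * (i1 - i3)          # equals 3 * I'' * sin(phase)
    cos_term = 2.0 * i2 - i1 - i3                  # equals 3 * I'' * cos(phase)
    phase = math.atan2(sin_term, cos_term)         # wrapped to (-pi, pi]
    modulation = math.hypot(sin_term, cos_term) / 3.0  # fringe modulation I''
    gamma = modulation / ambient if ambient > 0 else 0.0  # data modulation
    texture = ambient + modulation                 # fringe-free texture, eq. (10)
    return phase, ambient, modulation, gamma, texture


# Synthetic pixel with known ambient, modulation, and phase.
true_ambient, true_modulation, true_phase = 120.0, 80.0, 1.0
shift = 2.0 * math.pi / 3.0
i1 = true_ambient + true_modulation * math.cos(true_phase - shift)
i2 = true_ambient + true_modulation * math.cos(true_phase)
i3 = true_ambient + true_modulation * math.cos(true_phase + shift)
decoded = three_step_decode(i1, i2, i3)
```

The data modulation γ lies between 0 and 1 for physically valid pixels, which is why it can serve as a per-pixel quality value.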

[0116] Figures 15A-15C depict an example of ambient, modulation, and texture images generated from original fringe images with fringe pattern wavelength λ1 = 45, and Figures 15D-15F depict an example of ambient, modulation, and texture images generated from original fringe images with fringe pattern wavelength λ2 = 48.

[0117] Hilbert transform. Subtracting the ambient component from the original image gives:

[0118] I_k(x,y) - I'(x,y) = I"(x,y) cos[Φ(x,y) + 2kπ / 3].

[0119] After applying the Hilbert transform, we obtain the following equation:

[0120]

[0121] Therefore, the wrapped phase can be recovered from a single image using the following calculation:

[0122]

[0123] Traditional phase-shifting algorithms require at least three images to calculate the wrapped phase. By using the Hilbert transform, the present methods can calculate the phase from only a single image. This increases the scan speed by a factor of three and significantly improves the robustness of the system.
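The single-image recovery can be illustrated on one image row. The sketch below assumes the ambient component has already been subtracted, computes the Hilbert transform through the analytic signal using a plain discrete Fourier transform (written out so the example needs only the standard library), and takes the wrapped phase as atan2 of the transformed row over the original row. The sign convention and the helper names are assumptions; real rows are not exactly periodic, so errors near the image borders should be expected.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        total = 0j
        for t in range(n):
            total += values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        result.append(total / n if inverse else total)
    return result


def wrapped_phase_from_single_row(row_minus_ambient):
    """Wrapped phase of I'' * cos(phase) via the analytic signal."""
    n = len(row_minus_ambient)
    spectrum = dft(row_minus_ambient)
    # Keep DC, double positive frequencies, zero negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    half = n // 2
    if n % 2 == 0:
        weights[half] = 1.0
        for k in range(1, half):
            weights[k] = 2.0
    else:
        for k in range(1, half + 1):
            weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    # Real part: original row; imaginary part: its Hilbert transform.
    return [math.atan2(z.imag, z.real) for z in analytic]


# Synthetic row: two full fringe periods of wavelength 45 pixels, offset 0.3 rad.
true_phase = [2.0 * math.pi * j / 45.0 + 0.3 for j in range(90)]
row = [50.0 * math.cos(p) for p in true_phase]
recovered = wrapped_phase_from_single_row(row)
```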

[0124] Phase unwrapping. Assuming the surface is continuous, the absolute phase can be recovered from the wrapped phase. As described herein, exemplary embodiments can use a quality-guided path-following phase unwrapping algorithm for dynamic surfaces with low curvature and high-speed acquisition. Exemplary embodiments of the system and method can also use a dual-wavelength phase unwrapping algorithm for surfaces with complex geometries and slow deformation. Exemplary embodiments can further unwrap the wrapped phase using a Markov random field (MRF) based algorithm, which is advantageous for capturing noisy shapes but requires intensive computation. For this purpose, the MRF algorithm can be implemented on a GPU with parallel optimization algorithms (such as min-cut/max-flow). For a large field of view, the system may further use a synchronized dual-camera system. In this case, structured light and stereo matching algorithms can be implemented to fuse geometric and texture data. For static shapes, exemplary embodiments of the invention can use multi-level Gray codes.

[0125] Quality-guided path following. An exemplary embodiment may utilize a quality-guided path-following algorithm. The facial skin region located in the face detection and feature extraction steps defines a mask. The modulation calculated as previously described represents the quality of each pixel and can be used to define a quality map.

[0126] In the quality-guided path-following algorithm used in the exemplary embodiments of this system and method, the algorithm selects a seed pixel, uses its wrapped phase as the absolute phase, and places all of its neighboring pixels within the mask in a priority queue. At each step, the algorithm selects the highest-quality pixel in the queue, finds its neighboring pixels, unwraps their phases to absolute phases, and then adds those neighboring pixels to the queue. The algorithm repeats this process until all pixels within the mask have been unwrapped.
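The steps above can be sketched with a binary heap as the priority queue. This is a minimal illustration under stated assumptions, not the patented implementation: a pixel is unwrapped relative to the already-unwrapped neighbor that reaches it first, 4-connectivity is used, and the function names are hypothetical.

```python
import heapq
import math

TWO_PI = 2.0 * math.pi


def unwrap_relative(wrapped_value, reference_absolute):
    """Add the multiple of 2*pi that brings wrapped_value closest to the reference."""
    wraps = round((reference_absolute - wrapped_value) / TWO_PI)
    return wrapped_value + TWO_PI * wraps


def quality_guided_unwrap(wrapped, quality, mask, seed):
    """Unwrap a 2D wrapped-phase grid inside a mask, highest quality first."""
    rows, cols = len(wrapped), len(wrapped[0])
    absolute = [[None] * cols for _ in range(rows)]
    seed_r, seed_c = seed
    absolute[seed_r][seed_c] = wrapped[seed_r][seed_c]  # seed: wrapped = absolute
    heap = []

    def unwrap_and_enqueue_neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and absolute[nr][nc] is None:
                absolute[nr][nc] = unwrap_relative(wrapped[nr][nc], absolute[r][c])
                # heapq is a min-heap, so push the negated quality.
                heapq.heappush(heap, (-quality[nr][nc], nr, nc))

    unwrap_and_enqueue_neighbors(seed_r, seed_c)
    while heap:
        _, r, c = heapq.heappop(heap)
        unwrap_and_enqueue_neighbors(r, c)
    return absolute


# Synthetic example: a smooth phase ramp, wrapped, with one masked-out pixel.
rows, cols = 6, 20
truth = [[0.5 * c + 0.3 * r for c in range(cols)] for r in range(rows)]
wrapped = [[math.atan2(math.sin(t), math.cos(t)) for t in row] for row in truth]
quality = [[1.0 / (1.0 + abs(c - 10)) for c in range(cols)] for _ in range(rows)]
mask = [[True] * cols for _ in range(rows)]
mask[2][5] = False
unwrapped = quality_guided_unwrap(wrapped, quality, mask, seed=(0, 0))
```

Pixels outside the mask stay `None`, which mirrors the text's use of the mask to exclude unreliable regions.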

[0127] Dual-wavelength. Exemplary embodiments of this system and method can also utilize a dual-wavelength algorithm to unwrap the wrapped phase. The phase recovered by the described phase-shifting method lies in the range (-π, π), meaning that if the wavelength of the fringe pattern is not large enough, phase discontinuities of 2kπ will appear on the object surface. In an exemplary embodiment of the invention implementing the dual-wavelength algorithm, two fringe patterns with different wavelengths (λ1 and λ2, λ1 < λ2) can be used instead of a single very wide fringe pattern to capture the original images of the object. The dual-wavelength algorithm measures the same object surface using the two wavelengths, and the two phase maps are defined as follows:

[0128]

[0129]

[0130] The difference between the two phase diagrams is represented as follows:

[0131]

[0132] here:

[0133] This quantity is the equivalent wavelength between λ1 and λ2. If it is sufficiently large, the absolute phase Φ can be unwrapped as follows:

[0134] φ = ΔΦ_12 mod 2π. (15)

[0135] Therefore, if the equivalent wavelength is large enough to cover the entire image range, the modulo operator will not change the phase, so the resulting phase is the same as the unwrapped phase. This method is faster than the quality-guided path-following algorithm, but it introduces more noise.
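A minimal sketch of the dual-wavelength step follows, using the standard equivalent wavelength λ1 λ2 / |λ2 - λ1| (720 pixels for λ1 = 45 and λ2 = 48). The first returned value corresponds to equation (15). The second step, which uses the coarse phase to pick the wrap count of the finer λ1 phase, is a common refinement that is not stated in the text and is included only as an assumption; the function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase):
    """Wrap a phase value into (-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def equivalent_wavelength(lambda1, lambda2):
    return lambda1 * lambda2 / abs(lambda2 - lambda1)


def dual_wavelength_unwrap(phi1, phi2, lambda1, lambda2):
    """Return (coarse equivalent-wavelength phase, refined absolute phase for lambda1)."""
    lambda12 = equivalent_wavelength(lambda1, lambda2)
    coarse = (phi1 - phi2) % TWO_PI            # equation (15)
    # Assumed refinement: scale the coarse phase to the lambda1 fringe and use
    # it only to choose the integer wrap count of the less noisy phi1.
    estimate = coarse * lambda12 / lambda1
    wraps = round((estimate - phi1) / TWO_PI)
    return coarse, phi1 + TWO_PI * wraps


lambda1, lambda2 = 45.0, 48.0
columns = range(720)  # one full equivalent wavelength
results = [
    dual_wavelength_unwrap(
        wrap(TWO_PI * x / lambda1), wrap(TWO_PI * x / lambda2), lambda1, lambda2
    )
    for x in columns
]
```

Within one equivalent wavelength the coarse phase increases monotonically from 0 toward 2π, so no further unwrapping is needed as long as the measured range fits inside that span.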

[0136] Markov random field method. Exemplary embodiments can use the Markov random field method to unify the stereo matching and phase unwrapping algorithms. In the phase-shifting structured light method described in exemplary embodiments of this system and method, the absolute phase is proportional to the height information. Only the wrapped phase, i.e., the absolute phase modulo 2π, can be obtained from the image. The difference between the two is an integer multiple of 2π, and that integer is called the wrap count. Recovering the absolute phase from the wrapped phase is a key step in the pipeline. The Markov random field method models the wrap count of each pixel as an integer-valued random variable, and the wrap counts of all image pixels form a random field. Each random variable is influenced by its neighbors. Phase unwrapping is equivalent to optimizing the total energy of the random field, which can be solved by transforming the problem into a maximum-flow problem on a graph and then using a max-flow/min-cut algorithm. Compared with techniques such as path following and dual-wavelength unwrapping, the graph-cut method for phase unwrapping is more robust to noise and produces higher fidelity. In addition, the Markov random field method can provide a robust stereo matching method that searches for the optimal matching pixels along the epipolar line. Therefore, this system and method can use an efficient and stable integer optimization method for both phase unwrapping and stereo matching.
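To make the energy formulation concrete, the sketch below treats a single image row as a chain-structured random field over integer wrap counts and minimizes the sum of squared differences between neighboring absolute phases. It deliberately swaps the max-flow/min-cut solver described above for exact dynamic programming, which is only possible because a 1D chain has no loops; the energy, the label range, and the function name are assumptions for illustration.

```python
import math

TWO_PI = 2.0 * math.pi


def chain_mrf_unwrap(wrapped, max_wraps):
    """Return wrap counts minimizing sum_i (abs_i - abs_{i-1})^2 on a 1D chain.

    abs_i = wrapped[i] + 2*pi*k_i, with k_i in [-max_wraps, max_wraps] and
    the first pixel pinned to k = 0.
    """
    labels = range(-max_wraps, max_wraps + 1)
    cost = {k: (0.0 if k == 0 else math.inf) for k in labels}
    back_pointers = []
    for i in range(1, len(wrapped)):
        new_cost, pointers = {}, {}
        for k in labels:
            best_prev, best_value = None, math.inf
            for k_prev in labels:
                jump = (wrapped[i] + TWO_PI * k) - (wrapped[i - 1] + TWO_PI * k_prev)
                value = cost[k_prev] + jump * jump
                if value < best_value:
                    best_prev, best_value = k_prev, value
            new_cost[k] = best_value
            pointers[k] = best_prev
        cost = new_cost
        back_pointers.append(pointers)
    # Trace the optimal labeling back from the cheapest final label.
    k = min(cost, key=cost.get)
    counts = [k]
    for pointers in reversed(back_pointers):
        k = pointers[k]
        counts.append(k)
    counts.reverse()
    return counts


# Synthetic smooth row that wraps several times.
truth_row = [0.4 * i for i in range(40)]
wrapped_row = [math.atan2(math.sin(t), math.cos(t)) for t in truth_row]
wrap_counts = chain_mrf_unwrap(wrapped_row, max_wraps=3)
absolute_row = [w + TWO_PI * k for w, k in zip(wrapped_row, wrap_counts)]
```

On a 2D grid the same energy has loops between neighbors, which is why the text turns to graph-cut optimization rather than dynamic programming.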

[0137] Phase unwrapping process. Figures 16A-16H depict various images generated at each stage of the phase unwrapping process. First, using the fringe images, equation (6) is solved using equations (8), (9), and (7) to obtain the average intensity I', the fringe modulation I", and the phase difference Φ. After solving for these variables, the data modulation γ = I" / I' and the texture I_t = I' + I" can be obtained directly. Once the unwrapped phase is obtained using one or more of the above phase unwrapping techniques, the value at each pixel can be mapped to real-world coordinates. Figures 17A-17C depict the generated images, where a) is the wrapped phase Φ1 with λ1 = 45, b) is the wrapped phase Φ2 with λ2 = 48, and c) is the unwrapped (absolute) phase obtained using the dual-wavelength phase unwrapping algorithm.

[0138] Phase to geometry. For each pixel (u_c, v_c) on the camera image plane with its unwrapped phase, the corresponding world coordinates (X_w, Y_w, Z_w) can be recovered using the extrinsic and intrinsic parameters of the camera and projector. The pixel depth can be approximated by the polynomial in equation (34):

[0139]

[0140] where all coefficients a0(u_c, v_c), a1(u_c, v_c), ... are estimated by the calibration process.
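A per-pixel polynomial of this kind can be evaluated with Horner's rule. The sketch assumes the depth is a polynomial in the unwrapped phase, depth = a0 + a1·phase + a2·phase² + ..., with one coefficient list per camera pixel; the exact form of equation (34) is not reproduced in this text, and the coefficient values and names below are hypothetical.

```python
def depth_from_phase(coefficients, phase):
    """Evaluate a0 + a1*phase + a2*phase**2 + ... with Horner's rule."""
    depth = 0.0
    for a in reversed(coefficients):
        depth = depth * phase + a
    return depth


# Hypothetical calibration table: one coefficient list per pixel (u_c, v_c).
calibration = {
    (0, 0): [1.0, 2.0, 3.0],
    (1, 0): [0.5, 0.0, 0.25],
}
unwrapped_phase = {(0, 0): 2.0, (1, 0): 4.0}
depth_map = {
    pixel: depth_from_phase(calibration[pixel], unwrapped_phase[pixel])
    for pixel in calibration
}
```

Because the coefficients are precomputed during calibration, the per-frame cost is a handful of multiply-add operations per pixel, which is consistent with the real-time goal stated for the polynomial approximation.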

[0141] Figure 18A depicts an example of a facial geometry surface, Figure 18B depicts an example of a geometric surface with a grayscale texture mapping, and Figure 18C depicts an example with a color texture mapping; all of these can be generated by exemplary embodiments of the system and method described herein. Figure 19 depicts an example of a reconstructed facial surface with texture mapping, seen from different viewpoints. Figure 20 depicts face detection and facial landmark extraction on a color texture image using the Ensemble of Regression Trees (ERT) algorithm, and Figure 21 depicts an example of a geometric surface with color texture.

[0142] Face detection and facial landmark extraction. The texture image obtained as described in the exemplary embodiments can be used to detect the facial region of a person, and can then be further used to locate facial landmarks and perform facial feature extraction using deep-learning-based computer vision algorithms, as shown in Figure 20A and Figure 22. Figures 22A-22C depict examples that use an SSD-structured network to detect human facial regions and the Ensemble of Regression Trees (ERT) algorithm to locate facial landmarks and extract facial features.

[0143] Facial feature extraction algorithms can locate the eye, nose, mouth, and eyebrow regions. The image formation model in equation (5) assumes that the surface has Lambertian reflectance properties. Human skin is approximately Lambertian, but the surface of the eye is smooth and specular. Therefore, this model may not be applicable to the eye surface, and the phase information reconstructed for pixels in the eye region may be unreliable. Facial skin regions can be used to define quality maps and masks, which can be used in phase unwrapping algorithms such as the quality-guided path-following algorithm, the mask-cut algorithm, and Flynn's minimum discontinuity algorithm.

[0144] Facial feature extraction algorithms can be used to locate the eye region. Exemplary embodiments of this system and method can calculate phase information of the facial skin surface excluding the eye region. Figures 23A-23C depict an example of geometric surfaces generated during the reconstruction of a facial skin surface using phase information and processed by the hole-filling algorithm described in the exemplary embodiments herein. As shown in Figures 23A and 23B, the facial skin surface is initially generated without the eye region. The eye region can then be filled in using different algorithms. For example, the eye region can be reconstructed by calculating a harmonic surface with Dirichlet boundary conditions, as shown in Figure 23C. The phase map can also be median filtered to improve smoothness.
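The harmonic hole filling can be sketched on a depth grid: pixels inside the hole are repeatedly replaced by the average of their four neighbors while the surrounding known pixels stay fixed, which solves the discrete Laplace equation with Dirichlet boundary values. The grid representation, the Gauss-Seidel iteration, the fixed iteration count, and the requirement that the hole does not touch the image border are all assumptions of this sketch.

```python
def fill_hole_harmonic(depth, hole, iterations=200):
    """Fill hole pixels with a discrete harmonic surface.

    depth: 2D list of floats; hole: 2D list of bools (True where depth is missing).
    Known pixels act as Dirichlet boundary values. The hole must not touch
    the border of the grid.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]
    hole_pixels = [(r, c) for r in range(rows) for c in range(cols) if hole[r][c]]
    for r, c in hole_pixels:
        filled[r][c] = 0.0  # arbitrary starting guess
    for _ in range(iterations):
        # Gauss-Seidel sweep: each hole pixel becomes the mean of its 4 neighbors.
        for r, c in hole_pixels:
            filled[r][c] = 0.25 * (
                filled[r - 1][c] + filled[r + 1][c] + filled[r][c - 1] + filled[r][c + 1]
            )
    return filled


# Synthetic example: a tilted plane (which is harmonic) with a 3x3 hole.
size = 9
plane = [[2.0 * c + 3.0 * r for c in range(size)] for r in range(size)]
hole = [[3 <= r <= 5 and 3 <= c <= 5 for c in range(size)] for r in range(size)]
damaged = [[0.0 if hole[r][c] else plane[r][c] for c in range(size)] for r in range(size)]
repaired = fill_hole_harmonic(damaged, hole)
```

A plane is reproduced exactly because linear functions are harmonic; a real eye region would instead be replaced by the smoothest surface that matches the surrounding skin.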

[0145] Feedback between low-level and high-level vision. Exemplary embodiments of this system and method can utilize feedback between low-level and high-level vision. Conventional image systems employ a bottom-up approach, which first processes low-level tasks (including denoising, edge detection, segmentation, and feature extraction) and then processes high-level tasks (including face detection and pose estimation). Exemplary embodiments can use feedback from high-level vision to improve low-level vision and update high-level tasks. For example, low-level segmentation can be corrected and refined by high-level face detection. As another example, phase unwrapping can be enhanced by extracting eye and mouth regions, while stereo pairing can be refined by pose estimation, etc. High-level vision tasks can be implemented using deep learning methods, such as SSD-structured networks for face detection and ensembles of regression trees for facial feature point extraction, while low-level tasks primarily rely on conventional 3D vision algorithms, such as Markov random fields.

[0146] 3D acquisition process. Referring now to Figure 24, a flowchart illustrates an example of a 3D acquisition process of an exemplary embodiment of the system and method, and further illustrates the phase map generation in step 110 of Figure 1. A projector projects a fringe pattern onto a 3D object, and a camera captures an image of the object illuminated by the structured light. Each phase line in the projector fringe image is distorted into a curve on the 3D object and projected onto the camera image. At 24001 and 24002, the system in the exemplary embodiment can capture fringe images from a grayscale camera. The system can also capture a color image 24003 of the object from a color camera. Fringe images 24001 and 24002 can be processed in the corresponding blocks 24004 and 24005, using equation (8) or using the Hilbert transform algorithm, by a processor of a computer system as described herein. The processor can process each raw fringe image to obtain the modulation components (24006, 24009), ambient components (24007, 24010), and wrapped phase components (24008, 24011) of the raw fringe image. To unwrap the phase, embodiments of the system may use a dual-wavelength phase unwrapping algorithm 24012, a quality-guided path-following algorithm (not shown), or another phase unwrapping process that yields a noisy unwrapped phase (24016), and/or a Markov random field phase unwrapping algorithm 24020, to calculate the absolute phase and generate a phase map of the image, thereby obtaining depth information. For example, in block 24012, the wrapped phase components 24008 and 24011 of the two fringe images are fed to the dual-wavelength phase unwrapping process. This process produces a noisy unwrapped phase component 24016.

[0147] As further shown in Figure 24, the phase unwrapping 24011 can also be combined with other steps, such as deep learning techniques 24013, segmentation (e.g., using image cropping method 24015), and/or edge detection using sophisticated filters or similar techniques 24017. Such a process can make certain applications of the 3D scanning system in the exemplary embodiments more efficient for specific 3D acquisition tasks. The exemplary embodiments can advantageously utilize feedback from high-level tasks (e.g., face detection, facial landmark extraction, and pose estimation) to process low-level tasks (e.g., denoising, edge detection, segmentation, and phase unwrapping). Low-level algorithms, such as segmentation and phase unwrapping, can be corrected and refined by high-level face detection. Phase unwrapping can be enhanced by extracting eye and mouth regions, while stereo matching can be refined by pose estimation, etc.

[0148] After applying these processes, the wrapped phase can be unwrapped using the Markov random field (“MRF”) phase unwrapping algorithm 24020 described above to calculate the absolute phase 24021, thereby generating a phase map of the image and obtaining depth information. Each channel may employ MRF phase unwrapping processes 24024 and 24019, which receive the noisy unwrapped phase component from block 24016, the outputs from high-level and low-level processes (e.g., blocks 24013, 24015, 24017), and the corresponding wrapped phase components 24008 and 24011 to generate the final unwrapped phase components 24020 and 24021.

[0149] Camera and projector calibration. Camera and projector model. For camera and projector calibration, exemplary embodiments of this system and method can use a nonlinear distortion camera model. One aspect of this process is modeling the mapping from phase to height and its inverse. Because these mappings are highly nonlinear, exemplary embodiments of this system and method use high-order polynomials to approximate the mapping for each pixel of the camera. All approximation coefficients are calculated during the calibration process and stored in a configuration file. This approach provides both accuracy and real-time computation.

[0150] The mathematical models of cameras and projectors can be described using the following pipeline:

[0151]

[0152] The top row displays the image formation process from the camera, while the bottom row displays the image formation process from the projector.

[0153] The first mapping is the transformation from world coordinates to camera coordinates, consisting of a rotation and a translation, as shown in equation (17).

[0154] The second mapping is the pinhole camera projection, which maps the camera coordinates to the camera projection coordinates, as shown in equation (18).

[0155] The third mapping is the camera distortion mapping in equation (21), which transforms the camera projection coordinates into distorted camera projection coordinates. The distortion includes the radial distortion in equation (19) and the tangential distortion in equation (20).

[0156] The fourth mapping is the projection transformation in equation (22), which maps the distorted camera projection coordinates to the camera image coordinates.

[0157] The inverse of the distortion mapping maps the distorted camera projection coordinates back to the camera projection coordinates; it is given by the Heikkila formula in equation (23).

[0158] By the principle of reversibility of light paths, a projector can be regarded as an inverse camera. If a plane π in the world is fixed, called the virtual reference plane, then the mapping from the virtual reference plane coordinates (x_π, y_π) to the camera image coordinates (u_c, v_c) is bijective, and the mapping to the projector image coordinates (u_p, v_p) is also bijective:

[0159]

[0160] where the projector mappings are defined similarly. The composite mapping is given as ψ: (u_c, v_c) → (u_p, v_p).

[0161] Model of a pinhole camera. Figure 25 shows the mathematical model of a pinhole camera. (X_w, Y_w, Z_w) (25001, 25002, and 25003, respectively) are world coordinates, (X_c, Y_c, Z_c) (25004, 25005, and 25006, respectively) are camera coordinates, and (u, v) are image coordinates. If a point p is (X_w, Y_w, Z_w) in the world coordinate system and (X_c, Y_c, Z_c) in the camera coordinate system, then

[0162]

[0163] Where R is the rotation matrix from the world coordinate system to the camera coordinate system, and T is the translation vector.

[0164] The projection of the camera's projected coordinates (ignoring distortion) is determined by the following:

[0165]

[0166] Distortion model. In reality, the camera lens introduces distortion, so the image does not follow the ideal pinhole camera model, and distortion needs to be considered during calibration. Typically, distortion includes radial distortion and tangential distortion. Let (x, y) represent the projected coordinates on the image plane, for example (x_c, y_c). The radial distortion (δ_xr, δ_yr) can be represented as:

[0167]

[0168] where r^2 = x^2 + y^2 and k1, k2, k3, ... are radial distortion parameters. The tangential distortion (δ_xt, δ_yt) can be represented as:

[0169]

[0170] Where p1 and p2 are tangential distortion parameters.

[0171] After considering camera distortion, the distorted camera projection coordinates (x_d, y_d) of point p can be represented as:

[0172]

[0173] After the projection transformation, the camera image coordinates of point p can be expressed as:

[0174]

[0175] where f_u and f_v are the effective focal lengths along the u and v directions, respectively, s is the skew parameter of the coordinate axes, and (u0, v0) are the coordinates of the principal point (i.e., the intersection of the camera optical axis and the image plane).
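The forward projection chain described above (rigid transform, pinhole projection, distortion, then the projection transformation to pixels) can be sketched for a single point. The distortion terms use the widely used radial and tangential forms associated with the Heikkila model; because the original equations (17) through (22) are not reproduced in this text, those exact expressions are assumptions, as are the function name and the sample parameter values.

```python
def project_point(point_world, rotation, translation, intrinsics, distortion):
    """Map a world point to camera image coordinates (u, v).

    rotation: 3x3 nested list R; translation: length-3 list T;
    intrinsics: (f_u, f_v, s, u0, v0); distortion: (k1, k2, k3, p1, p2).
    """
    # World -> camera coordinates: X_c = R * X_w + T.
    x_cam, y_cam, z_cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    # Pinhole projection (no distortion yet).
    x, y = x_cam / z_cam, y_cam / z_cam
    # Assumed distortion model: radial plus tangential terms.
    k1, k2, k3, p1, p2 = distortion
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx_radial, dy_radial = x * radial, y * radial
    dx_tangential = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy_tangential = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    x_d = x + dx_radial + dx_tangential
    y_d = y + dy_radial + dy_tangential
    # Distorted projection coordinates -> pixel coordinates.
    f_u, f_v, s, u0, v0 = intrinsics
    return f_u * x_d + s * y_d + u0, f_v * y_d + v0


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sample_intrinsics = (800.0, 820.0, 0.0, 320.0, 240.0)   # hypothetical values
no_distortion = (0.0, 0.0, 0.0, 0.0, 0.0)
some_distortion = (-0.2, 0.05, 0.0, 0.001, -0.002)      # hypothetical values

u, v = project_point([0.1, -0.2, 2.0], identity, [0.0, 0.0, 0.0],
                     sample_intrinsics, no_distortion)
u_axis, v_axis = project_point([0.0, 0.0, 1.5], identity, [0.0, 0.0, 0.0],
                               sample_intrinsics, some_distortion)
```

A point on the optical axis lands on the principal point regardless of the distortion parameters, which is a quick sanity check for any implementation of this chain.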

[0176] Camera calibration. Camera calibration aims to identify key camera parameters, including:

[0177] • Extrinsic parameters: rotation R, translation T;

[0178] • Intrinsic parameters: effective focal lengths f_u, f_v; skew parameter s; principal point (u0, v0); and

[0179] • Distortion parameters: radial distortion parameters k1, k2, and k3; and tangential distortion parameters p1 and p2.

[0180] In fact, intrinsic parameters also include distortion parameters. Generally, k3 and s are small enough that they are usually treated as zero in the equation. Extrinsic and intrinsic parameters can be expressed as:

[0181] μ = (R_c, T_c, f_u, f_v, s, u0, v0),

[0182] and all distortion parameters can be expressed as:

[0183] λ = (k1, k2, k3, p1, p2).

[0184] Target board. Figure 26 shows the star-planet pattern on the target board used for calibration. There are 7x5 star systems, each star surrounded by 9 planets. Each planet is either a solid circle or a hollow circle. Each hollow circle represents 1, and each solid circle represents 0, so the 9 planets encode a binary string. For example, the planet system 26001 in the top row and second column represents the string 111100000. Two binary strings are equivalent if they differ only by a circular permutation. An ellipse detector is used to detect the centers of the stars, and the binary strings are used to distinguish the different star systems. As shown in Figure 26, the center of the star in the upper left corner is the origin of the world coordinate system, the horizontal and vertical directions are the X_w and Y_w axes, and the direction perpendicular to the target plane is the Z_w axis.
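Equivalence up to circular permutation can be checked by reducing each 9-bit string to a canonical rotation. The sketch below picks the lexicographically smallest rotation as the canonical form; that choice and the function names are assumptions made for illustration, since the text does not say how the strings are compared.

```python
def canonical_code(bits):
    """Return the lexicographically smallest rotation of a binary string."""
    rotations = [bits[i:] + bits[:i] for i in range(len(bits))]
    return min(rotations)


def same_star_system(code_a, code_b):
    """True if the two planet codes differ only by a circular permutation."""
    return len(code_a) == len(code_b) and canonical_code(code_a) == canonical_code(code_b)


reference = "111100000"   # the example string for planet system 26001
rotated = "000011110"     # the same ring of planets read from another start
different = "111000000"   # a different star system
```

Reading the planets starting from a different position around the star rotates the string, so comparing canonical forms makes the identification independent of where the reading starts.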

[0185] During the calibration process, exemplary embodiments of this system and method fix the position of the target board plane π and treat the target board's local coordinate system as the world coordinate system. The plane equation is Z_w = 0, and the center of each star is known and represented as

[0186]

[0187] During the calibration process, the image coordinates of the center of each star are captured and represented as:

[0188] {(u_1, v_1), (u_2, v_2), ..., (u_n, v_n)}.

[0189] Based on the coordinate mapping from the known star centers to {(u_i, v_i)}, exemplary embodiments can estimate the extrinsic and intrinsic parameters μ.

[0190] The projector can be viewed as an inverse camera. Exemplary embodiments of this system and method can project a sinusoidal fringe pattern onto the target board. For the center of each star on the target board, the corresponding projector image point can be extracted from the fringe image by using the unwrapped phase at that location.

[0191] Internal and external parameter estimation. Image forming mapping (also known as forward projection) depends on both internal and external parameters.

[0192]

[0193] The calibration problem is formulated as an optimization problem:

[0194]

[0195] An exemplary embodiment can first use Zhang's algorithm to estimate μ (the extrinsic and intrinsic parameters); second, with μ fixed, E(λ,μ) is optimized with respect to λ; third, with λ fixed, E(λ,μ) is optimized with respect to μ. Zhang's algorithm is further described in Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000, which is incorporated herein by reference. By alternating the optimization, the optimal result can be obtained according to the following equation:

[0196] (λ*, μ*) = argmin_{λ,μ} E(λ, μ).

[0197] The gradient descent algorithm can be used for optimization:

[0198]

[0199] Similar algorithms can be used to estimate the internal and external parameters of a projector.
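The alternating scheme described above (optimize λ with μ fixed, then μ with λ fixed, each by gradient descent) can be sketched as follows. This is only an illustration under stated assumptions: the energy is supplied by the caller, gradients are approximated by central finite differences, and the function names, step size, and iteration counts are hypothetical. The initialization by Zhang's algorithm and the actual reprojection energy E(λ, μ) are not reproduced here.

```python
import numpy as np


def numerical_gradient(energy, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a vector."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (energy(x + step) - energy(x - step)) / (2.0 * eps)
    return grad


def alternating_descent(energy, lam0, mu0, outer=20, inner=50, lr=0.1):
    """Alternately minimize energy(lam, mu) over lam and over mu.

    lam0 and mu0 are initial guesses (in calibration, mu0 would come from an
    initial estimate such as Zhang's algorithm, which is not shown here).
    """
    lam = np.asarray(lam0, dtype=float).copy()
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(outer):
        # Step 1: mu fixed, gradient descent on lam.
        for _ in range(inner):
            lam = lam - lr * numerical_gradient(lambda l: energy(l, mu), lam)
        # Step 2: lam fixed, gradient descent on mu.
        for _ in range(inner):
            mu = mu - lr * numerical_gradient(lambda m: energy(lam, m), mu)
    return lam, mu


def toy_energy(lam, mu):
    """Toy convex stand-in for E(lam, mu); NOT the calibration energy."""
    return (lam[0] - 1.0) ** 2 + (mu[0] + 2.0) ** 2 + 0.5 * (lam[0] - mu[0]) ** 2
```

For the toy energy, the minimizer can be found analytically at λ = 0.25 and μ = -1.25, which gives a simple way to check that the alternation converges. A real calibration energy is non-convex and badly scaled across parameters, so step sizes and stopping criteria would need more care than in this sketch.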

[0200] Distorted calibration board. In reality, the target board used for calibration is not an ideal plane; in practice, there will always be some small distortion. Therefore, in the optimization process, exemplary embodiments of this system and method also treat the centers of the star systems as variables. For each center, the ideal world coordinates are known and the Z_w component is 0. With the coordinate deviation of each center introduced as an unknown, the actual world coordinates of the center can be represented as:

[0201]

[0202] The deviation can be expressed as:

[0203]

[0204] Then, the energy of the system can be expressed as:

[0205]

[0206] Preferably, calibration is performed by minimizing energy based on the following formula:

[0207]

[0208] Treating the center of the star system as a variable in this way improves calibration accuracy.

[0209] A model of phase-height mapping. Back projection: Heikkilä's formula. The inverse of the forward projection is called back projection. Because the radial distortion equation (19) and the tangential distortion equation (20) are nonlinear, the transformation from (x, y) to (x_d, y_d) in equation (21) cannot be inverted directly. Iterative methods or polynomial approximations may be needed to invert equation (21).
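The iterative route mentioned above can be sketched as a fixed-point iteration. Equations (19) to (21) are not reproduced in this passage, so the sketch assumes the commonly used radial and tangential distortion model with parameters k1, k2, k3, p1 and p2 in normalized image coordinates; the function names and the iteration count are hypothetical, and this is not the polynomial approximation discussed in the following paragraphs.

```python
import numpy as np


def distort(xy, k1, k2, k3, p1, p2):
    """Map ideal normalized coordinates (x, y) to distorted (x_d, y_d).

    Assumed model: radial factor plus tangential offsets (common convention).
    """
    xy = np.asarray(xy, dtype=float)
    x, y = xy[..., 0], xy[..., 1]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return np.stack([x * radial + dx, y * radial + dy], axis=-1)


def undistort(xy_d, k1, k2, k3, p1, p2, iterations=20):
    """Invert distort() by fixed-point iteration, starting from (x_d, y_d)."""
    xy_d = np.asarray(xy_d, dtype=float)
    xy = xy_d.copy()
    for _ in range(iterations):
        x, y = xy[..., 0], xy[..., 1]
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        # Re-solve x_d = x * radial + dx for x using the current estimate.
        xy = np.stack([(xy_d[..., 0] - dx) / radial,
                       (xy_d[..., 1] - dy) / radial], axis=-1)
    return xy
```

The iteration converges quickly when the distortion is mild, which is the usual case near the image center; for strong distortion a closed-form approximation or a Newton-type solver may be preferable.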

[0210] The inverse transform can be computed using Heikkilä's polynomial approximation:

[0211]

[0212] Where G is defined as

[0213]

[0214] and It is the back projection distortion parameter.

[0215] Phase distribution on the virtual reference plane. Exemplary embodiments of this system and method can calculate the phase distribution on the virtual reference plane based on the following description. Assuming the camera parameters λ and μ of the system coordinate system are known, the back projection can be calculated as follows:

[0216]

[0217] For a point (u_c, v_c) on the camera image plane, the coordinates (x, y) can be obtained using equations (22) and (23), and the following formula can then be obtained from equation (17) in terms of these coordinates:

[0218]

[0219] If the height Z_w is fixed, the following formula can be derived from the equation above:

[0220]

[0221] Where B is represented as:

[0222]

[0223] The above equations enable exemplary embodiments of the system to generate the mapping from (u_c, v_c) to (X_w, Y_w) on a virtual reference plane with a fixed height Z_w.

[0224] Similarly, after all internal, external, and distortion parameters of the projector have been determined through the aforementioned calibration process, this method can use equations (17), (18), (19), (20), (21), and (22) to map a point (X_w, Y_w) on the virtual reference plane with fixed Z_w to the projector image coordinates (u_p, v_p). The composition gives the mapping from (u_c, v_c) to (u_p, v_p).

[0225]

[0226] During calibration, the projector image coordinates (u_p, v_p) can be represented by the fringe phase, which gives the phase distribution on the corresponding virtual reference plane.

[0227] If the virtual reference plane is fixed at Z_w = z, the mapping gives the phase at each (u_c, v_c), which is represented as

[0228]

[0229] Phase-height mapping model. An exemplary embodiment may include a phase measuring profilometry (PMP) system in which the projector can be viewed as a camera based on the principle of optical path reversibility. Assuming no distortion, the following relationship can be obtained:

[0230]

[0231] This allows the following relationships to be represented:

[0232]

[0233] In the above relationship, x_c = X_c / Z_c, y_c = Y_c / Z_c, x_p = X_p / Z_p, and y_p = Y_p / Z_p. The first three relationships mentioned above can form the following system of linear equations:

[0234]

[0235] The system of linear equations can be further simplified to

[0236]

[0237] The following equations can be derived from simplification.

[0238]

[0239] This equation can also derive the following equation:

[0240]

[0241] Equation (28) enables the exemplary embodiment of the system to generate a distortion-free phase-height mapping. If the condition relating C3(x_c, y_c) and C4(x_c, y_c) x_p holds (most PMP systems typically meet this condition), the following equation can be obtained:

[0242]

[0243] As shown in Figure 27, which illustrates the imaging relationship between the camera and the projector in an exemplary embodiment of this system and method, the straight line O_c P 27001 is projected onto the projector image plane as line l 27004; therefore, each point (x_c, y_c) on the camera image plane 27002 corresponds to a line on the projector image plane 27003. If distortion exists, the projection of the line is a complex curve. This distortion can be complex and difficult to model, and can be approximated using the following polynomial equation:

[0244]

[0245] Equation (31) can be substituted into equation (30) to obtain the following polynomial expression:

[0246]

[0247] In the PMP system described in the exemplary embodiment, the phase of the fringes is linearly distributed on the projector image plane; therefore, the phase can be represented as a linear function of the projector image coordinate u_p, and x_pd can likewise be represented as a linear function of u_p. With this consideration, equation (32) becomes:

[0248]

[0249] As mentioned above, the following mapping relationship is known to be non-linear:

[0250]

[0251] The inverse relationship is also non-linear. If it is approximated by Heikkilä's formula, it can be expressed as (x_c, y_c) = f(u_c, v_c), and the above equation becomes:

[0252]

[0253] In this equation, the coefficient k_i as a function of (u_c, v_c) can be represented as a_i(u_c, v_c), the phase can likewise be represented as a function of (u_c, v_c), and the depth Z_w as a function of (u_c, v_c) can be represented as Z_w(u_c, v_c). Using these representations, this method can generate the final phase-height mapping formula as follows:

[0254]

[0255] Equation (34) enables the polynomial approximation to be used for phase-height mapping.

[0256] Considering the external and internal parameters (including distortion parameters) of the camera and projector, this method can calculate the phase distribution on the virtual reference plane, which is expressed as f_z(u_c, v_c) in equation (28). A set of depths z_1, z_2, ..., z_n can be selected, and the phase at each pixel on the camera plane can be calculated for each depth. Next, the parameters a_0(u_c, v_c), a_1(u_c, v_c), ..., a_n(u_c, v_c) in equation (34) can be calculated using the following optimization equation:

[0257]
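A minimal sketch of estimating the per-pixel coefficients a_i(u_c, v_c) is given below. It assumes that the optimization is an ordinary least-squares fit of a polynomial in the unwrapped phase at each pixel, which is one plausible reading and not necessarily the exact form of equation (35). The array layout and the function names `fit_phase_height` and `phase_to_height` are hypothetical.

```python
import numpy as np


def fit_phase_height(phases, depths, degree=3):
    """Fit Z_w = sum_i a_i * phase**i independently at every pixel.

    phases: array of shape (K, H, W), phase on K virtual reference planes.
    depths: array of shape (K,), the depths z_1..z_K of those planes.
    Returns coefficients of shape (degree + 1, H, W), lowest order first.
    """
    phases = np.asarray(phases, dtype=float)
    depths = np.asarray(depths, dtype=float)
    _, height, width = phases.shape
    coeffs = np.empty((degree + 1, height, width))
    for r in range(height):
        for c in range(width):
            # Vandermonde matrix with columns 1, phase, phase**2, ...
            design = np.vander(phases[:, r, c], degree + 1, increasing=True)
            coeffs[:, r, c] = np.linalg.lstsq(design, depths, rcond=None)[0]
    return coeffs


def phase_to_height(phase, coeffs):
    """Evaluate the fitted polynomial at every pixel of a phase map (H, W)."""
    phase = np.asarray(phase, dtype=float)
    powers = np.stack([phase**i for i in range(coeffs.shape[0])])
    return (coeffs * powers).sum(axis=0)
```

The per-pixel loop keeps the sketch readable; a practical implementation would likely vectorize the fit or run it on a GPU, since the coefficients only need to be computed once per calibration.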

[0258] Phase-height mapping calibration. Assuming the camera's internal parameters are constant, an exemplary embodiment can perform phase-height mapping calibration using the following procedure:

[0259] • Place the planar target at different positions within the measurement volume, denoted as π_1, π_2, ..., π_k. Estimate the transformation from the plane to the camera image plane at each position; the transformations are represented as (R_1, T_1), (R_2, T_2), ..., (R_k, T_k).

[0260] • For each pair (R_i, T_i) and (R_j, T_j), construct a system of linear equations for the camera's internal and external parameters, and solve for the internal and external parameters μ.

[0261] • Calculate the distortion parameter λ, as well as the optimized internal and external parameters, using optimization methods.

[0262] • With a fixed depth z, calculate the phase distribution f_z(u_c, v_c) on the virtual reference plane using, for example, equation (28).

[0263] • At each pixel (u_c, v_c), approximate the phase-height mapping by a polynomial of the unwrapped phase, for example using equation (34); its coefficients can be estimated using optimization methods, such as equation (35).

[0264] An exemplary system diagram for phase-height mapping calibration is shown in Figure 28.

[0265] Camera calibration process. Referring now to Figure 29, a flowchart illustrating the camera calibration and point cloud generation process of an exemplary embodiment is provided, which further explains an example of the operation in block 120 of Figure 1. At 29001, calibration fringe images are captured by the cameras in the system using, for example, the target board shown in Figure 26. At 29002, the camera calibration process can begin calibrating various camera parameters, including internal and external parameters. The parameters of the devices (e.g., a grayscale camera, a color camera, and a projector) can be calibrated at 29002 using optimization methods (e.g., Zhang's algorithm and the gradient descent algorithm) and various raw images obtained from the calibration images of, for example, the target board. Note that the camera can be a single camera or multiple cameras arranged for stereo acquisition. After the unwrapped phase is obtained from the raw images at 29004, a phase-height map can be generated at 29005 using the back-projection calibration technique based on Heikkilä's formula. From the phase-height map, the system can generate a point cloud including depth information at 29006 based on the fringe calibration images. Meanwhile, the parameters obtained at 29003 can be used to obtain texture coordinates at 29007. In addition, when two cameras are used, the system can generate left ambient and modulation information at 29011 and right ambient and modulation information at 29012 from the images obtained from the cameras with two different views. The stereo matching algorithm at 29013 can use the information from 29011 and 29012 to calculate depth information based on stereo image capture. Based on the ambient, modulation, and projector parameters input at 29008, surface normal information can be estimated at 29009. Based on the estimated surface normal information 29009, the texture coordinates 29007 of the fringe calibration image, the point cloud 29006 generated from the fringe calibration image, and the stereo matching information 29013, the system can generate a very accurate point cloud representing the depth image of the object captured by the cameras, taking into account all of the inputs described in the figure.

[0266] Computational conformal geometry methods. The software used in exemplary embodiments of this system and method is based on computational conformal geometry methods. Further background information on computational conformal geometry methods is described in X. Gu, R. Guo, F. Luo, J. Sun and T. Wu, A discrete uniformization theorem for polyhedral surfaces II, Journal of Differential Geometry, 109(3):431-466, 2018 and in X. Gu, F. Luo, J. Sun and T. Wu, A discrete uniformization theorem for polyhedral surfaces, Journal of Differential Geometry, 109(2):223-256, 2018, both of which are incorporated herein by reference. Conformal geometry methods transform 3D geometry tasks into corresponding 2D image tasks by deforming 3D surfaces onto planar domains while preserving local shape. To match and register two 3D surfaces, exemplary embodiments can use the Riemann mapping algorithm in conformal geometry to map the surfaces to planar disks and then directly compare their planar images. This is easier and faster than conventional methods. The conformal geometry methods described herein can handle real-world surfaces with complex topologies and geometries, mapping them to one of three canonical shapes: spherical, Euclidean, or hyperbolic. Quasi-conformal geometry methods, in turn, can be used to map planar images with various types of constraints and objectives.

[0267] For example, the conformal flattening of a 3D shape to a plane can be found using the Ricci flow algorithm, which deforms the Riemannian metric proportionally to the current curvature. In this way, the curvature evolves according to a reaction-diffusion process and eventually becomes constant. The mapping with minimum elastic distortion energy is modeled as a harmonic mapping, which can be obtained using a nonlinear heat flow method. The mapping with minimum angular distortion is a Teichmüller mapping, which can be obtained by searching for special Beltrami coefficients in the space of holomorphic differentials. Further background information on this topic is described in X. Yu, N. Lei, Y. Wang, X. Gu, Intrinsic 3D Dynamic Surface Tracking Based on Dynamic Ricci Flow and Teichmüller Map, International Conference on Computer Vision 2017, which is incorporated herein by reference. The area-preserving mapping can be computed using optimal transport mapping. Further background information on the use of optimal transport mapping is described in X. Gu, F. Luo, J. Sun and S.-T. Yau, Variational principles for Minkowski type problems, discrete optimal transport, and discrete Monge-Ampere equations, Asian Journal of Mathematics (AJM), 20(2):383-398, 2016, which is incorporated herein by reference. Conformal flattening and surface registration algorithms can be used for colorectal cancer screening. Furthermore, this system and method can implement deep learning algorithms for image analysis, image and surface segmentation, and face detection applications.

[0268] The software functions related to skin mapping implemented by embodiments of this system and method include:

[0269] • Surface registration and image registration. The captured 3D surface sequences with texture will be precisely registered to track each anatomical point on the skin frame by frame in the sequence. This makes the sequence images and sequence surfaces symmetric.

[0270] • Geometric and texture analysis. The algorithm calculates the principal curvature direction field on the skin surface and traces the wrinkle curves on the surface; it also calculates the surface curvature and the umbilic points, which characterize the surface roughness. The algorithm can further find the extreme points of curvature and color, which are feature points on the skin. This method can locate skin anomalies.

[0271] • Time-varying change detection. This tool quantifies changes in skin color, texture, roughness, local shape, and other applicable measurements.

[0272] Surface reconstruction process. Referring now to Figure 30, a flowchart illustrating an exemplary surface reconstruction process in an exemplary embodiment of the system and method is depicted, which further explains the operation in block 125 of Figure 1. At 30001, the system can generate a point cloud using, for example, the process shown in Figure 29. The system can then perform a process at 30002 to merge multiple point clouds and generate a merged point cloud at 30003. From the merged point cloud, the system can perform tetrahedral mesh generation (TetMeshGeneration) 30004 (a known process for meshing arbitrary 3D volumes with tetrahedral elements) and surface mesh generation 30005. After performing surface mesh generation, the system can perform topological denoising at 30006, removing spurious handles by computing the generators of the fundamental group of the surface (i.e., handle loops and tunnel loops). The output of tetrahedral mesh generation 30004 can be input into the topological denoising process 30006 to compute the handle and tunnel loops using persistent homology methods. Afterwards, the system can perform geometric denoising 30007 on the output of the topological denoising process 30006. Next, conformal parameterization 30008 can be performed, followed by Delaunay triangulation 30009 or centroidal Voronoi tessellation 30010. Finally, a high-quality triangular mesh can be generated at 30011.

[0273] Shape analysis process. Referring now to Figure 31, a flowchart illustrating an exemplary shape analysis process in an exemplary embodiment of the system and method is depicted, which further explains block 130 of Figure 1. At 31001, a triangular mesh can be input, which can be obtained by, for example, the process outlined in Figure 30. Next, conformal mapping 31002 can be performed on the triangular mesh. The conformal mapping algorithm can be applied to conformally map a surface (from the triangular mesh) to a canonical planar domain, such as mapping a human facial surface to a unit disk or torus. In this step, the area distortion factor can be treated as a probability density. The conformal flattening of the 3D shape to the plane can be found using the Ricci flow algorithm, which deforms the Riemannian metric proportionally to the current curvature. The optimal transport mapping 31003 can then be performed. Since conformal mapping introduces area distortion while preserving angles, and optimal transport mapping introduces angular distortion while preserving area, the two processes are complementary, and combining them can provide more accurate feature extraction. The optimal transport mapping between the area distortion factor and the Lebesgue measure is calculated. The cost of the optimal transport mapping is an important metric between shapes and can also be used for shape classification and analysis.

[0274] Following this step, the system performs geometric feature extraction 31004, thereby extracting geometric features 31005. Simultaneously, the system receives a texture image 31006 and performs image feature extraction 31007 to obtain image features 31008. Image features 31008 can be refined and / or otherwise assisted at 31009 using various other techniques, such as segmentation, SIFT features, feature point extraction, face detection, and melanoma detection. The system can then apply a Teichmüller mapping / optimal transport process 31010 to both the geometric features 31005 from the triangular mesh 31001 and the image features 31008 from the texture image 31006. Next, the system performs dynamic shape tracking 31011. Finally, at 31012, the system can perform image analysis and / or use the processed information in real-time tracking applications.

[0275] Applications. The novel systems described in the embodiments of this invention can be used as platform technologies that can unlock transformative innovations in a wide range of application areas, including healthcare (e.g., dermatology, orthodontics, plastic surgery, and radiation therapy); cosmetics and skincare; film and gaming (e.g., virtual reality and augmented reality); engineering and manufacturing; and security and law enforcement.

[0276] In the medical field, early detection of melanoma can save lives and improve treatment outcomes by reducing the risk of the cancer spreading to other parts of the body. Early detection and treatment of non-melanoma skin cancer can minimize disfigurement and improve the quality of life and productivity of many patients. The automated sequential image analysis software described in exemplary embodiments of the present invention can provide dermatologists with powerful tools to make informed clinical decisions. As a result, the number of unnecessary biopsies currently performed because of the inefficiency and ineffectiveness of existing skin examination methods (i.e., 2D imaging or naked-eye visual examination) could be significantly reduced. By optimizing the skin assessment process, saving physicians time, reducing overall care costs, and facilitating remote dermatology services, the technology described in exemplary embodiments of the present invention will make skin cancer screening and early detection more affordable for all patients.

[0277] The real-time monitoring solution using the high-performance 3D imaging system described in the exemplary embodiments can also be applied to accurately monitor patient position during radiotherapy, ensuring patient safety and effective treatment. By eliminating the need to track patient position using X-rays (i.e., on-board kV and CBCT imaging), the technology minimizes radiation exposure, thereby improving the health and well-being of patients undergoing cancer treatment. It also relieves the burden and stress on therapists, who without this technology often have to watch patients' body movements via video monitors.

[0278] The 3D image analysis software of the present system offers highly accurate and efficient automated position tracking solutions, thereby improving the productivity of patient cancer care teams. For example, in the case of melanoma detection, the system described in the exemplary embodiments can be used to scan a patient's face at different times of the year and compare the results using computational algorithms. The skin can be screened at millimeter resolution to locate abnormalities. Dermatologists may then further examine suspicious areas of the patient to make informed medical decisions. Compared to traditional diagnostic procedures, the process implemented by the exemplary embodiments of the present invention significantly reduces time and cost while increasing accuracy.

[0279] The system described in the exemplary embodiments of the present invention can also be used by dentists to compare soft tissue deformations caused by orthodontic surgery and can provide doctors with actionable information for designing customized treatment plans. It can effectively and accurately monitor surgical outcomes so that timely and appropriate adjustments can be made to treatment. Patients will achieve better results from surgery while minimizing the risk of deformities or other significant side effects. Clinicians will benefit from increased productivity and job satisfaction by providing high-quality care to patients.

[0280] Traditional X-ray imaging can only capture the shape of teeth and bones, but cannot measure deformations of soft tissues, such as human facial skin. The system described by exemplary embodiments of the present invention can capture facial shape before and after orthodontic surgery, and the software can register and accurately compare the surfaces. Dentists will be able to adjust their procedures based on the measurements of deformation.

[0281] The system described in the exemplary embodiments of the present invention can also be used in plastic surgery applications. By accurately recording the 3D shape of the patient's face for comparison, it can help doctors evaluate surgical outcomes and develop informed surgical plans. Furthermore, the system described in the exemplary embodiments of the present invention is capable of capturing dynamic facial expressions, which helps in detecting details of specific facial muscle movements. For example, this feature can be used to guide the injection of botulinum toxin and to evaluate the results.

[0282] Games and movies. The dynamic human facial geometry and textures obtained through exemplary embodiments of the present invention can be applied to the computer game and film industries. Facial expression capture is one of the most challenging tasks in animation. The dynamic geometric data captured by the system can help overcome this challenge.

[0283] One of the bottlenecks in realizing large-scale virtual reality and augmented reality is content generation. Currently, most animations are generated manually by animators. The invention described herein can capture dynamic VR content more directly than traditional methods.

[0284] Security. Integrating the techniques described in exemplary embodiments of the present invention into facial recognition applications has enormous potential in providing much-needed security solutions. Compared to 2D format ID photos, 3D facial recognition and advanced real-time dynamic 3D facial recognition using the techniques described herein will offer greater accuracy and reliability. The invention described herein can aid in facial data collection and can be used for homeland security purposes in public transportation systems such as airports, train stations, subways, and ferries. It can also be used in driver's licenses, social security, passports, and banking systems.

[0285] Although several embodiments have been disclosed, it should be understood that these embodiments are not mutually exclusive.

[0286] The general aspects of implementing the systems and methods of the present invention will be described below.

[0287] The system of the present invention, or parts thereof, may be in the form of a "processing machine," such as a general-purpose computer. As used herein, the term "processing machine" should be understood to include at least one processor using at least one memory. The at least one memory stores a set of instructions. The instructions may be stored permanently or temporarily in the memory of the processing machine. The processor executes the instructions stored in the memory to process data. The instruction set may include various instructions for performing specific tasks (such as those described above). Such a set of instructions for performing a specific task may be characterized as a program, a software program, or simply software.

[0288] In one embodiment, the processing machine may be a dedicated processor.

[0289] As described above, the processing machine executes instructions stored in memory to process data. This processing of data can be, for example, in response to commands from one or more users of the processor, in response to previous processing, in response to a request from another processor, and / or any other input.

[0290] As described above, the processing machine used to implement the present invention can be a general-purpose computer. However, the aforementioned processing machine can also use any of a variety of other technologies, including one or more graphics processing units (GPUs), a dedicated computer, a computer system (including, for example, a microcomputer, a minicomputer, or a mainframe), a programmed microprocessor, a microcontroller, a peripheral integrated circuit element, a CSIC (customer-specific integrated circuit), an ASIC (application-specific integrated circuit) or other integrated circuit, a logic circuit, a digital signal processor, a programmable logic device (e.g., an FPGA, PLD, PLA, or PAL), or any other means or arrangement of means capable of implementing the steps of the methods of the present invention.

[0291] The processing machine used to implement the present invention can utilize a suitable operating system. Therefore, embodiments of the present invention may include a processing machine running an iOS operating system, an OS X operating system, an Android operating system, a Microsoft Windows™ operating system, a Unix operating system, a Linux operating system, a Xenix operating system, an IBM AIX™ operating system, a Hewlett-Packard UX™ operating system, a Novell Netware™ operating system, a Sun Microsystems Solaris™ operating system, an OS/2™ operating system, a BeOS™ operating system, a Macintosh operating system, an Apache operating system, an OpenStep™ operating system, or another operating system or platform.

[0292] It should be understood that, in order to practice the method of the present invention as described above, the processor and / or the processor's memory need not be physically located in the same geographical location. That is, each processor and memory used by the processing machine can be located in geographically different locations and can communicate in any suitable manner. Furthermore, it should be understood that each processor and / or memory can consist of different physical devices. Therefore, the processor need not be a device in one location, and the memory need not be another device in another location. That is, it is conceivable that the processor can be two devices located in two different physical locations. The two different devices can be connected in any suitable manner. Moreover, the memory can include portions of two or more memories located in two or more physical locations.

[0293] To further explain, as described above, the processing is performed by various components and various memories. However, it should be understood that, according to another embodiment of the invention, a process performed by two different components as described above can be performed by a single component. Furthermore, a process performed by one different component as described above can be performed by two different components. Similarly, according to another embodiment of the invention, memory storage performed by two different memory portions as described above can be performed by a single memory portion. Furthermore, memory storage performed by one different memory portion as described above can be performed by two memory portions.

[0294] Furthermore, various technologies can be used to provide communication between various processors and / or memories, and to allow the processors and / or memories of this invention to communicate with any other entity; i.e., for example, to obtain further instructions or to access and use remote memory storage. Such technologies for providing this communication may include, for example, networks, the Internet, intranets, extranets, local area networks, Ethernet, wireless communications via cellular towers or satellites, or any client-server system providing communication. Such communication technologies can use any suitable protocol, such as TCP / IP, UDP, or OSI.

[0295] As described above, an instruction set can be used for the processing of this invention. The instruction set can be in the form of a program or software. The software can be in the form of system software or application software. For example, the software can also be a collection of individual programs, a program module within a larger program, or part of a program module. The software used can also include modular programming in the form of object-oriented programming. The software tells the processing machine how to process the data being processed.

[0296] Furthermore, it should be understood that the instructions or instruction sets used in the implementation and operation of this invention can be in a suitable form that allows a processor to read the instructions. For example, the instructions forming a program can be in the form of a suitable programming language, which is translated into machine language or object code to allow one or more processors to read the instructions. That is, a compiler, assembler, or interpreter is used to translate programming code or source code written in a specific programming language into machine language. Machine language is a binary-coded machine instruction specific to a specific type of processing machine (e.g., a specific type of computer). Computers can understand machine language.

[0297] According to various embodiments of the present invention, any suitable programming language can be used. For example, the programming languages used may include assembly language, Ada, APL, Basic, C, C++, Python, COBOL, dBase, Forth, Fortran, Java, Modula-2, Pascal, Prolog, REXX, Visual Basic, and / or JavaScript. Furthermore, it is not necessary to use a single type of instruction or a single programming language in conjunction with the operation of the systems and methods of the present invention. Instead, any number of different programming languages can be used as needed and / or desired. These programs can also use special libraries such as OpenGL, CUDA, Qt, OpenCV, TensorFlow, and PyTorch.

[0298] Those skilled in the art will readily understand that this invention is susceptible to broad utility and application. Many embodiments and adaptations of the invention other than those described herein, as well as many variations, modifications, and equivalent arrangements, will be apparent from or reasonably suggested by the invention and its foregoing description, without departing from the spirit or scope of the invention.

[0299] Although embodiments of the invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those skilled in the art will recognize that its use is not limited thereto, and that embodiments of the invention may be advantageously implemented in other relevant environments for similar purposes.

Claims

1. A computer-implemented system for 3D scanning, comprising: a projector configured to project structured light onto a three-dimensional object; a camera configured to capture stripe images of the object, each stripe image comprising multiple pixels; and a processor configured to process the stripe image to extract a phase map and a texture image, wherein the texture image is determined for each pixel as the sum of the average intensity value of the pixel and the modulation value of the pixel, thereby calculating depth information from the phase map and performing 3D surface reconstruction based on the depth information and the texture image; wherein the processor generates the phase map based on the intensity bias component of the stripe image, the modulation component of the stripe image, and the unwrapped phase of the stripe image; wherein the modulation value of each pixel indicates the quality value of the pixel; and wherein the processor determines the unwrapped phase of each pixel using a quality-guided path-following algorithm by repeatedly executing the following steps: selecting a first pixel; determining the wrapped phase Φ(x,y) of the first pixel; adding the pixels adjacent to the first pixel to a priority queue; selecting a second pixel with the highest quality value from the priority queue; finding the neighboring pixels of the second pixel; unwrapping the phase of the adjacent pixels; and placing the adjacent pixels into the priority queue.
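The unwrapping loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names `wrap` and `quality_guided_unwrap`, the 4-connected neighborhood, the nested-list image layout, and the default seed pixel are all assumptions made for illustration. A pixel popped from the priority queue unwraps each of its not-yet-visited neighbors relative to itself and then pushes them, so high-quality (high-modulation) pixels are expanded first.

```python
import heapq
import math


def wrap(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def quality_guided_unwrap(wrapped, quality, seed=(0, 0)):
    """Unwrap a 2D wrapped-phase grid, expanding high-quality pixels first.

    wrapped, quality: lists of rows with identical shape (hypothetical layout).
    seed: (row, col) of the starting pixel (an assumption of this sketch).
    """
    rows, cols = len(wrapped), len(wrapped[0])
    unwrapped = [[None] * cols for _ in range(rows)]
    sr, sc = seed
    unwrapped[sr][sc] = wrapped[sr][sc]
    # heapq is a min-heap, so quality is negated to pop the best pixel first.
    heap = [(-quality[sr][sc], sr, sc)]
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the image
            if unwrapped[nr][nc] is not None:
                continue  # already unwrapped
            # Smallest phase step from the current pixel to its neighbor.
            step = wrap(wrapped[nr][nc] - wrapped[r][c])
            unwrapped[nr][nc] = unwrapped[r][c] + step
            heapq.heappush(heap, (-quality[nr][nc], nr, nc))
    return unwrapped
```

In this sketch the result is correct only where the true phase changes by less than π between neighboring pixels along the visiting path; the quality ordering is what steers that path away from unreliable pixels.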

2. The system of claim 1, wherein the structured light comprises a plurality of phase lines, and each phase line is distorted into a curve on the three-dimensional object.

3. The system of claim 1, wherein the image is captured by a first camera and a second camera, the first camera being used to capture a striped image of the three-dimensional object, and the second camera being used to capture a color texture image of the object.

4. The system of claim 3, wherein the exposure periods of the first camera and the second camera are synchronized; and wherein the first camera is triggered to capture an image during each shutdown cycle, and the second camera is triggered to capture an image every three shutdown cycles.

5. The system of claim 1, wherein a quality map and a mask of the facial skin region are generated from the texture image, and the quality map and the mask are input by the processor into a phase unwrapping algorithm to determine the unwrapped phase.

6. The system of claim 1, wherein the processor converts the world coordinates of a point into camera coordinates, transforms the camera coordinates into camera projection coordinates, and transforms the camera projection coordinates into distorted camera projection coordinates.
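The three-stage coordinate chain in claim 6 can be sketched as follows. The claim does not name a distortion model; this sketch assumes the widely used radial-tangential (Brown-Conrady) model with coefficients `k1`, `k2`, `p1`, `p2`, and the function names and the row-major rotation matrix layout are likewise illustrative assumptions.

```python
def world_to_camera(point_w, rotation, translation):
    """Apply X_c = R * X_w + t, with R given as a 3x3 list of rows."""
    return tuple(
        sum(rotation[i][j] * point_w[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point_c):
    """Perspective division onto the normalized image plane (z = 1)."""
    x, y, z = point_c
    return x / z, y / z


def distort(xy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Apply assumed radial (k1, k2) and tangential (p1, p2) lens distortion."""
    x, y = xy
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


# Example: identity rotation, camera 2 units in front of the world origin.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_camera = world_to_camera((1.0, 0.5, 0.0), identity, (0.0, 0.0, 2.0))
point_distorted = distort(project(point_camera), k1=0.1)
```

Multiplying the distorted coordinates by the focal lengths and adding the principal point would then give pixel coordinates; that step is omitted because the claim stops at the distorted projection coordinates.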

7. The system of claim 1, wherein the processor generates at least one point cloud based on the depth information, and the processor processes the point cloud to form a triangular mesh; wherein the processor performs conformal geometry methods for image and shape analysis and real-time tracking applications.

8. The system of claim 7, wherein the processor is further configured to use environmental, modulation, and projector parameters to estimate surface normal information during the generation of at least one point cloud.

9. The system of claim 1, wherein images are captured from two different perspectives to obtain stereo depth information; wherein the processor is configured to use the stereo depth information as input for generating at least one point cloud.

10. The system of claim 1, wherein a first stripe image is captured at a first time and used by the processor to perform a first 3D surface reconstruction, a second stripe image is captured at a second time and used by the processor to perform a second 3D surface reconstruction, and the first 3D surface reconstruction and the second 3D surface reconstruction are registered for comparison.

11. The system of claim 10, wherein texture features and geometric features are extracted from the first stripe image and the second stripe image.

12. The system of claim 1, wherein the processor is further configured to model the phase-height mapping as a polynomial function at each pixel of the camera, wherein the processor is further configured to estimate the coefficients of the polynomial during the camera-projector calibration process using an optimization algorithm.
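The per-pixel polynomial phase-height model of claim 12 can be sketched for a single pixel as follows. The claim only says that an optimization algorithm estimates the coefficients; ordinary least squares solved through the normal equations is assumed here as one such algorithm, and the function names, the polynomial degree, and the calibration samples are hypothetical.

```python
def fit_polynomial(phases, heights, degree=2):
    """Least-squares coefficients a_0..a_degree of h = sum(a_k * phase**k)."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T h for the Vandermonde matrix V.
    ata = [[sum(p ** (i + j) for p in phases) for j in range(n)] for i in range(n)]
    atb = [sum(h * p ** i for p, h in zip(phases, heights)) for i in range(n)]
    aug = [row + [rhs] for row, rhs in zip(ata, atb)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def phase_to_height(phase, coeffs):
    """Evaluate the fitted polynomial with Horner's rule."""
    height = 0.0
    for a in reversed(coeffs):
        height = height * phase + a
    return height


# Hypothetical calibration samples for one pixel: phases observed at known heights.
sample_phases = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_heights = [1.0 + 2.0 * p + 0.5 * p * p for p in sample_phases]
pixel_coeffs = fit_polynomial(sample_phases, sample_heights, degree=2)
```

In a full calibration the fit would be repeated for every camera pixel, using phases measured on a reference target placed at several known heights.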

13. A computer-implemented method for 3D scanning, comprising: projecting structured light onto a three-dimensional object using a projector; capturing stripe images of the object with a camera, each stripe image comprising multiple pixels; processing the stripe image with a processor to extract a phase map and a texture image, wherein the processor generates the phase map based on the intensity bias component of the stripe image, the modulation component of the stripe image, and the unwrapped phase of the stripe image, wherein the texture image is determined for each pixel as the sum of the pixel's intensity value and the pixel's modulation value, wherein the modulation value of each pixel indicates the quality value of that pixel, and wherein the processor determines the unwrapped phase of each pixel using a quality-guided path-following algorithm by repeatedly executing the following steps: selecting a first pixel; determining the wrapped phase Φ(x,y) of the first pixel; adding the pixels adjacent to the first pixel to a priority queue; selecting a second pixel with the highest quality value from the priority queue; finding the neighboring pixels of the second pixel; unwrapping the phase of the adjacent pixels; and placing the adjacent pixels into the priority queue; calculating, by the processor, depth information from the phase map; and performing 3D surface reconstruction based on the depth information and the texture image.
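The roles of the intensity bias, the modulation, and the wrapped phase can be illustrated for one pixel with a small sketch. It assumes classic three-step phase shifting with shifts of -2π/3, 0, and +2π/3, which is swapped in here purely for illustration (the abstract describes obtaining the wrapped phase with a Hilbert transform instead). The function name `three_step_decompose` is hypothetical, and the texture is formed as average intensity plus modulation.

```python
import math


def three_step_decompose(i1, i2, i3):
    """Split three phase-shifted intensities of one pixel into components.

    Assumes I_k = bias + modulation * cos(phase + delta_k),
    with delta_k = -2*pi/3, 0, +2*pi/3.
    """
    average = (i1 + i2 + i3) / 3.0            # intensity bias component
    sin_term = math.sqrt(3.0) * (i1 - i3)     # equals 3 * modulation * sin(phase)
    cos_term = 2.0 * i2 - i1 - i3             # equals 3 * modulation * cos(phase)
    modulation = math.hypot(sin_term, cos_term) / 3.0
    wrapped_phase = math.atan2(sin_term, cos_term)
    texture = average + modulation            # texture value for this pixel
    return average, modulation, wrapped_phase, texture


# Synthetic pixel: bias 100, modulation 50, phase 1.0 rad.
shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)
samples = [100.0 + 50.0 * math.cos(1.0 + s) for s in shifts]
pixel_average, pixel_modulation, pixel_phase, pixel_texture = three_step_decompose(*samples)
```

Because the modulation is large where the fringe contrast is strong, the same value can serve as the per-pixel quality measure that orders the priority queue during unwrapping.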

14. The method of claim 13, wherein the image is captured by a first camera and a second camera, the first camera being used to capture the stripe image of the three-dimensional object, and the second camera being used to capture a color texture image of the object.

15. The method of claim 14, wherein the first camera is triggered to capture an image in each shutdown cycle, and the second camera is triggered to capture an image every three shutdown cycles.

16. The method of claim 13, wherein the processor converts the world coordinates of a point into distorted camera projection coordinates.

17. The method of claim 13, wherein images are captured from two different perspectives to obtain stereo depth information; and wherein the processor uses the stereo depth information as input to generate at least one point cloud.

18. The method of claim 13, wherein a first stripe image is captured at a first time and used to perform a first 3D surface reconstruction, a second stripe image is captured at a second time and used to perform a second 3D surface reconstruction, and the first 3D surface reconstruction and the second 3D surface reconstruction are registered.

19. A computer-based method for 3D scanning, comprising: projecting structured light onto a three-dimensional object using a projector; capturing stripe images of the object with a camera, each stripe image comprising multiple pixels; processing the stripe image with a processor to extract a phase map and a texture image, wherein the texture image is determined for each pixel as the sum of the intensity value of that pixel and the modulation value of that pixel, wherein the modulation value of each pixel indicates the quality value of that pixel, and wherein the processor determines the unwrapped phase of each pixel using a quality-guided path-following algorithm by repeatedly executing the following steps: selecting a first pixel; determining the wrapped phase Φ(x,y) of the first pixel; adding the pixels adjacent to the first pixel to a priority queue; selecting a second pixel with the highest quality value from the priority queue; finding the neighboring pixels of the second pixel; unwrapping the phase of the adjacent pixels; and placing the adjacent pixels into the priority queue; calculating, by the processor, depth information from the phase map; and performing 3D surface reconstruction based on the depth information and the texture image; wherein the processor generates at least one point cloud based on the depth information, and the processor processes the point cloud to form a triangular mesh; and wherein the processor executes conformal geometry methods for image and shape analysis and real-time tracking applications.