Three-dimensional reconstruction method and system based on phase and polarization information fusion
By acquiring single-frame fringe intensity and polarization images of the target object, information fusion and ordinal regression are performed using a cascaded spectral network. Combined with a differentiable phase-normal physical operator, the problem of low measurement accuracy of single-frame fringe projection in dynamic scenes is solved, achieving highly robust and high-precision 3D reconstruction.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: EAST CHINA JIAOTONG UNIVERSITY
- Filing Date: 2026-04-17
- Publication Date: 2026-06-26
Abstract
Description
Technical Field
[0001] This disclosure relates to the fields of optical 3D measurement and computer vision technology, and in particular to a 3D reconstruction method and system based on the fusion of phase and polarization information.
Background Technology
[0002] High-precision, high-speed 3D reconstruction technology is a key foundation for fields such as industrial inspection, robot vision, and intelligent manufacturing. Fringe projection profilometry (FPP), as a non-contact, full-field, high-precision optical 3D measurement method, is widely used. However, traditional multi-frame phase-shift based FPP technology requires the object under test to remain absolutely still during multiple projections and acquisitions, which severely limits its application in measuring dynamic scenes or fast-moving objects.
[0003] Currently, to recover the absolute phase from a single image, mainstream deep-learning-based solutions rely mainly on composite coding strategies, two-stage unwrapping, or low-frequency avoidance techniques. Each of these has systematic flaws. Composite coding strategies attempt to embed multiple pieces of information within a single pattern, but they are highly susceptible to spectral aliasing caused by object deformation and to crosstalk between channels caused by the object's surface color. This introduces significant phase errors, and the complex signal separation process limits real-time performance. The widely adopted two-stage unwrapping method has a serial structure that first predicts the wrapped phase and then calculates the fringe order, so local errors in noisy regions in the first stage are sharply amplified in the second stage, often leading to global order jumps and poor reconstruction robustness. Other methods directly use low-frequency fringes to avoid unwrapping, which simplifies the process but sacrifices measurement sensitivity and detail accuracy, making it difficult to meet the requirements of high-precision 3D reconstruction.
[0004] In summary, existing single-frame FPP techniques generally face a trade-off among "coding complexity," "error accumulation," and "accuracy compromise." It is difficult to achieve robust, high-precision absolute phase recovery using only a single ordinary high-frequency sinusoidal fringe pattern (chosen to balance measurement sensitivity and versatility). Therefore, there is an urgent need for a novel single-snapshot 3D reconstruction method that can fundamentally avoid complex coding and two-stage error propagation while fully utilizing the advantages of high-frequency fringes.
Summary of the Invention
[0005] To address the low measurement accuracy and efficiency caused by phase ambiguity in single-frame fringe projection, this disclosure proposes a 3D reconstruction method based on the fusion of phase and polarization information.
[0006] According to one aspect of this disclosure, a three-dimensional reconstruction method based on the fusion of phase and polarization information is provided, including:
[0007] S10. Acquire a single-frame fringe intensity image and a polarization image of the target object, wherein the single-frame fringe intensity image and the polarization image are obtained by projecting a single-frame sinusoidal fringe pattern onto the target object using a projector and simultaneously acquiring the image using a polarization camera.
[0008] S20. Obtain a polarization parameter map based on the polarization image. Perform joint encoding and multi-scale fusion on the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase. Predict the fringe order information through ordinal regression.
[0009] S30. Based on the predicted values of the numerator and denominator, the wrapped phase is calculated. Phase unwrapping is performed on the wrapped phase using the fringe order information to obtain an absolute phase map. The absolute phase map is mapped to a surface normal map by a preset differentiable phase-normal physical operator.
[0010] S40. Based on the absolute phase map and the relative pose parameters of the polarization camera and the projector, calculate the initial three-dimensional point cloud of the target object according to the principle of triangulation, optimize the initial three-dimensional point cloud using the surface normal map, and obtain the three-dimensional reconstruction result of the target object.
[0011] Preferably, the cascaded spectral network is trained in the following manner:
[0012] Obtain a hybrid dataset containing both real and synthetic data;
[0013] Construct the initial cascaded spectral network;
[0014] Design a composite loss function based on the consistency relationship among the absolute phase ground truth, numerator ground truth, denominator ground truth, and fringe order ground truth in the hybrid dataset;
[0015] Train the initial cascaded spectral network end-to-end using the hybrid dataset and the composite loss function.
[0016] Preferably, the cascaded spectral network includes:
[0017] A fringe encoding module, a polarization encoding module, a feature fusion and downsampling module, a decoding and recovery module, a numerator prediction head, a denominator prediction head, and a fringe order prediction head;
[0018] The fringe encoding module is used to encode the single-frame fringe intensity image and extract image features;
[0019] The polarization encoding module is used to encode the polarization parameter map, extract polarization features, and fuse them with the image features extracted by the fringe encoding module;
[0020] The feature fusion and downsampling module is used to downsample and transform the fused features;
[0021] The decoding and recovery module is used to upsample the downsampled features to restore the resolution;
[0022] The numerator prediction head is used to output the numerator predicted value of the wrapped phase; the denominator prediction head is used to output the denominator predicted value of the wrapped phase;
[0023] The fringe order prediction head is used to output the logits corresponding to multiple preset thresholds for ordinal regression.
[0024] Preferably, the composite loss function is expressed as:
[0025] $$L_{total} = L_{nd} + L_{ord} + L_{smooth} + L_{normal} + L_{grad} + L_{lap},$$
[0026] In the formula, $L_{nd}$ is the numerator and denominator supervision loss, $L_{ord}$ is the fringe order ordinal regression loss, $L_{smooth}$ is the edge-aware smoothing loss, $L_{normal}$ is the normal consistency loss based on differential geometry, and $L_{grad}$ and $L_{lap}$ are the gradient consistency loss and the Laplacian consistency loss for the absolute phase, respectively.
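[0027] Preferably, the fringe order ordinal regression loss is expressed as:
[0028] $$L_{ord} = -\frac{1}{N\,(K_{max}-K_{min})}\sum_{(i,j)}\sum_{t=K_{min}}^{K_{max}-1}\Big[\mathbb{1}(K_{i,j}>t)\,\log\sigma\big(z_{i,j}^{t}\big) + \big(1-\mathbb{1}(K_{i,j}>t)\big)\,\log\big(1-\sigma\big(z_{i,j}^{t}\big)\big)\Big],$$
[0029] In the formula, $\sigma$ is the Sigmoid activation function, $z_{i,j}^{t}$ is the logit output by the cascaded spectral network for threshold $t$, $K_{i,j}$ is the true fringe order, $N$ is the number of valid pixels, $K_{max}$ is the maximum value of the order, $K_{min}$ is the minimum value of the order, and $i$, $j$ are pixel indices.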
[0030] Preferably, the edge-aware smoothing loss is expressed as:
[0031] $$L_{smooth} = \frac{1}{N}\sum_{(i,j)\in\Omega}\big|\nabla K_{soft}(i,j)\big|\,\exp\!\big(-\alpha\,\big|\nabla\varphi_{w}(i,j)\big|\big),$$
[0032] In the formula, $\nabla K_{soft}$ is the spatial gradient of the soft fringe order predicted by the network, $\nabla\varphi_{w}$ is the gradient of the wrapped phase, $\alpha$ is the hyperparameter controlling edge sensitivity, $\Omega$ is the set of valid pixels, and $N$ is the total number of valid pixels.
[0033] Preferably, the normal consistency loss based on differential geometry is constructed in the following manner:
[0034] By using a preset differentiable phase-normal physical operator, the predicted absolute phase map is mapped to a surface normal estimate, and the cosine similarity between the surface normal estimate and the ground-truth surface normal is used as a consistency constraint.
[0035] The normal consistency loss based on differential geometry is expressed as:
[0036] $$L_{normal} = 1 - \frac{\langle \hat{n},\, n \rangle}{\lVert \hat{n} \rVert\,\lVert n \rVert},$$
[0037] In the formula, $\hat{n}$ is the surface normal vector obtained by processing the predicted absolute phase map with the preset differentiable phase-normal physical operator, and $n$ is the ground-truth surface normal vector.
[0038] According to one aspect of this disclosure, a three-dimensional reconstruction system based on the fusion of phase and polarization information is provided, comprising:
[0039] The image acquisition module acquires a single-frame fringe intensity image and a polarization image of the target object. The single-frame fringe intensity image and polarization image are obtained by projecting a single sinusoidal fringe pattern onto the target object using a projector and simultaneously acquiring the image using a polarization camera.
[0040] The phase information prediction module obtains a polarization parameter map based on the polarization image, performs joint encoding and multi-scale fusion of the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network, obtains the numerator and denominator predicted values of the wrapped phase, and predicts the fringe order information through ordinal regression.
[0041] The phase processing and normal mapping module calculates the wrapped phase based on the numerator and denominator predicted values, performs phase unwrapping on the wrapped phase using the fringe order information to obtain an absolute phase map, and maps the absolute phase map to a surface normal map through a preset differentiable phase-normal physical operator.
[0042] The 3D reconstruction module calculates the initial 3D point cloud of the target object based on the absolute phase map and the relative pose parameters of the polarization camera and the projector according to the principle of triangulation. It then optimizes the initial 3D point cloud using the surface normal map to obtain the 3D reconstruction result of the target object.
[0043] According to one aspect of this disclosure, an electronic device is provided, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: perform the above-described three-dimensional reconstruction method based on the fusion of phase and polarization information.
[0044] According to one aspect of this disclosure, a computer-readable storage medium is provided that stores a computer program / instructions and a bit stream thereon, wherein the computer program / instructions, when executed by a processor, implement the aforementioned three-dimensional reconstruction method based on the fusion of phase and polarization information to generate the bit stream.
[0045] Compared to the prior art, the beneficial effects of this disclosure are as follows:
[0046] 1) This disclosure achieves robust prediction of the wrapped phase components and the fringe order through cross-modal deep collaboration of fringe and polarization information, combined with frequency domain enhancement and a cross-branch attention mechanism.
[0047] 2) This disclosure adopts ordinal regression-based order prediction and edge-aware smoothing constraint, which effectively suppresses order jump errors while maintaining the sharpness and geometric continuity of the reconstructed edges.
[0048] 3) This disclosure introduces a differentiable phase-normal physical operator and normal consistency constraint to transform phase information into geometrically valid surface normals, which significantly improves the geometric accuracy and detail integrity of the reconstruction results.
[0049] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
[0050] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0051] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the specification, serve to illustrate the technical solutions of this disclosure.
[0052] Figure 1 shows a flowchart of a three-dimensional reconstruction method based on the fusion of phase and polarization information according to an embodiment of this disclosure;
[0053] Figure 2 shows a schematic diagram of the calibration system structure in an embodiment of this disclosure;
[0054] Figure 3 shows a schematic diagram of the core architecture of the cascaded spectral network in an embodiment of this disclosure;
[0055] Figure 4 shows a block diagram of a three-dimensional reconstruction system based on the fusion of phase and polarization information in an embodiment of this disclosure.
Detailed Implementation
[0056] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0057] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.
[0058] In this document, the term "and / or" merely describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent three cases: A alone, A and B simultaneously, and B alone. Furthermore, the term "at least one" in this document means any one of multiple items or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.
[0059] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.
[0060] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of this disclosure, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0061] Based on the above ideas, this disclosure proposes a three-dimensional reconstruction method based on the fusion of phase and polarization information. Figure 1 shows a flowchart of the 3D reconstruction method based on the fusion of phase and polarization information. The method includes:
[0062] S10. Acquire a single-frame fringe intensity image and a polarization image of the target object, wherein the single-frame fringe intensity image and the polarization image are obtained by projecting a single-frame sinusoidal fringe pattern onto the target object using a projector and simultaneously acquiring the image using a polarization camera.
[0063] S20. Obtain a polarization parameter map based on the polarization image. Perform joint encoding and multi-scale fusion on the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase. Predict the fringe order information through ordinal regression.
[0064] S30. Based on the predicted values of the numerator and denominator, the wrapped phase is calculated. Phase unwrapping is performed on the wrapped phase using the fringe order information to obtain an absolute phase map. The absolute phase map is mapped to a surface normal map by a preset differentiable phase-normal physical operator.
[0065] S40. Based on the absolute phase map and the relative pose parameters of the polarization camera and the projector, calculate the initial three-dimensional point cloud of the target object according to the principle of triangulation, optimize the initial three-dimensional point cloud using the surface normal map, and obtain the three-dimensional reconstruction result of the target object.
[0066] The above embodiments construct a cross-modal differential collaborative neural network (the cascaded spectral network) that fuses fringe intensity and polarization information at the same scale, and introduce frequency domain enhancement and cross-branch attention in the encoding-decoding process to achieve stable prediction of the wrapped phase numerator / denominator and the fringe order K. Combined with ordinal regression and edge-aware smoothing constraints, this effectively suppresses fringe order misjudgment and maintains edge sharpness. By applying normal consistency constraints through the differentiable phase-normal physical operator, the reconstruction results are geometrically continuous and rich in detail, thereby improving the accuracy and robustness of single-snapshot 3D reconstruction.
[0067] This disclosure further extends the above-described method with detailed possible implementations, specifically including:
[0068] S10. Acquire a single-frame fringe intensity image and a polarization image of the target object, wherein the single-frame fringe intensity image and the polarization image are obtained by projecting a single sinusoidal fringe pattern onto the target object using a projector and simultaneously acquiring the image using a polarization camera.
[0069] In one embodiment, a calibration system consisting of a DLP projector 20 and a polarization camera 30 is constructed, as shown in the schematic diagram of Figure 2. The polarization camera 30 is equipped with a micro-polarizer array and can simultaneously capture a single-frame fringe intensity image and polarization images at four polarization angles. The intrinsic parameters and relative pose parameters of the camera 30 and the projector 20 are obtained through calibration.
[0070] S20. Based on the polarization image, a polarization parameter map is obtained. The single-frame fringe intensity image and the polarization parameter map are jointly encoded and fused at multiple scales through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase. The fringe order information is then predicted through ordinal regression.
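The disclosure does not spell out which quantities make up the polarization parameter map. As a minimal illustrative sketch, assuming the four polarization angles are 0°, 45°, 90°, and 135° and that the map consists of the total intensity, the degree of linear polarization (DoLP), and the angle of linear polarization (AoLP), the standard Stokes-parameter computation is shown below. The function name `polarization_parameters` and the choice of parameters are assumptions for illustration, not taken from the source.

```python
import numpy as np

def polarization_parameters(i0, i45, i90, i135, eps=1e-8):
    """Compute (s0, dolp, aolp) from images behind 0/45/90/135 degree polarizers.

    The angle set and the choice of output parameters are assumptions;
    the source only states that four polarization angles are captured.
    """
    i0, i45, i90, i135 = (np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)          # total intensity (Stokes S0)
    s1 = i0 - i90                               # Stokes S1
    s2 = i45 - i135                             # Stokes S2
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)             # angle of linear polarization, radians
    return s0, dolp, aolp
```

The three returned maps could then be stacked along the channel axis to form the input of the polarization encoding branch.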
[0071] In this embodiment, the cross-modal differential collaborative neural network is a cascaded spectral network (CascadeSpectralNet), which adopts a hierarchical architecture of "same-scale fusion encoder - cross-branch frequency domain enhancement bottleneck - step-by-step decoding recovery - dual-head prediction". The cascaded spectral network includes: a fringe encoding module, a polarization encoding module, a feature fusion and downsampling module, a bottleneck module, a decoding and recovery module, a numerator prediction head, a denominator prediction head, and a fringe order prediction head.
[0072] In the encoding stage, the cascaded spectral network uses depthwise separable convolution to extract local texture features and introduces frequency domain units (2D FFT → 1×1 mixing → iFFT) on some channels to capture global periodic structures; the fringe and polarization features are concatenated and fused at the half-resolution scale before entering the subsequent cascaded downsampling.
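The frequency domain unit (2D FFT → 1×1 mixing → iFFT) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the real and imaginary parts of the spectrum are stacked along the channel axis and mixed by a single real matrix, which plays the role of a 1×1 convolution. The function name `fourier_unit`, the weight layout, and the absence of normalization and activation layers are all assumptions; the source does not specify these details, and it applies the unit only to a subset of channels.

```python
import numpy as np

def fourier_unit(x, weight):
    """Apply 2D FFT -> per-frequency channel mixing -> inverse FFT.

    x:      (C, H, W) real feature map
    weight: (2*C_out, 2*C) real matrix acting as a 1x1 convolution on the
            stacked real/imaginary spectrum (layout is an assumption)
    """
    c, h, w = x.shape
    spec = np.fft.rfft2(x, axes=(-2, -1))                      # (C, H, W//2+1), complex
    stacked = np.concatenate([spec.real, spec.imag], axis=0)   # (2C, H, W//2+1)
    mixed = np.einsum("oc,chw->ohw", weight, stacked)          # same mixing at every frequency
    c_out = weight.shape[0] // 2
    mixed_spec = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(mixed_spec, s=(h, w), axes=(-2, -1))  # back to (C_out, H, W)
```

Because every frequency bin depends on the whole image, one such unit gives each output pixel a global receptive field, which is why it suits periodic fringe structure.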
[0073] The fringe encoding module is used to encode the single-frame fringe intensity image, capturing local texture and global periodic information through depthwise separable convolution and frequency domain units, and to extract multi-scale fringe image features.
[0074] The polarization encoding module is used to receive the polarization parameter map calculated from the polarization image, extract polarization geometric cues as polarization features through convolutional encoding, and fuse them with the image features extracted by the fringe encoding module.
[0075] The feature fusion and downsampling module is used to perform spatial downsampling and deep feature extraction on the concatenated cross-modal features to generate multi-scale semantic information.
[0076] The bottleneck module exchanges information between the spatial and spectral branches through efficient cross-branch attention, and further enhances robustness to large fringe periods and occluded regions by combining frequency domain enhancement.
[0077] The decoding and recovery module is used to upsample the downsampled features to restore the resolution.
[0078] The numerator prediction head is used to output the numerator predicted value of the wrapped phase; the denominator prediction head is used to output the denominator predicted value of the wrapped phase; the fringe order prediction head is used to output the logits corresponding to multiple preset thresholds for ordinal regression, from which the absolute phase map is reconstructed and the surface normal map is further obtained through the differentiable phase-normal physical operator.
[0079] Figure 3 is a schematic diagram of the core architecture of the cascaded spectral network in this embodiment, used to achieve joint encoding, cross-scale fusion, and key parameter prediction for a single-frame fringe intensity image and a polarization parameter map, as detailed below:
[0080] The architecture employs a hierarchical design of "dual-modal encoding - cross-branch enhancement - decoding and recovery - dual-head prediction," primarily comprising:
- Fringe encoding module: performs two-level (full- and half-resolution) encoding of the single-frame fringe intensity image, extracts local texture and global periodic features through depthwise separable convolutions and Fourier units, and outputs a multi-scale fringe feature map.
- Polarization encoding module: performs half-resolution encoding of the polarization parameter map, extracts local geometric cues, and concatenates and fuses them at the same scale with the half-resolution fringe features so that the two modalities complement each other.
- Bottleneck module: integrates a bottleneck cross-attention mechanism and Fourier units so that spatial and spectral features interact and complement each other, strengthening the capture of fringe periodic information and improving robustness in complex scenes.
- Decoding and recovery module: restores feature resolution through "upsampling + skip connections," reintroducing multi-scale features from the encoding stage to preserve global structure and local details.
- Prediction head module: Head Module B outputs the predicted values of the numerator and denominator of the wrapped phase, which are processed by atan2 to obtain the wrapped phase; Head Module A outputs the threshold logits of the fringe order K, which are decoded to obtain Ksoft. Combining the wrapped phase and Ksoft, the 2π ambiguity is eliminated through the (phase + K × 2π) operation to reconstruct the absolute phase map, providing the core data for 3D reconstruction.
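The prediction-head decoding described above (atan2 for the wrapped phase, threshold logits decoded into Ksoft, then phase + K × 2π) can be sketched as follows. The decoding rule used here, Ksoft = K_min + Σ_t sigmoid(logit_t), is a common choice for ordinal regression and is an assumption; the source does not state how the logits are decoded. The function names and the rounding of Ksoft to an integer order are likewise illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_absolute_phase(numerator, denominator, logits, k_min=0):
    """Recover wrapped phase, soft fringe order and absolute phase.

    numerator, denominator: (H, W) outputs of the two phase heads
    logits: (T, H, W) threshold logits; logits[n] scores "K > k_min + n"
            (this threshold convention is an assumption)
    """
    wrapped = np.arctan2(numerator, denominator)   # wrapped phase in (-pi, pi]
    k_soft = k_min + sigmoid(logits).sum(axis=0)   # expected number of thresholds exceeded
    k_hard = np.rint(k_soft)                       # nearest integer fringe order
    absolute = wrapped + 2.0 * np.pi * k_hard      # phase + K * 2*pi
    return wrapped, k_soft, absolute
```

Keeping Ksoft continuous is what allows a smoothness penalty to be applied to it during training, while the rounded order is used for the final unwrapping.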
[0081] The cross-modal differential collaborative neural network is obtained through the following steps:
[0082] A hybrid dataset containing real and synthetic data is obtained, and an initial cascaded spectral network is constructed. A composite loss function is designed based on the consistency relationship among the absolute phase ground truth, numerator ground truth, denominator ground truth, and fringe order ground truth in the hybrid dataset. The composite loss function includes at least a numerator / denominator supervision loss, a fringe order ordinal regression loss, an edge-aware smoothing loss, and a normal consistency loss. The initial cascaded spectral network is then trained end-to-end using the hybrid dataset and the composite loss function to obtain the trained cascaded spectral network.
[0083] In this embodiment, obtaining a hybrid dataset containing real data and synthetic data includes:
[0084] Acquire single-frame fringe intensity images and polarization images containing the object 10, together with the corresponding ground-truth absolute phase, surface normals, and depth, as real data.
[0085] A virtual scene is constructed using a physically based rendering engine (such as Mitsuba). Various material parameters and lighting conditions are set in the virtual scene to simulate the imaging process of the projector 20 and the polarization camera 30, generating synthetic data that includes simulated single-frame fringe intensity images, polarization images, and the corresponding ground-truth absolute phase, surface normals, and depth.
[0086] Real and synthetic data are mixed in a preset ratio, and data augmentation is performed on the mixed image data to obtain a hybrid dataset for training cross-modal differential collaborative neural networks, thereby improving the generalization ability of the model.
[0087] Furthermore, the composite loss function is expressed as:
[0088] $$L_{total} = L_{nd} + L_{ord} + L_{smooth} + L_{normal} + L_{grad} + L_{lap},$$
[0089] In the formula, $L_{nd}$ is the numerator and denominator supervision loss, $L_{ord}$ is the fringe order ordinal regression loss, $L_{smooth}$ is the edge-aware smoothing loss, $L_{normal}$ is the normal consistency loss based on differential geometry, and $L_{grad}$ and $L_{lap}$ are the gradient consistency loss and the Laplacian consistency loss for the absolute phase, respectively.
[0090] The fringe order ordinal regression loss is obtained in the following way:
[0091] Based on the multiple threshold comparison outputs of the cascaded spectral network, for a set of preset order thresholds, the loss on the relationship between the true fringe order at each image location and each preset order threshold is calculated using a binary cross-entropy function. These losses are then averaged over all valid image locations and all preset order thresholds, expressed as follows:
[0092] $$L_{ord} = -\frac{1}{N\,(K_{max}-K_{min})}\sum_{(i,j)}\sum_{t=K_{min}}^{K_{max}-1}\Big[\mathbb{1}(K_{i,j}>t)\,\log\sigma\big(z_{i,j}^{t}\big) + \big(1-\mathbb{1}(K_{i,j}>t)\big)\,\log\big(1-\sigma\big(z_{i,j}^{t}\big)\big)\Big],$$
[0093] In the formula, $\sigma$ is the Sigmoid activation function, $z_{i,j}^{t}$ is the logit output by the cascaded spectral network for threshold $t$, $K_{i,j}$ is the true fringe order, $N$ is the number of valid pixels, $K_{max}$ is the maximum value of the order, $K_{min}$ is the minimum value of the order, and $i$, $j$ are pixel indices.
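The binary cross-entropy formulation described above can be sketched in NumPy as follows. This is an illustrative sketch only: the threshold convention (threshold n tests whether the true order exceeds k_min + n), the function name `ordinal_regression_loss`, and the validity mask argument are assumptions made for the example.

```python
import numpy as np

def ordinal_regression_loss(logits, k_true, valid, k_min=0):
    """Mean binary cross-entropy over valid pixels and order thresholds.

    logits: (T, H, W) threshold logits; logits[n] scores "K > k_min + n"
    k_true: (H, W) integer ground-truth fringe orders
    valid:  (H, W) boolean mask of valid pixels
    """
    num_thresholds = logits.shape[0]
    thresholds = k_min + np.arange(num_thresholds).reshape(-1, 1, 1)
    targets = (k_true[None] > thresholds).astype(np.float64)
    # Numerically stable BCE on logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    bce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    mask = valid[None].astype(np.float64)
    n_valid = max(int(valid.sum()), 1)
    return float((bce * mask).sum() / (n_valid * num_thresholds))
```

Compared with plain classification over order labels, this formulation penalizes a prediction that is off by several orders more than one that is off by a single order, because more threshold comparisons are wrong.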
[0094] Preferably, the edge-aware smoothing loss applies a weighted constraint to the spatial gradient of the soft fringe order, with the weights adaptively generated from the adjacent differences of the wrapped phase, expressed as:
[0095] $$L_{smooth} = \frac{1}{N}\sum_{(i,j)\in\Omega}\big|\nabla K_{soft}(i,j)\big|\,\exp\!\big(-\alpha\,\big|\nabla\varphi_{w}(i,j)\big|\big),$$
[0096] In the formula, $\nabla K_{soft}$ is the spatial gradient of the soft fringe order predicted by the network, $\nabla\varphi_{w}$ is the gradient of the wrapped phase, $\alpha$ is the hyperparameter controlling edge sensitivity, $\Omega$ is the set of valid pixels, and $N$ is the total number of valid pixels. The loss function reduces the weights at points where phase jumps occur and forces the fringe order to remain continuous in smooth regions.
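A minimal NumPy sketch of such an edge-aware smoothing term is given below, assuming forward differences between neighboring pixels and an exponential weight exp(-α·|Δφ|). The exact weighting function is not recoverable from the text, so the exponential form, the function name `edge_aware_smooth_loss`, and the default `alpha` are assumptions.

```python
import numpy as np

def edge_aware_smooth_loss(k_soft, wrapped, valid, alpha=1.0):
    """Penalize gradients of the soft fringe order, down-weighted at phase jumps.

    k_soft:  (H, W) soft fringe order
    wrapped: (H, W) wrapped phase
    valid:   (H, W) boolean mask of valid pixels
    """
    dk_x = np.abs(np.diff(k_soft, axis=1))   # horizontal order differences
    dk_y = np.abs(np.diff(k_soft, axis=0))   # vertical order differences
    dp_x = np.abs(np.diff(wrapped, axis=1))  # horizontal wrapped-phase differences
    dp_y = np.abs(np.diff(wrapped, axis=0))  # vertical wrapped-phase differences
    m_x = valid[:, 1:] & valid[:, :-1]       # both neighbors must be valid
    m_y = valid[1:, :] & valid[:-1, :]
    total = (dk_x * np.exp(-alpha * dp_x) * m_x).sum()
    total += (dk_y * np.exp(-alpha * dp_y) * m_y).sum()
    return float(total / max(int(valid.sum()), 1))
```

An order step that coincides with a 2π wrap of the phase is therefore almost free, while the same step inside a region of smoothly varying wrapped phase is penalized at nearly full weight.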
[0097] The differential geometry-based normal consistency loss is constructed as follows:
[0098] By using a preset differentiable phase-normal physical operator, the predicted absolute phase map is mapped to a surface normal estimate, and the cosine similarity between the surface normal estimate and the ground-truth surface normal is used as a consistency constraint.
[0099] The normal consistency loss based on differential geometry is expressed as:
[0100] $$L_{normal} = 1 - \frac{\langle \hat{n},\, n \rangle}{\lVert \hat{n} \rVert\,\lVert n \rVert},$$
[0101] In the formula, $\hat{n}$ is the surface normal vector obtained by processing the predicted absolute phase map with the preset differentiable phase-normal physical operator, and $n$ is the ground-truth surface normal vector.
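The source does not define the phase-normal physical operator itself. The sketch below is therefore a stand-in under a strong simplifying assumption: the absolute phase is treated as locally proportional to surface height (factor `scale`), so that normals follow from finite-difference gradients of the phase map. The function names, the linear phase-to-height assumption, and the averaging of the cosine term over valid pixels are all illustrative. In an automatic-differentiation framework the same operations would let gradients flow from the normal loss back to the predicted phase.

```python
import numpy as np

def phase_to_normals(abs_phase, scale=1.0, pixel_size=1.0):
    """Map an absolute phase map to unit normals, assuming height = scale * phase."""
    height = scale * abs_phase
    dh_dy, dh_dx = np.gradient(height, pixel_size)  # derivatives along rows, columns
    n = np.stack([-dh_dx, -dh_dy, np.ones_like(height)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def normal_consistency_loss(n_pred, n_true, valid):
    """Mean of (1 - cosine similarity) over valid pixels, for unit normals."""
    cos = np.sum(n_pred * n_true, axis=-1)
    return float(np.sum((1.0 - cos) * valid) / max(int(valid.sum()), 1))
```

The loss is zero when the estimated and ground-truth normals coincide and grows as they diverge, which ties the predicted phase to local surface geometry rather than to per-pixel phase values alone.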
[0102] S30. Based on the predicted values of the numerator and denominator, the wrapped phase is calculated. Phase unwrapping is performed on the wrapped phase using the fringe order information to obtain an absolute phase map. The absolute phase map is mapped to a surface normal map by a preset differentiable phase-normal physical operator.
[0103] In this embodiment, the single-frame fringe intensity image and the polarization parameter map are jointly encoded and fused at multiple scales through the cascaded spectral network, which outputs the numerator prediction and denominator prediction of the wrapped phase and the ordinal regression prediction of the fringe order information. The wrapped phase is calculated from the numerator and denominator predicted values, and phase unwrapping is performed on the wrapped phase using the fringe order information to reconstruct the absolute phase map. Furthermore, the absolute phase map is mapped to a surface normal map through a preset differentiable phase-normal physical operator.
[0104] S40. Based on the absolute phase map and the relative pose parameters of the polarization camera and the projector, calculate the initial three-dimensional point cloud of the target object according to the principle of triangulation, optimize the initial three-dimensional point cloud using the surface normal map, and obtain the three-dimensional reconstruction result of the target object.
[0105] In this embodiment, the initial three-dimensional point cloud is optimized using the surface normal map. The optimization includes at least one of smoothing, noise suppression, and detail enhancement. The optimized three-dimensional point cloud is then reconstructed by meshing to obtain the three-dimensional reconstruction result of the target object 10.
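The triangulation step of S40 can be illustrated with a minimal sketch under several assumptions that are not stated in the source: an ideal pinhole camera at the origin with intrinsic matrix `cam_K`, a projector described by a 3×4 projection matrix `proj_P` expressed in the camera frame, vertical fringes whose absolute phase maps linearly to the projector column (column = phase × period / 2π), and no lens distortion. The function name and all parameter names are hypothetical. The normal-guided refinement of the point cloud is not covered.

```python
import numpy as np

def triangulate_from_phase(abs_phase, cam_K, proj_P, fringe_period):
    """Intersect each camera ray with the projector plane selected by the phase.

    abs_phase:     (H, W) absolute phase map
    cam_K:         (3, 3) camera intrinsic matrix
    proj_P:        (3, 4) projector projection matrix in the camera frame
    fringe_period: fringe period in projector pixels
    Returns an (H, W, 3) array of 3D points in the camera frame.
    """
    h, w = abs_phase.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    rays = pix @ np.linalg.inv(cam_K).T                  # ray directions with z = 1
    u_p = abs_phase * fringe_period / (2.0 * np.pi)      # projector column per pixel
    # Projector constraint (P[0] - u_p * P[2]) . [X, 1] = 0 with X = depth * ray
    a = proj_P[0, :3] - u_p[..., None] * proj_P[2, :3]
    b = proj_P[0, 3] - u_p * proj_P[2, 3]
    depth = -b / np.sum(a * rays, axis=-1)
    return rays * depth[..., None]
```

Each pixel is solved independently in closed form, so the whole point cloud is obtained with a few array operations once the absolute phase map is available.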
[0106] This disclosure constructs a cascaded spectral network and introduces frequency domain enhancement and cross-branch attention to achieve deep fusion of the global periodic features of a single fringe intensity image and the structural features of a polarization image. Based on this, a collaborative loss function based on physical consistency and a hybrid dataset training strategy are proposed. This allows the cascaded spectral network to achieve joint optimization during training through phase-normal mapping and ordinal regression constraints, significantly improving its generalization ability and reconstruction robustness under complex surfaces, varying materials, and different lighting conditions. Furthermore, by combining the calibration parameters of the polarization camera 30 and the projector 20 with the triangulation principle, the absolute phase map and surface normal map are jointly used for 3D point cloud reconstruction, achieving end-to-end reconstruction from a single snapshot input to high-precision 3D topography output. This significantly improves dynamic scene adaptability while ensuring measurement accuracy.
[0107] As another aspect of the embodiments of this disclosure, a three-dimensional reconstruction system 100 based on the fusion of phase and polarization information is also provided. As shown in Figure 4, it includes:
[0108] Image acquisition module 1 acquires a single-frame fringe intensity image and a polarization image of the target object. The single-frame fringe intensity image and polarization image are obtained by projecting a single sinusoidal fringe pattern onto the target object using a projector and simultaneously acquiring the image using a polarization camera.
[0109] Phase information prediction module 2 obtains a polarization parameter map based on the polarization image, performs joint encoding and multi-scale fusion of the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase, and predicts the fringe order information through ordinal regression.
[0110] Phase processing and normal mapping module 3 calculates the wrapped phase based on the numerator and denominator predicted values, performs phase unwrapping on the wrapped phase using the fringe order information to obtain an absolute phase map, and maps the absolute phase map to a surface normal map through a preset differentiable phase-normal physical operator.
[0111] The 3D reconstruction module 4 calculates the initial 3D point cloud of the target object based on the absolute phase map and the relative pose parameters of the polarization camera and the projector according to the principle of triangulation, and optimizes the initial 3D point cloud using the surface normal map to obtain the 3D reconstruction result of the target object.
[0112] Provided that no contradiction arises, the modules of the system in the embodiments of the present disclosure can implement any of the methods described above.
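By way of non-limiting illustration, the data flow from the outputs of the prediction module to the absolute phase map can be sketched as follows. The sketch assumes the fringe order head emits one logit map per threshold, where threshold t answers whether the fringe order exceeds `k_min + t`. That decoding rule and the names `decode_fringe_order`, `absolute_phase`, and `k_min` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_fringe_order(logits, k_min=0):
    """Ordinal decoding: count the thresholds that the order exceeds.

    logits: (T, H, W), one map per preset threshold (assumed layout).
    """
    return k_min + np.sum(sigmoid(logits) > 0.5, axis=0)

def absolute_phase(numerator, denominator, logits, k_min=0):
    """Wrapped phase from numerator/denominator, then unwrapping by order."""
    # arctan2 keeps the correct quadrant; the result lies in [-pi, pi].
    wrapped = np.arctan2(numerator, denominator)
    order = decode_fringe_order(logits, k_min)
    # Each fringe order adds one full period of 2*pi.
    return wrapped + 2.0 * np.pi * order, wrapped, order
```

Predicting the numerator and denominator separately, rather than the wrapped phase itself, avoids regressing a quantity with 2π discontinuities. The discontinuities appear only after the `arctan2` call.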
[0113] Based on the description of the above embodiments, it can be seen that the embodiments of this disclosure can achieve the following technical effects:
[0114] 1) The embodiments of this disclosure achieve highly robust prediction of the wrapped phase components and the fringe order through deep cross-modal synergy between fringe and polarization information, combined with frequency domain enhancement and a cross-branch attention mechanism.
[0115] 2) The embodiments of this disclosure employ ordinal regression-based order prediction and edge-aware smoothing constraints, which effectively suppress order jump errors while maintaining the sharpness and geometric continuity of the reconstructed edges.
[0116] 3) The embodiments of this disclosure introduce a differentiable phase-normal physical operator and a normal consistency constraint, which convert phase information into geometrically meaningful surface normals, significantly improving the geometric accuracy and detail integrity of the reconstruction results.
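By way of non-limiting illustration, the interaction between a phase-normal operator and a normal consistency constraint, as in effect 3), can be sketched as follows. The sketch makes a strong simplifying assumption that is not taken from the disclosure: the absolute phase is converted to height by a constant factor, and normals follow from finite-difference gradients of that height. The names `phase_to_normals`, `height_per_radian`, `pixel_pitch`, and `normal_consistency_loss` are hypothetical. The operator actually used in the embodiments may depend on the full calibration.

```python
import numpy as np

def phase_to_normals(abs_phase, height_per_radian=1.0, pixel_pitch=1.0):
    """Illustrative phase-to-normal mapping via finite differences.

    Assumption: height z = height_per_radian * phase (a linearised model).
    Only differences, products, and a normalisation are used, so the same
    steps are differentiable in an autodiff framework.
    """
    z = height_per_radian * abs_phase
    dz_dy, dz_dx = np.gradient(z, pixel_pitch)   # axis 0 is y, axis 1 is x
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def normal_consistency_loss(n_pred, n_true, valid):
    """Mean (1 - cosine similarity) over valid pixels; inputs are unit normals."""
    cos = np.sum(n_pred * n_true, axis=-1)
    return float(np.mean(1.0 - cos[valid]))
```

The loss is zero when predicted and reference normals coincide and reaches 2 when they are opposite, so it penalizes orientation error independently of the surface scale.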
[0117] This disclosure also proposes an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the aforementioned three-dimensional reconstruction method based on phase and polarization information fusion. The electronic device can be provided as a terminal, a server, or other type of device.
[0118] This disclosure also proposes a computer-readable storage medium storing a computer program / instructions and a bitstream thereon. When the computer program / instructions are executed by a processor, they implement the aforementioned three-dimensional reconstruction method based on the fusion of phase and polarization information to generate the bitstream. The computer-readable storage medium can be a non-volatile computer-readable storage medium.
[0119] Those skilled in the art will understand that, in the specific embodiments of the above-described three-dimensional reconstruction method and system based on the fusion of phase and polarization information, the order in which the steps are written does not imply a strict execution order and does not limit the implementation process in any way. The specific execution order of the steps should be determined by their functions and possible internal logic.
[0120] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0121] The various embodiments of this disclosure have been described above. These descriptions are exemplary rather than exhaustive, and are not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A three-dimensional reconstruction method based on the fusion of phase and polarization information, characterized in that it comprises:
S10. Acquiring a single-frame fringe intensity image and a polarization image of the target object, wherein the single-frame fringe intensity image and the polarization image are obtained by projecting a single-frame sinusoidal fringe pattern onto the target object using a projector and simultaneously capturing images using a polarization camera.
S20. Obtaining a polarization parameter map based on the polarization image; performing joint encoding and multi-scale fusion on the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase; and predicting the fringe order information through ordinal regression.
The cascaded spectral network is trained in the following way: obtaining a hybrid dataset containing both real and synthetic data; constructing an initial cascaded spectral network; designing a composite loss function based on the consistency relationship among the absolute phase ground truth, the numerator ground truth, the denominator ground truth, and the fringe order ground truth in the hybrid dataset; and training the initial cascaded spectral network end-to-end using the hybrid dataset and the composite loss function.
The cascaded spectral network includes: a fringe encoding module, a polarization encoding module, a feature fusion and downsampling module, a decoding and recovery module, a numerator prediction head, a denominator prediction head, and a fringe order prediction head. The fringe encoding module is used to encode the single-frame fringe intensity image and extract image features; the polarization encoding module is used to encode the polarization parameter map, extract polarization features, and fuse them with the image features extracted by the fringe encoding module; the feature fusion and downsampling module is used to downsample and transform the fused features; the decoding and recovery module is used to upsample the downsampled features to restore the resolution; the numerator prediction head is used to output the numerator predicted value of the wrapped phase; the denominator prediction head is used to output the denominator predicted value of the wrapped phase; and the fringe order prediction head is used to output logits corresponding to multiple preset thresholds for ordinal regression.
S30. Calculating the wrapped phase based on the numerator and denominator predicted values; performing phase unwrapping on the wrapped phase using the fringe order information to obtain an absolute phase map; and mapping the absolute phase map to a surface normal map through a preset differentiable phase-normal physical operator.
S40. Based on the absolute phase map and the relative pose parameters of the polarization camera and the projector, calculating the initial three-dimensional point cloud of the target object according to the principle of triangulation, and optimizing the initial three-dimensional point cloud using the surface normal map to obtain the three-dimensional reconstruction result of the target object.
2. The method according to claim 1, characterized in that the composite loss function is expressed as follows: [formula not preserved in the source text]. The terms in the formula are, respectively: the supervised loss on the numerator and denominator; the fringe order ordinal regression loss; the edge-aware smoothing loss; the normal consistency loss based on differential geometry; and the gradient consistency loss and the Laplacian consistency loss for the absolute phase.
3. The method according to claim 2, characterized in that the fringe order ordinal regression loss is expressed as: [formula not preserved in the source text]. The symbols in the formula denote, respectively: the Sigmoid activation function; the logit output by the cascaded spectral network for threshold t; the true fringe order; the number of valid pixels; the maximum value of the fringe order; the minimum value of the fringe order; and the pixel indices i, j.
4. The method according to claim 2, characterized in that the edge-aware smoothing loss is expressed as: [formula not preserved in the source text]. The symbols in the formula denote, respectively: the spatial gradient of the soft fringe order predicted by the network; the gradient of the wrapped phase; the hyperparameter controlling edge sensitivity; the set of valid pixels; and the total number of valid pixels N.
5. The method according to claim 2, characterized in that the normal consistency loss based on differential geometry is constructed as follows: the predicted absolute phase map is mapped to a surface normal estimate by the preset differentiable phase-normal physical operator, and the cosine similarity between the surface normal estimate and the surface normal ground truth is used as the consistency constraint. The normal consistency loss based on differential geometry is expressed as: [formula not preserved in the source text]. The symbols in the formula denote, respectively: the surface normal vector obtained by applying the preset differentiable phase-normal physical operator to the predicted absolute phase map; and the ground-truth surface normal vector.
6. A three-dimensional reconstruction system based on the fusion of phase and polarization information, characterized in that it comprises:
An image acquisition module, which acquires a single-frame fringe intensity image and a polarization image of the target object, wherein the single-frame fringe intensity image and the polarization image are obtained by projecting a single-frame sinusoidal fringe pattern onto the target object using a projector and simultaneously capturing images using a polarization camera.
A phase information prediction module, which obtains a polarization parameter map based on the polarization image, performs joint encoding and multi-scale fusion of the single-frame fringe intensity image and the polarization parameter map through a cascaded spectral network to obtain the numerator and denominator predicted values of the wrapped phase, and predicts the fringe order information through ordinal regression.
The cascaded spectral network is trained in the following way: obtaining a hybrid dataset containing both real and synthetic data; constructing an initial cascaded spectral network; designing a composite loss function based on the consistency relationship among the absolute phase ground truth, the numerator ground truth, the denominator ground truth, and the fringe order ground truth in the hybrid dataset; and training the initial cascaded spectral network end-to-end using the hybrid dataset and the composite loss function.
The cascaded spectral network includes: a fringe encoding module, a polarization encoding module, a feature fusion and downsampling module, a decoding and recovery module, a numerator prediction head, a denominator prediction head, and a fringe order prediction head. The fringe encoding module is used to encode the single-frame fringe intensity image and extract image features; the polarization encoding module is used to encode the polarization parameter map, extract polarization features, and fuse them with the image features extracted by the fringe encoding module; the feature fusion and downsampling module is used to downsample and transform the fused features; the decoding and recovery module is used to upsample the downsampled features to restore the resolution; the numerator prediction head is used to output the numerator predicted value of the wrapped phase; the denominator prediction head is used to output the denominator predicted value of the wrapped phase; and the fringe order prediction head is used to output logits corresponding to multiple preset thresholds for ordinal regression.
A phase processing and normal mapping module, which calculates the wrapped phase based on the numerator and denominator predicted values, performs phase unwrapping on the wrapped phase using the fringe order information to obtain an absolute phase map, and maps the absolute phase map to a surface normal map through a preset differentiable phase-normal physical operator.
A 3D reconstruction module, which calculates the initial 3D point cloud of the target object based on the absolute phase map and the relative pose parameters of the polarization camera and the projector according to the principle of triangulation, and optimizes the initial 3D point cloud using the surface normal map to obtain the 3D reconstruction result of the target object.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the three-dimensional reconstruction method based on the fusion of phase and polarization information according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program/instructions and a bitstream thereon, characterized in that, when the computer program/instructions are executed by a processor, they implement the three-dimensional reconstruction method based on the fusion of phase and polarization information according to any one of claims 1 to 5 to generate the bitstream.
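By way of non-limiting illustration, loss terms of the kind recited in claims 3 and 4 can be sketched as follows. Both functional forms are assumptions and are not the claimed formulas: the ordinal regression loss is written as a per-threshold binary cross-entropy on the logits, and the edge-aware smoothing loss weights the gradient of the soft fringe order by an exponential of the wrapped-phase gradient. The names `ordinal_regression_loss`, `edge_aware_smoothing_loss`, `beta`, and `k_min` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_regression_loss(logits, true_order, valid, k_min=0):
    """Per-threshold binary cross-entropy, averaged over valid pixels.

    logits: (T, H, W); threshold t asks whether the order exceeds k_min + t.
    """
    T = logits.shape[0]
    thresholds = k_min + np.arange(T).reshape(T, 1, 1)
    targets = (true_order[None] > thresholds).astype(np.float64)
    # Stable BCE-with-logits: softplus(z) - y * z.
    bce = np.logaddexp(0.0, logits) - targets * logits
    return float(bce[:, valid].sum() / valid.sum())

def edge_aware_smoothing_loss(logits, wrapped, valid, beta=1.0, k_min=0):
    """Penalise order gradients except where the wrapped phase itself jumps."""
    # Soft (differentiable) order: sum of threshold probabilities.
    soft_order = k_min + sigmoid(logits).sum(axis=0)
    gk_y, gk_x = np.gradient(soft_order)
    gp_y, gp_x = np.gradient(wrapped)
    # beta controls how strongly a phase edge relaxes the penalty.
    penalty = (np.abs(gk_x) * np.exp(-beta * np.abs(gp_x))
               + np.abs(gk_y) * np.exp(-beta * np.abs(gp_y)))
    return float(penalty[valid].mean())
```

Under these assumptions, the exponential weight is small where the wrapped phase has a 2π discontinuity, so a genuine change of fringe order is tolerated there, while order changes inside smooth phase regions are penalized.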