An aberration compensation and phase unwrapping synchronous decoupling method based on deep learning

By constructing the HAG-EffUNet network, aberration compensation and phase unwrapping are decoupled simultaneously. This solves the problems of error accumulation and misjudged low-frequency object shapes in traditional methods, and improves the accuracy and efficiency of 3D shape reconstruction in holographic measurement systems.

CN122265075A · Pending · Publication Date: 2026-06-23 · XIAN UNIV OF TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: XIAN UNIV OF TECH
Filing Date: 2026-03-11
Publication Date: 2026-06-23


Figure CN122265075A_ABST


Abstract

A deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping includes the following steps: simulating and generating realistic phase samples; superimposing aberration surfaces on these samples, adding noise, and constructing a dataset through a wrapping operation; constructing a HAG-EffUNet network that introduces a gradient-aware mechanism in the encoder to extract features; based on the deep features, outputting Zernike polynomial coefficients as the first result through an aberration prediction head, while fusing atrous spatial pyramid pooling and multi-scale perception modules in the decoder and introducing an attention gating mechanism to filter noise, then upsampling and outputting the phase unwrapping result as the second result; and finally, simultaneously predicting aberration coefficients and unwrapped phases from noisy wrapped distorted phases. The invention avoids the error propagation of step-by-step processing, suppresses noise effectively, achieves high-fidelity restoration of large low-frequency samples, and improves the decoupling efficiency and 3D topography reconstruction accuracy of holographic measurement systems.


Description

Technical Field

[0001] This invention belongs to the field of optical detection and image processing technology and specifically relates to a deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping.

Background Art

[0002] Digital holography has been widely applied in 3D measurement, for example in the morphology inspection of microstructured devices. In practical digital holographic measurement systems, global system aberrations are inevitably introduced by optical component defects, optical path assembly errors, environmental disturbances and similar factors. In addition, the strong coherence of laser light and scattering from optically rough surfaces introduce multiplicative speckle noise and additive Gaussian noise, so the wrapped phase of the hologram contains not only the true phase of the measured object but also noise and background aberrations. Noise suppression and aberration compensation must therefore be performed while decoupling the wrapped phase in order to reconstruct the 3D morphology of the measured object with high precision.

[0003] Existing aberration compensation methods have significant limitations. 1) Traditional numerical compensation methods (such as Zernike-fitting aberration compensation and principal component analysis) suppress aberrations well for high-frequency objects such as phase gratings. For low-frequency objects with smooth surfaces, such as microlenses, however, they tend to misinterpret the object shape as spherical aberration or defocus and compensate it away. The problem is aggravated when the measured object is large and occupies more than 50% of the field of view: too little background area remains for extracting an aberration reference, traditional fitting fails, and the reconstructed 3D shape is severely distorted.

[0004] 2) Deep learning-based phase aberration compensation methods offer new avenues for improvement, but most only compensate aberrations in the phase after unwrapping. In this step-by-step processing, residual errors from the phase unwrapping stage propagate directly into the aberration compensation stage, causing error accumulation and reducing the accuracy of the reconstructed phase. Step-by-step decoupling is also algorithmically complex because phase unwrapping and aberration compensation must be performed separately; decoupling efficiency is low, and processing efficiency drops significantly when both steps use deep learning networks.

[0005] Therefore, under the influence of noise and aberrations, however traditional and deep learning algorithms for aberration compensation and phase unwrapping are combined step by step, existing technologies cannot avoid error propagation and accumulation. Moreover, high-fidelity recovery is difficult to achieve in scenarios with strong noise or large low-frequency samples, which ultimately limits the reconstruction accuracy of holographic measurement systems and hence the accuracy of subsequent digital holographic measurement.

Summary of the Invention

[0006] To overcome the shortcomings of the prior art, the purpose of this invention is to provide a deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping that integrates aberration compensation and phase unwrapping in a single network, suppresses noise, and achieves high decoupling accuracy.

[0007] To achieve the above objectives, the technical solution adopted by this invention is a deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping, characterized by comprising the following steps: Step S1: use two-dimensional mathematical functions to generate the original three-dimensional phase distribution as a real phase sample. Step S2: generate a global background aberration surface by fitting a multi-order Zernike polynomial coefficient matrix; superimpose this surface on the real phase, introduce multiplicative speckle noise and additive Gaussian noise to obtain a noisy distorted phase, and then wrap the noisy distorted phase to obtain a noisy wrapped distorted phase. Step S3: take the noisy wrapped distorted phase as the input and the corresponding Zernike polynomial coefficient matrix and real phase sample as the ground truth to generate a dataset, and divide it into a training set, a validation set and a test set. Step S4: construct the HAG-EffUNet network based on an encoder-decoder structure; feed the training set into an encoder based on EfficientNet-B0, which uses multiple cascaded mobile inverted bottleneck convolution modules (MBConv) for deep feature extraction and downsampling, and introduce a gradient-aware coordinate attention mechanism inside the modules to extract deep feature maps containing spatial gradient information. Step S5: feed the deep feature map produced by the encoder into the aberration prediction head to predict the Zernike polynomial coefficients, which form the first output of the network, denoted Output1. Step S6: use the deep feature map obtained in step S4 as the input of the decoder; fuse multi-scale features with the atrous spatial pyramid pooling module located at the bottleneck layer of the network, and feed the resulting transition feature map into a decoding convolution block containing a multi-scale perception module and consecutive feature extraction units for cross-scale feature extraction and preliminary fusion. Step S7: introduce an attention gating mechanism at the skip connections of the decoder, using the upsampled semantic features to apply spatially adaptive filtering to the shallow features of the corresponding encoder level. Throughout the decoder, the features pass alternately through upsampling layers, gated filtering and fusion, and decoding convolution blocks, restoring the spatial resolution step by step and outputting the phase unwrapping result as the second output of the network, denoted Output2. Step S8: input the test set into the trained HAG-EffUNet network to obtain the Zernike polynomial coefficients and phase unwrapping results of the test set.

[0008] Step S1 specifically includes: Step S101: use a Gaussian function to simulate and generate continuous phases as the first type of phase samples; Step S102: use a Sigmoid function to simulate and generate continuous phases as the second type of phase samples; Step S103: mix the first type of phase samples from step S101 with the second type from step S102 to form the real phase samples.
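The two sample families of steps S101 to S103 can be sketched as below. This is a minimal illustration, not the authors' generator: the exact Gaussian and Sigmoid profiles are not given in the text, so using the radius as the Gaussian width, the Sigmoid edge width `edge`, and the random centre placement are all assumptions. The radius range [10, 80] pixels, height range [10, 50] rad, 224×224 grid and 80%/20% mix follow the preferred embodiment described later.

```python
import numpy as np


def _radius_map(size, center):
    # Distance (pixels) of every grid point from the object centre (cx, cy).
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    return np.hypot(xx - center[0], yy - center[1])


def gaussian_phase(size, radius, height, center):
    # Smooth bump; using `radius` as the Gaussian sigma is an assumption.
    r = _radius_map(size, center)
    return height * np.exp(-r**2 / (2.0 * radius**2))


def sigmoid_phase(size, radius, height, center, edge=3.0):
    # Flat-topped disc whose rim falls off along a logistic (Sigmoid) curve;
    # the edge width `edge` (pixels) is an assumed parameter.
    r = _radius_map(size, center)
    return height / (1.0 + np.exp((r - radius) / edge))


def make_true_phases(n, size=224, seed=0, gaussian_fraction=0.8):
    """Return n real phase samples (rad) and the generator used for each."""
    rng = np.random.default_rng(seed)
    n_gauss = int(round(gaussian_fraction * n))
    base = ["gaussian"] * n_gauss + ["sigmoid"] * (n - n_gauss)
    kinds = [base[i] for i in rng.permutation(n)]  # mix the two families
    phases = np.empty((n, size, size))
    for i, kind in enumerate(kinds):
        radius = rng.uniform(10.0, 80.0)   # pixels
        height = rng.uniform(10.0, 50.0)   # rad
        center = rng.uniform(0.3 * size, 0.7 * size, size=2)  # assumed placement
        make = gaussian_phase if kind == "gaussian" else sigmoid_phase
        phases[i] = make(size, radius, height, center)
    return phases, kinds
```

For a full dataset the samples would normally be generated and written to disk in batches, since thousands of 224×224 float64 maps quickly reach gigabytes.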

[0009] Step S2 specifically includes: Step S201: randomly generate multiple sets of Zernike polynomial coefficient matrices. Step S202: fit the spatial wavefront aberration surface with the Zernike polynomial coefficient matrices obtained in step S201 and use it as the global background aberration surface, as follows:

$$W(x,y)=\sum_{i=1}^{N} a_i Z_i(x,y) \tag{1}$$

In equation (1), $Z_i$ denotes the $i$-th Zernike polynomial, $N$ denotes the order of the Zernike polynomials, and $a_i$ denotes the corresponding Zernike polynomial coefficient. Step S203: superimpose the global background aberration surface obtained in step S202 on the real phase sample obtained in step S103 to form a phase containing aberration distortion. Step S204: add multiplicative speckle noise with different standard deviations and fixed additive Gaussian noise to the aberration-distorted phase obtained in step S203 to obtain the noisy distorted field $U$, modeled by equation (2). In equation (2), $A$ denotes the amplitude of the noise-free object light wave; $\sigma$, the standard deviation of the multiplicative speckle noise, sets the speckle noise level; $n_1$ and $n_2$ are random Gaussian noise terms representing random distributions in amplitude and phase, respectively; $j$ is the imaginary unit; and $n_a$ denotes additive Gaussian noise. Step S205: apply the arctangent function to the noisy distorted field $U$ from step S204 to wrap its absolute phase into the interval $[-\pi,\pi]$, yielding the noisy wrapped distorted phase:

$$\psi(x,y)=\arctan\frac{\operatorname{Im}[U(x,y)]}{\operatorname{Re}[U(x,y)]} \tag{3}$$

In equation (3), Im denotes the imaginary-part operation and Re the real-part operation; the four-quadrant arctangent is used so that the result spans $[-\pi,\pi]$.
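Steps S201 to S205 can be sketched end to end as follows. Several details are assumptions rather than the patent's specification: the 15 terms are taken as OSA/ANSI-indexed, unnormalized Zernike polynomials evaluated on a square grid scaled to [-1, 1]; the coefficient range `coeff_scale` is hypothetical; and because only the variable definitions of equation (2) are given, the noise model below (amplitude scaled by `1 + sigma*n1`, phase perturbed by `sigma*n2`, plus complex additive Gaussian noise of standard deviation 0.2) is just one plausible reading of them. The wrapping step is the four-quadrant arctangent of equation (3).

```python
import numpy as np
from math import factorial


def osa_to_nm(j):
    # OSA/ANSI single index j -> radial order n and azimuthal frequency m.
    n = int(np.ceil((-3 + np.sqrt(9 + 8 * j)) / 2))
    return n, 2 * j - n * (n + 2)


def zernike(j, rho, theta):
    # Unnormalized Zernike polynomial Z_j at polar coordinates (rho, theta).
    n, m = osa_to_nm(j)
    ma = abs(m)
    radial = np.zeros_like(rho)
    for k in range((n - ma) // 2 + 1):
        c = (-1) ** k * factorial(n - k) / (
            factorial(k) * factorial((n + ma) // 2 - k) * factorial((n - ma) // 2 - k))
        radial += c * rho ** (n - 2 * k)
    if m > 0:
        return radial * np.cos(m * theta)
    if m < 0:
        return radial * np.sin(ma * theta)
    return radial


def aberration_surface(coeffs, size=224):
    # Equation (1): W = sum_i a_i Z_i on a square grid scaled to [-1, 1].
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    return sum(a * zernike(j, rho, theta) for j, a in enumerate(coeffs))


def wrap_with_noise(true_phase, coeffs, sigma, rng, amp=1.0, add_std=0.2):
    # Steps S203-S205 under an ASSUMED form of equation (2).
    distorted = true_phase + aberration_surface(coeffs, true_phase.shape[0])
    n1, n2 = rng.standard_normal((2,) + true_phase.shape)  # mean 0, std 1
    u = amp * (1 + sigma * n1) * np.exp(1j * (distorted + sigma * n2))
    u = u + add_std * (rng.standard_normal(u.shape) + 1j * rng.standard_normal(u.shape))
    return np.arctan2(u.imag, u.real), distorted  # equation (3): psi in [-pi, pi]


rng = np.random.default_rng(0)
coeff_scale = 3.0  # hypothetical coefficient range (rad)
coeffs = rng.uniform(-coeff_scale, coeff_scale, size=15)
```

With `sigma=0` and `add_std=0` the function reduces to pure wrapping, which is a convenient sanity check: the complex exponential of the wrapped output must equal that of the distorted phase.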

[0010] Step S3 specifically includes: take the real phase samples obtained in step S103 and the Zernike polynomial coefficient matrices from step S201 as the ground truth and the corresponding noisy wrapped distorted phases from step S205 as the input, and divide the resulting dataset into a training set, a validation set and a test set according to a preset ratio.
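A minimal sketch of the split, assuming the 7:2:1 ratio of the preferred embodiment and a seeded random shuffle (the text does not specify how samples are shuffled). Integer arithmetic keeps the split sizes exact.

```python
import random


def split_indices(n, weights=(7, 2, 1), seed=0):
    """Shuffle sample indices and split them into train/val/test by integer weights."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(weights)
    n_train = n * weights[0] // total
    n_val = n * weights[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(15000)
print(len(train), len(val), len(test))  # 10500 3000 1500
```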

[0011] Step S4 specifically includes: Step S401: construct a HAG-EffUNet network based on an encoder-decoder structure, and feed the training set obtained in step S3 into the encoder, which uses a lightweight EfficientNet-B0 backbone, to extract shallow feature maps. Step S402: feed the shallow feature maps obtained in step S401 sequentially into the multi-level mobile inverted bottleneck convolution modules for deep feature extraction and downsampling. Within each module, the input feature map first passes through a 1×1 convolutional layer, a batch normalization layer and a Swish activation function for channel expansion, and then through a k×k depthwise separable convolutional layer, a batch normalization layer and a Swish activation function to extract spatial features, yielding an intermediate feature map $I$. Step S403: feed the intermediate feature map $I$ into the gradient-aware coordinate attention mechanism. First, to avoid the spurious high-frequency edges that zero-padding introduces during gradient calculation, the border of the intermediate feature map is padded by replication, preserving the continuity of the image fringes. The horizontal and vertical spatial gradients of $I$ are then extracted with fixed-weight Sobel edge operators:

$$G_x = S_x * I,\qquad G_y = S_y * I \tag{4}$$

In equation (4), $S_x$ and $S_y$ are the horizontal and vertical Sobel convolution kernels, $*$ denotes convolution, and $G_x$ and $G_y$ are the horizontal and vertical spatial gradients. Combining these gradients gives the gradient magnitude $M$:

$$M=\sqrt{G_x^2+G_y^2+\varepsilon} \tag{5}$$

In equation (5), $\varepsilon$ is a very small constant that keeps the expression well defined where the gradient is 0. Step S404: in the GE-CA module, the intermediate feature map $I$ obtained in step S402 is split into two branches: one is retained as the residual branch, and the other, together with the gradient magnitude map $M$ calculated in step S403, is average-pooled along the X and Y directions to obtain position and gradient feature vectors. These vectors are concatenated and fused, then processed by convolutional layers, batch normalization layers and non-linear activation functions so that position-aware and gradient-aware encoding are performed in parallel. The processed features are split along the width and height directions, and each part passes through a convolutional layer and a Sigmoid activation function to output attention scores for the width and height dimensions. Finally, the attention scores are multiplied with the residual branch to re-weight it, producing an attention-weighted feature map. This forces the network to assign higher feature weights to regions with dense wrapping fringes and overcomes the high-frequency truncation effect of conventional pooling. Step S405: in the MBConv module, the attention-weighted feature map obtained in step S404 passes sequentially through a 1×1 convolutional layer, a batch normalization layer and a Dropout layer for channel reduction, and is then added as a residual to the initial input feature map of the module. After processing and downsampling by all the mobile inverted bottleneck convolution modules, a deep feature map is output.
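Equations (4) and (5), together with the replicate padding of step S403, can be checked on a single feature channel with the NumPy sketch below. In a network the same fixed kernels would be applied channel by channel; this standalone version, the helper names and the value of `eps` are illustrative assumptions. Like most deep learning frameworks, the helper computes a cross-correlation; flipping the Sobel kernel only changes the sign of the gradient, so the magnitude M is unaffected. The last function shows the X/Y average pooling that step S404 applies to the feature map and to M.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    # 3x3 cross-correlation with replicate ("edge") padding, same output size.
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def gradient_magnitude(feat, eps=1e-6):
    gx = filter3x3(feat, SOBEL_X)          # equation (4), horizontal
    gy = filter3x3(feat, SOBEL_Y)          # equation (4), vertical
    return np.sqrt(gx**2 + gy**2 + eps)    # equation (5)


def coordinate_pool(feat, mag):
    # Average along X (width) -> one value per row; along Y -> one per column.
    return feat.mean(axis=1), feat.mean(axis=0), mag.mean(axis=1), mag.mean(axis=0)


flat = np.full((8, 8), 3.0)
print(gradient_magnitude(flat).max())  # sqrt(eps): no false edges at the border
```

With zero padding instead, the constant map above would show a strong artificial gradient along its border, which is exactly the interference the replicate operation is meant to avoid.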

[0012] Step S5 specifically includes: Step S501: feed the deep feature map obtained in step S4 into the aberration prediction head and convert it into a one-dimensional feature vector through average pooling and flattening. Step S502: pass the one-dimensional feature vector from step S501 sequentially through fully connected layers that first increase and then reduce its dimensionality, outputting an intermediate feature vector. Step S503: pass the intermediate feature vector from step S502 through a final fully connected layer to output the Zernike polynomial coefficient sequence, which is the first output of the network, denoted Output1.
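The shape flow of the aberration prediction head (steps S501 to S503, with the 7×7×320 input, layer widths 512 and 256, and dropout rates 0.3 and 0.2 given in the preferred embodiment) can be traced with a NumPy forward pass. The weights here are random placeholders rather than trained values, the channels-last layout is assumed, and taking 15 outputs to match the 15 Zernike terms of the embodiment is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    # Placeholder He-initialised fully connected layer: (weights, bias).
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out)


def dropout(v, rate, train):
    # Inverted dropout: active only during training, identity at inference.
    if not train:
        return v
    keep = rng.random(v.shape) >= rate
    return v * keep / (1.0 - rate)


FC1, FC2, FC3 = dense(320, 512), dense(512, 256), dense(256, 15)


def aberration_head(feat, train=False):
    v = feat.mean(axis=(0, 1))                                   # 7x7x320 -> 320
    v = dropout(np.maximum(v @ FC1[0] + FC1[1], 0), 0.3, train)  # 512, ReLU
    v = dropout(np.maximum(v @ FC2[0] + FC2[1], 0), 0.2, train)  # 256, ReLU
    return v @ FC3[0] + FC3[1]                                   # Output1 coefficients


coeffs = aberration_head(rng.standard_normal((7, 7, 320)))
print(coeffs.shape)  # (15,)
```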

[0013] Step S6 specifically includes: Step S601: feed the deep feature map obtained in step S4 into the atrous spatial pyramid pooling module located at the bottleneck layer of the network, which fuses multi-scale features and outputs a transition feature map. Step S602: feed the transition feature map from step S601 into the multi-scale perception module (HA) in the decoding convolution block, where it is processed by four 3×3 dilated convolution branches with dilation rates of 1, 2, 4 and 8. After the output channel count of each branch is compressed, the branch outputs are concatenated along the channel dimension and processed by a 1×1 convolutional layer, a batch normalization layer and a ReLU activation function, so that the fused output feature map has the same number of channels as the input to the HA module. Step S603: feed the output feature map from step S602 sequentially into two consecutive feature extraction units, each consisting of a 3×3 two-dimensional convolutional layer, a BN layer and a ReLU function in series, to produce the output feature map of the decoding convolution block.
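The channel bookkeeping of the HA module in step S602 can be illustrated with a small NumPy forward pass: four parallel 3×3 dilated convolutions with rates 1, 2, 4 and 8, each producing a quarter of the input channels, concatenated and fused by a 1×1 convolution. This is a sketch with random placeholder weights; batch normalization is omitted (at inference with default running statistics it is close to an identity), and the (C, H, W) layout and zero padding equal to the dilation rate are assumptions chosen to keep the spatial size unchanged.

```python
import numpy as np


def dilated_conv3x3(x, w, d):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); zero padding d keeps H and W."""
    _, h, wd = x.shape
    p = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros((w.shape[0], h, wd))
    for ky in range(3):
        for kx in range(3):
            patch = p[:, ky * d:ky * d + h, kx * d:kx * d + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, ky, kx], patch)
    return out


def ha_module(x, rates=(1, 2, 4, 8), rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    c = x.shape[0]
    if c % len(rates):
        raise ValueError("channel count must be divisible by the number of branches")
    branches = []
    for d in rates:
        w = rng.standard_normal((c // len(rates), c, 3, 3)) * 0.1  # placeholder weights
        branches.append(dilated_conv3x3(x, w, d))                   # C/4 channels each
    y = np.concatenate(branches, axis=0)                            # back to C channels
    w_fuse = rng.standard_normal((c, c)) * 0.1                      # 1x1 convolution
    y = np.einsum("oc,chw->ohw", w_fuse, y)
    return np.maximum(y, 0.0)                                       # ReLU (BN omitted)


x = np.random.default_rng(1).standard_normal((8, 16, 16))
print(ha_module(x).shape)  # (8, 16, 16)
```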

[0014] Step S7 specifically includes: Step S701: introduce an attention gating mechanism at the skip connections of the decoder. The shallow encoder features are denoted x, and the upsampled semantic features from the previous level serve as the gating signal g. Linear mappings of x and g are computed by 1×1 convolutions and added element-wise; the sum passes through a ReLU function, a 1×1 convolution and a Sigmoid function to give an attention score map, which is multiplied by x to obtain the filtered high-frequency edge features. These edge features are concatenated with the gating signal g along the channel dimension to form fused features, which are fed into the decoding convolution block of the current level. Step S702: the deep features obtained in step S4 are processed alternately by upsampling layers, attention gating mechanisms and decoding convolution blocks; finally, a bias-free 3×3 convolutional layer compresses the feature channels to 1 and outputs the phase unwrapping result as the second output of the network, denoted Output2. Step S703: compare Output1 and Output2 with the ground truth to compute the loss, and back-propagate it to update the network parameters; repeat these training steps until the model converges, and save the optimal weights of the trained model.
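The gating computation of step S701 can be written out with 1×1 convolutions expressed as channel-mixing matrix products. In this sketch the intermediate channel count, the weight names `w_x`, `w_g` and `w_psi`, and the random weights are assumptions; biases are omitted, and g is assumed to have already been upsampled to the spatial size of x, as the text describes.

```python
import numpy as np


def conv1x1(w, t):
    # 1x1 convolution on a (C, H, W) tensor: w has shape (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, t)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, w_psi):
    """x: shallow encoder features; g: upsampled gating signal (same H, W)."""
    q = np.maximum(conv1x1(w_x, x) + conv1x1(w_g, g), 0.0)  # add, then ReLU
    alpha = sigmoid(conv1x1(w_psi, q))          # (1, H, W) attention score map
    x_hat = x * alpha                           # filtered high-frequency edge features
    fused = np.concatenate([x_hat, g], axis=0)  # input to the decoding block
    return fused, alpha


rng = np.random.default_rng(0)
cx, cg, ci = 16, 32, 8
x = rng.standard_normal((cx, 28, 28))
g = rng.standard_normal((cg, 28, 28))
fused, alpha = attention_gate(x, g, rng.standard_normal((ci, cx)),
                              rng.standard_normal((ci, cg)), rng.standard_normal((1, ci)))
print(fused.shape, alpha.shape)  # (48, 28, 28) (1, 28, 28)
```

Because the score map has a single channel, it broadcasts over all channels of x, so each spatial position is kept or suppressed as a whole; this is what lets the semantic signal g mute noisy regions of the skip features.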

[0015] Step S8 specifically includes: input the test set, or a noisy wrapped distorted phase map acquired in practice, into the trained HAG-EffUNet model; a single forward propagation simultaneously outputs the Zernike polynomial coefficients and the unwrapped true phase of the sample.

[0016] Compared with existing technologies, the deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping provided by this invention has the following significant technical advantages. Digital holography is widely used in high-precision measurement of three-dimensional topography. However, the wrapped phase obtained in practice is often degraded by the combined interference of mixed noise and complex system aberrations, which compromises the correctness and accuracy of phase reconstruction. The traditional step-by-step approach of "unwrapping first, then aberration compensation" has an inherent defect: residual errors from the unwrapping stage are transmitted to the aberration compensation stage, causing error accumulation.

[0017] 1) Synchronous decoupling avoids error propagation: the aberration coefficients (Output1) and the phase unwrapping result (Output2) are output simultaneously through a single forward propagation of the network, avoiding the error propagation and accumulation introduced by step-by-step decoupling. 2) Accurate restoration of high-frequency transition boundaries: because phase fringes are easily smoothed by the network, a gradient-aware coordinate attention mechanism (GE-CA) is introduced into the encoder. By explicitly extracting the two-dimensional spatial gradient, it forces the network to assign higher weights to phase transition boundaries, significantly improving their extraction and restoration under complex distortion. 3) Enhanced robustness for large samples: the model fuses multi-scale features through atrous spatial pyramid pooling (ASPP) at the network bottleneck and a multi-scale perception module (HA) with dilation rates of 1/2/4/8 in the decoder, enhancing the robustness of shape restoration for large, low-frequency samples. 4) Effective suppression of noise interference: an attention gating mechanism at the decoder skip connections uses semantic features to adaptively filter the shallow features (step S701), significantly suppressing the interference of stray light and speckle noise on phase unwrapping. 5) Simultaneous gains in measurement efficiency and accuracy: reducing the processing flow to a single forward propagation significantly improves the decoupling efficiency of the holographic measurement system while achieving high-precision reconstruction of the 3D shape of the object under test.
Test results show that the average structural similarity (SSIM) between the output unwrapped result and the true phase is 0.933. 6) Strong engineering applicability: the method can directly process noisy, distorted phase maps acquired in practice and is suitable for industrial scenarios such as optical inspection and topography measurement of micro-devices.
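For reference, the structural similarity index combines luminance, contrast and structure terms. The sketch below evaluates the standard SSIM formula once over the whole image with the usual constants K1 = 0.01 and K2 = 0.03. The text does not state which SSIM variant or window produced the reported 0.933, and common library implementations average the index over local Gaussian or uniform windows, so values from this single-window version are not directly comparable to that figure; the function name and test data are illustrative.

```python
import numpy as np


def global_ssim(a, b, data_range=None, k1=0.01, k2=0.03):
    """Single-window SSIM between two images (e.g. predicted vs. true phase)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if data_range is None:
        data_range = max(a.max(), b.max()) - min(a.min(), b.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))


rng = np.random.default_rng(0)
truth = np.add.outer(np.linspace(0, 30, 64), np.linspace(0, 10, 64))  # smooth phase (rad)
pred = truth + rng.normal(0, 1.0, truth.shape)
print(round(global_ssim(truth, truth), 6))  # 1.0
```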

[0018] In summary, in measurement environments affected by complex system aberrations and background noise, the proposed method obtains both quantitative aberration coefficients and correct phase unwrapping results in a single forward propagation. This simplifies the processing flow, improves decoupling efficiency, and achieves noise suppression and high-fidelity recovery of large, low-frequency samples, which is of great significance for improving the reconstruction accuracy of holographic 3D topography.

Brief Description of the Drawings

[0019] Figure 1 is a flowchart of the deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping according to an embodiment of the present invention.

[0020] Figure 2a shows an example from the noisy wrapped distorted phase dataset generated with a Gaussian function according to an embodiment of the present invention.

[0021] Figure 2b shows an example from the noisy wrapped distorted phase dataset generated with a Sigmoid function according to an embodiment of the present invention.

[0022] Figure 3 shows the aberration coefficients and phase unwrapping result predicted by the network when the noisy wrapped distorted phase map of Figure 2a is input, according to an embodiment of the present invention.

[0023] Figure 4 shows the aberration coefficients and phase unwrapping result predicted by the network when the noisy wrapped distorted phase map of Figure 2b is input, according to an embodiment of the present invention.

[0024] Figure 5 is a schematic diagram of the HAG-EffUNet network architecture according to an embodiment of the present invention.

Detailed Description of Embodiments

[0025] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments.

[0026] Referring to Figure 1, a deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping is proposed and implemented as follows. Step S1: simulate and generate real phase samples using two-dimensional mathematical functions. In a preferred embodiment of the present invention, step S1 specifically includes: Step S101: use a Gaussian function to generate continuous phases with randomly distributed size and phase height, with the object radius in the range [10 pixels, 80 pixels] and the phase height in the range [10 rad, 50 rad]; these form the first type of phase samples and account for 80% of the samples in the dataset. Step S102: use a Sigmoid function to generate continuous phases with randomly distributed size and phase height, with the same radius range [10 pixels, 80 pixels] and phase height range [10 rad, 50 rad]; these form the second type of phase samples and account for 20% of the samples in the dataset. Step S103: mix the phase samples obtained in steps S101 and S102 to form the real phase samples. Step S2: superimpose an aberration surface on the real phase samples, add noise, and then perform a wrapping operation to obtain the noisy wrapped distorted phase dataset. In a preferred embodiment of the present invention, step S2 specifically includes: Step S201: randomly generate 15,000 independent coefficient matrices of 15th-order Zernike polynomials; Step S202: using the Zernike polynomial coefficient matrices obtained in step S201, fit a spatial wavefront aberration surface of 224×224 pixels as the global background aberration surface.
The aberration surface is computed as

$$W(x,y)=\sum_{i=1}^{N} a_i Z_i(x,y) \tag{1}$$

In equation (1), $Z_i$ denotes the $i$-th Zernike polynomial and $a_i$ the corresponding Zernike polynomial coefficient. Step S203: superimpose the global background aberration surface obtained in step S202 on the real phase sample obtained in step S103 to form a phase containing aberration distortion. Step S204: add multiplicative speckle noise with different standard deviations and fixed additive Gaussian noise to the aberration-distorted phase obtained in step S203 to obtain the noisy distorted field $U$. The noise in the holographic wrapped phase consists of multiplicative speckle and additive Gaussian noise, and the object wave containing noise and aberrations is expressed by equation (2). In equation (2), $A$ denotes the amplitude of the noise-free object light wave, here set to …; $\sigma$, the standard deviation of the multiplicative speckle noise, sets the speckle noise level; $n_1$ and $n_2$ are random Gaussian noise terms with mean 0 and standard deviation 1, representing random distributions of amplitude and phase, respectively; $j$ is the imaginary unit; and $n_a$ denotes additive Gaussian noise with mean 0 and standard deviation 0.2. Step S205: apply the arctangent function to the noisy distorted field $U$ from step S204 to wrap its absolute phase into the interval $[-\pi,\pi]$, obtaining the noisy wrapped distorted phase:

$$\psi(x,y)=\arctan\frac{\operatorname{Im}[U(x,y)]}{\operatorname{Re}[U(x,y)]} \tag{3}$$

In equation (3), Im denotes the imaginary-part operation and Re the real-part operation.

[0027] As shown in Figure 2a and Figure 2b, this embodiment presents examples of the noisy wrapped distorted phase datasets generated by the Gaussian and Sigmoid functions, respectively. From left to right, each shows the undisturbed true phase (gt), the global background aberration surface reconstructed from the 15th-order Zernike polynomial coefficients (aberration), and the noisy wrapped distorted phase input to the network (input). The phase fringes in the input image are clearly deformed by the coupled effect of strong noise and background aberration.

[0028] Step S3: generate a dataset from the real phase samples, the Zernike polynomial coefficient matrices and the noisy wrapped distorted phases. In a preferred embodiment of the present invention, step S3 specifically includes: take the real phase samples obtained in step S103 and the Zernike polynomial coefficient matrices from step S201 as the ground truth and the corresponding noisy wrapped distorted phases from step S205 as the input, and divide the data into a training set, a validation set and a test set in a 7:2:1 ratio. Step S4: construct a HAG-EffUNet network based on an encoder-decoder structure; feed the training set into an EfficientNet-B0-based encoder, which uses multiple cascaded mobile inverted bottleneck convolution modules for deep feature extraction and downsampling, and introduce a gradient-aware coordinate attention mechanism within these modules to extract multi-scale feature maps containing spatial gradient information. In a preferred embodiment of the present invention, step S4 specifically includes: Step S401: referring to Figure 5, construct a HAG-EffUNet network based on an encoder-decoder structure. The training set obtained in step S3 is fed as the network input into the encoder with a lightweight EfficientNet-B0 backbone; after a 3×3 convolution for feature extraction, a shallow feature map of 112×112 pixels is output. Step S402: feed the shallow feature map obtained in step S401 sequentially into 15 cascaded mobile inverted bottleneck convolution modules (MBConv) for deep feature extraction and downsampling. As shown in the MBConv sub-graph of Figure 5, within each MBConv module the input feature map first passes through a 1×1 convolutional layer, a batch normalization (BN) layer and a Swish activation function for channel expansion, and then through a k×k depthwise separable convolutional layer (Depthwise Conv), a BN layer and a Swish activation function to extract spatial features, yielding an intermediate feature map $I$. Step S403: feed the intermediate feature map obtained in step S402 into the Gradient-Enhanced Coordinate Attention (GE-CA) mechanism. As shown in the GE-CA sub-graph of Figure 5, first, to avoid the spurious high-frequency edges that zero-padding introduces during gradient calculation, "replicate" padding is applied to the intermediate feature map to preserve the continuity of the image fringes; then fixed-weight Sobel edge operators extract the horizontal and vertical spatial gradients $G_x$ and $G_y$ of $I$:

$$G_x = S_x * I,\qquad G_y = S_y * I \tag{4}$$

In equation (4), $S_x$ and $S_y$ are the horizontal and vertical Sobel convolution kernels, and $*$ denotes convolution. Combining these spatial gradients gives the gradient magnitude $M$:

$$M=\sqrt{G_x^2+G_y^2+\varepsilon} \tag{5}$$

In equation (5), $\varepsilon$ is a very small constant that keeps the expression well defined where the gradient is 0. Step S404: in the GE-CA module, the intermediate feature map $I$ obtained in step S402 is split into two branches. One branch is retained as the residual branch; the other, together with the gradient magnitude map $M$ calculated in step S403, is average-pooled along the X and Y directions to obtain position and gradient feature vectors.
These vectors are then concatenated, fused, and processed by convolutional layers, batch normalization (BN) layers and nonlinear activation functions, so that position-aware and gradient-aware operations are performed in parallel. The processed features are then split along the width and height directions. Each part passes through a convolutional layer and a sigmoid activation function to output attention scores for the width and height dimensions. Finally, the residual branch is multiplied by these attention scores (re-weighting) to obtain an attention-weighted feature map. This forces the network to assign higher feature weights to regions with dense wrapped fringes, overcoming the high-frequency truncation effect of conventional pooling operations. Step S405: Within the MBConv module, the attention-weighted feature map obtained in step S404 is passed sequentially through a 1×1 convolutional layer, a BN layer and a Dropout layer for channel reduction. The result is added as a residual to the initial input feature map of the MBConv module, completing the feature extraction of a single MBConv module. After the input data has been processed layer by layer and downsampled to the maximum depth by all MBConv modules in the encoder, a deep feature map of size 7×7×320 is output. This map is used for aberration prediction and feature decoding in the subsequent network stages. Step S5: Input the deep feature map obtained by the encoder into the aberration prediction head and predict the Zernike polynomial coefficients as the first output of the network, denoted as Output1. In a preferred embodiment of the present invention, step S5 specifically includes:
Step S501: Input the deep feature map obtained in step S4 into the aberration prediction head. As shown in the Aberration Head sub-graph of Figure 5, the deep feature map first undergoes average pooling, which compresses its height and width to 1. It is then flattened into a one-dimensional feature vector of length 320. Step S502: The one-dimensional feature vector obtained in step S501 passes through a first fully connected (FC) layer that increases its dimension to 512, followed by a ReLU function and a dropout layer with a rate of 0.3. It then passes through a second FC layer that reduces the dimension to 256, followed by a ReLU function and a dropout layer with a rate of 0.2, yielding a 256-dimensional intermediate feature vector. Step S503: The 256-dimensional intermediate feature vector output in step S502 passes through a final FC layer that outputs a numerical sequence. This sequence corresponds directly to the Zernike polynomial coefficients of the system's global aberration and serves as the first output of the network, denoted as Output1. Step S6: The deep feature map obtained in step S4 is used as the input to the decoder. Multi-scale features are fused by the atrous (dilated) spatial pyramid pooling module at the network bottleneck layer. The resulting transition feature map is then fed into the decoding convolutional block, which contains a multi-scale perception module and consecutive feature extraction units, for cross-scale feature extraction and preliminary fusion.
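The tensor sizes through the aberration prediction head (steps S501 to S503) can be checked with a short calculation. This sketch rests on two assumptions. First, it assumes a 224×224 input, the EfficientNet-B0 default, whose five stride-2 downsamplings give 224 / 2^5 = 7. Second, it assumes n = 15 output coefficients, reading the "15th-order" coefficients plotted in Figure 3 as 15 terms. The function names are hypothetical, and the dropout layers (rates 0.3 and 0.2) add no parameters.

```python
def encoder_output_size(input_size=224, n_stride2=5):
    """Spatial size after n stride-2 downsamplings (assumed 224 -> 7 for EfficientNet-B0)."""
    size = input_size
    for _ in range(n_stride2):
        size = (size + 1) // 2  # 'same'-padded stride-2 conv: ceil(size / 2)
    return size


def global_average_pool(fmap):
    """fmap[h][w][c] -> length-c vector (H and W compressed to 1, then flattened)."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w) for k in range(c)]


def head_layers(in_dim=320, n_coeffs=15):
    """(in_features, out_features, dropout rate after the layer) for steps S502-S503."""
    return [(in_dim, 512, 0.3), (512, 256, 0.2), (256, n_coeffs, 0.0)]


def fc_parameter_count(layers):
    """Weights plus biases of the fully connected layers; dropout has no parameters."""
    return sum(i * o + o for i, o, _ in layers)
```

Under these assumptions the head holds 164,352 + 131,328 + 3,855 = 299,535 trainable parameters.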
In a preferred embodiment of the present invention, step S6 specifically includes: Step S601: Input the deep feature map obtained in step S4 into the Atrous Spatial Pyramid Pooling (ASPP) module at the network bottleneck layer. The deep feature map is fed into three 3×3 dilated convolution branches with dilation rates of 6, 12 and 18, respectively, each consisting of a dilated convolution, a batch normalization (BN) layer and a ReLU activation function. The outputs of the three branches are then concatenated along the channel dimension, and a 1×1 convolution layer reduces and fuses them to output a transition feature map. Step S602: The transition feature map obtained in step S601 is fed into the decoder block, where it is input into the multi-scale perception module (HA). As shown in the HA module sub-graph of Figure 5, the transition feature map is processed in parallel by four 3×3 dilated convolution branches with dilation rates of 1, 2, 4 and 8, respectively. Each branch compresses the number of feature channels to one quarter of the number of input channels. The outputs of the four branches are then concatenated along the channel dimension and processed sequentially by a 1×1 convolutional layer, a BN layer and a ReLU activation function, so that the fused output feature map has the same number of channels as the input to the HA module. Step S603: As shown in the Decoder Block sub-graph of Figure 5, the output feature map fused by the HA module in step S602 is fed sequentially into two consecutive feature extraction units. Each unit consists of a two-dimensional 3×3 convolutional layer, a BN layer and a ReLU function connected in series.
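The receptive-field and channel bookkeeping of the ASPP module (step S601) and the HA module (step S602) follows standard dilated-convolution arithmetic. A k×k kernel with dilation d spans k + (k − 1)(d − 1) pixels, and padding d(k − 1)/2 keeps the spatial size unchanged at stride 1. The padding choice and the 320-channel example input are assumptions for illustration, and the helper names are hypothetical.

```python
ASPP_RATES = (6, 12, 18)  # step S601
HA_RATES = (1, 2, 4, 8)   # step S602


def effective_kernel(k, d):
    """Span in pixels of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def same_padding(k, d):
    """Padding that preserves spatial size for an odd kernel size k at stride 1."""
    return d * (k - 1) // 2


def conv_output_size(n, k, d, p, s=1):
    """Standard output-size formula for a dilated convolution."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def ha_channels(c_in, n_branches=4):
    """HA module: each branch outputs c_in / 4 channels; the 1x1 conv restores c_in."""
    if c_in % n_branches:
        raise ValueError("c_in must be divisible by the number of branches")
    per_branch = c_in // n_branches
    return per_branch, per_branch * n_branches, c_in  # branch, concatenated, after 1x1 conv
```

The HA rates 1, 2, 4 and 8 therefore cover spans of 3 to 17 pixels in parallel, while the ASPP rates cover 13 to 37 pixels.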
These consecutive extraction and normalization operations yield the output feature map of the decoding convolutional block. Step S7: An attention gating mechanism is introduced at the skip connections of the decoder, using upsampled semantic features to perform spatially adaptive filtering on the shallow features of the corresponding encoder layer. Throughout the decoder, the features pass repeatedly through alternating upsampling layers, gated filtering fusion and decoding convolutional blocks. This progressively restores the spatial resolution and outputs the phase unwrapping result (i.e., the true phase map) as the second output of the network, denoted as Output2. In a preferred embodiment of the present invention, step S7 specifically includes: Step S701: An attention gate is introduced at each skip connection of the decoder. The shallow features of each level from step S401 are denoted as x. The semantic features produced by passing the previous-level decoder feature map through the upsampling layer are denoted as the gating signal g. The linear mappings of x and g are computed with 1×1 convolutions. The results are added element-wise and processed by a ReLU function, a 1×1 convolution and a Sigmoid function to output a spatial attention score map. This map is multiplied element-wise with the original shallow feature x to obtain spatially filtered high-frequency edge features. These features are concatenated with the corresponding gating signal g along the channel dimension to form fused features, which are then input into the decoding convolutional block of the current level.
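The attention gate of step S701 can be sketched per pixel in plain Python. The sketch makes several assumptions. Feature maps are nested lists indexed [H][W][C]. x and g already share the same height and width, since g is the upsampled decoder feature. Biases are omitted. The 1×1 convolution weights `w_x`, `w_g` and `psi` are hypothetical inputs rather than trained values.

```python
import math


def matvec(m, v):
    """A 1x1 convolution at one pixel is a matrix-vector product over channels."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def attention_gate(x, g, w_x, w_g, psi):
    """Return (fused features, attention map) for shallow features x and gating signal g."""
    fused, att_map = [], []
    for x_row, g_row in zip(x, g):
        f_row, a_row = [], []
        for xv, gv in zip(x_row, g_row):
            # W_x x + W_g g, element-wise sum, then ReLU
            q = [max(0.0, a + b) for a, b in zip(matvec(w_x, xv), matvec(w_g, gv))]
            # 1x1 conv down to one channel, then sigmoid -> spatial attention score
            alpha = sigmoid(sum(p * qi for p, qi in zip(psi, q)))
            # Filter the shallow feature x, then concatenate with g along the channel axis
            f_row.append([alpha * c for c in xv] + list(gv))
            a_row.append(alpha)
        fused.append(f_row)
        att_map.append(a_row)
    return fused, att_map
```

The fused feature at each pixel has C_x + C_g channels. This matches the channel-wise concatenation that feeds the decoding convolutional block.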
Step S702: Throughout the decoding path, the deep features obtained in step S4 are processed repeatedly by an upsampling layer, the attention gating mechanism of step S701 and the decoding convolutional block, gradually restoring the spatial resolution and high-frequency details of the feature map. Finally, a 3×3 convolutional layer without bias compresses the number of feature channels to 1. The phase unwrapping result (i.e., the true phase map) is output as the second output of the network, denoted as Output2. Step S703: During the training phase, construct a multi-task joint loss function that combines mean squared error and structural similarity. The loss measures the error between the network's two outputs and their true values. The Zernike polynomial coefficients predicted in step S503 and the phase unwrapping result predicted in step S702 are compared with the ground truth, and the resulting loss is fed back to correct the network parameters. The loss is calculated as:

$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}}(\hat{\varphi}, \varphi) + \mathcal{L}_{\mathrm{SSIM}}(\hat{\varphi}, \varphi) + \lambda\,\mathcal{L}_{\mathrm{MSE}}(\hat{a}, a) \tag{6}$$

In equation (6), $\mathcal{L}_{\mathrm{MSE}}(\hat{\varphi}, \varphi)$ is the mean squared error between the unwrapped phase map output by the network (Output2), $\hat{\varphi}$, and the true phase sample $\varphi$. $\mathcal{L}_{\mathrm{SSIM}}(\hat{\varphi}, \varphi)$ is the structural similarity error between them. $\mathcal{L}_{\mathrm{MSE}}(\hat{a}, a)$ is the mean squared error between the Zernike polynomial coefficients output by the network (Output1), $\hat{a}$, and the true coefficients $a$. $\lambda$ is the weighting parameter that balances the loss terms and is set to 1. Step S704: Based on the loss $\mathcal{L}$ calculated in step S703, the weights and biases of the network are iteratively updated using the backpropagation algorithm and the Adam optimizer. These training steps are repeated until the network model converges, and the optimal weights of the trained network model are saved.
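Equation (6) can be sketched for flattened maps as below. Three assumptions are not fixed by the text. First, the SSIM term is taken as 1 − SSIM computed over the whole map in a single window (windowed SSIM is the more common choice). Second, the SSIM constants follow the usual (0.01L)² and (0.03L)², with L the data range of the true phase. Third, λ multiplies the coefficient term; with λ = 1 its placement does not change the value.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def global_ssim(a, b, data_range):
    """Single-window SSIM between two equal-length sequences (a simplification)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def joint_loss(phase_pred, phase_true, coeff_pred, coeff_true, lam=1.0):
    """Assumed form: MSE(phase) + (1 - SSIM(phase)) + lam * MSE(Zernike coefficients)."""
    # Fall back to 1.0 for a constant true phase so the SSIM denominator stays non-zero.
    data_range = (max(phase_true) - min(phase_true)) or 1.0
    return (mse(phase_pred, phase_true)
            + (1.0 - global_ssim(phase_pred, phase_true, data_range))
            + lam * mse(coeff_pred, coeff_true))
```

In a real training loop the same expression would be written with tensor operations so that backpropagation and the Adam optimizer (step S704) can differentiate it.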

[0029] Step S8: Input the test set into the trained network and simultaneously output the predicted Zernike polynomial coefficients and the unwrapped phase result. In a preferred embodiment of the present invention, step S8 specifically includes: the test set, or an actually acquired noisy wrapped distorted phase map, is input into the trained HAG-EffUNet model. A single forward pass then outputs both the Zernike polynomial coefficients and the true unwrapped phase map of the test sample. The mean squared error (MSE) of the predicted coefficients, and the root mean square error (RMSE) and structural similarity (SSIM) of the unwrapping results, are then calculated with the following formulas:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{a}_i - a_i\right)^2 \tag{7}$$

In equation (7), $\hat{a}_i$ is the i-th predicted Zernike polynomial coefficient, $a_i$ is the corresponding true coefficient, and $n$ is the order (number of terms) of the Zernike polynomial.

$$\mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\left(\hat{\varphi}(x,y) - \varphi(x,y)\right)^2} \tag{8}$$

In equation (8), $\hat{\varphi}(x,y)$ is the phase value of the unwrapped phase map output by the network at spatial coordinates $(x, y)$, $\varphi(x,y)$ is the phase value of the true unwrapped phase map at the same coordinates, and $M$ and $N$ are the height and width of the phase map, respectively.

$$\mathrm{SSIM} = \frac{(2\mu_{\hat{\varphi}}\mu_{\varphi} + C_1)(2\sigma_{\hat{\varphi}\varphi} + C_2)}{(\mu_{\hat{\varphi}}^2 + \mu_{\varphi}^2 + C_1)(\sigma_{\hat{\varphi}}^2 + \sigma_{\varphi}^2 + C_2)} \tag{9}$$

In equation (9), $\mu_{\hat{\varphi}}$ and $\mu_{\varphi}$ are the mean phase values of the output unwrapped phase map and the true phase map, respectively. $\sigma_{\hat{\varphi}}^2$ and $\sigma_{\varphi}^2$ are their variances, $\sigma_{\hat{\varphi}\varphi}$ is their covariance, and $C_1$ and $C_2$ are very small constants.
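Equations (7) to (9) translate directly into code. This sketch evaluates SSIM in a single global window, exactly as equation (9) is written; library implementations usually average SSIM over local windows instead. The default values of $C_1$ and $C_2$ are arbitrary small constants chosen for illustration.

```python
import math


def coeff_mse(pred, true):
    """Equation (7): mean squared error over the n Zernike coefficients."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def phase_rmse(pred, true):
    """Equation (8): RMSE over an M x N phase map given as nested lists."""
    m, n = len(true), len(true[0])
    total = sum((pred[i][j] - true[i][j]) ** 2 for i in range(m) for j in range(n))
    return math.sqrt(total / (m * n))


def phase_ssim(pred, true, c1=1e-4, c2=9e-4):
    """Equation (9): SSIM with means, variances and covariance taken over the whole map."""
    a = [v for row in pred for v in row]
    b = [v for row in true for v in row]
    k = len(a)
    mu_a, mu_b = sum(a) / k, sum(b) / k
    var_a = sum((v - mu_a) ** 2 for v in a) / k
    var_b = sum((v - mu_b) ** 2 for v in b) / k
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / k
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```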

[0030] Figure 3 shows the network output corresponding to the input in Figure 2a. The left bar chart compares the predicted 15th-order Zernike polynomial coefficients with the true coefficient values. The predicted values (black bars) and the true values (hatched bars) agree closely, with a mean squared error of 0.048, demonstrating the network's ability to accurately quantify complex aberrations. The right panel shows the phase unwrapping result output by the network. Its root mean square error is 0.775 rad and its structural similarity is 0.944, achieving high-precision reconstruction of the three-dimensional shape of the measured object.

[0031] Figure 4 shows the network output corresponding to the input in Figure 2b, in which the wrapped fringes at the object's edge are denser. The network nevertheless maintains high aberration prediction accuracy (left bar chart) and high-fidelity unwrapping quality (right panel). The mean squared error of the predicted coefficients is 0.048, and the root mean square error and structural similarity of the phase unwrapping result are 0.440 rad and 0.921, respectively. These results show that the HAG-EffUNet network proposed in this invention achieves noise suppression and high-fidelity reconstruction of large low-frequency samples in a single forward pass. This improves the decoupling efficiency and 3D topography reconstruction accuracy of the holographic measurement system.

[0032] Through the eight steps above, the proposed deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping is realized.

Claims

1. A deep learning-based method for synchronous decoupling of aberration compensation and phase unwrapping, characterized in that it comprises the following steps: Step S1: Generate an original three-dimensional phase with two-dimensional mathematical functions as the true phase sample. Step S2: Generate a global background aberration surface by fitting a multi-order Zernike polynomial coefficient matrix, and superimpose it on the true phase. Then introduce multiplicative speckle noise and additive Gaussian noise to obtain a noisy distorted phase, and wrap the noisy distorted phase to obtain a noisy wrapped distorted phase. Step S3: Take the noisy wrapped distorted phase as input and the corresponding Zernike polynomial coefficient matrix and true phase sample as ground truth to generate a dataset, and divide it into a training set, a validation set and a test set. Step S4: Construct the HAG-EffUNet network based on an encoder-decoder structure. Input the training set into the EfficientNet-B0-based encoder, and perform deep feature extraction and downsampling with multiple cascaded mobile inverted bottleneck convolution (MBConv) modules. Introduce a gradient-aware coordinate attention mechanism inside the modules to extract deep feature maps containing spatial gradient information. Step S5: Input the deep feature map obtained by the encoder into the aberration prediction head to predict the Zernike polynomial coefficients, which serve as the first output of the network, denoted as Output1. Step S6: Use the deep feature map obtained in step S4 as the input of the decoder, and fuse multi-scale features with the atrous spatial pyramid pooling module at the bottleneck layer of the network. Feed the resulting transition feature map into the decoding convolutional block, which contains the multi-scale perception module and consecutive feature extraction units, to perform cross-scale feature extraction and preliminary fusion.
Step S7: Introduce an attention gating mechanism at the skip connections of the decoder to perform spatially adaptive filtering of the shallow features of the corresponding encoder layer using upsampled semantic features. Throughout the decoder, features are processed repeatedly by upsampling layers, gated filtering fusion and decoding convolutional blocks, gradually restoring the spatial resolution. The phase unwrapping result is output as the second output of the network, denoted as Output2. Step S8: Input the test set into the trained HAG-EffUNet network to obtain the Zernike polynomial coefficients and phase unwrapping results of the test set.

2. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S1 specifically includes: Step S101: Generate continuous phases by Gaussian function simulation as the first type of phase samples; Step S102: Generate continuous phases by Sigmoid function simulation as the second type of phase samples; Step S103: Mix the first type of phase samples obtained in step S101 with the second type of phase samples obtained in step S102 to form the true phase samples.
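Steps S101 to S103 can be sketched as follows. The text does not specify the function parameters, so several choices here are hypothetical and made only for illustration. These include the Gaussian bump, the radially symmetric sigmoid plateau, the parameter ranges, the 1:1 mixing ratio and the function names.

```python
import math
import random


def gaussian_phase(size, amp, sigma, cx, cy):
    """First-type sample (hypothetical form): Gaussian bump of height amp centred at (cx, cy)."""
    return [[amp * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def sigmoid_phase(size, amp, steepness, cx, cy, radius):
    """Second-type sample (hypothetical form): plateau of height amp with a sigmoid edge at radius."""
    return [[amp / (1 + math.exp(steepness * (math.hypot(x - cx, y - cy) - radius)))
             for x in range(size)] for y in range(size)]


def make_true_phase_samples(n, size=64, seed=0):
    """Step S103 sketch: alternate the two sample types, then shuffle them together."""
    rng = random.Random(seed)
    samples = []
    for k in range(n):
        cx, cy = rng.uniform(0.3 * size, 0.7 * size), rng.uniform(0.3 * size, 0.7 * size)
        amp = rng.uniform(5.0, 40.0)  # hypothetical peak phase range in radians
        if k % 2 == 0:
            samples.append(gaussian_phase(size, amp, rng.uniform(0.1 * size, 0.3 * size), cx, cy))
        else:
            samples.append(sigmoid_phase(size, amp, rng.uniform(0.2, 1.0), cx, cy,
                                         rng.uniform(0.15 * size, 0.35 * size)))
    rng.shuffle(samples)
    return samples
```

The Gaussian samples give smooth, peaked phase distributions, while the sigmoid samples give large flat-topped (low-frequency) objects with steeper edges. Mixing the two exposes the network to both kinds of shape.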

3. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S2 specifically includes: Step S201: Randomly generate multiple sets of Zernike polynomial coefficient matrices; Step S202: Fit the spatial wavefront aberration surface with the Zernike polynomial coefficient matrix obtained in step S201 and use it as the global background aberration surface, as follows:

$$W(x,y) = \sum_{i=1}^{n} a_i Z_i(x,y) \tag{1}$$

In equation (1), $W(x,y)$ is the global background aberration surface, $Z_i$ denotes the i-th Zernike polynomial, $n$ denotes the order of the Zernike polynomial, and $a_i$ is the corresponding Zernike polynomial coefficient; Step S203: Superimpose the global background aberration surface obtained in step S202 on the true phase sample obtained in step S103 to form a phase containing aberration distortion; Step S204: Add multiplicative speckle noise with different standard deviations and fixed additive Gaussian noise to the aberrated phase obtained in step S203 to obtain the noisy distorted phase $U$, as follows: (2) In equation (2), the terms denote, respectively: the light-wave amplitude of the noise-free object; the standard deviation of the multiplicative speckle noise, which sets the speckle noise level; two random Gaussian noise terms representing random distributions in amplitude and in phase; the imaginary unit j; and the additive Gaussian noise; Step S205: Wrap the noisy distorted phase $U$ from step S204 with the arctangent function, truncating its absolute phase to the interval [−π, π] to obtain the noisy wrapped distorted phase, as follows:

$$\varphi_w = \arctan\frac{\mathrm{Im}(U)}{\mathrm{Re}(U)} \tag{3}$$

In equation (3), $\varphi_w$ is the noisy wrapped distorted phase, Im denotes the imaginary-part operation and Re denotes the real-part operation. To cover the full interval [−π, π], the arctangent is evaluated as the four-quadrant arctangent of Im(U) and Re(U).
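A sketch of steps S201 to S205 is given below. It uses only five low-order Noll-normalized Zernike terms (x and y tilt, defocus and two astigmatism terms), evaluated on a square grid mapped to [−1, 1]². Equation (2) is not reproduced in the text, so the noise model U = A(1 + σ·n_A)·exp[j(φ + σ·n_φ)] + n_add is a stand-in assumption rather than the claimed formula. The function names are hypothetical. `cmath.phase` returns the four-quadrant arctangent of Im(U)/Re(U) in [−π, π], matching step S205.

```python
import cmath
import math
import random


def zernike_terms(x, y):
    """Five low-order Noll-normalised Zernike terms in Cartesian form (unit-disk coordinates)."""
    r2 = x * x + y * y
    return [
        2 * x,                           # tilt (x)
        2 * y,                           # tilt (y)
        math.sqrt(3) * (2 * r2 - 1),     # defocus
        math.sqrt(6) * 2 * x * y,        # oblique astigmatism
        math.sqrt(6) * (x * x - y * y),  # vertical astigmatism
    ]


def aberration_surface(coeffs, size):
    """Equation (1): W = sum_i a_i Z_i on a size x size grid (size >= 2) mapped to [-1, 1]^2."""
    surface = []
    for r in range(size):
        y = 2 * r / (size - 1) - 1
        surface.append([sum(a * z for a, z in zip(coeffs, zernike_terms(2 * c / (size - 1) - 1, y)))
                        for c in range(size)])
    return surface


def noisy_wrapped(phase, sigma_speckle, sigma_add, rng, amp=1.0):
    """Stand-in noise model (NOT the claimed equation (2)) followed by wrapping, equation (3)."""
    wrapped = []
    for row in phase:
        out = []
        for phi in row:
            u = amp * (1 + sigma_speckle * rng.gauss(0, 1)) * cmath.exp(
                1j * (phi + sigma_speckle * rng.gauss(0, 1)))
            u += complex(rng.gauss(0, sigma_add), rng.gauss(0, sigma_add))  # additive noise
            out.append(cmath.phase(u))  # four-quadrant arctan(Im U / Re U), in [-pi, pi]
        wrapped.append(out)
    return wrapped


def simulate(true_phase, coeffs, sigma_speckle=0.1, sigma_add=0.05, seed=0):
    """Steps S203-S205: add the aberration surface, add noise, then wrap."""
    rng = random.Random(seed)
    surface = aberration_surface(coeffs, len(true_phase))
    distorted = [[t + w for t, w in zip(tr, wr)] for tr, wr in zip(true_phase, surface)]
    return noisy_wrapped(distorted, sigma_speckle, sigma_add, rng)
```

With both noise levels set to zero, `noisy_wrapped` reduces to pure wrapping. For example, a phase of 4 rad maps to 4 − 2π ≈ −2.283 rad.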

4. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S3 specifically includes: taking the true phase samples obtained in step S103 and the Zernike polynomial coefficient matrices from step S201 as ground truth, and the corresponding noisy wrapped distorted phases from step S205 as input, and dividing the dataset into a training set, a validation set and a test set according to a preset ratio.
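The split by a preset ratio in step S3 can be sketched as below. The text leaves the ratio unspecified, so the 8:1:1 ratio, the fixed seed and the (input, targets) tuple layout are assumptions.

```python
import random


def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle (input, targets) pairs and split them into train/val/test by the given ratios."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three fractions summing to 1")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the split reproducible
    n_train = int(ratios[0] * len(order))
    n_val = int(ratios[1] * len(order))
    train = [samples[i] for i in order[:n_train]]
    val = [samples[i] for i in order[n_train:n_train + n_val]]
    test = [samples[i] for i in order[n_train + n_val:]]  # remainder goes to the test set
    return train, val, test
```

Each element would pair a noisy wrapped distorted phase (input) with its Zernike coefficient matrix and true phase sample (ground truth), so the two targets always stay with their input.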

5. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S4 specifically includes: Step S401: Construct a HAG-EffUNet network based on an encoder-decoder structure. Take the training set obtained in step S3 as the network input and feed it into the encoder with a lightweight EfficientNet-B0 backbone architecture to extract shallow feature maps. Step S402: The shallow feature maps obtained in step S401 are fed sequentially into multi-level mobile inverted bottleneck convolution (MBConv) modules for deep feature extraction and downsampling. Inside each MBConv module, the input feature map first passes through a 1×1 convolutional layer, a batch normalization layer and a Swish activation function for channel expansion. It then passes through a k×k depthwise separable convolutional layer, a batch normalization layer and a Swish activation function to extract spatial features, yielding an intermediate feature map $I$. Step S403: Input the intermediate feature map obtained in step S402 into the gradient-aware coordinate attention (GE-CA) mechanism. First, the intermediate feature map is padded by replicating its boundary pixels. This prevents the high-frequency edge interference that zero-padding causes during gradient calculation and preserves the continuity of the image fringes. The horizontal and vertical spatial gradients of the intermediate feature map $I$ are then extracted with a fixed-weight Sobel edge operator:

$$G_x = S_x * I,\qquad G_y = S_y * I \tag{4}$$

In equation (4), $S_x$ and $S_y$ are the horizontal and vertical Sobel convolution kernels, respectively, and $*$ denotes convolution. $G_x$ is the spatial gradient in the horizontal direction and $G_y$ is the spatial gradient in the vertical direction. From these spatial gradients, the gradient magnitude $M$ is calculated as:

$$M = \sqrt{G_x^2 + G_y^2 + \varepsilon} \tag{5}$$

In equation (5), $\varepsilon$ is a very small constant that keeps the calculation stable where the gradient is 0. Step S404: Within the GE-CA module, the intermediate feature map $I$ obtained in step S402 is divided into a residual branch and a second branch. The second branch and the gradient magnitude map $M$ calculated in step S403 are each average-pooled along the X and Y directions to obtain position and gradient feature vectors. These vectors are concatenated, fused and processed sequentially by convolutional layers, batch normalization layers and nonlinear activation functions, with position-aware and gradient-aware operations performed in parallel. The processed features are then passed separately along the width and height directions through a convolutional layer and a sigmoid activation function to output attention scores for the width and height dimensions. Finally, the residual branch is multiplied by the attention scores (re-weighting) to obtain an attention-weighted feature map. Step S405: Within the MBConv module, the attention-weighted feature map obtained in step S404 is processed sequentially by a 1×1 convolutional layer, a batch normalization layer and a Dropout layer for channel reduction. The result is added as a residual to the initial input feature map of the MBConv module. After the feature map has been processed and downsampled by all MBConv modules, a deep feature map is output.

6. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S5 specifically includes: Step S501: Input the deep feature map obtained in step S4 into the aberration prediction head, and convert it into a one-dimensional feature vector through average pooling and flattening operations. Step S502: The one-dimensional feature vector from step S501 is processed sequentially by fully connected layers that first increase and then decrease its dimension, outputting an intermediate feature vector. Step S503: The intermediate feature vector from step S502 is passed through a fully connected layer to output the Zernike polynomial coefficient sequence, which serves as the first output of the network, denoted as Output1.

7. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S6 specifically includes: Step S601: Input the deep feature map obtained in step S4 into the atrous spatial pyramid pooling module at the bottleneck layer of the network, and output the transition feature map after fusing multi-scale features. Step S602: The transition feature map from step S601 is fed into the multi-scale perception module (HA) in the decoding convolutional block and processed by four 3×3 dilated convolutional branches with dilation rates of 1, 2, 4 and 8, respectively. After the number of output channels of each branch is compressed, the branch outputs are concatenated and processed by a 1×1 convolutional layer, a batch normalization layer and a ReLU activation function. This restores the number of channels of the fused output feature map to the number input into the HA module. Step S603: The output feature map from step S602 is fed sequentially into two consecutive feature extraction units. Each unit consists of a two-dimensional 3×3 convolutional layer, a BN layer and a ReLU function connected in series, and the units output the output feature map of the decoding convolutional block.

8. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S7 specifically includes: Step S701: An attention gating mechanism is introduced at the skip connections of the decoder. The shallow features of the encoder are denoted as x, and the upsampled semantic features of the previous level are denoted as the gating signal g. The linear mappings of x and g are computed by 1×1 convolutions and added element-wise. The sum is passed through a ReLU function, a 1×1 convolution and a Sigmoid function to output an attention score map, which is multiplied by x to obtain the high-frequency edge features. The high-frequency edge features are concatenated with the gating signal g along the channel dimension to generate fused features, which are input into the decoding convolutional block of the current level; Step S702: The deep features obtained in step S4 are processed alternately by upsampling layers, the attention gating mechanism and decoding convolutional blocks. Finally, a bias-free 3×3 convolutional layer compresses the number of feature channels to 1, and the phase unwrapping result is output as the second output of the network, denoted as Output2; Step S703: Output1 and Output2 are compared with the true values to calculate the loss, which is fed back to correct the network parameters. These training steps are repeated until the network model converges, and the optimal weights of the trained network model are saved.

9. The method for synchronous decoupling of aberration compensation and phase unwrapping based on deep learning according to claim 1, characterized in that step S8 specifically includes: the test set, or an actually acquired noisy wrapped distorted phase map, is input into the trained HAG-EffUNet model. A single forward pass then outputs both the Zernike polynomial coefficients and the true unwrapped phase results of the test sample.