A planar diffractive lens image reconstruction method based on deep learning
By optimizing planar diffractive lens image reconstruction with a deep learning method based on a cycle-consistent generative adversarial network, the method addresses image quality degradation under a wide spectrum and achieves efficient, adaptive image reconstruction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2026-03-12
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies struggle to address focus shift, dispersive halos, and image quality degradation in planar diffractive lens imaging systems across a wide spectrum. Traditional algorithms incur significant computational overhead and require complex parameter tuning, while supervised deep learning methods need strictly paired training data whose capture demands expensive, strictly synchronized hardware.
A deep learning approach based on a cycle-consistent generative adversarial network (CycleGAN) is adopted. The microstructure of the planar diffractive lens is optimized with the direct binary search algorithm; a reconstruction generator is constructed and pre-trained, with an improved attention module and loss function; and a CycleGAN is then constructed and trained.
The method maintains high focusing efficiency over a wide spectral range, suppresses side lobes, improves imaging quality and the algorithm's adaptability, and reduces computational overhead and hardware costs.
Figure CN122199698A_ABST
Description
Technical Field
[0001] This invention pertains to optimization methods for planar diffractive lens imaging, and specifically to a planar diffractive lens image reconstruction method based on deep learning.
Background Technology
[0002] With the rapid development of modern optical imaging technology towards miniaturization, lightweighting, and integration, planar imaging systems based on diffractive optical elements have become a research hotspot in the field of computational imaging due to their significant advantages such as small size, light weight, and high degree of design freedom. Planar diffractive lenses can achieve phase modulation of light waves by etching micro- and nano-structures on substrates with a thickness of only micrometers, compressing traditional centimeter-level optical lenses to the micrometer level, thus making it possible to realize low-cost, lightweight, large-scale mass-produced imaging devices.
[0003] However, planar diffractive lenses face significant physical challenges in broadband imaging applications. Diffractive elements exhibit extremely strong negative dispersion: their focal length is inversely proportional to the wavelength. Under broadband illumination, light of different wavelengths is therefore focused on different planes, resulting in severe focus shift, large-scale dispersive halos, and a sharp decrease in contrast in the imaging results. Although researchers have proposed hardware solutions such as multi-level phase structures and achromatic metalenses, these often come at the cost of a smaller aperture, lower light energy utilization, or greater fabrication difficulty, making it hard to meet the practical imaging requirements of a large field of view and high throughput.
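The inverse proportionality between focal length and wavelength can be illustrated with a short calculation. This is a minimal sketch for an ideal diffractive lens; the design wavelength of 550 nm and the design focal length of 50 mm are assumed values chosen for illustration, not parameters stated in this document.

```python
def focal_length_mm(wavelength_nm, design_wavelength_nm=550.0, design_focal_mm=50.0):
    """Ideal diffractive lens: f(lambda) = f0 * lambda0 / lambda.

    design_wavelength_nm and design_focal_mm are illustrative assumptions.
    """
    return design_focal_mm * design_wavelength_nm / wavelength_nm


if __name__ == "__main__":
    # Shorter wavelengths focus farther from the lens than longer ones.
    for wl in (450.0, 550.0, 650.0):
        print(f"{wl:.0f} nm -> f = {focal_length_mm(wl):.2f} mm")
```

Under these assumptions the focal plane moves by roughly 19 mm between 450 nm and 650 nm, which is the focus shift that the reconstruction network has to compensate.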
[0004] Traditional image decoding algorithms are mainly based on deconvolution theory, such as Wiener filtering and the alternating direction method of multipliers (ADMM). Wiener filtering deconvolves in the frequency domain by minimizing the mean square error, but because the optical transfer function of diffractive systems often has zeros or decays sharply at high frequencies, the division involved strongly amplifies noise and introduces severe ringing artifacts. Iterative optimization algorithms such as ADMM alleviate the ill-posedness of the inverse problem by introducing prior constraints such as total variation regularization. However, the point spread function of a planar diffractive lens imaging system is large, non-uniform, and varies drastically with wavelength, so the computational cost is very high and real-time imaging requirements are difficult to meet. Tuning the regularization parameter is also extremely cumbersome.
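The noise-amplification argument can be made concrete by comparing filter gains at a single spatial frequency. The sketch below is an illustration only: `nsr` is an assumed constant noise-to-signal ratio, and the sample transfer-function values are hypothetical.

```python
def inverse_gain(h):
    """Gain of naive inverse filtering 1/H at one frequency (diverges as H -> 0)."""
    return float("inf") if h == 0 else 1.0 / abs(h)


def wiener_gain(h, nsr=0.01):
    """Gain of the Wiener filter conj(H) / (|H|^2 + NSR) at one frequency."""
    return abs(h) / (abs(h) ** 2 + nsr)


if __name__ == "__main__":
    # Hypothetical transfer-function magnitudes, from well-passed to nearly zero.
    for h in (1.0, 0.1, 0.01, 0.001):
        print(f"|H|={h}: inverse={inverse_gain(h):.1f}, wiener={wiener_gain(h):.3f}")
```

With this assumed `nsr`, the Wiener gain peaks at 5 where |H| = 0.1 and falls toward zero beyond that, so detail at frequencies the lens barely transmits is suppressed rather than restored, and the result depends on how the noise-to-signal ratio is tuned.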
[0005] In recent years, deep learning technology has made revolutionary progress in the field of computational imaging. However, ideal supervised training requires precisely paired data: a sharp image of a scene captured by a standard imaging system and a degraded coded image of the same scene captured by a planar diffractive lens system. Such data is extremely difficult to obtain in practice, requiring strictly synchronized hardware configurations and scene control, which is costly and unsuitable for dynamic or complex environments.
Summary of the Invention
[0006] This invention proposes a deep learning-based method for reconstructing planar diffractive lens images. A deep learning method based on a cycle-consistent generative adversarial network model is used to improve the imaging quality of a planar diffractive lens imaging system.
[0007] The technical solution to achieve the objective of this invention is: a planar diffraction lens image reconstruction method based on deep learning, comprising the following steps:
[0008] Step 1: The microstructure height distribution of the planar diffractive lens is discretized and iteratively optimized using a direct binary search algorithm to design diffractive optical elements suitable for imaging.
[0009] Step 2: Construct an imaging system using the diffractive optical elements designed in Step 1, acquire blurred image data, and build a training dataset;
[0010] Step 3: Construct a reconstruction generator based on the U-Net architecture and pre-train it. The reconstruction generator introduces a convolutional block attention module, which enhances the extraction of high-frequency texture features by applying channel attention and spatial attention in series. A residual module is introduced in the encoder, and an improved color attention mechanism is introduced in the skip connections to adaptively correct color deviation.
[0011] Step 4: Construct and train an improved cycle-consistent generative adversarial network (CycleGAN) model. The model includes the reconstruction generator pre-trained in Step 3, a degradation generator, a sharp image discriminator, and a blurry image discriminator. The trained reconstruction generator is used to reconstruct blurry images and output high-quality sharp images.
[0012] Compared with existing technologies, the present invention has the following significant advantages: (1) The present invention adopts an iterative optimization strategy based on direct binary search, using a composite evaluation function that includes energy concentration and spot shape fitting error. It maintains high focusing efficiency over a wide spectral range and concentrates energy in the main lobe of the point spread function (PSF) while effectively suppressing side lobes, which is significantly better than traditional designs. (2) The present invention uses deep learning networks to address the poor generalization associated with planar diffractive lens (FDL) datasets. The physical degradation model of the FDL is implicitly embedded in the degradation generator, forcing the network to learn the physical process of imaging. A two-stage training strategy of "supervised pre-training + unsupervised fine-tuning" is adopted to improve the algorithm's adaptability and robustness in different scenarios.
[0013] The present invention will now be described in further detail with reference to the accompanying drawings.
Description of the Drawings
[0014] Figure 1 shows the algorithm framework for designing and optimizing the DOE in this invention.
[0015] Figure 2 shows the network structure of the reconstruction generator designed in this invention.
[0016] Figure 3 shows the improved CBAM attention module of this invention.
[0017] Figure 4 shows the training process of the cycle-consistent network designed in this invention.
[0018] Figure 5 is a performance comparison chart of the optimal network model of this invention.
[0019] Figure 6 shows reconstructed images of real-world targets obtained with this invention.
[0020] Figure 7 shows the experimental setup of the planar diffractive lens imaging system.
Detailed Implementation
[0021] The concept of this invention is as follows: a deep learning-based method for reconstructing planar diffractive lens (FDL) images. The method uses a cycle-consistent generative adversarial network (CycleGAN) model to optimize the quality of the reconstructed images, thereby improving the performance of FDL imaging systems. The FDL is designed and optimized based on a physical model. An imaging system is built with the designed FDL, and a dataset is acquired by capturing images displayed on a screen. A reconstruction generator is designed based on U-Net and pre-trained. A CycleGAN comprising generators and discriminators is then built and trained, fine-tuning the pre-trained weights according to the loss function. The computation is organized so that every processing step remains differentiable. Finally, the weights of the mapping from a blurred image to a sharp image are obtained. The specific steps are as follows:
[0022] Step 1: The height distribution of the microstructure of the planar diffractive lens is discretized and iteratively optimized using a direct binary search algorithm to design a diffractive optical element suitable for imaging.
[0023] Step 1.1: Select S1818 photoresist as the substrate material, use the Cauchy dispersion formula to describe how its refractive index varies with wavelength, and determine the phase delay introduced by the i-th ring at each wavelength. Specifically:
[0024]
[0025]
[0026] where the height of the i-th ring takes discretized values within a range bounded by the maximum ring height;
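The formulas of Step 1.1 did not survive in this text, so the following is only a sketch under stated assumptions. It uses the common three-term Cauchy form n(λ) = A + B/λ² + C/λ⁴ and the usual thin-element phase delay φ = (2π/λ)(n(λ) − 1)h relative to air. The coefficient values, the maximum height of 2 µm, and the 64 quantization levels are placeholders, not measured S1818 data or values from this document.

```python
import math


def cauchy_index(wavelength_um, a=1.60, b=0.012, c=0.0):
    """Cauchy dispersion n = A + B/lambda^2 + C/lambda^4 (lambda in micrometres).

    The default coefficients are placeholders, not S1818 measurements.
    """
    return a + b / wavelength_um**2 + c / wavelength_um**4


def phase_delay(height_um, wavelength_um):
    """Phase delay in radians of a ring of the given height, relative to air."""
    return 2.0 * math.pi / wavelength_um * (cauchy_index(wavelength_um) - 1.0) * height_um


def quantize_height(height_um, h_max_um=2.0, levels=64):
    """Snap a height to the nearest of `levels` evenly spaced values in [0, h_max]."""
    step = h_max_um / (levels - 1)
    clipped = min(max(height_um, 0.0), h_max_um)
    return round(clipped / step) * step
```

Because both the refractive index and the 1/λ factor change with wavelength, the same ring height produces a different phase delay at each wavelength, which is why the heights are optimized over a whole set of wavelengths.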
[0027] Step 1.2: Construct an evaluation function that includes an "energy concentration term" and a "morphology fitting term," specifically as follows:
[0028]
[0029] Here, the energy concentration term is computed from the actual light intensity distribution; a larger value indicates that the energy is more concentrated in the main lobe. The morphology fitting term is the mean square error between the normalized light intensity and the ideal Gaussian distribution; the smaller its value, the closer the actual PSF shape is to the ideal Gaussian shape and the fewer the side lobes.
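Since the evaluation formula itself is missing from this text, the sketch below shows one plausible way to combine the two terms on a 1-D intensity profile. The way the terms are combined (concentration minus a weighted shape error, averaged over wavelengths), the window radius, `sigma`, and `weight` are all assumptions made for illustration.

```python
import math


def energy_concentration(intensity, main_lobe_radius):
    """Fraction of total energy inside a central window (profile must be wider than it)."""
    centre = len(intensity) // 2
    lo, hi = centre - main_lobe_radius, centre + main_lobe_radius + 1
    total = sum(intensity)
    return sum(intensity[lo:hi]) / total if total > 0 else 0.0


def shape_error(intensity, sigma):
    """Mean squared error between the peak-normalised profile and a unit Gaussian."""
    centre = len(intensity) // 2
    peak = max(intensity)  # assumed to be positive
    err = 0.0
    for k, value in enumerate(intensity):
        ideal = math.exp(-((k - centre) ** 2) / (2.0 * sigma**2))
        err += (value / peak - ideal) ** 2
    return err / len(intensity)


def merit(psfs_by_wavelength, main_lobe_radius=2, sigma=1.5, weight=1.0):
    """Assumed combination: mean over wavelengths of concentration - weight * error."""
    scores = [energy_concentration(p, main_lobe_radius) - weight * shape_error(p, sigma)
              for p in psfs_by_wavelength]
    return sum(scores) / len(scores)
```

A profile with a raised background, standing in for side lobes, scores lower than a clean one because its energy concentration drops.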
[0030] Step 1.3: Use the direct binary search (DBS) algorithm to find the optimal height sequence. Load the initial height distribution of the substrate material and calculate the initial light field, i.e., the point spread function computed from the initial DOE height distribution, together with its evaluation function value. Then perturb the ring heights of the FDL one by one, adding or subtracting one quantization step. For each ring, an increase is tried first, then a decrease. After each perturbation the evaluation function value is recalculated and the system state is updated according to its change: when the value improves, the ring height distribution and the current best evaluation function value are updated. This completes one round of increase and decrease attempts for the ring. If neither attempt improves the evaluation function value, the ring is judged to have converged and is added to a convergence list. In subsequent iterations, rings in the list are skipped automatically and only the remaining rings are evaluated.
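The search loop described in Step 1.3 can be sketched as follows. This is a simplified reading of the procedure, not the patented implementation: `merit` is any caller-supplied scoring function to maximise, a perturbation is kept only when it strictly improves the score, and `max_rounds` is an added safety limit.

```python
def direct_binary_search(heights, merit, step, h_max, max_rounds=100):
    """Greedy ring-by-ring search; `merit` maps a height list to a score to maximise."""
    heights = list(heights)
    best = merit(heights)
    converged = set()                      # rings that neither +step nor -step improved
    for _ in range(max_rounds):
        if len(converged) == len(heights):
            break
        for i in range(len(heights)):
            if i in converged:
                continue                   # converged rings are skipped from now on
            improved = False
            for delta in (+step, -step):   # try an increase first, then a decrease
                trial = heights[i] + delta
                if not 0.0 <= trial <= h_max:
                    continue
                old = heights[i]
                heights[i] = trial
                score = merit(heights)
                if score > best:
                    best, improved = score, True
                else:
                    heights[i] = old       # revert a perturbation that did not help
            if not improved:
                converged.add(i)
    return heights, best


if __name__ == "__main__":
    # Toy merit with a known optimum, standing in for the PSF-based evaluation.
    target = [0.5, 1.0, 0.0]
    toy = lambda h: -sum((a - b) ** 2 for a, b in zip(h, target))
    print(direct_binary_search([0.0, 0.0, 0.0], toy, step=0.25, h_max=2.0))
```

In practice `merit` would compute the PSF at each design wavelength from the ring heights and score it with the evaluation function of Step 1.2.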
[0031] Step 2: Construct an imaging system using the optimized diffractive optical element and a sensor, as shown in Figure 7. The imaging system includes a diffractive optical element, an image acquisition device, an optical displacement stage, and an LED display. The diffractive optical element is fixed 1 m in front of the screen, and the image acquisition device is fixed 50 mm away from the diffractive optical element. The optical path is calibrated by adjusting the displacement stage so that the center points of the LED display, the diffractive optical element, and the image acquisition device lie on the same horizontal optical axis. The LED display shows the ground-truth image of the lossless target scene, which is encoded by the FDL at a distance of 1 m and captured by the sensor fixed approximately 50 mm behind the FDL.
[0032] The DIV2K color dataset was selected as the source of lossless target scenes. The dataset contains 900 RGB three-channel target scenes. Each scene was cut into 9 non-overlapping blocks of 512×512 pixels, resulting in 6708 color scene images. The color scenes were displayed sequentially on the LED display and captured with the FDL camera in the experimental setup described above. The dataset was then formed through calibration, cropping, and scaling. The sensor was a QHYCCD astronomical camera.
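The block-cutting step can be sketched as a function that lists crop origins. The text does not say how the blocks were positioned, so the row-major grid starting at the top-left corner is an assumption, as are the function and parameter names.

```python
def tile_origins(height, width, tile=512, max_tiles=9):
    """Top-left corners of non-overlapping tile x tile crops, row-major, capped at max_tiles."""
    origins = [(r, c)
               for r in range(0, height - tile + 1, tile)
               for c in range(0, width - tile + 1, tile)]
    return origins[:max_tiles]


if __name__ == "__main__":
    print(len(tile_origins(1536, 2040)))   # image tall enough for a 3 x 3 grid
    print(len(tile_origins(1356, 2040)))   # shorter image: only 2 rows fit
```

Under this assumed scheme, an image needs at least 1536 pixels along both sides to yield nine blocks; smaller images yield fewer.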
[0033] Step 3: Build the reconstruction generator based on the U-Net architecture, as shown in Figure 2.
[0034] In a further embodiment, the encoder comprises five cascaded residual blocks. As the layers deepen, the feature map size is halved at each downsampling step while the number of channels doubles, from 64 to 1024. Each residual block contains two 3×3 convolutional layers and a shortcut path that is added directly to their output. The decoder comprises four decoding blocks. Each block first upsamples by a factor of two through transposed convolution, then concatenates the result with the corresponding feature map from the encoder, and finally fuses them through two standard convolutional layers.
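The channel and resolution progression of the encoder can be checked with a few lines of arithmetic. The sketch assumes a 512×512 input and that the first residual block keeps full resolution, so that four downsampling steps match the four decoding blocks; neither detail is stated explicitly in the text.

```python
def encoder_shapes(input_size=512, base_channels=64, stages=5):
    """(channels, spatial size) after each stage; assumes stage 1 keeps full resolution."""
    shapes = []
    size, channels = input_size, base_channels
    for stage in range(stages):
        if stage > 0:
            size //= 2          # halve the feature map
            channels *= 2       # double the channel count
        shapes.append((channels, size))
    return shapes


if __name__ == "__main__":
    for channels, size in encoder_shapes():
        print(f"{channels:4d} channels at {size} x {size}")
```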
[0035] In a further embodiment, a convolutional block attention module (CBAM) is introduced. Two sequential sub-modules, channel attention and spatial attention, reweight features along the "content" and "location" dimensions, respectively, with improvements made specifically for color image reconstruction. Channel attention operates on the input intermediate feature map: spatial information is aggregated by global average pooling and global max pooling in parallel, and a learnable feature correction convolutional layer with a residual connection is introduced to dynamically learn the nonlinear deviation of color features. Spatial attention compresses channel information by average pooling and max pooling along the channel dimension, highlighting salient regions in the image. Channel attention is applied first, followed by spatial attention, as follows:
[0036]
[0037]
[0038]
[0039] In these formulas, a convolutional network captures the nonlinear dependencies between channels; the Sigmoid activation function generates the normalized channel weights; the feature correction module consists of convolutional layers; and the concatenation operation is performed along the channel dimension. The structure of the CBAM attention mechanism is shown in Figure 3.
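The channel-then-spatial ordering can be illustrated with a small pure-Python sketch of the standard CBAM computation. It deliberately omits the feature correction module and residual connection of the improved version, whose formulas are missing from this text, and it replaces the spatial k×k convolution with a 1×1 weighting to stay short. All weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists. w1 (R x C) and w2 (C x R) form the shared MLP."""
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]   # global average pool
    mx = [max(map(max, ch)) for ch in feat]                             # global max pool
    weights = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[wt * v for v in row] for row in ch] for wt, ch in zip(weights, feat)]


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """Pools across channels, then applies a 1x1 weighting (a simplification of k x k)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            column = [feat[k][i][j] for k in range(c)]
            gate = sigmoid(w_avg * sum(column) / c + w_max * max(column) + bias)
            for k in range(c):
                out[k][i][j] = gate * feat[k][i][j]
    return out


def cbam(feat, w1, w2):
    """Channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(feat, w1, w2))
```

Both gates lie strictly between 0 and 1, so the module rescales features without changing the shape of the feature map.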
[0040] In a further embodiment, the reconstruction generator is pre-trained, and the overall loss function L is defined as follows:
[0041] $$L = \lambda_{1} L_{\mathrm{MSE}} + \lambda_{2} L_{\mathrm{LPIPS}}$$
[0042] Here $L_{\mathrm{MSE}}$ is the mean squared error loss, which preserves the statistical integrity and global contrast of the image by minimizing the pixel-level Euclidean distance between the predicted image and the ground truth. $L_{\mathrm{LPIPS}}$ is the perceptual loss, which preserves the topological structure, texture features, and sharp edges of the image and significantly reduces blur in the generated image. $\lambda_{1}$ and $\lambda_{2}$ are the loss weights. Raising the LPIPS weight $\lambda_{2}$ to 0.6 gives the network a stronger "detail correction" capability, driving it to pay more attention to sharpening key details during training. Lowering the MSE weight $\lambda_{1}$ to 0.4 still ensures that the reconstructed image is consistent with the original image in macroscopic energy distribution and global brightness.
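The weighting of the two pre-training terms can be sketched as below. A real LPIPS score requires a pretrained feature network, so `perceptual_fn` is a caller-supplied stand-in and the function names are hypothetical; only the 0.4 and 0.6 weights come from the text.

```python
def mse(pred, target):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pretrain_loss(pred, target, perceptual_fn, w_mse=0.4, w_lpips=0.6):
    """0.4 * MSE + 0.6 * perceptual term; `perceptual_fn` stands in for an LPIPS model."""
    return w_mse * mse(pred, target) + w_lpips * perceptual_fn(pred, target)


if __name__ == "__main__":
    # Placeholder perceptual distance: mean absolute difference (not LPIPS).
    fake_lpips = lambda p, t: sum(abs(a - b) for a, b in zip(p, t)) / len(p)
    print(pretrain_loss([0.2, 0.8], [0.0, 1.0], fake_lpips))
```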
[0043] Step 4: Construct an improved cycle-consistent generative adversarial network (CycleGAN) model and train it using the training dataset constructed in Step 2. The model includes the reconstruction generator pre-trained in Step 3, a degradation generator, a sharp image discriminator, and a blurry image discriminator. The trained reconstruction generator is used to reconstruct blurry images and output high-quality sharp images.
[0044] The reconstruction generator maps a blurred image to the sharp domain of the ground-truth images: given a blurred image as input, it outputs the reconstructed image.
[0045] The degradation generator simulates the physical imaging process, mapping sharp images back to the DOE blur domain. It serves as a learnable physical diffraction model and employs a lightweight U-Net architecture to simulate the energy propagation and frequency modulation characteristics of the optical system.
[0046] The degradation generator includes an encoding path, a skip connection layer, and a decoding path. The encoding path includes three cascaded CBR modules and downsamples step by step through convolutional layers with a stride of 2 to obtain multi-scale encoded features.
[0047] A channel attention module (SELayer module) is introduced at the skip connection of U-Net. The encoded features of each level are input into the SELayer module, and spatial information is compressed by global average pooling. The dependencies between different channels are learned, channel attention weights are generated, and the features are weighted to obtain spectral weighted encoded features.
[0048] The decoding path progressively upsamples the deepest encoded features through bilinear interpolation. After each upsampling, the features are concatenated with the corresponding spectral weighted encoded features along the channel dimension, and the energy propagation and frequency modulation of the optical system are simulated by the CBR module.
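The SELayer used at the skip connections follows the squeeze-and-excitation pattern, which can be sketched on nested lists as below. The two weight matrices and the bottleneck size are hypothetical, and biases are omitted for brevity.

```python
import math


def se_layer(feat, w_down, w_up):
    """feat: C x H x W nested lists; w_down (R x C) and w_up (C x R) are the two FC layers."""
    # Squeeze: global average pooling collapses each channel to one number.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid to get one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # Scale: reweight every channel by its gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feat)]
```

When the feature channels correspond to colour or spectral responses, these per-channel gates are what the text calls spectral weighting.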
[0049] Both the sharp image discriminator and the blurry image discriminator adopt the PatchGAN architecture, which maps the input image to an N×N discriminant matrix in which each element represents the real/fake discrimination result for a specific receptive field in the original image.
[0050] The sharp image discriminator and the blurry image discriminator are each composed of multiple stacked convolutional blocks. Each convolutional block contains a convolutional layer with a stride of 2, instance normalization, and the LeakyReLU activation function.
[0051] The image passes through the convolutional blocks in sequence and is downsampled step by step by the stride-2 convolutional layers to extract multi-scale discriminative features. The features of the last layer are mapped to an N×N discriminant matrix. Each element of the matrix is the real/fake discrimination result for a fixed receptive field region of the input image: the closer the value is to 1, the more likely the region comes from a real image; the closer it is to 0, the more likely it comes from a generated image. The sharp image discriminator distinguishes between ground-truth (GT) sharp images and the sharp images output by the reconstruction generator, and the blurry image discriminator distinguishes between the blurry images captured through the DOE and the blurry images output by the degradation generator.
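The size N of the discriminant matrix and the receptive field of each element follow from the layer geometry. The sketch below applies the standard convolution size and receptive-field formulas; the kernel size of 4, the padding of 1, and the number of blocks are assumptions, since the text specifies only the stride.

```python
def patchgan_geometry(input_size, layers):
    """layers: list of (kernel, stride, padding). Returns (output size N, receptive field)."""
    size, rf, jump = input_size, 1, 1
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        rf += (kernel - 1) * jump      # growth scales with the accumulated stride
        jump *= stride
    return size, rf


if __name__ == "__main__":
    # Four stride-2 blocks on a 512 x 512 input (assumed kernel 4, padding 1).
    print(patchgan_geometry(512, [(4, 2, 1)] * 4))
```

With these assumed settings each of the 32×32 outputs judges a 46×46 pixel patch of the input, so the discriminator penalizes local texture errors rather than producing a single global score.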
[0052] In a further embodiment, the cycle-consistent generative adversarial network model is trained in an unsupervised manner, fine-tuning the pre-trained weights to enhance the network's generalization ability. The total loss function is expressed as:
[0053]
[0054] In the formula, the adversarial loss uses the least-squares generative adversarial loss to make the distribution of generated images approximate that of real images, thereby improving the realism of texture details. The cycle-consistency loss ensures the reversibility of the mapping: an image should return to its original state after a closed loop of "reconstruction-degradation" or "degradation-reconstruction". The identity loss constrains the feature representation to avoid excessive distortion, keeping the output similar to its ground truth. Specifically:
[0055]
[0056]
[0057]
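The three loss terms can be sketched on flat pixel lists using the standard CycleGAN formulation. Because the patent's own formulas are missing here, the exact form, the names `G` (reconstruction generator) and `F` (degradation generator), and the weights `w_cyc` and `w_id` are assumptions borrowed from common practice.

```python
def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1 and fake scores to 0."""
    real = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real + fake)


def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push the discriminator's fake scores toward 1."""
    return sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)


def l1(a, b):
    """Mean absolute difference between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_objective(x, y, G, F, d_sharp, d_blur, w_cyc=10.0, w_id=5.0):
    """x: blurry image, y: unpaired sharp image; weights are assumed defaults."""
    adv = lsgan_g_loss(d_sharp(G(x))) + lsgan_g_loss(d_blur(F(y)))
    cyc = l1(F(G(x)), x) + l1(G(F(y)), y)   # reconstruction-degradation and back
    idt = l1(G(y), y) + l1(F(x), x)         # a generator fed its target domain
    return adv + w_cyc * cyc + w_id * idt
```

Note that `x` and `y` do not need to show the same scene, which is what removes the need for strictly paired captures.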
[0058] The specific training process is shown in Figure 4.
[0059] Step 5: Input blurry images from the validation set obtained by screen capture into the trained generative adversarial network model to obtain sharp images for validation. Compare the results with those of the pre-trained model, using the PSNR and SSIM metrics to evaluate network performance. In addition, reconstruct FDL captures of real target scenes to verify the network's performance on real targets.
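Of the two metrics, PSNR is simple enough to sketch directly; SSIM involves local windowed statistics and is usually taken from an image-processing library. The function below assumes 8-bit pixel values flattened into lists.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat pixel lists."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, reconstructed)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(round(psnr([100, 100], [110, 90]), 2))
```

A higher PSNR means the reconstruction is closer to the ground truth pixel by pixel, so comparing the value before and after fine-tuning gives a direct measure of the gain from the second training stage.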
[0060] Finally, the network weights mapped from the blurred FDL capture map to the sharp image are obtained.
[0061] This invention proposes a deep learning-based planar diffraction lens image reconstruction method, which is a deep learning image reconstruction method with generalization ability and adaptability to real-world scenarios.
Claims
1. A planar diffractive lens image reconstruction method based on deep learning, characterized in that it comprises the following steps: Step 1: the microstructure height distribution of the planar diffractive lens is discretized and iteratively optimized using a direct binary search algorithm to design a diffractive optical element suitable for imaging; Step 2: an imaging system is constructed using the diffractive optical element designed in Step 1, blurred image data are acquired, and a training dataset is built; Step 3: a reconstruction generator is constructed based on the U-Net architecture and pre-trained, wherein the reconstruction generator introduces a convolutional block attention module that enhances the extraction of high-frequency texture features by applying channel attention and spatial attention in series, a residual module is introduced in the encoder, and an improved color attention mechanism is introduced in the skip connections to adaptively correct color deviation; Step 4: an improved cycle-consistent generative adversarial network (CycleGAN) model is constructed and trained, wherein the model includes the reconstruction generator pre-trained in Step 3, a degradation generator, a sharp image discriminator, and a blurry image discriminator, and the trained reconstruction generator is used to reconstruct blurry images and output high-quality sharp images.
2. The deep learning-based planar diffractive lens image reconstruction method according to claim 1, characterized in that the diffractive optical element suitable for imaging is designed by discretizing the microstructure height distribution of the planar diffractive lens and iteratively optimizing it with the direct binary search algorithm, as follows: Step 1.1: determine the substrate material, use the Cauchy dispersion formula to describe how its refractive index varies with wavelength, and calculate the phase delay introduced by the i-th ring at each wavelength, where the height of the i-th ring takes discretized values within a range bounded by the maximum ring height, and the refractive index is that of the material; Step 1.2: construct an evaluation function that includes an energy concentration term and a morphology fitting term, defined over the wavelength set from the ideal light intensity distribution and the normalized light intensity; Step 1.3: use the direct binary search algorithm to find the optimal height sequence, specifically: the initial height distribution of the substrate material is loaded, and the initial light field and evaluation function value are calculated, the initial light field being the point spread function calculated from the initial height distribution of the DOE; the height of each ring of the FDL is perturbed one by one, i.e., increased or decreased by one quantization step; the evaluation function value after perturbation is calculated, and the system state is updated according to the change in the evaluation function value, updating the ring height distribution and the current evaluation function value and completing one round of increase and decrease attempts for the ring; for each ring height, an increase is tried first, then a decrease; if neither attempt improves the evaluation function value, the ring is judged to have entered the convergence state; in subsequent iterations, rings in the convergence state are skipped automatically and only the remaining rings are calculated.
3. The planar diffractive lens image reconstruction method based on deep learning according to claim 1, characterized in that an imaging system is constructed using the optimized diffractive optical element and a sensor; color scenes are displayed sequentially on an LED display; the corresponding captured images are obtained by photographing through the imaging system; and a dataset is then formed through calibration, cropping, and scaling operations.
4. The deep learning-based planar diffractive lens image reconstruction method according to claim 3, characterized in that the imaging system includes a diffractive optical element, an image acquisition device, an optical displacement stage, and an LED display; the diffractive optical element is positioned between the target image to be captured and the image acquisition device; and the center points of the LED display, the diffractive optical element, and the image acquisition device lie on the same horizontal optical axis.
5. The planar diffractive lens image reconstruction method based on deep learning according to claim 1, characterized in that the reconstruction generator includes an encoder and a decoder. The encoder includes five cascaded residual blocks; as the layers deepen, the feature map size is gradually halved while the number of channels doubles. Each residual block contains two 3×3 convolutional layers and a shortcut path: the input feature map of the residual block is adjusted in dimension by a 1×1 convolutional layer, added element-wise to the output feature map of the second 3×3 convolutional layer, and then passed through an activation function. The decoder consists of four decoding blocks; each decoding block first upsamples by a factor of two through transposed convolution, then concatenates the result with the feature map from the encoder, and finally fuses them through two standard convolutional layers. Before the skip connection features output by each level of the encoder are passed to the decoder, they are processed by the convolutional block attention module: the module first reweights the channel-dimension features of the feature map through the channel attention submodule, and then reweights the spatial-dimension features through the spatial attention submodule; the processed skip connection features are then concatenated along the channel dimension with the feature map upsampled by the decoder. Specifically, channel attention takes the input intermediate feature map and first aggregates spatial information in parallel through global average pooling and global max pooling; a learnable feature correction convolutional layer and a residual connection are then applied to dynamically learn and correct the nonlinear deviation of color features, yielding the channel-weighted feature map. Spatial attention is applied to the feature map after channel attention, performing average pooling and max pooling along the channel dimension to compress channel information and highlight salient regions in the image, yielding the spatially weighted feature map. Channel attention is applied first and spatial attention second. In the corresponding formulas, a convolutional network captures the nonlinear dependencies between channels; the Sigmoid activation function generates normalized channel weights; global average pooling and global max pooling are applied to the intermediate input feature map F; a convolutional layer with a kernel size of k×k is used; the feature correction module consists of convolutional layers; and concatenation is performed along the channel dimension.
6. The planar diffractive lens image reconstruction method based on deep learning according to claim 1, characterized in that the overall loss function L used in the pre-training of the reconstruction generator is $L = \lambda_{1} L_{\mathrm{MSE}} + \lambda_{2} L_{\mathrm{LPIPS}}$, where $L_{\mathrm{MSE}}$ is the mean squared error loss, $L_{\mathrm{LPIPS}}$ is the perceptual loss, and $\lambda_{1}$ and $\lambda_{2}$ are the weights of the respective losses.
7. The planar diffractive lens image reconstruction method based on deep learning according to claim 1, characterized in that the reconstruction generator maps a blurred image to the sharp domain to generate a sharp image; the degradation generator simulates the physical imaging process, mapping the sharp image generated by the reconstruction generator back to the blurry domain to generate a blurred image; both the sharp image discriminator and the blurry image discriminator adopt the PatchGAN architecture, which maps the input image to an N×N discriminant matrix in which each element represents the real/fake discrimination result for a specific receptive field in the original image; the sharp image discriminator distinguishes between the ground-truth image and the sharp image output by the reconstruction generator, and the blurry image discriminator distinguishes between the blurry image input to the reconstruction generator and the blurry image output by the degradation generator.
8. The planar diffractive lens image reconstruction method based on deep learning according to claim 7, characterized in that the degradation generator includes an encoding path, a skip connection layer, and a decoding path; the encoding path includes three cascaded CBR modules and downsamples step by step through convolutional layers with a stride of 2 to obtain multi-scale encoded features; a channel attention module (SELayer module) is introduced at the skip connections of the U-Net: the encoded features of each level are input into the SELayer module, spatial information is compressed by global average pooling, the dependencies between channels are learned, and channel attention weights are generated and applied to the features to obtain spectrally weighted encoded features; the decoding path progressively upsamples the deepest encoded features through bilinear interpolation, and after each upsampling the features are concatenated along the channel dimension with the corresponding spectrally weighted encoded features, the energy propagation and frequency modulation of the optical system being simulated by the CBR module.
9. The planar diffractive lens image reconstruction method based on deep learning according to claim 1, characterized in that the cycle-consistent generative adversarial network (CycleGAN) model is trained in an unsupervised manner using a total loss function composed of an adversarial loss, a cycle consistency loss, and an identity loss. In the corresponding formulas, the symbols denote the sharp-domain discriminator, the real GT sharp image, the expectation over samples, the degradation generator, the reconstruction generator, and the L1 norm.