A method, apparatus, device, and storage medium for single diffraction intensity image reconstruction based on self-supervised deep learning.

The single diffraction intensity image reconstruction method based on self-supervised learning utilizes complementary masks and a dual-channel deep neural network to construct a self-supervised training loss function, which solves the problem of limited generalization ability of deep learning models in hologram reconstruction and achieves efficient and robust hologram reconstruction.

CN119107241B · Active · Publication Date: 2026-06-30 · ANHUI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ANHUI UNIV
Filing Date
2024-09-02
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing deep learning models rely on supervised learning in hologram reconstruction, require a large amount of labeled data, have limited generalization ability, are sensitive to noise, and cannot effectively handle complex scenes and different types of holograms.

Method used

A single diffraction intensity image reconstruction method based on self-supervised learning is adopted. A dataset is constructed by complementary mask downsampling, and the original scene image is reconstructed by using a dual-channel deep neural network and a self-supervised training loss function. This includes the deep neural network architecture of the SSDL-CS framework and the self-supervised training process.

Benefits of technology

This method enables efficient reconstruction of the phase and amplitude of different types of images from a single hologram, improving the model's generalization ability and robustness, reducing dependence on labeled data, and enhancing the quality and speed of hologram reconstruction.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to a method, apparatus, device, and storage medium for reconstructing single diffraction intensity images based on self-supervised learning. The method includes the following steps: acquiring a single diffraction measurement image; extracting complementary measurement values from the single diffraction measurement image using complementary mask downsampling; constructing a dataset for self-supervised training; reconstructing different estimates of the original scene image using a dual-channel deep neural network based on the complementary measurement values in the training set; constructing a loss function for self-supervised training; minimizing the loss function using an optimizer; performing self-supervised training of the deep neural network using the loss function; and estimating the single diffraction intensity image scene using the trained deep neural network based on the complementary measurement values in the test set. This invention can reconstruct the phase and amplitude of different types of images.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and specifically to a method, apparatus, device, and storage medium for single diffraction intensity image reconstruction based on self-supervised learning.

Background Technology

[0002] In recent years, the development of deep learning has transformed fields such as computational imaging, microscopy, and holographic imaging. Deep learning can not only optimize image reconstruction algorithms by learning from massive amounts of data, thereby improving the quality and resolution of holographic images and making imaging results clearer and more accurate, but it can also optimize reconstruction algorithms to achieve faster image reconstruction and processing, thus meeting the requirements of real-time imaging and processing. The introduction of deep learning technology into holographic imaging can enhance the performance and functionality of imaging systems, improve image quality and accuracy, and expand the application and development prospects of holographic imaging technology in various fields.

[0003] In 2018, Rivenson et al. proposed a new framework for holographic image reconstruction based on deep learning, Holographic Imaging using Deep learning for Extended Focus (HIDEF), in their paper "Phase recovery and holographic image reconstruction using deep learning in neural networks." Compared with contemporaneous holographic phase retrieval methods, it can quickly eliminate twin-image and self-interference artifacts using only a single holographic intensity map, improving both the quality and speed of image reconstruction. However, the performance of this deep learning model depends heavily on the quantity and quality of the training data. It may perform well on specific datasets but poorly on other types or conditions of holographic images, indicating limited generalization ability. Transferring from one type of holographic image to another may be difficult, requiring additional tuning and retraining. Furthermore, this model is sensitive to noise and distortion in the input data, and its stability and robustness may be insufficient when facing different types of interference and noise.

[0004] In 2022, Chen et al. proposed an end-to-end deep neural network called the Fourier Imager Network (FIN) in their paper "Fourier Imager Network (FIN): A deep neural network for hologram reconstruction with superior external generalization." Compared with existing deep learning models based on multi-height phase retrieval (MH-PR), FIN exhibits unprecedented generalization performance and much faster inference. However, the above deep learning-based methods all rely on supervised learning, requiring large-scale, high-quality, and diverse training datasets. Acquiring, aligning, and preprocessing these datasets for training takes significant labor, time, and cost, and the resulting models may still suffer from inference errors and limited generalization to new types of objects never seen during training.

[0005] Self-supervised learning has many advantages over supervised learning: (1) it does not require a large amount of labeled data, reducing data requirements and labor costs; (2) it does not require the ground-truth values of the image; (3) it can learn useful representations from the data itself, improving the adaptability and generalization of the algorithm; and (4) it can learn effectively when the amount of data is insufficient, and has good scalability. Because of these advantages, self-supervised deep learning addresses the shortcomings of traditional hologram reconstruction, such as insufficient resolution, slow data processing speed, weak ability to handle complex scenes, and high imaging cost. Therefore, it is useful and necessary to introduce self-supervised deep learning into hologram reconstruction.

[0006] In 2023, Huang et al. proposed a self-supervised learning method for hologram reconstruction based on physical consistency. The method introduced a self-supervised learning model called GedankenNet, which eliminates the need for labeled or experimental training data. Without prior knowledge of the relevant samples, it uses a physical consistency loss and artificially generated random images to train the self-supervised learning model, demonstrating effectiveness and superior generalization on hologram reconstruction tasks. However, it has several drawbacks: first, it requires two or more input holograms, which constrains the input distances; second, it does not consider the case where the training images are corrupted; and finally, the construction of the loss function still has room for optimization.

[0007] Therefore, there is a need to implement a method, apparatus, device, and storage medium for single diffraction intensity image reconstruction based on self-supervised learning.

Summary of the Invention

[0008] To address the shortcomings of existing technologies, the present invention aims to provide a method, apparatus, device, and storage medium for single diffraction intensity image reconstruction based on self-supervised learning.

[0009] To achieve the above objectives, the present invention adopts the following technical solution:

[0010] In a first aspect of the present invention, a method for reconstructing a single diffraction intensity image based on self-supervised learning is disclosed, the method comprising the following steps:

[0011] S1. Obtain a single diffraction measurement pattern, and extract complementary measurement values from the single diffraction measurement pattern using complementary mask downsampling to construct a dataset for self-supervised training; the dataset includes a training set and a test set.

[0012] S2. Based on complementary measurements in the training set, different estimates of the original scene image are reconstructed using a dual-channel deep neural network.

[0013] S3. Construct a loss function for self-supervised training, minimize the loss function using an optimizer, and use the loss function to perform self-supervised training on the deep neural network.

[0014] S4. Based on complementary measurements in the test set, estimate the single diffraction intensity image scene using a trained deep neural network.

[0015] The deep neural network is a deep neural network based on the SSDL-CS framework, and the construction process of the deep neural network based on the SSDL-CS framework is as follows:

[0016] S211. Determine the architecture of the SSDL-CS framework;

[0017] The SSDL-CS framework has one 1 × 1 convolutional layer at the head and another at the tail, with several SPAF groups and a large-scale residual connection in between. Each SPAF group contains two recursive SPAF modules, which share the same parameters. Each SPAF group has a short skip connection, forming a medium-scale residual connection. There is a small-scale residual connection between the input and output of each SPAF module.

[0018] S212. Use two-dimensional discrete Fourier transform to transform the tensor of the SPAF group to the frequency domain, and use formula (7) to perform a linear transformation on the transformed data in the frequency domain, using a window of size k / 2 to truncate the high-frequency signal:

[0019] $$F'_{j,u,v} = \sum_{i=1}^{c} W_{i,j,u,v}\, F_{i,u,v}, \quad j = 1, \dots, c \tag{7}$$

[0020] In formula (7), $F'$ is the processed frequency-domain data, obtained as the sum of the frequency-domain data $F_i$ of the different channels weighted by the weight matrix $W$; $F$ is the truncated frequency-domain representation of the SPAF module input after the two-dimensional discrete Fourier transform; $W$ denotes the trainable weights; $u$ and $v$ index the frequency components inside the truncation window; $c$ is the number of channels; and the window size is k / 2.

[0021] S213. Use the two-dimensional inverse discrete Fourier transform to obtain the processed data in the spatial domain, and apply the parametric rectified linear unit (PReLU) activation function shown in formula (8):

[0022] $$\mathrm{PReLU}(x) = \begin{cases} x, & x \ge 0 \\ a\,x, & \text{otherwise} \end{cases} \tag{8}$$

[0023] In formula (8), $x$ is the input value of the activation function and $a$ is a learnable parameter;

[0024] S214. Optimize the linear transformation using formula (9):

[0025] (9)

[0026] In formula (9), It is the truncated frequency component. These are trainable weights.

[0027] According to a preferred embodiment of the present invention, step S1, which involves obtaining a single diffraction measurement pattern, extracting complementary measurement values ​​from the single diffraction measurement pattern using complementary mask downsampling, and constructing a dataset for self-supervised training, includes:

[0028] S11. Using formula (1), express the light field $O(x, y)$ on the z = 0 plane as a complex-valued function:

[0029] $$O(x, y) = A(x, y)\, e^{j\varphi(x, y)} \tag{1}$$

[0030] In formula (1), $O(x, y)$ denotes the light field distribution at the point $(x, y)$; it is a complex function that carries the amplitude and phase information of the light wave in the plane. $A(x, y)$ is the amplitude distribution of the light wave in the (x, y) plane; it is a real-valued function and is usually used to describe the brightness distribution of the light wave. $e^{j\varphi(x, y)}$ is a complex exponential function representing the phase information of the light wave in the (x, y) plane, where j is the imaginary unit. The phase distribution $\varphi(x, y)$ is a real-valued function that describes the phase change of the light wave.

[0031] S12. Based on the complex-valued light field $O$, use formula (2) to obtain the field $U(x, y; z)$ formed on a plane at a distance $z$ from the object after the object is illuminated by an incident coherent light wave $U_{in}$ of wavelength $\lambda$:

[0032] $$U(x, y; z) = \mathcal{F}^{-1}\left\{ H(f_x, f_y; \lambda, z)\, \mathcal{F}\left\{ U_{in}(x, y; 0)\, O(x, y; 0) \right\} \right\} \tag{2}$$

[0033] In formula (2), $U(x, y; z)$ denotes the light field distribution at distance $z$, that is, the light field after propagating over the distance $z$; $\mathcal{F}^{-1}$ is the inverse Fourier transform operator, which transforms a function in the frequency domain back to the spatial domain; $H(f_x, f_y; \lambda, z)$ is the propagation function, also known as the propagation phase factor, a function in the frequency domain that describes the phase change of the light wave during propagation; $f_x$ and $f_y$ are the frequency variables; $\lambda$ is the wavelength; $z$ is the propagation distance; $\mathcal{F}$ is the Fourier transform operator, which transforms a function in the spatial domain to the frequency domain; $U_{in}(x, y; 0)$ is the distribution of the incident light field at z = 0, carrying the amplitude and phase information of the light wave on the initial plane; and $O(x, y; 0)$ is the distribution of the object's light field at z = 0, carrying the amplitude and phase information of the light wave on the object plane;

[0034] where

[0035] $$H(f_x, f_y; \lambda, z) = \exp\left( j\, \frac{2\pi z}{\lambda} \sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2} \right) \tag{3}$$

[0036] In formula (3), $H(f_x, f_y; \lambda, z)$ is the propagation function, which describes the phase change of the light wave during propagation; $2\pi z / \lambda$ is the first part of the propagation phase factor, where $z$ is the propagation distance travelled by the light wave and $\lambda$ is the wavelength of the light; $\sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2}$ is the latter part of the propagation phase factor; $\lambda f_x$ is the contribution of the frequency component in the x direction, and $\lambda f_y$ is the contribution of the frequency component in the y direction;

[0037] S13. Use the detector to record the diffraction image of the object, and use formula (4) to compute the intensity of the diffraction image, giving a single diffraction measurement image:

[0038] $$I_0 = I(x, y; z) = \left| U(x, y; z) \right|^2 = \left| P_{\lambda, z}(O) \right|^2 \tag{4}$$

[0039] In formula (4), $I(x, y; z)$ denotes the light intensity distribution at the point $(x, y)$ on the plane at distance $z$; light intensity is the square of the light field amplitude and represents the energy density of the light wave in the plane. $\left| U(x, y; z) \right|^2$ is the squared modulus of the light field $U(x, y; z)$ on that plane. $P_{\lambda, z}$ is the propagation operator determined by the wavelength $\lambda$ and the transmission distance $z$; $O$ is the initial light field before propagation; and $I_0$ is the hologram captured by the detector;

[0040] S14. Use a pair of complementary masks to sample the single diffraction measurement pattern, i.e., the hologram $I_0$, to obtain complementary measurements;

[0041] S15. Based on complementary measurements, construct a dataset for self-supervised training, the dataset including a training set and a test set.
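Steps S11 to S13 can be illustrated numerically. The following NumPy sketch assumes unit-amplitude plane-wave illumination, a square pixel pitch `pixel_size`, and the standard angular spectrum transfer function with evanescent components discarded. The function names, parameter names, and example values are hypothetical and are not taken from the patent.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, z, pixel_size):
    """Propagate a complex 2-D field over a distance z (angular spectrum method)."""
    ny, nx = field.shape
    # Spatial frequency grids in cycles per unit length
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)
    # Term under the square root of the transfer function
    arg = 1.0 - (wavelength * fxx) ** 2 - (wavelength * fyy) ** 2
    # Assumption: evanescent components (arg below zero) are simply discarded
    phase = 2.0 * np.pi * z / wavelength * np.sqrt(np.clip(arg, 0.0, None))
    transfer = np.where(arg >= 0.0, np.exp(1j * phase), 0.0)
    return np.fft.ifft2(transfer * np.fft.fft2(field))


def hologram_intensity(obj, wavelength, z, pixel_size):
    """Intensity recorded at the detector for unit-amplitude illumination."""
    return np.abs(angular_spectrum_propagate(obj, wavelength, z, pixel_size)) ** 2


# Hypothetical example values (not taken from the patent)
rng = np.random.default_rng(0)
amplitude = 1.0 - 0.2 * rng.random((64, 64))
phase = 0.5 * rng.random((64, 64))
obj = amplitude * np.exp(1j * phase)  # object field: amplitude times exp(j * phase)
i0 = hologram_intensity(obj, wavelength=532e-9, z=300e-6, pixel_size=2e-6)
print(i0.shape)
```

With these example values every sampled frequency is a propagating one, so the transfer function has unit modulus and the total energy of the field is preserved by the propagation.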

[0042] According to a preferred embodiment of the invention, the pair of complementary masks includes a sampling mask $M_1$ and a sampling mask $M_2$;

[0043] The holographic image acquired by the detector is sampled using two sampling branches: one branch performs downsampling with the sampling mask $M_1$ to obtain the measured value $y_1$, and the other branch performs downsampling with the sampling mask $M_2$, which is complementary to $M_1$, to obtain the measured value $y_2$;

[0044] A pair of complementary masks satisfies the following condition: the intersection of $M_1$ and $M_2$ is the empty set $\varnothing$, and the union of $M_1$ and $M_2$ is the universal set.
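The complementarity condition (empty intersection, union equal to the full pixel grid) can be checked with a small NumPy sketch. The patent text does not specify the mask pattern, so the random pixel split used here is an assumption, and all names are hypothetical.

```python
import numpy as np


def make_complementary_masks(shape, seed=0):
    """Return two boolean masks that are disjoint and jointly cover every pixel."""
    rng = np.random.default_rng(seed)
    # Assumption: each pixel is assigned to one of the two masks at random
    mask_a = rng.integers(0, 2, size=shape).astype(bool)
    mask_b = ~mask_a  # complement of the first mask
    return mask_a, mask_b


def sample_with_mask(hologram, mask):
    """Keep the pixels selected by the mask and zero out the rest."""
    return np.where(mask, hologram, 0.0)


hologram = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a recorded hologram
mask_a, mask_b = make_complementary_masks(hologram.shape)
y_a = sample_with_mask(hologram, mask_a)
y_b = sample_with_mask(hologram, mask_b)

print(np.any(mask_a & mask_b))  # whether the two masks overlap anywhere
print(np.all(mask_a | mask_b))  # whether the two masks together cover the grid
```

Because the masks are exact complements, the two measurements share no pixel and their sum reproduces the full hologram.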

[0045] According to a preferred embodiment of the present invention, step S2, based on complementary measurements in the training set, reconstructs different estimates of the original scene image using a dual-channel deep neural network, including:

[0046] S21. Construct a deep neural network based on the SSDL-CS framework; the deep neural network includes two parallel SSDL-CS systems;

[0047] S22. Input the two complementary measurements from the training set into the two parallel SSDL-CS networks, and reconstruct the hologram using formula (5) to obtain different estimates of the original scene image:

[0048] $$\hat{O} = \arg\min_{O}\; L\big(\mathcal{H}(O), I_0\big) + R(O) \tag{5}$$

[0049] In formula (5), $\hat{O}$ is the optimal solution, i.e., the estimated light field obtained through the optimization process, representing the best light field distribution under the given conditions; $\arg\min_{O}$ denotes optimizing the light field $O$ to minimize the objective function; the loss function $L\big(\mathcal{H}(O), I_0\big)$ measures the difference between the processed light field $\mathcal{H}(O)$ and the hologram $I_0$ captured by the detector; $R(O)$ is a regularization term that imposes constraints or prior knowledge on the light field $O$ to avoid overfitting or to preserve specific properties, such as the smoothness, sparsity, or other desired properties of the light field.

[0050] For a thin sample under coherent illumination, $\mathcal{H}(O)$ simplifies to

[0051] $$\mathcal{H}(O) = f(A\,O) + \varepsilon \tag{6}$$

[0052] In formula (6), $A$ is the free-space transformation matrix, $O$ is the light field, $\varepsilon$ is random detection noise, and $f(\cdot)$ is the sampling function of the photoelectric sensor array that records the intensity of the light field.

[0053] According to a preferred embodiment of the present invention, in step S21, constructing a deep neural network based on the SSDL-CS framework includes:

[0054] S211. Determine the architecture of the SSDL-CS framework;

[0055] The SSDL-CS framework has one 1 × 1 convolutional layer at the head and another at the tail, with several SPAF groups and a large-scale residual connection in between. Each SPAF group contains two recursive SPAF modules, which share the same parameters. Each SPAF group has a short skip connection, forming a medium-scale residual connection. There is a small-scale residual connection between the input and output of each SPAF module.

[0056] S212. Use two-dimensional discrete Fourier transform to transform the tensor of the SPAF group to the frequency domain, and use formula (7) to perform a linear transformation on the transformed data in the frequency domain, using a window of size k / 2 to truncate the high-frequency signal:

[0057] $$F'_{j,u,v} = \sum_{i=1}^{c} W_{i,j,u,v}\, F_{i,u,v}, \quad j = 1, \dots, c \tag{7}$$

[0058] In formula (7), $F'$ is the processed frequency-domain data, obtained as the sum of the frequency-domain data $F_i$ of the different channels weighted by the weight matrix $W$. This is a typical linear transformation operation that uses a window of size k / 2 to truncate high-frequency signals, and is widely used in image processing, signal processing, and other fields. $F$ is the truncated frequency-domain representation of the SPAF module input after the two-dimensional discrete Fourier transform; $W$ denotes the trainable weights; $u$ and $v$ index the frequency components inside the truncation window; $c$ is the number of channels; and the window size is k / 2.

[0059] S213. Use the two-dimensional inverse discrete Fourier transform to obtain the processed data in the spatial domain, and apply the parametric rectified linear unit (PReLU) activation function shown in formula (8):

[0060] $$\mathrm{PReLU}(x) = \begin{cases} x, & x \ge 0 \\ a\,x, & \text{otherwise} \end{cases} \tag{8}$$

[0061] In formula (8), $x$ is the input value of the activation function and $a$ is a learnable parameter;

[0062] S214. Optimize the linear transformation using formula (9):

[0063] (9)

[0064] In formula (9), It is the truncated frequency component. These are trainable weights.

[0065] Step S211 defines the architecture of the SSDL-CS framework, providing the foundation for data flow and computation. Step S212 processes the SPAF group in the frequency domain, preparing for the next step of spatial domain processing. Step S213 transforms the frequency domain data back to the spatial domain and applies activation functions to achieve nonlinear transformation. Step S214 optimizes the linear transformation and improves the weights in the processing to further enhance system performance. These steps together constitute the complete processing flow of the SSDL-CS framework, from network architecture design to frequency domain processing, and then to spatial domain activation and optimization; each step supports the final optical field reconstruction or signal processing.
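The frequency-domain processing of steps S212 and S213 can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the patented module: the channel-mixing weights are shared across all kept frequencies (shape `(c, c)`), the truncation window is a centred square of half-width `k` (the text itself specifies a window of size k / 2), and the PReLU slope `a` is a single scalar. All names are hypothetical.

```python
import numpy as np


def prelu(x, a):
    """Parametric ReLU: identity for non-negative inputs, slope a otherwise."""
    return np.where(x >= 0, x, a * x)


def spaf_like_module(x, weights, a, k):
    """x: real array (c, h, w); weights: real (c, c); k: half-width of the kept window."""
    _, h, w = x.shape
    # 2-D DFT per channel, with the zero frequency shifted to the centre
    spectrum = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    # Keep a centred low-frequency window and zero everything outside it
    window = np.zeros((h, w), dtype=bool)
    cy, cx = h // 2, w // 2
    window[max(cy - k, 0):cy + k + 1, max(cx - k, 0):cx + k + 1] = True
    truncated = spectrum * window
    # Linear transformation across channels: out[j] = sum_i weights[i, j] * truncated[i]
    mixed = np.einsum("ij,iuv->juv", weights, truncated)
    # Back to the spatial domain; the real part is kept before the activation
    spatial = np.fft.ifft2(np.fft.ifftshift(mixed, axes=(-2, -1)), axes=(-2, -1)).real
    return prelu(spatial, a)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 9, 9))  # hypothetical 3-channel feature map
out = spaf_like_module(x, np.eye(3), a=0.25, k=2)
print(out.shape)
```

With identity weights and a window that covers the whole spectrum, the module reduces to PReLU applied to the input, which gives a simple sanity check; in a trained network the weights and `a` would be learned.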

[0066] According to a preferred embodiment of the present invention, step S3, which involves constructing a loss function for self-supervised training, minimizing the loss function using an optimizer, and performing self-supervised training of the deep neural network using the loss function, includes:

[0067] S31. Construct a loss function for self-supervised training and minimize the loss function using an optimizer;

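[0068] The loss function $Loss$ consists of three parts, $Loss_1$, $Loss_2$, and $Loss_{\text{Different-loss}}$, as shown in formula (11):

[0069] $$Loss = Loss_1 + Loss_2 + Loss_{\text{Different-loss}} \tag{11}$$

[0070] In formula (11), $Loss_1$ is the loss between the predicted value $\hat{I}_1$ and the hologram $I_0$; it is a weighted average of the mean squared error (MSE) loss, the Fourier-domain mean absolute error (FDMAE) loss, and the total variation loss:

[0071] $$Loss_1 = \alpha\, L_{FDMAE}(\hat{I}_1, I_0) + \beta\, L_{MSE}(\hat{I}_1, I_0) \tag{12}$$

[0072] In formula (12), $Loss_1$ is the loss between the predicted value $\hat{I}_1$ and the hologram $I_0$; $L_{FDMAE}(\hat{I}_1, I_0)$ is the mean absolute error loss in the Fourier domain between the predicted value $\hat{I}_1$ and the hologram $I_0$; $L_{MSE}(\hat{I}_1, I_0)$ is the mean squared error loss between the predicted value $\hat{I}_1$ and the hologram $I_0$; $\alpha$ and $\beta$ are the loss weight values;

[0073] $$L_{FDMAE}(\hat{I}_1, I_0) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathcal{F}(\hat{I}_1)_i - \mathcal{F}(I_0)_i \right| \tag{13}$$

[0074] In formula (13), $N$ is the total number of pixels, $\mathcal{F}(\hat{I}_1)$ denotes the Fourier transform of the predicted value $\hat{I}_1$, and $\mathcal{F}(I_0)$ denotes the Fourier transform of the hologram $I_0$;

[0075] $$L_{MSE}(\hat{I}_1, I_0) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{I}_{1,i} - I_{0,i} \right)^2 \tag{14}$$

[0076] In formula (14), $I_0$ is the hologram, $\hat{I}_1$ is the predicted value for the measured value $y_1$, and $N$ is the total number of pixels;

[0077] In formula (11), $Loss_2$ is the loss between the predicted value $\hat{I}_2$ and the hologram $I_0$, expressed as:

[0078] $$Loss_2 = \alpha\, L_{FDMAE}(\hat{I}_2, I_0) + \beta\, L_{MSE}(\hat{I}_2, I_0) \tag{15}$$

[0079] In formula (15), $Loss_2$ is the loss between the predicted value $\hat{I}_2$ and the hologram $I_0$; $L_{FDMAE}(\hat{I}_2, I_0)$ is the mean absolute error loss in the Fourier domain between the predicted value $\hat{I}_2$ and the hologram $I_0$; $L_{MSE}(\hat{I}_2, I_0)$ is the mean squared error loss between the predicted value $\hat{I}_2$ and the hologram $I_0$; $\alpha$ and $\beta$ are the loss weight values;

[0080] $$L_{FDMAE}(\hat{I}_2, I_0) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathcal{F}(\hat{I}_2)_i - \mathcal{F}(I_0)_i \right| \tag{16}$$

[0081] $$L_{MSE}(\hat{I}_2, I_0) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{I}_{2,i} - I_{0,i} \right)^2 \tag{17}$$

[0082] The loss between the predicted values is expressed as:

[0083] $$Loss_{\text{Different-loss}} = \alpha\, L_{FDMAE}(\hat{I}_1, \hat{I}_2) + \beta\, L_{MSE}(\hat{I}_1, \hat{I}_2) \tag{18}$$

[0084] In formula (18), $Loss_{\text{Different-loss}}$ is the loss between the predicted value $\hat{I}_1$ and the predicted value $\hat{I}_2$; $L_{FDMAE}(\hat{I}_1, \hat{I}_2)$ is the mean absolute error loss in the Fourier domain between the two predicted values; $L_{MSE}(\hat{I}_1, \hat{I}_2)$ is the mean squared error loss between the two predicted values; $\alpha$ and $\beta$ are the loss weight values. The two terms are calculated as follows:

[0085] $$L_{FDMAE}(\hat{I}_1, \hat{I}_2) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathcal{F}(\hat{I}_1)_i - \mathcal{F}(\hat{I}_2)_i \right| \tag{19}$$

[0086] $$L_{MSE}(\hat{I}_1, \hat{I}_2) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{I}_{1,i} - \hat{I}_{2,i} \right)^2 \tag{20}$$

[0087] In formulas (19) and (20), $N$ is the total number of pixels, $\mathcal{F}(\hat{I}_1)$ denotes the Fourier transform of the predicted value $\hat{I}_1$, and $\mathcal{F}(\hat{I}_2)$ denotes the Fourier transform of the predicted value $\hat{I}_2$;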

[0088] S32. Use the loss function to train the deep neural network using a self-supervised deep learning method.
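The structure of the self-supervised loss in step S31 can be sketched as follows. This is a minimal illustration under assumptions: the three parts are summed without extra weighting, both loss weights default to 1.0 (the text gives no values), and the total variation term mentioned in the text is omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def mse_loss(a, b):
    """Mean squared error over all pixels."""
    return float(np.mean((a - b) ** 2))


def fdmae_loss(a, b):
    """Mean absolute error between the 2-D Fourier transforms of a and b."""
    diff = np.fft.fft2(a) - np.fft.fft2(b)
    return float(np.mean(np.abs(diff)))


def pair_loss(a, b, w_fdmae=1.0, w_mse=1.0):
    """Weighted sum of the FDMAE and MSE terms (weights are assumed values)."""
    return w_fdmae * fdmae_loss(a, b) + w_mse * mse_loss(a, b)


def self_supervised_loss(pred_1, pred_2, hologram):
    """Two prediction-vs-hologram terms plus one prediction-vs-prediction term."""
    loss_1 = pair_loss(pred_1, hologram)
    loss_2 = pair_loss(pred_2, hologram)
    loss_diff = pair_loss(pred_1, pred_2)
    # Assumption: the three parts are combined by a plain sum
    return loss_1 + loss_2 + loss_diff


hologram = np.zeros((2, 2))
pred_1 = np.ones((2, 2))
pred_2 = np.ones((2, 2))
print(self_supervised_loss(pred_1, pred_2, hologram))
```

No ground-truth object appears anywhere in these terms: the only references are the recorded hologram and the second branch's prediction, which is what makes the training self-supervised.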

[0089] In a second aspect, a single diffraction intensity image reconstruction device based on self-supervised learning is disclosed. The device includes a measurement value extraction module, an original scene image reconstruction module, a deep neural network training module, and a single diffraction intensity image scene estimation module.

[0090] The measurement value extraction module is used to acquire a single diffraction measurement pattern, extract complementary measurement values from the single diffraction measurement pattern using complementary mask downsampling, and construct a dataset for self-supervised training; the dataset includes a training set and a test set.

[0091] The original scene image reconstruction module is used to reconstruct different estimates of the original scene image based on complementary measurements in the training set using a dual-channel deep neural network.

[0092] The deep neural network training module is used to construct a loss function for self-supervised training, minimize the loss function using an optimizer, and use the loss function to perform self-supervised training on the deep neural network.

[0093] The single diffraction intensity image scene estimation module is used to estimate the single diffraction intensity image scene based on complementary measurements in the test set and using a trained deep neural network.

[0094] The deep neural network is a deep neural network based on the SSDL-CS framework, and the construction process of the deep neural network based on the SSDL-CS framework is as follows:

[0095] S211. Determine the architecture of the SSDL-CS framework;

[0096] The SSDL-CS framework has one 1 × 1 convolutional layer at the head and another at the tail, with several SPAF groups and a large-scale residual connection in between. Each SPAF group contains two recursive SPAF modules, which share the same parameters. Each SPAF group has a short skip connection, forming a medium-scale residual connection. There is a small-scale residual connection between the input and output of each SPAF module.

[0097] S212. Use two-dimensional discrete Fourier transform to transform the tensor of the SPAF group to the frequency domain, and use formula (7) to perform a linear transformation on the transformed data in the frequency domain, using a window of size k / 2 to truncate the high-frequency signal:

[0098] $$F'_{j,u,v} = \sum_{i=1}^{c} W_{i,j,u,v}\, F_{i,u,v}, \quad j = 1, \dots, c \tag{7}$$

[0099] In formula (7), $F'$ is the processed frequency-domain data, obtained as the sum of the frequency-domain data $F_i$ of the different channels weighted by the weight matrix $W$; $F$ is the truncated frequency-domain representation of the SPAF module input after the two-dimensional discrete Fourier transform; $W$ denotes the trainable weights; $u$ and $v$ index the frequency components inside the truncation window; $c$ is the number of channels; and the window size is k / 2.

[0100] S213. Use the two-dimensional inverse discrete Fourier transform to obtain the processed data in the spatial domain, and apply the parametric rectified linear unit (PReLU) activation function shown in formula (8):

[0101] $$\mathrm{PReLU}(x) = \begin{cases} x, & x \ge 0 \\ a\,x, & \text{otherwise} \end{cases} \tag{8}$$

[0102] In formula (8), $x$ is the input value of the activation function and $a$ is a learnable parameter;

[0103] S214. Optimize the linear transformation using formula (9):

[0104] (9)

[0105] In formula (9), It is the truncated frequency component. These are trainable weights.

[0106] In a third aspect of the invention, an electronic device is disclosed, comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the above-described self-supervised learning-based single diffraction intensity image reconstruction method.

[0107] In a fourth aspect of the invention, a machine-readable storage medium is disclosed, which stores executable instructions that, when executed, cause the machine to perform the above-described self-supervised learning-based single diffraction intensity image reconstruction method.

[0108] Compared with the prior art, the advantages of the present invention are:

[0109] To address the aforementioned problems, this invention proposes a complementary-sampling self-supervised deep learning reconstruction framework based on a single diffraction intensity image. First, starting from a single diffraction measurement image, the framework uses complementary downsampling masks to extract a pair of complementary measurements and construct a dataset for self-supervised training. Using the single diffraction intensity image as the input to the deep neural network eliminates distance limitations. Second, starting from the complementary training dataset, a dual-channel deep neural network reconstructs different estimates of the original scene image from the complementary undersampled measurements. Next, a loss function for self-supervised training is constructed, comprising two main parts: the error between the estimated scene and the original scene, and the error between the two different estimated scenes. Introducing a novel differential loss between the predicted values effectively constrains the solution space and improves reconstruction performance. Finally, for neural network training, a complementary sampling mask method is proposed, which uses two non-overlapping masks to divide the measurements into two disjoint datasets as input. The network is trained on the loss between the input and the predicted values to optimize its reconstruction quality. The parameters of the deep neural network are trained by minimizing the loss function with an optimizer, and the trained network estimates the scene from the measurements. After self-supervised training, the complementary-sampling self-supervised deep learning reconstruction framework based on single diffraction intensity images can be successfully extended to most experimental holograms to reconstruct the phase and amplitude of different types of images.

Attached Figure Description

[0110] Figure 1 is a flowchart of the single diffraction intensity image reconstruction method based on self-supervised learning in this invention;

[0111] Figure 2 is a schematic diagram illustrating the principle of constructing a complementary self-supervised training dataset using a complementary downsampling mask in this invention;

[0112] Figure 3 is a schematic diagram of the single diffraction intensity image reconstruction method based on self-supervised learning in this invention;

[0113] Figure 4 is a schematic diagram of the deep neural network architecture in this invention.

Detailed Implementation

[0114] The present disclosure will be further described below with reference to the accompanying drawings and embodiments:

[0115] It should be noted that the following detailed descriptions are exemplary and intended to provide further explanation of this disclosure. Unless otherwise specified, all technical and scientific terms used in this invention have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

[0116] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this disclosure. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0117] Where there is no conflict, the embodiments and features described herein can be combined with each other.

[0118] Example 1

[0119] This embodiment provides a single diffraction intensity image reconstruction method based on self-supervised learning. As shown in Figure 1, the method includes the following steps:

[0120] S1. Starting from a single diffraction measurement pattern, a pair of complementary measurements are extracted using a complementary downsampling mask to construct a dataset for self-supervised training.

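[0121] Typically, the light field $O(x, y)$ on the $z = 0$ plane can be represented by a complex-valued function:

[0122] $$O(x, y) = A(x, y)\exp\left[j\varphi(x, y)\right]$$ (1)

[0123] In formula (1), $A(x, y)$ and $\varphi(x, y)$ are the amplitude and phase of the light field, and $j$ is the imaginary unit. Consider an incident coherent wavefront $U_{in}(x, y; 0)$ of wavelength $\lambda$. On the plane at a distance $z$ from the object, the light field can be written as $U(x, y; z)$:

[0124] $$U(x, y; z) = \mathcal{F}^{-1}\left\{ H(f_x, f_y; \lambda, z)\,\mathcal{F}\left\{ U_{in}(x, y; 0)\,O(x, y) \right\} \right\}$$ (2)

[0125] where $\mathcal{F}^{-1}$ and $\mathcal{F}$ are the inverse Fourier transform operator and the Fourier transform operator, and $H(f_x, f_y; \lambda, z)$ is the angular spectrum transfer function, which depends on the spatial frequency components $(f_x, f_y)$, the wavelength $\lambda$, and the propagation distance $z$. It can be expressed as:

[0126] $$H(f_x, f_y; \lambda, z) = \exp\left[ j\,\frac{2\pi z}{\lambda}\sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2} \right]$$ (3)

[0127] Now assume that the detector plane contains only the diffraction pattern of the object. With equation (2), the intensity of the measured image can then be written as:

[0128] $$I_0(x, y) = \left| U(x, y; z) \right|^2 = \left| \mathcal{P}_z\left\{ U_{in}(x, y; 0)\,O(x, y) \right\} \right|^2$$ (4)

[0129] where $\mathcal{P}_z\{\cdot\} = \mathcal{F}^{-1}\{ H \cdot \mathcal{F}\{\cdot\} \}$ is the propagation operator and $I_0$ is the hologram captured by the detector.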
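
The propagation and intensity steps described above can be sketched numerically as follows. This is a minimal NumPy sketch and not the implementation of this invention: the function names, the `pixel_size` sampling parameter, the unit-amplitude plane-wave illumination, and the choice to discard evanescent components are all assumptions made for illustration.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, z, pixel_size):
    """Propagate a complex 2-D field over a distance z (angular spectrum method)."""
    ny, nx = field.shape
    # Spatial frequency grids matching the FFT sample ordering.
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    phase = 2.0 * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0))
    # Assumption: evanescent components (arg < 0) are simply set to zero.
    H = np.where(arg >= 0, np.exp(1j * phase), 0)
    return np.fft.ifft2(H * np.fft.fft2(field))

def hologram_intensity(obj, wavelength, z, pixel_size):
    """Intensity recorded at distance z for a unit plane wave hitting obj."""
    return np.abs(angular_spectrum_propagate(obj, wavelength, z, pixel_size)) ** 2

# Hypothetical example: a phase-only object, 0.5 um wavelength, 2 um pixels.
rng = np.random.default_rng(0)
obj = np.exp(1j * rng.uniform(0.0, np.pi, (32, 32)))
I0 = hologram_intensity(obj, wavelength=0.5e-6, z=1e-3, pixel_size=2e-6)
```

Because the transfer function has unit modulus for all propagating frequencies, the total energy of the field is preserved in this sketch, which is a convenient sanity check for any implementation.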

[0130] Figure 2 is a schematic diagram illustrating how this invention constructs a complementary self-supervised training dataset using complementary downsampling masks. In Figure 2, $O$ is the two-dimensional light field, and $I_0$ is the hologram of the light field $O$ captured by the detector after angular spectrum propagation. Masks $M_1$ and $M_2$ form a pair of complementary masks: the measurement $y_1$ is obtained by sampling $I_0$ through mask $M_1$, and the measurement $y_2$ is obtained by sampling $I_0$ through mask $M_2$.
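
As an illustration of complementary sampling, the sketch below splits one hologram into two measurements with a pair of binary masks. The text does not state how the masks are generated, so the random Bernoulli pattern, the `keep_fraction` parameter, and the function name are assumptions.

```python
import numpy as np

def complementary_masks(shape, keep_fraction=0.5, seed=0):
    """Return two boolean masks that never overlap and together cover every pixel."""
    rng = np.random.default_rng(seed)
    m1 = rng.random(shape) < keep_fraction  # assumed: random pixel selection
    m2 = ~m1                                # the complement of m1
    return m1, m2

# Hypothetical 4 x 4 hologram used only to show the bookkeeping.
hologram = np.arange(16, dtype=float).reshape(4, 4)
m1, m2 = complementary_masks(hologram.shape)
y1 = hologram * m1  # measurement from the first branch
y2 = hologram * m2  # measurement from the second branch
```

Since every pixel is kept by exactly one mask, the two measurements sum back to the original hologram.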

[0131] S2. Starting from complementary training datasets, use a dual-channel deep neural network to reconstruct different estimates of the original scene image from complementary undersampled measurements.

[0132] Generally speaking, the hologram reconstruction task can be described as an inverse problem:

[0133] $$\hat{O} = \arg\min_{O}\; L\left(\mathcal{H}(O), I_0\right) + R(O)$$ (5)

[0134] In formula (5), $I_0$ is the vectorized hologram, $\mathcal{H}(\cdot)$ is the forward imaging model of the light field $O$, $L(\cdot)$ is the loss function, and $R(\cdot)$ is the regularization term. For thin samples under coherent illumination, $\mathcal{H}(\cdot)$ can be simplified to:

[0135] $$\mathcal{H}(O) = f(AO) + \epsilon$$ (6)

[0136] In formula (6), $A$ is the free-space transformation matrix, $\epsilon$ is the random detection noise, and $f(\cdot)$ is the sampling function of the (photoelectric) sensor array that records the intensity of the light field.

[0137] To address the aforementioned issues, this invention proposes the SSDL-CS framework. It constructs a training dataset from two complementary measurements, feeds them into two parallel SSDL-CS networks, and uses a loss function designed to respect physical constraints, so that a deep convolutional neural network can be trained with a self-supervised deep learning method.

[0138] The network used for reconstruction is SSDL-CS. It contains two 1 × 1 convolutional layers at the head and at the tail, respectively, with a series of Spatial Fourier Transform (SPAF) groups and a large-scale residual connection in between. The network architecture is shown in Figure 3. In Figure 3, $O$ is the two-dimensional light field, and $I_0$ is the hologram of $O$ captured by the detector after angular spectrum propagation. Masks $M_1$ and $M_2$ form a pair of complementary masks; the measurement $y_1$ is sampled through $M_1$ and the measurement $y_2$ is sampled through $M_2$. $\hat{O}_1$ is the network output for the measurement $y_1$, and $\hat{I}_1$ is the predicted value obtained from $\hat{O}_1$ through angular spectrum propagation. Likewise, $\hat{O}_2$ is the network output for the measurement $y_2$, and $\hat{I}_2$ is the predicted value obtained from $\hat{O}_2$ through angular spectrum propagation. ASP (Angular Spectrum Propagation) and ASM (Angular Spectrum Method) both denote the numerical method used to simulate the propagation of light waves in space. $Loss_1$ is the loss between the predicted value $\hat{I}_1$ and the hologram $I_0$, $Loss_2$ is the loss between the predicted value $\hat{I}_2$ and the hologram $I_0$, and $Loss_3$ is the loss between the two predicted values.

[0139] Each SPAF group contains two recursive SPAF modules that share the same parameters, which increases network capacity without significantly increasing network size. There is a short skip connection between SPAF groups, forming a mesoscale residual connection, and there is a small-scale residual connection between the input and output of each SPAF module. As shown in the schematic diagram of the SPAF module in Figure 3, the tensor is first transformed to the frequency domain using a two-dimensional discrete Fourier transform, a linear transformation is then applied to it, and the high-frequency signal is truncated using a window of size k / 2.

[0140] $$F'_{j,u,v} = \sum_{i=1}^{c} W_{i,j,u,v}\,F_{i,u,v}$$ (7)

[0141] where $F$ denotes the truncated frequency-domain components obtained from the SPAF module input after the two-dimensional discrete Fourier transform, $W$ denotes the trainable weights, $c$ is the number of channels, and the window size is k / 2. After this linear transformation, a two-dimensional inverse discrete Fourier transform is used to obtain the processed data in the spatial domain, and the Parametric Rectified Linear Unit (PReLU) activation function is then applied.

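[0142] $$\mathrm{PReLU}(x) = \begin{cases} x, & x \ge 0 \\ a\,x, & x < 0 \end{cases}$$ (8)

[0143] where $x$ is the input of the activation function and $a$ is a learnable parameter.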
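
The data flow of one SPAF module (Fourier transform, truncation to a low-frequency window, linear transformation, inverse transform, PReLU, residual connection) can be sketched as below. This is one plausible reading of the description and not the invention's code: the channel-mixing form of the linear transformation, the centered window, the zero-filling of discarded frequencies, the use of the real part after the inverse transform, and all names and shapes are assumptions.

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for x >= 0, slope a for x < 0."""
    return np.where(x >= 0, x, a * x)

def spaf_module(x, weights, a=0.25):
    """x: real array (c, h, w); weights: real array (c, c, kh, kw)."""
    c, h, w = x.shape
    kh, kw = weights.shape[2:]
    # 2-D DFT per channel, shifted so that low frequencies sit at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(x), axes=(-2, -1))
    top, left = h // 2 - kh // 2, w // 2 - kw // 2
    # Truncation: keep only the central (low-frequency) window.
    window = spectrum[:, top:top + kh, left:left + kw]
    # Assumed linear transformation: weighted sum over input channels i.
    mixed = np.einsum("ijuv,iuv->juv", weights, window)
    out_spec = np.zeros_like(spectrum)
    out_spec[:, top:top + kh, left:left + kw] = mixed
    # Back to the spatial domain; the real part is kept in this sketch.
    y = np.fft.ifft2(np.fft.ifftshift(out_spec, axes=(-2, -1))).real
    # Small-scale residual connection between module input and output.
    return x + prelu(y, a)
```

With identity weights and a window covering the whole spectrum, the frequency-domain path returns the input unchanged, so a non-negative input is simply doubled by the residual connection. That special case is a quick check that the transform pair and the windowing are wired correctly.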

[0144] PReLU (Parametric Rectified Linear Unit) is a commonly used activation function in deep learning. It is an improvement on ReLU (Rectified Linear Unit) that primarily addresses the "dead ReLU" problem that can occur during training, in which some neurons are never activated (their output is always 0). PReLU is widely used in deep neural networks, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). When the input x is greater than or equal to 0, the PReLU output is x itself. When the input is negative, the output is ax, where a is a learnable parameter, usually a positive number. PReLU therefore has a non-zero slope when the input x is negative.

[0145] The PReLU activation function used in this invention has the following characteristics:

[0146] (1) Flexibility: By introducing the learnable parameter a, the PReLU activation function allows different slopes on the negative half axis, thereby increasing the nonlinear expressive power of the model.

[0147] (2) Avoiding "dead neurons": Unlike the ReLU function, PReLU passes gradients in the negative region. This alleviates the "dead neuron" problem common with ReLU, in which the output is zero for negative inputs and the neuron therefore stops updating.

[0148] (3) Adaptability: The parameter a is learnable, so the network can adaptively adjust the slope of the negative region during training.
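
The difference between ReLU and PReLU for negative inputs can be made concrete with a few lines of NumPy. The helper names and the example slope a = 0.1 are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def prelu(x, a):
    return np.where(x >= 0, x, a * x)

def prelu_grad_x(x, a):
    """Derivative of PReLU with respect to its input."""
    return np.where(x >= 0, 1.0, a)

def prelu_grad_a(x):
    """Derivative of PReLU with respect to the learnable slope a."""
    return np.where(x >= 0, 0.0, x)

x = np.array([-2.0, 0.0, 3.0])
relu_out = relu(x)                # the negative input is clamped to zero
prelu_out = prelu(x, 0.1)         # the negative input is scaled by a
grad_x = prelu_grad_x(x, 0.1)     # non-zero gradient for the negative input
grad_a = prelu_grad_a(x)          # only negative inputs update a
```

The gradient with respect to the input is a rather than 0 on the negative side, which is why a unit that receives negative inputs can still be updated.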

[0149] To adapt the SPAF module to high-resolution image processing in deeper networks, the size of the weight matrix is reduced, which significantly reduces the model size. The optimized linear transformation is defined as:

[0150] (9)

[0151] In formula (9), the input is the truncated frequency component and the weights are trainable.

[0152] Specifically, the two SPAF modules in each SPAF group share hyperparameters, but a pyramidal setup is applied, which reduces the k-value in deeper SPAF modules. This pyramidal structure maps the high-frequency information of the holographic diffraction pattern to the low-frequency region of the first few layers and passes the low-frequency information to subsequent layers with a smaller window size, thereby making better use of multi-scale features while greatly reducing the model size and avoiding potential overfitting and generalization problems.

[0153] To obtain two complementary measurements for self-supervised deep learning, the hologram $I_0$ is downsampled using two complementary masks: one branch samples $I_0$ through the mask $M_1$ to obtain the measurement $y_1$, and the other branch samples $I_0$ through the complementary mask $M_2$ to obtain the measurement $y_2$. The two complementary measurements are fed into two parallel SSDL-CS networks for self-supervised deep learning training. During network optimization, the reconstruction loss is calculated from three terms: 1) the loss $Loss_1$ between the predicted value $\hat{I}_1$ and the hologram $I_0$; 2) the loss $Loss_2$ between the predicted value $\hat{I}_2$ and the hologram $I_0$; and 3) the loss $Loss_3$ between the two predicted values. In the final reconstruction stage, the hologram can be reconstructed in real time using the trained SSDL-CS.

[0154] Figure 4 shows the overall process of the complementary sampling self-supervised deep learning reconstruction framework for single diffraction intensity images proposed in this invention. In Figure 4, conv stands for convolution, SPAF stands for Spatial Fourier Transform, FT stands for Fourier Transform, and IFT stands for Inverse Fourier Transform; the residual connections and short skip connections are labeled as such. This invention proposes a novel network architecture, SSDL-CS, which stands for Self-Supervised Deep Learning Reconstruction Framework Using Single Diffraction Intensity Image and Complementary Sampling.

[0155] As shown in Figure 4, the entire architecture operates as follows:

[0156] (1) Data input: The input data first passes through two 1×1 convolutional layers in the head, the number of channels is adjusted and preliminary processing is performed.

[0157] (2) SPAF group processing: The data then passes through several SPAF groups, each containing two recursive SPAF modules. There are mesoscale residual connections between SPAF groups and small-scale residual connections within each module.

[0158] (3) Frequency domain processing: In step S212, the data is converted to the frequency domain using a two-dimensional discrete Fourier transform, and then linear transformation and high-frequency signal truncation are performed. In step S213, the data is converted back to the spatial domain using a two-dimensional inverse discrete Fourier transform, and the PReLU activation function is applied for nonlinear processing.

[0159] (4) Optimization and output: The linear transformation is optimized in step S214, and the weights are further adjusted. Finally, the data passes through two 1×1 convolutional layers at the end, and the processing result is output.

[0160] The proposed SSDL-CS framework combines spatial-domain and frequency-domain processing, extracting and processing features through recursive SPAF modules and residual connections. It uses the two-dimensional discrete Fourier transform for processing in the frequency domain and the PReLU activation function for nonlinear transformation in the spatial domain, and further improves performance through the optimized linear transformation. The entire architecture is designed to efficiently extract and process image or signal features, achieving a combination of compressed sensing and deep learning.

[0161] S3. Construct a loss function for self-supervised training. It mainly consists of two parts: the error between the estimated scene and the original scene, and the difference loss between the two different estimated scenes.

[0162] Typical supervised deep learning methods use only a single reconstruction network, which is trained on a large training dataset to obtain a deep convolutional neural network. After training, it can be directly used to reconstruct the original image from the measurement values. However, supervised learning has the following shortcomings: 1) It requires a large amount of manually labeled data to train the model; 2) The generalization ability of the model depends on the quality and quantity of labeled data; 3) There is still room for improvement in the construction of deep neural network architecture and the optimization of training mechanisms.

[0163] The classic loss function in supervised learning is defined as:

[0164] (10)

[0165] In contrast, this invention uses a self-supervised deep learning method and constructs the training dataset from measurements obtained through complementary mask downsampling. The loss function for self-supervised deep learning consists of three parts, $Loss_1$, $Loss_2$ and $Loss_3$, combined as given in formula (11):

[0166] (11)

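[0167] (1) $Loss_1$ is the loss between the predicted value $\hat{I}_1$ and the hologram $I_0$. It combines the Mean Square Error (MSE) loss and the Fourier Domain Mean Absolute Error (FDMAE) loss:

[0168] $$Loss_1 = \alpha_1\,L_{FDMAE}(\hat{I}_1, I_0) + \beta_1\,L_{MSE}(\hat{I}_1, I_0)$$ (12)

[0169] where $\alpha_1$ and $\beta_1$ are the loss weights. The two terms are calculated as follows:

[0170] $$L_{FDMAE}(\hat{I}_1, I_0) = \frac{1}{N}\sum \left| \mathcal{F}(\hat{I}_1) - \mathcal{F}(I_0) \right|$$ (13)

[0171] $$L_{MSE}(\hat{I}_1, I_0) = \frac{1}{N}\sum \left( \hat{I}_1 - I_0 \right)^2$$ (14)

[0172] where $I_0$ is the hologram, $\hat{I}_1$ is the predicted value for the measurement $y_1$, $N$ is the total number of pixels, $\mathcal{F}$ denotes the two-dimensional discrete Fourier transform, and the sums run over all pixels.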
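
The two loss terms named above, and their weighted combination, can be sketched as follows. The function names, the default weights of 1.0, and the use of an unnormalized FFT are assumptions; the source does not specify the weight values or the transform normalization.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error over all pixels."""
    return np.mean((pred - target) ** 2)

def fdmae_loss(pred, target):
    """Mean absolute error between the 2-D DFTs of the two images."""
    return np.mean(np.abs(np.fft.fft2(pred) - np.fft.fft2(target)))

def weighted_loss(pred, target, alpha=1.0, beta=1.0):
    """Weighted combination of the FDMAE and MSE terms (weights assumed)."""
    return alpha * fdmae_loss(pred, target) + beta * mse_loss(pred, target)

# Hypothetical 4 x 4 example: constant prediction against a zero target.
pred = np.ones((4, 4))
target = np.zeros((4, 4))
example = weighted_loss(pred, target, alpha=2.0, beta=3.0)
```

For this constant example both terms equal 1: the squared error is 1 at every pixel, and the DFT of the difference has a single non-zero coefficient of magnitude 16 spread over 16 entries.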

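[0173] (2) $Loss_2$ is the loss between the predicted value $\hat{I}_2$ and the hologram $I_0$, expressed as:

[0174] $$Loss_2 = \alpha_2\,L_{FDMAE}(\hat{I}_2, I_0) + \beta_2\,L_{MSE}(\hat{I}_2, I_0)$$ (15)

[0175] where $\alpha_2$ and $\beta_2$ are the loss weights. The two terms are calculated as follows:

[0176] $$L_{FDMAE}(\hat{I}_2, I_0) = \frac{1}{N}\sum \left| \mathcal{F}(\hat{I}_2) - \mathcal{F}(I_0) \right|$$ (16)

[0177] $$L_{MSE}(\hat{I}_2, I_0) = \frac{1}{N}\sum \left( \hat{I}_2 - I_0 \right)^2$$ (17)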

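[0178] (3) $Loss_3$ is the loss between the two predicted values $\hat{I}_1$ and $\hat{I}_2$, expressed as:

[0179] $$Loss_3 = \alpha_3\,L_{FDMAE}(\hat{I}_1, \hat{I}_2) + \beta_3\,L_{MSE}(\hat{I}_1, \hat{I}_2)$$ (19)

[0180] where $\alpha_3$ and $\beta_3$ are the loss weights. The two terms are calculated as follows:

[0181] $$L_{FDMAE}(\hat{I}_1, \hat{I}_2) = \frac{1}{N}\sum \left| \mathcal{F}(\hat{I}_1) - \mathcal{F}(\hat{I}_2) \right|$$ (20)

[0182] $$L_{MSE}(\hat{I}_1, \hat{I}_2) = \frac{1}{N}\sum \left( \hat{I}_1 - \hat{I}_2 \right)^2$$ (21)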
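
Putting the three parts together, the self-supervised objective for one hologram might be assembled as in the sketch below. The formula that combines the three parts is not recoverable from the text, so the plain sum used here is an assumption, as are the function names and the unit weights.

```python
import numpy as np

def pair_loss(a, b, alpha=1.0, beta=1.0):
    """FDMAE plus MSE between two images (weights are assumed values)."""
    fdmae = np.mean(np.abs(np.fft.fft2(a) - np.fft.fft2(b)))
    mse = np.mean((a - b) ** 2)
    return alpha * fdmae + beta * mse

def total_loss(pred1, pred2, hologram):
    """Return the combined loss and its three parts."""
    loss1 = pair_loss(pred1, hologram)  # branch 1 prediction vs. hologram
    loss2 = pair_loss(pred2, hologram)  # branch 2 prediction vs. hologram
    loss3 = pair_loss(pred1, pred2)     # consistency between the two branches
    # Assumption: the three parts are simply added.
    return loss1 + loss2 + loss3, (loss1, loss2, loss3)

# Hypothetical example: both branches predict ones, the hologram is zero.
p1 = np.ones((4, 4))
p2 = np.ones((4, 4))
h = np.zeros((4, 4))
total, parts = total_loss(p1, p2, h)
```

The third part needs no ground truth: it only penalizes disagreement between the two branches, which is what lets the complementary measurements supervise each other.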

[0183] S4. Train the parameters of the deep neural network by minimizing the loss function with an optimizer, and finally use the trained deep neural network to estimate the scene from the measurement values.

[0184] Example 2

[0185] This embodiment provides a single diffraction intensity image reconstruction device based on self-supervised learning. The device includes a measurement value extraction module, an original scene image reconstruction module, a deep neural network training module, and a single diffraction intensity image scene estimation module.

[0186] The measurement value extraction module is used to acquire a single diffraction measurement pattern, extract complementary measurement values from the single diffraction measurement pattern using complementary mask downsampling, and construct a dataset for self-supervised training; the dataset includes a training set and a test set.

[0187] The original scene image reconstruction module is used to reconstruct different estimates of the original scene image based on complementary measurements in the training set using a dual-channel deep neural network.

[0188] The deep neural network training module is used to construct a loss function for self-supervised training, minimize the loss function using an optimizer, and use the loss function to perform self-supervised training on the deep neural network to train the parameters of the deep neural network.

[0189] The single diffraction intensity image scene estimation module is used to estimate the single diffraction intensity image scene based on complementary measurements in the test set and using a trained deep neural network.

[0190] Example 3

[0191] This embodiment also provides an electronic device, including: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the self-supervised learning-based single diffraction intensity image reconstruction method as described above.

[0192] In this embodiment, the electronic device may include, but is not limited to: personal computer, server computer, workstation, desktop computer, laptop computer, notebook computer, mobile computing device, smartphone, tablet computer, cellular phone, personal digital assistant (PDA), handheld device, messaging device, wearable computing device, consumer electronic device, etc.

[0193] Example 4

[0194] This embodiment also provides a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the self-supervised learning-based single diffraction intensity image reconstruction method as described above.

[0195] Specifically, a system or apparatus equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the embodiments described above is stored, and the computer or processor of the system or apparatus can read and execute the instructions stored in the readable storage medium.

[0196] In this case, the program code read from the readable medium itself can perform the functions of any of the above embodiments, and therefore the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of this specification.

[0197] Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, program code can be downloaded from a server computer or the cloud via a communication network.

[0198] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0199] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0200] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0201] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0202] The above-described embodiments are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. A method for reconstructing a scene from a single diffraction intensity image based on self-supervised learning, characterized in that the method includes the following steps:
S1. Obtain a single diffraction measurement pattern, and extract complementary measurement values from the single diffraction measurement pattern using complementary mask downsampling to construct a dataset for self-supervised training; the dataset includes a training set and a test set;
S2. Based on complementary measurements in the training set, reconstruct different estimates of the original scene image using a dual-channel deep neural network;
S3. Construct a loss function for self-supervised training, minimize the loss function using an optimizer, and use the loss function to perform self-supervised training on the deep neural network;
S4. Based on complementary measurements in the test set, estimate the single diffraction intensity image scene using the trained deep neural network;
The deep neural network is a deep neural network based on the SSDL-CS framework, comprising two parallel SSDL-CS instances; the construction process of the deep neural network based on the SSDL-CS framework is as follows:
S211. Determine the architecture of the SSDL-CS framework; the SSDL-CS framework consists of two 1 × 1 convolutional layers at the head and tail, respectively, with several SPAF groups and a large-scale residual connection in between; each SPAF group contains two recursive SPAF modules, which share the same parameters; there is a short skip connection between SPAF groups, forming a mesoscale residual connection; there is a small-scale residual connection between the input and output of each SPAF module;
S212. Use the two-dimensional discrete Fourier transform to transform the tensor of the SPAF group to the frequency domain, and use formula (7) to perform a linear transformation on the transformed data in the frequency domain, using a window of size k / 2 to truncate the high-frequency signal:
$$F'_{j,u,v} = \sum_{i=1}^{c} W_{i,j,u,v}\,F_{i,u,v}$$ (7)
In formula (7), the weight matrix $W$ is applied to the frequency-domain data $F$ of the different channels, and the weighted summation yields the processed frequency-domain data $F'$; $F$ is the truncated frequency-domain data obtained from the SPAF module input after the two-dimensional discrete Fourier transform; $W$ denotes the trainable weights, $c$ is the number of channels, and the window size is k / 2;
S213. Use the two-dimensional inverse discrete Fourier transform to obtain the processed data in the spatial domain, and apply the parametric rectified linear unit activation function shown in formula (8):
$$\mathrm{PReLU}(x) = \begin{cases} x, & x \ge 0 \\ a\,x, & x < 0 \end{cases}$$ (8)
In formula (8), $x$ is the input value of the activation function and $a$ is a learnable parameter;
S214. Optimize the linear transformation using formula (9):
(9)
In formula (9), the input is the truncated frequency component and the weights are trainable.

2. The method for reconstructing a single diffraction intensity image scene based on self-supervised learning according to claim 1, characterized in that, Step S1 involves acquiring a single diffraction measurement pattern, extracting complementary measurement values ​​from the single diffraction measurement pattern using complementary mask downsampling, and constructing a dataset for self-supervised training, including: S11. Obtain the light field on the z = 0 plane using formula (1) Complex-valued functions: (1) In formula (1), Indicates in The light field distribution at a point is a complex function that describes the light wave in the plane. Amplitude and phase information on the wave; The amplitude distribution represents the amplitude or intensity of the light wave in the (x, y) plane. It is a real-valued function used to describe the brightness distribution of the light wave. Let be a complex exponential function, representing the phase information of the light wave in the (x, y) plane, where j is the imaginary unit. The phase distribution is a real-valued function that describes the phase change of a light wave. S12, Based on the light field O The complex-valued function is obtained by using formula (2) to obtain the wavelength of the object. Incident coherent light waves After irradiation, at a distance from the object Imaging on a plane : (2) In equation (2), express The light field distribution at a distance indicates the distance traveled. The light field afterward; The inverse Fourier transform operator transforms a function in the frequency domain back to the spatial domain; Let be the propagation function, also known as the propagation phase factor. This is a function in the frequency domain that describes the phase change of a light wave during propagation. Represents frequency variables. Indicates wavelength. 
Indicates the propagation distance; It is a Fourier transform operator that transforms a function in the spatial domain to the frequency domain; The distribution of the incident light field at z=0 represents the amplitude and phase information of the light wave on the initial plane z=0; The distribution of the object's light field at z=0 represents the amplitude and phase information of the light wave on the object's plane at z=0; in, (3) In equation (3), The propagation function describes the phase change of a light wave during propagation. For the first half of the propagation phase factor, where, Propagation distance, representing the distance a light wave travels. The wavelength of light. This represents the latter half of the propagation phase factor. Indicates frequency components in Contribution in direction Indicates frequency components in Contribution in a particular direction; S13. Use the detector to obtain the diffraction image of the object, and use formula (4) to obtain the intensity of the diffraction image to obtain a single diffraction measurement image: (4) In equation (4), express z=0 The light intensity distribution at a point, where light intensity is the square of the light field amplitude, representing the light wave in the plane. z=0 Energy density on; For light field The square of the modulus; Is Light field on a plane The amplitude of the light field is represented by , and the square of the amplitude represents the light intensity. Indicates wavelength and transmission distance The relevant scaling factor, which represents a constant or function, is used to normalize the intensity of the light field; The amplitude of the initial light field or a certain reference amplitude represents the light field before modulation or propagation; I 0 is the detector The captured hologram; S14. A pair of complementary masks are used to sample a single diffraction measurement pattern, i.e., the hologram is sampled. 
Sampling is performed to obtain complementary measurements; S15. Based on complementary measurements, construct a dataset for self-supervised training, the dataset including a training set and a test set.

3. The method for reconstructing a single diffraction intensity image scene based on self-supervised learning according to claim 2, characterized in that the pair of complementary masks includes a sampling mask $M_1$ and a sampling mask $M_2$; the hologram acquired by the detector is sampled using two sampling branches: one branch performs downsampling through the sampling mask $M_1$ to obtain the measured value $y_1$, and the other branch performs downsampling through the sampling mask $M_2$, which is complementary to $M_1$, to obtain the measured value $y_2$; the pair of complementary masks satisfies the following condition: the intersection of $M_1$ and $M_2$ is the empty set, and the union of $M_1$ and $M_2$ is the universal set.

4. The method for reconstructing a single diffraction intensity image scene based on self-supervised learning according to claim 3, characterized in that step S2, in which different estimates of the original scene image are reconstructed using a dual-channel deep neural network based on complementary measurements from the training set, includes:
S21. Construct a deep neural network based on the SSDL-CS framework;
S22. Input two complementary measurements from the training set into two parallel SSDL-CS networks, and reconstruct the hologram using formula (5) to obtain different estimates of the original scene image:
$$\hat{O} = \arg\min_{O}\; L\left(\mathcal{H}(O), I_0\right) + R(O)$$ (5)
In formula (5), $\hat{O}$ is the optimal solution, that is, the estimated light field obtained through the optimization process; $\arg\min_{O}$ denotes optimizing over the light field $O$ to minimize the objective function; the loss function $L$ measures the difference between the light field processed by the forward imaging model $\mathcal{H}$ and the hologram $I_0$ captured by the detector; $R(O)$ is a regularization term that introduces constraints or prior knowledge on the light field, such as smoothness or sparsity, in order to avoid overfitting or preserve specific properties; for a thin sample under coherent illumination, $\mathcal{H}$ is simplified to:
$$\mathcal{H}(O) = f(AO) + \epsilon$$ (6)
In formula (6), $A$ is the free-space transformation matrix, $O$ is the light field, $\epsilon$ is the random detection noise, and $f(\cdot)$ is the sampling function of the photoelectric sensor array that records the intensity of the light field.

5. The method for reconstructing a single diffraction intensity image scene based on self-supervised learning according to claim 1, characterized in that, Step S3, which involves constructing a loss function for self-supervised training, minimizing the loss function using an optimizer, and performing self-supervised training of the deep neural network using the loss function, includes: S31. Construct a loss function for self-supervised training and minimize the loss function using an optimizer; The loss function Depend on , and It consists of three parts, as shown in formula (11): In formula (11), It is a predicted value. and hologram The loss between them is a weighted average of the mean squared error loss (MSE), the Fourier domain mean absolute error loss (FDMAE), and the total variation loss: (12) In formula (12), It is a predicted value. and hologram The losses between It is a predicted value. and hologram Mean absolute error loss in the Fourier domain For predicted values and hologram Mean squared error loss between them These are the loss weight values; (13) In formula (13), It is the total number of pixels. Indicates the predicted value Perform a Fourier transform. Indicates the hologram Perform a Fourier transform; (14) In formula (14), It's a hologram. It is a measured value The predicted value, It is the total number of pixels; In formula (11), It is a predicted value. and hologram The loss between them is expressed as: (15) In formula (15), Loss 2 is the predicted value. and hologram I Losses between 0 and 0 L FDMAE ( , I 0) is the predicted value and hologram I The mean absolute error loss in the Fourier domain between 0 and 0. L MSE ( , I 0) is the predicted value and hologram I Mean squared error loss between 0 and 0 , The loss weight value; (16) (17) The loss between the predicted values ​​is expressed as: (18) In formula (18), It is a predicted value. 
and predicted value The losses between L FDMAE ( , () is the predicted value and predicted value The mean absolute error loss in the Fourier domain between them L MSE ( , () is the predicted value and predicted value Mean squared error loss between them and The loss weight value is calculated as follows: (19) (20) In formulas (19) and (20), It is the total number of pixels. Indicates the predicted value Perform a Fourier transform. Indicates the predicted value Perform a Fourier transform; S32. Use the loss function to train the deep neural network using a self-supervised deep learning method.

6. A single diffraction intensity image reconstruction device based on self-supervised learning, characterized in that, The device includes a measurement value extraction module, an original scene image reconstruction module, a deep neural network training module, and a single diffraction intensity image scene estimation module; The measurement value extraction module is used to acquire a single diffraction measurement pattern, extract complementary measurement values ​​from the single diffraction measurement pattern using complementary mask downsampling, and construct a dataset for self-supervised training; the dataset includes a training set and a test set; The original scene image reconstruction module is used to reconstruct different estimates of the original scene image based on complementary measurements in the training set using a dual-channel deep neural network. The deep neural network training module is used to construct a loss function for self-supervised training, minimize the loss function using an optimizer, perform self-supervised training on the deep neural network using the loss function, and train the parameters of the deep neural network. The single diffraction intensity image scene estimation module is used to estimate the single diffraction intensity image scene based on complementary measurements in the test set and using a trained deep neural network. The deep neural network is a deep neural network based on the SSDL-CS framework, comprising two parallel SSDL-CS instances; the construction process of the deep neural network based on the SSDL-CS framework is as follows: S211. Determine the architecture of the SSDL-CS framework; The SSDL-CS framework consists of two 1 × 1 convolutional layers at the head and tail, respectively, with several SPAF groups and a large-scale residual connection in between. Each SPAF group contains two recursive SPAF modules, which share the same parameters. 
There is a short skip connection between each SPAF group, thus forming a mesoscale residual connection. There is a small-scale residual connection between the input and output of each SPAF module. S212. Use the two-dimensional discrete Fourier transform to transform the tensor of the SPAF group to the frequency domain, and use formula (7) to perform a linear transformation on the transformed data in the frequency domain, using a window of size k / 2 to truncate the high-frequency signal: (7) In formula (7), the result is the processed frequency-domain data, obtained as the weighted sum, under the weight matrix, of the frequency-domain data of the different channels; the input is the truncated frequency-domain representation of the SPAF module input after the two-dimensional discrete Fourier transform; the weights are trainable, where c is the number of channels and k / 2 is the window size. S213. Use the inverse two-dimensional discrete Fourier transform to obtain the processed data in the spatial domain, and apply the parametric rectified linear unit (PReLU) activation function shown in formula (8): $f(x) = \max(0, x) + a \min(0, x)$ (8) In formula (8), $x$ is the input value of the activation function and $a$ is a learnable parameter; S214. Optimize the linear transformation using formula (9): (9) In formula (9), the transformed quantity is the truncated frequency component and the weights are trainable.
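
Steps S212 and S213 can be sketched in NumPy as follows. This is an illustration, not the claimed implementation: the per-frequency complex weight tensor of shape (c_out, c, m, m) with m = k // 2, the centered low-frequency window, the function names, the slope value 0.25, and the choice to keep only the real part after the inverse transform are all assumptions introduced for the example.

```python
import numpy as np


def prelu(x, a):
    """PReLU: identity for positive inputs, slope `a` for negative inputs."""
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)


def spectral_layer(x, weights, a=0.25):
    """Frequency-domain linear transform followed by PReLU.

    x:       real array of shape (c, h, w)
    weights: complex array of shape (c_out, c, m, m), with m = k // 2
             (hypothetical layout, one weight per retained frequency)
    a:       PReLU slope (learnable in a real network, fixed here)
    """
    _, h, w = x.shape
    m = weights.shape[-1]

    # S212: 2-D DFT of every channel, zero frequency shifted to the centre.
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))

    # Keep only an m x m window around the zero frequency, which discards
    # the high-frequency components.
    r0, c0 = h // 2 - m // 2, w // 2 - m // 2
    low = spec[:, r0:r0 + m, c0:c0 + m]

    # Weighted sum over input channels at each retained frequency.
    mixed = np.einsum("oiuv,iuv->ouv", weights, low)

    # Zero-pad the spectrum back to the full h x w size.
    out_spec = np.zeros((weights.shape[0], h, w), dtype=complex)
    out_spec[:, r0:r0 + m, c0:c0 + m] = mixed

    # S213: inverse 2-D DFT back to the spatial domain. Taking the real part
    # is a simplification made for this sketch.
    y = np.fft.ifft2(np.fft.ifftshift(out_spec, axes=(-2, -1)), axes=(-2, -1)).real
    return prelu(y, a)


if __name__ == "__main__":
    c, h, w, k = 2, 8, 8, 8
    m = k // 2
    # Identity channel mixing: each output channel copies its input channel.
    weights = np.zeros((c, c, m, m), dtype=complex)
    for ch in range(c):
        weights[ch, ch] = 1.0
    out = spectral_layer(np.ones((c, h, w)), weights)
    print(out.shape)
```

With identity weights the layer acts as a plain low-pass filter, so a constant input passes through unchanged; trained weights would instead reweight and mix the retained low-frequency components across channels.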

7. An electronic device, characterized in that it comprises: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the single diffraction intensity image reconstruction method based on self-supervised learning as described in any one of claims 1 to 5.

8. A machine-readable storage medium, characterized in that it stores executable instructions that, when executed, cause the machine to perform the single diffraction intensity image reconstruction method based on self-supervised learning as described in any one of claims 1 to 5.