Image reconstruction method, system and medium for coded aperture compressive sensing lens

By setting a coded aperture mask with random patterns in the optical path and combining it with a deep learning reconstruction network, the problem of image quality degradation in low-light and high-frame-rate scenes of traditional imaging systems is solved, and efficient image reconstruction and perception quality improvement are achieved.

CN122265066A · Pending · Publication Date: 2026-06-23 · HANGZHOU HUICUI INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HANGZHOU HUICUI INTELLIGENT TECH CO LTD
Filing Date: 2026-03-24
Publication Date: 2026-06-23

IPC: G06T5/60; G06T5/50; G06N3/0455; G06N3/084

AI Tagging

Application Domain

Image enhancement; Biological models

Technology Topics

Pattern recognition; Resolution recovery

AI Technical Summary

Technical Problem

Traditional imaging systems are limited by hardware noise, sampling delay and storage bandwidth in low-light environments, weak signal observation or high frame rate video acquisition scenarios, resulting in degraded image quality. Furthermore, existing compressed sensing methods have high computational complexity and poor real-time performance.

Method used

By employing a compressed sensing lens based on coded aperture, projection modulation is achieved by setting a coded aperture mask with random patterns in the optical path. Combined with a deep learning reconstruction network and a multi-head attention mechanism, physical compression and information recovery are realized.

Benefits of technology

It significantly reduces the amount of data collected, improves image reconstruction efficiency and perception quality, adapts to multiple scenarios, and is suitable for video surveillance and high-resolution medical microscopic image reconstruction under low light conditions.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides an image reconstruction method, system and medium of a compressive sensing lens based on a coded aperture, the method comprising: acquiring an original scene image, performing projection modulation to obtain an image received by a sensor; constructing a sensing matrix and a projection matrix, mapping a transfer function of a mask based on the sensing matrix, performing spatial coding sampling on the image received by the sensor through the projection matrix to obtain a compressed measurement result; constructing a deep learning reconstruction network, processing the compressed measurement result based on the deep learning reconstruction network to obtain a two-dimensional feature map; capturing different subspace information based on the two-dimensional feature map through a multi-head attention mechanism, restoring the image resolution based on the different subspace information, and outputting a reconstructed image; and using the mask to realize physical compression, greatly reducing the data acquisition amount, restoring information through the deep learning reconstruction network, improving the image reconstruction efficiency and sensing quality, and adapting to multiple scene environments.

Description

Technical Field

[0001] This application relates to the field of image reconstruction technology, and more specifically, to an image reconstruction method, system, and medium based on a compressed sensing lens with coded aperture.

Background Technology

[0002] With the rapid development of technologies such as artificial intelligence, image processing, and computational photography, traditional imaging systems are undergoing rapid change. Traditional imaging systems primarily rely on the classic Nyquist sampling theorem, which requires sampling at a rate no less than twice the signal bandwidth to fully reconstruct the image. However, in low-light environments, weak signal observations, or high-frame-rate video acquisition scenarios, traditional sampling systems are often limited by hardware noise, sampling delay, and storage bandwidth, leading to a significant deterioration in image quality. In fields such as medical microscopy imaging, astronomical telescope observation, and intelligent monitoring in particular, limitations in the sensitivity, speed, and cost of imaging equipment have made compressed imaging technology a research hotspot.

[0003] The theory of compressed sensing (CS) provides a new paradigm for signal acquisition. Its core idea is that if a signal is sparse in some transform domain, it can be sampled at a rate much lower than the Nyquist rate, and the original signal can be recovered using a nonlinear reconstruction algorithm. Common reconstruction methods include regularized minimization, orthogonal matching pursuit (OMP), LASSO, and Basis Pursuit. However, these methods suffer from high computational complexity and poor real-time performance.

Summary of the Invention

[0004] The purpose of this application is to provide an image reconstruction method, system, and medium based on a compressed sensing lens with coded aperture. Physical compression is achieved through a mask, which greatly reduces the amount of data collected. Information is restored through a deep learning reconstruction network, which improves image reconstruction efficiency and perception quality and adapts to multiple scene environments.

[0005] Firstly, embodiments of this application provide an image reconstruction method based on a compressed sensing lens with coded aperture, including:
acquiring the original scene image, and setting a coded aperture mask with a random pattern in the optical path to project and modulate the original scene image, obtaining the image received by the sensor;
constructing a sensing matrix and a projection matrix, mapping the transfer function of the mask based on the sensing matrix, and performing spatial coding sampling on the image received by the sensor through the projection matrix to obtain a compressed measurement result;
constructing a deep learning reconstruction network, and processing the compressed measurement result based on the deep learning reconstruction network to obtain a two-dimensional feature map;
capturing different subspace information of the two-dimensional feature map based on a multi-head attention mechanism, restoring the image resolution based on the different subspace information, and outputting the reconstructed image.
Optionally, in the image reconstruction method of the compressed sensing lens based on coded aperture described in the embodiments of this application, the optical path construction method includes:
Scene objective: obtain the original 3D scene; with the focal plane fixed, projection onto the imaging plane yields a two-dimensional projection $I(x, y)$;
Coded mask: a mask pattern $M(x, y)$ fixed to the front of the lens, whose values are randomly distributed over 0 and 1;
Main lens imaging: imaging light enters the lens after passing through the coded mask and undergoes the imaging transformation, giving the intensity distribution $I(x', y')$ after diffraction modulation;
Photosensitive sensor: obtains data by sampling on the imaging surface; the data is converted from analog to digital and then sent to the data processing module.
Here $X$ is the original image, $M$ the mask pattern, $Y$ the coded image, and $y$ the compressed observation vector.

[0006] Optionally, in the image reconstruction method of the compressed sensing lens based on coded aperture described in the embodiments of this application, the transfer function of the coded aperture mask is modeled as follows: let the image be $I(x, y)$ and the mask be $M(x, y)$; the transfer function is then constructed as
$$\mathcal{F}\{I(x, y) \cdot M(x, y)\}(u, v) = \mathcal{F}\{I\}(u, v) * \mathcal{F}\{M\}(u, v).$$
That is, in the frequency domain, the image becomes the convolution of the original image spectrum and the mask spectrum.
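The frequency-domain statement above has a discrete counterpart that can be checked numerically. The following is a minimal sketch, assuming a tiny 4×4 image, NumPy's unnormalized DFT, and circular (periodic) convolution; the array names and sizes are illustrative and do not come from the application.

```python
import numpy as np

def circular_convolve2d(a, b):
    """Brute-force 2-D circular convolution of two equal-size arrays."""
    rows, cols = a.shape
    out = np.zeros((rows, cols), dtype=complex)
    for u in range(rows):
        for v in range(cols):
            for p in range(rows):
                for q in range(cols):
                    out[u, v] += a[p, q] * b[(u - p) % rows, (v - q) % cols]
    return out

rng = np.random.default_rng(0)
image = rng.random((4, 4))                # stand-in for the scene image
mask = rng.integers(0, 2, size=(4, 4))    # random 0-1 mask

# Spectrum of the masked image (element-wise product in the spatial domain).
lhs = np.fft.fft2(image * mask)

# Circular convolution of the two spectra; the 1/(rows*cols) factor comes
# from NumPy's unnormalized forward DFT.
rhs = circular_convolve2d(np.fft.fft2(image), np.fft.fft2(mask)) / image.size

spectra_match = np.allclose(lhs, rhs)
```

In the continuous model the scaling constant depends on the Fourier transform convention; the discrete check only illustrates that multiplying by the mask spreads the image spectrum by convolution.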

[0007] Optionally, in the image reconstruction method of the compressed sensing lens based on coded aperture described in the embodiments of this application, the sensing matrix construction method is as follows: let the real scene image be $X \in \mathbb{R}^{H \times W}$, where $H \times W$ is the size of the photosensitive surface, and let the random coding mask be $M \in \{0, 1\}^{H \times W}$. The mask modulates the light intensity multiplicatively; therefore, the image received by the sensor is
$$Y = X \odot M,$$
where $\odot$ denotes the Hadamard product, i.e. the element-wise multiplication of two matrices (or vectors) of the same dimension.

[0008] To obtain compressed information, $Y$ is spatially coded and sampled with a projection matrix $\Phi \in \mathbb{R}^{m \times n}$, with $n = HW$ and $m < n$, to achieve dimensionality-reduction sampling. Denoting the compressed measurement by $y \in \mathbb{R}^{m}$, then
$$y = \Phi\,\mathrm{vec}(Y).$$
Introducing the overall sensing matrix $A$, the formula is as follows:
$$A = \Phi\,\mathrm{diag}(\mathrm{vec}(M)), \qquad y = A\,\mathrm{vec}(X).$$
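The two routes to the compressed measurement (mask then project, or apply one overall sensing matrix) can be compared directly. This is a minimal sketch, assuming an 8×8 scene, 16 measurements, and a Gaussian projection matrix; the variable names (`scene`, `mask`, `phi`, `sensing`) and all sizes are illustrative choices, not values from the application.

```python
import numpy as np

rng = np.random.default_rng(1)
height, width = 8, 8
n = height * width      # number of pixels
m = 16                  # number of compressed measurements (m < n)

scene = rng.random((height, width))                # scene image
mask = rng.integers(0, 2, size=(height, width))    # random 0-1 coding mask
phi = rng.standard_normal((m, n)) / np.sqrt(m)     # projection matrix (Gaussian, assumed)

def vec(a):
    """Column-major vectorization of a 2-D array."""
    return a.flatten(order="F")

coded = scene * mask                    # Hadamard product: image at the sensor
y_two_step = phi @ vec(coded)           # project the coded image

sensing = phi @ np.diag(vec(mask))      # overall sensing matrix
y_one_step = sensing @ vec(scene)       # same measurement, directly from the scene

compression_ratio = m / n
```

Columns of `sensing` that correspond to blocked pixels are exactly zero, which is one way to see that the mask pattern shapes the sensing matrix.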

[0009] Optionally, in the image reconstruction method of the compressed sensing lens based on coded aperture described in the embodiments of this application, the network training and reconstruction process is as follows:
Data generation: extract training images $X$ from publicly available image datasets; randomly select a mask pattern $M$ and generate the compressed observation vectors $y$;
Neural network structure:
Input layer: the vector $y$ is mapped to an initial image estimate;
Encoding layer: multiple Transformer encoder layers model long-distance dependencies between image patches;
Decoding layer: upsamples to the target resolution using convolution and PixelShuffle;
Output layer: restores the image $\hat{X}$;
Construct the comprehensive loss function:
$$\mathcal{L} = \lambda_1 \|\hat{X} - X\|_2^2 + \lambda_2 \|\nabla \hat{X} - \nabla X\|_1 + \lambda_3 \left(1 - \mathrm{SSIM}(\hat{X}, X)\right),$$
where $\nabla$ (the gradient symbol) denotes the spatial gradient of the image (usually the differences in the horizontal and vertical directions), and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting coefficients that balance the importance of the terms in the loss function:
$\lambda_1$: weight of the pixel-level reconstruction error (L2 loss);
$\lambda_2$: weight of the gradient constraint term (L1 loss), used to preserve the edge structure of the image;
$\lambda_3$: weight of the Structural Similarity (SSIM) loss, used to improve the perceptual quality of the image.
By adjusting these coefficients, the model's performance on different tasks (such as detail restoration and edge enhancement) can be optimized.

[0010] Where: the first term is the pixel reconstruction error; the second term is the gradient constraint term, which preserves the edge structure; and the third term is the structural similarity index SSIM. After backpropagation training using the Adam optimizer, the network can be deployed on edge devices for fast decoding once it converges.
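The three-term loss described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the SSIM term is written as one minus SSIM, SSIM is computed over the whole image as a single window rather than with sliding windows, errors are averaged rather than summed, and the default weights are arbitrary placeholders. None of these choices are specified in the application.

```python
import numpy as np

def spatial_gradient(img):
    """Forward differences in the horizontal and vertical directions."""
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return dx, dy

def global_ssim(a, b, data_range=1.0):
    """SSIM over the whole image as one window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

def composite_loss(recon, target, lam1=1.0, lam2=0.1, lam3=0.1):
    """Weighted sum of pixel (L2), gradient (L1) and SSIM terms."""
    pixel_term = np.mean((recon - target) ** 2)
    rdx, rdy = spatial_gradient(recon)
    tdx, tdy = spatial_gradient(target)
    gradient_term = np.mean(np.abs(rdx - tdx)) + np.mean(np.abs(rdy - tdy))
    ssim_term = 1.0 - global_ssim(recon, target)
    return lam1 * pixel_term + lam2 * gradient_term + lam3 * ssim_term
```

A training framework would compute the same quantity with automatic differentiation so that the Adam optimizer can use its gradient; the NumPy version only shows how the terms combine.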

[0012] Optionally, in the image reconstruction method of a compressed sensing lens based on coded aperture described in this application embodiment, different subspace information is captured from the two-dimensional feature map based on a multi-head attention mechanism, and image resolution is restored based on the different subspace information to output a reconstructed image, specifically including: Subspace partitioning is performed based on different feature dimensions of the two-dimensional feature map; Information is captured for each subspace based on a multi-head attention mechanism to obtain information for each subspace; All subspace information is spliced and fused to obtain the fused feature information; Image resolution is restored based on the fused feature information, and the reconstructed image is output.

[0013] Secondly, embodiments of this application provide an image reconstruction system based on a compressed sensing lens with coded aperture. The system includes a memory and a processor. The memory includes a program for an image reconstruction method based on a compressed sensing lens with coded aperture. When the program for the image reconstruction method based on a compressed sensing lens with coded aperture is executed by the processor, it performs the following steps: The original scene image is acquired, and a coded aperture mask with a random pattern is set in the optical path to project and modulate the original scene image to obtain the image received by the sensor. Construct a perception matrix and a projection matrix. Based on the transfer function of the mask mapped by the perception matrix, spatially encode and sample the image received by the sensor through the projection matrix to obtain compressed measurement results. A deep learning reconstruction network is constructed, and the compressed measurement results are processed based on the deep learning reconstruction network to obtain a two-dimensional feature map; The multi-head attention mechanism is used to capture information from different subspaces of the 2D feature map, and the image resolution is restored based on the information from different subspaces to output the reconstructed image.

[0014] Optionally, in the image reconstruction system of a compressed sensing lens based on coded aperture described in this application embodiment, the optical path construction method includes:
Scene objective: obtain the original 3D scene; with the focal plane fixed, projection onto the imaging plane yields a two-dimensional projection $I(x, y)$;
Coded mask: a mask pattern $M(x, y)$ fixed to the front of the lens, whose values are randomly distributed over 0 and 1;
Main lens imaging: imaging light enters the lens after passing through the coded mask and undergoes the imaging transformation, giving the intensity distribution $I(x', y')$ after diffraction modulation;
Photosensitive sensor: obtains data by sampling on the imaging surface; the data is converted from analog to digital and then sent to the data processing module.
Here $X$ is the original image, $M$ the mask pattern, $Y$ the coded image, and $y$ the compressed observation vector.

[0015] Optionally, in the image reconstruction system of the compressed sensing lens based on coded aperture described in the embodiments of this application, the transfer function of the coded aperture mask is modeled as follows: let the image be $I(x, y)$ and the mask be $M(x, y)$; the transfer function is then constructed as
$$\mathcal{F}\{I(x, y) \cdot M(x, y)\}(u, v) = \mathcal{F}\{I\}(u, v) * \mathcal{F}\{M\}(u, v).$$
$I(x, y)$: the intensity distribution of the original scene image in the spatial domain. Note: this is a two-dimensional function describing the light intensity of the image at each pixel position $(x, y)$; it is usually a real matrix (such as grayscale values or single-channel brightness).

[0016] $M(x, y)$: the pattern function of the coded aperture mask, a binary random distribution with values of 0 or 1. Explanation: the mask physically modulates the image by blocking (0) or transmitting (1) light rays, achieving compressed sampling. For example, $M(x, y) = 1$ indicates that light passes through at that location, and $M(x, y) = 0$ indicates that it is blocked.

[0017] $\mathcal{F}\{\cdot\}$: the Fourier transform operator. Explanation: it transforms spatial-domain signals (such as $I(x, y)$ or $M(x, y)$) to the frequency domain to obtain the corresponding spectral representation.

[0018] $\mathcal{F}\{I \cdot M\}(u, v)$: the frequency-domain representation of the coded image received by the sensor. It is the convolution of the original image spectrum $\mathcal{F}\{I\}$ with the mask spectrum $\mathcal{F}\{M\}$ and reflects the modulated frequency-domain information.

[0019] $*$: the convolution operator, which represents the convolution of two functions in the frequency domain, i.e. $(F * G)(u, v) = \iint F(u', v')\,G(u - u', v - v')\,du'\,dv'$.

[0020] $(u, v)$: the frequency-domain coordinate variables. Explanation: they correspond to spatial frequency components; for example, $u$ indicates the frequency in the horizontal direction and $v$ indicates the frequency in the vertical direction.

[0021] That is, in the frequency domain, the image becomes the convolution of the original image spectrum and the mask spectrum.

[0022] Thirdly, embodiments of this application also provide a computer-readable storage medium, which includes an image reconstruction method program for a compressed sensing lens based on coded aperture. When the image reconstruction method program for a compressed sensing lens based on coded aperture is executed by a processor, it implements the steps of the image reconstruction method for a compressed sensing lens based on coded aperture as described in any of the preceding claims.

[0023] As can be seen from the above, the image reconstruction method, system, and medium based on a compressed sensing lens with coded aperture provided in this application embodiment acquires an original scene image, projects and modulates the original scene image by setting a coded aperture mask with a random pattern in the optical path, and obtains an image received by the sensor; constructs a sensing matrix and a projection matrix, maps the transfer function of the mask based on the sensing matrix, and performs spatial coding sampling on the image received by the sensor through the projection matrix to obtain a compressed measurement result; constructs a deep learning reconstruction network, processes the compressed measurement result based on the deep learning reconstruction network, and obtains a two-dimensional feature map; captures different subspace information of the two-dimensional feature map based on a multi-head attention mechanism, restores image resolution based on different subspace information, and outputs a reconstructed image; utilizes a mask to achieve physical compression, significantly reducing the amount of data acquisition, and uses a deep learning reconstruction network to restore information, thereby improving image reconstruction efficiency and sensing quality, and adapting to multiple scene environments.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a flowchart of the image reconstruction method based on a compressed sensing lens with coded aperture provided in an embodiment of this application;
Figure 2 is a schematic diagram of the structure of the image reconstruction system based on a compressed sensing lens with coded aperture provided in an embodiment of this application;
Figure 3 is a schematic diagram of a random pattern of the coded aperture mask of the image reconstruction system provided in an embodiment of this application;
Figure 4 is a flowchart of the mathematical model mapping of the image reconstruction system provided in an embodiment of this application;
Figure 5 is a structure diagram of the deep learning reconstruction network of the image reconstruction system provided in an embodiment of this application;
Figure 6 is a comparison of low-light monitoring scenarios for the image reconstruction system provided in an embodiment of this application.

Detailed Implementation

[0026] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0027] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this application, the terms "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0028] Please refer to Figure 1, which is a flowchart of an image reconstruction method based on a compressed sensing lens with coded aperture according to some embodiments of this application. The method is used in a terminal device and includes the following steps:
S101: acquire the original scene image, and set a coded aperture mask with a random pattern in the optical path to project and modulate the original scene image, obtaining the image received by the sensor;
S102: construct the sensing matrix and the projection matrix, map the transfer function of the mask based on the sensing matrix, and perform spatial coding sampling on the image received by the sensor through the projection matrix to obtain the compressed measurement result;
S103: construct a deep learning reconstruction network, and process the compressed measurement result based on the deep learning reconstruction network to obtain a two-dimensional feature map;
S104: capture information from different subspaces of the two-dimensional feature map based on a multi-head attention mechanism, restore the image resolution based on the information from the different subspaces, and output a reconstructed image.

[0029] It should be noted that the sensing process and the design of the coding matrix are as follows: during the sensing phase, the choice of mask pattern $M$ directly determines the structural properties of the measurement matrix $A$. A good mask should possess the following characteristics:
Sparsity: enables the information of each image patch to be locally modulated, reducing the sensing dimension;
Unpredictability: prevents periodic overlap of image information, improving reconstruction accuracy;
Orthogonality: enhances the column independence of the measurement matrix $A$, improving system stability.

[0030] Consider expressing the mask pattern as a matrix operation, i.e.:
$$\mathrm{vec}(X \odot M) = \mathrm{diag}(\mathrm{vec}(M))\,\mathrm{vec}(X),$$
where $\mathrm{diag}(\cdot)$ expands a vector into a diagonal matrix. Because each entry of $M$ is either 0 or 1, the mask's function is to block the direct projection of some pixels. The projection matrix $\Phi$ can be designed as one of the following:
Random Gaussian matrix;
Random Bernoulli matrix (±1);
Uniform sampling matrix (some columns are unit vectors, simulating row selection);
Hadamard transform combination matrix (whose structure allows fast computation).
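The four projection matrix families listed above can be generated in a few lines. This is a minimal sketch, assuming `m` measurements of an `n`-pixel image; the function names, the normalization constants, and the choice of a Sylvester-type Hadamard matrix with randomly selected rows are illustrative assumptions rather than details given in the application.

```python
import numpy as np

def gaussian_matrix(m, n, rng):
    """Random Gaussian projection, scaled so columns have unit expected norm."""
    return rng.standard_normal((m, n)) / np.sqrt(m)

def bernoulli_matrix(m, n, rng):
    """Random +1/-1 projection with the same scaling."""
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

def row_selection_matrix(m, n, rng):
    """Uniform sampling: each row is a unit vector that picks one pixel."""
    rows = rng.choice(n, size=m, replace=False)
    return np.eye(n)[rows]

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def hadamard_matrix(m, n, rng):
    """Randomly chosen rows of a normalized Hadamard matrix."""
    rows = rng.choice(n, size=m, replace=False)
    return hadamard(n)[rows] / np.sqrt(n)
```

For example, `hadamard_matrix(8, 16, np.random.default_rng(0))` returns an 8×16 matrix with orthonormal rows; multiplying any of these matrices on the right by the diagonal mask matrix gives the corresponding sensing matrix.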

[0031] According to CS theory, $A$ should satisfy the Restricted Isometry Property (RIP) to ensure the uniqueness of the sparse solution for the image. To this end, several sets of mask sequences $M_k$ can be generated in hardware using a dynamic digital micromirror device (DMD), with corresponding measurements $y_k$. The final combination of the measurements is as follows:
$$y = \sum_k \alpha_k\, y_k,$$
where $\alpha_k$ are the weighting coefficients of each mask group, which can be learned automatically by the system.
According to an embodiment of the present invention, the optical path construction method includes:
Scene objective: obtain the original 3D scene; with the focal plane fixed, projection onto the imaging plane yields a two-dimensional projection $I(x, y)$;
Coded mask: a mask pattern $M(x, y)$ fixed to the front of the lens, whose values are randomly distributed over 0 and 1;
Main lens imaging: imaging light enters the lens after passing through the coded mask and undergoes the imaging transformation, giving the intensity distribution $I(x', y')$ after diffraction modulation. Here $x'$ and $y'$ are the spatial coordinates of the sensor's imaging plane, reflecting the final position of the light after mask modulation, lens diffraction, and geometric transformation. The intensity distribution $I(x', y')$ is the representation of the coded image in the frequency domain or physical optical path.

[0032] Photosensitive sensor: obtains data by sampling on the imaging surface; the data is converted from analog to digital and then sent to the data processing module.
Here $X$ is the original image, $M$ the mask pattern, $Y$ the coded image, and $y$ the compressed observation vector.
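The weighted combination of measurements taken under several DMD mask patterns, described in [0031], can be sketched as follows. The combination rule used here (a weighted sum of per-mask measurements taken with one shared projection matrix) is an assumed reading of the text, and the weights are fixed to equal values, whereas the application says they can be learned. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, num_masks = 16, 4, 3

scene_vec = rng.random(n)                          # vectorized scene image
phi = rng.standard_normal((m, n))                  # shared projection matrix (assumed)
masks = rng.integers(0, 2, size=(num_masks, n))    # one vectorized 0-1 mask per row
alphas = np.full(num_masks, 1.0 / num_masks)       # fixed here; learned in the source

# One compressed measurement per mask pattern.
per_mask = np.stack([phi @ (mask * scene_vec) for mask in masks])

# Weighted combination of the per-mask measurements.
combined = alphas @ per_mask

# By linearity, this equals one measurement with a weighted-average mask.
effective = phi @ np.diag(alphas @ masks)
```

The equivalent matrix `effective` has a non-binary diagonal, which shows how time-multiplexed binary patterns can emulate a grayscale mask.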

[0033] According to an embodiment of the present invention, the transfer function of the coded aperture mask is modeled as follows: let the image be $I(x, y)$ and the mask be $M(x, y)$; the transfer function is then constructed as
$$\mathcal{F}\{I(x, y) \cdot M(x, y)\}(u, v) = \mathcal{F}\{I\}(u, v) * \mathcal{F}\{M\}(u, v).$$
$I(x, y)$: the intensity distribution of the original scene image in the spatial domain. Note: this is a two-dimensional function describing the light intensity of the image at each pixel position $(x, y)$; it is usually a real matrix (such as grayscale values or single-channel brightness).

[0034] $M(x, y)$: the pattern function of the coded aperture mask, a binary random distribution with values of 0 or 1. Explanation: the mask physically modulates the image by blocking (0) or transmitting (1) light rays, achieving compressed sampling. For example, $M(x, y) = 1$ indicates that light passes through at that location, and $M(x, y) = 0$ indicates that it is blocked.

[0035] $\mathcal{F}\{\cdot\}$: the Fourier transform operator. Explanation: it transforms spatial-domain signals (such as $I(x, y)$ or $M(x, y)$) to the frequency domain to obtain the corresponding spectral representation.

[0036] $\mathcal{F}\{I \cdot M\}(u, v)$: the frequency-domain representation of the coded image received by the sensor. It is the convolution of the original image spectrum $\mathcal{F}\{I\}$ with the mask spectrum $\mathcal{F}\{M\}$ and reflects the modulated frequency-domain information.

[0037] $*$: the convolution operator, which represents the convolution of two functions in the frequency domain, i.e. $(F * G)(u, v) = \iint F(u', v')\,G(u - u', v - v')\,du'\,dv'$.

[0038] $(u, v)$: the frequency-domain coordinate variables. Explanation: they correspond to spatial frequency components; for example, $u$ indicates the frequency in the horizontal direction and $v$ indicates the frequency in the vertical direction.

[0039] In other words, in the frequency domain the image becomes the convolution of the original image spectrum and the mask spectrum. If the mask has white-noise characteristics, its spectrum is uniformly distributed, so it randomly diffuses the image's frequency-domain information and improves the coding effect. This also helps the subsequent sensing matrix to have full rank.

[0040] According to an embodiment of the present invention, the method for constructing the sensing matrix is as follows: let the real scene image be $X \in \mathbb{R}^{H \times W}$, where $H \times W$ is the size of the photosensitive surface, and let the random coding mask be $M \in \{0, 1\}^{H \times W}$. The mask modulates the light intensity multiplicatively; therefore, the image received by the sensor is
$$Y = X \odot M,$$
where $\odot$ denotes the Hadamard product. To obtain compressed information, $Y$ is spatially coded and sampled with a projection matrix $\Phi \in \mathbb{R}^{m \times n}$, with $n = HW$ and $m < n$, to achieve dimensionality-reduction sampling. Denoting the compressed measurement by $y \in \mathbb{R}^{m}$, then
$$y = \Phi\,\mathrm{vec}(Y).$$
Introducing the sensing matrix $A$, the formula is as follows:
$$A = \Phi\,\mathrm{diag}(\mathrm{vec}(M)), \qquad y = A\,\mathrm{vec}(X).$$

[0041] $\mathrm{vec}(\cdot)$: the vectorization operation, which flattens a matrix or high-dimensional tensor into a one-dimensional vector in a specific order (usually column-major).

[0042] According to an embodiment of the present invention, the network training and reconstruction process is as follows:
Data generation: extract training images $X$ from publicly available image datasets; randomly select a mask pattern $M$ and generate the compressed observation vectors $y$;
Neural network structure:
Input layer: the vector $y$ is mapped to an initial image estimate;
Encoding layer: multiple Transformer encoder layers model long-distance dependencies between image patches;
Decoding layer: upsamples to the target resolution using convolution and PixelShuffle;
Output layer: restores the image $\hat{X}$;
Construct the comprehensive loss function:
$$\mathcal{L} = \lambda_1 \|\hat{X} - X\|_2^2 + \lambda_2 \|\nabla \hat{X} - \nabla X\|_1 + \lambda_3 \left(1 - \mathrm{SSIM}(\hat{X}, X)\right),$$
where $\nabla$ (the gradient symbol) denotes the spatial gradient of the image (usually the differences in the horizontal and vertical directions), and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting coefficients that balance the importance of the terms in the loss function:
$\lambda_1$: weight of the pixel-level reconstruction error (L2 loss);
$\lambda_2$: weight of the gradient constraint term (L1 loss), used to preserve the edge structure of the image;
$\lambda_3$: weight of the Structural Similarity (SSIM) loss, used to improve the perceptual quality of the image.
By adjusting these coefficients, the model's performance on different tasks (such as detail restoration and edge enhancement) can be optimized.

[0043] Where: the first term is the pixel reconstruction error; the second term is the gradient constraint term, which preserves the edge structure; and the third term is the structural similarity index SSIM. After backpropagation training using the Adam optimizer, the network can be deployed on edge devices for fast decoding once it converges.
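The decoding layer described above upsamples with convolution and PixelShuffle. The rearrangement step alone can be sketched in NumPy as below, assuming the channel ordering used by common deep learning frameworks (each group of `upscale * upscale` channels fills one `upscale × upscale` block of output pixels). The function name and the channel-first layout are illustrative assumptions.

```python
import numpy as np

def pixel_shuffle(features, upscale):
    """Rearrange (C * r * r, H, W) features into (C, H * r, W * r)."""
    channels, height, width = features.shape
    out_channels = channels // (upscale * upscale)
    # Split the channel axis into (C, r, r), then interleave the two r axes
    # with the spatial axes: (C, r, r, H, W) -> (C, H, r, W, r).
    x = features.reshape(out_channels, upscale, upscale, height, width)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(out_channels, height * upscale, width * upscale)

features = np.arange(16).reshape(4, 2, 2)   # 4 channels of 2x2 features
upsampled = pixel_shuffle(features, 2)      # 1 channel of 4x4 pixels
```

Because the operation only moves values, it adds no parameters; the preceding convolution is what learns the sub-pixel content.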

[0045] It should be noted that the sparse modeling capability of the deep learning Transformer network is analyzed as follows. To further elaborate on the attention mechanism, let the input be a compressed image vector $z$; the output of its $i$-th attention head is:
$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i,$$
where
$$Q_i = z W_i^{Q}, \quad K_i = z W_i^{K}, \quad V_i = z W_i^{V}.$$
This mechanism can be viewed as a weighted basis expansion in a sparse transform, similar to sparse coding:
$$x = \sum_j \alpha_j d_j.$$
The two are essentially the same in form. That is, the Transformer (deep learning reconstruction network) automatically learns a set of sparse basis weights through the attention mechanism to achieve an optimal expansion of the compressed information.
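The analogy between one attention head and a weighted basis expansion can be made concrete. The sketch below assumes standard scaled dot-product attention with randomly initialized projection weights; the function and variable names are illustrative. It returns the attention weights alongside the output so that each output row can be seen as a weighted sum of value rows.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention_head(tokens, w_q, w_k, w_v):
    """Single scaled dot-product attention head over a set of tokens."""
    queries, keys, values = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(queries @ keys.T / np.sqrt(keys.shape[-1]))
    # Each output row is a weighted combination of the value rows.
    return weights @ values, weights, values
```

Note that softmax weights are non-negative and sum to one but are generally not exactly sparse, so the correspondence with sparse coding is one of form (a weighted expansion over learned atoms) rather than a guarantee of sparsity.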

[0046] According to an embodiment of the present invention, a multi-head attention mechanism is used to capture information from different subspaces of a two-dimensional feature map, and image resolution is restored based on the information from these different subspaces to output a reconstructed image. Specifically, this includes: Subspace partitioning is performed based on different feature dimensions of the two-dimensional feature map; Information is captured for each subspace based on a multi-head attention mechanism to obtain information for each subspace; All subspace information is spliced and fused to obtain the fused feature information; Image resolution is restored based on the fused feature information, and the reconstructed image is output.
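The four steps above (partition the feature dimensions into subspaces, attend within each subspace, splice the results, fuse them) can be sketched in NumPy. This is a minimal illustration, assuming a channel-first feature map flattened into one token per spatial position and a final linear fusion; the weight matrices are random placeholders, and the resolution-restoration step is not included.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def feature_map_to_tokens(feature_map):
    """Flatten a (channels, H, W) map into (H * W, channels) tokens."""
    channels = feature_map.shape[0]
    return feature_map.reshape(channels, -1).T

def multi_head_attention(tokens, num_heads, w_q, w_k, w_v, w_o):
    """Self-attention with the feature dimension split into num_heads subspaces."""
    dim = tokens.shape[1]
    head_dim = dim // num_heads
    queries, keys, values = tokens @ w_q, tokens @ w_k, tokens @ w_v
    heads = []
    for h in range(num_heads):
        sub = slice(h * head_dim, (h + 1) * head_dim)   # one feature subspace
        weights = softmax(queries[:, sub] @ keys[:, sub].T / np.sqrt(head_dim))
        heads.append(weights @ values[:, sub])
    spliced = np.concatenate(heads, axis=-1)            # splice all subspaces
    return spliced @ w_o                                # fuse with a linear map
```

In a full network the fused tokens would be reshaped back to a feature map and passed to the upsampling decoder to restore the image resolution.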

[0047] According to an embodiment of the present invention, the method further includes: constructing an end-to-end joint optimization objective, and uniformly modeling the entire imaging chain as a function mapping:
$$\hat{X} = f_\theta\big(\Phi\,\mathrm{diag}(\mathrm{vec}(M))\,\mathrm{vec}(X)\big),$$
where $\mathrm{diag}$ denotes a diagonal matrix. To improve system performance, the mask pattern can be jointly trained with the reconstruction network, i.e., a learnable mask can be introduced. The goal becomes:
$$\min_{\theta,\, M}\ \mathbb{E}_X\Big[\mathcal{L}\big(f_\theta(\Phi\,\mathrm{diag}(\mathrm{vec}(M))\,\mathrm{vec}(X)),\ X\big)\Big].$$
This constitutes a collaborative optimization framework for learnable masks and neural reconstructors.
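A toy version of the joint objective shows how gradients reach both the reconstructor and the mask. The sketch below makes several assumptions that are not in the application: the mask is relaxed to a sigmoid of real-valued logits so that it is differentiable, the reconstructor is a single linear layer standing in for the network, and the loss is the mean squared reconstruction error only. All names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss_and_grads(decoder, mask_logits, phi, images):
    """Loss and gradients for a linear decoder and a relaxed (sigmoid) mask.

    images: (batch, n) vectorized scenes; phi: (m, n); decoder: (n, m).
    """
    mask = sigmoid(mask_logits)                  # relaxed mask values in (0, 1)
    measurements = (images * mask) @ phi.T       # mask, then project
    recon = measurements @ decoder.T             # linear stand-in for the network
    residual = recon - images
    batch = images.shape[0]
    loss = np.sum(residual ** 2) / batch

    # Backpropagate through the chain by hand.
    grad_recon = 2.0 * residual / batch
    grad_decoder = grad_recon.T @ measurements
    grad_measurements = grad_recon @ decoder
    grad_masked = grad_measurements @ phi
    grad_mask = np.sum(grad_masked * images, axis=0)
    grad_logits = grad_mask * mask * (1.0 - mask)
    return loss, grad_decoder, grad_logits
```

A training loop would subtract a small multiple of `grad_decoder` and `grad_logits` from the respective parameters at each step; how the relaxed mask is turned back into a physical 0-1 pattern is left open here.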

[0048] According to embodiments of the present invention, application examples and system tests are as follows: This invention's system has extremely high practical value and broad market prospects in two typical scenarios: video surveillance under low-light conditions and high-resolution medical microscopic image reconstruction. The following sections will explain the experimental design, simulation verification, and result comparison.

[0049] Low-light nighttime monitoring applications are as follows: Scenario and Requirements: In nighttime environments or low-light areas (such as underground parking garages and field monitoring stations), traditional surveillance cameras are prone to severe problems such as blurriness, noise, and loss of structural information due to the low signal-to-noise ratio of their photosensitive elements. Compressed sensing lens systems, however, can acquire full image information through a single compressed sampling and decode the information at the backend, breaking through the limits of light sensitivity.

[0050] Experimental design: the brightness of the nighttime scene was set to only 5% of normal indoor lighting. The experiment used the following configuration: the raw photosensitive images were taken from real nighttime road surveillance footage; the random coded mask was a 0-1 matrix with 50% sparsity; the compression ratio was set to three values, representing strong, medium, and weak compression respectively; and the images were reconstructed using the Transformer network of this invention.

[0051] Baseline comparison methods include: Fourier-domain random sampling with the ISTA algorithm; TV-regularized compressed sensing reconstruction; and a deep compressed sensing (CS) reconstruction method based on U-Net.

[0052] Experimental results (evaluation metrics): peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Experiments show that, even under extreme compression, the system of this invention maintains a PSNR of 30.2 dB and an SSIM of 0.92, compared with an average PSNR of about 26 dB for the other methods. This demonstrates that it retains rich structural information even under extreme compression.
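For reference, PSNR is conventionally computed from the mean squared error as 10·log10(MAX² / MSE). The sketch below uses that standard definition; the function name and the `data_range` parameter are illustrative, and the patent's own formula is not reproduced in this text.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible pixel value."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For example, a constant error of 0.1 on images scaled to [0, 1] gives an MSE of 0.01 and therefore a PSNR of 20 dB.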

[0053] Applications of medical microscopic image reconstruction include: Application Background: In cytopathological imaging and blood section imaging, acquiring high-resolution microscopic images is often affected by imaging time, sample drift, and equipment limitations. Traditional methods require multiple scans, are time-consuming, and introduce interference. The lens system of this invention acquires compressed images through a single exposure, achieving sub-pixel level detail preservation after reconstruction, which has significant medical value.

[0054] Experimental setup: Image set used: cell images (512×512 grayscale) from the UCSB BioSeg database; Noise simulation: Poisson-Gaussian composite noise is added to simulate real imaging errors; Mask sampling rate: [value not reproduced]; Evaluation metrics: PSNR, SSIM, and an image gradient preservation metric [definition not reproduced]. Experimental results and analysis: in medical imaging, edges and tissue details are particularly critical. Experiments show that images reconstructed by the system of this invention exhibit high fidelity in cell nuclei and membrane structures, and that the system significantly outperforms the other methods, the best of which reaches only 0.88.
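The Poisson-Gaussian composite noise mentioned in the setup can be simulated along the following lines. This is a sketch under assumptions: images are scaled to [0, 1], and the `peak` photon count and `read_sigma` read-noise level are hypothetical parameters that the text does not specify.

```python
import numpy as np

def add_poisson_gaussian_noise(image, peak=100.0, read_sigma=0.01, rng=None):
    """Apply signal-dependent Poisson (shot) noise, then additive Gaussian (read) noise."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.clip(image, 0.0, 1.0)
    photons = rng.poisson(clean * peak)                  # shot noise scales with the signal
    noisy = photons / peak + rng.normal(0.0, read_sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0)
```

A lower `peak` models a dimmer exposure, where shot noise dominates; `read_sigma` models sensor read-out noise that is independent of the signal.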

[0055] Furthermore, since masks can be mass-produced in hardware, there is no need to repeatedly move the scanner, which greatly speeds up the image acquisition process.

[0056] In summary, this invention innovatively integrates coded aperture optical design with a Transformer sensing reconstruction network to achieve a compressed sensing lens system with high precision, low power consumption, and high robustness. Its main advantages are summarized as follows: Optical domain compression imaging: Physical compression is achieved using a programmable mask, which significantly reduces the amount of data acquired on the hardware side; End-to-end deep decoding: Information is restored through the Transformer network, improving image reconstruction efficiency and perceptual quality; Adaptable to various environments: compatible with complex environments such as low light, security monitoring, medical imaging, and edge device deployment; Highly scalable: The mask design can be optimized by a self-learning module to form an adaptive imaging system architecture; Model consistency closed loop: From physical optical path to neural network, a complete formula derivation chain is constructed, which has strong mathematical support and engineering feasibility.

[0057] This application can be further extended to multi-spectral imaging (such as infrared / visible light joint compression), dynamic video compression (using time masking for unfolding), and edge AI chip deployment, to realize a full-chain intelligent perception and imaging platform.

[0058] As shown in Figures 2-6, this embodiment of the application provides an image reconstruction system based on a coded aperture compressed sensing lens. The system includes a memory and a processor. The memory stores a program for the image reconstruction method based on a coded aperture compressed sensing lens which, when executed by the processor, implements the following steps: acquiring an original scene image, and projecting and modulating it with a coded aperture mask having a random pattern placed in the optical path to obtain the image received by the sensor; constructing a perception matrix and a projection matrix, and, based on the mask transfer function mapped by the perception matrix, spatially encoding and sampling the image received by the sensor through the projection matrix to obtain compressed measurement results; constructing a deep learning reconstruction network, and processing the compressed measurement results with it to obtain a two-dimensional feature map; and using a multi-head attention mechanism to capture information from different subspaces of the two-dimensional feature map, restoring the image resolution based on that information, and outputting the reconstructed image.

[0059] According to an embodiment of the present invention, the optical path is constructed as follows: Scene objective: the original three-dimensional scene is acquired and, for a given focal plane, projected onto the imaging plane to yield a two-dimensional projection; Coded mask: a mask pattern M, a random 0-1 distribution, is fixed to the front of the lens; Main lens imaging: the imaging light passes through the coded mask, enters the lens, and undergoes the imaging transformation, giving the intensity distribution after diffraction modulation; Photosensitive sensor: data are obtained by sampling on the imaging surface, converted from analog to digital, and sent to the data processing module. The quantities involved are the original image, the mask pattern M, the encoded image, and the compressed observation vector y.

[0060] According to an embodiment of the present invention, the transfer function of the coded aperture mask is modeled from the image and the mask pattern M as follows: the Fourier transform of the mask-modulated image equals the convolution of the Fourier transform of the image with the Fourier transform of the mask. That is, in the frequency domain, the encoded image becomes the convolution of the original image spectrum and the mask spectrum.
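The frequency-domain statement above is the convolution theorem for a pointwise product, and it can be checked numerically. The sketch below uses the discrete Fourier transform, for which the convolution is circular and carries a 1/(H·W) normalization under NumPy's FFT convention; the array sizes and helper name are illustrative.

```python
import numpy as np

def circular_convolve2d(a, b):
    """Direct 2D circular convolution: out[u, v] = sum over (p, q) of a[p, q] * b[(u-p)%H, (v-q)%W]."""
    h, w = a.shape
    p = np.arange(h)[:, None]
    q = np.arange(w)[None, :]
    out = np.zeros((h, w), dtype=complex)
    for u in range(h):
        for v in range(w):
            out[u, v] = np.sum(a * b[(u - p) % h, (v - q) % w])
    return out

rng = np.random.default_rng(0)
image = rng.random((4, 4))                               # stand-in for the scene intensity
mask = rng.integers(0, 2, size=(4, 4)).astype(float)     # random 0-1 mask

# Spectrum of the mask-modulated image ...
spectrum_of_product = np.fft.fft2(image * mask)
# ... equals the circular convolution of the two spectra, divided by the number of pixels.
convolved_spectra = circular_convolve2d(np.fft.fft2(image), np.fft.fft2(mask)) / image.size
```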

[0061] According to an embodiment of the present invention, the perception matrix is constructed as follows. Consider the real scene image, the size of the photosensitive surface, and the random coding mask M. The mask modulates the light intensity multiplicatively, so the image received by the sensor is the Hadamard (element-wise) product of the mask and the scene image. To obtain compressed information, the received image is spatially encoded and sampled with a projection matrix, which achieves dimensionality-reduction sampling and yields the compressed measurement y. Combining the mask modulation and the projection into one operator gives the perception matrix.
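The construction can be illustrated with a small numerical sketch: applying the mask and then the projection gives the same measurement as a single matrix acting on the vectorized scene. The Gaussian projection matrix `phi`, the sizes, and the name `A` for the combined matrix are assumptions for illustration; the text only states that the mask is a random 0-1 pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, m = 8, 8, 16                                   # 64 pixels compressed to 16 measurements
scene = rng.random((h, w))                           # stand-in for the real scene image
mask = rng.integers(0, 2, size=(h, w))               # random 0-1 coding mask
phi = rng.standard_normal((m, h * w)) / np.sqrt(m)   # assumed random projection matrix

# Two-step view: multiplicative (Hadamard) modulation, then projection of the vectorized image.
coded = mask * scene
y_two_step = phi @ coded.reshape(-1)

# One-step view: fold the mask into the projection as a diagonal matrix.
A = phi @ np.diag(mask.reshape(-1))
y_one_step = A @ scene.reshape(-1)
```

In practice `phi * mask.reshape(-1)` builds the same matrix without forming the diagonal explicitly; `np.diag` is used here to mirror the diagonal-matrix notation.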

[0062] According to an embodiment of the present invention, the network training and reconstruction process is as follows:

[0063] Data generation: training images are extracted from publicly available image datasets, and a mask pattern M is randomly selected to generate the compressed observation vector y. Neural network structure: Input layer: the vector y is mapped to an initial image estimate; Encoding layer: multiple Transformer encoder layers model long-distance dependencies between image patches; Decoding layer: upsamples to the target resolution using convolution and PixelShuffle; Output layer: outputs the restored image. A comprehensive loss function with three terms is then constructed: the first term is the pixel reconstruction error; the second term is the gradient constraint term, which preserves the edge structure; and the third term is the structural similarity (SSIM) term. The network is trained by backpropagation with the Adam optimizer and, once it has converged, can be deployed on edge devices for fast decoding.
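The PixelShuffle step in the decoding layer rearranges channels into spatial resolution. A NumPy sketch of that rearrangement is shown below, following the channel ordering used by PyTorch's `torch.nn.PixelShuffle`; the function name and the channels-first layout are choices made for this illustration.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r) by moving channel blocks into space."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    x = x.reshape(c_out, r, r, h, w)       # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)         # axes: (c, h, i, w, j)
    return x.reshape(c_out, h * r, w * r)  # output pixel (h*r + i, w*r + j)

features = np.arange(16).reshape(4, 2, 2)  # 4 channels of 2x2 features
upsampled = pixel_shuffle(features, 2)     # 1 channel of 4x4 pixels
```

Because the preceding convolution produces the r² sub-pixel channels, the upsampling itself involves no interpolation, only this reindexing.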

[0064] It should be noted that the sparse modeling capability of the deep learning Transformer network can be analyzed through its attention mechanism. Let the input be a compressed image vector. The attention output at each position is a weighted sum of value vectors, where the weights are computed from the similarity between queries and keys, and the queries, keys, and values are projections of the input. This mechanism can be viewed as a weighted basis expansion in a sparse transform, similar to sparse coding, in which a signal is expressed as a weighted combination of basis elements; the two are essentially the same in form. That is, the Transformer (the deep learning reconstruction network) automatically learns a set of sparse basis weights through the attention mechanism to achieve an optimal unfolding of the compressed information.

[0065] According to an embodiment of the present invention, a multi-head attention mechanism is used to capture information from different subspaces of the two-dimensional feature map, and the image resolution is restored from that information to output a reconstructed image. Specifically: the two-dimensional feature map is partitioned into subspaces along its feature dimensions; each attention head captures the information of one subspace; the information from all subspaces is concatenated and fused to obtain fused feature information; and the image resolution is restored from the fused feature information to output the reconstructed image.

[0066] According to an embodiment of the present invention, the method further includes constructing an end-to-end joint optimization objective in which the entire imaging chain, from mask modulation through projection to network reconstruction, is modeled uniformly as a single function mapping. To improve system performance, the mask pattern can be trained jointly with the reconstruction network, that is, a learnable mask can be introduced, and the objective is then optimized over both the mask and the network. This constitutes a collaborative optimization framework for learnable masks and neural reconstructors.

[0067] According to embodiments of the present invention, application examples and system tests are as follows: This invention's system has extremely high practical value and broad market prospects in two typical scenarios: video surveillance under low-light conditions and high-resolution medical microscopic image reconstruction. The following sections will explain the experimental design, simulation verification, and result comparison.

[0068] Low-light nighttime monitoring applications are as follows: Scenario and Requirements: In nighttime environments or low-light areas (such as underground parking garages and field monitoring stations), traditional surveillance cameras are prone to severe problems such as blurriness, noise, and loss of structural information due to the low signal-to-noise ratio of their photosensitive elements. Compressed sensing lens systems, however, can acquire full image information through a single compressed sampling and decode the information at the backend, breaking through the limits of light sensitivity.

[0069] Experimental design: the brightness of the nighttime scene was set to only 5% of normal indoor lighting. The experiment used the following configuration: the raw photosensitive images were taken from real nighttime road surveillance footage; the random coded mask was a 0-1 matrix with 50% sparsity; the compression ratio was set to three values, representing strong, medium, and weak compression respectively; and the images were reconstructed using the Transformer network of this invention.

[0070] Baseline comparison methods include: Fourier-domain random sampling with the ISTA algorithm; TV-regularized compressed sensing reconstruction; and a deep compressed sensing (CS) reconstruction method based on U-Net.

[0071] Experimental results (evaluation metrics): peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Experiments show that, even under extreme compression, the system of this invention maintains a PSNR of 30.2 dB and an SSIM of 0.92, compared with an average PSNR of about 26 dB for the other methods. This demonstrates that it retains rich structural information even under extreme compression.

[0072] Applications of medical microscopic image reconstruction include: Application Background: In cytopathological imaging and blood section imaging, acquiring high-resolution microscopic images is often affected by imaging time, sample drift, and equipment limitations. Traditional methods require multiple scans, are time-consuming, and introduce interference. The lens system of this invention acquires compressed images through a single exposure, achieving sub-pixel level detail preservation after reconstruction, which has significant medical value.

[0073] Experimental setup: Image set used: cell images (512×512 grayscale) from the UCSB BioSeg database; Noise simulation: Poisson-Gaussian composite noise is added to simulate real imaging errors; Mask sampling rate: [value not reproduced]; Evaluation metrics: PSNR, SSIM, and an image gradient preservation metric [definition not reproduced]. Experimental results and analysis: in medical imaging, edges and tissue details are particularly critical. Experiments show that images reconstructed by the system of this invention exhibit high fidelity in cell nuclei and membrane structures, and that the system significantly outperforms the other methods, the best of which reaches only 0.88.

[0074] Furthermore, since masks can be mass-produced in hardware, there is no need to repeatedly move the scanner, which greatly speeds up the image acquisition process.

[0075] In summary, this invention innovatively integrates coded aperture optical design with a Transformer sensing reconstruction network to achieve a compressed sensing lens system with high precision, low power consumption, and high robustness. Its main advantages are summarized as follows: Optical domain compression imaging: Physical compression is achieved using a programmable mask, which significantly reduces the amount of data acquired on the hardware side; End-to-end deep decoding: Information is restored through the Transformer network, improving image reconstruction efficiency and perceptual quality; Adaptable to various environments: compatible with complex environments such as low light, security monitoring, medical imaging, and edge device deployment; Highly scalable: The mask design can be optimized by a self-learning module to form an adaptive imaging system architecture; Model consistency closed loop: From physical optical path to neural network, a complete formula derivation chain is constructed, which has strong mathematical support and engineering feasibility.

[0076] This application can be further extended to multi-spectral imaging (such as infrared / visible light joint compression), dynamic video compression (using time masking for unfolding), and edge AI chip deployment, to realize a full-chain intelligent perception and imaging platform.

[0077] A third aspect of the present invention provides a computer-readable storage medium storing a program for the image reconstruction method based on a coded aperture compressed sensing lens. When executed by a processor, the program implements the steps of the image reconstruction method based on a coded aperture compressed sensing lens as described above.

[0078] This invention discloses an image reconstruction method, system, and medium based on a compressed sensing lens with coded aperture. The method involves acquiring an original scene image, projecting and modulating the original scene image using a coded aperture mask with a random pattern set in the optical path to obtain an image received by a sensor; constructing a perception matrix and a projection matrix; mapping the transfer function of the mask based on the perception matrix; and performing spatial coding sampling on the image received by the sensor using the projection matrix to obtain compressed measurement results; constructing a deep learning reconstruction network; processing the compressed measurement results using the deep learning reconstruction network to obtain a two-dimensional feature map; capturing different subspace information from the two-dimensional feature map using a multi-head attention mechanism; restoring image resolution based on the different subspace information; and outputting a reconstructed image. The method utilizes a mask to achieve physical compression, significantly reducing the amount of data acquisition; and uses a deep learning reconstruction network for information recovery, improving image reconstruction efficiency and perception quality, and adapting to various scene environments.

[0079] In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be through some interfaces, and the indirect coupling or communication connection between devices or units can be electrical, mechanical, or other forms.

[0080] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected to achieve the purpose of this embodiment according to actual needs.

[0081] In addition, in the various embodiments of the present invention, each functional unit can be integrated into one processing unit, or each unit can be a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in hardware or in the form of hardware plus software functional units.

[0082] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0083] Alternatively, if the integrated units of the present invention are implemented as software functional modules and sold or used as independent products, they can also be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, or the parts that contribute to the prior art, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROM, RAM, magnetic disks, or optical disks.

Claims

1. An image reconstruction method based on a coded aperture compressed sensing lens, characterized in that it includes: acquiring an original scene image, and projecting and modulating it with a coded aperture mask having a random pattern placed in the optical path to obtain the image received by the sensor; constructing a perception matrix and a projection matrix, and, based on the mask transfer function mapped by the perception matrix, spatially encoding and sampling the image received by the sensor through the projection matrix to obtain compressed measurement results; constructing a deep learning reconstruction network, and processing the compressed measurement results with it to obtain a two-dimensional feature map; and using a multi-head attention mechanism to capture information from different subspaces of the two-dimensional feature map, restoring the image resolution based on that information, and outputting the reconstructed image; wherein the optical path is constructed as follows: Scene objective: the original three-dimensional scene is acquired and, for a given focal plane, projected onto the imaging plane to yield a two-dimensional projection; Coded mask: a mask pattern M, a random 0-1 distribution, is fixed to the front of the lens; Main lens imaging: the imaging light passes through the coded mask, enters the lens, and undergoes the imaging transformation, giving the intensity distribution after diffraction modulation, expressed in the spatial coordinates of the sensor's imaging plane; Photosensitive sensor: data are obtained by sampling on the imaging surface, converted from analog to digital, and sent to the data processing module; the quantities involved being the original image, the mask pattern M, the encoded image, and the compressed observation vector y.

2. The image reconstruction method based on a coded aperture compressed sensing lens according to claim 1, characterized in that the transfer function of the coded aperture mask is modeled from the image and the mask pattern M as follows: the Fourier transform of the coded image received by the sensor equals the convolution of the Fourier transform of the original scene image with the Fourier transform of the mask pattern; where the image term is the intensity distribution of the original scene image in the spatial domain, the mask term is the pattern function of the coded aperture mask, the transform is the Fourier transform operator, the result is the frequency-domain representation of the coded image received by the sensor, the spectra are combined by the convolution operator, and the spectra are functions of the frequency-domain coordinate variables.

3. The image reconstruction method based on a coded aperture compressed sensing lens according to claim 2, characterized in that the perception matrix is constructed as follows: consider the real scene image, the size of the photosensitive surface, and the random coding mask M; the mask modulates the light intensity multiplicatively, so the image received by the sensor is the Hadamard product of the mask and the scene image, where the Hadamard product is the element-wise multiplication of two matrices of the same dimensions; to obtain compressed information, the received image is vectorized and spatially encoded and sampled with a projection matrix to achieve dimensionality-reduction sampling, yielding the compressed measurement y; and combining the mask modulation with the projection gives the overall perception matrix.

4. The image reconstruction method based on a coded aperture compressed sensing lens according to claim 3, characterized in that the network training and reconstruction process is as follows: Data generation: training images are extracted from publicly available image datasets, and a mask pattern M is randomly selected to generate the compressed observation vector y; Neural network structure: Input layer: the vector y is mapped to an initial image estimate; Encoding layer: multiple Transformer encoder layers model long-distance dependencies between image patches; Decoding layer: upsamples to the target resolution using convolution and PixelShuffle; Output layer: outputs the restored image; a comprehensive loss function is constructed as a weighted sum of three terms, where ∇ denotes the spatial gradient of the image and λ1, λ2, and λ3 are weighting coefficients: λ1 weights the pixel-level reconstruction error (L2 loss); λ2 weights the gradient constraint term (L1 loss), which preserves the edge structure of the image; and λ3 weights the structural similarity (SSIM) loss, which improves the perceptual quality of the image. The network is trained by backpropagation with the Adam optimizer and, once it has converged, can be deployed on edge devices for fast decoding.

5. The image reconstruction method based on a coded aperture compressed sensing lens according to claim 4, characterized in that a multi-head attention mechanism is used to capture information from different subspaces of the two-dimensional feature map, the image resolution is restored from that information, and a reconstructed image is output, specifically: the two-dimensional feature map is partitioned into subspaces along its feature dimensions; each attention head captures the information of one subspace; the information from all subspaces is concatenated and fused to obtain fused feature information; and the image resolution is restored from the fused feature information to output the reconstructed image.

6. An image reconstruction system based on a coded aperture compressed sensing lens, characterized in that the system includes a memory and a processor; the memory stores a program for the image reconstruction method based on a coded aperture compressed sensing lens which, when executed by the processor, performs the following steps: acquiring an original scene image, and projecting and modulating it with a coded aperture mask having a random pattern placed in the optical path to obtain the image received by the sensor; constructing a perception matrix and a projection matrix, and, based on the mask transfer function mapped by the perception matrix, spatially encoding and sampling the image received by the sensor through the projection matrix to obtain compressed measurement results; constructing a deep learning reconstruction network, and processing the compressed measurement results with it to obtain a two-dimensional feature map; and using a multi-head attention mechanism to capture information from different subspaces of the two-dimensional feature map, restoring the image resolution based on that information, and outputting the reconstructed image; wherein the optical path is constructed as follows: Scene objective: the original three-dimensional scene is acquired and, for a given focal plane, projected onto the imaging plane to yield a two-dimensional projection; Coded mask: a mask pattern M, a random 0-1 distribution, is fixed to the front of the lens; Main lens imaging: the imaging light passes through the coded mask, enters the lens, and undergoes the imaging transformation, giving the intensity distribution after diffraction modulation; Photosensitive sensor: data are obtained by sampling on the imaging surface, converted from analog to digital, and sent to the data processing module; the quantities involved being the original image, the mask pattern M, the encoded image, and the compressed observation vector y.

7. The image reconstruction system based on a coded aperture compressed sensing lens according to claim 6, characterized in that the transfer function of the coded aperture mask is modeled from the image and the mask pattern M as follows: the Fourier transform of the mask-modulated image equals the convolution of the Fourier transform of the image with the Fourier transform of the mask; that is, in the frequency domain, the encoded image becomes the convolution of the original image spectrum and the mask spectrum.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program for the image reconstruction method based on a coded aperture compressed sensing lens which, when executed by a processor, implements the steps of the image reconstruction method based on a coded aperture compressed sensing lens according to any one of claims 1 to 5.