A hyperspectral image reconstruction method based on prior images and an autoencoder model
By combining prior images and autoencoder models, a hyperspectral image reconstruction method has been developed, which solves the problems of slow reconstruction speed and insufficient spectral quality in hyperspectral imaging technology and achieves efficient and highly generalizable hyperspectral image reconstruction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUNAN UNIV
- Filing Date
- 2023-11-21
- Publication Date
- 2026-06-26
AI Technical Summary
Existing hyperspectral imaging technologies suffer from problems such as slow reconstruction speed, poor generalization, and insufficient spectral quality when reconstructing hyperspectral images. In particular, dual-camera coded aperture snapshot spectral imaging systems rely on manual prior information and have room for improvement in reconstruction performance.
A hyperspectral image reconstruction method based on prior images and an autoencoder model is adopted. The autoencoder model is trained, regularization and constraint terms are designed, and an iterative optimization algorithm then uses the spatial information of RGB images for efficient reconstruction.
It improves the spatial and spectral resolution of hyperspectral images, reduces noise and artifacts, has good generalization ability and interpretability, and the reconstruction effect is significantly better than existing methods.
Figure CN117392327B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of hyperspectral computational imaging, and in particular to a hyperspectral image reconstruction method based on prior images and autoencoder models.
Background Technology
[0002] Hyperspectral imaging technology has attracted much attention due to the rich spectral information it provides. However, its massive data volume and complex processing pose challenges for practical applications. To overcome these problems, snapshot compressed spectral imaging and coded aperture snapshot spectral imaging systems have emerged, acquiring rich spectral and spatial information from hyperspectral images through a single exposure. Among them, the coded aperture snapshot spectral compressed imaging system is widely used as a classic system with low cost and fast shooting speed.
[0003] After a hyperspectral compressed image is acquired, it must be decoded using an appropriate compressed sensing (CS) technique, which presents a series of challenges. Traditional optimization-based methods, such as iterative algorithms built on low-rank or total variation (TV) priors, can reconstruct the image but are slow. Deep learning-based methods show more potential and fall into three main groups. First, supervised networks, such as λ-net, use convolutional neural networks to achieve end-to-end spectral compressed imaging (SCI) sampling and reconstruction; however, this approach is only applicable to specific systems and generalizes poorly. Second, plug-and-play denoising networks, such as PnP-HSI, use a deep learning-based denoiser to replace the traditional TV denoising step; however, because the "noise" distribution in the SCI problem differs from that of the standard Gaussian denoising problem, reconstruction performance cannot be guaranteed. Third, self-supervised networks, such as PnP-DIP, reconstruct spectral images in a self-supervised manner; this approach generalizes well, but the reconstruction process for each scene is time-consuming. In conclusion, although various algorithms have been developed in the field of spectral compressed imaging to restore high-quality hyperspectral images, many challenges remain, and further research is required to achieve better results.
[0004] Dual-camera coded aperture snapshot spectral imaging (DC-CASSI) systems mitigate the ill-posedness of reconstruction by introducing an RGB camera to capture red, green, and blue (RGB) images of the scene as prior information. Although previous DC-CASSI methods have performed well in reconstruction, they often rely on handcrafted prior information and pay less attention to the spectral quality of the recovered hyperspectral images, which limits their application to some extent. At the same time, despite the achievements of these methods, there is still room to improve their reconstruction performance.
Summary of the Invention
[0005] To address the problem of how to efficiently utilize the spectral and spatial information of RGB images in a dual-camera coded aperture snapshot spectral imaging system to reconstruct high-quality hyperspectral images, this invention proposes a hyperspectral image reconstruction method based on prior images and an autoencoder model.
[0006] A hyperspectral image reconstruction method based on prior images and an autoencoder model includes the following steps:
[0007] S1. Preset an autoencoder model, train it, and calculate the loss using a preset loss function to obtain the trained autoencoder model;
[0008] S2. A dual-camera coded aperture snapshot spectral imaging system is used to capture images of the target scene, obtain RGB images and two-dimensional compressed measurement images, and preprocess the RGB images and two-dimensional compressed measurement images to obtain preprocessed RGB images and preprocessed two-dimensional compressed measurement images.
[0009] S3. Preset the hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system. The hardware coding model receives and processes the preprocessed two-dimensional compressed measurement image to obtain the initial three-dimensional data cube of the target scene.
[0010] S4. Using the preprocessed RGB image as the prior image, design a regularization term based on the semantic similarity of the prior image, construct a software decoding model based on the regularization term, and use the trained autoencoder model as a constraint term to obtain a software decoding model with constraint terms.
[0011] S5. The software decoding model with constraints receives the initial 3D data cube of the target scene and uses an optimization iterative algorithm to optimize and solve it, thereby obtaining the 3D data cube of the reconstructed target scene.
[0012] Preferably, S2 specifically includes:
[0013] S21. A dual-camera coded aperture snapshot spectral imaging system is used to capture images of the target scene, obtaining RGB images and two-dimensional compressed measurement images of the target scene;
[0014] S22. Cropping the RGB image and the two-dimensional compressed measurement image with a preset band range and band interval to obtain the preprocessed RGB image and the preprocessed two-dimensional compressed measurement image.
[0015] Preferably, in S3, the hardware coding model receives and processes the preprocessed two-dimensional compressed measurement image to obtain the initial three-dimensional data cube of the target scene, specifically including:
[0016] S31. The hardware coding model receives the preprocessed two-dimensional compressed measurement image and models the process of the dual-camera coded aperture snapshot spectral imaging system acquiring the two-dimensional compressed measurement image as follows:
[0017] y=(1-γ)Hx+n
[0018] Where y is the measured value of the preprocessed two-dimensional compressed measurement image, γ is the beam splitter splitting ratio, γ∈[0,1], H is the measurement matrix, x is the initial three-dimensional data cube of the target scene, and n is the noise in the process of acquiring the two-dimensional compressed measurement image.
[0019] S32. Solve the model for the two-dimensional compressed measurement image acquired by the dual-camera coded aperture snapshot spectral imaging system to obtain the initial three-dimensional data cube of the target scene.
[0020] Preferably, in S4, a regularization term based on prior image semantic similarity is designed, specifically including the following:
[0021] S41. Calculate the L1 norm of the total variation difference between the initial 3D data cube of the target scene and the preprocessed RGB image:
[0022] $R(x, y_{RGB}) = \| TV(x) - TV(y_{RGB}) \|_1$
[0023] Where $R(x, y_{RGB})$ denotes the regularization term for the initial 3D data cube of the target scene and the preprocessed RGB image, $x$ is the initial 3D data cube of the target scene, $y_{RGB}$ is the measured value of the preprocessed RGB image, TV(·) is the total variation norm, and $\|\cdot\|_1$ is the L1 norm;
[0024] S42. Take the upper bound of the L1 norm of the total variation difference to obtain the regularization term based on the prior image semantic similarity.
[0025] Preferably, in S4, a software decoding model is constructed based on the regularization term. Specifically, the software decoding model is as follows:
[0026] $\hat{x} = \arg\min_{x} \frac{1}{2} \| y - Hx \|_2^2 + \beta \, TV(x - y_{RGB})$
[0027] Where $\hat{x}$ is the reconstructed three-dimensional data cube of the target scene, $x$ is the initial three-dimensional data cube of the target scene, $y$ is the measurement value of the preprocessed two-dimensional compressed measurement image, β is the weighting parameter, $H$ is the measurement matrix, $TV(x - y_{RGB})$ is the regularization term, and $\|\cdot\|_2$ is the L2 norm.
[0028] Preferably, the software decoding model with constraint terms in S4 is specifically as follows:
[0029]
[0030] subject to $x = N_{AE}(y_{RGB})$
[0031] Where $P_{AE}$ denotes the parameters of the autoencoder model and $N_{AE}(\cdot)$ denotes the autoencoder model.
[0032] Preferably, S5 specifically includes the following:
[0033] S51. The software decoding model with constraints receives the initial three-dimensional data cube of the target scene;
[0034] S52. Transform the software decoding model with constraints into an augmented Lagrangian function;
[0035] S53. Simplify the augmented Lagrangian function to obtain the expression for the solution of the software decoding model with constraints;
[0036] S54. The expression for the solution is solved sequentially using the matrix inversion lemma, the Chambolle projection algorithm, and the trained autoencoder network, and the 3D data cube of the reconstructed target scene is output.
[0037] Preferably, the augmented Lagrangian function in S52 is specifically:
[0038]
[0039] Where $b$ and $d$ are auxiliary variables, $z_b$ and $z_d$ are Lagrange multipliers, $z'_b$ and $z'_d$ are the transposes of $z_b$ and $z_d$, $\lambda_b$ and $\lambda_d$ are weight parameters, and $AE(d)$ indicates that $d$ is the output of the autoencoder model after regularization.
[0040] Preferably, the autoencoder model in S1 specifically includes an encoder module and a decoder module connected to the encoder module. The encoder module includes several encoding layers connected in sequence, and the decoder module includes the same number of decoding layers, also connected in sequence. The encoding layers and the decoding layers are linked by skip connections.
[0041] Preferably, in S1, the autoencoder model is trained and the loss is calculated using a preset loss function, specifically:
[0042]
[0043] Where $\Gamma_{AE}$ denotes the loss of the autoencoder model $N_{AE}$, α is the weight parameter, SSTV(·) is the total variation norm, in the spatial and spectral directions, of the final output $N_{AE}(y_{RGB})$ of the autoencoder model, and $j$ is the iteration number.
[0044] The aforementioned hyperspectral image reconstruction method based on prior images and an autoencoder model first employs a dual-camera coded aperture snapshot spectral imaging system to capture the target scene, obtaining RGB images and two-dimensional compressed measurement images, which are then preprocessed to obtain preprocessed RGB images and two-dimensional compressed measurement images. Next, a hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system is designed. This hardware coding model receives the preprocessed two-dimensional compressed measurement images and models the process of the dual-camera coded aperture snapshot spectral imaging system acquiring the two-dimensional compressed measurement images, thereby obtaining an initial three-dimensional data cube of the target scene. Then, using the preprocessed RGB images as prior images, a regularization term based on the semantic similarity of the prior images is designed. A software decoding model is constructed based on the regularization term, and a pre-defined autoencoder model is used as a constraint term to obtain a software decoding model with constraints. Finally, an optimization iterative algorithm is used to solve the software decoding model with constraints, resulting in a three-dimensional data cube with relatively high overall spatial and spectral resolution for the reconstructed target scene. This method uses an autoencoder model as a constraint to improve spectral resolution. Through the synergistic effect of the prior image and the autoencoder model, it fully utilizes the full-resolution spatial information of the RGB image while avoiding spectral distortion caused by the relatively simpler spectral information of the RGB image compared to the hyperspectral image. Given a prior image as input, the autoencoder model's task is to generate a hyperspectral image while ensuring smoothness in both spatial and spectral dimensions. 
This means that the generated hyperspectral image not only matches the input prior image spectrally but also maintains spatial smoothness, which helps reduce noise and artifacts. Furthermore, in the process of solving the software decoding model with constraints using an iterative optimization algorithm, the autoencoder model is trained in a self-supervised manner in each iteration. This method has strong interpretability, good generalization ability, and good reconstruction results.
Attached Figure Description
[0045] Figure 1 is a flowchart of a hyperspectral image reconstruction method based on prior images and an autoencoder model in one embodiment of the present invention;
[0046] Figure 2 is a schematic diagram of the structure of a dual-camera coded aperture snapshot spectral imaging system according to an embodiment of the present invention;
[0047] Figure 3 is a schematic diagram of the structure of an autoencoder model in one embodiment of the present invention;
[0048] Explanation of reference numerals in the attached figures:
[0049] 1. Beam splitter; 2. Physical mask; 3. Dispersive prism; 4. Grayscale camera; 5. RGB camera.
Detailed Implementation
[0050] To enable those skilled in the art to better understand the technical solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings.
[0051] In one embodiment, referring to Figure 1 and Figure 2, Figure 1 is a flowchart of a hyperspectral image reconstruction method based on prior images and an autoencoder model according to an embodiment of the present invention, and Figure 2 is a schematic diagram of a dual-camera coded aperture snapshot spectral imaging system according to an embodiment of the present invention. A hyperspectral image reconstruction method based on prior images and an autoencoder model is provided, comprising the following:
[0052] S1. Preset an autoencoder model, train it, and calculate the loss using a preset loss function to obtain the trained autoencoder model;
[0053] S2. A dual-camera coded aperture snapshot spectral imaging system is used to capture images of the target scene, obtaining RGB images and two-dimensional compressed measurement images. The RGB images and two-dimensional compressed measurement images are preprocessed to obtain preprocessed RGB images and preprocessed two-dimensional compressed measurement images.
[0054] S3. Preset the hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system. The hardware coding model receives and processes the preprocessed two-dimensional compressed measurement image to obtain the initial three-dimensional data cube of the target scene.
[0055] S4. Using the preprocessed RGB image as the prior image, design a regularization term based on the semantic similarity of the prior image, construct a software decoding model based on the regularization term, and use the trained autoencoder model as a constraint term to obtain a software decoding model with constraint terms.
[0056] S5. The software decoding model with constraints receives the initial 3D data cube of the target scene and uses an optimization iterative algorithm to optimize and solve it, thereby obtaining the 3D data cube of the reconstructed target scene.
[0057] Specifically, the dual-camera coded aperture snapshot spectral imaging system used in this embodiment includes a beam splitter 1, an objective lens (not shown in Figure 2), a physical mask 2, a relay lens (not shown in Figure 2), a dispersive prism 3, a grayscale camera 4, and an RGB camera 5. Before the target scene is photographed, it is confirmed that the relevant components are correctly installed and aligned. The grayscale camera 4 and the RGB camera 5 are placed at a 90-degree angle to each other. Compared with the single-camera coded aperture snapshot spectral imaging system, the dual-camera system used in this invention has an additional beam splitter 1 and an uncoded RGB camera 5. The beam splitter 1 splits the incident light into two directions. One beam follows the path of the single-camera coded aperture snapshot spectral imaging system, which produces a grayscale two-dimensional compressed measurement image of the target scene. The other beam is directed to the uncoded RGB camera 5, which photographs the target scene to obtain an RGB image with three-channel spectral information and rich spatial information. The two-dimensional compressed measurement image and the RGB image are then preprocessed. Next, the hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system is designed, with a preset beam splitter splitting ratio γ, measurement matrix H, and dispersive prism offset step size s. The preprocessed two-dimensional compressed measurement image, together with the preset γ, H, and s, is input into the hardware coding model to obtain the initial three-dimensional data cube of the target scene. 
Then, a regularization term based on the semantic similarity of the prior image is designed, and an autoencoder model is designed as a constraint term. The regularization term and the constraint term together constitute the software decoding model. The software decoding model receives the initial three-dimensional data cube, and finally, the software decoding model is solved based on the traditional optimization iterative algorithm to obtain a reconstructed three-dimensional data cube with high overall spatial and spectral resolution.
[0058] In one embodiment, S2 specifically includes:
[0059] S21. A dual-camera coded aperture snapshot spectral imaging system is used to capture images of the target scene, obtaining RGB images and two-dimensional compressed measurement images of the target scene;
[0060] S22. Cropping the RGB image and the two-dimensional compressed measurement image with a preset band range and band interval to obtain the preprocessed RGB image and the preprocessed two-dimensional compressed measurement image.
[0061] Specifically, a dual-camera coded aperture snapshot spectral imaging system is used to capture the target scene. The incident light is first split into two different directions by a beam splitter. The light in one direction is captured by the RGB camera sensor to obtain an RGB image, while the light in the other direction is spatially encoded by a physical mask to obtain a spatially encoded 3D data cube. The spatially encoded 3D data cube is then spectrally encoded by a dispersive prism to obtain a spectrally encoded 3D data cube. Finally, the sensor of the grayscale camera integrates the spectrally encoded 3D data cube over wavelength to generate a two-dimensional compressed measurement image.
[0062] After the RGB image and the 2D compressed measurement image are obtained, the band range, band interval, image size, and so on are preset. The bands of the two images are then selected according to the preset band range and band interval, and the images are cropped to the preset size. For example, if the preset band range is 450 nm to 650 nm and the band interval is 10 nm, the data within 450 nm to 650 nm are selected, and interpolation is used to bring the bands of the two types of images onto a common 10 nm grid. The images are then cropped uniformly to 512×512 and converted to .mat format to obtain the preprocessed RGB image and 2D compressed measurement image, which facilitates the subsequent reconstruction of the 3D data cube.
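The band selection, interpolation, and cropping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the data are available as an (H, W, B) array with known, ascending band-center wavelengths that cover the preset range, and the function names `resample_bands` and `center_crop` are hypothetical.

```python
import numpy as np

def resample_bands(cube, wavelengths, lo=450.0, hi=650.0, step=10.0):
    """Linearly interpolate an (H, W, B) cube onto a uniform wavelength grid.

    Assumes `wavelengths` is ascending and covers [lo, hi] (an assumption,
    not something the source text specifies).
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    target = np.arange(lo, hi + step / 2, step)  # 450, 460, ..., 650
    # Index of the source band at or just below each target wavelength.
    idx = np.clip(np.searchsorted(wavelengths, target, side="right") - 1,
                  0, wavelengths.size - 2)
    # Linear interpolation weight between band idx and band idx + 1.
    w = (target - wavelengths[idx]) / (wavelengths[idx + 1] - wavelengths[idx])
    out = cube[..., idx] * (1.0 - w) + cube[..., idx + 1] * w
    return out, target

def center_crop(img, size=512):
    """Crop the central size x size window of an (H, W, ...) array."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```

With the example settings in the text (450 nm to 650 nm at 10 nm), the resampled cube has 21 bands. Writing the result to .mat format could be done with `scipy.io.savemat`, which is left out here to keep the sketch dependency-free.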
[0063] In one embodiment, the hardware coding model in S3 receives and processes the preprocessed two-dimensional compressed measurement image to obtain an initial three-dimensional data cube of the target scene, specifically including:
[0064] S31. The hardware coding model receives the preprocessed two-dimensional compressed measurement image and models the process of the dual-camera coded aperture snapshot spectral imaging system acquiring the two-dimensional compressed measurement image as follows:
[0065] y=(1-γ)Hx+n
[0066] Where y is the measured value of the preprocessed two-dimensional compressed measurement image, γ is the beam splitter splitting ratio, γ∈[0,1], H is the measurement matrix, x is the initial three-dimensional data cube of the target scene, and n is the noise in the process of acquiring the two-dimensional compressed measurement image.
[0067] S32. Solve the model for the two-dimensional compressed measurement image acquired by the dual-camera coded aperture snapshot spectral imaging system to obtain the initial three-dimensional data cube of the target scene.
[0068] In this embodiment, the beam splitter's splitting ratio γ, the measurement matrix H, and the dispersive prism offset step size s are preset. Specifically, the beam splitter's splitting ratio is set to 50:50 (different splitting ratios affect the quality of the acquired images), that is, the incident light is split into two beams of equal intensity travelling in different directions. The physical mask in the dual-camera coded aperture snapshot spectral imaging system is modeled as the measurement matrix H. The measurement matrix is set to a size of 512×512 and consists of randomly generated binary entries, with 0 and 1 each occurring with probability 50%. The dispersive prism offset step size is set to 2.
[0069] The process of spatially encoding the 3D data cube of the target scene using a physical mask to obtain the spatially encoded 3D data cube can be simulated by the Hadamard product, specifically as follows:
[0070] $\tilde{X}(:,:,a) = X(:,:,a) \odot F$
[0071] Where $X(:,:,a)$ is the a-th band image of the three-dimensional data cube of the target scene, $\tilde{X}(:,:,a)$ is the a-th band image of the spatially encoded 3D data cube, ⊙ is the Hadamard product operator, and $F$ is the physical mask;
[0072] To simulate the spectral dispersion effect of a dispersive prism, the pixels between adjacent bands of the spatially encoded 3D data cube are shifted to obtain a spectrally encoded data cube, which can be specifically represented as:
[0073]
[0074] Where (i, j) are the spatial coordinates of the i-th row and j-th column of the a-th band image of the spatially encoded 3D data cube, s is the offset step size of the dispersive prism, and the result is the spectrally encoded data cube.
[0075] Two-dimensional compressed measurement images are obtained by compressing the spectral encoded data cube.
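The three simulation steps above (mask coding, per-band shifting, and compression into a 2D measurement) can be sketched in NumPy. The sketch rests on several assumptions that the text does not fix: the dispersion shifts band a by a·s columns, the detector is widened to hold the shifted bands, compression is a plain sum over bands, and noise is omitted. The names `random_binary_mask` and `cassi_forward` are hypothetical.

```python
import numpy as np

def random_binary_mask(size=512, p=0.5, seed=0):
    """Random 0/1 mask where each entry is 1 with probability p."""
    rng = np.random.default_rng(seed)
    return (rng.random((size, size)) < p).astype(float)

def cassi_forward(cube, mask, step=2, gamma=0.5):
    """Noise-free coded-aperture measurement of an (H, W, B) cube.

    Assumption: band a is shifted by a * step columns before integration,
    so the output has shape (H, W + step * (B - 1)).
    """
    h, w, bands = cube.shape
    # Spatial coding: Hadamard product of every band with the mask.
    coded = cube * mask[:, :, None]
    # Spectral coding and compression: shift each band, then sum over bands.
    shifted_sum = np.zeros((h, w + step * (bands - 1)))
    for a in range(bands):
        shifted_sum[:, a * step:a * step + w] += coded[:, :, a]
    # Only the fraction (1 - gamma) of the light reaches the coded arm.
    return (1.0 - gamma) * shifted_sum
```

For a 512×512 cube with 21 bands and s = 2, this convention yields a 512×552 measurement; other conventions (for example, cropping back to 512×512 or shifting along rows) are equally plausible readings of the text.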
[0076] The hardware coding model is used to represent the process by which a dual-camera coded aperture snapshot spectral imaging system compresses a 3D data cube of a target scene into a 2D compressed measurement image. The process by which the dual-camera coded aperture snapshot spectral imaging system acquires the 2D compressed measurement image is modeled as follows:
[0077] y=(1-γ)Hx+n (3)
[0078] Where y is the measured value of the preprocessed two-dimensional compressed measurement image, H is the measurement matrix, n is the noise in the process of acquiring the two-dimensional compressed measurement image, γ is the beam splitter splitting ratio, γ∈[0,1], and x is the initial three-dimensional hyperspectral image of the target scene, i.e., the three-dimensional data cube.
[0079] The hardware coding model is also used to represent the process of a dual-camera coded aperture snapshot spectral imaging system capturing an RGB image of a target scene. The process of the dual-camera coded aperture snapshot spectral imaging system acquiring an RGB image is modeled as follows:
[0080] $y_{RGB} = \gamma H'_{RGB} x + \sigma$ (4)
[0081] Where $y_{RGB}$ is the measured value of the preprocessed RGB image, $H_{RGB}$ is the measurement matrix of the preprocessed RGB image, $H'_{RGB}$ is the transpose of $H_{RGB}$, and σ is the noise in the process of acquiring the RGB image.
[0082] To simplify the formulas, $y_{RGB}$ and $y$ are used in place of $y_{RGB}/\gamma$ and $y/(1-\gamma)$, respectively. The following takes $y = (1-\gamma)Hx + n$ as an example.
[0083] Ideally, there would be no noise, and formula (3) would become:
[0084] y=(1-γ)Hx
[0085] Placing the beam splitter's splitting ratio on the left-hand side of the equation, the above formula becomes:
[0086] y / (1-γ)=Hx
[0087] To simplify the formula, let's replace y / (1-γ) with y, and the above equation becomes:
[0088] y = Hx
[0089] At this point, y on the left side of the equation is the measurement value corresponding to the two-dimensional compressed measurement image obtained by the dual-camera coded aperture snapshot spectral imaging system after preprocessing, and Hx on the right side of the equation is the value of the two-dimensional compressed measurement image obtained by the hardware coding model through calculation.
[0090] In order to make the hardware coding model as close as possible to the dual-camera coded aperture snapshot spectral imaging system, it is necessary to minimize y-Hx so that y-Hx approaches 0.
[0091] After obtaining the preprocessed two-dimensional compressed measurement image, the initial three-dimensional data cube x of the target scene can be obtained according to the formula y=(1-γ)Hx+n. Then, the measured value $y_{RGB}$ of the preprocessed RGB image is obtained according to the formula $y_{RGB} = \gamma H'_{RGB} x + \sigma$.
[0092] In one embodiment, S4 designs a regularization term based on prior image semantic similarity, specifically including the following:
[0093] S41. Calculate the L1 norm of the total variation difference between the initial 3D data cube of the target scene and the preprocessed RGB image:
[0094] $R(x, y_{RGB}) = \| TV(x) - TV(y_{RGB}) \|_1$
[0095] Where $R(x, y_{RGB})$ denotes the regularization term for the initial 3D data cube of the target scene and the preprocessed RGB image, $x$ is the initial 3D data cube of the target scene, $y_{RGB}$ is the measured value of the preprocessed RGB image, TV(·) is the total variation norm, and $\|\cdot\|_1$ is the L1 norm;
[0096] S42. Take the upper bound of the L1 norm of the total variation difference to obtain the regularization term based on the prior image semantic similarity.
[0097] Specifically, the software decoding model represents a reconstruction algorithm that uses a preprocessed RGB image as a prior image to recover a 3D data cube from a preprocessed 2D compressed measurement image. First, a regularization term based on the semantic similarity of the prior image is designed. This regularization term is the L1 norm of the total variation difference between the initial 3D hyperspectral image of the target scene and the preprocessed RGB image. Its mathematical expression is:
[0098] $R(x, y_{RGB}) = \| TV(x) - TV(y_{RGB}) \|_1$ (5)
[0099] Where $R(x, y_{RGB})$ denotes the regularization term of the initial 3D hyperspectral image of the target scene and the preprocessed RGB image, TV(·) is the total variation norm, $\|\cdot\|_1$ is the L1 norm, $x$ is the initial 3D hyperspectral image of the target scene, i.e., the 3D data cube, and $y_{RGB}$ is the measured value of the preprocessed RGB image.
[0100] Since solving the above regularization term directly is quite complex, its upper bound is solved instead. It can be proven that the upper bound of the above regularization term is $\| TV(x - y_{RGB}) \|_1$; for details, refer to the method for solving the upper bound of the regularization term in invention patent application No. CN202211398389, entitled "A Reconstruction Method for a Snapshot-Type Spectral Imaging System Based on Prior Image Guidance".
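The upper bound can be sanity-checked numerically. The sketch below assumes TV(·) is the anisotropic total variation (the L1 norm of forward differences along the two spatial axes) and that both arrays have already been brought to the same shape; the name `tv_l1` and the random test arrays are illustrative only. Under that reading TV is a seminorm, so the reverse triangle inequality gives |TV(x) − TV(y_RGB)| ≤ TV(x − y_RGB).

```python
import numpy as np

def tv_l1(x):
    """Anisotropic TV: L1 norm of forward differences along axes 0 and 1."""
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

rng = np.random.default_rng(0)
x = rng.random((16, 16, 3))      # stand-in for the data cube (assumed shape)
y_rgb = rng.random((16, 16, 3))  # stand-in for the prior image, same shape

lhs = abs(tv_l1(x) - tv_l1(y_rgb))  # the original regularization term
rhs = tv_l1(x - y_rgb)              # its upper bound
print(f"|TV(x) - TV(y_rgb)| = {lhs:.4f} <= TV(x - y_rgb) = {rhs:.4f}")
```

The bound is convenient in practice because TV(x − y_RGB) is a standard TV term in the shifted variable x − y_RGB, which existing TV solvers can handle directly.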
[0101] In one embodiment, in step S4, a software decoding model is constructed based on the regularization term. The software decoding model is specifically as follows:
[0102] $\hat{x} = \arg\min_{x} \frac{1}{2} \| y - Hx \|_2^2 + \beta \, TV(x - y_{RGB})$
[0103] Where $\hat{x}$ is the reconstructed three-dimensional data cube of the target scene, $x$ is the initial three-dimensional data cube of the target scene, $y$ is the measurement value of the preprocessed two-dimensional compressed measurement image, $\|\cdot\|_2$ is the L2 norm, $\frac{1}{2}\| y - Hx \|_2^2$ is the fidelity term, β is the weighting parameter, $TV(x - y_{RGB})$ is the regularization term, $y_{RGB}$ is the measured value of the preprocessed RGB image, and $H$ is the measurement matrix.
[0104] Specifically, to make the hardware coding model match the dual-camera coded aperture snapshot spectral imaging system as closely as possible, y-Hx must be minimized so that it approaches 0. Note that TV(·) in the above formula denotes the upper bound of the regularization term, which is mathematically equivalent to $\| TV(\cdot) \|_1$; the notation TV(·) is used for readability.
[0105] In one embodiment, the software decoding model with constraints in S4 is specifically as follows:
[0106]
[0107] subject to $x = N_{AE}(y_{RGB})$
[0108] Where $P_{AE}$ denotes the parameters of the autoencoder model and $N_{AE}(\cdot)$ denotes the autoencoder model.
[0109] Specifically, the trained autoencoder model $N_{AE}(\cdot)$ serves as a constraint term in the software decoding model, thus yielding a software decoding model with constraint terms.
[0110] In one embodiment, S5 specifically includes the following:
[0111] S51. The software decoding model with constraints receives the initial three-dimensional data cube of the target scene;
[0112] S52. Transform the software decoding model with constraints into an augmented Lagrangian function;
[0113] S53. Simplify the augmented Lagrangian function to obtain the expression for the solution of the software decoding model with constraints;
[0114] S54. The expression for the solution is solved sequentially using the matrix inversion lemma, the Chambolle projection algorithm, and the trained autoencoder network, and the 3D data cube of the reconstructed target scene is output.
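Step S54 names the Chambolle projection algorithm as the solver for the total variation subproblem. The sketch below shows that algorithm in its textbook form for a single 2D image with isotropic TV; how the patent applies it to the data cube and to the shifted variable x − y_RGB is not specified here, so the function names, the default parameters, and the 2D setting are all assumptions.

```python
import numpy as np

def grad(u):
    """Forward differences, zero on the last row/column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d

def tv_iso(u):
    """Isotropic total variation of a 2D image."""
    gx, gy = grad(u)
    return np.sqrt(gx ** 2 + gy ** 2).sum()

def chambolle_tv_denoise(g, lam=0.1, n_iter=100, tau=0.125):
    """Approximately solve min_u ||u - g||^2 / (2 * lam) + TV(u).

    g must be a float 2D array; tau <= 1/8 is the step size for which
    Chambolle's fixed-point iteration is known to converge.
    """
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - g / lam)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        # Semi-implicit projected update of the dual variable p.
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return g - lam * div(px, py)
```

In an iterative scheme of the kind described in S51 to S54, a routine like this would typically be called once per outer iteration on each band, with `lam` tied to β and the penalty weight; that coupling is not shown because the source does not state it.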
[0115] In one embodiment, the augmented Lagrangian function in S52 is specifically:
[0116]
[0117] Where $b$ and $d$ are auxiliary variables, $z_b$ and $z_d$ are Lagrange multipliers, $z'_b$ and $z'_d$ are the transposes of $z_b$ and $z_d$, $\lambda_b$ and $\lambda_d$ are weight parameters, $AE(d)$ indicates that $d$ is the output of the autoencoder model after regularization, $P_{AE}$ denotes the parameters of the autoencoder model, and $N_{AE}(\cdot)$ denotes the autoencoder model.
[0118] Specifically, since directly solving the software decoding model (7) with constraints is too complicated, the software decoding model with constraints is constructed as an augmented Lagrangian function by introducing auxiliary variables and then solved.
[0119] By introducing auxiliary variables b and d, the software decoding model with constraints is transformed into:
[0120]
[0121] subject to $d = N_{AE}(y_{RGB})$, $x = b$, $x = d$
[0122] Based on this, the augmented Lagrange function is constructed as follows:
[0123]
[0124] where b and d are auxiliary variables, $z_b$ and $z_d$ are Lagrange multipliers, $z'_b$ and $z'_d$ are the transposes of $z_b$ and $z_d$, $\lambda_b$ and $\lambda_d$ are weight parameters, AE(d) indicates that the regularized output of the autoencoder model is d, $P_{AE}$ denotes the parameters of the autoencoder model, and $N_{AE}(\cdot)$ denotes the autoencoder model.
[0125] To simplify formula (8), the following two quadratic terms are constructed:
[0126]
[0127]
[0128] Substituting the two quadratic terms (9) and (10) into the augmented Lagrangian function (8) gives:
[0129]
[0130] Let $n_b = z_b / \lambda_b$ and $n_d = z_d / \lambda_d$; then formula (11) can be rewritten as:
[0131]
[0132] Let K be the number of iterations, then the expression for the solution of the software decoding model with constraints is obtained as follows:
[0133]
[0134] where $n_b = z_b / \lambda_b$, $n_d = z_d / \lambda_d$, K is the number of iterations, and k = 1, 2, ..., K.
[0135] Next, the expression for the solution, that is, each subproblem, is solved sequentially, as follows:
[0136] 1) Solve the subproblem of x
[0137]
[0138] Given $b^k$, $d^k$, and the scaled multipliers $n_b^k$ and $n_d^k$, the subproblem of x is quadratic and can be solved directly by setting its partial derivative to zero:
[0139]
[0140] Since HH' is a diagonal matrix, the inversion in formula (14) is simplified according to the matrix inversion lemma:
[0141] $(H'H + \lambda_b I + \lambda_d I)^{-1}$
[0142] $= (\lambda_b+\lambda_d)^{-1} I - (\lambda_b+\lambda_d)^{-1} H' [I + (\lambda_b+\lambda_d)^{-1} HH']^{-1} H (\lambda_b+\lambda_d)^{-1}$ (14-1)
[0143] where I is the identity matrix. The subproblem of x can now be optimized iteratively.
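The matrix inversion lemma used for the x subproblem can be verified numerically. The sketch below is an illustration under stated assumptions: `H` is a small random stand-in rather than a real coded-aperture measurement matrix, the sizes `m` and `n` are hypothetical, and `lam` plays the role of λ_b + λ_d. It compares the direct n×n inverse with the lemma form, which only needs an m×m inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 18                        # hypothetical sizes: m measurements, n unknowns
H = rng.standard_normal((m, n))     # stand-in measurement matrix (assumption)
lam = 0.7 + 0.3                     # plays the role of lambda_b + lambda_d

# Direct inverse of the n x n system matrix.
direct = np.linalg.inv(H.T @ H + lam * np.eye(n))

# Lemma form: only an m x m matrix is inverted.
inner = np.linalg.inv(np.eye(m) + (H @ H.T) / lam)
lemma = np.eye(n) / lam - (H.T @ inner @ H) / lam**2

print(np.allclose(direct, lemma))
```

When HH' is diagonal, the inner m×m inverse reduces to an element-wise division, which is what makes the x update inexpensive.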
[0144] 2) Solve the subproblems of b
[0145]
[0146] Given $x^{k+1}$, $n_b^k$, and $y_{RGB}$, the subproblem of b can be viewed as a total variation denoising problem. Let the auxiliary variable φ satisfy the following condition:
[0147] $\varphi^{k+1} = b^{k+1} - y_{RGB}$ (15)
[0148] Meanwhile, ψ is regarded as its noise and satisfies the following condition:
[0149]
[0150] Combining the above two conditions, the subproblem of b can be transformed into a subproblem of φ:
[0151]
[0152] The solution can be obtained by using the Chambolle projection algorithm:
[0153]
[0154] Among them, divΔ4 satisfies the following condition:
[0155]
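For orientation, a total variation denoising step of this kind can be sketched as follows. This is a minimal illustration that follows Chambolle's published projection iteration with step size 1/8 on a single two-dimensional band; it is not a reproduction of formulas (18) and (19), and the function names, the `weight` parameter, and the test image are assumptions introduced here.

```python
import numpy as np

def grad(u):
    """Forward differences with zero at the last row/column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

def chambolle_tv_denoise(g, weight, n_iter=100, tau=0.125):
    """Approximately minimise ||u - g||^2 / (2 * weight) + TV(u)."""
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - g / weight)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)   # projected dual update
        py = (py + tau * gy) / (1.0 + tau * norm)
    return g - weight * div(px, py)

def tv(u):
    """Isotropic total variation built from the same gradient."""
    gx, gy = grad(u)
    return np.sqrt(gx**2 + gy**2).sum()

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0                                  # piecewise-constant test image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = chambolle_tv_denoise(noisy, weight=0.1)
print(tv(noisy), tv(denoised))
```

In the b subproblem, the role of `g` would be played by ψ and the result by φ, after which b follows from formula (15).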
[0156] 3) Solve the subproblems of d
[0157]
[0158] Given $x^{k+1}$ and $n_d^k$, the subproblem of d can be solved iteratively by the trained autoencoder model, and $d^{k+1}$ is the final output of the autoencoder model:
[0159] $d^{k+1} = N_{AE}(y_{RGB})$ (20)
[0160] 4) Solve the subproblems of $n_b$ and $n_d$
[0161] The subproblems of $n_b$ and $n_d$ can both be solved by the dual ascent method.
[0162] In summary, the process of solving the software decoding model with constraints is as follows. First, based on H, y, and $y_{RGB}$ provided by the dual-camera coded aperture snapshot spectral imaging system, initialize x (the initial 3D data cube of the target scene output by the hardware coding model), b, d, $z_b$, and $z_d$. Then solve the subproblems iteratively in sequence: the subproblem of x, the subproblem of b, the subproblem of d, the subproblem of $n_b$, and the subproblem of $n_d$, until the maximum number of iterations K is reached. The result after the K-th iteration is used as the reconstructed 3D data cube of the target scene.
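The alternating structure summarized above (a closed-form update of x, an update of the auxiliary variable, then dual ascent on the scaled multiplier) can be illustrated on a much simpler problem. The sketch below is a toy analogue and not the method of this application: it uses a single auxiliary variable b, replaces the total variation denoiser and the autoencoder with an L1 soft-threshold, and uses a random stand-in for H. All names and parameter values are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_toy(H, y, beta=0.05, lam=10.0, n_iter=500):
    """Scaled-form ADMM for min 0.5*||y - Hx||^2 + beta*||b||_1  s.t.  x = b."""
    n = H.shape[1]
    x = np.zeros(n)
    b = np.zeros(n)
    n_b = np.zeros(n)                                 # scaled multiplier z_b / lam
    A_inv = np.linalg.inv(H.T @ H + lam * np.eye(n))  # fixed across iterations
    Hty = H.T @ y
    for _ in range(n_iter):
        x = A_inv @ (Hty + lam * (b - n_b))           # x subproblem: closed form
        b = soft_threshold(x + n_b, beta / lam)       # auxiliary-variable subproblem
        n_b = n_b + x - b                             # dual ascent on the multiplier
    return x, b

rng = np.random.default_rng(0)
H = rng.standard_normal((80, 20))                     # stand-in measurement matrix
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.0, -2.0, 1.5]
y = H @ x_true                                        # noise-free measurements
x_hat, b_hat = admm_toy(H, y)
print(np.linalg.norm(x_hat - b_hat), np.linalg.norm(x_hat - x_true))
```

In the full scheme, a second auxiliary variable d with its own multiplier would be updated in the same loop, with the trained autoencoder taking the place of the soft-threshold for d.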
[0163] In one embodiment, the autoencoder model in S1 specifically includes an encoder module and a decoder module connected to the encoder module. The encoder module includes several encoding layers connected in sequence, and the decoder module includes the same number of decoding layers connected in sequence. The encoding layers and the decoding layers are linked by skip connections.
[0164] In one embodiment, S1 involves training the autoencoder model and calculating the loss using a preset loss function, specifically:
[0165]
[0166] where $\Gamma_{AE}$ denotes the loss of the autoencoder model, α is the weight parameter, SSTV(·) is the total variation norm of the final output $N_{AE}(y_{RGB})$ of the autoencoder model in the spatial and spectral directions, and k is the iteration number.
[0167] Specifically, see Figure 3, which is a schematic diagram of the structure of an autoencoder model in one embodiment of the present invention.
[0168] As shown in Figure 3, the autoencoder model adopts a classic encoder-decoder structure, with the encoder module connected to the decoder module. The encoder module includes five sequentially connected encoding layers, and the decoder module includes five sequentially connected decoding layers. The five encoding layers and the five decoding layers are linked by skip connections (not shown in Figure 3). Each encoding or decoding layer includes a convolutional layer, a batch normalization function, and an activation function. The batch normalization function stabilizes the training process, and the activation function introduces non-linearity. The skip connections between the five encoding layers and the five decoding layers tightly couple the encoder and decoder modules.
[0169] In the construction of the autoencoder model, the number of output channels of each encoding or decoding layer is set to 64, which helps to extract rich feature information. The filter size of each encoding or decoding layer is set to 3×3, and this local receptive field helps to capture local features in the feature map obtained after convolution. In the encoder module, a stride of 2 and a padding size of 1 are used to reduce the size of the feature map. In the decoder module, a stride of 1 and a padding size of 1 are used so that the size of the output image (corresponding to the solution of the subproblem of d) matches the size of the input image (corresponding to the preprocessed RGB image, i.e., the prior image).
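The effect of these stride and padding settings on the feature-map size can be worked out with the standard convolution output-size formula. The sketch below is illustrative only: the 256-pixel input size is a hypothetical value, and the helper name `conv_out` is introduced here.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 256                       # hypothetical input height/width
encoder_sizes = []
for _ in range(5):               # five encoding layers: 3x3 filter, stride 2, padding 1
    size = conv_out(size, stride=2)
    encoder_sizes.append(size)

print(encoder_sizes)             # [128, 64, 32, 16, 8]
print(conv_out(64, stride=1))    # 64: a 3x3 filter with stride 1 and padding 1 keeps the size
```

Each stride-2 layer halves the spatial size, while a stride-1 layer with padding 1 preserves it. How the decoder returns from the reduced size to the input resolution (for example, by upsampling) is not stated in the text above.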
[0170] Train the autoencoder model and calculate the loss using a predefined loss function:
[0171]
[0172] where $\Gamma_{AE}$ denotes the loss of the autoencoder model, α is the weight parameter, SSTV(·) is the total variation norm of the final output $N_{AE}(y_{RGB})$ of the autoencoder model in the spatial and spectral directions, and k is the iteration number.
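A spatial-spectral total variation term of this kind can be sketched as below. This is only one possible reading: it assumes the cube is stored as (height, width, bands), uses an anisotropic form (sum of absolute first differences), and weights the three directions equally. The exact SSTV definition and the remaining terms of the loss are not reproduced here, and the function name `sstv` is an assumption.

```python
import numpy as np

def sstv(cube):
    """Sum of absolute first differences along height, width, and band axes."""
    return sum(np.abs(np.diff(cube, axis=a)).sum() for a in range(3))

flat = np.ones((4, 4, 3))        # constant cube: no variation in any direction
step = np.zeros((4, 4, 3))
step[:, :, 1:] = 1.0             # one unit jump between band 0 and band 1

print(sstv(flat), sstv(step))
```

The constant cube gives zero, and the cube with a single spectral step gives one unit per pixel, which is why penalizing this quantity favors outputs that are smooth in both the spatial and the spectral dimensions.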
[0173] Adam is chosen as the optimizer of the autoencoder model, with a learning rate of 0.001 and 800 epochs, so that the autoencoder model has sufficient time to learn and extract high-level features from the preprocessed RGB image and thereby obtain the solution of the subproblem of d (before the subproblem of d is solved, the solution $x^{k+1}$ of the subproblem of x and the solution of the subproblem of b have already been obtained). After the solution of the subproblem of d is obtained, the subproblems of $n_b$ and $n_d$ are solved in sequence, and the iteration continues until the maximum number of iterations is reached, yielding the reconstructed three-dimensional data cube of the target scene (the solution of the subproblem of x is the final solution of the entire algorithm, that is, the reconstructed three-dimensional data cube of the target scene).
[0174] Furthermore, to verify the effectiveness of the technical solution of the present invention, this embodiment conducts a simulation experiment and comparative analysis on a dual-camera coded aperture snapshot spectral imaging system.
[0175] This experiment selects 10 scenes from the Generalized Assorted Pixel Camera (CAVE) dataset and compares the method proposed in this invention with existing state-of-the-art model-based methods. These methods mainly include: GAP-TV, DeSCI, Plug-and-Play Hyperspectral Image Denoising (PnP-HSI), Plug-and-Play Deep Image Prior (PnP-DIP), TV-RGB, PFusion, and the Prior Image Semantic Similarity Model (PIDS). The evaluation metrics are Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Spectral Angle Mapper (SAM), which measure the quality and feature similarity of the reconstructed images and together give a comprehensive evaluation of the reconstruction. See Table 1, which lists the average quality and feature-similarity results of the images reconstructed by the different methods over the 10 test scenes of the CAVE dataset.
[0176] Table 1
[0177]
Method                          PSNR (dB)    SSIM         SAM
GAP-TV                          27.65±3.9    0.809±0.14   0.121±0.05
DeSCI                           29.58±6.3    0.890±0.11   0.127±0.06
PnP-HSI                         29.83±4.4    0.850±0.12   0.113±0.04
PnP-DIP                         28.09±3.0    0.775±0.11   0.174±0.09
TV-RGB                          30.26±4.7    0.867±0.13   0.097±0.04
PFusion                         38.42±4.0    0.975±0.02   0.074±0.03
PIDS                            41.71±4.5    0.984±0.02   0.068±0.03
The method in this application  43.51±3.5    0.992±0.01   0.057±0.03
[0178] Table 1 reports the mean and standard deviation of PSNR, SSIM, and SAM for each method over the ten test scenes. For example, the PSNR of GAP-TV is 27.65 ± 3.9, that is, a mean of 27.65 and a standard deviation of 3.9. As can be seen from Table 1, the method in this application achieves the best average PSNR, SSIM, and SAM. Specifically, compared with the best existing method (PIDS), the method in this application improves the average PSNR by 1.8 dB, improves the average SSIM by 0.008, and reduces the average SAM by 0.011.
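The gaps relative to PIDS can be recomputed directly from the mean values listed in Table 1. The short sketch below copies only those table entries; the dictionary names are illustrative.

```python
# Mean values copied from Table 1.
pids = {"PSNR": 41.71, "SSIM": 0.984, "SAM": 0.068}
ours = {"PSNR": 43.51, "SSIM": 0.992, "SAM": 0.057}

gaps = {
    "PSNR": round(ours["PSNR"] - pids["PSNR"], 3),   # higher is better
    "SSIM": round(ours["SSIM"] - pids["SSIM"], 3),   # higher is better
    "SAM": round(pids["SAM"] - ours["SAM"], 3),      # lower is better
}
print(gaps)   # {'PSNR': 1.8, 'SSIM': 0.008, 'SAM': 0.011}
```

Since SAM measures a spectral angle, a smaller value indicates closer spectral agreement, so the SAM gap is taken in the opposite direction to the other two.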
[0179] The aforementioned hyperspectral image reconstruction method based on prior images and an autoencoder model first employs a dual-camera coded aperture snapshot spectral imaging system to capture the target scene, obtaining RGB images and two-dimensional compressed measurement images, which are then preprocessed to obtain preprocessed RGB images and two-dimensional compressed measurement images. Next, a hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system is designed. This hardware coding model receives the preprocessed two-dimensional compressed measurement images and models the process of the dual-camera coded aperture snapshot spectral imaging system acquiring the two-dimensional compressed measurement images, thereby obtaining an initial three-dimensional data cube of the target scene. Then, using the preprocessed RGB images as prior images, a regularization term based on the semantic similarity of the prior images is designed. A software decoding model is constructed based on the regularization term, and a pre-defined autoencoder model is used as a constraint term to obtain a software decoding model with constraints. Finally, an optimization iterative algorithm is used to solve the software decoding model with constraints, resulting in a three-dimensional data cube with relatively high overall spatial and spectral resolution for the reconstructed target scene. This method uses an autoencoder model as a constraint to improve spectral resolution. Through the synergistic effect of the prior image and the autoencoder model, it fully utilizes the full-resolution spatial information of the RGB image while avoiding spectral distortion caused by the relatively simpler spectral information of the RGB image compared to the hyperspectral image. Given a prior image as input, the autoencoder model's task is to generate a hyperspectral image while ensuring smoothness in both spatial and spectral dimensions. 
This means that the generated hyperspectral image not only matches the input prior image spectrally but also maintains spatial smoothness, which helps reduce noise and artifacts. Furthermore, in the process of solving the software decoding model with constraints using an iterative optimization algorithm, the autoencoder model is trained in a self-supervised manner in each iteration. This method has strong interpretability, good generalization ability, and good reconstruction results.
[0180] The above provides a detailed description of a hyperspectral image reconstruction method based on prior images and an autoencoder model provided by this invention. Specific examples have been used to illustrate the principles and implementation methods of this invention. The descriptions of the embodiments above are merely for the purpose of helping to understand the core ideas of this invention. It should be noted that those skilled in the art can make various improvements and modifications to this invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this invention.
Claims
1. A hyperspectral image reconstruction method based on prior images and an autoencoder model, characterized in that, The method includes: S1. Preset an autoencoder model, train the autoencoder model and calculate the loss using a preset loss function to obtain the trained autoencoder model; S2. The target scene is captured by a dual-camera coded aperture snapshot spectral imaging system to obtain RGB images and two-dimensional compressed measurement images. The RGB images and two-dimensional compressed measurement images are preprocessed to obtain preprocessed RGB images and preprocessed two-dimensional compressed measurement images. S3. Preset the hardware coding model corresponding to the dual-camera coded aperture snapshot spectral imaging system. The hardware coding model receives and processes the preprocessed two-dimensional compressed measurement image to obtain the initial three-dimensional data cube of the target scene. S4. Using the preprocessed RGB image as a prior image, design a regularization term based on the semantic similarity of the prior image, construct a software decoding model according to the regularization term, and use the trained autoencoder model as a constraint term to obtain a software decoding model with constraint terms. S5. The software decoding model with constraints receives the initial three-dimensional data cube of the target scene and uses an optimization iterative algorithm to optimize and solve it, thereby obtaining the three-dimensional data cube of the reconstructed target scene.
2. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 1, characterized in that, S2 specifically includes: S21. A dual-camera coded aperture snapshot spectral imaging system is used to capture images of the target scene, obtaining RGB images and two-dimensional compressed measurement images of the target scene; S22. The RGB image and the two-dimensional compressed measurement image are cropped according to a preset band range and band interval to obtain the preprocessed RGB image and the preprocessed two-dimensional compressed measurement image.
3. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 2, characterized in that, The hardware coding model described in S3 receives and processes the preprocessed two-dimensional compressed measurement image to obtain the initial three-dimensional data cube of the target scene, specifically including: S31. The hardware coding model receives the preprocessed two-dimensional compressed measurement image and models the process of the dual-camera coded aperture snapshot spectral imaging system acquiring the two-dimensional compressed measurement image as follows: $y = (1-\gamma)Hx + n$, where y is the measurement value of the preprocessed two-dimensional compressed measurement image, γ is the beam splitter splitting ratio, $\gamma \in [0,1]$, H is the measurement matrix, x is the initial three-dimensional data cube of the target scene, and n is the noise in the process of acquiring the two-dimensional compressed measurement image; S32. Solve the model of the two-dimensional compressed measurement image obtained by the dual-camera coded aperture snapshot spectral imaging system to obtain the initial three-dimensional data cube of the target scene.
4. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 3, characterized in that, The design described in S4 of a regularization term based on prior image semantic similarity specifically includes the following: S41. Calculate the L1 norm of the total variation difference between the initial 3D data cube of the target scene and the preprocessed RGB image: $R(x, y_{RGB}) = \|TV(x) - TV(y_{RGB})\|$, where $R(x, y_{RGB})$ denotes the regularization term between the initial 3D data cube of the target scene and the preprocessed RGB image, x is the initial 3D data cube of the target scene, $y_{RGB}$ is the measurement value of the preprocessed RGB image, TV(·) is the total variation norm, and $\|\cdot\|$ is the L1 norm; S42. Take the upper bound of the L1 norm of the total variation difference to obtain the regularization term based on prior image semantic similarity.
5. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 4, characterized in that, In S4, a software decoding model is constructed based on the regularization term. Specifically, the software decoding model is as follows: where the left-hand side denotes the three-dimensional data cube reconstructed for the target scene, x is the initial three-dimensional data cube of the target scene, y is the measurement value of the preprocessed two-dimensional compressed measurement image, β is the weighting parameter, H is the measurement matrix, $TV(x - y_{RGB})$ is the regularization term, and $\|\cdot\|_2$ denotes the L2 norm.
6. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 5, characterized in that, The software decoding model with constraints described in S4 is specifically as follows: subject to $x = N_{AE}(y_{RGB})$, where $P_{AE}$ denotes the parameters of the autoencoder model and $N_{AE}(\cdot)$ denotes the autoencoder model.
7. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 6, characterized in that, S5 specifically includes the following: S51. The software decoding model with constraints receives the initial three-dimensional data cube of the target scene; S52. Transform the software decoding model with constraints into an augmented Lagrangian function; S53. Simplify the augmented Lagrangian function to obtain the expression for the solution of the software decoding model with constraints; S54. The expression of the solution is solved sequentially using the matrix inversion lemma, the Chambolle projection algorithm, and the trained autoencoder network, and the three-dimensional data cube of the reconstructed target scene is output.
8. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 7, characterized in that, The augmented Lagrangian function described in S52 is specifically: where b and d are auxiliary variables, $z_b$ and $z_d$ are Lagrange multipliers, $z'_b$ and $z'_d$ are the transposes of $z_b$ and $z_d$, $\lambda_b$ and $\lambda_d$ are weight parameters, and AE(d) indicates that the regularized output of the autoencoder model is d.
9. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 1, characterized in that, The autoencoder model described in S1 specifically includes an encoder module and a decoder module connected to the encoder module. The encoder module includes several encoding layers connected in sequence, and the decoder module includes the same number of decoding layers connected in sequence. The encoding layers and the decoding layers are linked by skip connections.
10. The hyperspectral image reconstruction method based on prior images and autoencoder models as described in claim 1, characterized in that, In S1, the autoencoder model is trained and the loss is calculated using a preset loss function, which is specifically: where $\Gamma_{AE}$ denotes the loss of the autoencoder model, α is the weight parameter, SSTV(·) is the total variation norm of the final output $N_{AE}(y_{RGB})$ of the autoencoder model in the spatial and spectral directions, and k is the iteration number.