Hyperspectral heterogeneous endmember unmixing method based on variational autoencoding and diffusion model

By combining variational autoencoders with diffusion models, the problems of endmember heterogeneity and noise interference in hyperspectral unmixing are solved, achieving accurate characterization of endmember distribution features and effective noise removal, thus improving unmixing accuracy and robustness.

CN122200331A · Pending · Publication Date: 2026-06-12 · NINGBO UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NINGBO UNIV
Filing Date
2026-03-05
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing hyperspectral unmixing methods suffer from insufficient unmixing accuracy and robustness when faced with endmember heterogeneity and noise interference, making it difficult to accurately characterize endmember distribution features and effectively remove noise interference.

Method used

A variational autoencoder and diffusion model is adopted. Spatial-spectral joint feature extraction is performed through a cross-branch cross-sharing module, semantic modeling is performed through a multi-scale feature modeling module, noise processing is performed through a lightweight diffusion denoising module, and abundance map optimization is performed through a gated residual module, finally generating an optimized abundance map.

🎯Benefits of technology

It effectively simulates the intrinsic variability of endmembers, enhances the interpretability and stability of abundance maps, and improves the accuracy and robustness of hyperspectral image unmixing.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122200331A_ABST

Abstract

The present application relates to a hyperspectral heterogeneous endmember unmixing method based on variational autoencoding and diffusion model, comprising: constructing a cross-branch cross-sharing module that performs spatial-spectral joint feature extraction on the input hyperspectral image and generates a first feature representation for endmember modeling and a second feature representation for abundance estimation; constructing a multi-scale feature modeling module that outputs the distribution parameters of the endmember latent variables and a preliminary abundance map; constructing a lightweight diffusion denoising module to generate a denoised abundance map; constructing a gated residual module to generate an optimized abundance map; and constructing a decoder that reconstructs the hyperspectral image according to the linear mixing model combined with the optimized abundance map. The beneficial effects of the present application are: the application models the distribution of endmembers through the latent variables of a variational autoencoder and describes how endmembers vary under nonlinear mixing conditions, and the latent variables can be interpreted as the endmember heterogeneity caused by factors such as illumination and environmental disturbance.

Description

Technical Field

[0001] This invention relates to the field of remote sensing image processing and, more specifically, to a hyperspectral heterogeneous endmember unmixing method based on variational autoencoders and diffusion models.

Background Technology

[0002] Hyperspectral imaging technology plays a crucial role in environmental monitoring and precision agriculture by acquiring continuous and detailed spectral features of target areas. However, due to the relatively low spatial resolution of sensors, hyperspectral images contain many mixed pixels. Constrained by the mixed pixel effect, the actual acquired spectral signal is essentially a linear or nonlinear combination of multiple endmembers, and the spectral features of these endmembers are easily affected by environmental factors such as illumination conditions, atmospheric disturbances, and phenological changes, resulting in significant variations. Existing mixed pixel decomposition methods can be categorized into traditional linear models, physically corrected models, and deep learning methods.

[0003] Linear mixing models (LMMs) are the simplest and most widely used unmixing models. They assume that each mixed pixel is a convex combination of endmembers, with the abundances as coefficients. However, this assumption cannot represent endmember heterogeneity caused by conditions such as illumination and climate. To overcome the limitations of the standard LMM, subsequent studies have proposed a series of extended models, such as the Hapke model and perturbed linear mixing models (PLMMs). The Hapke model introduces a bidirectional reflectance distribution function (BRDF) to correct for geometric-optical effects; although it accounts for optical and geometric parameters to address endmember heterogeneity, it requires a large number of parameters that are difficult to handle. The PLMM adds perturbation terms to the endmember matrix. While these terms can compensate for errors in the LMM, the perturbation model lacks a physical interpretation and struggles to express large-scale endmember variability caused by changes in illumination. In short, endmember heterogeneity makes hyperspectral unmixing considerably more difficult.
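As a minimal illustration of the two models compared above, the following Python sketch (NumPy; the function names and toy spectra are hypothetical, not taken from the source) computes a mixed pixel under the LMM, where the pixel equals the endmember matrix times an abundance vector on the probability simplex, and under a PLMM, where a perturbation is added to the endmember matrix before mixing.

```python
import numpy as np


def lmm(endmembers: np.ndarray, abundances: np.ndarray) -> np.ndarray:
    """Linear mixing model: pixel = E @ a, with a >= 0 and sum(a) == 1.

    endmembers: (bands, p) matrix whose columns are endmember spectra.
    abundances: (p,) abundance vector for one pixel.
    """
    a = np.asarray(abundances, dtype=float)
    # Abundance non-negativity (ANC) and sum-to-one (ASC) make the mixture convex.
    if np.any(a < 0) or not np.isclose(a.sum(), 1.0):
        raise ValueError("abundances must lie on the probability simplex")
    return endmembers @ a


def plmm(endmembers: np.ndarray, perturbation: np.ndarray, abundances: np.ndarray) -> np.ndarray:
    """Perturbed LMM: the endmember matrix is replaced by E + dE before mixing."""
    return lmm(endmembers + perturbation, abundances)


# Toy example: 3 bands, 2 endmembers (illustrative values only).
E = np.array([[0.2, 0.6],
              [0.4, 0.8],
              [0.1, 0.9]])
a = np.array([0.25, 0.75])
x = lmm(E, a)                                      # [0.5, 0.7, 0.7]
x_perturbed = plmm(E, np.full_like(E, 0.05), a)    # every band shifted by 0.05
```

Because the perturbation dE is a free additive correction with no physical parameters behind it, a constant shift like the one above is as admissible as an illumination-driven change, which reflects the interpretability limitation noted for PLMMs.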

[0004] In recent years, deep learning has been widely applied to hyperspectral unmixing, demonstrating powerful feature learning and nonlinear mapping capabilities. For example, convolutional neural networks can effectively extract the spatial-spectral features of pixels; generative adversarial networks can simulate complex data distributions and generate more realistic mixed pixels; and Gaussian mixture models provide a framework for describing the statistical properties of endmembers or abundances. While these methods have advanced the field, they also have clear limitations. First, most of them rely on the implicit premise that each land-cover category corresponds to a single fixed "pure" endmember. This premise cannot capture the spectral fluctuations of endmembers caused by factors such as illumination, material, and particle size in real environments, so when endmember heterogeneity occurs, as it commonly does, their unmixing accuracy and robustness are severely limited. Second, existing deep learning unmixing models often lack explicit noise modeling and robust design when generating abundance maps. Hyperspectral images are inevitably contaminated by various types of noise during acquisition, yet current networks usually treat the data as an ideal "clean" signal. As a result, abundance estimates degrade significantly under noise, and generalization ability is insufficient.

[0005] In summary, while deep learning provides a powerful nonlinear modeling tool for hyperspectral unmixing, it faces significant challenges in two key areas: "endmember heterogeneity modeling" and "noise robustness." Therefore, future research should focus on developing next-generation intelligent unmixing models that can simultaneously perceive both the intrinsic variability of endmembers and external noise interference, thereby achieving more accurate and stable spectral unmixing for real-world complex scenarios.

[0006] Several key issues remain to be addressed in current spectral unmixing research. 1) First, how to construct a suitable model to simulate endmember heterogeneity caused by physical conditions such as atmosphere and illumination, thereby reducing the impact of endmember heterogeneity on the accuracy of hyperspectral unmixing. 2) Second, it is necessary to develop methods that can accurately characterize endmember distribution features, and on this basis, construct an unmixing framework that can alleviate endmember heterogeneity. 3) Furthermore, in the process of generating abundance maps, effectively removing noise interference and enhancing the representation quality and stability of the abundance maps are also crucial steps that cannot be ignored, as their effectiveness directly affects the reliability and practicality of the final unmixing results.

Summary of the Invention

[0007] The purpose of this invention is to overcome the shortcomings of the prior art by providing a hyperspectral heterogeneous endmember unmixing method based on a variational autoencoder and a diffusion model.

[0008] Firstly, a hyperspectral heterogeneous endmember unmixing method based on variational autoencoders and diffusion models is provided, including:

[0009] Step 1: Construct a cross-branch cross-sharing module to extract joint spatial and spectral features from the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation;

[0010] Step 2: Construct a multi-scale feature modeling module to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and output the distribution parameters and preliminary abundance map of the endmember latent variables.

[0011] Step 3: Construct a lightweight diffusion denoising module to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map.

[0012] Step 4: Construct a gated residual module to adaptively weight and fuse the preliminary abundance map and the denoised abundance map to generate an optimized abundance map;

[0013] Step 5: Construct a decoder, generate a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstruct the hyperspectral image according to the optimized abundance map and the linear mixture model.

[0014] As a preferred option, it also includes:

[0015] Step 6: Construct a joint loss function and perform end-to-end optimization training on the network parameters to achieve collaborative optimization of endmember generation, abundance estimation and image reconstruction.

[0016] Preferably, in step 1, the cross-branch sharing module adopts a dual-branch cross-sharing network structure, including a spatial attention branch and a channel attention branch, which respectively extract the spatial structure information and spectral response features of the image, and establish a dynamic correlation between spatial features and spectral features through a cross-sharing mechanism, and finally generate the first feature representation and the second feature representation.

[0017] Preferably, the multi-scale feature modeling module adopts a parameter sharing mechanism, and performs multi-scale semantic modeling of input features through a multi-layer fully connected network, a batch normalization layer, a Dropout layer, and a nonlinear activation function; for the endmember modeling branch, it outputs the mean and variance of the latent variables, and obtains the latent spatial variables through reparameterization; for the abundance estimation branch, it outputs a preliminary abundance map.

[0018] Preferably, the lightweight diffusion denoising module gradually adds Gaussian noise to the preliminary abundance map through a forward diffusion process, and then gradually removes the noise through a reverse diffusion process. In the reverse process, skip connections are introduced to preserve the structural information of the abundance, and finally a denoised abundance map is generated.

[0019] Preferably, the gated residual module generates adaptive fusion weights through a gating network and performs residual weighted fusion of the preliminary abundance map and the denoised abundance map to generate an optimized abundance map, while satisfying the physical constraints that abundances are non-negative and sum to one.

[0020] Preferably, the decoder is based on a fully convolutional neural network framework, which maps the latent spatial variables of endmembers to the spectral space, generates a heterogeneous endmember set, and reconstructs the hyperspectral image based on a linear mixture model.

[0021] Preferably, the joint loss function combines the following loss terms: a reconstruction error loss, a KL divergence loss, a spectral angle distance (SAD) loss, and a minimum-volume constraint loss.
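The four loss terms can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the mean-squared reconstruction error, the closed-form Gaussian KL term, and the centroid-distance surrogate for the minimum-volume constraint are common choices in the unmixing literature but are not confirmed by the source, and the default weights in `joint_loss` are hypothetical placeholders.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Mean squared error between observed and reconstructed spectra, shape (n, bands)."""
    return float(np.mean((x - x_hat) ** 2))


def kl_loss(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, averaged over pixels."""
    return float(np.mean(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)))


def sad_loss(x, x_hat, eps=1e-12):
    """Mean spectral angle distance (radians) between observed and reconstructed pixels."""
    num = np.sum(x * x_hat, axis=-1)
    den = np.maximum(np.linalg.norm(x, axis=-1) * np.linalg.norm(x_hat, axis=-1), eps)
    return float(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))


def min_volume_loss(E):
    """Volume surrogate (assumed): squared distance of each endmember column of E (bands, p) to the centroid."""
    centroid = E.mean(axis=1, keepdims=True)
    return float(np.sum((E - centroid) ** 2))


def joint_loss(x, x_hat, mu, log_var, E, weights=(1.0, 1e-3, 0.1, 1e-2)):
    """Weighted sum of the four terms; the default weights are hypothetical placeholders."""
    w_rec, w_kl, w_sad, w_mv = weights
    return (w_rec * reconstruction_loss(x, x_hat)
            + w_kl * kl_loss(mu, log_var)
            + w_sad * sad_loss(x, x_hat)
            + w_mv * min_volume_loss(E))


rng = np.random.default_rng(0)
x = rng.random((16, 10))                          # 16 pixels, 10 bands
x_hat = x + 0.01 * rng.standard_normal(x.shape)   # a near-perfect reconstruction
mu, log_var = rng.standard_normal((16, 4)), np.zeros((16, 4))
E = rng.random((10, 3))                           # 3 endmembers
total = joint_loss(x, x_hat, mu, log_var, E)
```

SAD is scale-invariant, so it complements the MSE term: MSE penalizes amplitude errors, while SAD penalizes errors in spectral shape.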

[0022] Secondly, a hyperspectral heterogeneous endmember unmixing system based on a variational autoencoder and diffusion model is provided for performing any of the methods described in the first aspect, including:

[0023] A cross-branch sharing module is used to perform joint spatial-spectral feature extraction on the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation.

[0024] The multi-scale feature modeling module is used to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and output the distribution parameters and preliminary abundance map of the endmember latent variables.

[0025] A lightweight diffusion denoising module is used to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map.

[0026] The gated residual module is used to adaptively weight and fuse the preliminary abundance map and the denoised abundance map to generate an optimized abundance map.

[0027] The decoder is used to generate a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstruct the hyperspectral image according to the linear mixture model by combining the optimized abundance map.

[0028] Thirdly, a computer storage medium is provided, storing a computer program which, when run on a computer, causes the computer to perform any of the methods described in the first aspect.

[0029] The beneficial effects of this invention are:

[0030] 1. This invention models the distribution of endmembers with the latent variables of a variational autoencoder, characterizing how endmembers vary under nonlinear mixing conditions. The latent variables can be interpreted as the heterogeneity of endmembers caused by factors such as illumination and environmental disturbances.

[0031] 2. This invention extracts features from the input hyperspectral image by designing a cross-branch sharing module, and extracts global and local information from the processed hyperspectral image by designing a multi-scale spatial-spectral cross-fusion network, reducing the impact of redundant information and better preserving semantic information. Furthermore, this invention designs a lightweight diffusion model to remove noise that may be present in the abundance map, enhancing the generation of spatially continuous and clearly defined abundance maps, thus improving the interpretability and practicality of the abundance maps.

[0032] 3. This invention designs a gated residual network to perform differential fusion of the denoised abundance map and the abundance map generated by spatial-spectral cross-fusion, producing a smoother, physically interpretable final abundance map. Furthermore, this invention generates the corresponding endmembers through multi-scale convolution based on the posterior distribution of the latent variables in the latent space. Finally, a reconstructed hyperspectral image is generated based on a linear mixture model.

Attached Figure Description

[0033] Figure 1 is the overall flowchart of the present invention;

[0034] Figure 2 is a schematic diagram of the multi-scale spatial attention module of the present invention;

[0035] Figure 3 is a schematic diagram of the channel attention branch of the present invention;

[0036] Figure 4 is a schematic diagram of the cross-branch cross-sharing module of the present invention;

[0037] Figure 5 is a schematic diagram of the lightweight diffusion model module of the present invention;

[0038] Figure 6 is a schematic diagram of the gated residual control module of the present invention;

[0039] Figure 7 shows the abundance map visualization results of different methods on a simulated dataset;

[0040] Figure 8 shows the projection of the pixels and endmembers of the simulated data onto a two-dimensional plane using PCA;

[0041] Figure 9 shows the abundance map visualization results of different methods on the Jasper Ridge dataset;

[0042] Figure 10 shows the abundance map visualization results of different methods on the Apex dataset.

Detailed Implementation

[0043] The present invention will be further described below with reference to embodiments. The description of the embodiments below is only for the purpose of helping to understand the present invention. It should be noted that those skilled in the art can make several modifications to the present invention without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

[0044] Example 1:

[0045] To address the problem that existing hyperspectral unmixing methods suffer from endmember heterogeneity due to factors such as illumination conditions, observation environment, and differences within ground features, leading to unstable unmixing results and insufficient interpretability, this invention proposes a hyperspectral heterogeneous endmember unmixing method based on a variational autoencoder and diffusion model.

[0046] Specifically, the method includes:

[0047] Step 1: Construct a cross-branch cross-sharing module to extract joint spatial and spectral features from the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation.

[0048] In step 1, the cross-branch sharing module adopts a dual-branch cross-sharing network structure, including a spatial attention branch and a channel attention branch, which respectively extract the spatial structure information and spectral response features of the image, and establish a dynamic relationship between spatial features and spectral features through a cross-sharing mechanism, and finally generate the first feature representation and the second feature representation.

[0049] The purpose of this step is to leverage a dual-branch cross-sharing network architecture to overcome the limitations of local feature extraction, uncover global correlation information of hyperspectral images in both spectral and spatial dimensions, provide more comprehensive feature support for endmember extraction and abundance estimation, reduce outlier interference, and avoid unreasonable and drastic fluctuations during the unmixing process.

[0050] Step 2: Construct a multi-scale feature modeling module to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and output the distribution parameters and preliminary abundance map of the endmember latent variables.

[0051] In step 2, the multi-scale feature modeling module adopts a parameter sharing mechanism to perform multi-scale semantic modeling of the input features through a multi-layer fully connected network, a batch normalization layer, a Dropout layer, and a non-linear activation function; for the endmember modeling branch, the mean and variance of the latent variables are output, and the latent spatial variables are obtained through reparameterization; for the abundance estimation branch, a preliminary abundance map is output.

[0052] The purpose of this step is to achieve complementary fusion of features at different scales through a hierarchical multi-scale convolution structure, tailored to the dimensional characteristics and semantic focus of the two derived feature representations. This avoids the loss of local details or global correlations that single-scale features suffer from, while reducing model complexity and computational redundancy and improving inference efficiency.

[0053] Step 3: Construct a lightweight diffusion denoising module to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map.

[0054] In step 3, the lightweight diffusion denoising module gradually adds Gaussian noise to the preliminary abundance map through a forward diffusion process, and then gradually removes the noise through a reverse diffusion process. In the reverse process, skip connections are introduced to preserve the structural information of the abundance, and finally a denoised abundance map is generated.

[0055] The purpose of this step is to denoise the abundance map generated by the encoder, sharpen blurred edges, reduce data dimensionality, and generate an abundance map with stronger representational capability. Given a coarse abundance input, the diffusion model uses its learned prior to correct the parts that do not conform to the distribution of true abundance maps, making full use of spatial context to improve the spatial consistency and structural fidelity of the abundance map.

[0056] Step 4: Construct a gated residual module to adaptively weight and fuse the preliminary abundance map with the denoised abundance map to generate an optimized abundance map.

[0057] In step 4, the gated residual module generates adaptive fusion weights through a gating network and performs residual weighted fusion of the preliminary abundance map and the denoised abundance map to generate an optimized abundance map, while satisfying the physical constraints that abundances are non-negative and sum to one.

[0058] The purpose of this step is to use a gated residual network to achieve adaptive fusion of two abundance maps and learn a gated residual correction mechanism to adaptively adjust the contribution of the residual terms according to the input features. This preserves the advantages of both abundance maps while suppressing their disadvantages, thereby generating a physically consistent and smoother abundance map and enhancing the accuracy of the abundance map.
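The gated residual fusion described in this step can be sketched as below. This is a minimal NumPy sketch, assuming a single linear layer with a sigmoid as the gating network and clip-then-renormalize as the way the non-negativity and sum-to-one constraints are enforced; `gated_residual_fusion`, `w_gate`, and `b_gate` are hypothetical names, not the patent's.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gated_residual_fusion(a_pre, a_den, w_gate, b_gate, eps=1e-12):
    """Fuse preliminary and denoised abundances with a learned gate.

    a_pre, a_den: (n_pixels, p) abundance maps flattened over space.
    w_gate: (2p, p) and b_gate: (p,) parameterize a hypothetical single-layer gating network.
    """
    gate = sigmoid(np.concatenate([a_pre, a_den], axis=1) @ w_gate + b_gate)  # (n, p), in (0, 1)
    fused = a_pre + gate * (a_den - a_pre)     # gated residual correction of the preliminary map
    fused = np.clip(fused, 0.0, None)          # abundance non-negativity
    return fused / np.maximum(fused.sum(axis=1, keepdims=True), eps)  # abundance sum-to-one


rng = np.random.default_rng(1)
n, p = 100, 4
a_pre = rng.dirichlet(np.ones(p), size=n)      # stand-in for the preliminary abundance map
a_den = rng.dirichlet(np.ones(p), size=n)      # stand-in for the denoised abundance map
a_opt = gated_residual_fusion(a_pre, a_den, rng.normal(0, 0.5, (2 * p, p)), np.zeros(p))
```

Because this gate acts per endmember, the fused vector need not sum to one, which is why the sketch renormalizes after clipping; a single scalar gate per pixel would keep a convex combination of two simplex-valued maps on the simplex without renormalization.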

[0059] Step 5: Construct a decoder, generate a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstruct the hyperspectral image according to the optimized abundance map and the linear mixture model.

[0060] The purpose of this step is to reconstruct hyperspectral images using the processed abundance map and endmembers obtained by multi-scale convolution, employing the traditional linear mixture model method.

[0061] In step 5, the decoder is based on a fully convolutional neural network framework, which maps the latent spatial variables of endmembers to the spectral space, generates a heterogeneous endmember set, and reconstructs the hyperspectral image based on a linear mixture model.
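The decoder step can be illustrated with pixel-specific endmembers, which is what distinguishes a heterogeneous endmember set from a single fixed endmember matrix. This is a minimal NumPy sketch: a single linear layer with a sigmoid stands in for the fully convolutional decoder (an assumption made for brevity), and `decode_endmembers` and `reconstruct` are hypothetical names.

```python
import numpy as np


def decode_endmembers(z, w, b, bands, p):
    """Map per-pixel latent codes z (n, d) to per-pixel endmember matrices (n, bands, p).

    A single linear layer followed by a sigmoid stands in for the fully convolutional
    decoder; the sigmoid keeps reflectance values in (0, 1).
    """
    logits = z @ w + b                         # (n, bands * p)
    return (1.0 / (1.0 + np.exp(-logits))).reshape(z.shape[0], bands, p)


def reconstruct(endmembers, abundances):
    """Linear mixing with pixel-specific endmembers: y_i = E_i @ a_i."""
    return np.einsum("nbp,np->nb", endmembers, abundances)


rng = np.random.default_rng(2)
n, d, bands, p = 8, 3, 20, 4
z = rng.standard_normal((n, d))                # e.g. reparameterized latent samples
E_het = decode_endmembers(z, rng.normal(0, 0.3, (d, bands * p)), np.zeros(bands * p), bands, p)
A = rng.dirichlet(np.ones(p), size=n)          # stand-in for the optimized abundance map
Y_hat = reconstruct(E_het, A)                  # (n, bands) reconstructed spectra
```

The mixing itself stays linear in the abundances; the variability enters only through the latent code that selects each pixel's endmember matrix.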

[0062] Example 2:

[0063] Based on Example 1, Example 2 of this application provides a more specific hyperspectral heterogeneous endmember unmixing method based on variational autoencoders and diffusion models, including the following steps:

[0064] Step 1: Construct a Cross-Branch Sharing Module (CBSN) to perform joint spatial-spectral feature modeling on the input hyperspectral image H. The CBSN module includes a Multi-Scale Spatial Attention branch (MSA) and a Channel Attention branch (CA), which extract the spatial structure information and the spectral response features of the image, respectively. A cross-sharing mechanism establishes a dynamic correlation between spatial and spectral features, overcoming the limitations of extracting a single type of information and enabling collaborative mining of spectral and spatial features. At the same time, a same-type weighting strategy is used to generate a feature representation for endmember latent modeling and a feature representation for abundance estimation. CBSN enhances the learning of global-context features and effectively suppresses the interference of abnormal pixels on the subsequent unmixing results.

[0065] Specifically, the MSA branch extracts spatial features from the hyperspectral image. Average pooling and max pooling are first applied separately to obtain pixel responses under two different statistics, and the two results are then fused. The fused features are duplicated so that the latent-variable branch and the abundance branch receive the same input:

[0066]

[0067] where the two pooling operators denote average pooling and max pooling, respectively, and the remaining operator denotes feature concatenation. Building on this, convolution kernels of different sizes are applied to the spatial features to obtain spatial difference information across multiple receptive fields while preserving local details:

[0068]

[0069] where the convolution operator denotes two-dimensional convolutions with different kernel sizes, used to capture spatial context at different scales. The multi-scale spatial features are then fused by a three-dimensional convolution and normalized by an activation function to obtain the multi-scale spatial attention weights:

[0070]

[0071] where the two weight maps denote the spatial attention weights for the latent-variable branch and the abundance branch, respectively, and the convolution operator denotes a 3D convolution.
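The MSA branch can be sketched in NumPy as follows, under several labeled assumptions: pooling is taken across the spectral axis to produce per-pixel maps (as in CBAM-style spatial attention), the kernel sizes 3, 5, and 7 are hypothetical, and a learned weighted sum over scales stands in for the 3D convolution fusion. `conv2d_same` and `multiscale_spatial_attention` are hypothetical helpers, not the patent's code.

```python
import numpy as np


def conv2d_same(x, k):
    """'Same'-size 2D cross-correlation of a single-channel map x with an odd-sized kernel k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def multiscale_spatial_attention(cube, kernels, scale_weights):
    """cube: (bands, rows, cols) -> spatial weight map (rows, cols) in (0, 1).

    kernels: list of (2, s, s) arrays, one per scale (one s x s kernel per pooled map).
    scale_weights: (n_scales,) weights fusing the scales (stand-in for the 3D convolution).
    """
    pooled = np.stack([cube.mean(axis=0), cube.max(axis=0)])   # avg- and max-pooled responses
    per_scale = np.stack([
        conv2d_same(pooled[0], k[0]) + conv2d_same(pooled[1], k[1]) for k in kernels
    ])                                                          # (n_scales, rows, cols)
    fused = np.tensordot(scale_weights, per_scale, axes=1)      # weighted sum over scales
    return 1.0 / (1.0 + np.exp(-fused))                         # sigmoid normalization


rng = np.random.default_rng(3)
cube = rng.random((30, 16, 16))
kernels = [rng.normal(0, 0.1, (2, s, s)) for s in (3, 5, 7)]
# The latent-variable and abundance branches share the pooled input but use separate fusion weights.
w_latent = multiscale_spatial_attention(cube, kernels, np.array([0.5, 0.3, 0.2]))
w_abund = multiscale_spatial_attention(cube, kernels, np.array([0.2, 0.3, 0.5]))
```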

[0072] The CA branch extracts spectral features from the hyperspectral image. First, the input hyperspectral image... is average-pooled and max-pooled separately along the spatial dimensions to extract global statistical features for each spectral channel:

[0073]

[0074] where the nonlinearity denotes the activation function. Fusing the two statistical features yields the channel attention weights:

[0075]

[0076] where the two weight vectors denote the spectral weights for the latent-variable branch and the abundance branch, respectively.
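The CA branch can be sketched as a CBAM-style channel attention, which is an assumed stand-in since the source equations are not reproduced here: global average and max pooling over the spatial axes, a shared two-layer MLP with ReLU, and a sigmoid. The hidden width and the names below are hypothetical.

```python
import numpy as np


def channel_attention(cube, w1, w2):
    """CBAM-style channel attention (assumed stand-in for the CA branch).

    cube: (bands, rows, cols); w1: (bands, hidden); w2: (hidden, bands).
    Returns one weight per spectral channel, in (0, 1).
    """
    def shared_mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2    # two-layer MLP with ReLU, shared by both statistics

    avg = cube.mean(axis=(1, 2))               # global average per band
    mx = cube.max(axis=(1, 2))                 # global maximum per band
    return 1.0 / (1.0 + np.exp(-(shared_mlp(avg) + shared_mlp(mx))))


rng = np.random.default_rng(4)
bands, hidden = 30, 8
cube = rng.random((bands, 16, 16))
# Separate MLP weights give the latent-variable and abundance branches their own spectral weights.
c_latent = channel_attention(cube, rng.normal(0, 0.2, (bands, hidden)), rng.normal(0, 0.2, (hidden, bands)))
c_abund = channel_attention(cube, rng.normal(0, 0.2, (bands, hidden)), rng.normal(0, 0.2, (hidden, bands)))
```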

[0077] Based on the spatial attention weights and channel attention weights above, a cross-sharing mechanism is introduced to exchange information between spatial and spectral features. First, intermediate cross-shared representations are generated by concatenation and convolution:

[0078]

[0079] where the two results denote the same-type fusions, namely the spatial fusion and the spectral fusion. Weighting coefficients are then introduced, and the original attention weights and the cross-shared weights are combined by a weighted average to obtain the fused spatial and spectral weights:

[0080]

[0081] where the fused terms denote the weights after fusion and the operator denotes multiplication. Finally, the fused spatial and spectral weights are applied to the original hyperspectral image, yielding the feature representations output by the CBSN module:

[0082]

[0083] where the two outputs denote the feature maps used for abundance estimation and for endmember latent modeling, respectively.
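The cross-sharing steps above (same-type fusion, weighted averaging, and application to the image) can be sketched as follows. This is a minimal NumPy sketch under assumptions: the "concatenation and convolution" is modeled as a 1x1 convolution, that is, a weighted sum of the two branches' weights followed by a sigmoid; the weighting coefficient `alpha = 0.7` is hypothetical; and each branch's features are obtained by multiplying the image by its spectral vector and spatial map.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def cross_share(w_latent, w_abund, mix):
    """Same-type fusion of the two branches' weights: a 1x1 'convolution' over the concatenated pair."""
    return sigmoid(mix[0] * w_latent + mix[1] * w_abund)


def blend(original, shared, alpha):
    """Weighted average of a branch's own attention weights and the cross-shared weights."""
    return alpha * original + (1.0 - alpha) * shared


def apply_weights(cube, spatial, spectral):
    """Re-weight a (bands, rows, cols) cube by a spectral vector and a spatial map."""
    return cube * spectral[:, None, None] * spatial[None, :, :]


rng = np.random.default_rng(5)
cube = rng.random((30, 16, 16))
s_lat, s_abd = rng.random((16, 16)), rng.random((16, 16))   # spatial weights per branch
c_lat, c_abd = rng.random(30), rng.random(30)               # spectral weights per branch
s_shared = cross_share(s_lat, s_abd, (0.5, 0.5))            # spatial fusion
c_shared = cross_share(c_lat, c_abd, (0.5, 0.5))            # spectral fusion
alpha = 0.7                                                 # hypothetical weighting coefficient
F_latent = apply_weights(cube, blend(s_lat, s_shared, alpha), blend(c_lat, c_shared, alpha))
F_abund = apply_weights(cube, blend(s_abd, s_shared, alpha), blend(c_abd, c_shared, alpha))
```

Setting `alpha` to 1 disables sharing entirely, so the coefficient directly controls how much each branch borrows from the other.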

[0084] Step 2: Construct a multi-scale convolutional module (MSC). The two feature representations are fed into the MSC module, which models features at different scales and semantic levels through a multi-layer fully connected network, batch normalization layers, Dropout layers, and nonlinear activation functions, thereby extracting fine-grained local and edge information. A parameter-sharing mechanism reduces model complexity and computational redundancy, improving inference efficiency. For the endmember extraction branch, the MSC module outputs the mean and variance of the latent variables and obtains the latent-space variables through reparameterization; these characterize the distribution of endmembers under the influence of factors such as illumination changes and environmental disturbances. For the abundance estimation branch, the MSC module outputs the preliminary abundance features.

[0085] In step 2, the MSC module performs further multi-scale semantic modeling on the feature maps output by the CBSN module. It processes the feature representation for endmember latent modeling and the feature representation for abundance estimation with a shared structure but independent outputs, extracting local structural information and global semantic features at different scales and providing high-quality input for the subsequent latent-variable modeling and abundance generation.

[0086] Specifically, the two feature representations are first fed into a multi-scale convolutional network consisting of multiple fully connected layers, normalization layers, and nonlinear activation functions, which maps features at different scales through a parameter-sharing mechanism. The computation is as follows:

[0087]

[0088] where the MSC mapping, consisting of a fully connected layer (FCNN), a batch normalization layer (BN), a Dropout layer, and a ReLU activation function, extracts multi-scale semantic features and suppresses overfitting.
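One FC, BN, Dropout, ReLU stage and its parameter-shared application to both branches can be sketched in NumPy as below. The layer widths, dropout rate, and helper names are hypothetical. The sketch uses batch statistics only; a full implementation would track running averages for inference.

```python
import numpy as np


def fc_bn_dropout_relu(x, w, b, gamma, beta, drop_p, rng, training=True, eps=1e-5):
    """One FC -> BatchNorm -> Dropout -> ReLU stage on a batch x of shape (n, in_dim)."""
    h = x @ w + b
    # Batch statistics only; a full implementation would keep running averages for inference.
    h = gamma * (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps) + beta
    if training and drop_p > 0.0:
        keep = rng.random(h.shape) >= drop_p
        h = h * keep / (1.0 - drop_p)          # inverted dropout preserves the expected activation
    return np.maximum(h, 0.0)


def msc(x, layers, rng, training=True):
    """Apply the same stack of stages (shared parameters) to any input branch."""
    for w, b, gamma, beta, drop_p in layers:
        x = fc_bn_dropout_relu(x, w, b, gamma, beta, drop_p, rng, training)
    return x


rng = np.random.default_rng(6)
n_pixels, bands = 64, 30
widths = [bands, 64, 32]                        # hypothetical layer widths
layers = [(rng.normal(0, 0.1, (i, o)), np.zeros(o), np.ones(o), np.zeros(o), 0.1)
          for i, o in zip(widths[:-1], widths[1:])]
F_latent = rng.random((n_pixels, bands))        # CBSN outputs flattened to (pixels, bands)
F_abund = rng.random((n_pixels, bands))
h_latent = msc(F_latent, layers, rng)           # same parameters ...
h_abund = msc(F_abund, layers, rng)             # ... independent outputs
```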

[0089] In the endmember latent modeling branch, the mapped features are parametrically modeled to obtain the posterior distribution parameters of the latent variables, computed as follows:

[0090]

[0091] where the two outputs denote the mean vector and the standard deviation vector of the latent variables, respectively, and the two weight terms are learnable parameter matrices. This gives the approximate posterior distribution of the latent variables:

[0092]

[0093] where the latent variable vector characterizes implicit changes in the endmembers caused by factors such as illumination conditions, terrain differences, and changes in the observation environment, providing a physically interpretable latent representation for modeling endmember heterogeneity. The approximate distribution is a parametric approximation of the true posterior of the latent variables, used because the true posterior cannot be computed directly; its covariance matrix is diagonal.
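The posterior heads and the reparameterization step can be sketched as below. This is a minimal NumPy sketch; predicting the log-variance rather than the standard deviation is an assumption (the text states a standard deviation vector), chosen because it keeps the standard deviation positive without an explicit constraint. Function names and sizes are hypothetical.

```python
import numpy as np


def latent_head(h, w_mu, w_logvar):
    """Linear heads for the posterior mean and log-variance of the latent variables."""
    return h @ w_mu, h @ w_logvar


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I), so z ~ N(mu, diag(sigma^2)).

    Drawing the randomness from eps (not from z directly) is what lets gradients
    flow through mu and sigma in an autodiff framework.
    """
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)


rng = np.random.default_rng(7)
h = rng.standard_normal((64, 32))               # features from the shared MSC stack
mu, log_var = latent_head(h, rng.normal(0, 0.1, (32, 8)), rng.normal(0, 0.1, (32, 8)))
z = reparameterize(mu, log_var, rng)            # (64, 8) latent samples, one per pixel
```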

[0094] In the abundance estimation branch, the mapped features undergo a nonlinear transformation to produce the preliminary abundance estimate:

[0095]

[0096] where the transformation is a nonlinear function. The MSC module therefore outputs two results: the latent-variable distribution parameters used for endmember generation, and the preliminary abundance map used for the subsequent diffusion denoising and fusion.
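The text does not specify the nonlinear function. A row-wise softmax is one common choice because it satisfies the non-negativity and sum-to-one constraints by construction. The NumPy sketch below assumes that choice; `abundance_head` is a hypothetical name.

```python
import numpy as np


def abundance_head(h, w, b):
    """Map features (n, d) to abundances (n, p) with a row-wise softmax (assumed nonlinearity)."""
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)               # rows are non-negative and sum to one


rng = np.random.default_rng(8)
a_pre = abundance_head(rng.standard_normal((64, 32)), rng.normal(0, 0.3, (32, 4)), np.zeros(4))
```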

[0097] Step 3: Construct a lightweight diffusion model module (LDM) to further improve the spatial continuity and boundary clarity of the abundance results. The preliminary abundance features are fed into the LDM module, where they undergo a forward noise-addition process and a reverse denoising process to... suppress existing noise, enhance the spatial consistency of similar land-cover regions, sharpen blurred edges, reduce data dimensionality, and generate the denoised abundance features.

[0098] In step 3, the extracted preliminary abundance map is denoised and its spatial consistency enhanced by simulating the evolution of the abundance distribution under noise perturbation. Unstable regions in the initial abundances caused by noise, local discontinuities, and boundary ambiguity are corrected, producing an optimized abundance representation with stronger spatial continuity and clearer boundaries.

[0099] Specifically, during the forward diffusion process, Gaussian noise is gradually added to the initial abundance map to construct a perturbation sequence of the abundance distribution. The diffusion state of the preliminary abundance map at each time step is defined by the following forward diffusion process:

[0100]

[0101] where the noise schedule coefficient controls the intensity of the noise injected at each diffusion stage, and the identity matrix makes the injected noise isotropic. Through the forward diffusion process, the initial abundance map is gradually mapped into the noise space. This explicitly characterizes how the abundance distribution evolves under random perturbations.
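The forward diffusion formula itself is not reproduced in this text. For orientation, this sketch assumes the standard DDPM form:

- q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)
- Composing these steps gives the closed form q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), where alpha_bar_t is the running product of (1 - beta_s).

The patent does not give its noise schedule. The sketch therefore assumes a linear beta schedule from 1e-4 to 0.02, the DDPM default, with the 1000 steps mentioned in [0107]. All function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise-schedule coefficients beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bar, rng=random):
    """Sample x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I); t is 0-based."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alpha_bar(betas)
print(alpha_bar[-1] < 1e-4)  # True: x_T is almost pure noise
noisy = q_sample([0.2, 0.5, 0.3], 499, alpha_bar, random.Random(0))
```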

[0102] In the reverse diffusion stage, a parameterized model is introduced to approximate the reverse conditional distribution of the forward diffusion process. This model progressively removes noise and recovers the latent structural information of the abundance:

[0103]

[0104] where the mean function and the variance function, both controlled by the model parameters, characterize the conditional evolution of the abundance distribution during denoising.

[0105] Iteratively sampling from this reverse distribution gradually suppresses the noise components in the initial abundance map. After multiple reverse-diffusion iterations, the denoised abundance estimate is obtained at the final time step. The abundance maps must also remain consistent in their multi-scale spatial structure, so a skip connection is introduced to preserve the structural information of the initial abundance:

[0106]
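The mean and variance functions of the reverse process described in [0102] to [0105] are not specified in this text. This sketch assumes the common DDPM parameterization:

- A network predicts the injected noise eps_hat.
- The reverse mean is (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t), with alpha_t = 1 - beta_t.
- A simple variance choice is sigma_t^2 = beta_t.

The code implements one such reverse step. `eps_model` is a hypothetical stand-in for the trained network, and the skip connection of [0106] is not modeled.

```python
import math
import random

def p_sample_step(x_t, t, betas, alpha_bar, eps_model, rng=random):
    """One reverse step x_t -> x_{t-1} (0-based t; t == 0 is the final step).

    eps_model(x_t, t) stands in for a trained noise-prediction network; it is a
    hypothetical callable here, not a component named in the patent.
    """
    beta = betas[t]
    alpha = 1.0 - beta
    eps_hat = eps_model(x_t, t)
    coef = beta / math.sqrt(1.0 - alpha_bar[t])
    mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(x_t, eps_hat)]
    if t == 0:
        return mean  # no fresh noise is injected on the last step
    sigma = math.sqrt(beta)  # simple variance choice: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Sanity check: one-step process, "oracle" predictor returning the true noise.
beta = 0.1
x0 = [0.2, 0.5, 0.3]
eps = [0.7, -1.2, 0.4]
x1 = [math.sqrt(1 - beta) * a + math.sqrt(beta) * e for a, e in zip(x0, eps)]
print(p_sample_step(x1, 0, [beta], [1 - beta], lambda x, t: eps))  # recovers x0 up to rounding
```

The final check uses a one-step process with a predictor that returns the true noise. In that case the reverse mean reduces algebraically to the clean abundance vector.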

[0107] Furthermore, classic diffusion models typically require 2000 forward noise-addition steps to ensure generation quality. However, such a large number of steps significantly increases the computational cost of training and sampling. Considering the characteristics of the experimental data and the available computational resources, the present invention therefore reduces the number of noise-addition steps to 1000 in the experiments reported here, as a lightweighting measure. Preliminary experiments show that 1000 steps are sufficient for the model to converge fully. This setting achieves generation performance comparable to 2000 steps while reducing training time by approximately 50%, striking an effective balance between accuracy and efficiency.

[0108] Step 4: Construct the gated residual module (GRF). The preliminary abundance generated by the MSC module and the denoised abundance generated by the LDM module are fed into the GRF module. A gating network adaptively determines their fusion weights. A residual structure then performs the weighted fusion while maintaining the physical constraints that abundances are non-negative and sum to one, ultimately generating the optimized abundance result.

[0109] In Step 4, the abundance generated by the MSC module and the abundance generated by the LDM module are adaptively fused through a gating mechanism. The mechanism maintains the spatial continuity of the abundance and the clarity of boundaries while avoiding the loss of detail caused by excessive smoothing. The resulting final abundance representation is both stable and discriminative.

[0110]

[0111] where the gating function is a nonlinear mapping consisting of convolutional and fully connected layers. The resulting gate weights describe, at each spatial location, the relative reliability of the preliminary and denoised abundance information. Residual fusion then forms a weighted combination of the preliminary and denoised abundances:

[0112]

[0113] where the result is the final abundance map after gated residual fusion.

[0114] Through this gated residual fusion mechanism, the model favors the stable, denoised abundance information in spatially continuous regions. In boundary or detail regions, it retains the local structural features of the initial abundance. This achieves an adaptive balance between smoothness and detail preservation.
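The fusion formula itself is not reproduced here. For illustration, this sketch assumes one natural residual form, A = A_p + g * (A_d - A_p):

- A_p is the preliminary abundance.
- A_d is the denoised abundance.
- g in (0, 1) is a sigmoid gate.

If each pixel has a single gate value, the result is a convex combination of two abundance vectors. Non-negativity and sum-to-one then hold automatically, whereas per-endmember gates would need a renormalization step. The gating network is replaced here by a given logit, and all names are hypothetical.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def gated_residual_fusion(a_prelim, a_denoised, gate_logit):
    """Fuse one pixel's preliminary and denoised abundance vectors.

    gate_logit stands in for the (hypothetical) gating network's output at this
    pixel. With a single gate g in (0, 1), a = a_prelim + g * (a_denoised - a_prelim)
    is a convex combination of two simplex points, so it stays non-negative and
    still sums to one.
    """
    g = sigmoid(gate_logit)
    return [p + g * (d - p) for p, d in zip(a_prelim, a_denoised)]


a_p = [0.6, 0.3, 0.1]   # preliminary abundances
a_d = [0.4, 0.4, 0.2]   # denoised abundances
print(gated_residual_fusion(a_p, a_d, 0.0))  # gate 0.5: midpoint of the two vectors
```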

[0115] Step 5: Construct a decoder based on a fully convolutional neural network (FCNN) framework. Taking as input the latent spatial variables drawn according to their posterior distribution, the decoder generates a heterogeneous endmember set. Combined with the abundance result produced by the GRF module, it then reconstructs the hyperspectral image according to a linear mixing model.

[0116] Specifically, an endmember decoder based on the FCNN framework maps the latent variables to the endmember spectral space:

[0117]

[0118] where the endmember generation network, controlled by its parameters, consists of a fully convolutional network and multiple layers of nonlinear mappings, and its output is the generated endmember set. By generatively modeling the endmember spectra, this process explicitly characterizes endmember heterogeneity caused by variations in illumination, differences in observation conditions, and inhomogeneity within ground objects. Taking the generated endmembers together with the optimized abundance map as input, the hyperspectral image is reconstructed according to the linear mixing model:

[0119]

[0120] where the output is the reconstructed hyperspectral image. The reconstruction retains the physical assumption of linear mixing but introduces an endmember generation mechanism driven by latent variables. As a result, the reconstruction adaptively reflects the spatial variation of the endmember spectra.
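Per pixel, the linear mixing model writes the reconstructed spectrum as the endmember matrix times the abundance vector. Because the endmembers are generated from latent variables, each pixel can have its own endmember matrix. The sketch below shows that product in plain Python. It also adds a mean-squared-error term as one possible form of the reconstruction loss used in Step 6, since the patent's exact loss formula is not reproduced here. The bands x endmembers list-of-rows layout and the function names are assumptions.

```python
def reconstruct_pixel(endmembers, abundances):
    """Linear mixing: x_hat[b] = sum_k endmembers[b][k] * abundances[k].

    endmembers is a bands x K matrix (list of rows) generated for this pixel;
    abundances is the length-K optimized abundance vector for the same pixel.
    """
    if any(len(row) != len(abundances) for row in endmembers):
        raise ValueError("each band row must have one entry per endmember")
    return [sum(m * a for m, a in zip(row, abundances)) for row in endmembers]

def mse(x, x_hat):
    """Mean squared reconstruction error between observed and reconstructed spectra."""
    return sum((u - v) ** 2 for u, v in zip(x, x_hat)) / len(x)


# Two endmembers observed in three bands.
M = [[0.1, 0.8],
     [0.4, 0.6],
     [0.9, 0.2]]
a = [0.25, 0.75]
x_hat = reconstruct_pixel(M, a)
print([round(v, 6) for v in x_hat])  # [0.625, 0.55, 0.375]
```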

[0121] Step 6: Construct a joint loss function for multi-objective optimization. The joint loss combines four terms: the hyperspectral image reconstruction error loss, the KL divergence loss, the spectral angle loss, and the minimum-volume constraint loss. The parameters of the above network are optimized with this joint loss, achieving coordinated optimization of endmember generation, abundance estimation, and image reconstruction.

[0122] Specifically, to achieve unified optimization training for cross-branch feature extraction, latent variable modeling, abundance denoising optimization, and endmember generation reconstruction processes, a joint loss function composed of image reconstruction consistency, latent distribution constraints, spectral similarity constraints, and physical feasibility constraints is constructed to perform end-to-end optimization of the entire network. First, the hyperspectral image reconstruction error loss:

[0123]

[0124] Secondly, to constrain the continuity and interpretability of the latent variable space, the KL divergence loss is calculated to make the posterior distribution of the latent variables approximate the prior distribution:

[0125]
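The text only says "the prior distribution". This sketch assumes a standard normal prior N(0, I), as is standard for VAEs. Under that assumption, the KL term for a diagonal Gaussian posterior has a closed form. The function name is hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    KL = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

The KL term is zero only when every mean is 0 and every variance is 1, i.e. when the posterior coincides with the prior.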

[0126] To enhance the morphological consistency between the generated endmembers and the true spectrum, the spectral angular distance loss is calculated:

[0127]

[0128] where the averaged term denotes the mean of the generated endmembers. To prevent the generated endmembers from dispersing excessively in the spectral space, a minimum-volume constraint loss is introduced:

[0129]
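The spectral angle and minimum-volume formulas are not reproduced in this text. This sketch makes two illustrative assumptions, not the patent's exact definitions:

- The spectral term is the standard spectral angle (arccos of the normalized inner product), averaged over endmember pairs.
- The volume term is the sum of squared distances of the endmembers to their mean spectrum, a common differentiable surrogate for simplex volume.

All function names are hypothetical.

```python
import math

def spectral_angle(u, v):
    """Spectral angle (radians) between two spectra; 0 means identical shape."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # clamp to [-1, 1] so rounding cannot push acos outside its domain
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def sad_loss(generated, reference):
    """Mean spectral angle over K endmember pairs (each a list of band values)."""
    return sum(spectral_angle(g, r) for g, r in zip(generated, reference)) / len(generated)

def centroid_volume_loss(endmembers):
    """Sum of squared distances of the K endmembers to their mean spectrum.

    Shrinking this spread pulls the endmembers together, a common stand-in
    for the volume of the simplex they span.
    """
    k = len(endmembers)
    bands = len(endmembers[0])
    mean = [sum(e[b] for e in endmembers) / k for b in range(bands)]
    return sum((e[b] - mean[b]) ** 2 for e in endmembers for b in range(bands))


print(round(math.degrees(spectral_angle([1.0, 0.0], [0.0, 1.0])), 6))  # 90.0
```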

[0130] Finally, the above losses are weighted and combined to form the overall optimization objective function:

[0131]

[0132] where the coefficients are the weights of the individual loss terms. Optimizing this objective lets the model capture endmember variability accurately while avoiding over- or under-modeling of variability and endmember distortion. The result is high unmixing accuracy and strong generalization ability.

[0133] The core objective of this step is to guide the network to simultaneously meet multiple technical objectives of hyperspectral unmixing during training by using a multi-loss function joint constraint approach, thereby achieving synergistic optimization of the accuracy, rationality, and physical consistency of endmember extraction and abundance estimation.

[0134] Furthermore, embodiments of the present invention also provide experimental comparisons based on different data.

[0135] Table 1 summarizes the quantitative results of different algorithms on the simulated dataset, including the RMSE and SAD values of each endmember. Figure 7 shows the abundance maps estimated by the different algorithms on the simulated dataset, where EM#1, EM#2, and EM#3 denote different materials.

[0136] Table 1 shows the analysis results of different algorithms on the simulated dataset. (Bold text indicates the optimal result.)

[0137]

[0138] As shown in Table 1, the proposed method outperforms the other methods in estimating endmembers EM#1 and EM#2. Compared with the second-best method, PPM-Net, it reduces the estimation error by approximately 59.9% on EM#1 and approximately 29% on EM#2. Equivalently, PPM-Net's error is about 2.5 times and 1.4 times that of the proposed method, respectively. On EM#3, however, PPM-Net is slightly more accurate than our method. We speculate that this may be due to PPM-Net's stronger ability to model the specific spectral features of EM#3 during unmixing. Regarding the SAD metric, MuCAEU performs best overall, owing to its combination of multi-scale downsampling and multi-stage edge consistency mechanisms. Although our method is slightly behind this result, the difference is small and within a reasonable error range. While PPM-Net maintains its lead in metrics such as AAD, our proposed method excels in the crucial SAD metric, significantly outperforming other methods.
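Converting an error reduction into an error ratio is simple arithmetic. If the proposed method's error is (1 - r) times the baseline's, then the baseline's error is 1 / (1 - r) times the proposed method's. A quick check of the figures quoted above (the function name is illustrative):

```python
def reduction_to_ratio(reduction):
    """Baseline error divided by our error, given our fractional error reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


for name, r in (("EM#1", 0.599), ("EM#2", 0.29)):
    print(name, round(reduction_to_ratio(r), 2))
# prints: EM#1 2.49, then EM#2 1.41
```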

[0139] Principal component analysis (PCA) is used to project the pixels and endmembers of the simulated data onto a two-dimensional plane, and the resulting visualization is shown in Figure 8. Figure 8 shows that MiSiCNet, MuCAEU, and DFNN perform unsatisfactorily, with endmember estimates that deviate significantly from the true distribution. This is mainly because these methods do not fully consider the spectral variability of endmembers during modeling, so they adapt poorly to endmember variations in mixed pixels. This further illustrates that endmember variability cannot be ignored in hyperspectral unmixing. In contrast, methods such as ELMM, DGMSSU, PUMSU, and HUMSCAN, which introduce endmember variability mechanisms to some extent, exhibit different distribution characteristics. Specifically, DGMSSU and HUMSCAN estimate relatively dense endmember distributions, and some estimated points of DGMSSU even fall outside the actual pixel boundaries. This indicates that these methods may tend to overfit, i.e., they are overly sensitive when modeling variability. On the other hand, the endmember distributions of PUMSU, PPM-Net, and SSAF-Net are relatively sparse, and some estimated points deviate from the true pixel region. This may be because their models do not adequately characterize endmember variability, which limits their expressive power.

[0140] Notably, the proposed method in this paper most closely approximates the true endmember distribution (GT) in the two-dimensional scatter plot, significantly outperforming other comparative methods. This advantage stems primarily from two aspects: first, the proposed method explicitly models the spectral variability of endmembers, enabling a more accurate description of their dynamic changes in both spatial and spectral dimensions; second, the denoising mechanism introduced into the model effectively suppresses noise interference in endmember extraction, reducing information loss and thus improving the model's robustness and interpretability. These two aspects work together to significantly enhance the model's ability to resolve complex mixed pixels, resulting in superior overall performance in endmember estimation tasks.

[0141] In addition, Table 2 presents the quantitative evaluation results of different methods on the Jasper Ridge dataset. Figure 9 visualizes the abundance estimation results of the corresponding methods on this dataset, so the methods' spatial distribution reconstruction can be compared directly from the abundance maps.

[0142] The quantitative results show that the proposed method achieves the best RMSE for each of the four endmembers as well as the best overall aRMSE, demonstrating its comprehensive advantage in joint endmember and abundance estimation. On the SAD metric, the proposed method leads for all endmembers except soil. PUMSU's SAD for the soil endmember is slightly lower than that of the proposed method, but the difference is only 8.5%, which is within an acceptable error range. This result further shows that the proposed method unmixes stably across different land cover types.

[0143] On the SID (spectral information divergence) metric, PPM-Net and HUMSCAN outperform our proposed method. PPM-Net learns endmember parameters from the pixel latent space using a variational autoencoder. This lets it capture finer spectral variations and thus excel on SID, a metric that emphasizes differences in spectral shape. HUMSCAN, on the other hand, models endmember variability effectively through multi-scale convolution and attention mechanisms, which gives it an advantage in spectral unmixing accuracy. Notably, our proposed method achieves the best AAD (abundance angle distance) value, 0.15802, significantly outperforming the other methods and further validating its overall unmixing performance.

[0144] Table 2 shows the analysis results of different algorithms on the Jasper Ridge dataset. Bold text indicates the optimal result.

[0145]

[0146] In addition, Table 3 shows the performance metrics of different methods on the Apex dataset, and Figure 10 shows the corresponding abundance estimation results. The abundance maps visually present the differences in spatial decomposition among the methods, supporting a comprehensive comparative analysis.

[0147] On the Apex dataset, our proposed method achieved the lowest RMSE for all endmembers except Water, with an overall aRMSE of only 0.05582, significantly outperforming the other methods. The traditional method ELMM showed relatively balanced RMSE values (approximately 0.12 to 0.16), but these are still higher than ours. MiSiCNet, MuCAEU, and DFNN generally had higher RMSEs (some exceeding 0.4), reflecting significant biases in endmember estimation. This is possibly due to their limited adaptability to endmember variation and scene noise. In terms of AAD, HUMSCAN achieved the best value (0.27509), indicating its advantage in spectral matching uniformity. Our proposed method achieved an AAD of 0.27899, better than most methods but slightly behind HUMSCAN, which suggests room for improvement in preserving subtle spectral structures. On the AID (abundance information divergence) metric, SSAF-Net (2.45821) and the proposed method (2.39439) perform similarly and are the two best methods. This indicates that both control overall error robustly, further confirming the accuracy and robustness of the proposed method in complex pixel decomposition tasks.

[0148] Our proposed method outperforms HUMSCAN on multiple evaluation metrics, primarily due to the explicit modeling mechanism for endmember spectral variability and the integration of effective noise suppression strategies. This allows the method to significantly improve unmixing accuracy while maintaining strong generalization ability. Although it slightly lags behind HUMSCAN on some metrics reflecting local spectral fidelity (such as AAD), our method significantly outperforms existing mainstream methods in global unmixing accuracy (such as aRMSE), result stability, and overall comprehensive performance. Experimental results fully validate the effectiveness and advancement of our proposed model in hyperspectral unmixing tasks.

[0149] Table 3 shows the analysis results of different algorithms on the Apex dataset. The bolded results indicate the optimal results.

[0150]

[0151] It should be noted that the parts in this embodiment that are the same as or similar to those in Embodiment 1 can be referred to each other, and will not be repeated in this application.

[0152] Example 3:

[0153] Based on Example 2, Example 3 of this application provides a hyperspectral heterogeneous endmember unmixing system based on a variational autoencoder and diffusion model, comprising:

[0154] A cross-branch sharing module is used to perform joint spatial-spectral feature extraction on the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation.

[0155] The multi-scale feature modeling module is used to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and output the distribution parameters and preliminary abundance map of the endmember latent variables.

[0156] A lightweight diffusion denoising module is used to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map.

[0157] The gated residual module is used to adaptively weight and fuse the preliminary abundance map and the denoised abundance map to generate an optimized abundance map.

[0158] The decoder is used to generate a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstruct the hyperspectral image according to the linear mixture model by combining the optimized abundance map.

[0159] It should be noted that the system provided in this embodiment is the system corresponding to the method provided in embodiment 2. Therefore, the parts that are the same as or similar to those in embodiment 2 in this embodiment can be referred to each other, and will not be described again in this application.

Claims

1. A hyperspectral heterogeneous endmember unmixing method based on a variational autoencoder and diffusion model, characterized in that it comprises: Step 1: constructing a cross-branch cross-sharing module to extract joint spatial-spectral features from the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation; Step 2: constructing a multi-scale feature modeling module to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and outputting the distribution parameters of the endmember latent variables and a preliminary abundance map; Step 3: constructing a lightweight diffusion denoising module to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map; Step 4: constructing a gated residual module to adaptively weight and fuse the preliminary abundance map and the denoised abundance map to generate an optimized abundance map; Step 5: constructing a decoder, generating a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstructing the hyperspectral image according to the optimized abundance map and the linear mixture model.

2. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 1, characterized in that it further comprises: Step 6: constructing a joint loss function and performing end-to-end optimization training on the network parameters to achieve collaborative optimization of endmember generation, abundance estimation and image reconstruction.

3. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 2, characterized in that, In step 1, the cross-branch cross-sharing module adopts a dual-branch cross-sharing network structure, including a spatial attention branch and a channel attention branch, which respectively extract the spatial structure information and spectral response features of the image, and establish a dynamic correlation between spatial features and spectral features through a cross-sharing mechanism, and finally generate the first feature representation and the second feature representation.

4. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 3, characterized in that, The multi-scale feature modeling module adopts a parameter sharing mechanism, and performs multi-scale semantic modeling of input features through a multi-layer fully connected network, a batch normalization layer, a Dropout layer, and a non-linear activation function; for the endmember modeling branch, the mean and variance of the latent variables are output, and the latent spatial variables are obtained through reparameterization. For the abundance estimation branch, output a preliminary abundance map.

5. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 4, characterized in that, The lightweight diffusion denoising module gradually adds Gaussian noise to the preliminary abundance map through a forward diffusion process, and then gradually removes the noise through a reverse diffusion process. In the reverse process, skip connections are introduced to preserve the structural information of the abundance, and finally a denoised abundance map is generated.

6. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 5, characterized in that, The gated residual module generates adaptive fusion weights through a gated network, performs residual weighted fusion on the preliminary abundance map and the denoised abundance map, and generates an optimized abundance map while maintaining the non-negativity of abundance and the physical constraint that the sum is one.

7. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 6, characterized in that, The decoder is based on a fully convolutional neural network framework, which maps the latent spatial variables of endmembers to the spectral space, generates a heterogeneous set of endmembers, and reconstructs hyperspectral images based on a linear mixture model.

8. The hyperspectral heterogeneous endmember unmixing method based on variational autoencoder and diffusion model according to claim 7, characterized in that, The joint loss function includes a combination of the following loss terms: reconstruction error loss, KL divergence loss, spectral angular distance loss, and minimum volume constraint loss.

9. A hyperspectral heterogeneous endmember unmixing system based on a variational autoencoder and diffusion model, characterized in that it is used for performing the method according to any one of claims 1 to 8 and comprises: a cross-branch sharing module, used to perform joint spatial-spectral feature extraction on the input hyperspectral image, generating a first feature representation for endmember modeling and a second feature representation for abundance estimation; a multi-scale feature modeling module, used to perform multi-scale semantic modeling on the first feature representation and the second feature representation respectively, and output the distribution parameters of the endmember latent variables and a preliminary abundance map; a lightweight diffusion denoising module, used to perform forward noise addition and reverse denoising on the preliminary abundance features to generate a denoised abundance map; a gated residual module, used to adaptively weight and fuse the preliminary abundance map and the denoised abundance map to generate an optimized abundance map; and a decoder, used to generate a heterogeneous endmember set based on the distribution parameters of the endmember latent variables, and reconstruct the hyperspectral image according to the linear mixture model by combining the optimized abundance map.

10. A computer storage medium, characterized in that, The computer storage medium stores a computer program; when the computer program is run on the computer, it causes the computer to perform the method described in any one of claims 1 to 8.