A self-supervised hyperspectral change detection method and system

By employing self-supervised learning and multi-scale mask simulation, a global-local feature extraction and aggregation network was designed. This solved the problems of time-consuming and laborious labeling of data and difficulty in pairing multi-temporal samples in hyperspectral change detection, thus achieving efficient change detection.

CN119295936B · Active · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2024-10-08
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for detecting hyperspectral changes rely on labeled data. Sample acquisition and labeling are time-consuming and labor-intensive, and pairing samples from multiple time periods is difficult, resulting in insufficient model generalization ability.

Method used

A self-supervised learning method is adopted to simulate the changing region through multi-scale masks and learn the change features from single-temporal hyperspectral images. A global-local multi-scale feature extraction and aggregation network is designed, and the model is trained using mask supervision loss and reconstruction loss, without the need for labeled samples and downstream fine-tuning.

Benefits of technology

This method improves the accuracy and generalization ability of hyperspectral change detection without requiring labeled samples or downstream fine-tuning, outperforming existing methods.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to a self-supervised hyperspectral change detection method and system, belonging to the field of deep learning technology. The method includes the following steps: S1: hyperspectral image preprocessing; S2: multi-scale mask change simulation; S3: inputting single-phase samples and pseudo-second-phase samples obtained after mask-simulated changes into a global-local feature aggregation encoder-decoder for training; S4: after training, performing change detection on the complete image to obtain the results. The method described in this invention outperforms other hyperspectral image farmland change detection methods, and can efficiently train the model without the need for labeled samples or downstream fine-tuning.

Description

Technical Field

[0001] This invention belongs to the field of deep learning technology and relates to a self-supervised hyperspectral change detection method and system.

Background Technology

[0002] Remote sensing image change detection identifies changes by comparing remote sensing images of the same area acquired at different times, and is widely used in urban planning, environmental monitoring, agricultural surveys, and disaster assessment. Compared with multispectral remote sensing images, which contain only a few bands, hyperspectral data provides continuous and detailed spectral features over a wide spectral range and exhibits a "map-spectrum integration" characteristic, which makes it better at distinguishing the changed parts between two images. With the development of imaging spectroscopy, many satellites now provide abundant hyperspectral images for studying surface changes, but labeling these images remains a significant challenge.

[0003] Classical change detection methods primarily use spectral information to construct algebraic operations, or project hyperspectral images into a low-dimensional feature space to reveal the attributes of change; representative methods include change vector analysis, iterative multivariate change detection, slow feature analysis, and principal component analysis. Such traditional methods are usually based on spectral differences between images from different time phases and fail to fully exploit the inherent features of complex hyperspectral images. In recent years, deep learning has brought new ways to improve the efficiency and accuracy of change detection. Early deep learning methods relied heavily on supervised information; supervised methods trained on labeled data, such as BCNNS, MLEDAN, and MSDFFN, achieved good accuracy. However, sample acquisition and labeling are time-consuming and laborious, so some works began to introduce self-supervised strategies into change detection to reduce the dependence on labeled data. Existing self-supervised hyperspectral change detection methods fall broadly into two categories: the first generates high-confidence pseudo-labels as supervisory information, with representative examples including BCGNet and S3Net; the second constructs supervisory information between features based on contrastive learning, with representative examples including DSConvAEs, HyperNet, and UA-GSSL. However, these methods typically require preprocessing of paired images from the same geographic location during training, and some also require fine-tuning with a small number of labels. By contrast, unlabeled single-temporal hyperspectral images are easier and less expensive to obtain than paired, labeled multi-temporal hyperspectral images.

[0004] To overcome the challenges posed by limited labeled data and to improve the generalization ability of the model, this invention introduces self-supervised learning to make full use of the information in unlabeled samples, and improves the accuracy of change detection by applying masks to single-temporal imagery.

Summary of the Invention

[0005] In view of this, the primary objective of this invention is to provide a self-supervised hyperspectral change detection method that uses multi-scale masks to simulate change regions, learns effective change features from single-temporal hyperspectral images, and obtains an efficient change detector. First, a multi-scale mask change simulation strategy based on the original spectrum is designed to simulate images with pseudo-second-temporal changes. Next, a global-local multi-scale feature extraction and aggregation network is designed to detect changes between the single-temporal hyperspectral image and the masked hyperspectral image. Then, the model is trained using mask-supervised loss and reconstruction loss. This method can efficiently train the model without the need for labeled samples or downstream fine-tuning.

[0006] To achieve the above objectives, the present invention provides the following technical solution:

[0007] A self-supervised hyperspectral change detection method includes the following steps: S1: hyperspectral image preprocessing; S2: multi-scale mask change simulation; S3: inputting single-phase samples and the pseudo-second-phase samples obtained after mask-simulated changes into a global-local feature aggregation encoder-decoder for training; S4: after training, performing change detection on the complete image to obtain the results.

[0008] Furthermore, in step S1, the single-temporal hyperspectral image is preprocessed, mainly by geometric distortion correction, spectral image vignetting correction, radiometric correction, etc., and patches are then extracted from the preprocessed dataset. So that samples can also be taken from patches centered on the edge pixels of the image, the image edges are first expanded by mapping, and 32×32 blocks are used as sliding windows to generate the samples.

[0009] Furthermore, in step S2, to obtain a hyperspectral image close to a real second-temporal image, multi-scale masks are used to simulate the changing regions. To adapt to how ground features change under different conditions, different mask sizes and mask ratios are set, and these independent multi-scale masks are randomly mixed with equal probability to obtain the final multi-scale mask M. The spectrum of the center pixel of the original patch is flipped horizontally (that is, its band order is reversed) to obtain the changing spectrum, and the spectrum of the masked area is replaced with the changing spectrum. Random noise is added to the remaining areas to simulate the imaging differences found in real scenes. The formula for the mask change simulation strategy is as follows:

[0010]

[0011] where (1 − M) represents the unmasked area, and the remaining terms represent, respectively, the hyperspectral image after adding the mask, the hyperspectral image after adding Gaussian noise, and the flipped (inverted) spectral values.

[0012] Furthermore, in step S3, the single-temporal hyperspectral image and the pseudo-second-temporal hyperspectral image obtained after mask simulation are input into the global-local feature aggregation encoder-decoder. First, rich multi-scale features are obtained in the encoder stage. Then, the two temporal feature maps at each scale are subtracted to obtain differential feature maps. These multi-scale differential features are then connected to the decoder through skip connections, which links features across layers and reduces the loss of detail.

[0013] Furthermore, in the global-local feature aggregation encoder-decoder, the encoding stage consists of three downsampling units. Each downsampling unit contains a global fusion module, a local fusion module, and a downsampling operation. The encoder stage can be summarized as follows:

[0014]

[0015] where $f_i$ denotes the feature map generated by the i-th convolutional layer for the input patch, $f_{\mathrm{Conv}}^{3\times 3}(\cdot)$ denotes a convolution with a 3×3 kernel, and LMM(·) and GMM(·) denote the local fusion module and the global fusion module, respectively. Every convolutional layer is followed by a batch normalization layer and a LeakyReLU activation layer.

[0016] Furthermore, in the global-local feature aggregation encoder-decoder, the decoding stage includes a detection decoder and a reconstruction decoder; the feature maps of the two branches of the obtained encoder are differentially processed to obtain differential feature maps; the differential feature maps are then input into the multi-layer upsampling detection decoder to restore the feature maps to their original input size. The process of the detection decoder can be summarized as follows:

[0017]

[0018] where $f_i$ denotes the output feature map after the i-th deconvolution, $f_{\mathrm{DConv}}^{3\times 3}(\cdot)$ denotes a deconvolution with a 3×3 kernel, $f_{\mathrm{Conv}}^{1\times 1}(\cdot)$ denotes a convolution with a 1×1 kernel, and [;] denotes concatenation along the channel dimension, which implements the skip connections between feature maps;

[0019] Simultaneously, we input the output of the mask branch into the reconstruction decoder to obtain the reconstructed feature map of the original single-temporal hyperspectral image. The reconstruction decoder process can be summarized as follows:

[0020]

[0021] where each indexed term represents the reconstructed feature map after the i-th deconvolution. A multilayer perceptron without shared weights then projects the resulting feature maps into another feature space; the projection is expressed as follows:

[0022]

[0023] Finally, the reconstructed feature map $X_{rec}$ and the detection feature map $X_{det}$ are obtained.

[0024] Furthermore, in the encoding stage, to aggregate spatial-spectral information effectively and obtain discriminative differential features, a global fusion module and a local fusion module are embedded to realize long-range information interaction and local information aggregation, respectively. In the global fusion module, given an input feature..., the feature is first divided into P blocks along the channel dimension, and the blocks are concatenated along the columns to obtain... A 3×3 convolution is applied to this feature map, which is then restored to its original dimensions. Next, the restored feature map is fused with the initial feature map through a fully connected layer and the GELU function. The features are then divided into P blocks along the channel dimension and concatenated along the rows to obtain... The same 3×3 convolution, fully connected layer, and GELU function are applied, and the result is fused with the initial feature map. By slicing and reassembling the feature maps along the row and column directions, the global fusion module brings spatially distant pixels closer together and improves the interaction between long-range information.

[0025] Furthermore, the output of the global fusion module is input into the local fusion module for further feature extraction. First, 3×7 and 7×3 convolution operations are applied to the input feature map to extract local features in different directions, as shown in the following formula:

[0026] $x_{add} = x + f_{\mathrm{Conv}}^{3\times 7}(x) + f_{\mathrm{Conv}}^{7\times 3}(x)$

[0027] Next, global average pooling compresses the spatial dimensions of the feature map, a multilayer perceptron compresses the channels and extracts features, and the Swish function provides the activation. The attention is computed as follows:

[0028] $\mathrm{Att}(x) = \mathrm{Swish}(\mathrm{MLP}(\mathrm{GAP}(x_{add})))$

[0029] where $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$. The final output of the local fusion module can be described as follows:

[0030] $\mathrm{LMM}(x) = \mathrm{Att}(x) \odot x_{add}$

[0031] By multiplying the feature map with the shared channel attention weights, the resulting feature map highlights useful regions and suppresses useless regions. Through the designed local fusion module, local features can be effectively aggregated to achieve the fusion of spatial spectral attention.

[0032] Furthermore, the output feature maps of the detection decoder and the corresponding projection head are selected for final detection. The final change detection results can be described as follows:

[0033]

[0034] where $y_i$ is the predicted probability and fc(·) denotes the fully connected layer used for feature extraction and dimensionality reduction. The multi-scale masks are used to supervise the change detection results, with the cross-entropy function as the loss function:

[0035]

[0036] where N is the number of samples and $m_i$ is the mask label of the i-th sample. In addition, a reconstruction loss is designed for the output feature maps of the reconstruction decoder and the corresponding projector. The formula for the reconstruction loss function is as follows:

[0037]

[0038] The total loss function is the sum of the above loss functions:

[0039] $L_{total} = L_{sup} + L_{rec}$

[0040] Calculate the loss function, optimize the model parameters of the change detection framework based on the loss function and the backpropagation process, and obtain the trained change detection framework after training; use the trained change detection framework to discriminate input samples and output a change detection result map.

[0041] Furthermore, the present invention also provides a self-supervised hyperspectral change detection system, which is equipped with a control program for implementing the method described above.

[0042] The beneficial effects of this invention are as follows:

[0043] This invention proposes a self-supervised hyperspectral change detection method and system that can efficiently train models without the need for labeled samples or downstream fine-tuning, solving the problems of difficult pairing and labeling of multi-temporal samples. The proposed multi-scale mask simulation strategy utilizes original spectral information with multi-scale masks to achieve adaptive feature learning across different scales. The proposed global-local multi-scale feature extraction and aggregation network enables long-range feature interaction and aggregation of local spatial spectral features. Experimental results on publicly available hyperspectral image datasets demonstrate that the proposed STMNet outperforms state-of-the-art hyperspectral change detection methods.

[0044] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.

Attached Figure Description

[0045] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:

[0046] Figure 1 is a flowchart of the method of the present invention;

[0047] Figure 2 is a diagram of the single-phase mask-based change detection network (STMNet) for self-supervised hyperspectral change detection;

[0048] Figure 3 is a schematic diagram of the multi-scale change simulation of the present invention;

[0049] Figure 4 is a structural diagram of the global-local fusion module of the present invention;

[0050] Figure 5 shows the farmland dataset, where (a) is the image before the change, (b) is the image after the change, and (c) is the change detection result of the method described in this invention;

[0051] Figure 6 shows visualizations of different methods on the farmland dataset: (a) CVA, (b) PCAKM, (c) MSCD, (d) HyperNet, (e) BCGNet, (f) UAGSSL, (g) STMNet, and (h) the ground truth map.

Detailed Implementation

[0052] The technical solution of the present invention will now be described in detail with reference to the accompanying drawings.

[0053] Figure 1 is a flowchart of the method of the present invention, which provides a self-supervised hyperspectral change detection method and system. As shown in the figure, in the image acquisition stage, geometric distortion correction, radiometric correction, and other methods are used to achieve image registration quickly. The deep learning network used for change detection is shown in Figure 2: this invention uses multi-scale masks to simulate change regions, learns effective change features from single-temporal hyperspectral images, and obtains an efficient change detector. The network consists of three parts: multi-scale mask change simulation, global-local feature aggregation encoding-decoding, and self-supervised training. First, a multi-scale mask strategy based on the original spectrum is designed to simulate images with pseudo-second-temporal changes; it uses the original spectral information together with multi-scale masks to achieve adaptive feature learning across different scales. Second, a global-local multi-scale feature extraction and aggregation network is designed to detect changes between single-temporal hyperspectral images and masked hyperspectral images, achieving long-range feature interaction and aggregation of local spatial-spectral features. The end-to-end network proposed in this invention can efficiently train the model without labeled samples or downstream fine-tuning.

[0054] Specifically, the technical solution of the present invention includes the following:

[0055] 1. Hyperspectral Image Preprocessing: Single-temporal hyperspectral images are preprocessed, mainly by geometric distortion correction, spectral image vignetting correction, and radiometric correction. Patches are then extracted from the preprocessed dataset. So that samples can also be taken from patches centered on the edge pixels of the image, the image edges are first expanded by mapping, and 32×32 blocks are used as sliding windows to generate samples.
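The patch extraction in step 1 can be sketched in NumPy as below. This is a minimal illustration under assumptions not stated in the source: the "mapping" edge expansion is modeled as reflection padding (`np.pad` with `mode="reflect"`), the window moves one pixel at a time so that every pixel receives its own 32×32 patch, and `extract_patches` is a hypothetical helper name.

```python
import numpy as np


def extract_patches(image, patch=32):
    """Return one (patch, patch, B) block per pixel of an (H, W, B) hyperspectral cube.

    Assumptions: reflection padding models the edge expansion, and the
    sliding window has a stride of one pixel.
    """
    h, w, _ = image.shape
    half = patch // 2
    # Pad only the two spatial axes; the spectral axis is left untouched.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    patches = [padded[i:i + patch, j:j + patch, :] for i in range(h) for j in range(w)]
    # Pixel (i, j) of the original cube sits at offset (half, half) inside its patch.
    return np.stack(patches)
```

For a 20×18 cube with 4 bands this yields 360 patches of shape 32×32×4, ordered row by row.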

[0056] 2. Multi-scale mask change simulation: To obtain hyperspectral images that closely resemble real second-phase images, multi-scale masks are used to simulate the changing regions, as shown in Figure 3. To adapt to how ground features change under different conditions, different mask sizes and mask ratios are set. These independent multi-scale masks are randomly mixed with equal probability to obtain the final multi-scale mask M. The spectrum of the center pixel of the original patch is flipped horizontally (that is, its band order is reversed) to obtain the changing spectrum. The spectrum of the masked area is replaced with the changing spectrum, and random noise is added to the remaining areas to simulate the imaging differences found in real scenes. The formula for the mask change simulation strategy is as follows:

[0057]

[0058] where (1 − M) represents the unmasked area, and the remaining terms represent, respectively, the hyperspectral image after adding the mask, the hyperspectral image after adding Gaussian noise, and the flipped (inverted) spectral values.
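A minimal NumPy sketch of the mask change simulation follows, reading the prose as pseudo_t2 = M ⊙ s_flip + (1 − M) ⊙ (X + ε), where s_flip is the band-reversed centre-pixel spectrum and ε is Gaussian noise. The grid cell sizes (2, 4, 8 pixels), mask ratios (0.1, 0.25, 0.5), noise level σ = 0.01, and the choice of drawing one candidate mask per patch with equal probability are all illustrative assumptions, as are the helper names `grid_mask` and `simulate_change`.

```python
import numpy as np


def grid_mask(size, cell, ratio, rng):
    """Square binary mask: a random `ratio` of the (size/cell)^2 grid cells is set to 1."""
    n = size // cell
    flat = np.zeros(n * n)
    n_masked = max(1, int(round(ratio * n * n)))
    flat[rng.choice(n * n, size=n_masked, replace=False)] = 1.0
    # Expand each grid cell into a cell x cell block of pixels.
    return np.kron(flat.reshape(n, n), np.ones((cell, cell)))


def simulate_change(patch, rng, cells=(2, 4, 8), ratios=(0.1, 0.25, 0.5), sigma=0.01):
    """Return (pseudo second-temporal patch, mask M) for a square (S, S, B) patch."""
    s = patch.shape[0]
    # Independent masks at several scales and ratios; one is drawn with equal probability.
    masks = [grid_mask(s, c, r, rng) for c in cells for r in ratios]
    m = masks[rng.integers(len(masks))]
    # Horizontal flip of the centre-pixel spectrum = reversing its band order.
    flipped = patch[s // 2, s // 2, ::-1]
    noisy = patch + rng.normal(0.0, sigma, size=patch.shape)
    m3 = m[:, :, None]
    # Masked pixels take the flipped spectrum; unmasked pixels keep the noisy original.
    return m3 * flipped + (1.0 - m3) * noisy, m
```

The returned mask M doubles as the pixel-level label used later by the supervised loss, which is what lets training proceed without manual annotation.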

[0059] 3. The single-temporal hyperspectral image and the pseudo-second-temporal hyperspectral image obtained after mask simulation are input into the global-local feature aggregation encoder-decoder. First, rich multi-scale features are obtained in the encoder stage. Then, the two temporal feature maps at each scale are subtracted to obtain differential feature maps. These multi-scale differential features are then connected to the decoder via skip connections, which links features across layers and reduces the loss of detail.

[0060] 4. In the global-local feature aggregation encoder-decoder, the encoding stage consists of three downsampling units. Each downsampling unit contains a global fusion module, a local fusion module, and a downsampling operation. The encoder stage can be summarized as follows:

[0061]

[0062] where $f_i$ denotes the feature map generated by the i-th convolutional layer for the input patch, $f_{\mathrm{Conv}}^{3\times 3}(\cdot)$ denotes a convolution with a 3×3 kernel, and LMM(·) and GMM(·) denote the local fusion module and the global fusion module, respectively. Every convolutional layer is followed by a batch normalization layer and a LeakyReLU activation layer.
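The dual-temporal flow of steps 3 and 4 (encode both images with three downsampling units, then subtract same-scale features) can be illustrated with the toy NumPy sketch below. It is not the patented encoder: each downsampling unit is replaced by a hypothetical 1×1 channel projection, LeakyReLU, and 2×2 average pooling, so a 32×32 patch yields maps at 16×16, 8×8 and 4×4 (the factor-2 downsampling is itself an assumption). Sharing weights between the two branches is also assumed; the signed subtraction follows the text, although an absolute difference is a common alternative.

```python
import numpy as np


def avg_pool2(x):
    """2x2 average pooling of a (C, H, W) map; stands in for the learned downsampling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def toy_encoder(x, weights):
    """Three placeholder 'downsampling units', returning one feature map per scale."""
    feats = []
    for w in weights:
        x = np.einsum("oc,chw->ohw", w, x)    # 1x1 projection over channels
        x = np.where(x > 0, x, 0.01 * x)      # LeakyReLU
        x = avg_pool2(x)
        feats.append(x)
    return feats


def difference_features(x1, x2, weights):
    """Encode both temporal inputs with shared weights and subtract same-scale maps."""
    f1, f2 = toy_encoder(x1, weights), toy_encoder(x2, weights)
    return [a - b for a, b in zip(f1, f2)]
```

Identical inputs give all-zero differences at every scale, so any non-zero response comes only from the simulated change.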

[0063] 5. In the global-local feature aggregation encoder-decoder, the decoding stage includes a detection decoder and a reconstruction decoder. The feature maps of the two encoder branches are subtracted to obtain differential feature maps, which are then input into a multi-layer upsampling detection decoder that restores them to the original input size. The detection decoder process can be summarized as follows:

[0064]

[0065] where $f_i$ denotes the output feature map after the i-th deconvolution, $f_{\mathrm{DConv}}^{3\times 3}(\cdot)$ denotes a deconvolution with a 3×3 kernel, $f_{\mathrm{Conv}}^{1\times 1}(\cdot)$ denotes a convolution with a 1×1 kernel, and [;] denotes concatenation along the channel dimension, which implements the skip connections between feature maps.
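The channel-wise stacking [;] in the detection decoder can be sketched as follows. Nearest-neighbour ×2 upsampling stands in for the 3×3 deconvolution, a per-pixel matrix product implements the 1×1 convolution, and the function names and channel widths are hypothetical; the sketch only shows how coarse difference maps are upsampled, concatenated with the same-scale difference maps, and brought back to the input size.

```python
import numpy as np


def upsample2(x):
    """Nearest-neighbour x2 upsampling of (C, H, W); placeholder for a strided deconvolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel linear map over channels; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def detection_decoder(diffs, proj):
    """diffs: difference maps ordered fine -> coarse; proj: one 1x1 weight per skip connection."""
    x = diffs[-1]
    for skip, w in zip(reversed(diffs[:-1]), proj):
        x = upsample2(x)
        x = np.concatenate([x, skip], axis=0)  # the [;] skip connection
        x = conv1x1(x, w)
    return upsample2(x)  # back to the input patch size
```

With difference maps of shape (16, 16, 16), (32, 8, 8) and (64, 4, 4), the 1×1 weights must map 96 → 32 and 48 → 16 channels, and the output is a 16-channel 32×32 map.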

[0066] Simultaneously, we input the output of the mask branch into the reconstruction decoder to obtain the reconstructed feature map of the original single-temporal hyperspectral image. The reconstruction decoder process can be summarized as follows:

[0067]

[0068] where each indexed term represents the reconstructed feature map after the i-th deconvolution. A multilayer perceptron without shared weights then projects the resulting feature maps into another feature space. The projection is expressed as follows:

[0069]

[0070] Finally, the reconstructed feature map $X_{rec}$ and the detection feature map $X_{det}$ are obtained.

[0071] 6. In the encoding stage, to aggregate spatial-spectral information effectively and obtain discriminative differential features, global fusion modules and local fusion modules are embedded to realize long-range information interaction and local information aggregation, as shown in Figure 4. In the global fusion module, given an input feature..., the feature is first divided into P blocks along the channel dimension, and the blocks are concatenated along the columns to obtain... A 3×3 convolution is applied to this feature map, which is then restored to its original dimensions. Next, the restored feature map is fused with the initial feature map through a fully connected layer and the GELU function. The features are then divided into P blocks along the channel dimension and concatenated along the rows to obtain... The same 3×3 convolution, fully connected layer, and GELU function are applied, and the result is fused with the initial feature map. By slicing and reassembling the feature maps along the row and column directions, the global fusion module brings spatially distant pixels closer together and improves the interaction between long-range information.
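One possible reading of the global fusion module's slice-and-reassemble step is sketched below in NumPy. It assumes that "concatenated along the columns" means joining the P channel groups along the width axis (and "along the rows" along the height axis), that the 3×3 convolution is depthwise with a single shared kernel, that the fully connected layer mixes channels per pixel, that GELU uses its tanh approximation, and that the row pass operates on the output of the column pass. All function names are hypothetical.

```python
import numpy as np


def split_concat(x, p, axis):
    """Split (C, H, W) into p channel groups and join them along a spatial axis (1 = rows, 2 = columns)."""
    return np.concatenate(np.split(x, p, axis=0), axis=axis)


def restore(y, p, axis):
    """Inverse of split_concat: cut the spatial axis into p pieces and stack them back on channels."""
    return np.concatenate(np.split(y, p, axis=axis), axis=0)


def conv3x3(x, k):
    """Depthwise 'same' 3x3 cross-correlation with one kernel shared by all channels."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    return sum(k[a, b] * xp[:, a:a + h, b:b + w] for a in range(3) for b in range(3))


def gelu(z):
    """tanh approximation of GELU."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def global_fusion(x, p, k, fc_col, fc_row):
    """Column pass, then row pass; each result is fused residually with the incoming map."""
    for axis, fc in ((2, fc_col), (1, fc_row)):
        y = restore(conv3x3(split_concat(x, p, axis), k), p, axis)
        x = x + gelu(np.einsum("oc,chw->ohw", fc, y))  # per-pixel FC over channels
    return x
```

For C = 8 channels and P = 4, the column pass works on a (2, H, 4W) map and the row pass on a (2, 4H, W) map; `restore` undoes each rearrangement exactly.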

[0072] 7. Next, the output of the global fusion module is input into the local fusion module for further feature extraction. First, 3×7 and 7×3 convolution operations are applied to the input feature map to extract local features in different directions, as shown in the following formula:

[0073] $x_{add} = x + f_{\mathrm{Conv}}^{3\times 7}(x) + f_{\mathrm{Conv}}^{7\times 3}(x)$

[0074] Next, global average pooling compresses the spatial dimensions of the feature map, a multilayer perceptron compresses the channels and extracts features, and the Swish function provides the activation. The attention is computed as follows:

[0075] $\mathrm{Att}(x) = \mathrm{Swish}(\mathrm{MLP}(\mathrm{GAP}(x_{add})))$

[0076] where $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$. The final output of the local fusion module can be described as follows:

[0077] $\mathrm{LMM}(x) = \mathrm{Att}(x) \odot x_{add}$

[0078] Multiplying the feature map by the shared channel attention weights results in a feature map that highlights useful regions and suppresses useless regions. Through the designed local fusion module, local features can be effectively aggregated to achieve spatial spectral attention fusion.
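The local fusion module equations translate fairly directly into the NumPy sketch below. The 3×7 and 7×3 convolutions are modeled as depthwise "same" cross-correlations with one kernel each, and the MLP as two weight matrices with a ReLU between them and a reduction ratio chosen by the caller; these details and the function names are assumptions rather than the patented configuration.

```python
import numpy as np


def conv_same(x, k):
    """Depthwise 'same' cross-correlation of a (C, H, W) map with one odd-sized (kh, kw) kernel."""
    kh, kw = k.shape
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for di in range(kh):
        for dj in range(kw):
            out += k[di, dj] * xp[:, di:di + h, dj:dj + w]
    return out


def swish(z):
    """Swish(z) = z * sigmoid(z); sigmoid is written via tanh to avoid overflow."""
    return z * 0.5 * (1.0 + np.tanh(0.5 * z))


def local_fusion(x, k37, k73, w1, w2):
    """LMM(x) = Att(x) * x_add, with Att(x) = Swish(MLP(GAP(x_add)))."""
    x_add = x + conv_same(x, k37) + conv_same(x, k73)  # directional local features
    gap = x_add.mean(axis=(1, 2))                      # GAP: (C, H, W) -> (C,)
    hidden = np.maximum(w1 @ gap, 0.0)                 # MLP squeeze (ReLU assumed)
    att = swish(w2 @ hidden)                           # MLP expand, then Swish: (C,)
    return att[:, None, None] * x_add                  # one weight per channel, shared over space
```

Because the attention vector has one entry per channel and is broadcast over all pixels, it rescales whole channels, which is the "shared channel attention weights" behaviour described above.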

[0079] 8. Select the output feature maps of the detection decoder and the corresponding projection head for final detection. The final change detection results can be described as follows:

[0080]

[0081] where $y_i$ is the predicted probability and fc(·) denotes a fully connected layer used for feature extraction and dimensionality reduction.

[0082] The multi-scale masks are used to supervise the change detection results, with the cross-entropy function as the loss function:

[0083]

[0084] where N is the number of samples and $m_i$ is the mask label of the i-th sample.

[0085] Meanwhile, a reconstruction loss is designed for the output feature maps of the reconstruction decoder and the corresponding projection head. The formula for the reconstruction loss function is as follows:

[0086]

[0087] The total loss function is the sum of the above loss functions:

[0088] $L_{total} = L_{sup} + L_{rec}$
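The training objective can be sketched as below. The supervised term is written as the per-pixel binary cross-entropy between predicted change probabilities and the simulated mask labels, one standard form of the cross-entropy loss described above. Because the reconstruction-loss formula is not reproduced in the text, mean squared error is used purely as an assumed placeholder, and the function names are hypothetical.

```python
import numpy as np


def supervised_loss(prob, mask, eps=1e-7):
    """Binary cross-entropy between change probabilities y_i and mask labels m_i."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))


def reconstruction_loss(x_rec, x_orig):
    """Assumed placeholder: mean squared error between reconstruction and original patch."""
    return float(np.mean((x_rec - x_orig) ** 2))


def total_loss(prob, mask, x_rec, x_orig):
    """L_total = L_sup + L_rec, with equal weights as in the text."""
    return supervised_loss(prob, mask) + reconstruction_loss(x_rec, x_orig)
```

A constant prediction of 0.5 costs ln 2 ≈ 0.693 under the supervised term regardless of the mask, which is a convenient sanity check at the start of training.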

[0089] Calculate the loss function, optimize the model parameters of the change detection framework based on the loss function and the backpropagation process, and obtain the trained change detection framework after training; use the trained change detection framework to discriminate input samples and output a change detection result map.
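Step S4, detection over the complete image, might look like the sketch below. `predict` is a hypothetical callable standing in for the trained network that returns one change probability per patch pair (for example, the value at the centre pixel of the detection output); reflection padding, row-by-row batching and the 0.5 threshold are likewise assumptions.

```python
import numpy as np


def detect_full_image(t1, t2, predict, patch=32, threshold=0.5):
    """Build a binary (H, W) change map from two co-registered (H, W, B) cubes.

    `predict(batch1, batch2)` is a hypothetical stand-in for the trained model: it takes
    two (N, patch, patch, B) batches and returns N change probabilities.
    """
    h, w, _ = t1.shape
    half = patch // 2
    pad = ((half, half), (half, half), (0, 0))
    p1, p2 = np.pad(t1, pad, mode="reflect"), np.pad(t2, pad, mode="reflect")
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        # One batch per image row: the patches centred on pixels (i, 0) .. (i, w - 1).
        rows1 = np.stack([p1[i:i + patch, j:j + patch] for j in range(w)])
        rows2 = np.stack([p2[i:i + patch, j:j + patch] for j in range(w)])
        out[i] = np.asarray(predict(rows1, rows2)) >= threshold
    return out
```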

[0090] As shown in Figure 5, the experimental results of the STMNet change detection network of this invention on an open-source hyperspectral farmland dataset show that the changed regions are detected very well. Comparative experiments further illustrate the detection performance of this invention: on the farmland dataset, the method was compared with existing methods, namely CVA, PCAKM, MSCD, HyperNet, BCGNet, and UAGSSL, and the Overall Accuracy, Kappa coefficient, Precision, Recall, and F1 score were calculated. Overall Accuracy is the proportion of correctly classified pixels; a higher Precision means that a larger proportion of the pixels predicted as changed are actually changed; a higher Kappa coefficient indicates higher agreement between the detection results and the ground truth beyond what chance would give; a higher Recall means that a larger proportion of the truly changed pixels are detected; and a higher F1 score, the harmonic mean of Precision and Recall, indicates a better overall result. Table 1 lists the values of each metric for the detection results of the different methods.
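For reference, the five metrics named above can be computed from a binary change map and its ground truth as in the sketch below, using the standard confusion-matrix definitions (Kappa compares the observed accuracy with the agreement expected by chance from the marginal totals). The function name is hypothetical, and edge cases such as a map with no predicted changes (division by zero) are not handled.

```python
import numpy as np


def change_metrics(pred, gt):
    """OA, Kappa, Precision, Recall and F1 for binary change maps (1 = changed)."""
    pred = np.asarray(pred).ravel().astype(bool)
    gt = np.asarray(gt).ravel().astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    # Chance agreement: predicted-positive x actual-positive plus predicted-negative x actual-negative.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"OA": oa, "Kappa": kappa, "Precision": precision, "Recall": recall, "F1": f1}
```

For example, a prediction of [1, 1, 0, 0, 1, 0, 0, 0] against ground truth [1, 0, 0, 0, 1, 1, 0, 0] gives OA = 0.75, Precision = Recall = F1 = 2/3, and Kappa = 7/15.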

[0091] Table 1 Comparison of STMNet and various methods on farmland datasets.

[0092]

[0093] It can be seen that the method of the present invention achieves the best accuracy on this dataset without using labeled samples. Figure 6 presents the visualized detection results of the above methods, which show that the method described in this invention outperforms other self-supervised hyperspectral image farmland change detection methods.

[0094] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications should be covered within the scope of the claims of the present invention.

Claims

1. A self-supervised hyperspectral change detection method, characterized in that the method includes the following steps: S1: hyperspectral image preprocessing; S2: multi-scale mask change simulation; S3: inputting the single-phase samples and the pseudo-second-phase samples obtained after mask simulation into the global-local feature aggregation encoder-decoder for training; S4: after training, performing change detection on the complete image to obtain the results. In step S2, multi-scale masks are used to simulate changing regions: different mask sizes and mask ratios are set, and these independent multi-scale masks are randomly mixed with equal probability to obtain the final multi-scale mask M. The spectrum of the center pixel of the original patch is horizontally flipped to obtain the changing spectrum, and the spectrum of the masked region is replaced with the changing spectrum. Random noise is added to the remaining regions to simulate imaging differences in real scenes. The mask change simulation strategy is expressed by a formula wherein (1 − M) represents the unmasked region, and the remaining terms represent the hyperspectral image after adding the mask, the hyperspectral image after adding Gaussian noise, and the flipped spectral values. In step S3, the single-temporal hyperspectral image and the pseudo-second-temporal hyperspectral image obtained after mask simulation are input into the global-local feature aggregation encoder-decoder. First, rich multi-scale features are obtained in the encoder stage. Then, the two temporal feature maps at each scale are subtracted to obtain the differential feature maps. These multi-scale differential features are then connected to the decoder through skip connections to link features across layers and reduce the loss of detail.

2. The self-supervised hyperspectral change detection method according to claim 1, wherein: In step S1, preprocessing is performed on the single-temporal hyperspectral image, including geometric distortion correction, spectral image vignetting correction, and radiometric correction. Block extraction is performed on the preprocessed dataset. When selecting samples, considering the patches centered on the edge pixels of the image, the edges of the image are first expanded in a mapping manner, and samples are generated using 32×32 blocks as sliding windows.

3. The self-supervised hyperspectral change detection method of claim 1, wherein in the global-local feature aggregation encoder-decoder, the encoding stage consists of three downsampling units, each containing a global fusion module, a local fusion module, and a downsampling operation. The encoder stage is summarized by a formula whose terms denote, respectively, the feature map generated by the i-th convolutional layer of the input patch, a convolution operation with a kernel size of 3×3, and the local fusion module and the global fusion module; all the convolutional layers are followed by a batch normalization layer and a LeakyReLU activation function layer.

4. The self-supervised hyperspectral change detection method according to claim 3, wherein in the global-local feature aggregation encoder-decoder, the decoding stage includes a detection decoder and a reconstruction decoder. The feature maps from the two encoder branches are subtracted to obtain a differential feature map, which is input into a multi-layer upsampling detection decoder that restores the feature map to the original input size. The detection decoder process is summarized by a formula whose terms denote the output feature map after the i-th deconvolution, a deconvolution with a 3×3 kernel, a convolution with a 1×1 kernel, and [;], a stacking operation along the channel dimension that implements the skip connection of the feature maps. Simultaneously, the output of the mask branch is input into the reconstruction decoder to reconstruct the feature map of the original single-temporal hyperspectral image; in the corresponding formula, the term denotes the reconstructed feature map after the i-th deconvolution. The resulting feature maps are then projected into another dimensional space by multilayer perceptrons that do not share weights, as given by a projection formula. Finally, the reconstructed feature map and the detection feature map are obtained.
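One detection-decoder step (difference, upsample, stack with the skip feature via [;], then a 1×1 convolution) can be sketched on nested lists. This is an illustration only: nearest-neighbour 2× upsampling is swapped in for the claimed 3×3 deconvolution to keep the example short, and the function names and toy values are hypothetical.

```python
def subtract(fa, fb):
    """Element-wise difference of two C x H x W maps at the same scale."""
    return [[[a - b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(fa, fb)]

def upsample2x(f):
    """Nearest-neighbour 2x upsampling (stand-in for a stride-2 3x3 deconv)."""
    return [[[v for v in row for _ in (0, 1)] for row in ch for _ in (0, 1)]
            for ch in f]

def conv1x1(f, weights):
    """1x1 convolution: a per-pixel linear map; weights is out_ch x in_ch."""
    h, w = len(f[0]), len(f[0][0])
    return [[[sum(wk[c] * f[c][i][j] for c in range(len(f))) for j in range(w)]
             for i in range(h)] for wk in weights]

def decoder_step(x, skip_diff, weights):
    """Upsample x, stack [up; skip_diff] along channels (list concatenation
    of channel lists), then fuse with a 1x1 convolution."""
    return conv1x1(upsample2x(x) + skip_diff, weights)

# Toy two-scale example: one channel at 1x1 (deepest) and at 2x2 (skip).
deep_t1, deep_t2 = [[[2.0]]], [[[1.0]]]
skip_t1 = [[[5.0, 5.0], [5.0, 5.0]]]
skip_t2 = [[[1.0, 2.0], [3.0, 4.0]]]
out = decoder_step(subtract(deep_t1, deep_t2),
                   subtract(skip_t1, skip_t2), [[1.0, 1.0]])
```

In the claimed network the reconstruction decoder and the two projection MLPs would run alongside this path with their own, unshared weights; they are omitted here.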

5. The self-supervised hyperspectral change detection method according to claim 3, characterized in that in the encoding stage, to aggregate spatial-spectral information effectively and obtain discriminative differential features, a global fusion module and a local fusion module are embedded to realize long-range information interaction and local information aggregation, respectively. In the global fusion module, first, given an input feature map..., the features are divided into P blocks along the channel dimension and concatenated along the columns to obtain.... A 3×3 convolution is applied to this feature map to reduce it to its original dimension, and the reduced feature map is fused with the initial feature map using a fully connected layer and the GELU function. The features are then divided into P blocks along the channel dimension and concatenated along the rows to obtain..., the same 3×3 convolution, fully connected layer, and GELU function are applied, and the result is again fused with the initial feature map. By slicing and reassembling the feature maps along both the row and column directions, the global fusion module brings spatially distant pixels closer together and improves interaction between long-range information.
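The slice-and-reassemble step of the global fusion module can be illustrated on a C×H×W feature map stored as nested lists. This sketch covers only the rearrangement, under one possible reading of "divided into P blocks along the channel dimension and concatenated along the columns (rows)": the P channel groups are placed side by side along the width (or stacked along the height). The exact output shapes are elided in the source, so this interpretation and the function names are assumptions; the subsequent 3×3 convolution, fully connected layer, and GELU are not shown.

```python
def split_channels(f, p):
    """Split a C x H x W feature map into P groups of C/P channels."""
    assert len(f) % p == 0, "C must be divisible by P"
    g = len(f) // p
    return [f[k * g:(k + 1) * g] for k in range(p)]

def concat_columns(groups):
    """Place the P groups side by side along width: (C/P) x H x (W*P)."""
    out = []
    for ch in range(len(groups[0])):
        rows = []
        for i in range(len(groups[0][ch])):
            row = []
            for grp in groups:
                row.extend(grp[ch][i])
            rows.append(row)
        out.append(rows)
    return out

def concat_rows(groups):
    """Stack the P groups along height: (C/P) x (H*P) x W."""
    return [[list(row) for grp in groups for row in grp[ch]]
            for ch in range(len(groups[0]))]

# C=4, H=1, W=2; channel k holds [10k, 10k + 1].
f = [[[10.0 * k, 10.0 * k + 1.0]] for k in range(4)]
cols = concat_columns(split_channels(f, 2))
rows = concat_rows(split_channels(f, 2))
```

After the rearrangement, values that came from different channel groups sit next to each other in the same spatial row or column, so a small 3×3 kernel can mix them.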

6. The self-supervised hyperspectral change detection method according to claim 5, characterized in that the output of the global fusion module is input into the local fusion module for further feature extraction. First, given an input feature map, 3×7 and 7×3 convolutions are applied to it to extract local features in different directions. Next, global average pooling compresses the spatial dimensions of the input feature map, followed by multilayer perceptron compression and feature extraction, with activation by the Swish function. The final output of the local fusion module is obtained by multiplying the feature map by the shared channel attention weights, so that the resulting feature map highlights useful regions and suppresses useless ones. Through this local fusion module, local features are effectively aggregated to achieve spatial-spectral attention fusion.
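A minimal sketch of the local fusion module follows, assuming that the 3×7 and 7×3 directional responses are summed (the combining formula is not given in the source), that the convolutions are depthwise (one kernel shared across channels here for brevity), and that the MLP is two linear layers each followed by Swish. These choices, the identity kernels, and the function names are illustrative assumptions. The per-channel weights derived from global average pooling are shared by both directional branches.

```python
import math

def swish(v):
    """Swish(x) = x * sigmoid(x)."""
    return v / (1.0 + math.exp(-v))

def conv2d_same(ch, kernel):
    """Single-channel 2D cross-correlation with zero padding that keeps
    the H x W size; kernel dimensions are assumed odd."""
    h, w = len(ch), len(ch[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:
                        out[i][j] += kernel[u][v] * ch[r][c]
    return out

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def local_fusion(x, k3x7, k7x3, w1, w2):
    """x: C x H x W. Directional responses are summed (assumption), then
    scaled by per-channel attention shared across both directions."""
    gap = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    attn = [swish(v) for v in matvec(w2, [swish(v) for v in matvec(w1, gap)])]
    out = []
    for a, ch in zip(attn, x):
        horiz, vert = conv2d_same(ch, k3x7), conv2d_same(ch, k7x3)
        out.append([[a * (p + q) for p, q in zip(hr, vr)]
                    for hr, vr in zip(horiz, vert)])
    return out

# Identity kernels make the directional convs pass the input through.
k3x7 = [[1.0 if (u, v) == (1, 3) else 0.0 for v in range(7)] for u in range(3)]
k7x3 = [[1.0 if (u, v) == (3, 1) else 0.0 for v in range(3)] for u in range(7)]
x = [[[1.0] * 3 for _ in range(3)]]
y = local_fusion(x, k3x7, k7x3, [[1.0]], [[1.0]])
```

Elongated 3×7 and 7×3 kernels cover a 7-pixel extent in each direction at a lower cost than a full 7×7 kernel, which fits the claim's goal of extracting local features along different directions.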

7. The self-supervised hyperspectral change detection method according to claim 1, characterized in that the final change detection result is obtained from the output feature maps of the detection decoder and the corresponding projection head. In the formula for the final change detection result, the terms denote the predicted probability, the i-th sample of the detection feature map, and a fully connected layer used for feature extraction and dimensionality reduction. The multi-scale masks are used to supervise the change detection results, with the cross-entropy function as the loss, whose terms denote the number of samples and the mask label of each sample. Simultaneously, a reconstruction loss is designed for the output feature maps of the reconstruction decoder and the corresponding projector, in which the term denotes the i-th sample of the reconstructed feature map. The total loss function combines the above losses. The loss is computed, and the model parameters of the change detection method are optimized through backpropagation to obtain the trained change detection model; the trained change detection framework is then used to classify input samples and output a change detection result map.
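The prediction and training objective of claim 7 can be sketched as follows. The source names a fully connected prediction layer and a cross-entropy loss supervised by the multi-scale masks, but the reconstruction-loss formula and the weighting in the total loss are elided. This sketch therefore assumes a two-class softmax output, mean-squared error for reconstruction, and a hypothetical weight `lam` (default 1.0); the function names are also illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_change_prob(feature, w, b):
    """Fully connected layer (2 outputs) + softmax; returns P(changed)."""
    logits = [sum(wi * xi for wi, xi in zip(row, feature)) + bi
              for row, bi in zip(w, b)]
    return softmax(logits)[1]

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy against mask labels (1 = masked/changed)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def mse(recon, target):
    """Assumed reconstruction loss: mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(recon, target)) / len(target)

def total_loss(probs, labels, recon, target, lam=1.0):
    """Detection loss plus (hypothetically weighted) reconstruction loss."""
    return cross_entropy(probs, labels) + lam * mse(recon, target)

p = predict_change_prob([1.0], [[0.0], [0.0]], [0.0, 0.0])
loss = total_loss([0.5, 0.5], [1, 0], [1.0, 2.0], [1.0, 4.0])
```

The mask labels come from step S2 rather than from manual annotation, so both loss terms can be computed on unlabeled single-temporal data.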

8. A self-supervised hyperspectral change detection system, characterized in that... the system employs the method described in any one of claims 1 to 7.
