Self-supervised meta-transfer learning hyperspectral target detection method and system

By employing a self-supervised meta-transfer learning framework, combined with global-local spectral contrast learning and the maximum distance triplet loss function, the problems of insufficient training samples and poor adaptability to complex scenes in hyperspectral target detection are solved, achieving efficient target detection results.

CN120147618B · Active · Publication Date: 2026-06-16 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2025-03-07
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing hyperspectral target detection methods suffer from insufficient training samples and poor adaptability to complex scenes, especially in single-image training and testing, where they exhibit poor generalization and are difficult to effectively transfer to new datasets for target detection.

Method used

A self-supervised meta-transfer learning framework is adopted. The model is pre-trained using a Global-Local Spectral Contrast Learning (GLSL) module and a Maximum Distance Triplet (MDTriplet) loss function, and then fine-tuned using a Siamese network and contrastive loss. An adaptive spatial-spectral enhancement model then jointly learns spatial and spectral information to generate the hyperspectral target detection results.

🎯Benefits of technology

It improves the robustness and generalization ability of the model, significantly enhances the performance of hyperspectral target detection, and can effectively identify and separate targets in different scenarios, reducing the need for a large amount of labeled data.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to a self-supervised meta-transfer learning hyperspectral target detection method and system, and belongs to the technical field of deep learning. The method comprises the following steps: S1: preparing initial source domain and target domain data of a hyperspectral image; S2: performing self-supervised pre-training of a GLSL network by using source data; S3: migrating the GLSL to a target detection data set to perform spectral similarity detection; and S4: performing learning constraint by using a spatial constraint module to obtain a target detection result. The performance of the method is better than that of other hyperspectral image target detection methods, the method can be effectively migrated to different target detection tasks, and the method has an advantage over other methods in background suppression.

Description

Technical Field

[0001] This invention belongs to the field of deep learning technology and relates to a self-supervised meta-transfer learning method and system for hyperspectral target detection.

Background Technology

[0002] In the continuous innovation of remote sensing technology, the spatial and spectral resolution of remote sensing images acquired by sensors have achieved a qualitative leap. Hyperspectral images are generated by capturing radiation data of a target scene over a large range of continuous wavelengths. These images typically consist of hundreds or even thousands of consecutive narrow bands, each corresponding to a specific region of the electromagnetic spectrum. Hyperspectral images have high spectral resolution, with each band usually having a narrow bandwidth. This high resolution allows for precise differentiation of object or scene features across different bands. Hyperspectral target detection focuses on locating and identifying specific target pixels in hyperspectral images, aiming to separate targets of interest from various backgrounds, typically requiring only a small amount of prior spectral information about the target. However, due to limited prior knowledge of the target and the existence of phenomena such as "different objects with the same spectrum" and "different spectra for the same object," hyperspectral target detection faces long-standing challenges.

[0003] In the early exploratory stages, numerous methods emerged in the field of hyperspectral target detection. Typical methods include constrained energy minimization (CEM), which is based on a finite impulse response filter, and orthogonal subspace projection, which detects targets by projecting pixel signals onto a subspace orthogonal to each background endmember. However, these methods struggle to exploit the nonlinear characteristics of the spectrum and adapt poorly to complex detection scenarios. Many researchers therefore proposed kernel-based target detectors, such as kernel orthogonal subspace projection, kernel-based constrained energy minimization, and kernel matched subspace detectors. However, traditional machine learning models typically extract shallow features, which limits them when the target and background are complex and not linearly separable, so they fail to achieve ideal detection results. In recent years, deep learning, with its superior feature extraction capabilities and powerful parallel computing capabilities, has demonstrated broad application prospects and significant advantages in multiple areas of hyperspectral image processing. Its highly automated feature learning mechanism enables deep learning models to automatically extract hierarchical and discriminative feature representations from complex hyperspectral data. Notably, deep learning has shown superiority and efficiency in hyperspectral target detection. Researchers have proposed a series of frameworks for deep learning-based hyperspectral target detection algorithms. For example, to reduce the loss of spectral information, researchers have proposed a two-stream convolutional network framework. Furthermore, hyperspectral target detection algorithms based on interpretable representation networks, deep metric learning, and lightweight convolutional neural networks have also shown good performance. These algorithms are typically trained and tested on the same image scene and rely on a small amount of target prior information to synthesize training samples, which requires significant computational time. Moreover, training and testing on a single image leads to poor generalization, making it difficult to transfer learned features to new datasets for target detection, which greatly limits the application scenarios of these algorithms. To alleviate these problems, researchers have proposed some algorithms based on few-shot learning. For example, one researcher proposed a semi-supervised adaptive few-shot learning detector. However, these methods still require time to filter or generate samples in order to fine-tune for specific hyperspectral target detection tasks.

Summary of the Invention

[0004] In view of this, the purpose of this invention is to provide a self-supervised meta-transfer learning hyperspectral target detection method and system. Its core idea is to utilize a self-supervised meta-learning-transfer learning framework to achieve effective transfer from the source domain to the target domain. This invention consists of two modules: a self-supervised meta-transfer learning and pre-training framework, and an adaptive spatial spectral enhancement detector. First, in the meta-training stage, we use labeled hyperspectral image classification data to randomly construct multiple positive and negative sample pairs from different land covers, training the model to effectively distinguish spectral differences and improve its sensitivity to spectral variations in hyperspectral images. Next, we use a Global-Local Spectral Contrast Learning (GLSL) module and Maximum Distance Triplet (MDTriplet) loss to train the model to effectively distinguish spectral differences, and then transfer the pre-trained model to different target detection tasks, fine-tuning it using single target and background samples. Finally, we employ an adaptive spatial spectral enhancement model, jointly learning spatial information constraints and spectral information constraints to obtain the final detection result. To achieve the above objectives, this invention provides the following technical solution:

[0005] A self-supervised meta-transfer learning method and system for hyperspectral target detection includes the following steps: S1: preparing source domain data and target domain data of hyperspectral images; S2: performing self-supervised pre-training using hyperspectral classification source data to obtain a Global-Local Spectral Contrast Learning (GLSL) module that can effectively distinguish the similarities and differences between spectra; S3: transferring GLSL to the target detection dataset to perform spectral similarity detection and obtain a preliminary target detection result map; S4: further utilizing a Spatial Constraint Learning (SCLM) module to apply joint spatial and spectral learning constraints and obtain the final target detection result.

[0006] Furthermore, in step S1, hyperspectral image source domain data and target domain data are prepared. Specifically, the source domain data uses a hyperspectral classification dataset containing rich ground cover information. To enhance the model's ability to perceive spectral changes, spectral-level contrastive learning is performed by constructing positive and negative sample pairs. Following the idea of meta-representation, two types of spectra are randomly selected in each task to form a set of positive and negative sample pairs, increasing task diversity. For the target domain dataset, four different hyperspectral target domain datasets are prepared for detection.
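The task construction in step S1 can be sketched as follows. This is a minimal illustration rather than the patented procedure: it assumes that "two types of spectra" means two land-cover classes, and the function name `sample_triplet_task` and the dictionary layout `spectra_by_class` are hypothetical.

```python
import random


def sample_triplet_task(spectra_by_class, rng=random):
    """Draw one (anchor, positive, negative) triplet from labeled spectra.

    spectra_by_class maps a land-cover label to a list of spectra, each a
    list of floats. This layout is an assumption made for the sketch.
    """
    # A class can supply the anchor/positive pair only if it has two spectra.
    eligible = [c for c, s in spectra_by_class.items() if len(s) >= 2]
    if not eligible:
        raise ValueError("need a class with at least two spectra")
    pos_class = rng.choice(eligible)
    # The negative comes from any other non-empty class.
    others = [c for c, s in spectra_by_class.items() if c != pos_class and s]
    if not others:
        raise ValueError("need a second, non-empty class")
    neg_class = rng.choice(others)
    anchor, positive = rng.sample(spectra_by_class[pos_class], 2)
    negative = rng.choice(spectra_by_class[neg_class])
    return anchor, positive, negative
```

Because the two classes are redrawn for every task, repeated calls produce varied sample pairs from a single labeled classification scene, which is the source of task diversity described above.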

[0007] Furthermore, in step S2, during the pre-training phase, to alleviate the problem of insufficient training samples for hyperspectral target detection, the proposed GLSL network is first pre-trained using an open-source labeled hyperspectral image classification dataset. To extract spectral feature information more effectively, the GLSL network extracts information progressively, from local to global perspectives. Specifically, the GLSL module consists of three branches: Local Vision (LV), Global Vision (GV), and Spatial Frequency Fusion (SFFM). In spectral analysis, subtle differences between adjacent bands often contain important information. Convolution can effectively extract features from these adjacent bands, helping the model understand the interactions between different bands. We therefore designed the LV module, which splits the input spectral data into two parts, F1 and F2, for separate processing. F1 uses 1×1 convolutions to adjust the feature dimension and to introduce an additional nonlinear transformation. F2 extracts multi-level local features from the spectral data by applying convolution kernels of different sizes layer by layer. Finally, the features obtained from the two parallel paths are fused, and a fully connected layer maps the extracted features to a lower-dimensional space, resulting in the feature map L. Convolutional operations preserve local spectral details to some extent, but because a spectrum typically contains dozens or even hundreds of bands, they neglect global information. Therefore, in the GV module, we use a Transformer structure to establish global relationships and long-range dependencies through a self-attention mechanism. Specifically, the input spectral data is passed sequentially through normalization, multi-head self-attention (MSA), an MLP layer, and a fully connected layer to extract spectral features from multiple adjacent bands. The residuals are then concatenated and fused to reduce information loss from shallow to deep layers. In summary, the GV module extracts and integrates global features using the MSA mechanism and MLP components. Through the combination of these components, the module can capture complex dependencies between different bands when processing hyperspectral data, thereby achieving global feature learning. To further improve the quality of feature representation, the SFFM module was designed, with the main goal of integrating local spectral details and global contextual information. In SFFM, the one-dimensional (1D) input is reconstructed into two-dimensional (2D) data, and feature extraction is performed through 2D convolution. This helps capture local correlations in spectral data and learn potential relationships between originally non-adjacent spectral bands. Specifically, the one-dimensional spectrum (1×1×d) is reshaped into a 1×1×b×b matrix, where b is the nearest integer square root of d. This is then fed into three 3×3 2D convolutional layers for spatial feature learning, with a ReLU activation function applied after each layer. The 2D feature map is then flattened into a 1D vector, and a fully connected layer converts the flattened feature vector back to 1D data. Finally, a 1D convolutional layer extracts the final features from the converted 1D data. Through this series of operations, SFFM effectively fuses the spatial and frequency domain features of the input spectrum, thereby improving its ability to capture high-dimensional features. This yields a strong feature representation, the feature map S.
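The 1D-to-2D reshaping step in SFFM can be illustrated with a short sketch. The text does not say how b is chosen when d is not a perfect square, so this example assumes b is rounded up and the extra cells are zero-padded; the function name `spectrum_to_square` and the `pad_value` parameter are hypothetical.

```python
import math


def spectrum_to_square(spectrum, pad_value=0.0):
    """Reshape a length-d spectrum into a b x b grid (list of rows).

    Assumption for this sketch: b is the smallest integer with b*b >= d,
    and unused trailing cells are filled with pad_value.
    """
    d = len(spectrum)
    b = math.isqrt(d)
    if b * b < d:
        b += 1  # round up so every band fits in the grid
    padded = list(spectrum) + [pad_value] * (b * b - d)
    # Row r holds bands r*b .. r*b + b - 1 of the padded spectrum.
    return [padded[r * b:(r + 1) * b] for r in range(b)]
```

In the resulting grid, band k and band k + b are vertical neighbors, so a 3×3 2D kernel mixes bands that were b positions apart in the original spectrum. This is one way to read the statement that SFFM learns relationships between originally non-adjacent bands.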

[0008] Furthermore, in step S2, a maximum distance triplet (MDTriplet) loss is proposed. During the pre-training phase, a Siamese network is used for contrastive learning between spectra, and two similar spectra and one dissimilar spectrum are fed in sequentially. The MDTriplet loss function is designed for this purpose. The general triplet loss is a loss function for learning contrastive feature representations and is commonly used to train contrastive learning models. Its goal is to minimize the distance between the anchor and the positive sample and to maximize the distance between the anchor and the negative sample. The general triplet loss can be defined as follows (taking Euclidean distance as an example):

[0009]

[0010] Here, F(A_i), F(P_i), and F(G_i) denote the feature extraction function applied to the anchor sample A_i, the positive sample P_i, and the negative sample G_i, respectively, and α is the margin parameter.
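For reference, the general triplet loss described above can be sketched in code. The sketch assumes the widely used form, a sum over triplets of max(‖F(A_i) − F(P_i)‖² − ‖F(A_i) − F(G_i)‖² + α, 0), which is consistent with the symbols defined in the text but is not confirmed to be the patent's exact expression. The inputs are taken to be embeddings that have already passed through F, and the function names are hypothetical. The MDTriplet variant is not implemented here.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, alpha=1.0):
    """General triplet loss over a batch of embedded triplets.

    Each argument is a list of embeddings F(A_i), F(P_i), F(G_i).
    alpha is the margin; its default value here is arbitrary.
    """
    total = 0.0
    for a, p, g in zip(anchors, positives, negatives):
        # A triplet contributes only while the negative is not yet
        # farther from the anchor than the positive by at least alpha.
        total += max(squared_distance(a, p) - squared_distance(a, g) + alpha, 0.0)
    return total
```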

[0011] Although the general triplet loss is effective in classification tasks, the model may over-optimize triplets in which the positive and negative samples are very close. MDTriplet loss is therefore designed to also take the distance between the positive and negative samples into account. By comparing both the distance between the anchor and the negative sample and the distance between the positive and negative samples, it evaluates each triplet more comprehensively, thereby improving the model's robustness and generalization ability. MDTriplet loss is defined as follows:

[0012]

[0013] Where N is the number of samples in the batch.

[0014] Furthermore, in step S3, transfer learning is used to transfer the trained GLSL network to different hyperspectral target detection tasks. In the proposed method, considering the gap between the source and target domains, a fine-tuning method from transfer learning is chosen for the transfer. Also, considering the scarcity of target spectra, only one target sample and one background sample are used for model fine-tuning. Then, the target sample and the target prior are input into the fine-tuned GLSL network for spectral feature enhancement. The similarity between the enhanced spectrum and the prior knowledge is calculated to obtain the initial target detection result map.

[0015] Further, in step S3, a Siamese network and contrastive loss are used for fine-tuning. Siamese neural networks perform well with limited labeled data, making them particularly suitable for contrastive learning tasks. Therefore, in the fine-tuning stage, we employ a Siamese neural network structure, embedding the pre-trained GLSL module into the network and fine-tuning the entire GLSL module using a small number of samples. Specifically, we randomly select one target sample and one background sample from the hyperspectral target detection dataset. We input the target or background sample together with the target prior into the network for spectral feature enhancement, and calculate the similarity between the enhanced spectrum and the prior. Contrastive loss measures the similarity between sample pairs and is commonly used to train deep learning models for similarity learning. It optimizes the model by minimizing the distance between similar sample pairs and maximizing the distance between dissimilar sample pairs. More importantly, the contrastive loss function can learn useful features from limited samples, reducing the need for large amounts of labeled data. Therefore, in the fine-tuning stage, we use contrastive loss to minimize the distance between positive samples and the target prior, and to maximize the distance between negative samples and the target prior. Specifically, let the embedding vectors of the sample pair (x1, x2) be z1 and z2, where z_i is the embedded representation output by the network for x_i. The similarity between the positive or negative sample and the target prior is then calculated from these embeddings:

[0016]

[0017] The contrastive loss function is then defined as follows:

[0018]

[0019] Here, y is the label of the sample pair: y = 1 if the pair belongs to the same class, and y = 0 if it belongs to different classes. margin is a hyperparameter representing the minimum distance between dissimilar sample pairs. N is the number of sample pairs; here N = 1. For a positive pair, this distance should be as small as possible; otherwise, it should be greater than margin.
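The fine-tuning objective can be sketched as below. It assumes the standard contrastive loss form, (1/2N) Σ [y·d² + (1 − y)·max(margin − d, 0)²] with d the Euclidean distance between z1 and z2. This matches the roles of y, margin, and N described above, but the factor 1/2 and the squared terms are assumptions, and the function names are hypothetical.

```python
import math


def euclidean_distance(z1, z2):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2)))


def contrastive_loss(pairs, margin=1.0):
    """Contrastive loss over a list of (z1, z2, y) tuples.

    y = 1 marks a same-class pair (pulled together); y = 0 marks a
    different-class pair (pushed apart until d exceeds margin).
    """
    total = 0.0
    for z1, z2, y in pairs:
        d = euclidean_distance(z1, z2)
        total += y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
    return total / (2 * len(pairs))
```

With N = 1, as in the text, each fine-tuning step evaluates a single pair, such as (target sample, target prior) with y = 1 or (background sample, target prior) with y = 0.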

[0020] Furthermore, in step S3, an initial target detection result map is obtained. Through the combination of pre-training and fine-tuning, GLSL not only maintains its excellent spectral discrimination ability but also exhibits greater flexibility and adaptability in specific tasks. Specifically, after the target prior t and the spectrum under test x are input into the GLSL module for feature enhancement, the extracted features f_t and f_x are used to calculate the cosine similarity between the spectrum under test and the target prior. Finally, these similarity values are used to generate a similarity map S, thereby effectively capturing and identifying subtle spectral differences.

[0021] $$S(i,j) = \frac{f_t \cdot f_x(i,j)}{\lVert f_t \rVert_2 \, \lVert f_x(i,j) \rVert_2}$$

[0022] Where S(i,j) represents the similarity value at position (i,j), and f_x(i,j) is the feature extracted from the spectrum at that position.
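A minimal sketch of building the similarity map from enhanced features follows. It assumes the features are already available as plain lists (the GLSL network itself is not modeled), and the names `cosine_similarity`, `similarity_map`, and `feature_cube` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_map(feature_cube, target_feature):
    """Compute S(i, j) for every pixel.

    feature_cube[i][j] is the enhanced feature vector of the spectrum at
    pixel (i, j); target_feature is the enhanced target prior.
    """
    return [[cosine_similarity(f, target_feature) for f in row]
            for row in feature_cube]
```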

[0023] Furthermore, in step S4, the spatial features of the target are fused to further optimize the target detection result map, achieving spatial-spectral joint constraints. During the learning phase of spatial-spectral joint constraints, various techniques, including neighborhood operations and morphological operations, are employed to refine the target region and remove noise. In neighborhood operations, the confidence levels of each target pixel and its neighborhood are dynamically adjusted by analyzing each pixel, thereby better identifying and preserving the true target region. Specifically, based on the spectral similarity map S, pixels with a confidence level greater than r are designated as target pixels, labeled as 1, and r is set to 0.6. Neighborhood pixels are analyzed with the target pixel as the center point. The specific analysis operations include traversing all target pixels and calculating the sum of labels within both the small and large neighborhoods of the target pixel position (i,j). For example, to update the confidence level of each target pixel within a small 3×3 neighborhood, we first calculate the sum N3(i,j) of the target pixels within the 3×3 neighborhood of that point. If N3(i,j) is less than a set threshold k1, it indicates that the point is more likely to be an isolated noise point, and therefore the confidence level of that point needs to be reduced. Then, we use a decreasing exponential function q based on N3(i,j) to calculate the weight w3:

[0024]

[0025] For N3(i,j) < k1, the weight is applied to update the confidence map S:

[0026] S'(i,j)=max(S(i,j)-w3,0)

[0027] Because q is a decreasing function, the calculated weight w3 is larger when the label sum N3(i,j) in the neighborhood is small. A small sum indicates that the target pixel is more likely to be an isolated noise point, so its confidence level is reduced more strongly. On this basis, the confidence level of each target pixel is then updated within a large 7×7 neighborhood. If the sum N7(i,j) of target pixels in the 7×7 neighborhood of the point is greater than a set threshold k2, the neighborhood is likely a large false target, and the confidence level of that point needs to be reduced. The weight w7 is then calculated from N7(i,j) using the increasing function 1-q:

[0028]

[0029] For N7(i,j) > k2, the weight is applied to update the confidence map S':

[0030] S"(i,j)=max(S'(i,j)-w7,0)

[0031] An increasing function is used because the larger N7(i,j) is, the more likely the region is a false target, and the more its confidence needs to be reduced.
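The two neighborhood passes can be sketched as follows. Only r = 0.6 and the update rules S' = max(S − w3, 0) and S'' = max(S' − w7, 0) come from the text. The exponential form q(n) = exp(−n/τ), the scale values `tau3` and `tau7`, the threshold values `k1` and `k2`, the inclusion of the center pixel in the label sum, and the clipping of windows at the image border are all assumptions made for illustration.

```python
import math


def neighborhood_sum(labels, i, j, half):
    """Sum of labels in the (2*half+1)^2 window around (i, j), clipped at borders."""
    rows, cols = len(labels), len(labels[0])
    total = 0
    for r in range(max(0, i - half), min(rows, i + half + 1)):
        for c in range(max(0, j - half), min(cols, j + half + 1)):
            total += labels[r][c]
    return total


def q(n, tau):
    """Assumed decreasing exponential; the patent's exact q is not shown."""
    return math.exp(-n / tau)


def refine_confidence(S, r=0.6, k1=3, k2=40, tau3=3.0, tau7=20.0):
    """Apply the 3x3 and 7x7 confidence updates to every target pixel."""
    labels = [[1 if v > r else 0 for v in row] for row in S]
    out = [row[:] for row in S]
    for i, row in enumerate(labels):
        for j, is_target in enumerate(row):
            if not is_target:
                continue
            n3 = neighborhood_sum(labels, i, j, 1)
            if n3 < k1:  # few target neighbors: likely isolated noise
                out[i][j] = max(out[i][j] - q(n3, tau3), 0.0)
            n7 = neighborhood_sum(labels, i, j, 3)
            if n7 > k2:  # very dense neighborhood: likely a large false target
                out[i][j] = max(out[i][j] - (1.0 - q(n7, tau7)), 0.0)
    return out
```

Both label sums are computed from the initial thresholded labels rather than from the updated map; the text does not state which is intended, so this is a further assumption.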

[0032] To optimize the boundaries of the target region, remove edge noise, and enhance target segmentation, morphological operations of erosion and dilation were employed after each neighborhood operation. Specifically, a 3×3 kernel was first used to erode the target region. This operation helps remove smaller noise points without affecting larger target structures. Next, a 7×7 kernel was used to dilate the eroded target region. The purpose of dilation is to connect adjacent target regions, making the segmented target more complete and coherent. Through these operations, we can not only reduce false detections but also ensure more accurate boundaries of the target region. Finally, the updated similarity map is used as the target detection map.
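The erosion and dilation steps can be sketched on a binary target mask. The kernel sizes (3×3 erosion, 7×7 dilation) are taken from the text; the square all-ones structuring element, the clipping of windows at the image border, and the function names are assumptions. How the cleaned mask is combined with the similarity map is not specified in the text and is not shown.

```python
def _morph(mask, half, require_all):
    """Shared window scan: erosion if require_all is True, else dilation."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [mask[r][c]
                      for r in range(max(0, i - half), min(rows, i + half + 1))
                      for c in range(max(0, j - half), min(cols, j + half + 1))]
            out[i][j] = int(all(window)) if require_all else int(any(window))
    return out


def erode(mask, size=3):
    """Keep a pixel only if every pixel in its size x size window is set."""
    return _morph(mask, size // 2, True)


def dilate(mask, size=7):
    """Set a pixel if any pixel in its size x size window is set."""
    return _morph(mask, size // 2, False)
```

Calling `dilate(erode(mask))` removes specks smaller than the 3×3 kernel and then regrows the surviving regions. Because the dilation kernel is larger than the erosion kernel, surviving regions end up larger than they started, which is what lets nearby target fragments merge.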

[0033] The beneficial effects of this invention are as follows:

[0034] The self-supervised meta-transfer learning hyperspectral target detection algorithm proposed in this invention consists of two stages: pre-training and joint spatial-spectral constraints. This achieves effective transfer of the network model, enhances its ability to capture different features in hyperspectral data, improves the network's robustness and generalization ability, and significantly improves target detection performance. Experimental results on four hyperspectral image datasets, compared with the performance of several state-of-the-art methods, demonstrate that the proposed method outperforms current state-of-the-art hyperspectral image target detection methods.

[0035] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.

Brief Description of the Drawings

[0036] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:

[0037] Figure 1 is a flowchart of the method of the present invention;

[0038] Figure 2 is a schematic diagram of the principle of self-supervised meta-transfer learning (SelfMTL) for hyperspectral target detection based on contrastive representation;

[0039] Figure 3 is a structural diagram of the global-local spectral contrast learning network of the present invention;

[0040] Figure 4 shows visualizations of different target detection methods on four hyperspectral image datasets, where (a) is the ground truth map, (b) CEM, (c) HCEM, (d) CSCR, (e) DSC, (f) MLSN, (g) LCNN-CD, (h) HTD-IRN, and (i) SelfMTL;

[0041] Figure 5 is a box plot comparing the target-background separation achieved by the present invention.

Detailed Implementation

[0042] The technical solution of the present invention will now be described in detail with reference to the accompanying drawings.

[0043] As shown in Figure 1, this invention provides a self-supervised meta-transfer learning hyperspectral target detection method and system. The source domain data uses a hyperspectral classification dataset containing rich ground cover information. To enhance the model's ability to perceive spectral changes, spectral-level contrastive learning is performed by constructing positive and negative sample pairs. Following the idea of meta-representation, two types of spectra are randomly selected in each task to form a set of positive and negative sample pairs, increasing task diversity. For the target domain dataset, four different hyperspectral target domain datasets are prepared for detection. As shown in Figure 2, the global-local spectral contrastive learning network enhances the model's ability to capture different features in hyperspectral data. To alleviate the problem of insufficient training samples, we first pre-train the proposed GLSL network using an open-source labeled hyperspectral classification dataset. Subsequently, we adopt MDTriplet loss and input the constructed positive and negative sample pairs into the GLSL module for spectral-level contrastive learning. Furthermore, to effectively transfer the GLSL module, which can distinguish significant spectral differences, to various target detection tasks, we fine-tune the model parameters using two sample spectra, one positive (from the target) and one negative (from the background). In the testing phase, we input the target prior and the spectrum under test into the GLSL module to obtain feature-enhanced spectral representations. Then, we calculate the similarity between these spectral features and use it to form a spectral similarity map. Finally, to fully utilize the spatial information of the target, the method employs an adaptive spectral-spatial enhancement module to optimize the final target detection result map using spatial-spectral constraints. The self-supervised meta-transfer learning framework designed in this invention effectively addresses the problem of limited training samples by randomly constructing positive and negative sample pairs. The adaptive spectral-spatial enhancement module designed in this invention fully integrates the spatial and spectral information in hyperspectral images. Specifically, the technical solution of the present invention includes the following:

[0044] 1. Source and Target Domain Data Preparation: The source domain data uses a hyperspectral classification dataset rich in ground cover information. To enhance the model's ability to perceive spectral changes, spectral-level contrastive learning is performed by constructing positive and negative sample pairs. Following the concept of meta-representation, two types of spectra are randomly selected in each task to form a set of positive and negative sample pairs, increasing task diversity. For the target domain dataset, four different hyperspectral target domain datasets are prepared for detection.

[0045] 2. Self-supervised pre-training: As shown in Figure 3, in the pre-training stage, to alleviate the problem of insufficient training samples for hyperspectral target detection, the proposed GLSL network is first pre-trained using an open-source labeled hyperspectral image classification dataset. To extract spectral feature information more effectively, the GLSL network extracts information progressively, from local to global perspectives. Specifically, the GLSL module consists of three branches: Local Vision (LV), Global Vision (GV), and Spatial Frequency Fusion (SFFM). In spectral analysis, subtle differences between adjacent bands often contain important information. Convolution can effectively extract features from these adjacent bands, helping the model understand the interactions between different bands. We therefore designed the LV module, which splits the input spectral data into two parts, F1 and F2, for separate processing. F1 uses 1×1 convolutions to adjust the feature dimension and to introduce an additional nonlinear transformation. F2 extracts multi-level local features from the spectral data by applying convolution kernels of different sizes layer by layer. Finally, the features obtained from the two parallel paths are fused, and a fully connected layer maps the extracted features to a lower-dimensional space, resulting in the feature map L. Convolutional operations preserve local spectral details to some extent, but because a spectrum typically contains dozens or even hundreds of bands, they neglect global information. Therefore, in the GV module, we use a Transformer structure to establish global relationships and long-range dependencies through a self-attention mechanism. Specifically, the input spectral data is passed sequentially through normalization, multi-head self-attention (MSA), an MLP layer, and a fully connected layer to extract spectral features from multiple adjacent bands. The residuals are then concatenated and fused to reduce information loss from shallow to deep layers. In summary, the GV module extracts and integrates global features using the MSA mechanism and MLP components. Through the combination of these components, the module can capture complex dependencies between different bands when processing hyperspectral data, thereby achieving global feature learning. To further improve the quality of feature representation, the SFFM module was designed, with the main goal of integrating local spectral details and global contextual information. In SFFM, the one-dimensional (1D) input is reconstructed into two-dimensional (2D) data, and feature extraction is performed through 2D convolution. This helps capture local correlations in spectral data and learn potential relationships between originally non-adjacent spectral bands. Specifically, the one-dimensional spectrum (1×1×d) is reshaped into a 1×1×b×b matrix, where b is the nearest integer square root of d. This is then fed into three 3×3 2D convolutional layers for spatial feature learning, with a ReLU activation function applied after each layer. The 2D feature map is then flattened into a 1D vector, and a fully connected layer converts the flattened feature vector back to 1D data. Finally, a 1D convolutional layer extracts the final features from the converted 1D data. Through this series of operations, SFFM effectively fuses the spatial and frequency domain features of the input spectrum, thereby improving its ability to capture high-dimensional features. This yields a strong feature representation, the feature map S.

[0046] 3. Maximum distance triplet loss function: As shown in Figure 2, a Maximum Distance Triplet (MDTriplet) loss is proposed. During the pre-training phase, a Siamese network is used for contrastive learning between spectra, and two similar spectra and one dissimilar spectrum are fed in sequentially. The MDTriplet loss function is designed for this purpose. The general triplet loss is a loss function for learning contrastive feature representations and is commonly used to train contrastive learning models. Its goal is to minimize the distance between the anchor and the positive sample and to maximize the distance between the anchor and the negative sample. The general triplet loss can be defined as follows (using Euclidean distance as an example):

[0047]

[0048] Here, F(A_i), F(P_i), and F(G_i) denote the feature extraction function applied to the anchor sample A_i, the positive sample P_i, and the negative sample G_i, respectively, and α is the margin parameter.
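As an illustration of the general triplet loss described here, the following minimal sketch uses the widely used hinge form max(d(A, P) - d(A, G) + α, 0). The exact formula of this document is not reproduced in the text, so the unsquared Euclidean distance and the averaging over the batch are assumptions. The inputs are taken to be feature vectors already produced by F.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchors, positives, negatives, alpha=1.0):
    """Mean hinge-style triplet loss over a batch of extracted features.

    Assumptions: unsquared Euclidean distance and batch averaging; the
    source may use a squared distance or a sum instead.
    """
    losses = []
    for f_a, f_p, f_g in zip(anchors, positives, negatives):
        d_pos = euclidean(f_a, f_p)  # anchor to positive sample
        d_neg = euclidean(f_a, f_g)  # anchor to negative sample
        # A triplet contributes only while the negative is not yet at
        # least alpha farther from the anchor than the positive.
        losses.append(max(d_pos - d_neg + alpha, 0.0))
    return sum(losses) / len(losses)
```

A triplet whose negative is already farther from the anchor than the positive by at least α contributes zero loss, so training concentrates on triplets that still violate the margin.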

[0049] While the general triplet loss is useful in classification tasks, the model may over-optimize samples whose positive and negative samples are very close. Therefore, the MDTriplet loss is designed to also take the distance between positive and negative samples into account. It evaluates the model more comprehensively by comparing both the distance between the anchor and the negative sample and the distance between the positive and negative samples, thereby improving the model's robustness and generalization ability. The MDTriplet loss is described as follows:

[0050]

[0051] Where N is the number of samples in the batch.

[0052] 4. Transfer learning is used to transfer the trained GLSL network to different hyperspectral target detection tasks. In the proposed method, considering the gap between the source and target domains, fine-tuning is chosen as the transfer strategy. Furthermore, considering the scarcity of target spectra, only one target sample and one background sample are used for model fine-tuning. Then, the test sample and the target prior are input into the fine-tuned GLSL network for spectral feature enhancement, and the similarity between the enhanced spectrum and the prior knowledge is calculated to obtain the initial target detection result map.

[0053] 5. Fine-tuning using Siamese networks and contrastive loss. Siamese neural networks perform well with limited labeled data, making them particularly suitable for contrastive learning tasks. Therefore, in the fine-tuning stage, a Siamese neural network structure is employed: the pre-trained GLSL module is embedded into the network, and the entire GLSL module is fine-tuned using a small number of samples. Specifically, one target sample and one background sample are randomly selected from the hyperspectral target detection dataset. The target or background sample and the target prior are input into the network for spectral feature enhancement, and the similarity between the enhanced spectrum and the prior knowledge is calculated. Contrastive loss measures the similarity between sample pairs and is commonly used to train deep learning models for similarity learning. It optimizes the model by minimizing the distance between similar sample pairs and maximizing the distance between dissimilar sample pairs. More importantly, the contrastive loss function can learn useful features from limited samples, reducing the need for large amounts of labeled data. Therefore, in the fine-tuning stage, contrastive loss is used to minimize the distance between positive samples and the target prior and to maximize the distance between negative samples and the target prior. Specifically, let the embedding vectors of the sample pair (x1, x2) be z1 and z2, where z_i is the embedded representation output by the network. The similarity between the positive or negative sample and the target prior is calculated as:

[0054]

[0055] Next, the contrastive loss function is given by:

[0056]

[0057] Here, y is the label of the sample pair: y = 1 if the pair belongs to the same class, and y = 0 if the pair belongs to different classes. The hyperparameter margin represents the minimum distance between dissimilar sample pairs, and N is the number of sample pairs (here N = 1). If the input is a positive sample, this distance should be as small as possible; otherwise, it should be greater than margin.
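The behaviour described for y, margin, and N can be sketched with the commonly used form of the contrastive loss, L = (1/2N) Σ [y·d² + (1-y)·max(margin - d, 0)²]. Since the formulas of this document are not reproduced in the text, both this form and the choice of the Euclidean distance d between z1 and z2 are assumptions.

```python
import math

def contrastive_loss(pairs, labels, margin=1.0):
    """Contrastive loss over N embedding pairs (z1, z2) with labels y.

    Assumptions: d is the Euclidean distance between z1 and z2, and the
    loss uses the common 1/(2N) normalisation.
    """
    total = 0.0
    for (z1, z2), y in zip(pairs, labels):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2)))
        # y = 1: pull the pair together; y = 0: push it beyond the margin.
        total += y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
    return total / (2 * len(labels))

# With N = 1 as in the text: a (target, prior) pair is labelled y = 1 and
# a (background, prior) pair is labelled y = 0.
loss_same_class = contrastive_loss([([0.0, 0.0], [3.0, 4.0])], [1])
loss_diff_class = contrastive_loss([([0.0, 0.0], [3.0, 4.0])], [0], margin=6.0)
```

For a dissimilar pair the loss vanishes once the distance exceeds margin, which matches the statement that the distance should be greater than margin for negative inputs.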

[0058] 6. Obtain the initial target detection result image. Through the combination of pre-training and fine-tuning, GLSL not only maintains its excellent spectral discrimination capability but also exhibits greater flexibility and adaptability in specific tasks. Specifically, after the target prior t and the spectrum under test x are input into the GLSL module for feature enhancement, the extracted feature f_t and the corresponding feature of x are used to calculate the cosine similarity between the spectrum under test and the target prior. Finally, these similarity values are used to generate a similarity map S, thereby capturing and identifying subtle spectral differences.

[0059]

[0060] Where S(i,j) represents the similarity value at position (i,j).
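A minimal sketch of how such a similarity map could be assembled is given below. It assumes that the enhanced feature of every pixel is already available as a vector in an H×W grid and that f_t is the enhanced feature of the target prior. The function names and the handling of zero-norm vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero feature gets similarity 0
    return dot / (norm_u * norm_v)

def similarity_map(features, f_t):
    """Build S, where S[i][j] is the cosine similarity between the
    feature at pixel (i, j) and the target prior feature f_t."""
    return [[cosine_similarity(f, f_t) for f in row] for row in features]

# Tiny 2 x 2 example with 2-dimensional features.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[-1.0, 0.0], [1.0, 1.0]]]
S = similarity_map(features, [1.0, 0.0])
```

Because cosine similarity depends only on the angle between the feature vectors, pixels that differ from the prior only by a positive scaling of their features still receive a high value in S.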

[0061] 7. The spatial features of the target are integrated to further optimize the target detection result map, achieving joint spatial-spectral constraints. In the learning phase of the joint spatial-spectral constraints, several techniques, including neighborhood operations and morphological operations, are employed to refine the target region and remove noise. In the neighborhood operations, each pixel is analyzed and the confidence of each target pixel and its neighborhood is dynamically adjusted, so that the true target region is better identified and preserved. Specifically, based on the spectral similarity map S, pixels with a confidence greater than r are set as target pixels and labeled 1, where r = 0.6. The neighborhood pixels are then analyzed with each target pixel as the center point: all target pixels are traversed, and the sum of labels within both a small and a large neighborhood of the target pixel position (i,j) is calculated. For example, to update the confidence of each target pixel within the small 3×3 neighborhood, the sum N3(i,j) of the target pixels within the 3×3 neighborhood of that point is calculated first. If N3(i,j) is less than a set threshold k1, the point is more likely to be an isolated noise point, and its confidence therefore needs to be reduced. A decreasing exponential function q of N3(i,j) is then used to calculate the weight w3:

[0062]

[0063] For N3(i,j)<k1, the weight is applied to update the confidence map S:

[0064] S'(i,j)=max(S(i,j)-w3,0)

[0065] Because q is a decreasing function, the calculated weight w3 is larger when the sum of labels N3(i,j) in the neighborhood is small. In that case the target pixel is more likely to be an isolated noise point, and its confidence needs to be reduced more strongly. On this basis, the confidence of each target pixel is then updated within the large 7×7 neighborhood. If the sum N7(i,j) of target pixels in the 7×7 neighborhood of the point is greater than a set threshold k2, the neighborhood of the target pixel is likely to be a large false target, and the confidence of that point therefore needs to be reduced. The weight w7 based on N7(i,j) is then calculated using the increasing exponential function 1-q:

[0066]

[0067] For N7(i,j)>k2, the weight is applied to update the confidence map S':

[0068] S"(i,j)=max(S'(i,j)-w7,0)

[0069] The increasing function is used because the larger N7(i,j) is, the more likely the point belongs to a spurious target, and the more its confidence needs to be reduced.
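The two neighborhood updates can be sketched as follows. The exact expressions for w3 and w7 are not reproduced in the text, so the sketch assumes q(n) = exp(-rate·n), giving w3 = q(N3) and w7 = 1 - q(N7). The values of k1, k2, and rate are hypothetical. The sketch also assumes that the neighborhood sums include the center pixel and that the labels are computed once from the original map S.

```python
import math

def neighborhood_sum(labels, i, j, size):
    """Sum of labels in the size x size window centred at (i, j),
    clipped at the image border (the centre pixel is included)."""
    half = size // 2
    h, w = len(labels), len(labels[0])
    return sum(labels[r][c]
               for r in range(max(0, i - half), min(h, i + half + 1))
               for c in range(max(0, j - half), min(w, j + half + 1)))

def q(n, rate=0.5):
    """Assumed decreasing exponential function of the neighbour count."""
    return math.exp(-rate * n)

def refine_confidence(S, r=0.6, k1=3, k2=40, rate=0.5):
    """Apply the 3x3 and 7x7 confidence updates to similarity map S."""
    labels = [[1 if v > r else 0 for v in row] for row in S]
    targets = [(i, j) for i, row in enumerate(labels)
               for j, lab in enumerate(row) if lab == 1]
    # Step 1: isolated target pixels (few neighbours) lose confidence.
    S1 = [row[:] for row in S]
    for i, j in targets:
        n3 = neighborhood_sum(labels, i, j, 3)
        if n3 < k1:
            S1[i][j] = max(S[i][j] - q(n3, rate), 0.0)
    # Step 2: pixels inside very large regions lose confidence.
    S2 = [row[:] for row in S1]
    for i, j in targets:
        n7 = neighborhood_sum(labels, i, j, 7)
        if n7 > k2:
            S2[i][j] = max(S1[i][j] - (1.0 - q(n7, rate)), 0.0)
    return S2
```

With these assumed settings, a single high-confidence pixel surrounded by background is penalised in the first step, while a pixel in the middle of a region that fills its whole 7×7 window is suppressed in the second step.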

[0070] To optimize the boundaries of the target region, remove edge noise, and enhance target segmentation, morphological operations of erosion and dilation were employed after each neighborhood operation. Specifically, a 3×3 kernel was first used to erode the target region. This operation helps remove smaller noise points without affecting larger target structures. Next, a 7×7 kernel was used to dilate the eroded target region. The purpose of dilation is to connect adjacent target regions, making the segmented target more complete and coherent. Through these operations, we can not only reduce false detections but also ensure more accurate boundaries of the target region. Finally, the updated similarity map is used as the target detection map.
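The erosion and dilation steps can be illustrated on a binary target mask. This is a sketch under assumptions: square all-ones kernels, windows clipped at the image border, and a mask obtained by thresholding the similarity map. How the cleaned mask is combined with the confidence values is not specified in the text and is not modelled here.

```python
def _morph(mask, size, reducer):
    """Apply min (erosion) or max (dilation) over a size x size window."""
    half = size // 2
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [mask[r][c]
                      for r in range(max(0, i - half), min(h, i + half + 1))
                      for c in range(max(0, j - half), min(w, j + half + 1))]
            out[i][j] = reducer(window)
    return out

def erode(mask, size=3):
    """A pixel stays 1 only if every in-bounds window pixel is 1."""
    return _morph(mask, size, min)

def dilate(mask, size=7):
    """A pixel becomes 1 if any in-bounds window pixel is 1."""
    return _morph(mask, size, max)

def refine_mask(mask):
    """Erode with a 3x3 kernel, then dilate with a 7x7 kernel."""
    return dilate(erode(mask, 3), 7)

# A 3x3 target block survives erosion as its centre pixel and is then
# grown by dilation; the isolated pixel at (0, 8) is removed.
mask = [[0] * 9 for _ in range(9)]
for r in range(3, 6):
    for c in range(3, 6):
        mask[r][c] = 1
mask[0][8] = 1
refined = refine_mask(mask)
```

Using a larger kernel for dilation than for erosion means the surviving regions grow beyond their original extent, which is consistent with the stated aim of connecting adjacent target regions.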

[0071] 8. The trained target detection framework is used to evaluate the input samples and output the target detection result image. For example, Figure 4 shows the detection results on a hyperspectral natural scene dataset for the SelfMTL hyperspectral target detection network described in this invention and for existing methods such as CEM, HCEM, CSCR, DSC, MLSN, LCNN-CD, and HTD-IRN. It can be seen that the target region is detected very well. In addition, the area under the curve (AUC) value is calculated; a larger AUC value indicates better overall target detection performance. Table 1 presents the metric results of the different methods on four different test sets:

[0072] Table 1 compares SelfMTL with various methods on four hyperspectral image datasets (mean values).

[0073]
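For reference, the AUC metric used in this comparison can be computed from a detection map and a ground-truth mask as sketched below. The sketch uses the rank-based formulation, in which the AUC of the ROC curve equals the probability that a randomly chosen target pixel scores higher than a randomly chosen background pixel, with ties counted as one half. The function name and the flattened list inputs are hypothetical, and the evaluation code behind Table 1 is not described in the text.

```python
def auc_score(scores, labels):
    """Area under the ROC curve from per-pixel scores and 0/1 labels.

    Uses the pairwise (Mann-Whitney) formulation, which is quadratic in
    the number of pixels and intended only for small illustrations.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one target and one background pixel")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # target ranked above background
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Flattened detection map and ground-truth mask for four pixels.
value = auc_score([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A detector that ranks every target pixel above every background pixel obtains an AUC of 1, while a detector that cannot separate the two classes obtains about 0.5.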

[0074] The method of this invention achieves the best accuracy on most datasets. Figure 5 presents box plots of target-background separation for the SelfMTL method and the comparison methods on four real hyperspectral image datasets. It can be seen that the results generated by the method described in this invention achieve a high degree of separation between the target and the background. The proposed method can effectively highlight the target and suppress the background, and can be efficiently transferred to different target detection tasks.

[0075] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications should be covered within the scope of the claims of the present invention.

Claims

1. A self-supervised meta-transfer learning hyperspectral target detection method, characterized in that it includes the following steps: S1: prepare initial source domain and target domain data of hyperspectral images; S2: use the source data to perform self-supervised pre-training of a GLSL network; S3: transfer the GLSL network to the target detection dataset for spectral similarity detection; S4: use the spatial constraint module to perform learning constraints and obtain the target detection results. The GLSL network consists of three branches: local vision, global vision, and spatial frequency fusion. The local vision branch divides the input spectral data into two parts, F1 and F2, which are processed separately. F1 uses a ... convolution to adjust the dimensionality of the features and to introduce additional nonlinearity as a nonlinear transformation. F2 extracts multi-level local features from the spectral data by applying convolution kernels of different sizes layer by layer. Finally, the features obtained from the parallel processing are fused, and a fully connected layer maps the extracted features to a lower-dimensional space, resulting in a feature map L. In global vision, a Transformer structure is used to establish global relationships and long-distance dependencies through a self-attention mechanism. Specifically, the input spectral data is passed sequentially through normalization, multi-head self-attention, MLP layers, and fully connected layers to extract spectral features from multiple adjacent bands. The residuals are then concatenated and fused to reduce information loss from shallow to deep layers. In spatial frequency fusion, the one-dimensional input is reconstructed into two dimensions and features are extracted through two-dimensional convolution. Specifically, the one-dimensional spectrum is reshaped into ..., whose side length is the nearest square root, to serve as the input feature map. It is then fed into three 2D convolutional layers for spatial feature learning, with the ReLU activation function applied after each layer. The 2D feature map is then flattened into a 1D vector, and a fully connected layer transforms the flattened feature vector back into 1D data. Finally, a 1D convolutional layer extracts the final features from the transformed 1D data. Through this series of operations, spatial frequency fusion fuses the spatial and frequency domain features of the input spectrum, thereby improving the ability to capture high-dimensional features and yielding a strong feature representation in the form of a feature map.

2. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 1, characterized in that: in step S1, hyperspectral image source domain data and target domain data are prepared. Specifically, the source domain data uses a hyperspectral classification dataset containing rich ground cover information. To enhance the model's ability to perceive spectral changes, positive and negative sample pairs are constructed for contrastive learning at the spectral level. Following the idea of meta-representation, two types of spectra are randomly selected in each task to form a set of positive and negative sample pairs, increasing the diversity of tasks. For the target domain, four different hyperspectral target detection datasets are prepared.

3. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 1, characterized in that: in step S2, a maximum distance triplet (MDTriplet) loss is proposed. During the pre-training phase, a Siamese network is used for contrastive learning between spectra, sequentially feeding in two similar spectra and one dissimilar spectrum. The MDTriplet loss function is designed for this purpose, in whose definition N is the number of samples in the batch; F(A_i), F(P_i), and F(G_i) denote the feature extraction function applied to the anchor sample A_i, the positive sample P_i, and the negative sample G_i; and α is the margin parameter.

4. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 1, characterized in that: first, transfer learning is used to transfer the trained GLSL network to different hyperspectral target detection tasks. In the proposed method, considering the gap between the source and target domains, fine-tuning is chosen as the transfer strategy. Also, considering the scarcity of target spectra, only one target sample and one background sample are used for model fine-tuning. Then, the test sample and the target prior are input into the fine-tuned GLSL network to enhance spectral features, and the similarity between the enhanced spectrum and the prior knowledge is calculated to obtain the initial target detection result map.

5. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 4, characterized in that: Siamese networks and contrastive loss are employed for fine-tuning. In the fine-tuning phase, a Siamese neural network structure is used, a pre-trained GLSL module is embedded into the network structure, and the entire GLSL module is fine-tuned using a small number of samples. Specifically, one target sample and one background sample are randomly selected from the hyperspectral target detection dataset. The target or background sample and the target prior are input into the network for spectral feature enhancement, and the similarity between the enhanced spectrum and the prior knowledge is calculated. Contrastive loss measures the similarity between sample pairs and is commonly used to train deep learning models for similarity learning. It optimizes the model by minimizing the distance between similar sample pairs and maximizing the distance between dissimilar sample pairs. More importantly, the contrastive loss function can learn useful features from limited samples, reducing the need for large amounts of labeled data. Therefore, in the fine-tuning phase, contrastive loss is used to minimize the distance between positive samples and the target prior and to maximize the distance between negative samples and the target prior. Specifically, let the embedding vectors of the sample pair (x1, x2) be z1 and z2, where z_i is the embedded representation output by the network, from which the similarity between the positive or negative sample and the target prior is calculated. In the contrastive loss function, y is the label of the sample pair: y = 1 if the pair belongs to the same class, and y = 0 if the pair belongs to different classes. The hyperparameter margin represents the minimum distance between dissimilar sample pairs, and N is the number of sample pairs, here N = 1. If the input is a positive sample, this distance should be as small as possible; otherwise, it should be greater than margin.

6. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 5, characterized in that: the initial target detection result map is obtained. Through the combination of pre-training and fine-tuning, GLSL not only maintains its excellent spectral discrimination capability but also demonstrates greater flexibility and adaptability in specific tasks. Specifically, after the target prior t and the spectrum under test x are input into the GLSL module for feature enhancement, the extracted features are used to calculate the cosine similarity between the spectrum under test and the target prior spectrum. Finally, a similarity map S is generated from these similarity values, which allows subtle spectral differences to be captured and identified, where S(i,j) represents the similarity value at position (i,j).

7. The self-supervised meta-transfer learning hyperspectral target detection method according to claim 1, characterized in that: the spatial features of the target are fused to further optimize the target detection result map and achieve joint spatial-spectral constraints. During the learning phase of these constraints, several techniques, including neighborhood operations and morphological operations, are employed to refine the target region and remove noise. In the neighborhood operations, the confidence of each target pixel and its neighborhood is dynamically adjusted through analysis of each pixel, thereby better identifying and preserving the true target region. Specifically, based on the spectral similarity map S, pixels with a confidence greater than r are designated as target pixels and labeled 1, with r set to 0.6. Neighboring pixels are analyzed with the target pixel as the center point. The analysis traverses all target pixels and calculates the sum of labels within a small and a large neighborhood of the target pixel position (i,j). For example, to update the confidence of each target pixel within the small 3×3 neighborhood, the sum N3(i,j) of target pixels in the 3×3 neighborhood of that point is first calculated. If N3(i,j) is less than a set threshold k1, the point is more likely to be an isolated noise point, so its confidence needs to be reduced. A decreasing exponential function q of N3(i,j) is then used to calculate the weight w3, and for N3(i,j)<k1 the weight is applied to update the confidence map S: S'(i,j)=max(S(i,j)-w3,0). Because q is a decreasing function, the calculated weight w3 is larger when the sum of labels N3(i,j) in the neighborhood is small, indicating that the target pixel is more likely to be an isolated noise point and that its confidence needs to be reduced more strongly. On this basis, the confidence of each target pixel is updated within the large 7×7 neighborhood. If the sum N7(i,j) of target pixels in the 7×7 neighborhood of the point is greater than a set threshold k2, the neighborhood of the target pixel is likely to be a large false target, so the confidence of that point needs to be reduced. The increasing exponential function 1-q is then used to calculate the weight w7 based on N7(i,j), and for N7(i,j)>k2 the weight is applied to update the confidence map S': S"(i,j)=max(S'(i,j)-w7,0). The increasing function is used because the larger N7(i,j) is, the more likely the point belongs to a spurious target, and the more its confidence needs to be reduced. To optimize the target region boundary, remove edge noise, and enhance target segmentation, the morphological operations of erosion and dilation are employed after each neighborhood operation. Specifically, a 3×3 kernel is first used to erode the target region. This operation helps remove smaller noise points without affecting larger target structures. Next, a 7×7 kernel is used to dilate the eroded target region, connecting adjacent target regions and making the segmented target more complete and coherent. These operations not only reduce false detections but also make the boundaries of the target regions more accurate. Finally, the updated similarity map is used as the target detection map.

8. A self-supervised meta-transfer learning hyperspectral target detection system, characterized in that: The system is equipped with a control program to implement the self-supervised meta-transfer learning hyperspectral target detection method as described in any one of claims 1-7.