Axial reorganization and group offset Mamba method for hyperspectral image classification
By using the ARGO-Mamba network and employing local-global attention guidance, axial reorganization, and group offset strategies, the modeling challenges of spatial global correlation and long-range spectral dependence in hyperspectral image classification are solved, improving classification accuracy and robustness and achieving efficient feature discrimination.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUZHOU UNIVERSITY
- Filing Date
- 2026-05-11
- Publication Date
- 2026-06-23
AI Technical Summary
Existing hyperspectral image classification methods struggle to effectively capture spatial global correlations and long-range spectral dependencies, resulting in limited classification accuracy in complex scenarios and difficulty in balancing feature discriminativeness and modeling efficiency.
The ARGO-Mamba network is built on the axial reorganization and group offset Mamba method. It consists of a Local-Global Attention Guidance module (LGAG), an Axial Reorganization Spatial Mamba module (ARSM), and a Group Offset Spectral Mamba module (GOSM), which jointly model spatial structure consistency and long-range spectral dependence, thereby enhancing feature discrimination.
While maintaining high computational efficiency, it significantly improves the accuracy and robustness of hyperspectral image classification, effectively solves the problems of high-dimensional spectral redundancy and complex spatial structure, and achieves fine classification of complex scenes.
Figure CN122265741A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of hyperspectral image classification, and specifically to hyperspectral image classification methods.
Background Technology
[0002] Remote sensing technology can efficiently acquire high spatiotemporal resolution Earth surface observation data. Among these technologies, hyperspectral imagery (HSI) generates a nearly continuous spectral curve for each pixel by imaging across hundreds of consecutive narrow bands. Compared to traditional RGB images, HSI possesses stronger ground feature discrimination capabilities in the spectral dimension and has been widely applied in various remote sensing scenarios such as urban planning, military reconnaissance, and precision agriculture. In these tasks, hyperspectral image classification (HSIC) aims to assign a corresponding category label to each pixel in the image.
[0003] Early HSIC methods relied primarily on manually designed spectral features, subspace projections, and statistical learning models. While these methods achieved some success in specific scenarios, their feature representation capabilities were limited, making them ill-suited to real-world remote sensing scenarios with complex terrain distributions and high intra-class variability. In recent years, deep learning has gradually become the mainstream paradigm in HSIC research. Early work focused mainly on unsupervised learning. Subsequently, convolutional neural networks (CNNs) were widely introduced into HSIC tasks. Through weight sharing and local connectivity, convolutional operations efficiently capture local spatial features of images. However, CNNs are limited by their local receptive field, which makes it difficult to capture global contextual relationships and limits their performance on complex terrain shapes. Later, the Transformer, with its self-attention mechanism, made it possible to model global context and long-range dependencies in HSIC, overcoming much of the locality limitation of CNNs. However, the computational complexity of self-attention grows quadratically with the sequence length. The resulting computation and memory costs severely limit the practicality and scalability of Transformers on HSI.
[0004] Recently, state-space models (SSMs) have emerged as an efficient structure for building deep networks. Among them, the structured state-space sequence model (S4) is widely used in continuous sequence analysis due to its strong performance in long-sequence modeling. Mamba further improves upon S4: its hardware-aware design achieves higher computational efficiency than Transformers while maintaining strong long-range dependency modeling capabilities. With this advantage, Mamba has been widely applied in fields including natural language processing (NLP) and computer vision (CV), and initial applications in HSIC have also validated its potential. However, due to the unique spatial structure and spectral characteristics of hyperspectral data, effectively applying efficient sequence modeling paradigms to HSIC still faces several challenges.
Summary of the Invention
[0005] The purpose of this invention is to address the following problem: hyperspectral images are characterized by high-dimensional spectral redundancy, complex spatial structure, and strong heterogeneity of ground objects, so existing classification methods have difficulty capturing global spatial correlations, model long-range spectral dependence inefficiently, and struggle to balance feature discriminability against modeling efficiency, which ultimately limits classification accuracy in complex scenes. To this end, this invention proposes an axial reorganization and group offset Mamba method for hyperspectral image classification.
[0006] The specific process of the axial reorganization and group offset Mamba method for hyperspectral image classification is as follows:
[0007] Step 1: Obtain hyperspectral images with category labels as the training set;
[0008] Step 2: Construct the ARGO-Mamba model;
[0009] The ARGO-Mamba model includes a feature header block, feature extraction block 1, feature extraction block 2, LGAG block, Pool block 2, ARSM block, GOSM block, and classification block.
[0010] The LGAG block is the local-global attention guidance module;
[0011] The ARSM block is the axial reorganization spatial Mamba module;
[0012] The GOSM block is the group offset spectral Mamba module;
[0013] Step 3: Train the ARGO-Mamba model based on the hyperspectral images from Step 1 to obtain the trained ARGO-Mamba model;
[0014] Step 4: Input the hyperspectral image to be tested into the trained ARGO-Mamba model, and the trained ARGO-Mamba model will output the classification result.
[0015] The beneficial effects of this invention are as follows:
[0016] Hyperspectral image classification (HSIC) is a core task in the field of remote sensing. Its inherent characteristics—high-dimensional spectral redundancy, complex spatial structure, and strong heterogeneity of ground features—lead to difficulties in capturing spatial global correlations and inefficient modeling of long-range spectral dependencies. Traditional methods struggle to balance feature discriminativeness and modeling efficiency, limiting the accuracy of fine-grained classification in complex scenes. To address this, this invention proposes an Axial Reorganization and Group Offset Mamba (ARGO-Mamba) network. This network employs a dual-branch strategy and consists of three core modules: local-global attention guidance (LGAG), axial reorganization spatial Mamba (ARSM), and group offset spectral Mamba (GOSM). In LGAG (including Spa-LGAG and Spe-LGAG), fine-grained information is first preserved through local feature aggregation, and then important representations are dynamically enhanced based on global features. This avoids the dilution of local discriminative details by global information during discriminative feature calibration. In ARSM, axial recombination, feature shifting, and bidirectional spatial Mamba are utilized to eliminate the reliance on sequence scanning strategies in existing methods, reduce semantic loss during spatial dimension transformation, accurately capture global spatial associations, and improve classification accuracy. In GOSM, diverse grouped views are generated through spectral grouping and group shifting, fully leveraging the long-range dependency modeling advantages of Mamba. This effectively distinguishes spectrally similar land features while reducing high-dimensional spectral redundancy, achieving efficient spectral modeling. These three modules form a complementary framework of spatial-spectral-feature enhancement.
[0017] Significant semantic correlations between ground features exist in hyperspectral images (see Figure 6). Adjacent pixels typically exhibit strong consistency in spatial location and semantic category, making their spatial context information crucial for classification performance. However, existing methods often need to reorganize or linearize the two-dimensional structure when modeling spatial features, which disrupts the original spatial topological relationships and local semantic consistency and limits the effective mining of fine-grained spatial discriminative information.
[0018] As shown in Figure 6, within local spatial features the central pixel and its neighboring pixels have significant semantic relationships (the red solid line). If the two-dimensional spatial features are directly flattened into a one-dimensional sequence, these semantic relationships are inevitably lost. To alleviate this problem, this invention proposes an axial reorganization strategy: by performing axial structuring beforehand, the two-dimensional semantic relationships are preserved in the one-dimensional sequence (the red dashed line), thereby minimizing semantic loss.
[0019] Hyperspectral images contain a large number of continuous and highly correlated spectral bands. Directly modeling all bands would incur high computational costs, while common spectral grouping or compression strategies, if poorly designed, can easily disrupt the local continuity and correlation structure between key bands, thereby weakening the ability to represent fine-grained spectral discriminative information. Therefore, how to reduce computational complexity while maintaining spectral discriminability remains a significant challenge in hyperspectral image classification.
[0020] To address this, this invention proposes the ARGO-Mamba network, aiming to efficiently model long-range dependencies while maintaining spatial-spectral consistency. This framework collaboratively represents spatial-spectral features through a sequence modeling paradigm and utilizes local-global guidance and customized modeling strategies to solve spatial semantic loss and spectral redundancy problems, thereby improving classification performance and model robustness while maintaining high computational efficiency.
[0021] The main contributions of this invention can be summarized as follows:
[0022] We propose a unified modeling framework, ARGO-Mamba, for hyperspectral image classification. Under an efficient sequence modeling paradigm, it collaboratively models spatial structure consistency and long-range spectral dependence, significantly improving the discriminative power of hyperspectral features while ensuring computational efficiency.
[0023] A local-global attention guidance (LGAG) mechanism was designed, and corresponding guidance forms were implemented in the spatial and spectral modeling branches to adapt to the feature characteristics of hyperspectral data in different dimensions. This mechanism effectively alleviates the problem of excessive smoothing of local discriminative details by global statistics through dynamic interaction between local information and global context, thereby improving the discriminativity and stability of feature representation.
[0024] An axial reorganization spatial Mamba (ARSM) module is proposed, which enhances the modeling ability of directionality and spatial topological consistency through structured spatial reorganization, enabling the efficient sequence modeling paradigm to better adapt to complex spatial structure scenarios.
[0025] A group offset spectral Mamba (GOSM) module is constructed. Through multi-view spectral grouping, it alleviates the damage that fixed grouping strategies cause to the continuity and correlation structure of key spectral bands, and achieves efficient modeling of long-range spectral dependence.
Attached Figure Description
[0026] Figure 1 is a diagram of the overall structure of ARGO-Mamba in this invention. The Head block (feature header block) adjusts the number of channels; Feature extraction block 1 and Feature extraction block 2 extract local spatial and spectral features, respectively; Conv blocks (1-6) further integrate information between channels; Pool block 1 and Pool block 2 are both used for dimensionality compression; Linear block 1 and Linear block 2 are linear layers; Spa-LGAG is the spatial branch attention block; Spe-LGAG is the spectral branch attention block; Mamba is the standard SSM-based Mamba framework. The remaining labels in the figure are Local feature aggregation, Global feature extraction, Axial, Spatial Recombination, Group Offset, Spectral, and Bidirectional Spatial Mamba.
[0027] Figure 2 is a structural diagram of the spatial branch attention block Spa-LGAG of the present invention;
[0028] Figure 3 is a structural diagram of the spectral branch attention block Spe-LGAG of the present invention;
[0029] Figure 4 is a detailed structural diagram of the ARSM of the present invention;
[0030] Figure 5 is a detailed structural diagram of the GOSM of the present invention;
[0031] Figure 6 is a schematic diagram of the semantics of ground features in a hyperspectral image;
[0032] Figure 7 shows the classification maps obtained by different methods on the Indian Pines dataset: (a) is a pseudo-color map, (b) is the ground truth map, and (c)-(l) are HybridSN, PyResNet, HIT, SSTN, SSFTT, [missing information], 3DSSMamba, MambaLG, MambaHSI, and the proposed method, respectively;
[0033] Figure 8 shows the classification maps obtained by different methods on the Pavia University dataset: (a) is a pseudo-color map, (b) is the ground truth map, and (c)-(l) are HybridSN, PyResNet, HIT, SSTN, SSFTT, [missing information], 3DSSMamba, MambaLG, MambaHSI, and the proposed method, respectively;
[0034] Figure 9 is a t-SNE visualization on the Indian Pines dataset: (a)-(d) are PyResNet, [missing information], MambaHSI, and the proposed method, respectively;
[0035] Figure 10 is a t-SNE visualization on the Pavia University dataset: (a)-(d) are PyResNet, [missing information], MambaHSI, and the proposed method, respectively;
[0036] Figure 11 shows the OA values of all methods under different training sample ratios on the Indian Pines dataset;
[0037] Figure 12 shows the OA values of all methods under different training sample ratios on the Pavia University dataset.
Detailed Implementation
[0038] Specific Implementation Method One: The specific process of the axial reorganization and group offset Mamba method for hyperspectral image classification in this implementation method is as follows:
[0039] Step 1: Obtain hyperspectral images with category labels as the training set;
[0040] Step 2: Construct the ARGO-Mamba model;
[0041] The ARGO-Mamba model includes a feature header block, feature extraction block 1, feature extraction block 2, LGAG block, Pool block 2, ARSM block, GOSM block, and classification block.
[0042] The LGAG block is a local–global attention guidance module;
[0043] The ARSM block is the axial recombination spatial Mamba module;
[0044] The GOSM block is the group offset spectral Mamba module;
[0045] The header block is used to adjust the number of channels;
[0046] Feature extraction block 1 and feature extraction block 2 are used to extract local spatial and spectral features, respectively.
[0047] Step 3: Train the ARGO-Mamba model based on the hyperspectral images from Step 1 to obtain the trained ARGO-Mamba model;
[0048] Step 4: Input the hyperspectral image to be tested into the trained ARGO-Mamba model, and the trained ARGO-Mamba model will output the classification result.
[0049] Specific Implementation Method Two: This implementation method differs from Specific Implementation Method One in that, in step one, hyperspectral images with category labels are obtained as the training set; the specific process is as follows:
[0050] Step 1.1: Obtain a hyperspectral image with category labels;
[0051] where the symbols denote, in order: the height of the image; the width of the image; the number of bands; and the set of real numbers;
[0052] One symbol denotes the initial number of bands, and another denotes the number of channels after processing;
[0053] Step 1.2: Considering the spatial correlation between adjacent pixels, divide the hyperspectral image into a series of three-dimensional patch cube images;
[0054] where the symbols denote, in order: the spatial dimension of the patch cube; and the height or width of each 3D patch cube (the width and height are equal);
[0055] The remaining symbols denote the first 3D patch cube image, the second 3D patch cube image, and two further symbolically indexed 3D patch cube images;
[0056] Step 13: Use the center pixel category label of each 3D patch cube image as the category of the corresponding 3D patch cube image;
[0057] All 3D patch cube images were used as the training set;
[0058] Each 3D patch cube image is used as input to the model.
[0059] The other steps and parameters are the same as in Specific Implementation Method 1.
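The patch construction in steps 1.1 to 1.3 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name `extract_patches`, the reflect padding at the image border, and the convention that label 0 marks unlabelled pixels are assumptions made for the example. The 11×11 patch size follows the value reported later in the description.

```python
import numpy as np

def extract_patches(image, labels, patch_size=11, ignore_label=0):
    """Cut an (H, W, B) cube into (patch_size, patch_size, B) patches,
    one per labelled pixel, labelled by the patch's centre pixel."""
    assert patch_size % 2 == 1, "an odd size gives a well-defined centre pixel"
    r = patch_size // 2
    # Assumption: reflect-pad the two spatial axes so border pixels get full patches.
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches, targets = [], []
    for i, j in zip(*np.nonzero(labels != ignore_label)):
        # Pixel (i, j) of the original image sits at (i + r, j + r) after padding,
        # so this slice is centred on it.
        patches.append(padded[i:i + patch_size, j:j + patch_size, :])
        targets.append(labels[i, j])
    return np.stack(patches), np.array(targets)

rng = np.random.default_rng(0)
image = rng.normal(size=(20, 20, 30))        # toy H x W x B cube
labels = rng.integers(0, 4, size=(20, 20))   # 0 = unlabelled (assumed convention)
X, y = extract_patches(image, labels)
print(X.shape, y.shape)
```

Each row of `X` is one 3D patch cube, and the matching entry of `y` is the category of its centre pixel, as in step 1.3.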
[0060] Specific Implementation Method Three: This implementation method differs from Specific Implementation Method One or Two in that: in step two, the ARGO-Mamba model is constructed;
[0061] The ARGO-Mamba model includes a feature header block, feature extraction block 1, feature extraction block 2, LGAG block, Pool block 2, ARSM block, GOSM block, and classification block.
[0062] The LGAG block is a local–global attention guidance module;
[0063] The ARSM block is the axial recombination spatial Mamba module;
[0064] The GOSM block is the group offset spectral Mamba module;
[0065] The specific process is as follows:
[0066] The feature header block consists of a two-dimensional convolutional layer (kernel size [missing information]), a BN layer, and a GELU layer;
[0067] The BN layer is the batch normalization layer;
[0068] The GELU layer is a non-linear activation function layer;
[0069] Feature extraction block 1 consists of a two-dimensional convolutional layer (kernel size ...), a BN layer, and a GELU layer;
[0070] Feature extraction block 2 consists of a three-dimensional convolutional layer (kernel size [missing information]), an LN layer, and a GELU layer;
[0071] The LN layer is a layer normalization layer;
[0072] The LGAG block includes the spatial branch attention block Spa-LGAG and the spectral branch attention block Spe-LGAG;
[0073] Existing attention frameworks (e.g., CBAM) typically rely on global pooling to extract global contextual information and then weight the features accordingly. However, this global statistical approach often neglects local feature information, which can lead to insufficient representation of fine spatial and spectral features. To overcome this limitation, this invention proposes an LGAG module. Considering the differences between spatial and spectral features, a spatial branch Spa-LGAG and a spectral branch Spe-LGAG are designed, whose structures are shown in Figure 2 and Figure 3, respectively;
[0074] Pool block 2 is a global average pooling layer;
[0075] The ARSM block includes six depthwise convolutional layers (kernel sizes [missing information]), convolutional block 1, convolutional block 2, convolutional block 3, and a Mamba block;
[0076] Mamba blocks are the standard Mamba framework based on SSM;
[0077] Convolutional block 1 consists of a 2D convolutional layer (kernel size [missing information]), a BN layer, and a GELU layer;
[0078] Convolutional block 2 consists of a 2D convolutional layer (kernel size [missing information]), a BN layer, and a GELU layer;
[0079] Convolutional block 3 consists of a 2D convolutional layer (kernel size [missing information]), a BN layer, and a GELU layer;
[0080] GOSM blocks include convolutional block 4, convolutional block 5, convolutional block 6, and Mamba block;
[0081] Convolutional block 4 consists of a one-dimensional convolutional layer (kernel size [missing information]) and a GN layer; the GN layer is a group normalization layer;
[0082] Convolutional block 5 consists, in sequence, of a one-dimensional convolutional layer (kernel size [missing information]) and a GN layer;
[0083] Convolutional block 6 consists, in sequence, of a one-dimensional convolutional layer (kernel size [missing information]) and a GN layer;
[0084] The classification block includes Pool block 1, Linear block 1, and Linear block 2;
[0085] Pool block 1 is a global average pooling layer;
[0086] Linear block 1 is a linear layer (Linear);
[0087] Linear block 2 is a linear layer (Linear).
[0088] Other steps and parameters are the same as in specific implementation method one or two.
[0089] Specific Implementation Method Four: This implementation method differs from Specific Implementation Methods One to Three in that, in step three, the ARGO-Mamba model is trained on the hyperspectral images from step one to obtain the trained ARGO-Mamba model; the specific process is as follows:
[0090] Step 3.1: Input each 3D patch cube image from the training set into the feature header block, which outputs a feature map;
[0091] Step 3.2: Input the feature map into feature extraction block 1, which outputs a feature map;
[0092] Step 3.3: Input the feature map into feature extraction block 2, which outputs a feature map;
[0093] Step 3.4: Input the feature map into the spatial branch attention block Spa-LGAG, which outputs a feature map;
[0094] Step 3.5: Input the feature map into the spectral branch attention block Spe-LGAG, which outputs a feature map;
[0095] The interaction between local and global features helps to improve the stability and robustness of the attention weights, thereby highlighting key spatial-spectral features more accurately.
[0096] Step 3.6: Input the output feature map of the spatial branch attention block Spa-LGAG into the ARSM block, which outputs a feature map;
[0097] Step 3.7: Input the output feature map of the spectral branch attention block Spe-LGAG into Pool block 2, which outputs a feature map;
[0098] Step 3.8: Input the output feature map of Pool block 2 into the GOSM block, which outputs a feature map;
[0099] Step 3.9: Input the output feature map of the ARSM block into Pool block 1 of the classification block, which outputs a feature map;
[0100] Input the output feature map of Pool block 1 into Linear block 1 of the classification block, which outputs a feature map;
[0101] Input the output feature map of the GOSM block into Linear block 2 of the classification block, which outputs a feature map;
[0102] Step 3.10: Dynamically fuse the two feature maps to generate the final prediction result, expressed as follows:
[0103]
[0104] where the symbols denote, in order: the final predicted probability distribution; and two parameters;
[0105] Step 3.11: Train the ARGO-Mamba model on each 3D patch cube image in the training set until all 3D patch cube images have been input into the ARGO-Mamba model, yielding the trained ARGO-Mamba model.
[0106] The other steps and parameters are the same as those in one of the specific implementation methods one to three.
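The fusion formula of step 3.10 is not legible in this text, so the following is only a sketch of one plausible form: a weighted sum of the two branch outputs followed by softmax. The function names, the scalar weights `w_spatial` and `w_spectral`, and the use of softmax are all assumptions for illustration. In a trained model the two parameters would be learned rather than fixed.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse(spatial_logits, spectral_logits, w_spatial, w_spectral):
    """Assumed fusion: weighted sum of the two branch outputs, then softmax."""
    return softmax(w_spatial * spatial_logits + w_spectral * spectral_logits)

spatial_logits = np.array([[2.0, 0.5, 0.1]])    # toy output of Linear block 1
spectral_logits = np.array([[0.3, 1.5, 0.2]])   # toy output of Linear block 2
probs = fuse(spatial_logits, spectral_logits, w_spatial=0.6, w_spectral=0.4)
print(probs)
```

The result is a probability distribution over the classes for each input patch, which matches the description of the fused output as the final predicted probability distribution.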
[0107] Specific Implementation Method Five: This implementation method differs from Specific Implementation Methods One to Four in that, in step 3.4, the feature map is input into the spatial branch attention block Spa-LGAG, which outputs a feature map; the specific process is as follows:
[0108] Step 3.4.1:
[0109] To enhance the representation of fine spatial features, Spa-LGAG first aggregates local features, so that more representative statistical information is available before global feature extraction;
[0110] The local features of the feature map are extracted, expressed as:
[0111] (1)
[0112] where the symbols denote, in order: the input feature map; a boundary padding operation of size 1 in the spatial dimensions; and a pooling operation with a sliding stride of 1, used to extract local statistical information;
[0113] A further symbol denotes the features obtained by aggregating local information;
[0114] Parameter sensitivity experiments show that 11×11 is the optimal input patch size. Within this receptive field, 3×3 local pooling is used to accurately capture substructural features; in contrast, larger windows tend to cause local statistical information to tend towards regional averaging, thus losing fine texture.
[0115] Step 3.4.2: Extract global features from the local features to dynamically enhance key features, expressed as:
[0116] (2)
[0117] where the symbols denote, in order: a global average pooling operation along the channel dimension, used to extract global statistics; the Sigmoid activation function, used to generate the weights; element-wise multiplication; and the output feature map of the spatial branch attention block Spa-LGAG.
[0118] The other steps and parameters are the same as in any of the specific implementation methods one to four.
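Equations (1) and (2) are not legible in this text, so the two Spa-LGAG steps are sketched below from the symbol definitions alone. Average pooling, zero padding, and applying the weights to the input feature map (rather than to the local features) are assumptions. The function names are hypothetical.

```python
import numpy as np

def local_avg_pool_3x3(x):
    """3x3 pooling with stride 1 and padding 1 on both spatial axes; x is (H, W, C).
    Assumption: average pooling with zero padding."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + h, dj:dj + w, :]   # sum the 9 shifted views
    return out / 9.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spa_lgag(x):
    local = local_avg_pool_3x3(x)                     # step 3.4.1: local feature aggregation
    global_stat = local.mean(axis=2, keepdims=True)   # step 3.4.2: pooling along channels -> (H, W, 1)
    weights = sigmoid(global_stat)                    # one weight per spatial position
    return x * weights                                # element-wise re-weighting (assumed target: input)

x = np.random.default_rng(0).normal(size=(11, 11, 16))
out = spa_lgag(x)
print(out.shape)
```

Because the weights are computed from locally aggregated features rather than from the raw input, each position's weight reflects its 3×3 neighbourhood, which is the local-before-global ordering the description emphasizes.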
[0119] Specific Implementation Method Six: This implementation method differs from Specific Implementation Methods One to Five in that, in step 3.5, the feature map is input into the spectral branch attention block Spe-LGAG, which outputs a feature map;
[0120] The spectral branch attention block Spe-LGAG is designed similarly to the spatial branch attention block Spa-LGAG and is intended to enhance the expression of fine spectral features; unlike Spa-LGAG, however, both its local feature aggregation and its feature weighting are performed in the spectral dimension.
[0121] The entire process is described as follows:
[0122] (3)
[0123] where the symbols denote, in order: the input feature map and its number of channels; a boundary padding operation of size 1 in the spectral dimension; a moving average operation in the spectral dimension with a sliding stride of 1 and a window of 3 adjacent channels; a global average pooling operation over the spatial dimensions; the channel descriptor; a linear feature mapping layer; and the output feature map of the spectral branch attention block Spe-LGAG.
[0124] The other steps and parameters are the same as those in any of the specific implementation methods one to five.
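Equation (3) is not legible in this text, so the Spe-LGAG process is sketched below from its symbol definitions. The zero padding, the sigmoid gate after the linear layer, and the shapes of the hypothetical parameters `weight` and `bias` are assumptions.

```python
import numpy as np

def spectral_moving_average(x):
    """Window-3, stride-1 moving average along the last (spectral/channel) axis.
    Assumption: zero padding of size 1 at both ends of that axis."""
    padded = np.pad(x, ((0, 0), (0, 0), (1, 1)))
    return (padded[..., :-2] + padded[..., 1:-1] + padded[..., 2:]) / 3.0

def spe_lgag(x, weight, bias):
    local = spectral_moving_average(x)        # local aggregation in the spectral dimension
    descriptor = local.mean(axis=(0, 1))      # global average pooling over space -> (C,)
    gate = 1.0 / (1.0 + np.exp(-(weight @ descriptor + bias)))  # linear layer, then assumed sigmoid
    return x * gate                           # one weight per channel, broadcast over space

rng = np.random.default_rng(0)
x = rng.normal(size=(11, 11, 16))
weight = rng.normal(scale=0.1, size=(16, 16))  # hypothetical linear-layer parameters
bias = np.zeros(16)
out = spe_lgag(x, weight, bias)
print(out.shape)
```

Compared with the spatial branch, the only structural change is the axis on which aggregation and weighting act: neighbouring channels are averaged, and the resulting gate has one entry per channel.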
[0125] Specific Implementation Method Seven: This implementation method differs from Specific Implementation Methods One through Six in that, in step 3.6, the output feature map of the spatial branch attention block Spa-LGAG is input into the ARSM block, which outputs a feature map;
[0126] While Mamba excels at long-sequence tasks, it was originally designed for one-dimensional sequences. In HSIC, existing methods often use row-first, column-first, or cross-scanning strategies to convert two-dimensional spatial features into one-dimensional sequences that Mamba can process. However, such conversions easily disrupt the spatial topology and split strongly semantically related feature units (such as adjacent pixels of the same ground feature) in a disordered way, making it difficult for Mamba to fully capture the global spatial context between pixels. Therefore, this invention designs an ARSM module, whose structure is shown in Figure 4. The module consists of three main parts: axial reorganization, feature shifting, and bidirectional spatial Mamba.
[0127] The specific process is as follows:
[0128] Step 3.6.1: To enhance the semantic associations in the generated sequence, the output feature map of the spatial branch attention block Spa-LGAG is processed along the horizontal axis to obtain a one-dimensional sequence. Axial sensing is used to transform low-level pixel features into mid-level structural features; the process is as follows:
[0129] (4)
[0130] where the symbols denote, in order: the output feature map of the Spa-LGAG attention block; three convolutions, each with its own kernel size, applied along the horizontal direction and designed to perceive structured features under multi-scale receptive fields; three intermediate feature maps; and the fused feature map;
[0131] The fused feature map is flattened along the horizontal axis to obtain a one-dimensional sequence; the remaining symbols give the sequence dimension;
[0132] Step 3.6.2: To enhance the semantic associations in the generated sequence, the output feature map of the spatial branch attention block Spa-LGAG is processed along the vertical axis to obtain a one-dimensional sequence. Axial sensing is used to transform low-level pixel features into mid-level structural features; the process is as follows:
[0133] (5)
[0134] where the symbols denote, in order: the output feature map of the Spa-LGAG attention block; three convolutions, each with its own kernel size, applied along the vertical direction and designed to perceive structured features under multi-scale receptive fields; three intermediate feature maps; and the fused feature map;
[0135] The fused feature map is flattened along the vertical axis to obtain a one-dimensional sequence; the remaining symbols give the sequence dimension;
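The difference between flattening along the horizontal axis and along the vertical axis can be shown with a small NumPy sketch. It covers only the flattening at the end of steps 3.6.1 and 3.6.2 and leaves out the multi-scale axial convolutions that precede it. The function names and the (H, W, C) layout are assumptions.

```python
import numpy as np

def flatten_horizontal(fmap):
    """(H, W, C) -> (H*W, C), row by row: horizontal neighbours stay adjacent."""
    h, w, c = fmap.shape
    return fmap.reshape(h * w, c)

def flatten_vertical(fmap):
    """(H, W, C) -> (W*H, C), column by column: vertical neighbours stay adjacent."""
    h, w, c = fmap.shape
    return fmap.transpose(1, 0, 2).reshape(w * h, c)

fmap = np.arange(3 * 4).reshape(3, 4, 1)   # pixel value = row * 4 + col
seq_h = flatten_horizontal(fmap)[:, 0]
seq_v = flatten_vertical(fmap)[:, 0]
print(seq_h.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(seq_v.tolist())   # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

In the horizontal sequence, pixels 0 and 1 (same row) are adjacent while 0 and 4 (same column) are four positions apart; in the vertical sequence the roles are swapped. Using both orderings gives the sequence model short paths between neighbours along either axis.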
[0136] Step 3.6.3: The output feature map of the spatial branch attention block Spa-LGAG is processed with horizontal feature shifting to obtain a one-dimensional sequence;
[0137] Step 3.6.4: The output feature map of the spatial branch attention block Spa-LGAG is processed with vertical feature shifting to obtain a one-dimensional sequence;
[0138] Step 3.6.5: The one-dimensional sequence obtained in step 3.6.1 is processed by the bidirectional spatial Mamba (BSM) to obtain a one-dimensional sequence;
[0139] The specific process is as follows:
[0140] The one-dimensional sequence obtained in step 3.6.1 is reshaped and input into convolutional block 1, which outputs a feature map;
[0141] The feature map is input into the Mamba block, which outputs a feature map;
[0142] Feature reversal is applied to the feature map to obtain the reversed feature map;
[0143] The feature map is input into the Mamba block, which outputs a feature map;
[0144] The feature map is input into convolutional block 2, which outputs a feature map;
[0145] The feature map is input into convolutional block 3, which outputs a feature map;
[0146] Feature reversal is applied to the feature map to obtain the reversed feature map;
[0147] The two feature maps are added element-wise to obtain a one-dimensional sequence;
[0148] The feature map is input into the Mamba block, which outputs a feature map; the process is as follows:
[0149] This part uses Mamba to model global spatial context dependencies. Because Mamba is inherently causal, its unidirectional sequence processing may not fully capture the bidirectional dependencies present in hyperspectral images. Therefore, this invention reverses the input sequence to obtain a reversed sequence, and then feeds the forward and reversed sequences into parallel Mamba branches for feature modeling.
[0150]
[0151] where the three convolution symbols all denote two-dimensional pointwise convolution operations;
[0152] the reshape operator denotes a shape-reshaping operation;
[0153] the transpose operator interchanges the channel dimension and the sequence dimension, so that a tensor indexed by a channel dimension and a sequence dimension is re-indexed with the two swapped;
[0154] the flip operator is used to generate the reversed input sequence;
[0155] the Mamba operator denotes the SSM-based Mamba framework, used to capture long-range spatial dependencies;
[0156] the remaining symbols denote intermediate feature maps;
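The forward-and-reversed modeling described above can be sketched as follows. This is a minimal illustration, not the formula of this invention: `causal_scan` is a hypothetical stand-in for the Mamba block (a simple decaying running sum that, like Mamba, only sees past elements), and the convolution blocks are omitted.

```python
def causal_scan(seq, decay=0.5):
    """Hypothetical stand-in for a causal sequence model:
    each output depends only on the current and earlier inputs."""
    state, outputs = 0.0, []
    for value in seq:
        state = decay * state + value
        outputs.append(state)
    return outputs


def bidirectional_scan(seq, scan=causal_scan):
    """Scan the sequence forward and reversed, flip the reversed
    result back into the original order, and add element-wise."""
    forward = scan(seq)
    backward = scan(seq[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


impulse_response = bidirectional_scan([1.0, 0.0, 0.0])
symmetric_response = bidirectional_scan([1.0, 2.0, 1.0])
```

With the forward scan alone, the first position never sees the later ones; after adding the flipped backward scan, every position receives context from both sides.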
[0157] Step 366: The one-dimensional sequence obtained in step 362 undergoes BSM processing to obtain a one-dimensional sequence; the specific process is the same as in step 365;
[0158] Step 367: The one-dimensional sequence obtained in step 363 undergoes BSM processing to obtain a one-dimensional sequence; the specific process is the same as in step 365;
[0159] Step 368: The one-dimensional sequence obtained in step 364 undergoes BSM processing to obtain a one-dimensional sequence; the specific process is the same as in step 365;
[0160] Step 369: The four one-dimensional sequences obtained in steps 365 to 368 are added element-wise to obtain a sequence;
[0161] Step 360: The sequence is reshaped to obtain the feature map.
[0162] The other steps and parameters are the same as those in one of the specific implementation methods one to six.
[0163] Specific Implementation Method Eight: This implementation method differs from Specific Implementation Methods One through Seven in that: in step 363, the output feature map of the spatial branch attention block Spa-LGAG is processed based on horizontal feature shift to obtain a one-dimensional sequence. The process is as follows:
[0164] Even with the axial reorganization strategy, it remains difficult to establish connections between non-local features that cross row / column boundaries. Feature shifting transfers boundary features from different rows / columns into the same row / column, enabling the subsequent axial perception to capture the boundary dependencies of the original feature map;
[0165] Step 3631: Taking horizontal feature shift as an example (see the left side of Figure 4), each channel of the feature map is feature-shifted using Formula (6) to form a shifted feature map;
[0166] The feature representation of each row not only includes feature information that crosses row boundaries, but also incorporates contextual features from adjacent rows;
[0167] The specific process is as follows:
[0168] In a single channel of the feature map, the features of a given row are written as a row vector, where the row index ranges over the rows of the feature map;
[0169] Feature map The feature shifting process in a single channel is represented as follows:
[0170] (6)
[0171] where the split operator divides the features of a given row into two parts; one symbol denotes the split position, which is obtained by integer (floor) division, and another denotes the remaining positions;
[0172] the two resulting symbols denote the head and the tail of the split row features, respectively;
[0173] the first assignment places the given features into the new row features from the indicated position to the last position;
[0174] the second assignment places the given features into the new row features from position 0 to the indicated position;
[0175] the third assignment sets the new row features from position 0 to the indicated position to the value 0;
[0176] the fourth assignment sets the new row features from the indicated position to the last position to the value 0;
[0177] Each channel of the feature map undergoes this feature shifting to form the shifted feature map;
[0178] Step 3632: The shifted feature map is processed along the horizontal axis in the same way as in step 361, with the input feature map of step 361 replaced by the shifted feature map, to obtain a one-dimensional sequence;
[0179] The leading and trailing features of the obtained one-dimensional sequence are cropped to obtain the cropped one-dimensional sequence;
[0180] so that the dimension of the cropped one-dimensional sequence matches the required dimension;
[0181] Through axial reorganization of the shifted feature map, relationships are established between cross-row boundary features and adjacent-row features.
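One way to picture the horizontal feature shift and the later cropping is sketched below for a single channel. Formula (6) is not legible in this text, so the layout is an assumption: each new row holds the tail of the previous row followed by the head of the current row, with zeros filling the two ends. The function names and the default split position (half the row width, by floor division) are hypothetical.

```python
def shift_rows(fmap, split=None):
    """Assumed horizontal shift: new row k = tail of row k-1 + head of row k.
    The first row is zero-filled at the front, the extra last row at the back."""
    height, width = len(fmap), len(fmap[0])
    p = width // 2 if split is None else split   # assumed split position
    flat = [0] * (width - p) + [v for row in fmap for v in row] + [0] * p
    return [flat[i * width:(i + 1) * width] for i in range(height + 1)]


def crop_shifted_sequence(seq, width, split):
    """Drop the zero-filled leading and trailing positions so the
    sequence length matches the unshifted feature map again."""
    return seq[width - split:len(seq) - split]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
shifted = shift_rows(fmap, split=2)
flat_shifted = [v for row in shifted for v in row]
cropped = crop_shifted_sequence(flat_shifted, width=4, split=2)
```

Under this assumption the middle row of `shifted` contains the end of the first original row next to the start of the second, which is the cross-row boundary relationship the shift is meant to expose.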
[0182] The other steps and parameters are the same as those in any of the specific implementation methods one to seven.
[0183] Specific Implementation Method Nine: This implementation method differs from Specific Implementation Methods One through Eight in that: in step 364, the output feature map of the spatial branch attention block Spa-LGAG is processed based on vertical feature shift to obtain a one-dimensional sequence. The process is as follows:
[0184] Even with the axial reorganization strategy, it remains difficult to establish connections between non-local features that cross row / column boundaries. Feature shifting transfers boundary features from different rows / columns into the same row / column, enabling the subsequent axial perception to capture the boundary dependencies of the original feature map;
[0185] Step 3641: The horizontal and vertical spatial dimensions of the feature map are transposed;
[0186] where the feature map is the output of the Spa-LGAG attention block;
[0187] Each channel of the transposed feature map is then shifted using Formula (6) to form the new feature map; the specific process is the same as in step 3631, with the feature map replaced by the transposed feature map;
[0188] Step 3642: The shifted feature map is transposed back, and the resulting feature map is processed along the vertical axis in the same way as in step 362, with the input feature map of step 362 replaced by this feature map, to obtain a one-dimensional sequence;
[0189] The leading and trailing features of the obtained one-dimensional sequence are cropped to obtain the cropped one-dimensional sequence;
[0190] so that the dimension of the cropped one-dimensional sequence matches the required dimension;
[0191] Through axial reorganization of the shifted feature map, relationships are established between cross-row boundary features and adjacent-row features.
[0192] The other steps and parameters are the same as those in one of the specific implementation methods one to eight.
[0193] Specific Implementation Method Ten: This implementation method differs from Specific Implementation Methods One through Nine in that: in step 38, the output feature map of Pool block 2 is input into the GOSM block, which outputs a feature map. The specific process is as follows:
[0194] Hyperspectral images typically contain hundreds of consecutive bands, and directly modeling the entire band sequence significantly increases the computational burden of the model. While band grouping can alleviate this burden, it disrupts the continuity of the original spectrum, so that key cross-group dependencies are neglected and the model's discriminative ability and final classification performance are limited. Therefore, this invention designs a GOSM module, the structure of which is shown in Figure 5. This module mainly consists of two parts: group offset and bidirectional spectral Mamba.
[0195] Step 381: Group offset is performed on the input spectral sequence;
[0196] To prevent critical information from being fragmented during band grouping, this invention introduces a group offset mechanism. This mechanism generates offset grouped views based on the original grouping, allowing critical information crossing group boundaries to be fully captured in other offset views, thereby mitigating the problem of disrupted spectral continuity.
[0197] The process is as follows:
[0198] (7)
[0199] where the grouping operator divides the spectral sequence into multiple groups, the group-length symbol denotes the length of each group, and the input spectral sequence is the feature map output by Pool block 2;
[0200] the subgroup symbols denote, in order, the first spectral subgroup, the second spectral subgroup, and the spectral subgroups at the two indicated indices;
[0201] the four sequence symbols denote the first, second, third, and fourth spectral grouping sequences, respectively;
[0202] for the second spectral grouping sequence, the corresponding symbols denote its first spectral subgroup, its second spectral subgroup, and its spectral subgroup at the indicated index;
[0203] for the third spectral grouping sequence, the corresponding symbols denote its first spectral subgroup, its second spectral subgroup, and its spectral subgroup at the indicated index;
[0204] for the fourth spectral grouping sequence, the corresponding symbols denote its first spectral subgroup, its second spectral subgroup, and its spectral subgroup at the indicated index;
[0205] one quantity is obtained by integer (floor) division; the offset operator is used to offset the grouped sequences;
[0206] the three offset symbols denote the offsets of the offset grouping sequences, each taking its own value;
[0207] With the offset taking each of its values in turn, four offset grouping sequences are constructed. This design relocates the edge bands of each group, ensuring that continuity information crossing group boundaries is fully modeled in the multi-view representation. The scheme balances stronger model representation against computational overhead. Furthermore, the dynamic scaling of the group length lets the scheme adapt naturally to datasets with different spectral dimensions.
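The effect of the group offset can be pictured with a toy band sequence. Formula (7) is not legible in this text, so the sketch below makes several assumptions: the offset is implemented as a circular shift, the sequence length is divisible by the group length, and the offset values 0 to 3 and the names `roll` and `group_offset` are purely illustrative.

```python
def roll(seq, k):
    """Circularly shift seq to the right by k positions (k may be negative)."""
    k %= len(seq)
    return list(seq[-k:] + seq[:-k]) if k else list(seq)


def group_offset(seq, group_len, offset):
    """Shift the band sequence by `offset` (assumed circular), then cut it
    into consecutive groups of length `group_len`."""
    shifted = roll(seq, -offset)
    return [shifted[i:i + group_len] for i in range(0, len(shifted), group_len)]


bands = list(range(8))                      # toy spectral sequence of 8 bands
views = {o: group_offset(bands, 4, o) for o in (0, 1, 2, 3)}
```

In the view with offset 0, bands 3 and 4 sit on opposite sides of a group boundary; in the view with offset 2 they fall inside the same group, so the dependency between them can be modeled there.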
[0208] Step 382: Bidirectional spectral Mamba is applied separately to the first, second, third, and fourth spectral grouping sequences to obtain four modeled grouping sequences. The specific process is as follows:
[0209] To fully leverage Mamba's global awareness in long-sequence dependency modeling, this invention extracts deep features from the different spectral views. As in bidirectional spatial Mamba, to capture potential bidirectional dependencies in spectral sequences, the input sequence is first reversed to generate a reverse sequence, and then the forward and reverse sequences are modeled separately.
[0210] Step 3821: Bidirectional spectral Mamba is applied to the first spectral grouping sequence to obtain a grouping sequence. The process can be formalized as follows:
[0211] (8)
[0212] where the three block symbols denote convolution block 4, convolution block 5, and convolution block 6, respectively; the first intermediate symbol denotes the sequence obtained after the first spectral grouping sequence passes through convolution block 4; the second intermediate symbol denotes the sequence obtained after the subsequent operation; the flip operator reverses the grouped sequence; and the output symbol denotes the grouping sequence obtained after bidirectional spectral Mamba;
[0213] Convolution block 4 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 4 and a GN layer, where GN denotes group normalization;
[0214] Convolution block 5 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 1 and a GN layer;
[0215] Convolution block 6 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 1 and a GN layer;
[0216] Step 3822: Bidirectional spectral Mamba is applied to the second spectral grouping sequence to obtain a grouping sequence. The process can be formalized as follows:
[0217] (9)
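[0218] where the first intermediate symbol denotes the sequence obtained after the second spectral grouping sequence passes through convolution block 4; the second intermediate symbol denotes the sequence obtained after the subsequent operation; and the output symbol denotes the grouping sequence obtained after bidirectional spectral Mamba;
[0219] Step 3823: Bidirectional spectral Mamba is applied to the third spectral grouping sequence to obtain a grouping sequence. The process can be formalized as follows: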
[0220] (10)
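[0221] where the first intermediate symbol denotes the sequence obtained after the third spectral grouping sequence passes through convolution block 4; the second intermediate symbol denotes the sequence obtained after the subsequent operation; and the output symbol denotes the grouping sequence obtained after bidirectional spectral Mamba;
[0222] Step 3824: Bidirectional spectral Mamba is applied to the fourth spectral grouping sequence to obtain a grouping sequence. The process can be formalized as follows: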
[0223] (11)
[0224] where the first intermediate symbol denotes the sequence obtained after the fourth spectral grouping sequence passes through convolution block 4; the second intermediate symbol denotes the sequence obtained after the subsequent operation; and the output symbol denotes the grouping sequence obtained after bidirectional spectral Mamba;
[0225] It is worth noting that no activation function is introduced into these convolutional layers. The linear convolution design avoids nonlinear truncation and, by preserving spectral continuity and the homogeneity of the mapping, enhances the model's robustness to illumination fluctuations. This approach fits the linear mixture model of pixels, effectively preventing distortion of physical semantics and preserving proportional relationships while focusing on the morphological features of the spectrum.
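The homogeneity property mentioned above can be checked numerically. The sketch below is only an illustration: the kernel, the toy spectrum, and the gain of 2 (standing in for a brightness change) are made-up values, GELU is used as an example of an activation that would break the property, and the GN layer of the convolution blocks is left out.

```python
import math


def conv1d(seq, kernel):
    """Valid 1D convolution (cross-correlation), no bias, no activation."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def gelu(x):
    """Exact GELU, used here only as an example nonlinearity."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


spectrum = [0.2, 0.5, 0.9, 0.4]   # toy spectral values (illustrative)
kernel = [0.25, 0.5, 0.25]        # illustrative kernel
gain = 2.0                        # hypothetical illumination scaling

linear_out = conv1d(spectrum, kernel)
linear_scaled = conv1d([gain * v for v in spectrum], kernel)
linear_is_homogeneous = all(
    abs(a - gain * b) < 1e-12 for a, b in zip(linear_scaled, linear_out))

gelu_out = [gelu(v) for v in linear_out]
gelu_scaled = [gelu(v) for v in linear_scaled]
gelu_is_homogeneous = all(
    abs(a - gain * b) < 1e-6 for a, b in zip(gelu_scaled, gelu_out))
```

Scaling the input scales the output of the activation-free convolution by the same factor, so ratios between spectral responses are preserved; once GELU is applied, that proportionality no longer holds.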
[0226] After bidirectional spectral Mamba, the four spectral grouping sequences yield four grouping sequences after sequence modeling;
[0227] Step 3825: Inverse group offset is performed separately on the four grouping sequences so that each is precisely aligned in the spectral dimension during fusion, yielding four aligned sequences. The specific process is as follows:
[0228] Step 38251: Inverse group offset is performed on the first grouping sequence so that it is precisely aligned in the spectral dimension during fusion, yielding a sequence. This process is represented as follows:
[0229] (12)
[0230] where the inverse-offset operator applies the inverse offset to the grouped sequence (restoring the original band positions);
[0231] the offset here is set to 0; the intermediate symbol denotes the intermediate grouping sequence;
[0232] the concatenation operator splices together all spectral subgroups of the grouping sequence;
[0233] Step 38252: Inverse group offset is performed on the second grouping sequence so that it is precisely aligned in the spectral dimension during fusion, yielding a sequence. This process is represented as follows:
[0234] (13)
[0235] where the inverse-offset operator applies the inverse offset to the grouped sequence (restoring the original band positions); the intermediate symbol denotes the intermediate grouping sequence; the concatenation operator splices together all spectral subgroups of the grouping sequence;
[0236] Step 38253: Inverse group offset is performed on the third grouping sequence so that it is precisely aligned in the spectral dimension during fusion, yielding a sequence. This process is represented as follows:
[0237] (14)
[0238] where the inverse-offset operator applies the inverse offset to the grouped sequence (restoring the original band positions); the intermediate symbol denotes the intermediate grouping sequence; the concatenation operator splices together all spectral subgroups of the grouping sequence;
[0239] Step 38254: Inverse group offset is performed on the fourth grouping sequence so that it is precisely aligned in the spectral dimension during fusion, yielding a sequence. This process is represented as follows:
[0240] (15)
[0241] where the inverse-offset operator applies the inverse offset to the grouped sequence (restoring the original band positions); the intermediate symbol denotes the intermediate grouping sequence; the concatenation operator splices together all spectral subgroups of the grouping sequence;
[0242] Step 3826: The four aligned sequences are fused to obtain the fused sequence;
[0243] The fused sequence serves as the output feature map of the GOSM block.
[0244] The fused features integrate complementary information from the different spectral views, enabling the model to aggregate the advantageous features of each view and thereby form a more comprehensive and robust spectral representation.
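The splice, inverse offset, and fusion steps can be sketched on a toy sequence as follows. Formulas (12) to (15) and the fusion formula are not legible in this text, so this is only a sketch under assumptions: the offset is a circular shift, fusion is an element-wise sum, the per-view Mamba modeling is skipped so that alignment can be checked directly, and all names and offset values are hypothetical.

```python
def roll(seq, k):
    """Circularly shift seq to the right by k positions (k may be negative)."""
    k %= len(seq)
    return list(seq[-k:] + seq[:-k]) if k else list(seq)


def inverse_group_offset(groups, offset):
    """Splice all spectral subgroups back together, then undo the offset."""
    spliced = [value for group in groups for value in group]
    return roll(spliced, offset)


def fuse(sequences):
    """Assumed fusion: element-wise sum of the aligned sequences."""
    return [sum(values) for values in zip(*sequences)]


bands = [10, 11, 12, 13, 14, 15, 16, 17]
group_len = 4
aligned = []
for offset in (0, 1, 2, 3):                       # illustrative offsets
    shifted = roll(bands, -offset)                # forward group offset
    groups = [shifted[i:i + group_len]
              for i in range(0, len(shifted), group_len)]
    aligned.append(inverse_group_offset(groups, offset))
fused = fuse(aligned)
```

Because every view is rolled back before fusion, position i in each aligned sequence refers to the same band, which is the precondition for combining the views element by element.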
[0245] The other steps and parameters are the same as those in any of the specific implementation methods one to nine.
[0246] Mamba: SSM is based on the principle of linear time-invariant (LTI) systems. It maps a continuous input signal $x(t)$ to a continuous output signal $y(t)$ through a latent state $h(t)$. This process can be expressed using a linear ordinary differential equation (ODE) as follows:
[0247] $$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) + D\,x(t)$$
[0248] where $A$ is the state matrix, which describes the evolution of the internal state of the system; $B$ is the input matrix; $C$ is the output matrix; $D$ is the direct mapping matrix; and $h'(t)$ represents the rate of change of the system state at time $t$.
[0249] To extend the continuous-time system above to discrete sequence modeling tasks, S4 discretizes it. Specifically, S4 uses a time-scale parameter $\Delta$ and the zero-order hold (ZOH) technique to map the continuous-system parameters $A$ and $B$ to the discrete-system parameters $\bar{A}$ and $\bar{B}$. This process can be represented as:
[0250] $$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B$$
[0251] where $\exp(\cdot)$ represents the matrix exponential and $I$ is the identity matrix. After the above discretization, the discretized SSM can be further described as:
[0252] $$h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k, \qquad y_k = C\,h_k + D\,x_k$$
[0253] To enable the model to adjust dynamically to the input sequence, Mamba introduces a selectivity mechanism in its S6 architecture. The core idea of this mechanism is that the parameters $B$, $C$, and $\Delta$ are no longer static but are generated dynamically from the input sequence. Specifically, for each input $x_k$, Mamba predicts the corresponding $B_k$, $C_k$, and $\Delta_k$ through linear projection layers.
[0254] The discretized variables $\bar{A}_k$ and $\bar{B}_k$ can then be obtained. At this point, the S6 architecture can be described as follows:
[0255] $$h_k = \bar{A}_k\,h_{k-1} + \bar{B}_k\,x_k, \qquad y_k = C_k\,h_k + D\,x_k$$
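A scalar version of zero-order-hold discretization followed by the discrete state recurrence can be sketched as follows. It is a minimal illustration with a one-dimensional state and fixed (non-selective) parameters; the parameter values and function names are chosen for the example and are not those of the invention.

```python
import math


def zoh_discretize(a, b, delta):
    """Zero-order hold for a scalar continuous system h' = a*h + b*x (a != 0):
    a_bar = exp(delta*a), b_bar = (exp(delta*a) - 1) / a * b."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(xs, a, b, c, d, delta):
    """Run the discretized recurrence h_k = a_bar*h_{k-1} + b_bar*x_k,
    y_k = c*h_k + d*x_k over the input sequence."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h, ys = 0.0, []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h + d * x)
    return ys


# Constant input: the continuous system settles at -b/a = 1.
ys = ssm_scan([1.0] * 200, a=-1.0, b=1.0, c=1.0, d=0.0, delta=0.1)
```

A selective (S6-style) variant would recompute `delta`, `b`, and `c` from each input inside the loop instead of fixing them once; that step is omitted here.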
[0256] Overall structure: Consider the original hyperspectral image of size H × W × B (the initial number of bands is denoted B, and the number of channels after processing is denoted C), where H and W indicate the height and width of the image and B indicates the number of bands. Considering the spatial correlation between adjacent pixels, the image is divided into a series of three-dimensional patch cubes, each with a fixed spatial size. Subsequently, these cubes are used as input to the model, with the label of the center pixel of each patch serving as the supervision signal.
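The patch-cube construction can be sketched on nested lists as follows. Several details are assumptions not stated in the text: the patch size is odd, borders are zero-padded, and label 0 marks unlabeled pixels that are skipped. The function name `extract_patches` is hypothetical.

```python
def extract_patches(image, labels, patch):
    """image: H x W x B nested lists; labels: H x W integers.
    Returns (cube, label) pairs, one per labeled pixel, where each cube is
    a patch x patch x B neighbourhood centred on that pixel."""
    radius = patch // 2                       # assumes an odd patch size
    height, width = len(image), len(image[0])
    zero_pixel = [0.0] * len(image[0][0])     # assumed zero padding
    samples = []
    for i in range(height):
        for j in range(width):
            if labels[i][j] == 0:             # assumed: 0 means unlabeled
                continue
            cube = [[image[y][x] if 0 <= y < height and 0 <= x < width
                     else zero_pixel
                     for x in range(j - radius, j + radius + 1)]
                    for y in range(i - radius, i + radius + 1)]
            samples.append((cube, labels[i][j]))
    return samples


# Toy 3 x 3 image with one band; pixel value = y * 3 + x.
image = [[[y * 3 + x] for x in range(3)] for y in range(3)]
labels = [[0, 1, 0],
          [0, 2, 0],
          [0, 0, 0]]
samples = extract_patches(image, labels, patch=3)
```

Each sample pairs a cube with the label of its center pixel, matching the supervision scheme described above.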
[0257] The proposed overall structure of ARGO-Mamba is shown in Figure 1. First, to adapt the model to different datasets, the number of channels in the input data is uniformly adjusted through the initial convolutional layer. Subsequently, the framework expands into a spatial-spectral dual-branch structure to synergistically leverage the complementary strengths of hyperspectral data. The spatial branch integrates the Spa-LGAG and ARSM modules to enhance key spatial features and model global context; the spectral branch combines the Spe-LGAG and GOSM modules to highlight important bands and capture long-range spectral dependencies.
[0258] In the final classification stage, the output of the spatial branch and the output of the spectral branch are dynamically fused to generate the final prediction result, which can be represented as:
[0259]
[0260] where the output denotes the final predicted probability distribution and the mapping symbol denotes a linear mapping operation. The two weighting parameters are not fixed scalars but input-driven adaptive weights. This allows the model to dynamically adjust the contribution of each branch according to the spatial-spectral characteristics of each sample, thereby achieving optimal collaborative representation.
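Since the fusion formula itself is not legible in this text, the following is only one plausible way to realize input-driven weighting of two branches, given here to make the idea concrete. Everything in it is an assumption: the two weights are obtained by a softmax over two gate scores (which in a real model would be predicted from the sample), the branch outputs are class scores, and the names `dynamic_fusion` and `gate_scores` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(spatial_scores, spectral_scores, gate_scores):
    """Weight the two branch outputs with sample-dependent weights
    (assumed: softmax over two gate scores), then normalise to probabilities."""
    alpha, beta = softmax(gate_scores)
    mixed = [alpha * s + beta * t
             for s, t in zip(spatial_scores, spectral_scores)]
    return softmax(mixed), (alpha, beta)


# Illustrative values for a three-class problem.
probs, (alpha, beta) = dynamic_fusion(
    spatial_scores=[2.0, 0.5, -1.0],
    spectral_scores=[0.1, 1.5, 0.0],
    gate_scores=[0.0, 0.0])
spatial_heavy, weights = dynamic_fusion(
    [2.0, 0.5, -1.0], [0.1, 1.5, 0.0], gate_scores=[3.0, 0.0])
```

With equal gate scores the two branches contribute equally; a larger first gate score shifts the prediction toward the spatial branch, which is the kind of per-sample adjustment the paragraph describes.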
[0261] The beneficial effects of the present invention are verified using the following embodiments:
[0262] Example 1:
[0263] Datasets: To verify the classification performance of ARGO-Mamba, this invention selected four benchmark hyperspectral datasets covering different application scenarios: Indian Pines (IP) and WHU-Hi-Longkou (LK), which contain agricultural land features, and Pavia University (PU) and Houston 2013 (HST), which represent typical urban landscapes. Basic information about each dataset (including spatial resolution, spectral range, and number of categories) is shown in Table 1.
[0264] Implementation details: The proposed method is implemented in Python 3.10.14 with PyTorch 2.1.2, using the PyCharm development environment. All experiments were run on a server equipped with a 2.1 GHz Intel(R) Xeon(R) Silver 8352V CPU and a 24 GB NVIDIA GeForce RTX 4090 GPU.
[0265] To objectively evaluate model performance, overall classification accuracy (OA), average classification accuracy (AA), and kappa coefficient (K) were used as evaluation metrics. In the proposed method, the Adam optimizer was used, and the number of training epochs was set to 300. In GOSM, the group size was set to 14, and the offset for each iteration was set to 4. The hidden state dimension in the Mamba architecture was set to 16. The training set was randomly sampled according to the proportions in Table 1, and the remaining pixels were used for testing.
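For reference, the three evaluation metrics can be computed from a confusion matrix as sketched below, using their standard definitions. The function name and the toy matrix are illustrative, and the sketch assumes every class has at least one test sample.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as j.
    Returns overall accuracy (OA), average accuracy (AA), and Cohen's kappa."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    oa = correct / total
    # AA: mean of the per-class recalls.
    aa = sum(confusion[i][i] / sum(confusion[i]) for i in range(n)) / n
    # Kappa: agreement corrected for chance agreement.
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(sum(confusion[i]) * col_sums[i]
                   for i in range(n)) / (total * total)
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


oa, aa, kappa = classification_metrics([[5, 1],
                                        [2, 4]])
```

OA weights every pixel equally, AA weights every class equally, and kappa discounts the agreement expected by chance, which is why the three can diverge on class-imbalanced datasets.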
[0266]
[0267] Ablation Study
[0268] Table 2: Impact of different components on OA. The baseline includes only one convolutional layer and the Mamba architecture. The bolded portion represents the optimal result.
[0269]
[0270] The experimental results are shown in Table 2. As can be seen from the table, the Baseline alone performed the worst on all datasets. When GOSM was introduced into the network, the model was able to group and model long-range dependencies along the spectral dimension, thus capturing spectral correlations more fully and improving accuracy by 3.11%, 1.77%, 1.42%, and 0.5%, respectively. Furthermore, adding Spe-LGAG allowed the model to obtain a more stable global representation through local feature aggregation, focusing attention on key regions and slightly improving performance.
[0271] Adding ARSM to the baseline model introduces structured processing of spatial features, which enables the model to capture global spatial context more efficiently during sequence modeling and improves accuracy by 4.65%, 3.41%, 2.32%, and 0.77%, respectively. Further introducing Spa-LGAG also brings a slight performance improvement. Finally, when all modules are integrated, the model jointly models the spectral and spatial dimensions and, through dynamic weighted fusion of the two, achieves the best classification performance across all experimental settings. The ablation experiments validate the effectiveness and complementarity of each module within the overall architecture.
[0272]
[0273]
[0274]
[0275]
[0276] To further verify the effectiveness of the proposed method, this invention selected three representative categories of HSIC architectures for comprehensive comparison: CNN-based methods (HybridSN, PyResNet), Transformer-based methods (HIT, SSTN, SSFTT, A²S²KResNet), and Mamba-based methods (3DSSMamba [TGRS 2025], MambaLG [TGRS 2025], MambaHSI [TGRS 2025]). To ensure the fairness of the results, all experiments were conducted under the same conditions, and the final result was the average of seven consecutive independent experiments.
[0277] 1) Quantitative Experimental Results and Analysis: Tables 3-6 show the classification results of each method on the four datasets: Indian Pines, Pavia University, Houston 2013, and WHU-Hi-Longkou. These include OA, AA, K, and the accuracy of each class. It can be seen that ARGO-Mamba significantly outperforms other methods in all three main metrics, exhibits the least fluctuation in results, and achieves optimal accuracy for more than half of the classes.
[0278] Overall, CNN-based methods, limited by their local receptive field, struggle to capture global contextual information and therefore perform the worst. For example, on Indian Pines, the best CNN method achieved an OA 18.26% lower than that of ARGO-Mamba. In contrast, Transformer-based and Mamba-based methods showed significant advantages, demonstrating the importance of global context modeling for HSIC. On Pavia University, ARGO-Mamba again performed strongly, achieving the best classification results in five categories. This indicates that for urban scenes with complex structures, the proposed axial reorganization strategy can extract spatial information more effectively. On Houston 2013, compared with the best method in each of the three categories, ARGO-Mamba improved OA by 4.76%, 0.56%, and 1.55%, respectively. Error-free classification was achieved in categories 5 (soil), 14 (tennis courts), and 15 (running tracks), again owing to the axial reorganization and group offset strategies for handling spatial and spectral features. On WHU-Hi-Longkou, ARGO-Mamba improves OA by 2.69%, 0.75%, and 1.68%, respectively, compared with the best method in each of the three categories, and achieves the best results in five categories. These results validate the robustness and generalization ability of the proposed method in different scenarios.
[0279] 2) Visualization results and analysis: Figures 7 and 8 show the classification results of the different methods on the Indian Pines and Pavia University datasets. It can be seen that ARGO-Mamba generates classification maps that are highly consistent with the original images on both datasets, with a clear overall structure and high fidelity in local details (such as the corn region in Indian Pines and the meadows region in Pavia University).
[0280] In contrast, CNN-based methods performed significantly worse, with their classification maps exhibiting blurred edges and significant noise, reflecting limited local feature modeling capabilities and susceptibility to interference leading to misclassification. Magnified local results further demonstrate that ARGO-Mamba exhibits higher accuracy and consistency in depicting ground features, effectively distinguishing similar features while maintaining spatial consistency. This advantage stems from the model's early-stage focus on key features using an attention mechanism, thereby suppressing noise interference. Simultaneously, the axial reassembly strategy delves deeper into the spatial semantic relationships of ground features, enhancing the perception of complex relationships. The group offset strategy further strengthens local feature modeling capabilities, helping to distinguish spectrally similar vegetation and targets.
[0281] 3) t-SNE Results and Analysis: To intuitively analyze the feature representations learned by the model, this invention uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to project high-dimensional features into a two-dimensional space. Figures 9 and 10 show the visualization results on the Indian Pines and Pavia University datasets, compared with the CNN-based PyResNet, A²S²KResNet, and the Mamba-based MambaHSI. It is evident that ARGO-Mamba exhibits superior clustering performance on both datasets. Specifically, PyResNet shows the most severe mixing of feature distributions. While A²S²KResNet and MambaHSI can form certain clusters, some class overlap still exists. In contrast, ARGO-Mamba tightly clusters similar samples and forms clear boundaries between different classes, indicating that its learned features have higher intra-class consistency and inter-class separability. This result further validates the model's advantage in joint spatial-spectral feature discrimination.
[0282] Different sample ratios: The number of training samples has a significant impact on model performance. To evaluate the robustness of ARGO-Mamba, this invention conducted experiments on the Indian Pines and Pavia University datasets using different numbers of training samples. The experimental results are shown in Figures 11 and 12. The performance of each method generally increases as the number of training samples increases. However, PyResNet exhibits significant fluctuations on the Pavia University dataset, indicating limitations in feature extraction and utilization. In contrast, ARGO-Mamba maintains stable and excellent performance under both sufficient and limited sample conditions, demonstrating stronger feature learning and generalization capabilities.
[0283] This invention may have other embodiments. Without departing from the spirit and essence of this invention, those skilled in the art can make various corresponding changes and modifications according to this invention, but these corresponding changes and modifications should all fall within the protection scope of the appended claims.
Claims
1. An axial reconstruction and group shift Mamba method for hyperspectral image classification, characterized in that: The specific process of the method is as follows: Step 1: Obtain hyperspectral images with category labels as the training set; Step 2: Construct the ARGO-Mamba model; The ARGO-Mamba model includes a feature header block, feature extraction block 1, feature extraction block 2, LGAG block, Pool block 2, ARSM block, GOSM block, and classification block. The LGAG block is a local-to-global attention-guided module; ARSM blocks are axially reconfigurable space Mamba modules; The GOSM block is a group offset spectrum Mamba module; Step 3: Train the ARGO-Mamba model based on the hyperspectral images from Step 1 to obtain the trained ARGO-Mamba model; Step 4: Input the hyperspectral image to be tested into the trained ARGO-Mamba model, and the trained ARGO-Mamba model will output the classification result.
2. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 1, characterized in that: In step one, hyperspectral images with category labels are obtained as the training set; the specific process is as follows: Step 11: Obtain a hyperspectral image with category labels, which is a real-valued array whose dimensions are the image height, the image width, and the number of bands; Step 12: Divide the hyperspectral image into a series of three-dimensional patch cube images, where the patch size denotes the spatial dimension, that is, the height or width, of each 3D patch cube, and the cube symbols denote, in order, the first 3D patch cube image, the second 3D patch cube image, and the 3D patch cube images at the indicated indices; Step 13: Use the category label of the center pixel of each 3D patch cube image as the category of that 3D patch cube image; all 3D patch cube images are used as the training set; each 3D patch cube image is used as input to the model.
3. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 2, characterized in that: In step two, the ARGO-Mamba model is constructed. The ARGO-Mamba model includes a feature header block, feature extraction block 1, feature extraction block 2, LGAG block, Pool block 2, ARSM block, GOSM block, and classification block. The LGAG block is a local-to-global attention-guided module; the ARSM block is an axial reconstruction spatial Mamba module; the GOSM block is a group offset spectral Mamba module; the specific process is as follows: The feature header block consists, in sequence, of a convolutional layer, a BN layer, and a GELU layer; the BN layer is a batch normalization layer; the GELU layer is a non-linear activation function layer; Feature extraction block 1 consists, in sequence, of a convolutional layer, a BN layer, and a GELU layer; Feature extraction block 2 consists, in sequence, of a convolutional layer, an LN layer, and a GELU layer; the LN layer is a layer normalization layer; the LGAG block includes the spatial branch attention block Spa-LGAG and the spectral branch attention block Spe-LGAG; Pool block 2 is a global average pooling layer; the ARSM block includes six depthwise convolutional layers, each with its own kernel size, together with convolutional block 1, convolutional block 2, convolutional block 3, and a Mamba block; the Mamba block is the standard SSM-based Mamba framework; Convolution block 1 consists, in sequence, of a 2D convolutional layer with convolution kernels of size 1 and 2, a BN layer, and a GELU layer; Convolution block 2 consists, in sequence, of a 2D convolutional layer with a kernel size of 1, a BN layer, and a GELU layer; Convolution block 3 consists, in sequence, of a 2D convolutional layer with a kernel size of 3, a BN layer, and a GELU layer; the GOSM block includes convolutional block 4, convolutional block 5, convolutional block 6, and a Mamba block; Convolution block 4 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 4 and a GN layer; the GN layer is a group normalization layer; Convolution block 5 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 5 and a GN layer; Convolution block 6 consists, in sequence, of a one-dimensional convolutional layer with a kernel size of 1 and a GN layer; the classification block includes Pool block 1, Linear block 1, and Linear block 2; Pool block 1 is a global average pooling layer; Linear block 1 is a linear layer (Linear); Linear block 2 is a linear layer.
4. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 3, characterized in that: in step three, the ARGO-Mamba model is trained on the hyperspectral images from step one to obtain a trained ARGO-Mamba model; the specific process is as follows:
Step 31: Input each three-dimensional patch cube image of the training set into the feature head block, which outputs a feature map;
Step 32: Input the feature map into feature extraction block 1, which outputs a feature map;
Step 33: Input the feature map into feature extraction block 2, which outputs a feature map;
Step 34: Input the feature map into the spatial branch attention block Spa-LGAG, which outputs a feature map;
Step 35: Input the feature map into the spectral branch attention block Spe-LGAG, which outputs a feature map;
Step 36: Input the output feature map of the spatial branch attention block Spa-LGAG into the ARSM block, which outputs a feature map;
Step 37: Input the output feature map of the spectral branch attention block Spe-LGAG into Pool block 2, which outputs a feature map;
Step 38: Input the output feature map of Pool block 2 into the GOSM block, which outputs a feature map;
Step 39: Input the output feature map of the ARSM block into Pool block 1 of the classification block, which outputs a feature map; input the output feature map of Pool block 1 into Linear block 1 of the classification block, which outputs a feature map; input the output feature map of the GOSM block into Linear block 2 of the classification block, which outputs a feature map;
Step 310: Generate the final prediction result by dynamically fusing the two feature maps; in the fusion expression, one term denotes the final predicted probability distribution and two terms denote parameters;
Step 311: Train the ARGO-Mamba model on each three-dimensional patch cube image in the training set until all three-dimensional patch cube images have been input into the ARGO-Mamba model, thereby obtaining the trained ARGO-Mamba model.
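The dynamic fusion at the end of claim 4 combines two branch outputs using two parameters, but the fusion expression itself is not reproduced in the text. The sketch below shows one plausible form only. The names `dynamic_fusion`, `alpha`, and `beta`, the softmax normalization of the two parameters, and the choice to fuse logits before a final softmax are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_fusion(logits_spatial, logits_spectral, alpha, beta):
    """Hypothetical fusion of the two branch outputs.

    alpha and beta stand in for the two parameters named in the claim.
    They are normalized so the branch weights are positive and sum to 1.
    """
    w = softmax(np.array([alpha, beta], dtype=float))
    fused = w[0] * logits_spatial + w[1] * logits_spectral
    return softmax(fused, axis=-1)  # predicted probability distribution

# Toy batch: 3 samples, 4 classes.
rng = np.random.default_rng(0)
z_spa = rng.normal(size=(3, 4))
z_spe = rng.normal(size=(3, 4))
probs = dynamic_fusion(z_spa, z_spe, alpha=0.2, beta=-0.1)
```

In a trained model the two parameters would be learned together with the network weights; here they are fixed numbers so the example runs on its own.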
5. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 4, characterized in that: in step 34, the feature map is input into the spatial branch attention block Spa-LGAG, which outputs a feature map; the specific process is as follows:
Step 341: Extract the local features of the feature map, as expressed in formula (1); in formula (1): the input is the feature map fed to the spatial branch attention block Spa-LGAG; a boundary padding operation of size 1 is applied in the spatial dimensions; a pooling operation with a sliding stride of 1 is applied; the result is the feature obtained by aggregating local information;
Step 342: Extract global features on the basis of the local features, as expressed in formula (2); in formula (2): a global average pooling operation is applied along the channel dimension; the Sigmoid activation function is used to generate the weights; element-wise multiplication is applied; the result is the output feature map of the spatial branch attention block Spa-LGAG.
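The two stages of claim 5 can be sketched as below. Formulas (1) and (2) are not reproduced in the text, so several details are assumptions: average pooling with a 3 x 3 window (the claim gives only the padding of 1 and the stride of 1), zero padding, and multiplying the weights back onto the input feature map. The names `local_pool` and `spa_lgag` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_pool(x):
    """3 x 3 average pooling, stride 1, zero padding 1, on a (C, H, W) array."""
    x = np.asarray(x, dtype=float)
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="constant")
    out = np.zeros_like(x)
    for dr in range(3):          # sum the nine shifted views of the padded map
        for dc in range(3):
            out += padded[:, dr:dr + h, dc:dc + w]
    return out / 9.0

def spa_lgag(x):
    """Hypothetical spatial attention: local pooling, channel mean, sigmoid gate."""
    x = np.asarray(x, dtype=float)
    local = local_pool(x)                        # local aggregation
    pooled = local.mean(axis=0, keepdims=True)   # average along the channel axis
    weights = sigmoid(pooled)                    # one weight per spatial position
    return x * weights                           # element-wise multiplication

x = np.ones((2, 4, 4))
local = local_pool(x)
out = spa_lgag(x)
```

Because the gate has shape (1, H, W), every channel at a given pixel is scaled by the same weight, which is what makes this branch spatial rather than spectral.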
6. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 5, characterized in that: in step 35, the feature map is input into the spectral branch attention block Spe-LGAG, which outputs a feature map; the entire process is expressed as formula (3); in formula (3): the input is the feature map fed to the spectral branch attention block Spe-LGAG; a boundary padding operation of size 1 is applied in the spectral dimension; a moving average operation is applied in the spectral dimension with a sliding stride of 1 and a window of 3 adjacent channels; a global average pooling operation is applied over the spatial dimensions; one term denotes the channel descriptor; one term denotes a linear feature mapping layer; the result is the output feature map of the spectral branch attention block Spe-LGAG.
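A rough sketch of the spectral branch in claim 6 follows. Formula (3) is not reproduced in the text, so the order of operations, the edge-replicating padding, the sigmoid gate, and the final multiplication with the input are assumptions. The names `spe_lgag`, `weight`, and `bias` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spe_lgag(x, weight, bias):
    """Hypothetical spectral attention on a (C, H, W) feature map.

    weight is a (C, C) matrix and bias a (C,) vector for the linear layer.
    """
    x = np.asarray(x, dtype=float)
    # Pad one channel on each side of the spectral axis (assumption: edge mode).
    padded = np.pad(x, ((1, 1), (0, 0), (0, 0)), mode="edge")
    # Moving average over 3 adjacent channels with stride 1.
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    # Global average pooling over the spatial axes gives a channel descriptor.
    descriptor = smoothed.mean(axis=(1, 2))
    # Linear mapping followed by a sigmoid gate, one weight per channel.
    gates = sigmoid(weight @ descriptor + bias)
    return x * gates[:, None, None]

c = 6
x = np.random.default_rng(0).random((c, 3, 3))
out = spe_lgag(x, weight=np.zeros((c, c)), bias=np.zeros(c))
```

With zero weights and biases every gate equals 0.5, so the output is exactly half the input, which makes the example easy to check by hand.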
7. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 6, characterized in that: in step 36, the output feature map of the spatial branch attention block Spa-LGAG is input into the ARSM block, which outputs a feature map; the specific process is as follows:
Step 361: Process the output feature map of the spatial branch attention block Spa-LGAG along the horizontal axis to obtain a one-dimensional sequence, as expressed in formula (4); in formula (4): the input is the output feature map of the spatial branch attention block Spa-LGAG; three terms denote convolutions with their respective kernel sizes; a subscript indicates the horizontal direction; three terms denote intermediate feature maps; one term denotes the fused feature map; the fused feature map is flattened along the horizontal axis to obtain a one-dimensional sequence;
Step 362: Process the output feature map of the spatial branch attention block Spa-LGAG along the vertical axis to obtain a one-dimensional sequence, as expressed in formula (5); in formula (5): the input is the output feature map of the spatial branch attention block Spa-LGAG; three terms denote convolutions with their respective kernel sizes; a subscript indicates the vertical direction; three terms denote intermediate feature maps; one term denotes the fused feature map; the fused feature map is flattened along the vertical axis to obtain a one-dimensional sequence;
Step 363: Process the output feature map of the spatial branch attention block Spa-LGAG on the basis of horizontal feature shift to obtain a one-dimensional sequence;
Step 364: Process the output feature map of the spatial branch attention block Spa-LGAG on the basis of vertical feature shift to obtain a one-dimensional sequence;
Step 365: Apply BSM processing to the one-dimensional sequence obtained in step 361 to obtain a one-dimensional sequence; the specific process is as follows: the one-dimensional sequence obtained in step 361 is reshaped and input into convolution block 1, which outputs a feature map; a feature map is input into the Mamba block, which outputs a feature map; feature reversal is performed on a feature map to obtain the reversed feature map; a feature map is input into the Mamba block, which outputs a feature map; a feature map is input into convolution block 2, which outputs a feature map; a feature map is input into convolution block 3, which outputs a feature map; feature reversal is performed on a feature map to obtain the reversed feature map; two feature maps are added element by element to obtain a one-dimensional sequence; in the expression of this process: three terms each denote a two-dimensional pointwise convolution operation; one term denotes a reshape operation; one term denotes the exchange of the channel dimension and the sequence dimension; one term denotes the operation that generates the reversed input sequence; one term denotes the Mamba framework based on SSM; two terms denote intermediate feature maps;
Step 366: Apply BSM processing to the one-dimensional sequence obtained in step 362 to obtain a one-dimensional sequence;
Step 367: Apply BSM processing to the one-dimensional sequence obtained in step 363 to obtain a one-dimensional sequence;
Step 368: Apply BSM processing to the one-dimensional sequence obtained in step 364 to obtain a one-dimensional sequence;
Step 369: Add the four one-dimensional sequences obtained in steps 365 to 368 element by element to obtain a sequence;
Step 3610: Reshape the sequence to obtain the feature map.
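The axial flattening and the forward-plus-reversed processing described in claim 7 can be illustrated with the sketch below. It is a simplified picture under stated assumptions: `causal_stand_in` is a cumulative mean used only as a placeholder for the Mamba block, the convolution blocks are omitted, and all function names are hypothetical.

```python
import numpy as np

def scan_horizontal(x):
    """Flatten a (C, H, W) map row by row into a (C, H*W) sequence."""
    return x.reshape(x.shape[0], -1)

def scan_vertical(x):
    """Flatten a (C, H, W) map column by column into a (C, H*W) sequence."""
    return x.transpose(0, 2, 1).reshape(x.shape[0], -1)

def unscan_vertical(seq, h, w):
    """Inverse of scan_vertical: restore the (C, H, W) layout."""
    return seq.reshape(seq.shape[0], w, h).transpose(0, 2, 1)

def causal_stand_in(seq):
    """Placeholder for the Mamba block: each position averages itself and its past."""
    return np.cumsum(seq, axis=1) / np.arange(1, seq.shape[1] + 1)

def bidirectional(seq, model):
    """Run the model on the sequence and on its reversal, then add the results."""
    forward = model(seq)
    backward = model(seq[:, ::-1])[:, ::-1]  # reverse, process, reverse back
    return forward + backward

x = np.arange(24).reshape(2, 3, 4)
seq_h = scan_horizontal(x)
seq_v = scan_vertical(x)
y_h = bidirectional(seq_h, causal_stand_in)
```

A causal model sees only earlier positions, so adding the reversed pass lets every position receive context from both ends of the scan.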
8. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 7, characterized in that: in step 363, the output feature map of the spatial branch attention block Spa-LGAG is processed on the basis of horizontal feature shift to obtain a one-dimensional sequence; the specific process is as follows:
Step 3631: Apply feature shifting to each channel of the feature map to form a new feature map; the specific process is as follows: within a single channel of the feature map, the features of each row are considered, with the row index ranging over the rows of the feature map; the feature shifting within a single channel is expressed as formula (6); in formula (6): a split operation divides the features of the given row into two parts; one term denotes the split position, defined with an operator that denotes integer division; one term denotes the remaining positions; one part is the head of the split row features and the other part is the tail of the split row features; one assignment writes a feature segment into the new row features from a given position to the end; one assignment writes a feature segment into the new row features from position 0 up to a given position; one assignment writes the value 0 into the new row features from position 0 up to a given position; one assignment writes the value 0 into the new row features from a given position to the end; every channel of the feature map undergoes this feature shifting to form the new feature map;
Step 3632: Process the new feature map along the horizontal axis to obtain a one-dimensional sequence; crop a number of leading features and a number of trailing features from the obtained one-dimensional sequence to obtain the cropped one-dimensional sequence.
9. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 8, characterized in that: in step 364, the output feature map of the spatial branch attention block Spa-LGAG is processed on the basis of vertical feature shift to obtain a one-dimensional sequence; the specific process is as follows:
Step 3641: Transpose the horizontal and vertical spatial dimensions of the feature map, the feature map being the output feature map of the spatial branch attention block Spa-LGAG; apply feature shifting to each channel of the transposed feature map to form a new feature map;
Step 3642: Perform dimension reversal on the new feature map, and process the dimension-reversed feature map along the vertical axis to obtain a one-dimensional sequence; crop a number of leading features and a number of trailing features from the obtained one-dimensional sequence to obtain the cropped one-dimensional sequence.
10. The axial reconstruction and group shift Mamba method for hyperspectral image classification according to claim 9, characterized in that: in step 38, the output feature map of Pool block 2 is input into the GOSM block, which outputs a feature map; the specific process is as follows:
Step 381: Apply group offset to the input spectral sequence, as expressed in formula (7); in formula (7): the spectral sequence is divided into multiple groups, and one term denotes the length of each group; the input spectral sequence is the output feature map of Pool block 2; the input spectral sequence is described by its first spectral subgroup, its second spectral subgroup, and so on up to its last spectral subgroup; four spectral grouping sequences are defined, the first through the fourth; the second, third, and fourth spectral grouping sequences are each described by their first spectral subgroup, their second spectral subgroup, and so on up to their last spectral subgroup; one quantity is defined by an expression whose operator denotes integer division; an offset operation is used to offset the grouped sequences; three terms denote offsets, each taking its own value;
Step 382: Apply bidirectional spectral Mamba processing to the first, second, third, and fourth spectral grouping sequences respectively to obtain four grouped sequences; the specific process is as follows:
Step 3821: Apply bidirectional spectral Mamba processing to the first spectral grouping sequence to obtain a grouped sequence, as expressed in formula (8); in formula (8): three terms denote convolution block 4, convolution block 5, and convolution block 6; one term denotes the sequence obtained after convolution block 4; one term denotes the sequence obtained after the subsequent processing; one term denotes the reversal of the grouped sequence; the result is the grouped sequence after bidirectional spectral Mamba processing; convolution block 4 consists, in sequence, of a one-dimensional convolutional layer with kernel size 4 and a GN layer, the GN layer being a group normalization layer; convolution block 5 consists, in sequence, of a one-dimensional convolutional layer with kernel size 5 and a GN layer; convolution block 6 consists, in sequence, of a one-dimensional convolutional layer with kernel size 1 and a GN layer;
Step 3822: Apply bidirectional spectral Mamba processing to the second spectral grouping sequence to obtain a grouped sequence, as expressed in formula (9); in formula (9): one term denotes the sequence obtained after convolution block 4; one term denotes the sequence obtained after the subsequent processing; the result is the grouped sequence after bidirectional spectral Mamba processing;
Step 3823: Apply bidirectional spectral Mamba processing to the third spectral grouping sequence to obtain a grouped sequence, as expressed in formula (10); in formula (10): one term denotes the sequence obtained after convolution block 4; one term denotes the sequence obtained after the subsequent processing; the result is the grouped sequence after bidirectional spectral Mamba processing;
Step 3824: Apply bidirectional spectral Mamba processing to the fourth spectral grouping sequence to obtain a grouped sequence, as expressed in formula (11); in formula (11): one term denotes the sequence obtained after convolution block 4; one term denotes the sequence obtained after the subsequent processing; the result is the grouped sequence after bidirectional spectral Mamba processing;
Step 3825: Apply the reverse group offset to each of the four grouped sequences to obtain four sequences; the specific process is as follows:
Step 38251: Apply the reverse group offset to the first grouped sequence to obtain a sequence, as expressed in formula (12); in formula (12): one operation denotes the reverse group offset of the grouped sequence; the offset is set to 0; one term denotes the intermediate grouping sequence; one operation denotes splicing together all spectral subgroups of the grouping sequence;
Step 38252: Apply the reverse group offset to the second grouped sequence to obtain a sequence, as expressed in formula (13); in formula (13): one operation denotes the reverse group offset of the grouped sequence; one term denotes the intermediate grouping sequence; one operation denotes splicing together all spectral subgroups of the grouping sequence;
Step 38253: Apply the reverse group offset to the third grouped sequence to obtain a sequence, as expressed in formula (14); in formula (14): one operation denotes the reverse group offset of the grouped sequence; one term denotes the intermediate grouping sequence; one operation denotes splicing together all spectral subgroups of the grouping sequence;
Step 38254: Apply the reverse group offset to the fourth grouped sequence to obtain a sequence, as expressed in formula (15); in formula (15): one operation denotes the reverse group offset of the grouped sequence; one term denotes the intermediate grouping sequence; one operation denotes splicing together all spectral subgroups of the grouping sequence;
Step 3826: Fuse the four sequences to obtain the fused sequence; the fused sequence is the output feature map of the GOSM block.
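The group offset and its reversal in claim 10 can be pictured with the sketch below. Formulas (7) and (12) to (15) are not reproduced in the text, so this is an illustration under assumptions: the offset is modeled as a circular shift, the sequence length is assumed to be divisible by the group length, the offsets 1, 2, and 3 are invented for the example (only the offset 0 of the first sequence is stated in the claim), and the final fusion is shown as a plain average. All function names are hypothetical.

```python
import numpy as np

def group_offset(seq, group_len, offset):
    """Shift a 1-D spectral sequence by `offset`, then split it into groups."""
    assert seq.shape[-1] % group_len == 0, "length must be divisible by group_len"
    shifted = np.roll(seq, -offset, axis=-1)    # assumption: circular shift
    n_groups = seq.shape[-1] // group_len       # integer division
    return shifted.reshape(n_groups, group_len)

def inverse_group_offset(groups, offset):
    """Splice the groups back together and undo the shift."""
    flat = groups.reshape(-1)
    return np.roll(flat, offset, axis=-1)

seq = np.arange(12)
offsets = [0, 1, 2, 3]  # hypothetical values; only the first (0) is given in the claim
grouped = [group_offset(seq, group_len=4, offset=o) for o in offsets]
# The bidirectional processing of each grouped sequence would happen here.
restored = [inverse_group_offset(g, o) for g, o in zip(grouped, offsets)]
fused = np.mean(restored, axis=0)  # assumption: fusion by averaging
```

Different offsets move the group boundaries, so bands that fall into separate groups under one offset share a group under another.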