Image classification method based on spatial-frequency joint adaptive directional Mamba
The image classification method using the joint spatial-frequency adaptive directional Mamba addresses the problems of poor image feature extraction and low classification accuracy in existing technologies. By combining NSCT and deep learning into a three-branch network, it achieves adaptive fusion of low-frequency and high-frequency features and long-range dependency modeling, thereby improving the accuracy of image classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN UNIV OF TECH
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing image classification methods do not process the low-frequency and high-frequency information in images separately, which makes it difficult to fully exploit both global structure and local detail. They also model the differences among multi-directional high-frequency features insufficiently and fuse features coarsely, without adaptive adjustment. In addition, traditional convolutional networks model long-range dependencies poorly. Together these limitations result in low classification accuracy.
An image classification method based on joint spatial-frequency adaptive directional Mamba is adopted. By combining NSCT feature extraction and deep learning, a three-branch network is constructed to extract low-frequency and multi-directional high-frequency features, which are then fused through an adaptive weighting mechanism. The Mamba module is used to enhance directional detail information, and a spatial model is constructed for long-distance dependency modeling. The model is optimized using the cross-entropy loss function.
It significantly improves image classification accuracy by strengthening the mining of global structure and local detail features, reducing redundant information, and enhancing the model's discriminative ability.
Figure CN122244540A_ABST
Description
Technical Field
[0001] This invention belongs to the field of image processing and machine learning technology, and relates to an image classification method based on spatial-frequency joint adaptive directional Mamba.
Background Technology
[0002] Image classification is a fundamental research area in computer vision and pattern recognition, with wide applications in intelligent detection, scene understanding, target recognition, remote sensing interpretation, and medical analysis. With the increasing scale of image data and the growing complexity of application scenarios, extracting highly discriminative features from images to improve classification accuracy and model robustness has become a crucial research direction in image classification. Existing image classification methods have generally evolved from manually designed features to automatic feature extraction using deep learning. Early methods relied primarily on manually designed features such as texture, edges, shape, and color to describe image content; however, these methods have limited adaptability to complex scenes and struggle to fully represent the rich hierarchical information within images. In recent years, deep learning methods have made significant progress in image classification tasks due to their strong nonlinear modeling capabilities. Convolutional neural networks, in particular, can progressively learn local patterns and semantic features of images through multiple convolutional operations, demonstrating good performance in various image classification tasks. However, existing deep learning methods still have certain shortcomings in image classification. First, images typically contain both low-frequency and high-frequency information. Low-frequency information mainly reflects the overall outline, main structure, and stable region distribution of the target, while high-frequency information mainly reflects local features such as edges, textures, details, and directional changes. Most existing methods directly model the original image uniformly, lacking targeted processing for different frequency components, making it difficult to fully utilize the global structural information and local detail information in the image. 
Second, to enhance image feature representation capabilities, some studies employ multi-scale, multi-directional decomposition methods to preprocess images, decomposing them into low-frequency components and high-frequency components in multiple directions to characterize the main and detail information of the image respectively. While these methods can improve the model's ability to express multi-level image features to some extent, existing techniques typically use simple splicing, direct superposition, or uniform convolution extraction for different components, lacking a mechanism for differentiated modeling based on different frequency band feature attributes, thus failing to fully leverage the complementary advantages between various features. Furthermore, for high-frequency information in images, high-frequency components in different directions often reflect edge changes, texture directions, and local structural features in different directions, exhibiting significant directional sensitivity. Existing technologies, when processing high-frequency features in multiple directions, typically treat the components of each direction as ordinary channels and input them into the model uniformly. This fails to effectively characterize the differences between features in different directions and lacks adaptive filtering and enhancement of key directional information, which can easily lead to insufficient expression of directional features and thus affect the classification results.
[0003] Furthermore, the original, low-frequency, and high-frequency features in an image each have their own emphasis in representing content. Original image information retains relatively complete basic visual content, low-frequency information is beneficial for describing the overall layout and main structure, while high-frequency information is better at highlighting edge details and texture variations. Existing methods, when fusing these multiple feature types, often employ fixed weights or simple connections, lacking adaptive fusion mechanisms for different spatial locations and feature importance. This makes it difficult to dynamically adjust the contribution ratio of various features based on the image region content, easily introducing redundant information and reducing the model's classification accuracy and generalization ability. On the other hand, traditional convolutional networks mainly rely on local receptive fields for feature extraction, which has certain advantages in local neighborhood modeling, but its ability to model long-distance pixel relationships and cross-regional structural dependencies is relatively limited. When images contain complex spatial layouts, continuous textures, or obvious directional extension structures, relying solely on local convolutional operations often fails to balance the coordinated expression of global information and local details, thus limiting the classification model's ability to recognize complex image content.
[0004] In summary, existing methods suffer from the following shortcomings: 1) They do not fully utilize the combined information of the original image, low-frequency information, and high-frequency information, making it difficult to balance overall structure and local details; 2) They lack sufficient modeling of the differences in high-frequency features in multiple directions, and do not fully utilize directional structural information; 3) The multi-source feature fusion methods are relatively coarse, lacking adaptive adjustment capabilities based on spatial location and feature importance; 4) Traditional feature extraction methods have limited modeling capabilities for long-range dependencies and global structural information, making it difficult to meet the needs of high-precision image classification in complex scenes. Therefore, there is an urgent need to propose an image classification method that can synergistically utilize original image information, low-frequency structural information, and high-frequency detail information in multiple directions, while also considering local feature enhancement and global relationship modeling.
Summary of the Invention
[0005] The purpose of this invention is to provide an image classification method based on space-frequency joint adaptive directional Mamba, which solves the problems of poor image feature extraction, insufficient feature fusion, low classification accuracy, and ineffective filtering of redundant information in multi-branch fusion methods in the prior art.
[0006] The technical solution adopted in this invention is implemented according to the following steps:
[0007] Step 1: Data preprocessing.
Step 2: Construct a frequency-domain feature extraction network and use it to extract low-frequency and high-frequency features separately.
Step 3: Model the spatial domain of the polarimetric SAR source data and perform selective feature learning through the 2DMamba module. Steps 2 and 3 together constitute a three-branch network for joint spatial-frequency learning.
Step 4: Fuse the spatial-domain, low-frequency, and multi-directional high-frequency features. An adaptive weighting mechanism is adopted: the original-image, low-frequency, and high-frequency features are fused through a multi-branch convolutional neural network that dynamically adjusts the contribution ratio of each branch.
Step 5: Train the convolutional neural network model, use a classifier to predict the image category, and optimize the model with the cross-entropy loss function to complete the final classification.
[0008] The beneficial effects of this invention are that by combining the advantages of NSCT feature extraction, the Mamba module, and deep learning, it can effectively mine the global structure and local detail features of an image; the Mamba module optimizes the directionality of high-frequency features, which fully enhances the detail information in different directions and improves the model's ability to distinguish details and edge information; through a three-branch network and an adaptive weighting mechanism, it can fully integrate the original image, low-frequency and high-frequency features, and effectively reduce redundant information, thereby improving classification accuracy; experimental results show that compared with traditional methods, this invention can significantly improve classification accuracy and has good application prospects.
Attached Figure Description
[0009] Figure 1 is a flowchart of the method of the present invention;
Figure 2 is a pseudo-color image of the fully polarimetric SAR data of the Xian region in Example 1 of the present invention;
Figure 3 is the result obtained using the existing CVCNN method in Example 1;
Figure 4 is the result obtained using the existing POLMPCNN method in Example 1;
Figure 5 is the result obtained using the method of the present invention in Example 1;
Figure 6 is a pseudo-color image of the fully polarimetric SAR data of the Flevoland region in Example 2 of the present invention;
Figure 7 is the class label reference map corresponding to Figure 6;
Figure 8 is the result obtained using the existing CVCNN method in Example 2;
Figure 9 is the result obtained using the existing POLMPCNN method in Example 2;
Figure 10 is the result obtained using the method of the present invention in Example 2;
Figure 11 is a pseudo-color image of the fully polarimetric SAR data of the San Francisco region in Example 3 of the present invention;
Figure 12 is the result obtained using the existing CVCNN method in Example 3;
Figure 13 is the result obtained using the existing POLMPCNN method in Example 3;
Figure 14 is the result obtained using the method of the present invention in Example 3.
Detailed Implementation
[0010] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments.
[0011] Referring to Figure 1, the present invention is an image classification method based on joint spatial-frequency adaptive directional Mamba, which combines non-subsampled contourlet transform (NSCT) feature extraction with deep learning. It is implemented according to the following steps:
Step 1: Data preprocessing. NSCT is used to extract low-frequency and high-frequency features from the image, which represent its global structural information and detail information, respectively. The data are then normalized and converted into a shape supported by the deep learning model. The specific process is as follows:
Constructing multi-directional features: Polarimetric SAR images are acquired to obtain an H×W×9 polarimetric SAR feature matrix, where H is the height and W is the width. This matrix is used as the input to the NSCT decomposition. In the contourlet transform, directional decomposition uses a directional filter bank to divide the high-frequency components into multiple directions: the image is convolved with bandpass filters of different orientations to obtain its detail responses in those directions. Because NSCT is non-subsampled, the decomposition does not change the spatial resolution of the image, so the low-frequency component and the high-frequency component in each direction keep the same spatial dimensions as the original image. The low-frequency feature therefore has shape H×W×9 and is expressed as:
$$X_L = h_L * I$$
[0012] where $I$ denotes the input H×W×9 feature matrix, $h_L$ denotes the low-pass filter, and $X_L$ denotes the result obtained after low-pass filtering. The multi-directional high-frequency features are obtained from the collected raw data with shape H×W×9×K; for ease of representation, the channel dimension and the direction dimension are combined, as shown in the following expression:
[0013] $$X_H^{k} = g_k * I,\quad k = 1, \dots, K, \qquad X_H = \mathrm{Concat}\left(X_H^{1}, \dots, X_H^{K}\right)$$
[0014] where $K$ is the number of decomposition directions, $g_k$ denotes the directional filter of the k-th direction, and $X_H^{k}$ denotes the high-frequency information of the k-th direction, which is stacked along the channel dimension to facilitate subsequent input. The overall feature $X$ is then constructed by combining these features along the channel dimension.
[0015] The network input formed in this way has shape H×W×90. Because the numerical distributions of different channels differ significantly and outliers are present, channel-by-channel normalization is performed to improve the stability of model training. For the c-th channel feature $X_c$, its percentiles are calculated first, expressed as:
$$v_{\min} = P_{n_1}(X_c), \qquad v_{\max} = P_{n_2}(X_c), \qquad n_1 < n_2$$
[0016] where $v_{\min}$ and $v_{\max}$ are the minimum and maximum values retained for the feature channel $X_c$, and $P_n(\cdot)$ denotes the n-th percentile. The truncation operation is then performed, expressed as:
$$\tilde{X}_c = \min\left(\max\left(X_c, v_{\min}\right), v_{\max}\right)$$
[0017] where $\tilde{X}_c$ denotes the feature channel after truncation. Finally, normalization is performed, expressed as:
$$\hat{X}_c = \frac{\tilde{X}_c - v_{\min}}{v_{\max} - v_{\min}}$$
[0018] where $\hat{X}_c$ denotes the feature channel after normalization. After this processing, all features lie in the range from 0 to 1, which suppresses the impact of outliers on model training and improves feature robustness. To incorporate spatial context information, for each pixel $(i, j)$ a local neighborhood window centered on that pixel is extracted, with a window size of $p \times p$, where $p$ is the patch size. The expression is:
$$x_{i,j} \in \mathbb{R}^{p \times p \times C}$$
[0019] where $x_{i,j}$ denotes the feature sample extracted from the local neighborhood window centered on pixel $(i, j)$ of the image, and $C$ is the number of channels. The label of each sample is the label of the center pixel of its window. The resulting sample set $D$ is expressed as:
$$D = \{(x_n, y_n)\}_{n=1}^{N}$$
[0020] where $N$ is the number of samples; $n$ is the sample index, running from 1 to $N$; $x_n$ denotes the features of the n-th sample; and $y_n$ denotes the label of the n-th sample, that is, the category label of its center pixel. During training, the samples are divided into batches that are input into the model, expressed as:
$$X_b \in \mathbb{R}^{B \times p \times p \times C}, \qquad Y_b \in \mathbb{R}^{B}$$
[0021] where $X_b$ denotes the feature matrix of a batch of samples and $Y_b$ denotes the labels of that batch, a vector of length $B$ with one label per sample, $B$ being the batch size.
[0022] After step 1, sample data that can be directly input into the network model is obtained.
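The normalization and window extraction in Step 1 can be sketched as follows. This is a minimal NumPy illustration rather than the implementation used in the invention: the percentile values (2 and 98), the reflect padding at image borders, the convention that label 0 means "unlabeled", and the function names `normalize_channels` and `extract_patches` are all assumptions introduced here.

```python
import numpy as np


def normalize_channels(features, low_pct=2.0, high_pct=98.0):
    """Clip each channel to two percentiles, then min-max scale it to [0, 1].

    low_pct and high_pct are assumed values; the source does not state them.
    """
    features = np.asarray(features, dtype=np.float64)
    out = np.empty_like(features)
    for c in range(features.shape[-1]):
        channel = features[..., c]
        v_min, v_max = np.percentile(channel, [low_pct, high_pct])
        clipped = np.clip(channel, v_min, v_max)
        # Guard against a constant channel, where v_max == v_min.
        out[..., c] = (clipped - v_min) / max(v_max - v_min, 1e-12)
    return out


def extract_patches(features, labels, patch_size):
    """Cut one patch per labeled pixel; the label is that of the center pixel.

    Assumes label 0 means "unlabeled" and that patch_size is odd, so the
    window is exactly centered. Borders are handled by reflect padding.
    """
    half = patch_size // 2
    padded = np.pad(features, ((half, half), (half, half), (0, 0)), mode="reflect")
    patches, targets = [], []
    for i, j in zip(*np.nonzero(labels)):
        patches.append(padded[i:i + patch_size, j:j + patch_size, :])
        targets.append(labels[i, j])
    return np.stack(patches), np.array(targets)


rng = np.random.default_rng(0)
raw = rng.normal(size=(8, 8, 3))
raw[0, 0, 0] = 1e6                        # an outlier that clipping should tame
labels = rng.integers(0, 3, size=(8, 8))  # 0 = unlabeled, 1..2 = classes
normalized = normalize_channels(raw)
patches, targets = extract_patches(normalized, labels, patch_size=5)
print(patches.shape, targets.shape)
```

Clipping before scaling is what keeps a single extreme value from compressing the rest of the channel into a narrow band near zero.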
[0023] Step 2: Construct a frequency-domain feature extraction network. The low-frequency feature branch extracts the overall structural information of the image; through pyramid convolutional feature learning, it captures the stable regions of the image. The high-frequency feature branch uses the Mamba module for scan-based modeling: it enhances the high-frequency features in multiple directions, and a directional enhancement mechanism strengthens the edge and detail information in each direction, making the high-frequency features more discriminative.
[0024] Specifically, it includes: 2.1) Constructing a low-frequency feature branch: Inputting low-frequency features into the convolutional layer and downsampling them:
[0025]
[0026]
[0027] In these expressions, the operations are a strided convolution, a depthwise separable convolution, and an upsampling operation; the intermediate results are the low-frequency features at the different stages, and the last result is the final low-frequency feature.
2.2) Constructing the high-frequency feature branch: First, the high-frequency features are rearranged into a multi-directional form, expressed as:
[0028] The parameters in this expression are learnable. A convolutional mapping is then applied to the features of each direction, expressed as:
[0029] The activation function in this expression introduces nonlinearity, enabling the neural network to learn more complex feature mappings. The experiments use the ReLU activation function, $\mathrm{ReLU}(x) = \max(0, x)$, which sets negative values to 0 and leaves positive values unchanged; its value range is $[0, +\infty)$. The features are then modeled with the direction-aware Mamba module, expressed as:
[0030] This yields the multi-directional feature set, in which the high-frequency features of the k-th direction are converted into a one-dimensional sequence along that direction for selective scanning.
[0031] After step 2, high-frequency features in multiple directions and low-frequency features containing structural information are obtained.
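Converting a two-dimensional feature map into one-dimensional sequences for scanning can be illustrated with a small sketch. The text does not specify the scan paths used for each direction, so the four orders below (row-wise, column-wise, and their reverses) are purely an assumption for illustration, and `scan_sequences` is a hypothetical helper name.

```python
import numpy as np


def scan_sequences(feature_map):
    """Flatten an (H, W, C) map into four (H*W, C) sequences.

    The four scan orders are illustrative assumptions, not the
    direction-specific paths of the method described above.
    """
    h, w, c = feature_map.shape
    row_major = feature_map.reshape(h * w, c)
    col_major = feature_map.transpose(1, 0, 2).reshape(h * w, c)
    return {
        "row": row_major,
        "row_reversed": row_major[::-1],
        "col": col_major,
        "col_reversed": col_major[::-1],
    }


demo_map = np.arange(6).reshape(2, 3, 1)  # pixel values 0..5 in a 2x3 grid
sequences = scan_sequences(demo_map)
print(sequences["row"].ravel())  # visits pixels left to right, top to bottom
print(sequences["col"].ravel())  # visits pixels top to bottom, left to right
```

Each ordering places different pixels next to each other in the sequence, which is why a sequence model sees different neighborhoods depending on the scan direction.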
[0032] Step 3: Perform spatial-domain modeling on the polarimetric SAR source data to construct the original-feature branch. The original features are input into a convolutional layer for initial mapping, expressed as:
[0033] The result is the intermediate feature obtained by local spatial feature extraction from the original features, which characterizes the basic structural information of the image. A two-dimensional state space model (the 2DMamba module) is then introduced for global modeling; by scanning the spatial features sequentially, it models long-range dependencies, expressed as:
[0034] The result is the feature representation obtained after modeling with the two-dimensional state space model. It contains local spatial information and also incorporates long-range dependencies across regions. This branch is used to extract global structural information from the image.
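How a sequential scan carries information across distant positions can be seen in a deliberately simplified state-space recurrence. The sketch below uses fixed parameters `a`, `b`, and `c`, whereas a Mamba-style selective model makes such parameters depend on the input; it is therefore not the 2DMamba module itself, and all names and values are assumptions chosen for illustration.

```python
import numpy as np


def ssm_scan(sequence, a, b, c):
    """Linear state-space recurrence over a (T, D) sequence.

    h_t = a * h_{t-1} + b * x_t,   y_t = sum_n c[n] * h_t[n]
    a, b, c are (N,) vectors (a diagonal state matrix) shared by all D channels.
    """
    seq_len, dim = sequence.shape
    state = np.zeros((dim, a.shape[0]))
    outputs = np.empty((seq_len, dim))
    for t in range(seq_len):
        state = a * state + b * sequence[t][:, None]
        outputs[t] = state @ c
    return outputs


# Flatten a 4x4 single-channel map row by row and feed in one impulse.
feature_map = np.zeros((4, 4, 1))
feature_map[0, 0, 0] = 1.0
sequence = feature_map.reshape(-1, 1)
response = ssm_scan(sequence, a=np.array([0.5]), b=np.array([1.0]), c=np.array([1.0]))
print(response[:4, 0])
```

With these toy values the response at step t equals 0.5 to the power t, so the single input at the first pixel still influences every later position in the scan, which is the long-range behavior that local convolutions lack.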
[0035] Step 4: Perform multi-directional, multi-branch feature fusion. The specific process is as follows:
4.1) Multi-directional fusion: To characterize the importance of the different directions at each spatial location, a directional response function is introduced. The response is obtained from the high-frequency features of the multiple directions, expressed as:
[0036] The function in this expression is a local feature mapping function. The weights are normalized using the following expression:
[0037] The result is the weight of the k-th direction at each position, so that the most discriminative directional information is retained. The directional features are then fused by weighted summation, expressed as:
[0038] The result is the final high-frequency feature representation at each position after weighted fusion of the multi-directional high-frequency features.
4.2) Multi-branch fusion: The original branch, low-frequency branch, and high-frequency branch are fused collaboratively. First, the features of the three branches are concatenated, expressed as:
[0039] The result is the concatenation of the three-branch features. The weight of each branch is generated through a gating function, expressed as:
[0040] The result is the gating response, which is further normalized, expressed as:
[0041] The normalization uses a small constant to prevent division by zero; in this step it is taken as ... . The normalized values indicate the contribution of each branch, and the final fusion result is:
[0042] The three weighting coefficients, one per branch, sum to 1. To further enhance feature representation, residual enhancement is introduced, expressed as:
[0043] The result is the model output, computed from the feature obtained after gated adaptive selection and a learnable parameter that is initialized randomly and updated automatically during training.
[0044] This yields the deep features that are passed to the classifier.
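The two fusion stages of Step 4 can be sketched in NumPy under stated assumptions. The sketch assumes that the directional weights are normalized with a softmax over the direction axis, that the gating responses are non-negative, and that the branch weights are the responses divided by their sum plus a small constant. In the method these responses come from learned mappings; random and hand-picked values stand in for them here, the choice of 8 directions is illustrative, and the residual enhancement is omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_directions(high_freq, scores):
    """Per-pixel weighted sum over K directions.

    high_freq: (K, H, W, C) directional features; scores: (K, H, W) responses.
    """
    weights = softmax(scores, axis=0)            # sums to 1 over K at each pixel
    fused = (weights[..., None] * high_freq).sum(axis=0)
    return fused, weights


def fuse_branches(branches, gate_responses, eps=1e-6):
    """Weighted sum of branch features with sum-normalized gate responses.

    eps is an assumed value for the small constant that avoids division by zero.
    """
    g = np.asarray(gate_responses, dtype=np.float64)
    alphas = g / (g.sum() + eps)
    fused = sum(alpha * feat for alpha, feat in zip(alphas, branches))
    return fused, alphas


rng = np.random.default_rng(0)
high_freq = rng.normal(size=(8, 4, 4, 9))   # 8 directions, 4x4 pixels, 9 channels
scores = rng.normal(size=(8, 4, 4))         # stand-in for learned responses
fused_high, direction_weights = fuse_directions(high_freq, scores)

original = rng.normal(size=(4, 4, 9))
low_freq = rng.normal(size=(4, 4, 9))
fused, alphas = fuse_branches([original, low_freq, fused_high], [0.2, 0.5, 0.3])
print(fused.shape, alphas)
```

Because the directional weights are computed per pixel, an edge region can favor the direction aligned with the edge while a flat region spreads its weight more evenly.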
[0045] Step 5: Train the convolutional neural network model, use a classifier to predict the image category, and optimize the model using the cross-entropy loss function to complete the final classification. The specific process is as follows:
5.1) To reduce spatial redundancy and extract global semantic information, the fused deep features obtained in Step 4 are spatially compressed using global average pooling, expressed as:
$$z = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} F(i, j)$$
[0046] where $F$ denotes the fused feature map, $H$ is the image height, and $W$ is the image width; this yields the global feature vector $z$.
5.2) The global feature vector obtained in 5.1) is input into the classification layer and mapped using the following expression:
$$s = W_{cls}\, z + b$$
[0047] where $s$ denotes the predicted scores output by the model for each category, characterizing the response strength of a sample for each category; $W_{cls}$ is the classification weight matrix; $b$ is the bias term; and $C$ is the number of categories. The category probabilities are obtained through a normalization function, expressed as:
$$\hat{y}_c = \frac{\exp(s_c)}{\sum_{j=1}^{C} \exp(s_j)}$$
[0048] 5.3) Optimization is performed using the cross-entropy loss function, expressed as:
$$\mathcal{L}_{CE} = -\sum_{c=1}^{C} y_c \log \hat{y}_c$$
[0049] where $\mathcal{L}_{CE}$ denotes the cross-entropy loss, which measures the error between the model's predicted probabilities and the true labels; $y$ is the one-hot encoding of the true label; and $\hat{y}$ is the vector of predicted probabilities.
[0050] This completes the overall process, which improves classification accuracy through three-branch joint spatial-frequency modeling and multi-directional fusion.
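The classification head of Step 5 (global average pooling, a linear classification layer, probability normalization, and cross-entropy) can be sketched as below. The sketch assumes the normalization function is a softmax, which is the usual companion of the cross-entropy loss; the feature sizes, the number of classes, and the function names are illustrative assumptions.

```python
import numpy as np


def classify(fused, weight, bias):
    """Global average pooling, linear layer, and softmax.

    fused: (H, W, F) feature map; weight: (num_classes, F); bias: (num_classes,).
    """
    z = fused.mean(axis=(0, 1))        # global average pooling over H and W
    scores = weight @ z + bias         # one score per category
    shifted = scores - scores.max()    # shift for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy for a single sample with an integer class label."""
    return -np.log(probs[label] + eps)


rng = np.random.default_rng(0)
fused = rng.normal(size=(4, 4, 9))
num_classes = 4

# With zero weights every class gets the same score, so the loss is log(4).
uniform_probs = classify(fused, np.zeros((num_classes, 9)), np.zeros(num_classes))
uniform_loss = cross_entropy(uniform_probs, label=2)

# With random weights the probabilities differ but still sum to 1.
probs = classify(fused, rng.normal(size=(num_classes, 9)), np.zeros(num_classes))
print(uniform_loss, probs.sum())
```

The uniform case is a handy sanity check during training: an untrained classifier over C classes should start with a loss close to log(C).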
[0051] Example 1
I. The classification conditions are as follows:
1) 512×512×9 polarimetric SAR image data of an area of Xian were selected for the classification experiment.
[0052] 2) The polarimetric SAR image data were sampled by stratified sampling, with 5% of the data selected for the training set and 1% for the test set;
3) The patch size was set to 64, the batch size to 32, the hidden state dimension of both the Mamba state space model and the two-dimensional state space model to 32, and the learning rate to 0.0001; the model was trained for 300 epochs, and the best model was saved for inference.
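The per-class (stratified) sampling described in the conditions above can be sketched as follows. The rounding rule, the minimum of one sample per class, the random seed, and the helper name `stratified_split` are assumptions; the toy label vector is made up and does not correspond to any of the data sets.

```python
import numpy as np


def stratified_split(labels, train_frac=0.05, test_frac=0.01, seed=0):
    """Draw disjoint train and test indices from each class separately."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = max(1, int(round(train_frac * idx.size)))
        n_test = max(1, int(round(test_frac * idx.size)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:n_train + n_test])
    return np.array(train_idx), np.array(test_idx)


# Toy labels for labeled pixels only: three classes of unequal size.
pixel_labels = np.repeat([1, 2, 3], [1000, 500, 200])
train_idx, test_idx = stratified_split(pixel_labels)
print(train_idx.size, test_idx.size)
```

Sampling within each class keeps small classes represented in both sets, which a plain random draw of 5% of all pixels would not guarantee.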
[0053] II. Following the steps of the present invention described above, the specific implementation proceeds as follows:
Step 1: Data preprocessing. Low-frequency and high-frequency data are obtained through NSCT decomposition and then normalized.
Step 2: Construct the space-frequency joint network model. Low-frequency pyramid convolution is used to extract the overall structure of the image, and the high-frequency features are enhanced in multiple directions and modeled with the Mamba module.
Step 3: Perform a 2DMamba scan on the original image to extract global background information.
Step 4: Perform multi-directional, multi-branch feature fusion.
[0054] Step 5: Input the fused features into the classifier for classification.
[0055] III. The classification content and results analysis are as follows: Table 1 compares the results of the method of the present invention with other methods in the prior art.
[0056] Figure 2 is a pseudo-color image of the fully polarimetric SAR data of the Xian region. Figure 3 shows the classification result of the existing CVCNN method on the fully polarimetric SAR image of the Xian area, with a classification accuracy of 89.59%. Figure 4 shows the classification result obtained by the existing POLMPCNN method, with a classification accuracy of 94.01%. Figure 5 shows the classification result of the method of the present invention on the same image, with a classification accuracy of 96.52%.
[0057] As Figure 4 shows, the existing method produces many noise points and cannot suppress noise effectively, whereas Figure 5 shows that the method of this invention yields more consistent classification results and better performance. In its classification result, land cover categories such as ocean, forest, and buildings are accurately identified, with clear boundaries between categories. The classification accuracy reaches 96.52%, the average accuracy is 96.02%, and the Kappa coefficient is 94.27%.
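The three figures of merit quoted above (classification accuracy, average accuracy, and Kappa coefficient) are conventionally computed from a confusion matrix. The sketch below shows the standard definitions, assuming that rows hold the true classes and columns the predicted classes; the 2×2 matrix is a made-up toy example, not data from the experiments.

```python
import numpy as np


def classification_metrics(confusion):
    """Overall accuracy, average per-class accuracy, and Cohen's kappa.

    confusion[i, j] counts samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    total = confusion.sum()
    overall = np.trace(confusion) / total
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    average = per_class.mean()
    # Agreement expected by chance, from the row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (overall - expected) / (1.0 - expected)
    return overall, average, kappa


toy_confusion = [[45, 5],
                 [10, 40]]
overall, average, kappa = classification_metrics(toy_confusion)
print(overall, average, kappa)
```

For this toy matrix the overall and average accuracies are both 0.85 while kappa is 0.70, which illustrates that kappa discounts the agreement that would occur by chance.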
[0058] Example 2
I. The classification conditions are as follows:
1) 1400×1200×9 polarimetric SAR image data of the Flevoland region were selected for the classification experiment;
2) The polarimetric SAR image data were sampled by stratified sampling, with 5% of the data selected for the training set and 1% for the test set;
3) The patch size was set to 64, the batch size to 32, the hidden state dimension of both the Mamba state space model and the two-dimensional state space model to 32, and the learning rate to 0.0001; the model was trained for 300 epochs, and the best model was saved for inference.
[0059] II. The method is carried out according to the steps of the present invention described above; for the specific steps and procedures, see Example 1.
[0060] III. The classification content and results are as follows: Table 2 compares the results of the method of the present invention with other methods in the prior art.
[0061] Figure 6 is a pseudo-color image of the fully polarimetric SAR data of the Flevoland region, in which the Pauli basis components are used as the three RGB channels. Figure 7 is the class label reference map corresponding to Figure 6; the black areas have no reference class label, so this invention does not consider the classification results in those areas. In the reference map, the Flevoland region is divided into 4 categories. Figure 8 shows the classification result obtained by the existing CVCNN method, with a classification accuracy of 96.57%. Figure 9 shows the classification result obtained by the existing POLMPCNN method, with a classification accuracy of 98.49%. Figure 10 shows the classification result obtained by applying the method of this invention to the fully polarimetric SAR image of the Flevoland region, with a classification accuracy of 99.97%. As Figure 8 shows, the existing method produces many noise points and cannot suppress noise effectively, whereas the method of this invention obtains more consistent classification results.
[0062] Example 3
I. The classification conditions are as follows:
1) 900×1024×9 polarimetric SAR image data of the San Francisco area were selected for the classification experiment.
[0063] 2) The polarimetric SAR image data were sampled by stratified sampling, with 5% of the data selected for the training set and 1% for the test set;
3) The patch size was set to 64, the batch size to 16, the hidden state dimension of both the Mamba state space model and the two-dimensional state space model to 32, and the learning rate to 0.0001; the model was trained for 300 epochs, and the best model was saved for inference.
[0064] II. The method is carried out according to the steps of the present invention described above; for the specific steps and procedures, see Example 1.
[0065] III. The classification content and results are as follows: Figure 12 shows the classification result obtained by the existing CVCNN method, with a classification accuracy of 93.81%. Figure 13 shows the classification result obtained by the existing POLMPCNN method, with a classification accuracy of 98.93%. Figure 14 shows the classification result obtained by applying the method of this invention to the fully polarimetric SAR image of the San Francisco region, with a classification accuracy of 99.55%. As Figures 12 and 13 show, the existing methods produce many noise points and cannot suppress noise effectively, whereas the method of this invention obtains more consistent classification results.
[0066] Example 4
I. The classification conditions are as follows:
1) 512×512×9 polarimetric SAR image data of the Xian area were selected for the classification experiment.
[0067] 2) The polarimetric SAR image data were sampled by stratified sampling, with 5% of the data selected for the training set and 1% for the test set;
3) The patch size was set to 32, the batch size to 32, the hidden state dimension of both the Mamba state space model and the two-dimensional state space model to 32, and the learning rate to 0.0001; the model was trained for 300 epochs, and the best model was saved for inference.
[0068] II. The method is carried out according to the steps of the present invention described above; for the specific steps and procedures, see Example 1.
[0069] III. The classification content and results are as follows: The method accurately classifies different land features such as water bodies, grasslands, and buildings. The classification accuracy is 96.57%, which is superior to existing classification methods.
[0070] Example 5
I. The classification conditions are as follows:
1) 512×512×9 polarimetric SAR image data of the Xian area were selected for the classification experiment.
[0071] 2) The polarimetric SAR image data were sampled by stratified sampling, with 5% of the data selected for the training set and 1% for the test set;
3) The patch size was set to 32, the batch size to 32, the hidden state dimension of both the Mamba state space model and the two-dimensional state space model to 16, and the learning rate to 0.0001; the model was trained for 300 epochs, and the best model was saved for inference.
[0072] II. The method is carried out according to the steps of the present invention described above; for the specific steps and procedures, see Example 1.
[0073] III. The classification content and results are as follows: The method accurately classifies different land features such as water bodies, grasslands, and buildings, and the classification results are superior to those of existing classification methods.
[0074] Example 6 I. Classification Criteria 1) In the classification experiment, 512×512×9-dimensional polarimetric SAR image data of the Xian area were selected.
[0075] 2) In the classification experiment, stratified sampling was applied to the polarimetric SAR image data, with 5% of the data selected for the training set and 1% for the test set; 3) In the classification experiment, the patch size was set to 64, the batch size was set to 32, the hidden state dimension in both the Mamba state space model and the two-dimensional state space model was set to 16, the learning rate was set to 0.0001, and the model was trained for 100 epochs, with the best-performing model saved for inference.
[0076] II. The steps and procedures of the invention are the same as those described in Example 1.
[0077] III. The classification content and results are as follows: It achieves accurate classification of different land features such as water bodies, grasslands, and buildings, and the classification results are superior to existing classification methods.
Claims
1. An image classification method based on space-frequency joint adaptive directional Mamba, characterized in that it comprises the following steps: Step 1: data preprocessing; Step 2: constructing a frequency-domain feature extraction network to extract features from the low-frequency and high-frequency components respectively; Step 3: modeling the spatial domain of the polarimetric SAR source data, performing selective feature learning, and jointly constructing a three-branch network for space-frequency joint learning; Step 4: fusing the spatial-domain, low-frequency, and multi-directional high-frequency features, that is, fusing the original image, the low-frequency features, and the high-frequency features; Step 5: training the convolutional neural network model to predict the image category, optimizing the model, and completing the final classification.
2. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 1, characterized in that, in step 1, multi-directional features are constructed: polarimetric SAR images are acquired to obtain an H×W×9 polarimetric SAR feature matrix, where H is the height and W is the width; this matrix is used as the input for NSCT decomposition; in the contourlet transform, the directional decomposition uses a directional filter bank to divide the high-frequency components into multiple directions; the image is then convolved with band-pass filters of different directions to obtain the detail response of the image in each direction.
3. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 2, characterized in that, in step 1, the specific process is as follows: the low-frequency component and the high-frequency component in each direction keep the same spatial dimensions as the original image, so that the low-frequency feature has a shape of H×W×9, expressed as: [expression missing], where [symbol missing] denotes the low-pass filter and [symbol missing] denotes the result obtained after low-pass filtering; multi-directional high-frequency features with a shape of H×W×9×K are obtained from the collected raw data, and the channel dimension and the direction dimension are merged, expressed as: [expression missing], where the parameter K is the number of decomposition directions, [symbol missing] denotes the directional filter of the k-th direction, and [symbol missing] denotes the high-frequency information of the k-th direction, which is stacked along the channel dimension to obtain [symbol missing]; the overall feature X is then constructed as: [expression missing]; the network input formed in this way has a shape of H×W×90; the features are then normalized channel by channel: for the [index missing]-th channel feature, its percentiles are first calculated, expressed as: [expression missing], where v_min and v_max denote the minimum and maximum values of the image feature channel and [symbol missing] denotes the n-th percentile; the truncation operation is then performed, expressed as: [expression missing], where [symbol missing] denotes the feature channel after truncation; finally, normalization is performed, expressed as: [expression missing], where [symbol missing] denotes the feature channel after normalization; after the above processing, all features are normalized to the (0,1) interval; to incorporate spatial context information, for each pixel [symbol missing], a local neighborhood window centered on that pixel is extracted, with a window size of [size missing], expressed as: [expression missing], where x_{i,j} denotes the feature sample extracted from the local neighborhood window centered on pixel [symbol missing] of the image, and C is the number of channels; the sample label is the label of the center pixel of the window, and the sample set D is constructed as: [expression missing], where N is the number of samples, n is the sample index, running from 1 to N, [symbol missing] denotes the features of the n-th sample, and [symbol missing] denotes the label of the n-th sample, namely the category label of the center pixel of that sample; during training, the samples are divided into batches to be input into the model, expressed as: [expression missing], where [symbol missing] denotes the feature matrix of a batch of samples and [symbol missing] denotes the labels of a batch of samples, given as a vector in which each label has length 1.
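As a non-limiting illustration, the channel-by-channel normalization and the neighborhood-window extraction of claim 3 might be sketched as follows. The percentile levels (2 and 98), the reflective padding at the image border, and the function names are assumptions made for this sketch only; the original expressions and percentile values did not survive in the text. Note also that min-max scaling as written maps values to the closed interval [0, 1].

```python
import numpy as np

def normalize_channels(x, lo_pct=2.0, hi_pct=98.0):
    """Percentile-clip and min-max scale each channel of an H x W x C array."""
    x = np.asarray(x, dtype=np.float64)
    # Per-channel lower and upper percentiles (assumed levels, see lead-in).
    v_min = np.percentile(x, lo_pct, axis=(0, 1), keepdims=True)
    v_max = np.percentile(x, hi_pct, axis=(0, 1), keepdims=True)
    clipped = np.clip(x, v_min, v_max)            # truncation step
    # Guard against a constant channel, where v_max equals v_min.
    return (clipped - v_min) / np.maximum(v_max - v_min, 1e-12)

def extract_patches(x, centers, patch_size):
    """Cut a patch_size x patch_size window around each (row, col) center."""
    half = patch_size // 2
    # Reflective padding so that border pixels also get a full window.
    padded = np.pad(x, ((half, half), (half, half), (0, 0)), mode="reflect")
    return np.stack([padded[r:r + patch_size, c:c + patch_size]
                     for r, c in centers])
```

Each extracted window would take the label of its center pixel, and the resulting samples can then be grouped into batches of the chosen batch size.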
4. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 1, characterized in that, in step 2, the specific process is as follows: 2.1) Constructing the low-frequency feature branch: the low-frequency features are input into the convolutional layer and downsampled, expressed as: [expression missing], where [symbol missing] denotes a strided convolution operation; [symbol missing] denotes depthwise separable convolution; [symbol missing] denotes an upsampling operation; [symbol missing] denote the low-frequency features at different stages; and [symbol missing] is the final low-frequency feature; 2.2) Constructing the high-frequency feature branch: first, the high-frequency features are reshaped into a multi-directional form, expressed as: [expression missing], where [symbol missing] are learnable parameters; convolutional mapping is performed on the features in each direction, expressed as: [expression missing], where [symbol missing] is the activation function; the ReLU activation function is used, which truncates negative values to 0 and leaves positive values unchanged, so that its value range is [0, +∞); modeling is then performed with the orientation-aware Mamba module, expressed as: [expression missing], giving the multi-directional feature set [symbol missing], in which the high-frequency features of the k-th direction are converted into a one-dimensional sequence for selective scanning.
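As a non-limiting illustration, the reshaping of the merged high-frequency channels into a multi-directional form and their conversion to one-dimensional sequences in step 2.2) might be sketched as follows. The direction-major channel layout, the row-by-row flattening order, and the function names are assumptions of this sketch; the claim states that the scan follows the k-th direction, which a row-by-row flattening only approximates, and the selective scan itself is not reproduced here.

```python
import numpy as np

def split_directions(high, num_dirs):
    """Reshape H x W x (C*K) high-frequency features to K x H x W x C."""
    h, w, ck = high.shape
    c = ck // num_dirs
    # Assumes channels are stored direction by direction:
    # [dir 0: ch 0..C-1, dir 1: ch 0..C-1, ...].
    return high.reshape(h, w, num_dirs, c).transpose(2, 0, 1, 3)

def relu(x):
    """Set negative values to 0 and keep positive values unchanged."""
    return np.maximum(x, 0.0)

def to_sequences(dir_feats):
    """Flatten each direction's H x W grid row by row into a length H*W sequence."""
    k, h, w, c = dir_feats.shape
    return dir_feats.reshape(k, h * w, c)
```

Each of the K sequences could then be passed to a state space block separately, so that every direction is modeled with its own scan.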
5. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 1, characterized in that, in step 3, the specific process is as follows: spatial modeling is performed on the polarimetric SAR source data to construct the original feature branch: the original features are input into the convolutional layer for initial mapping, expressed as: [expression missing], where [symbol missing] denotes the intermediate features obtained after local spatial feature extraction from the original features; a two-dimensional state-space model is introduced for global modeling, and long-distance dependency modeling is achieved through sequential scanning of the spatial features, expressed as: [expression missing], where [symbol missing] denotes the feature representation obtained after modeling with the two-dimensional state-space model.
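As a non-limiting illustration of how sequential scanning can carry information across distant positions, a plain linear state space recurrence over a flattened feature map might be sketched as follows. This is a simplified stand-in and not the two-dimensional state-space model of the claim, whose expression is missing: the matrices `A`, `B`, `C` are fixed rather than input-dependent, and the forward-plus-backward row-by-row scan order and all function names are assumptions of this sketch.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a sequence of vectors."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t      # the hidden state accumulates past inputs
        ys.append(C @ h)
    return np.stack(ys)

def scan_2d(feat, A, B, C):
    """Flatten an H x W x D map row by row, scan both ways, and sum the outputs."""
    h, w, d = feat.shape
    seq = feat.reshape(h * w, d)
    fwd = ssm_scan(seq, A, B, C)
    bwd = ssm_scan(seq[::-1], A, B, C)[::-1]   # reversed scan, re-aligned
    return (fwd + bwd).reshape(h, w, -1)
```

Because the hidden state is carried along the whole sequence, the output at one position depends on all positions scanned before it, which is what gives such models their long-range reach at a cost linear in sequence length.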
6. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 1, characterized in that, in step 4, multi-directional and multi-branch feature fusion is implemented as follows: 4.1) Multi-directional fusion: a directional response function is introduced, which takes the high-frequency features of the multiple directions as input to obtain the response, expressed as: [expression missing], where [symbol missing] denotes a local feature mapping function; the weights are normalized, expressed as: [expression missing], where [symbol missing] denotes the weight of the k-th direction at position [symbol missing], so that only the most discriminative directional information is retained; fusion is achieved through weighted summation, expressed as: [expression missing], where [symbol missing] denotes the final high-frequency feature representation at position [symbol missing] obtained after weighted fusion of the multi-directional high-frequency features; 4.2) Multi-branch fusion: the original branch, the low-frequency branch, and the high-frequency branch are fused collaboratively; first, the features of the three branches are concatenated, expressed as: [expression missing], where [symbol missing] denotes the result of concatenating the three-branch features; the weight of each branch is generated through a gating function, expressed as: [expression missing], where [symbol missing] denotes the gating response, [condition missing]; after further normalization, the expression is: [expression missing], where [symbol missing] is a small constant and the resulting [symbol missing] represents the branch contribution; the final fusion result is: [expression missing], where [symbol missing], [symbol missing], and [symbol missing] denote the weighting coefficients of the respective branches and sum to 1; finally, residual enhancement is introduced, expressed as: [expression missing], where [symbol missing] denotes the model output, [symbol missing] is the result obtained after gated adaptive selection, and [symbol missing] is a learnable parameter that is initialized randomly and updated automatically during training.
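As a non-limiting illustration, the adaptive weighting of step 4 might be sketched as follows. The use of softmax for the directional weights, a sigmoid gate computed from spatially averaged concatenated features, the shape of `gate_w`, the value of `eps`, and all function names are assumptions of this sketch; the original expressions are missing, and the residual enhancement step is not reproduced because its form cannot be recovered from the text.

```python
import numpy as np

def fuse_directions(dir_feats, scores):
    """Weight K directional features per pixel and sum them.

    dir_feats: K x H x W x C features; scores: K x H x W directional responses.
    """
    # Softmax over the direction axis (assumed normalization).
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    alpha = e / e.sum(axis=0, keepdims=True)
    return (alpha[..., None] * dir_feats).sum(axis=0), alpha

def fuse_branches(f_orig, f_low, f_high, gate_w, eps=1e-6):
    """Gate three H x W x C branches with weights that sum to about 1."""
    cat = np.concatenate([f_orig, f_low, f_high], axis=-1)   # H x W x 3C
    # One sigmoid gate per branch from the spatially averaged features.
    g = 1.0 / (1.0 + np.exp(-(cat.mean(axis=(0, 1)) @ gate_w)))
    beta = g / (g.sum() + eps)       # eps keeps the division stable
    fused = beta[0] * f_orig + beta[1] * f_low + beta[2] * f_high
    return fused, beta
```

With the small constant in the denominator, the branch weights sum to slightly less than 1; the difference is negligible unless all gate responses are close to zero.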
7. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 6, characterized in that [symbol missing] is taken as [value missing].
8. The image classification method based on space-frequency joint adaptive directional Mamba according to claim 1, characterized in that, in step 5, the specific process is as follows: 5.1) for the deep features obtained in step 4, the fused features are spatially compressed by global average pooling, expressed as: [expression missing], where H is the image height and W is the image width, giving the global feature vector [symbol missing]; 5.2) the global response obtained in step 5.1) is input into the classification layer and mapped as: [expression missing], where [symbol missing] denotes the predicted score of each category output by the model, [symbol missing] is the classification weight matrix, [symbol missing] is the bias term, and C is the number of categories; the category probabilities are obtained through a normalization function, expressed as: [expression missing]; 5.3) optimization is performed using the cross-entropy loss function, expressed as: [expression missing], where [symbol missing] denotes the cross-entropy loss function, [symbol missing] is the one-hot encoding of the true label, and [symbol missing] is the predicted probability.
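As a non-limiting illustration, the classification head of step 5 might be sketched as follows for a single fused feature map. The use of softmax as the normalization function, the variable names `W_cls` and `b_cls`, and the small constant inside the logarithm are assumptions of this sketch, since the original expressions are missing.

```python
import numpy as np

def classify(fused, W_cls, b_cls):
    """Global average pooling, linear layer, and softmax for one H x W x D map."""
    z = fused.mean(axis=(0, 1))            # global feature vector, shape (D,)
    logits = W_cls @ z + b_cls             # predicted scores, one per category
    e = np.exp(logits - logits.max())      # shift for numerical stability
    return e / e.sum()                     # category probabilities

def cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return float(-(one_hot * np.log(probs + eps)).sum())
```

For a one-hot label the loss reduces to the negative log probability assigned to the true category, so it is zero only when that category receives probability 1.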