A vehicle target recognition method based on feature decomposition and cumulative learning
By using the FDRM ST-PA_RCNN algorithm, which employs a weight redistribution method based on feature decomposition and cumulative learning, combined with SAR image datasets and feature enhancement algorithms, the shortcomings of traditional vehicle detection and recognition methods in feature extraction and robustness are addressed, achieving efficient vehicle target recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FUDAN UNIVERSITY
- Filing Date
- 2025-04-11
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional vehicle detection and recognition methods perform poorly in terms of feature extraction, recognition rate, and detection robustness. Vehicle targets with a small physical size, high mobility, and a complex background are especially difficult to identify effectively.
The ST-PA_RCNN algorithm based on deep neural networks is adopted. By reducing feature redundancy through feature decomposition and cumulative learning weight redistribution, and combined with SAR image dataset and feature enhancement algorithm, a pixel-level and feature-level fusion model of ST-PA_RCNN is constructed for vehicle target recognition.
It significantly improves the overall performance of vehicle target detection, enhances the ability to detect and suppress false alarms, and demonstrates high effectiveness and robustness.
Figure CN120279436B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of vehicle target recognition, and more specifically, to a vehicle target recognition method based on feature decomposition and cumulative learning.
Background Technology
[0002] Due to the characteristics of vehicle targets such as small physical size, high mobility, and complex background, traditional vehicle detection and recognition methods perform poorly in terms of feature extraction, recognition rate, and detection robustness.
[0003] Over the past decade, deep learning technology has become a significant milestone in the development of artificial intelligence, with its innovative breakthroughs significantly driving innovation in multiple technological fields. This technology has been successfully applied in various fields such as speech recognition, text understanding, visual information processing, dynamic image analysis, and multimedia content analysis, demonstrating remarkable application value. Unlike traditional pattern recognition methods that rely on manual feature engineering, this technology autonomously extracts feature information from massive amounts of data, achieving a fundamental transformation in knowledge representation. This data-driven learning mechanism endows models with stronger representational capabilities and higher computational efficiency, not only improving model performance but also providing innovative solutions for processing high-dimensional, unstructured data.
[0004] This application proposes the FDRM ST-PA_RCNN algorithm for vehicle target recognition. It uses the ST-PA_RCNN framework, built on the deep neural networks ReDet and Oriented RCNN, as the backbone network, and adds an FDRM (Feature Decomposition and Weight Reassignment Model) transfer fusion block that reduces feature redundancy through weight redistribution, forming a feature fusion method based on feature decomposition and weight redistribution under a cumulative learning strategy. The algorithm is validated on typical target detection datasets, where it shows a significant performance improvement over the baseline model ST-PA_RCNN, demonstrating high effectiveness and robustness as well as superior detection and false alarm suppression performance.
Summary of the Invention
[0005] Therefore, this application provides a vehicle target recognition method based on feature decomposition and cumulative learning to solve the above problems.
[0006] This application provides a vehicle target recognition method based on feature decomposition and cumulative learning, including:
[0007] The dataset was constructed using a land SAR (synthetic aperture radar) imagery dataset of vehicle targets;
[0008] Vehicle target scattering feature extraction and enhancement;
[0009] Construct a pixel-level fusion model for ST-PA_RCNN (Region Convolutional Neural Network);
[0010] Training of a feature-level fusion model based on the FDRM (Feature Decomposition and Weight Reassignment Model) module;
[0011] Test the model's performance using test data and output the results.
[0012] Preferably, the construction of the dataset using the land SAR image vehicle target dataset includes:
[0013] Based on the satellite's left- and right-side-looking geometry and its ascending and descending orbits, and combined with the target's deployment direction at the distribution base, key scene areas are selected for sample data collection.
[0014] After standardization, the target distribution is used to test whether the sample data meets the sample distribution index requirements.
[0015] If the conditions are met, the dataset construction is complete;
[0016] If it does not meet the requirements, the missing sample observation parameters are analyzed, and the distribution location coverage analysis of the missing sample observation parameters is performed using orbit simulation to form the optimal detection scheme that can be completed in a short time.
[0017] Rapid reconnaissance is conducted using satellite platforms with high timeliness, such as SAR, and preprocessing is performed to create training and test samples.
[0018] Preferably, the extraction and enhancement of vehicle target scattering features includes:
[0019] Texture features of SAR images are extracted using four algorithms: SAR-SIFT (Scale-Invariant Feature Transform), SAR-HOG (Histogram of Oriented Gradients), NSLP (Non-Subsampled Laplacian Pyramid), and LC (Linear Color Saliency).
[0020] The extracted features are enhanced using a SAR-Harris detector (a Harris corner detector adapted to synthetic aperture radar images) and the OPTICS clustering method (a density clustering algorithm based on reachability distance and core distance).
[0021] Preferably, the construction of the ST-PA_RCNN pixel-level fusion model includes:
[0022] An ST-PA_RCNN backbone network is constructed, combining Swin Transformer (a shifted-window hierarchical attention network) and PA_FPN (a path aggregation multi-level feature pyramid structure). The original image and the feature map enhanced by scattering features are fused through the channel fusion method, and the target detection model is trained.
[0023] Preferably, the training of the feature-level fusion model based on FDRM includes:
[0024] Building on ST-PA_RCNN, the FDRM module is introduced on top of the Dual-Branch FPN feature fusion method. It reduces feature redundancy and enhances the diversity of the feature space through feature decomposition and weight redistribution.
[0025] Preferably, the step of testing the model's performance using test data and outputting results includes:
[0026] The images from the test dataset are input into the trained FDRM ST-PA_RCNN model (including pixel-level and feature-level fusion models) to perform object detection inference and calculate the model's performance metrics.
[0027] The model's performance is evaluated by calculating performance metrics such as precision, recall, and mean AP.
[0028] Output the detection results (bounding boxes and target categories) and visualize the detection results.
[0029] Preferred options also include:
[0030] The original SAR image is fused with the feature map enhanced by scattering features to form multi-channel input data.
[0031] Preferred options also include:
[0032] Feature redundancy is reduced and the diversity of the feature space is enhanced by feature decomposition and weight redistribution methods.
[0033] Preferably, this application also provides an electronic device, including a processor and a memory, wherein: the memory stores a computer program; when the computer program in the memory is executed by the processor, the electronic device is able to implement any of the described methods.
[0034] Preferably, embodiments of this application provide a computer-readable storage medium including a computer program that, when run on an electronic device, causes the electronic device to perform the method described in any implementation of the first aspect.
[0035] Preferably, embodiments of this application provide a computer program product, the computer program product including a computer program that, when executed by a processor, implements the method as described in any implementation of the first aspect.
[0036] The vehicle target recognition method, electronic device, computer-readable storage medium, and computer program product based on feature decomposition and cumulative learning provided in this application first construct a dataset using a land SAR image vehicle target dataset; then, vehicle target scattering features are extracted and enhanced; next, an ST-PA_RCNN pixel-level fusion model is constructed; furthermore, a feature-level fusion model based on FDRM is trained; finally, the model's performance is tested using test data and the results are output. This application can improve the vehicle target detection capability of SAR images.
Attached Figure Description
[0037] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0038] Figure 1 is a flowchart of a vehicle target recognition method based on feature decomposition and cumulative learning, according to Embodiment 1;
[0039] Figure 2 is a flowchart of another vehicle target recognition method based on feature decomposition and cumulative learning, according to Embodiment 1;
[0040] Figure 3 is a general schematic diagram of this application;
[0041] Figure 4 is a schematic diagram of the MIX_MSTAR simulation dataset;
[0042] Figure 5 is the overall flowchart of the present invention, which, from left to right, consists of the data acquisition and dataset construction section, the model training section, and the model testing section;
[0043] Figure 6 is a schematic diagram of the FDRM module;
[0044] Figure 7 is a schematic diagram of the vehicle target detection results in this invention.
Detailed Implementation
[0045] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0046] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0047] Example 1
[0048] This application provides a vehicle target recognition method based on feature decomposition and cumulative learning. Referring to the flowchart shown in Figure 1, the specific implementation scheme is as follows:
[0049] Step S101: Construct a dataset using a land SAR image vehicle target dataset;
[0050] Step S102: Extract and enhance vehicle target scattering features;
[0051] Step S103: Construct the ST-PA_RCNN pixel-level fusion model;
[0052] Step S104: Training the feature-level fusion model based on the FDRM module;
[0053] Step S105: Test the model's performance using test data and output the results.
[0054] This application also provides another vehicle target recognition method based on feature decomposition and cumulative learning. Referring to the flowchart shown in Figure 2, the specific implementation scheme is as follows:
[0055] Step S201: Based on the satellite's left- and right-side-looking geometry and its ascending and descending orbit characteristics, and combined with the target's deployment direction at the distribution base, select key scene areas for sample data collection.
[0056] Step S202: After standardization, the target distribution is used to verify whether the sample data meets the sample distribution index requirements.
[0057] Step S203: If the conditions are met, the dataset construction is complete.
[0058] Step S204: If the condition is not met, analyze the missing sample observation parameters, use orbit simulation to perform distribution location coverage analysis on the missing sample observation parameters, and form an optimal detection scheme that can be completed in a short time.
[0059] Step S205: Use satellite platforms with high timeliness, such as SAR, to conduct rapid reconnaissance and preprocess the data to form training and test samples.
[0060] Step S206: Extract texture features from SAR images using four algorithms: SAR-SIFT, SAR-HOG, NSLP, and LC.
[0061] Step S207: Enhance the extracted features using the SAR-Harris detector and the OPTICS clustering method.
[0062] Step S208: Construct the ST-PA_RCNN backbone network, combine Swin Transformer and PA_FPN, fuse the original image with the feature map enhanced by scattering features through the channel fusion method, and train the target detection model.
[0063] Step S209: Building on ST-PA_RCNN, introduce the FDRM module on top of the Dual-Branch FPN feature fusion method, using feature decomposition and weight redistribution to reduce feature redundancy and enhance the diversity of the feature space.
[0064] Step S210: Input the images from the test dataset into the trained FDRM ST-PA_RCNN model (including pixel-level and feature-level fusion models) to perform object detection inference and calculate the model's performance metrics.
[0065] Step S211: Evaluate the model's performance by calculating performance metrics such as precision, recall, and mean average precision (mAP).
[0066] Step S212: Output the detection results (bounding boxes and target categories) and visualize the detection results.
[0067] Step S213: Perform channel fusion between the original SAR image and the feature map enhanced by scattering features to form multi-channel input data.
[0068] Step S214: Reduce feature redundancy and enhance the diversity of the feature space by using feature decomposition and weight redistribution methods.
[0069] The vehicle target recognition method based on feature decomposition and cumulative learning provided in this application first constructs a dataset using a land SAR image vehicle target dataset; then, it extracts and enhances vehicle target scattering features; next, it constructs an ST-PA_RCNN pixel-level fusion model; furthermore, it trains a feature-level fusion model based on the FDRM module; finally, it tests the model's performance using test data and outputs the results. This application can improve the vehicle target detection capability of SAR images.
[0070] The dataset is constructed using the acquired high-resolution SAR images, and the specific implementation method is as follows:
[0071] Step 1: Based on the deployment direction of the target at the distribution base, select the key scene areas of interest and collect SAR image data of the vehicle target via satellite.
[0072] Step 2: Standardize the acquired SAR images.
[0073] Step 3: Use the target map distribution to check whether the sample data meets the sample distribution index requirements. If it does, the dataset construction is completed. If it does not, it is necessary to analyze the missing sample observation parameters. Use orbit simulation to perform distribution location coverage analysis on the observation parameters of the missing samples to form the optimal detection scheme that can be completed in a short time.
[0074] Step 4: Preprocess the image data to form training and test samples.
[0075] Step 5: Use the algorithm to perform batch processing to downsample these high-resolution images to a uniform size (512*512) as a dataset.
[0076] Step 6: Divide the training set and the test set into an 8:2 ratio.
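The 8:2 division in Step 6 can be illustrated with a minimal sketch. The function name `split_dataset`, the fixed seed, and the use of integer sample identifiers are assumptions made for illustration; the application does not specify how samples are shuffled or assigned.

```python
import random

def split_dataset(sample_ids, train_ratio=0.8, seed=0):
    """Shuffle sample identifiers and split them into train and test lists."""
    ids = list(sample_ids)
    # A seeded shuffle keeps the split reproducible (assumed, not stated in the text).
    random.Random(seed).shuffle(ids)
    n_train = int(round(len(ids) * train_ratio))
    return ids[:n_train], ids[n_train:]

# Hypothetical usage: 100 image slices identified by index.
train_ids, test_ids = split_dataset(range(100))
print(len(train_ids), len(test_ids))
```

Splitting identifiers rather than image arrays keeps the two sets disjoint by construction, which matches the later requirement that test images are not used in training.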
[0077] The specific implementation method for extracting and enhancing vehicle target scattering features is as follows:
[0078] Step 1: Extract vehicle target features using the SAR-SIFT, SAR-HOG, NSLP, and LC algorithms respectively. Specifically, the SAR-SIFT algorithm enhances robustness to multiplicative noise in SAR images through improved gradient calculation, and is used to extract keypoint features; the SAR-HOG algorithm extracts stable structural features through ratio gradient calculation, and is used to extract texture features; the NSLP algorithm preserves the spatial resolution of multi-scale texture features through a non-subsampled Laplacian pyramid, and is used to extract contour features; the LC algorithm extracts visual saliency features through a linear color model.
[0079] Step 2: Enhance the extracted features using the SAR-Harris detector and the OPTICS clustering method.
[0080] Step 3: Perform channel fusion between the original image and the feature map enhanced by scattering features to construct multi-channel input data.
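Channel fusion as described in Step 3 amounts to stacking the original image and the enhanced feature maps along the channel axis. The sketch below uses nested Python lists in channel-first layout; the function name `fuse_channels` and the tiny 2×2 inputs are illustrative assumptions, and a real pipeline would operate on image arrays.

```python
def fuse_channels(original, enhanced_maps):
    """Stack a single-channel image with enhanced feature maps (channel-first)."""
    h, w = len(original), len(original[0])
    for fmap in enhanced_maps:
        # Every map must match the image size so that pixels stay aligned.
        if len(fmap) != h or any(len(row) != w for row in fmap):
            raise ValueError("all maps must share the image's height and width")
    return [original] + list(enhanced_maps)

# Hypothetical 2x2 example: one SAR intensity channel plus one scattering map.
image = [[0.1, 0.2], [0.3, 0.4]]
scatter = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_channels(image, [scatter])
print(len(fused), "channels")
```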
[0081] Then, a pixel-level fusion model, ST-PA_RCNN, is constructed as a baseline. The specific implementation method is as follows:
[0082] Step 1: Construct the ST-PA_RCNN backbone network, combining the Swin Transformer and PA-FPN, and initialize the pre-training parameters;
[0083] Step 2: Input the multi-channel data after channel fusion into the network, and train the target detection model through the pixel-level fusion module;
[0084] Step 3: Optimize model parameters using cross-entropy loss and Smooth L1 loss.
[0085] Then, based on the baseline model, a feature-level fusion model FDRM ST-PA_RCNN based on the FDRM module is constructed and trained using training data. The specific implementation method is as follows:
[0086] Step 1: Construct a Dual-Branch PA-FPN module, in which the left and right branches are each a PA-FPN. Then, input the dual-branch features {P2, P3, P4, P5} obtained from PA-FPN into the feature fusion block to obtain a new feature map {M2, M3, M4, M5}.
[0087] Step 2: Construct an improved Feature Fusion Block (FDRM) module. The FDRM module can reduce more feature redundancy. Furthermore, the cumulative learning strategy allows for the reweighting of features from the two branches, enhancing the interaction between the two feature spaces and enabling the network to learn and utilize scattering properties more effectively.
[0088] Step 3: Input the features extracted from the original image and the scattering-enhanced feature image by DB-PA-FPN into the FDRM module for feature decomposition and reweighting. Construct orthogonal loss and compactness loss to reduce feature redundancy and enhance feature space diversity; use a dynamic weighting strategy to redistribute weights and fuse bi-branch features to form new features, and adjust the weighting coefficients μ (e.g., constant, linear, parabolic strategies, etc.) to generate hybrid features.
[0089] The feature decomposition module takes as input the feature map output by the feature extraction module and aims to decompose the input feature map. This module contains two parallel branches, which decompose the input features into foreground and background features based on orthogonal and reconstruction constraints between the two branches. The decomposed background features include complex background clutter, while the foreground features contain the target of interest. Both branches have the same network structure, containing three convolutional layers, each followed by a ReLU layer. All three convolutional layers have a 3×3 kernel size, a stride of 1, and 512 kernels. Since the parameters of the two branches are not shared, they are responsible for foreground and background feature extraction respectively. Finally, the obtained foreground and background features are summed and input into the decoder for image reconstruction. The decoder contains five deconvolutional layers, with the first four layers followed by ReLU layers. All five deconvolutional layers have a 3×3 kernel size, with strides of 1, 2, 2, 2, and 1, and the number of kernels are 512, 256, 128, 64, and 3, respectively. The outputs of the feature decomposition module, foreground and background features, are then input into a subsequent feature weight redistribution module for weighted allocation.
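The layer configuration described above can be written down as data, which makes the decoder's overall upsampling factor easy to check. This is only a bookkeeping sketch: the `ConvSpec` class is a hypothetical helper, and the factor computed here assumes padding is chosen so that each stride-s deconvolution scales the spatial size by exactly s, which the text does not specify.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvSpec:
    kernel: int        # square kernel size
    stride: int
    out_channels: int  # number of kernels

# Each decomposition branch: three 3x3 convolutions, stride 1, 512 kernels (each followed by ReLU).
BRANCH = [ConvSpec(3, 1, 512)] * 3

# Decoder: five 3x3 deconvolutions with the strides and kernel counts given in the text.
DECODER = [ConvSpec(3, s, c)
           for s, c in zip((1, 2, 2, 2, 1), (512, 256, 128, 64, 3))]

def total_stride(layers):
    """Product of strides, i.e. the overall spatial scaling factor."""
    factor = 1
    for layer in layers:
        factor *= layer.stride
    return factor

print(total_stride(BRANCH), total_stride(DECODER))
```

Under the stated padding assumption, the branches preserve spatial size while the decoder enlarges it eightfold and ends with 3 output channels for image reconstruction.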
[0090] The orthogonal loss function is defined as follows:
[0091]
[0092] Where N represents the number of training samples in each batch; $f_i^{n1}$ represents the column vector transformed from the difference features or background features obtained by decomposing the i-th sample; $f_i^{n2}$ represents the column vector transformed from the interference features or foreground features obtained by decomposing the i-th sample; and T represents the transpose of the vector.
[0093] The compact loss function is defined as follows:
[0094]
[0095] Where $\|\cdot\|_2$ represents the L2 norm; $I_{i,j}$ denotes the j-th decomposed feature of the i-th sample, with j = 1, ..., M (M = 2 in our method); and $c_j$ represents the center of the j-th decomposition vector, which is updated in each batch. In this way, the variation within the same decomposed feature is minimized.
[0096] The final loss function is:
[0097] $L_{total} = L_{cls} + L_{loc} + \lambda_{opl} L_{opl} + \lambda_{cmpt} L_{cmpt}$
[0098] Where $\lambda_{opl}$ and $\lambda_{cmpt}$ are two balancing parameters, set to 1 and 0.1 respectively.
[0099] $L_{cls}$ is the classification loss, which measures the accuracy of the model in predicting vehicle target categories (or binary classification of targets and non-targets).
[0100] $L_{loc}$ is the localization loss, which measures the accuracy of the model when making regression predictions about vehicle bounding boxes (position, size, orientation, etc.).
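The way the four terms combine can be sketched as follows. The weighted sum and the default weights (1 and 0.1) follow the text. The bodies of `orthogonal_loss` and `compactness_loss` are assumptions: the exact formulas are not reproduced in this section, so the sketch uses one plausible form for each (batch mean of squared inner products, and batch mean of squared L2 distances to a center).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_loss(fore, back):
    """ASSUMED form: batch mean of squared inner products of the two
    decomposed feature vectors; zero when each pair is orthogonal."""
    return sum(dot(f, b) ** 2 for f, b in zip(fore, back)) / len(fore)

def compactness_loss(features, center):
    """ASSUMED form: batch mean of squared L2 distances to the center."""
    return sum(sum((x - c) ** 2 for x, c in zip(f, center))
               for f in features) / len(features)

def total_loss(l_cls, l_loc, l_opl, l_cmpt, lam_opl=1.0, lam_cmpt=0.1):
    """Weighted sum from the text, with the stated balancing parameters."""
    return l_cls + l_loc + lam_opl * l_opl + lam_cmpt * l_cmpt

# Hypothetical flattened features for a batch of two samples.
fore = [[1.0, 0.0], [0.0, 1.0]]
back = [[0.0, 1.0], [1.0, 0.0]]
print(orthogonal_loss(fore, back))
```

With orthogonal foreground and background vectors the assumed orthogonal term vanishes, which is the behaviour the redundancy-reduction goal calls for.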
[0101] The feature weight redistribution module based on cumulative learning fuses the foreground and background features obtained after feature decomposition before feeding them into the multi-scale detection module. This module redistributes the weights of the input foreground and background features, and under different cumulative learning strategies, the network training focus gradually shifts from background features to foreground features. The parameter μ is introduced as follows:
[0102] $F_r = \mu F_{fore} + (1 - \mu) F_{back}$
[0103] Where $F_r$ denotes the blended feature of the foreground and background features. $F_{fore}$ represents the foreground features, specifically the feature vectors or feature maps containing the target of interest (vehicle) obtained through the feature decomposition module. $F_{back}$ represents the background features, specifically the image background or noise features extracted by the same decomposition module that do not belong to the vehicle's main body. Furthermore, various cumulative learning strategies can be employed to explore the optimal method for object detection performance. Given the current training epoch $T$ and the total number of training epochs $T_{max}$, the value of μ differs depending on the strategy: for the constant strategy, μ is 0.5; for the linear strategy, μ is $T / T_{max}$; for the parabolic strategy, μ is $(T / T_{max})^2$. The output of the feature weight redistribution module is then used as the input to the multi-scale detection module to predict class labels and location information.
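The three cumulative learning strategies and the blending rule can be sketched in a few lines. The function names `mu` and `blend` are illustrative, and features are represented as flat lists of floats for simplicity.

```python
def mu(epoch, max_epoch, strategy="linear"):
    """Weighting coefficient for the current epoch under a given strategy."""
    if strategy == "constant":
        return 0.5
    ratio = epoch / max_epoch
    if strategy == "linear":
        return ratio
    if strategy == "parabolic":
        return ratio ** 2
    raise ValueError("unknown strategy: " + strategy)

def blend(f_fore, f_back, mu_value):
    """Element-wise F_r = mu * F_fore + (1 - mu) * F_back."""
    return [mu_value * a + (1.0 - mu_value) * b
            for a, b in zip(f_fore, f_back)]

# Halfway through a 56-epoch run, the strategies weight the foreground differently.
for name in ("constant", "linear", "parabolic"):
    print(name, mu(28, 56, name))
```

Under the linear and parabolic strategies μ grows from 0 towards 1, so the blended feature is dominated by the background early in training and by the foreground late in training; the parabolic schedule stays on the background longer.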
[0104] Step 4: Connect the channels of the obtained mixed features and finally downsample them for output.
[0105] Finally, the performance of the model is tested using test data. The specific implementation method is as follows:
[0106] Step 1: Select images from the test dataset, ensuring that these images were not used in the training process;
[0107] Step 2: Preprocess the test image, including downsampling to a uniform size (e.g., 512x512 pixels) and scattering feature extraction;
[0108] Step 3: Input the preprocessed image into the trained FDRM ST-PA_RCNN model (including pixel-level and feature-level fusion models) for object detection inference. The dataset input slice size is 512×512. The model was trained with the stochastic gradient descent (SGD) optimizer using an initial learning rate of $5 \times 10^{-3}$, a momentum of 0.9, and a weight decay of $1.0 \times 10^{-4}$, and converged after 56 training epochs.
[0109] Step 4: Calculate the model's performance metrics, including precision, recall, and mean average precision (mAP). The calculation formulas are as follows, where TP represents true positives, FP represents false positives, and FN represents false negatives.
[0110] Precision: $P = \frac{TP}{TP + FP}$
[0111] Recall: $R = \frac{TP}{TP + FN}$
[0112] Average Precision (AP): $AP = \int_0^1 P(R)\,dR$
[0113] Mean Average Precision (mAP): $mAP = \frac{1}{C}\sum_{c=1}^{C} AP_c$, where C is the number of target categories.
[0114] Step 5: Analyze the test results, compare the performance differences of different fusion methods (pixel level, feature level) on different datasets, and verify the effectiveness and robustness of the model.
[0115] Step 6: Output the detection results, including the bounding box and target category, and visualize the detection results.
[0116] Step 7: Finally, the pixel-level fusion method ST-PA_RCNN achieves an mAP of 95.72% in the vehicle target detection task, while the feature-level fusion method FDRM ST-PA_RCNN improves the mAP by 2.03% compared to ST-PA_RCNN. The comparative experimental results are shown in Table 1.
[0117]
[0118]
[0119] Table 1: Results of the comparative experiment
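The evaluation metrics named in the test procedure can be computed as in the sketch below. Precision is TP / (TP + FP) and recall is TP / (TP + FN). Average precision is approximated here by a rectangle-rule area under the precision-recall curve, which is one common choice and an assumption of this sketch; the application does not state which interpolation is used. The counts and curve points in the usage lines are made up for illustration and are not results from the application.

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(recalls, precisions):
    """Rectangle-rule area under the P-R curve (recalls in increasing order)."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(ap_per_class):
    """Mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Illustrative numbers only.
print(precision(8, 2), recall(8, 8))
print(average_precision([0.5, 1.0], [1.0, 0.5]))
```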
[0120] Example 2
[0121] Within the FDRM module, feature decomposition typically yields two branches: the foreground features $F_g$ and the background features $F_b$. The adaptive feature gain allocation algorithm defines an adaptive gain function (AGF) and introduces a parameter α to automatically amplify or reduce the foreground/background weights during training, thus adapting to different environments.
[0122] The execution flow of the adaptive feature gain allocation algorithm includes:
[0123] Accept $F_g$ and $F_b$ output by the FDRM module, together with the initial α (usually set to 0.5 or another empirical value);
[0124] First, apply the σ(·) transformation to $F_g$ and $F_b$ separately to enhance stability and nonlinear representation;
[0125] Obtain $W_{AGF}$ from the adaptive gain function. If α is slightly greater than 0.5, the foreground features are moderately enhanced; if α is significantly less than 0.5, the suppression of background features is emphasized;
[0126] Treat $W_{AGF}$ as a new feature map or channel, which can be concatenated or summed with the original $F_g$ or other network features, or passed to subsequent convolution operations;
[0127] After each epoch or specific training step, α is fine-tuned using gradient or search strategies by observing metrics such as validation set mAP, total loss, or foreground recall to adapt it to the current scene (noise level, target distribution, etc.).
[0128] As the training progresses, α will tend to a steady state or change dynamically with external feedback, thereby ensuring that the "gain" or "background suppression" of the vehicle is always maintained at a reasonable level under different environments.
[0129] The following is the adaptive gain function (Formula 1):
[0130] $W_{AGF}(F_g, F_b; \alpha) = \alpha \times \sigma(F_g) - (1 - \alpha) \times \sigma(F_b)$
[0131] Here, $F_g$ denotes the foreground features, typically derived from the branch focused on vehicle targets after FDRM module decomposition;
[0132] $F_b$ denotes the background features, mainly including ground clutter and noise;
[0133] σ(·) denotes the value obtained after an activation function or normalization operation, such as ReLU, LeakyReLU, or BatchNorm;
[0134] α∈(0,1) is an adjustable gain coefficient that is adaptively updated during the training phase based on the loss or validation set metrics;
[0135] $W_{AGF}$ is the adaptive weighted result output by the adaptive feature gain allocation algorithm. By using a weighted difference, it retains more foreground features and partially reduces background feature energy.
[0136] The technical effect is as follows:
[0137] When background noise increases, the weight of $F_b$ can be automatically reduced and $F_g$ appropriately amplified;
[0138] When the background is relatively clean but the number of targets is large, α can be adjusted to a balanced value to avoid excessive background weakening that could lead to missed detection of other potential information.
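The adaptive gain function W_AGF = α·σ(F_g) − (1 − α)·σ(F_b) can be sketched element-wise as below. ReLU is chosen for σ because it is one of the options the text lists; representing feature maps as flat lists of floats and the function names are assumptions for illustration. The update rule for α is not shown, since the text leaves it open (gradient or search strategies).

```python
def relu(x):
    return max(0.0, x)

def adaptive_gain(f_g, f_b, alpha, sigma=relu):
    """Element-wise alpha * sigma(F_g) - (1 - alpha) * sigma(F_b)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    return [alpha * sigma(g) - (1.0 - alpha) * sigma(b)
            for g, b in zip(f_g, f_b)]

# Hypothetical foreground/background activations at two positions.
f_g = [2.0, -1.0]
f_b = [1.0, 4.0]
print(adaptive_gain(f_g, f_b, alpha=0.5))
```

At the second position the foreground response is non-positive and the background is strong, so the output is negative there, which is how the weighted difference suppresses clutter-dominated locations.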
[0139] In extremely noisy scenarios, adaptive gain alone may not be sufficient to accurately locate small targets; when the difference between the foreground and background is already weak, simply "increasing the foreground and decreasing the background" cannot guarantee the preservation of local details. To address this, the cross-branch residual alignment algorithm proposes to construct a "residual alignment" operation between the foreground / background branches and introduces a parameter β to explicitly align the differences between them during network training, thus preventing vehicle features from being largely submerged by the background.
[0140] The execution flow of the cross-branch residual alignment algorithm includes:
[0141] The inputs are $F_g$, $F_b$, and α;
[0142] Initialize β, which can be set to a value in the range of 0.3 to 0.8. The specific value can be determined through grid search or experience.
[0143] First calculate $\alpha F_b - F_g$. If this value is positive, it means that the background features "outperform" the foreground in some sense; if it is negative, it means that the foreground is stronger.
[0144] Multiply the residual from the previous step by β and add it to $F_g$ to obtain $R_{CBRA}$. This process essentially injects a certain amount of background correction into the foreground branch, so that the features of small targets are preserved more robustly under heavy background noise.
[0145] $R_{CBRA}$ can serve as a new foreground representation or be further fused with background information to improve the accuracy of target detection.
[0146] Similar to α, β can also be fine-tuned based on the loss during network training to ensure that residual injection is not over-corrected, which would severely damage the foreground.
[0147] The formula for cross-branch residual alignment (Formula 2) is defined as follows:
[0148] R CBRA (F g ,F b ; α,β)=F g +β×(αF b -F g )
[0149] Where α is consistent with the adaptive feature gain allocation algorithm;
[0150] β∈(0,1) is the cross-branch residual coefficient, which determines the strength of residual alignment;
[0151] The term αF_b - F_g constructs the cross-branch residual signal: if the background features are much larger than the foreground features, the term exhibits a negative compensation form; if the foreground features themselves dominate, the residual tends to be smaller.
[0152] The final R_CBRA is the aligned foreground feature map. It is equivalent to injecting a portion of the information from the background branch into F_g, with its influence modulated by β.
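As an illustration of Formula 2, the following is a minimal sketch rather than the implementation used in this application. It assumes the feature maps have been flattened into plain Python lists of equal length; in a real network they would be tensors. The function name cross_branch_residual_alignment and the toy values are hypothetical.

```python
from typing import List


def cross_branch_residual_alignment(
    f_g: List[float], f_b: List[float], alpha: float, beta: float
) -> List[float]:
    """Elementwise Formula 2: R_CBRA = F_g + beta * (alpha * F_b - F_g)."""
    if len(f_g) != len(f_b):
        raise ValueError("F_g and F_b must have the same number of elements")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta is expected to lie in the open interval (0, 1)")
    aligned = []
    for g, b in zip(f_g, f_b):
        residual = alpha * b - g  # cross-branch residual signal
        aligned.append(g + beta * residual)  # inject the scaled residual into the foreground
    return aligned


# Toy flattened feature maps (illustrative values only, not real SAR features).
f_g = [1.0, 0.25, 2.0]
f_b = [2.0, 4.0, 0.0]
r_cbra = cross_branch_residual_alignment(f_g, f_b, alpha=0.5, beta=0.5)
print(r_cbra)  # [1.0, 1.125, 1.0]
```

Expanding the formula gives R_CBRA = (1 - β) × F_g + β × α × F_b, so β interpolates between the untouched foreground (β near 0) and the gain-scaled background (β near 1).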
[0153] Technical effects:
[0154] In situations with high noise or partial occlusion, the network can provide additional difference compensation for small targets through the residual form of "background branch - foreground branch";
[0155] When the background noise changes drastically, the cross-branch alignment mechanism intelligently corrects the network's focus, reducing the interference of abrupt background changes on target recognition.
[0156] If the adaptive feature gain allocation algorithm or the cross-branch residual alignment algorithm is used alone, an improvement in one respect may come at the cost of another, or each may show its own strengths and weaknesses at different stages. To combine the two organically, a cascaded linkage fusion algorithm is proposed, which uses the parameter γ to fuse the outputs of the adaptive feature gain allocation algorithm and the cross-branch residual alignment algorithm, achieving the dual advantages of gain amplification and residual alignment in extreme scenarios.
[0157] The execution flow of the cascaded linkage fusion algorithm includes:
[0158] The inputs are W_AGF(F_g, F_b; α) and R_CBRA(F_g, F_b; α, β);
[0159] γ can be selected based on prior experience (e.g., 0.5) or through automated search;
[0160] Add γ × W_AGF(F_g, F_b; α) and (1 - γ) × R_CBRA(F_g, F_b; α, β) together to generate F_Cascade;
[0161] Output: F_Cascade, which can be considered the final comprehensive feature map and is then sent to the FPN or detection head for vehicle detection.
[0162] Based on the performance of the training or validation set, in addition to fine-tuning α and β, γ can also be globally tuned periodically to observe its impact on metrics such as mAP, Precision, and Recall, thereby finding the optimal configuration.
[0163] The cascaded linkage fusion formula (Formula 3) is as follows:
[0164] F_Cascade = γ × W_AGF(F_g, F_b; α) + (1 - γ) × R_CBRA(F_g, F_b; α, β)
[0165] where W_AGF(F_g, F_b; α) is the adaptive gain result generated by the adaptive feature gain allocation algorithm;
[0166] R_CBRA(F_g, F_b; α, β) is the cross-branch residual result generated by the cross-branch residual alignment algorithm;
[0167] γ∈(0,1) is the cascade fusion factor, which determines whether the final output is more inclined towards the adaptive feature gain allocation algorithm or the cross-branch residual alignment algorithm;
[0168] F_Cascade is the comprehensive feature produced by the cascaded linkage fusion algorithm; it combines the advantages of adaptive gain with the detail compensation of residual alignment.
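Formula 3 is a convex combination of the two branch outputs, which can be sketched as follows. This is an illustrative sketch under stated assumptions: the inputs are treated as already computed and flattened into equal-length Python lists, the values of w_agf and r_cbra are placeholders rather than outputs of Formula 1 or Formula 2, and the function name cascade_fusion is hypothetical.

```python
from typing import List


def cascade_fusion(w_agf: List[float], r_cbra: List[float], gamma: float) -> List[float]:
    """Elementwise Formula 3: F_Cascade = gamma * W_AGF + (1 - gamma) * R_CBRA."""
    if len(w_agf) != len(r_cbra):
        raise ValueError("W_AGF and R_CBRA must have the same number of elements")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma is expected to lie in the open interval (0, 1)")
    return [gamma * w + (1.0 - gamma) * r for w, r in zip(w_agf, r_cbra)]


# Placeholder branch outputs (illustrative values only).
w_agf = [2.0, 0.5, 1.0]   # stands in for the adaptive gain result
r_cbra = [1.0, 1.5, 1.0]  # stands in for the cross-branch residual result
f_cascade = cascade_fusion(w_agf, r_cbra, gamma=0.5)
print(f_cascade)  # [1.5, 1.0, 1.0]
```

Because the two weights sum to 1, every element of F_Cascade lies between the corresponding elements of the two inputs; γ only shifts the result toward one branch or the other.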
[0169] Technical effects:
[0170] In various complex scenarios, if the vehicle target is extremely small or severely occluded, the adaptive feature gain allocation algorithm can increase the target weight in advance; if the background noise fluctuates significantly, the cross-branch residual alignment algorithm can also correct and compensate in time during this process; the two are fused through γ-cascade, so that the network can maintain high sensitivity while taking into account detail alignment.
[0171] Compared to single foreground / background weighting or residual alignment, cascaded linkage fusion algorithms can be compatible with a variety of extreme cases, thus ensuring vehicle detection performance in more complex application scenarios.
[0172] To enable those skilled in the art to fully implement Embodiment 2, the following will provide a clear step-by-step description from the perspective of data preparation, network training, and how it is combined with Embodiment 1.
[0173] This embodiment 2 is consistent with embodiment 1 in terms of underlying data and basic feature processing, namely:
[0174] Use satellite or airborne SAR systems to acquire high-resolution images to ensure that training / test data covers different angles, resolutions, and terrain features.
[0175] If necessary, missing samples can be supplemented through target distribution testing, trajectory simulation, etc.
[0176] Multi-channel fusion is performed on the original image using SAR-SIFT, SAR-HOG, NSLP, LC, etc., to preserve multi-dimensional information such as texture, structure, and saliency;
[0177] During the feature-level modeling stage, the FDRM module is executed to obtain the foreground features F_g and the background features F_b. Embodiment 1 has described in detail how the FDRM module reduces redundancy and improves feature discriminative power based on orthogonal loss and compaction loss. At this point, the foreground / background features have been well separated.
[0178]-[0179] To enable those skilled in the art to accurately implement and reproduce this embodiment 2, the following provides feasible heuristic tuning suggestions for the three newly added parameters α, β, and γ:
[0180] α is the adaptive gain coefficient: the larger α is, the more the foreground feature F_g tends to be enhanced; the smaller α is, the more attention is paid to background features;
[0181] The initial value can be set to 0.5; the changes in the loss curve, mAP, or precision can be observed after every few rounds of training. If the false negative rate increases in a high-noise environment, α needs to be increased.
[0182] If there is little background interference but the false detection rate is increasing, α can be appropriately reduced to allow the network to retain more background information and avoid the network becoming overly "sensitive".
[0183] β is the cross-branch residual alignment coefficient, typically in (0,1). When β is large, the difference (αF_b - F_g) is significantly amplified, which strengthens compensation when small targets are present; if β is too large, however, the foreground features may be disturbed.
[0184] The initial value can be in the range of 0.3 to 0.8. If the detection finds that the vehicle is largely occluded or the background is extremely complex, β can be increased appropriately to enhance residual alignment. If too high a value of β causes excessive perturbation of the foreground features, it can be lowered.
[0185] γ is the cascade fusion factor, usually in (0,1): when γ is large, the system relies more on the effect of adaptive gain; when γ is small, it relies more on cross-branch residual alignment.
[0186] The initial value can be set to 0.5, so that the outputs of the two algorithms are weighted equally. In actual verification, if AGF is relied on too heavily, the deeper information in the residual signal may be ignored; conversely, if CBRA is relied on too heavily, unnecessary residual operations may be added when the background is stable. A method such as random search or binary search can be used to tune γ offline and find the optimal value on the validation set. Alternatively, during online training, γ can participate in backpropagation and be optimized automatically with a suitable learning rate.
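The offline tuning of γ described above can be sketched as a simple random search. This is only one possible arrangement, not the procedure prescribed by this application: the function random_search_gamma, the search bounds 0.05 to 0.95, and the trial count are assumptions, and toy_validation_score is a hypothetical stand-in for training or evaluating the detector with a given γ and returning a validation metric such as mAP.

```python
import random
from typing import Callable, Tuple


def random_search_gamma(
    score_fn: Callable[[float], float],
    n_trials: int = 20,
    low: float = 0.05,
    high: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (best_gamma, best_score) found by sampling gamma uniformly in [low, high]."""
    rng = random.Random(seed)
    # Start from the suggested initial value of 0.5 as the baseline.
    best_gamma, best_score = 0.5, score_fn(0.5)
    for _ in range(n_trials):
        gamma = rng.uniform(low, high)
        score = score_fn(gamma)
        if score > best_score:  # keep the gamma with the highest validation score
            best_gamma, best_score = gamma, score
    return best_gamma, best_score


def toy_validation_score(gamma: float) -> float:
    """Hypothetical validation metric that peaks at gamma = 0.6 (for illustration only)."""
    return 1.0 - (gamma - 0.6) ** 2


best_gamma, best_score = random_search_gamma(toy_validation_score)
print(f"best gamma = {best_gamma:.3f}, validation score = {best_score:.4f}")
```

Starting from the baseline of 0.5 guarantees that the search never returns a configuration scoring worse than the suggested initial value. The same loop could be reused for α or β by swapping the scoring function.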
[0187] The following are the precautions for this Embodiment 2:
[0188] Add new scripts or sub-functions to the back end of the FDRM module to implement the Adaptive Feature Gain Allocation (AGF) algorithm and the Cross-Branch Residual Alignment (CBRA) algorithm;
[0189] Ensure that the three parameters α, β, γ can be jointly managed by the network's optimizer (if the learnable mode is selected) or an external manager (if the search or custom update mode is selected).
[0190] When implementing the cascaded linkage fusion algorithm, it is necessary to add references to the outputs of the adaptive feature gain allocation algorithm and the cross-branch residual alignment algorithm in the code, and complete the secondary weighting and merging according to formula (3).
[0191] Formula details:
[0192] If ReLU is chosen as σ(·), it should be noted that if the foreground / background features are negative, they will be truncated. This problem can be mitigated by LeakyReLU or other activation functions.
[0193] For scenarios requiring high floating-point precision, if α, β, and γ participate in backpropagation, it is necessary to ensure that the training framework can correctly handle these parameters during gradient updates and avoid numerical overflow or gradient explosion.
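The truncation issue noted for ReLU can be seen in a small sketch. It assumes σ(·) is applied elementwise to a flattened feature list; the helper names relu and leaky_relu, the sample values, and the negative slope of 0.01 are illustrative choices, not values specified in this application.

```python
from typing import List


def relu(x: List[float]) -> List[float]:
    """ReLU: negative entries are truncated to zero."""
    return [max(0.0, v) for v in x]


def leaky_relu(x: List[float], negative_slope: float = 0.01) -> List[float]:
    """LeakyReLU: negative entries are scaled by a small slope instead of being zeroed."""
    return [v if v >= 0.0 else negative_slope * v for v in x]


features = [-2.0, -0.5, 0.0, 1.5]  # toy foreground / background activations
relu_out = relu(features)
leaky_out = leaky_relu(features)

print(relu_out)   # [0.0, 0.0, 0.0, 1.5]
print(leaky_out)  # negative entries stay negative, only attenuated
```

With ReLU, the two negative entries become indistinguishable from zero, so any information they carried about the foreground / background difference is lost. LeakyReLU preserves their sign and relative ordering at reduced magnitude.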
[0194] Initially, α, β, and γ can be locked and only offline tuning is allowed. That is, the values are fixed first and then training is carried out. Fine-tuning is then performed after convergence is observed.
[0195] If automation is desired later, α, β, and γ can be declared as learnable variables in the network and written into the loss function for backpropagation updates, but the initial values should be carefully selected.
[0196] When instability or overfitting occurs, conventional measures such as lowering the learning rate, reducing the number of network layers, or reducing certain hyperparameters can be tried.
[0197] It is recommended to visualize key intermediate outputs during implementation to view the distribution of foreground / background fusion and residual signals, which will facilitate quick problem location.
[0198] If serious false positives or false negatives occur on certain test sets, the focus should be on investigating whether the range of values for β and γ leads to excessive amplification of residuals or insufficient fusion.
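Before plotting anything, a quick numeric summary of the residual signal αF_b - F_g can already show whether it is dominated by the background or the foreground. The following sketch assumes flattened feature lists; the function name residual_summary, the chosen statistics, and the sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def residual_summary(f_g: List[float], f_b: List[float], alpha: float) -> Dict[str, float]:
    """Summarize the cross-branch residual alpha * F_b - F_g for inspection or logging."""
    if len(f_g) != len(f_b) or not f_g:
        raise ValueError("F_g and F_b must be non-empty and of equal length")
    residual = [alpha * b - g for g, b in zip(f_g, f_b)]
    return {
        "min": min(residual),
        "max": max(residual),
        "mean": mean(residual),
        # Share of positions where the scaled background exceeds the foreground.
        "positive_fraction": sum(r > 0 for r in residual) / len(residual),
    }


summary = residual_summary([1.0, 0.25, 2.0, 0.5], [2.0, 4.0, 0.0, 1.0], alpha=0.5)
print(summary)
```

A positive_fraction close to 1, or a very large max, would suggest that the background branch dominates and that the ranges of α and β deserve a closer look.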
[0199] In summary, the technical effects of this embodiment 2 are as follows:
[0200] When faced with extreme conditions such as strong noise, non-stationary clutter, and local shadows / occlusions, this embodiment 2, through the combination of algorithms 1 to 3, significantly outperforms traditional solutions in terms of vehicle recognition accuracy and false negative rate.
[0201] Traditional FDRM modules focus on reducing feature redundancy but do not explicitly handle feature interaction between foreground and background during transient changes. This embodiment 2 uses a cross-branch residual alignment mechanism to ensure that foreground features are not significantly diluted and to apply "refined suppression" to background noise.
[0202] The three parameters α, β, and γ can be regarded as adjustable knobs, which the network can tune automatically or semi-automatically across different training stages and scenario requirements to meet diverse application needs.
[0203] Different fields or application scenarios (such as traffic flow monitoring, battlefield reconnaissance, natural disaster assessment, etc.) can customize more suitable numerical ranges according to the actual noise and target size distribution.
[0204] By explicitly describing the gradual process of foreground / background branch fusion using formulas (1) to (3), a foundation is provided for subsequent researchers or engineers to further expand and improve the cumulative learning strategy;
[0205] This work further enriches the three key integration points of "feature difference, residual injection, and adjustable gain" in the cutting-edge field of deep learning + SAR target detection, and is a good supplement to the existing backbone frameworks such as FPN and Transformer.
[0206] In Embodiment 1, a cumulative learning strategy was used to fuse foreground and background features by linear or simple weighting, achieving good vehicle target detection results. However, as background noise and occlusion conditions become increasingly complex, fixed or single weighting methods can cause small-target features to be swallowed and background redundancy to be insufficiently suppressed, thus affecting the accuracy and recall rate of vehicle detection. To further improve the performance of Embodiment 1, Embodiment 2 improves its "cumulative learning and feature weighting" stage by introducing three sets of distributed algorithms and supporting formulas. The added adjustable parameters address issues such as insufficient background noise suppression and missed detection of small targets, achieving more flexible and higher-precision vehicle target recognition.
[0207] Practice has shown that Embodiment 1 can achieve high accuracy and high recall in most conventional or moderately noisy SAR environments. However, when faced with extreme scenarios such as extremely high noise, strong interference, complex occlusion, and drastic changes in background information, the following potential problems still exist:
[0208] Typically, the "foreground branch" and "background branch" are fused during training using only linear or simple functions. If the background energy is too strong or the small target features are weak, it may not be able to fully "amplify" the key vehicle features.
[0209] Although the FDRM module reduces feature redundancy through orthogonal loss and compact loss, it does not explicitly consider cross-branch complementarity in the specific fusion process.
[0210] When the vehicle target is partially obscured or the surrounding noise is extremely high, simple weighting or subtraction is often insufficient, and problems such as either insufficient suppression or excessive suppression may occur.
[0211] To improve robustness and accuracy in extreme scenarios (such as vehicles partially obscured by buildings, large-area rain, snow, noise, and non-uniform backgrounds), this embodiment 2 proposes three key improvements:
[0212] Introducing more adjustable parameters allows for more flexible adjustment of foreground / background blending during training based on validation set performance;
[0213] During the cumulative learning phase, a cross-branch residual approach is proposed, which effectively preserves small target information and reduces false background detections by residual alignment.
[0214] By using a cascaded and interconnected fusion method, multiple feature gain strategies are combined to enable vehicle targets to still be captured and highlighted even under strong noise.
[0215] The improvements in this embodiment 2 are made in response to the above-mentioned technical problems.
Claims
1. A vehicle target recognition method based on feature decomposition and cumulative learning, characterized in that it comprises: constructing a dataset using a land SAR image vehicle target dataset; extracting and enhancing vehicle target scattering features; constructing an ST-PA_RCNN pixel-level fusion model; training a feature-level fusion model based on the FDRM module; and testing the model's performance using test data and outputting the results; wherein the training of the feature-level fusion model based on the FDRM module includes: introducing the FDRM module based on the Dual-Branch FPN feature fusion method on the basis of ST-PA_RCNN, reducing feature redundancy and enhancing the difference in feature space through feature decomposition and weight redistribution methods; accepting the foreground feature Fg and background feature Fb output from the FDRM module through an adaptive feature gain allocation algorithm, transforming Fg and Fb with activation functions respectively, obtaining the weighted result according to the adaptive gain function, and fine-tuning the gain parameter α after each training step by observing the average precision, total loss or foreground recall index of the validation set, using gradient or search strategies to adapt it to the noise level and target distribution characteristics of the current scene; using a cross-branch residual alignment algorithm, inputting the foreground feature Fg, background feature Fb and parameter α, initializing the residual coefficient β, calculating the residual value of α multiplied by Fb minus Fg, multiplying the residual value by β and adding it to Fg to obtain the aligned foreground feature representation, and using the aligned foreground feature as a new foreground representation or further fusing it with background information; and weighting and fusing the outputs of the adaptive feature gain allocation algorithm and the cross-branch residual alignment algorithm through a cascaded fusion factor γ to generate the final comprehensive feature.
2. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that the dataset construction using a land SAR image vehicle target dataset includes: selecting key scene areas for sample data collection based on satellite left and right side-looking and ascent / descent characteristics, combined with the target's deployment direction at the distribution base; after standardization, verifying whether the sample data meets the sample distribution index requirements using target image distribution; if it meets the requirements, the dataset construction is completed; if it does not meet the requirements, analyzing the missing sample observation parameters, using orbit simulation to perform distribution location coverage analysis on the missing sample observation parameters, and forming an optimal reconnaissance scheme that can be completed in a short time; and using high-time-efficiency satellite platforms such as SAR for rapid reconnaissance, followed by preprocessing to form training and test samples.
3. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that the extraction and enhancement of vehicle target scattering features includes: extracting texture features from SAR images using four algorithms: SAR-SIFT, SAR-HOG, NSLP, and LC; and enhancing the extracted features using a SAR-Harris detector and the OPTICS clustering method.
4. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that the construction of the ST-PA_RCNN pixel-level fusion model includes: constructing the ST-PA_RCNN backbone network, combining SwinTransformer and PA_FPN, fusing the original image with the feature map enhanced by scattering features through the channel fusion method, and training the target detection model.
5. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that the testing of the model's performance using test data and outputting of results includes: inputting images from the test dataset into a trained FDRM ST-PA_RCNN model, which includes pixel-level and feature-level fusion models, performing object detection inference, and calculating the model's performance metrics; evaluating the model's performance by calculating performance metrics such as precision, recall, and average precision; and outputting detection results, which include bounding boxes and object categories, and visualizing the detection results.
6. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that it further comprises: fusing the original SAR image with the feature map enhanced by scattering features to form multi-channel input data.
7. The vehicle target recognition method based on feature decomposition and cumulative learning according to claim 1, characterized in that it further comprises: reducing feature redundancy through feature decomposition and weight redistribution, and enhancing the diversity of the feature space.