End-to-end radar target fine recognition method based on echo discriminative learning

An end-to-end radar target fine recognition method based on echo discriminative learning is constructed by combining HRRP and ISAR feature extraction networks with a Transformer encoder and an adaptive gating fusion unit. The method addresses the stability and accuracy problems of target recognition in complex application scenarios and is intended to improve recognition performance.

CN122194087A · Pending · Publication Date: 2026-06-12 · NANJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF POSTS & TELECOMM
Filing Date
2026-03-16
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing radar target recognition methods struggle to reliably achieve fine target identification in complex real-world application scenarios. Especially with long-range or weakly scattering targets, target echoes are easily overwhelmed by background interference. Existing clutter suppression and signal enhancement techniques offer limited improvement in characterizing the fine structural features of targets, and the loose fusion of HRRP and ISAR features fails to fully exploit the discriminative correlations between multi-domain features.

Method used

An end-to-end approach based on echo discriminative learning is adopted: an HRRP feature extraction network and an ISAR feature extraction network extract features, which a multi-domain feature fusion and classification module then fuses and classifies. A Transformer encoder provides feature interaction enhancement, and an adaptive gating fusion unit performs weight allocation and weighted fusion to form a robust feature representation.

Benefits of technology

It improves the fine recognition performance for radar targets, reflects the physical differences of targets in different feature domains, enhances the ability to distinguish subtle structural differences between similar targets, improves the stability and overall accuracy of recognition, and adapts to changes in complex environments and imaging conditions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122194087A_ABST
Patent Text Reader

Abstract

The application belongs to the technical field of radar signal processing and deep learning, and discloses an end-to-end radar target fine recognition method based on echo discriminative learning, which comprises the following steps: (1) obtaining the radar echo data to be identified and dividing it into a training set, a validation set and a test set; (2) simultaneously inputting the HRRP sequences and ISAR images in the training set into a recognition network; (3) calculating a total loss value and iteratively updating the parameters of the recognition network until the total loss value converges or a preset number of training rounds is reached; (4) using the validation set to evaluate the recognition accuracy of the current recognition network; and (5) after training is completed, using the test set to evaluate the performance of the saved target parameters. The application performs fine feature mining in both the radar signal domain and the image domain, exploits the complementary information of the HRRP range-dimension structure and the ISAR spatial geometry, and achieves fine-grained radar target type recognition.

Description

Technical Field

[0001] This application belongs to the field of radar signal processing and deep learning technology, specifically relating to an end-to-end radar target fine recognition method based on echo discriminative learning.

Background Technology

[0002] Radar Automatic Target Recognition (RATR) is a crucial component of radar sensing systems. In real-world radar detection scenarios, echo signals are typically affected by complex environmental factors, such as ground reflection, sea clutter, weather conditions, and system noise. Especially with long-range or weakly scattering targets, target echoes are easily overwhelmed by background interference, resulting in a low signal-to-noise ratio. Existing clutter suppression and signal enhancement techniques can improve echo quality to some extent, providing a more stable input for subsequent processing. However, their role is primarily focused on signal-level preprocessing, offering limited improvement in the stable representation of the target's fine structural features.

[0003] In recent years, deep learning methods have been widely applied to radar target recognition tasks. Existing methods often treat radar echo signals or imaging results directly as general one-dimensional signals or two-dimensional images, using general structures such as convolutional neural networks for feature extraction. These methods can achieve good recognition results under specific experimental conditions, but they often ignore the physical properties of radar echoes generated by electromagnetic scattering mechanisms. The learned features are quite sensitive to changes in target attitude, differences in observation geometry, and environmental changes, and the generalization ability and stability of the models are still limited.

[0004] From the perspective of radar target feature types, one-dimensional high-resolution range profiles (HRRP) can reflect the scattering structure distribution of a target in the range dimension, and have a strong characterization ability for target size and scattering point arrangement, but are more sensitive to attitude changes. Two-dimensional inverse synthetic aperture radar (ISAR) images can describe the spatial geometry and contour features of a target, but are prone to geometric deformations such as rotation and stretching when the target maneuvers or imaging conditions change. These two types of features characterize the physical properties of the same target from different dimensions and have a potential complementary relationship.

[0005] In existing radar target recognition methods, HRRP and ISAR features are typically modeled separately and then combined by simple feature concatenation or decision-level fusion. This loose fusion approach fails to establish deep correlations between different data domains at the feature level, making it difficult to fully exploit the discriminative and complementary features between range-dimensional structural information and spatial geometric information. When the quality of one feature degrades due to environmental or imaging conditions, the system struggles to compensate effectively using the other feature, limiting further improvements in fine recognition performance.

[0006] Therefore, in complex and ever-changing real-world radar application scenarios, how to construct a feature representation method that can jointly characterize the differences between different feature domains while ensuring the effectiveness of echo signals, and fully explore the discriminative correlation between features in multiple domains, is a key issue that urgently needs to be addressed to improve the performance of radar target fine recognition.

Summary of the Invention

[0007] To address the problem that existing radar target recognition methods struggle to achieve stable, fine target recognition in complex real-world application scenarios, this application provides an end-to-end radar target fine recognition method based on echo discriminative learning.

[0008] To achieve the above objectives, this application employs the following technical solution:

[0009] This application presents an end-to-end radar target fine recognition method based on echo discriminative learning, which specifically includes the following steps:

[0010] Step 1: Obtain the radar echo data to be identified, generate HRRP sequences and ISAR images respectively, pair a set of HRRP sequences with an ISAR image to form paired sample data, perform amplitude normalization on the generated HRRP sequences, map the data to the [0,1] interval, standardize the size of the generated ISAR images, divide the processed paired sample data into training set, validation set and test set, and load them in batches;

[0011] Step 2: Simultaneously input the HRRP sequences and ISAR images from the training set into a recognition network that includes a parallel HRRP feature extraction network, an ISAR feature extraction network, and a multi-domain feature fusion and classification module. The signal domain features and image domain features are obtained through the HRRP feature extraction network and the ISAR feature extraction network, respectively, and then the target category is obtained through the multi-domain feature fusion and classification module.

[0012] Step 3: Calculate the difference between the target category obtained in Step 2 and the true label using the cross-entropy loss function to obtain the total loss value. Calculate the gradient of the total loss value with respect to the parameters of each layer of the recognition network based on the backpropagation algorithm, and use the Adam optimizer to iteratively update the parameters of the recognition network until the total loss value converges or reaches the preset number of training rounds.

[0013] Step 4: During training, periodically use the validation set to evaluate the recognition accuracy of the current recognition network, and save the recognition network weight parameters corresponding to the training round with the highest recognition accuracy on the validation set as target parameters for subsequent inference. When the recognition accuracy of the validation set is the same for multiple training rounds, select the recognition network weight parameters corresponding to the training round that first reaches the highest recognition accuracy as the target parameters.

[0014] Step 5: After training, use the test set to evaluate the performance of the saved target parameters.
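The checkpoint selection rule of Step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `select_best_round` and the accuracy values are hypothetical. The point it shows is that a strict `>` comparison keeps the earliest round when several rounds tie for the highest validation accuracy.

```python
def select_best_round(val_accuracies):
    """Return (round_index, accuracy) of the first round that reaches the highest accuracy.

    val_accuracies: validation accuracy recorded after each evaluated training round.
    """
    best_round, best_acc = None, float("-inf")
    for round_idx, acc in enumerate(val_accuracies):
        # Strict '>' means a later round with an equal score never replaces
        # the earlier one, which is the tie-break described in Step 4.
        if acc > best_acc:
            best_round, best_acc = round_idx, acc
    return best_round, best_acc


# Hypothetical validation history: rounds 2 and 4 tie at 0.93.
history = [0.71, 0.88, 0.93, 0.90, 0.93]
best_round, best_acc = select_best_round(history)
```

In a training loop, the same comparison would decide when to overwrite the saved weight file, so the file on disk always holds the target parameters used in Step 5.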

[0015] A further improvement of this application is that: in step 2, the input of the HRRP feature extraction network is the HRRP signal, and the HRRP feature extraction network includes a three-level one-dimensional convolutional feature extraction structure, a self-attention enhancement module, and a feature mapping layer. The three-level one-dimensional convolutional feature extraction structure is used to extract features from the input HRRP sequence to obtain convolutional features. The self-attention enhancement module is used to adaptively weight the convolutional features in the sequence dimension to highlight key responses and obtain attention-enhanced features. The feature mapping layer is used to map the attention-enhanced features into a one-dimensional HRRP feature vector output.

[0016] A further improvement of this application is that the three-level one-dimensional convolutional feature extraction structure is composed of three groups of convolutional processing units cascaded along the data flow direction. Each group of convolutional processing units includes a one-dimensional convolutional layer, a ReLU activation layer, and a one-dimensional max pooling layer. The one-dimensional convolutional layer of the first group of convolutional processing units has 1 input channel, 32 output channels, a kernel size of 3, a stride of 1, and padding of 1. The one-dimensional convolutional layer of the second group of convolutional processing units has 32 input channels, 32 output channels, a kernel size of 3, a stride of 1, and padding of 1. The one-dimensional convolutional layer of the third group of convolutional processing units has 32 input channels, 64 output channels, a kernel size of 3, a stride of 1, and padding of 1. The max pooling layer in all three groups of convolutional processing units uses max pooling with a kernel size of 2 and a stride of 2.
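The effect of these layer settings on tensor shape can be checked with the standard output-length formulas for convolution and pooling. The sketch below assumes a hypothetical input of 256 range cells, since the text does not state the HRRP length; the channel counts, kernel sizes, strides and padding are the ones given above.

```python
def conv1d_out_len(length, kernel=3, stride=1, padding=1):
    """Output length of a 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1


def pool1d_out_len(length, kernel=2, stride=2):
    """Output length of a 1D max pooling layer without padding."""
    return (length - kernel) // stride + 1


# (input channels, output channels) of the three convolutional processing units.
UNITS = [(1, 32), (32, 32), (32, 64)]


def trace_shapes(input_len):
    """Return the (channels, length) shape after the input and after each unit."""
    shapes = [(1, input_len)]
    length = input_len
    for _, out_channels in UNITS:
        length = conv1d_out_len(length)   # kernel 3, stride 1, padding 1 keeps the length
        length = pool1d_out_len(length)   # kernel 2, stride 2 halves the length
        shapes.append((out_channels, length))
    return shapes


shapes = trace_shapes(256)  # 256 range cells is an assumed input length
```

Under this assumption the convolutions preserve the sequence length and each pooling layer halves it, so the structure outputs 64 channels over one eighth of the input length.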

[0017] The self-attention enhancement module receives the convolutional features output by the three-level one-dimensional convolutional feature extraction structure and adaptively weights them. The self-attention enhancement module includes a query mapping layer, a key mapping layer, and a value mapping layer. The query mapping layer and the key mapping layer both map the input convolutional features from 64 dimensions to 32 dimensions, while the value mapping layer keeps the input convolutional features at 64 dimensions. The attention weights are calculated from the query and key using a scaled dot product and normalized using Softmax, and dropout with a rate of 0.25 is applied to the attention weights. The attention-enhanced features are obtained by weighting the values with the attention weights.
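A minimal pure-Python sketch of this attention computation is given below. The 64/32/64 dimensions come from the text; the sequence length of 32, the random weights, and all function names are assumptions made for illustration, and dropout is omitted because it only acts during training.

```python
import math
import random

D_IN, D_QK, D_V = 64, 32, 64  # feature, query/key and value dimensions from the text


def rand_matrix(rng, rows, cols):
    """Random stand-in for a learned mapping layer (hypothetical initialisation)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def project(features, weight):
    """Apply a linear map at every sequence position: (T, rows) x (rows, cols)."""
    columns = list(zip(*weight))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns] for row in features]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, w_q, w_k, w_v):
    q, k, v = project(features, w_q), project(features, w_k), project(features, w_v)
    scale = math.sqrt(D_QK)
    # Scaled dot product between every query and every key, normalised per query.
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]) for qi in q]
    # Each output position is a weighted sum of the 64-dimensional values.
    enhanced = [[sum(w * vj[d] for w, vj in zip(row, v)) for d in range(D_V)] for row in weights]
    return enhanced, weights


rng = random.Random(0)
seq_len = 32  # assumed number of sequence positions after the convolutional stack
conv_features = [[rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(seq_len)]
w_q = rand_matrix(rng, D_IN, D_QK)
w_k = rand_matrix(rng, D_IN, D_QK)
w_v = rand_matrix(rng, D_IN, D_V)
enhanced, weights = self_attention(conv_features, w_q, w_k, w_v)
```

Using a smaller query/key dimension than value dimension reduces the cost of the score computation while leaving the width of the output features unchanged.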

[0018] The feature mapping layer flattens the attention-enhanced features and maps them to a 256-dimensional vector through the fully connected classification layer of the multi-domain feature fusion and classification module; this vector is output as the one-dimensional HRRP feature vector.

[0019] A further improvement of this application is that: in step 2, the input to the ISAR feature extraction network is a two-dimensional inverse synthetic aperture radar (ISAR) image. The ISAR feature extraction network consists of four cascaded two-dimensional deformable convolutional feature extraction units. Each two-dimensional deformable convolutional feature extraction unit includes an offset generation convolutional layer and a two-dimensional deformable convolutional layer. The offset generation convolutional layer calculates the sampling offset of the deformable convolution based on the input ISAR image. The sampling offset is input as a control parameter to the two-dimensional deformable convolutional layer in the same unit. The two-dimensional deformable convolutional layer uses a 3×3 convolution kernel to perform convolution operations on the input ISAR image to extract spatial features. In each two-dimensional deformable convolutional feature extraction unit, the two-dimensional deformable convolutional layer is followed in sequence by a ReLU activation layer and a max pooling layer. The max pooling layer downsamples the feature map output by the unit. The numbers of output channels of the four units are set to 16, 32, 64 and 64 respectively, and the four units are connected in series in the order of data flow, so that the input of each unit is the output of the max pooling layer of the previous unit. After the feature extraction of the fourth two-dimensional deformable convolutional feature extraction unit is completed, the two-dimensional feature map output by its max pooling layer is flattened and input into the fully connected classification layer of the multi-domain feature fusion and classification module. The flattened feature is mapped to a 256-dimensional vector and output as the one-dimensional ISAR feature vector.
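The core idea of a deformable convolution, sampling each kernel tap at a position shifted by a learned offset, can be illustrated on a single channel and a single output location. The sketch below is only an illustration under assumptions: the offsets are set by hand rather than produced by an offset generation layer, bilinear interpolation is assumed for fractional positions, and the image, kernel, and function names are hypothetical.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2D list at a fractional (y, x); positions outside the image read as 0."""
    height, width = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < height and 0 <= xx < width:
                value += wy * wx * image[yy][xx]
    return value


def deformable_conv_at(image, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at (cy, cx) for one channel.

    offsets[i][j] = (dy, dx) shifts the sampling point of kernel tap (i, j).
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            total += kernel[i][j] * bilinear_sample(image, cy + i - 1 + dy, cx + j - 1 + dx)
    return total


image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]   # toy 5x5 "image"
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]                       # hypothetical averaging kernel
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
shifted = [[(0.0, 0.5)] * 3 for _ in range(3)]                     # every tap moved half a pixel right

regular = deformable_conv_at(image, kernel, zero_offsets, 2, 2)
deformed = deformable_conv_at(image, kernel, shifted, 2, 2)
```

With all offsets at zero the operation reduces to an ordinary 3×3 convolution, which is why the offsets can be read as a learned correction for geometric deformation such as rotation or stretching of the ISAR image.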

[0020] A further improvement of this application is that, in step 2, the multi-domain feature fusion and classification module includes a feature alignment layer, a Transformer encoder, an adaptive gating fusion unit, and a fully connected classification layer. The one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector are used as inputs to the multi-domain feature fusion and classification module. The feature alignment layer maps the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector to the same feature dimension, forming a dual-branch feature representation with the same feature dimension. The Transformer encoder has one layer, four attention heads, and a feature dimension of 256; through a multi-head self-attention mechanism it lets the two branches of the dual-branch feature representation interact, and it outputs the jointly modeled HRRP sequence features and ISAR sequence features. The adaptive gating fusion unit is used to assign weights to the HRRP and ISAR sequence features and to fuse them by weighting. The adaptive gating fusion unit includes a linear mapping layer that acts on the HRRP and ISAR sequence features respectively, and a weight generation network. The weight generation network outputs two global importance scores, which are normalized by the Softmax function. The HRRP and ISAR sequence features are then weighted and summed to obtain the final fused feature vector. The fully connected classification layer performs classification mapping on the final fused feature vector and outputs the target category of the radar target through the Softmax function.
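The multi-head self-attention step of such an encoder can be sketched as below, treating the aligned HRRP and ISAR vectors as a sequence of two 256-dimensional tokens, which is an assumed reading of the text. Four heads over 256 dimensions give 64 dimensions per head. The random weights and function names are hypothetical, and the output projection, residual connections, layer normalization and feed-forward sublayer of a full Transformer encoder layer are deliberately left out.

```python
import math
import random

D_MODEL, N_HEADS = 256, 4        # values stated in the text
D_HEAD = D_MODEL // N_HEADS      # 64 dimensions per head


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight):
    """Multiply a vector x by a weight matrix given as rows (len(x) rows)."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*weight)]


def rand_matrix(rng, rows, cols):
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(tokens, w_q, w_k, w_v):
    n = len(tokens)
    q = [linear(t, w_q) for t in tokens]
    k = [linear(t, w_k) for t in tokens]
    v = [linear(t, w_v) for t in tokens]
    outputs = [[] for _ in range(n)]
    attn = []  # attn[h][i][j]: weight head h gives token j when updating token i
    for h in range(N_HEADS):
        lo, hi = h * D_HEAD, (h + 1) * D_HEAD
        head = []
        for i in range(n):
            scores = [
                sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi])) / math.sqrt(D_HEAD)
                for j in range(n)
            ]
            weights = softmax(scores)
            head.append(weights)
            # Concatenate this head's 64-dimensional output onto token i's result.
            outputs[i].extend(
                sum(weights[j] * v[j][d] for j in range(n)) for d in range(lo, hi)
            )
        attn.append(head)
    return outputs, attn


rng = random.Random(0)
w_q, w_k, w_v = (rand_matrix(rng, D_MODEL, D_MODEL) for _ in range(3))
hrrp_token = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]  # stand-in for the aligned HRRP vector
isar_token = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]  # stand-in for the aligned ISAR vector
mixed, attn = multi_head_self_attention([hrrp_token, isar_token], w_q, w_k, w_v)
```

After this step each output token is a mixture of both branches, which is the mechanism by which information from one domain can reinforce the other before fusion.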

[0021] A further improvement of this application is that the processing of the multi-domain feature fusion and classification module specifically includes the following steps:

[0022] Step 2.1: Map the one-dimensional HRRP feature vector output by the HRRP feature extraction network to a preset dimension through a fully connected classification layer, and map the one-dimensional ISAR feature vector output by the ISAR feature extraction network to the same preset dimension through a feature alignment layer, thereby obtaining a dual-branch feature representation of the same feature dimension to achieve dimensional alignment between the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector.

[0023] Step 2.2: Construct a Transformer encoder. Input the feature sequence formed by the two-branch feature representations of the same feature dimension obtained in Step 2.1 into the Transformer encoder. Use the multi-head attention mechanism to jointly model the feature sequence to achieve interactive enhancement of the two-branch feature representations of the same feature dimension. Output the HRRP sequence features and ISAR sequence features after joint modeling.

[0024] Step 2.3: Construct an adaptive gating fusion unit to perform weight allocation and weighted fusion on the HRRP sequence features and ISAR sequence features output in Step 2.2. The linear mapping layer is used to project the HRRP sequence features and ISAR sequence features onto a unified gating space to obtain HRRP sequence projection features and ISAR sequence projection features. The weight generation network generates two global importance scores based on the HRRP sequence projection features and ISAR sequence projection features, and normalizes the two global importance scores into complementary fusion weights through the Softmax function. The HRRP sequence features and ISAR sequence features are weighted and summed according to the two fusion weights to obtain the final fused feature vector.
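Step 2.3 can be sketched as follows. This is one possible reading, not the applicant's implementation: the weight generation network is assumed to be a single linear layer acting on the concatenated projections, the gating-space width of 64 is hypothetical, and the weights are random stand-ins for learned parameters. Only the 256-dimensional feature size and the Softmax-normalized pair of weights come from the text.

```python
import math
import random


def linear(x, weight, bias):
    """y = x @ weight + bias for a single vector x."""
    return [sum(xi * w for xi, w in zip(x, col)) + b for col, b in zip(zip(*weight), bias)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(rng, dim, gate_dim):
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_h": matrix(dim, gate_dim), "b_h": [0.0] * gate_dim,   # HRRP linear mapping layer
        "w_i": matrix(dim, gate_dim), "b_i": [0.0] * gate_dim,   # ISAR linear mapping layer
        "w_s": matrix(2 * gate_dim, 2), "b_s": [0.0, 0.0],       # assumed weight generation network
    }


def gated_fusion(hrrp_feat, isar_feat, params):
    # Project both branches into the shared gating space.
    h_proj = linear(hrrp_feat, params["w_h"], params["b_h"])
    i_proj = linear(isar_feat, params["w_i"], params["b_i"])
    # Two global importance scores, normalised into complementary weights.
    scores = linear(h_proj + i_proj, params["w_s"], params["b_s"])
    w_hrrp, w_isar = softmax(scores)
    # Weighted sum of the original (unprojected) sequence features.
    fused = [w_hrrp * h + w_isar * i for h, i in zip(hrrp_feat, isar_feat)]
    return fused, (w_hrrp, w_isar)


rng = random.Random(0)
dim, gate_dim = 256, 64  # 256 from the text; 64 is an assumed gating-space width
params = init_params(rng, dim, gate_dim)
hrrp_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
isar_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
fused, gate = gated_fusion(hrrp_feat, isar_feat, params)
```

Because the two weights sum to one, a degraded branch can be down-weighted only by shifting weight to the other branch, which matches the compensation behavior the method aims for.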

[0025] A further improvement of this application is that step 1 specifically includes the following steps:

[0026] Step 1.1: Obtain the radar echo data to be identified, and generate HRRP sequences and ISAR images respectively. Specifically, X HRRP range profiles are generated by selecting X consecutive range-profile processing times within the same observation period for the same target, and these X range profiles form a set of HRRP sequences. At the same time, one ISAR image is generated using the echo data of the same observation period corresponding to the X HRRP range profiles, forming paired sample data of X HRRP sequences corresponding to one ISAR image.

[0027] Step 1.2: Assign a unique sample identifier ID to each paired sample data, and establish an index table corresponding to the sample identifier ID and the HRRP sequence and ISAR image to fix the pairing relationship;

[0028] Step 1.3: Perform amplitude normalization on the generated HRRP sequence, map the data to the [0,1] interval, and standardize the size of the generated ISAR image;

[0029] Step 1.4: Divide the processed paired data into training set, validation set and test set according to sample identifier ID, and load them in batches during training, validation and testing: select B samples according to sample identifier ID for each batch, and simultaneously read X HRRP sequences and 1 ISAR image corresponding to each sample to form batch input, so as to ensure the consistent correspondence of multimodal data within the batch.
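The preprocessing of Step 1.3 can be illustrated with a short sketch. Two assumptions are made that the text does not fix: amplitude normalization is taken to be per-profile min-max scaling, and image size standardization is shown as nearest-neighbor resizing. All function names and sample values are hypothetical.

```python
def minmax_normalize(profile, eps=1e-12):
    """Scale one amplitude profile to the [0, 1] interval."""
    lo, hi = min(profile), max(profile)
    span = hi - lo
    if span < eps:
        # A constant profile carries no amplitude contrast; map it to zeros.
        return [0.0 for _ in profile]
    return [(a - lo) / span for a in profile]


def normalize_sequence(hrrp_sequence):
    """Normalize every profile of an HRRP sequence independently."""
    return [minmax_normalize(p) for p in hrrp_sequence]


def resize_nearest(image, out_h, out_w):
    """Bring a 2D image (list of rows) to a fixed size by nearest-neighbor lookup."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


normalized = minmax_normalize([2.0, 4.0, 10.0, 6.0])
resized = resize_nearest([[1, 2], [3, 4]], 4, 4)
```

Normalizing each profile separately removes absolute echo strength, so the network sees the relative arrangement of scattering points rather than range-dependent amplitude.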

[0030] The beneficial effects of this application are:

[0031] This application constructs a dual-branch, multi-domain discriminative feature extraction structure to specifically model the structural features of radar echoes in the range and spatial dimensions, enabling the extracted features to more fully reflect the physical differences of targets in different feature domains. This effectively improves the ability to distinguish subtle structural differences between similar targets and is suitable for the refined identification of radar targets.

[0032] This application introduces a joint modeling unit containing a Transformer encoder after the dual-branch feature alignment. It performs correlation modeling on the two features and realizes feature interaction enhancement. Furthermore, it uses an adaptive gating fusion unit to perform weight allocation and weighted fusion on the two features after joint modeling. This improves the robustness of the fusion representation while retaining the discriminative information of the two features. When the quality of one feature is degraded due to environmental or imaging conditions, the stability of the recognition performance can still be maintained by the other feature.

[0033] This application adopts an end-to-end recognition network to uniformly design the feature extraction, feature fusion and target discrimination processes, so that the network parameters are optimized as a whole around the needs of fine target differentiation, avoiding the information loss problem caused by the staged design in traditional methods, which is conducive to improving the overall accuracy and consistency of radar target fine recognition.

Attached Figure Description

[0034] Figure 1 is a flowchart of the method of this application.

[0035] Figure 2 is the network structure diagram of this application.

[0036] Figure 3 is a schematic diagram of an HRRP sequence.

[0037] Figure 4 is a schematic diagram of an ISAR image.

[0038] Figure 5 shows the confusion matrices of HRRP and ISAR single-modal recognition.

[0039] Figure 6 shows the confusion matrix obtained by multi-domain fusion.

[0040] Figure 7 is a visualization of the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector in Step 2.

[0041] Figure 8 is a visualization of the fused features.

Detailed Implementation

[0042] The embodiments of the present invention will be disclosed below with reference to the drawings. For clarity, many practical details will be described in the following description. However, it should be understood that these practical details are not intended to limit the invention. That is, in some embodiments of the invention, these practical details are not essential.

[0043] As shown in Figure 1, this application presents an end-to-end radar target fine recognition method based on echo discriminative learning. It constructs a recognition network comprising a parallel HRRP feature extraction network, an ISAR feature extraction network, and a multi-domain feature fusion and classification module. Specifically, the method includes the following steps:

[0044] Step 1: Acquire the radar echo data to be identified, generate HRRP sequences and ISAR images respectively, and pair a set of HRRP sequences with an ISAR image to form paired sample data. Perform amplitude normalization on the generated HRRP sequences, mapping the data to the [0,1] interval. Standardize the size of the generated ISAR images. Divide the processed paired sample data into training, validation, and test sets, and load them in batches. Specifically, the steps include:

[0045] Step 1.1: Obtain the radar echo data to be identified, and generate HRRP sequences and ISAR images respectively. Specifically, 128 HRRP range profiles are generated by selecting 128 consecutive range-profile processing times within the same observation period for the same target, forming a set of HRRP sequences. At the same time, one ISAR image is generated using the echo data of the same observation period corresponding to the 128 HRRP range profiles, forming paired sample data of 128 HRRP sequences corresponding to one ISAR image.

[0046] Step 1.2: Assign a unique sample identifier ID to each paired sample data, and establish an index table corresponding to the sample identifier ID and the HRRP sequence and ISAR image to fix the pairing relationship;

[0047] Step 1.3: Perform amplitude normalization on the generated HRRP sequence, map the data to the [0,1] interval, and standardize the size of the generated ISAR image;

[0048] Step 1.4: Divide the processed paired data into training set, validation set and test set according to sample ID, and load them in batches during training, validation and testing: select B samples according to sample ID in each batch, and read the corresponding 128 HRRP sequences and 1 ISAR image for each sample to form batch input, so as to ensure the consistent correspondence of multimodal data within the batch.
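The ID-based pairing, splitting and batching of Steps 1.2 and 1.4 can be sketched as below. The 128 HRRP entries per sample follow this embodiment; the 70/15/15 split ratio, the batch size of 8, the placeholder path strings and all function names are assumptions, since the text does not specify them.

```python
import random

N_HRRP = 128  # HRRP entries paired with one ISAR image in this embodiment


def build_index(num_samples):
    """Index table: sample ID -> references to its HRRP sequence and its ISAR image."""
    return {
        sid: {
            "hrrp": [f"hrrp/{sid:05d}_{k:03d}" for k in range(N_HRRP)],  # placeholder paths
            "isar": f"isar/{sid:05d}",                                   # placeholder path
        }
        for sid in range(num_samples)
    }


def split_ids(sample_ids, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle the IDs once and cut them into training, validation and test lists."""
    ids = sorted(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def iter_batches(ids, index, batch_size):
    """Yield batches of (ID, HRRP references, ISAR reference) looked up through the index."""
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield [(sid, index[sid]["hrrp"], index[sid]["isar"]) for sid in chunk]


index = build_index(100)
train_ids, val_ids, test_ids = split_ids(index.keys())
first_batch = next(iter_batches(train_ids, index, batch_size=8))
```

Splitting on sample IDs rather than on individual HRRP entries keeps every HRRP sequence with its own ISAR image and prevents data from one observation period from appearing in more than one subset.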

[0049] Step 2: Simultaneously input the HRRP sequences and ISAR images from the training set into a parallel recognition network containing an HRRP feature extraction network, an ISAR feature extraction network, and a multi-domain feature fusion and classification module. The HRRP and ISAR feature extraction networks respectively extract signal domain features and image domain features, which are then processed by the multi-domain feature fusion and classification module to obtain the target category. The multi-domain feature fusion and classification module specifically includes the following steps:

[0050] Step 2.1: Map the one-dimensional HRRP feature vector output by the HRRP feature extraction network to a preset dimension through a fully connected classification layer, and map the one-dimensional ISAR feature vector output by the ISAR feature extraction network to the same preset dimension through a feature alignment layer, thereby obtaining a dual-branch feature representation of the same feature dimension to achieve dimensional alignment between the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector.

[0051] Step 2.2: Construct a Transformer encoder. Input the feature sequence into the Transformer encoder by combining the two-branch feature representations of the same feature dimension obtained in Step 2.1. Use the multi-head attention mechanism to jointly model the feature sequence to achieve interactive enhancement of the two-branch feature representations of the same feature dimension. Output the HRRP sequence features and ISAR sequence features after joint modeling.

[0052] Step 2.3: Construct an adaptive gating fusion unit to perform weight allocation and weighted fusion on the HRRP sequence features and ISAR sequence features output in Step 2.2. The linear mapping layer is used to project the HRRP sequence features and ISAR sequence features onto a unified gating space to obtain HRRP sequence projection features and ISAR sequence projection features. The weight generation network generates two global importance scores based on the HRRP sequence projection features and ISAR sequence projection features, and normalizes the two global importance scores into complementary fusion weights through the Softmax function. The HRRP sequence features and ISAR sequence features are weighted and summed according to the two fusion weights to obtain the final fusion feature vector.

[0053] Step 3: Calculate the difference between the target category obtained in Step 2 and the true label using the cross-entropy loss function to obtain the total loss value. Calculate the gradient of the total loss value with respect to the parameters of each layer of the recognition network based on the backpropagation algorithm, and use the Adam optimizer to iteratively update the parameters of the recognition network until the total loss value converges or reaches the preset number of training rounds.

[0054] Step 4: During training, periodically use the validation set to evaluate the recognition accuracy of the current recognition network, and save the recognition network weight parameters corresponding to the training round with the highest recognition accuracy on the validation set as target parameters for subsequent inference. When the recognition accuracy of the validation set is the same for multiple training rounds, select the recognition network weight parameters corresponding to the training round that first reaches the highest recognition accuracy as the target parameters.

[0055] Step 5: After training, use the test set to evaluate the performance of the saved target parameters.
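The loss and update rule of Step 3 can be illustrated on a toy problem. In the sketch below the three logits themselves play the role of the network parameters, so the gradient of the cross-entropy loss is simply the Softmax output minus the one-hot label. The learning rate of 0.1 and the other Adam settings are commonly used values chosen for illustration, not values from the text.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Return the cross-entropy loss for one sample and the class probabilities."""
    probs = softmax(logits)
    return -math.log(probs[label]), probs


def adam_step(params, grads, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


logits = [0.0, 0.0, 0.0]  # toy stand-in for the recognition network's parameters
label = 1                 # hypothetical true class
state = {"t": 0, "m": [0.0] * 3, "v": [0.0] * 3}

initial_loss, _ = cross_entropy(logits, label)
for _ in range(50):  # stand-in for the preset number of training rounds
    loss, probs = cross_entropy(logits, label)
    # Gradient of cross-entropy with respect to the logits: probabilities minus one-hot label.
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    adam_step(logits, grads, state)
final_loss, _ = cross_entropy(logits, label)
```

In the full network the same gradient is propagated backward through the fusion module and both feature extraction branches, which is what makes the training end-to-end.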

[0056] As shown in Figure 2, the input to the HRRP feature extraction network is the HRRP signal. The HRRP feature extraction network includes a three-level one-dimensional convolutional feature extraction structure, a self-attention enhancement module, and a feature mapping layer. The three-level one-dimensional convolutional feature extraction structure is used to extract features from the input HRRP sequence to obtain convolutional features. The self-attention enhancement module is used to adaptively weight the convolutional features in the sequence dimension to highlight key responses and obtain attention-enhanced features. The feature mapping layer is used to map the attention-enhanced features into a one-dimensional HRRP feature vector output.

[0057] As shown in Figure 2, the three-level one-dimensional convolutional feature extraction structure consists of three sets of convolutional processing units cascaded along the data flow direction. Each set of convolutional processing units includes a one-dimensional convolutional layer, a ReLU activation layer, and a one-dimensional max pooling layer. The one-dimensional convolutional layer of the first set has 1 input channel and 32 output channels, with a kernel size of 3, a stride of 1, and padding of 1. The one-dimensional convolutional layer of the second set has 32 input channels and 32 output channels, with a kernel size of 3, a stride of 1, and padding of 1. The one-dimensional convolutional layer of the third set has 32 input channels and 64 output channels, with a kernel size of 3, a stride of 1, and padding of 1. The max pooling layers in all three sets of convolutional processing units use max pooling with a kernel size of 2 and a stride of 2.

[0058] As shown in Figure 2, the self-attention enhancement module receives the convolutional features output by the three-level one-dimensional convolutional feature extraction structure and adaptively weights them. The self-attention enhancement module includes a query mapping layer, a key mapping layer, and a value mapping layer. The query mapping layer and the key mapping layer both map the input convolutional features from 64 dimensions to 32 dimensions, while the value mapping layer keeps the input convolutional features at 64 dimensions. The attention weights are calculated from the query and key using a scaled dot product and normalized using Softmax, and dropout with a rate of 0.25 is applied to the attention weights. The attention-enhanced features are obtained by weighting the values with the attention weights.

[0059] The feature mapping layer flattens the attention-enhanced features and maps them to a 256-dimensional vector through the fully connected classification layer of the multi-domain feature fusion and classification module; this vector is output as the one-dimensional HRRP feature vector.

[0060] As shown in Figure 2, the input to the ISAR feature extraction network is an ISAR image, i.e., a two-dimensional inverse synthetic aperture radar image. The ISAR feature extraction network consists of four cascaded two-dimensional deformable convolutional feature extraction units. Each two-dimensional deformable convolutional feature extraction unit includes an offset generation convolutional layer and a two-dimensional deformable convolutional layer. The offset generation convolutional layer calculates the sampling offsets of the deformable convolution based on the input ISAR image. These sampling offsets are input as control parameters to the two-dimensional deformable convolutional layer within the same unit. The two-dimensional deformable convolutional layer uses a 3×3 convolutional kernel to perform convolution operations on the input ISAR image to extract spatial features. In each two-dimensional deformable convolutional feature extraction unit, the two-dimensional deformable convolutional layer is followed in sequence by a ReLU activation layer and a max pooling layer. The max pooling layer downsamples the feature map output by the unit. The numbers of output channels of the four units are set to 16, 32, 64 and 64 respectively, and the four units are connected in series along the data flow direction, with the output of the max pooling layer of each unit serving as the input of the next unit.
After the feature extraction of the fourth two-dimensional deformable convolutional feature extraction unit is completed, the two-dimensional feature map output by the max pooling layer is flattened and input into the fully connected classification layer of the multi-domain feature fusion and classification module. The flattened feature is mapped to a 256-dimensional vector and output as a one-dimensional ISAR feature vector.
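The size of the flattened ISAR feature can be estimated with a short calculation. The sketch below rests on several assumptions that this application does not state: a 128×128 input (the size used in the dataset of this embodiment), stride 1 and padding 1 for each 3×3 deformable convolution, 2×2 max pooling with stride 2, and an offset layer that outputs one (dy, dx) pair per kernel sampling position, as in the usual formulation of deformable convolution. The helper names are hypothetical.

```python
KERNEL = 3  # 3x3 deformable convolution kernel (from the description)


def conv_then_pool(size, pool=2):
    """Assumed 3x3 conv (stride 1, padding 1) followed by pool x pool max pooling."""
    conv_size = (size + 2 * 1 - KERNEL) // 1 + 1  # size is preserved
    return (conv_size - pool) // pool + 1         # size is divided by the pool size


def isar_feature_shapes(height=128, width=128):
    """Trace (channels, height, width) through the four deformable conv units."""
    channels = [16, 32, 64, 64]  # output channels of units 1 to 4
    shapes = []
    for ch in channels:
        height = conv_then_pool(height)
        width = conv_then_pool(width)
        shapes.append((ch, height, width))
    return shapes


# Assumed offset layout: 2 offsets (dy, dx) for each of the 3x3 sampling positions.
offset_channels = 2 * KERNEL * KERNEL

shapes = isar_feature_shapes()
last_ch, last_h, last_w = shapes[-1]
flat_dim = last_ch * last_h * last_w  # input size of the 256-dimensional mapping
print(shapes, flat_dim, offset_channels)
```

Under these assumptions the fourth unit outputs a 64×8×8 feature map, so the fully connected layer maps 4096 values to the 256-dimensional ISAR feature vector.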

[0061] As shown in Figure 2, in step 2, the multi-domain feature fusion and classification module includes a feature alignment layer, a Transformer encoder, an adaptive gating fusion unit, and a fully connected classification layer. The one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector are used as inputs to the multi-domain feature fusion and classification module. The feature alignment layer maps the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector to the same feature dimension, forming a dual-branch feature representation with the same feature dimension. The Transformer encoder has one layer, four attention heads, and a feature dimension of 256. Through a multi-head self-attention mechanism, it achieves feature interaction between the two branches of the dual-branch feature representation and outputs the jointly modeled HRRP sequence features and ISAR sequence features. The adaptive gating fusion unit assigns weights to the HRRP sequence features and ISAR sequence features and performs weighted fusion. It includes linear mapping layers that act on the HRRP sequence features and ISAR sequence features respectively, and a weight generation network. The weight generation network outputs two global importance scores, which are normalized by the Softmax function. The HRRP sequence features and ISAR sequence features are then weighted by the normalized scores and summed to obtain the final fused feature vector. The fully connected classification layer performs classification mapping on the final fused feature vector and outputs the target category of the radar target through the Softmax function.
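The weighting step of the adaptive gating fusion unit can be sketched as follows. This is an illustrative sketch only: the internal structure of the weight generation network is not specified in this application, so a single dense layer acting on the concatenated projections is assumed, the gating dimension of 32 is hypothetical, and all parameters are random rather than trained.

```python
import math
import random


def linear(vec, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(hrrp_feat, isar_feat, params):
    # Linear mapping layers acting on each branch separately.
    hrrp_proj = linear(hrrp_feat, params["w_hrrp"], params["b_hrrp"])
    isar_proj = linear(isar_feat, params["w_isar"], params["b_isar"])
    # Assumed weight generation network: one dense layer producing two global scores.
    scores = linear(hrrp_proj + isar_proj, params["w_gate"], params["b_gate"])
    alpha_hrrp, alpha_isar = softmax(scores)  # complementary weights, sum to 1
    fused = [alpha_hrrp * h + alpha_isar * s for h, s in zip(hrrp_feat, isar_feat)]
    return fused, (alpha_hrrp, alpha_isar)


def init_params(dim, gate_dim, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.05, 0.05) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_hrrp": mat(gate_dim, dim), "b_hrrp": [0.0] * gate_dim,
        "w_isar": mat(gate_dim, dim), "b_isar": [0.0] * gate_dim,
        "w_gate": mat(2, 2 * gate_dim), "b_gate": [0.0, 0.0],
    }


rng = random.Random(0)
DIM, GATE_DIM = 256, 32  # 256 from the description; 32 is a hypothetical gating size
hrrp_feat = [rng.gauss(0, 1) for _ in range(DIM)]
isar_feat = [rng.gauss(0, 1) for _ in range(DIM)]
fused, weights = gated_fusion(hrrp_feat, isar_feat, init_params(DIM, GATE_DIM, rng))
print(len(fused), weights)
```

Because the two scores pass through Softmax, the weights are complementary: when one feature domain degrades, a trained gate can shift weight toward the other domain without changing the dimension of the fused vector.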

[0062] To verify this application, the following experimental data are provided.

[0063] Dataset settings

[0064] In this embodiment, the experimental dataset covers radar echo data from five mainstream civil aircraft models, including corresponding HRRP sequences and ISAR image data. The composition and physical meaning of the two types of data are explained below.

[0065] HRRP data:

[0066] In this embodiment, the HRRP sequence is represented as a one-dimensional time series signal, with each sample being a vector of length 128 that represents the echo intensity distribution across range cells. As shown in Figure 3, from a physical perspective, each data point in the HRRP sequence corresponds to the electromagnetic scattering intensity of the target within a specific range cell. The amplitude reflects the relative strength of the scattering center at that range location, and the peak positions usually correspond to the target's main scattering structures.

[0067] ISAR data:

[0068] In this embodiment, the ISAR data used is a normalized grayscale image matrix with a size of 128×128 pixels, as shown in Figure 4. In ISAR images, the horizontal axis corresponds to the range dimension, reflecting the structural distribution of the target along the radar line of sight; the vertical axis corresponds to the Doppler dimension, reflecting the radial velocity differences of different parts of the target due to rotation or micro-motion. The spatial distribution of bright spots in the image intuitively characterizes the target's main scattering regions and their spatial geometric features.

[0069] To verify the effectiveness and necessity of this application compared to single-feature-domain recognition methods, an ablation comparison experiment was designed in this embodiment. The ablation experiment used a recognition network based solely on HRRP sequences, a recognition network based solely on ISAR images, and a recognition network using the multi-domain feature fusion architecture of this application. The target recognition performance was compared and analyzed under the same test set conditions, and the experimental results are shown in Table 1.

[0070] Table 1. Average recognition accuracy of unimodal and multimodal recognition

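[0071]
Recognition method               Average recognition accuracy
HRRP sequence only               80.02%
ISAR image only                  82.06%
Multi-domain feature fusion      87.18%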

[0072] Table 1 shows the average recognition accuracy of the different recognition methods on the test set. When only HRRP sequence features are used for recognition, the average recognition accuracy is 80.02%, which is limited by the locality of range-dimension information. When only ISAR image features are used, the recognition performance improves, but the average recognition accuracy is still only 82.06%.

[0073] In contrast, the multi-domain feature fusion-based recognition method proposed in this application can jointly model HRRP sequence features and ISAR image features at the feature level, achieving complementary utilization of information from different feature domains and improving the average recognition accuracy on the test set to 87.18%. This result demonstrates that deep fusion of multi-domain features can effectively alleviate the problem of insufficient information from a single feature domain, thereby improving the overall performance of radar target fine-grained recognition.

[0074] To further analyze the effect of multi-domain feature fusion on specific target categories, this embodiment compares and analyzes the confusion matrices under different recognition methods, and the results are shown in Figures 5 and 6.

[0075] As shown in Figure 5, under single-modal recognition conditions, the recognition performance for different target models varies significantly in the HRRP or ISAR feature domain. For example, for the "BY787" target, the recognition accuracy is low when based solely on HRRP features, and the target is easily confused with targets of similar structure; when based solely on ISAR features, its recognition performance also fluctuates to some extent due to the influence of imaging quality and attitude changes.

[0076] As shown in Figure 6, the recognition results for different target models are significantly improved after adopting the multi-domain feature fusion method of this application. Drawing on the complementarity of HRRP and ISAR features, the fused features better express the subtle structural differences between targets and effectively distinguish some targets that are easily confused within a single feature domain, thereby reducing the overall misclassification rate and improving recognition stability.

[0077] The experimental results above show that the multi-domain feature fusion method proposed in this application is not a simple feature stitching, but rather establishes the correlation between different feature domains at the feature level to achieve the coordinated utilization of multi-source information of radar targets, thereby maintaining good fine recognition performance under different target types and imaging conditions.

[0078] To further analyze the distribution characteristics of the feature space learned by the network under different feature modeling methods, this embodiment uses a visualization method to map the high-dimensional features output by the network to a two-dimensional space for display. By comparing the feature distributions under single-feature-domain and multi-domain feature fusion conditions, the structural improvement that the method of this invention brings to feature discriminability is analyzed.

[0079] As shown in Figure 7(a), when modeling is based solely on HRRP features, there is significant overlap in the feature distributions of different target categories. Specifically, the feature clusters of targets A320 and C919 overlap substantially in the two-dimensional projection space, resulting in unclear category boundaries. This phenomenon indirectly reflects that when targets are similar in fuselage size and scattering structure, relying solely on range-dimension features is insufficient to form a stable distinguishing structure, thus limiting the performance of refined recognition.

[0080] As shown in Figure 7(b), when modeling is based solely on ISAR images, the overall separation of features across categories is improved compared to HRRP, but the problem of dispersed intra-class distribution still exists. Taking the A330 category as an example, its feature samples are split into multiple sub-clusters in the projection space, indicating that ISAR images are quite sensitive to target attitude and imaging conditions, and the feature representation of the same target varies significantly under different observation conditions. In addition, the feature distributions of categories such as BY787 also exhibit a certain degree of looseness, indicating that the stability of single two-dimensional spatial features under attitude changes is still limited.

[0081] In comparison, as shown in Figure 8, after adopting the multi-domain feature fusion method proposed in this application, the distribution of each target category in the feature space exhibits a clearer and more compact structure. Feature clusters that were split or overlapping in a single feature domain are effectively aggregated, and the boundaries between different categories are more clearly defined. For example, the scattered A330 sub-clusters in the ISAR feature space form a relatively concentrated feature distribution after fusion; the A320 and C919 categories, which overlap severely in the HRRP feature space, are clearly distinguishable in the fused feature space.

[0082] The visualization results above demonstrate that the multi-domain feature fusion strategy proposed in this application can effectively integrate the complementary advantages of range-dimension structural information and spatial geometric information at the feature level, enabling the features learned by the network to maintain intra-class compactness while enhancing inter-class discriminability, thereby providing a more stable and reliable feature representation for the refined identification of radar targets.

[0083] 4. Experimental Conclusions

[0084] Based on the above ablation comparison experiments, confusion matrix analysis, and feature visualization results, we can conclude that the identification method proposed in this application can achieve stable and effective target differentiation in the constructed experimental scenario, verifying the feasibility and rationality of this application.

[0085] Experimental results show that by jointly modeling HRRP sequence features and ISAR image features, the multi-domain feature fusion method can make full use of the complementary information between different feature domains, enhance the expressive ability of target structural differences at the feature level, and thus outperform the recognition method that relies on only a single feature domain in terms of overall recognition performance.

[0086] Furthermore, when faced with target categories of similar size and structure, the multi-domain fusion model can reduce the tendency to misclassify that arises under a single feature domain by comprehensively utilizing the discriminative information in different feature domains. When imaging conditions are limited or the feature quality in one domain is reduced, the model can still rely on the information of the other feature domain to maintain the stability of the recognition results, demonstrating good robustness and adaptability.

[0087] The visualization analysis of the feature manifold distribution further illustrates that the feature space learned in this application possesses a relatively clear class structure and good intra-class compactness, providing a stable feature representation foundation for the refined identification of radar targets. The experimental verification above demonstrates that the technical solution proposed in this application can provide a feasible and effective end-to-end implementation method for automatic radar target identification under complex application conditions.

[0088] The above description is merely an embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of the present invention should be included within the scope of the claims of the present invention.

Claims

1. An end-to-end radar target fine-grained identification method based on echo discriminative learning, characterized in that: The end-to-end radar target fine-grained identification method specifically includes the following steps: Step 1: Obtain the radar echo data to be identified, generate HRRP sequences and ISAR images respectively, pair a set of HRRP sequences with an ISAR image to form paired sample data, perform amplitude normalization on the generated HRRP sequences, map the data to the [0,1] interval, standardize the size of the generated ISAR images, divide the processed paired sample data into training set, validation set and test set, and load them in batches; Step 2: Simultaneously input the HRRP sequences and ISAR images from the training set into a recognition network that includes a parallel HRRP feature extraction network, an ISAR feature extraction network, and a multi-domain feature fusion and classification module. The signal domain features and image domain features are obtained through the HRRP feature extraction network and the ISAR feature extraction network, respectively, and then the target category is obtained through the multi-domain feature fusion and classification module. Step 3: Calculate the difference between the target category obtained in Step 2 and the true label using the cross-entropy loss function to obtain the total loss value. Calculate the gradient of the total loss value with respect to the parameters of each layer of the recognition network based on the backpropagation algorithm, and use the Adam optimizer to iteratively update the parameters of the recognition network until the total loss value converges or reaches the preset number of training rounds. 
Step 4: During training, periodically use the validation set to evaluate the recognition accuracy of the current recognition network, and save the recognition network weight parameters corresponding to the training round with the highest recognition accuracy on the validation set as target parameters for subsequent inference. When the recognition accuracy of the validation set is the same for multiple training rounds, select the recognition network weight parameters corresponding to the training round that first reaches the highest recognition accuracy as the target parameters. Step 5: After training, use the test set to evaluate the performance of the saved target parameters.
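The parameter selection rule of Step 4 can be sketched as follows. This is only an illustration of the tie-breaking logic: the accuracy values are hypothetical, and the function name `select_best_round` is not part of this application.

```python
def select_best_round(val_accuracies):
    """Return (round_index, accuracy) of the first round reaching the highest accuracy."""
    best_round, best_acc = None, float("-inf")
    for round_idx, acc in enumerate(val_accuracies, start=1):
        # A strict '>' keeps the earliest round when several rounds tie.
        if acc > best_acc:
            best_round, best_acc = round_idx, acc
    return best_round, best_acc


history = [0.71, 0.80, 0.86, 0.86, 0.84]  # hypothetical validation accuracies
best_round, best_acc = select_best_round(history)
print(best_round, best_acc)  # 3 0.86
```

In a training loop, the recognition network weights would be saved whenever the strict comparison succeeds, so the saved target parameters always correspond to the round that first reached the highest validation accuracy.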

2. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 1, characterized in that: In step 2, the input to the HRRP feature extraction network is the HRRP signal. The HRRP feature extraction network includes a three-level one-dimensional convolutional feature extraction structure, a self-attention enhancement module, and a feature mapping layer. The three-level one-dimensional convolutional feature extraction structure is used to extract features from the input HRRP sequence to obtain convolutional features. The self-attention enhancement module is used to adaptively weight the convolutional features in the sequence dimension to obtain attention-enhanced features. The feature mapping layer is used to map the attention-enhanced features into a one-dimensional HRRP feature vector output.

3. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 2, characterized in that: The three-level one-dimensional convolutional feature extraction structure consists of three sets of convolutional processing units cascaded along the data flow direction. Each set of convolutional processing units includes a one-dimensional convolutional layer, a ReLU activation layer, and a one-dimensional max pooling layer. In the first set, the one-dimensional convolutional layer has 1 input channel and 32 output channels, with a kernel size of 3, a stride of 1, and padding of 1. In the second set, the one-dimensional convolutional layer has 32 input channels and 32 output channels, with a kernel size of 3, a stride of 1, and padding of 1. In the third set, the one-dimensional convolutional layer has 32 input channels and 64 output channels, with a kernel size of 3, a stride of 1, and padding of 1. The max pooling layers in all three sets use a kernel size of 2 and a stride of 2. The self-attention enhancement module receives the convolutional features output by the three-level one-dimensional convolutional feature extraction structure and adaptively weights them. The self-attention enhancement module includes a query mapping layer, a key mapping layer, and a value mapping layer. The query mapping layer and the key mapping layer both map the input convolutional features from 64 dimensions to 32 dimensions, while the value mapping layer keeps the input convolutional features at 64 dimensions. Attention weights are calculated from the query and key, and attention-enhanced features are obtained by applying the attention weights to the values. The feature mapping layer flattens the attention-enhanced features and maps them to a 256-dimensional vector through the fully connected classification layer of the multi-domain feature fusion and classification module; this vector is output as the one-dimensional HRRP feature vector.

4. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 3, characterized in that: In step 2, the input to the ISAR feature extraction network is an ISAR image. The ISAR feature extraction network consists of four cascaded two-dimensional deformable convolutional feature extraction units. Each two-dimensional deformable convolutional feature extraction unit includes an offset generation convolutional layer and a two-dimensional deformable convolutional layer. The offset generation convolutional layer calculates the sampling offset of the deformable convolution based on the input ISAR image. The sampling offset is used as a control parameter input to the two-dimensional deformable convolutional layer in the same unit. The two-dimensional deformable convolutional layer uses a 3×3 convolutional kernel to perform convolution operations on the input ISAR image to extract spatial features. Each 2D deformable convolutional feature extraction unit is followed by a ReLU activation layer and a max pooling layer in sequence after the 2D deformable convolutional layer. The max pooling layer is used to downsample the feature map output by the 2D deformable convolutional feature extraction unit. The number of output channels of the four 2D deformable convolutional feature extraction units are set to 16, 32, 64 and 64 respectively, and the four 2D deformable convolutional feature extraction units are connected in series in the order of data flow. The input of the next unit is the output of the max pooling layer of the previous unit. After the feature extraction of the fourth 2D deformable convolutional feature extraction unit is completed, the 2D feature map output by the max pooling layer is flattened and input into the fully connected classification layer of the multi-domain feature fusion and classification module. 
The flattened feature is mapped to a 256-dimensional vector and output as a one-dimensional ISAR feature vector.

5. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 4, characterized in that: In step 2, the multi-domain feature fusion and classification module includes a feature alignment layer, a Transformer encoder, an adaptive gating fusion unit, and a fully connected classification layer. The one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector are used as inputs to the multi-domain feature fusion and classification module. The feature alignment layer maps the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector to the same feature dimension, forming a dual-branch feature representation with the same feature dimension. The Transformer encoder has one layer, four attention heads, and a feature dimension of 256. It uses a multi-head self-attention mechanism to achieve feature interaction between the two branches of the dual-branch feature representation, and outputs the HRRP sequence features and ISAR sequence features after joint modeling. The adaptive gating fusion unit performs weight allocation and weighted fusion on the HRRP sequence features and ISAR sequence features. The adaptive gating fusion unit includes linear mapping layers that act on the HRRP sequence features and ISAR sequence features respectively, and a weight generation network. The weight generation network outputs two global importance scores, which are normalized by the Softmax function. The HRRP sequence features and ISAR sequence features are then weighted by the normalized scores and summed to obtain the final fused feature vector. The fully connected classification layer performs classification mapping on the final fused feature vector and outputs the target category of the radar target through the Softmax function.

6. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 5, characterized in that: The multi-domain feature fusion and classification module specifically includes the following steps: Step 2.1: Map the one-dimensional HRRP feature vector output by the HRRP feature extraction network to a preset dimension through a fully connected classification layer, and map the one-dimensional ISAR feature vector output by the ISAR feature extraction network to the same preset dimension through a feature alignment layer, thereby obtaining a dual-branch feature representation of the same feature dimension to achieve dimensional alignment between the one-dimensional HRRP feature vector and the one-dimensional ISAR feature vector. Step 2.2: Construct a Transformer encoder. Input the feature sequence formed by the dual-branch feature representation of the same feature dimension obtained in Step 2.1 into the Transformer encoder. Use the multi-head attention mechanism to jointly model the feature sequence to achieve interactive enhancement of the dual-branch feature representation. Output the HRRP sequence features and ISAR sequence features after joint modeling. Step 2.3: Construct an adaptive gating fusion unit to perform weight allocation and weighted fusion on the HRRP sequence features and ISAR sequence features output in Step 2.2. The linear mapping layer is used to project the HRRP sequence features and ISAR sequence features onto a unified gating space to obtain HRRP sequence projection features and ISAR sequence projection features. The weight generation network generates two global importance scores based on the HRRP sequence projection features and ISAR sequence projection features, and normalizes the two global importance scores into complementary fusion weights through the Softmax function. The HRRP sequence features and ISAR sequence features are weighted and summed according to the two fusion weights to obtain the final fused feature vector.

7. The end-to-end radar target fine-grained identification method based on echo discriminative learning according to claim 4, characterized in that: Step 1 specifically includes the following steps: Step 1.1: Obtain the radar echo data to be identified, and generate HRRP sequences and ISAR images respectively. Specifically, X HRRP range images are generated by continuously selecting X range image processing times within the same observation period for the same target, and these X range images form a set of HRRP sequences. At the same time, one ISAR image is generated using the echo data of the same observation period corresponding to the X HRRP range images, forming paired sample data of X HRRP sequences corresponding to one ISAR image. Step 1.2: Assign a unique sample identifier ID to each paired sample data, and establish an index table corresponding to the sample identifier ID and the HRRP sequence and ISAR image to fix the pairing relationship; Step 1.3: Perform amplitude normalization on the generated HRRP sequence, map the data to the [0,1] interval, and standardize the size of the generated ISAR image; Step 1.4: Divide the processed paired data into training set, validation set and test set according to sample identifier ID, and load them in batches during training, validation and testing: select B samples according to sample identifier ID for each batch, and simultaneously read X HRRP sequences and 1 ISAR image corresponding to each sample to form batch input, so as to ensure the consistent correspondence of multimodal data within the batch.
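The data preparation of Steps 1.2 to 1.4 can be sketched as follows. This is a minimal sketch under stated assumptions: per-profile min-max scaling is assumed as the amplitude normalization, the 70/15/15 split ratios and the toy data sizes are hypothetical, and the function names are not part of this application.

```python
import random


def min_max_normalize(profile):
    """Map one HRRP range profile to the [0, 1] interval (assumed min-max scaling)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:  # flat profile: avoid division by zero
        return [0.0 for _ in profile]
    return [(v - lo) / (hi - lo) for v in profile]


def split_ids(sample_ids, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split by sample identifier so HRRP and ISAR data of one pair stay together."""
    ids = sorted(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def iter_batches(ids, index_table, batch_size):
    """Yield batches of (sample_id, X HRRP profiles, 1 ISAR image) looked up by ID."""
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield [(sid, index_table[sid]["hrrp"], index_table[sid]["isar"]) for sid in chunk]


X, B = 4, 2  # X profiles per sample, B samples per batch (toy values)
rng = random.Random(1)
index_table = {}  # sample ID -> paired HRRP sequence and ISAR image
for sid in range(10):
    hrrp_seq = [min_max_normalize([rng.random() * 5 for _ in range(8)]) for _ in range(X)]
    isar_img = [[0.0] * 4 for _ in range(4)]  # placeholder for a size-standardized image
    index_table[sid] = {"hrrp": hrrp_seq, "isar": isar_img}

train_ids, val_ids, test_ids = split_ids(index_table.keys())
first_batch = next(iter_batches(train_ids, index_table, B))
print(len(train_ids), len(val_ids), len(test_ids), len(first_batch))
```

Splitting and batching by sample identifier, rather than by individual profile or image, is what keeps the X HRRP profiles and their ISAR image in the same subset and the same batch.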