A high-precision satellite navigation signal interference identification method based on lightweight deep learning
By constructing a lightweight satellite navigation signal interference identification model and combining multi-scale time-frequency decoupled convolution and ESCA attention mechanism, the problems of high resource consumption and insufficient identification accuracy are solved, and high-precision interference identification is achieved in resource-constrained environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies for identifying interference in satellite navigation signals suffer from redundant model parameters, leading to excessive resource consumption and making it difficult to deploy on resource-constrained embedded terminals. Furthermore, lightweight models lack sufficient accuracy when processing complex interference signals, failing to meet the requirements of highly reliable positioning and timing systems.
A lightweight interference recognition model with inverted residual structure as the backbone is constructed. It combines multi-scale time-frequency decoupled convolution and symmetric efficient spatial channel attention mechanism. The feature extraction capability is enhanced by multi-scale time-frequency decoupled convolution, and the spatial and channel joint recalibration of time-frequency images is performed by ESCA attention mechanism.
While reducing the number of model parameters and computational overhead, it achieves high-precision identification of complex interference signals, making it suitable for resource-constrained embedded satellite navigation receivers and onboard processing platforms.
Figure CN122307598A_ABST
Description
Technical Field
[0001] This invention relates to the field of satellite navigation interference identification technology, and specifically to a lightweight deep learning-based high-precision satellite navigation signal interference identification method.
Background Technology
[0002] As a crucial component of national critical information infrastructure, Global Navigation Satellite Systems (GNSS) are widely used in transportation, national defense, and smart terminal navigation. Their positioning, navigation, and timing services directly impact the safe and reliable operation of critical industries. With the continuous expansion of GNSS application scenarios and the increasing complexity of the electromagnetic environment, electromagnetic interference has become a key bottleneck restricting the stable operation of the system. Various forms of intentional interference, including suppression interference, can lead to abnormal satellite navigation signal reception and degraded positioning accuracy, potentially causing navigation service interruptions and posing significant risks in safety-critical scenarios such as autonomous driving, aerospace, and emergency rescue. Therefore, researching high-precision and easily deployable satellite navigation signal interference identification technologies for end-user devices is of great significance for building a reliable interference monitoring and protection system.
[0003] Early research on satellite navigation signal interference identification often employed a "manual feature + traditional classifier" approach. This typically involved extracting statistical measures, energy distribution, and spectral characteristics from the time or frequency domains, and then using classifiers such as Support Vector Machines (SVM), Random Forests (RF), and decision trees to determine the interference type. While these methods are simple to implement and offer some reliability under specific data distributions or small sample sizes, their performance is highly dependent on feature design and prior assumptions. They struggle to cover the diversity and non-stationary characteristics of real-world interference and are prone to insufficient generalization across different scenarios. Especially under low interference-to-signal ratios or strong background noise, interference discrimination information often manifests as fine-grained time-frequency texture differences or transient features. In such cases, manual features are easily submerged by noise, leading to class boundary degradation and thus very limited interference identification accuracy.
[0004] In recent years, with the rapid development of deep learning, deep convolutional neural networks such as ResNet and GoogLeNet have achieved high recognition accuracy in satellite navigation signal interference identification thanks to their powerful feature extraction capabilities. However, these models generally suffer from severe parameter redundancy: tens of millions of parameters lead to a large storage overhead, making them difficult to deploy on embedded satellite navigation receiver terminals with extremely limited storage and memory. To reduce resource consumption, the academic community has proposed lightweight networks such as MobileNet and ShuffleNet, which significantly compress the model size through strategies such as depthwise separable convolution. However, excessive compression can limit the model's representational ability, especially for satellite navigation interference signals with complex time-frequency morphology or weak power. Existing lightweight networks struggle to capture subtle discriminative interference features, which creates a bottleneck in recognition accuracy and fails to meet the stringent interference perception requirements of high-reliability positioning and timing systems.
Summary of the Invention
[0005] In view of this, this invention proposes a lightweight deep learning-based high-precision satellite navigation signal interference identification method. The method constructs a lightweight interference identification model with an inverted residual structure as its backbone, and combines multi-scale time-frequency decoupled convolution with a symmetric efficient spatial and channel attention (ESCA) mechanism to identify complex interference signals accurately at low resource cost. On the one hand, to balance model lightweighting against the ability to represent the time-frequency morphology of interference signals, this invention introduces a multi-scale time-frequency decoupled convolution design in the inverted residual feature extraction unit. This design decomposes the feature learning of two-dimensional convolution along the time and frequency axes, and characterizes long-distance time-frequency dependence and local texture details simultaneously through branches with different receptive fields. This enhances the extraction of time-frequency morphology such as swept-frequency textures, narrowband line spectra, and pulse stripes without significantly increasing parameters or computational overhead. On the other hand, to address the problems that discriminative interference information is easily submerged and that similar interference signals are easily confused under low interference-to-signal ratios and strong background noise, this invention integrates the ESCA attention mechanism into the backbone network. This mechanism applies spatial weighting to salient interference regions in the time-frequency image and channel weighting to feature channels related to different interference patterns, thereby achieving joint spatial and channel recalibration of the interference time-frequency image.
This invention not only significantly reduces the number of model parameters, thus reducing storage resource consumption, but also achieves high-precision identification of satellite navigation signal interference in complex electromagnetic environments.
[0006] To achieve the above objectives, the technical solution adopted by this invention is: a lightweight deep learning-based high-precision satellite navigation signal interference identification method, comprising the following steps:
[0007] Step 1: Construct a satellite navigation signal dataset containing various types of interference, and divide the dataset into a training set and a test set according to a preset ratio;
[0008] Step 2: Perform DC removal, fixed window slicing, and power normalization preprocessing on the signal samples in the training and test sets in sequence to obtain standardized time-domain signal samples;
[0009] Step 3: Perform a short-time Fourier transform on the standardized time-domain signal samples to convert them to the time-frequency domain and generate a two-dimensional time-frequency feature image;
[0010] Step 4: Construct a lightweight satellite navigation signal interference identification model that uses an improved inverted residual structure as the backbone network and introduces a symmetric efficient spatial and channel attention (ESCA) mechanism;
[0011] Step 5: Input the time-frequency images of the training set into the interference recognition model for training, and update the network parameters through backpropagation until the loss function converges;
[0012] Step 6: Input the time-frequency images of the test set or the dataset to be tested into the trained interference recognition model, and output the interference category recognition result.
[0013] In step 1 of the above technical solution, the satellite navigation signal data containing various interferences includes single-tone interference, multi-tone interference, linear frequency modulation interference, sinusoidal frequency modulation interference, narrowband noise frequency modulation interference, band-limited Gaussian noise interference, impulse interference, and interference-free satellite navigation signals.
[0014] Based on the above technical solution, the DC removal in step 2 is as follows: for each satellite navigation interference signal sample $x[n]$, the DC component is removed by:
[0015] $\mu = \frac{1}{L}\sum_{n=0}^{L-1} x[n]$
[0016] $\tilde{x}[n] = x[n] - \mu$
[0017] where $\mu$ is the mean of the sample sequence, $L$ is the length of the sample sequence, and $\tilde{x}[n]$ is the zero-mean sample after removing the DC component.
[0018] The fixed window slicing extracts signal segments of $N$ sampling points within a fixed time window, with the window start position sliding sequentially at a fixed step size $S$:
[0019] $x_i[n] = \tilde{x}[iS + n], \quad n = 0, 1, \ldots, N-1$
[0020] where $i$ is the slice number, $N$ is the slice length, $S$ is the sliding step size, $n$ is the index of sampling points within the window, $\tilde{x}$ is the zero-mean sample sequence, and $x_i[n]$ is the signal value at the $n$-th sampling point of the $i$-th slice. When the last segment has fewer than $N$ points, zero-padding or truncation can be used so that every segment has the same length.
[0021] The power normalization is as follows: the average power of the $i$-th signal segment $x_i[n]$ of length $N$ is:
[0022] $P_i = \frac{1}{N}\sum_{n=0}^{N-1} \left| x_i[n] \right|^2$
[0023] The segment is then scaled to the target power $P_0$ to obtain the final standardized time-domain signal sample:
[0024] $\hat{x}_i[n] = \sqrt{P_0 / P_i}\; x_i[n]$
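The three preprocessing operations of step 2 can be illustrated with a short NumPy sketch. This is a minimal sketch under assumptions, not the patented implementation: the function name `preprocess`, the default target power of 1.0, and the choice of zero-padding (rather than truncation) for a short final segment are all illustrative choices.

```python
import math
import numpy as np

def preprocess(x, n_slice, step, target_power=1.0):
    """DC removal, fixed-window slicing and power normalization (illustrative)."""
    x = np.asarray(x, dtype=np.complex128)
    x = x - x.mean()                                   # remove the DC component
    # Number of windows needed to cover the sequence (at least one).
    n_slices = max(1, math.ceil((len(x) - n_slice) / step) + 1)
    slices = []
    for i in range(n_slices):
        seg = x[i * step:i * step + n_slice]
        if len(seg) < n_slice:                         # zero-pad a short final segment
            seg = np.pad(seg, (0, n_slice - len(seg)))
        power = np.mean(np.abs(seg) ** 2)              # average power of this segment
        if power > 0:                                  # leave all-zero segments untouched
            seg = seg * np.sqrt(target_power / power)
        slices.append(seg)
    return np.stack(slices)                            # shape (n_slices, n_slice)
```

For example, `preprocess(samples, n_slice=4096, step=4096)` would return non-overlapping segments, each with unit average power; the window length 4096 is a placeholder, since the source does not give a value.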
[0025] Based on the above technical solution, the Short-Time Fourier Transform (STFT) in step 3 decomposes the preprocessed signal $\hat{x}[n]$ into a series of short-time stationary frames by applying a sliding window of fixed length, and calculates the spectrum frame by frame, thereby constructing a time-frequency matrix that reflects how the frequency content evolves over time; the STFT is represented as:
[0026] $X(m,k) = \sum_{n=0}^{N_w-1} \hat{x}[n + mR]\, w[n]\, e^{-j 2\pi k n / N_{\mathrm{FFT}}}$
[0027] where $m$ is the time frame index, $R$ is the frame shift, $w[n]$ is the window function, $n$ is the index of sampling points within the window,
[0028] $k$ is the frequency index, $N_{\mathrm{FFT}}$ is the number of FFT points, and $N_w$ is the window length;
[0029] To obtain a stable two-dimensional feature image representing the interference morphology, the squared magnitude of the STFT result $X(m,k)$ is used to form a time-frequency energy spectrum:
[0030] $S(m,k) = \left| X(m,k) \right|^2$
[0031] The energy spectrum is then min-max normalized to the range [0,1], giving a two-dimensional matrix that can be stored as an image and used as network input:
[0032] $I(m,k) = \frac{S(m,k) - S_{\min}}{S_{\max} - S_{\min}}$
[0033] where $S_{\min}$ and $S_{\max}$ are the minimum and maximum values of the time-frequency matrix of the sample;
[0034] The two-dimensional matrix $I$ is resampled or cropped to a preset size and stored as a time-frequency image in a preset channel format. Finally, the time-frequency image corresponding to each sample is saved in a preset format and placed in one-to-one correspondence with its interference category label, forming the time-frequency image dataset required in steps 4-6.
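The conversion from a standardized time-domain segment to a normalized time-frequency image can be sketched as follows. The Hann window, the window length of 256, the hop of 128, and the small constant added to the denominator are assumptions made for illustration; the source does not specify these values.

```python
import numpy as np

def tf_image(x, win_len=256, hop=128, n_fft=256):
    """STFT energy spectrum, min-max normalized to [0, 1] (illustrative)."""
    x = np.asarray(x, dtype=np.complex128)
    window = np.hanning(win_len)                       # assumed window function
    n_frames = 1 + (len(x) - win_len) // hop
    # One windowed frame per row.
    frames = np.stack([x[m * hop:m * hop + win_len] * window
                       for m in range(n_frames)])
    spec = np.fft.fft(frames, n=n_fft, axis=1)         # shape (n_frames, n_fft)
    energy = np.abs(spec) ** 2                         # time-frequency energy spectrum
    e_min, e_max = energy.min(), energy.max()
    img = (energy - e_min) / (e_max - e_min + 1e-12)   # guard against a flat spectrum
    return img.T                                       # rows: frequency, columns: time
```

The returned matrix has frequency along the rows and time along the columns, matching the convention in which H is the frequency dimension and W is the time dimension. Resampling to the network input size is left out of the sketch.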
[0035] Based on the above technical solution, the lightweight satellite navigation signal interference identification model described in step 4 uses the enhanced inverted residual block (eIRB) as the basic unit of its backbone network, and embeds multi-scale time-frequency decoupled depthwise convolution and the symmetric efficient spatial and channel attention (ESCA) mechanism in each eIRB. The interference recognition model consists of an input layer, an initial feature extraction (Stem) layer, a feature extraction backbone network (Backbone) formed by stacking eIRBs at multiple levels, and a classification output (Task Head) layer. The Backbone consists of 8 stages (S1 to S8) connected in series. The output feature map sizes of the stages are as follows: S1 output 112×112×32; S2 output 112×112×16; S3 output 56×56×24; S4 output 28×28×32; S5 output 14×14×64; S6 output 14×14×96; S7 output 7×7×160; S8 output 7×7×320. The S8 feature map is mapped to 7×7×1280 by a 1×1 convolution and then input into the Task Head layer for classification.
[0036] Based on the above technical solution, the improved inverted residual module replaces the original single-kernel depthwise convolution with a multi-branch, multi-scale time-frequency decoupled convolution structure. This allows subtle texture features along the time and frequency axes of the interference time-frequency image to be extracted at low computational complexity. In the spatial modeling stage of the inverted residual bottleneck block, the input time-frequency features are first expanded in channel dimension and then processed separately by parallel depthwise convolution branches. The branch outputs are fused by element-wise addition, which itself adds no channels or parameters, so the multi-scale time-frequency features complement one another. The depthwise convolution branches are a 1×7 temporal asymmetric branch, a 7×1 frequency asymmetric branch, and a 3×3 local texture branch. The 1×7 and 7×1 branches model long-distance dependence and strip-shaped energy distribution along the time axis and the frequency axis, respectively, while the 3×3 branch captures local details and point-like or block-like energy concentrations. Compared with a traditional single large convolution kernel, the multi-scale time-frequency decoupled convolution reduces the number of kernel parameters and multiply-accumulate operations through directional decoupling and small-kernel combination. At the same time, it strengthens the representation of the morphological differences among narrowband, wideband, swept-frequency, and pulse interference in the time-frequency image, thereby improving the recognition efficiency and robustness of the model in edge deployment scenarios.
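As a rough check of the parameter saving described above, the sketch below counts depthwise-convolution weights for the 1×7, 7×1, and 3×3 branches and compares them with a single 7×7 depthwise kernel. The 7×7 baseline and the channel count of 96 are illustrative assumptions; the source does not state which large kernel it compares against, and bias terms are ignored.

```python
def depthwise_params(channels, kh, kw):
    """Weights in a depthwise convolution: one kh x kw kernel per channel."""
    return channels * kh * kw

def decoupled_params(channels):
    """Total weights of the 1x7, 7x1 and 3x3 depthwise branches."""
    branches = [(1, 7), (7, 1), (3, 3)]
    return sum(depthwise_params(channels, kh, kw) for kh, kw in branches)

channels = 96                                   # assumed expanded channel count
single_large = depthwise_params(channels, 7, 7) # hypothetical 7x7 baseline
decoupled = decoupled_params(channels)
ratio = decoupled / single_large                # 23 / 49 of the baseline weights
```

Per channel, the three branches use 7 + 7 + 9 = 23 weights against 49 for the 7×7 kernel. Multiply-accumulate operations per output position scale the same way, since each depthwise weight is used once per output pixel.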
[0037] Based on the above technical solution, the ESCA mechanism is used to adaptively recalibrate the intermediate features of the backbone network to highlight the key channel responses and key time-frequency region responses related to interference category discrimination. ESCA includes pre- and post-channel attention branches and a time-frequency dual-axis spatial attention branch. The pre- and post-channel attention branches are located before and after the dual-axis spatial attention weighting, respectively, and are used to recalibrate the importance of feature channels twice. Each channel attention branch obtains channel statistical vectors through global average pooling and models the local correlation between channels using one-dimensional convolution. Channel weights are generated by Sigmoid activation to achieve channel-wise weighting. The time-frequency dual-axis spatial attention branch obtains one-dimensional sequence features by performing global average pooling along the time and frequency dimensions, and generates time axis weights and frequency axis weights by one-dimensional convolution. After Sigmoid activation, the feature maps are weighted time-wise and frequency-wise, respectively, thus forming a symmetrical significance enhancement process in the time and frequency dimensions. Since ESCA mainly consists of pooling operations and one-dimensional convolution, it avoids the introduction of large-scale fully connected layers, thus having low parameter and memory overhead. At the same time, its channel and dual-axis weighting can suppress redundant background responses and enhance the discrimination information such as interference energy concentration areas, edge directions and texture structures, thereby improving the model's ability to distinguish interference categories in low interference-to-signal ratio or strong background noise scenarios while maintaining lightweight design.
[0038] Based on the above technical solution, the inverted residual bottleneck structure is used to achieve efficient feature transformation under lightweight constraints. Its basic process is "channel dimensionality increase → time-frequency feature extraction → channel dimensionality reduction". Specifically, a 1×1 pointwise convolution first expands the input feature channels from $C_{in}$ to $tC_{in}$ to enhance feature representation; multi-scale time-frequency decoupled depthwise convolution then models the local time-frequency texture of the expanded features; finally, a 1×1 pointwise convolution compresses the channels to $C_{out}$ to match the input of the next layer.
[0039] Based on the above technical solution, the overall architecture of the lightweight satellite navigation signal interference identification model is as follows:
[0040] Input layer: The time-frequency image generated in step 3 is used as the model input, and the input tensor is represented as $X \in \mathbb{R}^{H \times W \times C_0}$, where H is the number of pixels in the frequency dimension, W is the number of pixels in the time dimension, and $C_0$ is the number of input channels;
[0041] Initial feature extraction layer: Performs initial feature extraction and downsampling on the input by convolution, combined with batch normalization (BN) and an activation function $\delta(\cdot)$, to obtain the basic time-frequency features $F_0$:
[0042] $F_0 = \delta(\mathrm{BN}(\mathrm{Conv}(X)))$
[0043] Feature extraction backbone network: Multiple stages, each stacking several eIRB modules, extract multi-scale time-frequency texture features layer by layer to obtain the final feature output $F_{S8}$;
[0044] Output layer: The final backbone feature $F_{S8}$ is mapped to a high-dimensional semantic channel space by a 1×1 convolution, then passed through a global average pooling (GAP) layer and a fully connected layer to output the probability of each interference class:
[0045] $P = \mathrm{Softmax}(W \cdot \mathrm{GAP}(\mathrm{Conv}_{1\times1}(F_{S8})) + b)$
[0046] Where W and b are the weight matrix and bias vector of the fully connected layer, respectively, and P is the probability distribution vector of the interference class in the final output.
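The output layer can be illustrated with a small NumPy sketch of global average pooling followed by a fully connected layer and softmax. It assumes the 1×1 convolution has already produced a 7×7×1280 feature map; the function names and the random weights in the usage note are placeholders rather than trained values.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def task_head(feat, weight, bias):
    """GAP over the spatial axes, then a fully connected layer and softmax.

    feat:   (H, W, C) feature map after the 1x1 convolution
    weight: (n_classes, C) fully connected weights
    bias:   (n_classes,) fully connected bias
    """
    pooled = feat.mean(axis=(0, 1))            # global average pooling -> (C,)
    return softmax(weight @ pooled + bias)     # class probability vector
```

With a 7×7×1280 input and an 8×1280 weight matrix, `task_head` returns an 8-element probability vector, one entry per interference class.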
[0047] Based on the above technical solution, the architecture and processing flow of the improved inverted residual module (eIRB) are as follows:
[0048] Let the eIRB input be $X_{in} \in \mathbb{R}^{H \times W \times C_{in}}$. First, a 1×1 pointwise convolution (PWConv) expands the number of channels, so that the number of output channels is $t$ times the number of input channels. After BN and an activation function $\delta(\cdot)$, the high-dimensional feature $X_e$ is output:
[0049] $X_e = \delta(\mathrm{BN}(\mathrm{PWConv}_{1\times1}(X_{in})))$
[0050] Second, $X_e$ is fed simultaneously into three parallel depthwise convolution (DWConv) branches to extract features along different dimensions:
[0051] $F_t = \mathrm{DWConv}_{1\times7}(X_e)$
[0052] $F_f = \mathrm{DWConv}_{7\times1}(X_e)$
[0053] $F_l = \mathrm{DWConv}_{3\times3}(X_e)$
[0054] where $F_t$, $F_f$, and $F_l$ are the temporal features, frequency-domain features, and local texture features, respectively. Finally, the outputs of the three branches are summed element-wise to obtain the fused intermediate feature map $F_m$:
[0055] $F_m = F_t + F_f + F_l$
[0056] The ESCA mechanism employs a symmetrical serial structure of "channel-height-width-channel" to recalibrate the fused intermediate feature map $F_m$. The ESCA mechanism consists of four phases:
[0057] (1) Pre-channel attention
[0058] First, Global Average Pooling (GAP) aggregates the spatial information of the feature map $F_m$, and channel weights are then generated by one-dimensional convolution:
[0059] $z_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} F_m(i,j,c)$
[0060] Encoding channel features using one-dimensional convolution:
[0061] $k = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$
[0062] $w^{(1)} = \sigma(\mathrm{Conv1D}_k(z))$
[0063] $F_1 = w^{(1)} \otimes F_m$
[0064] where $z$ is the channel statistics vector and $z_c$ is the global average response of the $c$-th channel; $\mathrm{Conv1D}_k$ is a one-dimensional convolution with kernel size $k$, used to capture local cross-channel interactions; $k$ is determined adaptively by the number of channels $C$, where $\gamma$ is the scaling factor, $b$ is the bias term, and $|\cdot|_{\mathrm{odd}}$ maps the result to the nearest odd number; $\sigma$ is the sigmoid activation function; $w^{(1)}$ is the channel attention weight vector generated in the first stage; $\otimes$ is broadcast multiplication along the channel dimension; $F_1$ is the output feature map after pre-channel attention recalibration; $i$ and $j$ are the spatial indexes of the feature map on the frequency axis and the time axis, respectively; and $c$ is the channel index.
[0065] (2) Spatial attention along the time axis
[0066] Second, the pre-channel attention output $F_1$ is average-pooled along the frequency axis (H-axis), while the spatial position information along the time axis (W-axis) is preserved:
[0067] $u(j,c) = \frac{1}{H}\sum_{i=1}^{H} F_1(i,j,c)$
[0068] Encoding temporal features using one-dimensional convolution:
[0069] $a_W = \sigma(\mathrm{Conv1D}^{W}_{k_W}(u))$
[0070] $F_2(i,j,c) = a_W(j,c)\, F_1(i,j,c)$
[0071] where $u \in \mathbb{R}^{W \times C}$; $a_W$ is the time axis weight vector, used to enhance long-distance dependencies in the time dimension; $\mathrm{Conv1D}^{W}_{k_W}$ is a one-dimensional convolution along the W axis, where $k_W$ is the length of its convolution kernel; and $F_2$ is the feature map after time-axis weighting.
[0072] (3) Frequency axis spatial attention
[0073] Then, the time-axis weighted feature $F_2$ is average-pooled along the time axis to preserve the spatial position information of the frequency axis:
[0074] $v(i,c) = \frac{1}{W}\sum_{j=1}^{W} F_2(i,j,c)$
[0075] Encoding frequency features using one-dimensional convolution:
[0076] $a_H = \sigma(\mathrm{Conv1D}^{H}_{k_H}(v))$
[0077] $F_3(i,j,c) = a_H(i,c)\, F_2(i,j,c)$
[0078] where $v \in \mathbb{R}^{H \times C}$; $a_H$ is the frequency axis weight vector, used to enhance the distribution characteristics of the frequency dimension; $\mathrm{Conv1D}^{H}_{k_H}$ is a one-dimensional convolution along the H-axis, where $k_H$ is the length of its convolution kernel; and $F_3$ is the feature map after frequency-axis weighting.
[0079] (4) Post-channel attention
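[0080] Finally, to restore and strengthen the dominance of channel features and to form a symmetrical structure, channel weighting is performed again on the frequency-axis weighted feature $F_3$:
[0081] $z'_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} F_3(i,j,c)$
[0082] Encoding channel features using one-dimensional convolution:
[0083] $w^{(2)} = \sigma(\mathrm{Conv1D}_k(z'))$
[0084] $F_{\mathrm{ESCA}} = w^{(2)} \otimes F_3$
[0085] The final output $F_{\mathrm{ESCA}}$ is the time-frequency feature map after multidimensional recalibration, and $w^{(2)}$ is the channel attention weight vector generated in the second channel stage. The channel dimension of the ESCA output $F_{\mathrm{ESCA}}$ is then reduced by a 1×1 pointwise convolution followed by BN with linear activation, giving $Y$:
[0086] $Y = \mathrm{BN}(\mathrm{PWConv}_{1\times1}(F_{\mathrm{ESCA}}))$
[0087] where $Y$ denotes the final output feature map following the ESCA module.
[0088] A residual connection is used if and only if the stride $s = 1$ and $C_{in} = C_{out}$; otherwise $Y$ is output directly:
[0089] $X_{out} = X_{in} + Y$ if $s = 1$ and $C_{in} = C_{out}$; otherwise $X_{out} = Y$.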
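The four ESCA phases described above (channel, time axis, frequency axis, channel) can be sketched in NumPy as follows. This is a sketch under several assumptions: the kernel-size rule takes the ECA-style form with γ = 2 and b = 1, which the source does not state numerically; the same 1-D kernel is shared across channels in the axis branches; and the kernel weights, which would be learned in practice, are passed in as plain arrays. All function names are illustrative.

```python
import math
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def adaptive_kernel(channels, gamma=2.0, b=1.0):
    """Odd 1-D kernel size derived from the channel count (assumed ECA-style rule)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def conv1d_same(seq, kernel):
    """Zero-padded 1-D correlation along the last axis of seq (odd kernel length)."""
    k = len(kernel)
    pad = k // 2
    padded = np.pad(seq, [(0, 0)] * (seq.ndim - 1) + [(pad, pad)])
    cols = [(padded[..., i:i + k] * kernel).sum(axis=-1)
            for i in range(seq.shape[-1])]
    return np.stack(cols, axis=-1)

def channel_attention(feat, kernel):
    """feat: (H, W, C). GAP, 1-D conv across channels, sigmoid, channel-wise scaling."""
    z = feat.mean(axis=(0, 1))                  # channel statistics, shape (C,)
    w = sigmoid(conv1d_same(z, kernel))         # channel weights, shape (C,)
    return feat * w

def axis_attention(feat, kernel, pooled_axis):
    """pooled_axis=0 pools over frequency (H) and weights the time axis;
    pooled_axis=1 pools over time (W) and weights the frequency axis."""
    stats = feat.mean(axis=pooled_axis)                  # (W, C) or (H, C)
    weights = sigmoid(conv1d_same(stats.T, kernel)).T    # same shape as stats
    return feat * np.expand_dims(weights, axis=pooled_axis)

def esca(feat, k_ch1, k_time, k_freq, k_ch2):
    """Channel -> time axis -> frequency axis -> channel recalibration."""
    out = channel_attention(feat, k_ch1)
    out = axis_attention(out, k_time, pooled_axis=0)
    out = axis_attention(out, k_freq, pooled_axis=1)
    return channel_attention(out, k_ch2)
```

Because every stage uses only pooling, a short 1-D kernel, and a sigmoid, the module adds only a few weights per stage, which is consistent with the low parameter overhead the description attributes to ESCA.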
[0090] Based on the above technical solution, the specific configuration for the training process of the satellite navigation signal interference identification model in step 5 is as follows:
[0091] Training batches: Training uses a mini-batch iterative approach;
[0092] Training cycle: Let the number of training rounds be E;
[0093] Loss function: The multi-class cross-entropy loss function is used to supervise and constrain the output of 8 classes. It is used to measure the difference between the predicted probability distribution of the interference class in the model output and the true label one-hot vector, and to drive parameter updates by minimizing the loss function.
[0094] Optimization algorithm: The Adam adaptive optimizer is used to update the parameters of the interference identification model.
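The multi-class cross-entropy loss named in the training configuration can be written out as a short NumPy function. This is only a sketch of the loss computation on one mini-batch; the Adam update and backpropagation are not shown, and the function name is illustrative.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy for a mini-batch.

    logits: (batch, n_classes) raw network outputs before softmax
    labels: (batch,) integer class indices in [0, n_classes)
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    shifted = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With 8 classes and uniform logits, the loss equals ln 8, the value expected from an untrained classifier that assigns equal probability to every interference class.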
[0095] Based on the above technical solution, step 6 further evaluates the accuracy of the satellite navigation signal interference model identification, selecting accuracy, precision, recall, and F1 score as evaluation metrics. Accuracy measures the overall correctness of the model's classification of all samples; precision and recall assess the model's ability to identify specific interference categories precisely and completely; and the F1 score is the harmonic mean of precision and recall, used to comprehensively evaluate model performance.
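The four evaluation metrics can be computed from a confusion matrix, as in the sketch below. Precision, recall, and F1 are returned per class; how the source averages them across classes is not stated, so no averaging is applied here. The function name is illustrative.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes=8):
    """Accuracy plus per-class precision, recall and F1 from integer labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # rows: true, columns: predicted
    tp = np.diag(cm).astype(float)
    pred_tot = cm.sum(axis=0)                           # samples predicted as each class
    true_tot = cm.sum(axis=1)                           # samples belonging to each class
    precision = np.divide(tp, pred_tot, out=np.zeros_like(tp), where=pred_tot > 0)
    recall = np.divide(tp, true_tot, out=np.zeros_like(tp), where=true_tot > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1
```

Classes that never appear in the predictions or in the labels get a precision or recall of 0 rather than raising a division error.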
[0096] Compared with the prior art, the present invention has the following beneficial effects:
[0097] (1) This invention integrates a multi-scale time-frequency decoupled convolution structure into the inverted residual unit. It decomposes the two-dimensional large-kernel convolution into parallel one-dimensional large-kernel branches along the time axis and the frequency axis and fuses their outputs, obtaining a larger effective receptive field while significantly reducing the number of convolution parameters. This design not only reduces computational overhead but also better matches the anisotropic distribution of satellite navigation interference signals in the time-frequency domain, improving the discriminability and separability of time-frequency features and thereby reducing the probability of confusing similar interference types.
[0098] (2) Compared with traditional channel attention mechanisms, this invention integrates a symmetric efficient spatial and channel attention mechanism that adopts a symmetrical serial architecture of "channel → dual-axis space → channel" and uses global pooling and one-dimensional convolution for dependency modeling, avoiding the information bottleneck and additional parameter overhead caused by fully connected dimensionality reduction. This mechanism achieves multi-dimensional recalibration of spatial position and channel weight in time-frequency images at low additional parameter cost. It can adaptively suppress background noise and enhance the feature response of weak interference signals, thereby improving the recognition accuracy of the model in complex electromagnetic environments and at low interference-to-signal ratios.
[0099] (3) The satellite navigation signal interference identification model constructed in this invention achieves interference identification accuracy comparable to that of deep networks while maintaining a low parameter count and storage footprint. It overcomes both the difficulty of storing traditional high-precision deep network models on hardware with limited capacity and the insufficient accuracy of existing lightweight models. It is particularly suitable for resource-constrained embedded satellite navigation signal receivers, handheld terminals, and on-board processing platforms, and has broad application prospects.
Attached Figure Description
[0100] Figure 1 is a flowchart of the lightweight deep learning-based high-precision satellite navigation signal interference identification method provided in an embodiment of the present invention;
[0101] Figure 2 is a schematic diagram of the overall network architecture of the lightweight satellite navigation signal interference identification model constructed in an embodiment of the present invention;
[0102] Figure 3 is a schematic diagram of the structure of the multi-scale time-frequency decoupled convolution unit in an embodiment of the present invention;
[0103] Figure 4 is a schematic diagram of the internal processing flow of the ESCA mechanism in an embodiment of the present invention;
[0104] Figure 5 is a schematic diagram comparing the structure of the eIRB unit under different stride settings in an embodiment of the present invention.
Detailed Implementation
[0105] The specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings and examples. The following examples are for illustrative purposes only and are not intended to limit the scope of the invention.
[0106] In this embodiment, the lightweight deep learning-based high-precision satellite navigation signal interference identification method is implemented as shown in Figure 1 and includes the following steps:
[0107] Step 1: Construct a satellite navigation signal dataset containing various types of interference, and divide the dataset into a training set and a test set according to a preset ratio.
[0108] The interference signals are categorized into single-tone interference, multi-tone interference, linear frequency modulation (FM) interference, sinusoidal FM interference, narrowband noise FM interference, band-limited Gaussian noise interference, impulse interference, and interference-free satellite navigation signals. The model for the interference signals is as follows:
[0109] (1) Single-tone interference
[0110] $J(t) = A\, e^{j(2\pi f_J t + \varphi)}$
[0111] where $A$ is the amplitude, $f_J$ is the center frequency of the interference signal, $\varphi$ is the initial phase of the interference signal, $t$ is time, $j$ is the imaginary unit, and $J(t)$ is the interference signal.
[0112] (2) Multi-tone interference
[0113] $J(t) = \sum_{m=1}^{M} A_m\, e^{j(2\pi f_m t + \varphi_m)}$
[0114] where $M$ is the number of single-tone components, $f_m$ is the frequency of the $m$-th tone component, $A_m$ is the amplitude of the $m$-th tone component, and $\varphi_m$ is the initial phase of the $m$-th tone component.
[0115] (3) Linear frequency modulation interference
[0116] $J(t) = A\, e^{j(2\pi f_0 t + \pi k t^2 + \varphi)}$
[0117] The instantaneous frequency is $f(t) = f_0 + kt$, where $f_0$ is the starting frequency and $k = B/T$ is the frequency modulation slope, $B$ is the sweep bandwidth, and $T$ is the frequency sweep period.
[0118] (4) Sinusoidal frequency modulation interference
[0119] $J(t) = A\, e^{j(2\pi f_J t + \beta \sin(2\pi f_{\mathrm{mod}} t) + \varphi)}$
[0120] where $\beta$ is the modulation index and $f_{\mathrm{mod}}$ is the modulation frequency; the instantaneous frequency changes periodically with the sine function.
[0121] (5) Narrowband noise FM interference
[0122] $J(t) = A\, e^{j\left(2\pi f_J t + 2\pi k_{\mathrm{FM}} \int_0^t \xi(\tau)\, d\tau + \varphi\right)}$
[0123] where $\xi(t)$ is the noise modulating signal, $\tau$ is the integration variable, and $k_{\mathrm{FM}}$ is the frequency modulation sensitivity coefficient. The instantaneous frequency of the interference is modulated by random noise: the amplitude is constant, but the spectrum is highly random within a narrow band.
[0124] (6) Band-limited Gaussian noise interference
[0125] $J(t) = n(t) * h(t)$
[0126] where $n(t)$ is zero-mean Gaussian white noise, $h(t)$ is the impulse response of a bandpass filter used to limit the bandwidth of the noise, and $*$ is the convolution operation.
[0127] (7) Pulse interference
[0128] $J(t) = A\, p(t)\, e^{j(2\pi f_J t + \varphi)}$
[0129] $p(t) = \sum_{n} \mathrm{rect}\!\left(\frac{t - nT_r}{\tau_p}\right)$
[0130] where $\mathrm{rect}(\cdot)$ is the rectangle function, $T_r$ is the pulse repetition interval, $\tau_p$ is the pulse width, and $n$ is the integer index of the pulse sequence.
[0131] (8) No interference
[0132] $s(t) = A_s\, D(t)\, C(t)\, e^{j(2\pi (f_c + f_d) t + \varphi_0)}$
[0133] In this case the interference power is 0, that is, $J(t) = 0$, and $s(t)$ denotes the interference-free satellite navigation signal, where $A_s$ is the amplitude of the navigation signal, $D(t)$ is the navigation message data, $C(t)$ is the pseudo-random spreading code, $f_c$ is the carrier frequency, $f_d$ is the Doppler frequency shift, and $\varphi_0$ is the initial phase.
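Three of the interference models listed above (single-tone, linear frequency modulation, and pulse) can be generated at complex baseband with a few lines of NumPy. This is a minimal sketch: the 40 MHz sampling rate is taken from the embodiment's simulation settings, while the function names, the sweep restart at every period, and the parameter values in the usage note are illustrative assumptions. The embodiment itself generates its signals in MATLAB.

```python
import numpy as np

FS = 40e6  # complex sampling rate used in the embodiment, in Hz

def single_tone(n, f_j, amp=1.0, phase=0.0, fs=FS):
    """Single-tone interference: constant amplitude, fixed frequency offset f_j."""
    t = np.arange(n) / fs
    return amp * np.exp(1j * (2 * np.pi * f_j * t + phase))

def linear_fm(n, f0, bandwidth, period, amp=1.0, phase=0.0, fs=FS):
    """Linear FM interference sweeping from f0 over `bandwidth` every `period` seconds."""
    t = np.arange(n) / fs
    tm = np.mod(t, period)                  # restart the sweep each period (assumed)
    k = bandwidth / period                  # frequency modulation slope
    return amp * np.exp(1j * (2 * np.pi * f0 * tm + np.pi * k * tm ** 2 + phase))

def pulsed(n, f_j, pri, width, amp=1.0, phase=0.0, fs=FS):
    """Pulse interference: a tone gated by a rectangular pulse train."""
    t = np.arange(n) / fs
    gate = (np.mod(t, pri) < width).astype(float)   # 1 inside each pulse, 0 outside
    return gate * single_tone(n, f_j, amp, phase, fs)
```

For instance, `pulsed(4000, 1e6, pri=50e-6, width=10e-6)` yields a pulse train with a duty cycle of about 20 percent. Adding any of these to a simulated navigation signal and noise would give one raw sample for the dataset.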
[0134] To verify the effectiveness of the method proposed in this invention, this embodiment constructs a satellite navigation interference signal dataset. Because interference samples with accurate labels and comprehensive type coverage are difficult to obtain in real-world scenarios, this embodiment uses software simulation to generate the training and testing data. Preferably, the satellite navigation signal and the various interference signals are generated on the MATLAB simulation platform. The GPS L1 signal with a center frequency of 1575.42 MHz is selected as the satellite navigation signal. To simulate real signal characteristics, an 8th-order Butterworth front-end filter model with a bandwidth of 20.46 MHz is introduced, and the complex signal sampling frequency is set to 40 MHz. The carrier-to-noise-density ratio (C/N0) of the satellite navigation signal is set to vary within the range of 25 to 50 dB-Hz, which covers typical scenarios from weak to strong signal reception.
[0135] In this embodiment, a fixed satellite navigation signal power of −160 dBW is used as a reference, and the interference power is set within the range of −142 dBW to −107 dBW, thereby forming various JSR samples covering 18 dB to 53 dB. Simultaneously, the interference center frequency is randomly offset within ±9 MHz relative to the receiver's equivalent center frequency, and the initial interference phase is randomly generated to improve sample diversity and cover different frequency offsets and phase conditions. Finally, to approximate the quantization effect of real hardware, the simulated time-domain sampling data is uniformly quantized from double-precision floating-point form and reduced to 8-bit fixed-point representation to simulate the amplitude truncation and quantization noise caused by the limited quantization precision of the receiver front-end, thereby improving the consistency between the constructed dataset and the actual receiver output data. Specific interference signal parameters are shown in Table 1 below:
[0136] Table 1 Interference signal parameters
[0137]
[0138] The dataset contains eight categories, one of which consists of pure, interference-free satellite navigation signals, while the other seven categories contain various types of interference signals. To ensure dataset balance, the number of samples in each category remains strictly consistent at 32,760, for a total of 262,080 samples.
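The power settings and the 8-bit quantization described above can be checked with a short sketch. The JSR is the interference power minus the fixed signal power in dB, and the dataset size is the per-class count times the number of classes. The quantizer below is only an assumed form (signed 8-bit codes with the full scale set to the peak magnitude of a real-valued sequence); the embodiment does not specify the scaling rule, and the function names are hypothetical.

```python
import numpy as np

SIGNAL_POWER_DBW = -160.0   # fixed navigation signal power used as the reference

def jsr_db(interference_power_dbw, signal_power_dbw=SIGNAL_POWER_DBW):
    """Jamming-to-signal ratio in dB for powers given in dBW."""
    return interference_power_dbw - signal_power_dbw

def quantize_8bit(x):
    """Uniform quantization to signed 8-bit codes (assumption: full scale = peak magnitude)."""
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x))
    if peak == 0:
        return np.zeros(x.shape, dtype=np.int8)
    codes = np.round(x / peak * 127.0)
    return np.clip(codes, -128, 127).astype(np.int8)

TOTAL_SAMPLES = 8 * 32760   # eight balanced classes
```

With these definitions, `jsr_db(-142.0)` and `jsr_db(-107.0)` give the two ends of the 18 dB to 53 dB range, and `TOTAL_SAMPLES` equals 262,080.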
[0139] Step 2: Perform DC removal, fixed window slicing, and power normalization preprocessing on the signal samples in the training and test sets in sequence to obtain standardized time-domain signal samples;
[0140] In this embodiment, the dataset is preprocessed, specifically including the following steps:
[0141] First, DC component removal is performed on each satellite navigation interference signal sample $x[n]$: the mean of the sample is calculated and subtracted from each sampling point to obtain a zero-mean sequence, thereby eliminating the numerical bias introduced during simulation and quantization.
[0142] $\bar{x} = \frac{1}{L} \sum_{n=0}^{L-1} x[n]$
[0143] $\tilde{x}[n] = x[n] - \bar{x}$
[0144] where $\bar{x}$ is the mean of the sample sequence, $L$ is the length of the sample sequence, and $\tilde{x}[n]$ is the zero-mean sample after removal of the DC component.
[0145] Secondly, sample slicing is performed: signal segments are extracted with a fixed time window whose starting position slides sequentially with a fixed step size, so that all samples match the model's input dimension. A slice can be represented as
[0146] $x_m[n] = \tilde{x}[mS + n], \quad n = 0, 1, \ldots, N-1$
[0147] where $\tilde{x}[n]$ is the zero-mean sample obtained after DC removal, $m$ is the slice number, $N$ is the slice length, $S$ is the sliding step size, $n$ is the index of the sampling point within the window, and $x_m[n]$ is the signal value at the $n$-th sampling point of the $m$-th slice. When the last segment has fewer than $N$ points, zero-padding or truncation can be used to ensure that every segment has the same length.
[0148] Finally, to eliminate the impact of differences in average power between samples on the subsequent time-frequency features and model training, power normalization is applied to each slice $x_m[n]$ of length $N$. The average power of the $m$-th segment is:
[0149] $P_m = \frac{1}{N} \sum_{n=0}^{N-1} |x_m[n]|^2$
[0150] The signal is then scaled to the target power $P_0$ to obtain the final standardized time-domain signal sample:
[0151] $y_m[n] = \sqrt{\frac{P_0}{P_m}}\, x_m[n]$
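The three preprocessing operations of Step 2 (DC removal, fixed-window slicing, and power normalization) can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names are hypothetical, the last slice is zero-padded (the text also allows truncation), and the small `eps` guarding against division by zero is an added assumption.

```python
import numpy as np

def remove_dc(x):
    """Subtract the sample mean so that the sequence has zero mean."""
    x = np.asarray(x)
    return x - x.mean()

def slice_fixed_window(x, N, S):
    """Cut x into slices of length N with step S; the last slice is zero-padded if short."""
    x = np.asarray(x)
    slices = []
    for start in range(0, len(x), S):
        seg = x[start:start + N]
        if len(seg) < N:
            seg = np.concatenate([seg, np.zeros(N - len(seg), dtype=x.dtype)])
        slices.append(seg)
        if start + N >= len(x):
            break                      # the window has reached the end of the sequence
    return np.stack(slices)

def normalize_power(seg, target_power=1.0, eps=1e-12):
    """Scale one slice so that its average power equals target_power."""
    power = np.mean(np.abs(seg) ** 2)
    return seg * np.sqrt(target_power / (power + eps))
```

A typical call chain would be `normalize_power(s)` for each row `s` of `slice_fixed_window(remove_dc(x), N, S)`.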
[0152] Step 3: Perform a short-time Fourier transform on the standardized time-domain signal samples to convert them to the time-frequency domain and generate a two-dimensional time-frequency feature image;
[0153] In this embodiment, the Short-Time Fourier Transform (STFT) applies a sliding window of fixed length to decompose the preprocessed signal into a series of short-time stationary frames, and calculates the spectrum frame by frame to construct a time-frequency matrix that reflects how the frequency content evolves over time. The STFT is represented as:
[0154] $X[q, p] = \sum_{n=0}^{N_w - 1} y[n + qR]\, w[n]\, e^{-j 2\pi p n / N_{\mathrm{FFT}}}$
[0155] where $y[n]$ is the standardized time-domain signal, $q$ is the time frame index, $R$ is the frame shift, $w[n]$ is the window function, $p$ is the frequency index, $N_{\mathrm{FFT}}$ is the number of FFT points, and $N_w$ is the window length.
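The frame-by-frame spectrum computation can be sketched directly with an FFT. The function name and argument names below are hypothetical, and the window type, window length, frame shift, and FFT size used in the example call are illustrative assumptions rather than the settings of Table 2.

```python
import numpy as np

def stft(x, window, hop, n_fft):
    """Return a (frames x n_fft) matrix: each row is the FFT of one windowed frame.

    Frame q covers x[q*hop : q*hop + len(window)]; frames shorter than n_fft are
    zero-padded by np.fft.fft, which matches a sum over the window length only.
    """
    x = np.asarray(x)
    win_len = len(window)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[q * hop:q * hop + win_len] * window for q in range(n_frames)])
    return np.fft.fft(frames, n=n_fft, axis=1)
```

For instance, `stft(x, np.hanning(64), hop=32, n_fft=64)` applied to a 256-point complex tone yields 7 frames of 64 frequency bins each.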
[0156] In this embodiment, a signal with a duration of 1 ... is selected as the input observation window, and a time-frequency image is generated from it by the STFT. This observation duration is sufficient to capture the time-frequency characteristics of the interference signal. The key STFT parameter settings are shown in Table 2.
[0157] Table 2 STFT Parameters
[0158]
[0159] To obtain a stable two-dimensional time-frequency feature image representing the interference pattern, the squared magnitude of the STFT result is used to form the time-frequency energy spectrum:
[0160] $E_k[q, p] = |X_k[q, p]|^2$
[0161] The energy spectrum is then normalized and mapped to the range [0,1], resulting in a two-dimensional matrix that can be stored as an image and used as network input:
[0162] $I_k[q, p] = \frac{E_k[q, p] - E_k^{\min}}{E_k^{\max} - E_k^{\min}}$
[0163] where $X_k[q, p]$ is the STFT result of sample $k$ at time frame $q$ and frequency index $p$, and $E_k^{\min}$ and $E_k^{\max}$ are the minimum and maximum values of the time-frequency matrix of sample $k$, respectively.
[0164] The two-dimensional matrix is resampled or cropped to a preset size and stored as a time-frequency image in a preset channel format. In this embodiment, a pseudo-color three-channel format is used, with an output size of 224×224×3. Finally, the time-frequency image corresponding to each sample is saved in a preset file format and paired one-to-one with its interference category label to form the time-frequency image dataset used in steps 4-6.
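The conversion from an STFT matrix to a network input image can be sketched as follows. Two simplifications are assumed here and are not taken from the embodiment: resampling uses nearest-neighbour index selection, and the pseudo-color mapping is replaced by copying the normalized map into three identical channels. The function name and the `eps` term are likewise hypothetical.

```python
import numpy as np

def tf_image(stft_matrix, size=224, eps=1e-12):
    """Energy spectrum -> min-max normalization to [0, 1] -> size x size x 3 array."""
    energy = np.abs(stft_matrix) ** 2
    norm = (energy - energy.min()) / (energy.max() - energy.min() + eps)
    # Nearest-neighbour resampling: pick source rows and columns for each output pixel.
    rows = np.arange(size) * norm.shape[0] // size
    cols = np.arange(size) * norm.shape[1] // size
    resized = norm[np.ix_(rows, cols)]
    # Stand-in for the pseudo-color mapping: replicate the map into three channels.
    return np.repeat(resized[:, :, None], 3, axis=2)
```

A real pipeline would substitute an actual color map and an interpolating resize for the two stand-ins.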
[0165] Step 4: Construct a lightweight satellite navigation signal interference identification model with an improved inverted residual structure as the backbone network and the introduction of a symmetrical and efficient spatial channel attention mechanism;
[0166] The satellite navigation signal interference identification model uses an enhanced inverted residual block (eIRB) as the basic unit of the backbone network. Multi-scale time-frequency decoupled depthwise convolution and a symmetric efficient spatial channel attention (ESCA) mechanism are embedded in each eIRB, so that the spatial locations of salient time-frequency regions and the response weights of key interference feature channels are adaptively recalibrated. This improves the ability to distinguish interference categories while keeping the parameter count and computational complexity low. As shown in Figure 2, the interference recognition model consists of an input layer, an initial feature extraction (Stem) layer, a feature extraction backbone network (Backbone) formed by stacking eIRBs over multiple stages, and a classification output (Task Head) layer. The Backbone comprises eight stages (S1 to S8) connected in series, with the following output feature map sizes: S1, 112×112×32; S2, 112×112×16; S3, 56×56×24; S4, 28×28×32; S5, 14×14×64; S6, 14×14×96; S7, 7×7×160; and S8, 7×7×320. The S8 feature map is mapped to 7×7×1280 by a 1×1 convolution and then input into the Task Head layer for classification. Each stage contains one or more eIRB modules; in the figure, eIRB, t, c, and s denote the number of stacked eIRB modules in the stage, the channel expansion factor, the number of output channels, and the stride, respectively. The eIRB module mainly consists of the multi-scale time-frequency decoupled convolution, the ESCA attention mechanism, and the inverted residual bottleneck.
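The listed stage outputs imply where spatial downsampling occurs. The short sketch below derives the downsampling factor of each stage from consecutive feature map heights, starting from the 224-pixel input; the dictionary and function names are hypothetical, and the factor attributed to S1 covers everything between the input and the S1 output.

```python
INPUT_SIZE = 224
STAGE_OUTPUTS = {
    "S1": (112, 112, 32), "S2": (112, 112, 16), "S3": (56, 56, 24), "S4": (28, 28, 32),
    "S5": (14, 14, 64), "S6": (14, 14, 96), "S7": (7, 7, 160), "S8": (7, 7, 320),
}

def downsampling_factors(input_size, stages):
    """Ratio between the previous and current feature map height for each stage."""
    factors = {}
    previous = input_size
    for name, (height, _width, _channels) in stages.items():
        factors[name] = previous // height
        previous = height
    return factors
```

The product of all factors is 224 / 7 = 32, the overall spatial reduction from the input image to the S8 feature map.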
[0167] The multi-scale time-frequency decoupled convolution extracts multi-directional, multi-scale texture features from interference time-frequency images at low computational cost. As shown in Figure 3, in the spatial modeling stage of the inverted residual bottleneck block, parallel depthwise convolution branches process the channel-expanded features in a decoupled manner. The branches are a 1×7 temporal asymmetric depthwise convolution branch, a 7×1 frequency asymmetric depthwise convolution branch, and a 3×3 local texture depthwise convolution branch. The 1×7 and 7×1 branches model long-distance time-frequency dependence and strip-shaped energy distributions along the time and frequency axes, respectively, while the 3×3 branch captures local details and point-like and block-like energy aggregation features. The branch outputs are fused by element-wise addition, which makes the multi-scale time-frequency features complementary without increasing the number of channels and without adding parameters in the fusion step. Compared with a traditional single large convolution kernel, the directional decoupling and small-kernel combination reduce the number of convolution kernel parameters and multiply-accumulate operations. The structure also better represents the morphological differences among narrowband, wideband, swept-frequency, and pulse interference in the time-frequency image, thereby improving the recognition efficiency and robustness of the model in edge deployment scenarios.
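The parameter saving relative to a single large kernel can be made concrete by counting depthwise weights. The comparison below takes a 7×7 depthwise kernel as the "single large kernel" baseline, which is an assumption chosen to match the receptive extent of the 1×7 and 7×1 branches; the channel count of 96 is likewise only an example, and bias and normalization parameters are ignored.

```python
def depthwise_weights(kernel_hw, channels):
    """Weights of a depthwise convolution: one kh x kw kernel per channel."""
    kh, kw = kernel_hw
    return kh * kw * channels

CHANNELS = 96  # hypothetical expanded channel count
multi_branch = sum(depthwise_weights(k, CHANNELS) for k in [(1, 7), (7, 1), (3, 3)])
single_large = depthwise_weights((7, 7), CHANNELS)
```

Per channel this is 7 + 7 + 9 = 23 weights for the three branches against 49 for the 7×7 kernel; the same ratio applies to multiply-accumulate operations per output position.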
[0168] The ESCA mechanism adaptively recalibrates intermediate features of the backbone network to highlight the channel responses and time-frequency region responses that are most relevant to discriminating interference categories. As shown in Figure 4, ESCA includes pre-channel and post-channel attention branches and a time-frequency dual-axis spatial attention branch. The pre-channel and post-channel attention branches are located before and after the dual-axis spatial attention weighting, respectively, and recalibrate the importance of the feature channels twice. Each channel attention branch obtains a channel statistics vector through global average pooling, models the local correlation between channels with a one-dimensional convolution, and generates channel weights through Sigmoid activation to perform channel-wise weighting. The time-frequency dual-axis spatial attention branch obtains one-dimensional sequence features by average pooling along the time and frequency dimensions and generates time-axis and frequency-axis weights with one-dimensional convolutions. After Sigmoid activation, the feature maps are weighted along the time axis and the frequency axis, forming a symmetrical saliency enhancement process in the time and frequency dimensions. Because ESCA consists mainly of pooling operations and one-dimensional convolutions, it avoids large fully connected layers and therefore has low parameter and memory overhead. At the same time, its channel and dual-axis weighting suppresses redundant background responses and enhances discriminative information such as interference energy concentration areas, edge directions, and texture structures, improving the model's ability to distinguish interference categories at low interference-to-signal ratios or under strong background noise while remaining lightweight.
[0169] The inverted residual bottleneck structure achieves efficient feature transformation under lightweight constraints. Its basic process is "channel dimensionality increase → time-frequency feature extraction → channel dimensionality reduction". Specifically, a 1×1 pointwise convolution first expands the input feature channels from $C_{in}$ to $tC_{in}$ to enhance feature representation; multi-scale time-frequency decoupled depthwise convolution then models the local time-frequency texture of the expanded features; finally, a 1×1 pointwise convolution compresses the channels to $C_{out}$ to match the input of the next layer.
[0170] In this embodiment, the specific construction process of the satellite navigation signal interference identification model is as follows:
[0171] Input layer: The time-frequency image generated in step 3 above is used as the model input. The input tensor is represented as $X \in \mathbb{R}^{H \times W \times C}$, where $H$ is the number of pixels in the frequency dimension, $W$ is the number of pixels in the time dimension, and $C$ is the number of input channels. In this embodiment, the input tensor size is 224×224×3.
[0172] Stem layer: Performs initial feature extraction and downsampling on the input tensor $X$, preferably using a single ordinary 3×3 convolution with a stride of 2, followed by batch normalization (BN) and an activation function, to obtain the basic time-frequency features $F_0$:
[0173] $F_0 = \delta(\mathrm{BN}(\mathrm{Conv}_{3\times3}(X)))$
[0174] where $\mathrm{Conv}_{3\times3}$ denotes the ordinary 3×3 convolution, $\delta$ denotes the activation function, and $\mathrm{BN}$ denotes batch normalization.
[0175] Backbone network layers: Each stage stacks several eIRB modules to extract multi-scale time-frequency texture features layer by layer. The meanings of eIRB and s in the figure are as follows: when eIRB=1, it means that the stage contains only 1 eIRB module; when eIRB=n (n≥2) and s=2, it means that the stage contains a total of n eIRB modules, where the stride of the first eIRB is 2, and the stride of the remaining n-1 eIRBs is 1.
[0176] The specific structure of the eIRB module is shown in Figure 5, and it is built as follows:
[0177] Let the eIRB input be $X_{in} \in \mathbb{R}^{H \times W \times C_{in}}$. First, a 1×1 pointwise convolution (PWConv) expands the number of channels so that the number of output channels is $t$ times that of the input. After BN normalization and activation, the high-dimensional features $X_{exp}$ are output:
[0178] $X_{exp} = \delta(\mathrm{BN}(\mathrm{PWConv}_{1\times1}(X_{in})))$
[0179] where $\delta$ denotes the ReLU6 activation function. Next, $X_{exp}$ is fed simultaneously into three parallel depthwise convolution (DWConv) branches to extract features along different dimensions:
[0180] $F_t = \mathrm{DWConv}_{1\times7}(X_{exp})$
[0181] $F_f = \mathrm{DWConv}_{7\times1}(X_{exp})$
[0182] $F_l = \mathrm{DWConv}_{3\times3}(X_{exp})$
[0183] where $F_t$, $F_f$, and $F_l$ are the temporal features, frequency-domain features, and local texture features, respectively. Finally, the outputs of the three branches are summed element-wise to obtain the fused intermediate feature map $F_{ms}$:
[0184] $F_{ms} = F_t + F_f + F_l$
[0185] The element-wise fusion in this multi-scale time-frequency decoupled convolution introduces no additional parameters and keeps computational and storage overhead low. The fused feature map $F_{ms}$ is taken as the ESCA input $X$.
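The three-branch depthwise stage can be illustrated with a small NumPy sketch. It assumes stride 1, zero "same" padding, a height × width × channels layout with height as the frequency axis and width as the time axis, and no normalization inside the branches; these choices and the function names are assumptions made for the illustration.

```python
import numpy as np

def depthwise_conv_same(x, kernel):
    """Depthwise correlation: x is (H, W, C), kernel is (kh, kw, C), output is (H, W, C)."""
    kh, kw, _ = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))   # zero padding on both spatial axes
    H, W, _ = x.shape
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Each channel is filtered only with its own kernel slice.
            out += padded[i:i + H, j:j + W, :] * kernel[i, j, :]
    return out

def multiscale_tf_dwconv(x, k_time, k_freq, k_local):
    """Sum of the 1x7 (time), 7x1 (frequency), and 3x3 (local) depthwise branches."""
    return (depthwise_conv_same(x, k_time)
            + depthwise_conv_same(x, k_freq)
            + depthwise_conv_same(x, k_local))
```

Here `k_time` has shape (1, 7, C), `k_freq` has shape (7, 1, C), and `k_local` has shape (3, 3, C); the output keeps the input's shape, so the fusion adds no channels.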
[0186] The ESCA mechanism recalibrates $X$ using a symmetrical serial "channel-height-width-channel" structure. This process consists of four stages:
[0187] (1) Pre-channel attention
[0188] Let the input be $X \in \mathbb{R}^{H \times W \times C}$. First, spatial information is aggregated using global average pooling (GAP), and channel weights are then generated through one-dimensional convolution:
[0189] $z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X(i, j, c)$
[0190] Channel features are encoded using a one-dimensional convolution:
[0191] $w^{(1)} = \sigma(\mathrm{Conv1D}_k(z))$
[0192] $k = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$
[0193] $X_1 = w^{(1)} \otimes X$
[0194] where $z = [z_1, \ldots, z_C]$ is the channel statistics vector and $z_c$ is the global average response of the $c$-th channel; $\mathrm{Conv1D}_k$ is a one-dimensional convolution with kernel size $k$, used to capture local cross-channel interactions; $k$ is determined adaptively from the number of channels $C$, where $\gamma$ is the scaling factor, $b$ is the bias term, and $|\cdot|_{odd}$ maps the result to the nearest odd number; $\sigma$ is the sigmoid activation function; $w^{(1)}$ is the channel attention weight vector generated in the first stage; $\otimes$ denotes broadcast multiplication along the channel dimension; $X_1$ is the output feature map after pre-channel attention recalibration; $i$ and $j$ are the spatial indices of the feature map on the frequency axis and the time axis, respectively; and $c$ is the channel index.
[0195] (2) Spatial attention along the time axis
[0196] Secondly, the channel-recalibrated features are average-pooled along the frequency axis (H-axis), preserving the spatial position information along the time axis (W-axis):
[0197]
[0198] Encoding temporal features using one-dimensional convolution:
[0199]
[0200]
[0201] where the generated vector is the time-axis weight vector, used to enhance long-distance dependencies in the time dimension, and the one-dimensional convolution is applied along the W-axis with a kernel of the indicated length.
[0202] (3) Frequency axis spatial attention
[0203] Then, the output of the time-axis stage is average-pooled along the time axis, preserving the spatial position information along the frequency axis:
[0204]
[0205] Encoding frequency features using one-dimensional convolution:
[0206]
[0207]
[0208] where the generated vector is the frequency-axis weight vector, used to enhance the distribution characteristics of the frequency dimension, and the one-dimensional convolution is applied along the H-axis with a kernel of the indicated length.
[0209] (4) Post-channel attention
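[0210] Finally, to restore and reinforce the dominance of channel features and to form a symmetrical structure, channel weighting is performed again on $X_3$, the output of the frequency-axis spatial attention stage:
[0211] $z'_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X_3(i, j, c)$
[0212] Channel features are encoded using a one-dimensional convolution:
[0213] $w^{(2)} = \sigma(\mathrm{Conv1D}_k(z'))$
[0214] $X_4 = w^{(2)} \otimes X_3$
[0215] where $\sigma$ is the sigmoid activation function, $\mathrm{Conv1D}_k$ is a one-dimensional convolution with kernel size $k$, $\otimes$ denotes broadcast multiplication along the channel dimension, and $w^{(2)}$ is the channel attention weight vector of the post-channel stage. The final output $X_4$ is the time-frequency feature map after multidimensional recalibration. The ESCA output $X_4$ is then reduced in dimensionality by a 1×1 pointwise convolution, BN normalization, and linear activation to obtain $Y$:
[0216] $Y = \mathrm{BN}(\mathrm{PWConv}_{1\times1}(X_4))$
[0217] where $X_4$ is the final output feature map of the ESCA module.
[0218] A residual connection is used if and only if $s = 1$ and $C_{in} = C_{out}$; otherwise, $Y$ is output directly:
[0219] $X_{out} = \begin{cases} X_{in} + Y, & s = 1 \text{ and } C_{in} = C_{out} \\ Y, & \text{otherwise} \end{cases}$, where $X_{in}$ is the eIRB input and $C_{in}$ and $C_{out}$ are the numbers of input and output channels.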
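The four ESCA stages can be sketched end to end in NumPy. Several points are assumptions made only for this illustration: the kernel-size rule truncates to an integer and then moves even values up to the next odd number, with $\gamma = 2$ and $b = 1$ as example values; the axis attention keeps a separate pooled sequence per channel and shares one 1-D kernel across channels; and all function names are hypothetical. The layout is height (frequency) × width (time) × channels.

```python
import math
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def eca_kernel_size(channels, gamma=2, b=1):
    """Odd 1-D kernel size derived from the channel count (gamma and b are assumed values)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1

def conv1d_same(seq, kernel, axis):
    """Zero-padded 1-D correlation of seq with an odd-length kernel along one axis."""
    k = len(kernel)
    pad = [(0, 0)] * seq.ndim
    pad[axis] = (k // 2, k // 2)
    padded = np.pad(seq, pad)
    out = np.zeros(seq.shape, dtype=np.float64)
    n = seq.shape[axis]
    for i in range(k):
        index = [slice(None)] * seq.ndim
        index[axis] = slice(i, i + n)
        out += padded[tuple(index)] * kernel[i]
    return out

def channel_attention(x, kernel):
    """GAP over H and W, 1-D conv across channels, sigmoid, channel-wise reweighting."""
    z = x.mean(axis=(0, 1))                              # shape (C,)
    return x * sigmoid(conv1d_same(z, kernel, axis=0))

def axis_attention(x, kernel, pooled_axis):
    """Average over one spatial axis, 1-D conv along the other, sigmoid, reweighting."""
    z = x.mean(axis=pooled_axis, keepdims=True)          # (1, W, C) or (H, 1, C)
    return x * sigmoid(conv1d_same(z, kernel, axis=1 - pooled_axis))

def esca(x, k_pre, k_time, k_freq, k_post):
    x1 = channel_attention(x, k_pre)                     # (1) pre-channel attention
    x2 = axis_attention(x1, k_time, pooled_axis=0)       # (2) pool over H, weights along W
    x3 = axis_attention(x2, k_freq, pooled_axis=1)       # (3) pool over W, weights along H
    return channel_attention(x3, k_post)                 # (4) post-channel attention
```

The output has the same shape as the input, so the module can be dropped between the depthwise stage and the projection convolution without changing tensor sizes.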
[0220] Task Head layer: After processing by the stacked eIRB units, the final output features $F_{out}$ are mapped to high-dimensional semantic channels by a 1×1 convolution, and then passed through global average pooling (GAP) and a fully connected layer to output the final probabilities of all interference categories:
[0221] $P = \mathrm{Softmax}(W \cdot \mathrm{GAP}(\mathrm{Conv}_{1\times1}(F_{out})) + b)$
[0222] where $P$ is the probability distribution vector over the interference categories in the final output, and $W$ and $b$ are the weight matrix and bias vector of the fully connected layer, respectively.
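The pooling and classification part of the Task Head can be sketched as follows, taking the feature map after the 1×1 convolution as given. The function names are hypothetical, and the 7×7×1280 feature size and the 8 output classes in the usage note come from the embodiment.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))        # subtract the maximum for numerical stability
    return e / e.sum()

def task_head(features, weight, bias):
    """features: (H, W, C) map; weight: (num_classes, C); bias: (num_classes,)."""
    pooled = features.mean(axis=(0, 1))          # global average pooling -> (C,)
    return softmax(weight @ pooled + bias)       # class probability vector
```

With a 7×7×1280 feature map and a weight matrix of shape (8, 1280), the result is a length-8 probability vector that sums to 1.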
[0223] Step 5: Input the time-frequency images of the training set into the interference recognition model for training, and update the network parameters through backpropagation until the loss function converges.
[0224] In this embodiment, the training process for the satellite navigation signal interference identification model is specifically configured as follows:
[0225] Training batches: Training uses mini-batch iteration with the training batch size set to $B$, so the number of iterations in each epoch is $N_{train}/B$, where $N_{train}$ is the number of training samples. In this embodiment, it is taken as... The training samples are randomly shuffled to improve generalization ability.
[0226] Training cycle: Let the maximum number of training epochs be $E$. In this embodiment, $E$ is set to 50, meaning that the model processes all samples in the training dataset at most 50 times. An early stopping technique is also employed with a patience of 3 epochs: if the validation set performance does not improve for 3 consecutive epochs, training is terminated early.
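The epoch limit and early stopping rule can be sketched independently of the model. In the illustration below, the per-epoch validation scores are supplied as a list and a higher score is assumed to be better; the function name and this interface are hypothetical.

```python
def run_training(val_scores, max_epochs=50, patience=3):
    """Return (epochs_run, best_score) under an epoch cap and early stopping."""
    best = float("-inf")
    stale = 0                 # consecutive epochs without improvement
    epochs_run = 0
    for score in val_scores[:max_epochs]:
        epochs_run += 1
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break         # no improvement for `patience` consecutive epochs
    return epochs_run, best
```

In a real training loop the score would be computed on the validation set after each epoch rather than read from a list.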
[0227] Loss function: In this embodiment, a multi-class cross-entropy loss function is used to supervise the outputs for the eight classes:
[0228] $\mathcal{L}_{CE} = -\sum_{c=1}^{8} y_c \log \hat{y}_c$
[0229] where $y$ is the one-hot vector of the true label and $\hat{y}$ is the output probability of the interference recognition model.
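For a single sample, the multi-class cross-entropy reduces to the negative log-probability assigned to the true class. The sketch below uses plain Python; the function name and the small `eps` that guards against taking the logarithm of zero are assumptions.

```python
import math

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))
```

For a uniform prediction over the eight classes the loss equals $\ln 8 \approx 2.079$, regardless of which class is the true one.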
[0230] Learning rate: In this embodiment, an initial learning rate is combined with a learning rate decay strategy to accelerate convergence and improve training stability.
[0231]
[0232] where the decay schedule depends on the current number of training steps and the total number of training steps.
[0233] Optimization algorithm: In this embodiment, the Adam adaptive learning rate optimizer is used to update the parameters of the interference recognition model:
[0234] $\theta_{t+1} = \theta_t - \eta_t \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$
[0235] where $\theta_t$ denotes the current model parameters, $\eta_t$ is the current learning rate, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first-order and second-order moment estimates, respectively, and the small constant $\epsilon$ prevents the denominator from being 0.
[0236] $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
[0237] $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$
[0238] $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$
[0239] where $m_t$ and $v_t$ are the first-moment and second-moment estimates of the gradient at the $t$-th iteration; in this embodiment, the exponential decay rate of the first moment is $\beta_1 = 0.9$ and the exponential decay rate of the second moment is $\beta_2 = 0.999$; and $g_t$ is the gradient at the $t$-th iteration.
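A single Adam update for one scalar parameter can be written out directly. The decay rates 0.9 and 0.999 are those of the embodiment; the learning rate and `eps` defaults below are placeholder assumptions, since the embodiment's values are not reproduced here, and the function name is hypothetical.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at iteration t (t starts at 1); returns the new (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad             # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first iteration the bias-corrected ratio has magnitude close to 1, so the parameter moves by approximately the learning rate in the direction opposite to the gradient.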
[0240] Step 6: Input the time-frequency images of the test set into the trained interference recognition model, output the interference category recognition result, and evaluate the model's recognition accuracy.
[0241] In order to analyze the performance of the satellite navigation signal interference identification model based on reparameterization and multidimensional attention fusion, this embodiment uses a confusion matrix as an evaluation index to intuitively display the model's prediction distribution and inter-category confusion.
[0242] Confusion Matrix: This matrix, based on the correspondence between true and predicted labels, categorizes prediction results into four types: True Positive (TP) indicating correct identification of the positive category; False Negative (FN) indicating incorrect prediction of a positive example as a negative example; False Positive (FP) indicating incorrect prediction of a negative example as a positive example; and True Negative (TN) indicating correct identification of the negative category. By statistically analyzing the distribution of these four categories, the core metrics of the classification model can be intuitively reflected, as follows:
[0243] (1) Overall accuracy: the number of correctly predicted samples summed over all 8 classes, divided by the total number of test samples.
[0244] $\mathrm{Acc} = \frac{1}{N_{test}} \sum_{i=1}^{8} TP_i$
[0245] where $\mathrm{Acc}$ is the overall accuracy, $N_{test}$ is the total number of test samples, and $i$ is the category index.
[0246] (2) Precision: the proportion of samples predicted as a given class that actually belong to that class, which indicates how reliable a prediction of that class is.
[0247] $\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i}$
[0248] where $\mathrm{Precision}_i$ is the precision of the $i$-th class, $TP_i$ is the number of samples of the $i$-th class correctly identified as the $i$-th class, and $FP_i$ is the number of samples that do not belong to the $i$-th class but are incorrectly identified as the $i$-th class.
[0249] (3) Recall: the proportion of samples of a given interference type that the model successfully identifies.
[0250] $\mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i}$
[0251] where $\mathrm{Recall}_i$ is the recall of the $i$-th class and $FN_i$ is the number of samples that actually belong to the $i$-th class but are incorrectly identified as other classes.
[0252] (4) F1-Score: the harmonic mean of precision and recall, which reflects the robustness of the model across the 8 signal types and prevents the recognition deficiencies of some categories from being masked by a large number of samples in a single category.
[0253] $F1_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}$
[0254] where $F1_i$ is the F1 value of the $i$-th class.
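The four metrics can be computed directly from a confusion matrix. The sketch below assumes the convention that row $i$ holds the true class and column $j$ the predicted class, and returns 0 for a metric whose denominator is 0; both conventions and the function name are assumptions made for the illustration.

```python
def classification_metrics(cm):
    """cm[i][j] = number of class-i samples predicted as class j.

    Returns (overall_accuracy, [(precision, recall, f1) for each class]).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # class i predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp     # other classes predicted as class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, per_class
```

For the 8-class task the input would be an 8×8 matrix accumulated over the test set.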
[0255] The above embodiments should be understood as illustrative only and not as limiting the scope of protection of the present invention. After reading the description of the present invention, those skilled in the art can make various alterations or modifications to the present invention, and these equivalent changes and modifications also fall within the scope defined by the claims of the present invention.
Claims
1. A lightweight deep learning-based high-precision satellite navigation signal interference identification method, characterized in that it includes the following steps: Step 1: Construct a satellite navigation signal dataset containing various types of interference, and divide the dataset into a training set and a test set according to a preset ratio; Step 2: Perform DC removal, fixed window slicing, and power normalization preprocessing on the signal samples in the training and test sets in sequence to obtain standardized time-domain signal samples; Step 3: Perform a short-time Fourier transform on the standardized time-domain signal samples to convert them to the time-frequency domain and generate a two-dimensional time-frequency feature image; Step 4: Construct a lightweight satellite navigation signal interference identification model with an improved inverted residual structure as the backbone network and the introduction of a symmetrical and efficient spatial channel attention mechanism; Step 5: Input the time-frequency images of the training set into the interference recognition model for training, and update the network parameters through backpropagation until the loss function converges; Step 6: Input the time-frequency images of the test set or the dataset to be tested into the trained interference recognition model, and output the interference category recognition result.
2. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 1, characterized in that: the satellite navigation signal dataset containing various types of interference in Step 1 includes single-tone interference, multi-tone interference, linear frequency modulation interference, sinusoidal frequency modulation interference, narrowband noise frequency modulation interference, band-limited Gaussian noise interference, pulse interference, and interference-free satellite navigation signals.
3. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 1, characterized in that: the DC removal in step 2 removes the DC component from each satellite navigation interference signal sample $x[n]$: $\bar{x} = \frac{1}{L} \sum_{n=0}^{L-1} x[n]$, $\tilde{x}[n] = x[n] - \bar{x}$, wherein $\bar{x}$ denotes the mean of the sample sequence, $L$ denotes the length of the sample sequence, and $\tilde{x}[n]$ denotes the zero-mean sample after removal of the DC component; the fixed window slicing extracts signal segments of $N$ sampling points with a fixed time window whose starting position slides sequentially with a fixed step size $S$: $x_m[n] = \tilde{x}[mS + n]$, $n = 0, 1, \ldots, N-1$, wherein $m$ is the slice number, $N$ is the slice length, $S$ is the sliding step size, $n$ is the index of the sampling point within the window, and $x_m[n]$ denotes the signal value at the $n$-th sampling point of the $m$-th slice; when the last segment has fewer than $N$ points, zero padding or truncation can be used to ensure that every segment has the same length; the power normalization includes: the average power of the $m$-th segment is $P_m = \frac{1}{N} \sum_{n=0}^{N-1} |x_m[n]|^2$; the signal is then scaled to the target power $P_0$ to obtain the final standardized time-domain signal sample: $y_m[n] = \sqrt{P_0 / P_m}\, x_m[n]$.
4. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 1, characterized in that: the lightweight satellite navigation signal interference identification model described in step 4 uses an improved inverted residual module (eIRB) as the basic unit to form a backbone network, and embeds a multi-scale time-frequency decoupled depthwise convolution and a symmetric efficient spatial channel attention mechanism (ESCA) within each eIRB. The lightweight satellite navigation signal interference identification model consists of an input layer, an initial feature extraction layer, a feature extraction backbone network formed by stacking multiple eIRBs, and a classification output layer.
5. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 4, characterized in that: the improved inverted residual module introduces a dual-branch, multi-scale time-frequency decoupled convolution structure on the basis of the original single-kernel depthwise convolution structure. In the spatial modeling stage of the inverted residual bottleneck block, the input time-frequency features are first expanded in dimensionality, and time-frequency decoupled convolution processing is then performed separately through parallel depthwise convolution branches. The outputs of the branches are fused by element-wise addition.
6. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 5, characterized in that: the depthwise convolution branches include a 1×7 temporal asymmetric depthwise convolution branch, a 7×1 frequency asymmetric depthwise convolution branch, and a 3×3 local texture depthwise convolution branch; the 1×7 and 7×1 branches are used to model long-distance time-frequency dependence and strip-shaped energy distribution along the time axis and frequency axis, respectively, while the 3×3 branch is used to capture local details and point-like and block-like energy aggregation features.
7. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 4, characterized in that: The ESCA includes pre- and post-channel attention branches and a time-frequency dual-axis spatial attention branch. The pre- and post-channel attention branches are located before and after the dual-axis spatial attention weighting, respectively, and are used to recalibrate the importance of feature channels twice. Each channel attention branch obtains channel statistical vectors through global average pooling and models the local correlation between channels using one-dimensional convolution. Channel weights are generated by Sigmoid activation to achieve channel-wise weighting. The time-frequency dual-axis spatial attention branch obtains one-dimensional sequence features by performing global average pooling along the time and frequency dimensions, respectively. Time-axis weights and frequency-axis weights are generated by one-dimensional convolution. After Sigmoid activation, the feature maps are weighted time-wise and frequency-wise, respectively, thus forming a symmetrical significance enhancement process in the time and frequency dimensions.
8. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 4, 5, 6, or 7, characterized in that: the overall architecture of the lightweight satellite navigation signal interference identification model is as follows: Input layer: the time-frequency image generated in step 3 is used as the model input, and the input tensor is represented as $X \in \mathbb{R}^{H \times W \times C}$, where $H$ is the number of pixels in the frequency dimension, $W$ is the number of pixels in the time dimension, and $C$ is the number of input channels; Initial feature extraction layer: performs initial feature extraction and downsampling on the input, combined with batch normalization (BN) and an activation function, to obtain the basic time-frequency features $F_0$: $F_0 = \delta(\mathrm{BN}(\mathrm{Conv}(X)))$; Feature extraction backbone network: multiple stages, each stacking several eIRB modules, extract multi-scale time-frequency texture features layer by layer to obtain the final feature output $F_{out}$; Output layer: after processing through the multiple stages, the final output features $F_{out}$ are mapped to high-dimensional semantic channels by a convolution ($\mathrm{Conv}$), and then passed through a GAP layer and a fully connected layer to output the final probabilities of all interference categories: $P = \mathrm{Softmax}(W \cdot \mathrm{GAP}(\mathrm{Conv}(F_{out})) + b)$, where $W$ and $b$ are the weight matrix and bias vector of the fully connected layer, respectively, and $P$ is the probability distribution vector over the interference categories in the final output.
9. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 8, characterized in that: The feature extraction backbone network consists of 8 stages (S1 to S8) connected in series. The output feature map sizes of each stage are as follows: S1 output 112×112×32; S2 output 112×112×16; S3 output 56×56×24; S4 output 28×28×32; S5 output 14×14×64; S6 output 14×14×96; S7 output 7×7×160; S8 output 7×7×320.
10. The lightweight deep learning-based high-precision satellite navigation signal interference identification method according to claim 8, characterized in that: the architecture and processing flow of the improved inverted residual module (eIRB) are as follows: let the eIRB input be $X_{in} \in \mathbb{R}^{H \times W \times C_{in}}$; first, a 1×1 pointwise convolution (PWConv) expands the number of channels so that the number of output channels is $t$ times that of the input, and after BN normalization and activation the high-dimensional features $X_{exp}$ are output: $X_{exp} = \delta(\mathrm{BN}(\mathrm{PWConv}_{1\times1}(X_{in})))$; secondly, $X_{exp}$ is fed simultaneously into three parallel depthwise convolution (DWConv) branches to extract features along different dimensions: $F_t = \mathrm{DWConv}_{1\times7}(X_{exp})$, $F_f = \mathrm{DWConv}_{7\times1}(X_{exp})$, $F_l = \mathrm{DWConv}_{3\times3}(X_{exp})$, where $F_t$, $F_f$, and $F_l$ are the temporal features, frequency-domain features, and local texture features, respectively; finally, the outputs of the three branches are summed element-wise to obtain the fused intermediate feature map $F_{ms} = F_t + F_f + F_l$, which is taken as the ESCA input $X$; the ESCA mechanism consists of four stages: (1) Pre-channel attention: first, the spatial information of the feature map $X$ is aggregated using global average pooling (GAP): $z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X(i, j, c)$; channel features are then encoded using a one-dimensional convolution to generate the channel weights: $w^{(1)} = \sigma(\mathrm{Conv1D}_k(z))$, $k = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$, $X_1 = w^{(1)} \otimes X$, where $z$ is the channel statistics vector and $z_c$ is the global average response of the $c$-th channel; $\mathrm{Conv1D}_k$ is a one-dimensional convolution with kernel size $k$, used to capture local cross-channel interactions; $k$ is determined adaptively from the number of channels $C$, where $\gamma$ is the scaling factor, $b$ is the bias term, and $|\cdot|_{odd}$ maps the result to the nearest odd number; $\sigma$ is the sigmoid activation function; $w^{(1)}$ is the channel attention weight vector generated in the first stage; $\otimes$ denotes broadcast multiplication along the channel dimension; $X_1$ is the output feature map after pre-channel attention recalibration; $i$ and $j$ are the spatial indices of the feature map on the frequency axis and the time axis, respectively; and $c$ is the channel index; (2) Spatial attention along the time axis: the channel-recalibrated features are average-pooled along the frequency axis (H-axis), preserving the spatial position information along the time axis (W-axis), and the temporal features are encoded by a one-dimensional convolution along the W-axis with a kernel of the indicated length to generate the time-axis weight vector, which is used to enhance long-distance dependencies in the time dimension; (3) Frequency axis spatial attention: the output of the time-axis stage is average-pooled along the time axis, preserving the spatial position information along the frequency axis, and the frequency features are encoded by a one-dimensional convolution along the H-axis with a kernel of the indicated length to generate the frequency-axis weight vector, which is used to enhance the distribution characteristics of the frequency dimension; (4) Post-channel attention: finally, to restore and reinforce the dominance of channel features and to form a symmetrical structure, channel weighting is performed again on $X_3$, the output of the frequency-axis stage: $z'_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X_3(i, j, c)$, $w^{(2)} = \sigma(\mathrm{Conv1D}_k(z'))$, $X_4 = w^{(2)} \otimes X_3$, where the final output $X_4$ is the time-frequency feature map after multidimensional recalibration and $w^{(2)}$ is the channel attention weight vector generated in the second channel stage; the ESCA output $X_4$ is then reduced in dimensionality by a 1×1 pointwise convolution, BN normalization, and linear activation to obtain $Y = \mathrm{BN}(\mathrm{PWConv}_{1\times1}(X_4))$, where $X_4$ is the final output feature map of the ESCA module; a residual connection is used if and only if $s = 1$ and $C_{in} = C_{out}$, giving $X_{out} = X_{in} + Y$; otherwise, $Y$ is output directly.