Discriminative enhancement fusion modulation recognition method for confused modulation categories

By preprocessing multi-channel signal features and co-modeling multiple features, the ability to distinguish confused modulation categories is explicitly improved. This solves the problem of high misclassification rate of confused modulation categories in existing deep learning methods under complex channels and low signal-to-noise ratio, and achieves higher modulation recognition accuracy.

CN122247811A · Pending · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2026-03-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing deep learning methods struggle to effectively reduce the misclassification rate between confused modulation categories under complex channel conditions or low signal-to-noise ratio scenarios, and they lack the ability to specifically improve the discrimination capability of confused modulation categories.

Method used

By employing multi-channel signal feature preprocessing, a discriminative enhancement residual convolution feature extraction module, a confusion-aware temporal discriminative coding module, and a discriminative enhancement classification module, the ability to distinguish confused modulation categories is explicitly improved through multi-feature collaborative modeling and discriminative enhancement mechanisms.

Benefits of technology

It improves the overall performance of modulation recognition under complex channel and low signal-to-noise ratio conditions, reduces the misclassification rate among confused modulation categories, and improves the accuracy of distinguishing them.


Figure CN122247811A_ABST

Abstract

This invention relates to the field of wireless communication system technology, specifically to a discriminative enhancement fusion modulation recognition method for confused modulation categories, comprising the following steps: preprocessing the signal to construct multi-channel signal features as the model input; and constructing a model that includes a discriminative enhancement residual convolutional feature extraction module, a confusion-aware temporal discriminative coding module, and a discriminative enhancement classification module, with the multi-channel signal features passed through these modules for the final classification. On the one hand, the signal is preprocessed to construct multi-channel signal features as the input basis for the subsequent model; on the other hand, through multi-feature collaborative modeling, a discriminative enhancement mechanism is introduced to focus the modeling capability of the model on confused modulation categories. By making targeted modifications to key functional modules and performing collaborative fusion at the overall network level, the model can progressively enhance its discriminative capability for confused modulation categories at multiple stages, including feature extraction, temporal modeling, and classification decision.

Description

Technical Field

[0001] This invention relates to the field of wireless communication system technology, and in particular to a discriminative enhancement fusion modulation recognition method for confused modulation categories.

Background Technology

[0002] As wireless communication systems continue to evolve towards complex electromagnetic environments and diverse service scenarios, the rapid and accurate identification of signal modulation modes at the receiver has become a crucial link in cognitive radio, spectrum sensing, and intelligent communication systems. Automatic Modulation Recognition (AMR) determines the modulation category of the received signal, providing important information for subsequent demodulation, interference identification, and resource management, and has broad engineering application value.

[0003] In recent years, deep learning methods have demonstrated significant advantages in modulation recognition tasks due to their end-to-end feature modeling capabilities. Models based on convolutional neural networks, recurrent neural networks, and attention mechanisms can automatically learn discriminative features from the original IQ sequence or its derived representations, achieving superior overall recognition performance compared to traditional methods based on manual features on various public datasets. However, most existing deep learning modulation recognition methods typically treat the AMR problem as a unified multi-classification task, employing a single feature learning and classification mechanism to model all modulation categories identically.

[0004] In complex channel conditions or low signal-to-noise ratio scenarios, this unified modeling approach often reveals significant shortcomings. Numerous studies and experimental results show that classification errors in modulation recognition are not uniformly distributed across all categories, but rather highly concentrated in a few modulation categories with confusing relationships. These modulation schemes exhibit strong similarities in temporal structure, spectral distribution, or statistical characteristics, and their discrimination boundaries are highly susceptible to overlap under the influence of noise, frequency offset, or multipath interference, thus becoming a major bottleneck limiting overall recognition performance improvement. Simply relying on deepening the network or expanding the parameter scale often only improves the average recognition accuracy, but cannot fundamentally reduce the misclassification rate between confused modulation categories.

[0005] To address the aforementioned issues, existing research has attempted to enhance model expressive power through strategies such as multi-channel input, multi-feature fusion, or multi-task learning. For example, this involves combining time-domain and frequency-domain features and introducing auxiliary tasks to provide additional discriminative information. While these methods have improved model robustness to some extent, their core focus remains on expanding or supplementing features, lacking specific modeling of the inherent difficulty in discriminating between different modulation categories. In other words, most existing methods prioritize learning more features, while paying less attention to which categories most need to be distinguished.

[0006] Based on the characteristics of modulation recognition tasks, different modulation categories exhibit significant differences in discrimination difficulty and differentiation requirements: some modulation categories show obvious differences and are easy to distinguish, while modulation categories with confusing relationships only show slight differences in local temporal structure or fine-grained features, which are the main sources of misjudgment.

Summary of the Invention

[0007] The purpose of this invention is to provide a discriminative enhancement fusion modulation recognition method for confused modulation categories. The method explicitly introduces a discriminative enhancement mechanism in the model design, focuses the modeling on improving the ability to distinguish confused modulation categories, and coordinates multiple discriminative information through a reasonable feature fusion method to more effectively alleviate the confusion problem and improve the overall recognition performance.

[0008] To achieve the above objectives, the present invention provides a discriminative enhancement fusion modulation recognition method for confused modulation categories, comprising the following steps:
The signal is preprocessed to construct multi-channel signal features as the model input, the model including a discriminative enhancement residual convolutional feature extraction module, a confusion-aware temporal discriminative coding module, and a discriminative enhancement classification module;
The multi-channel signal features are input into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction;
A confusion-aware temporal discriminative coding module is constructed. The local discriminative features are used as its input and, relying on the multi-head self-attention mechanism and one-dimensional convolution operation, it outputs high-dimensional temporal discriminative features that contain both local temporal information and global temporal correlation;
A discriminative enhancement classification module is constructed. The high-dimensional temporal discriminative features are used as its input. After passing through two fully connected layers, the features are mapped to the modulation category space, and the predicted probability of each modulation category is calculated by the Softmax function to complete the final classification decision.

[0009] In wireless communication systems, the signal received by the receiver is typically affected by a combination of factors, including channel fading, additive noise, and carrier frequency offset. In terms of signal modeling, the modulated signal in satellite communication under baseband conditions can be represented as:

$$s(t) = A(t)\,e^{j\Phi(t)} \tag{7}$$

where $A(t)$ denotes the instantaneous amplitude of the signal and $\Phi(t)$ denotes the instantaneous phase; different modulation schemes carry information through different mapping rules applied to $A(t)$ and $\Phi(t)$.

[0010] Considering the flat fading channel, carrier frequency offset, and additive white Gaussian noise (AWGN), the complex baseband signal at the receiver can be expressed as:

$$r(t) = h\,s(t)\,e^{j(2\pi \Delta f\, t + \theta_0)} + n(t) \tag{8}$$

where $s(t)$ is the signal of equation (7), $h$ is the complex channel gain, $\Delta f$ is the normalized carrier frequency offset, $\theta_0$ is the initial phase shift, and $n(t)$ is complex white Gaussian noise whose real and imaginary parts each follow a Gaussian distribution with zero mean and variance $\sigma^2$.
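The impairments named in this paragraph (flat fading gain, normalized carrier frequency offset, initial phase shift, complex AWGN) can be simulated with a few lines of NumPy. The following is a minimal sketch, not the patent's own procedure: the function name `apply_channel`, the QPSK test signal, and all numeric values are illustrative assumptions.

```python
import numpy as np

def apply_channel(s, h, cfo, theta0, snr_db, rng):
    """Apply a flat-fading gain, a normalized CFO, an initial phase and complex AWGN.

    s: complex baseband samples; h: complex gain; cfo: offset in cycles per sample.
    All parameter names are hypothetical and chosen for this illustration only.
    """
    t = np.arange(len(s))
    rotated = h * s * np.exp(1j * (2 * np.pi * cfo * t + theta0))
    sig_power = np.mean(np.abs(rotated) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # Real and imaginary noise parts are zero-mean and share the noise power equally.
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(len(s)) + 1j * rng.standard_normal(len(s))
    )
    return rotated + noise

rng = np.random.default_rng(0)
symbols = rng.integers(0, 4, size=1024)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))  # unit-amplitude QPSK (example input)
r = apply_channel(qpsk, h=0.8 * np.exp(1j * 0.3), cfo=1e-3, theta0=0.5, snr_db=8, rng=rng)
```

Setting `snr_db` very high makes the noise negligible, which is a convenient way to inspect the effect of gain and frequency offset in isolation.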

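[0011] The received signal $r(t)$ is decomposed into in-phase and quadrature components:

$$r(t) = I(t) + jQ(t) \tag{9}$$

where

$$I(t) = \mathrm{Re}\{r(t)\}, \quad Q(t) = \mathrm{Im}\{r(t)\} \tag{10}$$

IQ signals completely preserve the amplitude and phase information of the modulation scheme.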

[0012] To further analyze the discrimination characteristics of different modulation schemes, this invention maps the IQ signal to the amplitude and phase domains. The instantaneous amplitude and phase of the received signal are defined as follows:

$$A(t) = \sqrt{I^2(t) + Q^2(t)} \tag{11}$$

$$\Phi(t) = \arctan\!\left(\frac{Q(t)}{I(t)}\right) \tag{12}$$

The amplitude $A(t)$ mainly reflects the energy distribution characteristics of the modulated signal, while the phase $\Phi(t)$ describes the phase change pattern between symbols. For amplitude-modulated signals, the discrimination information is mainly reflected in the amplitude change, while for phase-modulated or quadrature-amplitude-modulated signals, the phase evolution structure plays a key role in modulation discrimination.

[0013] It should be noted that the initial phase shift $\theta_0$ in the received signal causes an overall translation of $\Phi(t)$, but this translation does not change the inherent relative phase structure of the modulation scheme. In actual modulation recognition, if left unaddressed, this phase uncertainty may interfere with the deep learning model's learning of modulation features, especially under low signal-to-noise ratio conditions.

[0014] Building upon the amplitude-phase representation, to characterize the fine-grained variation of the modulated signal in the time dimension, this invention further introduces the instantaneous phase difference as an analytical quantity, defined as the phase change between adjacent sampling points:

$$\Delta\Phi(t) = \Phi(t) - \Phi(t-1) \tag{13}$$

The instantaneous phase difference directly reflects the phase evolution of the modulated signal between symbols. For phase-modulated signals, $\Delta\Phi(t)$ typically exhibits a discrete and structured change pattern, while for amplitude-modulated or frequency-modulated signals the phase change is often more continuous and smooth. Therefore, the instantaneous phase difference has high discriminative value in distinguishing different modulation categories, especially confused ones. However, under low signal-to-noise ratio or frequency offset, noise terms can significantly perturb the phase estimate, causing the $\Delta\Phi(t)$ distributions of different modulation schemes to overlap. This overlap weakens the separability of the modulated signal in the phase dimension and is one of the important reasons why confused categories in existing technologies are easily misjudged in complex channel environments.
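The amplitude, phase, and adjacent-sample phase difference described above can be computed directly from complex samples. The sketch below is one possible realization under stated assumptions: the helper name `amp_phase_diff` is hypothetical, the phase difference is wrapped to (-pi, pi] (the text does not say how wrapping is handled), and the first sample is padded with zero so that all outputs have the same length.

```python
import numpy as np

def amp_phase_diff(r):
    """Return instantaneous amplitude, phase and adjacent-sample phase difference."""
    r = np.asarray(r, dtype=np.complex128)
    amp = np.abs(r)
    phase = np.angle(r)  # four-quadrant arctangent of Q/I, in (-pi, pi]
    # Wrapped difference between adjacent samples (assumption: wrapping is desired).
    dphi = np.angle(r[1:] * np.conj(r[:-1]))
    dphi = np.concatenate([[0.0], dphi])  # zero-pad the first sample (assumption)
    return amp, phase, dphi

t = np.arange(64)
tone = np.exp(1j * 0.1 * t)                  # constant phase increment of 0.1 rad/sample
amp, phase, dphi = amp_phase_diff(tone)
_, _, dphi_shifted = amp_phase_diff(tone * np.exp(1j * 1.0))  # add a fixed phase offset
```

In this example `dphi` and `dphi_shifted` coincide, which illustrates that a constant phase offset leaves the phase difference unchanged.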

[0015] Furthermore, preprocessing the signal to construct the multi-channel signal features for the model input includes the following steps:
The signal is normalized by energy to eliminate the influence of channel gain variation;
The instantaneous phase is relativized to preserve the relative structure of the phase change;
The multi-channel signal features for the model input are constructed.

[0016] The normalized signal is represented as:

$$\tilde{r}(t) = \frac{r(t)}{\sqrt{\frac{1}{N}\sum_{k=1}^{N} |r(k)|^2}} \tag{1}$$

The corresponding IQ components are:

$$\tilde{I}(t) = \mathrm{Re}\{\tilde{r}(t)\}, \quad \tilde{Q}(t) = \mathrm{Im}\{\tilde{r}(t)\} \tag{2}$$

where $r(t)$ represents the original complex baseband signal received at the t-th sampling time; $N$ represents the total number of signal sampling points contained within an observation window or a single signal sample; $\tilde{r}(t)$ represents the complex baseband signal after energy normalization, a process used to eliminate the influence of channel gain variations on signal amplitude; $\tilde{I}(t)$ and $\tilde{Q}(t)$ represent the in-phase and quadrature components of the normalized signal, respectively; and $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ represent the operations of taking the real part and imaginary part of a complex number, respectively.

[0017] The relativization of the instantaneous phase includes the following steps:
Calculate the phase mean:

$$\bar{\Phi} = \frac{1}{N}\sum_{t=1}^{N} \Phi(t) \tag{3}$$

Obtain the phase sequence after phase alignment:

$$\tilde{\Phi}(t) = \Phi(t) - \bar{\Phi} \tag{4}$$

where $\bar{\Phi}$ is the average instantaneous phase of the signal over the $N$ sampling points of an observation window, $\Phi(t)$ is the instantaneous phase of the signal at the t-th sampling time, and $\tilde{\Phi}(t)$ is the relativized phase sequence after subtracting the mean, which eliminates the uncertainty introduced by the initial phase shift, i.e., the overall phase translation.
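A short derivation shows why mean subtraction removes the overall phase translation. It assumes, for illustration only, that the observed phase equals an underlying phase plus a constant offset $\theta_0$, and it ignores phase wrapping; the primed symbols are notation introduced here.

```latex
\begin{align*}
\Phi'(t) &= \Phi(t) + \theta_0 \\
\bar{\Phi}' &= \frac{1}{N}\sum_{t=1}^{N}\Phi'(t) = \bar{\Phi} + \theta_0 \\
\Phi'(t) - \bar{\Phi}' &= \Phi(t) - \bar{\Phi} \\
\Phi'(t) - \Phi'(t-1) &= \Phi(t) - \Phi(t-1)
\end{align*}
```

The third line is the relativized phase and the fourth is the adjacent-sample phase difference; both are unchanged by $\theta_0$. A carrier frequency offset behaves differently: it adds a term that grows linearly in $t$, which mean subtraction does not remove and which shifts every phase difference by the same constant.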

[0018] The multi-channel signal features are represented as follows:

$$X = \left[\tilde{I}(t),\ \tilde{Q}(t),\ A(t),\ \tilde{\Phi}(t),\ \Delta\Phi(t)\right] \tag{5}$$

where $\tilde{I}(t)$ and $\tilde{Q}(t)$ represent the in-phase and quadrature components of the normalized signal, respectively; $A(t)$ is the instantaneous amplitude of the signal; $\tilde{\Phi}(t)$ is the relativized phase sequence after subtracting the mean, which eliminates the uncertainty introduced by the initial phase shift, i.e., the overall phase translation; and $\Delta\Phi(t)$ is the instantaneous phase difference between adjacent sampling points.
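The preprocessing steps (energy normalization, phase relativization, and assembly of the five channels) can be sketched as a single function. This is an illustrative sketch, not the filed implementation: the name `build_features` is hypothetical, normalization to unit average power is an assumed form of "energy normalization", the amplitude is taken from the normalized signal, and the phase difference is wrapped and zero-padded at the first sample.

```python
import numpy as np

def build_features(r):
    """Build a (5, N) array: normalized I, normalized Q, amplitude, relativized phase, phase difference."""
    r = np.asarray(r, dtype=np.complex128)
    # Energy normalization (assumed form): scale to unit average power.
    r_norm = r / np.sqrt(np.mean(np.abs(r) ** 2))
    i_ch, q_ch = r_norm.real, r_norm.imag
    amp = np.abs(r_norm)
    phase = np.angle(r_norm)
    phase_rel = phase - phase.mean()  # subtract the window mean of the phase
    dphi = np.concatenate([[0.0], np.angle(r_norm[1:] * np.conj(r_norm[:-1]))])
    return np.stack([i_ch, q_ch, amp, phase_rel, dphi])

rng = np.random.default_rng(0)
r = rng.standard_normal(256) + 1j * rng.standard_normal(256)  # placeholder samples
X = build_features(r)
X_scaled = build_features(5.0 * r)  # same samples with a different real gain
```

Because the gain is divided out in the first step, `X` and `X_scaled` agree, which is the gain invariance the normalization step is meant to provide.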

[0019] Furthermore, the process of inputting the multi-channel signal features into the discriminative enhancement residual convolutional feature extraction module to output local discriminative features is as follows:
Initial mapping of the multi-channel input features is performed using one-dimensional convolution;
The result then enters the residual backbone, which consists of at least three stacked residual blocks;
A phase difference-guided discrimination gate is introduced, and the residual mapping output is weighted by it to output local discriminative features.

[0020] The confusion-aware temporal discrimination coding module includes two layers of long short-term memory network and one layer of improved Transformer-Encoder.

[0021] The first layer of the Long Short-Term Memory (LSTM) network is used to extract the temporal features of the modulated signal in the initial stage. The second layer of the LSTM network is used to compress and filter the temporal features to reduce redundant information and improve feature discriminability.

[0022] In this embodiment, the Discriminative Enhanced Residual Convolutional Feature Extraction Module (DE-ResCNN) is an improved version of the Residual Convolutional Neural Network (ResCNN), primarily used to extract local temporal features with stable discriminative capabilities from multi-channel signals. Through structural modifications tailored to the discriminative characteristics of modulated signals, this module effectively enhances the ability to perceive fine-grained differences in confused modulation categories while maintaining the local modeling advantages of convolutional networks, providing high-quality basic feature representations for subsequent temporal modeling.

[0023] The confusion-aware temporal discriminative coding module (CA-TDEncoder) is designed based on a collaborative modeling framework of Long Short-Term Memory (LSTM) networks and Transformer Encoders to characterize the evolution of modulated signals over time. The LSTM structure focuses on capturing short-term temporal dependencies and phase continuity features, while the Transformer encoder models global temporal correlations and long-term dependencies. To fully utilize temporal features from different discriminative perspectives, this module introduces a discriminant-oriented feature fusion mechanism internally, collaboratively integrating discriminative information from different modeling branches to form a unified and more discriminative temporal feature representation.

[0024] Furthermore, the process of inputting the multi-channel signal features into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction specifically includes the following steps:
Build the input layer;
Preliminary feature mapping: the multi-channel input sequence first passes through a one-dimensional convolutional layer for preliminary mapping to obtain low-level local representations;
Construct the backbone for residual feature extraction;
Introduce a gated bypass alongside the residual backbone;
Multiply the gating weight sequence element-wise with the feature map output by the residual backbone; the final output is a local discriminative feature that contains fine-grained structure and is highly responsive to confusion-related features.

[0025] Furthermore, the specific steps taken by the discriminative enhancement classification module to complete the final classification decision are as follows:
A first fully connected layer is used to compress the feature dimension into a low-dimensional embedding space;
A second fully connected mapping layer is introduced to map the features to the modulation category space, and its output dimension is equal to the total number of modulation categories;
The Softmax function is used to calculate the predicted probability of each modulation category to complete the final classification decision.

[0026] The discriminative enhancement classification module uses cross-entropy loss as the basic classification loss, and the overall loss function is defined as follows:

$$L = \lambda_1 L_{CE} + \lambda_2 L_{DE} \tag{6}$$

where $L_{CE}$ represents the standard cross-entropy loss, $L_{DE}$ represents the discriminative enhancement term, and $\lambda_1$ and $\lambda_2$ are the weighting coefficients.

[0027] In this implementation, the discriminative enhancement classification module (DE-Classifier) makes the modulation category decision. It is an improved design based on the traditional Softmax classifier and cross-entropy loss function, introducing discriminative enhancement constraints for confused modulation categories while maintaining stable overall classification performance. By explicitly increasing the distribution spacing of confused modulation categories in the feature space during the classification stage, this module guides the model to form clearer decision boundaries, thereby reducing the probability of misclassification.

[0028] This invention provides a discriminative enhancement fusion modulation recognition method for confused modulation categories. On the one hand, it preprocesses the signal before it enters the model to construct multi-channel signal features as the input basis for the subsequent model; this input is unified and meets the requirement of information complementarity. On the other hand, it introduces a discriminative enhancement mechanism through multi-feature collaborative modeling to focus the modeling capability on confused modulation categories. By making targeted modifications to key functional modules and carrying out collaborative fusion at the overall network level, the model can progressively enhance its discriminative ability for confused modulation categories at multiple stages, including feature extraction, temporal modeling, and classification decision.

Attached Figure Description

[0029] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0030] Figure 1 is a schematic diagram of the overall network structure of the discriminative enhancement fusion modulation recognition method for confused modulation categories according to the present invention.

[0031] Figure 2 is a schematic diagram of the discriminative enhancement residual convolutional feature extraction module of the discriminative enhancement fusion modulation recognition method for confused modulation categories according to the present invention.

[0032] Figure 3 is a schematic diagram of the structure of the confusion-aware temporal discriminative coding module of the discriminative enhancement fusion modulation recognition method for confused modulation categories according to the present invention.

[0033] Figure 4 is a schematic diagram of the discriminative enhancement classification module of the discriminative enhancement fusion modulation recognition method for confused modulation categories according to the present invention.

[0034] Figure 5 is a line graph comparing the recognition accuracy of each model under different signal-to-noise ratios (SNR) according to an embodiment of the present invention.

[0035] Figure 6 is a side-by-side comparison of the confusion matrices of different models at a signal-to-noise ratio of 8 dB, provided as an embodiment of the present invention.

Detailed Implementation

[0036] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.

[0037] Referring to Figure 1, this invention provides a discriminative enhancement fusion modulation recognition method for confused modulation categories, comprising the following steps:
The signal is preprocessed to construct multi-channel signal features as the model input, the model including a discriminative enhancement residual convolutional feature extraction module, a confusion-aware temporal discriminative coding module, and a discriminative enhancement classification module;
The multi-channel signal features are input into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction;
A confusion-aware temporal discriminative coding module is constructed. The local discriminative features are used as its input and, relying on the multi-head self-attention mechanism and one-dimensional convolution operation, it outputs high-dimensional temporal discriminative features that contain both local temporal information and global temporal correlation;
A discriminative enhancement classification module is constructed. The high-dimensional temporal discriminative features are used as its input. After passing through two fully connected layers, the features are mapped to the modulation category space, and the predicted probability of each modulation category is calculated by the Softmax function to complete the final classification decision.

[0038] The specific implementation process is as follows: Construct an input layer that takes the preprocessed five-channel one-dimensional signal sequence as the input to the Discriminative Enhancement Residual Convolutional Feature Extraction Module (DE-ResCNN).

[0039] Preliminary Feature Mapping: The multi-channel input sequence first passes through a one-dimensional convolutional layer (Conv1D) for preliminary mapping to obtain low-level local representations. To preserve temporal resolution, this layer does not use max pooling; its specific parameters are the kernel length and the number of output channels. The output then undergoes batch normalization (BatchNorm) and the ReLU nonlinear activation function.

[0040] Residual Feature Extraction Backbone Construction: The initially mapped features are fed into a residual backbone network consisting of three stacked residual stages. The stride of each stage's convolution is set to 1 to avoid premature loss of fine-grained phase change information. The first stage contains two standard residual blocks with 64 output channels. The second and third stages each contain two residual blocks, with the number of output channels increasing to 128 and 256, respectively. To expand the receptive field without introducing pooling operations, dilated convolution with a set dilation rate is used in these two stages.
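The benefit of dilation in a stride-1, pooling-free backbone can be quantified with the standard receptive-field formula for stacked stride-1 convolutions, which adds (kernel size - 1) x dilation per layer. The numbers below are purely illustrative assumptions, since the kernel size, the number of convolutions per residual block, and the dilation rate are not legible in this text: kernel size 3, two convolutions per block, and dilation 2 in stages two and three.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, one per convolution.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Hypothetical configuration: 3 stages x 2 blocks x 2 convolutions, kernel size 3.
stage1 = [(3, 1)] * 4          # stage one, no dilation
stages23 = [(3, 2)] * 8        # stages two and three, assumed dilation rate 2
with_dilation = receptive_field(stage1 + stages23)
without_dilation = receptive_field([(3, 1)] * 12)
print(with_dilation, without_dilation)  # 41 25
```

Under these assumptions the dilated backbone sees 41 samples per output position instead of 25, with no loss of temporal resolution and no additional parameters.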

[0041] Phase difference-guided discrimination gating (PDG) mechanism: A gated bypass is introduced beside the residual backbone. The instantaneous phase difference channel $\Delta\Phi(t)$ is extracted from the input sequence and mapped through a one-dimensional convolution and the Sigmoid function to generate a dynamic gating weight sequence.

[0042] Feature weighting and output: The gating weight sequence is multiplied element-wise with the feature map output from the residual backbone. The final output is a local discriminative feature sequence that contains fine-grained structure and is highly responsive to confusion-related features. This serves as the input for the next stage of temporal coding.
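The gating path described in the two paragraphs above (phase-difference channel, one-dimensional convolution, Sigmoid, element-wise weighting) can be sketched in NumPy as follows. This is a minimal sketch under assumptions: the function name `phase_diff_gate`, the single-channel gate broadcast across all feature channels, and the example kernel are hypothetical, and in a trained network the kernel would be learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def phase_diff_gate(features, dphi, kernel, bias=0.0):
    """Weight backbone features by a gate derived from the phase-difference channel.

    features: (C, T) backbone output; dphi: (T,) phase-difference channel;
    kernel: 1-D filter taps (np.convolve flips the kernel, which is immaterial for learned taps).
    """
    # mode="same" keeps the gate aligned with the T time steps of the features.
    gate = sigmoid(np.convolve(dphi, kernel, mode="same") + bias)  # (T,), values in (0, 1)
    # Broadcast the gate over channels (assumption: one gate value per time step).
    return features * gate[None, :], gate

rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 128))          # placeholder backbone output
dphi = rng.uniform(-np.pi, np.pi, size=128)      # placeholder phase-difference channel
gated, gate = phase_diff_gate(feats, dphi, kernel=np.array([0.5, 1.0, 0.5]))
```

Because the gate lies strictly between 0 and 1, it can only attenuate features, so time steps with weak phase-difference evidence contribute less to the next stage.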

[0043] Input Feature Reception: Receive the local discriminative feature sequence output from step one. This feature sequence retains fine-grained phase difference information, but lacks correlation in the global time dimension.

[0044] Short-term temporal dependence and phase continuity extraction (two-layer LSTM modeling): The local discriminative features are first input into the first layer of the Long Short-Term Memory (LSTM) network, with the hidden layer dimension set to 96. The main purpose of this layer is to perform an extended mapping of the sequence in the initial stage, fully extracting the short-term temporal evolution and phase continuity patterns of adjacent time segments of the signal.

[0045] The features are then fed into the second LSTM layer, where the hidden layer dimension is compressed to 48. The purpose of this layer is to compress and reduce the dimensionality of the high-dimensional temporal features from the previous layer, removing redundant noise information while refining the discriminative power of features between confusing categories.
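The expand-then-compress arrangement of the two LSTM layers (hidden sizes 96 and 48) can be traced with a plain NumPy LSTM forward pass. This is a sketch for shape and data-flow illustration only: the function `lstm_layer`, the gate ordering, the random weights, and the input width of 256 (taken from the third residual stage) are assumptions, and no training is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(x, w, u, b):
    """Forward pass of one LSTM layer.

    x: (T, d_in); w: (d_in, 4H); u: (H, 4H); b: (4H,). Returns hidden states (T, H).
    """
    hidden = u.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in x:
        gates = x_t @ w + h @ u + b
        i, f, g, o = np.split(gates, 4)   # input, forget, candidate, output (assumed order)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init(d_in, hidden, rng):
    """Random placeholder weights for one layer."""
    return (0.1 * rng.standard_normal((d_in, 4 * hidden)),
            0.1 * rng.standard_normal((hidden, 4 * hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 256))          # (time steps, backbone channels), placeholder
h1 = lstm_layer(seq, *init(256, 96, rng))     # first layer: hidden size 96
h2 = lstm_layer(h1, *init(96, 48, rng))       # second layer: compressed to 48
```

The second layer halves the width of the per-step representation, which matches the stated goal of discarding redundant information before the attention stage.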

[0046] Local enhancement and global correlation modeling (improved Transformer-Encoder): The 48-dimensional time series sequence, after dimensionality reduction by the second LSTM layer, is input into the improved Transformer-Encoder module with a single-layer structure. The single-layer design is used to strictly control the model complexity with limited samples and avoid the overfitting risk of deep self-attention networks.

[0047] One-dimensional convolutional local enhancement: Before entering the self-attention layer, the sequence first undergoes a one-dimensional convolution (Conv1D) operation. The role of this convolutional layer is to compensate for the traditional Transformer's insufficient ability to perceive the local structure of the sequence (such as instantaneous phase jumps within a short window), and further stabilize the local temporal pattern.

[0048] Multi-channel feature interaction (multi-head self-attention mechanism): The features then enter the multi-head self-attention layer, with 4 parallel attention heads. The feature mapping dimension is consistent with the previous LSTM (i.e., 48 dimensions) to avoid additional mapping overhead. Through these 4 different attention heads, the network calculates the correlation weights between features at different time steps and in different channels within multiple subspaces, realizing cross-channel feature interaction and capturing long-sequence global dependencies.
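With a model width of 48 and 4 heads, each head operates on a 12-dimensional subspace. The following NumPy sketch of standard scaled dot-product multi-head self-attention shows the shapes involved; it is a generic formulation assumed for illustration (the projection matrices `wq`, `wk`, `wv`, `wo` and their random values are hypothetical), not a statement of the patent's exact layer.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads=4):
    """x: (T, d_model). Returns output (T, d_model) and attention weights (heads, T, T)."""
    t, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):  # (T, d_model) -> (heads, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, T, T)
    attn = softmax(scores, axis=-1)                        # each row sums to 1
    heads = attn @ v                                       # (heads, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(t, d_model)  # concatenate heads
    return merged @ wo, attn

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 48))                          # (time steps, 48-d features)
wq, wk, wv, wo = (0.1 * rng.standard_normal((48, 48)) for _ in range(4))
out, attn = multi_head_self_attention(x, wq, wk, wv, wo, num_heads=4)
```

Each of the 4 attention maps relates every time step to every other time step, which is how the layer captures dependencies across the whole sequence.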

[0049] Temporal encoding output: After residual connection and layer normalization, the output is a high-dimensional temporal discriminative feature sequence that contains both local discriminative details and global temporal correlations. It is then fed into a global pooling / flatten layer for flattening, in preparation for the final classification decision.

[0050] Global Pooling and Feature Flattening: Receive the high-dimensional temporal discriminative feature sequence output from step two. First, compress the sequence features into a one-dimensional global feature vector through a global pooling layer (GlobalPooling / Flatten), which serves as the initial input to the classifier.

[0051] Discriminative Feature Dimensionality Reduction Mapping (Low-Dimensional Embedding Layer Construction): The global feature vector is input to the first fully connected layer (MLP / FullyConnectedLayer). To preserve discriminative information while eliminating redundant noise, the output dimension of this fully connected layer is strictly set to 64 dimensions. This operation constructs a compact low-dimensional embedding space, effectively avoiding the problem of blurred decision boundaries caused by an overly sparse feature space, which is crucial for distinguishing between confused modulation categories.

[0052] Modulation category space mapping and probability output (classifier head construction): The 64-dimensional features after dimensionality reduction then enter the second fully connected layer for classification mapping. The output dimension of this layer is set equal to the total number of modulation categories preset by the system. Finally, the output of each category neuron is passed through the Softmax activation function to obtain the predicted probabilities of the signal to be identified belonging to each modulation mode, thus completing the basic classification decision.
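The classification path (global pooling, a 64-dimensional embedding layer, a category-sized output layer, Softmax) can be sketched as below. Several details are assumptions made for illustration: average pooling over time, a ReLU after the first layer, random placeholder weights, and a hypothetical category count of 11.

```python
import numpy as np

def classify(seq, w1, b1, w2, b2):
    """seq: (T, d) temporal features. Returns the 64-d embedding and class probabilities."""
    pooled = seq.mean(axis=0)                      # global average pooling over time (assumed)
    emb = np.maximum(0.0, pooled @ w1 + b1)        # first fully connected layer, ReLU assumed
    logits = emb @ w2 + b2                         # second fully connected layer
    z = logits - logits.max()                      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()            # Softmax
    return emb, probs

rng = np.random.default_rng(0)
num_classes = 11                                   # hypothetical number of modulation categories
seq = rng.standard_normal((32, 48))                # placeholder encoder output
w1, b1 = 0.1 * rng.standard_normal((48, 64)), np.zeros(64)
w2, b2 = 0.1 * rng.standard_normal((64, num_classes)), np.zeros(num_classes)
emb, probs = classify(seq, w1, b1, w2, b2)
predicted = int(np.argmax(probs))                  # index of the most probable category
```

The intermediate `emb` vector is the low-dimensional embedding on which any additional separation constraint between confused categories would act.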

[0053] Discriminative Enhancement Joint Loss Function Calculation (Core Optimization Mechanism): During the model backpropagation and parameter optimization stages, a single classification loss is abandoned, and a joint loss function is constructed from the basic classification loss ($L_{CE}$) and the discriminative enhancement loss ($L_{DE}$).
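A weighted sum of a cross-entropy term and a discriminative enhancement term can be sketched as follows. The form of the enhancement term is not given in this text, so the sketch substitutes a hypothetical stand-in: a hinge penalty on the logit gap between the true class and a designated confusable partner class. The function names, the `partner` mapping, the margin, and the weights are all illustrative assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch. logits: (B, K); labels: (B,) integer classes."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def confusion_margin(logits, labels, partner, margin=1.0):
    """Hypothetical enhancement term: hinge on the gap between the true-class logit
    and the logit of that class's confusable partner."""
    idx = np.arange(len(labels))
    gap = logits[idx, labels] - logits[idx, partner[labels]]
    return np.maximum(0.0, margin - gap).mean()

def joint_loss(logits, labels, partner, lam_ce=1.0, lam_de=0.5, margin=1.0):
    """Weighted sum of the two terms; the weights are placeholders."""
    return (lam_ce * cross_entropy(logits, labels)
            + lam_de * confusion_margin(logits, labels, partner, margin))

partner = np.array([1, 0, 0])                      # class i is assumed confusable with partner[i]
labels = np.array([0, 1])
separated = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
ambiguous = np.zeros((2, 3))
loss_separated = joint_loss(separated, labels, partner)
loss_ambiguous = joint_loss(ambiguous, labels, partner)
```

In this stand-in, well-separated logits incur no margin penalty, while ambiguous logits are penalized beyond the cross-entropy alone, which is the qualitative behavior an enhancement term for confused pairs is meant to have.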

[0054] Referring to Figure 2, in modulation recognition tasks, Residual Convolutional Neural Networks (ResCNNs) are widely used to learn discriminative representations from IQ sequences or their derived features due to their strong local feature extraction capabilities and stable deep training characteristics. This invention proposes a Discriminative Enhancement ResCNN (DE-ResCNN) module based on ResCNN. The design goal of this module is to enhance the network's sensitivity to discriminative cues related to confused modulation categories (especially the change patterns of the instantaneous phase difference) while preserving the fine-grained structure of the modulated signal as much as possible, thereby providing cleaner and more separable local discriminative features for the subsequent confusion-aware temporal discriminative coding module.

[0055] The input to DE-ResCNN is a multi-channel feature sequence constructed through signal preprocessing: $X = [I(t), Q(t), A(t), \tilde{\varphi}(t), \Delta\varphi(t)]$ (14), where $\tilde{\varphi}(t)$ is the phase-aligned sequence and $\Delta\varphi(t)$ is the sequence of instantaneous phase differences. Unlike traditional ResCNNs that only use (I,Q) as input, multi-channel input can characterize modulation discriminative information from three complementary perspectives: amplitude, phase, and phase change, which is especially crucial for distinguishing confused modulation categories. However, multi-channel input also brings a problem: the discriminative contributions of different channels are not balanced. If the channels are simply concatenated and then directly convolved, the network may not be able to spontaneously and stably focus on the "more discriminative" channels and time segments. Therefore, this invention introduces a phase difference-guided discriminative gating mechanism outside the ResCNN backbone, enabling $\Delta\varphi(t)$ to explicitly participate in the feature selection process, thereby enhancing the expression of confusion-related discriminative information.
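The construction of the multi-channel input can be illustrated with a minimal sketch in pure Python. Several details here are assumptions rather than statements of the patent: the energy normalization is taken to scale the frame to unit average power, the instantaneous phase is unwrapped before its mean is subtracted, and the first phase-difference entry is padded with zero so all five channels have the same length. The function names `unwrap` and `build_multichannel_features` are hypothetical.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so consecutive phases differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2.0 * math.pi * round(delta / (2.0 * math.pi))
        out.append(out[-1] + delta)
    return out


def build_multichannel_features(samples):
    """Return channels [I, Q, A, phase_rel, dphi] for one complex baseband frame.

    Assumes the frame is non-empty and not all zeros.
    """
    n = len(samples)
    # Energy normalization (assumed form: unit average power) removes channel gain.
    rms = math.sqrt(sum(abs(s) ** 2 for s in samples) / n)
    norm = [s / rms for s in samples]
    i_ch = [s.real for s in norm]
    q_ch = [s.imag for s in norm]
    amp = [abs(s) for s in norm]
    # Instantaneous phase, unwrapped (an assumption) so that the mean is meaningful.
    phase = unwrap([cmath.phase(s) for s in norm])
    mean_phase = sum(phase) / n
    # Subtracting the mean removes a constant initial phase offset.
    phase_rel = [p - mean_phase for p in phase]
    # Phase difference between adjacent samples; first entry padded with 0 (assumption).
    dphi = [0.0] + [phase[t] - phase[t - 1] for t in range(1, n)]
    return [i_ch, q_ch, amp, phase_rel, dphi]


# Usage: a synthetic tone with gain 2.5 and initial phase 1.0 rad.
tone = [2.5 * cmath.exp(1j * (0.3 * t + 1.0)) for t in range(64)]
feats = build_multichannel_features(tone)
```

Under these assumptions, scaling the frame by a constant gain or rotating it by a constant phase leaves the amplitude, relativized phase, and phase-difference channels unchanged, which is the invariance the preprocessing is meant to provide.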

[0056] DE-ResCNN consists of a lightweight convolutional front-end and multiple layers of residual blocks, and satisfies the following two structural principles. First, to avoid premature compression of fine-grained phase change information at the front end, this invention does not use max pooling; instead, it uses one-dimensional convolution with a stride of 1 as the main feature extraction method and, when necessary, expands the receptive field by dilated convolution, thereby enhancing the local structure modeling capability without sacrificing temporal resolution. Second, the intensity of changes in the instantaneous phase difference $\Delta\varphi$ is used to guide the network to weight key time segments and key channels, reducing the interference of noise and irrelevant changes on feature learning.

[0057] In terms of structure, DE-ResCNN first uses a one-dimensional convolutional layer to perform an initial mapping of the multi-channel input: $H_0 = \sigma(\mathrm{Conv1D}_{k, C_0}(X))$ (15), where $k$ is the kernel length, $C_0$ is the number of output channels, and $\sigma(\cdot)$ is a nonlinear activation function. Considering that the local structure of the modulated signal is usually distributed within a short window, in order to balance local detail capture and computational complexity, this invention sets the kernel length of this layer to ... and the number of output channels to ..., so as to obtain relatively sufficient low-level local representation capabilities.

[0058] The features then enter the residual backbone, which is formed by stacking multiple residual blocks. A standard residual block can be represented as: $H_{l+1} = H_l + \mathcal{F}(H_l)$ (16), where $\mathcal{F}(\cdot)$ consists of two layers of one-dimensional convolution, normalization, and activation functions. Unlike existing ResCNN technologies, this invention introduces phase-difference guided gating (PDG) into the residual backbone: let the gating weight sequence be $G = g(\Delta\varphi)$ (15), where $g(\cdot)$ denotes the gating function applied to the instantaneous phase difference, and weight the residual mapping output with it: $H_{l+1} = H_l + G \odot \mathcal{F}(H_l)$ (16), where $\odot$ represents element-wise multiplication. The intuitive meaning of this mechanism is that when there are more significant structural changes in the instantaneous phase difference, the gating weights tend to be larger, thereby improving the network's response to these segments; conversely, when the phase difference changes are mainly caused by noise or lack structure, the gating weights suppress the residual output, which helps stabilize feature learning and reduce the risk of confusion and misjudgment.
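The gated residual update, in which a gate derived from the phase difference scales the residual branch before it is added back to the block input, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the gate is assumed to be a sigmoid of a smoothed absolute phase difference with a negative bias, the normalization layers of the residual branch are omitted for brevity, and all function names are hypothetical.

```python
import math


def conv1d_same(x, kernels, dilation=1):
    """Stride-1 1-D convolution with zero padding that keeps the sequence length.

    x: list of C_in channels, each a list of length T.
    kernels: weights indexed [C_out][C_in][K], with K odd.
    """
    length = len(x[0])
    half = (len(kernels[0][0]) // 2) * dilation
    out = []
    for per_out in kernels:
        row = []
        for t in range(length):
            acc = 0.0
            for c_in, taps in enumerate(per_out):
                for j, w in enumerate(taps):
                    idx = t - half + j * dilation
                    if 0 <= idx < length:
                        acc += w * x[c_in][idx]
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def phase_diff_gate(dphi, gate_kernel, bias=-1.0):
    """Gate in (0, 1): sigmoid of smoothed |dphi| plus a bias (assumed form)."""
    activity = conv1d_same([[abs(v) for v in dphi]], [[gate_kernel]])[0]
    return [1.0 / (1.0 + math.exp(-(a + bias))) for a in activity]


def gated_residual_block(h, dphi, k1, k2, gate_kernel):
    """Compute H + G * F(H), with the gate G broadcast over channels."""
    f = conv1d_same(relu(conv1d_same(h, k1)), k2)  # normalization omitted
    gate = phase_diff_gate(dphi, gate_kernel)
    return [[hv + g * fv for hv, g, fv in zip(h_ch, gate, f_ch)]
            for h_ch, f_ch in zip(h, f)]


# Usage with toy values: identity kernels make F(H) = ReLU(H).
identity = [[[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
            [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
h = [[0.1 * t for t in range(8)], [1.0] * 8]
dphi = [0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0]
smooth = [1 / 3, 1 / 3, 1 / 3]
gate = phase_diff_gate(dphi, smooth)
out = gated_residual_block(h, dphi, identity, identity, smooth)
```

In this toy run the gate is larger around the samples where the phase difference is active and smaller in the quiet segments, so the residual branch contributes more where the phase actually changes.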

[0059] In one embodiment of the present invention, to achieve a balance between discriminative capability and computational cost, three residual stages are employed, with the number of channels set to 64, 128, and 256 respectively, and each stage containing two residual blocks. Simultaneously, to maintain temporal resolution and preserve fine-grained structural information, the stride of the convolution in each stage is set to 1; if it is necessary to expand the receptive field, dilated convolutions with a suitable dilation rate are introduced in the latter two stages to enhance the ability to model longer local patterns without introducing pooling operations.
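The effect of dilation on a stride-1 backbone can be made concrete with a short receptive-field calculation. For stacked stride-1 convolutions, each layer with kernel length k and dilation d adds (k - 1) * d samples to the receptive field. The kernel length of 3 and the dilation rates 1, 2, 4 below are hypothetical values chosen only for illustration; the stage and block counts follow the embodiment, with two convolutions per residual block.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel_length, dilation) pairs.
    """
    rf = 1
    for kernel_length, dilation in layers:
        rf += (kernel_length - 1) * dilation
    return rf


KERNEL = 3            # assumed kernel length of the residual convolutions
BLOCKS_PER_STAGE = 2  # from the embodiment
CONVS_PER_BLOCK = 2   # each residual block holds two 1-D convolutions


def backbone_layers(stage_dilations):
    """Expand one dilation rate per stage into a flat list of conv layers."""
    convs_per_stage = BLOCKS_PER_STAGE * CONVS_PER_BLOCK
    return [(KERNEL, d) for d in stage_dilations for _ in range(convs_per_stage)]


plain = receptive_field(backbone_layers([1, 1, 1]))    # 25 samples
dilated = receptive_field(backbone_layers([1, 2, 4]))  # 57 samples
```

With these assumed values the dilated backbone sees 57 samples instead of 25, while the output sequence keeps the same length as the input because no stride or pooling is used.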

[0060] After passing through DE-ResCNN, the output local discriminative features are represented by equation (17). These features retain fine-grained structural information in the time dimension and exhibit higher responsiveness to confusion-related discriminative segments under phase difference-guided gating. They serve as input to the confusion-aware temporal discriminative coding module (CA-TD Encoder) in the next section, providing a more stable and separable basic representation for its short-term dependency and long-term global association modeling.

[0061] Secondly, in modulation recognition tasks, confused modulation categories are often highly similar at the local feature level, and their discriminative differences are more reflected in the temporal evolution of the signal. Therefore, after completing the extraction of local discriminative features, this invention further models the temporal characteristics of the modulation signal and uses a confusion-aware temporal discriminative coding module to enhance the separability of confused modulation categories in the time dimension.

[0062] The input to this module is the local discriminative feature sequence output by the Discriminative Enhanced Residual Convolutional Feature Extraction Module (DE-ResCNN), and its overall structure is shown in Figure 3.

[0063] Considering the significant time-series characteristics of modulated signals, this invention first employs a Long Short-Term Memory (LSTM) network to perform temporal modeling of local discriminative features. LSTM, through its gating mechanism, can effectively capture short-term temporal dependencies and phase continuity features in modulated signals, making it suitable for modeling the evolution patterns between adjacent time segments. However, a single LSTM structure has limitations in modeling long-range dependencies and global temporal correlations, and it is difficult to explicitly distinguish key time segments that significantly contribute to modulation discrimination.

[0064] To address this, this invention introduces an improved Transformer-Encoder structure on top of the LSTM to further enhance the modeling of temporal features. The Transformer-Encoder, relying on a multi-head self-attention mechanism, can model the dependencies between different time positions globally, thereby highlighting temporal features crucial for modulation discrimination. Simultaneously, to overcome the limitations of traditional Transformers in perceiving local temporal structures, this invention introduces a one-dimensional convolution operation into its encoding structure to enhance the modeling ability of local temporal patterns.

[0065] The CA-TD Encoder consists of two LSTM layers and one improved Transformer-Encoder layer. The first LSTM layer has a hidden layer dimension of 96 to fully extract the temporal features of the modulated signal in the initial stage. The second LSTM layer has a hidden layer dimension of 48 to compress and filter the temporal features, reducing redundant information and improving feature discriminativeness. This hierarchical design of "expanding first and then compressing" can effectively control model complexity while ensuring modeling capability.

[0066] In one embodiment of the present invention, an improved Transformer-Encoder is introduced to globally model temporal features based on the LSTM output. Considering the balance between sample size and model complexity in modulation recognition tasks, this embodiment adopts a single-layer encoder structure to avoid the risk of overfitting caused by excessively deep networks. The number of heads in the multi-head self-attention mechanism is set to 4 to model the correlation between modulation signals in the temporal dimension from different subspaces. The feature dimension of the encoder is consistent with the output dimension of the second-layer LSTM, thereby reducing additional feature mapping overhead and ensuring the continuity of temporal features before and after encoding.

[0067] Through collaborative modeling using LSTM and an improved Transformer-Encoder, the temporal discriminative features output by the CA-TD Encoder simultaneously contain stable local temporal information and global temporal correlation features, further enhancing the separability of confused modulation categories in the feature space. This module provides a more compact and discriminative feature representation for the subsequent discriminative enhancement classification module.
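The dimensions stated for the CA-TD Encoder can be checked with a small configuration sketch. It assumes unidirectional LSTM layers (a bidirectional layer would double the output width), takes the 256-channel output of the last DE-ResCNN stage as the input width, and uses a sequence length of 1024 purely as an example value. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CATDEncoderConfig:
    input_dim: int = 256         # channels of the last DE-ResCNN stage
    lstm1_hidden: int = 96       # first LSTM layer ("expand")
    lstm2_hidden: int = 48       # second LSTM layer ("compress")
    num_heads: int = 4           # multi-head self-attention heads
    num_encoder_layers: int = 1  # single-layer encoder

    @property
    def d_model(self):
        # The encoder width matches the second LSTM output, so no extra projection.
        return self.lstm2_hidden

    @property
    def head_dim(self):
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        return self.d_model // self.num_heads


def trace_shapes(cfg, seq_len):
    """List (stage name, shape) pairs; the time length is preserved throughout."""
    return [
        ("DE-ResCNN output", (seq_len, cfg.input_dim)),
        ("LSTM layer 1", (seq_len, cfg.lstm1_hidden)),
        ("LSTM layer 2", (seq_len, cfg.lstm2_hidden)),
        ("Transformer-Encoder", (seq_len, cfg.d_model)),
        ("per-head attention", (cfg.num_heads, seq_len, cfg.head_dim)),
    ]


cfg = CATDEncoderConfig()
shapes = trace_shapes(cfg, seq_len=1024)  # 1024 is an example length
```

With 48 features split over 4 heads, each attention head works on a 12-dimensional subspace.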

[0068] Thirdly, after completing local discriminative feature extraction and confusion-aware temporal discriminative encoding, the model needs to map the fused high-dimensional discriminative features to specific modulation category labels. In existing technologies, when recognizing confused modulation categories, relying on cross-entropy loss alone often fails to explicitly constrain the distribution of different modulation categories in the feature space, which can easily lead to unclear discriminative boundaries between confused categories.

[0069] Based on the traditional Softmax classification structure, this invention introduces a Discriminative Enhancement Classifier (DE-Classifier) module, the overall structure of which is shown in Figure 4.

[0070] The input to the DE-Classifier is the high-dimensional temporal discriminative features output by the CA-TD Encoder. To further enhance the discriminative power of the features and reduce redundant information, this invention first employs a fully connected layer to compress the feature dimension to a lower-dimensional embedding space. The output dimension of this fully connected layer is set to 64, which reduces the feature dimension while maintaining discriminative information, thereby improving classification stability and reducing the number of model parameters. Compared to directly using higher-dimensional embeddings, this setting can effectively avoid the feature space being too sparse, which is more beneficial for discriminating modulation categories. After the discriminative embedding layer, a second fully connected layer is introduced to map the features to the modulation category space, and its output dimension is equal to the total number of modulation categories, $M$. The Softmax function is then used to calculate the predicted probability of each modulation category, thus completing the final classification decision.
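The two-layer classification head can be sketched in pure Python as follows. The 64-dimensional embedding and the category count of 24 come from the text (24 is the number of classes in the dataset used for verification). The incoming feature width of 48, the ReLU between the two layers, and the random initialization are assumptions made only so the example runs. All names are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Fully connected layer: y[j] = sum_i weight[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small uniform random weights and zero bias (initialization is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)]
              for _ in range(n_out)]
    return weight, [0.0] * n_out


rng = random.Random(0)
GLOBAL_DIM = 48    # assumed width of the incoming feature vector
EMBED_DIM = 64     # embedding width stated in the text
NUM_CLASSES = 24   # M, the total number of modulation categories

w1, b1 = init_layer(EMBED_DIM, GLOBAL_DIM, rng)
w2, b2 = init_layer(NUM_CLASSES, EMBED_DIM, rng)


def classify(global_feature):
    """Return the 64-dimensional embedding and the M class probabilities."""
    embedding = [max(0.0, v) for v in linear(global_feature, w1, b1)]  # ReLU assumed
    probs = softmax(linear(embedding, w2, b2))
    return embedding, probs


feature = [rng.gauss(0.0, 1.0) for _ in range(GLOBAL_DIM)]
embedding, probs = classify(feature)
```

The embedding is returned alongside the probabilities because any additional constraint on the feature distribution would operate on the 64-dimensional embedding rather than on the Softmax output.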

[0071] Regarding the loss function, DE-Classifier uses cross-entropy loss as the basic classification loss, while introducing a discriminative enhancement term for confused modulation categories to impose additional constraints on the feature distribution. The overall loss function is defined as: $L = L_{CE} + \lambda L_{DE}$ (6). In the formula, the basic classification loss $L_{CE}$ is the standard cross-entropy loss, used to ensure the overall classification accuracy of the model. The discriminative enhancement loss $L_{DE}$ is a constraint term introduced into the aforementioned 64-dimensional feature embedding space for highly confusable modulation categories (such as higher-order QAM or same-family modulations). Its physical meaning lies in explicitly calculating and minimizing the distance between same-class samples and their feature center (preserving feature aggregation), while maximizing the margin between the feature centers of different confused categories (increasing the distribution spacing). $\lambda$ is a dynamic weight coefficient: to avoid introducing training instability, its value is dynamically adjusted within the range of 0.1 to 0.3 from the beginning to the end of training. This joint loss mechanism drives the model to learn clearer and more separable decision boundaries, thus significantly reducing the probability of confusion and misjudgment.

[0072] In one embodiment, the weight coefficient $\lambda$ of the discriminative enhancement loss is set in the range of 0.1–0.3; values in this range can effectively enhance the ability to distinguish between confused modulation categories without introducing training instability.
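One possible form of the joint loss, cross-entropy plus a weighted discriminative enhancement term, is sketched below. The text does not give the exact expression of the enhancement term, so the version here is an assumption: the mean squared distance of each sample to its class center, plus a hinge penalty when the centers of a confused pair are closer than a margin. The linear ramp of the weight from 0.1 to 0.3 is likewise an assumption, since the text only gives the range. All function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class, clipped for numerical safety."""
    return -math.log(max(probs[label], 1e-12))


def class_centers(embeddings, labels):
    """Mean embedding per class label."""
    sums, counts = {}, {}
    for emb, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(emb))
        for i, v in enumerate(emb):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def de_loss(embeddings, labels, confused_pairs, margin=1.0):
    """Assumed enhancement term: intra-class compactness + inter-center margin."""
    centers = class_centers(embeddings, labels)
    intra = sum(sq_dist(e, centers[y])
                for e, y in zip(embeddings, labels)) / len(labels)
    inter = 0.0
    for a, b in confused_pairs:
        if a in centers and b in centers:
            gap = math.sqrt(sq_dist(centers[a], centers[b]))
            inter += max(0.0, margin - gap)  # penalize centers closer than margin
    return intra + inter


def lambda_schedule(epoch, total_epochs, lo=0.1, hi=0.3):
    """Linear ramp from lo to hi over training (the ramp shape is an assumption)."""
    if total_epochs <= 1:
        return hi
    return lo + (hi - lo) * epoch / (total_epochs - 1)


def joint_loss(probs_batch, embeddings, labels, confused_pairs, lam):
    ce = sum(cross_entropy(p, y) for p, y in zip(probs_batch, labels)) / len(labels)
    return ce + lam * de_loss(embeddings, labels, confused_pairs)


# Toy 2-D embeddings (the text uses a 64-dimensional space).
overlapping = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.0], [0.3, 0.0]]
separated = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0]]
labels = [0, 0, 1, 1]
pairs = [(0, 1)]
```

On the toy data the overlapping clusters incur a clearly positive enhancement loss, while the tight, well-separated clusters incur none, so the term only pushes on classes that are actually entangled in the embedding space.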

[0073] Verification experiment: To verify the effectiveness of the proposed modulation recognition method based on discriminant enhancement (DE-Classifier), comparative experiments were conducted on the standard RML2018.01a dataset (containing 24 modulation categories). Classic deep learning benchmark models such as CNN1, ResNet, CLDNN, and MCLDNN were selected as comparison targets.

[0074] Overall recognition accuracy comparison and analysis: Figure 5 shows the overall recognition accuracy of each model over the signal-to-noise ratio (SNR) range of -20 dB to 20 dB. In the low-SNR region, the proposed method still maintains good noise resistance. As shown in the magnified inset of the figure, when the SNR is greater than 0 dB, the accuracy of the traditional benchmark models tends to saturate at around 88% to 94% because of their limited ability to extract features from high-order modulation signals (such as the high-order QAM family). This invention, by introducing a discriminative enhancement module, effectively widens the inter-class distance and narrows the intra-class difference, breaking through this bottleneck and maintaining the best performance across the entire SNR range, with the highest recognition accuracy approaching 99%.

[0075] Analysis of the ability to distinguish between confused categories: To visually demonstrate the advantages of this invention in resolving "easily confused categories," Figure 6 compares the confusion matrices of CNN1, ResNet, MCLDNN, CLDNN, and the method of this invention at SNR = 8 dB. In the confusion matrices of Figure 6 (a) CNN1, (b) ResNet, (c) MCLDNN, and (d) CLDNN, there are many obvious dark cells off the main diagonal, especially among higher-order modulation signals such as 16QAM, 32QAM, and 64QAM, where serious mutual misjudgments occur. By contrast, the confusion matrix of the present invention in Figure 6 shows deep blue blocks along its main diagonal and very clean off-diagonal regions (nearly zero misclassifications). This indicates that the present method, through the joint loss constraint on the feature space, resolves the feature overlap of easily confused categories in the latent vector space and significantly improves the accuracy of fine-grained classification.

[0076] In the task of identifying easily confused modulation categories, the method proposed in this invention effectively strengthens the discrimination boundary between similar modulation categories through multi-feature collaborative modeling and a discriminative enhancement mechanism. Confusion matrix analysis results show that, under medium-to-low SNR conditions, the model significantly alleviates misclassification between high-order QAM schemes and between modulation schemes of the same family.
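The off-diagonal reading of a confusion matrix described above can be turned into a number with a short helper. The sketch below computes, for a pair of classes, the fraction of their samples that were predicted as the other class. The matrix values are synthetic and chosen only to exercise the function; they are not results from the experiments. Rows are assumed to be true classes and columns predicted classes.

```python
def pairwise_confusion(matrix, labels, a, b):
    """Fraction of samples of classes a and b that were predicted as the other one."""
    i, j = labels.index(a), labels.index(b)
    total = sum(matrix[i]) + sum(matrix[j])
    return (matrix[i][j] + matrix[j][i]) / total if total else 0.0


def overall_accuracy(matrix):
    """Trace of the confusion matrix divided by the total sample count."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Synthetic counts for illustration only (rows: true class, columns: predicted).
qam_labels = ["16QAM", "32QAM", "64QAM"]
toy_matrix = [
    [80, 15, 5],
    [10, 70, 20],
    [5, 25, 70],
]

rate_32_64 = pairwise_confusion(toy_matrix, qam_labels, "32QAM", "64QAM")
accuracy = overall_accuracy(toy_matrix)
```

A per-pair rate of this kind isolates the mutual misjudgment between two specific categories, which the overall accuracy alone can hide.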

[0077] The experimental design and analysis results show that the proposed method has good stability and potential advantages in complex channel environments, especially in the task of identifying easily confused modulation categories, demonstrating strong discriminative ability.

[0078] The above-disclosed embodiments are merely some preferred embodiments of the present invention and should not be construed as limiting the scope of the invention. Those skilled in the art will understand that implementing all or part of the above-described embodiments and making equivalent changes in accordance with the claims of the present invention are still within the scope of the invention.

Claims

1. A discriminative enhancement fusion modulation recognition method for confused modulation categories, characterized in that it includes the following steps: preprocessing the signal to construct multi-channel signal features for input to a model, the model including a discriminative enhancement residual convolutional feature extraction module, a confusion-aware temporal discriminative coding module, and a discriminative enhancement classification module; inputting the multi-channel signal features into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction; constructing the confusion-aware temporal discriminative coding module, taking the local discriminative features as its input and, relying on the multi-head self-attention mechanism and a one-dimensional convolution operation, outputting high-dimensional temporal discriminative features that simultaneously contain local temporal information and global temporal correlation features; and constructing the discriminative enhancement classification module, taking the high-dimensional temporal discriminative features as its input, mapping the features to the modulation category space through two fully connected layers, and calculating the predicted probability of each modulation category by the Softmax function to complete the final classification decision.

2. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 1, characterized in that preprocessing the signal to construct the multi-channel signal features for input to the model includes the following steps: normalizing the signal by energy to eliminate the influence of channel gain variation; relativizing the instantaneous phase to preserve the relative structure of the phase change; and constructing the multi-channel signal features for input to the model.

3. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 2, characterized in that the normalized signal is represented as: $\tilde{r}(t) = r(t) \big/ \sqrt{\frac{1}{N}\sum_{n=1}^{N} |r(n)|^{2}}$ (1), and the corresponding IQ components are: $I(t) = \mathrm{Re}\{\tilde{r}(t)\}$, $Q(t) = \mathrm{Im}\{\tilde{r}(t)\}$ (2), where $r(t)$ represents the original complex baseband signal received at the t-th sampling time; $N$ represents the total number of signal sampling points contained within an observation window or a single signal sample; $\tilde{r}(t)$ represents the complex baseband signal after energy normalization, a process used to eliminate the influence of channel gain variations on signal amplitude; $I(t)$ and $Q(t)$ represent the in-phase and quadrature components of the normalized signal, respectively; and $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ represent the operations of taking the real part and imaginary part of a complex number, respectively.

4. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 2, characterized in that the relativization of the instantaneous phase includes the following steps: calculating the phase mean: $\bar{\varphi} = \frac{1}{N}\sum_{t=1}^{N} \varphi(t)$ (3); and obtaining the phase sequence after phase alignment: $\tilde{\varphi}(t) = \varphi(t) - \bar{\varphi}$ (4), where $\bar{\varphi}$ represents the average instantaneous phase of the signal over the $N$ sampling points of an observation window; $\varphi(t)$ represents the instantaneous phase of the signal at the t-th sampling time; and $\tilde{\varphi}(t)$ represents the relativized phase sequence after subtracting the mean, which eliminates the uncertainty introduced by the initial phase shift, i.e., the overall phase translation.

5. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 2, characterized in that the multi-channel signal features are represented as follows: $X = [I(t), Q(t), A(t), \tilde{\varphi}(t), \Delta\varphi(t)]$ (5), where $I(t)$ and $Q(t)$ represent the in-phase and quadrature components of the normalized signal, respectively; $A(t)$ represents the instantaneous amplitude of the signal; $\tilde{\varphi}(t)$ represents the relativized phase sequence after subtracting the mean, an operation that eliminates the uncertainty introduced by the initial phase shift, i.e., the overall phase translation; and $\Delta\varphi(t)$ represents the instantaneous phase difference between adjacent sampling points.

6. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 1, characterized in that inputting the multi-channel signal features into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction proceeds as follows: performing an initial mapping of the multi-channel input features using one-dimensional convolution; entering the residual backbone, which consists of at least three residual blocks stacked together; and introducing phase difference-guided discriminative gating and weighting the residual mapping output to output the local discriminative features.

7. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 1, characterized in that the confusion-aware temporal discriminative coding module includes two long short-term memory (LSTM) network layers and one improved Transformer-Encoder layer.

8. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 1, characterized in that inputting the multi-channel signal features into the discriminative enhancement residual convolutional feature extraction module for local discriminative feature extraction specifically includes the following steps: building the input layer; preliminary feature mapping, in which the multi-channel input sequence first passes through a one-dimensional convolutional layer for preliminary mapping to obtain low-level local representations; constructing the backbone for residual feature extraction; introducing a gated bypass alongside the residual backbone; and multiplying the gating weight sequence element-wise with the feature map output by the residual backbone, so that the final output is a local discriminative feature that contains fine-grained structure and is highly responsive to confusion-related features.

9. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 1, characterized in that the specific steps by which the discriminative enhancement classification module completes the final classification decision are as follows: a first fully connected layer compresses the feature dimension into a low-dimensional embedding space; a second fully connected mapping layer maps the features to the modulation category space, its output dimension being equal to the total number of modulation categories; and the Softmax function calculates the predicted probability of each modulation category to complete the final classification decision.

10. The discriminative enhancement fusion modulation recognition method for confused modulation categories as described in claim 9, characterized in that the discriminative enhancement classification module uses cross-entropy loss as the basic classification loss, and the overall loss function is defined as follows: $L = L_{CE} + \lambda L_{DE}$ (6), where $L_{CE}$ represents the standard cross-entropy loss, $L_{DE}$ represents the discriminative enhancement term, and $\lambda$ is the weighting coefficient.