Wideband signal detection and recognition method based on time-frequency representation optimization and network reconstruction
Using time-frequency representation optimization and network reconstruction, a complex-valued time-frequency matrix is generated, and the time-frequency feature representations derived from it are divided into an energy-phase joint domain and an energy domain. Combined with reconstructed detection networks that are trained and updated on these inputs, the method addresses the loss of phase information in existing methods and achieves more efficient wideband signal detection and recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- AIR FORCE UNIV PLA
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing broadband signal detection and recognition methods discard phase information when processing complex time-frequency spectra, leading to a decline in the performance of modulation recognition models. Furthermore, existing research has not explored in depth the impact of different signal representation methods on task performance.
A method based on time-frequency representation optimization and network reconstruction is adopted. A complex-valued time-frequency matrix is generated through a short-time Fourier transform and norm normalization, and the time-frequency feature representations derived from it are divided into an energy-phase joint domain and an energy domain. These representations are input into ResNet50 and CV-ResNet50 backbone networks, whose feature maps are passed to the Sparse R-CNN and DINO target detection models. A common basis loss is defined for updating the models.
The method achieves accurate signal detection and recognition, avoids the information loss caused by traditional power transformation, and improves model robustness and recognition accuracy, with the amplitude-spectrum representation performing particularly well.

Figure CN122247540A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of signal detection and recognition technology, and in particular to a broadband signal detection and recognition method based on time-frequency characterization optimization and network reconstruction.
Background
[0002] With the continuous expansion of fifth-generation communication user terminals and application scenarios, data transmission rates and connection densities in wireless communication have steadily improved. However, the contradiction between limited spectrum resources and ever-increasing communication demands is becoming increasingly prominent. Wideband Signal Detection and Recognition (WSDR), as an intelligent spectrum sensing technology, provides crucial support for on-demand air interface resource allocation and proactive interference avoidance, and has become an important means to improve spectrum utilization. Compared to performing signal detection and modulation recognition separately, WSDR's end-to-end joint processing approach better meets the needs of actual dynamic spectrum management and has significant application potential in complex electromagnetic environments.
[0003] Currently, the mainstream approach to WSDR constructs time-frequency feature representations of the sampled signal through time-frequency transformation and feature mapping, and then uses advanced target detection models to achieve time-frequency localization and modulation recognition of multiple signals. Existing research focuses mainly on two directions: first, performance optimization for complex electromagnetic scenarios, aiming to improve signal localization and recognition accuracy under conditions such as low signal-to-noise ratio, few-shot learning, and overlapping signals; second, network architecture innovation based on signal structure priors, in which more efficient anchor-free detection mechanisms are built by introducing time-frequency center or key-point constraints.
[0004] Although existing methods have made progress on specific metrics, they generally share a common limitation: in the feature mapping stage, the time-frequency spectrum is usually mapped to a real-valued grayscale or color (red-green-blue, RGB) image through logarithmic power transformation and image quantization.
[0005] This processing makes the input compatible with visual object detection frameworks, but when applied to complex-valued time-frequency spectra it discards phase information and introduces nonlinear amplitude compression, and its rationality and effectiveness have not been verified. In automatic modulation recognition, existing studies have shown that different signal representations, such as baseband in-phase/quadrature (I/Q) sequences, amplitude/phase sequences, and complex-valued sequences, significantly affect the performance of modulation recognition models. In WSDR, however, the impact of different time-frequency feature representations on task performance has not been thoroughly explored.
[0006] Therefore, it is necessary to provide a new technical solution to improve one or more of the problems existing in the above solutions.
[0007] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0008] The purpose of this application is to provide a wideband signal detection and recognition method based on time-frequency representation optimization and network reconstruction, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
[0009] According to an embodiment of this application, a wideband signal detection and recognition method based on time-frequency representation optimization and network reconstruction includes the following. The received wideband signal is frequency-converted using a local oscillator signal and then low-pass filtered to obtain a baseband complex signal. The baseband complex signal is subjected to a short-time Fourier transform and norm normalization to obtain a complex-valued time-frequency matrix, from which ten different time-frequency feature representations are derived: complex value, amplitude/phase, in-phase/quadrature, in-phase magnitude/quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement. The ten representations are divided into an energy-phase joint domain, which includes complex value, amplitude/phase, and in-phase/quadrature, and an energy domain, which includes in-phase magnitude/quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement. The representations of both domains are used as input time-frequency features and fed into a backbone network, which includes ResNet50 and CV-ResNet50, to obtain the corresponding feature maps. The feature maps are fed into a target detection model, which includes a Sparse R-CNN model and a DINO model, to obtain several sets of prediction results, each consisting of a time-frequency bounding box and modulation class probabilities. A common basis loss is defined to adapt both the Sparse R-CNN model and the DINO model, and the total loss of each model is obtained from the common basis loss.
The Sparse R-CNN model is updated based on its total loss to obtain an updated Sparse R-CNN model, and the DINO model is updated based on its total loss to obtain an updated DINO model. The feature maps are then input into the updated Sparse R-CNN model and the updated DINO model, respectively, to obtain several sets of final prediction results, each consisting of a final time-frequency bounding box and final modulation class probabilities.
[0010] In the embodiments of this application, the step of frequency-converting the received wideband signal using the local oscillator signal and then low-pass filtering it to obtain a baseband complex signal includes the following. The wideband signal is expressed as:
$$r(t) = \sum_{i=1}^{N} h_i(t)\,\Re\left\{ s_i(t)\, e^{j2\pi f_i t} \right\} + w(t), \quad s_i(t) = A_i(t)\, e^{j\varphi_i(t)} \tag{1}$$
where $r(t)$ denotes the wideband signal; $s_i(t)$ denotes the baseband modulation signal of the $i$-th signal, with time-varying amplitude $A_i(t)$ and modulation phase $\varphi_i(t)$; $h_i(t)$ denotes the time-domain channel gain; $w(t)$ denotes the real-valued additive white Gaussian noise introduced at the receiver front end; $N$ denotes the total number of signals; $e^{j2\pi f_i t}$ denotes the complex exponential carrier, where $j$ is the imaginary unit and $\pi \approx 3.14$; $f_i$ denotes the frequency of the $i$-th signal; $\Re\{s_i(t)\,e^{j2\pi f_i t}\}$ indicates that the $i$-th baseband modulation signal is modulated onto frequency $f_i$ and its real part is transmitted; and $t$ denotes time.
[0011] In the embodiments of this application, the step of frequency-converting the received wideband signal using the local oscillator signal and then low-pass filtering it to obtain a baseband complex signal includes the following. The baseband complex signal is expressed as:
$$y(t) = \mathrm{LPF}\left\{ r(t)\, e^{-j2\pi f_{\mathrm{LO}} t} \right\} = \sum_{i=1}^{N} h_i(t)\, s_i(t)\, e^{j2\pi \Delta f_i t} + \tilde{w}(t) \tag{2}$$
where $y(t)$ denotes the baseband complex signal (written up to the constant scale factor introduced by mixing); $\Delta f_i = f_i - f_{\mathrm{LO}}$ denotes the frequency offset between the $i$-th signal frequency and the local oscillator frequency $f_{\mathrm{LO}}$; $2\pi\Delta f_i t$ is the carrier phase drift term caused by the frequency offset; $\tilde{w}(t)$ denotes the complex baseband noise; $e^{-j2\pi f_{\mathrm{LO}} t}$ denotes the local oscillator signal; and $\mathrm{LPF}\{\cdot\}$ denotes low-pass filtering.
[0012] In the embodiments of this application, the step of performing a short-time Fourier transform and norm normalization on the baseband complex signal to obtain a complex-valued time-frequency matrix includes: performing a short-time Fourier transform on the baseband complex signal to obtain a time-frequency matrix, expressed as:
$$X(m,k) = \sum_{n=0}^{M-1} y[n + mS]\, g[n]\, e^{-j2\pi kn/M} \tag{3}$$
where $X(m,k)$ denotes the time-frequency matrix; $m$ denotes its time-frame index; $k$ denotes its frequency index; $y[\cdot]$ denotes the complex signal sequence; $g[n]$ denotes the window function; $S$ denotes the step size controlling the window offset between adjacent frames; $M$ denotes the length of the short-time Fourier transform; $n$ denotes the local index within the short-time Fourier transform window, $0 \le n \le M-1$; and $e^{-j2\pi kn/M}$ is the phase term.
[0013] In the embodiments of this application, the step of performing a short-time Fourier transform and norm normalization on the baseband complex signal to obtain a complex-valued time-frequency matrix includes: normalizing the time-frequency matrix by its norm to obtain the complex-valued time-frequency matrix, calculated as:
$$\tilde{X}(m,k) = \frac{X(m,k)}{\lVert X \rVert_\infty}, \quad \lVert X \rVert_\infty = \max_{m,k} \lvert X(m,k) \rvert \tag{4}$$
where $\tilde{X}$ denotes the complex-valued time-frequency matrix and $\lVert X \rVert_\infty$ denotes the maximum modulus over all elements of the time-frequency matrix $X$.
[0014] In the embodiments of this application, the step of using the energy-phase joint domain and the energy domain as input time-frequency features and inputting them into the backbone network to obtain the corresponding feature maps includes the following. The input time-frequency features comprise real-valued tensors and complex-valued tensors: the real-valued tensors are input into ResNet50, and the complex-valued tensors are input into CV-ResNet50. The complex convolution of the $l$-th layer of the backbone network is defined as:
$$\mathbf{W}^{(l)} * \mathbf{Z} = \left(\mathbf{W}_R^{(l)} * \mathbf{Z}_R - \mathbf{W}_I^{(l)} * \mathbf{Z}_I\right) + j\left(\mathbf{W}_R^{(l)} * \mathbf{Z}_I + \mathbf{W}_I^{(l)} * \mathbf{Z}_R\right) \tag{5}$$
where $\mathbf{W}^{(l)} * \mathbf{Z}$ denotes the complex convolution of the $l$-th layer; $\mathbf{W}_R^{(l)}$ and $\mathbf{W}_I^{(l)}$ denote the real and imaginary parts of the $l$-th layer complex convolution kernel; $\mathbf{Z}_R$ and $\mathbf{Z}_I$ denote the real and imaginary parts of the complex-valued tensor $\mathbf{Z} = \mathbf{Z}_R + j\mathbf{Z}_I \in \mathbb{C}^{H \times W \times 1}$; $H$ and $W$ denote the height and width of the time-frequency matrix; and $*$ denotes the convolution operation. The feature map is expressed as:
$$\mathbf{F} = f(\mathbf{T}; \theta) \tag{6}$$
where $\mathbf{F}$ denotes the feature map, $f(\cdot)$ denotes the feature extraction function, $\theta$ denotes the learnable parameters of the network, and $\mathbf{T}$ denotes the input time-frequency feature.
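[0015] In the embodiments of this application, the common basis loss is expressed as:
$$\mathcal{L}_{\mathrm{base}} = \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{focal}} + \lambda_{L1}\,\mathcal{L}_{L1} + \lambda_{\mathrm{GIoU}}\,\mathcal{L}_{\mathrm{GIoU}} \tag{7}$$
$$\mathcal{L}_{\mathrm{focal}} = -\alpha_t (1 - p_t)^{\gamma} \log p_t, \quad \mathcal{L}_{L1} = \lVert \hat{\mathbf{b}} - \mathbf{b} \rVert_1, \quad \mathcal{L}_{\mathrm{GIoU}} = 1 - \frac{|\hat{\mathbf{b}} \cap \mathbf{b}|}{|\hat{\mathbf{b}} \cup \mathbf{b}|} + \frac{|R| - |\hat{\mathbf{b}} \cup \mathbf{b}|}{|R|}$$
where $\mathcal{L}_{\mathrm{base}}$ denotes the common basis loss; $\lambda_{\mathrm{cls}}$, $\lambda_{L1}$, and $\lambda_{\mathrm{GIoU}}$ denote the balance coefficients of the classification loss, the bounding-box regression loss, and the localization regularization loss, respectively; $\mathcal{L}_{\mathrm{focal}}$ denotes the focal loss, in which $p_t$ equals the predicted modulation class probability $p$ for the ground-truth class and $1-p$ otherwise, and $\alpha_t$ and $\gamma$ are the focal-loss balancing and focusing parameters; $\mathcal{L}_{L1}$ denotes the mean absolute error loss used for coordinate regression, with $\lVert\cdot\rVert_1$ the $\ell_1$ norm; $\hat{\mathbf{b}} = (\hat{b}_x, \hat{b}_y, \hat{b}_w, \hat{b}_h)$ denotes the center coordinates, width, and height of the predicted box and $\mathbf{b} = (b_x, b_y, b_w, b_h)$ those of the ground-truth box; $\mathcal{L}_{\mathrm{GIoU}}$ denotes the bounding-box localization regularization term; $R$ denotes the smallest rectangle enclosing $\hat{\mathbf{b}}$ and $\mathbf{b}$; $|\cdot|$ denotes area; $\cap$ denotes intersection; and $\cup$ denotes union.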
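The common basis loss in formula (7) can be sketched for a single predicted box matched to a single ground-truth box. This NumPy sketch assumes a sigmoid (per-class) focal loss with the commonly used alpha = 0.25 and gamma = 2, boxes given as (center x, center y, width, height), and placeholder balance coefficients of 1.0; none of these values come from this application, and the function names are hypothetical. The matching of predictions to ground truth performed by Sparse R-CNN and DINO is omitted.

```python
import numpy as np

def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Sigmoid focal loss summed over modulation classes.

    p: predicted class probabilities in (0, 1), shape (C,)
    target: one-hot ground-truth vector, shape (C,)
    Assumption: alpha/gamma defaults are common choices, not values from this application.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(target == 1, p, 1.0 - p)            # probability of the correct outcome
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float(np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def to_corners(b):
    """(center x, center y, width, height) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def giou(b_pred, b_gt):
    """Generalized IoU: IoU minus the share of the enclosing box R not covered by the union."""
    x1, y1, x2, y2 = to_corners(b_pred)
    u1, v1, u2, v2 = to_corners(b_gt)
    inter = max(0.0, min(x2, u2) - max(x1, u1)) * max(0.0, min(y2, v2) - max(y1, v1))
    union = (x2 - x1) * (y2 - y1) + (u2 - u1) * (v2 - v1) - inter
    area_r = (max(x2, u2) - min(x1, u1)) * (max(y2, v2) - min(y1, v1))  # smallest enclosing box R
    return inter / union - (area_r - union) / area_r

def base_loss(p, target, b_pred, b_gt, lam_cls=1.0, lam_l1=1.0, lam_giou=1.0):
    """Weighted focal + L1 box regression + GIoU regularization.

    Assumption: lam_* are placeholder balance coefficients.
    """
    l_cls = focal_loss(p, target)
    l_l1 = float(np.sum(np.abs(np.asarray(b_pred, float) - np.asarray(b_gt, float))))
    l_giou = 1.0 - giou(b_pred, b_gt)
    return lam_cls * l_cls + lam_l1 * l_l1 + lam_giou * l_giou

# Hypothetical usage: one box in normalized time-frequency coordinates, three modulation classes.
loss = base_loss(np.array([0.9, 0.05, 0.05]), np.array([1, 0, 0]),
                 (0.52, 0.50, 0.20, 0.10), (0.50, 0.50, 0.20, 0.10))
```

The GIoU term still yields a useful gradient signal when predicted and ground-truth boxes do not overlap (plain IoU is then zero), which is one reason DETR-style detectors pair it with the L1 term.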
[0016] In the embodiments of this application, the total loss of the Sparse R-CNN model is expressed as:
$$\mathcal{L}_{\mathrm{Sparse}} = \sum_{k=1}^{K} \eta_k\, \mathcal{L}_{\mathrm{base}}^{(k)} \tag{8}$$
where $\mathcal{L}_{\mathrm{Sparse}}$ denotes the total loss of the Sparse R-CNN model; $K$ denotes the number of iterative stages, with $k$ ranging from 1 to $K$; $\eta_k$ denotes the weight of the $k$-th iterative stage; and $\mathcal{L}_{\mathrm{base}}^{(k)}$ denotes the common basis loss of the $k$-th iterative stage.
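[0017] In the embodiments of this application, the total loss of the DINO model is expressed as:
$$\mathcal{L}_{\mathrm{DINO}} = \mathcal{L}_{\mathrm{base}} + \lambda_{\mathrm{dn}}\, \mathcal{L}_{\mathrm{dn}} \tag{9}$$
where $\mathcal{L}_{\mathrm{DINO}}$ denotes the total loss of the DINO model, $\mathcal{L}_{\mathrm{dn}}$ denotes the auxiliary loss applied to the noised (denoising) queries, and $\lambda_{\mathrm{dn}}$ denotes the balance coefficient.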
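Formulas (8) and (9) combine the common basis loss into per-model totals. The sketch below assumes that formula (8) is a weighted sum over the K iterative stages of Sparse R-CNN and that formula (9) adds a weighted denoising-query loss to the common basis loss; the function names and the example values are hypothetical.

```python
def sparse_rcnn_total_loss(stage_losses, stage_weights):
    """Formula (8): weighted sum of the common basis loss over the K iterative stages."""
    if len(stage_losses) != len(stage_weights):
        raise ValueError("need exactly one weight per stage")
    return sum(w * l for w, l in zip(stage_weights, stage_losses))

def dino_total_loss(base, denoising, lam_dn):
    """Formula (9), assumed form: common basis loss plus weighted loss on noised queries."""
    return base + lam_dn * denoising

# Hypothetical values: six refinement stages with equal weights.
total_sparse = sparse_rcnn_total_loss([0.9, 0.7, 0.6, 0.5, 0.5, 0.4], [1.0] * 6)
total_dino = dino_total_loss(base=1.2, denoising=0.8, lam_dn=1.0)
```

Weighting every stage supervises each refinement head directly instead of only the last one, which is the usual motivation for deep supervision in iterative detectors.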
[0018] The technical solutions provided by the embodiments of this application may have the following beneficial effects. In one embodiment of this application, the method groups ten different time-frequency feature representations into an energy-phase joint domain and an energy domain, so that key information of the wideband signal, such as amplitude, phase, and energy, is represented from different dimensions of the time-frequency features. The representations of both domains are fed as input time-frequency features into a backbone network comprising ResNet50 and CV-ResNet50 to obtain feature maps, which are then fed into target detection models comprising a Sparse R-CNN model and a DINO model. Reconstructing the detection models in this way allows them to adapt to time-frequency feature representations of different styles. The Sparse R-CNN model and the DINO model are each trained and updated using their respective total losses, and the feature maps are then input into the updated models to obtain the final prediction results, achieving accurate signal detection and recognition. Because the reconstructed detection models accept multi-channel real-valued and complex-valued matrix inputs, the impact of different time-frequency feature representations on WSDR performance can be evaluated systematically: traditional power transformation and logarithmic processing cause information loss and a significant performance degradation, whereas the amplitude spectrum exhibits better robustness in WSDR tasks.
[0019] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Brief Description of the Drawings
[0020] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0021] Figure 1 is a flowchart of the steps of the wideband signal detection and recognition method based on time-frequency representation optimization and network reconstruction in this application.
Figure 2 is a schematic diagram of the WSDR processing flow based on time-frequency feature representation in this application.
Figure 3 is a schematic diagram of the WSDR processing flow and model architecture under different time-frequency feature representations in this application.
Figure 4 is an enlarged view of part V in Figure 3.
Figure 5 is an enlarged view of part Z in Figure 3.
Figure 6 is a schematic diagram of the variation of the mean average recall (mAR) of the Sparse R-CNN model under different signal-to-noise ratios in this application.
Figure 7 is a schematic diagram of the variation of the mean average precision (mAP) of the Sparse R-CNN model under different signal-to-noise ratios in this application.
Figure 8 is a schematic diagram of the modulation classification confusion matrix for the complex-valued representation under the DINO model in this application.
Figure 9 is a schematic diagram of the modulation classification confusion matrix for the amplitude/phase representation under the DINO model in this application.
Figure 10 is a schematic diagram of the modulation classification confusion matrix for the amplitude representation under the DINO model in this application.
Figure 11 is a schematic diagram of the modulation classification confusion matrix for the complex-valued representation under the Sparse R-CNN model in this application.
Figure 12 is a schematic diagram of the modulation classification confusion matrix for the amplitude/phase representation under the Sparse R-CNN model in this application.
Figure 13 is a schematic diagram of the modulation classification confusion matrix for the amplitude representation under the Sparse R-CNN model in this application.
Detailed Description
[0022] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided to make this application more comprehensive and complete, and to fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0023] Furthermore, the accompanying drawings are merely illustrative of this application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore, repeated descriptions of them will be omitted.
[0024] This example embodiment provides a wideband signal detection and recognition method based on time-frequency representation optimization and network reconstruction. As shown in Figure 1, the method may include steps S101 to S108.
[0025] In step S101, the received broadband signal is frequency-converted using the local oscillator signal and then low-pass filtered to obtain a baseband complex signal.
[0026] Step S102: Perform a short-time Fourier transform and norm normalization on the baseband complex signal to obtain a complex-valued time-frequency matrix, from which ten different time-frequency feature representations are derived: complex value, amplitude/phase, in-phase/quadrature, in-phase magnitude/quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement.
[0027] Step S103: Divide the ten different time-frequency feature representations into the energy-phase joint domain and the energy domain; wherein, the energy-phase joint domain includes: complex value, amplitude / phase, in-phase / quadrature, and the energy domain includes: in-phase magnitude / quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement.
[0028] Step S104: Use the representations of the energy-phase joint domain and the energy domain as input time-frequency features and feed them into the backbone network to obtain the corresponding feature maps; the backbone network includes ResNet50 and CV-ResNet50.
[0029] Step S105: Feed the feature maps into the target detection model to obtain several sets of prediction results; the target detection model includes the Sparse R-CNN model and the DINO model, and each set of prediction results includes a time-frequency bounding box and modulation class probabilities.
[0030] Step S106: Define the common basis loss for the Sparse R-CNN model and the DINO model, and obtain the total loss of the Sparse R-CNN model and the total loss of the DINO model based on the common basis loss.
[0031] Step S107: Update the Sparse R-CNN model based on the total loss of the Sparse R-CNN model to obtain the updated Sparse R-CNN model. Update the DINO model based on the total loss of the DINO model to obtain the updated DINO model.
[0032] Step S108: Input the feature maps into the updated Sparse R-CNN model and the updated DINO model, respectively, to obtain several sets of final prediction results, where each set includes a final time-frequency bounding box and final modulation class probabilities.
[0033] In one embodiment of this application, the method groups ten different time-frequency feature representations into an energy-phase joint domain and an energy domain, so that key information of the wideband signal, such as amplitude, phase, and energy, is represented from different dimensions of the time-frequency features. The representations of both domains are fed as input time-frequency features into a backbone network comprising ResNet50 and CV-ResNet50 to obtain feature maps, which are then fed into target detection models comprising a Sparse R-CNN model and a DINO model. Reconstructing the detection models in this way allows them to adapt to time-frequency feature representations of different styles. The Sparse R-CNN model and the DINO model are each trained and updated using their respective total losses, and the feature maps are then input into the updated models to obtain the final prediction results, achieving accurate signal detection and recognition. Because the reconstructed detection models accept multi-channel real-valued and complex-valued matrix inputs, the impact of different time-frequency feature representations on WSDR performance can be evaluated systematically: traditional power transformation and logarithmic processing cause information loss and a significant performance degradation, whereas the amplitude spectrum exhibits better robustness in WSDR tasks.
[0034] The steps of the above method in this example embodiment are explained in more detail below with reference to Figures 2 to 5.
[0035] Before the method is described, the WSDR processing flow is explained. As shown in Figure 2, the receiver first captures a wideband signal and converts it into a baseband waveform. The signal then undergoes a time-frequency transformation and optional preprocessing (such as normalization and logarithmic transformation) before being fed into a detector containing a target detection model, which performs signal detection and modulation type identification.
[0036] In step S101, the broadband signal is received by the receiver. The received broadband signal is modeled as a superposition of N simultaneously transmitted modulated signals. Each modulated signal is distorted by the time-varying channel response and is also affected by additive noise.
[0037] In one embodiment, step S101 includes the following. The wideband signal is expressed as:
$$r(t) = \sum_{i=1}^{N} h_i(t)\,\Re\left\{ s_i(t)\, e^{j2\pi f_i t} \right\} + w(t), \quad s_i(t) = A_i(t)\, e^{j\varphi_i(t)} \tag{1}$$
where $r(t)$ denotes the wideband signal; $s_i(t)$ denotes the baseband modulation signal of the $i$-th signal, with time-varying amplitude $A_i(t)$ and modulation phase $\varphi_i(t)$; $h_i(t)$ denotes the time-domain channel gain; $w(t)$ denotes the real-valued additive white Gaussian noise introduced at the receiver front end; $N$ denotes the total number of signals; $e^{j2\pi f_i t}$ denotes the complex exponential carrier, where $j$ is the imaginary unit and $\pi \approx 3.14$; $f_i$ denotes the frequency of the $i$-th signal; $\Re\{s_i(t)\,e^{j2\pi f_i t}\}$ indicates that the $i$-th baseband modulation signal is modulated onto frequency $f_i$ and its real part is transmitted; and $t$ denotes time.
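The signal model of formula (1) can be simulated directly, which is convenient for generating test inputs. This NumPy sketch is illustrative only: the function name, sample rate, carrier frequencies, QPSK-style phases, and noise level are all hypothetical choices, not parameters from this application.

```python
import numpy as np

def wideband_signal(t, amps, phases, gains, freqs, noise_std=0.0, rng=None):
    """Formula (1): sum of real passband signals plus real-valued AWGN.

    For each signal i: amps[i] is the time-varying amplitude, phases[i] the modulation
    phase, gains[i] the time-domain channel gain (scalars or arrays shaped like t),
    and freqs[i] the carrier frequency in Hz.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.zeros(t.shape)
    for A, phi, h, f in zip(amps, phases, gains, freqs):
        s = A * np.exp(1j * phi)                               # baseband modulation signal
        r += h * np.real(s * np.exp(1j * 2 * np.pi * f * t))   # modulate, keep real part
    return r + noise_std * rng.standard_normal(t.shape)        # receiver front-end noise

def qpsk_phase(n_sym, sps, rng):
    """Hypothetical helper: random QPSK phases, each held for sps samples."""
    return np.repeat(rng.choice([1, 3, 5, 7], n_sym) * np.pi / 4, sps)

# Hypothetical example: two signals at 1.0 MHz and 2.5 MHz, sampled at 10 MHz.
fs = 10e6
t = np.arange(4096) / fs
rng = np.random.default_rng(0)
r = wideband_signal(t, [1.0, 0.5], [qpsk_phase(512, 8, rng), qpsk_phase(256, 16, rng)],
                    [1.0, 0.8], [1.0e6, 2.5e6], noise_std=0.05, rng=rng)
```

Here the channel gains are constants for simplicity; passing arrays shaped like `t` models the time-varying gain in formula (1).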
[0038] In one embodiment, the received wideband signal is frequency-converted using a local oscillator signal and then low-pass filtered to obtain a baseband complex signal, expressed as:
$$y(t) = \mathrm{LPF}\left\{ r(t)\, e^{-j2\pi f_{\mathrm{LO}} t} \right\} = \sum_{i=1}^{N} h_i(t)\, s_i(t)\, e^{j2\pi \Delta f_i t} + \tilde{w}(t) \tag{2}$$
where $y(t)$ denotes the baseband complex signal (written up to the constant scale factor introduced by mixing); $\Delta f_i = f_i - f_{\mathrm{LO}}$ denotes the frequency offset between the $i$-th signal frequency and the local oscillator frequency $f_{\mathrm{LO}}$; $2\pi\Delta f_i t$ is the carrier phase drift term caused by the frequency offset; $\tilde{w}(t)$ denotes the complex baseband noise; $e^{-j2\pi f_{\mathrm{LO}} t}$ denotes the local oscillator signal; and $\mathrm{LPF}\{\cdot\}$ denotes low-pass filtering.
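The frequency conversion and low-pass filtering of step S101 can be sketched on sampled data. This sketch assumes a complex local oscillator and a Hamming-windowed sinc FIR low-pass filter with a cutoff of fs/4; the filter design, tap count, and function names are hypothetical, since the application does not specify them.

```python
import numpy as np

def lowpass_taps(num_taps, cutoff, fs):
    """Windowed-sinc FIR low-pass filter (Hamming window), normalized to unity DC gain."""
    k = np.arange(num_taps) - (num_taps - 1) / 2
    taps = (2 * cutoff / fs) * np.sinc(2 * cutoff / fs * k) * np.hamming(num_taps)
    return taps / taps.sum()

def downconvert(r, fs, f_lo, cutoff=None, num_taps=101):
    """Mix the real received signal with a complex LO, then low-pass filter.

    Assumption: cutoff defaults to fs / 4 and num_taps to 101 (illustrative only).
    """
    n = np.arange(len(r))
    mixed = r * np.exp(-1j * 2 * np.pi * f_lo * n / fs)   # moves f_i to f_i - f_lo
    cutoff = fs / 4 if cutoff is None else cutoff
    # mode="same" with an odd-length symmetric filter keeps the output time-aligned
    return np.convolve(mixed, lowpass_taps(num_taps, cutoff, fs), mode="same")

# Hypothetical example: a 2.1 kHz tone with a 2.0 kHz LO leaves a 100 Hz complex tone.
fs = 1e4
t = np.arange(2000) / fs
y = downconvert(np.cos(2 * np.pi * 2100 * t), fs, f_lo=2000.0)
```

Away from the edges, `y` is approximately `0.5 * exp(j*2*pi*100*t)`: mixing a real cosine produces images at the offset frequency and at minus the sum frequency, the filter removes the latter, and the surviving image carries half the amplitude. This is the constant factor that formula (2) leaves out.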
[0039] For the ideal (noise-free) component of the $i$-th signal in the baseband complex signal, the complex form can be expressed as:
$$x_i(t) = h_i(t)\, s_i(t)\, e^{j2\pi \Delta f_i t} = a_i(t)\, e^{j\theta_i(t)}, \quad a_i(t) = h_i(t)\, A_i(t) \tag{10}$$
where $x_i(t)$ denotes the complex form of the ideal component of the $i$-th signal, $a_i(t)$ its amplitude, and $\theta_i(t)$ its total phase. From the definition of the phase of a complex signal, the phase of the ideal component of the $i$-th signal is:
$$\theta_i(t) = \varphi_i(t) + 2\pi \Delta f_i t \tag{11}$$
where the modulation phase $\varphi_i(t)$ of the wideband signal changes at a rate determined by the symbol rate, and $2\pi \Delta f_i t$, with $\Delta f_i$ the frequency offset, is the carrier phase drift term caused by the frequency offset; the drift term changes much faster than the modulation phase. This rapidly varying component can completely mask the inherent characteristics of $\varphi_i(t)$, causing severe phase aliasing. In addition, the complex baseband noise $\tilde{w}(t)$ introduces random jitter into the phase, further aggravating phase distortion.
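The masking effect described for formulas (10) and (11) can be checked numerically: with a frequency offset present, the observed phase no longer matches the modulation phase, and only after removing the drift term (which requires knowing the offset) do the symbol phases reappear. The sample rate, samples per symbol, offset, and QPSK phases below are hypothetical.

```python
import numpy as np

def ideal_component(amp, phi, delta_f, t):
    """Formulas (10)-(11): amp * exp(j * (phi + 2*pi*delta_f*t))."""
    return amp * np.exp(1j * (phi + 2 * np.pi * delta_f * t))

fs, sps, delta_f = 1e6, 8, 50e3   # hypothetical sample rate, samples/symbol, frequency offset
rng = np.random.default_rng(0)
sym_phase = rng.choice([1, 3, -3, -1], size=64) * np.pi / 4   # QPSK modulation phases
phi = np.repeat(sym_phase, sps)
t = np.arange(phi.size) / fs
x = ideal_component(1.0, phi, delta_f, t)

observed = np.angle(x)                                            # modulation phase + drift
recovered = np.angle(x * np.exp(-1j * 2 * np.pi * delta_f * t))  # drift removed
drift_per_symbol = 2 * np.pi * delta_f * sps / fs                 # rad accumulated per symbol
print(f"drift per symbol: {drift_per_symbol:.2f} rad, QPSK spacing: {np.pi / 2:.2f} rad")
```

With these settings the drift accumulated over one symbol exceeds the pi/2 spacing between QPSK phases, so the raw phase alone cannot distinguish neighboring symbols.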
[0040] In step S102, the step of performing a short-time Fourier transform and norm normalization on the baseband complex signal to obtain the complex-valued time-frequency matrix includes the following. Performing a short-time Fourier transform on the baseband complex signal yields the time-frequency matrix, expressed as:
$$X(m,k) = \sum_{n=0}^{M-1} y[n + mS]\, g[n]\, e^{-j2\pi kn/M} \tag{3}$$
where $X(m,k)$ denotes the time-frequency matrix; $m$ denotes its time-frame index; $k$ denotes its frequency index; $y[\cdot]$ denotes the complex signal sequence; $g[n]$ denotes the window function; $S$ denotes the step size controlling the window offset between adjacent frames; $M$ denotes the length of the short-time Fourier transform; $n$ denotes the local index within the short-time Fourier transform window, $0 \le n \le M-1$; and $e^{-j2\pi kn/M}$ is the phase term.
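Formula (3) maps directly onto framing, windowing, and an FFT per frame. This NumPy sketch assumes a Hann window by default (the application does not name the window) and returns a matrix whose rows are time frames and whose columns are frequency bins; the function name is hypothetical.

```python
import numpy as np

def stft(y, M, hop, window=None):
    """Formula (3): X[m, k] = sum_n y[n + m*hop] * w[n] * exp(-j*2*pi*k*n/M).

    y: complex baseband sequence; M: STFT length; hop: step size between frames.
    Assumption: Hann window unless another window is supplied.
    """
    w = np.hanning(M) if window is None else window
    num_frames = 1 + (len(y) - M) // hop
    frames = np.stack([y[m * hop:m * hop + M] * w for m in range(num_frames)])
    return np.fft.fft(frames, n=M, axis=1)   # shape (num_frames, M): rows m, columns k

# Hypothetical example: a complex tone centered exactly on bin 5 of a 64-point STFT.
M = 64
y = np.exp(1j * 2 * np.pi * 5 * np.arange(512) / M)
X = stft(y, M, hop=16, window=np.ones(M))
```

With a rectangular window and a tone centered on a bin, every frame shows magnitude M at that bin. Only the phase of `X[m, 5]` changes from frame to frame, because each frame starts at a different sample offset.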
[0041] The time-frequency matrix is normalized by its norm to obtain the complex-valued time-frequency matrix, calculated as:
$$\tilde{X}(m,k) = \frac{X(m,k)}{\lVert X \rVert_\infty}, \quad \lVert X \rVert_\infty = \max_{m,k} \lvert X(m,k) \rvert \tag{4}$$
where $\tilde{X}$ denotes the complex-valued time-frequency matrix and $\lVert X \rVert_\infty$ denotes the maximum modulus over all elements of the time-frequency matrix $X$.
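The normalization in formula (4) divides every element by the largest modulus. This small sketch (hypothetical function name, with an assumed epsilon guard against an all-zero matrix) shows two properties that matter for the later representations: phase is untouched, and any overall gain, such as a difference in transmit power, cancels out.

```python
import numpy as np

def inf_norm_normalize(X, eps=1e-12):
    """Formula (4): divide by max |X[m, k]| so the largest modulus becomes 1.

    Assumption: eps guards against division by zero for an all-zero input.
    """
    return X / max(np.max(np.abs(X)), eps)

X = np.array([[3 + 4j, 1j], [0.5, -2.0]])
Xn = inf_norm_normalize(X)
```

Here the largest modulus is |3 + 4j| = 5, so `Xn` is `X / 5`. Scaling `X` by any positive constant before normalization gives the same `Xn`.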
[0042] It can be understood that the complex signal sequence $y[n]$ can be written in polar form as $y[n] = |y[n]|\, e^{j\psi[n]}$, where $|y[n]|$ is its magnitude and $\psi[n]$ its phase. The short-time Fourier transform (STFT) analyzes local segments of the baseband complex signal through a sliding window, yielding the time-frequency representation of the discrete-time signal, i.e., the time-frequency matrix. Let $f_s$ be the sampling frequency of the signal; the window duration $T_w = M / f_s$ determines the time resolution of the time-frequency analysis. For high-symbol-rate signals satisfying $T_w > T_s$, where $T_s$ is the symbol period and $R_s = 1/T_s$ is the symbol rate, a single STFT frame spans multiple symbol periods. Expanding the phase term of formula (3) with window function $g[n]$ and step size $S$ gives
$$X(m,k) = \sum_{n=0}^{M-1} |y[n+mS]|\, g[n]\, e^{j\left(\psi[n+mS] - 2\pi kn/M\right)}$$
Because the signal phase contains modulation components that change from symbol to symbol, the STFT summation superimposes the phases of multiple symbols, which ultimately blurs the phase characteristics of any single symbol. In addition, following the dataset configuration of the open-source TorchSig toolkit (a PyTorch signal processing machine learning toolkit), this application applies infinity-norm normalization to the time-frequency matrix to obtain the complex-valued time-frequency matrix. This removes amplitude deviations caused by differences in transmit power and channel fading, so that the relative time-frequency structure of the signal, rather than its absolute energy level, is emphasized.
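The condition that a single STFT frame spans multiple symbols reduces to simple arithmetic: the number of symbol periods per frame is T_w / T_s = M * R_s / f_s. The sampling rate, STFT length, and symbol rates below are hypothetical values chosen only to show the two regimes.

```python
def symbols_per_frame(M, fs, symbol_rate):
    """Symbol periods covered by one STFT frame: T_w / T_s = M * R_s / fs."""
    window_duration = M / fs            # T_w, sets the time resolution
    symbol_period = 1.0 / symbol_rate   # T_s
    return window_duration / symbol_period

# Hypothetical settings: fs = 10 MHz, M = 256 (T_w = 25.6 microseconds).
for rate in (10e3, 1e6):
    print(f"R_s = {rate:.0e} Hz -> {symbols_per_frame(256, 10e6, rate):.3f} symbols per frame")
```

At 10 kilosymbols per second one frame covers about a quarter of a symbol, so each bin phase reflects at most a single symbol. At 1 megasymbol per second one frame sums about 25.6 symbols, which is the regime where the STFT summation blurs individual symbol phases.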
[0043] In one embodiment, the complex-valued time-frequency matrix includes ten different time-frequency feature representations, which are: complex-valued, amplitude / phase, in-phase / quadrature, in-phase magnitude / quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement. The in-phase magnitude is represented by |in-phase|, and the quadrature magnitude is represented by |quadrature|.
[0044] To investigate the impact of the ten time-frequency feature representations on WSDR performance, this application divides them into an energy-phase joint domain and an energy domain. Table 1 summarizes the ten representations together with the definition and shape of each input time-frequency feature.
[0045] Table 1 Summary of Ten Time-Frequency Features
In Table 1, $\mathbb{C}$ denotes the complex numbers, $\Re\{\cdot\}$ the real part, and $\Im\{\cdot\}$ the imaginary part; $H$ and $W$ denote the height and width of the time-frequency matrix; $\tilde{X}$ denotes the complex-valued time-frequency matrix obtained through the short-time Fourier transform and infinity-norm normalization; $\angle\tilde{X}$ denotes its phase and $|\tilde{X}|$ its magnitude (i.e., amplitude); $\Re\{\tilde{X}\}$ and $\Im\{\tilde{X}\}$ denote its real part (in-phase) and imaginary part (quadrature); and $|\Re\{\tilde{X}\}|$ and $|\Im\{\tilde{X}\}|$ denote the magnitudes of its real and imaginary parts, which form the |in-phase|/|quadrature| representation. The table also defines the amplitude/phase, in-phase/quadrature, amplitude, power, logarithmic power, color, and multi-channel enhancement representations, using min-max normalization, element-wise multiplication ($\odot$), the exponential function, and the 75th percentile of a distribution.
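Seven of the ten representations in Table 1 follow directly from the normalized complex matrix and can be sketched as below. The dictionary keys and function name are hypothetical. Logarithmic power is assumed to be 10*log10(power + eps). The color, grayscale, and multi-channel enhancement encodings are left out because their exact definitions are not reproduced in this text.

```python
import numpy as np

def representations(Xn, eps=1e-12):
    """Build seven Table 1 inputs from a normalized complex H x W matrix Xn.

    Assumption: log power = 10*log10(power + eps); color, grayscale and
    multi-channel enhancement are omitted (definitions not reproduced here).
    """
    amp = np.abs(Xn)
    power = amp ** 2
    return {
        # energy-phase joint domain
        "complex": Xn[..., None],                                       # complex, H x W x 1
        "amplitude/phase": np.stack([amp, np.angle(Xn)], axis=-1),      # real, H x W x 2
        "in-phase/quadrature": np.stack([Xn.real, Xn.imag], axis=-1),   # real, H x W x 2
        # energy domain
        "|in-phase|/|quadrature|": np.stack([np.abs(Xn.real), np.abs(Xn.imag)], axis=-1),
        "amplitude": amp[..., None],
        "power": power[..., None],
        "log power": (10 * np.log10(power + eps))[..., None],
    }

# Hypothetical 4 x 6 normalized matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
reps = representations(X / np.max(np.abs(X)))
```

The split between domains is visible in the code. The amplitude/phase and in-phase/quadrature tensors can be inverted back to the complex matrix. The energy-domain tensors cannot: for example, the complex conjugate of the matrix yields identical |in-phase|/|quadrature|, amplitude, and power tensors.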
[0046] In step S103, the ten time-frequency feature representations are divided into the energy-phase joint domain and the energy domain. The energy-phase joint domain includes three representations (complex-valued, amplitude / phase, and in-phase / quadrature), which aim to preserve the inherent coupling between signal amplitude and phase. These representations either keep the complex-valued structure directly or decompose it into multiple real-valued components, with the aim of exploiting phase information to better distinguish signal modulation types. However, they usually require dedicated complex-valued neural networks or multi-channel real-valued neural networks to be processed effectively.
[0047] In contrast to the energy-phase joint domain, the energy domain discards the phase component entirely and constructs seven feature representations based solely on the energy distribution of the signal: |in-phase| / |quadrature|, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement. These representations range from the original amplitude and power spectra to logarithmically compressed variants. Furthermore, to better fit visual detection frameworks, several feature encoding strategies are employed: color, grayscale, and multi-channel enhancement.
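The split between the two domains can be illustrated with a minimal NumPy sketch that derives several Table 1 representations from a normalized complex-valued time-frequency matrix. The exact Table 1 formulas are not recoverable from this text, so the definitions below are assumptions: logarithmic power is taken in dB and then min-max scaled, and the color, grayscale, and multi-channel enhancement encodings are omitted. The function names are hypothetical.

```python
import numpy as np


def min_max(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale an array to [0, 1] (min-max normalization)."""
    return (x - x.min()) / (x.max() - x.min() + eps)


def joint_domain(z: np.ndarray) -> dict:
    """Energy-phase joint domain: amplitude and phase stay coupled (H x W x C)."""
    return {
        "complex": z[..., np.newaxis],                             # complex, 1 channel
        "amp_phase": np.stack([np.abs(z), np.angle(z)], axis=-1),  # real, 2 channels
        "iq": np.stack([z.real, z.imag], axis=-1),                 # real, 2 channels
    }


def energy_domain(z: np.ndarray, eps: float = 1e-12) -> dict:
    """Energy domain: phase is discarded; only energy-derived quantities remain."""
    amp = np.abs(z)
    power = amp ** 2
    return {
        "abs_iq": np.stack([np.abs(z.real), np.abs(z.imag)], axis=-1),
        "amplitude": amp[..., np.newaxis],
        "power": power[..., np.newaxis],
        # Assumption: log power in dB, then min-max scaled to [0, 1]
        "log_power": min_max(10.0 * np.log10(power + eps))[..., np.newaxis],
    }


# Example on a random 4 x 5 complex matrix normalized by its maximum modulus
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
z = z / np.abs(z).max()
for name, feat in {**joint_domain(z), **energy_domain(z)}.items():
    print(name, feat.shape, feat.dtype)
```

Only the "complex" entry remains a complex tensor; every other entry is real-valued, which is why the joint domain needs either a complex-valued network or multi-channel real input.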
[0048] In step S104, as shown in Figures 3, 4 and 5, the energy-phase joint domain and the energy domain are used as input time-frequency features and input into the backbone network to obtain the corresponding feature maps; the backbone network includes ResNet50 and CV-ResNet50. ResNet50 (Residual Network 50) is a 50-layer deep residual network, and CV-ResNet50 (Complex-Valued Residual Network 50) is a complex-valued residual neural network.
[0049] In one embodiment, step S104 includes the following. The input time-frequency features include both real-valued tensors and complex-valued tensors; the real-valued tensors are input into ResNet50, and the complex-valued tensors are input into CV-ResNet50. The complex convolution operation of the l-th layer of the backbone network is defined as follows:
$$\mathcal{C}^{(l)}(\mathbf{Z}) = \left(\mathbf{K}_{\mathrm{R}}^{(l)} * \mathrm{Re}(\mathbf{Z}) - \mathbf{K}_{\mathrm{I}}^{(l)} * \mathrm{Im}(\mathbf{Z})\right) + j\left(\mathbf{K}_{\mathrm{R}}^{(l)} * \mathrm{Im}(\mathbf{Z}) + \mathbf{K}_{\mathrm{I}}^{(l)} * \mathrm{Re}(\mathbf{Z})\right) \tag{5}$$
In the formula, $\mathcal{C}^{(l)}$ denotes the complex convolution operation of the l-th layer of the backbone network; $\mathbf{K}_{\mathrm{R}}^{(l)}$ and $\mathbf{K}_{\mathrm{I}}^{(l)}$ denote the real part and the imaginary part of the l-th complex convolution kernel; $\mathrm{Re}(\mathbf{Z})$ and $\mathrm{Im}(\mathbf{Z})$ denote the real part and the imaginary part of $\mathbf{Z}$; $\mathbf{Z} \in \mathbb{C}^{H \times W \times 1}$ is a complex-valued tensor, i.e., it belongs to a complex space of dimensions H × W × 1, where H and W are the height and the width of the time-frequency matrix; and $*$ denotes the convolution operation. The feature map is expressed as follows:
$$\mathbf{F} = f(\mathbf{X}; \theta) \tag{6}$$
In the formula, $\mathbf{F}$ denotes the feature map, $f(\cdot)$ denotes the feature extraction function, $\theta$ denotes the learnable parameters of the network, and $\mathbf{X}$ denotes the input time-frequency feature.
[0050] It will be understood that the backbone network is used to extract feature maps from the input time-frequency features. The backbone network includes ResNet50 and CV-ResNet50, and the input time-frequency features take various forms, including both real-valued and complex-valued tensors. A real-valued tensor $\mathbf{X}_{\mathrm{r}} \in \mathbb{R}^{H \times W \times C}$, where C is the number of channels, belongs to a real space of dimensions H × W × C and can be fed directly into a standard ResNet50. A complex-valued tensor $\mathbf{Z} = \mathrm{Re}(\mathbf{Z}) + j\,\mathrm{Im}(\mathbf{Z}) \in \mathbb{C}^{H \times W \times 1}$, where $\mathrm{Re}(\mathbf{Z})$ and $\mathrm{Im}(\mathbf{Z})$ are its real and imaginary parts, belongs to a complex space of dimensions H × W × 1 and is input into CV-ResNet50.
[0051] This application constructs a backbone network comprising ResNet50 and CV-ResNet50. Traditional real-valued networks typically treat the real and imaginary parts as independent channels and rely on learned weights to mine their correlation, whereas CV-ResNet50 directly models the interaction between amplitude and phase in the complex domain. The complex convolution operation of the l-th layer is defined in Equation (5). The backbone network extracts feature maps from the input time-frequency features, and the feature maps are expressed by Equation (6).
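The complex convolution of Equation (5) can be checked numerically with a minimal NumPy sketch, assuming a single input channel, one 3 × 3 kernel, stride 1, no padding, and no bias. As in most deep-learning frameworks, "convolution" is implemented here as cross-correlation. The complex batch normalization and activations used inside CV-ResNet50 are not described in this text and are not shown; `conv2d_valid` and `complex_conv2d` are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, the operation CNN layers call convolution."""
    windows = sliding_window_view(x, k.shape)          # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


def complex_conv2d(z: np.ndarray, k_re: np.ndarray, k_im: np.ndarray) -> np.ndarray:
    """Complex convolution from four real convolutions, following Eq. (5)."""
    real = conv2d_valid(z.real, k_re) - conv2d_valid(z.imag, k_im)
    imag = conv2d_valid(z.imag, k_re) + conv2d_valid(z.real, k_im)
    return real + 1j * imag


rng = np.random.default_rng(1)
z = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))    # H x W input
k_re, k_im = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))  # one 3 x 3 kernel
out = complex_conv2d(z, k_re, k_im)
# Same result as convolving directly with the complex kernel k_re + j*k_im
assert np.allclose(out, conv2d_valid(z, k_re + 1j * k_im))
print(out.shape)  # (4, 4)
```

Because the four real convolutions share only two real kernels, the real and imaginary paths are tied together by construction, which is the structural difference from feeding the real and imaginary parts to a real-valued network as independent channels.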
[0052] In step S105, the feature maps are fed into the object detection models to obtain several sets of prediction results. The object detection models include a Sparse R-CNN model and a DINO model, and each set of prediction results includes a time-frequency bounding box and a modulation class probability. Sparse R-CNN stands for Sparse Region-based Convolutional Neural Network, and DINO stands for DETR with Improved deNoising anchOr boxes, where DETR is the DEtection TRansformer.
[0053] It will be understood that the object detection models include the Sparse R-CNN model and the DINO model, both of which are end-to-end object detection models. The time-frequency features are input into the backbone network to extract feature maps, which are then fed into the Sparse R-CNN model and the DINO model respectively. Each of the two models makes predictions from the feature maps, yielding L sets of prediction results, each of which includes a time-frequency bounding box and a modulation class probability.
[0054] In step S106, in order to jointly optimize the localization and recognition tasks, this application defines a common basis loss that adapts to the Sparse R-CNN model and the DINO model, and obtains the total loss of the Sparse R-CNN model and the total loss of the DINO model based on the common basis loss.
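[0055] Furthermore, the common basis loss is expressed as follows:
$$\mathcal{L}_{\mathrm{base}} = \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{focal}}(p) + \lambda_{\mathrm{L1}}\,\mathcal{L}_{\mathrm{L1}}(\hat{b}, b) + \lambda_{\mathrm{giou}}\,\mathcal{L}_{\mathrm{giou}}(\hat{b}, b) \tag{7}$$
$$\mathcal{L}_{\mathrm{L1}}(\hat{b}, b) = \lVert \hat{b} - b \rVert_1, \qquad \mathcal{L}_{\mathrm{giou}}(\hat{b}, b) = 1 - \frac{|\hat{b} \cap b|}{|\hat{b} \cup b|} + \frac{|R| - |\hat{b} \cup b|}{|R|}$$
In the formula, $\mathcal{L}_{\mathrm{base}}$ denotes the common basis loss; $\lambda_{\mathrm{cls}}$, $\lambda_{\mathrm{L1}}$ and $\lambda_{\mathrm{giou}}$ denote the balance coefficients of the classification loss, the bounding box regression loss and the localization regularization loss, respectively; $\mathcal{L}_{\mathrm{focal}}$ denotes the focal loss and $p$ the modulation class probability; $\mathcal{L}_{\mathrm{L1}}$ denotes the mean absolute error (L1) loss used for coordinate regression, with $\lVert \cdot \rVert_1$ the L1 norm; $\hat{b} = (\hat{x}, \hat{y}, \hat{w}, \hat{h})$ denotes the center coordinates, width and height of the predicted box, and $b = (x, y, w, h)$ those of the ground-truth bounding box; $\mathcal{L}_{\mathrm{giou}}$ denotes the bounding box localization regularization term; R denotes the minimum enclosing rectangle of $\hat{b}$ and $b$; $|\cdot|$ denotes the area of a box; and $\cap$ and $\cup$ denote intersection and union.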
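The three terms of Equation (7) can be sketched for a single matched prediction as follows. This is a minimal NumPy illustration under stated assumptions, not the application's implementation: the focal loss is the common sigmoid form with α = 0.25 and γ = 2 (defaults from the focal-loss literature, not given in this text), the balance coefficients are placeholders because their values in Equation (7) are not recoverable, boxes are given as (center x, center y, width, height) with positive width and height, and the Hungarian matching that pairs predictions with ground truth is omitted. All function names are hypothetical.

```python
import numpy as np


def to_xyxy(b):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])


def giou_loss(pred, gt):
    """1 - GIoU of two boxes in (center x, center y, width, height) form."""
    p, g = to_xyxy(pred), to_xyxy(gt)
    area_p = (p[2] - p[0]) * (p[3] - p[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    iw = max(min(p[2], g[2]) - max(p[0], g[0]), 0.0)
    ih = max(min(p[3], g[3]) - max(p[1], g[1]), 0.0)
    inter = iw * ih
    union = area_p + area_g - inter
    # R: smallest axis-aligned rectangle enclosing both boxes
    area_r = (max(p[2], g[2]) - min(p[0], g[0])) * (max(p[3], g[3]) - min(p[1], g[1]))
    return 1.0 - (inter / union - (area_r - union) / area_r)


def focal_loss(prob, onehot, alpha=0.25, gamma=2.0, eps=1e-12):
    """Sigmoid focal loss summed over modulation classes (assumed form)."""
    p_t = np.where(onehot == 1, prob, 1.0 - prob)
    a_t = np.where(onehot == 1, alpha, 1.0 - alpha)
    return float(np.sum(-a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))


def common_basis_loss(prob, onehot, pred_box, gt_box,
                      lam_cls=2.0, lam_l1=5.0, lam_giou=2.0):
    """Weighted sum of focal, L1 and GIoU terms; lambda values are placeholders."""
    l1 = float(np.sum(np.abs(np.asarray(pred_box) - np.asarray(gt_box))))
    return (lam_cls * focal_loss(prob, onehot)
            + lam_l1 * l1
            + lam_giou * giou_loss(pred_box, gt_box))


gt = (0.5, 0.5, 0.2, 0.2)
print(common_basis_loss(np.array([0.3, 0.6, 0.1]), np.array([1, 0, 0]), (0.55, 0.5, 0.2, 0.2), gt))
```

The GIoU term stays informative even when the boxes do not overlap: the intersection is then zero, but the enclosing-rectangle penalty still grows with the gap, so the loss exceeds 1 and continues to provide a gradient toward the target.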
[0056] It will be understood that the focal loss is used to mitigate the class imbalance problem. The Sparse R-CNN model progressively refines the candidate boxes over successive iteration stages, and its total loss is a weighted sum of the common basis losses from all stages, which forces the network to learn from effective supervision at every stage.
[0057] Furthermore, the total loss of the Sparse R-CNN model is expressed as follows:
$$\mathcal{L}_{\mathrm{Sparse}} = \sum_{k=1}^{K} \omega_k\,\mathcal{L}_{\mathrm{base}}^{(k)} \tag{8}$$
In the formula, $\mathcal{L}_{\mathrm{Sparse}}$ denotes the total loss of the Sparse R-CNN model; K denotes the number of iteration stages, with k ranging from 1 to K; $\omega_k$ denotes the weight of the k-th iteration stage; and $\mathcal{L}_{\mathrm{base}}^{(k)}$ denotes the common basis loss of the k-th iteration stage.
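[0058] Furthermore, the total loss of the DINO model is expressed as follows:
$$\mathcal{L}_{\mathrm{DINO}} = \mathcal{L}_{\mathrm{base}} + \lambda_{\mathrm{dn}}\,\mathcal{L}_{\mathrm{dn}} \tag{9}$$
In the formula, $\mathcal{L}_{\mathrm{DINO}}$ denotes the total loss of the DINO model, $\mathcal{L}_{\mathrm{base}}$ denotes the common basis loss, $\mathcal{L}_{\mathrm{dn}}$ denotes the auxiliary loss applied to the noisy queries, and $\lambda_{\mathrm{dn}}$ denotes the balance coefficient.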
[0059] It will be understood that the DINO model introduces a denoising strategy: noise is added to the ground-truth labels to construct auxiliary queries, which accelerates network convergence and stabilizes the Hungarian matching process. The total loss of the DINO model is given by Equation (9).
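Equations (8) and (9) combine per-stage or per-query-group losses into a single training objective. A minimal sketch, assuming the individual loss values have already been computed (for example with a common-basis-loss routine) and using illustrative numbers rather than values from the application; the function names are hypothetical:

```python
from typing import Sequence


def sparse_rcnn_total_loss(stage_losses: Sequence[float],
                           stage_weights: Sequence[float]) -> float:
    """Eq. (8): weighted sum of the common basis loss over the K iteration stages."""
    if len(stage_losses) != len(stage_weights):
        raise ValueError("expected one weight per iteration stage")
    return sum(w * l for w, l in zip(stage_weights, stage_losses))


def dino_total_loss(base_loss: float, denoising_loss: float, lam_dn: float) -> float:
    """Eq. (9): common basis loss plus the weighted denoising-query loss."""
    return base_loss + lam_dn * denoising_loss


# Illustrative numbers only (K = 3 stages with equal weights; lambda_dn = 1.0)
print(sparse_rcnn_total_loss([2.0, 1.5, 1.0], [1.0, 1.0, 1.0]))  # 4.5
print(dino_total_loss(1.0, 0.5, 1.0))                             # 1.5
```

Supervising every Sparse R-CNN stage means that early stages receive a direct gradient instead of learning only through later refinements; the denoising term in DINO is an additional objective on the noised ground-truth queries and does not replace the matched-query loss.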
[0060] In step S107, the Sparse R-CNN model is updated based on the total loss of the Sparse R-CNN model to obtain the updated Sparse R-CNN model, and the DINO model is updated based on the total loss of the DINO model to obtain the updated DINO model.
[0061] The Sparse R-CNN and DINO models update their network parameters through backpropagation. During training, the Sparse R-CNN model was trained for 72 epochs with a learning rate of 0.000025 (2.5 × 10⁻⁵), and the DINO model was trained for 42 epochs with a learning rate of 0.0001 (1 × 10⁻⁴). Training stopped when the set number of epochs was reached, and the weights achieving the best mean average precision (mAP) were saved. Here, an epoch denotes one complete pass over the training set; the number of epochs can be set according to actual conditions and is not elaborated further in this application.
[0062] The two models were trained with different numbers of epochs and learning rates mainly because they converge at different speeds; these settings were adjusted so that both models converge on the current task.
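The stopping rule described above (train for a fixed number of epochs, keep the weights with the best mAP) can be sketched framework-independently. The schedule values come from paragraph [0061]; `run_epoch` and `evaluate_map` are hypothetical callbacks standing in for one backpropagation pass and a validation mAP evaluation, since the optimizer and evaluation split are not specified in this text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Schedule:
    epochs: int  # number of full passes over the training set
    lr: float    # learning rate


# Values from the training description in paragraph [0061]
SCHEDULES = {
    "sparse_rcnn": Schedule(epochs=72, lr=2.5e-5),
    "dino": Schedule(epochs=42, lr=1e-4),
}


def train_keep_best(schedule: Schedule,
                    run_epoch: Callable[[int, float], Any],
                    evaluate_map: Callable[[Any], float]) -> Tuple[int, float, Any]:
    """Train for a fixed number of epochs; keep the weights with the best mAP."""
    best_epoch, best_map, best_weights = -1, float("-inf"), None
    for epoch in range(1, schedule.epochs + 1):
        weights = run_epoch(epoch, schedule.lr)  # one pass over the training data
        score = evaluate_map(weights)
        if score > best_map:
            best_epoch, best_map, best_weights = epoch, score, weights
    return best_epoch, best_map, best_weights


# Toy stand-ins: the "weights" are just the epoch number, and mAP peaks at epoch 30
best = train_keep_best(SCHEDULES["dino"], lambda e, lr: e, lambda w: -abs(w - 30))
print(best[0])  # 30
```

Keeping the best-mAP checkpoint rather than the last one makes the reported result insensitive to late-training fluctuations, which matters when the two models converge at different speeds.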
[0063] In step S108, the feature maps are input into the updated Sparse R-CNN model and the updated DINO model respectively to obtain several sets of final prediction results; each set of final prediction results includes the final time-frequency bounding box and the final modulation class probability.
[0064] Specifically, after the Sparse R-CNN model and the DINO model are updated, the feature maps are input into the updated models respectively to obtain the final prediction results, thereby achieving accurate signal detection and recognition.
[0065] This application selects ten typical time-frequency feature representations from the energy-phase joint domain and the energy domain as research objects, and uses the Sparse R-CNN model and the DINO model as representative models of the convolutional neural network and Transformer architectures, respectively. By reconstructing these two object detection models so that they can handle multi-channel real-valued and complex-valued matrix inputs, the impact of different time-frequency feature representations on WSDR performance is systematically evaluated. The results show that traditional power transformation and logarithmic processing cause information loss and a significant performance degradation, whereas the amplitude spectrum is more robust in WSDR tasks. Significant signal phase ambiguity is observed in the time-frequency feature representations, and existing real-valued and complex-valued neural networks are limited in their ability to parse phase information. Furthermore, simply stacking features from different domains at the input layer does not bring the expected performance improvement, indicating that a more refined feature fusion mechanism is needed.
[0066] The following experiments will further illustrate this application.
[0067] 1. Dataset and Experimental Setup
Table 2 Dataset Parameters
This application constructs a broadband signal dataset based on TorchSig version 0.5.3.1, with the parameters shown in Table 2. Each broadband signal sample contains multiple targets with different frequencies, signal-to-noise ratios, and modulation types, and signal overlap scenarios are simulated by configuring the overlap probability. All signal samples are transformed by STFT and normalized by the infinity norm into complex-valued time-frequency matrices of size [512, 512, 1].
[0068] In Table 2, ASK stands for Amplitude Shift Keying, FSK stands for Frequency Shift Keying, OFDM stands for Orthogonal Frequency Division Multiplexing, PAM stands for Pulse Amplitude Modulation, PSK stands for Phase Shift Keying, and QAM stands for Quadrature Amplitude Modulation.
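The STFT and infinity-norm normalization used to build the dataset (Equations (3) and (4) in the claims) can be sketched with NumPy. This is a minimal illustration under assumptions not stated in the application: a Hann window, frames arranged as rows and frequency bins as columns, no `fftshift`, and a small frame length for readability. For a [512, 512, 1] matrix one would use M = 512 with enough samples for 512 frames; the hop and window actually used for the dataset are not given here. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_matrix(r: np.ndarray, M: int, hop: int) -> np.ndarray:
    """Eq. (3)-style STFT: rows are time frames m, columns are frequency bins k."""
    frames = sliding_window_view(r, M)[::hop]   # frame m = r[m*hop : m*hop + M]
    window = np.hanning(M)                      # assumption: Hann window
    return np.fft.fft(frames * window, axis=-1) # sum_n r[n+m*hop] g[n] e^{-j2*pi*k*n/M}


def inf_norm_normalize(S: np.ndarray) -> np.ndarray:
    """Eq. (4)-style normalization: divide by the largest element modulus."""
    return S / np.abs(S).max()


# A complex tone on bin 8; 640 samples, 64-sample frames, 50 % overlap
M, hop = 64, 32
n = np.arange(M * 10)
r = np.exp(2j * np.pi * 8 * n / M)
S = inf_norm_normalize(stft_matrix(r, M, hop))
print(S.shape)                            # (19, 64)
print(np.argmax(np.abs(S), axis=1)[:3])   # [8 8 8]
```

Dividing by the largest modulus maps every element into the closed unit disk while leaving relative phases untouched, so all ten Table 1 representations can be derived from the same normalized matrix.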
[0069] This application conducted independent experiments using the Sparse R-CNN model and the DINO model for each of the ten time-frequency feature representations listed in Table 1. The performance of the two models was evaluated using three core metrics: mAP, mean average recall (mAR), and the confusion matrix. Table 3 presents a comparative analysis of the mAP for the ten time-frequency feature representations.
[0070] Table 3 Comparative Analysis of mAP for Ten Time-Frequency Feature Representations
2. Results Comparison and Analysis
Table 3 summarizes the mAP of the Sparse R-CNN model and the DINO model under the ten time-frequency feature representations. The two models exhibit highly consistent performance trends, which supports the robustness of the conclusions of this application. The amplitude spectrum achieves the best performance among all feature representations. By contrast, the power spectrum excessively enhances strong signals and severely suppresses weak signals and noise, resulting in an extremely unbalanced energy distribution. The logarithmic power spectrum compresses the dynamic range of the signal, but it also reduces the intensity contrast between the target signal and the background. Consequently, both of these representations perform worse than the amplitude spectrum. The quantization and nonlinear mapping operations used to produce grayscale and color images degrade performance further. The multi-channel enhancement (MCE) and amplitude / phase representations perform worse than the logarithmic power spectrum and the amplitude spectrum, indicating that simple channel stacking not only fails to fuse information effectively but also introduces redundant noise that hinders the extraction of key features. The in-phase / quadrature representation performs poorly overall, whereas the |in-phase| / |quadrature| representation, which retains only magnitude information, improves mAP significantly. These results show that real-valued networks have limited effectiveness in directly processing the in-phase / quadrature representation, while the |in-phase| / |quadrature| scheme better matches the processing characteristics of real-valued networks.
Furthermore, although complex-valued representations theoretically possess stronger expressive power, their practical performance is still lower than that of energy-based feature representations.
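The mechanism given above for the power spectrum, namely that squaring magnitudes exaggerates strong signals and pushes weak ones toward the background, can be seen in a few lines of arithmetic. The magnitudes below are invented for illustration (a strong signal, a weak signal, and a noise floor); this toy example does not reproduce Table 3 and does not model the logarithmic power spectrum.

```python
import numpy as np


def min_max(x: np.ndarray) -> np.ndarray:
    """Scale an array to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


# Toy magnitudes: strong signal, weak signal, noise floor (illustrative only)
amplitude = np.array([1.0, 0.1, 0.01])
power = amplitude ** 2

amp_img = min_max(amplitude)  # weak signal -> 0.09 / 0.99, about 0.091
pow_img = min_max(power)      # weak signal -> 0.0099 / 0.9999, about 0.0099
print(round(amp_img[1], 4), round(pow_img[1], 4))
```

After the same min-max scaling, the weak signal occupies roughly a tenth of the intensity range in the amplitude image but only about one hundredth in the power image, so a detector sees it as barely distinguishable from the noise floor.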
[0071] Figure 6 shows the mAR of the Sparse R-CNN model under different signal-to-noise ratio (SNR) conditions, and Figure 7 shows how its mAP varies with SNR. In the low-SNR range, the amplitude spectrum achieves the best mAP and the complex-valued representation achieves the best mAR. As the SNR increases, the mAP of the in-phase / quadrature and amplitude / phase representations gradually rises from a low level. Conversely, the mAP of the RGB representation shows the opposite trend, and its mAR remains low under all SNR conditions.
[0072] The confusion matrices in Figures 8 to 13 further reveal the classification bottleneck. Representations from the energy-phase joint domain have limited ability to distinguish phase-modulated signals such as PSK and QAM, with misclassification rates significantly higher than those of the amplitude spectrum. Analysis shows that this limitation arises because phase information is highly susceptible to interference in broadband sampling scenarios: carrier frequency offset, noise, and the phase blurring caused by the superposition of multiple symbol phases within a single STFT frame severely damage the integrity of the phase features and make them unreliable for signal discrimination. Energy features, in contrast, are more robust to these distortions.
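One of the distortions named above, carrier frequency offset, can be illustrated with a short NumPy sketch. It assumes a single complex tone, a Hann window, and non-overlapping frames, and it uses a hypothetical offset of 0.3 bin. Noise and the superposition of several symbol phases within one frame are not modeled. The sketch shows that the STFT magnitude in the signal's bin is unchanged from frame to frame, while the phase rotates by a fixed amount per frame, so the same symbol produces different phase features in different frames.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_matrix(x: np.ndarray, M: int, hop: int) -> np.ndarray:
    """STFT with a Hann window; rows are frames, columns are frequency bins."""
    frames = sliding_window_view(x, M)[::hop] * np.hanning(M)
    return np.fft.fft(frames, axis=-1)


M, hop, k0 = 64, 64, 8
n = np.arange(M * 16)
cfo_bins = 0.3  # hypothetical carrier frequency offset, in units of one bin

clean = stft_matrix(np.exp(2j * np.pi * k0 * n / M), M, hop)[:, k0]
shifted = stft_matrix(np.exp(2j * np.pi * (k0 + cfo_bins) * n / M), M, hop)[:, k0]

# Magnitude in the signal's bin is identical in every frame ...
print(np.allclose(np.abs(shifted), np.abs(shifted[0])))
# ... but with the offset the phase advances by 2*pi*0.3 rad per frame
step = np.angle(shifted[1:] * np.conj(shifted[:-1]))
print(np.allclose(step, 2 * np.pi * cfo_bins))
print(np.allclose(np.angle(clean), np.angle(clean[0])))
```

An energy-domain feature such as the amplitude spectrum sees an identical pattern in both cases, whereas any feature that includes phase must first undo this frame-to-frame rotation before it carries modulation information.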
[0073] It should be noted that although the steps of the method in this application are described in a specific order in the accompanying drawings, this does not require or imply that these steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and / or one step may be broken down into multiple steps. Furthermore, it is readily understood that these steps may be executed synchronously or asynchronously, for example in multiple modules / processes / threads.
[0074] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein.
Claims
1. A broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction, characterized in that the method includes: the received broadband signal is frequency-converted using a local oscillator signal and then low-pass filtered to obtain a baseband complex signal; the baseband complex signal is subjected to short-time Fourier transform and norm normalization to obtain a complex-valued time-frequency matrix, wherein the complex-valued time-frequency matrix includes ten different time-frequency feature representations: complex-valued, amplitude / phase, in-phase / quadrature, in-phase magnitude / quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement; the ten different time-frequency feature representations are divided into an energy-phase joint domain and an energy domain, wherein the energy-phase joint domain includes complex-valued, amplitude / phase, and in-phase / quadrature, and the energy domain includes in-phase magnitude / quadrature magnitude, amplitude, power, logarithmic power, color, grayscale, and multi-channel enhancement; the energy-phase joint domain and the energy domain are used as input time-frequency features and input into a backbone network to obtain the corresponding feature maps, wherein the backbone network includes ResNet50 and CV-ResNet50; the feature maps are fed into target detection models to obtain several sets of prediction results, wherein the target detection models include the Sparse R-CNN model and the DINO model, and each set of prediction results includes a time-frequency bounding box and a modulation class probability; a common basis loss applicable to both the Sparse R-CNN model and the DINO model is defined, and the total loss of the Sparse R-CNN model and the total loss of the DINO model are obtained based on the common basis loss;
the Sparse R-CNN model is updated based on its total loss to obtain the updated Sparse R-CNN model, and the DINO model is updated based on its total loss to obtain the updated DINO model; and the feature maps are input into the updated Sparse R-CNN model and the updated DINO model respectively to obtain several sets of final prediction results, wherein each set of final prediction results includes the final time-frequency bounding box and the final modulation class probability.
2. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 1, characterized in that the step of frequency-converting the received broadband signal using the local oscillator signal and then performing low-pass filtering to obtain a baseband complex signal includes: the broadband signal is expressed as follows:
$$y(t) = \sum_{i=1}^{N} h_i(t)\,\mathrm{Re}\left\{ x_i(t)\, e^{j2\pi f_i t} \right\} + n(t), \qquad x_i(t) = A_i(t)\, e^{j\phi_i(t)} \tag{1}$$
In the formula, $y(t)$ denotes the broadband signal; $x_i(t)$ denotes the baseband modulation signal of the i-th signal; $A_i(t)$ denotes the time-varying amplitude and $\phi_i(t)$ the modulation phase of the broadband signal; $h_i(t)$ denotes the time-domain channel gain; $n(t)$ denotes the real-valued additive white Gaussian noise introduced at the receiver front end; N denotes the total number of signals; $e^{j2\pi f_i t}$ denotes the complex exponential carrier, where j is the imaginary unit and π ≈ 3.14; $f_i$ denotes the frequency of the i-th signal; $\mathrm{Re}\{\cdot\}$ indicates that the baseband modulation signal of the i-th signal is modulated onto the i-th signal frequency and its real part is taken for transmission; and t denotes time (the time frame index of the time-frequency matrix is introduced in claim 4).
3. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 2, characterized in that the step of frequency-converting the received broadband signal using the local oscillator signal and then performing low-pass filtering to obtain a baseband complex signal includes: the baseband complex signal is expressed as follows:
$$r(t) = \mathrm{LPF}\left\{ y(t)\, e^{-j2\pi f_{\mathrm{LO}} t} \right\} = \sum_{i=1}^{N} \frac{1}{2} h_i(t)\, x_i(t)\, e^{j2\pi \Delta f_i t} + w(t), \qquad \Delta f_i = f_i - f_{\mathrm{LO}} \tag{2}$$
In the formula, $r(t)$ denotes the baseband complex signal; $\Delta f_i$ denotes the frequency offset between the i-th signal frequency and the local oscillator signal frequency; $f_{\mathrm{LO}}$ denotes the local oscillator signal frequency; $e^{j2\pi \Delta f_i t}$ denotes the carrier phase drift term caused by the frequency offset; $w(t)$ denotes the complex baseband noise; $e^{-j2\pi f_{\mathrm{LO}} t}$ denotes the local oscillator signal; and $\mathrm{LPF}\{\cdot\}$ denotes low-pass filtering.
4. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 3, characterized in that the step of performing short-time Fourier transform and norm normalization on the baseband complex signal to obtain a complex-valued time-frequency matrix includes: performing a short-time Fourier transform on the baseband complex signal to obtain a time-frequency matrix, wherein the time-frequency matrix is expressed as follows:
$$\mathbf{S}[m, k] = \sum_{n=0}^{M-1} r[n + mD]\, g[n]\, e^{-j2\pi kn/M} \tag{3}$$
In the formula, $\mathbf{S}$ denotes the time-frequency matrix; m denotes the time frame index of the time-frequency matrix; k denotes the frequency index of the time-frequency matrix; $r[\cdot]$ denotes the complex signal sequence; $g[n]$ denotes the window function; D denotes the step size controlling the window offset between adjacent frames; M denotes the length of the short-time Fourier transform; n denotes the local index within the short-time Fourier transform window, where 0 ≤ n ≤ M − 1; and $e^{-j2\pi kn/M}$ denotes the phase term.
5. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 4, characterized in that the step of performing short-time Fourier transform and norm normalization on the baseband complex signal to obtain a complex-valued time-frequency matrix includes: the time-frequency matrix is normalized using the infinity norm to obtain a complex-valued time-frequency matrix, wherein the complex-valued time-frequency matrix is calculated as follows:
$$\tilde{\mathbf{S}} = \frac{\mathbf{S}}{\lVert \mathbf{S} \rVert_\infty}, \qquad \lVert \mathbf{S} \rVert_\infty = \max_{m,k} \left| \mathbf{S}[m, k] \right| \tag{4}$$
In the formula, $\tilde{\mathbf{S}}$ denotes the complex-valued time-frequency matrix, and $\lVert \mathbf{S} \rVert_\infty$ denotes the maximum modulus over all elements of the time-frequency matrix.
6. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 5, characterized in that the step of using the energy-phase joint domain and the energy domain as input time-frequency features and inputting them into the backbone network to obtain the corresponding feature maps includes: the input time-frequency features include both real-valued tensors and complex-valued tensors; the real-valued tensors are input into ResNet50, and the complex-valued tensors are input into CV-ResNet50. The complex convolution operation of the l-th layer of the backbone network is defined as follows:
$$\mathcal{C}^{(l)}(\mathbf{Z}) = \left(\mathbf{K}_{\mathrm{R}}^{(l)} * \mathrm{Re}(\mathbf{Z}) - \mathbf{K}_{\mathrm{I}}^{(l)} * \mathrm{Im}(\mathbf{Z})\right) + j\left(\mathbf{K}_{\mathrm{R}}^{(l)} * \mathrm{Im}(\mathbf{Z}) + \mathbf{K}_{\mathrm{I}}^{(l)} * \mathrm{Re}(\mathbf{Z})\right) \tag{5}$$
In the formula, $\mathcal{C}^{(l)}$ denotes the complex convolution operation of the l-th layer of the backbone network; $\mathbf{K}_{\mathrm{R}}^{(l)}$ and $\mathbf{K}_{\mathrm{I}}^{(l)}$ denote the real part and the imaginary part of the l-th complex convolution kernel; $\mathrm{Re}(\mathbf{Z})$ and $\mathrm{Im}(\mathbf{Z})$ denote the real part and the imaginary part of $\mathbf{Z}$; $\mathbf{Z} \in \mathbb{C}^{H \times W \times 1}$ is a complex-valued tensor, i.e., it belongs to a complex space of dimensions H × W × 1, where H and W are the height and the width of the time-frequency matrix; and $*$ denotes the convolution operation. The feature map is expressed as follows:
$$\mathbf{F} = f(\mathbf{X}; \theta) \tag{6}$$
In the formula, $\mathbf{F}$ denotes the feature map, $f(\cdot)$ denotes the feature extraction function, $\theta$ denotes the learnable parameters of the network, and $\mathbf{X}$ denotes the input time-frequency feature.
7. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 6, characterized in that the common basis loss is expressed as follows:
$$\mathcal{L}_{\mathrm{base}} = \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{focal}}(p) + \lambda_{\mathrm{L1}}\,\mathcal{L}_{\mathrm{L1}}(\hat{b}, b) + \lambda_{\mathrm{giou}}\,\mathcal{L}_{\mathrm{giou}}(\hat{b}, b) \tag{7}$$
$$\mathcal{L}_{\mathrm{L1}}(\hat{b}, b) = \lVert \hat{b} - b \rVert_1, \qquad \mathcal{L}_{\mathrm{giou}}(\hat{b}, b) = 1 - \frac{|\hat{b} \cap b|}{|\hat{b} \cup b|} + \frac{|R| - |\hat{b} \cup b|}{|R|}$$
In the formula, $\mathcal{L}_{\mathrm{base}}$ denotes the common basis loss; $\lambda_{\mathrm{cls}}$, $\lambda_{\mathrm{L1}}$ and $\lambda_{\mathrm{giou}}$ denote the balance coefficients of the classification loss, the bounding box regression loss and the localization regularization loss, respectively; $\mathcal{L}_{\mathrm{focal}}$ denotes the focal loss and $p$ the modulation class probability; $\mathcal{L}_{\mathrm{L1}}$ denotes the mean absolute error (L1) loss used for coordinate regression, with $\lVert \cdot \rVert_1$ the L1 norm; $\hat{b} = (\hat{x}, \hat{y}, \hat{w}, \hat{h})$ denotes the center coordinates, width and height of the predicted box, and $b = (x, y, w, h)$ those of the ground-truth bounding box; $\mathcal{L}_{\mathrm{giou}}$ denotes the bounding box localization regularization term; R denotes the minimum enclosing rectangle of $\hat{b}$ and $b$; $|\cdot|$ denotes the area of a box; and $\cap$ and $\cup$ denote intersection and union.
8. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 7, characterized in that the total loss of the Sparse R-CNN model is expressed as follows:
$$\mathcal{L}_{\mathrm{Sparse}} = \sum_{k=1}^{K} \omega_k\,\mathcal{L}_{\mathrm{base}}^{(k)} \tag{8}$$
In the formula, $\mathcal{L}_{\mathrm{Sparse}}$ denotes the total loss of the Sparse R-CNN model; K denotes the number of iteration stages, with k ranging from 1 to K; $\omega_k$ denotes the weight of the k-th iteration stage; and $\mathcal{L}_{\mathrm{base}}^{(k)}$ denotes the common basis loss of the k-th iteration stage.
9. The broadband signal detection and recognition method based on time-frequency representation optimization and network reconstruction according to claim 8, characterized in that the total loss of the DINO model is expressed as follows:
$$\mathcal{L}_{\mathrm{DINO}} = \mathcal{L}_{\mathrm{base}} + \lambda_{\mathrm{dn}}\,\mathcal{L}_{\mathrm{dn}} \tag{9}$$
In the formula, $\mathcal{L}_{\mathrm{DINO}}$ denotes the total loss of the DINO model, $\mathcal{L}_{\mathrm{dn}}$ denotes the auxiliary loss applied to the noisy queries, $\mathcal{L}_{\mathrm{base}}$ denotes the common basis loss, and $\lambda_{\mathrm{dn}}$ denotes the balance coefficient.