Spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation
By combining a multi-scale attention mechanism with domain adaptation, this spectral background noise prediction method addresses the parameter sensitivity and poor multi-site generality of background noise extraction in radio spectrum monitoring, achieving high-precision noise prediction and stable signal processing in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG YUANCHU DATA TECH CO LTD
- Filing Date
- 2026-02-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for background noise extraction in radio spectrum monitoring suffer from problems such as high parameter sensitivity, poor versatility across multiple sites, and limited feature extraction capabilities, leading to noise baseline distortion and affecting signal processing performance.
A spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation is adopted. By improving the convolutional neural network and combining multi-scale feature extraction, channel attention mechanism, domain adaptation technology and composite loss function, data preprocessing, model training and postprocessing are performed to achieve accurate prediction of background noise.
It achieves high-precision and robust prediction of background noise in complex spectrum environments, adapts to different devices and electromagnetic environments, reduces deployment and maintenance costs, and generates a noise baseline that conforms to physical laws, avoiding the step effect of traditional algorithms.
Abstract
Description
Technical Field
[0001] This invention relates to the interdisciplinary field of signal processing and artificial intelligence, and in particular to a spectral background noise prediction method based on a multi-scale attention mechanism and domain adaptation.
Background Technology
[0002] In the field of radio spectrum monitoring, background noise level (i.e., noise floor) is a core parameter reflecting the background state of the spectrum environment. Its extraction accuracy directly determines the accuracy of signal detection, the reliability of signal-to-noise ratio calculation, and the comprehensiveness of spectrum situational awareness. For example, in signal monitoring, accurate background noise data can help quickly identify weak useful signals; in electromagnetic environment assessment, a stable noise floor is an important basis for determining whether there is abnormal interference.
[0003] Currently, mainstream background noise extraction in the industry relies on traditional algorithms such as sliding-window statistics, minimum-value filtering, and envelope tracking. These methods traverse the spectral data with a sliding window, filter out noise components using manually set thresholds, and then fit a noise baseline. In practical applications, they show significant limitations. First, high parameter sensitivity. Their performance depends heavily on manually tuned parameters such as window size and screening threshold. However, the radio spectrum environment is dynamic: signal intensity and interference type vary greatly across time periods and regions. Fixed parameters cannot suit all scenarios, which often leads to over-extraction or under-extraction of noise.
[0004] Second, poor compatibility across multiple sites. Significant differences exist in equipment configuration (e.g., antenna gain, receiver model, sampling accuracy) and installation environment (e.g., densely populated urban areas, suburbs, remote fields) among different monitoring sites, leading to noticeable deviations in amplitude range and noise distribution characteristics in the spectrum data collected from each site. A single algorithm struggles to accommodate these differences, requiring individual parameter optimization or even algorithm redesign for each site, significantly increasing deployment and maintenance costs.
[0005] Third, limited feature extraction capability. Traditional algorithms judge noise only from the statistical characteristics of local data and cannot effectively capture the multi-scale features of spectral signals, which makes it difficult to distinguish broadband interference from real background noise. In complex electromagnetic environments, overfitting (misjudging interference as noise) or underfitting (missing some noise components) is prone to occur, distorting the noise baseline and degrading subsequent signal processing.
Summary of the Invention
[0006] The present invention aims to overcome the above-mentioned shortcomings in the prior art and provides a spectrum background noise prediction method based on multi-scale attention mechanism and domain adaptation that can achieve accurate prediction of background noise in complex spectrum environments.
[0007] To achieve the above objectives, the present invention adopts the following technical solution: A spectrum background noise prediction method based on a multi-scale attention mechanism and domain adaptation is proposed. The method uses an improved convolutional neural network that integrates a multi-scale attention mechanism and domain adaptation techniques. Through four core steps (data preprocessing, model construction, model training, and prediction with post-processing), it achieves accurate prediction of background noise in complex spectrum environments. The specific technical solution is as follows:
(1) Data preprocessing: Collect spectrum data files from multiple monitoring stations. Each file should contain at least three columns: frequency axis (FREQUENCY), level sequence (AVGEP), and corresponding background noise label (NOISE). Organize the data into a training set and a validation / test set according to the monitoring station dimension.
(2) Model construction: The ImprovedCNNNoisePredictor model is constructed. Through the combined effect of the multi-scale feature extraction layer, the improved residual attention block, the domain adaptation layer, and the calibration output, complex spectral noise is captured accurately.
(3) Model training: A training strategy combining a composite loss function, robust normalization, and a residual learning framework is adopted to ensure the stability of model training and the reliability of prediction.
(4) Prediction and post-processing: The preprocessed spectrum signal is input into the trained ImprovedCNNNoisePredictor model to obtain the initial background noise prediction; a Savitzky-Golay filter is then applied for secondary smoothing, and the final output is a smooth, natural background noise prediction that conforms to physical laws.
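The secondary smoothing in step (4) can be illustrated with a small NumPy-only Savitzky-Golay sketch. The source does not state the window length, the polynomial order, or the edge handling; the values `window=11`, `order=3`, and edge-value padding below are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

def savgol_coeffs(window, order):
    """Least-squares smoothing weights for the centre point of an odd window."""
    half = window // 2
    pos = np.arange(-half, half + 1)
    # Design matrix with columns 1, t, t^2, ..., t^order
    a = np.vander(pos, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse evaluates the fitted polynomial at t = 0
    return np.linalg.pinv(a)[0]

def savgol_smooth(y, window=11, order=3):
    """Smooth y with a Savitzky-Golay filter; edges are padded by repetition (assumed)."""
    y = np.asarray(y, dtype=np.float64)
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    # np.convolve flips its kernel, so reverse the weights to get a correlation
    return np.convolve(padded, savgol_coeffs(window, order)[::-1], mode="valid")

# Hypothetical raw model output: a slowly varying floor with small jitter
rng = np.random.default_rng(0)
raw_prediction = -100.0 + 0.002 * np.arange(512) + rng.normal(0.0, 0.3, 512)
smoothed = savgol_smooth(raw_prediction)
```

Because the filter fits a local polynomial rather than averaging, it preserves slow trends of the noise floor while suppressing point-to-point jitter. Where SciPy is available, `scipy.signal.savgol_filter` provides the same kind of filter with additional edge modes.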
[0008] To address the aforementioned technical pain points in the background technology, this invention proposes a deep learning method that integrates multi-scale feature extraction, channel attention mechanism, and domain adaptation technology. Through model structure innovation and training strategy optimization, it achieves high-precision and robust prediction of background noise, overcoming the application limitations of traditional algorithms.
[0009] Preferably, in step (1), data preprocessing specifically includes three modules: adaptive data standardization, adaptive interpolation, and multidimensional data augmentation, as follows:
(11) Adaptive data standardization: A robust standardization method based on the median and interquartile range is adopted, and its mathematical model is as follows:
X_norm = clip((X - μ_med) / (IQR + ε), -5, 5)
Where X is the original input N-dimensional spectral signal vector; μ_med is the median of X; IQR is the difference between the 75th percentile and the 25th percentile; ε is a small constant (1e-8); clip(·, -5, 5) is the truncation function; and X_norm is the output spectral signal vector after adaptive robust normalization, whose numerical range is constrained to the interval [-5, 5].
(12) Adaptive interpolation: Construct a mapping function f: [0, 1] -> R, and unify the data dimension through linear interpolation sampling. The mathematical model is as follows:
X_model[i] = f(i / (L_model - 1))
Where L_model is the preset input dimension of the model; i ranges from 0 to L_model - 1; f is a linear interpolation function constructed from the original spectrum data, with the normalized interval [0, 1] as its domain; and X_model[i] is the spectrum signal value at the i-th position in the output sequence after adaptive linear interpolation. The dimension of this sequence is unified as L_model, so it can be input directly into the subsequent model.
(13) Multidimensional data augmentation: Four signal transformation methods are introduced, namely random Gaussian noise superposition, Gaussian blur smoothing, random amplitude scaling, and random cyclic shift, and the diversity of the training data is expanded by randomly combining these methods.
[0010] Preferably, in step (13), the specific steps are as follows:
(131) Random Gaussian noise superposition: Random noise with a mean of 0 and a dynamically changing variance is superimposed on the original spectrum to simulate the thermal noise fluctuations during receiver operation. The mathematical model is
X_aug = X + N(0, σ_noise), where σ_noise = noise_level·strength
X_aug is the enhanced spectrum signal vector after superimposing random Gaussian noise, which is the output of this data enhancement operation; N(0, σ_noise) represents the Gaussian noise distribution with a mean of 0 and a variance of σ_noise, used to simulate the thermal noise fluctuations of the receiver; σ_noise determines the noise intensity and is calculated from noise_level and strength; noise_level is the preset base noise intensity coefficient, used to control the reference amplitude of the noise; strength is the dynamic adjustment coefficient, randomly selected in the range [0.05, 0.2] so that the noise intensity changes dynamically.
(132) Gaussian blur smoothing: A one-dimensional Gaussian kernel is used to convolve and smooth the spectral data to simulate the effect of the signal passing through filters with different bandwidths; the standard deviation σ of the Gaussian kernel is sampled uniformly at random in the range [0.5, 1.5].
(133) Random amplitude scaling: The amplitude of the full-band signal is multiplied by a random scaling factor s to simulate a small drift in the receiver gain.
(134) Random cyclic shift: The spectrum data is shifted slightly along the frequency axis. The maximum shift max_shift is set to 2% of the spectrum length, and the random shift amount is shift∈[-max_shift, max_shift].
(135) During model training, a probabilistic implementation strategy is adopted: random Gaussian noise superposition is applied with a 40% probability, Gaussian blur smoothing with a 30% probability, random amplitude scaling with a 20% probability, and random cyclic shift with a 10% probability, to ensure that each enhancement method can be fully trained.
[0011] Preferably, in step (2), the multi-scale feature extraction layer is specified as follows: To simultaneously capture the microscopic details, mesoscale structure, and macroscopic trends of the spectral signal, this layer employs three parallel convolutional branches, each configured with a different kernel size and dilation rate. The output features of each branch are processed by batch normalization (BN) and the GELU activation function, then concatenated along the channel dimension, and finally fused and reduced in dimension by a 1×1 convolutional layer. The mathematical model is as follows:
O_k = GELU(BN(Conv1d_k_d(F_in)))
O_fused = Conv1d_1x1(Concat(O_1, O_2, O_3))
Wherein, F_in is the input feature map; O_k is the output feature of the k-th convolutional branch; Conv1d_k_d represents a one-dimensional convolution with kernel size k and dilation rate d; Concat(O_1, O_2, O_3) is a channel-dimensional concatenation that merges the output features O_1, O_2, and O_3 of the three convolutional branches along the channel dimension to obtain the concatenated multi-scale features; O_fused is the output feature map after 1×1 convolution fusion, which integrates spectral signal features of different scales and can simultaneously reflect microscopic details, mesoscale structures, and macroscopic trends.
[0012] Preferably, in step (2), the improved residual attention block integrates the Squeeze-and-Excitation channel attention mechanism with a residual connection, specifically as follows:
(a) Channel attention mechanism: For the input feature map U (of dimension C×L, where C is the number of channels and L is the feature length), channel descriptors are first extracted by compression: global average pooling z_avg and global max pooling z_max are calculated, where
z_avg_c = (1 / L)·sum(U_c,i)
z_max_c = max(U_c,i)
Then a shared multilayer perceptron (MLP) is used to learn the importance weight of each channel:
W = σ(MLP(z_avg)) + σ(MLP(z_max))
where σ is the sigmoid activation function. Finally, the weights are multiplied by the original feature map for recalibration:
U_tilde_c = W_c·U_c
Where: U_c is the feature vector of the c-th channel of the input feature map U, with dimension L (feature length); i is the index into the feature vector U_c, with value range [0, ..., L-1], used to traverse all feature points of the channel; sum(U_c,i) is the sum of the values at all positions (index i) of U_c, which is the basic calculation of global average pooling; max(U_c,i) is the maximum of the values at all positions (index i) of U_c, which is the calculation of global max pooling; z_avg_c is the global average pooling value of the c-th channel, obtained by summing the values at all positions of U_c and taking the average, reflecting the overall activation intensity of the channel; z_max_c is the global max pooling value of the c-th channel; z_avg is the output vector of global average pooling, with dimension C (number of channels), formed by concatenating z_avg_c of all channels in channel order; z_max is the output vector of global max pooling, with dimension C, formed by concatenating z_max_c of all channels in channel order; W_c is the weight coefficient of the c-th channel, calculated by the shared MLP and the sigmoid activation function, reflecting the importance of the channel; U_tilde_c is the output feature vector of the c-th channel after weight recalibration, obtained by multiplying the original channel feature U_c by the weight W_c point by point, thereby enhancing the key channel features.
(b) Residual connection: A shortcut path is introduced to add the input features directly onto the features processed by the attention mechanism.
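The channel attention and shortcut described above can be sketched in NumPy as follows. This is only an illustration of the formulas: the MLP depth, its hidden width, the ReLU hidden activation, and the random weights are assumptions that the source does not specify, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def shared_mlp(z, w1, w2):
    """Two-layer bottleneck MLP applied to a length-C channel descriptor (ReLU assumed)."""
    return w2 @ np.maximum(w1 @ z, 0.0)

def residual_channel_attention(u, w1, w2):
    """u has shape (C, L). Returns (u + W*u, W) with one weight per channel."""
    z_avg = u.mean(axis=1)                 # global average pooling, shape (C,)
    z_max = u.max(axis=1)                  # global max pooling, shape (C,)
    # The same MLP weights process both descriptors; the two gates are summed
    w = sigmoid(shared_mlp(z_avg, w1, w2)) + sigmoid(shared_mlp(z_max, w1, w2))
    u_tilde = w[:, None] * u               # recalibration: U_tilde_c = W_c * U_c
    return u + u_tilde, w                  # shortcut adds the input back

rng = np.random.default_rng(0)
C, L, hidden = 8, 32, 2                    # hypothetical sizes
u = rng.normal(size=(C, L))
w1 = rng.normal(size=(hidden, C))
w2 = rng.normal(size=(C, hidden))
out, weights = residual_channel_attention(u, w1, w2)
```

Because W is the sum of two sigmoid outputs, each channel weight lies strictly between 0 and 2, so the block output for a channel is the input scaled by a factor between 1 and 3.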
[0013] Preferably, in step (2), the domain adaptation layer is designed with a dual-branch structure of general feature transformation and site-specific transformation, specifically as follows:
(i) General feature transformation branch: A multilayer perceptron with shared weights is used to extract the common features of the spectrum signals of all stations;
(ii) Site-specific transformation branch: A dedicated multilayer perceptron is configured for each monitoring site to extract the site's individual features;
(iii) Weighted fusion: The final output is obtained by weighted summation of the general features and the site-specific features. The mathematical model is
h_out = h_common + λ·h_site
where h_common is the general feature, h_site is the site-specific feature, λ is the fusion weight coefficient, and h_out is the final output feature after weighted fusion.
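A minimal NumPy sketch of the dual-branch fusion h_out = h_common + λ·h_site is given below. The single-layer perceptrons, the tanh activation, the value λ = 0.5, the random initialization, and the site identifiers are all assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def mlp(h, w, b):
    """One-layer perceptron with tanh activation (depth and activation assumed)."""
    return np.tanh(w @ h + b)

class DomainAdaptationLayer:
    """Shared branch for all sites plus one dedicated branch per site."""

    def __init__(self, dim, site_ids, lam=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.lam = lam  # fusion weight lambda (value assumed)
        self.common = (rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim))
        self.site = {
            s: (rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim))
            for s in site_ids
        }

    def __call__(self, h, site_id):
        h_common = mlp(h, *self.common)          # features shared by all sites
        h_site = mlp(h, *self.site[site_id])     # features specific to this site
        return h_common + self.lam * h_site

layer = DomainAdaptationLayer(dim=4, site_ids=["site_a", "site_b"])
h = np.ones(4)
out_a = layer(h, "site_a")
out_b = layer(h, "site_b")
```

The same input produces different outputs for different sites only through the λ-weighted site branch; setting λ to 0 reduces the layer to the shared transformation.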
[0014] Preferably, in step (2), the calibration output includes two steps, global linear calibration and site-specific calibration, specifically:
(A) Global linear calibration: The initial predicted value y_hat_raw output by the model backbone is linearly adjusted by a learnable global scaling factor α and a global bias term β:
y_hat_global = α·y_hat_raw + β
y_hat_global is the predicted value after global linear calibration. Scaling and bias adjustment adapt the initial prediction to the overall distribution of the full spectrum noise data.
(B) Site-specific calibration: A two-dimensional learnable embedding vector E_s is assigned to each site. A site-specific scaling factor and a site-specific bias correction are calculated from it, and the final calibrated prediction is obtained as follows:
scale_s = 0.4·σ(E_s,0) + 0.8
bias_s = 3.0·tanh(E_s,1)
y_hat_final = y_hat_global·scale_s + bias_s
This compensates for the systematic bias of a single site. Here, scale_s is the site-specific scaling factor, calculated from dimension 0 of the site embedding vector E_s and used to scale the globally calibrated prediction at the site level to match the amplitude characteristics of a single site; σ() is the sigmoid activation function, which maps its input to the interval (0, 1) and therefore constrains scale_s to (0.8, 1.2); bias_s is the site-specific bias correction, calculated from dimension 1 of the site embedding vector E_s and used to apply site-level bias compensation to the globally calibrated prediction, correcting the systematic bias of a single site; tanh is the hyperbolic tangent activation function, which maps its input to the interval (-1, 1) and therefore constrains bias_s to (-3.0, 3.0); y_hat_final is the final output prediction after global linear calibration and site-specific calibration.
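The two calibration stages can be written out directly from the formulas above. The sketch below uses NumPy; the numeric inputs are hypothetical and the function name `calibrate` is not from the source.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def calibrate(y_raw, alpha, beta, e_s):
    """Global linear calibration followed by site-specific scale and bias."""
    y_global = alpha * y_raw + beta
    scale_s = 0.4 * sigmoid(e_s[0]) + 0.8   # bounded to (0.8, 1.2)
    bias_s = 3.0 * np.tanh(e_s[1])          # bounded to (-3.0, 3.0)
    return y_global * scale_s + bias_s

y_raw = np.array([-101.0, -100.0, -99.5])   # hypothetical backbone output

# A zero embedding gives scale_s = 1.0 and bias_s = 0.0, so the site stage is neutral
neutral = calibrate(y_raw, alpha=1.0, beta=0.0, e_s=np.array([0.0, 0.0]))

# A non-zero embedding applies a bounded site correction
shifted = calibrate(y_raw, alpha=1.0, beta=0.0, e_s=np.array([0.5, -0.2]))
```

The bounded activations mean that, whatever values the embedding takes during training, a site can rescale the global prediction by at most about 20% and offset it by less than 3.0 units.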
[0015] Preferably, in step (3), the composite loss function is specified as follows: The total loss is composed of multiple weighted components that account for prediction accuracy, curve smoothness, and frequency domain consistency. The mathematical model is as follows:
L_total = w1·L_MSE + w2·L_L1 + w3·L_Huber + w4·L_TV + w5·L_Cons + w6·L_Centroid
Where:
L_MSE is the mean squared error loss, the mean of the squared differences between the predicted and actual values. It measures the overall accuracy of the prediction and penalizes larger errors more severely.
L_L1 is the absolute error loss, the mean of the absolute differences between the predicted and actual values. It is more robust to outliers than MSE, with a more uniform error penalty.
L_Huber is the Huber robust loss, which combines the advantages of the MSE and L1 losses: it behaves like MSE when the error is small and like L1 when the error is large, thus combining accuracy and robustness.
L_TV is the total variation regularization loss, which constrains the smoothness of the prediction curve by calculating the difference between predicted values at adjacent frequency points, reducing sharp fluctuations of the curve. The mathematical model is
L_TV = (1 / N)·sum(|y_(i+1) - y_i|)
where N is the total number of frequency points, y_(i+1) is the noise prediction at the (i+1)-th frequency point, y_i is the noise prediction at the i-th frequency point, and sum() adds the absolute differences between adjacent predicted values over all frequency points, reflecting the overall fluctuation of the prediction curve.
L_Cons is the consistency loss, used to constrain the consistency of prediction results in the frequency domain or time domain, ensuring that the noise predictions for different frequency bands or time points conform to physical laws.
L_Centroid is the frequency domain spectral center loss, which constrains the energy center of the predicted spectrum to agree with that of the true spectrum, thereby improving the accuracy of frequency domain prediction. The spectral centers of the predicted curve and the true label are calculated via the FFT, and
L_Centroid = (C_pred - C_true)^2
where C denotes the spectral center; C_pred is the spectral center of the predicted spectrum curve, calculated by applying the FFT to the predicted values, and reflects the energy center location of the predicted spectrum; C_true is the spectral center of the true spectral label, calculated by applying the FFT to the true label, and serves as the reference for frequency domain consistency.
w1 to w6 are the weight coefficients of the loss components.
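The components with explicit formulas can be combined as in the NumPy sketch below. Several points are assumptions rather than details from the source: the weights, the Huber threshold `delta=1.0`, and the definition of the spectral center as the magnitude-weighted mean FFT bin index. L_Cons is omitted because the source gives no formula for it.

```python
import numpy as np

def huber(err, delta=1.0):
    """Quadratic for |err| <= delta, linear beyond it (delta assumed)."""
    a = np.abs(err)
    return np.mean(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)))

def total_variation(y):
    """L_TV = (1 / N) * sum(|y_(i+1) - y_i|)."""
    return np.abs(np.diff(y)).sum() / len(y)

def spectral_center(y, eps=1e-8):
    """Magnitude-weighted mean FFT bin index (one possible definition, assumed)."""
    mag = np.abs(np.fft.rfft(y))
    bins = np.arange(len(mag))
    return (bins * mag).sum() / (mag.sum() + eps)

def composite_loss(y_pred, y_true, w=(1.0, 0.5, 0.5, 0.1, 0.1)):
    """Weighted sum of MSE, L1, Huber, TV and spectral-center terms (weights assumed)."""
    err = y_pred - y_true
    terms = (
        np.mean(err**2),                                          # L_MSE
        np.mean(np.abs(err)),                                     # L_L1
        huber(err),                                               # L_Huber
        total_variation(y_pred),                                  # L_TV
        (spectral_center(y_pred) - spectral_center(y_true))**2,   # L_Centroid
    )
    return float(sum(wi * t for wi, t in zip(w, terms)))

y_true = np.full(16, -100.0)           # hypothetical flat noise floor
loss_offset = composite_loss(y_true + 1.0, y_true)
```

For a flat label and a prediction offset by exactly 1.0, the smoothness and spectral-center terms vanish and only the three pointwise error terms contribute.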
[0016] Preferably, in step (3), robust normalization is performed as follows: Before the loss is calculated, batch-level dynamic normalization is applied to the predicted value Y_hat and the true value Y. The median m = median(Y) and the scaling factor s = max(IQR(Y), std(Y)) of the current batch of target values are calculated, and the normalization is:
Y_hat_norm = (Y_hat - m) / s
Y_norm = (Y - m) / s
The loss function is calculated on the normalized Y_hat_norm and Y_norm. Here, median(Y) is the median of the true value Y in the current batch, which serves as the center for robust normalization; it resists outliers better than the mean, ensuring stable normalization. IQR(Y) is the interquartile range (the difference between the 75th percentile and the 25th percentile) of the true value Y in the current batch, which reflects the dispersion of the data and is a robust dispersion index. std(Y) is the standard deviation of the true value Y in the current batch, which reflects the overall volatility of the data and is a traditional dispersion indicator. max(IQR(Y), std(Y)) takes the larger of the interquartile range and the standard deviation as the scaling factor s, combining robustness with adaptability to overall volatility and avoiding scaling distortion caused by outliers. Y_hat_norm is the predicted value Y_hat after robust normalization, which eliminates scale differences within the batch and improves the stability of the loss calculation. Y_norm is the true value Y after robust normalization; it is on the same scale as Y_hat_norm, ensuring consistency and comparability in the calculation of the loss function.
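A short NumPy sketch of this batch-level normalization follows. The small constant `eps` is an added guard against a zero scaling factor and is not part of the source formula; the sample values are hypothetical.

```python
import numpy as np

def normalize_batch(y_hat, y, eps=1e-8):
    """Centre on median(Y) and scale by max(IQR(Y), std(Y)), using target statistics only."""
    m = np.median(y)
    q25, q75 = np.percentile(y, [25, 75])
    s = max(q75 - q25, np.std(y)) + eps     # eps: added guard, not in the source
    return (y_hat - m) / s, (y - m) / s

y = np.array([-100.0, -99.0, -98.0, -97.0, -96.0])   # hypothetical batch targets
y_hat = y + 2.0                                      # prediction with a constant offset
y_hat_norm, y_norm = normalize_batch(y_hat, y)
```

Here the median is -98, the interquartile range is 2, and the standard deviation is about 1.41, so s = 2: the targets map to [-1, -0.5, 0, 0.5, 1] and the constant offset of 2 becomes an offset of 1 in normalized units. Both arrays share the same m and s, so the error between them is rescaled but not otherwise distorted.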
[0017] Preferably, in step (3), the residual learning framework is specified as follows: the traditional envelope tracking algorithm is used to calculate a heuristic noise baseline, which is passed through a convolution and additively fused with the multi-scale features to inject a physical prior; a "baseline + residual" output is adopted to improve the fitting ability in regions where the broadband noise floor is raised or changes gradually.
[0018] The beneficial effects of this invention are:
(1) Strong adaptability across the entire frequency band: By combining different convolution kernels and dilation rates, the multi-scale feature extraction layer captures the microscopic details, mesoscale structure, and macroscopic trend of the spectrum simultaneously. It achieves accurate fitting for both flat, simple spectra and complex spectra with drastic fluctuations.
(2) Good generality across multiple sites: The domain adaptation layer uses a weighted fusion of general features and site-specific features, combined with the site-specific calibration of the calibration output head, so the model does not need to be trained separately for each site. One set of weights can be adapted to monitoring sites with different equipment characteristics and different electromagnetic environments, which greatly reduces deployment and maintenance costs.
(3) Outstanding robustness: The adaptive data standardization method effectively suppresses the influence of outliers and of amplitude differences between stations. Multidimensional data augmentation improves the model's resistance to interference and equipment drift. The robust composite loss function makes the model insensitive to sudden strong signal interference, so it can lock onto the noise floor accurately.
(4) Smooth and natural prediction curve: The channel attention mechanism strengthens the extraction of key features, the total variation regularization constrains the smoothness of the curve, the multi-filter post-processing removes spike interference, and the generated noise baseline conforms to the physical propagation law of electromagnetic signals, avoiding the step effect commonly found in traditional algorithms.
Detailed Implementation
[0019] The present invention will be further described below with reference to specific embodiments.
[0020] This method for predicting spectral background noise based on a multi-scale attention mechanism and domain adaptation is built on an improved convolutional neural network (CNN). It integrates a multi-scale attention mechanism and domain adaptation techniques, and achieves accurate prediction of background noise in complex spectral environments through four core steps: data preprocessing, model construction, model training, and prediction with post-processing. The specific technical solution is as follows: (1) Data preprocessing: Collect spectrum data files from multiple monitoring stations. Each file contains at least three columns: frequency axis (FREQUENCY), level sequence (AVGEP), and corresponding background noise label (NOISE). Organize them into a training set and a validation / test set according to the monitoring station dimension.
[0021] Data preprocessing is fundamental to effective model training. Its core objectives are to adapt to signal differences across sites, expand the diversity of the training data, and eliminate abnormal interference. Specifically, it includes three main modules: adaptive data standardization, adaptive interpolation, and multidimensional data augmentation.
(11) Adaptive data standardization: To address the large amplitude differences and frequent outliers in the spectral signals from different monitoring stations, this invention employs a robust standardization method based on the median and interquartile range (IQR), which avoids the sensitivity of traditional standardization methods to extreme values. Its mathematical model is as follows:
X_norm = clip((X - μ_med) / (IQR + ε), -5, 5)
Where X is the original input N-dimensional spectral signal vector; μ_med is the median of X, which reflects the central tendency of the data better than the mean; IQR is the difference between the 75th percentile (Q75) and the 25th percentile (Q25), which measures the dispersion of the data and is more resistant to interference; ε is a small constant (1e-8) that prevents a zero denominator; clip(·, -5, 5) is the truncation function; and X_norm is the output spectral signal vector after adaptive robust normalization, whose numerical range is constrained to the interval [-5, 5], effectively removing the interference of extreme outliers on model training.
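The standardization formula above maps directly onto NumPy. The sketch below is illustrative only: the function name is hypothetical, and the sample levels are invented to show how a single strong carrier is handled.

```python
import numpy as np

def robust_normalize(x, eps=1e-8, clip_value=5.0):
    """Median/IQR standardization followed by clipping to [-clip_value, clip_value]."""
    x = np.asarray(x, dtype=np.float64)
    median = np.median(x)
    q25, q75 = np.percentile(x, [25, 75])
    iqr = q75 - q25
    return np.clip((x - median) / (iqr + eps), -clip_value, clip_value)

# Hypothetical levels: a flat floor near -100 with one strong carrier at -40
levels = np.array([-101.0, -100.0, -99.0, -100.5, -99.5, -40.0, -100.0])
x_norm = robust_normalize(levels)
```

In this example the median is -100 and the IQR is 1, so the floor samples land within [-1, 1] while the carrier, which would otherwise map to about 60, is clipped to 5. A mean and standard deviation computed on the same data would be pulled strongly toward the carrier.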
[0022] (12) Adaptive interpolation: Because sampling parameters differ between monitoring devices, the length of the collected spectrum data (L_in) may be inconsistent with the preset input dimension of the model (L_model). To solve this problem, this invention constructs a mapping function f: [0, 1] -> R and unifies the data dimension through linear interpolation sampling. The mathematical model is as follows:
X_model[i] = f(i / (L_model - 1))
Where i ranges from 0 to L_model - 1; f is a linear interpolation function constructed from the original spectrum data, with the normalized interval [0, 1] as its domain, so that the interpolated signal retains the trend characteristics of the original spectrum and avoids information loss; X_model[i] is the spectrum signal value at the i-th position in the output sequence after adaptive linear interpolation. The dimension of this sequence is unified as L_model, so it can be input directly into the subsequent model.
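A minimal sketch of this resampling with `numpy.interp` is shown below. It assumes that the original L_in samples are evenly spaced over [0, 1], which the source does not state explicitly, and that both L_in and L_model are at least 2; the function name is hypothetical.

```python
import numpy as np

def resample_to_model_length(x, l_model):
    """Linearly resample x onto l_model points spread evenly over [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    src_pos = np.linspace(0.0, 1.0, num=len(x))    # original samples mapped to [0, 1]
    dst_pos = np.arange(l_model) / (l_model - 1)   # i / (L_model - 1)
    return np.interp(dst_pos, src_pos, x)

spectrum = np.array([0.0, 10.0, 20.0, 30.0])       # L_in = 4
resampled = resample_to_model_length(spectrum, 7)  # L_model = 7
```

For this linear ramp the seven output samples are 0, 5, 10, 15, 20, 25, 30: the first and last points are preserved and the interior is filled in evenly. The same function also shortens a sequence when L_in is larger than L_model.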
[0023] (13) Multi-Dimensional Data Augmentation: To simulate real and complex electromagnetic environments and improve the model's generalization and anti-interference capabilities, this invention introduces four signal transformation techniques: random Gaussian noise superposition, Gaussian blur smoothing, random amplitude scaling, and random cyclic shifting. These techniques are applied in random combinations to expand the diversity of training data. Specifically: (131) Random Gaussian noise superposition: Random noise with a mean of 0 and a dynamically changing variance is superimposed on the original spectrum to simulate the thermal noise fluctuations during receiver operation. The mathematical model is X_aug = X + N(0, σ_noise), where σ_noise = noise_level·strength. X_aug is the enhanced spectrum signal vector after superimposing random Gaussian noise, which is the output of this data enhancement operation; N(0, σ_noise) represents the Gaussian noise distribution with a mean of 0 and a variance of σ_noise, used to simulate the thermal noise fluctuations of the receiver; σ_noise is the variance of the Gaussian noise, which determines the noise intensity, and is calculated by noise_level and strength; noise_level is the preset base noise intensity coefficient, used to control the reference amplitude of the noise; strength is the dynamic adjustment coefficient, which is randomly selected in the range of [0.05, 0.2] to realize the dynamic change of noise intensity, enhance the diversity of data, and enable the model to adapt to the spectrum environment with different signal-to-noise ratios.
[0024] (132) Gaussian blur smoothing: A one-dimensional Gaussian kernel is used to convolve and smooth the spectral data to simulate the effect of the signal after passing through filters with different bandwidths. The standard deviation of the Gaussian kernel σ is randomly and uniformly sampled in the range of [0.5, 1.5] to help the model ignore the small jitter of the spectrum and avoid overfitting high-frequency noise.
[0025] (133) Random amplitude scaling: Multiply the full-band signal amplitude by a random scaling factor s (s∈[0.95,1.05]) to simulate the small drift of receiver gain, ensuring that the model is adaptable to changes in the absolute level of the signal and does not need to be retrained due to device gain fine-tuning.
[0026] (134) Random cyclic shift: The spectrum data is shifted slightly along the frequency axis. The maximum shift amount max_shift is set to 2% of the spectrum length (e.g., a spectrum of 1024 frequency points is shifted by at most 20 frequency points). The random shift amount shift∈[-max_shift, max_shift] effectively eliminates the influence of frequency point position deviation on the prediction results and enhances the model's tolerance to frequency shift.
[0027] (135) During the model training process, a probabilistic implementation strategy is adopted: random Gaussian noise superposition is applied with a 40% probability, Gaussian blur smoothing is applied with a 30% probability, random amplitude scaling is applied with a 20% probability, and random cyclic shift is applied with a 10% probability, to ensure that each enhancement method can be fully trained, while avoiding signal distortion caused by the superposition of multiple transformations.
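The four transformations and their application probabilities can be sketched in NumPy as follows. Several choices are assumptions: each transformation is drawn independently with its own probability (the text could also be read as selecting exactly one per sample), σ_noise is passed to the generator as a standard deviation, the blur kernel is truncated at three standard deviations with edge padding, and `noise_level=1.0` is a placeholder.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (truncation assumed)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def augment(x, rng, noise_level=1.0):
    """Apply each transformation independently with its stated probability."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = len(x)
    if rng.random() < 0.40:                      # (131) additive Gaussian noise
        sigma_noise = noise_level * rng.uniform(0.05, 0.2)
        x = x + rng.normal(0.0, sigma_noise, size=n)   # sigma used as std (assumed)
    if rng.random() < 0.30:                      # (132) Gaussian blur
        kernel = gaussian_kernel(rng.uniform(0.5, 1.5))
        pad = len(kernel) // 2
        x = np.convolve(np.pad(x, pad, mode="edge"), kernel, mode="valid")
    if rng.random() < 0.20:                      # (133) amplitude scaling
        x = x * rng.uniform(0.95, 1.05)
    if rng.random() < 0.10:                      # (134) cyclic shift, at most 2% of length
        max_shift = int(0.02 * n)
        x = np.roll(x, rng.integers(-max_shift, max_shift + 1))
    return x

rng = np.random.default_rng(0)
spectrum = np.linspace(-100.0, -90.0, 1024)      # hypothetical normalized input
augmented = augment(spectrum, rng)
```

Every branch preserves the sequence length, so the augmented sample can be fed to the model without further resizing.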
[0028] (2) Model Construction: This invention constructs an ImprovedCNNNoisePredictor model, which achieves accurate capture of complex spectral noise through the synergistic effect of multi-scale feature extraction layers, improved residual attention blocks, domain adaptive layers, and calibration output. The specific architecture is as follows: (21) Multi-scale feature extraction layer To simultaneously capture the microscopic details (such as local jitter), mesoscale structure (such as local fluctuations), and macroscopic trends (such as the overall frequency band variation patterns) of the spectral signal, this layer employs three parallel convolutional branches, each configured with different kernel sizes (K∈{3, 7, 15}) and dilation rates (D∈{1, 2, 4}): small kernels with small dilation rates are used to extract microscopic details, while large kernels with large dilation rates are used to capture macroscopic trends. The output features of each branch are processed by batch normalization (BN) and the GELU activation function, then concatenated along the channel dimension, and finally fused and dimensionality reduced using a 1×1 convolutional layer. 
The mathematical model is as follows: O_k = GELU(BN(Conv1d_k_d(F_in))); O_fused = Conv1d_1x1(Concat(O_1, O_2, O_3)). Wherein, F_in is the input feature map; O_k is the output feature of the k-th convolutional branch; Conv1d_k_d represents a one-dimensional convolutional operation with kernel size k and dilation rate d; Concat(O_1, O_2, O_3) is a channel-dimensional concatenation operation, merging the output features O_1, O_2, and O_3 of the three convolutional branches along the channel dimension to obtain the concatenated result of multi-scale features; O_fused is the output feature map after 1×1 convolution fusion, integrating spectral signal features of different scales, which can simultaneously reflect microscopic details, mesoscale structures, and macroscopic trends; the GELU activation function, compared to ReLU, can better simulate the probability distribution of data and improve the nonlinear fitting ability of the model; the 1×1 convolutional layer effectively reduces the complexity of model parameters and improves training efficiency by integrating multi-branch features.
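By way of non-limiting illustration, the three-branch structure described above can be sketched in NumPy as follows. The sketch assumes that the kernel sizes and dilation rates are paired as (3, 1), (7, 2), and (15, 4), uses a per-channel standardization as a stand-in for batch normalization, and uses random weights; the helper names are hypothetical. The channel counts (32 per branch, fused to 64) follow the embodiment.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def dilated_conv1d(x, w, dilation):
    """x: (C_in, L); w: (C_out, C_in, K) with odd K. Zero-padded so the output is (C_out, L)."""
    c_out, _, k = w.shape
    length = x.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, length))
    for j in range(k):
        start = j * dilation
        out += w[:, :, j] @ xp[:, start:start + length]
    return out

def channel_norm(x, eps=1e-5):
    # Stand-in for BN: standardize each channel over the frequency axis (assumption).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_scale_block(f_in, branch_weights, dilations, w_fuse):
    branches = [gelu(channel_norm(dilated_conv1d(f_in, w, d)))
                for w, d in zip(branch_weights, dilations)]
    concat = np.concatenate(branches, axis=0)  # Concat(O_1, O_2, O_3) along channels
    return w_fuse @ concat                     # a 1x1 convolution is a per-position matmul

rng = np.random.default_rng(0)
f_in = rng.normal(size=(1, 1024))
kernels, dilations = (3, 7, 15), (1, 2, 4)
branch_weights = [rng.normal(0.0, 0.1, size=(32, 1, k)) for k in kernels]
w_fuse = rng.normal(0.0, 0.1, size=(64, 96))
o_fused = multi_scale_block(f_in, branch_weights, dilations, w_fuse)
print(o_fused.shape)  # (64, 1024)
```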
[0029] (22) Improved residual attention block To enhance key channel features, suppress redundant information, and alleviate the gradient vanishing problem in deep networks, this module integrates a Squeeze-and-Excitation channel attention mechanism with residual connections, specifically: (a) Channel attention mechanism: For the input feature map U (C×L dimension, where C is the number of channels and L is the feature length), the channel descriptors are first extracted through a "compression" operation: the global average pooling z_avg and the global max pooling z_max are calculated, where z_avg_c = (1 / L)·sum(U_c,i) and z_max_c = max(U_c,i). The combination of the two pooling methods reflects the channel features more comprehensively. Then, the importance weights of each channel are learned through a shared multilayer perceptron (MLP): W = σ(MLP(z_avg)) + σ(MLP(z_max)), where σ is the sigmoid activation function. Finally, the weights are multiplied by the original feature map through a "recalibration" operation to obtain U_tilde_c = W_c·U_c. 
Wherein, U_c is the feature vector of the c-th channel of the input feature map U, with dimension L (feature length); i is the index of the feature vector U_c, with a value range of [0, L-1], used to traverse all feature points of this channel; sum(U_c,i) sums the values of all positions (index i) of the feature vector U_c of the c-th channel, which is the basic calculation of global average pooling; max(U_c,i) takes the maximum value of all positions (index i) of the feature vector U_c of the c-th channel, which is the calculation method of global max pooling; z_avg_c is the global average pooling value of the c-th channel, which is obtained by summing the values of all positions of U_c and taking the average, reflecting the overall activation intensity of this channel; z_max_c is the global max pooling value of the c-th channel, which is the maximum over all positions of U_c; z_avg is the output vector of global average pooling, with dimension C (number of channels), which is formed by concatenating the z_avg_c of all channels in channel order; z_max is the output vector of global max pooling, with dimension C (number of channels), which is formed by concatenating the z_max_c of all channels in channel order; W_c is the weight coefficient of the c-th channel, which is calculated by a shared multilayer perceptron (MLP) and a sigmoid activation function, reflecting the importance of the channel; U_tilde_c is the output feature vector of the c-th channel after weight recalibration, which is obtained by multiplying the original channel features U_c with the weights W_c point by point, thereby enhancing the features of key channels.
[0030] (b) Residual connection: The Shortcut path is introduced to directly superimpose the input features onto the features processed by the attention mechanism, which effectively alleviates the gradient vanishing problem during deep network training and ensures that the model can deeply mine spectral features.
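By way of non-limiting illustration, the channel attention and residual connection of this block can be sketched as follows. The sketch assumes a two-layer shared MLP with a ReLU hidden layer and no biases; the hidden width and the function names are hypothetical. Because W is the sum of two sigmoid outputs, each channel weight lies in the interval (0, 2).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(u, w1, w2):
    """u: (C, L). Shared MLP weights: w1 (C_hidden, C), w2 (C, C_hidden)."""
    z_avg = u.mean(axis=1)  # global average pooling, one value per channel
    z_max = u.max(axis=1)   # global max pooling, one value per channel

    def mlp(z):
        return w2 @ np.maximum(w1 @ z, 0.0)  # ReLU hidden layer (assumption)

    w = sigmoid(mlp(z_avg)) + sigmoid(mlp(z_max))  # W, shape (C,)
    return w[:, None] * u, w                        # U_tilde_c = W_c * U_c

def residual_attention_block(u, w1, w2):
    u_tilde, _ = channel_attention(u, w1, w2)
    return u + u_tilde  # shortcut path adds the input to the recalibrated features

rng = np.random.default_rng(0)
u = rng.normal(size=(64, 1024))
w1 = rng.normal(0.0, 0.1, size=(16, 64))
w2 = rng.normal(0.0, 0.1, size=(64, 16))
out = residual_attention_block(u, w1, w2)
```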
[0031] (23) Domain Adaptive Layer To address the challenges of model transfer caused by differences in equipment characteristics and electromagnetic environments at different monitoring sites, this layer employs a dual-branch structure consisting of "general feature transformation" and "site-specific transformation." Specifically: (i) General feature transformation branch: A multilayer perceptron with shared weights is used to extract common features of the spectrum signals of all stations to ensure the basic predictive ability of the model; (ii) Site-specific transformation branch: Configure a dedicated multilayer perceptron for each monitoring site to extract the site's personalized characteristics (such as inherent equipment noise and local electromagnetic environment characteristics). (iii) Weighted Fusion: The final output is obtained by weighted summation of general features and site-specific features. The mathematical model is h_out = h_common + λ·h_site (λ=0.3 in this embodiment), where h_common is the general feature, h_site is the site-specific feature, λ is the fusion weight coefficient used to balance the influence of common and individual features, and h_out is the final output feature after weighted fusion. This design allows the model to adapt to multiple sites with only one set of core weights, eliminating the need for separate training for each site and significantly reducing deployment costs.
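By way of non-limiting illustration, the dual-branch fusion h_out = h_common + λ·h_site can be sketched as follows. The two-layer MLP shape, the ReLU activation, the feature dimension, and the names `init_mlp`, `mlp`, and `domain_adapt` are assumptions made for the example only.

```python
import numpy as np

def init_mlp(rng, dim, hidden):
    """Random parameters (w1, b1, w2, b2) for a small two-layer MLP."""
    return (rng.normal(0.0, 0.1, (hidden, dim)), np.zeros(hidden),
            rng.normal(0.0, 0.1, (dim, hidden)), np.zeros(dim))

def mlp(h, w1, b1, w2, b2):
    return w2 @ np.maximum(w1 @ h + b1, 0.0) + b2

def domain_adapt(h, common_params, site_params, site_id, lam=0.3):
    h_common = mlp(h, *common_params)          # shared across all sites
    h_site = mlp(h, *site_params[site_id])     # dedicated to one monitoring site
    return h_common + lam * h_site             # weighted fusion

rng = np.random.default_rng(0)
common = init_mlp(rng, 16, 32)
sites = {s: init_mlp(rng, 16, 32) for s in range(5)}  # five sites, as in the embodiment
h = rng.normal(size=16)
h_out = domain_adapt(h, common, sites, site_id=2)
```

Only the small site-specific branch differs between sites, which is what allows the remaining weights to be shared.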
[0032] (24) Calibration output To further improve prediction accuracy and adapt to the differences in noise distribution at different sites, this module includes two stages: global linear calibration and site-specific calibration. Specifically: (A) Global linear calibration: The initial predicted value y_hat_raw of the model backbone output is linearly adjusted by a learnable global scaling factor α and a global bias term β to obtain y_hat_global = α·y_hat_raw + β, which adapts to the overall distribution of the full data. y_hat_global is the output predicted value after global linear calibration. Through scaling and bias adjustment, it makes the initial predicted value adapt to the overall distribution of the full spectrum noise data, thereby improving the global consistency and accuracy of the prediction results.
[0033] (B) Site-specific calibration: A two-dimensional learnable embedding vector E_s is assigned to each site. The site-specific scaling factor (range [0.8, 1.2]) is calculated by scale_s = 0.4·σ(E_s,0) + 0.8, and the site-specific bias correction (range [-3, 3]) is calculated by bias_s = 3.0·tanh(E_s,1). Finally, the calibrated prediction value y_hat_final = y_hat_global·scale_s + bias_s is obtained, which accurately compensates for the system bias of a single site. Here, `scale_s` is a site-specific scaling factor, calculated from the 0th dimension of the site embedding vector `E_s`, used to scale the predicted value at the site level after global calibration to adapt to the amplitude characteristics of a single site; `σ()` is the Sigmoid activation function, which maps the input to the (0, 1) interval, used to constrain the range of the site-specific scaling factor and ensure the stability of the scaling adjustment; `bias_s` is the site-specific bias correction, calculated from the 1st dimension of the site embedding vector `E_s`, used to compensate for the bias at the site level after global calibration, correcting the system bias of a single site; `tanh` is the hyperbolic tangent activation function, which maps the input to the (-1, 1) interval, used to constrain the fluctuation range of the site-specific bias correction and avoid overcompensation; `y_hat_final` is the final output predicted value after global linear calibration and site-specific calibration. Based on adapting to the overall distribution of the full dataset, it further compensates for the system bias of a single site, enabling accurate prediction of spectral noise for different sites.
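By way of non-limiting illustration, the two calibration stages can be sketched as follows. The formulas are those given above; the function name `calibrate` and the example values of α, β, and E_s are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(y_raw, alpha, beta, e_s):
    """Global linear calibration followed by site-specific calibration."""
    y_global = alpha * y_raw + beta
    scale_s = 0.4 * sigmoid(e_s[0]) + 0.8   # bounded to (0.8, 1.2)
    bias_s = 3.0 * np.tanh(e_s[1])          # bounded to (-3, 3)
    return y_global * scale_s + bias_s

y_raw = np.full(1024, -100.0)
y_final = calibrate(y_raw, alpha=1.0, beta=0.5, e_s=np.array([0.2, -0.1]))
```

An all-zero embedding gives scale_s = 1 and bias_s = 0, so a site with an untrained embedding simply receives the globally calibrated prediction.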
[0034] (3) Model training: To ensure the stability of model training and the reliability of prediction, this invention adopts a training strategy that combines composite loss function, robust normalization and residual learning framework.
[0035] (31) Composite loss function The total loss is composed of multiple weighted components, taking into account prediction accuracy, curve smoothness, and frequency domain consistency. The mathematical model is as follows: L_total = w1·L_MSE + w2·L_L1 + w3·L_Huber + w4·L_TV + w5·L_Cons + w6·L_Centroid, where: L_MSE is the mean squared error loss, which calculates the mean of the squared differences between the predicted and actual values. It measures the overall accuracy of the prediction results and penalizes larger errors more severely, thereby improving the prediction accuracy.
[0036] L_L1 is the absolute error loss, which calculates the mean absolute value of the difference between the predicted value and the true value. It is more robust to outliers than MSE, and the error penalty is more uniform. It reduces the impact of outliers on training and enhances the robustness of the model.
[0037] L_Huber is the Huber robust loss, which combines the advantages of MSE and L1 loss. When the error is small, it performs as MSE (optimizing accuracy), and when the error is large, it performs as L1 (suppressing outliers). It balances the advantages of both and has both accuracy and robustness.
[0038] L_TV is the total variation regularization loss. It constrains the smoothness of the prediction curve by calculating the difference between the predicted values of adjacent frequency points, thereby reducing sharp fluctuations of the curve. The mathematical model is L_TV = (1 / N)·sum(|y_(i+1) - y_i|), where N is the total number of frequency points; this prevents the output curve from oscillating violently.
[0039] L_Cons is the consistency loss, used to constrain the consistency of prediction results in the frequency domain or time domain, ensuring that the noise prediction results at different frequency bands or time points conform to physical laws; it also ensures the prediction consistency of data from the same site across different time periods, improving model stability.
[0040] L_Centroid is the frequency domain spectral center loss. By calculating the difference in spectral center between the predicted spectrum and the true spectrum, it constrains the consistency of the energy center of the two in the frequency domain, thereby improving the accuracy of frequency domain prediction. The difference in spectral center between the predicted curve and the true label is calculated by FFT transformation. L_Centroid = (C_pred - C_true)^2, where C is the spectral center, which reflects the centroid of the frequency distribution of signal energy, ensuring that the prediction result is consistent with the real noise in the frequency domain characteristics.
[0041] w1-w6 are the weighting coefficients of each loss component. In this embodiment, the values are w1=0.4, w2=0.3, w3=0.1, w4=0.05, w5=0.1, and w6=0.05, which can be fine-tuned according to the actual scenario.
[0042] Here, `sum()` is the summation function used in the total variation regularization loss to sum the absolute values of the differences between adjacent predicted values at all frequency points, reflecting the overall fluctuation of the prediction curve; `y_(i+1)` is the noise prediction value at the (i+1)th frequency point; `y_i` is the noise prediction value at the ith frequency point; `C_pred` is the spectral center of the predicted spectrum curve, calculated by performing an FFT transformation on the predicted values, reflecting the energy center position of the predicted spectrum; and `C_true` is the spectral center of the true spectral label, calculated by performing an FFT transformation on the true label, serving as a benchmark reference for frequency domain consistency.
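By way of non-limiting illustration, the weighted sum of loss components can be sketched as follows. Several details are assumptions not fixed by the description: the Huber threshold (delta = 1.0), the spectral center computed as a magnitude-weighted mean FFT bin index, and L_Cons supplied from outside as `l_cons` because its formula is not given. The weights are those of the embodiment.

```python
import numpy as np

def huber(err, delta=1.0):
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * err ** 2, delta * (a - 0.5 * delta)).mean()

def spectral_centroid(y, eps=1e-12):
    # Magnitude-weighted mean bin index of the FFT of the curve (assumed definition).
    mag = np.abs(np.fft.rfft(y))
    k = np.arange(len(mag))
    return (k * mag).sum() / (mag.sum() + eps)

def composite_loss(y_pred, y_true, l_cons=0.0,
                   w=(0.4, 0.3, 0.1, 0.05, 0.1, 0.05)):
    err = y_pred - y_true
    l_mse = np.mean(err ** 2)
    l_l1 = np.mean(np.abs(err))
    l_huber = huber(err)
    l_tv = np.abs(np.diff(y_pred)).sum() / len(y_pred)  # (1 / N) * sum |y_(i+1) - y_i|
    l_cent = (spectral_centroid(y_pred) - spectral_centroid(y_true)) ** 2
    terms = (l_mse, l_l1, l_huber, l_tv, l_cons, l_cent)
    return float(sum(wi * ti for wi, ti in zip(w, terms)))

t = np.linspace(0.0, 1.0, 256)
y_true = -100.0 + 2.0 * np.sin(2.0 * np.pi * t)
y_pred = y_true + 0.3
loss = composite_loss(y_pred, y_true)
```

In training, the inputs would be the robustly normalized Y_hat_norm and Y_norm rather than raw levels.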
[0043] (32) Robust normalization Before calculating the loss, the predicted value Y_hat and the true value Y are dynamically normalized at the batch level to eliminate the influence of absolute level fluctuations in different batches on the gradient. Specifically: Calculate the median m = median(Y) of the current batch target value and the scaling factor s = max(IQR(Y), std(Y)) (take the larger value between the interquartile range and the standard deviation to enhance robustness against interference); Normalization: Y_hat_norm = (Y_hat - m) / s, Y_norm = (Y - m) / s; The loss function is calculated based on the normalized Y_hat_norm and Y_norm, ensuring that model training focuses on relative error and improves generalization ability.
[0044] Wherein, median(Y) is the median of the true values Y in the current batch, serving as the benchmark center for robust normalization. It is more resistant to outliers than the mean, ensuring the stability of normalization. IQR(Y) is the interquartile range (the difference between the 75th and 25th percentiles) of the true values Y in the current batch, reflecting the dispersion of the data and serving as a robust dispersion metric. std(Y) is the standard deviation of the true values Y in the current batch, reflecting the overall volatility of the data and serving as a traditional dispersion metric. max(IQR(Y), std(Y)) uses the larger of the interquartile range and the standard deviation as the scaling factor s, combining robustness with adaptability to overall volatility, avoiding scaling distortion caused by outliers. Y_hat_norm is the output of the predicted value Y_hat after robust normalization, eliminating scale differences within the batch and improving the stability of loss calculation. Y_norm is the output of the true value Y after robust normalization; it is on the same scale as Y_hat_norm, ensuring that the calculation of the loss function is consistent and comparable.
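By way of non-limiting illustration, the batch-level robust normalization can be sketched as follows. The small constant `eps` guarding against a zero scale is an assumption added for numerical safety, and the function name is hypothetical.

```python
import numpy as np

def robust_normalize(y_hat, y, eps=1e-8):
    """Normalize prediction and target with statistics of the target batch only."""
    m = np.median(y)
    q75, q25 = np.percentile(y, [75, 25])
    s = max(q75 - q25, np.std(y)) + eps   # s = max(IQR(Y), std(Y))
    return (y_hat - m) / s, (y - m) / s

rng = np.random.default_rng(0)
y = rng.normal(-100.0, 2.0, size=(64, 1024))        # a batch of true noise curves
y_hat = y + rng.normal(0.0, 0.5, size=y.shape)      # a batch of predictions
y_hat_norm, y_norm = robust_normalize(y_hat, y)
```

Because both tensors share m and s, adding the same level offset to the whole batch leaves the normalized values, and hence the loss, unchanged.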
[0045] (33) Residual Learning Framework During both the training and inference phases, a traditional envelope tracking algorithm is used to calculate a heuristic noise baseline, which is passed through a convolution and additively fused with the multi-scale features to inject physical priors. A "baseline + residual" output method is adopted to improve the fitting ability in regions where the noise floor rises over a wide band or varies gradually.
[0046] (4) Prediction and Post-processing: The preprocessed spectral signal is input into the trained ImprovedCNNNoisePredictor model to obtain preliminary background noise prediction values. To remove occasional spikes (such as transient strong interference) and ensure the continuity and physical rationality of the background noise curve, this invention integrates a Savitzky-Golay filter for secondary smoothing: Savitzky-Golay filter: Based on local polynomial fitting, it preserves signal trends while smoothing curves; The final output is a smooth, natural background noise prediction result that conforms to physical laws.
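By way of non-limiting illustration, Savitzky-Golay smoothing can be sketched in NumPy by fitting a local polynomial through a least-squares pseudo-inverse. The window length 7 and polynomial order 2 are the values used in the test section of this description; the edge-padding boundary treatment and the function names are assumptions.

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    """Weights that return the fitted polynomial's value at the window center."""
    half = window // 2
    x = np.arange(-half, half + 1)
    a = np.vander(x, polyorder + 1, increasing=True)  # columns 1, x, x^2, ...
    return np.linalg.pinv(a)[0]                        # row 0 is the constant term

def savgol_smooth(y, window=7, polyorder=2):
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    yp = np.pad(y, half, mode="edge")                  # boundary handling (assumption)
    return np.convolve(yp, c[::-1], mode="valid")      # same length as y

rng = np.random.default_rng(0)
prediction = -100.0 + rng.normal(0.0, 0.3, size=1024)
prediction[500] += 8.0                                 # an occasional spike
smoothed = savgol_smooth(prediction)
```

For window 7 and order 2 the weights are the classical (-2, 3, 6, 7, 6, 3, -2) / 21, so a quadratic trend passes through the interior of the curve unchanged while isolated spikes are attenuated.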
[0047] The technical solution of the present invention will be described in detail below with reference to specific experimental data and operating procedures: 1. Data Acquisition and Preprocessing Data source: Spectrum data were collected from 5 monitoring stations in different areas, including a station in a densely populated urban area (test station 1), a suburban station (test station 2), a remote field station (test station 3), an industrial area station (test station 4), and a transportation hub station (test station 5). 1,000 spectrum data points were collected from each station, with a frequency range of 87 MHz-108 MHz. Each data point contains 1,680 frequency points and is stored as a floating-point array.
[0048] Adaptive standardization: For each spectral data point, the median μ_med and interquartile range (IQR) are calculated, and the data is standardized using the formula X_norm = clip((X - μ_med) / (IQR + 1e-8), -5, 5). The amplitude of the processed data is concentrated in the range [-5, 5].
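By way of non-limiting illustration, the standardization formula X_norm = clip((X - μ_med) / (IQR + 1e-8), -5, 5) can be sketched as follows; the function name and the synthetic spectrum are hypothetical.

```python
import numpy as np

def robust_standardize(x, eps=1e-8, limit=5.0):
    """Median / IQR standardization with clipping to [-limit, limit]."""
    med = np.median(x)
    q75, q25 = np.percentile(x, [75, 25])
    return np.clip((x - med) / (q75 - q25 + eps), -limit, limit)

rng = np.random.default_rng(0)
spectrum = rng.normal(-100.0, 1.0, size=1680)
spectrum[800:810] += 40.0            # a strong signal standing above the noise
x_norm = robust_standardize(spectrum)
```

The strong signal is clipped at 5, so it cannot dominate the scale of the remaining frequency points.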
[0049] Adaptive interpolation: The model input dimension is set to 1024. The original 1680-dimensional spectrum data is compressed to 1024 dimensions through linear interpolation. The interpolated data retains more than 98% of the trend characteristics of the original spectrum, and the frequency domain error is less than 0.5%.
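By way of non-limiting illustration, the resampling from 1680 to 1024 points can be sketched with linear interpolation over the normalized interval [0, 1], with the i-th output sampled at position i / (L_model - 1). The function name is hypothetical.

```python
import numpy as np

def resample_linear(x, l_model=1024):
    """Linearly interpolate x onto l_model points spanning the same band."""
    src = np.linspace(0.0, 1.0, len(x))          # original points mapped to [0, 1]
    dst = np.arange(l_model) / (l_model - 1)     # i / (L_model - 1)
    return np.interp(dst, src, x)

original = np.linspace(-110.0, -90.0, 1680)
x_model = resample_linear(original)
print(x_model.shape)  # (1024,)
```

The first and last output points coincide with the band edges of the original spectrum, so the frequency axis is compressed without being shifted.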
[0050] Multidimensional data augmentation: An augmentation operation is applied randomly with the following probabilities: 40% probability of adding Gaussian noise, 30% probability of Gaussian smoothing, 20% probability of amplitude scaling, and 10% probability of cyclic shifting. The Gaussian noise amplitude is estimated by the standard deviation of the site data and multiplied by an augmentation strength coefficient, which can be 0.1.
[0051] 2. Model Building and Training Model Implementation: The ImprovedCNNNoisePredictor model was built and trained on a machine with an Intel Core i9-13900K CPU, an NVIDIA RTX 4090 GPU, and 64GB of RAM.
[0052] Model structure: The multi-scale convolutional branch outputs 32 channels × 3 paths and merges them into 64 channels; then, four improved residual attention blocks are stacked to expand the channels to 256; a single-channel background noise curve is generated through the convolutional output head; a consistency enhancement layer performs local smoothing constraints on the output; and a calibration head performs global and site-specific calibrations on the output.
[0053] Training parameter settings: Optimizer: learning rate 0.001, weight decay 1e-5; Training epochs: 300, Batch size: 64; Early stopping strategy: Stop training if the validation set loss does not decrease for 50 consecutive epochs. 3. Model Testing and Result Analysis Test data: 200 spectrum data points from each site that were not used in training, covering different time periods (morning peak, off-peak, night) and different weather conditions (sunny, rainy, hazy) to ensure the comprehensiveness of the test scenarios.
[0054] Using the envelope tracking algorithm as a control, the output of the model in this invention, smoothed by Savitzky-Golay (w=7, p=2), is taken as the final output. Evaluation metrics include MSE, MAE, RMSE, correlation coefficient, and smoothness score, where the smoothness score is defined as: Smoothness_Score = 1 / (1 + Var(Δy)) Where Δy is the first-order difference of the prediction curve.
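By way of non-limiting illustration, the evaluation metrics listed above can be computed as follows. The use of the Pearson correlation coefficient and the function names are assumptions; the smoothness score follows the stated definition 1 / (1 + Var(Δy)).

```python
import numpy as np

def smoothness_score(y):
    return 1.0 / (1.0 + np.var(np.diff(y)))  # Var of the first-order difference

def evaluate(y_pred, y_true):
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(mse)),
        "corr": float(np.corrcoef(y_pred, y_true)[0, 1]),  # Pearson (assumption)
        "smoothness": float(smoothness_score(y_pred)),
    }

truth = np.linspace(-105.0, -95.0, 1024)
rng = np.random.default_rng(0)
metrics = evaluate(truth + rng.normal(0.0, 0.2, size=truth.shape), truth)
```

The score equals 1 for a curve whose slope is constant and falls toward 0 as the point-to-point changes become more erratic.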
[0055] Table 1: Overall Test Comparison Results
[0056] As shown in Table 1, compared with the heuristic baseline algorithm, the method of this invention reduces MSE by about 56.3%, MAE by about 42.2%, RMSE by about 33.9%, and smoothness index by about 9.3%; the correlation coefficient remains above 0.98.
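As a brief consistency check of the reported reductions, and assuming that RMSE is the square root of the same aggregate MSE, a relative MSE reduction r implies a relative RMSE reduction of 1 - sqrt(1 - r). The variable names below are hypothetical.

```python
import math

mse_reduction = 0.563                                  # reported MSE reduction
rmse_reduction = 1.0 - math.sqrt(1.0 - mse_reduction)  # implied RMSE reduction
print(f"{rmse_reduction:.1%}")  # 33.9%
```

The implied value agrees with the reported RMSE reduction of about 33.9%.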
[0057] Thus, the spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation is completed.
[0058] As verified by the above experiments, this invention provides a spectral background noise prediction method based on a multi-scale attention mechanism and domain adaptation. By integrating deep learning technology with signal processing theory, it achieves high-precision and robust prediction of background noise in complex spectral environments. Furthermore, this invention is adaptable to heterogeneous equipment at multiple sites and dynamic electromagnetic environments, possessing strong engineering practicality and promotional value. It can be widely applied in radio spectrum monitoring scenarios, significantly improving the accuracy of background noise (noise floor) estimation for radio spectrum signals.
[0059] Compared with the prior art, the present invention has the following significant technical advantages: 1. Strong adaptability across the entire frequency band: The multi-scale feature extraction layer can capture the microscopic details, mesoscale structure and macroscopic trends of the spectrum at the same time through the combination of different convolution kernels and dilation rates. It can achieve accurate fitting whether it is a flat and simple spectrum or a complex spectrum with dramatic fluctuations.
[0060] 2. Good versatility across multiple sites: The domain adaptive layer uses a weighted fusion of general features and site-specific features, combined with site-specific calibration of the calibration output head, so that the model does not need to be trained separately for each site. A single set of weights can be adapted to monitoring sites with different equipment characteristics and different electromagnetic environments, which greatly reduces deployment and maintenance costs.
[0061] 3. Outstanding robustness: The adaptive data standardization method effectively suppresses the impact of outliers and differences in site amplitude. Multidimensional data augmentation enhances the model's ability to resist interference and equipment drift. The robust composite loss function makes the model insensitive to sudden strong signal interference, so that it can accurately track the noise floor.
[0062] 4. Smooth and natural prediction curve: The channel attention mechanism enhances the extraction of key features, the total variation regularization constrains the smoothness of the curve, and the multi-filter post-processing removes peak interference. The generated noise baseline conforms to the physical propagation law of electromagnetic signals, avoiding the step effect commonly found in traditional algorithms.
[0063] This method is applicable to scenarios such as radio spectrum monitoring, signal detection, signal-to-noise ratio calculation, and spectrum situational awareness, and can effectively adapt to complex and ever-changing spectrum environments and multi-site monitoring needs.
Claims
1. A spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation, characterized by: Based on an improved convolutional neural network, integrating multi-scale attention mechanisms and domain adaptation techniques, this method achieves accurate prediction of background noise in complex spectral environments through four core steps: data preprocessing, model construction, model training, and prediction and post-processing. The specific technical solution is as follows: (1) Data preprocessing: Collect spectrum data files from multiple monitoring stations. Each file should contain at least three columns: frequency axis (FREQUENCY), level sequence (AVGEP), and corresponding background noise label (NOISE). Organize the data into training set and validation / test set according to the monitoring station dimension. (2) Model construction: The ImprovedCNNNoisePredictor model is constructed. Through the synergistic effect of multi-scale feature extraction layer, improved residual attention block, domain adaptive layer and calibration output, the accurate capture of complex spectral noise is achieved. (3) Model training: A training strategy combining composite loss function, robust normalization and residual learning framework is adopted to ensure the stability of model training and the reliability of prediction. (4) Prediction and post-processing: The preprocessed spectrum signal is input into the trained ImprovedCNNNoisePredictor model to obtain the initial background noise prediction value; the Savitzky-Golay filter is integrated for secondary smoothing, and the final output is a smooth and natural background noise prediction result that conforms to physical laws.
2. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation as described in claim 1, characterized in that, In step (1), data preprocessing specifically includes three major modules: adaptive data standardization, adaptive interpolation, and multidimensional data augmentation, as follows: (11) Adaptive data standardization: A robust standardization method based on the median and interquartile range is adopted, and its mathematical model is as follows: X_norm = clip((X - μ_med) / (IQR + ε), -5, 5) Where X is the original input N-dimensional spectral signal vector; μ_med is the median of X; IQR is the difference between the 75th percentile and the 25th percentile; ε is a small constant, 1e-8; clip(·, -5, 5) is the truncation function; and X_norm is the output spectral signal vector after adaptive robust normalization, whose numerical range is constrained to the interval [-5, 5]. (12) Adaptive interpolation: Construct a mapping function f: [0, 1] -> R, and achieve data dimension unification through linear interpolation sampling. The mathematical model is as follows: X_model[i] = f(i / (L_model - 1)) Where L_model is the preset input dimension of the model, the value of i ranges from 0 to L_model - 1, f is a linear interpolation function constructed based on the original spectrum data, and its domain is the normalized interval [0, 1]; X_model[i] is the spectrum signal value at the i-th position in the output sequence after adaptive linear interpolation processing, and the dimension of the sequence is unified as L_model; (13) Multidimensional data augmentation: Four signal transformation methods are introduced, namely random Gaussian noise superposition, Gaussian blur smoothing, random amplitude scaling, and random cyclic shift, and the diversity of training data is expanded by random combination of these methods.
3. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 2, characterized in that, In step (13), specifically: (131) Random Gaussian noise superposition: Random noise with a mean of 0 and a dynamically changing variance is superimposed on the original spectrum to simulate the thermal noise fluctuations during receiver operation; the mathematical model is X_aug = X + N(0, σ_noise), where σ_noise = noise_level·strength; X_aug is the enhanced spectrum signal vector after superimposing random Gaussian noise, which is the output of this data enhancement operation; N(0, σ_noise) represents the Gaussian noise distribution with a mean of 0 and a variance of σ_noise, used to simulate the thermal noise fluctuations of the receiver; σ_noise is the variance of Gaussian noise, which determines the noise intensity, and is calculated by noise_level and strength; noise_level is the preset base noise intensity coefficient, used to control the reference amplitude of the noise; strength is the dynamic adjustment coefficient, which is randomly selected in the range of [0.05, 0.2] to realize the dynamic change of noise intensity; (132) Gaussian blur smoothing: A one-dimensional Gaussian kernel is used to convolve and smooth the spectral data to simulate the effect of the signal after passing through filters with different bandwidths; the standard deviation of the Gaussian kernel σ is randomly and uniformly sampled in the range of [0.5, 1.5]. (133) Random amplitude scaling: Multiplying the amplitude of the full-band signal by a random scaling factor s to simulate a small drift in the receiver gain; (134) Random cyclic shift: The spectrum data is shifted slightly along the frequency axis. The maximum shift amount max_shift is set to 2% of the spectrum length, and the random shift amount shift∈[-max_shift, max_shift]; (135) During model training, a probabilistic implementation strategy is adopted: random Gaussian noise superposition is applied with a 40% probability, Gaussian blur smoothing is applied with a 30% probability, random amplitude scaling is applied with a 20% probability, and random cyclic shift is applied with a 10% probability, to ensure that each enhancement method can be fully trained.
4. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, In step (2), specifically for the multi-scale feature extraction layer, the process is as follows: To simultaneously capture the microscopic details, mesoscale structure, and macroscopic trends of the spectral signal, this layer employs three parallel convolutional branches, each configured with different kernel sizes and dilation rates. The output features of each branch are processed by batch normalization (BN) and GELU activation functions, then concatenated along the channel dimension, and finally fused and dimensionality reduced using a 1×1 convolutional layer. The mathematical model is as follows: O_k = GELU(BN(Conv1d_k_d(F_in))); O_fused = Conv1d_1x1(Concat(O_1, O_2, O_3)). Wherein, F_in is the input feature map; O_k is the output feature of the k-th convolutional branch; Conv1d_k_d represents a one-dimensional convolutional operation with kernel size k and dilation rate d; Concat(O_1, O_2, O_3) is a channel-dimensional concatenation operation that merges the output features O_1, O_2, and O_3 of the three convolutional branches along the channel dimension to obtain the concatenated result of multi-scale features; O_fused is the output feature map after 1×1 convolution fusion, which integrates spectral signal features of different scales and can simultaneously reflect microscopic details, mesoscale structures, and macroscopic trends.
5. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, In step (2), for the improved residual attention block, this module integrates the Squeeze-and-Excitation channel attention mechanism with the residual connection, specifically as follows: (a) Channel Attention Mechanism: For the input feature map U (C×L dimension, where C is the number of channels and L is the feature length), the channel descriptors are first extracted through compression: the global average pooling z_avg and global max pooling z_max are calculated, where z_avg_c = (1 / L)·sum(U_c,i) and z_max_c = max(U_c,i); then, the shared multilayer perceptron (MLP) is used to learn the importance weights of each channel W = σ(MLP(z_avg)) + σ(MLP(z_max)), where σ is the sigmoid activation function; finally, through recalibration, the weights are multiplied by the original feature map to obtain U_tilde_c = W_c·U_c; where U_c is the feature vector of the c-th channel of the input feature map U, with dimension L (feature length); i is the index of the feature vector U_c, with a value range of [0, L-1], used to traverse all feature points of the channel; sum(U_c,i) is the summation of the values at all positions (index i) of the feature vector U_c of the c-th channel, which is the basic calculation of global average pooling; max(U_c,i) is the maximum value of the values at all positions (index i) of the feature vector U_c of the c-th channel, which is the calculation method of global max pooling; z_avg_c is the global average pooling value of the c-th channel, which is obtained by summing the values at all positions of U_c and taking the average, reflecting the overall activation intensity of the channel; z_max_c is the global max pooling value of the c-th channel, which is the maximum over all positions of U_c; z_avg is the output vector of global average pooling, with dimension C (number of channels), which is formed by concatenating z_avg_c of all channels in channel order; z_max is the output vector of global max pooling, with dimension C (number of channels), which is formed by concatenating z_max_c of all channels in channel order; W_c is the weight coefficient of the c-th channel, which is calculated by the shared multilayer perceptron (MLP) and sigmoid activation function, reflecting the importance of the channel; U_tilde_c is the output feature vector of the c-th channel after weight recalibration, which is obtained by multiplying the original channel feature U_c with the weight W_c point by point, thereby enhancing the key channel features. (b) Residual connection: Introducing a Shortcut path to directly superimpose the input features onto the features processed by the attention mechanism.
6. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, in step (2), the domain adaptation layer is designed with a dual-branch structure of general feature transformation and site-specific transformation, specifically as follows: (i) General feature transformation branch: a multilayer perceptron with shared weights is used to extract the common features of the spectrum signals of all sites; (ii) Site-specific transformation branch: a dedicated multilayer perceptron is configured for each monitoring site to extract the personalized features of that site; (iii) Weighted fusion: the final output is obtained by weighted summation of the general features and the site-specific features. The mathematical model is h_out = h_common + λ·h_site, where h_common is the general feature, h_site is the site-specific feature, λ is the fusion weight coefficient, and h_out is the final output feature after weighted fusion.
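A minimal sketch of the dual-branch fusion h_out = h_common + λ·h_site follows. To keep it short, each branch is reduced to a single linear layer with ReLU rather than a full multilayer perceptron; the class name, the site identifiers, and all weights are hypothetical.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

class DomainAdaptationLayer:
    """Shared branch for all sites plus one dedicated branch per site."""

    def __init__(self, common_params, site_params, lam):
        self.common_params = common_params   # (weights, bias) shared by all sites
        self.site_params = site_params       # site_id -> (weights, bias)
        self.lam = lam                       # fusion weight coefficient (lambda)

    def forward(self, h, site_id):
        h_common = relu(linear(h, *self.common_params))
        h_site = relu(linear(h, *self.site_params[site_id]))
        return [c + self.lam * s for c, s in zip(h_common, h_site)]

layer = DomainAdaptationLayer(
    common_params=([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    site_params={
        "site_a": ([[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0]),
        "site_b": ([[0.0, 1.0], [1.0, 0.0]], [0.1, 0.1]),
    },
    lam=0.5,
)
out_a = layer.forward([1.0, 2.0], "site_a")
out_b = layer.forward([1.0, 2.0], "site_b")
```

The same input yields different outputs for the two sites because only the site-specific branch changes; the shared branch contributes the same h_common in both cases.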
7. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, in step (2), the calibration output module includes two parts, global linear calibration and site-specific calibration, specifically: (A) Global linear calibration: the initial predicted value y_hat_raw output by the model backbone is linearly adjusted by a learnable global scaling factor α and a global bias term β to obtain y_hat_global = α·y_hat_raw + β; y_hat_global is the output predicted value after global linear calibration, which adapts the initial predicted value to the overall distribution of the full spectrum noise data through scaling and bias adjustment. (B) Site-specific calibration: a two-dimensional learnable embedding vector E_s is assigned to each site. A site-specific scaling factor is calculated as scale_s = 0.4·σ(E_s,0) + 0.8, and a site-specific bias correction is calculated as bias_s = 3.0·tanh(E_s,1). The final calibrated predicted value y_hat_final = y_hat_global·scale_s + bias_s is obtained, which compensates for the systematic bias of a single site. Here, scale_s is the site-specific scaling factor, calculated from the 0th dimension of the site embedding vector E_s and used to scale the globally calibrated predicted value at the site level to adapt to the amplitude characteristics of a single site; σ() is the sigmoid activation function, which maps the input to the (0, 1) interval and is used to constrain the range of the site-specific scaling factor; bias_s is the site-specific bias correction, calculated from the 1st dimension of the site embedding vector E_s and used to perform site-level bias compensation on the globally calibrated predicted value to correct the systematic bias of a single site; tanh is the hyperbolic tangent activation function, which maps the input to the (-1, 1) interval and is used to constrain the fluctuation range of the site-specific bias correction; y_hat_final is the final output predicted value after global linear calibration and site-specific calibration.
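The two calibration stages can be written out directly from the formulas in this claim. The sketch below is illustrative only: the function name and the sample values are hypothetical, and in the claimed method α, β and E_s are learned rather than set by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibrate(y_hat_raw, alpha, beta, E_s):
    """Apply global linear calibration, then site-specific calibration.

    E_s is the two-dimensional site embedding: E_s[0] drives the scale,
    E_s[1] drives the bias.
    """
    scale_s = 0.4 * sigmoid(E_s[0]) + 0.8   # constrained to (0.8, 1.2)
    bias_s = 3.0 * math.tanh(E_s[1])        # constrained to (-3.0, 3.0)
    y_hat_final = [(alpha * y + beta) * scale_s + bias_s for y in y_hat_raw]
    return y_hat_final, scale_s, bias_s

# A zero embedding gives scale_s = 1 and bias_s = 0, i.e. no site correction.
neutral, scale_0, bias_0 = calibrate([-100.0, -98.0], alpha=1.0, beta=0.0, E_s=[0.0, 0.0])

# A non-zero (hypothetical) embedding shifts and rescales the prediction.
shifted, scale_1, bias_1 = calibrate([-100.0, -98.0], alpha=1.0, beta=0.0, E_s=[2.0, -0.5])
```

The constants 0.4, 0.8 and 3.0 bound the site correction: the scale stays within 20% of unity and the bias within 3 units of the prediction scale, so a single site cannot override the global calibration.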
8. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, in step (3), the composite loss function is specifically as follows: the total loss is composed of multiple weighted components, taking into account prediction accuracy, curve smoothness, and frequency-domain consistency. The mathematical model is: L_total = w1·L_MSE + w2·L_L1 + w3·L_Huber + w4·L_TV + w5·L_Cons + w6·L_Centroid; where: L_MSE is the mean squared error loss, which calculates the mean of the squared differences between the predicted and actual values, measures the overall accuracy of the prediction results, and penalizes larger errors more severely; L_L1 is the absolute error loss, which calculates the mean of the absolute differences between the predicted and actual values, and is more robust to outliers than MSE, with a more uniform error penalty; L_Huber is the Huber robust loss, which combines the advantages of the MSE and L1 losses, behaving like MSE when the error is small and like L1 when the error is large, thus combining accuracy and robustness; L_TV is the total variation regularization loss, which constrains the smoothness of the prediction curve by calculating the differences between predicted values at adjacent frequency points, reducing sharp fluctuations of the curve; its mathematical model is L_TV = (1 / N)·sum(|y_(i+1) - y_i|), where N is the total number of frequency points; L_Cons is the consistency loss, used to constrain the consistency of the prediction results in the frequency domain or time domain, ensuring that the noise prediction results of different frequency bands or time points conform to physical laws; L_Centroid is the frequency-domain spectral centroid loss, which constrains the energy center of the predicted spectrum to agree with that of the true spectrum in the frequency domain, thereby improving the accuracy of frequency-domain prediction; the spectral centroids of the predicted curve and of the true label are calculated by FFT, and L_Centroid = (C_pred - C_true)^2, where C is the spectral centroid; w1-w6 are the weight coefficients of the loss components; sum() is the summation function, used in the total variation regularization loss to sum the absolute values of the differences between adjacent predicted values over all frequency points, reflecting the overall fluctuation of the prediction curve; y_(i+1) is the noise prediction value at the (i+1)-th frequency point; y_i is the noise prediction value at the i-th frequency point; C_pred is the spectral centroid of the predicted spectrum curve, calculated from the FFT of the predicted values and reflecting the energy center location of the predicted spectrum; C_true is the spectral centroid of the true spectrum label, calculated from the FFT of the true label and serving as the reference for frequency-domain consistency.
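The weighted sum L_total can be sketched as follows. Several details are assumptions, since the claim does not fix them: the Huber threshold delta = 1.0, the centroid definition (magnitude-weighted mean of the non-negative frequency bin indices), and the treatment of L_Cons as a precomputed value passed in by the caller. A naive DFT is used in place of an FFT library so the example stays self-contained.

```python
import cmath
import math

def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def l1(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def huber(pred, true, delta=1.0):
    """Quadratic below delta, linear above (delta is an assumed value)."""
    total = 0.0
    for p, t in zip(pred, true):
        e = abs(p - t)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(pred)

def total_variation(pred):
    """L_TV = (1 / N) * sum(|y_(i+1) - y_i|), with N the number of points."""
    return sum(abs(pred[i + 1] - pred[i]) for i in range(len(pred) - 1)) / len(pred)

def spectral_centroid(y):
    """Assumed definition: magnitude-weighted mean bin index of the DFT."""
    n = len(y)
    mags = [abs(sum(y[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total > 0 else 0.0

def composite_loss(pred, true, weights, l_cons=0.0):
    """weights = (w1, ..., w6); l_cons is supplied by the caller (assumption)."""
    l_centroid = (spectral_centroid(pred) - spectral_centroid(true)) ** 2
    parts = (mse(pred, true), l1(pred, true), huber(pred, true),
             total_variation(pred), l_cons, l_centroid)
    return sum(w * p for w, p in zip(weights, parts))

true = [-100.0, -99.5, -99.0, -99.5, -100.0, -100.5, -101.0, -100.5]
pred = [t + 0.3 for t in true]
loss = composite_loss(pred, true, weights=(1.0, 0.5, 0.5, 0.1, 0.0, 0.1))
```

Note that L_TV depends only on the prediction, so it remains non-zero even for a perfect prediction of a non-constant baseline; its weight w4 therefore trades smoothness against fidelity.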
9. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, in step (3), robust normalization is specifically as follows: before calculating the loss, batch-level dynamic normalization is performed on the predicted value Y_hat and the true value Y. The median m = median(Y) and the scaling factor s = max(IQR(Y), std(Y)) of the target values of the current batch are calculated, and the normalization is: Y_hat_norm = (Y_hat - m) / s, Y_norm = (Y - m) / s. The loss function is calculated on the normalized Y_hat_norm and Y_norm. Here, median(Y) is the median of the true value Y in the current batch, which serves as the center for robust normalization; it resists the interference of outliers better than the mean, ensuring the stability of normalization. IQR(Y) is the interquartile range (the difference between the 75th percentile and the 25th percentile) of the true value Y in the current batch, which reflects the dispersion of the data and is a robust dispersion index. std(Y) is the standard deviation of the true value Y in the current batch, which reflects the overall volatility of the data and is a traditional dispersion index. max(IQR(Y), std(Y)) takes the larger of the interquartile range and the standard deviation as the scaling factor s, combining robustness with adaptability to overall volatility and avoiding scaling distortion caused by outliers. Y_hat_norm is the predicted value Y_hat after robust normalization, which eliminates scale differences within the batch and improves the stability of loss calculation. Y_norm is the true value Y after robust normalization; it is on the same scale as Y_hat_norm, ensuring consistency and comparability in the calculation of the loss function.
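The batch-level normalization can be sketched with the Python standard library. The choices of quartile method ("inclusive"), of the population standard deviation, and of a small floor eps to avoid division by zero on a constant batch are assumptions; the claim specifies only median, IQR and std.

```python
import statistics

def robust_normalize(y_hat, y, eps=1e-8):
    """Normalize prediction and target with statistics of the target batch only."""
    m = statistics.median(y)
    q1, _, q3 = statistics.quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    s = max(iqr, statistics.pstdev(y), eps)   # eps floor is an added safeguard
    y_hat_norm = [(v - m) / s for v in y_hat]
    y_norm = [(v - m) / s for v in y]
    return y_hat_norm, y_norm, m, s

# Hypothetical batch of noise-floor targets with one outlier.
y = [-100.0, -99.0, -98.0, -97.0, -60.0]
y_hat = [v + 1.0 for v in y]
y_hat_norm, y_norm, m, s = robust_normalize(y_hat, y)
```

Both vectors are shifted and scaled by the same m and s, so the normalized error is simply the raw error divided by s; in this batch the outlier inflates the standard deviation above the interquartile range, and max() therefore selects the standard deviation.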
10. The spectral background noise prediction method based on multi-scale attention mechanism and domain adaptation according to claim 1, characterized in that, in step (3), the residual learning framework is specifically as follows: a traditional envelope tracking algorithm is used to calculate a heuristic noise baseline, which is passed through a convolution and fused additively with the multi-scale features to inject physical priors; a "baseline + residual" output scheme is adopted to improve the fitting ability in regions of broadband elevation and gradual change.
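The "baseline + residual" output can be illustrated with a short sketch. The claim does not name a specific envelope tracking algorithm, so a sliding minimum followed by a sliding mean is used here purely as a stand-in; the window size and the residual_model callable (which represents the network's learned correction) are likewise hypothetical.

```python
def heuristic_baseline(spectrum, window=5):
    """Stand-in lower-envelope tracker: sliding minimum, then sliding mean."""
    n, half = len(spectrum), window // 2
    lows = [min(spectrum[max(0, i - half): i + half + 1]) for i in range(n)]
    baseline = []
    for i in range(n):
        segment = lows[max(0, i - half): i + half + 1]
        baseline.append(sum(segment) / len(segment))
    return baseline

def predict_noise_floor(spectrum, residual_model, window=5):
    """Final output = heuristic baseline + learned residual."""
    baseline = heuristic_baseline(spectrum, window)
    residual = residual_model(spectrum, baseline)
    return [b + r for b, r in zip(baseline, residual)]

# Flat noise floor at -100 with one narrowband signal peak at index 5.
spectrum = [-100.0] * 11
spectrum[5] = -60.0
baseline = heuristic_baseline(spectrum)

# Placeholder residual model returning a constant correction of +0.5.
prediction = predict_noise_floor(spectrum, lambda spec, base: [0.5] * len(spec))
```

In this example the narrowband peak is narrower than the window, so the stand-in baseline ignores it and the model only has to learn a small correction on top of the prior.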