Radio frequency interference multi-scale target detection method and system based on semi-supervised deep learning
By constructing the PA-AllSpark network model and utilizing prior feature extraction and multi-scale semantic feature fusion, the stability problem of boundary-sensitive regions and complex structural regions in radio frequency interference detection under low-labeling conditions is solved, and efficient radio frequency interference target detection is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- KUNMING UNIV OF SCI & TECH
- Filing Date
- 2026-03-22
- Publication Date
- 2026-06-19
AI Technical Summary
Under low-labeling conditions, existing semi-supervised methods characterize boundary-sensitive regions and complex structural regions poorly in radio frequency interference identification and detection, and pseudo-label noise tends to accumulate, degrading the accuracy and stability of detection results.
A PA-AllSpark network model is constructed, including a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder. Combined with a semi-supervised training strategy, the stability of radio frequency interference target detection is improved through prior feature extraction, multi-scale semantic feature fusion, and boundary region structure refinement.
Under low-annotation conditions, the accuracy of radio frequency interference target detection and boundary characterization ability are improved, the dependence on large-scale pixel-level labeled data is reduced, and the detection stability of the model in boundary-sensitive areas and complex structural areas is enhanced.
Figure CN122244420A_ABST
Description
Technical Field
[0001] This invention relates to a method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning, belonging to the field of image processing based on computer vision.
Background Technology
[0002] Radio astronomy acquires astronomical information by observing and analyzing radio emission signals from celestial bodies. However, in actual radio observations, observation equipment is inevitably affected by radio frequency interference (RFI) from communication systems, radar equipment, satellite links, and other man-made wireless sources. This RFI contaminates the valid astronomical signals, leading to phenomena such as increased noise, signal distortion, abnormal fringes, or localized anomalous structures in astronomical time-frequency images. This reduces the quality of observational data and affects the accuracy of subsequent data processing, analysis, and scientific results. Therefore, effectively identifying and detecting RFI target regions in astronomical time-frequency images has become one of the key tasks in radio astronomy data processing.
[0003] Existing methods for radio frequency interference (RFI) identification and detection mainly include threshold detection, subspace decomposition, wavelet transform, time-frequency analysis, machine learning, and deep learning. With the development of deep learning, RFI identification has gradually been modeled as the problem of detecting interference target regions in astronomical time-frequency images, typically achieved through pixel-level segmentation. To reduce the cost of manual pixel-level annotation, semi-supervised learning methods have been introduced into this task, training on a small number of labeled samples together with a large number of unlabeled samples to reduce dependence on large-scale labeled data. However, existing semi-supervised methods still have room for improvement in representing boundary-sensitive regions and complex structural regions under low annotation ratios. Furthermore, pseudo-label noise tends to accumulate and propagate in complex structural regions, affecting the accuracy and stability of the detection results. Currently, there is a lack of RFI target detection methods that can simultaneously improve boundary stability and robustness to complex structural regions under low annotation conditions.
[0004] In view of this, the present invention is hereby proposed.
Summary of the Invention
[0005] This invention provides a method and system for multi-scale target detection of radio frequency interference based on semi-supervised deep learning. It constructs a PA-AllSpark network model consisting of a prior feature extraction module, a two-stage prior guided fusion structure, an encoder, and a MSEM-BGFM hierarchical decoder. The model is trained using a semi-supervised training strategy to achieve the detection of radio frequency interference target regions in astronomical time-frequency images under low-labeling conditions. The output is a radio frequency interference mask image with the same resolution as the input astronomical time-frequency image, and the detection stability of boundary-sensitive regions and complex structure regions is improved.
[0006] The technical solution of this invention is:
[0007] According to a first aspect of the present invention, a method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning is provided, comprising: acquiring astronomical time-frequency images; constructing a PA-AllSpark network model, wherein the PA-AllSpark network model includes a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder; training the PA-AllSpark network model using a semi-supervised training strategy to obtain a trained PA-AllSpark network model; and inputting the astronomical time-frequency images into the trained PA-AllSpark network model, wherein the trained PA-AllSpark network model extracts, through the prior feature extraction module, a prior feature tensor characterizing the statistical features and morphological patterns of radio frequency interference; performs early enhancement and refinement on the prior feature tensor using the two-stage prior-guided fusion structure to obtain early prior-guided output features; uses the early prior-guided output features as input to the encoder, which extracts multi-scale semantic features; performs feature fusion, spatial resolution restoration, and boundary region structure refinement on the multi-scale semantic features using the MSEM-BGFM hierarchical decoder to obtain the output features of the MSEM-BGFM hierarchical decoder; introduces the output features of the MSEM-BGFM hierarchical decoder and performs late enhancement and refinement on the prior feature tensor using the two-stage prior-guided fusion structure to obtain late prior-guided output features; and obtains a radio frequency interference mask image based on the late prior-guided output features.
[0008] Preferably, the prior feature extraction module is used to extract prior feature tensors from the input astronomical time-frequency image that can characterize the statistical features and morphological patterns of radio frequency interference.
[0009] Preferably, the extraction of the prior feature tensor characterizing the statistical features and morphological patterns of radio frequency interference specifically involves:
[0010] Construct intensity anomaly prior values;
[0011] The Sobel operator is used to calculate the gradient response in the time direction and the gradient response in the frequency direction. Based on the gradient response in the time direction and the gradient response in the frequency direction, the gradient magnitude response is calculated. The absolute values of the gradient magnitude response, the gradient response in the time direction, and the gradient response in the frequency direction are normalized to obtain the prior values of the gradient magnitude, the gradient in the time direction, and the gradient in the frequency direction.
[0012] Construct local contrast prior values;
[0013] Based on the intensity anomaly, gradient magnitude, time-direction gradient, frequency-direction gradient, and local contrast prior values calculated at all pixel locations in the astronomical time-frequency image, the corresponding intensity anomaly prior feature map, gradient magnitude prior feature map, time-direction gradient prior feature map, frequency-direction gradient prior feature map, and local contrast prior feature map are constructed; these five prior feature maps are concatenated to obtain the prior feature tensor.
[0014] Preferably, the two-stage prior guidance fusion structure includes an early prior guidance module and a late prior guidance module.
[0015] Preferably, the early prior guidance module specifically comprises:
[0016] The prior feature tensor $P$ is input into the first mapping function $f_1$ to obtain the early guidance weights $W_e$:
[0017] $$W_e = \sigma(f_1(P))$$;
[0018] where $\sigma$ is the Sigmoid function;
[0019] Using the early guidance weights $W_e$, element-wise modulation is performed on the original input astronomical time-frequency image $I$ to obtain the early prior-guided output features $X_e$:
[0020] $$X_e = W_e \odot I$$;
[0021] where $\odot$ denotes element-wise multiplication.
[0022] Preferably, the later prior guidance module specifically comprises:
[0023] The prior feature tensor $P$ is input into the second mapping function $f_2$ to obtain the late guidance weights $W_l$, where $\sigma$ is the Sigmoid function:
[0024] $$W_l = \sigma(f_2(P))$$;
[0025] A preset weighting coefficient is introduced, and the late guidance weights $W_l$ are used to perform locally weighted, element-wise modulation on the output features of the MSEM-BGFM hierarchical decoder to obtain the late prior-guided features:
[0026] ;
[0027] where the weighting coefficient in the formula is the preset weighting coefficient.
[0028] Preferably, the multi-scale semantic features output by the encoder are passed through linear projection layers to obtain the corresponding projected features, and the high-level feature among them is passed through the AllSpark module, a linear projection layer, and an ECA channel attention module in sequence to obtain the high-level enhanced features; the MSEM-BGFM hierarchical decoder takes the projected features and the high-level enhanced features as input for stepwise recovery and structural refinement. The decoding process of the MSEM-BGFM hierarchical decoder is as follows: spatial resolution is gradually recovered starting from the high-level enhanced features; the high-level enhanced features and the corresponding-scale features from the encoder side are input to the first multi-scale enhancement module MSEM_1, which performs feature fusion and multi-scale enhancement; the output of MSEM_1 and the corresponding-scale features from the encoder side are then input to the second multi-scale enhancement module MSEM_2, which further performs feature fusion and multi-scale enhancement; the output of MSEM_2 and the corresponding-scale features from the encoder side are input to an upsampling fusion module to obtain high-resolution fused features; the high-resolution fused features and the boundary response map B_map, obtained by passing the shallow features c1 through the Boundary Extractor, are then input to the boundary-guided fusion module BGFM, which yields the decoded end features refined by the boundary region structure as the output features of the MSEM-BGFM hierarchical decoder.
[0029] Preferably, the first multi-scale enhancement module MSEM_1 and the second multi-scale enhancement module MSEM_2 have the same structure. Taking MSEM_1 as an example, the specific steps are as follows: the high-level enhanced features are upsampled using bilinear interpolation to obtain the first enhanced feature; simultaneously, the other input feature is passed through the ECA channel attention module, global average pooling (GAP), and one-dimensional convolution to obtain the second enhanced feature. The first and second enhanced features are then concatenated along the channel dimension, and a preliminary fused feature is obtained through a 3×3 convolution, a normalization layer, and a ReLU activation layer. Channel weights are generated through a context gating mechanism and multiplied element-wise with the preliminary fused feature. Finally, multi-scale features are extracted through three 3×3 convolution branches with different dilation rates, concatenated along the channel dimension, and passed through a 1×1 convolution to obtain the output.
[0030] Preferably, the boundary-guided fusion module BGFM takes as input the high-resolution fused feature x output by the upsampling fusion module and the boundary response map B_map output by the Boundary Extractor; the boundary response map B_map is mapped to boundary-guided weights through a boundary attention branch; the high-resolution fused feature x is then modulated by these weights using element-wise multiplication; a refinement module then refines the local structure to obtain the refined features; finally, a residual connection is applied to obtain the decoded end features refined by the boundary region structure.
[0031] According to a second aspect of the present invention, a multi-scale target detection system for radio frequency interference based on semi-supervised deep learning is provided, comprising modules for carrying out any of the methods described above.
[0032] The beneficial effects of this invention are:
[0033] Compared with existing technologies, the PA-AllSpark network model of this invention is composed of a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder consisting of a multi-scale enhancement module and a boundary-guided fusion module. This architecture enhances the model's ability to characterize radio frequency interference-related regions under low-labeling conditions, improves the detection stability of boundary-sensitive regions and complex structural regions, and alleviates the problem of blurred boundaries of interference targets. At the same time, this invention adopts a semi-supervised training method, which reduces the dependence on large-scale pixel-level labeled data, thereby improving the accuracy of radio frequency interference target detection and boundary characterization ability under low-labeling conditions, and has good engineering application value.
Attached Figure Description
[0034] Figure 1 is an overall flowchart of the multi-scale target detection method for radio frequency interference based on semi-supervised deep learning according to the present invention.
[0035] Figure 2 is a diagram showing the overall structure of the PA-AllSpark network model based on semi-supervised deep learning proposed in this invention.
[0036] Figure 3 is a visualization of the responses of different prior feature maps on the same astronomical time-frequency image sample in this invention.
[0037] Figure 4 is a structural diagram of the multi-scale enhancement module MSEM of the present invention.
[0038] Figure 5 is a structural diagram of the Boundary Guided Fusion Module (BGFM) of this invention.
[0039] Figure 6 is a sample real astronomical time-frequency image used in this invention.
[0040] Figure 7 is the ground-truth mask label image created based on Figure 6.
[0041] Figure 8 is the recognition result image of the comparative example AllSpark on Figure 6.
[0042] Figure 9 is the recognition result image of the comparative example AllSpark-UNet on Figure 6.
[0043] Figure 10 is the recognition result image of PA-AllSpark, an embodiment of this invention, on Figure 6.
Detailed Implementation
[0044] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention. It should be noted that, unless otherwise specified, the embodiments and features in the embodiments of this application can be arbitrarily combined with each other.
[0045] Example 1: As shown in Figures 1-10, according to a first aspect of the present invention, a method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning is provided, comprising:
[0046] Acquire astronomical time-frequency images;
[0047] A PA-AllSpark network model is constructed, comprising a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder. The PA-AllSpark network model is trained using a semi-supervised training strategy to obtain a trained PA-AllSpark network model. Astronomical time-frequency images are input into the trained PA-AllSpark network model. The trained PA-AllSpark network model extracts prior feature tensors that characterize the statistical features and morphological patterns of radio frequency interference through the prior feature extraction module, and uses the two-stage prior-guided fusion structure to perform early enhancement and refinement on the prior feature tensor to obtain early prior-guided output features. These early prior-guided output features are used as input to the encoder, which extracts multi-scale semantic features. The MSEM-BGFM hierarchical decoder then performs feature fusion, spatial resolution restoration, and boundary region structure refinement on these multi-scale semantic features to obtain the output features of the MSEM-BGFM hierarchical decoder. The output features of the MSEM-BGFM hierarchical decoder are then introduced, and the two-stage prior-guided fusion structure is used to perform late enhancement and refinement on the prior feature tensor to obtain late prior-guided output features. Based on these late prior-guided output features, a radio frequency interference mask image is obtained. Specifically, obtaining the radio frequency interference mask image based on the late prior-guided output features involves the following: a Dropout operation is first applied to the late prior-guided features to reduce the model's over-reliance on local feature responses. The features are then fed into a segmentation prediction head consisting of a 1×1 convolution, which maps the high-dimensional features into pixel-level classification score maps. After that, bilinear interpolation upsampling is used to restore the pixel-level score maps to the original input image resolution, and pixel-wise argmax is used for class selection, finally obtaining the radio frequency interference mask image. The categories include interference and non-interference.
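The prediction steps above (1×1 convolution head, bilinear upsampling, pixel-wise argmax) can be illustrated with a small NumPy sketch. It is not the implementation of the invention: the function names, the corner-aligned sampling grid, and the toy weights are assumptions for illustration, and Dropout is omitted because it acts as the identity at inference time.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array on a corner-aligned grid (assumed convention)."""
    _, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    rows0 = x[:, y0, :]
    rows1 = x[:, y1, :]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def predict_mask(features, head_weight, head_bias, out_hw):
    """features: (C, h, w); head_weight: (K, C); head_bias: (K,). Returns an (H, W) class map."""
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    scores = np.einsum("kc,chw->khw", head_weight, features) + head_bias[:, None, None]
    scores = bilinear_resize(scores, *out_hw)
    # Class 0 = non-interference, class 1 = interference (label order is an assumption).
    return scores.argmax(axis=0)

# Toy example: one feature channel whose sign separates the two classes.
features = np.array([[[-1.0, -1.0], [1.0, 1.0]]])
head_weight = np.array([[-1.0], [1.0]])
head_bias = np.zeros(2)
mask = predict_mask(features, head_weight, head_bias, (4, 4))
```

Upsampling the score maps before taking the argmax, rather than upsampling a hard mask, keeps the class boundary at sub-pixel positions of the low-resolution grid.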
[0048] Furthermore, the semi-supervised training strategy is as follows: Construct an astronomical time-frequency image dataset and divide the dataset into a training set, a validation set, and a test set. The training set includes labeled samples and unlabeled samples. The labeled samples provide radio frequency interference mask labels corresponding to the astronomical time-frequency images, and the unlabeled samples only contain the original astronomical time-frequency images. The labeled samples and unlabeled samples in the training set are divided into ratios of 1:4, 1:8, and 1:16, respectively. In the semi-supervised training process, the PA-AllSpark network model is first used to perform forward prediction on unlabeled samples to obtain the corresponding pixel-level classification score map, and pseudo-labels are generated by pixel-by-pixel argmax. Then, labeled and unlabeled samples are merged into a unified batch and input into the PA-AllSpark network model to obtain the corresponding segmentation prediction results. For labeled samples, supervised loss is calculated based on the real mask labels; for unlabeled samples, unsupervised loss is calculated using the pseudo-labels; and boundary auxiliary loss is calculated using the boundary prediction results output by the model (the boundary prediction results are single-channel boundary outputs obtained by passing the decoder's end features through a 1×1 convolutional boundary prediction head, mainly used to calculate boundary auxiliary loss during training). These losses collectively participate in model parameter updates. After iterative training, a trained PA-AllSpark network model is obtained. The validation set is used during training to periodically evaluate model performance and select the optimal model. The test set / astronomical time-frequency images to be predicted are input into the trained PA-AllSpark network model to obtain the corresponding radio frequency interference mask images.
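The loss composition described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the loss weights `w_unsup` and `w_boundary`, the use of cross-entropy and binary cross-entropy, and the way the boundary target is derived from the mask (`boundary_target_from_mask`) are hypothetical choices that the text does not specify.

```python
import numpy as np

def cross_entropy(scores, labels):
    """Mean pixel-wise cross-entropy. scores: (K, H, W); labels: (H, W) integer classes."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[None], axis=0)
    return -picked.mean()

def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy on single-channel boundary logits."""
    probs = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-7, 1 - 1e-7)
    return -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)).mean()

def boundary_target_from_mask(mask):
    """Hypothetical boundary target: pixels with a differently labeled 4-neighbour."""
    p = np.pad(mask, 1, mode="edge")
    c = p[1:-1, 1:-1]
    diff = (p[:-2, 1:-1] != c) | (p[2:, 1:-1] != c) | (p[1:-1, :-2] != c) | (p[1:-1, 2:] != c)
    return diff.astype(float)

def semi_supervised_loss(scores_l, mask_l, scores_u_first, scores_u,
                         boundary_logits_l, w_unsup=1.0, w_boundary=0.5):
    # Pseudo-labels come from a first forward pass on the unlabeled sample.
    pseudo = scores_u_first.argmax(axis=0)
    loss_sup = cross_entropy(scores_l, mask_l)        # labeled: real mask labels
    loss_unsup = cross_entropy(scores_u, pseudo)      # unlabeled: pseudo-labels
    loss_bnd = binary_cross_entropy(boundary_logits_l, boundary_target_from_mask(mask_l))
    return loss_sup + w_unsup * loss_unsup + w_boundary * loss_bnd

rng = np.random.default_rng(0)
mask_l = np.tile(np.array([0, 0, 1, 1]), (4, 1))
total = semi_supervised_loss(
    scores_l=rng.normal(size=(2, 4, 4)),
    mask_l=mask_l,
    scores_u_first=rng.normal(size=(2, 4, 4)),
    scores_u=rng.normal(size=(2, 4, 4)),
    boundary_logits_l=rng.normal(size=(4, 4)),
)
```

In a real training loop these terms would be computed on batches and backpropagated through the network; here the score maps are random arrays so that the arithmetic can be run in isolation.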
[0049] Furthermore, the prior feature extraction module is used to extract prior feature tensors from the input two-dimensional static astronomical time-frequency image that can characterize the statistical features and morphological patterns of radio frequency interference.
[0050] Furthermore, the extraction of the prior feature tensor characterizing the statistical features and morphological patterns of radio frequency interference specifically involves:
[0051] To address the characteristic that radio frequency interference typically exhibits a stronger response relative to the local background, an intensity anomaly prior value is constructed: for the input astronomical time-frequency image, given the pixel intensity at a position, the mean and standard deviation are calculated within the local neighborhood of that position; the intensity anomaly prior value at that position is then expressed as:
[0052] ;
[0053] where the first position index denotes the pixel being calculated, the second position index denotes any pixel position used when comparing across the entire image, and the constant is a small value used to prevent numerical instability; in this embodiment of the invention, it is taken as $10^{-8}$.
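As an illustration of the intensity anomaly prior described above, the following NumPy sketch assumes a local z-score magnitude rescaled by its image-wide maximum. The window size, the reflect padding, the rescaling, and the name `intensity_anomaly_prior` are assumptions for illustration and are not taken from the invention; only the constant $10^{-8}$ comes from the text. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def intensity_anomaly_prior(image, window=7, eps=1e-8):
    """Local z-score magnitude, rescaled to [0, 1] by the global maximum (assumed form)."""
    image = np.asarray(image, dtype=np.float64)
    pad = window // 2
    padded = np.pad(image, pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))  # (H, W, window, window)
    mu = patches.mean(axis=(-2, -1))       # local mean per pixel
    sigma = patches.std(axis=(-2, -1))     # local standard deviation per pixel
    z = np.abs(image - mu) / (sigma + eps)
    return z / (z.max() + eps)

rng = np.random.default_rng(0)
tf_image = rng.normal(0.0, 1.0, size=(32, 32))
tf_image[10, :] += 8.0  # synthetic narrow-band interference along one row
prior_int = intensity_anomaly_prior(tf_image)
```

On this synthetic image the contaminated row receives a clearly larger average prior value than the background, which is the behaviour the prior is meant to capture.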
[0054] To address the extended structure of radio frequency interference along the time and frequency axes, the Sobel operator is used to calculate the gradient response in the time and frequency directions, respectively: let the pixel intensity of the input astronomical time-frequency image $I$ at position $(i,j)$ be $I(i,j)$; then the time-direction gradient response $G_t(i,j)$ and the frequency-direction gradient response $G_f(i,j)$ at position $(i,j)$ are expressed as:
[0055] $$G_t(i,j) = (I * K_t)(i,j)$$;
[0056] $$G_f(i,j) = (I * K_f)(i,j)$$;
[0057] where $K_t$ and $K_f$ are the Sobel convolution kernels along the time and frequency directions, respectively, and $*$ denotes two-dimensional convolution.
[0058] The gradient magnitude response $G(i,j)$ is calculated from the time-direction gradient response and the frequency-direction gradient response:
[0059] $$G(i,j) = \sqrt{G_t(i,j)^2 + G_f(i,j)^2}$$;
[0060] The absolute values of the gradient magnitude response, the time-direction gradient response, and the frequency-direction gradient response are normalized to obtain the gradient magnitude prior value, the time-direction gradient prior value, and the frequency-direction gradient prior value, respectively, expressed as:
[0061] ;
[0062] ;
[0063] ;
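The gradient priors can be sketched in NumPy as below. The sketch assumes that time runs along the columns (axis 1) and frequency along the rows (axis 0), that normalization divides by the image-wide maximum plus a small constant, and that borders are edge-replicated; these choices and all names are assumptions for illustration.

```python
import numpy as np

# Standard 3x3 Sobel kernels; the axis-to-direction assignment is an assumption.
K_TIME = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])  # derivative along axis 1
K_FREQ = K_TIME.T                                                            # derivative along axis 0

def filter3x3(image, kernel):
    """3x3 cross-correlation with edge-replicated padding (differs from convolution only in sign here)."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def gradient_priors(image, eps=1e-8):
    """Return normalized (magnitude, time-direction, frequency-direction) priors in [0, 1]."""
    g_t = filter3x3(image, K_TIME)
    g_f = filter3x3(image, K_FREQ)
    g_mag = np.sqrt(g_t ** 2 + g_f ** 2)

    def normalize(g):
        a = np.abs(g)
        return a / (a.max() + eps)

    return normalize(g_mag), normalize(g_t), normalize(g_f)

# A stripe that is constant along time and narrow in frequency.
stripe = np.zeros((16, 16))
stripe[8, :] = 1.0
p_mag, p_t, p_f = gradient_priors(stripe)
```

For this stripe the time-direction prior is zero everywhere, while the frequency-direction prior peaks on the rows adjacent to the stripe, which shows how the two directional priors separate narrow-band from broadband structures.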
[0064] To address the relative intensity variation between the interference region and the local background, a local contrast prior value is constructed: for the input astronomical time-frequency image, given the pixel intensity at a position, the local maximum and local minimum of pixel intensity are calculated within the local neighborhood of that position; the local contrast prior value at that position is then expressed as:
[0065] ;
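A minimal NumPy sketch of a local contrast prior follows. It assumes one plausible form, the pixel's position within its local intensity range, $(I - I_{\min}) / (I_{\max} - I_{\min} + \varepsilon)$; the exact expression, the window size, and the function name are assumptions for illustration and may differ from the formula of the invention.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_contrast_prior(image, window=5, eps=1e-8):
    """Position of each pixel within its local [min, max] range (assumed form)."""
    image = np.asarray(image, dtype=np.float64)
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")
    patches = sliding_window_view(padded, (window, window))
    local_max = patches.max(axis=(-2, -1))
    local_min = patches.min(axis=(-2, -1))
    return (image - local_min) / (local_max - local_min + eps)

# A single bright pixel on a flat background.
spot = np.zeros((9, 9))
spot[4, 4] = 5.0
contrast = local_contrast_prior(spot)
```

The small constant keeps the ratio defined in flat regions, where the local maximum and minimum coincide.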
[0066] Based on the intensity anomaly prior values, gradient magnitude prior values, time-direction gradient prior values, frequency-direction gradient prior values, and local contrast prior values calculated at all pixel locations of the astronomical time-frequency image, the corresponding intensity anomaly prior feature map, gradient magnitude prior feature map, time-direction gradient prior feature map, frequency-direction gradient prior feature map, and local contrast prior feature map are constructed; these five prior feature maps are concatenated to obtain the prior feature tensor. For example, referring to Figure 3: Figure 3(a) is a randomly selected original time-frequency image; Figure 3(b) is the intensity anomaly prior feature map obtained from Figure 3(a); Figure 3(c) is the gradient magnitude prior feature map obtained from Figure 3(a); Figure 3(d) is the frequency-direction gradient prior feature map obtained from Figure 3(a); Figure 3(e) is the time-direction gradient prior feature map obtained from Figure 3(a); Figure 3(f) is the local contrast prior feature map obtained from Figure 3(a).
[0067] Furthermore, the two-stage prior-guided fusion structure includes an early prior guidance module and a late prior guidance module, which are used to enhance and refine the prior feature tensor.
[0068] Furthermore, the early prior guidance module specifically comprises:
[0069] The prior feature tensor $P$ is input into the first mapping function $f_1$ to obtain the early guidance weights $W_e$:
[0070] $$W_e = \sigma(f_1(P))$$;
[0071] where $\sigma$ is the Sigmoid function; the first mapping function $f_1$ includes a first convolutional layer, a BatchNorm2d normalization layer, a ReLU activation layer, and a second convolutional layer connected in sequence, wherein the first convolutional layer is a 3×3 convolution and the second convolutional layer is a 1×1 convolution.
[0072] Using the early guidance weights $W_e$, element-wise modulation is performed on the original input astronomical time-frequency image $I$ to obtain the early prior-guided output features $X_e$:
[0073] $$X_e = W_e \odot I$$;
[0074] where $\odot$ denotes element-wise multiplication.
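The early guidance step amounts to a sigmoid gate computed from the prior tensor and multiplied onto the input image. The NumPy sketch below replaces the learned mapping (3×3 convolution, BatchNorm2d, ReLU, 1×1 convolution) with a single per-pixel linear combination of the five prior channels; this stand-in, the `channel_weights` parameter, and the function names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def early_prior_guidance(image, prior, channel_weights, bias=0.0):
    """image: (H, W); prior: (5, H, W); channel_weights: (5,) stand-in for the learned mapping.

    Returns the gated image and the gate itself.
    """
    # Per-pixel linear map over prior channels (equivalent to a 1x1 convolution).
    logits = np.tensordot(channel_weights, prior, axes=1) + bias  # (H, W)
    gate = sigmoid(logits)                                         # weights in (0, 1)
    return gate * image, gate                                      # element-wise modulation

image = np.ones((4, 4))
prior = np.zeros((5, 4, 4))
guided, gate = early_prior_guidance(image, prior, channel_weights=np.ones(5))
```

Because the gate lies strictly between 0 and 1, this form can only attenuate pixels relative to one another; regions with strong prior responses are attenuated least.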
[0075] Furthermore, the late prior guidance module specifically comprises:
[0076] The prior feature tensor $P$ is input into the second mapping function $f_2$ to obtain the late guidance weights $W_l$, where $\sigma$ is the Sigmoid function:
[0077] $$W_l = \sigma(f_2(P))$$;
[0078] The second mapping function $f_2$ includes a third convolutional layer, a BatchNorm2d normalization layer, a ReLU activation layer, a fourth convolutional layer, a BatchNorm2d normalization layer, a ReLU activation layer, and a fifth convolutional layer connected in sequence, wherein the third and fourth convolutional layers are 3×3 two-dimensional convolutional layers and the fifth convolutional layer is a 1×1 two-dimensional convolutional layer.
[0079] A preset weighting coefficient is introduced, and the late guidance weights $W_l$ are used to perform locally weighted, element-wise modulation on the output features of the MSEM-BGFM hierarchical decoder to obtain the late prior-guided features:
[0080] ;
[0081] where the preset weighting coefficient is obtained by applying the sigmoid function to a learnable parameter initialized to 0.3, giving an initial value of approximately 0.5744; it is used to control the strength of the late prior-guided refinement.
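The initial value of the weighting coefficient can be checked directly, since the sigmoid of 0.3 is about 0.5744. The sketch below also shows one possible modulation, a residual form `features * (1 + coeff * weights)`; that form and the function name are purely assumptions for illustration, as the text above does not fix the exact expression.

```python
import math
import numpy as np

def late_prior_guidance(decoder_features, late_weights, raw_coeff=0.3):
    """decoder_features: (C, H, W); late_weights: (H, W) in (0, 1).

    raw_coeff plays the role of the learnable parameter; the residual
    modulation form used here is an assumption.
    """
    coeff = 1.0 / (1.0 + math.exp(-raw_coeff))  # sigmoid(0.3) ~= 0.5744
    refined = decoder_features * (1.0 + coeff * late_weights)
    return refined, coeff

features = np.ones((2, 3, 3))
unchanged, coeff = late_prior_guidance(features, np.zeros((3, 3)))
boosted, _ = late_prior_guidance(features, np.ones((3, 3)))
```

Passing the raw parameter through a sigmoid keeps the coefficient between 0 and 1 throughout training, so the prior can strengthen but never dominate the decoder features in this form.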
[0082] As can be seen from the above technical solution, the present invention employs an early prior guidance module and a late prior guidance module to construct a dual-stage prior guidance fusion structure. The early prior guidance module in this structure is used to enhance the model's ability to focus on potential interference regions in the early stage of encoding; the late prior guidance module is used to improve the prediction stability in boundary regions and complex structural regions in the later stage of decoding.
[0083] Furthermore, the encoder employs a MiT-B5 encoder to extract multi-scale semantic features from the early prior-guided output features, progressively extracting feature representations at different scales along the hierarchy to output the multi-scale semantic features c1, c2, c3, and c4. Among them, the shallow features c1, the mid-level features c2, the mid-level features c3, and the high-level features c4 are used for cross-layer fusion during the hierarchical decoding process of the MSEM-BGFM hierarchical decoder.
[0084] Considering that features from different layers are not consistent in channel dimension, the multi-scale semantic features output by the encoder are channel-mapped separately using linear projection layers before entering the decoder, so that they are uniformly projected to the same channel dimension and the corresponding projected features are obtained; the high-level feature among them is passed through the AllSpark module, a linear projection layer, and an ECA channel attention module in sequence to obtain high-level enhanced features with a dimension of 64.
[0085] Furthermore, the MSEM-BGFM hierarchical decoder takes the projected features and the high-level enhanced features as input for stepwise recovery and structural refinement. Specifically, the decoding process of the MSEM-BGFM hierarchical decoder is as follows: spatial resolution is gradually recovered starting from the high-level enhanced features; the high-level enhanced features and the corresponding-scale features from the encoder side are input to the first multi-scale enhancement module MSEM_1, which performs feature fusion and multi-scale enhancement; the output of MSEM_1 and the corresponding-scale features from the encoder side are then input to the second multi-scale enhancement module MSEM_2, which further performs feature fusion and multi-scale enhancement; the output of MSEM_2 and the corresponding-scale features from the encoder side are input to the upsampling fusion module to obtain high-resolution fused features; the high-resolution fused features and the boundary response map B_map, obtained by passing the shallow features through the Boundary Extractor, are then input to the boundary-guided fusion module BGFM, which yields the decoded end features refined by the boundary region structure as the output features of the MSEM-BGFM hierarchical decoder.
[0086] The upsampling fusion module specifically performs bilinear interpolation upsampling on the output of the second multi-scale enhancement module MSEM_2, concatenates the result along the channel dimension with the corresponding-scale features enhanced by the ECA channel attention module, and then fuses them through a convolutional fusion module consisting of a 3×3 convolutional layer, a ReLU activation layer, and a SyncBN normalization layer to obtain the high-resolution fused features.
[0087] Further, referring to Figure 4, the first multi-scale enhancement module MSEM_1 and the second multi-scale enhancement module MSEM_2 have the same structure. Taking MSEM_1 as an example, the specific steps are as follows. The high-level enhanced features (corresponding to x_high in the figure) are upsampled by bilinear interpolation to the same spatial dimensions as the feature corresponding to x_low in the figure, giving the first enhanced feature; simultaneously, the feature corresponding to x_low in the figure is passed through the ECA channel attention module, global average pooling (GAP), and one-dimensional convolution to obtain the second enhanced feature. The first and second enhanced features are then concatenated along the channel dimension, and a preliminary fused feature is obtained through a 3×3 convolution, a normalization layer, and a ReLU activation layer. Next, a context gating mechanism generates channel weights, which are multiplied element-wise with the preliminary fused feature. Finally, multi-scale features are extracted through three 3×3 convolution branches with different dilation rates (1, 2, and 3), and after concatenation along the channel dimension, a 1×1 convolution remaps the features back to a unified channel dimension. The context gating mechanism consists of sequentially connected global average pooling (GAP), a 1×1 convolution layer, a ReLU activation layer, a 1×1 convolution layer, and a sigmoid function.
It should be noted that for the second multi-scale enhancement module MSEM_2, the output of MSEM_1 (corresponding to x_high in the figure) is upsampled by bilinear interpolation to the same spatial dimensions as the feature corresponding to x_low in the figure, giving the first enhanced feature; simultaneously, the feature corresponding to x_low in the figure is passed through the ECA channel attention module, global average pooling (GAP), and one-dimensional convolution to obtain the second enhanced feature.
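The context gating mechanism described above can be illustrated with a minimal, dependency-free Python sketch. It is not the implementation of this invention: the weight matrices `w_reduce` and `w_expand`, the omission of bias terms, and all function names are assumptions made for illustration. Because the two 1×1 convolutions act on a globally pooled C×1×1 tensor, each reduces to a matrix-vector product.

```python
import math


def global_average_pool(feature):
    """feature: list of C channels, each an H x W list of lists -> list of C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def linear(weights, vec):
    """A 1x1 convolution on a C x 1 x 1 tensor is a matrix-vector product (no bias here)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def context_gate(feature, w_reduce, w_expand):
    """GAP -> 1x1 conv -> ReLU -> 1x1 conv -> sigmoid, then channel-wise rescaling.

    w_reduce and w_expand are hypothetical weights; their shapes are an assumption.
    """
    pooled = global_average_pool(feature)
    hidden = [max(0.0, h) for h in linear(w_reduce, pooled)]                 # ReLU
    gates = [1.0 / (1.0 + math.exp(-z)) for z in linear(w_expand, hidden)]   # sigmoid
    # Multiply every pixel of channel c by the gate of channel c.
    gated = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature)]
    return gates, gated
```

In this sketch each channel of the preliminary fused feature is rescaled by a single gate value in the interval (0, 1).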
[0088] Further, referring to Figure 5, the boundary-guided fusion module BGFM takes as input the high-resolution fused feature x output by the upsampling fusion module and the boundary response map B_map output by the boundary feature extractor BoundaryExtractor. The boundary attention branch maps the boundary response map B_map to boundary-guided weights; the high-resolution fused feature x is then modulated element-wise by these weights to enhance the feature response of boundary-related regions; on this basis, a refinement module refines the local structure to obtain refined features. Finally, a residual connection is applied to obtain the decoder-end features refined by the boundary region structure. The decoder-end features output by BGFM are therefore high-resolution fused features that provide a more boundary-sensitive feature representation for subsequent segmentation prediction.
[0089] Furthermore, the boundary attention branch consists of two stacked blocks, each comprising a 3×3 convolutional layer followed by a ReLU activation layer, followed by a 1×1 convolutional layer and a sigmoid function. The refinement module consists of two stacked blocks, each comprising a 3×3 convolutional layer, a normalization layer, and a ReLU activation layer.
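The data flow of the boundary-guided fusion module can be sketched as follows for a single-channel feature map. This is a simplified illustration under stated assumptions, not the module of this invention: the callables `attention` and `refine` are hypothetical stand-ins for the convolutional stacks of the boundary attention branch and the refinement module, and the residual form `x + refined` is assumed, since the text does not state which tensors the residual connection adds.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bgfm_forward(x, boundary_map, attention=None, refine=None):
    """x, boundary_map: H x W lists of floats (one channel, for illustration only).

    attention: stand-in for the conv layers before the sigmoid (identity by default).
    refine:    stand-in for the refinement module (identity by default).
    """
    attention = attention or (lambda b: b)
    refine = refine or (lambda f: f)
    logits = attention(boundary_map)
    weights = [[sigmoid(v) for v in row] for row in logits]       # boundary-guided weights
    modulated = [[xv * wv for xv, wv in zip(xr, wr)]              # element-wise modulation
                 for xr, wr in zip(x, weights)]
    refined = refine(modulated)
    # Assumed residual connection: add the refined features back onto the input x.
    return [[xv + rv for xv, rv in zip(xr, rr)] for xr, rr in zip(x, refined)]
```

With the identity stand-ins, positions with a strong boundary response are amplified more than positions with a weak one.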
[0090] According to a second aspect of the present invention, a multi-scale target detection system for radio frequency interference based on semi-supervised deep learning is provided, comprising modules for carrying out any one of the above-described methods for multi-scale target detection of radio frequency interference based on semi-supervised deep learning, specifically including: an acquisition module for acquiring astronomical time-frequency images; and an obtaining module for constructing a PA-AllSpark network model, wherein the PA-AllSpark network model includes a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder, for training the PA-AllSpark network model using a semi-supervised training strategy to obtain a trained PA-AllSpark network model, and for inputting the astronomical time-frequency images into the trained PA-AllSpark network model. In the trained PA-AllSpark network model, the prior feature extraction module extracts prior feature tensors that characterize the statistical features and morphological patterns of radio frequency interference. The two-stage prior-guided fusion structure performs early-stage enhancement and refinement of these prior feature tensors, yielding early prior-guided output features. The early prior-guided output features are used as input to the encoder, which extracts multi-scale semantic features. The MSEM-BGFM hierarchical decoder then performs feature fusion, spatial resolution restoration, and boundary region structure refinement on these multi-scale semantic features to obtain the output features of the MSEM-BGFM hierarchical decoder. These output features are then introduced, and the two-stage prior-guided fusion structure performs late-stage enhancement and refinement of the prior feature tensors, yielding late prior-guided output features.
Based on these late prior-guided output features, a radio frequency interference mask image is obtained. For details not elaborated on in the above modules, please refer to the relevant descriptions in this embodiment.
[0091] According to a third aspect of the present invention, a processor is provided, configured to perform operations including the steps of the described method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning.
[0092] According to a fourth aspect of the present invention, a computer-readable storage medium is provided for storing a computer program; the computer program, when executed by a processor, implements the steps of the radio frequency interference multi-scale target detection method based on semi-supervised deep learning.
[0093] It should be noted that radio frequency (RF) interference in astronomical time-frequency images may originate from various types of radio transmission sources, including but not limited to communication equipment, satellite communication systems, radar equipment, and other radio radiation sources. Different interference sources differ in frequency range, duration, power distribution, and operating mode, resulting in diverse structural forms of RF interference in time-frequency images, such as narrowband structures, broadband structures, short-burst structures, fragmented structures, and irregular boundary structures. Furthermore, in actual astronomical observations, RF interference signals often superimpose with celestial signals and may be affected by factors such as the observation environment, atmospheric conditions, and receiver link status, thus increasing the complexity of RF interference identification and segmentation. The following describes the optional specific implementation process of this invention based on experimental data:
[0094] In this experiment, the batch size can be set according to the GPU memory available during model training. This experiment uses a batch size of 2 and a total of 100 epochs, and trains the model with a stochastic gradient descent (SGD) optimizer. The momentum coefficient is set to 0.9, the weight decay coefficient is 1×10⁻⁴, and the initial learning rate is set to 5×10⁻⁴, adjusted dynamically with a polynomial learning rate decay strategy. The specific implementation process is as follows:
[0095] Step 1: The dataset of this invention consists of real radio observation data of 920 pulsars acquired with the 40-meter radio telescope at the Yunnan Astronomical Observatory of the Chinese Academy of Sciences between November 2016 and March 2023, and is used to train and evaluate the model. The observation frequency range is 2190-2290 MHz, the frequency channel bandwidth is 1 MHz, and the data include observation samples with different sub-integration lengths and durations.
[0096] Step 2: Data Preprocessing and Label Construction. First, the raw, real pulsar observation data are processed and converted into astronomical time-frequency images. Then, an automatic labeling tool is used for initial labeling, followed by manual correction to produce mask label images. In the label images, white areas represent interference, and black areas represent non-interference.
[0097] Step 3: Dataset Partitioning and Semi-Supervised Labeling. The experimental dataset for this study consists of 7360 images. To ensure the rationality and fairness of the dataset partitioning, the labeled experimental dataset was randomly divided into a training set, a validation set, and a test set in a 1:1:1 ratio, with each set containing 2453 samples. The validation set was used for model selection and hyperparameter tuning, while the test set was used for model performance evaluation.
[0098] Furthermore, in the semi-supervised training scenario, the training set contains 2453 samples. This training set is further divided into labeled and unlabeled sample subsets. To systematically evaluate the impact of different labeled data sparsity on model performance, this study constructs three labeling ratio configurations: labeled to unlabeled samples at ratios of 1:4, 1:8, and 1:16. In specific implementation, the labeled sample subset is generated from the training set using a random strategy, while the unlabeled sample subset retains the original astronomical time-frequency image data, only removing the labeling information.
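The random division of the training set into labeled and unlabeled subsets can be sketched as below. This is an illustrative sketch only: it assumes that a ratio of 1:k means one labeled sample for every k unlabeled samples (a labeled fraction of 1/(1+k)), that the labeled count is rounded to the nearest integer, and that a fixed seed is used for reproducibility. The function name and the seed are hypothetical; the exact subset sizes are not stated in the text.

```python
import random


def split_labeled_unlabeled(sample_ids, unlabeled_per_labeled, seed=0):
    """Randomly split sample ids into (labeled, unlabeled) at a ratio of 1:unlabeled_per_labeled."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # random selection strategy, seeded for repeatability
    n_labeled = round(len(ids) / (1 + unlabeled_per_labeled))
    # Unlabeled samples keep their images; only their annotations are discarded.
    return ids[:n_labeled], ids[n_labeled:]


if __name__ == "__main__":
    for k in (4, 8, 16):
        labeled, unlabeled = split_labeled_unlabeled(range(2453), k)
        print(f"1:{k} -> {len(labeled)} labeled, {len(unlabeled)} unlabeled")
```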
[0099] Step 4: During model training, labeled samples are used for supervised training; unlabeled samples are first input into the PA-AllSpark network model to obtain prediction results, and pseudo-labels are generated based on these prediction results; subsequently, labeled and unlabeled samples are input into the PA-AllSpark network model together. Supervised loss is calculated for labeled samples, and unsupervised loss is calculated for unlabeled samples using the pseudo-labels. Boundary auxiliary loss is also calculated using the boundary prediction results output by the model. These losses collectively participate in model parameter updates, thereby achieving semi-supervised learning. This embodiment of the invention uses Intersection over Union (IoU), mean Intersection over Union (mIoU), Boundary Intersection over Union (Boundary-IoU), and F1 score as model performance evaluation metrics. Specific details are as follows:
[0100] Intersection over Union (IoU) directly reflects the model's actual ability to segment radio frequency interference regions under semi-supervised training conditions. The formula is:
[0101] $\mathrm{IoU} = \dfrac{TP}{TP + FP + FN}$;
[0102] Wherein, TP (True Positives) is the number of pixels correctly predicted as radio frequency interference by the model, FP (False Positives) is the number of pixels incorrectly predicted as radio frequency interference by the model, and FN (False Negatives) is the number of pixels that the model failed to correctly predict as radio frequency interference.
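[0103] The mean Intersection over Union (mIoU) is used as an auxiliary indicator to reflect the overall pixel-level segmentation consistency. Since this example involves only two classes, radio frequency interference and background, the number of classes is $C = 2$. The formula is:
[0104] $\mathrm{mIoU} = \dfrac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c$, where $\mathrm{IoU}_c$ is the IoU of class $c$;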
[0105] Boundary Intersection over Union (Boundary-IoU) reflects more sensitively the spatial alignment of the prediction results within structurally sensitive regions, by measuring the intersection and union of the foreground regions that lie within the boundary regions of the predicted mask and the ground-truth mask. The formula is:
[0106] $\text{Boundary-IoU} = \dfrac{\left| (G_d \cap G) \cap (P_d \cap P) \right|}{\left| (G_d \cap G) \cup (P_d \cap P) \right|}$;
[0107] where $G_d$ and $P_d$ denote the sets of pixels within the boundary region of the ground-truth mask and of the predicted mask, respectively, and $G$ and $P$ denote the ground-truth label mask and the predicted radio frequency interference mask, respectively.
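A boundary-restricted IoU of this kind can be sketched in plain Python as follows. The sketch rests on assumptions that the text does not specify: the boundary region of a mask is taken to be the foreground pixels removed by `d` steps of 4-neighbour binary erosion, pixels outside the image are treated as background, and the band width `d` and all function names are hypothetical.

```python
def erode(mask):
    """One step of 4-neighbour binary erosion; outside the image counts as background."""
    h, w = len(mask), len(mask[0])

    def at(r, c):
        return mask[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if (mask[r][c] and at(r - 1, c) and at(r + 1, c)
                   and at(r, c - 1) and at(r, c + 1)) else 0
             for c in range(w)] for r in range(h)]


def boundary_band(mask, d=1):
    """Foreground pixels of `mask` that disappear after d erosion steps."""
    eroded = mask
    for _ in range(d):
        eroded = erode(eroded)
    return {(r, c) for r, row in enumerate(mask)
            for c, v in enumerate(row) if v and not eroded[r][c]}


def boundary_iou(gt, pred, d=1):
    """IoU computed only on the boundary bands of the two binary masks."""
    g_band, p_band = boundary_band(gt, d), boundary_band(pred, d)
    union = g_band | p_band
    return len(g_band & p_band) / len(union) if union else 1.0
```

Because only the thin band near each contour is compared, a small shift of the predicted region lowers this score much more than it lowers the ordinary IoU.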
[0108] The F1 score measures the balance between false positives and false negatives in radio frequency interference identification and comprehensively reflects the model's identification accuracy. The formula for the F1 score is:
[0109] $F_1 = \dfrac{2\,TP}{2\,TP + FP + FN}$;
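The pixel-level metrics can be computed from the TP, FP, and FN counts defined above, as in the following sketch. It uses the standard definitions IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), and takes mIoU as the average of the interference IoU and the background IoU for the two-class case. The function names and the convention of returning 1.0 when a denominator is zero are assumptions made for illustration.

```python
def confusion_counts(gt, pred):
    """Count TP, FP, FN, TN over two binary masks given as H x W lists (1 = interference)."""
    tp = fp = fn = tn = 0
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(gt, pred):
    tp, fp, fn, tn = confusion_counts(gt, pred)
    iou_rfi = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    # Background IoU: true negatives play the role of TP for the background class.
    iou_bg = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return {"iou": iou_rfi, "miou": (iou_rfi + iou_bg) / 2, "f1": f1}
```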
[0110] To evaluate the PA-AllSpark network model of this invention against current methods in the field, this invention reproduces the representative semi-supervised segmentation models AllSpark (Wang H, Zhang Q, Li Y, et al. Allspark: Reborn labeled features from unlabeled in transformer for semi-supervised semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 3627-3636.) and AllSpark-UNet (Li J, Liang B, Feng S, et al. RFI detection based on semi-supervised learning with improved Unet[J]. Astronomy and Computing, 2026, 54: 101020.). Each model was trained and tested on the same experimental platform and dataset. Table 1 presents the experimental results.
[0111] Table 1 Comparative Experiments
[0112]
[0113] As shown in Table 1, the overall performance of each method on the different evaluation metrics decreases as the proportion of labeled samples decreases. For RFI segmentation tasks, it is necessary to ensure not only the accuracy of the overall detection of the segmented region but also the stability of the boundary localization of the interference region. The IoU metric mainly measures the overall overlap between the predicted region and the real region, while the Boundary-IoU metric reflects more sensitively the match between the predicted boundary and the real boundary. Therefore, this invention focuses on IoU and Boundary-IoU to analyze the segmentation results and combines them with mIoU and the F1 score to evaluate the overall performance of the model. Taking the results at the 1:8 labeling ratio as an example, AllSpark's detection metrics are relatively low under this condition, with an IoU of 67.49% and a Boundary-IoU of 60.71%; AllSpark-UNet ranks second, with an IoU of 69.92% and a Boundary-IoU of 64.00%, outperforming AllSpark. It is worth noting that the PA-AllSpark proposed in this invention outperforms these previous semi-supervised methods under the different labeling ratios.
[0114] Based on the above implementation process, a test image is randomly selected as a representative sample. Its original astronomical time-frequency image is shown in Figure 6, the ground-truth mask label image is shown in Figure 7, and the prediction results of the different models are visualized in Figures 8-10. As can be seen from Figures 8-10, although many existing semi-supervised methods can detect RFI, the visualization results of the PA-AllSpark proposed in this invention show higher regional consistency on this sample, and large-scale error diffusion near the boundary is significantly reduced.
[0115] To verify the rationality of the PA-AllSpark network of this invention, AllSpark-UNet is used as the baseline (configuration A in Table 2), and the following comparative experiments on different decoder architectures are presented to verify the decoder architecture of this invention, as shown in Table 2:
[0116] Table 2 Decoder Comparison Experiment
[0117]
[0118] In Table 2, configuration B takes A as its baseline and expands each of the decoder stages ECA-upsample1, ECA-upsample2, ECA-upsample3, and ECA-upsample4 in A into two parallel paths: an MSEM branch and a BGFM branch. The two branches simultaneously model the input features of the current layer, and their outputs are fused with fixed weights of 0.5/0.5 before being sent to the next decoding layer (e.g., ECA-upsample1 is expanded into an MSEM1 branch and a BGFM1 branch, both of which model the original input of ECA-upsample1, and their outputs are fused with fixed weights of 0.5/0.5 to form the output of the current decoding layer). Configuration C takes B as its baseline and replaces the fixed weights with learnable weights, so that the network adaptively determines the proportions of the MSEM branch and the BGFM branch at the different decoding stages. Configuration D is the MSEM-BGFM hierarchical decoder of this invention.
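The difference between the fixed-weight fusion of configuration B and the learnable-weight fusion of configuration C can be illustrated with a small sketch. The parameterization shown here, a single scalar passed through a sigmoid so that the two branch weights sum to 1, is an assumption for illustration; the text does not state how the learnable weights are represented or normalized, and the function names are hypothetical.

```python
import math


def fuse_parallel(msem_out, bgfm_out, alpha=0.5):
    """Weighted sum of two branch outputs (flat lists); alpha=0.5 gives the fixed 0.5/0.5 case."""
    return [alpha * m + (1.0 - alpha) * b for m, b in zip(msem_out, bgfm_out)]


def alpha_from_logit(theta):
    """Assumed learnable case: a trainable scalar theta mapped into (0, 1) by a sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta))


# Fixed weights (configuration B) versus a weight derived from a trainable scalar (configuration C).
fixed = fuse_parallel([2.0, 4.0], [0.0, 0.0])
learned = fuse_parallel([2.0, 4.0], [0.0, 0.0], alpha=alpha_from_logit(1.0))
```

With `theta = 0` the learnable variant reproduces the fixed 0.5/0.5 fusion, so configuration B can be seen as a special case of configuration C under this parameterization.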
[0119] The data shown in Table 2 demonstrates that the original AllSpark-UNet network performs excellently when transferred to the astronomical domain for RFI identification using semi-supervised learning methods. When MSEM and BGFM are introduced simultaneously, different decoder organization methods significantly impact performance. The configuration D using a serial decoding structure achieves 66.56% Boundary-IoU, outperforming the parallel structure (fixed weights) at 61.18% and the parallel structure (learnable weights) at 61.93%, indicating that a reasonable decoder architecture helps further enhance the model's stability in structural prediction.
[0120] Furthermore, the rationality of the described two-stage prior-guided fusion structure was verified under configuration D of Table 2, as shown in Table 3:
[0121] Table 3 Two-stage prior-guided fusion structure ablation experiment
[0122]
[0123] As shown in Table 3, under configuration D, introducing only the Early-only or Late-only prior guidance module negatively impacts all model metrics. However, the Dual-Stage prior guidance fusion structure of this invention, by combining the Early-only and Late-only prior guidance modules, achieves optimal results across all metrics. This indicates that introducing dual-stage prior guidance on top of the hierarchical decoding structure further enhances the model's prediction consistency and stability under low-annotation conditions.
[0124] Given the significant uncertainties and diversity in the structural morphology, boundary distribution, and intensity of radio frequency interference (RF interference) in astronomical time-frequency images, and the increased susceptibility to false detections, missed detections, and boundary blurring under low-annotation conditions, this invention proposes a semi-supervised multi-scale RF interference target detection method, PA-AllSpark, which integrates prior feature modeling and hierarchical decoding enhancement mechanisms. While maintaining the semi-supervised training logic, the PA-AllSpark method optimizes and upgrades the network structure by introducing a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder composed of a multi-scale enhancement module (MSEM) and a boundary-guided fusion module (BGFM). This enhances the model's responsiveness to potential interference regions, boundary-sensitive regions, and complex structural regions, improving its characterization ability for RF interference targets at different scales and the predictive stability of boundary regions. The trained PA-AllSpark network model can automatically identify radio frequency interference regions in astronomical time-frequency images and output corresponding radio frequency interference mask images. At the same time, the method adopts a pseudo-label-driven semi-supervised training method, which reduces the dependence on a large amount of pixel-level manual annotation data and lowers the cost of manual annotation. Therefore, it has good detection stability and engineering application value under low annotation conditions.
[0125] The specific embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.
Claims
1. A method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning, characterized in that it comprises: acquiring astronomical time-frequency images; constructing a PA-AllSpark network model comprising a prior feature extraction module, a two-stage prior-guided fusion structure, an encoder, and an MSEM-BGFM hierarchical decoder; training the PA-AllSpark network model using a semi-supervised training strategy to obtain a trained PA-AllSpark network model; and inputting the astronomical time-frequency images into the trained PA-AllSpark network model, wherein the trained PA-AllSpark network model extracts, through the prior feature extraction module, prior feature tensors that characterize the statistical features and morphological patterns of radio frequency interference; the two-stage prior-guided fusion structure performs early-stage enhancement and refinement of the prior feature tensors to obtain early prior-guided output features; the early prior-guided output features are used as input to the encoder, which extracts multi-scale semantic features; the MSEM-BGFM hierarchical decoder performs feature fusion, spatial resolution restoration, and boundary region structure refinement on the multi-scale semantic features to obtain the output features of the MSEM-BGFM hierarchical decoder; the output features of the MSEM-BGFM hierarchical decoder are introduced, and the two-stage prior-guided fusion structure performs late-stage enhancement and refinement of the prior feature tensors to obtain late prior-guided output features; and a radio frequency interference mask image is obtained based on the late prior-guided output features.
2. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 1, characterized in that the prior feature extraction module is used to extract, from the input astronomical time-frequency images, prior feature tensors that characterize the statistical features and morphological patterns of radio frequency interference.
3. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 2, characterized in that the extraction of the prior feature tensor characterizing the statistical features and morphological patterns of radio frequency interference is specifically as follows: constructing intensity anomaly prior values; using the Sobel operator to calculate the gradient response in the time direction and the gradient response in the frequency direction; calculating the gradient magnitude response from the gradient response in the time direction and the gradient response in the frequency direction; normalizing the absolute values of the gradient magnitude response, the gradient response in the time direction, and the gradient response in the frequency direction to obtain the prior values of the gradient magnitude, the time-direction gradient, and the frequency-direction gradient; constructing local contrast prior values; based on the intensity anomaly, gradient magnitude, time-direction gradient, frequency-direction gradient, and local contrast prior values calculated at all pixel locations in the astronomical time-frequency image, constructing the corresponding intensity anomaly prior feature map, gradient magnitude prior feature map, time-direction gradient prior feature map, frequency-direction gradient prior feature map, and local contrast prior feature map; and concatenating the intensity anomaly prior feature map, gradient magnitude prior feature map, time-direction gradient prior feature map, frequency-direction gradient prior feature map, and local contrast prior feature map to obtain the prior feature tensor.
4. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 1, characterized in that the two-stage prior-guided fusion structure includes an early prior guidance module and a late prior guidance module.
5. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 4, characterized in that the early prior guidance module specifically comprises: inputting the prior feature tensor into the first mapping function and applying the Sigmoid function to obtain the early guidance weights; and using the early guidance weights to perform element-wise modulation, by element-wise multiplication, of the original input astronomical time-frequency image to obtain the early prior-guided output features.
6. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 4, characterized in that the late prior guidance module specifically comprises: inputting the prior feature tensor into the second mapping function to obtain the late guidance weights; and introducing a preset weighting coefficient and using the late guidance weights, after local weighted modulation, to perform element-wise modulation of the output features of the MSEM-BGFM hierarchical decoder to obtain the late prior-guided features.
7. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 1, characterized in that the multi-scale semantic features ..., ..., ... output by the encoder are passed through a linear projection layer to obtain ..., ..., ...; the feature ... is passed in sequence through the AllSpark module, a linear projection layer, and the ECA channel attention module to obtain the high-level enhanced features; and the MSEM-BGFM hierarchical decoder takes ..., ..., and the high-level enhanced features as input for stepwise recovery and structural refinement. The decoding process of the MSEM-BGFM hierarchical decoder is as follows: starting from the high-level enhanced features, the spatial resolution is gradually recovered; the high-level enhanced features and ... are input to the first multi-scale enhancement module MSEM_1, which performs feature fusion and multi-scale enhancement; the output of MSEM_1 and ... are then input to the second multi-scale enhancement module MSEM_2, which performs further feature fusion and multi-scale enhancement; the output of MSEM_2 and ... are input to the upsampling fusion module to obtain the high-resolution fused features; the high-resolution fused features, together with the output B_map obtained by processing the shallow feature c1 with the boundary feature extractor BoundaryExtractor, are then input to the boundary-guided fusion module BGFM to obtain the decoder-end features refined by the boundary region structure, which serve as the output features of the MSEM-BGFM hierarchical decoder.
8. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 7, characterized in that the first multi-scale enhancement module MSEM_1 and the second multi-scale enhancement module MSEM_2 have the same structure; taking the first multi-scale enhancement module MSEM_1 as an example, the specific steps are as follows: the high-level enhanced features are upsampled by bilinear interpolation to obtain the first enhanced feature; simultaneously, the feature ... is passed through the ECA channel attention module, global average pooling (GAP), and one-dimensional convolution to obtain the second enhanced feature; the first and second enhanced features are then concatenated along the channel dimension, and a preliminary fused feature is obtained through a 3×3 convolution, a normalization layer, and a ReLU activation layer; channel weights are then generated by a context gating mechanism and multiplied element-wise with the preliminary fused feature; finally, multi-scale features are extracted through three 3×3 convolution branches with different dilation rates, and the output is obtained by a 1×1 convolution after concatenation along the channel dimension.
9. The method for multi-scale target detection of radio frequency interference based on semi-supervised deep learning according to claim 7, characterized in that the boundary-guided fusion module BGFM takes as input the high-resolution fused feature x output by the upsampling fusion module and the boundary response map output by the boundary feature extractor BoundaryExtractor; the boundary response map is mapped to boundary-guided weights by a boundary attention branch; the high-resolution fused feature x is then modulated by element-wise multiplication; the local structure is then refined by the refinement module to obtain refined features; and finally a residual connection is applied to obtain the decoder-end features refined by the boundary region structure.
10. A multi-scale target detection system for radio frequency interference based on semi-supervised deep learning, characterized in that it comprises modules for carrying out the method according to any one of claims 1-9.