A multi-scale differential infrared fusion classification system and method for dim moving aerial targets
By adopting a multi-scale differential infrared fusion method together with multi-modal feature fusion, the invention addresses the classification of small moving aerial targets against complex backgrounds and achieves stable classification results under low signal-to-clutter conditions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SHANGHAI INSTITUTE OF TECHNICAL PHYSICS CHINESE ACADEMY OF SCIENCES
- Filing Date
- 2026-03-12
- Publication Date
- 2026-06-19
AI Technical Summary
Against complex backgrounds, small moving aerial targets are difficult to distinguish from the background in infrared images. Existing techniques struggle to fuse the different types of information reasonably while preserving the characteristics of each. As a result, the target's radiation features are easily obscured by background interference, and the ability to express texture and structural information is limited.
A multi-scale differential infrared fusion method is adopted. Multimodal paired data is obtained through a data preprocessing module, multi-scale features are extracted using a dual-branch feature extraction module, and feature fusion is performed by combining a pyramid hierarchical fusion module and a cross-modal shared attention module. Finally, a comprehensive feature representation is generated through a multi-scale feature integration module for classification.
It achieves refined classification of weak moving targets in the air under complex backgrounds and low signal-to-clutter conditions, maintaining high classification stability and reliability.
Figure CN122244508A_ABST
Description
Technical Field
[0001] This invention relates to the field of infrared detection and image information processing technology, and in particular to a multi-scale differential infrared fusion classification system and method for weak moving targets in the air.
Background Technology
[0002] Infrared imaging does not rely on visible light and can image around the clock in all weather, which gives it significant value in aviation monitoring. On satellites or high-altitude monitoring platforms in particular, infrared sensors can observe large airspaces continuously over long distances. This facilitates early detection and continuous tracking of aerial targets and provides effective technical support for civil aviation flight safety and for preventing potential low-altitude risks. Weak moving infrared targets are difficult to characterize saliently against complex backgrounds. To address this, related research typically analyzes the data in both the spatial and temporal domains: raw infrared images reflect the target's brightness and radiation characteristics, while differential images highlight the target's motion response and suppress static background interference.
[0003] Chinese patent CN202411708730 discloses a novel infrared differential detection method that adapts to highly dynamic background changes and suppresses static background redundancy. The resulting original infrared data and differential data are complementary to some extent at the feature level, providing a multi-source information basis for identifying and classifying weak infrared moving targets.
[0004] However, practical applications face long imaging distances, small target scales, and limited system resolution. Under these conditions, weak moving aerial targets typically occupy few pixels and show low contrast and a poor signal-to-clutter ratio in infrared images. Complex backgrounds such as clouds, land surfaces, sea surfaces, and atmospheric disturbances further introduce a large amount of non-stationary noise, making it difficult to distinguish the target from the background in the spatial domain. When only the original infrared image is used, background interference easily obscures the target's radiation features, and the image's ability to express texture and structural information is limited. Using only differential data enhances motion sensitivity, but it weakens the target's radiation intensity and spatial texture information to some extent, leading to incomplete feature representation. An urgent problem therefore remains in classifying weak moving targets in infrared imaging: how to fuse the original infrared information and the differential information reasonably while effectively preserving the characteristics of each.
Summary of the Invention
[0005] The purpose of this invention is to provide a multi-scale differential infrared fusion classification system and method for weak moving aerial targets that solves the above problems of the prior art. Building on candidate target detection, it introduces joint modeling of raw infrared data and temporal differential data, making full use of the complementary strengths of the two data types in radiation characteristics and motion characteristics. Through multi-scale feature extraction and cross-modal fusion, it achieves fine-grained classification of weak moving aerial targets and maintains high classification stability and reliability under complex backgrounds and low signal-to-clutter conditions.
[0006] To achieve the above objectives, the technical solution adopted by the present invention is to provide a multi-scale differential infrared fusion aerial weak moving target classification system, characterized in that it includes a data preprocessing module, a dual-branch feature extraction module, a pyramid hierarchical fusion module, a cross-modal shared attention module, a multi-scale feature integration module, and a classification module;
[0007] The data preprocessing module receives infrared sequence images at its input end and outputs multimodal paired data at its output end. The data preprocessing module obtains original infrared image slices containing weak moving targets in the air and corresponding temporal differential image slices from the infrared sequence images based on the target detection results, forming multimodal paired data.
[0008] The dual-branch feature extraction module is connected to the output of the data preprocessing module and includes an independent first feature extraction unit and a second feature extraction unit. The first feature extraction unit extracts multi-scale features from the original infrared image slices and outputs an infrared modal feature pyramid. The second feature extraction unit extracts multi-scale features from the temporal differential image slices and outputs a differential modal feature pyramid.
[0009] The pyramid-level fusion module is connected to the dual-branch feature extraction module and fuses the intermediate-level features in the infrared modal feature pyramid and the differential modal feature pyramid to generate intermediate-level fused features.
[0010] The cross-modal shared attention module is connected to the dual-branch feature extraction module, and performs cross-modal shared attention fusion on the corresponding features of the highest level of the infrared modal feature pyramid and the differential modal feature pyramid to generate the highest-level fused features;
[0011] The multi-scale feature integration module is connected to the pyramid-level fusion module and the cross-modal shared attention module. It performs scale alignment and channel concatenation on the intermediate-level fused features and the highest-level fused features, and generates a comprehensive feature representation through multi-kernel depthwise convolution.
[0012] The classification module is connected to the multi-scale feature integration module, performs category discrimination based on the comprehensive feature representation, and outputs the classification result of the weak moving target in the air.
[0013] Furthermore, the data preprocessing module includes a differential image generation unit, a target detection unit, and a slice cropping unit;
[0014] The differential image generation unit generates the temporal differential image from the infrared sequence image through an event-triggered mechanism with an adaptive threshold;
[0015] The target detection unit is connected to the differential image generation unit and detects candidate positions of weak moving targets in the air based on the temporal differential image.
[0016] The slice cropping unit is connected to the target detection unit, the differential image generation unit, and an external infrared image source. According to the candidate position, it crops fixed-size target slices from the original infrared image and the corresponding time-domain differential image, forming the original infrared image slice and the corresponding time-domain differential image slice. These slices serve as the multimodal paired data and are output to the dual-branch feature extraction module.
[0017] Furthermore, the first feature extraction unit and the second feature extraction unit have the same network structure but independently trained network weights, and both are connected to the pyramid-level fusion module and the cross-modal shared attention module;
[0018] The input end of the first feature extraction unit is connected to the data preprocessing module, receives raw infrared image slices, and outputs features at each level of the infrared modal feature pyramid; the input end of the second feature extraction unit is connected to the data preprocessing module, receives corresponding temporal differential image slices, and outputs features at each level of the differential modal feature pyramid.
[0019] Furthermore, the input end of the pyramid-level fusion module is connected to the first feature extraction unit and the second feature extraction unit respectively, and receives intermediate-level features from the infrared modal feature pyramid and the differential modal feature pyramid; the pyramid-level fusion module performs fusion processing on the received intermediate-level features to generate intermediate-level fused features, and connects to the multi-scale feature integration module through its output end.
[0020] Furthermore, the cross-modal shared attention module includes a shared transformation unit, a bidirectional attention calculation unit, a first residual connection unit, a second residual connection unit, a feature concatenation unit, and a shared projection unit;
[0021] The shared transformation unit has its input end connected to the first feature extraction unit and the second feature extraction unit, and performs transformation processing on the infrared branch highest-level feature and the differential branch highest-level feature to generate corresponding query features, key features and value features.
[0022] The bidirectional attention calculation unit is connected to the shared transformation unit. It calculates infrared-to-differential attention features, in which the infrared branch attends to the differential branch, from the query features of the infrared branch and the key and value features of the differential branch. It also calculates differential-to-infrared attention features from the query features of the differential branch and the key and value features of the infrared branch.
[0023] The first residual connection unit is connected to the bidirectional attention calculation unit and the first feature extraction unit. It adds the original highest-level features of the infrared branch to the infrared-to-differential attention features to generate enhanced infrared features;
[0024] The second residual connection unit is connected to the bidirectional attention calculation unit and the second feature extraction unit. It adds the original highest-level features of the differential branch to the differential-to-infrared attention features to generate enhanced differential features;
[0025] The feature concatenation unit is connected to both the first residual connection unit and the second residual connection unit. It concatenates the enhanced infrared features and the enhanced differential features along the channel dimension to form concatenated features;
[0026] The shared projection unit is connected to the feature concatenation unit. It applies a linear transformation that compresses the concatenated features to generate the highest-level fused features, and its output is connected to the multi-scale feature integration module.
[0027] Furthermore, the shared transformation unit shares parameters when transforming the highest-level features of the infrared branch and the highest-level features of the differential branch, so that the highest-level features of the infrared branch and the highest-level features of the differential branch are mapped to the same feature space through the same linear transformation.
[0028] Furthermore, the cross-modal shared attention module also includes a depthwise convolutional position encoding unit;
[0029] The depthwise convolutional position encoding unit is located between the bidirectional attention calculation unit and the first and second residual connection units. It applies depthwise convolution separately to the infrared-to-differential attention features and the differential-to-infrared attention features to introduce local spatial position information. The resulting position-encoded attention features are input to the first and second residual connection units, respectively.
[0030] Furthermore, the multi-scale feature integration module includes a scale alignment unit, a feature concatenation unit, a multi-kernel depthwise convolution unit, and a feature fusion unit;
[0031] The scale alignment unit downsamples the high-resolution features among the intermediate-level and highest-level fused features and upsamples the low-resolution features. It then applies convolutional channel compression to all features so that they share a consistent spatial resolution and channel dimension.
[0032] The feature concatenation unit is connected to the scale alignment unit and concatenates the scale-aligned features along the channel dimension to generate concatenated features;
[0033] The multi-kernel depthwise convolution unit is connected to the feature concatenation unit. It applies multiple depthwise-separable convolution kernels of different sizes to the concatenated features, each performing depthwise convolution, thereby capturing contextual information under different receptive fields.
[0034] The feature fusion unit is connected to the multi-kernel depthwise convolution unit. It adds the output of the multi-kernel depthwise convolution unit to the concatenated features through a residual connection and remaps the result with a pointwise convolution to generate a comprehensive feature representation.
[0035] This invention also discloses a classification method using the above-mentioned multi-scale differential infrared fusion airborne weak moving target classification system, characterized by comprising the following steps:
[0036] Step S100: Through the data preprocessing module, based on the target detection results, the original infrared image slice containing weak moving targets in the air and the corresponding time-domain differential image slice are obtained from the infrared sequence image to form multimodal paired data.
[0037] Step S200: Through the first feature extraction unit and the second feature extraction unit of the dual-branch feature extraction module, multi-scale feature extraction is performed on the original infrared image slice and the temporal differential image slice respectively to obtain the infrared modal feature pyramid and the differential modal feature pyramid;
[0038] Step S300: The intermediate-level features in the infrared modal feature pyramid and the differential modal feature pyramid are fused by the pyramid-level fusion module to generate intermediate-level fused features;
[0039] Step S400: The cross-modal shared attention module is used to perform cross-modal shared attention fusion on the corresponding features of the highest level of the infrared modal feature pyramid and the differential modal feature pyramid to generate the highest level fused features;
[0040] Step S500: The multi-scale feature integration module performs scale alignment and channel concatenation on the intermediate-level fused features and the highest-level fused features, and generates a comprehensive feature representation through multi-kernel depthwise convolution.
[0041] Step S600: The classification module performs category discrimination based on the comprehensive feature representation and outputs the classification result of the weak moving target in the air.
[0042] Further, in step S400, cross-modal shared attention fusion includes the following sub-steps:
[0043] Step S401: The shared transformation unit transforms the highest-level features of the infrared branch and of the differential branch simultaneously to generate the corresponding query features, key features, and value features. The shared transformation unit shares all of its parameters between the two modalities.
[0044] Step S402: The bidirectional attention calculation unit calculates the infrared-to-differential attention features from the query features of the infrared branch and the key and value features of the differential branch. It calculates the differential-to-infrared attention features from the query features of the differential branch and the key and value features of the infrared branch.
[0045] Step S403: The depthwise convolutional position encoding unit applies depthwise convolution to the infrared-to-differential and differential-to-infrared attention features to introduce local spatial position information and generate position-encoded attention features.
[0046] Step S404: The first residual connection unit adds the original highest-level features of the infrared branch to the position-encoded infrared-to-differential attention features to generate enhanced infrared features;
[0047] Step S405: The second residual connection unit adds the original highest-level features of the differential branch to the position-encoded differential-to-infrared attention features to generate enhanced differential features;
[0048] Step S406: The feature concatenation unit concatenates the enhanced infrared features and the enhanced differential features along the channel dimension to generate concatenated features;
[0049] Step S407: The shared projection unit applies a linear transformation that compresses the concatenated features to generate the highest-level fused features.
[0050] In view of the above technical features, the multi-scale differential infrared fusion airborne weak moving target classification system and method of the present invention has the following significant advantages compared with the prior art:
[0051] 1. This invention independently models the original infrared data and the time-domain differential data, and introduces cross-modal information interaction layer by layer in the multi-scale cross-modal progressive fusion structure, making full use of the complementary relationship between radiation characteristics and motion characteristics to achieve effective synergy of multi-layer semantic features.
[0052] 2. This invention introduces a cross-modal shared attention mechanism in the high-level feature fusion stage and combines it with a multi-scale feature integration module to form an intermediate-scale comprehensive feature representation. This enables the network to maintain stable classification performance under complex backgrounds and low signal-to-clutter conditions, improving the reliability of classifying weak moving aerial targets.
Description of the Drawings
[0053] Figure 1 is a system block diagram of a preferred embodiment of the multi-scale differential infrared fusion airborne weak moving target classification system of the present invention;
[0054] Figure 2 is a schematic diagram of the cross-modal shared attention module in Figure 1;
[0055] Figure 3 is a schematic diagram of the multi-scale feature integration module in Figure 1;
[0056] Figure 4 is a flowchart of a preferred embodiment of the method of the present invention for classifying weak moving aerial targets with the multi-scale differential infrared fusion system;
[0057] Figure 5 shows infrared slices and differential slices selected from the test set of the present invention;
[0058] Figure 6 shows the classification results for pairs of infrared slices and differential slices in the test set of the present invention.
[0059] In the figures: 100 - Data preprocessing module, 200 - Dual-branch feature extraction module, 300 - Pyramid-level fusion module, 400 - Cross-modal shared attention module, 500 - Multi-scale feature integration module, 600 - Classification module;
[0060] 110 - Differential image generation unit, 120 - Target detection unit, 130 - Slice cropping unit;
[0061] 210 - First feature extraction unit, 220 - Second feature extraction unit;
[0062] 410 - Shared transformation unit, 420 - Bidirectional attention calculation unit, 430 - Depthwise convolutional position encoding unit, 440 - First residual connection unit, 450 - Second residual connection unit, 460 - Feature concatenation unit, 470 - Shared projection unit;
[0063] 510 - Scale alignment unit, 520 - Feature concatenation unit, 530 - Multi-kernel depthwise convolution unit, 540 - Feature fusion unit.
Detailed Implementation
[0064] The present invention will be further described below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined by the appended claims.
[0065] Referring to Figures 1 to 3, this invention discloses a multi-scale differential infrared fusion aerial weak moving target classification system. As shown in the figures, a preferred embodiment includes a data preprocessing module 100, a dual-branch feature extraction module 200, a pyramid hierarchical fusion module 300, a cross-modal shared attention module 400, a multi-scale feature integration module 500, and a classification module 600.
[0066] The data preprocessing module 100 uses the target detection results to obtain original infrared image slices containing weak moving aerial targets, together with the corresponding temporal differential image slices, from the infrared sequence images, forming multimodal paired data. The data preprocessing module 100 consists of a differential image generation unit 110, a target detection unit 120, and a slice cropping unit 130. The differential image generation unit 110 generates temporal differential images from the infrared sequence images through an event-triggered mechanism with an adaptive threshold. The target detection unit 120 is connected to the differential image generation unit 110 and detects candidate positions of weak moving aerial targets based on the temporal differential images. The slice cropping unit 130 is connected to the target detection unit 120, the differential image generation unit 110, and an external infrared image source. At the candidate positions provided by the target detection unit 120, it crops fixed-size target slices from the original infrared image and from the temporal differential image provided by the differential image generation unit 110. This yields original infrared image slices and their corresponding temporal differential image slices. These pairs of raw infrared and temporal differential image slices constitute the multimodal paired data, which is output to the subsequent modules. The infrared sequence images are acquired by a long-wave infrared camera mounted on an infrared UAV or aircraft, and both the targets and the backgrounds are derived from raw long-wave infrared data. A 7:3 sampling ratio is used in time to simulate platform jitter in a 1 pixel/frame staring mode, and the target size is controlled between 1 and 2 pixels. The raw infrared image slices and temporal differential image slices are 32×32 pixels in size.
[0067] The dual-branch feature extraction module 200 is connected to the data preprocessing module 100 and consists of a first feature extraction unit 210 and a second feature extraction unit 220. The two units are independent of each other. They have the same network structure (both use the YOLO v11 feature extraction network), but their network weights are trained independently. Specifically, the input of the first feature extraction unit 210 is connected to the data preprocessing module 100. This unit receives the raw infrared image slices, performs multi-scale feature extraction on them, and outputs an infrared modal feature pyramid containing high-resolution, mesoscale, and low-resolution features. Similarly, the input of the second feature extraction unit 220 is connected to the data preprocessing module 100. This unit receives the corresponding temporal differential image slices, performs multi-scale feature extraction on them, and outputs a differential modal feature pyramid that likewise contains high-resolution, mesoscale, and low-resolution features.
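The slice cropping step can be illustrated with a minimal Python sketch. The sketch assumes that each candidate position becomes the slice centre, that pixels falling outside the frame are zero-padded, and that frames are nested lists indexed [row][column]. The source fixes only the 32×32 slice size, so `crop_slice`, `make_paired_slices`, and the padding policy are hypothetical.

```python
def crop_slice(image, center, size=32, pad_value=0.0):
    """Crop a size x size slice centred on `center` = (row, col).

    Pixels outside the image are filled with `pad_value`
    (an assumed border policy; the source does not specify one).
    """
    height, width = len(image), len(image[0])
    row0 = center[0] - size // 2
    col0 = center[1] - size // 2
    return [
        [image[r][c] if 0 <= r < height and 0 <= c < width else pad_value
         for c in range(col0, col0 + size)]
        for r in range(row0, row0 + size)
    ]


def make_paired_slices(raw_frame, diff_frame, center, size=32):
    """Crop the raw infrared frame and its temporal differential image at the
    same candidate position, forming one multimodal pair."""
    return crop_slice(raw_frame, center, size), crop_slice(diff_frame, center, size)
```

Cropping both modalities at the same coordinates keeps each raw/differential pair pixel-aligned. As a result, features at matching positions in the two branches describe the same scene location.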
[0068] The pyramid-level fusion module 300 is connected to the first feature extraction unit 210 and the second feature extraction unit 220 in the dual-branch feature extraction module 200. It receives intermediate-level features (i.e., high-resolution features and mesoscale features) from the infrared modal feature pyramid and the differential modal feature pyramid, performs fusion processing on these intermediate-level features to generate intermediate-level fused features, and connects to subsequent modules through its output.
[0069] The cross-modal shared attention module 400 is connected to the first feature extraction unit 210 and the second feature extraction unit 220 in the dual-branch feature extraction module 200. It performs cross-modal shared attention fusion on the corresponding highest-level features (i.e., the low-resolution features) of the infrared modal feature pyramid and the differential modal feature pyramid to generate the highest-level fused features. The cross-modal shared attention module 400 is composed of a shared transformation unit 410, a bidirectional attention calculation unit 420, a depthwise convolutional position encoding unit 430, a first residual connection unit 440, a second residual connection unit 450, a feature concatenation unit 460, and a shared projection unit 470.
[0070] The input of the shared transformation unit 410 is connected to the first feature extraction unit 210 and the second feature extraction unit 220. It transforms the highest-level features of the infrared branch and of the differential branch to generate the corresponding query features, key features, and value features. The shared transformation unit 410 uses the same set of parameters for both branches. The highest-level features of the two modalities therefore undergo the same linear transformation and are mapped to the same feature space.
[0071] The bidirectional attention calculation unit 420 is connected to the shared transformation unit 410 and computes attention in both directions between the infrared branch and the differential branch. Specifically, it calculates the infrared-to-differential attention features from the query features of the infrared branch and the key and value features of the differential branch. It also calculates the differential-to-infrared attention features from the query features of the differential branch and the key and value features of the infrared branch.
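Here is a minimal plain-Python sketch of the shared transformation unit 410 and the bidirectional attention calculation unit 420. It assumes single-head scaled dot-product attention and treats a C×H×W highest-level feature map as N = H×W tokens of dimension C. The patent does not state the head count, projection width, or scaling, and all function names are hypothetical. The sketch mainly shows one set of projection matrices (`w_q`, `w_k`, `w_v`) applied to both modalities, while queries from one branch attend to keys and values from the other.

```python
import math


def matmul(a, b):
    """Matrix product of nested lists: (n x k) @ (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def bidirectional_shared_attention(ir_tokens, diff_tokens, w_q, w_k, w_v):
    """ir_tokens, diff_tokens: N x C token lists (a C x H x W map flattened to N = H*W).

    The same projection matrices w_q, w_k, w_v (each C x d) are applied to both
    modalities, mirroring the parameter sharing of the shared transformation unit.
    """
    q_ir, k_ir, v_ir = (matmul(ir_tokens, w) for w in (w_q, w_k, w_v))
    q_df, k_df, v_df = (matmul(diff_tokens, w) for w in (w_q, w_k, w_v))
    ir_to_diff = attend(q_ir, k_df, v_df)  # infrared queries, differential keys/values
    diff_to_ir = attend(q_df, k_ir, v_ir)  # differential queries, infrared keys/values
    return ir_to_diff, diff_to_ir
```

Because the projections are shared, swapping the two inputs simply swaps the two outputs. With separate per-modality projections, this symmetry would not hold.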
[0072] The depthwise convolutional position encoding unit 430 receives the infrared-to-differential and differential-to-infrared attention features from the bidirectional attention calculation unit 420. It applies depthwise convolution to each of them to introduce local spatial position information and generate position-encoded attention features.
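The depthwise convolutional position encoding can be sketched as follows. The sketch assumes that the attention output tokens are first reshaped back to a C×H×W map. It also assumes that each channel is filtered by its own odd-sized kernel with zero padding, using cross-correlation as common deep learning frameworks do. The source does not state whether the unit outputs the convolution result alone or adds it back to its input, as conditional positional encodings usually do. This sketch returns the convolution result only, and the function names are hypothetical.

```python
def tokens_to_map(tokens, height, width):
    """Reshape N x C tokens (N = height * width, row-major) into a C x H x W map."""
    channels = len(tokens[0])
    return [[[tokens[r * width + c][ch] for c in range(width)] for r in range(height)]
            for ch in range(channels)]


def depthwise_conv2d(feature, kernels):
    """Depthwise 2-D convolution (cross-correlation) with zero 'same' padding.

    feature: C x H x W; kernels: C x k x k with odd k, one kernel per channel,
    so each channel is filtered independently and channels are never mixed.
    """
    out = []
    for channel, kernel in zip(feature, kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        r = k // 2
        out.append([
            [sum(kernel[i][j] * channel[y + i - r][x + j - r]
                 for i in range(k) for j in range(k)
                 if 0 <= y + i - r < h and 0 <= x + j - r < w)
             for x in range(w)]
            for y in range(h)])
    return out
```

Because a depthwise kernel only sees a small neighbourhood, the output varies with each token's local surroundings, which is how this kind of unit injects spatial position information without an explicit positional table.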
[0073] The first residual connection unit 440 and the second residual connection unit 450 are connected to the depthwise convolutional position encoding unit 430. Working with the first feature extraction unit 210 and the second feature extraction unit 220 respectively, they generate enhanced features for the infrared branch and the differential branch. Specifically, the first residual connection unit 440 adds the original highest-level features of the infrared branch to the position-encoded infrared-to-differential attention features to generate enhanced infrared features. Similarly, the second residual connection unit 450 adds the original highest-level features of the differential branch to the position-encoded differential-to-infrared attention features to generate enhanced differential features.
[0074] The feature concatenation unit 460 is connected to the first residual connection unit 440 and the second residual connection unit 450. It concatenates the enhanced infrared features and the enhanced differential features along the channel dimension to generate concatenated features.
[0075] The shared projection unit 470 is connected to the feature concatenation unit 460. It compresses the concatenated features with a 1×1 convolution (a linear transformation) to generate the highest-level fused features, and its output is connected to the multi-scale feature integration module 500.
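Units 440 to 470 reduce to three simple operations: element-wise residual addition, channel concatenation, and a 1×1 convolution. Below is a minimal sketch on C×H×W nested lists. The function names are hypothetical, and the optional bias is an assumption that the source does not mention.

```python
def add_maps(a, b):
    """Element-wise sum of two C x H x W maps (the residual connection)."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def conv1x1(feature, weight, bias=None):
    """1x1 convolution: a per-pixel linear map from C_in to C_out channels.

    weight: C_out x C_in; bias: C_out values (defaults to zeros).
    """
    c_out, h, w = len(weight), len(feature[0]), len(feature[0][0])
    bias = bias if bias is not None else [0.0] * c_out
    return [[[bias[o] + sum(weight[o][i] * feature[i][y][x] for i in range(len(feature)))
              for x in range(w)] for y in range(h)] for o in range(c_out)]


def fuse_highest_level(f_ir, f_diff, att_ir, att_diff, proj_weight, proj_bias=None):
    enhanced_ir = add_maps(f_ir, att_ir)        # first residual connection unit
    enhanced_diff = add_maps(f_diff, att_diff)  # second residual connection unit
    concatenated = enhanced_ir + enhanced_diff  # channel concatenation: 2C channels
    return conv1x1(concatenated, proj_weight, proj_bias)  # shared projection: 2C -> C_out
```

Choosing `proj_weight` with fewer rows than input channels is what performs the "compression" described above.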
[0076] The multi-scale feature integration module 500 is connected to the pyramid-level fusion module 300 and the cross-modal shared attention module 400. It performs scale alignment and channel concatenation on the intermediate-level and highest-level fused features, and generates a comprehensive feature representation through multi-kernel depthwise convolution. The multi-scale feature integration module 500 is composed of a scale alignment unit 510, a feature concatenation unit 520, a multi-kernel depthwise convolution unit 530, and a feature fusion unit 540.
[0077] The scale alignment unit 510 downsamples high-resolution features in the intermediate layer fusion features and the highest layer fusion features, upsamples low-resolution features, and performs 1×1 convolutional channel compression on all features to give them consistent spatial resolution and channel dimension.
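Below is a minimal sketch of the scale alignment unit 510. The source does not name the resampling operators, so the sketch makes several assumptions. Downsampling uses non-overlapping average pooling and upsampling uses nearest-neighbour interpolation. Feature maps are assumed square, with sizes that differ from the target by integer factors, and channel compression uses a bias-free 1×1 convolution. All names are hypothetical.

```python
def avg_pool(feature, factor):
    """Non-overlapping factor x factor average pooling of a C x H x W map."""
    return [[[sum(ch[y * factor + i][x * factor + j]
                  for i in range(factor) for j in range(factor)) / (factor * factor)
              for x in range(len(ch[0]) // factor)]
             for y in range(len(ch) // factor)]
            for ch in feature]


def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a C x H x W map by an integer factor."""
    return [[[ch[y // factor][x // factor] for x in range(len(ch[0]) * factor)]
             for y in range(len(ch) * factor)]
            for ch in feature]


def conv1x1(feature, weight):
    """Bias-free 1x1 convolution; weight is C_out x C_in."""
    return [[[sum(weight[o][i] * feature[i][y][x] for i in range(len(feature)))
              for x in range(len(feature[0][0]))]
             for y in range(len(feature[0]))]
            for o in range(len(weight))]


def align_scales(features, target_size, weights):
    """Resample each square map to target_size x target_size, then compress channels."""
    aligned = []
    for feature, weight in zip(features, weights):
        size = len(feature[0])
        if size > target_size:
            feature = avg_pool(feature, size // target_size)
        elif size < target_size:
            feature = upsample_nearest(feature, target_size // size)
        aligned.append(conv1x1(feature, weight))
    return aligned
```

After alignment, every map has the same spatial size and channel count, so the maps can be concatenated directly along the channel dimension.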
[0078] The feature splicing unit 520 is connected to the scale alignment unit 510, which splices the scale-aligned features in the channel dimension to generate spliced features.
[0079] The multi-kernel deep convolution unit 530 is connected to the feature splicing unit 520. It uses multiple depth-separable convolution kernels of different sizes (such as 3×3 and 5×5) to perform depth convolution on the spliced features, captures contextual information under different receptive fields, and generates multi-scale features.
[0080] The feature fusion unit 540 is connected to the multi-kernel deep convolution unit 530. The multi-scale features output by the multi-kernel deep convolution unit 530 are residually connected with the original concatenated features output by the feature splicing unit 520, and after pointwise convolution remapping, a comprehensive feature representation is generated.
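Units 530 and 540 can be sketched together in NumPy as below, for one (C, H, W) sample. The sketch assumes that each kernel size forms one depthwise branch with zero "same" padding and that the branches are summed so the result can be added residually to the spliced input; the specification does not say how branches are merged, and all names and shapes here are hypothetical.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution (cross-correlation, as in DL frameworks) with zero
    'same' padding. x: (C, H, W); kernels: (C, k, k), odd k, one kernel per channel."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for dy in range(k):
        for dx in range(k):
            out += kernels[:, dy, dx][:, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out


def multi_kernel_unit(spliced, kernel_bank):
    """Unit 530 (sketch): one depthwise branch per kernel size, branches summed (assumed merge)."""
    return sum(depthwise_conv2d(spliced, k) for k in kernel_bank)


def feature_fusion_unit(multi_scale, spliced, w_pw, b_pw):
    """Unit 540 (sketch): residual add with the spliced input, then a pointwise
    (1x1) convolution that remaps channels. w_pw: (C_out, C), b_pw: (C_out,)."""
    residual = multi_scale + spliced
    return np.einsum("oc,chw->ohw", w_pw, residual) + b_pw[:, None, None]


rng = np.random.default_rng(0)
spliced = rng.standard_normal((24, 8, 8))  # e.g. three aligned 8-channel maps
bank = [rng.standard_normal((24, 3, 3)), rng.standard_normal((24, 5, 5))]
multi = multi_kernel_unit(spliced, bank)
rep = feature_fusion_unit(multi, spliced, rng.standard_normal((32, 24)), np.zeros(32))
print(rep.shape)  # (32, 8, 8)
```

In a deep-learning framework the depthwise branches would typically be grouped convolutions with one group per channel; the explicit loop here only makes the per-channel arithmetic visible.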
[0081] The classification module 600 is connected to the multi-scale feature integration module 500. It performs category discrimination based on the comprehensive feature representation and outputs the classification results for weak moving targets in the air. Specifically, the classification module 600 performs global average pooling and fully connected mapping on the comprehensive feature representation, and then outputs the probability distribution for each category.
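The classification head described above (global average pooling, a fully connected mapping, then a probability distribution) can be sketched as follows. Softmax is assumed as the map from logits to probabilities, and the class count of 4 in the usage lines is purely illustrative.

```python
import numpy as np


def classification_head(features, w_fc, b_fc):
    """features: (C, H, W) comprehensive representation; w_fc: (num_classes, C)."""
    pooled = features.mean(axis=(1, 2))      # global average pooling -> (C,)
    logits = w_fc @ pooled + b_fc            # fully connected mapping
    z = np.exp(logits - logits.max())        # numerically stable softmax (assumed)
    return z / z.sum()


rng = np.random.default_rng(0)
probs = classification_head(rng.standard_normal((32, 8, 8)),
                            rng.standard_normal((4, 32)), np.zeros(4))
print(probs.round(3), probs.sum())
```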
[0082] Referring to Figure 1 and Figure 4, the present invention also discloses a multi-scale differential infrared fusion method for classifying weak moving targets in the air. A preferred embodiment of this method, using the system described in the above embodiment, includes the following steps:
[0083] Step S100: Data preprocessing.
[0084] The data preprocessing module 100 obtains raw infrared image slices containing weak moving targets in the air from the infrared sequence images based on the target detection results, as well as corresponding temporal differential image slices, to form multimodal paired data. Specifically, this includes the following sub-steps:
[0085] Step S101: Generate a time-domain differential image.
[0086] The differential image generation unit 110 uses an adaptive threshold event triggering mechanism to generate a time-domain differential image from an infrared sequence image.
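The specification does not define the adaptive threshold, so the following is only a hypothetical sketch of an event-triggered temporal difference: a pixel fires when the absolute frame-to-frame change exceeds mean + k·std of that frame's absolute difference, with k = 3 chosen arbitrarily. The rule, the parameter k, and the function name are assumptions.

```python
import numpy as np


def temporal_difference(frames, k=3.0):
    """frames: (T, H, W) infrared sequence -> (T-1, H, W) differential images.

    A pixel 'fires' when |I_t - I_{t-1}| exceeds an adaptive per-frame threshold
    mean + k * std of the absolute difference (assumed rule); the signed
    difference is kept at firing pixels and zeroed elsewhere.
    """
    frames = np.asarray(frames, dtype=float)
    out = []
    for prev, cur in zip(frames[:-1], frames[1:]):
        diff = cur - prev
        mag = np.abs(diff)
        threshold = mag.mean() + k * mag.std()
        out.append(np.where(mag > threshold, diff, 0.0))
    return np.stack(out)


frames = np.zeros((2, 8, 8))
frames[1, 3, 4] = 10.0          # a dim point target appears in frame 2
events = temporal_difference(frames)
print(np.argwhere(events[0]))   # [[3 4]]
```

Because the threshold is recomputed from each frame's own difference statistics, global changes in background clutter raise or lower it automatically, which is one plausible reading of "adaptive".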
[0087] Step S102: Detect target candidate locations.
[0088] The target detection unit 120 detects candidate locations of weak moving targets in the air based on the temporal differential image.
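The detector used by unit 120 is not specified. As a hypothetical stand-in only, candidate positions can be taken as the centroids of 4-connected groups of non-zero pixels in the temporal differential image, optionally discarding groups smaller than a minimum size (`min_pixels` is an invented parameter).

```python
from collections import deque

import numpy as np


def detect_candidates(diff_img, min_pixels=1):
    """Return (row, col) centroids of 4-connected groups of non-zero pixels."""
    mask = np.abs(diff_img) > 0
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    centers = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:  # drop blobs that are too small
                ys, xs = zip(*pixels)
                centers.append((round(sum(ys) / len(ys)), round(sum(xs) / len(xs))))
    return centers


diff = np.zeros((8, 8))
diff[2, 2:5] = 5.0   # a 3-pixel streak
diff[6, 6] = -4.0    # an isolated pixel
print(detect_candidates(diff))                # [(2, 3), (6, 6)]
print(detect_candidates(diff, min_pixels=2))  # [(2, 3)]
```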
[0089] Step S103: Generate multimodal pairing data.
[0090] The slice cropping unit 130, based on the candidate position, crops target slices of a fixed size (32×32) from the original infrared image and its corresponding temporal differential image, respectively, to form original infrared image slices and temporal differential image slices, which together constitute multimodal paired data.
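A sketch of how unit 130 might cut co-located 32×32 slices from the infrared image and its temporal differential image follows. Zero padding at the image border and placing the candidate at index (16, 16) of the slice are assumptions; the specification gives only the fixed slice size.

```python
import numpy as np


def crop_pair(ir_img, diff_img, center, size=32):
    """Crop co-located size x size slices centred on a candidate (row, col).

    Both images are zero-padded by size // 2 so that candidates near the
    border still yield full-size slices (assumed border policy).
    """
    half = size // 2
    cy, cx = center

    def crop(img):
        padded = np.pad(img, half)              # zero padding on all sides
        # original row cy - half sits at padded row cy, so the window starts there
        return padded[cy:cy + size, cx:cx + size]

    return crop(ir_img), crop(diff_img)


ir = np.arange(1, 64 * 64 + 1, dtype=float).reshape(64, 64)
dif = -ir
ir_slice, diff_slice = crop_pair(ir, dif, (0, 10))
print(ir_slice.shape, diff_slice.shape)  # (32, 32) (32, 32)
```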
[0091] Step S200: Dual-branch feature extraction.
[0092] The first feature extraction unit 210 and the second feature extraction unit 220 of the dual-branch feature extraction module 200 perform multi-scale feature extraction on the original infrared image slices and the temporal differential image slices, respectively, to obtain the infrared modal feature pyramid and the differential modal feature pyramid. The first feature extraction unit 210 and the second feature extraction unit 220 have the same network structure (both use the YOLO v11 feature extraction network), but their network weights are trained independently.
[0093] Step S300: Merging of intermediate pyramid levels.
[0094] The pyramid-level fusion module 300 fuses the intermediate-level features (high-resolution features and mesoscale features) of the infrared modal feature pyramid and the differential modal feature pyramid, and generates intermediate-level fused features for each intermediate level.
[0095] Step S400: Highest-level cross-modal shared attention fusion.
[0096] For the corresponding features (low-resolution features) at the highest level of the infrared modal feature pyramid and the differential modal feature pyramid, cross-modal shared attention fusion is performed on them through the cross-modal shared attention module 400 to generate the highest-level fused features. Specifically, the following sub-steps are included:
[0097] Step S401, feature transformation.
[0098] The shared transformation unit 410 transforms the highest-level features of both the infrared branch and the differential branch, generating the corresponding query features, key features, and value features. The shared transformation unit 410 shares all of its parameters between the two modalities, so the highest-level features of both modalities are mapped into the same feature space by the same linear transformation.
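Full parameter sharing means a single set of query, key, and value matrices is applied to both modalities. The sketch below assumes the highest-level maps are flattened into (H·W, C) token matrices and projected by bias-free linear maps; the projection width d = 32 and all names are hypothetical.

```python
import numpy as np


def to_tokens(x):
    """(C, H, W) feature map -> (H*W, C) token matrix."""
    c, h, w = x.shape
    return x.reshape(c, h * w).T


def shared_transform(f_ir, f_diff, w_q, w_k, w_v):
    """Project both modalities with the SAME w_q, w_k, w_v of shape (C, d)."""
    projected = {}
    for name, feat in (("ir", f_ir), ("diff", f_diff)):
        t = to_tokens(feat)
        projected[name] = (t @ w_q, t @ w_k, t @ w_v)
    return projected


rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.standard_normal((64, 32)) for _ in range(3))
f_ir = rng.standard_normal((64, 4, 4))
f_diff = rng.standard_normal((64, 4, 4))
qkv = shared_transform(f_ir, f_diff, w_q, w_k, w_v)
print(qkv["ir"][0].shape, qkv["diff"][0].shape)  # (16, 32) (16, 32)
```

Because the weights are shared, identical inputs from the two branches produce identical queries, keys, and values, which is what places both modalities in one feature space.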
[0099] Step S402: Calculate bidirectional attention.
[0100] The bidirectional attention calculation unit 420 calculates the attention features of the infrared attention differential from the query features of the infrared branch and the key and value features of the differential branch. At the same time, it calculates the attention features of the differential attention infrared from the query features of the differential branch and the key and value features of the infrared branch.
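The two attention directions can be sketched with standard single-head scaled dot-product attention. The scaling by the square root of the key width and the single-head form are assumptions, since the specification does not give the attention formula.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(q, k, v):
    """Scaled dot-product attention (assumed form): queries from one modality,
    keys/values from the other."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def bidirectional_attention(q_ir, k_ir, v_ir, q_diff, k_diff, v_diff):
    ir_att_diff = cross_attention(q_ir, k_diff, v_diff)  # infrared attention differential
    diff_att_ir = cross_attention(q_diff, k_ir, v_ir)    # differential attention infrared
    return ir_att_diff, diff_att_ir


rng = np.random.default_rng(0)
q_ir, k_ir, v_ir, q_diff, k_diff, v_diff = rng.standard_normal((6, 16, 32))
a_ir, a_diff = bidirectional_attention(q_ir, k_ir, v_ir, q_diff, k_diff, v_diff)
print(a_ir.shape, a_diff.shape)  # (16, 32) (16, 32)
```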
[0101] Step S403: Add position encoding.
[0102] The deep convolutional position encoding unit 430 applies depthwise convolution to the attention features of the infrared attention differential and to the attention features of the differential attention infrared, thereby introducing local spatial position information into both and generating position-encoded attention features.
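A hypothetical sketch of the position encoding step: the attention tokens are reshaped back to a (C, H, W) map, passed through a 3×3 depthwise convolution, and added to the input, in the style of convolutional (conditional) positional encodings. Whether unit 430 adds its input back is not stated in the specification, so that residual term is an assumption.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Per-channel 'same' convolution (cross-correlation); x: (C, H, W), kernels: (C, k, k)."""
    c, h, w = x.shape
    p = kernels.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for dy in range(kernels.shape[-1]):
        for dx in range(kernels.shape[-1]):
            out += kernels[:, dy, dx][:, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out


def conv_position_encoding(att_tokens, h, w, kernels):
    """Add depthwise-convolved local context to attention tokens of shape (H*W, C).
    The '+ fmap' residual is an assumed design choice."""
    fmap = att_tokens.T.reshape(-1, h, w)        # tokens -> (C, H, W) map
    encoded = fmap + depthwise_conv2d(fmap, kernels)
    return encoded.reshape(fmap.shape[0], h * w).T  # back to (H*W, C)


rng = np.random.default_rng(0)
att = rng.standard_normal((16, 32))          # 4x4 grid of 32-dim attention tokens
pe_att = conv_position_encoding(att, 4, 4, rng.standard_normal((32, 3, 3)))
print(pe_att.shape)  # (16, 32)
```

The same function would be applied separately to the infrared-attention-differential and differential-attention-infrared features.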
[0103] Step S404: Generate enhanced infrared features.
[0104] The first residual connection unit 440 superimposes the original features of the highest layer of the infrared branch with the attention features of the infrared attention differential with position encoding, and generates the enhanced infrared features in a residual connection manner.
[0105] Step S405: Generate enhanced differential features.
[0106] The second residual connection unit 450 superimposes the original features of the highest layer of the differential branch with the attention features of the differential attention infrared with position encoding, and generates the enhanced differential features in a residual connection manner.
[0107] Step S406: Generate splicing features.
[0108] The enhanced infrared features and enhanced differential features are spliced together in the channel dimension by the feature splicing unit 460 to generate spliced features.
[0109] Step S407: Generate the highest-level fusion feature.
[0110] The shared projection unit 470 compresses the spliced features with a 1×1 convolution (a linear transformation) to generate the highest-level fused features.
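Steps S404 to S407 reduce to two residual additions, a channel concatenation, and a 1×1 convolution. A minimal NumPy sketch, assuming all four inputs are (C, H, W) maps already reshaped from tokens and a bias-free projection (names and shapes are hypothetical):

```python
import numpy as np


def highest_level_fusion(f_ir, f_diff, pe_ir_att_diff, pe_diff_att_ir, w_proj):
    """All maps are (C, H, W); w_proj has shape (C_out, 2C)."""
    enhanced_ir = f_ir + pe_ir_att_diff        # S404: first residual connection
    enhanced_diff = f_diff + pe_diff_att_ir    # S405: second residual connection
    spliced = np.concatenate([enhanced_ir, enhanced_diff], axis=0)  # S406: (2C, H, W)
    return np.einsum("oc,chw->ohw", w_proj, spliced)                # S407: 1x1 conv


rng = np.random.default_rng(0)
maps = rng.standard_normal((4, 64, 4, 4))
fused = highest_level_fusion(*maps, rng.standard_normal((64, 128)))
print(fused.shape)  # (64, 4, 4)
```

Choosing C_out = C, as here, restores the single-branch channel count after splicing doubled it.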
[0111] Step S500: Multi-scale feature integration.
[0112] The multi-scale feature integration module 500 performs scale alignment and channel concatenation on the high-resolution intermediate layer fused features, medium-resolution intermediate layer fused features, and highest-level fused features, and then generates a comprehensive feature representation through multi-kernel deep convolution processing. Specifically, it includes the following sub-steps:
Step S501: Align the fused features.
[0114] The scale alignment unit 510 processes features of different resolutions among the high-resolution intermediate-layer fused features, the medium-resolution intermediate-layer fused features, and the highest-level fused features differently. Specifically, high-resolution features are downsampled and low-resolution features are upsampled to match the scale of the medium-resolution features. All features are then passed through 1×1 convolutions for channel compression so that they share the same spatial resolution and channel dimension.
Step S502: Splice the fused features.
[0116] The feature splicing unit 520 splices the scale-aligned features along the channel dimension to generate spliced features.
[0117] Step S503: Generate multi-scale features.
[0118] The multi-kernel deep convolution unit 530 applies multiple depthwise separable convolution kernels of different sizes (such as 3×3 and 5×5) to the spliced features, capturing contextual information under different receptive fields and generating multi-scale features.
[0119] Step S504: Generate a comprehensive feature representation.
[0120] The feature fusion unit 540 residually connects the multi-scale features obtained in step S503 with the original spliced features obtained in step S502, and generates a comprehensive feature representation after pointwise convolution remapping.
[0121] Step S600: Classify.
[0122] The classification module 600 performs category discrimination based on the comprehensive feature representation, ultimately outputting the classification results for weak moving targets in the air. Specifically, global average pooling and fully connected mapping are applied to the comprehensive feature representation to output the probability distribution for each category.
[0123] The following describes the training process of the multi-scale differential infrared fusion aerial weak moving target classification system according to the present invention. Using the preprocessed differential infrared data pairs, the system learns the radiation and motion characteristics of weak moving targets. An adaptive optimizer (such as Adam) optimizes the network parameters, and an MSE loss function is constructed to calculate the network loss used for parameter training. The specific parameters are: 400 training epochs, an image size of 256, and a batch size of 8. The optimizer uses an adaptive strategy, and an early stopping mechanism (patience set to 50) is introduced during training to prevent overfitting. Mosaic and color augmentation are disabled. In this embodiment, after the multimodal paired data consisting of original infrared image slices and corresponding temporal differential image slices are obtained in step S100, the data are divided into a training set, a validation set, and a test set according to a preset ratio (e.g., 8:1:1), which are used for subsequent training, validation, and evaluation of the classification model. The system is trained on the training set and validated on the validation set until the training conditions are met, yielding a trained classification model. Figure 5 and Figure 6 show infrared slice and differential slice pairs from the test set, together with the classification results for these pairs.
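The 8:1:1 split and the patience-based early stopping can be sketched in plain Python. The shuffle with a fixed seed, floor rounding of the split sizes (remainder goes to the test set), and "improvement" meaning a strictly lower validation loss are assumptions; the class and function names are hypothetical.

```python
import random


def split_pairs(pairs, ratios=(8, 1, 1), seed=0):
    """Shuffle multimodal pairs and split them train/val/test by integer ratios."""
    items = list(pairs)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


class EarlyStopping:
    """Signal a stop after `patience` epochs without validation-loss improvement."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train, val, test = split_pairs(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

In a training loop, `EarlyStopping(patience=50).step(val_loss)` would be called once per epoch, and training would end early if it returned `True` before the 400-epoch limit.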
[0124] The above description is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. A multi-scale differential infrared fusion airborne weak moving target classification system, characterized in that, It includes a data preprocessing module, a dual-branch feature extraction module, a pyramid-level fusion module, a cross-modal shared attention module, a multi-scale feature integration module, and a classification module; The data preprocessing module receives infrared sequence images at its input end and outputs multimodal paired data at its output end. The data preprocessing module obtains original infrared image slices containing weak moving targets in the air and corresponding temporal differential image slices from the infrared sequence images based on the target detection results, forming multimodal paired data. The dual-branch feature extraction module is connected to the output of the data preprocessing module and includes an independent first feature extraction unit and a second feature extraction unit. The first feature extraction unit extracts multi-scale features from the original infrared image slices and outputs an infrared modal feature pyramid. The second feature extraction unit extracts multi-scale features from the temporal differential image slices and outputs a differential modal feature pyramid. The pyramid-level fusion module is connected to the dual-branch feature extraction module and fuses the intermediate-level features in the infrared modal feature pyramid and the differential modal feature pyramid to generate intermediate-level fused features. The cross-modal shared attention module is connected to the dual-branch feature extraction module, and performs cross-modal shared attention fusion on the corresponding features of the highest level of the infrared modal feature pyramid and the differential modal feature pyramid to generate the highest-level fused features; The multi-scale feature integration module is connected to the pyramid-level fusion module and the cross-modal shared attention module. 
It performs scale alignment and channel concatenation on the intermediate layer fusion features and the highest layer fusion features, and generates a comprehensive feature representation through multi-kernel deep convolution processing. The classification module is connected to the multi-scale feature integration module, performs category discrimination based on the comprehensive feature representation, and outputs the classification result of the weak moving target in the air.
2. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 1, characterized in that, The data preprocessing module includes a differential image generation unit, a target detection unit, and a slice cropping unit; The differential image generation unit generates the temporal differential image from the infrared sequence image through an event-triggered mechanism with an adaptive threshold; The target detection unit is connected to the differential image generation unit and detects candidate positions of weak moving targets in the air based on the temporal differential image. The slice cropping unit is connected to the target detection unit, the differential image generation unit, and the external infrared image source, respectively. According to the candidate position, it crops target slices of fixed size from the original infrared image and the corresponding time-domain differential image to form the original infrared image slice and the corresponding time-domain differential image slice, which are used as the multimodal pairing data and output to the dual-branch feature extraction module.
3. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 1, characterized in that, The first feature extraction unit and the second feature extraction unit have the same network structure, and their network weights are trained independently. Both are connected to the pyramid-level fusion module and the cross-modal shared attention module. The input end of the first feature extraction unit is connected to the data preprocessing module, receives raw infrared image slices, and outputs features at each level of the infrared modal feature pyramid; the input end of the second feature extraction unit is connected to the data preprocessing module, receives corresponding temporal differential image slices, and outputs features at each level of the differential modal feature pyramid.
4. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 1, characterized in that, The input end of the pyramid-level fusion module is connected to the first feature extraction unit and the second feature extraction unit respectively, and receives intermediate-level features from the infrared modal feature pyramid and the differential modal feature pyramid; The pyramid-level fusion module performs fusion processing on the received intermediate-level features to generate intermediate-level fused features, and connects to the multi-scale feature integration module through its output.
5. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 1, characterized in that, The cross-modal shared attention module includes a shared transformation unit, a bidirectional attention calculation unit, a first residual connection unit, a second residual connection unit, a feature splicing unit, and a shared projection unit; The shared transformation unit has its input end connected to the first feature extraction unit and the second feature extraction unit, and transforms the infrared branch highest-level feature and the differential branch highest-level feature to generate corresponding query features, key features and value features; The bidirectional attention calculation unit is connected to the shared transformation unit. Based on the query features of the infrared branch and the key and value features of the differential branch, it calculates the attention features of the infrared attention differential. At the same time, the bidirectional attention calculation unit also calculates the attention features of the differential attention infrared based on the query features of the differential branch and the key and value features of the infrared branch; The first residual connection unit is connected to both the bidirectional attention calculation unit and the first feature extraction unit, and adds the original features of the highest layer of the infrared branch to the attention features of the infrared attention differential to generate enhanced infrared features; The second residual connection unit is connected to the bidirectional attention calculation unit and the second feature extraction unit respectively, and adds the original features of the highest layer of the differential branch to the attention features of the differential attention infrared to generate enhanced differential features; The feature splicing unit is connected to both the first residual connection unit and the second residual connection unit, and splices the enhanced infrared feature and the enhanced differential feature in the channel dimension to form a spliced feature; The shared projection unit is connected to the feature splicing unit, performs linear transformation and compression on the spliced features to generate the highest-level fused features, and connects to the multi-scale feature integration module through its output.
6. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 5, characterized in that, The shared transformation unit shares parameters when transforming the highest-level features of the infrared branch and the highest-level features of the differential branch, so that the highest-level features of the infrared branch and the highest-level features of the differential branch are mapped to the same feature space through the same linear transformation.
7. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 5, characterized in that, The cross-modal shared attention module also includes a deep convolutional position encoding unit; The deep convolutional position encoding unit is located between the bidirectional attention calculation unit on one side and the first and second residual connection units on the other. It applies depthwise convolution to the attention features of the infrared attention differential and to the attention features of the differential attention infrared, respectively, introduces local spatial position information, generates position-encoded attention features, and then inputs them to the first residual connection unit and the second residual connection unit, respectively.
8. The multi-scale differential infrared fusion airborne weak moving target classification system according to claim 1, characterized in that, The multi-scale feature integration module includes a scale alignment unit, a feature splicing unit, a multi-kernel deep convolution unit, and a feature fusion unit; The scale alignment unit downsamples the high-resolution features among the intermediate-layer fusion features and the highest-layer fusion features, upsamples the low-resolution features, and performs convolutional channel compression on all features so that they have consistent spatial resolution and channel dimension; The feature splicing unit is connected to the scale alignment unit, and splices the scale-aligned features in the channel dimension to generate spliced features; The multi-kernel deep convolution unit is connected to the feature splicing unit. It uses multiple depthwise separable convolution kernels of different sizes to perform depthwise convolution on the spliced features, thereby capturing contextual information under different receptive fields; The feature fusion unit is connected to the multi-kernel deep convolution unit. The output of the multi-kernel deep convolution unit is residually connected to the spliced features, and after pointwise convolution remapping, a comprehensive feature representation is generated.
9. A classification method using the multi-scale differential infrared fusion airborne weak moving target classification system as described in any one of claims 1 to 8, characterized in that, comprising the following steps: Step S100: Through the data preprocessing module, based on the target detection results, the original infrared image slice containing weak moving targets in the air and the corresponding time-domain differential image slice are obtained from the infrared sequence image to form multimodal paired data; Step S200: Through the first feature extraction unit and the second feature extraction unit of the dual-branch feature extraction module, multi-scale feature extraction is performed on the original infrared image slice and the temporal differential image slice respectively to obtain the infrared modal feature pyramid and the differential modal feature pyramid; Step S300: The intermediate-level features in the infrared modal feature pyramid and the differential modal feature pyramid are fused by the pyramid-level fusion module to generate intermediate-level fused features; Step S400: The cross-modal shared attention module is used to perform cross-modal shared attention fusion on the corresponding features of the highest level of the infrared modal feature pyramid and the differential modal feature pyramid to generate the highest-level fused features; Step S500: The intermediate-layer fused features and the highest-layer fused features are scale-aligned and channel-concatenated by the multi-scale feature integration module, and a comprehensive feature representation is generated by multi-kernel deep convolution processing; Step S600: The classification module performs category discrimination based on the comprehensive feature representation and outputs the classification result of the weak moving target in the air.
10. The classification method using a multi-scale differential infrared fusion airborne weak moving target classification system according to claim 9, characterized in that, In step S400, cross-modal shared attention fusion includes the following sub-steps: Step S401: Through the shared transformation unit, the highest-level features of the infrared branch and the highest-level features of the differential branch are transformed simultaneously to generate corresponding query features, key features and value features, the shared transformation unit fully sharing its parameters between the two modalities; Step S402: The bidirectional attention calculation unit calculates the attention features of the infrared attention differential based on the query features of the infrared branch and the key and value features of the differential branch, and calculates the attention features of the differential attention infrared based on the query features of the differential branch and the key and value features of the infrared branch; Step S403: The attention features of the infrared attention differential and the attention features of the differential attention infrared are subjected to depthwise convolution by the deep convolutional position encoding unit to introduce local spatial position information and generate position-encoded attention features;
Step S404: Through the first residual connection unit, the original feature of the highest layer of the infrared branch is added to the position-encoded attention feature of the infrared attention differential to generate the enhanced infrared feature; Step S405: Through the second residual connection unit, the original features of the highest layer of the differential branch are added to the position-encoded attention features of the differential attention infrared to generate enhanced differential features; Step S406: The enhanced infrared feature and the enhanced differential feature are spliced together in the channel dimension by the feature splicing unit to generate spliced features; Step S407: The spliced features are linearly transformed and compressed by the shared projection unit to generate the highest-level fused features.