An image fingerprint identification method for micro unmanned aerial vehicles based on heterogeneous information fusion

By using a micro-drone image fingerprinting method based on heterogeneous information fusion, the problems of precise identification and cross-scene stability of drones in complex scenarios have been solved, enabling model-level and individual-level identification of drones and improving the stability and security of identification.

CN122244737A · Pending · Publication Date: 2026-06-19 · CHANGCHUN INST OF OPTICS FINE MECHANICS & PHYSICS CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGCHUN INST OF OPTICS FINE MECHANICS & PHYSICS CHINESE ACAD OF SCI
Filing Date
2026-05-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing drone identification methods are limited by detection conditions in low-altitude, weak signal, silent flight, or complex obstruction scenarios, making it difficult to achieve precise identification of different individuals of the same model. They also have poor stability in cross-scenario identification and lack the ability to reject unknown targets in open sets.

Method used

A micro UAV image fingerprinting method based on heterogeneous information fusion is adopted. Visible light and infrared image sequences are acquired and then time-synchronized, spatially registered, and scale-normalized. Structural texture features, dynamic differential features, and infrared energy criterion features are extracted. A cross-scene decoupled hypergraph fingerprint network fuses and encodes these features to establish a UAV image fingerprint database and to perform model-level and individual-level matching decisions.

Benefits of technology

It enables fine-grained identification at the category, model, and individual levels without the need for radar or radio frequency auxiliary equipment, improves the stability of cross-scene identification and security in open set scenarios, and avoids forced misidentification of unknown targets.



Abstract

This application provides a method for image fingerprint recognition of small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion, belonging to the field of UAV detection and recognition technology. The method includes: acquiring visible light image sequences and infrared image sequences of the target UAV, performing time synchronization, spatial registration, and scale normalization; extracting structural texture features, dynamic differential features, and infrared energy criterion feature vectors; inputting these into a cross-scene decoupled hypergraph fingerprint network for fusion encoding to obtain a low-dimensional unique image fingerprint vector; training the network using cross-scene consistency constraints and scene identity orthogonality constraints to establish a UAV image fingerprint database; matching the fingerprint vector of the target to be identified with the fingerprint database, using an open set unknown individual recognition mechanism to determine the matching result, and outputting the recognition result or the result for an unknown target. This application relies solely on image information to achieve fine-grained recognition at the model and individual levels, and has the advantages of low deployment cost, strong cross-scene stability, and high reliability of open set recognition.

Description

Technical Field

[0001] This application relates to the field of drone detection and identification technology, and more specifically, to a method for image fingerprinting of small drones based on heterogeneous information fusion.

Background Technology

[0002] With the widespread application of drones in scenarios such as aerial photography and entertainment, inspection and monitoring, logistics transportation, and low-altitude security, the demand for precise drone identification, continuous tracking, and target tracing continues to increase. Existing technologies for drone identification mainly include: identification methods based on radar echoes or micro-Doppler signals, identification methods based on radio frequency communication links or transmission characteristics, and target detection and classification methods based on visible light images.

[0003] However, existing technologies have the following shortcomings: First, radar-based or radio-frequency-based identification methods rely on dedicated detection equipment whose detection capability is limited in low-altitude, weak-signal, silent-flight, or heavily obstructed scenarios. They are costly to deploy and difficult to apply in passive detection scenarios.

[0004] Second, most existing image-based methods remain at the coarse-grained classification level, typically only able to distinguish between drones and non-drones, or multi-rotor and fixed-wing aircraft, making it difficult to achieve fine-grained identification of different individuals of the same model. Traditional image methods focus on static contours or local textures, failing to adequately mine stable individual features formed by differences in surface conditions, assembly errors, and power unit differences, making it difficult to construct unique and comparable image fingerprint representations.

[0005] Third, existing methods lack constraint modeling for scene changes. The feature representation of the same target is prone to drift under different lighting, background, temperature and viewing angle conditions, resulting in poor stability of cross-scene recognition.

[0006] Fourth, most existing identification methods adopt a closed-set identification strategy, which assumes that the target to be tested must belong to a known category. When the target is not in the sample library, it tends to be forcibly misidentified as a known individual in the library, which reduces the credibility of the system and fails to meet the security needs of real open scenarios.

[0007] Therefore, it is necessary to propose a method that can achieve model-level and individual-level recognition by relying solely on heterogeneous image information, and has cross-scene stability and the ability to reject unknown targets in open sets.

Summary of the Invention

[0008] The purpose of this application is to provide a method for image fingerprint recognition of small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion, which can solve at least one of the technical problems mentioned above. The specific solution is as follows: According to a specific embodiment of this application, this application provides a method for image fingerprint recognition of small unmanned aerial vehicles based on heterogeneous information fusion, including the following steps: acquiring a visible light image sequence and an infrared image sequence of the target UAV; performing time synchronization, spatial registration, and scale normalization on the visible light image sequence and the infrared image sequence to obtain a stable dual-channel observation sequence; extracting structural texture features, dynamic differential features, and infrared energy criterion feature vectors from the stable dual-channel observation sequence; inputting the structural texture features, the dynamic differential features, and the infrared energy criterion feature vector into a cross-scene decoupled hypergraph fingerprint network for fusion encoding to obtain a low-dimensional unique image fingerprint vector; training the cross-scene decoupled hypergraph fingerprint network using cross-scene consistency constraints and scene identity orthogonality constraints to establish a UAV image fingerprint database; matching the low-dimensional unique image fingerprint vector corresponding to the target to be identified against the UAV image fingerprint database; and judging the matching results using the open set unknown individual identification mechanism, and outputting the identification result or the unknown target result.

[0009] Furthermore, time synchronization between the visible light image sequence and the infrared image sequence includes the following steps: recording the frame-level timestamps corresponding to the visible light image sequence and the infrared image sequence; performing time interpolation on the infrared image sequence; and aligning the visible light image sequence and the infrared image sequence frame-to-frame according to a unified time reference.

[0010] Further, spatial registration between the visible light image sequence and the infrared image sequence includes the following steps: extracting corresponding feature points from the visible light image sequence and the infrared image sequence; establishing an affine transformation model; optimizing the affine transformation model using the reprojection error; and spatially aligning the visible light image sequence and the infrared image sequence using the optimized affine transformation model.

[0011] Furthermore, the extraction process of the structural texture features includes the following steps: performing edge contour enhancement on the visible light images; extracting multi-scale structural information using a multi-level pyramid backbone; extracting directional texture features using strip texture enhancement blocks; enhancing the features of local component regions using window shape attention blocks; and aggregating the enhanced features to obtain the structural texture feature vector.

[0012] Furthermore, the extraction process of the dynamic differential features includes the following steps: performing differential operations on consecutive frames of visible light images to obtain dynamic response information; performing local spectral projection on the rotor candidate region; extracting the rotor speed main frequency and harmonic ratio features; and encoding the rotor speed main frequency and harmonic ratio features by temporal convolution to obtain the dynamic differential feature vector.
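As a rough illustration of the main-frequency and harmonic-ratio step, the sketch below analyses a one-dimensional signal with a real FFT. It assumes the signal is a per-frame scalar taken from the rotor candidate region (for example, a mean inter-frame difference), and it assumes "harmonic ratio" means the amplitude of the second harmonic divided by the amplitude of the main peak. Neither assumption is stated in the source, and the function name is hypothetical.

```python
import numpy as np

def main_frequency_and_harmonic_ratio(signal, fs):
    """Return (main frequency in Hz, amplitude ratio of 2nd harmonic to main peak).

    Assumed definition of the harmonic ratio; the source does not specify one.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # remove the DC component
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    main_bin = int(np.argmax(amplitude[1:])) + 1  # skip bin 0 (DC)
    harmonic_bin = min(2 * main_bin, len(amplitude) - 1)
    ratio = amplitude[harmonic_bin] / amplitude[main_bin]
    return freqs[main_bin], ratio
```

In practice the frame rate `fs` bounds the highest observable frequency at `fs / 2`, so fast rotors may appear aliased in the spectrum.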

[0013] Furthermore, the process of constructing the infrared energy criterion feature vector includes the following steps: normalizing the infrared response corresponding to the target area; weighting and integrating the normalized infrared response to obtain the equivalent thermal absorption response characteristics; extracting the thermal accumulation characteristics, thermal decay characteristics, and equivalent thermal inertia characteristics based on the infrared time-series variation relationship; extracting the heat diffusion distribution characteristics and heat memory characteristics based on the spatial distribution relationship of heat within the target area; and combining the thermal response characteristics to obtain the infrared energy criterion feature vector.

[0014] Furthermore, the cross-scene decoupled hypergraph fingerprint network includes: a visible light morphological microtexture co-coding backbone for extracting structural texture features; an infrared energy criterion field coding branch for extracting infrared energy criterion feature vectors; a rotor periodic micro-motion spectrum coding branch for extracting dynamic differential features; a cross-modal component hypergraph fusion module for multimodal relationship aggregation; a scene identity orthogonal decoupling projection head for generating low-dimensional unique image fingerprint vectors; and a hierarchical prototype open set discriminator for performing open set matching decisions.

[0015] Furthermore, the processing procedure of the cross-modal component hypergraph fusion module includes the following steps: constructing the drone fuselage, arms, rotors, and hot spot area as component nodes; using the component nodes to construct geometrically coupled hyperedges, thermally coupled hyperedges, and symmetric micro-motion hyperedges; using hypergraph convolution to aggregate the relationships between the structural texture features, the dynamic differential features, and the infrared energy criterion feature vectors; and applying attention weighting to the aggregated features to obtain joint representation features.

[0016] Furthermore, the processing of the scene identity orthogonal decoupling projection head includes the following steps: mapping the joint representation features to an identity fingerprint vector and a scene perturbation vector; and normalizing the identity fingerprint vector to obtain the low-dimensional unique image fingerprint vector.

[0017] Further, matching the low-dimensional unique image fingerprint vector corresponding to the target to be identified with the UAV image fingerprint database includes the following steps: matching the fingerprint vector with the model-level prototype centers to obtain the model matching result; and, based on the model matching result, matching the fingerprint vector with the individual-level prototype centers in the corresponding model sub-library to obtain the individual matching result.
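The two-level matching described above can be sketched as a nearest-prototype search carried out twice. This is only an illustration: cosine similarity is an assumed metric, the dictionary layout of the fingerprint database and the function names are hypothetical, and the open set rejection decision is deliberately left out.

```python
import numpy as np

def nearest_prototype(vec, prototypes):
    """Return (label, cosine similarity) of the closest prototype centre."""
    v = np.asarray(vec, dtype=float)
    v = v / np.linalg.norm(v)
    best_label, best_sim = None, -np.inf
    for label, centre in prototypes.items():
        c = np.asarray(centre, dtype=float)
        sim = float(v @ (c / np.linalg.norm(c)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim

def two_level_match(vec, model_centres, individual_centres):
    """Match the model first, then search only that model's individual sub-library."""
    model, model_sim = nearest_prototype(vec, model_centres)
    individual, ind_sim = nearest_prototype(vec, individual_centres[model])
    return model, individual, model_sim, ind_sim
```

Restricting the second search to one sub-library keeps the individual-level comparison among units of the same model, which is where the fine-grained differences matter.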

[0018] Compared with the prior art, the above-described solutions of this application have at least the following beneficial effects: 1. This application discloses an image fingerprinting method for small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion. It achieves three levels of fine identification—from category level, model level, to individual level—relying only on visible light and infrared images, without requiring radar or radio frequency auxiliary equipment, resulting in low deployment costs. By extracting structural texture features, dynamic differential features, and infrared energy criterion feature vectors, and utilizing a cross-scene decoupled hypergraph fingerprint network for fusion encoding, a low-dimensional unique image fingerprint vector is generated. This ensures that different individuals of the same model possess distinguishable and stable features, thereby achieving fine identification at the individual level.

[0019] 2. This application discloses a fingerprint recognition method for small unmanned aerial vehicles based on heterogeneous information fusion. By introducing cross-scene consistency constraints and scene identity orthogonality constraints to train a cross-scene decoupled hypergraph fingerprint network, the joint representation features are mapped to identity fingerprint vectors and scene perturbation vectors and forced to be orthogonal. This ensures that the identity fingerprint vector of the same target remains stable under different lighting, background, ambient temperature and observation conditions, effectively improving the stability and generalization ability of cross-scene recognition.

[0020] 3. This application discloses a micro UAV image fingerprinting method based on heterogeneous information fusion. By constructing a hierarchical UAV image fingerprint database containing model-level prototype centers and individual-level prototype centers, a two-level matching strategy combining model-level matching and individual-level matching is adopted. Furthermore, a three-threshold joint decision mechanism is constructed using an open set energy function, the minimum matching distance, and the maximum matching confidence to perform rejection output for targets outside the database. This avoids the defect of traditional closed set recognition methods that forcibly misidentify unknown targets as individuals within the database, and significantly improves the security and reliability of the system in open set scenarios.

Attached Figure Description

[0021] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort. In the drawings: Figure 1 is a flowchart of a method for image fingerprint recognition of small unmanned aerial vehicles based on heterogeneous information fusion according to an embodiment of this application.

[0022] Figure 2 is a schematic diagram of the overall framework of the method for image fingerprint recognition of small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion according to an embodiment of this application.

Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0024] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “said,” and “the” used in the embodiments of this application and the appended claims are also intended to include the plural forms, and “multiple” generally includes at least two unless the context clearly indicates otherwise.

[0025] It should be understood that the term "and / or" used in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.

[0026] It should be understood that although the terms first, second, third, etc., may be used in the embodiments of this application, these descriptions should not be limited to these terms. These terms are only used to distinguish the descriptions. For example, first may also be referred to as second without departing from the scope of the embodiments of this application, and similarly, second may also be referred to as first.

[0027] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that an article or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an article or device. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device that includes said element.

[0028] The optional embodiments of this application are described in detail below with reference to the accompanying drawings.

[0029] Example 1: This application provides a method for image fingerprint recognition of small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion. It relies solely on visible light and infrared image information, requiring no radar or radio frequency auxiliary equipment, and can achieve precise identification from the category level, model level, to the individual level. As shown in Figure 1 and Figure 2, the method includes the following steps: S1. Acquire the visible light image sequence and infrared image sequence of the target UAV.

[0030] The technical solution of this application embodiment employs a dual-channel imaging device, including a visible light camera and an infrared thermal imaging camera, to simultaneously acquire visible light image sequences and infrared image sequences of the target UAV. Acquisition conditions cover different flight altitudes, observation distances, time periods, background environments, and flight states.

[0031] To ensure the richness of subsequent feature extraction and the generalization ability of model training, the acquisition process should cover as many typical scenarios as possible, including different flight altitudes, observation distances, time periods, background environments, and flight states. Synchronous acquisition by a dual-channel visible light camera and an infrared thermal imaging camera is a prerequisite for subsequent spatiotemporal consistency processing.

[0032] S2. Perform time synchronization, spatial registration, and scale normalization on the visible light image sequence and the infrared image sequence to obtain a stable dual-channel observation sequence.

[0033] Due to differences in hardware characteristics between visible light cameras and infrared thermal imaging cameras, their images often differ in terms of acquisition time, field of view, and geometric distortion. Therefore, before analyzing the image content, it is necessary to perform rigorous spatiotemporal consistency processing on the dual-channel image sequence. This specifically includes three sub-steps: time synchronization, spatial registration, and scale normalization.

[0034] This application provides a preferred technical solution for time synchronization of the visible light image sequence and the infrared image sequence, including the following steps: Record the frame-level timestamps corresponding to the visible light image sequence and the infrared image sequence respectively; use the time base of the visible light sequence as a reference to perform time interpolation processing on the infrared image sequence; perform inter-frame alignment of the two sequences according to a unified time base so that each visible light frame can correspond to an infrared frame that is precisely matched in time.

[0035] Let the timestamp sequence of the visible light frames be $\{t_k^{v}\}$ and the timestamp sequence of the infrared frames be $\{t_l^{r}\}$. For any target time $t$ requiring alignment, the corresponding infrared interpolated frame is calculated by linear interpolation as:

$$\hat{I}^{r}(t) = \frac{t_b - t}{t_b - t_a}\, I_a^{r} + \frac{t - t_a}{t_b - t_a}\, I_b^{r}$$

[0036] where $\hat{I}^{r}(t)$ denotes the infrared interpolated frame; $I_a^{r}$ and $I_b^{r}$ denote the two nearest frames before and after time $t$ in the infrared image sequence, respectively; $t_a$ and $t_b$ denote the timestamps corresponding to these two frames; $k$ denotes the frame number of the visible light frame; $l$ denotes the frame number of the infrared frame; $t_k^{v}$ denotes the timestamp corresponding to the $k$-th visible light image; $t_l^{r}$ denotes the timestamp corresponding to the $l$-th infrared image; and $t$ denotes the target time that needs to be aligned.

[0037] In the technical solution of the embodiments of this application, the values of $t_k^{v}$ and $t_l^{r}$ are all absolute times measured from the start of acquisition. The value of $t$ is a continuous time value and must lie between the timestamps $t_a$ and $t_b$ of the two frames to ensure that the interpolation is valid.

[0038] The technical solution of this application embodiment can obtain an infrared frame that is precisely aligned with the visible light frame in time at any target time t through linear interpolation. This effectively avoids the time registration error caused by the frame rate difference between the visible light camera and the infrared thermal imaging camera, and also provides a data foundation for precise alignment for the subsequent spatial fusion and time series analysis of multimodal features in this application embodiment.
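The linear interpolation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the infrared frames are stored as one array with strictly increasing timestamps; the function names and array layout are hypothetical and not taken from the application.

```python
import numpy as np

def interpolate_ir_frame(ir_frames, ir_times, t):
    """Linearly interpolate an infrared frame at target time t.

    ir_frames: array of shape (L, H, W); ir_times: increasing timestamps of shape (L,).
    """
    ir_times = np.asarray(ir_times, dtype=float)
    if not (ir_times[0] <= t <= ir_times[-1]):
        raise ValueError("t must lie between the first and last infrared timestamps")
    # Index of the first infrared frame at or after t.
    hi = int(np.searchsorted(ir_times, t, side="left"))
    if ir_times[hi] == t:
        return ir_frames[hi].astype(float)
    lo = hi - 1
    w = (t - ir_times[lo]) / (ir_times[hi] - ir_times[lo])
    return (1.0 - w) * ir_frames[lo] + w * ir_frames[hi]

def align_ir_to_visible(ir_frames, ir_times, vis_times):
    """Return one interpolated infrared frame per visible light timestamp."""
    return np.stack([interpolate_ir_frame(ir_frames, ir_times, t) for t in vis_times])
```

Visible light timestamps that fall outside the infrared recording raise an error here rather than being extrapolated, which matches the requirement that the target time lie between two infrared frames.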

[0039] This application provides a preferred technical solution for spatial registration of the visible light image sequence and the infrared image sequence, including the following steps: extracting common physical feature points from the visible light image and the infrared image; establishing an affine transformation model from the visible light image coordinate system to the infrared image coordinate system; optimizing the affine transformation model using reprojection error; and spatially aligning the visible light image and the infrared image using the optimized affine transformation model.

[0040] Let the coordinates of the $i$-th feature point in the visible light image be $(x_i^{v}, y_i^{v})$, and let the coordinates of the corresponding feature point in the infrared image be $(x_i^{r}, y_i^{r})$. The goal of spatial registration is to find an optimal transformation matrix that minimizes the sum of reprojection errors over all feature point pairs, expressed as:

$$H^{*} = \arg\min_{H} \sum_{i} \left\| \begin{bmatrix} x_i^{r} \\ y_i^{r} \end{bmatrix} - H \begin{bmatrix} x_i^{v} \\ y_i^{v} \\ 1 \end{bmatrix} \right\|_2^2$$

[0041] where $x_i^{v}$ and $y_i^{v}$ represent the x and y coordinates of the $i$-th feature point in the visible light image, respectively; $x_i^{r}$ and $y_i^{r}$ represent the x and y coordinates of the $i$-th feature point in the infrared image, respectively; and $H$ represents the affine transformation matrix. In this embodiment, $H$ is a 2×3 matrix used for two-dimensional plane transformations, including rotation, scaling, and translation. $\|\cdot\|_2^2$ represents the square of the Euclidean distance.

[0042] The technical solution of this application embodiment achieves global geometric alignment between visible light images and infrared images by employing the least squares criterion. To improve the robustness of registration, an iteratively reweighted least squares method is used to solve the problem, gradually eliminating outlier feature points caused by local occlusion or motion blur.
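One way to picture the robust affine fit is the NumPy sketch below. It assumes matched point pairs are already available and uses Huber-style weights scaled by the median residual. The weighting rule, the constant `c`, and the function names are illustrative assumptions; the application does not specify them. The sketch also down-weights suspect pairs rather than deleting them outright.

```python
import numpy as np

def fit_affine(src, dst, weights=None):
    """Weighted least-squares fit of a 2x3 affine matrix H with dst ~ H @ [x, y, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) homogeneous source points
    if weights is None:
        weights = np.ones(len(src))
    sw = np.sqrt(weights)[:, None]
    # Solve min sum_i w_i * ||dst_i - A_i @ X||^2, where X = H^T has shape (3, 2).
    X, *_ = np.linalg.lstsq(sw * A, sw * dst, rcond=None)
    return X.T

def fit_affine_irls(src, dst, iters=10, c=2.0):
    """Iteratively reweighted fit that down-weights pairs with large reprojection error."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])
    weights = np.ones(len(src))
    for _ in range(iters):
        H = fit_affine(src, dst, weights)
        residual = np.linalg.norm(dst - A @ H.T, axis=1)
        scale = np.median(residual) + 1e-9
        # Huber-style weights: 1 for small residuals, decaying for large ones.
        weights = np.minimum(1.0, c * scale / (residual + 1e-12))
    return H, weights
```

Here `src` would hold visible light feature coordinates and `dst` the corresponding infrared coordinates; pairs whose final weight is close to zero are the ones the fit has effectively discarded.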

[0043] After spatial registration is completed, the target region $\Omega_t$ of the target UAV in the image is extracted using target detection and temporal tracking algorithms, where $\Omega_t$ represents the pixel area occupied by the target UAV in the image at time $t$.

[0044] The extracted target region is subjected to scale normalization processing to unify the target UAVs with different imaging sizes at different observation distances into a fixed scale space. After the above processing, a number of stable dual-channel observation sequences with consistency in time, space and scale are obtained.
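Scale normalization can be illustrated by cropping the tracked target region and resampling it to a fixed grid. The sketch below assumes a rectangular bounding box, a 64×64 output size, and nearest-neighbour resampling; all three are illustrative choices that the application does not specify, and the function name is hypothetical.

```python
import numpy as np

def crop_and_normalize(frame, box, out_size=(64, 64)):
    """Crop box=(top, left, height, width) and resample to out_size by nearest neighbour."""
    top, left, h, w = box
    crop = frame[top:top + h, left:left + w]
    out_h, out_w = out_size
    # Map each output pixel back to its nearest source pixel in the crop.
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return crop[np.ix_(rows, cols)]
```

Applying the same box to the registered visible light and infrared frames keeps the two channels pixel-aligned after resampling.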

[0045] The technical solution of this application extracts corresponding feature points from visible light images and infrared images, establishes an affine transformation model, optimizes the affine transformation model using reprojection error, and uses the optimized affine transformation model to spatially align the dual-channel images, thereby eliminating the inconsistency of the dual-channel images in terms of geometric distortion and field of view deviation, and ensuring the accurate correspondence of the same spatial position in different modal images.

[0046] S3. Extract structural texture features, dynamic differential features, and infrared energy criterion feature vectors from the dual-channel stable observation sequence.

[0047] The dual-channel stable observation sequence contains rich target information. Visible light images primarily convey the static appearance and dynamic motion information of the target UAV, while infrared images primarily convey the thermal radiation characteristics of the target UAV. To fully acquire this information, the technical solution of this application extracts three complementary types of features: structural texture features, dynamic differential features, and infrared energy criterion feature vectors.

[0048] This application provides a preferred technical solution. The extraction process of structural texture features includes the following steps: performing edge contour enhancement processing on the visible light image; extracting multi-scale structural information using the multi-level pyramid backbone; extracting directional texture features using strip texture enhancement blocks; enhancing the features of local component regions using window shape attention blocks; and aggregating the enhanced features to obtain a structural texture feature vector.

[0049] For the visible light image $I_v$, edge contour enhancement is first performed to highlight the UAV's airframe structure. Let the Sobel operators in the horizontal and vertical directions be $S_x$ and $S_y$, respectively. Then the edge response $E$ is calculated as:

$$E = \left| S_x * I_v \right| + \left| S_y * I_v \right|$$

[0050] where $E$ represents the edge response; $I_v$ represents the visible light image; $S_x$ represents the Sobel operator in the horizontal direction; $S_y$ represents the Sobel operator in the vertical direction; $*$ represents a two-dimensional convolution operation; and $|\cdot|$ represents taking the absolute value.

[0051] The technical solution of this application embodiment utilizes a first-order gradient operator to respond to the location where the grayscale changes drastically in a visible light image, which can effectively locate the outline boundary, arm structure line, and rotor blade edge of the target UAV.
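The edge response can be reproduced with plain NumPy as the sum of the absolute horizontal and vertical Sobel responses. The sketch below assumes a single-channel grayscale image and zero padding at the borders; the padding mode and the helper names are assumptions made for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_same(image, kernel):
    """2-D convolution with zero padding so the output matches the input size."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]          # true convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def edge_response(image):
    """Sum of absolute horizontal and vertical Sobel responses of a grayscale image."""
    return np.abs(conv2d_same(image, SOBEL_X)) + np.abs(conv2d_same(image, SOBEL_Y))
```

Zero padding produces spurious responses along the image border, so border pixels are best ignored or the image padded by reflection instead.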

[0052] A gated enhancement input is constructed based on the edge response $E$. The edge response is normalized into a weight mask using the Sigmoid function and multiplied pixel-by-pixel with the original image $I_v$ to achieve adaptive suppression of the background region. The enhanced image is expressed as:

$$\tilde{I}_v = I_v \odot \sigma(\alpha E + \beta)$$

[0053] where $\tilde{I}_v$ represents the enhanced image; $\odot$ represents pixel-by-pixel multiplication; $\sigma(\cdot)$ represents a non-linear activation function, which in this embodiment can be one of ReLU, ELU, or the Sigmoid function; $\alpha$ represents the edge gain coefficient, which controls the intensity of edge enhancement; and $\beta$ represents the edge bias parameter, which controls the activation threshold. The selection criteria for the non-linear activation function are as follows: ReLU is selected by default because it is simple to compute, activates sparsely, mitigates vanishing gradients, and is suitable for most deep networks. ELU is selected when negative output values are required, such as when negative suppression of node features is allowed, or when the network has many layers. Sigmoid is used only when node features need to be mapped to the interval (0, 1), for example to simulate probabilities, gating mechanisms, or attention weights; it is generally not used in deep intermediate layers.

[0054] $\alpha$ and $\beta$ are learnable parameters acquired through training, enabling the cross-scene decoupled hypergraph fingerprint network in subsequent steps to adaptively adjust the edge enhancement amplitude according to different imaging conditions. In the technical solution of this application embodiment, the value range of $\alpha$ is [0.5, 2.0]. The lower limit of 0.5 ensures that the edge enhancement effect is not too weak to be useful. The upper limit of 2.0 prevents the edge response from being over-amplified, which could lead to texture distortion or noise amplification, while maintaining numerical stability. The value range of $\beta$ is [-1, 1]. Negative values can suppress weak edge responses to filter out background noise, while positive values can improve edge sensitivity in low-contrast areas. The symmetrical interval ensures that the network can adaptively adjust the activation sensitivity and suppression level according to the training data.
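The gated enhancement can be sketched as a sigmoid mask applied to the image. In the sketch below, `alpha` and `beta` stand for the edge gain coefficient and the edge bias parameter. Clipping them to the stated ranges [0.5, 2.0] and [-1, 1] is an assumed way of enforcing those ranges, and in the described method both would be learned during training rather than passed in by hand.

```python
import numpy as np

def sigmoid(x):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_enhance(image, edge, alpha=1.0, beta=0.0):
    """Multiply the image by a sigmoid mask of the affine-transformed edge response."""
    alpha = float(np.clip(alpha, 0.5, 2.0))   # edge gain range stated in the text
    beta = float(np.clip(beta, -1.0, 1.0))    # edge bias range stated in the text
    mask = sigmoid(alpha * edge + beta)
    return image * mask
```

With a zero edge response and zero bias the mask equals 0.5 everywhere, so flat background is attenuated while strong edges pass almost unchanged.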

[0055] After the enhanced image $\tilde{I}_v$ is obtained, it is fed into a multi-level pyramid backbone. The multi-level pyramid backbone constructs an image pyramid through progressive downsampling, extracting features at different scales. Local texture details of the rotor blades are captured at higher resolution levels, while the overall fuselage outline, arm extension ratios, and relative rotor layout are extracted at lower resolution levels. For the directional strip-like edges of the arms and rotors, strip texture enhancement blocks are introduced, and asymmetric convolution kernels are used to extract the horizontal and vertical responses respectively, which are then linearly superimposed. The specific expression is:

$$F_s = W_{1 \times k} * X + W_{k \times 1} * X$$

[0056] where $F_s$ represents the strip texture feature; $X$ represents the input feature map; $W_{1 \times k}$ represents a convolution kernel of size 1×k; $W_{k \times 1}$ represents a convolution kernel of size k×1; and $k$ represents the length of the strip kernel.

[0057] By using asymmetric strip kernels, the technical solution of this application embodiment reduces the number of parameters while retaining the ability to sense both the longitudinal and transverse edge features of the arms and rotors.

[0058] Subsequently, the stripe texture features are fused with the local convolutional features to obtain the comprehensive structural texture features, expressed as:

[0059] where the result denotes the comprehensive structural texture features; the stripe texture fusion coefficient balances the weights of the local features and the stripe features, is learned adaptively through training, and takes values within the range specified in the technical solution of this application embodiment. Consistent with the physical meaning of weight allocation in an attention mechanism, the coefficient represents the contribution ratio of the stripe features and has clear interpretability and numerical stability. The remaining term denotes the local convolutional features.

[0060] Finally, key component areas such as the fuselage, arm tips, and rotor tips are weighted and focused using window-shaped attention blocks. This enhances the features of local component areas so that the cross-scene decoupled hypergraph fingerprint network pays more attention to the most discriminative local regions, ultimately yielding the structural texture feature vector.

[0061] The technical solution of this application embodiment performs edge contour enhancement on visible light images, extracts multi-scale structural information using the multi-level pyramid backbone, extracts directional texture features using stripe texture enhancement blocks, and enhances and aggregates the features of local component regions using window-shaped attention blocks. It can therefore simultaneously capture multi-scale structural information of the UAV, from local texture details to the overall contour shape, significantly improving the discriminative ability of the structural texture features.

[0062] This application provides a preferred technical solution, wherein the extraction process of the dynamic differential feature includes the following steps: performing differential operation on consecutive frames of visible light images to obtain dynamic response information; performing local spectral projection on the rotor candidate region; extracting the rotor speed main frequency and harmonic ratio features; and using temporal convolution to encode the rotor speed main frequency and harmonic ratio features to obtain a dynamic differential feature vector.

[0063] A differential operation is performed on consecutive visible light frames to extract the dynamic differential signals generated by rotor rotation and micro-vibrations of the airframe. The expression is as follows:

$$D_t(u,v) = I_t(u,v) - I_{t-1}(u,v)$$

[0064] where $D_t(u,v)$ denotes the inter-frame difference result, i.e., the dynamic response information; $I_t(u,v)$ denotes the grayscale value of the visible light image in frame t at pixel coordinates (u,v); and $I_{t-1}(u,v)$ denotes the grayscale value of the visible light image in frame t-1 at pixel coordinates (u,v).

[0065] The technical solution of this application embodiment eliminates the static background and static components of the airframe structure through differential operation, making the local brightness changes generated by rotor motion stand out, and providing a clean dynamic input for subsequent frequency domain analysis.

[0066] After the inter-frame difference results are obtained, a candidate region containing the rotor is selected, and local spectral analysis is performed on the temporal difference signal of each pixel within the candidate region. The difference signal of each pixel along the time axis is projected to the frequency domain through the discrete Fourier transform, and the expression is:

$$A(u,v,\omega) = \left| \sum_{t} D_t(u,v)\, e^{-j\omega t} \right|$$

[0067] where $A(u,v,\omega)$ denotes the amplitude spectrum value of the time-series differential signal at pixel coordinates (u,v) at frequency $\omega$; $D_t(u,v)$ denotes the inter-frame difference result at pixel coordinates (u,v); and $\omega$ denotes the angular frequency variable.

[0068] Based on the amplitude spectrum, the two most discriminative spectral features are extracted. In this embodiment, the rotor main frequency and the harmonic ratio are extracted, and the expressions are as follows:

[0069] $$f_0 = \arg\max_{\omega} A(u,v,\omega), \qquad h = \frac{A(u,v,f_0)}{A(u,v,2f_0) + \varepsilon}$$

[0070] where $A(u,v,\omega)$ denotes the amplitude spectrum of the time-series differential signal; $f_0$ denotes the rotor main frequency, i.e., the frequency corresponding to the maximum of the amplitude spectrum, which is determined by and reflects the actual rotational speed of the rotor and, in the technical solution of this application embodiment, lies in the range of 50 to 500 Hz; $h$ denotes the harmonic ratio, i.e., the ratio of the fundamental-frequency energy to the second-harmonic energy, which is determined by individual parameters such as the number of rotor blades, the chord length distribution, and the blade installation angle; $A(u,v,2f_0)$ denotes the amplitude value at the second harmonic; and $\varepsilon$ denotes an extremely small positive number.
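As a non-limiting illustration of the steps above (inter-frame difference, projection to the frequency domain, and extraction of the main frequency and harmonic ratio), the following minimal sketch processes the intensity sequence of a single pixel. The function names, the synthetic 100 Hz signal, and the 2000 frames-per-second sampling rate are assumptions chosen for the example, not values from this application.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Magnitude of the discrete Fourier transform for bins 0 .. n//2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * m * t / n) for t, x in enumerate(signal)))
        for m in range(n // 2 + 1)
    ]


def rotor_spectral_features(intensity, fps, eps=1e-8):
    """Return (main frequency in Hz, harmonic ratio) for one pixel's time series."""
    # Inter-frame difference removes the static background component.
    diff = [b - a for a, b in zip(intensity, intensity[1:])]
    amp = amplitude_spectrum(diff)
    # Main frequency: bin with the largest amplitude, skipping the DC bin.
    k0 = max(range(1, len(amp)), key=lambda m: amp[m])
    main_freq = k0 * fps / len(diff)
    # Amplitude at the second harmonic (treated as 0 if beyond the Nyquist bin).
    second = amp[2 * k0] if 2 * k0 < len(amp) else 0.0
    return main_freq, amp[k0] / (second + eps)


fps = 2000.0  # hypothetical frame rate
samples = [
    math.sin(2 * math.pi * 100 * t / fps) + 0.25 * math.sin(2 * math.pi * 200 * t / fps)
    for t in range(401)
]
main_freq, harmonic_ratio = rotor_spectral_features(samples, fps)
```

Note that the difference operation itself weights higher frequencies more strongly, so the harmonic ratio measured on the difference signal differs from the ratio in the raw intensity signal.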

[0071] Finally, temporal convolution is used to encode the extracted spectral feature sequence, outputting a dynamic differential feature vector.

[0072] The technical solution of this application embodiment obtains dynamic response information by performing differential operations on continuous frame visible light images, performs local spectral projection on the rotor candidate region, extracts the main frequency and harmonic ratio features of rotor speed, and encodes these spectral features using temporal convolution. This can effectively capture dynamic differential signals generated by rotor rotation and micro-vibration of the airframe. These signals are jointly determined by the state of the UAV's power unit, rotor structural parameters, and flight control characteristics, and have significant model differences and individual distinguishability.

[0073] This application provides a preferred technical solution for constructing an infrared energy criterion feature vector, which includes the following steps: normalizing the infrared response corresponding to the target area; performing weighted integration on the normalized infrared response to obtain equivalent thermal absorption response features; extracting thermal accumulation features, thermal attenuation features, and equivalent thermal inertia features according to the infrared time-series variation relationship; extracting thermal diffusion distribution features and thermal memory features according to the thermal spatial distribution relationship within the target area; and combining various thermal response features to obtain an infrared energy criterion feature vector.

[0074] Let the infrared image acquired at time t be $I^{\mathrm{IR}}_t$, the target area be $\Omega_T$, the background area be $\Omega_B$, the background mean be $\mu_B$, and the background standard deviation be $\sigma_B$.

[0075] The background mean and background standard deviation are calculated. Based on these two values, the infrared response of each pixel is normalized to eliminate the influence of ambient temperature fluctuations and detector gain differences on the original radiance value. The specific expression is as follows:

$$R_t(u,v) = \frac{I^{\mathrm{IR}}_t(u,v) - \mu_B}{\sigma_B}$$

[0076] where $R_t(u,v)$ denotes the radiation contrast of the target pixel relative to the background, i.e., the normalized infrared response, which eliminates the effects of ambient temperature fluctuations and detector gain differences; $I^{\mathrm{IR}}_t(u,v)$ denotes the infrared image value at pixel coordinates (u,v) at time t; $\mu_B$ denotes the background mean; and $\sigma_B$ denotes the background standard deviation.
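As a non-limiting illustration of the background normalization above, the following minimal sketch computes the background mean and standard deviation from the pixels outside a target mask and normalizes every pixel. The function name, the boolean mask representation, the use of the population standard deviation, and the small floor on the standard deviation are assumptions made for the example.

```python
import statistics


def normalize_infrared(frame, target_mask, eps=1e-8):
    """Normalize an infrared frame by the mean and std of its background pixels."""
    background = [
        value
        for row, mask_row in zip(frame, target_mask)
        for value, is_target in zip(row, mask_row)
        if not is_target
    ]
    mu = statistics.fmean(background)
    sigma = max(statistics.pstdev(background), eps)  # avoid division by zero
    return [[(value - mu) / sigma for value in row] for row in frame]


frame = [
    [10.0, 12.0, 10.0],
    [12.0, 30.0, 12.0],
    [10.0, 12.0, 10.0],
]
mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
normalized = normalize_infrared(frame, mask)

# A global gain and offset change (for example, a different detector gain)
# leaves the normalized response unchanged.
rescaled = [[2.0 * v + 5.0 for v in row] for row in frame]
normalized_rescaled = normalize_infrared(rescaled, mask)
```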

[0077] The normalized infrared response is weighted and integrated over the target area to obtain the equivalent thermal absorption response feature, and the specific expression is as follows:

$$E(t) = \sum_{(u,v)\in\Omega_T} w(u,v)\,\phi\big(R_t(u,v)\big)$$

[0078] where $E(t)$ denotes the equivalent thermal absorption response feature, which physically represents the equivalent heat absorption power density per unit area of the target. It is determined by the heat capacity of the target material, the absorption rate of the surface coating, and the heat dissipation state of the motor, and therefore distinguishes both models and individuals. $R_t(u,v)$ denotes the normalized infrared response; $\Omega_T$ denotes the target area; $w(u,v)$ denotes a spatial weighting function based on the distance to the target center; and $\phi(\cdot)$ denotes a monotonic nonlinear activation function used to compress the extreme-value effects of high-brightness hot spots.

[0079] After the time-continuous equivalent thermal absorption response feature $E(t)$ is obtained, features reflecting the dynamic process of the thermal response are extracted, beginning with the thermal accumulation feature. The thermal accumulation feature describes the rate of heat accumulation on the target within a short time window, reflecting differences in the operating state of the UAV's power unit. The specific expression is as follows:

$$G(t) = \frac{E(t) - E(t-\Delta t)}{\Delta t}$$

[0080] where $G(t)$ denotes the thermal accumulation feature, i.e., the first-order time derivative of the equivalent energy density; $E(t-\Delta t)$ denotes the equivalent thermal absorption response feature at the preceding time point; and $\Delta t$ denotes the time interval.

[0081] The technical solution of this application calculates the first-order time derivative of the equivalent energy density. Different UAV models have different motor power densities and heat dissipation structure designs, so their thermal accumulation values differ under the same flight conditions. Even for the same model, different individuals exhibit small but measurable differences due to variations in motor aging and heat dissipation area, which is an important basis for individual identification.

[0082] The heat dissipation process after the target shuts down or decelerates, i.e., the thermal attenuation feature, is parametrically modeled using an exponential decay model, with the following expression:

[0083] where the left-hand side denotes the fitted value of the exponential decay model, and the model parameters are the peak energy density, the thermal attenuation coefficient $\lambda$, the environmental baseline, and the peak heat time. The thermal attenuation coefficient is determined by the thermal conductivity and specific heat capacity of the target material and by its heat transfer coefficient with air. UAVs with different material structures show significantly different values of $\lambda$ after the same flight mission, and $\lambda$ can be estimated from the time-series curve by least-squares fitting. In the technical solution of this application embodiment, $\lambda$ ranges from 0.01 to 0.1. Physically, $\lambda = hA/(mc)$, where $h$ denotes the heat transfer coefficient, $A$ denotes the heat dissipation surface area, $m$ denotes the mass, and $c$ denotes the specific heat capacity. For a typical small UAV weighing 0.5 to 5 kg and made primarily of plastic, carbon fiber, or aluminum alloy, the theoretical value of $\lambda$ calculated under natural convection conditions falls within the range of 0.01 to 0.1.
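As a non-limiting illustration of estimating the thermal attenuation coefficient by least-squares fitting, the following minimal sketch assumes the decay curve has the form baseline + (peak - baseline)·exp(-λ·(t - t_peak)) with a known environmental baseline. This model form, the function name, and the synthetic data are assumptions made for the example; the expression used in the embodiment may differ.

```python
import math


def fit_decay_coefficient(times, values, baseline, t_peak):
    """Least-squares slope of ln(E - baseline) against (t - t_peak), negated."""
    xs, ys = [], []
    for t, e in zip(times, values):
        if t >= t_peak and e > baseline:  # only the decay segment is usable
            xs.append(t - t_peak)
            ys.append(math.log(e - baseline))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic noise-free decay curve with a known coefficient of 0.05.
times = list(range(60))
values = [1.0 + 4.0 * math.exp(-0.05 * t) for t in times]
estimated = fit_decay_coefficient(times, values, baseline=1.0, t_peak=0)
```

Taking the logarithm turns the exponential fit into a straight-line fit, which has a closed-form solution; with noisy measurements, samples close to the baseline would typically need to be down-weighted or excluded.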

[0084] The equivalent thermal inertia feature describes the thermal properties of the target from the perspective of system response delay, and its specific expression is:

[0085] where the left-hand side denotes the equivalent thermal inertia feature, which describes the memory depth of the target thermal system: the larger the heat capacity of the material and the longer the heat dissipation path, the greater the thermal inertia and the larger the value. It is uniquely determined by the thermophysical properties of the UAV fuselage material and is particularly sensitive to differences between batches of the same model. $\tau_d$ denotes the thermal decay time constant, i.e., the time required for the thermal response to decay to 37%, which is 10 to 300 seconds in the technical solution of this application embodiment; $\tau_r$ denotes the thermal response rise time constant, i.e., the time required for the thermal response to rise from 0 to 63%, which is 1 to 30 seconds in the technical solution of this application embodiment.

[0086] In this embodiment, the spatial distribution entropy of the equivalent energy density within the target area describes the degree of hot spot dispersion. That is, the thermal diffusion distribution feature uses information entropy to quantify the uniformity of the heat distribution within the target area. The specific expression is as follows:

$$H = -\sum_{i} p_i \log p_i$$

[0087] where $H$ denotes the thermal diffusion distribution feature. A high value indicates uniform heat distribution, such as uniform heat dissipation across the rotors of a multi-rotor UAV; a low value indicates concentrated hot spots, such as localized overheating of a motor. The thermal diffusion distribution feature directly reflects differences in the thermal layout structure of the target. $p_i$ denotes the normalized energy proportion of the i-th spatial block within the target region, obtained by dividing the equivalent energy density of each sub-block by the total target energy.
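As a non-limiting illustration of the entropy-based thermal diffusion distribution feature, the following minimal sketch normalizes per-block energies into proportions and computes their information entropy. The function name, the natural logarithm, and the assumption that block energies are non-negative are choices made for the example.

```python
import math


def heat_diffusion_entropy(block_energy):
    """Information entropy of the normalized per-block energy distribution."""
    total = sum(block_energy)
    # Blocks with zero energy contribute nothing (p * log p tends to 0).
    proportions = [e / total for e in block_energy if e > 0]
    return -sum(p * math.log(p) for p in proportions)


uniform = heat_diffusion_entropy([1.0, 1.0, 1.0, 1.0])       # evenly spread heat
concentrated = heat_diffusion_entropy([4.0, 0.0, 0.0, 0.0])  # single hot spot
```

For N blocks the value ranges from 0 (all energy in one block) to log N (perfectly uniform), so the feature can be compared across targets only when the same block partition is used.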

[0088] By integrating, over a historical period, the absolute value of the difference between the target's equivalent heat absorption power density and a reference energy density, the historical residual intensity of the target's thermal response is quantified, yielding the thermal memory feature. This feature provides a cumulative description of the target's thermal behavior over a longer historical period, specifically expressed as:

$$M = \int_{t_1}^{t_2} \left| E(t) - E_{\mathrm{ref}} \right| \, dt$$

[0089] where $M$ denotes the thermal memory feature, i.e., the historical thermal response integral, which reflects long-term thermal behavior. Because different UAVs differ in service life, degree of battery aging, and maintenance history, the thermal memory feature exhibits stable individual differences and is highly reproducible across multiple observations. $E(t)$ denotes the equivalent heat absorption power density; $E_{\mathrm{ref}}$ denotes the reference energy density value; and $t_1$ and $t_2$ denote the start and end times of the integration interval, respectively.
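As a non-limiting illustration of the thermal memory feature, the following minimal sketch approximates the integral of the absolute deviation from the reference energy density with the trapezoidal rule on sampled data. The function name and the sample values are assumptions made for the example.

```python
def thermal_memory(times, energy, reference):
    """Trapezoidal approximation of the integral of |E(t) - reference|."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        left = abs(energy[i - 1] - reference)
        right = abs(energy[i] - reference)
        total += 0.5 * (left + right) * dt
    return total


memory = thermal_memory(times=[0.0, 1.0, 2.0], energy=[1.0, 3.0, 1.0], reference=1.0)
```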

[0090] The seven types of features above, namely the normalized infrared response feature, the equivalent thermal absorption response feature, the thermal accumulation feature, the thermal attenuation coefficient, the equivalent thermal inertia feature, the thermal diffusion distribution feature, and the thermal memory feature, are concatenated to form the infrared energy criterion feature vector.

[0091] The technical solution of this application eliminates the influence of ambient temperature fluctuations by normalizing the infrared response corresponding to the target area, obtains equivalent thermal absorption response characteristics through weighted integral processing, extracts thermal accumulation characteristics, thermal attenuation characteristics, and equivalent thermal inertia characteristics according to the infrared time-series change relationship, extracts thermal diffusion distribution characteristics and thermal memory characteristics according to the spatial distribution relationship of heat within the target area, and combines various thermal response characteristics to form an infrared energy criterion feature vector. It extracts the thermal response law of the target individual from the perspective of physical modeling rather than pure data-driven perspective, and has clear interpretability and stable individual distinguishability.

[0092] S4. Input the structural texture features, the dynamic differential features, and the infrared energy criterion feature vector into the cross-scene decoupled hypergraph fingerprint network, perform fusion encoding, and obtain a low-dimensional unique image fingerprint vector.

[0093] This application provides a preferred technical solution for a cross-scene decoupled hypergraph fingerprint network, comprising: a visible light morphological micro-texture co-coding backbone for extracting structural texture features; an infrared energy criterion field coding branch for extracting infrared energy criterion feature vectors; a rotor periodic micro-motion spectrum coding branch for extracting dynamic differential features; a cross-modal component hypergraph fusion module for performing multimodal relationship aggregation; a scene identity orthogonal decoupling projection head for generating low-dimensional unique image fingerprint vectors; and a hierarchical prototype open set discriminator head for performing open set matching decisions.

[0094] The specific expression for the entire fingerprint extraction process of the cross-scenario decoupled hypergraph fingerprint network is as follows:

[0095] where the output denotes the low-dimensional unique image fingerprint vector, which has 64 to 512 dimensions in the technical solution of this application embodiment. According to the ablation experiment, accuracy drops significantly below 64 dimensions, while above 512 dimensions accuracy improves by less than 2% but the amount of computation doubles, so 64 to 512 dimensions is the best balance between accuracy and efficiency. The remaining terms denote, in order: the hierarchical prototype open set discriminator head; the scene identity orthogonal decoupled projection head; the cross-modal component hypergraph fusion module; the visible light morphological micro-texture co-coding backbone; the infrared energy criterion field coding branch; the rotor periodic micro-motion spectrum coding branch; the short-time dynamic image sequence; and the infrared image sequence.

[0096] The cross-modal component hypergraph fusion module processes the following steps: constructing the UAV fuselage, arms, rotor, and hotspot regions as component nodes; constructing geometrically coupled hyperedges, thermally coupled hyperedges, and symmetric micro-motion hyperedges using the component nodes; performing relational aggregation on structural texture features, dynamic differential features, and infrared energy criterion feature vectors using hypergraph convolution; and performing attention weighting on the aggregated features to obtain joint representation features.

[0097] Taking the UAV fuselage, arms, rotor, and hotspot region as component nodes, the joint feature of each component node i is composed of three sub-features, written $x_i = (s_i, e_i, d_i)$, where $s_i$ denotes the structural texture sub-feature corresponding to the i-th component node; $e_i$ denotes the infrared energy sub-feature corresponding to the i-th component node; and $d_i$ denotes the dynamic differential sub-feature corresponding to the i-th component node.

[0098] The structural texture sub-feature is extracted from the structural texture feature vector by slicing according to the spatial position of component node i. The infrared energy sub-feature is extracted from the infrared energy criterion feature vector by slicing according to the spatial position of component node i. The dynamic differential sub-feature is extracted from the dynamic differential feature vector by slicing according to the spatial position of component node i.

[0099] A set of hyperedges is constructed on the component node set. This set includes three types: geometrically coupled hyperedges, thermally coupled hyperedges, and symmetric micro-motion hyperedges. Geometrically coupled hyperedges are used to capture the spatial symmetry relationship and overall configuration constraints between the arm and fuselage; thermally coupled hyperedges are used to capture the spatial co-occurrence relationship and energy coupling relationship between hot spots and rotors; and symmetric micro-motion hyperedges are used to capture the consistency constraints and synchronization relationships of the rotation periods among multi-rotors. A hypergraph association matrix is ​​then established based on the component node set and the hyperedge set.

[0100] Hypergraph convolution is used to aggregate the relationships among the structural texture features, the dynamic differential features, and the infrared energy criterion feature vectors. Hypergraph convolution propagates and aggregates node features according to the following update rule:

$$X^{(l+1)} = \sigma\!\left( D_v^{-1/2}\, H\, W\, D_e^{-1}\, H^{\top}\, D_v^{-1/2}\, X^{(l)}\, \Theta^{(l)} \right)$$

[0101] where $X^{(l+1)}$ denotes the node feature matrix of layer l+1; $\sigma(\cdot)$ denotes a nonlinear activation function, which in this embodiment can be one of ReLU, ELU, or Sigmoid and is used to introduce a nonlinear transformation; $D_v^{-1/2}$ denotes the negative one-half power of the node degree matrix; $H$ denotes the hypergraph incidence matrix; $W$ denotes the diagonal matrix of hyperedge weights; $D_e^{-1}$ denotes the inverse of the hyperedge degree matrix; $H^{\top}$ denotes the transpose of the hypergraph incidence matrix; $X^{(l)}$ denotes the node feature matrix of layer l; and $\Theta^{(l)}$ denotes the learnable parameter matrix of layer l.

[0102] The advantages of the ReLU function are: simple computation, sparse activation, avoidance of gradient vanishing, and suitability for most deep networks. The ELU function is used when negative output values ​​are required, such as when negative suppression of node features is allowed, or when the network has many layers. The Sigmoid function is used only when it is necessary to map node features to the (0,1) interval, such as for simulating probabilities, gating mechanisms, or attention weights, and is generally not used in deep intermediate layers. The choice can be made based on the actual situation; this application does not impose any restrictions on it.

[0103] Through the normalization by the node degree matrix and the hyperedge degree matrix, the technical solution of this application embodiment ensures the numerical stability of hypergraph convolution when the node degree distribution is uneven, so that degree differences do not bias the weights of high-order component relationships during aggregation.
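As a non-limiting illustration of one degree-normalized hypergraph convolution layer, the following minimal sketch uses plain nested lists. The five component nodes, the three hyperedges, the unit hyperedge weights, the ReLU activation, and all function names are assumptions made for the example; every node must belong to at least one hyperedge, otherwise its degree is zero and the normalization is undefined.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def hypergraph_conv(features, incidence, edge_weights, theta):
    """One layer: ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)."""
    n, m = len(incidence), len(incidence[0])
    node_deg = [sum(edge_weights[e] * incidence[v][e] for e in range(m)) for v in range(n)]
    edge_deg = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    dv = diag([d ** -0.5 for d in node_deg])
    de_inv = diag([1.0 / d for d in edge_deg])
    left = matmul(matmul(matmul(dv, incidence), diag(edge_weights)), de_inv)
    propagation = matmul(left, matmul(transpose(incidence), dv))
    out = matmul(matmul(propagation, features), theta)
    return [[max(0.0, v) for v in row] for row in out]


# Nodes: fuselage, arm, rotor 1, rotor 2, hotspot.
# Hyperedges (columns): geometric, thermal, symmetric micro-motion.
H = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
]
w = [1.0, 1.0, 1.0]
X = [[1.0], [1.0], [2.0 ** 0.5], [2.0 ** 0.5], [1.0]]  # square roots of node degrees
out = hypergraph_conv(X, H, w, theta=[[1.0]])
```

The input above is chosen as the square root of each node's degree, which the normalized propagation matrix leaves unchanged; this illustrates that the normalization keeps feature magnitudes bounded regardless of how many hyperedges a node belongs to.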

[0104] The aggregated features are then subjected to attention weighting to obtain the joint representation features. The attention weight of each component node is calculated using a component attention mechanism, based on the final feature of the i-th component node after L layers of hypergraph convolution, and the attention weights of all component nodes sum to 1. The final features of the component nodes are then weighted and summed according to their attention weights to obtain the joint representation features, specifically expressed as:

$$Z = \sum_{i} a_i\, x_i^{(L)}$$

[0105] where $Z$ denotes the joint representation features; $a_i$ denotes the attention weight of the i-th component node; and $x_i^{(L)}$ denotes the final feature of the i-th component node after L layers of hypergraph convolution.
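As a non-limiting illustration of the component attention weighting, the following minimal sketch scores each node feature, normalizes the scores with Softmax so that the weights sum to 1, and returns the weighted sum. The scoring rule (a dot product with a hypothetical learnable query vector) is an assumption made for the example, since the scoring function is not specified here.

```python
import math


def attention_pool(node_features, query):
    """Softmax-weighted sum of node features; returns (pooled vector, weights)."""
    scores = [sum(q * x for q, x in zip(query, feat)) for feat in node_features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, node_features)) for d in range(dim)]
    return pooled, weights


nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled_uniform, weights_uniform = attention_pool(nodes, query=[0.0, 0.0])
pooled_focused, weights_focused = attention_pool(nodes, query=[1.0, 1.0])
```

With a zero query every node receives the same weight and the result is the plain mean; a non-zero query shifts the weight toward the nodes with the highest scores.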

[0106] The technical solution of this application embodiment designs the cross-scene decoupled hypergraph fingerprint network as a cascaded structure including a visible light morphology micro-texture collaborative coding backbone, an infrared energy criterion field coding branch, a rotor periodic micro-motion spectrum coding branch, a cross-modal component hypergraph fusion module, a scene identity orthogonal decoupled projection head, and a hierarchical prototype open set discriminator head. Each module has a clear division of labor and functional decoupling, which can perform hierarchical processing and efficient fusion of multimodal features, thereby improving the interpretability and training efficiency of the network.

[0107] By constructing the UAV fuselage, arms, rotor, and hot spot region as component nodes, and using these component nodes to construct geometrically coupled hyperedges, thermally coupled hyperedges, and symmetric micro-motion hyperedges, and using hypergraph convolution to aggregate the relationships of the three types of features, and then performing attention weighting on the aggregated features to obtain joint representation features, it is able to capture high-order interaction relationships between three or more modalities, overcoming the limitation of traditional multimodal fusion methods that can only describe pairwise interaction relationships.

[0108] This application provides a preferred technical solution. The processing of the scene identity orthogonal decoupled projection head is performed according to the following steps: mapping the joint representation features into an identity fingerprint vector and a scene perturbation vector; normalizing the identity fingerprint vector to obtain a low-dimensional unique image fingerprint vector.

[0109] The joint representation feature $Z$ is mapped to an identity fingerprint vector and a scene perturbation vector through two independent linear projection heads. The specific expression is:

[0110] $$f_{\mathrm{id}} = W_{\mathrm{id}}\, Z, \qquad f_{\mathrm{sc}} = W_{\mathrm{sc}}\, Z$$

[0111] where $f_{\mathrm{id}}$ denotes the identity fingerprint vector, i.e., the unnormalized identity feature, which is orthogonal to the scene perturbation vector; $W_{\mathrm{id}}$ denotes the identity mapping matrix, a linear transformation matrix that maps the joint representation features to the identity space; $f_{\mathrm{sc}}$ denotes the scene perturbation vector, a scene-related feature that is orthogonally separated from the identity fingerprint vector; and $W_{\mathrm{sc}}$ denotes the scene mapping matrix, a linear transformation matrix that maps the joint representation features to the scene space.

[0112] The final low-dimensional unique image fingerprint vector used for identification is the normalized identity fingerprint vector, with the following expression:

[0113] S5. Train the cross-scene decoupled hypergraph fingerprint network using cross-scene consistency constraints and scene identity orthogonality constraints to establish a UAV image fingerprint database.

[0114] The cross-scene consistency constraint is used to reduce the feature distance between the identity fingerprint vectors of the same target under different scene conditions. During training, for samples of the same target UAV collected under multiple different scenes, the identity fingerprint vector corresponding to each sample is extracted. Let $a$ and $b$ denote two different scenes in which the same UAV is sampled. The cross-scene consistency constraint loss is defined as the sum of the squared Euclidean distances between the identity fingerprint vectors of all scene sample pairs, and the specific expression is:

$$L_{\mathrm{cons}} = \sum_{(a,b)} \left\| f_{\mathrm{id}}^{(a)} - f_{\mathrm{id}}^{(b)} \right\|_2^2$$

[0115] where $L_{\mathrm{cons}}$ denotes the cross-scene consistency constraint loss; $f_{\mathrm{id}}^{(a)}$ denotes the identity fingerprint vector extracted in scene $a$; and $f_{\mathrm{id}}^{(b)}$ denotes the identity fingerprint vector extracted in scene $b$.

[0116] The technical solution of this application embodiment minimizes the cross-scene consistency constraint loss, forces the identity fingerprint vectors extracted by the same target UAV in different scenes to be close to each other, and at the same time separates the information related to scene changes into the scene disturbance vector.

[0117] During training, a scene identity orthogonality constraint is introduced. This constraint requires the identity fingerprint vector and the scene perturbation vector to remain geometrically orthogonal, meaning that the inner product between them approaches zero. The orthogonal constraint loss is defined as the square of the Frobenius norm of the product of the transposed identity fingerprint vector and the scene perturbation vector, and the specific expression is:

$$L_{\mathrm{orth}} = \left\| f_{\mathrm{id}}^{\top} f_{\mathrm{sc}} \right\|_F^2$$

[0118] where $L_{\mathrm{orth}}$ denotes the orthogonal constraint loss; $f_{\mathrm{id}}^{\top}$ denotes the transpose of the identity fingerprint vector; $f_{\mathrm{sc}}$ denotes the scene perturbation vector; and $\|\cdot\|_F^2$ denotes the square of the Frobenius norm.

[0119] The technical solution of this application embodiment minimizes the orthogonal constraint loss. By forcing the identity fingerprint vector to be orthogonal to the scene perturbation vector, the identity information and scene information are completely decoupled, providing a theoretical guarantee for cross-scene consistency from a mathematical perspective.
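As a non-limiting illustration of the two training constraints described above, the following minimal sketch computes the cross-scene consistency loss as the sum of squared Euclidean distances over all scene pairs of one UAV, and the orthogonal constraint loss as the squared inner product of a single identity vector and a single scene vector (for two vectors, the squared Frobenius norm of their product reduces to this scalar). Function names and sample values are assumptions made for the example.

```python
from itertools import combinations


def consistency_loss(identity_vectors):
    """Sum of squared Euclidean distances over all pairs of scenes."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(u, v))
        for u, v in combinations(identity_vectors, 2)
    )


def orthogonality_loss(identity_vector, scene_vector):
    """Squared inner product between the identity and scene vectors."""
    inner = sum(a * b for a, b in zip(identity_vector, scene_vector))
    return inner ** 2


# Identity vectors of the same UAV extracted in three hypothetical scenes.
same_uav = [[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]]
l_cons = consistency_loss(same_uav)
l_orth_zero = orthogonality_loss([1.0, 0.0], [0.0, 1.0])
l_orth_nonzero = orthogonality_loss([1.0, 2.0], [3.0, 4.0])
```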

[0120] The technical solution of this application embodiment maps the joint representation features to an identity fingerprint vector and a scene perturbation vector, and normalizes the identity fingerprint vector to obtain a low-dimensional unique image fingerprint vector. This makes the identity fingerprint vector carry features related to the target identity, and the scene perturbation vector carry features related to the imaging scene. The two are separated from each other, which effectively improves the stability of the fingerprint vector of the same target under different scene conditions.

[0121] The total training loss is a weighted combination of the classification loss, the metric learning loss, the cross-scene consistency loss, and the orthogonal constraint loss, expressed as follows:

$$L_{\mathrm{total}} = L_{\mathrm{cls}} + \lambda_1 L_{\mathrm{met}} + \lambda_2 L_{\mathrm{cons}} + \lambda_3 L_{\mathrm{orth}}$$

[0122] where $L_{\mathrm{total}}$ denotes the total training loss; $L_{\mathrm{cls}}$ denotes the classification loss; and $\lambda_1$, $\lambda_2$, and $\lambda_3$ denote the weights of the metric learning loss $L_{\mathrm{met}}$, the cross-scene consistency loss $L_{\mathrm{cons}}$, and the orthogonal constraint loss $L_{\mathrm{orth}}$, respectively. In the technical solution of this application embodiment, all three weights are obtained through performance tuning on the validation set. $\lambda_1$ ranges from 0.3 to 0.7: metric learning is the core of separating different classes and aggregating fingerprints of the same class, so its weight needs to be dominant; below 0.3 the inter-class distinguishability is insufficient, and above 0.7 it suppresses the other constraints and reduces scene adaptability. $\lambda_2$ ranges from 0.2 to 0.5: this term ensures the stability of the fingerprint of the same target in different scenes, so its weight needs to be moderate; below 0.2 the consistency constraint fails, and above 0.5 it forces fingerprints to converge too strongly and weakens individual distinguishability. $\lambda_3$ ranges from 0.1 to 0.3: the orthogonal constraint is an auxiliary regularization term, so its weight should be small; below 0.1 the decoupling between identity and scene is incomplete, and above 0.3 it interferes with the convergence of the main task and limits the expressiveness of the fingerprint.

[0123] In the technical solution of this application embodiment, the classification loss ensures that the feature space is class-separable, so that fingerprint vectors from different models and individuals fall into different regions. The metric learning loss optimizes the structure of the feature space so that fingerprint vectors of similar samples are close together while fingerprint vectors of dissimilar samples are far apart. The cross-scene consistency loss and the orthogonal constraint loss act together to ensure the stability of fingerprint vectors across scenes and the thorough decoupling of identity from scene.
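As a non-limiting illustration of combining the four loss terms, the following minimal sketch adds the classification loss to the three weighted terms and checks each weight against the ranges stated above. The function name, the unit coefficient on the classification loss, and the default weights (picked from within the stated ranges) are assumptions made for the example; in the embodiment the weights are tuned on the validation set.

```python
WEIGHT_RANGES = {
    "metric": (0.3, 0.7),
    "consistency": (0.2, 0.5),
    "orthogonal": (0.1, 0.3),
}


def total_loss(l_cls, l_metric, l_cons, l_orth, w_metric=0.5, w_cons=0.3, w_orth=0.2):
    """Classification loss plus the three weighted auxiliary losses."""
    weights = {"metric": w_metric, "consistency": w_cons, "orthogonal": w_orth}
    for name, value in weights.items():
        low, high = WEIGHT_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} weight {value} is outside [{low}, {high}]")
    return l_cls + w_metric * l_metric + w_cons * l_cons + w_orth * l_orth


loss = total_loss(l_cls=1.0, l_metric=2.0, l_cons=3.0, l_orth=4.0)
```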

[0124] Extract low-dimensional unique image fingerprint vectors from the collected sample data. A hierarchical index structure is established according to model and individual number to form a UAV image fingerprint database.

[0125] In the technical solution of this application embodiment, the UAV image fingerprint database is divided into two layers: the first layer consists of model-level prototype centers, and the second layer consists of the individual-level prototype centers under each model. Each model-level prototype center is initialized with the mean of the fingerprint vectors of all samples of the same model, and the specific expression is:

$$c_k = \frac{1}{|S_k|} \sum_{f \in S_k} f$$

[0126] where $S_k$ denotes the set of all sample fingerprint vectors corresponding to model k, and $c_k$ denotes the model-level prototype center of model k.

[0127] The technical solution of this application embodiment ensures that the model-level prototype center is located at the centroid of the sample distribution by initializing it with the mean, providing a stable reference anchor point for subsequent metric matching. Individual-level prototype centers are initialized independently within each model subset in the same manner and refined online through momentum updates during training. The recognition model is trained using a combination of metric learning, contrastive learning, and classification learning, enabling the low-dimensional unique image fingerprint vectors extracted from the same target at different times and under different scene conditions to cluster together, while separating the low-dimensional unique image fingerprint vectors extracted from different targets. The UAV image fingerprint database is updated online as new samples are added.

[0128] S6. Input the low-dimensional unique image fingerprint vector corresponding to the target to be identified into the UAV image fingerprint database for matching.

[0129] This application provides a preferred technical solution in which matching the low-dimensional unique image fingerprint vector of the target to be identified against the UAV image fingerprint database includes the following steps: the fingerprint vector is matched with the model-level prototype centers to obtain the model matching result; then, based on the model matching result, the fingerprint vector is matched with the individual-level prototype centers in the corresponding model sub-library to obtain the individual matching result.

[0130] In the identification phase, the above steps are first performed on the image of the target to be identified to obtain its low-dimensional unique image fingerprint vector. This fingerprint vector is then matched against the UAV image fingerprint database: it is first matched with the model-level prototype centers to obtain the model matching result, and then, according to the model matching result, matched with the individual-level prototype centers in the corresponding model sub-library to obtain the individual matching result.

[0131] The Euclidean distances between the low-dimensional unique image fingerprint vector of the target to be identified and all model-level prototype centers are calculated, and the model corresponding to the minimum distance is selected. The specific expression is:

[0132] Here, the result of the expression is the model number corresponding to the minimum distance.

[0133] At the same time, the normalized confidence of the model corresponding to the minimum distance is calculated. The specific expression is:

[0134] Here, the result is the confidence of the minimum matching distance after Softmax normalization, which is also the maximum matching confidence. The expression uses the prototype center of the model given by the model matching result, the Euclidean distance between the target to be identified and its nearest prototype center, and a temperature coefficient that controls the sharpness of the confidence distribution: the smaller the temperature, the sharper the decision boundary. In the technical solution of this embodiment, the temperature coefficient takes a value from 0.1 to 0.5. At the lower limit of 0.1, the confidence distribution is extremely sharp, and high confidence is output only when the minimum matching distance is significantly smaller than the other distances; this suits security scenarios where the cost of misidentification is extremely high. Values below 0.1 lead to numerical instability, because the Softmax output approaches one-hot and loses gradient smoothness. At the upper limit of 0.5, the confidence distribution is relatively smooth and tolerates a certain degree of matching ambiguity, which suits scenarios with sparse sample distributions or high noise; values above 0.5 make the confidence differences too small to distinguish reliable matches from ambiguous ones. The optimal range for most scenarios is 0.2 to 0.3; ablation experiments have verified that within this range recognition accuracy and rejection rate achieve the best balance.
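One plausible form of the model selection and the Softmax-normalized confidence, consistent with the description above, is sketched below. The notation is an assumption introduced for illustration: $\mathbf{z}$ is the fingerprint vector of the target, $\mathbf{c}_k$ the prototype center of model $k$, $K$ the number of models, $d_k$ the Euclidean distance, $k^{*}$ the matched model, $\tau$ the temperature coefficient, and $p_{\max}$ the maximum matching confidence.

```latex
d_k = \lVert \mathbf{z} - \mathbf{c}_k \rVert_2, \qquad
k^{*} = \arg\min_{k \in \{1,\dots,K\}} d_k, \qquad
p_{\max} = \frac{\exp(-d_{k^{*}} / \tau)}{\sum_{j=1}^{K} \exp(-d_j / \tau)}
```

With this form, a smaller $\tau$ amplifies the gap between the nearest distance and the others, which is why the confidence distribution becomes sharper.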

[0135] The technical solution of this embodiment uses a two-parameter decision mechanism that combines the model matching result with the normalized confidence. A single-parameter decision that relies solely on the minimum matching distance reflects only the absolute feature distance between the target and the nearest prototype center in the fingerprint database, and cannot characterize the relative significance of this distance across different models. By using the Softmax function to place the minimum matching distance within the global distribution of matching distances over all models, the reliability of the matching result can be evaluated effectively. Both the model matching result and the normalized confidence are then used in the open set decision, so that when the system decides whether to accept a matching result it considers both absolute feature similarity and relative discriminative significance. This reduces the risk of false matches caused by locally dense regions of the feature space, and improves both the accuracy of rejecting targets outside the database and the credibility of identifying targets inside it.

[0136] In the sub-database corresponding to the matched model, the Euclidean distances between the low-dimensional unique image fingerprint vector of the target to be identified and each individual-level prototype center are further calculated, and the individual index corresponding to the minimum distance is selected as the individual matching result. The specific expression is as follows:

[0137] Here, the result is the individual matching result, that is, the finally identified individual index, and the expression ranges over the i-th individual-level prototype center of the matched model.

[0138] The technical solution of this application embodiment obtains the model matching result by matching the low-dimensional unique image fingerprint vector corresponding to the target to be identified with the model-level prototype center, and then matching the individual-level prototype center in the corresponding model sub-database according to the model matching result to obtain the individual matching result. The two-level matching strategy from coarse to fine not only reduces the search space from all individuals to a set of individuals under a single model to reduce computational complexity, but also effectively avoids cross-model misidentification, thereby improving matching efficiency and recognition accuracy.
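The coarse-to-fine matching can be sketched as follows. This is an illustrative sketch, not the applicant's implementation: the function names, the data layout (one array of model centers, one list of per-model individual centers), and the default temperature of 0.2 are assumptions.

```python
import numpy as np

def softmax_confidence(distances, tau=0.2):
    """Softmax over negative distances; returns the confidence of the nearest prototype."""
    logits = -np.asarray(distances, dtype=float) / tau
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs.max())

def match_two_level(z, model_centers, individual_centers, tau=0.2):
    """Match at model level first, then only among individuals of the matched model."""
    z = np.asarray(z, dtype=float)
    d_model = np.linalg.norm(np.asarray(model_centers, dtype=float) - z, axis=1)
    k = int(np.argmin(d_model))
    d_ind = np.linalg.norm(np.asarray(individual_centers[k], dtype=float) - z, axis=1)
    i = int(np.argmin(d_ind))
    return k, i, float(d_model[k]), softmax_confidence(d_model, tau)

# Hypothetical 2-dimensional database with two models and two individuals each.
model_centers = [[0.0, 0.0], [1.0, 0.0]]
individual_centers = [
    [[0.0, 0.1], [0.0, -0.1]],
    [[1.0, 0.1], [1.0, -0.1]],
]
k, i, d_min, p_max = match_two_level([0.9, 0.08], model_centers, individual_centers)
```

Only the individuals of the matched model are compared in the second stage, so the number of distance computations grows with the number of models plus the size of one sub-library rather than with the total number of individuals.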

[0139] S7. Use the open set unknown individual identification mechanism to make a judgment on the matching results and output the identification result or the unknown target result.

[0140] In this embodiment of the application, the open set unknown individual identification mechanism includes: using the open set energy function, minimum matching distance and maximum matching confidence to jointly determine the matching result, and output the identification result or the unknown target result.

[0141] An open set energy function is constructed to quantify the probability that a sample to be identified belongs to a known fingerprint database distribution. The formula for calculating the open set energy function is:

[0142] Here, the result of the expression is the open set energy function.

[0143] In this embodiment of the application, when the low-dimensional unique image fingerprint vector of the target to be identified lies in a clustered region near a known category prototype center, the sum of the exponential terms over the categories is large and the value of the open set energy function is small. When the fingerprint vector is far from all known prototype centers, all exponential terms approach zero and the value of the open set energy function is relatively large. This enables energy-based discrimination of targets outside the database.
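The behaviour described above is reproduced by a log-sum-exp energy over negative prototype distances. The exact expression is not reproduced in this text, so the form used below, E = -tau * log(sum_k exp(-d_k / tau)), is an assumption chosen to match the description; the function name and the distance values are hypothetical.

```python
import numpy as np

def open_set_energy(distances, tau=0.2):
    """Assumed form: E = -tau * log(sum_k exp(-d_k / tau)), computed stably."""
    s = -np.asarray(distances, dtype=float) / tau
    m = s.max()  # factor out the largest term to avoid underflow
    return float(-tau * (m + np.log(np.exp(s - m).sum())))

# Hypothetical distances to three known prototype centers.
near_energy = open_set_energy([0.05, 0.9, 1.1])  # target close to one prototype
far_energy = open_set_energy([2.0, 2.5, 3.0])    # target far from all prototypes
```

In this form the energy is low near a known prototype and high far from all of them, so a single upper threshold on the energy can flag targets outside the database.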

[0144] Based on the open set energy function, combined with the minimum matching distance and the maximum matching confidence calculated in step S6, a three-threshold joint decision mechanism is constructed. A minimum matching distance threshold, a maximum matching confidence threshold, and an open set energy threshold are set. The joint judgment criterion is: when the minimum matching distance, the maximum matching confidence, and the open set energy function all satisfy their respective threshold conditions, the identification result consisting of the matched model and individual is output; if any condition is not met, the target is determined to be an unknown target and an alarm output is triggered. In the technical solution of this embodiment, the threshold values are determined by optimization on the dataset, and the maximum matching confidence threshold ranges from 0.6 to 0.9. At the lower limit of 0.6, matching results with moderate confidence are accepted, which avoids rejecting a large number of in-database targets because of an excessively high threshold and ensures system availability; below 0.6, confidence discrimination fails and targets outside the database are easily misidentified as individuals in the database. At the upper limit of 0.9, only high-confidence matching results are accepted, which suits high-security scenarios with extremely low tolerance for false recognition; above 0.9, most correct matches are rejected because of confidence fluctuations, reducing practicality. The range from 0.7 to 0.8 is optimal for most scenarios, achieving the best balance between the in-database recognition rate and the out-of-database rejection rate. The specific value is determined from the PR curve or the F1-score peak on the validation set.

[0145] Compared to a single threshold scheme, the three-threshold joint decision mechanism effectively reduces the false recognition rate caused by individual extreme samples, while ensuring a high sensitivity to rejecting targets outside the database.
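The joint decision can be sketched as a conjunction of three comparisons. The inequality directions below (distance and energy at or below their thresholds, confidence at or above its threshold) are assumptions inferred from the surrounding description, and all threshold values are hypothetical.

```python
def joint_decision(d_min, p_max, energy, d_thr, p_thr, e_thr):
    """Accept only if all three conditions hold; otherwise report an unknown target."""
    accepted = (d_min <= d_thr) and (p_max >= p_thr) and (energy <= e_thr)
    return "identified" if accepted else "unknown"

# Hypothetical thresholds, for illustration only.
thresholds = dict(d_thr=0.10, p_thr=0.75, e_thr=0.50)
in_database = joint_decision(d_min=0.05, p_max=0.80, energy=0.10, **thresholds)
low_confidence = joint_decision(d_min=0.05, p_max=0.55, energy=0.10, **thresholds)
far_away = joint_decision(d_min=0.90, p_max=0.80, energy=1.20, **thresholds)
```

Because a single failed condition is enough to reject, a sample that happens to pass one test by chance is still caught by the other two.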

[0146] To verify the effectiveness of the micro-UAV identification method based on heterogeneous image fingerprinting provided in this application, data was collected and simulation tests were conducted under various real-world flight scenarios. The following section presents detailed measurement data and identification results using two typical test scenarios as examples.

[0147] Example 2: high-altitude flight on a clear day. The drone, a small quadcopter, was flown at an altitude of 100 meters and a speed of 5 meters per second, with a data acquisition time of 30 seconds. The ambient temperature was 25 degrees Celsius, the relative humidity was 45%, and the wind speed was 2 meters per second. The visible light camera had a resolution of 1920×1080 pixels and a frame rate of 30 frames per second; the infrared camera had a resolution of 640×512 pixels and a frame rate of 30 frames per second. The target area in the image was approximately 80×60 pixels.

[0148] The visible light camera operates at a frame rate of 30.00 frames per second (standard deviation 0.02 frames per second), and the infrared camera operates at a frame rate of 29.98 frames per second (standard deviation 0.05 frames per second). The timestamp sequence of the first 10 frames is as follows: Visible light image sequence timestamps (Unit: seconds):

[0149] Infrared image sequence timestamp (Unit: seconds): .

[0150] After time interpolation processing of the infrared image sequence, the time synchronization error statistics are as follows: the average error is 2.3 milliseconds, the maximum error is 5.8 milliseconds, the standard deviation is 1.2 milliseconds, and the synchronization success rate is 99.8%.
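Time interpolation of the infrared sequence onto the visible light timestamps can be sketched as below. Linear blending of the two neighbouring infrared frames is an assumption, since the interpolation scheme is not specified here; the function name and the toy frames are hypothetical.

```python
import numpy as np

def interpolate_ir_to_visible(t_vis, t_ir, ir_frames):
    """Linearly blend the two IR frames that bracket each visible-light timestamp."""
    t_vis = np.asarray(t_vis, dtype=float)
    t_ir = np.asarray(t_ir, dtype=float)
    ir_frames = np.asarray(ir_frames, dtype=float)
    # Index of the IR frame at or after each visible timestamp, clipped to a valid pair.
    hi = np.clip(np.searchsorted(t_ir, t_vis), 1, len(t_ir) - 1)
    lo = hi - 1
    # Blend weight; clipping holds the end frames for timestamps outside the IR range.
    w = np.clip((t_vis - t_ir[lo]) / (t_ir[hi] - t_ir[lo]), 0.0, 1.0)
    w = w.reshape(-1, *([1] * (ir_frames.ndim - 1)))
    return (1.0 - w) * ir_frames[lo] + w * ir_frames[hi]

# Toy example: three 1x1 IR frames and two visible timestamps between them.
t_ir = [0.0, 0.1, 0.2]
ir_frames = [[[0.0]], [[10.0]], [[20.0]]]
aligned = interpolate_ir_to_visible([0.05, 0.15], t_ir, ir_frames)
```

After this step each visible light frame has an infrared frame at the same timestamp, so frame-to-frame alignment reduces to matching indices.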

[0151] Common physical feature points were extracted from visible light and infrared images, totaling 48 feature points, with a feature point matching success rate of 95.8% (46 / 48). The affine transformation matrix H was solved by minimizing the reprojection error, and the measured matrix is: H=[ ].

[0152] The registration accuracy indicators are as follows: the average reprojection error is 0.87 pixels, the maximum reprojection error is 2.34 pixels, the median reprojection error is 0.62 pixels, and the spatial registration success rate is 98.2%.
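Estimating an affine transformation from matched feature points and measuring its reprojection error can be sketched with ordinary least squares. This is a minimal sketch under assumptions: the mapping direction (infrared to visible light), the 2×3 matrix layout, and the sample points are hypothetical, and no outlier rejection is included.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src points onto dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])  # N x 3 design matrix
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)   # 3 x 2 solution
    return X.T                                    # 2 x 3 affine matrix

def reprojection_errors(H, src, dst):
    """Per-point Euclidean distance between transformed src points and dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    projected = src @ H[:, :2].T + H[:, 2]
    return np.linalg.norm(projected - dst, axis=1)

# Hypothetical matched points generated from a known affine transform.
H_true = np.array([[1.5, 0.1, 10.0], [-0.1, 1.5, -5.0]])
src_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
dst_pts = src_pts @ H_true[:, :2].T + H_true[:, 2]
H_est = fit_affine(src_pts, dst_pts)
errors = reprojection_errors(H_est, src_pts, dst_pts)
```

The mean, median, and maximum of `errors` correspond to the kind of registration accuracy indicators reported above.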

[0153] Edge contour enhancement is applied to the visible light images, and the learnable edge gain coefficient and edge offset parameter converge after training. The Sobel edge response statistics within the target area are as follows: the average gradient value is 87.3, the maximum gradient value is 245, the minimum gradient value is 12, and the standard deviation is 34.2.

[0154] Directional texture features are extracted using the striped texture enhancement blocks, specifically the striped texture features corresponding to the arm direction. After the window shape attention blocks enhance and aggregate the features of the local component regions, a 128-dimensional structural texture feature vector is obtained, with a feature vector norm of 12.34.

[0155] Differential operations were performed on consecutive frames of visible light images. The statistical results of inter-frame difference within the rotor region were as follows: the average gray level difference was 6.2, the maximum gray level difference was 28, the minimum gray level difference was 1, and the standard deviation was 4.1.

[0156] Local spectral projection was performed on the candidate rotor region at a sampling rate of 30 frames per second, a sampling duration of 2 seconds, and a frequency resolution of 0.5 Hz. The spectral amplitude distribution was as follows: amplitude 238 (maximum) at 48.5 Hz, amplitude 182 (second harmonic) at 97.0 Hz, amplitude 98 (third harmonic) at 145.5 Hz, and amplitude 45 (fourth harmonic) at 194.0 Hz. The extracted rotor main frequency is 48.5 Hz, corresponding to a rotational speed of 2910 rpm. The rotor main frequency and harmonic ratio features are encoded by temporal convolution to obtain a 64-dimensional dynamic differential feature vector with a feature vector norm of 8.92.
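The conversion from main frequency to rotational speed (48.5 Hz × 60 = 2910 rpm) and the extraction of a harmonic ratio can be illustrated on a synthetic signal. Several choices below are assumptions: the harmonic ratio is taken as the second-harmonic amplitude divided by the fundamental amplitude, the signal is synthetic, and the sketch samples at a hypothetical 1000 Hz, since a 48.5 Hz component can only be resolved directly when the sampling rate exceeds twice that frequency.

```python
import numpy as np

fs = 1000.0      # assumed sampling rate in Hz (hypothetical)
duration = 2.0   # seconds, giving a frequency resolution of 1 / duration = 0.5 Hz
t = np.arange(int(fs * duration)) / fs

# Synthetic rotor signal: fundamental at 48.5 Hz plus a weaker second harmonic.
signal = 1.0 * np.sin(2 * np.pi * 48.5 * t) + 0.5 * np.sin(2 * np.pi * 97.0 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

peak = int(np.argmax(spectrum[1:])) + 1  # strongest bin, skipping the DC bin
main_freq = float(freqs[peak])
rpm = main_freq * 60.0                   # revolutions per minute

# Amplitude at the bin closest to twice the main frequency.
second_bin = int(np.argmin(np.abs(freqs - 2.0 * main_freq)))
harmonic_ratio = float(spectrum[second_bin] / spectrum[peak])
```

With a 2-second window the bins fall on multiples of 0.5 Hz, so 48.5 Hz and its harmonics land exactly on bins and no spectral leakage correction is needed in this toy case.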

[0157] Background statistics of the infrared images comprise the background mean and the background standard deviation; the minimum background pixel value is 145 and the maximum background pixel value is 235. The target area has 1247 pixels, with a mean target pixel value of 280.3 (grayscale value) and a standard deviation of 28.5. Normalizing the infrared response of the target area yields an average relative contrast of 5.76, a maximum relative contrast of 12.34, and a minimum relative contrast of 0.89.

[0158] The normalized infrared response is weighted and integrated to obtain the equivalent thermal absorption response feature, with an average value of 4.12 W/m², a peak value of 4.58 W/m², and a minimum value of 3.45 W/m². Thermal accumulation features, namely the average and maximum rise rates over a fixed time interval, are extracted from the infrared time-series variation; the maximum rise rate is 0.58 W/m²/s.

[0159] Thermal decay features: the heat dissipation process after the target shuts down is modeled parametrically with an exponential decay model, characterized by a peak energy density of 4.58 W/m², a thermal decay coefficient, a thermal decay time constant of 8.45 seconds, and an environmental baseline expressed in W/m².
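An exponential decay model of this kind can be fitted by a log-linear least-squares regression. The model form below, energy(t) = baseline + (peak - baseline) · exp(-lam · t) with time constant 1 / lam, is an assumption, as the exact expression is not reproduced here. The baseline of 1.0 W/m² is a hypothetical value used only to generate synthetic data, and the function name is hypothetical.

```python
import numpy as np

def fit_decay(t, energy, baseline):
    """Fit energy(t) = baseline + (peak - baseline) * exp(-lam * t) in log space."""
    t = np.asarray(t, dtype=float)
    y = np.log(np.asarray(energy, dtype=float) - baseline)  # requires energy > baseline
    slope, intercept = np.polyfit(t, y, 1)
    lam = -slope
    peak = baseline + np.exp(intercept)
    return float(lam), float(peak), float(1.0 / lam)

# Synthetic cooling curve using the peak and time constant quoted in the text.
baseline = 1.0  # hypothetical environmental baseline in W/m^2
t = np.linspace(0.0, 20.0, 41)
energy = baseline + (4.58 - baseline) * np.exp(-t / 8.45)
lam, peak, time_constant = fit_decay(t, energy, baseline)
```

Under this model the decay coefficient and the time constant are reciprocals, so a time constant of 8.45 s would correspond to a coefficient of about 0.118 per second.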

[0160] Equivalent thermal inertia features: the thermal response rise time constant is 7.2 seconds.

[0161] Heat diffusion distribution features are extracted based on the spatial distribution relationship of heat within the target area: the target area is divided into 4×4 spatial blocks, the heat diffusion distribution features are 2.78 bits (information entropy), and the heat distribution uniformity is 78.3%.
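An information-entropy measure over a 4×4 block partition can be sketched as follows. The text does not say how the block values are normalized, so treating the block heat sums as a probability distribution is an assumption; the function name and test images are hypothetical.

```python
import numpy as np

def block_entropy(heat, grid=4):
    """Shannon entropy (bits) of the heat distribution over grid x grid spatial blocks."""
    heat = np.asarray(heat, dtype=float)  # assumed non-negative
    h, w = heat.shape
    bh, bw = h // grid, w // grid
    # Crop to a multiple of the grid, then sum the heat inside each block.
    blocks = heat[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).sum(axis=(1, 3))
    p = blocks.ravel() / blocks.sum()
    p = p[p > 0]  # empty blocks contribute nothing to the entropy
    return float(-(p * np.log2(p)).sum())

uniform_entropy = block_entropy(np.ones((8, 8)))  # heat spread evenly over 16 blocks
hot_spot = np.zeros((8, 8))
hot_spot[0:2, 0:2] = 1.0                          # all heat in a single block
concentrated_entropy = block_entropy(hot_spot)
```

For 16 blocks the entropy ranges from 0 bits (all heat in one block) to log2(16) = 4 bits (perfectly even spread), so the reported 2.78 bits lies between the two extremes.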

[0162] Thermal memory features: the integration period is 30 seconds, and the thermal memory feature value is 1.62 W/m².

[0163] The seven types of features above, namely the normalized infrared response feature, the equivalent thermal absorption response feature, the thermal accumulation feature, the thermal decay coefficient, the equivalent thermal inertia feature, the heat diffusion distribution feature, and the thermal memory feature, are concatenated to form the infrared energy criterion feature vector.

[0164] The structural texture feature vector, the dynamic differential feature vector, and the infrared energy criterion feature vector are input into the cross-scene decoupled hypergraph fingerprint network for fusion encoding to obtain a low-dimensional unique image fingerprint vector.

[0165] Cross-scenario consistency constraint verification: The same drone collects 50 frames each in three different scenarios: sunny day, cloudy day, and night, and extracts the identity fingerprint vector. .

[0166] Scene 1 (Sunny Day) ; Scene 2 (Cloudy) ; Scene 3 (Nighttime) .

[0167] Cross-scenario consistency loss Orthogonal constraint loss .

[0168] Establishing the UAV image fingerprint database: the model-level prototype center is computed from a total of 50 samples, and the individual-level prototype center from 20 samples.

[0169] The low-dimensional unique image fingerprint vector corresponding to the target to be identified .

[0170] calculate With model-level prototype center Euclidean distance, minimum matching distance Temperature coefficient Maximum matching confidence .

[0171] The Euclidean distance to the individual-level prototype center is calculated in the corresponding model sub-library, and the individual matching distance is 0.0123.

[0172] Open set energy function .

[0173] The minimum matching distance threshold, the maximum matching confidence threshold, and the open set energy threshold are set. In the joint decision, the minimum matching distance, the maximum matching confidence, and the open set energy function all satisfy their threshold conditions, so the recognition result is output with a recognition confidence of 79.62%.

[0174] Example 3: low-altitude flight on a cloudy or rainy day. The working principle is the same as in Examples 1 and 2, and the shared content is not repeated.

[0175] The drone model is a quadcopter micro-drone, with a flight altitude of 50 meters, a flight speed of 3 meters per second, and a data acquisition time of 30 seconds. The ambient temperature is 18 degrees Celsius, the relative humidity is 78%, the wind speed is 4 meters per second, and the weather conditions are overcast and rainy. The visible light camera has a resolution of 1920×1080 pixels and a frame rate of 30 frames per second; the infrared camera has a resolution of 640×512 pixels and a frame rate of 30 frames per second.

[0176] Key measurement data are as follows: rotor main frequency 52.3 Hz, corresponding to a rotational speed of 3138 rpm; equivalent thermal absorption response feature 3.87 W/m²; thermal decay coefficient 0.076; equivalent thermal inertia feature 1.24; minimum model-level matching distance 0.0523; maximum matching confidence 0.7834. After the three-threshold joint decision, the recognition result is output with a confidence of 78.34%.

[0177] In the technical solution of this application embodiment, the model-level recognition accuracy is 98.6%, the individual-level recognition accuracy is 96.2%, the open set rejection rate (outside the database target) is 97.8%, and the false recognition rate is 1.2%. Cross-scene stability: the accuracy is 99.2% in sunny scenes, 97.8% in cloudy scenes, 95.4% in nighttime scenes, and 94.1% in rainy scenes, with an average cross-scene accuracy of 96.6%. Computational efficiency: the average processing time per frame is 45 milliseconds, including 28 milliseconds for feature extraction and 17 milliseconds for matching decision, achieving a real-time processing capability of 22 frames per second.
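As a quick arithmetic check of the figures reported above, using only the stated values:

```latex
\bar{a} = \frac{99.2 + 97.8 + 95.4 + 94.1}{4} = \frac{386.5}{4} \approx 96.6\%, \qquad
t_{\text{frame}} = 28 + 17 = 45\ \text{ms}, \qquad
\frac{1000\ \text{ms/s}}{45\ \text{ms/frame}} \approx 22.2\ \text{frames/s}
```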

[0178] The above measurement data and simulation results fully verify the effectiveness of the micro UAV identification method based on heterogeneous image fingerprinting provided in this application. It can achieve high-precision model-level and individual-level identification in various scenarios and has reliable ability to reject targets outside the database.

[0179] In summary, this application provides a method for image fingerprinting of small unmanned aerial vehicles (UAVs) based on heterogeneous information fusion. By acquiring visible light and infrared image sequences of the target UAV, and after time synchronization, spatial registration, and scale normalization, structural texture features, dynamic differential features, and infrared energy criterion feature vectors are extracted. A cross-scene decoupled hypergraph fingerprint network is used for fusion encoding to obtain a low-dimensional unique image fingerprint vector. Cross-scene consistency constraints and scene identity orthogonality constraints are introduced during training to enhance the cross-scene stability of the fingerprint. Finally, a three-threshold joint decision mechanism combining model-level and individual-level matching with open set energy function, minimum matching distance, and maximum matching confidence is used to reliably reject targets outside the database. This method can achieve fine-grained identification from category-level, model-level to individual-level based solely on image information, without requiring radar or radio frequency auxiliary equipment. It has advantages such as low deployment cost, fine-grained identification, strong cross-scene stability, and high reliability of open set identification, providing an effective technical means for low-altitude security and UAV management.

[0180] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0181] The units described in the embodiments of this application can be implemented in software or hardware. In some cases, the names of the units do not limit the units themselves.

Claims

1. A method for image fingerprint recognition of small unmanned aerial vehicles based on heterogeneous information fusion, characterized in that it includes the following steps: acquiring a visible light image sequence and an infrared image sequence of the target UAV; performing time synchronization, spatial registration, and scale normalization on the visible light image sequence and the infrared image sequence to obtain a stable dual-channel observation sequence; extracting structural texture features, dynamic differential features, and an infrared energy criterion feature vector from the stable dual-channel observation sequence; inputting the structural texture features, the dynamic differential features, and the infrared energy criterion feature vector into a cross-scene decoupled hypergraph fingerprint network for fusion encoding to obtain a low-dimensional unique image fingerprint vector; training the cross-scene decoupled hypergraph fingerprint network using cross-scene consistency constraints and scene identity orthogonality constraints, and establishing a UAV image fingerprint database; inputting the low-dimensional unique image fingerprint vector corresponding to the target to be identified into the UAV image fingerprint database for matching; and judging the matching result using an open set unknown individual identification mechanism, and outputting the identification result or the unknown target result.

2. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the time synchronization of the visible light image sequence and the infrared image sequence includes the following steps: recording the frame-level timestamps corresponding to the visible light image sequence and the infrared image sequence; performing time interpolation on the infrared image sequence; and aligning the visible light image sequence and the infrared image sequence frame by frame according to a unified time reference.

3. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the spatial registration of the visible light image sequence and the infrared image sequence includes the following steps: extracting corresponding feature points from the visible light image sequence and the infrared image sequence; establishing an affine transformation model; optimizing the affine transformation model using the reprojection error; and spatially aligning the visible light image sequence and the infrared image sequence using the optimized affine transformation model.

4. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the extraction of the structural texture features includes the following steps: performing edge contour enhancement on the visible light images; extracting multi-scale structural information using a multi-level pyramid backbone; extracting directional texture features using striped texture enhancement blocks; enhancing the features of local component regions using window shape attention blocks; and aggregating the enhanced features to obtain the structural texture feature vector.

5. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the extraction of the dynamic differential features includes the following steps: performing differential operations on consecutive frames of the visible light images to obtain dynamic response information; performing local spectral projection on the rotor candidate region; extracting the rotor speed main frequency and the harmonic ratio feature; and encoding the rotor speed main frequency and the harmonic ratio feature by temporal convolution to obtain a dynamic differential feature vector.

6. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the construction of the infrared energy criterion feature vector includes the following steps: normalizing the infrared response corresponding to the target area; weighting and integrating the normalized infrared response to obtain the equivalent thermal absorption response feature; extracting the thermal accumulation feature, the thermal decay feature, and the equivalent thermal inertia feature based on the infrared time-series variation relationship; extracting the heat diffusion distribution feature and the thermal memory feature based on the spatial distribution relationship of heat within the target area; and combining the thermal response features to obtain the infrared energy criterion feature vector.

7. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that the cross-scene decoupled hypergraph fingerprint network includes: a visible light morphological microtexture co-coding backbone for extracting structural texture features; an infrared energy criterion field coding branch for extracting infrared energy criterion feature vectors; a rotor periodic micro-motion spectrum coding branch for extracting dynamic differential features; a cross-modal component hypergraph fusion module for multimodal relationship aggregation; a scene identity orthogonal decoupling projection head for generating low-dimensional unique image fingerprint vectors; and a hierarchical prototype open set discriminator for performing open set matching decisions.

8. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 7, characterized in that the processing procedure of the cross-modal component hypergraph fusion module includes the following steps: constructing the drone fuselage, arms, rotors, and hot spot area as component nodes; constructing geometrically coupled hyperedges, thermally coupled hyperedges, and symmetric micro-motion hyperedges from the component nodes; aggregating the relationships between the structural texture features, the dynamic differential features, and the infrared energy criterion feature vector using hypergraph convolution; and applying attention weighting to the aggregated features to obtain joint representation features.

9. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 7, characterized in that the processing procedure of the scene identity orthogonal decoupling projection head includes the following steps: mapping the joint representation features to an identity fingerprint vector and a scene perturbation vector; and normalizing the identity fingerprint vector to obtain a low-dimensional unique image fingerprint vector.

10. The image fingerprint recognition method for small unmanned aerial vehicles according to claim 1, characterized in that matching the low-dimensional unique image fingerprint vector corresponding to the target to be identified with the UAV image fingerprint database includes the following steps: matching the low-dimensional unique image fingerprint vector corresponding to the target to be identified with the model-level prototype centers to obtain the model matching result; and, based on the model matching result, matching the fingerprint vector with the individual-level prototype centers in the corresponding model sub-library to obtain the individual matching result.