A method and system for identifying small targets in a polar environment
By acquiring multiple frames of video images in polar environments, calculating the solar elevation angle, performing spatial alignment and temporal filtering, extracting the global illumination direction vector, performing directional spatial integration and cross-frame fusion, generating a basic shadow map, and combining it with a multi-dimensional weight map for target recognition and localization, the problem of low accuracy in recognizing small targets in polar environments is solved, achieving efficient and accurate target recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- POLAR RES INST OF CHINA
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-19
AI Technical Summary
Existing computer vision processing technologies struggle to effectively suppress optical and geographical background interference in polar environments, resulting in low accuracy in identifying small targets. Furthermore, conventional target detection algorithms have weak generalization capabilities in polar environments, making them prone to false detections or missed detections.
By acquiring multi-frame video image sequences in polar environments, the solar elevation angle is calculated and spatial alignment and temporal filtering are performed. The global illumination direction vector is extracted, and directional spatial integration and cross-frame fusion are performed to generate a basic shadow map. Combined with a multi-dimensional weight map, target recognition and localization are performed. The physical properties of real target shadows and light projections are used to construct dynamic constraint thresholds for cross-validation.
It significantly improves the recognition accuracy of small targets in polar environments, effectively suppresses false alarms in complex backgrounds, reduces false detection and false negative rates, and achieves efficient and accurate target recognition.
[Figure: CN121963003B_ABST, abstract drawing]
Description
Technical Field
[0001] This invention belongs to the field of computer vision and image processing technology, specifically relating to a method and system for identifying tiny targets in polar environments.
Background Technology
[0002] The evolution of polar ecosystems is a crucial indicator of global climate change, and the population distribution and spatiotemporal evolution trends of polar wildlife (such as Antarctic penguins) hold immense scientific research value. In recent years, with the rapid development of UAV remote sensing and computer vision technologies, automated monitoring systems based on UAV aerial imagery have gradually replaced traditional manual field surveys. In this process, image recognition, image detection, and image classification technologies have been widely applied to the automated processing and analysis of massive amounts of aerial data. Through efficient analysis of aerial videos and images, researchers can non-invasively obtain information on the quantity and distribution of target species.
[0003] However, the polar environment presents extremely unique and harsh optical and geographical conditions, posing significant challenges to existing computer vision processing tasks. Under specific environmental conditions such as the polar day, the solar altitude angle remains extremely low for extended periods, and the polar surface is covered by vast areas of highly reflective ice and snow. These extreme lighting conditions severely compress the dynamic range of images captured by UAV aerial sensors, resulting in reduced overall image contrast. Simultaneously, the polar regions are frequently accompanied by strong winds and snow, causing severe atmospheric disturbances. The raw images acquired often contain significant amounts of high-frequency noise and optical distortion. Traditional image enhancement techniques, as well as conventional image filtering and correction methods, struggle to effectively suppress background interference introduced by snow reflections and stray light while preserving the true physical characteristics of the target, leading to suboptimal image standardization processing results at the detection end.
[0004] Secondly, in large-scale aerial photography of polar regions, target objects (such as individual penguins) occupy an extremely low percentage of pixels in the image, posing a typical challenge for identifying tiny targets. Due to their severely insufficient resolution, the inherent texture, color, edges, and other apparent features of tiny targets are largely lost during imaging. When these tiny targets are intertwined with the complex polar background (such as bare rock, ice fissures, and snowmelt spots), the local features of the target are easily obscured by background features. Existing image extraction and matching algorithms often fail to extract sufficiently discriminative high-dimensional feature vectors when processing such low signal-to-noise ratio images. Even with complex image semantic segmentation techniques to separate the foreground from the background, severe oversegmentation or undersegmentation often occurs due to the blurred edges of tiny targets, making accurate localization difficult.
[0005] Currently, the industry typically uses deep learning-based object detection networks for target recognition in aerial images. However, most conventional object detection algorithms and their variants are designed for ordinary urban or natural scenes. When transferred directly to the polar environment, they take no prior account of the region's specific physical constraints and environmental disturbances, so the model relies solely on data-driven, black-box feature learning from samples and generalizes poorly. To improve the detection rate of small targets, some existing technologies introduce generative adversarial networks at the detection front end to reconstruct blurred images. However, this purely data-driven image generation is prone to hallucination in complex polar scenes, where effective texture support is extremely scarce: random ice and snow noise or rock shadows are mistakenly reconstructed as spurious target features, leading to a very high false detection rate in subsequent identification and detection models. Conversely, simply raising the detection threshold causes a large number of real small targets to be missed.
[0006] In summary, traditional computer vision processing workflows face technical bottlenecks when dealing with complex lighting conditions, strong background interference, and extreme scale variations in polar environments. These bottlenecks include difficulties in image feature extraction, weak model generalization ability, and low detection accuracy. Overcoming optical and background interference caused by the unique polar environment and designing a more rigorous and reliable image processing and recognition mechanism to achieve accurate identification and localization of minute targets under low signal-to-noise ratio conditions is a pressing technical challenge in the field of polar remote sensing monitoring.
Summary of the Invention
[0007] This invention provides a method for identifying small targets in polar environments, the method comprising:
[0008] Acquire a multi-frame video image sequence and synchronous physical parameters in a polar environment, calculate the solar elevation angle at the current moment; perform spatial alignment and temporal filtering on the multi-frame video image sequence to extract static background images, and extract the global illumination direction vector from the static background images;
[0009] In the aligned multi-frame images, directional spatial integration is performed along the global illumination direction vector, and cross-frame fusion is performed in the temporal dimension to generate a basic shadow response map;
[0010] Multi-feature extraction is performed on the basic shadow response map to obtain the directional consistency weight map, the polar morphology consistency weight map, and the community spatial aggregation weight map;
[0011] The joint confidence of local individuals is calculated by combining the directional consistency weight map and the polar morphology consistency weight map. Then, a dynamic constraint threshold is constructed by combining the community spatial aggregation weight map for cross-validation and dynamic correction to obtain the final weight map. This weight map is then fused with the basic shadow response map to generate a spatial guidance map.
[0012] The spatial guidance map and the original image are input into the target detection network, which outputs the target recognition and localization results.
[0013] Extracting the global illumination direction vector from the static background image includes: extracting the connected pixel set $\Omega$ formed by large bare rocks and their accompanying long shadows; computing the $(p+q)$-order central moments $\mu_{pq}$ of $\Omega$; constructing the covariance matrix $\mathbf{C}_\Omega$ of the connected component, normalized by the zeroth-order central moment $\mu_{00}$; and performing eigenvalue decomposition on $\mathbf{C}_\Omega$, taking the unit eigenvector corresponding to the largest eigenvalue as the global illumination direction vector $\mathbf{v}$.
[0014] Directional spatial integration along the global illumination direction vector includes: based on the true relative height $h$ and the solar altitude angle $\alpha$, the theoretical upper limit $L_{\max}$ of the target shadow length, in pixels, is calculated:
[0015] $$L_{\max} = \frac{H_0\, f}{h\, s_p \tan\alpha}$$
[0016] where $H_0$ is the target average physical height constant, $f$ is the camera's physical focal length constant, and $s_p$ is the physical pixel size constant. Background subtraction is performed on the aligned multi-frame images to obtain the foreground dark-target difference map $D_t$. Guided by the global illumination direction vector $\mathbf{v}$, directional spatial integration of length $L_{\max}$ is performed to obtain the single-frame directional integral feature map $F_t$:
[0017]
[0018] where $k$ is the step index along the integration path.
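The shadow-length bound and the directional integration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes a nadir-looking pinhole camera, nearest-neighbour sampling, a plain sum along the path, and zero contribution from samples outside the image. The function names `shadow_length_px` and `directional_integral` and all parameter names are hypothetical.

```python
import math
import numpy as np

def shadow_length_px(target_h_m, rel_height_m, sun_elev_deg, focal_m, pixel_m):
    """Shadow length in pixels for a nadir pinhole camera (assumed geometry)."""
    ground_len_m = target_h_m / math.tan(math.radians(sun_elev_deg))
    gsd_m = rel_height_m * pixel_m / focal_m  # metres of ground per pixel
    return ground_len_m / gsd_m

def directional_integral(diff_map, direction, length_px):
    """Sum diff_map along `direction` (x, y) over ceil(length_px) + 1 samples."""
    diff_map = np.asarray(diff_map, dtype=float)
    h, w = diff_map.shape
    vx, vy = np.asarray(direction, dtype=float) / np.linalg.norm(direction)
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w))
    for k in range(int(math.ceil(length_px)) + 1):
        # Nearest-neighbour sample k steps along the illumination direction.
        sx = np.rint(xs + k * vx).astype(int)
        sy = np.rint(ys + k * vy).astype(int)
        valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
        out[valid] += diff_map[sy[valid], sx[valid]]
    return out
```

Responses that are elongated along the illumination direction accumulate over the path, while isolated pixels contribute only once, which is the effect the text attributes to directional integration.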
[0019] Performing cross-frame fusion in the temporal dimension to generate the basic shadow response map includes: for the $N$ frames contained in the sequence, performing time-domain mean aggregation of the single-frame directional integral feature maps $F_t$ to obtain the spatiotemporal integral response map $R$;
[0020] computing the whole-image mean pixel value $\mu_R$ and standard deviation $\sigma_R$ of the spatiotemporal integral response map $R$, and generating the basic shadow response map $S$ by statistical threshold truncation:
[0021]
[0022] where $\kappa$ is the background suppression adjustment coefficient.
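A minimal sketch of the temporal fusion and statistical truncation follows. The truncation rule used here (keep values above mean plus a multiple of the standard deviation, zero elsewhere) is one plausible reading of "statistical threshold truncation" and is an assumption; `base_shadow_response` and `k_suppress` are hypothetical names.

```python
import numpy as np

def base_shadow_response(integral_maps, k_suppress=2.0):
    """Average per-frame integral maps, then zero everything below mean + k*std."""
    stack = np.stack(integral_maps).astype(float)
    mean_map = stack.mean(axis=0)              # time-domain mean aggregation
    mu, sigma = mean_map.mean(), mean_map.std()
    thresh = mu + k_suppress * sigma           # assumed form of the statistical threshold
    return np.where(mean_map > thresh, mean_map, 0.0)
```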
[0023] The directional consistency weight map is extracted as follows: compute the Gaussian-smoothed structure tensor matrix of the basic shadow response map within each pixel neighborhood; perform eigenvalue decomposition to obtain the unit eigenvector $\mathbf{e}_{\min}$ corresponding to the smallest eigenvalue; and compute the absolute cosine similarity between $\mathbf{e}_{\min}$ and the global illumination direction vector $\mathbf{v}$ to generate the directional consistency weight map $W_{\mathrm{dir}}$:
[0024] $$W_{\mathrm{dir}}(x, y) = \left| \mathbf{e}_{\min}(x, y) \cdot \mathbf{v} \right|$$
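The structure-tensor step can be illustrated with NumPy alone. This sketch assumes a separable Gaussian window and uses the closed-form orientation of a 2x2 symmetric tensor instead of an explicit eigen-decomposition; `direction_weight` and `_gauss_smooth` are hypothetical helper names, and the smoothing width is an arbitrary choice.

```python
import numpy as np

def _gauss_smooth(img, sigma=1.5):
    """Separable Gaussian smoothing built from 1-D convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)

def direction_weight(resp, light_dir, sigma=1.5):
    """|cos| between the local smallest-eigenvalue direction and light_dir (x, y)."""
    gy, gx = np.gradient(np.asarray(resp, dtype=float))
    jxx, jxy, jyy = (_gauss_smooth(a, sigma) for a in (gx * gx, gx * gy, gy * gy))
    # Angle of the largest-eigenvalue eigenvector (dominant gradient direction).
    theta = 0.5 * np.arctan2(2 * jxy, jxx - jyy)
    # The smallest-eigenvalue eigenvector is perpendicular to it.
    ex, ey = -np.sin(theta), np.cos(theta)
    vx, vy = np.asarray(light_dir, dtype=float) / np.linalg.norm(light_dir)
    return np.abs(ex * vx + ey * vy)
```

The smallest-eigenvalue direction is the one along which the response changes least, so it follows the long axis of a streak-like shadow response.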
[0025] The polar morphology consistency weight map is extracted as follows: candidate dark-spot connected components are extracted from the basic shadow response map, and for any $i$-th connected component the actual pixel area $A_i$ it occupies and its aspect ratio $r_i$ are calculated;
[0026] a polar morphology consistency weight map $W_{\mathrm{morph}}$ is generated using a two-dimensional Gaussian similarity distribution function:
[0027]
[0028] where $A_0$ and $r_0$ are the theoretical pixel area and theoretical pixel aspect ratio of the standard penguin shadow, respectively, and $\sigma_A$ and $\sigma_r$ are the area distribution tolerance and the aspect ratio distribution tolerance, respectively.
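One way to realize a two-dimensional Gaussian similarity over area and aspect ratio is sketched below. The exact normalization (the factor 1/2 in the exponent) is an assumption, and `morphology_weight` and its parameter names are hypothetical.

```python
import numpy as np

def morphology_weight(area_px, aspect, area_ref, aspect_ref, tol_area, tol_aspect):
    """Gaussian similarity of a component's (area, aspect ratio) to a reference shadow."""
    za = (np.asarray(area_px, dtype=float) - area_ref) / tol_area
    zr = (np.asarray(aspect, dtype=float) - aspect_ref) / tol_aspect
    return np.exp(-0.5 * (za**2 + zr**2))
```

A component that matches the reference exactly scores 1, and the score decays smoothly as either measurement drifts beyond its tolerance.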
[0029] The community spatial aggregation weight map is extracted as follows: extract the set of geometric centroid coordinates of all candidate connected components and construct the spatial density distribution field $\rho$:
[0030]
[0031] where $K$ is the number of candidate connected components, $(x_i, y_i)$ are the centroid coordinates of the $i$-th connected component, $\xi$ is the aggregation adjustment coefficient, and $d_s$ is the theoretical pixel safety distance; the community spatial aggregation weight map $W_{\mathrm{comm}}$ is then generated through exponential normalization:
[0032]
[0033] where $\eta$ is the density gain constant.
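The density field and its exponential normalization might look like the following sketch. It assumes a Gaussian kernel whose bandwidth is the aggregation coefficient times the safety distance, and a `1 - exp(-gain * density)` normalization; both forms are assumptions, as are the names `community_weight`, `agg_coef` and `gain`.

```python
import numpy as np

def community_weight(centroids, shape, safe_dist_px, agg_coef=1.0, gain=1.0):
    """Kernel density of component centroids (x, y), squashed into [0, 1)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    bw = agg_coef * safe_dist_px               # assumed kernel bandwidth in pixels
    density = np.zeros(shape, dtype=float)
    for cx, cy in centroids:
        density += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * bw**2))
    return 1.0 - np.exp(-gain * density)       # assumed exponential normalization
```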
[0034] The local individual joint confidence $P_{\mathrm{joint}}$ and the dynamic constraint threshold matrix $T$ are calculated as:
[0035]
[0036]
[0037] where $W_{\mathrm{dir}}$ is the directional consistency weight map, $W_{\mathrm{morph}}$ is the polar morphology consistency weight map, $W_{\mathrm{comm}}$ is the community spatial aggregation weight map, $T_{\max}$ is a preset upper-limit constant for the penalty on isolated targets, and $T_{\min}$ is a preset lower-bound constant for community aggregation tolerance.
[0038] The final weight map $W_{\mathrm{final}}$ and the spatial guidance map $G$ are calculated as:
[0039]
[0040]
[0041] where $\beta$ is the gating activation parameter and $S$ is the basic shadow response map.
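The cross-validation and gating logic described above can be sketched under explicit assumptions: joint confidence as the product of the two local weight maps, a threshold interpolated linearly between the isolated-target limit and the community limit, and a sigmoid gate. None of these functional forms is confirmed by the text; `spatial_guidance`, `t_isolated`, `t_crowd` and `gate` are hypothetical names and values.

```python
import numpy as np

def spatial_guidance(w_dir, w_morph, w_comm, base_resp,
                     t_isolated=0.8, t_crowd=0.3, gate=10.0):
    """Gate the base shadow response by confidence versus a density-dependent threshold."""
    conf = w_dir * w_morph                                # assumed joint confidence
    # Threshold relaxes from t_isolated (open ice) to t_crowd (dense colony).
    thresh = t_isolated - (t_isolated - t_crowd) * w_comm
    w_final = 1.0 / (1.0 + np.exp(-gate * (conf - thresh)))  # assumed sigmoid gate
    return w_final * base_resp
```

With these illustrative defaults, a candidate with confidence 0.5 passes inside a dense colony but is suppressed when isolated, which mirrors the behaviour the invention describes for dense and open areas.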
[0042] The present invention also provides a small target identification system in polar environments, the system comprising:
[0043] Acquisition module: acquires multi-frame video image sequences and synchronized physical parameters in polar environments, calculates the solar altitude angle at the current moment; performs spatial alignment and temporal filtering on the multi-frame video image sequences to extract static background images, and extracts the global illumination direction vector from the static background images;
[0044] Weight generation module: In the aligned multi-frame images, directional spatial integration is performed along the global illumination direction vector, and cross-frame fusion is performed in the time dimension to generate a basic shadow response map; multiple features are extracted from the basic shadow response map to obtain a directional consistency weight map, a polar morphology consistency weight map, and a community spatial aggregation weight map.
[0045] Weight Correction Module: The joint confidence of local individuals is calculated by combining the directional consistency weight map and the polar morphology consistency weight map. The dynamic constraint threshold is constructed by combining the community spatial aggregation weight map for cross-validation and dynamic correction to obtain the final weight map. The weight map is then fused with the basic shadow response map to generate a spatial guidance map.
[0046] Recognition Module: Inputs the spatial guidance map and the original image into the target detection network and outputs the target recognition and localization results.
[0047] This invention provides a method and system for identifying small targets in polar environments, which significantly improves the accuracy of small target identification in extreme environments such as polar day, strong winds and snow, and bare rock and ice.
[0048] In terms of target feature extraction, this invention overcomes the limitation of relying solely on the target's own pixels. It exploits the physical laws of shadow projection, performing directional spatial integration and temporal cross-frame fusion strictly along the global illumination direction in the aligned multi-frame images. This transforms weak features of minute targets, which are easily obscured by complex backgrounds, into significant, continuous shadow responses. The process leverages the physical property that a real target's shadow is strictly collinear with the light projection, so the target response energy accumulates along the integration path, while interference from ice and snow fissures or snowmelt spots, which lack a consistent illumination direction, is dispersed and weakened. Furthermore, temporal cross-frame fusion cancels transient high-frequency noise such as blowing snow stirred up by strong polar winds, extracting clean basic shadow features from images with an extremely poor signal-to-noise ratio.
[0049] To suppress false alarms in complex backgrounds, this invention introduces a multi-dimensional polar prior feature extraction and verification mechanism, constructing a weighted graph with three dimensions: directional consistency, polar morphological consistency, and community spatial aggregation. Through rigorous structural tensor and morphological analysis of local response regions, this invention mandates that candidate targets must be parallel to the global light rays in their geometric extension direction and conform to the theoretical scale requirements of specific polar animals from a high-altitude perspective in terms of actual projected aspect ratio and area. Multiple physical and optical constraints can filter out static environmental noise that does not conform to morphological or directional standards. Furthermore, this invention transforms the unique gregarious reproductive characteristics of polar species into community spatial aggregation features, enhancing the ability to identify target spatial distribution from a macro-ecological perspective.
[0050] To overcome the limitations of any single feature dimension, this invention constructs a cross-validation and dynamic correction mechanism among the multi-dimensional features. By integrating local physical morphological confidence with macroscopic biological community density, it adaptively adjusts the constraint thresholds. In high-density target clustering areas, the system automatically relaxes the stringent constraints on individual shadow morphology and orientation, accommodating shadow distortion caused by mutual occlusion, jostling, or trampled snow, and significantly reducing the false negative rate in dense areas. In contrast, in open ice fields far from the community, the system automatically raises the acceptance threshold for physical features, requiring isolated candidate points to meet extremely high standards for shadow optical and morphological features, thereby distinguishing them from occasional isolated rocks or ice blocks and greatly suppressing false positives. The clean spatial guidance map generated after multi-dimensional dynamic correction masks the massive redundant interference of the complex polar terrain and provides a strong spatial-attention prior for the target detection network, ultimately achieving efficient and accurate identification of small targets in extreme polar scenarios.
Attached Figure Description
[0051] Figure 1 is a flowchart of the small-target identification process in polar environments according to the present invention;
[0052] Figure 2 is a schematic diagram of the GAN image enhancement network structure of the present invention;
[0053] Figure 3 is a schematic diagram of the CSPBS module structure of the present invention;
[0054] Figure 4 is a schematic diagram of the NDConv module structure of the present invention;
[0055] Figure 5 is a diagram showing the detection results of the present invention.
Detailed Implementation
[0056] This embodiment provides a method and system for identifying small targets in polar environments. It first defines the extreme application scenarios to which the system applies and discloses in detail the underlying hardware devices and apparatus layout that support the image acquisition and image detection tasks.
[0057] This application is applied to a polar ecological monitoring environment under polar day conditions. Specifically, the environment used in this application is during the polar day period, with continuous sunlight and a relatively constant solar altitude angle within a single aerial photography mission. The image scene identified by this invention is a polar ice field, which is composed of a mixture of highly reflective ice and snow-covered areas and locally low-reflective bare rocks, while the surface is affected by the strong descent winds that prevail throughout the polar regions. The targets to be identified in this invention are small targets in polar regions. In a preferred embodiment, these small targets can be typical polar animals, such as adult Adélie penguins or emperor penguins. The small targets to be identified in this invention exhibit highly consistent morphology and height in biological characteristics.
[0058] In order to obtain high-quality image data streams in the above extreme scenarios, this embodiment provides a small target recognition system. The system includes a flight acquisition platform and a data processing terminal, which are connected to each other via a wireless broadband data link.
[0059] The flight data collection platform uses a cold- and wind-resistant drone as its carrier. To resist battery performance degradation and mechanical structural embrittlement caused by the extreme low temperatures of the polar regions, the drone is preferably equipped with a self-heating intelligent battery module and an internal temperature-controlled chamber. The drone is equipped with multimodal sensors, specifically including: an image acquisition module, a spatiotemporal and attitude perception module, an altitude measurement module, and a data processing terminal.
[0060] The image acquisition module is suspended vertically downwards from the underside of the UAV via a three-axis stabilized brushless gimbal. This module includes a high-resolution visible light camera configured to acquire continuous video image sequences at a preset high frame rate, preferably no less than 60 frames per second. This high frame rate ensures extremely high scene overlap between adjacent frames during rapid UAV flight, providing ample image data for subsequent elimination of polar wind disturbances and directional spatiotemporal integration of small target shadow features based on multi-frame video images.
[0061] The spatiotemporal and attitude perception module is fixedly mounted on the UAV's onboard bus and includes a real-time kinematic differential positioning (RTK-GPS) device and an inertial measurement unit (IMU). The RTK-GPS device synchronously records the latitude and longitude coordinates and an absolute timestamp for each acquired image frame; the IMU synchronously acquires the UAV's real-time roll, pitch, and yaw angles. At the data processing terminal, the timestamp and coordinate data are used to calculate the solar azimuth and solar altitude angles in the polar region at the current moment, and the attitude data are used to compensate for camera viewpoint shifts when aligning multiple frames.
[0062] The altitude measurement module preferably employs a single-line or multi-line laser rangefinder (LiDAR) or millimeter-wave radar, vertically mounted on the bottom of the UAV, parallel to the line of sight of the image acquisition module. This module is used to obtain the UAV's true relative flight altitude with respect to the polar ice or bare rock surface.
[0063] The data processing terminal can be deployed directly in the airborne edge computing node of the UAV, or it can be deployed in the ground control station of a polar research station. Considering the limited communication bandwidth in the polar environment, it is preferable to deploy the edge computing node as the data processing terminal inside the UAV.
[0064] The hardware computing platform of the data processing terminal includes a central processing unit (CPU) and dedicated artificial intelligence acceleration hardware adapted for matrix operations and image semantic segmentation tasks. This AI acceleration hardware is preferably a graphics processing unit (GPU), a neural network processor, or an application-specific integrated circuit (ASIC). The data processing terminal also includes a spatiotemporal synchronizer, which, through a hardware trigger signal, performs strict frame-level time alignment of multiple frames of video images output by the image acquisition module, coordinate and attitude data output by the spatiotemporal and attitude perception module, and height data output by the height measurement module.
[0065] Next, this embodiment describes the implementation of step S1 of the method for identifying small targets in polar environments: multi-frame image sequence acquisition and polar environment vector extraction.
[0066] S1.1: Acquire image and synchronize physical parameters
[0067] The data processing terminal triggers and receives data streams from the image acquisition module, altitude measurement module, and spatiotemporal and attitude perception module via a hardware spatiotemporal synchronizer. In polar day conditions, the high reflectivity of large areas of ice and snow on the ground can easily cause the camera's automatic exposure algorithm to malfunction, resulting in severe underexposure of small targets in the image (i.e., appearing as completely black) or overexposure (i.e., background clipping). Therefore, the data processing terminal locks the sensitivity and exposure time of the image acquisition module to obtain a multi-frame video image sequence with constant exposure parameters.
[0068] Define the acquired multi-frame video image sequence as $\mathcal{I} = \{I_1, I_2, \dots, I_N\}$, where $\mathcal{I}$ is the set of images acquired within a single analysis window, $N$ is the total number of frames contained in the sequence, and $I_t$ is the original image of the $t$-th frame in the sequence; each frame image $I_t$ corresponds to a unique absolute timestamp $\tau_t$.
[0069] In polar aerial photography, strong surface descent (katabatic) winds often whip up dry snow, creating blizzards. The probe beam emitted by the altitude measurement module is easily scattered by suspended ice and snow particles, causing random jumps in individual altitude measurements. At the same time, large attitude deviations of the UAV in strong winds turn the nominally vertical range measurement into an oblique one. Therefore, the data processing terminal uses a Kalman filter algorithm to fuse the raw ranging data from the altitude measurement module with the dynamic data from the IMU of the spatiotemporal and attitude perception module to calculate the UAV's true relative altitude.
[0070] For the time $\tau_t$ corresponding to the $t$-th frame image, attitude compensation is applied to the original ranging value according to the UAV's spatial attitude to obtain the vertical distance measurement $d^{v}_t$:
[0071] $$d^{v}_t = d^{\mathrm{raw}}_t \cos\theta_t \cos\phi_t$$
[0072] where $d^{\mathrm{raw}}_t$ is the original oblique distance measurement output by the altitude measurement module at time $\tau_t$, and $\theta_t$ and $\phi_t$ are the UAV's pitch and roll angles output synchronously by the IMU at time $\tau_t$.
[0073] Next, the three-axis acceleration measured by the IMU in the UAV body coordinate system is extracted and projected onto the geographic coordinate system to obtain the dynamic acceleration $a^{z}_t$ in the vertical direction:
[0074]
[0075] where $g$ is the local gravitational acceleration constant in the polar monitoring area.
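[0076] A state vector $\mathbf{x}_t = [h_t, v_t]^{\mathsf T}$ is constructed from the UAV's true relative altitude $h_t$ and vertical velocity $v_t$, and optimal estimation is performed based on the discrete state transition equation and the observation equation. The state prediction equation is:
[0077] $$\hat{\mathbf{x}}_{t|t-1} = \mathbf{F}\,\hat{\mathbf{x}}_{t-1} + \mathbf{B}\,a^{z}_t, \qquad \mathbf{F} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \tfrac{1}{2}\Delta t^{2} \\ \Delta t \end{bmatrix}$$
[0078] The state update equation is:
[0079] $$\hat{\mathbf{x}}_{t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t \left( d^{v}_t - \mathbf{H}\,\hat{\mathbf{x}}_{t|t-1} \right), \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
[0080] where $\hat{\mathbf{x}}_{t|t-1}$ is the prior state estimate at time step $t$; $\mathbf{F}$ is the state transition matrix; $\Delta t$ is the time interval between adjacent frames; $\mathbf{B}$ is the control input matrix, applied to the vertical dynamic acceleration $a^{z}_t$; $\mathbf{H}$ is the observation matrix, applied against the attitude-compensated vertical distance measurement $d^{v}_t$; and $\mathbf{K}_t$ is the Kalman gain matrix.
[0081] After the above state update, the first element of the state vector $\hat{\mathbf{x}}_t$ is extracted as the UAV's real-time, high-precision true relative altitude $h_t$, synchronized to frame $t$.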
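The altitude fusion described in the preceding paragraphs can be sketched as a two-state Kalman filter. This is an illustration under assumptions, not the patent's implementation: a constant-acceleration kinematic model, the attitude-compensated range as the only observation, and arbitrary noise settings `q` and `r`. The names `vertical_range` and `kalman_height` are hypothetical.

```python
import numpy as np

def vertical_range(slant_m, pitch_rad, roll_rad):
    """Attitude-compensate a slant range to a vertical distance."""
    return slant_m * np.cos(pitch_rad) * np.cos(roll_rad)

def kalman_height(z_meas, accel_z, dt, q=0.05, r=0.5):
    """Fuse vertical ranges with vertical acceleration; returns filtered heights."""
    F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition for [height, velocity]
    B = np.array([0.5 * dt * dt, dt])          # control input (vertical acceleration)
    H = np.array([[1.0, 0.0]])                 # only height is observed
    Q, R = q * np.eye(2), np.array([[r]])      # assumed noise covariances
    x, P = np.array([z_meas[0], 0.0]), np.eye(2)
    out = []
    for z, a in zip(z_meas, accel_z):
        x = F @ x + B * a                      # predict state
        P = F @ P @ F.T + Q                    # predict covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([z]) - H @ x)    # update with the range measurement
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

In use, each raw slant range would first pass through `vertical_range` with the synchronized pitch and roll, and the resulting series would be filtered together with the projected vertical acceleration.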
[0082] In polar day conditions, sunlight is the only globally consistent light source. The length of a small target's shadow has a strict geometric projection relationship with its actual height and the solar altitude angle, and the direction of the shadow's extension is strictly collinear with the solar azimuth angle. The data processing terminal uses the spatial positioning and temporal reference provided by RTK-GPS to deduce the polar optical environment vector corresponding to each frame of the image.
[0083] Extract the geographical latitude $\varphi$ and geographical longitude $\lambda$ output in real time by the RTK-GPS (this invention defines east longitude as positive), together with the corresponding day of the year $n$ and decimal fractional hours $H_{\mathrm{UTC}}$ in Coordinated Universal Time (UTC).
[0084] Calculate the solar day angle $\gamma_t$ corresponding to the frame timestamp $\tau_t$:
[0085]
[0086] Calculate the solar declination angle $\delta_t$ and the equation of time $E_t$ at time $\tau_t$:
[0087]
[0088]
[0089] Based on the equation of time $E_t$ and the longitude $\lambda$ of the observation point, calculate the true solar time $T_{\mathrm{sol}}$ (unit: minutes) and the solar hour angle $\omega_t$ (unit: degrees):
[0090]
[0091]
[0092] Combining the observation point latitude $\varphi$, the solar declination angle $\delta_t$, and the hour angle $\omega_t$, calculate the solar altitude angle $\alpha_t$ and the solar azimuth $\psi_t$ at time $\tau_t$:
[0093]
[0094]
[0095] In this invention, the absolute value of the latitude $\varphi$ in the polar regions is very large, approaching its maximum of 90°. During the polar day, the solar altitude angle $\alpha_t$ calculated by the above formula is always greater than zero but very small. The data processing terminal binds in memory the parameter set obtained from the above synchronous calculation to the $t$-th frame image $I_t$ of the multi-frame image sequence.
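The chain from day angle to solar altitude and azimuth can be sketched with the widely used Fourier-series approximations from NOAA's general solar position calculations. Whether the invention uses these exact coefficients is not stated, so they are an assumption here; `solar_position` is a hypothetical name, and azimuth is returned clockwise from north.

```python
import math

def solar_position(lat_deg, lon_deg, day_of_year, utc_hours):
    """Return (elevation_deg, azimuth_deg); east longitude positive, time in UTC."""
    g = 2 * math.pi / 365 * (day_of_year - 1 + (utc_hours - 12) / 24)  # day angle
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))   # radians
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))  # minutes
    true_solar_min = utc_hours * 60 + eqtime + 4 * lon_deg
    ha = math.radians(true_solar_min / 4 - 180)                         # hour angle
    lat = math.radians(lat_deg)
    sin_el = (math.sin(lat) * math.sin(decl)
              + math.cos(lat) * math.cos(decl) * math.cos(ha))
    elev = math.degrees(math.asin(max(-1.0, min(1.0, sin_el))))
    az = math.degrees(math.atan2(math.sin(ha),
                                 math.cos(ha) * math.sin(lat)
                                 - math.tan(decl) * math.cos(lat)))
    return elev, (az + 180.0) % 360.0
```

For a site near 69° S around the December solstice, this sketch keeps the elevation positive at every hour of the day, consistent with the polar-day condition described above.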
[0096] S1.2: Background alignment and extraction of lighting vectors
[0097] The polar regions are frequently accompanied by strong winds and snow. During aerial data collection, UAVs are highly susceptible to yaw and high-frequency jitter caused by airflow disturbances, so the same ground feature drifts in pixel position between adjacent frames of the multi-frame images. At the same time, the small targets themselves move irregularly. To separate stable reference ground features from the complex polar background for calculating the illumination direction, the data processing terminal first registers and aligns adjacent frames and extracts large-scale background objects that remain stationary across multiple frames.
[0098] The data processing terminal extracts corner points along bare-rock edges in the $t$-th original frame image $I_t$ and in the reference frame $I_{\mathrm{ref}}$ of the multi-frame video image sequence, the reference frame preferably being the first or the middle frame of the sequence. Let the homogeneous coordinates of the $j$-th successfully matched feature point in the $t$-th frame and in the reference frame be $\mathbf{u}_j^{t}$ and $\mathbf{u}_j^{\mathrm{ref}}$, respectively.
[0099] The affine transformation matrix $\mathbf{A}_t$ that maps the $t$-th original frame image $I_t$ to the reference frame coordinate system is calculated by least-squares optimization, with the objective function:
[0100] $$\mathbf{A}_t = \arg\min_{\mathbf{A}} \sum_{j=1}^{M} \left\| \mathbf{u}_j^{\mathrm{ref}} - \mathbf{A}\,\mathbf{u}_j^{t} \right\|^2$$
[0101] where $M$ is the total number of successfully matched feature point pairs and $\mathbf{A}$ is the affine transformation matrix to be solved.
[0102] Using the obtained affine transformation matrix $\mathbf{A}_t$, spatial alignment compensation is applied to each original frame of the multi-frame video image sequence to obtain the aligned image sequence $\{\tilde{I}_t\}$, where the pixel mapping of the aligned image $\tilde{I}_t$ is:
[0103] $$\tilde{I}_t(x', y') = I_t(x, y)$$
[0104] In the formula, $[x', y', 1]^{\mathsf T} = \mathbf{A}_t\,[x, y, 1]^{\mathsf T}$.
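The least-squares affine fit can be illustrated with `numpy.linalg.lstsq`. The sketch assumes the matched corner coordinates are already available as arrays (corner detection and matching are outside its scope) and returns the top two rows of the affine matrix; `fit_affine` is a hypothetical name.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src (x, y) points onto dst points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous rows (x, y, 1)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # solves A @ M ~= dst, M is 3x2
    return M.T                                     # rows: [a, b, tx], [c, d, ty]
```

At least three non-collinear point pairs are needed for a unique solution; with more pairs, the fit averages out individual matching errors.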
[0105] After strict spatial alignment, pixel-by-pixel median filtering is applied to the aligned sequence $\{\tilde{I}_t\}$ along the time dimension to extract a clean and stable static background image $B$. The grayscale value of the static background image at any pixel coordinate $(x, y)$ is defined as:
[0106] $$B(x, y) = \operatorname{median}\left\{ \tilde{I}_1(x, y),\, \tilde{I}_2(x, y),\, \dots,\, \tilde{I}_N(x, y) \right\}$$
[0107] In the formula, $\operatorname{median}\{\cdot\}$ denotes taking the median of a set. When the number of aligned frames $N$ is large enough, small moving targets and high-frequency snow noise appear at any single pixel as outliers in time and are filtered out by the median.
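The per-pixel temporal median is a one-line operation in NumPy. The sketch below assumes the frames are already aligned and share one shape; `static_background` is a hypothetical name.

```python
import numpy as np

def static_background(aligned_frames):
    """Per-pixel median over time of equally sized, already aligned frames."""
    return np.median(np.stack(aligned_frames), axis=0)
```

A dark target that occupies a given pixel in fewer than half of the frames leaves the median at that pixel equal to the background value.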
[0108] On a real polar ice cap, large bare-rock formations that rise above the surface block the continuous sunlight of the polar day. Because the solar altitude angle calculated in the preceding step stays greater than zero but small, the bare rock casts a high-contrast, elongated dark shadow band on the snow and ice surface. The data processing terminal applies grayscale thresholding and a morphological closing operation to the static background image to extract the connected pixel set $\Omega$ formed by the large bare-rock blocks and their accompanying long shadows.
[0109] Because, at a low solar altitude angle, the elongated shadow cast by the bare rock within the connected domain extends strictly along the projection axis of the light rays, the data processing terminal derives the illumination direction in the actual image coordinate system from the geometric distribution of the connected domain.
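[0110] Calculate the image centroid coordinates $(\bar x, \bar y)$ of the connected pixel set $\Omega$:
[0111] $$\bar x = \frac{1}{N_\Omega} \sum_{(x, y) \in \Omega} x$$
[0112] $$\bar y = \frac{1}{N_\Omega} \sum_{(x, y) \in \Omega} y$$
[0113] where $(x, y)$ denotes the coordinates of a pixel in the set $\Omega$, and $N_\Omega$ is the total number of pixels contained in the connected component.
[0114] Calculate the $(p+q)$-order central moments $\mu_{pq}$ of the connected component:
[0115] $$\mu_{pq} = \sum_{(x, y) \in \Omega} (x - \bar x)^p\,(y - \bar y)^q$$
[0116] where $p$ and $q$ are the non-negative integer orders along $x$ and $y$. Based on these central moments, construct the covariance matrix $\mathbf{C}$ of the connected component:
[0117] $$\mathbf{C} = \frac{1}{\mu_{00}} \begin{bmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{bmatrix}$$
[0118] Here, the zeroth-order central moment $\mu_{00}$ is numerically equal to $N_\Omega$.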
[0119] The data processing terminal performs eigenvalue decomposition on the covariance matrix of the connected component. The eigenvector associated with the largest eigenvalue of this matrix gives the direction in which the pixels of the connected region are most widely spread, i.e., the principal axis along which the bare rock and its long shadow extend. The data processing terminal extracts the unit eigenvector corresponding to this largest eigenvalue and defines it as the global illumination direction vector for the frame.
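The moment-based principal-axis extraction can be sketched as follows. For a symmetric 2x2 covariance matrix the orientation of the largest-eigenvalue eigenvector has a closed form, which the sketch uses instead of a general eigen-solver. The function name `principal_axis` and the synthetic diagonal band of pixels are assumptions made for illustration.

```python
import math


def principal_axis(pixels):
    """Centroid and unit principal axis of a pixel set given as (x, y) tuples."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Normalised second-order central moments (entries of the covariance matrix).
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Orientation of the largest-eigenvalue eigenvector of [[mu20, mu11], [mu11, mu02]].
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), (math.cos(angle), math.sin(angle))


# Synthetic two-pixel-wide band running along the image diagonal.
pixels = [(i, i) for i in range(20)] + [(i + 1, i) for i in range(20)]
centroid, direction = principal_axis(pixels)
```

The returned vector defines an axis only; its sign (toward or away from the sun) would have to be fixed separately, for example from the solar azimuth.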
[0120] Converting the theoretical astronomical orientation, which belongs to a three-dimensional geographic coordinate system, directly into a pixel projection vector in the two-dimensional image coordinate system would require the precise three-dimensional attitude of the drone gimbal at every moment. However, under the strong katabatic winds of the polar regions the UAV body shakes, so the attitude data collected by RTK-GPS and the IMU lag behind, or drift relative to, the camera shutter exposure time. This attitude compensation error is greatly amplified when detecting small targets and leads to a significant deviation between the theoretically calculated direction of the illumination projection and the actual direction in which shadows extend in the image. Therefore, in this embodiment, after the solar altitude angle and solar azimuth are calculated in step S1.1, the global illumination direction vector is additionally extracted from the bare-rock shadow entities in the image domain.
[0121] S1.3: Texture Vector Extraction
[0122] The Antarctic ice sheet is constantly swept by strong katabatic winds of nearly uniform direction. Over long periods, wind erosion and deposition create large, parallel aeolian snow ridges on the surface. Seen from the air, these ridges reflect light on the windward side and form dark spots on the leeward side. Because of their scale, grayscale and aspect ratio, these dark spots are easily confused with penguins and their shadows, which makes them a major source of background interference and of false detections of small polar targets. To suppress this interference, the data processing terminal performs frequency-domain analysis to extract the inherent texture direction of the snow surface.
[0123] The data processing terminal applies a mask to the static background image to remove the connected pixel set of the bare rock and its accompanying shadow. The remaining pixel area is defined as the pure snow background image.
[0124] Because aeolian snow ridges appear in the spatial domain as large-area, regular parallel textures, the data processing terminal applies a two-dimensional discrete Fourier transform to the pure snow background image $I_{\mathrm{snow}}$ of size $W \times H$ to map it into the frequency domain:
[0125] $$F(u, v) = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} I_{\mathrm{snow}}(x, y)\, e^{-j 2\pi \left( \frac{u x}{W} + \frac{v y}{H} \right)}$$
[0126] where $F(u, v)$ is the complex matrix in the frequency domain; $W$ and $H$ are the width and height of the image, respectively; $u$ and $v$ are the horizontal and vertical coordinate variables in the frequency domain, respectively; and $j$ is the imaginary unit.
[0127] Calculate the frequency-domain amplitude spectrum matrix $M(u, v)$:
[0128] $$M(u, v) = \sqrt{\operatorname{Re}\bigl(F(u, v)\bigr)^2 + \operatorname{Im}\bigl(F(u, v)\bigr)^2}$$
[0129] where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote taking the real and imaginary parts of a complex number, respectively.
[0130] Under a two-dimensional Fourier transform, a strong parallel linear texture in the spatial domain becomes a bright, high-energy band passing through the center of the frequency-domain amplitude spectrum, and the direction of this band is strictly orthogonal to the direction of the spatial texture. To extract this highest-energy direction accurately, the frequency-domain coordinates $(u, v)$ are mapped to polar coordinates $(\rho, \theta)$, where $\rho$ is the radial frequency and $\theta$ is the angular coordinate. The angular energy distribution function $E(\theta)$ is then calculated along the different angular directions:
[0131]
[0132] Here, the lower and upper radial limits of the integral bound the effective high-frequency energy band and exclude the excessively bright DC and low-frequency components at the center of the frequency domain.
[0133] The data processing terminal searches the angular energy distribution function $E(\theta)$ for its global maximum and obtains the angle $\theta_{\max}$ at which the energy is greatest:
[0134] $$\theta_{\max} = \arg\max_{\theta} E(\theta)$$
[0135] Because the principal axis of the frequency-domain energy is orthogonal to the spatial texture direction, the data processing terminal adds a $90^\circ$ phase offset to obtain the true extension angle $\theta_{\mathrm{tex}}$ of the aeolian snow ridges in the spatial domain:
[0136] $$\theta_{\mathrm{tex}} = \theta_{\max} + \frac{\pi}{2}$$
[0137] Based on the calculated true extension angle $\theta_{\mathrm{tex}}$, construct a unit direction vector and define it as the global texture direction vector $\mathbf{v}_{\mathrm{tex}}$:
[0138] $$\mathbf{v}_{\mathrm{tex}} = (\cos\theta_{\mathrm{tex}},\ \sin\theta_{\mathrm{tex}})$$
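The frequency-domain texture analysis of step S1.3 can be sketched on a tiny synthetic image. The sketch uses a naive DFT (adequate only for very small images), accumulates amplitude into discrete angular bins as a stand-in for the angular energy integral, and assumes a radial band from `rho_min` up to the Nyquist radius. The function name, the bin count and the band limits are hypothetical choices.

```python
import cmath
import math


def texture_direction(img, rho_min=1.0, n_bins=36):
    """Unit vector along the dominant parallel texture of a small grayscale image."""
    h, w = len(img), len(img[0])
    rho_max = min(h, w) / 2
    energy = [0.0] * n_bins
    for v in range(h):
        for u in range(w):
            fu = u - w if u >= w / 2 else u  # centred frequency coordinates
            fv = v - h if v >= h / 2 else v
            rho = math.hypot(fu, fv)
            if not rho_min <= rho <= rho_max:
                continue  # skip the DC / low-frequency core and the corners
            coeff = sum(img[y][x] * cmath.exp(-2j * math.pi * (u * x / w + v * y / h))
                        for y in range(h) for x in range(w))
            theta = math.atan2(fv, fu) % math.pi  # fold opposite directions together
            energy[round(theta / math.pi * n_bins) % n_bins] += abs(coeff)
    theta_max = energy.index(max(energy)) * math.pi / n_bins
    theta_tex = theta_max + math.pi / 2  # spatial texture is orthogonal to the band
    return math.cos(theta_tex), math.sin(theta_tex)


n = 16
# Intensity varies only along x, so the ridges themselves run along the y axis.
stripes = [[math.cos(2 * math.pi * 3 * x / n) for x in range(n)] for _ in range(n)]
vx, vy = texture_direction(stripes)
```

For real image sizes an FFT routine would replace the naive double sum; the angular binning and the 90-degree offset would stay the same.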
[0139] Next, this embodiment will describe the specific implementation process of multi-frame directional spatiotemporal integration based on illumination vector in the method for identifying small targets in polar environments.
[0140] S2.1: Directional Integral
[0141] In large-scale, high-altitude aerial photography of the polar regions, tiny objects such as penguins occupy very few pixels, their surface texture is severely degraded, and they are easily lost in the complex background of polar ice and snow. However, the low solar altitude angle acts as an optical amplifier: it stretches the height of a tiny target into a long dark shadow band whose direction of extension is strictly constrained by the global illumination direction vector extracted in step S1.2. Meanwhile, the strong katabatic winds of the polar regions raise blowing snow, which causes transient high-frequency occlusion and random noise in the images. To extract as much of the small-target shadow signal as possible while suppressing this random noise, directional spatial integration of pixels is performed strictly along the illumination direction in the aligned multi-frame images.
[0142] Using the actual relative flight height and the solar altitude angle obtained in step S1.1, calculate the theoretical upper limit of the pixel length of a target shadow in the image coordinate system for the current frame:
[0143]
[0144] Here, the remaining quantities are: a constant representing the average physical height of typical polar animals (such as adult Adélie penguins); the physical focal length constant of the camera in the image acquisition module; and a constant giving the physical size of a pixel on the camera's image sensor.
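Since the formula itself is not legible here, the following is only a plausible sketch of how such an upper limit could be computed from the quantities listed: the ground shadow length is the target height divided by the tangent of the solar altitude angle, and it is converted to pixels with a pinhole ground sampling distance. The function name and all numeric values are illustrative assumptions, not figures from the patent.

```python
import math


def shadow_length_px(target_height_m, sun_alt_deg, flight_height_m, focal_m, pixel_m):
    """Assumed model: ground shadow length divided by the ground sampling distance."""
    shadow_m = target_height_m / math.tan(math.radians(sun_alt_deg))  # length on the ground
    gsd_m = flight_height_m * pixel_m / focal_m                       # metres per pixel
    return shadow_m / gsd_m


# Hypothetical values: 0.7 m target, 10 degree sun, 100 m altitude, 24 mm lens, 4 um pixels.
length = shadow_length_px(0.7, 10.0, 100.0, 0.024, 4.0e-6)
steps = math.floor(length)  # integer number of integration steps
```

The example shows why a low sun acts as an amplifier: the shadow is several times longer than the target is tall, so it spans far more pixels than the body itself.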
[0145] Because the penguin target and the shadow it casts are opaque or weakly reflective, their grayscale values in the image are significantly lower than those of the highly reflective ice and snow background. Using the static background image $B$ extracted in step S1.2, the data processing terminal performs background subtraction on each aligned frame $\tilde I_t$ of the spatially aligned image sequence to obtain the foreground dark-target difference map $D_t$:
[0146] $$D_t(x, y) = \max\bigl(B(x, y) - \tilde I_t(x, y),\ 0\bigr)$$
[0147] The nonlinear truncation directly discards pixels whose difference is negative, i.e., non-target bright spots caused by enhanced snow reflection, and retains only potential dark target bodies and shadows.
[0148] With the global illumination direction vector as a guide, directional spatial integration is performed for every pixel of each frame's foreground dark-target difference map, along the illumination line and over the theoretical shadow length, to obtain the single-frame directional integral feature map:
[0149]
[0150] Here, the number of integration steps is obtained by rounding the theoretical upper limit of the shadow length down to a whole number of pixels, and the summation index is the discrete step index along the illumination line. Because the computed sampling coordinates usually fall at sub-pixel positions, bilinear interpolation is used to obtain the corresponding pixel values.
[0151] For a real penguin target, the resulting shadow is collinear with the global illumination direction vector, so integrating along this specific vector produces a coherent superposition of energy: the originally weak shadow line becomes a very strong energy peak at the end point of the integral, i.e., at the location of the target body. Randomly distributed ice and snow fissures, snowmelt spots and wind-generated noise do not share a strict, global illumination direction, so their energy is dispersed and attenuated during directional integration. The target shadow feature is thereby highlighted in the spatial domain.
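A minimal sketch of the directional integration with bilinear sampling is given below. It assumes that the direction vector points from the target body toward the tip of its shadow and that the integral is an unnormalised sum; both conventions, as well as the function names, are assumptions made for illustration.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolated value at (x, y); samples outside the image count as 0."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xx < w and 0 <= yy < h:
                total += wx * wy * img[yy][xx]
    return total


def directional_integral(diff, v, n_steps):
    """Sum the difference map along direction v, starting at every pixel."""
    vx, vy = v
    return [[sum(bilinear(diff, x + s * vx, y + s * vy) for s in range(n_steps + 1))
             for x in range(len(diff[0]))] for y in range(len(diff))]


# A 5-pixel horizontal shadow whose casting body sits at x = 2 on row 4.
diff = [[0.0] * 9 for _ in range(9)]
for x in range(2, 7):
    diff[4][x] = 1.0
resp = directional_integral(diff, (1.0, 0.0), 4)
peak = max(max(row) for row in resp)
```

The response peaks at the pixel from which the whole shadow lies ahead along the direction vector, which is the coherent-superposition effect described in the text.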
[0152] S2.2: Generate the basic shadow response map
[0153] After the single-frame directional spatial integration of step S2.1, static interfering dark spots are suppressed. Under extreme polar weather, however, fast-moving clumps of snow or local airflow disturbances can still produce transient, shadow-like dark projections in a single frame, so that discrete high pseudo-target responses remain in the single-frame directional integral feature map. To counter this random noise, cross-frame fusion is additionally performed in the temporal dimension to generate a base shadow response map that highlights all shadow features.
[0154] The UAV image acquisition module is configured for a high frame rate of no less than 60 frames per second. Within the time span of a single analysis window, the motion displacement of an individual penguin is extremely small relative to the large-scale aerial field of view. The shadow integral response of the real target exhibits high spatiotemporal persistence in the aligned time series, meaning it persists in the vicinity of the same pixel coordinates. Conversely, the noise response caused by whirlwinds or transient occlusions exhibits high randomness and non-repeatability on the time axis.
[0155] The single-frame directional integral feature maps $R_t$ of all $T$ frames in the sequence are averaged over time to obtain the spatiotemporal integral response map $\bar R$:
[0156] $$\bar R(x, y) = \frac{1}{T} \sum_{t=1}^{T} R_t(x, y)$$
[0157] Time-domain integration further attenuates the energy of transient random noise, while the penguin shadow, which is stable and persistent in space and time, is preserved and reinforced. To remove residual low-frequency background drift and generate the final feature map, the data processing terminal computes the whole-image pixel mean $\mu_R$ and standard deviation $\sigma_R$ of the spatiotemporal integral response map $\bar R$:
[0158] $$\mu_R = \frac{1}{W H} \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \bar R(x, y)$$
[0159] $$\sigma_R = \sqrt{\frac{1}{W H} \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \bigl(\bar R(x, y) - \mu_R\bigr)^2}$$
[0160] where $W$ and $H$ are the width and height of the image, respectively.
[0161] Based on a statistical threshold derived from this mean and standard deviation, hard-truncation activation is performed to generate the base shadow response map:
[0162]
[0163] Here, the coefficient in the threshold is the preset background suppression adjustment coefficient.
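The cross-frame fusion and hard truncation of step S2.2 can be sketched as below. The threshold is assumed to take the common form of mean plus `k` standard deviations, with `k` standing in for the background suppression adjustment coefficient, and responses above it are assumed to be kept unchanged. The function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def base_shadow_map(feature_maps, k=1.0):
    """Temporal mean of per-frame maps, then zero everything at or below mean + k*std."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    avg = [[mean(m[y][x] for m in feature_maps) for x in range(w)] for y in range(h)]
    flat = [val for row in avg for val in row]
    threshold = mean(flat) + k * pstdev(flat)  # assumed form of the statistical threshold
    return [[val if val > threshold else 0.0 for val in row] for row in avg]


# Three frames: a persistent response at (1, 1) and a one-frame transient at (0, 0).
maps = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
for m in maps:
    m[1][1] = 9.0
maps[0][0][0] = 9.0
shadow = base_shadow_map(maps)
```

The transient response is diluted to one third of its value by the temporal mean and then falls below the threshold, while the persistent response is kept.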
[0164] Next, this embodiment will describe in detail the process of generating the multidimensional polar weight map in step S3 after obtaining the basic shadow response map.
[0165] The base shadow response map generated by the multi-frame directional spatiotemporal integration above highlights the dark shadows of targets. In complex polar terrain, however, some snowmelt patches, local depressions in wind-blown snow ridges and ice fissures also happen to appear as discrete dark tones and therefore still produce frequent false-alarm responses in the response map. To filter out these false alarms, this invention further extracts a multidimensional polar weight map to guide the identification of small targets.
[0166] S3.1: Extract the directional consistency weight graph
[0167] A tiny target can be regarded as an upright, opaque cylinder, and the principal axis of the shadow it casts on the ice and snow surface must be strictly parallel to the projection of the sunlight. To measure how well the true extension direction of a local dark spot agrees with the global illumination, this invention performs local structure tensor analysis on the base shadow response map.
[0168] Calculate the horizontal gradient $S_x$ and the vertical gradient $S_y$ of the base shadow response map in two-dimensional image space, and construct the Gaussian-smoothed structure tensor matrix $\mathbf{J}$ within the neighborhood of each pixel:
[0169] $$\mathbf{J} = G_\sigma * \begin{bmatrix} S_x^2 & S_x S_y \\ S_x S_y & S_y^2 \end{bmatrix}$$
[0170] where $G_\sigma$ is a two-dimensional Gaussian smoothing kernel with standard deviation $\sigma$, and $*$ denotes two-dimensional convolution.
[0171] Eigenvalue decomposition of $\mathbf{J}$ yields the unit eigenvector $\mathbf{e}_{\min}$ corresponding to its smallest eigenvalue. Because a shadow appears as a narrow dark band with a strong edge gradient, the direction of the sharpest grayscale change, i.e., the direction of the largest eigenvalue, is perpendicular to the direction in which the shadow extends. The unit eigenvector $\mathbf{e}_{\min}$ corresponding to the smallest eigenvalue therefore indicates the actual spatial extension direction of the principal axis of the local dark spot.
[0172] Taking the global illumination direction vector $\mathbf{v}_L$ obtained in step S1.2, calculate the absolute cosine similarity between the local extension principal axis $\mathbf{e}_{\min}$ and the illumination direction to generate the directional consistency weight map $W_{\mathrm{dir}}$:
[0173] $$W_{\mathrm{dir}}(x, y) = \bigl|\, \mathbf{e}_{\min}(x, y) \cdot \mathbf{v}_L \,\bigr|$$
[0174] The directional consistency weight map expresses the relationship between the extension direction of a local dark spot and the global illumination direction. The closer the included angle is to $0^\circ$ or $180^\circ$, i.e., the more nearly parallel the two directions, the closer the absolute value of the dot product is to 1 and the higher the confidence that the dark spot is the shadow of a real target. Conversely, for ice and snow fissures or bare-rock edges with random orientations, the weight drops sharply.
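A per-pixel sketch of the structure-tensor orientation weight follows. It uses central-difference gradients with clamped borders and a truncated Gaussian window, and obtains the smallest-eigenvalue direction by rotating the closed-form largest-eigenvalue direction by 90 degrees. The function name, window radius and test image are assumptions.

```python
import math


def direction_weight(img, x, y, v, sigma=1.0, radius=2):
    """|cos| between the local streak axis at (x, y) and the unit vector v."""
    h, w = len(img), len(img[0])

    def px(yy, xx):  # clamp coordinates at the image border
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    jxx = jxy = jyy = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            g = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))  # Gaussian window
            gx = (px(y + dy, x + dx + 1) - px(y + dy, x + dx - 1)) / 2
            gy = (px(y + dy + 1, x + dx) - px(y + dy - 1, x + dx)) / 2
            jxx += g * gx * gx
            jxy += g * gx * gy
            jyy += g * gy * gy
    major = 0.5 * math.atan2(2 * jxy, jxx - jyy)  # largest-eigenvalue direction
    ex, ey = math.cos(major + math.pi / 2), math.sin(major + math.pi / 2)
    return abs(ex * v[0] + ey * v[1])


# Horizontal streak two pixels thick in an 8x8 response map.
band = [[1.0 if y in (3, 4) else 0.0 for _ in range(8)] for y in range(8)]
w_parallel = direction_weight(band, 4, 3, (1.0, 0.0))
w_orthogonal = direction_weight(band, 4, 3, (0.0, 1.0))
```

A streak aligned with the illumination vector scores close to 1, and the same streak scores close to 0 against a perpendicular vector.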
[0175] S3.2: Extracting the Polar Morphological Consistency Weight Map
[0176] Within the specific ecological communities of Antarctica, adult penguins exhibit a highly consistent biological scale in body size. Combining the aforementioned relative flight altitude and lighting angle, the aspect ratio and area of a real penguin's projection in the current image frame are definitively determined.
[0177] Together with the calculated theoretical upper limit of the shadow's pixel length, further calculate the theoretical pixel width of the penguin target body:
[0178]
[0179] Here, the constant is the preset average physical width of the target species in the polar region.
[0180] Define the theoretical pixel area and the theoretical pixel aspect ratio of the standard penguin shadow.
[0181] Connected component analysis is performed on the base shadow response map to extract the set of candidate dark-spot connected regions. For any given connected component, calculate the pixel area it actually occupies in the image and its actual aspect ratio, the latter derived from the eigenvalues of its covariance matrix.
[0182] A two-dimensional Gaussian similarity distribution function maps the difference between the dark-spot morphology and the theoretical scale nonlinearly to a weight, generating the polar morphology consistency weight map:
[0183]
[0184] Here, the two tolerance parameters are the allowable tolerances of the area distribution and of the aspect ratio distribution, respectively. Regions whose shape is far too large, such as large snow pits, or far too fragmented, such as isolated noise points, are forcibly suppressed in weight.
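A sketch of a separable two-dimensional Gaussian similarity on area and aspect ratio is shown below. The exact functional form in the patent is not legible here, so the product-of-Gaussians form, the function name and all numbers are assumptions.

```python
import math


def morphology_weight(area, aspect, area_ref, aspect_ref, sigma_area, sigma_aspect):
    """Gaussian similarity between a blob's (area, aspect ratio) and the reference pair."""
    da = (area - area_ref) / sigma_area
    dr = (aspect - aspect_ref) / sigma_aspect
    return math.exp(-0.5 * (da * da + dr * dr))


# Hypothetical reference shadow: 40 px area, aspect ratio 5.
w_match = morphology_weight(40, 5.0, 40, 5.0, 10.0, 1.0)
w_large_pit = morphology_weight(400, 1.2, 40, 5.0, 10.0, 1.0)
w_noise = morphology_weight(2, 1.0, 40, 5.0, 10.0, 1.0)
```

Blobs that match the theoretical shadow score 1, while very large pits and tiny noise specks both fall toward 0.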
[0185] S3.3: Extracting the spatial clustering weight map of the community
[0186] Polar penguins exhibit extremely typical gregarious characteristics, meaning that while individuals are distributed in high-density clusters on a macroscopic scale, they also maintain a safe distance of a standard neck extension length on a microscopic scale in order to defend against each other's aggression.
[0187] To represent this spatial prior of the biological community mathematically, the geometric centroid coordinates of all candidate connected components are first extracted as a set. Using the relative flight altitude of the drone, the theoretical safe distance of the target species in pixels at the current image scale is then calculated:
[0188]
[0189] Here, the constant is the physical safe distance inherent in the biology of penguins.
[0190] A Gaussian mixture model is used in which each candidate centroid serves as the center of one component and the covariance matrix is built with the theoretical pixel safe distance as the scale constraint. The spatial density distribution field of community candidate points is then calculated over the entire map:
[0191]
[0192] Here, the coefficient is the preset aggregation adjustment coefficient.
[0193] The community spatial aggregation weight map is generated by exponential normalization:
[0194]
[0195] Here, the density gain constant is preset. For dark spots at the center or edge of a high-density penguin colony, where the density field is high, the weight in the community spatial aggregation weight map approaches its upper limit. Isolated dark spots scattered across vast ice fields far from any colony, such as occasional isolated rock fragments or iceberg shadows, have no similar responses in their neighborhood that satisfy the safe-distance distribution. Their density field value is therefore extremely low, and their confidence in the weight map is strongly attenuated.
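The density field and its exponential normalization can be sketched as follows. The sketch assumes isotropic Gaussian components whose standard deviation is the aggregation coefficient times the pixel safe distance, and a saturating map of the form 1 - exp(-gamma * density). These forms, the names `kappa` and `gamma`, and the coordinates are assumptions, since the original formulas are not legible.

```python
import math


def aggregation_weight(point, centroids, safe_dist_px, kappa=2.0, gamma=0.5):
    """Saturating weight derived from a sum of isotropic Gaussians around all centroids."""
    s2 = (kappa * safe_dist_px) ** 2
    density = sum(math.exp(-((point[0] - cx) ** 2 + (point[1] - cy) ** 2) / (2 * s2))
                  for cx, cy in centroids)
    return 1.0 - math.exp(-gamma * density)


colony = [(10, 10), (14, 10), (10, 14), (14, 14)]
isolated = (200, 200)
centroids = colony + [isolated]
w_colony = aggregation_weight(colony[0], centroids, safe_dist_px=4)
w_isolated = aggregation_weight(isolated, centroids, safe_dist_px=4)
```

With these assumed forms a lone candidate still receives its own contribution, so only the relative ordering (colony member above isolated spot) should be read from the example.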
[0196] Step S4: Cross-validate guidance information and dynamically adjust the weight graph.
[0197] Relying on prior features from a single dimension alone is highly susceptible to false positives. For example, a randomly oriented ice or snow fissure may, over a particular local segment, score highly on directional consistency. The edges of large dark patches of melting snow may occasionally match the morphological characteristics of penguins. Some clusters of scattered bare rocks, because they are densely distributed in space, may trigger a high response in the spatial aggregation dimension. To avoid interference from these false positives, this invention introduces a cross-validation mechanism that dynamically adjusts the weight maps of the three dimensions.
[0198] For tiny targets in the polar regions, the shadows they cast on the ice and snow surfaces must simultaneously and rigorously satisfy two local physical and optical prerequisites: parallel to sunlight and conforming to the theoretical body size of the species.
[0199] The extracted directional consistency weight map $W_{\mathrm{dir}}$ and polar morphology consistency weight map $W_{\mathrm{morph}}$ are coupled nonlinearly at the pixel level to calculate the joint confidence matrix $C_{\mathrm{joint}}$ of local individuals:
[0200] $$C_{\mathrm{joint}}(x, y) = \sqrt{W_{\mathrm{dir}}(x, y) \cdot W_{\mathrm{morph}}(x, y)}$$
[0201] The pixel-wise geometric mean implements a logical AND constraint between the two weights. In a polar background, if the shape of a dark spot closely resembles a penguin shadow but its direction of extension contradicts the sunlight direction, the geometric mean forces its joint confidence down to nearly zero. This operation effectively eliminates discrete background noise that satisfies only a single physical dimension.
[0202] In polar ecosystems, penguins inside a colony often cast distorted or incomplete shadows because of mutual pushing, occlusion and frequent trampling of the snow, so a reasonable decline in the joint confidence of some individuals is to be expected. Conversely, a penguin standing alone and undisturbed on an open ice field far from the colony casts a shadow of nearly ideal shape and direction.
[0203] This invention uses the community spatial aggregation weight map extracted in step S3.3 as an environmental prior and constructs a dynamic constraint threshold matrix for each pixel:
[0204]
[0205] Here, the first constant is a preset upper limit for the penalty of isolated targets, with a preferred value close to ...; the second constant is a preset lower bound for the tolerance of penguin colony aggregation. When a pixel lies within a high-density penguin colony, the dynamic threshold automatically falls to the lower bound, which relaxes the strict requirements on the shape and orientation of dark spots in that region and accommodates shadow distortion caused by occlusion. When a pixel lies in a barren polar area, the dynamic threshold automatically rises to the upper limit, which means that dark spots there must show strong optical and morphological shadow characteristics; otherwise they are judged to be isolated stones or ice blocks.
[0206] After the joint confidence of local individuals and the dynamic constraint threshold matrix are obtained, a biased logistic regression function is applied pixel by pixel as a nonlinear activation over the entire image to calculate the final cross-validation weight map:
[0207]
[0208] Here, the remaining parameters are preset gating activation parameters. The activation function outputs a high weight only when the joint confidence of a local individual exceeds the dynamic threshold, which is regulated in real time by the density of the biological community. For spurious target responses that fail to exceed the corresponding threshold, the weight is compressed exponentially toward zero.
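The cross-validation logic of step S4 can be sketched for a single pixel. The geometric mean follows the text; the linear interpolation of the threshold between an upper and a lower constant and the logistic gate with steepness `beta` are assumed forms, and all parameter values are hypothetical.

```python
import math


def cross_validation_weight(w_dir, w_morph, w_agg, t_high=0.8, t_low=0.4, beta=20.0):
    """Gate the joint (direction x morphology) confidence by a density-driven threshold."""
    joint = math.sqrt(w_dir * w_morph)             # geometric mean acts like a soft AND
    threshold = t_high - (t_high - t_low) * w_agg  # assumed linear interpolation
    return 1.0 / (1.0 + math.exp(-beta * (joint - threshold)))


w_in_colony = cross_validation_weight(0.6, 0.6, w_agg=1.0)   # distorted shadow, dense colony
w_isolated = cross_validation_weight(0.6, 0.6, w_agg=0.0)    # same shadow, barren area
w_wrong_shape = cross_validation_weight(0.95, 0.0, w_agg=1.0)  # fails one dimension entirely
```

The same moderately distorted shadow passes the gate inside a colony and is rejected in a barren area, and a candidate that fails either single dimension is rejected regardless of density.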
[0209] The final cross-validation weight map $W_{\mathrm{cv}}$ obtained after multi-dimensional dynamic correction is applied to the base shadow response map $S_{\mathrm{base}}$ generated in step S2 by pixel-wise multiplication (Hadamard product), giving the final spatial guidance map $G_{\mathrm{guide}}$:
[0210] $$G_{\mathrm{guide}} = W_{\mathrm{cv}} \odot S_{\mathrm{base}}$$
[0211] After these steps, the strong false-positive responses originally caused by polar blizzard disturbances, snowmelt pits and large-scale ice-sheet fissures are suppressed in the spatial guidance map.
[0212] The spatial guidance map obtained through the multidimensional cross-validation and dynamic correction of step S4 forms a high-confidence spatial attention prior mask for tiny polar targets. The data processing terminal can fuse this clean spatial guidance map directly with the original penguin images of the multi-frame video image sequence at the spatial feature level and feed the result into a conventional target detection network from the existing technology, such as a standard YOLO-series or Faster R-CNN detection head, to detect small targets.
[0213] However, considering that raw images from Antarctica's extreme climate and bare rock terrain are often blurred by severe atmospheric disturbances, relying solely on spatial probability guidance may be limited by the insufficient visual features of the target entity. Therefore, in a preferred embodiment, this invention improves both the front-end image quality and the back-end detection network: firstly, a generative adversarial network (GAN) is used to preprocess the raw image for high-frequency detail enhancement; then, the enhanced image and the spatial guidance map are fused using multi-source features, and finally input into the improved YOLOv5-H model.
[0214] S5: Multi-source feature fusion and YOLO object detection
[0215] In the extreme climate and bare rock terrain of Antarctica, images of penguins taken by drones from high altitudes often appear blurry. Relying solely on a single visual feature or a single physical prior of shadows can easily lead to missed or false detections. Therefore, this embodiment utilizes a multi-source feature fusion mechanism, combined with a specially designed YOLOv5-H model, to achieve the identification and localization of tiny penguin targets.
[0216] S5.1: Image Enhancement Preprocessing Based on Generative Adversarial Networks (GANs)
[0217] To overcome the loss of image sharpness caused by strong winds and atmospheric disturbances in Antarctica, the data processing terminal first builds an image enhancement module from a Generative Adversarial Network (GAN) and uses it to preprocess the original penguin images of the input sequence.
[0218] The GAN module contains a generator network and a discriminator network. The generator uses an encoder-decoder structure to map the disturbed, blurred original image to an enhanced image with sharp edges and clear textures. The discriminator is then used to distinguish the generated enhanced images from high-resolution real sample images of the polar regions.
[0219] By constructing an adversarial loss and a content perception loss, the generator and the discriminator are driven toward a Nash equilibrium during adversarial training. The objective function is expressed as:
[0220]
[0221] Here, the expectation operator denotes the mathematical expectation, and the weighting coefficient is the content loss weight. After this processing step, the boundary contours of the tiny penguin targets are restored in the output enhanced image.
[0222] S5.2: Multi-source feature spatial attention fusion
[0223] Although image enhancement improves visual clarity, the grayscale features of penguins are still easily obscured by the complex polar background of ice, snow and bare rock. The data processing terminal therefore uses the spatial guidance map output in step S4 as a prior and fuses it with the enhanced image as multi-source features.
[0224] Consider the pixel value of the enhanced image at a given pixel coordinate and channel. Multi-source feature fusion is carried out by scalar multiplication, and the fused image tensor is calculated as:
[0225]
[0226] Here, the coefficient is the prior fusion coefficient.
[0227] S5.3: Feature Extraction and Object Detection Based on YOLOv5-H Model
[0228] The fused image tensor is fed into a dedicated YOLOv5-H model. To address the slight blurring typical of Antarctic penguin images, the YOLOv5-H model embeds a feature enhancement module (CSPBS) and a deep feature extraction module (NDConv) into the backbone of the original YOLOv5s network architecture.
[0229] The feature enhancement module (CSPBS) addresses the problem that small targets, because of their severely insufficient resolution, easily lose their inherent appearance features during deep convolutional downsampling. The CSPBS module uses a two-branch convolutional structure with residual connections to strengthen feature representation while preserving essential details. The input feature map of the CSPBS module is processed as follows.
[0230] First convolutional branch: Performs a single convolutional dimensionality reduction operation on the input features to extract features from the basic receptive field.
[0231]
[0232] Second convolutional branch: applies two successive convolution operations to the input features to capture the contextual semantics of small targets within a multi-scale receptive field. The output of the second convolution is:
[0233]
[0234] Residual connection structure: at any feature map location, the multi-level features extracted by the different convolution kernels are concatenated and fused along the channel dimension and combined with the input features through a residual connection. The feature fusion is defined as follows:
[0235]
[0236]
[0237] Here, the result is the fused feature vector at the given location; the concatenation operator denotes splicing along the channel dimension; and the final convolution operation is used to unify the number of channels. This structure helps avoid vanishing gradients and feature sparsity in deep networks, which would otherwise arise because the penguin features are so small.
[0238] The deep feature extraction module (NDConv) addresses the highly nonlinear entanglement of polar bare rocks, ice fissures and penguin colonies in the deep semantic space, which conventional static convolution kernels struggle to capture simultaneously. The NDConv module uses a dynamic convolution kernel mechanism that adapts the convolution parameters to the input features. Specifically, for the input features of the NDConv module, a set of attention weights that depends on the content of the current polar image is first computed dynamically using Global Average Pooling (GAP) and a Multilayer Perceptron (MLP). The module then forms a dynamic linear combination of several static expert convolution kernels, weighted by these attention weights, to generate dynamic convolution parameters specific to the polar features of the current frame:
[0239]
[0240] This dynamic convolution kernel is then used to perform deep feature computation on the input features.
[0241] During the model training phase, the parameters of the NDConv module and of the entire YOLOv5-H network are updated by gradient descent.
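The kernel-mixing idea can be illustrated with a minimal sketch. It assumes the attention weights come from a softmax over logits (standing in for the GAP and MLP output, which is not modelled here) and that the experts are small 2D kernels; the function name and the toy kernels are hypothetical.

```python
import math


def dynamic_kernel(logits, experts):
    """Softmax-weighted linear combination of equally sized expert kernels."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(wk * k[r][c] for wk, k in zip(weights, experts)) for c in range(cols)]
            for r in range(rows)]


experts = [[[1.0, 0.0], [0.0, 1.0]],   # expert kernel 0
           [[0.0, 1.0], [1.0, 0.0]]]   # expert kernel 1
balanced = dynamic_kernel([0.0, 0.0], experts)   # equal attention on both experts
selective = dynamic_kernel([50.0, 0.0], experts)  # attention almost entirely on expert 0
```

Equal logits average the experts, whereas a dominant logit reproduces a single expert, which is how the mixed kernel adapts to the content of each frame.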
[0242] After detail enhancement by CSPBS and deep dynamic feature extraction by NDConv, the final feature map is fed into the YOLO target detection head. The detection head outputs the identification, classification and localization results for penguin targets in the fused polar feature space. The detection result set $\mathcal{D}$ contains the location coordinates, bounding box size and confidence score of each individual penguin:
[0243] $$\mathcal{D} = \{(x_i, y_i, w_i, h_i, c_i)\}_{i=1}^{N}$$
[0244] where $(x_i, y_i)$ are the center coordinates of the bounding box of the $i$-th penguin target; $w_i$ and $h_i$ are the width and height of the bounding box, respectively; $c_i$ is the confidence score that the predicted box belongs to a penguin; and $N$ is the total number of targets detected in a single image frame.
[0245] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, databases or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0246] In this specification, the same or similar parts of the various embodiments can be cross-referenced. Each embodiment focuses on its differences from the other embodiments. In particular, the later embodiments are described relatively briefly; for the relevant parts, refer to the descriptions of the foregoing embodiments.
[0247] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for identifying small targets in a polar environment, characterized in that the method comprises:
acquiring a multi-frame video image sequence and synchronized physical parameters in a polar environment, and calculating the solar elevation angle at the current moment; performing spatial alignment and temporal filtering on the multi-frame video image sequence to extract a static background image, and extracting a global illumination direction vector from the static background image;
performing, in the aligned multi-frame images, directional spatial integration along the global illumination direction vector, and performing cross-frame fusion in the temporal dimension to generate a basic shadow response map;
performing multi-feature extraction on the basic shadow response map to obtain a directional consistency weight map, a polar morphology consistency weight map, and a community spatial aggregation weight map;
wherein the directional consistency weight map is extracted as follows: the Gaussian-smoothed structure tensor matrix of the basic shadow response map is calculated within each pixel neighborhood; eigenvalue decomposition is performed on it to obtain the unit eigenvector corresponding to the smallest eigenvalue; and the absolute cosine similarity between this unit eigenvector and the global illumination direction vector is calculated to generate the directional consistency weight map;
wherein the polar morphology consistency weight map is extracted as follows: candidate dark-spot connected components are extracted from the basic shadow response map; for each connected component, the actual pixel area it occupies and its aspect ratio are calculated; and the polar morphology consistency weight map is generated using a two-dimensional Gaussian similarity distribution function of these two quantities, whose parameters are the theoretical pixel area and theoretical pixel aspect ratio of a standard penguin shadow, together with the area distribution tolerance and the aspect-ratio distribution tolerance;
wherein the community spatial aggregation weight map is extracted as follows: the set of geometric centroid coordinates of all candidate connected components is extracted and a spatial density distribution field is constructed, whose parameters are the number of candidate connected components, the centroid coordinates of each connected component, an aggregation adjustment coefficient, and a theoretical pixel safety distance; and the community spatial aggregation weight map is generated from the density field by exponential normalization with a density gain constant;
calculating a local individual joint confidence by combining the directional consistency weight map and the polar morphology consistency weight map; constructing a dynamic constraint threshold from the community spatial aggregation weight map, and performing cross-validation and dynamic correction to obtain a final weight map; fusing the final weight map with the basic shadow response map to generate a spatial guidance map; and
inputting the spatial guidance map and the original image into a target detection network, which outputs the target recognition and localization results.
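The three weight maps of claim 1 can be illustrated per pixel or per connected component with the minimal sketch below. The formulas did not survive in this text, so everything here is an assumption: the Gaussian form `exp(-0.5 * (za^2 + zr^2))` for morphology, the sum-of-Gaussians density field with width `alpha * d_safe`, and the normalization `1 - exp(-gamma * density)` are plausible readings of "two-dimensional Gaussian similarity" and "exponential normalization", not the claimed expressions. All function and parameter names are hypothetical.

```python
import math


def min_eigvec(jxx, jxy, jyy):
    """Unit eigenvector for the smallest eigenvalue of the symmetric
    2x2 structure tensor [[jxx, jxy], [jxy, jyy]]."""
    # Angle of the largest-eigenvalue axis: tan(2*theta) = 2*jxy / (jxx - jyy).
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    # The smallest-eigenvalue eigenvector is perpendicular to that axis.
    return (-math.sin(theta), math.cos(theta))


def directional_weight(v, d):
    """Absolute cosine similarity between unit vector v and a non-zero
    illumination direction d."""
    return abs(v[0] * d[0] + v[1] * d[1]) / math.hypot(d[0], d[1])


def morphology_weight(area, aspect, area_ref, aspect_ref, sigma_area, sigma_aspect):
    """ASSUMED 2D Gaussian similarity between a component's (area, aspect
    ratio) and the reference penguin-shadow values."""
    za = (area - area_ref) / sigma_area
    zr = (aspect - aspect_ref) / sigma_aspect
    return math.exp(-0.5 * (za * za + zr * zr))


def aggregation_weight(point, centroids, alpha, d_safe, gamma):
    """ASSUMED density field (sum of Gaussians of width alpha * d_safe around
    each centroid) followed by ASSUMED normalization 1 - exp(-gamma * density)."""
    scale = 2.0 * (alpha * d_safe) ** 2
    density = sum(
        math.exp(-((point[0] - cx) ** 2 + (point[1] - cy) ** 2) / scale)
        for cx, cy in centroids
    )
    return 1.0 - math.exp(-gamma * density)
```

In a full pipeline the tensor entries `jxx`, `jxy`, `jyy` would come from Gaussian-smoothed products of image gradients at each pixel; that smoothing step is omitted here to keep the sketch dependency-free.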
2. The method for identifying small targets in a polar environment according to claim 1, characterized in that extracting the global illumination direction vector from the static background image comprises: extracting, from the static background image, the set of connected-component pixels consisting of large bare rocks and their accompanying long shadows; calculating the central moments of this pixel set and constructing the covariance matrix of the connected component from these moments and the zeroth-order central moment; and performing eigenvalue decomposition on the covariance matrix and extracting the unit eigenvector corresponding to the largest eigenvalue as the global illumination direction vector.
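The moment-based direction estimate of claim 2 can be sketched as follows. This is a minimal illustration that assumes the connected component is given as a list of (x, y) pixel coordinates of a binary region, so that the zeroth-order moment equals the pixel count, and that the covariance matrix is the second-order central moments divided by that count. The function name `illumination_direction` is hypothetical.

```python
import math


def illumination_direction(pixels):
    """Unit eigenvector for the largest eigenvalue of the covariance matrix
    of a binary connected component given as (x, y) pixel coordinates."""
    m00 = len(pixels)                        # zeroth-order moment (pixel count)
    cx = sum(x for x, _ in pixels) / m00     # centroid
    cy = sum(y for _, y in pixels) / m00
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    # Covariance entries (ASSUMED normalization by m00).
    cxx, cxy, cyy = mu20 / m00, mu11 / m00, mu02 / m00
    # Closed-form principal-axis angle of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (math.cos(theta), math.sin(theta))
```

An eigenvector is only defined up to sign, so the result gives the axis of the elongated rock-and-shadow region; choosing which way along that axis the light travels needs extra information such as the solar azimuth.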
3. The method for identifying small targets in a polar environment according to claim 1, characterized in that performing directional spatial integration along the global illumination direction vector comprises: calculating the theoretical upper limit of the target shadow length from the real relative height and the solar elevation angle, using a target average physical height constant, a camera physical focal length constant, and a pixel physical size constant; performing background subtraction on the aligned multi-frame images to obtain a foreground dark-target difference map; and performing directional spatial integration guided by the global illumination direction vector, with the integration length set by the theoretical upper limit of the shadow length, to obtain a single-frame directional integral feature map, the integration being a sum over a step index.
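A minimal sketch of the two steps in claim 3 follows. The shadow-length formula is an assumption: it takes a pinhole camera looking straight down from the relative height, so a ground length of `target_h / tan(elevation)` projects to `ground_len * focal_len / (rel_height * pixel_size)` pixels. The integration is likewise assumed to be a plain sum of difference-map samples taken at unit steps along the illumination direction with nearest-pixel rounding. All names are hypothetical.

```python
import math


def shadow_length_px(target_h, focal_len, pixel_size, rel_height, sun_elev_rad):
    """ASSUMED upper limit of shadow length in pixels (nadir pinhole model)."""
    ground_len = target_h / math.tan(sun_elev_rad)   # shadow length on the ground
    return ground_len * focal_len / (rel_height * pixel_size)


def directional_integral(diff, direction, length):
    """Sum diff-map samples at steps s = 0..int(length) along `direction`.

    diff is a 2D list indexed as diff[y][x]; samples outside the image
    contribute nothing.
    """
    h, w = len(diff), len(diff[0])
    norm = math.hypot(direction[0], direction[1])
    dx, dy = direction[0] / norm, direction[1] / norm
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for s in range(int(length) + 1):          # s is the step index
                xs = int(round(x + s * dx))
                ys = int(round(y + s * dy))
                if 0 <= xs < w and 0 <= ys < h:
                    acc += diff[ys][xs]
            out[y][x] = acc
    return out
```

Under these assumptions a low sun (small elevation angle) lengthens the integration window, which is what lets a faint, elongated shadow accumulate a strong response while compact dark noise does not.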
4. The method for identifying small targets in a polar environment according to claim 3, characterized in that performing cross-frame fusion in the temporal dimension to generate the basic shadow response map comprises: performing time-domain mean aggregation over the single-frame directional integral feature maps of all frames contained in the sequence to obtain a spatiotemporal integral response map; computing the whole-image mean pixel value and standard deviation of the spatiotemporal integral response map; and generating the basic shadow response map by statistical threshold truncation, in which a background suppression adjustment coefficient is applied.
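The fusion and truncation of claim 4 can be sketched as below. The exact truncation rule is not recoverable from this text, so the sketch assumes a threshold of mean plus `k` times the standard deviation, with pixels at or below the threshold set to zero and the rest kept unchanged; `k` stands in for the background suppression adjustment coefficient. The function name is hypothetical.

```python
import statistics


def base_shadow_response(frames, k):
    """Average per-frame integral maps over time, then zero out pixels that
    do not exceed mean + k * std of the averaged map (ASSUMED rule)."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    # Time-domain mean aggregation across all frames.
    mean_map = [[sum(f[y][x] for f in frames) / n for x in range(w)]
                for y in range(h)]
    flat = [v for row in mean_map for v in row]
    mu = statistics.fmean(flat)        # whole-image mean
    sigma = statistics.pstdev(flat)    # whole-image (population) std deviation
    threshold = mu + k * sigma
    return [[v if v > threshold else 0.0 for v in row] for row in mean_map]
```

A larger `k` suppresses more of the background at the cost of discarding weaker shadow responses.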
5. The method for identifying small targets in a polar environment according to claim 1, characterized in that the local individual joint confidence and the dynamic constraint threshold are each calculated by a formula whose terms are the directional consistency weight map, the polar morphology consistency weight map, the community spatial aggregation weight map, a preset upper-limit constant for the isolated-target penalty, and a preset lower-limit constant for the community aggregation tolerance.
6. The method for identifying small targets in a polar environment according to claim 5, characterized in that the final weight map and the spatial guidance map are each calculated by a formula whose terms include a gating activation parameter and the basic shadow response map.
7. A system for identifying small targets in a polar environment, characterized in that the system is used to perform the method for identifying small targets in a polar environment according to claim 1, the system comprising:
an acquisition module, which acquires a multi-frame video image sequence and synchronized physical parameters in a polar environment, calculates the solar elevation angle at the current moment, performs spatial alignment and temporal filtering on the multi-frame video image sequence to extract a static background image, and extracts the global illumination direction vector from the static background image;
a weight generation module, which performs directional spatial integration along the global illumination direction vector in the aligned multi-frame images, performs cross-frame fusion in the temporal dimension to generate a basic shadow response map, and extracts multiple features from the basic shadow response map to obtain a directional consistency weight map, a polar morphology consistency weight map, and a community spatial aggregation weight map;
a weight correction module, which calculates the local individual joint confidence by combining the directional consistency weight map and the polar morphology consistency weight map, constructs the dynamic constraint threshold from the community spatial aggregation weight map, performs cross-validation and dynamic correction to obtain the final weight map, and fuses the final weight map with the basic shadow response map to generate a spatial guidance map; and
a recognition module, which inputs the spatial guidance map and the original image into the target detection network and outputs the target recognition and localization results.