A target detection method, device and computer readable storage medium
By generating Gaussian kernel maps and multi-scale sample spaces, the problem of low detection accuracy of directional targets in remote sensing images is solved, enabling accurate localization of targets in complex backgrounds and densely distributed targets, thus improving detection accuracy and generalization ability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SUZHOU UNIV
- Filing Date
- 2026-03-12
- Publication Date
- 2026-06-19
AI Technical Summary
Existing remote sensing image orientation target detection methods suffer from low detection accuracy, especially when dealing with complex backgrounds and densely distributed targets, making accurate positioning difficult.
By generating a Gaussian kernel map based on the oriented bounding box of each target in the remote sensing image sample, calculating the Gaussian kernel scaling factor and hierarchical label, a multi-scale sample space is generated and used as supervision information to input the target detection model for iterative training, thus establishing a general mapping relationship between sparse real labels and dense prediction results.
It improves the accuracy and generalization ability of target detection in remote sensing images, reduces missed detections, false detections and positioning errors, and is suitable for various target detection tasks.
Figure CN122244702A_ABST
Description
Technical Field
[0001] This invention relates to the field of image target detection technology, and in particular to a target detection method, apparatus, and computer-readable storage medium.
Background Technology
[0002] Oriented target detection in remote sensing images is a core technology for remote sensing image analysis and interpretation. It enables the precise location and classification of targets in remote sensing images. Based on the geometric features, texture, and spectral characteristics of remote sensing images, it identifies and locates targets with arbitrary orientations, dense arrangements, and different scales, providing crucial support for subsequent remote sensing information extraction and applications. Unlike target detection in natural images, target detection in remote sensing images must cope with arbitrary target orientation, complex background interference, and dense distribution. Addressing these challenges lays an important foundation for the efficient utilization and intelligent analysis of remote sensing data.
[0003] Existing remote sensing image orientation target detection methods rely on image-level labels to guide the model to implicitly learn target locations. Image-level labels only indicate that a certain type of target is present in the image, without providing clues such as the target's location and quantity. The model can only indirectly infer the possible location of the target based on the overall image features. The learning process depends on the statistical regularity of data distribution and cannot learn the target's edge and contour features. When the image coverage is wide and the background is complex, the model struggles even more to learn the target's location features, leading to low target detection accuracy. Furthermore, existing technologies use sparse orientation bounding boxes for target labeling, which essentially describe the target's location and shape using only a few point coordinates and width / height information. The model outputs a dense feature map, where each pixel corresponds to a target with predicted probabilities, offsets, and angles. It is difficult to establish a universal mapping relationship between the dense feature map predictions and the location bounding boxes. In addition, existing target detection methods ignore the multi-scale characteristics of targets in remote sensing images. Using fixed-scale feature mapping methods cannot adapt to the location requirements of targets of different sizes. All these factors affect the accuracy of target detection.
[0004] In summary, existing methods for oriented target detection in remote sensing images suffer from low detection accuracy.
Summary of the Invention
[0005] Therefore, the technical problem to be solved by the present invention is to overcome the problem of low detection accuracy in the existing remote sensing image orientation target detection methods.
[0006] To address the aforementioned technical problem, this invention provides a target detection method, comprising:
For each target in a remote sensing image sample, obtaining the linear size of the target as the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated from the target's oriented bounding box; and obtaining the layer identifier to which the target belongs based on the linear size of the target and the scale range of each level of the feature pyramid in the target detection model.
Calculating the Gaussian kernel scaling factor of each target based on its linear size, its layer identifier, and the number of levels in the feature pyramid; and, using a two-dimensional Gaussian matrix, generating the Gaussian kernel distribution matrix of the target on the feature map of its level based on the target's oriented bounding box and the Gaussian kernel scaling factor.
Accumulating the Gaussian kernel distribution matrix of each target into the feature layer matrix of the location sample space of the corresponding level feature map; and, in the region where the value of the target's Gaussian kernel distribution matrix is greater than a preset threshold, assigning the oriented bounding box parameters of the target to the corresponding region of the scale matrix of the scale sample space of that level feature map.
Obtaining the multi-scale sample space of the remote sensing image sample from the location sample spaces and scale sample spaces of all level feature maps.
Inputting the remote sensing image samples and their multi-scale sample spaces into the target detection model, outputting multi-scale prediction results, constructing a target detection loss function, and iteratively training the target detection model until the value of the target detection loss function converges.
Using the trained target detection model to perform oriented target detection on remote sensing images.
[0007] Preferably, the scale range of the $i$-th level in the feature pyramid, $i = 1, \dots, L$, where $L$ is the number of levels in the feature pyramid, is bounded by the preset standard scale endpoint values of that level (the range expression and the inequalities below are not reproduced in this text). The layer identifier is determined by comparing the linear size of the target with these endpoint values: under the first condition, the layer identifier of the target is 1; under the second condition, the layer identifier of the target is $L$; under the third condition, the layer identifiers of the target are $i-1$ and $i$.
[0008] Preferably, the Gaussian kernel scaling factor $k$ of the target is calculated by a formula (not reproduced in this text) whose terms are: the balance coefficient; the scale matching term; the natural constant $e$; the preset standard size of the target's level $i$; and the scale deviation. The mean vector of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\mu = (x_c, y_c)^{\top}$, where $(x_c, y_c)$ are the coordinates of the center point of the target's oriented bounding box, $x_c$ being the abscissa and $y_c$ the ordinate. The covariance matrix of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\Sigma = k\,R(\theta)\,S\,R(\theta)^{\top}$, where $k$ is the Gaussian kernel scaling factor; $R(\theta)$ is the standard two-dimensional rotation matrix; $\theta$ is the rotation angle of the target; $S$ is the scale matrix, determined by the width $w$ and the height $h$ of the target's oriented bounding box; and $R(\theta)^{\top}$ is the transpose of the standard two-dimensional rotation matrix.
[0009] Preferably, the target detection model includes an encoder for multi-scale feature extraction of input remote sensing image samples and a decoder for outputting multi-scale prediction results based on multi-scale feature maps, wherein the encoder includes a backbone network and a feature pyramid, and the decoder includes two mutually coupled standard convolutional branches.
[0010] Preferably, the encoder performs multi-scale feature extraction on the input remote sensing image samples and outputs the multi-scale feature maps $F$, represented as $F = \{F_i\}_{i=1}^{L} = E(I) = \mathrm{FPN}(\mathrm{Backbone}(I))$, where $F_i \in \mathbb{R}^{H_i \times W_i \times C}$ is the feature map of the $i$-th layer; $H_i$ and $W_i$ are the height and width of $F_i$; $C$ is the number of channels of the feature map; $L$ is the number of levels in the feature pyramid; $I$ is the remote sensing image sample; $E$ is the encoder; $\mathrm{Backbone}$ is the backbone network; and $\mathrm{FPN}$ is the feature pyramid. The multi-scale prediction results $P$ output by the decoder are represented as $P = D(F) = (P^{loc}, P^{scale})$, where $D$ is the decoder; $P^{loc} = \{P^{loc}_i\}_{i=1}^{L}$ is the location prediction result, a list containing $L$ elements; $P^{loc}_i \in \mathbb{R}^{H_i \times W_i}$ is the location prediction result on the feature map of the $i$-th layer, a two-dimensional matrix giving the probability that each pixel in the $i$-th feature layer is the center of a target; $P^{scale} = \{P^{scale}_i\}_{i=1}^{L}$ is the scale prediction result, a list containing $L$ elements; and $P^{scale}_i \in \mathbb{R}^{H_i \times W_i \times 5}$ is the scale prediction result on the feature map of the $i$-th layer, where 5 is the number of parameters of the target's oriented bounding box.
[0011] Preferably, the target detection loss function is constructed by summing the cross-entropy distance function between the location prediction result of the remote sensing image sample and the location sample space, and the cross-entropy distance function between the scale prediction result of the remote sensing image sample and the scale sample space, to obtain the target detection loss function.
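[0012] Preferably, the target detection loss function $\mathcal{L}$ is represented as $\mathcal{L} = \mathrm{CE}(P^{loc}, Y^{loc}) + \mathrm{CE}(P^{scale}, Y^{scale})$, where $P^{loc}$ and $P^{scale}$ are the location and scale prediction results; $Y^{loc}$ is the location sample space; $Y^{scale}$ is the scale sample space; and $\mathrm{CE}$ is the cross-entropy distance function.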
[0013] Preferably, target detection is performed on remote sensing images using a trained target detection model, including: A new target detector is constructed based on the remote sensing image to be detected, and the parameters of the trained target detection model are loaded into the backbone network and neck network of the new target detector. The parameters of the new target detector are fine-tuned using a new remote sensing image training set. Only the parameters of the detector head in the target detector are updated, while the parameters of the backbone network and feature pyramid are fixed, resulting in the fine-tuned target detector. The remote sensing image to be detected is input into the fine-tuned target detector, which outputs the predicted bounding box parameters and their class probabilities. Redundant predicted bounding boxes are filtered using a non-maximum suppression algorithm to obtain the target localization and detection results.
[0014] The present invention also provides a target detection device, comprising: The target identifier layer acquisition module is used to obtain the linear size of each target based on the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated by the oriented bounding box in the remote sensing image sample; and to obtain the layer identifier to which the target belongs based on the linear size of the target and the scale range of each level in the feature pyramid in the target detection model. The Gaussian kernel distribution matrix acquisition module is used to calculate the Gaussian kernel scaling factor of each target based on its linear size, layer identifier, and number of layers in the feature pyramid; and using a two-dimensional Gaussian matrix, based on the target's oriented bounding box and the Gaussian kernel scaling factor, to generate the Gaussian kernel distribution matrix of the target on the feature map of its respective layer. The multi-scale sample space filling module is used to accumulate the Gaussian kernel distribution matrix of each target on its respective level feature map to the feature layer matrix of the position sample space of the level feature map; based on the region where the value of the target in the Gaussian kernel distribution matrix on its respective level feature map is greater than a preset threshold, the region corresponding to the scale matrix of the scale sample space of the level feature map is assigned as the orientation bounding box parameter of the target. The multi-scale sample space acquisition module is used to obtain the multi-scale sample space of remote sensing image samples based on the location sample space and scale sample space of all level feature maps. 
The model training and acquisition module is used to input remote sensing image samples and their multi-scale sample space into the target detection model, output multi-scale prediction results and construct the target detection loss function, thereby iteratively training the target detection model until the value of the target detection loss function converges, and using the trained target detection model to perform directional target detection on remote sensing images.
[0015] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the target detection method described above.
[0016] The target detection method provided in this application has the following beneficial effects. The method uses a feature map association algorithm to obtain the layer identifier and Gaussian kernel scaling factor of each target from the oriented bounding box annotations of the targets in remote sensing image samples. Then, using a positive sample definition algorithm, it obtains a multi-scale sample space for the remote sensing image samples from these annotations, layer identifiers, and Gaussian kernel scaling factors. Finally, the multi-scale sample space is input into the target detection model as supervisory information, together with the remote sensing image samples. Because the supervisory information is generated directly from the oriented bounding box annotations of the targets, the localization cues are more accurate, and the model can learn target contours and edge features directly from the supervisory information. Furthermore, converting sparse oriented bounding boxes into dense Gaussian kernel representations establishes a universal mapping between sparse ground truth labels and dense prediction results. This ensures that the sparse bounding box labels of oriented targets of different shapes and locations are stably and accurately mapped to the corresponding pixels of the dense feature map, so that the model's predicted output corresponds accurately to the real labels and localization bias is significantly reduced. The Gaussian kernel scaling factor balances the positive sample resources of targets at different scales. For densely distributed small targets, the scaling factor adapts to the target size and generates dense supervision signals that focus on target details, strengthening the learning of localization features for small targets. For large targets, the scaling factor adaptively expands the supervision range, taking into account both the overall outline and the local details of the target and avoiding localization offset. This addresses the insufficient localization of small targets and the excessive resource consumption of large targets in remote sensing scenarios, improves target detection accuracy, reduces fuzzy localization, missed detections, and localization errors, and enhances the localization accuracy and generalization ability of oriented target detection, making the method suitable for various target detection tasks.
Attached Figure Description
[0017] To make the content of this invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, wherein:
Figure 1 is a flowchart of the target detection method provided in this application;
Figure 2 is the target detection network architecture diagram provided in this application;
Figure 3 is a flowchart of the feature map association algorithm provided in this application;
Figure 4 is a flowchart of the positive sample definition algorithm provided in this application;
Figure 5 is a schematic diagram of the multi-scale prediction results of the target detection model provided in this application at different feature layers, wherein (a) is a remote sensing image sample containing multi-scale target oriented bounding box labels, and (b), (c), (d) and (e) are schematic representations of the prediction results of the first, second, third and fourth layer feature maps, respectively;
Figure 6 is a schematic diagram comparing the target detection model provided in this application with an existing target detection model on different remote sensing images, wherein (a), (c), (e), (g) and (i) show the detection results of the existing target detection model on the first, second, third, fourth and fifth remote sensing images, respectively, and (b), (d), (f), (h) and (j) show the detection results of the target detection model provided in this application on the same five images, respectively.
Detailed Implementation
[0018] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.
[0019] The target detection method provided in this application utilizes a feature map association algorithm to obtain the layer identifier and Gaussian kernel scaling factor of each target based on the oriented bounding box annotations of each target in remote sensing image samples. Using a positive sample definition algorithm, a multi-scale sample space of remote sensing image samples is obtained based on the oriented bounding box annotations, layer identifiers, and Gaussian kernel scaling factors of each target in the remote sensing image samples. Finally, the multi-scale sample space is used as supervision information and input together with the remote sensing image samples into the target detection model. Supervision information is generated directly based on the oriented bounding box annotations of the targets, improving the accurate localization cues of the targets. This allows the model to directly learn the target contours and edge features based on the supervision information. By converting sparse oriented bounding boxes into dense Gaussian kernel representations, a general mapping relationship between sparse true labels and dense prediction results is established. Simultaneously, the Gaussian kernel scaling factor is combined to balance the positive sample resources of targets at different scales, addressing the problems of insufficient localization of small targets and excessive resource consumption by large targets in remote sensing scenes, thereby improving target detection accuracy.
[0020] Specifically, please refer to Figures 1, 2, 3 and 4, where Figure 1 is a flowchart of the target detection method provided in this application, Figure 2 is the target detection network architecture diagram provided in this application, Figure 3 is a flowchart of the feature map association algorithm provided in this application, and Figure 4 is a flowchart of the positive sample definition algorithm provided in this application. The target detection method provided in this application specifically includes steps S10 to S50:
S10: For each target in the remote sensing image sample, the linear size of the target is obtained as the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated from the target's oriented bounding box; the layer identifier to which the target belongs is obtained based on the linear size of the target and the scale range of each level of the feature pyramid in the target detection model.
[0021] Specifically, the width and height of the remote sensing image samples are both 256, and the number of channels is 3. During the training phase, all target categories are uniformly regarded as foreground categories, ignoring the differences between specific categories and focusing only on the location information.
[0022] Let the oriented bounding box parameters of a single target in a remote sensing image sample be $(x_c, y_c, w, h, \theta)$, where $(x_c, y_c)$ are the coordinates of the center point of the oriented bounding box, $w$ and $h$ are its width and height, and $\theta$ is its rotation angle. If the number of non-zero pixels in the Gaussian kernel map generated from the oriented bounding box is $n$, meaning that the area of the target is approximately $n$, then its linear size is $\sqrt{n}$.
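The linear size computation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `linear_size` and the example kernel map values are hypothetical and do not come from the application.

```python
import math


def linear_size(kernel_map):
    """Return sqrt(n), where n is the number of non-zero pixels in the map."""
    n = sum(1 for row in kernel_map for value in row if value != 0)
    return math.sqrt(n)


# Hypothetical 4x4 Gaussian kernel map with nine non-zero pixels.
kernel_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.5, 0.1, 0.0],
    [0.5, 1.0, 0.5, 0.0],
    [0.1, 0.5, 0.1, 0.0],
]
print(linear_size(kernel_map))  # 3.0
```

Counting non-zero pixels approximates the target area, so the square root gives a single length-like quantity that can be compared with per-level scale ranges.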
[0023] Furthermore, the feature pyramid levels are traversed to determine whether the linear size of the target falls within the scale range of each level. Specifically, the scale range of the $i$-th level in the feature pyramid, $i = 1, \dots, L$, where $L$ is the number of levels in the feature pyramid, is bounded by the preset standard scale endpoint values of that level, which correspond to the anchor box size of each level (the range expression and the inequalities below are not reproduced in this text).
[0024] If the linear size of the target satisfies the first condition, the layer identifier of the target is 1.
[0025] If the linear size of the target satisfies the second condition, the layer identifier of the target is $L$.
[0026] If the linear size of the target satisfies the third condition, the layer identifiers of the target are $i-1$ and $i$.
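The level traversal can be illustrated with a short Python sketch. Because the exact range expression and conditions are not available in this text, the sketch makes loud assumptions: each level has a closed range `(low, high)`, adjacent ranges overlap so that a target may receive two adjacent identifiers, and sizes outside all ranges are clamped to the first or last level. The name `assign_levels` and the values in `SCALE_RANGES` are hypothetical.

```python
def assign_levels(size, scale_ranges):
    """Return the 1-based identifiers of every level whose range contains size."""
    levels = [
        i
        for i, (low, high) in enumerate(scale_ranges, start=1)
        if low <= size <= high
    ]
    if not levels:
        # Assumption: sizes outside every range go to the nearest end level.
        levels = [1] if size < scale_ranges[0][0] else [len(scale_ranges)]
    return levels


# Hypothetical overlapping ranges for a four-level pyramid (L = 4).
SCALE_RANGES = [(0, 24), (16, 48), (32, 96), (64, 256)]

print(assign_levels(10, SCALE_RANGES))   # [1]
print(assign_levels(20, SCALE_RANGES))   # [1, 2]
print(assign_levels(300, SCALE_RANGES))  # [4]
```

With overlapping ranges, a target of intermediate size is supervised on two adjacent levels, which matches the case in which the layer identifiers are $i-1$ and $i$.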
[0027] S20: Calculate the Gaussian kernel scaling factor for each target based on its linear size, layer identifier, and number of layers in the feature pyramid; using a two-dimensional Gaussian matrix, generate the Gaussian kernel distribution matrix of the target on its corresponding layer feature map based on the target's oriented bounding box and the Gaussian kernel scaling factor.
[0028] Specifically, the Gaussian kernel scaling factor $k$ of the target is calculated by a formula (not reproduced in this text) whose terms are: the balance coefficient, i.e., the absolute scaling factor; the scale matching term; the natural constant $e$; the preset standard size of the target's level $i$; and the scale deviation.
[0029] The mean vector of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\mu = (x_c, y_c)^{\top}$, where $(x_c, y_c)$ are the coordinates of the center point of the target's oriented bounding box, $x_c$ being the abscissa and $y_c$ the ordinate. The covariance matrix of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\Sigma = k\,R(\theta)\,S\,R(\theta)^{\top}$, where $k$ is the Gaussian kernel scaling factor, which controls the extent of the covariance of the Gaussian distribution so that the generated distribution matches the scale characteristics of the target's level; $R(\theta)$ is the standard two-dimensional rotation matrix; $\theta$ is the rotation angle of the target; $S = \mathrm{diag}(\sigma_x^2, \sigma_y^2)$ is the scale matrix, which holds the variances of the Gaussian distribution along the x-axis and y-axis before rotation, $\sigma_x^2$ being the variance along the x-axis and $\sigma_y^2$ the variance along the y-axis, set from the width $w$ and the height $h$ of the target's oriented bounding box; and $R(\theta)^{\top}$ is the transpose of the standard two-dimensional rotation matrix.
[0030] Furthermore, a single-scale sample space can be generated from the mean vector $\mu$ and the covariance matrix $\Sigma$. The single-scale sample space is obtained by a plotting function that renders the two-dimensional normal distribution with parameters $\mu$ and $\Sigma$ onto a canvas of a given size, specified by the width and the height of the canvas. The probability density function of the two-dimensional normal distribution is $f(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\mu)^{\top}\Sigma^{-1}(\mathbf{x}-\mu)\right)$, where $f(\mathbf{x})$ is the probability density value at coordinates $\mathbf{x}$; $\frac{1}{2\pi\sqrt{|\Sigma|}}$ is the normalization constant; $\pi$ is pi; $|\Sigma|$ is the determinant of the covariance matrix; $\exp$ is the exponential function; $\mathbf{x}-\mu$ is the deviation vector; $\mathbf{x}$ is the currently calculated pixel coordinate vector; $\mu$ is the mean vector of the Gaussian distribution; and $\Sigma^{-1}$ is the inverse of the covariance matrix.
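The generation of a rotated Gaussian kernel on a canvas can be sketched as follows. This is not the applicant's implementation. It assumes that the scaling factor multiplies the rotated covariance, and that the pre-rotation standard deviations are half the box width and half the box height; neither choice is confirmed by the text. The function names are hypothetical.

```python
import math


def covariance(w, h, theta, k):
    """Entries (a, b, d) of the symmetric matrix k * R * S * R^T = [[a, b], [b, d]].

    Assumption: S = diag((w / 2) ** 2, (h / 2) ** 2).
    """
    sx2, sy2 = (w / 2.0) ** 2, (h / 2.0) ** 2
    c, s = math.cos(theta), math.sin(theta)
    a = k * (c * c * sx2 + s * s * sy2)
    b = k * (c * s * (sx2 - sy2))
    d = k * (s * s * sx2 + c * c * sy2)
    return a, b, d


def gaussian_kernel(box, k, width, height):
    """Evaluate the 2D normal density of an oriented box on a width x height canvas."""
    xc, yc, w, h, theta = box
    a, b, d = covariance(w, h, theta, k)
    det = a * d - b * b
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    kernel = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - xc, y - yc
            # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) for a 2x2 Sigma.
            q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
            row.append(norm * math.exp(-0.5 * q))
        kernel.append(row)
    return kernel


# A box of width 8 and height 4 centred at (8, 8), unrotated, with k = 1.
kernel = gaussian_kernel((8, 8, 8, 4, 0.0), 1.0, 16, 16)
print(kernel[8][10] > kernel[10][8])  # True: the kernel is elongated along x
```

Rotating the same box by $\pi/2$ swaps the elongation to the y-axis, which is the behaviour expected of the $R(\theta)\,S\,R(\theta)^{\top}$ construction.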
[0031] S30: Accumulate the Gaussian kernel distribution matrix of each target on its respective level feature map to the feature layer matrix of the position sample space of the level feature map; based on the region where the value of the target in the Gaussian kernel distribution matrix on its respective level feature map is greater than a preset threshold, assign the region corresponding to the scale matrix of the scale sample space of the level feature map as the orientation bounding box parameter of the target.
[0032] It should be noted that the location sample space is $Y^{loc} = \{Y^{loc}_i\}_{i=1}^{L}$, where $Y^{loc}_i \in \mathbb{R}^{H_i \times W_i}$ is the location sample space on the $i$-th layer feature map; $L$ is the total number of layers in the feature pyramid network, with a value range of [3,5]; $H_i$ is the height of the feature map of the $i$-th layer; and $W_i$ is its width. The scale sample space is $Y^{scale} = \{Y^{scale}_i\}_{i=1}^{L}$, where $Y^{scale}_i \in \mathbb{R}^{H_i \times W_i \times 5}$ is the scale sample space on the $i$-th layer feature map, and 5 is the number of parameters $(x_c, y_c, w, h, \theta)$ of the oriented bounding box.
[0033] Specifically, if the target belongs to level $i$, the feature layer matrix $Y^{loc}_i$ of the corresponding $i$-th level feature map is indexed from the location sample space $Y^{loc}$, and the Gaussian kernel distribution matrix $G$ of the target is added to it element-wise, i.e., $Y^{loc}_i(x, y) \leftarrow Y^{loc}_i(x, y) + G(x, y)$, where $(x, y)$ are the pixel coordinates of the feature layer matrix. If the Gaussian kernel distribution matrices of multiple targets overlap at the same pixel coordinates, the accumulated value at that position can be greater than 1, and no normalization is required. After accumulation, the updated feature layer matrix of the $i$-th level feature map is stored back into the location sample space.
[0034] Furthermore, a preset threshold $T = 0.1$ is set, and the Gaussian kernel distribution matrix $G$ is traversed to determine, for every pixel coordinate $(x, y)$, whether $G(x, y)$ is greater than $T$. If $G(x, y) > T$, the coordinate $(x, y)$ belongs to the effective localization region of the current target; otherwise, the coordinate belongs to the background region and no assignment is performed. The $i$-th level scale matrix $Y^{scale}_i$ is indexed from the scale sample space $Y^{scale}$. At coordinates within the effective localization region, the five channels of $Y^{scale}_i$ are assigned the ground truth oriented bounding box parameters of the current target; at coordinates in the background region, the five channels of $Y^{scale}_i$ keep their initial value of 0 and are not modified.
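The accumulation and threshold assignment of step S30 can be sketched on a single level as follows. This is an illustrative sketch only: the function name `fill_sample_spaces`, the 3x3 canvas, and the kernel values are hypothetical, and the kernel is passed in directly rather than generated from a box.

```python
def fill_sample_spaces(loc, scale, kernel, box, threshold=0.1):
    """Accumulate kernel into loc; write box into scale where kernel > threshold."""
    for y, row in enumerate(kernel):
        for x, value in enumerate(row):
            loc[y][x] += value           # element-wise sum, no normalisation
            if value > threshold:        # effective localization region
                scale[y][x] = list(box)  # five oriented bounding box parameters
    return loc, scale


H, W = 3, 3
loc = [[0.0] * W for _ in range(H)]
scale = [[[0.0] * 5 for _ in range(W)] for _ in range(H)]

# Hypothetical kernel values and box parameters (xc, yc, w, h, theta).
kernel = [[0.05, 0.3, 0.05], [0.3, 1.0, 0.3], [0.05, 0.3, 0.05]]
box = (1.0, 1.0, 2.0, 1.0, 0.0)

fill_sample_spaces(loc, scale, kernel, box)
fill_sample_spaces(loc, scale, kernel, box)  # a second, fully overlapping target

print(loc[1][1])    # 2.0, accumulated values may exceed 1
print(scale[0][0])  # [0.0, 0.0, 0.0, 0.0, 0.0], background stays at 0
```

In this sketch a later target overwrites the scale entry of an earlier one where their effective regions overlap; the text does not state how such conflicts are resolved, so that behaviour is an assumption.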
[0035] S40: Based on the location sample space and scale sample space of all level feature maps, obtain the multi-scale sample space of remote sensing image samples.
[0036] S50: Input the remote sensing image samples and their multi-scale sample space into the target detection model, output the multi-scale prediction results and construct the target detection loss function, and then iteratively train the target detection model until the value of the target detection loss function converges. Use the trained target detection model to perform directional target detection on the remote sensing image.
[0037] Specifically, the target detection model includes an encoder for multi-scale feature extraction of input remote sensing image samples and a decoder for outputting multi-scale prediction results based on multi-scale feature maps. The encoder includes a backbone network and a feature pyramid, and the decoder includes two coupled standard convolutional branches. The backbone network supports CNN or Transformer architecture.
[0038] Furthermore, the encoder performs multi-scale feature extraction on the input remote sensing image samples and outputs the multi-scale feature maps $F$, represented as $F = \{F_i\}_{i=1}^{L} = E(I) = \mathrm{FPN}(\mathrm{Backbone}(I))$, where $F_i \in \mathbb{R}^{H_i \times W_i \times C}$ is the feature map of the $i$-th layer; $H_i$ and $W_i$ are the height and width of $F_i$; $C$ is the number of channels of the feature map; $L$ is the number of levels in the feature pyramid; $I$ is the remote sensing image sample; $E$ is the encoder; $\mathrm{Backbone}$ is the backbone network; and $\mathrm{FPN}$ is the feature pyramid. The multi-scale prediction results $P$ output by the decoder are represented as $P = D(F) = (P^{loc}, P^{scale})$, where $D$ is the decoder; $P^{loc} = \{P^{loc}_i\}_{i=1}^{L}$ is the location prediction result, a list containing $L$ elements; $P^{loc}_i \in \mathbb{R}^{H_i \times W_i}$ is the location prediction result on the feature map of the $i$-th layer, a two-dimensional matrix giving the probability that each pixel in the $i$-th feature layer is the center of a target; $P^{scale} = \{P^{scale}_i\}_{i=1}^{L}$ is the scale prediction result, a list containing $L$ elements; and $P^{scale}_i \in \mathbb{R}^{H_i \times W_i \times 5}$ is the scale prediction result on the feature map of the $i$-th layer, where 5 is the number of parameters of the target's oriented bounding box.
[0039] Furthermore, the construction of the target detection loss function in step S50 includes: the sum of the cross-entropy distance function between the location prediction result of the remote sensing image sample and the location sample space, and the cross-entropy distance function between the scale prediction result of the remote sensing image sample and the scale sample space, to obtain the target detection loss function.
[0040] Specifically, the target detection loss function Represented as: , in, Represents the location sample space; Represents the scale of the sample space; This represents the cross-entropy distance function.
[0041] The cross-entropy distance function is calculated as $\mathrm{CE}(A, B) = -\left[ B \log A + (1 - B) \log (1 - A) \right]$, where A is the predicted value, B is the actual value, and log is the natural logarithm. The term $B \log A$ represents the loss when the current pixel is the target, and the term $(1 - B) \log (1 - A)$ represents the loss when the current pixel is the background.
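The loss described above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation of this application: the function names, averaging over pixels, clamping predictions with `eps` to keep the logarithm finite, and representing each level as a two-dimensional list with values in [0, 1] are all assumptions.

```python
import math


def cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel cross-entropy between two equally shaped 2D lists."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for a, b in zip(pred_row, target_row):
            a = min(max(a, eps), 1.0 - eps)      # assumed clamp, keeps log finite
            target_term = b * math.log(a)        # loss when the pixel is the target
            background_term = (1.0 - b) * math.log(1.0 - a)  # loss when it is background
            total += -(target_term + background_term)
            count += 1
    return total / count


def detection_loss(loc_pred, loc_space, scale_pred, scale_space):
    """Sum of the location and scale cross-entropy terms over all pyramid levels."""
    loc_loss = sum(cross_entropy(p, s) for p, s in zip(loc_pred, loc_space))
    scale_loss = sum(cross_entropy(p, s) for p, s in zip(scale_pred, scale_space))
    return loc_loss + scale_loss


# One pyramid level with a single pixel, for illustration.
loss = detection_loss([[[0.5]]], [[[1.0]]], [[[0.5]]], [[[0.0]]])
print(loss)
```

In this sketch each argument of `detection_loss` is a list with one two-dimensional map per pyramid level, so the two cross-entropy terms are accumulated level by level before being added together.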
[0042] Optionally, performing target detection on remote sensing images using the trained target detection model includes: constructing a new target detector for the remote sensing image to be detected, and loading the parameters of the trained target detection model into the backbone network and neck network of the new target detector. Specifically, the target detector can be chosen according to the specific task, for example Oriented R-CNN.
[0043] The parameters of the new target detector are fine-tuned using a new remote sensing image training set. Only the parameters of the detector head in the target detector are updated, while the parameters of the backbone network and feature pyramid are fixed, resulting in the fine-tuned target detector. The remote sensing image to be detected is input into the fine-tuned target detector, which outputs the predicted bounding box parameters and their class probabilities. Redundant predicted bounding boxes are filtered using a non-maximum suppression algorithm to obtain the target localization and detection results.
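The final filtering step can be illustrated with a greedy non-maximum suppression sketch. Computing the overlap of rotated boxes requires polygon intersection, so this sketch substitutes axis-aligned boxes (x1, y1, x2, y2) as a simplification; the function names and the threshold of 0.5 are assumptions and are not taken from this application.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of 81/119, which exceeds the assumed threshold, so it is treated as redundant, while the third box does not overlap and is retained.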
[0044] To verify the effectiveness of the target detection method provided in this application, embodiments of this application also visualize the localization prediction results for multi-scale targets obtained with the above scheme. Figure 5 shows the multi-scale prediction results of the target detection model provided in this application at different feature layers, where Figure 5(a) is a remote sensing image sample containing oriented bounding box labels of multi-scale targets; Figure 5(b) is a schematic diagram of the prediction result of the first-layer feature map, which has high resolution and corresponds to a shallow feature layer; Figure 5(c) is a schematic diagram of the prediction result of the second-layer feature map, which corresponds to a medium-resolution feature layer; Figure 5(d) is a schematic diagram of the prediction result of the third-layer feature map; and Figure 5(e) is a schematic diagram of the prediction result of the fourth-layer feature map, which has the lowest resolution and corresponds to a deep feature layer.
[0045] As can be seen from Figure 5(a), the input remote sensing image contains targets of various scales, including densely distributed small-scale targets and isolated large-scale targets, and the targets may be rotated at arbitrary angles. As can be seen from Figure 5(b), the localization prediction result of the first, high-resolution feature layer is presented as a Gaussian kernel heatmap. The bright regions of the heatmap are concentrated on small-scale targets, and the Gaussian kernel distribution closely matches the contour and position of each small target, indicating that the feature map of this layer accurately captures the localization information of small-scale targets. As can be seen from Figure 5(c), the predicted heatmap of the second, medium-resolution feature layer focuses on medium-scale targets, and the Gaussian kernel range adapts to their size, covering such targets completely and localizing them accurately. As can be seen from Figure 5(d), in the prediction result of the third, medium-resolution deep feature layer, the heatmap distribution for medium to large targets closely matches the rotation angle and contour shape of each target, demonstrating the ability to localize irregularly shaped targets. As can be seen from Figure 5(e), the predicted heatmap of the fourth, low-resolution deep feature layer completely covers the large-scale target regions, and the Gaussian kernel is evenly distributed with a range that fits the outline of each large target, showing that the large receptive field of this feature map effectively captures the localization information of large-scale targets. Therefore, by combining multi-scale feature layers with the Gaussian kernel representation, this application matches feature layers of different resolutions to targets of the corresponding scales for accurate localization. Each feature layer learns the localization patterns of targets at its corresponding scale, which verifies the effectiveness of the above method. This provides accurate prior knowledge for downstream oriented target detection tasks in remote sensing, thereby improving the accuracy and generalization ability of target detection.
[0046] Figure 6 shows a comparison of the target detection results of the target detection model provided in this application and an existing target detection model on different remote sensing images, where Figure 6(a), Figure 6(c), Figure 6(e), Figure 6(g), and Figure 6(i) are schematic diagrams of the detection results of the existing target detection model on the first, second, third, fourth, and fifth remote sensing images, respectively; and Figure 6(b), Figure 6(d), Figure 6(f), Figure 6(h), and Figure 6(j) are schematic diagrams of the detection results of the target detection model provided in this application on the first, second, third, fourth, and fifth remote sensing images, respectively.
[0047] As can be seen from Figure 6, the target detection model provided in this application effectively addresses the missed detections, false detections, positioning deviations, and scale mismatches of existing methods. In typical remote sensing scenarios included in the DOTA dataset, such as dense small targets, long strip-shaped targets, dense vehicles, large-scale facilities, and dense small buildings, it shows better detection performance, which verifies the effectiveness of the target detection model provided in this application in improving the accuracy and generalization ability of oriented target detection.
[0048] Furthermore, this embodiment also compares, on different datasets, the detection results of an existing baseline model with the detection results obtained before and after introducing the localization-based general representation learning network provided in this application into the model. The detection results are shown in Table 1.
[0049] As shown in Table 1, the model achieved significant performance improvements on all three datasets after the localization-based general representation learning network was added, especially on the DIOR-R dataset, which contains large-scale targets and targets with extreme aspect ratios. Without the localization-based general representation learning network, performance improved only slightly over the baseline model; after the network was added, the mAP value improved further.
[0050] Based on the target detection method provided in the above embodiments, this application also provides a target detection device, which specifically includes the following modules. The target identifier layer acquisition module is used to obtain the linear size of each target in the remote sensing image sample as the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated from the oriented bounding box of that target, and to obtain the layer identifier to which the target belongs based on the linear size of the target and the scale range of each level in the feature pyramid of the target detection model.
[0051] The Gaussian kernel distribution matrix acquisition module is used to calculate the Gaussian kernel scaling factor of each target based on its linear size, its layer identifier, and the number of levels in the feature pyramid; and to generate, using a two-dimensional Gaussian matrix and based on the target's oriented bounding box and Gaussian kernel scaling factor, the Gaussian kernel distribution matrix of the target on the feature map of its respective level.
[0052] The multi-scale sample space filling module is used to accumulate the Gaussian kernel distribution matrix of each target on its respective level feature map into the feature layer matrix of the location sample space of that level feature map; and to assign, based on the region of the Gaussian kernel distribution matrix of the target on its respective level feature map whose values are greater than a preset threshold, the oriented bounding box parameters of the target to the corresponding region of the scale matrix of the scale sample space of that level feature map.
[0053] The multi-scale sample space acquisition module is used to obtain the multi-scale sample space of remote sensing image samples based on the location sample space and scale sample space of all level feature maps.
[0054] The model training and acquisition module is used to input remote sensing image samples and their multi-scale sample space into the target detection model, output multi-scale prediction results and construct the target detection loss function, thereby iteratively training the target detection model until the value of the target detection loss function converges, and using the trained target detection model to perform directional target detection on remote sensing images.
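The sample-space construction carried out by the above modules can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation of this application: all function names are hypothetical; the Gaussian kernel scaling factor is passed in as `scale` rather than computed; the scale matrix is assumed to be diag((w/2)^2, (h/2)^2); kernel values below `eps` are truncated to zero so that non-zero pixels can be counted; and the level ranges in the usage lines are invented values.

```python
import math


def rotated_gaussian(height, width, cx, cy, w, h, theta, scale, eps=1e-3):
    """Rotated 2D Gaussian on a height x width grid with peak value 1 at (cx, cy)."""
    var_u = scale * (w / 2.0) ** 2   # assumed variance along the box width axis
    var_v = scale * (h / 2.0) ** 2   # assumed variance along the box height axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            u = cos_t * dx + sin_t * dy    # offset rotated into the box frame
            v = -sin_t * dx + cos_t * dy
            value = math.exp(-0.5 * (u * u / var_u + v * v / var_v))
            row.append(value if value >= eps else 0.0)  # truncate the tail to zero
        grid.append(row)
    return grid


def linear_size(kernel):
    """Arithmetic square root of the number of non-zero pixels in the kernel map."""
    return math.sqrt(sum(1 for row in kernel for v in row if v != 0.0))


def assign_levels(size, ranges):
    """1-based identifiers of every level whose (low, high) range contains size."""
    return [i + 1 for i, (low, high) in enumerate(ranges) if low <= size <= high]


def fill_sample_space(loc, scl, kernel, box_params, threshold):
    """Accumulate the kernel into loc; write box_params into scl above threshold."""
    for y, row in enumerate(kernel):
        for x, v in enumerate(row):
            loc[y][x] += v
            if v > threshold:
                scl[y][x] = list(box_params)


# Usage with made-up values: one 6 x 2 target centered on a 9 x 9 feature map.
kernel = rotated_gaussian(9, 9, cx=4, cy=4, w=6, h=2, theta=0.0, scale=1.0)
loc = [[0.0] * 9 for _ in range(9)]
scl = [[None] * 9 for _ in range(9)]
fill_sample_space(loc, scl, kernel, (4, 4, 6, 2, 0.0), threshold=0.5)
levels = assign_levels(linear_size(kernel), [(0, 8), (6, 16), (12, 64)])
```

Because the hypothetical level ranges overlap, `assign_levels` can return two adjacent identifiers for a target whose linear size lies in the overlap, mirroring the case in which a target belongs to both level i-1 and level i.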
[0055] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the target detection method described above.
[0056] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0057] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0058] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0059] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0060] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A target detection method, characterized in that it comprises: obtaining the linear size of each target in a remote sensing image sample as the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated from the oriented bounding box of that target, and obtaining the layer identifier to which the target belongs based on the linear size of the target and the scale range of each level in the feature pyramid of the target detection model; calculating the Gaussian kernel scaling factor of each target based on its linear size, its layer identifier, and the number of levels in the feature pyramid, and generating, using a two-dimensional Gaussian matrix and based on the target's oriented bounding box and Gaussian kernel scaling factor, the Gaussian kernel distribution matrix of the target on its respective level feature map; accumulating the Gaussian kernel distribution matrix of each target on its respective level feature map into the feature layer matrix of the location sample space of that level feature map; assigning, based on the region of the Gaussian kernel distribution matrix of the target on its respective level feature map whose values are greater than a preset threshold, the oriented bounding box parameters of the target to the corresponding region of the scale matrix of the scale sample space of that level feature map; obtaining the multi-scale sample space of the remote sensing image sample based on the location sample space and scale sample space of all level feature maps; and inputting the remote sensing image sample and its multi-scale sample space into the target detection model, outputting multi-scale prediction results, constructing a target detection loss function, iteratively training the target detection model until the value of the target detection loss function converges, and using the trained target detection model to perform oriented target detection on remote sensing images.
2. The target detection method according to claim 1, characterized in that the scale range of the i-th level in the feature pyramid is defined by the preset standard scale endpoint values of that level, where L denotes the number of levels in the feature pyramid; and, depending on where the linear size of the target falls relative to these endpoint values, the layer identifier to which the target belongs is 1, is L, or is both i-1 and i.
3. The target detection method according to claim 2, characterized in that the Gaussian kernel scaling factor of the target is calculated from a balance coefficient, a scale matching term, the natural constant $e$, the preset standard size of the level i to which the target belongs, and a scale deviation; the mean vector $\mu$ of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\mu = (x_c, y_c)^{\top}$, where $(x_c, y_c)$ denotes the coordinates of the center point of the target's oriented bounding box, $x_c$ denotes the abscissa of the center point, and $y_c$ denotes the ordinate of the center point; and the covariance matrix $\Sigma$ of the Gaussian kernel distribution matrix of the target on its respective level feature map is $\Sigma = \sigma \, R(\theta) \, \Lambda \, R(\theta)^{\top}$, where $\sigma$ denotes the Gaussian kernel scaling factor; $R(\theta)$ denotes the standard two-dimensional rotation matrix; $\theta$ denotes the rotation angle of the target; $\Lambda$ denotes the scale matrix, determined by the width $w$ and the height $h$ of the target's oriented bounding box; and $R(\theta)^{\top}$ denotes the transpose of the standard two-dimensional rotation matrix.
4. The target detection method according to claim 1, characterized in that the target detection model includes an encoder for multi-scale feature extraction of input remote sensing image samples and a decoder for outputting multi-scale prediction results based on multi-scale feature maps; the encoder includes a backbone network and a feature pyramid, and the decoder includes two coupled standard convolutional branches.
5. The target detection method according to claim 4, characterized in that the encoder extracts multi-scale features from the input remote sensing image sample and outputs a multi-scale feature map $F$, represented as $F = \{F_i\}_{i=1}^{L} = E(I) = \mathrm{FPN}(\mathrm{Backbone}(I))$, where $F_i \in \mathbb{R}^{H_i \times W_i \times C}$ denotes the feature map of the i-th layer; $H_i$ and $W_i$ denote the height and width of $F_i$; $C$ denotes the number of channels in the feature map; $L$ denotes the number of levels in the feature pyramid; $I$ denotes the remote sensing image sample; $E$ denotes the encoder; $\mathrm{Backbone}$ denotes the backbone network; and $\mathrm{FPN}$ denotes the feature pyramid; and the multi-scale prediction result $P$ output by the decoder is represented as $P = D(F) = \{P^{loc}, P^{scale}\}$, where $D$ denotes the decoder; $P^{loc} = \{P^{loc}_i\}_{i=1}^{L}$ denotes the location prediction result, a list containing L elements; $P^{loc}_i \in \mathbb{R}^{H_i \times W_i}$ denotes the location prediction result on the feature map of the i-th layer, a two-dimensional matrix giving the probability that each pixel in the i-th feature layer is the center of a target; $H_i$ and $W_i$ denote the height and width of the feature map at layer i; $P^{scale} = \{P^{scale}_i\}_{i=1}^{L}$ denotes the scale prediction result, a list containing L elements; $P^{scale}_i \in \mathbb{R}^{H_i \times W_i \times K}$ denotes the scale prediction result on the feature map of the i-th layer; and $K$ denotes the number of parameters of the target's oriented bounding box.
6. The target detection method according to claim 5, characterized in that the target detection loss function is constructed by summing the cross-entropy distance function between the location prediction result of the remote sensing image sample and the location sample space, and the cross-entropy distance function between the scale prediction result of the remote sensing image sample and the scale sample space.
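7. The target detection method according to claim 6, characterized in that the target detection loss function $\mathcal{L}$ is represented as $\mathcal{L} = \mathrm{CE}(P^{loc}, S^{loc}) + \mathrm{CE}(P^{scale}, S^{scale})$, where $P^{loc}$ and $P^{scale}$ denote the location prediction result and the scale prediction result; $S^{loc}$ denotes the location sample space; $S^{scale}$ denotes the scale sample space; and $\mathrm{CE}$ denotes the cross-entropy distance function.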
8. The target detection method according to claim 1, characterized in that using the trained target detection model to perform target detection on remote sensing images includes: constructing a new target detector for the remote sensing image to be detected, and loading the parameters of the trained target detection model into the backbone network and neck network of the new target detector; fine-tuning the parameters of the new target detector using a new remote sensing image training set, wherein only the parameters of the detector head in the target detector are updated while the parameters of the backbone network and feature pyramid are fixed, to obtain the fine-tuned target detector; inputting the remote sensing image to be detected into the fine-tuned target detector, which outputs the predicted bounding box parameters and their class probabilities; and filtering redundant predicted bounding boxes using a non-maximum suppression algorithm to obtain the target localization and detection results.
9. A target detection device, characterized in that it comprises: a target identifier layer acquisition module, used to obtain the linear size of each target in the remote sensing image sample as the arithmetic square root of the number of non-zero pixels in the Gaussian kernel map generated from the oriented bounding box of that target, and to obtain the layer identifier to which the target belongs based on the linear size of the target and the scale range of each level in the feature pyramid of the target detection model; a Gaussian kernel distribution matrix acquisition module, used to calculate the Gaussian kernel scaling factor of each target based on its linear size, its layer identifier, and the number of levels in the feature pyramid, and to generate, using a two-dimensional Gaussian matrix and based on the target's oriented bounding box and Gaussian kernel scaling factor, the Gaussian kernel distribution matrix of the target on the feature map of its respective level; a multi-scale sample space filling module, used to accumulate the Gaussian kernel distribution matrix of each target on its respective level feature map into the feature layer matrix of the location sample space of that level feature map, and to assign, based on the region of the Gaussian kernel distribution matrix of the target on its respective level feature map whose values are greater than a preset threshold, the oriented bounding box parameters of the target to the corresponding region of the scale matrix of the scale sample space of that level feature map; a multi-scale sample space acquisition module, used to obtain the multi-scale sample space of the remote sensing image sample based on the location sample space and scale sample space of all level feature maps; and a model training and acquisition module, used to input the remote sensing image sample and its multi-scale sample space into the target detection model, output multi-scale prediction results, and construct the target detection loss function, thereby iteratively training the target detection model until the value of the target detection loss function converges, and to use the trained target detection model to perform oriented target detection on remote sensing images.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the target detection method according to any one of claims 1 to 8.