A data processing method and device based on a low-orbit target and related products
By acquiring multi-dimensional information about each target, matching targets to trackers with a graph neural network, and managing trackers with a three-state mechanism, the method addresses the low matching accuracy of existing techniques when tracking multiple low-orbit targets, achieving more accurate and more stable target tracking.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: BAIYANG TIMES (BEIJING) TECH CO LTD
- Filing Date: 2026-05-25
- Publication Date: 2026-06-19
AI Technical Summary
Existing techniques for tracking multiple low-orbit targets match trackers solely on the intersection-over-union (IoU) between the detected position of the target in the current frame and that of the tracked targets in historical frames. This yields low matching accuracy, because factors such as illumination, attitude changes, and motion information are not taken into account.
By acquiring multi-dimensional information (spatial, motion, and appearance information) about the target to be tracked in the current frame, and combining a graph neural network with the Hungarian algorithm, the method determines the similarity between that target and the targets tracked in historical frames and uses it to match the target's tracker. A three-state tracker management mechanism and Kalman filtering are used to correct positions.
The method significantly improves matching accuracy when tracking multiple low-orbit targets, reduces the impact of illumination, attitude changes, and motion information on matching, enhances tracking stability and robustness, and reduces the probability of false detection.
Figure CN122244104A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of low-Earth orbit target tracking technology, and in particular to a data processing method, apparatus, and related products based on low-Earth orbit targets.
Background Technology
[0002] Low Earth orbit (LEO) targets refer to targets operating in low Earth orbit (LEO), such as LEO satellites or LEO space debris. LEO specifically refers to the orbits of spacecraft orbiting the Earth at altitudes between 200 km and 2000 km. In scenarios such as space monitoring and space situational awareness, it is necessary to track LEO targets to achieve functions such as target identification, orbit prediction, and collision warning.
[0003] Currently, multi-target tracking is often performed as follows. A Kalman filter predicts the position of each target in the current frame from its position in the previous frame, yielding a predicted position for each target. A model trained with a faster region-based convolutional neural network (Faster R-CNN) detects the targets in the current frame, yielding a detected position for each target. Finally, the intersection-over-union (IoU) between the predicted and detected positions of each target is computed, the Hungarian algorithm uses these IoU values to match trackers to the targets in the current frame, and an identifier is assigned to each target in the current frame.
[0004] However, current tracking methods rely solely on the intersection-over-union (IoU) between the detected position of the tracking target in the current frame and that of the tracked targets in historical frames to complete tracker matching. This results in low matching accuracy when multiple low-orbit targets are tracked.
Summary of the Invention
[0005] This application provides a data processing method, apparatus, and related products based on low-orbit targets, which can improve matching accuracy.
[0006] In a first aspect, embodiments of this application provide a data processing method based on low-Earth orbit targets, the method comprising:
acquiring multi-dimensional information of a tracking target in the current frame, where the multi-dimensional information includes the spatial information, appearance information, and motion information of the tracking target, and the tracking target is any low-orbit target in the current frame;
determining, based on the multi-dimensional information, the similarity between the tracking target and the tracked targets in historical frames, where a historical frame is an image frame acquired before the current frame; and
determining, based on the similarity, the tracker of the tracking target from among the trackers of the tracked targets.
[0007] Optionally, the similarity is a comprehensive similarity, and determining the similarity between the tracking target and the tracked targets in historical frames based on the multi-dimensional information includes:
determining, based on the multi-dimensional information, an intersection-over-union (IoU), a minimum similarity distance, and an information similarity, where the IoU is the IoU between the tracking target and the tracked target in the most recent historical frame; the minimum similarity distance is the smallest of the similarity distances between the tracking target and the tracked target across the historical frames; the information similarity is the similarity between the information of the tracking target and that of the tracked target in the most recent historical frame, and includes at least the similarity between their motion information; and the most recent historical frame is the historical frame whose acquisition time is closest to that of the current frame; and
determining the comprehensive similarity based on the IoU, the minimum similarity distance, and the information similarity.
[0008] Optionally, determining the comprehensive similarity based on the IoU, the minimum similarity distance, and the information similarity includes computing a weighted combination of the three quantities, where $d_{\min}$ denotes the minimum similarity distance, $\mathrm{IoU}$ the intersection-over-union, and $S$ the information similarity; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the first, second, and third weight parameters, weighting $d_{\min}$, $\mathrm{IoU}$, and $S$ respectively, each adaptively adjusted according to the state of the tracked target; $0 < \lambda_1, \lambda_2, \lambda_3 < 1$; and $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
[0009] Optionally, determining the tracker of the tracking target from the trackers of the tracked targets based on the similarity includes: obtaining an association graph based on the similarity, where the association graph includes target nodes, tracker nodes, and edges between target nodes and tracker nodes; a target node represents a tracking target and carries its multi-dimensional information; a tracker node represents the tracker of a tracked target and carries the multi-dimensional information of that tracked target in the historical frames; and the attribute of an edge is the similarity between the tracking target and the tracked target it connects; processing the association graph with a graph neural network to obtain the association probability between each target node and tracker node; and determining, based on the Hungarian algorithm and the association probabilities, the tracker of the tracking target from the trackers of the tracked targets.
[0010] Optionally, the spatial information of the tracking target includes the position of the tracking target in the current frame, and acquiring the spatial information of the tracking target in the current frame includes: determining the initial position of the tracking target in the current frame; determining the predicted position of the tracking target based on the Kalman filter algorithm; and correcting the initial position of the tracking target based on its predicted position to obtain the position of the tracking target in the current frame.
[0011] Optionally, determining the initial position of the tracking target in the current frame includes: extracting preliminary features of the tracking target from the current frame using a YOLO backbone network; and encoding the preliminary features with a DETR encoder to obtain position information of the tracking target in the current frame, where the position information includes the initial position of the tracking target and the corresponding detection confidence. Determining the predicted position of the tracking target based on the Kalman filter algorithm includes: when the detection confidence is greater than or equal to a preset confidence threshold, determining the predicted position of the tracking target in the current frame based on the Kalman filter algorithm.
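The predict-then-correct step in [0010] and [0011] can be sketched with a standard constant-velocity Kalman filter. This is a minimal illustration under assumptions not stated in the text: each of the four box coordinates (x1, y1, x2, y2) is filtered independently, the noise values `q` and `r` are illustrative, and the names `AxisKalman` and `refine_box` are hypothetical. Only detections at or above the confidence threshold go through prediction and correction, as [0011] describes; what happens to the others is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one box coordinate; state = (position, velocity)."""
    pos: float
    vel: float = 0.0
    p00: float = 10.0  # entries of the symmetric 2x2 state covariance P
    p01: float = 0.0
    p11: float = 10.0
    q: float = 0.01    # process noise (illustrative value)
    r: float = 1.0     # measurement noise (illustrative value)

    def predict(self, dt: float = 1.0) -> float:
        # x <- F x and P <- F P F^T + q I, with F = [[1, dt], [0, 1]]
        self.pos += self.vel * dt
        self.p00 += 2 * dt * self.p01 + dt * dt * self.p11 + self.q
        self.p01 += dt * self.p11
        self.p11 += self.q
        return self.pos

    def update(self, z: float) -> float:
        # Correct the prediction with the detected position z (measurement matrix H = [1, 0]).
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        self.p11 -= k1 * self.p01
        self.p01 *= 1 - k0
        self.p00 *= 1 - k0
        return self.pos


def refine_box(filters, detected_box, confidence, conf_threshold=0.5):
    """Refine a detected (x1, y1, x2, y2) box using one AxisKalman per coordinate."""
    if confidence < conf_threshold:
        # Assumption: detections below the confidence threshold are returned unchanged.
        return tuple(detected_box)
    for kf in filters:
        kf.predict()
    return tuple(kf.update(z) for kf, z in zip(filters, detected_box))
```

Filtering the corner coordinates independently keeps the example short; trackers such as SORT instead filter the box center, scale, and aspect ratio jointly in a single state vector.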
[0012] Optionally, the method further includes: if no tracker among the trackers of the tracked targets matches the tracking target, creating a temporary-state tracker. A temporary-state tracker that is matched to a low-orbit target in each of the subsequent N1 consecutive frames is promoted to a reserved-state tracker, where N1 is a positive integer; otherwise, the temporary-state tracker is deleted. A reserved-state tracker that matches no low-orbit target in the subsequent N2 consecutive frames, but whose predicted trajectory overlaps the detection area of the current frame by at least a preset overlap threshold, is changed to a dormant-state tracker, where N2 is a positive integer. A dormant-state tracker that is matched during its dormant period is woken up and participates in matching again; if it is not matched during that period, it is deleted.
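The three-state lifecycle in [0012] can be sketched as a small per-tracker state machine. This is a minimal illustration, not the application's implementation: `n1`, `n2`, and `overlap_threshold` stand for N1, N2, and the preset overlap threshold, while the class and method names, the dormant-period length `max_sleep_frames`, the deletion of a reserved tracker whose overlap is too small, and the return to the reserved state on wake-up are all assumptions.

```python
from enum import Enum


class State(Enum):
    TEMPORARY = "temporary"
    RESERVED = "reserved"
    DORMANT = "dormant"
    DELETED = "deleted"


class Tracker:
    def __init__(self, n1=3, n2=5, overlap_threshold=0.3, max_sleep_frames=10):
        # n1, n2, overlap_threshold: N1, N2 and the preset overlap threshold.
        # max_sleep_frames: hypothetical length of the dormant period.
        self.state = State.TEMPORARY
        self.n1, self.n2 = n1, n2
        self.overlap_threshold = overlap_threshold
        self.max_sleep_frames = max_sleep_frames
        self.hits = 0    # consecutive matched frames while temporary
        self.misses = 0  # consecutive unmatched frames while reserved
        self.slept = 0   # unmatched frames spent dormant

    def step(self, matched, overlap=0.0):
        """Advance one frame; overlap is between the predicted trajectory and the detection area."""
        if self.state is State.TEMPORARY:
            if not matched:
                self.state = State.DELETED
            else:
                self.hits += 1
                if self.hits >= self.n1:
                    self.state = State.RESERVED
        elif self.state is State.RESERVED:
            if matched:
                self.misses = 0
            else:
                self.misses += 1
                if self.misses >= self.n2:
                    # Assumption: without enough overlap the tracker is deleted.
                    self.state = (State.DORMANT if overlap >= self.overlap_threshold
                                  else State.DELETED)
                    self.slept = 0
        elif self.state is State.DORMANT:
            if matched:
                self.state = State.RESERVED  # woken up; assumed to rejoin as reserved
                self.misses = 0
            else:
                self.slept += 1
                if self.slept >= self.max_sleep_frames:
                    self.state = State.DELETED
        return self.state
```

For example, `Tracker(n1=2, n2=2)` stepped with two matches becomes reserved, and after two misses with an overlap of 0.5 it becomes dormant.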
[0013] In a second aspect, this application provides a data processing apparatus based on a low-Earth orbit target, the apparatus comprising: an acquisition unit configured to acquire multi-dimensional information of the tracking target in the current frame, where the multi-dimensional information includes the spatial information, appearance information, and motion information of the tracking target, and the tracking target is any low-orbit target in the current frame; a determining unit configured to determine, based on the multi-dimensional information, the similarity between the tracking target and the tracked targets in historical frames, where a historical frame is an image frame acquired before the current frame; and a matching unit configured to determine, based on the similarity, the tracker of the tracking target from among the trackers of the tracked targets.
[0014] In a third aspect, embodiments of this application further provide a computer storage medium storing a computer program which, when executed, performs the method according to any implementation of the first aspect.
[0015] In a fourth aspect, embodiments of this application further provide a computer program product containing instructions which, when run on at least one computing device, cause the at least one computing device to perform the method according to any implementation of the first aspect.
[0016] In a fifth aspect, embodiments of this application provide an electronic device including a processor and a memory, where the memory is configured to store a computer program and the processor is configured to execute, based on the computer program, the method according to any implementation of the first aspect.
[0017] Beneficial effects: This application provides a data processing method, apparatus, and related products based on low-Earth orbit (LEO) targets. When the method is executed, the spatial information, motion information, and appearance information of the tracking target in the current frame are first acquired. Based on this information, the similarity between the tracking target and the tracked targets in the historical frames of the current frame is determined, and based on this similarity, the tracker of the tracking target is matched from the trackers of the tracked targets. Because the spatial, motion, and appearance information of the target are taken into account when determining similarity, matching accuracy can be significantly improved compared with matching based solely on the intersection-over-union (IoU) of spatial positions.
Attached Figure Description
[0018] Figure 1 is a flowchart of a data processing method provided in an embodiment of this application;
Figure 2 is a schematic diagram of an association graph provided in an embodiment of this application;
Figure 3 is a flowchart of another data processing method provided in an embodiment of this application;
Figure 4 is a schematic structural diagram of a data processing apparatus provided in an embodiment of this application.
Detailed Implementation
[0019] The technical solutions of the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The terminology used in the following embodiments is intended only to describe specific embodiments and is not intended to limit this application. As used in the specification and appended claims of this application, singular expressions such as "a", "an", "the", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise.
[0020] It should also be understood that in the embodiments of this application, "one or more" means one, two, or more than two; "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
[0021] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.
[0022] The "multiple" mentioned in the embodiments of this application refers to two or more. It should be noted that in the description of the embodiments of this application, terms such as "first" and "second" are used only for the purpose of distinguishing descriptions and should not be construed as indicating or implying relative importance, nor should they be construed as indicating or implying order.
[0023] In existing technologies, detection-based multi-target tracking is implemented as follows.
Step 1: Using a Kalman filter, predict the position of each target in the current frame from its position in the previous frame, obtaining a predicted position for each target in the current frame.
[0024] In practice, the position of a target in an image frame is represented by bounding-box information. For example, a rectangular bounding box is represented by a quadruple (x1, y1, x2, y2), where (x1, y1) are the coordinates of a first corner of the box and (x2, y2) are the coordinates of a second corner, the two corners being diagonally opposite. The position of the target in the image frame can therefore be represented as (x1, y1, x2, y2).
[0025] For example, if two targets, target A and target B, were tracked in the previous frame, and the tracker for target A includes target A's identifier A, and the tracker for target B includes target B's identifier B, and the tracked targets detected in the current frame include target 1 and target 2, then using the Kalman filter algorithm, the predicted position of target A in the current frame can be determined using the position of target A in the previous frame, and the predicted position of target B in the current frame can be determined using the position of target B in the previous frame. For example, the predicted position of target A in the current frame is the predicted bounding box of target A (x11, y11, x21, y21), and the predicted position of target B in the current frame is the predicted bounding box of target B (x12, y12, x22, y22).
[0026] Step 2: Use a model trained with Faster R-CNN to detect the targets in the current frame and obtain the detected position of each target in the current frame.
[0027] Referring to the example above, the detection bounding box of target 1 is (x11', y11', x21', y21'), and the detection bounding box of target 2 is (x12', y12', x22', y22').
[0028] Step 3: Calculate the intersection-over-union (IoU) between the predicted position and the detected position of each target in the current frame.
[0029] Specifically, the intersection area and the union area of the predicted bounding box and the detected bounding box are calculated. The ratio of the intersection area to the union area is the IoU between the predicted and detected positions of the target in the current frame. It measures how well the target's positions in the current frame and the previous frame match: the higher the value, the better the two bounding boxes match.
[0030] For example, the IoU between the predicted bounding box of target A and the detected bounding box of target 1 is 0.8; between the predicted box of target A and the detected box of target 2, 0.1; between the predicted box of target B and the detected box of target 1, 0.05; and between the predicted box of target B and the detected box of target 2, 0.75.
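The IoU calculation in [0029] can be sketched as follows for boxes in the (x1, y1, x2, y2) form of [0024]. The helper names are hypothetical; because [0024] only says the two corners are diagonally opposite, the sketch first reorders each box into left, top, right, bottom.

```python
def _normalize(box):
    """Return (left, top, right, bottom) for a box given by two diagonal corners."""
    x1, y1, x2, y2 = box
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = _normalize(box_a)
    bx1, by1, bx2, by2 = _normalize(box_b)
    # Overlap along each axis is clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so their IoU is 1/7.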
[0031] Step 4: Using the Hungarian algorithm and the calculated IoU values, match trackers to the targets in the current frame and assign an identifier to each target in the current frame.
[0032] The Hungarian algorithm matches trackers to the targets in the current frame so as to maximize IoU. For example, target 1 is matched to the tracker of target A, and target 2 is matched to the tracker of target B. The identifier A in the tracker of target A is then used as the identifier of target 1, and the identifier B in the tracker of target B is used as the identifier of target 2.
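The Hungarian algorithm picks the assignment with the largest total IoU. For a matrix as small as the 2 × 2 example in [0030], that optimum can be checked by exhaustive search, which is what the hypothetical `best_assignment` helper below does. It is a brute-force check, not the Hungarian algorithm itself, and its cost grows factorially with the number of trackers.

```python
from itertools import permutations

# IoU values from the example in [0030]; rows are trackers, columns are detections.
IOU = {
    "A": {"1": 0.8, "2": 0.1},
    "B": {"1": 0.05, "2": 0.75},
}


def best_assignment(iou_table):
    """Exhaustively find the tracker-to-detection assignment with the largest total IoU.

    Assumes there are no more trackers than detections; only practical for tiny inputs.
    """
    trackers = list(iou_table)
    detections = list(next(iter(iou_table.values())))
    best, best_score = None, float("-inf")
    for perm in permutations(detections, len(trackers)):
        score = sum(iou_table[t][d] for t, d in zip(trackers, perm))
        if score > best_score:
            best, best_score = dict(zip(trackers, perm)), score
    return best, best_score


assignment, total = best_assignment(IOU)
# Each detection inherits the identifier of the tracker it was matched to.
identities = {det: trk for trk, det in assignment.items()}
```

With the example values, matching A to 1 and B to 2 gives a total IoU of 1.55, versus 0.15 for the swapped assignment, so target 1 receives identifier A and target 2 receives identifier B.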
[0033] It should be noted that in target tracking, each identifiable target carries a tracker to record information such as the target's historical location, movement trajectory, and identification.
[0034] For ease of explanation, a target in the current frame whose tracker needs to be determined is called the tracking target, and a target in a historical frame is called the tracked target.
[0035] However, current tracking methods match the optimal tracker for a target in the current frame based only on the intersection-over-union (IoU) between the detected position of the tracking target in the current frame and that of the tracked targets in historical frames. They do not consider the influence of factors such as illumination, attitude changes, or motion information on matching. In scenarios where multiple low-orbit targets are tracked, these factors also affect matching, resulting in poor matching accuracy.
[0036] Low Earth orbit (LEO) targets refer to targets in low Earth orbit, such as LEO satellites or space debris. In practical use, LEO refers to the orbit of a spacecraft orbiting the Earth at an altitude between 200 km and 2000 km. Multiple LEO target tracking scenarios refer to situations where an image includes multiple LEO targets, and one or more of these targets are tracked.
[0037] To address the aforementioned issues, this application provides a data processing method that, by comprehensively considering appearance information and motion information, matches the optimal tracker for the target in the current frame from the trackers of the tracked target in historical frames, thereby improving matching accuracy.
[0038] A data processing method based on a low-Earth orbit (LEO) target according to embodiments of this application is described in detail below with reference to the accompanying drawings. In these embodiments, the data processing object is a LEO target, and an electronic device is taken as the example execution entity of the method.
[0039] It should be noted that the data processing method provided in this application can also be used for other electronic devices or modules with data processing functions, such as servers, laptops, etc., and the embodiments of this application are not limited to this.
[0040] Figure 1 is a flowchart of a data processing method provided in an embodiment of this application. The method includes the following steps.
S10: The electronic device acquires multi-dimensional information of the tracking target in the current frame.
[0041] The tracking target is any low-orbit target in the current frame.
[0042] The multi-dimensional information of the tracking target includes its spatial information, appearance information, and motion information in the current frame. Spatial information represents the target's position in the current frame. Motion information represents the target's motion state, including its velocity, acceleration, and/or trajectory. Appearance information represents the target's inherent visual characteristics, including, but not limited to, the shape, grayscale distribution, contour structure, surface texture, and/or local component structure of the low-orbit target.
[0043] The current frame is an image frame captured by the electronic device at the target time, including at least one low-orbit target. In actual use, the target time is the current time.
[0044] S20: The electronic device determines, based on the multi-dimensional information, the similarity between the tracking target and the tracked targets in the historical frames.
[0045] The historical frames of the current frame are N image frames acquired by the electronic device before the target time, where N is a positive integer.
[0046] In the embodiments of this application, the electronic device can determine the similarity between the tracking target in the current frame and the tracked targets in the historical frames of the current frame based on the spatial information, motion information, and/or appearance information of the tracking target.
[0047] Taking the spatial, motion, and appearance information of the target fully into account when determining similarity helps reduce the impact of illumination, pose changes, or motion on matching, and improves matching accuracy compared with matching based solely on the IoU of spatial positions.
[0048] In practice, the electronic device can determine this similarity using a target similarity measure that fuses the spatial information, motion information, and appearance information of the tracking target.
[0049] For ease of explanation, the following description takes the j-th target in the current frame (hereinafter the j-th target) as the tracking target and the target corresponding to tracker i in the historical frames (hereinafter the i-th target) as the tracked target, where j and i are both positive integers. Specifically, the electronic device can determine the similarity between the j-th target and the i-th target through the following steps.
Step A1: The electronic device first determines the minimum similarity distance $d_{\min}(i,j)$, the intersection-over-union $\mathrm{IoU}(i,j)$, and the information similarity $S(i,j)$ (which includes motion information) between the j-th target and the i-th target.
[0050] Here, $d_{\min}(i,j)$ is the minimum similarity distance between the j-th target and the i-th target over the historical frames. It describes how similar the appearance information of the j-th target is to that of the i-th target in the historical frames: the more similar the appearance information, the smaller $d_{\min}(i,j)$.
[0051] $d_{\min}(i,j)$ is calculated as shown in formula (1):
$$d_{\min}(i,j) = \min\left\{\, 1 - r_j^{\mathsf T} r_k^{(i)} \;\middle|\; r_k^{(i)} \in R_i \,\right\} \qquad (1)$$
where $r_j^{\mathsf T}$ is the transpose of the feature description vector of the j-th target, $r_k^{(i)}$ is the feature description vector of the i-th target in the k-th historical frame of the current frame, and $R_i$ is the set of the i-th target's feature description vectors over the historical frames.
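A minimal sketch of the minimum similarity distance, assuming formula (1) takes the cosine-distance form $1 - r_j^{\mathsf T} r_k^{(i)}$ over unit-length feature vectors (the form used in appearance-based trackers such as DeepSORT). The function names are hypothetical, and the sketch normalizes the vectors itself rather than assuming they are already unit length.

```python
import math


def _unit(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def min_similarity_distance(r_j, gallery_i):
    """d_min(i, j): smallest cosine distance 1 - r_j^T r_k over the i-th target's stored features."""
    rj = _unit(r_j)
    return min(1.0 - sum(a * b for a, b in zip(rj, _unit(rk))) for rk in gallery_i)
```

A stored vector pointing in the same direction as `r_j` gives a distance of 0, and an orthogonal one gives 1, so the minimum picks the historical appearance closest to the current one.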
[0052] The intersection-over-union $\mathrm{IoU}(i,j)$ is the ratio of the target intersection to the target union and describes the spatial similarity between the j-th target and the i-th target. The target intersection is the intersection of the position of the j-th target in the current frame with the position of the i-th target in the most recent historical frame, and the target union is the union of these two positions. The most recent historical frame is the historical frame whose acquisition time is closest to the target time.
[0053] The information similarity $S(i,j)$ describes the similarity between the information of the j-th target and that of the i-th target. This information includes motion information and may additionally include appearance information and spatial information.
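[0054] Step A2: The electronic device determines the comprehensive similarity d between the j-th target in the current frame and the i-th target in the historical frames from the minimum similarity distance $d_{\min}(i,j)$, the intersection-over-union $\mathrm{IoU}(i,j)$, and the information similarity $S(i,j)$, where d is positively correlated with $d_{\min}(i,j)$ and negatively correlated with $\mathrm{IoU}(i,j)$ and $S(i,j)$.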
[0055] Specifically, the electronic device can determine the comprehensive similarity d according to formula (2), which combines $d_{\min}(i,j)$, $\mathrm{IoU}(i,j)$, and $S(i,j)$ using the weight parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ respectively. All three weights are dynamically adjusted based on the state of the tracked target, each lies strictly between 0 and 1, and they sum to 1. In practice, if the tracked target is occluded, $\lambda_1 < \lambda_2$ and $\lambda_3 < \lambda_2$; for example, $\lambda_1$ is adjusted to 0.1, $\lambda_2$ to 0.6, and $\lambda_3$ to 0.3. If the tracked target is moving rapidly, $\lambda_1 < \lambda_3$ and $\lambda_2 < \lambda_3$; for example, $\lambda_1$ is adjusted to 0.3, $\lambda_2$ to 0.3, and $\lambda_3$ to 0.4.
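The sketch below assumes one form of formula (2) consistent with [0054] and [0055]: a weighted sum in which d grows with $d_{\min}$ and shrinks as IoU and S grow, written as $d = \lambda_1 d_{\min} + \lambda_2 (1 - \mathrm{IoU}) + \lambda_3 (1 - S)$, so that a lower d indicates a better match. The state names, the function name, and the pairing of the example values with $\lambda_1$, $\lambda_2$, $\lambda_3$ in that order are assumptions; the "normal" preset uses the initial weights implied by the arithmetic of the worked example in [0067].

```python
# Hypothetical weight presets (lambda1, lambda2, lambda3) applied to (d_min, IoU, S).
WEIGHTS = {
    "normal": (0.5, 0.3, 0.2),       # initial weights implied by the example in [0067]
    "occluded": (0.1, 0.6, 0.3),     # example values from [0055], assumed to be in lambda order
    "fast_motion": (0.3, 0.3, 0.4),  # example values from [0055], assumed to be in lambda order
}


def comprehensive_distance(d_min, iou, s, state="normal"):
    """Assumed form of formula (2): lower d means a better match."""
    l1, l2, l3 = WEIGHTS[state]
    # d grows with d_min and shrinks as IoU and the information similarity S grow.
    return l1 * d_min + l2 * (1.0 - iou) + l3 * (1.0 - s)
```

Using complements (1 − IoU, 1 − S) keeps d in [0, 1] when all three inputs lie in [0, 1], which makes a single threshold t meaningful across target states.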
[0056] S30: Based on the similarity between the tracking target and the tracked targets in the historical frames, the electronic device determines the tracker of the tracking target from the trackers of the tracked targets in the historical frames.
[0057] In one example, the electronic device can arrange the similarities between the tracking targets and the tracked targets in the historical frames as a two-dimensional array, in which each row corresponds to the identifier of a tracker of a tracked target in the historical frames and each column corresponds to a tracking target in the current frame. A similarity threshold t is set, and elements are compared with t to match trackers to the tracking targets in the current frame. Specifically, this includes the following steps: find the smallest element in the array and record its index (i1, j1); if its value is less than t, assign the identifier of tracker i1 to target j1 and continue, otherwise terminate; discard all elements in row i1 and column j1; find the smallest remaining element and, if its value is less than t, record its index (i2, j2) and assign the identifier of tracker i2 to target j2, otherwise terminate; repeat until every row or every column has been discarded, at which point matching ends.
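The greedy threshold matching in [0057] can be sketched as below; the function name is hypothetical. Scanning all elements once in ascending order while skipping discarded rows and columns is equivalent to repeatedly taking the smallest remaining element, and the scan can stop at the first value that is not below t because every later value is at least as large.

```python
def greedy_match(dist, t):
    """Greedy matching on a 2-D array dist[i][j] (rows: trackers i, columns: targets j).

    Returns {target j: tracker i}; lower values mean better matches.
    """
    cells = sorted((d, i, j) for i, row in enumerate(dist) for j, d in enumerate(row))
    used_rows, used_cols, matches = set(), set(), {}
    for d, i, j in cells:
        if d >= t:
            break                # smallest remaining value is not below t: stop
        if i in used_rows or j in used_cols:
            continue             # row i or column j was already discarded
        matches[j] = i           # target j takes the identifier of tracker i
        used_rows.add(i)
        used_cols.add(j)
    return matches
```

Unlike the Hungarian algorithm, this greedy pass does not guarantee a globally optimal total, but it is simple and never accepts a pair whose value is at or above t.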
[0058] In another example, the electronic device can combine a graph neural network with the Hungarian algorithm to match each target in the current frame with a tracker of the tracked targets in the historical frames, based on the similarity between them. The method includes the following steps.
Step B1: Obtain the association graph.
[0059] In this embodiment, the electronic device constructs an association graph with each tracking target in the current frame as a target node and the tracker of each tracked target in the historical frames as a tracker node. For example, if the targets detected in the current frame are target 1 and target 2, and the historical trackers are those with identifiers A, B, and C, the constructed association graph is as shown in Figure 2.
[0060] In this association graph, the node features of a target node include the multi-dimensional information of the tracking target, and the node features of a tracker node include the multi-dimensional information of the corresponding tracked target in the historical frames.
[0061] Furthermore, each edge of the association graph connects a target node and a tracker node, and the attributes of the edge include the similarity between the target corresponding to the target node and the target corresponding to the tracker node.
[0062] Step B2: Process the association graph using a graph neural network to obtain the association probabilities between nodes.
[0063] The association probability between nodes is the association probability between a target node and a tracker node. The higher the association probability between a target node and a tracker node, the better the tracking target corresponding to the target node matches the tracker corresponding to the tracker node.
[0064] In this embodiment of the application, the electronic device can process the association graph with the graph neural network to determine the association probabilities between nodes.
[0065] For example, the graph neural network is a two-layer graph convolutional network. The first layer performs graph-convolution aggregation on the node features, fusing the multi-dimensional information of the tracker nodes, and outputs fused node features. The second layer corrects the similarity on each edge of the association graph based on the fused node features, and a sigmoid activation function maps each corrected similarity into the range (0, 1) to give the association probability between the two nodes.
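The association-probability step of [0063] to [0065] can be illustrated with a small, untrained stand-in for the two-layer network. Everything in this sketch is an assumption: layer 1 replaces each node's features with a similarity-weighted mean of itself and its neighbors, layer 2 corrects each edge similarity with the cosine similarity of the fused features of its two nodes, and a sigmoid maps the result into (0, 1). The edge similarity is taken to be higher-is-better (for example, 1 − d), the coefficients `a`, `b`, `c` are illustrative constants rather than learned parameters, and all names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def fuse_features(features, neighbors):
    """Layer 1: similarity-weighted mean of each node's own features and its neighbors' features."""
    fused = {}
    for node, x in features.items():
        total = 1.0  # weight of the node's own features (self-loop)
        acc = list(x)
        for other, w in neighbors.get(node, []):
            total += w
            acc = [a + w * b for a, b in zip(acc, features[other])]
        fused[node] = [a / total for a in acc]
    return fused


def association_probabilities(target_feats, tracker_feats, edge_sim, a=4.0, b=4.0, c=-4.0):
    """edge_sim[(target, tracker)] is the edge's similarity attribute (higher = more alike).

    Target and tracker node keys must be distinct.
    """
    features = {**target_feats, **tracker_feats}
    neighbors = {}
    for (t, k), s in edge_sim.items():
        neighbors.setdefault(t, []).append((k, s))
        neighbors.setdefault(k, []).append((t, s))
    fused = fuse_features(features, neighbors)
    # Layer 2: correct each edge similarity with the fused features, then squash into (0, 1).
    return {
        (t, k): sigmoid(a * s + b * cosine(fused[t], fused[k]) + c)
        for (t, k), s in edge_sim.items()
    }
```

In a trained network the aggregation weights and the edge-scoring coefficients would be learned from labeled tracks; here they are fixed so the data flow from node features to edge probabilities is easy to follow.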
[0066] Optionally, in step B3, the weight parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ are adjusted using the association probabilities between nodes. This reduces association ambiguity caused by target occlusion and similar-looking features.
[0067] For example, suppose the initial weights are $\lambda_1 = 0.5$, $\lambda_2 = 0.3$, and $\lambda_3 = 0.2$; for the j-th target in the current frame and the i-th target in the historical frames, $d_{\min}(i,j) = 0.9$, $\mathrm{IoU}(i,j) = 0.2$, and $S(i,j) = 0.8$; and the association probability between the corresponding target node and tracker node is 0.9. The weights are adjusted based on this association probability as follows. The contribution score of each factor is its weight times its value: $0.5 \times 0.9 = 0.45$ for $d_{\min}$, $0.3 \times 0.2 = 0.06$ for IoU, and $0.2 \times 0.8 = 0.16$ for S. Each contribution score is then multiplied by the association probability, giving adjusted scores of 0.405, 0.054, and 0.144, for a total of 0.603. Dividing each adjusted score by the total gives the adjusted weights $\lambda_1 \approx 0.67$, $\lambda_2 \approx 0.09$, and $\lambda_3 \approx 0.24$.
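The weight adjustment in [0067] can be reproduced in a few lines. `reweight` is a hypothetical name; the factor order (d_min, IoU, S) and the initial weights (0.5, 0.3, 0.2) follow the worked example, and the fallback for a zero total is an added assumption.

```python
def reweight(weights, factors, assoc_prob):
    """Adjust (lambda1, lambda2, lambda3) as in the worked example.

    weights: current weights; factors: (d_min, IoU, S); assoc_prob: GNN association probability.
    """
    contributions = [w * f for w, f in zip(weights, factors)]  # e.g. 0.5 * 0.9 = 0.45
    scaled = [assoc_prob * c for c in contributions]           # e.g. 0.9 * 0.45 = 0.405
    total = sum(scaled)                                        # 0.603 in the example
    if total == 0:
        return list(weights)  # assumption: keep the current weights if nothing contributes
    return [s / total for s in scaled]                         # adjusted weights sum to 1
```

`reweight((0.5, 0.3, 0.2), (0.9, 0.2, 0.8), 0.9)` returns approximately [0.672, 0.090, 0.239]. Because a single association probability multiplies every contribution score, it cancels when the scores are divided by their total; in this single-edge form the adjusted weights equal each factor's share of the unscaled contributions (0.45 / 0.67, and so on).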
[0068] It should be noted that the above is only an illustrative example, and adjustments can be made as needed in actual use.
[0069] Step B4: Convert the association probabilities between nodes into a cost matrix, solve for the minimum-cost matching with the Hungarian algorithm, and match each target in the current frame to a historical tracker.
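[0070] The cost matrix is derived from the association probabilities between nodes; for example, each entry can be taken as 1 minus the corresponding association probability, so that a higher association probability corresponds to a lower cost.
[0071] In this embodiment of the application, the Hungarian algorithm finds, from the cost matrix, the assignment between the targets in the current frame and the historical trackers that minimizes the total cost. A target whose assigned cost is less than or equal to the similarity threshold is considered matched and its matching is terminated, that is, no further historical tracker is assigned to that target.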
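Step B4 can be sketched in Python with a self-contained O(n³) Hungarian (Kuhn-Munkres) solver. Two choices here are assumptions rather than details from this application: the cost is taken as 1 minus the association probability, and a rectangular matrix is padded to a square one with cost 1.0 (association probability 0). The function names and the threshold value used in the example are hypothetical.

```python
from typing import Dict, List


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-total-cost assignment for a square cost matrix (Kuhn-Munkres with
    potentials). Returns col, where col[r] is the column assigned to row r."""
    n = len(cost)
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:                           # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [-1] * n
    for j in range(1, n + 1):
        col[p[j] - 1] = j - 1
    return col


def match_targets(assoc: List[List[float]], cost_threshold: float) -> Dict[int, int]:
    """assoc[t][k]: association probability of current target t and historical tracker k.
    Returns {target: tracker} for assigned pairs whose cost is within the threshold."""
    rows, cols = len(assoc), len(assoc[0])
    n = max(rows, cols)
    cost = [[1.0 - assoc[r][c] if r < rows and c < cols else 1.0 for c in range(n)]
            for r in range(n)]
    col = hungarian(cost)
    return {r: col[r] for r in range(rows)
            if col[r] < cols and cost[r][col[r]] <= cost_threshold}
```

For example, `match_targets([[0.6, 0.9], [0.1, 0.95]], 0.5)` returns `{0: 0, 1: 1}`. Choosing each target's cheapest tracker greedily would give tracker 1 to target 0 and leave target 1 with a cost of 0.9 (total 1.0), whereas the joint assignment has a total cost of 0.45.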
[0072] In summary, the data processing method for low-Earth orbit targets provided in this application first acquires the spatial, motion, and appearance information of the target being tracked in the current frame. Based on this information, it determines the similarity between that target and the targets tracked in historical frames (frames acquired before the current frame), and, based on this similarity, selects the tracker of the current target from the trackers of the previously tracked targets. Because the spatial, motion, and appearance information are all considered when determining similarity, matching accuracy can be significantly improved compared with matching based solely on the intersection over union (IoU) of spatial locations.
[0073] Furthermore, an embodiment of this application also provides another data processing method, in which trackers are managed by a three-state management mechanism and the position of the tracked target in the current frame is predicted and corrected, thereby improving the detection accuracy for small and low-confidence low-orbit targets and reducing the probability of false detection.
[0074] Figure 3 is a flowchart of another data processing method provided in an embodiment of this application. As shown in Figure 3, the method includes the following steps. S310: Obtain the initial position of the tracked target in the current frame.
[0075] The electronic device can use an attention-enhanced detection model to detect the tracked target in the current frame and obtain its initial position in the current frame.
[0076] In practical applications, the tracked target is a valid target. The attention-enhanced detection model can be a fusion of DETR and YOLO. Specifically, the electronic device uses the YOLO backbone network (e.g., YOLOv8) to extract preliminary features of targets in the current frame and quickly locate potential target regions. The preliminary features are then input into the DETR encoder, which models the association between each target and the global image frames and outputs the target's position coordinates and detection confidence, eliminating false detections caused by small low-orbit targets and background interference. The global image frames include the current frame and the historical frames acquired before it. Targets whose detection confidence is below the confidence threshold are filtered out as low-confidence targets, and targets whose detection confidence is greater than or equal to the threshold are treated as valid targets.
[0077] Example: a current frame with a resolution of 1024 × 1024 is input into an attention-enhanced detection model comprising a YOLO backbone network and a DETR encoder. The YOLO backbone is a YOLOv8 backbone that outputs feature maps at three scales. The DETR encoder and decoder each use a 6-layer transformer with 8 attention heads, and the confidence threshold is set to 0.7. After low-confidence noise is filtered out, the initial positions of the 8 valid targets in the current frame are output.
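The confidence filtering in this example can be illustrated with a small Python sketch. The `Detection` record and its box layout are hypothetical; only the 0.7 threshold and the greater-than-or-equal comparison come from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels (assumed layout)
    confidence: float                        # detection confidence output by the detector


def keep_valid_targets(dets: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep detections whose confidence is >= threshold; drop low-confidence ones."""
    return [d for d in dets if d.confidence >= threshold]


dets = [Detection((10, 10, 30, 30), 0.95),
        Detection((50, 60, 58, 66), 0.70),
        Detection((200, 200, 204, 205), 0.41)]
valid = keep_valid_targets(dets)   # the 0.41 detection is discarded
```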
[0078] S320: Correct the initial position of the tracked target using its predicted position in the current frame to obtain the final position of the tracked target.
[0079] In this embodiment of the application, the electronic device can use the Kalman filter method to predict the position of the tracked target in the current frame and obtain the predicted position of the tracked target.
[0080] Then, the electronic device uses the predicted position of the tracked target to correct the initial position of the tracked target in the current frame obtained in step S310, and obtains the final position of the tracked target.
[0081] The Kalman filter method predicts the position of a tracked target in the current frame as follows: among the historical frames of the current frame, the frames in which the target was tracked are selected, and the one closest to the current frame is taken as the target frame; the target's position in the current frame is then predicted from its position in the target frame, yielding the predicted position of the target.
[0082] In one example, the electronic device can perform a weighted fusion of the predicted and initial positions of the tracked target. The weights are dynamically assigned based on the gain of a Kalman filter to suppress errors caused by detection noise, jitter, false detections, missed detections, or positional jumps, thereby obtaining a smoother, more stable final position that more closely approximates the true trajectory.
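The prediction and gain-weighted correction described in [0081] and [0082] can be sketched with a one-dimensional constant-velocity Kalman filter, applied separately to each coordinate of the target's position. The constant-velocity motion model, the simplified process noise `q * I`, and the values of `p0`, `q` and `r` are assumptions for illustration. The update step shows why the correction is a weighted fusion: the corrected position equals `(1 - K) * predicted + K * measured`, where `K` is the Kalman gain for the position component.

```python
from typing import List, Tuple


class ConstantVelocityKF1D:
    """Kalman filter on one coordinate with state [position, velocity]."""

    def __init__(self, pos: float, vel: float, p0: float = 1.0,
                 q: float = 0.01, r: float = 1.0, dt: float = 1.0):
        self.x: List[float] = [pos, vel]
        self.P: List[List[float]] = [[p0, 0.0], [0.0, p0]]
        self.q, self.r, self.dt = q, r, dt

    def predict(self) -> float:
        """x = F x and P = F P F^T + q I, with F = [[1, dt], [0, 1]]."""
        dt, (p, v), P = self.dt, self.x, self.P
        self.x = [p + dt * v, v]
        self.P = [
            [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q,
             P[0][1] + dt * P[1][1]],
            [P[1][0] + dt * P[1][1], P[1][1] + self.q],
        ]
        return self.x[0]

    def update(self, z: float) -> Tuple[float, float]:
        """Fuse a measured position z; returns (corrected position, position gain K)."""
        P = self.P
        s = P[0][0] + self.r                      # innovation variance, with H = [1, 0]
        k0, k1 = P[0][0] / s, P[1][0] / s         # Kalman gain
        y = z - self.x[0]                         # innovation (measurement minus prediction)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[P[0][0] - k0 * P[0][0], P[0][1] - k0 * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0], k0


kf = ConstantVelocityKF1D(pos=10.0, vel=2.0)
predicted = kf.predict()          # Kalman prediction from the previous state
fused, gain = kf.update(14.0)     # detector's initial position in the current frame
```

The velocity component `kf.x[1]` is also refined by each update, which corresponds to the velocity information mentioned in [0083]; this sketch does not model acceleration.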
[0083] Furthermore, in the embodiments of this application, the Kalman filtering method can also output motion information such as velocity vectors and acceleration vectors.
[0084] S330: Acquire the feature description vector of the tracked target.
[0085] The electronic device can use a convolutional neural network to obtain the feature description vector of the tracked target based on its final position. The feature description vector is a high-dimensional vector representing the target's appearance and motion information, reflecting distinctive features such as texture, structure, and/or motion state.
[0086] For example, the convolutional neural network uses ResNet50 as its backbone, downsampling the target region five times and outputting a 256-dimensional global appearance information vector. A ViT encoder is constructed in which each 16 × 16 × 3 target image patch is mapped to a 768-dimensional vector by an embedding layer; these vectors are processed by a 6-layer transformer encoder with 8 attention heads, which outputs a 64-dimensional local appearance information vector. The motion information is normalized to the interval [0, 1], giving a 4-dimensional motion information vector. The three feature vectors are concatenated (256 + 64 + 4 = 324 dimensions) and passed through a fully connected layer (e.g., one with a ReLU activation function) to output a 320-dimensional feature description vector.
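The dimension bookkeeping in this example can be sketched in Python. The ResNet50 and ViT branches are not reproduced; their 256- and 64-dimensional outputs are taken as given inputs. The min-max normalization of the motion vector, the assumed per-component value ranges, and the randomly initialized fully connected layer (standing in for trained weights) are illustrative assumptions.

```python
import random
from typing import List, Sequence

PATCH_DIM = 16 * 16 * 3   # one 16 x 16 RGB patch flattens to 768 values, matching the embedding size


def minmax(values: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Normalize each motion component into [0, 1] using assumed per-component ranges."""
    return [min(1.0, max(0.0, (v - a) / (b - a))) for v, a, b in zip(values, lo, hi)]


def fc_relu(x: List[float], out_dim: int, seed: int = 0) -> List[float]:
    """Fully connected layer with ReLU; seeded random weights stand in for learned ones."""
    rng = random.Random(seed)
    out = []
    for _ in range(out_dim):
        row = [rng.gauss(0.0, 0.05) for _ in x]
        out.append(max(0.0, sum(w * xi for w, xi in zip(row, x))))
    return out


def feature_descriptor(global_app: List[float], local_app: List[float], motion: List[float]) -> List[float]:
    if (len(global_app), len(local_app), len(motion)) != (256, 64, 4):
        raise ValueError("expected 256-, 64- and 4-dimensional inputs")
    fused = global_app + local_app + motion       # 256 + 64 + 4 = 324 dimensions
    return fc_relu(fused, 320)                     # projected to a 320-dimensional descriptor


# Hypothetical motion components (e.g. velocity and acceleration along x and y).
motion = minmax([3.0, -1.0, 0.2, 0.0], lo=[-5.0, -5.0, -1.0, -1.0], hi=[5.0, 5.0, 1.0, 1.0])
desc = feature_descriptor([0.1] * 256, [0.2] * 64, motion)
```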
[0087] S340: Determine the similarity between the tracked target and the targets tracked in historical frames.
[0088] For details on the implementation, please refer to the above text, which will not be repeated here.
[0089] S350: Determine the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
[0090] For the specific implementation details, please refer to S30 described above, which will not be repeated here.
[0091] S360: Manage trackers based on a three-state tracker management mechanism.
[0092] The three states are temporary, reserved, and dormant.
[0093] For a current target that is not matched to any tracker, a new tracker is created and set to the temporary state, and its initial features and motion trajectory are recorded. The new tracker is then observed over the subsequent N1 frames (N1 is a positive integer, e.g., N1 = 4). If it is matched in every one of these N1 consecutive frames, it is set to the reserved state; if it fails to be matched in any of them, it is set to the deleted state and its identifier is deregistered. For clarity, trackers in the temporary, reserved, and dormant states are referred to as temporary-state, reserved-state, and dormant-state trackers, respectively.
[0094] If a reserved-state tracker is not matched for N2 consecutive frames (N2 is a positive integer, e.g., N2 = 3), but the overlap between the predicted position of its target in the current frame and the detection region in the current frame is greater than or equal to a preset overlap threshold (e.g., 0.3), the tracker is switched to the dormant state. The dormant duration, N3 frames, can be adjusted as needed (e.g., N3 = 5). If the dormant tracker is matched within N3 frames, it is immediately woken up and its previous state is restored; if it is not matched within N3 frames, the tracker is deleted.
[0095] A dormant-state tracker, once woken up, participates directly in matching. If it then fails to be matched for another N2 consecutive frames, it is deleted.
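The lifecycle in [0093] to [0095] can be sketched as a small state machine in Python, using the example values N1 = 4, N2 = 3, N3 = 5 and an overlap threshold of 0.3. Two behaviours are not specified in the text and are assumptions here: a reserved-state tracker that misses N2 consecutive frames with an overlap below the threshold is deleted, and a tracker that has already been woken once is deleted (rather than put to sleep again) after another N2 consecutive misses.

```python
from enum import Enum


class State(Enum):
    TEMPORARY = "temporary"
    RESERVED = "reserved"
    DORMANT = "dormant"
    DELETED = "deleted"


class Tracker:
    """Three-state tracker lifecycle using the example parameter values."""

    def __init__(self, n1: int = 4, n2: int = 3, n3: int = 5, overlap_thr: float = 0.3):
        self.state = State.TEMPORARY
        self.n1, self.n2, self.n3, self.overlap_thr = n1, n2, n3, overlap_thr
        self.hits = 0        # consecutive matches while temporary
        self.misses = 0      # consecutive misses while reserved
        self.slept = 0       # frames spent dormant
        self.woken = False   # set once the tracker has been woken from dormancy

    def step(self, matched: bool, overlap: float = 0.0) -> State:
        """Advance one frame. `overlap` is the overlap between the predicted
        position and the current frame's detection region."""
        if self.state is State.TEMPORARY:
            if not matched:
                self.state = State.DELETED          # any miss during probation deletes it
            else:
                self.hits += 1
                if self.hits >= self.n1:
                    self.state = State.RESERVED
        elif self.state is State.RESERVED:
            if matched:
                self.misses = 0
            else:
                self.misses += 1
                if self.misses >= self.n2:
                    if not self.woken and overlap >= self.overlap_thr:
                        self.state, self.slept = State.DORMANT, 0
                    else:
                        self.state = State.DELETED  # assumed behaviour, see lead-in
        elif self.state is State.DORMANT:
            if matched:                             # matched within N3 frames: wake and restore
                self.state, self.misses, self.woken = State.RESERVED, 0, True
            else:
                self.slept += 1
                if self.slept >= self.n3:
                    self.state = State.DELETED
        return self.state
```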
[0096] In summary, the data processing method provided in this application, by employing an attention-enhanced detection model that integrates DETR and YOLO, effectively eliminates false detection problems caused by small low-Earth orbit targets and background interference, significantly improving the detection accuracy of low-Earth orbit targets. Simultaneously, the introduction of a three-state tracker management mechanism—temporary state, reserved state, and dormant state—especially the newly added dormant state design, effectively solves the problem of identity loss in long-term occlusion scenarios for low-Earth orbit targets, further improving the stability and robustness of low-Earth orbit target tracking and meeting the requirements for high-precision and high-reliability low-Earth orbit target tracking.
[0097] Furthermore, an embodiment of this application also provides a data processing apparatus for performing the data processing method shown in Figures 1 to 3.
[0098] Figure 4 is a schematic structural diagram of a data processing apparatus 400 provided in this application. The apparatus 400 includes: an acquisition unit 401, configured to acquire multi-dimensional information of the tracked target in the current frame, where the multi-dimensional information includes the spatial information, appearance information, and motion information of the tracked target, and the tracked target is any low-orbit target in the current frame; a determining unit 402, configured to determine, based on the multi-dimensional information, the similarity between the tracked target and the targets tracked in historical frames, where a historical frame is an image frame acquired before the current frame; and a matching unit 403, configured to determine, based on the similarity, the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
[0099] Optionally, the similarity is a comprehensive similarity, and determining the similarity between the tracked target and the targets tracked in historical frames based on the multi-dimensional information includes: determining, based on the multi-dimensional information, the intersection over union (IoU), the minimum similarity distance, and the information similarity, where the IoU is that between the tracked target and the target tracked in the most recent historical frame; the minimum similarity distance is the smallest of the similarity distances between the tracked target and the targets tracked in the historical frames; the information similarity is that between the tracked target and the target tracked in the most recent historical frame, and includes a similarity based on the motion information of the tracked target and the motion information of the target tracked in the historical frame; and the most recent historical frame is the historical frame whose acquisition time is closest to that of the current frame; and determining the comprehensive similarity based on the IoU, the minimum similarity distance, and the information similarity.
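Two of the three terms in [0099] can be sketched in Python. The IoU follows the usual definition for axis-aligned boxes. The application does not say which distance underlies the minimum similarity distance; this sketch assumes cosine distance between feature description vectors, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 minus cosine similarity (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def min_similarity_distance(feat: Sequence[float], history: Sequence[Sequence[float]]) -> float:
    """Smallest distance between the current descriptor and the descriptors
    stored for the tracked target in historical frames."""
    return min(cosine_distance(feat, h) for h in history)
```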
[0100] Optionally, the comprehensive similarity is determined as a weighted combination of the minimum similarity distance, the IoU, and the information similarity, using a first, a second, and a third weight parameter. All three weight parameters are adaptively adjusted based on the state of the tracked target; each is greater than 0 and less than 1, and the three sum to 1.
[0101] Optionally, determining the tracker of the tracked target based on the similarity includes: obtaining an association graph based on the similarity, where the association graph includes target nodes, tracker nodes, and edges between the target nodes and the tracker nodes; a target node represents a tracked target in the current frame and carries its multi-dimensional information; a tracker node represents the tracker of a target tracked in the historical frames and carries that target's multi-dimensional information from the historical frames; and the attribute of an edge is the similarity between the corresponding targets; processing the association graph with a graph neural network to obtain the association probability between each target node and each tracker node; and determining, based on the Hungarian algorithm and the association probabilities, the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
[0102] Optionally, the spatial information of the tracked target includes its position in the current frame, and obtaining this spatial information includes: determining the initial position of the tracked target in the current frame; determining the predicted position of the tracked target based on the Kalman filter algorithm; and correcting the initial position based on the predicted position to obtain the position of the tracked target in the current frame.
[0103] Optionally, determining the initial position of the low-orbit target in the current frame includes: extracting preliminary features of the low-orbit target from the current frame using the YOLO backbone network; and encoding the preliminary features with a DETR encoder to obtain the position information of the low-orbit target in the current frame, the position information including the initial position of the low-orbit target and the corresponding detection confidence. Determining the predicted position of the low-orbit target in the current frame based on the Kalman filter algorithm includes: when the detection confidence is greater than or equal to a preset confidence threshold, determining the predicted position of the low-orbit target in the current frame based on the Kalman filter algorithm.
[0104] Optionally, the apparatus 400 further includes a three-state management unit configured to: create a temporary-state tracker for a low-orbit target that is not matched to any tracker; change the temporary-state tracker to a reserved-state tracker if it is successfully matched in N1 consecutive frames, where N1 is a positive integer, and otherwise delete it; change a reserved-state tracker to a dormant-state tracker if it is not matched for N2 consecutive frames but the overlap between its predicted trajectory and the detection region of the current frame is greater than or equal to a preset overlap threshold, where the dormant duration is N3 frames and N2 and N3 are positive integers; wake up a dormant-state tracker so that it participates in matching if it is matched during the dormant period; and delete the dormant-state tracker if it is not matched during the dormant period.
[0105] The data processing apparatus provided in this embodiment first acquires the spatial information, motion information, and appearance information of the target being tracked in the current frame. Based on this information, it determines the similarity between that target and the low-Earth orbit targets in the historical frames of the current frame, and, based on this similarity, matches the target to a historical tracker. Because the spatial, motion, and appearance information of the low-Earth orbit targets are all considered when determining this similarity, the apparatus helps reduce the influence of illumination, attitude changes, and motion on the matching process and improves matching accuracy compared with matching based solely on the IoU of spatial positions.
[0106] Based on the method provided in the embodiments of this application, this application further provides a computer program product including computer program code that, when run on a computer, causes the computer to perform the steps or processes of any of the foregoing method embodiments.
[0107] Based on the method provided in the embodiments of this application, this application further provides a computer-readable storage medium storing program code that, when run on a computer, causes the computer to perform the steps or processes of any of the foregoing method embodiments.
[0108] The computer-readable storage medium may be the aforementioned volatile memory or non-volatile memory, or it may include both volatile memory and non-volatile memory.
[0109] In the embodiments of this application, the terms and English abbreviations are exemplary examples given for ease of description and should not be construed as limiting the application in any way. This application does not preclude the possibility of defining other terms that can achieve the same or similar functions in existing or future agreements.
[0110] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When these computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated.
[0111] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0112] It should be understood that in the various embodiments of this application, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0113] In summary, the above description is merely a preferred embodiment of the technical solution of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A data processing method based on low-orbit targets, characterized in that the method includes: acquiring multi-dimensional information of a tracked target in the current frame, where the multi-dimensional information includes spatial information, appearance information, and motion information of the tracked target, and the tracked target is any low-orbit target in the current frame; determining, based on the multi-dimensional information, the similarity between the tracked target and the targets tracked in historical frames, where a historical frame is an image frame acquired before the current frame; and determining, based on the similarity, the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
2. The method according to claim 1, characterized in that the similarity is a comprehensive similarity, and determining the similarity between the tracked target and the targets tracked in historical frames based on the multi-dimensional information includes: determining, based on the multi-dimensional information, the intersection over union (IoU), the minimum similarity distance, and the information similarity, where the IoU is that between the tracked target and the target tracked in the most recent historical frame; the minimum similarity distance is the smallest of the similarity distances between the tracked target and the targets tracked in the historical frames; the information similarity is that between the tracked target and the target tracked in the most recent historical frame, and includes a similarity based on the motion information of the tracked target and the motion information of the target tracked in the historical frame; and the most recent historical frame is the historical frame whose acquisition time is closest to that of the current frame; and determining the comprehensive similarity based on the IoU, the minimum similarity distance, and the information similarity.
3. The method according to claim 2, characterized in that determining the comprehensive similarity based on the IoU, the minimum similarity distance, and the information similarity includes: computing the comprehensive similarity as a weighted combination of the minimum similarity distance, the IoU, and the information similarity using a first weight parameter, a second weight parameter, and a third weight parameter, each of which is adaptively adjusted according to the state of the tracked target and is greater than 0 and less than 1, the three weight parameters summing to 1.
4. The method according to claim 1, characterized in that determining, based on the similarity, the tracker of the tracked target from the trackers of the targets tracked in the historical frames includes: obtaining an association graph based on the similarity, where the association graph includes target nodes, tracker nodes, and edges between the target nodes and the tracker nodes; a target node represents the tracked target and includes its multi-dimensional information; a tracker node represents the tracker of a target tracked in the historical frames and includes that target's multi-dimensional information in the historical frames; and the attribute of an edge indicates the similarity between the tracked target and the target tracked in the historical frames; processing the association graph with a graph neural network to obtain the association probability between the target node and the tracker node; and determining, based on the Hungarian algorithm and the association probability, the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
5. The method according to claim 1, characterized in that the spatial information of the tracked target includes the position of the tracked target in the current frame, and obtaining the spatial information of the tracked target in the current frame includes: determining the initial position of the tracked target in the current frame; determining the predicted position of the tracked target based on the Kalman filter algorithm; and correcting the initial position of the tracked target based on its predicted position to obtain the position of the tracked target in the current frame.
6. The method according to claim 5, characterized in that determining the initial position of the tracked target in the current frame includes: extracting preliminary features of the tracked target from the current frame using the YOLO backbone network; and encoding the preliminary features with a DETR encoder to obtain the position information of the tracked target in the current frame, the position information including the initial position of the tracked target and the corresponding detection confidence; and determining the predicted position of the tracked target based on the Kalman filter algorithm includes: when the detection confidence is greater than or equal to a preset confidence threshold, determining the predicted position of the tracked target in the current frame based on the Kalman filter algorithm.
7. The method according to claim 1, characterized in that the method further includes: if the tracked target is not matched to any of the trackers of the targets tracked in the historical frames, creating a temporary-state tracker; for the temporary-state tracker, if it is matched in the low-orbit target matching of each of the subsequent N1 consecutive frames, changing it to a reserved-state tracker, where N1 is a positive integer, and otherwise deleting it; for the reserved-state tracker, if it is not matched to any low-orbit target in the subsequent N2 consecutive frames but the overlap between its predicted trajectory and the detection region of the current frame is greater than or equal to a preset overlap threshold, changing it to a dormant-state tracker, where N2 is a positive integer; and for the dormant-state tracker, if it is matched during the dormant period, waking it up so that it participates in matching, and if it is not matched during the dormant period, deleting it.
8. A data processing device based on low-orbit targets, characterized in that the device includes: an acquisition unit, configured to acquire multi-dimensional information of a tracked target in the current frame, where the multi-dimensional information includes spatial information, appearance information, and motion information of the tracked target, and the tracked target is any low-orbit target in the current frame; a determining unit, configured to determine, based on the multi-dimensional information, the similarity between the tracked target and the targets tracked in historical frames, where a historical frame is an image frame acquired before the current frame; and a matching unit, configured to determine, based on the similarity, the tracker of the tracked target from the trackers of the targets tracked in the historical frames.
9. A computer storage medium, characterized in that it is used to store a computer program which, when executed, performs the method according to any one of claims 1 to 7.
10. An electronic device, characterized in that it includes a processor and a memory, where the memory is used to store a computer program and the processor is configured to perform, based on the computer program, the method according to any one of claims 1 to 7.