Target tracking method and device, computer device and storage medium

By acquiring depth point clouds, radar point clouds, and color images, density clustering and matching are performed to construct a 2D to 3D multi-sensor tracking scheme, which solves the problem of insufficient accuracy in traditional target tracking schemes and achieves higher precision target tracking.

CN116266359B · Active · Publication Date: 2026-06-16 · SHENZHEN PUDU TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN PUDU TECH CO LTD
Filing Date: 2021-12-16
Publication Date: 2026-06-16

Application Information

Patent Timeline

16 Dec 2021: Application
16 Jun 2026: Publication (CN116266359B)

IPC: G06T7/246; G06N3/08; G06N3/04; G06V10/762

CPC: G06T7/246; G06N3/08; G06T2207/10028; G06T2207/10024; G06T2207/20221; G06T2207/30232; G06T2207/30241; G06T2207/20081

AI Tagging

Application Domain

Image enhancement Image analysis

Abstract

The application relates to a target tracking method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring a depth point cloud, a color image and a radar point cloud obtained by collecting a target environment; performing density clustering on the depth point cloud and the radar point cloud respectively to obtain a depth point cloud cluster and a radar point cloud cluster; detecting a target in the color image to obtain a detection frame with a semantic label; matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster that have an intersection with a target detection frame; and tracking the target based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster. Through the 2D and 3D combined tracking mode, the method can improve the accuracy of the tracking system.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, and in particular to a method, apparatus, computer device, and storage medium for tracking a target object.

Background Technology

[0002] Visual object tracking is a crucial technology in the field of computer vision and has been studied extensively in recent years. With the continuous advancement of science and technology, object tracking is becoming an indispensable component for robots because of their task requirements and decision-making needs.

[0003] Traditional target tracking schemes typically rely on color images acquired at different times for target detection and tracking. However, using traditional target tracking schemes can lead to inaccurate tracking results.

Summary of the Invention

[0004] Therefore, it is necessary to provide a method, apparatus, computer device, and storage medium for tracking target objects to address the aforementioned technical problems.

[0005] A method for tracking a target object, the method comprising:

[0006] Acquire depth point clouds, color images, and radar point clouds obtained from surveying the target environment;

[0007] Density clustering is performed on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters; and, target objects in the color image are detected to obtain detection boxes with semantic labels.

[0008] The detection box is matched with the depth point cloud cluster and the radar point cloud cluster respectively to obtain the target depth point cloud cluster and the target radar point cloud cluster that intersect with the target detection box;

[0009] The target object is tracked based on the target detection bounding box and the fused point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster.

[0010] In one embodiment, the step of performing density clustering on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters includes:

[0011] In the depth point cloud and the radar point cloud, depth points and radar points are selected as target depth points and target radar points;

[0012] Determine a first distance between the target depth point and its neighborhood, and a second distance between the target radar point and its neighborhood;

[0013] When the first distance is less than the distance threshold, the target depth point is added to the neighborhood of the depth point; and when the second distance is less than the distance threshold, the target radar point is added to the neighborhood of the radar point.

[0014] Traverse the depth points in the depth point cloud and the radar points in the radar point cloud until all depth points in the depth point cloud and radar points in the radar point cloud are added to the corresponding depth point neighborhood and radar point neighborhood, and obtain the depth point cloud cluster and radar point cloud cluster based on the depth point neighborhood and radar point neighborhood.

[0015] In one embodiment, detecting the target object in the color image to obtain a detection box with semantic labels includes:

[0016] The target objects in the color image are detected and bounded using a target detection model to obtain the detection box;

[0017] Determine the behavioral state of the target object, and generate multi-level semantic tags based on the behavioral state;

[0018] Output a detection box with the multi-level semantic labels.

[0019] In one embodiment, the target detection box includes a first target detection box and a second target detection box;

[0020] The step of matching the detection box with the depth point cloud cluster and the radar point cloud cluster respectively to obtain the target depth point cloud cluster and radar point cloud cluster that intersect with the target detection box includes:

[0021] The depth point cloud cluster is projected onto the plane containing the color image to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection box; the intersection-over-union ratio (IoU) between the first target detection box and the depth point cloud projection is calculated; and the target depth point cloud cluster is determined based on the magnitude of the IoU.

[0022] The radar point cloud clusters are projected onto the plane containing the color image to obtain radar point cloud projections and the second target detection box. There are intersection areas between each radar point cloud projection and the second target detection box. The radar point cloud cluster with the largest intersection area is determined as the target radar point cloud cluster.

[0023] In one embodiment, determining the target depth point cloud cluster based on the magnitude of the intersection-union ratio includes:

[0024] Each intersection-union ratio is compared with a preset upper limit value and a preset lower limit value.

[0025] When the first target intersection-union ratio is greater than the preset upper limit of the intersection-union ratio, the depth point cloud cluster corresponding to the first target intersection-union ratio is determined as the target depth point cloud cluster; the first target intersection-union ratio belongs to at least one of the intersection-union ratios;

[0026] When the second target intersection-union ratio is greater than the preset lower limit of the intersection-union ratio and less than the preset upper limit of the intersection-union ratio, the depth point cloud clusters corresponding to the second target intersection-union ratios are clustered to obtain the target depth point cloud cluster; the second target intersection-union ratio belongs to at least two of the intersection-union ratios.
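The two-threshold rule in this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_target_clusters`, the tie-break of taking the largest ratio when several exceed the upper limit, and the empty result when fewer than two ratios fall between the limits are all assumptions made for the example.

```python
def select_target_clusters(ious, upper, lower):
    """Pick depth clusters for one detection box from their IoU values.

    ious  : list of intersection-union ratios, one per candidate cluster
    upper : preset upper limit (caller-chosen, hypothetical value)
    lower : preset lower limit (caller-chosen, hypothetical value)
    Returns the indices of the clusters that form the target cluster.
    """
    # Case 1: a ratio above the upper limit selects that cluster directly.
    strong = [i for i, v in enumerate(ious) if v > upper]
    if strong:
        # Assumption: if several qualify, keep the one with the largest ratio.
        return [max(strong, key=lambda i: ious[i])]

    # Case 2: at least two ratios lie strictly between the limits, so the
    # corresponding clusters are grouped into one target cluster.
    partial = [i for i, v in enumerate(ious) if lower < v < upper]
    if len(partial) >= 2:
        return partial

    # Assumption: anything else is treated as "no match".
    return []
```

For example, with `upper=0.7` and `lower=0.2`, the ratios `[0.3, 0.4, 0.1]` would select clusters 0 and 1 for merging, while `[0.1, 0.8, 0.3]` would select cluster 1 alone.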

[0027] In one embodiment, the target detection box includes a first target detection box and a second target detection box; the method further includes:

[0028] When the first target detection box matches the target depth point cloud cluster, and the second target detection box matches the target radar point cloud cluster, the target object is tracked based on the first target detection box and the target depth point cloud cluster; or...

[0029] The target object is tracked based on the second target detection box and the target radar point cloud cluster.

[0030] In one embodiment, the method further includes:

[0031] When the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is less than the preset distance threshold, the target depth point cloud cluster and the radar point cloud cluster are fused to obtain the fused point cloud cluster.
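A minimal sketch of this distance-gated fusion is given below. The text does not say how the distance between two clusters is measured; the sketch assumes it is the Euclidean distance between cluster centroids, and the names `centroid` and `fuse_if_close` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def fuse_if_close(depth_cluster, radar_cluster, max_dist):
    """Concatenate the two clusters if their centroids are close enough.

    Assumption: cluster-to-cluster distance is the centroid distance.
    Returns the fused point list, or None when the clusters are too far apart.
    """
    d = math.dist(centroid(depth_cluster), centroid(radar_cluster))
    if d < max_dist:
        return list(depth_cluster) + list(radar_cluster)
    return None
```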

[0032] In one embodiment, tracking the target object based on the target detection bounding box and the fused point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster includes:

[0033] Calculate the first matching cost between the current target detection box and the historically tracked target detection boxes, and the second matching cost between the current fused point cloud cluster and the historically tracked fused point cloud cluster; the current fused point cloud cluster is a point cloud cluster fused between the target depth point cloud cluster and the radar point cloud cluster;

[0034] The first matching cost and the second matching cost are weighted and summed to obtain the comprehensive matching cost;

[0035] The comprehensive matching result is determined based on the comprehensive matching cost; the comprehensive matching result is used to represent the matching situation between the current target detection box and the historically tracked target detection box, and between the current fused point cloud cluster and the historically tracked fused point cloud cluster.

[0036] Based on the comprehensive matching results, tracking information for tracking the target object is determined, and the target object is tracked based on the tracking information.

[0037] In one embodiment, determining the tracking information for tracking the target object based on the comprehensive matching result includes:

[0038] When the comprehensive matching result is determined based on a comprehensive matching cost less than a preset loss threshold, the state of the historically tracked target detection boxes and the fused point cloud clusters is updated according to the current target detection boxes and the current fused point cloud clusters to obtain tracking information.
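The weighted cost and the threshold test can be sketched as below. The weights, the cost matrices, and the greedy assignment are assumptions for illustration only; the greedy pass is a simpler stand-in and not an optimal assignment method.

```python
def combined_cost(box_cost, cluster_cost, w_box=0.5, w_cluster=0.5):
    """Weighted sum of the 2D (box) and 3D (fused cluster) matching costs."""
    return w_box * box_cost + w_cluster * cluster_cost


def match_tracks(box_costs, cluster_costs, max_cost, w_box=0.5, w_cluster=0.5):
    """Greedily match current detections (rows) to historical tracks (columns).

    box_costs[i][j] / cluster_costs[i][j] are the first and second matching
    costs between current item i and historical track j.
    Returns {detection_index: track_index} for pairs whose combined cost is
    below max_cost (the preset loss threshold).
    """
    pairs = []
    for i, (brow, crow) in enumerate(zip(box_costs, cluster_costs)):
        for j, (b, c) in enumerate(zip(brow, crow)):
            pairs.append((combined_cost(b, c, w_box, w_cluster), i, j))
    pairs.sort()  # cheapest pairs first

    matches, used_det, used_trk = {}, set(), set()
    for cost, i, j in pairs:
        if cost >= max_cost:
            break  # every remaining pair is at least this expensive
        if i in used_det or j in used_trk:
            continue
        matches[i] = j
        used_det.add(i)
        used_trk.add(j)
    return matches
```

Matched pairs would then have the state of the historical box and fused cluster updated from the current ones; unmatched detections and tracks are left for the caller to handle.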

[0039] A target tracking device, the device comprising:

[0040] The acquisition module is used to acquire depth point clouds, color images, and radar point clouds obtained from the collection of data on the target environment;

[0041] The clustering detection module is used to perform density clustering on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters; and to detect target objects in the color image to obtain detection boxes with semantic labels.

[0042] The matching module is used to match the detection box with the depth point cloud cluster and the radar point cloud cluster respectively, to obtain the target depth point cloud cluster and radar point cloud cluster that intersect with the target detection box;

[0043] The tracking module is used to track the target object based on the target detection box and the fused point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster.

[0044] In one embodiment, the clustering detection module is further configured to: select depth points and radar points as target depth points and target radar points in the depth point cloud and the radar point cloud; determine a first distance between the target depth point and its neighborhood, and a second distance between the target radar point and its neighborhood; add the target depth point to the neighborhood of the depth point when the first distance is less than a distance threshold; and add the target radar point to the neighborhood of the radar point when the second distance is less than the distance threshold; traverse the depth points in the depth point cloud and the radar points in the radar point cloud until all depth points in the depth point cloud and radar points in the radar point cloud are added to their corresponding neighborhoods of the depth points and the radar points, and obtain a depth point cloud cluster and a radar point cloud cluster based on the depth point neighborhood and the radar point neighborhood.

[0045] In one embodiment, the clustering detection module is further configured to use an object detection model to detect and frame objects in the color image to obtain detection boxes; determine the behavioral state of the object; generate multi-level semantic labels based on the behavioral state; and output detection boxes with the multi-level semantic labels.

[0046] In one embodiment, the target detection box includes a first target detection box and a second target detection box;

[0047] The matching module is further configured to project the depth point cloud cluster onto the plane where the color image is located, to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection box; calculate the intersection-over-union ratio (IoU) between the first target detection box and the depth point cloud projection; determine the target depth point cloud cluster based on the magnitude of the IoU; project the radar point cloud cluster onto the plane where the color image is located, to obtain a radar point cloud projection and the second target detection box, wherein there is an intersection region between each radar point cloud projection and the second target detection box; and determine the radar point cloud cluster with the largest intersection region as the target radar point cloud cluster.

[0048] In one embodiment, the matching module is further configured to compare each intersection-over-union ratio (IoU) with a preset upper limit IoU and a preset lower limit IoU; when the first target IoU is greater than the preset upper limit IoU, the depth point cloud cluster corresponding to the first target IoU is determined as a target depth point cloud cluster; the first target IoU belongs to at least one of the IoUs; when the second target IoU is greater than the preset lower limit IoU and less than the preset upper limit IoU, the depth point cloud clusters corresponding to the second target IoU are clustered to obtain a target depth point cloud cluster; the second target IoU belongs to at least two of the IoUs.

[0049] In one embodiment, the target detection box includes a first target detection box and a second target detection box; the device further includes:

[0050] The selection module is used to track the target object based on the first target detection box and the target depth point cloud cluster when the first target detection box matches the target depth point cloud cluster and the second target detection box matches the target radar point cloud cluster; or, to track the target object based on the second target detection box and the target radar point cloud cluster.

[0051] In one embodiment, the device further includes:

[0052] The fusion module is used to fuse the target depth point cloud cluster and the radar point cloud cluster to obtain the fused point cloud cluster when the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is less than the preset distance threshold.

[0053] In one embodiment, the tracking module is further configured to calculate a first matching cost between the current target detection box and the historically tracked target detection boxes, and a second matching cost between the current fused point cloud cluster and the historically tracked fused point cloud cluster; the current fused point cloud cluster is a point cloud cluster fused from the target depth point cloud cluster and the radar point cloud cluster; the first matching cost and the second matching cost are weighted and summed to obtain a comprehensive matching cost; a comprehensive matching result is determined based on the comprehensive matching cost; the comprehensive matching result is used to represent the matching situation between the current target detection box and the historically tracked target detection boxes, and between the current fused point cloud cluster and the historically tracked fused point cloud cluster; tracking information for tracking the target is determined according to the comprehensive matching result, and the target is tracked according to the tracking information.

[0054] In one embodiment, the tracking module is further configured to update the status of historically tracked target detection boxes and fused point cloud clusters based on the current target detection box and the current fused point cloud cluster when the comprehensive matching result is determined based on a comprehensive matching cost less than a preset loss threshold, thereby obtaining tracking information.

[0055] A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to invoke and perform the steps of the target tracking method described above.

[0056] A computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to invoke and execute the steps of the target tracking method described above.

[0057] The aforementioned target tracking method, apparatus, computer equipment, and storage medium acquire depth point clouds, color images, and radar point clouds obtained from collecting data on the target environment; perform density clustering on the depth point clouds and radar point clouds respectively to obtain depth point cloud clusters and radar point cloud clusters; detect targets in the color images to obtain detection boxes with semantic labels; match the detection boxes with the depth point cloud clusters and radar point cloud clusters respectively to obtain target depth point cloud clusters and radar point cloud clusters that intersect with the target detection boxes; and construct a 2D to 3D multi-sensor tracking scheme. Based on the target detection boxes and the fused point cloud clusters between the target depth point cloud clusters and radar point cloud clusters, the target is tracked. By combining 2D and 3D tracking methods, the accuracy of the tracking system is improved.

Attached Figure Description

[0058] Figure 1 is an application environment diagram of a target tracking method in one embodiment;

[0059] Figure 2a is a schematic diagram of the algorithm flow of a target tracking method in one embodiment;

[0060] Figure 2b is a flowchart illustrating a target tracking method in one embodiment;

[0061] Figure 3a is a schematic diagram illustrating the intersection of a target detection box and a point cloud cluster in one embodiment;

[0062] Figure 3b is another schematic diagram of the intersection between the target detection box and the point cloud cluster in one embodiment;

[0063] Figure 4 is a flowchart illustrating the density clustering process in one embodiment;

[0064] Figure 5 is a flowchart illustrating the process of determining a target depth point cloud cluster based on the intersection-union ratio in one embodiment;

[0065] Figure 6 is a schematic diagram of semi-supervised clustering in one embodiment;

[0066] Figure 7 is a structural block diagram of a target tracking device in one embodiment;

[0067] Figure 8 is a structural block diagram of a target tracking device in one embodiment;

[0068] Figure 9 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0069] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0070] This application provides a target tracking method that can be applied to, for example, the application environment shown in Figure 1. This target tracking method is applied to a target tracking system, which includes a terminal 102 and a server 104.

[0071] The terminal 102 can be a robot, smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, etc., but is not limited to these.

[0072] Server 104 can be an independent physical server or a service node in a blockchain system. The service nodes in the blockchain system form a peer-to-peer (P2P) network. The P2P protocol is an application layer protocol that runs on top of the Transmission Control Protocol (TCP).

[0073] In addition, server 104 can also be a server cluster consisting of multiple physical servers, which can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0074] Terminal 102 and server 104 can be connected via Bluetooth, USB (Universal Serial Bus) or network, etc., and this application does not impose any restrictions.

[0075] In one embodiment, as shown in Figure 2a, a schematic diagram of the algorithm flow for tracking a target object is provided. Figure 2b is a flowchart of the target tracking method. Taking the application of the method to the terminal in Figure 1 as an example, the method includes the following steps:

[0076] S202: Acquire the depth point cloud, color image, and radar point cloud obtained by collecting data on the target environment.

[0077] Depth point clouds are derived from depth maps, which are images captured by depth cameras. Each pixel value in a depth map represents the perpendicular distance from a point on an object in space to a plane perpendicular to the lens's optical axis and passing through the lens's optical center (the optical zero point of the depth camera). Converting a depth map to a depth point cloud is the reverse process of projecting 3D points onto a 2D plane. Color images are captured by color cameras. Radar point clouds are obtained from lidar. Each point in a radar point cloud contains three-dimensional coordinate information, commonly referred to as the X, Y, and Z elements, and sometimes also includes color information, reflection intensity information, and echo count information.

[0078] In one embodiment, prior to S202, the terminal uses camera intrinsic parameters as constraints to convert the depth map into a depth point cloud. Let the coordinates of the depth point cloud in the world coordinate system be (x, y, z), and the coordinates of the depth map in the image coordinate system be (x', y'), where D is the depth value, and the camera intrinsic parameters are... The conversion formula is:

[0079]
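As a hedged illustration of the depth-map-to-point-cloud conversion described in [0078], the sketch below uses the standard pinhole back-projection. The intrinsic parameter names `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, in pixels) and both function names are assumptions, since the parameters and the formula are not legible in this text. The resulting coordinates are expressed in the camera frame; any further transform to a world frame is left out.

```python
def depth_pixel_to_point(x_img, y_img, depth, fx, fy, cx, cy):
    """Back-project one depth pixel (x', y') with depth D to (x, y, z).

    Assumed pinhole model:
        x = (x' - cx) * D / fx
        y = (y' - cy) * D / fy
        z = D
    """
    z = depth
    x = (x_img - cx) * z / fx
    y = (y_img - cy) * z / fy
    return (x, y, z)


def depth_map_to_cloud(depth_map, fx, fy, cx, cy):
    """Convert a row-major depth map (list of rows) into a list of 3D points.

    Pixels with non-positive depth are treated as invalid and skipped
    (an assumption; sensors differ in how they mark missing depth).
    """
    cloud = []
    for y_img, row in enumerate(depth_map):
        for x_img, d in enumerate(row):
            if d > 0:
                cloud.append(depth_pixel_to_point(x_img, y_img, d, fx, fy, cx, cy))
    return cloud
```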

[0080] S204: Density clustering is performed on the depth point cloud and radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters; and, target objects in the color image are detected to obtain detection boxes with semantic labels.

[0081] Clustering is the process of dividing a dataset into different classes or clusters based on a specific criterion (such as distance), ensuring that data objects within the same cluster are highly similar, while data objects in different clusters are highly dissimilar. In other words, after clustering, data of the same class are grouped together as much as possible, while data of different classes are separated as much as possible. Clustering is an unsupervised learning method. Data clustering methods can be mainly divided into partition-based methods, density-based methods, and hierarchical methods, among others.

[0082] Furthermore, this density-based clustering method can be Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a typical density-based clustering algorithm. In DBSCAN, each maximal set of density-connected samples, derived from density-reachability relationships, constitutes one cluster of the final result. This kind of algorithm generally assumes that categories can be determined by how densely the samples are distributed: samples within the same category are closely connected, that is, any sample in a category has other samples of the same category nearby. Grouping one set of closely connected samples together yields one cluster, and dividing all sets of closely connected samples into separate categories yields the final clustering result.
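For readers unfamiliar with DBSCAN, a compact pure-Python sketch follows. It is a textbook-style version with a brute-force neighbor search, intended only to illustrate density reachability; the parameter names `eps` and `min_pts` are the conventional ones and are not taken from this application.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels
```

The same routine would be run once on the depth point cloud and once on the radar point cloud, typically with different `eps` and `min_pts` because the two sensors have very different point densities.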

[0083] In this context, "detection" can refer to object detection. Detection focuses on specific object targets, requiring the simultaneous acquisition of both the target's category and location information. Detection provides an understanding of the foreground and background of an image, needing to separate the target of interest from the background and determine its description (category and location). Therefore, the output of the detection model is a list, where each item uses a set of data to provide the category and location of the detected target (commonly represented by the coordinates of a rectangular detection box).

[0084] Object detection algorithms can be broadly divided into two types. One type is the Two-Stage algorithm, represented by Faster R-CNN (Region with CNN Feature), which detects objects by dividing the process into two parts: generating candidate boxes and finding foreground elements, as well as adjusting bounding boxes, using dedicated modules. The other type is the One-Stage algorithm, represented by SSD and YOLO, which directly classifies objects and adjusts bounding boxes based on anchors.

[0085] YOLO is a novel object detection method that achieves both fast detection and high accuracy. It treats object detection as a regression problem involving object region prediction and class prediction. This method uses a single neural network to directly predict object boundaries and class probabilities, achieving end-to-end object detection.

[0086] YOLOv5 is a one-stage object detection algorithm that balances accuracy and real-time performance. This application adds three secondary semantic labels to the YOLOv5 object detection algorithm: posture (sitting, standing, and others), orientation (facing, back to, and others), and risk level (elderly, child, pregnant woman, and others). All three are multi-class classification problems. For YOLOv5, this effectively increases the dimension of the output feature map, resulting in a total of 5 categories. Including the primary classification labels, there are a total of 4 classifiers. Therefore, three additional classification loss functions are needed during training. Here, the same binary cross-entropy classification loss (BCEcls) as in the native YOLOv5 is used to calculate each classification loss. The loss backpropagated through the entire model is:

[0087] $\mathrm{Loss} = \mathrm{Loss}_{obj} + \mathrm{Loss}_{cls} + \mathrm{Loss}_{cls\_pose} + \mathrm{Loss}_{cls\_orient} + \mathrm{Loss}_{cls\_risk}$
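The summed loss can be illustrated numerically as follows. This is a simplified sketch: it applies binary cross-entropy to already-normalized probabilities, whereas a real training loop would normally work on logits inside a deep learning framework. The head names and the helper names `bce` and `total_loss` are assumptions.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def total_loss(loss_obj, heads):
    """Objectness loss plus one BCE term per classification head.

    heads maps a head name (e.g. "cls", "cls_pose", "cls_orient", "cls_risk")
    to a (probabilities, one_hot_targets) pair.
    """
    return loss_obj + sum(bce(p, t) for p, t in heads.values())
```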

[0088] In one embodiment, detecting a target object in a color image and obtaining a detection box with semantic labels may include: using an object detection model to detect the target object in the color image and framing it to obtain a detection box; determining the behavior state of the target object and generating multi-level semantic labels based on the behavior state; and outputting the detection box with multi-level semantic labels.
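One possible in-memory representation of a detection box with multi-level semantic labels is sketched below. The class name `Detection`, the field names, and the label spellings are assumptions; only the three secondary label groups and their example values come from the description of the YOLOv5 extension.

```python
from dataclasses import dataclass, field

# Secondary label vocabularies, following the examples in the text.
ALLOWED = {
    "posture": {"sitting", "standing", "other"},
    "orientation": {"facing", "back", "other"},
    "risk": {"elderly", "child", "pregnant", "other"},
}


@dataclass
class Detection:
    box: tuple            # (x_min, y_min, x_max, y_max) in image pixels
    category: str         # primary semantic label, e.g. "person"
    score: float          # detector confidence
    secondary: dict = field(default_factory=dict)  # second-level labels


def with_secondary_labels(det, **labels):
    """Return a copy of det with validated secondary labels attached."""
    for name, value in labels.items():
        if name not in ALLOWED or value not in ALLOWED[name]:
            raise ValueError(f"unknown label {name}={value}")
    return Detection(det.box, det.category, det.score, {**det.secondary, **labels})
```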

[0089] S206, Match the detection box with the depth point cloud cluster and the radar point cloud cluster respectively to obtain the target depth point cloud cluster and the target radar point cloud cluster that intersect with the target detection box;

[0090] In one embodiment, S206 may include: the terminal projecting a depth point cloud cluster onto the plane where the color image is located to obtain a depth point cloud projection based on the depth point cloud cluster and a first target detection box; calculating the intersection-over-union ratio (IoU) between the first target detection box and the depth point cloud projection; determining the target depth point cloud cluster based on the magnitude of the IoU; projecting a radar point cloud cluster onto the plane where the color image is located to obtain a radar point cloud projection and a second target detection box, wherein there is an intersection region between each radar point cloud projection and the second target detection box; and determining the radar point cloud cluster with the largest intersection region as the target radar point cloud cluster.
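The two matching criteria in S206 can be sketched with axis-aligned boxes. The sketch assumes each projected cluster is summarized by the bounding box of its projected pixels; the function names are hypothetical, and boxes are `(x_min, y_min, x_max, y_max)` tuples.

```python
def projected_box(pixels):
    """Axis-aligned bounding box of projected (x, y) pixel coordinates."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(b):
    return max(b[2] - b[0], 0.0) * max(b[3] - b[1], 0.0)


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def iou(a, b):
    """Intersection area divided by union area (0.0 for degenerate boxes)."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def best_radar_cluster(det_box, radar_boxes):
    """Index of the radar projection with the largest intersection, or None."""
    if not radar_boxes:
        return None
    areas = [intersection_area(det_box, rb) for rb in radar_boxes]
    best = max(range(len(areas)), key=areas.__getitem__)
    return best if areas[best] > 0 else None
```

Depth clusters are scored with `iou`, while radar clusters are ranked by raw intersection area, which matches the distinction the text draws between the first and second target detection boxes.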

[0091] In one embodiment, prior to S206, the terminal filters the depth point cloud clusters based on the number of points they contain. When the number of points in a depth point cloud cluster exceeds a certain fixed value, the cluster is determined to be background information and is filtered out. For example, a depth point cloud cluster containing more than 2000 points is determined to be background information and filtered out. The terminal also filters the radar point cloud clusters based on the number of points they contain. When the number of points in a radar point cloud cluster exceeds a certain fixed value, the cluster is determined to be background information and is filtered out. For example, a radar point cloud cluster containing more than 50 points is determined to be background information and filtered out.
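The count-based background filter is simple enough to show in a few lines. The limits 2000 and 50 are the example values given in the text; applying the first to depth clusters and the second to radar clusters is an interpretation, and `drop_background` is a hypothetical helper name.

```python
DEPTH_MAX_POINTS = 2000  # example value from the text (assumed: depth clusters)
RADAR_MAX_POINTS = 50    # example value from the text (assumed: radar clusters)


def drop_background(clusters, max_points):
    """Keep only clusters whose point count does not exceed max_points.

    Very large clusters are assumed to be walls, floors, or other background.
    """
    return [c for c in clusters if len(c) <= max_points]
```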

[0092] In one embodiment, the step of projecting the depth point cloud cluster onto the plane of the color image to obtain the depth point cloud projection and the first target detection box based on the depth point cloud cluster may specifically include: the terminal projects the depth point cloud clusters onto the plane of the color image to obtain the pseudo target boxes of the depth point cloud clusters and the first target detection box; sorts the pseudo target boxes in ascending order of Euclidean distance to a reference point (the origin of the world coordinate system or the target position in a historical frame), where a smaller distance means a higher priority; and, within a single target matching, removes the overlapping parts from the lower-priority pseudo target boxes. After the overlap between pseudo target boxes is removed, the depth point cloud projection based on the depth point cloud cluster is obtained.

[0093] In one embodiment, the step of projecting the radar point cloud cluster onto the plane of the color image to obtain the radar point cloud projection and the second target detection box may specifically include: the terminal projecting the radar point cloud cluster onto the plane of the color image to obtain the radar point cloud projection; filtering the detection boxes based on the intersection range of the radar point cloud projection and the detection box; fixing the height of the second target detection box, with the height synchronized with the lidar projection area, to obtain the second target detection box.

[0094] In one embodiment, the step of filtering the radar point cloud clusters to obtain the target radar point cloud clusters that intersect with the target detection box can specifically include: the terminal needs to match the second target detection box with the radar point cloud clusters corresponding to the radar point cloud projection, which can be regarded as a 0-1 programming problem.

[0095] This problem can be transformed into finding a set of solutions that, subject to the constraint $w_{ij} = 0$ or $w_{ij} = 1$, minimizes the value of the objective function $f(w_{ij})$. For example, Figure 3a illustrates the intersection of a target detection box and a point cloud cluster.

[0096]

[0097]

[0098]

[0099]

[0100] Here, $u_{ij}$ is an element of U, where U is the set of effective intersection areas between the regions of BoxO and BoxC; $v_{ij}$ is an element of V, where V is the set of intersection areas between the regions of BoxO and BoxC; $r_{cj}$ is an element of $R_c$, where $R_c$ is the set of bounding-box areas of the BoxC clusters. Figure 3b shows another case of intersection between the target detection box and point cloud clusters. When the intersections of $Bc_{j+1}$ and $Bc_j$ with $Bo_i$ share an overlapping region $S_0$, the effective areas are:

[0101] $u_{ij} = v_{ij}$;

[0102] $u_{i(j+1)} = v_{i(j+1)} - S_0$

[0103] $R_c = \{r_{c1}, r_{c2}, \ldots, r_{cm}\}$
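The effective-area rule can be sketched in Python for the two-cluster case. This is only an illustration under stated assumptions: all boxes are axis-aligned rectangles `(x_min, y_min, x_max, y_max)` in the image plane, and $S_0$ is taken to be the area inside the detection box that is covered by both cluster boxes. The helper names `intersect`, `area`, and `effective_areas` are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed axis-aligned


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def area(box: Optional[Box]) -> float:
    """Area of a rectangle; an empty intersection (None) has area 0."""
    return 0.0 if box is None else (box[2] - box[0]) * (box[3] - box[1])


def effective_areas(bo_i: Box, bc_j: Box, bc_j1: Box) -> Tuple[float, float]:
    """Return (u_ij, u_i(j+1)) with u_ij = v_ij and u_i(j+1) = v_i(j+1) - S0."""
    inter_j = intersect(bo_i, bc_j)
    v_ij = area(inter_j)
    v_ij1 = area(intersect(bo_i, bc_j1))
    # S0: part of the detection box covered by both cluster boxes (assumption).
    s0 = area(intersect(inter_j, bc_j1)) if inter_j is not None else 0.0
    return v_ij, v_ij1 - s0


# Hypothetical boxes: the two cluster boxes overlap in a 2 x 10 strip inside Bo_i.
u_ij, u_ij1 = effective_areas((0, 0, 10, 10), (0, 0, 6, 10), (4, 0, 12, 10))
```

Here the higher-priority cluster keeps its full intersection area, while the shared strip is subtracted from the lower-priority cluster so that it is not counted twice.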

[0104] Solving the above 0-1 linear programming problem, the entries of W with $\{w_{iq}\} = 1$ form the solution to the problem; $\{w_{1q}\}, \{w_{2q}\}, \ldots, \{w_{nq}\}$ represent the point cloud clusters corresponding to targets 1 to n. If a target corresponds to multiple point cloud clusters, they are merged. The merged cluster set is as follows:

[0105] $\text{Clusters}^* = \{Cl_0, Cl_1, \ldots, Cl_n\}$

[0106] S208, track the target based on the target detection box and the fused point cloud cluster obtained from the target depth point cloud cluster and the target radar point cloud cluster.

[0107] Target tracking is the process of locating, in an image sequence, the candidate target region most similar to the target template, given an effective representation of the target. Target tracking algorithms include optical flow, Meanshift, Camshift, Kalman filtering, particle filtering, and correlation filtering (CF); multi-target tracking (MOT) algorithms include DeepSort, Motdt, and Towards Real-Time Multi-Object Tracking. Here, the MOT algorithm tracks the detected targets and consists of two parts: tracking based on the 2D image and tracking based on 3D spatial position. Both parts mainly consist of the Hungarian algorithm and a Kalman filter (KF), with the KF motion equations using a constant-velocity model.

[0108] In one embodiment, before S208, when the first target detection box matches the target depth point cloud cluster and the second target detection box matches the target radar point cloud cluster, the terminal tracks the target based on the first target detection box and the target depth point cloud cluster; or, tracks the target based on the second target detection box and the target radar point cloud cluster.

[0109] In one embodiment, before S208, when the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is less than a preset distance threshold, the terminal fuses the target depth point cloud cluster and the radar point cloud cluster to obtain a fused point cloud cluster.

[0110] Specifically, when the target depth point cloud cluster and the radar point cloud cluster correspond to the same region, two 3D target center positions are obtained, $P_{lidar}$ from the radar and $P_{depth}$ from the depth camera, and are combined into the fused position P:

[0111] $P = k_1 P_{lidar} + k_2 P_{depth}$

[0112] When $P_{lidar}$ and $P_{depth}$ both exist, if the Euclidean distance between the two is greater than a threshold (for example, 0.5 m), the radar detection is considered abnormal, and $k_1 = 0$ and $k_2 = 1$; within the threshold, $k_1$ and $k_2$ take fixed constants, for example, $k_1 = 0.7$ and $k_2 = 0.3$.
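The weighted fusion of the two center positions can be sketched as follows. The threshold of 0.5 m and the weights 0.7 and 0.3 are the example values given in the text; the function name `fuse_centers` and the fallback used when only one source is available are assumptions made for illustration, since the text only describes the case in which both positions exist.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def fuse_centers(
    p_lidar: Optional[Vec3],
    p_depth: Optional[Vec3],
    threshold: float = 0.5,  # metres, example value from the text
    k1: float = 0.7,         # radar weight, example value from the text
    k2: float = 0.3,         # depth weight, example value from the text
) -> Optional[Vec3]:
    """Compute P = k1 * P_lidar + k2 * P_depth with the abnormal-radar rule."""
    # Assumption: if only one source exists, use it unchanged.
    if p_lidar is None:
        return p_depth
    if p_depth is None:
        return p_lidar
    # Sources disagree by more than the threshold: radar is treated as abnormal.
    if math.dist(p_lidar, p_depth) > threshold:
        k1, k2 = 0.0, 1.0
    return tuple(k1 * a + k2 * b for a, b in zip(p_lidar, p_depth))


close = fuse_centers((1.0, 0.0, 0.0), (1.2, 0.0, 0.0))  # within 0.5 m: weighted blend
far = fuse_centers((1.0, 0.0, 0.0), (3.0, 0.0, 0.0))    # beyond 0.5 m: depth only
```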

[0113] In one embodiment, S208 may include: the terminal calculates a first matching cost between the current target detection box and the historically tracked target detection boxes, and a second matching cost between the current fused point cloud cluster and the historically tracked fused point cloud clusters, where the current fused point cloud cluster is the point cloud cluster obtained by fusing the target depth point cloud cluster and the radar point cloud cluster; computes a weighted sum of the first matching cost and the second matching cost to obtain a comprehensive matching cost; determines a comprehensive matching result based on the comprehensive matching cost, where the comprehensive matching result represents how the current target detection box matches the historically tracked target detection boxes and how the current fused point cloud cluster matches the historically tracked fused point cloud clusters; and determines tracking information for the target object based on the comprehensive matching result and tracks the target object based on the tracking information.
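A minimal Python sketch of the weighted matching cost is given below. The text does not define the two individual costs, so this sketch assumes that the first (2D) cost is one minus the intersection-over-union of the detection boxes and the second (3D) cost is the Euclidean distance between fused cluster centers; the weights, the dictionary layout, and all function names are hypothetical. The assignment is found by brute force over permutations as a simple stand-in for the Hungarian algorithm, and it assumes there are no more detections than tracks.

```python
import math
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def combined_cost(det, trk, w_2d=0.5, w_3d=0.5):
    """Weighted sum of the assumed 2D box cost and the assumed 3D center cost."""
    cost_2d = 1.0 - iou(det["box"], trk["box"])
    cost_3d = math.dist(det["center"], trk["center"])
    return w_2d * cost_2d + w_3d * cost_3d


def match(dets, trks):
    """Minimum-total-cost assignment (brute force; needs len(dets) <= len(trks))."""
    cost = [[combined_cost(d, t) for t in trks] for d in dets]
    best = min(
        permutations(range(len(trks)), len(dets)),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return [(i, j, cost[i][j]) for i, j in enumerate(best)]


# Hypothetical current detections and historical tracks.
dets = [
    {"box": (0, 0, 10, 10), "center": (1.0, 0.0, 0.0)},
    {"box": (50, 50, 60, 60), "center": (5.0, 5.0, 0.0)},
]
trks = [
    {"box": (51, 50, 61, 60), "center": (5.1, 5.0, 0.0)},
    {"box": (1, 0, 11, 10), "center": (1.1, 0.0, 0.0)},
]
pairs = match(dets, trks)
```

Each returned triple holds a detection index, a track index, and the comprehensive cost of that pair, which can then be compared with the preset loss threshold.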

[0114] In one embodiment, determining the tracking information for tracking the target object based on the comprehensive matching result includes: when the comprehensive matching result is determined based on a comprehensive matching cost less than a preset loss threshold, the terminal updates the status of historically tracked target detection boxes and fused point cloud clusters based on the current target detection box and the current fused point cloud cluster to obtain tracking information. When the comprehensive matching result is based on a comprehensive matching cost greater than or equal to the preset loss threshold and the detected target cannot find an existing tracked target, the current number and occurrence period are recorded; if the period is greater than the generation time, a new track is generated. When the comprehensive matching result is based on a comprehensive matching cost greater than or equal to the preset loss threshold and the existing tracked target cannot find a corresponding detected target, the current number and disappearance period are recorded; if the period is greater than the deletion time, the current track is deleted; otherwise, it is retained. For example, the generation time can be set to 2 seconds and the deletion time to 3 seconds.
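The track generation and deletion rules can be sketched in Python as below. The 2-second generation time and 3-second deletion time are the example values from the text; the function names and the use of timestamps in seconds are assumptions made for illustration.

```python
GENERATION_TIME_S = 2.0  # example value from the text
DELETION_TIME_S = 3.0    # example value from the text


def is_matched(comprehensive_cost: float, loss_threshold: float) -> bool:
    """A pair counts as matched when its cost is below the preset loss threshold."""
    return comprehensive_cost < loss_threshold


def should_create_track(first_seen_s: float, now_s: float,
                        generation_time_s: float = GENERATION_TIME_S) -> bool:
    """An unmatched detection becomes a new track once it has persisted long enough."""
    return now_s - first_seen_s > generation_time_s


def should_delete_track(last_matched_s: float, now_s: float,
                        deletion_time_s: float = DELETION_TIME_S) -> bool:
    """An unmatched track is deleted once it has been missing for long enough."""
    return now_s - last_matched_s > deletion_time_s


# Hypothetical timeline: a detection first seen at t = 0 s, a track last matched at t = 10 s.
create_at_1s = should_create_track(0.0, 1.0)
create_at_2_5s = should_create_track(0.0, 2.5)
delete_at_12s = should_delete_track(10.0, 12.0)
delete_at_13_5s = should_delete_track(10.0, 13.5)
```

Delaying both decisions in this way presumably suppresses spurious tracks from one-off false detections and keeps a track alive through short occlusions.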

[0115] In the above embodiments, depth point clouds, color images, and radar point clouds obtained from collecting data on the target environment are acquired; density clustering is performed on the depth point clouds and radar point clouds respectively to obtain depth point cloud clusters and radar point cloud clusters; target objects are detected in the color images to obtain detection boxes with semantic labels; the detection boxes are matched with the depth point cloud clusters and radar point cloud clusters respectively to obtain target depth point cloud clusters and radar point cloud clusters that intersect with the target detection boxes; a 2D to 3D multi-sensor tracking scheme is constructed, which tracks the target object based on the target detection boxes and the fused point cloud clusters between the target depth point cloud clusters and radar point cloud clusters. By combining 2D and 3D tracking methods, the accuracy of the tracking system is improved.

[0116] In one embodiment, as shown in Figure 4, density clustering of the depth point cloud and the radar point cloud may specifically include:

[0117] S402, select depth points and radar points from the depth point cloud and radar point cloud as the target depth point and target radar point.

[0118] S404, determine the first distance between the target depth point and its neighborhood, and the second distance between the target radar point and its neighborhood.

[0119] The depth point neighborhood is a set consisting of at least the core depth points in the depth point cloud, and may also include depth points whose distance to the core depth points is less than a distance threshold. The radar point neighborhood is a set consisting of at least the core radar points in the radar point cloud, and may also include radar points whose distance to the core radar points is less than a distance threshold. The first distance refers to the distance between the target depth point and the core depth points (depth core objects) in the depth point neighborhood. The second distance refers to the distance between the target radar point and the core radar points (radar core objects) in the radar point neighborhood.

[0120] In one embodiment, prior to S404, the terminal determines the core depth point and the core radar point in the depth point cloud and the radar point cloud.

[0121] S406, when the first distance is less than the distance threshold, add the target depth point to the depth point neighborhood; and when the second distance is less than the distance threshold, add the target radar point to the radar point neighborhood.

[0122] S408, traverse the depth points in the depth point cloud and the radar points in the radar point cloud until all depth points in the depth point cloud and radar points in the radar point cloud are added to the corresponding depth point neighborhood and radar point neighborhood, and obtain the depth point cloud cluster and radar point cloud cluster based on the depth point neighborhood and radar point neighborhood.

[0123] Density-based clustering (DBSCAN) describes how tightly a sample set is packed in terms of neighborhoods. The parameters (ε, MinPts) describe the density of the sample distribution within a neighborhood: ε is the neighborhood distance threshold for a given sample, and MinPts is the threshold on the number of samples within distance ε of a given sample. For example, depending on the specific point cloud distribution, for depth camera point clouds ε is often set to 0.05 m and MinPts to 20; for LiDAR, ε is often set to 0.05 m and MinPts to 3.

[0124] Assuming the sample set is D = (x1, x2, ..., xm), the density concepts used by DBSCAN are defined as follows:

[0125] ε-neighborhood: For xj ∈ D, its ε-neighborhood is the subset of samples in D whose distance from xj is no greater than ε, i.e., Nε(xj) = {xi ∈ D | distance(xi, xj) ≤ ε}; the number of samples in this subset is denoted |Nε(xj)|.

[0126] Core object: For any sample xj ∈ D, if its ε-neighborhood Nε(xj) contains at least MinPts samples, that is, if |Nε(xj)| ≥ MinPts, then xj is a core object.

[0127] Direct density reachability: If xi is located in the ε-neighborhood of xj, and xj is a core object, then xi is said to be directly density-reachable from xj. Note that the converse does not necessarily hold; that is, xj cannot be said to be directly density-reachable from xi unless xi is also a core object.

[0128] Density reachability: For xi and xj, if there exists a sample sequence p1, p2, ..., pT satisfying p1 = xi, pT = xj, and pt+1 is directly density-reachable from pt, then xj is said to be density-reachable from xi. In other words, density reachability is transitive. In this case, the intermediate samples p1, p2, ..., pT-1 in the sequence are all core objects, because only core objects can make other samples directly density-reachable. Note that density reachability is not symmetric, which follows from the asymmetry of direct density reachability.

[0129] Density connectivity: For xi and xj, if there exists a core object sample xk such that both xi and xj are density-reachable from xk, then xi and xj are said to be density-connected. Note that density connectivity is symmetric.

[0130] The steps of the DBSCAN clustering algorithm can be:

[0131] Input: Sample set D = (x1, x2, ..., xm), neighborhood parameters (ε, MinPts), sample distance metric.

[0132] Output: Cluster partition C.

[0133] 1) Initialize the core object set Ω = ∅, initialize the number of clusters k = 0, initialize the set of unvisited samples Γ = D, and initialize the cluster partition C = ∅.

[0134] 2) For j = 1, 2, ..., m, find all core objects using the following steps:

[0135] a) Find the ε-neighborhood subset Nε(xj) of sample xj using the distance metric;

[0136] b) If the number of samples in the subset satisfies |Nε(xj)| ≥ MinPts, add sample xj to the core object set: Ω = Ω ∪ {xj}.

[0137] 3) If the core object set Ω = ∅, the algorithm ends; otherwise, go to step 4.

[0138] 4) In the core object set Ω, randomly select a core object o, initialize the current cluster core object queue Ωcur = {o}, update the cluster index k = k + 1, initialize the current cluster sample set Ck = {o}, and update the unvisited sample set Γ = Γ - {o};

[0139] 5) If the current cluster core object queue Ωcur = ∅, the current cluster Ck is complete: update the cluster partition C = {C1, C2, ..., Ck}, update the core object set Ω = Ω - Ck, and go to step 3. Otherwise, update the core object set Ω = Ω - Ck.

[0140] 6) Take a core object o′ from the current cluster core object queue Ωcur, find its ε-neighborhood subset Nε(o′) using the neighborhood distance threshold ε, let Δ = Nε(o′) ∩ Γ, update the current cluster sample set Ck = Ck ∪ Δ, update the unvisited sample set Γ = Γ - Δ, update Ωcur = Ωcur ∪ (Δ ∩ Ω) - o′, and go to step 5.

[0141] The output is: cluster partition C = {C1, C2, ..., Ck}.
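The clustering steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not an optimized implementation: it uses a brute-force O(m²) neighborhood search, Euclidean distance as the sample distance metric, and picks the smallest remaining core index instead of a random core object so that the result is reproducible. The function name `dbscan` and the sample points are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_pts: int) -> Tuple[List[Set[int]], Set[int]]:
    """Return (clusters, noise) as sets of point indices."""
    n = len(points)
    # Step 2a: eps-neighborhood of every sample (the sample itself is included).
    neighbors = [
        {i for i in range(n) if math.dist(points[i], points[j]) <= eps}
        for j in range(n)
    ]
    # Step 2b: core objects have at least min_pts samples in their neighborhood.
    omega = {j for j in range(n) if len(neighbors[j]) >= min_pts}
    core = set(omega)
    gamma = set(range(n))           # unvisited samples
    clusters: List[Set[int]] = []   # cluster partition C
    while omega:                    # step 3: stop when no core object remains
        o = min(omega)              # step 4 (deterministic instead of random)
        queue = deque([o])          # current cluster core object queue
        cluster = {o}
        gamma.discard(o)
        while queue:                # steps 5 and 6
            o_prime = queue.popleft()
            delta = neighbors[o_prime] & gamma
            cluster |= delta
            gamma -= delta
            queue.extend(sorted(delta & core))
        clusters.append(cluster)
        omega -= cluster
    return clusters, gamma          # samples left unvisited are noise


# Hypothetical 2D points: two dense groups and one isolated point.
pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
       (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
       (10.0, 10.0)]
clusters, noise = dbscan(pts, eps=0.2, min_pts=3)
```

For real point clouds, a spatial index such as a k-d tree would normally replace the brute-force neighborhood search.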

[0142] In the above embodiments, depth point clouds, color images, and radar point clouds obtained by collecting data on the target environment are acquired; density clustering is performed on the depth point clouds and radar point clouds respectively to obtain depth point cloud clusters and radar point cloud clusters; a 2D to 3D multi-sensor tracking scheme is constructed, which improves the accuracy of the tracking system by combining 2D and 3D tracking methods.

[0143] In one embodiment, as shown in Figure 5, determining the target depth point cloud cluster based on the magnitude of the intersection-over-union (IoU) may specifically include:

[0144] S502, compare each IoU with a preset IoU upper limit and a preset IoU lower limit.

[0145] S504, when a first target IoU is greater than the preset IoU upper limit, determine the depth point cloud cluster corresponding to the first target IoU as the target depth point cloud cluster; the first target IoU is at least one of the IoUs.

[0146] S506, when second target IoUs are greater than the preset IoU lower limit and less than the preset IoU upper limit, cluster the depth point cloud clusters corresponding to the second target IoUs to obtain the target depth point cloud cluster; the second target IoUs are at least two of the IoUs.
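The two-threshold decision in S502 to S506 can be sketched in Python as follows. The limits 0.3 and 0.7 are hypothetical, since the text does not give values, and the function name `split_by_iou` is likewise an assumption. Values that fall exactly on a limit or below the lower limit are selected by neither rule in this sketch.

```python
from typing import List, Sequence, Tuple


def split_by_iou(ious: Sequence[float], lower: float,
                 upper: float) -> Tuple[List[int], List[int]]:
    """Return (direct, ambiguous) cluster indices.

    direct:    value above the upper limit, accepted as target clusters (S504).
    ambiguous: value strictly between the limits, passed on to further clustering (S506).
    """
    direct = [i for i, v in enumerate(ious) if v > upper]
    ambiguous = [i for i, v in enumerate(ious) if lower < v < upper]
    return direct, ambiguous


# Hypothetical values, one per depth point cloud projection.
direct, ambiguous = split_by_iou([0.9, 0.5, 0.1, 0.4], lower=0.3, upper=0.7)
```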

[0147] In one embodiment, as shown in Figure 6, which illustrates semi-supervised clustering, clustering the depth point cloud clusters corresponding to the second target intersection-over-union (IoU) values to obtain the target depth point cloud cluster may include: given a dataset D and the number of clusters l, a set of must-link constraints ML and a set of cannot-link constraints CL are required, together with their respective penalty weight sets $\{k_{ij}\}$ and $\{k'_{ij}\}$. Based on PCKMeans, a label constraint set L is added, and the initialization strategy of the original algorithm, which is constructed from pairwise constraints, is changed to one constructed from label constraints. Through these constraints, the points belonging to the target depth point cloud cluster, i.e., the set $P_Y$, can be further extracted; $P_Y$ is the target depth point cloud cluster (semi-supervised clustering).

[0148] Here, label constraints: points falling within the circles centered at $p_Y$ and $p_N$ with radii $r_Y$ and $r_N$, respectively, are labeled Y and N, as shown in green and blue in the figure. Pairwise constraints: points whose projection lies within the target detection box $Bo_i$ form ML constraints with the points of $P_Y$, and otherwise form CL constraints; the pairwise constraints for $P_N$ are obtained in the same way. The Euclidean distance from point $p_Y$ is used as the penalty weight.

[0149]

[0150]

[0151] Where $p_i$ is a point in $Pcl_i$, and $L_{max}$ is the maximum distance from a point in $Pcl_i$ to $p_Y$.

[0152] In the above embodiments, the target depth point cloud cluster is determined based on the magnitude of the intersection-over-union (IoU), with each IoU compared against a preset upper limit and a preset lower limit. This yields the target depth point cloud cluster accurately and supports a 2D-to-3D multi-sensor tracking scheme; by combining 2D and 3D tracking, the accuracy of the tracking system is improved.

[0153] It should be understood that, although the steps in the flowcharts of Figures 2a-2b and 4-5 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless otherwise explicitly stated herein, there is no strict order in which these steps are performed, and they can be executed in other orders. Moreover, at least some of the steps in Figures 2a-2b and 4-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

[0154] In one embodiment, as shown in Figure 7, a target tracking device is provided, which specifically includes: an acquisition module 702, a clustering detection module 704, a matching module 706, and a tracking module 708; wherein:

[0155] The acquisition module 702 is used to acquire depth point clouds, color images, and radar point clouds obtained from the collection of data on the target environment;

[0156] The clustering detection module 704 is used to perform density clustering on the depth point cloud and radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters; and to detect target objects in the color image to obtain detection boxes with semantic labels.

[0157] The matching module 706 is used to match the detection box with the depth point cloud cluster and the radar point cloud cluster respectively, to obtain the target depth point cloud cluster and radar point cloud cluster that intersect with the target detection box;

[0158] The tracking module 708 is used to track the target based on the target detection box and the fused point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster.

[0159] In one embodiment, the clustering detection module 704 is further configured to select depth points and radar points as target depth points and target radar points in the depth point cloud and radar point cloud; determine a first distance between the target depth point and its neighborhood, and a second distance between the target radar point and its neighborhood; add the target depth point to the neighborhood of the depth point when the first distance is less than a distance threshold; and add the target radar point to the neighborhood of the radar point when the second distance is less than a distance threshold; traverse the depth points in the depth point cloud and the radar points in the radar point cloud until all depth points in the depth point cloud and radar points in the radar point cloud are added to their corresponding neighborhoods of the depth point and radar point, and obtain the depth point cloud cluster and the radar point cloud cluster based on the depth point neighborhood and the radar point neighborhood.

[0160] In one embodiment, the clustering detection module 704 is further configured to use an object detection model to detect and define target objects in a color image to obtain a detection box; determine the behavioral state of the target object; generate multi-level semantic labels based on the behavioral state; and output a detection box with multi-level semantic labels.

[0161] In one embodiment, the target detection box includes a first target detection box and a second target detection box; the matching module 706 is further configured to project the depth point cloud cluster onto the plane where the color image is located to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection box; calculate the intersection-over-union (IoU) between the first target detection box and the depth point cloud projection; determine the target depth point cloud cluster based on the magnitude of the IoU; project the radar point cloud cluster onto the plane where the color image is located to obtain a radar point cloud projection and a second target detection box, wherein there is an intersection area between each radar point cloud projection and the second target detection box; and determine the radar point cloud cluster with the largest intersection area as the target radar point cloud cluster.

[0162] In one embodiment, the matching module 706 is further configured to compare each intersection-over-union (IoU) with a preset IoU upper limit and a preset IoU lower limit; when a first target IoU is greater than the preset IoU upper limit, determine the depth point cloud cluster corresponding to the first target IoU as the target depth point cloud cluster, the first target IoU being at least one of the IoUs; and when second target IoUs are greater than the preset IoU lower limit and less than the preset IoU upper limit, cluster the depth point cloud clusters corresponding to the second target IoUs to obtain the target depth point cloud cluster, the second target IoUs being at least two of the IoUs.

[0163] In one embodiment, as shown in Figure 8, the target detection box includes a first target detection box and a second target detection box; the device also includes:

[0164] The selection module 710 is used to track a target object based on the first target detection box and the target depth point cloud cluster when the first target detection box matches the target depth point cloud cluster and the second target detection box matches the target radar point cloud cluster; or, to track a target object based on the second target detection box and the target radar point cloud cluster.

[0165] The fusion module 712 is used to fuse the target depth point cloud cluster and the radar point cloud cluster to obtain a fused point cloud cluster when the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is less than a preset distance threshold.

[0166] In one embodiment, the tracking module 708 is further configured to calculate a first matching cost between the current target detection box and the historically tracked target detection boxes, and a second matching cost between the current fused point cloud cluster and the historically tracked fused point cloud cluster; the current fused point cloud cluster is a point cloud cluster fused from the target depth point cloud cluster and the radar point cloud cluster; the first matching cost and the second matching cost are weighted and summed to obtain a comprehensive matching cost; a comprehensive matching result is determined based on the comprehensive matching cost; the comprehensive matching result is used to represent the matching situation between the current target detection box and the historically tracked target detection boxes, and between the current fused point cloud cluster and the historically tracked fused point cloud cluster; tracking information for tracking the target is determined according to the comprehensive matching result, and the target is tracked according to the tracking information.

[0167] In one embodiment, the tracking module 708 is further configured to update the status of the historically tracked target detection boxes and fused point cloud clusters based on the current target detection box and the current fused point cloud cluster when the comprehensive matching result is determined based on a comprehensive matching cost less than a preset loss threshold, thereby obtaining tracking information.

[0168] In the above embodiments, depth point clouds, color images, and radar point clouds obtained from collecting data on the target environment are acquired; density clustering is performed on the depth point clouds and radar point clouds respectively to obtain depth point cloud clusters and radar point cloud clusters; target objects are detected in the color images to obtain detection boxes with semantic labels; the detection boxes are matched with the depth point cloud clusters and radar point cloud clusters respectively to obtain target depth point cloud clusters and radar point cloud clusters that intersect with the target detection boxes; a 2D to 3D multi-sensor tracking scheme is constructed, which tracks the target object based on the target detection boxes and the fused point cloud clusters between the target depth point cloud clusters and radar point cloud clusters. By combining 2D and 3D tracking methods, the accuracy of the tracking system is improved.

[0169] For specific limitations on the target tracking device, refer to the limitations on the target tracking method described above, which are not repeated here. Each module in the aforementioned target tracking device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded, in hardware form, in the processor of a computer device or be independent of it, or be stored, in software form, in the memory of a computer device, so that the processor can call them and execute the operations corresponding to each module.

[0170] In one embodiment, a computer device is provided, which may be a terminal or a server. In this embodiment, the computer device is described as a terminal, and its internal structure is shown in Figure 9. The computer device includes a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. When executed by the processor, the computer program implements a method for tracking a target object. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad mounted on the computer device casing, or an external keyboard, touchpad, or mouse.

[0171] Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of the part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0172] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0173] In one embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0174] In one embodiment, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the steps in the above method embodiments.

[0175] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), etc.

[0176] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0177] The above embodiments merely illustrate several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

1. A method for tracking a target object, characterized in that the method comprises: acquiring a depth point cloud, a color image, and a radar point cloud obtained by collecting data on a target environment; performing density clustering on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters, and detecting target objects in the color image to obtain detection boxes with semantic labels, wherein the semantic labels are multi-level semantic labels generated based on the behavioral state of the target objects; matching the detection boxes with the depth point cloud clusters and the radar point cloud clusters respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster that intersect a target detection box; and tracking the target object based on the target detection box and a fused point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster, which comprises: calculating a first matching cost between the current target detection box and a historically tracked target detection box, and a second matching cost between the current fused point cloud cluster and a historically tracked fused point cloud cluster, wherein the current fused point cloud cluster is a point cloud cluster fused from the target depth point cloud cluster and the target radar point cloud cluster; computing a weighted sum of the first matching cost and the second matching cost to obtain a comprehensive matching cost; determining a comprehensive matching result based on the comprehensive matching cost, wherein the comprehensive matching result represents the matching situation between the current target detection box and the historically tracked target detection box, and between the current fused point cloud cluster and the historically tracked fused point cloud cluster; and determining tracking information for tracking the target object based on the comprehensive matching result, and tracking the target object based on the tracking information; wherein the fused point cloud cluster is obtained by fusing the target depth point cloud cluster and the target radar point cloud cluster when the Euclidean distance between them is less than a preset distance threshold.
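
By way of non-limiting illustration, the fusion condition and the weighted matching cost recited in claim 1 might be sketched as follows. The claim does not specify how the individual costs or the cluster-to-cluster distance are computed, so this sketch assumes a centroid-to-centroid Euclidean distance, a first cost of 1 minus the box IoU, and a second cost equal to the centroid distance; the function names and the weights `w_box` and `w_cluster` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def fuse_clusters(depth_cluster, radar_cluster, dist_threshold):
    """Fuse two clusters when they are closer than dist_threshold.

    Assumption: the distance between clusters is measured between centroids.
    Returns the concatenated points, or None when the clusters are too far apart.
    """
    d = math.dist(centroid(depth_cluster), centroid(radar_cluster))
    if d < dist_threshold:
        return depth_cluster + radar_cluster
    return None


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def comprehensive_cost(cur_box, hist_box, cur_cluster, hist_cluster,
                       w_box=0.5, w_cluster=0.5):
    """Weighted sum of a 2D (box) cost and a 3D (fused cluster) cost."""
    first_cost = 1.0 - box_iou(cur_box, hist_box)  # assumed 2D cost
    second_cost = math.dist(centroid(cur_cluster), centroid(hist_cluster))  # assumed 3D cost
    return w_box * first_cost + w_cluster * second_cost
```

In a full tracker, a cost of this kind would be evaluated for every pair of current and historical targets and passed to an assignment step; that step is outside this sketch.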

2. The method according to claim 1, characterized in that performing density clustering on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters comprises: selecting a depth point in the depth point cloud as a target depth point and a radar point in the radar point cloud as a target radar point; determining a first distance between the target depth point and a depth point neighborhood, and a second distance between the target radar point and a radar point neighborhood; when the first distance is less than a distance threshold, adding the target depth point to the depth point neighborhood, and when the second distance is less than the distance threshold, adding the target radar point to the radar point neighborhood; and traversing the depth points in the depth point cloud and the radar points in the radar point cloud until all depth points and all radar points have been added to corresponding depth point neighborhoods and radar point neighborhoods, and obtaining the depth point cloud clusters and the radar point cloud clusters based on the depth point neighborhoods and the radar point neighborhoods.
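
As a hedged sketch only, the neighborhood-growing procedure of claim 2 could be approximated by the region-growing routine below, applied separately to the depth point cloud and to the radar point cloud. It assumes that a point joins a neighborhood when it lies within the distance threshold of any point already in that neighborhood; the function name `density_cluster` is hypothetical, and a library implementation such as DBSCAN could be substituted.

```python
import math
from collections import deque


def density_cluster(points, dist_threshold):
    """Group point indices into clusters by region growing.

    A point joins a cluster when it is closer than dist_threshold to any
    point already in that cluster. Every point ends up in exactly one cluster.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # Collect unvisited points that fall inside the neighborhood of point i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) < dist_threshold]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return sorted(clusters)
```

The routine returns lists of point indices rather than coordinates, so the same indices can be used to look up any per-point attributes.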

3. The method according to claim 1, characterized in that detecting target objects in the color image to obtain detection boxes with semantic labels comprises: detecting the target objects in the color image with a target detection model and enclosing them in boxes to obtain the detection boxes; determining the behavioral state of each target object, and generating multi-level semantic labels based on the behavioral state; and outputting the detection boxes with the multi-level semantic labels.

4. The method according to claim 1, characterized in that the target detection box includes a first target detection box and a second target detection box, and matching the detection boxes with the depth point cloud clusters and the radar point cloud clusters respectively to obtain the target depth point cloud cluster and the target radar point cloud cluster that intersect the target detection box comprises: projecting the depth point cloud clusters onto the plane containing the color image to obtain depth point cloud projections; calculating, based on the depth point cloud projections and the first target detection box, the intersection-over-union ratio (IoU) between the first target detection box and each depth point cloud projection; and determining the target depth point cloud cluster based on the magnitude of the IoU; and projecting the radar point cloud clusters onto the plane containing the color image to obtain radar point cloud projections; determining the intersection area between each radar point cloud projection and the second target detection box; and determining the radar point cloud cluster with the largest intersection area as the target radar point cloud cluster.
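
The projection and overlap computations of claim 4 might, for illustration, look like the sketch below. It assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, points already expressed in the camera frame with positive depth, and a cluster projection represented by its bounding rectangle; none of these details is fixed by the claim.

```python
def project_cluster(points, fx, fy, cx, cy):
    """Project camera-frame (x, y, z) points with z > 0 onto the image plane
    and return the bounding rectangle (x1, y1, x2, y2) of the projection."""
    us = [fx * x / z + cx for x, y, z in points]
    vs = [fy * y / z + cy for x, y, z in points]
    return (min(us), min(vs), max(us), max(vs))


def intersection_area(a, b):
    """Overlap area of two axis-aligned rectangles (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a, b):
    """Intersection-over-union of two axis-aligned rectangles."""
    inter = intersection_area(a, b)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def pick_radar_cluster(box, radar_rects):
    """Index of the radar projection with the largest intersection area,
    or None when no projection overlaps the detection box."""
    areas = [intersection_area(box, r) for r in radar_rects]
    best = max(range(len(areas)), key=areas.__getitem__, default=None)
    if best is None or areas[best] <= 0:
        return None
    return best
```

IoU is used for the depth projections and raw intersection area for the radar projections, mirroring the two branches of the claim.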

5. The method according to claim 4, characterized in that determining the target depth point cloud cluster based on the magnitude of the intersection-over-union ratio (IoU) comprises: comparing each IoU with a preset IoU upper limit and a preset IoU lower limit; when a first target IoU is greater than the preset IoU upper limit, determining the depth point cloud cluster corresponding to the first target IoU as the target depth point cloud cluster, wherein the first target IoU is at least one of the IoUs; and when second target IoUs are greater than the preset IoU lower limit and less than the preset IoU upper limit, clustering the depth point cloud clusters corresponding to the second target IoUs to obtain the target depth point cloud cluster, wherein the second target IoUs are at least two of the IoUs.
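
A minimal sketch of the two-threshold selection in claim 5 is given below. It assumes that "clustering" the intermediate-IoU clusters means merging their points into a single cluster, which the claim does not state explicitly; the function name and argument names are hypothetical.

```python
def select_target_depth_clusters(clusters, ious, lower, upper):
    """Select target depth clusters from per-cluster IoU values.

    Clusters with IoU above `upper` are accepted as they are. Clusters with
    IoU strictly between `lower` and `upper` are merged into one cluster,
    but only when at least two such clusters exist (assumed reading).
    """
    targets = [c for c, v in zip(clusters, ious) if v > upper]
    partial = [c for c, v in zip(clusters, ious) if lower < v < upper]
    if len(partial) >= 2:
        merged = [p for c in partial for p in c]
        targets.append(merged)
    return targets
```

The merge branch covers the case where one object has been split into several depth clusters, each of which overlaps the detection box only partially.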

6. The method according to claim 1, characterized in that the target detection box includes a first target detection box and a second target detection box, and the method further comprises: when the first target detection box matches the target depth point cloud cluster and the second target detection box matches the target radar point cloud cluster, tracking the target object based on the first target detection box and the target depth point cloud cluster, or tracking the target object based on the second target detection box and the target radar point cloud cluster.

7. The method according to claim 1, characterized in that determining the tracking information for tracking the target object based on the comprehensive matching result comprises: when the comprehensive matching result is determined from a comprehensive matching cost that is less than a preset loss threshold, updating the states of the historically tracked target detection box and the historically tracked fused point cloud cluster according to the current target detection box and the current fused point cloud cluster to obtain the tracking information.
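
For illustration only, the threshold-gated state update of claim 7 might be sketched as follows. The track is represented as a plain dictionary, and the `missed` counter for unmatched frames is an assumption added for the example rather than something the claim recites.

```python
def update_track(track, cur_box, cur_cluster, cost, loss_threshold):
    """Update a historical track when the comprehensive cost is low enough.

    Returns True when the track was updated with the current detection box
    and fused cluster, False otherwise.
    """
    if cost < loss_threshold:
        track["box"] = cur_box
        track["cluster"] = cur_cluster
        track["missed"] = 0  # hypothetical bookkeeping, not part of the claim
        return True
    # No match: keep the historical state and count the miss (assumption).
    track["missed"] = track.get("missed", 0) + 1
    return False
```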

8. A target tracking device, characterized in that the device comprises: an acquisition module, configured to acquire a depth point cloud, a color image, and a radar point cloud obtained by collecting data on a target environment; a clustering detection module, configured to perform density clustering on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters, and to detect target objects in the color image to obtain detection boxes with semantic labels, wherein the semantic labels are multi-level semantic labels generated based on the behavioral state of the target objects; a matching module, configured to match the detection boxes with the depth point cloud clusters and the radar point cloud clusters respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster that intersect a target detection box; and a tracking module, configured to track the target object based on the target detection box and a fused point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster; wherein the tracking module is further configured to: calculate a first matching cost between the current target detection box and a historically tracked target detection box, and a second matching cost between the current fused point cloud cluster and a historically tracked fused point cloud cluster, wherein the current fused point cloud cluster is a point cloud cluster fused from the target depth point cloud cluster and the target radar point cloud cluster; compute a weighted sum of the first matching cost and the second matching cost to obtain a comprehensive matching cost; determine a comprehensive matching result based on the comprehensive matching cost, wherein the comprehensive matching result represents the matching situation between the current target detection box and the historically tracked target detection box, and between the current fused point cloud cluster and the historically tracked fused point cloud cluster; and determine tracking information for tracking the target object based on the comprehensive matching result, and track the target object based on the tracking information; wherein the fused point cloud cluster is obtained by fusing the target depth point cloud cluster and the target radar point cloud cluster when the Euclidean distance between them is less than a preset distance threshold.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that the processor, when invoking and executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when invoked and executed by a processor, implements the steps of the method according to any one of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.