Target tracking method and device, electronic equipment and storage medium
By combining a 3D target detection model with a Kalman filter, the problem of low accuracy in target tracking under complex environments in existing technologies is solved, achieving efficient target tracking in complex traffic environments and improving accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- WESTERN CHINA SCI CITY INNOVATION CENT OF INTELLIGENT & CONNECTED VEHICLES (CHONGQING) CO LTD
- Filing Date
- 2024-11-28
- Publication Date
- 2026-06-23
Figure CN119919460B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of autonomous driving technology, and in particular to a target tracking method, device, electronic device, and storage medium.
Background Technology
[0002] Autonomous driving technology is a hot research topic in the automotive industry and technological development today. This technology plays a significant role in improving traffic safety, alleviating urban congestion, and promoting economic development. However, autonomous vehicles need to continuously perceive their own and surrounding vehicles' motion states under various environments and road conditions to ensure safe driving in complex and ever-changing road environments. Among related technologies, generative models (such as optical flow, particle filtering, and Meanshift algorithms) can be used to track targets. However, describing the target under tracking using a single mathematical model has significant limitations, resulting in relatively low accuracy.
Summary of the Invention
[0003] To address the aforementioned technical problems, this application provides a target tracking method, apparatus, electronic device, and storage medium.
[0004] According to a first aspect of this application, a target tracking method is provided, comprising:
[0005] The motion state data of the first target at the previous moment is acquired. The motion state data of the first target includes the motion state data of the second target and the motion state data of the lost target. The motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model. The lost target is empty at the initial moment.
[0006] Images of the vehicle's surroundings are acquired at the current moment and input into the 3D target detection model to obtain the motion state data of the third target at the current moment;
[0007] By using a Kalman filter, the motion state data of the first target at the previous moment is predicted to obtain the motion state data of the first target at the current moment.
[0008] Based on the motion state data of the first and third targets at the current moment, the association cost between each first target and each third target is determined, and an association cost matrix between the first and third targets is constructed based on the association cost; wherein the association cost is related to the position, overlap, and direction of the first and third targets;
[0009] Using the Hungarian algorithm, the globally optimal match between the first and third targets is determined from the association cost matrix;
[0010] Add the motion state data of each third target at the current moment to the corresponding trajectory data;
[0011] Each first target that fails to match is identified as a lost target, and lost targets that have failed to match N consecutive times are deleted; N is an integer greater than 1.
[0012] The current moment is taken as the previous moment, the next moment is taken as the current moment, and the state of the Kalman filter is updated. The process returns to the step of acquiring the motion state data of the first target at the previous moment, until the acquisition of images around the vehicle stops.
[0013] Optionally, the method further includes:
[0014] Before predicting the motion state data of the first target at the previous moment using the Kalman filter, the bird's-eye view motion state data of the first target at the previous moment is constructed based on the motion state data of the first target at the previous moment.
[0015] The step of predicting the motion state data of the first target at the previous moment using a Kalman filter to obtain the motion state data of the first target at the current moment specifically includes:
[0016] By using a Kalman filter, the bird's-eye view motion state data of the first target at the previous moment is predicted to obtain the filtered bird's-eye view motion state data of the first target at the current moment.
[0017] The motion state data of the first target at the current moment is obtained by combining the filtered bird's-eye view motion state data of the first target at the current moment with the Z-direction motion state data taken from the motion state data of the first target at the previous moment.
[0018] Optionally, the motion state data of the third target includes: the position coordinates, size, and orientation of the third target; the motion state data of the first target includes: the position coordinates, size, and orientation of the first target.
[0019] The step of determining the association cost between each third target and each first target based on the motion state data of the third target and the first target at the current moment includes:
[0020] For each third target and each first target, calculate the positional cost between the third target and the first target based on the position coordinates of the third target and the position coordinates of the first target;
[0021] Calculate the bounding box of the third target based on its position coordinates and size; calculate the bounding box of the first target based on its position coordinates and size.
[0022] Calculate the cost of overlap between the bounding box of the third target and the bounding box of the first target;
[0023] Calculate the directional cost between the third target and the first target based on the directions of the third and first targets;
[0024] The weighted average of the positional cost, overlap cost, and directional cost between the third target and the first target is taken as the association cost between the third target and the first target.
[0025] Optionally, the motion state data of the first target further includes: the perception confidence of the first target;
[0026] The construction of the association cost matrix between the third target and the first target includes:
[0027] For each first target, the product of the association cost between the third target and the first target and the perception confidence of the first target at the previous moment is used as an element of the association cost matrix.
[0028] Optionally, calculating the positional cost between the third target and the first target based on the position coordinates of the third target and the position coordinates of the first target includes:
[0029] Let $p_i$ denote the position coordinates of the third target i in the bird's-eye view at the current moment, and let $p_j$ denote the position coordinates of the first target j in the bird's-eye view at the current moment;
[0030] According to the formula:
[0031] Calculate the positional cost $P_{(i,j)}$ between the first target j and the third target i, where w is a preset value greater than 0 and less than or equal to 2.
[0032] Optionally, calculating the overlap cost between the bounding box of the third target and the bounding box of the first target includes:
[0033] Let $A_i$ denote the bounding box of the third target i in three-dimensional space at the current moment, and let $B_j$ denote the bounding box of the first target j in three-dimensional space at the current moment;
[0034] According to the formula:
[0035] Calculate the overlap cost $LIoU_{(i,j)}$ between the first target j and the third target i.
[0036] Optionally, calculating the directional cost between the third target and the first target based on the direction of the third target and the direction of the first target includes:
[0037] Let $F_i$ denote the direction vector of the third target i at the current moment, and let $F_j$ denote the direction vector of the first target j at the current moment;
[0038] According to the formula: $S_{(i,j)} = 1 - \cos(F_i, F_j)$;
[0039] Calculate the directional cost $S_{(i,j)}$ between the first target j and the third target i.
[0040] According to a second aspect of this application, a target tracking device is provided, comprising:
[0041] The first target motion state data acquisition module is used to acquire the motion state data of the first target at the previous moment. The motion state data of the first target includes: the motion state data of the second target and the motion state data of the lost target; the motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model; the lost target is empty at the initial moment.
[0042] The third target motion state data determination module is used to collect images of the vehicle's surroundings at the current moment and input the images into the 3D target detection model to obtain the motion state data of the third target at the current moment.
[0043] The Kalman filter module is used to predict the motion state data of the first target at the previous moment through the Kalman filter, and obtain the motion state data of the first target at the current moment.
[0044] The correlation cost determination module is used to determine the correlation cost between each first target and each third target based on the motion state data of the first target and the third target at the current moment; wherein, the correlation cost is related to the position, overlap and direction of the first target and the third target;
[0045] The association cost matrix construction module is used to construct the association cost matrix between the first target and the third target based on the association cost.
[0046] The global optimal matching judgment module is used to determine the globally optimal match between the first target and the third target from the association cost matrix using the Hungarian algorithm.
[0047] The trajectory addition module is used to add the motion state data of each third target at the current moment to the corresponding trajectory data;
[0048] The lost target update module is used to identify each first target that failed to match as a lost target, and to delete lost targets that have failed to match N consecutive times; N is an integer greater than 1.
[0049] The update module is used to take the current moment as the previous moment, take the next moment as the current moment, update the state of the Kalman filter, and return to the first target motion state data acquisition module until the acquisition of images around the vehicle stops.
[0050] Optionally, the target tracking device further includes:
[0051] The bird's-eye view processing module is used to construct the bird's-eye view motion state data of the first target in the previous moment based on the motion state data of the first target in the previous moment before predicting the motion state data of the first target in the previous moment through the Kalman filter.
[0052] The Kalman filter module is specifically used to predict the bird's-eye view motion state data of the first target at the previous moment through the Kalman filter, to obtain the filtered bird's-eye view motion state data of the first target at the current moment; and to use the filtered bird's-eye view motion state data of the first target at the current moment, together with the Z-direction motion state data taken from the motion state data of the first target at the previous moment, as the motion state data of the first target at the current moment.
[0053] Optionally, the motion state data of the third target includes: the position coordinates, size, and orientation of the third target; the motion state data of the first target includes: the position coordinates, size, and orientation of the first target.
[0054] The association cost determination module is specifically used for each third target and each first target to calculate the position cost between the third target and the first target based on the position coordinates of the third target and the first target; to calculate the bounding box of the third target based on the position coordinates and size of the third target, and to calculate the bounding box of the first target based on the position coordinates and size of the first target; to calculate the overlap cost between the bounding boxes of the third target and the first target; to calculate the directional cost between the third target and the first target based on the direction of the third target and the direction of the first target; and to take the weighted average of the position cost, overlap cost, and directional cost between the third target and the first target as the association cost between the third target and the first target.
[0055] Optionally, the motion state data of the first target further includes: the perception confidence of the first target;
[0056] The construction of the association cost matrix between the third target and the first target includes:
[0057] For each first target, the product of the association cost between the third target and the first target and the perception confidence of the first target at the previous moment is used as an element of the association cost matrix.
[0058] Optionally, let $p_i$ denote the position coordinates of the third target i in the bird's-eye view at the current moment, and let $p_j$ denote the position coordinates of the first target j in the bird's-eye view at the current moment;
[0059] The association cost determination module is specifically used to calculate the positional cost between the third target and the first target through the following steps:
[0060] According to the formula:
[0061]
[0062] Calculate the positional cost $P_{(i,j)}$ between the first target j and the third target i, where w is a preset value greater than 0 and less than or equal to 2.
[0063] Optionally, the association cost determination module is specifically used to calculate the overlap cost between the bounding box of the third target and the bounding box of the first target through the following steps:
[0064] Let $A_i$ denote the bounding box of the third target i in three-dimensional space at the current moment, and let $B_j$ denote the bounding box of the first target j in three-dimensional space at the current moment;
[0065] According to the formula:
[0066] Calculate the overlap cost $LIoU_{(i,j)}$ between the first target j and the third target i.
[0067] Optionally, the correlation cost determination module is specifically used to calculate the directional cost between the third target and the first target through the following steps:
[0068] Let $F_i$ denote the direction vector of the third target i at the current moment, and let $F_j$ denote the direction vector of the first target j at the current moment;
[0069] According to the formula: $S_{(i,j)} = 1 - \cos(F_i, F_j)$;
[0070] Calculate the directional cost $S_{(i,j)}$ between the first target j and the third target i.
[0071] According to a third aspect of this application, an electronic device is provided, comprising: a processor configured to execute a computer program stored in a memory, wherein the computer program, when executed by the processor, implements the method described in the first aspect.
[0072] According to a fourth aspect of this application, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the method described in the first aspect.
[0073] According to a fifth aspect of this application, a computer program product is provided that, when the computer program product is run on a computer, causes the computer to perform the method described in the first aspect.
[0074] The technical solution provided in this application has the following advantages compared with the prior art:
[0075] A 3D target detection model is used to detect targets in the image from the previous time step, obtaining the motion state data of the second target at that time step. This second target is then merged with previously unmatched targets (i.e., lost targets) to form the first target. Target detection is then performed on the image from the current time step, obtaining the motion state data of the third target at that time step. A Kalman filter is used to predict the motion state data of the first target from the previous time step, effectively handling noise present during perception and maintaining good anti-interference capability even in complex traffic environments. For each first and third target, the matching relationship between them is determined from multiple dimensions, including position, overlap, and directional correlation, thus more accurately determining the globally optimal match and improving the accuracy and robustness of target tracking.
Description of the Drawings
[0076] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0077] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0078] Figure 1 is a flowchart of a target tracking method in an embodiment of this application;
[0079] Figure 2 is a schematic diagram of a target tracking device in an embodiment of this application;
[0080] Figure 3 is a schematic structural diagram of an electronic device in an embodiment of this application.
Detailed Implementation
[0081] To better understand the above-mentioned objectives, features, and advantages of this application, the solution of this application will be further described below. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be combined with each other.
[0082] Many specific details are set forth in the following description in order to provide a full understanding of this application, but this application may also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only some embodiments of this application, and not all embodiments.
[0083] Referring to Figure 1, which is a flowchart of a target tracking method in an embodiment of this application, the method may include the following steps:
[0084] Step S102: Obtain the motion state data of the first target at the previous moment. The motion state data of the first target includes the motion state data of the second target and the motion state data of the lost target. The motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model.
[0085] As the vehicle moves, it can periodically acquire images of its surroundings. A 3D target detection model can process these images and identify the motion state data of a second target within them. The second target identified from the image can be one or more targets. The motion state data of the second target can include the position coordinates, size, and orientation of the second target.
[0086] A lost target refers to a target that is lost during target tracking; initially, the lost target is empty. As time progresses, the lost target is continuously updated. The second target and the lost target are merged into the first target to continue tracking the lost target.
[0087] Since target tracking is performed in real time, the 3D target detection model can process the acquired images in real time. That is, the images acquired in the previous moment have already been processed by the 3D target detection model. Therefore, in this loop, we can directly obtain the motion state data of the first target in the previous moment.
[0088] Step S104: Collect images of the area around the vehicle at the current moment and input the images into the 3D target detection model to obtain the motion state data of the third target at the current moment.
[0089] During this loop, the 3D target detection model processes the image of the vehicle's surroundings at the current moment to obtain the motion state data of the third target. Similarly, the third target is a target identified from the image, and there can be one or more such targets. The motion state data of the third target may include the position coordinates, size, and orientation of the third target.
[0090] Step S106: The motion state data of the first target at the previous moment is predicted by using a Kalman filter to obtain the motion state data of the first target at the current moment.
[0091] The Kalman filter (KF) algorithm is a state-space model-based filtering algorithm primarily used for estimating and predicting system states. The KF algorithm estimates and predicts system states using both a system model and a measurement model, continuously updating the model to improve estimation accuracy. The core of the KF algorithm is the state-space model, which consists of two equations: the state transition equation and the observation equation. The state transition equation describes the dynamic changes in the system state, while the observation equation describes the relationship between the observed quantities and the system state.
[0092] Optionally, before predicting the motion state data of the first target at the previous moment using a Kalman filter, the bird's-eye view motion state data of the first target at the previous moment is constructed based on the motion state data of the first target at the previous moment. The bird's-eye view motion state data of the first target at the previous moment can be represented as $(p_x, v_x, p_y, v_y)$, where $(p_x, p_y)$ represents the position coordinates of the first target in the bird's-eye view, and $(v_x, v_y)$ represents the velocity of the first target in the X and Y directions in the bird's-eye view.
[0093] Then, the bird's-eye view motion state data of the first target at the previous moment is predicted by the Kalman filter to obtain the filtered bird's-eye view motion state data of the first target at the current moment; the filtered bird's-eye view motion state data at the current moment, together with the Z-direction motion state data taken from the motion state data of the first target at the previous moment, is used as the motion state data of the first target at the current moment.
[0094] The motion state of the vehicle at the current moment is predicted by the state transition equation, and the covariance matrix is updated.
[0095] Based on the following prediction model formula, the prior estimate $\hat{x}_{k|k-1}$ is obtained:
[0096] Where $F_k$ represents the state transition matrix; $\hat{x}_{k-1|k-1}$ represents the posterior estimate of the motion state data of the first target at time k-1 (i.e., the previous moment); $\hat{x}_{k|k-1}$ represents the prior estimate of the motion state data of the first target at time k (i.e., the current moment) obtained through the state transition equation; $P_{k-1|k-1}$ represents the state covariance matrix at time k-1; $P_{k|k-1}$ represents the predicted state covariance matrix at time k; $W_k$ represents the noise matrix of the state variables; and $Q_k$ represents the noise covariance matrix.
[0097] According to the formula, the predicted motion state data of the first target at time k can be obtained,
[0098] where H represents the observation matrix.
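For reference, the prediction step of a standard linear Kalman filter can be written with the symbols defined above. This is the textbook form, given here as an assumption: the original formulas are not reproduced in this text and may differ, for example in how the state noise $W_k$ enters. The symbol $\hat{z}_{k|k-1}$ for the predicted observation is a label introduced only for this sketch.

```latex
\begin{aligned}
\hat{x}_{k|k-1} &= F_k\,\hat{x}_{k-1|k-1} \\
P_{k|k-1}       &= F_k\,P_{k-1|k-1}\,F_k^{\top} + Q_k \\
\hat{z}_{k|k-1} &= H\,\hat{x}_{k|k-1}
\end{aligned}
```

In the textbook form the state noise is zero-mean with covariance $Q_k$, so it contributes to $P_{k|k-1}$ but not to the predicted state itself.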
[0099] In this way, simplifying the motion state data in three-dimensional space to a BEV (Bird's Eye View) perspective can reduce the amount of computation and enable it to run efficiently on devices with limited resources.
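The bird's-eye-view simplification described above can be sketched as follows. This is a minimal illustration, assuming a constant-velocity motion model and a fixed time step `dt`, neither of which is specified in the text; the function names and dictionary keys are hypothetical, and covariance propagation is omitted for brevity.

```python
from typing import Dict, Tuple

# BEV state ordered as in the text: (p_x, v_x, p_y, v_y).
BevState = Tuple[float, float, float, float]


def to_bev_state(state: Dict[str, float]) -> BevState:
    """Drop the Z component and keep only the bird's-eye-view state."""
    return (state["x"], state["vx"], state["y"], state["vy"])


def predict_bev(bev: BevState, dt: float) -> BevState:
    """Predict one step ahead. ASSUMPTION: constant-velocity motion model."""
    p_x, v_x, p_y, v_y = bev
    return (p_x + v_x * dt, v_x, p_y + v_y * dt, v_y)


def merge_with_z(bev: BevState, previous: Dict[str, float]) -> Dict[str, float]:
    """Combine the predicted BEV state with the Z data of the previous moment."""
    p_x, v_x, p_y, v_y = bev
    merged = dict(previous)  # keeps "z" (and any other fields) unchanged
    merged.update({"x": p_x, "vx": v_x, "y": p_y, "vy": v_y})
    return merged


# Hypothetical first-target state at the previous moment.
previous_state = {"x": 10.0, "y": 2.0, "z": 0.5, "vx": 5.0, "vy": -1.0}
predicted = merge_with_z(
    predict_bev(to_bev_state(previous_state), dt=0.1), previous_state
)
```

Only four state variables pass through the filter, while the Z value is carried over unchanged, which is where the reduction in computation comes from.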
[0100] Step S108: Based on the motion state data of the first target and the third target at the current moment, determine the association cost between each first target and each third target, wherein the association cost is related to the position, overlap, and direction of the first target and the third target.
[0101] The motion state data of the first target includes: the position coordinates, size, and orientation of the first target; the motion state data of the third target includes: the position coordinates, size, and orientation of the third target.
[0102] For each third target and each first target, the positional cost between them can be calculated based on their coordinates. The closer the first target is to the third target, the smaller the corresponding positional cost.
[0103] Based on the position coordinates and size of the third target, its bounding box is calculated; based on the position coordinates and size of the first target, its bounding box is calculated. The overlap cost between the two bounding boxes is then calculated: the greater the overlap between the bounding boxes of the first and third targets, the smaller the overlap cost. Based on the directions of the third and first targets, the directional cost between them is calculated: the more consistent the directions of the third and first targets, the smaller the directional cost.
[0104] Optionally, let $A_i$ denote the bounding box of the third target i in three-dimensional space at the current moment, and let $B_j$ denote the bounding box of the first target j in three-dimensional space at the current moment;
[0105] According to the formula:
[0106] Calculate the overlap cost $LIoU_{(i,j)}$ between the first target j and the third target i.
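As an illustration of an overlap cost that decreases as the bounding boxes overlap more, the sketch below assumes the cost is 1 minus the intersection-over-union (IoU) of two axis-aligned 3D boxes. Both the 1 - IoU form and the axis-aligned simplification (orientation is ignored) are assumptions, since the formula itself is not reproduced in this text; the function names are hypothetical.

```python
from typing import Tuple

# Axis-aligned box: (x_min, y_min, z_min, x_max, y_max, z_max).
Box = Tuple[float, float, float, float, float, float]


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return (
        max(0.0, box[3] - box[0])
        * max(0.0, box[4] - box[1])
        * max(0.0, box[5] - box[2])
    )


def overlap_cost(a: Box, b: Box) -> float:
    """ASSUMPTION: cost = 1 - IoU, so identical boxes cost 0 and disjoint boxes 1."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        intersection *= max(0.0, high - low)
    union = box_volume(a) + box_volume(b) - intersection
    if union <= 0.0:
        return 1.0  # both boxes are degenerate: treat as no overlap
    return 1.0 - intersection / union


box_a = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
box_b = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
cost_ab = overlap_cost(box_a, box_b)  # the boxes share half of their volume
```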
[0107] Let $F_i$ denote the direction vector of the third target i at the current moment, and let $F_j$ denote the direction vector of the first target j at the current moment;
[0108] According to the formula: $S_{(i,j)} = 1 - \cos(F_i, F_j)$;
[0109] Calculate the directional cost $S_{(i,j)}$ between the first target j and the third target i.
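The directional cost above can be computed from the cosine of the angle between the two direction vectors. The following is a minimal sketch; the function name `direction_cost` and the handling of zero-length vectors are assumptions made for illustration.

```python
import math
from typing import Sequence


def direction_cost(f_i: Sequence[float], f_j: Sequence[float]) -> float:
    """Return 1 - cos(angle between f_i and f_j)."""
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm_i = math.sqrt(sum(a * a for a in f_i))
    norm_j = math.sqrt(sum(b * b for b in f_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # ASSUMPTION: a zero-length direction vector is treated as an error.
        raise ValueError("direction vectors must be non-zero")
    return 1.0 - dot / (norm_i * norm_j)


same_heading = direction_cost((1.0, 0.0), (2.0, 0.0))
perpendicular = direction_cost((1.0, 0.0), (0.0, 1.0))
```

Targets heading the same way give a cost near 0, and the cost grows as the headings diverge.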
[0110] Optionally, let $p_i$ denote the position coordinates of the third target i in the bird's-eye view at the current moment, and let $p_j$ denote the position coordinates of the first target j in the bird's-eye view at the current moment;
[0111] According to the formula:
[0112] Calculate the positional cost $P_{(i,j)}$ between the first target j and the third target i.
[0113] Since both the overlap cost and the directional cost are less than or equal to 1, the positional cost can be limited to no greater than w, where w is a preset value greater than 0 and less than or equal to 2; for example, w can be 2. This prevents the positional cost from being much larger than the overlap cost and the directional cost, which would reduce the accuracy of the association cost.
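One way to keep the positional cost within w is to clip a distance measure at w. The sketch below assumes the Euclidean distance between the two bird's-eye-view positions, clipped to w; this form is an assumption chosen for illustration, because the formula itself is not reproduced in this text, and `position_cost` is a hypothetical name.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in the bird's-eye view


def position_cost(p_i: Point, p_j: Point, w: float = 2.0) -> float:
    """ASSUMPTION: Euclidean BEV distance, clipped so the cost never exceeds w."""
    if not 0.0 < w <= 2.0:
        raise ValueError("w must be greater than 0 and at most 2")
    distance = math.hypot(p_i[0] - p_j[0], p_i[1] - p_j[1])
    return min(distance, w)


near = position_cost((0.0, 0.0), (0.6, 0.8))  # distance 1.0, below the cap
far = position_cost((0.0, 0.0), (3.0, 4.0))   # distance 5.0, clipped to w
```

With the cap in place, a distant pair cannot dominate the combined cost just because its raw distance is large.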
[0114] Finally, the weighted average of the positional cost, overlap cost, and directional cost between the third objective and the first objective is taken as the correlation cost between the third objective and the first objective, which can be expressed as the following formula:
[0115] $c_{ij} = w_p P_{(i,j)} + w_{LIoU}\,LIoU_{(i,j)} + w_s S_{(i,j)}$;
[0116] Where $c_{ij}$ represents the association cost between the first target j and the third target i, and $w_p$, $w_{LIoU}$, and $w_s$ represent the corresponding weight coefficients, which reflect the confidence level of each feature. LIoU and orientation are more reliable when matching, so their weights can be relatively high; for example, $w_{LIoU}$ and $w_s$ are set to 0.5 and 0.3, respectively. Position perception is strongly affected by environmental factors, so its weight can be relatively small; for example, $w_p$ is set to 0.2.
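The weighted combination can be sketched in a few lines, using the example weights 0.2, 0.5, and 0.3 given above as defaults. The function and parameter names are hypothetical.

```python
def association_cost(
    position_cost: float,
    overlap_cost: float,
    direction_cost: float,
    w_p: float = 0.2,
    w_liou: float = 0.5,
    w_s: float = 0.3,
) -> float:
    """Weighted sum of the three costs for one (third target, first target) pair."""
    return w_p * position_cost + w_liou * overlap_cost + w_s * direction_cost


# A pair that is 1.0 apart, half overlapping, and heading the same way.
c_example = association_cost(1.0, 0.5, 0.0)
```

Because the example weights sum to 1, the weighted sum coincides with a weighted average of the three costs.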
[0117] Step S110: Construct the association cost matrix between the first target and the third target based on the association cost.
[0118] By calculating the association cost between each first target and each third target, the association cost matrix C is obtained, which can be represented as follows:
[0119]
[0120] Considering that 3D object detection models face varying difficulties in perceiving targets at different distances, a perception confidence level can be introduced into the cost matrix to improve the accuracy of target perception. For example, the farther away the perceived target is, the lower the perception confidence level; conversely, the closer the perceived target is, the higher the perception confidence level.
[0121] Optionally, the motion state data of the first target also includes the perception confidence of the first target. For each first target, the product of the association cost between the third target and the first target and the perception confidence of the first target at the previous moment is used as an element of the association cost matrix.
[0122] After introducing the perception confidence, the association cost matrix is represented as follows:
[0123]
[0124] Where $f_j$ represents the perception confidence corresponding to the first target j.
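Scaling each element by the perception confidence of its first target can be sketched as follows. The layout is an assumption made for illustration: rows index third targets i, columns index first targets j, and `weight_by_confidence` is a hypothetical name.

```python
from typing import List


def weight_by_confidence(
    costs: List[List[float]], confidences: List[float]
) -> List[List[float]]:
    """Multiply every element c_ij by the confidence f_j of first target j."""
    for row in costs:
        if len(row) != len(confidences):
            raise ValueError("each row needs one cost per first target")
    return [
        [f_j * c_ij for c_ij, f_j in zip(row, confidences)]
        for row in costs
    ]


# Two third targets (rows) against two first targets (columns).
raw_costs = [[0.4, 0.8], [0.6, 0.2]]
confidence = [0.5, 1.0]
weighted = weight_by_confidence(raw_costs, confidence)
```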
[0125] Step S112: Using the Hungarian algorithm, determine the globally optimal match between the first target and the third target from the association cost matrix.
[0126] The Hungarian algorithm is used to find the globally optimal match on the association cost matrix, determining the best matching relationship between the first targets from the previous time step and the third targets at the current time step. In other words, it identifies the matched pairs of first and third targets from among all first targets and all third targets. The matching results include: successfully matched first and third targets, unmatched first targets, and unmatched third targets.
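The three kinds of matching result can be illustrated on a small matrix. Note that the sketch below is not the Hungarian algorithm: it is an exhaustive search over assignments, used here as a dependency-free stand-in that reaches the same minimum total cost on small inputs. In practice a Hungarian-type solver such as `scipy.optimize.linear_sum_assignment` would be used. The row and column layout (rows are third targets, columns are first targets) and the function name are assumptions.

```python
from itertools import permutations
from typing import List, Tuple


def optimal_assignment(
    cost: List[List[float]],
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Return (matches, unmatched_rows, unmatched_cols) with minimum total cost.

    Brute force: only suitable for small matrices.
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    if n_rows == 0 or n_cols == 0:
        return [], list(range(n_rows)), list(range(n_cols))
    best_total = float("inf")
    best_pairs: List[Tuple[int, int]] = []
    if n_rows <= n_cols:
        # Every row gets a distinct column.
        for cols in permutations(range(n_cols), n_rows):
            total = sum(cost[r][c] for r, c in enumerate(cols))
            if total < best_total:
                best_total, best_pairs = total, list(enumerate(cols))
    else:
        # Every column gets a distinct row.
        for rows in permutations(range(n_rows), n_cols):
            total = sum(cost[r][c] for c, r in enumerate(rows))
            if total < best_total:
                best_total = total
                best_pairs = sorted((r, c) for c, r in enumerate(rows))
    matched_rows = {r for r, _ in best_pairs}
    matched_cols = {c for _, c in best_pairs}
    unmatched_rows = [r for r in range(n_rows) if r not in matched_rows]
    unmatched_cols = [c for c in range(n_cols) if c not in matched_cols]
    return best_pairs, unmatched_rows, unmatched_cols


# Three third targets (rows) and two first targets (columns).
cost_matrix = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.6]]
matches, unmatched_third, unmatched_first = optimal_assignment(cost_matrix)
```

Here the third target in row 2 is left unmatched, which corresponds to a newly appeared target; an unmatched column would correspond to a lost target.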
[0127] Step S114: Add the motion state data of each third target at the current moment to the corresponding trajectory data.
[0128] A successfully matched pair of first and third targets means that the first target has been tracked successfully. An unmatched third target is a newly appeared target, and its motion state data becomes the start of a new trajectory. Whether or not a third target is matched, its motion state data is added to the corresponding trajectory data.
[0129] Step S116: Each first target that failed to match is identified as a lost target, and lost targets that have failed to match N consecutive times are deleted; N is an integer greater than 1.
[0130] If a first target is not matched, it means that the first target existed at the previous moment but was not detected at the current moment. In other words, the first target is a lost target and can be tracked again later.
[0131] If a lost target has failed to match N consecutive times, it means that the lost target has not reappeared in the N consecutive frames after its last appearance. The lost target can then be deleted and no longer tracked, in order to reduce the amount of computation and save storage space.
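The bookkeeping for lost targets can be sketched with a per-target counter of consecutive failed matches. The data layout (integer target IDs and a dictionary of miss counts) and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def update_lost_targets(
    miss_counts: Dict[int, int],
    first_target_ids: Iterable[int],
    matched_ids: Set[int],
    n: int,
) -> Dict[int, int]:
    """Return the new consecutive-miss counts of lost targets.

    Matched targets are no longer lost; targets that have now failed
    to match n times in a row are dropped from tracking.
    """
    updated: Dict[int, int] = {}
    for target_id in first_target_ids:
        if target_id in matched_ids:
            continue  # matched again: its miss count is reset
        misses = miss_counts.get(target_id, 0) + 1
        if misses < n:
            updated[target_id] = misses  # still lost, keep for the next frame
    return updated


# Target 2 is never matched; with N = 3 it is deleted after the third miss.
lost = {}
history = []
for _ in range(3):
    lost = update_lost_targets(lost, [1, 2], matched_ids={1}, n=3)
    history.append(dict(lost))
```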
[0132] Step S118: Determine whether to stop acquiring images around the vehicle.
[0133] If the acquisition of images around the vehicle does not stop, proceed to step S120; if the acquisition of images around the vehicle stops, the process ends.
[0134] Step S120: Take the current time as the previous time, take the next time as the current time, and update the state of the Kalman filter.
[0135] The vehicle continuously acquires new images while driving, and achieves target tracking by updating the images, time, and the state of the Kalman filter, and executing the above-mentioned cyclic process.
[0136] The method for updating the state of the Kalman filter is as follows:
[0137] According to the Kalman gain calculation formula:
[0138]
[0139] Calculate the Kalman gain $K_k$ at time k, where $R_k$ is the observation noise covariance matrix at time k.
[0140] The prior estimate is obtained through the above prediction model. Then, after processing through the observation model, the posterior estimate can be obtained, specifically based on the following observation model:
[0141]
[0142] This yields the posterior estimate of the motion state data of the first target at time k and the state covariance matrix $P_{k|k}$ at time k, where $Z_k$ represents the actual motion state data of the first target at time k, i.e., the motion state data obtained through the 3D target detection model.
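For reference, the measurement update of a standard linear Kalman filter is commonly written as shown below. This is the textbook form, given here only as an aid to reading; the observation matrix $H$, the identity matrix $I$, and the state estimate symbols $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$ are conventional notation assumed for this sketch and may differ from the notation of the formulas of this application.

```latex
\begin{aligned}
K_k &= P_{k|k-1} H^{\top} \left( H P_{k|k-1} H^{\top} + R_k \right)^{-1} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k \left( Z_k - H \hat{x}_{k|k-1} \right) \\
P_{k|k} &= \left( I - K_k H \right) P_{k|k-1}
\end{aligned}
```

Here $P_{k|k-1}$ and $\hat{x}_{k|k-1}$ denote the prior covariance and prior estimate produced by the prediction step, and $Z_k$ is the detection obtained at time k.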
[0143] The target tracking method in this application uses a 3D target detection model to detect targets in the image from the previous moment, obtaining the motion state data of the second target at the previous moment. The second target and the previously unmatched targets (i.e., lost targets) together form the first target. Target detection is then performed on the image from the current moment to obtain the motion state data of the third target at the current moment. Simplifying the motion state data in three-dimensional space to the bird's-eye view (BEV) reduces the computational load, enabling efficient operation on resource-constrained devices. Predicting the motion state data of the first target from the previous moment with a Kalman filter effectively handles the noise present during perception, maintaining good anti-interference capability even in complex traffic environments. For each first target and each third target, the matching relationship is determined from multiple dimensions, namely position, overlap, and direction, so that the globally optimal match is determined more accurately and the accuracy and robustness of target tracking are improved.
[0144] It should be noted that although the steps of the method in this application are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
[0145] Corresponding to the above method embodiments, this application also provides a target tracking device; see Figure 2. The target tracking device 200 includes:
[0146] The first target motion state data acquisition module 202 is used to acquire the motion state data of the first target at the previous moment. The motion state data of the first target includes: the motion state data of the second target and the motion state data of the lost target; the motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model; the lost target is empty at the initial moment.
[0147] The third target motion state data determination module 204 is used to acquire images of the vehicle's surroundings at the current moment and input the images into the 3D target detection model to obtain the motion state data of the third target at the current moment.
[0148] Kalman filter module 206 is used to predict the motion state data of the first target at the previous moment through Kalman filter and obtain the motion state data of the first target at the current moment.
[0149] The association cost determination module 208 is used to determine the association cost between each first target and each third target based on the motion state data of the first target and the third target at the current moment; wherein the association cost is related to the position, overlap, and direction of the first target and the third target;
[0150] The association cost matrix construction module 210 is used to construct the association cost matrix between the first target and the third target based on the association cost;
[0151] The global optimal matching judgment module 212 is used to determine the globally optimal match between the first target and the third target from the association cost matrix using the Hungarian algorithm;
[0152] The trajectory adding module 214 adds the motion state data of each third target at the current moment to the corresponding trajectory data;
[0153] The lost target update module 216 is used to identify the first target that failed to match as a lost target and to delete lost targets that have failed to match N consecutive times; N is an integer greater than 1.
[0154] The update module 218 is used to take the current moment as the previous moment, take the next moment as the current moment, update the state of the Kalman filter, and return to the first target motion state data acquisition module 202 until the acquisition of images around the vehicle stops.
[0155] Optionally, the target tracking device 200 further includes:
[0156] The bird's-eye view processing module is used to construct the bird's-eye view motion state data of the first target at the previous moment from the motion state data of the first target at the previous moment, before that motion state data is predicted through the Kalman filter.
[0157] The Kalman filter module 206 is specifically used to predict the bird's-eye view motion state data of the first target at the previous moment through the Kalman filter to obtain the filtered bird's-eye view motion state data of the first target at the current moment, and to combine the filtered bird's-eye view motion state data of the first target at the current moment with the Z-direction motion state data taken from the motion state data of the first target at the previous moment, using the result as the motion state data of the first target at the current moment.
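The split into bird's-eye view data and Z-direction data, and their later recombination, can be sketched as follows. The field layout is an assumption made for illustration: the text does not enumerate which components count as Z-direction data, so this sketch treats the vertical position `z` and the box `height` as the Z-direction part. The class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State3D:
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float


@dataclass(frozen=True)
class BEVState:
    x: float
    y: float
    length: float
    width: float
    yaw: float


def to_bev(state: State3D) -> BEVState:
    """Drop the Z-direction components to obtain the bird's-eye view state."""
    return BEVState(state.x, state.y, state.length, state.width, state.yaw)


def merge(filtered_bev: BEVState, previous: State3D) -> State3D:
    """Combine the filtered BEV state with the Z-direction data of the previous moment."""
    return State3D(
        x=filtered_bev.x,
        y=filtered_bev.y,
        z=previous.z,            # carried over unchanged
        length=filtered_bev.length,
        width=filtered_bev.width,
        height=previous.height,  # carried over unchanged
        yaw=filtered_bev.yaw,
    )
```

Only the BEV part passes through the Kalman filter in this arrangement, so the filter state is smaller than a full 3D state.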
[0158] Optionally, the motion state data of the third target includes: the position coordinates, size, and orientation of the third target; the motion state data of the first target includes: the position coordinates, size, and orientation of the first target.
[0159] The association cost determination module 208 is specifically used for each third target and each first target to calculate the position cost between the third target and the first target based on the position coordinates of the third target and the first target; to calculate the bounding box of the third target based on the position coordinates and size of the third target, and to calculate the bounding box of the first target based on the position coordinates and size of the first target; to calculate the overlap cost between the bounding boxes of the third target and the first target; to calculate the directional cost between the third target and the first target based on the direction of the third target and the direction of the first target; and to take the weighted average of the position cost, overlap cost, and directional cost between the third target and the first target as the association cost between the third target and the first target.
[0160] Optionally, the motion state data of the first target also includes: the perception confidence of the first target;
[0161] The association cost matrix construction module 210 is specifically used to, for each first target, take the product of the association cost between the third target and the first target and the perception confidence of the first target at the previous moment as an element of the association cost matrix, thereby constructing the association cost matrix.
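The weighted average of the three costs and the confidence weighting of the matrix elements can be sketched as follows. The function names, the default equal weights, and the row/column layout (rows for first targets, columns for third targets) are assumptions for illustration; the text does not specify the weight values.

```python
from typing import List, Sequence, Tuple


def association_cost(
    position_cost: float,
    overlap_cost: float,
    direction_cost: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """Weighted average of the position, overlap, and directional costs."""
    w_p, w_o, w_d = weights
    total = w_p + w_o + w_d
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_p * position_cost + w_o * overlap_cost + w_d * direction_cost) / total


def build_cost_matrix(
    costs: Sequence[Sequence[float]], confidences: Sequence[float]
) -> List[List[float]]:
    """Scale row j (first target j) by that target's perception confidence.

    costs[j][i] is the association cost between first target j and third target i;
    confidences[j] is the perception confidence of first target j at the previous moment.
    """
    if len(costs) != len(confidences):
        raise ValueError("one confidence value is required per first target")
    return [[c * conf for c in row] for row, conf in zip(costs, confidences)]
```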
[0162] Optionally, let $p_i$ denote the position coordinates of the third target i in the bird's-eye view at the current moment, and let $p_j$ denote the position coordinates of the first target j in the bird's-eye view at the current moment;
[0163] The association cost determination module 208 is specifically used to calculate the positional cost between the third target and the first target through the following steps:
[0164] According to the formula:
[0165] Calculate the positional cost $P_{(i,j)}$ between the first target j and the third target i, where w is a preset value greater than 0 and less than or equal to 2.
[0166] Optionally, the association cost determination module 208 is specifically used to calculate the overlap cost between the bounding box of the third target and the bounding box of the first target through the following steps:
[0167] Let $A_i$ denote the bounding box of the third target i in three-dimensional space at the current moment, and let $B_j$ denote the bounding box of the first target j in three-dimensional space at the current moment;
[0168] According to the formula:
[0169] Calculate the overlap cost $LIoU_{(i,j)}$ between the first target j and the third target i.
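An overlap cost between two 3D bounding boxes can be illustrated with the sketch below. The formula of this application is not reproduced in the text, so the sketch rests on two loudly stated assumptions: the cost is taken as one minus the intersection over union (a common choice, not confirmed by the source), and the boxes are treated as axis-aligned, ignoring orientation. The box layout `(cx, cy, cz, length, width, height)` and the function names are hypothetical.

```python
from typing import Sequence


def iou_3d_axis_aligned(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (cx, cy, cz, length, width, height)."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis] - a[axis + 3] / 2, b[axis] - b[axis + 3] / 2)
        hi = min(a[axis] + a[axis + 3] / 2, b[axis] + b[axis + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def overlap_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed cost: 0 for identical boxes, 1 for boxes that do not overlap."""
    return 1.0 - iou_3d_axis_aligned(a, b)
```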
[0170] Optionally, the association cost determination module 208 is specifically used to calculate the directional cost between the third target and the first target through the following steps:
[0171] Let $F_i$ denote the direction vector of the third target i at the current moment, and let $F_j$ denote the direction vector of the first target j at the current moment;
[0172] According to the formula: $S_{(i,j)} = 1 - \cos(F_i, F_j)$;
[0173] Calculate the directional cost $S_{(i,j)}$ between the first target j and the third target i.
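The directional cost can be sketched as follows, reading the cosine term as the cosine of the angle between the two direction vectors (an interpretation assumed here). The function name `direction_cost` is hypothetical.

```python
import math
from typing import Sequence


def direction_cost(f_i: Sequence[float], f_j: Sequence[float]) -> float:
    """Return 1 - cos(angle between f_i and f_j); the result lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm = math.hypot(*f_i) * math.hypot(*f_j)
    if norm == 0:
        raise ValueError("direction vectors must be non-zero")
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return 1.0 - cosine
```

Under this reading the cost is 0 for targets heading the same way, 1 for perpendicular headings, and 2 for opposite headings.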
[0174] The specific details of each module or unit in the above-mentioned device have been described in detail in the corresponding methods, so they will not be repeated here.
[0175] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of this application, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.
[0176] This application also provides an electronic device; see Figure 3, which is a structural diagram of an electronic device according to an embodiment of the present application. The electronic device includes: a processor 301, a communication interface 302, a memory 303, and a communication bus 304, wherein the processor 301, the communication interface 302, and the memory 303 communicate with each other through the communication bus 304;
[0177] Memory 303 is used to store computer programs;
[0178] The processor 301 is used to implement the target tracking method described above when executing the program stored in the memory 303.
[0179] It should be noted that the communication bus 304 mentioned in the above electronic device can be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc. The communication bus 304 can be divided into address bus, data bus, control bus, etc. For ease of representation, the bus is represented by a single thick line in Figure 3, but this does not mean that there is only one bus or one type of bus.
[0180] Communication interface 302 is used for communication between the above-mentioned electronic device and other devices.
[0181] The memory 303 may include RAM (Random Access Memory) or non-volatile memory, such as at least one disk storage device. Optionally, the memory 303 may also be at least one storage device located remotely from the aforementioned processor.
[0182] The processor 301 mentioned above can be a general-purpose processor, including: CPU (Central Processing Unit), NP (Network Processor), etc.; it can also be DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0183] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the above-described target tracking method.
[0184] It should be noted that the computer-readable storage medium shown in this application can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory, read-only memory, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, radio frequency, etc., or any suitable combination thereof.
[0185] In this embodiment of the application, a computer program product is also provided, which, when run on a computer, causes the computer to execute the above-described target tracking method.
[0186] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0187] The above description is merely a specific embodiment of this application, enabling those skilled in the art to understand or implement this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A target tracking method, characterized in that it comprises:
acquiring the motion state data of the first target at the previous moment, wherein the motion state data of the first target includes the motion state data of the second target and the motion state data of the lost target, the motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model, and the lost target is empty at the initial moment;
acquiring images of the vehicle's surroundings at the current moment and inputting the images into the 3D target detection model to obtain the motion state data of the third target at the current moment;
predicting, by using a Kalman filter, the motion state data of the first target at the previous moment to obtain the motion state data of the first target at the current moment;
determining, based on the motion state data of the first target and the third target at the current moment, the association cost between each first target and each third target, and constructing an association cost matrix between the first target and the third target based on the association cost, wherein the association cost is related to the position, overlap, and direction of the first target and the third target;
determining, by using the Hungarian algorithm, the globally optimal match between the first target and the third target from the association cost matrix;
adding the motion state data of each third target at the current moment to the corresponding trajectory data;
identifying the first target that fails to match as a lost target, and deleting lost targets that have failed to match N consecutive times, N being an integer greater than 1;
taking the current moment as the previous moment and the next moment as the current moment, and updating the state of the Kalman filter; and
returning to the step of acquiring the motion state data of the first target at the previous moment, until the acquisition of images around the vehicle stops.
2. The method according to claim 1, characterized in that the method further comprises:
before predicting the motion state data of the first target at the previous moment by using the Kalman filter, constructing the bird's-eye view motion state data of the first target at the previous moment based on the motion state data of the first target at the previous moment;
wherein the step of predicting, by using a Kalman filter, the motion state data of the first target at the previous moment to obtain the motion state data of the first target at the current moment specifically comprises:
predicting, by using the Kalman filter, the bird's-eye view motion state data of the first target at the previous moment to obtain the filtered bird's-eye view motion state data of the first target at the current moment; and
combining the filtered bird's-eye view motion state data of the first target at the current moment with the Z-direction motion state data taken from the motion state data of the first target at the previous moment to obtain the motion state data of the first target at the current moment.
3. The method according to claim 1, characterized in that the motion state data of the third target includes the position coordinates, size, and orientation of the third target, and the motion state data of the first target includes the position coordinates, size, and orientation of the first target;
the step of determining the association cost between each first target and each third target based on the motion state data of the first target and the third target at the current moment comprises, for each third target and each first target:
calculating the positional cost between the third target and the first target based on the position coordinates of the third target and the position coordinates of the first target;
calculating the bounding box of the third target based on the position coordinates and size of the third target, and calculating the bounding box of the first target based on the position coordinates and size of the first target;
calculating the overlap cost between the bounding box of the third target and the bounding box of the first target;
calculating the directional cost between the third target and the first target based on the orientation of the third target and the orientation of the first target; and
taking the weighted average of the positional cost, the overlap cost, and the directional cost between the third target and the first target as the association cost between the third target and the first target.
4. The method according to claim 3, characterized in that the motion state data of the first target further includes the perception confidence of the first target;
the construction of the association cost matrix between the first target and the third target comprises:
for each first target, taking the product of the association cost between the third target and the first target and the perception confidence of the first target at the previous moment as an element of the association cost matrix, thereby constructing the association cost matrix.
5. The method according to claim 3, characterized in that the step of calculating the positional cost between the third target and the first target based on the position coordinates of the third target and the position coordinates of the first target comprises:
letting $p_i$ denote the position coordinates of the third target i in the bird's-eye view at the current moment, and $p_j$ denote the position coordinates of the first target j in the bird's-eye view at the current moment;
according to the formula:
calculating the positional cost $P_{(i,j)}$ between the first target j and the third target i, where w is a preset value greater than 0 and less than or equal to 2.
6. The method according to claim 3, characterized in that the calculation of the overlap cost between the bounding box of the third target and the bounding box of the first target comprises:
letting $A_i$ denote the bounding box of the third target i in three-dimensional space at the current moment, and $B_j$ denote the bounding box of the first target j in three-dimensional space at the current moment;
according to the formula:
calculating the overlap cost $LIoU_{(i,j)}$ between the first target j and the third target i.
7. The method according to claim 3, characterized in that the step of calculating the directional cost between the third target and the first target based on the orientation of the third target and the orientation of the first target comprises:
letting $F_i$ denote the direction vector of the third target i at the current moment, and $F_j$ denote the direction vector of the first target j at the current moment;
according to the formula: $S_{(i,j)} = 1 - \cos(F_i, F_j)$;
calculating the directional cost $S_{(i,j)}$ between the first target j and the third target i.
8. A target tracking device, characterized in that the device comprises:
a first target motion state data acquisition module, used to acquire the motion state data of the first target at the previous moment, wherein the motion state data of the first target includes the motion state data of the second target and the motion state data of the lost target, the motion state data of the second target is obtained by inputting the image of the vehicle's surroundings collected at the previous moment into a pre-trained 3D target detection model, and the lost target is empty at the initial moment;
a third target motion state data determination module, used to acquire images of the vehicle's surroundings at the current moment and input the images into the 3D target detection model to obtain the motion state data of the third target at the current moment;
a Kalman filter module, used to predict the motion state data of the first target at the previous moment through the Kalman filter to obtain the motion state data of the first target at the current moment;
an association cost determination module, used to determine the association cost between each first target and each third target based on the motion state data of the first target and the third target at the current moment, wherein the association cost is related to the position, overlap, and direction of the first target and the third target;
an association cost matrix construction module, used to construct the association cost matrix between the first target and the third target based on the association cost;
a global optimal matching judgment module, used to determine the globally optimal match between the first target and the third target from the association cost matrix using the Hungarian algorithm;
a trajectory adding module, used to add the motion state data of each third target at the current moment to the corresponding trajectory data;
a lost target update module, used to identify the first target that failed to match as a lost target and to delete lost targets that have failed to match N consecutive times, N being an integer greater than 1; and
an update module, used to take the current moment as the previous moment, take the next moment as the current moment, update the state of the Kalman filter, and return to the first target motion state data acquisition module until the acquisition of images around the vehicle stops.
9. An electronic device, characterized in that it comprises: a processor for executing a computer program stored in a memory, wherein the computer program, when executed by the processor, implements the method of any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-7.