A multi-sensor fusion vehicle target tracking method, system and storage medium

By fusing local tracks from millimeter-wave radar and cameras, and using Hungarian matching and Kalman filtering to form a global track, the problem of insufficient stability and accuracy of single sensors under environmental changes is solved, thereby improving the stability and accuracy of vehicle target tracking.

CN115792894B · Active · Publication Date: 2026-06-12 · WUHAN UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
WUHAN UNIV OF SCI & TECH
Filing Date
2022-11-09
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, a single sensor is prone to failure under environmental changes, which leaves vehicle target tracking insufficiently stable and accurate, and existing multi-sensor fusion methods have not yet made the sensors effectively complement one another.

Method used

Local tracks are generated by millimeter-wave radar and cameras respectively, and then fused into a global track using Hungarian matching and Kalman filtering methods to ensure the stability and accuracy of target ID matching.

Benefits of technology

It improves the stability and accuracy of vehicle target tracking, solves the problem of temporary failure of a single sensor due to environmental changes, and outputs more accurate global trajectory information.

Generated by Eureka AI based on patent content.

Abstract drawing: Figure CN115792894B_ABST

Abstract

The application discloses a multi-sensor fusion vehicle target tracking method, system and storage medium. The method comprises: S100: obtaining millimeter-wave radar data and image data of the vehicle driving environment; S200: obtaining, from the radar data, the local tracks of the effective radar targets in the current period; S300: obtaining, from the image data, the local tracks of the camera targets at the current moment; S400: aligning the radar and the camera in time and space, matching the effective radar targets with the camera targets, fusing the local tracks of the successfully matched effective radar targets and camera targets to obtain a fused global track, and updating the global track of the previous moment with the fused global track. The application improves the stability of vehicle tracking and yields a more accurate global track.

Description

Technical Field

[0001] This application belongs to the field of intelligent driving assistance technology, and specifically relates to a multi-sensor fusion vehicle target tracking method, system, and storage medium.

Background Technology

[0002] Vehicles equipped with intelligent driving assistance functions are gradually entering traffic environments. When the onboard sensors detect a hazard in the driving environment, the vehicle enters a hazard-avoidance state when necessary and uses its own environmental perception, decision-planning, and control system to prevent accidents. Environmental perception is a fundamental prerequisite for intelligent driving; a good environmental perception system provides reliable information for subsequent operations.

[0003] In environmental perception, sensors such as LiDAR, millimeter-wave radar, and cameras play a major role. LiDAR offers high resolution and imaging capabilities, and its rich point cloud information enables target classification and detection, acquiring 3D target information. However, it is susceptible to interference from rain and snow and is expensive, thus it has not yet been widely adopted in vehicles. Millimeter-wave radar has strong penetration and stability, can operate in harsh environments, and is low-cost. Although it lacks target resolution and has poor angular resolution, its accurate distance detection makes it an indispensable sensor for the development of intelligent driving vehicle functions. Cameras can acquire rich image information and identify target categories, making them a reliable vehicle perception sensor, but they have significant errors in measuring position information. Each single sensor has obvious shortcomings, while multi-sensor fusion can complement each other's advantages, representing a trend in environmental perception.

[0004] In vehicle detection and tracking technology, multi-sensor fusion is a key research focus. In existing intelligent driving assistance systems such as LKA (Lane Keeping Assist) and ACC (Adaptive Cruise Control), cameras and millimeter-wave radar are indispensable sensors. How to fuse multiple sensors effectively, so that their strengths and weaknesses complement one another, and further improve the stability and accuracy of tracking is a crucial research challenge currently facing vehicle tracking.

Summary of the Invention

[0005] The purpose of this application is to provide a multi-sensor fusion vehicle target tracking method, system, and storage medium, which can further improve the stability and accuracy of tracking.

[0006] To achieve the above objectives, the first aspect of this application provides a multi-sensor fusion vehicle target tracking method, comprising:

[0007] S100: Acquire millimeter-wave radar data and image data of the vehicle's driving environment;

[0008] S200: Extract effective radar targets and their position and velocity information from the radar data, and assign a unique ID to each extracted effective radar target; then predict and update the state of the effective radar targets to obtain their local tracks in the current period;

[0009] S300: Detect camera targets from the image data, track the camera targets and assign them unique IDs, and estimate their position and velocity information to obtain the local tracks of the camera targets at the current moment;

[0010] S400: Align the radar and camera in time and space, then perform Hungarian matching between the effective radar targets and the camera targets, fuse the local tracks of the successfully matched effective radar targets and camera targets to obtain the fused global trajectory, and use the fused global trajectory to update the global trajectory of the previous moment.

[0011] In some specific implementations, in step S200, the Kalman filter method is used to predict and update the state of effective radar targets.

[0012] In some specific implementations, in step S300, the DeepSORT method is used to track the camera target.

[0013] In some specific embodiments, step S300 further includes:

[0014] S310: Detect camera targets from image data and obtain detection boxes;

[0015] S320: Using the detection box state as input, predict the tracking box of the detection box at the next time step; the detection box state includes at least the detection box position features, shape features, and the rate of change of the position features and shape features in the image;

[0016] S330: Matches the tracking box and detection box at the current moment, assigns a unique ID to the camera target obtained after matching, estimates the position and velocity information of the camera target, and forms a local track of the camera target.

[0017] In some specific implementations, the detection box position features include the center position of the detection box; the shape features include the aspect ratio and height of the detection box; the rate of change of the position features and shape features in the image refers to the rate of change of the center position, aspect ratio and height of the detection box relative to the previous moment.

[0018] In some specific embodiments, step S330 further includes:

[0019] S331: Calculate the Intersection over Union (IOU) of the detection box and the tracking box, and perform Hungarian matching using 1-IOU as the value of the cost matrix to obtain the successfully matched tracking box and detection box;

[0020] S332: Repeat sub-step S331 to perform continuous tracking. When the tracking count of a tracking box reaches the preset number N, the tracking box enters the confirmed state, and sub-step S333 is then executed.

[0021] S333: Perform cascade matching on the confirmed tracking boxes, calculate the distance metric between the tracking boxes and the detection boxes, generate a cost matrix based on the distance metric, and perform matching based on the cost matrix. When matching, prioritize matching the tracking boxes with the fewest loss counts and the unmatched detection boxes; then, execute sub-step S334; the distance metric refers to the linear weighted sum of the Mahalanobis distance and cosine distance between the tracking boxes and the detection boxes.

[0022] S334: Calculate the Intersection over Union (IOU) of the unmatched tracking boxes and detection boxes, and perform Hungarian matching using 1-IOU as the cost matrix value; output the matched and unmatched tracking boxes and detection boxes; assign a unique ID to each resulting camera target, reusing the ID from the previous time step if the camera target already existed then; and estimate the position and velocity information of the camera target to form its local track.

[0023] In the above process, for successfully matched detection boxes and tracking boxes, the tracking count of the tracking box is incremented by 1, and the tracking box is updated using Kalman filtering with the detection box; for tracking boxes that fail to match, the loss count is incremented by 1, and when the loss count reaches the loss threshold, the tracking box is deleted.

[0024] In step S400 above, the radar and camera are aligned in time and space, specifically by aligning the time according to the scanning period of the millimeter-wave radar.

[0025] In some specific embodiments, step S400 further includes:

[0026] Using the combined difference in position and velocity between radar effective targets and camera targets as the cost matrix, Hungarian matching is performed on radar effective targets and camera targets to obtain the matching relationship between radar effective targets and camera targets; the successfully matched camera targets and radar effective targets are fused to form a global target.

[0027] After the initial matching is completed, the position and velocity information of the successfully matched targets are fused to form the global track information for the current moment, and the IDs of the successfully matched radar and camera targets are saved; in the next moment, the following is executed:

[0028] (1) When the camera target ID and radar effective target ID saved in the global trajectory at the previous moment correspond to the camera target ID and radar effective target ID in the current fused global trajectory, and the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold, then the global trajectory at the previous moment is predicted and updated by Kalman filtering using the current fused global trajectory, and the tracking count is incremented by 1.

[0029] (2) When the camera target ID saved in the global track at the previous moment corresponds to the camera target ID in the current fused global trajectory, but the radar effective target ID does not correspond, the radar effective target ID in the global track at the previous moment is updated to the radar effective target ID in the current fused global trajectory. At the same time, it is determined whether the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold. If so, the match is successful. Kalman filtering prediction and update of the global track at the previous moment is performed using the current fused global trajectory, and the tracking count is incremented by 1.

[0030] (3) When the camera target ID saved in the global track at the previous moment does not correspond to the camera target ID in the current fused global trajectory, but the radar effective target ID does correspond, the camera target ID in the global track at the previous moment is updated to the camera target ID in the current fused global trajectory; at the same time, it is determined whether the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold. If so, the matching is successful, and the global track at the previous moment is predicted and updated by Kalman filtering with the current fused global trajectory, and the tracking count is incremented by 1.

[0031] (4) When neither the camera target ID nor the effective radar target ID saved in the previous global track corresponds to the camera target ID and effective radar target ID in the current fused global track, the current fused global track is taken as a new global track and its tracking count is set to 0.
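The four ID-correspondence cases in [0028] to [0031] can be summarised as one decision function. The sketch below is hypothetical: `GlobalTrack` and `update_ids` are illustrative names, the Kalman predict/update itself is omitted, and, because the text does not say what happens when the comprehensive difference exceeds the threshold in cases (1) to (3), the sketch simply keeps the track without incrementing its count.

```python
from dataclasses import dataclass


@dataclass
class GlobalTrack:
    cam_id: int           # camera target ID saved in the global track
    radar_id: int         # effective radar target ID saved in the global track
    track_count: int = 0  # number of successful tracking updates


def update_ids(prev: GlobalTrack, cam_id: int, radar_id: int,
               diff: float, threshold: float) -> GlobalTrack:
    """Apply cases (1)-(4) to a previous global track and a current fused pair."""
    cam_same = prev.cam_id == cam_id
    radar_same = prev.radar_id == radar_id
    if not cam_same and not radar_same:
        # Case (4): no shared ID, so the fused track becomes a new global track
        return GlobalTrack(cam_id, radar_id, 0)
    # Cases (1)-(3): keep the track and adopt whichever ID has changed
    updated = GlobalTrack(cam_id, radar_id, prev.track_count)
    if diff <= threshold:
        # Matching succeeds: the Kalman predict/update would run here
        updated.track_count += 1
    return updated
```

The function returns a new object rather than mutating `prev`, so the previous global track stays available if the caller needs it for the prediction step.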

[0032] In some specific implementations, the global trajectory from the previous time step is predicted and updated using Kalman filtering based on the current fused global trajectory, specifically as follows:

[0033] The global trajectory and global track correspond to the global target at different times. A uniform velocity model is used to describe the motion state of the global target, which includes the position and velocity information of the global target. The global target at the previous time is predicted by Kalman filtering to obtain the state of the global target at the current time. The fused global trajectory obtained by fusing the local tracks of the successfully matched radar effective target and camera target is used as the observation vector. The state vector of the global target is updated by Kalman filtering, which completes the update of the global track.

[0034] A second aspect of this application provides a multi-sensor fusion vehicle target tracking system, comprising:

[0035] The first module is used to acquire millimeter-wave radar data and image data of the vehicle's driving environment;

[0036] The second module is used to extract effective radar targets and their position and velocity information from radar data, and assign a unique ID to the extracted effective radar targets; then it predicts and updates the state of effective radar targets to obtain the local tracks of effective radar targets in the current period.

[0037] The third module is used to detect camera targets from image data, track camera targets and assign them unique IDs, estimate the position and velocity information of camera targets, and obtain the local trajectory of camera targets at the current moment.

[0038] The fourth module is used to align the radar and camera in time and space, then perform Hungarian matching on the effective radar targets and camera targets, fuse the local tracks of the successfully matched effective radar targets and camera targets to obtain the fused global trajectory, and use the fused global trajectory to update the global trajectory of the previous moment.

[0039] In some specific implementations, the third module further includes the following sub-modules:

[0040] The first submodule is used to detect camera targets from image data and obtain detection boxes;

[0041] The second submodule is used to predict the tracking box of the detection box at the next time step, taking the detection box state as input; the detection box state includes at least the detection box position features, shape features, and the rate of change of the position features and shape features in the image;

[0042] The third submodule is used to match the tracking box and the detection box at the current moment, assign a unique ID to the camera target obtained after matching, estimate the position and velocity information of the camera target, and form a local track of the camera target.

[0043] A third aspect of this application provides a storage medium on which a computer program is stored, which, when executed by a processor, implements the above-described method.

[0044] Compared with the prior art, this application has the following advantages and beneficial effects:

[0045] In this application, millimeter-wave radar and a camera each generate their own local tracks, which are then merged to form a global track. The global track is used for ID matching and confirmation between the camera target and the effective radar target. Once a threshold number of tracking attempts is reached, the global track target is stably tracked. A threshold condition is set to prevent erroneous tracking due to incorrect matching. In this application, as long as two of the global track target, camera target, and radar target match, the information can be fused and the global track output.

[0046] This application can solve the problem of temporary failure of a single sensor due to environmental changes, and can improve the stability of vehicle tracking; the output global trajectory information is obtained by fusing the historical trajectory information of radar effective targets and camera targets, so the position and distance information is more accurate, that is, the obtained global trajectory is more accurate.

Attached Figure Description

[0047] Figure 1 is a flowchart of the multi-sensor fusion vehicle target tracking method provided in an embodiment of this application;

[0048] Figure 2 is a flowchart of the camera target tracking process in an embodiment of this application;

[0049] Figure 3 is a flowchart of the target ID matching process between the camera and the millimeter-wave radar in an embodiment of this application.

Detailed Implementation

[0050] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0051] The terms "comprising" and "having," and any variations thereof, used in the specification, claims, and drawings of this application are intended to cover non-exclusive inclusion. References to "embodiment" herein mean that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly or implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments. The terms "module," "system," etc., used in this specification are used to denote computer-related entities, hardware, firmware, combinations of hardware and software, software, or software in execution.

[0052] This application presents a multi-sensor fusion vehicle target tracking method for detecting and tracking vehicles. Specifically, it utilizes millimeter-wave radar and cameras to collect millimeter-wave radar data and image data of the vehicle's driving environment, and obtains and outputs the target's global trajectory information based on the millimeter-wave radar data and image data.

[0053] The sensors used in this application include millimeter-wave radar and a camera, which are respectively installed in the front grille and inside the windshield of the vehicle to collect millimeter-wave radar data and image data from the front. The millimeter-wave radar and camera are jointly calibrated to ensure spatiotemporal alignment. A device for executing a target tracking method is used to receive the millimeter-wave radar data and image data, and to achieve target tracking based on the received millimeter-wave radar data and image data.

[0054] In this embodiment, time alignment is based on the scanning cycle of the millimeter-wave radar. When receiving millimeter-wave radar data for the current cycle, image data from the camera is received synchronously. In this embodiment, the camera's intrinsic parameters are first calibrated using the Zhang Zhengyou calibration method. Then, the coordinate systems of the camera and the millimeter-wave radar are transformed to the global coordinate system through rotation and translation to obtain the extrinsic parameter matrix. At this point, both the camera and the millimeter-wave radar are mapped to the global coordinate system, thus completing the spatial alignment of the camera and the millimeter-wave radar. The global coordinate system is defined as follows: with the center of the vehicle's rear axle as the origin, the vehicle's driving direction as the y-axis, the horizontal direction to the right perpendicular to the vehicle's driving direction as the x-axis, and the vertical upward direction perpendicular to the ground as the z-axis, wherein the x-axis and y-axis lie on the same horizontal plane.
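The alignment described above amounts to a rigid transform into the rear-axle frame plus picking the camera frame that coincides with each radar scan. The sketch below is a planar simplification under stated assumptions: it ignores the z axis and pitch/roll, and the function names and mounting values are hypothetical rather than the calibrated extrinsics of the embodiment.

```python
import math
from typing import List, Tuple


def sensor_to_vehicle(x_s: float, y_s: float, yaw: float,
                      tx: float, ty: float) -> Tuple[float, float]:
    """Map a point from a sensor frame to the vehicle (global) frame.

    yaw: mounting rotation of the sensor about the z axis, in radians.
    (tx, ty): sensor position relative to the rear-axle origin, in metres.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate into the vehicle axes, then translate by the mounting offset
    return c * x_s - s * y_s + tx, s * x_s + c * y_s + ty


def nearest_frame(radar_time: float, frame_times: List[float]) -> int:
    """Index of the camera frame closest in time to the current radar scan."""
    return min(range(len(frame_times)),
               key=lambda i: abs(frame_times[i] - radar_time))
```

For a radar assumed to sit 3.8 m ahead of the rear axle with zero yaw, `sensor_to_vehicle(0.5, 20.0, 0.0, 0.0, 3.8)` gives approximately `(0.5, 23.8)`.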

[0055] Figure 1 is a flowchart of a multi-sensor fusion vehicle target tracking method provided in an embodiment of this application. The embodiments of this application are described in detail below with reference to this flowchart.

[0056] This application provides a multi-sensor fusion vehicle target tracking method, including the following steps:

[0057] S100: Acquire the millimeter-wave radar data and image data of the driving environment collected by the millimeter-wave radar and the camera.

[0058] In this embodiment, the millimeter-wave radar and camera are spatiotemporally registered vehicle sensors. When the vehicle is in motion, the millimeter-wave radar and camera can respectively collect millimeter-wave radar data and image data of the driving environment.

[0059] S200: Extract effective radar targets and their position and velocity information from the radar data, and assign a unique ID to each extracted effective radar target; then use the Kalman filter method to predict and update the state of the effective radar targets, obtaining their local tracks in the current period. The period refers to the scanning period of the millimeter-wave radar.

[0060] In this embodiment, the radar local track includes at least the position information, speed information, ID, number of matches, and number of losses of the effective target, wherein the number of matches and the number of losses are initialized to 0. The local track may also include RCS (radar cross-section) and angle information, where the angle refers to the angle of the target vehicle relative to the millimeter-wave radar. The position information includes longitudinal distance and lateral distance, which are the y-values and x-values of the effective target in the radar coordinate system, respectively; the speed information includes longitudinal relative speed and lateral relative speed, where relative speed refers to the speed of the effective radar target relative to the vehicle; longitudinal relative speed refers to the component of speed in the direction of travel, and lateral relative speed refers to the component of speed perpendicular to the direction of travel.
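The radar local track in [0060] maps naturally onto a small record type. This is an illustrative sketch: the class name `RadarLocalTrack`, the field names, and the metre and metre-per-second units are assumptions, while the fields themselves and their zero initialisation follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RadarLocalTrack:
    track_id: int                  # unique ID of the effective target
    x: float                       # lateral distance (m, assumed unit)
    y: float                       # longitudinal distance (m)
    vx: float                      # lateral relative velocity (m/s)
    vy: float                      # longitudinal relative velocity (m/s)
    matches: int = 0               # number of matches, initialised to 0
    losses: int = 0                # number of losses, initialised to 0
    rcs: Optional[float] = None    # optional radar cross-section
    angle: Optional[float] = None  # optional angle relative to the radar

    def state(self) -> List[float]:
        """State vector in the order [x, y, v_x, v_y] used by formula (2)."""
        return [self.x, self.y, self.vx, self.vy]
```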

[0061] The following provides a detailed implementation method for this step, including sub-steps:

[0062] S210: Extract effective radar targets and their position and velocity information from radar data.

[0063] First, target information, including its location and speed, is extracted from radar data. Then, noisy targets and stationary targets are removed, leaving other moving vehicles as valid targets.

[0064] The effective target can be determined using the following formula (1):

[0065]

$$|y_{k+1} - y_k| \le \Delta y, \qquad |x_{k+1} - x_k| \le \Delta x, \qquad |v_{y,k+1} - v_{y,k}| \le \Delta v_y \tag{1}$$

[0066] In equation (1), $y_{k+1}$, $x_{k+1}$, and $v_{y,k+1}$ are the longitudinal distance, lateral distance, and longitudinal relative velocity of the target at time k+1; $y_k$, $x_k$, and $v_{y,k}$ are the same quantities at time k; and Δy, Δx, and $\Delta v_y$ are the thresholds for the longitudinal distance, lateral distance, and longitudinal relative velocity, respectively.

[0067] The validity of a target is determined by the changes in its longitudinal distance, lateral distance, and longitudinal relative velocity between adjacent time points. Targets that satisfy equation (1) are valid targets, meaning that the position and velocity information of the same target should not change too much between adjacent time points. If the changes exceed the threshold, the target is considered a noisy target and should be eliminated.
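The validity test of equation (1) is a per-target gate on the change between adjacent scans. A minimal sketch, assuming each measurement is packed as a `(y, x, v_y)` tuple; the function name and the threshold values in the usage line are hypothetical, and the separate removal of stationary targets is not shown.

```python
def is_valid(prev, curr, dy_max: float, dx_max: float, dvy_max: float) -> bool:
    """Equation (1): prev and curr are (y, x, v_y) of one target at k and k+1."""
    return (abs(curr[0] - prev[0]) <= dy_max          # longitudinal distance
            and abs(curr[1] - prev[1]) <= dx_max      # lateral distance
            and abs(curr[2] - prev[2]) <= dvy_max)    # longitudinal rel. velocity
```

With illustrative thresholds of 2.0 m, 0.5 m and 1.0 m/s, a target moving from `(30.0, 0.2, -1.0)` to `(30.8, 0.3, -1.2)` passes, while a jump to `(35.0, 0.2, -1.0)` is rejected as noise.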

[0068] S220: Assign a unique ID to each extracted valid target; if the valid target already exists in the previous period, use the ID from the previous period.
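The text does not say how a valid target is recognised as "already existing in the previous period". One plausible reading, sketched below purely as an assumption, is to reuse the ID of the closest previous target that passes the equation (1) gate and otherwise draw a fresh ID; `assign_ids`, the greedy nearest-neighbour choice and the default thresholds are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Target = Tuple[float, float, float]  # (y, x, v_y)


def assign_ids(prev: Dict[int, Target], curr: List[Target],
               id_counter: Iterator[int], dy_max: float = 2.0,
               dx_max: float = 0.5, dvy_max: float = 1.0) -> Dict[int, Target]:
    """Reuse the ID of a gated previous target, otherwise draw a fresh ID.

    id_counter must never yield an ID still in use (e.g. itertools.count()).
    """
    result: Dict[int, Target] = {}
    for tgt in curr:
        best_id, best_d = None, float("inf")
        for tid, p in prev.items():
            if tid in result:          # each previous ID is reused at most once
                continue
            dy, dx, dv = abs(tgt[0] - p[0]), abs(tgt[1] - p[1]), abs(tgt[2] - p[2])
            if dy <= dy_max and dx <= dx_max and dv <= dvy_max and dy + dx < best_d:
                best_id, best_d = tid, dy + dx
        if best_id is None:
            best_id = next(id_counter)  # new target: fresh unique ID
        result[best_id] = tgt
    return result
```

A greedy pass is used only for brevity; an assignment solver would avoid order dependence when two current targets gate onto the same previous one.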

[0069] S230: Predict and update the state of effective radar targets, specifically using the Kalman filter method. The target state includes position and velocity information.

[0070] In this embodiment, a uniform velocity model is used to describe the state of the effective radar target, as follows:

[0071]

$$X_{k+1} = A X_k + \omega_{k+1}, \qquad Z_{k+1} = H X_{k+1} + \psi_{k+1} \tag{2}$$

[0072] In formula (2):

[0073] X is the state vector, $X = [x, y, v_x, v_y]^T$, where x and y are the lateral and longitudinal distances of the effective target, and $v_x$ and $v_y$ are its lateral and longitudinal relative velocities; A is the state transition matrix; ω is the process noise; Z is the observation vector; H is the observation matrix; and ψ is the observation noise.

[0074] The subscripts k and k+1 denote time points: $X_k$ and $X_{k+1}$ are the state vectors at times k and k+1; $\omega_{k+1}$ is the process noise at time k+1; $Z_k$ and $Z_{k+1}$ are the observation vectors at times k and k+1; and $\psi_{k+1}$ is the observation noise at time k+1.

[0075] Predicting and updating the state of an effective radar target with the Kalman filter involves obtaining its current state $X_k$ and, from $X_k$, predicting the state $X_{k+1}$ at the next moment.
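Formula (2) is the standard linear Kalman filter model. The sketch below fills in what the text leaves open, so treat these as assumptions: the constant-velocity transition matrix for a scan period T, an identity observation matrix (the radar observing x, y, v_x and v_y directly), and the noise covariances used in the usage example.

```python
import numpy as np


def make_cv_model(T: float):
    """Constant-velocity model for X = [x, y, v_x, v_y]^T with scan period T (s)."""
    A = np.array([[1.0, 0.0, T, 0.0],
                  [0.0, 1.0, 0.0, T],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.eye(4)  # assumption: the radar observes x, y, v_x, v_y directly
    return A, H


def kf_predict(X, P, A, Q):
    """Time update: X_{k+1|k} = A X_k, P_{k+1|k} = A P A^T + Q."""
    return A @ X, A @ P @ A.T + Q


def kf_update(X, P, Z, H, R):
    """Measurement update with observation Z_{k+1} = H X_{k+1} + psi."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    X_new = X + K @ (Z - H @ X)
    P_new = (np.eye(len(X)) - K @ H) @ P
    return X_new, P_new
```

Usage, with a hypothetical 50 ms scan: `A, H = make_cv_model(0.05)` and a target at 30 m closing at 2 m/s, `kf_predict(np.array([0.5, 30.0, 0.0, -2.0]), np.eye(4), A, 0.01 * np.eye(4))`, predicts a longitudinal distance of about 29.9 m.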

[0076] S300: Detect camera targets from image data, track the camera targets and assign them unique IDs, estimate the position and velocity information of the camera targets, and obtain the local track of the camera targets at the current moment. In this embodiment, the DeepSORT method is used to track the camera targets.

[0077] The flow of this step in the embodiment of this application is shown in Figure 2. The specific implementation of this step is described below with reference to Figure 2.

[0078] S310: Detect camera targets from image data and obtain detection boxes.

[0079] In this embodiment, a deep learning model is specifically used to detect camera targets and acquire detection boxes. Considering both accuracy and real-time performance, the YOLOv4 target detection model is preferably used.

[0080] S320: Using the detection box state as input, the Kalman filter method is used to predict the tracking box of the detection box at the next time step; the detection box state includes at least the detection box position features, shape features, and changes of position features and shape features in the image.

[0081] In this embodiment, the positional features include the center position of the detection box, the shape features include the aspect ratio and height of the detection box, the rate of change of the positional features in the image is the change of the center position in the image, and the rate of change of the shape features in the image is the change of the aspect ratio and height in the image.

[0082] In this embodiment of the application, the detection box state vector X' is represented as:

[0083]

$$X' = [p_u, p_v, \gamma, h, \dot{p}_u, \dot{p}_v, \dot{\gamma}, \dot{h}]^T \tag{3}$$

[0084] In formula (3): $(p_u, p_v)$ are the pixel coordinates of the center of the detection box, γ is the aspect ratio of the detection box, and h is its height; $(\dot{p}_u, \dot{p}_v)$ is the velocity of $(p_u, p_v)$ in the image, and $\dot{\gamma}$ and $\dot{h}$ are the rates of change of γ and h.

[0085] The measurement vector Z' is represented as:

[0086] $$Z' = [p_u, p_v, \gamma, h]^T \tag{4}$$
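With the 8-dimensional box state of formula (3) and the 4-dimensional measurement of formula (4), the Kalman filter matrices follow directly. A sketch assuming a unit time step per frame and γ defined as width divided by height (both are assumptions; the text only calls γ the aspect ratio); the helper names are hypothetical.

```python
import numpy as np


def to_measurement(left, top, width, height):
    """Detection box (left, top, width, height) -> Z' = [p_u, p_v, gamma, h]."""
    return np.array([left + width / 2.0, top + height / 2.0,
                     width / height, float(height)])


def box_model(dt: float = 1.0):
    """Transition F and observation H for X' = [p_u, p_v, gamma, h, and their rates]."""
    F = np.eye(8)
    for i in range(4):
        F[i, i + 4] = dt    # each position/shape entry advances by its rate * dt
    H = np.eye(4, 8)        # Z' keeps the first four entries of X'
    return F, H
```

For a state `[320, 240, 0.5, 80, 4, -2, 0, 1]`, one prediction step `F @ X` moves the box center to (324, 238) and grows its height to 81 pixels, while `H @ X` recovers `[320, 240, 0.5, 80]`.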

[0087] S330: Matches the tracking box and detection box at the current moment, assigns a unique ID to the camera target obtained after matching, estimates the position and velocity information of the camera target, and forms a local track of the camera target.

[0088] In this embodiment of the application, the ground plane assumption method is used to obtain location information and estimate velocity information.
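A common form of the ground plane assumption projects the bottom-center pixel of a detection box onto a flat road. The sketch assumes a pinhole camera whose optical axis is parallel to the road, with intrinsics fx, fy, cx, cy from the calibration in [0054], and estimates velocity by a finite difference between consecutive frames; both the zero-pitch geometry and the finite-difference velocity are assumptions, not details given in the text.

```python
from typing import Tuple


def ground_plane_position(u: float, v: float, fx: float, fy: float,
                          cx: float, cy: float,
                          cam_height: float) -> Tuple[float, float]:
    """Bottom-centre pixel (u, v) of a box -> (lateral x, longitudinal y) in metres.

    Assumes a flat road and a camera whose optical axis is parallel to it;
    coordinates are relative to the camera, not yet to the rear axle.
    """
    if v <= cy:
        raise ValueError("pixel is at or above the horizon; no ground intersection")
    y = fy * cam_height / (v - cy)   # similar triangles in the vertical plane
    x = (u - cx) * y / fx            # back-project the horizontal offset
    return x, y


def relative_velocity(p_prev, p_curr, dt: float) -> Tuple[float, float]:
    """Finite-difference (v_x, v_y) from two consecutive ground positions."""
    return (p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt
```

With hypothetical intrinsics fx = fy = 1000, (cx, cy) = (640, 360) and a 1.2 m mounting height, a box bottom at pixel (700, 400) maps to roughly 1.8 m lateral and 30 m longitudinal.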

[0089] This step further includes:

[0090] S331: Calculate the Intersection over Union (IOU) of the detection boxes and the tracking boxes, and perform Hungarian matching using 1-IOU as the cost matrix value to obtain the associated targets, that is, the successfully matched tracking boxes and detection boxes; the appearance features of the detection boxes are saved and later used to compute the cosine distance between detection boxes and tracking boxes.
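The 1-IOU matching of S331 can be sketched with SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as the Hungarian method (SciPy's solver is a different algorithm with the same optimal result). Boxes are assumed to be `(x1, y1, x2, y2)` pixel corners, and the rejection threshold of 0.7 on the cost (IOU below 0.3) is an assumed gate, not a value from the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) pixel corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def iou_match(tracks, dets, max_cost=0.7):
    """Assignment on a 1 - IOU cost matrix.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    Pairs whose cost exceeds max_cost are rejected even if assigned.
    """
    if not tracks or not dets:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = np.array([[1.0 - iou(t, d) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if cost[r, c] <= max_cost]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    return (matches,
            [i for i in range(len(tracks)) if i not in matched_t],
            [j for j in range(len(dets)) if j not in matched_d])
```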

[0091] S332: Repeat sub-step S331 to perform continuous tracking. When the tracking count of a tracking box reaches the preset number N, the tracking box enters the confirmed state, and sub-step S333 is then executed.

[0092] S333: Perform cascade matching on the confirmed tracking boxes, calculate the distance metric between the tracking boxes and the detection boxes, generate a cost matrix based on the distance metric, and perform matching based on the cost matrix, prioritizing the matching of the tracking boxes with the fewest loss counts and the unmatched detection boxes; then, execute sub-step S334; here, the distance metric refers to the linear weighted sum of the Mahalanobis distance and cosine distance between the tracking boxes and the detection boxes.
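The cascade of S333 matches tracks level by level, starting with those with the fewest losses, so that recently seen tracks get first pick of the detections. A sketch under assumptions: the weight `lam` of the linear combination and the `max_cost` gate are hypothetical, and the combined cost matrix is assumed to be built beforehand from the two distance helpers shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mahalanobis_sq(z, mean, S):
    """Squared Mahalanobis distance of measurement z from a predicted box."""
    d = z - mean
    return float(d @ np.linalg.solve(S, d))


def cosine_distance(a, b):
    """1 - cosine similarity of two appearance feature vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def combined_cost(maha, cos_dist, lam=0.5):
    """Linear weighted sum of the two distances (lam is an assumed weight)."""
    return lam * maha + (1.0 - lam) * cos_dist


def matching_cascade(cost, loss_counts, max_loss, max_cost):
    """cost: (n_tracks, n_dets) combined distances; fewer losses match first."""
    unmatched_dets = list(range(cost.shape[1]))
    matches = []
    for level in range(max_loss + 1):
        if not unmatched_dets:
            break
        rows = [i for i, c in enumerate(loss_counts) if c == level]
        if not rows:
            continue
        sub = cost[np.ix_(rows, unmatched_dets)]
        r, c = linear_sum_assignment(sub)
        taken = set()
        for ri, ci in zip(r, c):
            if sub[ri, ci] <= max_cost:
                matches.append((rows[ri], unmatched_dets[ci]))
                taken.add(unmatched_dets[ci])
        unmatched_dets = [d for d in unmatched_dets if d not in taken]
    matched = {t for t, _ in matches}
    unmatched_tracks = [i for i in range(cost.shape[0]) if i not in matched]
    return matches, unmatched_tracks, unmatched_dets
```

The tracks and detections left over are what S334 then passes to the 1-IOU matching.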

[0093] S334: Calculate the Intersection over Union (IOU) of the unmatched tracking boxes and detection boxes, and perform Hungarian matching using 1-IOU as the cost matrix value; output the matched and unmatched tracking boxes and detection boxes; assign a unique ID to each resulting camera target, reusing the ID from the previous time step if the camera target already existed then; and estimate the position and velocity information of the camera target to form its local track.

[0094] In this embodiment, for a successfully matched detection box and tracking box, the tracking count of the tracking box is incremented by 1 and the tracking box is updated by Kalman filtering with the detection box; when the tracking count reaches the preset number N, the tracking box enters the confirmed state. For a tracking box that fails to match, its loss count is incremented by 1, and when the loss count reaches the loss threshold, the tracking box is deleted.
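The track lifecycle in [0094] is a small state machine driven by the tracking and loss counts. In the sketch below, `CameraTrack` is a hypothetical name, the Kalman update is left out, and resetting the loss count on a successful match is an assumption (the text only says the loss count grows on failed matches).

```python
from dataclasses import dataclass


@dataclass
class CameraTrack:
    track_id: int
    hits: int = 0          # tracking count
    losses: int = 0        # loss count
    confirmed: bool = False
    deleted: bool = False

    def mark_matched(self, n_confirm: int) -> None:
        """Matched to a detection; the Kalman update with that box is omitted here."""
        self.hits += 1
        self.losses = 0    # assumption: losses count consecutive misses only
        if self.hits >= n_confirm:
            self.confirmed = True

    def mark_missed(self, loss_threshold: int) -> None:
        """Failed to match in this frame; delete once the loss threshold is hit."""
        self.losses += 1
        if self.losses >= loss_threshold:
            self.deleted = True
```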

[0095] In this embodiment, a local track of the camera target at the current moment is formed by locally tracking the detection box (i.e., the camera target). The local track of the camera target in this embodiment includes the target category, longitudinal relative velocity, lateral relative velocity, longitudinal distance, lateral distance, confidence level, number of matches, number of misses, ID, and the pixel coordinates of the detection box.

[0096] S400: Aligns the radar and camera in time and space, then performs Hungarian matching on the radar effective targets and camera targets, fuses the local tracks of the successfully matched radar effective targets and camera targets to obtain the fused global trajectory, and uses the fused global trajectory to update the global trajectory of the previous moment.

[0097] In this embodiment, temporal alignment is performed based on the scanning cycle of the millimeter-wave radar, and spatial alignment is calibrated through rotation and translation transformation.

[0098] In this embodiment, Hungarian matching is performed using the combined difference between the position and velocity of the effective radar target and the camera target as the cost matrix. The combined difference e is calculated as follows:

[0099] $$e = \sigma e_{v_y} + \alpha e_y + \beta e_x \qquad (5)$$

[0100] In equation (5), σ, α, and β are proportionality coefficients, and $e_{v_y}$, $e_y$, and $e_x$ are the absolute differences in longitudinal relative velocity, longitudinal distance, and lateral distance, respectively, between the camera target and the radar effective target.
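Assuming equation (5) is the weighted sum e = σ·e_vy + α·e_y + β·e_x of the three absolute differences, the cost matrix for the camera-radar Hungarian matching could be built as below. The `Target` record, its field names and the default weights of 1.0 are hypothetical.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    d_lon: float   # longitudinal distance (m)
    d_lat: float   # lateral distance (m)
    v_lon: float   # longitudinal relative velocity (m/s)


def combined_difference(cam: Target, radar: Target,
                        sigma: float = 1.0, alpha: float = 1.0, beta: float = 1.0) -> float:
    """e = sigma*e_vy + alpha*e_y + beta*e_x (assumed form of equation (5))."""
    e_vy = abs(cam.v_lon - radar.v_lon)   # longitudinal relative velocity difference
    e_y = abs(cam.d_lon - radar.d_lon)    # longitudinal distance difference
    e_x = abs(cam.d_lat - radar.d_lat)    # lateral distance difference
    return sigma * e_vy + alpha * e_y + beta * e_x


def cost_matrix(cams: List[Target], radars: List[Target], **weights: float) -> List[List[float]]:
    """Rows are camera targets, columns are radar effective targets."""
    return [[combined_difference(c, r, **weights) for r in radars] for c in cams]
```

Because the three terms have different units, the coefficients also act as unit normalizers; their values would have to be tuned for a given sensor setup.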

[0101] With the comprehensive difference e as the cost matrix, Hungarian matching is performed between the camera targets and the radar effective targets to obtain their matching relationship. Each successfully matched camera target and radar effective target are fused to form a global target. If the number of times the global target has been tracked reaches a preset threshold, stable tracking is considered established; otherwise, it is not.

[0102] Before stable tracking is established, the comprehensive difference between the current camera target and every radar target is calculated, and Hungarian matching is performed to obtain the matching result. Once stable tracking of the global target is established, if the IDs of the global target's camera target and radar effective target can both be found among the current local camera and radar targets, the comprehensive difference between that camera target and that radar target is calculated, while the comprehensive difference between the camera target and every other radar target is set to a large constant, so that the camera target and its radar target are always matched. At the same time, thresholds are set on the differences in longitudinal distance, lateral distance, and longitudinal relative velocity between the camera and radar targets. If a difference exceeds its threshold n consecutive times, the tracking is terminated, the comprehensive difference between the camera target and all radar targets is recalculated, and Hungarian matching is performed again, which prevents the wrong target from being tracked continuously.
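The locking and release rules above might look like the sketch below. `LOCK_COST` stands in for the constant the text mentions, chosen large so that a locked camera target can only be assigned to its own radar target. Treating a frame as "exceeded" when any one of the three differences is over its threshold, as well as all names and limits, are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

LOCK_COST = 1e6  # hypothetical stand-in for the constant; must exceed any realistic e


def lock_cost_rows(cost: List[List[float]], locks: Dict[int, int]) -> List[List[float]]:
    """For each stably tracked pair (camera row -> radar column), keep the real cost
    only for the paired radar target and set the rest of that row to LOCK_COST."""
    out = [row[:] for row in cost]
    for cam_idx, radar_idx in locks.items():
        out[cam_idx] = [c if j == radar_idx else LOCK_COST for j, c in enumerate(cost[cam_idx])]
    return out


@dataclass
class DivergenceMonitor:
    """Counts consecutive frames in which a locked pair disagrees beyond its thresholds."""
    max_e_y: float     # longitudinal distance threshold
    max_e_x: float     # lateral distance threshold
    max_e_vy: float    # longitudinal relative velocity threshold
    n: int             # consecutive exceedances that end the lock
    streak: int = 0

    def should_release(self, e_y: float, e_x: float, e_vy: float) -> bool:
        over = e_y > self.max_e_y or e_x > self.max_e_x or e_vy > self.max_e_vy
        self.streak = self.streak + 1 if over else 0   # any in-range frame resets the streak
        return self.streak >= self.n
```

When `should_release` returns True, the lock for that camera row would be dropped and the full, unlocked cost matrix rebuilt for a fresh Hungarian matching.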

[0103] After the initial matching is completed, the position and velocity information of the successfully matched targets is fused to form the fused global track for the current moment, and the IDs of the successfully matched camera targets and radar effective targets are saved. At the next moment, after the IDs of the newly matched camera targets and radar effective targets are obtained, they are compared with the camera target and radar effective target IDs saved in the global track of the previous moment. Four cases arise, as shown in Figure 3:

[0104] (1) If the camera target ID and the radar effective target ID saved in the global track of the previous moment both correspond to those in the current fused global track, and the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold, the match is successful: the global track of the previous moment is predicted and updated by Kalman filtering using the current fused global track, and the tracking count is incremented by 1.

[0105] (2) If the camera target ID saved in the global track of the previous moment corresponds to the camera target ID in the current fused global track but the radar effective target ID does not, the radar effective target ID in the previous global track is updated to the radar effective target ID in the current fused global track. It is then checked whether the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold; if so, the match is successful, the global track of the previous moment is still predicted and updated by Kalman filtering using the current fused global track, and the tracking count is incremented by 1.

[0106] (3) If the camera target ID saved in the global track of the previous moment does not correspond to the camera target ID in the current fused global track but the radar effective target ID does, the camera target ID in the previous global track is updated to the camera target ID in the current fused global track. It is then checked whether the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold; if so, the match is successful, the global track of the previous moment is predicted and updated by Kalman filtering using the current fused global track, and the tracking count is incremented by 1.

[0107] (4) If neither the camera target ID nor the radar effective target ID saved in the previous global track corresponds to those in the current fused global track, the current fused global track is taken as a new global track and its tracking count is set to 0.
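The four ID-correspondence cases can be condensed into one decision function. `GlobalTrack` and `associate` are hypothetical names; the function only reports whether a Kalman prediction/update should follow. Its behaviour when the threshold check fails in cases (1) to (3) (IDs updated, count unchanged, no filter update) is an assumption, since the text does not specify that branch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GlobalTrack:
    cam_id: int      # camera target ID saved in the global track
    radar_id: int    # radar effective target ID saved in the global track
    count: int = 0   # tracking count


def associate(prev: GlobalTrack, cam_id: int, radar_id: int,
              e: float, threshold: float) -> Tuple[GlobalTrack, bool]:
    """Apply cases (1)-(4) to one previous global track and one current fused pair.
    Returns the resulting global track and whether a Kalman predict/update should run."""
    cam_same = prev.cam_id == cam_id
    radar_same = prev.radar_id == radar_id
    if not cam_same and not radar_same:
        # case (4): start a new global track with tracking count 0
        return GlobalTrack(cam_id, radar_id, 0), False
    # cases (2)/(3): adopt the current ID of whichever sensor changed (no-op in case (1))
    prev.cam_id, prev.radar_id = cam_id, radar_id
    if e <= threshold:
        prev.count += 1
        return prev, True
    # threshold check failed: not specified in the text (assumed: no filter update)
    return prev, False
```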

[0108] In this embodiment of the application, the global trajectory at the previous time step is predicted and updated using the Kalman filter method, specifically as follows:

[0109] The fused global track and the global track correspond to the global target at different times. A uniform velocity model is used to describe the motion state of the global target, with $X_{k+1}$ denoting the state vector of the global target at time k+1. Kalman filtering predicts the global target at time k to obtain its state at time k+1. The global track information fused from the camera and radar (including at least longitudinal distance, lateral distance, longitudinal relative velocity, and lateral relative velocity) is used as the observation vector $Z_{k+1}$, which represents the observation of the global target at time k+1. Taking $Z_{k+1}$ as input, the state vector of the global target at time k+1 is updated by Kalman filtering, which completes the update of the global track.
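The prediction/update step can be sketched with a uniform (constant) velocity Kalman filter. To stay dependency-free, this sketch assumes the longitudinal and lateral axes are independent, so each axis runs its own two-state filter [position, velocity] with a white-noise-acceleration process model and an identity observation matrix (the fused track observes both distance and relative velocity). The decoupling, the noise values and the names `AxisKF` and `step_global_target` are assumptions, not parameters from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one axis with state [position, velocity]."""
    x: List[float]            # state estimate [p, v]
    P: List[List[float]]      # 2x2 state covariance
    q: float = 1.0            # hypothetical process-noise intensity (white acceleration)
    r_p: float = 0.5          # hypothetical variance of the fused distance
    r_v: float = 0.5          # hypothetical variance of the fused relative velocity

    def predict(self, T: float) -> None:
        """x <- F x and P <- F P F^T + Q with F = [[1, T], [0, 1]]."""
        p, v = self.x
        self.x = [p + T * v, v]
        (a, b), (c, d) = self.P
        q = self.q
        self.P = [[a + T * (b + c) + T * T * d + q * T ** 3 / 3, b + T * d + q * T ** 2 / 2],
                  [c + T * d + q * T ** 2 / 2, d + q * T]]

    def update(self, z_p: float, z_v: float) -> None:
        """Update with observation [z_p, z_v]; H is the identity."""
        (a, b), (c, d) = self.P
        s11, s12, s21, s22 = a + self.r_p, b, c, d + self.r_v              # S = P + R
        det = s11 * s22 - s12 * s21
        i11, i12, i21, i22 = s22 / det, -s12 / det, -s21 / det, s11 / det  # S^-1
        k11, k12 = a * i11 + b * i21, a * i12 + b * i22                    # K = P S^-1
        k21, k22 = c * i11 + d * i21, c * i12 + d * i22
        y1, y2 = z_p - self.x[0], z_v - self.x[1]                          # innovation
        self.x = [self.x[0] + k11 * y1 + k12 * y2, self.x[1] + k21 * y1 + k22 * y2]
        self.P = [[(1 - k11) * a - k12 * c, (1 - k11) * b - k12 * d],      # P = (I - K) P
                  [-k21 * a + (1 - k22) * c, -k21 * b + (1 - k22) * d]]


def step_global_target(lon: AxisKF, lat: AxisKF, T: float,
                       d_lon: float, d_lat: float, v_lon: float, v_lat: float) -> None:
    """Predict the global target over one radar cycle T, then update with the fused track."""
    for kf, z_p, z_v in ((lon, d_lon, v_lon), (lat, d_lat, v_lat)):
        kf.predict(T)
        kf.update(z_p, z_v)
```

A full four-state filter with a coupled covariance would be the more general choice; the per-axis split simply keeps the matrix algebra to 2x2 inverses.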

[0110] The global track combines information from the radar and the camera. Once a target is stably tracked, it is still considered to exist even if the camera or the radar temporarily fails to detect it, as long as one of the two sensors still detects it.

[0111] Note that the above are merely preferred embodiments and the technical principles employed in this application. Those skilled in the art will understand that this application is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of this application. Therefore, although this application has been described in detail through the above embodiments, this application is not limited to the above embodiments. Many other equivalent embodiments may be included without departing from the concept of this application, all of which fall within the scope of protection of this application.

Claims

1. A multi-sensor fusion method for vehicle target tracking, characterized in that it comprises: S100: acquiring millimeter-wave radar data and image data of the vehicle's driving environment; S200: extracting radar effective targets and their position and velocity information from the radar data, and assigning a unique ID to each extracted radar effective target; then predicting and updating the radar effective target states to obtain the local tracks of the radar effective targets in the current period; S300: detecting camera targets from the image data, tracking the camera targets and assigning them unique IDs, estimating the position and velocity information of the camera targets, and obtaining the local tracks of the camera targets at the current moment; S400: aligning the radar and camera in time and space, then performing Hungarian matching between the radar effective targets and the camera targets, fusing the local tracks of each successfully matched radar effective target and camera target to obtain the fused global track, and using the fused global track to update the global track of the previous moment. Step S400 further includes: using the combined difference in position and velocity between the radar effective targets and the camera targets as the cost matrix, performing Hungarian matching between the radar effective targets and the camera targets to obtain their matching relationship; each successfully matched camera target and radar effective target are fused to form a global target.
After the initial matching is completed, the position and velocity information of the successfully matched targets is fused to form the global track information for the current moment, and the IDs of the successfully matched radar and camera targets are saved; at the next moment, the following is executed: if the camera target ID and radar effective target ID saved in the global track of the previous moment correspond to the camera target ID and radar effective target ID in the current fused global track, and the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold, the global track of the previous moment is predicted and updated by Kalman filtering based on the current fused global track, and the tracking count is incremented by 1. If the camera target ID saved in the global track of the previous moment corresponds to the camera target ID in the current fused global track but the radar effective target ID does not, the radar effective target ID in the global track of the previous moment is updated to the radar effective target ID in the current fused global track; at the same time, it is determined whether the comprehensive difference between the radar effective target and the camera target is not greater than the set threshold. If so, the match is successful, the global track of the previous moment is predicted and updated by Kalman filtering using the current fused global track, and the tracking count is incremented by 1.
If the camera target ID saved in the global track of the previous moment does not correspond to the camera target ID in the current fused global track but the radar effective target ID does, the camera target ID in the global track of the previous moment is updated to the camera target ID in the current fused global track; at the same time, it is determined whether the comprehensive difference between the radar effective target and the camera target is not greater than a set threshold. If so, the match is successful, the global track of the previous moment is predicted and updated using the current fused global track, and the tracking count is incremented by 1. If neither the camera target ID nor the radar effective target ID saved in the previous global track corresponds to those in the current fused global track, the current fused global track is taken as a new global track and its tracking count is set to 0. The Kalman filtering prediction and update of the global track of the previous moment using the current fused global track proceeds as follows: the fused global track and the global track correspond to the global target at different times. A uniform velocity model, whose state includes the position and velocity information of the global target, is used to describe the motion state of the global target. The global target at the previous time is predicted by Kalman filtering to obtain its state at the current time. The fused global track, obtained by fusing the local tracks of the successfully matched radar effective target and camera target, is used as the observation vector, and the state vector of the global target is updated by Kalman filtering, which completes the update of the global track.

2. The multi-sensor fusion vehicle target tracking method as described in claim 1, characterized in that: Step S300 further includes: S310: detecting camera targets from the image data to obtain detection boxes; S320: using the detection box state as input, predicting the tracking box of the detection box at the next time step, where the detection box state includes at least the position features and shape features of the detection box and the rates of change of the position and shape features in the image; S330: matching the tracking boxes and detection boxes at the current moment, assigning a unique ID to each camera target obtained after matching, estimating the position and velocity information of the camera target, and forming the local track of the camera target.

3. The multi-sensor fusion vehicle target tracking method as described in claim 2, characterized in that: The detection box position feature includes the center position of the detection box; the shape feature includes the aspect ratio and height of the detection box; the rate of change of the position feature and shape feature in the image refers to the rate of change of the center position, aspect ratio and height of the detection box relative to the previous moment.

4. The multi-sensor fusion vehicle target tracking method as described in claim 2, characterized in that: Step S330 further includes: S331: calculating the Intersection over Union (IOU) of the detection box and the tracking box, and performing Hungarian matching using 1-IOU as the value of the cost matrix to obtain the successfully matched tracking box and detection box; S332: repeating sub-step S331 to perform continuous tracking; when the tracking count reaches the preset number N, the tracking box enters the confirmed state, and sub-step S333 is then executed; S333: performing cascade matching on the confirmed tracking boxes: calculating the distance metric between the tracking boxes and the detection boxes, generating a cost matrix based on the distance metric, and matching based on the cost matrix, with the tracking boxes having the fewest loss counts matched first against the unmatched detection boxes; then executing sub-step S334; the distance metric is the linear weighted sum of the Mahalanobis distance and the cosine distance between the tracking boxes and the detection boxes; S334: calculating the IOU of the unmatched tracking boxes and detection boxes, performing Hungarian matching using 1-IOU as the cost matrix value, outputting the matched and unmatched tracking boxes and detection boxes, assigning a unique ID to each resulting camera target (if the camera target already existed at the previous time step, its ID from the previous time step is used), estimating the position and velocity information of the camera target, and forming the local track of the camera target. In the above process, for successfully matched detection boxes and tracking boxes, the tracking count of the tracking box is incremented by 1 and the tracking box is updated by Kalman filtering with the detection box; for tracking boxes that fail to match, the loss count is incremented by 1, and when the loss count reaches the loss threshold, the tracking box is deleted.

5. The multi-sensor fusion vehicle target tracking method as described in claim 1, characterized in that: in step S400, the temporal alignment of the radar and camera is performed based on the scanning cycle of the millimeter-wave radar.

6. A storage medium, characterized in that: The storage medium stores a computer program that, when executed by a processor, implements the method as described in any one of claims 1 to 5.
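Claims 2 to 4 describe a DeepSORT-style camera tracker. The sketch below illustrates two of its ingredients under stated assumptions: the detection box state of claim 3 (centre, aspect ratio, height and their rates of change, computed here as finite differences over a hypothetical `dt` from corner-format boxes) and the S333 distance metric of claim 4. For the metric it assumes a diagonal covariance, uses the squared Mahalanobis distance as DeepSORT does, assumes appearance feature vectors for the cosine term, and introduces a hypothetical weight `lam`; `cascade_order` only shows the "fewest loss counts first" ordering, not the full cascade.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed input format


def to_cxcyah(box: Box) -> Tuple[float, float, float, float]:
    """Centre (cx, cy), aspect ratio a = width / height, and height h."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + w / 2, y1 + h / 2, w / h, h)


def box_state(prev: Box, cur: Box, dt: float = 1.0) -> List[float]:
    """[cx, cy, a, h, dcx, dcy, da, dh]: current features plus their rates of change
    relative to the previous moment (finite differences over dt, an assumption)."""
    p, c = to_cxcyah(prev), to_cxcyah(cur)
    return list(c) + [(ci - pi) / dt for ci, pi in zip(c, p)]


def mahalanobis_sq_diag(z: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Squared Mahalanobis distance assuming a diagonal covariance (variances in var)."""
    return sum((zi - mi) ** 2 / vi for zi, mi, vi in zip(z, mean, var))


def cosine_distance(f: Sequence[float], g: Sequence[float]) -> float:
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(a * b for a, b in zip(f, g))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in g)))


def combined_distance(z, mean, var, f, g, lam: float = 0.5) -> float:
    """Linear weighted sum lam * Mahalanobis + (1 - lam) * cosine (lam is hypothetical)."""
    return lam * mahalanobis_sq_diag(z, mean, var) + (1.0 - lam) * cosine_distance(f, g)


def cascade_order(loss_counts: List[int]) -> List[int]:
    """Indices of confirmed tracking boxes, fewest loss counts first."""
    return sorted(range(len(loss_counts)), key=lambda i: loss_counts[i])
```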