A target tracking method, device and electronic equipment
By combining FMCW lidar with a video camera, a cost matrix is constructed, and Kalman filtering and Hungarian algorithms are used for target tracking. This solves the problem of low accuracy in tracking pedestrians and non-motorized vehicles in existing technologies and improves the accuracy of small target identification and tracking.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2023-02-09
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, target tracking methods based on TOF lidar and video cameras have good detection performance for large targets such as vehicles, but their accuracy in tracking pedestrians and non-motorized vehicles is low. Problems such as the tracking object disappearing without reason and ID jumping often occur, especially in urban environments, leading to a high proportion of pedestrian injuries and deaths in traffic accidents.
By combining FMCW lidar with a video camera, the cost matrix of the Hungarian algorithm is constructed by acquiring the velocity, acceleration, micro-Doppler features and image category of the target object. This matrix is then combined with the Kalman filter algorithm for target tracking, thereby improving the tracking accuracy of small targets.
It improves the tracking accuracy of pedestrians and non-motorized vehicles, reduces the ID jump rate, and enhances the ability to identify and track small targets, especially the ability to distinguish between pedestrians and non-motorized vehicles in urban environments.
Figure CN116338713B_ABST
Description
Technical Field
[0001] This invention relates to the field of intelligent transportation technology, and in particular to a target tracking method, device, and electronic device.
Background Technology
[0002] In intelligent transportation, multi-sensor fusion is commonly used for target detection, including fusing data from LiDAR and video cameras. In existing technologies, whether vehicle-mounted or roadside, data from TOF (Time of Flight) LiDAR and video cameras are used as input to a deep learning model that learns to identify targets.
[0003] Target tracking methods based on TOF lidar and video cameras perform well in detecting large targets such as vehicles, but their tracking performance is poor for pedestrians and non-motorized vehicles. Problems frequently arise, such as the tracking object disappearing without explanation or the same object exhibiting inconsistent IDs (ID jumps). Pedestrians and non-motorized vehicles are important but vulnerable road users, especially in urban environments. Traffic accidents involving pedestrians and non-motorized vehicles often result in serious injuries or deaths, with pedestrians accounting for over 20% of traffic fatalities annually. To reduce such injuries and deaths, we need more accurate tracking of pedestrians and non-motorized vehicles on the roadside for more accurate prediction and traffic guidance.
Summary of the Invention
[0004] This invention provides a target tracking method, apparatus, and electronic device to solve the technical problem of low tracking accuracy of pedestrians and / or non-motorized vehicles on roads in the prior art.
[0005] In a first aspect, the present invention provides a target tracking method applied to a roadside perception system, the roadside perception system comprising at least one set of FMCW lidar and video cameras, wherein the FMCW lidar and the video cameras are disposed on the roadside and their perception areas overlap, the method comprising:
[0006] Traffic object identification is performed on the current perception data collected by the FMCW lidar and the video camera respectively to obtain target objects with a size smaller than a size threshold;
[0007] Based on the current sensing data of the FMCW lidar, the target velocity, target acceleration, target micro-Doppler features, and target position of the target object are obtained. The target micro-Doppler features include the torso Doppler frequency and the total bandwidth of the Doppler signal.
[0008] The image category of the target object is obtained based on the current perception data of the video camera;
[0009] The cost matrix of the Hungarian algorithm is constructed based on the target velocity, the target acceleration, the target micro-Doppler features, the target position, and the image category.
[0010] The target object is tracked using the Kalman filter algorithm and the Hungarian algorithm.
[0011] Optionally, the target micro-Doppler features may also include: trunk Doppler bandwidth and limb movement cycle.
[0012] Optionally, a cost matrix for the Hungarian algorithm is constructed based on the target velocity, the target acceleration, the target micro-Doppler features, the target position, and the image category, including:
[0013] The Mahalanobis distance between the predicted and detected values of the target object is calculated based on the target velocity, the target acceleration, and the target position.
[0014] Calculate the cosine distance between the predicted attribute and the detected attribute of the target object based on the target micro-Doppler features and the image category;
[0015] The cost matrix is constructed based on the Mahalanobis distance and the cosine distance.
[0016] Optionally, the tracking of the target object using the Kalman filter algorithm and the Hungarian algorithm includes:
[0017] Based on the tracking trajectory preceding the current perception data, the predicted value and the predicted attribute are calculated using the Kalman filter algorithm.
[0018] Based on the cost matrix, the target object and the tracking trajectory are matched using the Hungarian algorithm, and tracking is performed based on the matching results.
[0019] Optionally, the matching of the target object and the tracking trajectory based on the cost matrix using the Hungarian algorithm includes:
[0020] The Mahalanobis distance and the cosine distance are weighted and summed based on preset hyperparameters;
[0021] Determine whether the sum obtained by the weighted summation is greater than the first matching threshold;
[0022] If the sum is greater than the first matching threshold, it is confirmed that the target object matches the tracking trajectory; if the sum is not greater than the first matching threshold, it is determined whether the cosine distance is greater than the second matching threshold, where the second matching threshold is greater than the first matching threshold.
[0023] If the cosine distance is greater than the second matching threshold, the target object is confirmed to match the tracking trajectory.
[0024] In a second aspect, the present invention provides a target tracking device applied to a roadside perception system, the roadside perception system including at least one set of FMCW lidar and video camera, wherein the FMCW lidar and the video camera are disposed on the roadside and their perception areas overlap, the device comprising:
[0025] The identification unit is used to identify traffic objects from the current perception data collected by the FMCW lidar and the video camera, respectively, and to obtain target objects with a size smaller than a size threshold.
[0026] The feature extraction unit is used to obtain the target velocity, target acceleration, target micro-Doppler features and target position of the target object based on the current perception data of the FMCW lidar. The target micro-Doppler features include the torso Doppler frequency and the total bandwidth of the Doppler signal.
[0027] The identification unit is also used to obtain the image category of the target object based on the current perception data of the video camera;
[0028] A construction unit is used to construct the cost matrix of the Hungarian algorithm based on the target velocity, the target acceleration, the target micro-Doppler features, the target position, and the image category;
[0029] The tracking unit is used to track the target object using the Kalman filter algorithm and the Hungarian algorithm.
[0030] Optionally, the target micro-Doppler features may also include: trunk Doppler bandwidth and limb movement cycle.
[0031] Optionally, the construction unit is specifically used for:
[0032] The Mahalanobis distance between the predicted and detected values of the target object is calculated based on the target velocity, the target acceleration, and the target position.
[0033] Calculate the cosine distance between the predicted attribute and the detected attribute of the target object based on the target micro-Doppler features and the image category;
[0034] The cost matrix is constructed based on the Mahalanobis distance and the cosine distance.
[0035] In a third aspect, the present invention provides an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors to implement any of the methods described in the first aspect.
[0036] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements any of the methods described in the first aspect.
[0037] The above-described one or more technical solutions of this invention have at least the following technical effects:
[0038] This application provides a target tracking method that uses an FMCW lidar and a video camera as roadside perception devices. For target objects smaller than a size threshold, the method extracts the target object's velocity, acceleration, and position information, as well as its micro-Doppler features, from the FMCW lidar's perception data. From the video camera's perception data, the method obtains the target object's image category. The cost matrix of the Hungarian algorithm is constructed based on the target object's velocity, acceleration, position, micro-Doppler features, and image category, thus reconstructing the Hungarian algorithm's input so that the features used for matching are closer to the target object itself. The target object is then tracked using the Kalman filter algorithm and the Hungarian algorithm with the reconstructed input, significantly improving the accuracy of target object tracking and solving the technical problem of low accuracy in pedestrian and / or non-motorized vehicle detection and tracking in existing technologies. Furthermore, this embodiment also distinguishes between different pedestrians, and between pedestrians and non-motorized vehicles, by using the torso Doppler frequency and the total bandwidth of the Doppler signal, further improving the tracking accuracy of small targets on the road.
Attached Figure Description
[0039] Figure 1 is a schematic diagram of a roadside perception system provided in an embodiment of this application;
[0040] Figure 2 is a flowchart illustrating a target tracking method provided in an embodiment of this application;
[0041] Figure 3 is a schematic diagram of a target tracking device provided in an embodiment of this application;
[0042] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0043] Before introducing the embodiments of this disclosure, it should be noted that:
[0044] Some embodiments of this disclosure are described as processing flows. Although the various operational steps of the flow may be numbered sequentially, the operational steps may be performed in parallel, concurrently, or simultaneously.
[0045] The term “and / or” may be used in embodiments of this disclosure, and “and / or” includes any and all combinations of one or more of the associated features listed.
[0046] It should be understood that when describing the connection or communication relationship between two components, unless it is explicitly stated that the two components are directly connected or communicate directly, the connection or communication between the two components can be understood as a direct connection or communication, or it can be understood as an indirect connection or communication through an intermediate component.
[0047] To make the technical solutions and advantages of the embodiments of this disclosure clearer, the exemplary embodiments of this disclosure will be described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not an exhaustive list of all embodiments. It should be noted that, unless otherwise specified, the embodiments and features in the embodiments of this disclosure can be combined with each other.
[0048] Example 1
[0049] Referring to Figure 1, this embodiment provides a roadside perception system, including: an FMCW (Frequency Modulated Continuous Wave) lidar, a video camera, a roadside computing module, and a perception result data communication module. The roadside computing module can be an edge computing device, and the perception result data communication module can be a 5G base station, an IoT base station, or another device capable of data communication. The FMCW lidar and video camera are mounted on a bracket on the roadside. There can be n groups of FMCW lidar and video cameras, where n≥1. Each group includes at least one FMCW lidar and at least one video camera whose perception areas completely or partially overlap, so that the data collected by each group can be fused for detection.
[0050] FMCW lidar, without considering target Doppler frequency shift, exhibits consistent beat frequencies between the emitted and echo signals for stationary targets at the same radial distance. However, the beat frequencies differ at different radial distances, and the beat frequency signals also differ if the target exhibits Doppler frequency shift. Therefore, FMCW lidar is highly sensitive to target distance and velocity parameters. The inventors have discovered that small targets, such as pedestrians and non-motorized vehicles, possess significant micro-Doppler characteristics. However, pedestrians and / or non-motorized vehicles on roads have more variable positions and significantly lower resolution compared to vehicles, resulting in poorer target tracking accuracy. Therefore, this embodiment utilizes FMCW lidar to collect data from the sensing area and detect small targets to be tracked. It acquires the small targets' velocity, acceleration, position, and micro-Doppler characteristics, and then combines this with image categories provided by a video camera for target tracking. By adding velocity, acceleration, micro-Doppler characteristics, and image categories to the small targets, the feature dimension of the cost matrix is increased, thereby improving the accuracy of Hungarian matching and ultimately enhancing target tracking accuracy.
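As a reference for the sensitivity described above, the standard textbook relations for a linear FMCW chirp can be written as follows. The symbols here are illustrative assumptions that do not appear in the original text: $B$ is the chirp bandwidth, $T_c$ the chirp duration, $c$ the speed of light, $R$ the radial distance, $v$ the radial velocity, and $\lambda$ the optical wavelength.

$$
f_r = \frac{2 B R}{c\,T_c}, \qquad f_D = \frac{2 v}{\lambda}
$$

For a stationary target the beat frequency equals $f_r$ and depends only on $R$; for a moving target the measured beat frequency is offset from $f_r$ by $f_D$, with the sign depending on the chirp direction and the velocity sign convention.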
[0051] Based on the roadside perception system described above, this embodiment provides a target tracking method. Referring to Figure 2, the method includes:
[0052] S210. Traffic object identification is performed on the current perception data collected by the FMCW lidar and video camera respectively to obtain target objects with a size smaller than the size threshold.
[0053] S220. Based on the current perception data of the FMCW lidar, acquire the target velocity, target acceleration, target micro-Doppler features, and target position of the target object;
[0054] S230. Obtain the image category of the target object based on the current perception data of the video camera;
[0055] S240. Construct the cost matrix of the Hungarian algorithm based on target velocity, target acceleration, target micro-Doppler features, target position, and image category;
[0056] S250. The target object is tracked using the Kalman filter algorithm and the Hungarian algorithm.
[0057] In practical applications, both point cloud-based and image-based object detection have poor accuracy in identifying pedestrians and non-motorized vehicles, especially at a distance, often resulting in false detections, misidentifying non-motorized vehicles as pedestrians or vice versa. To reduce the impact of category detection on target object tracking, S210 selects small targets for tracking based on the size of the traffic object, rather than directly selecting the tracking object based on the object recognition type. Furthermore, the tracking trajectory can be used to further confirm whether the traffic object is a pedestrian or a non-motorized vehicle. The size threshold involved in S210 can be set based on empirical data to ensure that the selected target objects include pedestrians and non-motorized vehicles while excluding motorized vehicles and other interfering objects.
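The size-based selection in S210 can be sketched as follows. This is a minimal illustration only: the `Detection` fields, the use of the largest bounding-box edge as the "size", and the 2.5 m threshold are all hypothetical, since the text only states that the threshold is set from empirical data.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected traffic object; all fields are hypothetical."""
    obj_id: int
    length_m: float
    width_m: float
    height_m: float


# Hypothetical threshold on the largest bounding-box edge, in metres.
SIZE_THRESHOLD_M = 2.5


def select_small_targets(detections, threshold=SIZE_THRESHOLD_M):
    """Keep objects whose largest dimension is below the threshold,
    regardless of the category assigned by the detector."""
    return [d for d in detections
            if max(d.length_m, d.width_m, d.height_m) < threshold]


detections = [
    Detection(1, 0.6, 0.5, 1.7),   # pedestrian-sized
    Detection(2, 1.8, 0.6, 1.6),   # bicycle-sized
    Detection(3, 4.5, 1.8, 1.5),   # car-sized
]
small = select_small_targets(detections)
```

Because the filter ignores the detector's category label, a bicycle misclassified as a pedestrian (or vice versa) is still passed on to tracking.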
[0058] Following S210, S220 and S230 are executed to acquire tracking parameters. Two types of information can be extracted from the current sensing data acquired by the FMCW lidar: Doppler information and point cloud information. S220 further processes the Doppler information to obtain the target velocity, target acceleration, and target micro-Doppler features. The point cloud is voxelized to obtain 3D point cloud voxel information, which is then used for target recognition to obtain the target position.
[0059] ① Target velocity and target acceleration
[0060] Specifically, S220 can generate a range-Doppler map based on the extracted Doppler information, and obtain the target velocity and target acceleration of the target object based on the range-Doppler map. The range-Doppler map describes the distance and velocity of the target object relative to the radar in the radar data frame. It can be obtained by sequentially performing range-dimension windowing, FFT, Doppler-dimension windowing, and FFT on the radar data frame. First, to reduce spectral leakage, windowing is performed on the range dimension of the radar data frame, followed by FFT processing on the range dimension, thus obtaining the target object's range information. Next, windowing and then FFT processing are performed on the Doppler dimension, thus obtaining the target object's velocity information and finally yielding the range-Doppler map.
[0061] Assume that the radar data frame is represented by the following formula:
[0062]
[0063] Where L represents the number of chirps in a radar data frame, K represents the number of sampling points per chirp, $A_m$ represents the amplitude of the signal reflected back from the target, $f_r$ represents the range frequency received from the target, $f_D$ represents the Doppler frequency caused by the radial velocity of the target, and j represents the imaginary unit.
[0064] The final range-Doppler map obtained through range-dimension windowing, FFT, Doppler-dimension windowing, and FFT can be represented as:
[0065]
[0066] Based on the range-Doppler image, the distance and velocity of the target in front of the radar corresponding to the peak position of the two-dimensional FFT can be obtained, and then the acceleration can be obtained based on the distance, velocity and time interval.
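The windowing and two-stage FFT described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the Hann window, the frame size, and the synthetic single-target signal (a complex exponential with integer range and Doppler bins) are all assumed, and converting bin indices to metres and metres per second would additionally require the lidar's chirp parameters, which the text does not give.

```python
import numpy as np


def range_doppler_map(frame):
    """frame: complex array of shape (L chirps, K samples per chirp)."""
    n_chirps, n_samples = frame.shape
    # Range dimension: window each chirp, then FFT along the sample axis.
    range_win = np.hanning(n_samples)
    range_fft = np.fft.fft(frame * range_win[None, :], axis=1)
    # Doppler dimension: window across chirps, then FFT along the chirp axis.
    doppler_win = np.hanning(n_chirps)
    rd = np.fft.fft(range_fft * doppler_win[:, None], axis=0)
    # Shift so that zero Doppler sits in the middle of the Doppler axis.
    return np.fft.fftshift(rd, axes=0)


def acceleration(v_now, v_prev, dt):
    """Finite-difference acceleration from two velocity estimates."""
    return (v_now - v_prev) / dt


# Synthetic single-target frame (assumed signal model).
L, K = 64, 128
range_bin, doppler_bin = 20, 5
l = np.arange(L)[:, None]
k = np.arange(K)[None, :]
frame = np.exp(2j * np.pi * (range_bin * k / K + doppler_bin * l / L))

rd = np.abs(range_doppler_map(frame))
peak_doppler, peak_range = np.unravel_index(np.argmax(rd), rd.shape)
peak_doppler = int(peak_doppler) - L // 2   # undo fftshift: signed Doppler bin
peak_range = int(peak_range)
```

The peak of the two-dimensional FFT magnitude recovers the range and Doppler bins that were used to synthesize the frame.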
[0067] ② Target micro-Doppler characteristics
[0068] S220 can generate a time-Doppler spectrum map based on the constructed range-Doppler map, then obtain the signal statistical features related to the small target in the time-Doppler spectrum map, and use these signal statistical features as the target micro-Doppler features.
[0069] Each element in the range-Doppler map is called a "range cell". For each range-Doppler map, the range cells are summed along the range axis to form a vector e. Combining n temporally consecutive frames forms a time-Doppler spectrogram E with a time length of n frames. To obtain the signal statistical characteristics related to small targets, this embodiment defines a time window $T_w$ containing Z temporally consecutive frames. Based on this time window, the following four signal statistical characteristics are obtained.
[0070] (1) Trunk Doppler frequency x1, used to characterize the trunk activity characteristics of pedestrians. Trunk activity characteristics such as speed are very basic but important information; trunk speed varies greatly among different pedestrians and for different activities. The trunk Doppler frequency x1 can be expressed as follows:
[0071]
[0072] Among them, dopplerArgmax($e_i$) represents the Doppler frequency shift corresponding to the maximum signal strength of vector $e_i$, $V_i$ represents the speed of torso movement in a frame, and $\lambda$ represents the wavelength.
[0073] (2) The total bandwidth of the Doppler signal, x2, is used to characterize the overall activity characteristics of the human body. This characteristic is related to the movement speed of the human limbs and can be calculated using the following formula:
[0074] $x_2 = \max(\mathrm{upperEnv}(E_{T_w})) - \min(\mathrm{lowerEnv}(E_{T_w}))$
[0075] Among them, $E_{T_w}$ is the time-Doppler spectrum within a time window of size $T_w$, and upperEnv($E_{T_w}$) and lowerEnv($E_{T_w}$) represent extracting the upper envelope and the lower envelope within the time window, respectively.
[0076] (3) The trunk Doppler bandwidth x3 represents the Doppler bandwidth excluding the micro-Doppler effect, which can be expressed as:
[0077] $x_3 = \mathrm{avg}(\mathrm{upperEnv}(E_{T_w})) - \mathrm{avg}(\mathrm{lowerEnv}(E_{T_w}))$
[0078] (4) The limb movement cycle x4, which corresponds to the swing rate of the pedestrian's arms and legs, can be expressed as:
[0079]
[0080] The torso Doppler frequency, total bandwidth of the Doppler signal, torso Doppler bandwidth, and limb movement period of the target object are obtained by statistical analysis of the time-Doppler spectrogram, and one or more of these features are used as the micro-Doppler features of the target object for tracking and matching small targets. The torso Doppler frequency effectively distinguishes between different pedestrians, while the total bandwidth of the Doppler signal effectively distinguishes between pedestrians and non-motorized vehicles, improving matching accuracy. Furthermore, the torso Doppler bandwidth and limb movement period allow for more precise differentiation of pedestrians, further improving pedestrian matching accuracy.
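The first three statistical features can be sketched as follows for a spectrogram covering one time window. Several points are assumptions made only for illustration: x1 is taken as the window average of the per-frame peak Doppler shift, the envelopes are found with a simple relative-magnitude threshold (`rel_threshold` is hypothetical), and the limb movement period x4 is omitted because its formula is not recoverable from the text.

```python
import numpy as np


def micro_doppler_features(E, doppler_hz, rel_threshold=0.1):
    """E: magnitude spectrogram of shape (n_doppler_bins, Z frames).
    doppler_hz: Doppler frequency of each row, in ascending order."""
    # x1: Doppler shift of the strongest bin in each frame, averaged over the window.
    x1 = float(np.mean(doppler_hz[np.argmax(E, axis=0)]))
    upper, lower = [], []
    for frame in E.T:
        # Assumed envelope rule: bins above a fraction of the frame peak count as occupied.
        occupied = np.nonzero(frame >= rel_threshold * frame.max())[0]
        upper.append(doppler_hz[occupied[-1]])
        lower.append(doppler_hz[occupied[0]])
    upper, lower = np.array(upper), np.array(lower)
    x2 = float(upper.max() - lower.min())     # total bandwidth of the Doppler signal
    x3 = float(upper.mean() - lower.mean())   # bandwidth between the average envelopes
    return x1, x2, x3


# Toy spectrogram: 11 Doppler bins from -5 Hz to 5 Hz, 2 frames.
doppler_hz = np.arange(-5, 6, dtype=float)
E = np.zeros((11, 2))
E[6:9, 0] = [0.5, 1.0, 0.5]                   # narrow spread around 2 Hz
E[4, 1], E[7, 1], E[10, 1] = 0.5, 1.0, 0.5    # wider spread, same peak
x1, x2, x3 = micro_doppler_features(E, doppler_hz)
```

In this toy input both frames peak at 2 Hz, so x1 reflects the common torso shift, while the wider second frame increases x2 more than x3.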
[0081] ③ Target position
[0082] S220 acquires a 3D point cloud from the current sensing data. After allocating the 3D point cloud to different voxels, it performs random sampling and normalization of points within each voxel. For each non-empty voxel, it uses several VFE (Voxel Feature Encoding) layers to extract local features, obtaining voxel-wise features. Then, it uses 3D Convolutional Middle Layers to further abstract the features, increasing the receptive field and learning geometric spatial representation. Finally, it uses RPN (Region Proposal Network) to classify, detect, and regress the position of objects, obtaining the target position of the target object.
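The first stage of this pipeline (allocating points to voxels, random sampling, and normalization) can be sketched in plain Python as follows. The voxel size, the per-voxel point cap, and centroid-based normalization are assumptions chosen for illustration; the VFE layers, 3D convolutional middle layers, and RPN are learned network components and are not reproduced here.

```python
import random
from collections import defaultdict


def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points=35, seed=0):
    """Group (x, y, z) points by voxel index, randomly subsample crowded
    voxels, and express each point relative to its voxel centroid."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size[0]),
               int(y // voxel_size[1]),
               int(z // voxel_size[2]))
        buckets[key].append((x, y, z))

    voxels = {}
    for key, pts in buckets.items():
        if len(pts) > max_points:
            pts = rng.sample(pts, max_points)   # random sampling within the voxel
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        cz = sum(p[2] for p in pts) / len(pts)
        # Normalization (assumed): offsets from the voxel centroid.
        voxels[key] = [(p[0] - cx, p[1] - cy, p[2] - cz) for p in pts]
    return voxels


points = [(0.05, 0.05, 0.1), (0.15, 0.05, 0.3), (1.1, 1.1, 1.0)]
voxels = voxelize(points)
```

Only non-empty voxels appear as keys in the result, which corresponds to the non-empty voxels that would be passed to the feature encoding layers.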
[0083] While S220 acquires the matching parameters, S230 performs video preprocessing on the current perception data captured by the video camera to obtain an RGB image synchronized with the current perception data of the FMCW lidar. Preprocessing includes image extraction, filtering, image enhancement, image differencing, and other image processing, as well as temporal and spatial synchronization processing. Image recognition is then performed on the preprocessed RGB image to obtain the image category of the target object. Furthermore, S230 can also extract the image appearance features of the target object, such as color and shape, as features required for Hungarian matching.
[0084] After obtaining the matching parameters in S220 and S230, S240 constructs the cost matrix of the Hungarian algorithm based on the target velocity, target acceleration, target micro-Doppler features, target position, and image category. Based on the aforementioned cost matrix, S250 tracks the target object using both the Kalman filter algorithm and the Hungarian algorithm.
[0085] Specifically, target velocity, target acceleration, target position, target micro-Doppler features, and image category can be used as inputs to the Hungarian algorithm to calculate the Mahalanobis distance between the predicted and detected values of the target object. The Mahalanobis distance is then used as the cost matrix for matching using the Hungarian algorithm, and tracking is performed based on the matching results. However, this approach is not very effective at judging attributes such as micro-Doppler features and image category, and cannot fully utilize the characteristics of these attributes.
[0086] To optimize the matching process of the Hungarian algorithm, this embodiment calculates the Mahalanobis distance between the predicted and detected values of the target object based on the target velocity, target acceleration, and target position; it calculates the cosine distance between the predicted and detected attributes of the target object based on the target micro-Doppler features and image category, or based on the target micro-Doppler features, image category, and image appearance features; and it constructs a cost matrix based on the Mahalanobis and cosine distances. By reconstructing the cost matrix of the Hungarian algorithm and adding target micro-Doppler features and image category and / or image appearance features, feature fusion matching of radar and video is achieved, greatly increasing the accuracy of matching. Furthermore, the cost matrix of the Hungarian algorithm no longer uses only Mahalanobis distance, but combines Mahalanobis and cosine distances, which is effective for both near and far target matching, greatly reducing the ID jump rate during target tracking.
[0087] The Hungarian algorithm is used to match the state predicted by the Kalman filter algorithm with the new detection results. Information such as RGB image classification (det.label), target position (det.xy), target velocity (det.vel), target acceleration (det.acc), and target micro-Doppler features (det.dop) are integrated into the matching strategy, and two matching metrics are combined.
[0088] Mahalanobis distance metric:
[0089] $$d^{(1)}(i,j) = (d_j - y_i)^{\mathsf T} S_i^{-1} (d_j - y_i)$$
[0090] Where $d_j$ represents the j-th detection value, including det.xy, det.vel, and det.acc; $S_i$ is the covariance matrix of the observation space at the current moment, predicted by the Kalman filter algorithm of the trajectory; and $y_i$ represents the predicted value of the i-th track at the current moment. The entire formula above represents the matching degree between the j-th detected value and the i-th track.
[0091] Cosine distance metric:
[0092] $$d^{(2)}(i,j) = \min\left\{ 1 - \frac{r_j^{\mathsf T} r_k^{(i)}}{\lVert r_j \rVert\,\lVert r_k^{(i)} \rVert} \;\middle|\; r_k^{(i)} \in R_i \right\}$$
[0093] Wherein, for the detection value $d_j$ of each target object, the cosine distance between the detected attribute descriptor $r_j$ and the predicted attribute descriptor $r_k^{(i)}$ is calculated; the detected attributes include image category (det.label), target micro-Doppler features (det.dop), and / or image appearance features. Furthermore, for each trajectory, its most recent attribute descriptors are stored in $R_i$. The formula above therefore measures the minimum cosine distance between the i-th trajectory and the j-th detected attribute.
[0094] The two matching metrics are combined to obtain the overall matching index:
[0095] $c_{i,j} = \lambda d^{(1)}(i,j) + (1-\lambda) d^{(2)}(i,j)$
[0096] Where $\lambda$ is a preset hyperparameter that controls the ratio of Mahalanobis distance to cosine distance. The two metrics combined in the overall matching index $c_{i,j}$ are complementary. Specifically, Mahalanobis distance provides information on the possible position, velocity, and acceleration of the target, which is useful for short-term prediction; cosine distance places more weight on the appearance features of the prediction and trajectory information, and is particularly useful for recovering the ID after long-term occlusion when the displacement of the tracked object is small.
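The combination of the two metrics into a cost matrix can be sketched as follows. The dictionary layout of tracks and detections, the four-element state `[x, y, velocity, acceleration]`, and the toy values are assumptions for illustration. The assignment step uses an exhaustive search over permutations as a stand-in that finds the same minimum-cost assignment as the Hungarian algorithm on small inputs; it is not the Hungarian algorithm itself and requires at least as many detections as tracks.

```python
from itertools import permutations

import numpy as np


def mahalanobis_sq(det, pred, cov):
    """(d_j - y_i)^T S_i^{-1} (d_j - y_i)."""
    diff = det - pred
    return float(diff @ np.linalg.solve(cov, diff))


def min_cosine_distance(det_attr, gallery):
    """Smallest cosine distance between a detection descriptor and a track's stored descriptors."""
    det_attr = det_attr / np.linalg.norm(det_attr)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return float(np.min(1.0 - gallery @ det_attr))


def cost_matrix(tracks, detections, lam=0.5):
    """c[i, j] = lam * d1(i, j) + (1 - lam) * d2(i, j)."""
    c = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            d1 = mahalanobis_sq(d["state"], t["pred"], t["cov"])
            d2 = min_cosine_distance(d["attr"], t["gallery"])
            c[i, j] = lam * d1 + (1.0 - lam) * d2
    return c


def assign(c):
    """Brute-force minimum-cost assignment (stand-in for the Hungarian algorithm)."""
    n_tracks, n_dets = c.shape
    return min(permutations(range(n_dets), n_tracks),
               key=lambda p: sum(c[i, p[i]] for i in range(n_tracks)))


tracks = [
    {"pred": np.array([0.0, 0.0, 1.0, 0.0]), "cov": np.eye(4),
     "gallery": np.array([[1.0, 0.0]])},
    {"pred": np.array([5.0, 5.0, 1.0, 0.0]), "cov": np.eye(4),
     "gallery": np.array([[0.0, 1.0]])},
]
detections = [
    {"state": np.array([5.0, 5.0, 1.0, 0.0]), "attr": np.array([0.0, 1.0])},
    {"state": np.array([0.0, 0.0, 1.0, 0.0]), "attr": np.array([1.0, 0.0])},
]
c = cost_matrix(tracks, detections)
assignment = assign(c)   # assignment[i] is the detection index matched to track i
```

Here the detections are listed in the opposite order to the tracks, and the combined cost pairs each track with the detection that agrees with it in both state and attributes.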
[0097] Based on the above overall matching index, when S250 tracks the target object using the Kalman filter algorithm and the Hungarian algorithm, it performs a weighted summation of the Mahalanobis distance and the cosine distance based on preset hyperparameters, and determines whether the sum obtained by the weighted summation is greater than the first matching threshold, i.e., the overall matching threshold. If it is greater, it indicates that the target object matches the current trajectory, i.e., matches the tracking object corresponding to the trajectory; otherwise, it does not match.
[0098] When such a mismatch occurs, existing technologies conclude that the target object and the tracked object are not the same target. This embodiment takes factors such as occlusion into account and proceeds differently: it further determines whether the cosine distance is greater than a second matching threshold, which is greater than the first matching threshold. If the cosine distance is greater than the second matching threshold, the target object is confirmed to match the current trajectory; otherwise, there is no match. This method is particularly suitable for pedestrian tracking based on micro-Doppler features.
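The two-stage decision can be sketched as a small function. The comparison directions are taken literally from the description above (a value greater than the threshold means a match); an implementation that treats the metrics strictly as distances, where smaller means more similar, would reverse the inequalities. The function name and the example threshold values are hypothetical.

```python
def is_match(d_mahalanobis, d_cosine, lam, first_threshold, second_threshold):
    """Two-stage match test following the description literally."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    # Stage 1: weighted sum of the two metrics against the overall threshold.
    combined = lam * d_mahalanobis + (1.0 - lam) * d_cosine
    if combined > first_threshold:
        return True
    # Stage 2: fall back to the cosine metric alone, with a stricter threshold.
    return d_cosine > second_threshold


stage1 = is_match(1.0, 1.0, lam=0.5, first_threshold=0.5, second_threshold=0.7)
stage2 = is_match(0.0, 0.8, lam=0.9, first_threshold=0.5, second_threshold=0.7)
rejected = is_match(0.0, 0.6, lam=0.9, first_threshold=0.5, second_threshold=0.7)
```

The second call shows the fallback: the weighted sum fails the first threshold, but the attribute-based metric alone passes the second one.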
[0099] After matching, the prediction part of the Kalman filter algorithm is updated. After obtaining the prediction result for the current state, the Kalman filter algorithm also needs the measured value of the current state. Using the predicted and measured values, the optimal estimate for the current state k can be obtained. After obtaining the optimal estimate for state k, the covariance in state k also needs to be updated so that the Kalman filter can keep running until the system process ends: when the system enters the next state, the algorithm can continue its autoregressive operation. For each trajectory k, the number of matching frames is counted from the time of its last match; the count is incremented whenever a match is found during the Kalman prediction period. A lifetime threshold is also set; if no match is found within this time, the tracked object is considered to have left the scene and its trajectory is deleted.
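The predict, update, and lifetime logic can be sketched as follows for a single axis. The constant-acceleration motion model, the direct measurement of position, velocity, and acceleration, the noise values, and the names `hits`, `misses`, and `max_misses` are all assumptions for illustration; in particular, counting missed frames during prediction and resetting the count on a match is one common bookkeeping convention, not necessarily the one used in the embodiment.

```python
import numpy as np


class Track:
    """Kalman-filtered track with state [position, velocity, acceleration]."""

    def __init__(self, x0, dt=0.1, max_misses=3):
        self.x = np.asarray(x0, dtype=float)
        self.P = np.eye(3)                                  # state covariance
        self.F = np.array([[1.0, dt, 0.5 * dt * dt],
                           [0.0, 1.0, dt],
                           [0.0, 0.0, 1.0]])                # constant-acceleration model
        self.H = np.eye(3)                                  # all three states are measured
        self.Q = 0.01 * np.eye(3)                           # process noise (assumed)
        self.R = 0.1 * np.eye(3)                            # measurement noise (assumed)
        self.hits = 0                                       # frames with a match
        self.misses = 0                                     # frames since the last match
        self.max_misses = max_misses                        # lifetime threshold

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        self.misses += 1
        return self.x

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R             # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(3) - K @ self.H) @ self.P          # covariance update
        self.hits += 1
        self.misses = 0

    @property
    def expired(self):
        return self.misses > self.max_misses


track = Track([0.0, 1.0, 0.0])
track.predict()
track.update([0.1, 1.0, 0.0])        # matched detection agrees with the prediction
x_after_update = track.x.copy()
for _ in range(4):                   # four frames without any matching detection
    track.predict()
```

After four unmatched frames the track exceeds the lifetime threshold and would be removed from the set of active trajectories.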
[0100] In the above embodiments, FMCW lidar and video cameras are used as roadside perception devices. For target objects smaller than a size threshold, the velocity, acceleration, and position information of the target object, as well as its micro-Doppler features, are extracted from the perception data of the FMCW lidar. The image category of the target object is obtained from the perception data of the video camera. The cost matrix of the Hungarian algorithm is constructed based on the target object's velocity, acceleration, position, micro-Doppler features, and image category, thereby reconstructing the input of the Hungarian algorithm so that the features used for matching are closer to the target object itself. The target object is then tracked using the Kalman filter algorithm and the Hungarian algorithm with the reconstructed input, thereby greatly improving the accuracy of target object tracking and solving the technical problem of low accuracy in pedestrian and / or non-motorized vehicle detection and tracking in the prior art. Furthermore, this embodiment also distinguishes between different pedestrians, and between pedestrians and non-motorized vehicles, by using the torso Doppler frequency and the total bandwidth of the Doppler signal, further improving the tracking accuracy of small targets on the road.
[0101] Based on the target tracking method shown in Figure 2, this embodiment correspondingly provides a target tracking device applied to a roadside perception system. The roadside perception system includes at least one set of FMCW lidar and video cameras. The FMCW lidar and video cameras are located on the roadside and their perception areas overlap. Referring to Figure 3, the device includes:
[0102] The identification unit 31 is used to identify traffic objects from the current perception data collected by the FMCW lidar and the video camera, respectively, and to obtain target objects with a size smaller than a size threshold.
[0103] The feature extraction unit 32 is used to obtain the target velocity, target acceleration, target micro-Doppler features and target position of the target object based on the current perception data of the FMCW lidar. The target micro-Doppler features include the torso Doppler frequency and the total bandwidth of the Doppler signal.
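As a rough illustration of the two micro-Doppler features named above, the sketch below derives them from a one-dimensional Doppler power spectrum. The definitions used here are assumptions, not taken from the specification: the torso Doppler frequency is taken as the frequency of the strongest spectral component, and the total bandwidth as the span of frequencies whose power is at least a fraction `rel_threshold` of the peak. The function name and parameters are hypothetical.

```python
import numpy as np


def micro_doppler_features(freqs_hz, power, rel_threshold=0.1):
    """Return (torso_doppler_frequency, total_doppler_bandwidth) in Hz.

    Assumed definitions: torso frequency = location of the spectral peak;
    total bandwidth = extent of all bins above rel_threshold * peak power.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    power = np.asarray(power, dtype=float)

    # The torso is assumed to produce the strongest return in the spectrum.
    torso_freq = freqs_hz[np.argmax(power)]

    # Bins with significant energy; limb motion widens this set.
    active = freqs_hz[power >= rel_threshold * power.max()]
    total_bandwidth = active.max() - active.min()

    return float(torso_freq), float(total_bandwidth)
```

Under these assumed definitions, a swinging-limb target such as a pedestrian would tend to show a wider total bandwidth around the torso line than a rigid target moving at the same speed.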
[0104] The identification unit 31 is also used to obtain the image category of the target object based on the current perception data of the video camera;
[0105] The construction unit 33 is used to construct the cost matrix of the Hungarian algorithm based on the target velocity, the target acceleration, the target micro-Doppler features, the target position, and the image category;
[0106] The tracking unit 34 is used to track the target object using the Kalman filter algorithm and the Hungarian algorithm.
[0107] As an optional implementation, the target micro-Doppler features may further include: torso Doppler bandwidth and limb movement cycle.
[0108] As an optional implementation, the construction unit 33 is specifically used for:
[0109] The Mahalanobis distance between the predicted and detected values of the target object is calculated based on the target velocity, the target acceleration, and the target position; the cosine distance between the predicted and detected attributes of the target object is calculated based on the target micro-Doppler features and the image category; and the cost matrix is constructed based on the Mahalanobis distance and the cosine distance.
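A minimal sketch of this cost-matrix construction is given below. Several details are assumptions made for illustration: the motion vector is laid out as `[position, velocity, acceleration]`, the attribute vector concatenates micro-Doppler features with a one-hot image category, the covariance comes from the Kalman prediction, and the two distances are combined as `lam * d_m + (1 - lam) * d_c` with a single hypothetical hyperparameter `lam`.

```python
import numpy as np


def mahalanobis(pred, det, cov):
    # Distance between predicted and detected motion, scaled by the covariance.
    diff = np.asarray(det, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))


def cosine_distance(a, b):
    # 1 - cosine similarity between predicted and detected attribute vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def build_cost_matrix(tracks, detections, lam=0.5):
    """Rows are tracks, columns are detections (dict layout is hypothetical)."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            d_m = mahalanobis(trk["motion"], det["motion"], trk["cov"])
            d_c = cosine_distance(trk["attr"], det["attr"])
            cost[i, j] = lam * d_m + (1.0 - lam) * d_c  # weighted summation
    return cost
```

Because the Mahalanobis distance is unbounded while the cosine distance lies in [0, 2], the choice of `lam` (or a prior normalization of the motion term) determines how strongly the micro-Doppler and image-category attributes influence the match.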
[0110] As an optional implementation, the tracking unit 34 is also used for:
[0111] Based on the previous tracking trajectory of the current sensing data, the predicted value and the predicted attribute are calculated using the Kalman filter algorithm; based on the cost matrix, the target object and the tracking trajectory are matched using the Hungarian algorithm, and tracking is performed based on the matching result.
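The assignment step can be sketched with SciPy's `linear_sum_assignment`, which solves the same minimum-cost assignment problem that the Hungarian algorithm addresses. The helper name `assign` and its return layout are assumptions for illustration; threshold checks on the matched pairs are deliberately left out of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign(cost):
    """Match tracks (rows) to detections (columns) at minimum total cost.

    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    cost = np.asarray(cost, dtype=float)
    rows, cols = linear_sum_assignment(cost)  # handles rectangular matrices
    matches = list(zip(rows.tolist(), cols.tolist()))

    # Anything not assigned is left for track deletion or new-track creation.
    unmatched_tracks = sorted(set(range(cost.shape[0])) - set(rows.tolist()))
    unmatched_dets = sorted(set(range(cost.shape[1])) - set(cols.tolist()))
    return matches, unmatched_tracks, unmatched_dets
```

Matched tracks would then receive a Kalman update from their detection, unmatched detections can seed new trajectories, and unmatched tracks count toward their lifetime threshold.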
[0112] When matching the target object and the tracking trajectory using the Hungarian algorithm based on the cost matrix, the Mahalanobis distance and the cosine distance can be weighted and summed based on preset hyperparameters; it is determined whether the sum obtained by the weighted summation is greater than a first matching threshold; if it is greater, it is confirmed that the target object and the tracking trajectory match; if it is not greater, it is determined whether the cosine distance is greater than a second matching threshold, the second matching threshold being greater than the first matching threshold; if the cosine distance is greater than the second matching threshold, it is confirmed that the target object and the tracking trajectory match.
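The two-stage decision above can be written as a small predicate. The comparison directions follow the wording of the paragraph literally, which implies that the compared quantities are treated as scores where a larger value indicates a better match; that reading, the function name, and the single `weight` hyperparameter are assumptions of this sketch.

```python
def is_match(d_mahalanobis, d_cosine, weight, first_threshold, second_threshold):
    """Two-stage match test following the comparison order stated in the text."""
    if second_threshold <= first_threshold:
        # The text requires the second threshold to exceed the first.
        raise ValueError("second_threshold must be greater than first_threshold")

    # Stage 1: weighted sum of the two terms against the first threshold.
    combined = weight * d_mahalanobis + (1.0 - weight) * d_cosine
    if combined > first_threshold:
        return True

    # Stage 2: fall back to the attribute (cosine) term alone.
    return d_cosine > second_threshold
```

The fallback stage lets the micro-Doppler and image-category attributes confirm a match on their own when the combined motion-plus-attribute value does not pass the first test.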
[0113] Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0114] Figure 4 is a block diagram illustrating an electronic device 400 for implementing a target tracking method according to an exemplary embodiment. For example, the electronic device 400 may be an industrial control computer, a computer, an edge server, an edge computing device, etc.
[0115] Referring to Figure 4, the electronic device 400 may include one or more of the following components: a processing component 402, a memory 404, a power supply component 406, an input/output (I/O) interface 408, and a communication component 410.
[0116] Processing component 402 typically controls the overall operation of electronic device 400, such as operations associated with data calculation, control, command issuance, and camera triggering. Processing component 402 may include one or more processors 420 to execute instructions to complete all or part of the steps of the methods described above. Furthermore, processing component 402 may include one or more modules to facilitate interaction between processing component 402 and other components.
[0117] Memory 404 is configured to store various types of data to support the operation of electronic device 400. Examples of such data include instructions for any application or method operating on electronic device 400, image data, associated data, configuration data, etc. Memory 404 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0118] Power supply component 406 provides power to various components of electronic device 400. Power supply component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 400.
[0119] Communication component 410 is configured to facilitate wired or wireless communication between electronic device 400 and other devices. Electronic device 400 can access wireless networks based on communication standards, such as WiFi, 4G, or 5G, or combinations thereof. In one exemplary embodiment, communication component 410 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 410 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
[0120] In an exemplary embodiment, the electronic device 400 may be implemented by one or more application-specific integrated circuits (ASICs).
[0121] It may likewise be implemented by a digital signal processor (DSP), digital signal processing device (DSPD), programmable logic device (PLD), field-programmable gate array (FPGA), controller, microcontroller, microprocessor, or other electronic component for performing the above methods.
[0122] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 404 including instructions, which can be executed by the processor 420 of the electronic device 400 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device, etc. When the instructions in this non-transitory computer-readable storage medium are executed by the processor 420 of the electronic device 400, the target tracking method in the above embodiments can be implemented.
[0123] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed in these embodiments. The specification and embodiments are to be considered exemplary only, and the true scope and spirit of the invention are indicated by the following claims.
[0124] It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims. The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A target tracking method, characterized in that the method is applied to a roadside sensing system, which includes at least one set of FMCW lidar and video cameras, wherein the FMCW lidar and the video cameras are located on the roadside and their sensing areas overlap; the method includes: performing traffic object identification on the current perception data collected by the FMCW lidar and the video camera respectively, to obtain target objects with a size smaller than a size threshold; obtaining, based on the current perception data of the FMCW lidar, the target velocity, target acceleration, target micro-Doppler features, and target position of the target object, wherein the target micro-Doppler features include the torso Doppler frequency and the total bandwidth of the Doppler signal; obtaining the image category of the target object based on the current perception data of the video camera; constructing the cost matrix of the Hungarian algorithm based on the target velocity, target acceleration, target micro-Doppler features, target position, and image category, which specifically includes: calculating the Mahalanobis distance between the predicted value and the detected value of the target object based on the target velocity, the target acceleration, and the target position; calculating the cosine distance between the predicted attribute and the detected attribute of the target object based on the target micro-Doppler features and the image category; and constructing the cost matrix by weighted summation of the Mahalanobis distance and the cosine distance based on preset hyperparameters; and tracking the target object using the Kalman filter algorithm and the Hungarian algorithm.
2. The target tracking method as described in claim 1, characterized in that the target micro-Doppler features also include: torso Doppler bandwidth and limb movement cycle.
3. The target tracking method as described in claim 1, characterized in that the tracking of the target object using the Kalman filter algorithm and the Hungarian algorithm includes: calculating the predicted value and the predicted attribute using the Kalman filter algorithm, based on the previous tracking trajectory of the current sensing data; and matching the target object and the tracking trajectory using the Hungarian algorithm based on the cost matrix, and performing tracking based on the matching results.
4. The target tracking method as described in claim 3, characterized in that the matching of the target object and the tracking trajectory based on the cost matrix using the Hungarian algorithm includes: performing a weighted summation of the Mahalanobis distance and the cosine distance based on preset hyperparameters; determining whether the sum obtained by the weighted summation is greater than a first matching threshold; if it is greater, confirming that the target object matches the tracking trajectory; if it is not greater, determining whether the cosine distance is greater than a second matching threshold, the second matching threshold being greater than the first matching threshold; and if the cosine distance is greater than the second matching threshold, confirming that the target object matches the tracking trajectory.
5. A target tracking device, characterized in that the device is used to implement the method as described in any one of claims 1-4; the device is applied to a roadside sensing system, the roadside sensing system comprising at least one set of FMCW lidar and video cameras, wherein the FMCW lidar and the video cameras are disposed on the roadside and their sensing areas overlap; the device includes: an identification unit, used to identify traffic objects from the current perception data collected by the FMCW lidar and the video camera, respectively, and to obtain target objects with a size smaller than a size threshold; the identification unit is also used to obtain the image category of the target object based on the current perception data of the video camera; a feature extraction unit, used to obtain the target velocity, target acceleration, target micro-Doppler features, and target position of the target object based on the current perception data of the FMCW lidar, wherein the target micro-Doppler features include the torso Doppler frequency and the total bandwidth of the Doppler signal; a construction unit, used to construct the cost matrix of the Hungarian algorithm based on the target velocity, the target acceleration, the target micro-Doppler features, the target position, and the image category; and a tracking unit, used to track the target object using the Kalman filter algorithm and the Hungarian algorithm.
6. The target tracking device as described in claim 5, characterized in that the target micro-Doppler features also include: torso Doppler bandwidth and limb movement cycle.
7. The target tracking device as described in claim 5, characterized in that the construction unit is specifically used for: calculating the Mahalanobis distance between the predicted value and the detected value of the target object based on the target velocity, the target acceleration, and the target position; calculating the cosine distance between the predicted attribute and the detected attribute of the target object based on the target micro-Doppler features and the image category; and constructing the cost matrix based on the Mahalanobis distance and the cosine distance.
8. An electronic device, characterized in that it includes a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to implement the method as described in any one of claims 1-4.
9. A computer-readable storage medium, characterized in that it stores a computer program that, when executed by a processor, implements the steps of the method described in any one of claims 1-4.