A method and system for roadside multi-view multi-sensor spatial synchronization based on global positioning

The parameters of multiple cameras are optimized automatically through a global positioning method and combined with radar data to achieve spatial synchronization across multiple views and multiple sensors. This addresses the limited synchronization accuracy and the deployment complexity of roadside multi-sensor systems in large-scale traffic scenarios, improving positioning accuracy and system scalability.

CN117132861B · Active · Publication Date: 2026-06-26 · BEIJING UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING UNIV OF TECH
Filing Date: 2023-09-03
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing roadside multi-sensor systems cannot adapt to multi-view, multi-sensor spatial synchronization in large-scale traffic scenarios. Furthermore, newly added equipment requires manual calibration or has insufficient accuracy in automatic calibration, resulting in low spatial synchronization accuracy and a cumbersome deployment process.

Method used

A global positioning method is adopted, which automatically optimizes the parameters of multiple cameras through monocular positioning, radar data preprocessing and camera parameter fine-tuning. Combined with radar data, spatial synchronization of multi-view cameras and radar is achieved, reducing manual operation and errors.

Benefits of technology

It achieves unmanned spatial synchronization of multiple perspectives and sensors in roadside scenarios, improving positioning accuracy and system scalability, reducing deployment and maintenance costs, and is suitable for a wide range of traffic scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a method and system for roadside multi-view, multi-sensor spatial synchronization based on global positioning. Camera parameters are first estimated automatically, and a geometry-based positioning method calculates the angle and distance between the vehicle and the camera to obtain the vehicle's global coordinates. The 3D vehicle information detected by radar is preprocessed to obtain structured vehicle positioning data. The fusion area is determined by multi-sensor coverage area estimation, and initial matching of multiple objects in the area then yields initial association data for the same object between radar and camera. The camera parameters and positioning accuracy of multiple cameras are optimized simultaneously using the radar's real positioning data. The positioning transformation matrix obtained from the optimized pseudo-real camera parameters realizes spatial synchronization between the multi-view cameras and the radar. With multiple uncalibrated cameras at the roadside, spatial synchronization with radar is achieved in one pass without any manual operation.

Description

Technical Field

[0001] This invention relates to the field of machine learning, and specifically to an optimization method for gradient descent, particularly a method and system for roadside multi-view, multi-sensor spatial synchronization based on global localization.

Background Technology

[0002] Multimodal sensor fusion plays a crucial role in achieving high-quality roadside perception in the field of intelligent traffic monitoring. Although various integrated radar-visual fusion devices are now widely used, there are still a large number of scattered radar and multi-view camera devices, such as in intersection scenarios. To avoid the waste of directly replacing these devices, an additional system must be deployed in roadside scenarios to achieve spatial synchronization of these multi-view cameras and radars.

[0003] Unlike onboard sensors in autonomous vehicles, roadside sensors have different spatial coordinates and require necessary spatial alignment for data fusion. Spatial alignment is a prerequisite for data fusion from discrete sensors, which involves projecting objects sensed by multiple sensors onto the same spatial coordinate system. Spatial alignment of camera and radar (including lidar and millimeter-wave radar in this invention) data is achieved through joint calibration, which typically requires camera calibration before joint calibration.

[0004] For example, most current camera calibration methods use Zhang Zhengyou's calibration method or a four-point solution for the homography matrix, both of which require manual operation. The former requires inserting objects, which is not very practical in traffic scenarios; the latter requires manually selecting at least four sets of corresponding points in the map and image, a process that often involves human error. Besides these two manual methods, there are also many automatic calibration methods, such as traditional methods based on vehicle vanishing point detection suitable for traffic scenarios. However, this method requires roads to be straight or vehicle trajectories to be approximately straight, making it difficult to apply to large-scale traffic scenarios. On the other hand, while current deep learning-based camera calibration methods have high accuracy on datasets, the domain adaptation problem of training data means that the accuracy of these methods in real-world traffic scenarios still falls short of application requirements.

[0005] Besides camera calibration issues, current roadside multi-sensor spatial synchronization methods can only be calibrated based on pairs of cameras and radars because their spatial coordinate systems are limited to their respective local coordinate systems. With the widespread deployment of cameras in traffic scenarios, there is a growing demand for simultaneous spatial synchronization of multiple cameras and radars. However, past methods based on one-to-one group synchronization and alignment require repeated calibration of multiple cameras and radars in these scenarios. Furthermore, the addition of new equipment necessitates modifications to their combined spatial coordinate systems, resulting in a cumbersome process and significant cumulative errors from extensive manual operations, further reducing spatial synchronization accuracy.

[0006] Analyzing the above background, it is evident that current roadside spatial synchronization systems cannot meet the requirements of simultaneous spatial synchronization across multiple perspectives and sensors in a given scenario. Furthermore, manual calibration is mandatory when adding new cameras, or the accuracy obtained from automatic calibration is insufficient to support downstream safety applications within intelligent transportation systems. Therefore, a system is urgently needed to address these two issues.

Summary of the Invention

[0007] The technical objective of this invention is to propose a method and system for roadside multi-view, multi-sensor spatial synchronization based on global positioning. The system includes: monocular positioning, radar data preprocessing, camera parameter fine-tuning, and multi-view spatial synchronization. Monocular positioning first automatically estimates camera parameters, then uses a geometry-based positioning method to calculate the angle and distance between the vehicle and the camera, and combines this with the camera's own latitude and longitude to obtain the vehicle's global coordinates. Radar data preprocessing preprocesses the 3D vehicle information detected by radar to obtain structured vehicle positioning data. Camera parameter fine-tuning determines the fusion region through multi-sensor coverage area estimation, then establishes initial matching for multiple objects within the region to obtain initial association data for the same object between the radar and the camera, and finally uses the radar's real positioning data to simultaneously optimize the camera parameters and positioning accuracy of multiple cameras. Multi-view spatial synchronization uses a positioning transformation matrix obtained from the optimized pseudo-real camera parameters to achieve spatial synchronization between the multi-view cameras and the radar. Applying this invention, spatial synchronization with radar can be achieved simultaneously with multiple uncalibrated roadside cameras without any manual operation.

[0008] This invention provides a roadside multi-view, multi-sensor spatial synchronization system based on global positioning, the technical solution of which is as follows:

[0009] An automatic optimization method for roadside camera parameters based on global localization cues, the method comprising:

[0010] Step A, monocular localization: camera parameters are obtained through automatic camera calibration, and then the localization results of vehicles or pedestrians in the scene are obtained by combining the camera's own latitude and longitude, camera parameters and geometric calculation methods.

[0011] Step B: Radar data preprocessing to obtain structured radar detection results;

[0012] Step C: Fine-tuning of camera parameters, using a small amount of real radar positioning results to optimize camera positioning accuracy;

[0013] In step B, radar data that is not within the camera's detection range is filtered by combining the camera's own heading angle.

[0014] Step C includes steps A and B. After obtaining the relative distance and angle between the vehicle and the camera through a positioning algorithm, and combining this with the structured radar data, vehicles not within the shared coverage area of both the radar and the camera are filtered out based on their vehicle angle. Then, an initial matching process is established between the radar tracking results and the camera tracking results within the radar and camera coverage area (fusion area).

[0015] In step C, the positioning results are optimized based on the associated data obtained after initial matching.

[0016] In step C, while optimizing the positioning results, the camera parameters are indirectly optimized. These camera parameters include camera altitude, camera latitude and longitude, heading angle, camera focal length, pitch angle, and roll angle.

[0017] A roadside multi-view multi-sensor spatial synchronization system based on global positioning is disclosed. The system simultaneously optimizes the parameters of multiple cameras and uses global positioning and radar for spatial synchronization, including: monocular positioning, radar data preprocessing, camera parameter fine-tuning, and multi-view multi-sensor spatial synchronization.

[0018] The monocular positioning method combines the camera's own latitude and longitude, automatically calibrated camera parameters, and geometric calculations to obtain the positioning results of the same object in a scene under different cameras.

[0019] The radar data preprocessing involves using information about objects that have been detected and tracked by the radar as formatted data for initial matching.

[0020] The camera parameter fine-tuning utilizes a small amount of radar data constructed after initial matching to optimize monocular positioning accuracy and camera parameters.

[0021] The multi-view spatial synchronization constructs a spatial transformation matrix for the multi-view camera, converts the coordinates of the multi-view camera into the global coordinate system in one go, and uses the global positioning result to achieve spatial synchronization between multiple sensors.

[0022] The radar data preprocessing involves using the radar data obtained after detecting, tracking, and locating the vehicle as structured data to establish an initial association with the camera tracking and positioning results.

[0023] The camera parameter fine-tuning, after estimating the coverage area (fusion area), first requires initial matching of radar tracking results and camera tracking results.

[0024] Initial matching is performed after the range of vehicle data to be fused is determined. The fusion region is determined by the camera heading angle and the corresponding angle threshold range of the camera.

[0025] Excluding vehicle data outside the fusion area is achieved through angle filtering calculated from the vehicle radar positioning results and the camera's own position information.

[0026] The camera parameter fine-tuning can include optimizing parameters such as the camera's own latitude and longitude, camera heading angle, camera altitude, camera focal length, and camera pitch angle.

[0027] The spatial synchronization involves multiple cameras combining their respective spatial transformation matrices to achieve spatial alignment between the multiple cameras and the radar.

[0028] The camera calibration method described belongs to the automatic camera calibration method, and the self-calibration method does not limit the specific implementation method.

[0029] The technical solution provided by this invention can locate vehicles in a scene based on existing roadside multi-view cameras and one or more radars. On the basis of camera self-calibration, the calibrated camera parameters are combined with the positioning algorithm to locate vehicles in the scene. At the same time, a small amount of radar positioning data is used to optimize the camera positioning accuracy and parameters. Finally, the optimized positioning cues are used to achieve unified spatial synchronization between multiple views and multiple sensors.

[0030] The proposed solution not only significantly reduces the deployment process and manual operation in roadside positioning scenarios, as well as the resulting human error, but also improves the accuracy of roadside monocular positioning. Furthermore, it utilizes the positioning results to achieve spatial synchronization between multiple perspectives and sensors. This solution simplifies the roadside multi-sensor deployment process, enhances scalability, and lowers maintenance costs, making it applicable to a wide range of traffic scenarios.

Attached Figure Description

[0031] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention. For those skilled in the art, other drawings can be obtained based on these drawings.

[0032] Figure 1 is a schematic diagram of the system structure of the present invention;

[0033] Figure 2 is a schematic diagram of the fusion region estimation method according to an embodiment of the present invention;

[0034] Figure 3 is a flowchart illustrating the positioning accuracy optimization and camera parameter optimization in an embodiment of the present invention.

Detailed Implementation

[0035] The purpose of this invention is to unify the multi-view, multi-sensor coordinate systems of the roadside into a global coordinate system, and to achieve spatial synchronization of multiple views and multiple sensors in one go using global positioning information.

[0036] To achieve the above objective, monocular positioning accuracy must be improved while the camera calibration problem is solved. To address these issues, this invention provides a method and system for roadside multi-view, multi-sensor spatial synchronization based on global positioning. Furthermore, the method relies on the assumption that radar detection provides global vehicle positioning, an assumption that is generally met in current traffic scenarios.

[0037] The specific steps in this system are as follows:

[0038] Step 1: Transfer the deep learning-based camera calibration algorithm pre-trained on the public camera calibration dataset to the roadside scene to predict the camera parameters of the scene, including the field of view, pitch angle, and roll angle.

[0039] Step 2: After obtaining the camera angles, combine an initial camera height, the camera's latitude and longitude, and its global heading angle with the object detection and tracking algorithm. The final vehicle positioning result is calculated from the pixel position of the bottom center point of the vehicle detection box. Refer to the patent number for the specific positioning formula; the camera latitude and longitude and heading angle can be obtained directly from the map.
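As a minimal sketch of the last part of this step, the following shows how an estimated distance and angle could be combined with the camera's own latitude and longitude to obtain a global position. It is not the positioning formula referenced above: the spherical Earth model, the constant `EARTH_RADIUS_M`, and the function name `offset_position` are assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; the spherical model is an assumption


def offset_position(cam_lat, cam_lon, distance_m, bearing_deg):
    """Return (lat, lon) in degrees of a point distance_m away from the camera.

    bearing_deg is measured clockwise from true north, matching the angle
    convention used for the vehicle angle in this description.
    """
    lat1 = math.radians(cam_lat)
    lon1 = math.radians(cam_lon)
    bearing = math.radians(bearing_deg)
    angular = distance_m / EARTH_RADIUS_M  # angular distance on the sphere

    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical camera at (40.0, 116.0); vehicle estimated 1000 m due north.
vehicle_lat, vehicle_lon = offset_position(40.0, 116.0, 1000.0, 0.0)
```

For the short ranges typical of a roadside camera, a local flat-plane approximation would likely be adequate as well; the spherical form is used here only because it stays valid at any range.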

[0040] Step 3: Fusion region estimation. Let the camera heading angle be $\omega_h$, and let $\omega_d$ denote the clockwise angle between the distance vector from the vehicle to the camera and the map's true north direction vector. The angle threshold of the camera fusion region is set to $\Omega$, so the camera's field of view within the fusion region is $\omega_h \pm \Omega$. Vehicles that are not within the camera's field of view can then be filtered out using $\omega_d$, as shown in Figure 2: $\omega_d \in \omega_h \pm \Omega$.
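The fusion region test can be sketched as follows. The handling of the wrap-around at 0°/360° is an assumption added here (the description only states the condition on the vehicle angle relative to the heading angle and threshold), and the function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def in_fusion_region(omega_d, omega_h, threshold):
    """True if the vehicle angle omega_d lies within omega_h +/- threshold."""
    return angular_difference(omega_d, omega_h) <= threshold


def filter_vehicles(vehicles, omega_h, threshold):
    """Keep only (track_id, omega_d) pairs that fall inside the fusion region."""
    return [v for v in vehicles if in_fusion_region(v[1], omega_h, threshold)]


# Hypothetical camera heading of 10 degrees with a 30 degree threshold.
tracks = [("r1", 355.0), ("r2", 90.0), ("r3", 25.0)]
kept = filter_vehicles(tracks, omega_h=10.0, threshold=30.0)
```

Without the modulo step, a vehicle at 355° would be wrongly rejected for a camera heading of 10°, which is why the difference is normalized before comparison.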

[0041] Step 4: Process the vehicle tracking data obtained from radar detection to obtain the radar tracking ID of each vehicle and its corresponding latitude and longitude values. Combine the camera's own latitude and longitude with the vehicle's latitude and longitude to generate the distance and angle between the vehicle and the camera. The angle here is $\omega_d$.
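One way to derive the distance and angle from the two latitude/longitude pairs is the haversine distance together with the initial great-circle bearing. This is a sketch under a spherical Earth assumption; the description does not state which geodetic formula is used, and `distance_and_bearing` is a hypothetical name. The bearing is taken from the camera toward the vehicle, clockwise from true north.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; the spherical model is an assumption


def distance_and_bearing(cam_lat, cam_lon, veh_lat, veh_lon):
    """Return (distance in metres, bearing in degrees clockwise from true north)."""
    phi1 = math.radians(cam_lat)
    phi2 = math.radians(veh_lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(veh_lon - cam_lon)

    # Haversine great-circle distance.
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from the camera to the vehicle, normalized to [0, 360).
    y = math.sin(d_lambda) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return distance, bearing


# Hypothetical radar fix slightly north of a camera at (40.0, 116.0).
dist_m, omega_d = distance_and_bearing(40.0, 116.0, 40.001, 116.0)
```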

[0042] Step 5: Process the camera tracking data, which includes the pixel tracking ID of each vehicle and its corresponding pixel position (x, y). The radar tracking ID and pixel tracking ID are associated using the Kuhn-Munkres algorithm. This yields the final dataset: for a continuous time window in the fusion region, the true radar distance, angle, and latitude/longitude changes corresponding to the pixel changes, together with the estimated distance, estimated angle, and estimated latitude/longitude changes calculated by the monocular localization algorithm.
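The association step can be illustrated with a compact pure-Python Kuhn-Munkres (Hungarian) implementation. The cost used here, the planar distance between a radar position and the camera-estimated position in a shared local frame, is an assumption; the description does not specify the cost. The names `kuhn_munkres` and `associate` and the sample tracks are hypothetical.

```python
import math


def kuhn_munkres(cost):
    """Minimum-cost assignment for a matrix with rows <= cols.

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def associate(radar_tracks, camera_tracks):
    """Map radar track IDs to camera track IDs; inputs map ID -> (east_m, north_m)."""
    if not radar_tracks or not camera_tracks:
        return {}
    swap = len(radar_tracks) > len(camera_tracks)  # keep rows <= cols
    rows, cols = (camera_tracks, radar_tracks) if swap else (radar_tracks, camera_tracks)
    row_ids, col_ids = list(rows), list(cols)
    cost = [[math.dist(rows[r], cols[c]) for c in col_ids] for r in row_ids]
    pairs = {}
    for i, j in enumerate(kuhn_munkres(cost)):
        radar_id, cam_id = (col_ids[j], row_ids[i]) if swap else (row_ids[i], col_ids[j])
        pairs[radar_id] = cam_id
    return pairs


radar = {"r1": (0.0, 10.0), "r2": (5.0, 30.0), "r3": (-4.0, 50.0)}
camera = {"c7": (4.6, 29.0), "c9": (0.3, 10.5)}
pairs = associate(radar, camera)  # {"r2": "c7", "r1": "c9"}
```

In practice a gating threshold on the cost would probably be needed so that tracks with no plausible counterpart stay unmatched; that refinement is omitted here.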

[0043] Step 6: As shown in Figure 3, the loss between the distance and angle predicted by the positioning model and the true distance and angle obtained by the radar is computed and minimized through backpropagation and stochastic gradient descent. The camera parameters are continuously updated with the backpropagated gradients of this loss to improve positioning accuracy. The loss has two components, an angle loss and a distance loss.
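The idea of this step can be sketched with a toy model. Everything specific below is an assumption made for illustration: the flat-ground pinhole geometry in `predict`, the fixed intrinsics, the mean-squared-error form of the two losses, and the restriction to two tunable parameters (camera height and heading). Plain full-batch gradient descent with hand-derived gradients stands in for the backpropagation and stochastic gradient descent named in the description, and angle wrap-around is ignored.

```python
import math

# Fixed values for this sketch (hypothetical): focal length and principal
# point in pixels, and the camera's downward pitch in degrees.
F_PX, CX, CY, PITCH_DEG = 1000.0, 960.0, 540.0, 10.0


def predict(params, pixel):
    """Flat-ground estimate of (distance_m, bearing_deg, d distance / d height)."""
    x, y = pixel
    depression = math.radians(PITCH_DEG) + math.atan((y - CY) / F_PX)
    ground_factor = 1.0 / math.tan(depression)  # ground distance per metre of height
    distance = params["height"] * ground_factor
    bearing = params["heading"] + math.degrees(math.atan((x - CX) / F_PX))
    return distance, bearing, ground_factor


def losses(params, pixels, radar):
    """Mean squared distance loss and angle loss against radar ground truth."""
    n = len(pixels)
    dist_loss = angle_loss = 0.0
    for pixel, (d_true, a_true) in zip(pixels, radar):
        d_pred, a_pred, _ = predict(params, pixel)
        dist_loss += (d_pred - d_true) ** 2 / n
        angle_loss += (a_pred - a_true) ** 2 / n
    return dist_loss, angle_loss


def fine_tune(params, pixels, radar, lr_height=0.01, lr_heading=0.1, steps=500):
    """Update camera height and heading by gradient descent on the two losses."""
    params = dict(params)
    n = len(pixels)
    for _ in range(steps):
        grad_height = grad_heading = 0.0
        for pixel, (d_true, a_true) in zip(pixels, radar):
            d_pred, a_pred, factor = predict(params, pixel)
            grad_height += 2.0 * (d_pred - d_true) * factor / n
            grad_heading += 2.0 * (a_pred - a_true) / n
        params["height"] -= lr_height * grad_height
        params["heading"] -= lr_heading * grad_heading
    return params


# Synthetic check: radar values generated from known parameters are recovered.
true_params = {"height": 6.0, "heading": 45.0}
pixels = [(900.0, 600.0), (1000.0, 700.0), (1200.0, 800.0)]
radar = [predict(true_params, p)[:2] for p in pixels]
tuned = fine_tune({"height": 5.0, "heading": 40.0}, pixels, radar)
```

Separate learning rates are used because the two losses live on different scales (metres versus degrees); an automatic-differentiation framework would typically replace the hand-written gradients when more parameters are tuned.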

[0044] Since there is a definite relationship between the camera focal length and the field of view, the optimization process computes only the field-of-view gradient and uses it to update the focal length as well. For the i-th sample, the predicted angle is compared with the true value $a_i$, and the predicted distance is compared with the true value $d_i$. Given n samples, the geolocation loss is determined by the following formula:

[0045]
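The relationship between focal length and field of view mentioned in [0044] can be illustrated with the standard pinhole camera model, in which the focal length in pixels follows from the field of view and the image size. Treating the field of view as horizontal and the lens as distortion-free are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def focal_from_fov(fov_deg, image_width_px):
    """Pinhole focal length in pixels for a given horizontal field of view."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def fov_from_focal(focal_px, image_width_px):
    """Horizontal field of view in degrees for a given focal length in pixels."""
    return math.degrees(2.0 * math.atan((image_width_px / 2.0) / focal_px))


# A 90 degree field of view on a 1920-pixel-wide image gives a focal length
# of about 960 pixels, so updating one quantity determines the other.
focal = focal_from_fov(90.0, 1920)
```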

[0046] Step 7: Denote the combined parameters of the i-th camera as $A_i$. The pixel position of an object within the fusion region of the i-th camera is $p_i = (x_i, y_i)$, and its corresponding geographical location is $q_i = (l_i, g_i)$. If a total of n cameras are deployed in the scene, the parameter combination of the multi-view camera system is represented as $(A_1, A_2, \ldots, A_n)$. The overall process of multi-view spatial alignment can be represented as follows:

[0047]
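As a rough illustration of multi-view alignment, each camera can be given its own transformation that maps a pixel position to a position in the shared global frame, after which detections from all cameras are directly comparable with radar positions. The 3x3 planar homography used below is an assumed form of the per-camera transformation, not the matrix of this invention, and the matrices, camera IDs, and function names are hypothetical.

```python
def apply_homography(H, pixel):
    """Map a pixel (x, y) to global coordinates with a 3x3 homography H."""
    x, y = pixel
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    return u / w, v / w


def to_global(detections, transforms):
    """Convert (camera_id, track_id, pixel) detections into the global frame.

    transforms maps each camera ID to that camera's own 3x3 matrix.
    """
    return [
        (cam_id, track_id, apply_homography(transforms[cam_id], pixel))
        for cam_id, track_id, pixel in detections
    ]


# Two hypothetical cameras with made-up matrices.
transforms = {
    "cam1": [[2.0, 0.0, 1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 1.0]],
    "cam2": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]],
}
detections = [("cam1", 7, (1.0, 1.0)), ("cam2", 3, (4.0, 6.0))]
aligned = to_global(detections, transforms)
```

Because every camera maps into the same frame, adding a camera only requires adding one entry to `transforms`; no pairwise camera-to-radar calibration is repeated.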

[0048] Using the steps described above, we can simultaneously calibrate and optimize the parameter combinations of all cameras in the scene, and simultaneously achieve unified spatial synchronization based on the global coordinate system among multiple views and multiple sensors.

[0049] The above description is merely a specific embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for automatic optimization of roadside camera parameters based on global localization cues, characterized in that the method includes: Step A, monocular localization: camera parameters are obtained through automatic camera calibration, and then the localization results of vehicles or pedestrians in the scene are obtained by combining the camera's own latitude and longitude, camera parameters and geometric calculation methods; Step B, radar data preprocessing: structured radar detection results are obtained; Step C, camera parameter fine-tuning: camera positioning accuracy is optimized using a small amount of real radar positioning results. Step C includes steps A and B: after obtaining the relative distance and angle between the vehicle and the camera through a positioning algorithm and the structured radar data, vehicles not covered by both radar and camera are filtered out by the vehicle angle. Then, an initial matching is established between radar tracking results and camera tracking results in the radar and camera coverage area, i.e., the fusion area.

2. The method for automatic optimization of roadside camera parameters based on global localization cues according to claim 1, characterized in that, in step B, radar data that is not within the camera's detection range is filtered by combining the camera's own heading angle.

3. The method for automatic optimization of roadside camera parameters based on global localization cues according to claim 1, characterized in that, in step C, the positioning results are optimized based on the associated data obtained after initial matching.

4. The method for automatic optimization of roadside camera parameters based on global localization cues according to claim 1, characterized in that, in step C, while optimizing the positioning results, the camera parameters are indirectly optimized. These camera parameters include camera altitude, camera latitude and longitude, heading angle, camera focal length, pitch angle, and roll angle.