Infrastructure-assisted real-time HD map construction method for autonomous driving

By using roadside infrastructure-assisted geometry construction and topology estimation, combined with lightweight map representation, the high cost and real-time performance issues of existing HD map construction methods are resolved. This enables high-precision and widely covered HD map updates, improving the decision-making accuracy of autonomous vehicles and traffic efficiency.

CN122228534A · Pending · Publication Date: 2026-06-16 · THE CHINESE UNIVERSITY OF HONG KONG

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
THE CHINESE UNIVERSITY OF HONG KONG
Filing Date
2024-09-27
Publication Date
2026-06-16

Abstract

A method and system for infrastructure-assisted real-time HD map construction are provided. The method includes constructing geometry by projecting data into a bird's-eye view (BEV) space, estimating topology based on trajectories, and performing map fusion at the vehicle end. Constructing geometry includes performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization. Performing BEV projection includes obtaining accumulated point clouds and vehicle trajectories, projecting the point clouds and the trajectories onto a ground plane in the BEV, and rasterizing the points to generate a two-dimensional (2D) grid. Performing feature extraction includes extracting one or more features for each cell and concatenating the extracted features to form a feature map in which trajectory direction vectors occupy two channels. Performing BEV instance segmentation includes configuring a two-dimensional convolutional neural network (2D CNN) having a UNet-like structure to perform semantic instance segmentation on the extracted feature map in the BEV.

Description

Cross-references to related applications

[0001] This application claims the benefit of U.S. Provisional Application Serial No. 63/586,109, filed on September 28, 2023, the disclosure of which is incorporated herein by reference in its entirety.

Background Technology

[0002] 1. Introduction

[0003] Autonomous driving systems promise to revolutionize the transportation industry. High-definition (HD) maps are an integral part of enabling these systems, facilitating perception and navigation of autonomous vehicles in their operating environment. HD maps comprise two core components: geometry and topology [28]. In this context, geometry encompasses the location and semantics of static physical assets associated with roads, such as lanes, road boundaries, lane dividers, and crosswalks. It is important to emphasize that the HD maps under consideration employ a vectorized representation. Vectorization uses geometric primitives, including lines, curves, and polygons, to encode road geometry rather than raw point clouds. Topology, as the second core component, describes lane connectivity, illustrating how lanes or groups of lanes are interconnected, influenced by predetermined traffic rules and real-time road conditions. By providing a comprehensive and detailed representation of the environment, HD maps enable autonomous vehicles to understand scenarios, map optimal routes, and make context-aware decisions.

[0004] Existing methods for building HD maps can be divided into two main approaches: offline and online. Offline construction typically involves using specialized survey vehicles (each with more than [number missing] sensors) equipped with a combination of high-end sensors (such as cameras, LiDAR, GPS, IMU, and radar). Labor-intensive data collection can cost $200,000 [3, 42]. Using SLAM (Simultaneous Localization and Mapping) technology [43, 53] helps create globally consistent maps, which can then be annotated manually or semi-automatically.

[0005] Online HD map construction methods. HDMapNet [27] was the first study to introduce the problem of learning HD semantic maps. It encodes features from a single-frame LiDAR point cloud and/or images from surrounding cameras and predicts semantic map elements in a bird's-eye view. STSU [9] proposed an end-to-end method that can simultaneously extract local road network maps and detect objects given only front-facing camera images. VectorMapNet [29] uses a Transformer module to predict a set of sparse polylines in a bird's-eye view to model the geometry of HD maps. These studies all take onboard sensor observations as input and are therefore limited by occlusion from physical obstacles and by sensing range.

[0006] Infrastructure-assisted vehicle perception. Recent studies have shown the potential of infrastructure in enhancing autonomous driving perception. VI-Eye [21] proposed a semantic-based point cloud registration method to merge vehicle point clouds with infrastructure point clouds. VIPS [44] uses graph matching algorithms to fuse object detection results of infrastructure and vehicles. Michael et al. [7] used infrastructure to detect, track, and predict vehicle motion to build an environment model, which was then transmitted to the vehicle for motion planning. In these studies, infrastructure was mainly used for dynamic object perception. VI-Map differs from these attempts in that it aims to use infrastructure to build accurate and up-to-date HD maps on the infrastructure side, thereby assisting in the generation of in-vehicle HD maps.

[0007] Major players in the autonomous driving industry, such as TomTom [47], HERE [46], and Baidu [3], have adopted the offline approach because it generates very detailed and comprehensive map information. However, this existing practice is too costly. In addition, the maps built are prone to becoming outdated between successive mapping iterations [24, 38]. This dilemma stems from the inherent impracticality of maintaining up-to-date HD maps that cover large areas, as road networks are large and lane connectivity changes frequently [37]. As a result, most current offline HD maps focus on highways, leaving insufficient coverage of urban roads [6].

[0008] To address the inherent challenges of offline solutions, recent technological advances [9, 22, 27, 29, 30] have explored the concept of online HD map learning. This approach aims to estimate local HD maps in real time based on observations from onboard sensors, such as point clouds acquired from LiDAR or images captured by surrounding cameras. While this strategy reduces reliance on global offline HD maps and offers the potential for more cost-effective and scalable HD map construction, it is not without limitations. These methods are subject to inherent constraints stemming from onboard sensors and dynamic road conditions. These challenges encompass the limitations of sensor field of view (due to factors such as occlusion and rapid movement) and the constraints of varying sensor data quality. Furthermore, online HD maps often lack road topology due to the complexity of inferring such logical information in real time. Therefore, existing online map building schemes can be fragile and incomplete.

Summary of the Invention

[0009] Embodiments of the present invention relate to a method and system for constructing HD maps for autonomous driving.

[0010] According to embodiments of the present invention, a method for infrastructure-assisted real-time HD map construction is provided. The method includes: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end. Constructing geometry includes performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization. Furthermore, performing BEV projection includes acquiring accumulated point clouds and vehicle trajectories, projecting the point clouds and trajectories onto a ground plane in the BEV view, and rasterizing the points to generate a two-dimensional (2D) grid. Performing feature extraction includes extracting one or more features for each cell and concatenating the extracted features to form a feature map, in which trajectory direction vectors occupy two channels. Performing BEV instance segmentation includes configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV. The 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thereby obtaining an instance mask as output. Furthermore, map vectorization includes vectorizing the map to generate a sparse and compact representation of the HD map. Trajectory-based topology estimation includes establishing and updating the map topology based on accurate vehicle trajectories and lane instance segmentation results obtained from the constructed geometry. On-vehicle map fusion includes merging the HD map received from the geometry construction and topology estimation with the on-vehicle HD map. Additionally, on-vehicle map fusion includes performing map element mapping, map alignment, and map element refitting.

[0011] In another embodiment of the invention, a computer program product is provided, including a non-transitory computer-executable storage device having computer-readable program instructions thereon. When executed by a computer, the computer-readable program instructions cause the computer to perform a method for infrastructure-assisted real-time HD map construction. The computer-executable program instructions include: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end. Constructing geometry includes performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization. Furthermore, performing BEV projection includes: acquiring accumulated point clouds and vehicle trajectories, projecting the point clouds and trajectories onto a ground plane in the BEV view, and rasterizing the points to generate a two-dimensional (2D) grid. Performing feature extraction includes: extracting one or more features for each cell, and concatenating the extracted features to form a feature map, in which trajectory direction vectors occupy two channels. Performing BEV instance segmentation includes: configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV. The 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thus obtaining an instance mask as output. Furthermore, map vectorization is performed to generate a sparse and compact representation of the HD map. Trajectory-based topology estimation involves building and updating the map topology based on accurate vehicle trajectories and lane instance segmentation results obtained from the constructed geometry. On-vehicle map fusion includes merging the HD map received from the geometry construction and topology estimation with the on-vehicle HD map. Additionally, on-vehicle map fusion includes performing map element mapping, map alignment, and map element refitting.

Attached Figure Description

[0012] This patent application or application document contains at least one color drawing. Upon request and payment of the necessary fees, the Patent Office will provide a copy of the published text of this patent or patent application with the color drawing.

[0013] Figure 1 illustrates infrastructure-assisted HD map construction according to an embodiment of the present invention.

[0014] Figures 2A and 2B show, according to an embodiment of the present invention, the topology extracted from an outdated HD map (Figure 2A) and from a new HD map (Figure 2B), together with the resulting vehicle behavior. In Figure 2A, the outdated HD map topology fails to reflect the disabled vehicle; the right-hand image shows a vehicle failing to make a right-turn decision, causing traffic congestion. In Figure 2B, the new HD map topology depicts blocked and passable lanes; the right-hand image shows vehicles changing lanes in advance to passable lanes.

[0015] Figure 3 is a schematic diagram illustrating how infrastructure point cloud accumulation yields clearer perception, according to an embodiment of the present invention.

[0016] Figure 4 illustrates attributes extracted from vehicle trajectories according to an embodiment of the present invention (from left to right: vehicle trajectory, trajectory attributes, and corresponding map elements).

[0017] Figure 5 shows the system architecture of VI-Map according to an embodiment of the present invention.

[0018] Figure 6 shows the design of HD map geometry construction on roadside infrastructure according to an embodiment of the present invention.

[0019] Figure 7 shows the design of HD map fusion on a vehicle according to an embodiment of the present invention.

[0020] Figures 8A-8C show a real-world test platform deployed for VI-Map data collection and system evaluation according to an embodiment of the present invention, wherein Figure 8A shows the modified vehicle, Figure 8B shows the mobile roadside infrastructure, and Figure 8C shows the 18 test locations in two cities.

[0021] Figures 9A and 9B show the Town 5 map used in the CARLA dataset according to an embodiment of the present invention, wherein Figure 9A shows the locations of roadside infrastructure and Figure 9B shows an HD map parsed from an OpenDRIVE file.

[0022] Figure 10 shows the IoU score of the HD map as a vehicle passes through a road section with infrastructure, according to an embodiment of the present invention.

[0023] Figure 11 shows how VI-Map extends the range of online HD maps, according to an embodiment of the present invention.

[0024] Figure 12 shows how VI-Map improves ride comfort, according to an embodiment of the present invention.

[0025] Figure 13 shows how VI-Map improves traffic efficiency, according to an embodiment of the present invention.

[0026] Figure 14 shows the results of HD map construction using different methods in a real-world scene according to embodiments of the present invention.

[0027] Figures 15A and 15B illustrate the topology estimation performance of different update strategies under different road types according to embodiments of the present invention, wherein Figure 15A shows the response and success times and Figure 15B shows the success rate.

[0028] Figure 16 shows how the IoU score of merged HD maps generated by different map merging methods varies with the vehicle positioning error, according to embodiments of the present invention.

[0029] Figure 17 shows the runtime latency of each step of VI-Map on a vehicle under different road types according to an embodiment of the present invention.

Detailed Implementation

[0030] Embodiments of the present invention relate to a method and system for infrastructure-assisted real-time HD map construction.

[0031] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. As used herein, unless the context clearly indicates otherwise, the singular forms "a," "an," and "the" are intended to also include the plural forms. It will be further understood that, when used in this specification, the terms "comprising" and/or "including" specify the presence of the stated features, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.

[0032] Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It will be further understood that terms such as those defined in common dictionaries should be interpreted as having the same meaning as they have in the relevant technical context and the context of this disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly defined herein.

[0033] When the term “about” is used in conjunction with numerical values in this document, it should be understood that the value may be in the range of 90% to 110% of that value, i.e., the value may be ±10% of the stated value. For example, “about 1 kg” means from 0.9 kg to 1.1 kg.

[0034] Intelligent roadside infrastructure, or roadside units (RSUs), equipped with sensors and computing units, offer a superior solution for HD map construction. Notably, a key attribute unique to roadside infrastructure is its ability to continuously observe road segments while stationary. This capability overcomes the limitations of both offline and online methods. First, compared to offline methods, uninterrupted continuous observation allows infrastructure to update dynamically evolving HD maps in a timely manner. Second, compared to online technologies, the wide field of view and statically accumulated observations provide infrastructure with comprehensive, unobstructed, and high-quality sensor data.

[0035] According to embodiments of the present invention, a system and method for infrastructure-assisted real-time HD map construction (named VI-Map) are provided, aiming to create and maintain HD maps for autonomous driving using roadside infrastructure. Specifically, the infrastructure utilizes its own sensors (such as LiDAR or 2D/3D cameras) to construct and refresh the HD map. The vehicle then integrates this map with its own HD map in real time to enhance or update the vehicle's scene understanding (as shown in Figure 1). The core idea of VI-Map is to leverage the unique, static, and cumulative observations of roadside infrastructure to achieve accurate and up-to-date HD map construction. Specifically, VI-Map first extracts multiple (e.g., 5) carefully crafted, concise bird's-eye view (BEV) features from the dense point cloud and precise vehicle trajectories captured by the infrastructure, and then uses them for efficient map geometry construction. Next, VI-Map uses the latest vehicle trajectories to estimate and update the current map topology. Finally, VI-Map employs a novel three-stage map fusion algorithm to merge the infrastructure's HD map with the vehicle's HD map. It is worth noting that VI-Map is not intended to replace existing HD map construction methods. Instead, it provides a crucial complementary HD map construction paradigm for autonomous driving by leveraging the increasingly prevalent intelligent roadside infrastructure.

[0036] VI-Map offers several key advantages: (i) It transforms large amounts of unstructured, cumulative 3D data into structured, compact, and concise 2D bird's-eye view (BEV) features. These features can be processed using efficient 2D CNNs, greatly reducing computational overhead and enabling practical deployment on infrastructure edge devices. (ii) VI-Map generates vectorized HD maps on infrastructure. This vectorized representation is very lightweight, minimizing communication overhead between infrastructure and vehicles. Furthermore, the representation is compatible with the widely adopted industry-standard HD map data format OpenDRIVE [36]. (iii) VI-Map does not require precise localization of roadside infrastructure or time synchronization between vehicles and infrastructure. This significantly lowers the barriers to widespread deployment of roadside infrastructure and adoption of the solution. (iv) VI-Map allows for simple, fast, and flexible deployment because roadside infrastructure nodes can independently build their own HD maps, which are then merged into a continuous global vehicle-mounted HD map. Therefore, VI-Map can work with mobile RSUs deployed on complex and rapidly changing road segments where there is a strong need for new HD map updates. In this way, VI-Map can provide patches for global HD map updates of important road segments, supplementing existing offline and online map building solutions. This brings a highly scalable architecture to infrastructure-assisted HD map building for autonomous driving.

[0037] VI-Map was implemented on a real-world test platform comprising modified passenger vehicles and mobile RSUs deployed across 18 different road segments in two cities, covering up to five road types. Two new datasets were collected using this test platform and a leading autonomous driving simulation platform. Results show that compared to online map-building methods, VI-Map can extend the local HD map extent of vehicles by 3 times and improve map accuracy by 2.8 times, while incurring only a map fusion latency of 42 milliseconds on the vehicle. At the application level, VI-Map improves traffic efficiency on problem road segments by more than 5 times and increases ride comfort by 3.9 times. A video demonstration of VI-Map on the real-world test platform is available at https://youtu.be/p2RO65R5Ezg.

[0038] Embodiments of this invention utilize roadside infrastructure to enhance the real-time HD map construction of autonomous vehicles. It leverages the unique cumulative observations of roadside infrastructure to build and maintain accurate and up-to-date HD maps. These HD maps are then fused in real-time with onboard HD maps to form a more comprehensive and up-to-date HD map. This invention has wide applications in autonomous driving and intelligent transportation systems, covering multiple fields such as smart cities, infrastructure-assisted autonomous driving, cooperative perception, and online HD map construction.

[0039] According to embodiments of the present invention, a system and method for infrastructure-assisted real-time HD map construction for autonomous driving comprises three key components: (1) Geometry construction. This module first extracts five carefully crafted concise bird's-eye view (BEV) features from dense point clouds and accurate vehicle trajectories captured by the infrastructure, and then utilizes them for efficient map geometry construction. (2) Topology estimation. This module uses the latest vehicle trajectories to estimate and update the current map topology. The resulting accurate vectorized map geometry, along with the new topology, forms a concise infrastructure HD map. (3) Map fusion. This module is executed inside the vehicle. It receives the infrastructure HD map and merges it with the in-vehicle HD map. This module cleverly utilizes the semantic, proximity, and directional attributes of the vectorized HD map to facilitate rapid integration of the infrastructure and vehicle HD maps, thereby providing real-time HD map support for autonomous driving.

[0040] A system and method for real-time HD map building with infrastructure assistance were tested in an autonomous driving simulator and on five real-world road sections. Experiments show that the system and method can build HD maps with decimeter-level accuracy (up to 0.3 meters) and achieve real-time map fusion (with a latency of up to 42 milliseconds) between vehicles and roadside infrastructure. Compared to state-of-the-art online HD map building methods, this represents a significant improvement in map accuracy and coverage, by 2.8 times and 3 times, respectively. The prototype was validated experimentally and tested in a simulator and on five real-world road sections, achieving high map building accuracy in both scenarios.

[0041] This disclosure presents a novel HD map building paradigm that leverages roadside infrastructure to enhance real-time HD map construction for autonomous driving. The system and method for infrastructure-assisted real-time HD map building utilize unique observations from roadside infrastructure to construct accurate HD maps at the infrastructure level, which are then fused in real time with onboard HD maps to form a more comprehensive and up-to-date HD map.

[0042] Materials and Methods

[0043] 2. Case Studies on Motivation

[0044] This section begins with a case study to understand the limitations of existing HD maps on autonomous vehicles (Section 2.1). It then elucidates key insights into the advantages that infrastructure offers in generating timely and high-quality HD maps, highlighting the potential of infrastructure-assisted HD map construction (Section 2.2).

[0045] 2.1 Limitations of existing in-vehicle HD maps

[0046] Offline HD maps. Autonomous vehicles rely on up-to-date HD maps for both global path planning and local behavior decisions and motion planning. However, various road conditions (such as road construction, congestion, and accidents) cause road geometry and topology (i.e., lane connectivity) to change constantly, thus requiring frequent updates to HD maps to ensure the safety and efficiency of autonomous driving [24, 38].

[0047] Vehicles using outdated HD maps may make incorrect behavior and motion planning decisions that do not conform to the current road conditions, resulting in suboptimal or even dangerous driving performance. An example in CARLA [16] illustrates this; CARLA is a popular driving simulator that has been used to develop industrial autonomous driving systems such as Apollo [2] and Autoware.

[0048] In this case study, a disabled vehicle blocks the rightmost lane, changing the road topology so that the right lane is no longer connected to the other lanes. Figures 2A and 2B illustrate the behavior of vehicles using the outdated HD map and the new HD map, respectively. With the outdated HD map, vehicles brake suddenly and turn sharply, which reduces speed and can even cause congestion. In contrast, the new HD map allows vehicles to anticipate lane conditions and make better decisions, such as choosing the unobstructed left lane, resulting in smoother and faster passage through the affected road section (average speed of 1.6 m/s with the outdated map vs. 8.1 m/s with the new map). It should be noted that the decisions made by CARLA's autonomous driving agent in the case study may not be optimal. Nevertheless, even with an off-the-shelf autonomous driving agent, significant benefits of maintaining real-time HD maps are observed.

[0049] Online HD Maps. Online map building methods use onboard sensors to acquire up-to-date HD maps. However, such maps are susceptible to the limited sensor range of vehicles and uncertain sensor data quality. For visual illustration, online map building methods were run on real vehicle LiDAR data collected from urban streets [27]. Detailed results and visualizations can be found in the video in the abstract. In particular, occlusion of the LiDAR by surrounding vehicles makes the constructed maps incomplete and fragmented. Evaluations show that when half of the road is occluded, the integrity of the online generated map may be less than 25%, which fails to meet the stringent safety requirements of autonomous driving.

[0050] 2.2 Advantages of Roadside Infrastructure

[0051] As described in Section 2.1, offline HD maps provide complete but outdated perception, while up-to-date online HD maps built solely by vehicles may be inaccurate. This study addresses these issues by leveraging infrastructure-assisted HD map building and updating. Roadside infrastructure enables continuous observation of road segments while stationary, which is ideal for local HD map building because it offers two key advantages: higher perception quality and the ability to capture real-time topology changes. Roadside infrastructure, including readily available RSUs [12, 13, 20], is becoming increasingly prevalent. This trend presents an opportunity to improve HD map building for autonomous vehicles by utilizing roadside infrastructure.

[0052] Complete and clear perception. Sensors installed on infrastructure offer a wider field of view and a longer sensing range, and are less susceptible to obstruction compared to sensors mounted on vehicles. Furthermore, by accumulating sensor data over a period of time, more detailed and accurate observations of the road can be obtained. The left and right images of Figure 3 show a single-frame point cloud and a 10-second accumulated point cloud from the infrastructure, respectively. The accumulated point cloud from the infrastructure can provide more detailed and accurate information (such as lane lines and road boundaries), which is crucial for autonomous driving.

[0053] Precise trajectory observations. By using LiDAR for vehicle detection and tracking, infrastructure can obtain continuous and precise trajectories (with decimeter-level accuracy) [33], which are unique to infrastructure and cannot be obtained by other means (such as onboard GPS). The key observation is that these fresh and precise trajectories provide valuable information for inferring the geometry and topology of roads. Figure 4 illustrates this using a typical real-world intersection. Trajectory density distinguishes lanes and helps locate lane dividers and road boundaries. Trajectory direction aligns with map topology, indicating directed connectivity between lanes. The variance of trajectory direction reflects the difference in travel directions at the same location, which is higher near intersections. Therefore, it can be used to infer potential pedestrian crossings. These observations can be integrated to enable real-time and highly lightweight map building and updating.
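The three trajectory attributes described above (density, direction, and variance of direction) can be computed per grid cell roughly as follows. This is a minimal sketch, not the disclosed implementation: the function name `trajectory_cell_stats`, the grid layout with its origin at one corner, and the use of circular variance (one minus the mean resultant length) as the measure of direction variance are all assumptions made for illustration.

```python
import numpy as np


def trajectory_cell_stats(points, headings, cell_size, grid_shape):
    """Per-cell trajectory statistics on a BEV grid (illustrative sketch).

    points:     (N, 2) array of x, y positions in metres, grid origin at (0, 0)
    headings:   (N,) array of travel directions in radians
    cell_size:  edge length of a square cell in metres
    grid_shape: (rows, cols) of the output grids
    """
    n_rows, n_cols = grid_shape
    cols = np.floor(points[:, 0] / cell_size).astype(int)
    rows = np.floor(points[:, 1] / cell_size).astype(int)
    inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    rows, cols, headings = rows[inside], cols[inside], headings[inside]

    density = np.zeros(grid_shape)
    sum_cos = np.zeros(grid_shape)
    sum_sin = np.zeros(grid_shape)
    # np.add.at accumulates correctly when several samples hit the same cell.
    np.add.at(density, (rows, cols), 1.0)
    np.add.at(sum_cos, (rows, cols), np.cos(headings))
    np.add.at(sum_sin, (rows, cols), np.sin(headings))

    count = np.maximum(density, 1.0)  # avoid division by zero in empty cells
    mean_cos = sum_cos / count
    mean_sin = sum_sin / count
    # Assumed measure: circular variance = 1 - mean resultant length.
    circ_var = np.where(density > 0, 1.0 - np.hypot(mean_cos, mean_sin), 0.0)
    return density, mean_cos, mean_sin, circ_var
```

Under this assumed measure, a cell crossed only by parallel trajectories has a variance near 0, while a cell crossed in differing directions, as near an intersection, has a larger value.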

[0054] 3. Design of VI-Map

[0055] 3.1 System Overview

[0056] According to embodiments of the present invention, systems and methods for infrastructure-assisted real-time HD map building (named VI-Map) utilize roadside infrastructure to provide real-time HD map building for autonomous vehicles. Figure 5 shows an overview of VI-Map. Specifically, VI-Map utilizes unique data collected from roadside infrastructure, including cumulative point clouds and precise vehicle trajectories, to build and maintain HD maps. It is worth noting that these two data types are unique to roadside infrastructure.

[0057] VI-Map comprises three key components. First, to efficiently handle the large volume of unstructured and heterogeneous 3D point cloud and trajectory data, the geometry construction module (Section 3.2) is configured to project these data types into the BEV space, producing a streamlined, structured, and unified 2D BEV representation. Then, the geometry construction module extracts specific features from both data types, refining valuable information tailored for generating map geometry. The geometry construction module utilizes relatively new but abundant point cloud and trajectory data for high-precision geometric predictions. In contrast, topology estimation (Section 3.3) uses newly arrived trajectories for topology inference. An update strategy is designed to identify trajectory changes and trigger topology updates. The resulting accurate vectorized map geometry, along with the new topology, constitutes a concise infrastructure HD map. Finally, map fusion (Section 3.4), performed inside the vehicle, receives the infrastructure HD map and merges it with the in-vehicle HD map. The map fusion module cleverly leverages the semantic, proximity, and directional attributes of the vectorized HD map to facilitate rapid integration of infrastructure and vehicle HD maps, thereby providing real-time HD map support for autonomous driving. Furthermore, vectorized HD maps are very lightweight, resulting in minimal communication overhead and strong adaptability to changing communication conditions between infrastructure and vehicles.

[0058] 3.2 Geometric Construction

[0059] This module is designed to generate the geometric portion of an HD map on infrastructure using two key data sources (cumulative point cloud and vehicle trajectories). In this design, map geometry is defined as a vectorized representation of four road element types: road boundaries, lane dividers, lanes, and pedestrian crossings. These elements cover the most common elements in HD maps and are consistent with existing online map building methods. Specifically, line-based elements (such as road boundaries and lane dividers) are represented as splines, while region-based elements (such as lanes and pedestrian crossings) are represented as polygons.

[0060] Geometry construction presents the following challenges: (i) point clouds and trajectories are two heterogeneous data types that may be difficult to process simultaneously. (ii) Processing multi-frame accumulated 3D point clouds on edge devices with limited computing resources is a challenge. These challenges are addressed by projecting both data types onto a bird's-eye view (BEV) and rasterizing them. Thus, (i) both unstructured LiDAR points and trajectories are converted into regular grids, enabling the simultaneous learning of features from both inputs to infer geometry; and (ii) this image-like BEV representation can be efficiently processed by highly lightweight 2D CNNs, eliminating the need for resource-intensive 3D point cloud DNNs. This approach enables efficient map construction on mobile GPUs (Graphics Processing Units).

[0061] Figure 6 illustrates the pipeline of the geometry construction module, which comprises four steps: BEV projection, feature extraction, instance segmentation, and vectorization. In the first step, the two types of accumulated data are projected onto the BEV space. From this projection, five carefully crafted features, tailored for inferring map geometry, are extracted. These features are then fed into a 2D CNN for instance segmentation, which produces map element instances represented as a grid map. Finally, the grid-based representations of the map elements are vectorized to create a vectorized HD map.

[0062] BEV Projection. First, the process of acquiring the accumulated point cloud and accurate vehicle trajectories is described. For each infrastructure point cloud frame, a 3D multi-object tracking method such as AB3DMOT (A Baseline for 3D Multi-Object Tracking) is configured to detect and track vehicles. Frames without vehicle detection or tracking are accumulated as a static point cloud, while trajectories are saved for the tracked vehicles. Then, the point cloud and trajectories are projected onto the ground plane in the BEV view. Points more than 0.5 m above the ground are filtered out because they may belong to trees or buildings, which are irrelevant to map building. Projecting the LiDAR points and trajectories onto the ground plane results in a set of 2D points $\{(x, y)\}$, where $(x, y)$ represents the coordinates of each point. Finally, the points are rasterized to generate a 2D grid. Specifically, the points are assigned to pixel locations $(u, v)$, where the grid size is $(H, W)$. Rasterization is expressed as $u = [(x - x_{min}) / \alpha]$, $v = [(y - y_{min}) / \alpha]$. The grid height and width are expressed as $H = [(x_{max} - x_{min}) / \alpha]$ and $W = [(y_{max} - y_{min}) / \alpha]$, respectively, where $[\cdot]$ is the rounding operation and $\alpha$ represents the rasterization resolution. For example, when $\alpha = 10$, the cell size is 0.1 m × 0.1 m. After this step, BEV grid representations of the point cloud and trajectories are obtained, where each cell may contain a different number of points. Each point carries some features, which are designed and extracted in the next step.
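The projection and rasterization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of the described system: the function name `rasterize_bev` is hypothetical, the z coordinate is assumed to be the height above the ground plane, and `alpha` is treated as the cell edge length in the same unit as x and y, which is one possible reading of the rasterization formulas.

```python
import numpy as np

def rasterize_bev(points_xyz, alpha, max_height=0.5):
    """Bin 3D points into a BEV grid (hypothetical helper).

    points_xyz: (N, 3) array; z is assumed to be height above the ground plane.
    alpha: assumed to be the cell edge length in the units of x and y.
    """
    pts = np.asarray(points_xyz, dtype=float)
    pts = pts[pts[:, 2] <= max_height]          # drop points more than 0.5 m above ground
    x, y = pts[:, 0], pts[:, 1]                 # projection onto the ground plane
    x_min, y_min = x.min(), y.min()
    u = np.rint((x - x_min) / alpha).astype(int)
    v = np.rint((y - y_min) / alpha).astype(int)
    H = int(np.rint((x.max() - x_min) / alpha))
    W = int(np.rint((y.max() - y_min) / alpha))
    # Rounding can place the extreme points at index H or W,
    # so this sketch allocates one extra row and column.
    counts = np.zeros((H + 1, W + 1), dtype=int)
    np.add.at(counts, (u, v), 1)                # number of points per cell
    return u, v, counts
```

Each cell of `counts` holds the number of points that fell into it, which reflects the statement that cells may contain different numbers of points.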

[0063] Feature extraction. Multiple (e.g., five) features are selected from LiDAR points and trajectory points for HD map construction. These features include, but are not limited to, height, intensity, density, directional mean, and directional variance. For LiDAR points, the maximum distance to the ground plane in each cell is calculated as the height feature ($\in \mathbb{R}^1$). This helps infer road boundaries, as curbs are typically 0.2 to 0.3 meters higher than the ground. The average intensity of the points in each cell is also calculated as the intensity feature ($\in \mathbb{R}^1$). High intensity indicates the presence of road markings (such as lane dividers and pedestrian crossings), as the paint used for road markings is typically reflective. For trajectory points, as described in Section 2.2, multiple (e.g., three) features are calculated, including but not limited to density ($\in \mathbb{R}^1$), directional mean ($\in \mathbb{R}^2$), and directional variance ($\in \mathbb{R}^1$). Density is calculated by counting the number of trajectory points in each cell. 2D direction vectors are used instead of angles to represent direction, which improves the smoothness of the representation space. The directional variance is defined in terms of $\lVert \bar{d} \rVert_2$, where $\bar{d}$ is the average of the direction vectors $\{d_i\}$ in the cell and $\lVert \bar{d} \rVert_2$ is the L2 norm of the average direction. In summary, a total of five features are extracted for each cell and concatenated to form a feature map of shape $(H, W, 6)$, where the trajectory direction vector occupies two channels.
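To make the channel layout concrete, the following sketch assembles a six-channel feature map from already rasterized points. It is an illustration under stated assumptions: the function name `bev_features` and the channel order are hypothetical, and the directional variance is computed as one minus the L2 norm of the mean unit direction (the circular variance), which is a common choice but is assumed here rather than taken from the text.

```python
import numpy as np

def bev_features(lidar_uv, lidar_height, lidar_intensity, traj_uv, traj_dir, H, W):
    """Build an (H, W, 6) feature map (hypothetical helper, assumed channel order)."""
    lu, lv = lidar_uv[:, 0], lidar_uv[:, 1]
    tu, tv = traj_uv[:, 0], traj_uv[:, 1]

    # Channel 0: maximum height above ground per cell
    height = np.zeros((H, W))
    np.maximum.at(height, (lu, lv), lidar_height)

    # Channel 1: mean LiDAR intensity per cell
    i_sum, i_cnt = np.zeros((H, W)), np.zeros((H, W))
    np.add.at(i_sum, (lu, lv), lidar_intensity)
    np.add.at(i_cnt, (lu, lv), 1.0)
    intensity = np.divide(i_sum, i_cnt, out=np.zeros((H, W)), where=i_cnt > 0)

    # Channel 2: number of trajectory points per cell
    density = np.zeros((H, W))
    np.add.at(density, (tu, tv), 1.0)

    # Channels 3-4: mean of unit direction vectors per cell
    unit = traj_dir / np.linalg.norm(traj_dir, axis=1, keepdims=True)
    d_sum = np.zeros((H, W, 2))
    np.add.at(d_sum, (tu, tv), unit)
    has_traj = density[..., None] > 0
    d_mean = np.divide(d_sum, density[..., None], out=np.zeros((H, W, 2)), where=has_traj)

    # Channel 5: assumed circular variance, 1 - ||mean direction||_2
    d_var = np.where(density > 0, 1.0 - np.linalg.norm(d_mean, axis=2), 0.0)

    return np.dstack([height, intensity, density, d_mean, d_var])
```

Cells without any trajectory points are filled with zeros in the trajectory channels; this is a choice of the sketch, not something the text specifies.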

[0064] BEV instance segmentation. A 2D CNN with a UNet-like structure is used to perform semantic instance segmentation on the feature map extracted in the BEV. Predictions are made for four types of road elements: lanes, road boundaries, lane dividers, and pedestrian crossings. The CNN is trained with a combined loss consisting of a weighted cross-entropy loss [39] and an instance clustering loss [15]. The CNN takes a feature map of shape (H, W, 6) as input and generates a pixel-wise mask for each individual road element, thus outputting an instance mask. The instance segmentation results are also used as input to the topology estimation module.

[0065] Map vectorization. Instance-segmented maps of each category of road element cannot be directly used by vehicles because they are incompatible with most autonomous driving frameworks that typically use vectorized representations of HD maps. Therefore, the map is further vectorized to generate a sparse and compact representation of the HD map, minimizing the map's data size and thus reducing communication overhead during the transmission of infrastructure HD maps to vehicles.

[0066] The standard HD map specification in ASAM OpenDRIVE is adopted to model road elements as spline curves (for boundaries and dividers) or polygons (for lanes and pedestrian crossings). Each instance is first extracted as a set of pixel locations on the segmentation map. To fit each boundary and lane divider, a cubic function $v = f(u)$ that minimizes the mean squared error is found; the solution can be easily obtained using least squares regression. For lanes and pedestrian crossings, the fitted minimum bounding rectangle is used as the geometric representation.
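As an illustration of the spline fitting step, a cubic can be fitted to the pixel locations of one line instance with an ordinary least-squares polynomial fit. The helper name `fit_cubic` is hypothetical, and the sketch assumes the instance can be written as a function v = f(u); near-vertical instances would need the axes swapped.

```python
import numpy as np

def fit_cubic(pixels_uv):
    """Least-squares cubic v = f(u) for one line instance (hypothetical helper).

    pixels_uv: (N, 2) array of (u, v) pixel locations of the instance.
    Returns the four coefficients, highest power first.
    """
    u = pixels_uv[:, 0].astype(float)
    v = pixels_uv[:, 1].astype(float)
    return np.polyfit(u, v, deg=3)   # minimizes the squared error in v

# Usage: points sampled from v = 0.5 u^3 - u + 2
u = np.arange(10.0)
coeffs = fit_cubic(np.column_stack([u, 0.5 * u**3 - u + 2.0]))
```

Storing four coefficients per line element instead of a pixel mask is what makes the vectorized representation compact.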

[0067] 3.3 Topology Estimation

[0068] This module builds and updates the map topology by utilizing the precise vehicle trajectories and lane instance segmentation results from Section 3.2. A graph is used to represent the topology, following the definitions in OpenDRIVE. In this context, the nodes and edges of the graph correspond to lanes and the connections between lanes, respectively. The module's design is based on a key observation: precise vehicle trajectories can be used to infer map topology and identify topology changes because there is a strong correlation between lane connectivity and trajectories, allowing inference from one to another. Furthermore, since the trajectories are continuous, new trajectory data can be used to estimate and update the map topology in a timely manner.

[0069] The principle of topology construction is as follows: if two lanes are traversed by the same trajectory, they are considered connected. Connectivity is directional, described by the direction of the trajectory. As described in Section 2.1, the road topology is dynamic. Therefore, an update strategy is designed to determine when to trigger an update after the topology changes. This update strategy detects topology changes in both the time and space dimensions and performs cross-validation to reduce erroneous updates. Specifically, in the time dimension, it is assumed that for each lane $i$, the arrival time $t$ of the $k$ vehicle trajectories follows a Poisson distribution. The continuous arrival times are discretized and expressed as a number of time units, each lasting 1 second. The probability mass function for each lane $i$ is defined as $P(t; \lambda_i) = \lambda_i^{t} e^{-\lambda_i} / t!$. The rate parameter $\lambda_i$ of each lane $i$ is dynamically updated based on the set of historical observations of the arrival times of the latest $k$ trajectories within a predefined time interval. In the experiment, the time interval is set to one hour, which is a common update interval in many traffic flow monitoring and prediction studies. Then, a statistical test with a significance level of 0.05 is used to determine whether the historical observations of each lane $i$ follow the current Poisson distribution, yielding a p-value $p$. In the spatial dimension, three features of the most recent $k$ trajectories are calculated: trajectory density, directional mean, and directional variance. Then, the cross-entropy loss $l$ between the current trajectory features and the features within a one-hour time interval is calculated. The map topology is updated only when $p > 0.05$ and $l$ exceeds a specified threshold.
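The connectivity rule can be sketched as a directed adjacency matrix built from trajectories. This is a simplified illustration: the helper name `build_topology` is hypothetical, each trajectory is assumed to have already been mapped to the ordered list of lane instance IDs it passes through, and only consecutively traversed lanes are linked, which is one reading of the rule.

```python
import numpy as np

def build_topology(lane_sequences, n_lanes):
    """Directed lane connectivity from trajectories (hypothetical helper).

    lane_sequences: one list per trajectory holding the lane IDs it traverses,
    in driving order (assumed to come from the lane instance masks).
    Returns an n_lanes x n_lanes binary array; entry [a, b] = 1 means a -> b.
    """
    adj = np.zeros((n_lanes, n_lanes), dtype=int)
    for seq in lane_sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:               # ignore repeated samples in the same lane
                adj[a, b] = 1        # direction follows the trajectory
    return adj

# Usage: lane 0 feeds lanes 1 and 2; nothing drives from 1 back to 0
adj = build_topology([[0, 1], [0, 2], [0, 0, 1]], n_lanes=3)
```

Re-running the function on newly arrived trajectories yields a fresh matrix, which is what an update of the topology amounts to in this sketch.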

[0070] 3.4 Map Fusion

[0071] This module runs on the vehicle and is configured to merge HD maps received from the infrastructure with the onboard HD map. This step involves solving the map fusion problem, i.e., finding the coordinate transformation between the two maps and using this transformation to integrate them. Existing map fusion technologies are primarily developed for scenarios such as multi-robot collaborative SLAM, where the merged maps are occupancy grid maps. Because the map type here is different (two vectorized HD maps are merged), existing methods cannot be directly adapted to this context.

[0072] To address this, a three-stage map fusion algorithm is provided to merge infrastructure HD maps and vehicle HD maps. Figure 7 illustrates the pipeline of this module. First, the semantic attributes and spatial proximity of map elements are used to establish their correspondences. Then, directional characteristics are used to determine the transformation between the two maps based on the established correspondences, and this transformation is used to precisely align the two HD maps. Finally, each pair of corresponding map elements is refitted into a unified representation, creating the final merged HD map.

[0073] Phase 1: Map Element Correspondence. To merge two HD maps, each with multiple road elements, it is necessary to first find the correspondence between the elements of the infrastructure map and those of the vehicle map. First, an initial transformation is used to convert the infrastructure HD map from infrastructure coordinates to vehicle coordinates. This initial transformation can be easily derived from the infrastructure and vehicle poses obtained from GPS and IMU data. Due to GPS errors, this initial transformation may be inaccurate and is refined in the next phase. Then, a direct approach is adopted: each map element in the vehicle HD map is matched with its corresponding element in the infrastructure HD map. This matching is based on shared semantic labels and proximity; the corresponding element is the closest element in the infrastructure HD map that has the same semantic label. The distance between two elements is defined as the distance between their nearest endpoints. Note that an element in the infrastructure HD map may correspond to multiple elements in the vehicle HD map, especially when occlusion from the vehicle's viewpoint fragments map elements. Restricting the candidates to elements of the same semantic category improves both the accuracy and the efficiency of the matching. This simple yet effective method establishes correspondences reliably even when the initial transformation is inaccurate.
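The matching rule (same semantic label, nearest endpoints) can be sketched as follows. The names `endpoint_distance` and `match_elements` and the `(label, points)` element format are hypothetical, and the infrastructure elements are assumed to have already been moved into vehicle coordinates with the initial transformation.

```python
import numpy as np

def endpoint_distance(a, b):
    """Distance between the nearest pair of endpoints of two polylines."""
    ends_a, ends_b = a[[0, -1]], b[[0, -1]]
    d = np.linalg.norm(ends_a[:, None, :] - ends_b[None, :, :], axis=2)
    return d.min()

def match_elements(vehicle_elems, infra_elems):
    """Map each vehicle element index to an infrastructure element index.

    Each element is a (label, points) pair with points an (N, 2) array
    (assumed format). Several vehicle elements may map to the same
    infrastructure element, e.g. when occlusion fragments a line.
    """
    matches = {}
    for i, (label, pts) in enumerate(vehicle_elems):
        candidates = [(endpoint_distance(pts, ipts), j)
                      for j, (ilabel, ipts) in enumerate(infra_elems)
                      if ilabel == label]          # same semantic label only
        if candidates:
            matches[i] = min(candidates)[1]        # closest candidate wins
    return matches

# Usage with two vehicle elements and three infrastructure elements
vehicle = [("divider", np.array([[0.0, 0.0], [1.0, 0.0]])),
           ("boundary", np.array([[0.0, 5.0], [1.0, 5.0]]))]
infra = [("boundary", np.array([[0.0, 0.1], [1.0, 0.1]])),
         ("divider", np.array([[0.0, 0.2], [2.0, 0.2]])),
         ("boundary", np.array([[0.0, 5.3], [3.0, 5.3]]))]
matches = match_elements(vehicle, infra)
```

Filtering by label before computing distances keeps the nearby boundary (index 0) from being matched to the divider, even though it is geometrically closer.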

[0074] Phase 2: Map Alignment. After all corresponding element pairs are found, errors in the initial transformation are eliminated by optimizing the alignment between all element pairs. Note that alignment is performed using only linear elements (splines) because they reveal more location information. The optimal rigid transformation is found by minimizing a novel direction-aware chamfer distance between two splines. Specifically, the splines are first converted back to points by sampling at fixed intervals. Then, the tangent direction at each point is calculated and combined with the position $(u, v)$ to assign a tuple to each point, yielding two sets of tuples, one from the infrastructure and one from the vehicle. Next, in each iteration, the tuple correspondences are first determined by searching for the nearest tuple in the other set, where the distance is defined as the Euclidean distance between two tuples. Then, the rigid transformation is estimated using a weighted root-mean-square formulation, in which a hyperparameter controls the degree to which the estimate is biased towards points with similar orientations. The transformations estimated for the individual corresponding elements are averaged, and this average is used to transform the infrastructure splines. This process is then repeated iteratively. In one embodiment, convergence was found to occur within five iterations. Finally, a fine-grained transformation that aligns map elements from the infrastructure to the vehicle is obtained.
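The iterative alignment can be sketched as an ICP-style loop over (position, tangent) tuples. This is a simplified sketch under several assumptions that are not taken from the text: the weight `exp(beta * (cos - 1))` with hyperparameter `beta` is one possible way to favor similarly oriented points, the rigid transformation is estimated with a weighted Kabsch (SVD) step, and all sampled points are pooled instead of averaging per-element transformations. The function names are hypothetical.

```python
import numpy as np

def weighted_rigid_2d(P, Q, w):
    """Weighted least-squares rotation R and translation t with Q ~ P @ R.T + t."""
    w = w / w.sum()
    cp = (w[:, None] * P).sum(axis=0)
    cq = (w[:, None] * Q).sum(axis=0)
    S = (w[:, None] * (P - cp)).T @ (Q - cq)       # 2x2 weighted covariance
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cq - R @ cp

def align(P, Dp, Q, Dq, beta=2.0, iters=5):
    """Align infrastructure samples (P, Dp) to vehicle samples (Q, Dq).

    P, Q: (N, 2) and (M, 2) positions; Dp, Dq: unit tangent directions.
    beta: assumed weighting hyperparameter (larger favors similar directions).
    """
    P, Dp = P.copy(), Dp.copy()
    R_total, t_total = np.eye(2), np.zeros(2)
    for _ in range(iters):
        src, dst = np.hstack([P, Dp]), np.hstack([Q, Dq])
        dist = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
        nn = dist.argmin(axis=1)                   # nearest tuple in the other set
        cos = (Dp * Dq[nn]).sum(axis=1)
        w = np.exp(beta * (cos - 1.0))             # assumed direction-aware weight
        R, t = weighted_rigid_2d(P, Q[nn], w)
        P, Dp = P @ R.T + t, Dp @ R.T              # move the infrastructure samples
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Including the tangent direction in the nearest-neighbor search discourages matches between nearby points that belong to differently oriented lines, for example where a divider meets a crossing boundary.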

[0075] Phase 3: Map Element Refitting. Each pair of corresponding elements is merged into a unified map element. For linear elements, points are densely sampled on both elements, and the fitting process described in Section 3.2 is repeated on the combined samples. For region-based elements, the union of the pair of corresponding elements is taken. The result is an integrated HD map aligned with the vehicle's viewpoint, available for downstream tasks.

[0076] 4. Test Platform and Dataset

[0077] This section describes the VI-Map test platform and the datasets collected for evaluation.

[0078] Testing Platform. VI-Map was implemented both in a real-world setup and in the CARLA simulator for extensive data collection and evaluation. Figures 8A-8C show the real-world setup, which includes a modified vehicle and custom-built mobile poles serving as roadside infrastructure units. Each pole is equipped with two Livox AVIA LiDARs [31] at a height of 5 meters. The Livox HAP LiDARs [32] are mounted approximately 1.7 meters above the vehicle. The mobile poles are equipped with an NVIDIA Jetson Orin and 802.11ac WiFi routers for wireless communication with the vehicle. The test vehicle shown in Figures 8A-8C carries another Orin to run the online HD map construction baseline [27] and the VI-Map map fusion code. Both the vehicle and the mobile poles are equipped with a NEO-M8T GPS [49] and an HWT9052-485 IMU [52] to estimate pose.

[0079] This paper presents a cost analysis of mobile RSUs, providing valuable insights into their practical application in real-world scenarios. Table 1 compares the hardware and costs of RSUs used in existing research [25, 48] and VI-Map. Each RSU mainly comprises several components such as sensors, computing units, and communication units. Notably, the cost of RSUs has the potential to decrease further, especially as LiDAR prices continue to decline over time. To ensure comprehensive road coverage, a deployment distance of approximately 50 meters between two adjacent RSUs is recommended. Table 1. Hardware and cost comparison of RSUs used in existing studies [25, 48] and VI-Map

[0080] Existing public datasets for autonomous driving [8, 18] only provide vehicle-centric point cloud data and cannot be used to evaluate VI-Map. Therefore, two new datasets were collected, one from the real-world testing platform and the other generated with the CARLA simulator [16]. Both datasets contain point clouds from infrastructure and vehicles over constant time periods. The real-world dataset was collected from 18 road segments in two cities, covering various road types (T-junctions, intersections, curves, straightaways, etc.) and different numbers of lanes, from a single lane to six lanes. Table 2 summarizes the two datasets. Table 2. Summary of the two new datasets, where "infr." represents infrastructure and "veh." represents vehicles.

[0081] Real-world dataset. During dataset collection, the average collection time for infrastructure point clouds was 3 minutes (maximum approximately 10 minutes). The average vehicle speed was 25 km/h (maximum approximately 30 km/h). To obtain ground-truth HD maps, the point clouds of each infrastructure-vehicle pair were first registered. The fused point clouds were then projected onto the BEV and rasterized into images with a resolution of 0.15 m/pixel. Next, map geometry (polylines and polygons) was manually labeled in the images using the CVAT tool. HD maps were labeled for each pair of infrastructure-vehicle point clouds, as well as for individual vehicle point clouds with minimal overlap with the infrastructure point clouds. Independent ground-truth HD maps for infrastructure and vehicles were obtained by cropping the fused HD maps. The map topology of each road segment was manually labeled as an n×n binary array, where n is the number of lanes in the road segment and the values 1 and 0 represent connected and disconnected lanes, respectively. As a result, one infrastructure HD map and multiple vehicle HD maps were obtained for each road segment, used as training data for the geometry construction module and the online map construction baseline, respectively. Vehicles in the infrastructure point clouds were also labeled to provide training data for the AB3DMOT tracking algorithm.

[0082] CARLA dataset. A second dataset was rendered in CARLA [16]. Roadside infrastructure was placed at different locations in Town 5 of CARLA. Each infrastructure unit and ego vehicle was equipped with a LiDAR with 32 channels and a 360° field of view, consistent with mainstream autonomous driving datasets [8, 18]. Traffic flow was simulated using 200 vehicles roaming throughout the town. Map files provided by CARLA (in the .xodr file format defined by OpenDRIVE) were parsed to generate a ground-truth HD map of the entire town. Then, the local HD maps of each vehicle point cloud and each infrastructure unit were obtained by cropping the town map according to the pose of each subject. Figures 9A-9B show the locations of all infrastructure units and the global HD map. The dataset covers 50 road segments within a 439 m × 509 m town. Furthermore, CARLA can simulate events that change the road topology, such as vehicle breakdowns or accidents, which would be dangerous to stage in the real world. Vehicle breakdowns were simulated on road segments of three different road types: T-junctions, intersections, and two-lane straight roads. Different degrees of pavement marking wear were also simulated, including but not limited to mild (<20%), moderate (50%–60%), and severe (>90%).

[0083] 5. Evaluation

[0084] 5.1 Evaluation Setup and Indicators

[0085] 5.1.1 Evaluation Settings. Based on the road coverage of the infrastructure point cloud, the height H and width W of the BEV grid (see Section 3.2) were both set to 300. The geometry model of VI-Map, the vehicle tracking model [51], and the baseline model [27] were trained or fine-tuned using the two datasets described in Section 4. Training was performed on a server equipped with an Intel Xeon Silver 4210 CPU and an NVIDIA RTX 2080Ti GPU. 10-fold cross-validation was used for all three models.

[0086] 5.1.2 IoU, CD_P, CD_L, CD, P, R. Following the evaluation methods in [19, 27, 29], these widely accepted metrics are employed to assess the accuracy of map geometry. Intersection-over-union (IoU) and chamfer distance (CD) are used as metrics for linear elements (such as boundaries and dividers). For the region element, pedestrian crossings, IoU, precision (P), and recall (R) are used as metrics. Specifically, IoU is an Eulerian metric that measures the pixel-wise semantic difference between a predicted map and the ground truth, expressed as $IoU(D_P, D_G) = |D_P \cap D_G| / |D_P \cup D_G|$, where $D_P, D_G \in \mathbb{R}^{H \times W \times D}$ are dense representations of map elements (curves and polygons rasterized on the BEV grid). $D_P$ is the prediction and $D_G$ is the ground truth. $H$ and $W$ are the height and width of the BEV grid, $D$ is the number of map element categories, and $|\cdot|$ denotes the size of a set. CD is a Lagrangian metric that measures the spatial distance between vector geometries (curves or splines). $CD_{Dir}$ is the directional chamfer distance and $CD$ is the bidirectional chamfer distance. Specifically, $CD_{Dir}$ is defined as $CD_{Dir}(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2$, where $S_1$ and $S_2$ are two sets of points sampled from the predicted curve and the ground-truth curve. $CD_P$ denotes the CD from prediction to label (analogous to precision), and $CD_L$ denotes the CD from label to prediction (analogous to recall). $CD$ is defined as $CD(S_1, S_2) = CD_{Dir}(S_1, S_2) + CD_{Dir}(S_2, S_1)$. Precision and recall metrics are also defined for region elements. $CD_P$ and $CD_L$ reflect the accuracy and completeness of the predicted map, respectively.
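For reference, the two geometry metrics can be computed as follows. The sketch uses the common definitions of IoU on rasterized masks and of the directional and bidirectional chamfer distance on sampled points; the function names are hypothetical, and inputs are assumed to be NumPy arrays.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union of two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def cd_dir(s1, s2):
    """Directional chamfer distance: mean distance from each point of s1
    to its nearest point in s2."""
    d = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=2)
    return d.min(axis=1).mean()

def chamfer(s1, s2):
    """Bidirectional chamfer distance: sum of both directional terms."""
    return cd_dir(s1, s2) + cd_dir(s2, s1)

# Usage on tiny inputs
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
s_pred = np.array([[0.0, 0.0], [1.0, 0.0]])
s_gt = np.array([[0.0, 1.0]])
```

With prediction samples as `s1` and ground-truth samples as `s2`, `cd_dir(s1, s2)` plays the role of the prediction-to-label distance and `cd_dir(s2, s1)` the label-to-prediction distance.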

[0087] 5.1.3 Ride comfort, average transit time, and traffic throughput. These three user-experience-related metrics are used to evaluate the effectiveness of VI-Map. Ride comfort is measured using the vehicle's longitudinal and lateral accelerations, which are widely considered key indicators of ride comfort [23, 35]; lower acceleration values correspond to better ride comfort. Average transit time is the average time a vehicle takes to traverse a road segment, and traffic throughput is the number of vehicles that traverse a road segment within a fixed time.

[0088] 5.1.4 Response time, success time, and success rate. These three metrics are used to evaluate the performance of map topology update strategies. Response time is defined as $t_{update} - t_{change}$, where $t_{change}$ and $t_{update}$ are the moments of the road topology change and of the infrastructure map topology update, respectively. Success time is defined as $t_{success} - t_{change}$, where $t_{success}$ is the moment when the map topology is correctly updated. Since an update might be incorrect, $t_{success}$ may differ from $t_{update}$. Success rate is the number of correct topology updates divided by the total number of updates.
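A small sketch shows how the three update metrics relate to one another for a single topology change. The helper name `update_metrics` is hypothetical, and the sketch assumes that the response time refers to the first update issued after the change and that at least one update was issued.

```python
def update_metrics(t_change, update_times, correct_flags):
    """Response time, success time, and success rate for one topology change.

    t_change: moment the road topology changed (seconds).
    update_times: moments at which the map topology was updated, ascending.
    correct_flags: whether each update produced the correct topology.
    """
    response_time = update_times[0] - t_change          # first update (assumption)
    first_correct = next(
        (t for t, ok in zip(update_times, correct_flags) if ok), None)
    success_time = None if first_correct is None else first_correct - t_change
    success_rate = sum(correct_flags) / len(correct_flags)
    return response_time, success_time, success_rate

# Usage: the first update is wrong, so success lags behind the response
metrics = update_metrics(10.0, [12.0, 20.0, 31.0], [False, True, True])
```

In this example the strategy responds after 2 s but only succeeds after 10 s, which illustrates why a fast response does not imply a short success time.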

[0089] 5.2 End-to-End Evaluation

[0090] The end-to-end system performance of VI-Map was evaluated on real-world roads containing four road types: two-way single-lane intersections, T-junctions, four-lane straight roads, and curves. Figure 10 shows the trajectory of a vehicle entering and exiting an intersection with mobile poles on the roadside. Gray dots represent the vehicle point cloud along the trajectory. Colored dots represent the vehicle's GPS location, with different colors indicating the IoU of the constructed HD map. At the beginning and end of the trajectory, only the in-vehicle HD map is available, and the IoU is 40%. When the vehicle enters the infrastructure coverage area, the IoU of the fused HD map increases to 80%. The results show that VI-Map can benefit vehicles by utilizing accurate infrastructure maps, thus helping them safely navigate complex road sections. The end-to-end latency of VI-Map was also measured. VI-Map achieved an average end-to-end latency of 37 ms and a maximum of 42 ms, indicating that VI-Map can operate in real time, as most autonomous vehicles have a sensor frame rate of 10 Hz. Furthermore, a discrepancy in IoU was observed between the start and the end of the trajectory, even though only the vehicle HD map is available at both. This is because the road markings at the starting point are severely worn, which makes the IoU of the online map very low. The impact of incomplete pavement markings is assessed in Section 5.6.

[0091] 5.3 Benefits of VI-Map

[0092] This section evaluates the benefits of VI-Map for autonomous vehicles. The main observations in this section are: (i) VI-Map's accurate map geometry complements and extends the vehicle's online HD map, going beyond the limitations of single-vehicle perception. (ii) VI-Map's novel map topology enables vehicles to make better behavioral decisions under current road conditions, benefiting passengers, vehicles, and the traffic system.

[0093] 5.3.1 Extending the online HD map range. The extent to which VI-Map extends the map range was evaluated as the distance between vehicle and infrastructure changes. Figure 11 shows that the HD map fused by VI-Map can double the coverage area compared to using only the online HD map, even when the vehicle is very close to the infrastructure (e.g., <10 m). This is because infrastructure provides a wider field of view due to its height and is less likely to be occluded than vehicles. This expanded HD map coverage underlies VI-Map's benefit to many downstream tasks and to the safety of autonomous driving.

[0094] 5.3.2 Improving Ride Experience and Traffic Efficiency. VI-Map provides vehicles with the latest map topology to support downstream tasks such as decision-making and motion planning. The impact of map freshness on the behavior of autonomous vehicles, and the resulting consequences, were assessed. An intersection was constructed in CARLA, where vehicles with hard-coded artificial intelligence (AI) used HD maps and surrounding information for autonomous driving. The following steps were performed. First, all autonomous vehicles traveling in the town were equipped with the offline HD map provided by CARLA. Second, a sudden vehicle stop was staged to simulate a blocked lane, causing a change in road topology and rendering the offline HD map outdated. Data traces for the road segment before and after the topology change were recorded, including infrastructure point clouds and the timestamps, accelerations, and positions of all vehicles passing through the segment. Third, the infrastructure point clouds were input into VI-Map to generate an updated HD map. The topology of the corresponding road segment in the CARLA map was manually updated, and the new map was loaded onto all autonomous vehicles. Finally, the vehicles were driven using the new map, and new data traces were recorded.

[0095] Ride comfort, average transit time, and traffic throughput were calculated. Figure 12 shows the average ride comfort under three scenarios. The results indicate that driving with outdated maps leads to a poor ride experience due to frequent starts, stops, and sharp turns. Compared to outdated maps, VI-Map improves ride comfort by 3.9 times, reaching levels comparable to non-accident scenarios. This is because VI-Map updates the map in a timely manner, enabling the vehicle to make better decisions, resulting in a smoother and more comfortable ride. Figure 13 shows the average vehicle transit time and traffic throughput for this road segment. Compared to outdated maps, VI-Map reduces vehicle transit time to one-fifth and increases traffic throughput by two times.

[0096] 5.4 Performance of VI-Map

[0097] 5.4.1 Geometry Construction. The geometry construction of VI-Map was evaluated on the real-world testbed. VI-Map was compared with two baselines: (i) a state-of-the-art online HD map construction method called HDMapNet [27], which uses only vehicle data; and (ii) a modified HDMapNet-based method in which the raw infrastructure point cloud is fused with that of the vehicle. Figure 14 shows an example of the map construction results at an intersection for the two baselines and VI-Map. The blue and red splines and the green rectangles are taken directly from the VI-Map output. The orange arrows are manually drawn to show the topology of the system-generated map, whose original form is a graph (Section 3.3). It can be seen that the HD map generated by VI-Map covers a wider range than the vehicle-only method. Furthermore, VI-Map provides both topology and a more accurate HD map than the two baselines.

[0098] VI-Map was then evaluated on the real-world dataset. Table 3 shows the results using the metrics defined in Section 5.1.2. VI-Map outperformed the baselines on all metrics. In particular, VI-Map significantly outperformed HDMapNet, with a 16%-35% improvement in IoU. VI-Map achieves decimeter-level map accuracy (i.e., the average CD error is 0.3 m). The results also show that HDMapNet can benefit from the infrastructure point cloud, thanks to the unobstructed view of the infrastructure (20% improvement in IoU). However, VI-Map still far outperforms HDMapNet with infrastructure data, with approximately a 60% improvement in CD. This is because the overlay of two single-frame point clouds from infrastructure and vehicle is still too sparse to accurately predict the location of map elements. VI-Map utilizes accumulated point clouds and precise trajectories to reveal more details and clues for accurately locating map elements. Table 3. Geometric accuracy of HD maps on the real-world dataset. Higher IoU (%) is better. Lower CD (m) is better.

[0099] It was observed that infrastructure point clouds contributed more to HDMapNet's improvement on lane dividers than on road boundaries and pedestrian crossings. This is because lane dividers occupy a smaller area and are primarily inferred from point intensity, while overlaying infrastructure point clouds makes the intensity more prominent within a specific small area. Furthermore, the difference between CD_P and CD_L of VI-Map (approximately 13%) is much smaller than that of the baselines. This is because the HD maps generated by the two baselines may be incomplete due to sparse point clouds and potential occlusion, while the complete HD map of VI-Map ensures consistency between the two metrics. In addition, compared with the Eulerian metric IoU, VI-Map shows a greater improvement over the two baselines on the Lagrangian metric CD (35%-26% improvement vs. 65%-60% improvement). This is because VI-Map generates continuous map elements and fits them to vectorized shapes. Ablation studies were also conducted to evaluate the effectiveness of trajectories in map geometry construction. Table 3 also shows VI-Map without trajectory input, for which IoU decreased by 7% and CD increased by 50%. This result supports the design rationale that precise trajectories contain features from which map geometry can be inferred, and that combining them with point cloud features yields better performance.

[0100] 5.4.2 Topology Estimation. The performance of the HD map topology update strategy is evaluated in terms of the response time, success time, and success rate defined in Section 5.1.4. The update strategy in Section 3.3 is compared with two baselines: (i) fixed time period: the topology is updated after a fixed time period T; (ii) fixed number of trajectories: the topology is updated after observing a fixed number n of additional trajectories. For the fixed time period method, T was set to the traffic light cycle duration at intersections and to 60 seconds on straight roads. For the fixed number of trajectories method, n was set to 3 × the number of lanes. These values were selected after an extensive search for the best-performing parameters of each baseline. Since the performance of map updates can vary with road type and traffic conditions, three road conditions were evaluated: a city center intersection, a suburban T-junction, and a four-lane straight road. Five simulations were performed for each road condition, under the same conditions described in Section 5.3.2. Figures 15A-15B show the performance of VI-Map and the two baselines. In particular, Figure 15A shows the response and success times of the three update methods. VI-Map provides the shortest success time across all road conditions, although its response is not the fastest. The two baselines respond quickly but take longer to obtain the correct topology update. VI-Map considers topology updates lane by lane, while the two baselines do not. Therefore, the trajectories collected under the two baseline strategies may be unevenly distributed across lanes, with some lanes potentially having no trajectories at all, which can lead to update failures. Figure 15B shows the success rate. VI-Map achieves an average success rate of 84%, significantly exceeding the 30%-44% of the two baselines. The update strategy integrates temporal and spatial variations to detect changes in the topology and suppresses erroneous updates through cross-validation.
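The difference between the two baseline triggers and a lane-level check can be illustrated with a minimal sketch. The function names, the `min_per_lane` threshold, and the lane-level rule itself are illustrative assumptions and are not the update strategy of Section 3.3; the sketch only shows why a global trajectory count can fire while some lanes still have no observations.

```python
from collections import Counter

def fixed_period_trigger(elapsed_s, period_s):
    """Baseline (i): update once a fixed time period T has elapsed."""
    return elapsed_s >= period_s

def fixed_count_trigger(lane_ids, num_lanes, factor=3):
    """Baseline (ii): update after n = factor * num_lanes new trajectories."""
    return len(lane_ids) >= factor * num_lanes

def lane_level_trigger(lane_ids, num_lanes, min_per_lane=1):
    """Hypothetical lane-level rule: every lane needs enough trajectories."""
    counts = Counter(lane_ids)
    return all(counts.get(lane, 0) >= min_per_lane for lane in range(num_lanes))

# Six trajectories on a two-lane road, all observed in lane 0.
observed = [0, 0, 0, 0, 0, 0]
print(fixed_count_trigger(observed, num_lanes=2))  # True
print(lane_level_trigger(observed, num_lanes=2))   # False
```

In this toy case the count-based baseline would update the topology even though lane 1 has never been observed, which is the failure mode described above.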

[0101] 5.4.3 Map Fusion. The map fusion method was then evaluated on a real-world dataset. This module runs on the vehicle and is responsible for merging the infrastructure map with the vehicle map. By adapting typical map fusion methods from collaborative SLAM to this scenario, two baseline methods were implemented: a probabilistic method [5, 26, 54] and an optimization method [4, 10, 34]. For these baselines, the vectorized map was rasterized into a grid map, and the implementation followed the cited methods. Figure 16 shows the performance of the fused map obtained with these fusion methods. The curves in Figure 16 show the map IoU of the three fusion methods under different vehicle positioning errors. The green bars in Figure 16 show the distribution of GPS positioning errors. The results indicate that when the positioning error is greater than 8 m, VI-Map achieves an IoU improvement of over 40% compared with the baselines. As the positioning error increases, the performance of both baselines deteriorates sharply, while VI-Map maintains a map IoU of over 78% across all tested positioning errors. This is because VI-Map treats the maps as individual map elements and uses instance-level correspondences between these elements to precisely align and merge the two maps. In contrast, the two baseline methods treat each map as a whole and rely solely on an initial transformation to align the two maps.
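For reference, the IoU of two rasterized maps can be computed as follows. This is a minimal sketch that assumes boolean occupancy grids of equal shape; the function name `grid_iou` and the toy grids are illustrative and do not reproduce the evaluation code behind Figure 16.

```python
import numpy as np

def grid_iou(pred, truth):
    """Intersection over union of two boolean occupancy grids."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)

truth = np.zeros((8, 8), dtype=bool)
truth[:, 3:5] = True                  # a two-cell-wide lane marking
shifted = np.roll(truth, 1, axis=1)   # same marking, misaligned by one cell

print(grid_iou(truth, truth))    # 1.0
print(grid_iou(shifted, truth))  # 8 / 24, about 0.33
```

The toy example shows how quickly IoU drops when thin map elements are misaligned by a single cell, which is why fusion methods that rely only on an initial transformation are sensitive to positioning error.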

[0102] 5.5 System Overhead

[0103] 5.5.1 Vehicle-based runtime latency. The runtime latency of each VI-Map step on the vehicle and of the overall map fusion process was measured. Since the map fusion runtime can be affected by road complexity (i.e., the number of map elements), three different real-world road types were tested. Figure 17 shows the runtime of each step, with error bars representing the 5th and 95th percentiles of the runtime across all frames under the same settings. The results show that the overall onboard computation time of VI-Map is less than 50 ms for all road types, which meets the stringent real-time requirements of autonomous driving applications (prediction [17], decision making [11], and motion planning [50]). Thus, VI-Map achieves real-time HD map construction on vehicles with the assistance of roadside infrastructure. The matching step accounts for half of the total computation time because it involves calculating the distance between pairs of map elements. Notably, this step can be further accelerated by running it on a GPU. The runtimes of the transformation and alignment steps do not increase with road complexity because they are independent of the number of map elements.
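The cost of the matching step can be made concrete with a minimal sketch of a pairwise distance computation between map elements. The symmetric nearest-point (Chamfer-style) distance, the function names, and the toy lane lines are assumptions chosen for illustration; the source does not specify which distance is used.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric mean nearest-point distance between two (K, 2) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def pairwise_costs(infra_elems, vehicle_elems):
    """Cost matrix over all M x N element pairs; the work grows with M * N."""
    return np.array([[chamfer_distance(a, b) for b in vehicle_elems]
                     for a in infra_elems])

xs = np.linspace(0.0, 10.0, 11)
# Two straight lane lines per map, sampled along x; the vehicle map is offset.
infra = [np.stack([xs, np.full_like(xs, y)], axis=1) for y in (0.0, 3.5)]
vehicle = [np.stack([xs, np.full_like(xs, y)], axis=1) for y in (3.7, 0.2)]

costs = pairwise_costs(infra, vehicle)
print(costs.argmin(axis=1))  # [1 0]: closest vehicle element per infrastructure element
```

Because every infrastructure element is compared with every vehicle element, the number of distance evaluations scales with the product of the element counts, which is consistent with matching dominating the runtime on complex roads.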

[0104] 5.5.2 Communication Overhead. The communication overhead of VI-Map was compared with that of methods that transmit raw point cloud data (e.g., HDMapNet with infrastructure) or rasterized map data [4, 10, 26, 54]. Data transmission times were measured on the real-world testbed described in Section 4, where vehicles and infrastructure communicated over an 802.11ac Wi-Fi network with a bandwidth of 80 MHz. The average data volume and transmission time are shown in Table 4. VI-Map transmitted 71.2 KB of vectorized map representation (spline control points, polygon vertices, and topology) with an average transmission time of 13.6 ms. Compared with transmitting raw point cloud data, VI-Map reduced data volume and transmission time by approximately 42x and 40x, respectively, and by 22x compared with transmitting rasterized map data. Therefore, VI-Map can be deployed on a wide range of V2X (Vehicle to Everything) networks, even those with low communication bandwidth.

Table 4. Average size and transmission time of data shared over an 802.11ac network.
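The stated ratios imply rough figures for the raw point cloud alternative. The following back-of-the-envelope sketch only multiplies the numbers quoted above; the results are derived estimates rather than values taken from Table 4, and the constant names are illustrative.

```python
VECTOR_MAP_KB = 71.2     # stated size of the vectorized map representation
VECTOR_MAP_MS = 13.6     # stated average time for the vectorized map
RAW_VOLUME_RATIO = 42    # stated approximate volume reduction vs. raw point clouds
RAW_TIME_RATIO = 40      # stated approximate time reduction vs. raw point clouds

implied_raw_kb = VECTOR_MAP_KB * RAW_VOLUME_RATIO
implied_raw_ms = VECTOR_MAP_MS * RAW_TIME_RATIO

print(f"raw point cloud: ~{implied_raw_kb / 1024:.1f} MB, ~{implied_raw_ms:.0f} ms")
# raw point cloud: ~2.9 MB, ~544 ms
```

Under these assumptions, transmitting raw point clouds would take roughly half a second per update, an order of magnitude more than the sub-50 ms onboard computation time reported in Section 5.5.1.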

[0105] 5.6 System Robustness

[0106] As observed in Section 5.2, online HD maps can be significantly affected by worn road markings. In this section, VI-Map was evaluated under varying degrees of road marking incompleteness. Specifically, CARLA was used to mask different proportions of the road marking points in the point cloud. Table 5 shows that all VI-Map metrics decrease slightly and steadily as road marking incompleteness increases. VI-Map maintains acceptable performance even under the most severe occlusion and still outperforms the baselines in Table 3, which were evaluated without severe occlusion. Unlike online map construction methods that rely solely on sensor data (point clouds), VI-Map additionally uses the infrastructure's precise trajectory observations and extracts unique features valuable for map construction. This design allows VI-Map to adapt to various road marking conditions, because they do not affect the trajectory features.

Table 5. Geometric accuracy of VI-Map's HD map under different degrees of incomplete road markings.
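The masking procedure can be sketched as follows, assuming that each point carries a boolean road-marking label and that points are dropped uniformly at random. The function name `mask_road_markings`, the random-drop policy, and the synthetic data are assumptions; the source states only that different proportions of road marking points were masked.

```python
import numpy as np

def mask_road_markings(points, is_marking, drop_ratio, seed=0):
    """Remove a random fraction of the points labeled as road markings."""
    rng = np.random.default_rng(seed)
    marking_idx = np.flatnonzero(is_marking)
    n_drop = int(round(drop_ratio * marking_idx.size))
    dropped = rng.choice(marking_idx, size=n_drop, replace=False)
    keep = np.ones(len(points), dtype=bool)
    keep[dropped] = False
    return points[keep], is_marking[keep]

points = np.random.default_rng(1).uniform(0, 50, size=(1000, 3))
is_marking = np.arange(1000) < 200   # pretend the first 200 points are markings

kept, kept_labels = mask_road_markings(points, is_marking, drop_ratio=0.5)
print(len(kept), int(kept_labels.sum()))  # 900 100
```

Only marking points are removed, so the remaining point cloud and any trajectory observations are left untouched, which mirrors the robustness argument above.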

[0107] 6. Discussion

[0108] VI-Map's Scalability. VI-Map adapts to various road types and traffic scenarios. First, the BEV features used to infer HD maps are low-level features derived directly from the raw data, without any assumptions about road structure or the number of lanes. Second, busy or sparse traffic primarily affects the time needed to accumulate static point clouds and vehicle trajectories, and does not compromise the accuracy of the constructed HD map. However, VI-Map's scalability is limited when inferring 3D map elements with complex shapes, such as traffic lights, fire hydrants, or bus stops. This limitation arises mainly because the 3D data is projected into a 2D BEV space and all subsequent processing is performed in that domain, so details of the 3D geometry are lost during dimensionality reduction.

[0109] Limitations and failure cases. In adverse weather conditions (such as rain, snow, and fog), VI-Map may not achieve satisfactory performance due to significant noise in the LiDAR data. Furthermore, VI-Map may encounter challenges in high-speed driving scenarios, as increased speed leads to a significant deterioration in point cloud quality, thus impacting VI-Map performance. Nevertheless, geometry construction, topology estimation, and map fusion can be modified and applied to other sensing modalities, such as 3D cameras.

[0110] VI-Map, a system for enhancing in-vehicle HD maps by providing accurate and timely infrastructure HD maps, was developed. VI-Map was implemented end-to-end, and experimental results demonstrate that it improves upon existing HD map construction methods in terms of map geometric accuracy, map topological freshness, system robustness, and efficiency. The system and method for infrastructure-assisted real-time HD map construction can be applied to industries such as autonomous driving, intelligent transportation systems, and smart cities.

[0111] Example 1. A method for infrastructure-assisted real-time HD map construction, comprising: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end.

[0112] Example 2. The method according to Example 1, wherein constructing geometry includes: performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization.

[0113] Example 3. The method according to any of the preceding embodiments, wherein performing BEV projection includes: acquiring an accumulated point cloud and vehicle trajectories, projecting the point cloud and the trajectories onto a ground plane in a BEV view, and rasterizing the points to generate a two-dimensional (2D) grid.

[0114] Example 4. The method according to any of the preceding embodiments, wherein performing feature extraction includes: extracting one or more features for each cell, and concatenating the extracted features to form a feature map, wherein the trajectory direction vector occupies two channels in the feature map.

[0115] Example 5. The method according to any of the preceding embodiments, wherein performing BEV instance segmentation includes: configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV.

[0116] Example 6. The method according to any of the preceding embodiments, wherein the 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thereby obtaining an instance mask as output.

[0117] Example 7. The method according to any of the preceding embodiments, wherein performing map vectorization includes: vectorizing the map to generate a sparse and compact representation of the HD map.

[0118] Example 8. The method according to any of the preceding embodiments, wherein performing topology estimation based on trajectories includes: establishing and updating the map topology based on accurate vehicle trajectories and the lane instance segmentation results obtained by constructing geometry.

[0119] Example 9. The method according to any of the preceding embodiments, wherein performing map fusion at the vehicle end includes: merging the HD map received from the geometry construction and the topology estimation with the on-vehicle HD map.

[0120] Example 10. The method according to any of the preceding embodiments, wherein performing map fusion at the vehicle end includes: performing map element correspondence, map alignment, and map element refitting.

[0121] Example 11. A computer program product, comprising: a non-transitory computer-executable storage device having thereon computer-readable program instructions which, when executed by a computer, cause the computer to perform a method for infrastructure-assisted real-time HD map construction, the computer-executable program instructions including: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end.

[0122] Example 12. The computer program product according to Example 11, wherein constructing geometry includes: performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization.

[0123] Example 13. The computer program product according to any of the preceding embodiments, wherein performing BEV projection includes: acquiring an accumulated point cloud and vehicle trajectories, projecting the point cloud and the trajectories onto a ground plane in a BEV view, and rasterizing the points to generate a two-dimensional (2D) grid.

[0124] Example 14. The computer program product according to any of the preceding embodiments, wherein performing feature extraction includes: extracting one or more features for each cell, and concatenating the extracted features to form a feature map, wherein the trajectory direction vector occupies two channels in the feature map.

[0125] Example 15. The computer program product according to any of the preceding embodiments, wherein performing BEV instance segmentation includes: configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV.

[0126] Example 16. The computer program product according to any of the preceding embodiments, wherein the 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thereby obtaining an instance mask as output.

[0127] Example 17. The computer program product according to any of the preceding embodiments, wherein performing map vectorization includes: vectorizing the map to generate a sparse and compact representation of the HD map.

[0128] Example 18. The computer program product according to any of the preceding embodiments, wherein performing topology estimation based on trajectories includes: establishing and updating the map topology based on accurate vehicle trajectories and the lane instance segmentation results obtained by constructing geometry.

[0129] Example 19. The computer program product according to any of the preceding embodiments, wherein performing map fusion at the vehicle end includes: merging the HD map received from the geometry construction and the topology estimation with the on-vehicle HD map.

[0130] Example 20. The computer program product according to any of the preceding embodiments, wherein performing map fusion at the vehicle end includes: performing map element correspondence, map alignment, and map element refitting.
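The BEV projection and feature extraction steps recited in the examples above can be sketched as follows. This is a minimal illustration under stated assumptions: the grid extent, the cell size, the use of a per-cell point count as the point cloud feature, and the function name `bev_features` are all hypothetical. The source specifies only that one or more features are extracted per cell and that the trajectory direction vector occupies two channels of the feature map.

```python
import numpy as np

def bev_features(points, traj, extent=40.0, cell=0.5):
    """Rasterize 3D points and a 2D trajectory into an (H, W, 3) BEV feature map.

    Channel 0: point count per cell.
    Channels 1-2: mean unit direction (dx, dy) of trajectory segments per cell.
    """
    n = int(round(2 * extent / cell))

    def to_cells(xy):
        idx = np.floor((xy + extent) / cell).astype(int)
        ok = np.all((idx >= 0) & (idx < n), axis=1)
        return idx[ok, 1], idx[ok, 0], ok  # rows (y), cols (x), validity mask

    # Dropping z projects the points onto the ground plane.
    rows, cols, _ = to_cells(points[:, :2])
    count = np.zeros((n, n))
    np.add.at(count, (rows, cols), 1.0)

    # Unit direction of each trajectory segment, averaged per cell.
    seg = np.diff(traj, axis=0)
    unit = seg / np.maximum(np.linalg.norm(seg, axis=1, keepdims=True), 1e-9)
    rows, cols, ok = to_cells(traj[:-1])
    dir_sum = np.zeros((n, n, 2))
    hits = np.zeros((n, n))
    np.add.at(dir_sum, (rows, cols), unit[ok])
    np.add.at(hits, (rows, cols), 1.0)
    direction = dir_sum / np.maximum(hits, 1.0)[:, :, None]

    # Concatenate per-cell features along the channel axis.
    return np.concatenate([count[:, :, None], direction], axis=-1)

points = np.array([[1.0, 2.0, 0.1], [1.1, 2.1, 0.0], [-5.0, 0.0, 0.2]])
traj = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # driving along +x
feat = bev_features(points, traj)
print(feat.shape)  # (160, 160, 3)
```

A feature map of this form could then be passed to a 2D CNN for instance segmentation; the network itself is outside the scope of this sketch.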

[0131] All patents, patent applications, provisional applications, and publications mentioned or cited herein, including all figures and tables, are incorporated herein in their entirety, provided that they do not contradict the express teachings of this specification.

[0132] It should be understood that the examples and embodiments described herein are for illustrative purposes only, and various modifications or alterations that will occur to those skilled in the art should be included within the spirit and scope of this application. Furthermore, any element or limitation of any invention or embodiment thereof disclosed herein may be combined with any and / or all other elements or limitations (alone or in any combination) or any other invention or embodiment thereof disclosed herein, and all such combinations are covered within the scope of this invention without limiting it.

[0133] References

[0134] [1] Autoware. 2023. The Autoware Foundation - open source for autonomous driving. https://www.autoware.org/. (2023).

[0135] [2] Baidu. 2023. Apollo. https://apollo.auto/. (2023).

[0136] [3] Zhibin Bao, Sabir Hossain, Haoxiang Lang and Xianke Lin. 2022. High-definition map generation technologies for autonomous driving: a review. arXiv preprint arXiv:2206.05400 (2022).

[0137] [4] Andreas Birk and Stefano Carpin. 2006. Merging occupancy grid maps from multiple robots. Proc. IEEE 94, 7 (2006), 1384–1397.

[0138] [5] Jose-Luis Blanco, Javier González-Jiménez and Juan-Antonio Fernández-Madrigal. 2013. A robust, multi-hypothesis approach to matching occupancy grid maps. Robotica 31, 5 (2013), 687–701.

[0139] [6] Lidar News Blog. 2022. HD Map Database Coverage Doubled. https://blog.lidarnews.com/hd-map-database-coverage-doubled/. (2022).

[0140] [7] Michael Buchholz, Johannes Müller, Martin Herrmann, Jan Strohbeck, Benjamin Völz, Matthias Maier, Jonas Paczia, Oliver Stein, Hubert Rehborn and Rüdiger-Walter Henn. 2021. Handling occlusions in automated driving using a multiaccess edge computing server-based environment model from infrastructure sensors. IEEE Intelligent Transportation Systems Magazine 14, 3 (2021), 106–120.

[0141] [8] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan and Oscar Beijbom. 2020. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11621–11631.

[0142] [9] Yigit Baran Can, Alexander Liniger, Danda Pani Paudel and Luc Van Gool. 2021. Structured bird's-eye-view traffic scene understanding from onboard images. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 15661–15670.

[0143] [10] Stefano Carpin. 2008. Fast and accurate map merging for multi-robot systems. Autonomous Robots 25 (2008), 305–316.

[0144] [11] Xue-Mei Chen, Min Jin, Yi-song Miao and Qiang Zhang. 2017. Driving decision-making analysis of car-following for autonomous vehicle under complex urban environment. Journal of Central South University 24 (2017), 1476–1482.

[0145] [12] Cohda. 2023. Cohda Wireless MK6C EVK RSU. https://www.cohdawireless.com/solutions/hardware/mk6c-evk/. (2023).

[0146] [13] Commsignia. 2023. Commsignia Roadside Unit. https://www.commsignia.com/products/. (2023).

[0147] [14] CVAT. 2023. CVAT Annotation tool. https://www.cvat.ai/. (2023).

[0148] [15] Bert De Brabandere, Davy Neven and Luc Van Gool. 2017. Semantic instance segmentation for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 7–9.

[0149] [16] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez and Vladlen Koltun. 2017. CARLA: An open urban driving simulator. In Conference on Robot Learning. PMLR, 1–16.

[0150] [17] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R Qi, Yin Zhou et al. 2021. Large scale interactive motion forecasting for autonomous driving: The Waymo open motion dataset. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9710–9719.

[0151] [18] Andreas Geiger, Philip Lenz and Raquel Urtasun. 2012. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3354–3361.

[0152] [19] Nikhil Gosala, Kürsat Petek, Paulo LJ Drews-Jr, Wolfram Burgard and Abhinav Valada. 2023. SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14901–14910.

[0153] [20] HARMAN. 2023. HARMAN Savari StreetWAVE RSU. https://car.harman.com/solutions/connectivity/harman-savari-streetwave. (2023).

[0154] [21] Yuze He, Li Ma, Zhehao Jiang, Yi Tang and Guoliang Xing. 2021. VI-eye: semantic-based 3D point cloud registration for infrastructure-assisted autonomous driving. In Proceedings of the 27th International Conference on Mobile Computing and Networking. 573–586.

[0155] [22] Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan and Raquel Urtasun. 2019. Dagmapper: Learning to map by discovering lane topology. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2911–2920.

[0156] [23] Zhiyu Huang, Jingda Wu and Chen Lv. 2021. Driving behavior modeling using naturalistic human driving data with inverse reinforcement learning. IEEE Transactions on Intelligent Transportation Systems 23, 8 (2021), 10239–10251.

[0157] [24] Kitae Kim, Soohyun Cho and Woojin Chung. 2021. HD map update for autonomous driving with crowdsourced data. IEEE Robotics and Automation Letters 6, 2 (2021), 1895–1901.

[0158] [25] Annkathrin Krämmer, Christoph Schöller, Dhiraj Gulati, Venkatnarayanan Lakshminarasimhan, Franz Kurz, Dominik Rosenbaum, Claus Lenz and Alois Knoll. 2019. Providentia–A Large-Scale Sensor System for the Assistance of Autonomous Vehicles and Its Evaluation. arXiv preprint arXiv:1906.06789 (2019).

[0159] [26] Heon-Cheol Lee, Seung-Hwan Lee, Myoung Hwan Choi and Beom-Hee Lee. 2012. Probabilistic map merging for multi-robot RBPF-SLAM with unknown initial poses. Robotica 30, 2 (2012), 205–220.

[0160] [27] Qi Li, Yue Wang, Yilun Wang and Hang Zhao. 2022. Hdmapnet: An online HD map construction and evaluation framework. In 2022 International Conference on Robotics and Automation. IEEE, 4628–4634.

[0161] [28] Rong Liu, Jinling Wang and Bingqi Zhang. 2020. High definition map for automated driving: Overview and analysis. The Journal of Navigation 73, 2 (2020), 324–341.

[0162] [29] Yicheng Liu, Yue Wang, Yilun Wang and Hang Zhao. 2022. Vectormapnet: End-to-end vectorized hd map learning. arXiv preprint arXiv:2206.08920 (2022).

[0163] [30] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus and Song Han. 2022. BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation. arXiv preprint arXiv:2205.13542 (2022).

[0164] [31] Livox. 2023. Livox AVIA. https://www.livoxtech.com/avia. (2023).

[0165] [32] Livox. 2023. Livox HAP. https://www.livoxtech.com/hap. (2023).

[0166] [33] Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixé and Bastian Leibe. 2021. Hota: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision 129 (2021), 548–578.

[0167] [34] Xin Ma, Rui Guo, Yibin Li and Weidong Chen. 2008. Adaptive genetic algorithm for occupancy grid maps merging. In 2008 7th World Congress on Intelligent Control and Automation. IEEE, 5716–5720.

[0168] [35] Maximilian Naumann, Liting Sun, Wei Zhan and Masayoshi Tomizuka. 2020. Analyzing the Suitability of Cost Functions for Explaining and Imitating Human Driving Behavior based on Inverse Reinforcement Learning. In 2020 IEEE International Conference on Robotics and Automation. 5481–5487. https://doi.org/10.1109/ICRA40945.2020.9196795.

[0169] [36] ASAM OpenDRIVE. 2023. OpenDRIVE Format Specification. https://www.asam.net/standards/detail/opendrive/. (2023).

[0170] [37] Teddy Ort, Liam Paull and Daniela Rus. 2018. Autonomous vehicle navigation in rural environments without detailed prior maps. In 2018 IEEE International Conference on Robotics and Automation. IEEE, 2040–2047.

[0171] [38] David Pannen, Martin Liebner, Wolfgang Hempel and Wolfram Burgard. 2020. How to keep HD maps for automated driving up to date. In 2020 IEEE International Conference on Robotics and Automation. IEEE, 2288–2294.

[0172] [39] Vasyl Pihur, Susmita Datta and Somnath Datta. 2007. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics 23, 13 (2007), 1607–1615.

[0173] [40] Nicholas G Polson and Vadim O Sokolov. 2017. Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies 79 (2017), 1–17.

[0174] [41] Olaf Ronneberger, Philipp Fischer and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 234–241.

[0175] [42] Heiko G Seif and Xiaolong Hu. 2016. Autonomous driving in the iCity–HD maps as a key challenge of the automotive industry. Engineering 2, 2 (2016), 159–162.

[0176] [43] Tixiao Shan and Brendan Englot. 2018. Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 4758–4765.

[0177] [44] Shuyao Shi, Jiahe Cui, Zhehao Jiang, Zhenyu Yan, Guoliang Xing, Jianwei Niu and Zhenchao Ouyang. 2022. VIPS: real-time perception fusion for infrastructure-assisted autonomous driving. In Proceedings of the 28th International Conference on Mobile Computing and Networking. 133–146.

[0178] [45] Brian L Smith and Michael J Demetsky. 1994. Short-term traffic flow prediction models–a comparison of neural network and nonparametric regression approaches. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Vol. 2. IEEE, 1706–1709.

[0179] [46] HERE Technologies. 2018. HERE HD LiveMap, The Most Intelligent Sensor for Autonomous Driving. https://bit.ly/2Woss4K. (2018).

[0180] [47] TomTom. 2018. HD Maps - Highly Accurate Border-to-border Model of the Road. https://bit.ly/2WrI1sd. (2018).

[0181] [48] Manabu Tsukada, Takaharu Oi, Masahiro Kitazawa and Hiroshi Esaki. 2020. Networked roadside perception units for autonomous driving. Sensors 20, 18 (2020), 5320.

[0182] [49] U-BLOX. 2023. NEO-M8T GPS. https://www.u-blox.com/en/product/neolea-m8t-series. (2023).

[0183] [50] Jingke Wang, Yue Wang, Dongkun Zhang, Yezhou Yang and Rong Xiong. 2020. Learning hierarchical behavior and motion planning for autonomous driving. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2235–2242.

[0184] [51] Xinshuo Weng, Jianren Wang, David Held and Kris Kitani. 2020. Ab3dmot: A baseline for 3d multi-object tracking and new evaluation metrics. arXiv preprint arXiv:2008.08063 (2020).

[0185] [52] WIT-MOTION. 2023. HWT9052-485 IMU. https://www.wit-motion.cn/#/witmotion/product/detail?id=0e24e2a59ac94b14b3034a92e4338b2c. (2023).

[0186] [53] Ji Zhang and Sanjiv Singh. 2014. LOAM: Lidar odometry and mapping in real-time. In Robotics: Science and Systems, Vol. 2. Berkeley, CA, 1–9.

[0187] [54] Xun S Zhou and Stergios I Roumeliotis. 2006. Multi-robot SLAM with unknown initial correspondence: The robot rendezvous case. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 1785–1792.

Claims

1. A method for infrastructure-assisted real-time HD map construction, comprising: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end.

2. The method according to claim 1, wherein constructing geometry includes: performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization.

3. The method according to claim 2, wherein performing BEV projection includes: acquiring an accumulated point cloud and vehicle trajectories, projecting the point cloud and the trajectories onto a ground plane in the BEV view, and rasterizing the points to generate a two-dimensional (2D) grid.

4. The method according to claim 2, wherein performing feature extraction includes: extracting one or more features for each cell, and concatenating the extracted features to form a feature map in which the trajectory direction vector occupies two channels.

5. The method according to claim 2, wherein performing BEV instance segmentation includes: configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV.

6. The method according to claim 5, wherein the 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thereby obtaining an instance mask as output.

7. The method according to claim 2, wherein performing map vectorization includes: vectorizing the map to generate a sparse and compact representation of the HD map.

8. The method according to claim 1, wherein performing topology estimation based on trajectories includes: establishing and updating the map topology based on accurate vehicle trajectories and the lane instance segmentation results obtained by constructing geometry.

9. The method according to claim 1, wherein performing map fusion at the vehicle end includes: merging the HD map received from the geometry construction and the topology estimation with the on-vehicle HD map.

10. The method according to claim 9, wherein performing map fusion at the vehicle end includes: performing map element correspondence, map alignment, and map element refitting.

11. A computer program product, comprising: a non-transitory computer-executable storage device having thereon computer-readable program instructions which, when executed by a computer, cause the computer to perform a method for infrastructure-assisted real-time HD map construction, the computer-executable program instructions including: constructing geometry to project data into a bird's-eye view (BEV) space; performing topology estimation based on trajectories; and performing map fusion at the vehicle end.

12. The computer program product according to claim 11, wherein constructing geometry includes: performing BEV projection, feature extraction, BEV instance segmentation, and map vectorization.

13. The computer program product according to claim 12, wherein performing BEV projection includes: acquiring an accumulated point cloud and vehicle trajectories, projecting the point cloud and the trajectories onto a ground plane in the BEV view, and rasterizing the points to generate a two-dimensional (2D) grid.

14. The computer program product according to claim 12, wherein performing feature extraction includes: extracting one or more features for each cell, and concatenating the extracted features to form a feature map in which the trajectory direction vector occupies two channels.

15. The computer program product according to claim 12, wherein performing BEV instance segmentation includes: configuring a two-dimensional convolutional neural network (2D CNN) with a UNet-like structure to perform semantic instance segmentation using the feature map extracted in the BEV.

16. The computer program product according to claim 15, wherein the 2D CNN is configured to take the feature map as input and generate a pixel-wise mask for each individual road element, thereby obtaining an instance mask as output.

17. The computer program product according to claim 12, wherein performing map vectorization includes: vectorizing the map to generate a sparse and compact representation of the HD map.

18. The computer program product according to claim 11, wherein performing topology estimation based on trajectories includes: establishing and updating the map topology based on accurate vehicle trajectories and the lane instance segmentation results obtained by constructing geometry.

19. The computer program product according to claim 11, wherein performing map fusion at the vehicle end includes: merging the HD map received from the geometry construction and the topology estimation with the on-vehicle HD map.

20. The computer program product according to claim 19, wherein performing map fusion at the vehicle end includes: performing map element correspondence, map alignment, and map element refitting.