SYSTEM AND METHOD FOR SITE-SPECIFIC FINE-TUNING OF A ROAD ENGINEERING NETWORK
The LoRA technique fine-tunes transformer-based road generation networks by using prestored parameters and neighboring map tiles to enhance accuracy in complex road scenarios, addressing the limitations of current models and improving vehicle performance.
Patent Information
- Authority / Receiving Office
- DE · DE
- Patent Type
- Applications
- Current Assignee / Owner
- MERCEDES BENZ GROUP AG
- Filing Date
- 2025-10-30
- Publication Date
- 2026-07-02
AI Technical Summary
Current road generation network models, particularly those with transformer-based architectures, fail to provide accurate representations of roads in complex scenarios with dense intersections and junctions, impacting vehicle performance.
A method and system utilizing Low-Rank Adaptation (LoRA) to fine-tune transformer-based road generation networks by retrieving prestored parameter sets from servers, reconfiguring LoRA weighting parameters based on neighboring map tiles, and updating the road generation model to generate accurate road representations.
Enhances the accuracy of road representations in complex environments, improving vehicle decision-making at dense intersections and reducing computational overhead.
Abstract
Description
The present invention relates generally to techniques for road estimation and generation and in particular, but not exclusively, to a method and a system for site-specific fine-tuning of a road generation network. Typically, in autonomous vehicles, the road representations for lane / road estimation are generated by a semantic feature recognition model. Specifically, the semantic feature recognition model generates a semantic map that includes features such as, but not limited to, lane markings, curbs, pedestrian crossings, and so on. The semantic map generated by the semantic feature recognition model is localized using high-definition (HD) maps, which are usually obtained from various servers. A similar technique is described in publication US10733484B2 (hereafter referred to as the "484 publication"), which discloses techniques for improving an in-vehicle feature detector used in vehicle localization techniques.The 484 publication involves embedding a feature recognition model and pre-calculated weights into a data layer of map data representing a geographic area to generate an improved feature recognition model. However, a major limitation of using the above-mentioned technique is that the HD map information obtained from various servers is usually not available for all geographical locations and therefore has its limits in certain geographical areas or regions. To replace conventional systems that use semantic feature recognition models, road generation network models with transformer-based architectures are being intensively researched to generate HD maps while driving using inputs from camera, LiDAR sensor, RADAR sensor, etc. However, current road generation network models are unable to provide an accurate representation of roads in complex road scenarios with dense junctions and intersections, which affects various critical functions of the vehicle. Therefore, there is a need for a system for the efficient and accurate generation of a street representation. 
The present disclosure describes a method for generating a street representation. The method comprises retrieving a prestored parameter set of a map tile corresponding to the current location of a vehicle from one or more servers associated with the vehicle. The prestored parameter set includes one or more optimized hyperparameters and a low-rank adaptation (LoRA) weighting parameter. The method further comprises reconfiguring the LoRA weighting parameter for the map tile based on the one or more optimized hyperparameters. The reconfiguration includes identifying one or more neighboring map tiles for the map tile. The reconfiguration further comprises determining one or more LoRA weighting values corresponding to each of the one or more neighboring map tiles.The reconfiguration process further includes calculating the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values. The procedure also includes updating a road generation model associated with the vehicle based on the reconfigured LoRA weighting parameter to generate the road representation. In one embodiment of the present disclosure, the prestored parameter set for the map tile is generated by training a basic road generation network model. The basic road generation network model is associated with a basic weighting parameter, and for training the basic road generation network model, the LoRA weighting parameter is determined by decomposing the basic weighting parameter into a low-dimensional representation. In one embodiment of the present disclosure, identifying the one or more adjacent map tiles for the map tile comprises obtaining a subordinate map tile and a parent map tile corresponding to the current location of the vehicle. The subordinate map tile is a map tile of a predefined area corresponding to a location. The parent map tile comprises one or more subordinate map tiles. 
The identification further comprises obtaining one or more adjacent map tiles corresponding to the subordinate map tile. An adjacent map tile of the one or more adjacent map tiles shares a tile boundary with the subordinate map tile. The identification further comprises obtaining the one or more neighboring map tiles.The one or more adjacent map tiles are parent map tiles, which correspond to the one or more adjacent map tiles and at least one child map tile. In one embodiment of the present disclosure, calculating the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values comprises determining the reconfigured LoRA weighting parameter by calculating a sum of the one or more LoRA weighting values. Each of the one or more LoRA weighting values is a product of at least one optimized hyperparameter and a prestored parameter set obtained for the corresponding adjacent map tile of the one or more adjacent map tiles. In one embodiment of the present disclosure, the road generation network model is associated with a plurality of LoRA layers. Each LoRA layer of the plurality of LoRA layers is associated with a corresponding LoRA weighting parameter and one or more optimized parameters. The present disclosure describes a device for generating a street representation. The device comprises a memory and at least one processor coupled to the memory. The at least one processor is configured to retrieve a prestored parameter set of a map tile corresponding to the current location of a vehicle from one or more servers associated with the vehicle. The prestored parameter set includes one or more optimized hyperparameters and a low-rank adaptation (LoRA) weighting parameter. 
The at least one processor is further configured to reconfigure the LoRA weighting parameter for the map tile based on the one or more optimized hyperparameters.To reconfigure the LoRA weighting parameter, the at least one processor is configured to identify one or more neighboring map tiles for the map tile and to determine one or more LoRA weighting values corresponding to each of these neighboring map tiles. The at least one processor is further configured to calculate the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values. The at least one processor is also configured to update a vehicle-associated road generation network model based on the reconfigured LoRA weighting parameter to generate the road representation. In one embodiment of the present disclosure, the prestored parameter set for the map tile is generated by training a basic road generation network model. The basic road generation network model is associated with a basic weighting parameter, and for training the basic road generation network model, the at least one processor is further configured to determine the LoRA weighting parameter by decomposing the basic weighting parameter into a low-dimensional representation. In one embodiment of the present disclosure, the at least one processor is configured to obtain a child map tile and a parent map tile corresponding to the current location of the vehicle in order to identify the one or more adjacent map tiles for the map tile. The child map tile is a map tile of a predefined area corresponding to a location. The parent map tile comprises one or more child map tiles. The at least one processor is configured to obtain one or more adjacent map tiles corresponding to the child map tile. An adjacent map tile of the one or more adjacent map tiles shares a tile boundary with the child map tile. 
The at least one processor is configured to obtain the one or more neighboring map tiles.The one or more adjacent map tiles are parent map tiles, which correspond to the one or more adjacent map tiles and at least one child map tile. In one embodiment of the present disclosure, the at least one processor is further configured to determine the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values by calculating a sum of the one or more LoRA weighting values. Each of the one or more LoRA weighting values is a product of at least one optimized hyperparameter and a prestored parameter set obtained for the corresponding adjacent map tile of the one or more adjacent map tiles. In one embodiment of the present disclosure, the road generation network model is associated with a plurality of LoRA layers. Each LoRA layer of the plurality of LoRA layers is associated with a corresponding LoRA weighting parameter and one or more optimized parameters. The foregoing summary is merely exemplary and is not intended as a limitation in any way. In addition to the exemplary aspects, embodiments, and features presented, further aspects, embodiments, and features will become apparent from the drawings and the following detailed description. The accompanying drawings, which form part of this disclosure, show exemplary embodiments and, together with the description, explain the principles set forth. In the figures, the leftmost digit of a reference number identifies the figure in which the reference number first appears. The same numbers are used in all figures to reference features and components. Some embodiments of the system and / or methods according to embodiments of the present invention are now described by way of example with reference to the accompanying figures: Fig. 
1 shows an exemplary environment 100 of a vehicle 104 approaching a dense intersection, according to some embodiments of the present disclosure. Figure 2 shows a block diagram of a site-specific fine-tuning system 200, comprising a device 202 for generating a road representation, according to some embodiments of the present disclosure. Figure 3 shows an exemplary logic flow diagram 300 for generating a road representation, according to some embodiments of the present disclosure. Figure 4 shows an exemplary representation 400 of a map tile system that maps map tiles according to the current location of a vehicle 104, according to some embodiments of the present disclosure. Figure 5 shows an exemplary representation 500 of a low-rank adaptation (LoRA) technique, according to some embodiments of the present disclosure. Figure 6 shows a block diagram of a transformer-based architecture 600 of the road generation network model 314, according to some embodiments of the present disclosure. Figure 7 shows a flowchart illustrating a method 700 for generating a road representation, according to some embodiments of the present disclosure. Those skilled in the art will appreciate that all block diagrams presented here are conceptual representations of systems that embody the principles of the present invention. Likewise, it will be appreciated that all flowcharts, process diagrams, state transition diagrams, pseudocode, and the like represent various processes that can be substantially represented in a computer-readable medium and executed by a computer or processor, regardless of whether such a computer or processor is explicitly shown. In this document, the word "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present invention described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. 
While the disclosure is open to various modifications and alternative forms, one specific embodiment has been shown as an example in the drawings and is described in detail below. It should be understood, however, that the disclosure is not intended to be limited to the specific disclosed forms, but rather, on the contrary, to encompass all modifications, equivalents, and alternatives that fall within the scope of the disclosure. The terms “include,” “comprehensive,” “includes,” or other variations thereof are intended to signify non-exclusive inclusion, such that a setup, device, or process that includes a list of components or steps does not only include those components or steps but also other components or steps that are not expressly listed or inherent to that setup, device, or process. In other words, one or more elements in a system or device referred to by “includes…” does not, without further qualification, preclude the existence of other elements or additional elements in the system or process. The following detailed description of embodiments of the disclosure refers to the accompanying drawings, which form part thereof and illustrate exemplary embodiments in which the disclosure can be put into practice. These embodiments are described in sufficient detail to enable those skilled in the field to put the disclosure into practice, and it is understood that other embodiments may be used and that modifications may be made without departing from the scope of the present disclosure. The following description is therefore not to be understood in a limiting sense. As described in the Background section, road generation network models with transformer-based architectures are being intensively researched to generate HD maps while driving, using input from cameras, light detection and ranging (LiDAR) sensors, radio detection and ranging (RADAR) sensors, etc. 
However, current road generation network models are unable to provide an accurate road representation in complex road scenarios with dense intersections and junctions, which affects various critical vehicle functions. To overcome the limitations of current road generation network models, the present disclosure provides a system and a method for accurately generating a road representation, particularly in complex road scenarios such as dense junctions, intersections, etc. Specifically, the present disclosure aims to fine-tune a transformer-based road generation network model using the Low-Rank Adaptation (LoRA) technique. A detailed description of the proposed solution is provided in the following paragraphs in conjunction with Figures 1-7. Fig. 1 shows an exemplary environment 100 of an autonomous vehicle 104 approaching a dense intersection, according to some embodiments of the present disclosure. A transformer-based road generation network model (not shown) is part of an autonomous vehicle 104 that enables the generation of a road representation in the form of vectorized HD maps. As shown in road representation 102-1, conventional transformer-based road generation network models rely on vehicle sensors to predict road elements and fail to accurately represent them. For example, as shown in Fig. 1, road representation 102-1 exhibits incomplete road elements such as broken lane markings and incorrect road curvatures. On the other hand, road representation 102-2 accurately describes the dense intersection, which includes a multitude of lanes and junctions.Therefore, given the road representation 102-1, the autonomous vehicle 104 would not be able to make timely decisions regarding lane changes and other similar actions when approaching such critical and dense intersections. Consequently, conventional transformer-based road generation network models suffer from reduced performance in scenarios with complex road geometries. 
The techniques of the present disclosure aim to provide an accurate road representation in complex environments by fine-tuning a transformer-based road generation network model using location-specific information. Fig. 2 shows a block diagram of a site-specific fine-tuning system 200, comprising a device 202 for generating a street representation, according to some embodiments of the present disclosure. The system 200 according to Fig. 2 comprises a device 202 for generating a street representation. The device 202 further comprises a memory 204 and at least one processor 206 (also referred to as a "CPU" or "Central Processing Unit"). The processor 206 may include at least one data processor. The processor 206 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating-point units, graphics processing units, digital signal processing units, etc. The memory 204 may, without limitation, include storage drives, removable disk drives, etc. The memory 204 may store a collection of program or database components, including, but not limited to, an operating system, a web browser, etc. System 200 further comprises one or more servers 208 that store a map tile database from which the device 202 retrieves pre-stored information corresponding to the current location of the vehicle 104. The one or more servers 208 can be cloud-based servers and can be connected to the device 202 via a communication network. The communication network includes, but is not limited to, a direct connection, an e-commerce network, a peer-to-peer (P2P) network, a local area network (LAN), a wide area network (WAN), a wireless network (e.g., using the Wireless Application Protocol), the internet, Wi-Fi, and the like. In one embodiment, the pre-stored information comprises parameter sets corresponding to a plurality of map tiles. These aspects are explained in more detail in the following sections. 
The system 200 further comprises one or more sensors 210 of a vehicle 104 in which the system 200 is implemented. The one or more sensors 210 comprise at least one camera sensor for capturing images of a road environment while the vehicle 104 is in motion. The one or more sensors 210 further comprise at least one light detection and ranging (LiDAR) sensor that uses laser light for distance measurement and for generating high-resolution 3D maps of the environment. The system 200 comprises a road generation network model 212, which is part of the memory 204.The road generation network model 212 of the present disclosure is used to generate an accurate representation of the road and its surroundings by processing data from cameras, LiDAR sensors, and other vehicle data stored in memory 204. As explained in the previous sections, a current road generation network model is known to typically determine road boundaries (i.e., lane boundaries, curbs, road edges), detect lane markings, lane limits, and lane types, and provide a vectorized, high-resolution map representation of a road. However, conventional road generation network models perform poorly in dense geometries such as large intersections.Therefore, the road generation network model 212 of the present disclosure generates an accurate road representation for the vehicle 104 based on the receipt of information from the one or more sensors 210 and the one or more servers 208. Fig. 3 shows an exemplary logical flowchart 300 for generating a street representation, according to some embodiments of the present disclosure. To generate an accurate street representation of a road environment, a current vehicle location 302-1 is acquired by the one or more sensors 210 of the vehicle 104. Furthermore, in block 306, a parameter set corresponding to the vehicle location 302-1 is downloaded from the one or more servers 208. 
A parameter set is a package of parameters, such as weight parameters and hyperparameters, that was previously uploaded to the one or more servers 208. The one or more servers 208 store a database of parameter sets, each corresponding to a specific location on a map. The weight parameters and hyperparameters corresponding to a location are obtained during the training of the road generation network model 212. For example, a basic road generation network model is trained for a specific location to generate a road representation corresponding to that location. The basic road generation network model is trained using a basic weighting parameter. After training, one or more optimized hyperparameters are obtained. Upon completion of the training, the one or more optimized hyperparameters, along with the weighting parameters corresponding to the specific location, are uploaded to the one or more servers 208. Consequently, the one or more servers 208 store the parameter sets corresponding to all previously trained locations. In Fig. 3, the parameter set 305 for the current vehicle location 302-1 is stored in a cloud 304, which hosts the one or more servers 208. The parameter set 305 is then downloaded (in block 306) and deserialized (in block 308). During deserialization, the parameter set is converted from a serialized / compact format to a readable format. In block 310, the weighting parameters are derived from the parameter set 305 and adjusted in a series of steps, as detailed in blocks 310-1, 310-2, 310-3, and 310-4. The operations of these blocks are explained in the following paragraphs in conjunction with Fig. 4. In block 312, the adjusted weighting parameters obtained from block 310 are passed to a map decoder block 314-3 of a road generation network model 314. The road generation network model 314 is the same as the road generation network model 212 shown in Fig. 
2.The road generation network model 314 is a transformer architecture model comprising blocks 314-1 and 314-2, which denote a map encoder and a bird's-eye view (BEV) feature generator, respectively. The map encoder 314-1 receives input from a camera 302-2 and a LiDAR sensor 302-3. The BEV feature generator 314-2 takes 2D images as input and outputs a 3D frame using multimodal fusion techniques. The BEV feature generator 314-2 processes sensor data received from various vehicle sensors 210, such as location sensors and multiview camera sensors, and seamlessly fuses them on a single plane. The fused data from the BEV feature generator 314-2, along with the adjusted weighting parameters, are then passed to the map decoder block 314-3 and processed to obtain an accurate and improved road representation for the current vehicle location 302-1 (in block 316). Fig. 4 shows an exemplary representation 400 of a map tile system that maps map tiles according to a current location of a vehicle 104, according to some embodiments of the present disclosure. It is known in the prior art that the Hexagonal Hierarchical Geospatial Indexing System (H3 system) is a global grid system that divides geographic locations, such as cities, into a grid of hexagonal areas called cells or map tiles. The H3 grid system is an indexing system that identifies each grid cell by a 64-bit H3 index. Furthermore, each cell is associated with a resolution, and the H3 system supports a total of sixteen resolutions ranging from 0 to 15 (i.e., from coarser to finer). Each finer resolution has cells with one-seventh the area of the coarser resolution. For example, a parent cell with a resolution of 4 would contain a total of seven child cells, each with a resolution of 5.It should be noted that hexagons cannot be perfectly subdivided into seven hexagons, so the subordinate cells are only approximately contained within a parent cell. 
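The resolution hierarchy described above can be illustrated with a little arithmetic. The following is a minimal sketch in Python, not part of the disclosure: the function names are hypothetical, it does not call the H3 library, and it ignores both the approximate containment noted above and the small number of pentagonal cells in the real grid.

```python
# Illustrative arithmetic for an aperture-7 hierarchy such as the one described
# for the H3 grid. All names here are hypothetical helpers, not H3 API calls.
MIN_RESOLUTION = 0
MAX_RESOLUTION = 15
APERTURE = 7  # each cell nominally has seven children one resolution finer


def descendants_per_cell(parent_res: int, child_res: int) -> int:
    """Nominal number of cells at child_res inside one cell at parent_res."""
    if not (MIN_RESOLUTION <= parent_res <= child_res <= MAX_RESOLUTION):
        raise ValueError("need 0 <= parent_res <= child_res <= 15")
    return APERTURE ** (child_res - parent_res)


def relative_area(parent_res: int, child_res: int) -> float:
    """Approximate area of one child cell as a fraction of its ancestor's area."""
    return 1.0 / descendants_per_cell(parent_res, child_res)


print(descendants_per_cell(4, 5))  # children of a resolution-4 parent
print(relative_area(4, 5))         # area of one such child relative to the parent
```

For the example in the text, a resolution-4 parent nominally holds seven resolution-5 children, each with about one-seventh of the parent's area.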
In the present disclosure, the grid cell mentioned in the previous section is referred to as a map tile. In the following sections, the current map tile is the map tile that the ego vehicle is traversing; that is, the current map tile represents the current location of the ego vehicle. Once a latitude / longitude point is known, the index of the corresponding map tile can be retrieved from the H3 indexing system via integrated APIs. Figure 4 shows a map representation of a particular city. A grid comprising a multitude of map tiles is superimposed on the map. Figure 4 shows three parent map tiles 400-1, 400-2, and 400-3. The vehicle 104 is located in a current map tile 402. The current map tile 402 is associated with the parent map tile 400-2. Furthermore, the edges of the current map tile 402 border the edges of seven adjacent map tiles 404, each of which has either the same parent map tile as the current map tile 402 or a different one. In the present disclosure, the at least one processor 206 is configured to determine the current location of the vehicle 104. Furthermore, the at least one processor 206 is configured to obtain the grid information corresponding to a specific location on the map. After obtaining the grid information, the current map tile 402 is determined. To refine or adjust the parameters (shown in block 310 of Fig. 3), the at least one processor 206 is configured to obtain a child map tile 402 and a parent map tile 400-2 corresponding to the current location of the vehicle 104. Furthermore, the at least one processor 206 is configured to obtain the adjacent map tiles 404, i.e., all map tiles that share a tile boundary with the map tile 402. Based on the adjacent map tiles 404, the at least one processor 206 is configured to obtain one or more neighboring map tiles. The one or more neighboring map tiles are the parent map tiles corresponding to the one or more adjacent map tiles 404 and the map tile 402. As shown in Fig. 
4, the parent map tiles corresponding to the one or more adjacent map tiles 404 and the map tile 402 are tiles 400-1, 400-2, and 400-3. Thus, the one or more neighboring map tiles include the parent map tiles 400-1, 400-2, and 400-3. If there is more than one parent map tile, the parameter sets of these parent map tiles are combined. After the one or more neighboring map tiles have been obtained, the corresponding parameter sets for these tiles are retrieved from the one or more servers 208. If there is only one neighboring map tile, only the parameter set for that tile is retrieved from the one or more servers 208. The process of adjusting the parameters for the current location of the vehicle 104 using these parameter sets is described in the following paragraphs. Fig. 5 shows an exemplary representation 500 of a Low-Rank Adaptation (LoRA) technique, according to some embodiments of the present disclosure. LoRA is a well-known technique that utilizes the concept of rank decomposition. Rank decomposition allows a high-dimensional matrix to be represented as the product of two low-dimensional matrices. For example, a matrix W of dimension d * d can be represented by the product of two matrices A and B, where A has dimension d * r and B has dimension r * d. Thus, A and B together contain 2 * r * d trainable parameters, compared with d * d for W. In this disclosure, the LoRA technique relies on the observation that significant changes to the neural network can be captured through a low-dimensional representation. Specifically, not all trainable parameters of a network model are equally important, and a smaller subset of these trainable parameters can effectively summarize the necessary adjustments within the network model. In Fig. 5, block 502 provides an input X to an example layer of a network model. This network model can also be referred to as a basic road generation network model. 
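The rank decomposition described above can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions: the sizes d = 8 and r = 2, the random values, and all function names are chosen purely for illustration and do not come from the disclosure.

```python
import random


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random floats (nested lists)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n) and return an m x n matrix."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def trainable_parameter_counts(d, r):
    """Return (full, low_rank): d*d entries for W versus 2*r*d for A and B."""
    return d * d, 2 * r * d


d, r = 8, 2                      # toy sizes; in practice r is much smaller than d
rng = random.Random(0)
A = random_matrix(d, r, rng)     # d x r
B = random_matrix(r, d, rng)     # r x d
delta_w = matmul(A, B)           # d x d, the same shape as the pre-trained W

full, low_rank = trainable_parameter_counts(d, r)
print(len(delta_w), len(delta_w[0]))  # shape of the product A·B
print(full, low_rank)                 # parameter counts for the toy sizes
```

The saving grows with d: for a hypothetical layer with d = 1024 and r = 8, the two factors hold 16,384 values instead of 1,048,576.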
The example layer of the basic road generation network model is pre-trained with certain trainable parameters having dimensions d * d. The example layer has a weight W. The trainable parameters can also be referred to as basic weight parameters. As shown in Fig. 5, block 504, the dimensions of the trainable parameters are d * d. In block 506, the pre-trained weight W is multiplied by the input X to obtain the output W·X. In blocks 508 and 510, the LoRA technique is used to decompose the weight W into a weight ΔW. The weight ΔW is obtained by decomposing a matrix of dimensions d * d into two low-dimensional matrices A and B. Here, both A and B have a lower rank (r). In other words, since the pre-trained weight W has dimensions d * d, the decomposed weight ΔW must also have dimensions d * d. Consequently, using the LoRA technique, W is decomposed into matrices A and B. Here, A is a matrix of dimension d * r and B is a matrix of dimension r * d, as shown in Fig. 5. Thus, the number of trainable parameters is reduced from d² to 2rd, which is significantly smaller when r << d. This decomposition helps reduce the overall computational overhead that is typically associated with fine-tuning large network models. After the decomposition, the updated weight is represented as follows: W' = W + AB. In one embodiment, the updated weighting parameter W' comprises the basic weighting parameter W and the LoRA weighting parameter A·B. The LoRA weighting parameter A·B is associated with at least two hyperparameters: the rank (r) and a coefficient value alpha (α). Those skilled in the art will recognize that the updated weight W' can be determined for each example layer of the network model. In the present disclosure, taking into account Fig. 
4, the parameter sets for the one or more neighboring map tiles (400-1, 400-2, and 400-3) are retrieved for the decomposition; that is, the LoRA weighting parameter is retrieved for each map tile. Furthermore, the model weighting for the current location of the vehicle 104 is reconfigured using the previous weighting parameter and the retrieved LoRA weighting parameters. To reconfigure or update the model weighting parameters according to the current location, equation (1) is used:
W' = W + α1·L1 + α2·L2 + ... + αn·Ln    (1)
where W is the base model weight, W' is the updated model weighting parameter, n is the total number of the one or more neighboring map tiles, αi is one of the hyperparameters of the parameter set (i.e., alpha) of the i-th neighboring map tile, and Li is the LoRA weighting parameter obtained for that neighboring map tile. For example, considering the neighboring map tiles 400-1, 400-2, and 400-3 with their corresponding LoRA weighting parameters L1, L2, and L3, as well as their corresponding hyperparameters α1, α2, and α3, the updated model weighting parameter for a current map tile 402 would be the sum of the base model weight and the weight values α1·L1, α2·L2, and α3·L3 obtained for the three parent map tiles 400-1, 400-2, and 400-3. The weight value of a neighboring map tile is the product of the hyperparameter α and the LoRA weighting parameter L of that neighboring map tile. In one embodiment, the hyperparameter α and the LoRA weight L for a given tile comprise the set of all hyperparameters and the cumulative combination of all LoRA weight parameters for all example layers of the network model. Fig. 6 shows a block diagram of a transformer-based architecture 600 of the road generation network model 314, according to some embodiments of the present disclosure. In the present disclosure, the LoRA technique decomposes the pretrained parameters, i.e., the basic weighting parameters of the basic road generation network model, into a reduced set of trainable parameters (LoRA weighting parameters, including hyperparameters such as rank r and alpha α). 
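The reconfiguration step described above for the neighboring map tiles (the base weight plus one alpha-scaled LoRA weight per tile) can be sketched as follows. This is an illustrative pure-Python sketch, not the disclosed implementation: the function names, the 2 x 2 matrices, and the alpha values are assumptions made up for the example, and each LoRA weight L is assumed to be available already in full d x d form.

```python
def scale(matrix, factor):
    """Multiply every entry of a matrix (nested lists) by a scalar."""
    return [[factor * value for value in row] for row in matrix]


def add(a, b):
    """Add two matrices of identical shape entry by entry."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrix shapes do not match")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def reconfigure_weight(base_weight, tile_parameters):
    """Return W' = W + sum(alpha_i * L_i) over the neighboring tiles.

    tile_parameters is a list of (alpha, lora_weight) pairs, one pair per
    neighboring map tile; this structure is a hypothetical stand-in for the
    prestored parameter sets.
    """
    updated = [row[:] for row in base_weight]  # copy so W itself is not modified
    for alpha, lora_weight in tile_parameters:
        updated = add(updated, scale(lora_weight, alpha))
    return updated


# Toy values standing in for tiles 400-1, 400-2 and 400-3 (invented numbers).
W = [[1.0, 0.0], [0.0, 1.0]]
L1, alpha1 = [[1.0, 1.0], [1.0, 1.0]], 0.5
L2, alpha2 = [[2.0, 0.0], [0.0, 2.0]], 0.25
L3, alpha3 = [[0.0, 4.0], [4.0, 0.0]], 0.25

W_updated = reconfigure_weight(W, [(alpha1, L1), (alpha2, L2), (alpha3, L3)])
print(W_updated)
```

With a single neighboring tile the loop runs once, which matches the case in which only one parameter set is retrieved.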
As explained in the description of Fig. 5, rank r is a value associated with the decomposition matrices A and B. Alpha α is a coefficient associated with the rank value r. These trainable parameters are injected into the decoder layers of the transformer-based road generation model 600, significantly reducing the number of trainable parameters for downstream tasks. Typically, a transformer-based network model adopts an encoder-decoder paradigm, where the map encoder transforms sensor inputs into a uniform bird's-eye view (BEV) representation. The map decoder 608 uses a hierarchical query embedding scheme to explicitly encode map elements. In the present disclosure, the transformer-based network model 600 comprises an encoder, which includes a backbone module 602 and a perspective-view (PV)-to-BEV module 604. The backbone module 602 is a conventional backbone that generates multiview feature maps from the input images 601. The surround-view input images 601 are acquired by vehicle-mounted camera sensors 210. The PV-to-BEV module 604 then transforms PV image features into BEV features. A BEV grid 606 is generated by performing BEV semantic segmentation. The BEV grid 606 is a rasterized map constructed from the BEV features. The map decoder 608 comprises several decoder layers (multi-head attention layers 608-1, custom deformable attention heads 608-2, and prediction heads of a feed-forward network 608-3). In one embodiment of the present disclosure, LoRA weight parameters are added to a total of 72 linear decoder layers (4 + 2 + 3 + 3 = 12 layers, times 6) of the transformer architecture model 600: the custom MSDeformable attention layers (4), the feed-forward network (2), and the prediction heads (3 + 3). The prediction heads consist of a classification branch (3) and a regression branch (3). Thus, LoRA weight parameters are added to all 72 layers, thereby improving the performance of the model. Fig. 
7 zeigt ein Flussdiagramm zur Veranschaulichung eines Verfahrens 700 zur Generierung einer Straßendarstellung, gemäß einigen Ausführungsformen der vorliegenden Offenbarung. As shown in Fig. 7, Method 700 can comprise one or more steps. Method 700 can be described in the general context of computer-executable instructions. In general, computer-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions that perform specific functions or implement specific abstract data types. The order in which Method 700 is described is not to be construed as a restriction, and any number of the described method blocks can be combined in any order to implement the method. In addition, individual blocks can be deleted from the methods without departing from the scope of the invention described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or a combination thereof. In block 702, method 700 comprises determining the current location of the vehicle 104. The current location of the vehicle 104 is retrieved from one or more sensors 210. The operations of block 702 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2. In Block 704, the method 700 comprises retrieving a pre-stored parameter set of a map tile corresponding to the current location of the vehicle 104. The pre-stored parameter set includes one or more optimized hyperparameters and a low-rank adaptation (LoRA) weighting parameter. The operations of Block 704 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2 and also by the one or more servers 208. In this case, the pre-stored parameter set is retrieved from a cloud server of the one or more servers 208. The cloud server stores a map tile database. 
The map tile database contains parameter sets corresponding to a number of map tiles that were previously uploaded to the cloud server.

In block 706, the method 700 comprises reconfiguring the LoRA weighting parameter for the map tile based on the one or more optimized hyperparameters. The operations of block 706 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2. Blocks 708, 710, and 712 are sub-blocks of block 706.

In block 708, the method 700 comprises identifying one or more neighboring map tiles for the map tile. The operations of block 708 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2. In one embodiment of the present disclosure, identifying the one or more neighboring map tiles for the map tile comprises obtaining a child map tile and a parent map tile corresponding to the current location of the vehicle 104. The child map tile is a map tile of a predefined area corresponding to a location. The parent map tile comprises one or more child map tiles. The identification further comprises obtaining one or more adjacent map tiles corresponding to the child map tile, where each adjacent map tile shares a tile boundary with the child map tile. The identification further comprises obtaining the one or more neighboring map tiles, which are the parent map tiles corresponding to the one or more adjacent map tiles and to the child map tile.

In block 710, the method 700 comprises determining one or more LoRA weighting values corresponding to each of the one or more neighboring map tiles. The operations of block 710 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2.

In block 712, the method 700 comprises calculating the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values.
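The tile identification of block 708 can be sketched as follows. The description does not specify a tiling scheme, so this example assumes a quadtree grid in which a tile at zoom level z has indices (x, y) and its parent is the tile (x // 2, y // 2) at level z - 1, and it assumes that "shares a tile boundary" means sharing an edge. The `Tile` type and all function names are hypothetical.

```python
from typing import NamedTuple, Set


class Tile(NamedTuple):
    zoom: int
    x: int
    y: int


def parent_of(tile: Tile) -> Tile:
    """Parent tile one zoom level up in the assumed quadtree grid."""
    return Tile(tile.zoom - 1, tile.x // 2, tile.y // 2)


def edge_neighbors(tile: Tile) -> Set[Tile]:
    """Child tiles that share an edge with `tile`, clipped to the grid extent."""
    size = 2 ** tile.zoom
    candidates = [
        (tile.x - 1, tile.y), (tile.x + 1, tile.y),
        (tile.x, tile.y - 1), (tile.x, tile.y + 1),
    ]
    return {
        Tile(tile.zoom, x, y)
        for x, y in candidates
        if 0 <= x < size and 0 <= y < size
    }


def neighboring_parent_tiles(child: Tile) -> Set[Tile]:
    """Parents of the child tile and of its edge-adjacent child tiles."""
    return {parent_of(t) for t in edge_neighbors(child) | {child}}


# Hypothetical child tile containing the current vehicle location.
child = Tile(zoom=4, x=5, y=6)
parents = neighboring_parent_tiles(child)
```

For the tile chosen here, the child and its four edge neighbors fall into three distinct parent tiles, whose parameter sets would then be fetched for blocks 710 and 712.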
The operations of block 712 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2. In one embodiment of the present disclosure, calculating the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values comprises determining the reconfigured LoRA weighting parameter by calculating a sum of the one or more LoRA weighting values. Each of the one or more LoRA weighting values is a product of at least one optimized hyperparameter and a prestored parameter set obtained for the corresponding neighboring map tile of the one or more neighboring map tiles.

In block 714, the method 700 comprises updating a road generation network model based on the reconfigured LoRA weighting parameter to generate the road representation. The operations of block 714 can be performed by the device 202 (in particular by the at least one processor 206) of Fig. 2.

In one embodiment of the present disclosure, the method further comprises generating the prestored parameter set for the map tile by training a basic road generation network model 212. The basic road generation network model 212 is associated with a basic weighting parameter, and for training the basic road generation network model 212, the LoRA weighting parameter is determined by decomposing the basic weighting parameter into a low-dimensional representation. In one embodiment of the present disclosure, the road generation network model 212 comprises a map decoder that includes a plurality of LoRA layers. Each LoRA layer of the plurality of LoRA layers is associated with a corresponding LoRA weighting parameter and one or more optimized parameters.

In one embodiment, the present disclosure provides a fine-tuned road generation network model that generates accurate road representations even in dense and complex environments.
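To give a sense of scale for the LoRA layers described above, the following sketch compares the number of trainable weights when a dense layer is fine-tuned directly with the number when only low-rank factors are trained. The count of 72 adapted layers comes from the description; the layer size (256 × 256), the rank (8), and the function names are assumptions made only for this illustration, and real decoder layers will differ in size.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when a dense d_out x d_in layer is fine-tuned directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights of the factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical sizes: 72 adapted layers (the count given in the description),
# each assumed to be 256 x 256 and adapted with rank 8.
NUM_LAYERS, D_IN, D_OUT, RANK = 72, 256, 256, 8

full_total = NUM_LAYERS * full_params(D_IN, D_OUT)
lora_total = NUM_LAYERS * lora_params(D_IN, D_OUT, RANK)
ratio = lora_total / full_total

print(f"full fine-tuning: {full_total:,} trainable weights")
print(f"LoRA (rank {RANK}): {lora_total:,} trainable weights ({ratio:.2%})")
```

With these assumed sizes, the low-rank factors amount to 294,912 trainable weights against 4,718,592 for the dense layers, i.e. 6.25%. The actual saving depends on the real layer dimensions and the chosen rank.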
In another embodiment, the present disclosure utilizes the LoRA technique to significantly reduce the overall computational overhead associated with fine-tuning large network models.

The terms “an embodiment”, “embodiment”, “embodiments”, “the embodiment”, “the embodiments”, “one or more embodiments”, “some embodiments”, and “one embodiment” mean “one or more (but not all) embodiments of the invention(s)”, unless expressly stated otherwise. The terms “including”, “comprising”, “having”, and variations thereof mean “including, but not limited to”, unless expressly stated otherwise. The enumerated list of items does not imply that any or all items are mutually exclusive, unless expressly stated otherwise. The terms “a”, “an”, and “the” mean “one or more”, unless expressly stated otherwise.

The description of an embodiment with several communicating components does not imply that all of these components are required. On the contrary, various optional components are described to illustrate the wide variety of possible embodiments of the invention. Where a single device or item is described herein, it will be apparent that more than one device / item (regardless of whether they interact) may be used instead of a single device / item. Likewise, where more than one device / item is described herein (regardless of whether they interact), it will be apparent that a single device / item may be used instead of the more than one device / item, or that a different number of devices / items may be used instead of the number of devices or programs shown. The functionality and / or features of a device may alternatively be embodied by one or more other devices that are not expressly described as having such functionality / features. Therefore, other embodiments of the invention need not include the device itself.

Finally, the language used in the specification was chosen primarily for readability and instructional purposes and may not have been chosen to define or limit the inventive subject matter.
Therefore, it is intended that the scope of the invention is not limited by this detailed description, but rather by any claims granted in an application based thereon. Accordingly, the embodiments of the present invention are intended to illustrate, but not limit, the scope of the invention as set out in the following claims.

Reference numbers:
100, 400 Example Environments
300, 500, 700 Methods
102-1, 102-2, 316 Road Representations
104 Vehicle
200 System
202 Device
204 Memory
206 Processor
208 One or More Servers
210 One or More Sensors
212, 314, 600 Road Generation Network Model
302-1, 302-2, 302-3 Current Vehicle Location, Camera and LiDAR Sensors
304 Cloud Server
305 Parameter Set
306, 308, 310, 312, 310-1, 310-2, 310-3, 502-514, 702-714 Process Blocks
314-1, 602, 604 Map Encoder and Components
314-2, 606 BEV Feature Generator / Detector
314-3, 608 Map Decoder
400-1, 400-2, 400-3 Parent Map Tiles
402 Current Map Tile
404 Adjacent Map Tiles
601 Input Images

DOCUMENTS CITED IN THE DESCRIPTION
This list of documents cited by the applicant was automatically generated and is included solely for the reader's convenience. The list is not part of the German patent or utility model application. The DPMA accepts no liability for any errors or omissions.
Cited patent literature
US 10733484B2
[0002]
Claims
1. A method (700) for generating a road representation, the method comprising: obtaining (704) a prestored parameter set of a map tile corresponding to a current location of a vehicle (104) from one or more servers (208) associated with the vehicle (104), wherein the prestored parameter set comprises one or more optimized hyperparameters and a low-rank adaptation (LoRA) weighting parameter; reconfiguring (706) the LoRA weighting parameter for the map tile based on the one or more optimized hyperparameters, wherein the reconfiguring comprises: identifying (708) one or more neighboring map tiles for the map tile; determining (710) one or more LoRA weighting values corresponding to each of the one or more neighboring map tiles; and calculating (712) the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values; and updating (714) a road generation network model (212) associated with the vehicle (104) based on the reconfigured LoRA weighting parameter to generate the road representation.

2. The method of claim 1, wherein the prestored parameter set for the map tile is generated by training a basic road generation network model, wherein the basic road generation network model is associated with a basic weighting parameter, and wherein, for the training of the basic road generation network model, the method comprises determining the LoRA weighting parameter by decomposing the basic weighting parameter into a low-dimensional representation.

3. The method of claim 1, wherein identifying the one or more neighboring map tiles for the map tile comprises: obtaining a child map tile and a parent map tile corresponding to the current location of the vehicle (104), wherein the child map tile is a map tile of a predefined area corresponding to a location, and wherein the parent map tile comprises one or more child map tiles; obtaining one or more adjacent map tiles corresponding to the child map tile, wherein an adjacent map tile of the one or more adjacent map tiles shares a tile boundary with the child map tile; and obtaining the one or more neighboring map tiles, wherein the one or more neighboring map tiles are parent map tiles corresponding to the one or more adjacent map tiles and the child map tile.

4. The method of claim 1, wherein calculating the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values comprises: determining the reconfigured LoRA weighting parameter by calculating a sum of the one or more LoRA weighting values, wherein each of the one or more LoRA weighting values is a product of at least one optimized hyperparameter and a prestored parameter set obtained for the corresponding neighboring map tile of the one or more neighboring map tiles.

5. The method of claim 1, wherein the road generation network model (212) is associated with a plurality of LoRA layers, wherein each LoRA layer of the plurality of LoRA layers is associated with a corresponding LoRA weighting parameter and one or more optimized parameters.

6. A device (202) for generating a road representation, the device comprising: a memory (204); and at least one processor (206) coupled to the memory (204), the at least one processor (206) being configured to: obtain a prestored parameter set of a map tile corresponding to a current location of a vehicle (104) from one or more servers (208) associated with the vehicle (104), the prestored parameter set comprising one or more optimized hyperparameters and a low-rank adaptation (LoRA) weighting parameter; reconfigure the LoRA weighting parameter for the map tile based on the one or more optimized hyperparameters, for which the at least one processor (206) is configured to: identify one or more neighboring map tiles for the map tile; determine one or more LoRA weighting values corresponding to each of the one or more neighboring map tiles; and calculate the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values; and update a road generation network model (212) associated with the vehicle (104) based on the reconfigured LoRA weighting parameter to generate the road representation.

7. The device of claim 6, wherein the prestored parameter set for the map tile is generated by training a basic road generation network model, wherein the basic road generation network model is associated with a basic weighting parameter, and wherein, for training the basic road generation network model, the at least one processor (206) is further configured to determine the LoRA weighting parameter by decomposing the basic weighting parameter into a low-dimensional representation.

8. The device of claim 7, wherein, to identify the one or more neighboring map tiles for the map tile, the at least one processor (206) is configured to: obtain a child map tile and a parent map tile corresponding to the current location of the vehicle (104), wherein the child map tile is a map tile of a predefined area corresponding to a location, and wherein the parent map tile comprises one or more child map tiles; obtain one or more adjacent map tiles corresponding to the child map tile, wherein an adjacent map tile of the one or more adjacent map tiles shares a tile boundary with the child map tile; and obtain the one or more neighboring map tiles, wherein the one or more neighboring map tiles are parent map tiles corresponding to the one or more adjacent map tiles and the child map tile.

9. The device of claim 6, wherein, to calculate the reconfigured LoRA weighting parameter based on the determined one or more LoRA weighting values, the at least one processor (206) is further configured to determine the reconfigured LoRA weighting parameter by calculating a sum of the one or more LoRA weighting values, wherein each of the one or more LoRA weighting values is a product of at least one optimized hyperparameter and a prestored parameter set obtained for the corresponding neighboring map tile of the one or more neighboring map tiles.

10. The device of claim 6, wherein the road generation network model (212) is associated with a plurality of LoRA layers, each LoRA layer of the plurality of LoRA layers being associated with a corresponding LoRA weighting parameter and one or more optimized parameters.