Lane attribute creation system and creation method, and computer program product

By combining map element encoders and large language models with multimodal data to generate lane attribute data, the problem of high-definition maps being unable to identify detailed driving rules at the traffic rule layer is solved, enabling the autonomous driving system to drive safely in complex traffic environments.

WO2026137952A1 · PCT designated stage · Publication Date: 2026-07-02 · BEIJING AUTONAVI YUNMAP TECH CO LTD

Patent Information

Authority / Receiving Office: WO
Patent Type: Application
Current Assignee / Owner: BEIJING AUTONAVI YUNMAP TECH CO LTD
Filing Date: 2025-09-02
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Existing high-definition maps only describe lane direction markings at the traffic rule layer, and cannot identify and match detailed driving rules for traffic signs other than direction markings, which limits the safety of autonomous driving systems in complex traffic environments.

Method used

The vectorized map is processed by a map element encoder, and lane attribute data is generated by combining a large language model, including driving rules and the correspondence between driving rules and lanes. Multimodal data such as image and text encoding results are used to achieve accurate matching of lane information.

Benefits of technology

It provides detailed and accurate traffic rule-based data to support safe decision-making by autonomous driving systems in complex traffic environments, ensuring the correct matching of driving rules and lanes.

✦ Generated by Eureka AI based on patent content.

Abstract

Disclosed in the present disclosure are a lane attribute creation system and creation method, and a computer program product. The lane attribute creation system comprises: a map element encoder and a large language model, wherein an output layer of the map element encoder is connected to an input layer of the large language model, the map element encoder is used for processing a vectorized map to output a vector encoding result, the vectorized map uses vector features to represent map elements, and the map elements include lanes; and the input layer of the large language model further receives at least one of an image encoding result and a text encoding result, and the large language model is configured to generate lane attribute data on the basis of the vector encoding result and at least one of the image encoding result and the text encoding result. The solution in the embodiments of the present disclosure can process data of multiple modalities including a vector modality, so as to match travel rules to corresponding lanes, thereby providing accurate and detailed lane attribute data for constructing a traffic rule layer.

Description

Lane attribute creation system, creation method and computer program product

[0001] This disclosure claims priority to Chinese Patent Application No. 202411960752.2, filed on December 27, 2024, entitled "Lane Attribute Generation System, Generation Method and Computer Program Product", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure generally relates to the field of map technology. More specifically, this disclosure relates to a lane attribute creation system, creation method, and computer program product.

Background Technology

[0003] With the rapid development of technologies such as artificial intelligence and 5G (5th Generation Mobile Communication Technology), autonomous driving and intelligent transportation systems have progressed to the application stage. The rapid development of these systems places higher demands on the reliability and accuracy of navigation data, requiring precise navigation data to enable vehicle perception, localization, path planning, and decision-making control. High-definition (HD) maps, with their detailed representation of road elements, have become a crucial component supporting these systems. The geometric layer of HD maps provides information such as lane dividers and lane centerlines, the connectivity layer provides lane relationships to facilitate path planning, and the traffic rule layer provides lane-related rule information for decision-making and control.

[0004] However, while HD maps currently used in autonomous driving systems perform well at the geometry and connectivity layers, they suffer from the following shortcomings at the traffic rule layer: First, the traffic rule layer they construct only describes lane direction markings, such as straight lanes, left-turn lanes, and / or right-turn lanes. In reality, there are many more types of traffic signs and their corresponding driving rules in vehicle driving scenarios, such as bus lanes and / or speed-limited areas. Second, even if some related technologies can identify traffic signs other than direction markings, they can only identify the type of traffic sign, but cannot form the detailed rules required for autonomous driving based on them and match them to the corresponding lanes as the basis for autonomous driving decisions.

[0005] In view of this, there is an urgent need for a lane attribute creation scheme that meets the map data requirements of autonomous driving scenarios.

Summary of the Invention

[0006] In order to at least address one or more of the technical issues mentioned above, this disclosure proposes a lane attribute creation scheme in several aspects.

[0007] In a first aspect, this disclosure provides a lane attribute creation system comprising a map element encoder and a large language model. The output layer of the map element encoder is connected to the input layer of the large language model. The map element encoder is used to process a vectorized map to output a vector encoding result; the vectorized map uses vector features to represent map elements, and the map elements include lanes. The input layer of the large language model also receives at least one of an image encoding result and a text encoding result. The large language model is configured to generate lane attribute data based on the vector encoding result and at least one of the image encoding result and the text encoding result, the lane attribute data including vehicle driving rules and the correspondence between the driving rules and lanes.

[0008] In a second aspect, this disclosure provides a lane attribute creation method applied to a lane attribute creation system, the lane attribute creation system comprising: a map element encoder and a large language model, the output layer of the map element encoder being connected to the input layer of the large language model; the lane attribute creation method comprising: receiving multimodal data, the multimodal data including at least one of image encoding results and text encoding results, and a vectorized map; processing the vectorized map using the map element encoder to obtain vector encoding results, wherein the vectorized map uses vector features to represent map elements, the map elements including lanes; and processing at least one of the image encoding results and text encoding results, and the vector encoding results using the large language model to generate lane attribute data, wherein the lane attribute data includes vehicle driving rules and the correspondence between driving rules and lanes.

[0009] In a third aspect, this disclosure provides a computer program product including a computer program that, when executed by a processor, implements the method steps of the second aspect.

[0010] In a fourth aspect, this disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the method steps of the second aspect.

[0011] With the lane attribute creation system provided above, embodiments of this disclosure use the map element encoder to encode the vector modal map data provided by the vectorized map, for example the vector features of lanes, to obtain vector encoding results. The vector encoding results and the encoding results of other modal data are then processed by a large language model, which performs well at acquiring implicit knowledge. The large language model can process vector modal data together with at least one of image modal data and text modal data, combining multiple modalities to create lane attribute data. The lane attribute data provides not only vehicle driving rules but also the correspondence between those rules and specific lanes, so it can be used to construct an accurate and reliable traffic rule layer in HD maps. HD maps with an accurate traffic rule layer can be provided to autonomous driving and intelligent transportation systems, enabling them to make driving decisions that conform to the driving rules of the current lane based on the lane attribute data.

Brief Description of the Drawings

[0012] The above and other objects, features, and advantages of exemplary embodiments of this disclosure will become readily apparent upon reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of this disclosure are illustrated by way of example and not limitation, and like or corresponding reference numerals denote like or corresponding parts, wherein:

[0013] Figure 1 shows a schematic diagram of the composition structure of an existing high-definition map;

[0014] Figure 2 shows an exemplary structural diagram of a lane attribute creation system according to some embodiments of this disclosure;

[0015] Figure 3 shows an exemplary structural diagram of a lane attribute creation system according to other embodiments of this disclosure;

[0016] Figure 4 illustrates an exemplary workflow diagram of a lane attribute creation system according to other embodiments of this disclosure;

[0017] Figure 5 shows an exemplary structural diagram of a lane attribute creation system according to some embodiments of this disclosure;

[0018] Figure 6 shows an exemplary structural diagram of a map element encoder according to some embodiments of this disclosure;

[0019] Figure 7 illustrates an exemplary workflow diagram of a map element encoder according to other embodiments of this disclosure;

[0020] Figure 8 shows an exemplary flowchart of a lane attribute creation method according to some embodiments of this disclosure;

[0021] Figure 9 shows an exemplary flowchart of a method for obtaining vector encoding results according to some embodiments of this disclosure;

[0022] Figure 10 shows an exemplary flowchart of a method for generating lane attribute data according to some embodiments of this disclosure;

[0023] Figure 11 shows an exemplary flowchart of a training method for RuleVLM according to some embodiments of this disclosure;

[0024] Figure 12 shows an exemplary structural block diagram of an electronic device according to an embodiment of this disclosure.

Detailed Description

[0025] The technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, not all of them. Based on the embodiments in this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0026] It should be understood that the terms “comprising” and “including” used in this disclosure and claims indicate the presence of the described features, integers, steps, operations, elements and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or collections thereof.

[0027] It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this disclosure. As used in this disclosure and claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in this disclosure and claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations.

[0028] As used in this specification and claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if [described condition or event] is detected" may be interpreted, depending on the context, as "once determined," "in response to determination," "once [described condition or event] is detected," or "in response to detection of [described condition or event]."

[0029] The specific embodiments disclosed herein will now be described in detail with reference to the accompanying drawings.

[0030] Exemplary application scenarios

[0031] With the rapid development of autonomous vehicles and intelligent transportation systems, the demand for accurate and reliable navigation data is becoming increasingly urgent. High-definition (HD) maps, with their detailed representation of road elements, have become an indispensable supporting component for these systems. HD maps can be broken down into three core layers: the geometry layer, the connectivity layer, and the traffic rules layer. Figure 1 shows a schematic diagram of the composition structure of an existing HD map. As shown in Figure 1, the geometry layer provides accurate vector data, such as lane dividers and lane centerlines; the connectivity layer clarifies the relationships between lanes to assist in route planning; and the traffic rules layer contains lane-related rule information, such as high-occupancy vehicle lanes, bus lanes, and speed-limited zones, providing information support for compliance of driving behavior.

[0032] Currently, HD maps used in autonomous driving systems have the following shortcomings in their traffic rule layer. First, the traffic rule layer they construct only describes lane direction markings, such as straight lanes, left-turn lanes, and / or right-turn lanes. In actual driving scenarios, however, there are many more types of traffic signs and corresponding driving rules, such as high-occupancy vehicle lanes, bus lanes, and / or speed-limited zones. Second, even if some related technologies can identify traffic signs other than direction markings, they can only identify the type of traffic sign; they cannot form the detailed rules required for autonomous driving and match them to the corresponding lanes as a basis for autonomous driving decisions. This means that although maps can provide static structural information about roads, they lack the ability to dynamically update and interpret traffic rules, cannot provide a structured description consistent with HD map standards, and cannot support comprehensive autonomous driving applications, thus restricting the driving safety of autonomous vehicles in complex and changing traffic environments.

[0033] Exemplary application scheme

[0034] In view of this, the present disclosure provides a lane attribute creation scheme, which uses a map element encoder to process the vector modal data in the vectorized map and provides the resulting vector encoding results to a large language model, so that the large language model can generate lane attribute data based on data of multiple modalities, including vector modal data, to complete the construction of the traffic rule layer.

[0035] Figure 2 illustrates an exemplary structural diagram of a lane attribute creation system according to some embodiments of this disclosure. As shown in Figure 2, the lane attribute creation system includes a map element encoder and a large language model. The output layer of the map element encoder is connected to the input layer of the large language model. The map element encoder is used to process a vectorized map to output vector encoding results. In addition to the vector encoding results, the input layer of the large language model also receives at least one of image encoding results and text encoding results. The large language model is configured to generate lane attribute data based on the vector encoding results and at least one of the image encoding results and text encoding results. In this embodiment of the disclosure, the lane attribute data includes vehicle driving rules and the correspondence between driving rules and lanes.

[0036] From the above description of the inputs to the large language model, it can be understood that the large language model in this embodiment can process vector modal data together with image modal data, vector modal data together with text modal data, or all three modalities at once. It can thus output the detailed rules required for autonomous driving, namely the lane attribute data described in the previous embodiments. The lane attribute data provides not only driving rules but also the lane to which each driving rule applies. For example, if the lane attribute data describes a driving rule as "Bus lanes are not allowed to be used by other motor vehicles between 7:30 AM and 8:30 AM," and also describes the lane corresponding to this driving rule as the rightmost lane, then the autonomous driving system can derive the following from the lane attribute data: between 7:30 AM and 8:30 AM, the rightmost lane on the map may not be used by other motor vehicles.
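
As a minimal sketch of how such lane attribute data might be consumed, the bus lane example can be written as a rule paired with a lane. The class names, field names, and the `may_use_lane` helper are hypothetical and are not taken from the disclosure; the time window is treated as inclusive by assumption.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class DrivingRule:
    """A time-windowed lane restriction (all field names are hypothetical)."""
    description: str
    restricted_from: time
    restricted_to: time
    exempt_vehicle_types: frozenset


@dataclass(frozen=True)
class LaneAttribute:
    """Pairs one driving rule with the lane it applies to."""
    lane_id: str
    rule: DrivingRule


def may_use_lane(attr: LaneAttribute, vehicle_type: str, now: time) -> bool:
    """Return True if the vehicle may use the lane at the given time."""
    rule = attr.rule
    in_window = rule.restricted_from <= now <= rule.restricted_to
    return (not in_window) or vehicle_type in rule.exempt_vehicle_types


bus_lane_rule = DrivingRule(
    description="Bus lane: no other motor vehicles between 7:30 AM and 8:30 AM",
    restricted_from=time(7, 30),
    restricted_to=time(8, 30),
    exempt_vehicle_types=frozenset({"bus"}),
)
rightmost = LaneAttribute(lane_id="rightmost", rule=bus_lane_rule)
```

Because the rule is attached to a single lane identifier, a planner that queries `may_use_lane` for the other lanes (which carry no such record) is unaffected by the restriction.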

[0037] Large Language Models (LLMs) are complex machine learning models trained to process and generate natural language. They are typically based on deep learning techniques and are capable of understanding and generating text data. Due to their powerful language understanding capabilities, LLMs can comprehend the context of input data, capturing not only the relationships within the input data but also understanding its meaning within a specific context. Furthermore, this disclosure embodiment may employ Multimodal Large Language Models (MLLMs) as the large language model. MLLMs possess powerful performance in processing multimodal data and acquiring implicit knowledge of modality alignment. They combine the natural language processing capabilities of large language models with their ability to understand and generate multimodal data, enabling the establishment of connections between different modalities. This allows them to perform tasks requiring the understanding and generation of content across multiple data types, such as analyzing text descriptions and / or images of traffic signs to identify corresponding driving rules. As an example, the large language model in this disclosure embodiment can be Qwen-VL (Qwen Large Vision Language Model), which is an open-source large-scale visual language model that can take images and text as input and output text, supporting analysis tasks in various scenarios such as knowledge question answering, image title generation, image question answering, document question answering, and fine-grained visual localization.

[0038] Vectorized maps are a type of map data that utilizes vector features for map creation and element display. They construct maps by representing multiple map elements as vectors. Map elements can include, but are not limited to, lanes and sidewalks. For a lane, this can be further subdivided into sub-elements such as lane centerlines and lane dividers. Furthermore, a wide range of map elements can be abstracted into a unified point sequence representation. Based on geometric features, map elements can be divided into three main categories: linear elements, discrete elements, and region elements. Linear elements mainly include lane dividers and lane centerlines. By setting sampling points at fixed intervals on these linear elements, a corresponding point sequence representation can be obtained. Discrete elements, for example, can include all regularly shaped elements, such as speed bumps and arrows indicating lane directions. These discrete elements can be represented by the four corner points of their bounding boxes; the order of these corner points, or the order of the corner point sequence, reflects the element's direction. Region elements, for example, include closed-shaped regions, such as sidewalks and detour areas. By setting sampling points at fixed intervals on the boundaries of these regions, the points on the boundaries can be converted into an ordered point sequence. By using a unified point sequence representation, accurate geometric representations of various map elements can be provided on vectorized maps.
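
The fixed-interval sampling described above for linear elements can be sketched as follows. This is an illustration under stated assumptions: the function name `resample_polyline`, the 2D coordinates, and the choice to always keep the first and last vertices are not specified in the disclosure.

```python
import math


def resample_polyline(points, interval):
    """Place sampling points every `interval` units of arc length along a polyline.

    The first vertex is always kept; the last vertex is appended if the
    final sample does not already coincide with it.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    samples = [points[0]]
    dist_to_next = interval  # arc length remaining until the next sample
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = interval
        dist_to_next -= seg_len - pos
    if samples[-1] != points[-1]:
        samples.append(points[-1])
    return samples


# A straight 10-unit lane centerline sampled every 2.5 units.
centerline = resample_polyline([(0, 0), (10, 0)], 2.5)
```

A region element could plausibly be handled by the same routine after appending the first boundary vertex to the end of the boundary, so that the closed outline is traversed as one polyline.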

[0039] In some embodiments disclosed herein, the vectorized map provides vector modal data for the model. The map element encoder takes the vectorized map as input and extracts lane-related information, forming a vector encoding result that contains lane information and is provided to the large language model. The large language model can extract lane information features from the vector encoding result. These features provide the large language model with information such as lane location, number of lanes, and lane distribution. If the large language model acquires only driving rule features and no lane information features, it knows only that some driving rule applies within the area represented by the map, but it cannot determine which lane the rule applies to. A traffic rule layer constructed this way will lead to anomalies. For ease of understanding, consider again the driving rule "Bus lanes are not allowed to be used by other motor vehicles between 7:30 AM and 8:30 AM." If this rule is matched to the entire area represented by the map, the autonomous driving system will conclude that all vehicles are prohibited from passing between 7:30 AM and 8:30 AM, and it cannot generate a feasible driving route. If this rule is matched to a lane at random, it may not be matched to the correct lane, and the autonomous driving system might generate a route that travels in the bus lane between 7:30 AM and 8:30 AM, leading to a traffic violation. In other words, the map element encoder is the network structure in the lane attribute creation system that processes vector modal data to extract and integrate lane-related information from the vectorized map.

[0040] As described above, the map element encoder provides lane information features for the large language model to generate lane attribute data. To generate lane attribute data, the large language model also needs to acquire driving rule features. In this disclosed embodiment, driving rule features can be obtained based on image modal data and / or text modal data. In some embodiments, image modal data can be encoded into image encoding results by a visual encoder, and text modal data can be encoded into text encoding results by a text encoder. The large language model can then extract driving rule features based on the image encoding results and / or text encoding results. It should be noted that there are various existing schemes for extracting driving rule features from image encoding results and / or text encoding results, such as the visual language model Qwen-VL, which will not be elaborated upon here.

[0041] As can be further seen from the above, when the large language model performs the action of generating lane attribute data based on at least one of image encoding results and text encoding results and vector encoding results, the large language model is also configured to: extract driving rule features based on image encoding results and / or text encoding results; extract lane information features based on vector encoding results; and match driving rule features with lane information features to generate lane attribute data.

[0042] Compared to related technologies, the lane attribute creation system disclosed in this embodiment introduces a map element encoder into a large language model to process vectorized maps. This allows the large language model not only to generate individual driving rule features based on image modal data and / or text modal data, but also to use vector modal data to match driving rule features to lane information features, thereby mapping driving rules to specific lanes. This results in complete, accurate, and detailed lane attribute data, facilitating the construction of the traffic rule layer for HD maps supporting fully autonomous driving applications. As an example, the large language model can match driving rule features to lane information features using association head prediction technology.

[0043] Association head prediction is a technique in computer vision and machine learning, typically used in object detection or tracking models. It involves using a network layer with an association head to predict features related to the target. The association head usually consists of a fully connected or convolutional layer that receives feature input from other parts of the model and outputs target-related features. In other words, a large language model can use the association head to find vector features representing lane information that are related to driving rule features. Or, a large language model can use the association head to find driving rule features related to vector features, thus associating driving rule features with vector features representing lane information. This results in lane attribute data, including driving rules and their correspondence with lanes on a vectorized map.
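
The disclosure does not specify the internal form of the association head beyond a fully connected or convolutional layer. As one possible illustration, the sketch below scores each lane feature against a driving rule feature with a single learned matrix followed by a sigmoid. The bilinear form, the function names, the threshold, and the toy feature values are all assumptions.

```python
import math


def association_scores(rule_feat, lane_feats, weight, bias=0.0):
    """Score each lane against one rule: sigmoid(lane . (W @ rule) + bias)."""
    # Project the rule feature once with the (assumed) learned matrix W.
    projected = [sum(w * r for w, r in zip(row, rule_feat)) for row in weight]
    scores = []
    for lane in lane_feats:
        logit = sum(p * v for p, v in zip(projected, lane)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def match_lanes(rule_feat, lane_feats, weight, threshold=0.5):
    """Return indices of lanes whose association score exceeds the threshold."""
    scores = association_scores(rule_feat, lane_feats, weight)
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy 2-dimensional features; a real system would use learned embeddings.
rule = [1.0, 0.0]
lanes = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
matched = match_lanes(rule, lanes, identity)
```

With these toy values only the first lane is associated with the rule, which mirrors the idea of attaching a driving rule to one specific lane rather than to the whole map area.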

[0044] For ease of description, the model that combines the map element encoder and the large language model provided in the previous embodiments is referred to as RuleVLM. RuleVLM aims to process input data containing at least two modalities, one of which is the vector modality, and to generate driving rules corresponding to specific lanes. The lane attribute creation system can be understood as a system that uses RuleVLM to construct the traffic rule layer in an HD map.

[0045] RuleVLM can process vector modal data and image modal data, as well as vector modal data and text modal data. It can also handle three modalities: vector modal data, image modal data, and text modal data. For ease of description, the following example of RuleVLM handling three modalities (image, text, and vector) will be used to further illustrate the structure of the lane attribute creation system.

[0046] Figure 3 illustrates an exemplary structural diagram of a lane attribute creation system according to other embodiments of this disclosure, and Figure 4 illustrates an exemplary workflow diagram of a lane attribute creation system according to other embodiments of this disclosure. As shown in Figure 3, the lane attribute creation system further includes a visual encoder and a text encoder. The output layer of the visual encoder is connected to the input layer of a large language model, and the output layer of the text encoder is connected to the input layer of the large language model. The visual encoder is used to process image modal data to output image encoding results, and the text encoder is used to process text modal data to output text encoding results.

[0047] As shown in Figure 4, the map element encoder, visual encoder, and text encoder process the vector modal data, image modal data, and text modal data respectively for the large language model to understand. This encodes the three modalities into a format that the large language model can understand, enabling it to extract corresponding features from the encoding results, such as driving rule features and lane information features. Specifically, the large language model extracts driving rule features from the encoding results output by the visual encoder and text encoder, and extracts lane information features from the encoding results output by the map element encoder.

[0048] To enable the large language model to understand data of multiple modalities, as shown in Figures 3 and 4, one or more of the map element encoder, visual encoder, and text encoder can be connected to the large language model via adapters, which align one or more of the vector encoding results, image encoding results, and text encoding results with the embedding space of the large language model. In some embodiments, the adapter can be a linear layer, also known as a fully connected layer, which is a basic layer in neural networks. Its main function is to perform a linear transformation on the input data, converting the encoder output into a form suitable for processing by the large language model. At the start of training, the adapter weights are typically initialized as an identity mapping so that the initial behavior of the model is similar to that of the original model. When fine-tuning on the target task, only the parameters of the map element encoder and its output-side adapter are trainable, while the parameters of the visual encoder, the text encoder, and their output-side adapters remain frozen, which is beneficial for efficient model training.
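
A linear adapter and the fine-tuning split described above can be sketched in plain Python. The function name `linear_adapter`, the parameter-group names, and the toy dimensions are hypothetical; a practical implementation would use a deep learning framework rather than nested lists.

```python
def linear_adapter(x, weight, bias):
    """Apply y = W x + b to one encoder output vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Parameter groups and whether they are updated during fine-tuning,
# following the description above (group names are hypothetical).
TRAINABLE = {
    "map_element_encoder": True,
    "map_element_adapter": True,
    "visual_encoder": False,
    "visual_adapter": False,
    "text_encoder": False,
    "text_adapter": False,
}


def trainable_groups(flags):
    """List the parameter groups that receive gradient updates."""
    return sorted(name for name, on in flags.items() if on)


# Project a 2-dimensional encoder output into a 3-dimensional embedding space.
projected = linear_adapter([1.0, 2.0],
                           [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                           [0.0, 0.0, 0.5])
```

Keeping the trainable set this small means the pre-trained visual and text pathways are left untouched while the new vector pathway learns to align with the embedding space.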

[0049] In large language models, adapters enable efficient parameter fine-tuning. By adding adapters to specific locations within the large language model, it allows for rapid adaptation to new tasks. Compared to traditional full-parameter fine-tuning, adapters only require training a small number of new parameters, preserving most of the pre-trained model's parameters. Because it eliminates the need to adjust the entire model's parameters, large language models can quickly adapt to new tasks without retraining from scratch, significantly improving analysis efficiency.

[0050] As model size increases, an appropriate training strategy helps manage model complexity so that the model can learn and generalize effectively and become more accurate on specific tasks. In some embodiments, the large language model can be obtained through LoRA-based training. LoRA (Low-Rank Adaptation of Large Language Models) fine-tunes a model by optimizing low-rank decomposition matrices that represent the update to the pre-trained weight matrices, while keeping the pre-trained weights themselves unchanged, which significantly reduces the number of trainable parameters for downstream tasks. The core idea of LoRA-based training is to inject trainable low-rank decomposition matrices into each layer of the Transformer architecture while freezing the pre-trained model weights, thereby reducing the number of trainable parameters and memory requirements.
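
To make the parameter saving concrete, the following sketch computes a LoRA-style forward pass, h = W0 x + scale · B (A x), and compares parameter counts for one weight matrix. The function names and the example dimensions (a 4096 × 4096 matrix with rank 8) are illustrative assumptions and do not come from the disclosure.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, w0, a, b, scale=1.0):
    """Compute h = W0 x + scale * B (A x); W0 is frozen, only A and B are trained."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + scale * u for h, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, rank * (d_in + d_out)


full, lora = trainable_params(4096, 4096, 8)

# With B set to zero the low-rank update vanishes and the output equals W0 x.
unchanged = lora_forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]],
                         [[1.0, 1.0]], [[0.0], [0.0]])
```

For these assumed dimensions the low-rank factors hold 65,536 parameters against 16,777,216 for the full matrix, a 256-fold reduction.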

[0051] It should be noted that the lane attribute creation system shown in Figures 3 and 4 is equipped with a RuleVLM that processes three modalities (image, text, and vector), and that this structure is only an example. In practical applications, the RuleVLM can omit either the visual encoder or the text encoder, forming a RuleVLM that processes bimodal data including vector modal data. In other words, in the lane attribute creation system disclosed herein, the visual encoder and the text encoder can each be included in the RuleVLM on its own, or both can be used in combination.

[0052] The large language model in RuleVLM outputs driving rules in text format; in other words, the lane attribute data output by the large language model in RuleVLM is in text format. To support comprehensive autonomous driving applications, lane attribute data needs to be converted into a more structured description consistent with HD map standards. Accordingly, some embodiments of this disclosure also provide a lane attribute creation system as shown in Figure 5, which illustrates an exemplary structural diagram of the lane attribute creation system of some embodiments of this disclosure. As shown in Figures 4 and 5, the RuleVLM in this system also includes a JSON (JavaScript Object Notation) decoder. The input layer of the JSON decoder is connected to the output layer of the large language model, and the decoder converts the lane attribute data from text format to JSON format. JSON format data is also known as formatted data. JSON is a lightweight data exchange format that is easy for users to read and write and easy for machines to parse and generate. A JSON object consists of a series of key-value pairs {key: value}, where the key is a string and the value can be a string, number, boolean value, array, object, etc. JSON format data can be integrated into HD maps for further applications.
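
The disclosure does not describe the internals of the JSON decoder. As a rough illustration only, the sketch below assumes the large language model emits JSON-like text and uses Python's standard `json` module to parse and check it. The key names (`lane_id`, `rule`, and the fields inside `rule`) and the function `decode_lane_attributes` are hypothetical.

```python
import json

# Keys every lane attribute record is assumed to carry (hypothetical schema).
REQUIRED_KEYS = {"lane_id", "rule"}


def decode_lane_attributes(llm_text):
    """Parse model output text into a list of lane attribute records."""
    records = json.loads(llm_text)
    if isinstance(records, dict):
        records = [records]  # accept a single record as well as a list
    for rec in records:
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return records


llm_text = (
    '{"lane_id": "lane_3", '
    '"rule": {"type": "bus_lane", "start": "07:30", "end": "08:30"}}'
)
decoded = decode_lane_attributes(llm_text)
```

Validating required keys at this step is one way to catch malformed model output before it is integrated into the traffic rule layer.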

[0053] The above embodiments illustrate the overall network structure of the lane attribute creation system. As described in the embodiments, the core of the lane attribute creation system for vector modal data processing lies in the map element encoder. The network structure and working principle of the map element encoder in the lane attribute creation system will be further explained below.

[0054] Figure 6 illustrates an exemplary structural diagram of a map element encoder according to some embodiments of this disclosure. The map element encoder (MEE) is a functional module in the lane attribute creation system that processes the vector modal data provided by the vectorized map; its structure is similar to a language model. In this embodiment, the vectorized map includes at least vector features for describing lanes, and further may include vector features for describing lane centerlines and vector features for describing lane dividers. In practical applications, as described in the preceding embodiments, map elements may also include map elements such as speed bumps, arrows indicating lane directions, sidewalks, and / or detour areas, which can also be represented in vector form in the vectorized map.

[0055] As shown in Figure 6, the map element encoder for receiving and processing vectorized maps includes an embedding layer, a first vector encoding block, and a second vector encoding block, which are connected in sequence.

[0056] The embedding layer is used to receive vectorized maps and extract and transform the vector features therein. These vector features include at least lane vector features, but may also include vector features of other map elements. For simplicity, the vector features mentioned below refer to vector features used to describe map elements, including lane vector features.

[0057] In this embodiment, the vectorized map is the input to the embedding layer and therefore also the input to the map element encoder. As described in the previous embodiment, in the vectorized map, linear elements such as lane dividers and lane centerlines can be represented as a point sequence by setting sampling points at fixed intervals. Similarly, area elements such as sidewalks can be converted into an ordered point sequence by setting sampling points at fixed intervals on the boundary of the area. It follows that the vector features in the vectorized map can be represented as sequences of points.
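The fixed-interval sampling described above can be sketched as follows. The function name `resample_polyline` and the use of planar (x, y) coordinates are assumptions made for illustration. For an area element, the boundary can be passed as a polyline whose last point repeats the first.

```python
import math

def resample_polyline(points, spacing):
    """Return points placed every `spacing` units of arc length along a polyline."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    samples = [points[0]]
    next_at = spacing  # arc length at which the next sample is due
    travelled = 0.0    # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        # Emit every sample that falls on this segment (small tolerance for rounding).
        while seg > 0 and next_at <= travelled + seg + 1e-9:
            t = (next_at - travelled) / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        travelled += seg
    return samples

centerline = [(0.0, 0.0), (10.0, 0.0)]
print(resample_polyline(centerline, 2.5))
```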

[0058] Building upon this, the embedding layer can perform embedding transformations on points on each vector feature in the vectorized map to form point embeddings. Since each point on the vector feature contains various types of information, such as the point's coordinates, the vector feature to which the point belongs, and the relative positions between points, the process of performing embedding transformations on the vector feature to obtain point embeddings can be divided into multiple processes of extracting different information and forming embedded representations.

[0059] Furthermore, the embedding layer performs embedding transformations on points on each vector feature in the vectorized map, generating vector embeddings, type embeddings, instance embeddings, and location embeddings describing each vector feature. These vector embeddings, type embeddings, instance embeddings, and location embeddings can be understood as subsets of point embeddings; different embeddings represent different types of information. Specifically, vector embeddings represent the position of points on vector features, type embeddings represent the map element represented by the vector feature containing the point, instance embeddings indicate the vector feature containing the point, and location embeddings represent the relative positions between points. To facilitate understanding of the difference between type embeddings and instance embeddings, they are further explained below: In this embodiment, type embeddings are used to distinguish different vector types. For example, the type embeddings of points on vector features representing lane centerlines and points on vector features representing lane dividers are different. Instance embeddings, on the other hand, are used to distinguish different vector instances. In other words, instance embeddings distinguish whether different points belong to the same vector feature. Points on the same vector feature have the same instance embedding, while points on different vector features have different instance embeddings.

[0060] Since the lane attribute creation system provided in this embodiment aims to match driving rules to the corresponding lanes, it needs to be able to distinguish between different lanes. To help the large language model recognize different map elements such as lane dividers and lane centerlines, and thereby distinguish different lanes, this embodiment introduces a label embedding, which serves as an index for the vector features in the vectorized map and helps the large language model understand the map element and the information represented by each vector feature.

[0061] Before values are assigned to the indices of the vector features, the embedding layer sets a blank label embedding at the beginning of each vector feature in the vectorized map. The label embeddings, vector embeddings, type embeddings, instance embeddings, and location embeddings are aggregated to form the embedded representation of the vector features, which serves as the input to the first vector encoding block. Here, the blank label embeddings can be understood as placeholders. First, they reserve space for the values subsequently assigned by the first and second vector encoding blocks. Second, for vector features whose point sequence representations vary in length, placeholders can be used to unify the input length of the first vector encoding block.
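A minimal sketch of how the embedded representation of one vector feature might be assembled is given below. Several choices are assumptions, because the disclosure does not specify them: aggregation by element-wise summation, a zero vector as the blank label embedding, a linear projection of the (x, y) coordinates as the vector embedding, and randomly initialized lookup tables. All names and sizes are illustrative.

```python
import random

DIM = 4  # embedding width (illustrative)
rng = random.Random(0)

def rand_vec():
    return [rng.uniform(-1, 1) for _ in range(DIM)]

TYPE_TABLE = {"centerline": rand_vec(), "divider": rand_vec()}  # one row per element type
INSTANCE_TABLE = [rand_vec() for _ in range(8)]                 # one row per vector feature
POSITION_TABLE = [rand_vec() for _ in range(16)]                # one row per point index
W_XY = [rand_vec(), rand_vec()]                                 # projects (x, y) to DIM values

def embed_feature(points, elem_type, instance_id):
    """Return [blank label token] followed by one summed embedding per point."""
    tokens = [[0.0] * DIM]  # blank label embedding, to be filled by the encoding blocks
    for idx, (x, y) in enumerate(points):
        vector_emb = [x * a + y * b for a, b in zip(W_XY[0], W_XY[1])]
        parts = (vector_emb, TYPE_TABLE[elem_type],
                 INSTANCE_TABLE[instance_id], POSITION_TABLE[idx])
        # Assumed aggregation: element-wise sum of the four embeddings.
        tokens.append([sum(vals) for vals in zip(*parts)])
    return tokens

tokens = embed_feature([(0.0, 0.0), (2.5, 0.0), (5.0, 0.0)], "centerline", instance_id=0)
print(len(tokens), len(tokens[0]))
```

Every point of the same feature shares one type row and one instance row, while the position row changes with the point index, which mirrors the roles of the four embeddings described above.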

[0062] The first and second vector encoding blocks are functional blocks in the MEE used to encode vector features. Further, the first vector encoding block includes M cascaded first Transformer layers, where M is a positive integer. Figure 7 illustrates an exemplary workflow diagram of a map element encoder according to other embodiments of this disclosure. As shown in Figure 7, the first Transformer layer includes an Intra-Instance Attention network and a Feed-Forward Network (FFN) connected in sequence. As in the traditional Transformer architecture, the first vector encoding block is built from multiple stacked encoder layers. Each encoder layer includes two main parts, an attention mechanism and a feed-forward network, and the output of each encoder layer undergoes residual connection and layer normalization. Unlike the traditional Transformer architecture, in this embodiment the self-attention mechanism of the first Transformer layer is replaced by an Intra-Instance Attention mechanism to capture the interactions between points within a vector feature. Since the Transformer architecture has a multi-layer stacked structure, for simplicity, Figure 7 shows only a simplified structure of one first Transformer layer.

[0063] In this embodiment, the first Transformer layer can learn and extract information from the embedded representation of the input vector features, including the vector embedding, type embedding, instance embedding, and location embedding. Through the intra-instance attention mechanism, it captures the interactions between points within a vector feature and understands the semantic information of each point on the vector feature, such as the map element represented by the vector feature on which the point lies and the coordinate information of that map element, and on this basis it assigns values to the blank label embedding.

[0064] Similar to the first vector encoding block, the second vector encoding block also adopts the Transformer architecture. The difference is that the second vector encoding block uses an inter-instance attention mechanism instead of an intra-instance attention mechanism. Specifically, the second vector encoding block includes N cascaded second Transformer layers, where N is a positive integer. The second Transformer layers use the inter-instance attention mechanism to capture the interactions between different vector features. The second Transformer layers can also learn and extract information from the embedded representations of the input vector features, including the vector embeddings, type embeddings, instance embeddings, and location embeddings, and understand semantic information different from that captured by the first Transformer layers, such as which map elements the different vector features represent and their relative positions. This allows a secondary assignment of the label embeddings, forming label embeddings that correspond one-to-one with the vector features and serve as the index of each vector feature, so as to obtain a vector encoding result containing lane information. As with the first Transformer layer, since the Transformer architecture has a multi-layer stacked structure, for simplicity, Figure 7 shows only a simplified structure of one second Transformer layer, which includes an Inter-Instance Attention network and a Feed-Forward Network (FFN) connected in sequence.
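The difference between the two attention mechanisms can be pictured as a difference in which token pairs are allowed to attend to each other. The sketch below is one plausible reading, not the design confirmed by this disclosure: intra-instance attention is restricted to tokens of the same vector feature, and inter-instance attention is assumed to let every token attend to the label tokens of all vector features. The function names are hypothetical.

```python
import math

def intra_instance_mask(instance_ids):
    """allowed[i][j] is True when tokens i and j belong to the same vector feature."""
    return [[a == b for b in instance_ids] for a in instance_ids]

def inter_instance_mask(is_label):
    """allowed[i][j] is True when token j is a label token (assumed design)."""
    return [[bool(flag) for flag in is_label] for _ in is_label]

def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, ignoring disallowed positions."""
    exps = [math.exp(s) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

# Two vector features, each laid out as [label token, point, point].
instance_ids = [0, 0, 0, 1, 1, 1]
is_label = [True, False, False, True, False, False]

intra = intra_instance_mask(instance_ids)
inter = inter_instance_mask(is_label)

scores = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]  # raw scores for token 0 against all tokens
print(masked_softmax(scores, intra[0]))   # weight stays inside feature 0
print(masked_softmax(scores, inter[0]))   # weight goes to the two label tokens
```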

[0065] Based on the above description, through the processing of the first and second vector encoding blocks, a corresponding label embedding can be generated for the vector features of each lane in the vectorized map. In other words, the label embedding can be used as an index for the vector features of the lane. The process of assigning values to the label embedding, or generating the vector feature index, is also the process of mapping the acquired lane information features to the label embedding. The lane information obtained here is based on vector embedding, type embedding, instance embedding, and location embedding. Vector embedding, type embedding, instance embedding, and location embedding provide different lane-related information. For example, the vector embedding of each point on the vector features of the lane centerline can provide the coordinate information of the lane centerline. Therefore, the MEE can output vector encoding results containing lane information. After obtaining the vector encoding results containing lane information, the large language model can identify unique vector features by recognizing the index. Furthermore, since the index is assigned by the first and second vector encoding blocks based on vector embedding, type embedding, instance embedding, and location embedding, it can convey rich information about the vector features matching the index to the large language model. This facilitates the large language model in extracting lane information features and distinguishing different lanes, as well as performing feature association analysis through feature association techniques such as association head prediction, in order to match driving rules to the corresponding lanes.

[0066] In this embodiment, the lane attribute creation system introduces the MEE to encode the vector features in the vectorized map. The encoded label embedding is used as an index for the vector features, providing reference information that lets the large language model distinguish different lanes and identify map elements such as lane centerlines. This helps the model match driving rules to lanes accurately by combining data from multiple modalities such as images, text, and vectors. The resulting lane attribute data supports the construction of the traffic rule layer in the HD map.

[0067] Based on the lane attribute creation system provided in any of the preceding embodiments, some embodiments of this disclosure also implement a lane attribute creation method. Figure 8 shows an exemplary flowchart of a lane attribute creation method 800 according to some embodiments of this disclosure. As shown in Figure 8, the lane attribute creation method includes:

[0068] In step S801, multimodal data is received;

[0069] In step S802, the vectorized map is processed using a map element encoder to obtain the vector encoding result;

[0070] In step S803, at least one of the image encoding results and text encoding results, as well as the vector encoding results, are processed using a large language model to generate lane attribute data.

[0071] In step S801 of this embodiment, the multimodal data includes a vectorized map, which provides vector modal data. Specifically, the vectorized map provides vector features for describing lanes, such as vector features for describing lane centerlines and lane dividers. In addition, the vectorized map may also provide vector features of other map elements. Furthermore, the multimodal data also includes at least one of the following: image encoding results and text encoding results. In other words, the multimodality in the multimodal data can refer to the vector modality and the image modality, to the vector modality and the text modality, or to all three: the vector modality, the image modality, and the text modality.

[0072] In step S802 of this embodiment, the internal structure and working principle of the map element encoder can be found in the description of any of the previous embodiments, and will not be elaborated here.

[0073] In step S803 of this embodiment, the large language model used is a pre-trained model capable of understanding multiple modalities of data and analyzing them together to output lane attribute data. This lane attribute data includes vehicle driving rules and the correspondence between driving rules and lanes. As an example, the trained large language model can be a pre-trained multimodal large language model. Taking a multimodal large language model that processes images, text, and vectors as an example, after training on a dataset containing text, images, and vectors, the model can extract the required information from different modalities of data and integrate and output the extracted information.

[0074] In some embodiments, step S803 may be performed as follows: processing image encoding results and / or text encoding results using a large language model to extract driving rule features; processing vector encoding results using a large language model to extract lane information features; and matching driving rule features with lane information features using a large language model to generate lane attribute data.

[0075] In the above process, the trained large language model extracts driving rule features from image encoding results and / or text encoding results, and extracts lane information features from vector encoding results, and matches the two types of extracted features. The trained large language model can also establish relationships between the extracted different features, that is, establish relationships between driving rule features and lane information features in this embodiment, thereby generating lane attribute data.

[0076] As an example, large language models can establish relationships between features through association head prediction techniques. That is, they can associate driving rule features with lane information features to form lane attribute data.

[0077] Association head prediction is a technique in computer vision and machine learning, typically used in object detection or tracking models. It involves using a network layer with an association head to predict features related to the target. The association head usually consists of a fully connected or convolutional layer that receives feature input from other parts of the model and outputs target-related features. In other words, a large language model can use an association head to find vector features related to driving rules, or vice versa. This association head links driving rule features with vector features representing lane information, resulting in lane attribute data that includes driving rules and their correspondence with lanes on a vectorized map.
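To make the association step concrete, the sketch below scores one driving rule feature against each lane feature and keeps the lanes whose score passes a threshold. The specific form (a fully connected projection followed by a dot product and a sigmoid), the threshold of 0.5, and all names and numbers are assumptions for illustration; the disclosure does not fix the internals of the association head.

```python
import math

def linear(vec, weight, bias):
    """Fully connected layer: `weight` is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]

def associate(rule_feat, lane_feats, weight, bias, threshold=0.5):
    """Return indices of lanes whose association probability exceeds the threshold."""
    projected = linear(rule_feat, weight, bias)
    matched = []
    for lane_idx, lane_feat in enumerate(lane_feats):
        logit = sum(p * l for p, l in zip(projected, lane_feat))
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the score into a probability
        if prob > threshold:
            matched.append(lane_idx)
    return matched

# Toy values: identity projection, one rule feature, three lane features.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
rule_feat = [1.0, -1.0]
lane_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
print(associate(rule_feat, lane_feats, weight, bias))
```

In a trained system the projection weights would be learned, and the lane features would be the label embeddings produced by the map element encoder.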

[0078] Based on the lane attribute creation method shown in the preceding embodiments, Figure 9 illustrates an exemplary flowchart of a vector encoding result acquisition method 900 of some embodiments of this disclosure. It can be understood that the vector encoding result acquisition method is a specific implementation of step S802 described above; therefore, the features described above in conjunction with Figure 8 apply here as well. In this embodiment, the structure of the map element encoder is shown in Figure 6 and includes an embedding layer, a first vector encoding block, and a second vector encoding block connected in sequence. As shown in Figure 9, the method includes:

[0079] In step S901, the vector features of lanes in the vectorized map are converted into an embedded representation using an embedding layer;

[0080] In step S902, the embedded representation of the vector features is encoded using the first vector encoding block and the second vector encoding block to form a vector encoding result containing lane information.

[0081] In this embodiment, after receiving the vectorized map, the embedding layer extracts the vector features from it. It then performs an embedding transformation on the points of the vector features of each lane in the vectorized map to obtain the vector embedding, type embedding, instance embedding, and location embedding, and adds a blank label embedding at the beginning of the vector features of each lane. After that, it aggregates the label embedding, vector embedding, type embedding, instance embedding, and location embedding to form the embedded representation of the vector features of the lane and outputs it. The embedded representation output by the embedding layer enters the first vector encoding block and then the second vector encoding block, which encode it to form a vector encoding result containing lane information.

[0082] In this embodiment, a vectorized map is a type of map data that utilizes vector features for map drawing and map element display. It constructs the map by representing multiple map elements, including lane center lines and lane dividers, as vectors. Based on the previous description of the lane attribute creation system, map elements can be abstracted into a unified point sequence representation. According to geometric features, map elements can be divided into three main categories: linear elements, discrete elements, and region elements. Taking linear elements as an example, they mainly include lane dividers and lane center lines. By setting sampling points at fixed intervals on these linear elements, corresponding point sequence representations can be obtained. Through this unified point sequence representation, accurate geometric representations can be provided for various map elements on the vectorized map. Furthermore, since each point on a vector feature contains various types of information, such as the point's coordinates, the vector feature to which the point belongs, and the relative position between points, the process of embedding and transforming points on vector features can be divided into multiple processes that form different embedding representations based on different information. Thus, by using the embedding layer in the map element encoder to perform embedding and transforming on points on each vector feature in the vectorized map, we can obtain vector embedding, type embedding, instance embedding, and location embedding. The specific meaning and differences of each type of embedding representation have been described in detail in the previous embodiments and will not be repeated here.

[0083] The introduction of label embeddings helps the large language model in the lane attribute creation system distinguish different lanes. In the embedding layer, blank label embeddings are added as placeholders at the beginning of each vector feature. This serves two purposes: first, it reserves space for the values subsequently assigned by the first and second vector encoding blocks; second, for vector features whose point sequence representations vary in length, placeholders can be used to unify the input length of the first vector encoding block. The label embeddings, vector embeddings, type embeddings, instance embeddings, and location embeddings are aggregated and used as input to the first vector encoding block in the map element encoder.

[0084] In this embodiment, the first vector encoding block and the second vector encoding block adopt a Transformer architecture. The first vector encoding block includes M cascaded first Transformer layers, where M is a positive integer, and adopts an intra-instance attention mechanism. The second vector encoding block includes N cascaded second Transformer layers, where N is a positive integer, and adopts an inter-instance attention mechanism. When step S902 is executed, the M cascaded first Transformer layers and the N cascaded second Transformer layers are applied in sequence, using the intra-instance attention mechanism and the inter-instance attention mechanism respectively, to obtain the lane information features based on the vector embedding, type embedding, instance embedding, and location embedding. The obtained lane information features are then mapped to the label embedding to form a vector encoding result containing lane information.

[0085] In the above process, the first and second vector encoding blocks learn and extract information from vector embeddings, type embeddings, instance embeddings, and location embeddings, and assign values to blank label embeddings to map the acquired lane information features to the label embeddings, thereby forming an index that corresponds one-to-one with the vector features, thus forming a vector encoding result containing lane information. Specifically, the first vector encoding block uses M cascaded first Transformer layers and, through an intra-instance attention mechanism, assigns values to blank label embeddings based on vector embeddings, type embeddings, instance embeddings, and location embeddings. Then, the second vector encoding block uses N cascaded second Transformer layers and, through an inter-instance attention mechanism, performs a secondary assignment of values to the label embeddings based on vector embeddings, type embeddings, instance embeddings, and location embeddings. The label embeddings after the secondary assignment are used as the index of each vector feature to obtain the vector encoding result.

[0086] After receiving the vector encoding results output by MEE, the large language model can identify uniquely matching vector features based on the index, i.e. the assigned label embedding. For example, it can identify a vector feature as the center line of the leftmost lane, or as the lane divider between the leftmost and middle lanes.

[0087] To complete the creation of lane attribute data, the large language model also needs to acquire driving rule features. As one example, driving rule features can be obtained externally to the lane attribute creation system. In this case, RuleVLM can receive extracted driving rule features, or encoded image encoding results and / or text encoding results. As another example, driving rule features can also be obtained from other functional modules within the lane attribute creation system, such as a visual encoder and / or a text encoder. In some embodiments disclosed herein, the lane attribute creation system may further include a visual encoder and / or a text encoder, wherein the visual encoder can process image modal data to output image encoding results, and the text encoder can process text modal data to output text encoding results. Based on the image encoding results and text encoding results, individual driving rules can be generated. It should be noted that at this stage, the driving rules have not yet been mapped to specific lanes.

[0088] The process of creating lane attribute data will be described below with reference to Figure 10. As described in the previous embodiments, in addition to vector modal data, the lane attribute creation system can also process image modal data and / or text modal data. In this case, the structure of the lane attribute creation system can be seen in the embodiments described with reference to Figure 3 or Figure 4. Besides the map element encoder, it also includes at least one of a visual encoder and a text encoder. The multimodal data received by the lane attribute creation system can also include at least one of image modal data and text modal data, as well as vector modal data.

[0089] Figure 10 shows an exemplary flowchart of a lane attribute data creation method 1000 according to some embodiments of this disclosure. As shown in Figure 10, the method includes:

[0090] In step S1001, the image modal data is encoded using a visual encoder to output the image encoding result to the large language model;

[0091] In step S1002, the text modal data is encoded using a text encoder to output the text encoding result to the large language model;

[0092] In step S1003, the map element encoder is used to encode the vector modal data in order to output the vector encoding result to the large language model;

[0093] In step S1004, at least one of the image encoding results and text encoding results, as well as the vector encoding results, are processed using a large language model to generate lane attribute data.

[0094] It should be noted that in this embodiment, either one of steps S1001 and S1002 can be performed, or both can be performed. In other words, the lane attribute creation system can process both image modal data and text modal data, or it can process only one of them. Correspondingly, the large language model can process both image encoding results and text encoding results, or it can process only one of them.

[0095] Additionally, it should be noted that there are no strict restrictions on the execution order of steps S1001 to S1003. The three steps can be executed in any order, for example, in the order of steps S1003, S1002, and S1001. Alternatively, the three steps S1001 to S1003 can be executed in parallel, without much restriction here.
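Since the three encoding steps are independent of one another, they can be dispatched concurrently. The sketch below illustrates this with a thread pool. The three encoder functions are hypothetical stand-ins that only count their inputs, and the rule that at least one of the image and text inputs must be present follows the description of steps S1001 and S1002.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three encoders; real encoders would return feature tensors.
def encode_image(frames):
    return {"modality": "image", "items": len(frames)}

def encode_text(text):
    return {"modality": "text", "items": len(text.split())}

def encode_vector_map(features):
    return {"modality": "vector", "items": len(features)}

def run_encoders(vector_map, frames=None, text=None):
    """Encode the vector map plus whichever of the image and text inputs are supplied."""
    if frames is None and text is None:
        raise ValueError("at least one of image data and text data is required")
    jobs = {"vector": (encode_vector_map, vector_map)}
    if frames is not None:
        jobs["image"] = (encode_image, frames)
    if text is not None:
        jobs["text"] = (encode_text, text)
    # The steps have no ordering constraint, so they are submitted together.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(fn, arg) for name, (fn, arg) in jobs.items()}
        return {name: future.result() for name, future in futures.items()}

results = run_encoders([[(0.0, 0.0), (5.0, 0.0)]], text="bus lane 7am to 9am")
print(sorted(results))
```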

[0096] In this embodiment, step S1003 describes the process of processing vector modal data using a map element encoder. The specific execution steps of this process can be found in the embodiment described above in conjunction with Figure 9, and will not be described in detail here.

[0097] In this embodiment, the execution process of step S1004 can be referred to step S803 and related descriptions in the embodiment described above in conjunction with Figure 8, and will not be repeated here. Further, after step S1004 is completed, the lane attribute data output by the large language model is a driving rule in text format. In order to integrate the driving rule into the HD map, it needs to be converted into a standardized structured description. Therefore, in some embodiments, the lane attribute creation system also includes a JSON decoder, whose input layer is connected to the output layer of the large language model. After processing the vector encoding result using the large language model, the JSON decoder is also needed to restore the lane attribute data from text format to JSON format. Then, the traffic rule layer of the high-definition map is constructed based on the lane attribute data in JSON format.

[0098] The preceding embodiments described the process of creating lane attributes using a lane attribute creation system. Prior to this, the lane attribute creation system needs to be constructed and trained to enable it to perform lane attribute creation. This process specifically includes: constructing a RuleVLM, and then training the RuleVLM to obtain the lane attribute creation system. The process of constructing the RuleVLM is as follows: the embedding layer, the first vector encoding block, and the second vector encoding block are sequentially connected to build a map element encoder. Then, the output layer of the map element encoder is connected to the input layer of the large language model. The specific structures of the first and second vector encoding blocks have been described in detail in the preceding embodiments in conjunction with Figure 7, and will not be repeated here.

[0099] In this embodiment, the structure of the constructed RuleVLM can be referred to the embodiment described above in conjunction with Figures 2 to 7, and will not be repeated here.

[0100] The training process of RuleVLM will be described below with reference to Figure 11. Figure 11 shows an exemplary flowchart of a RuleVLM training method 1100 according to some embodiments of this disclosure. As shown in Figure 11, the training method includes:

[0101] In step S1101, the sample dataset is obtained;

[0102] In step S1102, the lane attribute data in JSON format in the sample dataset is serialized into text format to form a sample corpus in QA (Question and Answer) format;

[0103] In step S1103, the QA format sample corpus is split into training corpus and evaluation corpus;

[0104] In step S1104, RuleVLM is trained using the training corpus;

[0105] In step S1105, the performance of the trained RuleVLM is evaluated using the evaluation corpus until its performance meets the requirements, thus obtaining the lane attribute creation system.
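Step S1102 can be pictured with a minimal sketch that turns one JSON-format annotation into a question and answer pair. The question wording, the answer template, and the field names are hypothetical; the disclosure does not prescribe a particular QA template.

```python
import json

def to_qa_sample(record):
    """Serialize one JSON-format lane attribute record into a QA-format text sample."""
    question = "Which driving rule applies, and to which lanes?"  # hypothetical prompt
    # Sort the keys so that the same record always produces the same answer text.
    pairs = [f"{key} is {json.dumps(value)}" for key, value in sorted(record.items())]
    return {"question": question, "answer": "; ".join(pairs) + "."}

record = {"rule_type": "speed_limit", "lane_ids": [2, 3]}
print(to_qa_sample(record)["answer"])
```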

[0106] As an example, the sample dataset used in this embodiment is MapDR, a dataset specifically designed for extracting driving rules from traffic signs and associating them with vectorized high-definition maps. MapDR focuses on complex traffic scenarios and collects tens of thousands of video clips from multiple traffic scenarios, each clip containing at least one traffic sign. In addition to vectorized representations of separators, boundaries, and center lines, MapDR also provides structural annotations of traffic regulations and their association with lanes.

[0107] It should be noted that the above description is an exemplary dataset that can be used in the embodiments disclosed herein. In practical applications, there may be other datasets of the same type that are suitable for the training method shown in this disclosure, and no further restrictions are imposed here.

[0108] Because integrating driving rules into HD maps requires a standardized, structured description to support comprehensive autonomous driving applications, lane attribute data in sample datasets is typically represented in JSON format. However, JSON-format data is less well suited than a QA corpus to training models that handle language tasks. This is because QA corpora generally contain richer linguistic phenomena and more complex contexts. JSON data, although structured, lacks this linguistic diversity and complexity. Furthermore, training on a QA corpus is more task-oriented: the model learns how to answer specific questions, which improves its performance on specific tasks. Moreover, QA corpora allow the model to learn how to extract information from a given text, which benefits subsequent fine-tuning and task-specific execution.

[0109] For the reasons mentioned above, after acquiring the sample dataset, this embodiment will serialize the lane attribute data in JSON format into text format to form a QA-formatted sample corpus. In addition to serving as training data for the model, the sample dataset can also be used as evaluation data to assess the performance of the trained model, reflecting whether the model has been successfully trained or needs further adjustment. As an example, the trained RuleVLM is tested using the evaluation corpus, and its recall rate is calculated. If the recall rate meets the target, RuleVLM is considered successfully trained; otherwise, the model is fine-tuned using the training corpus until the recall rate meets the target. As an example, the recall rate threshold can be set to 80% or other values; specific values can be set according to actual needs, and no excessive restrictions are imposed here.
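The recall check can be sketched as follows. Treating each (rule, lane) pair as one item to be recalled is an assumption made for illustration, as are the function names; the 80% threshold is the example value given above.

```python
def recall(predicted, expected):
    """Fraction of expected (rule, lane) pairs that also appear in the predictions."""
    expected = set(expected)
    if not expected:
        return 1.0  # nothing to recall
    return len(expected & set(predicted)) / len(expected)

def needs_more_training(predicted, expected, threshold=0.8):
    """True when the recall rate is still below the target."""
    return recall(predicted, expected) < threshold

expected_pairs = [("speed_limit_60", 1), ("speed_limit_60", 2), ("bus_only", 3),
                  ("no_left_turn", 1), ("no_u_turn", 1)]
predicted_pairs = [("speed_limit_60", 1), ("speed_limit_60", 2), ("bus_only", 3),
                   ("no_left_turn", 1), ("no_u_turn", 2)]
print(recall(predicted_pairs, expected_pairs))
```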

[0110] It should be noted that in some embodiments, the steps described in the previous embodiments can be followed: first, the lane attribute data in JSON format is serialized into a sample corpus in QA format, and then the sample corpus is split into a training corpus and an evaluation corpus. In other embodiments, the sample dataset can be split first to obtain a training dataset and an evaluation dataset, and then the training dataset and the evaluation dataset can each be converted into QA format. Both of the above splitting methods are applicable to this disclosure, and no further restrictions are imposed here. In addition, the sample dataset can be split according to a certain ratio, for example, splitting the training data and evaluation data at a ratio of 9:1, 8:2, or another ratio. There are no strict requirements on the splitting ratio, and it can be adjusted according to actual needs.
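As a non-limiting illustration of the ratio-based split described in paragraph [0110], the following Python sketch divides a list of samples into training and evaluation portions. The function name `split_corpus`, the fixed random seed, and the decision to shuffle before splitting are assumptions for illustration; this disclosure does not prescribe how samples are assigned to each portion.

```python
import random

def split_corpus(samples, train_ratio=0.9, seed=0):
    """Split samples into (training, evaluation) lists by the given ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical corpus of ten samples, split 9:1 and 8:2.
samples = [f"sample_{i}" for i in range(10)]
train_9_1, eval_9_1 = split_corpus(samples, train_ratio=0.9)
train_8_2, eval_8_2 = split_corpus(samples, train_ratio=0.8)
```

The same function applies whether the split is performed on the QA-format corpus or on the JSON-format dataset before conversion.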

[0111] Furthermore, when training RuleVLM using the training corpus, a training method based on low-rank adaptation (LoRA) can be used to train the large language model within it. In addition, in RuleVLM, the map element encoder can be connected to the large language model via an adapter, which can be a linear layer. Therefore, during step S1104, the parameters of the map element encoder and its adapter can be kept in a trainable state, while the parameters of the visual encoder and the text encoder and their adapters are fixed. Then, the trainable parameters in RuleVLM are trained based on the training corpus, thereby achieving parameter-efficient fine-tuning.
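As a non-limiting illustration of the parameter arrangement described in paragraph [0111], the following Python sketch decides, by parameter name, which parameters would be updated during training. All module names, parameter names, and the `lora_` naming convention are hypothetical and chosen for illustration; no particular deep learning framework is assumed. The sketch relies on the general property of LoRA that the base weights stay fixed and only the added low-rank matrices are trained.

```python
# Hypothetical training mode per top-level module of RuleVLM.
TRAINING_MODE = {
    "map_element_encoder": "trainable",  # kept trainable
    "map_adapter": "trainable",          # linear adapter, kept trainable
    "large_language_model": "lora",      # only low-rank matrices are trained
    "visual_encoder": "frozen",
    "visual_adapter": "frozen",
    "text_encoder": "frozen",
    "text_adapter": "frozen",
}

def is_trainable(param_name: str) -> bool:
    """Return True if the named parameter should receive gradient updates."""
    module, _, rest = param_name.partition(".")
    mode = TRAINING_MODE.get(module, "frozen")  # unknown modules stay frozen
    if mode == "trainable":
        return True
    if mode == "lora":
        return "lora_" in rest  # base weights of the language model stay fixed
    return False

# Hypothetical parameter names, for illustration only.
param_names = [
    "map_element_encoder.block1.layer0.weight",
    "map_adapter.linear.weight",
    "large_language_model.layers.0.q_proj.weight",
    "large_language_model.layers.0.q_proj.lora_A",
    "large_language_model.layers.0.q_proj.lora_B",
    "visual_encoder.patch_embed.weight",
    "text_adapter.linear.weight",
]
trainable = [name for name in param_names if is_trainable(name)]
```

Under these assumptions, only the map element encoder, its adapter, and the low-rank matrices of the large language model remain trainable, which keeps the number of updated parameters small.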

[0112] The above embodiments describe how to optimize the training of RuleVLM through format conversion and / or by selecting efficient training methods. In other embodiments, in order to optimize the training of RuleVLM to obtain a RuleVLM with enhanced performance, it is also necessary to prevent overfitting during the training process.

[0113] To address the overfitting problem, some embodiments disclosed herein process the sample dataset as follows: First, the original sample dataset is obtained. Then, the lane centerline data of the vector modality in the original sample dataset is reordered to form an overfit-resistant sample dataset. This approach increases the diversity and volume of the training data, achieving a certain degree of resistance to overfitting.
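As a non-limiting illustration of the processing described in paragraph [0113], the following Python sketch produces additional training samples by changing the order in which lane centerlines appear in a sample. It assumes that the ordering operation amounts to a random permutation of the centerlines and that each centerline carries its own lane identifier, so the correspondence between rules and lanes is unaffected. The field names, the function name `augment_by_reordering`, and the number of copies are hypothetical.

```python
import random

def augment_by_reordering(sample, num_copies=2, seed=0):
    """Return the original sample plus copies with permuted centerline order."""
    rng = random.Random(seed)
    augmented = [sample]
    for _ in range(num_copies):
        centerlines = list(sample["lane_centerlines"])  # copy, then permute
        rng.shuffle(centerlines)
        augmented.append({**sample, "lane_centerlines": centerlines})
    return augmented

# Hypothetical sample: three lane centerlines and a QA-format answer.
sample = {
    "lane_centerlines": [
        {"lane_id": 0, "points": [(0.0, 0.0), (0.0, 10.0)]},
        {"lane_id": 1, "points": [(3.5, 0.0), (3.5, 10.0)]},
        {"lane_id": 2, "points": [(7.0, 0.0), (7.0, 10.0)]},
    ],
    "answer": "Lane 0: bus only. Lane 1: no trucks. Lane 2: no special rule.",
}
augmented = augment_by_reordering(sample, num_copies=2)
```

Each augmented sample contains the same lanes and the same answer as the original, so the model is discouraged from relying on the input order of the centerlines.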

[0114] In summary, this disclosure provides a lane attribute creation system that can be equipped with RuleVLM, which includes a map element encoder (MEE) and a large language model. The disclosure defines a task of integrating traffic rules into a vectorized high-definition map and assigns the task to RuleVLM for execution. RuleVLM processes the vectorized map data through the MEE and analyzes data of multiple modalities, including the vector modality and other modalities, through the large language model. It then outputs lane attribute data that matches driving rules to the corresponding lanes, which is beneficial for the construction of the traffic rule layer in the HD map.

[0115] This disclosure also provides a lane attribute creation method, which uses RuleVLM to integrate traffic rules into a vectorized high-definition map. Specifically, RuleVLM can combine data from multiple modalities to associate driving rule features with vector features in the vectorized map, thereby completing the matching of driving rules and lanes. With the help of RuleVLM, this method can generate detailed rule descriptions required for autonomous driving. These rule descriptions can also be matched with specific lanes, facilitating the construction of an accurate traffic rule layer and thus forming a reliable HD map to support fully autonomous driving applications.

[0116] To implement the method steps described above in conjunction with the accompanying drawings at the software and hardware level, embodiments of this disclosure also provide an electronic device as shown in FIG. 12. Specifically, FIG. 12 shows an exemplary structural block diagram of the electronic device 1200 of an embodiment of this disclosure.

[0117] As shown in FIG. 12, the electronic device 1200 disclosed herein may include a processor 1210 and a memory 1220. Specifically, the memory 1220 stores executable program instructions. When the program instructions are executed by the processor 1210, the electronic device performs the steps described above in conjunction with FIGS. 8-11.

[0118] It is understood that, in order to clearly illustrate the solution disclosed herein and avoid confusion with the prior art, the electronic device 1200 in FIG. 12 only shows the components relevant to the embodiments disclosed herein, while omitting components that may be necessary for implementing the embodiments disclosed herein but fall within the scope of the prior art. Therefore, based on the content disclosed herein, those skilled in the art can clearly understand that the electronic device 1200 disclosed herein may also include common components other than those shown in FIG. 12.

[0119] In an exemplary implementation scenario, the processor 1210 described above can control the overall operation of the electronic device 1200. For example, the processor 1210 can control the operation of the electronic device 1200 by executing a program stored in the memory 1220. In terms of implementation, the processor 1210 disclosed herein can be implemented as a central processing unit (CPU), application processor (AP), intelligent processing unit (IPU), etc., provided in the electronic device 1200. Furthermore, the processor 1210 disclosed herein can also be implemented in any other suitable manner. For example, the processor 1210 can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, etc.

[0120] In terms of stored content, the memory 1220 can be used to store various data and instructions processed in the electronic device 1200. For example, the memory 1220 can store processed data and data to be processed in the electronic device 1200, including datasets that have been processed or are to be processed by the processor 1210. In addition, the memory 1220 can store applications, drivers, etc., to be run by the electronic device 1200. For example, the memory 1220 can store various programs to be executed by the processor 1210. The memory 1220 can be DRAM (Dynamic Random Access Memory), but this disclosure is not limited to this. In terms of type, the memory 1220 can include at least one of volatile memory or non-volatile memory. Non-volatile memory can include read-only memory (ROM), programmable read-only memory (PROM), electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, phase-change random access memory (PRAM), magnetic random access memory (MRAM), resistive random access memory (RRAM), and ferroelectric random access memory (FRAM). Volatile memory can include dynamic random access memory (DRAM), static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), PRAM, MRAM, RRAM, and ferroelectric random access memory (FRAM). In an embodiment, the memory 1220 may include at least one of a hard disk drive (HDD), a solid state drive (SSD), a CompactFlash (CF) card, a secure digital (SD) card, a micro secure digital (Micro-SD) card, a mini secure digital (Mini-SD) card, an extreme digital (xD) card, a cache, or a memory stick.

[0121] In summary, the specific functions implemented by the memory 1220 and the processor 1210 of the electronic device 1200 provided in this specification can be understood with reference to the aforementioned embodiments in this specification, and can achieve the technical effects of those embodiments. They will not be repeated here.

[0122] Additionally, this disclosure can also be implemented as a computer program product. This computer program product includes a computer program that, when executed by a processor, implements some or all of the steps of the methods described above.

[0123] Additionally or optionally, this disclosure may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium) storing computer-executable instructions (or computer programs, or computer instruction codes) that, when executed by a processor of an electronic device (or computing device, server, etc.), cause the processor to perform some or all of the steps of the methods described above according to this disclosure.

[0124] While numerous embodiments of this disclosure have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Many modifications, alterations, and alternatives will occur to those skilled in the art without departing from the spirit and intent of this disclosure. It should be understood that various alternatives to the embodiments of this disclosure described herein may be employed in the practice of this disclosure. The appended claims are intended to define the scope of this disclosure and therefore cover equivalents or alternatives within the scope of these claims.

[0125] The collection and acquisition of various data disclosed herein comply with relevant laws and regulations and are authorized by the data providers. Any organization or individual that needs to obtain external data shall obtain authorization in accordance with the law and ensure data security. It is prohibited to illegally collect, use, process, or transmit unauthorized or unprotected data, or to illegally buy, sell, provide, or disclose unauthorized or unprotected data.

Claims

1. A lane attribute creation system, comprising: a map element encoder and a large language model; wherein an output layer of the map element encoder is connected to an input layer of the large language model; the map element encoder is used to process a vectorized map to output a vector encoding result; the vectorized map uses vector features to represent map elements, and the map elements include lanes; the input layer of the large language model further receives at least one of an image encoding result and a text encoding result; and the large language model is configured to generate lane attribute data based on the vector encoding result and at least one of the image encoding result and the text encoding result, wherein the lane attribute data includes vehicle driving rules and the correspondence between the driving rules and lanes.

2. The lane attribute creation system according to claim 1, wherein the large language model is further configured to: extract driving rule features based on the image encoding result and/or the text encoding result; extract lane information features based on the vector encoding result; and match the driving rule features with the lane information features to generate the lane attribute data.

3. The lane attribute creation system according to claim 1 or 2, wherein the map element encoder includes an embedding layer, a first vector encoding block, and a second vector encoding block connected in sequence; the embedding layer is used to convert the vector features of lanes in the vectorized map from a vector representation to an embedded representation; and the first vector encoding block and the second vector encoding block are used to encode the vector features in the embedded representation into the vector encoding result containing lane information.

4. The lane attribute creation system according to claim 3, wherein the embedding layer is used to perform an embedding transformation on the points of the vector features of each lane in the vectorized map to generate a vector embedding, a type embedding, an instance embedding, and a position embedding describing the vector features of each lane; the embedding layer is also used to set a blank label embedding at the beginning of the vector features of each lane in the vectorized map; the label embedding, the vector embedding, the type embedding, the instance embedding, and the position embedding are aggregated to form the embedded representation of the vector features of the lane; wherein the vector embedding is used to represent the position of a point on a vector feature, the type embedding is used to represent the map element represented by the vector feature where the point is located, the instance embedding is used to indicate the vector feature where the point is located, the position embedding is used to represent the relative position between points, and the label embedding is used as an index for each vector feature.

5. The lane attribute creation system according to claim 4, wherein the first vector encoding block includes M cascaded first Transformer layers, where M is a positive integer; the first Transformer layers employ an intra-instance attention mechanism; the M cascaded first Transformer layers are used to receive the embedded representation of the vector features of the lane, and to assign values to the blank label embeddings according to the vector embedding, the type embedding, the instance embedding, and the position embedding; the second vector encoding block includes N cascaded second Transformer layers, where N is a positive integer; the second Transformer layers employ an inter-instance attention mechanism; and the N cascaded second Transformer layers are used to perform a secondary assignment on the label embeddings based on the vector embedding, the type embedding, the instance embedding, and the position embedding, forming label embeddings that correspond one-to-one with the vector features and serve as an index for each vector feature, so as to obtain the vector encoding result containing lane information.

6. The lane attribute creation system according to any one of claims 1-5, wherein the lane attribute data output by the large language model is driving rules in text format; and the lane attribute creation system further includes a JSON decoder, an input layer of which is connected to an output layer of the large language model, the JSON decoder being used to restore the lane attribute data from text format to JSON format.

7. A method for creating lane attributes, applied to a lane attribute creation system that includes a map element encoder and a large language model, wherein an output layer of the map element encoder is connected to an input layer of the large language model, the method comprising: receiving multimodal data, the multimodal data including a vectorized map and at least one of an image encoding result and a text encoding result; processing the vectorized map using the map element encoder to obtain a vector encoding result, wherein the vectorized map uses vector features to represent map elements, and the map elements include lanes; and processing the vector encoding result and at least one of the image encoding result and the text encoding result using the large language model to generate lane attribute data, wherein the lane attribute data includes vehicle driving rules and the correspondence between the driving rules and lanes.

8. The method for creating lane attributes according to claim 7, wherein processing the vector encoding result and at least one of the image encoding result and the text encoding result using the large language model to generate lane attribute data includes: processing the image encoding result and/or the text encoding result using the large language model to extract driving rule features; processing the vector encoding result using the large language model to extract lane information features; and matching the driving rule features with the lane information features using the large language model to generate the lane attribute data.

9. The method for creating lane attributes according to claim 7 or 8, wherein the map element encoder includes an embedding layer, a first vector encoding block, and a second vector encoding block connected in sequence; and wherein processing the vectorized map using the map element encoder to obtain the vector encoding result includes: converting the vector features of lanes in the vectorized map into an embedded representation using the embedding layer; and encoding the embedded representation of the vector features using the first vector encoding block and the second vector encoding block to form the vector encoding result containing lane information.

10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 7 to 9.

11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the steps of the method according to any one of claims 7 to 9.