Vehicle control methods, devices, equipment and media

By combining vehicle perception data and brief natural language commands with a large visual-language model and topological data, a path that meets vehicle dynamics constraints is generated. This solves the problems of weak navigation capabilities and high deployment costs in the absence of high-precision maps for assisted driving, and achieves safe and reliable autonomous navigation.

CN122306044APending Publication Date: 2026-06-30CHONGQING QIANLI ZHIJIA TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHONGQING QIANLI ZHIJIA TECHNOLOGY CO LTD
Filing Date
2026-03-23
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing driver assistance methods have weak navigation capabilities and high deployment costs in scenarios lacking high-precision maps, especially in environments such as underground parking garages where maps are missing or updating is costly.

Method used

By combining vehicle-sensing data and target location text with a large visual-language model and topological data, candidate paths that meet vehicle dynamics constraints are generated, enabling autonomous navigation from any starting point to the target location and avoiding reliance on high-precision maps.

Benefits of technology

It achieves safe and reliable destination-level automatic navigation in mapless conditions, significantly reducing deployment costs and improving system stability and security.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122306044A_ABST
    Figure CN122306044A_ABST
Patent Text Reader

Abstract

This application relates to the field of assisted driving technology and discloses vehicle control methods, devices, equipment, and media, including: obtaining a target path based on the vehicle's perception data of the current scene and the target location indicated by the user, converting the target path into vehicle control commands, and controlling the vehicle to drive. It can realize the determination of whether the target location has been reached from any starting point under map-less conditions based on vehicle perception data and a target location that conforms to the user's description habits. If the target location has not been reached, a safe path that is directed toward the target location is determined, realizing complete destination-level automatic navigation, which can significantly reduce deployment costs.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This application relates to the field of driver assistance technology, and in particular to a vehicle control method, device, equipment and medium. Background Technology

[0002] Significant progress has been made in assisted driving (ADAS) in open road environments, such as urban areas or highways. Relying on high-precision maps and end-to-end perception-prediction-planning models, ADAS can achieve safe and reliable driving on most road sections with map coverage. However, in the last mile scenario where maps are largely missing or map updates are costly, especially in underground parking garages, ADAS technologies suffer from weak navigation capabilities and high deployment costs. Summary of the Invention

[0003] This application provides a vehicle control method, apparatus, device, and medium to address the problems of weak navigation capabilities and high deployment costs in related assisted driving methods.

[0004] In a first aspect, this application provides a vehicle control method, including: acquiring the target position of the vehicle; determining whether the vehicle has reached the target position; if the vehicle has not reached the target position, determining a target path based on the perception data of the vehicle's environment and the target position, and controlling the vehicle to drive based on the target path.

[0005] In one optional implementation, determining the target path based on the perception data of the vehicle's environment and the target location includes: obtaining at least one candidate path based on the vehicle's preset trajectory anchor points, vehicle status, and perception data; and determining the target path based on the candidate path and the target location.

[0006] In one optional implementation, at least one candidate path is obtained based on the vehicle's preset trajectory anchor points, vehicle state, and perception data, including: fusing trajectory anchor points and perception data to obtain a first feature; performing nonlinear semantic enhancement on the second feature fused with the first feature and vehicle state to obtain an enhanced second feature; performing time step modulation on the enhanced second feature to obtain a temporal feature; mapping the temporal feature to a path coordinate sequence, and determining the candidate path based on the path coordinate sequence.

[0007] In one optional implementation, determining the target path based on perceived data of the vehicle's environment and the target location further includes: extracting environmental features based on the perceived data; acquiring topological data of a preset area, identifying intersections in the topological data, and determining the locations at preset distances from the intersections as topological nodes; determining directed edges between topological nodes based on the lanes represented by the topological data; constructing a decision graph based on the intersections, topological nodes, and directed edges; initializing path selection parameters based on the decision graph; and determining the target path based on the path selection parameters, by integrating environmental features and the target location.

[0008] In one optional implementation, initializing path selection parameters based on the decision graph includes: during interactive training, determining the topological nodes that the vehicle passes through from a preset starting point to a preset ending point; determining a first reward corresponding to the driving trajectory selected at the passed topological nodes based on a second preset condition, the second preset condition including at least one of the average Euclidean distance between the driving trajectory and the expert trajectory, whether a collision occurs, and whether a boundary crossing occurs; if the first reward is higher than a first reward threshold, increasing the weight of selecting the driving trajectory corresponding to the first reward, and if the first reward is lower than or equal to the first reward threshold, decreasing the weight of selecting the driving trajectory corresponding to the first reward; performing multiple rounds of interactive training until the average first reward of the path selected at the passed topological nodes satisfies a third preset condition, and determining the path selection parameters based on the parameters corresponding to the average first reward that satisfies the third preset condition.

[0009] In one optional implementation, initializing path selection parameters based on the decision graph further includes: during interactive training, determining the directed edge sequence traversed by the vehicle from a preset starting point to a preset ending point and the shortest path from the starting point to the ending point; determining a second reward corresponding to the directed edge sequence based on a fourth preset condition, the fourth preset condition including at least one of the prefix overlap between the directed edge sequence and the shortest path, the overlap between the directed edges in the directed edge sequence and the edges in the shortest path, and whether the ending point has been reached; if the second reward is higher than a second reward threshold, increasing the weight of the directed edge sequence corresponding to the second reward; if the second reward is lower than or equal to the second reward threshold, decreasing the weight of the directed edge sequence corresponding to the second reward; performing multiple rounds of interactive training until the second reward corresponding to the traversed directed edge sequence satisfies a fifth preset condition; determining path selection parameters based on the parameters corresponding to the second reward that satisfies the fifth preset condition; or, combining the first reward and the second reward based on preset weights to obtain a third reward, and determining path selection parameters based on the parameters corresponding to the third reward that satisfies a sixth preset condition.

[0010] In one optional implementation, at least one candidate path is obtained based on the vehicle's preset trajectory anchor points, vehicle status, and perception data, including: initializing candidate path generation parameters based on a decision graph; and obtaining at least one candidate path based on the candidate path generation parameters, the vehicle's preset trajectory anchor points, vehicle status, and perception data.

[0011] In one optional implementation, initializing candidate path generation parameters based on a decision graph includes: generating expert trajectories from a starting point to multiple endpoints in the decision graph; clustering the expert trajectories to obtain initial trajectory anchor points; adding noise to the target expert trajectory to generate a noisy trajectory, where the target expert trajectory is any one of the expert trajectories; acquiring initial perception data of vehicles in the decision graph; gradually reconstructing the noisy trajectory based on the initial perception data and the initial trajectory anchor points; determining a first error between the first trajectory generated during the reconstruction process and the target expert trajectory; and if the first error satisfies a first preset condition, determining candidate path generation parameters based on the parameters for generating the first trajectory corresponding to the first error.

[0012] In one optional implementation, determining the target path based on the candidate path and the target location includes: determining whether there is a decision history; if there is no decision history, generating a decision history based on the target location, perception data of the vehicle's environment, the candidate path, and the target path; if there is a decision history, determining the target path based on the decision history, the candidate path, and the target location.

[0013] Secondly, this application provides a vehicle control device, which includes: a target position acquisition module for acquiring the target position of the vehicle; a perception module for acquiring perception data of the environment in which the vehicle is located; a control module for determining a target path based on the perception data and the target position, and controlling the vehicle to drive along the target path; and a display module for displaying the target path and displaying a screen showing the vehicle driving along the target path in response to a control command to drive the vehicle to the target position.

[0014] In some optional implementations, the control module is further configured to obtain at least one candidate path based on the sensing data, and determine the target path based on the candidate path and the target location; the display module is further configured to display the candidate path and the target path determined from the candidate path.

[0015] In some alternative implementations, the display module is also used to identify the target path, wherein the identification includes at least one of highlighting, adding text, and adding an icon.

[0016] In some alternative implementations, the aforementioned vehicle control device further includes a voice module for explaining the reason for selecting the target path.

[0017] Thirdly, this application provides an electronic device, including: a memory and a processor, which are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the vehicle control method of the first aspect or any corresponding embodiment described above.

[0018] Fourthly, this application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the vehicle control method of the first aspect or any corresponding embodiment described above.

[0019] Fifthly, this application provides a computer program product, including computer instructions for causing a computer to execute the vehicle control method of the first aspect or any corresponding embodiment described above.

[0020] The vehicle control method, apparatus, equipment, and medium provided in this application can obtain a target path based on the vehicle's perception data of the current scene and the target location indicated by the user, and convert the target path into vehicle control commands to control the vehicle's movement. It can determine whether the target location has been reached from any starting point without a map, based on the vehicle's perception data and the target location that conforms to the user's description habits. If the target location has not been reached, a safe path that leads to the target location is determined, realizing complete destination-level automatic navigation, which can significantly reduce deployment costs. Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the specific embodiments of this application or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0022] Figure 1 A schematic flowchart of a vehicle control method according to an embodiment of this application is shown; Figure 2 This paper illustrates a schematic diagram of the environmental feature extraction process provided in this application. Figure 3 A schematic diagram of the target path determination process provided in this application is shown; Figure 4 A schematic diagram of the topology data provided in this application is shown; Figure 5 A schematic diagram of a decision graph constructed based on the aforementioned topological data is shown; Figure 6 The image shows a schematic diagram illustrating the historical basis for the decision-making process; Figure 7A schematic diagram of the constituent modules of the vehicle control device provided in this application is shown; Figure 8 This is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application. Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0024] It is understood that before using the technical solutions disclosed in the various embodiments of this application, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this application in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0025] The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0026] Among related technologies, the high-precision map-based assisted driving system adopts an architecture that combines high-precision maps, road network navigation, and local planning and control, including: an on-board sensor module, a perception and positioning module, a high-precision map database module, a global path planning module, a behavior decision module, a motion planning module, and a control execution module.

[0027] The vehicle-mounted sensor module includes multiple cameras, lidar, inertial measurement units (IMUs), and global navigation satellite system (GNSS) receivers, which are used to collect environmental perception data and vehicle position and attitude information.

[0028] The perception and localization module is used to perform tasks such as target detection and semantic segmentation on the data collected by the vehicle sensor module. At the same time, it combines high-precision map features for localization, such as lane line matching and intersection topology matching, and outputs the vehicle's precise pose in the map coordinate system.

[0029] The high-precision map database module is used to store map elements such as road centerlines, lane boundaries, traffic signs, traffic lights, speed limit information, and intersection topology. These map elements are typically collected by professional surveying vehicles and then manually annotated.

[0030] The global path planning module is used to search for a global path from the origin to the destination in the high-precision map road network based on the destination given by the user.

[0031] The behavior decision module is used to generate high-level behavior instructions, such as going straight, turning left or right, changing lanes, overtaking, and stopping and waiting, by combining the global path, the status of surrounding traffic participants (vehicles, pedestrians, etc.) and traffic rules.

[0032] The motion planning module is used to generate local trajectories that satisfy dynamic constraints based on global paths, obstacle positions, etc., under the constraints of behavioral instructions.

[0033] The control execution module is used to convert the planned local trajectory into chassis control commands such as throttle, brake, and steering angle to achieve vehicle control.

[0034] While high-precision map-based driver assistance systems have demonstrated good performance on open roads, significant problems exist in the map-deficient scenarios addressed in this application, such as indoor parking lots, underground garages, and enclosed parks. Firstly, the construction and maintenance costs of high-precision maps are extremely high, and the number of such map-deficient scenarios is enormous with significant structural differences. Taking indoor parking lots as an example, acquiring a high-precision map covering all garages requires extensive surveying vehicles and manual annotation. Furthermore, changes frequently occur within parking garages, such as parking space adjustments, partition modifications, the addition of charging stations, or entrance / exit modifications, leading to frequent map updates and extremely high maintenance costs.

[0035] Secondly, in areas lacking maps or where maps are unavailable, the current system cannot provide effective navigation. In such cases, the system typically requires manual intervention and cannot make decisions based on high-level objectives such as "going to a certain exit / charging area / parking area." In complex, mapless scenarios, vehicles cannot autonomously infer their direction of travel based on visual cues such as road signs and text markings in the environment, severely impacting the "arrival at the destination" experience.

[0036] To enhance the autonomous navigation capabilities of intelligent agents in mapless environments, there are indoor autonomous navigation systems based on vision-language navigation models, which rely on step-by-step instructions and the setting of discrete node graphs.

[0037] Specifically, the indoor autonomous navigation system based on the vision-language navigation model discretizes the real indoor scene, such as the various rooms and corridors of a building, into a series of visible nodes through an environment map and a discrete node graph. A directed graph is constructed based on the visible nodes, with each visible node corresponding to a panoramic image. Through the image acquisition and rendering module, a panoramic image is rendered for each node in the 3D reconstructed scene. Through the language encoding module, detailed natural language instructions labeled by humans, such as "go through the living room, turn right, and walk to the bed in the second bedroom," are encoded to extract semantic features. Through the history state and attention module, historical observations and executed actions are encoded into hidden states using a neural network to model long-term path information. Through the policy network module, the next action is selected from a limited set of discrete actions (such as forward, left turn, right turn, and stop) by combining the current node image features, language instruction features, and history state, ultimately completing the navigation from the starting point to the destination.

[0038] Directly transferring indoor autonomous navigation systems based on vision-language navigation models from related technologies to assisted driving vehicles for autonomous navigation in real-world environments presents the following problems: First, there is a mismatch in language formats. Indoor autonomous navigation systems based on vision-language navigation models rely on extremely detailed and step-by-step route descriptions. These instructions are typically written by annotators who have mastered the complete path in the scene. However, in actual driving scenarios, vehicle drivers generally only provide brief descriptions of the destination, such as "go to the exit," "go to the nearest charging station," or "go to the parking space near the elevator," without providing step-by-step verbal prompts.

[0039] Second, there is a mismatch between the action space and control precision. Indoor autonomous navigation systems based on vision-language navigation models typically select a limited set of virtual actions on a discrete node graph, without considering vehicle dynamics constraints, and cannot output continuous trajectories. Vehicle-assisted driving requires the generation of continuously executable signals such as trajectory, speed, and acceleration; a limited set of discrete actions is insufficient to directly control the vehicle.

[0040] This application provides a vehicle control method that can achieve visual and text feature fusion and path planning by using the vehicle's own perception data and the text quality of the target location. This enables the vehicle to autonomously drive from any starting point to the target location in an environment without high-precision maps, relying on vehicle perception information and brief natural language commands.

[0041] According to an embodiment of this application, a vehicle control method embodiment is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.

[0042] This embodiment provides a vehicle control method that can be used in scenarios such as underground garages, indoor parking lots, internal roads of large commercial complexes, and enclosed outdoor parks where reliable high-precision maps or road network navigation data are unavailable. Figure 1 A schematic flowchart of a vehicle control method according to an embodiment of this application is shown, as follows: Figure 1 As shown, the process includes the following steps: Step S101: Obtain the target location of the vehicle.

[0043] The target location is used to represent the destination of the vehicle's assisted driving, such as an exit or the nearest charging station. The target location can be obtained based on the user's voice or text input.

[0044] Step S102: Determine whether the vehicle has reached the target location. If the vehicle has not reached the target location, determine the target path based on the perception data of the vehicle's environment and control the vehicle to drive along the target path.

[0045] Step S102 can be implemented based on a navigation selector module. This module can be constructed based on a Vision-Language Model (VLM), or it can be constructed based on a convolutional neural network combined with a bidirectional long short-term memory network. Environmental features and target location are fused through simple concatenation or a multilayer perceptron. The navigation selector module includes a Vision Encoder. The Vision Encoder includes a Vision Transformer (ViT).

[0046] The perception data of the vehicle's environment is used to characterize the environmental data collected by the vehicle through onboard sensors. Onboard sensors can be a forward-facing camera and a few auxiliary cameras, multiple cameras in a surround-view setup, or fisheye surround-view panoramic cameras. Correspondingly, the acquired perception data is image data. Onboard sensors can also be LiDAR or millimeter-wave radar, correspondingly acquiring point cloud data. Perception data can be image data, or it can include both image data and point cloud data. Image data includes real-world information perceived by the vehicle, such as garage driveways, pillars, walls, parked vehicles, empty parking spaces, and lane lines.

[0047] Images perceived by the vehicle can be encoded using a vision transformer to extract multi-scale environmental features. These environmental features characterize the structured features of the road scene extracted from the perceived data of the vehicle's environment. Environmental features include spatial information such as drivable areas, obstacle distribution, parking space outlines, and lane orientation.

[0048] The navigation selector module also includes a text encoder. The text encoder incorporates a natural language processing model. Based on this model, it encodes the Chinese description of the target location to obtain text features. These text features are used to characterize the semantic constraints extracted from the text including the target location. The text of the target location can be determined based on a textual prompt.

[0049] The navigation selector module also includes a language model decoder (LM Decoder). Based on the LM Decoder, environmental features and textual features can be aligned and fused through a cross-modal attention mechanism to obtain a first feature. This first feature represents the multimodal structured features resulting from the semantic-spatial alignment and fusion of environmental and textual features. The target path represents the executable driving path determined by the navigation selector module.

[0050] During the process of controlling the vehicle to travel along the target path, it is possible to repeatedly determine whether the vehicle has reached the target location. If the vehicle has not reached the target location, the process of step S102 is repeated until the vehicle reaches the target location.

[0051] In this way, even in mapless scenarios without a predefined road network or global path, the vehicle can combine perception data of its environment, such as signs, arrows on the ground, and text on walls from the forward-facing camera, as well as brief descriptions of the target location provided by the passenger (such as "go to the exit" or "go to the nearest charging station"), to determine whether it has reached the target area and select a safe and feasible trajectory towards the destination from multiple feasible driving paths.

[0052] The vehicle control method provided in this embodiment does not rely on external high-precision maps. It obtains the target path based on the vehicle's perception data of the current scene and the target location indicated by the user, and converts the target path into vehicle control commands to control the vehicle's movement. It can determine whether the target location has been reached from any starting point without a map, based on the vehicle's perception data and the target location that conforms to the user's description habits. If the target location has not been reached, a safe path is determined that leads to the target location, thus realizing complete destination-level automatic navigation and significantly reducing deployment costs.

[0053] In related technologies, advanced driver assistance systems (ADAS) learn how to drive and where to go by combining them into a single network. This makes it difficult to guarantee system stability in map-free scenarios with limited data and complex tasks. Furthermore, autonomous driving systems based on high-precision maps are prone to unstable and unexplainable behavior without the constraints of those maps. Directly combining voice command output with control may result in correct language understanding but unsafe trajectories.

[0054] In some optional implementations, determining the target path based on the perception data of the vehicle's environment and the target location includes: obtaining at least one candidate path based on the vehicle's preset trajectory anchor points, vehicle status, and perception data; and determining the target path based on the candidate path and the target location.

[0055] The trajectory anchor points represent a set of spatially representative key reference points for the trajectory (such as lane inflection points or target direction points). They can constrain the spatial range of path generation and can be fixed anchor point trajectories in multiple directions obtained through pre-clustering, such as twenty fixed anchor point trajectories in each direction. The vehicle state represents the current driving state of the vehicle, such as turning left, going straight, turning right, making a U-turn, accelerating, decelerating, or stopping.

[0056] In this embodiment, candidate paths conforming to physical laws are generated by using trajectory anchor points, local environmental features, and vehicle state constraints. These candidate paths are then superimposed on the target location, and the target path that is suitable for the environment, meets the vehicle's capabilities, and points directly to the target location is selected from the candidate paths. The steps in this embodiment can be executed by a planning module, which can be either a diffusion model or a non-diffusion model. For example, a variational autoencoder (VAE) can be used to replace the diffusion model to generate multiple candidate trajectories. By sampling multiple candidate trajectories through different prior distributions, the steps in this embodiment can also be implemented. Alternatively, candidate trajectories can be generated based on sampling planning, such as using traditional sampling planning methods like fast expanding random trees or dynamic programming to generate multiple candidate trajectories in the local space. After safety filtering of these candidate trajectories, they are input into the navigation selector module.

[0057] The planning module inputs perceived information about the vehicle's environment and its historical state, using a pre-clustered set of trajectory anchor points as prior information to generate multiple candidate trajectories that satisfy vehicle dynamics constraints. This module does not use destination information during training; instead, it learns a destination-independent distribution of drivable trajectories, responsible for safe, smooth, and regular local planning.

[0058] Specifically, clustering can be performed based on existing trajectory data to obtain several typical trajectory models, which are the trajectory anchor points. Each trajectory anchor point can be parameterized with multiple key points (e.g., eight) as prior inputs to the diffusion model. After obtaining the trajectory anchor points, environmental features extracted from the perception data of the vehicle's environment are obtained.

[0059] Figure 2 A schematic diagram of the environmental feature extraction process provided in this application is shown, such as... Figure 2As shown, eight cameras can be configured in the vehicle, namely, the first camera F0, the second camera B0, the third camera L1, the fourth camera R1, the fifth camera L0, the sixth camera R0, the seventh camera L2, and the eighth camera R2, to acquire image data 21 of the vehicle's environment. Alternatively, a LiDAR can be configured in the vehicle to acquire point cloud data 22 of the vehicle's environment; that is, the perception data includes both image data and point cloud data. Local environmental features can be extracted based on the perception data by fusing and encoding the image data and point cloud data to generate bird's-eye view features 23. If cost is limited, LiDAR can be omitted, and bird's-eye view features can be generated solely based on the more accurate image data. Alternatively, point cloud data acquired by millimeter-wave radar can be added to the perception data to enhance the detection and prediction of dynamic obstacles, thereby improving the safety of the generated candidate trajectories.

[0060] Figure 3 This application provides a schematic diagram of the target path determination process. Figure 3 As shown, the target path determination process provided in this application includes: fusing trajectory anchor points 31 and environmental features (bird's-eye view features 23) using a spatial cross-attention network. The trajectory anchor points and visual features are unified into the vehicle coordinate system to generate spatial position codes; through the spatial cross-attention network, the trajectory anchor points focus on the environmental features corresponding to their spatial positions, while simultaneously adapting the environmental features to the spatial constraints of the anchor points; the environmental features and spatial constraints are fused to output a first feature that combines trajectory spatial constraints and environmental semantics.

[0061] The vehicle state 32 and the first feature are fused using an agent cross-attention network. Specifically, the low-dimensional numerical representation of the vehicle state is encoded into a high-dimensional feature to align with the dimension of the first feature. Through the agent cross-attention network, the first feature focuses on vehicle dynamic constraints, such as minimum turning radius and maximum acceleration, filtering out unreachable anchor points or paths, and then weightedly fusing the vehicle state and the first feature.

[0062] The result of fusing vehicle state and the first feature is input into a feedforward neural network (FFN) to enhance the semantics related to path generation. Through decoding using an MLP or diffusion model, dimensionality is progressively reduced to generate multiple path coordinate sequences that satisfy vehicle dynamics and environmental constraints, i.e., candidate paths. Each candidate path can consist of four candidate trajectories, each including a sequence of eight two-dimensional coordinate points at eight future time points, with each point spaced 0.5 seconds apart. These candidate paths can be sorted from highest to lowest confidence, and four distinct but drivable candidate paths can be selected.

[0063] Candidate paths are overlaid onto the perception data acquired by the vehicle's forward-facing camera, resulting in an overlay result (33). This overlay result integrates the spatial relationship between the candidate paths and the vehicle's environment. Furthermore, it includes visual prompts for the candidate trajectories. Environmental features can be re-extracted from the overlay result using a visual encoder and input into the navigation selector module. Candidate trajectories can be drawn onto the images acquired by the forward-facing camera using different colors, and the images can be adjusted to a preset size, such as 1792×896 pixels. The explicit visualization of candidate paths is input into the visual-language big data model. Based on this model, one path is selected as the target path from multiple candidate paths. This allows the visual understanding and reasoning capabilities of the visual-language big data model to be directly applied to navigation decisions, facilitating the full utilization of the model's existing capabilities.

[0064] The target location indicated by the user in the assisted driving system might be a brief description of the location, such as "go to the nearest exit" or "go to the charging station area on the right." Textual features can be extracted based on the user-indicated target location. Using a cross-modal attention network employed in the language model decoder within the navigation selector module, the re-extracted visual features are made to focus on the target semantics within the textual features. Simultaneously, the textual features are adapted to the spatial relationships between the path and the environment, resulting in an aligned and fused result. The smallest semantic units representing the textual features are visual tokens, and the smallest semantic units representing the re-extracted visual features are textual tokens. The aligned and fused result is then input into a classification or scoring network to score each candidate path; the candidate path with the highest score is selected as the target path.

[0065] This approach employs a decoupled dual-module architecture between the planning module and the navigation selector module. The planning module constructs drivable trajectories that satisfy dynamic and road constraints, focusing on local feasibility and safety. The navigation selector module focuses on high-level decisions related to the target location, requiring only language-driven selection from a limited number of candidate trajectories. This reduces the burden on a single network to simultaneously learn how to drive safely and how to correctly reach the target location, significantly lowering the learning difficulty (only selection, not continuous control). Structurally, it avoids selecting obviously infeasible or immediately collision-prone trajectories, achieving an organic unity between visual language understanding and driving safety constraints. Furthermore, even if the navigation selector module occasionally selects the wrong path, the candidate path itself still satisfies lane geometry and dynamic constraints, balancing interpretability, safety, and stability in a graph-free environment. Compared to schemes that directly output control quantities from a visual-language model, this implementation has significant advantages in safety and training stability.

[0066] In some optional implementations, at least one candidate path is obtained based on the vehicle's preset trajectory anchor points, vehicle state, and perception data, including: fusing trajectory anchor points and perception data to obtain a first feature; performing nonlinear semantic enhancement on the second feature fused with the first feature and vehicle state to obtain an enhanced second feature; performing time step modulation on the enhanced second feature to obtain a temporal feature; mapping the temporal feature to a path coordinate sequence, and determining the candidate path based on the path coordinate sequence.

[0067] Nonlinear semantic enhancement is used to represent the nonlinear transformation of features through a feedforward network, strengthening semantic information related to path generation. Enhanced second features represent high-dimensional features that have undergone nonlinear transformation, filtered noise, and highlight path planning information. Temporal step modulation represents the injection of temporal dimension information into the semantically enhanced second features, enabling static features to express temporal trajectories. Temporally sequenced features represent features that simultaneously contain semantic and temporal information, and can be used to generate trajectory points that progress over time.

[0068] In this embodiment, such as Figure 3 As shown, the second feature can be input into a feedforward neural network, and feature extraction can be performed through nonlinear transformation to obtain the semantically enhanced second feature, which is then modulated by time steps. Multiple time steps are preset, such as 0.5 seconds, 1 second, 1.5 seconds, or 2 seconds in the future. A time code is generated for each time step, and the time code is added to or concatenated with the semantically enhanced second feature to obtain a temporally sequenced feature with temporal information. The temporally sequenced feature is then dimensionality-reduced using an MLP, mapping it to a two-dimensional coordinate point sequence. The coordinate points are arranged in chronological order to form a smooth, drivable trajectory, i.e., a candidate path.

[0069] In this way, by enhancing the semantic filtering of trajectory-related irrelevant features through FFN, the accuracy of path generation can be improved. At the same time, by modulating the time step, the output candidate path can be made to have temporal continuity. By directly mapping coordinates through temporal features, end-to-end efficient generation can be achieved. The candidate path generation speed is fast, the structure is simple, and it is easy to deploy. Through nonlinear semantic enhancement, time step modulation, and coordinate mapping, the second feature that integrates multi-source information is transformed into a candidate path that conforms to temporal laws and dynamic constraints. This makes the generated trajectory have environmental adaptability, temporal smoothness, and physical executability, which can improve the accuracy and reliability of assisted driving path planning.

[0070] High-precision map-based solutions in related technologies cannot plan global paths in map-free scenarios, and cannot complete navigation from any starting point to target areas such as exits, parking spaces, or charging areas. Indoor visual-language navigation systems, while capable of operating without maps, rely on extremely detailed step-by-step instructions, and their action space consists of discrete nodes and abstract actions, which are disconnected from the short-term trajectory and dynamic control of real vehicles.

[0071] In some optional implementations, determining the target path based on perceived data of the vehicle's environment and the target location further includes: extracting environmental features based on the perceived data; acquiring topological data of a preset area, identifying intersections in the topological data, and determining the locations at preset distances from the intersections as topological nodes; determining directed edges between topological nodes based on the lanes represented by the topological data; constructing a decision graph based on the intersections, topological nodes, and directed edges; initializing path selection parameters based on the decision graph; and determining the target path based on the path selection parameters, by integrating environmental features and the target location.

[0072] Among them, the path selection parameters are used to characterize the model parameters for selecting the optimal target path from multiple candidate paths, including: path score weights, trajectory selection probability coefficients, attention allocation parameters, decision thresholds, etc.

[0073] In this embodiment, the decision graph is used to train the planning module and the navigation selector module. In the application of the vehicle control method provided in this application, it is not necessary to construct the decision graph. Figure 4 A schematic diagram of the topology data provided in this application is shown. For example... Figure 4 As shown, Figure 4 The predefined region includes twelve decision nodes (0 to 11), one point of interest (12), such as an exit, four intersections, and a lane center line. The first intersection includes a first decision node (1), a second decision node (0), and a third decision node (2).

[0074] A topological node refers to a specific position where a vehicle is located before entering an intersection during its journey in a graphless environment. At this position, the vehicle will face multiple driving options (such as going straight, turning left, or turning right) and needs to make a path selection.

[0075] The topology nodes are set up to meet the following requirements: their positions are located a short distance before the convergence or fork of the center lines of different lanes, ensuring that the vehicle has enough space to make decisions and steer; each topology node is associated with several outgoing edges, and each outgoing edge corresponds to an optional driving direction; the topology node number is unique and is used to build the decision graph later.

[0076] The decision graph can be formalized as a directed graph G=(V, E). Here, V is the set of topological nodes, where each node v corresponds to a position before entering an intersection; E is the set of directed edges, where each edge e represents a segment of the lane centerline from node vi to its adjacent node vj, with a length consistent with the actual road geometry. Route cost can be measured based on directed edges. Each edge e(i, j) is associated with the following attributes: distance: corresponding to the actual lane centerline length; trajectory point sequence: a two-dimensional coordinate sequence obtained by sampling at fixed intervals; associated interest points or destination information: if there are interest points or destinations such as exits, parking spaces, or charging stations near the endpoint of the edge, they are labeled, where i represents one node of edge e, and j represents another node of edge e.

[0077] In this implementation, topological data such as lane centerlines, intersection coordinates, points of interest (POIs), and destination coordinates can be obtained from an external system. The relationships between all lane centerlines, intersections, and lane centerlines are analyzed. Locations with intersecting relationships are marked as intersections, and topological nodes are marked at a certain distance in front of the intersections, for example, five meters. Based on the geometric relationships of the lane centerlines and lane traffic rules, directed edges are created from upstream topological nodes to downstream topological nodes, and the coordinates of the lane centerlines and the length of the road segments within these directed edges are recorded. The vicinity of each directed edge is detected; if entrances / exits, parking spaces, or charging stations exist, they are recorded in the attributes of the corresponding topological node to associate POIs and destination information. Figure 5 A schematic diagram of a decision graph constructed based on the aforementioned topology data is shown. Green circles represent topology nodes, and red circles represent points of interest. The constructed decision graph can be saved as a graph data structure for training the planning module and the navigation selector module.

[0078] In some optional implementations, at least one candidate path is obtained based on the vehicle's preset trajectory anchor points, vehicle status, and perception data, including: initializing candidate path generation parameters based on the decision graph; and obtaining at least one candidate path based on the candidate path generation parameters, the vehicle's preset trajectory anchor points, vehicle status, and perception data.

[0079] The candidate path generation parameters include: FFN parameters, time step modulation parameters, MLP coordinate mapping, weights, biases, and hyperparameters of the diffusion denoising model.

[0080] The planning module is trained using a decision graph, enabling it to generate diverse but drivable short-term trajectories in various local environments. Simultaneously, during the training process, keyframes are extracted, and the navigation selector module is fine-tuned with single-frame instructions based on these keyframes. This allows the navigation selector module to learn to choose the correct path and output it in the correct format, given only a single frame of image and text.

[0081] For each topological node, the planning module generates several short-term candidate trajectories within the current local area, rather than having the planning module directly output control variables in an unconstrained continuous action space. The navigation selector module makes discrete choices among these trajectories, with each selection corresponding to an outgoing edge of the topological node in the decision graph. Simultaneously, a subtask is introduced to determine whether the target position has been reached, used to judge whether the vehicle is close to the target position from the current perspective.

[0082] Combining predefined destination types (parking spaces, charging stations, exits, etc.), this application uses brief target descriptions as language input, such as "go to the nearest exit" or "go to the charging area on the right," without requiring users to provide step-by-step route instructions. Through modeling methods involving decision graphs, candidate trajectories, and discrete selection, this application constrains and organizes the decision space without relying on high-precision maps, utilizing a topology-based training process. The candidate path is guaranteed to be safe to drive and satisfy vehicle dynamics constraints by the planning module; the navigation selector module, based on the forward-facing wide-angle camera and language prompts, determines and selects the trajectory closest to the target, again ensuring safety and reliability. The language prompts only require destination-level information, conforming to the expression habits of real passengers and solving the problem of difficulty in providing step-by-step instructions in actual situations.

[0083] In this way, without constructing a complete high-precision map, prior knowledge of the scene's topology can be introduced to train the planning module and navigation selector module, reducing the learning difficulty and improving navigation efficiency and safety. With only coarse-grained topological information from real driving data (such as lane centerlines, intersection connectivity, and destination location), a lightweight topological representation can be constructed. This allows the model to understand the reachability relationships between different intersections and the impact of the current decision on subsequent paths without relying on a high-precision map. As a result, it can form spatial intuition similar to that of a human driver in a mapless environment. Parameter initialization can be completed based on the road network rules and path constraints of the decision graph, so that the initial state of the planning module and navigation selector parameters already has prior road knowledge, which fits the actual needs of path planning.

[0084] In some optional implementations, initializing candidate path generation parameters based on the decision graph includes: generating expert trajectories from a starting point to multiple endpoints in the decision graph; clustering the expert trajectories to obtain initial trajectory anchor points; adding noise to the target expert trajectory to generate a noisy trajectory, where the target expert trajectory is one of the expert trajectories; acquiring initial perception data of vehicles in the decision graph; gradually reconstructing the noisy trajectory based on the initial perception data and the initial trajectory anchor points; determining a first error between the first trajectory generated during the reconstruction process and the target expert trajectory; and if the first error satisfies a first preset condition, determining candidate path generation parameters based on the parameters for generating the first trajectory corresponding to the first error.

[0085] Among them, the expert trajectory is used to represent the standard optimal driving trajectory from the starting point to the corresponding end point, generated based on the topological rules of the decision graph, traffic regulations, and vehicle dynamics constraints. It has the characteristics of compliance, smoothness, collision-free operation, and conformity to the road network, and serves as the true benchmark for the planning module's learning. The initial trajectory anchor point can be a representative trajectory key point sequence obtained after clustering massive expert trajectories. It is used to constrain the spatial direction and morphological structure of trajectory generation and is the core spatial prior for path generation. The target expert trajectory can be a single standard trajectory selected from the clustered expert trajectories as the true value for a single round of training. It is used for noise addition and error comparison and serves as a direct reference for parameter optimization. The noisy trajectory can be a distorted trajectory obtained by adding random Gaussian noise to the coordinates, heading, speed, and other dimensions of the target expert trajectory. It simulates a non-ideal trajectory shape that deviates from the true value and is used for the planning module's reconstruction training. The initial perception data can be the basic environmental perception data obtained by the vehicle at the corresponding initial position on the decision graph through sensors, providing environmental constraints for trajectory reconstruction. The first trajectory can be the fitted trajectory generated by the planning module based on the initial perception data and the initial trajectory anchor point, after gradually denoising and reconstructing the noisy trajectory. It is the candidate trajectory initially output by the candidate path generation parameters. The first error can be the quantified difference between the first trajectory and the target expert trajectory. It can be calculated using indicators such as Euclidean distance, smoothness error, and heading deviation, and is used to measure the trajectory reconstruction effect and parameter quality. The first preset condition can be a pre-set error threshold, representing the pass / fail standard for trajectory reconstruction. When the first error is less than this threshold, it is determined that the accuracy of the trajectory generated by the planning module meets the standard, and the corresponding parameters can be used as initial candidate path generation parameters.

[0086] In this implementation, the planning module can use a trajectory generator based on a diffusion model to generate multiple candidate trajectories that satisfy vehicle dynamics and road constraints by progressively reverse denoising, given the current observations and historical states.

[0087] Specifically, trajectory anchor points are initialized by clustering existing trajectory data to obtain several typical trajectory patterns, i.e., initial trajectory points. During training of the planning module, Gaussian noise is added to the anchor point trajectories near the real trajectory, and noisy trajectory samples are generated through a multi-step forward diffusion process. Based on the diffusion model, a denoising process is learned to gradually recover the real trajectory from the noisy trajectory. The difference between the first reconstructed trajectory and the target expert trajectory is quantified and compared to obtain the first error. The trajectory reconstruction accuracy is verified, and it is determined whether the current parameters meet the preset qualification standard. When the first error meets the preset condition, the model parameters corresponding to the first trajectory generated in this round are used as the initial candidate path generation parameters, completing the parameter initialization based on the decision graph. Thus, during inference, high-quality trajectories can be sampled from random noise. During the inference phase, the current local environmental features, vehicle status, and trajectory anchor points are used as conditions to input the diffusion model for sampling. The top-k method can be used, which determines the k largest or smallest elements from the candidate path set to obtain true candidate trajectories sorted from high to low confidence. Multiple different but drivable candidate trajectories are selected and superimposed on the forward-looking wide-angle camera image as input to the navigation selector module.

[0088] In this way, the road network topology based on the decision graph generates expert trajectories, and the candidate path generation parameters are initialized based on real road constraints rather than random assignment. This aligns with the driving scenarios and planning requirements of vehicle-assisted driving from the source. Relying on topological prior initialization, the initial state of the candidate path generation parameters is more in line with the actual scenario. At the same time, using expert trajectories as a supervision benchmark can improve the generation accuracy of candidate trajectories.

[0089] Currently, related technologies rely solely on supervised learning, which struggles to guarantee global topological consistency throughout the navigation process, only ensuring the correctness of single-step decisions. In map-less scenarios, local observations sometimes lack clear visual cues (e.g., no signs at intersections), requiring models to rely on historical decisions and global objectives for reasoning. Supervised training on single-frame data in existing technologies struggles to acquire this capability.

[0090] In some optional implementations, initializing path selection parameters based on the decision graph includes: during interactive training, determining the topological nodes traversed by the vehicle from the starting point to the destination; determining a first reward corresponding to the driving trajectory selected at the traversed topological nodes based on a second preset condition, the second preset condition including at least one of the average Euclidean distance between the driving trajectory and the expert trajectory, whether a collision occurs, and whether a boundary crossing occurs; if the first reward is higher than a first reward threshold, increasing the weight of selecting the driving trajectory corresponding to the first reward, and if the first reward is lower than or equal to the first reward threshold, decreasing the weight of selecting the driving trajectory corresponding to the first reward; performing multiple rounds of interactive training until the average first reward of the path selected at the traversed topological nodes satisfies a third preset condition, and determining the path selection parameters based on the parameters corresponding to the average first reward that satisfies the third preset condition.

[0091] Interactive training is used to represent the reinforcement learning interactive environment built on the decision graph, simulating the entire driving process of a vehicle from the starting point to the destination. This allows the navigation selector module to autonomously choose its driving trajectory at each topology node and receive immediate reward feedback in an iterative training process. The first reward is an immediate quantitative reward given to the model for a single trajectory selection, used to measure the quality of the selected trajectory. The mean Euclidean distance represents the average point-to-point spatial distance between the selected trajectory and the corresponding expert trajectory, quantifying the spatial deviation of the trajectory; a smaller value indicates the trajectory is closer to the optimal standard. Collision and boundary violations represent the violation status of the driving trajectory. Collision refers to the trajectory coming into contact with obstacles or other traffic participants; boundary violations refer to the trajectory deviating from the lane or entering a prohibited area, both of which are considered unqualified driving behaviors. The first reward threshold represents a pre-set reward qualification threshold, used to distinguish between high-quality and low-quality trajectories and determine the quality of the model's trajectory selection behavior. The driving trajectory weight represents the probability that the corresponding trajectory is selected by the model; a higher weight increases the likelihood that the model will select that type of trajectory in subsequent training. The average first reward is used to characterize the average first reward obtained by the model when selecting trajectories at each topological node during multiple rounds of interactive training, and is used to measure the overall quality of the model's path selection strategy.

[0092] The third precondition is a quantitative criterion used in interactive training based on the decision graph to determine whether the path selection strategy at the topology node has reached a qualified state. Its core function is to terminate multiple rounds of interactive training and determine the final path selection parameters. Specifically, a fixed average first reward threshold can be preset. When the average first reward of the topology node is greater than or equal to this threshold after multiple rounds of training, the third precondition is deemed to be satisfied. Alternatively, after multiple rounds (e.g., 10 or 20 rounds) of interactive training, the fluctuation range of the average first reward of the topology node is ≤ a preset minimum value, and the current average reward is at a relatively high level. Furthermore, after multiple rounds of interactive training, the average first reward of the topology node reaches its maximum value or peak value, and subsequent training cannot exceed this maximum value. Similarly, the fifth and sixth preconditions mentioned later can also be determined in the aforementioned manner.

[0093] In this implementation, firstly, a decision graph is used as the state space of the environment. Expert trajectories generated based on the road network topology and traffic rules of the decision graph are used as a benchmark to simulate the vehicle's journey from origin to destination. Based on the images of corresponding topological nodes and the trajectory sequence selected by the navigation selector module, the vehicle travels to the next topological node or target location, capturing the trajectory selection behavior of each topological node. This ensures the entire training process is grounded in real road constraints and closely matches actual driving scenarios for assisted driving. Secondly, a quantitative reward mechanism is established. The first reward for each trajectory selection is calculated from multiple dimensions, including the degree to which the trajectory matches the optimal path, driving safety, and driving compliance. The reward value provides an objective assessment of trajectory quality. Then, using a reward threshold as a boundary, the weight of trajectory selection is adjusted to positively reinforce high-quality selections and negatively suppress low-quality selections, guiding the model to gradually optimize its selection strategy. Subsequently, multiple rounds of iterative interactive training were carried out. In each round, the trajectory selection was optimized based on the weight adjustment results of the previous round, continuously improving the overall average first reward until the average reward reached the maximum value and the model selection strategy converged and stabilized. At this point, the corresponding model parameters had the ability to conform to the road network rules and select high-quality trajectories. Finally, this set of parameters was determined as the path selection parameters, and the initialization was completed.

[0094] In this way, the safety and comfort of single-step decision-making can be measured based on the first reward, i.e., the local reward. The single-round reward constrains the quality of local trajectory selection, and the average reward maximizes the overall effect of the path selection throughout the entire process. It takes into account the advantages and disadvantages of single-node decisions and the overall navigation effect, which can prevent the planning model from having a high-quality local trajectory but deviating from the target throughout the entire process. The selected target path not only fits the optimal trajectory, but also efficiently reaches the destination, which can improve navigation efficiency and driving smoothness.

[0095] In some optional implementations, initializing path selection parameters based on the decision graph further includes: during interactive training, determining the directed edge sequence traversed by the vehicle from the starting point to the destination and the shortest path from the starting point to the destination; determining a second reward corresponding to the directed edge sequence based on a fourth preset condition, the fourth preset condition including at least one of the prefix overlap between the directed edge sequence and the shortest path, the overlap between the directed edges in the directed edge sequence and the edges in the shortest path, and whether the destination has been reached; if the second reward is higher than a second reward threshold, increasing the weight of the directed edge sequence corresponding to the second reward; if the second reward is lower than or equal to the second reward threshold, decreasing the weight of the directed edge sequence corresponding to the second reward; performing multiple rounds of interactive training until the second reward corresponding to the traversed directed edge sequence satisfies a fifth preset condition; determining path selection parameters based on the parameters corresponding to the second reward that satisfies the fifth preset condition; or, combining the first reward and the second reward based on preset weights to obtain a third reward, and determining path selection parameters based on the parameters corresponding to the third reward that satisfies a sixth preset condition.

[0096] The directed edge sequence represents the sequence of directed edges traversed by a vehicle from the starting point to the destination in the decision graph, arranged in the order of travel, such as [e1, e2, ..., en], representing the road network links for the entire journey. The shortest path represents the optimal road network link (directed edge sequence) from the starting point to the destination calculated based on the topological rules and traffic constraints of the decision graph, possessing the characteristic of having the shortest path length. The second reward represents the global quantitative reward for the entire journey, used to measure the optimality (overlap with the shortest path) and effectiveness (whether the destination is reached) of the overall route, serving as the evaluation criterion for the global path selection strategy. Prefix overlap represents the proportion of consecutively overlapping directed edges in the directed edge sequence with the shortest path starting from the starting point. Edge overlap represents the proportion of the total number of directed edges in the directed edge sequence that overlap with the shortest path. The second reward threshold represents the preset global reward qualification threshold, used to distinguish between high-quality and low-quality global paths, and is the criterion for adjusting the selection weight of the directed edge sequence. The directed edge sequence weights represent the probability that the corresponding directed edge sequence will be selected by the model. The higher the weight, the greater the likelihood that the model will choose this type of global path during subsequent training. The preset weights represent pre-defined weight coefficients used to fuse the local first reward and the global second reward, or they can be fused based on a discount factor. The third reward represents the comprehensive reward obtained by weighted fusion of the first and second rewards, which can balance local trajectory quality and global path optimality.

[0097] In this embodiment, multi-step interactive navigation can be performed in the decision graph. Reinforcement learning methods can be used to optimize the navigation selector module. Alternatively, offline reinforcement learning algorithms can be used to optimize the strategies of the planning module and the navigation selector module based on historical data without interacting with the environment. Proximal policy optimization (PPO) can also be used to determine the candidate path generation parameters and path selection parameters. In a simulation environment built on a decision graph, the vehicle's journey from the starting point to the destination is simulated. The sequence of directed edges actually traversed by the vehicle is recorded, and the shortest path between the starting point and the destination is calculated based on the topology of the decision graph, providing a benchmark for global reward calculation. Combining at least one of the following indicators—prefix overlap (early optimality), edge overlap (overall optimality), and whether the destination is reached (effectiveness)—the second reward of the directed edge sequence is quantitatively calculated to evaluate the quality of the global path. The second reward is compared with a preset threshold. For high-quality global paths (rewards higher than the threshold), the selection weight of their directed edge sequences is increased, strengthening the navigation selector module's preference for the globally optimal path. For low-quality global paths, the weight is reduced, weakening the preference for such paths. The process of driving simulation, sequence recording, reward calculation, and weight adjustment is repeated. Through multiple rounds of interactive training, the global path selection strategy is continuously optimized until the second reward of the directed edge sequence reaches its maximum value. For determining the path selection parameters, one of the following methods can be used: the parameter corresponding to the maximum second reward can be directly selected as the path selection parameter; or the first reward (local) and the second reward (global) can be fused according to preset weights to obtain the third reward. Training continues until the third reward meets the sixth preset condition, at which point the corresponding parameters are selected as path selection parameters. The parameters of the navigation selector module can be updated using the Group Relative Policy Optimization (GRPO) algorithm, maximizing the expected reward across multiple rounds of interaction. This allows the planning and navigation selector modules to optimize the entire navigation strategy in the long-term reward context, rather than focusing solely on the current step.

[0098] In this way, in the two-stage training and reinforcement learning reward design, safety and comfort are constrained by the first (local) reward, and the path approaches the optimal route by the second (global) topological reward. Even in the absence of clear visual cues, the correct choice can still be made by relying on historical decisions and topological priors, thereby achieving mapless navigation behavior. This significantly reduces the probability of detours and unsafe driving behaviors. This integrated local and global training effect is difficult to achieve by traditional single-stage supervision or simple reinforcement learning methods.

[0099] In some optional implementations, determining the target path based on candidate paths and target location includes: determining whether a decision history exists; if no decision history exists, generating a decision history based on the target location, perception data of the vehicle's environment, candidate paths, and target path; if a decision history exists, determining the target path based on the decision history, candidate paths, and target location.

[0100] During multi-round vehicle control, in the absence of a decision history, a target path can be determined based on candidate paths and target locations. After the target path is generated, a decision history can be generated based on the target location, perceived data of the vehicle's environment, candidate paths, and the target path. Subsequent vehicle control can then determine the target path for the current round based on the decision history, combined with the candidate paths and target locations generated in the current round, and control the vehicle to travel along that target path.

[0101] Specifically, at least one candidate path generated by the planning module and the target path determined by the navigation selector module can be stored in the decision history. The decision history is then input into the navigation selector module. Figure 6 The diagram illustrates the historical data used to inform decision-making. When the vehicle's environment is as shown in the first image 61, the system asks: "Based on the first image 61, issue a command to proceed to the target location and determine the target path." The navigation selector module answers with a yellow target path. When the vehicle's environment is as shown in the second image 62, the system asks: "Based on historical decisions and the target location, determine the target path." The navigation selector module answers with a blue target path. This process is repeated until the target location is reached. This allows the navigation selector module to consider not only the current screen and voice commands when selecting a path, but also previous historical decision results. This ensures that the entire driving trajectory is topologically as close as possible to the shortest feasible path, while avoiding collisions and uncomfortable actions. Furthermore, it balances the rationality of individual decision-making with the consistency of the overall route in assisted driving.

[0102] This embodiment also provides a vehicle control device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.

[0103] This embodiment provides a vehicle control device. Figure 7 A schematic diagram of the constituent modules of the vehicle control device provided in this application is shown, such as... Figure 7 As shown, it includes: The target location acquisition module 701 is used to acquire the target location of the vehicle. The perception module 702 is used to acquire perception data of the vehicle's environment; Control module 703 is used to determine the target path based on perception data and target location, and control the vehicle to travel according to the target path; Display module 704 is used to respond to control commands that drive the vehicle to the target location, display the target path, and display a screen showing the vehicle traveling based on the target path.

[0104] The display module is used to visualize the driving process, providing intuitive feedback to the user. Upon receiving control commands to drive the vehicle to the target location, it displays the target path. The vehicle's trajectory can be clearly marked with highlighted lines on the in-vehicle screen, showing the vehicle following the target path.

[0105] This can enhance the driver's control over the vehicle's driving status and reduce driving fatigue. It is especially suitable for the human-machine collaboration needs in assisted driving scenarios. For example, if the user triggers the left or right turn signal, the vehicle can be controlled based on the user's vehicle control command and displayed on the display module. The user's vehicle control command can be manually input or voice input.

[0106] In some optional implementations, the control module is further configured to obtain at least one candidate path based on the sensing data, and determine the target path based on the candidate path and the target location; the display module is further configured to display the candidate path and the target path determined from the candidate path.

[0107] In this embodiment, the control module generates multiple feasible candidate paths based on perception data and vehicle driving constraints. Each candidate path is then matched and scored against the target location, and the optimal path is selected as the target path. The display module synchronously displays all generated candidate paths on the screen, using different colors and line types for differentiation. Multiple candidate paths can be overlaid on the vehicle's front view or on a virtual image within the vehicle's display interface. The final selected target path is highlighted and bolded. As the vehicle travels along the determined target path, the display module continuously refreshes the path and driving view, achieving synchronization between planning and display.

[0108] In this way, users can intuitively see the multiple optional routes planned by the system, understand the path planning logic, and increase their trust in the intelligent driving system; at the same time, the full visualization of path planning can improve the intuitiveness and convenience of driving.

[0109] In some alternative implementations, the display module is also used to identify the target path, wherein the identification includes at least one of highlighting, adding text, and adding an icon.

[0110] In this embodiment, the display module can differentiate the target path in the display screen by highlighting it, such as using bolding, color changing, flashing, or glowing lines; adding text, such as recommended path or target path; and adding icons, such as one or more of arrows, guide signs, and route markers. Other candidate paths maintain a normal display style, forming a clear visual distinction from the target path.

[0111] In this way, the visual differentiation is high, and the driver can quickly identify the driving trajectory; at the same time, the intuitive and clear route guidance can reduce the driver's understanding cost, without the need to judge the system's driving intention, making operation easier.

[0112] In some alternative implementations, the aforementioned vehicle control device further includes a voice module for explaining the reason for selecting the target path.

[0113] In this embodiment, after determining the target route, the control module can transmit relevant data about the reasons for selecting the target route to the voice module, such as the shortest distance. After receiving the relevant data, the voice module converts the textual reasons for the selection into voice content and plays it to the user. The control module can also control the display module to highlight the exit signs on the vehicle driving screen or navigation screen to ensure that the exit signs are clearly visible.

[0114] In this way, through multiple auditory and visual guidance, users can clearly understand the route planning logic and quickly capture key information such as exits, thereby improving the safety and accuracy of vehicle driving.

[0115] In some optional implementations, the control module includes: a path planning unit, used to obtain at least one candidate path based on the vehicle's preset trajectory anchor points, vehicle status, and perception data; and to determine the target path based on the candidate path and the target location.

[0116] In some optional implementations, the path planning unit includes: a first path planning subunit, used to fuse trajectory anchor points and perception data to obtain a first feature; to perform nonlinear semantic enhancement on the fused first feature and a second feature of vehicle state to obtain an enhanced second feature; to perform time step modulation based on the enhanced second feature to obtain a temporal feature; to map the temporal feature to a path coordinate sequence, and to determine a candidate path based on the path coordinate sequence.

[0117] In some optional embodiments, the aforementioned vehicle control device further includes: a decision graph construction module, used to extract environmental features based on perception data; acquire topological data of a preset area, identify intersections in the topological data, and determine the positions at preset distances from the intersections as topological nodes; determine directed edges between topological nodes based on lanes represented by the topological data; construct a decision graph based on intersections, topological nodes, and directed edges; initialize path selection parameters based on the decision graph; and determine the target path based on the path selection parameters, integrating environmental features and target location.

[0118] In some optional implementations, the decision graph construction module includes: a first parameter determination unit, used in interactive training to determine the topological nodes traversed by the vehicle from a preset starting point to a preset ending point; to determine a first reward corresponding to the driving trajectory selected at the traversed topological nodes based on a second preset condition, the second preset condition including at least one of the average Euclidean distance between the driving trajectory and the expert trajectory, whether a collision occurs, and whether a boundary crossing occurs; if the first reward is higher than a first reward threshold, to increase the weight of selecting the driving trajectory corresponding to the first reward, and if the first reward is lower than or equal to the first reward threshold, to decrease the weight of selecting the driving trajectory corresponding to the first reward; to perform multiple rounds of interactive training until the average first reward of the path selected at the traversed topological nodes satisfies a third preset condition, and to determine path selection parameters based on the parameters corresponding to the average first reward that satisfies the third preset condition.

[0119] In some optional implementations, the decision graph construction module further includes: a second parameter determination unit, used in interactive training to determine, for example, the directed edge sequence traversed by the vehicle from a preset starting point to a preset ending point and the shortest path from the starting point to the ending point; to determine a second reward corresponding to the directed edge sequence based on a fourth preset condition, the fourth preset condition including at least one of the prefix overlap between the directed edge sequence and the shortest path, the overlap between the directed edges in the directed edge sequence and the edges in the shortest path, and whether the ending point has been reached; if the second reward is higher than a second reward threshold, to increase the weight of the directed edge sequence corresponding to the second reward being selected, and if the second reward is lower than or equal to the second reward threshold, to decrease the weight of the directed edge sequence corresponding to the second reward being selected; to perform multiple rounds of interactive training until the second reward corresponding to the traversed directed edge sequence satisfies a fifth preset condition; to determine path selection parameters based on the parameters corresponding to the second reward that satisfy the fifth preset condition; or, to obtain a third reward by combining the first reward and the second reward based on preset weights, and to determine path selection parameters based on the parameters corresponding to the third reward that satisfy a sixth preset condition.

[0120] In some optional implementations, the path planning unit further includes: a second path planning subunit, used to initialize candidate path generation parameters based on the decision graph; and to obtain at least one candidate path based on the candidate path generation parameters, the vehicle's preset trajectory anchor points, the vehicle's state, and perception data.

[0121] In some optional implementations, the decision map construction module further includes: a third parameter determination unit, used to generate expert trajectories from a starting point to multiple endpoints in the decision map; cluster the expert trajectories to obtain initial trajectory anchor points; add noise to the target expert trajectory to generate a noisy trajectory, wherein the target expert trajectory is any one of the expert trajectories; acquire initial perception data of vehicles in the decision map, and based on the initial perception data and the initial trajectory anchor points, gradually reconstruct the noisy trajectory, and determine a first error between the first trajectory generated during the reconstruction process and the target expert trajectory; if the first error meets a first preset condition, determine candidate path generation parameters based on the parameters for generating the first trajectory corresponding to the first error.

[0122] In some optional embodiments, the aforementioned vehicle control device further includes: a historical decision module, used to determine a target path based on candidate paths and target location, including: determining whether there is a decision history; if there is no decision history, generating a decision history based on the target location, perception data of the vehicle's environment, candidate paths, and target path; if there is a decision history, determining the target path based on the decision history, candidate paths, and target location.

[0123] The vehicle control device provided in this application can execute the vehicle control method provided in any embodiment of this application, and has the corresponding functional modules and beneficial effects for executing the method. Further functional descriptions of the various modules and units described above are the same as those in the corresponding embodiments described above, and will not be repeated here.

[0124] Figure 8 This is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.

[0125] The following is a detailed reference. Figure 8 The diagram illustrates a structural schematic suitable for implementing the electronic device described in the embodiments of this application. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 801, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 802 or a program loaded from memory 808 into random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device. The processor 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input / output (I / O) interface 805 is also connected to the bus 804.

[0126] Typically, the following devices can be connected to I / O interface 805: input devices 806 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 807 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; memory devices 808 including, for example, magnetic tapes, hard disks, etc.; and communication devices 809. Communication device 809 allows electronic devices to communicate wirelessly or wiredly with other devices to exchange data. Although Figure 8 Electronic devices with various devices are shown, but it should be understood that it is not required to implement or have all of the devices shown, and more or fewer devices may be implemented or have instead.

[0127] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 809, or installed from a memory 808, or installed from a ROM 802. When the computer program is executed by the processor 801, it performs the functions defined in the vehicle control method of embodiments of this application.

[0128] Figure 8 The electronic device shown is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0129] This application also provides a computer-readable storage medium. The methods described in this application can be implemented in hardware or firmware, or implemented as recordable on a storage medium, or implemented as computer code downloaded over a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and subsequently stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the vehicle control method shown in the above embodiments is implemented.

[0130] A portion of this application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to this application through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0131] Although embodiments of this application have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of this application, and all such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A vehicle control method, characterized in that, The method includes: Obtain the target location of the vehicle; Determine whether the vehicle has reached the target location. If the vehicle has not reached the target location, determine the target path based on the perception data of the vehicle's environment and the target location, and control the vehicle to travel along the target path.

2. The vehicle control method according to claim 1, characterized in that, The process of determining the target path based on the perception data of the vehicle's environment and the target location includes: Based on the vehicle's preset trajectory anchor points, vehicle status, and the perception data, at least one candidate path is obtained; The target path is determined based on the candidate path and the target location.

3. The vehicle control method according to claim 2, characterized in that, Based on the vehicle's preset trajectory anchor points, vehicle status, and the perception data, at least one candidate path is obtained, including: By fusing the trajectory anchor points and the perceived data, a first feature is obtained; A non-linear semantic enhancement is performed on the second feature that integrates the first feature and the vehicle state to obtain the enhanced second feature; Based on the enhanced second feature, time-step modulation is performed to obtain a temporal feature; The temporal features are mapped to a path coordinate sequence, and the candidate path is determined based on the path coordinate sequence.

4. The vehicle control method according to claim 2, characterized in that, The method of determining the target path based on the perception data of the vehicle's environment and the target location further includes: Environmental features are extracted based on the perceived data; Obtain topology data of a preset area, identify intersections in the topology data, and determine the positions at preset distances from the intersections as topology nodes; Based on the lanes represented by the topological data, the directed edges between the topological nodes are determined; A decision graph is constructed based on the intersection points, the topological nodes, and the directed edges; The path selection parameters are initialized based on the decision graph, and the target path is determined by integrating the environmental features and the target location based on the path selection parameters.

5. The vehicle control method according to claim 4, characterized in that, The initialization of path selection parameters based on the decision graph includes: In interactive training, the topological nodes passed by the vehicle as it travels from a preset starting point to a preset ending point are determined; The first reward corresponding to the driving trajectory selected by the passed topology nodes is determined based on the second preset condition. The second preset condition includes at least one of the following: the average Euclidean distance between the driving trajectory and the expert trajectory, whether a collision occurs, and whether a boundary crossing occurs. If the first reward is higher than the first reward threshold, the weight of selecting the driving trajectory corresponding to the first reward is increased; if the first reward is lower than or equal to the first reward threshold, the weight of selecting the driving trajectory corresponding to the first reward is decreased. Multiple rounds of interactive training are performed until the average first reward for selecting a path through the traversed topology nodes satisfies a third preset condition. Based on the parameters corresponding to the average first reward that satisfies the third preset condition, the path selection parameters are determined.

6. The method according to claim 5, characterized in that, The initialization of path selection parameters based on the decision graph also includes: In interactive training, the directed edge sequence traversed by the vehicle as it travels from a preset starting point to a preset ending point, as well as the shortest path from the starting point to the ending point, are determined. The second reward corresponding to the directed edge sequence is determined based on a fourth preset condition, which includes at least one of the following: the prefix overlap between the directed edge sequence and the shortest path, the overlap between the directed edges in the directed edge sequence and the edges in the shortest path, and whether the destination has been reached. If the second reward is higher than the second reward threshold, the weight of the directed edge sequence corresponding to the second reward is increased; if the second reward is lower than or equal to the second reward threshold, the weight of the directed edge sequence corresponding to the second reward is decreased. Perform multiple rounds of the aforementioned interactive training until the second reward corresponding to the sequence of directed edges that has been passed satisfies the fifth preset condition; The path selection parameters are determined based on the parameters corresponding to the second reward that meets the fifth preset condition; or, the first reward and the second reward are combined based on preset weights to obtain the third reward, and the path selection parameters are determined based on the parameters corresponding to the third reward that meets the sixth preset condition.

7. The vehicle control method according to claim 4, characterized in that, Based on the vehicle's preset trajectory anchor points, vehicle status, and the perception data, at least one candidate path is obtained, including: Initialize the candidate path generation parameters based on the decision graph; Based on the candidate path generation parameters, the vehicle's preset trajectory anchor points, the vehicle status, and the perception data, at least one candidate path is obtained.

8. The vehicle control method according to claim 7, characterized in that, The initialization of candidate path generation parameters based on the decision graph includes: The decision graph generates expert trajectories from the starting point to multiple endpoints; Cluster the expert trajectories to obtain initial trajectory anchor points; Noise is added to the target expert trajectory to generate a noisy trajectory, wherein the target expert trajectory is any one of the expert trajectories; Acquire initial perception data of the vehicle in the decision map, and based on the initial perception data and the initial trajectory anchor point, gradually reconstruct the noisy trajectory, and determine the first error between the first trajectory generated during the reconstruction process and the target expert trajectory; If the first error satisfies the first preset condition, the candidate path generation parameters are determined based on the parameters for generating the first trajectory corresponding to the first error.

9. The vehicle control method according to claim 2, characterized in that, Determining the target path based on the candidate path and the target location includes: Determine whether a decision history exists; if the decision history does not exist, generate the decision history based on the target location, the perception data of the vehicle's environment, the candidate path, and the target path. If the decision history exists, the target path is determined based on the decision history, the candidate paths, and the target location.

10. A vehicle control device, characterized in that, The device includes: The target location acquisition module is used to acquire the target location of the vehicle; The perception module is used to acquire perception data of the environment in which the vehicle is located; The control module is used to determine the target path based on the perceived data and the target location, and control the vehicle to travel along the target path; The display module is used to respond to a control command that drives the vehicle to a target location, display the target path, and display a screen showing the vehicle traveling along the target path.

11. The vehicle control device according to claim 10, characterized in that, The control module is further configured to obtain at least one candidate path based on the sensing data, and determine the target path based on the candidate path and the target location; The display module is also used to display the candidate paths and the target path determined among the candidate paths.

12. The vehicle control device according to claim 10, characterized in that, The display module is also used to identify the target path, wherein the identification includes at least one of highlighting, adding text, and adding an icon.

13. The vehicle control device according to claim 10, characterized in that, The device further includes: The voice module is used to explain why the target path was selected.

14. An electronic device, characterized in that, include: A memory and a processor are communicatively connected, the memory storing computer instructions, and the processor executing the computer instructions to perform the steps of the vehicle control method according to any one of claims 1 to 9.

15. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions for causing the computer to perform the steps of the vehicle control method according to any one of claims 1 to 9.