Method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning

An end-to-end residual pruning method enables efficient multi-scale map construction for complex environments in Intelligent Vehicle Cyber-Physical Systems (IVCPS). It addresses dynamic environment representation and data synchronization under resource-constrained conditions, and improves the adaptability and real-time performance of map construction.

CN122244352A · Pending · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHONGQING UNIV
Filing Date: 2026-03-11
Publication Date: 2026-06-19

Abstract

This invention belongs to the field of intelligent connected vehicles and vehicle-road-cloud integration technology. It discloses a method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning. The method improves on existing technologies in three ways: 1) it designs a dynamic weight mapping mechanism driven by multi-scale context, which enhances the adaptive representation accuracy of the end-to-end model for the topological features of dynamic environments; 2) it designs a pruning mechanism triggered by geometric consistency residuals, which adaptively prunes point cloud synchronization tasks by measuring the deviation between observed values and the prior map, thus reducing communication overhead; 3) it constructs a cloud architecture based on incremental end-to-end learning, addressing the key technical challenge of lossless reconstruction of local perception information into a global map. With these improvements, the invention reduces the resource consumption of environment construction while meeting the real-time and bandwidth constraints of IVCPS, and addresses the key technical deficiencies of existing dynamic mapping methods in unstructured collaborative scenarios.

Description

Technical Field

[0001] This invention belongs to the field of intelligent connected vehicles and vehicle-road-cloud integration technology, and specifically relates to a method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning.

Background Technology

[0002] With the development of autonomous driving, vehicle-road-cloud integration, and large-scale intelligent transportation systems, Intelligent Vehicle Cyber-Physical Systems (IVCPS) have become the core architecture for realizing high-level autonomous driving and intelligent transportation. In the multi-dimensional architecture of IVCPS, complex environment map construction is not only a digital mirror of the physical world in cyberspace, but also a core foundational functional module supporting cross-domain integration and collaborative decision-making. Especially in unstructured scenarios, high-precision, real-time environment construction plays a decisive role in ensuring the safety and task continuity of IVCPS. Under changing real-world operating conditions, the dynamic requirements of this functional module typically exhibit the following common patterns:

[0003] 1. High spatiotemporal correlation and demand for cross-domain collaboration. The accuracy of environment construction depends not only on the current sensor input, but also on the real-time alignment of historical map priors and global geometric constraints, which requires functional modules to maintain a high degree of spatiotemporal consistency when integrating vehicles and the cloud across domains;

[0004] 2. Extremely stringent real-time requirements. Under high-speed maneuvering conditions, the vehicle's response time to the perception of local dynamic obstacles typically needs to be kept at the millisecond level, which requires the environment mapping module to be highly computationally efficient during execution;

[0005] 3. Large potential for hierarchical representation and functional decoupling. Environment construction can adopt different representation modes according to the complexity of the scenario. Different levels differ significantly in communication bandwidth consumption and computation time, which provides a natural basis for the hierarchical decoupling and modular pruning of complex functions;

[0006] 4. Bursty and heterogeneous sensing load. When the vehicle enters an unstructured or highly dynamic scenario, the amount of sensing data and the environmental uncertainty increase sharply within a short period, requiring the system's functional modules to allocate resources and respond flexibly according to the characteristics of the dynamic demand.

[0007] During IVCPS operation, when environmental features are relatively simple and communication bandwidth is sufficient, the cloud can require all vehicles to maintain the highest-precision explicit geometric map to achieve optimal environmental representation. However, when the vehicle cluster load surges, wireless communication bandwidth is limited, or the complexity of environmental features exceeds a critical value, a mapping module that lacks an efficient pruning mechanism and still forces full point cloud upload and explicit mapping will cause serious cumulative delays in the processing pipeline, compromising the timeliness of navigation decisions. Conversely, reducing the construction accuracy indiscriminately or simplifying the perception dimensions results in over-pruning: response latency falls, but environmental features are lost and geometric constraints fail, threatening driving safety.

[0008] Therefore, given the IVCPS multidimensional architecture and its dynamic requirements, a highly challenging key technical problem in the field of vehicle-cloud collaboration is how to realize a map construction method that integrates IVCPS basic functions across domains in a modular and efficient way, with controllable performance, on-demand data synchronization, and geometric adaptability, under the constraints of limited communication bandwidth, limited onboard computing power, and strict decision-making time windows.

Summary of the Invention

[0009] In view of this, the purpose of this invention is to provide a method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning. It aims to address the significant performance bottlenecks of existing technologies in unstructured vehicle-cloud collaborative scenarios with limited resources and complex, variable spatial features, namely: 1) the representational capability of static models is difficult to adapt to dynamic changes in environmental topology; 2) existing data synchronization strategies lead to ineffective use of communication bandwidth in vehicle-cloud collaboration; and 3) the unified-dimensional feature processing mechanism results in an imbalance between terminal computing power allocation and mapping accuracy.

[0010] To achieve the above objectives, the present invention adopts the following technical solution:

[0011] A method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning includes the following steps:

[0012] S1. Context-driven dynamic weight mapping perception: multimodal alignment and multi-scale feature extraction are performed on vehicle sensor data. Global topological features are extracted through the large kernel perception branch, and the convolution weights of the small kernel adaptation branch are then dynamically generated from the global topological features to reconstruct local features and generate an enhanced feature map Y;

[0013] S2. Pruning mechanism triggered by geometric consistency residuals: the enhanced feature map Y is compared with the reference feature map generated by projecting the cloud high-precision map, and their geometric difference is calculated. Dynamic interference is filtered out through semantic masking to obtain the effective geometric residual R, and on-demand reporting of the pruned residual feature stream is triggered based on spatiotemporal integration and an adaptive threshold decision;

[0014] S3. Cloud reconstruction architecture based on incremental end-to-end learning: the compressed residual feature stream is received and decoded. Using cross-domain feature alignment and an incremental fusion network, the sparse residuals are mapped to the global coordinate system to generate incremental update values, which are overlaid onto the global base map to achieve dynamic map evolution.

[0015] Furthermore, step S1 includes the following sub-steps:

[0016] S1.1 Perform multimodal spatial alignment on the original point cloud P and the visual image I, and generate a fused spatial feature map using the shallow feature extraction module;

[0017]

[0018] In the formula, the transformation terms denote the feature transformation operators for the lidar point cloud and the visual image; [·] denotes feature concatenation; and the shallow weight matrix performs dimensionality reduction and initial semantic aggregation;

[0019] S1.2 In the core multi-scale feature processing unit, extract long-range spatial dependencies through the large kernel perception branch to generate global topological features;

[0020]

[0021] In the formula, PW denotes pointwise convolution; DW denotes depthwise convolution; K_L is the large kernel size; and the operand denotes the local input features;

[0022] S1.3 Use the global topological features to dynamically generate the weights of the small-kernel convolution, and perform feature aggregation through the small kernel adaptation branch to generate the final enhanced feature map Y;

[0023]

[0024]

[0025] In the formula, σ and δ denote the Sigmoid and ReLU activation functions, respectively; the remaining terms denote global average pooling, the learnable linear transformation matrices, the mapping function, and the small-kernel convolution; and K_S is the kernel size of the small convolution.

[0026] Furthermore, step S2 includes the following sub-steps:

[0027] S2.1 Use the vehicle positioning and attitude matrix T and the sensor intrinsic parameter matrix K to project the prior information of the cloud high-precision map onto the current feature space, generating a reference feature map;

[0028]

[0029] In the formula, the projection term denotes the perspective projection transformation operator;

[0030] S2.2 Calculate the pixel-wise Euclidean distance between the enhanced feature map Y and the reference feature map to obtain the raw geometric residual matrix R_raw;

[0031]

[0032] S2.3 Introduce a semantic weight mask S to perform spatially weighted filtering on the residual matrix, obtaining the effective geometric residual R;

[0033]

[0034] In the formula, the mask is defined by an indicator function over a predefined set of static semantic categories;

[0035] S2.4 Perform spatiotemporal integration on the effective geometric residual R to calculate the change confidence; when the confidence exceeds the adaptive threshold, trigger reporting and generate the pruned, compressed residual feature stream;

[0036]

[0037]

[0038] In the formula, the terms denote, in order, the adaptive threshold, the time decay factor, the time interval since the last synchronization, the gating function, and a learnable dimension-reducing projection matrix.

[0039] Furthermore, step S3 includes the following sub-steps:

[0040] S3.1 Use the decoding operator to restore the received compressed residual feature stream to a residual feature tensor;

[0041] S3.2 Through the cross-domain feature alignment module, introduce a spatiotemporal recalibration operator which, combined with the cloud pose correction parameters, maps the sparse residuals to the global coordinate system to generate a set of synchronized features;

[0042]

[0043] S3.3 Construct an incremental fusion network that takes the synchronized features and the local base map slices as input and uses a multi-head cross-attention mechanism to generate an updated feature stream;

[0044]

[0045] In the formula, the three matrices are the query, key, and value matrices obtained through linear transformation, and the remaining term is the scaling factor;

[0046] S3.4 Use the generative completion operator to convert the updated feature stream into a map incremental update value, and overlay it onto the global base map;

[0047]

[0048]

[0049] In the formula, the terms denote the learnable parameters of the cloud reconstruction network, the global static base map, and the updated global map, respectively.

[0050] Beneficial effects:

[0051] This invention provides a method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning. It improves on existing technologies in three ways: 1) it designs a dynamic weight mapping mechanism driven by multi-scale context, which improves the adaptive representation accuracy of the end-to-end model for the topological features of complex dynamic environments; 2) it designs a task pruning mechanism based on geometric consistency residuals, which adaptively prunes point cloud synchronization tasks by measuring the deviation between observed values and the prior map, thus reducing communication overhead; 3) it constructs a cloud architecture based on incremental end-to-end learning, addressing the key technical challenge of lossless reconstruction of local perception information into a global map. With these improvements, the invention reduces the resource consumption of environment construction while meeting the real-time and bandwidth constraints of IVCPS, and addresses the key technical defects of existing dynamic mapping methods in unstructured collaborative scenarios.

[0052] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.

Description of the Drawings

[0053] Figure 1 is a flowchart of the method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning according to the present invention.

Detailed Description

[0054] To make the technical solutions, advantages, and objectives of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the described embodiments of the present invention without creative effort are within the protection scope of this application.

[0055] As shown in Figure 1, this invention provides a method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning, specifically including the following steps:

[0056] S1. Multi-scale context-driven dynamic weight mapping perception;

[0057] This mechanism constructs a dual-branch parallel architecture. A large-receptive-field branch captures the global topological features of the environment in real time and transforms them into dynamic parameters of the perceptual operator, so that the model's processing adapts closely to the input content. In the feature extraction stage, the system first performs multimodal spatial alignment between the point cloud representation from the vehicle-mounted LiDAR and the visual image features, and then uses a shallow feature extraction module to generate a fused spatial feature map.

[0058]

[0059] Here, P is the original point cloud, I is the visual image, the transformation terms are the feature transformation operators for each modality, [·] denotes feature concatenation, and the shallow weight matrix performs dimensionality reduction and initial semantic aggregation.
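The fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the per-modality feature transformations are taken as already applied, the two feature maps are assumed to be aligned on the same grid, and the names `fuse_modalities`, `lidar_feat`, `image_feat`, and `w_shallow` are hypothetical.

```python
import numpy as np

def fuse_modalities(lidar_feat, image_feat, w_shallow):
    """Concatenate aligned per-pixel features and project them with a shallow weight matrix.

    lidar_feat: (H, W, C_l) point cloud features, already aligned to the image grid
    image_feat: (H, W, C_i) visual image features
    w_shallow:  (C_l + C_i, C_out) shallow weight matrix (dimensionality reduction)
    """
    if lidar_feat.shape[:2] != image_feat.shape[:2]:
        raise ValueError("modalities must be spatially aligned before fusion")
    stacked = np.concatenate([lidar_feat, image_feat], axis=-1)  # feature concatenation [.]
    return stacked @ w_shallow  # per-pixel linear projection, equivalent to a 1x1 convolution

rng = np.random.default_rng(0)
lidar = rng.normal(size=(4, 6, 8))    # assumed lidar feature channels
image = rng.normal(size=(4, 6, 16))   # assumed image feature channels
w = rng.normal(size=(24, 12))         # reduces 24 concatenated channels to 12
x0 = fuse_modalities(lidar, image, w)
print(x0.shape)
```

Applying the weight matrix per pixel keeps the spatial layout intact, so the fused map can feed the large-kernel and small-kernel branches directly.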

[0060] In the core feature processing unit, the system introduces a Large Kernel Perception (LKP) branch to capture long-range spatial dependencies. This branch employs a large-size depthwise separable convolutional structure to expand the receptive field and establish a global semantic field, thereby recognizing the macroscopic topological structure of roads.

[0061]

[0062] In this formula, the operand denotes the input features of the current functional block; PW denotes pointwise convolution, which enables cross-channel information interaction; and DW denotes depthwise convolution with kernel size K_L. Through this hierarchical aggregation of spatial features, the system extracts continuous road skeletons and salient environmental features from discrete sensor observations.
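A large-kernel depthwise separable convolution of the kind described for the LKP branch can be sketched as below. The ordering (depthwise first, then pointwise), the zero padding, the kernel size of 7, and all function and variable names are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Same-padded depthwise convolution (cross-correlation convention).

    x:       (C, H, W) input features
    kernels: (C, K, K) one K x K kernel per channel, K odd
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) for every channel at once.
            out += kernels[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pointwise_conv2d(x, weight):
    """1x1 convolution that mixes channels; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def lkp_branch(x, dw_kernels, pw_weight):
    """Depthwise large-kernel aggregation followed by pointwise channel mixing."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels), pw_weight)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10, 10))
k_large = 7                                             # assumed large kernel size K_L
dw = np.full((3, k_large, k_large), 1.0 / k_large**2)   # box filter as a placeholder kernel
pw = np.eye(3)                                          # identity channel mixing for the demo
f_global = lkp_branch(x, dw, pw)
print(f_global.shape)
```

Splitting the operation this way keeps the parameter count at C·K² + C² rather than C²·K², which is why a large receptive field stays affordable on the vehicle side.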

[0063] To achieve content-adaptive fine-grained feature extraction, the system further constructs a Small Kernel Adaptation (SKA) branch. Rather than using static parameters directly, this branch dynamically generates the weights of the small-kernel convolution from the global topological features extracted by the LKP branch. In essence, macroscopic priors conditionally constrain the perception of microscopic features. The mapping function that produces the dynamic weights typically consists of two linear transformations and an activation function. The generation of the dynamic weights is expressed as:

[0064]

[0065] Here, σ and δ denote the Sigmoid and ReLU activation functions, respectively. The system reconstructs local features in real time using the dynamically generated weights, so that the small-kernel convolution achieves highly accurate feature aggregation. The final enhanced feature map Y is calculated as follows:

[0066]

[0067] This design enables the model to adjust its focusing strategy on detailed features such as lane lines and curbs in real time based on the semantic information of the global road network structure when dealing with highly heterogeneous dynamic scenarios such as complex intersections and forking lanes. Compared with traditional static convolutional models, this method significantly enhances the robustness of map representation in changing environments without increasing the redundancy of static parameters during the inference stage.
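The dynamic weight generation of the SKA branch can be sketched as follows, assuming the mapping is global average pooling, a linear layer with ReLU, and a second linear layer with Sigmoid, and assuming the generated weights form one small kernel per channel. The shapes, the hidden width, and every name here are illustrative assumptions rather than details from the source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generate_dynamic_kernels(f_global, w1, w2, k_small):
    """Map global topological features to per-channel small kernels.

    f_global: (C, H, W) output of the large-kernel branch
    w1:       (hidden, C) first linear transformation
    w2:       (C * k_small * k_small, hidden) second linear transformation
    """
    c = f_global.shape[0]
    pooled = f_global.mean(axis=(1, 2))        # global average pooling -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # ReLU
    return sigmoid(w2 @ hidden).reshape(c, k_small, k_small)

def small_kernel_adapt(x, kernels):
    """Same-padded depthwise convolution of x with the generated kernels."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    y = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            y += kernels[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return y

rng = np.random.default_rng(0)
c, k_small, hidden_dim = 4, 3, 8               # assumed sizes
x = rng.normal(size=(c, 6, 6))                 # local input features
f_global = rng.normal(size=(c, 6, 6))          # stand-in for the LKP output
w1 = rng.normal(size=(hidden_dim, c))
w2 = rng.normal(size=(c * k_small * k_small, hidden_dim))
kernels = generate_dynamic_kernels(f_global, w1, w2, k_small)
y = small_kernel_adapt(x, kernels)
print(kernels.shape, y.shape)
```

Because the kernels are recomputed from the pooled global features for every input, the stored parameters are only `w1` and `w2`; the convolution weights themselves change with the scene.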

[0068] S2. Pruning mechanism triggered by geometric consistency residuals;

[0069] After obtaining an enhanced feature map Y that adapts accurately to complex environments, a differentiated triggering mechanism based on geometric consistency residuals is designed. It achieves efficient on-demand data synchronization by extracting the spatiotemporal deviation between the observed features and the prior map in the cloud. The mechanism takes the enhanced feature map Y generated in the previous stage as its core input and identifies actual changes in the environmental topology by calculating and pruning the geometric differences between that input and the features projected from the cloud.

[0070] The system's first task is to establish a spatial mapping between the current vehicle-mounted sensor coordinate system and the global coordinate system of the cloud high-precision map. Using the vehicle's current positioning and attitude matrix T, the prior information of the local high-precision map distributed from the cloud is projected into the current feature space to generate a reference feature map. The projection process can be represented as:

[0071]

[0072] Here, the projection term denotes the perspective projection transformation operator and K is the sensor intrinsic parameter matrix. This step aligns the cloud prior information with the currently observed feature map Y under the same viewpoint, providing a geometric reference for the subsequent residual calculation. The system then calculates the pixel-wise Euclidean distance between the enhanced feature map Y and the reference feature map to obtain the raw geometric residual matrix R_raw:

[0073]
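The geometric part of the projection step can be illustrated with a standard pinhole model. The sketch assumes that T is the 4x4 homogeneous pose of the sensor in the map frame (so its inverse maps map points into the sensor frame) and that the prior map is available as 3-D points; rendering the projected points into a full reference feature map is not shown. The function and variable names are hypothetical.

```python
import numpy as np

def project_prior_points(points_world, pose_world_from_sensor, intrinsics):
    """Project 3-D prior-map points into pixel coordinates with a pinhole model.

    points_world:           (N, 3) points in the global map frame
    pose_world_from_sensor: (4, 4) homogeneous pose T of the sensor in the map frame
    intrinsics:             (3, 3) intrinsic parameter matrix K
    Returns (pixels, in_front): (N, 2) pixel coordinates and an (N,) boolean mask.
    """
    sensor_from_world = np.linalg.inv(pose_world_from_sensor)
    homog = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (sensor_from_world @ homog.T).T[:, :3]      # points in the sensor frame
    in_front = cam[:, 2] > 1e-6                       # only points ahead of the sensor project
    uvw = (intrinsics @ cam.T).T
    pixels = np.full((len(points_world), 2), np.nan)  # NaN marks points that cannot be projected
    pixels[in_front] = uvw[in_front, :2] / uvw[in_front, 2:3]
    return pixels, in_front

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                       # assumed intrinsics
T = np.eye(4)
T[0, 3] = 1.0                                         # sensor sits 1 m along the map x axis
pts = np.array([[1.0, 0.0, 10.0],
                [2.0, 0.0, 10.0],
                [1.0, 0.0, -5.0]])
pixels, in_front = project_prior_points(pts, T, K)
print(pixels[:2])
```

Points behind the sensor are masked rather than projected, since dividing by a non-positive depth would place them at meaningless pixel locations.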

[0074] To eliminate interference from dynamic traffic participants in the detection of environmental changes, a semantic weight mask S is introduced. It identifies permanent infrastructure areas such as roads and buildings in the image and performs spatially weighted filtering on the residual matrix. The weighted effective geometric residual R is calculated as:

[0075]

[0076] Here, the mask is defined by an indicator function over a predefined set of static semantic categories. Through this dual semantic and geometric constraint, the system eliminates spurious residuals caused by vehicle occlusion and by lighting and shadow drift, ensuring that the captured information reflects substantive changes at the road topology level.
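The residual and masking steps can be sketched as below, assuming the feature maps are stored channel-last, the semantic segmentation is an integer class map, and the mask is a hard 0/1 indicator. The class ids and names are made up for the example.

```python
import numpy as np

def effective_residual(y, f_ref, semantic_labels, static_classes):
    """Pixel-wise Euclidean residual, kept only on static-class pixels.

    y, f_ref:        (H, W, C) enhanced and reference feature maps
    semantic_labels: (H, W) integer class id per pixel
    static_classes:  iterable of class ids treated as permanent infrastructure
    Returns (r_raw, r): raw and masked residual maps, both (H, W).
    """
    r_raw = np.linalg.norm(y - f_ref, axis=-1)               # Euclidean distance per pixel
    mask = np.isin(semantic_labels, list(static_classes))    # indicator over static classes
    return r_raw, mask.astype(float) * r_raw

y = np.zeros((2, 2, 2))
y[0, 0] = [3.0, 4.0]          # change on a static pixel
y[0, 1] = [1.0, 0.0]          # change on a dynamic pixel (for example a passing vehicle)
f_ref = np.zeros((2, 2, 2))
labels = np.array([[0, 2],
                   [1, 0]])   # assumed ids: 0 = road, 1 = building, 2 = vehicle
r_raw, r = effective_residual(y, f_ref, labels, static_classes={0, 1})
print(r)
```

In this example the residual on the vehicle pixel is suppressed while the residual on the road pixel survives, which is the behavior the semantic mask is meant to provide.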

[0077] In the final transmission decision stage, the system performs spatiotemporal integration on the effective residual matrix R to calculate the change confidence of the current observation sequence. To balance real-time updates against bandwidth load, the triggering mechanism employs an adaptive threshold function: the system activates the feature reporting link only when the confidence exceeds the dynamic threshold. This condition is expressed as:

[0078]

[0079] Here, the two parameters are the time decay factor and the time interval since the last synchronization. When the above inequality holds, the system does not upload the original full-scale sensing data; instead, it generates the compressed residual feature stream to be uploaded through a feature difference operation:

[0080]

[0081] In the formula, the remaining terms are the gating function and a learnable dimension-reducing projection matrix. The pruned incremental feature stream contains only the core features of environmental changes, which heavily compresses the raw sensing data. This on-demand reporting mode captures environmental topological features at low bandwidth cost and provides high-quality input for incremental updates of the cloud map.
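The trigger-and-compress logic can be sketched as follows. The source does not give the functional forms, so the sketch assumes them: the change confidence is the mean of the effective residual over a short frame window, the threshold decays exponentially with the time since the last synchronization, the gate is a hard threshold on the latest residual, and the projection is a plain matrix product. All names and constants are hypothetical.

```python
import numpy as np

def change_confidence(residual_frames):
    """Integrate effective residuals over a window of frames and over space (assumed: mean)."""
    return float(np.mean(np.stack(residual_frames)))

def adaptive_threshold(base_threshold, decay, seconds_since_sync):
    """Threshold that relaxes as the last synchronization grows older (assumed form)."""
    return base_threshold * np.exp(-decay * seconds_since_sync)

def maybe_report(y, f_ref, residual_frames, w_proj, base_threshold, decay,
                 seconds_since_sync, gate_level):
    """Return a sparse, projected residual stream, or None when no report is needed."""
    confidence = change_confidence(residual_frames)
    if confidence <= adaptive_threshold(base_threshold, decay, seconds_since_sync):
        return None
    gate = residual_frames[-1] > gate_level          # hard gate on the latest residual map
    rows, cols = np.nonzero(gate)
    diff = (y - f_ref)[rows, cols]                   # (N, C) feature differences at gated pixels
    return {"rows": rows, "cols": cols, "features": diff @ w_proj}

rng = np.random.default_rng(0)
y = np.zeros((4, 4, 8))
y[1, 2] = 2.0                                        # a single changed pixel
f_ref = np.zeros((4, 4, 8))
r = np.linalg.norm(y - f_ref, axis=-1)
w_proj = rng.normal(size=(8, 3))                     # reduces 8 channels to 3

fresh = maybe_report(y, f_ref, [r, r], w_proj, 0.5, 0.1, seconds_since_sync=0.0, gate_level=1.0)
stale = maybe_report(y, f_ref, [r, r], w_proj, 0.5, 0.1, seconds_since_sync=10.0, gate_level=1.0)
print(fresh is None, stale["features"].shape)
```

With these assumed forms, the same small change is ignored right after a synchronization but reported once the map has gone unsynchronized long enough, and only the gated pixel is transmitted.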

[0082] S3. Cloud reconstruction architecture based on incremental end-to-end learning;

[0083] After the cloud receives the compressed feature stream triggered and uploaded by the vehicle, the system enters the cloud reconstruction and feature completion stage based on incremental end-to-end learning. The core task of this stage is to solve the alignment of asymmetric feature fragments across space and time, and to use a generative mechanism to complete the lossless mapping from local residuals to a globally consistent map.

[0084] The cloud first uses a decoding operator to restore the received compressed data into a residual feature tensor. Because of occlusion in vehicle-side perception and the discrete timing of triggers, this tensor is highly sparse and asymmetric in its spatial distribution. To integrate it into the global base map, the system establishes a cross-domain feature alignment module. This module introduces a spatiotemporal recalibration operator which, combined with high-precision pose correction parameters in the cloud, maps the sparse residuals onto a standard grid in the global coordinate system. The alignment process is described as follows:

[0085]

[0086] The result is the set of synchronized features after time synchronization and spatial translation. Within the feature domain, this set is preliminarily semantically aligned with the global base map in the cloud.
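The spatial half of the alignment can be sketched as a rigid transform followed by scattering onto a global grid. The sketch assumes a 2-D bird's-eye-view grid, a 3x3 homogeneous corrected pose, and averaging when several features land in the same cell; time synchronization is omitted. None of these choices or names come from the source.

```python
import numpy as np

def align_to_global_grid(local_xy, features, pose_correction, grid_origin, cell_size, grid_shape):
    """Scatter sparse residual features into a global grid.

    local_xy:        (N, 2) feature positions in the vehicle frame, in metres
    features:        (N, D) residual feature vectors
    pose_correction: (3, 3) homogeneous 2-D transform from vehicle frame to global frame
    grid_origin:     (2,) global coordinates of grid cell (0, 0)
    Returns (grid, occupied): (Gx, Gy, D) averaged features and a (Gx, Gy) boolean mask.
    """
    homog = np.hstack([local_xy, np.ones((len(local_xy), 1))])
    global_xy = (pose_correction @ homog.T).T[:, :2]
    idx = np.floor((global_xy - grid_origin) / cell_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros((*grid_shape, features.shape[1]))
    counts = np.zeros(grid_shape)
    np.add.at(grid, (idx[inside, 0], idx[inside, 1]), features[inside])  # sum per cell
    np.add.at(counts, (idx[inside, 0], idx[inside, 1]), 1)
    occupied = counts > 0
    grid[occupied] /= counts[occupied][:, None]                          # average per cell
    return grid, occupied

pose = np.array([[0.0, -1.0, 10.0],
                 [1.0, 0.0, 5.0],
                 [0.0, 0.0, 1.0]])          # 90 degree rotation plus translation (10, 5)
local_xy = np.array([[1.5, 0.5], [1.2, 0.8], [100.0, 0.0]])
feats = np.array([[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]])
grid, occupied = align_to_global_grid(local_xy, feats, pose, np.zeros(2), 1.0, (20, 20))
print(np.argwhere(occupied), grid[9, 6])
```

`np.add.at` is used instead of fancy-index assignment so that features falling into the same cell accumulate rather than overwrite each other.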

[0087] To address the topological fragmentation caused by incomplete observation of feature fragments, the system constructs an incremental fusion network. This network takes the synchronized features and the local base map slices as joint input and uses a multi-head cross-attention mechanism to extract the correlation between them, generating an enhanced updated feature stream. Its fusion logic can be expressed as:

[0088]

[0089] In this formula, the three matrices are the query, key, and value matrices obtained through linear transformation, and the remaining term is the scaling factor. In this way, the mature topological information in the base map provides geometric constraints for the sparse residual features, helping the system infer occluded or unobserved environmental details.
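A multi-head cross-attention step of the standard scaled dot-product form, softmax(QKᵀ/√d)V, can be sketched as below. The sketch assumes that queries come from the synchronized residual features and that keys and values come from the base map slice, which matches the description of the base map constraining the residuals but is not stated explicitly; the weight shapes and names are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(sync_feats, base_feats, wq, wk, wv, wo, num_heads):
    """Queries from synchronized residual features, keys and values from the base map slice.

    sync_feats: (N, D), base_feats: (M, D), wq/wk/wv/wo: (D, D), D divisible by num_heads
    Returns (output, attention): (N, D) fused features and (heads, N, M) attention weights.
    """
    n, d = sync_feats.shape
    d_head = d // num_heads

    def split(x):  # (L, D) -> (heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split(sync_feats @ wq)
    k = split(base_feats @ wk)
    v = split(base_feats @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # scaled dot products
    attn = softmax(scores, axis=-1)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)   # concatenate the heads
    return merged @ wo, attn

rng = np.random.default_rng(0)
n, m, d, heads = 5, 7, 8, 2                                # assumed sizes
sync = rng.normal(size=(n, d))
base = rng.normal(size=(m, d))
wq, wk, wv, wo = (rng.normal(size=(d, d)) for _ in range(4))
fused, attn = multi_head_cross_attention(sync, base, wq, wk, wv, wo, heads)
print(fused.shape, attn.shape)
```

Each residual feature ends up as a weighted mixture of base map features, which is how the base map's topology can fill in what the sparse observation missed.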

[0090] In the final reconstruction output stage, the system uses a generative completion operator. Based on an end-to-end learning mechanism, this operator transforms the fused feature stream into incremental update values for map elements. To ensure the logical and geometric consistency of the updated map, the reconstruction process is constrained by a topological continuity loss term. The incremental reconstruction process is represented as follows:

[0091]

[0092] Here, the parameter term denotes the learnable parameters of the cloud reconstruction network. Finally, overlaying this increment onto the original static base map realizes the dynamic evolution of the map:

[0093]
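The overlay step, in which the updated map is the base map plus the increment, can be sketched as below. A single linear decoder stands in for the generative completion operator, and each fused feature is assumed to belong to a distinct grid cell; both are simplifications for illustration, and the topological continuity loss used in training is not modeled. The names are hypothetical.

```python
import numpy as np

def apply_incremental_update(base_map, updated_feats, cell_index, decoder_weight):
    """Decode fused features to per-cell increments and add them to the base map.

    base_map:       (Gx, Gy, C_map) global base map layers (float)
    updated_feats:  (N, D) fused feature stream
    cell_index:     (N, 2) integer grid cell of each feature (assumed unique)
    decoder_weight: (D, C_map) linear stand-in for the generative completion operator
    Returns (new_map, delta); base_map itself is left unmodified.
    """
    delta = np.zeros_like(base_map)
    delta[cell_index[:, 0], cell_index[:, 1]] = updated_feats @ decoder_weight
    return base_map + delta, delta

rng = np.random.default_rng(0)
base_map = rng.normal(size=(4, 4, 2))
feats = rng.normal(size=(2, 3))
cells = np.array([[0, 1], [2, 3]])
decoder = rng.normal(size=(3, 2))
base_before = base_map.copy()
new_map, delta = apply_incremental_update(base_map, feats, cells, decoder)
print(np.count_nonzero(delta.any(axis=-1)))
```

Returning a new array rather than writing into `base_map` keeps the static base map available as the prior for the next update cycle.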

[0094] Through end-to-end feature stream reconstruction, this technique avoids the accuracy loss that traditional methods introduce during discretization post-processing. Using the global base map as prior guidance, it addresses the topological completion of asymmetric feature fragments, supporting global consistency and high-frequency accuracy of the vehicle-cloud collaborative map under complex and fragmented perception conditions.

[0095] It is hereby declared that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for constructing multi-scale vehicle-cloud collaborative maps in complex environments based on end-to-end residual pruning, characterized in that it includes the following steps: S1. Context-driven dynamic weight mapping perception: multimodal alignment and multi-scale feature extraction are performed on vehicle sensor data; global topological features are extracted through the large kernel perception branch, and the convolution weights of the small kernel adaptation branch are then dynamically generated from the global topological features to reconstruct local features and generate an enhanced feature map Y; S2. Pruning mechanism triggered by geometric consistency residuals: the enhanced feature map Y is compared with the reference feature map generated by projecting the cloud high-precision map, and their geometric difference is calculated; dynamic interference is filtered out through semantic masking to obtain the effective geometric residual R; and on-demand reporting of the pruned residual feature stream is triggered based on spatiotemporal integration and an adaptive threshold decision; S3. Cloud reconstruction architecture based on incremental end-to-end learning: the compressed residual feature stream is received and decoded; using cross-domain feature alignment and an incremental fusion network, the sparse residuals are mapped to the global coordinate system to generate incremental update values, which are overlaid onto the global base map to achieve dynamic map evolution.

2. The method for constructing a multi-scale vehicle-cloud collaborative map in complex environments based on end-to-end residual pruning as described in claim 1, characterized in that step S1 includes the following sub-steps: S1.1 Perform multimodal spatial alignment on the original point cloud P and the visual image I, and generate a fused spatial feature map using the shallow feature extraction module; in the formula, the transformation terms denote the feature transformation operators for the lidar point cloud and the visual image; [·] denotes feature concatenation; and the shallow weight matrix performs dimensionality reduction and initial semantic aggregation; S1.2 In the core multi-scale feature processing unit, extract long-range spatial dependencies through the large kernel perception branch to generate global topological features; in the formula, PW denotes pointwise convolution; DW denotes depthwise convolution; K_L is the large kernel size; and the operand denotes the local input features; S1.3 Use the global topological features to dynamically generate the weights of the small-kernel convolution, and perform feature aggregation through the small kernel adaptation branch to generate the final enhanced feature map Y; in the formula, σ and δ denote the Sigmoid and ReLU activation functions, respectively; the remaining terms denote global average pooling, the learnable linear transformation matrices, the mapping function, and the small-kernel convolution; and K_S is the kernel size of the small convolution.

3. The method for constructing a multi-scale vehicle-cloud collaborative map in complex environments based on end-to-end residual pruning according to claim 2, characterized in that step S2 includes the following sub-steps: S2.1 Use the vehicle positioning and attitude matrix T and the sensor intrinsic parameter matrix K to project the prior information of the cloud high-precision map onto the current feature space, generating a reference feature map; in the formula, the projection term denotes the perspective projection transformation operator; S2.2 Calculate the pixel-wise Euclidean distance between the enhanced feature map Y and the reference feature map to obtain the raw geometric residual matrix R_raw; S2.3 Introduce a semantic weight mask S to perform spatially weighted filtering on the residual matrix, obtaining the effective geometric residual R; in the formula, the mask is defined by an indicator function over a predefined set of static semantic categories; S2.4 Perform spatiotemporal integration on the effective geometric residual R to calculate the change confidence; when the confidence exceeds the adaptive threshold, trigger reporting and generate the pruned, compressed residual feature stream; in the formula, the terms denote, in order, the adaptive threshold, the time decay factor, the time interval since the last synchronization, the gating function, and a learnable dimension-reducing projection matrix.

4. The method for constructing a multi-scale vehicle-cloud collaborative map in complex environments based on end-to-end residual pruning according to claim 3, characterized in that step S3 includes the following sub-steps: S3.1 Use the decoding operator to restore the received compressed residual feature stream to a residual feature tensor; S3.2 Through the cross-domain feature alignment module, introduce a spatiotemporal recalibration operator which, combined with the cloud pose correction parameters, maps the sparse residuals to the global coordinate system to generate a set of synchronized features; S3.3 Construct an incremental fusion network that takes the synchronized features and the local base map slices as input and uses a multi-head cross-attention mechanism to generate an updated feature stream; in the formula, the three matrices are the query, key, and value matrices obtained through linear transformation, and the remaining term is the scaling factor; S3.4 Use the generative completion operator to convert the updated feature stream into a map incremental update value, and overlay it onto the global base map; in the formula, the terms denote the learnable parameters of the cloud reconstruction network, the global static base map, and the updated global map, respectively.