Parking visualization method and device based on Gaussian occupancy field, equipment and medium

By using a Gaussian occupancy field-based method, the problems of image distortion and lack of depth information in the visualization of parking for autonomous vehicles were solved, achieving real-time, high-fidelity, complete and interactive 3D display, which improved the user's environmental perception and operational capabilities in parking scenarios.

CN122244330A · Pending · Publication Date: 2026-06-19 · MOTOVIS TECH SHANGHAI CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
MOTOVIS TECH SHANGHAI CO LTD
Filing Date
2026-05-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing autonomous vehicle parking visualization technologies suffer from problems such as image distortion, loss of detail, lack of depth information, lack of realistic texture on model surfaces, high rendering burden, and unsuitability for dynamic scenes, making it difficult to achieve real-time, high-fidelity, complete, and interactive 3D display.

Method used

A Gaussian occupancy field-based approach is adopted: multi-view image features are extracted and fused across views into a bird's-eye view feature map; a Gaussian existence probability map and an initial height map are predicted; attributed 3D Gaussian ellipsoids are constructed; an interactive 3D visualization image is generated by Gaussian splatting rendering; and parking assistance information is overlaid.

Benefits of technology

It achieves real-time, high-fidelity, complete and interactive 3D display of the vehicle's surrounding environment, enhancing users' environmental perception and operational confidence in complex parking scenarios, and meeting the low latency and smooth display requirements of the in-vehicle system.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a parking visualization method, device, equipment, and medium based on Gaussian occupancy fields. It uses a Gaussian occupancy field to represent the scene surrounding the vehicle in a unified way and generates an interactive 3D visualization image on the vehicle's display screen through Gaussian splatting rendering. Semantic fusion display is then applied to the interactive 3D visualization image, and parking assistance information is overlaid on it. This improves the realism, spatial perception, completeness, and information richness of the parking scene display, and increases the user's freedom of observation and accuracy of judgment in narrow and complex parking scenarios. It helps users make parking decisions quickly and meets the low-latency, smooth-display requirements of in-vehicle parking scenarios.

Description

Technical Field

[0001] This invention belongs to the field of parking visualization technology, and particularly relates to a parking visualization method, device, equipment and medium based on a Gaussian occupancy field.

Background Technology

[0002] With the development of parking assistance technology for autonomous vehicles, the visualization interface of the vehicle's surrounding environment has become an important component affecting the user's parking experience and spatial perception capabilities. Existing parking visualization solutions mainly include traditional 2D bird's-eye view interfaces, rendering interfaces based on 3D models, and 3D reconstruction visualization solutions based on NeRF or traditional 3DGS, but all of them have varying degrees of shortcomings.

[0003] Traditional 2D bird's-eye view interfaces typically rely on inverse perspective mapping (IPM) to stitch together multi-view images, which assumes a flat ground and fixed camera extrinsic parameters. When a vehicle is on a slope, on a bumpy road, or undergoing pitch or roll changes, these assumptions no longer hold, leading to distortions such as shearing and stretching in the projected image. Furthermore, because image pixels are stretched at a distance, details of distant obstacles (such as curbs and manhole covers) become blurred, resulting in information loss. In addition, 2D bird's-eye view interfaces lack vertical information, making it difficult to intuitively convey height restrictions, low-hanging obstacles, or terrain undulations (such as beams in a 2-meter-high parking garage, low-hanging tree branches, or steep slopes). Users must rely primarily on experience, which can easily lead to misjudgments. Moreover, the interaction methods of existing 2D bird's-eye view interfaces are usually limited to panning and zooming; they do not support 3D interaction and struggle to meet the need to judge spatial relationships in complex parking scenarios. The fundamental reason is that such solutions are essentially image stitching based on a planar assumption: they use only two-dimensional image pixel information and lack an effective representation of the scene's three-dimensional geometry.

[0004] While 3D model-based rendering interfaces can provide a certain degree of stereoscopic display effect, they typically use parametric geometric primitives such as cuboids and cylinders to approximate the target. This results in a lack of realistic texture, material, and lighting details on the model surface, leading to significant differences from the real environment and limited immersion and credibility. Furthermore, this type of solution generally relies heavily on sparse 3D bounding boxes output by the upstream perception module, which is insufficient for representing background areas (such as green belts, irregular walls, and ground markings) or small objects (such as cones and stones), easily causing incomplete scenes and information gaps. In addition, when there are many dynamic targets, the position, pose, and visibility of a large number of models need to be updated frequently, which can place a significant rendering burden on low-computing-power vehicle platforms. The fundamental reason is that the visualization interface is decoupled from the perception results, lacking the ability to construct a high-quality, complete 3D scene from images in real time.

[0005] While existing visualization solutions based on NeRF or traditional 3DGS offer advantages in scene reconstruction quality, they are generally better suited to high-quality offline reconstruction than to online real-time interactive display in parking scenarios. NeRF typically requires a large number of multi-view images and lengthy optimization training, making it difficult to adapt to dynamically changing scenes, and it also suffers from high rendering latency. Traditional 3DGS, although faster to render, often relies on structure from motion (SfM) to generate a sparse point cloud for initialization, a process prone to failure in low-texture areas or dynamic scenes. Furthermore, the overall optimization process remains highly iterative, making it difficult to meet the requirements of instantaneous startup and real-time updates for in-vehicle systems. The fundamental reason is that these solutions are primarily designed for high-fidelity reconstruction tasks, not for feedforward, low-latency online perception and interactive display.

[0006] Therefore, there is an urgent need for a new panoramic visualization method for parking autonomous vehicles, which can realize real-time, high-fidelity, complete and interactive 3D display of the vehicle's surrounding environment based on a unified scene representation, and support the intuitive overlay of parking assistance information, thereby improving the user's environmental perception and operational confidence in complex parking scenarios.

Summary of the Invention

[0007] Based on this, and to address the aforementioned technical problems, a parking visualization method, apparatus, device, and medium based on Gaussian occupancy fields are provided.

[0008] The technical solution adopted in this invention is as follows. As a first aspect of the present invention, a parking visualization method based on Gaussian occupancy fields is provided, characterized in that it includes:
acquiring multi-view images of the scene surrounding the vehicle;
performing feature extraction on the multi-view images to obtain multi-scale image features corresponding to each view, and performing cross-view fusion based on the multi-scale image features corresponding to each view to generate a bird's-eye view feature map;
based on the bird's-eye view feature map, predicting a Gaussian existence probability map and an initial height map through a feedforward decoding network, determining multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and the initial height map, and then, for each candidate Gaussian center anchor point, predicting the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid, the attribute parameters including shape parameters, occupancy intensity parameters, appearance attribute parameters, and semantic attribute parameters;
based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters, generating multiple attributed three-dimensional Gaussian ellipsoids, each of which has a position attribute determined by the candidate Gaussian center anchor point and center position offset, as well as spatial morphology, occupancy intensity, appearance, and semantic attributes characterized by the attribute parameters;
constructing a Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids;
obtaining the current virtual camera parameters, and performing Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters to generate an interactive 3D visualization image of the scene around the vehicle on the vehicle's display screen;
performing semantic fusion display on the interactive 3D visualization image based on the semantic attribute parameters, and superimposing parking assistance information on the interactive 3D visualization image.

[0009] As a second aspect of the present invention, a parking visualization device based on a Gaussian occupancy field is provided, characterized in that it comprises:
a first module, used to acquire multi-view images of the scene surrounding the vehicle;
a second module, used to extract features from the multi-view images, obtain multi-scale image features corresponding to each view, and perform cross-view fusion based on the multi-scale image features corresponding to each view to generate a bird's-eye view feature map;
a third module, used to predict the Gaussian existence probability map and the initial height map based on the bird's-eye view feature map through a feedforward decoding network, to determine multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and the initial height map, and then, for each candidate Gaussian center anchor point, to predict the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid, the attribute parameters including shape parameters, occupancy intensity parameters, appearance attribute parameters, and semantic attribute parameters;
a fourth module, used to generate multiple attributed three-dimensional Gaussian ellipsoids based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters, each three-dimensional Gaussian ellipsoid having a position attribute determined by the candidate Gaussian center anchor point and center position offset, as well as spatial morphology, occupancy intensity, appearance, and semantic attributes characterized by the attribute parameters;
a fifth module, used to construct a Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids;
a sixth module, used to obtain the current virtual camera parameters and perform Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters, so as to generate an interactive three-dimensional visualization image of the scene around the vehicle on the vehicle's display screen;
a seventh module, used to perform semantic fusion display on the interactive 3D visualization image based on the semantic attribute parameters, and to overlay parking assistance information on the interactive 3D visualization image.

[0010] As a third aspect of the present invention, an electronic device is provided, characterized in that it includes a storage module, the storage module including instructions loaded and executed by a processor, the instructions, when executed, causing the processor to perform the parking visualization method based on Gaussian occupancy field described in the first aspect above.

[0011] As a fourth aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing one or more programs, characterized in that, when the one or more programs are executed by a processor, they implement the parking visualization method based on Gaussian occupancy fields described in the first aspect above.

[0012] The beneficial effects of this invention are as follows: 1. This invention uses Gaussian occupancy fields to uniformly represent the scene around the vehicle and generates interactive three-dimensional visualization images on the vehicle's display screen. This effectively overcomes the problems of image distortion, loss of detail, and lack of depth information caused by the failure of the planar assumption in traditional 2D bird's-eye view interfaces, thereby improving the realism and spatial perception of parking scene display.

[0013] 2. This invention uses attributed 3D Gaussian ellipsoids to construct the Gaussian occupancy field and combines them with Gaussian splatting rendering to achieve high-fidelity display of the complete scene around the vehicle. It can not only express various scene contents such as obstacles, background areas and road surfaces, but also integrate appearance information and semantic information to improve the realism, completeness and information richness of the display results.

[0014] 3. This invention supports interactive three-dimensional visualization display, which, compared with fixed-viewpoint display, can enhance the user's freedom of observation and accuracy of judgment in narrow and complex parking scenarios.

[0015] 4. This invention can overlay parking assistance information onto a three-dimensional visualization image, thereby improving the intuitiveness and understandability of the assistance information presentation and making it easier for users to make parking decisions quickly.

[0016] 5. This invention adopts a feedforward Gaussian occupancy field generation method combined with real-time Gaussian splatting rendering, which can better meet the requirements of low latency and smooth display in vehicle parking scenarios, thereby improving the system's real-time response capability and user experience.

Attached Figure Description

[0017] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. Figure 1 is a flowchart of a Gaussian occupancy field perception method for autonomous driving scenarios provided by an embodiment of the present invention. Figure 2 is a schematic diagram of a Gaussian occupancy field perception device for autonomous driving scenarios provided by an embodiment of the present invention.

[0018] Figure 3 is a schematic diagram of an electronic device provided by an embodiment of the present invention.

Detailed Implementation

[0019] The embodiments of the present invention will be described below with reference to the accompanying drawings. It should be noted that the embodiments described in this specification are not exhaustive and do not represent the only embodiments of the present invention. The corresponding embodiments below are only for clearly illustrating the inventive content of this patent and are not intended to limit its implementation. For those skilled in the art, different variations and modifications can be made based on the embodiments described. Any variations or modifications that fall within the technical concept and inventive content of this invention and are obvious are also within the protection scope of this invention.

[0020] As shown in Figure 1, this application provides a parking visualization method based on a Gaussian occupancy field, the specific process of which is as follows: S101. Acquire multi-view images of the scene surrounding the vehicle.

[0021] The multi-view images can be acquired by multiple cameras positioned around the vehicle, such as surround-view fisheye cameras, wide-angle cameras, or other vehicle-mounted imaging devices. Preferably, the multiple cameras cover areas in front of, behind, to the left, to the right, and diagonally in front of and behind the vehicle to achieve a more complete perception of the surrounding scene.

[0022] In this embodiment, the multi-view images are surround-view fisheye images.

[0023] After acquiring multi-view images, preprocessing operations can be performed on them. Preprocessing operations may include, but are not limited to, image distortion correction, image cropping, image scaling, image normalization, color space transformation, and time synchronization alignment.

[0024] S102. Extract features from the multi-view images to obtain multi-scale image features corresponding to each viewpoint, and perform cross-view fusion based on the multi-scale image features corresponding to each viewpoint to generate a bird's-eye view feature map. The specific process is as follows: S1021. An image feature extraction network with shared weights is used to encode images from different viewpoints to obtain corresponding multi-scale image features.

[0025] In this embodiment, feature extraction can be performed by a weight-shared image feature extraction network (backbone network). The image feature extraction network can be a vision Transformer network (such as the Swin Transformer) or a ResNet network. The extracted multi-scale image features are denoted as $F^{l}$, where $l$ indicates the feature level; there are typically five levels, at 1/64, 1/32, 1/16, 1/8, and 1/4 of the input resolution.
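As a small illustration of the five feature levels, the sketch below computes the spatial size of each level's feature map for a given input image. The function name `pyramid_shapes`, the 512 x 768 input size, and the use of ceiling rounding are assumptions made for this example only; the source does not state them.

```python
import math

def pyramid_shapes(height, width, strides=(4, 8, 16, 32, 64)):
    """Spatial size (rows, cols) of the feature map at each downsampling stride.

    A stride of 4 corresponds to the 1/4 level, 64 to the 1/64 level.
    Ceiling rounding is an assumption; real backbones may pad or crop differently.
    """
    return {s: (math.ceil(height / s), math.ceil(width / s)) for s in strides}

# Hypothetical 512 x 768 fisheye image after preprocessing.
shapes = pyramid_shapes(512, 768)
for stride, shape in shapes.items():
    print(f"1/{stride}: {shape}")
```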

[0026] S1022. Set the query vector on the preset bird's-eye view plane.

[0027] S1023. By using a cross-view feature fusion method based on an attention mechanism, each query vector adaptively aggregates image features from images at different viewpoints and scales to generate a bird's-eye view feature map.

[0028] Specifically, a Transformer-based BEV encoder (such as BEVFormer) can be used. A set of learnable query vectors $Q$ is initialized on a preset BEV plane (in the vehicle coordinate system). Through a deformable cross-attention mechanism, each BEV query vector adaptively aggregates image features from different viewpoints and scales. Specifically, for the query vector $q$ at BEV plane position $(x, y)$, the attention is calculated as follows:

$$\mathrm{DeformAttn}(q, p, F) = \sum_{m=1}^{M} W_m \left[ \sum_{s=1}^{S} A_{mqs} \, \phi\left(F,\; p + \Delta p_{mqs}\right) \right]$$

where $M$ is the number of attention heads and $m$ is the attention head index. $W_m$ is the output projection matrix of the $m$-th attention head; it linearly transforms the aggregated result within the $m$-th head (the part within square brackets) to the same dimension as the final output, after which the results of all heads are summed (or concatenated and linearly transformed) to obtain the final output feature. $S$ is the number of sampling points and $s$ is the sampling point index. $A_{mqs}$ is the attention weight. $p$ is the reference point, i.e., the spatial coordinates associated with the query vector $q$; in BEV generation, $p$ typically represents the normalized coordinates $(x, y)$ of the query vector on the BEV plane (sometimes also including the height $z$), and it determines the initial sampling position on the image feature map. $\Delta p_{mqs}$ is the predicted sampling offset. $\phi$ is the bilinear sampling function. The final output is a dense BEV feature map $F_{bev}$.
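To make the aggregation concrete, the following is a minimal sketch of deformable attention for a single query over a single feature map, written in plain Python. It is a simplification under stated assumptions: the reference point is given directly in pixel coordinates of one feature map (the projection from the BEV plane into each camera view is omitted), there is no per-head value projection, and the names `bilinear_sample`, `deformable_attention`, and the `heads` dictionary layout are hypothetical.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample an H x W x C feature map at continuous pixel coords (x, y); zero outside."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * c
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                for ch in range(c):
                    out[ch] += wx * wy * feat[yi][xi][ch]
    return out

def deformable_attention(feat, ref, heads):
    """Aggregate features around reference point `ref` for one query.

    heads: list of dicts with 'offsets' [(dx, dy)], 'weights' [a_s]
           and 'W' (C_out x C output projection), one dict per attention head.
    """
    c_out = len(heads[0]["W"])
    out = [0.0] * c_out
    for head in heads:
        # Inner sum: attention-weighted bilinear samples at ref + offset.
        agg = [0.0] * len(feat[0][0])
        for (dx, dy), a in zip(head["offsets"], head["weights"]):
            sample = bilinear_sample(feat, ref[0] + dx, ref[1] + dy)
            agg = [g + a * v for g, v in zip(agg, sample)]
        # Outer sum: project this head's result and accumulate over heads.
        for i, row in enumerate(head["W"]):
            out[i] += sum(wv * g for wv, g in zip(row, agg))
    return out

# Toy 2 x 2 single-channel feature map and one head with two sampling points.
feat = [[[0.0], [1.0]], [[2.0], [3.0]]]
heads = [{"offsets": [(0.0, 0.0), (0.5, 0.0)], "weights": [0.5, 0.5], "W": [[2.0]]}]
result = deformable_attention(feat, (0.5, 0.5), heads)
print(result)
```

In a trained encoder the offsets and attention weights would be predicted from the query vector; here they are fixed by hand so the arithmetic can be followed.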

[0029] Through the above processing, the two-dimensional visual information in multi-view images can be mapped and fused into a unified bird's-eye view spatial representation, providing basic features for subsequent candidate Gaussian center anchor point determination and Gaussian parameter prediction.

[0030] S103. Based on the bird's-eye view feature map, a Gaussian existence probability map and an initial height map are predicted through a feedforward decoding network. Multiple candidate Gaussian center anchor points are determined based on the Gaussian existence probability map and the initial height map. Then, the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid are predicted for each candidate Gaussian center anchor point.

[0031] In this embodiment, the feedforward decoding network is a lightweight fully convolutional decoder network that takes the BEV feature map as input. Specifically, this fully convolutional decoder network is based on a convolutional U-Net + ResNet architecture.

[0032] The specific process for determining multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and the initial height map is as follows: S1031. The positions in the Gaussian existence probability map that are greater than a preset threshold are determined as candidate plane anchor points.

[0033] S1032. Obtain the height value of each candidate plane anchor point position in the initial height map, and use it as the initial height of the corresponding candidate plane anchor point position.

[0034] S1033. Generate candidate Gaussian center anchor points based on the candidate plane anchor point positions and their corresponding initial heights.

[0035] In this process, the feedforward decoding network first predicts a dense Gaussian existence probability map $P$ and an initial height map $H$. The Gaussian existence probability map characterizes the probability of generating a 3D Gaussian ellipsoid center anchor point at each location on the bird's-eye view plane, and the initial height map characterizes the initial height of the candidate Gaussian center anchor point at the corresponding location. Then, for any point $(x, y)$ on the bird's-eye view plane, if its Gaussian existence probability $P(x, y)$ is greater than the threshold $\tau$, this location is taken as a candidate Gaussian center anchor point $(x, y, H(x, y))$.

[0036] The threshold $\tau$ takes values in the range 0 to 1; an empirical value is 0.5.
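Steps S1031 to S1033 can be sketched as follows. This is an illustrative example rather than the applicant's implementation: the grid layout, the `cell_size` and `origin_xy` parameters, and the choice of cell centers as anchor coordinates are assumptions; only the thresholding rule and the default threshold of 0.5 come from the text.

```python
def select_anchors(prob_map, height_map, cell_size, origin_xy, tau=0.5):
    """Return candidate Gaussian center anchors (x, y, z) in vehicle coordinates.

    prob_map / height_map: 2D lists indexed [row][col] over the BEV plane.
    cell_size, origin_xy: hypothetical BEV grid geometry (metres).
    """
    anchors = []
    for row, (prob_row, height_row) in enumerate(zip(prob_map, height_map)):
        for col, (prob, height) in enumerate(zip(prob_row, height_row)):
            if prob > tau:  # S1031: keep cells whose probability exceeds the threshold
                x = origin_xy[0] + (col + 0.5) * cell_size
                y = origin_xy[1] + (row + 0.5) * cell_size
                anchors.append((x, y, height))  # S1032 + S1033: lift with initial height
    return anchors

prob_map = [[0.1, 0.9], [0.6, 0.5]]
height_map = [[0.0, 1.2], [0.3, 0.8]]
anchors = select_anchors(prob_map, height_map, cell_size=0.5, origin_xy=(-0.5, -0.5))
print(anchors)
```

Note that the cell with probability exactly 0.5 is not selected, because the text requires the probability to be greater than the threshold.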

[0037] For each anchor point, the different branches of the decoder predict the following parameters in parallel: the center position offset $\Delta\mu$.

[0038] Attribute parameters:
Shape parameters, including a scaling parameter and a rotation parameter. The scaling parameter $s$ represents the half-lengths of the three-dimensional Gaussian ellipsoid along its three axes; the rotation parameter is the rotation quaternion $q$ (which needs to be normalized) and represents the orientation of the three-dimensional Gaussian ellipsoid in three-dimensional space.
Occupancy intensity parameter (opacity) $\alpha$, which characterizes the strength of the contribution of the corresponding three-dimensional Gaussian ellipsoid to the spatial occupancy and is activated by the Sigmoid function.
Semantic attribute parameters, which characterize the semantic category probability of the corresponding three-dimensional Gaussian ellipsoid and are denoted as the semantic category probability vector $c$ (activated via Softmax).
Dynamic attribute parameters, which characterize the motion state of the corresponding three-dimensional Gaussian ellipsoid and are denoted as the instantaneous velocity vector $v$. The prediction can also be extended to an acceleration vector and to displacement residuals at each future time step.
Appearance attribute parameters: a set of low-order spherical harmonic (SH) coefficients is predicted for view-dependent color rendering. These are mainly used for high-quality visualization and can be omitted in pure perception tasks to save computation.
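The activations named above (Sigmoid for opacity, Softmax for semantic probabilities, quaternion normalization) can be sketched as a small decoding step applied to raw network outputs. The dictionary keys and the `decode_attributes` helper are hypothetical, and the exponential used to keep the scale positive is an assumption that the source does not specify.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_quaternion(q):
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def decode_attributes(raw):
    """Map raw per-anchor decoder outputs to valid Gaussian attributes."""
    return {
        "offset": tuple(raw["offset"]),
        # Assumption: exp keeps the half-lengths positive; not stated in the source.
        "scale": tuple(math.exp(v) for v in raw["log_scale"]),
        "rotation": normalize_quaternion(raw["quat"]),
        "opacity": sigmoid(raw["opacity_logit"]),
        "semantics": softmax(raw["class_logits"]),
        "velocity": tuple(raw["velocity"]),
    }

raw = {
    "offset": (0.1, -0.2, 0.05),
    "log_scale": (0.0, 0.0, 0.0),
    "quat": (2.0, 0.0, 0.0, 0.0),
    "opacity_logit": 0.0,
    "class_logits": (2.0, 1.0, 0.0),
    "velocity": (0.5, 0.0, 0.0),
}
attrs = decode_attributes(raw)
print(attrs["opacity"], attrs["rotation"], attrs["semantics"])
```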

[0039] S104. Based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters, generate multiple attributed 3D Gaussian ellipsoids. The specific process is as follows: S1041. Determine the center position of each three-dimensional Gaussian ellipsoid from the candidate Gaussian center anchor point $\hat{\mu}_i$ and the center position offset $\Delta\mu_i$: $\mu_i = \hat{\mu}_i + \Delta\mu_i$.

[0040] S1042. Construct the covariance matrix corresponding to each three-dimensional Gaussian ellipsoid from the scaling and rotation parameters: $\Sigma_i = R_i S_i S_i^{\top} R_i^{\top}$, where $R_i$ is the rotation matrix converted from the rotation quaternion $q_i$ and $S_i = \mathrm{diag}(s_i)$ is the diagonal scaling matrix formed from the scaling parameter $s_i$. The covariance matrix of each three-dimensional Gaussian ellipsoid characterizes the spatial morphological properties (scale, orientation, and shape) of the corresponding ellipsoid in three-dimensional space. That is, it controls the shape (slender or flat), orientation, and spatial extent of the Gaussian ellipsoid. This attribute directly affects the shape and size of the elliptical "footprint" formed when the Gaussian is projected onto the image.
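Steps S1041 and S1042 can be sketched in a few lines. The example assumes the center is the anchor plus the offset and uses the standard 3D Gaussian splatting construction of the covariance, R S Sᵀ Rᵀ with S = diag(s); the quaternion order (w, x, y, z) and the function names are assumptions for illustration.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion (w, x, y, z); the input is normalized first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, s):
    """Sigma = R S S^T R^T with S = diag(s): Sigma[i][j] = sum_k R[i][k] * s_k^2 * R[j][k]."""
    R = quat_to_rotmat(q)
    return [[sum(R[i][k] * s[k] ** 2 * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def center(anchor, offset):
    """Center position = candidate anchor + predicted offset."""
    return tuple(a + d for a, d in zip(anchor, offset))

mu = center((1.0, 2.0, 0.5), (0.1, -0.2, 0.05))
# Ellipsoid with half-lengths (2, 1, 0.5), rotated 90 degrees about the z axis.
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
sigma = covariance(q_z90, (2.0, 1.0, 0.5))
print(mu)
print(sigma)
```

After the 90 degree rotation the long axis of the ellipsoid points along y, so the largest variance appears in the (y, y) entry of the covariance matrix.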

[0041] The generated attributed 3D Gaussian ellipsoids are used as the basic representation units for subsequent construction of Gaussian occupancy fields.

[0042] Therefore, this application designs an end-to-end feedforward network that takes multi-view surround-view images as input and, for the first time, jointly predicts in a feedforward manner the geometry (position), spatial form (covariance), occupancy intensity (opacity), semantic, dynamic (velocity), and appearance (SH) attributes of the Gaussian ellipsoids, thereby achieving holistic modeling of autonomous driving scenarios. These Gaussian ellipsoids can not only be used for high-quality rendering but also directly serve as the basic units of occupancy, forming a unified model of the geometry, semantics, and motion of the scene.

[0043] S105. Construct a Gaussian occupancy field based on multiple attributed 3D Gaussian ellipsoids. The scene is represented as a set of $N$ attributed three-dimensional Gaussian ellipsoids $\{G_i\}_{i=1}^{N}$. For any point $p$ in three-dimensional space, its occupancy density $O(p)$ is the sum, over all attributed 3D Gaussian ellipsoids, of the product of each ellipsoid's probability density at that point and its occupancy intensity parameter:

[0044] $$O(p) = \sum_{i=1}^{N} \alpha_i \, G_i(p)$$

[0045] $$G_i(p) = \exp\left(-\tfrac{1}{2} (p - \mu_i)^{\top} \Sigma_i^{-1} (p - \mu_i)\right)$$

[0046] where $G_i(p)$ represents the probability density of the $i$-th attributed 3D Gaussian ellipsoid at point $p$, $\alpha_i$ is its occupancy intensity parameter, and $\mu_i$ and $\Sigma_i$ are its center position and covariance matrix. The larger the value of $O(p)$, the higher the probability that the point is occupied. This defines a continuous, differentiable 3D occupancy field. Compared with the discrete binary occupancy representation in traditional voxel occupancy networks, the Gaussian occupancy field constructed in this application can define the degree of occupancy at any position in continuous space, thereby providing a more refined spatial expression capability.
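A point query against the occupancy field can be sketched as below. The example assumes that each ellipsoid contributes its occupancy intensity multiplied by an unnormalized Gaussian kernel, and it evaluates the Mahalanobis distance in the ellipsoid's principal-axis frame so that no matrix inverse is needed. The dictionary layout with keys `mu`, `R`, `s`, and `alpha` is hypothetical.

```python
import math

def kernel(p, mu, R, s):
    """Unnormalized Gaussian kernel exp(-0.5 * d^T Sigma^-1 d), Sigma = R diag(s^2) R^T."""
    d = [p[k] - mu[k] for k in range(3)]
    # Express d in the ellipsoid's principal-axis frame: local = R^T d.
    local = [sum(R[i][k] * d[i] for i in range(3)) for k in range(3)]
    maha = sum((local[k] / s[k]) ** 2 for k in range(3))
    return math.exp(-0.5 * maha)

def occupancy(p, gaussians):
    """Occupancy density at p: sum of alpha_i * G_i(p) over all ellipsoids."""
    return sum(g["alpha"] * kernel(p, g["mu"], g["R"], g["s"]) for g in gaussians)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gaussians = [{"mu": (0.0, 0.0, 0.0), "R": identity, "s": (2.0, 1.0, 0.5), "alpha": 0.8}]

at_center = occupancy((0.0, 0.0, 0.0), gaussians)
one_sigma_x = occupancy((2.0, 0.0, 0.0), gaussians)
far_away = occupancy((20.0, 0.0, 0.0), gaussians)
print(at_center, one_sigma_x, far_away)
```

Because the query is a closed-form expression of the point coordinates, it can be evaluated at any continuous position rather than only at voxel centers.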

[0047] This application defines a mathematically rigorous continuous space occupancy density function. It replaces the traditional binary voxel occupancy, providing sub-voxel level geometric accuracy and soft uncertainty measurement, making perception output richer and decision-making more robust.

[0048] This application is the first to deeply integrate feedforward 3D Gaussian splatting technology with the occupancy network paradigm, creating a novel continuous, explicit, and differentiable 3D scene occupancy representation method, which serves as a bridge connecting efficient neural rendering and dense perception for autonomous driving.

[0049] This application innovatively assigns velocity attributes to each Gaussian ellipsoid, enabling the Gaussian occupancy field to intrinsically represent and predict the motion of objects in the scene, forming a continuous "velocity field." This provides more direct and richer environmental dynamics information for trajectory prediction and spatiotemporal planning. This application uses the real-time generated Gaussian occupancy field—a unified scene representation—as the sole real data source for the autonomous parking visualization interface, achieving a fundamental paradigm shift from "2D image stitching display" to "real-time 3D scene reconstruction and rendering."

[0050] Based on semantic attribute parameters and dynamic attribute parameters, it is possible to further realize semantic information query and dynamic information query of any point in three-dimensional space.

[0051] For any point p in three-dimensional space, its semantic label can be determined by the semantic category of the three-dimensional Gaussian ellipsoid that contributes the most to the occupancy density of that point: $\mathrm{label}(p) = \arg\max_{k} c_{i^*,k}$ with $i^* = \arg\max_{i} \alpha_i G_i(p)$, where $\alpha_i G_i(p)$ is the contribution of the $i$-th ellipsoid to the occupancy density at p and $c_{i,k}$ is its probability for semantic category $k$.

[0052] For example, suppose there are three classes of Gaussian ellipsoids near point p, with the following contributions to the occupancy density at point p:
vehicle class: 0.82
road surface class: 0.37
pedestrian class: 0.15
Then label(p) = vehicle.
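The label query in this example amounts to taking the class of the ellipsoid with the largest contribution. A minimal sketch, using the three contribution values from the example above and a hypothetical `semantic_label` helper:

```python
def semantic_label(contributions):
    """Return the class of the ellipsoid contributing most to the occupancy at p.

    contributions: list of (class_name, contribution) pairs for ellipsoids near p.
    """
    best_class, _ = max(contributions, key=lambda item: item[1])
    return best_class

label = semantic_label([("vehicle", 0.82), ("road surface", 0.37), ("pedestrian", 0.15)])
print(label)  # vehicle
```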

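[0053] The dynamic velocity of point p can be obtained by weighted averaging of the dynamic attribute parameters of the attributed 3D Gaussian ellipsoids in its neighborhood: $v(p) = \dfrac{\sum_{i \in \mathcal{N}(p)} w_i \, v_i}{\sum_{i \in \mathcal{N}(p)} w_i}$

[0054] where the weight $w_i = \alpha_i G_i(p)$ is the contribution of the $i$-th ellipsoid to the occupancy density at point p, $v_i$ is its instantaneous velocity vector, and $\mathcal{N}(p)$ is the set of attributed 3D Gaussian ellipsoids in the neighborhood of p.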

[0055] An attributed 3D Gaussian ellipsoid within the neighborhood of point p refers to an attributed 3D Gaussian ellipsoid that satisfies a preset occupancy density condition. Specifically, the coordinates of point p can be substituted into the occupancy density calculation formula in S105 above, and attributed 3D Gaussian ellipsoids whose calculation results are greater than a preset threshold are determined as attributed 3D Gaussian ellipsoids within the neighborhood of point p. For example, the preset threshold can be set to 0.5.
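The velocity query can be sketched as a thresholded weighted average. The example assumes that the weight of each ellipsoid is its contribution to the occupancy density at p, that the neighborhood is the set of ellipsoids whose contribution exceeds the preset threshold (0.5 here, as in the text), and that a point with no neighbors is treated as static; the last point in particular is an assumption not stated in the source.

```python
def velocity_at(weights_and_velocities, threshold=0.5):
    """Weighted average velocity at p over ellipsoids in its neighborhood.

    weights_and_velocities: list of (weight, (vx, vy, vz)) pairs, where the weight
    is the ellipsoid's contribution to the occupancy density at p.
    """
    selected = [(w, v) for w, v in weights_and_velocities if w > threshold]
    total = sum(w for w, _ in selected)
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # assumption: no neighbors means the point is static
    return tuple(sum(w * v[k] for w, v in selected) / total for k in range(3))

samples = [
    (0.8, (1.0, 0.0, 0.0)),
    (0.6, (0.0, 2.0, 0.0)),
    (0.1, (9.0, 9.0, 9.0)),  # below the threshold, so it is ignored
]
v_p = velocity_at(samples)
print(v_p)
```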

[0056] In this embodiment, in order to be compatible with the voxel occupancy interface commonly used in existing autonomous driving planning and control modules, a real-time occupancy grid can also be generated from the Gaussian occupancy field. The specific process is as follows:
1. Define the target grid: set the resolution and the spatial range of the output voxel grid.
2. Using the explicit nature of the Gaussian occupancy field, quickly determine the spatial influence range (a tight bounding box) of each attributed 3D Gaussian ellipsoid from its center position and covariance matrix. Specifically, for the $i$-th attributed 3D Gaussian ellipsoid, its center position is $\mu_i$ and its covariance matrix is $\Sigma_i$, corresponding to the Gaussian distribution $N(\mu_i, \Sigma_i)$. From these, the center point and orientation of the bounding box corresponding to the ellipsoid are determined, and the 3σ criterion is applied: three times the standard deviation along each principal axis direction is taken as the length of the corresponding ellipsoid semi-axis. This yields an oriented bounding box that encloses the influence range of the three-dimensional Gaussian ellipsoid, and the coordinates of its eight vertices are then calculated.
In some implementations, the oriented bounding box can also be converted into an axis-aligned bounding box (AABB) aligned with the global coordinate axes. The following are typical situations where this conversion is applicable: a) Downstream planning module requirements: If the path planner only accepts obstacles represented by polygons or convex polyhedra (such as algorithms based on rapidly-exploring random trees, RRT), then the voxel occupancy needs to be converted to AABBs.

[0057] If algorithms that require fast collision detection are used, such as Hybrid A* or state lattice planners, AABBs can significantly accelerate collision detection.

[0058] b) Resource-constrained environments: when computing resources are limited (for example, on low-end embedded platforms), directly querying the Gaussian occupancy field or the voxel grid may be too expensive. Converting to a small number of AABBs reduces the real-time computing load.

[0059] When inter-module communication bandwidth is limited, replacing the dense voxel stream with AABBs reduces the load on the bus.

[0060] c) Long-range or coarse-grained planning: in long-range perception (for example, obstacles 100 meters away), fine detail is not important, and the AABB approximation is sufficient to determine the passable area.

[0061] In global path planning (such as map-level navigation), AABB is sufficient to represent static obstacle distribution.

[0062] d) Simulation and playback: When building a simulation environment, the perception results (Gaussian occupancy field) can be exported as an AABB set, which can quickly build a simplified virtual world for algorithm testing.

[0063] When recording and replaying data, AABB is easier to store and parse than the original Gaussian parameters or voxel mesh.

[0064] e) Safety redundancy check: in the safety monitoring module, the most conservative AABB representation is used as a backup that runs in parallel with the main perception output (the Gaussian occupancy field), so that basic obstacle avoidance capability is maintained even if the perception output loses detail.
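The conversion from an oriented bounding box to an AABB described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function names `obb_vertices` and `aabb_from_vertices` are hypothetical, and the principal axes and standard deviations are assumed to be already available (for example, from an eigen-decomposition of the covariance matrix).

```python
from itertools import product

def obb_vertices(center, axes, sigmas, k=3.0):
    """Eight vertices of the oriented box enclosing a k-sigma ellipsoid.

    center: (x, y, z) of the Gaussian.
    axes:   three unit vectors giving the principal-axis directions.
    sigmas: standard deviations along those axes.
    k:      number of standard deviations (3 for the 3-sigma criterion).
    """
    verts = []
    for signs in product((-1.0, 1.0), repeat=3):
        v = [float(c) for c in center]
        for sign, axis, sigma in zip(signs, axes, sigmas):
            for d in range(3):
                v[d] += sign * k * sigma * axis[d]
        verts.append(tuple(v))
    return verts

def aabb_from_vertices(verts):
    """Axis-aligned box (lo, hi) that encloses all given vertices."""
    lo = tuple(min(v[d] for v in verts) for d in range(3))
    hi = tuple(max(v[d] for v in verts) for d in range(3))
    return lo, hi
```

The resulting AABB is conservative: it always contains the oriented box, so it never under-reports the space an obstacle may occupy.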

[0065] Based on the bounding box described above, target voxels affected by the corresponding three-dimensional Gaussian ellipsoid can be quickly screened out, and the next step of processing is only performed on target voxels that fall within the spatial influence range, while voxels located outside the bounding box are not processed. This avoids traversing the entire three-dimensional voxel space, significantly reducing computational complexity and improving computational efficiency.

[0066] 3. Calculate the average occupancy density of multiple sampling points within each target voxel. If the average occupancy density exceeds a preset threshold, the corresponding target voxel is determined to be occupied. Specifically, taking a target voxel $v$ as an example, $K$ points $p_k$ are sampled uniformly within it and the average occupancy density is calculated: $\bar{O}(v) = \frac{1}{K} \sum_{k=1}^{K} O(p_k)$. If $\bar{O}(v)$ exceeds the preset threshold, the voxel is determined to be occupied.

[0067] With the on-demand rasterization method described above, the computational load of generating the discrete grid is reduced, and the Gaussian occupancy field constructed in this invention remains compatible with existing voxel-based planning and autonomous driving control algorithm stacks. This process can be mapped to GPU parallel computing, where each Gaussian contributes to the voxels within its influence range through "splatting", similar to the rasterization process in 3DGS rendering, which makes it highly efficient.
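The three steps above (define the grid, restrict work to each Gaussian's 3σ box, average sampled densities per voxel) can be sketched on the CPU as follows. Several points are assumptions made only for illustration: the occupancy density formula of S105 is not reproduced in this section, so a clipped sum of opacity-weighted Gaussian kernels stands in for it; covariances are taken to be axis-aligned (diagonal) to keep the example short; and sampling uses a regular n×n×n sub-grid rather than random points.

```python
import math
from itertools import product

def density(p, gaussians):
    """Assumed stand-in for the occupancy density O(p): clipped kernel sum."""
    total = 0.0
    for g in gaussians:
        m2 = sum(((p[d] - g["mu"][d]) / g["sigma"][d]) ** 2 for d in range(3))
        total += g["alpha"] * math.exp(-0.5 * m2)
    return min(1.0, total)

def voxelize(gaussians, origin, res, dims, thresh=0.5, n=2):
    """Return the set of occupied voxel indices (i, j, k)."""
    # Step 2: collect only voxels inside some Gaussian's 3-sigma box.
    candidates = set()
    for g in gaussians:
        ranges = []
        for d in range(3):
            lo = math.floor((g["mu"][d] - 3 * g["sigma"][d] - origin[d]) / res)
            hi = math.floor((g["mu"][d] + 3 * g["sigma"][d] - origin[d]) / res)
            ranges.append(range(max(lo, 0), min(hi, dims[d] - 1) + 1))
        candidates.update(product(*ranges))
    # Step 3: average the density over sample points inside each candidate.
    occupied = set()
    offsets = [(a + 0.5) / n for a in range(n)]
    for idx in candidates:
        samples = [
            tuple(origin[d] + (idx[d] + off[d]) * res for d in range(3))
            for off in product(offsets, repeat=3)
        ]
        mean = sum(density(p, gaussians) for p in samples) / len(samples)
        if mean > thresh:
            occupied.add(idx)
    return occupied
```

Voxels outside every 3σ box are never visited, which is the source of the savings described above; on a GPU each Gaussian (or each candidate voxel) would map to one thread.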

[0068] In this embodiment, a multi-task joint learning approach can be used to train the aforementioned feedforward decoding network and the related feature extraction and fusion networks. The total loss function combines four terms, described in turn below: a geometric loss, a semantic loss, a dynamic loss, and a regularization loss.

[0069] Geometric loss : Center point supervision: Utilizing lidar point clouds as supervision. Variants of chamfer distance or earth mover's distance are employed to encourage the predicted Gaussian center point set to match the true point cloud distribution.

[0070] Depth supervision: a depth map of the Gaussian occupancy field is rendered from an arbitrary virtual viewpoint (that of a virtual camera positioned above the ground within the vehicle's range, similar to the viewpoint of an in-vehicle camera) and compared with a sparse or semi-dense ground-truth depth map generated from LiDAR point clouds or obtained by structure from motion (SfM). An L1 or BerHu loss is calculated: $L_{depth} = \frac{1}{|\Omega|} \sum_{u \in \Omega} \rho(\hat{D}(u) - D(u))$, where $L_{depth}$ is the depth loss value, which measures the difference between the predicted depth map and the ground-truth depth map and forms part of the total loss function. Ω is the set of valid pixels: the ground-truth depth map may lack valid values at some pixels (for example, holes that appear after LiDAR point cloud projection), so only pixels with ground-truth depth values are considered. |Ω| is the number of valid pixels, i.e. the cardinality of Ω, used to average the loss. u is the index of a single pixel and ranges over Ω. $\hat{D}(u)$ is the predicted depth value at pixel u, obtained by rendering the model output. $D(u)$ is the ground-truth depth value at pixel u, usually derived from LiDAR point cloud projection or generated by methods such as SfM. ρ(·) is the per-pixel loss function, which can be the L1 loss or the BerHu loss. Minimizing this loss optimizes the Gaussian parameters generated by the model, thereby improving the accuracy of geometric reconstruction.
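The masked depth loss can be sketched as follows. This is a plain-Python illustration rather than training code: depth maps are flat lists, invalid ground-truth pixels are marked with `None`, and the BerHu threshold is set to 0.2 times the largest absolute residual, which is a common choice but an assumption here, since the text does not specify it.

```python
def berhu(x, c):
    """Reverse Huber penalty: L1 near zero, quadratic beyond the threshold c."""
    ax = abs(x)
    return ax if ax <= c else (x * x + c * c) / (2.0 * c)

def depth_loss(pred, gt, kind="l1"):
    """Mean penalty over valid pixels only (gt entry is not None)."""
    residuals = [p - g for p, g in zip(pred, gt) if g is not None]
    if not residuals:
        return 0.0  # no supervision available in this view
    if kind == "l1":
        return sum(abs(r) for r in residuals) / len(residuals)
    c = 0.2 * max(abs(r) for r in residuals)  # assumed threshold rule
    if c == 0.0:
        return 0.0
    return sum(berhu(r, c) for r in residuals) / len(residuals)
```

For example, with predictions `[1.0, 2.0, 5.0]` and ground truth `[1.5, None, 4.0]`, the second pixel is skipped and the L1 loss is the mean of 0.5 and 1.0.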

[0071] Semantic loss: Projective supervision: each Gaussian ellipsoid is assigned a semantic probability in addition to its geometric properties (center position, covariance matrix, and occupancy intensity parameter, i.e. opacity). A 2D semantic segmentation map is rendered by differentiable splatting, and the cross-entropy loss is calculated against manually labeled 2D semantic labels.

[0072] Voxel supervision (optional): If there are 3D voxel semantic labels, the loss can be calculated after rasterizing the Gaussian occupancy field into a semantic voxel map.

[0073] Dynamic loss: Scene flow consistency: for two consecutive frames $t$ and $t+1$, the Gaussian occupancy fields are predicted separately. For each Gaussian ellipsoid in frame $t$, its position in frame $t+1$ is estimated from its predicted velocity, and the distance between this estimated position and the nearest Gaussian center in frame $t+1$ is used as a motion consistency constraint.
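The scene flow consistency term can be sketched as follows, assuming a constant-velocity prediction over the frame interval `dt` and a brute-force nearest-neighbor search. The function name and the averaging over Gaussians are illustrative choices, not details stated in this application.

```python
import math

def flow_consistency_loss(centers_t, velocities_t, centers_t1, dt):
    """Mean distance from each advected center to its nearest center in frame t+1."""
    total = 0.0
    for mu, v in zip(centers_t, velocities_t):
        # Advect the Gaussian center by its predicted velocity.
        predicted = tuple(m + vi * dt for m, vi in zip(mu, v))
        # Distance to the closest Gaussian center in the next frame.
        total += min(math.dist(predicted, c) for c in centers_t1)
    return total / len(centers_t)
```

In practice the nearest-neighbor search would use a spatial index or a GPU kernel; the quadratic loop here is only for clarity.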

[0074] Trajectory supervision: If there is a labeled 3D object trajectory, the smoothed L1 loss can be calculated by comparing the Gaussian sequence velocity belonging to the same instance with the labeled trajectory.

[0075] Regularization loss: this includes a sparsity constraint on the scaling parameters (encouraging moderate sizes), entropy regularization of the opacity (encouraging a concentrated distribution), and a repulsion loss that prevents excessive overlap between Gaussians.

[0076] S106. Obtain the current virtual camera parameters, and perform Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters to generate an interactive 3D visualization image of the scene around the vehicle on the vehicle's display screen.

[0077] In this embodiment, Gaussian splatting rendering is performed on the Gaussian occupancy field using a rendering engine. The inputs to the rendering engine are the set of attributed 3D Gaussian ellipsoids of the Gaussian occupancy field and the current virtual camera parameters (the view matrix $V$ and the projection matrix $P$, defined with respect to the world coordinate system). The specific process is as follows: S1061. Transform the center of each attributed 3D Gaussian ellipsoid of the Gaussian occupancy field from the world coordinate system to the camera clip space, $\mu_i^{clip} = P V \mu_i$ (in homogeneous coordinates), and apply a similarity transformation to the covariance matrix of each attributed 3D Gaussian ellipsoid to bring it into the camera's view space.

[0078] Here, $\mu_i^{clip}$ is the center position of the i-th attributed 3D Gaussian ellipsoid in the camera clip space, and $\mu_i$ is its center position in the world coordinate system.

[0079] In the 3D Gaussian Splatting rendering pipeline, when a Gaussian is transformed from the world coordinate system to the camera view space, the covariance matrix follows a different transformation rule from points. The core principle is as follows: for a three-dimensional Gaussian distribution, the center point (mean) follows an ordinary rigid body transformation (rotation plus translation) under the view transformation: $\mu_{cam} = R \mu_{world} + t$

[0080] Here, $R$ and $t$ are the rotation and translation components of the view matrix ($R$ is a 3×3 rotation matrix and $t$ is a 3×1 translation vector), $\mu_{world}$ is the center of the attributed 3D Gaussian ellipsoid in the world coordinate system, and $\mu_{cam}$ is its center in the camera view space.

[0081] The covariance matrix, however, describes the shape and orientation of the Gaussian distribution and is a second-order central moment. Under a linear transformation $A$, the covariance matrix transforms as: $\Sigma' = A \Sigma A^T$

[0082] This property follows from the mathematical definition of covariance: if a random variable $X$ satisfies $X \sim N(\mu, \Sigma)$, then after the linear transformation $Y = AX + b$ we have $Y \sim N(A\mu + b, A \Sigma A^T)$.

[0083] Applied to this scenario, when a Gaussian is transformed from the world coordinate system to the camera view space, only the rotation component needs to be considered, because translation does not affect the covariance: the linear part of the view transformation is simply the rotation matrix $R$. The covariance matrix in the camera view space is therefore: $\Sigma_{cam} = R \Sigma_{world} R^T$

[0084] Here, $\Sigma_{world}$ is the covariance matrix of the attributed 3D Gaussian ellipsoid in the world coordinate system.

[0085] Because covariance describes the distribution relative to the mean, translation does not change the shape or orientation of the distribution. Applying the rotation matrix $R$ to $\Sigma_{world}$ reorients the Gaussian ellipsoid so that it is expressed in the axes of the view space, which is why only the rotation part of the view matrix is used.

[0086] Once $\Sigma_{cam}$ is obtained, it must be projected onto the 2D image plane (screen space) for rasterization. Because perspective projection is nonlinear, this step typically uses a local affine approximation based on the Jacobian matrix: $\Sigma_{2D} = J \Sigma_{cam} J^T$

[0087] Here, $J$ is the Jacobian matrix of the projection, describing the local linear transformation from camera view space to screen space (NDC). For the pinhole camera model with focal length $f$, evaluated at a Gaussian center $(x, y, z)$ in camera view space with depth $z$: $J = \begin{pmatrix} f/z & 0 & -f x / z^2 \\ 0 & f/z & -f y / z^2 \end{pmatrix}$

[0088] The resulting $\Sigma_{2D}$ is a 2×2 covariance matrix that determines the shape and size of the ellipse that the Gaussian covers on the screen, for use in the subsequent tile-based rasterization and alpha blending.

[0089] Based on the above, the similarity transformation in this embodiment refers to the congruence transformation of the covariance matrix by the rotation matrix: $\Sigma_{cam} = R \Sigma_{world} R^T$

[0090] Because $R$ is an orthogonal matrix ($R^T = R^{-1}$), this congruence is also a similarity transformation, known mathematically as an orthogonal similarity transformation. It preserves the eigenvalues of the covariance matrix (whose square roots determine the semi-axis lengths of the ellipsoid) and changes only its orientation, ensuring the correct orientation of the geometry in space.
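The two covariance steps described above, rotation into view space followed by projection with the pinhole Jacobian, can be sketched with small pure-Python matrix helpers. The helper names are hypothetical, a single focal length `f` is assumed for both image axes, and the Jacobian is the standard one for the projection (f·x/z, f·y/z).

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def cov_to_view(cov_world, rot):
    """View-space covariance: R * Sigma_world * R^T."""
    return matmul(matmul(rot, cov_world), transpose(rot))

def project_cov(cov_cam, mu_cam, f):
    """Screen-space covariance: J * Sigma_cam * J^T, evaluated at mu_cam."""
    x, y, z = mu_cam
    jac = [[f / z, 0.0, -f * x / (z * z)],
           [0.0, f / z, -f * y / (z * z)]]
    return matmul(matmul(jac, cov_cam), transpose(jac))
```

As a check, rotating a covariance of diag(4, 1, 1) by 90 degrees about the Z axis swaps the first two diagonal entries while leaving the eigenvalues unchanged, which matches the eigenvalue-preservation property stated above.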

[0091] S1062. Divide the screen of the display into multiple rendering blocks (Tile-based rasterization, such as each Tile being 16x16 pixels). For each rendering block, cull attributed 3D Gaussian ellipsoids that do not contribute to it, and generate a corresponding list of Gaussian ellipsoids to be rendered for each Tile (sorted in ascending order of the depth of the Gaussian ellipsoid center coordinates in the camera coordinate system, i.e., sorted from near to far).

[0092] The core principle for determining whether a three-dimensional Gaussian ellipsoid contributes to a given tile is as follows. Each Gaussian ellipsoid (defined by its center $\mu$ and covariance matrix $\Sigma$) is continuous in 3D space. Given the camera view matrix and projection matrix, its contribution region in screen space is a 2D ellipse (defined by the projected 2D covariance matrix $\Sigma_{2D}$). Determining whether the Gaussian contributes to a tile is therefore equivalent to determining whether this 2D ellipse intersects the rectangular region of the tile. The specific process is as follows: Step 1: Camera-space coordinate transformation and projection. 1. Transform the Gaussian center to camera space, where $R$ and $t$ are the rotation and translation components of the view matrix: $\mu_{cam} = R \mu + t$

[0093] 2. Transform the Gaussian covariance matrix to camera space: $\Sigma_{cam} = R \Sigma R^T$

[0094] 3. Calculate the 2D projected center $(x_0, y_0)$ in screen space, obtained by projecting $\mu_{cam}$, and the 2D covariance matrix, where $J$ is the Jacobian matrix of the projection evaluated at $\mu_{cam}$: $\Sigma_{2D} = J \Sigma_{cam} J^T$

[0095] Step 2: Screen-space boundary calculation. From the projected 2D Gaussian distribution (an ellipse), its axis-aligned bounding box (AABB) in screen space can be calculated.

[0096] For a 2D Gaussian ellipse with center $(x_0, y_0)$ and covariance matrix $\Sigma_{2D}$, the directions and lengths of the major and minor axes are determined by the eigenvectors and eigenvalues of $\Sigma_{2D}$, so its coverage area can be obtained by eigenvalue decomposition. The half-length of the ellipse in the x-direction is a multiple of the standard deviation whose exact value depends on the direction of the eigenvectors. A more practical approach is to express the boundary distance of the ellipse in any direction θ in terms of the eigenvalues $\lambda_1$, $\lambda_2$ and the principal-axis direction.

[0097] Ultimately, the axis-aligned bounding box of the Gaussian ellipse on the screen can be taken as: $[x_0 - r, x_0 + r] \times [y_0 - r, y_0 + r]$

[0098] Here, $r$ is the maximum radius of the ellipse over all directions (usually taken as $3\sqrt{\lambda_{max}}$, the 3σ extent along the major axis, which covers about 99.7% of the energy along that axis).

[0099] Step 3: Tile intersection determination. Let the screen-space region of the current tile be: $[x_{min}, x_{max}] \times [y_{min}, y_{max}]$

[0100] A Gaussian is considered to contribute to the tile if all of the following conditions are met: $x_0 + r \ge x_{min}$, $x_0 - r \le x_{max}$, $y_0 + r \ge y_{min}$, $y_0 - r \le y_{max}$

[0101] Conversely, if any of the above conditions is not met, the bounding box of the Gaussian lies completely outside the tile and the Gaussian can be culled directly.
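Steps 2 and 3 can be sketched as follows. The radius is taken as k times the square root of the larger eigenvalue of the 2×2 screen-space covariance, computed in closed form, and the tile test is the usual interval-overlap check on both axes. The function names are hypothetical, and k = 3 reflects the 3σ convention used above.

```python
import math

def ellipse_radius(cov2d, k=3.0):
    """k-sigma radius along the major axis of a 2x2 symmetric covariance."""
    a, b, d = cov2d[0][0], cov2d[0][1], cov2d[1][1]
    mid = 0.5 * (a + d)              # half the trace
    det = a * d - b * b
    disc = math.sqrt(max(0.0, mid * mid - det))
    lambda_max = mid + disc          # larger eigenvalue
    return k * math.sqrt(lambda_max)

def overlaps_tile(center, radius, tile_rect):
    """True if the square [center +/- radius] intersects the tile rectangle."""
    x0, y0 = center
    xmin, ymin, xmax, ymax = tile_rect
    return (x0 + radius >= xmin and x0 - radius <= xmax and
            y0 + radius >= ymin and y0 - radius <= ymax)
```

Because the square of side 2r encloses the ellipse, this test can only produce false positives (a Gaussian kept for a tile it barely misses), never false negatives, so culling remains visually safe.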

[0102] In practical engineering implementations, the following optimizations are usually introduced for maximum efficiency: 1. Depth pre-culling: before the intersection test, the camera-space depth $z_c$ of each Gaussian is compared against an approximate depth buffer of the scene, and Gaussians that are completely occluded by closer objects are removed. This can be done quickly with a soft depth test (for example, by comparing against the depth buffer of the previous frame).

[0103] 2. Block pre-sorting to optimize computational efficiency: Each Gaussian ellipse may spatially affect multiple tiles. The usual approach is as follows: First, calculate the range of tile indices affected by each Gaussian ellipse (determined by its bounding box). Then, write these Gaussian ellipse IDs and their corresponding tile indices into an intermediate buffer. Finally, sort the Gaussians in the buffer within each tile (from near to far in depth) so that subsequent alpha blending can be performed correctly.

[0104] 3. Screen space LOD, eliminating extremely small Gaussians that do not contribute to the visual effect: For Gaussians that are far away (projected ellipse area less than 1 pixel), rendering can be skipped directly, or their color can be merged into the background to reduce the amount of computation.

[0105] 4. Hardware acceleration: tile partitioning and culling are highly parallel and map well onto GPU compute shaders (for example, with each invocation identified by gl_GlobalInvocationID, which is available only in compute shaders). Each thread processes one tile and performs the above intersection test on all candidate Gaussians for that tile.
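The binning, sub-pixel culling, and per-tile depth sorting described in items 2 and 3 above can be sketched sequentially as follows. The splat records (with keys `center`, `radius`, `depth`) are a hypothetical layout chosen for illustration, and the projected area is approximated by that of a circle of the given radius; a GPU implementation would instead emit (tile, depth) keys and sort them in parallel.

```python
import math
from collections import defaultdict

def bin_gaussians(splats, width, height, tile=16):
    """Map each tile index (tx, ty) to Gaussian ids sorted from near to far."""
    nx = math.ceil(width / tile)
    ny = math.ceil(height / tile)
    bins = defaultdict(list)
    for gid, s in enumerate(splats):
        x, y = s["center"]
        r = s["radius"]
        if math.pi * r * r < 1.0:
            continue  # screen-space LOD: footprint smaller than one pixel
        # Range of tile indices covered by the splat's bounding box.
        tx0 = max(0, math.floor((x - r) / tile))
        tx1 = min(nx - 1, math.floor((x + r) / tile))
        ty0 = max(0, math.floor((y - r) / tile))
        ty1 = min(ny - 1, math.floor((y + r) / tile))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(gid)
    # Sort each tile's list by camera-space depth, nearest first.
    return {t: sorted(ids, key=lambda g: splats[g]["depth"])
            for t, ids in bins.items()}
```

Each tile then only iterates over its own short list during alpha blending, which is what reduces the per-tile workload.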

[0106] Through the multi-level culling mechanism described above (see Table 1), Tile-based rasterization can reduce the number of Gaussians that need to be processed for each tile from tens of thousands to dozens, thereby optimizing the rendering speed of 3DGS from hundreds of milliseconds in traditional methods to sub-millisecond levels, meeting the needs of real-time in-vehicle interaction.

[0107] Table 1

[0108]

[0109] S1063. For each rendering block, based on its list of Gaussian ellipsoids to be rendered, calculate the color and depth of each pixel in the rendering block through alpha blending.

[0110] For a screen pixel $u$, its color $C(u)$ and depth $D(u)$ are obtained by alpha blending the Gaussians from near to far in depth (the depth of the Gaussian ellipsoid center in the camera coordinate system):

[0111] $C(u) = \sum_{i \in \mathcal{N}} c_i(d_i)\, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$, $\quad D(u) = \sum_{i \in \mathcal{N}} z_i\, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$

[0112] Here, $\mathcal{N}$ is the depth-sorted list of Gaussian ellipsoids to be rendered for the screen pixel. $c_i(d_i)$ is the view-dependent color calculated from the spherical harmonic coefficients of the i-th Gaussian and the view direction $d_i$. The view direction is a unit vector in 3D space pointing from the viewpoint (the virtual camera position) to the observed point (the Gaussian center), $d_i = (\mu_i - p_{cam}) / \lVert \mu_i - p_{cam} \rVert$; in rendering, it reflects the angle from which the observer looks at the object. $p_{cam}$ is the position of the virtual camera in the world coordinate system. $\alpha_i$ is the 2D opacity of the i-th Gaussian ellipsoid in the list after projection onto the 2D screen space. $z_i$ is the depth of the center of that Gaussian ellipsoid in camera space (obtained by the coordinate system transformation in S1061). $\alpha_j$ is the 2D opacity of the j-th Gaussian ellipsoid in the list after projection into 2D screen space.
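Front-to-back alpha blending for a single pixel can be sketched as follows. Each entry of the depth-sorted list is assumed to carry an already evaluated color, a projected 2D opacity, and a camera-space depth. The early exit once the accumulated transmittance becomes negligible is a common optimization and an assumption here, not a step stated in the text.

```python
def blend_pixel(sorted_splats, t_min=1e-4):
    """Blend (color, alpha, depth) tuples that are ordered near to far.

    Returns the pixel color as an RGB tuple and the blended depth.
    """
    color = [0.0, 0.0, 0.0]
    depth = 0.0
    transmittance = 1.0  # product of (1 - alpha_j) over closer splats
    for rgb, alpha, z in sorted_splats:
        weight = alpha * transmittance
        color = [acc + weight * ch for acc, ch in zip(color, rgb)]
        depth += weight * z
        transmittance *= 1.0 - alpha
        if transmittance < t_min:
            break  # everything farther away is effectively hidden
    return tuple(color), depth
```

For two splats with opacity 0.5, the nearer one receives weight 0.5 and the farther one weight 0.25, which illustrates how closer Gaussians dominate the result.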

[0113] The view-dependent color is calculated as follows. The dependence on the view direction is encoded using spherical harmonic functions. Each Gaussian stores a set of spherical harmonic coefficients $k_l^m$ (typically order 1 to 3, with 3 to 16 coefficients in total). The view-dependent color is obtained by evaluating the spherical harmonic basis functions at the view direction $d$ and linearly combining them with the coefficients: $c(d) = \sum_{l} \sum_{m=-l}^{l} k_l^m Y_l^m(d)$

[0114] Here, $Y_l^m$ is a spherical harmonic basis function. Simply put, the color is the weighted sum (dot product) of the spherical harmonic coefficients with the basis functions evaluated at the view direction. This allows the surface of an object to present different visual effects, such as highlights and reflections, from different angles, greatly enhancing the realism of the rendering.
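A minimal sketch of the view-dependent color evaluation, restricted to degrees 0 and 1 (four basis functions per color channel), is given below. The ordering and signs of the degree-1 basis functions vary between implementations, so the particular convention used here is an assumption, as are the function names.

```python
import math

C0 = 0.5 * math.sqrt(1.0 / math.pi)      # constant degree-0 basis function
C1 = math.sqrt(3.0 / (4.0 * math.pi))    # scale of the degree-1 basis functions

def view_direction(mu, p_cam):
    """Unit vector from the camera position to the Gaussian center."""
    d = [m - p for m, p in zip(mu, p_cam)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def sh_color(coeffs, d):
    """coeffs: four RGB triples (degree 0, then degree 1); d: unit direction."""
    x, y, z = d
    # Assumed convention: basis ordered as [1, y, z, x], all with positive sign.
    basis = (C0, C1 * y, C1 * z, C1 * x)
    return tuple(sum(b * k[ch] for b, k in zip(basis, coeffs))
                 for ch in range(3))
```

With only the degree-0 coefficient set, the color is the same from every direction; the degree-1 coefficients add a smooth variation with the viewing angle.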

[0115] In this embodiment, interactive viewpoint control is implemented for the 3D visualization image, making the image interactive: 1. Establish a virtual camera state model $(p_{cam}, q_{cam}, fov)$ comprising the camera position, camera orientation, and field of view, together with a viewpoint library containing multiple preset viewpoints.

[0116] Here, $p_{cam}$ is the camera's position in world coordinates, $q_{cam}$ is the quaternion representing the camera's orientation, and $fov$ is the camera's vertical field of view.

[0117] Each preset viewpoint corresponds to a target camera state and a gaze point (the virtual camera's gaze point, i.e. the target point directly in front of the camera lens). For example: Classic overhead view: the camera is oriented toward the -Z axis (vertically downward).

[0118] Third-person follow view: the camera looks toward the area in front of the vehicle.

[0119] Close-up view of key obstacles: the position of the obstacle nearest to the vehicle is calculated and used as the gaze point, and the camera position is set back appropriately along the extension of the line connecting the obstacle to the vehicle. This "appropriate setback" distance is not a fixed value; it is usually calculated dynamically from the size of the obstacle, the camera's field of view (FOV), and the desired viewing effect. The goal is to ensure that the target is fully and clearly displayed in the center of the screen. Generally, the following principles are followed: Complete display: the setback distance must ensure that the bounding box of the target object lies completely within the camera's view frustum.

[0120] Standard calculation: if the maximum lateral dimension of the obstacle is $w$ and the camera's horizontal field of view is $\theta_h$, the theoretical minimum distance is $d_{min} = \frac{w}{2\tan(\theta_h / 2)}$.
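The minimum setback distance follows from the geometry of the view frustum: at distance d, the visible width is 2·d·tan(θ/2), so an object of lateral size w just fits when d = w / (2·tan(θ/2)). A short sketch follows, in which the optional `margin` factor is a hypothetical addition that leaves some empty border around the target.

```python
import math

def min_setback(width, hfov_deg, margin=1.0):
    """Smallest camera distance at which an object of the given lateral
    width fits inside the horizontal field of view (in degrees)."""
    half_angle = math.radians(hfov_deg) / 2.0
    return margin * width / (2.0 * math.tan(half_angle))
```

For instance, a 1.8 m wide car viewed with a 60 degree horizontal field of view needs a distance of about 1.56 m before any margin is applied, which is consistent with the practical range given in the following paragraph once a margin is added.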

[0121] Practical experience suggests that in parking scenarios, the back-off distance is typically set to 2.0 to 5.0 meters for common obstacles such as ordinary cars or traffic cones, and is dynamically adjusted in conjunction with collision detection to avoid camera clipping.

[0122] 2. Map the user's free-view input (such as touchscreen swipe gestures and knob rotations) to incremental changes in the virtual camera state model. For example, with spherical-coordinate control centered on the vehicle center, the user can directly control the azimuth angle, the pitch angle, and the radius with gestures (the radius is the distance from the virtual camera to the gaze point, usually the vehicle center, and directly controls zooming the view in and out).
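The spherical-coordinate control can be sketched as an orbit camera. The clamping limits below and the Z-up axis convention are assumptions chosen for illustration (the Z-up choice is consistent with the overhead view looking along -Z), and the function names are hypothetical.

```python
import math

PITCH_MIN, PITCH_MAX = math.radians(5.0), math.radians(89.0)  # assumed limits
RADIUS_MIN, RADIUS_MAX = 2.0, 20.0                             # assumed, meters

def apply_gesture(state, d_azimuth, d_pitch, zoom_factor):
    """Return a new (azimuth, pitch, radius) after an incremental input."""
    azimuth, pitch, radius = state
    azimuth = (azimuth + d_azimuth) % (2.0 * math.pi)
    pitch = min(max(pitch + d_pitch, PITCH_MIN), PITCH_MAX)
    radius = min(max(radius * zoom_factor, RADIUS_MIN), RADIUS_MAX)
    return azimuth, pitch, radius

def orbit_position(target, azimuth, pitch, radius):
    """Camera position on a sphere around the gaze point (Z axis up)."""
    tx, ty, tz = target
    horizontal = radius * math.cos(pitch)
    return (tx + horizontal * math.cos(azimuth),
            ty + horizontal * math.sin(azimuth),
            tz + radius * math.sin(pitch))
```

A swipe would change the azimuth and pitch, a pinch would change the zoom factor, and the camera orientation would then be set so that the camera looks at the gaze point.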

[0123] 3. To achieve a smooth viewpoint transition when switching preset viewpoints or when the user stops freely manipulating the viewpoint, the camera position is interpolated using an easing function (such as a cubic Bézier curve), and the camera orientation is interpolated using spherical linear interpolation: $q(t) = \frac{\sin((1 - t)\theta)}{\sin\theta} q_{start} + \frac{\sin(t\theta)}{\sin\theta} q_{end}$, where θ is the angle between the camera's initial orientation quaternion $q_{start}$ and the camera's target orientation quaternion $q_{end}$. The transition time is usually set to 300-500 ms to ensure a natural and smooth transition. $q(t)$ is the camera orientation quaternion at interpolation parameter $t \in [0, 1]$: when $t = 0$, $q(0) = q_{start}$; when $t = 1$, $q(1) = q_{end}$. This formula produces a smooth rotational transition from the starting orientation to the target orientation.
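The orientation transition can be sketched as follows, with quaternions stored as (w, x, y, z) tuples. Two details go beyond the text and are assumptions: the shorter arc is chosen by flipping the sign of the target when the dot product is negative, and a normalized linear interpolation is used when the two orientations are nearly identical to avoid dividing by a very small sine. The easing function shown is a simple cubic ease-in-out, used here in place of a general cubic Bézier curve.

```python
import math

def slerp(q_start, q_end, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q_start, q_end))
    if dot < 0.0:                      # take the shorter rotation arc
        q_end = tuple(-c for c in q_end)
        dot = -dot
    if dot > 0.9995:                   # nearly parallel: fall back to lerp
        q = tuple(a + t * (b - a) for a, b in zip(q_start, q_end))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q_start, q_end))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def ease_in_out_cubic(u):
    """Map linear progress u in [0, 1] to eased progress in [0, 1]."""
    return 4.0 * u ** 3 if u < 0.5 else 1.0 - (-2.0 * u + 2.0) ** 3 / 2.0
```

During a 300-500 ms transition, the elapsed fraction would first be passed through the easing function and the result used as the parameter t of `slerp`.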

[0124] Therefore, this application creatively defines a hybrid perspective control system that includes mathematical camera state control, automatic calculation of preset perspective based on scene understanding, and smooth transition animation, seamlessly meeting the full-scene interaction needs from the convenience of "one-click best view" to the flexibility of "free exploration".

[0125] S107. Perform semantic fusion display on the interactive 3D visualization image based on semantic attribute parameters, and overlay parking assistance information on the interactive 3D visualization image.

[0126] In this embodiment, the specific process of semantic fusion display is as follows: S1071. In the fragment shader, query a preset semantic color mapping table based on the semantic attribute parameters of the attributed 3D Gaussian ellipsoid to obtain the semantic color of the corresponding pixel. An example of the semantic color mapping table: {"Vehicle": RGB(220,50,50), "Pedestrian": RGB(50,220,100), "Drivable Area": RGBA(100,150,255,0.3)}.

[0127] S1072. Determine the true appearance color of the above pixels based on the appearance attribute parameters of the attributed 3D Gaussian ellipsoid.

[0128] The appearance attribute parameters (spherical harmonic coefficients) are a compact mathematical representation of how the appearance of an object's surface varies with lighting and viewing angle. During rendering, the color of each Gaussian ellipsoid in the current direction is calculated quickly from the virtual camera's viewing direction and these coefficients. Specifically, each Gaussian ellipsoid carries a set of pre-calculated spherical harmonic coefficients (usually low-order, such as 3rd order, with a total of 9 coefficients). These coefficients implicitly describe the color distribution of that Gaussian region over all possible directions. When rendering a pixel, the engine evaluates the spherical harmonic basis functions at the current viewing direction and takes the dot product with the stored coefficients to calculate the color from that viewpoint efficiently. This is equivalent to encoding real-world appearance with a set of basis functions (spherical harmonics) and decoding the corresponding color according to the viewing direction during rendering, which achieves view-dependent realistic appearance effects (e.g., color changes of metallic paint at different angles, reflections on car windows).

[0129] S1073. Mix the semantic color and the actual appearance color according to a preset weight to generate the final display color of the pixel. The weight is set by the user in the interface theme.
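Steps S1071 to S1073 can be sketched on the CPU as a table lookup followed by a linear blend. The table reuses the example colors given above (with the alpha channel of the drivable area omitted), and a simple linear interpolation with a weight in [0, 1] is assumed, since the exact blending rule is not specified in the text.

```python
SEMANTIC_COLORS = {
    "Vehicle": (220, 50, 50),
    "Pedestrian": (50, 220, 100),
    "Drivable Area": (100, 150, 255),
}

def display_color(label, appearance_rgb, weight):
    """Blend the semantic color with the appearance color.

    weight = 0 shows the pure appearance color; weight = 1 shows the
    pure semantic color. Returns integer RGB values.
    """
    semantic = SEMANTIC_COLORS[label]
    return tuple(round(weight * s + (1.0 - weight) * a)
                 for s, a in zip(semantic, appearance_rgb))
```

A user-facing theme setting would then only need to store the single weight value.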

[0130] By utilizing the depth buffer and semantic ID buffer generated during rendering, 2D labels and distance values can be precisely overlaid on the object surface in the post-processing stage using screen space techniques. The distance value is calculated by comparing the 3D position corresponding to the pixel (calculated inversely from the depth value) with the vehicle's position.

[0131] In this embodiment, the specific process of overlaying parking assistance information onto the interactive 3D visualization image is as follows: S1074. Receive the parking path from the planning module, represented as a series of timestamped vehicle poses, where each pose includes position and orientation, and convert the parking path into a continuous, animated 3D Bézier curve.

[0132] S1075. Calculate the vehicle motion envelope (including the vehicle body outline and the swing envelope when the rear wheels are turning) based on the vehicle dynamics model and the current steering wheel angle, and generate the corresponding three-dimensional envelope graphic (a series of connected 3D polygons).

[0133] S1076. A hybrid rendering mode based on depth testing is used to overlay the 3D Bézier curve and the three-dimensional envelope graphic onto the 3D visualization image. The depth test passes where the overlay depth is less than or equal to the scene depth, and the overlays are drawn with semi-transparent shading (e.g., the path in glowing blue and the envelope in semi-transparent yellow), so that they are correctly occluded by nearby objects while still covering the road surface in the distance.

[0134] Furthermore, the three-dimensional visualization images of this application can also realize dynamic collision warning and virtual "X-ray" perspective.

[0135] Dynamic collision warning: 1. Distance query: in each frame, for the set of sampled points on the 3D envelope graphic, call the occupancy density calculation formula (or query the rasterized occupancy grid) to calculate the occupancy density at each sampling point, and calculate an approximate value of the signed distance field (SDF) from each sampling point to the nearest occupied region.

[0136] In dynamic collision warning, the relationship between the signed distance field (SDF) approximation and the occupancy density is as follows: the occupancy density O(p) represents the degree to which a spatial point p is "occupied" (the larger the value, the more likely the point lies inside an object). The surface of an object is usually defined as the isosurface on which the occupancy density equals a certain threshold (such as 0.5).

[0137] The SDF approximation is used to calculate the shortest distance from any point in space to the surface of the object, and it is marked with a sign (negative inside, positive outside).

[0138] Calculation Method: Instead of directly calculating the SDF for all points, the explicit Gaussian structure of the Gaussian occupancy field is utilized. For a sampled point (such as a point on the vehicle's envelope), the signed distance from that point to the nearest object surface is estimated by quickly finding the nearest Gaussian sphere and combining its center, covariance (which determines its shape), and opacity. In short, occupancy density tells us "where there is an object," while SDF tells us "how far away from the object," the latter being the direct basis for triggering a collision warning.

[0139] To estimate the SDF quickly, this embodiment exploits the fact that the Gaussian occupancy field is composed of explicit Gaussians. This allows efficient range queries that quickly find the Gaussians of sufficient opacity within the neighborhood of each envelope point. For these Gaussians, the minimum Euclidean distance between the envelope points and the Gaussian centers is calculated, denoted $d_{min}$. This distance approximates the distance from the vehicle envelope to the surface of the nearest obstacle, i.e. the SDF approximation.

[0140] 2. Warning triggering and visualization: when the aforementioned minimum distance $d_{min}$ is less than a preset threshold (e.g., 0.3 meters), a collision warning is triggered. In the rendering pipeline, this is implemented through screen-space post-processing effects: the pixels corresponding to the danger zone are identified, and a pulsating red glow along the screen edges and a localized highlight flashing effect are added. At the same time, the value of the minimum distance is displayed prominently as a number on the UI panel.
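The distance query and warning trigger can be sketched as follows. A brute-force search over Gaussian centers stands in for the efficient range query mentioned above, the opacity cutoff `alpha_min` is an assumed parameter, and the 0.3 m threshold follows the example in the text.

```python
import math

def min_obstacle_distance(envelope_points, gaussians, alpha_min=0.5):
    """Smallest distance from any envelope point to a sufficiently opaque
    Gaussian center; math.inf if no such Gaussian exists."""
    solid = [g["mu"] for g in gaussians if g["alpha"] >= alpha_min]
    if not solid:
        return math.inf
    return min(math.dist(p, mu) for p in envelope_points for mu in solid)

def collision_warning(envelope_points, gaussians, d_warn=0.3):
    """Return (warning flag, minimum distance) for the current frame."""
    d_min = min_obstacle_distance(envelope_points, gaussians)
    return d_min < d_warn, d_min
```

Because the distance is measured to Gaussian centers rather than to the object surface, it slightly overestimates the true clearance for large Gaussians; a more conservative variant could subtract an estimate of each Gaussian's extent.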

[0141] Therefore, this application innovatively utilizes the explicit geometry of Gaussian occupancy fields for efficient collision distance querying, realizing dynamic early warning visualization based on screen space post-processing.

[0142] Virtual "X-ray" see-through view: Mechanism: the user triggers "see-through mode" via a button or voice command.

[0143] Implementation: the system temporarily modifies the fragment shader logic of the rendering pipeline. For semantic categories marked as "see-through capable" (such as "neighboring car"), the final opacity is multiplied by a decay factor (such as 0.2). At the same time, the contrast or brightness of the scene area behind the object is temporarily increased to make it clearer. This creates the visual effect of "seeing through" obstacles without changing the geometry or physics, addressing the classic pain point of obstructed vision during parking.

[0144] The "temporary" duration mentioned above is typically 0.5 to 2 seconds, depending on the interaction mode: User-initiated trigger (e.g., clicking a button): 1.5-2 seconds is recommended, so that the user has enough time to observe the scene behind the see-through area.

[0145] Momentary assist prompts (such as automatically flashing when the steering wheel is near its limit): 0.5-1 second is recommended, to avoid the visual confusion caused by a prolonged see-through effect.

[0146] The duration can be adjusted through user preference settings, and the system continues to update the scene dynamically while the see-through effect is active.

[0147] In this embodiment, the interface layout and display configuration of the display screen are as follows: 1. Main view and picture-in-picture (PiP). Main view: renders the complete interactive 3D scene described above.

[0148] Picture-in-Picture Window: Displays a fixed, purely top-view orthographic projection. This view is rendered directly using Gaussian occupancy field data (or a rasterized top view), completely unaffected by perspective distortion, serving as an absolute spatial reference for the user. The viewports of the two views are synchronized; as the camera moves in the main view, a small icon indicating the current main view's frustum is displayed within the picture-in-picture window.

[0149] 2. Key information panel. Content: displayed as a non-modal card overlaid on the screen edge. The information displayed includes the real-time distance to the nearest obstacle, the current gear position, the steering wheel angle, the estimated parking time, and a progress bar showing the alignment of the vehicle's outline with the parking space lines.

[0150] Rendering: Use a separate 2D UI rendering layer to ensure that information is clear and readable without interfering with the 3D scene.

[0151] 3. Multi-screen interaction Architecture: The rendering engine runs on a core of the vehicle domain controller, generating a main framebuffer and multiple auxiliary viewport buffers.

[0152] Configuration: Define the display content of different screens (central control screen, instrument panel, rearview mirror) through configuration files. For example: Dashboard: Receives rendering output from a "third-person following view" to provide an immersive driving state perception.

[0153] Central control screen: Displays a complete "main view + picture-in-picture + interactive controls" as the main control interface.

[0154] Streaming rearview mirror: It can receive the rendered output of a "rearward wide-angle view" as an enhanced alternative to traditional optical rearview mirrors.

[0155] Synchronization: All views share the same Gaussian occupancy field data, and frame synchronization is performed through timestamps and vehicle status to ensure information consistency.

[0156] Therefore, this application designs a multi-screen linkage display framework based on the same Gaussian occupancy field data source, realizing the collaboration and division of labor of visualization content between different screens, and expanding the application boundaries of the system.

[0157] As can be seen from the above, the beneficial effects of the parking visualization method based on a Gaussian occupancy field provided in this embodiment of the application are as follows: 1. Based on the continuous geometry and realistic appearance attributes of the Gaussian occupancy field, this invention provides a unified representation of the scene around the vehicle and generates an interactive 3D visualization image on the vehicle's display screen. The resulting 3D scene has accurate geometry and realistic texture and lighting, which effectively overcomes the image distortion, loss of detail, and lack of depth information caused by the failure of the planar assumption in traditional 2D bird's-eye view interfaces. This improves the realism of the parking scene display and the user's spatial perception, and greatly enhances the user's trust in the displayed environment.

[0158] 2. This invention uses attributed 3D Gaussian ellipsoids to construct the Gaussian occupancy field and combines it with Gaussian splatting rendering to achieve high-fidelity display of the complete scene around the vehicle. It can not only express various scene contents such as obstacles, background areas, and road surfaces, but also integrate appearance and semantic information to improve the realism, completeness, and information richness of the displayed results.

[0159] 3. This invention supports interactive 3D visualization display, which can be combined with virtual camera parameter adjustment to present the spatial relationship between the vehicle and obstacles, parking lines and the surrounding environment from different viewing angles. Compared with the fixed viewing angle display method, it can enhance the user's freedom of observation and judgment accuracy in narrow and complex parking scenarios, thereby greatly enhancing the user's confidence and sense of control.

[0160] 4. This invention can overlay parking assistance information onto the three-dimensional visualization image, so that the parking path, vehicle motion envelope, collision warnings, and other information correspond to the real scene in spatial position. Key information such as the abstract planned path, dynamic envelope, and collision warnings is integrated into the high-fidelity scene as intuitive 3D graphics, which makes the assistance information easier to understand and helps users make parking decisions quickly.

[0161] 5. Unlike "island-like" rendering based on detection boxes, this invention uses the Gaussian occupancy field to reconstruct and present the complete scene, including the background (ground, walls, sky, etc.). The visualization interface and scene perception share the same Gaussian occupancy field data source, which helps keep the displayed and perceived results consistent and reduces the risk of misleading the user through inconsistencies between displayed content and actual perception results, as can occur in traditional solutions.

[0162] 6. This invention adopts a feedforward Gaussian occupancy field generation method (<50 ms) combined with a real-time Gaussian splatting rendering mechanism (<20 ms), which can better meet the low-latency and smooth-display requirements of vehicle parking scenarios, thereby improving the system's real-time response capability and user experience. The entire interface can stably render at more than 30 frames per second on mainstream automotive computing platforms (such as NVIDIA Orin), ensuring that all interactive operations respond in real time without perceptible latency.

[0163] The following describes in detail one or more embodiments of a parking visualization device based on a Gaussian occupancy field according to the present invention. Those skilled in the art will understand that these devices can be configured from commercially available hardware components through the steps taught in this solution. Figure 2 illustrates a parking visualization device based on a Gaussian occupancy field according to an embodiment of the present invention, comprising a first module 11, a second module 12, a third module 13, a fourth module 14, a fifth module 15, a sixth module 16, and a seventh module 17.

[0164] The first module 11 is used to perform S101: acquire multi-view images of the scene around the vehicle.

[0165] The second module 12 is used to perform S102: extract features from the multi-view images to obtain multi-scale image features corresponding to each view, and perform cross-view fusion based on the multi-scale image features corresponding to each view to generate a bird's-eye view feature map.

[0166] The third module 13 is used to perform S103: predict the Gaussian existence probability map and initial height map from the bird's-eye view feature map through a feedforward decoding network, determine multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and initial height map, and then, for each candidate Gaussian center anchor point, predict the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid.

[0167] The fourth module 14 is used to perform S104: generate multiple attributed three-dimensional Gaussian ellipsoids based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters.

[0168] The fifth module 15 is used to perform S105: construct a Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids.

[0169] The sixth module 16 is used to perform S106: obtain the current virtual camera parameters and perform Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters, generating an interactive 3D visualization image of the scene around the vehicle on the vehicle's display screen.

[0170] The seventh module 17 is used to perform S107: perform semantic fusion display on the interactive 3D visualization image based on the semantic attribute parameters, and overlay parking assistance information on the interactive 3D visualization image.
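As an illustration of the data flow through the third and fourth modules (S103 and S104), the sketch below turns a Gaussian existence probability map and an initial height map into candidate anchor points and then applies the predicted center position offsets. The thresholding rule, the grid layout (rows index y, columns index x), and the `cell_size`, `origin`, and `threshold` parameters are assumptions made for this example; the network that predicts the maps and offsets is not modeled.

```python
def candidate_anchors(prob_map, height_map, cell_size, origin, threshold=0.5):
    """Return (x, y, z) anchors for BEV cells whose existence probability
    is at least `threshold`. Rows index y, columns index x (assumed layout);
    z is taken from the initial height map."""
    anchors = []
    for r, (prob_row, height_row) in enumerate(zip(prob_map, height_map)):
        for c, (p, h) in enumerate(zip(prob_row, height_row)):
            if p >= threshold:
                x = origin[0] + (c + 0.5) * cell_size  # cell center in x
                y = origin[1] + (r + 0.5) * cell_size  # cell center in y
                anchors.append((x, y, h))
    return anchors


def gaussian_centers(anchors, offsets):
    """Final ellipsoid center = anchor point + predicted center offset."""
    return [tuple(a + d for a, d in zip(anchor, offset))
            for anchor, offset in zip(anchors, offsets)]
```

With a 2 x 2 grid of 1 m cells whose origin is at (-1, -1), only the cells with probability 0.5 or higher produce anchors, and each anchor is then shifted by its own offset.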

[0171] The specific implementation methods and working principles of the above modules can be found in the relevant descriptions in the method embodiments of this application, and will not be repeated here.

[0172] In summary, the parking visualization device based on Gaussian occupancy field provided in the above embodiments can execute the parking visualization method based on Gaussian occupancy field provided in the foregoing embodiments.

[0173] Based on the same concept, Figure 3 shows a schematic block diagram of the structure of an electronic device provided by an embodiment of the present invention.

[0174] For example, the electronic device includes a storage module 21 and a processor 22. The storage module 21 includes instructions that are loaded and executed by the processor 22 and that, when executed, cause the processor 22 to perform the steps of the parking visualization method based on a Gaussian occupancy field described above in this specification, according to various exemplary embodiments of the present invention.

[0175] It should be understood that processor 22 can be a Central Processing Unit (CPU), or it can be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among these, the general-purpose processor can be a microprocessor or any conventional processor.

[0176] This invention also provides a computer-readable storage medium that stores one or more programs which, when executed by a processor, implement the steps of the parking visualization method based on a Gaussian occupancy field described above, according to various exemplary embodiments of the invention.

[0177] Those skilled in the art will understand that all or some of the steps, systems, and apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software can be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).

[0178] As is known to those skilled in the art, the term computer-readable storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, it is known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

[0179] For example, the computer-readable storage medium may be an internal storage unit of the electronic device described in the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc., provided on the electronic device.

[0180] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A parking visualization method based on a Gaussian occupancy field, characterized in that the method comprises: acquiring multi-view images of the scene surrounding the vehicle; performing feature extraction on the multi-view images to obtain multi-scale image features corresponding to each view, and performing cross-view fusion based on the multi-scale image features corresponding to each view to generate a bird's-eye view feature map; predicting, based on the bird's-eye view feature map and through a feedforward decoding network, a Gaussian existence probability map and an initial height map, determining multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and the initial height map, and then, for each candidate Gaussian center anchor point, predicting the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid, wherein the attribute parameters include shape parameters, occupancy intensity parameters, appearance attribute parameters, and semantic attribute parameters; generating multiple attributed three-dimensional Gaussian ellipsoids based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters, wherein each three-dimensional Gaussian ellipsoid has a position attribute determined by the candidate Gaussian center anchor point and center position offset, as well as spatial morphology attributes, occupancy intensity attributes, appearance attributes, and semantic attributes characterized by the attribute parameters; constructing a Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids; obtaining the current virtual camera parameters, and performing Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters to generate an interactive 3D visualization image of the scene around the vehicle on the vehicle's display screen; and performing semantic fusion display on the interactive 3D visualization image based on the semantic attribute parameters, and superimposing parking assistance information on the interactive 3D visualization image.

2. The parking visualization method based on a Gaussian occupancy field according to claim 1, characterized in that the shape parameters include scaling parameters and rotation parameters; the step of generating multiple attributed three-dimensional Gaussian ellipsoids based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters further includes: constructing the covariance matrix of each three-dimensional Gaussian ellipsoid based on the scaling parameters and rotation parameters, wherein the covariance matrix characterizes the spatial morphology attribute of the corresponding three-dimensional Gaussian ellipsoid; and the step of performing Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters further includes: transforming each attributed three-dimensional Gaussian ellipsoid of the Gaussian occupancy field from the world coordinate system to the camera clip space, and transforming the corresponding covariance matrix into the camera view space; dividing the display screen into multiple rendering blocks; for each rendering block, removing the attributed three-dimensional Gaussian ellipsoids that do not contribute to it and generating a corresponding list of Gaussian ellipsoids to be rendered; and, for each rendering block, computing the color and depth of each pixel in the rendering block by alpha blending based on its list of Gaussian ellipsoids to be rendered.
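Two of the operations named in claim 2 can be sketched compactly. The first function builds a covariance matrix from scaling and rotation parameters using the factorization Σ = R·diag(s)²·Rᵀ that is common in 3D Gaussian splatting; representing the rotation as a unit quaternion (w, x, y, z) is an assumption, since the claim only says "rotation parameters". The second function shows front-to-back alpha blending of color and depth for one pixel, assuming the per-Gaussian alpha at that pixel has already been evaluated and the samples are sorted near to far. Projection, tiling, and culling are not shown.

```python
def quat_to_matrix(q):
    """3x3 rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(scale, quat):
    """Sigma = R * diag(scale)^2 * R^T (symmetric, positive semi-definite)."""
    R = quat_to_matrix(quat)
    return [[sum(R[i][k] * scale[k] ** 2 * R[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]


def composite(samples):
    """Front-to-back alpha blending for one pixel.

    `samples` is a list of (rgb, depth, alpha) tuples sorted near to far.
    Returns (rgb, depth, accumulated_alpha).
    """
    color, depth, transmittance = [0.0, 0.0, 0.0], 0.0, 1.0
    for rgb, d, alpha in samples:
        weight = transmittance * alpha      # contribution of this sample
        color = [c + weight * v for c, v in zip(color, rgb)]
        depth += weight * d
        transmittance *= 1.0 - alpha        # light left for farther samples
    return color, depth, 1.0 - transmittance
```

Building the covariance from a scale and a rotation, rather than predicting its six free entries directly, guarantees a valid (positive semi-definite) matrix for any predicted values.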

3. The parking visualization method based on a Gaussian occupancy field according to claim 1, characterized in that the semantic fusion display of the interactive 3D visualization image based on the semantic attribute parameters further includes: obtaining the semantic color of the corresponding pixel by querying a preset semantic color mapping table based on the semantic attribute parameters of the attributed three-dimensional Gaussian ellipsoid; determining the true appearance color of the pixel based on the appearance attribute parameters of the attributed three-dimensional Gaussian ellipsoid; and mixing the semantic color and the true appearance color according to a preset weight to generate the final display color of the pixel.
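The color mixing in claim 3 can be illustrated with a simple linear blend. The sketch below assumes the mixing is a convex combination, final = w·semantic + (1 − w)·appearance, which the claim does not spell out; the table contents, label names, and the default weight are hypothetical.

```python
# Hypothetical semantic color mapping table (RGB components in [0, 1]).
SEMANTIC_COLORS = {
    "road": (0.5, 0.5, 0.5),
    "obstacle": (1.0, 0.0, 0.0),
    "parking_line": (1.0, 1.0, 0.0),
}


def display_color(semantic_label, appearance_rgb, weight=0.25):
    """Blend the looked-up semantic color with the true appearance color.

    `weight` is the preset share given to the semantic color; the rest
    goes to the appearance color.
    """
    semantic_rgb = SEMANTIC_COLORS[semantic_label]
    return tuple(weight * s + (1.0 - weight) * a
                 for s, a in zip(semantic_rgb, appearance_rgb))
```

A weight of 0 shows the pure appearance color and a weight of 1 shows the pure semantic color, so a single parameter moves the display between a photorealistic and a fully color-coded view.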

4. The parking visualization method based on a Gaussian occupancy field according to claim 1, characterized in that the method further includes interactive viewpoint control of the three-dimensional visualization image: establishing a virtual camera state model that includes camera position, camera orientation, and field of view, as well as multiple preset viewpoints; mapping the user's free-viewpoint input to incremental changes in the virtual camera state model; and, when switching between preset viewpoints or when the user stops freely manipulating the viewpoint, interpolating the camera position and applying spherical linear interpolation to the camera orientation.
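The viewpoint transition in claim 4 can be sketched as follows. The example assumes the camera orientation is stored as a unit quaternion (w, x, y, z) and that the position is interpolated linearly; the claim names spherical linear interpolation for the orientation but does not fix the position scheme or the orientation representation.

```python
import math


def lerp(p0, p1, t):
    """Linear interpolation between two points for t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # q and -q are the same rotation:
        q1 = tuple(-c for c in q1)  # flip to travel along the shorter arc
        dot = -dot
    if dot > 0.9995:              # nearly parallel: lerp avoids dividing by ~0
        q = lerp(q0, q1, t)
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)
```

Interpolating the orientation on the unit sphere keeps the angular speed constant during the transition, which plain component-wise interpolation of the quaternion does not.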

5. The parking visualization method based on a Gaussian occupancy field according to claim 1, characterized in that the construction of the Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids further includes: representing the scene as a set of multiple attributed three-dimensional Gaussian ellipsoids; wherein, for any point in three-dimensional space, its occupancy density is the sum, over the attributed three-dimensional Gaussian ellipsoids, of the product of each ellipsoid's probability density at that point and the corresponding occupancy intensity parameter.
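The occupancy density in claim 5 can be illustrated numerically. The sketch below reads the claim as an intensity-weighted sum, ρ(x) = Σᵢ αᵢ·N(x; μᵢ, Σᵢ); that reading is an assumption. To keep the example short it also restricts each ellipsoid to an axis-aligned Gaussian described by per-axis standard deviations, which is a simplification of the full covariance, and the dictionary keys are hypothetical.

```python
import math


def gaussian_pdf(point, mean, sigma):
    """Probability density of an axis-aligned 3D Gaussian at `point`.

    `sigma` holds the per-axis standard deviations (simplifying assumption).
    """
    exponent = -0.5 * sum(((p - m) / s) ** 2
                          for p, m, s in zip(point, mean, sigma))
    norm = (2.0 * math.pi) ** 1.5 * sigma[0] * sigma[1] * sigma[2]
    return math.exp(exponent) / norm


def occupancy_density(point, gaussians):
    """Sum of intensity-weighted Gaussian densities at `point`."""
    return sum(g["intensity"] * gaussian_pdf(point, g["mean"], g["sigma"])
               for g in gaussians)
```

Because the field is a sum of smooth functions, it can be evaluated at any continuous 3D position without the discretization of a voxel grid.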

6. The parking visualization method based on a Gaussian occupancy field according to claim 1, characterized in that the step of overlaying parking assistance information onto the interactive 3D visualization image further includes: receiving parking paths from the planning module and converting the parking paths into continuous 3D Bézier curves with gradient animation; calculating the vehicle motion envelope based on the vehicle dynamics model and the current steering wheel angle, and generating the corresponding three-dimensional envelope graphic; and using a hybrid rendering mode based on depth testing to overlay the 3D Bézier curves and the three-dimensional envelope graphic onto the three-dimensional visualization image, wherein the depth test passes when the depth is less than or equal to the scene depth, and the display effect is semi-transparent shading.
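Two pieces of claim 6 lend themselves to a short example: evaluating a point on a 3D Bézier curve and deciding, per pixel, whether an overlay fragment is drawn. The sketch uses de Casteljau's algorithm for the curve and a "less than or equal" depth comparison with a fixed blend factor for the semi-transparent shading. The function names and the `alpha` value are hypothetical, and the conversion of a planned path into control points and the gradient animation are not shown.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve of any degree at t in [0, 1] (de Casteljau)."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Replace each adjacent pair of points by their interpolation at t.
        pts = [tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]


def overlay_pixel(scene_rgb, scene_depth, overlay_rgb, overlay_depth, alpha=0.5):
    """Draw the overlay semi-transparently only if it passes the depth test
    (overlay depth <= scene depth); otherwise keep the scene color."""
    if overlay_depth <= scene_depth:
        return tuple(alpha * o + (1.0 - alpha) * s
                     for o, s in zip(overlay_rgb, scene_rgb))
    return tuple(scene_rgb)
```

The depth comparison is what lets a rendered wall or pillar hide the part of the path that passes behind it, instead of the path being painted over everything.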

7. The parking visualization method based on a Gaussian occupancy field according to claim 6, characterized in that the method further includes: calculating the occupancy density at each sampling point on the three-dimensional envelope graphic, and calculating an approximate value of the signed distance field from each sampling point to the nearest occupying object; and triggering a collision warning when the approximate value of the signed distance field is less than a preset threshold.
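The warning logic in claim 7 can be sketched once a distance approximation is chosen. The claim does not say how the signed distance is approximated, so the example below substitutes a deliberately simple stand-in: each ellipsoid is treated as a sphere whose radius is `extent` times its largest standard deviation, and the signed distance is the distance to the nearest such sphere (negative inside). The `extent` and `threshold` values and the dictionary keys are hypothetical, and the occupancy density evaluation at the sample points is not modeled here.

```python
import math


def approx_signed_distance(point, gaussians, extent=2.0):
    """Approximate signed distance from `point` to the nearest ellipsoid,
    treating each ellipsoid as a bounding sphere (assumed approximation)."""
    return min(math.dist(point, g["mean"]) - extent * max(g["sigma"])
               for g in gaussians)


def collision_warning(envelope_samples, gaussians, threshold=0.3):
    """True if any envelope sample point is closer than `threshold`."""
    return any(approx_signed_distance(p, gaussians) < threshold
               for p in envelope_samples)
```

For an obstacle ellipsoid centered 2 m ahead with a 0.1 m standard deviation, a sample at the origin is about 1.8 m away and raises no warning, while a sample 0.4 m from the center is about 0.2 m away and does.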

8. A parking visualization device based on a Gaussian occupancy field, characterized in that the device comprises: a first module, used to acquire multi-view images of the scene surrounding the vehicle; a second module, used to extract features from the multi-view images to obtain multi-scale image features corresponding to each view, and to perform cross-view fusion based on the multi-scale image features corresponding to each view to generate a bird's-eye view feature map; a third module, used to predict a Gaussian existence probability map and an initial height map from the bird's-eye view feature map through a feedforward decoding network, to determine multiple candidate Gaussian center anchor points based on the Gaussian existence probability map and the initial height map, and then, for each candidate Gaussian center anchor point, to predict the center position offset and attribute parameters of the corresponding three-dimensional Gaussian ellipsoid, wherein the attribute parameters include shape parameters, occupancy intensity parameters, appearance attribute parameters, and semantic attribute parameters; a fourth module, used to generate multiple attributed three-dimensional Gaussian ellipsoids based on the candidate Gaussian center anchor points, center position offsets, and attribute parameters, wherein each three-dimensional Gaussian ellipsoid has a position attribute determined by the candidate Gaussian center anchor point and center position offset, as well as a spatial morphology attribute, occupancy intensity attribute, appearance attribute, and semantic attribute characterized by the attribute parameters; a fifth module, used to construct a Gaussian occupancy field based on the multiple attributed three-dimensional Gaussian ellipsoids; a sixth module, used to obtain the current virtual camera parameters and perform Gaussian splatting rendering on the Gaussian occupancy field based on the virtual camera parameters, so as to generate an interactive three-dimensional visualization image of the scene around the vehicle on the vehicle's display screen; and a seventh module, used to perform semantic fusion display on the interactive 3D visualization image based on the semantic attribute parameters, and to overlay parking assistance information on the interactive 3D visualization image.

9. An electronic device, characterized in that the electronic device includes a storage module comprising instructions that are loaded and executed by a processor and that, when executed, cause the processor to perform the parking visualization method based on a Gaussian occupancy field according to any one of claims 1-7.

10. A computer-readable storage medium storing one or more programs, characterized in that, when the one or more programs are executed by a processor, they implement the parking visualization method based on a Gaussian occupancy field according to any one of claims 1-7.