A drivable area detection method and system, electronic device, and medium

By combining the RepVGG backbone network and the Depthwise Separable ASPPHead perception model with the KD-Tree index structure, efficient drivable area detection under monocular conditions is achieved, solving the problems of accuracy, real-time performance and cost in existing technologies, and improving detection accuracy and robustness.

CN122244814A · Pending · Publication Date: 2026-06-19 · DONGFENG MOTOR GRP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DONGFENG MOTOR GRP
Filing Date
2026-02-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing drivable area detection technologies have limited accuracy, particularly on unstructured road surfaces; poor real-time performance under high-speed, high-resolution conditions or when fusing multi-source data; and high cost.

Method used

A perception model combining a RepVGG backbone network with a Depthwise Separable ASPPHead and a U-Net structure is adopted. Driving areas are detected using images from a monocular fisheye camera. By using remapping and radial detection filtering strategies, combined with a KD-Tree index structure for boundary point tracking, efficient point screening and stable output are achieved.

Benefits of technology

It improves detection accuracy and robustness, reduces costs, meets the real-time requirements of intelligent driving systems, enhances environmental adaptability, and reduces reliance on high-cost equipment such as lidar.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a method, system, electronic device, and medium for detecting drivable areas, belonging to the field of intelligent driving technology. The method includes: acquiring fisheye images captured by multiple fisheye cameras on a target vehicle; inputting the fisheye images into a perception model, the perception model outputting a semantic segmentation map corresponding to each fisheye image, wherein the pixel values of the semantic segmentation map at least represent drivable or non-drivable areas; performing viewpoint transformation and stitching on the semantic segmentation maps to obtain a global bird's-eye view; and obtaining drivable area results from the global bird's-eye view by ray casting, wherein the drivable area results include the farthest drivable distance detected around the target vehicle and the coordinates of drivable area boundary points. The perception model of this invention can more accurately identify drivable areas and road markings; the post-processing process improves the accuracy of boundary points through multi-step processing and filtering.

Description

Technical Field

[0001] This invention relates to the field of intelligent driving technology, and in particular to a method and system for detecting drivable areas, electronic equipment, and media.

Background Technology

[0002] With the rapid development of intelligent driving technology, drivable area detection has become a core component in ensuring safe autonomous driving and obstacle avoidance. This technology fuses and analyzes data collected from multiple sensors (such as cameras, LiDAR, and millimeter-wave radar) to accurately identify which areas around the vehicle are passable and which contain obstacles or potential hazards. Through this process, the system can not only build a real-time model of the vehicle's surrounding environment but also dynamically update the drivable space, providing a reliable basis for subsequent path planning, trajectory optimization, and autonomous decision-making. In other words, drivable area detection is equivalent to building a "dynamic safety map" for the vehicle, enabling the autonomous driving system to correctly judge and flexibly avoid obstacles in complex road scenarios, ensuring the safety of occupants and other road users. With the continuous improvement of related algorithms and hardware performance, drivable area detection technology will become more intelligent, stable, and efficient in the future, becoming an indispensable foundational capability in intelligent driving systems.

[0003] Existing drivable area detection technologies have limited accuracy and are prone to false detection or omission of unstructured road boundaries; their real-time performance is poor under high-speed and high-resolution conditions or with multi-source data; and some drivable area detection technologies require the introduction of equipment such as LiDAR, resulting in high costs.

Summary of the Invention

[0004] This invention aims to solve at least one of the above-mentioned problems in the prior art, and proposes an improved method for detecting drivable areas, which can effectively integrate shallow and deep features, improve detection accuracy and robustness; efficiently screen points under monocular conditions, reduce costs and enhance environmental adaptability; and achieve stable output of results and meet real-time requirements.

[0005] In a first aspect, embodiments of the present invention provide a method for detecting drivable areas, including:

[0006] Acquire fisheye images captured by multiple fisheye cameras on the target vehicle;

[0007] The fisheye images are input into the perception model, and the perception model outputs a semantic segmentation map corresponding to each fisheye image. The pixel values of the semantic segmentation map at least represent drivable or non-drivable areas.

[0008] Perspective transformation and stitching are performed on the semantic segmentation maps to obtain a global bird's-eye view;

[0009] The drivable area is obtained by ray casting from the global bird's-eye view. The drivable area results include the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.

[0010] In a preferred embodiment, the perception model includes: a backbone network, a decoding head, and an auxiliary head;

[0011] The backbone network uses RepVGG as the feature extraction network.

[0012] The decoding head adopts a depthwise separable atrous spatial pyramid pooling (ASPP) structure;

[0013] The auxiliary head is used to participate in the training process;

[0014] The pixel values of the semantic segmentation map include a drivable area detection category and a road sign category, wherein the drivable area detection category is used to identify whether the pixel value represents a drivable area or a non-drivable area.

[0015] In a preferred embodiment, the step of performing perspective transformation and stitching on the semantic segmentation map to obtain a global bird's-eye view includes:

[0016] The semantic segmentation maps corresponding to each fisheye image are remapped using a linear interpolation method, and the semantic segmentation maps are transformed into two different bird's-eye view layers through a pre-computed mapping table;

[0017] Different regions of interest are merged to obtain a global bird's-eye view.

[0018] In a preferred embodiment, the method further includes configuring an intrinsic parameter matrix, distortion coefficients, rotation matrix, transformation matrix, and homography matrix for each camera, and generating a corresponding mapping table.

[0019] In a preferred embodiment, the step of obtaining a drivable area result from the global bird's-eye view by ray casting, wherein the drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of boundary points, includes:

[0020] Traverse the predefined set of radial detection lines, and check the pixel values in the global bird's-eye view mask point by point from the vehicle center outward for each detection line. Record the detected boundary point information, including edge identifier, type identifier, and camera identifier. Calculate the position coordinates of the boundary points in the vehicle coordinate system. Convert the bird's-eye view coordinates back to the original image coordinates through a pre-calculated mapping table. At the same time, calculate the distance from the boundary points to the vehicle center. Obtain the drivable area result based on all boundary point information.

[0021] In a preferred embodiment, the method further includes: constructing a KD-Tree index structure to perform time-series tracking and filtering of drivable area boundary points.

[0022] In a preferred embodiment, the method further includes: extracting boundary points from the semantic segmentation map corresponding to each fisheye image along the projection line of the corresponding camera.

[0023] In a second aspect, embodiments of the present invention provide a drivable area detection system configured to implement any of the methods described in the first aspect, the system comprising:

[0024] The acquisition module is used to acquire fisheye images captured by multiple fisheye cameras on the target vehicle;

[0025] The perception model prediction module is used to input the fisheye images into the perception model, and the perception model outputs a semantic segmentation map corresponding to each fisheye image. The pixel values of the semantic segmentation map at least represent drivable or non-drivable areas.

[0026] The post-processing module is used to perform perspective transformation and stitching on the semantic segmentation map to obtain a global bird's-eye view;

[0027] The boundary point detection module is used to obtain the drivable area result from the global bird's-eye view by ray casting. The drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.

[0028] In a third aspect, embodiments of the present invention provide an electronic device, including:

[0029] One or more processors;

[0030] Memory, used to store one or more programs;

[0031] When the one or more programs are executed by the one or more processors, the one or more processors implement any of the methods described in the first aspect.

[0032] In a fourth aspect, embodiments of the present invention provide a computer-readable medium storing a computer program that, when executed by a processor, implements the steps of any of the methods described in the first aspect.

[0033] Beneficial effects of this invention:

[0034] 1. Improved detection accuracy: The perception model of this invention adopts a deep learning algorithm that combines the Depthwise Separable ASPPHead with a U-Net structure to make full use of the contextual information of the image, so that drivable areas and road markings are identified more accurately; the post-processing process further removes abnormal points through multi-step processing and filtering, thereby improving the accuracy of boundary points.

[0035] 2. Enhanced Robustness: The boundary point tracking process of this invention utilizes KD-Tree for spatial matching, and assesses point stability through multi-frame accumulation, effectively filtering out occasional detection errors and noise points, thus improving the spatiotemporal continuity and stability of boundary points. Even in cases of partial frame loss, as long as the point reappears in subsequent frames and its location is close, it can still be identified as the same physical point, enhancing its adaptability to interference such as occlusion and changes in lighting.

[0036] 3. Ensuring real-time performance: The overall technical solution of this invention is compact in design, and the post-processing and boundary point tracking algorithms are highly efficient. In particular, the use of KD-Tree accelerates spatial retrieval, making it suitable for real-time processing of large-scale point sets and able to meet the real-time requirements of intelligent driving vehicles for drivable area detection.

[0037] 4. Reduced costs: This invention mainly processes image information collected by a camera, without relying on expensive equipment such as LiDAR, which helps reduce the overall cost of intelligent driving systems and facilitates large-scale application.

Attached Figure Description

[0038] Figure 1 is a schematic diagram of the overall process of a drivable area detection method provided in an embodiment of the present invention.

[0039] Figure 2 is a schematic flowchart of a drivable area detection method provided in an embodiment of the present invention.

[0040] Figure 3 is a schematic diagram of the region of interest division provided in an embodiment of the present invention.

[0041] Figure 4 is a schematic diagram of the time-series tracking and filtering process for boundary points based on the KD-Tree index structure provided in an embodiment of the present invention.

[0042] Figure 5 is a structural block diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0043] To enable those skilled in the art to better understand the technical solutions of the present invention, exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, including various details of the embodiments of the present invention to aid understanding. These should be considered merely exemplary. Therefore, those skilled in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0044] Where there is no conflict, the various embodiments of the present invention and the features thereof may be combined with each other.

[0045] As used herein, the term “and / or” includes any and all combinations of one or more related enumerated entries.

[0046] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms “a” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms “comprising” and / or “made of” are used in this specification, the presence of the stated feature, integer, step, operation, element, and / or component is specified, but the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof is not excluded. Terms such as “connected” or “linked” are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect.

[0047] Unless otherwise specified, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having the meaning consistent with their meaning in the context of the relevant art and the invention, and will not be interpreted as having an idealized or overly formal meaning unless expressly so defined herein.

[0048] In the technical solution of this invention, the collection, storage, use, processing, transmission, provision, and disclosure of user personal information all comply with relevant laws and regulations and do not violate public order and good morals. The use of user data in this technical solution follows relevant national laws and regulations (e.g., the "Information Security Technology - Personal Information Security Specification"). For example: appropriate measures are taken for personal information access control; restrictions are imposed on the display of personal information; the purpose of using personal information does not exceed the scope of direct or reasonable association; and explicit identity targeting is eliminated when using personal information to avoid precisely locating a specific individual.

[0049] In this invention, abbreviations and key terms are defined as follows:

[0050] FSD: Freespace Detection.

[0051] BEV: Bird's Eye View.

[0052] KD-Tree: K-Dimensional Tree (a spatial indexed data structure).

[0053] ROI: Region of Interest.

[0054] Backbone (or backbone network).

[0055] Decode Head.

[0056] Auxiliary Head.

[0057] ASPP: Atrous Spatial Pyramid Pooling.

[0058] Depthwise Separable Convolution: Decomposing standard convolution into depthwise convolution and pointwise convolution can significantly reduce the amount of computation and parameters while maintaining good performance.
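
As a rough illustration of the saving described above, the following sketch counts the weights and multiply-accumulate operations of a standard convolution and of its depthwise separable factorization. The layer sizes (128 input and output channels, a 3x3 kernel, a 12x20 feature map, stride 1 with "same" padding) are assumptions chosen for illustration and are not stated for any particular layer of the perception model.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Weights and multiply-accumulates (MACs) of a k x k standard convolution.

    Bias terms are ignored; the output is assumed to keep the h x w size.
    """
    params = c_in * c_out * k * k
    return params, params * h * w


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    params = c_in * k * k + c_in * c_out
    return params, params * h * w


# Assumed example layer: 128 -> 128 channels, 3 x 3 kernel, 12 x 20 feature map.
std_params, std_macs = standard_conv_cost(128, 128, 3, 12, 20)
sep_params, sep_macs = depthwise_separable_cost(128, 128, 3, 12, 20)

# The ratio simplifies to 1 / c_out + 1 / k**2.
ratio = sep_params / std_params
print(std_params, sep_params, round(ratio, 4))
```

Under these assumptions the separable form needs roughly one ninth of the weights, because the 1 / k² term dominates once the number of output channels is large.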

[0059] DepthwiseSeparableASPP: Depthwise separable atrous spatial pyramid pooling replaces the standard convolutions in ASPP with depthwise separable convolutions, thus reducing computational cost while maintaining multi-scale context capture capabilities.

[0060] OpenCV is an open-source computer vision library.

[0061] KD-Tree is a binary search tree data structure used for efficiently organizing point data in a K-dimensional space.

[0062] One related technology proposes a real-time detection method and system for drivable areas of vehicles. The core steps of this technical solution include: an onboard binocular camera acquiring grayscale images of the left and right sides ahead; calculating the disparity map and converting it into a V-disparity map; binarizing the V-disparity, fitting piecewise straight lines using RANSAC, and performing multi-frame smoothing; and mapping these straight lines back to the original image to obtain the drivable area. This solution emphasizes lower requirements for disparity accuracy, attempting to improve real-time performance through geometric fitting. However, this technical solution has the following drawbacks: 1. Limited accuracy: V-disparity + straight line fitting assumes the road is approximately planar / regularly bounded, making it prone to false positives / false negatives when facing unstructured road surfaces, slopes / undulations, damaged shoulders, water accumulation / shadows; 2. Weak environmental adaptability: binocular matching is sensitive to low texture / strong reflection / nighttime / rain / fog, and disparity is easily degraded, reducing robustness; 3. Real-time limitations: although the disparity accuracy requirement is weakened, disparity calculation + RANSAC fitting + temporal smoothing still puts pressure on embedded SoCs, making it difficult to guarantee frame rates under high-speed and high-resolution conditions.

[0063] The second related technology proposes a road drivable area detection method based on the fusion of monocular vision and LiDAR. The core steps of this technical solution include: monocular vision and LiDAR point cloud fusion: image segmentation using superpixels to extract visual and point cloud features; fusion of multiple features within a Bayesian framework to determine the final road drivable area. However, this technical solution has the following drawbacks: 1. High cost: Introducing LiDAR significantly increases BOM and maintenance costs; it also introduces complexity in cross-sensor spatiotemporal calibration, installation, and maintenance; 2. Real-time pressure: Multi-source data time alignment, point cloud-image registration, feature calculation, and Bayesian fusion place a heavy load on automotive-grade embedded platforms, making it difficult to balance high frame rate and low latency; 3. Environmental adaptability is not universal: LiDAR point cloud quality degrades in distant sparse / rainy / snowy scenes; monocular vision is prone to degradation in strong light / nighttime; if the fusion strategy does not adaptively adjust weights, robustness issues will still arise. 4. Accuracy Boundaries: Superpixel segmentation is sensitive to texture and lighting; when the transition between roads and curbs / shoulders is blurred, or when there are unpaved roads, the accuracy of classification and region boundaries may be insufficient.

[0064] In summary, the drivable area detection technology involved in the relevant technologies faces a difficult technical challenge in balancing accuracy, cost, robustness, and real-time performance.

[0065] This invention proposes a drivable area detection method. By using RepVGG as the backbone network and combining Depthwise Separable ASPPHead and U-Net structures, it effectively integrates shallow and deep features, improving detection accuracy and robustness. Employing remapping, single-camera field-of-view filtering, and radial detection filtering strategies, it can efficiently screen points under monocular conditions, reducing costs and enhancing environmental adaptability. Simultaneously, it utilizes KD-Tree for efficient spatial matching, achieving stable output and meeting real-time requirements, thus solving the technical challenge of balancing accuracy, cost, robustness, and real-time performance in existing technologies.

[0066] As shown in Figure 1, the overall process of a drivable area detection method provided in this embodiment of the invention is as follows: acquiring fisheye images captured by multiple fisheye cameras; predicting a mask image using a perception model with RepVGG and DepthwiseSeparableASPPHead structures; performing viewpoint transformation and stitching on the mask image to obtain a BEV image; and calculating the boundary points around the vehicle based on the BEV image. The white dots represent the boundary points, corresponding to the contact points of the nearest obstacle during driving.

[0067] Figure 2 is a flowchart illustrating a drivable area detection method provided in an embodiment of the present invention. As shown in Figure 2, the method includes:

[0068] Step S1: Acquire fisheye images captured by multiple fisheye cameras on the target vehicle;

[0069] Step S2: Input the fisheye images into the perception model, and the perception model outputs a semantic segmentation map (mask map) corresponding to each fisheye image. The pixel values of the semantic segmentation map at least represent drivable or non-drivable areas.

[0070] Step S3: Perform perspective transformation and stitching on the semantic segmentation map (mask map) to obtain a global bird's-eye view (BEV map).

[0071] Step S4: Obtain the drivable area result by ray casting the global bird's-eye view (BEV). The drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.

[0072] The perception model of this invention employs a deep learning algorithm, combining a Depthwise Separable ASPPHead and a U-Net structure, to fully utilize the contextual information of the image, enabling more accurate identification of drivable areas and road markings. Post-processing further removes outliers through multi-step processing and filtering, improving the accuracy of boundary points.

[0073] In some embodiments, the perception model includes: a backbone network, a decoder head, and an auxiliary head.

[0074] The backbone network uses RepVGG as the feature extraction network, which can improve inference speed while maintaining accuracy. RepVGG has a simple and efficient structure, making it easy to deploy in an in-vehicle environment;

[0075] The decoder head employs a depthwise separable ASPP head structure. Its design combines atrous spatial pyramid pooling (ASPP) with global average pooling and multi-scale convolution to effectively capture multi-scale contextual information. Specifically, the stage 4 output of the backbone first undergoes dimensionality reduction; multi-scale features are then extracted by the ASPP module, upsampled, and fused with the stage 3 output. The result is subsequently fused in turn with the outputs of stage 2 and stage 1 to gradually restore spatial resolution, ultimately generating a prediction result with dimensions (N, num_classes, 384, 640). This structure allows the model to fully integrate shallow, high-resolution features with deep, low-resolution features, improving segmentation accuracy and robustness.

[0076] The auxiliary head is used to participate in the training process and assist in optimization, but does not participate in actual forward inference. Its structure includes: Detail Guidance and FPN Head; Detail Guidance: uses the features of stage 2 to correct the low-level details through detail loss, guiding the model to learn more accurate edge information; FPN Head: fuses the output features of each stage of the backbone (i.e. (N,32,96,160), (N,64,48,80), (N,96,24,40), (N,128,12,20)), providing additional supervision during the training phase and improving the overall feature representation ability;

[0077] The pixel values of the semantic segmentation map (mask map) include a drivable area detection category and a road sign category, wherein the drivable area detection category is used to identify whether the pixel value represents a drivable area or a non-drivable area.

[0078] The perception model of this invention is optimized for the data features of the FSD (Free Space Detection) task: Auxiliary Heads are introduced to improve edge accuracy, recognizing the importance of edge information for detection; a multi-layer feature fusion mechanism is designed to retain and integrate information from stage 1 to stage 4, addressing the significant shape differences among FSD data categories; the model simultaneously covers both FSD and road sign categories, and compared to splitting them into two independent models, this solution significantly reduces resource consumption while maintaining accuracy, making it suitable for vehicle deployment.
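
The stage output shapes listed for the FPN Head can be related to the 384 x 640 prediction size with a short calculation. The sketch below assumes that the network input is also 384 x 640 and that stages 1 to 4 have strides 4, 8, 16 and 32; these strides are inferred from the listed shapes rather than stated in the text, and the function names are hypothetical.

```python
def stage_shapes(height, width, channels=(32, 64, 96, 128), strides=(4, 8, 16, 32)):
    """(channels, height, width) of each backbone stage for a given input size.

    The strides are an assumption inferred from the listed feature shapes.
    """
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]


def top_down_plan(shapes):
    """Upsampling factor needed before fusing each deeper stage
    with the next shallower one (stage 4 -> 3 -> 2 -> 1)."""
    plan = []
    for deep in range(len(shapes) - 1, 0, -1):
        factor = shapes[deep - 1][1] // shapes[deep][1]
        plan.append((f"stage {deep + 1}", f"stage {deep}", factor))
    return plan


shapes = stage_shapes(384, 640)
plan = top_down_plan(shapes)
for src, dst, factor in plan:
    print(f"upsample {src} by {factor}x, then fuse with {dst}")
```

Under these assumptions stage 1 carries the highest spatial resolution (96 x 160) and stage 4 the lowest (12 x 20), so each fusion step in the top-down path requires a 2x upsampling.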

[0079] In some embodiments, step S3, which involves performing perspective transformation and stitching on the semantic segmentation map (Mask map) to obtain a global bird's-eye view (BEV map), includes:

[0080] Step S31: Remap the semantic segmentation map (mask map) corresponding to each fisheye image using a linear interpolation method, and transform the semantic segmentation map to two different bird's-eye view layers through a pre-calculated mapping table;

[0081] Step S32: Fuse different regions of interest (ROIs) to obtain a global bird's-eye view (BEV), so that the global bird's-eye view includes a complete bird's-eye view mask.

[0082] First, the OpenCV `remap` function is used to transform the input mask image into two different bird's-eye view layers using a pre-computed mapping table. Linear interpolation is used in this process to ensure smooth remapping. To improve output quality and reduce jagged edges, the function applies Gaussian filtering to both remapped layers, using a 3x3 kernel size and a standard deviation of 0.5 to smooth the image while maintaining edge sharpness. Then, the function processes different regions of interest according to a specific fusion strategy. As shown in Figure 3, there are nine regions of interest (ROIs). For ROIs numbered one, three, four, and six, the remapping result from the first layer is used directly. The vehicle's own ROI is filled with zero values, indicating that the space occupied by the vehicle is impassable. The entire process achieves precise region control through pixel-by-pixel operations, ultimately generating a complete bird's-eye view mask. This mask accurately reflects the distribution of free space in the real-world coordinate system, providing crucial spatial information for subsequent path planning and obstacle avoidance.

[0083] In this step, the cv::remap function and two sets of mapping tables are used:

[0084] bev_map[0][0],bev_map[0][1],bev_map[1][0],bev_map[1][1],

[0085] bev_map[0][0],bev_map[0][1],bev_map[1][0],bev_map[1][1];

[0086] The input mask is remapped to generate intermediate results layer_0 and layer_1 from two different perspectives.

[0087] Direct copy: for roi_1, roi_3, roi_4, and roi_6, the corresponding pixel values from layer_0 are copied directly to bev_mask. Region merging: for roi_0, roi_2, roi_5, and roi_7, layer_0 and layer_1 are merged using a pixel-by-pixel rule.

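[0088] $$\text{bev\_mask}(x, y) = \begin{cases} \text{layer\_0}(x, y), & \text{if } \text{layer\_0}(x, y) = \text{layer\_1}(x, y) \\ 0, & \text{otherwise} \end{cases}$$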

[0089] That is, if two mapping layers have the same label at a certain pixel, the label is retained; if they are different, the label is set to invalid label 0.

[0090] Vehicle self-region masking: For the vehicle's rectangular region ego_rect, force it to be zero:

[0091] $$\text{bev\_mask}(x, y) = 0, \quad \forall (x, y) \in \text{ego\_rect}$$
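
The three fusion rules (direct copy, pixel-by-pixel merging, and masking of the vehicle's own region) can be sketched as follows. This is a minimal illustration on nested lists, not the implementation of the invention: the function name `fuse_bev_layers`, the rectangle format (x0, y0, x1, y1) with exclusive upper bounds, and the toy ROI layout are assumptions.

```python
def fuse_bev_layers(layer_0, layer_1, direct_rois, merged_rois, ego_rect):
    """Fuse two remapped BEV layers into one mask.

    Rectangles are (x0, y0, x1, y1) with exclusive upper bounds (assumed format).
    Pixels outside every ROI keep the invalid label 0.
    """
    height, width = len(layer_0), len(layer_0[0])
    bev_mask = [[0] * width for _ in range(height)]

    # Direct copy: take the label from layer_0 unchanged.
    for x0, y0, x1, y1 in direct_rois:
        for y in range(y0, y1):
            for x in range(x0, x1):
                bev_mask[y][x] = layer_0[y][x]

    # Region merging: keep the label only where both layers agree.
    for x0, y0, x1, y1 in merged_rois:
        for y in range(y0, y1):
            for x in range(x0, x1):
                a, b = layer_0[y][x], layer_1[y][x]
                bev_mask[y][x] = a if a == b else 0

    # Vehicle self-region: forced to zero (impassable).
    x0, y0, x1, y1 = ego_rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            bev_mask[y][x] = 0
    return bev_mask


# Toy 2 x 3 example: column 0 is copied, column 2 is merged, column 1 is the vehicle.
layer_0 = [[1, 1, 2], [1, 2, 2]]
layer_1 = [[2, 1, 2], [1, 1, 1]]
bev_mask = fuse_bev_layers(layer_0, layer_1,
                           direct_rois=[(0, 0, 1, 2)],
                           merged_rois=[(2, 0, 3, 2)],
                           ego_rect=(1, 0, 2, 2))
print(bev_mask)
```

In the toy example the lower-right pixel becomes 0 because the two layers disagree there, while the upper-right pixel keeps label 2 because both layers agree.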

[0092] In some embodiments, the method further includes configuring intrinsic parameter matrices, distortion coefficients, rotation matrices, transformation matrices, and homography matrices for each camera, and generating corresponding mapping tables.
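
A mapping table of the kind consumed by a remap operation stores, for every bird's-eye-view pixel, the source-image coordinates to sample. The sketch below builds such a table from a single 3x3 homography and applies it with nearest-neighbour lookup. It is a simplified illustration under stated assumptions: fisheye distortion, intrinsic parameters and rotation are omitted, nearest-neighbour sampling replaces the linear interpolation described above, and the function names and the example matrix are hypothetical.

```python
def build_mapping_table(h, bev_width, bev_height):
    """Return (map_x, map_y): source coordinates for each BEV pixel.

    h is a 3 x 3 homography from BEV pixel (u, v) to source pixel (x, y).
    Lens distortion is deliberately ignored in this sketch.
    """
    map_x = [[-1.0] * bev_width for _ in range(bev_height)]
    map_y = [[-1.0] * bev_width for _ in range(bev_height)]
    for v in range(bev_height):
        for u in range(bev_width):
            w = h[2][0] * u + h[2][1] * v + h[2][2]
            if w == 0:
                continue  # point at infinity: leave the out-of-range marker
            map_x[v][u] = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
            map_y[v][u] = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
    return map_x, map_y


def remap_nearest(src, map_x, map_y, fill=0):
    """Sample src at the table coordinates (nearest neighbour, for brevity)."""
    src_h, src_w = len(src), len(src[0])
    out = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for x, y in zip(row_x, row_y):
            xi, yi = int(round(x)), int(round(y))
            inside = 0 <= xi < src_w and 0 <= yi < src_h
            out_row.append(src[yi][xi] if inside else fill)
        out.append(out_row)
    return out


# Hypothetical homography: scale by 2 and shift by 1 pixel in x.
H = [[2, 0, 1], [0, 2, 0], [0, 0, 1]]
map_x, map_y = build_mapping_table(H, bev_width=2, bev_height=2)
src = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
bev = remap_nearest(src, map_x, map_y)
print(bev)
```

Because the table depends only on the calibration, it can be computed once per camera and reused for every frame, which is the reason for pre-computing it.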

[0093] In some embodiments, step S4, obtaining a drivable area result from the global bird's-eye view (BEV) by ray casting, wherein the drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of boundary points, includes the following steps:

[0094] Traverse the predefined set of radial detection lines, and check the pixel values in the global bird's-eye view mask point by point from the vehicle center outward for each detection line. Record the detected boundary point information, including edge identifier, type identifier, and camera identifier. Calculate the position coordinates of the boundary points in the vehicle coordinate system. Convert the bird's-eye view coordinates back to the original image coordinates through a pre-calculated mapping table. At the same time, calculate the distance from the boundary points to the vehicle center. Obtain the drivable area result based on all boundary point information.

[0095] The process involves using a radial detection filtering function (radialDet) to traverse a predefined set of radial detection lines (e.g., 360 radial detection lines around the vehicle, one for each angle). For each detection line, the function checks the pixel values in the bird's-eye view mask point by point, starting from the vehicle center and moving outwards. When a valid free-space label is encountered, the judgeCam function determines which camera's field of view the point belongs to, and records the detected boundary point information, including edge markers, type markers, and camera markers, forming a set of 360 points. If no valid free-space boundary is found for the entire detection line, the boundary point is set as the last point of the detection line and marked as the road type. For each valid detection line, the function calculates the boundary point's position coordinates in the vehicle coordinate system, converts the bird's-eye view coordinates back to the original image coordinates using a pre-computed mapping table, and calculates the distance from the boundary point to the vehicle center. Finally, all detected boundary point information is organized into a VisionResultSingleFS structure, which contains complete spatial information such as vehicle coordinate system coordinates, image coordinate system coordinates, distance information, and camera identifiers. Therefore, this structure contains the farthest drivable distance detected at each angle around the vehicle in 360 degrees, and the coordinates of the boundary points, providing accurate boundary detection results for subsequent path planning and obstacle avoidance.
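
The radial traversal can be sketched in a few lines. This is a simplified illustration and not the radialDet function itself: the label values `ROAD_LABEL` and `MAX_FS_LABEL`, the function name `radial_detect`, and the dictionary output are assumptions, distances are in BEV pixels, and the camera lookup (judgeCam) and the conversion to vehicle and image coordinates are omitted.

```python
import math

ROAD_LABEL = 1     # assumed label value for drivable road
MAX_FS_LABEL = 5   # assumed largest boundary-class label


def radial_detect(bev_mask, center, num_rays=360):
    """Walk outward from the vehicle center along each ray and record the
    first pixel whose label lies in (ROAD_LABEL, MAX_FS_LABEL]."""
    height, width = len(bev_mask), len(bev_mask[0])
    cx, cy = center
    results = []
    for i in range(num_rays):
        angle = 2.0 * math.pi * i / num_rays
        dx, dy = math.cos(angle), math.sin(angle)
        last = (cx, cy, ROAD_LABEL)   # fallback: last in-bounds point of the ray
        hit = None
        for r in range(1, max(height, width) + 1):
            x = int(round(cx + r * dx))
            y = int(round(cy + r * dy))
            if not (0 <= x < width and 0 <= y < height):
                break
            label = bev_mask[y][x]
            if ROAD_LABEL < label <= MAX_FS_LABEL:
                hit = (x, y, label)
                break
            last = (x, y, ROAD_LABEL)
        x, y, label = hit if hit is not None else last
        results.append({"angle_deg": 360.0 * i / num_rays, "x": x, "y": y,
                        "label": label, "distance": math.hypot(x - cx, y - cy)})
    return results


# Toy 11 x 11 mask: all road, with a wall of label 3 in column 8.
mask = [[ROAD_LABEL] * 11 for _ in range(11)]
for row in mask:
    row[8] = 3
results = radial_detect(mask, center=(5, 5), num_rays=4)
for item in results:
    print(item)
```

With four rays, only the ray pointing toward the wall reports a boundary label; the other three fall back to the last in-bounds point and are marked as road, mirroring the fallback rule described above.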

[0096] In some embodiments, the method further includes: step S5, constructing a KD-Tree index structure to perform time-series tracking and filtering of drivable area boundary points.

[0097] This invention combines traditional boundary point tracking methods with modern data structure algorithms. It achieves efficient nearest neighbor search by constructing a KD-Tree index structure, and introduces a multi-dimensional coordinate transformation mechanism to ensure continuous tracking of boundary points during vehicle movement.

[0098] As shown in Figure 4, the present invention employs a dual stability assessment mechanism, which assesses the reliability of each boundary point through stability counting and lifecycle management. The stability count records the number of times the boundary point has been continuously tracked, while lifecycle management sets a maximum survival time for each point. When a point is successfully matched, its stability count is increased; when matching fails, the relevant parameters are reset.
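
As a rough illustration of the stability count and lifecycle bookkeeping, the per-point state could look like the sketch below. The class name `TrackedPoint`, the constants `MAX_AGE` and `STABLE_THRESHOLD`, and their values are hypothetical; the source does not specify them, nor exactly which parameters are reset on a failed match.

```python
from dataclasses import dataclass

MAX_AGE = 5           # assumed: frames a point may stay unmatched before deletion
STABLE_THRESHOLD = 3  # assumed: consecutive matches needed before a point is output


@dataclass
class TrackedPoint:
    x: float
    y: float
    stability: int = 0  # number of consecutive successful matches
    age: int = 0        # frames elapsed since the last successful match

    def on_match(self, x, y):
        """A successful match updates the position and raises stability."""
        self.x, self.y = x, y
        self.stability += 1
        self.age = 0

    def on_miss(self):
        """A failed match resets stability and lets the point age."""
        self.stability = 0
        self.age += 1

    @property
    def stable(self):
        return self.stability >= STABLE_THRESHOLD

    @property
    def expired(self):
        return self.age > MAX_AGE
```

With this layout, a tracker would output only points where `stable` is true and drop points where `expired` is true at the end of each frame.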

[0099] Furthermore, this invention implements a point set management strategy that includes dynamic addition of new points, automatic deletion of timed-out points, and point filtering based on stability thresholds, and that prevents duplicate matching through a conflict detection mechanism. Technically, this invention first performs a coordinate system transformation between the boundary points of the current frame and the points in historical frames, converting among the global coordinate system, local coordinate system, and image coordinate system to ensure tracking continuity. It then performs fast nearest neighbor search using a KD-Tree, which reduces the cost of each query from a linear scan over n historical points to O(log n) on average, so that matching n points costs roughly O(n log n) instead of the O(n²) of brute-force pairwise comparison. Another feature of this invention is the optimization of the dynamic search range: setting the search range to one-third of the historical point set keeps the search comprehensive while avoiding unnecessary computational overhead.
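
The KD-Tree matching step can be illustrated with a small pure-Python sketch. It assumes both point sets have already been transformed into a common coordinate frame; the function names, the `max_dist` gate, and the greedy "closest pair claims first" rule standing in for conflict detection are assumptions made for the example, and the one-third search range is not modelled.

```python
import math


def build_kdtree(points, depth=0):
    """Build a 2-D KD-tree from (x, y, index) tuples; returns nested tuples or None."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point[:2], query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # The far subtree can only help if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best


def match_frame(history, current, max_dist):
    """Map current-point indices to history-point indices, one claim per history point."""
    tree = build_kdtree([(x, y, i) for i, (x, y) in enumerate(history)])
    candidates = []
    for j, q in enumerate(current):
        hit = nearest(tree, q)
        if hit is not None and hit[0] <= max_dist:
            candidates.append((hit[0], j, hit[1][2]))
    matches, used = {}, set()
    for _, j, i in sorted(candidates):  # closest pairs claim their match first
        if i not in used:
            matches[j] = i
            used.add(i)
    return matches
```

Current points left out of the returned mapping would be treated as new points, and history points that nobody claimed would register a failed match for that frame.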

[0100] An example of coordinate transformation in this invention is as follows:

[0101] Image coordinates to vehicle coordinates:

[0102]

[0103] Vehicle coordinates to image coordinates:

[0104]

[0105] In some embodiments, the method further includes step S3a, extracting boundary points from the semantic segmentation map (mask map) corresponding to each fisheye image along the projection line of the corresponding camera.

[0106] The single-camera view filtering function (singleCamDet) takes a bird's-eye view mask, a set of camera projection lines, and the output image as input parameters. It detects free space boundaries by traversing all points along each projection line. For each projection line, the function checks each point along the projection direction starting from the camera position. When a point with a pixel value greater than a road marker and less than or equal to the maximum free space marker is encountered, a circular marker with a radius of pixels is drawn on the output image centered on that point's coordinates, using the detected marker value as the fill color. The process then moves to the next projection line. This approach effectively projects the free space detection results from a single camera onto a unified bird's-eye view coordinate system. By quickly locating the free space boundaries through a search strategy along the projection lines, it provides accurate spatial boundary information for multi-camera fusion and final path planning.
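
A minimal sketch of the projection-line search is given below. It is not the singleCamDet function itself: the label constants, the `RADIUS_PX` value (the source text does not state the radius), the list-of-lists image representation, and the assumption that every projection-line point lies inside the mask are all choices made for illustration.

```python
ROAD_LABEL = 0    # assumed label value for road surface
MAX_FS_LABEL = 3  # assumed largest valid free-space label
RADIUS_PX = 2     # assumed marker radius in pixels


def draw_disc(image, cx, cy, radius, value):
    """Fill a disc of `radius` pixels centred on (cx, cy), clipped to the image."""
    h, w = len(image), len(image[0])
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                image[y][x] = value


def single_cam_detect(mask, projection_lines, out, radius=RADIUS_PX):
    """Mark the first boundary found on each projection line.

    `projection_lines` is a list of lines; each line is a list of (col, row)
    points ordered outward from the camera position.
    """
    hits = []
    for line in projection_lines:
        for col, row in line:
            value = mask[row][col]
            if ROAD_LABEL < value <= MAX_FS_LABEL:
                draw_disc(out, col, row, radius, value)  # label value is the fill colour
                hits.append((col, row, value))
                break  # continue with the next projection line
    return hits
```

Stopping at the first qualifying point on each line is what keeps the search cheap: pixels beyond the boundary are never inspected.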

[0107] The multi-step processing and filtering algorithm of this invention includes steps such as remapping, single-camera field-of-view filtering, and radial detection filtering strategy. These processes can effectively remove outliers and improve the accuracy of boundary points, which is the key to ensuring the reliability of detection results.

[0108] Based on the same inventive concept, embodiments of the present invention also provide a drivable area detection system, configured to implement any of the methods described in the above embodiments, the system comprising:

[0109] The acquisition module is used to acquire fisheye images captured by multiple fisheye cameras on the target vehicle;

[0110] The perception model prediction module is used to input the fisheye images into the perception model, and the perception model outputs a semantic segmentation map corresponding to each fisheye image, wherein the pixel values of the semantic segmentation map at least represent drivable or non-drivable areas;

[0111] The post-processing module is used to perform perspective transformation and stitching on the semantic segmentation map to obtain a global bird's-eye view;

[0112] The boundary point detection module is used to obtain the drivable area result from the global bird's-eye view by ray casting. The drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.
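
How the four modules chain together can be pictured with the following sketch. The class name `DrivableAreaPipeline` and its field names are hypothetical; each stage is passed in as a plain callable so that the sketch makes no claim about how any individual module is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DrivableAreaPipeline:
    acquire: Callable[[], List[Any]]        # acquisition module: one image per camera
    segment: Callable[[Any], Any]           # perception model: image -> segmentation map
    to_bev: Callable[[List[Any]], Any]      # post-processing: maps -> global bird's-eye view
    detect_boundary: Callable[[Any], Any]   # boundary point detection on the bird's-eye view

    def run(self):
        images = self.acquire()
        masks = [self.segment(image) for image in images]
        bev = self.to_bev(masks)
        return self.detect_boundary(bev)
```

Keeping the stages as independent callables reflects the module split in the text: each one can be replaced or tested on its own.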

[0113] Based on the same inventive concept, embodiments of the present invention also provide an electronic device. Figure 5 is a structural block diagram of an electronic device provided in an embodiment of the present invention. As shown in Figure 5, an embodiment of the present invention provides an electronic device including: one or more processors 101, a memory 102, and one or more I/O interfaces 103. The memory 102 stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the methods described in the above embodiments; the one or more I/O interfaces 103 are connected between the processor and the memory and are configured to enable information interaction between the processor and the memory.

[0114] The processor 101 is a device with data processing capabilities, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 103 is connected between the processor 101 and the memory 102 and enables information interaction between the processor 101 and the memory 102, including but not limited to a data bus (Bus).

[0115] In some embodiments, the processor 101, memory 102, and I/O interface 103 are interconnected via bus 104, and thus connected to other components of the computing device.

[0116] In some embodiments, the one or more processors 101 include a field-programmable gate array.

[0117] Based on the same inventive concept, embodiments of the present invention also provide a computer-readable medium. This computer-readable medium stores a computer program, wherein, when executed by a processor, the program implements the steps of any of the methods described in the above embodiments. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.

[0118] Those skilled in the art will understand that all or some of the steps, systems, and apparatuses disclosed above, and their functional modules/units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software can be distributed on a computer-readable storage medium, which may include computer storage media (or non-transitory media) and communication media (or transient media).

[0119] As is known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, it is known to those skilled in the art that communication media typically contain computer-readable program instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

[0120] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing/processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and/or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing/processing device.

[0121] The computer program instructions used to perform the operations of this invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing state information from the computer-readable program instructions. This electronic circuitry can execute the computer-readable program instructions to implement various aspects of the invention.

[0122] The computer program product described herein can be implemented specifically through hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is specifically embodied in a computer storage medium; in another alternative embodiment, the computer program product is specifically embodied in a software product, such as a software development kit (SDK), etc.

[0123] Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

[0124] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions/actions specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and/or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions/actions specified in one or more blocks of the flowchart and/or block diagram.

[0125] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to perform the functions/actions specified in one or more blocks of a flowchart and/or block diagram.

[0126] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0127] Example embodiments have been disclosed herein, and while specific terminology has been used, it is for illustrative purposes only and should be construed as such, and is not intended to be limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and/or elements described in conjunction with particular embodiments may be used alone, or in combination with features, characteristics, and/or elements described in conjunction with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention as set forth in the appended claims.

Claims

1. A drivable area detection method, characterized by comprising: acquiring fisheye images captured by multiple fisheye cameras on a target vehicle; inputting the fisheye images into a perception model, the perception model outputting a semantic segmentation map corresponding to each fisheye image, wherein the pixel values of the semantic segmentation map at least represent drivable or non-drivable areas; performing viewpoint transformation and stitching on the semantic segmentation maps to obtain a global bird's-eye view; and obtaining a drivable area result from the global bird's-eye view by ray casting, wherein the drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.

2. The method according to claim 1, characterized in that the perception model includes a backbone network, a decoder head, and an auxiliary head; the backbone network uses RepVGG as the feature extraction network; the decoder head adopts a depthwise separable atrous spatial pyramid pooling (ASPP) structure; the auxiliary head participates in the training process; and the pixel values of the semantic segmentation map include a drivable area detection category and a road marking category, wherein the drivable area detection category identifies whether the pixel value represents a drivable area or a non-drivable area.

3. The method according to claim 1 or 2, characterized in that the step of performing viewpoint transformation and stitching on the semantic segmentation maps to obtain a global bird's-eye view includes: remapping the semantic segmentation map corresponding to each fisheye image using a linear interpolation method, and transforming the semantic segmentation maps into two different bird's-eye view layers through a pre-computed mapping table; and merging the different regions of interest to obtain the global bird's-eye view.

4. The method according to claim 3, characterized in that the method further includes: configuring the intrinsic parameter matrix, distortion coefficients, rotation matrix, transformation matrix, and homography matrix for each camera, and generating the corresponding mapping table.

5. The method according to claim 3, characterized in that the step of obtaining the drivable area result from the global bird's-eye view by ray casting, wherein the drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points, includes: traversing the predefined set of radial detection lines and, for each detection line, checking the pixel values in the global bird's-eye view mask point by point from the vehicle center outward; recording the detected boundary point information, including the edge identifier, type identifier, and camera identifier; calculating the position coordinates of the boundary points in the vehicle coordinate system; converting the bird's-eye view coordinates back to the original image coordinates through a pre-computed mapping table, and at the same time calculating the distance from each boundary point to the vehicle center; and obtaining the drivable area result based on all boundary point information.

6. The method according to claim 5, characterized in that the method further includes: constructing a KD-Tree index structure to perform time-series tracking and filtering of the drivable area boundary points.

7. The method according to claim 3, characterized in that the method further includes: extracting boundary points from the semantic segmentation map corresponding to each fisheye image along the projection lines of the corresponding camera.

8. A drivable area detection system, characterized in that the system is configured to implement the method as described in any one of claims 1 to 7 and comprises: an acquisition module, used to acquire fisheye images captured by multiple fisheye cameras on a target vehicle; a perception model prediction module, used to input the fisheye images into the perception model, the perception model outputting a semantic segmentation map corresponding to each fisheye image, wherein the pixel values of the semantic segmentation map at least represent drivable or non-drivable areas; a post-processing module, used to perform viewpoint transformation and stitching on the semantic segmentation maps to obtain a global bird's-eye view; and a boundary point detection module, used to obtain the drivable area result from the global bird's-eye view by ray casting, wherein the drivable area result includes the farthest drivable distance detected around the target vehicle and the coordinates of the drivable area boundary points.

9. An electronic device, characterized by comprising: one or more processors; and a memory, used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1 to 7.

10. A computer-readable medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 7.