Method, device, electronic device and storage medium for detecting a drivable area
Images captured by cameras from different perspectives are input into a drivable area detection model. This removes the reliance on LiDAR sensors and the heavy computational load of existing techniques, and enables efficient and accurate drivable area detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NEUSOFT REACH AUTOMOBILE TECH (SHENYANG) CO LTD
- Filing Date
- 2023-09-24
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for detecting drivable areas rely on expensive lidar sensors and involve large computational loads, resulting in poor detection efficiency.
By acquiring multiple images from different perspectives captured by cameras on the target vehicle at the same time, and inputting them into a pre-generated drivable area detection model, target key point information is obtained to determine the drivable area.
This reduces detection cost while improving the efficiency and accuracy of drivable area detection.
[Figure: abstract drawing, CN117274934B]
Abstract
Description
Technical Field
[0001] This application relates to the field of autonomous driving technology, and in particular to a method, apparatus, electronic device, and storage medium for detecting drivable areas.
Background Technology
[0002] Drivable area detection plays a crucial role in both autonomous driving and driver assistance systems. Currently, drivable area detection methods typically acquire point cloud data of the area surrounding the vehicle, classify each point in the point cloud data, and then determine the drivable area within the surrounding area based on the classification result of each point.
[0003] The above method requires expensive LiDAR sensors to be installed on the vehicle to collect the point cloud data. In addition, classifying each point in the point cloud data involves a very large computational workload. As a result, the drivable area around the vehicle is computed inefficiently.
Summary of the Invention
[0004] This application provides a method, apparatus, electronic device, and storage medium for detecting drivable areas, which can solve the problem of poor efficiency in existing drivable area detection methods.
[0005] In a first aspect, a method for detecting drivable areas is provided, including:
[0006] Acquiring multiple images to be processed; the multiple images to be processed include multiple images of the area surrounding the target vehicle, taken at the same time from different perspectives by the camera on the target vehicle;
[0007] Inputting the multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; the target key point information includes the confidence information, offset information and embedding feature information corresponding to the target key points;
[0008] Determining, based on the multiple target key point information, the drivable area corresponding to the target vehicle in the surrounding area.
[0009] In a second aspect, a device for detecting drivable areas is provided, comprising:
[0010] The image acquisition module is used to acquire multiple images to be processed; the multiple images to be processed include multiple images from different perspectives captured by the camera on the target vehicle at the same time for the area surrounding the target vehicle;
[0011] The target key point information acquisition module is used to input multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; the target key point information includes the confidence information, offset information and embedding feature information corresponding to the target key points;
[0012] The drivable area determination module is used to determine the drivable area of the target vehicle in the surrounding area based on multiple target key point information.
[0013] In a third aspect, an electronic device is provided, comprising: a processor and a memory for storing a computer program, the processor being configured to call and run the computer program stored in the memory to perform the method described in the first aspect, the second aspect, or their various implementations.
[0014] In a fourth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to perform the method described in the first aspect, the second aspect, or their various implementations.
[0015] The technical solution provided in this application first acquires multiple images of the target vehicle's surrounding area taken simultaneously from different perspectives by a camera on the target vehicle. Then, these multiple images are input into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the model. Finally, based on this target key point information, the corresponding drivable area of the target vehicle within its surrounding region is determined. This method enables rapid and accurate determination of the target vehicle's drivable area by acquiring only multiple images of the surrounding area from different perspectives, thereby reducing the detection cost of the target vehicle's drivable area while improving the detection efficiency of the target vehicle's drivable area information.
[0016] Other features and advantages of this disclosure will be described in detail in the detailed description that follows.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is an application scenario diagram provided by an embodiment of this application;
[0019] Figure 2 is a flowchart of a method for detecting a drivable area provided by an embodiment of this application;
[0020] Figure 3 is a flowchart of another method for detecting a drivable area provided by an embodiment of this application;
[0021] Figure 4 is a schematic diagram of a drivable area detection device provided by an embodiment of this application;
[0022] Figure 5 is a schematic block diagram of the electronic device provided by an embodiment of this application.
Detailed Implementation
[0023] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0024] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.
[0025] As mentioned above, traditional methods for detecting drivable areas typically involve installing expensive LiDAR sensors on vehicles. After acquiring point cloud data within a designated area of the vehicle, each point in the point cloud data needs to be categorized to determine its category. Then, by combining the categories of each point, the drivable area information within the designated area is determined. In the above method, it is necessary to determine the category of each point in the point cloud data, which involves a large amount of computation and poor detection efficiency.
[0026] With the development of research on BEV (Bird's Eye View) perception, some algorithms have emerged that directly predict the drivable area around a vehicle from a BEV perspective. These BEV-based drivable area detection methods avoid the need to install expensive LiDAR sensors on vehicles, making them a feasible and economical autonomous driving solution. However, most existing BEV-based drivable area detection methods either perform poorly or require a large amount of resources for drivable area inference. Existing BEV-based methods are therefore also inefficient at detecting drivable areas.
[0027] To address the aforementioned technical problems, the inventive concept of this application is as follows: an electronic device can acquire multiple images of the area surrounding the target vehicle, captured at the same time from different perspectives by multiple cameras on the target vehicle; then, these multiple images from different perspectives are input into a drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; and finally, based on the multiple target key point information, the corresponding drivable area of the target vehicle within its surrounding area is determined. This application can determine the drivable area information of a target vehicle by acquiring only multiple images from different perspectives of the area surrounding the target vehicle, thereby reducing the detection cost of the drivable area while improving the detection efficiency of the target vehicle's drivable area.
[0028] It should be understood that the technical solution of this application can be applied to, but is not limited to, the following scenarios:
[0029] In some possible implementations, Figure 1 is an application scenario diagram provided by an embodiment of this application. As shown in Figure 1, the application scenario may include an electronic device 110 and a network device 120. The electronic device 110 can establish a connection with the network device 120 through a wired network or a wireless network.
[0030] For example, electronic device 110 may be a desktop computer, laptop computer, tablet computer, etc., but is not limited thereto. Network device 120 may be a terminal device or a server, but is not limited thereto. In one embodiment of this application, electronic device 110 may send a request message to network device 120, which may be used to request the acquisition of multiple images taken by a camera on the target vehicle at the same time from different perspectives of the area surrounding the target vehicle. Further, electronic device 110 may receive a response message sent by network device 120, which includes multiple images taken by the camera at the same time from different perspectives of the area surrounding the target vehicle.
[0031] In addition, Figure 1 shows one electronic device 110 and one network device 120 as an example, but other numbers of electronic devices and network devices may be included in practice, and this application does not limit this.
[0032] In other possible implementations, the technical solution of this application may also be executed by the aforementioned electronic device 110, or by the aforementioned network device 120, and this application does not impose any restrictions on this.
[0033] After introducing the application scenarios of the embodiments of this application, the technical solution of this application will be described in detail below:
[0034] Figure 2 is a flowchart of a method for detecting a drivable area provided by an embodiment of this application. The method can be performed by, for example, the electronic device 110 shown in Figure 1, but is not limited thereto. As shown in Figure 2, the method may include the following steps:
[0035] S210. Obtain multiple images to be processed.
[0036] Here, multiple images to be processed include multiple images taken at the same time by cameras on the target vehicle from different perspectives of the area surrounding the target vehicle.
[0037] In this embodiment, the camera can be a fisheye camera, and correspondingly, the image can be a fisheye image. There can be four fisheye cameras, which can be respectively positioned at the front, rear, left, and right of the target vehicle body. This allows the fisheye cameras located at the front, rear, left, and right of the vehicle body to simultaneously acquire fisheye images corresponding to the front, rear, left, and right perspectives of the area surrounding the target vehicle. Here, the fisheye image acquired by the fisheye camera is a circular fisheye image with an ultra-large field of view, enabling panoramic image acquisition of the area surrounding the target vehicle.
[0038] It should be noted that the collected fisheye images can include images of the target vehicle in various weather conditions such as cloudy, sunny, snowy, rainy, and foggy weather, as well as images of the target vehicle in multiple scenarios such as underground parking garages, highways, and roads.
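As an illustration of the "same time, different perspectives" requirement in S210, the following minimal sketch groups frames from four cameras by capture time. The `Frame` type, its field names, the view names, and the `tolerance_ms` parameter are all hypothetical; the source does not describe how synchronization is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VIEWS = ("front", "rear", "left", "right")  # assumed view names for the four fisheye cameras


@dataclass(frozen=True)
class Frame:
    view: str          # which camera produced the frame
    timestamp_ms: int  # capture time in milliseconds (hypothetical field)
    image_id: str      # stand-in for the actual fisheye image data


def select_synchronized(frames: List[Frame], tolerance_ms: int = 0) -> Optional[Dict[str, Frame]]:
    """Return one frame per view, all within tolerance_ms of a reference frame.

    The newest complete set is preferred; None is returned if no complete set exists.
    """
    newest_first = sorted(frames, key=lambda f: f.timestamp_ms, reverse=True)
    for reference in newest_first:
        bundle: Dict[str, Frame] = {}
        for frame in newest_first:
            if abs(frame.timestamp_ms - reference.timestamp_ms) <= tolerance_ms:
                bundle.setdefault(frame.view, frame)  # keep the first (newest) match per view
        if all(view in bundle for view in VIEWS):
            return bundle
    return None
```

A caller would pass the resulting bundle, in a fixed view order, to the detection model in S220.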
[0039] S220. Input multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model.
[0040] Here, the target key point information includes the confidence information, offset information, and embedded feature information corresponding to the target key points.
[0041] It should be noted that by analyzing the confidence information, offset information, and embedded feature information corresponding to the target key points, the location and category information of the target key points can be obtained, which facilitates the determination of drivable area information. These target key points can include points on lane lines, points on road edges, and grounding points of obstacles. Therefore, once the confidence information, offset information, and embedded feature information corresponding to the target key points output by the drivable area detection model are obtained, the lane information, obstacle information, and road boundary information within the area surrounding the target vehicle can be determined from the target key points.
[0042] In this step, after the multiple images to be processed are acquired, they can be input into the pre-generated drivable area detection model. The drivable area detection model determines the confidence information, offset information, and embedded feature information corresponding to multiple target key points in the drivable area of the target vehicle, so as to facilitate the determination of the drivable area in step S230.
[0043] S230. Based on multiple target key point information, determine the corresponding drivable area of the target vehicle in the surrounding area.
[0044] In this step, the drivable area corresponding to the target vehicle in the surrounding area can be determined by the following method: First, determine the lane information, obstacle information, and road boundary information in the surrounding area based on multiple target key point information; then, determine the shape information, location information, and boundary information of the drivable area based on multiple lane information, obstacle information, road boundary information, and the location information of the target key points; finally, use the shape information, location information, and boundary information of the drivable area as the drivable area information to determine the drivable area corresponding to the target vehicle in the surrounding area.
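The source does not specify how the key points are combined into the shape and boundary of the drivable area. One possible reading, offered only as a sketch, is to group the key points by category and then bound the free space in each angular sector around the vehicle by the nearest obstacle or road-edge point. The category labels, the vehicle-centred coordinate frame, `sectors`, and `max_range` are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, str]      # (x, y, category) in a vehicle-centred frame (assumed)
BLOCKING = {"obstacle", "road_edge"}  # assumed labels; lane-line points do not block travel


def group_by_category(points: List[Point]) -> Dict[str, List[Tuple[float, float]]]:
    """Collect key point locations under their category label."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for x, y, category in points:
        groups.setdefault(category, []).append((x, y))
    return groups


def free_space_boundary(points: List[Point], sectors: int = 8, max_range: float = 20.0) -> List[float]:
    """For each angular sector, return the distance to the nearest blocking key point.

    Sectors with no blocking point are capped at max_range.
    """
    radii = [max_range] * sectors
    width = 2 * math.pi / sectors
    for x, y, category in points:
        if category not in BLOCKING:
            continue
        angle = math.atan2(y, x) % (2 * math.pi)
        index = min(int(angle / width), sectors - 1)  # guard against rounding at 2*pi
        radii[index] = min(radii[index], math.hypot(x, y))
    return radii
```

The list of radii is one simple representation of the boundary information; a polygon or an occupancy grid would serve equally well.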
[0045] Furthermore, the method for detecting drivable areas provided in this embodiment also includes: performing driving control processing on the vehicle based on drivable area information in the surrounding area. For example, the driving direction and steering angle of the target vehicle can be adjusted based on preset drivable area information in the surrounding area; and then driving control processing on the target vehicle can be performed based on the driving direction and steering angle.
[0046] Using the above method, multiple images of the area around the vehicle body acquired at the same time from the front, rear, left, and right views (or from any number of views) can be input into a pre-generated drivable area detection model to obtain multiple target key point information. Based on the multiple target key point information, the drivable area information of the target vehicle in the surrounding area can be accurately determined. The drivable area of the target vehicle can thus be determined solely from multiple images of its surrounding area taken from different views, which reduces the detection cost of the target vehicle's drivable area and improves the detection efficiency.
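The three steps S210 to S230 can be summarized as a small pipeline. This is a structural sketch only: the function name, the type aliases, and the idea of passing each step in as a callable are illustrative choices, not part of the source.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # stand-in for one camera image
KeyPointInfo = Dict[str, object]   # hypothetical: confidence, offset, embedding
Polygon = List[Tuple[float, float]]


def detect_drivable_area(
    acquire_images: Callable[[], List[Image]],
    model: Callable[[List[Image]], List[KeyPointInfo]],
    determine_area: Callable[[List[KeyPointInfo]], Polygon],
) -> Polygon:
    """Run the three steps in order and return the drivable area."""
    images = acquire_images()            # S210: images taken at the same time, different views
    if len(images) < 2:
        raise ValueError("expected multiple images from different perspectives")
    key_points = model(images)           # S220: pre-generated detection model
    return determine_area(key_points)    # S230: key points -> drivable area
```

Keeping the three steps behind separate callables mirrors the module split of the device described later (image acquisition, key point information acquisition, drivable area determination).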
[0047] In some possible implementations, the drivable area detection model is trained using multiple sample sets, which include multiple sample images from different perspectives collected around the target vehicle; correspondingly, the drivable area detection model provided in this application embodiment can be pre-generated in the following ways:
[0048] S310. Use multiple sample images from multiple sample sets to iteratively train the drivable area detection model.
[0049] In this step, the type of the drivable area detection model can be determined based on the category information corresponding to the drivable area to be trained, and multiple sample images corresponding to the category information can be obtained. For example, if the type information of the drivable area is lane line information, the type of the drivable area detection model can be determined based on the lane line information, and multiple sample images corresponding to the lane line information can be obtained.
[0050] S320. Adjust the parameters of the drivable area detection model based on the loss function to obtain the pre-generated drivable area detection model.
[0051] In this step, the parameters of the drivable area detection model are adjusted based on the loss function until the loss value of the trained drivable area detection model meets the preset conditions, so that the trained drivable area detection model can be used as the pre-generated drivable area detection model in step S220.
[0052] By adopting the above implementation method, the drivable area detection model is iteratively trained using multiple sample images from multiple sample sets, and then the parameters of the drivable area detection model are adjusted based on the loss function to obtain a pre-generated drivable area detection model, so as to make the target key point information obtained by the drivable area detection model trained by this embodiment more accurate.
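The "train iteratively, adjust parameters by the loss, stop when the loss meets a preset condition" pattern of S310 and S320 can be shown on a deliberately tiny stand-in model. The single weight `w`, the squared-error loss, the learning rate, and the threshold below are assumptions chosen to keep the sketch runnable; the actual model and loss function are not specified in the source.

```python
from typing import List, Tuple


def train_until_converged(
    samples: List[Tuple[float, float]],
    learning_rate: float = 0.05,
    loss_threshold: float = 1e-6,  # the "preset condition" (assumed form)
    max_iters: int = 10_000,
) -> Tuple[float, float, int]:
    """Fit y ~ w * x by gradient descent; return (w, final loss, loss evaluations used)."""
    w = 0.0
    n = len(samples)
    for step in range(1, max_iters + 1):
        loss = sum((w * x - y) ** 2 for x, y in samples) / n
        if loss <= loss_threshold:  # preset condition met: stop training
            return w, loss, step
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / n
        w -= learning_rate * gradient  # adjust the parameter based on the loss
    final_loss = sum((w * x - y) ** 2 for x, y in samples) / n
    return w, final_loss, max_iters
```

The `max_iters` cap is a practical safeguard so that training terminates even if the preset condition is never reached.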
[0053] Furthermore, adjusting the parameters of the drivable area detection model based on the loss function to obtain the pre-generated drivable area detection model can include the following steps:
[0054] S410. Acquire multi-frame point cloud data collected by the vehicle-body radar to obtain real key point information.
[0055] In this step, by analyzing the multi-frame point cloud data collected by the vehicle-body radar, the real key point information corresponding to lane lines, obstacles, and roadside curbs within the drivable area can be obtained. This real key point information includes the real location information and real category information of the real key points.
[0056] S420. Based on the target key point information and real key point information output by the drivable area detection model, calculate the loss value of the drivable area detection model based on the loss function.
[0057] Here, based on the target key point information output by the drivable area detection model, the location and type information of the target key points can be determined. Then, based on the location and type information of the target key points determined above and the real location and real category information in the real key point information, the loss value of the drivable area detection model is calculated based on the loss function. This allows the difference between the target key point information and the real key point information to be determined through the calculated loss value, so that the parameters of the drivable area detection model can be adjusted according to the loss value.
[0058] S430. Adjust the parameters of the drivable area detection model based on the loss value until the loss value of the adjusted drivable area detection model meets the preset conditions, so as to use the adjusted drivable area detection model as the pre-generated drivable area detection model.
[0059] In this step, the parameters of the drivable area detection model can be adjusted according to the loss value until the drivable area detection model converges, that is, the detection results obtained by the trained drivable area detection model are within a preset range, so as to make the detection results obtained by the trained drivable area detection model in this embodiment more accurate.
[0060] Using the above implementation method, firstly, multi-frame point cloud data collected by the vehicle's radar is used to obtain real key point information; then, based on the target key point information and real key point information output by the drivable area detection model, the loss value of the drivable area detection model is calculated based on the loss function; finally, the parameters of the drivable area detection model are adjusted based on the loss value until the loss value of the adjusted drivable area detection model meets the preset conditions, so as to obtain a more accurate drivable area detection model.
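The source does not name the loss function used in S420. As a sketch under that caveat, a common choice is to add a position term (L1 distance between predicted and real locations) to a category term (cross-entropy against the real category). The one-to-one matching of predicted and real key points by index, and the `position_weight` parameter, are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_loss(
    pred_xy: Sequence[Point],
    pred_class_probs: Sequence[Sequence[float]],
    true_xy: Sequence[Point],
    true_class: Sequence[int],
    position_weight: float = 1.0,
) -> float:
    """Mean L1 position error plus mean cross-entropy on the category (assumed loss form)."""
    n = len(true_xy)
    if n == 0 or not (len(pred_xy) == len(pred_class_probs) == len(true_class) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    position_term = sum(
        abs(px - tx) + abs(py - ty) for (px, py), (tx, ty) in zip(pred_xy, true_xy)
    ) / n
    eps = 1e-12  # avoids log(0) when a predicted probability is zero
    category_term = -sum(
        math.log(max(probs[c], eps)) for probs, c in zip(pred_class_probs, true_class)
    ) / n
    return position_weight * position_term + category_term
```

A loss of zero means the predicted locations and categories match the real key point information exactly; larger values indicate a larger gap to be reduced in S430.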
[0061] In other possible implementations, the drivable area detection model may include a bird's-eye view feature extraction model and a prediction network model. Figure 3 is a flowchart of another method for detecting a drivable area provided by an embodiment of this application. On the basis of Figure 2, and as shown in Figure 3, the above S220 may include the following steps:
[0062] S510. Input multiple images to be processed into the bird's-eye view feature extraction model to obtain the bird's-eye view image features output by the bird's-eye view feature extraction model.
[0063] In this implementation, the multiple images to be processed are input into the bird's-eye view feature extraction model. While fusing the multiple images, the model transforms each image into a bird's-eye view feature image. This model is a deep neural network utilizing an attention mechanism, comprising an encoder and a decoder. The bird's-eye view feature extraction model aggregates image features from different perspectives to obtain bird's-eye view image features.
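To make the attention-based aggregation concrete, the following sketch shows scaled dot-product attention for a single bird's-eye view query attending over one feature vector per camera view. It illustrates the general mechanism only; the source does not give the model's actual layers, dimensions, or how queries, keys, and values are produced, so those are assumptions here.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> List[float]:
    """Aggregate per-view values, weighted by how well each view's key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

When every view matches the query equally well, the output is the plain average of the view features; when one view matches much better, its features dominate the aggregated result.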
[0064] S520. Input the bird's-eye view image features into the prediction network model to obtain the confidence information, offset information and embedding feature information of the target key points output by the prediction network model.
[0065] Here, confidence information is used to determine the probability that the target key point exists at the target location, offset information is used to determine the specific location of the target key point based on the coordinate offset of the target key point relative to the origin, and embedded feature information is used to determine the category information of the target key point.
[0066] In this step, by inputting the features of the bird's-eye view image into the prediction network model, the confidence information, offset information, and embedding feature information corresponding to the target key points under the bird's-eye view can be obtained, so as to determine the drivable area through the aforementioned confidence information, offset information, and embedding feature information corresponding to the target key points.
[0067] It should be noted that the embedded feature information here can be used to determine which category the target key point belongs to, that is, whether the target key point belongs to a lane line or an obstacle. If the target key point belongs to an obstacle, the specific type of obstacle is further determined, so that specific information about the target key point is obtained.
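The source does not say how an embedding is mapped to a category. One simple possibility, shown here purely as an assumption, is to compare the embedding with one reference vector ("prototype") per category and pick the nearest. The prototype table, the category names, and the subtype labels below are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Label = Tuple[str, Optional[str]]  # (category, obstacle subtype or None)


def classify_embedding(
    embedding: Sequence[float],
    prototypes: Dict[Label, Sequence[float]],
) -> Label:
    """Return the label whose prototype is closest to the embedding (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(embedding, prototypes[label]))


# Hypothetical prototype table: lane lines have no subtype, obstacles do.
EXAMPLE_PROTOTYPES: Dict[Label, Sequence[float]] = {
    ("lane_line", None): (1.0, 0.0),
    ("obstacle", "vehicle"): (0.0, 1.0),
    ("obstacle", "pedestrian"): (0.0, -1.0),
}
```

Returning a (category, subtype) pair reflects the two-stage decision in the text: first lane line versus obstacle, then the specific obstacle type.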
[0068] Using the above implementation method, multiple images to be processed are input into the bird's-eye view feature extraction model to obtain the bird's-eye view image features output by the bird's-eye view feature extraction model; then the bird's-eye view image features are input into the prediction network model to directly obtain the confidence information, offset information and embedding feature information corresponding to the target key points under the bird's-eye view output by the prediction network model, so as to more accurately detect the drivable area of the target vehicle.
[0069] Furthermore, the bird's-eye view feature extraction model may include an image feature extraction module, a visual converter network, and a bird's-eye view feature extraction module; correspondingly, the above-mentioned S510 may include the following steps:
[0070] S610. Input multiple images to be processed into the image feature extraction module to obtain the feature images corresponding to the images to be processed.
[0071] In this step, the images corresponding to each viewpoint of the target vehicle are input into the image feature extraction module. The image feature extraction module then extracts features from these images to obtain a feature image corresponding to each image. Here, each feature image is labeled with its corresponding camera and two-dimensional coordinate system information.
[0072] S620. Input the feature image into the visual converter network for feature fusion to convert the image to be processed into a bird's-eye view image.
[0073] In this step, the multiple feature images obtained above are input into a visual converter network. While fusing the multiple feature images, the visual converter network transforms each feature image into a bird's-eye view. The visual converter network is a deep neural network utilizing an attention mechanism, comprising an encoder and a decoder. Through the visual converter network, image features from different perspectives can be aggregated to obtain a bird's-eye view.
[0074] For example, inputting the feature image into a visual converter network for feature fusion to convert the image to be processed into a bird's-eye view image can include the following method: obtaining a pre-calculated projection index to use the projection index as a mapping table for the visual converter; the projection index includes the projection relationship between each voxel in the three-dimensional voxel space and each camera index, as well as the two-dimensional coordinate system of the corresponding image features; processing the feature image according to the mapping table to obtain a three-dimensional bird's-eye view image.
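The mapping-table idea can be sketched as a plain lookup: for each voxel, the pre-calculated projection index names a camera and a two-dimensional position in that camera's feature image, and the feature found there is copied into the voxel. For brevity the sketch assumes single-channel features, at most one camera per voxel, and a fill value for voxels no camera sees; none of these details come from the source.

```python
from typing import Dict, List, Optional, Tuple

Voxel = Tuple[int, int, int]                 # (x, y, z) index in the voxel space
Projection = Optional[Tuple[int, int, int]]  # (camera index, row, col), or None if unseen
FeatureMap = List[List[float]]               # one single-channel feature image per camera


def lift_to_voxels(
    feature_maps: List[FeatureMap],
    projection_index: Dict[Voxel, Projection],
    fill: float = 0.0,
) -> Dict[Voxel, float]:
    """Fill each voxel with the image feature its projection index points to."""
    voxel_features: Dict[Voxel, float] = {}
    for voxel, projection in projection_index.items():
        if projection is None:
            voxel_features[voxel] = fill  # no camera observes this voxel
        else:
            camera, row, col = projection
            voxel_features[voxel] = feature_maps[camera][row][col]
    return voxel_features
```

Because the projection index depends only on the camera geometry, it can be computed once and reused for every frame, which is what makes the table lookup cheap at inference time.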
[0075] S630. Use the bird's-eye view feature extraction module to extract features from the bird's-eye view image to obtain the bird's-eye view features corresponding to the image to be processed.
[0076] The further feature extraction in this step involves performing a depthwise convolution operation on the generated bird's-eye view image to extract more features. Since the bird's-eye view image is a three-dimensional image, the features here are also three-dimensional spatial features from a bird's-eye view perspective.
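Taking "depthwise convolution" in its usual sense (each channel is filtered with its own kernel, with no mixing across channels), the operation can be sketched as below. The sketch works on two-dimensional channels with no padding and a stride of one for brevity, whereas the text describes three-dimensional bird's-eye view features; the kernel sizes and values are placeholders.

```python
from typing import List

Channel = List[List[float]]


def depthwise_conv2d(channels: List[Channel], kernels: List[Channel]) -> List[Channel]:
    """Apply kernels[i] to channels[i] only (cross-correlation, no padding, stride 1)."""
    if len(channels) != len(kernels):
        raise ValueError("depthwise convolution needs exactly one kernel per channel")
    outputs: List[Channel] = []
    for channel, kernel in zip(channels, kernels):
        kh, kw = len(kernel), len(kernel[0])
        out_h = len(channel) - kh + 1
        out_w = len(channel[0]) - kw + 1
        outputs.append([
            [
                sum(channel[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ])
    return outputs
```

Compared with a standard convolution, this uses far fewer multiplications because channels are never combined, which fits the efficiency goal stated in the application.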
[0077] Using the above implementation method, multiple images to be processed are input into the image feature extraction module to obtain the two-dimensional feature images corresponding to the images to be processed; then the feature images are input into the visual converter network for feature fusion to obtain a three-dimensional bird's-eye view image; finally, features are extracted from the bird's-eye view image using the bird's-eye view feature extraction module to obtain the three-dimensional bird's-eye view features corresponding to the images to be processed.
[0078] Furthermore, the backbone network of the prediction network model can be an hourglass network. Accordingly, the bird's-eye view image features are input into the prediction network model to obtain the confidence information, offset information and embedding feature information corresponding to the target key points output by the prediction network model. This can include: inputting the bird's-eye view image features into the hourglass network to obtain the confidence information, offset information and embedding feature information corresponding to the target key points output by the hourglass network.
[0079] In this embodiment, the hourglass network can have three outputs: after the bird's-eye view image features are input into the hourglass network, the confidence information, offset information, and embedded feature information corresponding to the target key points are output from the three outputs of the hourglass network, respectively. Here, the hourglass network can be a stacked hourglass network, which can be a 4th-order stacked hourglass network, that is, the stacked hourglass network includes 4 hourglass networks connected in series.
[0080] In this embodiment, by selecting the hourglass network as the backbone network of the prediction network model, the confidence information, offset information and embedded feature information corresponding to the target key points can be accurately obtained, thereby ensuring the accuracy of the extracted target key point information and ensuring that the extraction speed is within an acceptable range, thus achieving a balance between extraction accuracy and speed.
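The structure described here (four hourglass units in series, followed by three outputs) can be shown with a toy one-dimensional version. Each toy unit pools to half resolution, upsamples back, and adds the skip connection; a real hourglass network would use learned convolutions at several scales. The three heads below are placeholder functions with no learned weights and are assumptions made only to show the three-output shape.

```python
import math
from typing import Dict, List


def hourglass(x: List[float]) -> List[float]:
    """One toy hourglass unit: downsample, upsample, add the skip connection."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    pooled = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]  # half resolution
    upsampled = [v for v in pooled for _ in range(2)]              # back to full resolution
    return [skip + up for skip, up in zip(x, upsampled)]


def stacked_hourglass(x: List[float], order: int = 4) -> List[float]:
    """Connect `order` hourglass units in series."""
    for _ in range(order):
        x = hourglass(x)
    return x


def predict_heads(features: List[float]) -> Dict[str, List[float]]:
    """Three outputs from the shared backbone (placeholder heads, no learned weights)."""
    hidden = stacked_hourglass(features)
    return {
        "confidence": [1.0 / (1.0 + math.exp(-v)) for v in hidden],  # squashed to (0, 1)
        "offset": [math.tanh(v) for v in hidden],
        "embedding": list(hidden),
    }
```

The point of the sketch is the data flow: one shared stacked backbone feeds all three outputs, so the confidence, offset, and embedding predictions are computed from the same features.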
[0081] In some possible implementations, determining the drivable area of the target vehicle within the surrounding area based on the target key point information may include the following steps:
[0082] S710. Determine the location information of the target key points based on the confidence information and offset information of the target key points.
[0083] In this step, by using the confidence information of the target key point, it is possible to determine whether the target key point exists at the target location. If there is a possibility that the target key point exists at a target location, the location information of the target key point is determined based on the offset information of the target key point, that is, based on the coordinate offset of the target key point relative to the origin.
[0084] S720. Determine the category information of the target key points based on the embedded feature information of the target key points.
[0085] S730. Based on the location and category information of the target key points, determine the drivable area of the target vehicle.
[0086] In this step, the lane information, obstacle information, and road boundary information in the surrounding area can be determined based on the category information of the target key points; then, the drivable area of the target vehicle can be determined based on the lane information, obstacle information, and road boundary information in the surrounding area, together with the location information of the target key points.
[0087] In this embodiment, the location and category information of the target key points can be determined based on the confidence information, offset information, and embedded feature information of the target key points; then, based on the location and category information of the target key points, the drivable area information of the target vehicle can be determined, thereby enabling accurate detection of the drivable area around the target vehicle.
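Steps S710 and S720 can be sketched as follows. The sketch makes several assumptions that the embodiment does not state: the three outputs are laid out as a grid over the bird's-eye view, the "origin" of each offset is taken to be the corner of its grid cell, a fixed threshold decides whether a key point exists, and the category is chosen as the nearest of a set of hypothetical prototype embeddings. The names `KeyPoint`, `decode_key_points`, `threshold`, `cell_size`, and `prototypes` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class KeyPoint:
    x: float
    y: float
    category: str
    confidence: float


def decode_key_points(
    confidence: Sequence[Sequence[float]],
    offsets: Sequence[Sequence[Tuple[float, float]]],
    embeddings: Sequence[Sequence[Sequence[float]]],
    prototypes: Dict[str, Sequence[float]],
    threshold: float = 0.5,   # assumed existence threshold
    cell_size: float = 1.0,   # assumed size of one grid cell
) -> List[KeyPoint]:
    points: List[KeyPoint] = []
    for row, conf_row in enumerate(confidence):
        for col, conf in enumerate(conf_row):
            if conf < threshold:
                continue  # S710: no key point at this location
            dx, dy = offsets[row][col]
            # S710: location = cell origin plus predicted offset
            x = (col + dx) * cell_size
            y = (row + dy) * cell_size
            # S720: category = nearest prototype embedding (assumed rule)
            emb = embeddings[row][col]
            category = min(
                prototypes,
                key=lambda name: sum(
                    (a - b) ** 2 for a, b in zip(emb, prototypes[name])
                ),
            )
            points.append(KeyPoint(x, y, category, conf))
    return points


# Hypothetical 2x2 grid with two confident cells.
demo = decode_key_points(
    confidence=[[0.9, 0.1], [0.2, 0.8]],
    offsets=[[(0.5, 0.25), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.5)]],
    embeddings=[[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.1, 0.9)]],
    prototypes={"lane_line": (1.0, 0.0), "curb": (0.0, 1.0)},
    cell_size=2.0,
)
```

The decoded list of located, categorised key points is the input that step S730 would then use to determine the drivable area.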
[0088] Figure 4 is a schematic diagram of a drivable area detection device according to an embodiment of the present invention. As shown in Figure 4, the device 800 includes:
[0089] The image acquisition module 810 is used to acquire multiple images to be processed; the multiple images to be processed include multiple images from different perspectives captured by the camera on the target vehicle at the same time for the area surrounding the target vehicle.
[0090] The target key point information acquisition module 820 is used to input multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; the target key point information includes confidence information, offset information and embedded feature information corresponding to the target key points;
[0091] The drivable area determination module 830 is used to determine the drivable area of the target vehicle in the surrounding area based on multiple target key point information.
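The way the three modules hand data to one another can be sketched as a simple composition. This is a structural illustration only: the class name `DrivableAreaDetector`, its field names, and the stand-in functions in the usage example are hypothetical, and a real module 820 would wrap the pre-generated drivable area detection model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class DrivableAreaDetector:
    # Module 810: returns the multi-view images captured at the same time.
    acquire_images: Callable[[], Sequence[Any]]
    # Module 820: maps the images to target key point information.
    extract_key_points: Callable[[Sequence[Any]], List[Any]]
    # Module 830: maps the key point information to a drivable area.
    determine_area: Callable[[List[Any]], Any]

    def run(self) -> Any:
        images = self.acquire_images()
        key_points = self.extract_key_points(images)
        return self.determine_area(key_points)


# Usage with stand-in functions (placeholders, not the real model).
detector = DrivableAreaDetector(
    acquire_images=lambda: ["front", "rear", "left", "right"],
    extract_key_points=lambda images: [f"kp_from_{name}" for name in images],
    determine_area=lambda key_points: {"num_key_points": len(key_points)},
)
result = detector.run()
```

Keeping each module behind a plain callable mirrors the statement that the division into modules is a logical one and may be implemented in hardware, software, or both.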
[0092] In some implementations, the drivable area detection model is trained using multiple sample sets, which include multiple sample images of the area surrounding the target vehicle from different perspectives. Correspondingly, the device also includes a drivable area detection model generation module, which comprises:
[0093] The training unit is used to iteratively train the drivable area detection model using multiple sample fisheye images from multiple sample sets.
[0094] The parameter adjustment unit is used to adjust the parameters of the drivable area detection model based on the loss function to obtain the pre-generated drivable area detection model.
[0095] In some implementations, the parameter adjustment unit includes:
[0096] The real key point information acquisition subunit is used to acquire multi-frame point cloud data collected by the vehicle's body radar to obtain real key point information.
[0097] The loss value calculation subunit is used to calculate the loss value of the drivable area detection model based on the target key point information and the real key point information output by the drivable area detection model, according to the loss function.
[0098] The model training subunit is used to adjust the parameters of the drivable area detection model based on the loss value until the loss value of the adjusted drivable area detection model meets the preset conditions, so that the adjusted drivable area detection model can be used as the pre-generated drivable area detection model.
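The loop formed by the loss value calculation subunit and the model training subunit can be sketched as below. The embodiment does not specify the loss function, the optimiser, or the preset condition; the mean squared error, plain gradient descent, the loss threshold, and the two-parameter linear model used here are all assumptions chosen so that the example stays small. The `targets` list stands in for the real key point information derived from point cloud data.

```python
from typing import List, Sequence, Tuple


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed loss function: mean squared difference."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_until_converged(
    inputs: Sequence[float],
    targets: Sequence[float],
    lr: float = 0.1,
    loss_threshold: float = 1e-6,   # assumed "preset condition"
    max_steps: int = 10_000,
) -> Tuple[float, float, float]:
    weight, bias = 0.0, 0.0
    loss = float("inf")
    n = len(inputs)
    for _ in range(max_steps):
        preds: List[float] = [weight * x + bias for x in inputs]
        loss = mean_squared_error(preds, targets)
        if loss <= loss_threshold:
            break  # the adjusted model meets the preset condition
        # Gradients of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, inputs)) / n
        grad_b = sum(2 * (p - t) for p, t in zip(preds, targets)) / n
        weight -= lr * grad_w
        bias -= lr * grad_b
    return weight, bias, loss


# Toy data generated by y = 2x + 1; stands in for (prediction input, ground truth).
w, b, final_loss = train_until_converged([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The `max_steps` bound is a practical safeguard added here so that training terminates even if the preset condition is never reached.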
[0099] In some implementations, the drivable area detection model includes a bird's-eye view feature extraction model and a prediction network model; the target key point information acquisition module 820 includes:
[0100] The bird's-eye view image feature acquisition unit is used to input multiple images to be processed into the bird's-eye view image feature extraction model in order to obtain the bird's-eye view image features output by the bird's-eye view image feature extraction model.
[0101] The output unit of the prediction network model is used to input the features of the bird's-eye view image into the prediction network model to obtain the confidence information, offset information and embedding feature information corresponding to the target key points output by the prediction network model. The confidence information is used to determine the probability that the target key points exist at the target location, the offset information is used to determine the specific location of the target key points based on the coordinate offset of the target key points relative to the origin, and the embedding feature information is used to determine the category information of the target key points.
[0102] In some implementations, the bird's-eye view feature extraction model includes an image feature extraction module, a visual converter network, and a bird's-eye view feature extraction module; correspondingly, the bird's-eye view image feature acquisition unit includes:
[0103] The feature image acquisition unit is used to input multiple images to be processed into the image feature extraction module to obtain the feature images corresponding to the images to be processed;
[0104] The bird's-eye view image acquisition subunit is used to input the feature image into the visual converter network for feature fusion, so as to convert the image to be processed into a bird's-eye view image.
[0105] The bird's-eye view feature acquisition subunit is used to extract features from the bird's-eye view image using the bird's-eye view feature extraction model to obtain the bird's-eye view features corresponding to the image to be processed.
[0106] In some implementations, the backbone network of the prediction network model is an hourglass network, and correspondingly, the output units of the prediction network model include:
[0107] The hourglass network output subunit is used to input the features of the bird's-eye view image into the hourglass network to obtain the confidence information, offset information and embedding feature information corresponding to the target key points output by the hourglass network.
[0108] In some possible implementations, the drivable area determination module 830 includes:
[0109] The location information determination unit is used to determine the location information of the target key points based on the confidence information and offset information of the target key points;
[0110] The category information determination unit is used to determine the category information of the target key points based on the embedded feature information of the target key points;
[0111] The drivable area information determination unit is used to determine the drivable area of the target vehicle based on the location and category information of the target key points.
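One possible way for the drivable area information determination unit to turn located, categorised key points into a boundary is sketched below. The embodiment does not prescribe this procedure: dividing the surroundings into angular sectors around the vehicle, treating only curb and obstacle key points as blocking, and capping the range at `max_range` are all assumptions, as are the names `BLOCKING` and `free_range_per_sector`. Coordinates are assumed to be in a bird's-eye view frame with the vehicle at the origin.

```python
import math
from typing import List, Sequence, Tuple

# Assumed set of categories that bound the drivable area.
BLOCKING = {"curb", "obstacle"}


def free_range_per_sector(
    key_points: Sequence[Tuple[float, float, str]],
    num_sectors: int = 8,
    max_range: float = 50.0,
) -> List[float]:
    """Return, for each angular sector, the distance to the nearest blocking key point."""
    ranges = [max_range] * num_sectors
    sector_width = 2 * math.pi / num_sectors
    for x, y, category in key_points:
        if category not in BLOCKING:
            continue  # e.g. lane line points do not limit the free range here
        angle = math.atan2(y, x) % (2 * math.pi)
        sector = min(int(angle / sector_width), num_sectors - 1)
        ranges[sector] = min(ranges[sector], math.hypot(x, y))
    return ranges


# Hypothetical key points: (x, y, category).
boundary = free_range_per_sector(
    [
        (3.0, 4.0, "curb"),
        (6.0, 8.0, "obstacle"),
        (-3.0, 4.0, "lane_line"),
        (3.0, -4.0, "obstacle"),
    ],
    num_sectors=4,
)
```

The resulting list of per-sector ranges describes a polygon-like boundary around the vehicle, which is one simple representation of the shape, location, and boundary of a drivable area.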
[0112] It should be understood that the device embodiments and the method embodiments for detecting drivable areas correspond to each other, and similar descriptions can be found in the method embodiments. To avoid repetition, further details are omitted here. Specifically, the apparatus 800 shown in Figure 4 can execute the above-described embodiments of the method for detecting drivable areas, and the aforementioned and other operations and/or functions of each module in the apparatus 800 respectively implement the corresponding processes in that method, which are not described in detail here for the sake of brevity.
[0113] The apparatus 800 of this invention, in conjunction with the accompanying drawings, has been described above from the perspective of functional modules. It should be understood that this functional module can be implemented in hardware, in software instructions, or in a combination of hardware and software modules. Specifically, the drivable area detection method and its steps in this invention embodiment can be completed by integrated logic circuits in the processor's hardware and / or by software instructions. The drivable area detection method and its steps disclosed in this invention embodiment can be directly manifested as execution by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. Optionally, the software module can be located in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory; the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the drivable area detection method and its embodiments.
[0114] Figure 5 is a schematic block diagram of an electronic device 900 according to an embodiment of the present invention.
[0115] As shown in Figure 5, the electronic device 900 may include:
[0116] A memory 910 and a processor 920. The memory 910 stores a computer program and transfers the program code to the processor 920. In other words, the processor 920 can retrieve and run the computer program from the memory 910 to implement the methods described in the embodiments of the present invention.
[0117] For example, the processor 920 can be used to execute the above-described method embodiments according to instructions in the computer program.
[0118] In some embodiments of the present invention, the processor 920 may include, but is not limited to:
[0119] General-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0120] In some embodiments of the present invention, the memory 910 includes, but is not limited to:
[0121] Volatile memory and / or non-volatile memory. Non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
[0122] In some embodiments of the present invention, the computer program may be divided into one or more modules, which are stored in the memory 910 and executed by the processor 920 to perform the method provided by the present invention. The one or more modules may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of the computer program in the controller.
[0123] As shown in Figure 5, the electronic device 900 may further include:
[0124] Transceiver 930, which can be connected to processor 920 or memory 910.
[0125] The processor 920 can control the transceiver 930 to communicate with other devices; specifically, it can send information or data to other devices or receive information or data sent by other devices. The transceiver 930 may include a transmitter and a receiver. The transceiver 930 may further include antennas, and the number of antennas may be one or more.
[0126] It should be understood that the various components in the electronic device 900 are connected through a bus system, which includes a data bus, a power bus, a control bus, and a status signal bus.
[0127] The present invention also provides a computer storage medium having a computer program stored thereon, which, when executed by a computer, enables the computer to perform the methods of the above-described method embodiments. Alternatively, one embodiment of the present invention also provides a computer program product containing instructions that, when executed by a computer, cause the computer to perform the methods of the above-described method embodiments.
[0128] When implemented using software, the embodiments can be implemented entirely or partially as a computer program product. This computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)).
[0129] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0130] In the several embodiments provided by this invention, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or other forms.
[0131] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. For example, the functional modules in the various embodiments of this application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
[0132] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A drivable area detection method, characterized by comprising: acquiring multiple images to be processed; the multiple images to be processed include multiple images of the area surrounding the target vehicle taken simultaneously from different perspectives by the camera on the target vehicle; inputting the multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; the target key point information includes the confidence information, offset information, and embedded feature information corresponding to the target key points; the target key points include points on the lane lines, points on the roadside curbs on both sides of the road, and points where obstacles touch the ground; determining, based on the multiple target key point information, the drivable area corresponding to the target vehicle in the surrounding area; the drivable area detection model includes a bird's-eye view feature extraction model and a prediction network model; the step of inputting the multiple images to be processed into the pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model includes: inputting the multiple images to be processed into the bird's-eye view feature extraction model to obtain the bird's-eye view image features output by the bird's-eye view feature extraction model; inputting the bird's-eye view image features into the prediction network model to obtain the confidence information, offset information, and embedded feature information corresponding to the target key points output by the prediction network model; the confidence information is used to determine the probability that the target key point exists at the target location; the offset information is used to determine the specific location of the target key point based on the coordinate offset of the target key point relative to the origin; the embedded feature information is used to determine the category information of the target key point; the step of determining the drivable area of the target vehicle in the surrounding area based on the multiple target key point information includes: determining, based on the multiple target key point information, lane information, obstacle information, and road boundary information located within the surrounding area; determining the shape information, location information, and boundary information of the drivable area based on the lane information, obstacle information, road boundary information, and the location information of the target key points; and using the shape information, location information, and boundary information of the drivable area as drivable area information to determine the drivable area corresponding to the target vehicle in the surrounding area.
2. The method of claim 1, wherein the drivable area detection model is trained using multiple sample sets, which include multiple sample images from different perspectives collected in the area surrounding the target vehicle; correspondingly, the drivable area detection model is pre-generated in the following manner: the drivable area detection model is trained iteratively using multiple sample images from the multiple sample sets; and the parameters of the drivable area detection model are adjusted based on a loss function to obtain the pre-generated drivable area detection model.
3. The method of claim 2, wherein adjusting the parameters of the drivable area detection model based on the loss function to obtain the pre-generated drivable area detection model includes: acquiring multiple frames of point cloud data collected by the vehicle's body radar to obtain real key point information; calculating the loss value of the drivable area detection model according to the loss function, based on the target key point information output by the drivable area detection model and the real key point information; and adjusting the parameters of the drivable area detection model based on the loss value until the loss value of the adjusted drivable area detection model meets the preset conditions, so that the adjusted drivable area detection model is used as the pre-generated drivable area detection model.
4. The method of claim 1, wherein the bird's-eye view feature extraction model includes an image feature extraction module, a visual converter network, and a bird's-eye view feature extraction module; accordingly, the step of inputting the multiple images to be processed into the bird's-eye view feature extraction model to obtain the bird's-eye view image features output by the bird's-eye view feature extraction model includes: inputting the multiple images to be processed into the image feature extraction module to obtain the feature image corresponding to the image to be processed; inputting the feature image into the visual converter network for feature fusion to convert the image to be processed into a bird's-eye view image; and extracting features from the bird's-eye view image using the bird's-eye view feature extraction model to obtain the bird's-eye view features corresponding to the image to be processed.
5. The method of claim 1, wherein the backbone network of the prediction network model is an hourglass network; accordingly, the step of inputting the bird's-eye view image features into the prediction network model to obtain the confidence information, offset information, and embedded feature information corresponding to the target key points output by the prediction network model includes: inputting the bird's-eye view image features into the hourglass network to obtain the confidence information, offset information, and embedded feature information corresponding to the target key points output by the hourglass network.
6. The method of claim 1, wherein the step of determining the drivable area of the target vehicle in the surrounding area based on the multiple target key point information includes: determining the location information of the target key points based on the confidence information and offset information of the target key points; determining the category information of the target key points based on the embedded feature information of the target key points; and determining the drivable area of the target vehicle based on the location and category information of the target key points.
7. A device for detecting a drivable area, characterized by comprising: an image acquisition module, used to acquire multiple images to be processed; the multiple images to be processed include multiple images from different perspectives captured by the camera on the target vehicle at the same time for the area surrounding the target vehicle; a target key point information acquisition module, used to input the multiple images to be processed into a pre-generated drivable area detection model to obtain multiple target key point information of the drivable area output by the drivable area detection model; the target key point information includes the confidence information, offset information, and embedded feature information corresponding to the target key points; a drivable area determination module, used to determine the drivable area of the target vehicle in the surrounding area based on the multiple target key point information; the drivable area detection model includes a bird's-eye view feature extraction model and a prediction network model; the target key point information acquisition module is used to input the multiple images to be processed into the bird's-eye view feature extraction model to obtain the bird's-eye view image features output by the model, and to input the bird's-eye view image features into the prediction network model to obtain the confidence information, offset information, and embedded feature information corresponding to the target key points output by the model; the confidence information is used to determine the probability that the target key point exists at the target location; the offset information is used to determine the specific location of the target key point based on its coordinate offset relative to the origin; and the embedded feature information is used to determine the category information of the target key point; the drivable area determination module is used to determine lane information, obstacle information, and road boundary information located in the surrounding area based on the multiple target key point information; determine the shape information, location information, and boundary information of the drivable area based on the lane information, obstacle information, road boundary information, and the location information of the target key points; and use the shape information, location information, and boundary information of the drivable area as drivable area information to determine the drivable area corresponding to the target vehicle in the surrounding area.
8. An electronic device, characterized by comprising: a processor and a memory, the memory being used to store a computer program, and the processor being used to invoke and run the computer program stored in the memory to perform the method of any one of claims 1-6.
9. A computer-readable storage medium, characterized in that it is used to store a computer program that causes a computer to perform the method of any one of claims 1-6.