Vehicle visual perception methods, devices, electronic equipment and storage media

By constructing a visual perception model and utilizing a target detection network that combines feature extraction, fusion, and multi-head detection, the problems of high computational load and insufficient information utilization in ADAS perception algorithms are solved, achieving efficient obstacle and lane line detection.

CN115761678B · Active · Publication Date: 2026-06-30 · YINGCHE XINGCHUANG INTELLIGENT TECH (SHANGHAI) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
YINGCHE XINGCHUANG INTELLIGENT TECH (SHANGHAI) CO LTD
Filing Date
2022-11-24
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing ADAS perception algorithms are computationally intensive, complex in process, and difficult to fully utilize image information.

Method used

Multiple training images are used to train a target detection network with feature extraction, feature fusion and multi-head detection functions to build a visual perception model, which predicts the coordinates of obstacles and lane lines through target images captured by vehicle-mounted cameras.

Benefits of technology

The perception algorithm has achieved low computational complexity, simple process, full utilization of image information, and improved detection accuracy.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN115761678B_ABST

Abstract

This invention provides a vehicle visual perception method, device, electronic device, and storage medium. The method includes: acquiring training images with labeled information, including obstacles and lane lines; training a target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images to construct a visual perception model; acquiring target images including obstacles and lane lines captured by an onboard camera; inputting the target images into the visual perception model, and predicting the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates based on the output of the visual perception model. The feature extraction, fusion, and detection stages of this invention have low computational load and are relatively simple, and can be implemented using a lightweight target detection network. The fused features are interconnected, and multi-task fusion of multiple detectors can reduce computational load and improve accuracy, while also making full use of each feature information.

Description

Technical Field

[0001] This invention relates to the field of driver assistance systems, and more particularly to a vehicle visual perception method, device, electronic device, and storage medium.

Background Technology

[0002] Advanced Driver Assistance Systems (ADAS) utilize various sensors (cameras, navigation systems, and radar, etc.) installed in vehicles to collect environmental data inside and outside the vehicle in a timely manner. They perform technical processing such as identification, detection, and tracking of static and dynamic objects, enabling drivers to detect potential safety hazards in the shortest possible time and thus improve driving safety.

[0003] ADAS uses cameras to perceive road conditions ahead and typically includes functions such as LDW (Lane Departure Warning), HMW (Headway Monitoring Warning), and FCW (Forward Collision Warning). All of these functions rely on the accuracy of the acquired obstacle and lane line positions. Generally, obstacle positions are obtained through object detection and regression algorithms, while lane line detection can be achieved using segmentation algorithms, point-based regression, and other deep learning networks.

[0004] Currently, a common approach in ADAS perception algorithms is to use one model to output object detection boxes and another model to output lane line coordinates. This method is computationally intensive, complex, and requires multiple steps to obtain the final perception result. Furthermore, there is no connection between obstacle detection and lane line detection tasks, making it difficult to fully utilize image information.

Summary of the Invention

[0005] This invention provides a vehicle visual perception method, device, electronic device, and storage medium to address the shortcomings of existing ADAS perception algorithms, such as high computational load, complex process, and difficulty in fully utilizing image information, thereby achieving a perception algorithm with low computational load, simple process, and more efficient use of image information.

[0006] This invention provides a vehicle visual perception method, the method comprising:

[0007] Acquire training images with labeled information, including obstacles and lane lines. The labeled information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0008] Multiple training images are used to train a target detection network with feature extraction, feature fusion, and multi-head detection functions to construct a visual perception model;

[0009] Acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;

[0010] The target image is input into the visual perception model, and the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle are predicted based on the output of the visual perception model.

[0011] According to a vehicle visual perception method provided by the present invention, the feature extraction, feature fusion, and multi-head detection include:

[0012] Extract a first feature and a second feature from the training image, and perform feature fusion on the first feature and the second feature. The first feature includes the original coordinates of the obstacle and its bounding box and the ground point. The second feature includes the lane line and the original coordinates of the lane line and the original coordinates of the hidden point.

[0013] Multi-head detection is performed on the fused first and second features to output the attributes of obstacles and lane lines respectively. The attributes of obstacles and lane lines include the heat map, offset and category of obstacles, the offset of grounding points, the heat map and offset of lane lines, and the heat map and offset of hidden points.

[0014] The attributes of the obstacles and lane lines are regressed to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacles.

[0015] According to a vehicle visual perception method provided by the present invention, the step of regressing the attributes of the obstacle and lane line to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacle includes:

[0016] Based on the obstacle heatmap, offset and category, and the offset of the grounding point, the confidence level of the obstacle, the center point and width and height of the obstacle, the category of the obstacle, and the offset value of the grounding point relative to the center point of the obstacle are obtained respectively.

[0017] Based on the lane line heatmap, offset, and the heatmap and offset of the hidden point, the confidence level of the lane line, the offset value of the lane line point set relative to the starting point, and the confidence level and coordinates of the hidden point are obtained.

[0018] According to a vehicle visual perception method provided by the present invention, the step of regressing the attributes of the obstacle and lane line to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacle further includes:

[0019] The center point and width and height of the obstacle are represented as the bounding box coordinates of the obstacle, and the offset value of the ground point relative to the center point of the obstacle is represented as the ground point coordinates;

[0020] The offset of the lane line point set relative to the starting point is represented as the lane line coordinates.

[0021] According to a vehicle visual perception method provided by the present invention, the step of training a target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images further includes:

[0022] The original coordinates of the bounding box are represented as the coordinates of the obstacle's center point and its width and height. The original coordinates of the ground point are represented as the offset value relative to the obstacle's center point. The original coordinates of the bounding box are the coordinates of the upper left and lower right of the bounding box.

[0023] The original coordinates of the point set of the lane line are represented as offset values relative to the starting point of the lane line, and the original coordinates of the hidden point are retained.

[0024] According to a vehicle visual perception method provided by the present invention, the step of training a target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images includes:

[0025] The backbone network was selected to extract features, with ShuffleNetV2 as the backbone.

[0026] The extracted features are fused using a Neck network, where the Neck network uses PAN to enhance the features;

[0027] Multi-head detection is used to detect the fused features separately;

[0028] The bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of obstacles are obtained by regression processing using heatmap + offset.

[0029] The present invention also provides a vehicle vision perception device, the device comprising:

[0030] The first acquisition module is used to acquire training images with label information, including obstacles and lane lines. The label information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0031] The training module is used to train a target detection network with feature extraction, feature fusion and multi-head detection functions using multiple training images to build a visual perception model.

[0032] The second acquisition module is used to acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;

[0033] The prediction module is used to input the target image into the visual perception model and predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle based on the output of the visual perception model.

[0034] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the vehicle visual perception method as described above.

[0035] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the vehicle visual perception method as described above.

[0036] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the vehicle visual perception method as described above.

[0037] The vehicle visual perception method, device, electronic device, and storage medium provided by this invention train a target detection network with feature extraction, feature fusion, and multi-head detection functions on training images labeled with the original coordinates of the obstacle's bounding box, the original coordinates of the ground contact point, the original coordinates of the lane line, and the original coordinates of the hidden point. The constructed visual perception model can predict the target bounding box coordinates, target ground contact point coordinates, target lane line coordinates, and target hidden point coordinates of obstacles in the target image captured by the vehicle-mounted camera, thereby completing the vehicle's visual perception. The feature extraction, fusion, and detection stages involve relatively low computational load and are simple processes, achievable using a lightweight target detection network. Furthermore, the fused features are interconnected, and multi-task fusion of multiple detection heads reduces computational load and improves accuracy, while also making full use of the information in each feature. This solves the shortcomings of existing ADAS perception algorithms, such as high computational load, complex processes, and difficulty in fully utilizing image information, achieving a perception algorithm with low computational load, simple process, and full utilization of image information.

Brief Description of the Drawings

[0038] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0039] Figure 1 is the first flowchart of the vehicle visual perception method provided by the present invention;

[0040] Figure 2 is the second flowchart of the vehicle visual perception method provided by the present invention;

[0041] Figure 3 is the third flowchart of the vehicle visual perception method provided by the present invention;

[0042] Figure 4 is the fourth flowchart of the vehicle visual perception method provided by the present invention;

[0043] Figure 5 is the fifth flowchart of the vehicle visual perception method provided by the present invention;

[0044] Figure 6 is a schematic diagram of the multi-head detection output of the vehicle visual perception method provided by the present invention;

[0045] Figure 7 is a schematic diagram of the structure of the vehicle vision perception device provided by the present invention;

[0046] Figure 8 is a schematic diagram of the structure of the electronic device provided by the present invention.

[0047] Reference numerals:

[0048] 710: First acquisition module; 720: Training construction module; 730: Second acquisition module; 740: Prediction module; 810: Processor; 820: Communication interface; 830: Memory; 840: Communication bus.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0050] The vehicle visual perception method, device, electronic device, and storage medium of the present invention are described below with reference to Figures 1-8.

[0051] As shown in Figure 1, in one embodiment, a vehicle visual perception method includes the following steps:

[0052] Step S110: Obtain training images with label information, including obstacles and lane lines. The label information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0053] Obstacles typically refer to all people and objects within the field of view of the onboard camera. The camera captures images of obstacles, which are then transmitted to the advanced driver assistance system (ADAS) for processing, identification, and timely response. Lane lines are crucial references for vehicle movement; effective perception and recognition of lane lines help the vehicle stay on the correct trajectory. Therefore, both obstacles and lane lines must be present in the training images. Training images are generally captured in advance along a stretch of road and contain obstacles and lane lines, whose coordinates need to be labeled accurately according to actual conditions. The labeling information includes the original coordinates of the obstacle's bounding box, the original coordinates of the ground contact point, the original coordinates of the lane line, and the original coordinates of the hidden point. Precise labels of this kind allow an accurate visual perception model to be obtained after multiple training iterations.
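The labeling information listed above could be held in a record such as the one below. This is only a minimal sketch: the class and field names (`ObstacleLabel`, `FrameLabel`, `validate`) are hypothetical and do not come from the patent, pixel coordinates in the image frame are assumed, and the `category` field is an assumption added because the network later outputs an obstacle category.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels (assumed convention)


@dataclass
class ObstacleLabel:
    # Original bounding box: upper-left and lower-right corners.
    box_xyxy: Tuple[float, float, float, float]
    # Original coordinates of the ground (contact) point.
    ground_point: Point
    # Obstacle category name (assumed field, not listed in the label info).
    category: str


@dataclass
class FrameLabel:
    obstacles: List[ObstacleLabel] = field(default_factory=list)
    # Each lane line is an ordered point set, starting point first.
    lane_lines: List[List[Point]] = field(default_factory=list)
    # Original coordinates of the hidden point, if one is labeled.
    hidden_point: Optional[Point] = None


def validate(label: FrameLabel) -> bool:
    """Return True if boxes are well-formed and lane lines have >= 2 points."""
    for ob in label.obstacles:
        x1, y1, x2, y2 = ob.box_xyxy
        if not (x1 < x2 and y1 < y2):
            return False
    return all(len(line) >= 2 for line in label.lane_lines)
```

A check like `validate` is one way to catch swapped corners or degenerate lane lines before training; the patent itself only requires that the labels match the actual positions.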

[0054] Step S120: Train the target detection network with feature extraction, feature fusion and multi-head detection functions using multiple training images to build a visual perception model.

[0055] To detect the coordinates and category of target objects from target images, the target detection network needs to be trained repeatedly using training images. This ensures that the trained visual perception model can accurately identify obstacles and lane lines. Advanced driver assistance systems (ADAS) use target detection networks, such as a backbone+neck+head network, to train on obstacles and lane lines in the training images. First, feature extraction is performed, extracting the original coordinates of obstacles and their bounding boxes and grounding points, as well as the original coordinates of lane lines and their hidden point coordinates. Then, the extracted features are fused, and a multi-head detection approach is used to prioritize each feature, thereby improving the accuracy of the trained visual perception model for prediction.

[0056] Step S130: Acquire a target image captured by the vehicle-mounted camera, including obstacles and lane lines.

[0057] Specifically, after obtaining a qualified visual perception model through training images, the vehicle acquires target images through an onboard camera during driving and inputs these target images into the visual perception model.

[0058] Step S140: Input the target image into the visual perception model, and predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle based on the output of the visual perception model.

[0059] Specifically, after the visual perception model is built, when the vehicle camera captures an image containing obstacles or lane lines, it will identify and detect the captured target image, and then predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates, thus completing the vehicle's visual perception process.

[0060] The aforementioned vehicle visual perception method trains a target detection network with feature extraction, feature fusion, and multi-head detection capabilities on training images containing labeled information such as the original coordinates of the obstacle's bounding box, the original coordinates of the ground contact point, the original coordinates of the lane line, and the original coordinates of the hidden point. The constructed visual perception model can predict the target bounding box coordinates, target ground contact point coordinates, target lane line coordinates, and target hidden point coordinates of obstacles in target images captured by onboard cameras, thus completing vehicle visual perception. The feature extraction, fusion, and detection stages involve relatively low computational cost and are simple processes, achievable using a lightweight target detection network, consuming minimal resources from the advanced driver assistance system (ADAS). Furthermore, the fused features are interconnected, and multi-task fusion with multiple detectors reduces computational cost and improves accuracy, while also fully utilizing each feature's information. This addresses the shortcomings of existing ADAS perception algorithms, such as high computational cost, complex processes, and difficulty in fully utilizing image information, achieving a perception algorithm with low computational cost, simple process, and full utilization of image information.

[0061] As shown in Figure 2, in one embodiment, the feature extraction, feature fusion, and multi-head detection of the vehicle visual perception method of the present invention include the following steps:

[0062] Step S122: Extract the first feature and the second feature from the training image, and perform feature fusion on the first feature and the second feature. The first feature includes the obstacle and the original coordinates of its bounding box and grounding point. The second feature includes the lane line, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0063] Specifically, the original coordinates of the obstacles and the original coordinates of the contact points need to be consistent with the actual positions of the obstacles and contact points, and the original coordinates of the lane lines and the original coordinates of the hidden point need to be consistent with the actual positions of the lane lines and the hidden point, so as to make the trained model more accurate. This can be achieved through manual measurement and annotation.

[0064] Specifically, feature extraction can extract the required features from the training image, and feature fusion combines the extracted features and tightly links them together.

[0065] Step S124: Perform multi-head detection on the fused first feature and second feature, and output the attributes of obstacles and lane lines respectively. The attributes of obstacles and lane lines include the heat map, offset and category of obstacles, the offset of grounding points, the heat map and offset of lane lines, and the heat map and offset of hidden point.

[0066] Multi-head detection gives each fused feature more attention and performs the detection tasks simultaneously, which improves detection accuracy.

[0067] Step S126: Regress the attributes of obstacles and lane lines to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of obstacles.

[0068] Specifically, during training, the standard coordinates need to be represented in a way that reduces computation and improves training accuracy. For example, the original coordinates of the obstacle's bounding box can be represented as the center point of the bounding box and its width and height, while the original coordinates of the ground point can be represented as its offset relative to the center point of the bounding box. The coordinates output from the training need to be re-represented in standard coordinates for easy identification and use in subsequent stages.

[0069] As shown in Figure 3, in one embodiment, regressing the attributes of obstacles and lane lines to the bounding box coordinates and ground point coordinates of the obstacles, the lane line coordinates, and the hidden point coordinates includes the following steps:

[0070] Step S310: Based on the obstacle heatmap, offset and category, and the offset of the grounding point, obtain the obstacle confidence level, the obstacle center point and width and height, the obstacle category, and the offset value of the grounding point relative to the obstacle center point.

[0071] Specifically, the center point and width and height of the obstacle are represented as the bounding box coordinates of the obstacle, and the offset of the ground point relative to the center point of the obstacle is represented as the ground point coordinates.

[0072] Step S320: Based on the lane line heatmap, offset, and the heatmap and offset of the hidden point, obtain the confidence level of the lane line, the offset value of the lane line point set relative to the starting point, and the confidence level and coordinates of the hidden point.

[0073] Specifically, the offset values of the lane line point set relative to the starting point are represented as lane line coordinates, while the coordinates of the hidden point are preserved in their original values.
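Converting the regressed quantities of steps S310 and S320 back to plain image coordinates can be sketched as follows. The function names are hypothetical, and the sketch assumes that the lane line starting point is already known in image coordinates (for example, located from the lane line heatmap); the patent does not spell out these details.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def decode_box(cx: float, cy: float, w: float, h: float):
    """Center point plus width/height -> (x1, y1, x2, y2) corner coordinates."""
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0


def decode_ground_point(center: Point, offset: Point) -> Point:
    """Ground point = obstacle center + regressed offset."""
    return center[0] + offset[0], center[1] + offset[1]


def decode_lane(start: Point, offsets: List[Point]) -> List[Point]:
    """Lane line points = starting point + per-point offsets."""
    return [(start[0] + dx, start[1] + dy) for dx, dy in offsets]
```

For example, a center of (30, 50) with width 40 and height 60 maps back to the corners (10, 20) and (50, 80), and a ground point offset of (0, 30) places the ground point at (30, 80), on the bottom edge of that box.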

[0074] As shown in Figure 4, in one embodiment, before training the target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images, the vehicle visual perception method of the present invention further includes the following steps:

[0075] Step S410: The original coordinates of the bounding box are represented as the coordinates of the obstacle's center point and its width and height. The original coordinates of the ground point are represented as the offset value relative to the obstacle's center point. The original coordinates of the bounding box are the coordinates of the upper left and lower right of the bounding box.

[0076] Step S420: Represent the original coordinates of the lane line point set as offset values relative to the starting point of the lane line, and retain the original coordinates of the hidden point.

[0077] Specifically, during the training process using training images, it can be found that using the standard representation of the original coordinates makes the training process more complicated and makes it difficult to achieve the ideal accuracy. After transforming the representation of the original coordinates, the training process is simplified and the accuracy is improved.
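The representation change in steps S410 and S420 amounts to a few lines of arithmetic. The sketch below uses hypothetical function names and assumes that the first point of each lane line point set is its starting point.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_box(x1: float, y1: float, x2: float, y2: float):
    """Upper-left / lower-right corners -> (center_x, center_y, width, height)."""
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1


def encode_ground_point(ground: Point, center: Point) -> Point:
    """Ground point expressed as an offset from the obstacle center."""
    return ground[0] - center[0], ground[1] - center[1]


def encode_lane(points: List[Point]):
    """Lane points expressed as offsets from the starting point.

    Returns (start, offsets); the first offset is always (0, 0).
    """
    sx, sy = points[0]
    return (sx, sy), [(x - sx, y - sy) for x, y in points]


# The hidden point keeps its original coordinates, so it needs no encoder.
```

With a box labeled (10, 20, 50, 80), `encode_box` gives a center of (30, 50), width 40, and height 60; a ground point at (30, 80) becomes the offset (0, 30) from that center.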

[0078] As shown in Figure 5, in a specific embodiment, training the target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images includes the following steps:

[0079] Step S510: Select the backbone network to extract features, wherein the backbone uses ShuffleNetV2 as the main body.

[0080] The backbone network is responsible for extracting features. Backbone networks excel at feature extraction in tasks like classification and localization. ShuffleNetV2 is a lightweight neural network that offers high accuracy without wasting resources or increasing system load.

[0081] Step S520: The extracted features are fused using a Neck network, where the Neck network uses PAN to enhance the features.

[0082] The neck, placed between the backbone and the head, fuses the features extracted by the backbone so that they are put to better use. PAN (Path Aggregation Network) is fast and performs well.
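The idea behind a PAN-style neck, a top-down pass followed by a bottom-up pass over multi-scale features, can be illustrated with a heavily simplified NumPy sketch. It assumes all levels share the same channel count and that each level is half the resolution of the previous one, and it replaces the learned convolutions of a real PAN with plain resampling and addition. The names `upsample2x`, `downsample2x`, and `pan_fuse` are hypothetical.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x: np.ndarray) -> np.ndarray:
    """Stride-2 subsampling of a (C, H, W) feature map."""
    return x[:, ::2, ::2]


def pan_fuse(feats):
    """Fuse feature maps ordered from high resolution to low resolution."""
    n = len(feats)
    # Top-down path: push coarse, semantic features into finer levels.
    td = [None] * n
    td[-1] = feats[-1]
    for i in range(n - 2, -1, -1):
        td[i] = feats[i] + upsample2x(td[i + 1])
    # Bottom-up path: push fine, localized features back into coarser levels.
    out = [td[0]]
    for i in range(1, n):
        out.append(td[i] + downsample2x(out[i - 1]))
    return out
```

After both passes every output level has received information from every input level, which is the sense in which the fused features are interconnected.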

[0083] Step S530: Use multi-head to detect the fused features separately.

[0084] The multi-head detectors serve as the detectors for the target bounding box, ground point, lane line coordinates, and hidden point. Their outputs are shown in Figure 6: the object (obstacle) heatmap gives the confidence score of each obstacle; the object offset gives the center point and width/height of the obstacle; the object classes give the obstacle category; the ground point offset gives the offset value relative to the center point of the object, which can be represented by the horizontal coordinate offset x and the vertical coordinate offset y in the coordinate system; the lane line heatmap gives the confidence score of each lane line point; the lane line offset gives the offset value of each lane line point relative to the starting point of the lane line, which can likewise be represented by the horizontal coordinate offset x and the vertical coordinate offset y; the hidden point heatmap gives the confidence score of each point being a hidden point; and the hidden point offset gives the integer x and y values of each point mapped from the point on the original image to the point on the feature map.
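The eight outputs described above can be pictured as separate heads reading the same fused feature map. In the NumPy sketch below, each head is a 1x1 convolution (a per-pixel linear map) with random weights; the channel counts, `NUM_CLASSES`, and the function names are assumptions for illustration, since the patent names the heads but not their sizes or layers.

```python
import numpy as np

NUM_CLASSES = 3  # assumed number of obstacle categories

# Head name -> number of output channels (all counts are assumptions).
HEADS = {
    "object_heatmap": 1,
    "object_offset": 4,          # center x, y and width, height
    "object_classes": NUM_CLASSES,
    "ground_point_offset": 2,    # x, y offset from the obstacle center
    "lane_heatmap": 1,
    "lane_offset": 2,            # x, y offset from the lane starting point
    "hidden_point_heatmap": 1,
    "hidden_point_offset": 2,
}


def make_heads(in_channels: int, rng: np.random.Generator):
    """Create one (out_channels, in_channels) weight matrix per head."""
    return {
        name: rng.standard_normal((out_c, in_channels)) * 0.01
        for name, out_c in HEADS.items()
    }


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def run_heads(fused: np.ndarray, weights) -> dict:
    """Apply every head to a shared (C, H, W) fused feature map."""
    outputs = {}
    for name, w in weights.items():
        y = np.einsum("oc,chw->ohw", w, fused)  # 1x1 convolution
        # Heatmaps are squashed to (0, 1) so they read as confidences.
        outputs[name] = sigmoid(y) if name.endswith("heatmap") else y
    return outputs
```

Because all heads share one backbone and neck, the feature computation is done once per image rather than once per task.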

[0085] Step S540: Obtain the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacle by using heatmap+offset regression processing.

[0086] Specifically, the bounding box coordinates, lane line coordinates, and hidden point coordinates are regressed using a heatmap + offset approach. The ground point coordinates are obtained by adding offsets to the obstacle's heatmap. The heatmap + offsets strategy reduces the difficulty of the regression and detection tasks, demonstrating good performance and time efficiency.
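One common way to realize a heatmap + offset scheme is to take local maxima of the heatmap as candidate points and refine each with the offset predicted at that cell. The sketch below follows that pattern; the 3x3 peak test, the `stride` of 4 between feature map and input image, the confidence threshold, and the treatment of the offset as a sub-cell correction are all assumptions rather than details stated in the patent.

```python
import numpy as np


def decode_peaks(heatmap, offset, stride=4, threshold=0.5):
    """Turn a heatmap and an offset map into image-space points.

    heatmap: (H, W) float array of confidences.
    offset:  (2, H, W) float array holding (dx, dy) for every cell.
    Returns a list of (x, y, confidence) in input-image pixels.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the 3x3 neighborhood of every cell (the cell itself included).
    neighbors = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(3)
        for dx in range(3)
    ])
    # A peak is a local maximum whose confidence exceeds the threshold.
    is_peak = (heatmap >= neighbors.max(axis=0)) & (heatmap > threshold)
    points = []
    for iy, ix in zip(*np.nonzero(is_peak)):
        dx, dy = offset[0, iy, ix], offset[1, iy, ix]
        points.append(((ix + dx) * stride, (iy + dy) * stride,
                       float(heatmap[iy, ix])))
    return points
```

The heatmap answers "where, roughly, and how confidently", and the offset recovers the precision lost to the coarse grid, which is why the regression task becomes easier than predicting absolute coordinates directly.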

[0087] The vehicle vision perception device provided by the present invention is described below. The vehicle vision perception device described below and the vehicle vision perception method described above can be referred to in correspondence.

[0088] As shown in Figure 7, in one embodiment, the present invention also provides a vehicle vision perception device, comprising:

[0089] The first acquisition module 710 is used to acquire training images with label information, including obstacles and lane lines. The label information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0090] The training construction module 720 is used to train a target detection network with feature extraction, feature fusion and multi-head detection functions using multiple training images to build a visual perception model.

[0091] The second acquisition module 730 is used to acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera.

[0092] The prediction module 740 is used to input the target image into the visual perception model and predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle based on the output of the visual perception model.

[0093] In this embodiment, the training construction module is specifically used for:

[0094] Extract a first feature and a second feature from the training image, and perform feature fusion on the first feature and the second feature. The first feature includes the original coordinates of the obstacle and its bounding box and the ground point. The second feature includes the lane line and the original coordinates of the lane line and the original coordinates of the hidden point.

[0095] Multi-head detection is performed on the fused first and second features to output the attributes of obstacles and lane lines respectively. The attributes of obstacles and lane lines include the heat map, offset and category of obstacles, the offset of grounding points, the heat map and offset of lane lines, and the heat map and offset of hidden points.

[0096] The attributes of the obstacles and lane lines are regressed to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacles.

[0097] In this embodiment, the training construction module is also specifically used for:

[0098] Based on the heatmap, offset, and category of the obstacle and the offset of the ground point, the following are obtained respectively: the confidence level of the obstacle, the center point and the width and height of the obstacle, the category of the obstacle, and the offset value of the ground point relative to the center point of the obstacle.

[0099] Based on the lane line heatmap, offset, and the heatmap and offset of the hidden point, the confidence level of the lane line, the offset value of the lane line point set relative to the starting point, and the confidence level and coordinates of the hidden point are obtained.
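
The obstacle branch of this step can be sketched as follows. This is a minimal illustration that assumes a CenterNet-style decoding, which the description does not name: a heatmap cell counts as a detection when it passes a threshold and is the maximum of its 3x3 neighbourhood, and the offset map stores (dx, dy, w, h) per cell. The stride, the threshold, the map layouts, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def find_peaks(heatmap: Grid, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for cells that pass the threshold and are
    the maximum of their 3x3 neighbourhood (ties are all kept)."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbourhood = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(neighbourhood):
                peaks.append((r, c, score))
    return peaks


def decode_obstacles(heatmap, offsets, class_scores, ground_offsets,
                     stride: int = 4, threshold: float = 0.5) -> List[Dict]:
    """Turn per-cell head outputs into obstacle attributes.

    Assumed layouts: offsets[r][c] = (dx, dy, w, h),
    class_scores[r][c] = list of per-class scores,
    ground_offsets[r][c] = (gx, gy) relative to the obstacle center.
    """
    results = []
    for r, c, score in find_peaks(heatmap, threshold):
        dx, dy, w, h = offsets[r][c]
        scores = class_scores[r][c]
        results.append({
            "confidence": score,
            # Sub-cell offset refines the peak, stride maps back to pixels.
            "center": ((c + dx) * stride, (r + dy) * stride),
            "size": (w, h),
            "category": max(range(len(scores)), key=scores.__getitem__),
            "ground_offset": ground_offsets[r][c],
        })
    return results
```

The lane line and hidden point heatmaps could be decoded with the same peak search, with the lane offset map holding one (dx, dy) pair per lane point instead of a box size.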

[0100] In this embodiment, the training construction module is also specifically used for:

[0101] The center point and the width and height of the obstacle are converted into the bounding box coordinates of the obstacle, and the offset value of the ground point relative to the center point of the obstacle is converted into the ground point coordinates;

[0102] The offset of the lane line point set relative to the starting point is converted into the lane line coordinates.
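
The conversions in this step amount to simple arithmetic, sketched below. The sketch assumes that the output bounding box uses upper-left and lower-right corners and that lane offsets are relative to a single starting point; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_from_center(center: Point, size: Point) -> Tuple[Point, Point]:
    """(center, (width, height)) -> (upper-left, lower-right) corners."""
    cx, cy = center
    w, h = size
    return (cx - w / 2.0, cy - h / 2.0), (cx + w / 2.0, cy + h / 2.0)


def ground_point_from_offset(center: Point, offset: Point) -> Point:
    """Add the predicted offset to the obstacle center."""
    return (center[0] + offset[0], center[1] + offset[1])


def lane_from_offsets(start: Point, offsets: List[Point]) -> List[Point]:
    """Add each per-point offset to the lane line starting point."""
    return [(start[0] + dx, start[1] + dy) for dx, dy in offsets]
```

For instance, a center of (30, 40) with a width and height of 40 corresponds to corners (10, 20) and (50, 60).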

[0103] In this embodiment, the vehicle visual perception device further includes:

[0104] The conversion representation module is used for:

[0105] The original coordinates of the bounding box, which are the coordinates of the upper-left and lower-right corners of the bounding box, are represented as the coordinates of the obstacle's center point and its width and height. The original coordinates of the ground point are represented as the offset value relative to the obstacle's center point.

[0106] The original coordinates of the point set of the lane line are represented as offset values relative to the starting point of the lane line, and the original coordinates of the hidden point are retained.
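
The label conversion performed by this module can be sketched as follows. This is an illustration only: it assumes the starting point of a lane line is the first point in its labeled point set, which the description does not state, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_box(upper_left: Point, lower_right: Point) -> Tuple[Point, Point]:
    """Corner coordinates -> (center point, (width, height))."""
    (x1, y1), (x2, y2) = upper_left, lower_right
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    size = (x2 - x1, y2 - y1)
    return center, size


def encode_ground_point(ground: Point, center: Point) -> Point:
    """Ground point -> offset relative to the obstacle center."""
    return (ground[0] - center[0], ground[1] - center[1])


def encode_lane(points: List[Point]) -> Tuple[Point, List[Point]]:
    """Lane point set -> (starting point, offsets relative to it).

    Assumption: the first labeled point is the starting point.
    """
    start = points[0]
    offsets = [(x - start[0], y - start[1]) for x, y in points]
    return start, offsets
```

The hidden point needs no conversion, since its original coordinates are retained as the training target.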

[0107] In this embodiment, the training construction module is also specifically used for:

[0108] A backbone network is selected to extract features, with ShuffleNetV2 used as the backbone;

[0109] The extracted features are fused using a Neck network, where the Neck network uses PAN to enhance the features;

[0110] Multi-head detection is used to detect the fused features separately;

[0111] The bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of obstacles are obtained by regression processing using heatmap + offset.
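
The data flow of the fusion stage can be illustrated with a toy example, assuming that PAN refers to the Path Aggregation Network topology: a top-down pass followed by a bottom-up pass over a feature pyramid. The sketch uses single-channel grids and replaces every learned convolution with plain addition, nearest-neighbour upsampling, and 2x2 max pooling, so it shows only how the levels are connected and not the actual network of this embodiment. The level names c3, c4, and c5 and all function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def upsample2x(g: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in g:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(g: Grid) -> Grid:
    """2x2 max pooling, standing in for a stride-2 convolution."""
    return [
        [max(g[r][c], g[r][c + 1], g[r + 1][c], g[r + 1][c + 1])
         for c in range(0, len(g[0]), 2)]
        for r in range(0, len(g), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pan_fuse(c3: Grid, c4: Grid, c5: Grid) -> Tuple[Grid, Grid, Grid]:
    """Fuse three pyramid levels, where c3 is the finest and c5 the coarsest."""
    # Top-down pass: coarse semantic features flow to finer levels.
    p5 = c5
    p4 = add(c4, upsample2x(p5))
    p3 = add(c3, upsample2x(p4))
    # Bottom-up pass: fine localization features flow back to coarser levels.
    n3 = p3
    n4 = add(p4, downsample2x(n3))
    n5 = add(p5, downsample2x(n4))
    return n3, n4, n5
```

After both passes every output level has received information from every input level, which is the sense in which the fused features are interconnected before the detection heads are applied.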

[0112] This vehicle vision perception device trains a target detection network with feature extraction, feature fusion, and multi-head detection functions using training images with labeled information, constructing a vision perception model. This model can predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of obstacles in target images captured by onboard cameras, thus completing vehicle vision perception. The feature extraction, fusion, and detection stages involve relatively low computational cost and are simple processes, achievable using a lightweight target detection network. The fused features are interconnected, and multi-task fusion from multiple detection heads reduces computational cost and improves accuracy, while also making full use of each feature's information. This addresses the shortcomings of existing ADAS perception algorithms, such as high computational cost, complex processes, and difficulty in fully utilizing image information, achieving a perception algorithm with low computational cost, simple process, and full utilization of image information.

[0113] Figure 8 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 8, the electronic device may include: a processor 810, a communications interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communications interface 820, and the memory 830 communicate with each other via the communication bus 840. The processor 810 can call logical instructions in the memory 830 to execute a vehicle vision perception method, which includes:

[0114] Acquire training images with labeled information, including obstacles and lane lines. The labeled information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0115] Multiple training images are used to train a target detection network with feature extraction, feature fusion, and multi-head detection functions to build a visual perception model;

[0116] Acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;

[0117] The target image is input into the visual perception model, and the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates are predicted based on the output of the visual perception model.

[0118] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0119] In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is able to execute the vehicle vision perception method provided by the above embodiments, the method comprising:

[0120] Acquire training images with labeled information, including obstacles and lane lines. The labeled information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0121] Multiple training images are used to train a target detection network with feature extraction, feature fusion, and multi-head detection functions to build a visual perception model;

[0122] Acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;

[0123] The target image is input into the visual perception model, and the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates are predicted based on the output of the visual perception model.

[0124] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the vehicle vision perception method provided by the above embodiments, the method comprising:

[0125] Acquire training images with labeled information, including obstacles and lane lines. The labeled information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the grounding point, the original coordinates of the lane line, and the original coordinates of the hidden point.

[0126] Multiple training images are used to train a target detection network with feature extraction, feature fusion, and multi-head detection functions to build a visual perception model;

[0127] Acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;

[0128] The target image is input into the visual perception model, and the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates are predicted based on the output of the visual perception model.

[0129] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0130] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0131] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A vehicle visual perception method, characterized in that the method includes:
acquiring training images with labeled information, the training images including obstacles and lane lines, wherein the labeled information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the ground point, the original coordinates of the lane line, and the original coordinates of the hidden point;
training a target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images to construct a visual perception model;
acquiring target images, including obstacles and lane lines, captured by the vehicle-mounted camera;
inputting the target image into the visual perception model, and predicting the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle based on the output of the visual perception model;
wherein the step of training the target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images includes:
selecting a backbone network to extract features, with ShuffleNetV2 as the backbone;
fusing the extracted features using a Neck network, where the Neck network uses PAN to enhance the features;
performing multi-head detection on the fused features separately;
obtaining the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of obstacles by regression processing using heatmap + offset.

2. The vehicle visual perception method according to claim 1, characterized in that the feature extraction, feature fusion, and multi-head detection include:
extracting a first feature and a second feature from the training image, and performing feature fusion on the first feature and the second feature, wherein the first feature includes the obstacle, the original coordinates of its bounding box, and the original coordinates of the ground point, and the second feature includes the lane line, the original coordinates of the lane line, and the original coordinates of the hidden point;
performing multi-head detection on the fused first and second features to output the attributes of obstacles and lane lines respectively, wherein the attributes of obstacles and lane lines include the heatmap, offset, and category of obstacles, the offset of ground points, the heatmap and offset of lane lines, and the heatmap and offset of hidden points;
regressing the attributes of the obstacles and lane lines to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacles.

3. The vehicle visual perception method according to claim 2, characterized in that the process of regressing the attributes of the obstacles and lane lines to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacles includes:
based on the heatmap, offset, and category of the obstacle and the offset of the ground point, obtaining respectively the confidence level of the obstacle, the center point and the width and height of the obstacle, the category of the obstacle, and the offset value of the ground point relative to the center point of the obstacle;
based on the heatmap and offset of the lane line and the heatmap and offset of the hidden point, obtaining the confidence level of the lane line, the offset value of the lane line point set relative to the starting point, and the confidence level and coordinates of the hidden point.

4. The vehicle visual perception method according to claim 3, characterized in that the process of regressing the attributes of the obstacles and lane lines to the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of the obstacles also includes:
representing the center point and the width and height of the obstacle as the bounding box coordinates of the obstacle, and representing the offset value of the ground point relative to the center point of the obstacle as the ground point coordinates;
representing the offset of the lane line point set relative to the starting point as the lane line coordinates.

5. The vehicle visual perception method according to claim 1, characterized in that the step of training the target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images also includes:
representing the original coordinates of the bounding box as the coordinates of the obstacle's center point and its width and height, and representing the original coordinates of the ground point as the offset value relative to the obstacle's center point, wherein the original coordinates of the bounding box are the coordinates of the upper-left and lower-right corners of the bounding box;
representing the original coordinates of the point set of the lane line as offset values relative to the starting point of the lane line, and retaining the original coordinates of the hidden point.

6. A vehicle visual perception device, characterized in that the device includes:
a first acquisition module, used to acquire training images with label information, the training images including obstacles and lane lines, wherein the label information includes the original coordinates of the bounding box of the obstacle, the original coordinates of the ground point, the original coordinates of the lane line, and the original coordinates of the hidden point;
a training construction module, used to train a target detection network with feature extraction, feature fusion, and multi-head detection functions using multiple training images to build a visual perception model;
a second acquisition module, used to acquire target images, including obstacles and lane lines, captured by the vehicle-mounted camera;
a prediction module, used to input the target image into the visual perception model and predict the target bounding box coordinates, target ground point coordinates, target lane line coordinates, and target hidden point coordinates of the obstacle based on the output of the visual perception model;
wherein the training construction module is specifically used for:
selecting a backbone network to extract features, with ShuffleNetV2 as the backbone;
fusing the extracted features using a Neck network, where the Neck network uses PAN to enhance the features;
performing multi-head detection on the fused features separately;
obtaining the bounding box coordinates, ground point coordinates, lane line coordinates, and hidden point coordinates of obstacles by regression processing using heatmap + offset.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the vehicle visual perception method as described in any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the vehicle visual perception method as described in any one of claims 1 to 5.

9. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the vehicle visual perception method as described in any one of claims 1 to 5.