Target detection method and device, vehicle, storage medium and chip
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIAOMI EV TECH CO LTD
- Filing Date
- 2022-07-12
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, sample data for monocular-vision 3D target detection models is labeled using LiDAR point clouds. Because LiDAR performs poorly at long range, too few points are collected for small targets at long distances, so these targets cannot be labeled. The resulting lack of sample data lowers the accuracy of the detection model and affects the safety of autonomous driving.
By combining pre-trained 3D and 2D detection models with environmental images and point cloud data, the target sample parameter information of small targets at a distance is determined, the labeled sample data is enriched, and the accuracy of the detection model is improved.
Training with richer sample data improves the accuracy of the 3D target detection model, thereby enhancing the safety of autonomous driving.
Figure CN115205848B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of vehicle technology, and more particularly to a target detection method, apparatus, vehicle, storage medium, and chip.
Background Technology
[0002] With the advancement of artificial intelligence technology, autonomous driving technology has experienced extremely rapid development. In the field of autonomous driving technology, accurate environmental perception is crucial for improving the safety of autonomous vehicles, leading to the emergence of monocular vision-based 3D object detection technology. Typically, monocular vision-based 3D object detection technology is based on deep learning methods and requires a large volume of high-precision labeled sample data.
[0003] In related technologies, image data and point cloud data are acquired simultaneously using cameras and LiDAR installed on vehicles, and sample data is obtained by annotating the image data based on the point cloud data. However, because the effective range of LiDAR is limited, very few points are acquired for small targets at long distances, making annotation impossible. This results in a small amount of sample data, leading to low accuracy of the trained 3D target detection model and impacting the safety of autonomous driving.
Summary of the Invention
[0004] To overcome the problems existing in related technologies, this disclosure provides a target detection method, apparatus, vehicle, storage medium, and chip.
[0005] According to a first aspect of the present disclosure, a target detection method is provided, applied to a vehicle, comprising:
[0006] Collect environmental images of the surrounding environment during the vehicle's operation;
[0007] The environmental image is input into a 3D target detection model to obtain target parameter information output by the 3D target detection model. The target parameter information includes the target position information, target size information, and target angle information of the target object in the environmental image.
[0008] The three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image. The target sample parameter information includes the target sample position information, target sample size information, and target sample angle information of the first sample object in the first sample image. The target sample parameter information is determined by the pre-trained three-dimensional detection model and two-dimensional detection model. The three-dimensional detection model is trained using second sample images. The distance between the second sample object in the second sample image and the vehicle is less than or equal to a preset distance threshold. The two-dimensional detection model is used to obtain the two-dimensional parameter information of the first sample object.
[0009] Optionally, the three-dimensional target detection model is trained in the following manner:
[0010] Acquire multiple images of the first sample;
[0011] For each of the first sample images, the first sample image is input into the three-dimensional detection model to obtain the first sample parameter information output by the three-dimensional detection model. The first sample image is also input into the two-dimensional detection model to obtain the two-dimensional parameter information output by the two-dimensional detection model. The first sample parameter information includes the first sample position information, the first sample size information, and the first sample angle information of the first sample object.
[0012] Based on multiple first sample parameter information and multiple two-dimensional parameter information, determine the target sample parameter information corresponding to each first sample image;
[0013] The first target neural network model is trained using multiple first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
[0014] Optionally,
[0015] The step of determining the target sample parameter information corresponding to each first sample image based on multiple first sample parameter information and multiple two-dimensional parameter information includes:
[0016] For each of the two-dimensional parameter information, if it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, the first target sample parameter information is used as the target sample parameter information. If it is determined that there is no first target sample parameter information among the multiple first sample parameter information, multiple feature maps corresponding to the first target sample image are determined through the three-dimensional detection model. Based on the two-dimensional parameter information and the multiple feature maps, the target sample parameter information corresponding to the first target sample image is determined. The first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information. The first target sample image is the first sample image corresponding to the two-dimensional parameter information.
[0017] Optionally,
[0018] The step of determining that there is a first target sample parameter information corresponding to the two-dimensional parameter information among a plurality of first sample parameter information includes:
[0019] For each of the first sample parameter information, a first detection box corresponding to the first sample parameter information is determined, and the intersection over union (IoU) of the second detection box corresponding to the two-dimensional parameter information and the first detection box is determined. If the IoU is greater than or equal to a preset IoU threshold, it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, and the first sample parameter information is used as the first target sample parameter information.
[0020] Optionally,
[0021] The step of determining the first detection box corresponding to the first sample parameter information includes:
[0022] Based on the intrinsic parameter information of the camera that captured the first sample image and the first sample position information, the position information of the center point of the first detection box corresponding to the first sample parameter information is determined.
[0023] The first detection box is determined based on the position information of the center point of the first detection box and the first sample size information.
[0024] Optionally,
[0025] The 3D detection model includes multiple detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer. The output of the first convolutional layer is coupled to the input of the second convolutional layer, and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer. Determining multiple feature maps corresponding to the first target sample image using the 3D detection model includes:
[0026] The first target sample image is input into the three-dimensional detection model to obtain multiple feature maps output by the prediction convolutional layers of the three-dimensional detection model;
[0027] The step of determining the target sample parameter information corresponding to the first target sample image based on the two-dimensional parameter information and the multiple feature maps includes:
[0028] Determine the feature position information of the center point of the second detection box corresponding to the two-dimensional parameter information on each feature map;
[0029] Based on the multiple feature location information and the two-dimensional parameter information, the target sample parameter information corresponding to the first target sample image is determined.
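The step of locating the center point of the second detection box on each feature map can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not specify how image coordinates map to feature-map coordinates, so the sketch assumes each feature map is the image downsampled by an integer stride. The function name `center_on_feature_maps` and the stride values are hypothetical.

```python
from typing import List, Sequence, Tuple


def center_on_feature_maps(
    box_2d: Tuple[float, float, float, float],
    strides: Sequence[int],
) -> List[Tuple[int, int]]:
    """Map the center of a 2D box (x1, y1, x2, y2) in pixels to a (row, col)
    cell on each feature map.

    Assumption (not from the source): feature map k is the input image
    downsampled by strides[k].
    """
    x1, y1, x2, y2 = box_2d
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    # Floor division picks the feature-map cell that contains the center.
    return [(int(center_y // s), int(center_x // s)) for s in strides]


# Hypothetical strides for three detection modules.
cells = center_on_feature_maps((100, 40, 140, 80), strides=(8, 16, 32))
```

The values read from each feature map at these cells could then be combined with the two-dimensional parameter information to form the target sample parameter information; how they are combined is not shown here.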
[0030] Optionally, the 3D detection model is pre-trained in the following manner:
[0031] Acquire multiple second sample images and second sample parameter information corresponding to each second sample image. The second sample parameter information includes the second sample position information, second sample size information, and second sample angle information of the second sample object. The second sample parameter information is determined based on the point cloud data corresponding to the second sample image.
[0032] The second target neural network model is trained using multiple second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
[0033] Optionally, the method further includes:
[0034] Based on the target parameter information, determine the vehicle's driving route;
[0035] The vehicle is controlled to drive automatically according to the stated driving route.
[0036] According to a second aspect of the present disclosure, a target detection device is provided, applied to a vehicle, comprising:
[0037] The acquisition module is configured to acquire environmental images of the surrounding environment during the vehicle's movement.
[0038] The information acquisition module is configured to input the environmental image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model. The target parameter information includes the target position information, target size information, and target angle information of the target object in the environmental image.
[0039] The three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image. The target sample parameter information includes the target sample position information, target sample size information, and target sample angle information of the first sample object in the first sample image. The target sample parameter information is determined by the pre-trained three-dimensional detection model and two-dimensional detection model. The three-dimensional detection model is trained using second sample images. The distance between the second sample object in the second sample image and the vehicle is less than or equal to a preset distance threshold. The two-dimensional detection model is used to obtain the two-dimensional parameter information of the first sample object.
[0040] Optionally, the three-dimensional target detection model is trained in the following manner:
[0041] Acquire multiple images of the first sample;
[0042] For each of the first sample images, the first sample image is input into the three-dimensional detection model to obtain the first sample parameter information output by the three-dimensional detection model. The first sample image is also input into the two-dimensional detection model to obtain the two-dimensional parameter information output by the two-dimensional detection model. The first sample parameter information includes the first sample position information, the first sample size information, and the first sample angle information of the first sample object.
[0043] Based on multiple first sample parameter information and multiple two-dimensional parameter information, determine the target sample parameter information corresponding to each first sample image;
[0044] The first target neural network model is trained using multiple first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
[0045] Optionally, determining the target sample parameter information corresponding to each first sample image based on the plurality of first sample parameter information and the plurality of two-dimensional parameter information includes:
[0046] For each of the two-dimensional parameter information, if it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, the first target sample parameter information is used as the target sample parameter information. If it is determined that there is no first target sample parameter information among the multiple first sample parameter information, multiple feature maps corresponding to the first target sample image are determined through the three-dimensional detection model. Based on the two-dimensional parameter information and the multiple feature maps, the target sample parameter information corresponding to the first target sample image is determined. The first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information. The first target sample image is the first sample image corresponding to the two-dimensional parameter information.
[0047] Optionally, determining that there exists first target sample parameter information corresponding to the two-dimensional parameter information among a plurality of first sample parameter information includes:
[0048] For each of the first sample parameter information, a first detection box corresponding to the first sample parameter information is determined, and the intersection over union (IoU) of the second detection box corresponding to the two-dimensional parameter information and the first detection box is determined. If the IoU is greater than or equal to a preset IoU threshold, it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, and the first sample parameter information is used as the first target sample parameter information.
[0049] Optionally, determining the first detection box corresponding to the first sample parameter information includes:
[0050] Based on the intrinsic parameter information of the camera that captured the first sample image and the first sample position information, the position information of the center point of the first detection box corresponding to the first sample parameter information is determined.
[0051] The first detection box is determined based on the position information of the center point of the first detection box and the first sample size information.
[0052] Optionally, the 3D detection model includes multiple detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer. The output of the first convolutional layer is coupled to the input of the second convolutional layer, and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer. Determining multiple feature maps corresponding to the first target sample image using the 3D detection model includes:
[0053] The first target sample image is input into the three-dimensional detection model to obtain multiple feature maps output by the prediction convolutional layers of the three-dimensional detection model;
[0054] The step of determining the target sample parameter information corresponding to the first target sample image based on the two-dimensional parameter information and the multiple feature maps includes:
[0055] Determine the feature position information of the center point of the second detection box corresponding to the two-dimensional parameter information on each feature map;
[0056] Based on the multiple feature location information and the two-dimensional parameter information, the target sample parameter information corresponding to the first target sample image is determined.
[0057] Optionally, the 3D detection model is pre-trained in the following manner:
[0058] Acquire multiple second sample images and second sample parameter information corresponding to each second sample image. The second sample parameter information includes the second sample position information, second sample size information, and second sample angle information of the second sample object. The second sample parameter information is determined based on the point cloud data corresponding to the second sample image.
[0059] The second target neural network model is trained using multiple second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
[0060] Optionally, the device further includes:
[0061] The route determination module is configured to determine the vehicle's driving route based on the target parameter information;
[0062] The control module is configured to control the vehicle to drive automatically according to the driving route.
[0063] According to a third aspect of the present disclosure, a vehicle is provided, comprising:
[0064] A first processor;
[0065] A memory for storing processor-executable instructions;
[0066] Wherein the first processor is configured to:
[0067] Implement the steps of the method described in the first aspect of this disclosure.
[0068] According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided that stores computer program instructions thereon, which, when executed by a processor, implement the steps of the method described in the first aspect of the present disclosure.
[0069] According to a fifth aspect of the present disclosure, a chip is provided, including a second processor and an interface; the second processor is configured to read instructions to execute the method described in the first aspect of the present disclosure.
[0070] The technical solutions provided by the embodiments of this disclosure can include the following beneficial effects: acquiring environmental images of the surrounding environment during the vehicle's operation; inputting the environmental images into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model, wherein the target parameter information includes target position information, target size information, and target angle information of the target object in the environmental image; wherein, the three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image, wherein the target sample parameter information includes target sample position information, target sample size information, and target sample angle information of the first sample object in the first sample image, wherein the target sample parameter information is determined by a pre-trained three-dimensional detection model and a two-dimensional detection model, wherein the three-dimensional detection model is trained using a second sample image, wherein the distance between the second sample object in the second sample image and the vehicle is less than or equal to a preset distance threshold, and the two-dimensional detection model is used to obtain the two-dimensional parameter information of the first sample object. In other words, this disclosure can determine the target sample parameter information corresponding to a sample image through a pre-trained three-dimensional detection model and a two-dimensional detection model, and can also annotate small targets at a distance, making the annotated sample data richer, improving the accuracy of the three-dimensional target detection model, thereby improving the safety of autonomous driving.
[0071] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Description of the Drawings
[0072] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0073] Figure 1 is a flowchart illustrating a target detection method according to an exemplary embodiment;
[0074] Figure 2 is a flowchart illustrating a training method for a three-dimensional target detection model according to an exemplary embodiment;
[0075] Figure 3 is a flowchart illustrating a training method for a three-dimensional detection model according to an exemplary embodiment;
[0076] Figure 4 is a flowchart illustrating another target detection method according to an exemplary embodiment;
[0077] Figure 5 is a block diagram illustrating a target detection device according to an exemplary embodiment;
[0078] Figure 6 is a block diagram illustrating another target detection device according to an exemplary embodiment;
[0079] Figure 7 is a functional block diagram of a vehicle according to an exemplary embodiment.
Detailed Description
[0080] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0081] It should be noted that all actions involving the acquisition of signals, information, or data in this application are carried out in compliance with the relevant data protection laws and policies of the country where the application is located, and with the authorization granted by the owner of the relevant device.
[0082] First, the application scenarios of this disclosure are explained. Collecting sample data for training a 3D target detection model places very high requirements on the intrinsic and extrinsic parameters of the LiDAR and camera installed on the vehicle, as well as on their time synchronization, which makes sample data collection difficult. Furthermore, the current sample data annotation process requires manual operation, so annotation efficiency is low. Consequently, sample data is limited, and the accuracy of the trained 3D target detection model is low. In addition, LiDAR performs poorly at long range. For small targets at a distance, the point cloud data collected by LiDAR is insufficient, making it impossible to annotate the targets. This prevents the acquisition of sample data corresponding to small targets at a distance, further reducing the accuracy of the 3D target detection model and affecting the safety of autonomous driving.
[0083] To address the aforementioned technical issues, this disclosure provides a target detection method, apparatus, vehicle, storage medium, and chip. By using pre-trained 3D and 2D detection models, the target sample parameter information corresponding to the sample image is determined. Small targets at a distance can also be labeled, resulting in richer labeled sample data and improving the accuracy of the 3D target detection model, thereby enhancing the safety of autonomous driving.
[0084] Figure 1 is a flowchart illustrating a target detection method according to an exemplary embodiment. The method is applied to a vehicle and, as shown in Figure 1, may include:
[0085] S101. Collect environmental images of the surrounding environment during vehicle operation.
[0086] In this step, while the vehicle is in motion, environmental images of the surrounding environment can be captured by the camera installed on the vehicle.
[0087] S102. Input the environmental image into the three-dimensional target detection model to obtain the target parameter information output by the three-dimensional target detection model.
[0088] The target parameter information may include the target position information, target size information, and target angle information of the target object in the environmental image. The target position information may be the three-dimensional coordinate information of the target object in the vehicle's body coordinate system, for example, the target position information may be (x, y, z). The target size information may be the size of the three-dimensional detection box corresponding to the target object, for example, the target size information may be (w, h, l). The target angle information may be the angle of the target object relative to the camera that captured the environmental image.
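The target parameter information described above can be pictured as a small record per detected object. The following is a minimal sketch, not the disclosed data format: the class name `TargetParams`, the field names, the sample values, and the choice of radians for the angle are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TargetParams:
    """Hypothetical container for one target object's parameter information."""

    # (x, y, z): three-dimensional coordinates in the vehicle's body coordinate system.
    position: Tuple[float, float, float]
    # (w, h, l): size of the three-dimensional detection box.
    size: Tuple[float, float, float]
    # Angle of the object relative to the capturing camera (radians assumed).
    angle: float


# Illustrative values only; they do not come from the disclosure.
example = TargetParams(position=(12.0, -1.5, 0.8), size=(1.8, 1.5, 4.2), angle=0.1)
```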
[0089] The three-dimensional target detection model can be pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image. The target sample parameter information includes the target sample position information, target sample size information, and target sample angle information of the first sample object in the first sample image. The target sample parameter information is determined by the pre-trained three-dimensional detection model and two-dimensional detection model. The three-dimensional detection model is trained using a second sample image. The distance between the second sample object in the second sample image and the vehicle is less than or equal to a preset distance threshold. The two-dimensional detection model is used to obtain the two-dimensional parameter information of the first sample object.
[0090] In this step, after acquiring the environmental image, the environmental image can be input into the 3D target detection model. The 3D target detection model can then detect the environmental image and determine the target parameter information corresponding to the environmental image.
[0091] By using the above method, the target sample parameter information corresponding to the sample image is determined through pre-trained 3D and 2D detection models. Small targets at a distance can also be labeled, making the labeled sample data richer and improving the accuracy of the 3D target detection model, thereby improving the safety of autonomous driving.
[0092] Figure 2 is a flowchart illustrating a training method for a 3D target detection model according to an exemplary embodiment. As shown in Figure 2, the method may include:
[0093] S21. Obtain multiple images of the first sample.
[0094] In this step, historical environmental images collected during the vehicle's travel within a historical time period can be used as the first sample image. These historical environmental images can include environmental images collected under different road conditions and at different time periods. The distance between the objects captured in different historical environmental images and the vehicle can also be different. For example, the historical environmental image can be an image of an object 10 meters away from the vehicle, or it can be an image of an object 30 meters away from the vehicle. This disclosure does not limit the method of collecting the first sample image.
[0095] S22. For each of the first sample images, input the first sample image into the three-dimensional detection model to obtain the first sample parameter information output by the three-dimensional detection model, and input the first sample image into the two-dimensional detection model to obtain the two-dimensional parameter information output by the two-dimensional detection model.
[0096] The first sample parameter information may include the first sample position information, the first sample size information, and the first sample angle information of the first sample object. It should be noted that the first sample position information, the first sample size information, and the first sample angle information are defined in the same way as the target position information, target size information, and target angle information in step S102, and will not be repeated here.
[0097] In this step, after acquiring multiple first sample images, each first sample image can be input into the 3D detection model and the 2D detection model respectively. The 3D detection model determines the first sample parameter information corresponding to the first sample image, and the 2D detection model determines the 2D parameter information corresponding to the first sample image. Then, the target sample parameter information corresponding to the first sample image can be determined by combining the first sample parameter information and the 2D parameter information. It should be noted that this disclosure does not limit the order in which the first sample parameter information and the 2D parameter information are determined.
[0098] S23. Based on the multiple first sample parameter information and the multiple two-dimensional parameter information, determine the target sample parameter information corresponding to each first sample image.
[0099] In one possible implementation, for each of the two-dimensional parameter information, if it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, the first target sample parameter information is used as the target sample parameter information. If it is determined that there is no first target sample parameter information among the multiple first sample parameter information, multiple feature maps corresponding to the first target sample image are determined through the three-dimensional detection model. Based on the two-dimensional parameter information and the multiple feature maps, the target sample parameter information corresponding to the first target sample image is determined. The first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
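The branching in this implementation can be sketched as a short loop over the two-dimensional detections. This is a hedged outline only: the function `build_target_labels` and its two callable parameters are hypothetical names, and the matching rule and the feature-map fallback are passed in as placeholders rather than implemented.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Box2D = TypeVar("Box2D")        # two-dimensional parameter information
Params3D = TypeVar("Params3D")  # first sample parameter information


def build_target_labels(
    boxes_2d: Sequence[Box2D],
    params_3d: Sequence[Params3D],
    find_match: Callable[[Box2D, Sequence[Params3D]], Optional[Params3D]],
    infer_from_features: Callable[[Box2D], Params3D],
) -> List[Params3D]:
    """Produce one target sample label per 2D detection.

    find_match returns the 3D output that belongs to the same object as the
    2D box, or None when the 3D detection model missed the object.
    infer_from_features stands in for the feature-map-based fallback.
    """
    labels: List[Params3D] = []
    for box in boxes_2d:
        matched = find_match(box, params_3d)
        if matched is not None:
            # The 3D model already detected this object: reuse its output.
            labels.append(matched)
        else:
            # Typically a small, distant object: fall back to the feature maps.
            labels.append(infer_from_features(box))
    return labels
```

Passing the two rules in as callables keeps the outline independent of how matching and the fallback are actually realized.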
[0100] Specifically, for each of the first sample parameter information, a first detection box corresponding to the first sample parameter information is determined, and the intersection over union (IoU) between the second detection box corresponding to the two-dimensional parameter information and the first detection box is determined. If the IoU is greater than or equal to a preset IoU threshold, it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, and the first sample parameter information is used as the first target sample parameter information.
[0101] For example, taking any first sample parameter information as an example, the position information of the center point of the first detection box corresponding to the first sample parameter information can be determined based on the intrinsic parameter information of the camera that captured the first sample image and the position information of the first sample; the first detection box can be determined based on the position information of the center point of the first detection box and the size information of the first sample. The intrinsic parameter information may include the focal length of the camera along the x-axis and y-axis, and the optical center position of the camera.
[0102] For example, the position information of the center point of the first detection box can be calculated using formula (1):
[0103] $x_{img} = \dfrac{f_x \, x_i}{z_i} + c_x$
[0104] $y_{img} = \dfrac{f_y \, y_i}{z_i} + c_y$ (1)
[0105] where $(x_{img}, y_{img})$ is the position information of the center point of the first detection box, $f_x$ is the focal length along the x-axis of the camera, $f_y$ is the focal length along the y-axis of the camera, $(c_x, c_y)$ is the optical center position of the camera, and $(x_i, y_i, z_i)$ is the position information of the first sample.
[0106] After calculating the position information of the center point of the first detection box, the boundary lines in the three directions of the x-axis, y-axis and z-axis can be determined by combining the size information of the first sample, and the first detection box can be obtained.
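The projection and box construction described above can be sketched as follows. This is a minimal illustration that assumes formula (1) is the standard pinhole projection and that the first sample position is expressed in the camera coordinate frame. The names Intrinsics, project_center and first_detection_box are hypothetical, and scaling the physical width and height by focal length over depth (ignoring the object's rotation) is an assumption made only for this example, not a step stated in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length along the x-axis, in pixels
    fy: float  # focal length along the y-axis, in pixels
    cx: float  # optical center, x coordinate in pixels
    cy: float  # optical center, y coordinate in pixels


def project_center(pos, k):
    """Project a camera-frame point (x_i, y_i, z_i) to pixel coordinates."""
    x, y, z = pos
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def first_detection_box(pos, size_wh, k):
    """Approximate image-plane box (x1, y1, x2, y2) around the projected center.

    Assumption: the physical width and height are scaled by
    focal length / depth, and the object's rotation is ignored.
    """
    u, v = project_center(pos, k)
    w, h = size_wh
    z = pos[2]
    half_w = 0.5 * k.fx * w / z
    half_h = 0.5 * k.fy * h / z
    return (u - half_w, v - half_h, u + half_w, v + half_h)
```

For instance, with fx = fy = 1000, an optical center of (640, 360), and an object centered at (2, 1, 20) in meters, the projected center falls at pixel (740, 410).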
[0107] After determining the first detection box corresponding to the first sample parameter information, the intersection over union (IoU) between the second detection box corresponding to the two-dimensional parameter information and the first detection box can be calculated using formula (2):
[0108] $score_{iou} = \dfrac{|bbox_a \cap bbox_b|}{|bbox_a \cup bbox_b|}$ (2)
[0109] where $score_{iou}$ is the IoU, $bbox_a$ is the first detection box, and $bbox_b$ is the second detection box.
[0110] After calculating the IoU of the first detection box and the second detection box, the preset IoU threshold is obtained. If the IoU is greater than or equal to the preset IoU threshold, it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information. The first sample parameter information is then used as the first target sample parameter information.
[0111] If the IoU is less than the preset IoU threshold, it is determined that there is no first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information. In this case, multiple feature maps corresponding to the first target sample image can be determined by the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image can be determined according to the two-dimensional parameter information and the multiple feature maps.
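The matching decision described above can be sketched as follows. This is a minimal illustration, assuming axis-aligned boxes given as (x1, y1, x2, y2) in pixels. The function names and the default threshold of 0.5 are hypothetical, and returning the highest-scoring box when several reach the threshold is a choice made for this example only; the disclosure only requires that the IoU be greater than or equal to the preset threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_first_sample(second_box, first_boxes, threshold=0.5):
    """Return the index of the best first detection box whose IoU with
    second_box reaches the threshold, or None if no box qualifies.

    None corresponds to the case where no first target sample parameter
    information exists, so the feature-map branch would be used instead.
    """
    best_idx, best_score = None, 0.0
    for idx, box in enumerate(first_boxes):
        score = iou(box, second_box)
        if score >= threshold and score > best_score:
            best_idx, best_score = idx, score
    return best_idx
```

A return value of None signals that the two-dimensional detection has no counterpart among the three-dimensional detections, which is the situation the disclosure associates with small targets at long range.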
[0112] In one possible implementation, the 3D detection model may include multiple detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer. The output of the first convolutional layer is coupled to the input of the second convolutional layer, and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer. The first target sample image can be input into the 3D detection model to obtain multiple feature maps output by the prediction convolutional layer of the 3D detection model; the feature position information of the center point of the second detection box corresponding to the 2D parameter information on each feature map is determined; and the target sample parameter information corresponding to the first target sample image is determined based on the multiple feature position information and the 2D parameter information.
[0113] For example, if the 3D detection model includes a position estimation module, a depth estimation module, an angle estimation module, and a size estimation module, then after inputting the first target sample image into the 3D detection model, four feature maps output by the four prediction convolutional layers of the 3D detection model can be obtained. Then, the feature position information of the center point of the second detection box on each feature map can be determined. By combining the multiple pieces of feature position information with the 2D parameter information, the target sample parameter information corresponding to the first target sample image can be determined. For example, the target sample size information in the target sample parameter information can be determined based on the feature map output by the depth estimation module and the 2D parameter information.
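The lookup of the second detection box's center point on each feature map can be sketched as follows. This is a minimal illustration under several assumptions not stated in the disclosure: each feature map is a plain nested list indexed as [row][column], all feature maps are downsampled from the image by a single integer stride, and the head names used as dictionary keys are hypothetical.

```python
def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def feature_position(center, stride, fmap_h, fmap_w):
    """Map an image-space point to a (row, col) cell of a feature map.

    The result is clamped so that it always addresses a valid cell.
    """
    col = min(max(int(center[0] // stride), 0), fmap_w - 1)
    row = min(max(int(center[1] // stride), 0), fmap_h - 1)
    return row, col


def read_predictions(feature_maps, box, stride):
    """Read each head's prediction at the center of the second detection box.

    feature_maps: dict mapping a head name to a 2D nested list of values.
    """
    center = box_center(box)
    out = {}
    for name, fmap in feature_maps.items():
        row, col = feature_position(center, stride, len(fmap), len(fmap[0]))
        out[name] = fmap[row][col]
    return out
```

The values read in this way would then be combined with the two-dimensional parameter information to form the target sample parameter information; how they are combined is not specified here and is left out of the sketch.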
[0114] S24. The first target neural network model is trained using multiple first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
[0115] In this step, after determining the target sample parameter information corresponding to each first sample image, the first target neural network model can be trained on the multiple first sample images and the target sample parameter information corresponding to each first sample image, using a model training method of the prior art, to obtain the three-dimensional target detection model. This will not be elaborated here.
[0116] The first target neural network model can be the same as the second target neural network model, or the first target neural network model can be the three-dimensional detection model trained based on the second target neural network model. This disclosure does not limit this.
[0117] Using the above model training method, when it is determined that there is first target sample parameter information corresponding to two-dimensional parameter information among multiple first sample parameter information, that is, for near-range targets, the target sample parameter information corresponding to the sample image can be directly determined by the pre-trained three-dimensional detection model. When it is determined that there is no first target sample parameter information among multiple first sample parameter information, that is, for far-range small targets, the two-dimensional detection model can be used to supplement targets that cannot be perceived by LiDAR, and the target sample parameter information corresponding to the sample image can be determined. This makes the labeled sample data richer, and a three-dimensional target detection model that can perceive far-range small targets can be trained, thereby improving the accuracy of the three-dimensional target detection model.
[0118] Figure 3 is a flowchart illustrating a training method for a 3D detection model according to an exemplary embodiment. As shown in Figure 3, the method may include:
[0119] S31. Obtain multiple second sample images and the second sample parameter information corresponding to each second sample image.
[0120] The second sample parameter information may include the second sample position information, the second sample size information, and the second sample angle information of the second sample object. The second sample parameter information is determined based on the point cloud data corresponding to the second sample image.
[0121] In this step, environmental images within a preset distance around the vehicle can be acquired as the second sample images, following the method for acquiring the first sample images in step S21. While the second sample images are acquired, point cloud data can be acquired through the lidar installed on the vehicle. Then, based on the point cloud data, each second sample image is labeled to obtain the second sample parameter information corresponding to each second sample image.
[0122] S32. The second target neural network model is trained using multiple second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
[0123] In this step, after determining the second sample parameter information corresponding to each second sample image, the second target neural network model can be trained on the multiple second sample images and the second sample parameter information corresponding to each second sample image, using a model training method of the prior art, to obtain the three-dimensional detection model. This will not be elaborated here.
[0124] Using the above model training method, the 3D detection model can be trained from the collected second sample images at close range. This 3D detection model can achieve automated annotation of close-range targets, thereby saving a lot of annotation costs.
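Restricting the lidar-labeled samples to close range can be sketched as follows. This is a minimal illustration, assuming each label is a dictionary with a "position" entry holding (x, y, z) in meters relative to the vehicle and that distance is measured as the Euclidean norm. The function names, the label layout and the threshold value used in the usage note are hypothetical.

```python
import math


def within_preset_distance(position, max_distance):
    """True if the object lies within max_distance of the vehicle."""
    x, y, z = position
    return math.sqrt(x * x + y * y + z * z) <= max_distance


def select_near_range_labels(labels, max_distance):
    """Keep only labels whose object is within the preset distance threshold."""
    return [
        label for label in labels
        if within_preset_distance(label["position"], max_distance)
    ]
```

With a threshold of 50 meters, for example, an object at (3, 0, 4) is kept because it is 5 meters away, while an object at (0, 0, 80) is dropped.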
[0125] Figure 4 is a flowchart illustrating another target detection method according to an exemplary embodiment. As shown in Figure 4, the method may further include:
[0126] S103. Based on the target parameter information, determine the vehicle's driving route.
[0127] In this step, after determining the target parameter information, a driving route that avoids the target object can be planned using existing methods.
[0128] S104. Control the vehicle to drive automatically according to the driving route.
[0129] In this step, after determining the driving route, the driving route can be sent to the vehicle's autonomous driving system, which will then control the vehicle to drive automatically.
[0130] In summary, this disclosure can train a 3D detection model using only close-range environmental images and point cloud data collected by LiDAR. The 3D detection model is used to automatically annotate the first sample image at close range, and the sample data is supplemented by small targets at a distance detected by the 2D detection model, thereby achieving the annotation of a large amount of sample data. The annotated sample data is richer, thus enabling the training of a 3D target detection model that can identify small targets at a distance. This improves the accuracy of the 3D target detection model and further enhances the safety of autonomous vehicle driving.
[0131] Figure 5 is a block diagram illustrating a target detection device according to an exemplary embodiment, the device being applied to a vehicle. As shown in Figure 5, the device may include:
[0132] The acquisition module 501 is configured to acquire environmental images of the surrounding environment during the vehicle's operation.
[0133] The information acquisition module 502 is configured to input the environmental image into the three-dimensional target detection model to obtain the target parameter information output by the three-dimensional target detection model. The target parameter information includes the target position information, target size information and target angle information of the target object in the environmental image.
[0134] The three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image. The target sample parameter information includes the target sample position information, target sample size information, and target sample angle information of the first sample object in the first sample image. The target sample parameter information is determined by the pre-trained three-dimensional detection model and two-dimensional detection model. The three-dimensional detection model is trained using a second sample image. The distance between the second sample object in the second sample image and the vehicle is less than or equal to a preset distance threshold. The two-dimensional detection model is used to obtain the two-dimensional parameter information of the first sample object.
[0135] Optionally, the 3D object detection model is trained in the following way:
[0136] Obtain multiple first sample images;
[0137] For each of the first sample images, the first sample image is input into the three-dimensional detection model to obtain the first sample parameter information output by the three-dimensional detection model, and the first sample image is input into the two-dimensional detection model to obtain the two-dimensional parameter information output by the two-dimensional detection model. The first sample parameter information includes the first sample position information, the first sample size information, and the first sample angle information of the first sample object.
[0138] Based on multiple first sample parameter information and multiple two-dimensional parameter information, determine the target sample parameter information corresponding to each first sample image;
[0139] The first target neural network model is trained using multiple first sample images and the target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
[0140] Optionally, determining the target sample parameter information corresponding to each first sample image based on multiple first sample parameter information and multiple two-dimensional parameter information includes:
[0141] For each of the two-dimensional parameter information, if it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, the first target sample parameter information is taken as the target sample parameter information. If it is determined that there is no first target sample parameter information among the multiple first sample parameter information, multiple feature maps corresponding to the first target sample image are determined through the three-dimensional detection model. Based on the two-dimensional parameter information and the multiple feature maps, the target sample parameter information corresponding to the first target sample image is determined. The first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information. The first target sample image is the first sample image corresponding to the two-dimensional parameter information.
[0142] Optionally, determining that the first target sample parameter information corresponding to the two-dimensional parameter information exists among the multiple first sample parameter information includes:
[0143] For each of the first sample parameter information, a first detection box corresponding to the first sample parameter information is determined, and the intersection over union (IoU) of the second detection box corresponding to the two-dimensional parameter information and the first detection box is determined. If the IoU is greater than or equal to a preset IoU threshold, it is determined that there is a first target sample parameter information corresponding to the two-dimensional parameter information among the multiple first sample parameter information, and the first sample parameter information is used as the first target sample parameter information.
[0144] Optionally, determining the first detection box corresponding to the first sample parameter information includes:
[0145] Based on the intrinsic parameter information of the camera that captured the first sample image and the position information of the first sample, the position information of the center point of the first detection box corresponding to the parameter information of the first sample is determined;
[0146] The first detection frame is determined based on the position information of the center point of the first detection frame and the size information of the first sample.
[0147] Optionally, the 3D detection model includes multiple detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer. The output of the first convolutional layer is coupled to the input of the second convolutional layer, and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer. The multiple feature maps corresponding to the first target sample image determined by the 3D detection model include:
[0148] The first target sample image is input into the 3D detection model to obtain multiple feature maps output by the prediction convolutional layer of the 3D detection model;
[0149] Based on the two-dimensional parameter information and multiple feature maps, the target sample parameter information corresponding to the first target sample image is determined as follows:
[0150] Determine the feature position information of the center point of the second detection box corresponding to the two-dimensional parameter information on each feature map;
[0151] Based on multiple feature location information and two-dimensional parameter information, the target sample parameter information corresponding to the first target sample image is determined.
[0152] Optionally, the 3D detection model is pre-trained in the following manner:
[0153] Acquire multiple second sample images and second sample parameter information corresponding to each second sample image. The second sample parameter information includes the second sample position information, second sample size information, and second sample angle information of the second sample object. The second sample parameter information is determined based on the point cloud data corresponding to the second sample image.
[0154] The second target neural network model is trained using multiple second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
[0155] Optionally, Figure 6 is a block diagram illustrating another target detection device according to an exemplary embodiment. As shown in Figure 6, the device further includes:
[0156] The route determination module 503 is configured to determine the vehicle's driving route based on the target parameter information;
[0157] The control module 504 is configured to control the vehicle to drive automatically according to the driving route.
[0158] The aforementioned device determines the target sample parameter information corresponding to the sample image through pre-trained 3D and 2D detection models. It can also annotate small targets at a distance, making the annotated sample data richer and improving the accuracy of the 3D target detection model, thereby enhancing the safety of autonomous driving.
[0159] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0160] This disclosure also provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the steps of the target detection method provided in this disclosure.
[0161] The aforementioned device can be a standalone electronic device or a part of a standalone electronic device. For example, in one embodiment, the device can be an integrated circuit (IC) or a chip, wherein the integrated circuit can be a single IC or a collection of multiple ICs. The chip can include, but is not limited to, the following types: GPU (Graphics Processing Unit), CPU (Central Processing Unit), FPGA (Field Programmable Gate Array), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), and SoC (System on Chip). The aforementioned integrated circuit or chip can be used to execute executable instructions (or code) to implement the target detection method described above. The executable instructions can be stored in the integrated circuit or chip or obtained from other devices or equipment. For example, the integrated circuit or chip may include a second processor, memory, and an interface for communicating with other devices. The executable instructions can be stored in the memory, and when the executable instructions are executed by the second processor, the target detection method described above can be implemented; or, the integrated circuit or chip can receive the executable instructions through the interface and transmit them to the second processor for execution, so as to implement the target detection method described above.
[0162] Refer to Figure 7, which is a functional block diagram illustrating a vehicle 600 according to an exemplary embodiment. The vehicle 600 can be configured for fully or partially autonomous driving modes. For example, the vehicle 600 can acquire environmental information about its surroundings through a perception system 620, and based on the analysis of the surrounding environmental information, derive an autonomous driving strategy to achieve fully autonomous driving, or present the analysis results to the user to achieve partial autonomous driving.
[0163] Vehicle 600 may include various subsystems, such as infotainment system 610, perception system 620, decision control system 630, drive system 640, and computing platform 650. Optionally, vehicle 600 may include more or fewer subsystems, and each subsystem may include multiple components. Furthermore, each subsystem and component of vehicle 600 may be interconnected via wired or wireless means.
[0164] In some embodiments, the infotainment system 610 may include a communication system 611, an entertainment system 612, and a navigation system 613.
[0165] Communication system 611 may include a wireless communication system that can communicate wirelessly with one or more devices directly or via a communication network. For example, the wireless communication system may use 3G cellular communication, such as CDMA, EVDO, GSM / GPRS, or 4G cellular communication, such as LTE, or 5G cellular communication. The wireless communication system may utilize WiFi or a wireless local area network (WLAN) to communicate. In some embodiments, the wireless communication system may utilize an infrared link, Bluetooth, or ZigBee to communicate directly with devices. Other wireless protocols, such as various vehicle communication systems, may also be used. For example, the wireless communication system may include one or more dedicated short-range communications (DSRC) devices that can enable public and / or private data communication between vehicles and / or roadside stations.
[0166] The entertainment system 612 may include a display device, a microphone, and speakers, allowing users to listen to the radio and play music in the vehicle; or connect their mobile phones to the vehicle and project their screens onto the display device, which may be touch-sensitive, allowing users to operate the system by touching the screen.
[0167] In some cases, the user's voice signal can be acquired through a microphone, and based on the analysis of the voice signal, the user can control certain aspects of the vehicle 600, such as adjusting the interior temperature. In other cases, music can be played to the user through the audio system.
[0168] The navigation system 613 may include map services provided by a map provider to provide navigation for the vehicle 600. The navigation system 613 can be used in conjunction with the vehicle's global positioning system 621 and inertial measurement unit 622. The map services provided by the map provider can be two-dimensional maps or high-precision maps.
[0169] The perception system 620 may include several sensors for sensing information about the environment surrounding the vehicle 600. For example, the perception system 620 may include a global positioning system 621 (which may be GPS, BeiDou, or other positioning systems), an inertial measurement unit (IMU) 622, a lidar 623, a millimeter-wave radar 624, an ultrasonic radar 625, and a camera device 626. The perception system 620 may also include sensors that monitor the internal systems of the vehicle 600 (e.g., an in-vehicle air quality monitor, fuel gauge, oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, speed, etc.). This detection and identification is a critical function for the safe operation of the vehicle 600.
[0170] The Global Positioning System 621 is used to estimate the geographical location of vehicle 600.
[0171] The inertial measurement unit 622 is used to sense changes in the pose of the vehicle 600 based on inertial acceleration. In some embodiments, the inertial measurement unit 622 may be a combination of an accelerometer and a gyroscope.
[0172] The lidar 623 uses lasers to sense objects in the environment in which the vehicle 600 is located. In some embodiments, the lidar 623 may include one or more laser sources, a laser scanner, and one or more detectors, as well as other system components.
[0173] The millimeter-wave radar 624 uses radio signals to sense objects in the surrounding environment of the vehicle 600. In some embodiments, in addition to sensing objects, the millimeter-wave radar 624 can also be used to sense the speed and / or direction of travel of objects.
[0174] The ultrasonic radar 625 can use ultrasonic signals to sense objects around the vehicle 600.
[0175] The camera device 626 is used to capture image information of the surrounding environment of the vehicle 600. The camera device 626 may include a monocular camera, a binocular camera, a structured light camera, and a panoramic camera, etc. The image information acquired by the camera device 626 may include still images or video stream information.
[0176] The decision control system 630 includes a computing system 631 that analyzes and makes decisions based on information acquired by the perception system 620. The decision control system 630 also includes a vehicle controller 632 that controls the power system of the vehicle 600, as well as a steering system 633, a throttle 634, and a braking system 635 for controlling the vehicle 600.
[0177] The computing system 631 is operable to process and analyze various information acquired by the perception system 620 to identify targets, objects, and / or features in the environment surrounding the vehicle 600. Targets may include pedestrians or animals, and objects and / or features may include traffic signals, road boundaries, and obstacles. The computing system 631 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other techniques. In some embodiments, the computing system 631 may be used to map the environment, track objects, estimate object speeds, etc. The computing system 631 can analyze the acquired information and derive a control strategy for the vehicle.
[0178] The vehicle controller 632 can be used to coordinate the control of the vehicle's power battery and engine 641 to improve the power performance of the vehicle 600.
[0179] The steering system 633 is operable to adjust the forward direction of the vehicle 600. For example, in one embodiment, it can be a steering wheel system.
[0180] Throttle 634 is used to control the operating speed of engine 641 and thus the speed of vehicle 600.
[0181] Braking system 635 is used to control the deceleration of vehicle 600. Braking system 635 can use friction to slow down wheel 644. In some embodiments, braking system 635 can convert the kinetic energy of wheel 644 into electric current. Braking system 635 may also take other forms to slow down the rotational speed of wheel 644 to control the speed of vehicle 600.
[0182] The drive system 640 may include components that provide powered motion to the vehicle 600. In one embodiment, the drive system 640 may include an engine 641, an energy source 642, a transmission system 643, and wheels 644. The engine 641 may be an internal combustion engine, an electric motor, an air-compressed engine, or other types of engine combinations, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air-compressed engine. The engine 641 converts the energy source 642 into mechanical energy.
[0183] Examples of energy sources 642 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. Energy source 642 can also provide energy to other systems of vehicle 600.
[0184] The transmission system 643 transmits mechanical power from the engine 641 to the wheels 644. The transmission system 643 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission system 643 may also include other components, such as a clutch. The drive shaft may include one or more axles that can be coupled to one or more wheels 644.
[0185] Some or all of the functions of vehicle 600 are controlled by computing platform 650. Computing platform 650 may include at least one first processor 651, which can execute instructions 653 stored in a non-transitory computer-readable medium such as memory 652. In some embodiments, computing platform 650 may also be multiple computing devices that control individual components or subsystems of vehicle 600 in a distributed manner.
[0186] The first processor 651 can be any conventional processor, such as a commercially available CPU. Alternatively, the first processor 651 may also include a graphics processing unit (GPU), a field-programmable gate array (FPGA), a system-on-a-chip (SoC), an application-specific integrated circuit (ASIC), or a combination thereof. Although Figure 7 functionally illustrates the processor, memory, and other components of the computer within the same block, those skilled in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be housed in the same physical enclosure. For example, the memory may be a hard disk drive or other storage medium located in an enclosure different from that of the computer. Therefore, references to a processor or computer will be understood to include references to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as steering and deceleration components, may each have their own processor, which performs only the calculations related to that component's specific function.
[0187] In this embodiment of the disclosure, the first processor 651 can execute the target detection method described above.
[0188] In various aspects described herein, the first processor 651 may be located remotely from the vehicle and communicate wirelessly with the vehicle. In other aspects, some of the processes described herein are executed on a processor located within the vehicle, while others are executed by a remote processor, including taking the necessary steps to perform a single operation.
[0189] In some embodiments, memory 652 may contain instructions 653 (e.g., program logic) that can be executed by first processor 651 to perform various functions of vehicle 600. Memory 652 may also contain additional instructions, including instructions for sending data to, receiving data from, interacting with, and / or controlling one or more of the infotainment system 610, perception system 620, decision control system 630, and drive system 640.
[0190] In addition to instructions 653, memory 652 may also store data such as road maps, route information, vehicle position, direction, speed, and other vehicle data, as well as other information. This information can be used by vehicle 600 and computing platform 650 during operation of vehicle 600 in autonomous, semi-autonomous, and / or manual modes.
[0191] The computing platform 650 can control the functions of the vehicle 600 based on inputs received from various subsystems, such as the drive system 640, the perception system 620, and the decision control system 630. For example, the computing platform 650 can utilize inputs from the decision control system 630 to control the steering system 633 to avoid obstacles detected by the perception system 620. In some embodiments, the computing platform 650 is operable to provide control over many aspects of the vehicle 600 and its subsystems.
[0192] Optionally, one or more of these components may be installed separately from or associated with the vehicle 600. For example, the memory 652 may exist partially or completely separately from the vehicle 600. The components may be communicatively coupled together in a wired and / or wireless manner.
[0193] Optionally, the components described above are merely examples. In actual applications, components in each of the above modules may be added or removed as needed. Figure 7 should not be construed as a limitation on the embodiments disclosed herein.
[0194] Autonomous vehicles traveling on roads, such as vehicle 600 above, can identify objects in their surroundings to determine adjustments to their current speed. These objects can be other vehicles, traffic control equipment, or other types of objects. In some examples, each identified object can be considered independently, and based on the object's individual characteristics, such as its current speed, acceleration, and distance from the vehicle, the speed adjustment to be made by the autonomous vehicle can be determined.
[0195] Optionally, vehicle 600 or its associated perception and computing devices (e.g., computing system 631, computing platform 650) can predict the behavior of the identified objects based on the characteristics of the identified objects and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.). Optionally, because the behavior of each identified object may depend on the behavior of the others, all identified objects can be considered together to predict the behavior of a single identified object. Vehicle 600 can adjust its speed based on the predicted behavior of the identified objects. In other words, the autonomous vehicle can determine what state it needs to adjust to (e.g., accelerate, decelerate, or stop) based on the predicted behavior of the objects. In this process, other factors can also be considered in determining the speed of vehicle 600, such as the lateral position of vehicle 600 in the road, the curvature of the road, the proximity of static and dynamic objects, etc.
[0196] In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device can also provide instructions to modify the steering angle of the vehicle 600 so that the autonomous vehicle follows a given trajectory and / or maintains a safe lateral and longitudinal distance from objects near the autonomous vehicle (e.g., vehicles in adjacent lanes on the road).
[0197] The vehicle 600 described above can be any type of vehicle, such as a car, truck, motorcycle, bus, boat, airplane, helicopter, recreational vehicle, train, etc. This disclosure does not impose any particular limitation.
[0198] In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable device, the computer program having a code portion for performing the target detection method described above when executed by the programmable device.
[0199] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of this disclosure. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.
[0200] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. A target detection method, characterized in that the method is applied to a vehicle and comprises: collecting an environmental image of the surrounding environment while the vehicle is traveling; and inputting the environmental image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model, wherein the target parameter information includes target position information, target size information, and target angle information of a target object in the environmental image; the three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image; the target sample parameter information includes target sample position information, target sample size information, and target sample angle information of a first sample object in the first sample image; the target sample parameter information is determined by means of a pre-trained three-dimensional detection model and a two-dimensional detection model; the three-dimensional detection model is trained using second sample images, the distance between a second sample object in each second sample image and the vehicle being less than or equal to a preset distance threshold; the two-dimensional detection model is used to obtain two-dimensional parameter information of the first sample object, and the three-dimensional detection model is used to obtain first sample parameter information of the first sample object; and the target sample parameter information is determined based on multiple pieces of first sample parameter information and multiple pieces of two-dimensional parameter information.
2. The method according to claim 1, characterized in that the three-dimensional target detection model is trained in the following way: acquiring multiple first sample images; for each first sample image, inputting the first sample image into the three-dimensional detection model to obtain the first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain the two-dimensional parameter information output by the two-dimensional detection model, wherein the first sample parameter information includes first sample position information, first sample size information, and first sample angle information of the first sample object; determining, based on multiple pieces of first sample parameter information and multiple pieces of two-dimensional parameter information, the target sample parameter information corresponding to each first sample image; and training a first target neural network model using the multiple first sample images and the target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
3. The method according to claim 2, characterized in that determining the target sample parameter information corresponding to each first sample image based on multiple pieces of first sample parameter information and multiple pieces of two-dimensional parameter information comprises: for each piece of two-dimensional parameter information, if it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists among the multiple pieces of first sample parameter information, using the first target sample parameter information as the target sample parameter information; and if it is determined that no first target sample parameter information exists among the multiple pieces of first sample parameter information, determining multiple feature maps corresponding to a first target sample image through the three-dimensional detection model, and determining the target sample parameter information corresponding to the first target sample image based on the two-dimensional parameter information and the multiple feature maps; wherein the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
4. The method according to claim 3, characterized in that determining that first target sample parameter information corresponding to the two-dimensional parameter information exists among the multiple pieces of first sample parameter information comprises: for each piece of first sample parameter information, determining a first detection box corresponding to the first sample parameter information, and determining the intersection over union (IoU) of a second detection box corresponding to the two-dimensional parameter information and the first detection box; and if the IoU is greater than or equal to a preset IoU threshold, determining that first target sample parameter information corresponding to the two-dimensional parameter information exists among the multiple pieces of first sample parameter information, and using that first sample parameter information as the first target sample parameter information.
5. The method according to claim 4, characterized in that determining the first detection box corresponding to the first sample parameter information comprises: determining position information of the center point of the first detection box corresponding to the first sample parameter information based on intrinsic parameter information of the camera that captured the first sample image and the first sample position information; and determining the first detection box based on the position information of the center point of the first detection box and the first sample size information.
6. The method according to claim 3, characterized in that the three-dimensional detection model includes multiple detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, wherein the output of the first convolutional layer is coupled to the input of the second convolutional layer, and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer; determining multiple feature maps corresponding to the first target sample image through the three-dimensional detection model comprises: inputting the first target sample image into the three-dimensional detection model to obtain multiple feature maps output by the prediction convolutional layers of the three-dimensional detection model; and determining the target sample parameter information corresponding to the first target sample image based on the two-dimensional parameter information and the multiple feature maps comprises: determining feature position information of the center point of the second detection box corresponding to the two-dimensional parameter information on each feature map; and determining the target sample parameter information corresponding to the first target sample image based on the multiple pieces of feature position information and the two-dimensional parameter information.
7. The method according to claim 1, characterized in that the three-dimensional detection model is pre-trained in the following manner: acquiring multiple second sample images and second sample parameter information corresponding to each second sample image, wherein the second sample parameter information includes second sample position information, second sample size information, and second sample angle information of the second sample object, and the second sample parameter information is determined based on point cloud data corresponding to the second sample image; and training a second target neural network model using the multiple second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
8. The method according to any one of claims 1-7, characterized in that the method further comprises: determining a driving route of the vehicle based on the target parameter information; and controlling the vehicle to drive automatically along the driving route.
9. A target detection device, characterized in that the device is applied to a vehicle and comprises: an acquisition module configured to acquire an environmental image of the surrounding environment while the vehicle is traveling; and an information acquisition module configured to input the environmental image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model, wherein the target parameter information includes target position information, target size information, and target angle information of a target object in the environmental image; the three-dimensional target detection model is pre-trained using multiple first sample images and target sample parameter information corresponding to each first sample image; the target sample parameter information includes target sample position information, target sample size information, and target sample angle information of a first sample object in the first sample image; the target sample parameter information is determined by means of a pre-trained three-dimensional detection model and a two-dimensional detection model; the three-dimensional detection model is trained using second sample images, the distance between a second sample object in each second sample image and the vehicle being less than or equal to a preset distance threshold; the two-dimensional detection model is used to obtain two-dimensional parameter information of the first sample object, and the three-dimensional detection model is used to obtain first sample parameter information of the first sample object; and the target sample parameter information is determined based on multiple pieces of first sample parameter information and multiple pieces of two-dimensional parameter information.
10. A vehicle, characterized in that it comprises: a first processor; and a memory for storing processor-executable instructions; wherein the first processor is configured to implement the steps of the method according to any one of claims 1-8.
11. A computer-readable storage medium storing computer program instructions thereon, characterized in that the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-8.
12. A chip, characterized in that it comprises a second processor and an interface, wherein the second processor is configured to read instructions to execute the method of any one of claims 1-8.
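For illustration only, the matching step recited in claims 3 to 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `Sample3D`, `first_detection_box`, `iou`, and `match_2d_to_3d` are hypothetical, as is the 0.5 threshold. The sketch assumes a pinhole camera with intrinsics (fx, fy, cx, cy), assumes the pixel extent of the first detection box is the metric width and height scaled by focal length over depth, and picks the highest-overlap candidate when several meet the threshold. The overlap measure is the ratio of intersection area to union area (IoU) of the two detection boxes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]
# Assumed pinhole intrinsics (fx, fy, cx, cy) in pixels
Intrinsics = Tuple[float, float, float, float]


@dataclass
class Sample3D:
    """Hypothetical container for one piece of first sample parameter information."""
    x: float       # position in the camera frame, metres
    y: float
    z: float       # depth along the optical axis, must be > 0
    width: float   # size, metres
    height: float
    length: float
    yaw: float     # angle, radians


def first_detection_box(s: Sample3D, k: Intrinsics) -> Box:
    """Project the 3D center with the intrinsics, then build a 2D box around it."""
    fx, fy, cx, cy = k
    u = fx * s.x / s.z + cx          # center point, pixel column
    v = fy * s.y / s.z + cy          # center point, pixel row
    # Assumption: pixel size = metric size * focal length / depth.
    w_px = fx * s.width / s.z
    h_px = fy * s.height / s.z
    return (u - w_px / 2, v - h_px / 2, u + w_px / 2, v + h_px / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_2d_to_3d(box_2d: Box, samples: List[Sample3D], k: Intrinsics,
                   threshold: float = 0.5) -> Optional[Sample3D]:
    """Return the 3D detection whose projected box overlaps box_2d with
    IoU >= threshold, or None if no candidate qualifies."""
    best, best_score = None, threshold
    for s in samples:
        score = iou(box_2d, first_detection_box(s, k))
        if score >= best_score:
            best, best_score = s, score
    return best
```

A return value of `None` corresponds to the second branch of claim 3, in which the target sample parameter information is instead derived from the feature maps of the three-dimensional detection model; that branch is not sketched here.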