Three-dimensional target detection method, computer device, storage medium and vehicle
By training a 3D target detection model and utilizing the consistency loss function between 2D and 3D information, the problem of inaccurate long-distance target detection in existing technologies is solved, and accurate detection of both long-distance and short-distance targets is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 安徽蔚来智驾科技有限公司
- Filing Date
- 2022-06-28
- Publication Date
- 2026-06-23
AI Technical Summary
In existing technologies, the detection range of lidar is short, which means it can only accurately detect three-dimensional targets at close range, but cannot accurately detect three-dimensional targets at long range.
The three-dimensional object detection model is used to detect objects in two-dimensional images to obtain the three-dimensional information of the objects to be detected in the two-dimensional images. The model is trained using the two-dimensional information consistency loss function and the three-dimensional information consistency loss function to obtain the trained three-dimensional object detection model.
Even if the actual three-dimensional information of the target sample in the two-dimensional image sample cannot be obtained, the model can be trained through geometric constraints to achieve accurate three-dimensional detection of distant targets, thus improving the accuracy of detection.
Figure CN115205846B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of visual inspection technology, specifically providing a three-dimensional target detection method, computer equipment, storage medium, and vehicle.
Background Technology
[0002] To improve the accuracy of 3D target detection in 2D images, a common approach is to use a jointly calibrated LiDAR and camera to acquire 3D information such as the location of the target. This information is then used as the label of the 2D image samples containing the target, and the 2D image samples and their labels are used to train a 3D target detection model, which is then used to detect 3D targets in 2D images. However, LiDAR typically has a short detection range and can only acquire 3D information such as the location of nearby targets. As a result, this method can accurately detect nearby 3D targets but not distant ones.
[0003] Accordingly, a new technical solution is needed in this field to solve the above problems.
Summary of the Invention
[0004] To overcome the above-mentioned deficiencies, the present invention is proposed to provide a three-dimensional target detection method, computer equipment, storage medium, and vehicle that solves or at least partially solves the technical problem of accurately detecting three-dimensional targets at both near and far distances simultaneously, thereby improving the accuracy of target detection.
[0005] In a first aspect, a three-dimensional target detection method is provided, the method comprising:
[0006] A three-dimensional target detection model is used to detect targets in a two-dimensional image, thereby obtaining the three-dimensional information of the target to be detected in the two-dimensional image.
[0007] The three-dimensional target detection model is trained in the following way:
[0008] A three-dimensional target detection model to be trained is used to perform target detection on two-dimensional image samples, thereby obtaining two-dimensional detection information and three-dimensional prediction information of the target samples in the two-dimensional image samples.
[0009] The three-dimensional prediction information is projected to obtain two-dimensional projection information;
[0010] Based on the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to train the three-dimensional target detection model to be trained, so as to obtain the trained three-dimensional target detection model.
[0011] In one technical solution of the above-mentioned three-dimensional object detection method, the step of "training the three-dimensional object detection model to be trained using a two-dimensional information consistency loss function based on the two-dimensional detection information and the two-dimensional projection information to obtain a trained three-dimensional object detection model" specifically includes:
[0012] Based on the sample labels of the two-dimensional image samples, determine whether each target sample has actual three-dimensional information;
[0013] If the current target sample has actual three-dimensional information, the three-dimensional target detection model to be trained is trained using the three-dimensional information consistency loss function based on the actual three-dimensional information and the three-dimensional prediction information of the current target sample.
[0014] If the current target sample does not have actual three-dimensional information, the three-dimensional target detection model to be trained is trained using the two-dimensional information consistency loss function based on the two-dimensional detection information and the two-dimensional projection information of the current target sample.
[0015] In one technical solution of the above-mentioned three-dimensional object detection method, after the step of "training the three-dimensional object detection model to be trained using a two-dimensional information consistency loss function based on the two-dimensional detection information and the two-dimensional projection information to obtain a trained three-dimensional object detection model", the method further includes training the trained three-dimensional object detection model in the following manner to correct the trained three-dimensional object detection model:
[0016] Determine whether the sample label of the two-dimensional image sample contains the actual three-dimensional information of the target sample;
[0017] If it does, the trained 3D target detection model is trained using the 3D information consistency loss function based on the actual 3D information and the predicted 3D information, to obtain the final 3D target detection model.
[0018] If it does not, the trained 3D target detection model is not further trained.
[0019] In one technical solution of the above-mentioned three-dimensional target detection method, the step of "performing target detection on two-dimensional image samples using a three-dimensional target detection model to be trained, and obtaining two-dimensional detection information and three-dimensional prediction information of target samples in the two-dimensional image samples" specifically includes:
[0020] Target detection is performed on the two-dimensional image sample using the three-dimensional target detection model to be trained, to obtain the two-dimensional detection box of the target sample.
[0021] Based on the two-dimensional detection information and three-dimensional prediction information of the two-dimensional detection box, the two-dimensional detection information and three-dimensional prediction information of the target sample are determined respectively.
[0022] In one technical solution of the above-mentioned three-dimensional target detection method, the method further includes:
[0023] The two-dimensional information consistency loss function is established using the squared loss function;
[0024] And / or, the three-dimensional information consistency loss function is established using the squared loss function.
[0025] In one technical solution of the above-mentioned three-dimensional target detection method, both the three-dimensional predicted information and the three-dimensional actual information include at least the three-dimensional coordinates, size, and orientation angle of the target sample.
[0026] In one technical solution of the above-mentioned three-dimensional target detection method, the method further includes acquiring the two-dimensional image sample through a monocular camera.
[0027] In a second aspect, a computer device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the three-dimensional target detection method described in any of the above-described technical solutions.
[0028] In a third aspect, a computer-readable storage medium is provided, wherein a plurality of program codes are stored therein, the program codes being adapted to be loaded and run by a processor to perform the three-dimensional target detection method described in any of the above-described technical solutions.
[0029] In a fourth aspect, a vehicle is provided, the vehicle comprising the computer equipment described in the above-described computer equipment technical solution.
[0030] The above-described technical solutions of the present invention have at least one or more of the following beneficial effects:
[0031] In implementing the technical solution of this invention, a three-dimensional target detection model can be used to detect targets in a two-dimensional image, thereby obtaining the three-dimensional information of the target to be detected in the two-dimensional image. The three-dimensional target detection model is trained in the following way: the three-dimensional target detection model to be trained is used to detect targets in two-dimensional image samples, thereby obtaining two-dimensional detection information and three-dimensional prediction information of the target samples in the two-dimensional image samples; the three-dimensional prediction information is projected to obtain two-dimensional projection information; based on the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to train the three-dimensional target detection model to be trained, thereby obtaining the trained three-dimensional target detection model.
[0032] Through the above implementation, even if the actual three-dimensional information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and the two-dimensional projection information of the target sample to agree. This enables the trained three-dimensional target detection model to accurately detect the three-dimensional information of the target from the two-dimensional image. It overcomes the defect of the prior art that, because the actual three-dimensional information of distant targets cannot be obtained, the model cannot be trained on them and distant three-dimensional targets therefore cannot be detected accurately.
Attached Figure Description
[0033] The disclosure of this invention will become more readily understood with reference to the accompanying drawings. It will be readily understood by those skilled in the art that these drawings are for illustrative purposes only and are not intended to limit the scope of protection of this invention. Wherein:
[0034] Figure 1 is a schematic flowchart of the main steps of a method for obtaining a three-dimensional target detection model according to an embodiment of the present invention;
[0035] Figure 2 is a schematic flowchart of the main steps of a method for training a 3D target detection model according to an embodiment of the present invention;
[0036] Figure 3 is a schematic diagram of the main steps of a method for training a 3D target detection model according to another embodiment of the present invention.
Detailed Implementation
[0037] Some embodiments of the present invention will now be described with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are merely illustrative of the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.
[0038] In the description of this invention, "processor" can include hardware, software, or a combination of both. A processor can be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and / or signal processing capabilities. The processor can be implemented in software, in hardware, or a combination of both. Non-transitory computer-readable storage media includes any suitable medium capable of storing program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, etc.
[0039] In one embodiment of a three-dimensional target detection method according to the present invention, the method can perform target detection on a two-dimensional image using a three-dimensional target detection model to obtain the three-dimensional information of the target to be detected in the two-dimensional image. The two-dimensional image can be an image acquired by a monocular camera, i.e., a monocular image. By performing target detection on a monocular image using the three-dimensional target detection model, the three-dimensional information of the target to be detected in the monocular image can be obtained. This three-dimensional information includes at least the three-dimensional coordinates, size, and orientation angle of the target.
[0040] A 3D object detection model can be a network model built using neural networks to detect the 3D information of an object from a 2D image. Referring to Figure 1, in this embodiment of the invention, after the initial three-dimensional target detection model (the three-dimensional target detection model to be trained) is constructed, it can be trained through the following steps S101 to S103; the trained three-dimensional target detection model is then used to perform target detection on the two-dimensional image and obtain the three-dimensional information of the target to be detected in the two-dimensional image.
[0041] Step S101: Perform target detection on the two-dimensional image samples using the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target samples in the two-dimensional image samples.
[0042] Two-dimensional image samples can also be images acquired through a monocular camera; that is, two-dimensional image samples are also monocular images. Two-dimensional detection information includes at least the two-dimensional coordinates of the target sample, and three-dimensional prediction information includes at least the three-dimensional coordinates, size, and orientation angle of the target sample.
[0043] In some implementations, the two-dimensional detection information and three-dimensional prediction information of the target sample can be obtained through the following steps S1011 to S1012.
[0044] Step S1011: Perform object detection on the two-dimensional image samples using the three-dimensional object detection model to be trained, and obtain the two-dimensional detection boxes of the object samples. The two-dimensional detection box refers to the bounding box of the object sample on the two-dimensional image sample.
[0045] Step S1012: Determine the two-dimensional detection information and three-dimensional prediction information of the target sample based on the two-dimensional detection information and three-dimensional prediction information of the two-dimensional detection box.
[0046] In this embodiment, the two-dimensional detection information and three-dimensional prediction information of the two-dimensional detection box can be used as the two-dimensional detection information and three-dimensional prediction information of the target sample, respectively.
[0047] Step S102: Project the three-dimensional prediction information to obtain two-dimensional projection information.
[0048] In this embodiment of the invention, the 3D prediction information can be transformed from a 3D coordinate system into the 2D image coordinate system, thereby projecting it onto the image to obtain the 2D projection information. The 3D coordinate system can be the world coordinate system. Specifically, in this embodiment, the transformation between the world coordinate system and the 2D image coordinate system can be determined first, and the 3D prediction information is then transformed using this transformation. It should be noted that any conventional method in the field of vision technology can be used to determine the transformation between the world coordinate system and the 2D image coordinate system; for example, it can be determined using the pinhole imaging model.
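The projection of step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the 3D prediction is a box given by its centre (x, y, z), size (l, h, w) and yaw angle, already expressed in the camera frame (i.e. the world-to-camera transform is known and has been applied); it assumes a pinhole model with intrinsic parameters fx, fy, cx, cy; and it assumes the two-dimensional projection information is the axis-aligned rectangle enclosing the eight projected corners. The function names `box3d_corners` and `project_box3d_to_2d` are hypothetical.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def box3d_corners(x: float, y: float, z: float,
                  l: float, h: float, w: float, yaw: float) -> List[Point3D]:
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed (hypothetical) convention: camera x right, y down, z forward;
    (x, y, z) is the box centre; yaw rotates the box about the camera y axis;
    l lies along x and w along z when yaw == 0.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the local offset about the y axis, then translate.
                px = x + c * dx + s * dz
                pz = z - s * dx + c * dz
                corners.append((px, y + dy, pz))
    return corners


def project_box3d_to_2d(box3d: Tuple[float, ...],
                        fx: float, fy: float, cx: float, cy: float) -> Box2D:
    """Project a 3D box with a pinhole model and return its enclosing 2D box."""
    us, vs = [], []
    for X, Y, Z in box3d_corners(*box3d):
        if Z <= 0:
            raise ValueError("box corner lies behind the camera")
        us.append(fx * X / Z + cx)  # pinhole: u = fx * X / Z + cx
        vs.append(fy * Y / Z + cy)  # pinhole: v = fy * Y / Z + cy
    return (min(us), min(vs), max(us), max(vs))


if __name__ == "__main__":
    # A 4 m x 1.5 m x 2 m box centred 20 m in front of the camera.
    print(project_box3d_to_2d((0.0, 0.0, 20.0, 4.0, 1.5, 2.0, 0.0),
                              fx=1000.0, fy=1000.0, cx=640.0, cy=360.0))
    # prints roughly (534.74, 320.53, 745.26, 399.47)
```

Taking the enclosing rectangle of the projected corners makes the projection directly comparable with a 2D detection box; other choices (for example projecting only the box centre) would fit the same step.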
[0049] Step S103: Based on the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to train the three-dimensional object detection model to be trained, and the trained three-dimensional object detection model is obtained.
[0050] Two-dimensional detection information can represent the true two-dimensional information value of the target sample on the two-dimensional image sample, while two-dimensional projection information is obtained by projecting three-dimensional prediction information. Therefore, two-dimensional projection information can represent the predicted two-dimensional information value of the target sample on the two-dimensional image sample.
[0051] By using the two-dimensional information consistency loss function to train the three-dimensional object detection model, the two-dimensional projection information (two-dimensional information prediction value) can be made to continuously approach the two-dimensional detection information (two-dimensional information true value). The closer the two-dimensional projection information is to the two-dimensional detection information, the more accurate the three-dimensional prediction information of the target sample obtained by the three-dimensional object detection model to be trained from the two-dimensional image sample is.
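As a toy illustration of why this geometric constraint carries depth information even without 3D labels, suppose (hypothetically) that the only unknown is the depth z of a target whose physical height H is already predicted. Under a pinhole model with focal length f its projected height is f·H/z, so minimising the squared 2D consistency loss (h_det − f·H/z)² by gradient descent drives z towards the depth consistent with the detected height h_det. The function name and the choice to optimise inverse depth are assumptions made for this sketch; a real model would update network weights rather than a single depth value.

```python
def fit_depth_from_2d_height(h_det: float, H: float, f: float,
                             z_init: float = 10.0, steps: int = 100) -> float:
    """Toy example: fit depth z by minimising (h_det - f * H / z) ** 2.

    We optimise inverse depth u = 1 / z, in which the projected height
    k * u (k = f * H) is linear, so the loss is a simple quadratic in u.
    """
    k = f * H
    lr = 0.25 / (k * k)  # step size chosen so each step halves the error in u
    u = 1.0 / z_init
    for _ in range(steps):
        residual = h_det - k * u    # 2D detection minus 2D projection
        grad = -2.0 * k * residual  # d/du of residual ** 2
        u -= lr * grad
    return 1.0 / u


if __name__ == "__main__":
    # A 1.5 m tall target whose detected box is 30 px tall, with f = 1000 px,
    # is consistent with a depth of f * H / h_det = 50 m.
    print(fit_depth_from_2d_height(h_det=30.0, H=1.5, f=1000.0))
```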
[0052] In some implementations, a two-dimensional information consistency loss function can be established using a squared loss function. For example, the two-dimensional information consistency loss function can be represented by the following equation (1).
[0053] $$L_1 = \left(y_1 - \hat{y}_1\right)^2 \qquad (1)$$
[0054] The parameters in formula (1) have the following meanings: $L_1$ represents the loss value of the two-dimensional information consistency loss function, $y_1$ represents the two-dimensional detection information, and $\hat{y}_1$ represents the two-dimensional projection information.
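Formula (1) can be written as a small function. Because the source does not state how y1 is vectorised, this sketch assumes that the two-dimensional detection information and projection information are the corner coordinates of a 2D box (u_min, v_min, u_max, v_max) and that the squared errors of the components are summed; the name `consistency_loss_2d` is hypothetical.

```python
from typing import Sequence


def consistency_loss_2d(y1: Sequence[float], y1_hat: Sequence[float]) -> float:
    """Squared loss between 2D detection info y1 and 2D projection info y1_hat."""
    if len(y1) != len(y1_hat):
        raise ValueError("y1 and y1_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y1, y1_hat))


if __name__ == "__main__":
    detected = (100.0, 50.0, 200.0, 150.0)   # 2D detection box (treated as truth)
    projected = (102.0, 49.0, 198.0, 150.0)  # 2D box projected from the 3D prediction
    print(consistency_loss_2d(detected, projected))  # 4 + 1 + 4 + 0 = 9.0
```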
[0055] Through the above steps S101 to S103, even if the actual three-dimensional information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and two-dimensional projection information of the target sample, so that the trained three-dimensional target detection model can accurately detect the three-dimensional information of the target from the two-dimensional image.
[0056] The following provides a further explanation of step S103.
[0057] When training a 3D object detection model, a large number of 2D image samples are typically used, each containing at least one object sample. The label of a 2D image sample may contain 3D information for all of its object samples or for only a subset of them. To further improve the accuracy and efficiency of model training, object samples with labeled 3D information can be trained using that 3D information, while object samples without labeled 3D information can be trained using their 2D detection information and 2D projection information. Specifically, referring to Figure 2, in some embodiments of step S103 above, the model to be trained can be trained through the following steps S1031 to S1033.
[0058] Step S1031: Determine whether each target sample has actual three-dimensional information based on the sample label of the two-dimensional image sample.
[0059] If the current target sample has three-dimensional actual information, proceed to step S1032;
[0060] If the current target sample does not have actual three-dimensional information, proceed to step S1033.
[0061] Step S1032: Based on the actual 3D information and the predicted 3D information of the current target sample, the 3D object detection model to be trained is trained using the 3D information consistency loss function. Training the 3D object detection model using the 3D information consistency loss function allows the predicted 3D information to continuously approach the actual 3D information. The closer the predicted 3D information is to the actual 3D information, the more accurate the predicted 3D information of the target sample obtained by the trained 3D object detection model from the 2D image sample.
[0062] In some implementations, the three-dimensional information consistency loss function can be established by the squared loss function. For example, the three-dimensional information consistency loss function can be shown in the following equation (2).
[0063] $$L_2 = \left(y_2 - \hat{y}_2\right)^2 \qquad (2)$$
[0064] The parameters in formula (2) have the following meanings: $L_2$ represents the loss value of the three-dimensional information consistency loss function, $y_2$ represents the actual three-dimensional information, and $\hat{y}_2$ represents the three-dimensional prediction information.
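Formula (2) has the same form in three dimensions. The sketch below assumes, in line with paragraph [0025], that y2 and its prediction each hold the three-dimensional coordinates, size and orientation angle of a target sample, packed in the hypothetical field order (x, y, z, l, h, w, yaw), and it sums the squared component errors; `Box3D` and `consistency_loss_3d` are illustrative names. It applies the squared loss to the raw yaw value, as the formula is written; the source does not discuss angle wrap-around.

```python
from dataclasses import astuple, dataclass


@dataclass
class Box3D:
    x: float    # 3D coordinates of the target centre
    y: float
    z: float
    l: float    # size
    h: float
    w: float
    yaw: float  # orientation angle in radians


def consistency_loss_3d(y2: Box3D, y2_hat: Box3D) -> float:
    """Squared loss between actual 3D info y2 and predicted 3D info y2_hat."""
    return sum((a - b) ** 2 for a, b in zip(astuple(y2), astuple(y2_hat)))


if __name__ == "__main__":
    actual = Box3D(10.0, 1.0, 30.0, 4.0, 1.5, 1.8, 0.1)
    predicted = Box3D(10.5, 1.0, 29.0, 4.0, 1.5, 2.0, 0.1)
    print(consistency_loss_3d(actual, predicted))  # about 1.29 (0.25 + 1 + 0.04)
```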
[0065] Step S1033: Based on the two-dimensional detection information and two-dimensional projection information of the current target sample, the two-dimensional information consistency loss function is used to train the three-dimensional target detection model to be trained. It should be noted that the specific method of step S1033 is similar to the method described in step S103 in the aforementioned method embodiment, and will not be repeated here.
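The per-sample branching of steps S1031 to S1033 can be sketched as below. It is a hedged illustration rather than the patented training code: it assumes each target sample already carries its 2D detection information, the 2D projection of its 3D prediction, its 3D prediction and, optionally, actual 3D information from the label; it uses summed squared losses as in formulas (1) and (2); and it sums the per-sample losses over a batch, which the source does not specify. To keep the example short, the 3D vectors hold only coordinates. `TargetSample`, `sample_loss` and `batch_loss` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def squared_loss(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((p - q) ** 2 for p, q in zip(a, b))


@dataclass
class TargetSample:
    det_2d: Sequence[float]                  # 2D detection information
    proj_2d: Sequence[float]                 # 2D projection of the 3D prediction
    pred_3d: Sequence[float]                 # 3D prediction information
    gt_3d: Optional[Sequence[float]] = None  # actual 3D information, if labelled


def sample_loss(s: TargetSample) -> Tuple[str, float]:
    # S1031: does the label provide actual 3D information for this sample?
    if s.gt_3d is not None:
        return "3d", squared_loss(s.gt_3d, s.pred_3d)  # S1032
    return "2d", squared_loss(s.det_2d, s.proj_2d)      # S1033


def batch_loss(samples: List[TargetSample]) -> float:
    return sum(sample_loss(s)[1] for s in samples)


if __name__ == "__main__":
    near = TargetSample(det_2d=(300.0, 200.0, 420.0, 290.0),
                        proj_2d=(305.0, 200.0, 415.0, 290.0),
                        pred_3d=(2.0, 1.0, 30.0), gt_3d=(2.0, 1.0, 31.0))
    far = TargetSample(det_2d=(600.0, 300.0, 620.0, 315.0),
                       proj_2d=(601.0, 300.0, 619.0, 316.0),
                       pred_3d=(0.5, 1.0, 120.0))
    print(sample_loss(near), sample_loss(far), batch_loss([near, far]))
```

The two branches measure errors in different units (metres versus pixels), so in practice a weighting between them may be needed; the source does not address this.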
[0066] Through the above steps S1031 to S1033, the 3D target detection model to be trained can be trained simultaneously on target samples labeled with actual 3D information and target samples without it, which improves the accuracy and efficiency of model training.
[0067] Furthermore, in other embodiments of step S103 above, after the 3D target detection model has been trained using the 2D information consistency loss function based on the 2D detection information and 2D projection information to obtain a trained 3D target detection model, the trained model can be trained again using the actual 3D information of the target samples, so as to correct it and further improve its target detection accuracy. Specifically, referring to Figure 3, in this embodiment the trained 3D target detection model can be trained through the following steps S104 to S106 to correct it.
[0068] Step S104: Determine whether the sample label of the two-dimensional image sample contains the actual three-dimensional information of the target sample; if it does, proceed to step S105; if it does not, proceed to step S106.
[0069] Step S105: Based on the actual 3D information and the predicted 3D information, the trained 3D target detection model is trained using the 3D information consistency loss function to obtain the final 3D target detection model.
[0070] It should be noted that the specific method of step S105 is similar to the method described in step S1032 in the aforementioned method embodiment, and will not be repeated here.
[0071] Step S106: Do not further train the trained 3D object detection model.
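The correction stage of steps S104 to S106 differs from step S103 in that target samples without actual 3D information contribute nothing. A minimal sketch, assuming each sample is represented as a pair (actual 3D information or None, 3D prediction) and using the squared loss of formula (2); `correction_losses` is a hypothetical name, and the optimiser update that would consume these losses is omitted.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Optional[Sequence[float]], Sequence[float]]  # (gt_3d, pred_3d)


def correction_losses(samples: List[Sample]) -> List[float]:
    """Return the 3D consistency losses used to fine-tune the trained model."""
    losses = []
    for gt_3d, pred_3d in samples:
        # S104: does the label contain actual 3D information for this sample?
        if gt_3d is None:
            continue  # S106: this sample does not further train the model
        # S105: 3D information consistency loss (squared loss, formula (2)).
        losses.append(sum((a - b) ** 2 for a, b in zip(gt_3d, pred_3d)))
    return losses


if __name__ == "__main__":
    batch = [((1.0, 2.0, 3.0), (1.0, 2.0, 4.0)),    # labelled: contributes 1.0
             (None, (5.0, 5.0, 80.0)),              # unlabelled: skipped
             ((0.0, 0.0, 10.0), (0.0, 0.0, 10.0))]  # labelled: contributes 0.0
    print(correction_losses(batch))  # [1.0, 0.0]
```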
[0072] It should be noted that although the steps in the above embodiments are described in a specific order, those skilled in the art will understand that in order to achieve the effects of the present invention, different steps do not necessarily have to be executed in such an order. They can be executed simultaneously (in parallel) or in other orders, and these variations are all within the scope of protection of the present invention.
[0073] Those skilled in the art will understand that all or part of the processes in the method of the above embodiment of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium can include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, and a software distribution medium. It should be noted that the content included in the computer-readable storage medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
[0074] Furthermore, the present invention also provides a computer device. In one embodiment of the computer device according to the present invention, the computer device includes a processor and a storage device. The storage device can be configured to store a program for executing the three-dimensional target detection method of the above-described method embodiments, and the processor can be configured to execute the program in the storage device. The program includes, but is not limited to, a program for executing the three-dimensional target detection method of the above-described method embodiments. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method section of the embodiments of the present invention. This computer device can be a control apparatus comprising various electronic devices.
[0075] Furthermore, the present invention also provides a computer-readable storage medium. In one embodiment of the computer-readable storage medium according to the present invention, the computer-readable storage medium can be configured to store a program for performing the three-dimensional target detection method of the above-described method embodiments. This program can be loaded and run by a processor to implement the above-described three-dimensional target detection method. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method section of the embodiments of the present invention. The computer-readable storage medium can be a storage device comprising various electronic devices. Optionally, in the embodiments of the present invention, the computer-readable storage medium is a non-transitory computer-readable storage medium.
[0076] Furthermore, the present invention also provides a vehicle. In one embodiment of the vehicle according to the present invention, the vehicle may include the computer equipment described in the above-described computer equipment embodiments. In this embodiment, the vehicle may be an autonomous vehicle, an unmanned vehicle, or the like. Moreover, according to the type of power source, the vehicle in this embodiment may be a gasoline vehicle, an electric vehicle, a hybrid vehicle using a mixture of electric and gasoline power, or a vehicle using other new energy sources, etc.
[0077] The technical solution of the present invention has been described above with reference to one embodiment shown in the accompanying drawings. However, it will be readily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will all fall within the scope of protection of the present invention.
Claims
1. A three-dimensional target detection method, characterized in that the method includes: using a three-dimensional target detection model to perform target detection on a two-dimensional image, thereby obtaining the three-dimensional information of the target to be detected in the two-dimensional image; wherein the three-dimensional target detection model is trained in the following way: performing target detection on two-dimensional image samples using a three-dimensional target detection model to be trained, thereby obtaining two-dimensional detection information and three-dimensional prediction information of the target samples in the two-dimensional image samples; projecting the three-dimensional prediction information to obtain two-dimensional projection information; and, based on the two-dimensional detection information and the two-dimensional projection information, training the three-dimensional target detection model to be trained using a two-dimensional information consistency loss function to obtain a trained three-dimensional target detection model; wherein the model training includes: determining, based on the sample labels of the two-dimensional image samples, whether each target sample has actual three-dimensional information; if so, training the three-dimensional target detection model using a three-dimensional information consistency loss function based on the actual three-dimensional information and the three-dimensional prediction information of the current target sample; if not, training the three-dimensional target detection model using the two-dimensional information consistency loss function based on the two-dimensional detection information and the two-dimensional projection information of the current target sample.
2. The three-dimensional target detection method according to claim 1, characterized in that, after the step of "training the three-dimensional object detection model to be trained using the two-dimensional information consistency loss function based on the two-dimensional detection information and the two-dimensional projection information to obtain a trained three-dimensional object detection model", the method further includes training the trained three-dimensional object detection model in the following manner to correct it: determining whether the sample label of the two-dimensional image sample contains the actual three-dimensional information of the target sample; if it does, training the trained 3D target detection model using the 3D information consistency loss function based on the actual 3D information and the predicted 3D information to obtain the final 3D target detection model; if it does not, not further training the trained 3D object detection model.
3. The three-dimensional target detection method according to claim 1, characterized in that the step of "using a three-dimensional target detection model to be trained to perform target detection on two-dimensional image samples, and obtaining two-dimensional detection information and three-dimensional prediction information of target samples in the two-dimensional image samples" specifically includes: performing target detection on the two-dimensional image sample using the three-dimensional target detection model to be trained to obtain the two-dimensional detection box of the target sample; and determining the two-dimensional detection information and three-dimensional prediction information of the target sample based on the two-dimensional detection information and three-dimensional prediction information of the two-dimensional detection box, respectively.
4. The three-dimensional target detection method according to claim 1 or 2, characterized in that the method further includes: establishing the two-dimensional information consistency loss function using a squared loss function; and/or establishing the three-dimensional information consistency loss function using a squared loss function.
5. The three-dimensional target detection method according to claim 1 or 2, characterized in that both the three-dimensional prediction information and the actual three-dimensional information include at least the three-dimensional coordinates, size, and orientation angle of the target sample.
6. The three-dimensional target detection method according to any one of claims 1 to 3, characterized in that the method further includes acquiring the two-dimensional image sample using a monocular camera.
7. A computer device comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to perform the three-dimensional target detection method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by a processor to perform the three-dimensional target detection method according to any one of claims 1 to 6.
9. A vehicle, characterized in that the vehicle includes the computer device according to claim 7.