Target detection method, model training method, detection model and electronic device
By introducing a domain-independent feature extractor and data feature alignment technology into the 3D object detection model, the problem of performance degradation in cross-domain detection is solved, enabling efficient detection and rapid deployment in different environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2023-04-19
- Publication Date
- 2026-06-30
AI Technical Summary
Existing 3D object detection models experience performance degradation in cross-domain environments, making it difficult to quickly deploy the models in new detection scenarios. This necessitates the re-collection and labeling of data, which is time-consuming and labor-intensive.
A domain-independent feature extractor is used to extract features that are independent of different domains from point cloud data. Combined with data and feature alignment techniques, a 3D target detection model with strong cross-domain adaptability is constructed.
It improves the detection performance of cross-domain detection, reduces the data collection and annotation work for new detection scenarios, and improves the efficiency of rapid model deployment.
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to object detection methods, model training methods, detection models, and electronic devices.
Background Technology
[0002] 3D object detection is an important component of autonomous driving perception systems. It uses data collected by various sensors to detect surrounding objects and outputs information such as the object's category, position, size, orientation, and speed, thus providing a reference for subsequent decision-making and control modules.
[0003] Point cloud data collected by lidar sensors is a type of three-dimensional data containing rich information. 3D target detection based on point cloud data can achieve better detection results than detection based on traditional images.
[0004] However, current 3D object detection algorithms based on point cloud data train and test deep learning models within the same point cloud data domain. In real-world applications, sensor updates, different vehicles, and environmental changes can all lead to different distributions of detection and training data. When the detection scenario differs significantly from the training scenario, they correspond to different point cloud data domains. Using a 3D object detection model in this situation results in cross-domain detection (the point cloud data domain used for training differs from that of the detection scenario), leading to a significant drop in model performance. This necessitates re-collecting and labeling data for the new detection scenario and retraining the model, which is time-consuming, labor-intensive, and hinders the rapid deployment of the model in new detection scenarios.
[0005] Therefore, a new target detection method is needed to improve the detection performance of cross-domain detection.
Summary of the Invention
[0006] To improve the detection performance of cross-domain detection, this application provides an object detection method, a model training method, a detection model, and an electronic device. This application also provides a computer-readable storage medium.
[0007] The embodiments of this application adopt the following technical solutions:
[0008] In a first aspect, this application provides a target detection method, which is applied to an electronic device, and the method includes:
[0009] Acquire point cloud data to be detected, wherein the point cloud data to be detected corresponds to the first domain;
[0010] A first detection model is invoked, which includes a domain-independent feature extractor, wherein: the domain-independent feature extractor is used to extract, from the underlying 3D features of the point cloud data, domain-independent features that are independent of the first domain and the second domain; the domain-independent feature extractor is a feature extractor trained based on the 2D encoder of the second detection model, according to the second point cloud dataset and the third point cloud dataset; the second detection model is a 3D object detection model pre-trained based on the second point cloud dataset; the second point cloud dataset corresponds to the second domain, and the third point cloud dataset corresponds to the first domain;
[0011] The first detection model is used to process the point cloud data to be detected to obtain 3D target detection results, including:
[0012] The domain-independent feature extractor extracts domain-independent features from the underlying 3D features of the point cloud data to be detected.
[0013] The 3D target detection result is obtained based on the domain-independent features.
[0014] According to the object detection method proposed in the first aspect of this application, the first detection model extracts domain-independent features from point cloud data of a first domain and obtains 3D object detection results based on the domain-independent features. Therefore, the detection performance of 3D object detection is not affected by domain changes, thus improving the detection performance of cross-domain detection.
[0015] In one implementation of the first aspect, the first detection model further includes a first 3D encoder, which is used to extract the underlying 3D features of the point cloud data to be detected, wherein:
[0016] The parameters of the first 3D encoder are consistent with those of the second 3D encoder of the second detection model.
[0017] In one implementation of the first aspect, the first detection model further includes a first detection head, which is used to obtain the 3D target detection result based on the domain-independent features, wherein:
[0018] The first detection head is a detection head trained based on the second detection head of the second detection model, and trained according to the second point cloud dataset and the third point cloud dataset.
[0019] In a second aspect, this application provides a model training method, which is applied to an electronic device, and the method includes:
[0020] Obtain the second point cloud dataset, which corresponds to the second domain;
[0021] The second detection model is obtained by pre-training based on the second point cloud dataset. The second detection model is a 3D object detection model that includes a 2D encoder.
[0022] Constructing an initial first detection model based on the second detection model includes constructing an initial domain-independent feature extractor for the first detection model based on the 2D encoder.
[0023] Based on the second point cloud dataset and the corresponding third point cloud dataset for the first domain, the first detection model is trained according to the objective function of 3D object detection, wherein:
[0024] The first detection model is used to obtain 3D target detection results based on domain-independent features that are unrelated to the first domain and the second domain;
[0025] The training of the first detection model includes: training the domain-independent feature extractor based on the second point cloud dataset and the third point cloud dataset, wherein the domain-independent feature extractor is used to extract the domain-independent features from the underlying 3D features of the point cloud data.
[0026] In one implementation of the second aspect, obtaining the second point cloud dataset includes:
[0027] Obtain the fourth point cloud dataset, which corresponds to the second domain;
[0028] Align the point cloud density of the fourth point cloud dataset to the first domain to generate the second point cloud dataset.
[0029] Based on the above implementation method, the second point cloud dataset is obtained by using data alignment, which can reduce the domain-based data differences between sample data from different domains and improve the cross-domain detection performance of the trained 3D object detection model.
[0030] In one implementation of the second aspect, aligning the point cloud density of the fourth point cloud dataset to the first domain to generate the second point cloud dataset includes:
[0031] Based on the tilt angle of the laser beam, the laser beams in the second domain and the first domain are matched to obtain the matching result;
[0032] Based on the matching results, a second point cloud dataset is generated according to the fourth point cloud dataset, including:
[0033] For the first laser beam in the first domain, the point data corresponding to the first laser beam is supplemented in each point cloud data of the fourth point cloud dataset, wherein the first laser beam in the first domain does not have a matching laser beam in the second domain.
[0034] and/or,
[0035] For the second laser beam in the second domain, the point data corresponding to the second laser beam is filtered out from each point cloud data in the fourth point cloud dataset, wherein the second laser beam in the second domain does not have a matching laser beam in the first domain.
[0036] Based on the above implementation method, during the data alignment process, attention is paid to the different number of laser beams in point cloud data of different domains, as well as the vertical distribution of the laser beams, which can better align point cloud data of different domains.
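The beam matching described in [0031] to [0035] can be illustrated with a minimal sketch. It assumes that each domain's beams are given as a list of tilt angles in degrees and that two beams match when their angles differ by no more than a tolerance. The function names, the greedy nearest-angle strategy, and the `tol_deg` value are illustrative assumptions and are not taken from this application. Only the filtering branch is shown, because this section does not specify how supplemented point data is generated.

```python
import numpy as np

def match_beams(src_angles, tgt_angles, tol_deg=0.5):
    """Greedily match beams of two domains by tilt angle (degrees).

    src_angles: beam tilt angles of the second (source) domain.
    tgt_angles: beam tilt angles of the first (target) domain.
    tol_deg is an assumed tolerance, not a value from the source text.
    Returns (pairs, unmatched_src, unmatched_tgt).
    """
    src = np.asarray(src_angles, dtype=float)
    tgt = np.asarray(tgt_angles, dtype=float)
    pairs, used_tgt = [], set()
    for i, angle in enumerate(src):
        diffs = np.abs(tgt - angle)
        for j in np.argsort(diffs):
            if diffs[j] > tol_deg:
                break  # candidates are sorted, so no closer beam remains
            if int(j) not in used_tgt:
                pairs.append((i, int(j)))
                used_tgt.add(int(j))
                break
    matched_src = {i for i, _ in pairs}
    unmatched_src = [i for i in range(len(src)) if i not in matched_src]
    unmatched_tgt = [j for j in range(len(tgt)) if j not in used_tgt]
    return pairs, unmatched_src, unmatched_tgt

def filter_unmatched_source_beams(points, beam_ids, unmatched_src):
    """Drop points on source beams that have no matching target beam."""
    keep = ~np.isin(beam_ids, unmatched_src)
    return points[keep], beam_ids[keep]

# Hypothetical beam layouts, for illustration only.
pairs, unmatched_src, unmatched_tgt = match_beams(
    [-10.0, -5.0, 0.0, 5.0], [-5.1, 0.2, 10.0]
)
```

In this sketch, beams listed in `unmatched_tgt` are the ones whose point data would need to be supplemented, and beams listed in `unmatched_src` are the ones whose point data would be filtered out.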
[0037] In one implementation of the second aspect, generating the second point cloud dataset based on the fourth point cloud dataset according to the matching result further includes:
[0038] Adjust the number of data points on the laser beam in the fourth point cloud dataset, and align the horizontal resolution of the fourth point cloud dataset to the first domain.
[0039] Based on the above implementation method, the resolution of the laser beam in the horizontal direction is taken into account during the data alignment process, which can better align point cloud data from different domains.
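One possible way to adjust the number of data points on a single laser beam is sketched below. It assumes that the first domain has a lower horizontal resolution, so alignment reduces to keeping at most one point per azimuth bin. The function name and the `target_points_per_rev` parameter are hypothetical, and the case where points must be added rather than removed is not covered.

```python
import numpy as np

def downsample_beam_azimuth(points, target_points_per_rev):
    """Keep at most one point per azimuth bin for the points of one beam.

    points: array of shape (N, >=2) with x in column 0 and y in column 1.
    target_points_per_rev: assumed number of points per revolution in the
    domain being aligned to (hypothetical parameter).
    """
    azimuth = np.arctan2(points[:, 1], points[:, 0])  # in [-pi, pi]
    bins = np.floor(
        (azimuth + np.pi) / (2 * np.pi) * target_points_per_rev
    ).astype(int)
    # azimuth == pi would fall into an extra bin, so clamp it
    bins = np.clip(bins, 0, target_points_per_rev - 1)
    _, first_idx = np.unique(bins, return_index=True)
    return points[np.sort(first_idx)]

# Hypothetical beam with 360 points at 1-degree spacing, reduced to 180 bins.
angles = np.deg2rad(np.arange(360) + 0.5)
beam = np.stack([np.cos(angles), np.sin(angles), np.zeros(360)], axis=1)
reduced = downsample_beam_azimuth(beam, 180)
```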
[0040] In one implementation of the second aspect, the first detection model further includes a first 3D encoder for extracting the underlying three-dimensional features from point cloud data; the second detection model further includes a second 3D encoder.
[0041] The construction of the initial first detection model based on the second detection model further includes: loading the parameters of the second 3D encoder into the initial first 3D encoder;
[0042] The training of the first detection model further includes fixing the parameters of the first 3D encoder.
[0043] In one implementation of the second aspect, the first detection model further includes a first detection head, which is used to obtain the 3D target detection result based on the domain-independent features; the second detection model further includes a second detection head.
[0044] The step of constructing an initial first detection model based on the second detection model further includes: loading the parameters of the second detection head into the initial first detection head.
[0045] In one implementation of the second aspect, the first detection model further includes a domain-related feature extractor and a domain classifier. The domain-related feature extractor is used to extract domain-related features related to the first domain or the second domain from the underlying three-dimensional features. The domain classifier is used to distinguish the domains related to the domain-related features. The domain-related feature extractor and the domain-independent feature extractor are designed to learn in an adversarial manner with the domain classifier.
[0046] The step of constructing an initial first detection model based on the second detection model further includes: loading the parameters of the 2D encoder into the initial domain-independent feature extractor and the initial domain-related feature extractor;
[0047] The training of the first detection model also includes:
[0048] The parameters of the domain-related feature extractor are fixed;
[0049] The domain-independent feature extractor and the domain classifier are trained using an adversarial learning approach.
[0050] Based on the above implementation method, the domain classification loss function is used in the feature alignment process, which can explicitly constrain the model to extract domain-independent features. There is no need to select pseudo-labels for iterative training. Only a few training rounds are needed to obtain a first detection model with good detection performance in the first domain.
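The parameter loading and freezing described in [0041] to [0049] can be summarized in a short sketch. Models are represented here as plain dictionaries of parameters, which is an illustrative assumption; the key names are hypothetical, and the initialization of the domain classifier is left empty because this section does not describe it.

```python
import copy

def build_initial_first_model(second_model):
    """Build an initial first detection model from a pre-trained second model.

    second_model: dict with the hypothetical keys 'encoder_3d', 'encoder_2d'
    and 'det_head', each mapping to a dict of parameters.
    Returns (first_model, trainable), where trainable marks which parts are
    updated during training.
    """
    first_model = {
        # [0041]: load the second 3D encoder into the first 3D encoder
        "encoder_3d": copy.deepcopy(second_model["encoder_3d"]),
        # [0046]: load the 2D encoder into both feature extractors
        "domain_independent_extractor": copy.deepcopy(second_model["encoder_2d"]),
        "domain_related_extractor": copy.deepcopy(second_model["encoder_2d"]),
        # [0044]: load the second detection head into the first detection head
        "det_head": copy.deepcopy(second_model["det_head"]),
        # Assumption: the classifier is newly initialized; details are not given here
        "domain_classifier": {},
    }
    trainable = {
        "encoder_3d": False,                   # [0042]: parameters fixed
        "domain_related_extractor": False,     # [0048]: parameters fixed
        "domain_independent_extractor": True,  # [0049]: trained adversarially
        "domain_classifier": True,             # [0049]: trained adversarially
        "det_head": True,                      # [0018]: trained from the second head
    }
    return first_model, trainable

# Hypothetical pre-trained parameters, for illustration only.
second = {
    "encoder_3d": {"w": [1.0, 2.0]},
    "encoder_2d": {"w": [3.0]},
    "det_head": {"w": [4.0]},
}
first, trainable = build_initial_first_model(second)
```

Deep copies are used so that later updates to the first detection model do not modify the pre-trained second detection model.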
[0051] In one implementation of the second aspect, the second point cloud dataset contains labels, while the third point cloud dataset does not contain labels.
[0052] Based on the above implementation method, during the feature alignment process, a 3D object detection model can be trained and obtained based on the second point cloud dataset containing labels and the third point cloud dataset without labels, which reduces the label annotation operation and the workload of preparing sample data.
[0053] In a third aspect, this application provides an object detection model, which is used to obtain 3D object detection results based on domain-independent features that are independent of the first domain and the second domain, the model comprising:
[0054] Domain-independent feature extractor, where:
[0055] The domain-independent feature extractor is used to extract the domain-independent features from the underlying 3D features of the point cloud data;
[0056] The domain-independent feature extractor is a feature extractor trained based on the 2D encoder of the second detection model and the second point cloud dataset and the third point cloud dataset.
[0057] The second detection model is a 3D detection model pre-trained based on the second point cloud dataset;
[0058] The second point cloud dataset corresponds to the second domain, and the third point cloud dataset corresponds to the first domain.
[0059] In a fourth aspect, this application provides an electronic device, the electronic device including a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein when the computer program instructions are executed by the processor, the electronic device is triggered to perform the steps of the method described in the first or second aspect.
[0060] In a fifth aspect, this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the methods described in the first or second aspect.
Attached Figure Description
[0061] Figure 1 is a schematic diagram of the structure of an autonomous driving system according to an embodiment of this application;
[0062] Figure 2 is a schematic diagram of point cloud data under detection scenario W;
[0063] Figure 3 is a schematic diagram of point cloud data under detection scenario N;
[0064] Figure 4 is a schematic diagram of point cloud data under detection scenario K;
[0065] Figure 5 is a simplified schematic diagram of an electronic device according to an embodiment of this application;
[0066] Figure 6 is a schematic flowchart of a target detection method according to an embodiment of this application;
[0067] Figure 7 is a schematic diagram of the structure of a first detection model according to an embodiment of this application;
[0068] Figure 8 is a schematic flowchart of a model training method according to an embodiment of this application;
[0069] Figure 9 is a schematic diagram of a point cloud density alignment process according to an embodiment of this application;
[0070] Figure 10 is a schematic diagram comparing the tilt angles of laser beams in different domains according to an embodiment of this application;
[0071] Figure 11 is a schematic diagram of the structure of a second detection model according to an embodiment of this application;
[0072] Figure 12 is a schematic diagram of the structure of a first detection model according to an embodiment of this application;
[0073] Figure 13 is a schematic flowchart of a model training method according to an embodiment of this application;
[0074] Figure 14 is a schematic flowchart of a model training method according to an embodiment of this application;
[0075] Figure 15 is a comparison of cross-domain detection results according to an embodiment of this application;
[0076] Figure 16 is a schematic diagram of a target detection device according to an embodiment of this application;
[0077] Figure 17 is a schematic diagram of a model training apparatus according to an embodiment of this application.
Detailed Implementation
[0078] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0079] The terminology used in the implementation section of this application is for the purpose of explaining specific embodiments of this application only, and is not intended to limit this application.
[0080] Figure 1 is a schematic diagram of the structure of an autonomous driving system according to an embodiment of this application.
[0081] As shown in Figure 1, the autonomous driving system includes:
[0082] The perception module 110 is used to identify and locate the surrounding environment and objects using data collected by onboard sensors.
[0083] The decision module 120 is used to make decisions based on the recognition and positioning results of the perception module 110, such as whether to stop or turn.
[0084] The control module 130 is used to control the vehicle based on the decision result of the decision module 120 to achieve autonomous driving.
[0085] Specifically, in one embodiment, the vehicle-mounted sensors connected to the perception module 110 may include various devices such as cameras, radar, and infrared sensors.
[0086] For example, the perception module 110 can obtain a 2D image by simulating the imaging principle of the human eye through the connected camera, thereby being able to perceive the color and size of an object. By using multiple cameras to observe the same object, the depth can be calculated based on the parallax of the object in different cameras, thus realizing the perception of the distance of the object.
[0087] For example, the perception module 110 can transmit beams to the surrounding area through the connected radar, and calculate the distance and speed of objects based on the time difference or frequency difference of the echo received by the receiver.
[0088] The perception module 110 acquires point cloud data using data collected by onboard sensors. Specifically, the point cloud data contains N points, each containing information about three-dimensional coordinates (x, y, z) and reflection intensity.
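As a small illustration of this data layout, a point cloud frame can be held as an array with one row per point. The column order (x, y, z, reflection intensity), the `float32` type, and the sample values below are assumptions made for the example, not details from this application.

```python
import numpy as np

# Hypothetical frame with N = 3 points; columns: x, y, z, reflection intensity.
points = np.array(
    [
        [10.0, 0.5, -1.2, 0.30],
        [4.2, -3.1, -1.6, 0.12],
        [25.7, 8.0, 0.4, 0.85],
    ],
    dtype=np.float32,
)

xyz = points[:, :3]                   # three-dimensional coordinates
intensity = points[:, 3]              # reflection intensity per point
ranges = np.linalg.norm(xyz, axis=1)  # distance of each point from the sensor
```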
[0089] Specifically, in one embodiment, detection scenarios W, N, and K correspond to the Waymo, nuScenes, and KITTI datasets, respectively.
[0090] Figures 2-4 are schematic diagrams of point cloud data under detection scenarios W, N, and K, respectively.
[0091] As shown in Figures 2-4, point cloud data can reflect information about the vehicle's surrounding environment.
[0092] The perception module 110 processes the acquired point cloud data using a point cloud 3D object detection model to obtain 3D object detection output for the identification and localization of the surrounding environment and objects. The 3D object detection output includes the object's bounding box, the object's category, and the object's velocity. The object's bounding box typically includes the three-dimensional coordinates of its center point (x, y, z), the bounding box's length, width, and height (l, w, h), and its rotation angle (θ).
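The detection output described above can be represented with a simple record type, sketched below. The class name, the field names, the two-component velocity, and the convention that θ is a rotation about the vertical axis with the length measured along the heading are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Box3D:
    """One detected object: center, size, rotation angle, category, velocity."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    theta: float  # assumed: rotation about the vertical axis, in radians
    category: str
    velocity: Tuple[float, float] = (0.0, 0.0)  # assumed: (vx, vy)

    def bev_corners(self) -> List[Tuple[float, float]]:
        """Return the four box corners projected onto the ground plane."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        half = [
            (self.l / 2, self.w / 2),
            (self.l / 2, -self.w / 2),
            (-self.l / 2, -self.w / 2),
            (-self.l / 2, self.w / 2),
        ]
        # Rotate each local offset by theta, then translate to the center
        return [
            (self.x + dx * c - dy * s, self.y + dx * s + dy * c)
            for dx, dy in half
        ]

# Hypothetical detection, for illustration only.
box = Box3D(x=0.0, y=0.0, z=0.0, l=4.0, w=2.0, h=1.5, theta=0.0, category="car")
corners = box.bev_corners()
```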
[0093] A key component of point cloud 3D object detection is the point cloud 3D object detection model. Generally, point cloud 3D object detection models are obtained by training deep learning models based on point cloud data.
[0094] However, as shown in Figures 2-4, because the LiDAR used in different autonomous driving datasets differs in configuration and installation location, the characteristics of point cloud data differ significantly between detection scenarios.
[0095] Specifically, different detection scenarios correspond to different domains. Since the features of point cloud data in different domains are not uniform, it is difficult to train a point cloud 3D object detection model using point cloud data from different domains. Furthermore, when training a point cloud 3D object detection model using point cloud data from a single domain, the detection performance of the trained model will significantly decrease when performing object detection in other detection scenarios compared to when performing detection in the detection scenario corresponding to the point cloud data domain used to train the model.
[0096] For example, as shown in Table 1 below.
[0097] Table 1
[0098]
[0099] In Table 1, AP_bev@0.7, AP_3D@0.7, AP_bev@0.5, and AP_3D@0.5 are detection metrics that reflect the detection performance of point cloud 3D object detection. "Same domain" means that the point cloud 3D object detection model detects objects in the same detection scenario as the training data used to train the model. "Cross domain" means that the point cloud 3D object detection model detects objects in a different detection scenario than the training data used to train the model.
[0100] W→N means: in same-domain detection, a point cloud 3D object detection model trained on detection scenario W performs object detection in detection scenario W; in cross-domain detection, a point cloud 3D object detection model trained on detection scenario W performs object detection in detection scenario N.
[0101] N→W means: in same-domain detection, a point cloud 3D object detection model trained on detection scenario N performs object detection in detection scenario N; in cross-domain detection, a point cloud 3D object detection model trained on detection scenario N performs object detection in detection scenario W.
[0102] W→K means: in same-domain detection, a point cloud 3D object detection model trained on detection scenario W performs object detection in detection scenario W; in cross-domain detection, a point cloud 3D object detection model trained on detection scenario W performs object detection in detection scenario K.
[0103] N→K means: in same-domain detection, a point cloud 3D object detection model trained on detection scenario N performs object detection in detection scenario N; in cross-domain detection, a point cloud 3D object detection model trained on detection scenario N performs object detection in detection scenario K.
[0104] As shown in Table 1, the detection indicators for cross-domain detection decreased significantly compared to those for same-domain detection.
[0105] One feasible solution for improving the detection performance of cross-domain detection is alignment based on LiDAR line counts. Based on the LiDAR line counts and vertical fields of view (VFOV) of the high-line-count source domain data and the low-line-count target domain data, source domain data with an equivalent low line count is constructed by uniform downsampling. The steps are as follows:
[0106] Calculate the vertical tilt angle θ of each point based on the point cloud data point coordinates (x, y, z);
[0107] The K-Means algorithm is used to cluster the tilt angles of the data points to obtain the laser beam to which each data point belongs;
[0108] The equivalent number of lines in the source domain relative to the target domain is calculated based on the vertical field of view and the number of laser lines.
[0109] The source domain line count is uniformly downsampled to the equivalent line count.
[0110] The above scheme takes into account the data-level differences between point cloud data of different domains and thus reduces these differences. However, the scheme considers only the overall difference in line count between domains, and only the case of going from a high line count to a low line count. Therefore, the detection accuracy is not ideal. In addition, using a clustering method to determine the laser beam to which a data point belongs introduces a certain degree of error.
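The four steps of the line-count alignment scheme in [0106] to [0109] can be sketched as follows. The one-dimensional K-Means routine is a simplified stand-in written for this example, and the formula in `equivalent_line_count` (matching the target domain's lines per degree over the source VFOV) is an assumption, since the exact calculation is not given here. All function names and sample values are hypothetical.

```python
import numpy as np

def tilt_angles(points):
    """Vertical tilt angle (radians) of each point from its (x, y, z)."""
    return np.arctan2(points[:, 2], np.hypot(points[:, 0], points[:, 1]))

def kmeans_1d(values, k, iters=50):
    """Simplified 1-D K-Means; returns (beam_ids, centers) sorted by angle."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)  # spread-out init
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            values[labels == c].mean() if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)  # relabel so beam 0 has the lowest tilt angle
    return rank[labels], centers[order]

def equivalent_line_count(src_lines, src_vfov, tgt_lines, tgt_vfov):
    """Assumed formula: target lines per degree applied to the source VFOV."""
    return max(1, min(src_lines, round(tgt_lines / tgt_vfov * src_vfov)))

def downsample_lines(points, beam_ids, src_lines, eq_lines):
    """Keep only points on a uniformly spaced subset of the source beams."""
    keep_beams = np.unique(
        np.round(np.linspace(0, src_lines - 1, eq_lines)).astype(int)
    )
    return points[np.isin(beam_ids, keep_beams)]

# Hypothetical 4-beam source frame: 10 points per beam at range 10.
rng = np.random.default_rng(0)
tilt = np.deg2rad(np.repeat([-6.0, -2.0, 2.0, 6.0], 10))
az = rng.uniform(-np.pi, np.pi, size=40)
frame = np.stack(
    [10 * np.cos(tilt) * np.cos(az), 10 * np.cos(tilt) * np.sin(az), 10 * np.sin(tilt)],
    axis=1,
)
beam_ids, centers = kmeans_1d(tilt_angles(frame), k=4)
kept = downsample_lines(frame, beam_ids, src_lines=4, eq_lines=2)
```

On clean synthetic data the clustering recovers the beams exactly. On real data, neighboring beams can overlap in tilt angle, which is the source of the clustering error noted in [0110].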
[0111] To improve the performance of cross-domain detection, another feasible solution is to use pseudo-labels for self-training. The steps are as follows:
[0112] Pre-train the model using source domain data and labels;
[0113] Run inference on the target domain data using the pre-trained model to obtain prediction results;
[0114] Select prediction results with higher confidence levels as pseudo-labels;
[0115] Train the model in the target domain using the pseudo-labels as supervision signals.
[0116] Repeat the process of "selecting prediction results with higher confidence as pseudo-labels" and "using pseudo-labels as supervision signals to train the model in the target domain" multiple times to obtain the final model.
[0117] The above approach can improve cross-domain detection performance. However, the repeated training iterations are time-consuming, the results are not very stable, and the model is prone to overfitting to a small amount of high-confidence data.
[0118] To improve the detection performance of cross-domain detection, this application provides a target detection method that is applied to electronic devices.
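The pseudo-label self-training procedure in [0112] to [0116] can be outlined with the following sketch. The model is assumed to be already pre-trained on source domain data, and the `predict` and `train_step` callables, the confidence threshold, and the number of rounds are hypothetical placeholders supplied by the caller.

```python
from typing import Any, Callable, List, Sequence, Tuple

Prediction = Tuple[Any, float]  # (predicted box, confidence score)

def select_pseudo_labels(
    predictions: Sequence[Prediction], threshold: float = 0.8
) -> List[Prediction]:
    """Keep only predictions whose confidence reaches the threshold."""
    return [p for p in predictions if p[1] >= threshold]

def self_train(
    model: Any,
    target_frames: Sequence[Any],
    predict: Callable[[Any, Any], Sequence[Prediction]],
    train_step: Callable[[Any, list], Any],
    rounds: int = 3,
    threshold: float = 0.8,
) -> Any:
    """Repeat pseudo-label selection and target-domain training."""
    for _ in range(rounds):
        # Infer on target-domain data and keep high-confidence predictions
        pseudo = [
            (frame, select_pseudo_labels(predict(model, frame), threshold))
            for frame in target_frames
        ]
        # Use the pseudo-labels as supervision signals in the target domain
        model = train_step(model, pseudo)
    return model

# Stub usage: the "model" is just a counter of pseudo-labels seen so far.
def _stub_predict(model, frame):
    return [("box_a", 0.9), ("box_b", 0.5)]

def _stub_train_step(model, pseudo):
    return model + sum(len(labels) for _, labels in pseudo)

final_model = self_train(0, ["frame_1", "frame_2"], _stub_predict, _stub_train_step)
```

Each round calls the predictor on every target frame again, which is where the time cost noted in [0117] comes from.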
[0119] Specifically, in one embodiment of this application, the electronic device may be a mobile phone, a personal computer, a laptop computer, an unmanned vehicle, a robot device with LiDAR, or a computer for automatic point cloud annotation, etc.
[0120] Figure 5 is a simplified schematic diagram of an electronic device according to an embodiment of this application.
[0121] As shown in Figure 5, the electronic device 500 includes a processor 501 and a memory 502.
[0122] Memory 502 is used to store computer program instructions, and one or more computer program instructions are stored in memory 502. Memory 502 may include a code storage area and a data storage area. The code storage area may store the operating system and application programs. The data storage area may store data created during use, etc. For example, if electronic device 500 is a headset, the data storage area may store data created during the use of the headset (e.g., audio acquisition results), etc. Furthermore, memory 502 may include high-speed random access memory and may also include non-volatile memory.
[0123] The memory 502 may be a read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (RAM), or other types of dynamic storage devices that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), or any computer-readable medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
[0124] The processor 501 is used to execute computer program instructions stored in the memory 502 to trigger the electronic device 500 to perform corresponding functions.
[0125] The processor 501 may be a system on chip (SoC), which may include a central processing unit (CPU) and may further include other types of processors.
[0126] The processor 501 may include, for example, a CPU, a digital signal processor (DSP), or a microcontroller. The processor 501 may also include necessary hardware accelerators or logic processing hardware circuits, such as an ASIC, or one or more integrated circuits for controlling the execution of the program in this application. Furthermore, the processor may have the function of operating one or more software programs, which may be stored in memory 502.
[0127] Processor 501 may include one or more processing units. For example, the processor may include an application processor (AP), a modem processor, a controller, an audio codec, a digital signal processor (DSP), etc. The controller can generate operation control signals based on the instruction opcode and timing signals to control instruction fetching and execution.
[0128] In processor 501, different processing units can be independent components or integrated into one or more processors. In some embodiments, electronic device 500 may also include one or more processors 501.
[0129] In some embodiments, the processor 501 may include one or more interfaces. Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a universal serial bus (USB) interface, etc. The USB interface can be used to connect a charger to charge the electronic device, and can also be used for data transfer between the electronic device and peripheral devices.
[0130] Processor 501 and memory 502 can be combined into a single processing device, but more commonly they are separate components.
[0131] Figure 6 is a schematic flowchart of a target detection method according to an embodiment of this application.
[0132] In one embodiment, the electronic device 500 performs the process shown in Figure 6 to achieve target detection.
[0133] S510: Acquire the point cloud data to be detected, which corresponds to the first domain.
[0134] S520: Invoke the first detection model.
[0135] S530: Use the first detection model to process the point cloud data to be detected and obtain 3D target detection results.
[0136] Specifically, in one embodiment, the first detection model is a model built based on the second detection model, which is a 3D detection model pre-trained based on the second point cloud dataset, and the second point cloud dataset corresponds to the second domain.
[0137] Specifically, the first detection model includes a domain-independent feature extractor, wherein: the domain-independent feature extractor is used to extract domain-independent features that are independent of the first domain and the second domain from the underlying 3D features of the point cloud data; the domain-independent feature extractor is a feature extractor trained based on the 2D encoder of the second detection model and according to the second point cloud dataset and the third point cloud dataset, the third point cloud dataset corresponding to the first domain.
[0138] S530 includes:
[0139] The domain-independent feature extractor extracts domain-independent features from the underlying 3D features of the point cloud data to be detected;
[0140] 3D object detection results are obtained based on domain-independent features.
[0141] The method of the embodiment shown in Figure 6 extracts domain-independent features from point cloud data in a first domain and obtains 3D object detection results based on these features. Therefore, the detection performance of 3D object detection is not affected by domain changes, thus improving the detection performance of cross-domain detection.
[0142] Furthermore, in order to implement the target detection method of the embodiments of this application, one embodiment of this application proposes a target detection model (first detection model).
[0143] Figure 7 is a schematic diagram of the structure of a first detection model according to an embodiment of this application.
[0144] As shown in Figure 7, the first detection model 700 includes a 3D encoder 610 (first 3D encoder), a feature decoupler 620, and a detection head 640 (first detection head).
[0145] The 3D encoder 610 is used to extract underlying 3D features from point cloud data.
[0146] The feature decoupler 620 is connected after the 3D encoder 610 and includes a domain-independent feature extractor 621. The domain-independent feature extractor 621 is used to extract domain-independent features from the underlying 3D features that are independent of the first and second domains. The domain-independent feature extractor 621 is a BEV feature extractor.
[0147] The detection head 640 is connected after the domain-independent feature extractor 621, and is used to obtain 3D target detection results based on domain-independent features.
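The structure described for Figure 7, together with steps S510 to S530, can be pictured as a chain of three stages. The following is a minimal sketch only: the class name `FirstDetectionModel`, the method name `detect`, and the stand-in callables are hypothetical and merely show the order in which data flows; they are not the networks of the embodiment.

```python
class FirstDetectionModel:
    """Chain: 3D encoder -> domain-independent feature extractor -> detection head."""

    def __init__(self, encoder_3d, domain_independent_extractor, detection_head):
        self.encoder_3d = encoder_3d
        self.domain_independent_extractor = domain_independent_extractor
        self.detection_head = detection_head

    def detect(self, point_cloud):
        # 3D encoder 610: underlying 3D features of the point cloud
        low_level_3d = self.encoder_3d(point_cloud)
        # Domain-independent feature extractor 621 (a BEV feature extractor)
        features = self.domain_independent_extractor(low_level_3d)
        # Detection head 640: 3D target detection results
        return self.detection_head(features)


# Stand-in callables (hypothetical), used only to show the data flow.
model = FirstDetectionModel(
    encoder_3d=lambda pts: {"n_points": len(pts)},
    domain_independent_extractor=lambda f: {"bev": f["n_points"]},
    detection_head=lambda f: [{"category": "placeholder", "source_points": f["bev"]}],
)
result = model.detect([(1.0, 2.0, 0.5), (3.0, -1.0, 0.2)])
```

In a real system each stage would be a trained network; the point of the sketch is that the detection head only ever sees the output of the domain-independent feature extractor.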
[0148] Furthermore, one embodiment of this application proposes a model training method (for training a first detection model), which is applied to an electronic device. Specifically, in the model training method of one embodiment of this application, a 3D object detection model (first detection model) is trained and obtained using a fourth point cloud dataset corresponding to the second domain and a third point cloud dataset corresponding to the first domain as training samples.
[0149] Specifically, in one embodiment of this application, a first detection model is obtained by combining data alignment and feature alignment.
[0150] In one embodiment, data alignment refers to aligning the point cloud densities of point cloud data belonging to different domains, reducing the domain-based data differences between sample data from different domains, and improving the cross-domain detection performance of the trained 3D object detection model. Feature alignment refers to aligning the target features of a 3D object detection model trained on a single domain to another domain, thereby improving the cross-domain detection performance of the 3D object detection model.
[0151] Specifically, Figure 8 is a schematic flowchart of a model training method according to an embodiment of this application.
[0152] The training sample data consists of the fourth point cloud dataset corresponding to the second domain and the third point cloud dataset corresponding to the first domain. The electronic device executes the procedure shown in Figure 8 to train and obtain the first detection model.
[0153] S700, obtain the fourth point cloud dataset containing labels.
[0154] S710, Data Alignment Step: Align the fourth point cloud dataset to the first domain to generate a second point cloud dataset containing labels.
[0155] Specifically, in S710, the fourth point cloud dataset is aligned to the first domain for point cloud density. That is, the second point cloud dataset is the data generated by aligning the point cloud density of the fourth point cloud dataset to the first domain.
[0156] Specifically, in certain application scenarios, the point cloud density of point cloud data from different domains is different. In S710, a second point cloud dataset is generated based on the fourth point cloud dataset so that the point cloud density of the second point cloud dataset is consistent with the point cloud density of the first domain's point cloud data (e.g., the third point cloud dataset). This facilitates the application of point cloud data from different domains (the first domain and the second domain) to the training process of the same detection model, improving model training efficiency and detection performance.
[0157] Specifically, Figure 9 is a schematic diagram of a point cloud density alignment process according to an embodiment of this application.
[0158] In one implementation of S710, the electronic device performs the procedure shown in Figure 9 to align the point cloud density of the fourth point cloud dataset to the first domain.
[0159] S800, calculate the vertical tilt angle of each data point in each point cloud data in the fourth point cloud dataset (corresponding to the second domain) and the third point cloud dataset (corresponding to the first domain).
[0160] Specifically, in S800, the tilt angle $\theta_p$ in the vertical direction of a data point $p = (x, y, z)$ in a point cloud dataset is calculated as follows:
[0161]
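As an illustration of S800, the sketch below computes a vertical tilt angle for a single point. It assumes the usual elevation-angle definition, namely the angle between the point's direction and the horizontal plane, obtained from z and the horizontal range; this definition and the function name `tilt_angle_deg` are assumptions rather than the formula of the embodiment.

```python
import math


def tilt_angle_deg(x: float, y: float, z: float) -> float:
    """Vertical tilt (elevation) angle of the point (x, y, z), in degrees.

    Assumed definition: atan2(z, sqrt(x^2 + y^2)).
    """
    horizontal_range = math.hypot(x, y)
    return math.degrees(math.atan2(z, horizontal_range))


angle_up = tilt_angle_deg(1.0, 0.0, 1.0)    # point above the horizontal plane
angle_flat = tilt_angle_deg(3.0, 4.0, 0.0)  # point on the horizontal plane
angle_down = tilt_angle_deg(1.0, 0.0, -1.0)  # point below the horizontal plane
```

Using `atan2` rather than a plain division keeps the result defined for points directly above or below the sensor, where the horizontal range is zero.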
[0162] S810, determine the laser beam to which each data point in each point cloud data in the fourth point cloud dataset and the third point cloud dataset belongs.
[0163] S820, based on the data points on the laser beams, determine the tilt angle of each laser beam in the first domain and the tilt angle of each laser beam in the second domain.
[0164] Specifically, in S820, for a certain laser beam, the median of the tilt angles of all data points on the laser beam in a certain point cloud data is used to represent the tilt angle of the laser beam in the point cloud data. The same calculation is performed on all point cloud data in the same domain point cloud dataset, and the median of the tilt angles of the laser beam in the same domain point cloud dataset is taken as the tilt angle of the laser beam corresponding to that domain.
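The two-level median described for S820 can be sketched as follows. The data layout (each frame as a list of `(beam_id, tilt_deg)` pairs) and the function names are hypothetical, and the sample values are made up for illustration.

```python
from statistics import median


def beam_tilt_per_frame(frame):
    """frame: list of (beam_id, tilt_deg) -> {beam_id: median tilt within this frame}."""
    by_beam = {}
    for beam_id, tilt in frame:
        by_beam.setdefault(beam_id, []).append(tilt)
    return {beam_id: median(tilts) for beam_id, tilts in by_beam.items()}


def beam_tilt_per_domain(frames):
    """frames: list of frames -> {beam_id: median over frames of the per-frame median}."""
    per_beam = {}
    for frame in frames:
        for beam_id, tilt in beam_tilt_per_frame(frame).items():
            per_beam.setdefault(beam_id, []).append(tilt)
    return {beam_id: median(tilts) for beam_id, tilts in per_beam.items()}


# Three made-up frames from one domain, two beams each.
frames = [
    [(0, -2.1), (0, -2.0), (0, -1.9), (1, 0.4), (1, 0.6)],
    [(0, -2.2), (0, -2.0), (1, 0.5)],
    [(0, -1.8), (1, 0.7), (1, 0.9)],
]
domain_tilts = beam_tilt_per_domain(frames)
```

Taking medians at both levels makes the per-domain beam angle robust to a few outlier points or outlier frames.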
[0165] Figure 10 is a schematic diagram comparing the tilt angles of laser beams in different domains according to an embodiment of this application.
[0166] As shown in Figure 10, the vertical axis represents the tilt angle; the laser beam tilt angles to the left of the center line A correspond to the W domain, and those to the right of the center line A correspond to the S domain.
[0167] S830, match the laser beams in the second domain and the first domain according to their tilt angles, and obtain the matching results.
[0168] Specifically, in S830, for each laser beam j in the first domain, the laser beam i in the second domain whose tilt angle θ is closest to that of beam j is found, and the two beams are matched only if the difference between their tilt angles is less than a threshold δ (e.g., 0.5 degrees). The calculation method is as follows:
[0169]
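The matching rule of S830 can be sketched as a nearest-angle search with a threshold. This is only an illustration: the function name `match_beams`, the dictionary layout, and the example angles are hypothetical, and the sketch does not enforce a one-to-one assignment between beams.

```python
def match_beams(source_tilts, target_tilts, delta_deg=0.5):
    """For each first-domain beam j, return the second-domain beam i with the
    closest tilt angle, or None if that closest difference is not below delta_deg.

    source_tilts: {i: tilt_deg} for the second domain
    target_tilts: {j: tilt_deg} for the first domain
    """
    matches = {}
    for j, theta_j in target_tilts.items():
        best_i, best_diff = None, None
        for i, theta_i in source_tilts.items():
            diff = abs(theta_i - theta_j)
            if best_diff is None or diff < best_diff:
                best_i, best_diff = i, diff
        matched = best_diff is not None and best_diff < delta_deg
        matches[j] = best_i if matched else None
    return matches


source_tilts = {0: -2.0, 1: -1.0, 2: 0.0, 3: 1.0}  # second domain (made-up angles)
target_tilts = {0: -1.9, 1: 0.2, 2: 3.0}           # first domain (made-up angles)
matches = match_beams(source_tilts, target_tilts)
```

In this example the first-domain beam at 3.0 degrees has no second-domain beam within 0.5 degrees, so it is left unmatched and becomes a candidate for completion in S840.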
[0170] S840, based on the matching results of S830, filter and/or complete the laser beams in the fourth point cloud dataset to generate the fifth point cloud dataset.
[0171] Specifically, S840 includes:
[0172] For a first laser beam in the first domain that has no matching laser beam in the second domain, point data corresponding to the first laser beam is added to each point cloud data of the fourth point cloud dataset.
[0173] And/or,
[0174] For a second laser beam in the second domain that has no matching laser beam in the first domain, the point data corresponding to the second laser beam is filtered out of each point cloud data of the fourth point cloud dataset.
[0175] Specifically, when aligning high-line data to low-line data (the laser beam density in the second domain is higher than that in the first domain), laser beams in the second domain that match the first domain are retained, while laser beams in the second domain that do not match the first domain are filtered out.
[0176] When aligning low-line data to high-line data (the laser beam density in the first domain is higher than that in the second domain), for laser beams that are not matched in the first domain, interpolation is used in the second domain to complete the laser beams at these angles.
[0177] Furthermore, to align low-line data to high-line data, denser point cloud data can be obtained by superimposing multiple frames of sparse point clouds in the second domain, thereby simulating the point cloud density of the first domain.
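The filtering branch of S840, together with the bookkeeping needed for the completion branch, can be sketched as below. The point layout `(beam_id, x, y, z)` and both function names are hypothetical. The interpolation or multi-frame superposition that actually generates the missing beams is not sketched here, because the text does not specify it in enough detail.

```python
def filter_unmatched_source_beams(points, matches):
    """Keep only second-domain points whose beam is matched to a first-domain beam.

    points:  list of (beam_id, x, y, z) from one second-domain point cloud
    matches: {first_domain_beam: second_domain_beam or None}, as produced in S830
    """
    kept_beams = {i for i in matches.values() if i is not None}
    return [p for p in points if p[0] in kept_beams]


def beams_to_complete(matches):
    """First-domain beams without a match; these would need to be completed."""
    return sorted(j for j, i in matches.items() if i is None)


matches = {0: 0, 1: 2, 2: None}  # made-up matching result
frame = [
    (0, 1.0, 0.0, -0.1),
    (1, 1.0, 0.1, 0.0),
    (2, 1.0, 0.2, 0.1),
    (3, 1.0, 0.3, 0.2),
]
filtered = filter_unmatched_source_beams(frame, matches)
missing = beams_to_complete(matches)
```

Here second-domain beams 1 and 3 are dropped because no first-domain beam matched them, and first-domain beam 2 is reported as needing completion.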
[0178] S850, adjust the number of data points on the laser beam in the fifth point cloud dataset, and align the horizontal resolution of the fifth point cloud dataset to the first domain to generate the second point cloud dataset.
[0179] Specifically, in S850, the number of data points on each laser beam is counted, and the laser beams in the fifth point cloud dataset are uniformly upsampled or downsampled.
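One simple way to picture the uniform resampling of S850 is to pick evenly spaced indices along a beam. The sketch below is an assumption about how "uniform" sampling could be realized: upsampling here repeats existing points, whereas interpolating between neighbouring points would be an equally plausible choice. The function name `resample_beam` is hypothetical.

```python
def resample_beam(points, target_count):
    """Uniformly pick target_count points from the ordered points of one beam.

    Downsampling drops points at even spacing; upsampling repeats points
    (a simplification of whatever upsampling a real pipeline would use).
    """
    if target_count <= 0 or not points:
        return []
    n = len(points)
    return [points[(k * n) // target_count] for k in range(target_count)]


beam = ["p0", "p1", "p2", "p3", "p4", "p5"]
down = resample_beam(beam, 3)     # 6 points -> 3 points
up = resample_beam(beam[:2], 4)   # 2 points -> 4 points
```

The target count per beam would come from the point counts observed on the matched beams of the first domain.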
[0180] According to the method of this application embodiment, in the data alignment process, in addition to considering the different number of laser beams in point cloud data from different domains, the distribution of laser beams in the vertical direction and the resolution in the horizontal direction are also considered. According to the method of this application embodiment, point cloud data from different domains can be better aligned during the data alignment process.
[0181] Furthermore, according to the method of the embodiments of this application, during the data alignment process, the laser beam to which each data point belongs is determined by using the information provided by the point cloud data, and the laser beam to which each data point belongs can be accurately obtained.
[0182] S720 is executed after S710.
[0183] S720, based on the objective function of 3D object detection, pre-train on the second point cloud dataset to obtain a 3D object detection model (second detection model).
[0184] Figure 11 is a schematic diagram of the structure of a second detection model according to an embodiment of this application.
[0185] As shown in Figure 11, the second detection model 1000 includes:
[0186] 3D encoder 1010 (second 3D encoder), which is used to extract low-level 3D features from point cloud data;
[0187] 2D encoder 1020 is used to extract target features from underlying 3D features; 2D encoder 1020 is a bird's eye view (BEV) feature extractor.
[0188] The detection head 1030 (second detection head) is used to obtain 3D target detection results based on target features.
[0189] Following S720, a feature alignment method is used to align the target features (features of the second domain) from the second detection model to the first domain. Specifically, the feature alignment steps include S730 to S743.
[0190] S730, construct the initial first detection model based on the second detection model.
[0191] Figure 12 is a schematic diagram of the structure of a first detection model according to an embodiment of this application.
[0192] As shown in Figure 12, the first detection model 1100 includes a 3D encoder 1110 (first 3D encoder), a feature decoupler 1120, a domain classifier 1130, and a detection head 1140 (first detection head).
[0193] The 3D encoder 1110 is based on the 3D encoder 610.
[0194] The feature decoupler 1120 is connected after the 3D encoder 1110 and includes a domain-independent feature extractor 1121 and a domain-dependent feature extractor 1122. The domain-independent feature extractor 1121 extracts, from the underlying 3D features, domain-independent features that are independent of the first and second domains (refer to the domain-independent feature extractor 621). The domain-dependent feature extractor 1122 extracts, from the underlying 3D features, domain-dependent features that are related to the first or second domain. Both the domain-independent feature extractor 1121 and the domain-dependent feature extractor 1122 are BEV feature extractors.
[0195] The domain classifier 1130 is connected after the feature decoupler 1120 and is used to distinguish the different domains to which the input features belong. That is, in an ideal state, the trained domain classifier 1130 can distinguish whether the domain-related features output by the domain-related feature extractor 1122 belong to the first domain or the second domain, but cannot distinguish which domain the domain-independent features output by the domain-independent feature extractor 1121 belong to.
[0196] The detection head 1140 is connected after the domain-independent feature extractor 1121, and it is used to obtain 3D target detection results based on domain-independent features (refer to detection head 640).
[0197] S730 includes:
[0198] S731, Initialize 3D encoder 1110, including loading the parameters of 3D encoder 1010 into the initial 3D encoder 1110.
[0199] S732, the feature decoupler 1120 is initialized, including initializing the domain-dependent feature extractor 1122 and initializing the domain-independent feature extractor 1121 based on the domain-dependent feature extractor 1122. Specifically, in S732, the parameters of the 2D encoder 1020 are loaded into the initial domain-dependent feature extractor 1122 and the initial domain-independent feature extractor 1121.
[0200] S733, Initialize the detection head 1140, including loading the parameters of the detection head 1030 into the initial detection head 1140.
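The parameter loading of S731 to S733 can be sketched with plain dictionaries standing in for model parameters. The dictionary keys and the function name `build_initial_first_model` are hypothetical. How the domain classifier 1130 is initialized is not stated in the text, so the sketch simply leaves it empty.

```python
import copy


def build_initial_first_model(second_model):
    """second_model: {"encoder_3d": ..., "encoder_2d": ..., "detection_head": ...}."""
    return {
        # S731: 3D encoder 1010 -> 3D encoder 1110
        "encoder_3d": copy.deepcopy(second_model["encoder_3d"]),
        # S732: 2D encoder 1020 -> both feature extractors of the decoupler 1120
        "domain_dependent_extractor": copy.deepcopy(second_model["encoder_2d"]),
        "domain_independent_extractor": copy.deepcopy(second_model["encoder_2d"]),
        # S733: detection head 1030 -> detection head 1140
        "detection_head": copy.deepcopy(second_model["detection_head"]),
        # Assumption: the classifier has no counterpart in the second model.
        "domain_classifier": None,
    }


second_model = {
    "encoder_3d": {"w": [0.1, 0.2]},
    "encoder_2d": {"w": [0.3]},
    "detection_head": {"w": [0.4]},
}
first_model = build_initial_first_model(second_model)
# Later training changes only the domain-independent copy.
first_model["domain_independent_extractor"]["w"][0] = 9.9
```

Deep copies matter here: the two extractors start from the same 2D encoder parameters but must be able to diverge, since only the domain-independent extractor is trained afterwards.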
[0201] S740, based on the objective function of 3D object detection, train the first detection model on the second point cloud dataset (corresponding to the second domain) and the third point cloud dataset (corresponding to the first domain).
[0202] Specifically, S740 includes:
[0203] S741, fix (freeze) the parameters of the 3D encoder 1110 and the domain-related feature extractor 1122.
[0204] The execution of S741 can provide stronger prior information for the domain-independent feature extractor 1121, the domain classifier 1130, and the detection head 1140.
[0205] S742, adjust the origin height of the point cloud data of the second and third point cloud datasets to the same height, and normalize the laser reflection intensity of the second and third point cloud datasets.
[0206] S743, based on the objective function of 3D object detection, and the data adjustment results of S742, train the domain-independent feature extractor 1121, the domain classifier 1130 and the detection head 1140 of the first detection model.
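The data adjustment of S742 can be illustrated as a shift along z plus an intensity rescaling. This is a sketch under stated assumptions: it supposes that the mounting height of each sensor and the maximum raw intensity of each dataset are known, and the parameter names `sensor_height`, `reference_height`, and `max_intensity` are hypothetical. The text does not specify which normalization is used.

```python
def align_origin_height(points, sensor_height, reference_height):
    """Shift z so the data looks as if the sensor were mounted at reference_height.

    points: list of (x, y, z, intensity) in the sensor frame.
    A ground point at z = -sensor_height ends up at z = -reference_height.
    """
    dz = sensor_height - reference_height
    return [(x, y, z + dz, i) for x, y, z, i in points]


def normalize_intensity(points, max_intensity):
    """Scale raw intensity into [0, 1] using the dataset's maximum raw value."""
    return [
        (x, y, z, min(max(i / max_intensity, 0.0), 1.0))
        for x, y, z, i in points
    ]


# Made-up example: one dataset recorded at 2.0 m with 0..255 intensities.
raw = [(1.0, 0.0, -2.0, 128.0), (5.0, 1.0, -1.0, 300.0)]
adjusted = normalize_intensity(align_origin_height(raw, 2.0, 1.5), 255.0)
```

After this step, ground points from both datasets lie near the same z value and intensities share one range, which removes two sources of domain difference that are unrelated to the objects themselves.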
[0207] Specifically, in S743, the point cloud data $P_s$ of the second point cloud dataset and the point cloud data $P_t$ of the third point cloud dataset are simultaneously input into the network of the first detection model. The 3D encoder 1110 maps the input point cloud data into the feature space to obtain the underlying 3D features of $P_s$ and $P_t$. These underlying 3D features are strongly correlated with the data distribution of the corresponding domain, and therefore contain both domain-specific information and information that remains unchanged across domains.
[0208] The underlying 3D features of $P_s$ and $P_t$ are input to the feature decoupler 1120, that is, to both the domain-independent feature extractor 1121 and the domain-related feature extractor 1122. After they pass through the domain-independent feature extractor 1121, the domain-independent features extracted from $P_s$ and $P_t$, respectively, are obtained. After they pass through the domain-related feature extractor 1122, the domain-related features extracted from $P_s$ and $P_t$, respectively, are obtained.
[0209] The domain classifier 1130 is used to constrain the feature decoupler 1120 to ensure that the underlying 3D features are correctly decoupled into domain-related features and domain-independent features. After passing through the domain classifier 1130, the outputs of the domain-related feature extractor 1122 for $P_s$ and $P_t$ should be easily distinguishable, whereas the outputs of the domain-independent feature extractor 1121 for $P_s$ and $P_t$ should be difficult to distinguish; that is, the domain classifier 1130 has difficulty accurately determining which domain a given feature comes from.
[0210] Therefore, in one embodiment, the domain classifier 1130 and the feature decoupler 1120 are designed to learn in an adversarial manner. That is, during training, the domain classifier 1130 continuously improves its ability to classify features in order to classify domain-related features correctly; at the same time, the domain-independent feature extractor 1121 continuously improves its ability to extract domain-independent features, so that the domain classifier 1130 cannot classify those features correctly.
[0211] The loss function for the adversarial learning of the domain classifier 1130 and the feature decoupler 1120 is the domain classification loss $L_{dc}$, which includes a domain-related feature component $L_{ds}$ and a domain-independent feature component $L_{di}$. The loss function $L_{dc}$ is calculated as follows:
[0212] $L_{dc} = L_{ds} + \lambda L_{di}$; (Formula 3)
[0213]
[0214]
[0215] In Formula 3, λ is a hyperparameter used to balance the domain-related and domain-independent components.
[0216] In Formula 4, the loss function $L_{ds}$ for domain-related features is the cross-entropy function; $x_i$ represents the predicted score for the i-th category, and $y_i \in \{0, 1\}$ indicates whether the feature belongs to this category.
[0217] In Formula 5, the loss function $L_{di}$ for domain-independent features is the mean squared error (MSE); $x_i$ represents the predicted score for the i-th category, and $y_i = 0.5$ means that the domain classifier 1130 cannot distinguish which domain the domain-independent features come from.
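The combination in Formula 3 can be made concrete with a small numerical sketch. It assumes a two-domain classifier that outputs one score in (0, 1) per feature, a standard binary cross-entropy for the domain-related component, and an MSE against the target 0.5 for the domain-independent component. These concrete forms and all function names are assumptions, since only the types of the two losses are described in the text.

```python
import math


def cross_entropy(scores, labels, eps=1e-12):
    """Binary cross-entropy averaged over samples; labels are 0 or 1."""
    total = 0.0
    for x, y in zip(scores, labels):
        x = min(max(x, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(x) + (1 - y) * math.log(1.0 - x)
    return total / len(scores)


def mse_to_half(scores):
    """Mean squared error between predicted scores and the target 0.5."""
    return sum((x - 0.5) ** 2 for x in scores) / len(scores)


def domain_classification_loss(dep_scores, dep_labels, indep_scores, lam=1.0):
    """L_dc = L_ds + lam * L_di, following Formula 3."""
    return cross_entropy(dep_scores, dep_labels) + lam * mse_to_half(indep_scores)


l_di_confused = mse_to_half([0.5, 0.5])   # classifier cannot tell the domains apart
l_di_confident = mse_to_half([0.9, 0.1])  # classifier still separates the domains
l_dc = domain_classification_loss([0.9, 0.1], [1, 0], [0.5, 0.5], lam=0.5)
```

The domain-independent component is zero exactly when the classifier outputs 0.5 for every domain-independent feature, which is the "cannot distinguish" state described for Formula 5.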
[0218] The loss function $L_{det}$ of the detection head 1140 can be expressed as:
[0219]
[0220] In Formula 6, the two terms represent the classification loss and the regression loss computed on the labeled point cloud data ($P_s$). The total loss function $L_{total}$ is the sum of the original detection loss function and the domain classification loss function, that is:
[0221] $L_{total} = L_{det} + L_{dc}$. (Formula 7)
[0222] During the training of the domain classifier 1130 and the domain-independent feature extractor 1121, since the outputs of the domain-independent feature extractor 1121, the domain-related feature extractor 1122, and the domain classifier 1130 do not require reference labels, this training does not require $P_s$ and $P_t$ to contain labels.
[0223] Furthermore, among the outputs of the domain-independent feature extractor 1121, the domain-independent features extracted from $P_s$ correspond to the point cloud data of the labeled second point cloud dataset ($P_s$) and therefore have labels.
[0224] During the training of the detection head 1140, the detection head 1140 is connected to the output of the domain-independent feature extractor 1121. Provided that the domain-independent features extracted from $P_s$ have labels, the domain-independent features extracted from $P_t$ are not required to have labels.
[0225] Therefore, in one embodiment of this application, the fourth point cloud dataset contains labels ($P_s$ contains labels), and the third point cloud dataset does not contain labels ($P_t$ does not contain labels).
[0226] According to the model training method of this application embodiment, during the feature alignment process, a 3D object detection model (first detection model) can be trained and obtained based on a fourth point cloud dataset containing labels and a third point cloud dataset without labels. Since the third point cloud dataset of the first domain does not need to contain labels, the label annotation operation is reduced, and the workload of preparing sample data is reduced.
[0227] According to the model training method of this application embodiment, in the feature alignment process, the domain classification loss function is used to explicitly constrain the model to extract domain-independent features.
[0228] According to the model training method of this application embodiment, during the feature alignment process, there is no need to select pseudo-labels for iterative training. Only a small number of training rounds are needed to obtain a first detection model with good detection performance in the first domain.
[0229] Furthermore, according to the model training method of this application embodiment, a 3D object detection model (first detection model) for processing point cloud data of the first domain can be trained and obtained based on the fourth point cloud dataset corresponding to the second domain and the third point cloud dataset corresponding to the first domain. Compared with the 3D object detection model (second detection model) trained and obtained based on the point cloud data of the second domain, the detection performance of the first detection model in object detection of point cloud data of the first domain is greatly improved.
[0230] Table 2 shows a comparison of the local domain detection and cross-domain detection effects based on an embodiment of this application.
[0231] Table 2
[0232]
[0233]
[0234] As shown in Table 2, the method according to an embodiment of this application for unsupervised domain transfer tasks in 3D object detection of point cloud data has achieved better cross-domain detection results than existing methods on multiple public datasets.
[0235] In the embodiment shown in Figure 8, a combination of data alignment and feature alignment is used to obtain the first detection model. In another embodiment of this application, the first detection model can be obtained using only data alignment (without using feature alignment).
[0236] Specifically, Figure 13 is a schematic flowchart of a model training method according to an embodiment of this application.
[0237] S1200, obtain the fourth point cloud dataset containing labels.
[0238] S1210, Data alignment step: Align the fourth point cloud dataset to the first domain to generate a second point cloud dataset containing labels. Refer to S710.
[0239] Specifically, in S1210, the point cloud density of the fourth point cloud dataset is aligned to the first domain. That is, the second point cloud dataset is the data generated by aligning the point cloud density of the fourth point cloud dataset to the first domain.
[0240] S1220: Based on the objective function of 3D object detection, a 3D object detection model (second detection model) is obtained by pre-training on the second point cloud dataset. Refer to S720.
[0241] S1230, construct the first detection model based on the second detection model (the first detection model is as shown in Figure 7), where:
[0242] Load the parameters of the 3D encoder of the second detection model into the 3D encoder of the first detection model;
[0243] Load the parameters of the 2D encoder of the second detection model into the domain-independent feature extractor of the first detection model;
[0244] Load the parameters of the detection head of the second detection model into the detection head of the first detection model.
[0245] In another embodiment of this application, the first detection model can also be obtained by using only feature alignment (without using data alignment).
[0246] Specifically, Figure 14 is a schematic flowchart of a model training method according to an embodiment of this application.
[0247] S1300, obtain the fourth point cloud dataset containing labels.
[0248] S1320: Based on the objective function of 3D object detection, a 3D object detection model (second detection model) is obtained through pre-training on the fourth point cloud dataset. Refer to S720.
[0249] S1330, Construct the initial first detection model based on the second detection model. Refer to S730.
[0250] S1340, Based on the objective function of 3D object detection, train the first detection model on the fourth point cloud dataset (corresponding to the second domain) and the third point cloud dataset (corresponding to the first domain).
[0251] Specifically, S1340 includes:
[0252] Fix the parameters of the 3D encoder and the domain-related feature extractor of the first detection model. Refer to S741.
[0253] Adjust the origin height of the point cloud data in the fourth and third point cloud datasets to the same height, and normalize the laser reflection intensity of the fourth and third point cloud datasets. Refer to S742.
[0254] Based on the objective function of 3D object detection, the domain-independent feature extractor, domain classifier, and detection head of the first detection model are trained using the data adjustment results from S1342. Refer to S743.
[0255] Table 3 shows a comparison of detection performance when using data alignment and / or feature alignment methods for data migration in the detection scenario W→N.
[0256] Table 3
[0257]
| W→N | Data alignment | Feature alignment | AP_BEV | AP_3D |
| (a) | | | 39.39 | 23.50 |
| (b) | | √ | 42.93 | 23.93 |
| (c) | √ | | 48.43 | 28.13 |
| (d) | √ | √ | 50.02 | 28.43 |
| Oracle | | | 58.24 | 37.40 |
[0258] In Table 3, (a) corresponds to applying a 3D object detection model trained based on detection scenario W to detection scenario N; (b) corresponds to generating a 3D object detection model using sample data based on detection scenario W according to the feature alignment method of this application embodiment and applying it to detection scenario N; (c) corresponds to generating a 3D object detection model using sample data based on detection scenario W according to the data alignment method of this application embodiment and applying it to detection scenario N; (d) corresponds to generating a 3D object detection model using sample data based on detection scenario W according to the data alignment and feature alignment method of this application embodiment and applying it to detection scenario N; Oracle corresponds to applying a 3D object detection model trained based on detection scenario N to detection scenario N.
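As a reading aid for Table 3, the short calculation below expresses each configuration as the share of the gap between the unadapted model (a) and the Oracle that it closes. The "gap closed" measure and the names used are not part of the application; they are only one convenient way to compare the rows, using the figures listed in Table 3.

```python
# (AP_BEV, AP_3D) for each row of Table 3, scenario W->N.
table3 = {
    "a": (39.39, 23.50),
    "b": (42.93, 23.93),
    "c": (48.43, 28.13),
    "d": (50.02, 28.43),
    "oracle": (58.24, 37.40),
}


def gap_closed(row, metric):
    """Fraction of the (a)-to-Oracle gap recovered by a row; metric 0=BEV, 1=3D."""
    base = table3["a"][metric]
    oracle = table3["oracle"][metric]
    return (table3[row][metric] - base) / (oracle - base)


closed_bev = {row: round(100 * gap_closed(row, 0), 1) for row in ("b", "c", "d")}
closed_3d = {row: round(100 * gap_closed(row, 1), 1) for row in ("b", "c", "d")}
```

On these figures, configuration (d) recovers roughly 56% of the AP_BEV gap and roughly 35% of the AP_3D gap, more than either single-alignment configuration.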
[0259] Table 4 shows a comparison of detection performance when using data alignment and / or feature alignment methods for data migration in the detection scenario N→W.
[0260] Table 4
[0261]
[0262]
[0263] In Table 4, (a) corresponds to applying a 3D object detection model trained based on detection scenario N to detection scenario W; (b) corresponds to generating a 3D object detection model using sample data based on detection scenario N according to the feature alignment method of this application embodiment and applying it to detection scenario W; (c) corresponds to generating a 3D object detection model using sample data based on detection scenario N according to the data alignment method of this application embodiment and applying it to detection scenario W; (d) corresponds to generating a 3D object detection model using sample data based on detection scenario N according to the data alignment and feature alignment methods of this application embodiment and applying it to detection scenario W; Oracle corresponds to applying a 3D object detection model trained based on detection scenario W to detection scenario W.
[0264] As shown in Tables 3 and 4, the data alignment and feature alignment methods according to the embodiments of this application can achieve good results in the domain migration task of public datasets, thereby improving the cross-domain detection capability of 3D object detectors.
[0265] Figure 15 is a comparison of cross-domain detection results according to an embodiment of this application.
[0266] For the detection scenario W→N, the diagrams on the left side of Figure 15 (top and bottom) show the detection results obtained without using the detection method of the embodiments of this application, and the diagrams on the right side of Figure 15 (top and bottom) show the detection results obtained using that detection method. As the comparison in Figure 15 shows, a 3D object detector trained only in the W domain misses a large number of objects when detecting in the N domain (left side), while most objects are detected after the detection method of this application embodiment is used (right side). The detection method according to this application embodiment improves the detection rate and the detection accuracy of the model in the target domain.
[0267] Furthermore, based on the target detection method of the embodiments of this application, an embodiment of this application also proposes a target detection device, which is applied to an electronic device.
[0268] Figure 16 is a schematic diagram of a target detection device according to an embodiment of this application.
[0269] As shown in Figure 16, the target detection device 1600 includes:
[0270] Input module 1610 is used to acquire point cloud data to be detected, which corresponds to the first domain.
[0271] The model invocation module 1620 is used to invoke the first detection model trained and generated according to the method described in the embodiments of this application.
[0272] The detection module 1630 is used to process the point cloud data to be detected using the first detection model to obtain the 3D target detection results.
[0273] Furthermore, based on the model training method of the embodiments of this application, one embodiment of this application also proposes a model training device, which is applied to an electronic device.
[0274] Figure 17 is a schematic diagram of a model training apparatus according to an embodiment of this application.
[0275] As shown in Figure 17, the model training device 1700 includes:
[0276] Input module 1710 is used to obtain the fourth point cloud dataset and the third point cloud dataset;
[0277] Data alignment module 1720 is used to generate a second point cloud dataset based on the fourth point cloud dataset (see S710);
[0278] The first training module 1730 is used to train the second detection model based on the second point cloud dataset;
[0279] The second training module 1740 is used to create an initial first detection model based on the second detection model, and to train the first detection model based on the second point cloud dataset and the third point cloud dataset (see S730 to S743).
[0280] In the description of the embodiments of this application, for the sake of convenience, the device is described by dividing it into various modules according to its functions. The division into modules is only a logical functional division. When implementing the embodiments of this application, the functions of each module can be implemented in one or more pieces of software and/or hardware.
[0281] Specifically, the apparatus proposed in this application can be fully or partially integrated onto a single physical entity, or physically separated. These modules can be implemented entirely in software via processing element calls; entirely in hardware; or partially in software via processing element calls and partially in hardware. For example, the detection module can be a separate processing element or integrated into a chip in the electronic device. The implementation of other modules is similar. Furthermore, these modules can be fully or partially integrated together, or implemented independently. During implementation, each step of the above method or each of the above modules can be completed through integrated logic circuits in the hardware of the processor element or through software instructions.
[0282] For example, these modules can be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). Alternatively, these modules can be integrated together as a system-on-a-chip (SOC).
[0283] Furthermore, the devices, apparatuses, and modules described in the embodiments of this application can be implemented by computer chips or physical entities, or by products with certain functions.
[0284] Specifically, one embodiment of this application also proposes an electronic device, for which reference may be made to the electronic device 500 shown in Figure 5. The electronic device includes a memory for storing computer program instructions and a processor for executing those instructions. When the processor executes the computer program instructions, the electronic device is triggered to perform the target detection method steps or model training method steps described in the embodiments of this application.
[0285] Those skilled in the art will understand that embodiments of this application can be provided as methods, apparatus, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media containing computer-usable program code.
[0286] In the several embodiments provided in this application, any function, if implemented as a software functional unit and sold or used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
[0287] Specifically, one embodiment of this application also provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to execute the method provided in the embodiment of this application.
[0288] An embodiment of this application also provides a computer program product, which includes a computer program that, when run on a computer, causes the computer to perform the method provided in the embodiment of this application.
[0289] The embodiments of this application are described with reference to flowcharts and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of this application. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0290] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0291] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0292] It should also be noted that in the embodiments of this application, "at least one" refers to one or more, and "more than one" refers to two or more. "And/or" describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent three cases: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can each be singular or plural. The character "/" generally indicates that the related objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, and c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b, and c can each be single or multiple.
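As an illustration only, and not as part of the embodiments, the combinations covered by "at least one of a, b, and c" can be enumerated with a short Python sketch. The function name `at_least_one_of` is hypothetical and is introduced here solely for the example.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of the items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = at_least_one_of(["a", "b", "c"])
for combo in combos:
    # Prints one combination per line, e.g. "a and b".
    print(" and ".join(combo))
```

For three items the sketch yields seven combinations, which correspond to the seven cases listed in the paragraph above.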
[0293] In this application, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0294] This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0295] The various embodiments in this application are described in a progressive manner. For parts that are similar or identical between embodiments, the embodiments may be referred to one another. Each embodiment focuses on its differences from the other embodiments. In particular, the device embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions of the method embodiments.
[0296] Those skilled in the art will recognize that the units and algorithm steps described in the embodiments of this application can be implemented using electronic hardware, computer software, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0297] Those skilled in the art will clearly understand that, for convenience and brevity, for the specific working processes of the devices, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
[0298] The above description is merely a specific embodiment of this application. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the protection scope of this application. The protection scope of this application should be determined by the protection scope of the claims.
Claims
1. A target detection method, characterized in that the method is applied to an electronic device and includes: acquiring point cloud data to be detected, wherein the point cloud data to be detected corresponds to the first domain; invoking the first detection model, wherein the first detection model is a 3D object detection model trained using point cloud datasets corresponding to the second domain and the first domain as training samples, and the first detection model includes a domain-independent feature extractor, wherein: the domain-independent feature extractor is used to extract domain-independent features unrelated to the first and second domains from the underlying 3D features of the point cloud data; the domain-independent feature extractor is a feature extractor trained based on the 2D encoder of the second detection model, using the second point cloud dataset and the third point cloud dataset; the second detection model is a 3D object detection model pre-trained based on the second point cloud dataset; the second point cloud dataset corresponds to the second domain, and the third point cloud dataset corresponds to the first domain; and processing the point cloud data to be detected using the first detection model to obtain a 3D target detection result, including: extracting, by the domain-independent feature extractor, the domain-independent features from the underlying 3D features of the point cloud data to be detected; and obtaining the 3D target detection result based on the domain-independent features.
2. The method according to claim 1, characterized in that the first detection model further includes a first 3D encoder, which is used to extract the underlying 3D features of the point cloud data to be detected, wherein the parameters of the first 3D encoder are consistent with those of the second 3D encoder of the second detection model.
3. The method according to claim 2, characterized in that the first detection model further includes a first detection head, which is used to obtain the 3D target detection result based on the domain-independent features, wherein the first detection head is a detection head trained based on the second detection head of the second detection model, using the second point cloud dataset and the third point cloud dataset.
4. A model training method, characterized in that the method is applied to an electronic device and uses point cloud datasets corresponding to the second domain and the first domain as training samples to train and obtain a 3D object detection model; the method includes: obtaining the second point cloud dataset, which corresponds to the second domain; obtaining the second detection model by pre-training based on the second point cloud dataset, wherein the second detection model is a 3D object detection model that includes a 2D encoder; constructing an initial first detection model based on the second detection model, including constructing an initial domain-independent feature extractor of the first detection model based on the 2D encoder; and training the first detection model according to the objective function of 3D object detection, based on the second point cloud dataset and the third point cloud dataset corresponding to the first domain, wherein: the first detection model is used to obtain 3D target detection results based on domain-independent features that are unrelated to the first domain and the second domain; and the training of the first detection model includes: training the domain-independent feature extractor based on the second point cloud dataset and the third point cloud dataset, wherein the domain-independent feature extractor is used to extract the domain-independent features from the underlying 3D features of the point cloud data.
5. The method according to claim 4, characterized in that obtaining the second point cloud dataset includes: obtaining the fourth point cloud dataset, which corresponds to the second domain; and aligning the point cloud density of the fourth point cloud dataset to the first domain to generate the second point cloud dataset.
6. The method according to claim 5, characterized in that aligning the point cloud density of the fourth point cloud dataset to the first domain to generate the second point cloud dataset includes: matching the laser beams of the second domain and the first domain based on the tilt angles of the laser beams to obtain a matching result; and generating the second point cloud dataset from the fourth point cloud dataset based on the matching result, including: for a first laser beam in the first domain, supplementing point data corresponding to the first laser beam in each piece of point cloud data of the fourth point cloud dataset, wherein the first laser beam in the first domain has no matching laser beam in the second domain; and/or, for a second laser beam in the second domain, filtering out the point data corresponding to the second laser beam from each piece of point cloud data of the fourth point cloud dataset, wherein the second laser beam in the second domain has no matching laser beam in the first domain.
7. The method according to claim 6, characterized in that generating the second point cloud dataset from the fourth point cloud dataset based on the matching result further includes: adjusting the number of data points on the laser beams in the fourth point cloud dataset, and aligning the horizontal resolution of the fourth point cloud dataset to the first domain.
8. The method according to any one of claims 4-7, characterized in that the first detection model further includes a first 3D encoder, which is used to extract the underlying 3D features from point cloud data; the second detection model further includes a second 3D encoder; constructing the initial first detection model based on the second detection model further includes: loading the parameters of the second 3D encoder into the initial first 3D encoder; and the training of the first detection model further includes: fixing the parameters of the first 3D encoder.
9. The method according to any one of claims 4-7, characterized in that the first detection model further includes a first detection head, which is used to obtain the 3D target detection result based on the domain-independent features; the second detection model further includes a second detection head; and constructing the initial first detection model based on the second detection model further includes: loading the parameters of the second detection head into the initial first detection head.
10. The method according to any one of claims 4-7, characterized in that the first detection model further includes a domain-related feature extractor and a domain classifier, wherein: the domain-related feature extractor is used to extract domain-related features related to the first domain or the second domain from the underlying 3D features; the domain classifier is used to distinguish the domain to which the domain-related features relate; and the domain-related feature extractor and the domain-independent feature extractor are designed to learn in an adversarial manner with the domain classifier; constructing the initial first detection model based on the second detection model further includes: loading the parameters of the 2D encoder into the initial domain-independent feature extractor and the initial domain-related feature extractor; and the training of the first detection model further includes: fixing the parameters of the domain-related feature extractor; and training the domain-independent feature extractor and the domain classifier using an adversarial learning approach.
11. The method according to any one of claims 4-7, characterized in that the second point cloud dataset contains labels, while the third point cloud dataset does not contain labels.
12. A target detection model, characterized in that the model is a 3D object detection model trained using point cloud datasets corresponding to the second domain and the first domain as training samples; the model is used to obtain 3D object detection results based on domain-independent features unrelated to the first and second domains; and the model includes a domain-independent feature extractor, wherein: the domain-independent feature extractor is used to extract the domain-independent features from the underlying 3D features of the point cloud data; the domain-independent feature extractor is a feature extractor trained based on the 2D encoder of the second detection model, using the second point cloud dataset and the third point cloud dataset; the second detection model is a 3D object detection model pre-trained based on the second point cloud dataset; and the second point cloud dataset corresponds to the second domain, and the third point cloud dataset corresponds to the first domain.
13. An electronic device, characterized in that the electronic device includes a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps according to any one of claims 1-7 or any one of claims 8-11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1-7 or any one of claims 8-11.
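For illustration only, and without limiting the claims, the laser beam matching recited in claim 6 might be sketched in Python as follows. The greedy nearest-angle strategy, the 0.5-degree tolerance, the example tilt angles, the point layout, and the function names `match_beams` and `drop_unmatched_second_domain_points` are all assumptions made for this example. The sketch only identifies which first-domain beams would need supplementary point data; it does not attempt to generate that data.

```python
def match_beams(second_domain_angles, first_domain_angles, tolerance_deg=0.5):
    """Greedily pair each first-domain beam with the closest unused
    second-domain beam whose tilt angle lies within the tolerance.

    Returns (matches, unmatched_first, unmatched_second), where matches maps
    a first-domain beam index to a second-domain beam index.
    """
    matches = {}
    used_second = set()
    for f_idx, f_angle in enumerate(first_domain_angles):
        best_idx, best_gap = None, tolerance_deg
        for s_idx, s_angle in enumerate(second_domain_angles):
            gap = abs(s_angle - f_angle)
            if s_idx not in used_second and gap <= best_gap:
                best_idx, best_gap = s_idx, gap
        if best_idx is not None:
            matches[f_idx] = best_idx
            used_second.add(best_idx)
    unmatched_first = [
        i for i in range(len(first_domain_angles)) if i not in matches
    ]
    unmatched_second = [
        i for i in range(len(second_domain_angles)) if i not in used_second
    ]
    return matches, unmatched_first, unmatched_second


def drop_unmatched_second_domain_points(points, unmatched_second):
    """Filter out points whose beam has no matching beam in the first domain."""
    drop = set(unmatched_second)
    return [p for p in points if p[3] not in drop]


# Hypothetical sensors: 4 beams in the second domain, 3 in the first (degrees).
second_domain_angles = [-15.0, -5.0, 5.0, 15.0]
first_domain_angles = [-5.2, 5.1, 25.0]
matches, need_supplement, need_filter = match_beams(
    second_domain_angles, first_domain_angles
)

# Each point is (x, y, z, second_domain_beam_index); values are made up.
frame = [
    (1.0, 0.0, -0.3, 0),
    (2.0, 0.1, -0.2, 1),
    (2.5, 0.2, 0.2, 2),
    (3.0, 0.0, 0.8, 3),
]
aligned = drop_unmatched_second_domain_points(frame, need_filter)
```

In this example, `need_supplement` lists the first-domain beams that have no counterpart in the second domain, and `need_filter` lists the second-domain beams whose points are removed from each frame.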