3D Object Detection Neural Network Accelerator and Its Implementation Method
By designing a 3D target detection neural network accelerator that includes a processor, off-chip memory, bus, cache read/write control circuit, columnar feature extraction acceleration module, and neural network computing acceleration module, the problem that existing accelerators cannot effectively accelerate columnar feature extraction is solved, and efficient 3D target detection computation is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUN YAT SEN UNIV
- Filing Date
- 2024-09-10
- Publication Date
- 2026-06-30
AI Technical Summary
Existing neural network accelerators cannot effectively accelerate the columnar feature extraction process, resulting in computational difficulties and high latency for 3D target detection algorithms. Furthermore, existing accelerators suffer from high static power consumption and resource waste.
A three-dimensional target detection neural network accelerator is designed, which includes a processor, off-chip memory, a bus, a cache read/write control circuit, a columnar feature extraction acceleration module, and a neural network computation acceleration module. Computational efficiency is improved by providing customized acceleration for both columnar feature extraction and neural network computation.
This approach simultaneously accelerates columnar feature extraction and neural network computation, reduces computational latency and static power consumption, and improves the computational efficiency of 3D target detection neural networks.
Figure CN119358611B_ABST
Description
Technical Field
[0001] This invention relates to the field of neural network accelerator technology, and in particular to a three-dimensional target detection neural network accelerator and its implementation method.
Background Technology
[0002] 3D object detection algorithms are widely used in 3D environmental perception in intelligent driving scenarios. They use LiDAR point cloud data as input, process it through neural networks, and detect the spatial location, size, and specific category of various objects in the surrounding environment. Each frame of LiDAR point cloud contains a large number of irregular points, making 3D object detection algorithms computationally difficult and resulting in high latency. Pillar-based 3D object detection algorithms, however, convert the irregular 3D point cloud into regular columnar features (Pillars) before processing them through the neural network. This approach effectively reduces the computational load of the neural network, making it highly suitable for 3D perception applications requiring real-time deployment.
[0003] The deployment of Pillar-based 3D object detection algorithms mainly involves two processes: extracting columnar features and computing neural networks. However, existing neural network hardware accelerators typically only support acceleration of the convolutional network portion, lacking customized acceleration for the columnar feature extraction process. Furthermore, most existing neural network accelerators are general-purpose accelerators, containing numerous redundant computational modules, leading to high static power consumption and wasted on-chip resources. These issues urgently need to be addressed.
Summary of the Invention
[0004] To address the aforementioned technical problems, the present invention aims to provide a three-dimensional target detection neural network accelerator and its implementation method, which can simultaneously accelerate the processes of columnar feature extraction and neural network calculation, thereby improving the computational efficiency of the three-dimensional target detection neural network.
[0005] The first technical solution adopted in this invention is:
[0006] A three-dimensional target detection neural network accelerator includes a processor, off-chip memory, a bus, a cache read / write control circuit, a columnar feature extraction acceleration module, and a neural network computation acceleration module. The processor, the off-chip memory, and the cache read / write control circuit are communicatively connected via the bus. The output of the columnar feature extraction acceleration module is connected to the input of the neural network computation acceleration module. Both the columnar feature extraction acceleration module and the neural network computation acceleration module are communicatively connected to the cache read / write control circuit.
[0007] The processor is used to control the data reading, writing, and calculation processes;
[0008] The off-chip memory is used to store the raw point cloud data and intermediate feature data during the calculation process;
[0009] The bus is used to bridge the processor, the off-chip memory, and the cache read / write control circuit.
[0010] The cache read / write control circuit is used to execute corresponding data read / write operations according to the control instructions of the processor;
[0011] The column feature extraction acceleration module is used to accelerate the computation of column feature extraction;
[0012] The neural network computation acceleration module is used to accelerate the computation of the backbone network.
[0013] Furthermore, the columnar feature extraction acceleration module includes a point cloud cache, a feature cache, a column indexing unit, a point count unit, a point cloud center point calculation unit, and a point cloud offset calculation unit.
[0014] Furthermore, the point cloud cache is used to store point cloud data;
[0015] The feature cache is used to store point cloud offset data of multiple dimensions obtained through calculation;
[0016] The column indexing unit is used to calculate the index value of the corresponding column based on the three-dimensional coordinates of the point cloud data, and to determine the storage address of the point cloud data in the off-chip memory based on the index value.
[0017] The point counting unit is used to count the number of points stored in each column;
[0018] The point cloud center point calculation unit is used to calculate the centroid coordinates of the point cloud data stored in the column;
[0019] The point cloud offset calculation unit is used to calculate the three-dimensional offset of the point cloud data stored in the column relative to the centroid coordinates, and the two-dimensional offset relative to the center point of the column.
[0020] Furthermore, the neural network computation acceleration module includes an input cache unit, a weight cache unit, a bias cache unit, a data alignment unit, a multiply-accumulate calculation array, a post-processing unit, and an output cache unit.
[0021] Furthermore, the input cache unit is used to store the input features for neural network computation;
[0022] The weight cache unit is used to store the weight data for neural network computation;
[0023] The bias cache unit is used to store the bias data for neural network computation;
[0024] The data alignment unit is used to rearrange and mask the input features;
[0025] The multiply-accumulate array is used to receive the aligned input features and the weight data, and to perform multiply-accumulate calculations.
[0026] The post-processing unit is used to perform accumulation, pooling, biasing, and activation operations on the feature data after multiplication and addition to obtain the output features.
[0027] The output buffer unit is used to store the output features.
[0028] Furthermore, the control commands include data loading control commands, execution calculation control commands, and data write-back control commands, wherein:
[0029] When the data loading control instruction is received, the cache read / write control circuit reads the original point cloud data, input features, weight data, and bias data from the off-chip memory, stores the original point cloud data in the cache of the columnar feature extraction acceleration module, and stores the input features, weight data, and bias data in the cache of the neural network calculation acceleration module.
[0030] When the execution calculation control instruction is received, the cache read / write control circuit controls the columnar feature extraction acceleration module and the neural network calculation acceleration module to read data from the cache, perform columnar feature extraction and neural network calculation to obtain output features, and store the output features in the cache of the neural network calculation acceleration module;
[0031] When the data write-back control command is received, the cache read-write control circuit writes the data in the cache of the columnar feature extraction acceleration module and the neural network calculation acceleration module into the off-chip memory.
[0032] Furthermore, the cache read / write control circuit can asynchronously execute at least two of the data loading control instruction, the execution calculation control instruction, and the data write-back control instruction.
[0033] Furthermore, the processor is also used to configure the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer computation.
[0034] The second technical solution adopted in this invention is:
[0035] A method for implementing a 3D target detection neural network accelerator, comprising the following steps:
[0036] The column feature extraction acceleration module reads the raw point cloud data from the off-chip memory, calculates the column index and the number of points stored in each non-empty column based on the three-dimensional coordinates of the raw point cloud data, and then calculates the group region address to be written back to the off-chip memory based on the column index and the number of points. The raw point cloud data is then grouped and written back to the corresponding group region address.
[0037] The column feature extraction acceleration module reads grouped point cloud data from the grouped region address, calculates the centroid coordinates of the grouped point cloud data, and then calculates the three-dimensional offset of the grouped point cloud data relative to the centroid coordinates, as well as the two-dimensional offset relative to the center point of the column.
[0038] The columnar feature extraction acceleration module generates multi-dimensional features by splicing the three-dimensional offset, the two-dimensional offset, and the grouped point cloud data, and sends the multi-dimensional features to the neural network computing acceleration module.
[0039] The neural network computing acceleration module performs matrix multiplication on the multi-dimensional features to obtain column features, and then writes the column features back to the off-chip memory.
[0040] The neural network computation acceleration module aligns the column features, and then performs multiplication and addition operations on the aligned column features and weight data obtained from the off-chip memory to obtain fused feature data. The fused feature data is then accumulated, pooled, biased, and activated to obtain output features, which are then written back to the off-chip memory.
[0041] Furthermore, the implementation method also includes the following steps:
[0042] The processor stores the raw point cloud data in the off-chip memory and configures the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer.
[0043] The beneficial effects of this invention are as follows: This invention provides a 3D target detection neural network accelerator and its implementation method. The 3D target detection neural network accelerator includes a processor, off-chip memory, a bus, a cache read / write control circuit, a columnar feature extraction acceleration module, and a neural network calculation acceleration module. The processor controls the data read / write and calculation process. The off-chip memory stores the original point cloud data and intermediate feature data during the calculation process. The bus bridges the processor, off-chip memory, and cache read / write control circuit. The cache read / write control circuit executes corresponding data read / write operations according to the processor's control instructions. The columnar feature extraction acceleration module accelerates the calculation of columnar feature extraction, and the neural network calculation acceleration module accelerates the calculation of the backbone network. This invention provides a reasonable and efficient 3D target detection neural network accelerator by customizing the acceleration of columnar feature extraction and convolutional neural network calculation. It can be applied to Pillar-based 3D target detection algorithms, and can simultaneously accelerate the processes of columnar feature extraction and neural network calculation, resulting in good acceleration effects and improving the computational efficiency of 3D target detection neural networks.
Attached Figure Description
[0044] Figure 1 is a schematic diagram of the structure of a three-dimensional target detection neural network accelerator provided in an embodiment of the present invention;
[0045] Figure 2 is a flowchart of the steps of an implementation method for a three-dimensional target detection neural network accelerator provided in an embodiment of the present invention.
Detailed Implementation
[0046] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments. The step numbers in the following embodiments are only for ease of explanation and do not limit the order of the steps. The execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
[0047] In the description of this invention, "multiple" means two or more. The use of "first" and "second" is for distinguishing technical features only and should not be construed as indicating or implying relative importance, or implicitly indicating the number of indicated technical features, or the order in which the indicated technical features are presented. Furthermore, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in this specification is for the purpose of describing particular embodiments only and not for limiting the invention.
[0048] The purpose of columnar feature extraction in Pillar-based 3D object detection algorithms is to transform irregular point clouds into regular columnar features. The main steps are as follows:
[0049] 1) Acquire 3D point cloud data. The data format of 3D point cloud is (x,y,z,i), where (x,y,z) are the 3D spatial coordinates of the point cloud and i is the reflectivity of the point cloud. This data is generally obtained from radar or depth camera.
[0050] 2) Group the point cloud. Based on the three-dimensional spatial coordinates of the input points, divide the point cloud into a predefined two-dimensional grid of size H*W. Each grid cell is called a column (pillar). At the same time, count the number of points in each column. The upper limit of the number of points in each column is M. If a column contains no points, or fewer than M points, the remaining entries are filled with zeros. If the number of points exceeds the upper limit, the extra points are not included in the column.
[0051] 3) Based on the point distribution in each column, extract the 5-dimensional features (x_m, y_m, z_m, x_c, y_c), where (x_m, y_m, z_m) is the offset vector from each point to the centroid of the points in its column, and (x_c, y_c) is the two-dimensional offset vector from each point to the center of the column. Concatenating these features with the original point cloud data yields a 9-dimensional feature. Assuming the number of non-empty columns is N, the resulting feature dimensions are (N, M, 9).
[0052] 4) Then, the feature dimension is expanded to (N, M, 64) through matrix multiplication, and the second dimension is compressed through pooling to obtain column features of size (N, 64). After the column features are extracted, they are placed back into the two-dimensional grid according to the index of each column to form two-dimensional pseudo-image features of size (H, W, 64).
[0053] 5) The pseudo-image features are fed into the backbone network for calculation. The backbone network is composed of a convolutional neural network. The final calculation results are the features of the 3D bounding box, the predicted category of the bounding box, and the azimuth angle of the bounding box.
[0054] 6) Based on the 3D target detection calculation results, the 3D non-maximum suppression algorithm is used to obtain the final effective 3D bounding boxes.
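Steps 2) to 4) above can be illustrated with a short software reference model. This is only a sketch in NumPy under stated assumptions, not the accelerator's implementation: the function names pillar_features and pseudo_image, the grid range parameters, and the use of max pooling (the text only says "pooling") are all hypothetical choices made for illustration.

```python
import numpy as np

def pillar_features(points, x_range, y_range, grid_hw, max_points):
    """Group (x, y, z, i) points into columns and build (N, M, 9) features."""
    H, W = grid_hw
    x_min, x_max = x_range
    y_min, y_max = y_range
    dx = (x_max - x_min) / W          # column size along x
    dy = (y_max - y_min) / H          # column size along y

    # Discard points that fall outside the predefined grid.
    inside = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
              (points[:, 1] >= y_min) & (points[:, 1] < y_max))
    pts = points[inside]
    u = np.clip(((pts[:, 1] - y_min) / dy).astype(int), 0, H - 1)  # row index
    v = np.clip(((pts[:, 0] - x_min) / dx).astype(int), 0, W - 1)  # column index

    # Step 2: group points per column, dropping points beyond the limit M.
    groups = {}
    for p, ui, vi in zip(pts, u, v):
        bucket = groups.setdefault((int(ui), int(vi)), [])
        if len(bucket) < max_points:
            bucket.append(p)

    # Step 3: 9-dimensional features, zero-padded up to M points per column.
    indices = sorted(groups)
    feats = np.zeros((len(indices), max_points, 9), dtype=np.float32)
    for n, (ui, vi) in enumerate(indices):
        group = np.stack(groups[(ui, vi)])
        k = len(group)
        centroid = group[:, :3].mean(axis=0)
        center_xy = np.array([x_min + (vi + 0.5) * dx, y_min + (ui + 0.5) * dy])
        feats[n, :k, 0:4] = group                      # (x, y, z, i)
        feats[n, :k, 4:7] = group[:, :3] - centroid    # (x_m, y_m, z_m)
        feats[n, :k, 7:9] = group[:, :2] - center_xy   # (x_c, y_c)
    return feats, indices

def pseudo_image(feats, indices, weight, grid_hw):
    """Step 4: expand channels, pool over points, scatter into the grid."""
    H, W = grid_hw
    expanded = feats @ weight                 # (N, M, C_out)
    pooled = expanded.max(axis=1)             # (N, C_out); max pooling is assumed
    image = np.zeros((H, W, weight.shape[1]), dtype=pooled.dtype)
    for n, (ui, vi) in enumerate(indices):
        image[ui, vi] = pooled[n]
    return pooled, image
```

With a (9, 64) weight matrix, pseudo_image returns column features of shape (N, 64) and a pseudo-image of shape (H, W, 64), matching the dimensions given in step 4). Note that in this simplified model the zero-padded rows also take part in the pooling.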
[0055] To accelerate the above calculation process, this invention provides a three-dimensional target detection neural network accelerator. Referring to Figure 1, the three-dimensional target detection neural network accelerator of this invention includes a processor, off-chip memory, a bus, a cache read / write control circuit, a columnar feature extraction acceleration module, and a neural network computation acceleration module. The processor, off-chip memory, and cache read / write control circuit are connected via the bus. The output of the columnar feature extraction acceleration module is connected to the input of the neural network computation acceleration module. Both the columnar feature extraction acceleration module and the neural network computation acceleration module are communicatively connected to the cache read / write control circuit.
[0056] The processor is used to control the data reading, writing, and calculation processes;
[0057] Off-chip memory is used to store raw point cloud data and intermediate feature data during the calculation process;
[0058] The bus is used to bridge the processor, off-chip memory, and cache read / write control circuitry.
[0059] The cache read / write control circuit is used to execute corresponding data read / write operations according to the processor's control instructions;
[0060] The column feature extraction acceleration module is used to accelerate the computation of column feature extraction;
[0061] The neural network computation acceleration module is used to accelerate the computation of the backbone network.
[0062] Specifically, the processor controls the accelerator's cache read / write and computation processes. The off-chip memory stores the raw point cloud data and the intermediate feature data produced during computation; in specific applications, the off-chip memory is typically DRAM. The bus bridges the data and control interfaces of the processor, the off-chip memory, and the accelerator: the processor is the control bus master and a data bus master, the off-chip memory is the data bus slave, and the accelerator is the control bus slave and a data bus master. In specific applications, the bus is typically an AMBA bus. The cache read / write control circuit receives configuration information and control commands from the processor and automatically performs data read / write operations through a built-in state machine. The columnar feature extraction acceleration module is mainly used to accelerate columnar feature extraction, and the neural network computation acceleration module is mainly used to accelerate the backbone network computation.
[0063] Referring to Figure 1, as an optional implementation, the column feature extraction acceleration module includes a point cloud cache, a feature cache, a column index unit, a point count unit, a point cloud center point calculation unit, and a point cloud offset calculation unit.
[0064] As an optional implementation, a point cloud cache is used to store point cloud data;
[0065] Feature cache is used to store multi-dimensional point cloud offset data obtained from calculation;
[0066] The column index unit is used to calculate the index value of the corresponding column based on the three-dimensional coordinates of the point cloud data, and to determine the storage address of the point cloud data in off-chip memory based on the index value;
[0067] The point counting unit is used to count the number of points stored in each column;
[0068] The point cloud center point calculation unit is used to calculate the centroid coordinates of the point cloud data stored in the column;
[0069] The point cloud offset calculation unit is used to calculate the three-dimensional offset of the point cloud data stored in the column relative to the centroid coordinates, as well as the two-dimensional offset relative to the center point of the column.
[0070] Specifically, the point cloud cache is used to store the original point cloud data and the grouped point cloud data. The feature cache is used to store the calculated five-dimensional offset data (x_m, y_m, z_m, x_c, y_c) of the points. The column index unit is used to calculate the index value (u, v) of the corresponding column from the 3D coordinates of each point, and to generate the address in off-chip memory from that index so that the point is stored in the corresponding location. The point count unit is used to count the number of points stored in each column: the count for each column is initialized to 0, has an upper limit of M, and is increased by 1 at the corresponding index position for each point stored. The point cloud center point calculation unit is used to calculate the centroid coordinates of the points stored in a column: the 3D coordinate values of the points in each column are first accumulated, and once the last point has been accumulated, a division is performed and the resulting average is the centroid coordinate. The point cloud offset calculation unit is used to calculate the 3D offset (x_m, y_m, z_m) of each point stored in the column relative to the centroid, and the 2D offset (x_c, y_c) relative to the center point of the column.
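The behavior of the column index unit and the point count unit can be sketched as a small software model. The class name PillarGrouper, the 16-byte point size, and the address layout (one contiguous region of M point slots per non-empty column, allocated in order of first appearance) are assumptions made for illustration; the text does not specify the actual address mapping.

```python
from dataclasses import dataclass, field

@dataclass
class PillarGrouper:
    """Hypothetical model of column indexing, point counting, and addressing."""
    x_min: float
    y_min: float
    cell: float            # edge length of one column in the grid
    width: int             # W, number of columns along x
    height: int            # H, number of columns along y
    max_points: int        # M, upper limit of points per column
    base_addr: int = 0     # start of the group region in off-chip memory
    point_bytes: int = 16  # assumed: four 4-byte fields (x, y, z, i)
    counts: dict = field(default_factory=dict)   # (u, v) -> points stored
    regions: dict = field(default_factory=dict)  # (u, v) -> region number

    def index(self, x, y):
        """Column index (u, v) computed from the point coordinates."""
        u = int((y - self.y_min) // self.cell)
        v = int((x - self.x_min) // self.cell)
        return u, v

    def store(self, x, y):
        """Return the write-back address of a point, or None if it is dropped."""
        u, v = self.index(x, y)
        if not (0 <= u < self.height and 0 <= v < self.width):
            return None                      # outside the grid
        n = self.counts.get((u, v), 0)       # counts start at 0
        if n >= self.max_points:
            return None                      # column already holds M points
        region = self.regions.setdefault((u, v), len(self.regions))
        self.counts[(u, v)] = n + 1
        return self.base_addr + (region * self.max_points + n) * self.point_bytes
```

For example, with M = 2 the third point that lands in the same column is rejected, while the first point of a new column receives the start address of the next region.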
[0071] This invention divides columnar feature extraction into two steps: point cloud grouping and feature extraction. The matrix calculation in the feature extraction process reuses a neural network computation acceleration core to expand the feature dimension. Compared to traditional software-based columnar feature extraction, the architecture provided by this invention achieves lower computational latency.
[0072] Referring to Figure 1, as an optional implementation, the neural network computation acceleration module includes an input cache unit, a weight cache unit, a bias cache unit, a data alignment unit, a multiply-accumulate computation array, a post-processing unit, and an output cache unit.
[0073] As an optional implementation, the input buffer unit is used to store the input features for neural network computation;
[0074] The weight cache unit is used to store the weight data for neural network computation;
[0075] The bias cache unit is used to store the bias data for neural network computation;
[0076] Data alignment units are used to rearrange and mask input features;
[0077] The multiply-accumulate array is used to receive aligned input feature and weight data and perform multiply-accumulate calculations.
[0078] The post-processing unit is used to perform accumulation, pooling, biasing, and activation operations on the feature data after multiplication and addition to obtain the output features;
[0079] The output buffer unit is used to store the output features.
[0080] Specifically, the input buffer unit stores the input features for neural network computation; the weight buffer unit stores the weight data for neural network computation; the bias buffer unit stores the bias data for neural network computation; the data alignment unit rearranges and masks the input features; the multiply-accumulate array, composed of multiply-accumulate units arranged in parallel, receives the aligned input features and weight data to perform the multiply-accumulate operations required by the convolutional neural network; the post-processing unit performs operations such as accumulation, pooling, biasing, and activation of the output features, and stores the results generated by the multiply-accumulate array; and the output buffer unit stores the processed output features.
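The interplay between the multiply-accumulate array and the post-processing unit can be illustrated with a simplified software model. This sketch assumes a 1x1 convolution expressed as a matrix product, 8-bit quantized operands with 32-bit accumulators, a tile size of 4 input channels, and ReLU as the activation; none of these details are stated in the text, pooling is omitted for brevity, and the function names are hypothetical.

```python
import numpy as np

def mac_array(inputs, weights, tile=4):
    """Yield one partial-sum matrix per tile of input channels.

    inputs:  (P, C_in) int8 features after alignment
    weights: (C_in, C_out) int8 weights
    """
    for start in range(0, inputs.shape[1], tile):
        x = inputs[:, start:start + tile].astype(np.int32)
        w = weights[start:start + tile].astype(np.int32)
        yield x @ w                      # parallel multiply-accumulate per tile

def post_process(partials, bias):
    """Accumulate the partial sums, then apply bias and activation."""
    acc = None
    for partial in partials:
        acc = partial if acc is None else acc + partial
    return np.maximum(acc + bias, 0)     # ReLU is an assumed activation
```

Widening to 32 bits before the product avoids overflow of the 8-bit operands, which is why the partial sums are accumulated at a higher precision than the inputs in this sketch.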
[0081] The neural network computing acceleration module architecture of this invention is simpler than that of a general architecture, supporting only functions such as convolution, transposed convolution, activation functions, and vector merging. While meeting the operator requirements of Pillar-based 3D object detection algorithms, it can reduce the consumption of hardware resources and the static power consumption caused by redundant circuits.
[0082] As a further optional implementation, the control instructions include data loading control instructions, execution calculation control instructions, and data write-back control instructions, wherein:
[0083] When a data loading control command is received, the cache read / write control circuit reads the raw point cloud data, input features, weight data, and bias data from the off-chip memory, stores the raw point cloud data in the cache of the columnar feature extraction acceleration module, and stores the input features, weight data, and bias data in the cache of the neural network computing acceleration module.
[0084] When an execution calculation control instruction is received, the cache read / write control circuit controls the columnar feature extraction acceleration module and the neural network calculation acceleration module to read data from the cache, perform columnar feature extraction and neural network calculation to obtain output features, and store the output features in the cache of the neural network calculation acceleration module.
[0085] When a data write-back control command is received, the cache read / write control circuit writes the data in the caches of the columnar feature extraction acceleration module and the neural network calculation acceleration module into the off-chip memory.
[0086] Specifically, control commands can be further subdivided into three types: data loading, computation execution, and data write-back. When the received command is a data loading command, the control circuit reads the required computation data, such as the original point cloud, input features, weights, and biases, from a specified address in off-chip memory. This data is temporarily stored in the point cloud cache, input cache, weight cache, and feature cache. When the received command is a computation execution command, the control circuit reads data from the cache into the processing module and stores the obtained results in the output cache. When the received command is a data write-back command, the control circuit writes the data in the cache into off-chip memory.
[0087] As an optional implementation, the cache read / write control circuit may asynchronously execute at least two of the following: data loading control instructions, execution calculation control instructions, and data write-back control instructions.
[0088] Specifically, each cache unit supports ping-pong operations, where data loading, computation, and data write-back can be performed asynchronously. This allows the time for data loading and write-back to overlap with the time for computation, thereby further reducing the total time consumed.
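The benefit of overlapping the three command types can be estimated with a simple timing model. The sketch below assumes each tile of work is described by hypothetical (load, compute, write-back) durations in arbitrary time units, and that both the input side and the output side have exactly two buffers (ping and pong); it is an illustrative model, not a description of the actual control circuit.

```python
def sequential_time(tiles):
    """Total time when load, compute, and write-back never overlap."""
    return sum(load + compute + store for load, compute, store in tiles)

def overlapped_time(tiles):
    """Total time with ping-pong buffers on the input and output sides."""
    load_done, compute_done, store_done = [], [], []
    for k, (load, compute, store) in enumerate(tiles):
        prev_load = load_done[k - 1] if k >= 1 else 0
        prev_compute = compute_done[k - 1] if k >= 1 else 0
        prev_store = store_done[k - 1] if k >= 1 else 0
        # A buffer is reused every second tile, so it must be released first.
        input_free = compute_done[k - 2] if k >= 2 else 0
        output_free = store_done[k - 2] if k >= 2 else 0

        load_done.append(max(prev_load, input_free) + load)
        compute_done.append(max(load_done[k], prev_compute, output_free) + compute)
        store_done.append(max(compute_done[k], prev_store) + store)
    return store_done[-1] if store_done else 0
```

For four identical tiles with durations (2, 5, 1), the sequential schedule takes 32 time units while the overlapped schedule takes 23, since only the first load and the last write-back remain exposed outside the compute time.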
[0089] As an optional implementation, the processor is also configured to configure the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional computation.
[0090] Specifically, before each layer of computation, the processor needs to provide the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer and write them into the accelerator's configuration register. Meanwhile, the point cloud data, weight data, and bias data have already been pre-quantized.
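The per-layer configuration described above can be pictured as a small record that the processor writes into the configuration registers before starting each layer. The field names follow the text; the LayerConfig class, the register ordering, and the dictionary-based register file are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LayerConfig:
    """Configuration the processor provides for one convolutional layer."""
    input_addr: int      # input feature address in off-chip memory
    weight_addr: int     # weight address in off-chip memory
    bias_addr: int       # bias address in off-chip memory
    input_dims: tuple    # input feature dimensions, e.g. (H, W, C_in)
    output_dims: tuple   # output feature dimensions, e.g. (H, W, C_out)

    def register_words(self):
        """Flatten the configuration into an ordered list of register values."""
        return [self.input_addr, self.weight_addr, self.bias_addr,
                *self.input_dims, *self.output_dims]

def write_config(registers, config, base=0):
    """Write one layer's configuration into a register file (assumed layout)."""
    for offset, word in enumerate(config.register_words()):
        registers[base + offset] = word
    return registers
```

A driver loop would call write_config once per layer and then issue the load, execute, and write-back commands for that layer.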
[0091] Referring to Figure 2, this invention provides an implementation method for a three-dimensional target detection neural network accelerator, which is applied to the three-dimensional target detection neural network accelerator described above and includes the following steps:
[0092] S101. The column feature extraction acceleration module reads the original point cloud data from the off-chip memory, calculates the column index and the number of points stored in each non-empty column based on the three-dimensional coordinates of the original point cloud data, and then calculates the group region address to be written back to the off-chip memory based on the column index and the number of points, and writes the original point cloud data back to the corresponding group region address after grouping it.
[0093] S102. Read the grouped point cloud data from the group region address through the column feature extraction acceleration module, calculate the centroid coordinates of the grouped point cloud data, and then calculate the three-dimensional offset of the grouped point cloud data relative to the centroid coordinates, as well as the two-dimensional offset relative to the center point of the column.
[0094] S103. The columnar feature extraction acceleration module generates multi-dimensional features by splicing together the three-dimensional offset, two-dimensional offset, and grouped point cloud data, and sends the multi-dimensional features to the neural network computing acceleration module.
[0095] S104. Perform matrix multiplication calculations on multidimensional features through the neural network calculation acceleration module to obtain columnar features, and write the columnar features back to off-chip memory;
[0096] S105. The column features are aligned using the neural network computation acceleration module. The aligned column features and the weight data obtained from off-chip memory are multiplied and added to obtain fused feature data. The fused feature data is then accumulated, pooled, biased, and activated to obtain output features. The output features are then written back to off-chip memory.
[0097] As an optional implementation method, the method further includes the following steps:
[0098] S100: The processor stores the raw point cloud data in off-chip memory and configures the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer.
[0099] Specifically, based on the aforementioned 3D target detection neural network accelerator, this invention can achieve customized acceleration of Pillar-based 3D target detection algorithms. The specific workflow is as follows:
[0100] 1) The processor first stores the 3D point cloud data obtained from the LiDAR in off-chip memory, and then configures the accelerator, via the bus, with the starting address and the number of points stored. After the configuration is completed, it issues a command to start the calculation.
[0101] 2) After receiving the instruction, the accelerator first performs a grouping operation on the point cloud. The accelerator actively reads the point cloud data sequentially from off-chip memory, stores it in the point cloud cache, calculates the index of the column in which each point is stored based on the three-dimensional coordinates of the point, and counts the number of points stored in each non-empty column.
[0102] 3) Based on the column index and the number of points, the address to be written back to off-chip memory can be calculated, and then the data in the point cloud cache can be written back to the corresponding group area.
[0103] 4) After completing all point cloud grouping, cylinder features need to be extracted. The accelerator reads point cloud data from the group address, calculates the centroid of the point cloud in each cylinder, and then further calculates the offset of the point cloud relative to the centroid and the offset relative to the center coordinate of the cylinder, for a total of 5-dimensional features, which are stored in the feature cache.
[0104] 5) The feature cache and point cloud cache are concatenated into a 9-dimensional feature and sent to the neural network computing acceleration core module for matrix multiplication to obtain the features of this column, and then written back to off-chip memory.
[0105] 6) After the feature extraction of all pillars is completed, the backbone network can be calculated. The backbone network of a Pillar-based 3D object detection algorithm is composed of operators such as 2D convolution, 2D transposed convolution, activation functions, and vector concatenation. The neural network computation acceleration module can therefore complete the calculation layer by layer.
[0106] 7) After completing the calculation of the 3D target detection head, the accelerator returns a completion signal to the processor. The processor reads back the bounding box features, orientation features, and category features from the specified address in the off-chip memory. The processor then runs the post-processing algorithm in software, including 3D detection box extraction and 3D non-maximum suppression, to obtain the final 3D target detection result.
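Steps 2) to 5) of the workflow above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of the embodiment: each point is taken to be (x, y, z, intensity), every pillar is given a fixed-capacity group region so that the write address is the pillar index times the capacity plus the running point count, points beyond the capacity are dropped, and the per-point matrix products are reduced with a per-channel maximum. The names Grid, group_points, pillar_features, and encode_pillar are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Tuple

Point = Tuple[float, float, float, float]  # x, y, z, intensity (assumed layout)


class Grid(NamedTuple):
    x_min: float
    y_min: float
    size: float  # pillar edge length in the x-y plane
    nx: int      # number of pillars along x


def pillar_index(p: Point, g: Grid) -> int:
    """Step 2: pillar index from the point's x-y coordinates."""
    ix = math.floor((p[0] - g.x_min) / g.size)
    iy = math.floor((p[1] - g.y_min) / g.size)
    return iy * g.nx + ix


def group_points(points: List[Point], g: Grid, max_points: int):
    """Steps 2-3: count points per pillar and write each point to its group region."""
    counts: Dict[int, int] = {}
    memory: Dict[int, Point] = {}  # stands in for off-chip memory
    for p in points:
        idx = pillar_index(p, g)
        n = counts.get(idx, 0)
        if n >= max_points:
            continue  # group region full: drop the point (assumed policy)
        memory[idx * max_points + n] = p  # address from pillar index and count
        counts[idx] = n + 1
    return memory, counts


def pillar_features(memory, counts, g: Grid, max_points: int):
    """Steps 4-5: 4-D point + 3-D centroid offset + 2-D pillar-center offset."""
    feats: Dict[int, List[List[float]]] = {}
    for idx, n in counts.items():
        pts = [memory[idx * max_points + k] for k in range(n)]
        cx, cy, cz = (sum(p[a] for p in pts) / n for a in range(3))
        px = g.x_min + (idx % g.nx + 0.5) * g.size   # pillar center x
        py = g.y_min + (idx // g.nx + 0.5) * g.size  # pillar center y
        feats[idx] = [list(p) + [p[0] - cx, p[1] - cy, p[2] - cz,
                                 p[0] - px, p[1] - py] for p in pts]
    return feats


def encode_pillar(rows: List[List[float]],
                  weights: List[List[float]]) -> List[float]:
    """Step 5: multiply each 9-D row by a 9 x C matrix, then take the
    per-channel maximum over the pillar's points (assumed reduction)."""
    n_out = len(weights[0])
    projected = [[sum(r[k] * weights[k][c] for k in range(len(r)))
                  for c in range(n_out)] for r in rows]
    return [max(channel) for channel in zip(*projected)]
```

A fixed-capacity region lets the write address be computed in a single streaming pass, at the cost of reserving space for pillars that stay empty. A compact layout built from a prefix sum over the point counts would be an equally valid reading of step 3).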
[0107] The above describes the 3D target detection neural network accelerator and its implementation method according to embodiments of the present invention. By customizing the acceleration of columnar feature extraction and convolutional neural network calculation, the present invention provides an efficient 3D target detection neural network accelerator that can be applied to Pillar-based 3D target detection algorithms. It accelerates both columnar feature extraction and neural network calculation, improving the computational efficiency of 3D target detection neural networks.
[0108] It should be recognized that embodiments of the present invention can be implemented or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable storage medium. The methods described above can be implemented using standard programming techniques—including implementation in a computer program on a non-transitory computer-readable storage medium configured to allow the computer to operate in a specific and predefined manner—according to the methods and drawings described in the specific embodiments. Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with the computer system. However, if desired, the program can be implemented in assembly or machine language. In any case, the language can be a compiled or interpreted language. Furthermore, for this purpose, the program can run on a programmed application-specific integrated circuit (ASIC).
[0109] Furthermore, the procedures described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by the context. The procedures described herein (or variations and/or combinations thereof) may be executed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) that collectively executes on one or more processors, by hardware, or by a combination thereof. The aforementioned computer programs include a plurality of instructions executable by one or more processors.
[0110] Furthermore, the above methods can be implemented in any suitable type of computing platform, including but not limited to personal computers, minicomputers, mainframes, workstations, networked or distributed computing environments, standalone or integrated computer platforms, or in communication with charged particle tools or other imaging devices, etc. Aspects of the invention can be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, and when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, can be transmitted via wired or wireless networks. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media comprise instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. When programmed according to the methods and techniques described in the invention, the invention also includes the computer itself.
[0111] A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data stored in non-volatile memory. The output information can also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including a specific visual depiction of physical and tangible objects generated on the display.
[0112] The above description is merely a preferred embodiment of the present invention. The present invention is not limited to the above-described embodiments. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention, as long as they achieve the technical effects of the present invention by the same means, should be included within the scope of protection of the present invention. Within the scope of protection of the present invention, the technical solutions and/or implementation methods can have various modifications and variations.
Claims
1. A three-dimensional target detection neural network accelerator, characterized in that it includes a processor, off-chip memory, a bus, a cache read/write control circuit, a columnar feature extraction acceleration module, and a neural network computation acceleration module; the processor, the off-chip memory, and the cache read/write control circuit are communicatively connected via the bus; the output of the columnar feature extraction acceleration module is connected to the input of the neural network computation acceleration module; and both the columnar feature extraction acceleration module and the neural network computation acceleration module are communicatively connected to the cache read/write control circuit;
the processor is used to control the data reading, writing, and calculation processes;
the off-chip memory is used to store the raw point cloud data and intermediate feature data during the calculation process;
the bus is used to bridge the processor, the off-chip memory, and the cache read/write control circuit;
the cache read/write control circuit is used to execute corresponding data read/write operations according to the control instructions of the processor;
the columnar feature extraction acceleration module is used for:
reading the raw point cloud data from the off-chip memory, calculating the pillar index and the number of points stored in each non-empty pillar based on the three-dimensional coordinates of the raw point cloud data, calculating the group region address to be written back to the off-chip memory based on the pillar index and the number of points, and grouping the raw point cloud data and writing it back to the corresponding group region address;
reading the grouped point cloud data from the group region address, calculating the centroid coordinates of the grouped point cloud data, and then calculating the three-dimensional offset of the grouped point cloud data relative to the centroid coordinates and the two-dimensional offset relative to the center point of the pillar;
generating multi-dimensional features by concatenating the three-dimensional offset, the two-dimensional offset, and the grouped point cloud data, and sending the multi-dimensional features to the neural network computation acceleration module;
the neural network computation acceleration module is used for:
performing matrix multiplication on the multi-dimensional features to obtain columnar features, and writing the columnar features back to the off-chip memory;
aligning the columnar features, multiply-accumulating the aligned columnar features with the weight data obtained from the off-chip memory to obtain fused feature data, then accumulating, pooling, biasing, and activating the fused feature data to obtain output features, and writing the output features back to the off-chip memory.
2. The three-dimensional target detection neural network accelerator according to claim 1, characterized in that the columnar feature extraction acceleration module includes a point cloud cache, a feature cache, a pillar indexing unit, a point counting unit, a point cloud center point calculation unit, and a point cloud offset calculation unit.
3. The three-dimensional target detection neural network accelerator according to claim 2, characterized in that:
the point cloud cache is used to store point cloud data;
the feature cache is used to store the calculated point cloud offset data of multiple dimensions;
the pillar indexing unit is used to calculate the index value of the corresponding pillar based on the three-dimensional coordinates of the point cloud data, and to determine the storage address of the point cloud data in the off-chip memory based on the index value;
the point counting unit is used to count the number of points stored in each pillar;
the point cloud center point calculation unit is used to calculate the centroid coordinates of the point cloud data stored in the pillar;
the point cloud offset calculation unit is used to calculate the three-dimensional offset of the point cloud data stored in the pillar relative to the centroid coordinates, and the two-dimensional offset relative to the center point of the pillar.
4. The three-dimensional target detection neural network accelerator according to claim 1, characterized in that the neural network computation acceleration module includes an input cache unit, a weight cache unit, a bias cache unit, a data alignment unit, a multiply-accumulate array, a post-processing unit, and an output cache unit.
5. The three-dimensional target detection neural network accelerator according to claim 4, characterized in that:
the input cache unit is used to store the input features for neural network computation;
the weight cache unit is used to store the weight data for neural network computation;
the bias cache unit is used to store the bias data for neural network computation;
the data alignment unit is used to rearrange and mask the input features;
the multiply-accumulate array is used to receive the aligned input features and the weight data, and to perform multiply-accumulate calculations;
the post-processing unit is used to perform accumulation, pooling, biasing, and activation operations on the multiply-accumulated feature data to obtain the output features;
the output cache unit is used to store the output features.
6. The three-dimensional target detection neural network accelerator according to claim 1, characterized in that the control instructions include a data loading control instruction, an execution calculation control instruction, and a data write-back control instruction, wherein:
when the data loading control instruction is received, the cache read/write control circuit reads the raw point cloud data, input features, weight data, and bias data from the off-chip memory, stores the raw point cloud data in the cache of the columnar feature extraction acceleration module, and stores the input features, weight data, and bias data in the cache of the neural network computation acceleration module;
when the execution calculation control instruction is received, the cache read/write control circuit controls the columnar feature extraction acceleration module and the neural network computation acceleration module to read data from the cache and perform columnar feature extraction and neural network calculation, obtaining output features that are stored in the cache of the neural network computation acceleration module;
when the data write-back control instruction is received, the cache read/write control circuit writes the data in the caches of the columnar feature extraction acceleration module and the neural network computation acceleration module into the off-chip memory.
7. The three-dimensional target detection neural network accelerator according to claim 6, characterized in that the cache read/write control circuit can asynchronously execute at least two of the data loading control instruction, the execution calculation control instruction, and the data write-back control instruction.
8. The three-dimensional target detection neural network accelerator according to any one of claims 1 to 7, characterized in that the processor is also used to configure the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer computation.
9. A method for implementing a three-dimensional target detection neural network accelerator, implemented by the three-dimensional target detection neural network accelerator according to any one of claims 1 to 8, characterized in that it includes the following steps:
the columnar feature extraction acceleration module reads the raw point cloud data from the off-chip memory, calculates the pillar index and the number of points stored in each non-empty pillar based on the three-dimensional coordinates of the raw point cloud data, calculates the group region address to be written back to the off-chip memory based on the pillar index and the number of points, and groups the raw point cloud data and writes it back to the corresponding group region address;
the columnar feature extraction acceleration module reads the grouped point cloud data from the group region address, calculates the centroid coordinates of the grouped point cloud data, and then calculates the three-dimensional offset of the grouped point cloud data relative to the centroid coordinates and the two-dimensional offset relative to the center point of the pillar;
the columnar feature extraction acceleration module generates multi-dimensional features by concatenating the three-dimensional offset, the two-dimensional offset, and the grouped point cloud data, and sends the multi-dimensional features to the neural network computation acceleration module;
the neural network computation acceleration module performs matrix multiplication on the multi-dimensional features to obtain columnar features, and writes the columnar features back to the off-chip memory;
the neural network computation acceleration module aligns the columnar features, multiply-accumulates the aligned columnar features with the weight data obtained from the off-chip memory to obtain fused feature data, then accumulates, pools, biases, and activates the fused feature data to obtain output features, and writes the output features back to the off-chip memory.
10. The method for implementing a three-dimensional target detection neural network accelerator according to claim 9, characterized in that the implementation method further includes the following step: the processor stores the raw point cloud data in the off-chip memory and configures the input feature address, weight address, bias address, input feature dimension, and output feature dimension for each convolutional layer.