Target identification method, device, apparatus and storage medium
By performing block processing and submanifold convolution operations on 3D point cloud data, the problem of excessive memory consumption in traditional 3D point cloud processing solutions is solved, thereby improving target recognition efficiency and NPU processing performance.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: YUAN LI (BEI JING) BAN DAO TI JI SHU YOU XIAN GONG SI
- Filing Date: 2026-02-09
- Publication Date: 2026-06-19
AI Technical Summary
Traditional 3D point cloud processing solutions consume a large amount of memory, which limits the real-time processing capability, computational efficiency, and resource utilization of the NPU and thereby reduces target recognition efficiency.
The 3D point cloud data is segmented into blocks to generate height feature blocks. The submanifold convolution rule manual is then used to determine the target point cloud features, a feature matrix is generated, and the target to be identified is finally recognized.
It effectively reduces the memory footprint of 3D point cloud data, improves feature extraction speed and target recognition efficiency, and ensures the real-time processing performance of the NPU.
Figure CN122244573A_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of artificial intelligence, and specifically relates to a target recognition method and apparatus, an electronic device, and a storage medium.
Background Technology
[0002] 3D (three-dimensional) point clouds, as a key form of 3D data, are being deeply integrated into cutting-edge fields such as autonomous driving and medicine for various target recognition applications. In intelligent driving and medical image processing scenarios that demand real-time performance and low power consumption, the NPU (Neural Processing Unit) has become an indispensable dedicated computing engine due to its very high energy efficiency.
[0003] However, because the amount of 3D point cloud data is massive, traditional 3D point cloud processing solutions require a large amount of memory, which easily introduces additional data transmission latency and bandwidth consumption in practical applications. This restricts the real-time processing capability, computing efficiency, and resource utilization of the NPU, resulting in poor target recognition efficiency.
Summary of the Invention
[0004] The purpose of this application is to provide a target recognition method, apparatus, device, and storage medium that can solve, or at least partially solve, the problem that the large memory consumption of 3D point cloud data during computation makes real-time processing on the NPU difficult and results in poor target recognition efficiency.
[0005] To solve the above technical problems, this application is implemented as follows. In a first aspect, embodiments of this application provide a target recognition method, the method comprising: acquiring 3D point cloud data; generating corresponding height feature blocks based on the 3D point cloud data; obtaining the submanifold convolution rule manual corresponding to the 3D point cloud data, where the submanifold convolution rule manual is used to determine the input point index and convolution kernel position index corresponding to the 3D point cloud data in the submanifold convolution operation; determining the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual; generating a feature matrix corresponding to the height feature block based on the target point cloud features; determining the target features corresponding to the 3D point cloud data based on the feature matrix; and identifying the target to be identified based on the target features to obtain the target identification result.
[0006] Optionally, generating corresponding height feature blocks based on the 3D point cloud data includes: determining the point coordinates of each data point in the 3D point cloud data; generating a 3D point cloud input feature based on the point coordinates, where the 3D point cloud input feature is obtained by mapping the point coordinates to a one-dimensional space and includes a height input feature, a depth input feature, and a width input feature; and dividing the height input feature into blocks according to a preset number of blocks to generate the height feature blocks of the 3D point cloud data.
[0007] Optionally, determining the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual includes: determining, from the submanifold convolution rule manual, the target rule information corresponding to the height feature block, where the target rule information includes the target input index sequence corresponding to the target input points; obtaining the 3D point cloud input features of the 3D point cloud data; and determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features.
[0008] Optionally, determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features includes: obtaining the input parameters corresponding to the 3D point cloud data, the input parameters including the number of input channels and the data bit width; determining the starting address of the target input points corresponding to the height feature block based on the target input index sequence and the input parameters; determining the number of target input points based on the target input index sequence; determining the feature loading amount corresponding to the target input points based on the input parameters and the number of target input points; and determining, from the 3D point cloud input features, the target point cloud features corresponding to the height feature block based on the starting address of the target input points and the feature loading amount.
[0009] Optionally, generating the feature matrix corresponding to the height feature block based on the target point cloud features includes: creating an initial matrix based on the target point cloud features; determining, from the submanifold convolution rule manual, the target convolution kernel position index sequence corresponding to the height feature block; and generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix.
[0010] Optionally, generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix includes: taking the target convolution kernel position index sequence as the target index corresponding to the height feature block; determining the source index corresponding to the height feature block according to the submanifold convolution rule manual; extracting target data points from the target point cloud features based on the source index; and filling the target data points into the initial matrix according to the target index corresponding to the source index, to generate the feature matrix corresponding to the height feature block.
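The fill step in [0010] resembles an im2col-style gather. The following is a minimal Python sketch under assumptions that are not taken from the application: the initial matrix has one row per output point and one C_in-wide column group per kernel position, and each target index encodes out_row * k_vol + kernel_pos. The function name fill_feature_matrix is hypothetical.

```python
def fill_feature_matrix(block_feats, pairs, n_out, k_vol):
    """Scatter a block's target point cloud features into an im2col-style matrix.

    block_feats: target point cloud features, one list of C_in values per point.
    pairs: (source_idx, target_idx); ASSUMED encoding target_idx = out_row * k_vol + kernel_pos.
    Returns an n_out x (k_vol * C_in) matrix; slots with no active neighbor stay zero.
    """
    c_in = len(block_feats[0])
    matrix = [[0.0] * (k_vol * c_in) for _ in range(n_out)]  # initial (zero) matrix
    for src, tgt in pairs:
        row, kpos = divmod(tgt, k_vol)                        # decode the target index
        matrix[row][kpos * c_in:(kpos + 1) * c_in] = block_feats[src]
    return matrix


feats = [[1.0, 2.0], [3.0, 4.0]]
pairs = [(0, 1), (1, 2), (1, 4)]  # row 0 uses kernel slots 1 and 2, row 1 uses slot 1
print(fill_feature_matrix(feats, pairs, n_out=2, k_vol=3))
# [[0.0, 0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]]
```

Laying the matrix out this way means the subsequent convolution for the whole block can be a single matrix multiplication rather than one multiplication per kernel offset.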
[0011] Optionally, determining the target features corresponding to the 3D point cloud data based on the feature matrix includes: receiving convolution parameters, which include convolution kernel weights and convolution kernel offsets; performing matrix multiplication on the feature matrix according to the convolution parameters to obtain a target feature matrix; determining the target data point addresses corresponding to the 3D point cloud data based on the target feature matrix; and determining the target features corresponding to the 3D point cloud data based on the target data point addresses.
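The matrix multiplication in [0011] can be sketched as below. This assumes, as an interpretation rather than something the text states, that the "convolution kernel offsets" act as a per-output-channel additive bias, and that the weights are flattened to a (k_vol * C_in) x C_out matrix matching the column layout of the feature matrix. The function name conv_gemm is hypothetical.

```python
def conv_gemm(feature_matrix, weight, bias):
    """Target feature matrix = feature_matrix @ weight + bias (plain Python).

    feature_matrix: n_out x (k_vol * C_in); weight: (k_vol * C_in) x C_out;
    bias: C_out values added to every output row (ASSUMED meaning of "offsets").
    """
    c_out = len(bias)
    return [[sum(a * weight[r][j] for r, a in enumerate(row)) + bias[j]
             for j in range(c_out)]
            for row in feature_matrix]


fm = [[0.0, 0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]]
w = [[1.0] for _ in range(6)]  # k_vol * C_in = 6 rows, C_out = 1
print(conv_gemm(fm, w, bias=[0.5]))  # [[10.5], [7.5]]
```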
[0012] In a second aspect, embodiments of this application provide a target recognition device, the device comprising: a 3D point cloud data acquisition module, configured to acquire 3D point cloud data; a height feature block generation module, configured to generate corresponding height feature blocks based on the 3D point cloud data; a submanifold convolution rule manual acquisition module, configured to acquire the submanifold convolution rule manual corresponding to the 3D point cloud data, where the submanifold convolution rule manual is used to determine the input point index and convolution kernel position index corresponding to the 3D point cloud data in the submanifold convolution operation; a target point cloud feature determination module, configured to determine the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual; a matrix generation module, configured to generate a feature matrix corresponding to the height feature block based on the target point cloud features; a target feature determination module, configured to determine the target features corresponding to the 3D point cloud data based on the feature matrix; and an identification module, configured to identify the target to be identified based on the target features and obtain the target identification result.
[0013] In a third aspect, embodiments of this application provide an electronic device including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method described in the first aspect.
[0014] In a fourth aspect, embodiments of this application provide a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method described in the first aspect.
[0015] In a fifth aspect, embodiments of this application provide a chip, the chip including a processor and a communication interface coupled to the processor, the processor being configured to run programs or instructions to implement the method described in the first aspect.
[0016] The target recognition method provided in the embodiments of this application works as follows: 3D point cloud data is acquired; corresponding height feature blocks are generated based on the 3D point cloud data; a submanifold convolution rule manual corresponding to the 3D point cloud data is acquired, which is used to determine the input point index and convolution kernel position index of the 3D point cloud data in the submanifold convolution operation; the target point cloud features corresponding to the height feature blocks are then determined according to the submanifold convolution rule manual; a feature matrix corresponding to the height feature blocks is generated based on the target point cloud features; the target features corresponding to the 3D point cloud data are subsequently determined based on the feature matrix; and finally the target to be recognized is recognized based on the target features to obtain the target recognition result. In the embodiments of this application, the 3D point cloud data is divided into blocks in the height dimension to obtain corresponding height feature blocks, and submanifold convolution operations are then performed on each height feature block according to the submanifold convolution rule manual to determine the target features of the 3D point cloud data. Because each computation handles only one block, this effectively reduces the memory usage when processing 3D point cloud data, increases the feature extraction speed, and thus improves target recognition efficiency.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a schematic diagram of the structure of a traditional submanifold convolution rule manual in the related art; Figure 2 is a flowchart of the steps of an embodiment of the target recognition method of this application; Figure 3 is a schematic diagram of the rule manual structure of an embodiment of the target recognition method of this application; Figure 4 is a flowchart of 3D point cloud data feature extraction according to an embodiment of the target recognition method of this application; Figure 5 is a structural block diagram of an embodiment of a target recognition device according to this application.
Detailed Implementation
[0019] To make the above-mentioned objectives, features, and advantages of this application more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0020] The terms "first", "second", etc. used in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of this application can be implemented in orders other than those illustrated or described herein. Furthermore, in the specification and claims, "and/or" indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
[0021] The target recognition method provided in this application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.
[0022] 3D point clouds, as a key form of 3D data, are being deeply integrated into cutting-edge fields such as autonomous driving and medicine for various target recognition applications. In traditional 3D point cloud processing, to avoid the explosive growth of non-zero locations as the network depth increases, submanifold convolution is typically used to process 3D point cloud data to control computational and memory overhead.
[0023] Submanifold convolution is a special type of sparse convolution that preserves the sparsity of the data during the convolution process. That is, the sparse pattern of the output after convolution remains consistent with the sparse pattern of the input before convolution, and the number of non-zero positions is not increased due to the convolution operation. In sparse convolution, a rulebook is typically used to record the connections between the left matrix generated from the input feature data, the right matrix generated from the input convolution kernel weights, and the output feature data (the result of the matrix multiplication).
[0024] Referring to Figure 1, which is a schematic diagram of the structure of a traditional submanifold convolution rulebook in the related art, a traditional submanifold convolution rulebook can include data such as the kernel offset, input index, and output index. The kernel offset represents the relative position of an element within the kernel with respect to the kernel center; the input index is a unique identifier of an activation point in the input feature map, used to find the input feature value of that activation point; and the output index is a unique identifier of each output point in the output feature map, indicating the position in the output feature map that receives the convolution result of the input feature values. A single kernel offset may correspond to multiple activation points, so it may have multiple input and output indices.
[0025] A traditional rule manual first creates a table based on the kernel size to list the possible kernel offsets, such as 0, 1, ..., x, and then records the input indices and the corresponding output indices for each kernel offset. For example, the input indices corresponding to offset 0 could be $m_{00}, m_{01}, \ldots, m_{0n_0}$, and the output indices corresponding to offset 0 could be $p_{00}, p_{01}, \ldots, p_{0n_0}$, where $n_0$ represents the number of input and output indices corresponding to offset 0. As a concrete case, a 3×3 convolution kernel has 9 possible offsets, so a table can be created with kernel offsets 0, 1, ..., 8, and the input indices and corresponding output indices can be recorded for each offset.
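The traditional rulebook described above can be sketched in a few lines of Python for the 2D, 3×3 case used in the example. This is a minimal illustration, not the application's implementation: the function name build_rulebook, the row-major numbering of offsets (so offset 4 is the kernel center), and the use of a Python dict keyed by offset are all assumptions.

```python
from collections import defaultdict


def build_rulebook(active, ksize=3):
    """Build a traditional submanifold rulebook for a 2D sparse input.

    active: list of (y, x) integer coordinates of the non-zero sites.
    Returns {kernel_offset_id: [(input_idx, output_idx), ...]}.
    Submanifold convolution keeps the output sites equal to the input
    sites, so output index i names the same coordinate as input index i.
    """
    position = {coord: i for i, coord in enumerate(active)}  # coordinate -> index
    r = ksize // 2
    rulebook = defaultdict(list)
    offset_id = 0
    # Offsets are numbered row-major: 0 is (-r, -r); the kernel center is ksize * ksize // 2.
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            for out_idx, (y, x) in enumerate(active):
                in_idx = position.get((y + dy, x + dx))
                if in_idx is not None:  # neighbor is active, so record a rule
                    rulebook[offset_id].append((in_idx, out_idx))
            offset_id += 1
    return dict(rulebook)


rb = build_rulebook([(0, 0), (0, 1), (2, 2)])
print(rb)  # {3: [(0, 1)], 4: [(0, 0), (1, 1), (2, 2)], 5: [(1, 0)]}
```

Offsets with no active neighbor simply do not appear, and no new output sites are created, which is the sparsity-preserving property of submanifold convolution described in [0023].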
[0026] Processing 3D point cloud data with the traditional rulebook shown in Figure 1 includes the following steps: extracting the weights, input indices, and output indices of kernel offset i; extracting the input data corresponding to kernel offset i based on the input indices; performing matrix multiplication on the weights of kernel offset i and the corresponding input data; hash-accumulating the result of the matrix multiplication into the corresponding output feature positions based on the output indices; and repeating the above steps until all kernel offsets have been traversed, yielding the result of the submanifold convolution.
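The per-offset loop above is the usual gather, multiply, scatter-accumulate pattern. A minimal pure-Python sketch follows; the function names, the dict-based rulebook and weight layout, and the small example values are assumptions for illustration only.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def submanifold_conv_traditional(feats, rulebook, weights, n_out, c_out):
    """Gather-GEMM-scatter over a traditional rulebook.

    feats: input feature vectors, one list of C_in values per input point.
    rulebook: {kernel_offset_id: [(input_idx, output_idx), ...]}.
    weights: {kernel_offset_id: C_in x C_out weight matrix}.
    """
    out = [[0.0] * c_out for _ in range(n_out)]
    for k, pairs in rulebook.items():               # traverse every kernel offset
        gathered = [feats[i] for i, _ in pairs]     # gather inputs by input index
        products = matmul(gathered, weights[k])     # one matrix multiply per offset
        for (_, o), row in zip(pairs, products):    # scatter-accumulate by output index
            for j, v in enumerate(row):
                out[o][j] += v
    return out


feats = [[1.0], [2.0], [3.0]]
rulebook = {3: [(0, 1)], 4: [(0, 0), (1, 1), (2, 2)], 5: [(1, 0)]}
weights = {3: [[10.0]], 4: [[1.0]], 5: [[100.0]]}
print(submanifold_conv_traditional(feats, rulebook, weights, n_out=3, c_out=1))
# [[201.0], [12.0], [3.0]]
```

Note that `out` holds every output point for the whole input at once; this full-size buffer and the per-offset gathered matrices are the memory pressure that [0027] attributes to the traditional approach.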
[0027] In scenarios that prioritize real-time performance and low power consumption, such as intelligent driving and medical image processing, NPUs are typically used as dedicated data processing engines. However, the aforementioned 3D point cloud data processing requires a large amount of data computation to be performed simultaneously, consuming a significant amount of memory. The NPU has limited memory space and cannot store large amounts of data at the same time, leading to additional data transmission latency and bandwidth consumption. This affects the NPU's real-time processing capabilities, computational efficiency, and resource utilization, resulting in poor target recognition efficiency.
[0028] To address the aforementioned issues, this application provides a target recognition method applicable to an NPU or an electronic device equipped with an NPU. This method can reduce the memory usage of 3D point cloud data during computation, thereby improving target recognition efficiency.
[0029] Referring to Figure 2, which is a flowchart of the steps of an embodiment of the target recognition method according to this application, the method includes the following steps. Step 201: Obtain 3D point cloud data. 3D point cloud data refers to a data set composed of a large number of discrete data points in three-dimensional space. This data set includes multiple data points and can reflect various features such as position and color information. 3D point cloud data can be obtained by scanning with a 3D scanning device, which can be a sensor such as a vehicle-mounted lidar or any medical scanning device with 3D imaging capability; this application does not impose specific limitations on this.
[0030] Step 202: Generate corresponding height feature blocks based on the 3D point cloud data. Here, a height feature block can correspond to a block of the height dimension of the 3D point cloud data in the output dense space.
[0031] Step 203: Obtain the submanifold convolution rule manual corresponding to the 3D point cloud data, where the submanifold convolution rule manual is used to determine the input point index and convolution kernel position index corresponding to the 3D point cloud data in the submanifold convolution operation. The submanifold convolution rule manual can be predefined by the user and includes lookup tables that map the features of the input 3D point cloud data to the features of the output 3D point cloud data. Referring to Figure 3, which is a schematic diagram of the rulebook structure of an embodiment of the target recognition method of this application, the submanifold convolution rulebook may include an output point count lookup table (Table 0), an input point count lookup table (Table 1), an input point index lookup table (Table 2), and a lookup table of the convolution kernel index positions corresponding to the input points (Table 3). Within the rulebook, the elements of each table are stored contiguously with no gaps between them, and each lookup table is stored separately. The output point count lookup table stores the number of coordinate points output for each block. The input point count lookup table stores the number of coordinate points input for each block. The input point index lookup table stores, for each block, the input order positions of the coordinate points in the 3D point cloud data. The convolution kernel index position lookup table stores, for each block, the index position of the convolution kernel weight that multiplies the feature data at each input index in the multiply-accumulate calculation; it can be used to describe the output index and represents the connection between the input index and the convolution kernel weight. M is the preset number of blocks.
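The four tightly packed lookup tables of Figure 3 could be held in memory roughly as follows. This is a hedged sketch: the class name BlockedRulebook, the field names, and in particular the assumption that Table 2 and Table 3 are aligned entry for entry (so Table 1's per-block count also gives the Table 3 slice length) are illustrative choices, not details confirmed by the application.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass
class BlockedRulebook:
    """Hypothetical in-memory layout of the four lookup tables of Figure 3."""
    out_counts: List[int]   # Table 0: number of output points per height block
    in_counts: List[int]    # Table 1: number of input entries per height block
    in_indices: List[int]   # Table 2: input order positions, all blocks packed back to back
    kernel_pos: List[int]   # Table 3: kernel index position for each Table 2 entry (assumed aligned)

    @property
    def num_blocks(self) -> int:
        return len(self.out_counts)  # M, the preset number of blocks

    def block(self, m: int) -> Tuple[int, List[int], List[int]]:
        """Return (output count, input index sequence, kernel position sequence) for block m."""
        starts = [0, *accumulate(self.in_counts)]  # prefix sums locate each packed slice
        lo, hi = starts[m], starts[m + 1]
        return self.out_counts[m], self.in_indices[lo:hi], self.kernel_pos[lo:hi]


rb = BlockedRulebook(out_counts=[2, 1], in_counts=[3, 2],
                     in_indices=[0, 1, 1, 2, 3], kernel_pos=[4, 3, 5, 4, 4])
print(rb.num_blocks, rb.block(1))  # 2 (1, [2, 3], [4, 4])
```

Because the tables are gap-free, a block's slice is located purely from the running sum of the per-block counts, so only the current block's entries need to be read.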
[0032] Here, the input point index corresponding to the 3D point cloud data reflects the input order and position of each data point in the 3D point cloud input features, and corresponds to the input point index lookup table (Table 2) in the submanifold convolution rule manual shown in Figure 3. The kernel position index is an address index that carries kernel position information; the kernel position index corresponding to the 3D point cloud data reflects the neighborhood points that each data point requires for the convolution operation, and corresponds to the convolution kernel index position lookup table (Table 3) in the submanifold convolution rule manual shown in Figure 3.
[0033] Step 204: Determine the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual. Here, the target point cloud features are the input features of the 3D point cloud data that correspond to the height feature block.
[0034] Step 205: Generate the feature matrix corresponding to the height feature block based on the target point cloud features. Step 206: Determine the target features corresponding to the 3D point cloud data based on the feature matrix. Here, the target features are 3D features extracted by performing submanifold convolution on the 3D point cloud data, which can reflect the geometric features and semantic information of each target to be identified. Step 207: Identify the target to be identified based on the target features to obtain the target identification result.
[0035] The target features may include features corresponding to one or more targets to be identified, and the target identification result may include information such as the type, location, and confidence level of the target to be identified.
[0036] In this embodiment, the following steps can be taken: acquiring 3D point cloud data; generating corresponding height feature blocks based on the 3D point cloud data; acquiring a submanifold convolution rule manual corresponding to the 3D point cloud data; wherein the submanifold convolution rule manual is used to determine the input point index and convolution kernel position index of the 3D point cloud data in the submanifold convolution operation; then determining the target point cloud features corresponding to the height feature blocks according to the submanifold convolution rule manual; generating a feature matrix corresponding to the height feature blocks according to the target point cloud features; subsequently determining the target features corresponding to the 3D point cloud data according to the feature matrix; and finally identifying the target to be identified based on the target features to obtain the target recognition result. Through the above implementation process, the 3D point cloud data can be segmented in the height dimension to obtain corresponding height feature blocks. Then, submanifold convolution operations are performed on each height feature block according to the submanifold convolution rule manual to determine the target features of the 3D point cloud data. This can effectively reduce the memory usage when processing 3D point cloud data, improve the feature extraction speed of 3D point cloud data, and thus improve the corresponding target recognition efficiency.
[0037] In some embodiments of this application, generating corresponding height feature blocks based on the 3D point cloud data includes: determining the point coordinates of each data point in the 3D point cloud data; generating a 3D point cloud input feature based on the point coordinates, where the 3D point cloud input feature is obtained by mapping the point coordinates to a one-dimensional space and includes a height input feature, a depth input feature, and a width input feature; and dividing the height input feature into blocks according to a preset number of blocks to generate the height feature blocks of the 3D point cloud data.
[0038] In this embodiment, the 3D point cloud data sent by the 3D scanning device is an unordered set of points, which may contain multiple 3D data points. The coordinates of each data point in the 3D point cloud data can be determined, and each point coordinate includes coordinate information in three dimensions: height, depth, and width.
[0039] The coordinates of each data point can be tiled (mapped) to a one-dimensional space according to the HDW (Height-Depth-Width) dimension order to obtain the corresponding one-dimensional linear index. Then, the data points are sorted according to the one-dimensional linear index to generate the 3D point cloud input features. The 3D point cloud input features can include features in three dimensions: height, depth, and width. For example, 3D point cloud data can include point A(h, d, w), where h is the coordinate value of point A in the height dimension, d is the coordinate value of point A in the depth dimension, and w is the coordinate value of point A in the width dimension. The spatial range corresponding to the 3D point cloud data is represented by the maximum height H_max, maximum depth D_max, and maximum width W_max. The one-dimensional linear index can be represented as linear_index = h × (D_max × W_max) + d × W_max + w. The corresponding data points can be sorted according to the size of the one-dimensional linear index to obtain the 3D point cloud input features. The height input feature is the feature value of the 3D point cloud input feature in the height dimension, the depth input feature is the feature value of the 3D point cloud input feature in the depth dimension, and the width input feature is the feature value of the 3D point cloud input feature in the width dimension.
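The HDW flattening and sorting can be illustrated with a short Python sketch. It assumes that D_max and W_max are the extents of the grid (so 0 ≤ d < D_max and 0 ≤ w < W_max, otherwise linear indices could collide) and sorts only the coordinates; the function names are hypothetical.

```python
def linear_index(h, d, w, d_max, w_max):
    """HDW flattening: height varies slowest, width fastest."""
    return h * (d_max * w_max) + d * w_max + w


def build_input_features(points, d_max, w_max):
    """Order unordered (h, d, w) points by their 1D linear index."""
    return sorted(points, key=lambda p: linear_index(p[0], p[1], p[2], d_max, w_max))


pts = [(1, 0, 0), (0, 2, 1), (0, 0, 3)]  # unordered scan output
print(build_input_features(pts, d_max=4, w_max=5))  # [(0, 0, 3), (0, 2, 1), (1, 0, 0)]
```

Since height is the outermost term, all points with the same h end up adjacent after sorting, which is what later allows a height block to be read as one contiguous run.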
[0040] After obtaining the 3D point cloud input features, the height output features corresponding to the height input features can be divided into blocks according to a preset number of blocks M, generating height feature blocks of the 3D point cloud data. The height feature blocks of the 3D point cloud data can be blocks in the height dimension of the output dense space after the 3D point cloud data has undergone sparse convolution processing. The number of blocks can be determined according to a pre-set submanifold convolution rule manual, and the specific value can be set according to actual needs; this application does not impose specific limitations on this. For example, based on the height input features of the obtained 3D point cloud input features, combined with the height dimension in the preset convolution kernel size and the preset convolution kernel padding parameters, the corresponding height output features can be determined. Then, the height output features are divided into M consecutive subsets, each subset corresponding to a height feature block. The number of feature points in each height feature block is determined according to the total number of feature points and the number of blocks in the 3D point cloud data.
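A minimal sketch of this step is shown below, assuming stride 1 (as is usual for submanifold convolution), that blocking is done over output height rows, and that the rows are split into M near-equal consecutive ranges with any remainder given to the first blocks. The application leaves the exact split to the rule manual, so both helper names and the split policy are illustrative.

```python
def height_out_size(h_in, k_h, pad_h, stride=1):
    """Standard convolution output-size formula along the height axis."""
    return (h_in + 2 * pad_h - k_h) // stride + 1


def split_height(h_out, m):
    """Split output rows 0..h_out-1 into m consecutive, near-equal height blocks."""
    base, rem = divmod(h_out, m)
    blocks, start = [], 0
    for i in range(m):
        size = base + (1 if i < rem else 0)  # the first `rem` blocks take one extra row
        blocks.append(range(start, start + size))
        start += size
    return blocks


h_out = height_out_size(h_in=10, k_h=3, pad_h=1)  # kernel height 3 with padding 1 keeps 10 rows
print([list(b) for b in split_height(h_out, m=3)])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```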
[0041] Through the above implementation process, unordered 3D point cloud data can be transformed into ordered 3D point cloud features, enabling subsequent orderly data reading and improving data retrieval efficiency. Furthermore, the ordered 3D point cloud features can be used as input features for submanifold convolution, facilitating efficient sparse convolution processing and improving NPU processing performance. By dividing the 3D point cloud features into blocks to generate height feature blocks, subsequent submanifold convolution processing can be performed in blocks, reducing memory usage per computation, ensuring real-time NPU processing performance, and improving target recognition efficiency.
[0042] In some embodiments of this application, determining the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual includes: determining, from the submanifold convolution rule manual, the target rule information corresponding to the height feature block, where the target rule information includes the target input index sequence corresponding to the target input points; obtaining the 3D point cloud input features of the 3D point cloud data; and determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features.
[0043] The target rule information corresponding to the height feature block can be determined from the submanifold convolution rule manual. This target rule information includes the target input index sequence corresponding to the target input points, the number of input points corresponding to the target input points, the target convolution kernel position index sequence corresponding to the target input index sequence, and the number of output points corresponding to the height feature block. Here, the target input points are the data points of the 3D point cloud data that correspond to the current height feature block. For example, after the height feature block is obtained, the height dimension of the preset convolution kernel size and the preset convolution kernel padding parameters can be combined to determine the input feature block in the height input features that corresponds to the height output feature block, and the data points in that input feature block can then be used as the target input points. Next, the 3D point cloud input features of the 3D point cloud data can be obtained, and the target point cloud features corresponding to the height feature block can be determined from them according to the target input index sequence. Through this process, the input features of each block are determined from the 3D point cloud input features by combining the height feature blocks with the preset rule information in the submanifold convolution rule manual, which decomposes the massive 3D point cloud input features into multiple smaller pieces for processing. Each time a calculation is performed, the memory only needs to hold the target point cloud features of one height feature block. This avoids the NPU data transmission delay and excessive resource consumption that storing all 3D point cloud input features at once would cause, and which would otherwise degrade target recognition efficiency.
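The mapping from a height output block back to the input rows it needs, and from there to the target input points, can be sketched as below. It assumes stride 1, so output row o reads input rows o - pad_h through o - pad_h + k_h - 1, and that the input points are already sorted in HDW order. Both function names are hypothetical.

```python
def input_rows_for_block(h0, h1, k_h, pad_h, h_in):
    """Input height rows read by output rows h0..h1-1 at stride 1, clipped to the input."""
    lo = max(h0 - pad_h, 0)
    hi = min((h1 - 1) - pad_h + (k_h - 1), h_in - 1)
    return lo, hi


def target_input_indices(sorted_points, lo, hi):
    """Order positions of the HDW-sorted input points whose height lies in [lo, hi]."""
    return [i for i, (h, _d, _w) in enumerate(sorted_points) if lo <= h <= hi]


sorted_pts = [(0, 0, 0), (3, 1, 1), (5, 0, 2), (8, 2, 2)]  # already in HDW order
lo, hi = input_rows_for_block(h0=4, h1=7, k_h=3, pad_h=1, h_in=10)
print((lo, hi), target_input_indices(sorted_pts, lo, hi))  # (3, 7) [1, 2]
```

Because the input is sorted with height outermost, the selected order positions always form one contiguous run, so only that run has to be resident in memory for the current block.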
[0044] In some embodiments of this application, determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features includes: obtaining the input parameters corresponding to the 3D point cloud data, the input parameters including the number of input channels and the data bit width; determining the starting address of the target input points corresponding to the height feature block based on the target input index sequence and the input parameters; determining the number of target input points based on the target input index sequence; determining the feature loading amount corresponding to the target input points based on the input parameters and the number of target input points; and determining the target point cloud features corresponding to the height feature block from the 3D point cloud input features based on the starting address of the target input points and the feature loading amount.
[0045] In this embodiment, input parameters corresponding to 3D point cloud data can be obtained. These input parameters include the number of input channels and the data bit width. The number of input channels reflects the feature dimension corresponding to each data point in the 3D point cloud data. When only considering the coordinate features of the data points, the corresponding number of input channels is 3. Depending on the target detection requirements, different numbers of input channels can be set, and this application does not impose specific limitations on this. The data bit width reflects the number of bits used to store each feature value. It can typically be set to 16 bits or 32 bits, and can also be set according to the accuracy requirements of target recognition. This application does not impose specific limitations on this either.
[0046] The starting address of the target input points corresponding to the height feature block can be determined based on the target input index sequence and the input parameters. For example, the minimum and maximum values of the target input index sequence can be determined first. If the currently processed height feature block is M0 and its target input index sequence is m00, m01, ..., m05, the maximum value m0_max and the minimum value m0_min of this sequence can be determined by the fixed-point maximum value extraction unit and the fixed-point minimum value extraction unit, respectively. The minimum value of the target input index sequence gives the position of the starting point of the current height feature block within the 3D point cloud input features, that is, the position of the first data point of the height feature block in the 3D point cloud input features. The number of input channels and the data bit width determine the number of bits occupied by the input feature of each data point. Multiplying the position of the starting point of the height feature block by the number of bits occupied by each data point yields the offset of the starting point of the height feature block. The starting address of the target input points corresponding to the height feature block is then obtained from the starting position of the 3D point cloud input feature data plus this offset.
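The address arithmetic in this paragraph can be sketched as below. The helper names `bytes_per_point` and `block_start_address` are hypothetical, and the sketch assumes byte-addressed memory, so the per-point size in bits is divided by 8 before it is used as an address offset.

```python
from typing import Sequence


def bytes_per_point(in_channels: int, bit_width: int) -> int:
    """Storage size of one point's input feature, assuming byte-aligned values."""
    return in_channels * bit_width // 8


def block_start_address(base_addr: int, index_seq: Sequence[int],
                        in_channels: int, bit_width: int) -> int:
    """Start address of a height feature block's input points (hypothetical helper)."""
    first = min(index_seq)  # position of the block's first point in the input features
    return base_addr + first * bytes_per_point(in_channels, bit_width)


# Index sequence m00..m05 of block M0; 3 channels (x, y, z) stored as 16-bit values.
m0 = [12, 9, 15, 10, 9, 14]
print(block_start_address(0x1000, m0, in_channels=3, bit_width=16))  # 4150
```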
[0047] The number of target input points required to process the current height feature block can be calculated by subtracting the minimum value from the maximum value of the target input index sequence and adding 1. The feature loading amount corresponding to the target input points can then be determined from the input parameters and the number of target input points: the number of input channels and the data bit width give the number of bits occupied by the input feature of each data point, and multiplying this by the number of target input points gives the feature loading amount. Based on the starting address of the target input points and the feature loading amount, the target point cloud features corresponding to the height feature block can be determined from the 3D point cloud input features. The target point cloud features are obtained by loading an amount of data equal to the feature loading amount, starting from the starting address of the target input points; that is, they are the 3D point cloud input features corresponding to the current height feature block.
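A minimal sketch of the point count, load size, and contiguous load follows. The names `load_window` and `load_block_features` are hypothetical, and the input features are assumed to be stored as a flat list, point by point, with `in_channels` values per point.

```python
from typing import List, Sequence, Tuple


def load_window(index_seq: Sequence[int], in_channels: int,
                bit_width: int) -> Tuple[int, int, int]:
    """Return (first index, number of target input points, load size in bytes)."""
    lo, hi = min(index_seq), max(index_seq)
    num_points = hi - lo + 1  # maximum minus minimum, plus 1
    load_bytes = num_points * in_channels * bit_width // 8
    return lo, num_points, load_bytes


def load_block_features(flat_src: List[float], index_seq: Sequence[int],
                        in_channels: int) -> List[float]:
    """Copy the contiguous window of values covering the block's input points."""
    lo, hi = min(index_seq), max(index_seq)
    return flat_src[lo * in_channels:(hi + 1) * in_channels]


m0 = [12, 9, 15, 10, 9, 14]
src = [float(v) for v in range(20 * 3)]  # 20 points, 3 channels, flattened row by row
print(load_window(m0, in_channels=3, bit_width=16))  # (9, 7, 42)
window = load_block_features(src, m0, in_channels=3)
print(len(window), window[0], window[-1])  # 21 27.0 47.0
```

Because the window spans the whole range from minimum to maximum, it can include points that the block does not reference (11 and 13 in this example); the trade-off is a single contiguous load instead of many scattered reads.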
[0048] Through the above implementation process, the massive 3D point cloud input features are decomposed into multiple smaller pieces for processing. In each calculation, only the target point cloud features corresponding to the height feature block being processed need to be loaded from the 3D point cloud input features into memory. This avoids storing all 3D point cloud input features at once, which would cause NPU data transmission latency and excessive resource consumption and reduce target recognition efficiency.
[0049] In some embodiments of this application, generating the feature matrix corresponding to the height feature block based on the target point cloud features includes: creating an initial matrix based on the target point cloud features; determining the target convolution kernel position index sequence corresponding to the height feature block from the submanifold convolution rule manual; and generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix.
[0050] In this embodiment, an initial matrix can be created based on the target point cloud features. The number of rows of the initial matrix equals the number of data points of the target point cloud features, and the number of columns can be determined from a preset convolution kernel size. For example, if the preset convolution kernel size is 3×3×3, the initial matrix can have 27 columns. The convolution kernel size can also be set according to actual needs; this application does not impose specific limitations on this. All values of the initial matrix can be initialized to 0. The target convolution kernel position index sequence corresponding to the height feature block can be determined from the submanifold convolution rule manual, and based on this sequence the target point cloud features of the height feature block can be scattered into the initial matrix, generating the feature matrix corresponding to the height feature block. Through the above implementation process, an initial matrix is created and the features of the height feature block are filled into it to obtain the feature matrix, which facilitates efficient subsequent computation.
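The zero-initialized matrix and the scatter by kernel position can be sketched as below. The helper `build_feature_matrix` is hypothetical; it assumes one scalar feature per point (a single input channel, matching the 27-column layout) and that each rule entry names the matrix row it belongs to.

```python
from typing import List, Sequence

KERNEL_VOLUME = 3 * 3 * 3  # 27 columns for a 3x3x3 kernel


def build_feature_matrix(num_rows: int, rows: Sequence[int], kernel_pos: Sequence[int],
                         values: Sequence[float],
                         kernel_volume: int = KERNEL_VOLUME) -> List[List[float]]:
    """Zero-initialise a num_rows x kernel_volume matrix, then scatter each value
    into (row, kernel position)."""
    matrix = [[0.0] * kernel_volume for _ in range(num_rows)]
    for r, k, v in zip(rows, kernel_pos, values):
        matrix[r][k] = v
    return matrix


m = build_feature_matrix(2, rows=[0, 0, 1], kernel_pos=[13, 12, 13], values=[1.5, 2.0, 3.0])
print(m[0][12], m[0][13], m[1][13], sum(m[1]))  # 2.0 1.5 3.0 3.0
```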
[0051] In some embodiments of this application, generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix includes: determining the target convolution kernel position index sequence as the target index corresponding to the height feature block; determining the source index corresponding to the height feature block according to the submanifold convolution rule manual; extracting target data points from the target point cloud features based on the source index; and filling the target data points into the initial matrix according to the target index corresponding to the source index, to generate the feature matrix corresponding to the height feature block.
[0052] In this embodiment, the loaded target point cloud features can be stored in memory, and the target convolution kernel position index sequence can be used as the target index corresponding to the height feature block. The target input index sequence corresponding to the height feature block can be determined from the submanifold convolution rule manual, and the target input index sequence can be used as the source index corresponding to the height feature block. The spatial address of the target point cloud features can be used as the source data starting address, and the spatial address of the initial matrix can be used as the destination data starting address. After obtaining the source index and target index, the corresponding target data points can be extracted from the target point cloud features according to the source index and the source data starting address. Then, the target data points are filled into the initial matrix according to the target index corresponding to the source index and the destination data starting address, thereby generating the feature matrix corresponding to the height feature block. Through the above implementation process, the features corresponding to the height feature block can be filled into the initial matrix to obtain the feature matrix, which facilitates efficient subsequent computation.
[0053] In some embodiments of this application, determining the target features corresponding to the 3D point cloud data based on the feature matrix includes: receiving convolution parameters, which include convolution kernel weights and convolution kernel offsets; performing matrix multiplication on the feature matrix according to the convolution parameters to obtain the target feature matrix; determining the target data point address corresponding to the 3D point cloud data based on the target feature matrix; and determining the target features corresponding to the 3D point cloud data based on the target data point address.
[0054] In this embodiment, convolution parameters input by the user can be received. These parameters may include convolution kernel weights and convolution kernel offsets. The kernel weights are the parameter matrix of the convolutional layer used to extract local features of the data, perceiving local information through sparse interaction. The shape of the kernel weights can be expressed as (out_channels, in_channels, kernel_depth, kernel_height, kernel_width), that is, the number of output channels, the number of input channels shared by the kernel and the feature data, the kernel depth, the kernel height, and the kernel width. The kernel offset is a learnable spatial offset corresponding to each convolution kernel. Both the kernel weights and the kernel offsets are learnable parameters.
[0055] The target feature matrix can be obtained by performing matrix multiplication on the feature matrix based on the convolution parameters. For example, an initial weight matrix can be created whose number of rows equals the number of output channels of the convolution kernel and whose number of columns is determined by the preset convolution kernel size, with all values initialized to 0. Then each convolution kernel offset can be iterated over, and the convolution kernel weights of each output channel at that offset can be filled into the initial weight matrix to obtain the final weight matrix.
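The weight-matrix construction can be sketched as follows, assuming a single input channel (so weights have shape (out_channels, 1, kd, kh, kw)) and a row-major linearization of kernel offsets; both are assumptions, and `build_weight_matrix` is a hypothetical name.

```python
from typing import List


def build_weight_matrix(weights, kd: int = 3, kh: int = 3, kw: int = 3) -> List[List[float]]:
    """Rearrange weights shaped (out_channels, 1, kd, kh, kw) into an
    out_channels x (kd*kh*kw) matrix, one column per kernel offset."""
    out_channels = len(weights)
    volume = kd * kh * kw
    wmat = [[0.0] * volume for _ in range(out_channels)]  # initial weight matrix, all 0
    for d in range(kd):
        for h in range(kh):
            for w in range(kw):
                k = (d * kh + h) * kw + w  # linear offset index (row-major, assumed)
                for j in range(out_channels):
                    wmat[j][k] = weights[j][0][d][h][w]
    return wmat


# Two output channels; each weight encodes 100 * channel + offset so the layout is easy to check.
weights = [[[[[100 * j + 9 * d + 3 * h + w for w in range(3)] for h in range(3)]
             for d in range(3)]] for j in range(2)]
wmat = build_weight_matrix(weights)
print(len(wmat), len(wmat[0]), wmat[1][26])  # 2 27 126
```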
[0056] With the feature matrix as the left matrix and the weight matrix as the right matrix, the matrix multiplication can be expressed by the following formula:
[0057] $$c_{i,j} = \sum_{k=1}^{K} a_{i,k}\, b_{k,j}, \quad i = 1, \dots, m;\ j = 1, \dots, n$$
[0058] Here, matrix a represents the target point cloud features, matrix b represents the convolution kernel weights corresponding to the target point cloud features, and matrix c represents the target feature matrix; m represents the number of data points of the target point cloud features, n represents the number of output channels of the convolution kernel, and K represents the number of convolution kernel offsets (for example, 27 for a 3×3×3 kernel). a_{i,k} represents the feature of the i-th target point cloud at offset k, b_{k,j} represents the weight of the j-th output channel at offset k, and c_{i,j} represents the output feature of the i-th target point cloud feature on the j-th output channel.
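The formula in [0057] is an ordinary matrix product; a plain-Python sketch is shown below. The names `matmul` and `transpose` are illustrative, and because the weight matrix of [0055] stores one row per output channel, it is transposed here to obtain the K x n matrix b. Bias or offset handling is not modelled.

```python
from typing import List


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """c[i][j] = sum over k of a[i][k] * b[k][j] for an m x K and a K x n matrix."""
    m, K, n = len(a), len(b), len(b[0])
    if any(len(row) != K for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(K)) for j in range(n)] for i in range(m)]


def transpose(w: List[List[float]]) -> List[List[float]]:
    """Turn an n x K weight matrix (rows = output channels) into the K x n matrix b."""
    return [list(col) for col in zip(*w)]


a = [[1.0, 2.0], [3.0, 4.0]]   # m = 2 points, K = 2 offsets
w = [[5.0, 7.0], [6.0, 8.0]]   # n = 2 output channels, stored row per channel
print(matmul(a, transpose(w)))  # [[19.0, 22.0], [43.0, 50.0]]
```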
[0059] After obtaining the target feature matrix, the number of output points for the corresponding height feature blocks can be determined based on the target feature matrix. The address of the target data point corresponding to the 3D point cloud data can then be determined based on the number of output points for the height feature blocks. The address of the target data point can be expressed by the following formula:
[0060] Where dst_addr represents the address of the target data point; M_i represents the number of height feature blocks; m_i represents the i-th height feature block; the number of target data points corresponding to height feature block m_i also enters the formula; out_channels represents the number of output channels; and data_bytes represents the number of bytes occupied by each target data point.
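One plausible reading of this address computation is sketched below, under an explicit assumption: blocks are written back-to-back in processing order, so a block's offset is the number of output points of all preceding blocks multiplied by out_channels and data_bytes. The helper `block_dst_addresses` is hypothetical and is not the formula of this application.

```python
from itertools import accumulate
from typing import List, Sequence


def block_dst_addresses(base_addr: int, out_points: Sequence[int],
                        out_channels: int, data_bytes: int) -> List[int]:
    """Destination address of each block, assuming blocks are written back-to-back
    in processing order (hypothetical layout)."""
    preceding = list(accumulate(out_points, initial=0))[:-1]  # output points before each block
    return [base_addr + p * out_channels * data_bytes for p in preceding]


# Three height feature blocks with 4, 6 and 5 output points, 16 output channels, 2-byte values.
print(block_dst_addresses(0, [4, 6, 5], out_channels=16, data_bytes=2))  # [0, 128, 320]
```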
[0061] After the target data point address is determined, the target feature values in the target feature matrix can be written back, according to that address, to the target storage space pre-allocated by the upper layer. After all height feature blocks have been processed, the target feature values stored in the target storage space are used as the target features corresponding to the 3D point cloud data. Through the above implementation process, each height feature block is processed separately, its output feature values are determined and written back to the pre-allocated target storage space according to the target data point address, and the resulting target features of the 3D point cloud data can be used for subsequent target recognition. This reduces memory consumption during a single calculation, improves the real-time processing capability, computational efficiency and resource utilization of the NPU, and thus improves target recognition efficiency.
[0062] For example, the NPU in this application embodiment may include a fixed-point maximum value extraction unit, a fixed-point minimum value extraction unit, an indexing unit, a filling unit, and a matrix multiplication unit.
[0063] The fixed-point maximum value extraction unit can be used to determine the maximum value in a given set of data. The fixed-point minimum value extraction unit can be used to determine the minimum value in a given set of data.
[0064] The indexing unit can index and write data based on the source index, destination index, source data start address, and destination data start address. First, data is read based on the source data start address and the source index. Then, data is written based on the destination data start address and the destination index. The source index indicates the relative position of the data read, the destination index indicates the relative position of the data written, the source data start address indicates the starting address of the space containing the read data, and the destination data start address indicates the starting address of the destination space where the data is written.
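The indexing unit's read-then-write behaviour can be modelled on a flat memory array as below. This is a software sketch only; `index_copy` is a hypothetical name and says nothing about the NPU's actual interface.

```python
from typing import List, Sequence


def index_copy(memory: List[float], src_start: int, src_index: Sequence[int],
               dst_start: int, dst_index: Sequence[int]) -> None:
    """Read memory[src_start + s] and write it to memory[dst_start + d] for each
    (s, d) pair, modelling the indexing unit on a flat memory array."""
    if len(src_index) != len(dst_index):
        raise ValueError("source and destination index lengths differ")
    values = [memory[src_start + s] for s in src_index]  # read phase first
    for d, v in zip(dst_index, values):                  # then write phase
        memory[dst_start + d] = v


mem = [float(v) for v in range(10)] + [0.0] * 10  # source at 0..9, destination at 10..19
index_copy(mem, src_start=0, src_index=[2, 5, 7], dst_start=10, dst_index=[0, 4, 9])
print(mem[10], mem[14], mem[19])  # 2.0 5.0 7.0
```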
[0065] The fill unit can be used to fill the elements of an allocated area according to a preset fill value and type, and it supports enumeration fill. Enumeration fill means that, starting from the fill value, the elements of the fill area are filled one by one with successively enumerated values.
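Under the assumption that enumeration fill counts upward by 1 from the fill value (an interpretation, not a stated specification), the two fill modes can be sketched as follows; `fill` is a hypothetical helper, and the "type" parameter is not modelled.

```python
from typing import List


def fill(length: int, value: float, enumerate_fill: bool = False) -> List[float]:
    """Constant fill, or enumeration fill that counts up from the fill value."""
    if enumerate_fill:
        return [value + i for i in range(length)]
    return [value] * length


print(fill(4, 0.0))                     # [0.0, 0.0, 0.0, 0.0]
print(fill(4, 3, enumerate_fill=True))  # [3, 4, 5, 6]
```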
[0066] The matrix multiplication unit can perform matrix multiplication operations on two given matrices.
[0067] Figure 4 is a flowchart of 3D point cloud data feature extraction according to an embodiment of the target recognition method of this application.
[0068] The input features (i.e., the 3D point cloud input features) can be processed by sparse convolution, and the H dimension of the output dense space (i.e., the height output features) can be divided into M blocks to obtain the height feature blocks. The value of M can be set according to a preset rule manual, and this application does not impose specific restrictions on it.
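Dividing the H dimension into M blocks can be sketched as a near-even contiguous split. The even-split policy and the helper `split_height` are assumptions for illustration; the application leaves the choice of M, and of block boundaries, open.

```python
from typing import List


def split_height(height: int, num_blocks: int) -> List[range]:
    """Split output rows 0..height-1 into num_blocks contiguous, near-equal blocks."""
    if not 0 < num_blocks <= height:
        raise ValueError("num_blocks must be in 1..height")
    base, rem = divmod(height, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < rem else 0)  # first `rem` blocks take one extra row
        blocks.append(range(start, start + size))
        start += size
    return blocks


print(split_height(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```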
[0069] All data for the currently processed height feature block Mi can be obtained from the rule book (submanifold convolution rule manual) input by the user, including the number of input points, the number of output points, the indices of the input points (the target input index sequence), and the convolution kernel position index of each input point.
[0070] Then, the fixed-point maximum value extraction unit and the fixed-point minimum value extraction unit can be used to determine the maximum and minimum values in the target input index sequence corresponding to the indices of the input points of the height feature block Mi, respectively.
[0071] Then, the 3D point cloud input features `src` can be obtained. Based on the maximum and minimum values of the target input index sequence, the data corresponding to height feature block `Mi` is extracted from `src`; that is, the target point cloud features are extracted. For example, from the minimum value of the target input index sequence and the number of input channels and data bit width of `src`, the starting address of the data of height feature block `Mi` in `src` can be calculated. Then, the minimum value of the target input index sequence is subtracted from the maximum value and 1 is added, giving the number of target input points of height feature block `Mi` that need to be loaded. The data loading amount of height feature block `Mi` is then calculated from this number together with the number of input channels and the data bit width. Finally, the `src` data required by height feature block `Mi` is loaded from the corresponding address in `src` to obtain the target point cloud features.
[0072] The left matrix for the matrix multiplication can be created using the fill unit. The left matrix is initially filled with 0, its number of rows is the number of output points of height feature block Mi, and its number of columns is the kernel size. Then, based on the kernel position indices corresponding to the target input indices (the target convolution kernel position index sequence), the indexing unit scatters the target point cloud features of height feature block Mi into the initial left matrix, generating the left matrix for the matrix multiplication.
[0073] Then, the kernel weights and biases input by the user can be obtained, and matrix multiplication is performed on the left matrix based on the weights and biases to obtain the result (target feature matrix).
[0074] After obtaining the calculation result corresponding to the current height feature block Mi, the destination address corresponding to the calculation result can be determined, and then the calculation result can be written to the destination address in the final destination storage space allocated by the upper layer.
[0075] After writing is complete, return to the step of obtaining the height feature block Mi from the rule book (submanifold convolution rule manual) entered by the user, obtain the next height feature block, and so on until the operation results corresponding to all height feature blocks have been written.
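The block loop of [0069] to [0075] can be tied together in one toy sketch. It assumes a single input channel, a hypothetical dict layout for each block's rule-book data (`input_indices`, `out_rows`, `kernel_pos`, `num_out`), output blocks written back-to-back, and no bias term; `run_blocks` is an illustrative name, not the implementation of this application.

```python
from typing import Dict, List


def run_blocks(src: List[float], blocks: List[Dict], wmat: List[List[float]],
               kernel_volume: int) -> List[List[float]]:
    """Process height feature blocks one at a time (single input channel assumed).

    Each block dict (hypothetical layout) holds: input_indices, out_rows,
    kernel_pos and num_out, one rule-book entry per position in the three lists.
    """
    n = len(wmat)  # output channels; wmat is n x kernel_volume
    total_out = sum(b["num_out"] for b in blocks)
    dst = [[0.0] * n for _ in range(total_out)]  # storage pre-allocated up front
    out_offset = 0
    for b in blocks:
        idx = b["input_indices"]
        lo, hi = min(idx), max(idx)
        window = src[lo:hi + 1]  # only this block's input window is loaded
        left = [[0.0] * kernel_volume for _ in range(b["num_out"])]
        for i, r, k in zip(idx, b["out_rows"], b["kernel_pos"]):
            left[r][k] = window[i - lo]  # scatter into (output row, kernel offset)
        for r in range(b["num_out"]):
            for j in range(n):
                dst[out_offset + r][j] = sum(left[r][k] * wmat[j][k]
                                             for k in range(kernel_volume))
        out_offset += b["num_out"]  # next block writes right after this one
    return dst


# Toy example: a 3-position kernel, 4 input points, 2 output channels, 2 blocks.
src = [1.0, 2.0, 3.0, 4.0]
wmat = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
blocks = [
    {"input_indices": [0, 1, 1], "out_rows": [0, 0, 1], "kernel_pos": [1, 2, 1], "num_out": 2},
    {"input_indices": [2, 3], "out_rows": [0, 0], "kernel_pos": [0, 1], "num_out": 1},
]
print(run_blocks(src, blocks, wmat, kernel_volume=3))  # [[3.0, 1.0], [2.0, 2.0], [7.0, 4.0]]
```

At any moment only one block's window and left matrix are held, which is the memory saving the block-based approach targets.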
[0076] Through the above implementation process, the submanifold convolution of 3D point cloud data can be computed block by block, reducing the memory usage of each computation. With the block-based approach, memory only needs to hold the source (src) data required within each block, which solves the problem that traditional submanifold convolution requires storing all source (src) data at once, thereby improving target recognition efficiency.
[0077] It should be noted that the target recognition method provided in the embodiments of this application may be executed by a target recognition device, or by a control module in the target recognition device for executing the target recognition method. In the embodiments of this application, a target recognition device executing the target recognition method is taken as an example to describe the target recognition method provided in the embodiments of this application.
[0078] Figure 5 is a structural block diagram of an embodiment of a target recognition device according to this application; the device may specifically include the following modules: The 3D point cloud data acquisition module 501 is used to acquire 3D point cloud data; The height feature block generation module 502 is used to generate corresponding height feature blocks based on the three-dimensional point cloud data; The submanifold convolution rule manual acquisition module 503 is used to acquire the submanifold convolution rule manual corresponding to the 3D point cloud data, the submanifold convolution rule manual being used to determine the input point index and convolution kernel position index corresponding to the 3D point cloud data in the submanifold convolution operation; The target point cloud feature determination module 504 is used to determine the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual; The matrix generation module 505 is used to generate a feature matrix corresponding to the height feature block based on the target point cloud features; The target feature determination module 506 is used to determine the target features corresponding to the three-dimensional point cloud data based on the feature matrix; The identification module 507 is used to identify the target to be identified based on the target features and obtain the target identification result.
[0079] The height feature block generation module 502 includes: The point coordinate determination submodule is used to determine the point coordinates of each data point in the three-dimensional point cloud data; A 3D point cloud input feature generation submodule is used to generate 3D point cloud input features based on the point coordinates; the 3D point cloud input features are obtained by mapping the point coordinates to a one-dimensional space, including height input features, depth input features, and width input features. The block segmentation submodule is used to segment the height input features into blocks according to a preset number of blocks, thereby generating height feature blocks of the 3D point cloud data.
[0080] The target point cloud feature determination module 504 includes: The target rule information determination submodule is used to determine the target rule information corresponding to the height feature block from the submanifold convolution rule manual; the target rule information includes the target input index sequence corresponding to the target input point; A 3D point cloud input feature acquisition submodule is used to acquire the 3D point cloud input features of the 3D point cloud data; The target point cloud feature determination submodule is used to determine the target point cloud features corresponding to the height feature block based on the target input index sequence and the three-dimensional point cloud input features.
[0081] The target point cloud feature determination submodule includes: An input parameter acquisition unit is used to acquire input parameters corresponding to the three-dimensional point cloud data, the input parameters including the number of input channels and the data bit width; The target input point start address determination unit is used to determine the target input point start address corresponding to the height feature block based on the target input index sequence and the input parameters. A target input point quantity determination unit is used to determine the quantity of target input points based on the target input index sequence; A feature loading amount determination unit is used to determine the feature loading amount corresponding to the target input point based on the input parameters and the number of target input points; The target point cloud feature determination unit is used to determine the target point cloud feature corresponding to the height feature block from the three-dimensional point cloud input features based on the starting address of the target input point and the feature loading amount.
[0082] The matrix generation module 505 includes: An initial matrix creation submodule is used to create an initial matrix based on the target point cloud features; The target convolution kernel position index sequence determination submodule is used to determine the target convolution kernel position index sequence corresponding to the height feature block from the submanifold convolution rule manual; The feature matrix generation submodule is used to generate the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix.
[0083] The feature matrix generation submodule includes: The target index determination unit is used to determine the target convolutional kernel position index sequence as the target index corresponding to the height feature block; The source index determination unit is used to determine the source index corresponding to the height feature block according to the submanifold convolution rule manual. The target data point extraction unit is used to extract target data points from the target point cloud features based on the source index; The feature matrix generation unit is used to fill the target data points into the initial matrix according to the target index corresponding to the source index, and generate the feature matrix corresponding to the height feature block.
[0084] The target feature determination module 506 includes: A convolution parameter receiving submodule is used to receive convolution parameters, which include convolution kernel weights and convolution kernel offsets. A matrix multiplication operation submodule is used to perform matrix multiplication operations on the feature matrix according to the convolution parameters to obtain the target feature matrix; The target data point address determination submodule is used to determine the target data point address corresponding to the three-dimensional point cloud data based on the target feature matrix. The target feature determination submodule is used to determine the target features corresponding to the three-dimensional point cloud data based on the target data point address.
[0085] The target identification device in this application embodiment can be a device, or a component, integrated circuit, or chip in a terminal. The device can be a mobile electronic device or a non-mobile electronic device. For example, mobile electronic devices can be mobile phones, tablets, laptops, palmtop computers, in-vehicle electronic devices, wearable devices, ultra-mobile personal computers (UMPCs), netbooks, or personal digital assistants (PDAs), etc., while non-mobile electronic devices can be servers, network attached storage (NAS), personal computers (PCs), televisions (TVs), teller machines, or self-service machines, etc. This application embodiment does not impose specific limitations.
[0086] The target identification device in this application embodiment can be a device with an operating system. This operating system can be Android, iOS, or other possible operating systems; this application embodiment does not specifically limit the specific operating system used.
[0087] The target recognition device provided in this application embodiment can implement the various processes implemented in the method embodiments of Figures 1 to 4; to avoid repetition, they are not described again here.
[0088] The target recognition device provided in this application can acquire 3D point cloud data; generate corresponding height feature blocks based on the 3D point cloud data; acquire a submanifold convolution rule manual corresponding to the 3D point cloud data; wherein, the submanifold convolution rule manual is used to determine the input point index and convolution kernel position index of the 3D point cloud data in the submanifold convolution operation; then, the target point cloud features corresponding to the height feature blocks can be determined according to the submanifold convolution rule manual; then, a feature matrix corresponding to the height feature blocks can be generated according to the target point cloud features; subsequently, the target features corresponding to the 3D point cloud data can be determined according to the feature matrix; finally, the target to be recognized can be identified according to the target features to obtain the target recognition result. Through the above implementation process, the 3D point cloud data can be segmented in the height dimension to obtain corresponding height feature blocks, and then the submanifold convolution operation can be performed on each height feature block according to the submanifold convolution rule manual to determine the target features of the 3D point cloud data. This can effectively reduce the memory usage when processing 3D point cloud data, improve the feature extraction speed of 3D point cloud data, and thus improve the corresponding target recognition efficiency.
[0089] Optionally, embodiments of this application also provide an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor. When the program or instructions are executed by the processor, they implement the various processes of the above-described target recognition method embodiments and achieve the same technical effects. To avoid repetition, they will not be described again here.
[0090] It should be noted that the electronic devices in the embodiments of this application include the mobile electronic devices and non-mobile electronic devices described above.
[0091] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the above-described target recognition method embodiments and achieve the same technical effect. To avoid repetition, they will not be described again here.
[0092] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.
[0093] This application embodiment also provides a chip, which includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement the various processes of the above-described target recognition method embodiments and can achieve the same technical effect. To avoid repetition, it will not be described again here.
[0094] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, system chip, chip system, or system-on-chip, etc.
[0095] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
[0096] From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform. They can of course also be implemented in hardware, but in many cases the former is the preferred implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of this application.
[0097] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Under the teaching of this application, those skilled in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims, and all of these forms fall within the protection scope of this application.
Claims
1. A target recognition method, characterized in that the method comprises: acquiring 3D point cloud data; generating corresponding height feature blocks based on the 3D point cloud data; obtaining a submanifold convolution rule manual corresponding to the 3D point cloud data, the submanifold convolution rule manual being used to determine the input point indices and convolution kernel position indices corresponding to the 3D point cloud data in a submanifold convolution operation; determining target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual; generating a feature matrix corresponding to the height feature block based on the target point cloud features; determining target features corresponding to the 3D point cloud data based on the feature matrix; and determining the target to be identified based on the target features to obtain a target identification result.
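The rule manual in claim 1 plays the role usually called a rulebook in sparse-convolution libraries. A minimal Python sketch of how such a rulebook could be built for a submanifold convolution is shown below. It assumes integer voxel coordinates in (height, depth, width) order, a cubic kernel of odd size, and kernel positions enumerated in row-major order; the function name `build_submanifold_rulebook` and the dictionary layout are hypothetical and are not taken from the application.

```python
from itertools import product


def build_submanifold_rulebook(coords, kernel_size=3):
    """Hypothetical sketch: for every kernel position index k, list the
    (input point index, output point index) pairs that contribute to it.

    coords: unique integer (h, d, w) coordinates of the active voxels.
    In a submanifold convolution the output sites equal the input sites,
    so output index i also refers to coords[i].
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    index_of = {tuple(c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    # Kernel offsets in row-major order; position index k <-> offsets[k].
    offsets = list(product(range(-r, r + 1), repeat=3))
    rulebook = {k: [] for k in range(len(offsets))}
    for out_idx, (h, d, w) in enumerate(coords):
        for k, (dh, dd, dw) in enumerate(offsets):
            in_idx = index_of.get((h + dh, d + dd, w + dw))
            if in_idx is not None:  # only active neighbours produce a rule
                rulebook[k].append((in_idx, out_idx))
    return offsets, rulebook


coords = [(0, 0, 0), (0, 0, 1), (5, 5, 5)]
offsets, rulebook = build_submanifold_rulebook(coords)
print(rulebook[13])  # centre position (0, 0, 0): [(0, 0), (1, 1), (2, 2)]
print(rulebook[14])  # offset (0, 0, +1): [(1, 0)]
```

Because no new output sites are created and the centre kernel position always maps each site to itself, the set of active voxels does not dilate from layer to layer; the rulebook therefore grows with the number of occupied voxels rather than with the size of the dense grid.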
2. The method according to claim 1, characterized in that generating the corresponding height feature blocks based on the 3D point cloud data comprises: determining the point coordinates of each data point in the 3D point cloud data; generating a 3D point cloud input feature based on the point coordinates, wherein the 3D point cloud input feature is obtained by mapping the point coordinates to a one-dimensional space and includes a height input feature, a depth input feature, and a width input feature; and dividing the height input feature into blocks according to a preset number of blocks to generate the height feature blocks of the 3D point cloud data.
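One possible reading of claim 2 is sketched below in Python: each (height, depth, width) coordinate is flattened to a one-dimensional index in height-major order, and points are grouped into a preset number of contiguous height slabs. The grid extents, the ceiling-division slab size, and the helper names `linearize` and `split_by_height` are illustrative assumptions rather than details given in the application.

```python
def linearize(coord, depth, width):
    """Map an integer (h, d, w) coordinate to a 1-D index (height-major order)."""
    h, d, w = coord
    return (h * depth + d) * width + w


def split_by_height(coords, height, num_blocks):
    """Assign each point index to one of num_blocks contiguous height slabs.

    Assumes 0 <= h < height for every point; the slab size is rounded up so
    that all heights are covered by exactly num_blocks slabs.
    """
    block_h = -(-height // num_blocks)  # ceiling division
    blocks = [[] for _ in range(num_blocks)]
    for i, (h, _, _) in enumerate(coords):
        if not 0 <= h < height:
            raise ValueError(f"height {h} outside [0, {height})")
        blocks[h // block_h].append(i)
    return blocks


coords = [(0, 0, 0), (3, 1, 2), (7, 0, 0)]
print(split_by_height(coords, height=8, num_blocks=4))  # [[0], [1], [], [2]]
print(linearize((3, 1, 2), depth=4, width=5))           # 67
```

With height-major linearization, all points whose height lies in one slab occupy a single contiguous range of 1-D indices, so storing points sorted by that index keeps each height block in one run of memory.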
3. The method according to claim 1, characterized in that determining the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual comprises: determining, from the submanifold convolution rule manual, target rule information corresponding to the height feature block, the target rule information including a target input index sequence corresponding to target input points; obtaining the 3D point cloud input features of the 3D point cloud data; and determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features.
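The selection in claim 3 can be pictured as filtering the rule manual by the output points that fall inside one height block and then gathering the corresponding input features. The Python sketch below assumes the rule manual is a dictionary mapping each kernel position index to (input index, output index) pairs and that input features are stored one row per point; both layouts and the helper names are assumptions for illustration only.

```python
def target_input_index_sequence(rulebook, block_outputs):
    """Sorted, de-duplicated input indices needed by the outputs in one block."""
    wanted = set(block_outputs)
    needed = {
        in_idx
        for pairs in rulebook.values()
        for in_idx, out_idx in pairs
        if out_idx in wanted
    }
    return sorted(needed)


def gather_block_features(features, index_seq):
    """Collect the feature row of every target input point, in index order."""
    return [features[i] for i in index_seq]


rulebook = {0: [(0, 1), (2, 3)], 1: [(1, 1), (3, 0)]}
features = [[0.1], [0.2], [0.3], [0.4]]
seq = target_input_index_sequence(rulebook, block_outputs=[1])
print(seq)                                   # [0, 1]
print(gather_block_features(features, seq))  # [[0.1], [0.2]]
```

The sequence can include input points from an adjacent height block when a kernel window straddles the block boundary; this sketch simply includes them.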
4. The method according to claim 3, characterized in that determining the target point cloud features corresponding to the height feature block based on the target input index sequence and the 3D point cloud input features comprises: obtaining input parameters corresponding to the 3D point cloud data, the input parameters including a number of input channels and a data bit width; determining a starting address of the target input points corresponding to the height feature block based on the target input index sequence and the input parameters; determining a number of target input points based on the target input index sequence; determining a feature loading amount corresponding to the target input points based on the input parameters and the number of target input points; and determining, from the 3D point cloud input features, the target point cloud features corresponding to the height feature block based on the starting address of the target input points and the feature loading amount.
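The address arithmetic of claim 4 can be illustrated under two assumptions that the application does not state: input features are stored densely as consecutive rows of `channels * bit_width / 8` bytes in index order, and a single contiguous load covers the span from the smallest to the largest target index. The Python sketch below, with the hypothetical names `load_plan` and `base_addr`, computes the starting address and the feature loading amount in bytes and slices them from a byte buffer.

```python
def load_plan(index_seq, channels, bit_width, base_addr=0):
    """Return (start address, loading amount) in bytes for one height block.

    Assumes rows of `channels` values of `bit_width` bits each, stored in
    input-index order starting at base_addr, and one contiguous load that
    spans the smallest to the largest target index.
    """
    if (channels * bit_width) % 8:
        raise ValueError("row size must be a whole number of bytes")
    if not index_seq:
        return base_addr, 0
    bytes_per_point = channels * bit_width // 8
    first, last = min(index_seq), max(index_seq)
    num_points = last - first + 1          # points covered by the load
    start = base_addr + first * bytes_per_point
    return start, num_points * bytes_per_point


start, amount = load_plan([4, 5, 7], channels=16, bit_width=8)
print(start, amount)  # 64 64

buffer = bytes(range(256)) * 2  # stand-in for the input feature memory
block_bytes = buffer[start:start + amount]
print(len(block_bytes))  # 64
```

When the target indices are contiguous, the span equals the number of target input points and nothing is over-fetched; when there are gaps, this simple plan also loads some unused rows.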
5. The method according to claim 1, characterized in that generating the feature matrix corresponding to the height feature block based on the target point cloud features comprises: creating an initial matrix based on the target point cloud features; determining, from the submanifold convolution rule manual, a target convolution kernel position index sequence corresponding to the height feature block; and generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix.
6. The method according to claim 5, characterized in that generating the feature matrix corresponding to the height feature block based on the target convolution kernel position index sequence and the initial matrix comprises: determining the target convolution kernel position index sequence as the target indices corresponding to the height feature block; determining the source indices corresponding to the height feature block according to the submanifold convolution rule manual; extracting target data points from the target point cloud features according to the source indices; and filling the target data points into the initial matrix according to the target indices corresponding to the source indices, to generate the feature matrix corresponding to the height feature block.
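Claims 5 and 6 describe filling an initial matrix with data taken at source indices (input points) and placed at target indices (kernel positions). A common way to realize this is an im2col-style layout with one row per output point and one block of `channels` columns per kernel position. The Python sketch below assumes that layout, a zero-initialized initial matrix, input features indexed by global input index, and a rule manual stored as a dictionary from kernel position index to (input index, output index) pairs; these choices and the function name are illustrative assumptions.

```python
def build_feature_matrix(rulebook, block_outputs, features, num_positions):
    """im2col-style matrix for one height block.

    Row r holds the receptive field of output point block_outputs[r];
    columns k*C .. (k+1)*C - 1 hold the features of the input point that the
    rule manual pairs with that output at kernel position k (zeros if none).
    """
    channels = len(features[0])
    row_of = {out_idx: r for r, out_idx in enumerate(block_outputs)}
    # Initial matrix: all zeros, one row per output point of the block.
    matrix = [[0.0] * (num_positions * channels) for _ in block_outputs]
    for k, pairs in rulebook.items():      # k: target (kernel position) index
        for in_idx, out_idx in pairs:      # in_idx: source index
            r = row_of.get(out_idx)
            if r is not None:
                matrix[r][k * channels:(k + 1) * channels] = features[in_idx]
    return matrix


rulebook = {0: [(0, 0), (1, 1)], 1: [(1, 0)]}
features = [[1.0, 2.0], [3.0, 4.0]]
m = build_feature_matrix(rulebook, [0, 1], features, num_positions=2)
print(m)  # [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 0.0, 0.0]]
```

With this layout, the convolution for a whole height block reduces to one dense matrix product, an operation that matrix-oriented accelerators such as NPUs generally handle efficiently.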
7. The method according to claim 1, characterized in that determining the target features corresponding to the 3D point cloud data based on the feature matrix comprises: receiving convolution parameters, the convolution parameters including convolution kernel weights and convolution kernel offsets; performing matrix multiplication on the feature matrix according to the convolution parameters to obtain a target feature matrix; determining target data point addresses corresponding to the 3D point cloud data based on the target feature matrix; and determining the target features corresponding to the 3D point cloud data based on the target data point addresses.
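Claim 7 can be sketched as a dense matrix product followed by writing each result row to the address of its output point. The Python sketch below assumes that the convolution kernel offsets act as a per-output-channel additive term (a bias) and that output features are stored as consecutive rows in output-index order; these interpretations and the helper names `matmul_add` and `scatter_rows` are assumptions, not details taken from the application.

```python
def matmul_add(matrix, weights, offsets):
    """Target feature matrix = matrix @ weights + offsets (added to each row)."""
    out_channels = len(offsets)
    return [
        [sum(x * weights[i][j] for i, x in enumerate(row)) + offsets[j]
         for j in range(out_channels)]
        for row in matrix
    ]


def scatter_rows(result, block_outputs, out_channels, output_buffer):
    """Write result row r to the slot of output point block_outputs[r]."""
    for r, out_idx in enumerate(block_outputs):
        addr = out_idx * out_channels  # row address in a flat output buffer
        output_buffer[addr:addr + out_channels] = result[r]
    return output_buffer


matrix = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 0.0, 0.0]]
weights = [[1.0], [1.0], [1.0], [1.0]]  # (num_positions * C_in) x C_out
result = matmul_add(matrix, weights, offsets=[0.5])
print(result)  # [[10.5], [7.5]]
print(scatter_rows(result, [2, 0], 1, [0.0] * 3))  # [7.5, 0.0, 10.5]
```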
8. A target recognition device, characterized in that the device comprises: a 3D point cloud data acquisition module configured to acquire 3D point cloud data; a height feature block generation module configured to generate corresponding height feature blocks based on the 3D point cloud data; a submanifold convolution rule manual acquisition module configured to acquire a submanifold convolution rule manual corresponding to the 3D point cloud data, the submanifold convolution rule manual being used to determine the input point indices and convolution kernel position indices corresponding to the 3D point cloud data in a submanifold convolution operation; a target point cloud feature determination module configured to determine the target point cloud features corresponding to the height feature block according to the submanifold convolution rule manual; a matrix generation module configured to generate a feature matrix corresponding to the height feature block based on the target point cloud features; a target feature determination module configured to determine the target features corresponding to the 3D point cloud data based on the feature matrix; and an identification module configured to determine the target to be identified based on the target features and obtain a target identification result.
9. An electronic device, characterized in that it comprises a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the target recognition method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that the readable storage medium stores a program or instructions that, when executed by a processor, implement the steps of the target recognition method according to any one of claims 1 to 7.