Target identification method, device, apparatus and storage medium
By pre-setting convolution kernel parameters to determine the output features of 3D point clouds and inversely deriving the effective output points and their input sequence, the problem of excessive memory consumption of sparse convolution on the NPU is solved, the target recognition efficiency is improved, and it is suitable for autonomous driving and medical image processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- YUAN LI (BEI JING) BAN DAO TI JI SHU YOU XIAN GONG SI
- Filing Date
- 2026-02-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies rely on hash tables and traversal lookups when processing 3D point cloud data using sparse convolution, resulting in excessive memory consumption and making it difficult to process in real time on the NPU, thus affecting the efficiency of target recognition.
By determining the output features of the 3D point cloud from preset convolution kernel parameters, the effective output points and their input sequence are derived in reverse. This avoids building the traditional sparse convolution rulebook, reducing memory usage and improving processing efficiency.
It achieves efficient target recognition on the NPU, reduces memory usage, and improves target recognition efficiency, making it suitable for autonomous driving and medical image processing scenarios.
Figure CN122244572A_ABST
Description
Technical Field
[0001] This application belongs to the field of artificial intelligence, and specifically relates to a target recognition method and apparatus, an electronic device, and a storage medium.
Background Technology
[0002] 3D (three-dimensional) point clouds, as a key form of 3D data, are being deeply integrated into cutting-edge fields such as autonomous driving and medicine for target recognition. In scenarios that prioritize real-time performance and low power consumption, such as intelligent driving and medical image processing, the NPU (Neural Processing Unit) has become an indispensable dedicated computing engine due to its very high energy efficiency.
[0003] Because 3D point cloud data is naturally sparse, it is typically processed with sparse convolution. Sparse convolution requires generating a rulebook using hash tables or traversal lookups, then accessing the sparse points according to the rulebook, and finally computing with the corresponding weights. However, hash table operations rely heavily on random memory access, complex pointer jumps, and dynamic data structure management, making them difficult to implement on an NPU. Traversal lookups over the input points are inefficient, and sparse convolution implemented with a traditional rulebook requires a large amount of memory, which the limited memory space of an NPU can hardly support. The result is low efficiency for target recognition based on 3D point cloud data.
Summary of the Invention
[0004] The purpose of this application is to provide a target recognition method, apparatus, device, and storage medium that can solve or at least partially solve the problem of poor target recognition efficiency caused by the large memory consumption of 3D point cloud data during computation, making it difficult for the NPU to process it in real time.
[0005] To solve the above technical problems, this application is implemented as follows. In a first aspect, embodiments of this application provide a target recognition method, the method comprising:
acquiring 3D point cloud data;
determining the 3D point cloud output features corresponding to the 3D point cloud data according to preset convolution kernel parameters;
determining the valid output points of the 3D point cloud data and the corresponding valid input sequence based on the 3D point cloud output features;
determining the valid output features corresponding to the 3D point cloud data based on the valid output points and the corresponding valid input sequence; and
identifying the target to be identified based on the valid output features to obtain a target recognition result.
[0006] Optionally, the convolution kernel parameters include the kernel size, the kernel padding parameters, and the kernel stride. Determining the 3D point cloud output features corresponding to the 3D point cloud data according to the preset convolution kernel parameters includes:
generating the corresponding 3D point cloud input features from the 3D point cloud data, where the 3D point cloud input features are obtained by mapping the point coordinates of the 3D point cloud data to a one-dimensional space and include height input features, depth input features, and width input features;
determining the first output dimension feature corresponding to the 3D point cloud input features based on the kernel size and the kernel padding parameters;
determining the second output dimension feature corresponding to the 3D point cloud input features based on the first output dimension feature and the kernel stride; and
determining the 3D point cloud output features corresponding to the 3D point cloud data based on the second output dimension feature.
[0007] Optionally, determining the valid output points and corresponding valid input sequence of the 3D point cloud data based on the 3D point cloud output features includes: The effective output identifier bit sequence of the three-dimensional point cloud data is determined based on the output characteristics of the three-dimensional point cloud. The effective input index corresponding to the 3D point cloud data is determined based on the effective output identifier sequence and the convolution kernel parameters. The valid output points and corresponding valid input sequence of the 3D point cloud data are determined based on the valid input index.
[0008] Optionally, determining the effective output features corresponding to the 3D point cloud data based on the effective output points and the corresponding effective input sequence includes: The output height feature blocks corresponding to the three-dimensional point cloud data are generated according to the preset number of blocks; The block output point sequence and the corresponding block input sequence of the output height feature block are determined based on the valid output points and the corresponding valid input sequence. Determine the output spatial dimensions corresponding to the three-dimensional point cloud data; The output point coordinates of the output height feature block are determined based on the block output point sequence. The output point sequence index corresponding to the output point coordinates is determined based on the output space size; A dense space corresponding to the output height feature block is generated based on the output point sequential index; Generate the input point matrix position index corresponding to the block input point of the output height feature block based on the dense space, the block output point sequence, and the block input sequence; Generate a valid input matrix based on the position index of the input point matrix; The effective output features are obtained by performing matrix multiplication between the effective input matrix and the preset parameter matrix.
[0009] Optionally, determining the block output point sequence and the corresponding block input order sequence of the output height feature block includes: Obtain the average size of the output height feature block; The boundary information of the output height feature block is determined based on the average size of the output height feature block and the output space size; The block output point sequence and the corresponding block input sequence of the output height feature block are determined based on the boundary information, the valid output points, and the corresponding valid input sequence.
[0010] Optionally, generating the input point matrix position index corresponding to the block input points of the output height feature block based on the dense space, the block output point sequence, and the block input order sequence includes: Generate a matrix row sequence corresponding to the valid output points based on the dense space and the block output point sequence; Generate a matrix column sequence corresponding to the valid output points based on the block input sequence; The input point matrix position index of the block input point of the output height feature block is generated based on the matrix row sequence and matrix column sequence corresponding to the valid output point.
[0011] Optionally, generating a valid input matrix based on the input point matrix position index includes: determining the real index of the output height feature block based on the block input sequence; determining the distribution area of the block input points in the output point coordinate set based on the real index; determining the index offset of the real index based on the distribution area; determining the block input start address of the output height feature block based on the distribution area; extracting target input features from the 3D point cloud data based on the distribution area and the block input start address; and generating a valid input matrix based on the input point matrix position index and the target input features.
[0012] In a second aspect, embodiments of this application provide a target recognition apparatus, the apparatus comprising: a 3D point cloud data acquisition module, used to acquire 3D point cloud data; a 3D point cloud output feature determination module, used to determine the 3D point cloud output features corresponding to the 3D point cloud data according to preset convolution kernel parameters; a first validity determination module, used to determine the valid output points of the 3D point cloud data and the corresponding valid input sequence based on the 3D point cloud output features; a second validity determination module, used to determine the valid output features corresponding to the 3D point cloud data based on the valid output points and the corresponding valid input sequence; and a target recognition module, used to identify the target to be identified based on the valid output features and obtain the target recognition result.
[0013] In a third aspect, embodiments of this application provide an electronic device including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method described in the first aspect.
[0014] In a fourth aspect, embodiments of this application provide a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method described in the first aspect.
[0015] In a fifth aspect, embodiments of this application provide a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being used to run programs or instructions to implement the method described in the first aspect.
[0016] The target recognition method provided in this application acquires 3D point cloud data; determines the 3D point cloud output features corresponding to the 3D point cloud data according to preset convolution kernel parameters; determines the valid output points of the 3D point cloud data and the corresponding valid input sequence according to the 3D point cloud output features; determines the valid output features corresponding to the 3D point cloud data according to the valid output points and the corresponding valid input sequence; and then identifies the target to be identified according to the valid output features to obtain the target recognition result. In the embodiments of this application, the output features corresponding to the 3D point cloud data are first determined from the preset convolution kernel parameters, and the valid output points and the corresponding valid input sequence are then derived in reverse. From these, the valid output features are determined and used to identify the target and obtain the target recognition result. The method does not rely on hash tables or traversal lookups to generate a traditional sparse convolution rulebook, which avoids the loss of target recognition efficiency caused by the large amount of memory that such a rulebook occupies during 3D point cloud processing.
Brief Description of the Drawings
[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart illustrating the steps of an embodiment of the target recognition method of this application; Figure 2 is a schematic diagram of a point extraction unit in an embodiment of the target recognition method of this application; Figure 3 is a flowchart illustrating the steps of another embodiment of the target recognition method of this application; Figure 4 is a structural block diagram of an embodiment of a target recognition apparatus according to this application.
Detailed Description
[0019] To make the above-mentioned objectives, features, and advantages of this application more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0020] The terms "first," "second," etc., used in the specification and claims of this application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that embodiments of this application can be implemented in orders other than those illustrated or described herein. Furthermore, in the specification and claims, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects are in an "or" relationship.
[0021] The target recognition method provided in this application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.
[0022] 3D point clouds, as a key form of 3D data, are being deeply integrated into cutting-edge fields such as autonomous driving and healthcare for various target recognition applications. Given the inherently sparse nature of 3D point cloud data, sparse convolution is typically used for data processing to significantly reduce computational and memory consumption while better preserving data details. This approach is more suitable for real-time target recognition scenarios such as autonomous driving or medical image analysis.
[0023] In scenarios requiring real-time performance and low power consumption, such as intelligent driving and medical image processing, the NPU is typically used as a dedicated data processing engine. However, traditional sparse convolution requires generating a rule book using hash tables or traversal lookups, then accessing sparse points according to the rule book, and finally calculating based on the corresponding weights. Hash tables heavily rely on random memory access, complex pointer jumps, and dynamic data structure management, which contradicts the sequential, batch data stream processing mode that NPUs excel at. This makes implementing hash tables on NPUs difficult, while traversing to find input points is extremely inefficient. Furthermore, data processing based on traditional rule books consumes a large amount of memory, impacting the NPU's real-time processing capabilities, computational efficiency, and resource utilization, ultimately leading to poor target recognition efficiency.
[0024] To address the aforementioned issues, this application provides a target recognition method applicable to an NPU or to an electronic device equipped with an NPU. The method does not rely on a traditional sparse convolution rulebook, and so avoids the loss of target recognition efficiency caused by the large amount of memory that such a rulebook consumes when processing 3D point cloud data.
[0025] Refer to Figure 1, which is a flowchart illustrating the steps of an embodiment of a target recognition method according to this application. The method includes the following steps:
Step 101: Acquire 3D point cloud data.
3D point cloud data is a data set composed of a large number of discrete data points in three-dimensional space. The data set can reflect various features of the points, such as location and color information. 3D point cloud data can be obtained by scanning with a 3D scanning device, which can be a sensor such as a lidar mounted on a vehicle or any of various medical scanning devices with 3D imaging capabilities. This application does not impose specific limitations on this.
[0026] Step 102: Determine the 3D point cloud output features corresponding to the 3D point cloud data according to the preset convolution kernel parameters. The convolution kernel parameters reflect the operational characteristics and output characteristics of the convolution kernel. The 3D point cloud output features can include the output lists in the three dimensions D (Depth), H (Height), and W (Width) obtained after sparse convolution of the 3D point cloud data.
[0027] Step 103: Determine the valid output points of the 3D point cloud data and the corresponding valid input sequence based on the 3D point cloud output features. In this context, a valid output point is a feature point of the 3D point cloud output features whose derived input point coincides with a data point in the 3D point cloud data. A derived input point is the input feature point obtained by reverse derivation from a feature point of the 3D point cloud output features. Because the forward derivation that determines the 3D point cloud output features divides by the convolution kernel stride and rounds the result down, any fractional part is discarded. The input points derived in reverse from the output features may therefore differ from the input points of the original 3D point cloud data. Output points whose derived input points differ in this way can be considered invalid output points.
[0028] Data points in the 3D point cloud data that coincide with the derived input points of valid output points are valid input points. The valid input sequence reflects the order in which the valid input points are arranged in the input features corresponding to the 3D point cloud data, and indicates the position of each valid input point.
[0029] Step 104: Determine the valid output features corresponding to the 3D point cloud data based on the valid output points and the corresponding valid input sequence. The valid output features can be the feature maps produced from the valid input points in the 3D point cloud data by sparse convolution, and can reflect the geometric features and semantic information of each target to be identified.
[0030] Step 105: Identify the target to be identified based on the effective output features to obtain the target identification result.
[0031] The effective output features may include features corresponding to one or more targets to be identified, and the target identification result may include information such as the type, location, and confidence level of the target to be identified.
[0032] In this embodiment, 3D point cloud data is acquired; the 3D point cloud output features corresponding to the data are determined according to preset convolution kernel parameters; the valid output points and the corresponding valid input sequence are determined based on those output features; the valid output features are determined based on the valid output points and the corresponding valid input sequence; and the target to be identified is identified based on the valid output features to obtain the target recognition result. In other words, the output features are first determined from the preset convolution kernel parameters, and the valid output points and their valid input sequence are then derived in reverse. This reduces the processing of invalid data points during 3D point cloud processing and improves the efficiency of target recognition. Furthermore, the method does not rely on hash tables or traversal lookups to generate a traditional sparse convolution rulebook, which avoids the loss of target recognition efficiency caused by the large amount of memory that such a rulebook occupies.
[0033] In some embodiments of this application, the convolution kernel parameters include kernel size, kernel padding parameters, and kernel stride. The kernel size represents the size of the convolution kernel on the input data and is used to define the receptive field of the convolution operation. The kernel padding parameter reflects the extra pixels added to the boundaries of the input data; by strategically adding values (typically 0) around the edges of the input data, the size of the output feature map is controlled, and the information processing capability of the convolution operation is optimized. The kernel stride controls the step size by which the convolution kernel slides across the input data, affecting the size of the output feature map.
[0034] The step of determining the 3D point cloud output features corresponding to the 3D point cloud data according to the preset convolution kernel parameters includes:
generating the corresponding 3D point cloud input features from the 3D point cloud data, where the 3D point cloud input features are obtained by mapping the point coordinates of the 3D point cloud data to a one-dimensional space and include height input features, depth input features, and width input features;
determining the first output dimension feature corresponding to the 3D point cloud input features based on the kernel size and the kernel padding parameters;
determining the second output dimension feature corresponding to the 3D point cloud input features based on the first output dimension feature and the kernel stride; and
determining the 3D point cloud output features corresponding to the 3D point cloud data based on the second output dimension feature.
[0035] In this embodiment, the 3D point cloud data sent by the 3D scanning device is an unordered set of points, which may contain multiple 3D data points. Corresponding ordered 3D point cloud input features can be generated based on the unordered 3D point cloud data. The 3D point cloud input features are obtained by mapping the point coordinates of the 3D point cloud data to a one-dimensional space, and include height input features, depth input features, and width input features.
[0036] For example, the coordinates of each data point in a 3D point cloud can be determined, with each coordinate including height, depth, and width. The coordinates of each data point can be tiled (mapped) to a one-dimensional space according to the HDW (Height-Depth-Width) dimensional order to obtain a one-dimensional linear index corresponding to the data point. Then, the data points are sorted according to the one-dimensional linear index to generate the 3D point cloud input features. The 3D point cloud input features can include features in three dimensions: height, depth, and width. For example, the 3D point cloud data can include point A (h, d, w), where h is the coordinate value of point A in the height dimension, d is the coordinate value of point A in the depth dimension, and w is the coordinate value of point A in the width dimension. The spatial range corresponding to the 3D point cloud data is represented by the maximum height H_max, maximum depth D_max, and maximum width W_max. The one-dimensional linear index can be represented as linear_index = h × (D_max × W_max) + d × W_max + w. Data points can be sorted according to the size of a one-dimensional linear index to obtain the 3D point cloud input features. The height input feature is the feature value of the 3D point cloud input features in the height dimension, the depth input feature is the feature value of the 3D point cloud input features in the depth dimension, and the width input feature is the feature value of the 3D point cloud input features in the width dimension. Transforming unordered 3D point cloud data into ordered 3D point cloud features allows for ordered data reading subsequently, improving data retrieval efficiency.
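The HDW flattening and sorting described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `linear_index` and `sort_points_hdw` are hypothetical, and the sketch assumes that every point satisfies 0 <= d < D_max and 0 <= w < W_max so that the linear index is unique.

```python
def linear_index(h, d, w, d_max, w_max):
    """Flatten an (h, d, w) coordinate in HDW order.

    Implements linear_index = h * (D_max * W_max) + d * W_max + w.
    Assumes 0 <= d < d_max and 0 <= w < w_max (an assumption of this sketch).
    """
    return h * (d_max * w_max) + d * w_max + w


def sort_points_hdw(points, d_max, w_max):
    """Return the points sorted by ascending one-dimensional linear index."""
    return sorted(
        points,
        key=lambda p: linear_index(p[0], p[1], p[2], d_max, w_max),
    )


# Unordered (h, d, w) points, as they might arrive from a scanner.
points = [(1, 0, 0), (0, 2, 3), (0, 0, 1)]
ordered = sort_points_hdw(points, d_max=4, w_max=5)
print(ordered)  # [(0, 0, 1), (0, 2, 3), (1, 0, 0)]
```

With D_max = 4 and W_max = 5, the three points map to linear indices 20, 13, and 1, so sorting places them in the order 1, 13, 20.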
[0037] After obtaining the 3D point cloud input features, the first output dimension feature corresponding to the 3D point cloud input features can be determined based on the convolution kernel size and kernel padding parameters. The first output dimension feature can represent the coordinate values of the output features corresponding to each data point of the 3D point cloud input features in the D, H, and W dimensions, respectively, without considering the convolution kernel stride. The first output dimension feature can be expressed by the following formula:
[0038] Here, no_stride_p represents the dimensional features without considering the convolution kernel stride; i = 0, 1, 2 represents the three dimensions D, H, and W respectively; j indexes the kernel position in dimension i, j = 0, 1, ..., kernel_size[i] - 1; kernel_size represents the dimensions of the convolution kernel in the three dimensions D, H, and W, and kernel_size[i] represents the size of the convolution kernel in the i-th dimension; p[i] represents the coordinate of the 3D point cloud input features in the i-th dimension; padding represents the padding size in the three dimensions D, H, and W when performing the sparse convolution operation, and padding[i] represents the padding size in the i-th dimension.
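The formula referred to above did not survive in this text. One form consistent with the symbols defined in this paragraph, and with the usual relation between input and output coordinates in a convolution, would be the following. It is a reconstruction under that assumption, not the applicant's original rendering.

```latex
\mathrm{no\_stride\_p}[i,j] = p[i] + \mathrm{padding}[i] - j,
\qquad i \in \{0,1,2\},\quad j = 0,1,\dots,\mathrm{kernel\_size}[i]-1
```

Under this reading, each input coordinate yields one candidate value per kernel position j in each dimension, before the stride is applied.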
[0039] The second output dimension feature corresponding to the 3D point cloud input features can be determined based on the first output dimension feature and the convolution kernel stride. The second output dimension feature represents, for each data point of the 3D point cloud input features, the coordinate values of the corresponding output features in the D, H, and W dimensions once the kernel stride is taken into account. The second output dimension feature can be expressed by the following formula:
[0040] Here, stride_p represents the dimensional features considering the kernel stride; i = 0, 1, 2 represents the three dimensions D, H, and W respectively; j indexes the kernel position in dimension i, j = 0, 1, ..., kernel_size[i] - 1; kernel_size represents the dimensions of the kernel in the three dimensions D, H, and W, and kernel_size[i] represents the size of the kernel in the i-th dimension; no_stride_p[i,j] represents the dimensional feature of the 3D point cloud input features at the j-th kernel position in dimension i without considering the kernel stride; stride represents the stride of the kernel in the three dimensions D, H, and W when performing the sparse convolution operation, and stride[i] represents the stride of the kernel in the i-th dimension; ⌊·⌋ is the floor function.
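The formula image itself is missing here. Based on the symbols listed in this paragraph (division by the stride followed by the floor function), it can be reconstructed as the following, offered as a reading of the definitions rather than the original rendering:

```latex
\mathrm{stride\_p}[i,j] = \left\lfloor \frac{\mathrm{no\_stride\_p}[i,j]}{\mathrm{stride}[i]} \right\rfloor,
\qquad i \in \{0,1,2\},\quad j = 0,1,\dots,\mathrm{kernel\_size}[i]-1
```

For instance, with no_stride_p[i,j] = 5 and stride[i] = 2, the result is ⌊5/2⌋ = 2, and the fractional part 0.5 is discarded.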
[0041] The second output dimension features in the D, H, and W dimensions can then be taken as the corresponding 3D point cloud output features, which are essentially the lists of output coordinates corresponding to the 3D point cloud data in each dimension. This is the forward derivation that determines the 3D point cloud output features from the 3D point cloud data. In this process, the unordered 3D point cloud data is first transformed into ordered 3D point cloud input features suited to NPU processing. The corresponding 3D point cloud output features are then determined from the 3D point cloud input features and the preset convolution kernel parameters, without relying on hash tables or traversal lookups to generate a rulebook. This reduces memory usage during data processing and improves the processing efficiency of 3D point cloud data.
[0042] In some embodiments of this application, determining the valid output points and corresponding valid input sequence of the 3D point cloud data based on the output features of the 3D point cloud includes: The effective output identifier bit sequence of the three-dimensional point cloud data is determined based on the output characteristics of the three-dimensional point cloud. The effective input index corresponding to the 3D point cloud data is determined based on the effective output identifier sequence and the convolution kernel parameters. The valid output points and corresponding valid input sequence of the 3D point cloud data are determined based on the valid input index.
[0043] In this embodiment, the valid output identifier sequence of the 3D point cloud data can be determined based on the 3D point cloud output features. The valid output identifier sequence reflects the positional information of each valid output point in the 3D point cloud output features. The corresponding reverse output features can be determined based on the 3D point cloud output features and the convolution kernel stride, and can be used to verify the valid output points in the 3D point cloud output features. The feature points of the reverse output features can be represented by the following formula:
[0044] Here, reverse_p represents the reverse output feature corresponding to the 3D point cloud output feature; i = 0, 1, 2 represents the three dimensions D, H, and W respectively; j indexes the kernel position in dimension i, j = 0, 1, ..., kernel_size[i] - 1; kernel_size represents the dimensions of the convolution kernel in the three dimensions D, H, and W, and kernel_size[i] represents the size of the convolution kernel in the i-th dimension; stride_p[i,j] represents the dimensional feature of the 3D point cloud input features at the j-th kernel position in dimension i, considering the kernel stride; stride represents the stride of the convolution kernel in the three dimensions D, H, and W when performing the sparse convolution operation, and stride[i] represents the stride of the convolution kernel in the i-th dimension.
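The formula for the reverse output feature is also missing from this text. Given that it is determined from the output features and the kernel stride, and that it is later compared for equality with no_stride_p, a consistent reconstruction (an assumption drawn from those definitions) is:

```latex
\mathrm{reverse\_p}[i,j] = \mathrm{stride\_p}[i,j] \times \mathrm{stride}[i],
\qquad i \in \{0,1,2\},\quad j = 0,1,\dots,\mathrm{kernel\_size}[i]-1
```

Multiplying back by the stride recovers the pre-stride value only when the earlier division left no remainder, which is what makes the subsequent equality check meaningful.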
[0045] After obtaining the reverse output features, a list of valid output identifiers for the 3D point cloud data can be obtained by comparing the equal positions in the reverse output features and the first output dimension features. This list reflects the positional information of the valid output points corresponding to the 3D point cloud data. The valid output identifier list, valid_list, can be represented as: valid_list = reverse_p == no_stride_p. For example, if the reverse output features are further deduced by combining the kernel size and kernel padding parameters, specific derivation input points can be obtained. The derivation input points corresponding to the equal positions in the reverse output features and the first output dimension features are the same as the data points in the 3D point cloud data, which are the valid input points.
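The comparison described above can be sketched for a single input coordinate in one dimension. This is only an illustrative sketch under assumptions that the text does not state: no_stride_p[j] is taken to be the padded input coordinate minus the kernel position j, stride_p[j] its floor division by the stride, and reverse_p[j] the value of stride_p[j] multiplied back by the stride. The function name valid_flags is hypothetical.

```python
def valid_flags(coord, kernel_size, stride, padding):
    """Return (stride_p, valid_list) for one input coordinate in one dimension."""
    # Assumption: candidate output position before the stride is applied.
    no_stride_p = [coord + padding - j for j in range(kernel_size)]
    # Candidate output coordinate after applying the stride (floor division).
    stride_p = [v // stride for v in no_stride_p]
    # Map back to the un-strided grid; equality means the stride divides exactly.
    reverse_p = [v * stride for v in stride_p]
    # Comparator: 1 where reverse_p equals no_stride_p, otherwise 0.
    valid_list = [int(r == n) for r, n in zip(reverse_p, no_stride_p)]
    return stride_p, valid_list


stride_p, valid_list = valid_flags(coord=4, kernel_size=3, stride=2, padding=1)
```

With a stride of 2, only the kernel position whose candidate coordinate is even is flagged as valid. A complete check would presumably also confirm that the candidate lies inside the output space, which this sketch omits.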
[0046] Then, based on the kernel parameters, the dimensions of the convolution kernel in the D, H, and W dimensions can be determined. Then, based on the kernel dimensions, the D, H, and W dimensions of the 3D point cloud output features are traversed layer by layer. According to the list of valid output identifiers, the corresponding valid identifiers are extracted from the 3D point cloud output features to obtain a valid output identifier sequence. Invalid positions in the valid output identifier sequence are set to zero. The extracted valid output identifier sequence reflects the coordinate values of the valid output feature points, expressed by the following formula:
[0047] Wherein, valid_point_list can represent the sequence of valid output flags; kdi = 0,1,...,kernel_size[0]-1 can represent the index of the convolution kernel size in the depth dimension; khi = 0,1,...,kernel_size[1]-1 can represent the index of the convolution kernel size in the height dimension; kwi = 0,1,...,kernel_size[2]-1 can represent the index of the convolution kernel size in the width dimension; valid_list[0,kdi] can represent the feature value of the valid flag in the depth dimension, valid_list[1,khi] can represent the feature value of the valid flag in the height dimension, and valid_list[2,kwi] can represent the feature value of the valid flag in the width dimension. An AND operation is performed on the feature values in the three dimensions, so that only output points that are valid in all three dimensions are retained. If the feature value in any one dimension is invalid, that is, has a value of 0, the corresponding output point is set to zero.
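The three-way AND described above can be sketched as follows for a single input point. The traversal order (depth, then height, then width) and the function name build_valid_point_list are assumptions made for illustration; the text only states that the flags of the three dimensions are ANDed.

```python
from itertools import product


def build_valid_point_list(valid_list, kernel_size):
    """AND the per-dimension flags for every kernel position (kdi, khi, kwi)."""
    flags = []
    # Assumed traversal order: depth outermost, width innermost.
    for kdi, khi, kwi in product(*(range(k) for k in kernel_size)):
        flags.append(valid_list[0][kdi] & valid_list[1][khi] & valid_list[2][kwi])
    return flags


# Illustrative per-dimension flags for one input point and a 2x2x2 kernel.
flags = build_valid_point_list([[1, 0], [1, 1], [0, 1]], (2, 2, 2))
```

A kernel position is kept only when its depth, height, and width flags are all 1, so a single 0 in any dimension zeroes the corresponding output point.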
[0048] After determining the valid output identifier sequence, the valid input index corresponding to the 3D point cloud data can be determined based on the valid output identifier sequence and the convolution kernel parameters. The number of data points corresponding to the 3D point cloud data can be determined, and then, based on the number of data points, combined with the convolution kernel size and its index in the D, H, and W dimensions, the initial padding value for the input order index with weighted information can be determined. The initial padding value can be expressed by the following formula:
[0049] Where f can represent the initial padding value, used to locate the storage location of the input sequence index; kdi = 0,1,...,kernel_size[0]-1 can represent the kernel size index in the depth dimension; khi = 0,1,...,kernel_size[1]-1 can represent the kernel size index in the height dimension; kwi = 0,1,...,kernel_size[2]-1 can represent the kernel size index in the width dimension; kernel_size[1] can represent the kernel size in the first dimension (height dimension); kernel_size[2] can represent the kernel size in the second dimension (width dimension); and N can represent the total number of data points corresponding to the 3D point cloud data. For example, padding units in the NPU can be used to pad the input sequence index with weight information. A padding unit can pad elements in a specified area according to the padding value and has an enumeration padding function.
[0050] The valid input index is then obtained by performing a fixed-point multiplication operation on the valid output flag sequence and the obtained padding start value. The valid input index idx_list can be represented by the following formula:
[0051] The corresponding output space size can be determined based on the output features of a 3D point cloud. The output space size reflects the size of the data point set of the 3D point cloud output features in the feature space dimension, including depth, height, and width. The discrete sequence of output points corresponding to the 3D point cloud output features can be determined based on the output space size and the 3D point cloud output features. The discrete sequence of output points can be expressed by the following formula:
[0052] Wherein, out can represent the discrete sequence of output points, stride_p[1,khi] can represent the feature value of the 3D point cloud output feature at position khi in the first dimension (height dimension) and the convolution kernel size in the height dimension after zeroing invalid positions, out_D can represent the depth dimension of the output space, out_W can represent the width dimension of the output space, stride_p[0,kdi] can represent the feature value of the 3D point cloud output feature at position kdi in the 0th dimension (depth dimension) and the convolution kernel size in the depth dimension after zeroing invalid positions, and stride_p[2,kwi] can represent the feature value of the 3D point cloud output feature at position kwi in the second dimension (width dimension) and the convolution kernel size in the width dimension after zeroing invalid positions.
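As a hedged illustration of how the terms listed above might combine, the sketch below linearizes an output coordinate in height, depth, width order. The ordering is an assumption inferred from the fact that only out_D and out_W appear as scale factors; the function name linear_output_index is hypothetical.

```python
def linear_output_index(d, h, w, out_D, out_W):
    """Linearize an output coordinate, assuming H-major, then D, then W order."""
    # d, h, w stand for stride_p[0,kdi], stride_p[1,khi], stride_p[2,kwi].
    return h * out_D * out_W + d * out_W + w


out = linear_output_index(d=1, h=2, w=3, out_D=4, out_W=5)
```

Under this assumption, all output points that share a height coordinate occupy one contiguous range of the sequence, which is what makes blocking along the height dimension straightforward.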
[0053] The effective output points and corresponding effective input sequence of the 3D point cloud data are determined based on the effective input index and the discrete sequence of output points. For example, zeros can be removed from the effective input index and the discrete sequence of output points, and the remaining data points can be closely arranged. The effective input sequence with weighted positional information is determined based on the effective input index after zero-point removal, and the corresponding effective output points are determined based on the discrete sequence of output points after zero-point removal.
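The zero-point removal step can be sketched as a simple compaction. This is a minimal software analogue of the behaviour described in the text, not the hardware unit itself; the function name remove_zeros is hypothetical.

```python
def remove_zeros(data, index):
    """Drop entries whose data value is 0 and pack the survivors together."""
    kept = [(v, i) for v, i in zip(data, index) if v != 0]
    values = [v for v, _ in kept]
    indices = [i for _, i in kept]
    return values, indices


values, indices = remove_zeros([0, 7, 0, 3], [10, 11, 12, 13])
```

Because a value of 0 is treated as "invalid", any encoding that feeds this step has to make sure that genuinely valid entries never evaluate to 0.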
[0054] Through the above implementation process, the list of corresponding valid output identifiers can be determined from the output features of the 3D point cloud, thereby determining the corresponding valid identifier sequence. Then, based on the valid output identifier sequence, the valid input index corresponding to the 3D point cloud data can be derived. Finally, based on the valid input index, the valid output points and corresponding valid input sequence of the 3D point cloud data can be determined. This reduces the processing of invalid data points during 3D point cloud data processing, improving the efficiency of target recognition. Furthermore, it eliminates the need to rely on traditional hash tables or traversal search methods to generate a traditional rule manual to determine the correspondence between input and output points, avoiding the impact on target recognition efficiency caused by the large memory consumption of traditional sparse convolution rule manuals for 3D point cloud data processing.
[0055] For example, a comparator can be used to compare the equal positions in the reverse output features and the first output dimension features to obtain a list of valid output identifier bits for the 3D point cloud data. Then, a point extraction unit extracts a sequence of valid output identifier bits based on the list of valid output identifier bits. Refer to Figure 2, which is a schematic diagram of a point extraction unit in an embodiment of the target recognition method of this application.
[0056] The reverse output features and the first output dimension features can be input into the comparator shown in Figure 2. The comparator sequentially compares the feature values corresponding to the reverse output feature and the first output dimension feature, finds the positions where the reverse output feature equals the first output dimension feature, and sets the corresponding value to 1; otherwise, it sets it to 0. This results in the set of flag bits marking where the reverse output feature equals the first output dimension feature, i.e., the list of valid output flag bits. Then, based on the list of valid output flag bits, the sequence of valid output flag bits is extracted from the 3D point cloud output features corresponding to the first output dimension feature. The sequence of valid output flag bits and the initial padding value are then multiplied in a fixed-point multiplication unit, and the fixed-point operation result is output, which is the input index sequence containing valid input index information, i.e., the valid input index. The input index sequence containing valid input index information can then be used as input data, and the discrete output point sequence `out` can be used as the index, both being input to the zero-point removal unit to obtain the points within the current output height feature space block and their corresponding indices, i.e., the valid output points of the current output height feature space block and their valid input sequence. The zero-point removal unit can remove zero-point data from a set of data and output the result and its corresponding index.
[0057] Refer to Figure 3, which is a flowchart illustrating the steps of another embodiment of the target recognition method of this application. In some embodiments of this application, determining the valid output features corresponding to the 3D point cloud data based on the valid output points and the corresponding valid input sequence includes: Step 301: Generate the output height feature blocks corresponding to the 3D point cloud data according to the preset number of blocks. The number of blocks can be set according to actual needs, and this application does not impose specific restrictions on it. An output height feature block can correspond to a block of the 3D point cloud data in the height dimension of the output dense space.
[0058] For example, the 3D point cloud input features corresponding to the 3D point cloud data can be obtained. The 3D point cloud input features include the corresponding height input features. Then, the height dimension in the preset convolution kernel size and the preset convolution kernel padding parameters are combined to determine the corresponding height output features. The height output features are divided into blocks according to the preset number of blocks M, and are divided into M consecutive subsets. Each subset corresponds to an output height feature block to generate the output height feature block of the 3D point cloud data. The number of feature points in each output height feature block is determined according to the total number of feature points and the number of blocks of the 3D point cloud data.
[0059] Step 302: Determine the block output point sequence and the corresponding block input sequence of the output height feature block based on the valid output points and the corresponding valid input sequence. The block output point sequence can represent the data point sequence corresponding to the current output height feature block, and the block input sequence carries weight position information, which can reflect the arrangement order in the feature space of the data points output by the current output height feature block after sparse convolution.
[0060] Step 303: Determine the output space size corresponding to the 3D point cloud data. The output space size reflects the size of the set of data points in the feature space of the 3D point cloud output features, including the depth, height, and width dimensions.
[0061] Step 304: Determine the output point coordinates of the output height feature block based on the block output point sequence. The output point coordinates can represent the coordinates of the data points output after sparse convolution of the current output height feature block.
[0062] Step 305: Determine the output point sequence index corresponding to the output point coordinates based on the output space size. The output point sequence index can reflect the arrangement order of each data point in the 3D point cloud output features.
[0063] Step 306: Generate a dense space corresponding to the output height feature block according to the output point sequence index. The dense space is a continuous and regular three-dimensional grid space, which facilitates subsequent convolution processing of the sparse and irregular point cloud data in the output height feature block, thereby improving data processing efficiency and feature extraction quality.
[0064] Step 307: Generate the input point matrix position index corresponding to the block input points of the output height feature block based on the dense space, the block output point sequence, and the block input sequence. The input point matrix position index indicates the position of the block input points of the current output height feature block in the feature calculation matrix.
[0065] Step 308: Generate a valid input matrix based on the position index of the input point matrix; The effective input matrix is the feature calculation matrix composed of the effective input points in the 3D point cloud input features.
[0066] Step 309: Perform matrix multiplication on the effective input matrix and the preset parameter matrix to obtain the effective output features.
[0067] The parameter matrix is generated based on the preset convolutional kernel weights. The convolutional kernel weights can be a parameter matrix used in the convolutional layer to extract local features of the data, and can achieve perception of local information through sparse interactions. The input shape of the convolutional kernel weights can be represented as (out channels [number of output channels of the convolutional kernel], in channels [number of input channels between the convolutional kernel and the feature data], kernel depth [kernel depth], kernel height [kernel height], kernel width [kernel width]).
[0068] In this embodiment, the 3D point cloud input features corresponding to the 3D point cloud data can be obtained. After obtaining the 3D point cloud input features, the height output features corresponding to the height input features can be divided into blocks according to a preset number of blocks M to generate output height feature blocks corresponding to the 3D point cloud data. Based on the valid output points and the corresponding valid input sequence, the block output point sequence of the output height feature block and the corresponding block input sequence with weighted position information can be determined.
[0069] After obtaining the block output point sequence and the corresponding block input sequence with weighted position information, the block output point sequence can be sorted in ascending order of output point values. The sorted block output point sequence also needs to be deduplicated to obtain a sequence of block output points without duplicates, i.e., a set of block output points without duplicates, and the number of elements corresponding to each unique block output point can be obtained. For example, the block output point sequence can be input into the `sort` unit for sorting. The `sort` unit can sort a set of fixed-point data and its corresponding indices in ascending or descending order, and output the sorting result, along with the indices of the sorted fixed-point data. The sorted block output point sequence can also be input into the `sortedunique` unit for deduplication. The `sortedunique` unit can deduplicate the input ordered fixed-point sequence.
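The sort and deduplication steps can be sketched in software as follows. The function names sort_with_indices and sorted_unique are hypothetical stand-ins for the `sort` and `sortedunique` units; returning a per-value count is one reading of "the number of elements corresponding to each unique block output point".

```python
def sort_with_indices(values):
    """Sort ascending and also return the original index of each sorted value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [values[i] for i in order], order


def sorted_unique(sorted_values):
    """Deduplicate an already sorted sequence, counting repeats of each value."""
    unique, counts = [], []
    for v in sorted_values:
        if unique and unique[-1] == v:
            counts[-1] += 1
        else:
            unique.append(v)
            counts.append(1)
    return unique, counts


sorted_out, order = sort_with_indices([48, 12, 48, 7, 12])
unique_out, counts = sorted_unique(sorted_out)
```

Sorting first lets deduplication run as a single linear pass that only compares each value with its predecessor.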
[0070] The output spatial dimensions of a 3D point cloud can be determined based on its output features. These dimensions reflect the size of the set of data points representing the 3D point cloud output features in the feature space, including depth, height, and width. Then, the output point coordinates of the output height feature block can be determined based on the sequence of non-repeating block output points and the output spatial dimensions. These output point coordinates can be expressed by the following formula:
[0071]
[0072]
[0073] Wherein, out_h represents the height coordinates of the output points of the output height feature block, out_d represents the depth coordinates of the output points of the output height feature block, and out_w represents the width coordinates of the output points of the output height feature block. unique_out represents the block output points in a sequence of non-repeating block output points, out_D represents the depth dimension of the output space, out_W represents the width dimension of the output space, and % is the modulo operator.
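One way to read the definitions above is as the inverse of a height-major linearization. The sketch below is a reconstruction under that assumption (sequence value = h × out_D × out_W + d × out_W + w); the function name output_point_coords is hypothetical.

```python
def output_point_coords(unique_out, out_D, out_W):
    """Recover (out_d, out_h, out_w) from a linear output point value."""
    plane = out_D * out_W                 # number of points per height slice
    out_h = unique_out // plane
    out_d = (unique_out % plane) // out_W
    out_w = unique_out % out_W
    return out_d, out_h, out_w


coords = output_point_coords(48, out_D=4, out_W=5)
```

Only out_D and out_W are needed for the decomposition, which matches the set of symbols the text defines for this formula.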
[0074] The output point sequence index corresponding to the output point coordinates is determined based on the output spatial dimensions. For example, the size of the output height feature block in the height dimension can be determined first, and then the first offset for arranging the output height feature blocks in HDW order in the feature space of the 3D point cloud data can be determined based on the spatial dimensions and the size of the output height feature blocks in the height dimension. The first offset can be expressed by the following formula:
[0075] Where Mi_offset can represent the first offset, Mi can represent the current output height feature block, block_H_size can represent the size of the output height feature block Mi in the height dimension, out_D can represent the depth dimension of the output space, and out_W can represent the width dimension of the output space.
[0076] Then, based on the first offset and the set of block output points without repetition, the actual offset of the set of block output points without repetition (unique_out) within the feature space of the current output height feature block is determined. The actual offset (unique_out_no_bias) can be expressed by the following formula: unique_out_no_bias = unique_out - Mi_offset. The user-allocated space to be filled is obtained. This space is the same size as the space required to store the actual offset, with an initial fill value of 0. Then, the actual offset is filled into the space to be filled using enumerated filling units, generating the output point sequence index of the set of block output points without repetition.
[0077] The actual offset can be used as the destination index, the sequential index as the source index, the output point sequential index as the source data, and the empty dense space as the destination space. Index data is extracted from the source data (output point sequential index) sequentially according to the source index (sequential index). Then, the extracted index data is written into the destination space according to the destination index (actual offset), generating the dense space corresponding to the output height feature block. The sequential index can be 1, 2…n, where n is the same as the number of output point sequential indices.
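The offset and scatter steps can be sketched as below. The relation unique_out_no_bias = unique_out - Mi_offset comes from the text; the expression used for Mi_offset (block index × block height × out_D × out_W) is an assumption consistent with the HDW ordering, and writing the 1-based sequential index into the dense space is one interpretation of the description. The function name build_dense_space is hypothetical.

```python
def build_dense_space(unique_out, Mi, block_H_size, out_D, out_W):
    """Scatter 1-based sequence numbers of unique output points into a block."""
    # Assumption: blocks are laid out consecutively in HDW order.
    Mi_offset = Mi * block_H_size * out_D * out_W
    # Stated in the text: offset of each point inside the current block.
    unique_out_no_bias = [p - Mi_offset for p in unique_out]
    # Empty dense space for the block, initial fill value 0.
    dense = [0] * (block_H_size * out_D * out_W)
    for seq_idx, dst in enumerate(unique_out_no_bias, start=1):
        dense[dst] = seq_idx
    return Mi_offset, dense


Mi_offset, dense = build_dense_space([13, 17, 22], Mi=1, block_H_size=2,
                                     out_D=2, out_W=3)
```

Starting the sequential index at 1 keeps the value 0 free to mean "no output point at this position".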
[0078] Subsequently, the input point matrix position indices corresponding to the block input points of the output height feature block can be generated based on the dense space, the block output point sequence, and the block input sequence. Then, a valid input matrix is generated based on these input point matrix position indices. Finally, the valid input matrix can be multiplied by a preset parameter matrix to obtain the valid output features. For example, after obtaining the valid output features corresponding to each output height feature block, the valid output feature address can be calculated based on the number of elements corresponding to the non-repeating block output points of the current output height feature block. Then, the valid output features corresponding to that output height feature block are written to the corresponding valid output feature address. The valid output feature address can be represented by the following formula:
[0079] Where dst_addr_Mi can represent the valid output feature address corresponding to the output height feature block Mi, and dst_addr can represent the starting address of the valid output feature storage space allocated by the upper layer. The accumulated count in the formula can represent the total number of elements from the first output height feature block M0 to the output height feature block Mi-1 that precedes the current block, that is, the total number of valid output features that have already been saved. `out channels` can represent the number of output channels corresponding to the valid output features, and `data_byte` can represent the number of bytes occupied by each valid output feature. This total can be updated each time the valid output features corresponding to an output height feature block have been saved, by adding the number of elements of that block to the previous total. The above steps are repeated until all output height feature blocks have been processed, which yields the number of valid points generated from the 3D point cloud data after sparse convolution, as well as the coordinates and corresponding feature values of each point.
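The address bookkeeping can be sketched as a running total. The exact formula is not reproduced in the text, so the sketch assumes the address is the base address plus the number of already saved output points multiplied by the channel count and the element size; the function name block_output_addresses and the variable saved are hypothetical.

```python
def block_output_addresses(dst_addr, block_out_nums, out_channels, data_byte):
    """Compute a write address per block from the count of points saved so far."""
    addresses = []
    saved = 0  # total valid output points written by earlier blocks
    for Mi_out_nums in block_out_nums:
        addresses.append(dst_addr + saved * out_channels * data_byte)
        saved += Mi_out_nums  # update after the current block is stored
    return addresses, saved


addresses, total = block_output_addresses(0x1000, [3, 0, 5],
                                          out_channels=16, data_byte=2)
```

A block that produces no output points leaves the running total unchanged, so the next block writes to the same address.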
[0080] Through the above implementation process, 3D point cloud data can be segmented into blocks along the output height dimension. This decomposes the massive 3D point cloud input features into multiple smaller output height feature blocks, each corresponding to a smaller input feature block. During each computation, the memory only needs to retrieve the target point cloud features within the input height feature block derived from the currently processed output height feature block. This avoids storing all 3D point cloud input features at once, which would lead to excessive NPU data transmission latency and resource consumption, impacting target recognition efficiency. Furthermore, without relying on traditional hash tables or traversal search methods to generate a traditional rule book, convolution operations can be performed on each output height feature block to obtain the effective output features corresponding to the 3D point cloud data. This avoids the impact on target recognition efficiency caused by the large memory consumption of traditional rule books for 3D point cloud data processing.
[0081] In some embodiments of this application, determining the block output point sequence and the corresponding block input sequence of the output height feature block includes: obtaining the average size of the output height feature block; determining the boundary information of the output height feature block based on the average size of the output height feature block and the output space size; and determining the block output point sequence and the corresponding block input sequence of the output height feature block based on the boundary information, the valid output points, and the corresponding valid input sequence.
[0082] In this embodiment, the average size of the output height feature block can be obtained. The average size of the output height feature block can be determined based on the feature space size corresponding to the 3D point cloud output features and the number of blocks corresponding to the output height feature block. The boundary information of the output height feature block can be determined based on the average size and output space size. The boundary information reflects the extraction range of the output points corresponding to the output height feature block, including the left and right boundaries. The boundary information of the output height feature block can be expressed by the following formula:
[0083]
[0084] Where l can represent the left boundary of the output height feature block, r can represent the right boundary of the output height feature block, Mi can represent the i-th output height feature block currently being processed, average_block_size can represent the average size of the output height feature block, out_D can represent the depth size of the output space, and out_W can represent the width size of the output space.
[0085] Then, based on the boundary information, the valid output points, and the corresponding valid input sequence, the block output point sequence and the corresponding block input sequence of the output height feature block can be determined. For example, the valid output points can be used as input data and the corresponding valid input sequence can be used as an index, and both can be input to the point extraction unit shown in Figure 2. The point extraction unit can find the elements within the range of values (maximum and minimum) input by the user, as well as the corresponding valid input points, which can be represented as p0, p1…pn-1. The valid input points and the left boundary of the output height feature block are input into a comparator. The comparator finds valid input points greater than or equal to the left boundary and marks the corresponding values as 1; otherwise, it marks them as 0, resulting in a first set of flags for valid input points greater than or equal to the left boundary. Similarly, the valid input points and the right boundary of the output height feature block are input into a comparator. The comparator finds valid input points less than the right boundary and marks the corresponding values as 1; otherwise, it marks them as 0, resulting in a second set of flags for valid input points less than the right boundary. The first and second flag sets output by the comparator can then be input into an AND logic unit to perform an AND operation, obtaining the set of flags corresponding to the valid output points within the current boundary range of the output height feature block. The valid input points and the set of flags corresponding to the valid output points are input to the fixed-point multiplication unit to perform fixed-point multiplication. This retains the valid input points in the valid input point sequence that correspond to the output height feature block. 
The valid input points outside the output height feature block have a value of 0 after the multiplication. The fixed-point operation result obtained from the fixed-point multiplication and the valid input sequence used as an index are then input to the zero-point removal unit. This retains only the block valid output points in the current output height feature block and the corresponding block valid input sequence.
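The range extraction can be sketched as two comparisons followed by an AND. The boundary expressions below (l = Mi × average_block_size × out_D × out_W, r = (Mi + 1) × average_block_size × out_D × out_W) are assumptions consistent with the symbols defined for the boundary formula. As a simplification, the sketch selects entries directly by flag instead of multiplying and then removing zeros; the function name extract_block is hypothetical.

```python
def extract_block(valid_out, valid_in, Mi, average_block_size, out_D, out_W):
    """Keep the output points inside [l, r) together with their input entries."""
    # Assumed boundaries of block Mi in the linear output sequence.
    l = Mi * average_block_size * out_D * out_W
    r = (Mi + 1) * average_block_size * out_D * out_W
    ge_left = [int(p >= l) for p in valid_out]    # first flag set
    lt_right = [int(p < r) for p in valid_out]    # second flag set
    in_block = [a & b for a, b in zip(ge_left, lt_right)]
    block_out = [p for p, f in zip(valid_out, in_block) if f]
    block_in = [i for i, f in zip(valid_in, in_block) if f]
    return block_out, block_in


block_out, block_in = extract_block([1, 13, 17, 25], [100, 101, 102, 103],
                                    Mi=1, average_block_size=2, out_D=2, out_W=3)
```

The left boundary is inclusive and the right boundary exclusive, so adjacent blocks never claim the same output point.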
[0086] Through the above implementation process, the output height feature blocks obtained by segmentation can be used to divide the effective output points corresponding to the 3D point cloud input features into a block output point sequence and a corresponding block input sequence for each output height feature block. During each calculation, the memory only needs to use the features corresponding to the input feature points in the input height feature space region derived from the current output height feature block based on the convolution kernel size, padding size, and stride. This avoids storing all 3D point cloud input features at once, which would lead to NPU data transmission latency and excessive resource consumption, affecting the target recognition efficiency.
[0087] In some embodiments of this application, generating the input point matrix position index corresponding to the block input points of the output height feature block based on the dense space, the block output point sequence, and the block input sequence includes: generating a matrix row sequence corresponding to the valid output points based on the dense space and the block output point sequence; generating a matrix column sequence corresponding to the valid output points based on the block input sequence; and generating the input point matrix position index of the block input points of the output height feature block based on the matrix row sequence and the matrix column sequence corresponding to the valid output points.
[0088] In this embodiment, a matrix row sequence corresponding to valid output points can be generated based on the dense space and the block output point sequence. The matrix row sequence is the sequential position sequence of valid output points on the rows of the feature matrix used for matrix multiplication. The length of the matrix row sequence corresponding to each output height feature block is Mi_out_nums, which is the number of elements corresponding to the non-repeating block output points of each output height feature block. For example, data can be written to the dense space using an indexing unit in the NPU. The indexing unit uses the dense space as the source data, the block output point sequence as the source index, and the sequential index as the destination index. The source index indicates the relative position of data reading, the destination index indicates the relative position of data writing, and the source data indicates the data content to be read. Data is extracted from the dense space of the output height feature blocks according to the source index (block output point sequence), and the extracted data is arranged according to the destination index (sequential index) to obtain the matrix row sequence.
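The gather step can be sketched as a lookup into the dense space. The sketch assumes that the dense space stores a 1-based row number at the block-relative position of each unique output point and that the block offset is subtracted before the lookup; neither detail is spelled out in the text. The function name matrix_row_sequence and the parameter Mi_offset are used here for illustration.

```python
def matrix_row_sequence(dense, block_out, Mi_offset):
    """Look up the matrix row of every block output point in the dense space."""
    # Source data: dense; source index: block output points (block-relative);
    # destination index: sequential position in the returned list.
    return [dense[p - Mi_offset] for p in block_out]


dense = [0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0]
gemm_M_idx = matrix_row_sequence(dense, [13, 13, 17, 22, 22], Mi_offset=12)
```

Repeated output points map to the same row, which is how several input contributions end up accumulated into one output feature.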
[0089] A sequence of matrix columns corresponding to valid output points can be generated based on the block input sequence. This sequence of matrix columns can be represented by the following formula:
[0090] Where gemm_K_idx represents the matrix column sequence, cur_block_out_idx_list represents the block input sequence, and N represents the total number of data points corresponding to the 3D point cloud data. The length of the matrix column sequence is kernel_size[0]×kernel_size[1]×kernel_size[2], which is the product of the D, H, and W dimensions of the convolution kernel. Assuming it is a 3×3×3 convolution kernel, the length of its corresponding matrix column sequence is 27.
[0091] Based on the matrix row and column sequences corresponding to the valid output points, generate the input point matrix position index corresponding to the block input points of the input height feature region corresponding to the output height feature block. The input point matrix position index can be represented by the following formula:
[0092] Where gemm_MK_idx can represent the input point matrix position index, gemm_M_idx can represent the matrix row sequence, gemm_K_idx can represent the matrix column sequence, kd can represent the depth dimension of the convolution kernel size, kh can represent the height dimension of the convolution kernel size, and kw can represent the width dimension of the convolution kernel size.
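A plausible combination of the terms defined above is a row-major position in a matrix with kd × kh × kw columns. The sketch below assumes that form and assumes row numbers start at 1; the formula itself is not reproduced in the text, and the function name input_point_matrix_index is hypothetical.

```python
def input_point_matrix_index(gemm_M_idx, gemm_K_idx, kd, kh, kw):
    """Flatten (row, column) pairs into positions of an M x (kd*kh*kw) matrix."""
    K = kd * kh * kw  # number of kernel positions, i.e. matrix columns
    # Assumption: rows are 1-based, columns are 0-based kernel positions.
    return [(m - 1) * K + k for m, k in zip(gemm_M_idx, gemm_K_idx)]


gemm_MK_idx = input_point_matrix_index([1, 1, 2], [0, 13, 26], kd=3, kh=3, kw=3)
```

For a 3×3×3 kernel each output point owns 27 consecutive slots, one per kernel position.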
[0093] Through the above implementation process, the matrix row sequence, matrix column sequence, and input point matrix position index corresponding to the effective input points on the feature matrix to be operated on can be determined according to the output height feature block. During each calculation, the memory only needs to obtain the block output point sequence corresponding to the currently processed output height feature block and the corresponding effective input points within the corresponding input height feature region from the 3D point cloud input features. This avoids excessive data storage at once, which could lead to NPU data transmission latency and high resource consumption, affecting target recognition efficiency.
[0094] In some embodiments of this application, generating a valid input matrix based on the input point matrix position index includes: determining the true index of the output height feature block based on the block input sequence; determining the distribution area of the block input points in the output point coordinate set based on the true index; determining the actual index offset of the true index based on the distribution area; determining the block input start address of the output height feature block based on the distribution area; extracting target input features from the 3D point cloud data based on the distribution area and the block input start address; and generating a valid input matrix based on the input point matrix position index and the target input features.
[0095] In this embodiment, the true index of the input height feature region corresponding to the output height feature block can be determined based on the block input order sequence with weighted position information. The true index reflects the position of the 3D point cloud input features within the input height feature region corresponding to the output height feature block in the global input. The true index of the output height feature block can be expressed by the following formula:
[0096] Here, real_indice_idx represents the true index, cur_block_out_idx_list represents the block input order sequence with weighted position information, N represents the total number of data points in the 3D point cloud data, and % is the modulo operator.
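The formula is not reproduced in this text. From the symbols listed, it presumably reads real_indice_idx = cur_block_out_idx_list % N. The sketch below assumes that form. It additionally assumes that the weighted position information is encoded as tap × N + index, so that integer division by N recovers the kernel tap. That encoding, and the helper name `split_weighted_indices`, are illustrative assumptions rather than statements from the application.

```python
def split_weighted_indices(cur_block_out_idx_list, n_points):
    """Split weighted entries into a true index and a kernel tap.

    Assumption: each entry is encoded as tap * n_points + true_index.
    """
    real_indice_idx = [v % n_points for v in cur_block_out_idx_list]
    kernel_tap = [v // n_points for v in cur_block_out_idx_list]
    return real_indice_idx, kernel_tap


real, taps = split_weighted_indices([5, 113, 2607], 100)
print(real)  # [5, 13, 7]
print(taps)  # [0, 1, 26]
```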
[0097] The distribution region of the block input points within the output point coordinate set can be determined based on the true index. This distribution region can be described by the maximum distribution index (max_idx) and the minimum distribution index (min_idx). For example, the maximum distribution index can be determined using the fixed-point maximum extraction unit in the NPU, which finds and outputs the maximum value in a set of data. Similarly, the minimum distribution index can be determined using the fixed-point minimum extraction unit in the NPU, which finds and outputs the minimum value in a set of data. Based on the distribution region, the actual index offset of each true index within the input height feature region corresponding to the current output height feature block can be determined.
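As an illustration, the distribution region and the actual index offsets can be sketched as follows. The text does not give the offset formula; the sketch assumes the offset is the true index minus the minimum distribution index. Python's built-in `min` and `max` stand in for the NPU's fixed-point extraction units, and the function name is hypothetical.

```python
def distribution_region(real_indice_idx):
    """Return (min_idx, max_idx, offsets) for one output height feature block.

    Assumption: the actual index offset is the true index relative to min_idx.
    """
    min_idx = min(real_indice_idx)
    max_idx = max(real_indice_idx)
    offsets = [i - min_idx for i in real_indice_idx]
    return min_idx, max_idx, offsets


print(distribution_region([13, 5, 7]))  # (5, 13, [8, 0, 2])
```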
[0098] The block input start address of the output height feature block can also be determined based on the distribution area. The block input start address refers to the starting address of the input height feature region corresponding to the output height feature block. The input height feature region refers to the area where the input height feature block corresponding to the output height feature block is located. The block input start address reflects the starting position of the feature data required by the output height feature block in the 3D point cloud input features. The block input start address can be expressed by the following formula:
[0099] Here, `cur_src_addr` represents the starting address of the block input for the output height feature block, i.e., the starting address of the block input data; `src_addr` represents the starting address of the storage space allocated by the upper layer for the 3D point cloud input features; `min_idx` represents the minimum distribution index of the block input points of the output height feature block in the set of output point coordinates; `inchannels` represents the number of input channels for the 3D point cloud data; and `data_bytes` represents the bit width occupied by each data element of the 3D point cloud data.
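The formula is not reproduced in this text. Given the symbols listed, a natural reading is cur_src_addr = src_addr + min_idx × inchannels × data_bytes. The sketch below assumes that reading, and also assumes that data_bytes counts bytes per element and that features are stored point after point. The function name is hypothetical.

```python
def block_input_start_address(src_addr, min_idx, inchannels, data_bytes):
    """Byte address of the first feature needed by the current block.

    Assumption: features are stored contiguously, one point after another,
    each point occupying inchannels * data_bytes bytes.
    """
    return src_addr + min_idx * inchannels * data_bytes


print(hex(block_input_start_address(0x1000, 5, 16, 2)))  # 0x10a0
```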
[0100] After determining the block input start address, target input features can be extracted from the 3D point cloud data based on the distribution area and the block input start address. The amount of data to be loaded for the input height feature region corresponding to the output height feature block can be determined first based on the distribution area. The amount of data to be loaded for the input height feature region can be expressed by the following formula:
[0101] Here, `load_size` represents the amount of data to be loaded, `max_idx` represents the maximum distribution index of the block input points of the output height feature block in the output point coordinate set, `min_idx` represents the minimum distribution index of the block input points of the output height feature block in the output point coordinate set, `inchannels` represents the number of input channels for the 3D point cloud data, and `data_bytes` represents the bit width occupied by each data element of the 3D point cloud data. Then, starting from the block input start address, `load_size` of feature data can be extracted from the corresponding position in the 3D point cloud input features corresponding to the 3D point cloud data, thus obtaining the target input features.
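The load-size formula is not reproduced in this text. The sketch below assumes load_size = (max_idx − min_idx + 1) × inchannels × data_bytes, that is, an inclusive index range measured in bytes. The +1 is an assumption, as are the helper names. A `bytes` object stands in for the 3D point cloud input features.

```python
def load_size_bytes(max_idx, min_idx, inchannels, data_bytes):
    # Assumption: both indices are inclusive, hence the +1.
    return (max_idx - min_idx + 1) * inchannels * data_bytes


def load_target_input_features(features, min_idx, max_idx, inchannels, data_bytes):
    # `features` stands in for the input features laid out point after point;
    # the start offset is relative to its first byte.
    start = min_idx * inchannels * data_bytes
    size = load_size_bytes(max_idx, min_idx, inchannels, data_bytes)
    return features[start:start + size]


features = bytes(range(40))  # 10 points x 2 channels x 2 bytes
chunk = load_target_input_features(features, 5, 7, 2, 2)
print(len(chunk), chunk[0], chunk[-1])  # 12 20 31
```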
[0102] An initial matrix can be created first. The number of rows in the initial matrix corresponds to the number of data points corresponding to the target point cloud features, and the number of columns can be determined based on a preset convolutional kernel size. For example, if the preset convolutional kernel size is 3×3×3, the corresponding number of columns in the initial matrix can be set to 27. The convolutional kernel size can also be set according to actual needs; this application does not impose specific restrictions on this. The values in the initial matrix can be uniformly filled with 0. Then, the input point matrix position index can be used as the target index, the target input features as the source data, and the actual index offset as the source index. Feature values are extracted from the target input features according to the actual index offset, and the extracted feature values are filled into the initial matrix according to the input point matrix position index to generate a valid input matrix with effective input features. Through the above implementation process, an initial matrix can be created, and the target input features corresponding to the output height feature blocks can be filled into the initial matrix to obtain a valid input matrix, which facilitates efficient operation in subsequent blocks to obtain output features. During each calculation, the memory only needs to obtain the target input features corresponding to the current output height feature block from the 3D point cloud input features. This avoids storing too much data at once, which would lead to excessive NPU data transmission latency and resource consumption, thus affecting the target recognition efficiency.
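The fill step described above amounts to a scatter into a zero-filled matrix followed by a matrix multiplication. The following minimal sketch makes two simplifications: each point carries a single scalar feature, and the example uses three columns instead of 27 to keep it short. The function names and the weight values are hypothetical.

```python
def build_valid_input_matrix(num_rows, k_total, gemm_mk_idx, offsets, target_features):
    """Scatter target input features into a zero-filled num_rows x k_total matrix."""
    flat = [0.0] * (num_rows * k_total)
    for dst, src in zip(gemm_mk_idx, offsets):
        # dst: input point matrix position index; src: actual index offset.
        flat[dst] = target_features[src]
    return [flat[r * k_total:(r + 1) * k_total] for r in range(num_rows)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


matrix = build_valid_input_matrix(2, 3, [0, 1, 4], [0, 2, 1], [10.0, 20.0, 30.0])
print(matrix)  # [[10.0, 30.0, 0.0], [0.0, 20.0, 0.0]]
weights = [[1.0], [0.5], [2.0]]  # stand-in for the preset parameter matrix
print(matmul(matrix, weights))  # [[25.0], [10.0]]
```

Positions that receive no feature keep the value 0 and therefore contribute nothing to the product.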
[0103] It should be noted that the target recognition method provided in the embodiments of this application can be executed by a target recognition device, or by a control module in the target recognition device for executing the target recognition method. The embodiments of this application take a target recognition device executing the target recognition method as an example to describe the target recognition method provided herein.
[0104] Referring to Figure 4, a structural block diagram of an embodiment of a target recognition device according to this application is shown. The device may specifically include the following modules: a 3D point cloud data acquisition module 401, used to acquire 3D point cloud data; a 3D point cloud output feature determination module 402, used to determine the 3D point cloud output features corresponding to the 3D point cloud data according to the preset convolution kernel parameters; a first valid determination module 403, used to determine the valid output points of the 3D point cloud data and the corresponding valid input sequence based on the 3D point cloud output features; a second valid determination module 404, used to determine the valid output features corresponding to the 3D point cloud data based on the valid output points and the corresponding valid input sequence; and a target recognition module 405, used to recognize the target to be recognized based on the valid output features and obtain the target recognition result.
[0105] The convolution kernel parameters include kernel size, kernel padding parameters, and kernel stride. The 3D point cloud output feature determination module 402 includes: The 3D point cloud input feature generation submodule is used to generate corresponding 3D point cloud input features based on the 3D point cloud data. The 3D point cloud input features are obtained by mapping the point coordinates of the 3D point cloud data to a one-dimensional space, and include height input features, depth input features, and width input features. The first output dimension feature determination submodule is used to determine the first output dimension feature corresponding to the three-dimensional point cloud input feature based on the convolution kernel size and the convolution kernel padding parameters. The second output dimension feature determination submodule is used to determine the second output dimension feature corresponding to the three-dimensional point cloud input feature based on the first output dimension feature and the convolution kernel stride. The 3D point cloud output feature determination submodule is used to determine the 3D point cloud output features corresponding to the 3D point cloud data based on the second output dimension features.
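The two-stage size computation can be illustrated with the usual convolution size arithmetic. The sketch below assumes symmetric padding, a first stage of in + 2 × padding − kernel, and a second stage of first // stride + 1. These exact expressions are assumptions, since this section does not state them, and the function name is hypothetical.

```python
def output_dim(in_size, kernel, padding, stride):
    """Output size along one axis (depth, height, or width)."""
    first = in_size + 2 * padding - kernel  # stage 1: kernel size and padding
    second = first // stride + 1            # stage 2: stride
    return second


# Height, depth, and width are handled independently.
print([output_dim(s, kernel=3, padding=1, stride=2) for s in (41, 16, 16)])  # [21, 8, 8]
```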
[0106] The first valid determination module 403 includes: The effective output identifier sequence determination submodule is used to determine the effective output identifier sequence of the three-dimensional point cloud data based on the output characteristics of the three-dimensional point cloud. The effective input index determination submodule is used to determine the effective input index corresponding to the three-dimensional point cloud data based on the effective output identifier bit sequence and the convolution kernel parameters. The effective output point and corresponding effective input sequence determination submodule is used to determine the effective output points and corresponding effective input sequence of the three-dimensional point cloud data based on the effective input index.
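The reverse derivation can be sketched as a plain reference loop that favors clarity over the memory behavior the application targets. It assumes a cubic kernel, symmetric padding, a dense lookup grid for the input points, and entries encoded as tap × N + input index. All of these, and the function name, are illustrative assumptions.

```python
from itertools import product


def derive_valid_outputs(active_coords, in_dims, kernel, stride, padding):
    """Return (valid_output_coords, weighted_input_sequences)."""
    n = len(active_coords)
    in_d, in_h, in_w = in_dims
    # Dense lookup grid: input index at occupied cells, -1 elsewhere.
    grid = [-1] * (in_d * in_h * in_w)
    for i, (d, h, w) in enumerate(active_coords):
        grid[(d * in_h + h) * in_w + w] = i

    out_dims = [(s + 2 * padding - kernel) // stride + 1 for s in in_dims]
    taps = list(product(range(kernel), repeat=3))
    valid_outputs, sequences = [], []
    for out in product(*(range(s) for s in out_dims)):
        entries = []
        for tap, k in enumerate(taps):
            # Input cell that this kernel tap reads for this output cell.
            d, h, w = (o * stride - padding + kk for o, kk in zip(out, k))
            if 0 <= d < in_d and 0 <= h < in_h and 0 <= w < in_w:
                i = grid[(d * in_h + h) * in_w + w]
                if i >= 0:
                    entries.append(tap * n + i)
        if entries:  # at least one occupied input: the output point is valid
            valid_outputs.append(out)
            sequences.append(entries)
    return valid_outputs, sequences


outs, seqs = derive_valid_outputs([(0, 0, 0), (1, 1, 1)], (2, 2, 2), 3, 1, 1)
print(len(outs), seqs[0], seqs[-1])  # 8 [26, 53] [0, 27]
```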
[0107] The second valid determination module 404 includes: The output height feature block generation submodule is used to generate output height feature blocks corresponding to the three-dimensional point cloud data according to a preset number of blocks; The block output determination submodule is used to determine the block output point sequence and the corresponding block input sequence of the output height feature block based on the valid output points and the corresponding valid input sequence. The output space size determination submodule is used to determine the output space size corresponding to the three-dimensional point cloud data; The output point coordinate determination submodule is used to determine the output point coordinates of the output height feature block based on the block output point sequence. The output point sequence index determination submodule is used to determine the output point sequence index corresponding to the output point coordinates based on the output space size. The dense space generation submodule is used to generate the dense space corresponding to the output height feature block according to the sequential index of the output points. The block input determination submodule is used to generate the input point matrix position index corresponding to the block input point of the output height feature block based on the dense space, the block output point sequence, and the block input order sequence; The valid input matrix generation submodule is used to generate a valid input matrix based on the position index of the input point matrix. The matrix multiplication operation submodule is used to perform matrix multiplication operations between the effective input matrix and the preset parameter matrix to obtain effective output features.
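Two of the steps listed above, the output point sequential index and the dense space, can be sketched as follows. The sketch assumes a (depth, height, width) axis order with row-major linearization, and it models the dense space as a lookup table from sequential index to matrix row, with −1 marking empty cells. Both choices and both function names are assumptions made for illustration.

```python
def output_point_sequential_index(coords, out_h, out_w):
    """Linearize (d, h, w) output coordinates in row-major order."""
    return [(d * out_h + h) * out_w + w for d, h, w in coords]


def build_dense_space(seq_idx, space_size):
    """Map each sequential index to its row number; -1 marks empty cells."""
    dense = [-1] * space_size
    for row, idx in enumerate(seq_idx):
        dense[idx] = row
    return dense


seq = output_point_sequential_index([(0, 0, 0), (0, 1, 2), (1, 0, 0)], out_h=4, out_w=5)
print(seq)  # [0, 7, 20]
dense = build_dense_space(seq, 2 * 4 * 5)
print(dense[7], dense[20], dense[1])  # 1 2 -1
```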
[0108] The block output determination submodule is further configured to: Obtain the average size of the output height feature block; The boundary information of the output height feature block is determined based on the average size of the output height feature block and the output space size; The block output point sequence and the corresponding block input sequence of the output height feature block are determined based on the boundary information, the valid output points, and the corresponding valid input sequence.
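The boundary computation can be sketched by splitting the output height into a preset number of blocks. The sketch assumes the average size is the ceiling of the output height divided by the number of blocks, and that the last block is clipped to the output space size. The rounding rule and the function name are assumptions.

```python
def height_block_bounds(out_h, num_blocks):
    """Return half-open (start, end) height ranges, one per non-empty block."""
    avg = -(-out_h // num_blocks)  # ceiling division: average block size
    bounds = []
    for b in range(num_blocks):
        start = b * avg
        end = min(start + avg, out_h)  # clip the last block to the output size
        if start < end:
            bounds.append((start, end))
    return bounds


print(height_block_bounds(10, 4))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

A valid output point with height coordinate h then belongs to the block whose range satisfies start <= h < end.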
[0109] The block input determination submodule is further used for: generating a matrix row sequence corresponding to the valid output points based on the dense space and the block output point sequence; generating a matrix column sequence corresponding to the valid output points based on the block input order sequence; and generating the input point matrix position index of the block input points of the output height feature block based on the matrix row sequence and matrix column sequence corresponding to the valid output points.
[0110] The effective input matrix generation submodule is further used for: The true index of the output height feature block is determined based on the block input sequence. The distribution area of the block input point in the input point coordinate set is determined based on the actual index; The actual index offset of the real index is determined based on the distribution area; The block input start address of the output height feature block is determined based on the distribution area; Extract target input features from the 3D point cloud data based on the distribution area and the block input start address; A valid input matrix is generated based on the input point matrix position index and the target input features.
[0111] The target recognition device in this application embodiment can be a device, or a component, integrated circuit, or chip in a terminal. The device can be a mobile electronic device or a non-mobile electronic device. For example, mobile electronic devices can be mobile phones, tablets, laptops, in-vehicle electronic devices, wearable devices, ultra-mobile personal computers (UMPCs), netbooks, or personal digital assistants (PDAs), etc., while non-mobile electronic devices can be servers, network attached storage (NAS), personal computers (PCs), televisions (TVs), ATMs, or self-service machines, etc. This application embodiment does not impose specific limitations.
[0112] The target recognition device in this application embodiment can be a device with an operating system. This operating system can be Android, iOS, or another possible operating system; this application embodiment does not limit the operating system used.
[0113] The target recognition device provided in this application embodiment can implement the various processes of the method embodiments of Figures 1 to 3. To avoid repetition, they will not be described again here.
[0114] The target recognition device provided in this application can acquire three-dimensional point cloud data; then determine the three-dimensional point cloud output features corresponding to the three-dimensional point cloud data according to preset convolution kernel parameters; based on the three-dimensional point cloud output features, the effective output points of the three-dimensional point cloud data and the corresponding effective input sequence with weighted position information can be determined; based on the effective output points and the corresponding effective input sequence with weighted position information, the effective output features corresponding to the three-dimensional point cloud data are determined; then, the target to be recognized can be identified based on the effective output features to obtain the target recognition result. Through the above implementation process, the output features corresponding to the three-dimensional point cloud data are first determined by preset convolution kernel parameters, and then the effective output points in the three-dimensional point cloud data and their corresponding effective input sequence with weighted position information are determined in reverse. The effective output features corresponding to the three-dimensional point cloud data are determined based on the effective output points and their corresponding effective input sequence with weighted position information to identify the target to be recognized and obtain the target recognition result. This can reduce the processing of invalid data points in the three-dimensional point cloud data processing process and improve the efficiency of target recognition. 
Furthermore, the device does not rely on hash tables or traversal lookups to generate a conventional sparse convolution rule manual (rulebook), and thus avoids the loss of target recognition efficiency caused by the large amount of memory that such a rule manual consumes during 3D point cloud data processing.
[0115] Optionally, embodiments of this application also provide an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor. When the program or instructions are executed by the processor, they implement the various processes of the above-described target recognition method embodiments and achieve the same technical effects. To avoid repetition, they will not be described again here.
[0116] It should be noted that the electronic devices in the embodiments of this application include the mobile electronic devices and non-mobile electronic devices described above.
[0117] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the above-described target recognition method embodiments and achieve the same technical effect. To avoid repetition, they will not be described again here.
[0118] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.
[0119] This application embodiment also provides a chip, which includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement the various processes of the above-described target recognition method embodiments and can achieve the same technical effect. To avoid repetition, it will not be described again here.
[0120] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, system chip, chip system, or system-on-a-chip, etc.
[0121] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
[0122] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0123] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Guided by this application, those skilled in the art can devise many other forms without departing from the spirit of this application and the scope protected by the claims, and all of these forms fall within the protection scope of this application.
Claims
1. A target recognition method, characterized in that, The method includes: Acquire 3D point cloud data; The three-dimensional point cloud output features corresponding to the three-dimensional point cloud data are determined according to the preset convolution kernel parameters; The effective output points and corresponding effective input sequence of the three-dimensional point cloud data are determined based on the output characteristics of the three-dimensional point cloud. The effective output features corresponding to the three-dimensional point cloud data are determined based on the effective output points and the corresponding effective input sequence. The target to be identified is determined based on the effective output features, and the target identification result is obtained.
2. The method according to claim 1, characterized in that, The convolution kernel parameters include kernel size, kernel padding parameters, and kernel stride. Determining the 3D point cloud output features corresponding to the 3D point cloud data based on the preset kernel parameters includes: The corresponding three-dimensional point cloud input features are generated based on the three-dimensional point cloud data. The three-dimensional point cloud input features are obtained by mapping the point coordinates of the three-dimensional point cloud data to a one-dimensional space, and include height input features, depth input features and width input features. The first output dimension feature corresponding to the input feature of the 3D point cloud is determined based on the convolution kernel size and the convolution kernel filling parameters. The second output dimension feature corresponding to the input feature of the 3D point cloud is determined based on the first output dimension feature and the convolution kernel stride. The three-dimensional point cloud output features corresponding to the three-dimensional point cloud data are determined based on the second output dimension features.
3. The method according to claim 1, characterized in that, The step of determining the valid output points and corresponding valid input sequence of the 3D point cloud data based on the output features of the 3D point cloud includes: The effective output identifier bit sequence of the three-dimensional point cloud data is determined based on the output characteristics of the three-dimensional point cloud. The effective input index corresponding to the 3D point cloud data is determined based on the effective output identifier sequence and the convolution kernel parameters. The valid output points and corresponding valid input sequence of the 3D point cloud data are determined based on the valid input index.
4. The method according to claim 1, characterized in that, The step of determining the effective output features corresponding to the 3D point cloud data based on the effective output points and the corresponding effective input sequence includes: The output height feature blocks corresponding to the three-dimensional point cloud data are generated according to the preset number of blocks; The block output point sequence and the corresponding block input sequence of the output height feature block are determined based on the valid output points and the corresponding valid input sequence. Determine the output spatial dimensions corresponding to the three-dimensional point cloud data; The output point coordinates of the output height feature block are determined based on the block output point sequence. The output point sequence index corresponding to the output point coordinates is determined based on the output space size; A dense space corresponding to the output height feature block is generated based on the output point sequential index; Generate the input point matrix position index corresponding to the block input point of the output height feature block based on the dense space, the block output point sequence, and the block input sequence; Generate a valid input matrix based on the position index of the input point matrix; The effective output features are obtained by performing matrix multiplication between the effective input matrix and the preset parameter matrix.
5. The method according to claim 4, characterized in that, The process of determining the block output point sequence and the corresponding block input order sequence of the output height feature block includes: Obtain the average size of the output height feature block; The boundary information of the output height feature block is determined based on the average size of the output height feature block and the output space size; The block output point sequence and the corresponding block input sequence of the output height feature block are determined based on the boundary information, the valid output points, and the corresponding valid input sequence.
6. The method according to claim 4, characterized in that, The step of generating the input point matrix position index corresponding to the block input points of the output height feature block based on the dense space, the block output point sequence, and the block input order sequence includes: Generate a matrix row sequence corresponding to the valid output points based on the dense space and the block output point sequence; Generate a matrix column sequence corresponding to the valid output points based on the block input sequence; The input point matrix position index of the block input point of the output height feature block is generated based on the matrix row sequence and matrix column sequence corresponding to the valid output point.
7. The method according to claim 4, characterized in that, The step of generating a valid input matrix based on the position index of the input point matrix includes: The true index of the output height feature block is determined based on the block input sequence. The distribution area of the block input point in the output point coordinate set is determined based on the actual index; The actual index offset of the real index is determined based on the distribution area; The block input start address of the output height feature block is determined based on the distribution area; Extract target input features from the 3D point cloud data based on the distribution area and the block input start address; A valid input matrix is generated based on the input point matrix position index and the target input features.
8. A target recognition device, characterized in that, The device includes: The 3D point cloud data acquisition module is used to acquire 3D point cloud data. The 3D point cloud output feature determination module is used to determine the 3D point cloud output features corresponding to the 3D point cloud data based on preset convolution kernel parameters. The first valid determination module is used to determine the valid output points of the three-dimensional point cloud data and the corresponding valid input sequence based on the output features of the three-dimensional point cloud. The second effective determination module is used to determine the effective output features corresponding to the three-dimensional point cloud data based on the effective output points and the corresponding effective input sequence. The target recognition module is used to identify the target to be identified based on the effective output features, and obtain the target recognition result.
9. An electronic device, characterized in that, It includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the target recognition method as described in claims 1-7.
10. A readable storage medium, characterized in that, The readable storage medium stores a program or instructions that, when executed by a processor, implement the steps of the target recognition method as described in claims 1-7.