Image recognition method and device based on point cloud data, equipment and storage medium

By partitioning point cloud data into matrices, generating a grayscale image from them, and recognizing that image with a convolutional neural network model, the method addresses the low accuracy of target object recognition at long distances and in harsh environments and achieves higher recognition accuracy.

CN116704491B · Active · Publication Date: 2026-06-23 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2023-05-25
Publication Date
2026-06-23

Figure CN116704491B_ABST

Abstract

Embodiments of the present specification provide a point cloud data-based image recognition method, device and equipment and a storage medium, wherein the method comprises: obtaining point cloud data of a to-be-recognized object; dividing the point cloud data to obtain a plurality of matrices, and calculating an average value of distances between all elements in each matrix as a value of the matrix; generating a gray-scale image of the to-be-recognized object according to the value of each matrix; and inputting the gray-scale image of the to-be-recognized object into a convolutional neural network model to recognize the to-be-recognized object. Embodiments of the present specification can determine the specific morphology of the to-be-recognized object, and improve the accuracy of recognition.

Description

Technical Field

[0001] This specification relates to the field of image recognition, and in particular to an image recognition method, apparatus, device, and storage medium based on point cloud data.

Background

[0002] Point cloud data refers to a set of vectors in a three-dimensional coordinate system. Point cloud data of a target object can generally be acquired through vision devices or LiDAR. After acquiring the point cloud data, it is necessary to identify the target object based on the point cloud data in order to determine its specific shape.

[0003] In existing technologies, the accuracy of target object recognition drops under the influence of the scene environment, for example at long distances, in harsh environments, or when the target object is indistinct. Therefore, an image recognition method based on point cloud data that improves the accuracy of target object recognition is urgently needed.

Summary of the Invention

[0004] The purpose of the embodiments in this specification is to provide an image recognition method, apparatus, device, and storage medium based on point cloud data to improve the accuracy of target object recognition.

[0005] To achieve the above objectives, in one aspect, embodiments of this specification provide an image recognition method based on point cloud data, including:

[0006] Acquire point cloud data of the object to be identified;

[0007] The point cloud data is divided into several matrices, and the average distance between all elements in each matrix is calculated as the value of the matrix;

[0008] Based on the value of each of the matrices, a grayscale image of the object to be identified is generated;

[0009] The grayscale image of the object to be identified is input into a convolutional neural network model to identify the object.

[0010] Preferably, the step of dividing the point cloud data to obtain several matrices further includes:

[0011] The point cloud data is divided into several cells of a preset size, wherein each cell occupies a three-dimensional space and no two cells overlap.

[0012] The cell is divided along a certain direction to form a preset number of planar grids, wherein no two planar grids overlap and each planar grid contains at least one point of the point cloud data.

[0013] Based on the preset number of planar grids, the preset number of matrices is obtained, wherein each point in a planar grid corresponds to one element of the corresponding matrix.

[0014] Preferably, generating a grayscale image of the object to be identified based on the value of each of the matrices further includes:

[0015] The target resolution of the grayscale image is determined based on the preset number;

[0016] Based on the value of each matrix, a grayscale image of the object to be identified with a resolution matching the target resolution is generated.

[0017] Preferably, the method for determining the preset number of planar grids includes:

[0018] Obtain known point cloud data of known objects;

[0019] The known point cloud data is divided into several cells of a preset size;

[0020] The cell is divided into multiple segments along a certain direction to form multiple experimental grids, thereby obtaining multiple different experimental matrices, wherein the number of grids in the multiple experimental grids is different;

[0021] Based on the value of each matrix in each experimental matrix, a grayscale image of the known object corresponding to each experimental grid is generated;

[0022] The grayscale image of the known object is input into a convolutional neural network model to obtain the recognition result corresponding to each experimental grid.

[0023] Based on the recognition results, the recognition accuracy corresponding to each experimental grid is obtained;

[0024] The number of grids of the experimental grid with the highest recognition accuracy is used as the preset number of planar grids.

[0025] Preferably, the convolutional layers of the convolutional neural network model are discrete convolutional layers.

[0026] Preferably, the convolutional neural network model has two discrete convolutional layers, wherein the number of feature maps and the size of the convolutional kernels of the two discrete convolutional layers are different.

[0027] Preferably, each discrete convolutional layer of the convolutional neural network model is provided with a max pooling layer to prevent overfitting and a normalization layer for local normalization.

[0028] Preferably, the discrete convolutional layer is expressed by the following formula:

[0029]

[0030] Here, M_β^α is the input feature; x is the weight of the network connection; β is the layer index of the input feature network; k is the convolution kernel; γ is the layer index in the network; b is the bias term of each output feature map. For a given output map, each input feature map may be convolved with a different kernel. f(·) is the activation function used in the neural network.

[0031] Preferably, before dividing the point cloud data into several matrices and calculating the average distance between all elements in each matrix as the value of the matrix, the method further includes:

[0032] The point cloud data is denoised using the centroid method.

[0033] In another aspect, embodiments of this specification provide an image recognition device based on point cloud data, the device comprising:

[0034] The acquisition module is used to acquire point cloud data of the object to be identified;

[0035] The partitioning module is used to partition the point cloud data into several matrices and calculate the average distance between all elements in each matrix as the value of the matrix.

[0036] A generation module is used to generate a grayscale image of the object to be identified based on the value of each of the matrices;

[0037] The recognition module is used to input the grayscale image of the object to be recognized into the convolutional neural network model to recognize the object to be recognized.

[0038] In another aspect, embodiments of this specification also provide a computer device, including a memory, a processor, and a computer program stored in the memory, wherein the computer program, when executed by the processor, performs instructions of any of the methods described above.

[0039] In another aspect, embodiments of this specification also provide a computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor of a computer device to perform instructions for any of the methods described above.

[0040] In another aspect, embodiments of this specification also provide a computer program product, which, when run by the processor of a computer device, executes instructions for any of the methods described above.

[0041] As can be seen from the technical solutions provided in the embodiments of this specification above, the method of the embodiments of this specification can divide the point cloud data of the object to be identified into several matrices, calculate the average distance between all elements in each matrix as the value of the matrix, generate a grayscale image of the object to be identified based on the value of the matrix, identify the grayscale image through a convolutional neural network to determine the object to be identified, and then determine the specific shape of the object to be identified, thereby improving the accuracy of identification.

[0042] To make the above and other objects, features, and advantages of the embodiments of this specification clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of the Drawings

[0043] To more clearly illustrate the technical solutions in the embodiments of this specification or in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of this specification; those skilled in the art can obtain other drawings from these drawings without creative effort.

[0044] Figure 1 is a flowchart of an image recognition method based on point cloud data provided in an embodiment of this specification;

[0045] Figure 2 is a flowchart of the process of obtaining several matrices from point cloud data provided in an embodiment of this specification;

[0046] Figure 3 is a flowchart of the process of determining the preset number of planar grids provided in an embodiment of this specification;

[0047] Figure 4 shows the recognition rate versus the number of training iterations for the four experimental grids provided in an embodiment of this specification;

[0048] Figure 5 is a flowchart of the process of generating a grayscale image of the object to be identified based on the value of each matrix provided in an embodiment of this specification;

[0049] Figure 6 is a schematic diagram of the module structure of an image recognition device based on point cloud data provided in an embodiment of this specification;

[0050] Figure 7 is a schematic diagram of the structure of a computer device provided in an embodiment of this specification.

[0051] Explanation of symbols in the attached drawings:

[0052] 100. Acquisition module;

[0053] 200. Segmentation module;

[0054] 300. Generation module;

[0055] 400. Recognition module;

[0056] 702. Computer device;

[0057] 704. Processor;

[0058] 706. Memory;

[0059] 708. Drive mechanism;

[0060] 710. Input/output module;

[0061] 712. Input device;

[0062] 714. Output device;

[0063] 716. Presentation device;

[0064] 718. Graphical user interface;

[0065] 720. Network interface;

[0066] 722. Communication link;

[0067] 724. Communication bus.

Detailed Description

[0068] The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those skilled in the art based on the embodiments in this specification without creative effort fall within the protection scope of the embodiments of this specification.

[0069] Point cloud data refers to a set of vectors in a three-dimensional coordinate system. It is typically acquired using vision devices or LiDAR. After acquiring the point cloud data, identification is performed based on it to determine the specific shape of the target object. However, in existing technologies, the accuracy of target object identification decreases due to environmental factors such as long distances, harsh environments, and unclear target objects.

[0070] To address the above issues, this specification provides an image recognition method based on point cloud data. Figure 1 is a flowchart of an image recognition method based on point cloud data provided in an embodiment of this specification. Although this specification provides the operational steps of the method as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. In an actual system or device product, the methods shown in the embodiments or drawings may be executed sequentially or in parallel.

[0071] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings herein are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0072] Referring to Figure 1, an embodiment of this specification provides an image recognition method based on point cloud data, including:

[0073] S101: Acquire point cloud data of the object to be identified;

[0074] S102: Divide the point cloud data into several matrices, and calculate the average distance between all elements in each matrix as the value of the matrix;

[0075] S103: Generate a grayscale image of the object to be identified based on the value of each of the matrices;

[0076] S104: Input the grayscale image of the object to be identified into the convolutional neural network model to identify the object.

[0077] The object to be identified is an unknown object. Its point cloud data can be acquired with LiDAR or a vision device and is generally three-dimensional. To make recognition easier, the point cloud data of the object to be identified is divided into several matrices, and the image of the object is quantified by the values of these matrices. Each element of a matrix corresponds to one point of the point cloud. The average distance between all elements of a matrix is calculated as follows: for each element, find its nearest element in the matrix and compute the distance between the two; sum these distances over all elements; and divide the sum by the number of elements. The resulting average distance is used as the value of the matrix.
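The matrix-value calculation described above can be sketched as follows. This is a minimal illustration rather than the specification's implementation: the function name `matrix_value`, the use of Euclidean distance, and the choice to give a single-point matrix a value of 0.0 are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def matrix_value(points: Sequence[Point]) -> float:
    """Average nearest-neighbour distance of the points in one matrix.

    For each point, find its nearest other point and take that distance;
    sum the distances over all points and divide by the number of points.
    """
    n = len(points)
    if n == 0:
        raise ValueError("each matrix must contain at least one point")
    if n == 1:
        # Assumption: the specification does not cover a lone point, which
        # has no neighbour; it is given a value of 0.0 here.
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force O(n^2) nearest-neighbour search, adequate for small grids.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / n
```

For example, for the points (0, 0, 0), (1, 0, 0), and (3, 0, 0) the nearest-neighbour distances are 1, 1, and 2, so the matrix value is 4/3. For large matrices, a spatial index such as a k-d tree would replace the quadratic search.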

[0078] Each matrix has a corresponding value. After these values are converted into a grayscale image of the object to be identified, the object can be recognized by the convolutional neural network model, which is a trained neural network model.

[0079] In general, LiDAR can acquire point cloud data under adverse conditions such as nighttime and fog. Such point cloud data is sparse and incomplete, and extracting features from it manually requires heuristic methods and highly specialized knowledge, relying largely on personal experience. Convolutional neural network models, by contrast, can automatically extract features and classify data, and are tolerant of translation, scaling, and other distortions, which improves recognition accuracy.

[0080] In an embodiment of this specification, referring to Figure 2, the step of dividing the point cloud data to obtain several matrices further includes:

[0081] S201: Divide the point cloud data into several cells of a preset size, wherein each cell occupies a three-dimensional space and no two cells overlap.

[0082] S202: Divide each cell along a certain direction to form a preset number of planar grids, wherein no two planar grids overlap and each planar grid contains at least one point of the point cloud data.

[0083] S203: Based on the preset number of planar grids, obtain the preset number of matrices, wherein each point in a planar grid corresponds to one element of the corresponding matrix.

[0084] When the point cloud data is divided into several matrices, it is first divided into several cells of a preset size. Each cell occupies a three-dimensional space, and its preset size can be set as required, for example 32×32×32. Each point belongs to exactly one cell. The cells are then cut along a certain direction to form a preset number of planar grids, converting the three-dimensional cells into two-dimensional planar grids. Each point belongs to exactly one planar grid, and each planar grid corresponds to one matrix.
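A minimal sketch of this partitioning follows, under assumptions the specification does not fix: cells are axis-aligned cubes of edge `cell_size`, and "cutting along a certain direction" is read as projecting each cell's points along one coordinate axis and binning them into a `rows` × `cols` array of planar grids. Both function names are hypothetical. Grids that receive no points are simply absent from the result, consistent with the requirement that each planar grid contain at least one point.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def split_into_cells(
    points: Sequence[Point], cell_size: float
) -> Dict[Tuple[int, ...], List[Point]]:
    """Assign every point to exactly one non-overlapping cubic cell."""
    cells: Dict[Tuple[int, ...], List[Point]] = defaultdict(list)
    for p in points:
        # Integer cell index along x, y, z; floor keeps negative coords correct.
        key = tuple(math.floor(coord / cell_size) for coord in p)
        cells[key].append(p)
    return dict(cells)


def split_cell_into_grids(
    cell_points: Sequence[Point],
    cell_origin: Point,
    cell_size: float,
    rows: int,
    cols: int,
    axis: int = 2,
) -> Dict[Tuple[int, int], List[Point]]:
    """Cut one cell along `axis` into rows x cols non-overlapping planar grids."""
    u, v = [a for a in range(3) if a != axis]  # the two in-plane axes
    grids: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in cell_points:
        # Position inside the cell along each in-plane axis, in [0, cell_size).
        lu = p[u] - cell_origin[u]
        lv = p[v] - cell_origin[v]
        # Clamp so a point on the far boundary still lands in the last grid.
        r = min(int(lu / cell_size * rows), rows - 1)
        c = min(int(lv / cell_size * cols), cols - 1)
        grids[(r, c)].append(p)
    return dict(grids)
```

For a cell with key `(i, j, l)`, the origin passed to `split_cell_into_grids` would be `(i * cell_size, j * cell_size, l * cell_size)`. Each list of points in the returned dictionary then plays the role of one matrix.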

[0085] When the cells are divided into planar grids, the preset number of planar grids must also be determined, because too many or too few planar grids lower the final recognition accuracy. Therefore, to improve the accuracy of subsequent recognition, the preset number of planar grids is determined by the following method.

[0086] Specifically, referring to Figure 3, the method for determining the preset number of planar grids includes:

[0087] S301: Obtain known point cloud data of a known object;

[0088] S302: Divide the known point cloud data into several cells of a preset size;

[0089] S303: The cell is divided into multiple segments along a certain direction to form multiple experimental grids, so as to obtain multiple different experimental matrices, wherein the number of grids is different between the multiple experimental grids;

[0090] S304: Generate a grayscale image of the known object corresponding to each experimental grid based on the value of each matrix in each experimental matrix;

[0091] S305: Input the grayscale image of the known object into a convolutional neural network model to obtain the recognition result corresponding to each experimental grid;

[0092] S306: Based on the recognition results, obtain the recognition accuracy rates corresponding to various different experimental grids;

[0093] S307: Use the number of grids of the experimental grid with the highest recognition accuracy as the preset number of planar grids.

[0094] The known point cloud data of the known object is divided into several cells of the same preset size as used for the object to be identified. The cells are likewise segmented along a certain direction, which is the same as the segmentation direction used for the object to be identified. However, to find the number of planar grids that yields the highest subsequent recognition accuracy, multiple segmentations are performed. Each segmentation yields one experimental grid, so multiple experimental grids, corresponding to multiple experimental matrices, are obtained, and the number of grids differs among them.

[0095] For example, the first experimental grid has 216×280 grids, resulting in the first experimental matrix; the second experimental grid has 108×140 grids, resulting in the second experimental matrix; the third experimental grid has 54×70 grids, resulting in the third experimental matrix; and the fourth experimental grid has 27×35 grids, resulting in the fourth experimental matrix. Each experimental grid includes multiple grids, corresponding to multiple matrices in an experimental matrix. The average distance between all elements in each matrix is calculated as the matrix value. Based on the value of each matrix in each experimental matrix, a grayscale image of the known object corresponding to each experimental grid is generated.

[0096] The grayscale image of each known object is input into the same convolutional neural network model as used for the object to be identified, yielding the recognition result for each experimental grid. From these recognition results, the recognition accuracy of each experimental grid is obtained, calculated as follows:

[0097] accuracy=(TP+TN) / (TP+FP+TN+FN);

[0098] Wherein, accuracy is the accuracy rate, TP is the number of positive classes predicted as positive, TN is the number of negative classes predicted as negative, FP is the number of negative classes predicted as positive (i.e., false positives), and FN is the number of positive classes predicted as negative (i.e., false negatives).
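The accuracy formula translates directly into code. The sketch below is illustrative; the function name and argument order are assumptions.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """accuracy = (TP + TN) / (TP + FP + TN + FN)

    tp: positives predicted as positive, tn: negatives predicted as negative,
    fp: negatives predicted as positive, fn: positives predicted as negative.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total
```

For instance, `accuracy(tp=40, tn=45, fp=5, fn=10)` gives 85 / 100 = 0.85.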

[0099] In this way, the experimental grid with the highest recognition accuracy is determined, and its number of grids is used as the preset number of planar grids.
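Steps S301-S307 amount to a search over candidate grid shapes. The sketch below assumes a hypothetical `evaluate` callback that builds the known object's grayscale images for one grid shape, runs them through the CNN, and returns the resulting accuracy; the candidate shapes are the four experimental grids from paragraph [0095].

```python
from typing import Callable, Iterable, Tuple

GridShape = Tuple[int, int]

# The four experimental grid shapes listed in paragraph [0095].
CANDIDATES = [(216, 280), (108, 140), (54, 70), (27, 35)]


def select_grid_count(
    candidates: Iterable[GridShape],
    evaluate: Callable[[GridShape], float],
) -> GridShape:
    """Return the grid shape whose recognition accuracy is highest.

    `evaluate` is a hypothetical callback: given a grid shape, it builds the
    known objects' grayscale images at that shape, runs them through the CNN,
    and returns the accuracy (TP + TN) / (TP + FP + TN + FN).
    """
    # max() calls `evaluate` once per candidate and keeps the first best on ties.
    return max(candidates, key=evaluate)
```

Because each call to `evaluate` involves running the CNN on a full set of known objects, this search is the expensive part of tuning; the candidate list is kept short for that reason.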

[0100] Referring to Figure 4 and taking the four experimental grids above as an example, the fourth experimental grid, with 27×35 grids, has the highest recognition accuracy, so the preset number of planar grids is 27×35. Figure 4 also shows that at the start of training, the convolutional neural network (CNN) model is not yet fully trained and its accuracy is low. As the number of training iterations increases, the CNN parameters keep learning and the classification accuracy gradually rises. Finally, the classification accuracy fluctuates within a small range, indicating that the CNN model has converged and its accuracy has stabilized.

[0101] In an embodiment of this specification, referring to Figure 5, the step of generating a grayscale image of the object to be identified based on the value of each of the matrices further includes:

[0102] S401: Determine the target resolution of the grayscale image based on the preset number;

[0103] S402: Generate a grayscale image of the object to be identified with a resolution that matches the target resolution, based on the value of each matrix.

[0104] For example, if the preset number of planar grids is 27×35, the target resolution of the grayscale image is also 27×35, so that each planar grid corresponds to one pixel when the matrices are converted into the grayscale image.
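Steps S401-S402 can be sketched as follows, assuming one pixel per planar grid and min-max scaling of the matrix values to 8-bit gray levels (0-255). The specification does not state how matrix values map to gray levels, so the scaling, the function name `to_grayscale`, and leaving grids without a value at 0 are all assumptions.

```python
from typing import Dict, List, Tuple


def to_grayscale(
    values: Dict[Tuple[int, int], float], rows: int, cols: int
) -> List[List[int]]:
    """Build a rows x cols 8-bit grayscale image, one pixel per planar grid.

    Assumptions: matrix values are min-max scaled to 0..255, and grid
    positions with no matrix value are left at 0.
    """
    image = [[0] * cols for _ in range(rows)]
    if not values:
        return image
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    for (r, c), v in values.items():
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"grid {(r, c)} is outside the target resolution")
        # A constant image (span == 0) is mapped to black.
        image[r][c] = 0 if span == 0 else round(255 * (v - lo) / span)
    return image
```

With a preset number of 27×35 planar grids, the call would be `to_grayscale(values, 27, 35)`, giving an image whose resolution matches the target resolution.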

[0105] In the embodiments described herein, the convolutional layers of the convolutional neural network model are discrete convolutional layers.

[0106] Specifically, the convolutional neural network model includes two discrete convolutional layers that differ in the number of feature maps and in kernel size. For example, the two discrete convolutional layers may have 10 and 15 feature maps, respectively, and kernel sizes of 8 and 5, respectively. Each discrete convolutional layer is provided with a normalization layer for local normalization and a max-pooling layer to prevent overfitting. For example, the convolutional neural network model may include, in order, an input layer, a first discrete convolutional layer, a first normalization layer, a first max-pooling layer, a second discrete convolutional layer, a second normalization layer, a second max-pooling layer, a first fully connected layer, a Dropout layer, a second fully connected layer, and an output layer. The dropout rate of the Dropout layer is 0.5, and the two fully connected layers have 256 and 10 neurons, respectively. In general, the more parameters and the greater the depth of the convolutional layers, the stronger their feature-learning ability.

Convolutional neural networks (CNNs) are inspired by the neural mechanisms of the visual system and combine these structural principles with artificial neural networks, yielding artificial neural network systems with deep learning capabilities. A standard CNN model is a special type of multi-layer feedforward neural network with a deep structure, generally consisting of an input layer, convolutional layers, downsampling layers, fully connected layers, and an output layer, where the convolutional, downsampling, and fully connected layers are hidden layers.
In a CNN model, the input layer is typically a matrix that receives the original image. The convolutional layers extract image features and can reduce noise interference. Convolutional layers share local weights, a special structure that more closely resembles real biological neural networks and gives CNNs a distinctive advantage in image processing: compared with fully connected layers, weight sharing reduces the number of network parameters, speeds up training, and lowers network complexity, and multi-dimensional input signals (speech, images) can be fed in directly, avoiding the data rearrangement otherwise needed during feature extraction and classification. The downsampling layers reduce the amount of data to be processed by exploiting local correlation in the image, and the output layer maps the extracted features to the predicted labels.
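To see how a 27×35 grayscale image flows through the example architecture, the sketch below traces feature-map shapes layer by layer. It assumes stride-1 convolutions without padding, square kernels, and non-overlapping 2×2 max pooling, none of which the specification states; the normalization and Dropout layers do not change shapes and are omitted.

```python
from typing import List, Tuple

# Assumed layer stack: ("conv", feature_maps, kernel_size) or ("pool", pool_size).
LAYERS: List[tuple] = [
    ("conv", 10, 8),  # first discrete convolutional layer: 10 maps, kernel 8
    ("pool", 2),      # first max-pooling layer
    ("conv", 15, 5),  # second discrete convolutional layer: 15 maps, kernel 5
    ("pool", 2),      # second max-pooling layer
]


def trace_shapes(height: int, width: int, channels: int = 1) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) for the input and after each layer."""
    shapes = [(channels, height, width)]
    for layer in LAYERS:
        if layer[0] == "conv":
            _, maps, k = layer
            # 'Valid' convolution with stride 1: each side shrinks by k - 1.
            channels, height, width = maps, height - k + 1, width - k + 1
        else:
            _, p = layer
            # Non-overlapping p x p pooling; leftover rows/columns are dropped.
            height, width = height // p, width // p
        if height <= 0 or width <= 0:
            raise ValueError(f"input too small for layer {layer}")
        shapes.append((channels, height, width))
    return shapes
```

Under these assumptions, a 27×35 input becomes 10×20×28, then 10×10×14, then 15×6×10, and finally 15×3×5, so 225 features would feed the 256-neuron fully connected layer, followed by the 10-neuron layer.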

[0107] The weights and biases of a convolutional neural network are learned with the classic backpropagation (BP) algorithm, which adjusts the parameters to complete the learning task, so manual feature extraction is unnecessary.

[0108] The discrete convolutional layer is expressed by the following formula:

[0109]

[0110] Here, M_β^α is the input feature; γ denotes a selection of input features; x is the weight of the network connection, which is continuously updated during backpropagation; β is the layer index of the input feature network, indicating which layer the bias term belongs to; k is the convolution kernel; γ is the layer index in the network; b is the bias term of each output feature map. For a given output map, each input feature map may be convolved with a different kernel. f(·) is the activation function of the neural network; the most commonly used are the sigmoid function and the ReLU function. The sigmoid function is f(x) = 1 / (1 + e^(-x)), which maps (-∞, +∞) to (0, 1). The ReLU function is f(x) = max(0, x): the output equals the input when the input is greater than 0, and is 0 otherwise.
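Because the formula in [0109] is missing, the sketch below shows the conventional form of a convolutional layer that the variable definitions above describe: an output map is f(Σ_i conv(input_i, kernel_i) + b), summing the convolution of each selected input map with its own kernel and adding the map's bias. Whether the "discrete convolutional layer" of this specification matches this form exactly is an assumption, and the function names are hypothetical. Like most CNN libraries, the code computes a "valid" cross-correlation.

```python
import math
from typing import Callable, List, Sequence

Map = List[List[float]]


def conv2d_valid(x: Map, k: Map) -> Map:
    """'Valid' 2-D cross-correlation of one input map with one kernel."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    # Numerically stable form: avoids overflow in exp() for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def output_map(
    inputs: Sequence[Map],
    kernels: Sequence[Map],
    bias: float,
    f: Callable[[float], float] = relu,
) -> Map:
    """One output feature map: f(sum_i conv(input_i, kernel_i) + bias)."""
    if not inputs or len(inputs) != len(kernels):
        raise ValueError("need one kernel per selected input map")
    acc = conv2d_valid(inputs[0], kernels[0])
    for x, k in zip(inputs[1:], kernels[1:]):
        part = conv2d_valid(x, k)
        acc = [[a + p for a, p in zip(ra, rp)] for ra, rp in zip(acc, part)]
    return [[f(v + bias) for v in row] for row in acc]
```

During training, the kernel weights and the bias would be the parameters updated by backpropagation.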

[0111] In this embodiment of the specification, before dividing the point cloud data into several matrices and calculating the average distance between all elements in each matrix as the value of the matrix, the method further includes:

[0112] The point cloud data is denoised using the centroid method. Because of external factors, the pixel count of the device, the acquisition distance, corner positions, and the like, some of the point cloud data may be unusable and must be removed to ensure the accuracy of subsequent object recognition. Besides the centroid method, other denoising methods may also be used; they are not elaborated in the embodiments of this specification.
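The specification does not define the centroid method. The sketch below assumes one plausible reading: compute the centroid of the point cloud and discard points whose distance from it exceeds the mean distance by more than `k` standard deviations. The function name `centroid_denoise` and this threshold rule are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid_denoise(points: Sequence[Point], k: float = 2.0) -> List[Point]:
    """Drop points unusually far from the centroid (assumed reading of the method)."""
    n = len(points)
    if n < 2:
        return list(points)
    # Centroid: the mean of the x, y and z coordinates.
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    dists = [math.dist(p, centroid) for p in points]
    # Keep points within mean + k * (population) standard deviation.
    limit = statistics.fmean(dists) + k * statistics.pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= limit]
```

A smaller `k` removes more points; in practice the threshold would be tuned to the sensor and acquisition distance.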

[0113] The method described in this specification can divide the point cloud data of the object to be identified into several matrices. The average distance between all elements in each matrix is calculated as the value of the matrix. A grayscale image of the object to be identified is then generated based on the matrix values. The grayscale image is then recognized by a convolutional neural network to determine the object to be identified, thereby determining the specific shape of the object and improving the accuracy of the identification.

[0114] Based on the image recognition method based on point cloud data described above, this specification also provides an image recognition device based on point cloud data. The device may include a system (including a distributed system), software (an application), a module, a component, a server, a client, etc., that uses the method described in this specification together with the necessary hardware. Based on the same inventive concept, the devices in one or more embodiments of this specification are as described in the following embodiments. Because the device solves the problem in a way similar to the method, its implementation can refer to the implementation of the aforementioned method, and repeated details are omitted. As used below, the term "unit" or "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0115] Specifically, Figure 6 is a schematic diagram of the module structure of an image recognition device based on point cloud data according to an embodiment of this specification. As shown in Figure 6, the image recognition device based on point cloud data includes: an acquisition module 100, a segmentation module 200, a generation module 300, and a recognition module 400.

[0116] The acquisition module 100 is used to acquire point cloud data of the object to be identified;

[0117] The segmentation module 200 is used to segment the point cloud data to obtain several matrices, and calculate the average distance between all elements in each matrix as the value of the matrix;

[0118] The generation module 300 is used to generate a grayscale image of the object to be identified based on the value of each of the matrices;

[0119] The recognition module 400 is used to input the grayscale image of the object to be recognized into a convolutional neural network model to recognize the object to be recognized.
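The four-module device can be sketched as a small pipeline in which each module is a pluggable callable, so each stage could be filled by any implementation of the corresponding method step. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PointCloudRecognitionDevice:
    """Hypothetical wiring of modules 100-400 as pluggable callables."""

    acquire: Callable[[], Any]       # acquisition module 100: point cloud data
    segment: Callable[[Any], Any]    # segmentation module 200: matrices and values
    generate: Callable[[Any], Any]   # generation module 300: grayscale image
    recognize: Callable[[Any], Any]  # recognition module 400: CNN recognition result

    def run(self) -> Any:
        # Data flows through the modules in the order S101 -> S104.
        points = self.acquire()
        matrix_values = self.segment(points)
        image = self.generate(matrix_values)
        return self.recognize(image)
```

Keeping the modules as separate callables mirrors the statement that each module may be realized in software, hardware, or a combination of the two.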

[0120] As shown in Figure 7, based on the image recognition method based on point cloud data described above, an embodiment of this specification further provides a computer device 702 on which the above method runs. The computer device 702 may include one or more processors 704, such as one or more central processing units (CPUs) or graphics processing units (GPUs), each of which may implement one or more hardware threads. The computer device 702 may also include any memory 706 for storing any kind of information, such as code, settings, and data. In one specific embodiment, a computer program that can run on the processor 704 is stored in the memory 706; when run by the processor 704, the computer program executes instructions according to the above method. By way of non-limiting example, the memory 706 may include any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, and so on. More generally, any memory may use any technology to store information, may provide volatile or non-volatile retention of information, and may be a fixed or removable component of the computer device 702. In one scenario, when the processor 704 executes associated instructions stored in any memory or combination of memories, the computer device 702 can perform any operation of those instructions. The computer device 702 also includes one or more drive mechanisms 708 for interacting with any memory, such as hard disk drive mechanisms and optical disk drive mechanisms.

[0121] The computer device 702 may also include an input/output (I/O) module 710 for receiving various inputs (via an input device 712) and providing various outputs (via an output device 714). One specific output mechanism may include a presentation device 716 and an associated graphical user interface (GUI) 718. In other embodiments, the I/O module 710, the input device 712, and the output device 714 may be omitted, and the device may function solely as a computer device within a network. The computer device 702 may also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722. One or more communication buses 724 couple the components described above together.

[0122] The communication link 722 can be implemented in any way, such as via a local area network, a wide area network (e.g., the Internet), a point-to-point connection, or any combination thereof. The communication link 722 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, and the like, governed by any protocol or combination of protocols.

[0123] Corresponding to the methods shown in Figures 1-3 and Figure 5, embodiments of this specification also provide a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of the methods described above.

[0124] This specification also provides an embodiment of a computer program product that, when executed by the processor of a computer device, performs the method shown in Figures 1-3 and Figure 5.

[0125] This specification also provides computer-readable instructions that, when executed by a processor, cause the processor to perform the method shown in Figures 1-3 and Figure 5.

[0126] It should be understood that in the various embodiments of this specification, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this specification.

[0127] It should also be understood that, in the embodiments of this specification, the term "and/or" merely describes an association between related objects and indicates that three relationships may exist. For example, "A and/or B" can mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, in the embodiments of this specification, the character "/" generally indicates an "or" relationship between the preceding and following objects.

[0128] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed in this specification can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments in this specification.

[0129] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated here.

[0130] In the embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or other forms of connection.

[0131] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments described in this specification, depending on actual needs.

[0132] Furthermore, the functional units in the various embodiments of this specification can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0133] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this specification, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this specification. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0134] This specification uses specific embodiments to illustrate the principles and implementation methods of the embodiments. The above description of the embodiments is only for the purpose of helping to understand the methods and core ideas of the embodiments in this specification. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the embodiments in this specification. Therefore, the content of this specification should not be construed as a limitation on the embodiments in this specification.

Claims

1. An image recognition method based on point cloud data, characterized by comprising: acquiring point cloud data of an object to be identified; dividing the point cloud data to obtain several matrices, and calculating the average distance between all elements in each matrix as the value of that matrix; generating a grayscale image of the object to be identified based on the value of each of the matrices; and inputting the grayscale image of the object to be identified into a convolutional neural network model to identify the object; wherein the step of dividing the point cloud data to obtain several matrices further includes: dividing the point cloud data into several cells of a preset size, wherein each cell occupies a three-dimensional space and no two cells overlap; dividing each cell along a certain direction to form a preset number of planar grids, wherein no two planar grids overlap and each planar grid contains at least one point of the point cloud data; and obtaining the preset number of matrices based on the preset number of planar grids, wherein each point in a planar grid corresponds to an element of the corresponding matrix; and wherein the preset number of planar grids is determined by: acquiring known point cloud data of a known object; dividing the known point cloud data into several cells of the preset size; dividing each cell into multiple segments along a certain direction to form multiple experimental grids, thereby obtaining multiple different experimental matrices, wherein the experimental grids differ in their number of grids; generating, based on the value of each matrix among the experimental matrices, a grayscale image of the known object corresponding to each experimental grid; inputting the grayscale images of the known object into the convolutional neural network model to obtain a recognition result for each experimental grid; obtaining, based on the recognition results, the recognition accuracy corresponding to each experimental grid; and using the grid number of the experimental grid with the highest recognition accuracy as the preset number of planar grids.
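Claim 1 combines two mechanisms: slicing each cell into a preset number of non-overlapping planar grids, and choosing that number by sweeping candidates on known objects and keeping the most accurate one. The sketch below shows both under stated assumptions. The slicing axis is taken to be z (the claim says only "a certain direction"), and `evaluate_accuracy` is a hypothetical callback standing in for the whole experimental pipeline (slice, build grayscale images, run the CNN, score the results), which the claim does not specify in detail.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def split_cell(points: Sequence[Point], z_min: float, z_max: float, n: int) -> List[List[Point]]:
    """Cut one cell into n equal, non-overlapping slabs along z (the slicing axis is assumed)."""
    height = (z_max - z_min) / n
    slabs: List[List[Point]] = [[] for _ in range(n)]
    for p in points:
        idx = int((p[2] - z_min) / height)
        slabs[min(max(idx, 0), n - 1)].append(p)  # a point on the top face goes into the last slab
    return slabs


def choose_grid_count(
    candidates: Iterable[int], evaluate_accuracy: Callable[[int], float]
) -> Tuple[int, Dict[int, float]]:
    """Score every candidate grid count on known objects and keep the most accurate one.

    evaluate_accuracy(n) is a hypothetical callback: split the known cells into n planar
    grids, build the grayscale images, run the CNN, and return the recognition accuracy.
    """
    scores = {n: evaluate_accuracy(n) for n in candidates}
    best = max(scores, key=scores.get)  # ties resolve to the first candidate listed
    return best, scores
```

`split_cell` does not reject empty slabs; because the claim requires every planar grid to contain at least one point, a caller would discard candidate counts that produce empty slabs before scoring them.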

2. The image recognition method based on point cloud data according to claim 1, characterized in that the step of generating a grayscale image of the object to be identified based on the value of each of the matrices further includes: determining a target resolution of the grayscale image based on the preset number; and generating, based on the value of each matrix, a grayscale image of the object to be identified whose resolution matches the target resolution.
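Claim 2 ties the image resolution to the preset number of planar grids but does not say how matrix values become pixel intensities. A minimal sketch, assuming one image row per cell and one column per planar grid (so the width equals the preset number), with linear min-max scaling to 8-bit gray levels; `to_grayscale` is a hypothetical name and both choices are assumptions.

```python
from typing import List, Sequence


def to_grayscale(values: Sequence[float], preset_number: int) -> List[List[int]]:
    """Map matrix values to an 8-bit image whose width equals the preset number (assumed layout)."""
    if preset_number <= 0 or not values or len(values) % preset_number:
        raise ValueError("number of matrix values must be a positive multiple of the preset number")
    lo, hi = min(values), max(values)
    span = hi - lo
    # Linear min-max scaling to 0..255; a constant input maps to black.
    pixels = [0 if span == 0 else round(255 * (v - lo) / span) for v in values]
    # One row per cell, one column per planar grid.
    return [pixels[i:i + preset_number] for i in range(0, len(pixels), preset_number)]
```

For example, `to_grayscale([0.0, 2.0, 10.0, 10.0], 2)` yields a 2 x 2 image `[[0, 51], [255, 255]]`.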

3. The image recognition method based on point cloud data according to claim 1, characterized in that the convolutional layers of the convolutional neural network model are discrete convolutional layers.

4. The image recognition method based on point cloud data according to claim 3, characterized in that the convolutional neural network model has two discrete convolutional layers, and the two discrete convolutional layers differ in the number of feature maps and in convolution kernel size.

5. The image recognition method based on point cloud data according to claim 4, characterized in that each discrete convolutional layer of the convolutional neural network model is provided with a max pooling layer for preventing overfitting and a normalization layer for local normalization.
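Claims 3 to 5 describe the building blocks: a discrete convolution, max pooling, and a local normalization layer. The pure-Python sketch below shows one common form of each. Several details are assumptions, since the claims leave them open: ReLU as the activation, every output map summing over all input maps, and AlexNet-style cross-channel local response normalization with its usual constants.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]


def conv2d(inputs: Sequence[FeatureMap], kernels: Sequence[Sequence[FeatureMap]],
           biases: Sequence[float]) -> List[FeatureMap]:
    """'Valid' discrete convolution followed by ReLU.

    kernels[j][i] links input map i to output map j. As in most CNN libraries, the
    kernel is not flipped (strictly, this is cross-correlation).
    """
    outputs = []
    for j, bias in enumerate(biases):
        kh, kw = len(kernels[j][0]), len(kernels[j][0][0])
        h, w = len(inputs[0]) - kh + 1, len(inputs[0][0]) - kw + 1
        fmap = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                s = bias
                for i, x in enumerate(inputs):  # sum over every input map
                    k = kernels[j][i]
                    s += sum(x[r + u][c + v] * k[u][v] for u in range(kh) for v in range(kw))
                fmap[r][c] = max(0.0, s)  # ReLU: an assumed choice of activation f
        outputs.append(fmap)
    return outputs


def max_pool(fmap: FeatureMap, size: int = 2) -> FeatureMap:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[r + u][c + v] for u in range(size) for v in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def local_response_norm(maps: Sequence[FeatureMap], n: int = 5, k: float = 2.0,
                        alpha: float = 1e-4, beta: float = 0.75) -> List[FeatureMap]:
    """Cross-channel local normalization with AlexNet-style default constants."""
    out = []
    for ch in range(len(maps)):
        lo, hi = max(0, ch - n // 2), min(len(maps), ch + n // 2 + 1)
        out.append([[maps[ch][r][c] / (k + alpha * sum(maps[m][r][c] ** 2 for m in range(lo, hi))) ** beta
                     for c in range(len(maps[ch][0]))]
                    for r in range(len(maps[ch]))])
    return out
```

A two-layer stack in the spirit of claim 4 would call `conv2d` twice, giving the two layers different numbers of output maps and different kernel sizes, and would follow each with `max_pool` and `local_response_norm`. The claims do not fix the order of pooling and normalization.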

6. The image recognition method based on point cloud data according to claim 3, characterized in that the discrete convolutional layer is expressed by the following formula: ; where α is the input feature; x denotes a selection of input features; β is the weight of the network connection; k is the number of layers in the input feature network; and k is the convolution kernel. is the number of layers in the network; b is the bias term of each output feature map, and for a specific output map, the input feature maps may be convolved with different kernels; f(·) is the activation function used in the neural network.
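The formula in claim 6 did not survive in this text, and parts of its symbol definitions are garbled. What remains (a selection of input maps, connection weights, a per-output-map bias, distinct kernels per output map, and an activation f) closely matches the conventional way a convolutional layer is written in CNN tutorials. That conventional form is reproduced below in standard notation for reference only. Mapping the claim's α, β and x onto these symbols is an assumption; this is not the claim's own expression.

$$
x_j^{l} = f\!\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right)
$$

Here $x_i^{l-1}$ is input map $i$ from layer $l-1$, $M_j$ is the selection of input maps feeding output map $j$, $k_{ij}^{l}$ is the kernel linking input map $i$ to output map $j$, $*$ denotes discrete convolution, $b_j^{l}$ is the bias of output map $j$, and $f(\cdot)$ is the activation function.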

7. The image recognition method based on point cloud data according to claim 1, characterized in that, before dividing the point cloud data into several matrices and calculating the average distance between all elements in each matrix as the value of the matrix, the method further includes: denoising the point cloud data using the centroid method.
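Claim 7 names a "centroid method" for denoising without defining it. One plausible reading is shown below as a sketch, not as the patent's procedure: compute the centroid of the cloud, then drop points whose distance to the centroid exceeds the mean distance by more than `k` population standard deviations. `centroid_denoise` and the threshold rule are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid_denoise(points: Sequence[Point], k: float = 2.0) -> List[Point]:
    """Assumed rule: keep points within mean + k * std of the distance to the centroid."""
    if not points:
        return []
    n = len(points)
    centroid = (sum(p[0] for p in points) / n,
                sum(p[1] for p in points) / n,
                sum(p[2] for p in points) / n)
    dists = [math.dist(p, centroid) for p in points]  # math.dist requires Python 3.8+
    threshold = mean(dists) + k * pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= threshold]
```

By Samuelson's inequality, no value in a set of n values can lie more than sqrt(n - 1) population standard deviations from their mean. `k` must therefore be below sqrt(n - 1) for this filter to remove anything; with very small clouds, a fixed `k` may remove nothing.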

8. An image recognition device based on point cloud data, characterized in that the device includes: an acquisition module for acquiring point cloud data of an object to be identified; a partitioning module for dividing the point cloud data into several matrices and calculating the average distance between all elements in each matrix as the value of that matrix; a generation module for generating a grayscale image of the object to be identified based on the value of each of the matrices; and a recognition module for inputting the grayscale image of the object to be identified into a convolutional neural network model to identify the object; wherein dividing the point cloud data to obtain several matrices further includes: dividing the point cloud data into several cells of a preset size, wherein each cell occupies a three-dimensional space and no two cells overlap; dividing each cell along a certain direction to form a preset number of planar grids, wherein no two planar grids overlap and each planar grid contains at least one point of the point cloud data; and obtaining the preset number of matrices based on the preset number of planar grids, wherein each point in a planar grid corresponds to an element of the corresponding matrix; and wherein the preset number of planar grids is determined by: acquiring known point cloud data of a known object; dividing the known point cloud data into several cells of the preset size; dividing each cell into multiple segments along a certain direction to form multiple experimental grids, thereby obtaining multiple different experimental matrices, wherein the experimental grids differ in their number of grids; generating, based on the value of each matrix among the experimental matrices, a grayscale image of the known object corresponding to each experimental grid; inputting the grayscale images of the known object into the convolutional neural network model to obtain a recognition result for each experimental grid; obtaining, based on the recognition results, the recognition accuracy corresponding to each experimental grid; and using the grid number of the experimental grid with the highest recognition accuracy as the preset number of planar grids.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that, when the computer program is run by the processor, it executes the instructions of the method according to any one of claims 1-7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is run by the processor of a computer device, it executes the instructions of the method according to any one of claims 1-7.

11. A computer program product, characterized in that, when the computer program product is run by the processor of a computer device, it executes the instructions of the method according to any one of claims 1-7.