A material grabbing sequence determination method, device, equipment and storage medium

Template point clouds and their depth maps are generated for the material stack. A matching window is slid through the material scene point cloud to produce multi-layer depth maps, which are convolved with the template kernels. This solves the problem of inaccurate height positioning of stacked materials in existing technology and achieves high-precision global height recognition and grasping-order determination under different lighting conditions.

CN122244166A · Pending · Publication Date: 2026-06-19 · DEXFORCE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
DEXFORCE TECH CO LTD
Filing Date
2026-05-21
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, height detection and positioning methods for stacked materials have difficulty accurately distinguishing the actual height of a single material within a multi-layer stack, especially under poor lighting or when the material has no texture. This makes it difficult for automated gripping devices to determine a reasonable gripping sequence.

Method used

The material scene point cloud of the material stack is acquired and instance-segmented. Standardized convolution kernels are generated from template point clouds, and multi-layer depth maps are generated by sliding a matching window determined from the template point cloud density. Convolving these depth maps with the kernels yields the global height position of each material, from which the grabbing order is determined.

Benefits of technology

Under different lighting conditions, the boundaries of stacked materials can be accurately distinguished and high-precision global height positions recognized. Because the method avoids complex image feature matching, point cloud computation is simpler and more efficient, and the problem that existing technologies cannot accurately locate the height of stacked materials is solved.

Generated by Eureka AI based on patent content.

Figure CN122244166A_ABST

Abstract

This invention discloses a method, apparatus, device, and storage medium for determining the material grasping order, relating to the field of point cloud processing technology. The method includes: acquiring a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple stacked materials; performing instance segmentation on the material scene point cloud to obtain an instance point cloud for each material; acquiring a template point cloud corresponding to the material stack, and generating a standardized convolution kernel based on the depth map corresponding to the template point cloud; determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and generating multi-layer depth maps by sliding the matching window through the material scene point cloud; convolving each depth map with the standardized convolution kernel of the template point cloud to obtain the global height position of each material; and determining the material grasping order based on the global height position of each material. The method can accurately obtain the global height position of each material.

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of point cloud processing technology, and in particular to a method, apparatus, device and storage medium for determining the material grasping order.

Background Technology

[0002] In industrial settings, materials are often stacked in multiple layers to form stacks. The materials vary depending on the industry, ranging from cardboard boxes and sacks to industrial parts and cylinders. Traditionally, sorting is done manually or by manually controlled machines. With the development of artificial intelligence, automated gripping devices have emerged that can automatically grasp materials from stacks. However, these automated gripping devices require precise measurement of the height of each stack to ensure a proper gripping sequence and prevent stack collapse.

[0003] However, existing methods for detecting and locating the height of stacked materials typically rely on recognizing image contours, which only roughly detects the overall contour of the material stack. Furthermore, contour recognition depends on material texture and lighting conditions. When multiple materials are stacked so that their edge contours are difficult to define, or when materials are solid-colored, reflective, textureless, or poorly lit, existing solutions struggle to distinguish the actual height of individual materials in a multi-layer stack. Accurately locating the height of stacked materials therefore remains an urgent problem in the existing technology.

Summary of the Invention

[0004] This invention provides a method, apparatus, device, and storage medium for determining the material grabbing order, in order to solve the problem of inaccurate positioning of the height of stacked materials in the prior art.

[0005] According to one aspect of the present invention, a method for determining the material grasping order is provided, the method comprising: obtaining a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple materials stacked together; performing instance segmentation on the material scene point cloud to obtain an instance point cloud corresponding to each material; obtaining a template point cloud corresponding to the material stack, and generating a standardized convolution kernel based on the depth map corresponding to the template point cloud; determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and generating multi-layer depth maps by sliding the matching window through the material scene point cloud; convolving each depth map generated by sliding with the standardized convolution kernel of the template point cloud to obtain the global height position of each material; and determining the order in which the materials are grasped based on the global height position of each material.

[0006] According to another aspect of the present invention, a material grasping sequence determination device is provided, the device comprising: an acquisition module configured to acquire the material scene point cloud corresponding to a material stack, wherein the material stack includes multiple materials stacked together; a segmentation module configured to perform instance segmentation on the material scene point cloud to obtain the instance point cloud corresponding to each material; a generation module configured to obtain the template point cloud corresponding to the material stack and to generate a standardized convolution kernel based on the depth map corresponding to the template point cloud; a sliding module configured to determine a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and to generate multi-layer depth maps by sliding the matching window through the material scene point cloud; a convolution module configured to convolve each depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material; and a determination module configured to determine the grabbing order of the materials based on the global height position of each material.

[0007] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the material grabbing order determination method according to any embodiment of the present invention.

[0008] According to another aspect of the present invention, a computer-readable storage medium is provided, storing computer instructions that, when executed by a processor, cause the processor to implement the material grabbing order determination method according to any embodiment of the present invention.

[0009] This invention discloses a method, apparatus, device, and storage medium for determining the material grasping order. The method includes: acquiring a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple stacked materials; performing instance segmentation on the material scene point cloud to obtain instance point clouds corresponding to each material; acquiring a template point cloud corresponding to the material stack, and generating a standardized convolution kernel based on the depth map corresponding to the template point cloud; determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and generating multi-layer depth maps by sliding the matching window through the material scene point cloud; convolving each depth map generated by sliding with the standardized convolution kernel of the template point cloud to obtain the global height position of each material; and determining the material grasping order based on the global height position of each material. By sliding a matching window determined by the template point cloud through the material scene point cloud, the method generates multi-layer depth maps that separate the depth differences of stacked materials. Convolving each depth map with the standardized convolution kernel of the standard template point cloud identifies the boundaries of stacked materials accurately without relying on surface color or texture, so the global height position of each material can be calculated accurately. This solves the problem of inaccurate height positioning of stacked materials in existing technologies, and high-precision global height recognition is maintained in both bright outdoor light and dim lighting, i.e., the method is robust to illumination. Furthermore, point cloud computation has low complexity and does not require the complex feature matching used in image contour recognition, thus improving computational efficiency.

[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0011] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 is a flowchart of a method for determining the material grabbing order according to Embodiment 1 of the present invention;
Figure 2 is a flowchart of a method for determining the material grabbing order according to an embodiment of the present invention;
Figure 3 is a schematic diagram of a process for denoising a depth map according to Embodiment 1 of the present invention;
Figure 4 is a schematic diagram of a material grasping sequence determination device provided in Embodiment 2 of the present invention;
Figure 5 is a schematic diagram of an electronic device used to implement the material grabbing sequence determination method according to an embodiment of the present invention.

Detailed Implementation

[0013] To enable those skilled in the art to better understand the present invention, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention. It should be understood that the various steps described in the method embodiments of the present invention can be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present invention is not limited in this respect.

[0014] The term "comprising" and its variations as used herein are open-ended inclusions, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.

[0015] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, any variations of the terms "comprising" and "having," etc., are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0016] It should be noted that the terms "a" and "a plurality of" used in this invention are illustrative rather than restrictive. Those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0017] The names of the messages or information exchanged between the multiple devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of these messages or information.

Embodiment 1

[0018] Figure 1 is a flowchart of a method for determining the material grasping order according to Embodiment 1 of the present invention. This method is applicable to determining the order in which materials are grasped. It can be executed by a material grasping order determination device, which can be implemented in software and/or hardware and is generally integrated into an electronic device. In this embodiment, the electronic device includes, but is not limited to, a computer.

[0019] As shown in Figure 1, the material grabbing order determination method provided in Embodiment 1 of the present invention includes the following steps:

S110. Obtain the point cloud of the material scene corresponding to the material stack, wherein the material stack includes multiple materials stacked together.

[0020] The material stack may include multiple stacked materials. This embodiment does not limit the shape of the material stack; all materials in the stack can be piled together or formed into multiple separate stacks. The material scene point cloud can be point cloud data composed of the point clouds of the materials, with point clouds of non-material objects such as the ground and walls removed. The material scene point cloud can be obtained by processing the original scene point cloud; the specific processing method is not limited in this embodiment. The original scene point cloud can be the initial 3D point cloud data of the material stack. Besides the point clouds of the materials, the original scene point cloud may also include point cloud data of objects such as the ground and walls, which can be collected by point cloud acquisition devices set at corresponding locations.
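The patent leaves the preprocessing from the original scene point cloud to the material scene point cloud open. A minimal sketch, assuming the ground is the horizontal plane z = ground_z and the walls lie outside a known rectangular XY region of interest; `extract_material_scene` and its parameters are hypothetical and not part of the disclosure:

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def extract_material_scene(points: List[Point], ground_z: float, margin: float,
                           roi: Tuple[float, float, float, float]) -> List[Point]:
    """Hypothetical preprocessing: keep points above the ground and inside an XY region of interest.

    Assumes the ground is the plane z = ground_z and that walls lie outside
    roi = (x_min, x_max, y_min, y_max). Not the patent's method, which is left open.
    """
    x_min, x_max, y_min, y_max = roi
    scene = []
    for x, y, z in points:
        above_ground = z > ground_z + margin                       # drops ground returns
        inside_roi = x_min <= x <= x_max and y_min <= y <= y_max   # drops wall returns
        if above_ground and inside_roi:
            scene.append((x, y, z))
    return scene
```

A real cell would more likely fit the ground plane (for example with RANSAC) rather than assume its height.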

[0021] In this embodiment, the point cloud of the material scene corresponding to the material stack can be obtained.

[0022] S120. Perform instance segmentation on the point cloud of the material scene to obtain the instance point cloud corresponding to each material.

[0023] An instance point cloud is the 3D point cloud data of a single material, segmented and extracted from the material scene point cloud. Instance segmentation refers to the process of extracting the point cloud of each individual material from the full point cloud data.

[0024] In this embodiment, the point cloud of the material scene can be segmented into instances to obtain the instance point cloud corresponding to each material in the point cloud of the material scene.
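The segmentation technique is not specified in the text. As a stand-in, the sketch below uses simple Euclidean clustering, a swapped-in technique assumed here for illustration only; tightly packed stacks whose materials touch would normally need a learned instance segmentation model instead. The name `euclidean_clusters` and its `radius` parameter are hypothetical.

```python
import math
from collections import deque
from typing import List, Tuple

Point = Tuple[float, float, float]


def euclidean_clusters(points: List[Point], radius: float, min_size: int = 1) -> List[List[Point]]:
    """Hypothetical stand-in for instance segmentation: group points linked by neighbours within `radius`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # brute-force O(n^2) neighbour search; a k-d tree would replace this in practice
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_size:
            clusters.append([points[k] for k in sorted(members)])
    return clusters
```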

[0025] S130. Obtain the template point cloud corresponding to the material stack, and generate a standardized convolution kernel based on the depth map corresponding to the template point cloud.

[0026] Here, the "template point cloud" refers to the point cloud data of a standard material: a 3D point cloud of a material with a complete shape and standard features, used as a reference for subsequent feature matching and comparison. When all materials are of the same type, there can be one or more template point clouds. When the materials are of different types, the number of template point clouds must be at least the number of material types; that is, each type of material must have at least one corresponding template point cloud. A "depth map" is a 2D image in which each pixel value represents the actual depth distance from the corresponding scene location to the acquisition device. The "standardized (normalized) convolution kernel" refers to the convolution kernel generated from the depth map of the template point cloud.

[0027] In this embodiment, a template point cloud corresponding to the material stack can be obtained, and a standardized K×K convolution kernel can be generated based on the depth map corresponding to the template point cloud.
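One way to read S130 is sketched below, assuming a camera looking straight down so that the height of the highest point in each cell stands in for depth, and assuming "standardized" means zero mean and unit L2 norm. Neither assumption comes from the patent; `template_depth_map` and `normalized_kernel` are hypothetical names.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
Grid = List[List[float]]


def template_depth_map(points: List[Point], k: int) -> Grid:
    """Hypothetical: project a template point cloud onto a K x K top-down height grid (max z per cell)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    x0, y0 = min(xs), min(ys)
    # cell size chosen so the template's larger XY extent spans exactly K cells
    cell = max(max(xs) - x0, max(ys) - y0) / k or 1.0
    grid: List[List[Optional[float]]] = [[None] * k for _ in range(k)]
    for x, y, z in points:
        r = min(int((y - y0) / cell), k - 1)
        c = min(int((x - x0) / cell), k - 1)
        if grid[r][c] is None or z > grid[r][c]:
            grid[r][c] = z
    z_floor = min(zs)  # assumption: empty cells take the template's lowest height
    return [[z_floor if v is None else v for v in row] for row in grid]


def normalized_kernel(depth: Grid) -> Grid:
    """Hypothetical standardization: zero mean and unit L2 norm."""
    vals = [v for row in depth for v in row]
    mean = sum(vals) / len(vals)
    centred = [[v - mean for v in row] for row in depth]
    norm = math.sqrt(sum(v * v for row in centred for v in row)) or 1.0
    return [[v / norm for v in row] for row in centred]
```

A zero-mean kernel makes the convolution response insensitive to a constant height offset, which is convenient when the same template is matched at different heights in the stack.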

[0028] In one embodiment, obtaining the template point cloud corresponding to the material stack includes: determining the number of template point clouds to be generated based on the type and shape of the material; and selecting the corresponding point cloud from the instance point cloud as the template point cloud based on the number of template point clouds generated.

[0029] The number of template point clouds generated can be set according to the type and shape of the material, and this embodiment does not limit it. For example, if there are 3 types of materials, at least 3 template point clouds should be generated; if there is 1 type of material that appears in 4 shapes in the point cloud, the number of template point clouds can be set to 4.

[0030] In this embodiment, the number of template point clouds generated can be determined based on the type and shape of the material. Based on the number of template point clouds generated, the point cloud of the corresponding material can be selected from the instance point cloud as the template point cloud.
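The selection rule is not specified beyond the count. A minimal sketch, assuming each instance carries a hypothetical (material type, shape) label and that the instance with the most points is the most complete view to use as the template:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Label = Tuple[str, str]  # hypothetical (material type, shape) label


def select_templates(instances: List[List[Point]], labels: List[Label]) -> Dict[Label, List[Point]]:
    """Hypothetical rule: one template per (type, shape) label, choosing the instance with the most points."""
    templates: Dict[Label, List[Point]] = {}
    for cloud, label in zip(instances, labels):
        if label not in templates or len(cloud) > len(templates[label]):
            templates[label] = cloud
    return templates
```

The number of templates produced equals the number of distinct labels, matching the counting rule in [0029].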

[0031] S140. Based on the point cloud density of the template point cloud, a matching window is determined in the material scene point cloud, and a multi-layer depth map is generated by sliding the matching window in the material scene point cloud.

[0032] Point cloud density refers to the number of points contained within a unit volume of three-dimensional space. The matching window can be a predefined local spatial region used to extract a local region from the point cloud.
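Taking the definition above literally (points per unit volume), a sketch that uses the axis-aligned bounding box as the volume; `point_density` and the `eps` guard are hypothetical. For a single-view scan of a flat top surface the box is nearly degenerate, so an areal density may be the more useful quantity in practice.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def point_density(points: List[Point], eps: float = 1e-9) -> float:
    """Hypothetical helper: points per unit volume of the axis-aligned bounding box."""
    extents = [max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)]
    volume = max(extents[0] * extents[1] * extents[2], eps)  # guard against flat clouds
    return len(points) / volume
```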

[0033] In this embodiment, a matching window can be determined in the material scene point cloud based on the point cloud density of the template point cloud. By sliding the matching window in the material scene point cloud, a multi-layer (e.g., N-layer) depth map can be generated.

[0034] S150. Perform convolution operations between each depth map generated by sliding and the normalized convolution kernel of the template point cloud to obtain the global height position of each material.

[0035] The global height position can refer to the actual height of the material in the scene.

[0036] In this embodiment, each depth map layer can be convolved with the standardized convolution kernel of the template point cloud to obtain the global height position of each material.

[0037] In one embodiment, the step of convolving each depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material includes: determining the height range of the template point cloud along the vertical coordinate axis, where the vertical coordinate axis is perpendicular to the ground; using the height range as the width of the window along the vertical coordinate axis; sliding the window along the vertical coordinate axis in the material scene point cloud and converting the point cloud corresponding to each window into a depth map; convolving each depth map with the normalized convolution kernel of the template point cloud to obtain multiple first heatmaps; and determining the global height position of each material based on the first heatmaps corresponding to all windows.

[0038] The vertical coordinate axis can be a coordinate axis perpendicular to the ground. The height range can refer to the range of possible heights of the material corresponding to the template point cloud in the scene. The heatmap can be a probabilistic image representing the matching degree of regional features, with pixel values reflecting the similarity of local features. The terms "first" and "second" are only used to distinguish different heatmaps.

[0039] In this embodiment, the length of the window can be the length of each layer in the depth map of the material scene point cloud, and the width of the window can be determined based on the height range of the template point cloud. First, the height range of the template point cloud in the vertical coordinate axis direction can be determined, and the height range can be used as the width of the window in the vertical coordinate axis direction. In the material scene point cloud, the window can be slid along the vertical coordinate axis direction, and the point cloud corresponding to each window can be converted into a depth map. Then, each depth map can be convolved with the normalized convolution kernel of the template point cloud to obtain multiple first heat maps. Based on the first heat maps corresponding to all windows, the global height position of each material can be determined.
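The sliding-window step of [0037] to [0039] can be sketched as follows, assuming a top-down XY grid shared by the scene and the template, empty cells filled with the window's lower bound, and "convolution" computed as valid cross-correlation. All function names and grid parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Grid = List[List[float]]


def slab_depth_map(points: List[Point], z_lo: float, z_hi: float, x0: float, y0: float,
                   cell: float, rows: int, cols: int) -> Grid:
    """Top-down height map of the points with z_lo <= z < z_hi; empty cells hold z_lo."""
    grid = [[z_lo] * cols for _ in range(rows)]
    for x, y, z in points:
        if z_lo <= z < z_hi:
            r, c = math.floor((y - y0) / cell), math.floor((x - x0) / cell)
            if 0 <= r < rows and 0 <= c < cols:
                grid[r][c] = max(grid[r][c], z)
    return grid


def correlate_valid(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D cross-correlation, the operation CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def sliding_heatmaps(points: List[Point], kernel: Grid, dz: float, step: float,
                     z_start: float, z_end: float, x0: float, y0: float,
                     cell: float, rows: int, cols: int) -> List[Tuple[float, Grid]]:
    """Slide a window of height dz along z; return (window lower bound, first heatmap) per layer."""
    layers = []
    z_lo = z_start
    while z_lo < z_end:
        depth = slab_depth_map(points, z_lo, z_lo + dz, x0, y0, cell, rows, cols)
        layers.append((z_lo, correlate_valid(depth, kernel)))
        z_lo += step
    return layers
```

With a zero-mean kernel, the constant fill value of empty cells contributes nothing to the response, so only the shape of the points inside each slab drives the heatmap.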

[0040] A deep convolutional network design that incorporates height information, introducing height as an additional input channel, can improve the model's ability to perceive the vertical direction.

[0041] In one embodiment, determining the global height position of each material based on the first heatmaps corresponding to all windows includes: performing weighted averaging and normalization on the first heatmaps corresponding to all windows to obtain a probability map, the probability map including multiple height probability values for each instance point cloud located at different heights; and, for each instance point cloud in the material scene point cloud, locating the global height position of the corresponding material based on the height probability values of that instance point cloud.

[0042] The probability map can be a probability distribution image obtained by fusing the multi-layer heatmap matching results. It can include multiple height probability values for each instance point cloud located at different heights; each height probability value indicates the confidence that the instance point cloud is located at that height.

[0043] In this embodiment, the first heatmaps corresponding to all windows can be weighted, averaged, and normalized to obtain a probability map. For each instance point cloud in the material scene point cloud, the global height position of the corresponding material is located based on the height probability values of that instance point cloud. For example, the heights corresponding to the k largest height probability values of an instance point cloud can be averaged, weighted by those probability values, and the result taken as the global height position of the corresponding material.

[0044] For example, the height range dz of the template point cloud along the z-axis can be calculated from the lowest height z_min and the highest height z_max of the points in the template point cloud. Before sliding the window, the density distribution of the material scene point cloud along the z-axis can be computed to identify high-density regions whose density exceeds a preset density threshold. A window of width dz is then slid along the z-axis within the high-density regions, and the depth map generated in each window is convolved with the template convolution kernel to obtain the corresponding heatmap. The values of the height range and the width range depend on the shape of the standard material and may be equal or unequal; this embodiment does not limit these values.
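The two quantities in [0044] can be sketched as below: dz taken as z_max - z_min of the template (one reading of "calculated from"), and high-density regions found by histogramming scene heights and merging adjacent bins whose point count exceeds the threshold. The bin size, threshold, and function names are assumptions.

```python
from typing import Dict, List, Tuple


def template_height_range(template_z: List[float]) -> float:
    """Assumed reading of [0044]: dz = z_max - z_min of the template point cloud."""
    return max(template_z) - min(template_z)


def dense_z_intervals(scene_z: List[float], bin_size: float, threshold: int) -> List[Tuple[float, float]]:
    """Hypothetical helper: histogram scene heights, keep bins with count > threshold, merge neighbours."""
    z0 = min(scene_z)
    counts: Dict[int, int] = {}
    for z in scene_z:
        b = int((z - z0) / bin_size)
        counts[b] = counts.get(b, 0) + 1
    intervals: List[Tuple[float, float]] = []
    for b in sorted(counts):
        if counts[b] > threshold:
            lo, hi = z0 + b * bin_size, z0 + (b + 1) * bin_size
            if intervals and abs(intervals[-1][1] - lo) < 1e-12:
                intervals[-1] = (intervals[-1][0], hi)  # extend the current dense run
            else:
                intervals.append((lo, hi))
    return intervals
```

The window of width dz would then be slid only within the returned intervals, which skips empty height bands.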

[0045] Finally, the heatmaps of all windows are weighted and averaged, then converted into a probability map using the normalized exponential function (softmax). The global height position of each instance point cloud is located from the weighted average over its top-k height probability values in the probability map, where the top-k height probability values are the k largest height probability values of that instance point cloud and k is a preset value that this embodiment does not limit.
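A minimal sketch of [0045] for one instance, assuming the instance already has one matching score per window (for example, the peak heatmap response inside its mask) and that each window's height is represented by a single value. The per-layer weights, `k`, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalized exponential function."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_height(layer_scores: Sequence[float], layer_heights: Sequence[float],
                  layer_weights: Sequence[float], k: int) -> float:
    """Hypothetical: probability-weighted height over the instance's top-k height probabilities."""
    weighted = [s * w for s, w in zip(layer_scores, layer_weights)]
    probs = softmax(weighted)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return sum(probs[i] * layer_heights[i] for i in top) / mass
```

Restricting the average to the top k layers keeps low-probability layers from pulling the estimate toward the middle of the stack.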

[0046] S160. Determine the material grabbing order based on the global height position of each material.

[0047] In this embodiment, the global height positions of the materials can be analyzed to determine the grabbing order. For example, the global height positions can be sorted in descending order so that the highest material is grabbed first, the second-highest material second, and so on. The method for determining the grabbing order can be customized according to actual conditions and is not limited in this embodiment.
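The highest-first rule of [0047] amounts to a descending sort. A sketch, with a hypothetical tie-break on the instance identifier so that the order is deterministic:

```python
from typing import Dict, List


def grasp_order(heights: Dict[str, float]) -> List[str]:
    """Highest material first; ties broken by (hypothetical) instance id for determinism."""
    return sorted(heights, key=lambda name: (-heights[name], name))
```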

[0048] An embodiment of the present invention provides a method for determining the material grasping order, comprising: acquiring a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple materials stacked together; performing instance segmentation on the material scene point cloud to obtain instance point clouds corresponding to each material; acquiring a template point cloud corresponding to the material stack, and generating a normalized convolution kernel based on the depth map corresponding to the template point cloud; determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and generating multi-layer depth maps by sliding the matching window through the material scene point cloud; convolving each depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material; and determining the material grasping order based on the global height position of each material. Sliding a matching window defined by the template point cloud through the material scene point cloud produces multi-layer depth maps that separate the depth differences of stacked materials. Convolving each depth map with the standardized convolution kernel of the standard template point cloud distinguishes the boundaries of stacked materials without relying on surface color or texture, so the global height position of each material can be calculated accurately. This solves the problem of inaccurate height positioning of stacked materials in existing technologies, and recognition remains accurate in both bright outdoor light and dim lighting, i.e., the method is robust to illumination. Furthermore, point cloud computation has low complexity and eliminates the complex feature matching required in image contour recognition, thereby improving computational efficiency.

[0049] Based on the above embodiments, modified embodiments of the above embodiments are proposed. It should be noted that, in order to keep the description brief, only the differences from the above embodiments are described in the modified embodiments.

[0050] In one embodiment, the method further includes: denoising each instance point cloud based on the global height position of each instance point cloud and the template point cloud to obtain a denoised instance point cloud; and planning a grasping strategy for the material grasping machine based on the denoised instance point cloud and the grasping order.

[0051] The grasping strategy can refer to the operating plan of the material grasping machine. For example, the grasping strategy can include the grasping sequence of materials and the machine execution parameters.

[0052] In this embodiment, the instance point cloud can be denoised based on the global height position of each instance point cloud and the template point cloud to obtain the denoised instance point cloud. Based on the denoised instance point cloud and the grasping order, a grasping strategy is planned for the material grasping machine.

[0053] In one embodiment, the step of denoising each instance point cloud based on the global height position of each instance point cloud and the template point cloud to obtain a denoised instance point cloud includes: for each instance point cloud, matching the most similar template point cloud from all template point clouds based on a similarity matching method; convolving the instance depth map corresponding to the instance point cloud with the similarity template convolution kernel corresponding to the most similar template point cloud to obtain a second heatmap; determining the matching center point of the instance point cloud based on the second heatmap; aligning the most similar template point cloud and the instance point cloud based on the matching center point to obtain an aligned depth map; and removing noise from the instance point cloud based on the depth difference between instance pixels in the instance depth map and template pixels in the similarity template convolution kernel in the aligned depth map to obtain a denoised instance point cloud.

[0054] Here, the matching center point can refer to the coordinate center point in the instance depth map of the instance point cloud that has the highest matching score against the features of the similar-template convolution kernel. Instance pixels can refer to pixels in the instance depth map, and template pixels can refer to pixels in the template convolution kernel.

[0055] In this embodiment, for each instance point cloud, the most similar template point cloud can be matched from all template point clouds using the similarity matching method. The instance depth map corresponding to the instance point cloud is then convolved with the similar-template convolution kernel corresponding to the most similar template point cloud to obtain a second heatmap. The matching center point of the instance point cloud can be determined from the second heatmap. Based on the matching center point, the most similar template point cloud and the instance point cloud can be aligned to obtain an aligned depth map. Noise in the instance point cloud can then be removed according to the depth differences, in the aligned depth map, between the instance pixels of the instance depth map and the template pixels of the similar-template convolution kernel, yielding a denoised instance point cloud.
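The matching-center and alignment steps can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the second heatmap is a 2D array on the instance depth map's pixel grid, computes the center as the heatmap-weighted average of pixel coordinates (the variant described in [0061]), and aligns by pasting the template depth patch, optionally shifted in depth, centered on that point. The helper names `matching_center` and `align_template_depth`, the `depth_offset` parameter, and the clipping of negative responses are all hypothetical.

```python
import numpy as np


def matching_center(heatmap: np.ndarray) -> tuple:
    """Return (x, y): the heatmap-weighted average of pixel coordinates.

    Negative responses are clipped to zero (an assumption), so only
    positive evidence pulls the center.
    """
    weights = np.clip(np.asarray(heatmap, dtype=float), 0.0, None)
    total = weights.sum()
    if total == 0.0:
        raise ValueError("heatmap has no positive response")
    rows, cols = np.indices(weights.shape)
    x = float((cols * weights).sum() / total)
    y = float((rows * weights).sum() / total)
    return x, y


def align_template_depth(template_depth: np.ndarray, center_xy: tuple,
                         out_shape: tuple, depth_offset: float = 0.0) -> np.ndarray:
    """Paste the template depth patch, centered on center_xy, into a NaN canvas.

    Pixels the template does not cover stay NaN and the patch is clipped at
    the canvas border. depth_offset (hypothetical) shifts the template in
    depth, e.g. toward the instance's global height.
    """
    aligned = np.full(out_shape, np.nan)
    th, tw = template_depth.shape
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    top, left = cy - th // 2, cx - tw // 2
    # Clip the paste region to the canvas.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + th, out_shape[0]), min(left + tw, out_shape[1])
    if y0 < y1 and x0 < x1:
        aligned[y0:y1, x0:x1] = (
            template_depth[y0 - top:y1 - top, x0 - left:x1 - left] + depth_offset
        )
    return aligned
```

A weighted average gives a sub-pixel center and is less sensitive to a single spurious peak than taking the argmax alone, although strong secondary responses can pull it off the true match.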

[0056] The similarity matching method can refer to a matching method that dynamically adjusts the weights of a depth-aware convolution. For example, for each receptive-field region of the similar-template convolution kernel, the similarity between the depth value of each pixel/point in the region and the depth value of the pixel at the center of the convolution kernel can be calculated to obtain a set of depth similarity weights. The depth similarity weights can be computed with an exponential function governed by a preset hyperparameter α, which increases the weight contribution of neighboring pixels whose depth values are close to that of the center pixel and suppresses the interference of pixels with excessively large depth differences on the convolution output. For adjustment to point cloud density and noise, a global or local scaling factor can be applied to the convolution weights during training, based on local density estimates and noise-level statistics of the input point cloud, to achieve dynamic weight adjustment and thereby adapt to sparse or high-noise scenes. In this embodiment, the convolution kernel weights of the template point cloud can be generated directly from the template point cloud or template depth map and, after edge padding, used as fixed convolution kernels that are convolved with the depth map to be processed, so the weights do not need to be trained through conventional backpropagation. The depth-weight-modulated features can be unfolded into vector form, multiplied by the adjusted convolution kernel weight matrix to obtain the output, and then reshaped into the standard convolution output feature map format.
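A minimal NumPy sketch of the depth-aware convolution described above. It assumes the depth similarity weight has the form exp(-α·|d_i - d_c|), where d_c is the depth at the window center (the source only says the weights are exponential in a preset α), uses a single scalar `scale` as a stand-in for the density/noise scaling factor, and edge-pads the input so the output keeps the input size. As in most CNN frameworks, the kernel is applied without flipping (cross-correlation). The function name and its parameters are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def depth_aware_conv2d(depth: np.ndarray, kernel: np.ndarray,
                       alpha: float = 1.0, scale: float = 1.0) -> np.ndarray:
    """Depth-aware 'same'-size convolution of a depth map with a fixed kernel."""
    depth = np.asarray(depth, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel sides must be odd so it has a center pixel")
    ph, pw = kh // 2, kw // 2
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(depth, ((ph, ph), (pw, pw)), mode="edge")
    # One kh x kw receptive field per output pixel: shape (H, W, kh, kw).
    patches = sliding_window_view(padded, (kh, kw))
    center = patches[:, :, ph, pw][:, :, None, None]
    # Depth similarity weights: close to 1 where depth matches the center, close to 0 otherwise.
    similarity = np.exp(-alpha * np.abs(patches - center))
    modulated = patches * similarity
    # Unfold to (H*W, kh*kw), multiply by the scaled kernel vector, fold back to H x W.
    cols = modulated.reshape(-1, kh * kw)
    out = cols @ (scale * kernel).ravel()
    return out.reshape(depth.shape)
```

With α = 0 this reduces to an ordinary cross-correlation; larger α makes each output increasingly dominated by pixels at the same depth as the center, which is how responses leaking across the boundary between stacked materials are suppressed.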

[0057] Through a dynamic weight adjustment mechanism, the contribution ratio of convolution weights can be adaptively adjusted according to the density distribution and noise level of the input point cloud / depth map during training or inference, thereby significantly improving the model's feature extraction capability in scenarios with large differences in point cloud density and unstable noise levels; improving the utilization efficiency of depth information in convolution operations, enabling the model to better perceive the geometric structure of the scene; and enhancing its generalization performance and robustness without increasing the number of model parameters, while maintaining the model's lightweight nature.

[0058] In one embodiment, the step of removing noise from the instance point cloud based on the depth differences in the aligned depth map between instance pixels of the instance depth map and template pixels of the similar-template convolution kernel includes: comparing these depth differences window by window, where each local window includes a preset number of template pixels and instance pixels; and, for each local window, if the depth difference between every template pixel and its corresponding instance pixel, as well as between every template pixel and the instance pixels adjacent to that instance pixel, is greater than a preset depth difference threshold, regarding the instance pixels in the local window as noise and removing them.

[0059] The preset depth difference threshold can be set according to the actual situation, and this embodiment does not limit it.

[0060] In this embodiment, a local window can be preset in the aligned depth map and slid across it, and the depth differences between the instance pixels of the instance depth map and the template pixels of the similar-template convolution kernel are compared window by window. For each local window, if the depth difference between every template pixel and its corresponding instance pixel, as well as between every template pixel and the instance pixels adjacent to that instance pixel, is greater than the preset depth difference threshold, the instance pixels in the local window are regarded as noise and removed.

[0061] For example, for each instance point cloud, the template point cloud is translated to the global height position of the instance point cloud, and the most similar template point cloud is matched from all template point clouds using the similarity matching method. The template convolution kernel of the most similar template point cloud is convolved with the instance depth map of the instance point cloud to obtain a heatmap, and the matching center point (x, y) is calculated as the heatmap-weighted average of the two-dimensional pixel coordinates. The most similar template point cloud is then moved to (x, y), achieving coarse alignment between the template point cloud and the instance point cloud in three-dimensional space. In the aligned depth map, the depth values of instance pixels and template pixels are compared window by window; if the depth differences between all template pixels in a local window and the instance pixels exceed a set threshold, those instance pixels are regarded as noise and discarded. For example, suppose a local window contains four template pixels. For each template pixel, it is determined whether the depth difference between that template pixel and its corresponding instance pixel exceeds the set threshold; the n neighboring instance pixels around that instance pixel are then obtained, and it is determined whether the depth difference between the template pixel and each of the n neighboring instance pixels exceeds the set threshold. After all four template pixels have been checked, if every one of these depth differences exceeds the set threshold, the instance pixels in the local window are regarded as noise and removed.
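The window test in this example can be sketched as follows. Assumptions not stated in the source: the instance depth map and the aligned template depth map share one pixel grid with NaN marking missing pixels, windows tile the map without overlap (stride equal to the window size), the "n neighboring instance pixels" are the (2r+1)×(2r+1) neighborhood with r = `radius` (n = 8 for r = 1), and removed pixels are set to NaN. A window is kept as soon as any template pixel finds an instance pixel (itself or a neighbor) within the threshold, which is equivalent to discarding the window only when every difference exceeds it. The function name is hypothetical.

```python
import numpy as np


def remove_window_noise(instance: np.ndarray, template: np.ndarray,
                        window: int = 2, threshold: float = 0.01,
                        radius: int = 1) -> np.ndarray:
    """Return a copy of `instance` with noisy local windows set to NaN."""
    instance = np.asarray(instance, dtype=float)
    template = np.asarray(template, dtype=float)
    h, w = instance.shape
    out = instance.copy()
    # NaN border so neighborhoods near the edge simply have fewer valid pixels.
    padded = np.pad(instance, radius, mode="constant", constant_values=np.nan)
    for y0 in range(0, h, window):
        for x0 in range(0, w, window):
            rows, cols = slice(y0, y0 + window), slice(x0, x0 + window)
            tmpl = template[rows, cols]
            # Skip windows with no template coverage or no instance pixels.
            if np.isnan(tmpl).all() or np.isnan(instance[rows, cols]).all():
                continue
            is_noise = True
            for dy, dx in np.argwhere(~np.isnan(tmpl)):
                y, x = y0 + dy, x0 + dx
                # Instance pixel (y, x) plus its neighbors, in padded coordinates.
                hood = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
                diffs = np.abs(hood - tmpl[dy, dx])
                if np.any(diffs[~np.isnan(diffs)] <= threshold):
                    is_noise = False  # this template pixel is supported by the instance
                    break
            if is_noise:
                out[rows, cols] = np.nan
    return out
```

Because each neighborhood can reach outside its window, a window of outliers that borders well-matched surface can survive, and only windows lying inside an outlier region are removed. Tuning `window`, `radius`, and `threshold` trades tolerance to residual misalignment against strictness.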

[0062] In this embodiment, template convolution, heatmap calculation, noise removal, and other operations can be performed in parallel on multiple graphics processing units (GPUs) or a multi-core CPU, and the results of the subtasks are finally aggregated to improve overall processing speed.
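A minimal sketch of the per-instance parallelism, using Python's standard `concurrent.futures`. The function name `run_per_instance` is hypothetical, and the thread pool is an assumption: NumPy releases the GIL in many array operations, so threads can overlap that work, whereas CPU-bound pure-Python tasks would call for `ProcessPoolExecutor` instead. Multi-GPU execution is outside the scope of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_per_instance(instances: Dict[Hashable, T], task: Callable[[T], R],
                     max_workers: int = 4) -> Dict[Hashable, R]:
    """Run `task` on every instance concurrently and gather results by instance id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One subtask per instance (e.g. template convolution followed by denoising).
        futures = {key: pool.submit(task, value) for key, value in instances.items()}
        # Aggregate: wait for each subtask and collect its result under its id.
        return {key: future.result() for key, future in futures.items()}
```

Keying results by instance id keeps the aggregated output independent of the order in which subtasks finish.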

[0063] Based on the technical solutions of the above embodiments, several specific implementations of the present invention are provided below.

[0064] As a specific implementation of this embodiment, Figure 2 is a flowchart of a method for determining the material grabbing order according to an embodiment of the present invention. As shown in Figure 2, one input can be a red-green-blue (RGB) image of the scene, from which an instance segmentation model generates 2D bounding boxes. The other input can be the scene depth map of the original scene point cloud, which is cropped in 2D using the bounding boxes to obtain a scene instance depth map (i.e., the depth map of the material scene point cloud), thereby extracting the region of a single material. The material scene point cloud is then layered according to point cloud density to generate N layers of scene instance depth maps. A template depth convolution kernel generated from a point cloud template is used to perform depth-aware convolution on each of the N layers of depth maps, outputting N layers of convolution heatmaps. The convolution results within each layer are weighted and averaged and then converted into a probability distribution with Softmax. Finally, a top-k weighted average over the N layer probabilities gives the expected height value of the instance.
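The height-estimation tail of this flow (per-layer averaging, Softmax over layers, top-k weighted average) can be sketched as follows. Assumptions: each layer's heatmap is averaged uniformly over the instance's pixel mask (the source does not specify the weights), each layer is represented by a single reference height, and the top-k probabilities are renormalized before averaging. `expected_height`, `temperature`, and `k` are hypothetical names.

```python
import numpy as np


def expected_height(layer_heatmaps: np.ndarray, instance_mask: np.ndarray,
                    layer_heights, k: int = 3, temperature: float = 1.0):
    """Return (expected height, per-layer probabilities) for one instance.

    layer_heatmaps: (N, H, W) convolution heatmaps, one per depth layer.
    instance_mask:  (H, W) boolean mask of the instance's pixels.
    layer_heights:  (N,) reference height of each layer.
    """
    heights = np.asarray(layer_heights, dtype=float)
    # Same-layer average of the instance's heatmap responses (uniform weights assumed).
    scores = np.array([hm[instance_mask].mean() for hm in layer_heatmaps])
    # Softmax over layers; subtracting the max keeps exp() numerically stable.
    z = scores / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # Top-k weighted average: keep the k most probable layers and renormalize.
    k = max(1, min(k, probs.size))
    top = np.argsort(probs)[-k:]
    weights = probs[top] / probs[top].sum()
    return float(weights @ heights[top]), probs
```

Restricting the average to the k most probable layers is intended to keep a faint response from a distant layer from dragging the estimate between two stacked materials, while still interpolating between adjacent layers when the response straddles a layer boundary.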

[0065] Figure 3 is a schematic diagram of a depth map denoising process according to Embodiment 1 of the present invention. As shown in Figure 3, the template depth convolution kernel is first generated and aligned from the depth map of the template point cloud. Depth-aware convolution is then performed on the scene instance depth map with this kernel to obtain a convolution heatmap, and the xyz position of the template point cloud is obtained by weighted averaging of the xy coordinates. The template depth map is then aligned to this xyz position to obtain a template depth map aligned with the instance point cloud, and denoising is performed by depth snapping mapping / nearest-neighbor retrieval / matching / filtering to obtain the denoised scene instance depth map.

[0066] Embodiment 2. Figure 4 is a schematic diagram of a material grasping sequence determination device according to Embodiment 2 of the present invention. The device is applicable to determining the order in which materials are grasped, can be implemented in software and/or hardware, and is generally integrated into an electronic device.

[0067] As shown in Figure 4, the device includes: an acquisition module 210, configured to acquire the material scene point cloud corresponding to the material stack, wherein the material stack includes multiple materials stacked together; a segmentation module 220, configured to perform instance segmentation on the material scene point cloud to obtain the instance point cloud corresponding to each material; a generation module 230, configured to obtain the template point cloud corresponding to the material stack and generate a normalized convolution kernel based on the depth map corresponding to the template point cloud; a sliding module 240, configured to determine a matching window in the material scene point cloud based on the point cloud density of the template point cloud and to generate a multi-layer depth map by sliding the matching window in the material scene point cloud; a convolution module 250, configured to convolve each layer of depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material; and a determination module 260, configured to determine the grabbing order of materials based on the global height position of each material.

[0068] This embodiment provides a material grasping order determination device, comprising: an acquisition module for acquiring a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple stacked materials; a segmentation module for segmenting the material scene point cloud into instances to obtain instance point clouds corresponding to each material; a generation module for acquiring a template point cloud corresponding to the material stack and generating a normalized convolutional kernel based on the depth map corresponding to the template point cloud; a sliding module for determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud and generating multiple layers of depth maps by sliding the matching window in the material scene point cloud; a convolution module for performing convolution operations on each layer of depth map generated by the sliding module with the normalized convolutional kernel of the template point cloud to obtain the global height position of each material; and a determination module for determining the material grasping order based on the global height position of each material. This device can accurately obtain the global height position of each material, solving the problem in the prior art that it is impossible to accurately locate the height of stacked materials.

[0069] Further, the generation module 230 includes: a generation unit, configured to determine the number of template point clouds to be generated based on the type and shape of the materials; and a selection unit, configured to select, according to that number, corresponding point clouds from the instance point clouds as template point clouds.

[0070] Further, the convolution module 250 includes: a first determining unit, configured to determine the height range of the template point cloud along the vertical coordinate axis, the vertical coordinate axis being perpendicular to the ground; a second determining unit, configured to use the height range as the width of the window along the vertical coordinate axis; a conversion unit, configured to slide the window along the vertical coordinate axis in the material scene point cloud and convert the point cloud within each window into a depth map; a first convolution unit, configured to convolve each depth map with the normalized convolution kernel of the template point cloud to obtain multiple first heatmaps; and a third determining unit, configured to determine the global height position of each material based on the first heatmaps corresponding to all windows.
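The window sliding and depth-map conversion performed by the conversion unit can be sketched as below. Assumptions beyond the source: points form an (M, 3) array in a frame whose z axis is the vertical axis, each window is half-open [low, low + width), the default stride equals the window width (the source does not state a stride), and each window is projected orthographically onto an xy grid with cell size `pixel`, keeping the highest z per cell as seen from above. `layered_depth_maps` and its parameters are hypothetical.

```python
from typing import Optional

import numpy as np


def layered_depth_maps(points: np.ndarray, template_points: np.ndarray,
                       pixel: float = 0.005, stride: Optional[float] = None):
    """Slice a scene cloud along z into template-height windows, one depth map each.

    Returns (maps, lows): maps has shape (num_windows, H, W) with NaN where a
    window has no points; lows holds each window's lower z bound.
    """
    pts = np.asarray(points, dtype=float)
    tz = np.asarray(template_points, dtype=float)[:, 2]
    width = tz.max() - tz.min()  # template height range = window width
    if width <= 0:
        raise ValueError("template has no vertical extent")
    stride = width if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Shared xy grid for every layer so the maps stay pixel-aligned.
    ij = np.floor((pts[:, :2] - pts[:, :2].min(axis=0)) / pixel).astype(int)
    cols, rows = ij[:, 0], ij[:, 1]
    shape = (rows.max() + 1, cols.max() + 1)
    maps, lows = [], []
    low, top = pts[:, 2].min(), pts[:, 2].max()
    while low <= top:
        sel = (pts[:, 2] >= low) & (pts[:, 2] < low + width)
        depth = np.full(shape, np.nan)
        # fmax ignores NaN, so each cell ends up with its highest z in this window.
        np.fmax.at(depth, (rows[sel], cols[sel]), pts[sel, 2])
        maps.append(depth)
        lows.append(low)
        low += stride
    return np.stack(maps), np.array(lows)
```

A stride smaller than the width (for example, half of it) produces overlapping windows, which reduces the chance that a material is split across two windows, at the cost of more convolutions.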

[0071] Further, the third determining unit is specifically configured to: weight and normalize the first heatmaps corresponding to all windows to obtain a probability map, the probability map including, for each instance point cloud, multiple height probability values for that instance point cloud being located at different heights; and, for each instance point cloud in the material scene point cloud, locate the global height position of the corresponding material based on the height probability values of that instance point cloud.

[0072] Further, the device also includes: a denoising module, configured to denoise each instance point cloud based on the global height position of each instance point cloud and the template point cloud to obtain a denoised instance point cloud; and a planning module, configured to plan a grasping strategy for the material grasping machine based on the denoised instance point cloud and the grasping order.

[0073] Further, the denoising module includes: a matching unit, configured to match, for each instance point cloud, the most similar template point cloud from all template point clouds using a similarity matching method; a second convolution unit, configured to convolve the instance depth map corresponding to the instance point cloud with the similar-template convolution kernel corresponding to the most similar template point cloud to obtain a second heatmap; a fourth determining unit, configured to determine the matching center point of the instance point cloud based on the second heatmap; an alignment unit, configured to align the most similar template point cloud with the instance point cloud based on the matching center point to obtain an aligned depth map; and a culling unit, configured to remove noise from the instance point cloud, based on the depth differences in the aligned depth map between instance pixels of the instance depth map and template pixels of the similar-template convolution kernel, to obtain a denoised instance point cloud.

[0074] Further, the culling unit is specifically configured to: compare, in the aligned depth map, the depth differences between instance pixels of the instance depth map and template pixels of the similar-template convolution kernel window by window, each local window including a preset number of template pixels and instance pixels; and, for each local window, if the depth difference between every template pixel and its corresponding instance pixel, as well as between every template pixel and the instance pixels adjacent to that instance pixel, is greater than a preset depth difference threshold, regard the instance pixels in the local window as noise and discard them.

[0075] The above-mentioned material grabbing order determination device can execute the material grabbing order determination method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the method.

[0076] Embodiment 3. Figure 5 is a schematic diagram of an electronic device 30 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementations of the invention described and/or claimed herein.

[0077] As shown in Figure 5, the electronic device 30 includes at least one processor 31 and a memory communicatively connected to the at least one processor 31, such as a read-only memory (ROM) 32 and a random access memory (RAM) 33. The memory stores a computer program executable by the at least one processor. The processor 31 can perform various appropriate actions and processes according to the computer program stored in the ROM 32 or loaded from a storage unit 38 into the RAM 33. The RAM 33 can also store various programs and data required for the operation of the electronic device 30. The processor 31, the ROM 32, and the RAM 33 are interconnected via a bus 34, and an input/output (I/O) interface 35 is also connected to the bus 34.

[0078] Multiple components of the electronic device 30 are connected to the I/O interface 35, including: an input unit 36, such as a keyboard or mouse; an output unit 37, such as various types of displays and speakers; the storage unit 38, such as a magnetic disk or optical disc; and a communication unit 39, such as a network card, modem, or wireless transceiver. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices over computer networks such as the Internet and/or various telecommunications networks.

[0079] Processor 31 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 31 include, but are not limited to, a central processing unit (CPU), a GPU, various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, and microcontroller. Processor 31 performs the various methods and processes described above, such as the material grabbing sequence determination method.

[0080] In some embodiments, the material grabbing order determination method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into the RAM 33 and executed by the processor 31, one or more steps of the material grabbing order determination method described above may be performed. Alternatively, in other embodiments, the processor 31 may be configured to perform the material grabbing order determination method by any other suitable means (e.g., by means of firmware).

[0081] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0082] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, or as a standalone software package, partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0083] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fibers, compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0084] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device for displaying information to the user, such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0085] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0086] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and Virtual Private Server (VPS) services, such as difficult management and weak business scalability.

[0087] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0088] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A method for determining a material grasping sequence, characterized in that the method includes: obtaining a material scene point cloud corresponding to a material stack, wherein the material stack includes multiple materials stacked together; performing instance segmentation on the material scene point cloud to obtain an instance point cloud corresponding to each material; obtaining a template point cloud corresponding to the material stack, and generating a normalized convolution kernel based on the depth map corresponding to the template point cloud; determining a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and generating a multi-layer depth map by sliding the matching window in the material scene point cloud; convolving each layer of depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material; and determining the order in which the materials are grasped based on the global height position of each material.

2. The method according to claim 1, characterized in that the step of obtaining the template point cloud corresponding to the material stack includes: determining the number of template point clouds to be generated based on the type and shape of the materials; and selecting, according to that number, corresponding point clouds from the instance point clouds as template point clouds.

3. The method according to claim 1, characterized in that the step of convolving each depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material includes: determining the height range of the template point cloud along the vertical coordinate axis, the vertical coordinate axis being perpendicular to the ground; using the height range as the width of the window along the vertical coordinate axis; sliding the window along the vertical coordinate axis in the material scene point cloud, and converting the point cloud within each window into a depth map; convolving each depth map with the normalized convolution kernel of the template point cloud to obtain multiple first heatmaps; and determining the global height position of each material based on the first heatmaps corresponding to all windows.

4. The method according to claim 3, characterized in that determining the global height position of each material based on the first heatmaps corresponding to all windows includes: weighting and normalizing the first heatmaps corresponding to all windows to obtain a probability map, the probability map including, for each instance point cloud, multiple height probability values for that instance point cloud being located at different heights; and, for each instance point cloud in the material scene point cloud, locating the global height position of the corresponding material based on the height probability values of that instance point cloud.

5. The method according to claim 1, characterized in that the method further includes: denoising each instance point cloud based on the global height position of each instance point cloud and the template point cloud to obtain a denoised instance point cloud; and planning a grasping strategy for the material grasping machine based on the denoised instance point cloud and the grasping order.

6. The method according to claim 5, characterized in that denoising each instance point cloud based on the global height position of each instance point cloud and the template point cloud to obtain a denoised instance point cloud includes: for each instance point cloud, matching the most similar template point cloud from all template point clouds using a similarity matching method; convolving the instance depth map corresponding to the instance point cloud with the similar-template convolution kernel corresponding to the most similar template point cloud to obtain a second heatmap; determining the matching center point of the instance point cloud based on the second heatmap; aligning the most similar template point cloud with the instance point cloud based on the matching center point to obtain an aligned depth map; and removing noise from the instance point cloud, based on the depth differences in the aligned depth map between instance pixels of the instance depth map and template pixels of the similar-template convolution kernel, to obtain a denoised instance point cloud.

7. The method according to claim 6, characterized in that the step of removing noise from the instance point cloud based on the depth differences in the aligned depth map between instance pixels of the instance depth map and template pixels of the similar-template convolution kernel includes: comparing these depth differences in the aligned depth map window by window, each local window including a preset number of template pixels and instance pixels; and, for each local window, if the depth difference between every template pixel and its corresponding instance pixel, as well as between every template pixel and the instance pixels adjacent to that instance pixel, is greater than a preset depth difference threshold, regarding the instance pixels in the local window as noise and discarding them.

8. A material grasping sequence determination device, characterized in that the device includes: an acquisition module, configured to acquire the material scene point cloud corresponding to a material stack, wherein the material stack includes multiple materials stacked together; a segmentation module, configured to perform instance segmentation on the material scene point cloud to obtain the instance point cloud corresponding to each material; a generation module, configured to obtain the template point cloud corresponding to the material stack and generate a normalized convolution kernel based on the depth map corresponding to the template point cloud; a sliding module, configured to determine a matching window in the material scene point cloud based on the point cloud density of the template point cloud, and to generate a multi-layer depth map by sliding the matching window in the material scene point cloud; a convolution module, configured to convolve each layer of depth map generated by sliding with the normalized convolution kernel of the template point cloud to obtain the global height position of each material; and a determination module, configured to determine the grabbing order of materials based on the global height position of each material.

9. An electronic device, characterized in that the device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the material grabbing sequence determination method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that cause a processor to perform the material grabbing sequence determination method according to any one of claims 1-7.