Data processing method of point cloud data, object grabbing method, and readable storage medium

By using RGB images for 3D information prediction and point cloud data completion in smart store and warehouse systems, the method addresses missing point cloud data, improves the accuracy and adaptability of object grasping, and suits dynamic production line environments.

CN122265097A · Pending · Publication Date: 2026-06-23 · HEMA (CHINA) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HEMA (CHINA) CO LTD
Filing Date
2026-01-30
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In the smart warehouse system, due to mutual occlusion between multiple objects in the container and special materials on the object surface, the point cloud data scanned by the 3D camera has large areas of missing data, which affects the accuracy of pose estimation and grasping planning, resulting in grasping failure.

Method used

On the server side, the RGB image is used for 3D information prediction. The 3D information prediction model generates predicted point cloud data of the target object, which is used to complete the initial point cloud data, and intelligent grasping is then performed in a dynamic production line environment.

Benefits of technology

It effectively addresses severely missing point cloud data, improves the accuracy of pose estimation, enables precise object grasping, and works with the single scans available in dynamic production line environments.

✦ Generated by Eureka AI based on patent content.

Figure CN122265097A_ABST

Abstract

Embodiments of the present application disclose a data processing method of point cloud data, an object grabbing method and a readable storage medium. The method comprises: obtaining initial point cloud data and an RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where a plurality of objects are stacked in the target container and, when a grabbing operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; taking the RGB image as input, calling a three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; and, according to the predicted point cloud data, completing the partial point cloud data of the target object in the initial point cloud data to obtain completed point cloud data of the target object, on which intelligent grabbing of the target object is based. This effectively solves the problem of insufficient point cloud data from a single scan by a single camera.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a data processing method for point cloud data, an object grasping method, a computer-readable storage medium, an electronic device, and a computer program product.

Background Technology

[0002] In a smart warehouse system, intelligent grasping devices can grasp objects precisely under 3D (three-dimensional) vision guidance. Specifically, when a container holding objects is placed in the acquisition area of a 3D camera, the 3D camera scans the point cloud data of the objects in the container, pose estimation is performed on this data, and the intelligent grasping device then completes a precise grasping action using the calculated grasp pose.

[0003] However, in practical applications, due to mutual occlusion between multiple objects within the container and special materials on the object surface, large areas of point cloud data scanned by the 3D camera may be missing, affecting the accuracy of subsequent pose estimation and grasping planning, and thus causing serious problems such as grasping failure.

[0004] How to achieve accurate object grasping, especially when point cloud data is severely lacking, has become a technical problem to be solved by those skilled in the art.

Summary of the Invention

[0005] This application provides a data processing method and apparatus for point cloud data, an object grasping method and apparatus, a computer-readable storage medium, an electronic device, and a computer program product, which can effectively solve the problem of insufficient point cloud data under a single camera scan.

[0006] This application provides the following solution: a data processing method for point cloud data, applied on a server side, the method comprising: obtaining initial point cloud data and an RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where multiple objects are piled up inside the target container and, when a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; taking the RGB image as input, invoking a three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; and, based on the predicted point cloud data, completing the partial point cloud data corresponding to the target object in the initial point cloud data to obtain completed point cloud data of the target object, so that the target object can be grasped intelligently based on the completed point cloud data.
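A minimal sketch of the completion step in [0006], assuming the predicted point cloud is already expressed in the same camera frame as the initial point cloud and that a predicted point should only fill space where no scanned point exists nearby. The function name `complete_point_cloud` and the `gap_threshold` value are hypothetical, and the brute-force nearest-neighbour search is for illustration only (a spatial index such as a k-d tree would normally be used at real point counts).

```python
import math

Point = tuple[float, float, float]


def complete_point_cloud(observed: list[Point], predicted: list[Point],
                         gap_threshold: float = 0.005) -> list[Point]:
    """Merge predicted points into the scanned (observed) points of one object.

    Hypothetical rule: every scanned point is kept, and a predicted point is
    added only if no scanned point lies within gap_threshold of it, i.e. the
    predicted point falls inside a hole of the initial point cloud.
    Units are assumed to be metres.
    """
    completed = list(observed)
    for p in predicted:
        # Brute-force distance to the nearest scanned point; inf if nothing was scanned.
        nearest = min((math.dist(p, q) for q in observed), default=math.inf)
        if nearest > gap_threshold:
            completed.append(p)
    return completed
```

For example, with scanned points at x = 0 and x = 0.01 and predicted points at x = 0, 0.02 and 0.03 (same y and z), only the two predicted points farther than 5 mm from any scanned point are added. Keeping every scanned point unchanged is a design choice of this sketch: measured depth, where it exists, is trusted over prediction.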

[0007] The method further includes: obtaining the confidence of the model's three-dimensional information prediction for the target object; and, if the confidence is below a preset value and a 3D model of the target object has been built in advance, completing the partial point cloud data corresponding to the target object in the initial point cloud data based on that 3D model, to obtain the completed point cloud data of the target object.
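The fallback in [0007] can be sketched as a selection between two completion sources. The threshold `min_confidence = 0.8` is a hypothetical preset value, and the sketch assumes the pre-built 3D model has already been registered (posed) into the scene, a step the text does not detail. When confidence is low but no 3D model exists, this sketch simply keeps the prediction; the source does not specify that case.

```python
from typing import Optional, Sequence

Point = tuple[float, float, float]


def choose_completion_source(confidence: float,
                             predicted: Sequence[Point],
                             model_points: Optional[Sequence[Point]],
                             min_confidence: float = 0.8) -> tuple[str, Sequence[Point]]:
    """Pick the point set used to complete the target object's point cloud.

    model_points: points sampled from the pre-built 3D model, assumed to be
    already aligned with the scene; None if the object was never modelled.
    """
    if confidence < min_confidence and model_points is not None:
        return "3d_model", model_points
    # Assumption: confident predictions, and low-confidence objects that
    # have no 3D model, both use the predicted point cloud.
    return "prediction", predicted
```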

[0008] If the multiple objects stacked in the target container are identical objects, the object with the fewest missing points in the initial point cloud data is determined as the target object according to the point cloud missing information of those identical objects, and the point cloud data of that target object is then completed.
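For identical objects, [0008] selects the instance with the fewest missing points. A minimal sketch, assuming the missing-point count of each instance has already been measured (for example from invalid depth pixels inside each instance mask); the function name and the instance ids are hypothetical.

```python
def pick_least_occluded(missing_counts: dict[str, int]) -> str:
    """Return the id of the instance missing the fewest points.

    missing_counts maps a hypothetical instance id to the number of points
    missing from that instance in the initial point cloud. Ties go to the
    first instance in insertion order.
    """
    if not missing_counts:
        raise ValueError("no candidate objects")
    return min(missing_counts, key=missing_counts.__getitem__)
```

For example, `pick_least_occluded({"a": 120, "b": 15, "c": 40})` returns `"b"`.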

[0009] If the multiple objects piled up in the target container are different objects, the method further includes: receiving a request to perform a grasping operation on the target object in the target container; and invoking a target detection model to identify the target object in the RGB image, so that the point cloud data of the target object can be completed.
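When the stacked objects differ, [0009] uses a target detection model to locate the requested object in the RGB image. The detector itself is not specified, so this sketch only shows the selection step after detection, assuming each detection carries a class label, a score and a pixel bounding box; the `Detection` type, `select_target` and the `min_score` default are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    label: str                       # class name predicted by the detector
    score: float                     # detector confidence in [0, 1]
    box: tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in RGB pixels


def select_target(detections: list[Detection], requested_label: str,
                  min_score: float = 0.5) -> Optional[Detection]:
    """Return the highest-scoring detection of the requested object, or None."""
    candidates = [d for d in detections
                  if d.label == requested_label and d.score >= min_score]
    return max(candidates, key=lambda d: d.score, default=None)
```

The returned box could then restrict which points of the initial point cloud are treated as the target object before completion.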

[0010] An object grasping method, applied on a server side, is used to perform grasping operations, via an intelligent grasping device, on target objects piled up inside a target container. The method includes: obtaining initial point cloud data and an RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where, when multiple objects are piled up inside the target container and a grasping operation needs to be performed on the target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; taking the RGB image as input, invoking a three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on it, and outputs predicted point cloud data of the target object; based on the predicted point cloud data, completing the partial point cloud data corresponding to the target object in the initial point cloud data to obtain completed point cloud data of the target object; and, based on the completed point cloud data, performing regional planar fitting on the surface of the target object to determine the target partition with the highest flatness, and controlling the intelligent grasping device to grasp the target object at the target partition.
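One possible reading of the regional planar fitting in [0010] is sketched below, under stated assumptions: the target surface is split into square cells in the x-y plane, a plane z = a*x + b*y + c is least-squares fitted in each cell, and flatness is measured by the RMS fitting residual (lower is flatter). The grid partitioning, the plane parameterisation (which suits a top-down camera view but not vertical surfaces) and all names are hypothetical; the source does not specify the partitioning or the flatness metric.

```python
import math
from collections import defaultdict
from typing import Optional

Point = tuple[float, float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: list[Point]) -> Optional[tuple[float, float, float, float]]:
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c, rms) or None if degenerate."""
    if len(points) < 3:
        return None
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations (A^T A) [a, b, c]^T = A^T z, solved with Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = _det3(m)
    if abs(d) < 1e-12:
        return None  # e.g. all points collinear in x-y
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(_det3(mc) / d)
    a, b, c = coeffs
    rms = math.sqrt(sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points) / n)
    return a, b, c, rms


def flattest_partition(points: list[Point], cell_size: float,
                       min_points: int = 3) -> Optional[tuple[tuple[int, int], float]]:
    """Return (cell index, rms) of the grid cell whose points fit a plane best."""
    cells: dict[tuple[int, int], list[Point]] = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))].append(p)
    best = None
    for key, cell_points in cells.items():
        if len(cell_points) < min_points:
            continue
        fit = fit_plane(cell_points)
        if fit is not None and (best is None or fit[3] < best[1]):
            best = (key, fit[3])
    return best
```

The cell returned by `flattest_partition` would correspond to the "target partition" at which the intelligent grasping device makes contact.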

[0011] The method further includes: if the grasping result returned by the intelligent grasping device indicates that grasping of the target object failed and that the grasping attempt changed how the objects in the target container are stacked, controlling the 3D camera to perform a new single-frame scan of the target container, so that the target object can be grasped intelligently based on the new initial point cloud data and the new RGB image.
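The retry rule in [0011] can be sketched as a loop that rescans only when a failed attempt disturbed the stack. `scan` and `plan_and_grasp` are hypothetical callbacks standing in for the 3D camera and the grasping device, and `max_attempts` is an assumed limit. Retrying on the same frame when the stack is unchanged is also an assumption, since the text only covers the changed case.

```python
from typing import Callable, TypeVar

Frame = TypeVar("Frame")  # whatever one single-frame scan returns (point cloud + RGB image)


def grasp_with_rescan(scan: Callable[[], Frame],
                      plan_and_grasp: Callable[[Frame], tuple[bool, bool]],
                      max_attempts: int = 3) -> bool:
    """Try to grasp, rescanning after a failure that changed the stacking.

    plan_and_grasp returns (succeeded, stacking_changed) for one attempt.
    """
    frame = scan()
    for _ in range(max_attempts):
        succeeded, stacking_changed = plan_and_grasp(frame)
        if succeeded:
            return True
        if stacking_changed:
            # The old point cloud and RGB image no longer match the container.
            frame = scan()
    return False
```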

[0012] The system includes a work area for performing object grasping operations, and a single 3D camera is fixedly deployed in the work area, with the acquisition area of the 3D camera covering the work area. When it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, the 3D camera is controlled to perform a single-frame scan of the target container in the acquisition area.

[0013] Alternatively, the system includes a work area for performing object grasping operations, and a single 3D camera is fixed on an intelligent grasping device, which is moved to the work area so that the acquisition area of the 3D camera covers the work area. When it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, the 3D camera is controlled to perform a single-frame scan of the target container in the acquisition area.

[0014] An object grasping method, applied on a server side, is used to perform a product picking operation, via an intelligent grasping device, on target products piled up in a target container. The method includes: when it is determined that the target container has been placed in the picking operation area and a product picking operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; based on the predicted point cloud data, completing the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and, based on the completed point cloud data, controlling the intelligent grasping device to grasp the target product from the target container, thereby realizing intelligent picking of the target product.

[0015] An object grasping method, applied on a server side, is used to perform a product shelving operation, via an intelligent grasping device, on target products stacked in a target container. The method includes: when it is determined that the target container has been placed in the shelving operation area and a product shelving operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; based on the predicted point cloud data, completing the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and, based on the completed point cloud data, controlling the intelligent grasping device to grasp the target product from the target container and display it at the target location.

[0016] An object grasping method, applied on a server side, is used to perform a product packaging operation, via an intelligent grasping device, on target products stacked inside a target container. The method includes: when it is determined that the target container has been placed in the packaging operation area and a product packaging operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; based on the predicted point cloud data, completing the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and, based on the completed point cloud data, controlling the intelligent grasping device to grasp the target product from the target container and place it into the target packaging box.
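Paragraphs [0014] to [0016] share one perception pipeline and differ only in what happens after the grasp. The sketch below, using hypothetical step names, illustrates that structure; the source names no destination for picking, so none is added for it.

```python
from enum import Enum


class Operation(Enum):
    PICKING = "picking"
    SHELVING = "shelving"
    PACKAGING = "packaging"


# Steps shared by all three operations; names are illustrative only.
COMMON_STEPS = [
    "single_frame_scan",      # single deployed 3D camera: point cloud + RGB image
    "predict_point_cloud",    # 3D information prediction model on the RGB image
    "complete_point_cloud",   # fill gaps in the target product's initial point cloud
    "grasp_from_container",   # intelligent grasping device takes the product
]

# Operation-specific step after the grasp; None where the text names none.
FINAL_STEP = {
    Operation.PICKING: None,
    Operation.SHELVING: "display_at_target_location",
    Operation.PACKAGING: "place_into_target_packaging_box",
}


def plan_steps(operation: Operation) -> list[str]:
    """Return the ordered step names for one operation."""
    steps = list(COMMON_STEPS)  # copy so callers cannot mutate the shared list
    final = FINAL_STEP[operation]
    if final is not None:
        steps.append(final)
    return steps
```

Keeping the shared steps in one place mirrors the dynamic production line idea: only the final, unit-specific step changes between shelving, picking and packaging.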

[0017] A training method for a three-dimensional information prediction model, the method comprising: obtaining a sample RGB image and a sample 3D model of a preset object, where the sample RGB image includes image content generated by a single-frame scan of the preset object; constructing an initial model for 3D information prediction; training the initial model with the sample RGB image as input, so that the initial model performs image segmentation on the sample RGB image, predicts the three-dimensional information of the preset object, and outputs predicted point cloud data of the preset object; comparing the predicted point cloud data with reference point cloud data determined from the sample 3D model of the preset object, to obtain the prediction accuracy of the initial model during training; and verifying the model's performance based on the prediction accuracy and, if verification passes, obtaining the three-dimensional information prediction model, which is used to complete the point cloud data of objects collected by a single 3D camera.
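[0017] compares predicted point cloud data with reference point cloud data from the sample 3D model but does not name a metric. A common choice for comparing two point sets is the Chamfer distance; the symmetric mean-distance variant below, the acceptance threshold `max_chamfer` and the function names are assumptions for illustration, not the application's stated method. Other Chamfer variants use squared distances or sums instead of means.

```python
import math

Point = tuple[float, float, float]


def chamfer_distance(a: list[Point], b: list[Point]) -> float:
    """Symmetric Chamfer distance: average of the two mean nearest-neighbour distances."""
    if not a or not b:
        raise ValueError("point sets must be non-empty")
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return (a_to_b + b_to_a) / 2


def passes_validation(predicted: list[Point], reference: list[Point],
                      max_chamfer: float = 0.005) -> bool:
    """Hypothetical acceptance rule: the prediction is close enough to the 3D-model reference."""
    return chamfer_distance(predicted, reference) <= max_chamfer
```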

[0018] The method further includes: obtaining knowledge information related to the three-dimensional information of objects, and using it to optimize the three-dimensional information prediction model.

[0019] A point cloud data processing device, applied on a server side, the device comprising: a data acquisition unit, configured to acquire initial point cloud data and an RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where, when multiple objects are piled up inside the target container and a grasping operation needs to be performed on the target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; a model invocation unit, configured to take the RGB image as input and invoke the three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on it, and outputs predicted point cloud data of the target object; and a data completion unit, configured to complete, according to the predicted point cloud data, the partial point cloud data corresponding to the target object in the initial point cloud data, to obtain completed point cloud data of the target object, so that the target object can be grasped intelligently based on the completed point cloud data.

[0020] An object grasping device, applied on a server side, is used to perform grasping operations, via an intelligent grasping device, on target objects piled up inside a target container. The device includes: a data acquisition unit, configured to acquire initial point cloud data and an RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where, when multiple objects are piled up inside the target container and a grasping operation needs to be performed on the target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; a model invocation unit, configured to take the RGB image as input and invoke the three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on it, and outputs predicted point cloud data of the target object; a data completion unit, configured to complete, according to the predicted point cloud data, the partial point cloud data corresponding to the target object in the initial point cloud data, to obtain completed point cloud data of the target object; and a grasping control unit, configured to perform regional planar fitting on the surface of the target object based on the completed point cloud data, determine the target partition with the highest flatness, and control the intelligent grasping device to grasp the target object at the target partition.

[0021] An object grasping device, applied on a server side, is used to perform product picking operations, via an intelligent grasping device, on target products stacked in a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the picking operation area and a product picking operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data, to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container, thereby realizing intelligent picking of the target product.

[0022] An object grasping device, applied on a server side, is used to perform a product shelving operation, via an intelligent grasping device, on target products stacked in a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the shelving operation area and a product shelving operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data, to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container and display it at the target location.

[0023] An object grasping device, applied on a server side, is used to perform a product packaging operation, via an intelligent grasping device, on target products stacked inside a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the packaging operation area and a product packaging operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data, to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container and place it into the target packaging box.

[0024] A training device for a three-dimensional information prediction model, the device comprising: a sample data acquisition unit, configured to acquire a sample RGB image and a sample 3D model of a preset object, where the sample RGB image includes image content generated by a single-frame scan of the preset object; an initial model building unit, configured to build an initial model for 3D information prediction; a model training unit, configured to train the initial model with the sample RGB image as input, so that the initial model performs image segmentation on the sample RGB image, predicts the three-dimensional information of the preset object, and outputs predicted point cloud data of the preset object; a data comparison unit, configured to compare the predicted point cloud data with reference point cloud data determined from the sample 3D model of the preset object, to obtain the prediction accuracy of the initial model during training; and a model verification unit, configured to verify the model's performance based on the prediction accuracy and, if verification passes, obtain the three-dimensional information prediction model, which is used to complete the point cloud data of objects collected by a single 3D camera.

[0025] A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of any of the preceding methods.

[0026] An electronic device, comprising: one or more processors; and a memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the steps of any of the preceding methods.

[0027] A computer program product includes a computer program / computer executable instructions that, when executed by a processor in an electronic device, implement the steps of any of the preceding methods.

[0028] According to the specific embodiments provided in this application, the following technical effects are disclosed: the embodiments obtain the initial point cloud data and the RGB image that a single deployed 3D camera generates simultaneously in one single-frame scan. Both are outputs of scanning the same target container and the objects stacked in it. Moreover, the 2D RGB image is not derived from the returned probe beam, so it is unaffected by the beam reflection failures that leave holes in the depth data. On this basis, a scheme is proposed that infers predicted point cloud data of the target object from the RGB image and uses it to fill in the missing parts of the initial point cloud data, effectively solving the problem of severely missing point cloud data. When objects are then grasped intelligently based on the completed point cloud data, the accuracy of pose estimation also improves, enabling precise object grasping. The scheme adapts well to dynamic production line environments and effectively solves the problem of insufficient point cloud data from a single scan by a single 3D camera.

[0029] Of course, a product implementing this application does not necessarily need to achieve all of the above advantages at the same time.

Attached Figure Description

[0030] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a flowchart of a point cloud data processing method provided in an embodiment of this application; Figure 2 is a flowchart of an object grasping method provided in an embodiment of this application; Figure 3 is a schematic diagram of a point cloud data processing device provided in an embodiment of this application; Figure 4 is a schematic diagram of an object grasping device provided in an embodiment of this application; Figure 5 is a schematic diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0032] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0033] In intelligent warehouse systems, to improve operational efficiency, a 3D vision-guided object grasping solution is proposed. Intelligent grasping devices with end effectors, such as intelligent robotic arms, can grasp target objects from containers based on predicted poses, effectively improving operational efficiency through precise object grasping.

[0034] When a target object needs to be retrieved from a container and it is partially occluded by other objects piled up inside the container, surface point cloud data of the occluded area may be impossible to collect, resulting in missing point cloud data. Occlusion may also change the illumination angle on the surface of the target object, producing reflections on the object surface that cause beam reflection failure, which likewise leads to missing point cloud data. Occlusion may also cast shadows on the target object's surface where lighting is insufficient or absent, so that the depth of the shadowed area cannot be identified reliably and the point cloud data for that area is missing.

[0035] Here, beam reflection failure refers to the situation where the probe beam emitted by the 3D camera does not return to the receiving module along the expected path, so the camera cannot calculate a valid depth value for the measured surface. In the point cloud data this ultimately appears as areas with no points, missing points, or holes. Causes include: ① specular reflection, where the object surface reflects the probe beam in a single direction out of the receiving field of view, so the camera receives no echo; ② strong light scattering, where the object surface scatters the probe beam into irregular diffuse light, so the echo signal is weak and dispersed and the camera cannot identify a valid ranging signal; ③ overexposure, where strong light (such as strong ambient light or strong light emitted by the object) saturates the photosensitive element of the camera's receiving module, causing signal overflow and distortion so that valid reflected light cannot be distinguished from interference, and the camera's output depth value becomes invalid.
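Because beam reflection failure shows up as pixels without a valid depth value, the share of missing points for one object can be estimated from the depth map and the object's mask, which is one way to obtain the point cloud missing information used when choosing among identical objects. A minimal sketch, assuming the depth map is registered to the RGB image and invalid depth is encoded as 0, a negative value, NaN or infinity (encodings differ between cameras); `missing_ratio` is a hypothetical name.

```python
import math


def missing_ratio(depth: list[list[float]], mask: list[list[bool]]) -> float:
    """Fraction of pixels inside the object mask whose depth value is invalid."""
    total = missing = 0
    for depth_row, mask_row in zip(depth, mask):
        for d, inside in zip(depth_row, mask_row):
            if not inside:
                continue
            total += 1
            # Assumed invalid-depth encodings: NaN, +/-inf, zero or negative.
            if not math.isfinite(d) or d <= 0:
                missing += 1
    if total == 0:
        raise ValueError("mask selects no pixels")
    return missing / total
```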

[0036] In addition, if the surface of the target object is made of a special material with high absorption or high reflectivity, the point cloud data scanned by the 3D camera may be missing over large areas because of the material itself.

[0037] Highly absorbent materials are those that strongly absorb active light sources (such as structured light and lasers). Examples include black plastic, matte coatings, and carbon fiber, which absorb much of the incident light and reflect little. If the surface of the target object is highly absorbent, it absorbs a large share of the projected infrared or visible light stripes, so the reflected signal received by the camera is extremely weak and a point cloud "black hole" region forms. For example, in retail scenarios, when intelligently grasping products such as fresh produce in matte black boxes or dark-colored snack bags, the highly absorbent material on the product's outer surface may lead to severely missing point cloud data.

[0038] Highly reflective materials are those that produce strong specular or diffuse reflection of incident light. Examples include glass, stainless steel, electroplated parts, and polished metal surfaces, which reflect strongly and tend to reflect light specularly. If the surface of a target object is highly reflective, it can easily cause saturation of the photosensitive element, ghosting, or mismatching, resulting in incorrect or missing depth values. For example, in retail scenarios, when intelligently grasping products such as beverage bottles, laminated packaging, and metal cans, the high reflectivity of the product's outer surface can lead to severely missing point cloud data.

[0039] To address missing point cloud data, existing solutions mostly combine hardware-level lighting optimization with multi-angle scanning to minimize the extent of missing points. However, this approach is not suitable for dynamic production line environments, where deploying multiple 3D cameras at different angles is impractical: there is typically only one top-down acquisition perspective, so object point cloud data cannot be scanned from multiple angles.

[0040] A dynamic production line is a flexible and reconfigurable production line system that can quickly adjust production processes and resource allocation according to changes in production tasks. In a dynamic production line environment, the production line is typically divided into multiple independent functional units. For example, in a new retail scenario, it can be divided into functional units such as shelving, picking, and packaging, with goods flowing between these units. When production demands change, only the connection relationships and process parameters of each unit need to be configured, without the need for large-scale hardware modifications.

[0041] As mentioned above, effectively solving the problem of insufficient point cloud data in a single scan is a key technical point for accurate object grasping.

[0042] In response, this application provides a point cloud data completion scheme to effectively solve the problem of insufficient point cloud data in a single scan. When this scheme is applied to intelligent object grasping, even in cases of severe point cloud deficiencies, it can accurately identify the target object to be grasped and plan a stable grasping strategy.

[0043] The implementation process of the point cloud data processing method in the embodiments of this application is explained below with specific examples. Referring to the flowchart shown in Figure 1, the method may include: S101: Obtain the initial point cloud data and RGB image output by a single deployed 3D camera performing a single-frame scan of the target container, where multiple objects are stacked inside the target container and, when a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan.

[0044] To improve the operational flexibility of smart warehouse systems, dynamic production lines are typically deployed for tasks such as shelving, picking, and packaging goods. This allows the production process to be adjusted rapidly when production tasks change, meeting the dynamic adjustment needs of warehouse systems, especially in new retail scenarios. However, constrained by the dynamic production line environment, automated object grasping with an intelligent grasping device can typically deploy only a single 3D camera, which uses single-frame scanning to keep scanning and grasping efficient. This single-camera, single-frame scanning scenario is prone to insufficient point cloud data due to factors such as lighting, occlusion, and special surface materials of the object, which affects the accuracy of intelligent grasping.

[0045] Accordingly, this application provides a point cloud data completion solution for single-frame scanning, which suits dynamic production line environments where only a single 3D camera can be deployed.

[0046] As an example, the point cloud data completion solution in this application can be applied to the server side. Specifically, the server can be deployed on a cloud server or, depending on usage requirements, on an edge computing device, ensuring both the integrity of the point cloud data and the efficiency of the completion processing.

[0047] Typically, a target container can hold multiple objects. When a grasping operation is required on a target object within the container, the container can be placed within the acquisition area of a 3D camera. The server then controls the 3D camera to perform a single-frame scan based on the grasping requirements, outputting the initial point cloud data and RGB image of the target container and its contents. The target container can be a cargo box (also called a material bin), a goods box, a turnover box, a pallet, or any other structure capable of accommodating and holding multiple objects; the embodiments of this application do not limit the form of the target container. The objects stacked within the target container can take different forms depending on the application scenario of the grasping scheme, for example, goods in a new retail scenario or parts used in equipment manufacturing, and the embodiments of this application do not limit this either.

[0048] S102: Using the RGB image as input, call the three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs the predicted point cloud data of the target object.

[0049] Based on the above analysis of the causes of missing point cloud data, this application proposes predicting 3D information from 2D RGB images and then using the predicted point cloud data for data completion. This is because 2D images are not acquired through beam reflection, as laser or structured-light depth sensing is, and therefore do not suffer from missing point clouds. Thus, this application trains a 3D information prediction model that takes a 2D image as input, performs image segmentation and 3D information prediction, and infers the predicted point cloud data of the target object from the 2D image. Here, an RGB image refers to a color image composed of the three primary color channels red, green, and blue.

[0050] As an example, embodiments of this application may provide a method for training a three-dimensional information prediction model, including: First, a sample RGB image and a sample 3D model of a preset object can be obtained. The sample RGB image includes image content generated by single-frame scanning of the preset object.

[0051] Taking the new retail scenario as an example, commonly used products can be identified as preset objects, RGB images of commonly used products can be collected as sample images, and commonly used products can be pre-modeled to obtain 3D models of products presented from different angles.

[0052] Secondly, an initial model can be constructed for predicting three-dimensional information.

[0053] As an example, the initial model can be a traditional model, leveraging its fast response and high accuracy in a specific vertical domain to perform 3D information reasoning; alternatively, the initial model can be an artificial intelligence (AI) model. Traditional models can be built on mathematical equations and predefined rules and operate primarily through rule mapping; they have relatively few parameters and offer deterministic output and strong interpretability. AI models, by contrast, can be deep learning models with a very large number of parameters, which allows them to store and process large amounts of information and thereby achieve higher performance across various tasks.

[0054] Next, the initial model can be trained using the sample RGB image as input, so that the initial model can perform image segmentation on the sample RGB image, predict the three-dimensional information of the preset object, and output the predicted point cloud data of the preset object.

[0055] In other words, the model in this application has image segmentation capability and three-dimensional information reasoning capability. It can perform image segmentation on 2D images to determine the region where a preset object is located, and then perform model reasoning to obtain the predicted point cloud data of the preset object.

[0056] In one example, the preset object can be processed selectively: the preset object is segmented from the sample RGB image, and 3D information prediction is performed only on the region where it is located. Correspondingly, the prediction result output by the model can include only the predicted point cloud data of the preset object. In another example, all objects in the sample RGB image can be segmented. After the regions where the different objects are located are determined, 3D information prediction can be performed on the object within each region. Correspondingly, the prediction result output by the model can include the predicted point cloud data of the preset object as well as that of the other segmented objects.

[0057] As an example, the model can infer and predict the depth information of an object by performing 3D information prediction. By combining the predicted depth information, the RGB image, and the intrinsic parameters of the 3D camera, the predicted point cloud data of the object can be obtained. For implementation details, refer to the related art. Here, depth information refers to the distance from an object in the scene to the camera.
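As a non-authoritative illustration of how predicted depth, the RGB image, and the camera intrinsics can be combined, the following Python sketch back-projects a depth map through the standard pinhole model. It assumes that the depth map is pixel-aligned with the RGB image, that depth is in metres with non-positive or NaN values marking invalid pixels, and that `fx`, `fy`, `cx`, `cy` are the focal lengths and principal point from the 3D camera's intrinsic parameters. The function name `depth_to_colored_points` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


def depth_to_colored_points(
    depth: Sequence[Sequence[float]],
    rgb: Sequence[Sequence[Color]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> Tuple[List[Point], List[Color]]:
    """Back-project a pixel-aligned depth map into camera-frame 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, where (u, v)
    is the pixel column/row and Z is the depth. Each point keeps the RGB color
    of its pixel. Pixels with invalid depth (<= 0 or NaN) or outside `mask`
    (e.g. the segmented target object) are skipped.
    """
    points: List[Point] = []
    colors: List[Color] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not z > 0:  # also rejects NaN, since NaN > 0 is False
                continue
            if mask is not None and not mask[v][u]:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
            colors.append(rgb[v][u])
    return points, colors
```

In this sketch, passing the target object's segmentation region as `mask` yields only that object's predicted points.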

[0058] Finally, the predicted point cloud data is compared with the reference point cloud data determined from the sample 3D model of the preset object to obtain the prediction accuracy of the initial model during training, and the model's performance is verified based on that prediction accuracy. If the verification succeeds, the 3D information prediction model is obtained, which can subsequently be used to complete the object point cloud data collected by a single 3D camera.
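The source does not specify how prediction accuracy is measured. As one hypothetical choice, the sketch below compares a predicted point cloud with reference points taken from the sample 3D model using two common point cloud metrics: a nearest-neighbour F-score at a distance tolerance `tau` (precision is the share of predicted points near the reference, recall is the share of reference points covered by the prediction) and the symmetric Chamfer distance. The brute-force nearest-neighbour search only suits small illustrative inputs; a spatial index such as a k-d tree would normally replace it.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance from each point in `src` to its nearest neighbour in `dst` (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions, summed."""
    d_pred = nearest_distances(pred, ref)
    d_ref = nearest_distances(ref, pred)
    return sum(d_pred) / len(d_pred) + sum(d_ref) / len(d_ref)


def f_score(pred: Sequence[Point], ref: Sequence[Point], tau: float) -> float:
    """Harmonic mean of precision and recall at distance threshold `tau`."""
    precision = sum(d <= tau for d in nearest_distances(pred, ref)) / len(pred)
    recall = sum(d <= tau for d in nearest_distances(ref, pred)) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A higher F-score or a lower Chamfer distance would indicate that the predicted point cloud matches the reference more closely.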

[0059] In other words, the embodiments of this application can extract reference point cloud data from the pre-built 3D model of a preset object and compare it with the predicted point cloud data inferred by the model to obtain the prediction accuracy during training, then verify the model's performance accordingly. If the verification passes, for example, if the prediction accuracy is not lower than a preset value or no longer increases, the three-dimensional information prediction model of the embodiments of this application is obtained.
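The two verification conditions mentioned above (accuracy not lower than a preset value, or accuracy no longer increasing) can be sketched as a simple check over the per-round accuracy history. The `patience` and `min_delta` parameters, which define "no longer increases" as no improvement of at least `min_delta` over the best earlier value within the last `patience` rounds, are illustrative assumptions rather than values from the source.

```python
from typing import Sequence


def training_converged(
    accuracy_history: Sequence[float],
    target_accuracy: float,
    patience: int = 3,
    min_delta: float = 1e-4,
) -> bool:
    """Return True when training can stop.

    Stops if the latest accuracy reaches `target_accuracy`, or if the best
    accuracy in the last `patience` rounds improves on the best earlier
    accuracy by less than `min_delta` (i.e. accuracy has plateaued).
    """
    if not accuracy_history:
        return False
    if accuracy_history[-1] >= target_accuracy:
        return True
    if len(accuracy_history) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(accuracy_history[:-patience])
    best_recent = max(accuracy_history[-patience:])
    return best_recent < best_before + min_delta
```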

[0060] Optionally, knowledge information related to the object's three-dimensional information can also be obtained, such as geometric information representing the object's shape, structure, and size in three-dimensional space. Based on this, the three-dimensional information prediction model can be optimized, which helps improve the model's generalization ability and increase its prediction accuracy for unknown objects. This is because there are many categories of objects in real-world applications, and objects within the same category may have various different specifications. Taking goods in new retail applications as an example, goods such as bottled water have many specifications. Considering the time and manpower costs of pre-modeling, it is impossible to pre-model objects of all specifications separately. Furthermore, for goods such as fruits and vegetables, the varying sizes and lengths lead to uncertainties in specification information, making it difficult to pre-model objects of all specifications separately from a pre-modeling feasibility perspective. Therefore, this application embodiment, through model training and optimization, enables the three-dimensional information prediction model to predict the three-dimensional information of objects relatively accurately based on 2D images, better covering data completion scenarios for unknown objects.

[0061] Thus, once the server determines that it needs to perform a grabbing operation on the target object in the target container and obtains the RGB image of a single frame scan by the 3D camera, it can call the 3D information prediction model. The model can then segment the target object from the RGB image and perform 3D information prediction on the target object, outputting the predicted point cloud data of the target object.

[0062] Regarding the target object, different methods can be used in different intelligent grasping scenarios to determine the target object from multiple objects stacked in the target container.

[0063] In one grasping scenario, multiple objects stacked inside a target container can be the same object.

[0064] Correspondingly, any one of the objects can be randomly selected as the target object for point cloud data completion. Alternatively, the target object can be chosen according to how much point cloud data is missing for each of the identical objects in the initial point cloud data, taking the object with the fewest missing points. For example, after the initial point cloud data is obtained from a single-frame scan by the 3D camera, missing-area detection can be performed to identify an object with no or minimal missing point cloud data as the target object, thereby determining the coordinate region of the target object. Because the RGB image and the initial point cloud data come from the same single-frame scan by the 3D camera, their coordinates are consistent. Therefore, the 3D information prediction model can either perform targeted 3D information prediction for the target object within that coordinate region and output the predicted point cloud data of the target object for subsequent completion processing, or perform 3D information prediction separately for all scanned objects and output the predicted point cloud data of multiple objects, after which the predicted point cloud data of the target object is selected based on the coordinate region.
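A minimal sketch of the "fewest missing points" selection, assuming that each candidate object has a pixel mask aligned with the organized initial depth map (the alignment follows from the RGB image and point cloud coming from the same single-frame scan) and that missing depth is stored as a non-positive value or NaN. The object identifiers and helper names are hypothetical.

```python
from typing import Mapping, Sequence

Grid = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]


def missing_ratio(depth: Grid, mask: Mask) -> float:
    """Fraction of the object's mask pixels whose depth value is missing."""
    total = missing = 0
    for depth_row, mask_row in zip(depth, mask):
        for z, inside in zip(depth_row, mask_row):
            if inside:
                total += 1
                if not z > 0:  # non-positive or NaN depth counts as missing
                    missing += 1
    return missing / total if total else 1.0  # empty mask: treat as fully missing


def select_target(depth: Grid, masks: Mapping[str, Mask]) -> str:
    """Pick the object id whose point cloud is least incomplete."""
    return min(masks, key=lambda obj_id: missing_ratio(depth, masks[obj_id]))
```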

[0065] In another grasping scenario, the multiple objects stacked inside the target container can be different objects.

[0066] Correspondingly, the target object can be determined from multiple objects based on the grasping task. Specifically, upon receiving a request to perform a grasping operation on the target object in the target container, an object detection model can be invoked. This model identifies the target object from the RGB image to complete the point cloud data of the target object. In one example, the object detection model can be invoked first to identify the target object, and then a 3D information prediction model can be invoked to perform targeted 3D information prediction for the target object. In another example, the 3D information prediction model can be invoked first to perform 3D information prediction for all scanned objects, and then the object detection model can be invoked to identify the target object. The predicted point cloud data of the target object can then be determined from the predicted point cloud data of multiple objects output by the model.

[0067] S103: Based on the predicted point cloud data, the partial point cloud data corresponding to the target object in the initial point cloud data is completed to obtain the completed point cloud data of the target object, so as to realize intelligent grasping of the target object based on the completed point cloud data.

[0068] After obtaining the predicted point cloud data of the target object, coordinate alignment can be performed. A portion of the target object's point cloud data is determined from the initial point cloud data, and the predicted point cloud data is used to complete it. Specifically, the predicted point cloud data is used to fill in the missing positions in the point cloud data, obtaining the completed point cloud data of the target object. This completed point cloud data can then be used for pose estimation of the intelligent grasping device to achieve intelligent grasping of the target object.
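The source leaves the alignment and filling details open. One hypothetical way to realize them, assuming that both the initial data and the prediction are stored as pixel-aligned depth maps, is to fit a least-squares scale and shift that maps predicted depth onto observed depth at pixels where both are valid inside the target mask, and then fill only the missing pixels with the aligned prediction. Scale-and-shift alignment is a common choice when predicted depth may be offset or scaled relative to the sensor; it is an assumption here, not the method claimed in the application.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def _valid(z: float) -> bool:
    return z > 0  # False for zero, negative, and NaN depth


def fit_scale_shift(pred: Sequence[float], obs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (s, t) minimising sum((s * p + t - o) ** 2)."""
    n = len(pred)
    if n == 0:
        return 1.0, 0.0  # nothing to align against: keep the prediction as-is
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        return 1.0, mean_o - mean_p  # flat prediction: only a shift is identifiable
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    s = cov / var_p
    return s, mean_o - s * mean_p


def complete_depth(initial: Grid, predicted: Grid, mask: Sequence[Sequence[bool]]) -> List[List[float]]:
    """Fill missing depth inside `mask` with scale/shift-aligned predicted depth."""
    pairs = [
        (p, o)
        for i_row, p_row, m_row in zip(initial, predicted, mask)
        for o, p, m in zip(i_row, p_row, m_row)
        if m and _valid(o) and _valid(p)
    ]
    s, t = fit_scale_shift([p for p, _ in pairs], [o for _, o in pairs])
    completed = [list(row) for row in initial]  # copy; observed values are kept
    for v, (p_row, m_row) in enumerate(zip(predicted, mask)):
        for u, (p, m) in enumerate(zip(p_row, m_row)):
            if m and not _valid(completed[v][u]) and _valid(p):
                completed[v][u] = s * p + t
    return completed
```

Under these assumptions, the completed depth map can then be back-projected with the camera intrinsics to obtain the completed point cloud of the target object.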

[0069] Optionally, if a 3D model of a known object is pre-saved, after obtaining the predicted point cloud data output by the 3D information prediction model, the point cloud data used for completion processing can be selected based on the confidence information corresponding to the predicted point cloud data. Specifically, the confidence information of the model's 3D information prediction of the target object can be obtained. If the confidence is not less than a preset value, data completion can be achieved based on the predicted point cloud data; if the confidence is less than the preset value, and a 3D model of the target object is pre-built, data completion can be achieved based on the 3D model of the target object. In other words, the 3D model of the target object can be registered with the corresponding part of the point cloud data of the target object in the initial point cloud data, and the missing positions of the point cloud data can be filled in according to the 3D model to obtain the point cloud data of the target object after completion.
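The confidence-based choice of completion source can be expressed as a small decision function. The names are hypothetical, and the branch for low confidence with no pre-built 3D model is not specified in the source; this sketch simply reports that case so the caller can decide how to proceed.

```python
from enum import Enum


class CompletionSource(Enum):
    PREDICTED_POINT_CLOUD = "predicted_point_cloud"
    PREBUILT_3D_MODEL = "prebuilt_3d_model"
    UNSPECIFIED = "unspecified"  # low confidence and no 3D model: not covered by the source


def choose_completion_source(confidence: float, threshold: float, has_prebuilt_model: bool) -> CompletionSource:
    """Pick the data used to fill the target object's missing points."""
    if confidence >= threshold:  # "not less than the preset value"
        return CompletionSource.PREDICTED_POINT_CLOUD
    if has_prebuilt_model:
        # Register the 3D model with the observed partial cloud, then fill from the model.
        return CompletionSource.PREBUILT_3D_MODEL
    return CompletionSource.UNSPECIFIED
```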

[0070] Based on the point cloud data completion scheme described above, this application embodiment can also provide an intelligent object grasping scheme, which performs pose estimation based on the point cloud data after completion processing, which helps to achieve accurate object grasping.

[0071] The implementation process of the object grasping method in the embodiments of this application is explained below with specific examples. Referring to the flowchart shown in Figure 2, the method may include: S201: Obtain the initial point cloud data and RGB image output by a single deployed 3D camera performing a single-frame scan of the target container, where multiple objects are stacked inside the target container and, when a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan.

[0072] S202: Using the RGB image as input, call the three-dimensional information prediction model, which segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs the predicted point cloud data of the target object.

[0073] S203: Based on the predicted point cloud data, perform completion processing on the partial point cloud data corresponding to the target object in the initial point cloud data to obtain the point cloud data of the target object after completion.

[0074] For the specific implementation of steps S201 to S203 in this embodiment, refer to the description of S101 to S103 above; it is not repeated here.

[0075] It should be noted that in the intelligent object grasping solution of this application, the single 3D camera can be deployed in different ways according to usage requirements.

[0076] In one approach, a work area for object grasping operations can be defined, and a single 3D camera can be fixedly deployed within this work area so that the camera's acquisition area covers the work area, either the entire work area or only the portion used to place the container. The goal is that, when the target container is placed within the work area, a single-frame scan of the acquisition area by the 3D camera captures the initial point cloud data and RGB image of the target container and the multiple objects stacked within it. Thus, when the server determines that a target container is placed within the work area and a grasping operation is needed for the target object within the container, it can control the 3D camera to perform a single-frame scan of the target container within its acquisition area.

[0077] In another approach, a work area for object grasping operations can be defined, and a single 3D camera can be fixedly mounted on the intelligent grasping device so that it moves flexibly with the device. When object grasping is required, the target container can be placed within the work area, and the intelligent grasping device can be controlled to move to the work area so that the acquisition area of the 3D camera mounted on it covers the work area. Thus, when the server determines that a target container is placed within the work area and a grasping operation is needed for the target object within the container, it can control the 3D camera mounted on the device to perform a single-frame scan of the target container within the acquisition area.

[0078] This application does not impose specific limitations on the deployment method of the 3D camera, as long as the 3D camera has a fixed and stable acquisition area when performing single-frame scanning.

[0079] S204: Based on the completed point cloud data, perform regional planar fitting on the surface of the target object to determine the target partition with the highest flatness, and control the intelligent grasping device to grasp the target object through the target partition.

[0080] After the completed point cloud data of the target object is obtained, regional plane fitting can be performed on the surface of the target object to determine the plane through which the intelligent grasping device grasps the target object.

[0081] As an example, a fixed-size sliding window can be preset according to usage requirements to divide the surface of the target object into multiple partitions. Plane fitting is then performed on each partition to determine the flattest target partition, and the intelligent grasping device is controlled to grasp the target object based on this target partition. For instance, the variance of the distances from the points within a partition to its fitted plane can be calculated, and the flatness of the partition can be determined from that variance: the smaller the variance, the flatter the partition. The partition with the smallest variance can be determined as the target partition, and pose estimation can be performed based on it. Finding the flattest partition on the surface of the target object through plane fitting in this way accommodates irregularly shaped objects well and enables accurate grasping of them.
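A minimal sketch of the sliding-window plane fitting and variance criterion, under stated assumptions: the completed points are in a camera frame whose z axis points away from a top-down camera, so each partition can be fitted with a least-squares plane `z = a*x + b*y + c`; the window tiles the x-y plane into non-overlapping square cells of side `window`; and `min_points` (a hypothetical parameter) discards cells too sparse to fit reliably. Flatness is scored by the population variance of the signed point-to-plane distances, and the cell with the smallest variance is returned as the target partition.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Plane = Tuple[float, float, float]  # coefficients (a, b, c) of z = a*x + b*y + c


class Partition(NamedTuple):
    cell: Tuple[int, int]
    variance: float
    plane: Plane
    points: List[Point]


def fit_plane(points: Sequence[Point]) -> Optional[Plane]:
    """Least-squares plane z = a*x + b*y + c, or None if the fit is degenerate."""
    n = len(points)
    if n < 3:
        return None
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sum((x - mx) ** 2 for x, _, _ in points)
    syy = sum((y - my) ** 2 for _, y, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y, _ in points)
    sxz = sum((x - mx) * (z - mz) for x, _, z in points)
    syz = sum((y - my) * (z - mz) for _, y, z in points)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # points (nearly) collinear in x-y
        return None
    a = (sxz * syy - syz * sxy) / det  # Cramer's rule on the centred normal equations
    b = (syz * sxx - sxz * sxy) / det
    return a, b, mz - a * mx - b * my


def flattest_partition(points: Sequence[Point], window: float, min_points: int = 10) -> Optional[Partition]:
    """Tile x-y into window-sized cells; return the cell whose points have the
    smallest variance of distance to their fitted plane."""
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / window), math.floor(p[1] / window))].append(p)
    best: Optional[Partition] = None
    for cell, pts in cells.items():
        if len(pts) < min_points:
            continue
        plane = fit_plane(pts)
        if plane is None:
            continue
        a, b, c = plane
        norm = math.sqrt(a * a + b * b + 1.0)
        distances = [(a * x + b * y + c - z) / norm for x, y, z in pts]
        variance = statistics.pvariance(distances)
        if best is None or variance < best.variance:
            best = Partition(cell, variance, plane, pts)
    return best
```

Non-overlapping cells keep the sketch short; an overlapping (strided) window would follow the same pattern with more candidate partitions.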

[0082] In this embodiment of the application, the intelligent grasping device is equipped with an end effector, which can be used to grasp objects.

[0083] In one example, the intelligent grasping device can grasp target objects by vacuum suction. The corresponding end effector can be a vacuum suction cup, which suits goods with flat surfaces, such as cardboard boxes, mask boxes, and carbon fiber boxes. Accordingly, the center of the target partition can be used as the suction center, and the normal vector of the fitted plane can be used as the suction direction to form the suction pose. The movement of the end effector is controlled according to the suction pose to achieve intelligent grasping of the target object through vacuum suction.
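For the vacuum-suction case, the suction center and direction can be derived from the target partition and its fitted plane. The sketch assumes the plane is given as coefficients `(a, b, c)` of `z = a*x + b*y + c` in a camera frame whose +z axis points from a top-down camera into the scene; under that assumption the unit normal `(a, b, -1) / norm` faces back toward the camera, and the suction cup approaches along the opposite direction. The function name and return convention are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def suction_pose(points: Sequence[Vec3], plane: Tuple[float, float, float]) -> Tuple[Vec3, Vec3, Vec3]:
    """Return (suction_center, surface_normal, approach_direction).

    The center is the partition centroid's (x, y) projected onto the fitted
    plane; the normal points toward the camera (negative z); the approach
    direction is the opposite of the normal, i.e. into the surface.
    """
    a, b, c = plane
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    center = (mean_x, mean_y, a * mean_x + b * mean_y + c)
    norm = math.sqrt(a * a + b * b + 1.0)
    normal = (a / norm, b / norm, -1.0 / norm)
    approach = (-normal[0], -normal[1], -normal[2])
    return center, normal, approach
```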

[0084] In another example, the intelligent grasping device can grasp target objects by clamping, and the corresponding end effector can be a gripper suited to rigid goods with edges or gripping surfaces, such as aluminum cans, metal boxes, and toiletry bottles.

[0085] In another example, the intelligent grasping device can grasp target objects by hooking or lifting, and the corresponding end effector can be a hook or a pallet, suited to soft bags or irregularly shaped goods such as snack bags and fresh food bags.

[0086] This application does not impose specific limitations on the grasping method of the intelligent grasping device, the form of the end effector, or the types of goods it handles. In a dynamic production line environment, these can be flexibly adjusted and configured according to usage requirements.

[0087] Optionally, embodiments of this application can also monitor the grasping results. If the grasping result returned by the intelligent grasping device indicates that the target object has been successfully grasped, the intelligent grasping of other objects can continue or the intelligent grasping process can be terminated according to the operational requirements. If the grasping result returned by the intelligent grasping device indicates that the target object has failed to be grasped, and the grasping process has caused a change in the stacking method of objects in the target container, the 3D camera can be controlled to perform a single-frame scan of the target container again, so as to re-complete the point cloud data based on the new initial point cloud data and the new RGB image, thereby realizing the intelligent grasping of the target object.
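The monitoring-and-rescan behavior can be sketched as a bounded retry loop. The three callables (`scan`, `plan`, `execute`) stand in for the 3D camera scan, point cloud completion plus pose estimation, and the grasping device; they, the `GraspResult` fields, and the `max_attempts` bound are hypothetical. The source only describes rescanning when a failed grasp has changed the stacking, so this sketch stops and reports failure when the stacking is unchanged.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

Frame = TypeVar("Frame")
Pose = TypeVar("Pose")


@dataclass
class GraspResult:
    success: bool
    stacking_changed: bool  # did the failed attempt disturb the objects in the container?


def grasp_with_rescan(
    scan: Callable[[], Frame],
    plan: Callable[[Frame], Pose],
    execute: Callable[[Pose], GraspResult],
    max_attempts: int = 3,
) -> bool:
    """Scan, complete and plan, grasp; rescan only after a failure that changed the stacking."""
    for _ in range(max_attempts):
        frame = scan()          # new initial point cloud + RGB image
        pose = plan(frame)      # completion and pose estimation on the new frame
        result = execute(pose)  # grasp attempt reported by the device
        if result.success:
            return True
        if not result.stacking_changed:
            return False        # scene unchanged: a rescan would likely give the same data
    return False
```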

[0088] This is because, when object grasping is based on the completed point cloud data, the end effector will most likely still make contact with the target object even if the grasp fails. For example, it may touch the target object without grasping it, or it may grasp the target object only for the object to fall back into the target container during the grasping process. Such situations may change the stacking of objects in the target container, and with it the occlusion of the target object, the lighting conditions, and so on. As a result, the new initial point cloud data obtained by rescanning may no longer have missing points, or may have fewer. Performing data completion again on this better point cloud data helps achieve accurate grasping of the target object.

[0089] As described above, the intelligent object grasping solution of this application embodiment can be applied to intelligent store and warehouse systems, especially physical stores, forward warehouses, central warehouses and other areas in the new retail scenario. It can automate the grasping of goods during operations such as picking, shelving and packaging, which helps to improve operational efficiency and grasping accuracy.

[0090] As an example, the implementation process of the intelligent object grasping in this application can be explained by taking the example of performing a product picking operation on the target products stacked in the target container using an intelligent grasping device.

[0091] First, when the server determines that the target container is placed in the picking area and that a product picking operation needs to be performed on the target product in the target container, it can control a single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container. As described above, the single 3D camera can be fixedly deployed in the picking area and bound to the picking area, or fixedly mounted on the intelligent gripping device and bound to the intelligent gripping device. Taking a camera fixedly deployed in the picking area as an example, when the server receives a picking task for the target product in the target container, it can determine the picking area where the target container is placed, send a scanning command to the 3D camera bound to that area, and control the 3D camera to perform a single-frame scan within its acquisition area to obtain the initial point cloud data and RGB image of the target container and the multiple products stacked inside it. It is understood that in the embodiments of this application, a single 3D camera typically performs a single-frame scan from a single perspective, such as acquiring object point cloud data from a top-down perspective.

[0092] Secondly, a 3D information prediction model can be invoked to perform image segmentation and 3D information prediction on the RGB image, outputting the predicted point cloud data of the target product. Then, based on the predicted point cloud data, the part of the initial point cloud data corresponding to the target product is completed to obtain the completed point cloud data of the target product. For the specific completion process, refer to the description above; it is not repeated here.

[0093] Finally, pose estimation can be performed based on the completed point cloud data, and the calculated pose information can be used to control the intelligent gripping device to pick up the target product from the target container, thus achieving intelligent picking of the target product. As an example, the intelligent gripping device can pick up the target product and place it on a conveyor belt, which will then transport the product to the corresponding compartment, completing the product sorting operation. Alternatively, the intelligent gripping device can pick up the target product and place it in a sorting tote. This application embodiment does not specifically limit the operations performed after the device picks up the target product.

[0094] As another example, the implementation process of the intelligent object grasping in this application can be explained by taking the example of performing a product shelving operation on the target goods piled up in the target container using an intelligent grasping device.

[0095] First, when the server determines that the target container has been placed in the shelving operation area and that a shelving operation needs to be performed on the target product in the target container, it controls the single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container.

[0096] Secondly, a 3D information prediction model can be invoked to perform image segmentation and 3D information prediction on the RGB image, outputting the predicted point cloud data of the target product. Then, based on the predicted point cloud data, the corresponding part of the point cloud data of the target product in the initial point cloud data is completed to obtain the completed point cloud data of the target product.

[0097] For the specific implementation of the above steps, refer to the description above; it is not repeated here.

[0098] Finally, pose estimation can be performed based on the completed point cloud data, and the calculated pose information can be used to control the intelligent grasping device to grasp the target product from the target container and place it at the target location. The target location can be a storage location on a shelf in the back area of a warehouse or store, meaning the intelligent grasping device performs the shelving operation in the back area; alternatively, it can be a storage location on a shelf in the front area of a store, meaning the device performs the shelving operation in the front area. The embodiments of this application do not specifically limit the operating area for the shelving operation.

[0099] As another example, the implementation process of the intelligent object grasping in this application can be explained by taking the example of performing a product packaging operation on the target objects stacked in the target container using an intelligent grasping device.

[0100] First, when the server determines that the target container has been placed in the packaging operation area and that a product packaging operation needs to be performed on the target product in the target container, it controls the single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container.

[0101] Secondly, a 3D information prediction model can be invoked to perform image segmentation and 3D information prediction on the RGB image, outputting the predicted point cloud data of the target product. Then, based on the predicted point cloud data, the corresponding part of the point cloud data of the target product in the initial point cloud data is completed to obtain the completed point cloud data of the target product.

[0102] For the specific implementation of the above steps, refer to the description above; it is not repeated here.

[0103] Finally, pose estimation can be performed based on the completed point cloud data, and the calculated pose information can be used to control an intelligent grasping device to pick up the target product from the target container and place it into the target packaging box. For example, if the target container is a sorting turnover box, the order products stacked in the turnover box can be picked up and placed into the corresponding packaging box, realizing automated packaging operations.

[0104] In summary, the embodiments of this application obtain the initial point cloud data and RGB image generated simultaneously by a single deployed 3D camera during a single-frame scan. Both are outputs of scanning the same target container and the objects stacked inside it, and the 2D RGB image, which is not acquired through light beam reflection, is unaffected by the reflection problems that cause missing depth values. Therefore, a scheme is proposed that uses the RGB image to infer the predicted point cloud data of the target object and uses that predicted data to complete the missing point cloud data in the initial point cloud data, effectively solving the problem of severely missing point cloud data. In addition, performing intelligent object grasping based on the completed point cloud data helps improve the accuracy of pose estimation, achieving precise object grasping. This scheme adapts well to dynamic production line environments and effectively solves the problem of insufficient point cloud data in a single scan by a single 3D camera.

[0105] It should be noted that the embodiments of this application may involve the use of user data. In practical applications, user-specific personal data may be used in the scheme described herein within the scope permitted by applicable laws and regulations, provided that it complies with the applicable laws and regulations of the country (e.g., with the user's explicit consent, with the user being properly notified, etc.).

[0106] Corresponding to the foregoing method embodiments, this application also provides a point cloud data processing apparatus, applied on a server side. Referring to Figure 3, the apparatus may include: a data acquisition unit 301, configured to acquire the initial point cloud data and the RGB image output by a single deployed 3D camera performing a single-frame scan of a target container, where, when multiple objects are piled up in the target container and a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; a model calling unit 302, configured to call a three-dimensional information prediction model with the RGB image as input, where the model segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; and a data completion unit 303, configured to complete, according to the predicted point cloud data, the part of the initial point cloud data corresponding to the target object, so as to obtain completed point cloud data of the target object and realize intelligent grasping of the target object based on the completed point cloud data.

[0107] The apparatus may further include a confidence acquisition unit, configured to acquire confidence information of the model's three-dimensional information prediction for the target object. The data completion unit may further be configured to: when the confidence is less than a preset value and a 3D model of the target object has been built in advance, complete, according to the 3D model, the part of the initial point cloud data corresponding to the target object, so as to obtain completed point cloud data of the target object.
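The fallback rule in [0107] can be written as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, the pre-built 3D model is represented as a set of points already registered to the scene (the registration step is not shown), and the behaviour when confidence is low but no 3D model exists is not specified in the application, so the sketch simply keeps the model prediction in that case.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def choose_completion_source(confidence: float,
                             threshold: float,
                             predicted: Sequence[Point],
                             model_points: Optional[Sequence[Point]]) -> Tuple[str, Sequence[Point]]:
    """Return (source_name, points) to use for point cloud completion.

    model_points: hypothetical points sampled from a pre-built 3D model of the
    target object, assumed already aligned to the camera frame; None if no
    3D model exists.
    """
    if confidence < threshold and model_points is not None:
        return "3d_model", model_points
    # Assumption: with high confidence, or with no 3D model available,
    # fall back to the predicted point cloud.
    return "prediction", predicted
```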

[0108] The apparatus may further include a target object determination unit, configured to: when the multiple objects stacked in the target container are identical objects, determine, based on the point cloud missing information of these identical objects in the initial point cloud data, the object with the fewest missing points as the target object, so that point cloud data completion is performed for that target object.
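The selection rule in [0108] reduces to taking a minimum. The application does not define how "missing points" are counted; the sketch below assumes, hypothetically, that each object instance has a segmentation mask in the RGB image and that missing points are the mask pixels with no valid depth value in the initial point cloud.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ObjectInstance:
    instance_id: str
    mask_pixels: int          # pixels in the object's RGB segmentation mask (assumed)
    valid_depth_pixels: int   # of those, pixels with a valid 3D point (assumed)

    @property
    def missing_points(self) -> int:
        return self.mask_pixels - self.valid_depth_pixels


def select_target(instances: Sequence[ObjectInstance]) -> ObjectInstance:
    """Pick the identical object with the fewest missing points.

    Ties are resolved by taking the first instance in the sequence.
    """
    return min(instances, key=lambda inst: inst.missing_points)
```

Choosing the least-occluded copy means the completion step has the smallest gap to fill, which is the intent of the rule.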

[0109] The apparatus may further include a target object determination unit, configured to: when the multiple objects stacked in the target container are different objects, receive a request to perform a grasping operation on the target object in the target container, and call a target detection model to identify the target object in the RGB image, so that point cloud data completion is performed for the target object.

[0110] Corresponding to the foregoing method embodiments, this application also provides an object grasping device, applied on a server side, for performing grasping operations, using an intelligent grasping device, on target objects piled up inside a target container. Referring to Figure 4, the device may include: a data acquisition unit 401, configured to acquire the initial point cloud data and the RGB image output by a single deployed 3D camera performing a single-frame scan of the target container, where, when multiple objects are piled up in the target container and a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; a model calling unit 402, configured to call a three-dimensional information prediction model with the RGB image as input, where the model segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; a data completion unit 403, configured to complete, according to the predicted point cloud data, the part of the initial point cloud data corresponding to the target object, so as to obtain completed point cloud data of the target object; and a grasping control unit 404, configured to perform regional plane fitting on the surface of the target object based on the completed point cloud data, determine the target partition with the highest flatness, and control the intelligent grasping device to grasp the target object via that target partition.
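The regional plane fitting performed by grasping control unit 404 can be sketched as follows. The partitioning scheme and flatness metric are not specified in the application, so this sketch assumes: the surface points are divided into square cells on a hypothetical x-y grid of size `cell_size`, a plane z = a*x + b*y + c is fitted to each cell by least squares, and flatness is measured by the RMS residual (lower residual meaning higher flatness). Coordinates are centred before fitting, which reduces the normal equations to a 2x2 system and avoids cancellation error.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]


def plane_fit_rms(points: Sequence[Point]) -> Optional[float]:
    """Least-squares fit of z = a*x + b*y + c; return the RMS residual,
    or None if the points are (nearly) collinear in x-y."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    d = [(x - mx, y - my, z - mz) for x, y, z in points]
    sxx = sum(dx * dx for dx, _, _ in d)
    syy = sum(dy * dy for _, dy, _ in d)
    sxy = sum(dx * dy for dx, dy, _ in d)
    sxz = sum(dx * dz for dx, _, dz in d)
    syz = sum(dy * dz for _, dy, dz in d)
    det = sxx * syy - sxy * sxy
    if sxx * syy == 0.0 or det <= 1e-9 * sxx * syy:
        return None  # degenerate: no unique plane
    # Solve [sxx sxy; sxy syy] [a; b] = [sxz; syz]; c is the mean z after centring.
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    sq = sum((dz - a * dx - b * dy) ** 2 for dx, dy, dz in d)
    return math.sqrt(sq / n)


def flattest_partition(points: Sequence[Point], cell_size: float = 0.02,
                       min_points: int = 3) -> Optional[Tuple[Cell, float]]:
    """Return (cell index, RMS residual) of the flattest grid cell, or None."""
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))].append(p)
    best: Optional[Tuple[Cell, float]] = None
    for cell, pts in cells.items():
        if len(pts) < min_points:
            continue
        rms = plane_fit_rms(pts)
        if rms is not None and (best is None or rms < best[1]):
            best = (cell, rms)
    return best
```

A real implementation would probably also weight cells by area or distance from the object's centre; the sketch keeps only the flatness criterion named in [0110].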

[0111] The device may further include a scanning control unit, configured to obtain the grasping result returned by the intelligent grasping device. If the grasping result indicates that grasping of the target object failed and the grasping process changed the stacking arrangement of the objects in the target container, the scanning control unit controls the 3D camera to perform another single-frame scan of the target container, so that the target object is grasped based on the new initial point cloud data and the new RGB image.
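The rescan rule in [0111] can be expressed as a small control loop. This is a sketch: `scan` and `plan_and_grasp` are hypothetical callables standing in for the 3D camera and for the prediction, completion, and grasp pipeline, and `max_attempts` is an assumed retry limit. The application only specifies rescanning when the stacking arrangement changed; retrying with the same frame when it did not change is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

Frame = TypeVar("Frame")


@dataclass
class GraspResult:
    success: bool
    stacking_changed: bool  # did the failed attempt disturb the stack?


def grasp_with_rescan(scan: Callable[[], Frame],
                      plan_and_grasp: Callable[[Frame], GraspResult],
                      max_attempts: int = 3) -> bool:
    """Try to grasp, rescanning only when a failure changed the stacking."""
    frame = scan()  # single-frame scan: initial point cloud + RGB image
    for _ in range(max_attempts):
        result = plan_and_grasp(frame)
        if result.success:
            return True
        if result.stacking_changed:
            # The old point cloud no longer matches the container: rescan.
            frame = scan()
        # Otherwise (assumption) retry with the same frame.
    return False
```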

[0112] A work area for performing object grasping operations may be set up, with a single 3D camera fixedly deployed in the work area so that the acquisition area of the 3D camera covers the work area. In this case, the data acquisition unit may be configured to: when it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, control the 3D camera to perform a single-frame scan of the target container in the acquisition area.

[0113] Alternatively, a work area for performing object grasping operations may be set up, and a single 3D camera may be fixed on the intelligent grasping device; the intelligent grasping device is moved to the work area so that the acquisition area of the 3D camera covers the work area. In this case, the data acquisition unit may be configured to: when it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, control the 3D camera to perform a single-frame scan of the target container in the acquisition area.
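Both deployment options in [0112] and [0113] share the same trigger condition: a container is present, a grasp is requested, and the camera's acquisition area covers the work area. The sketch below is illustrative only; it assumes areas are axis-aligned rectangles in a common floor frame, and that a camera mounted on the grasping device looks straight down so its footprint is centred on the device position. All names and the rectangle model are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned area in a shared floor frame (hypothetical, metres)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, other: "Rect") -> bool:
        return (self.x_min <= other.x_min and self.y_min <= other.y_min
                and self.x_max >= other.x_max and self.y_max >= other.y_max)


def arm_camera_footprint(arm_x: float, arm_y: float,
                         half_width: float, half_depth: float) -> Rect:
    """Acquisition area of a camera fixed on the grasping device, assumed
    to point straight down and be centred on the device's position."""
    return Rect(arm_x - half_width, arm_y - half_depth,
                arm_x + half_width, arm_y + half_depth)


def should_trigger_scan(acquisition_area: Rect, work_area: Rect,
                        container_present: bool, grasp_requested: bool) -> bool:
    """Scan only when a container is in place, a grasp is needed,
    and the camera's acquisition area covers the whole work area."""
    return container_present and grasp_requested and acquisition_area.contains(work_area)
```

With a fixed camera the acquisition area is constant; with an arm-mounted camera it is recomputed after the device moves to the work area.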

[0114] Corresponding to the foregoing method embodiments, this application also provides an object grasping device, applied on a server side, for performing product picking operations, using an intelligent grasping device, on target products stacked in a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the picking operation area and a product picking operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the part of the initial point cloud data corresponding to the target product, so as to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container, thereby realizing intelligent picking of the target product.

[0115] Corresponding to the foregoing method embodiments, this application also provides an object grasping device, applied on a server side, for performing product shelving operations, using an intelligent grasping device, on target products stacked in a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the shelving operation area and a product shelving operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the part of the initial point cloud data corresponding to the target product, so as to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container and display it at the target location.

[0116] Corresponding to the foregoing method embodiments, this application also provides an object grasping device, applied on a server side, for performing product packaging operations, using an intelligent grasping device, on target products stacked in a target container. The device includes: a data acquisition unit, configured to, when it is determined that the target container has been placed in the packaging operation area and a product packaging operation needs to be performed on the target product in the target container, control a single deployed 3D camera to perform a single-frame scan of the target container to obtain the initial point cloud data and RGB image of the target container; a model invocation unit, configured to invoke the three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; a data completion unit, configured to complete, according to the predicted point cloud data, the part of the initial point cloud data corresponding to the target product, so as to obtain completed point cloud data of the target product; and a grasping control unit, configured to control the intelligent grasping device, based on the completed point cloud data, to grasp the target product from the target container and place it into the target packaging box.
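The picking, shelving, and packaging devices in [0114] to [0116] share one pipeline (scan, predict, complete, grasp) and differ only in what happens after the grasp. A minimal sketch of that structure follows; the stage callables and the `Operation` enum are hypothetical placeholders, and the destinations are taken from the paragraphs above ([0114] names no destination beyond removing the product from the container).

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


class Operation(Enum):
    PICKING = "picking"
    SHELVING = "shelving"
    PACKAGING = "packaging"


# Post-grasp destination per operation, as described in [0114] to [0116].
DESTINATION: Dict[Operation, Optional[str]] = {
    Operation.PICKING: None,                      # grasp out of the container only
    Operation.SHELVING: "target location",        # display at the target location
    Operation.PACKAGING: "target packaging box",  # place into the packaging box
}


def run_operation(op: Operation,
                  scan: Callable[[], Tuple[List[Point], object]],
                  predict: Callable[[object], List[Point]],
                  complete: Callable[[List[Point], List[Point]], List[Point]],
                  grasp: Callable[[Sequence[Point]], bool]) -> List[str]:
    """Run the shared pipeline and return a log of the steps performed."""
    log: List[str] = []
    cloud, rgb = scan()
    log.append("scan")
    predicted = predict(rgb)
    log.append("predict")
    completed = complete(cloud, predicted)
    log.append("complete")
    if not grasp(completed):
        log.append("grasp_failed")
        return log
    log.append("grasp")
    destination = DESTINATION[op]
    if destination is not None:
        log.append(f"place:{destination}")
    return log
```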

[0117] Corresponding to the foregoing method embodiments, this application also provides a training device for a three-dimensional information prediction model, the device comprising: a sample data acquisition unit, configured to acquire a sample RGB image and a sample 3D model of a preset object, where the sample RGB image includes image content generated by single-frame scanning of the preset object; an initial model building unit, configured to build an initial model for 3D information prediction; a model training unit, configured to train the initial model with the sample RGB image as input, so that the initial model performs image segmentation on the sample RGB image, predicts the three-dimensional information of the preset object, and outputs predicted point cloud data of the preset object; a data comparison unit, configured to compare the predicted point cloud data with reference point cloud data determined from the sample 3D model of the preset object, so as to obtain the prediction accuracy of the initial model during training; and a model verification unit, configured to verify the model's performance based on the prediction accuracy and, if verification passes, obtain the three-dimensional information prediction model, which is then used to complete point cloud data of objects collected by a single 3D camera.
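The application does not name the metric the data comparison unit uses. As an assumption, a common choice for comparing a predicted point cloud with a reference cloud is the symmetric Chamfer distance, sketched below in its non-squared, averaged form (conventions differ: some definitions use squared distances or sums). The acceptance rule in `passes_verification` and its threshold are hypothetical. The reference cloud is assumed to have already been sampled from the sample 3D model in the same pose as the scan.

```python
import math
from statistics import mean
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour Euclidean distance
    from pred to ref, plus the same from ref to pred (brute force)."""
    pred_to_ref = mean(min(math.dist(p, q) for q in ref) for p in pred)
    ref_to_pred = mean(min(math.dist(q, p) for p in pred) for q in ref)
    return pred_to_ref + ref_to_pred


def passes_verification(pairs: Iterable[Tuple[Sequence[Point], Sequence[Point]]],
                        max_mean_chamfer: float) -> bool:
    """Hypothetical acceptance rule: the mean Chamfer distance over a
    validation set of (predicted, reference) pairs must not exceed a threshold."""
    return mean(chamfer_distance(p, r) for p, r in pairs) <= max_mean_chamfer
```

Because both directions are measured, the metric penalises predictions that miss parts of the object as well as predictions that add spurious points.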

[0118] The device may further include a model optimization unit, configured to obtain knowledge information related to the three-dimensional information of objects and to optimize the three-dimensional information prediction model accordingly.

[0119] In addition, embodiments of this application also provide a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the method described in any of the foregoing method embodiments.

[0120] Embodiments of this application further provide an electronic device, comprising: one or more processors; and a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method described in any of the foregoing method embodiments.

[0121] Embodiments of this application further provide a computer program product, including a computer program/computer-executable instructions that, when executed by a processor in an electronic device, implement the steps of the method described in the foregoing method embodiments.

[0122] Figure 5 illustrates, by way of example, the architecture of an electronic device. For instance, device 500 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, aircraft, etc.

[0123] Referring to Figure 5, device 500 may include one or more of the following components: processing component 502, memory 504, power supply component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.

[0124] Processing component 502 typically controls the overall operation of device 500, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 502 may include one or more processors 520 to execute instructions to perform all or part of the steps of the methods provided in this disclosure. Furthermore, processing component 502 may include one or more modules to facilitate interaction between processing component 502 and other components. For example, processing component 502 may include a multimedia module to facilitate interaction between multimedia component 508 and processing component 502.

[0125] Memory 504 is configured to store various types of data to support the operation of device 500. Examples of this data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, etc. Memory 504 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0126] Power supply component 506 provides power to various components of device 500. Power supply component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 500.

[0127] Multimedia component 508 includes a screen that provides an output interface between device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When device 500 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0128] Audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a microphone (MIC) configured to receive external audio signals when device 500 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 504 or transmitted via communication component 516. In some embodiments, audio component 510 also includes a speaker for outputting audio signals.

[0129] I/O interface 512 provides an interface between processing component 502 and peripheral interface modules, such as keyboards, click wheels, and buttons. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0130] Sensor component 514 includes one or more sensors for providing status assessments of various aspects of device 500. For example, sensor component 514 may detect the on/off state of device 500, the relative positioning of components such as the display and keypad of device 500, changes in the position of device 500 or of a component of device 500, the presence or absence of user contact with device 500, the orientation or acceleration/deceleration of device 500, and temperature changes of device 500. Sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 514 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.

[0131] Communication component 516 is configured to facilitate wired or wireless communication between device 500 and other devices. Device 500 can access wireless networks based on communication standards, such as WiFi, or mobile communication networks such as 2G, 3G, 4G/LTE, or 5G. In one exemplary embodiment, communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 516 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0132] In an exemplary embodiment, device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0133] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as memory 504 including instructions, which can be executed by processor 520 of device 500 to perform the method provided by the present disclosure. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device.

[0134] From the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of this application.

[0135] The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system and apparatus embodiments are basically similar to the method embodiments, so they are described relatively briefly; for relevant details, refer to the description of the method embodiments. The system and apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those skilled in the art can understand and implement this without creative effort.

[0136] The solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method for processing point cloud data, characterized in that the method is applied to a server side and includes: performing, by a 3D camera, a single-frame scan of a target container, and outputting initial point cloud data and an RGB image, where, when multiple objects are piled up inside the target container and a grasping operation needs to be performed on a target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; invoking a three-dimensional information prediction model with the RGB image as input, where the model segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; and completing, based on the predicted point cloud data, the partial point cloud data corresponding to the target object in the initial point cloud data to obtain completed point cloud data of the target object, so as to realize intelligent grasping of the target object based on the completed point cloud data.

2. The method according to claim 1, characterized in that the method further includes: obtaining confidence information of the model's three-dimensional information prediction for the target object; and if the confidence is less than a preset value and a 3D model of the target object has been built in advance, completing, based on the 3D model, the partial point cloud data corresponding to the target object in the initial point cloud data to obtain the completed point cloud data of the target object.

3. The method according to claim 1 or 2, characterized in that, if the multiple objects piled up inside the target container are identical objects, the object with the fewest missing points is determined as the target object based on the point cloud missing information of the multiple identical objects in the initial point cloud data, so that point cloud data completion is performed for the target object.

4. The method according to claim 1 or 2, characterized in that, if the multiple objects stacked within the target container are different objects, the method further includes: receiving a request to perform a grasping operation on the target object in the target container; and invoking a target detection model to identify the target object in the RGB image, so that point cloud data completion is performed for the target object.

5. A method for grasping an object, characterized in that the method is applied to a server side and is used to perform, by means of an intelligent grasping device, a grasping operation on a target object piled up inside a target container, the method including: performing, by a 3D camera, a single-frame scan of the target container, and outputting initial point cloud data and an RGB image, where, when multiple objects are piled up inside the target container and a grasping operation needs to be performed on the target object in the target container, the target container is placed in the acquisition area of the 3D camera for the single-frame scan; invoking a three-dimensional information prediction model with the RGB image as input, where the model segments the target object from the RGB image, performs three-dimensional information prediction on the target object, and outputs predicted point cloud data of the target object; completing, based on the predicted point cloud data, the partial point cloud data corresponding to the target object in the initial point cloud data to obtain completed point cloud data of the target object; and performing, based on the completed point cloud data, regional plane fitting on the surface of the target object to determine the target partition with the highest flatness, and controlling the intelligent grasping device to grasp the target object via the target partition.

6. The method according to claim 5, characterized in that the method further includes: if the grasping result returned by the intelligent grasping device indicates that grasping of the target object failed and the grasping process changed the stacking arrangement of the objects in the target container, controlling the 3D camera to perform another single-frame scan of the target container, so that the target object is grasped based on the new initial point cloud data and the new RGB image.

7. The method according to claim 5, characterized in that a work area for performing object grasping operations is set up, and a single 3D camera is deployed in the work area, with the acquisition area of the 3D camera covering the work area; and when it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, the 3D camera is controlled to perform a single-frame scan of the target container in the acquisition area.

8. The method according to claim 5, characterized in that a work area for performing object grasping operations is set up, and a single 3D camera is fixed on the intelligent grasping device, the intelligent grasping device being moved to the work area so that the acquisition area of the 3D camera covers the work area; and when it is determined that a target container has been placed in the work area and a grasping operation needs to be performed on the target object in the target container, the 3D camera is controlled to perform a single-frame scan of the target container in the acquisition area.

9. A method for grasping an object, characterized in that the method is applied to a server side and is used to perform, by means of an intelligent grasping device, a product picking operation on a target product stacked in a target container, the method including: when it is determined that the target container has been placed in the picking operation area and a product picking operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking a three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; completing, based on the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and controlling, based on the completed point cloud data, the intelligent grasping device to grasp the target product from the target container, thereby realizing intelligent picking of the target product.

10. A method for grasping an object, characterized in that the method is applied to a server side and is used to perform, by means of an intelligent grasping device, a product shelving operation on a target product stacked in a target container, the method including: when it is determined that the target container has been placed in the shelving operation area and a product shelving operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking a three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; completing, based on the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and controlling, based on the completed point cloud data, the intelligent grasping device to grasp the target product from the target container and display it at the target location.

11. A method for grasping an object, characterized in that the method is applied to a server side and is used to perform, by means of an intelligent grasping device, a product packaging operation on a target product stacked inside a target container, the method including: when it is determined that the target container has been placed in the packaging operation area and a product packaging operation needs to be performed on the target product in the target container, controlling a single deployed 3D camera to perform a single-frame scan of the target container to obtain initial point cloud data and an RGB image of the target container; invoking a three-dimensional information prediction model, which performs image segmentation and three-dimensional information prediction on the RGB image and outputs predicted point cloud data of the target product; completing, based on the predicted point cloud data, the partial point cloud data corresponding to the target product in the initial point cloud data to obtain completed point cloud data of the target product; and controlling, based on the completed point cloud data, the intelligent grasping device to grasp the target product from the target container and place it into the target packaging box.

12. A training method for a three-dimensional information prediction model, characterized in that the method includes: obtaining a sample RGB image and a sample 3D model of a preset object, where the sample RGB image includes image content generated by single-frame scanning of the preset object; constructing an initial model for 3D information prediction; training the initial model with the sample RGB image as input, so that the initial model performs image segmentation on the sample RGB image, predicts the three-dimensional information of the preset object, and outputs predicted point cloud data of the preset object; comparing the predicted point cloud data with reference point cloud data determined based on the sample 3D model of the preset object to obtain the prediction accuracy of the initial model during training; and verifying the model's performance based on the prediction accuracy and, if verification succeeds, obtaining the three-dimensional information prediction model, which is used to complete point cloud data of objects collected by a single 3D camera.

13. The method according to claim 12, characterized in that the method further includes: obtaining knowledge information related to the three-dimensional information of objects, and optimizing the three-dimensional information prediction model.

14. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 13 are implemented.

15. An electronic device, characterized by comprising: one or more processors; and a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method according to any one of claims 1 to 13.

16. A computer program product comprising a computer program/computer-executable instructions, characterized in that, when the computer program/computer-executable instructions are executed by a processor in an electronic device, the steps of the method according to any one of claims 1 to 13 are implemented.