Method and system for determining occlusion within a camera's field of view

The method addresses occlusion-related errors in robot-camera interactions by determining occlusion within the camera's field of view, improving object detection accuracy and robot operation precision.

DE102020205896B4 · Undetermined · Publication Date: 2026-06-25 · MUJIN INC

Patent Information

Authority / Receiving Office
DE · DE
Patent Type
Patents
Current Assignee / Owner
MUJIN INC
Filing Date
2020-05-11
Publication Date
2026-06-25


Abstract

Computer system (110), comprising: a communication interface (113) configured to communicate with at least one camera, comprising a first camera (270) with a first camera field of view (272); and a control circuit (111) configured, when a stack (250, 750) with multiple objects is located in the first camera field of view (272), to: receive camera data generated by the at least one camera, wherein the camera data describes a stack structure for the stack (250, 750), the stack structure being formed from at least one object structure for a first object of the multiple objects; identify, based on the camera data generated by the at least one camera, a target feature (251B, 251C, 751B) of the object structure or a target feature (251B, 251C, 751B) located on the object structure, wherein the target feature (251B, 251C, 751B) is at least one of the following: a corner (251B) of the object structure, an edge (251C) of the object structure, a visual feature (751B) located on a surface (251A, 751A) of the object structure, or an outline of the surface (251A, 751A) of the object structure; determine a two-dimensional (2D) region (520, 620, 720) that is coplanar with the target feature (251B, 251C, 751B) and whose boundary is defined by the target feature (251B, 251C, 751B); determine a three-dimensional (3D) region (530, 630, 730) defined by connecting a location of the first camera (270) and the boundary of the 2D region (520, 620, 720), wherein the 3D region (530, 630, 730) is part of the first camera field of view (272); determine, based on the camera data and the 3D region (530, 630, 730), a size of an occlusion region (570, 670, 770), wherein the occlusion region (570, 670, 770) is a region of the stack structure located between the target feature (251B, 251C, 751B) and the at least one camera (270) and within the 3D region (530, 630, 730); determine a value of an object detection confidence parameter based on the size of the occlusion region (570, 670, 770); and perform an operation to control robot interaction with the stack structure, wherein the operation is performed based on the value of the object detection confidence parameter; wherein the first camera (270) with which the communication interface (113) is configured to communicate is a 3D camera configured to generate, as part of the camera data, multiple 3D data points that specify corresponding depth values for locations on one or more faces of the stack structure.

Description

FIELD OF THE INVENTION

The present invention relates to a method and system for determining occlusion within a camera field of view.

BACKGROUND

With increasing automation, robots are being used in more environments, such as warehousing and manufacturing. For example, robots can be used to load items onto or unload them from pallets in a warehouse, or to pick up items from a conveyor belt in a factory. The robot's movement can be predefined, or it can be based on input such as camera data generated by a camera in the warehouse or factory. This camera data may represent the location and/or structure of an object relative to a gripper or other component of the robot configured to interact with the object. Relying on camera data to guide the robot's interaction with the object can be subject to errors caused by noise in the camera data and/or by inaccuracies in detecting the object from the camera data.

DE 11 2019 000 049 T5 describes a method comprising: applying sensor data representative of the field of view of at least one sensor of a vehicle in an environment to a first neural network; receiving detected object data representative of the positions of detected objects in the field of view from the first neural network; generating a cluster of the detected objects based at least partially on the positions; determining features for the cluster for use as inputs to a second neural network; receiving a confidence value calculated by the second neural network based at least partially on the inputs, where the confidence value represents a probability that the cluster corresponds to an object in the environment within the field of view of the at least one sensor.

SUMMARY

One aspect of the embodiments herein relates to a computer system, a method, and/or a non-transitory computer-readable medium containing instructions for determining occlusion. The computer system may, for example, include a communication interface and a control circuit.
The communication interface may be configured to communicate with at least one camera, comprising a first camera with a first camera field of view. The control circuit may be configured to execute the occlusion-determining method when a stack of multiple objects is present in the first camera field of view. In some cases, the control circuit may execute the method by carrying out instructions stored on the non-transitory computer-readable medium. The method may include receiving camera data generated by the at least one camera, wherein the camera data describes a stack structure for the stack and the stack structure is formed from at least one object structure for a first object of the multiple objects; and identifying, based on the camera data generated by the at least one camera, a target feature of, or located on, the object structure (e.g., a corner of the object structure, an edge of the object structure, a visual feature located on a surface of the object structure, or an outline of the surface of the object structure). The method can further include determining a two-dimensional (2D) region that is coplanar with the target feature and whose boundary surrounds the target feature; determining a three-dimensional (3D) region defined by connecting a location of the first camera and the boundary of the 2D region, wherein the 3D region is part of the first camera field of view; and determining, based on the camera data and the 3D region, a size of an occlusion region, wherein the occlusion region is a region of the stack structure located between the target feature and the at least one camera and within the 3D region.
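The 2D-region, 3D-region, and occlusion-size steps above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the first camera sits at the origin with its optical axis along +z, that the 3D data points are expressed in that camera's frame, and that the 2D region is a square of hypothetical half-width `half_width` centred on the target feature in the plane of the feature's depth. The names `Region2D`, `in_3d_region`, and `occlusion_points` are invented here, and the number of occluding points is only a simple stand-in for the size of the occlusion region; the description does not prescribe this measure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Region2D:
    """Hypothetical square 2D region centred on the target feature.

    It lies in the plane z = depth (the target feature's depth), so it is
    coplanar with the feature, and its boundary surrounds the feature.
    """
    cx: float
    cy: float
    depth: float
    half_width: float


def in_3d_region(p: Point, region: Region2D) -> bool:
    """True if p is inside the pyramid formed by connecting the camera location
    (assumed at the origin, optical axis along +z) to the region's boundary."""
    x, y, z = p
    if z <= 0.0 or z > region.depth:
        return False
    # Follow p's line of sight out to the plane of the 2D region.
    scale = region.depth / z
    return (abs(x * scale - region.cx) <= region.half_width
            and abs(y * scale - region.cy) <= region.half_width)


def occlusion_points(points: Iterable[Point], region: Region2D,
                     tolerance: float = 0.01) -> List[Point]:
    """3D data points inside the 3D region that are nearer to the camera than
    the target feature by more than `tolerance`, i.e. points lying between the
    target feature and the camera."""
    return [p for p in points
            if in_3d_region(p, region) and p[2] < region.depth - tolerance]


# Illustrative data: a corner at depth 2.0 and two points of a nearer box.
region = Region2D(cx=0.5, cy=0.5, depth=2.0, half_width=0.1)
cloud = [(0.5, 0.5, 2.0), (0.25, 0.25, 1.0), (0.27, 0.25, 1.0), (0.0, 0.0, 1.0)]
print(len(occlusion_points(cloud, region)))  # 2
```

A point counts only if its line of sight, extended to the target feature's depth, lands inside the 2D region and the point itself is nearer the camera than the feature. This mirrors the requirement that the occlusion region lie both between the target feature and the camera and within the 3D region. The tolerance keeps points on the target's own surface from being counted as occluders.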
In one embodiment, the control circuit can determine a value of an object detection confidence parameter based on the size of the occlusion region. In one embodiment, the control circuit can perform an operation to control robot interaction with the stack structure, wherein the operation is performed based on the value of the object detection confidence parameter.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other features, objects, and advantages of the invention will become apparent from the following description of embodiments as illustrated in the accompanying drawings. The accompanying drawings, which are incorporated herein and form part of the description, further serve to explain the principles of the embodiments and to enable a person skilled in the art to make and use the embodiments. The drawings are not to scale.

Figures 1A and 1B show, according to one embodiment herein, block diagrams of systems in which camera occlusion can be detected.
Figure 1C shows, according to one embodiment herein, a block diagram of a robot operating system in which camera occlusion can be detected.
Figure 2 shows, according to one embodiment herein, a block diagram of a computer system configured to determine occlusion within a camera field of view.
Figures 3A and 3B show, according to one embodiment herein, systems in which camera occlusion for an object structure within a camera field of view can be detected.
Figures 4A and 4B show, according to one embodiment herein, a flowchart of an exemplary method for determining occlusion within a camera field of view.
Figures 5A to 5D show, according to one embodiment herein, an example in which occlusion is determined based on a target feature that is a corner of an object structure.
Figures 6A to 6D show, according to one embodiment herein, an example in which occlusion is determined based on a target feature that is an edge of an object structure.
Figures 7A to 7D show an example in which occlusion is determined based on a target feature that is a visual feature located on a surface of an object structure or an outline of the surface.

DETAILED DESCRIPTION

The following detailed description is merely exemplary in nature and is not intended to limit the invention or its application and uses. Furthermore, there is no intention to be bound by any express or implied theory presented in the preceding technical field, background, abstract, or the following detailed description.

The embodiments described herein relate to determining occlusion within a camera's field of view, such as by detecting occlusion within the camera's field of view, assessing a degree of occlusion within the camera's field of view, and/or any other aspect of occlusion analysis. Occlusion may, for example, refer to a situation in which a location within the camera's field of view, or part of a region surrounding that location, is blocked, or nearly blocked, from being viewed or otherwise captured by the camera. In some cases, the occlusion may result from an object, or part of it, blocking or nearly blocking a line of sight from the camera to that location, or from the camera to the part of the region surrounding that location. For example, the occluding object may be located between the camera and the occluded location or region. In some cases, a target feature may be located at the occluded location, or in the occluded portion of the region surrounding that location.
The target feature may be, for example, a feature of the region that is used to perform object detection, and it may be used, for example, to plan robot interaction with a structure in that region. The target feature may be, for example, a corner or edge of an object, or a face of an object in that region, or it may be a visual feature located on the face. The presence of the occluding object may impair the ability to identify the target feature and/or affect the accuracy of such identification. Accordingly, one aspect of the embodiments herein relates to detecting or otherwise assessing an occlusion that may affect a target feature or any other feature within a camera's field of view.

In one embodiment, determining the occlusion may involve determining the size of an occlusion region. The occlusion region may, for example, be a region of an occluding object located between the camera and the target feature, or between the camera and a portion of a region surrounding the target feature. The occlusion region may, for example, be a 2D region of the occluding object (which may be a first 2D region) located within a 3D region, the 3D region being defined by connecting a location of the camera to a 2D region surrounding the target feature (which may be a second 2D region). In one embodiment, determining the occlusion may also involve determining the size of an occluded region, which is described in more detail below. In some cases, the size of the occlusion region (and/or of the occluded region) may be used, for example, to determine a confidence level for an object detection that involves, or will involve, the target feature. In some cases, the confidence level can be determined such that it is inversely proportional to the size of the occlusion region and/or the size of the occluded region.
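One way to turn the occlusion size into a confidence value is sketched below. The description states only that the confidence can be inversely proportional to the size of the occlusion region and/or the occluded region; the hypothetical `detection_confidence` function instead uses the simpler mapping `1 - ratio`, which likewise decreases as occlusion grows but stays finite when there is no occlusion at all (a literal `k / size` would diverge at zero). Normalizing by the size of the 2D region surrounding the target feature is an assumption.

```python
def detection_confidence(occlusion_size: float, region_size: float) -> float:
    """Hypothetical confidence value in [0, 1] that decreases as the occlusion
    region grows relative to the 2D region surrounding the target feature.

    1.0 means nothing in the 3D region occludes the target feature; 0.0 means
    the occlusion region is at least as large as the 2D region.
    """
    if region_size <= 0.0:
        raise ValueError("region_size must be positive")
    # Clamp the ratio so the confidence always stays within [0, 1].
    ratio = min(max(occlusion_size / region_size, 0.0), 1.0)
    return 1.0 - ratio


print(detection_confidence(0.0, 4.0))  # 1.0
print(detection_confidence(1.0, 4.0))  # 0.75
print(detection_confidence(8.0, 4.0))  # 0.0
```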
In one embodiment, the occlusion analysis can be used, for example, to determine whether object detection should be re-executed, or to adjust the way in which object detection is performed. For example, if the confidence level for an object detection operation falls below a defined threshold (e.g., a defined confidence threshold), the object detection operation can be re-executed. The confidence level may fall below the defined threshold, for example, because the amount of occlusion is too high, such as when the ratio between the size of the occlusion region and the size of the 2D region surrounding the target feature exceeds a defined occlusion threshold, or when the size of the occlusion region exceeds the defined occlusion threshold. In some cases, the defined occlusion threshold may be the inverse of the defined confidence threshold and/or inversely proportional to it. In some cases, an assessment of the occlusion can be used to plan robot interaction with the occluding object, with the target feature, with an object on which the target feature is located, or with any other object or structure. For example, the robot interaction can be planned such that the occluding object and/or the object on which the target feature is located is moved, thereby reducing the amount of occlusion or, more specifically, the size of the occlusion region, as described in more detail below.

Fig. 1A illustrates a block diagram of a system 100 for detecting and/or assessing occlusion within a camera's field of view. In one embodiment, the system 100 may be located in a warehouse, a production facility, or other premises. The system 100 may, for example, be an image recognition system used to generate camera data (e.g., images) of objects within the warehouse or production facility.
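The re-execution rule described earlier, in which object detection is repeated when the amount of occlusion is too high, can be written as a small predicate. This is a sketch under assumptions: `should_rerun_detection` and the default ratio threshold of 0.3 are hypothetical, and the sizes may be in any consistent unit (for example, an area or a point count).

```python
from typing import Optional


def should_rerun_detection(occlusion_size: float,
                           region_size: float,
                           ratio_threshold: float = 0.3,
                           size_threshold: Optional[float] = None) -> bool:
    """Return True if object detection should be re-executed because the
    amount of occlusion exceeds a defined occlusion threshold.

    Variant 1: the ratio of occlusion-region size to 2D-region size exceeds
    `ratio_threshold`. Variant 2 (only if `size_threshold` is given): the
    occlusion-region size itself exceeds `size_threshold`.
    """
    if region_size <= 0.0:
        raise ValueError("region_size must be positive")
    if occlusion_size / region_size > ratio_threshold:
        return True
    return size_threshold is not None and occlusion_size > size_threshold


print(should_rerun_detection(0.5, 4.0))                      # False
print(should_rerun_detection(2.0, 4.0))                      # True
print(should_rerun_detection(0.5, 4.0, size_threshold=0.4))  # True
```

Passing `size_threshold` enables the second variant from the description, in which the absolute size of the occlusion region, rather than its ratio to the 2D region, is compared with the occlusion threshold.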
In some cases, the image recognition system may be part of, or communicate with, a robot control system that uses the camera data, or information derived from the camera data, to generate, for example, motion commands that cause a robot interaction in which a robot interacts with the objects. As shown in Fig. 1A, the system 100 can comprise a computer system 110 and a camera 170 (which can also be referred to as the first camera 170). In one embodiment, the camera 170 can be configured to generate or otherwise acquire camera data that captures a scene within a field of view of the camera 170 (also referred to as the camera field of view). For example, the camera 170 can be configured to photograph the scene or, more specifically, to photograph objects within the camera field of view. In one embodiment, the camera 170 can be a three-dimensional (3D) camera, a two-dimensional (2D) camera, or any combination thereof (the term "or" is used herein to refer to "and/or").

In one embodiment, a 3D camera (which may also be called a depth-sensing camera or structure-sensing device) can be a camera configured to generate camera data that includes 3D information about a scene in the camera's field of view, wherein the 3D information can include depth information for the scene. More specifically, the depth information can specify corresponding depth values, relative to the 3D camera, of locations on one or more objects in the camera's field of view. In some cases, the 3D information can include multiple 3D data points, such as 3D coordinates, representing the locations on the one or more objects. For example, the multiple 3D data points can comprise a point cloud representing the locations on one or more faces of the one or more objects in the camera's field of view. In some cases, the 3D camera can be, for example, a time-of-flight (TOF) camera or a structured-light camera.
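For a 3D camera that outputs a depth image rather than a point cloud, the 3D data points can be obtained by back-projecting each pixel. The sketch below assumes an ideal pinhole model with hypothetical intrinsics `fx`, `fy` (focal lengths in pixels) and `cx`, `cy` (principal point), and treats each depth value as the distance along the optical axis rather than the radial range; the description does not say how the camera forms its point cloud.

```python
from typing import List, Optional, Sequence, Tuple


def depth_image_to_points(depth: Sequence[Sequence[Optional[float]]],
                          fx: float, fy: float, cx: float, cy: float
                          ) -> List[Tuple[float, float, float]]:
    """Back-project a depth image (rows indexed by v, columns by u) into
    camera-frame 3D points using a pinhole model. Pixels with no measurement
    (None or a non-positive depth) are skipped."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


depth = [[2.0, None], [1.0, 0.0]]
print(depth_image_to_points(depth, fx=100.0, fy=100.0, cx=0.0, cy=0.0))
# [(0.0, 0.0, 2.0), (0.0, 0.01, 1.0)]
```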
In one embodiment, a 2D camera can be a camera configured to generate camera data that includes 2D information about a scene in the camera's field of view, wherein the 2D information can capture or otherwise represent the scene's appearance. For example, the 2D information can be a 2D image or another array of pixels that captures or otherwise represents one or more objects in the camera's field of view. The 2D camera can be, for example, a color camera configured to generate a 2D color image, a grayscale camera configured to generate a 2D grayscale image, or any other type of 2D camera.

In some cases, the computer system 110 of Fig. 1A can be configured to communicate with the camera 170. For example, the computer system 110 can be configured to control the camera 170. Specifically, the computer system 110 can be configured to generate a camera command that causes the camera 170 to produce camera data capturing a scene within the camera field of view, and to communicate the camera command to the camera 170 via a wired or wireless connection. The same command can also cause the camera 170 to communicate the camera data back to the computer system 110, or more generally, to a non-transitory computer-readable medium (e.g., a storage device) that the computer system 110 can access. Alternatively, the computer system 110 can generate another camera command that, when received, causes the camera 170 to communicate the camera data it has generated to the computer system 110. In one embodiment, the camera 170 can automatically generate camera data that captures or otherwise represents a scene in its camera field of view, either periodically or in response to a defined trigger condition, without requiring a camera command from the computer system 110.
In such an embodiment, the camera 170 can also be configured to automatically communicate the camera data to the computer system 110, or more generally to a non-transitory computer-readable medium accessible to the computer system 110, without requiring a camera command from the computer system 110.

In one embodiment, the system 100 may comprise only a single camera. In another embodiment, the system 100 may comprise multiple cameras. Fig. 1B, for example, depicts a system 100A, which may be an embodiment of the system 100 comprising a camera 170 and a camera 180, which may also be referred to as a first camera 170 and a second camera 180, respectively. In one example, the first camera 170 may be a 3D camera, while the second camera 180 may be a 2D camera, or vice versa. In some implementations, the computer system 110 may be configured to control the second camera 180 in a similar or identical manner to how it controls the first camera 170, as described above with reference to Fig. 1A. In some cases, different camera commands may be sent to the first camera 170 and the second camera 180, respectively. In some cases, the same camera command can be sent to both the first camera 170 and the second camera 180. In some cases, the first camera 170 and the second camera 180 can be positioned such that the field of view of the first camera 170 substantially overlaps with the field of view of the second camera 180. The first camera 170 and the second camera 180 can therefore be positioned to generate camera data (e.g., an image and/or a point cloud) representing the same area or substantially the same area. In some cases, the first camera 170 of Fig. 1B may have a fixed location and/or orientation relative to the second camera 180. For example, the first camera 170 may be fixed to the second camera 180, either directly or indirectly. Such an arrangement may cause an angle and/or distance between the first camera 170 and the second camera 180 to remain constant.
In some cases, such an arrangement may also cause a spatial relationship between a coordinate system of the first camera 170 and a coordinate system of the second camera 180 to remain constant.

As stated above, in some cases the system 100/100A can be a robot operating system or part of a robot operating system. For example, Fig. 1C depicts a system 100B, which can be an embodiment of the system 100/100A comprising a robot 150 that communicates with the computer system 110. In some cases, the computer system 110 can be configured to use the images or other camera data generated by the first camera 170 and/or the second camera 180 to control the operation of the robot 150, or to implement/execute commands to control the operation of the robot 150. For example, the computer system 110 can be configured to control the robot 150 so that it performs a depalletizing operation in which the robot 150 unloads a stack of boxes or other objects in a warehouse based on camera data generated by the first camera 170 and/or the second camera 180. In one embodiment, the computer system 110 can be configured to communicate with the robot 150 and the first camera 170 and/or the second camera 180 via wired and/or wireless communication. For example, the computer system 110 can be configured to communicate with the robot 150, the first camera 170, and/or the second camera 180 via an RS-232 interface, a Universal Serial Bus (USB) interface, an Ethernet interface, a Bluetooth® interface, an IEEE 802.11 interface, or any combination thereof. In another embodiment, the computer system 110 can be configured to communicate with the robot 150 and/or the camera 170/180 via a local computer bus, such as a Peripheral Component Interconnect (PCI) bus.
In one embodiment, the computer system 110 and the camera 170/180 are located in the same premises (e.g., a warehouse). In one embodiment, the computer system 110 can be located remotely from the robot 150 and/or the camera 170/180 and be configured to communicate with the robot 150 and/or the camera 170/180 via a network connection (e.g., a local area network (LAN) connection). In one embodiment, the computer system 110 of Fig. 1C can be separate from the robot 150 and communicate with the robot 150 via the wireless or wired connection described above. For example, the computer system 110 can be a standalone computer configured to communicate with the robot 150 and the camera 170/180 via a wired or wireless connection. In another embodiment, the computer system 110 of Fig. 1C can be an integral part of the robot 150 and communicate with other components of the robot 150 via the local computer bus described above. In some cases, the computer system 110 can be a dedicated control system (also referred to as a dedicated controller) that controls only the robot 150. In other cases, the computer system 110 can be configured to control multiple robots, including the robot 150.

Fig. 2 shows a block diagram of the computer system 110. As illustrated in the block diagram, the computer system 110 can comprise a control circuit 111, a communication interface 113, and a non-transitory computer-readable medium 115 (e.g., memory or other storage device). In one embodiment, the control circuit 111 can comprise one or more processors, a programmable logic circuit (PLC) or programmable logic array (PLA), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or any other control circuit. In one embodiment, the communication interface 113 can comprise one or more components configured to communicate with the camera 170/180 of Figs. 1A-1C and/or the robot 150 of Fig. 1C.
The communication interface 113 can, for example, comprise a communication circuit configured to perform communication over a wired or wireless protocol. The communication circuit can, for example, comprise an RS-232 port controller, a USB controller, an Ethernet controller, an IEEE 802.11 controller, a Bluetooth® controller, a PCI bus controller, any other communication circuit, or a combination thereof. In one embodiment, the non-transitory computer-readable medium 115 may comprise an information storage device, such as computer memory. The computer memory may include, for example, dynamic random access memory (DRAM), solid-state integrated memory, and/or a hard disk drive (HDD). In some cases, the determination of occlusion within the camera's field of view may be performed by computer-executable instructions (e.g., computer code) stored on the non-transitory computer-readable medium 115. In such cases, the control circuit 111 may comprise one or more processors configured to execute the computer-executable instructions for detecting occlusion in the camera's field of view (e.g., the steps illustrated in Figures 4A and 4B). In one embodiment, the non-transitory computer-readable medium 115 can be configured to store camera data generated by, and received directly or indirectly from, the camera 170/180. In such an embodiment, the computer system 110 can be configured to receive the camera data from the non-transitory computer-readable medium 115 or otherwise access it there. In some cases, the non-transitory computer-readable medium 115 can store an object recognition template, which is described in more detail below.

Fig. 3A shows an example in which the computer system 110 is part of a system 200 for detecting occlusion in a camera's field of view. The system 200 can be an embodiment of the system 100/100A and comprises the computer system 110 of Figs. 1A-1B and a camera 270, which can be an embodiment of the camera 170. As shown in Fig. 3A, the camera 270 (which can also be referred to as the first camera 270) can communicate with the computer system 110 and has a field of view 272 (also referred to as the camera field of view 272). The camera field of view 272 can form an imaginary pyramid, as illustrated in Fig. 3A, or another shape, such as a cone. The apex of the pyramid, cone, or other shape of the camera field of view 272 can be located at the camera 270, e.g., at a location of a lens or image sensor (if present) of the camera 270.

In one embodiment, the camera 270 can be configured to generate camera data that captures, or more generally depicts, one or more objects that are at least partially within the camera field of view 272. Fig. 3A illustrates, for example, a stack 250 of several objects 251-253 that are at least partially within the camera field of view 272. In the example of Fig. 3A, the several objects comprise a first object 251, a second object 252, and a third object 253. The objects 251-253 can be, for example, boxes to be depalletized, or any other objects. In one embodiment, the stack 250 can comprise several layers. The stack 250 can, for example, comprise a first layer consisting of the first object 251 and the third object 253, and a second layer consisting of the second object 252. An object (e.g., 252) of the second layer can be stacked on top of at least one object (e.g., 253) of the first layer (e.g., the second object 252 can be stacked on top of the third object 253). In such an arrangement, one or more objects (e.g., 252) of the second layer can be closer to the camera 270 than one or more objects (e.g., 251) of the first layer, since the second layer is positioned between the camera 270 and the first layer. For example, Fig. 3A illustrates a depth value of Z1 relative to the camera 270 for a surface 252A (e.g., top surface) of the second object 252, where the depth value can refer to a distance between the camera 270 and the surface 252A. The depth value of Z1 for surface 252A can be smaller than a depth value of Z2, which can be a depth value of a surface 251A (e.g., top surface) of the first object 251 and/or a depth value of a surface 253A of the third object 253. Fig. 3A further illustrates a depth value of Z3, which can be a depth value of a surface of, for example, a floor on which the stack 250 is placed, or of another layer (e.g., a lower layer) of the stack 250 that is farther away from the camera 270.

In one embodiment, the camera data generated by the camera 270 can describe a structure of the stack 250, which can also be referred to as the stack structure for the stack 250. The stack structure can consist of at least one structure of an object (also referred to as an object structure) of the stack. For example, the stack structure for the stack 250 can consist of at least an object structure for the first object 251 (referring to a structure of the first object 251), an object structure for the second object 252, and an object structure for the third object 253. In some cases, the camera data can describe the stack structure with 3D information that describes the corresponding depth values of the locations on one or more faces of the stack 250 relative to the camera 270. The 3D information can, for example, consist of multiple 3D data points. The 3D data points may include coordinates that describe the corresponding locations on one or more faces (e.g., 251A, 252A, 253A) of the stack 250, or more specifically, on one or more faces of the stack structure for the stack 250. Since the stack 250 is composed of the objects 251-253, the multiple 3D data points may also describe the corresponding locations on one or more faces of the objects 251-253, or more specifically, of their object structures.
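Because the layers of the stack 250 sit at different depths (Z1 < Z2 < Z3 in Fig. 3A), depth values alone can give a rough grouping of 3D data points into layers. The hypothetical `group_depth_layers` function below is a simple gap-based grouping offered only as an illustration, not a step described here; the depth values and the 0.02 tolerance in the example are invented.

```python
from typing import List, Sequence


def group_depth_layers(depths: Sequence[float],
                       tolerance: float = 0.02) -> List[List[float]]:
    """Group depth values into layers, nearest layer first.

    After sorting, a new layer starts whenever the gap to the previous depth
    exceeds `tolerance`; otherwise the value joins the current layer.
    """
    layers: List[List[float]] = []
    for d in sorted(depths):
        if layers and d - layers[-1][-1] <= tolerance:
            layers[-1].append(d)
        else:
            layers.append([d])
    return layers


# Illustrative depths: a top layer near 1.5, a lower layer near 2.0, a floor at 3.0.
print(group_depth_layers([1.50, 2.00, 1.51, 2.01, 3.00]))
# [[1.5, 1.51], [2.0, 2.01], [3.0]]
```

Because each value is compared only with its predecessor, a tilted surface whose depth changes gradually stays in one layer; a fixed tolerance around a layer's first value would instead split it.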
The multiple 3D data points may, for example, specify a depth value of Z1, Z2, or Z3 for these locations. In some cases, the camera data may include 2D information, such as an image that captures or otherwise represents an appearance of the stack 250. The image may show a feature (e.g., a corner or edge) of the stack structure and/or of an object structure, as further described below.

In one embodiment, the system 200 can have two or more cameras. Fig. 3B shows, for example, an exemplary system 200A, which can be an embodiment of the system 100/100A/100B/200. The system 200A comprises the camera 270 and a camera 280, which can also be referred to as the first camera 270 and the second camera 280, respectively. The first camera 270 can be an embodiment of the first camera 170 of Figs. 1A-1C, and the second camera 280 can be an embodiment of the second camera 180 of Figs. 1B-1C. As in Fig. 3A, the first camera 270 can have a camera field of view 272. The second camera 280 can have a second camera field of view 282, which at least partially overlaps the camera field of view 272. In some aspects, the camera field of view 272 of the first camera 270 can substantially overlap the camera field of view 282 of the second camera 280, so that the first camera 270 and the second camera 280 can each generate camera data that captures substantially the same part of the stack structure for the stack 250. In one embodiment, the first camera 270 and the second camera 280 can be different camera types. For example, the first camera 270 can be a 3D camera, while the second camera 280 can be a 2D camera, or vice versa. In other embodiments, the first camera 270 and the second camera 280 can be the same camera type. In one embodiment, the computer system 110 can be configured to access or otherwise receive information describing a spatial relationship (e.g., relative location and orientation) between the first camera 270 and the second camera 280.
This information may, for example, have been previously determined by the computer system 110 (e.g., through a stereo camera calibration process), or it may have been previously determined manually and stored in the non-transitory computer-readable medium 115 of the computer system 110 or in another device. As an example, the information may be a transformation matrix describing a translation and rotation between a coordinate system of the first camera 270 and a coordinate system of the second camera 280. In some cases, the computer system 110 can be configured to use the information regarding the spatial relationship between the first camera 270 and the second camera 280 so that camera data generated by the first camera 270 and camera data generated by the second camera 280 are placed in a common frame of reference, such as a common coordinate system. For example, if the computer system 110 uses the camera data from the second camera 280 to determine the location of a feature of the stack structure, as described in more detail below, the computer system 110 can be configured to compensate for any difference between a frame of reference of the camera 280 and the common frame of reference. In some cases, the common frame of reference can be a frame of reference of one of the cameras, such as the first camera 270. In one embodiment, the first camera 270 and the second camera 280 can have a substantially fixed spatial relationship. Fig. 3B illustrates, for example, a mounting structure 202 to which both the first camera 270 and the second camera 280 are fixedly attached. This fixed attachment can keep the relative location and orientation of the first camera 270 and the second camera 280 constant.
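Placing camera data from the second camera into a common frame amounts to applying the rotation and translation held in the transformation matrix to each 3D point. The sketch assumes a 4x4 homogeneous matrix that maps the second camera's coordinates into the first camera's frame; the matrix layout, its direction, and the name `T_270_from_280` are assumptions, and the example values are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def transform_points(T: Sequence[Sequence[float]],
                     points: Sequence[Point]) -> List[Point]:
    """Apply a 4x4 homogeneous transform T = [[R, t], [0, 0, 0, 1]] to each
    point, computing p' = R p + t. T is assumed to map coordinates in the
    second camera's frame into the common frame (the first camera's frame)."""
    out: List[Point] = []
    for x, y, z in points:
        px = T[0][0] * x + T[0][1] * y + T[0][2] * z + T[0][3]
        py = T[1][0] * x + T[1][1] * y + T[1][2] * z + T[1][3]
        pz = T[2][0] * x + T[2][1] * y + T[2][2] * z + T[2][3]
        out.append((px, py, pz))
    return out


# Illustrative transform: 90-degree rotation about z plus a 0.1-unit x offset.
T_270_from_280 = [
    [0.0, -1.0, 0.0, 0.1],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(transform_points(T_270_from_280, [(1.0, 0.0, 2.0)]))  # [(0.1, 1.0, 2.0)]
```

If calibration instead yields the transform in the opposite direction (first camera into second), its inverse, `[[R^T, -R^T t], [0, 0, 0, 1]]`, would be used.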
In one embodiment, a location in the field of view of a camera (e.g., 272) can be occluded because a line of sight between that location and the camera (e.g., 270) is blocked, or nearly blocked, by an object or part of an object within the camera's field of view. In other words, the object or part of it may obscure, or nearly obscure, that location, or part of a region surrounding that location, from the camera. In one embodiment, the object may prevent light or any other signal that could be used to acquire information about that location from reaching the camera directly, or it may significantly distort that signal. In Figures 3A and 3B, part of the stack 250, or more specifically of the stack structure, may be occluded relative to the camera 270 by another part of the stack structure. For example, as shown in Figure 3B, locations in region 251A-1 on surface 251A of the first object 251, or in region 253A-1 on surface 253A of the third object 253, are obscured from camera 270 by the second object 252, or more specifically by the portion of it occupied by region 252A-1. In some cases, the occlusion may be due to the positioning of camera 270 and of the objects 251-253 of the stack 250 relative to camera 270 and to each other. The occlusion may cause the camera data generated by camera 270 to provide an incomplete description of the stack 250, or more specifically of its stack structure. If the camera data generated by camera 270 includes multiple 3D data points, these 3D data points may, for example, provide little or no information about region 251A-1 and region 253A-1. If the camera data generated by camera 270 includes a 2D image, the 2D image may not show or otherwise depict region 251A-1 and region 253A-1. While the preceding discussion relates to the occlusion of a location relative to camera 270, one or more locations of the stack structure for stack 250 in Fig. 3B may also be occluded relative to camera 280.
In one embodiment, the occlusion of one or more locations within a camera's field of view can affect robot interaction with objects in the field of view, since the robot interaction may depend on camera data that describes, for example, the location, size, and / or orientation of the objects relative to a robot. In some cases, the robot interaction may involve performing object detection to identify the objects in the field of view, and occlusion may impair the accuracy of that object detection. Therefore, some aspects of the embodiments described herein relate to detecting or otherwise determining occlusion within a camera's field of view. Such a determination can be used, for example, to evaluate object detection that may have been performed while occlusion was present in the camera's field of view, to control how object detection is performed, and / or to control robot interaction with objects in the camera's field of view. Figures 4A and 4B show an example method 400 for determining occlusion within a camera's field of view. Method 400 can be executed by a computer system, for example by the control circuit 111 of the computer system 110 shown in Figures 1A to 1C and 2. In one embodiment, the control circuit 111 can be configured to execute method 400 when the communication interface 113 of the computer system 110 is communicating with at least one camera, wherein the at least one camera comprises a first camera (e.g., 170 / 270) with a first camera field of view (e.g., 272). Method 400 can be executed, for example, for the situations shown in Figures 3A, 3B, 5A-5D, 6A-6D, and 7A-7C, in which the at least one camera comprises the first camera 270 and the second camera 280, and the computer system 110 is communicating with the cameras 270 / 280. In another example, method 400 can be executed for situations in which the at least one camera comprises the first camera 270 and the second camera 280 is not present.
In one embodiment, method 400 can further be executed when a stack of multiple objects is located in a first camera field of view (e.g., 272) of the first camera (e.g., 270). For example, the control circuit 111 can execute method 400 when the stack 250 of objects 251-253 of Fig. 3A, Fig. 3B, Fig. 5A, and Fig. 6A is located in the first camera field of view 272 of the first camera 270, or when a stack 750 of objects 751-753 of Fig. 7A is located in the first camera field of view 272 of the first camera 270. As indicated above, the stack structure for stack 250 in Fig. 5A and Fig. 6A can be formed from at least an object structure for the first object 251 (where an object structure refers to a structure of the object), as well as from an object structure for the second object 252 and an object structure for the third object 253. Similarly, the stack structure for stack 750 of Fig. 7A can be formed from at least an object structure for a first object 751, as well as from an object structure for a second object 752 and an object structure for a third object 753. In one embodiment, the method 400 may begin with, or otherwise include, a step 402 in which the control circuit 111 receives camera data generated by at least one camera comprising a first camera (e.g., 170 / 270) with a first camera field of view (e.g., 272), wherein the camera data describes a stack structure for the stack (e.g., 250 / 750 of Fig. 5A, Fig. 6A, and Fig. 7A), the stack structure being a structure of the stack. In some scenarios, the at least one camera may comprise the first camera (e.g., 170 / 270) and a second camera (e.g., 180 / 280). In such scenarios, the camera data received by the control circuit 111 in step 402 can include camera data generated by both the first camera (e.g., 170 / 270) and the second camera (e.g., 180 / 280).
As stated above, the camera data received by the control circuit 111 in step 402 may, in some cases, include 3D information that encompasses depth information regarding a scene in the first camera's field of view. The depth information may, for example, specify depth values of corresponding locations on the stack structure, where the depth values may be relative to the first camera (e.g., 270). In some cases, the depth information may comprise multiple 3D data points describing the depth values. For example, each of the multiple 3D data points may be a 3D coordinate, such as an [X Y Z]^T coordinate, describing a corresponding location on a face of the stack (which may also be referred to as a face of the stack structure). In this example, the Z component of the 3D data point may be a depth value of the corresponding location represented by the 3D data point. In some cases, the multiple 3D data points can form a point cloud that describes the corresponding locations on one or more faces of the stack structure for the stack (e.g., 250 / 750). As also stated above, the camera data received in step 402 can, in some cases, include a 2D image of the stack (e.g., 250 / 750) or, more specifically, of the stack structure. The 2D image can, for example, include multiple pixels corresponding to pixel coordinates [u v]^T. In one embodiment, the method 400 can include a step 404 in which the control circuit 111 identifies, based on the camera data generated by the at least one camera, a target feature of, or disposed on, the object structure for an object (which can also be referred to as a target feature of the object). For example, the object structure can be a structure of the first object 251 of Fig. 5A and Fig. 6A or a structure of the first object 751 of Fig. 7A. In one embodiment, the target feature can be any feature (e.g., characteristic) of the object that the computer system 110 uses to perform object recognition for the object (e.g., 251 of Fig. 5A or 751 of Fig. 7A) and / or to plan the robot's interaction with the object.
As described in more detail below, in some cases the control circuit 111 can be configured to identify the target feature based on information in an object recognition template, which may describe, for example, a size (e.g., dimensions) of the object structure, a shape of the object structure, and / or a visual feature that appears on a surface of the object structure. In one embodiment, the target feature of, or disposed on, the object structure (which may also be referred to as the target feature of an object) may be at least one of the following: a corner of the object structure, an edge of the object structure, a visual feature disposed on a surface of the object structure, or an outline of the surface of the object structure. These features may also be referred to as a corner of the object, an edge of the object, a visual feature disposed on the surface of the object, or an outline of the surface of the object. In particular, some embodiments of identifying the target feature in step 404 may involve identifying a corner of an object structure as the target feature, such as a corner 251B of the object structure for the first object 251 in Fig. 5A (which may also be referred to as corner 251B of the first object 251). In some cases, the control circuit 111 of the computer system 110 may be configured to identify corner 251B based on 3D information from the camera data generated by the first camera 270 and / or the camera data generated by the second camera 280 of Fig. 5A. Fig. 5B provides an example of 3D information comprising multiple 3D data points that indicate the corresponding depth values of locations on one or more surfaces in the camera field of view 272 (or 282) of camera 270 (or camera 280) of Fig. 3A, Fig. 3B, and Fig. 5A.
The multiple 3D data points can, for example, comprise a first set of 3D data points specifying a depth value of Z1 for each of one or more locations (indicated by the black circles in Fig. 5B) on surface 252A of the second object 252 relative to the camera 270 / 280. The multiple 3D data points can further comprise a second set of 3D data points specifying a depth value of Z2 for each of one or more locations (indicated by the white circles in Fig. 5B) on surface 251A of the first object 251 and surface 253A of the third object 253 relative to the camera 270 / 280. The multiple 3D data points can further comprise a third set of 3D data points specifying a depth value of Z3 for each of one or more locations on one or more additional surfaces, which may, for example, correspond to a floor surrounding the stack 250 of Fig. 5A, or to any other surface on which the first object 251 and the third object 253 are arranged. As stated above, in some embodiments each of the multiple 3D data points can be a 3D coordinate, such as an [X Y Z]^T coordinate. In such an embodiment, the depth value can be specified, for example, by the Z component of the 3D coordinate. In one embodiment, the control circuit 111 can be configured to identify corner 251B by identifying a convex corner or a fused corner based on the multiple 3D data points of Fig. 5B. The identification of a convex corner or a fused corner is described in more detail in U.S. Patent Application No. 16/578,900, entitled "Method and Computing System for Object Identification," which is incorporated herein by reference in its entirety. In one embodiment, the control circuit 111 can be configured to identify corner 251B by identifying 3D data points that represent a region having substantially a first depth value for one quarter of the region and a second depth value (e.g., a greater depth value) for the remaining three quarters of the region.
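The quarter / three-quarter depth test described above can be sketched as follows. This is an illustrative approximation, not the convex- or fused-corner method of the referenced application. It assumes an organized depth grid (one Z value per pixel, as many 3D cameras produce), and the window size `half`, the minimum depth step `step`, the flatness tolerance `tol`, and the function name are all hypothetical.

```python
import numpy as np


def find_convex_corners(depth, half=2, step=0.05, tol=0.01):
    """Return grid positions (row, col) where a 2*half x 2*half window splits into four
    flat quadrants: one at a smaller depth and three at a common, larger depth."""
    depth = np.asarray(depth, dtype=float)
    rows, cols = depth.shape
    corners = []
    for r in range(half, rows - half + 1):
        for c in range(half, cols - half + 1):
            quads = [depth[r - half:r, c - half:c],  # upper left
                     depth[r - half:r, c:c + half],  # upper right
                     depth[r:r + half, c - half:c],  # lower left
                     depth[r:r + half, c:c + half]]  # lower right
            if any(np.ptp(q) > tol for q in quads):
                continue  # a quadrant straddles a depth step, so the window is off-centre
            means = np.array([q.mean() for q in quads])
            near = int(np.argmin(means))
            far = np.delete(means, near)
            # one quarter near the camera, the remaining three quarters farther away
            if np.ptp(far) <= tol and far.min() - means[near] >= step:
                corners.append((r, c))
    return corners
```

The returned position lies where the four quadrants meet, that is, at the center of the analyzed region, which matches identifying the corner as the center of that region.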
In some cases, corner 251B can be identified as the center of that region. In one embodiment, identifying corner 251B may involve determining its location, such as determining a coordinate [X Y Z]^T based on the camera data, as illustrated in Fig. 5B. In some cases, the control circuit 111 may determine the coordinate relative to a common reference frame, as described above. For example, the common reference frame may be a coordinate system of the first camera 270 of Fig. 5A. If the coordinate [X Y Z]^T is based on the camera data generated by the first camera 270, the coordinate may, in some cases, already be in the common reference frame. In such cases, the coordinate [X Y Z]^T may be used for other steps of method 400, such as step 406. If, in other cases, the coordinate [X Y Z]^T is initially based on the camera data generated by the second camera 280 of Fig. 5A, the coordinate may initially be expressed relative to a reference frame (e.g., a coordinate system) of the second camera 280. In such a situation, the control circuit 111 can be configured to generate a modified coordinate [X' Y' Z']^T that accounts for a difference in location and / or orientation between a reference frame of the first camera 270 and the reference frame of the second camera 280. For example, the control circuit 111 can be configured to generate the modified coordinate [X' Y' Z']^T by applying a transformation matrix to the coordinate [X Y Z]^T, where the transformation matrix describes the spatial relationship between the first camera 270 and the second camera 280, as described above. In some cases, the first camera 270 and the second camera 280 may be coplanar, which may result in Z' being equal to Z. In the above example, the modified coordinate [X' Y' Z']^T can be used in other steps of method 400, such as step 406.
In one embodiment, the control circuit 111 of the computer system 110 can be configured to identify corner 251B of Fig. 5A based on 2D information that may originate from camera data generated by the first camera 270 or from camera data generated by the second camera 280 of Fig. 5A. For example, the second camera 280 may, in some cases, be a 2D camera configured to generate a 2D image. Fig. 5C shows an example of a 2D image of the stack 250 of Fig. 5A. In this example, the 2D image shows surface 251A of the object structure for the first object 251, surface 252A of the object structure for the second object 252, and surface 253A of the object structure for the third object 253. The control circuit 111 can be configured, for example, to identify corner 251B from the 2D image of Fig. 5C by determining a pixel coordinate [u v]^T at which corner 251B appears in the image and converting the pixel coordinate into a 3D coordinate [X Y Z]^T. In some cases, the pixel coordinate [u v]^T can be identified as the intersection of two lines in the image, where the two lines represent two corresponding outer edges of the object structure for the first object 251. In one embodiment, the conversion of the pixel coordinate [u v]^T to the 3D coordinate [X Y Z]^T can be based, for example, on an inverse projection matrix K^-1 (and / or any other camera calibration information) of the camera that produced the 2D image, such as the first camera 270 or the second camera 280. In some cases, the conversion can also be based on 3D information produced by the same camera or by a different camera. In some cases, the control circuit 111 can be configured to further adjust the 3D coordinate [X Y Z]^T to express it in a common reference frame, as described above. As stated above, some embodiments of the target feature identification in step 404 may involve identifying an edge of an object structure as the target feature, such as the edge 251C of the object structure for the first object 251 in Fig. 6A (which may also be referred to as the edge 251C of the first object 251).
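The two operations described above, locating the corner pixel as the intersection of two edge lines and converting it to 3D with K^-1, can be sketched with homogeneous coordinates as follows. The sketch assumes an ideal pinhole camera without lens distortion, an intrinsic matrix K whose last row is [0 0 1], and a depth Z for the pixel taken from 3D information; the helper names are hypothetical.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous image line through pixel points p = (u, v) and q = (u, v)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersect(l1, l2):
    """Pixel coordinate [u, v] where two homogeneous lines meet."""
    x = np.cross(l1, l2)
    if abs(x[2]) < 1e-12:
        raise ValueError("lines are parallel in the image")
    return x[:2] / x[2]


def back_project(uv, depth, K):
    """Pinhole back-projection: [X, Y, Z]^T = Z * K^-1 [u, v, 1]^T."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return depth * ray / ray[2]  # ray[2] == 1 when K's last row is [0, 0, 1]
```

The same back-projection applies to the endpoint pixel coordinates [u1 v1]^T and [u2 v2]^T of an edge.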
In some cases, the control circuit 111 of the computer system 110 may be configured to identify the edge 251C based on 3D information that may be derived from the camera data generated by the first camera 270 and / or from the camera data generated by the second camera 280 in Fig. 6A. For example, Fig. 6B presents 3D information comprising multiple 3D data points. The multiple 3D data points shown in Fig. 6B may be substantially the same as those in Fig. 5B. In the example of Fig. 6B, the control circuit 111 can be configured to identify the edge 251C of the object structure for the first object 251 based on the 3D information and an object recognition template. The object recognition template can facilitate object recognition for the first object 251 by, for example, describing a size of the object structure for the first object 251 and / or other features of the first object 251 that can be used to perform object recognition. The object recognition template can, for example, specify that the object structure for the first object 251 has a length L and a width W. In such an example, the control circuit 111 can be configured to identify the edge 251C in Fig. 6B by, for example, identifying an outer edge 251D of Fig. 6A and Fig. 6B based on the multiple 3D data points, and identifying the edge 251C as a set of locations (e.g., [X1 Y1 Z1]^T and [X2 Y2 Z2]^T) that are offset from the outer edge 251D by the width W of the object structure specified in the object recognition template. In some cases, the control circuit 111 can be configured to identify the outer edge 251D of Fig. 6A by determining a series of locations where there is a discontinuity in depth values (e.g., a discontinuity from Z2 to Z3, as illustrated in Fig. 3A).
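A minimal sketch of this edge identification follows. It assumes that the 3D data points are organized as an H x W grid in the common frame, that scanning each row in the +X direction crosses the outer edge 251D from the object's surface (depth Z2) to the floor (depth Z3), and that the direction pointing from edge 251D into surface 251A is known. The `jump` threshold and the function names are hypothetical.

```python
import numpy as np


def outer_edge_points(grid, jump=0.1):
    """For each row of an organized H x W x 3 point grid, return the last point before the
    first increase in depth larger than `jump` (e.g. a step from Z2 down to the floor at Z3)."""
    edge = []
    for row in np.asarray(grid, dtype=float):
        dz = np.diff(row[:, 2])
        idx = np.flatnonzero(dz > jump)
        if idx.size:
            edge.append(row[idx[0]])
    return np.array(edge)


def offset_edge(edge_points, inward, width):
    """Shift outer-edge points by the template width W along a unit in-plane direction."""
    inward = np.asarray(inward, dtype=float)
    return edge_points + width * inward / np.linalg.norm(inward)
```

In practice the offset points would still be checked against the 3D data, since the template width W only predicts where edge 251C should lie.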
In some cases, the control circuit 111 of the computer system 110 can be configured to identify edge 251C based on 2D information that may originate from the camera data generated by the first camera 270 and / or from the camera data generated by the second camera 280 in Fig. 6A. Fig. 6C, for example, shows a 2D image of the stack 250 of Fig. 6A. The 2D image of Fig. 6C may be substantially the same as that of Fig. 5C. In one embodiment, the control circuit 111 can be configured to identify from the 2D image one or more pixel coordinates at which edge 251C appears in the image of Fig. 6C. For example, the control circuit 111 can identify a first pixel coordinate [u1 v1]^T, which represents a location where a first endpoint of edge 251C appears in the 2D image of Fig. 6C, and a second pixel coordinate [u2 v2]^T, which represents a location where a second endpoint of edge 251C appears in the 2D image. In some cases, the control circuit 111 can be configured to convert the first pixel coordinate [u1 v1]^T and the second pixel coordinate [u2 v2]^T into a first 3D coordinate [X1 Y1 Z1]^T and a second 3D coordinate [X2 Y2 Z2]^T, respectively, as described above with reference to Fig. 5C. As stated above, some embodiments of the target feature identification in step 404 may involve identifying a visual feature located on a surface of the object structure as the target feature. In one embodiment, a visual feature may comprise a graphic element or any other visual marker. For example, Fig. 7A depicts a visual feature 751B on a surface 751A of an object structure for a first object 751 (which may also be referred to as a visual feature 751B located on a surface 751A of the first object 751).
In particular, Fig. 7A depicts a situation in which a stack 750 of objects 751-753 is located in a field of view 272 of the first camera 270 and / or in a field of view 282 of the second camera 280. The first object 751 and the third object 753 can form a first layer of the stack 750, while the second object 752 can form a second layer of the stack 750 and can be stacked on top of the first object 751 and the third object 753. As stated above, the first object 751 can have a visual feature 751B located on a surface 751A of the first object 751. In the example of Fig. 7A, the visual feature 751B can be a logo displaying a brand name or trademark (e.g., "A") and can be printed on or affixed to the surface 751A of the object structure for the first object 751. In one embodiment, the control circuit 111 of the computer system 110 can be configured to identify the visual feature 751B based on 2D information (e.g., a 2D image) that may originate from the camera data generated by the first camera 270 and / or the camera data generated by the second camera 280 of Fig. 7A. Fig. 7B, for example, depicts the visual feature 751B appearing in a 2D image of the stack 750. In particular, the surface 751A (e.g., the top surface) of the first object 751, the surface 752A of the second object 752, and the surface 753A of the third object 753 may appear in the image shown in Fig. 7B. In one embodiment, the control circuit 111 can be configured to identify the visual feature 751B by determining whether any part of the 2D information (e.g., 2D image) of Fig. 7B matches an appearance or other property of a defined visual feature. For example, the control circuit 111 can perform pattern recognition to detect whether any part of the 2D image of Fig. 7B matches a shape, marking, pattern, color, or any other aspect of the appearance of the defined visual feature. In one embodiment, information for the defined visual feature can be stored in the non-volatile, computer-readable medium 115.
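One common way to perform the pattern recognition mentioned above is template matching by normalized cross-correlation. The sketch below is a naive NumPy version of that technique, assuming the defined visual feature is available as a small grayscale image patch and the 2D image is grayscale. It is not necessarily the recognition method used by the computer system 110; `match_template_ncc`, `contains_visual_feature`, and the 0.8 threshold are hypothetical. Production code would more likely rely on an optimized routine such as OpenCV's `cv2.matchTemplate`.

```python
import numpy as np


def match_template_ncc(image, template):
    """Exhaustively slide `template` over `image` and return ((row, col), score) for the
    window with the highest zero-mean normalized cross-correlation (score in [-1, 1])."""
    image = np.asarray(image, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    t_norm = np.linalg.norm(t)
    th, tw = t.shape
    best_pos, best_score = None, -np.inf
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * t_norm
            if denom == 0.0:
                continue  # flat window or flat template: correlation undefined
            score = float(np.sum(w * t) / denom)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def contains_visual_feature(image, template, threshold=0.8):
    """Hypothetical decision rule: the feature is found if the best score reaches `threshold`."""
    pos, score = match_template_ncc(image, template)
    return pos is not None and score >= threshold, pos
```

Subtracting the means makes the score insensitive to uniform brightness offsets, which is one reason this variant is often preferred over raw correlation.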
In another embodiment, information for the defined visual feature can be stored in an object recognition template. As stated above, the object recognition template can facilitate object recognition for a particular object or category of objects by describing its properties, such as the size (e.g., dimensions) of an object structure for the object or category of objects, the shape of the object structure, and / or the appearance of a surface of the object structure, such as a visual feature located on that surface. For example, the object recognition template can include information describing the "A" logo as a visual feature appearing on a surface (e.g., 751A) of the first object 751 or of an object category to which the first object 751 belongs. In such a case, the control circuit 111 can be configured to identify the visual feature 751B by determining whether any part of the 2D image of Fig. 7B matches the information stored in the object recognition template to describe the "A" logo. In one embodiment, the identification of the target feature in step 404 can involve identifying the outline of a surface of an object structure as the target feature. The outline can describe a boundary of the surface of the object structure. For example, the control circuit 111 can identify the outline of the object structure for the first object 751 in Fig. 7A by identifying all four edges 751C-751F of the surface 751A of the object structure for the first object 751. In other words, the four edges 751C-751F can form an outline of the surface 751A. In some cases, the control circuit 111 can identify the edges 751C-751F based on a defined size of the object structure for the first object 751, such as a defined size described in an object recognition template.
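As an illustration of combining one measured edge with a defined size, the sketch below models an object recognition template as a small record and completes a rectangular outline from a single edge. The record fields, the function name, and the requirement that the caller supply an in-plane "inward" direction are assumptions made for this example; the source does not describe the template's data layout.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class ObjectRecognitionTemplate:
    """Hypothetical template record for one object category (sizes in metres)."""
    length: float
    width: float
    logo: Optional[np.ndarray] = None  # e.g. a grayscale patch of the "A" logo


def complete_outline(p0, p1, template, inward):
    """Return the four corners of a rectangular surface outline.

    p0, p1: 3D points on one edge found from camera data (p1 only fixes the direction).
    inward: a vector in the surface plane pointing from that edge into the surface.
    The remaining corners are placed using the template's length and width.
    """
    p0 = np.asarray(p0, dtype=float)
    along = np.asarray(p1, dtype=float) - p0
    along /= np.linalg.norm(along)
    inward = np.asarray(inward, dtype=float)
    perp = inward - inward.dot(along) * along  # drop any component along the edge
    perp /= np.linalg.norm(perp)
    a = p0
    b = p0 + template.length * along
    c = b + template.width * perp
    d = a + template.width * perp
    return np.array([a, b, c, d])  # consecutive corners; edges are a-b, b-c, c-d, d-a
```

Snapping the far corner to the template length, rather than trusting the measured endpoint, lets a partially visible edge still yield a full outline.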
The defined size can, for example, specify dimensions of the object structure, such as a length and a width. The control circuit 111 can, for example, be configured to identify at least one of the edges 751C-751F based on 2D information or 3D information in the camera data generated by the camera 270 / 280, and to identify the remaining edges of 751C-751F based on the defined size of the object structure for the first object 751. Referring to Figures 4A to 4B, the method 400 may further include a step 406 in which the control circuit 111 determines a 2D region that is coplanar with the target feature and whose boundary surrounds the target feature. The 2D region may, for example, have a rectangular shape (e.g., a square shape), a round shape, a hexagonal shape, or any other 2D shape. In some cases, the 2D region may be referred to as the occlusion analysis region, since it is used to determine an occlusion region, as explained in detail below. As an example, Fig. 5A shows corner 251B as the target feature and a 2D region 520 (e.g., a square region) that is coplanar with corner 251B and whose boundary surrounds corner 251B. Specifically, corner 251B can be a corner of surface 251A of an object structure for the first object 251, and the 2D region 520 can be coplanar with this surface 251A. Furthermore, corner 251B can be located within the 2D region 520. The 2D region 520 can be a square region, but it can also be modified to be, for example, a rectangular region or a circular region. In some cases, the control circuit 111 can define the 2D region 520 as a region that: (i) has a defined size and (ii) has its center located at corner 251B. As a further example, Fig. 6A shows the edge 251C described above as a target feature, together with a 2D region 620 (e.g., a rectangular region) that is coplanar with edge 251C of the object structure for the first object 251 and whose boundary surrounds edge 251C.
Specifically, edge 251C can be an edge of surface 251A of the object structure for the first object 251, and the 2D region 620 can be coplanar with surface 251A. Furthermore, edge 251C can be located within the 2D region 620. In some aspects, the control circuit 111 can define the 2D region 620 as a region that: (i) has a defined size and (ii) has its center located on edge 251C. In some cases, the center of the 2D region 620 can be the midpoint of edge 251C. Fig. 7A shows the target feature as the visual feature 751B and / or as an outline of the surface 751A of the object structure for the first object 751, as described above. In the example of Fig. 7A, the control circuit 111 can, as part of step 406, define a 2D region 720 (e.g., a square region). The 2D region 720 can be coplanar with the visual feature 751B and with the outline of the surface 751A. Furthermore, the 2D region 720 can have a boundary that surrounds both the visual feature 751B and the outline of the surface 751A. In one embodiment, the control circuit 111 can be configured to determine the 2D region 520 / 620 / 720 based on a defined size, which is stored, for example, in the non-volatile, computer-readable medium 115. The defined size can, for example, be a fixed size for the 2D region 520 / 620 / 720. If the target feature is a corner (e.g., 251B), the 2D region (e.g., 520) that surrounds the corner and is coplanar with it can be a square region with a fixed size of, for example, 2 cm x 2 cm or 5 cm x 5 cm. In one embodiment, the control circuit 111 can be configured to determine the size of the 2D region (e.g., 620 / 720) based on a size of the target feature. The control circuit 111 can, for example, be configured to determine a length of the 2D region 620 as a ratio multiplied by a length of the edge 251C of Fig. 6A.
In another example, the control circuit 111 can be configured to determine the length of the 2D region 720 as the ratio multiplied by the length of a first edge (e.g., 751C) that forms the outline of the surface 751A of the first object 751, and to determine the width of the 2D region 720 as the ratio multiplied by the length of a second edge (e.g., 751E) that forms the outline, where the second edge can be perpendicular to the first edge. The length and the width of the 2D region 720 can have different values or the same value. In some cases, the ratio can be a defined value stored in the non-volatile, computer-readable medium 115 or elsewhere. In some cases, the ratio can be dynamically defined or otherwise determined by the control circuit 111. In one embodiment, the control circuit 111 can be configured to determine the size of the 2D region (e.g., 520 / 620 / 720) based on at least one of: an environmental factor or a characteristic of the object structure to which the target feature belongs. In some cases, the environmental factor can include an image noise level, which may indicate, for example, the amount of light in the environment of the camera (e.g., 270 / 280), or any other condition that can affect the camera's ability to accurately capture a scene in its field of view (e.g., 272 / 282). In some cases, the characteristic of the object structure to which the target feature belongs can include, for example, at least one of a shape of the object structure or a texture of a surface of the object structure. For example, an object structure with a round shape may be more likely to interfere with the operation of a 3D camera and result in reduced accuracy of the 3D information generated by the 3D camera to describe the object structure. In some cases, the texture of the surface may indicate its reflectivity.
For example, a more reflective (e.g., shinier) surface may be more likely to interfere with the operation of a 2D camera and reduce the accuracy of the 2D information generated by the 2D camera to capture or otherwise represent the surface's appearance. In one embodiment, determining the size of the 2D region (e.g., 620 / 720) may involve multiplying the ratio described above by a dimension of the target feature. In such an embodiment, the ratio can be determined based on the environmental factor and / or the characteristic of the object structure. In one embodiment, the control circuit 111 can be configured to determine the size of the 2D region (e.g., 520 / 620 / 720) based on a ratio that makes the size increase as the image noise level increases. In some cases, increasing the size of the 2D region (e.g., 520 / 620 / 720) can increase the size of an occlusion region, as described below. Referring to Figures 4A to 4B, the method 400 may include a step 408 in which the control circuit 111 determines a 3D region defined by connecting a location of the first camera of step 402 (e.g., the first camera 270) and the boundary of the 2D region, where the 3D region is part of the first camera's field of view (e.g., 272). In one embodiment, the location of the first camera (e.g., 270) used to define the 3D region may be a focal point of the first camera, a location on an image sensor of the first camera 270, such as a corner or the center of the image sensor, or any other location. In some cases, the 3D region may be the part of the first camera's field of view (e.g., 272) used for occlusion analysis and may be referred to as the analysis field of view. As an example of step 408, Fig. 5A shows an exemplary 3D region 530 defined by connecting a location of the first camera 270 and the boundary of the 2D region 520.
Specifically, the 3D region 530 can be defined by the lines 530A-530D, which connect the location of the camera 270 to the four corresponding corners of the boundary of the 2D region 520. In some cases, determining the 3D region may involve determining information representing the lines 530A-530D. In another example, Fig. 6A shows an exemplary 3D region 630 defined by connecting a location of the first camera 270 and the boundary of the 2D region 620, in particular by the lines 630A-630D extending from the location of camera 270 to the corresponding corners of the 2D region 620. Fig. 7A shows an example in which a 3D region 730 is defined by connecting the location of the first camera 270 and the boundary of the 2D region 720. In particular, the 3D region 730 can be defined by lines 730A-730D connecting the location of camera 270 to the four corresponding corners of the 2D region 720. In one embodiment, the 3D region (e.g., 530 / 630 / 730) can form an imaginary pyramid if the corresponding 2D region (e.g., 520 / 620 / 720) is, for example, a rectangular region (e.g., a square region). In other embodiments, the 3D region can form any other 3D shape, such as an imaginary cone defined by connecting a location of the first camera 270 with a circular 2D region. Referring to Figures 4A to 4B, the method 400 can include a step 412 in which the control circuit 111 determines a size (e.g., an area) of an occlusion region based on the camera data and the 3D region. In one embodiment, the occlusion region can be a region of the stack structure (from step 402) located between the target feature and the at least one camera and within the 3D region (e.g., 530 / 630 / 730).
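The sizing rules for the 2D region of step 406 and the pyramid-shaped 3D region of step 408 can be sketched together as follows. This sketch assumes a square 2D region lying in a plane of constant depth (as for the top surfaces in Fig. 5B), with coordinates in the common frame. The noise factor `(1 + noise_gain * noise_level)` is an invented example of a ratio that grows with image noise, and every function name and default value is hypothetical. The containment test treats the pyramid as unbounded beyond its base; points behind the 2D region are excluded later by the depth comparison of step 412.

```python
import numpy as np


def region_side(feature_length=None, ratio=0.5, fixed_side=0.05,
                noise_level=0.0, noise_gain=1.0):
    """Side length (m) of a square 2D analysis region.

    Uses a fixed size (e.g. 5 cm) when no feature length is given, otherwise
    ratio * feature length. The hypothetical factor (1 + noise_gain * noise_level)
    makes the region grow as the image noise level grows."""
    base = fixed_side if feature_length is None else ratio * feature_length
    return base * (1.0 + noise_gain * noise_level)


def square_region(center, side):
    """Corners, in order, of a square centred on `center` in the plane Z = center[2]."""
    cx, cy, cz = center
    h = side / 2.0
    return np.array([[cx - h, cy - h, cz], [cx + h, cy - h, cz],
                     [cx + h, cy + h, cz], [cx - h, cy + h, cz]])


def in_pyramid(points, apex, corners):
    """Boolean mask: which points lie inside the (unbounded) pyramid whose apex is the
    camera location and whose side faces pass through consecutive base corners."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    apex = np.asarray(apex, dtype=float)
    corners = np.asarray(corners, dtype=float)
    interior = corners.mean(axis=0) - apex  # a direction known to be inside
    inside = np.ones(len(pts), dtype=bool)
    for i in range(len(corners)):
        normal = np.cross(corners[i] - apex, corners[(i + 1) % len(corners)] - apex)
        side = np.sign(normal @ interior)  # orient the face normal toward the interior
        inside &= ((pts - apex) @ normal) * side >= 0.0
    return inside
```

Because each face normal is oriented using the base centroid, the test works for either corner winding order; a circular 2D region (the cone case) would need a different containment test.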
In particular, the occlusion region can be a region that is not coplanar with the target feature and is located closer than the target feature to the first camera (e.g., 270) of the at least one camera, such that the occlusion region is located between the target feature and the first camera. The occlusion region can, for example, be a region that is higher than the target feature. Since the occlusion region is located between the first camera and the target feature and within the 3D region (e.g., 530 / 630 / 730), it represents a part of the stack structure that may be in a position to block, or nearly block, the target feature (e.g., 251B / 251C / 751B), or part of a region surrounding the target feature, from the view of the first camera (e.g., 270). Therefore, the size of the occlusion region can be used to assess occlusion in the first camera's field of view (e.g., 272). To illustrate an example of step 412, Fig. 5D shows an occlusion region 570. In particular, the occlusion region 570 can be a region of the stack structure for the stack 250, and more specifically, a region of the surface 252A (e.g., the top surface) of the object structure for the second object 252. Furthermore, the occlusion region 570 can be located between the corner 251B (which is the target feature of Fig. 5A and Fig. 5D) and the first camera 270, and within the 3D region 530. As described above, the 3D region 530 can be an imaginary pyramid located within the first camera's field of view 272 (as illustrated in Fig. 3A and Fig. 3B), and it can be defined based on the 2D region 520, the boundary of which surrounds the corner 251B. In this example, the 2D region 520 can be a first 2D region, and the occlusion region 570 can be a second 2D region that is parallel to the first 2D region and lies within the imaginary pyramid of the 3D region 530.
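Under an additional assumption not stated in the source, namely that the camera location is the apex of the imaginary pyramid and that the surfaces are perpendicular to the camera's optical axis (so that surface 251A lies at depth Z2 and surface 252A at depth Z1, as in Fig. 5B), the cross-section of the 3D region 530 at depth Z1 is the 2D region 520 scaled about the apex by Z1/Z2. This bounds the area of the occlusion region 570:

$$
A_{570} \;\le\; A_{530}(Z_1) \;=\; A_{520}\left(\frac{Z_1}{Z_2}\right)^{2}
$$

For instance, with the 5 cm x 5 cm region mentioned above (A_520 = 25 cm^2) and illustrative depths Z2 = 2.0 m and Z1 = 1.8 m, the cross-section has an area of 25 cm^2 x 0.81 = 20.25 cm^2, so surface 252A can occupy at most 20.25 cm^2 of the 3D region 530 at that depth.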
In some cases, the occlusion region 570 can include at least one region that is: (i) parallel to the 2D region 520 and (ii) located within the 3D region 530. In another example, Fig. 6D illustrates an occlusion region 670, which is a region of the stack structure for the stack 250 located between edge 251C (which is the target feature of Fig. 6A and Fig. 6D) and camera 270, and within the 3D region 630. Specifically, the occlusion region 670 can be a region on face 252A of the second object 252, with the region located within the imaginary pyramid formed by the 3D region 630 and between the first camera 270 and edge 251C. In the example of Fig. 6D, the occlusion region 670 can be parallel to the 2D region 620. Fig. 7C shows an occlusion region 770, which is a region of the stack structure for the stack 750 located between the target feature of Fig. 7A and Fig. 7C (e.g., the visual feature 751B or the outline of the surface 751A) and the first camera 270, and within the 3D region 730. In particular, the occlusion region 770 can be a region on the surface 752A of the second object 752, wherein the region is located within the imaginary pyramid formed by the 3D region 730 and between the first camera 270 and the target feature. In the example of Fig. 7C, the occlusion region 770 can be parallel to the 2D region 720. As stated above, in one embodiment, the occlusion region 570 / 670 / 770 can be a region located on a surface parallel to the 2D region 520 / 620 / 720 determined in step 406, such as the surface 252A / 752A of the second object 252 / 752. In some cases, the occlusion region 570 / 670 / 770 may be restricted to a surface or surfaces parallel to the 2D region 520 / 620 / 720. In other cases, the occlusion region 570 / 670 / 770 may extend to a different surface or surfaces, such as a surface perpendicular to surface 252A / 752A (but still within the 3D region 530 / 630 / 730).
For certain aspects, the control circuit 111 can determine the size of the occlusion region (e.g., 570 / 670 / 770) based on 3D information from the camera data, generated, for example, by the first camera 270. The 3D information can, for example, specify depth information that identifies a group of locations on one or more faces of the stack structure for the stack that are closer to the first camera 270 than the target feature (e.g., locations on face 252A / 752A). The control circuit 111 can determine which locations from the group of locations lie within the 3D region (e.g., 530 / 630 / 730) determined in step 408 and determine the size of the occlusion region based on these locations. In the example shown in Figs. 5D, 6D, and 7C, the control circuit 111 can, for example, be configured to determine several 3D data points (e.g., 3D coordinates) from the camera data to represent the corresponding locations on one or more faces of a stack structure, such as a face 251A / 751A (e.g., top face) of a first object 251 / 751 of the stack 250 / 750, a face 252A / 752A of a second object 252 / 752 of the stack 250 / 750, and a face 253A / 753A of a third object 253 / 753 of the stack 250 / 750. In this example, the control circuit 111 can also determine a depth value of Z2 as an expected depth value, which is associated with the target feature 251B / 251C / 751B. For example, the expected depth value associated with target feature 251B / 251C / 751B can be a Z-component of a 3D coordinate (e.g., [X Y Z]^T) of target feature 251B / 251C / 751B, where the 3D coordinate can be in a common reference frame as described above. The control circuit 111 can further determine a subset of the multiple 3D data points to represent corresponding locations on one or more faces of the stack structure that are closer to the first camera 270 than the expected depth value of Z2 and that lie within the 3D region 530 / 630 / 730.
The subset can represent 3D data points associated with locations between the target feature 251B / 251C / 751B and the camera 270 and within the 3D region 530 / 630 / 730. In this example, the subset can be the 3D data points associated with locations on face 252A / 752A of an object structure for the second object 252 / 752 of the stack 250 / 750. In some aspects, the control circuit 111 can determine the size of the occlusion region (e.g., 570 / 670 / 770) by determining an area of the occlusion region, a dimension of the occlusion region, or any combination thereof. In some cases, the control circuit 111 can be configured to determine the size of the occlusion region (e.g., 570 / 670 / 770) based on the subset of 3D data points described above. For example, the size of the occlusion region can be based on the number of 3D data points in the subset, or on determining a boundary of a region defined by the subset of 3D data points and integrating over that region to determine its area. In one embodiment, the subset of 3D data points can be determined based on comparing the expected depth value (e.g., Z2) with corresponding depth values (e.g., Z1 and Z2) associated with the multiple 3D data points. The subset can be determined, for example, by identifying 3D data points from the multiple 3D data points that: i) are associated with corresponding depth values (e.g., Z1) that are less than the expected depth value (e.g., Z2) by at least a defined difference threshold, and ii) are located within the 3D region (e.g., 530 / 630 / 730).
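The subset selection just described can be sketched as follows, assuming each 3D data point is an (x, y, z) tuple whose z-component is its depth relative to the first camera, and assuming the 3D-region membership test is passed in as a callable. The names `occluding_points`, `occlusion_size`, and `inside_region` are hypothetical. The size is approximated by the number of selected points, one of the two options named in the text; integrating over a boundary is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def occluding_points(
    points: Sequence[Point3],
    expected_depth: float,
    diff_threshold: float,
    inside_region: Callable[[Point3], bool],
) -> List[Point3]:
    """Select the 3D data points that (i) have a depth smaller than the expected
    depth (e.g. Z2) by at least diff_threshold and (ii) lie inside the 3D region."""
    return [
        p
        for p in points
        if p[2] <= expected_depth - diff_threshold and inside_region(p)
    ]


def occlusion_size(
    points: Sequence[Point3],
    expected_depth: float,
    diff_threshold: float,
    inside_region: Callable[[Point3], bool],
) -> int:
    """Approximate the size of the occlusion region by counting the selected points."""
    return len(occluding_points(points, expected_depth, diff_threshold, inside_region))


# Hypothetical point cloud: one point on the target surface (depth 2.0), one
# slightly noisy point on the same surface (1.99), one point on a higher surface
# inside the 3D region (1.8), and one higher point outside the 3D region.
cloud = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.99), (0.0, 0.0, 1.8), (0.5, 0.0, 1.8)]
inside = lambda p: abs(p[0]) < 0.1 and abs(p[1]) < 0.1  # stand-in for a 3D-region test
print(occlusion_size(cloud, expected_depth=2.0, diff_threshold=0.02, inside_region=inside))  # prints 1
```

The difference threshold keeps the noisy point at depth 1.99 from being treated as an occluding surface, which is the role the text assigns to it.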
In this example, the defined difference threshold can account for image noise or surface imperfections that may cause slight variations in depth values for locations that are actually on the same surface (e.g., 251A). To determine whether a location is on a surface that is closer than a first surface on which the target feature is located, the control circuit 111 can therefore determine whether a depth value of the location is smaller than the expected depth value of the first surface by at least the defined difference threshold. In one embodiment, the control circuit 111 can be configured to determine the size of a hidden region, such as the hidden region 751C of Fig. 7D. The hidden region (e.g., 751C) can be a region that is coplanar with the target feature, such as target feature 751B and / or target features 251B / 251C of Figs. 5A and 6A. For example, the hidden region 751C can be located on a surface 751A on which the target feature 751B is located. In one embodiment, the control circuit 111 can define the hidden region 751C by identifying an edge or corner of the stack structure for the stack 750 that is closer to the first camera 270 than the target feature 751B (or 251B / 251C of Figs. 5A and 6A), defining a plane (a flat plane or a curved plane) extending from a location of the first camera 270 to the edge or corner of the stack structure, projecting the plane onto a surface on which the target feature is located, and determining a line of intersection between the plane and the surface. In the example of Fig. 7D, the control circuit 111 can identify edge 752B of surface 752A as an edge of the stack structure that is closer to the first camera 270 than the target feature 751B. The control circuit 111 can define a plane 740 extending from the location of the first camera 270 to edge 752B and can project the plane 740 onto surface 751A on which the target feature 751B is located.
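For the flat-plane case, this projection can be sketched in a few lines. The sketch assumes, beyond the text, that the camera looks along +Z in a common reference frame, that the surface carrying the target feature (e.g., 751A) lies in a plane of constant depth z = Z2, and that the occluding edge (e.g., 752B) is given by its two 3D end points; the function names are hypothetical. Because a plane such as 740 contains the rays from the camera through both end points of the edge, intersecting those two rays with the surface yields two points on the line where the plane meets the surface.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project_to_depth(camera: Vec3, point: Vec3, surface_depth: float) -> Vec3:
    """Intersect the ray from the camera through `point` with the plane z = surface_depth."""
    dz = point[2] - camera[2]
    if dz <= 0:
        raise ValueError("the point must lie in front of the camera")
    t = (surface_depth - camera[2]) / dz  # ray parameter where the ray reaches the surface
    return (
        camera[0] + t * (point[0] - camera[0]),
        camera[1] + t * (point[1] - camera[1]),
        surface_depth,
    )


def hidden_boundary_line(
    camera: Vec3, edge_start: Vec3, edge_end: Vec3, surface_depth: float
) -> Tuple[Vec3, Vec3]:
    """Two points on the intersection of the flat plane through the camera and the
    occluding edge with the target surface plane z = surface_depth."""
    return (
        project_to_depth(camera, edge_start, surface_depth),
        project_to_depth(camera, edge_end, surface_depth),
    )


# Hypothetical edge at depth 1.5, target surface at depth 2.0, camera at the origin.
p, q = hidden_boundary_line((0.0, 0.0, 0.0), (0.5, -1.0, 1.5), (0.5, 1.0, 1.5), 2.0)
print(p, q)  # both points have x = 2/3 (up to rounding) and lie at depth 2.0
```

A curved plane, which the text also allows, would need a different construction than this two-ray shortcut.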
The control circuit 111 can also determine the line 741 as the line of intersection between the plane 740 and the surface 751A. In this example, line 741 can be an edge that forms part of a boundary of the hidden region 751C. In some cases, the boundary of the hidden region 751C can also be formed by one or more edges of the surface 751A on which the target feature 751B is located, such as edges 751E and 751F. In some cases, the boundary can also be formed by one or more surfaces of the stack structure, such as surface 752C of the second object 752, which intersects surface 751A on which the target feature 751B is located. With reference to Figs. 4A-4B, the method 400 may further include a step 414 in which the control circuit 111 determines a value of an object detection confidence parameter based on the size of the occlusion region. In some cases, the value of the object detection confidence parameter may be inversely related to the size of the occlusion region. For example, an increase in the size of the occlusion region may cause the value of the object detection confidence parameter to change in a direction that indicates lower confidence in the accuracy of an object detection operation already performed or planned. In one embodiment, the control circuit 111 may be configured to determine the value of the object detection confidence parameter by establishing a ratio between the size of the occlusion region (e.g., 570 / 670 / 770) and a size of the 2D region determined in step 406 (e.g., 520 / 620 / 720), or an inverse of the ratio. In one embodiment, the control circuit 111 can alternatively or additionally determine the value of the object detection confidence parameter based on the size of the hidden region (e.g., 751C of Fig. 7D), such as based on a ratio between the size of the hidden region and the size of the 2D region (e.g., 520 / 620 / 720) determined in step 406, or an inverse of the ratio.
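The ratio-based option can be sketched as follows, assuming both sizes are expressed in the same unit (for example, both as areas, or both as counts of 3D data points). The function names are hypothetical. The sketch uses the ratio itself as the parameter value, so a larger value indicates more occlusion; the inverse option would need a guard for an occlusion size of zero. The threshold comparison corresponds to checking whether the ratio exceeds a defined occlusion threshold, as the text describes for this parameter.

```python
def occlusion_ratio(occlusion_size: float, region_size: float) -> float:
    """Ratio between the size of the occlusion region (e.g. 570) and the size of
    the 2D region (e.g. 520); larger values indicate more occlusion."""
    if region_size <= 0:
        raise ValueError("the 2D region must have a positive size")
    if occlusion_size < 0:
        raise ValueError("the occlusion size cannot be negative")
    return occlusion_size / region_size


def exceeds_occlusion_threshold(
    occlusion_size: float, region_size: float, occlusion_threshold: float
) -> bool:
    """True if the occlusion ratio is above the defined occlusion threshold,
    e.g. as a trigger for re-executing object recognition."""
    return occlusion_ratio(occlusion_size, region_size) > occlusion_threshold


# Hypothetical sizes in the same unit: occlusion size 30, 2D-region size 120.
print(occlusion_ratio(30, 120))                   # 0.25
print(exceeds_occlusion_threshold(30, 120, 0.2))  # True
```

The same two functions apply unchanged to the hidden-region variant if the size of the hidden region (e.g., 751C) is passed in place of the occlusion size.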
In some cases, the value of the object detection confidence parameter can be based on whether the ratio exceeds a defined occlusion threshold. The defined occlusion threshold can be a value predefined in the non-volatile, computer-readable medium 115 of Fig. 2, or it can be dynamically defined or otherwise determined by the control circuit 111. In some cases, the control circuit 111 can be configured to dynamically define the occlusion threshold, for example, based on the amount of light in the vicinity of a camera (e.g., 270), the shape of an object to which the target feature belongs, and / or the texture of a surface of the object. In some cases, the control circuit 111 can adjust the size of the 2D region (e.g., 520 / 620 / 720) of step 406 instead of, or in addition to, the occlusion threshold. A condition that can decrease the accuracy of the object detection process (e.g., image noise, a round object, and / or an object with a shiny surface) can be accounted for by enlarging the 2D region and / or reducing the defined occlusion threshold. In one embodiment, the method 400 further comprises a step 416 in which the control circuit 111 performs an operation to control robot interaction with the stack structure, wherein the operation can be performed based on the value of the object detection confidence parameter. In some cases, the operation of step 416 can involve issuing a motion command to effect robot movement. The motion command can be determined, for example, such that the value of the confidence parameter changes in a manner indicating less occlusion of the target feature (e.g., in a decreasing direction) and / or greater confidence in an object recognition operation. The control circuit 111 can, for example, determine a direction of movement for a portion of the stack, such as the first object 251 / 751 or the second object 252 / 752 of the stack 250 / 750 of Figs. 5A and 7A, that can cause the value of the object detection confidence parameter to change in a direction indicating less occlusion. In such an example, the control circuit 111 can determine a robot movement to effect such a direction of movement for the portion of the stack and determine a movement command to effect the robot movement. The control circuit 111 can further output the movement command via the communication interface 113. If the movement command is received, for example, by the robot 150 of Fig. 1C, the robot 150 can move the portion of the stack in the specified direction in response to the movement command. In one embodiment, the robot's interaction with the stack structure for the stack 250 / 750 can include performing object recognition to facilitate the robot interaction. This object recognition can be based on the target feature described above (e.g., 251B / 251C / 751B) and on camera data from, for example, the first camera 270. In such an embodiment, the operation to control the robot interaction can include re-executing the object recognition and / or determining whether to re-execute it based on the value of the confidence parameter. In some cases, the control circuit 111 can further determine a robot movement after the object recognition has been re-executed. For example, the control circuit 111 can determine to re-execute the object recognition in response to determining that the value of the confidence parameter is below a defined confidence threshold. In some cases, the defined confidence threshold can be the inverse of the defined occlusion threshold and / or inversely related to the defined occlusion threshold. In some cases, the control circuit 111 may determine to re-execute object recognition if the confidence parameter value indicates that the size of the occlusion region (e.g., 570 / 670 / 770) is too large.
For example, the control circuit 111 may determine to re-execute object recognition if the ratio between the size of the occlusion region (e.g., 570 / 670 / 770) and the size of the 2D region (e.g., 520 / 620 / 720) exceeds the defined occlusion threshold.

Brief description of various embodiments

One aspect of the present disclosure relates to embodiment 1, which comprises a computer system with a communication interface and a control circuit. In this embodiment, the communication interface is configured to communicate with at least one camera, which comprises a first camera with a first camera field of view. The control circuit in this embodiment is configured, when a stack of multiple objects is located in the first camera field of view, to receive camera data generated by the at least one camera, wherein the camera data describes a stack structure for the stack and the stack structure is formed from at least one object structure for a first object of the multiple objects. In this embodiment, the control circuit is further configured to identify, based on the camera data generated by the at least one camera, a target feature of the object structure or a target feature arranged on the object structure, wherein the target feature is at least one of the following: a corner of the object structure, an edge of the object structure, a visual feature arranged on a surface of the object structure, or an outline of the surface of the object structure.
In this embodiment, the control circuit is also configured to determine a two-dimensional (2D) region that is coplanar with the target feature and whose boundary surrounds the target feature; to determine a three-dimensional (3D) region defined by connecting a location of the first camera and the boundary of the 2D region, wherein the 3D region is part of the first camera's field of view; to determine, based on the camera data and the 3D region, a size of an occlusion region, wherein the occlusion region is a region of the stack structure located between the target feature and the at least one camera and within the 3D region; and to determine a value of an object detection confidence parameter based on the size of the occlusion region. In this embodiment, the control circuit is further configured to execute an operation to control robot interaction with the stack structure, wherein the operation is executed based on the value of the object detection confidence parameter. Embodiment 2 comprises the computer system of embodiment 1. In embodiment 2, the control circuit is configured to identify the target feature based on information in an object recognition template that describes a size of the object structure or that describes the visual feature that appears on the surface of the object structure. Embodiment 3 comprises the computer system of embodiment 1 or 2. In embodiment 3, the visual feature is an image arranged on the surface of the object structure, and the control circuit is configured to identify the image as the target feature. Embodiment 4 comprises the computer system of one of embodiments 1 to 3. In embodiment 4, the control circuit is configured to determine a size of the 2D region based on an image noise level, a shape of the object structure, or a texture of the surface of the object structure. Embodiment 5 comprises the computer system of any one of embodiments 1 to 4.
In embodiment 5, the target feature is the edge of the object structure and the control circuit is configured to determine the 2D region as a region with: (i) a defined size and (ii) a center located at the edge. Embodiment 6 comprises the computer system of any one of embodiments 1 to 4. In embodiment 6, the target feature is the corner of the object structure and the control circuit is configured to determine the 2D region as a region with: (i) a defined size and (ii) a center located at the corner. Embodiment 7 comprises the computer system of one of embodiments 1 to 6. In embodiment 7, the 3D region is an imaginary pyramid located within the first camera field of view, wherein the 2D region whose boundary surrounds the target feature is a first 2D region, and wherein the occlusion region is a second 2D region parallel to the first 2D region and located within the imaginary pyramid. Embodiment 8 comprises the computer system of any one of embodiments 1 to 7. In embodiment 8, the control circuit is configured to determine the size of the occlusion region by: determining, from the camera data, several 3D data points to represent corresponding locations on one or more faces of the stack structure; determining, as an expected depth value, a depth value for the target feature relative to the first camera; and determining a subset of the several 3D data points to represent corresponding locations on the one or more faces of the stack structure that are closer to the first camera than the expected depth value and that lie within the 3D region, wherein the subset is determined based on comparing the expected depth value with corresponding depth values associated with the several 3D data points. Embodiment 9 comprises the computer system of embodiment 8.
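Embodiments 5 and 6 can be illustrated with a small helper that builds a square 2D region of a defined size centered on the target feature. This assumes, beyond the text, that the surface carrying the target feature lies in a plane of constant z, and it takes the midpoint of an edge as the center for an edge target; that is only one admissible choice, since the text requires only that the center be located at the edge. The names `square_region_around` and `edge_midpoint` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def square_region_around(center: Vec3, side: float) -> List[Vec3]:
    """Corners, ordered around the boundary, of a square with the given side length
    that is centered on `center` and coplanar with it (plane z = center[2])."""
    if side <= 0:
        raise ValueError("the defined size must be positive")
    h = side / 2.0
    x, y, z = center
    return [(x - h, y - h, z), (x + h, y - h, z), (x + h, y + h, z), (x - h, y + h, z)]


def edge_midpoint(a: Vec3, b: Vec3) -> Vec3:
    """Midpoint of an edge given by its two end points (one possible center on the edge)."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, (a[2] + b[2]) / 2.0)


# Corner target (embodiment 6): square of side 0.2 centered on a corner at depth 2.0.
corner_region = square_region_around((1.0, 1.0, 2.0), 0.2)
# Edge target (embodiment 5): square centered on the midpoint of an edge.
edge_region = square_region_around(edge_midpoint((0.0, 0.0, 2.0), (1.0, 0.0, 2.0)), 0.2)
print(corner_region)
print(edge_region)
```

Enlarging `side` is one way to realize the noise-, shape-, or texture-dependent region size of embodiment 4.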
In embodiment 9, the subset of 3D data points is determined by identifying, from the multiple 3D data points, those 3D data points that: i) are associated with corresponding depth values that are smaller than the expected depth value by at least a defined difference threshold, and ii) are located within the 3D region. Embodiment 10 comprises the computer system of one of embodiments 1 to 9, wherein in embodiment 10 the first camera with which the communication interface is configured to communicate is a 3D camera configured to generate, as part of the camera data, several 3D data points that specify corresponding depth values for locations on one or more surfaces of the stack structure. Embodiment 11 comprises the computer system of embodiment 10. In embodiment 11, the at least one camera with which the communication interface is configured to communicate further comprises a second camera which is configured to generate a 2D image as part of the camera data, and wherein the control circuit is configured to identify the target feature based on the 2D image. Embodiment 12 comprises the computer system of one of embodiments 1 to 9. In embodiment 12, the first camera with which the communication interface is configured to communicate is a 2D camera, wherein the at least one camera with which the communication interface is configured to communicate further comprises a second camera which is configured to generate, as part of the camera data, several 3D data points to represent corresponding depth values of locations on one or more surfaces of the stack structure. Embodiment 13 comprises the computer system of one of embodiments 1 to 12. In embodiment 13, the control circuit is configured to determine the value of the object detection confidence parameter by: determining a ratio between the size of the occlusion region and a size of the 2D region; and determining the value of the object detection confidence parameter based on the ratio.
Embodiment 14 comprises the computer system of embodiment 13. In embodiment 14, the value of the object detection confidence parameter is determined based on whether the ratio exceeds a defined occlusion threshold. Embodiment 15 comprises the computer system of one of embodiments 1 to 14. In embodiment 15, the location of the first camera is a focal point of the first camera. Embodiment 16 comprises the computer system of one of embodiments 1 to 15. In embodiment 16, the operation for controlling the robot interaction comprises issuing a motion command to effect a robot movement, wherein the motion command is determined to effect a change in the value of the object detection confidence parameter in a manner that indicates a lower occlusion of the target feature. Embodiment 17 comprises the computer system of one of embodiments 1 to 16. In embodiment 17, the control circuit is configured to perform object recognition for the target feature based on the camera data, and wherein the operation to control the robot interaction includes determining whether to re-execute the object recognition based on the value of the object detection confidence parameter and determining the robot movement after the re-execution of the object recognition. Although various embodiments have been described above, it is understood that they are presented only as illustrations and examples of the present invention and not as limitations. It will be obvious to a person skilled in the art that various modifications to form and detail can be made without departing from the spirit and scope of the invention. Therefore, the breadth and scope of the present invention should not be limited by any of the exemplary embodiments described above, but should only be defined in accordance with the appended claims and their equivalents.
It should also be understood that each feature of each embodiment described herein and of each reference cited herein may be used in combination with the features of any other embodiment. All patents and publications described herein are incorporated herein by reference in their entirety.

Claims

1. Computer system (110), comprising: a communication interface (113) configured to communicate with at least one camera, comprising a first camera (270) with a first camera field of view (272); a control circuit (111) configured, when a stack (250, 750) with multiple objects is located in the first camera field of view (272), to: receive camera data generated by the at least one camera, wherein the camera data describes a stack structure for the stack (250, 750), the stack structure being formed from at least one object structure for a first object of the multiple objects; identify, based on camera data generated by the at least one camera, a target feature (251B, 251C, 751B) of the object structure or a target feature (251B, 251C, 751B) located on the object structure, wherein the target feature (251B, 251C, 751B) is at least one of the following: a corner (251B) of the object structure, an edge (251C) of the object structure, a visual feature (751B) located on a surface (251A, 751A) of the object structure, or an outline of the surface (251A, 751A) of the object structure; determine a two-dimensional (2D) region (520, 620, 720) that is coplanar with the target feature (251B, 251C, 751B) and whose boundary is defined by the target feature (251B, 251C, 751B); determine a three-dimensional (3D) region (530, 630, 730) defined by connecting a location of the first camera (270) and the boundary of the 2D region (520, 620, 720), wherein the 3D region (530, 630, 730) is part of the first camera's field of view (272); determine, based on the camera data and the 3D region (530, 630, 730), a size of an occlusion region (570, 670, 770), where the occlusion region (570, 670, 770) is a region of the stack structure located between the target feature (251B, 251C, 751B) and the at least one camera (270) and within the 3D region (530, 630, 730); determine a value of an object detection confidence parameter based on the size of the occlusion region (570, 670, 770); and perform an operation to control robot interaction with the stack structure, wherein the operation is performed based on the value of the object detection confidence parameter; wherein the first camera (270) with which the communication interface (113) is configured to communicate is a 3D camera configured to generate, as part of the camera data, several 3D data points that specify corresponding depth values for locations on one or more faces of the stack structure.

2. Computer system (110) according to claim 1, wherein the control circuit (111) is configured to identify the target feature (251B, 251C, 751B) based on information in an object recognition template that describes a size of the object structure or that describes the visual feature (751B) that appears on the surface (251A, 751A) of the object structure; and / or wherein the visual feature (751B) is an image arranged on the surface (251A, 751A) of the object structure, and the control circuit (111) is configured to identify the image as the target feature (251B, 251C, 751B).

3. Computer system (110) according to claim 1 or 2, wherein the control circuit (111) is configured to determine a size of the 2D region (520, 620, 720) based on an image noise level and / or a shape of the object structure and / or a texture of the surface (251A, 751A) of the object structure.

4. Computer system (110) according to one of the preceding claims, wherein the target feature (251B, 251C, 751B) is the edge (251C) of the object structure and the control circuit (111) is configured to determine the 2D region (520, 620, 720) as a region with: (i) a defined size and (ii) a center located at the edge (251C); and / or wherein the target feature (251B, 251C, 751B) is the corner (251B) of the object structure and the control circuit (111) is configured to determine the 2D region (520, 620, 720) as a region with: (i) a defined size and (ii) a center located at the corner (251B).

5. Computer system (110) according to any of the preceding claims, wherein the 3D region (530, 630, 730) is an imaginary pyramid located within the first camera field of view, wherein the 2D region (520, 620, 720), the boundary of which surrounds the target feature (251B, 251C, 751B), is a first 2D region, and wherein the occlusion region (570, 670, 770) is a second 2D region that is parallel to the first 2D region and located within the imaginary pyramid.

6. Computer system (110) according to one of the preceding claims, wherein the control circuit (111) is configured to determine the size of the occlusion region (570, 670, 770) by: determining, from the camera data, several 3D data points to represent corresponding locations on one or more faces of the stack structure; determining a depth value for the target feature (251B, 251C, 751B) relative to the first camera (270) as an expected depth value; and determining a subset of the multiple 3D data points to represent corresponding locations on one or more faces of the stack structure that are closer to the first camera (270) relative to the expected depth value and within the 3D region (530, 630, 730), wherein the subset is determined based on comparing the expected depth value with corresponding depth values associated with the multiple 3D data points, and wherein preferably the subset of 3D data points is determined by identifying from the multiple 3D data points those 3D data points that: i) are associated with corresponding depth values that are less than the expected depth value by at least a defined difference threshold, and ii) are located within the 3D region (530, 630, 730).

7. Computer system (110) according to one of the preceding claims, wherein the at least one camera with which the communication interface (113) is configured to communicate further comprises a second camera which is configured to generate a 2D image as part of the camera data, and wherein the control circuit (111) is configured to identify the target feature (251B, 251C, 751B) based on the 2D image.

8. Computer system (110) according to one of the preceding claims, wherein the control circuit (111) is configured to determine the value of the object detection confidence parameter by: determining a ratio between the size of the occlusion region (570, 670, 770) and a size of the 2D region (520, 620, 720); and determining the value of the object detection confidence parameter based on the ratio, wherein preferably the value of the object detection confidence parameter is determined based on whether the ratio exceeds a defined occlusion threshold.

9. Computer system (110) according to one of the preceding claims, wherein the location of the first camera (270) is a focal point of the first camera (270).

10. Computer system (110) according to one of the preceding claims, wherein the operation for controlling the robot interaction comprises issuing a motion command to effect a robot movement, wherein the motion command is determined to effect a change in the value of the object detection confidence parameter in a manner that indicates a lower occlusion of the target feature (251B, 251C, 751B).

11. Computer system (110) according to one of the preceding claims, wherein the control circuit (111) is configured to perform object recognition for the target feature (251B, 251C, 751B) based on the camera data, and wherein the operation to control a robot interaction comprises determining, based on the value of the object detection confidence parameter, whether to perform object recognition again, and determining a robot movement after object recognition has been performed again.
Method executed by a computer system (110), the method comprising:
receiving camera data by the computer system (110), wherein the computer system (110) comprises a communication interface (113) configured to communicate with at least one camera, comprising a first camera (270) with a first camera field of view (272), wherein the camera data is generated by the at least one camera when a stack (250, 750) containing multiple objects is located in the first camera field of view (272), and wherein the camera data describes a stack structure for the stack (250, 750), the stack structure being formed from at least one object structure for a first object of the multiple objects;
identifying, based on the camera data generated by the at least one camera, a target feature (251B, 251C, 751B) of the object structure or a target feature (251B, 251C, 751B) located on the object structure, wherein the target feature (251B, 251C, 751B) is at least one of the following: a corner (251B) of the object structure, an edge (251C) of the object structure, a visual feature (751B) located on a surface (251A, 751A) of the object structure, or an outline of the surface (251A, 751A) of the object structure;
determining a two-dimensional (2D) region (520, 620, 720) that is coplanar with the target feature (251B, 251C, 751B) and whose boundary is defined by the target feature (251B, 251C, 751B);
determining a three-dimensional (3D) region (530, 630, 730) defined by connecting a location of the first camera (270) and the boundary of the 2D region (520, 620, 720), wherein the 3D region (530, 630, 730) is part of the first camera field of view (272);
determining, based on the camera data and the 3D region (530, 630, 730), a size of an occlusion region (570, 670, 770), wherein the occlusion region (570, 670, 770) is a region of the stack structure located between the target feature (251B, 251C, 751B) and the at least one camera (270) and within the 3D region (530, 630, 730);
determining a value of an object recognition confidence parameter based on the size of the occlusion region (570, 670, 770); and
performing an operation to control a robot interaction with the stack structure, wherein the operation is performed based on the value of the object recognition confidence parameter;
wherein the first camera (270) with which the communication interface (113) is configured to communicate is a 3D camera configured to generate, as part of the camera data, a plurality of 3D data points that specify corresponding depth values for locations on one or more surfaces of the stack structure.

Method according to claim 12, wherein the 3D region (530, 630, 730) is an imaginary pyramid located within the first camera field of view (272), wherein the 2D region (520, 620, 720), whose boundary surrounds the target feature (251B, 251C, 751B), is a first 2D region, and wherein the occlusion region (570, 670, 770) is a second 2D region that is parallel to the first 2D region and located within the imaginary pyramid.

Non-volatile computer-readable medium having instructions stored thereon which, when executed by a control circuit (111) of a computer system (110), cause the control circuit (111) to:
receive camera data, wherein the computer system (110) comprises a communication interface (113) configured to communicate with at least one camera, which comprises a first camera (270) with a first camera field of view (272), wherein the camera data is generated by the at least one camera when a stack (250, 750) containing multiple objects is located in the first camera field of view (272), and wherein the camera data describes a stack structure for the stack (250, 750), the stack structure being formed from at least one object structure for a first object of the multiple objects;
identify, based on the camera data generated by the at least one camera, a target feature (251B, 251C, 751B) of the object structure or a target feature (251B, 251C, 751B) located on the object structure, wherein the target feature (251B, 251C, 751B) is at least one of the following: a corner (251B) of the object structure, an edge (251C) of the object structure, a visual feature (751B) located on a surface (251A, 751A) of the object structure, or an outline of the surface (251A, 751A) of the object structure;
determine a two-dimensional (2D) region (520, 620, 720) that is coplanar with the target feature (251B, 251C, 751B) and whose boundary is defined by the target feature (251B, 251C, 751B);
determine a three-dimensional (3D) region (530, 630, 730) defined by connecting a location of the first camera (270) and the boundary of the 2D region (520, 620, 720), wherein the 3D region (530, 630, 730) is part of the first camera field of view (272);
determine, based on the camera data and the 3D region (530, 630, 730), the size of an occlusion region (570, 670, 770), wherein the occlusion region (570, 670, 770) is a region of the stack structure located between the target feature (251B, 251C, 751B) and the at least one camera and within the 3D region (530, 630, 730);
determine a value of an object recognition confidence parameter based on the size of the occlusion region (570, 670, 770); and
perform an operation to control a robot interaction with the stack structure, wherein the operation is performed based on the value of the object recognition confidence parameter;
wherein the first camera (270) with which the communication interface (113) is configured to communicate is a 3D camera configured to generate, as part of the camera data, a plurality of 3D data points that specify corresponding depth values for locations on one or more surfaces of the stack structure.
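The claims leave open how the size of the occlusion region is measured and how it is turned into a confidence value. The following minimal Python sketch illustrates the geometry under stated assumptions; it is not the patented implementation. It assumes that the 2D region is a convex polygon given by its ordered 3D corner points, that the camera data is a list of (x, y, z) points in the camera frame, that the occlusion size is approximated by counting 3D points that lie inside the pyramid (camera location as apex, 2D region as base) and more than a hypothetical tolerance `tol` in front of the region's plane, and that the confidence is a hypothetical ratio `1 - occluding / rays`, where `rays` counts every point seen through the 2D region (including points beyond the plane, which lie outside the claimed 3D region but still indicate an unobstructed view). The names `in_cone` and `occlusion_confidence` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def in_cone(p: Vec3, camera: Vec3, region: Sequence[Vec3]) -> bool:
    """True if p lies inside the (one-sided, unbounded) pyramid whose apex is
    the camera location and whose lateral faces pass through the edges of the
    convex, ordered 2D region."""
    n = len(region)
    centroid = tuple(sum(v[k] for v in region) / n for k in range(3))
    for i in range(n):
        a, b = region[i], region[(i + 1) % n]
        normal = _cross(_sub(a, camera), _sub(b, camera))
        # p must lie on the same side of every lateral face as the centroid.
        if _dot(normal, _sub(p, camera)) * _dot(normal, _sub(centroid, camera)) < 0:
            return False
    return True


def occlusion_confidence(points: List[Vec3], camera: Vec3,
                         region: Sequence[Vec3], tol: float = 0.005):
    """Return (occluding, rays, confidence) for a 3D point cloud.

    occluding:  points inside the pyramid and more than `tol` in front of the
                region plane, i.e. between the target feature and the camera.
    rays:       all points whose viewing ray passes through the 2D region.
    confidence: hypothetical mapping 1 - occluding / rays.
    """
    # Unit normal of the region plane, oriented toward the camera.
    nrm = _cross(_sub(region[1], region[0]), _sub(region[2], region[0]))
    length = math.sqrt(_dot(nrm, nrm))
    nrm = (nrm[0] / length, nrm[1] / length, nrm[2] / length)
    if _dot(nrm, _sub(camera, region[0])) < 0:
        nrm = (-nrm[0], -nrm[1], -nrm[2])

    occluding = rays = 0
    for p in points:
        if not in_cone(p, camera, region):
            continue
        rays += 1
        # Signed distance from the region plane; positive means toward the camera.
        if _dot(nrm, _sub(p, region[0])) > tol:
            occluding += 1
    if rays == 0:
        return 0, 0, 0.0  # nothing was observed through the region
    return occluding, rays, 1.0 - occluding / rays
```

A controller could, for instance, compare the returned confidence against a hypothetical threshold and capture a new image or defer the grasp when the value is low, consistent with the claims' requirement that the robot operation be based on the confidence value. Counting points is only one possible size measure; an area-based measure of the second, parallel 2D region recited in the dependent claim would be an equally plausible choice.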