Method and apparatus for generating robot navigation map from noisy indoor point cloud
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: LG ELECTRONICS INC
- Filing Date: 2025-12-23
- Publication Date: 2026-07-02
Description
Method and device for generating a robot navigation map from a noisy indoor point cloud
[0001] This application claims priority under 35 U.S.C. § 119 to Provisional Application No. 63/738,549, filed on December 24, 2024, the entire contents of which are incorporated herein by reference.
[0002] A robot can refer to a machine that automatically processes or operates a given task based on its own capabilities. In particular, robots equipped with the ability to perceive their environment and perform self-determining actions are sometimes referred to as intelligent robots. Depending on their purpose of use or field, robots can be classified into various categories, such as industrial robots, medical robots, household robots, and military robots.
[0003] The drive unit of a robot may include actuators or motors and can perform various physical movements, such as moving robot joints. Additionally, a mobile robot may include wheels, brakes, propellers, etc., in the drive unit and can move on the ground or fly in the air.
[0004] Indoor robot navigation is generally performed using two-dimensional (2D) or three-dimensional (3D) maps. These maps can be used to guide robot navigation in a variety of applications, including, but not limited to, autonomous vacuuming, food and service delivery, tourist guidance, and automated roaming tasks.
[0005] A 3D map can be generated using a red-green-blue-depth (RGB-D) camera along with a simultaneous localization and mapping (SLAM) algorithm. The map data may include object identification information for various objects placed in the space where the robot moves. For example, the map data may include object identification information for fixed objects such as walls and doors, and movable objects such as furniture and desks. The object identification information may include the name, type, distance, and location of a given object.
[0006] The robot can determine a movement path and movement plan using at least one of map data, object information detected by one of the sensors, or object information acquired from an external source, and can control a drive unit so that the robot moves along the determined movement path and movement plan.
[0007] 3D maps can be used in robots equipped with 3D sensors, such as 3D LiDAR sensors. However, not all robots are equipped with such sensors. For example, some robots (e.g., inexpensive robots) may be equipped with 2D sensors, such as 2D LiDAR sensors, which are generally less expensive. These robots rely on 2D maps to navigate in indoor environments such as offices, restaurants, hotels, and airports.
[0008] When a 2D map is generated from a 3D map based on a 3D point cloud, the point cloud may contain significant noise and unwanted objects. Common unwanted objects include pedestrians, furniture (e.g., chairs), cleaning equipment, trash, and toys, all of which can degrade the accuracy of the generated 2D map and the robot's navigation performance.
[0009] An aspect of the present invention relates to a method and apparatus for generating a 2D navigation map for use by a robot from a noisy indoor point cloud. According to one or more aspects, a deep learning-based classification technique is used to perform both 3D and 2D object detection. Unwanted objects are autonomously identified and filtered from the navigation map to generate a clearer and more reliable 2D map suitable for robot navigation.
[0010] For example, a semi-supervised convolutional neural network (CNN) is used for deep learning-based classification to filter sensor noise and detect unwanted 3D objects within the point cloud. An accurate and clear 2D navigation map usable by the robot is generated by slicing the 3D point cloud at the height of the robot's sensor above the floor. Unwanted 2D obstacles are further filtered based on the distinctive features of their 2D contours, enabling precise contour-based removal of non-structural elements.
[0011] According to at least one embodiment, a computer-implemented method for generating a global two-dimensional (2D) map of an indoor environment based on three-dimensional (3D) data is disclosed. The 2D map is intended to guide autonomous navigation of a robot including a 2D sensor. The computer-implemented method comprises: processing a 3D point cloud of the indoor environment; detecting one or more point clusters present in the processed 3D point cloud; in response to the detection, removing one or more of the detected clusters from the processed 3D point cloud, based on a determination that an object corresponding to the one or more clusters is unnecessary, to generate a global 3D point cloud; and dividing the global 3D point cloud into a plurality of 3D segments, each segment corresponding to an individual spatial part of the indoor environment. The method further comprises, for each of the plurality of 3D segments: identifying a local floor as a reference plane of the 3D segment; and collecting a 2D slice of the 3D segment at the height of the robot's 2D sensor based on the identified local floor. The method further comprises assembling the collected 2D slices of the plurality of 3D segments to form the global 2D map.
[0012] According to at least one embodiment, an artificial intelligence (AI) device is configured to generate a global two-dimensional (2D) map of an indoor environment based on three-dimensional (3D) data. The 2D map is intended to guide autonomous navigation of a robot including a 2D sensor. The AI device includes at least one transceiver and at least one processor. The at least one processor is configured to: process a 3D point cloud of the indoor environment; detect one or more point clusters present in the processed 3D point cloud; in response to the detection, remove one or more of the detected clusters from the processed 3D point cloud, based on a determination that an object corresponding to the one or more clusters is unnecessary, to generate a global 3D point cloud; and divide the global 3D point cloud into a plurality of 3D segments, each segment corresponding to an individual spatial part of the indoor environment. The at least one processor is further configured to identify, for each of the plurality of 3D segments, a local floor as a reference plane of the 3D segment, and to collect a 2D slice of the 3D segment at the height of the robot's 2D sensor based on the identified local floor. The at least one processor is further configured to assemble the collected 2D slices of the plurality of 3D segments to form the global 2D map.
[0013] According to at least one embodiment, a non-transitory storage medium stores instructions that, when executed, cause at least one processor to perform operations. The operations include: processing a three-dimensional (3D) point cloud of an indoor environment; detecting one or more point clusters present in the processed 3D point cloud; in response to the detection, removing one or more of the detected clusters from the processed 3D point cloud, based on a determination that an object corresponding to the one or more clusters is unnecessary, to generate a global 3D point cloud; and dividing the global 3D point cloud into a plurality of 3D segments, wherein each segment corresponds to an individual spatial part of the indoor environment. The operations further include, for each of the plurality of 3D segments: identifying a local floor as a reference plane of the 3D segment; and collecting a two-dimensional (2D) slice of the 3D segment at the height of a robot's 2D sensor based on the identified local floor. The operations further include assembling the collected 2D slices of the plurality of 3D segments to form a global 2D map that guides the autonomous navigation of the robot.
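The per-segment slicing and assembly steps summarized in the embodiments above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the function names, the lowest-points floor heuristic, the slice tolerance, and the grid cell size are all hypothetical, and points are assumed to be (x, y, z) tuples in meters that have already been filtered and divided into segments.

```python
import math
from statistics import median


def estimate_local_floor(points, floor_fraction=0.05):
    """Estimate the local floor height of one segment.

    Assumed heuristic (not from the source): the median z of the lowest
    few percent of points is treated as the floor reference plane.
    """
    zs = sorted(z for _, _, z in points)
    k = max(1, int(len(zs) * floor_fraction))
    return median(zs[:k])


def slice_segment(points, sensor_height, tolerance=0.05):
    """Collect the 2D slice of a segment at the sensor height above its local floor."""
    target_z = estimate_local_floor(points) + sensor_height
    return [(x, y) for x, y, z in points if abs(z - target_z) <= tolerance]


def assemble_global_map(segments, sensor_height, cell_size=0.1, tolerance=0.05):
    """Merge the 2D slices of all segments into one set of occupied grid cells."""
    occupied = set()
    for segment in segments:
        for x, y in slice_segment(segment, sensor_height, tolerance):
            occupied.add((math.floor(x / cell_size), math.floor(y / cell_size)))
    return occupied


if __name__ == "__main__":
    # Two segments whose floors sit at different heights (e.g., joined by a ramp).
    lower_room = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.2), (1.0, 0.0, 1.5)]
    upper_room = [(3.0, 0.0, 0.5), (3.5, 0.0, 0.5), (4.0, 1.0, 0.7), (4.0, 1.0, 2.0)]
    cells = assemble_global_map([lower_room, upper_room], sensor_height=0.2, cell_size=0.5)
    print(sorted(cells))
```

Measuring the slice height from each segment's own floor, rather than from a single global zero, is what keeps the slice at the sensor's height when different parts of the environment sit at different elevations.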
[0014] The accompanying drawings, included to provide further understanding of the present invention, serve to illustrate embodiments of the present invention and, together with the detailed description, explain aspects of the present invention.
[0015] FIG. 1 is a block diagram of an artificial intelligence (AI) device according to at least one embodiment of the present invention.
[0016] FIG. 2 illustrates a block diagram of an AI server according to at least one embodiment of the present invention.
[0017] FIG. 3 illustrates an AI system according to at least one embodiment of the present invention.
[0018] FIG. 4 illustrates a perspective view of a robot according to at least one embodiment.
[0019] FIG. 5 is a block diagram of a control module of a robot according to at least one embodiment.
[0020] FIGS. 6a and 6b illustrate a flowchart for generating a global 2D map of an indoor environment based on 3D data according to at least one embodiment.
[0021] FIG. 7 illustrates an environment diagram of a robot operating in an indoor environment.
[0022] FIGS. 8a and 8b illustrate a flowchart of a method for generating a global 2D map of an indoor environment based on 3D data according to at least one embodiment.
[0023] Hereinafter, specific embodiments of the present invention will be described in more detail with reference to the drawings.
[0024] When it is described that one element is "fixed" or "connected" to another element, this may mean that the two elements are directly fixed or connected, or that a third element exists between the two elements and the two elements are fixed or connected to each other by said third element. On the other hand, when it is described that one element is "directly fixed" or "directly connected" to another element, it may be understood that no third element exists between the two elements.
[0025] Autonomous driving refers to technology by which a vehicle drives itself, and an autonomous vehicle refers to a vehicle that drives without user intervention or with minimal user intervention.
[0026] For example, autonomous driving may include technology for maintaining the lane while driving, technology for automatically adjusting speed such as adaptive cruise control, technology for automatically moving along a predetermined route, and technology for automatically setting a route and moving when a destination is set.
[0027] Vehicles may include vehicles having only an internal combustion engine, hybrid vehicles having both an internal combustion engine and an electric motor, and electric vehicles having only an electric motor, and may include not only automobiles but also trains, motorcycles, etc.
[0028] Autonomous driving vehicles can be viewed as robots with autonomous driving capabilities.
[0029] Artificial intelligence (AI) refers to the field that studies artificial intelligence or the methodologies for creating it, while machine learning refers to the field that defines the various problems addressed within AI and studies methodologies for solving them. Machine learning is also defined as an algorithm that improves its performance on a specific task through continuous experience with that task.
[0030] An artificial neural network (ANN) is a model used in machine learning that can refer to an entire model of problem-solving ability composed of artificial neurons (nodes) that form a network through synaptic connections. An artificial neural network can be defined by connection patterns between neurons in different layers, a learning process that updates model parameters, and an activation function to generate output values.
[0031] An ANN may include an input layer, an output layer, and optionally one or more hidden layers. Each layer includes one or more neurons, and the ANN may include synapses connecting the neurons. In an ANN, each neuron may output the value of an activation function applied to the input signals received through the synapses, the corresponding weights, and a bias.
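As a minimal sketch of the neuron output described above, the following Python function computes an activation function applied to the weighted sum of the inputs plus a bias. The sigmoid activation and the function names are illustrative assumptions; the source does not specify a particular activation function.

```python
import math


def sigmoid(z):
    """Logistic activation, chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Return activation(sum_i(w_i * x_i) + bias) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


if __name__ == "__main__":
    # The weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0.
    print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0))
```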
[0032] Model parameters refer to parameters determined through learning and include weights for the synaptic connections and biases of neurons. Hyperparameters refer to parameters set in a machine learning algorithm before learning and include the learning rate, number of iterations, mini-batch size, and initialization function.
[0033] The learning objective of an ANN may be to determine model parameters that minimize the loss function. The loss function can be used as an indicator to determine optimal model parameters during the training process of an artificial neural network.
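The relationship between model parameters, hyperparameters, and the loss function can be illustrated with a small gradient-descent sketch. This is an assumed example on a one-variable linear model with a mean-squared-error loss, not a training procedure taken from the source; the function names and default hyperparameter values are hypothetical.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_linear(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit the model parameters (w, b) by gradient descent on the MSE loss.

    learning_rate and iterations are hyperparameters: they are set before
    training and are not updated by the training loop.
    """
    w, b = 0.0, 0.0  # model parameters, determined through learning
    n = len(xs)
    for _ in range(iterations):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # samples of y = 2x + 1
    w, b = train_linear(xs, ys)
    print(w, b, mse_loss(w, b, xs, ys))
```

The loss value serves as the indicator: training repeatedly moves the parameters in the direction that lowers it.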
[0034] Machine learning can be classified into supervised learning, unsupervised learning, and reinforcement learning depending on the learning method.
[0035] Supervised learning refers to a method of training an ANN with labels provided for the training data; the labels can represent the correct answer (or result) that the ANN must infer when the training data is input. Unsupervised learning refers to a method of training an ANN without labels provided for the training data. Reinforcement learning refers to a learning method in which an agent defined in a specific environment learns to select an action or sequence of actions that maximizes the cumulative reward in each state.
[0036] Among ANNs, machine learning implemented by a deep neural network (DNN) containing multiple hidden layers is also called deep learning, and deep learning is a part of machine learning. Hereinafter, the term machine learning is used in a sense that includes deep learning.
[0037] FIG. 1 is a block diagram of an AI device (10) according to at least one embodiment of the present invention. As described below, the AI device (10) may be a robot (or may include a robot).
[0038] The AI device (10) may be stationary or mobile. For example, the AI device (10) may include a TV, projector, mobile phone, smartphone, desktop computer, laptop, digital broadcast terminal, personal digital assistant (PDA), portable multimedia player (PMP), navigation device, tablet personal computer (PC), wearable device, set-top box (STB), DMB receiver, radio, washing machine, refrigerator, digital signage, robot, vehicle, etc.
[0039] The AI device (10) may include a communication interface (11), an input interface (12), a learning processor (13), a sensor (14), an output interface (15), a memory (17), and a processor (18).
[0040] The communication interface (11) can transmit and receive data to and from external devices such as other AI devices (10a, 10b, 10c, 10d, 10e) and an AI server (20) using wired / wireless communication technology (see, for example, FIG. 3). For example, the communication interface (11) can transmit and receive sensor information, user input, learning models, and control signals to and from external devices.
[0041] Communication technologies used in the communication interface (11) include Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Long Term Evolution (LTE), 5G, Wireless LAN (WLAN), Wi-Fi, Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), ZigBee, and Near Field Communication (NFC).
[0042] The input interface (12) can acquire various types of data.
[0043] For example, the input interface (12) may include a camera for inputting a video signal, a microphone for receiving an audio signal, and a user input interface for receiving information from a user. The camera or microphone may be treated as a sensor, and the signal obtained from the camera or microphone may be referred to as sensing data or sensor information.
[0044] The input interface (12) can obtain training data for model training and input data to be used when obtaining output using the learning model. The input interface (12) can obtain raw input data. In this case, the processor (18) or the learning processor (13) can preprocess the input data to extract input features.
[0045] The learning processor (13) can train a model composed of an ANN using training data. The trained ANN may be referred to as a learning model. The learning model may be used to infer result values for new input data other than the training data, and the inferred values may be used as a basis for judgment to perform specific operations.
[0046] The learning processor (13) can perform AI processing together with the learning processor (24) of the AI server (20) (see, for example, FIG. 2).
[0047] The learning processor (13) may include memory integrated into or implemented in the AI device (10). Alternatively, the learning processor (13) may be implemented using the memory (17), external memory directly connected to the AI device (10), or memory maintained in an external device.
[0048] The sensor (14) can acquire at least one of internal information about the AI device (10), surrounding environment information about the AI device (10), or user information using various sensors.
[0049] Examples of sensors included in the sensor (14) include a proximity sensor, an illuminance sensor, an accelerometer, a magnetic sensor, a gyroscope, an inertial sensor, a red-green-blue (RGB) sensor, an infrared (IR) sensor, a fingerprint recognition sensor, an ultrasonic sensor, an optical sensor, a microphone, a lidar, and a radar.
[0050] The output interface (15) can generate output related to visual, auditory, or tactile senses.
[0051] The output interface (15) may include a display unit for outputting visual information, a speaker for outputting auditory information, and a tactile module for outputting tactile information.
[0052] The memory (17) can store data that supports various functions of the AI device (10). For example, the memory (17) can store input data obtained by the input interface (12), training data, learning models, training history, etc.
[0053] The processor (18) can determine at least one executable action of the AI device (10) based on information determined or generated using a data analysis algorithm or a machine learning algorithm. The processor (18) can control components of the AI device (10) to execute the determined action.
[0054] The processor (18) can request, retrieve, receive, or utilize data from the learning processor (13) or memory (17). The processor (18) can control the components of the AI device (10) to execute a predicted action, or at least one action that is determined to be desirable.
[0055] If a connection to an external device is required to perform a determined operation, the processor (18) can generate a control signal to control the external device and transmit the generated control signal to the external device.
[0056] The processor (18) can obtain intention information regarding user input and can determine the user's requirements based on the obtained intention information.
[0057] The processor (18) can collect history information including the operation details of the AI device (10) or user feedback regarding the operation, and can store the collected history information in the memory (17) or the learning processor (13), or transmit the collected history information to an external device such as an AI server (20). The collected history information can be used to update a learning model.
[0058] The processor (18) can control at least some of the components of the AI device (10) to run an application program stored in memory (17). Additionally, the processor (18) can operate two or more of the components included in the AI device (10) in combination to run the application program.
[0059] FIG. 2 illustrates a block diagram of an AI server (20) according to at least one embodiment of the present invention. As illustrated in FIG. 2, the AI server (20) is connected to an AI device (10).
[0060] The AI server (20) may refer to a device that trains an ANN using a machine learning algorithm or that uses a trained ANN. The AI server (20) may include multiple servers to perform distributed processing, or may be defined as a 5G network. The AI server (20) may also be included as part of the configuration of the AI device (10) and may perform at least part of the AI processing together with the AI device (10).
[0061] The AI server (20) may include a communication interface (21), memory (23), a learning processor (24), a processor (26), etc.
[0062] The communication interface (21) can mutually transmit and receive data with an external device such as an AI device (10).
[0063] The memory (23) may include a model storage unit (23a). The model storage unit (23a) may store a model (or ANN (26b)) that has been trained through the learning processor (24).
[0064] The learning processor (24) can train the ANN (26b) using the training data. The learning model may be used while mounted on the AI server (20) or while mounted on an external device such as the AI device (10).
[0065] The learning model may be implemented in hardware, software, or a combination of hardware and software. If all or part of the learning model is implemented in software, one or more instructions constituting the learning model may be stored in memory (23).
[0066] The processor (26) can use a learning model to infer a result value for new input data and can generate a response or control command based on the inferred result value.
[0067] FIG. 3 illustrates an AI system (1) according to at least one embodiment of the present invention.
[0068] In the AI system (1), at least one of an AI server (20), a robot (10a), an autonomous vehicle (10b), an XR device (10c), a smartphone (10d), or a home appliance (10e) is connected to a cloud network (2). The robot (10a), the autonomous vehicle (10b), the XR device (10c), the smartphone (10d), or the home appliance (10e) to which AI technology is applied may each be referred to as an AI device (10a-10e).
[0069] The cloud network (2) may refer to a network that constitutes part of the cloud computing infrastructure or exists within the cloud computing infrastructure. The cloud network (2) may be configured using a 3G network, a 4G or LTE network, or a 5G network.
[0070] That is, the devices (10a-10e) and the server (20) constituting the AI system (1) can be connected to each other through a cloud network (2). In particular, each device (10a-10e) and the server (20) can communicate with each other through a base station, but they can also communicate directly with each other without using a base station.
[0071] The AI server (20) may include a server that performs AI processing and a server that performs operations on big data.
[0072] The AI server (20) can be connected to at least one AI device constituting the AI system (1), namely a robot (10a), an autonomous vehicle (10b), an XR device (10c), a smartphone (10d), or a home appliance (10e), via a cloud network (2), and can support at least a portion of the AI processing of the connected AI devices (10a-10e).
[0073] For example, the AI server (20) can train the ANN according to a machine learning algorithm on behalf of the AI device (10a-10e), and can store the learning model directly or transmit it to the AI device (10a-10e).
[0074] The AI server (20) can receive input data from the AI device (10a-10e), infer a result value for the received input data using a learning model, generate a response or control command based on the inferred result value, and transmit the response or control command to the AI device (10a-10e).
[0075] Alternatively, the AI device (10a-10e) may infer a result value for input data by directly using a learning model and generate a response or control command based on the inference result.
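The two inference paths described above (inference on the AI server versus inference directly on the AI device) can be sketched as follows. This is a hypothetical illustration: the class names, the `ControlCommand` structure, the scalar model output, and the threshold rule for choosing a command are assumptions made for the example, and the network transport between device and server is replaced by a direct method call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Model = Callable[[Sequence[float]], float]  # assumed: a model maps input data to one score


@dataclass
class ControlCommand:
    action: str
    score: float


def generate_command(model: Model, input_data: Sequence[float], threshold: float = 0.5) -> ControlCommand:
    """Infer a result value and turn it into a control command (assumed rule)."""
    score = model(input_data)
    return ControlCommand("stop" if score >= threshold else "proceed", score)


class AIServer:
    """Holds a learning model and answers inference requests from devices."""

    def __init__(self, model: Model):
        self.model = model

    def handle_request(self, input_data: Sequence[float]) -> ControlCommand:
        return generate_command(self.model, input_data)


class AIDevice:
    """Uses its own learning model when it has one, otherwise delegates to the server."""

    def __init__(self, server: AIServer, local_model: Optional[Model] = None):
        self.server = server
        self.local_model = local_model

    def decide(self, input_data: Sequence[float]) -> ControlCommand:
        if self.local_model is not None:
            return generate_command(self.local_model, input_data)
        return self.server.handle_request(input_data)


if __name__ == "__main__":
    server = AIServer(model=lambda data: max(data))
    print(AIDevice(server).decide([0.1, 0.9]))
```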
[0076] Hereinafter, various embodiments of the AI device (10a-10e) to which the above-described technology is applied will be explained in more detail. The AI device (10a-10e) of FIG. 3 can be seen as a specific embodiment of the AI device (10) of FIG. 1.
[0077] A robot (10a) with AI technology applied can be implemented as a guide robot, transport robot, cleaning robot, wearable robot, entertainment robot, pet robot, unmanned flying robot, etc.
[0078] The robot (10a) may include a robot control module for controlling operation, and the robot control module may refer to a software module or a chip that implements the software module in hardware.
[0079] Using sensor information obtained from various types of sensors, the robot (10a) can obtain state information of the robot (10a), detect (recognize) the surrounding environment and objects, generate map data, determine a movement path and movement plan, determine a response to user interaction, or determine an action.
[0080] The robot (10a) can determine a movement path and a movement plan using sensor information obtained from at least one sensor among a lidar, radar, and camera.
[0081] The robot (10a) can perform the above-described operations using a learning model composed of at least one ANN. For example, the robot (10a) can recognize the surrounding environment and objects using the learning model and determine an operation using the recognized surrounding information or object information. The learning model can be trained directly on the robot (10a) or by an external device such as an AI server (20).
[0082] The robot (10a) can perform operations by directly using a learning model to generate results, or it can transmit sensor information to an external device such as an AI server (20) and receive the generated result value to perform operations.
[0083] The robot (10a) can determine a movement path and a movement plan using at least one of map data, object information detected from sensor information, or object information obtained from an external device, and can control a driving unit so that the robot (10a) moves along the determined movement path and movement plan.
[0084] Additionally, the robot (10a) can perform actions or movements by controlling a drive unit based on the user's control / interaction. The robot (10a) can acquire intention information of the interaction resulting from the user's operation or speech utterance, and can perform actions by determining a response based on the acquired intention information.
[0085] A robot (10a) equipped with AI technology and autonomous driving technology can be implemented as a guide robot, transport robot, cleaning robot, wearable robot, entertainment robot, pet robot, unmanned flying robot, etc.
[0086] A robot (10a) equipped with AI technology and autonomous driving technology may refer to a robot that itself has autonomous driving capabilities or a robot (10a) that interacts with an autonomous vehicle (10b).
[0087] A robot (10a) having an autonomous driving function may collectively refer to devices that move along a given path without user control or that determine a path and move on their own.
[0088] The robot (10a) may include a guide robot that provides various information to users at airports, subways, bus terminals, etc.; a serving robot that can provide various items to guests at restaurants, hotels, etc.; a delivery robot that can transport items such as food, medicine, delivery items (hereinafter referred to as "items"); or an industrial robot that delivers a cart loaded with parts to a destination such as a factory.
[0089] According to various embodiments, a robot includes a device that moves in order to serve a specific purpose (cleaning, security, monitoring, guidance, etc.) or to provide functions according to the characteristics of the space in which the robot moves. Accordingly, a device that is equipped with a means of movement, is capable of moving using specific information and sensors, and provides specific functions is generally referred to as a robot.
[0090] A robot can move using a map stored in the robot. The map represents information about stationary objects in space, such as fixed walls and fixed stairs. Additionally, information about movable obstacles that are periodically placed (that is, information about dynamic objects) can be stored in the map.
[0091] As an example, information about obstacles located within a certain range in the robot's direction of travel can also be stored in the map. In this case, unlike the fixed objects described above, the obstacle information is registered in the map temporarily and is removed after the robot moves.
[0092] In addition, the robot can detect external dynamic objects using various sensors. When the robot moves to a destination in an environment with many pedestrians, it can detect the dynamic objects and determine whether a waypoint on the way to the destination is occupied by an obstacle.
[0093] Furthermore, the robot can determine that it has arrived at a waypoint based on the degree of change in the direction of the waypoint. The robot then moves to the next waypoint, and in this way can successfully travel to the destination.
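The distinction between permanently stored fixed objects and temporarily registered obstacles can be sketched with a two-layer grid map. This is a hypothetical illustration: the class and method names are invented for the example, and the removal rule (dropping temporary obstacles that fall outside a radius around the robot's new position) is an assumption, since the source states only that such obstacles are removed after the robot moves.

```python
import math


class LayeredMap:
    """Grid map with a permanent static layer and a temporary obstacle layer."""

    def __init__(self, static_cells):
        self.static_cells = set(static_cells)  # walls, stairs, and other fixed objects
        self.temporary_cells = set()           # obstacles seen near the robot

    def register_temporary(self, cell):
        """Temporarily register an obstacle detected near the robot."""
        self.temporary_cells.add(cell)

    def update_after_move(self, robot_cell, keep_radius):
        """Assumed rule: keep only temporary obstacles within keep_radius of the robot."""
        self.temporary_cells = {
            cell for cell in self.temporary_cells
            if math.dist(cell, robot_cell) <= keep_radius
        }

    def is_occupied(self, cell):
        return cell in self.static_cells or cell in self.temporary_cells


if __name__ == "__main__":
    grid = LayeredMap(static_cells=[(0, 0)])
    grid.register_temporary((2, 0))
    grid.update_after_move(robot_cell=(50, 50), keep_radius=3.0)
    print(grid.is_occupied((2, 0)), grid.is_occupied((0, 0)))
```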
[0094] FIG. 4 illustrates a perspective view of a robot (100) according to at least one embodiment. FIG. 4 illustrates an exemplary appearance. It is understood that, in addition to the appearance of FIG. 4, the robot may be implemented as a robot having various appearances. Specifically, each component may be positioned at different locations in the up-down and left-right directions based on the shape of the robot.
[0095] The main body (120) can be elongated in the vertical direction and can have the shape of a roly-poly toy that gradually becomes thinner from the bottom to the top.
[0096] The main body (120) may include a case (30) that forms the exterior of the robot (100). The case (30) may include an upper cover (31) positioned at the top, a first intermediate cover (32) positioned below the upper cover (31), a second intermediate cover (33) positioned below the first intermediate cover (32), and a lower cover (34) positioned below the second intermediate cover (33). The first intermediate cover (32) and the second intermediate cover (33) may form a single intermediate cover.
[0097] The upper cover (31) may be placed at the top of the robot (100) and may have the shape of a hemisphere or a dome. The upper cover (31) may be placed at a height lower than the average height of an adult so that instructions from the user can be easily received. Additionally, the upper cover (31) may be configured to rotate at a predetermined angle.
[0098] The robot (100) may additionally include an internal control module (150) (see, e.g., FIG. 5). The control module (150), which is a kind of computer or processor, controls the robot (100). Thus, the control module (150) can be placed in the robot (100), perform functions similar to those of a main processor, and interact with a user.
[0099] The control module (150) is placed in the robot (100) to detect objects around the robot and to control the robot during its movement. The control module (150) can be implemented as a software module, a chip in which the software module is implemented in hardware, etc.
[0100] A display unit (31a) that receives instructions from a user or outputs information, and sensors such as a camera (31b) and a microphone (31c), for example, can be placed on one side of the front of the upper cover (31).
[0101] In addition to the display unit (31a) of the upper cover (31), a display unit (22) may also be placed on one side of the first intermediate cover (32).
[0102] Depending on the function of the robot, information may be output by both display units (31a, 22) or by either of the two display units (31a, 22).
[0103] Additionally, various obstacle sensors (e.g., the sensor (220) of FIG. 5), such as sensors 35a and 35b, are positioned on one side of the robot (100) or over its entire bottom portion. Examples of obstacle sensors include time-of-flight (TOF) sensors, ultrasonic sensors, infrared sensors, depth sensors, laser sensors, and LiDAR sensors. The sensors detect obstacles outside the robot (100) in various ways.
[0104] Additionally, the robot (100) further includes a moving unit, such as a wheel, positioned at the bottom of the robot to move the robot.
[0105] The shape of the robot in FIG. 4 is provided as an example. Embodiments of the present invention are not limited to the examples illustrated. Additionally, various cameras and sensors of the robot may be placed on various parts of the robot (100). As an example, the robot (100) may be a guide robot that provides information to a user and moves to a specific point to guide the user.
[0106] The robot (100) may be a robot that provides cleaning services, security services, or other functions. The robot (100) can perform various functions.
[0107] With multiple robots (100) deployed in a service space, the robots can perform specific functions (guidance service, cleaning service, security service, etc.). In this process, the robots (100) can store information about their location, check their current location in the entire space, and generate a path necessary to move to a destination.
[0108] FIG. 5 is a block diagram of a control module (150) of a robot (100) according to at least one embodiment.
[0109] The robot (100) can perform both the function of generating a map and the function of estimating the robot's position using the map.
[0110] Alternatively, the robot (100) may only provide the function of generating a map.
[0111] Alternatively, the robot (100) may only provide the function of estimating the robot's position using a map. According to various embodiments, the robot (100) provides the function of estimating the robot's position using a map. Additionally, the robot (100) may provide the function of creating a map or modifying a map.
[0112] The LiDAR sensor (220) can detect surrounding objects in two or three dimensions. A two-dimensional LiDAR sensor can detect the location of an object within a 360-degree range relative to the robot (100). LiDAR information detected at a specific location can form a single LiDAR frame. That is, the LiDAR sensor (220) generates a LiDAR frame by detecting the distance between the robot and an object placed outside the robot (100).
[0113] As an example, the camera sensor (230) is a standard camera. To overcome the limitation of the field of view, two or more camera sensors (230) may be used. An image captured at a specific location constitutes vision information. That is, the camera sensor (230) captures an object outside the robot (100) to generate a vision frame containing vision information.
[0114] According to various embodiments, the robot (100) uses a LiDAR sensor (220) and a camera sensor (230) to perform fusion-simultaneous localization and mapping (fusion-SLAM).
[0115] In fusion SLAM, LiDAR information and visual information can be combined and used. The LiDAR information and the visual information can each be composed into a map.
[0116] Unlike robots using a single sensor (LiDAR-only SLAM, visual-only SLAM), robots using fusion SLAM can improve the accuracy of position estimation. In other words, by performing fusion SLAM by combining LiDAR information and visual information, map quality can be improved.
[0117] Map quality is a standard that applies to both visual maps composed of visual information fragments and LiDAR maps composed of LiDAR information fragments. In fused SLAM, the map quality of each visual map and LiDAR map is improved because information that each sensor has not fully acquired can be shared.
[0118] Additionally, LiDAR information or visual information can be extracted from a single map and used. For example, LiDAR information or visual information, or all LiDAR information and visual information, can be used to determine the location of the robot depending on the amount of memory the robot (100) possesses or the computational capability of the computational processor, etc.
[0119] The interface unit (290) receives information input by the user. The interface unit (290) receives various information input by the user, such as touch and voice, and outputs the input result. Additionally, the interface unit (290) can output a map stored by the robot (100) or output a path on which the robot (100) moves over the map.
[0120] Furthermore, the interface unit (290) can provide the user with predetermined information.
[0121] The controller (250) generates a map and, based on this map, estimates the position of the robot (100) during the process of the robot moving.
[0122] The communication unit (280) enables the robot (100) to communicate with another robot or an external server and to receive and transmit information.
[0123] The robot (100) can generate each map using each sensor (LiDAR sensor and camera sensor), or generate a single map using each sensor, and then generate another map by extracting detailed information corresponding to a specific sensor from the single map.
[0124] Additionally, the map may include odometry information based on the rotation of the wheels. The odometry information is information about the distance traveled by the robot (100) and is calculated using the rotation frequency of the robot wheel or the difference in rotation frequency between the two wheels of the robot. The robot (100) can calculate the distance traveled by the robot based not only on the odometry information but also on information generated using sensors.
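As an illustration of the odometry calculation described above, the following is a minimal sketch that assumes a differential-drive robot with two wheels. The function name `wheel_odometry` and the parameters `wheel_radius_m` and `track_width_m` are hypothetical and do not appear in the disclosure. The mean of the two wheel displacements gives the distance traveled, and their difference gives the change in heading.

```python
import math


def wheel_odometry(left_revs, right_revs, wheel_radius_m, track_width_m):
    """Estimate travel of a differential-drive robot from wheel revolutions.

    Returns (distance_m, heading_change_rad). All parameter names are
    illustrative assumptions, not terms taken from the disclosure.
    """
    circumference = 2.0 * math.pi * wheel_radius_m
    d_left = left_revs * circumference
    d_right = right_revs * circumference
    # Distance traveled by the robot center is the mean of both wheels.
    distance = (d_left + d_right) / 2.0
    # The difference between the wheels turns the robot.
    heading_change = (d_right - d_left) / track_width_m
    return distance, heading_change
```

With equal revolution counts on both wheels, the heading change is zero and the distance equals one wheel's displacement.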
[0125] The controller (250) may additionally include an artificial intelligence unit (255) for artificial intelligence tasks and processing.
[0126] Multiple LiDAR sensors (220) and camera sensors (230) are placed outside the robot (100) to identify external objects.
[0127] In addition to the LiDAR sensor (220) and camera sensor (230), various types of sensors (LiDAR sensor, infrared sensor, ultrasonic sensor, depth sensor, image sensor, microphone, etc.) are placed outside the robot (100). The controller (250) collects and processes information detected by the sensors.
[0128] The artificial intelligence unit (255) can input information processed by the LiDAR sensor (220), camera sensor (230), and other sensors, or information accumulated and stored while the robot (100) moves, and the controller (250) can output results necessary to determine the external situation, process the information, and generate a movement path.
[0129] As an example, the robot (100) may store information about the locations of various objects placed in the space where the robot moves as a map. The objects may include fixed objects such as walls and doors, and movable objects such as flowerpots and desks. The artificial intelligence unit (255) may output data about the path the robot (100) moves along and the range of work handled by the robot, using the map information and information provided from the LiDAR sensor (220), camera sensor (230), and other sensors.
[0130] Additionally, the artificial intelligence unit (255) can recognize objects placed around the robot (100) by using information provided from the LiDAR sensor (220), camera sensor (230), and other sensors. The artificial intelligence unit (255) can receive an image and output meta information for the image. The meta information may include information such as the name of the object in the image, the distance between the object and the robot, the type of the object, and whether the object is placed on a map.
[0131] Information provided by the LiDAR sensor (220), camera sensor (230), and other sensors is input into the input node of the deep learning network of the artificial intelligence unit (255), and then a result is output from the output node of the artificial intelligence unit (255) through information processing of the hidden layer of the deep learning network of the artificial intelligence unit (255).
[0132] The controller (250) can calculate the robot's movement path using data calculated by the artificial intelligence unit (255) or data processed by various sensors.
[0133] As previously mentioned, robots equipped with 2D LiDAR (excluding 3D LiDAR) rely on accurate 2D maps. These 2D maps can be generated based on indoor 3D point clouds. In some approaches, random sample consensus (RANSAC) is used to fit planar candidates within the 3D point cloud. However, the results are susceptible to planar artifacts, and essential 2D features required for navigation may be lost. Furthermore, this approach requires input of a clear and noise-free 3D point cloud.
[0134] Some approaches have introduced deep learning techniques to generate 2D floor plans. These approaches utilize specialized end-to-end networks that convert point cloud data into 2D floor plans. Additionally, some studies have proposed a growth-based approach to construct global building layouts from noisy stereo camera point clouds.
[0135] However, the above approach and proposal are not about generating a 2D map for robot navigation based on a 3D point cloud representing an environment affected by camera noise. Furthermore, the above approach and proposal are not about generating a 2D map while taking into account the presence of unnecessary objects in the 3D point cloud.
[0136] An aspect of the present invention relates to a method and apparatus for generating a 2D navigation map for robot use from a noisy indoor point cloud. According to one or more aspects, a deep learning-based classification technique is used to perform both 3D and 2D object detection. Since unnecessary objects are autonomously identified and filtered from the navigation map, a clearer and more reliable 2D map suitable for robot navigation is generated.
[0137] As described with reference to various embodiments, noise in 3D sensor data is effectively filtered while preserving important geometric features such as corners and fine details. This feature is particularly useful when processing noisy point clouds generated by stereo cameras. Additionally, deep learning-based classification is used to accurately detect and remove unwanted objects from the point cloud, such as pedestrians, chairs, and moving carts. Furthermore, by slicing the 3D point cloud at a height corresponding to the robot's sensor plane, the resulting 2D map matches the 2D scan data better, thereby improving navigation accuracy for a specific class of ground-based robots.
[0138] FIGS. 6a and 6b illustrate a flowchart for generating a global 2D map of an indoor environment based on 3D data according to at least one embodiment. The processing of the 3D point cloud will first be explained with reference to FIG. 6a.
[0139] In block 602, the indoor point cloud is downsampled to simplify the data and reduce computational complexity. Downsampling may involve removing duplicate points from the dataset to reduce the size of the dataset. As an example, downsampling may include voxel downsampling, where points in a 3D grid (or voxels) are replaced with a smaller number of points (e.g., a single point).
[0140] As shown in Fig. 6a, downsampling generates a sparse 3D point cloud.
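The voxel downsampling of block 602 can be sketched as follows. This is a minimal illustration, assuming that each occupied voxel is replaced by the centroid of its points; the function name `voxel_downsample` and the parameter `voxel_size` are hypothetical, and the disclosure does not specify which representative point is kept.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel with their centroid.

    points: iterable of (x, y, z) tuples. voxel_size: voxel edge length.
    Using the centroid is an assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    sparse = []
    for members in buckets.values():
        n = len(members)
        sparse.append(
            (
                sum(p[0] for p in members) / n,
                sum(p[1] for p in members) / n,
                sum(p[2] for p in members) / n,
            )
        )
    return sparse
```

A larger `voxel_size` yields a sparser cloud at the cost of fine detail, so the value would be chosen relative to the smallest feature the map needs to preserve.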
[0141] In Block 604, a sparse 3D point cloud is input into a filter configured to smooth the representation of flat surfaces in an indoor environment or sharpen the representation of corners (or edges) in the indoor environment. Examples of such flat surfaces may include furniture surfaces (e.g., the face of a table), floors, and walls. For example, the top surface of a table in an indoor environment is flat, but due to camera noise and other factors, the representation of this surface in the point cloud may not be similarly flat. The filtering in Block 604 serves to smooth the representation of the surface to improve the accuracy of the representation.
[0142] Examples of corners may include corners where two walls meet. Although the two walls meet to form a perfect right angle between them, the representation of the angle in the point cloud may not be similarly perfect due to camera noise and other factors. Filtering in Block 604 serves to sharpen the representation of the corner to improve the accuracy of the representation.
[0143] Similarly, an example of an edge could be an edge where two walls meet. Although the two walls meet to form a sharp edge between them, the representation of the edge in the point cloud may similarly be imperfect due to camera noise and other factors. Filtering in Block 604 serves to sharpen the representation of the edge to improve the accuracy of the representation.
[0144] Therefore, the filter in block 604 not only removes sensor noise but also improves the geometric quality of the resulting map.
[0145] As exemplified in Fig. 6a, filtering generates a pre-filtered point cloud.
[0146] In block 606, the pre-filtered point cloud is input to a detector (e.g., a point clustering algorithm) that detects one or more point clusters present in the point cloud. The detector can detect spatially related groups of points as point clusters. The detected clusters are input to a CNN-based classifier (608) (e.g., a CNN 3D point cluster classification filter).
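One simple way to detect spatially related groups of points is distance-based (Euclidean) clustering, sketched below. The disclosure does not name a specific clustering algorithm, so this choice, the function name `euclidean_clusters`, and the parameters `radius` and `min_size` are assumptions made for illustration. The neighbor search is quadratic in the number of points and is intended only for small examples.

```python
from collections import deque


def _dist2(p, q):
    """Squared Euclidean distance between two 3D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2


def euclidean_clusters(points, radius, min_size=1):
    """Group points whose chained neighbor distance is at most radius.

    Returns a list of clusters, each a sorted list of point indices.
    """
    r2 = radius * radius
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic seed order
        unvisited.remove(seed)
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited if _dist2(points[i], points[j]) <= r2]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters
```

Each returned index list would then be passed to the classifier of block 608.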
[0147] The CNN-based classifier (608) utilizes a deep learning model to classify each cluster. The deep learning model is trained to classify clusters as belonging to a specific object (or a specific object class). Training may include unsupervised learning (in relation to classification based on geometric criteria) and supervised learning in relation to point clusters corresponding to unnecessary objects.
[0148] Based on the output of the model, the CNN-based classifier (608) classifies each cluster as belonging to a specific object. The CNN-based classifier (608) may also identify one or more clusters as corresponding to unnecessary objects. For example, for a given use case, unnecessary objects (or object classes) may include specific items of dynamic (or non-static) objects such as furniture and / or humans.
[0149] As explained in more detail below, through this classification and identification, a clearer 3D map capable of object filtering is obtained.
[0150] In block 610, information is provided about clusters identified as corresponding to unnecessary objects. Based on this information, point clusters corresponding to unnecessary objects are removed from the pre-filtered point cloud. This removal generates an object-filtered 3D map.
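The removal step of block 610 can be sketched as a simple index filter. In this sketch the class labels are supplied directly as strings and stand in for the output of the CNN-based classifier (608); the function name `remove_unwanted_clusters` and the label values are hypothetical.

```python
def remove_unwanted_clusters(points, clusters, labels, unwanted):
    """Drop every point that belongs to a cluster labeled as unwanted.

    clusters: list of index lists into points.
    labels: one class label per cluster (stand-in for classifier output).
    unwanted: set of labels treated as unnecessary objects.
    """
    drop = set()
    for indices, label in zip(clusters, labels):
        if label in unwanted:
            drop.update(indices)
    return [p for i, p in enumerate(points) if i not in drop]
```

Points that belong to no cluster are kept, so only explicitly classified objects are filtered out.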
[0151] According to various embodiments, it is preferable to remove point clusters corresponding to specific objects during 3D point cloud processing rather than removing them during 2D point cloud processing. For such objects, the classification of point clusters may be more accurate when based on 3D data. For example, if the object is a person, point clusters can be classified more easily using 3D data. Since 2D data inherently contains less information, the accuracy of such classification may be lower.
[0152] In block 612, global floor detection is performed based on an object-filtered 3D map. The detection identifies a global floor (or global floor plane) within the 3D map. Then, the map is rotated to standardize the orientation so that the identified floor plane is aligned with a completely horizontal reference plane (e.g., the xy plane).
[0153] In at least some situations, 3D maps can capture indoor environments with multiple floor surfaces that are not necessarily flat with one another. For example, an indoor shopping mall may have floor surfaces of various areas (or sections or rooms) that are not flat with one another. In such situations, identifying the global floor standardizes the orientation.
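The orientation standardization of block 612 can be illustrated with a rotation that maps the detected floor normal onto the z-axis. The sketch below assumes that the floor normal has already been estimated (for example, by a plane fit, which is not shown) and uses the Rodrigues rotation formula; the function names `rotation_aligning_to_z` and `rotate_points` are hypothetical.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_aligning_to_z(normal):
    """Return a 3x3 rotation matrix that maps the given normal onto +z."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    ax, ay, az = nx / length, ny / length, nz / length
    c = az  # cosine of the angle between the unit normal and +z
    if c < -1.0 + 1e-12:
        # Normal points straight down: rotate 180 degrees about the x-axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    vx, vy, vz = ay, -ax, 0.0  # cross product of the unit normal with +z
    k = [[0.0, -vz, vy], [vz, 0.0, -vx], [-vy, vx, 0.0]]
    k2 = _matmul(k, k)
    scale = 1.0 / (1.0 + c)
    return [[(1.0 if i == j else 0.0) + k[i][j] + scale * k2[i][j]
             for j in range(3)] for i in range(3)]


def rotate_points(rotation, points):
    """Apply a 3x3 rotation matrix to each (x, y, z) point."""
    return [tuple(sum(rotation[i][k] * p[k] for k in range(3)) for i in range(3))
            for p in points]
```

After this rotation, the floor plane is parallel to the xy plane, so heights above the floor can be read directly from the z coordinate.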
[0154] In block 614, the longest wall depicted in the 3D map is identified. Then, the horizontally rotated 3D map of block 612 is rotated (e.g., around the vertical axis) to ensure a consistent and horizontally aligned map orientation.
[0155] After horizontal leveling of Block 612, the longest wall may be depicted in a non-optimal manner relative to the horizontal plane (e.g., the xy plane). When depicted in this way, the 3D map may appear less attractive to the human eye. Therefore, in Block 614, the map is rotated around the vertical axis (e.g., the z-axis). To improve readability, the map is rotated so that the longest wall is visually aligned with the horizontal plane.
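The rotation of block 614 can be sketched as follows. The sketch assumes that wall segments have already been extracted as pairs of 2D endpoints in the xy plane (the extraction itself is not shown) and that the longest wall is aligned with the x-axis; both the alignment target and the function name `align_longest_wall` are assumptions made for illustration.

```python
import math


def align_longest_wall(points, walls):
    """Rotate points about the z-axis so the longest wall lies along the x-axis.

    points: list of (x, y, z). walls: list of ((x1, y1), (x2, y2)) segments.
    """
    def length(wall):
        (x1, y1), (x2, y2) = wall
        return math.hypot(x2 - x1, y2 - y1)

    (x1, y1), (x2, y2) = max(walls, key=length)
    angle = math.atan2(y2 - y1, x2 - x1)
    # Rotate by the negative wall angle to bring the wall onto the x-axis.
    c, s = math.cos(-angle), math.sin(-angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]
```

The z coordinate is left unchanged, so the horizontal leveling obtained in block 612 is preserved.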
[0156] With reference to Fig. 6a, the generation of a 3D robot navigation map was described. The 3D robot navigation map corresponds to a global 3D point cloud captured to represent an indoor environment.
[0157] The process of generating a 2D map by processing a 3D robot navigation map is explained in more detail with reference to Fig. 6b. As described, the 2D map is configured for use by specific robot(s).
[0158] In Block 616, the 3D robot navigation map is divided into smaller segments. For example, each of these segments can correspond to a different room or a distinct area. As another example, each segment does not necessarily have to correspond to a distinct room, but can instead correspond to an area of a specific size (e.g., 5 m² relative to the xy plane).
[0159] Processing is performed for each segment. This processing is described with reference to blocks 616 and 618.
[0160] Referring again to block 616, local floor detection is performed to establish a reference plane for the segment. Floor detection may be similar to the global floor detection described earlier with reference to block 612 in FIG. 6a. Here, it is recognized that the height of a segment may differ from the height of other segments. Therefore, a floor unique to the segment is detected.
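The segmentation and local floor detection of block 616 can be sketched as follows. The sketch assumes that segments are square tiles in the xy plane and that the local floor height is approximated by the mean of the lowest z values in the tile; the disclosure does not prescribe either choice, and the names `split_into_tiles`, `local_floor_z`, `tile_size`, and `fraction` are hypothetical.

```python
import math
from collections import defaultdict


def split_into_tiles(points, tile_size):
    """Group (x, y, z) points into square tiles of the xy plane."""
    tiles = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / tile_size), math.floor(y / tile_size))
        tiles[key].append((x, y, z))
    return dict(tiles)


def local_floor_z(segment_points, fraction=0.05):
    """Approximate the local floor height as the mean of the lowest z values."""
    zs = sorted(p[2] for p in segment_points)
    count = max(1, int(len(zs) * fraction))
    return sum(zs[:count]) / count
```

Because each tile receives its own floor height, a segment that sits on a raised or lowered floor is still scanned at the correct height relative to its own floor.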
[0161] In block 618, the segment is scanned at one or more heights relative to the detected floor. Each height corresponds to a sensor of the robot (e.g., a 2D LiDAR sensor) for which the 2D map is configured.
[0162] For example, FIG. 7 illustrates an environment diagram of a robot (702) operating in an indoor environment. The robot (702) has two sensors. The first sensor ("Sensor 1") of the robot (702) is located at a height (h1) relative to the floor (704). The second sensor ("Sensor 2") of the robot (702) is located at a height (h2) relative to the floor (704). FIG. 7 illustrates, for example, that the robot (702) has two sensors, but it is understood that the robot may have only one sensor (e.g., "Sensor 1" or "Sensor 2") rather than two sensors.
[0163] Referring to FIG. 6b, a first localized 2D map segment is generated by scanning the segment at height (h1) in block 618. The first localized 2D map segment can be considered as a "slice" of the 3D map segment separated from the 3D map segment at height (h1). Essentially, the first localized 2D map segment captures only the objects to be observed by the first sensor ("Sensor 1"). For example, the first localized 2D map segment will capture furniture (706). Thus, scanning at the actual height of the first sensor causes the first localized 2D map segment to match the data captured by the first sensor, thereby improving navigation accuracy.
[0164] Similarly, the segment is scanned at height (h2) to generate a second localized 2D map segment. The second localized 2D map segment captures only the objects to be observed by the second sensor ("Sensor 2"). For example, unlike the first localized 2D map segment, the second localized 2D map segment does not capture the furniture (706) because the height (h2) is higher than the height of the furniture (706).
[0165] In this regard, it should be noted that when scanning is performed at different sensor heights (e.g., h1, h2), the apparent room width may differ between the resulting slices due to the presence of objects such as the furniture (706).
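The slicing at sensor heights h1 and h2 can be sketched as a height filter followed by a projection onto the xy plane. The function name `slice_at_height` and the `tolerance` parameter (the half-thickness of the slice) are assumptions made for this illustration.

```python
def slice_at_height(points, floor_z, sensor_height, tolerance=0.02):
    """Return the (x, y) projection of points near the sensor plane.

    The sensor plane lies at floor_z + sensor_height. Points within
    tolerance of that plane are kept and projected onto the xy plane.
    """
    target = floor_z + sensor_height
    return [(x, y) for x, y, z in points if abs(z - target) <= tolerance]
```

For a segment in which a low piece of furniture reaches only the lower height, a slice at the lower sensor height contains the furniture, whereas a slice at the higher sensor height contains only taller structures such as walls.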
[0166] In block 620, individual 2D map segments are assembled to form a complete global 2D map. For example, in the case of the robot (702) of FIG. 7, a pair of localized 2D map segments can be generated for each small segment of the 3D robot navigation map (see block 616). In this situation, the pair of localized 2D map segments are merged to create a global 2D map.
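The assembly of block 620 can be sketched as rasterizing every localized 2D slice into one shared occupancy grid. Representing the global 2D map as a set of occupied grid cells and merging slices by set union are assumptions made for this sketch; the names `assemble_global_map` and `resolution` are hypothetical.

```python
import math


def assemble_global_map(slices, resolution):
    """Merge 2D slices into a set of occupied (column, row) grid cells.

    slices: iterable of lists of (x, y) points in a common global frame.
    resolution: edge length of one grid cell.
    """
    occupied = set()
    for slice_points in slices:
        for x, y in slice_points:
            occupied.add((math.floor(x / resolution), math.floor(y / resolution)))
    return occupied
```

Because all segments share the globally aligned frame produced earlier, no additional registration between slices is needed in this sketch.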
[0167] To further refine the map, 2D contour detection is performed. The global 2D map is input to a detector (e.g., a contour detection algorithm) that detects one or more contours present in the global 2D map. The detector can detect spatially related groups of points as contours. The detected contours are input to a CNN-based classifier (e.g., a CNN contour classification filter) (622).
[0168] The operation of the CNN contour classification filter (622) is similar to the operation of the CNN 3D point cluster classification filter (608) described earlier with reference to FIG. 6a. For brevity, similarities are not described in detail below. However, notable differences are described.
[0169] Instead of 3D point clusters, the CNN contour classification filter (622) operates on 2D contours. Therefore, the speed at which the CNN contour classification filter (622) classifies contours and removes specific contours corresponding to unnecessary objects is significantly faster than that of the CNN 3D point cluster classification filter (608). However, as previously mentioned with reference to FIG. 6a, since 2D data inherently contains less information, the accuracy of contour classification may be lower than that of 3D point cluster classification.
[0170] Contours identified as corresponding to unnecessary objects may correspond to the "remaining" 2D projection after the corresponding 3D point cluster has previously been removed (see block 610 in Fig. 6a).
[0171] In block 624, information about classified contours and contours corresponding to unnecessary objects is provided. Based on this information, contours corresponding to unnecessary objects are removed from the global 2D map. Examples of unnecessary contours include contours corresponding to furniture or temporary obstacles. This removal results in the creation of an object-filtered 2D map (or contour-filtered 2D map).
[0172] In block 626, the object-filtered 2D map is input into a hole-filling and noise-removing filter to further enhance the completeness and clarity of the map. Here, the filled holes may be relatively small holes considered as noise. This noise may have occurred due to the prior removal of 3D point clusters. These holes can be filled using an interpolation algorithm, for example, by generating new points by analyzing the local geometric shape of surrounding points.
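The hole-filling and noise-removing filter of block 626 can be illustrated on a grid of occupied cells. The sketch below is a simplified stand-in for the interpolation described above: it fills a one-cell gap that lies between two occupied cells on opposite sides and removes occupied cells that have no occupied neighbor. The function names `fill_small_holes` and `remove_isolated_cells` and the grid representation are assumptions.

```python
def fill_small_holes(occupied):
    """Fill single-cell gaps flanked by occupied cells on opposite sides."""
    filled = set(occupied)
    candidates = {
        (x + dx, y + dy)
        for x, y in occupied
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
    } - set(occupied)
    for x, y in candidates:
        horizontal = (x - 1, y) in occupied and (x + 1, y) in occupied
        vertical = (x, y - 1) in occupied and (x, y + 1) in occupied
        if horizontal or vertical:
            filled.add((x, y))
    return filled


def remove_isolated_cells(occupied):
    """Remove occupied cells that have no occupied cell among 8 neighbors."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    return {
        (x, y)
        for x, y in occupied
        if any((x + dx, y + dy) in occupied for dx, dy in offsets)
    }
```

Applying the fill before the removal keeps a wall with a one-cell gap intact while discarding stray cells far from any structure.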
[0173] Therefore, a clear and accurate 2D navigation map suitable for robot operation is produced. The 2D robot navigation map represents a global 3D point cloud of the indoor environment.
[0174] FIGS. 8A and 8B illustrate a flowchart of a neural network training method (800) for automatically creating an indoor environment according to at least one embodiment.
[0175] In block 802, a 3D point cloud of the indoor environment is processed (see, for example, block 604 in FIG. 6a). The 3D point cloud may have been generated by at least a 3D LiDAR, one or more RGB-D cameras, one or more Time-of-Flight (TOF) cameras, or one or more stereo cameras.
[0176] According to another embodiment, processing a 3D point cloud enhances one or more geometric features of an indoor environment. Processing a 3D point cloud can enhance one or more geometric features by making the walls or floors of the indoor environment smoother on the surface or by sharpening the corners or edges of the indoor environment space.
[0177] In block 804, one or more point clusters present in the processed 3D point cloud are detected.
[0178] For example, as previously described with reference to block 606 in FIG. 6a, a pre-filtered point cloud is input to a detector (e.g., a point clustering algorithm) that detects one or more point clusters present in the point cloud. The detector can detect spatially related groups of points as point clusters.
[0179] In block 806, in response to detection, one or more clusters are removed from the processed 3D point cloud based on the determination that the object corresponding to one or more detected clusters is unnecessary, thereby creating a global 3D point cloud.
[0180] For example, as previously described with reference to block 610 in FIG. 6a, information regarding clusters identified as corresponding to unnecessary objects is provided. Based on this information, point clusters corresponding to unnecessary objects are removed from the pre-filtered point cloud.
[0181] In block 808, the global floor can be identified as the reference plane of the global 3D point cloud.
[0182] In block 810, the global 3D point cloud can be positionally aligned based on the identified global floor.
[0183] For example, as previously described with reference to block 612 in FIG. 6a, global floor detection is performed based on an object-filtered 3D map. The detection identifies a global floor (or global floor plane) within the 3D map. Then, the map is rotated to standardize the orientation so that the identified floor plane is aligned with a completely horizontal reference plane (e.g., the xy plane).
[0184] In block 812, the longest wall of the global 3D point cloud can be identified.
[0185] In block 814, the global 3D point cloud can be rotated based on the longest identified wall.
[0186] For example, as previously described with reference to block 614 in Fig. 6a, the longest wall depicted in the 3D map is identified. Then, the horizontally rotated 3D map of block 612 is rotated (e.g., around a vertical axis such as the z-axis) to ensure a consistent and horizontally aligned map orientation.
[0187] In block 816, the global 3D point cloud is divided into multiple 3D segments. Each segment corresponds to an individual spatial part of the indoor environment.
[0188] In block 820, for each of the multiple 3D segments, a local floor is identified as the reference plane of the 3D segment.
[0189] For example, as previously described with reference to block 616 in FIG. 6b, the 3D robot navigation map is divided into smaller segments. For example, each of these segments may correspond to a different room or a distinct area. As another example, each segment does not necessarily correspond to a distinct room, but can instead correspond to an area of a specific size (e.g., 5 m² in the xy plane).
[0190] As previously described with reference to block 616 in FIG. 6b, local floor detection is performed to establish a reference plane for the segment. The segment may have a height different from that of other segments. Therefore, a floor specific to the segment is detected.
[0191] In block 822, 2D slices of 3D segments are collected at the height of the robot's 2D sensor based on the identified local floor.
[0192] For example, as previously described with reference to block 618 in FIG. 6b, the segment is scanned at one or more heights relative to the detected floor. Each height corresponds to a sensor of the robot (e.g., a 2D LiDAR sensor) for which the 2D map is configured. For example, a 2D slice of the 3D segment is collected at the height of "Sensor 1" of the robot (702) in FIG. 7.
[0193] According to another embodiment, the robot additionally includes a second 2D sensor.
[0194] In block 824, a 2D slice of a 3D segment can be collected at the height of the robot's second 2D sensor based on the identified local floor.
[0195] For example, as previously described with reference to block 618 in FIG. 6b, a 2D slice of a 3D segment is collected at the height of "sensor 2" of the robot (702) in FIG. 7.
[0196] In block 826, collected 2D slices of multiple 3D segments are assembled to form a global 2D map.
[0197] For example, a 2D slice of a plurality of 3D segments collected at the height of a 2D sensor and a 2D slice of a plurality of 3D segments collected at the height of a second 2D sensor can be assembled to form a global 2D map.
[0198] For example, as previously described with reference to block 620 in FIG. 6b, individual 2D map segments are assembled to form a complete global 2D map. For example, in the case of the robot (702) in FIG. 7, a pair of localized 2D map segments can be generated for each small segment of the 3D robot navigation map (see block 616 in FIG. 6b). In this situation, the pair of localized 2D map segments are merged to create a global 2D map.
[0199] In block 828, one or more contours in the global 2D map can be detected.
[0200] For example, as previously described with reference to block 620 in FIG. 6b, a global 2D map is input to a detector (e.g., a contour detection algorithm) that detects one or more contours present in the global 2D map. The detector can detect spatially related point groups as contours. The detected contours are input to a CNN-based classifier (622) (e.g., a CNN contour classification filter).
[0201] In block 830, one or more detected contours may be removed from the global 2D map based on the judgment that objects in the indoor environment corresponding to one or more detected contours are unnecessary.
[0202] For example, as previously described with reference to block 624 in FIG. 6b, information regarding classified contours and contours corresponding to unnecessary objects is provided. Based on this information, contours corresponding to unnecessary objects are removed from the global 2D map. Examples of unnecessary contours include the contours of furniture or contours corresponding to temporary obstacles.
[0203] In block 832, hole filling or noise removal filtering may be performed to improve the completeness of the global 2D map.
[0204] For example, as previously described with reference to block 626 in FIG. 6b, the object-filtered 2D map is input into a hole-filling and noise removal filter to further improve the completeness and clarity of the map. Here, the holes to be filled may be relatively small holes considered as noise. This noise may have resulted from the previous removal of 3D point clusters. These holes can be filled, for example, using an interpolation algorithm that generates new points by analyzing the local geometric shape of surrounding points.
[0205] The embodiments and features described herein with reference to various embodiments relate to the generation of indoor environment maps to support autonomous robot navigation. These embodiments and features can improve the quality, reliability, and efficiency of map generation. The resulting generated map can be used to guide robot navigation in a variety of applications, including, but not limited to, autonomous vacuuming, food and service delivery, tourism support, and automated roaming operations.
[0206] The embodiments described above are combinations of the components and features of the present invention in specific forms. Each component or feature should be considered optional unless otherwise explicitly stated. Each component or feature may be implemented without being combined with other components or features. Additionally, some components and / or features may be combined to implement embodiments of the present invention. The order of operations described in the embodiments of this disclosure may be rearranged. Some components or features of one embodiment may be included in another embodiment, or such components or features may be replaced by related components or features of another embodiment. It is evident that claims not explicitly cited in the appended claims may be combined to form embodiments or incorporated as new claims through post-filing amendments.
[0207] It is obvious to those skilled in the art that the present invention may be embodied in various specific forms within the scope of the features of this disclosure. Accordingly, the above detailed description should not be interpreted restrictively in all respects but should be considered exemplary. The scope of the present invention shall be determined by a reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the present invention are included within the scope of the present invention.
Claims
1. A computer-implemented method for generating a global two-dimensional (2D) map of an indoor environment based on three-dimensional (3D) data, the 2D map being for guiding autonomous navigation of a robot that includes a 2D sensor, the computer-implemented method comprising: a step of processing a 3D point cloud of the indoor environment; a step of detecting one or more point clusters present in the processed 3D point cloud; a step of, in response to the detection, removing one or more clusters from the processed 3D point cloud to generate a global 3D point cloud, based on a judgment that an object corresponding to the one or more detected clusters is unnecessary; a step of dividing the global 3D point cloud into a plurality of 3D segments, each segment corresponding to an individual spatial part of the indoor environment; a step of, for each of the plurality of 3D segments, identifying a local floor as a reference plane of the 3D segment and collecting a 2D slice of the 3D segment at the height of the 2D sensor of the robot based on the identified local floor; and a step of assembling the collected 2D slices of the plurality of 3D segments to form the global 2D map.
2. In Paragraph 1, A computer-implemented method characterized in that the above 3D point cloud is generated by at least 3D LiDAR, one or more RGB-D cameras, one or more Time-of-Flight (TOF) cameras, or one or more stereo cameras.
3. The computer-implemented method of claim 1, wherein processing the 3D point cloud enhances one or more geometric features of the indoor environment.
4. The computer-implemented method of claim 3, wherein processing the 3D point cloud enhances the one or more geometric features by smoothing a surface of a wall or floor of the indoor environment or by sharpening corners or edges of a space in the indoor environment.
5. The computer-implemented method of claim 1, further comprising: identifying a global floor as a reference plane of the global 3D point cloud; and aligning the global 3D point cloud based on the identified global floor.
6. The computer-implemented method of claim 1, further comprising: identifying a longest wall of the global 3D point cloud; and rotating the global 3D point cloud based on the identified longest wall.
7. The computer-implemented method of claim 1, wherein the robot further includes a second 2D sensor, the computer-implemented method further comprising, for each of the plurality of 3D segments, collecting a 2D slice of the 3D segment at a height of the second 2D sensor of the robot based on the identified local floor, wherein the 2D slices of the plurality of 3D segments collected at the height of the 2D sensor and the 2D slices of the plurality of 3D segments collected at the height of the second 2D sensor are assembled to form the global 2D map.
8. The computer-implemented method of claim 1, further comprising: detecting one or more contours present in the global 2D map; and in response to detecting the one or more contours, removing the one or more detected contours from the global 2D map based on a determination that an object in the indoor environment corresponding to the one or more detected contours is unnecessary.
9. The computer-implemented method of claim 1, further comprising performing hole filling or noise removal filtering to improve the completeness of the global 2D map.
10. The computer-implemented method of claim 1, wherein the global 2D map is configured to guide the autonomous navigation of the robot to perform at least vacuum cleaning, food or service delivery, travel guidance, or autonomous roaming.
11. An artificial intelligence (AI) device configured to generate a global two-dimensional (2D) map of an indoor environment based on three-dimensional (3D) data, the global 2D map being for guiding autonomous navigation of a robot that includes a 2D sensor, the AI device comprising: at least one transceiver; and at least one processor, wherein the at least one processor is configured to: process a 3D point cloud of the indoor environment; detect one or more clusters of points present in the processed 3D point cloud; in response to the detection, remove the one or more clusters from the processed 3D point cloud to generate a global 3D point cloud, based on a determination that an object corresponding to the one or more detected clusters is unnecessary; divide the global 3D point cloud into a plurality of 3D segments, each segment corresponding to an individual spatial part of the indoor environment; for each of the plurality of 3D segments, identify a local floor as a reference plane of the 3D segment, and collect a 2D slice of the 3D segment at a height of the 2D sensor of the robot based on the identified local floor; and assemble the collected 2D slices of the plurality of 3D segments to form the global 2D map.
12. The AI device of claim 11, wherein the 3D point cloud is generated by at least a 3D LiDAR, one or more RGB-D cameras, one or more Time-of-Flight (TOF) cameras, or one or more stereo cameras.
13. The AI device of claim 11, wherein processing the 3D point cloud enhances one or more geometric features of the indoor environment.
14. The AI device of claim 13, wherein processing the 3D point cloud enhances the one or more geometric features by smoothing a surface of a wall or floor of the indoor environment or by sharpening corners or edges of a space in the indoor environment.
15. The AI device of claim 11, wherein the at least one processor is further configured to: identify a global floor as a reference plane of the global 3D point cloud; and align the global 3D point cloud based on the identified global floor.
16. The AI device of claim 11, wherein the at least one processor is further configured to: identify a longest wall of the global 3D point cloud; and rotate the global 3D point cloud based on the identified longest wall.
17. The AI device of claim 11, wherein the robot further includes a second 2D sensor, and the at least one processor is further configured to, for each of the plurality of 3D segments, collect a 2D slice of the 3D segment at a height of the second 2D sensor of the robot based on the identified local floor, wherein the 2D slices of the plurality of 3D segments collected at the height of the 2D sensor and the 2D slices of the plurality of 3D segments collected at the height of the second 2D sensor are assembled to form the global 2D map.
18. The AI device of claim 11, wherein the at least one processor is further configured to: detect one or more contours present in the global 2D map; and in response to detecting the one or more contours, remove the one or more detected contours from the global 2D map based on a determination that an object in the indoor environment corresponding to the one or more detected contours is unnecessary.
19. The AI device of claim 11, wherein the at least one processor is further configured to perform hole filling and noise removal filtering to improve the completeness of the global 2D map.
20. A non-transitory storage medium storing instructions that, when executed, cause at least one processor to perform operations comprising: processing a three-dimensional (3D) point cloud of an indoor environment; detecting one or more clusters of points present in the processed 3D point cloud; in response to the detection, removing the one or more clusters from the processed 3D point cloud to generate a global 3D point cloud, based on a determination that an object corresponding to the one or more detected clusters is unnecessary; dividing the global 3D point cloud into a plurality of 3D segments, each segment corresponding to an individual spatial part of the indoor environment; for each of the plurality of 3D segments, identifying a local floor as a reference plane of the 3D segment, and collecting a two-dimensional (2D) slice of the 3D segment at a height of a 2D sensor of a robot based on the identified local floor; and assembling the collected 2D slices of the plurality of 3D segments to form a global 2D map that guides autonomous navigation of the robot.
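Illustrative note (non-limiting). The per-segment slicing and assembly steps recited above may be sketched as follows, purely as an example under stated assumptions and not as the disclosed implementation. The sketch assumes each 3D segment is an N x 3 NumPy array of (x, y, z) points with z pointing up, estimates the local floor as a low percentile of z (a stand-in for whatever floor identification is actually used), and rasterizes the collected slices into a binary occupancy grid. The function names `local_floor_height`, `collect_slice`, and `assemble_map`, and the parameters `percentile`, `tolerance`, and `resolution`, are hypothetical.

```python
import numpy as np


def local_floor_height(segment, percentile=2.0):
    """Estimate the local floor height of one segment.

    Assumption: the floor is approximated by a low percentile of the
    z coordinates; the source does not specify how the floor is found.
    """
    return float(np.percentile(segment[:, 2], percentile))


def collect_slice(segment, sensor_height, tolerance=0.05):
    """Return the (x, y) points lying at the sensor height above the local floor."""
    floor_z = local_floor_height(segment)
    height_above_floor = segment[:, 2] - floor_z
    # Keep points within +/- tolerance of the sensor height.
    mask = np.abs(height_above_floor - sensor_height) <= tolerance
    return segment[mask, :2]


def assemble_map(slices, resolution=0.05):
    """Merge per-segment 2D slices into one binary occupancy grid.

    Returns the grid (rows index y, columns index x) and the (x, y)
    world coordinate of grid cell (0, 0).
    """
    nonempty = [s for s in slices if len(s) > 0]
    if not nonempty:
        return np.zeros((0, 0), dtype=np.uint8), np.zeros(2)
    points = np.vstack(nonempty)
    origin = points.min(axis=0)
    cells = np.floor((points - origin) / resolution).astype(int)
    width, height = cells.max(axis=0) + 1
    grid = np.zeros((height, width), dtype=np.uint8)
    grid[cells[:, 1], cells[:, 0]] = 1  # mark occupied cells
    return grid, origin
```

Because each slice is taken relative to its own segment's floor, two segments whose floors sit at different absolute heights (for example, rooms separated by a step) still contribute points from the same height above the floor. A second sensor height could be handled by calling `collect_slice` again with that height and passing both sets of slices to `assemble_map`.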