Map updating method and device, mobile robot, and storage medium

What is AI technical title?
AI technical title is built by PatSnap AI team. It summarizes the technical point description of the patent document.
By acquiring the original and depth images of the mobile robot, the location and 3D bounding box of the semantic target are determined using the target detection model, and the target map is updated. This solves the problem of insufficient map accuracy in SLAM algorithm and achieves higher-precision navigation tasks.

WO2026138764A1PCT designated stage Publication Date: 2026-07-02HANGZHOU EZVIZ SOFTWARE CO LTD

View PDF 0 Cites 0 Cited by

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HANGZHOU EZVIZ SOFTWARE CO LTD
Filing Date: 2025-12-23
Publication Date: 2026-07-02

Application Information

Patent Timeline

23 Dec 2025

Application

02 Jul 2026

Publication

WO2026138764A1

IPC: G01C21/00

AI Tagging

Technology Topics

Pattern recognition Image pair

Explore More Agents

Novelty Search
Search existing technologies and assess novelty
↗
FTO
Analyze whether a product may infringe others' patents
↗
Design FTO
Check prior-design risk for exterior design
↗
Drafting
Draft patent application text based on a technical solution
↗
Find Solutions with TRIZ
Generate feasible solution to solve your technical challenge
↗

Similar Technology Patents

A method for identifying a light-reflecting marker cluster for a robot swarm
CN122313571APattern recognitionMotion capture
A method, device and medium for intelligent identification of wire harness defects based on deep learning
CN122244512ACharacter and pattern recognition Inference methods Pattern recognition Algorithm
Methods for generating and modifying synthetic on-person screening images
US12664764B2Image enhancement Image analysis Pattern recognition Computer vision
Weed detection method and device based on hypergraph enhanced YOLOv11 framework
CN122265829AMake up for the limitationsImplement long-range dependency captureCharacter and pattern recognition Biological models Pattern recognitionWeed detection
Visual odometry using a multi-scale feature aggregation network
CN122312772APattern recognition Engineering

Get free access to AI patent search and analysis

Check patentability, review prior art and ask IP Agent with full patent context.

Smart Images

Figure CN2025144587_02072026_PF_FP_ABST

Patent Text Reader

Abstract

The present application provides a map updating method and device, a mobile robot, and a storage medium. The method comprises: acquiring an original image and a depth image corresponding to the original image; on the basis of the original image and the depth image, determining a corresponding current pose of a mobile robot at a current moment; inputting the original image into a target detection model to obtain an initial position of a semantic target under the current pose; on the basis of a conversion relationship between the current pose and an initial pose having been stored, converting the initial position into a target position of the semantic target under the initial pose, wherein the initial pose is the pose of the mobile robot at an initial moment, and the initial moment is a moment when a target map starts to be updated; determining a target three-dimensional box of the semantic target on the basis of the target position; and updating the target map on the basis of the target three-dimensional box of the semantic target.

Need to check novelty before this filing date? Find Prior Art

Description

Map update methods, devices, mobile robots, and storage media Technical Field

[0001] This application relates to, but is not limited to, the field of robotics, and particularly to a map updating method, apparatus, mobile robot, and storage medium. Background Technology

[0002] In recent years, various types of mobile robots have developed rapidly in terms of technology and market. Mobile robots are machines that automatically perform tasks, relying on their own power and control capabilities to achieve various functions. Mobile robots can be commanded by humans, run pre-programmed routines, and act according to strategies formulated using artificial intelligence. For example, a user can use a manual remote control to control a mobile robot to perform related operations. The remote control can wirelessly send operation commands to the mobile robot, which, upon receiving the command, will execute the specified action to complete the relevant function.

[0003] With the rapid development of mobile robot technology, mobile robots are being used more and more widely in fields such as home use, logistics, warehousing, and factory production. For example, robotic vacuum cleaners can perform cleaning functions, and automated guided vehicles (AGVs) can transport various types of goods.

[0004] To enable mobile robots to perform their tasks, maps need to be built. Simultaneous Localization and Mapping (SLAM), as a mapping method, is widely used in various types of mobile robots. The principle of SLAM mapping is as follows: the mobile robot starts moving from an unknown location in an unknown environment, performs self-localization based on its location and the map during the movement, and builds an incremental map based on its self-localization, thus achieving autonomous localization and navigation for the mobile robot. Summary of the Invention

[0005] This application provides a map updating method applied to a mobile robot. The method includes: acquiring an original image and a depth image corresponding to the original image, the original image including information for obtaining semantic targets; determining the current pose of the mobile robot based on the original image and the depth image, the current pose being the pose of the mobile robot at the current moment; inputting the original image to a target detection model to obtain the initial position of the semantic target in the current pose of the mobile robot; converting the initial position into the target position of the semantic target in the initial pose of the mobile robot based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot; wherein, the initial pose is the pose of the mobile robot at an initial moment, the initial moment being the moment when the target map update begins; determining the target 3D bounding box of the semantic target based on the target position of the semantic target; and updating the target map based on the target 3D bounding box of the semantic target.

[0006] This application provides a map updating device applied to a mobile robot. The device includes: an acquisition module, configured to acquire an original image and a depth image corresponding to the original image, the original image including information for obtaining semantic targets; determining the current pose of the mobile robot based on the original image and the depth image, the current pose being the pose of the mobile robot at the current moment; a processing module, configured to input the original image to a target detection model to obtain the initial position of the semantic target under the current pose of the mobile robot; converting the initial position into the target position of the semantic target under the initial pose of the mobile robot based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot; wherein, the initial pose is the pose of the mobile robot at the initial moment, the initial moment being the moment when the target map update begins; a determination module, configured to determine the target 3D bounding box of the semantic target based on the target position of the semantic target; and a map management module, configured to update the target map based on the target 3D bounding box of the semantic target.

[0007] This application provides a mobile robot, including: a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions that can be executed by the processor; the processor is used to execute the machine-executable instructions to implement the map update method of the above embodiment.

[0008] This application provides a machine-readable storage medium storing machine-executable instructions that can be executed by a processor; wherein the processor is used to execute the machine-executable instructions to implement the map update method of the above embodiments. Attached Figure Description

[0009] Figure 1 is a flowchart illustrating a map update method according to one embodiment of this application.

[0010] Figure 2 is a schematic diagram of the structure of a mobile robot according to one embodiment of this application.

[0011] Figure 3 is a flowchart illustrating a map update method according to one embodiment of this application.

[0012] Figure 4 is a schematic diagram of the coverage model in one embodiment of this application.

[0013] Figure 5 is a flowchart illustrating a map update method according to one embodiment of this application.

[0014] Figure 6 is a schematic diagram of the map updating device in one embodiment of this application.

[0015] Figure 7 is a hardware structure diagram of an electronic device according to one embodiment of this application. Detailed Implementation

[0016] SLAM algorithms can construct either dense or sparse maps. Sparse maps, when used for navigation tasks, suffer from insufficient information; dense maps, when used for navigation tasks, provide a lot of redundant information. In conclusion, maps constructed using SLAM algorithms cannot effectively guide the movement of mobile robots.

[0017] In view of this, this application proposes a map updating method that can be applied to mobile robots. Referring to Figure 1, which is a flowchart of the map updating method, the method includes steps 101 to 104.

[0018] In step 101, the original image and the corresponding depth image are obtained. The original image includes information for obtaining the semantic target. Based on the original image and the depth image, the current pose of the mobile robot is determined. The current pose is the pose of the mobile robot at the current moment.

[0019] In step 102, the original image is input to the object detection model to obtain the initial position of the semantic target in the current pose of the mobile robot; based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot, the initial position is converted into the target position of the semantic target in the initial pose of the mobile robot. For example, the initial pose can be the pose of the mobile robot at the initial moment, and the initial moment can be the moment when the target map is started to be updated.

[0020] In step 103, the target 3D bounding box of the semantic target is determined based on the target location of the semantic target.

[0021] In step 104, the target map is updated based on the target 3D bounding box of the semantic target.

[0022] For example, updating the target map based on the target 3D bounding box of the semantic target can include, but is not limited to: if the target map does not exist at the current time, a target map can be created, and map 3D bounding boxes can be added to the target map based on the target 3D bounding box; or, if the target map already exists at the current time, and the distance between the target 3D bounding box and each map 3D bounding box on the target map is not less than a first threshold, map 3D bounding boxes can be added to the target map based on the target 3D bounding box. Then, primary and secondary attributes of the semantic target are added to the map 3D bounding box. When the original image is input to the object detection model, the object detection model can also output the primary attributes of the semantic target, and the primary attributes can represent the category of the semantic target. Furthermore, the original image can be input to the image retrieval model to obtain the secondary attributes of the semantic target, and the secondary attributes can represent the detailed information of the semantic target, and different semantic targets have different secondary attributes.

[0023] For example, updating the target map based on the target 3D bounding box of a semantic target may include, but is not limited to: if a target map already exists at the current time, and the distance between the target 3D bounding box and a first map 3D bounding box on the target map is less than a first threshold, then a second map 3D bounding box can be determined based on the target 3D bounding box and the first map 3D bounding box; wherein the second map 3D bounding box is the target 3D bounding box, or the second map 3D bounding box is the first map 3D bounding box, or the second map 3D bounding box is a fused 3D bounding box of the target 3D bounding box and the first map 3D bounding box. Then, the first map 3D bounding box on the target map is replaced by the second map 3D bounding box.

[0024] For example, determining a second map 3D bounding box based on a target 3D bounding box and a first map 3D bounding box may include, but is not limited to: determining the size error between the size of the target 3D bounding box and the size of the first map 3D bounding box; if the size error is not greater than a second threshold, then the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the first map 3D bounding box can be fused to obtain fused point cloud data; generating a fused 3D bounding box based on the fused point cloud data; if the size of the fused 3D bounding box matches the size of the first map 3D bounding box, then the fused 3D bounding box can be determined as the second map 3D bounding box; if the size of the fused 3D bounding box does not match the size of the first map 3D bounding box, then... The target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box. If the distance between the target 2D bounding box and the image center is less than the distance between the first map 2D bounding box and the image center, the target 3D bounding box can be identified as the second map 3D bounding box. If the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, the first map 3D bounding box can be identified as the second map 3D bounding box. If the distance between the target 2D bounding box and the image center is equal to the distance between the first map 2D bounding box and the image center, either the target 3D bounding box or the first map 3D bounding box can be identified as the second map 3D bounding box.

[0025] For example, determining the second map 3D bounding box based on the target 3D bounding box and the first map 3D bounding box may include, but is not limited to: determining the size error between the size of the target 3D bounding box and the size of the first map 3D bounding box; if the size error is greater than a second threshold, obtaining point cloud data corresponding to the overlapping area between the target 3D bounding box and the first map 3D bounding box; projecting the target 3D bounding box onto the image coordinate system to obtain the target 2D bounding box, and projecting the first map 3D bounding box onto the image coordinate system to obtain the first map 2D bounding box; if the distance between the target 2D bounding box and the image center is less than the distance between the first map 2D bounding box and the image center, fusing the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the overlapping area to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, fusing the point cloud data corresponding to the first map 3D bounding box and the point cloud data corresponding to the overlapping area to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is equal to the distance between the first map 2D bounding box and the image center, fusing the point cloud data corresponding to the target 3D bounding box (or the point cloud data corresponding to the first map 3D bounding box) and the point cloud data corresponding to the overlapping area to obtain fused point cloud data. Then, the fused 3D bounding box corresponding to the fused point cloud data is determined as the second map 3D bounding box.

[0026] For example, if a third map 3D bounding box and a fourth map 3D bounding box are found on the target map, a fifth map 3D bounding box can be determined based on these two bounding boxes. Specifically, the first-level attributes of the third map 3D bounding box and the first-level attributes of the fourth map 3D bounding box are the same, the second-level attributes of the third map 3D bounding box and the second-level attributes of the fourth map 3D bounding box are the same, and the distance between the third map 3D bounding box and the fourth map 3D bounding box is not less than a first threshold and less than a third threshold. If the fifth map 3D bounding box is the third map 3D bounding box, the fourth map 3D bounding box can be deleted from the target map; or, if the fifth map 3D bounding box is the fourth map 3D bounding box, the third map 3D bounding box can be deleted from the target map.

[0027] For example, determining a fifth map 3D frame based on a third map 3D frame and a fourth map 3D frame includes, but is not limited to: projecting the third map 3D frame onto an image coordinate system to obtain a third map 2D frame, and projecting the fourth map 3D frame onto an image coordinate system to obtain a fourth map 2D frame; if the distance between the third map 2D frame and the image center is less than the distance between the fourth map 2D frame and the image center, then the third map 3D frame is determined as the fifth map 3D frame; if the distance between the third map 2D frame and the image center is greater than the distance between the fourth map 2D frame and the image center, then the fourth map 3D frame is determined as the fifth map 3D frame; if the distance between the third map 2D frame and the image center is equal to the distance between the fourth map 2D frame and the image center, then either the third map 3D frame or the fourth map 3D frame is determined as the fifth map 3D frame.

[0028] In this embodiment, the position of the semantic target is obtained based on the target detection model, and the target 3D bounding box of the semantic target is determined based on the position of the semantic target. Then, the target 3D bounding box is used to update the target map, achieving higher accuracy in target map updates and resulting in a more precise target map. To improve the estimation accuracy of the target 3D bounding box in 3D space, the position of the semantic target is output in real time based on the target detection model, such as a large visual model, to obtain an accurate and reliable target 3D bounding box. This improves the association standard between the target 3D bounding box and the target map, significantly enhancing the accuracy of the target map. Using the target 3D bounding box as the core for data association and fusion can greatly reduce the "deformation" of the target 3D bounding box caused by the increased pose error of the mobile robot, achieving higher accuracy in target map updates.

[0029] The embodiments of this application will be described below in conjunction with specific application scenarios.

[0030] This application proposes a map updating method applicable to mobile robots. The mobile robot may include a camera to capture raw images and depth images. The raw image can be of any type, such as RGB, RAW, or YUV images; the type of the raw image is not limited, and the following explanation will use an RGB image as an example. The depth image may include the depth value of each pixel in the raw image, where the depth value represents the distance between the physical location of the pixel and the camera.

[0031] For example, a mobile robot can include an RGB-D camera, which can capture RGB images and depth images, with the RGB image serving as the raw image. Of course, besides RGB-D cameras, mobile robots can also include other types of cameras, as long as they can capture both raw and depth images.

[0032] Referring to Figure 2, which shows a schematic diagram of the mobile robot's structure, the mobile robot can be equipped with a robotic arm and an RGB-D camera. The RGB-D camera is mounted at the end of the robotic arm, with its field of view facing downwards at a certain angle to the horizontal plane. This tilted orientation allows the RGB-D camera to acquire more information. The robotic arm can be a 6-DOF (DoF) robotic arm. The mobile robot can be equipped with a movable platform to control its movement. The chassis of the mobile robot carries a logic processing unit, which executes the map update method.

[0033] When the mobile robot is stationary, the RGB-D camera can be controlled to move along with the robotic arm, thereby controlling the field of view of the RGB-D camera and acquiring images within a suitable range.

[0034] During the movement of the mobile robot, the robotic arm remains stationary, the field of view of the RGB-D camera remains unchanged, and the extrinsic parameter matrix between the center of the robot's chassis and the RGB-D camera remains fixed. The extrinsic parameter matrix describes the relative position and orientation between the RGB-D camera and the robot's coordinate system, and consists of a rotation matrix and a translation vector. In a mobile robot, a fixed extrinsic parameter matrix means that the position and orientation of the RGB-D camera relative to the center of the chassis are fixed.

[0035] In the above application scenarios, this application proposes a map updating method based on the fusion of an RGB-D camera and a large model, namely a target detection model, to update the target map. Referring to Figure 3, which is a flowchart of the map updating method, the method may include steps 301 to 311.

[0036] In step 301, the original image and the corresponding depth image are obtained. The original image includes information used to obtain the semantic target.

[0037] For example, a mobile robot can periodically acquire raw images and depth images, with the acquisition time of the raw image being the same as the acquisition time of the depth image. For instance, at time t1, an RGB image and a depth image are acquired using an RGB-D camera; that is, the acquisition time of both the RGB image and the depth image is time t1. At time t2, an RGB image and a depth image are acquired using the RGB-D camera; that is, the acquisition time of both the RGB image and the depth image is time t2, and so on.

[0038] In summary, for each time point, when the current time is that time, the original image and depth image can be acquired, and then the subsequent map update process can be performed based on the original image and the depth image. For ease of description, the following explanation will use the processing of an original image and its corresponding depth image as an example.

[0039] In step 302, the current pose of the mobile robot is determined based on the original image and the depth image. That is, the pose of the mobile robot at the current moment is called the current pose.

[0040] For example, based on the original image and depth image acquired by the RGB-D camera, the SLAM algorithm can be used to update the current pose of the mobile robot, and there are no restrictions on the pose update process.

[0041] For example, based on the original image and the previous frame of the original image, multiple feature points can be found. For each feature point, it corresponds to the coordinates of a first pixel in the original image, the coordinates of a second pixel in the previous frame of the original image, and the depth value in the depth image. Based on the first pixel coordinates, second pixel coordinates, and depth value corresponding to each feature point, the current pose of the mobile robot can be determined. There are no restrictions on this determination process.

[0042] In step 303, the original image is input to the target detection model, which outputs the initial position of the semantic target in the current pose of the mobile robot and the first-level attributes of the semantic target.

[0043] For example, an object detection model can be pre-trained. The input to the object detection model can be an image, such as an RGB image, and the output can be the location of the semantic object and its first-level attributes. For instance, the object detection model can be trained by a server and distributed to each mobile robot, or each mobile robot can train its own object detection model; the training process for this object detection model is not limited. The object detection model is a large-scale visual model. The object detection model can be a neural network model or a deep learning network model; the network structure of the object detection model is not limited.

[0044] For example, based on a trained object detection model, each time an original image is obtained, the original image can be input into the object detection model. That is, the object detection model is an online model used to process the original image in real time. After receiving the original image, the object detection model can process it to obtain the initial position of the semantic target and the first-level attributes of the semantic target.

[0045] For example, a semantic target can be an object within the camera's field of view; that is, objects within the camera's field of view are called semantic targets. Semantic targets can be objects with attributes such as category, color, and text. During the movement of the mobile robot, each time an original image is captured by the camera, this original image may not include semantic targets. In this case, the object detection model will not output the initial position and primary attributes of the semantic targets. Alternatively, the original image may include semantic targets, in which case the object detection model outputs the initial position and primary attributes of the semantic targets. The original image may include at least one semantic target; in this case, the object detection model outputs the initial position and primary attributes of each semantic target. The following explanation will use an original image containing only one semantic target as an example.

[0046] For example, the original image contains multiple pixels. The object detection model outputs which pixels belong to the semantic target and which do not, i.e., it outputs the semantic target's mask information. The semantic target's mask information can determine the semantic target's position, which includes the pixels in the original image that belong to that semantic target. Since the mobile robot is in its current pose at the current moment, the position of the semantic target can be called the semantic target's initial position in the current pose.

[0047] For example, considering that each semantic target has complex semantic information, this information can be divided into primary attributes and secondary attributes. Secondary attributes are more difficult to detect than primary attributes. Primary attributes can be detected and output by object detection models, while secondary attributes can be detected and output by image retrieval models. Primary attributes can represent the category of the semantic target (i.e., category attributes), such as "cup" or "table." Secondary attributes can represent the detailed information of the semantic target (i.e., detailed attributes), such as the color, material, and text on the cup.

[0048] It is important to note that the secondary attributes differ for different semantic targets. For example, the secondary attributes of semantic targets of different categories are different, such as the secondary attributes of a cup being different from those of a table. Even the secondary attributes of semantic targets of the same category can differ; for example, the secondary attributes of cup 1 are different from those of cup 2. For instance, different cups may be made of different materials, and the text on different cups may also be different.

[0049] When the original image is input into the object detection model, the object detection model can analyze the semantic target, obtain the first-level attributes of the semantic target, and output the first-level attributes of the semantic target.

[0050] In step 304, based on the transformation relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot, the initial position is converted into the target position of the semantic target in the initial pose of the mobile robot. For example, the initial pose can be the pose of the mobile robot at the initial moment, and the initial moment can be the moment when the target map is started to be updated.

[0051] For example, if there is no target map at the current time, the current time can be used as the initial time, that is, the current time is the time when the target map is updated. In this case, the current pose can be used as the initial pose, and therefore, the initial position can be used as the target position of the semantic target under the initial pose.

[0052] Alternatively, if a target map exists at the current moment, after obtaining the current pose of the mobile robot, the pose difference between the current pose and the initial pose can be calculated. This pose difference represents the transformation relationship between the current pose and the initial pose. Then, using the initial position of the semantic target in the current pose and this pose difference, the target position of the semantic target in the initial pose can be obtained. The target position represents the position of the semantic target in the initial pose, not the position of the semantic target in the current pose.

[0053] In abnormal situations, the target map may not be successfully created at the initial moment. The target map will be reinitialized later. At this time, the moment of reinitialization of the target map will be taken as the initial moment of the target map. The pose of the mobile robot at the moment of reinitialization of the target map is the initial pose of the mobile robot.

[0054] In step 305, the target 3D bounding box of the semantic target is determined based on the target location of the semantic target.

[0055] For example, the target 2D bounding box (i.e. the 2D bounding box of the semantic target) of the semantic target can be determined based on the target location of the semantic target. The target 2D bounding box is the smallest 2D bounding box that encloses the semantic target with orientation. In other words, the target 2D bounding box can be the smallest 2D bounding box that can enclose the target location of the semantic target.

[0056] For example, a coverage model can be used to determine the target 2D bounding box of a semantic object. The coverage model can include component convex hulls, oriented bounding boxes, and axis-aligned bounding boxes. The component convex hull is the finest-grained detection primitive, the axis-aligned bounding box is the coarsest-grained detection primitive, and the oriented bounding box is an intermediate-grained detection primitive. See Figure 4 for a schematic diagram of the coverage model. The deepest layer is the component convex hull, which serves as a detection primitive. The component convex hull can be simply referred to as the convex hull. It fits and encloses the semantic object using multiple adjacent triangular facets. For example, the component convex hull is a method for representing convex polyhedra, used in 3D reconstruction and detection. By fitting and enclosing a complex 3D model with many adjacent triangular facets, it simplifies the model. In this example, the semantic object can be fitted and enclosed by multiple adjacent triangular facets to obtain the component convex hull. There are no restrictions on the acquisition of this component convex hull; it can be configured.

[0057] The detection primitives of the second layer are oriented bounding boxes, which can be called OBBs (Oriented Bounding Boxes). An oriented bounding box is the smallest circumscribed cube used to enclose the convex hull of a component. The outermost detection primitives are axis-aligned bounding boxes, which can be called Axis-Aligned Bounding Boxes (AABBs). An axis-aligned bounding box is the smallest axis-aligned circumscribed cube used to enclose the oriented bounding box.

[0058] Based on the aforementioned coverage model, the convex hull of a component encloses the target 2D bounding box. Of course, the above is merely an example of determining the target 2D bounding box for a semantic target, and this method of determination is not limited.

[0059] For example, after obtaining the two-dimensional bounding box of the semantic target, it can be converted into a three-dimensional bounding box, that is, the 2D bounding box of the semantic target can be converted into a 3D bounding box. For instance, the pixel point corresponding to the semantic target in the original image can be determined, and the depth image includes the depth value corresponding to each pixel point in the original image. Therefore, the depth value corresponding to the semantic target can be determined, and based on the depth value corresponding to the semantic target, the 2D bounding box of the semantic target can be converted into a 3D bounding box, that is, the three-dimensional bounding box of the semantic target can be obtained. There are no restrictions on the method of obtaining this three-dimensional bounding box.

[0060] For example, after obtaining the target location of the semantic target, the target location can be converted into point cloud data of the semantic target. The point cloud data can include multiple three-dimensional points. For instance, the pixel corresponding to the target location of the semantic target in the original image can be determined, and the depth image includes the depth value corresponding to each pixel in the original image. Therefore, the depth value corresponding to the target location can be determined, and based on the depth value corresponding to the target location, the target location can be converted into point cloud data of the semantic target, thereby obtaining the point cloud data of the semantic target. There are no restrictions on the method of obtaining this point cloud data.

[0061] In summary, we can obtain the target 3D bounding box, the first-level attributes of the semantic target, and the point cloud data of the semantic target, and we can associate the target 3D bounding box, the first-level attributes, and the point cloud data.

[0062] In step 306, determine whether the target map already exists at the current moment.

[0063] If not, meaning there is no target map at the current moment, then step 307 can be executed.

[0064] If so, meaning the target map already exists at the current moment, then step 308 can be executed.

[0065] In step 307, a target map is created. The target map is a 3D map, and a map 3D bounding box is added to the target map based on the semantic target. The first-level attribute of the semantic target is added to the map 3D bounding box.

[0066] For example, if a target map does not exist at the current moment, it means that no target map has been established yet and needs to be established first. This target map is an initial map, which needs to be updated in subsequent processes. Alternatively, when a target map does not exist at the current moment, the current moment and the current pose can be used as the initial moment and stored.

[0067] After the target map is established, a map 3D bounding box can be added to the target map based on the target 3D bounding box of the semantic target. This map 3D bounding box corresponds to the semantic target and can be called the target map 3D bounding box. The position of this target map 3D bounding box is the same as the position of the target 3D bounding box, that is, the size of the target map 3D bounding box can be the same as the size of the target 3D bounding box, and the center position of the target map 3D bounding box can be the same as the center position of the target 3D bounding box. In other words, this target 3D bounding box can be added to the target map to obtain this target map 3D bounding box.

[0068] After adding the target map 3D bounding box to the target map, you can also add a first-level attribute of semantic target to the target map 3D bounding box, that is, associate the target map 3D bounding box with the first-level attribute on the target map.

[0069] In step 308, calculate the distance between the target 3D bounding box and each map 3D bounding box on the target map.

[0070] For example, for each 3D bounding box on the target map, the distance between the center of the target 3D bounding box and the center of the map 3D bounding box can be calculated, such as Euclidean distance. Obviously, the center of the target 3D bounding box can be a 3D location point, and the center of the map 3D bounding box can be a 3D location point. The distance between the two 3D location points can be calculated, and there are no restrictions on the calculation method.

[0071] In step 309, based on the distance between the target 3D bounding box and each map 3D bounding box on the target map, it is determined whether there is a distance less than the first threshold.

[0072] The first threshold can be configured based on experience.

[0073] If not, meaning the distance between the target 3D bounding box and each map 3D bounding box on the target map is not less than the first threshold, then step 310 can be executed. If yes, meaning the distance between the target 3D bounding box and a certain map 3D bounding box on the target map (denoted as the first map 3D bounding box) is less than the first threshold, then step 311 can be executed.

[0074] In step 310, a target map 3D bounding box is added to the target map based on the semantic target. This target map 3D bounding box corresponds to the semantic target, and the first-level attribute of the semantic target is added to the target map 3D bounding box.

[0075] For example, if the distance between the target 3D bounding box and every map 3D bounding box on the target map is not less than a first threshold, then the target 3D bounding box is considered a new target 3D bounding box, and the semantic target is considered a new semantic target. Therefore, a target map 3D bounding box can be added to the target map based on the target 3D bounding box of the semantic target, thereby updating the target map. The position of this target map 3D bounding box is the same as the position of this target 3D bounding box.

[0076] After adding the target map 3D bounding box to the target map, you can also add a first-level attribute of semantic target to the target map 3D bounding box, that is, associate the target map 3D bounding box with the first-level attribute on the target map.

[0077] In step 311, a second map 3D bounding box is determined based on the target 3D bounding box and the first map 3D bounding box, and the first map 3D bounding box on the target map is replaced by the second map 3D bounding box.

[0078] For example, if the distance between the target 3D bounding box and the first map 3D bounding box on the target map is less than a first threshold, it means that the target 3D bounding box and the first map 3D bounding box are 3D bounding boxes of the same semantic target. That is, before the current time, the map 3D bounding box of the semantic target has been added to the target map. When the target 3D bounding box of the semantic target is obtained again at the current time, the second map 3D bounding box of the semantic target needs to be redetermined and the first map 3D bounding box on the target map is replaced by the second map 3D bounding box.

[0079] For example, adding a second 3D bounding box to the target map and deleting the first 3D bounding box updates the target map. After adding the second 3D bounding box, you can add a first-level attribute of the semantic target to the second 3D bounding box and delete the first-level attribute associated with the first 3D bounding box. Alternatively, you can directly associate the first-level attribute of the second 3D bounding box with that of the first 3D bounding box.

[0080] The second map 3D bounding box can be the target 3D bounding box, or the second map 3D bounding box can be the first map 3D bounding box, or the second map 3D bounding box can be a fused 3D bounding box of the target 3D bounding box and the first map 3D bounding box. Based on this, for step 311, the target map can be updated using the following steps S11-S21.

[0081] In step S11, the size error between the size of the target 3D frame and the size of the first map 3D frame is determined.

[0082] For example, based on the length of the target 3D bounding box and the length of the first map 3D bounding box, the length difference can be determined, i.e., the difference between the two lengths. Similarly, based on the width of the target 3D bounding box and the width of the first map 3D bounding box, the width difference can be determined, i.e., the difference between the two widths. Furthermore, based on the height of the target 3D bounding box and the height of the first map 3D bounding box, the height difference can be determined, i.e., the difference between the two heights.

[0083] Dimensional errors can be determined based on length, width, and height differences. For example, the average of the length, width, and height differences can be used as the dimensional error, or a weighted average of the length, width, and height differences can be used to obtain the dimensional error. There are no restrictions on how this dimensional error is determined.

[0084] In step S12, it is determined whether the size error is greater than the second threshold, which can be configured based on experience.

[0085] If not, that is, if the size error is not greater than the second threshold, then step S13 can be executed.

[0086] If so, that is, the size error is greater than the second threshold, then step S18 can be executed.

[0087] In step S13, the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the first map 3D bounding box are fused to obtain fused point cloud data; a fused 3D bounding box is generated based on the fused point cloud data.

[0088] For example, if the size error is not greater than the second threshold, it means that the size of the target 3D bounding box and the first map 3D bounding box are relatively close, and the target 3D bounding box and the first map 3D bounding box can be fused.

[0089] Referring to step 305, the target 3D bounding box and point cloud data can be associated, thus finding the point cloud data corresponding to the target 3D bounding box (denoted as point cloud data A). When updating the first map 3D bounding box on the target map, the first map 3D bounding box is also used as the target 3D bounding box, and the first map 3D bounding box and point cloud data are associated, thus finding the point cloud data corresponding to the first map 3D bounding box (denoted as point cloud data B).

[0090] Based on this, point cloud data A and point cloud data B can be fused to obtain fused point cloud data, which can include the union of point cloud data A and point cloud data B.

[0091] After obtaining the fused point cloud data, a fused 3D bounding box is generated based on the fused point cloud data. The fused point cloud data includes multiple 3D points, and the fused 3D bounding box is the smallest 3D bounding box that can enclose these 3D points. For example, the coverage range model of part convex hull, oriented bounding box, and axis-aligned bounding box can be used to determine the smallest 3D bounding box that can enclose these 3D points. There are no restrictions on the method of obtaining this fused 3D bounding box.

[0092] In step S14, it is determined whether the size of the fused 3D frame matches the size of the first map 3D frame.

[0093] If yes, meaning the size of the merged 3D bounding box matches the size of the first map 3D bounding box, then proceed to step S15. If no, meaning the size of the merged 3D bounding box does not match the size of the first map 3D bounding box, then proceed to step S16.

[0094] For example, if the size of the merged 3D bounding box is the same as the size of the first map 3D bounding box, then the size of the merged 3D bounding box matches the size of the first map 3D bounding box; if the size of the merged 3D bounding box is different from the size of the first map 3D bounding box, then the size of the merged 3D bounding box does not match the size of the first map 3D bounding box.

[0095] For example, determine the size error between the size of the fused 3D frame and the size of the first map 3D frame. If the size error is less than a threshold, the size of the fused 3D frame matches the size of the first map 3D frame. If the size error is not less than the threshold, the size of the fused 3D frame does not match the size of the first map 3D frame.

[0096] In step S15, the fused 3D frame is determined as the second map 3D frame, and the first map 3D frame on the target map is replaced by the second map 3D frame. That is, the first map 3D frame on the target map needs to be deleted, and the first-level attributes of the second map 3D frame and the first map 3D frame are associated.

[0097] In step S16, the target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box.

[0098] For example, the target 3D bounding box can be projected onto the image coordinate system to obtain the target 2D bounding box in the image coordinate system. That is, each 3D point of the target 3D bounding box is projected onto the image coordinate system to obtain the corresponding 2D point (this embodiment does not limit how the 3D point is mapped to the 2D point in the image coordinate system). The region formed by these 2D points is the target 2D bounding box. Similarly, the first map 3D bounding box can be projected onto the image coordinate system to obtain the first map 2D bounding box in the image coordinate system. That is, each 3D point of the first map 3D bounding box is projected onto the image coordinate system to obtain the corresponding 2D point. The region formed by these 2D points is the first map 2D bounding box.

[0099] In step S17, if the distance between the target 2D bounding box and the image center is less than the distance between the first map 2D bounding box and the image center, then the target 3D bounding box is determined as the second map 3D bounding box; if the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, then the first map 3D bounding box is determined as the second map 3D bounding box; if the distance between the target 2D bounding box and the image center is equal to the distance between the first map 2D bounding box and the image center, then either the target 3D bounding box or the first map 3D bounding box is determined as the second map 3D bounding box.

[0100] For example, the original image is located in an image coordinate system, and the center pixel of the original image can be used as the image center. The distance between the center point of the target 2D bounding box and the image center (such as Euclidean distance) can be determined, as can the distance between the center point of the first map 2D bounding box and the image center. Based on this, if the distance between the target 2D bounding box and the image center is less than the distance between the first map 2D bounding box and the image center, it indicates that the target 3D bounding box has a higher confidence level, and the target 3D bounding box is used as the second map 3D bounding box. If the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, it indicates that the first map 3D bounding box has a higher confidence level, and the first map 3D bounding box is used as the second map 3D bounding box.

[0101] If the target 3D bounding box is used as the second map 3D bounding box, then the first map 3D bounding box on the target map can be replaced by the second map 3D bounding box. In this case, the first map 3D bounding box on the target map needs to be deleted, and the first-level attribute of the second map 3D bounding box should be associated with the first map 3D bounding box. If the first map 3D bounding box is used as the second map 3D bounding box, then the first map 3D bounding box on the target map can be replaced by the second map 3D bounding box. In this case, the first map 3D bounding box on the target map can be retained.

[0102] In step S18, the point cloud data corresponding to the overlapping area between the target 3D bounding box and the first map 3D bounding box is obtained. For example, the overlapping area between the target 3D bounding box and the first map 3D bounding box can be determined first, and then the point cloud data corresponding to the overlapping area can be obtained.

[0103] For example, if the size error between the target 3D bounding box and the first map 3D bounding box is greater than a second threshold, it indicates that the sizes of the target 3D bounding box and the first map 3D bounding box differ significantly, and they may intersect or overlap. Therefore, the overlapping area between the target 3D bounding box and the first map 3D bounding box can be determined. Based on the point cloud data A corresponding to the target 3D bounding box and the point cloud data B corresponding to the first map 3D bounding box, the point cloud data corresponding to the overlapping area can be found from point cloud data A and point cloud data B, and denoted as the overlapping area point cloud data.

[0104] In step S19, the target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box, see step S16.

[0105] In step S20, if the distance between the target 2D bounding box and the image center is less than the distance between the first map 2D bounding box and the image center, the point cloud data corresponding to the target 3D bounding box and the point cloud data of the overlapping area are fused to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, the point cloud data corresponding to the first map 3D bounding box and the point cloud data of the overlapping area are fused to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is equal to the distance between the first map 2D bounding box and the image center, the point cloud data corresponding to the target 3D bounding box (or the point cloud data corresponding to the first map 3D bounding box) and the point cloud data of the overlapping area are fused to obtain fused point cloud data.

[0106] For example, when fusing point cloud data A corresponding to the target 3D bounding box and point cloud data of the overlapping area to obtain fused point cloud data, the fused point cloud data includes the union of point cloud data A and point cloud data of the overlapping area. Similarly, when fusing point cloud data B corresponding to the first map 3D bounding box and point cloud data of the overlapping area to obtain fused point cloud data, the fused point cloud data includes the union of point cloud data B and point cloud data of the overlapping area.

[0107] In step S21, the fused 3D bounding box corresponding to the fused point cloud data is determined as the second map 3D bounding box. The first map 3D bounding box on the target map is replaced by the second map 3D bounding box, that is, the first map 3D bounding box on the target map needs to be deleted, and the first-level attributes of the second map 3D bounding box and the first map 3D bounding box are associated.

[0108] At this point, step 311 is complete, and the 3D bounding box of the semantic target can be updated on the target map.

[0109] In one possible implementation, an image retrieval model can be pre-trained. The input to the image retrieval model can be an image (such as an RGB image), and the output can be a secondary attribute of the semantic target. For example, the image retrieval model can be trained by a server and then distributed to each mobile robot, or each mobile robot can train its own image retrieval model. There are no restrictions on the training process for this image retrieval model. The image retrieval model can be a neural network model or a deep learning network model; there are no restrictions on the network structure of the image retrieval model.

[0110] For example, considering that image retrieval models are used to analyze the secondary attributes of semantic targets, and that the detection difficulty of secondary attributes is higher than that of primary attributes, making real-time detection of secondary attributes impossible, the image retrieval model can be an offline model. When the original image is obtained, it is not directly input into the image retrieval model. After updating the 3D bounding boxes of semantic targets on the target map, for each 3D bounding box on the target map, if the bounding box is already associated with a secondary attribute, the associated secondary attribute is not retrieved. If the bounding box is not associated with a secondary attribute, the original image corresponding to the bounding box is retrieved, and the original image is input into the image retrieval model to obtain the secondary attributes of the semantic target. These secondary attributes are then associated with the bounding box. Based on this, the bounding box is associated with both primary and secondary attributes. The primary attribute can represent the category of the semantic target, and the secondary attribute can represent the detailed information of the semantic target. It is important to note that different semantic targets have different secondary attributes; for example, different categories of semantic targets have different secondary attributes, and even semantic targets of the same category have different secondary attributes.

[0111] In certain application scenarios, when a semantic target moves from location A to location B, multiple map bounding boxes of that semantic target may be added to the target map. Therefore, based on the characteristic that the primary and secondary attributes of the same semantic target are the same, while the secondary attributes of different semantic targets are different, it is possible to distinguish multiple map bounding boxes belonging to the same semantic target, and then delete some map bounding boxes, including only one map bounding box of the semantic target, thereby removing the ghosting of the semantic target in the target map.

[0112] Considering that the primary attributes of semantic targets in the target map may repeat and change abruptly, multiple 3D bounding boxes of the same semantic target can be filtered based on the distance between them. Considering that different semantic targets have different secondary attributes, multiple 3D bounding boxes of the same semantic target can be filtered based on the associated secondary attributes of the bounding boxes. Based on this, multiple 3D bounding boxes with completely identical secondary attributes and a distance not less than a first threshold and less than a third threshold can be identified as 3D bounding boxes of the same semantic target.

[0113] Based on the above principles, this application proposes a map updating method. As shown in Figure 5, which is a flowchart of the map updating method, the map updating method may include steps 501 to 504.

[0114] In step 501, the target map is traversed to the third map 3D frame and the fourth map 3D frame.

[0115] For example, the first-level attributes of the third map 3D frame are the same as those of the fourth map 3D frame, the second-level attributes of the third map 3D frame are the same as those of the fourth map 3D frame, and the distance between the third map 3D frame and the fourth map 3D frame is not less than a first threshold and less than a third threshold.

[0116] For example, if a third map bounding box and a fourth map bounding box that meet the above conditions exist on the target map, the target map can be updated using subsequent steps. If no third map bounding box and a fourth map bounding box that meet the above conditions exist on the target map, the target map update process ends, and the process waits for the next cycle to continue traversing the third map bounding box and the fourth map bounding box on the target map.

[0117] For example, if the primary attributes of map bounding box A and map bounding box B are the same, and the secondary attributes of map bounding box A and map bounding box B are the same, then map bounding box A and map bounding box B may belong to the same semantic target. If the primary attributes of map bounding box A and map bounding box B are different, and / or the secondary attributes of map bounding box A and map bounding box B are different, then map bounding box A and map bounding box B do not belong to the same semantic target.

[0118] For example, if the distance between map bounding boxes A and B is less than a first threshold, then map bounding boxes A and B are considered to be bounding boxes of the same semantic target. The implementation process is shown in Figure 3. If the distance between map bounding boxes A and B is not less than a third threshold, and the third threshold is determined based on the interval of multiple target placement, then it indicates that the distance between the two semantic targets is too far, exceeding the interval of multiple target placement, and they are unlikely to be bounding boxes of the same semantic target. Clearly, if the distance between map bounding boxes A and B is not less than the first threshold and less than the third threshold, then map bounding boxes A and B are likely bounding boxes of the same semantic target.

[0119] If the primary attributes of map 3D bounding box A and map 3D bounding box B are the same, the secondary attributes of map 3D bounding box A and map 3D bounding box B are the same, and the distance between map 3D bounding box A and map 3D bounding box B is not less than the first threshold and less than the third threshold, then map 3D bounding box A and map 3D bounding box B are confirmed to be map 3D bounding boxes of the same semantic target, and they are designated as the third map 3D bounding box and the fourth map 3D bounding box.

[0120] For example, when multiple matching pairs exist on the target map (each matching pair includes a third map 3D bounding box and a fourth map 3D bounding box), these multiple matching pairs can be added to the set of targets to be associated. In subsequent processes, each matching pair in the set of targets to be associated needs to be traversed sequentially, and the target map is updated for the currently traversed matching pair. The following explanation will use the processing of a single matching pair as an example.

[0121] In step 502, the third map 3D frame is projected onto the image coordinate system to obtain the third map 2D frame, and the fourth map 3D frame is projected onto the image coordinate system to obtain the fourth map 2D frame, see step S16.

[0122] In step 503, if the distance between the third map 2D frame and the image center is less than the distance between the fourth map 2D frame and the image center (the confidence level of the third map 3D frame is higher), then the third map 3D frame is determined as the fifth map 3D frame; if the distance between the third map 2D frame and the image center is greater than the distance between the fourth map 2D frame and the image center (the confidence level of the fourth map 3D frame is higher), then the fourth map 3D frame is determined as the fifth map 3D frame; if the distance between the third map 2D frame and the image center is equal to the distance between the fourth map 2D frame and the image center, then either the third map 3D frame or the fourth map 3D frame is determined as the fifth map 3D frame.

[0123] In summary, when the confidence level of the third map's 3D bounding box is higher, it is designated as the fifth map's 3D bounding box; similarly, when the confidence level of the fourth map's 3D bounding box is higher, it is designated as the fifth map's 3D bounding box. This allows us to use the 3D bounding box closest to the camera center as the core (when the camera is directly facing the target, the 3D bounding box is more accurate and has less depth fluctuation, resulting in more accurate 3D bounding box estimation), and then filter, register, and fuse the point cloud sets of 3D bounding boxes from other poses into the core 3D bounding box, thereby updating the target map.

[0124] Therefore, the fifth map 3D frame can be determined based on the third and fourth map 3D frames. The fifth map 3D frame is the third map 3D frame, or the fifth map 3D frame is the fourth map 3D frame.

[0125] In step 504, if the fifth map 3D frame is the third map 3D frame, then the fourth map 3D frame is deleted from the target map, and the third map 3D frame is retained, so that only one map 3D frame with the same semantic target is retained; if the fifth map 3D frame is the fourth map 3D frame, then the third map 3D frame is deleted from the target map, and the fourth map 3D frame is retained, so that only one map 3D frame with the same semantic target is retained.

[0126] In this embodiment, the position of the semantic target is obtained based on the target detection model, and the target 3D bounding box of the semantic target is determined based on the position of the semantic target. Then, the target 3D bounding box is used to update the target map, achieving higher accuracy in target map updates and resulting in a more precise target map. To improve the estimation accuracy of the target 3D bounding box in 3D space, the position of the semantic target is output in real time based on the target detection model (e.g., a large visual model), obtaining an accurate and reliable target 3D bounding box. This allows for optimization of the semantic target size, improving the association standard between the target 3D bounding box and the target map, significantly enhancing the accuracy of the target map, and enabling rapid application to various target exploration and localization products. Using the target 3D bounding box as the core for data association and fusion can greatly reduce the "deformation" of the target 3D bounding box caused by the increasing pose error of the mobile robot, achieving higher accuracy in target map updates.

[0127] To improve the estimation accuracy of 3D bounding boxes of targets in 3D space and remove ghosting of semantic targets in the target map, this application proposes a high-precision map update method that combines an RGB-D camera with a large visual model (e.g., real-time target detection) and an image retrieval model (e.g., offline retrieval). The method obtains the 3D bounding boxes of semantic targets based on the pose of the mobile robot updated in real-time using RGB-D-SLAM, thereby recording the position, size, point cloud data, and primary attributes of the semantic targets to establish an initial semantic map (i.e., the target map). Since the primary attributes of semantic targets in the target map may repeat or change, initial filtering and updating are performed based on the Euclidean distance between semantic targets being less than a certain threshold. Then, all semantic targets in the target map are input into the image retrieval model for secondary semantic attribute detection to obtain secondary attributes. The secondary attributes of all semantic targets are superimposed onto the target map. Targets with completely identical secondary attributes but whose Euclidean distance is not less than the first threshold and less than the third threshold (interval between multiple targets) are listed as the set of targets to be associated. Finally, the set of targets to be associated is traversed, and the 3D bounding box closest to the center of the camera is used as the core (when the camera is facing the target, the 3D bounding box is more accurate and the depth fluctuation is smaller, resulting in more accurate 3D bounding box estimation). The point clouds of 3D bounding boxes in other poses are filtered, registered, and fused into the core 3D bounding box, thereby updating the target map. The above method can be quickly applied to various robotic arm products, with the characteristics of strong adaptability, low cost, and rapid deployment, and has a wide range of application scenarios.

[0128] Based on the same application concept as the above method, this application proposes a map updating device 60 for use in a mobile robot. As shown in Figure 6, which is a schematic diagram of the device, the device includes an acquisition module 61, a processing module 62, a determination module 63, and a map management module 64.

[0129] The acquisition module 61 is used to acquire an original image and a depth image corresponding to the original image, wherein the original image includes information for obtaining semantic targets; and to determine the current pose of the mobile robot based on the original image and the depth image, wherein the current pose is the pose of the mobile robot at the current moment.

[0130] The processing module 62 is used to input the original image into the target detection model to obtain the initial position of the semantic target in the current pose of the mobile robot; based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot, the initial position is converted into the target position of the semantic target in the initial pose of the mobile robot; wherein, the initial pose is the pose of the mobile robot at the initial moment, and the initial moment is the moment when the target map is started to be updated.

[0131] The determination module 63 is used to determine the target 3D bounding box of the semantic target based on the target location of the semantic target.

[0132] The map management module 64 is used to update the target map based on the target 3D bounding box of the semantic target.

[0133] For example, when the map management module 64 updates the target map based on the target 3D bounding box of the semantic target, it specifically performs the following: if the target map does not exist at the current time, a target map is created, and a map 3D bounding box is added to the target map based on the target 3D bounding box; or, if the target map already exists at the current time, and the distance between the target 3D bounding box and each map 3D bounding box on the target map is not less than a first threshold, a map 3D bounding box is added to the target map based on the target 3D bounding box; and the primary and secondary attributes of the semantic target are added to the map 3D bounding box; wherein, when the original image is input to the target detection model, the target detection model outputs the primary attribute of the semantic target, and the primary attribute represents the category of the semantic target; the original image is input to the image retrieval model to obtain the secondary attribute of the semantic target, and the secondary attribute represents the detailed information of the semantic target, and the secondary attributes are different for different semantic targets.

[0134] For example, when the map management module 64 updates the target map based on the target 3D bounding box of the semantic target, it is specifically used to: if the target map already exists at the current time, and the distance between the target 3D bounding box and the first map 3D bounding box on the target map is less than a first threshold, then determine a second map 3D bounding box based on the target 3D bounding box and the first map 3D bounding box; the second map 3D bounding box is the target 3D bounding box, or the first map 3D bounding box, or a fused 3D bounding box of the target 3D bounding box and the first map 3D bounding box; and replace the first map 3D bounding box on the target map with the second map 3D bounding box.

[0135] For example, when the map management module 64 determines the second map 3D frame based on the target 3D frame and the first map 3D frame, it specifically performs the following steps: determining the size error between the size of the target 3D frame and the size of the first map 3D frame; if the size error is not greater than a second threshold, fusing the point cloud data corresponding to the target 3D frame and the point cloud data corresponding to the first map 3D frame to obtain fused point cloud data; generating a fused 3D frame based on the fused point cloud data; if the size of the fused 3D frame matches the size of the first map 3D frame, determining the fused 3D frame as the second map 3D frame; if the size of the fused 3D frame does not match the size of the first map 3D frame, projecting the target 3D frame onto the image coordinate system to obtain a target 2D frame, and projecting the first map 3D frame onto the image coordinate system to obtain a first map 2D frame; if the distance between the target 2D frame and the image center is less than the distance between the first map 2D frame and the image center, determining the target 3D frame as the second map 3D frame; if the distance between the target 2D frame and the image center is greater than the distance between the first map 2D frame and the image center, determining the first map 3D frame as the second map 3D frame.

[0136] For example, when the map management module 64 determines the second map 3D frame based on the target 3D frame and the first map 3D frame, it specifically performs the following steps: determining the size error between the size of the target 3D frame and the size of the first map 3D frame; if the size error is greater than a second threshold, obtaining point cloud data corresponding to the overlapping area between the target 3D frame and the first map 3D frame; projecting the target 3D frame onto the image coordinate system to obtain a target 2D frame, and projecting the first map 3D frame onto the image coordinate system to obtain a first map 2D frame; if the distance between the target 2D frame and the image center is less than the distance between the first map 2D frame and the image center, fusing the point cloud data corresponding to the target 3D frame and the point cloud data corresponding to the overlapping area to obtain fused point cloud data; if the distance between the target 2D frame and the image center is greater than the distance between the first map 2D frame and the image center, fusing the point cloud data corresponding to the first map 3D frame and the point cloud data corresponding to the overlapping area to obtain fused point cloud data; and determining the fused 3D frame corresponding to the fused point cloud data as the second map 3D frame.

[0137] For example, the map management module 64 is further configured to: if a third map 3D frame and a fourth map 3D frame are found on the target map, determine a fifth map 3D frame based on the third map 3D frame and the fourth map 3D frame; wherein the first-level attributes of the third map 3D frame are the same as the first-level attributes of the fourth map 3D frame, the second-level attributes of the third map 3D frame are the same as the second-level attributes of the fourth map 3D frame, and the distance between the third map 3D frame and the fourth map 3D frame is not less than the first threshold and less than the third threshold; if the fifth map 3D frame is the third map 3D frame, delete the fourth map 3D frame from the target map; or, if the fifth map 3D frame is the fourth map 3D frame, delete the third map 3D frame from the target map.

[0138] For example, when the map management module 64 determines the fifth map 3D frame based on the third map 3D frame and the fourth map 3D frame, it specifically performs the following steps: projecting the third map 3D frame onto the image coordinate system to obtain a third map 2D frame, and projecting the fourth map 3D frame onto the image coordinate system to obtain a fourth map 2D frame; if the distance between the third map 2D frame and the image center is less than the distance between the fourth map 2D frame and the image center, then the third map 3D frame is determined as the fifth map 3D frame; if the distance between the third map 2D frame and the image center is greater than the distance between the fourth map 2D frame and the image center, then the fourth map 3D frame is determined as the fifth map 3D frame.

[0139] Based on the same application concept as the above method, this application proposes an electronic device, such as a mobile robot, as shown in FIG7. The electronic device includes a processor 71 and a machine-readable storage medium 72. The machine-readable storage medium 72 stores machine-executable instructions that can be executed by the processor 71, such as map update instructions. The processor 71 is used to execute the machine-executable instructions to implement the map update method disclosed in the above embodiments of this application.

[0140] Based on the same concept as the above method, this application also provides a machine-readable storage medium storing a plurality of computer instructions, which, when executed by a processor, can implement the map update method disclosed in the above embodiments of this application.

[0141] The aforementioned machine-readable storage medium can be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, etc. For example, machine-readable storage media can be: random access memory (RAM), volatile memory, non-volatile memory, flash memory, storage drives (such as hard disk drives), solid-state drives, any type of storage disk (such as optical discs, DVDs, etc.), or similar storage media, or combinations thereof.

[0142] Based on the same application concept as the above method, this application embodiment also provides a computer program product, which may include a computer program that, when executed by a processor, can implement the map update method disclosed in the above embodiments of this application.

[0143] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0144] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. A map update method characterized by comprising: Applied to mobile robots, the method includes: Obtain an original image and a depth image corresponding to the original image, wherein the original image includes information for obtaining semantic targets; The current pose of the mobile robot is determined based on the original image and the depth image, wherein the current pose is the pose of the mobile robot at the current moment; The original image is input into the target detection model to obtain the initial position of the semantic target in the current pose of the mobile robot; Based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot, the initial position is converted into the target position of the semantic target under the initial pose of the mobile robot; wherein, the initial pose is the pose of the mobile robot at the initial moment, and the initial moment is the moment when the target map is started to be updated. The target 3D bounding box of the semantic target is determined based on the target location of the semantic target; The target map is updated based on the target 3D bounding box of the semantic target.

2. The method according to claim 1, characterized in that, The updating of the target map based on the target 3D bounding box of the semantic target includes: If the target map does not exist at the current time, then the target map is created, and a target map 3D bounding box is added to the target map based on the target 3D bounding box, wherein the target map 3D bounding box is the map 3D bounding box corresponding to the semantic target; or... If the target map already exists at the current time, and the distance between the target 3D bounding box and each map 3D bounding box on the target map is not less than a first threshold, then the target map 3D bounding box is added to the target map based on the target 3D bounding box; Add the first-level and second-level attributes of the semantic target to the target map 3D bounding box; When the original image is input into the object detection model, the object detection model outputs the first-level attribute of the semantic target, and the first-level attribute represents the category of the semantic target; The original image is input into the image retrieval model to obtain the secondary attributes of the semantic target. The secondary attributes represent the detailed information of the semantic target, and different semantic targets have different secondary attributes.

3. The method according to claim 1 or 2, characterized in that, The updating of the target map based on the target 3D bounding box of the semantic target includes: If the target map already exists at the current time, and the distance between the target 3D bounding box and the first map 3D bounding box on the target map is less than a first threshold, then a second map 3D bounding box is determined based on the target 3D bounding box and the first map 3D bounding box; wherein, the second map 3D bounding box is the target 3D bounding box, or the first map 3D bounding box, or a fused 3D bounding box of the target 3D bounding box and the first map 3D bounding box; The first map 3D frame on the target map is replaced by the second map 3D frame.

4. The method according to claim 3, characterized in that, Determining the second map 3D bounding box based on the target 3D bounding box and the first map 3D bounding box includes: Determine the dimensional error between the size of the target 3D bounding box and the size of the first map 3D bounding box; If the size error is not greater than the second threshold, the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the first map 3D bounding box are fused to obtain fused point cloud data; the fused 3D bounding box is generated based on the fused point cloud data; if the size of the fused 3D bounding box matches the size of the first map 3D bounding box, the fused 3D bounding box is determined as the second map 3D bounding box. If the size of the fused 3D bounding box does not match the size of the first map 3D bounding box, then the target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box; If the distance between the target 2D bounding box and the center of the image is less than or equal to the distance between the first map 2D bounding box and the center of the image, then the target 3D bounding box is determined as the second map 3D bounding box; If the distance between the target 2D bounding box and the center of the image is greater than the distance between the first map 2D bounding box and the center of the image, then the first map 3D bounding box is determined as the second map 3D bounding box.

5. The method according to claim 3 or 4, characterized in that, Determining the second map 3D bounding box based on the target 3D bounding box and the first map 3D bounding box includes: Determine the dimensional error between the size of the target 3D bounding box and the size of the first map 3D bounding box; If the size error is greater than the second threshold, then the point cloud data corresponding to the overlapping area between the target 3D bounding box and the first map 3D bounding box is obtained; The target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box; If the distance between the target 2D bounding box and the image center is less than or equal to the distance between the first map 2D bounding box and the image center, then the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the overlapping area are fused to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, then the point cloud data corresponding to the first map 3D bounding box and the point cloud data corresponding to the overlapping area are fused to obtain fused point cloud data. The fused 3D bounding box corresponding to the fused point cloud data is determined as the second map 3D bounding box.

6. The method according to any one of claims 1 to 5, characterized in that, After updating the target map based on the target 3D bounding box of the semantic target, the method further includes: If a third map 3D bounding box and a fourth map 3D bounding box are found on the target map, a fifth map 3D bounding box is determined based on the third map 3D bounding box and the fourth map 3D bounding box; wherein the first-level attributes of the third map 3D bounding box and the first-level attributes of the fourth map 3D bounding box are the same, the second-level attributes of the third map 3D bounding box and the second-level attributes of the fourth map 3D bounding box are the same, and the distance between the third map 3D bounding box and the fourth map 3D bounding box is not less than a first threshold and less than a third threshold; If the fifth map 3D bounding box is the third map 3D bounding box, then delete the fourth map 3D bounding box from the target map; or, If the fifth map 3D bounding box is the fourth map 3D bounding box, then the third map 3D bounding box is deleted from the target map.

7. The method according to claim 6, characterized in that, Determining the fifth map 3D frame based on the third and fourth map 3D frames includes: The third map 3D frame is projected onto the image coordinate system to obtain the third map 2D frame, and the fourth map 3D frame is projected onto the image coordinate system to obtain the fourth map 2D frame; If the distance between the third map 2D frame and the center of the image is less than or equal to the distance between the fourth map 2D frame and the center of the image, then the third map 3D frame is determined as the fifth map 3D frame; If the distance between the third map 2D frame and the center of the image is greater than the distance between the fourth map 2D frame and the center of the image, then the fourth map 3D frame is determined as the fifth map 3D frame.

8. A map updating device, characterized in that, The device, used in mobile robots, includes: An acquisition module is used to acquire an original image and a depth image corresponding to the original image, wherein the original image includes information for obtaining semantic targets; and to determine the current pose of the mobile robot based on the original image and the depth image, wherein the current pose is the pose of the mobile robot at the current moment. The processing module is used to input the original image into the target detection model to obtain the initial position of the semantic target under the current pose of the mobile robot; based on the conversion relationship between the current pose of the mobile robot and the stored initial pose of the mobile robot, the initial position is converted into the target position of the semantic target under the initial pose of the mobile robot; wherein, the initial pose is the pose of the mobile robot at the initial moment, and the initial moment is the moment when the target map is started to be updated; A determination module is used to determine the target 3D bounding box of the semantic target based on the target location of the semantic target; The map management module is used to update the target map based on the target 3D bounding box of the semantic target.

9. The apparatus according to claim 8, characterized in that, When the map management module updates the target map based on the target's 3D bounding box of the semantic target, it is further used for: If the target map does not exist at the current time, the target map is created, and a target map 3D bounding box is added to the target map based on the target 3D bounding box. The target map 3D bounding box is a map 3D bounding box corresponding to the semantic target. Alternatively, if the target map already exists at the current time, and the distance between the target 3D bounding box and each map 3D bounding box on the target map is not less than a first threshold, then the target map 3D bounding box is added to the target map based on the target 3D bounding box; Add the first-level and second-level attributes of the semantic target to the target map 3D bounding box; Specifically, when the original image is input to the target detection model, the target detection model outputs the first-level attribute of the semantic target, which represents the category of the semantic target; when the original image is input to the image retrieval model, the second-level attribute of the semantic target is obtained, which represents the detailed information of the semantic target, and different semantic targets have different second-level attributes.

10. The apparatus according to claim 8 or 9, characterized in that, When the map management module updates the target map based on the target's 3D bounding box of the semantic target, it is further used for: If the target map already exists at the current time, and the distance between the target 3D bounding box and the first map 3D bounding box on the target map is less than a first threshold, then a second map 3D bounding box is determined based on the target 3D bounding box and the first map 3D bounding box; The second map 3D frame is the target 3D frame, or the first map 3D frame, or a fused 3D frame of the target 3D frame and the first map 3D frame; The first map 3D frame on the target map is replaced by the second map 3D frame.

11. The apparatus according to claim 10, characterized in that, When the map management module determines the second map 3D frame based on the target 3D frame and the first map 3D frame, it is further used for: Determine the size error between the size of the target 3D bounding box and the size of the first map 3D bounding box; if the size error is not greater than a second threshold, fuse the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the first map 3D bounding box to obtain fused point cloud data; generate the fused 3D bounding box based on the fused point cloud data; If the size of the fused 3D frame matches the size of the first map 3D frame, then the fused 3D frame is determined as the second map 3D frame; If the size of the fused 3D bounding box does not match the size of the first map 3D bounding box, then the target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box. If the distance between the target 2D bounding box and the center of the image is less than or equal to the distance between the first map 2D bounding box and the center of the image, the target 3D bounding box is determined as the second map 3D bounding box; If the distance between the target 2D bounding box and the center of the image is greater than the distance between the first map 2D bounding box and the center of the image, the first map 3D bounding box is determined as the second map 3D bounding box.

12. The apparatus according to claim 10 or 11, characterized in that, When the map management module determines the second map 3D frame based on the target 3D frame and the first map 3D frame, it is further used for: Determine the dimensional error between the size of the target 3D bounding box and the size of the first map 3D bounding box; If the size error is greater than the second threshold, then the point cloud data corresponding to the overlapping area between the target 3D bounding box and the first map 3D bounding box is obtained; The target 3D bounding box is projected onto the image coordinate system to obtain the target 2D bounding box, and the first map 3D bounding box is projected onto the image coordinate system to obtain the first map 2D bounding box; If the distance between the target 2D bounding box and the image center is less than or equal to the distance between the first map 2D bounding box and the image center, then the point cloud data corresponding to the target 3D bounding box and the point cloud data corresponding to the overlapping area are fused to obtain fused point cloud data; if the distance between the target 2D bounding box and the image center is greater than the distance between the first map 2D bounding box and the image center, then the point cloud data corresponding to the first map 3D bounding box and the point cloud data corresponding to the overlapping area are fused to obtain fused point cloud data. The fused 3D bounding box corresponding to the fused point cloud data is determined as the second map 3D bounding box.

13. The apparatus according to any one of claims 8 to 12, characterized in that, The map management module is also used for: If a third map 3D bounding box and a fourth map 3D bounding box are found on the target map, a fifth map 3D bounding box is determined based on the third map 3D bounding box and the fourth map 3D bounding box; wherein the first-level attributes of the third map 3D bounding box and the first-level attributes of the fourth map 3D bounding box are the same, the second-level attributes of the third map 3D bounding box and the second-level attributes of the fourth map 3D bounding box are the same, and the distance between the third map 3D bounding box and the fourth map 3D bounding box is not less than a first threshold and less than a third threshold; If the fifth map 3D bounding box is the third map 3D bounding box, then delete the fourth map 3D bounding box from the target map; or, If the fifth map 3D bounding box is the fourth map 3D bounding box, then the third map 3D bounding box is deleted from the target map.

14. The apparatus according to claim 13, characterized in that, When the map management module determines the fifth map 3D frame based on the third and fourth map 3D frames, it is further used for: The third map 3D frame is projected onto the image coordinate system to obtain the third map 2D frame, and the fourth map 3D frame is projected onto the image coordinate system to obtain the fourth map 2D frame; If the distance between the third map 2D frame and the center of the image is less than or equal to the distance between the fourth map 2D frame and the center of the image, then the third map 3D frame is determined as the fifth map 3D frame; If the distance between the third map 2D frame and the center of the image is greater than the distance between the fourth map 2D frame and the center of the image, then the fourth map 3D frame is determined as the fifth map 3D frame.

15. A mobile robot, characterized in that, include: A processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor; The processor is configured to execute the machine-executable instructions to implement the method of any one of claims 1 to 7.

16. A machine-readable storage medium storing machine-executable instructions executable by a processor; wherein, The processor is configured to execute the machine-executable instructions to implement the method of any one of claims 1 to 7.