Robot, positioning information determination method and device, and storage medium

By fusing visual odometry using direct and feature-based methods in a robot, key points and feature points in environmental images are extracted to construct a target map. This solves the problem of inaccurate localization of the robot in environments with weak textures, repetitive textures, and strong lighting variations, achieving higher localization accuracy and robustness.

CN116030340BActive Publication Date: 2026-06-26SHENZHEN PUDU TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN PUDU TECH CO LTD
Filing Date
2021-10-26
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Robots using vision-based localization methods are not accurate enough in their positioning within the environment, especially in environments with weak textures, repetitive textures, and strong lighting variations.

Method used

Visual odometry, which combines the direct method and the feature method, extracts key points and feature points from environmental images by equipping the robot with sensors. It integrates the advantages of the direct method and the feature method to construct a target map to improve positioning accuracy.

Benefits of technology

In environments with weak textures, repetitive textures, and strong lighting variations, the robot's localization accuracy and robustness are significantly improved, ensuring that the robot can move autonomously and build accurate maps in these environments.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116030340B_ABST
    Figure CN116030340B_ABST
Patent Text Reader

Abstract

The application relates to a robot, a positioning information determination method and device, and a storage medium. The robot is provided with a sensor, and comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor. When the processor executes the computer program, the following steps are implemented: key points and feature points of each environment image are extracted according to at least one environment image collected by the sensor; initial pose information of the robot corresponding to each environment image is determined according to the key points and feature points of each environment image; a target map is constructed according to each initial pose information and a key frame image; and pose information of the robot corresponding to the target map is determined based on the target map. In this way, when the robot moves in a weak-texture environment or an environment with no obvious light change, effective points can be extracted from the environment image to realize positioning and mapping, and the positioning precision is improved.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This application relates to the field of mobile device positioning technology, and in particular to a robot, a method for determining positioning information, an apparatus, and a storage medium. Background Technology

[0002] Visual localization is the process of obtaining a stable and accurate camera pose by using visual information and a constructed visual map. For example, a robot obtains its current camera pose by matching image information with feature map information.

[0003] Visual odometry (VO) is a commonly used method for visual localization. Its goal is to estimate the robot's motion based on captured images. Specifically, it is a method of incrementally acquiring the robot's pose by analyzing a series of image sequences to determine the robot's orientation and attitude, that is, to estimate the changes in the robot's position and attitude over time.

[0004] However, the VO-based visual localization method for robots suffers from insufficient localization accuracy when performing visual localization in the environment. Summary of the Invention

[0005] Therefore, it is necessary to provide a robot, a method for determining positioning information, a device, and a storage medium that can improve the accuracy of visual positioning in response to the above-mentioned technical problems.

[0006] In a first aspect, a robot is provided, which is equipped with sensors. The robot includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it performs the following steps:

[0007] Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image. Key points are pixels in each environmental image whose gradient change value relative to the surrounding pixels is greater than a preset value, and feature points are proper subsets of key points.

[0008] Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image;

[0009] A target map is constructed based on each initial pose information and keyframe image; the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0010] Based on the target map, determine the robot's pose information corresponding to the target map.

[0011] In one embodiment, the initial pose information of the robot in each environmental image is determined based on the key points and feature points of each environmental image, including:

[0012] Based on the key points of each environmental image, obtain the first pose information of the robot for each environmental image; based on the feature points of each environmental image, obtain the second pose information of the robot for each environmental image.

[0013] Based on the first pose information and the second pose information, the initial pose information of the robot for each environmental image is determined.

[0014] In one embodiment, the initial pose information of the robot corresponding to each environmental image is determined based on the first pose information and the second pose information, including:

[0015] The first and second pose information are fused to determine the initial pose information of the robot for each environmental image.

[0016] In one embodiment, environmental metrics include map point visibility, baseline length, and illumination variation.

[0017] Before constructing the target map based on the initial pose information and keyframe images, the method also includes:

[0018] Obtain the map point visibility, baseline length, and illumination variation of each environmental image;

[0019] Keyframe images are selected from each environmental image based on the visibility of map points, baseline length, and illumination variation.

[0020] In one embodiment, keyframe images are selected from each environmental image based on the map point visibility, baseline length, and illumination variation of each environmental image, including:

[0021] Obtain visibility weights, baseline weights, and illumination weights;

[0022] The weighted sum of environmental indicators is obtained based on the visibility and visibility weight of map points in each environmental image, the baseline length and baseline weight of each environmental image, and the illumination variation and illumination weight of each environmental image.

[0023] Environmental images whose weighted sum of environmental indicators exceeds a preset value are identified as keyframe images.

[0024] In one embodiment, constructing a target map based on initial pose information and keyframe images includes:

[0025] Insert keyframe images sequentially into a preset time window;

[0026] Delete the keyframe image with the largest parallax from the time window to obtain the first keyframe image;

[0027] The second keyframe image is obtained from the keyframe images that have a co-view relationship in the first keyframe image;

[0028] The target map is obtained by updating the map points in the current target map based on the second keyframe image.

[0029] In one embodiment, obtaining a second keyframe image based on keyframe images with co-view relationships in the first keyframe image includes:

[0030] Obtain the number of projection points for each keyframe image with a co-view relationship;

[0031] The keyframe image with the largest number of projection points is determined as the second keyframe image.

[0032] Secondly, a method for determining location information is provided, the method comprising:

[0033] Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points;

[0034] Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image;

[0035] A target map is constructed based on each initial pose information and keyframe image; the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0036] Based on the target map, determine the robot's pose information corresponding to the target map.

[0037] Thirdly, a location information determining device is provided, the device comprising:

[0038] The feature acquisition module is used to extract key points and feature points from at least one environmental image acquired by the sensor. Key points are pixels in each environmental image whose gradient change value relative to the surrounding pixels is greater than a preset value, and feature points are a proper subset of the key points.

[0039] The first pose determination module is used to determine the robot's initial pose information for each environmental image based on the key points and feature points of each environmental image.

[0040] The mapping module is used to construct a target map based on the initial pose information and keyframe images; the keyframe images are images in each environmental image that meet the preset conditions for environmental indicators.

[0041] The second pose determination module is used to determine the pose information of the robot corresponding to the target map based on the target map.

[0042] Fourthly, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the positioning information determination method provided in the second aspect above.

[0043] Fifthly, a readable storage medium is provided, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the positioning information determination method provided in the second aspect are implemented.

[0044] The aforementioned robot, positioning information determination method, device, and storage medium include a robot equipped with sensors, a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it performs the following steps: extracting key points and feature points from at least one environmental image acquired by the sensors; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of the key points; determining the robot's initial pose information corresponding to each environmental image based on the key points and feature points; constructing a target map based on the initial pose information and keyframe images; keyframe images are images in each environmental image whose environmental indicators meet preset conditions; and determining the robot's pose information corresponding to the target map based on the target map. In other words, when the robot moves in an environment with weak or repetitive textures, it is difficult to obtain effective feature points from the acquired environmental images. For environmental images acquired in such environments, key points are extracted from each environmental image based on the gradient change values ​​of each pixel relative to its surrounding pixels. When the robot moves in an environment with strong lighting changes, it is also difficult to obtain effective key points from the acquired environmental images. For environmental images acquired in such environments, feature points are further extracted from each environmental image based on a proper subset of the key points. Thus, based on the key points and feature points extracted from each environmental image, the initial pose information of the robot when acquiring each environmental image can be accurately determined. The target map constructed based on the initial pose information and multiple frames of environmental images has higher accuracy. In addition, after constructing the target map, this application further determines the robot's pose information corresponding to the target map, enabling the robot to move more autonomously in the environment based on the target map and its own pose information relative to the target map. Attached Figure Description

[0045] Figure 1 This is an application environment diagram of a location information determination method in one embodiment;

[0046] Figure 2 This is a flowchart illustrating a method for determining location information in one embodiment;

[0047] Figure 3This is a schematic diagram of some weak and repetitive texture scenes in one embodiment;

[0048] Figure 4 This is a flowchart illustrating the location information determination method in another embodiment;

[0049] Figure 5 This is a schematic diagram of the front-end localization process that integrates the direct method and the feature method in one embodiment;

[0050] Figure 6 This is a flowchart illustrating a keyframe filtering method in one embodiment;

[0051] Figure 7 This is a flowchart illustrating an environment mapping method in one embodiment;

[0052] Figure 8 This is a schematic diagram illustrating the localization of feature points in an environmental image in one embodiment.

[0053] Figure 9 This is a schematic diagram illustrating the localization of feature points in another environmental image in one embodiment.

[0054] Figure 10 This is a structural block diagram of a positioning information determination device in one embodiment;

[0055] Figure 11 This is an internal structural diagram of a computer device in one embodiment. Detailed Implementation

[0056] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0057] The process of a robot achieving autonomous movement, localizing itself, and building an environmental map can be achieved using SLAM (Simultaneous Location and Mapping) technology. Visual odometry, as the front end of a SLAM system, processes environmental images based on direct or feature-based methods to determine the robot's pose and the depth of various feature points in the image, thereby constructing a local map.

[0058] The feature-based method first extracts feature points and descriptors from the image, then performs feature point matching between images to obtain multiple paired points, and finally solves the camera pose based on these paired points. By using the extracted feature points and descriptors, the processed image exhibits good rotation invariance and good illumination stability, making this method more robust in scenarios with high real-time requirements. However, visual odometry based on the feature method can only rely on a limited number of feature points. When there is little texture or high texture repetition in the environment, and the number of extracted feature points is less than a certain threshold, this method will fail.

[0059] The direct method does not require descriptor calculation; it directly estimates the motion of the camera-mounted subject based on the pixel information of the image. It calculates the camera pose and the position of image points by minimizing the photometric error of certain pixels. This photometric error is the objective function to be minimized, and is typically determined by the errors between images. The direct method is based on a strong assumption of grayscale invariance, and when ambient lighting changes are not significant, it exhibits stronger robustness than the feature-based method in weakly textured regions. However, visual odometry based on the direct method relies on limited lighting variations; when lighting changes drastically, this method often results in localization loss.

[0060] Given that visual odometry based on the direct method and visual odometry based on the feature method suffer from poor positioning accuracy or even positioning loss in some special environments, this application integrates the direct method and the feature method, combining the advantages of both methods to make up for their respective shortcomings.

[0061] Based on this, this application provides a robot, a method for determining localization information, an apparatus, and a storage medium. The robot is equipped with sensors and includes a memory and a processor, which executes the aforementioned method for determining localization information. This method aims to integrate direct methods with feature-based methods to improve the robot's localization accuracy in environments with weak textures, repetitive textures, and strong lighting variations.

[0062] The positioning information determination method provided in this application can be applied to robots, which can be any type of terminal device capable of autonomous movement in an environment.

[0063] The internal structure of the robot is as follows: Figure 1As shown, the sensors in this internal structure are used to acquire environmental images, and the processor is used to extract feature points and key points from each environmental image, determine the robot's pose information when acquiring each environmental map, and construct a target map based on the multi-frame environmental maps acquired by the robot. The memory in this internal structure includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores the operating system, computer programs, and a database; the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database is used to store relevant data from each environmental image acquired by the robot. The network interface is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements the positioning information determination method provided in this application.

[0064] In one embodiment, a robot is provided, the robot including a memory, a processor, and a computer program stored in the memory and executable on the processor, such as... Figure 2 As shown, when a processor executes a computer program, it performs the following steps:

[0065] Step 210: Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points.

[0066] It should be noted that the sensors installed in the robot are used to collect environmental images of the surrounding environment during the robot's movement. These sensors can be monocular cameras, binocular cameras, or a combination of monocular and binocular cameras. This application embodiment does not impose any restrictions on this.

[0067] It should be understood that when a robot moves in its environment, it takes pictures of the environment from different positions and postures to acquire environmental images. The robot then constructs an environmental map based on these images. Furthermore, to achieve autonomous movement, the robot needs to determine its specific posture within the surrounding environment. Based on its posture and the constructed environmental map, it plans a movement path and moves along that path to reach the desired location.

[0068] As an example, see Figure 3 In some weak textures (such as Figure 3 The white wall in example a), repeating textures (e.g.) Figure 3 Example b: a long corridor), with strong changes in lighting (e.g., Figure 3In environments such as the ceiling (example c), it is difficult to extract effective feature points from the environmental images acquired by the robot. Feature-based visual odometry is difficult to operate in such environments. However, the direct method does not rely on feature points. As long as there are pixel gradients in the image, it can run smoothly in real time. In environments with weak or repetitive textures, even if there are very few feature points, it can still bring some constraints to the front end of the visual odometry and provide reliable data association, thereby greatly improving the robustness and accuracy of the entire SLAM system.

[0069] In practical implementation, extracting keypoints using the DSO (Direct Sparse Odometry) algorithm only requires knowledge of the grayscale gradients present in the grayscale image, making it easier than extracting feature points using the ORB (Oriented Fast and Rotated BRIEF) algorithm. Therefore, for environmental images acquired by the robot, keypoints are first extracted using DSO, and then feature points are extracted using ORB based on these keypoints. The feature points are a proper subset of the keypoints.

[0070] In one possible implementation, step 210 can be performed as follows: For each environmental image acquired by the sensor, key points are extracted from each environmental image using DSO (Discrete Segmentation), and feature points are extracted from the environmental image corresponding to the extracted key points using ORB (Organic Point Extraction). That is, the key points and feature points have a corresponding relationship within the same environmental image. The key points extracted by DSO are pixels in the same environmental image whose gradient change value relative to surrounding pixels is greater than a preset value. The feature points extracted by ORB for each environmental image are a proper subset of the key points.

[0071] It's important to note that DSO (Visual Odometry) is a sparse direct method, not a complete SLAM (Single-Loop Detection and Map Reuse). It lacks loop closure detection and map reuse capabilities. Therefore, the direct method inevitably introduces accumulated errors, which, although small, cannot be eliminated. ORB-based SLAM, on the other hand, is a pure feature-based method, where the calculation results depend entirely on feature matching.

[0072] Step 220: Determine the initial pose information of the robot for each environmental image based on the key points and feature points of each environmental image.

[0073] ORB calculates the robot's initial pose and the positions of map points in the environment image by minimizing the reprojection error; while DSO calculates the robot's initial pose and the positions of map points in the environment image by minimizing the photometric error. The photometric error is usually determined by the errors between images, rather than the geometric errors after reprojection.

[0074] DSO places data association and pose estimation in a unified nonlinear optimization problem, while ORB solves it step by step: first, it finds the association between data points by matching feature points, and then estimates the pose based on the association. These two steps are usually independent. In the second step, the reprojection error can be used to identify outliers in the data association and can also be used to correct the matching results.

[0075] Therefore, in step 220, based on the key points in the current environmental image and the continuously acquired previous environmental image, the initial pose information of the robot when acquiring the environmental image is determined by DSO; based on the matched feature points in the current environmental image and the continuously acquired previous environmental image, the initial pose information of the robot when acquiring the current environmental image is determined by ORB.

[0076] Furthermore, based on the initial pose information determined by the DSO and ORB methods, the initial pose information of the robot in each environmental image is determined.

[0077] Step 230: Construct a target map based on the initial pose information and keyframe images.

[0078] Among them, the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0079] In one possible implementation, for keypoints (feature matching is not possible because descriptors are not calculated), assuming a keypoint P1 in environment image 1, the corresponding keypoint P2 in environment image 2 is found based on the robot's initial pose estimate (theoretically, P1 and P2 are projections of the same map point onto the two images, but due to observation errors, the photometric values ​​of the two keypoints will differ). Then, the photometric error is minimized to obtain the optimal P2 point, and the matching result (P1-P2 is a matching pair) is saved. For the keypoints (including descriptors), feature matching is performed, directly obtaining the P1-P2 matching pair, and the resulting matching result.

[0080] Using the robot's initial pose information, the world coordinates of feature points are initially calculated (using only adjacent keyframes); then, a direct method map point is created using a depth filter (where the depth filter uses every ordinary image, i.e., both ordinary frames and keyframes need to be used), providing initial values ​​for subsequent optimization calculations and tracking the local map.

[0081] The world coordinates of the feature points obtained above are used to provide initial values ​​for reprojection error optimization. A depth filter is used to obtain the depth of the keypoints, acquiring the coordinates and depth of each feature point in the robot coordinate system, thus calculating the 3D coordinates of the keypoints in the robot coordinate system. Furthermore, by observing the robot's pose changes, the 3D coordinates of the keypoints in the world coordinate system can be deduced, and the 3D coordinates of the keypoints are optimized using photometric error.

[0082] In another possible implementation, for feature points, the Hamming distance between the BRIEF (Binary Robust Independent Elementary Features) descriptors of two adjacent keyframes is first calculated. A BRIEF is a binary descriptor whose description vector consists of many 0s and 1s, where the 0s and 1s encode the size relationship between two pixels near the keypoint (e.g., p and q): if p is smaller than q, a 1 is used; otherwise, a 0 is used. Further, FLANN (Fast Library for Approximate Nearest Neighbors) is used for feature point matching. Further, a parallel thread simultaneously calculates the homography matrix H and the fundamental matrix F, and calculates the judgment model score RH. If RH is greater than a threshold, the homography matrix H is selected; otherwise, the fundamental matrix F is selected. Based on the selected model, the robot's initial pose and the 3D coordinates of each feature point in the environmental image are estimated.

[0083] In other words, at the back end of the hybrid odometry system, a target map is constructed based on the robot's initial pose information and keyframe images.

[0084] Understandably, the target map here needs to be continuously optimized, that is, while adding map points, redundant map points that already exist in the target map are also removed, the viewing angle is expanded, and obstruction is prevented.

[0085] Step 240: Based on the target map, determine the robot's pose information corresponding to the target map.

[0086] It should be noted that the robot's pose is constantly changing during movement. After building a map based on previously collected environmental images, the robot needs to clarify its pose corresponding to the target map again, and then plan a path based on the current pose, continue to move, and collect environmental images to improve the previously built target map.

[0087] In one possible implementation, based on the target map, the robot determines its pose information corresponding to the target map by using the inverse depth of points in the latest keyframe and tracking frame.

[0088] In this embodiment, key points and feature points of each environmental image are obtained based on at least one environmental image acquired by the sensor. Key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points. Based on the key points and feature points of each environmental image, the initial pose information of the robot corresponding to each environmental image is determined. A target map is constructed based on the initial pose information and keyframe images. Keyframe images are images in each environmental image whose environmental indicators meet preset conditions. Based on the target map, the pose information of the robot corresponding to the target map is determined. That is, when the robot moves in an environment with weak or repetitive textures, it is difficult to obtain effective feature points from the acquired environmental images. For environmental images acquired in such environments, key points are extracted from each environmental image based on the gradient change value of each pixel relative to surrounding pixels. When the robot moves in an environment with strong lighting changes, it is difficult to obtain effective key points from the acquired environmental images. For environmental images acquired in such environments, feature points in each environmental image are further extracted based on a proper subset of the key points in the environmental images. Thus, based on the key points and feature points extracted from each environmental image, the initial pose information of the robot when collecting each environmental image can be accurately determined, and the target map constructed based on the initial pose information and environmental images has higher accuracy. Furthermore, after constructing the target map, this application further determines the robot's pose information corresponding to the target map, enabling the robot to move more autonomously in the environment based on the target map and its own pose information relative to the target map.

[0089] Based on the above embodiments, for environmental images acquired by the robot, the initial pose information of the robot can be obtained using ORB or DSO. Since this application uses a hybrid odometry that integrates ORB and DSO for front-end localization, it is necessary to comprehensively determine the robot's initial pose information based on the initial pose information obtained from DSO and the initial pose information obtained from ORB.

[0090] In one embodiment, such as Figure 4 As shown, the process of determining the robot's initial pose information for each environmental image (corresponding to step 220 above) based on the key points and feature points of each environmental image includes the following steps:

[0091] Step 410: Based on the key points of each environmental image, obtain the first pose information of the robot for each environmental image; based on the feature points of each environmental image, obtain the second pose information of the robot for each environmental image.

[0092] In one possible implementation, for key points in each environmental image, the robot's first pose information in each environmental image is determined by minimizing photometric error, and for feature points in each environmental image, the robot's second pose information in each environmental image is determined by minimizing reprojection error.

[0093] Step 420: Determine the initial pose information of the robot for each environmental image based on the first pose information and the second pose information.

[0094] In one possible implementation, step 420 can be performed by: performing pose fusion processing on the first pose information and the second pose information to determine the initial pose information of the robot corresponding to each of the environmental images.

[0095] If T1 is the pose obtained by DSO and T2 is the pose obtained by ORB, then pose fusion processing is performed on the first pose information and the second pose information, that is, the weighted average of T1 and T2 is taken to obtain the robot's initial pose T.

[0096] As an example, T can be determined using the following formula (1):

[0097] T = T1(T1 -1 T2) ω (1)

[0098] Here, ω is a weighting factor, which is related to the number of feature points extracted. When ω is 1, the SLAM system degenerates into ORB, and when ω is 0, the SLAM system degenerates into DSO.

[0099] It should be noted that this pose fusion method belongs to a loosely coupled strategy between DSO and ORB. A more advanced method is a tightly coupled strategy, which optimizes the photometric error of DSO and the reprojection error of ORB together. The former is simpler and easier to implement, while the latter is more accurate but more complex to implement. In practical applications, the weighting factor should be set according to the actual situation, and this application does not impose any restrictions on this.

[0100] In this embodiment, based on key points in each environmental image, the robot's first pose information for each environmental image is obtained through DSO (Displacement Positioning). Based on feature points in each environmental image, the robot's second pose information for each environmental image is obtained through ORB (Oriented Positioning Brush). Furthermore, the first and second pose information are fused to determine the robot's initial pose. Thus, by combining ORB and DSO, the determined robot pose information is more accurate.

[0101] Based on the above Figure 2 and Figure 4 For a corresponding embodiment, see Figure 5This paper describes the front-end positioning process that integrates DSO and ORB in this application.

[0102] For each frame of environmental image acquired by a sensor (e.g., a monocular camera), keypoints are extracted from the image using DSO (Discrete Signal Occlusion), and then a proper subset of the keypoints is extracted using ORB (Organic Reflection Block), yielding feature points. For the keypoints in the environmental image, pose T1 is obtained by minimizing photometric error; for the feature points in the environmental image, pose T2 is obtained by minimizing reprojection error. A weighted average of poses T1 and T2 is then performed to obtain pose T, achieving local optimization of the direct method poses T1 and T2. Furthermore, pose T is used as the robot's initial pose information.

[0103] It should be noted that pose includes the robot's coordinates and orientation information.

[0104] Furthermore, after completing front-end localization—that is, after determining its own pose—the robot needs to build a map to achieve autonomous movement. This means determining the specific locations of various map points in its environment. In other words, only by knowing the distribution of obstacles in its environment can the robot adjust its pose to move along a planned safe path and avoid collisions with obstacles.

[0105] In one possible implementation, a target map is constructed based on the robot's initial pose information and keyframe images. This target map can be a local map or a global map obtained through continuous post-processing and updates.

[0106] It should be noted that the environmental images collected by the robot may contain overlapping environmental information or discontinuous environmental information (for example, the environmental images collected when the robot is drifting have almost zero common viewpoints with the environmental images collected under normal circumstances). If the environmental images are not preprocessed and the map is built directly, the constructed target map will contain a lot of redundant information, the map points in the target map will have low accuracy, and it will not be able to better reflect the actual environment in which the robot is located.

[0107] When constructing a target map, ORB-based visual odometry employs a mechanism to filter redundant keyframe images during local map construction. Therefore, new keyframe images need to be inserted as quickly as possible to ensure the tracking thread is more robust to camera motion, especially rotational motion. Thus, the inserted keyframe images are selected based on the following criteria:

[0108] (1) Since the last global relocation, 20 frames of environmental images have been collected;

[0109] (2) The local map construction is in an idle state, or more than 20 environmental images have been collected since the previous keyframe image was inserted;

[0110] (3) The current environmental image tracks fewer than 50 map points;

[0111] (4) The number of map points tracked in the current environment image is less than 90% of the number of map points observed in the reference keyframe.

[0112] However, the localization front end of this application achieves robot localization by fusing DSO and ORB, which involves the impact of changes in luminance in the environmental image on DSO. The keyframe image selection strategy applied to ORB mentioned above is no longer applicable, and a new keyframe image selection strategy needs to be developed.

[0113] In one embodiment, this application provides a keyframe filtering method, which can be applied to step 230 above to filter keyframe images from multiple environmental images collected by the robot, and use the filtered keyframe images to construct a target map, instead of all environmental images collected by the robot.

[0114] Keyframe images are those in which environmental metrics meet preset conditions. As an example, environmental metrics include map point visibility, baseline length, and illumination variation.

[0115] In one possible implementation, such as Figure 6 As shown, the keyframe filtering method includes the following steps:

[0116] Step 610: Obtain the map point visibility, baseline length, and illumination variation of each environmental image;

[0117] The visibility of map points reflects the change in the number of visible points in the latest keyframe image relative to the tracking frame image.

[0118] As an example, map point visibility can be determined using the following formula (2):

[0119]

[0120] Where N is the total number of visible points in the latest keyframe, p z p z '' represents the inverse point depth of the latest keyframe and the tracking frame, respectively. The inverse point depth is the reciprocal of the depth, and its function is to create more keyframe images when the camera moves closer to an object.

[0121] Wherein, the baseline length of the environmental image is the inter-frame displacement t and the average inverse depth on the tracking local map. The ratio is such that its effect is equivalent to a translation distance that is too large to be seen from the line of sight;

[0122] As an example, the baseline length of an environmental image can be determined using the following formula (3):

[0123]

[0124] Among them, the illumination change of the environmental image is used to reflect the illumination change of the latest keyframe image relative to the tracking frame image.

[0125] As an example, the amount of illumination change in an environmental image can be determined using the following formula (4):

[0126] S a =|a k -a i | (4)

[0127] Among them, a k a represents the illumination of the tracked frame image. i This indicates the illumination level of the latest keyframe image.

[0128] Step 620: Select keyframe images from each environmental image based on the map point visibility, baseline length, and illumination variation.

[0129] In one possible implementation, step 620 can be implemented as follows: obtain visibility weight, baseline weight, and illumination weight; obtain a weighted sum of environmental indicators based on the map point visibility and visibility weight of each environmental image, the baseline length and baseline weight of each environmental image, and the illumination change and illumination weight of each environmental image; and determine the environmental images whose weighted sum of environmental indicators is greater than a preset value as keyframe images.

[0130] As an example, whether an environment image is a keyframe image can be determined by the following formula (5).

[0131] w u S u +w a S a +w t S t >1 (5)

[0132] Among them, w u w is the visibility weight. a For the baseline weights, w t The illumination weights, visibility weights, baseline weights, and illumination weights can be preset to values ​​less than 1. The specific values ​​in this application embodiment are not limited.

[0133] In this embodiment, target frame images are selected from the environmental images collected by the robot by using the map point visibility and visibility weight of each environmental image, the baseline length and baseline weight of each environmental image, and the illumination variation and illumination weight of each environmental image. The target frame images are then used for mapping, which improves the mapping accuracy while reducing the amount of data processing in the mapping process, resulting in higher mapping efficiency.

[0134] Based on the above embodiments, after the hybrid visual odometry front end that integrates DSO and ORB determines the robot's initial pose information and constructs the target map, it is not only necessary to determine the pose information of the robot corresponding to the target map, but also to update the constructed target map during the robot's movement, adding new map points while removing the original redundant information.

[0135] In ORB-based visual odometry, the co-view graph is an undirected weighted graph where each node is a keyframe. If two keyframes satisfy a certain co-view relationship (at least 15 jointly observed map points), they are connected by an edge, and the weight of the edge is the number of co-view map points. The co-view strategy is very important and can be considered one of the core algorithms of ORB-based SLAM systems.

[0136] This application integrates DSO and ORB positioning, and the keyframe selection strategy is different from that of ORB and DSO. Therefore, neither the sliding window optimization used in DSO nor the co-view optimization used in ORB is applicable to the target map creation process of this application.

[0137] Based on this, this application establishes a combined mechanism of co-view (for ORB) and sliding window (for DSO), providing a local map co-view window, which consists of time and co-view strategy, both of which are relative to the latest keyframe.

[0138] The primary purpose of the time window is to optimize map points in newly added keyframes and ensure the accuracy of the hybrid odometry. A fixed time window size is used, and keyframes with the largest parallax are removed from the time window. The co-view window consists of keyframe images that co-view with the time window. Map points are optimized by using keyframes with map point projections in the blank areas of the latest keyframes to remove points larger than a certain viewing angle, thus preventing occlusion and other issues.

[0139] As an example, firstly, multiple time-series keyframe images are acquired from the environment images in chronological order. Then, based on the co-view relationships between the environment images and the multiple time-series keyframe images, multiple co-view keyframe images are further selected from the environment images. Further, these co-view keyframes are inserted into the multiple time-series keyframe images. After the time-series keyframe images and the co-view keyframe images are arranged in a new order, the keyframe images that constitute the aforementioned local map co-view window are obtained.

[0140] Thus, based on the local map shared view window, the selected keyframe images are added sequentially to the local map shared view window, and then the target map is constructed based on the keyframe images in the local map shared view window.

[0141] Based on the aforementioned local map shared view window, in one embodiment, such as Figure 7 As shown, the process of constructing the target map based on the initial pose information and keyframe images (step 230 above) includes the following steps:

[0142] Step 710: Insert the keyframe images sequentially into the preset time window.

[0143] The size of the time window is preset. For example, when the preset time window size is 10, when the keyframe images selected from the front end are added to the time window, while adding new keyframe images, old keyframe images with the same data will also be removed to ensure that the size of the time window remains unchanged.

[0144] It should be noted that the two previous keyframe images are always retained in this time window. Images other than these two old keyframe images can be removed from this time window to add new keyframe images. Each time a keyframe image is added, steps 710-740 need to be executed once, and this process is repeated until no new keyframe images can be added to the time window.

[0145] In this way, the accuracy of hybrid vision odometry can be guaranteed even in challenging environments (such as when feature points in environmental images collected by the robot rotate significantly), avoiding premature fixation of keyframe images and ensuring that map points in the target map are fully optimized.

[0146] Step 720: Delete the keyframe image with the largest parallax from the time window to obtain the first keyframe image.

[0147] The disparity of the keyframe image can be determined by the following formula (6):

[0148]

[0149] Where I represents the keyframe image, and subscripts 1 and 2 indicate the order of the keyframe images within the time window; N represents the size of the time window, and t has no specific meaning. t Here, j refers to the time window; j is the number of keyframe images; and d is the baseline length of the keyframe images.

[0150] As an example, if the time window is 5, the disparity of each of the 5 keyframe images is determined, and the keyframe image with the largest disparity is removed from the time window. The remaining 4 keyframe images are the first keyframe images.

[0151] Step 730: Obtain the second keyframe image based on the keyframe images with co-viewing relationships in the first keyframe image.

[0152] In one possible implementation, the number of projection points of each keyframe image with a co-view relationship is obtained, and the keyframe image with the largest number of projection points is determined as the second keyframe image.

[0153] Step 740: Update the map points in the current target map based on the second keyframe image to obtain the target map.

[0154] In this step, the map points in the current target map are updated based on the map points in the second keyframe image to obtain the target map.

[0155] Updating map points in the current target map can be done by adding new map points to the current target map to broaden the viewpoint of the target map; updating map points in the current target map can also be done by deleting redundant map points from the current target map to prevent occlusion and other issues.

[0156] In this embodiment, the keyframe images are further filtered based on the parallax and the number of projection points through the local map common viewing window provided in this application. Then, the map points in the current target map are updated based on the robot pose corresponding to the second keyframe image and the final filtered second keyframe image, so as to obtain the target map with higher mapping accuracy.

[0157] In addition, this application also provides a method for determining location information, the method comprising:

[0158] Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points;

[0159] Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image;

[0160] A target map is constructed based on each initial pose information and keyframe image; the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0161] Based on the target map, determine the robot's pose information corresponding to the target map.

[0162] The positioning information determination method provided in this embodiment is similar in principle and technical effect to the steps executed by the processor in the robot described above when implementing the above steps, and will not be repeated here.

[0163] It should be understood that although the steps in the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps.

[0164] Based on the positioning information determination method provided in this application, the data type input through the visual SLAM system is an image. Data acquired from the real world is also real-time environmental images obtained through a camera. At this level, real-world experiments and dataset simulation experiments are essentially the same. Furthermore, publicly available datasets are generally collected by professionals using professional cameras, taking into account various factors more comprehensively. Therefore, this application also uses publicly available datasets for simulation experiments to replace real-world experiments, verifying the effectiveness of the technical solution provided in this application.

[0165] Tested on the Euroc MH 01-04 dataset, the ORB-based visual odometry works correctly on the MH 01-03 dataset, but on the MH 04 difficulty dataset, in a repetitive texture scene, such as... Figure 8 There is a possibility of location loss, meaning that effective feature points cannot be extracted. For example... Figure 9 As shown, the hybrid visual odometry that integrates ORB and DSO in this application can still detect some key points, allowing the system to continue working.

[0166] It should be noted that the ORB-based SLAM system has a three-thread structure, including a tracking thread, a mapping thread, and a loop closure detection thread. The tracking and mapping threads are part of the front-end. To ensure fairness, the loop closure detection thread was disabled in the ORB-based SLAM system. The hybrid visual odometry proposed in this application was compared with the ORB-based visual odometry after adding the DSO front-end. The experimental results are shown in Table 1, comparing the median absolute trajectory error (ATE) for different VOs. The average of ten experimental results was used.

[0167] Table 1

[0168]

[0169] The absolute trajectory error is the direct difference between the estimated pose and the true pose, and it can intuitively reflect the algorithm's accuracy and global trajectory consistency. The median error, compared to the average error, more accurately reflects the overall system performance because averaging the errors masks some excessively large values.

[0170] As shown in Table 1, the hybrid visual odometry that integrates DSO and ORB does not exhibit an increase in median error compared to ORB-based visual odometry. In other words, the hybrid visual odometry that integrates DSO and ORB in this application, while maintaining the positioning accuracy of the original SLAM system, can adapt to more scenarios, especially weakly textured scenarios, such as... Figure 8 and 9 As shown, in scenarios where ORB-based visual odometry fails to achieve effective localization and localization is lost, the hybrid visual odometry that integrates DSO and ORB can still function.

[0171] In one embodiment, such as Figure 10 As shown, a positioning information determination device 1000 is provided, comprising: a feature acquisition module 1010, a first pose determination module 1020, a mapping module 1030, and a second pose determination module 1040, wherein:

[0172] The feature acquisition module 1010 is used to extract key points and feature points of each environmental image based on at least one environmental image acquired by the sensor; key points are pixels in each environmental image whose gradient change value relative to the surrounding pixels is greater than a preset value, and feature points are a proper subset of the key points;

[0173] The first pose determination module 1020 is used to determine the initial pose information of the robot for each environmental image based on the key points and feature points of each environmental image.

[0174] The mapping module 1030 is used to construct a target map based on each initial pose information and keyframe image; the keyframe image is an image in each environmental image that meets preset conditions for environmental indicators.

[0175] The second pose determination module 1040 is used to determine the pose information of the robot corresponding to the target map based on the target map.

[0176] In one embodiment, the first pose determination module 1020 includes:

[0177] The first acquisition unit is used to acquire the first pose information of the robot corresponding to each environmental image based on the key points of each environmental image; and to acquire the second pose information of the robot corresponding to each environmental image based on the feature points of each environmental image.

[0178] The first determining unit is used to determine the initial pose information of the robot for each environmental image based on the first pose information and the second pose information.

[0179] In one embodiment, the first determining unit is further configured to:

[0180] The first and second pose information are fused to determine the initial pose information of the robot for each environmental image.

[0181] In one embodiment, environmental metrics include map point visibility, baseline length, and illumination variation.

[0182] The device 1000 also includes:

[0183] The environmental indicator acquisition module is used to acquire the map point visibility, baseline length, and illumination variation of each environmental image.

[0184] The keyframe filtering module is used to filter keyframe images from each environmental image based on the map point visibility, baseline length, and illumination variation of each environmental image.

[0185] In one embodiment, the keyframe filtering module includes:

[0186] The second acquisition unit is used to acquire visibility weight, baseline weight, and illumination weight.

[0187] The third acquisition unit is used to acquire a weighted sum of environmental indicators based on the visibility and visibility weight of map points in each environmental image, the baseline length and baseline weight of each environmental image, and the illumination change and illumination weight of each environmental image.

[0188] The second determining unit is used to determine the environmental image whose weighted sum of environmental indicators is greater than a preset value as the keyframe image.

[0189] In one embodiment, the mapping module 1030 includes:

[0190] The keyframe insertion unit is used to sequentially insert keyframe images into a preset time window;

[0191] The keyframe deletion unit is used to delete the keyframe image with the largest parallax from the time window to obtain the first keyframe image.

[0192] The keyframe co-viewing unit is used to obtain the second keyframe image based on the keyframe images with co-viewing relationships in the first keyframe image;

[0193] The map update unit is used to update the map points in the current target map based on the second keyframe image to obtain the target map.

[0194] In one embodiment, the keyframe co-viewing unit is further configured to:

[0195] Obtain the number of projection points for each keyframe image with a co-view relationship;

[0196] The keyframe image with the largest number of projection points is determined as the second keyframe image.

[0197] Specific limitations regarding the positioning information determination device can be found in the limitations of the positioning information determination method described above, and will not be repeated here. Each module in the aforementioned positioning information determination device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the robot's processor in hardware form or independently of it, or stored in the robot's memory in software form, so that the processor can call and execute the corresponding operations of each module.

[0198] In one embodiment, a robot is provided, which can be any terminal device, and its internal structure diagram can be as follows: Figure 11As shown, the robot includes a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface allows for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. When executed by the processor, the computer program implements a positioning information determination method. The robot's display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad mounted on the robot's shell, or an external keyboard, touchpad, or mouse.

[0199] Those skilled in the art will understand that Figure 11 The structure shown is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the robot to which the present application is applied. A specific robot may include more or fewer parts than shown in the figure, or combine certain parts, or have different part arrangements.

[0200] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:

[0201] Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points;

[0202] Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image;

[0203] A target map is constructed based on each initial pose information and keyframe image; the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0204] Based on the target map, determine the robot's pose information corresponding to the target map.

[0205] The computer device provided in this embodiment implements the above steps in a similar manner to the above method embodiments, and will not be described again here.

[0206] In one embodiment, a readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, performs the following steps:

[0207] Based on at least one environmental image acquired by the sensor, extract key points and feature points from each environmental image; key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and feature points are proper subsets of key points;

[0208] Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image;

[0209] A target map is constructed based on each initial pose information and keyframe image; the keyframe image is the image in each environmental image that meets the preset conditions for environmental indicators.

[0210] Based on the target map, determine the robot's pose information corresponding to the target map.

[0211] The computer-readable storage medium provided in this embodiment implements the above steps in a similar manner to the method embodiments described above, and will not be repeated here.

[0212] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include ROM (Read-Only Memory), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include RAM (Random Access Memory) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory), etc.

[0213] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0214] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

1. A robot equipped with sensors, characterized in that, The robot includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it performs the following steps: Based on at least one environmental image acquired by the sensor, key points and feature points of each environmental image are extracted; the key points are pixels in each environmental image whose gradient change value relative to surrounding pixels is greater than a preset value, and the feature points are a proper subset of the key points; Based on the key points and feature points of each environmental image, the initial pose information of the robot corresponding to each environmental image is determined; Construct a target map based on the initial pose information and keyframe images; The keyframe image is an image in which the environmental indicators of each of the environmental images meet the preset conditions; Based on the target map, determine the robot's pose information corresponding to the target map; The step of constructing a target map based on the initial pose information and keyframe images includes: sequentially inserting the keyframe images into a preset time window; deleting the keyframe image with the largest disparity from the time window to obtain a first keyframe image; obtaining a second keyframe image based on the keyframe images with co-view relationships in the first keyframe image; and updating the map points in the current target map based on the second keyframe image to obtain the target map.

2. The robot according to claim 1, characterized in that, The step of determining the initial pose information of the robot corresponding to each of the environmental images based on the key points and feature points of each environmental image includes: Based on the key points of each environmental image, the first pose information of the robot corresponding to each environmental image is obtained; based on the feature points of each environmental image, the second pose information of the robot corresponding to each environmental image is obtained. Based on the first pose information and the second pose information, the initial pose information of the robot corresponding to each of the environmental images is determined.

3. The robot according to claim 2, characterized in that, Determining the initial pose information of the robot for each of the environmental images based on the first pose information and the second pose information includes: The first pose information and the second pose information are subjected to pose fusion processing to determine the initial pose information of the robot corresponding to each of the environmental images.

4. The robot according to any one of claims 1-3, characterized in that, The environmental metrics include map point visibility, baseline length, and illumination variation; wherein, map point visibility reflects the change in visible points in the latest keyframe image relative to the tracking frame image, and is calculated using the following formula: in, For map point visibility; This represents the total number of visible points in the latest keyframe. , These are the inverse depths of the points in the latest keyframe and the tracking frame, respectively. Before constructing the target map based on the initial pose information and keyframe images, the method further includes: Obtain the map point visibility, baseline length, and illumination variation of each of the environmental images; The keyframe images are selected from each of the environmental images based on the map point visibility, baseline length, and illumination variation of each environmental image.

5. The robot according to claim 4, characterized in that, The step of selecting the keyframe images from each of the environmental images based on the map point visibility, baseline length, and illumination variation includes: Obtain visibility weights, baseline weights, and illumination weights; Based on the map point visibility and visibility weight of each environmental image, the baseline length and baseline weight of each environmental image, and the illumination change and illumination weight of each environmental image, a weighted sum of environmental indicators is obtained. The environmental image whose weighted sum of environmental indicators is greater than a preset value is determined as the keyframe image.

6. The robot according to claim 1, characterized in that, Based on the target map, the pose information of the robot corresponding to the target map is determined, including: Based on the target map, the pose information of the robot corresponding to the target map is determined according to the inverse depth of the points in the latest keyframe and tracking frame.

7. The robot according to claim 1, characterized in that, The step of obtaining the second keyframe image based on the keyframe images with co-view relationships in the first keyframe image includes: Obtain the number of projection points of each keyframe image with co-view relationship; The keyframe image with the largest number of projection points is determined as the second keyframe image.

8. A method for determining location information, characterized in that, The method includes: Based on at least one environmental image acquired by the sensor, key points and feature points of each environmental image are extracted; the key points are pixels in each environmental image whose gradient change value relative to the surrounding pixels is greater than a preset value, and the feature points are a proper subset of the key points; Based on the key points and feature points of each environmental image, determine the robot's initial pose information for each environmental image; A target map is constructed based on the initial pose information and keyframe images, wherein the keyframe images are images in the environmental images in which the environmental indicators meet preset conditions. Based on the target map, determine the robot's pose information corresponding to the target map; The step of constructing a target map based on the initial pose information and keyframe images includes: sequentially inserting the keyframe images into a preset time window; deleting the keyframe image with the largest disparity from the time window to obtain a first keyframe image; obtaining a second keyframe image based on the keyframe images with co-view relationships in the first keyframe image; and updating the map points in the current target map based on the second keyframe image to obtain the target map.

9. A positioning information determining device, characterized in that, The device includes: The feature acquisition module is used to extract key points and feature points of each environmental image based on at least one environmental image acquired by the sensor; the key points are pixels in each environmental image whose gradient change value relative to the surrounding pixels is greater than a preset value, and the feature points are a proper subset of the key points; The first pose determination module is used to determine the robot's initial pose information corresponding to each of the environmental images based on the key points and feature points of each environmental image; The mapping module is used to construct a target map based on the initial pose information and keyframe images; the keyframe images are images in the environmental images whose environmental indicators meet preset conditions. The second pose determination module is used to determine the pose information of the robot corresponding to the target map based on the target map. The mapping module includes: A keyframe insertion unit is used to sequentially insert the keyframe images into a preset time window; A keyframe deletion unit is used to delete the keyframe image with the largest parallax from the time window to obtain the first keyframe image. A keyframe co-viewing unit is used to obtain a second keyframe image based on keyframe images that have a co-viewing relationship in the first keyframe image; The map update unit is used to update the map points in the current target map based on the second keyframe image to obtain the target map.

10. A readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the steps of the method of claim 8.