Visual positioning method, apparatus, device, and medium

By performing feature matching and coordinate point correction on image frames in the visual positioning system, the shortcomings of visual positioning and visual inertial odometry are addressed, enabling a higher-precision positioning and update mechanism and improving the user experience.

CN116704034B · Active · Publication Date: 2026-06-23 · BEIJING BAIDU NETCOM SCI & TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Filing Date
2023-06-15
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In visual positioning and augmented reality systems, visual positioning technology cannot cope with the irregularly changing scenes on the map, resulting in incorrect positioning results. Furthermore, long-term tracking by visual inertial odometry can cause trajectory drift, affecting the user experience.

Method used

Image retrieval and feature matching are performed by acquiring image frames. The visual pose is corrected using pre-set coordinate points, and the visual map is updated when conditions are met. The reprojection error of the visual map and coordinate points is combined for correction to improve positioning accuracy.

Benefits of technology

It improves the accuracy of visual positioning, avoids the problem of virtual positioning not aligning when the user returns to the initial position, and enhances the user experience.

✦ Generated by Eureka AI based on patent content.

Abstract

The disclosure provides a visual positioning method, apparatus, device, and medium. It relates to the technical field of artificial intelligence, and further to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenarios such as the metaverse and smart cities. The specific implementation scheme is as follows: an image frame used for visual positioning is acquired; image retrieval and feature matching are performed on the image frame to determine a target visual map corresponding to the image frame; a visual pose of a camera coordinate system of the image frame in a visual map coordinate system is determined according to the target visual map; and the visual pose is corrected by using a pre-set coordinate point.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, and further to the fields of computer vision, augmented reality, virtual reality, deep learning, etc., and can be applied to scenarios such as the metaverse and smart cities. Specifically, it relates to a visual positioning method, apparatus, device, and medium.

Background Technology

[0002] In Visual Positioning and Augmentation System (VPAS) tasks, visual positioning and visual-inertial odometry (VIO) are two key technologies. Visual positioning involves uploading images taken by the user's mobile phone to a server and performing 6DOF positioning within a pre-built visual map. VIO, on the other hand, continuously tracks the user's 6DOF pose on the user's mobile phone so that virtual objects can be accurately placed in augmented reality (AR) / virtual reality (VR) scenes. Typically, visual positioning runs at a lower frequency, acquiring user position information that is used to update the visual map. VIO, however, operates at a higher frequency, thus achieving continuous 6DOF pose estimation.

[0003] However, both of these key technologies have problems in VPAS tasks. For visual positioning, some scenes on the map change periodically. If a user takes a picture at these locations, the VPAS service will either fail to obtain a positioning result or will calculate an incorrect one. Therefore, the visual map needs to be updated regularly to deal with this situation. For VIO tasks, the 6DOF pose is defined in a local coordinate system, and trajectory drift occurs after long-term tracking. That is, after the user returns to the original starting point, the estimated 6DOF pose will not be completely consistent with the starting point. This causes virtual objects to be misaligned with real objects in AR / VR scenes, thus affecting the user experience.

Summary of the Invention

[0004] This disclosure provides a visual positioning method, apparatus, device, and medium.

[0005] According to a first aspect of this disclosure, a visual positioning method is provided, comprising:

[0006] Acquire image frames for visual positioning;

[0007] Image retrieval and feature matching are performed on image frames to determine the target visual map corresponding to the image frames;

[0008] Determine the visual pose of the image frame in the visual map coordinate system based on the target visual map.

[0009] The visual pose is corrected using pre-set coordinate points.

[0010] In one possible implementation, the method provided in this disclosure, which determines the visual pose of the camera coordinate system of an image frame in the visual map coordinate system based on the target visual map, includes:

[0011] Based on the camera coordinate system of the image frame, the breadth of the image frame is corrected;

[0012] Based on the target visual map and the corrected image frame, determine the visual pose of the camera coordinate system of the corrected image frame in the visual map coordinate system.

[0013] In one possible implementation, the method provided in this disclosure corrects the visual pose using pre-set coordinate points, including:

[0014] Determine the pre-set coordinate points corresponding to the image frames based on the target visual map;

[0015] Determine the position of the coordinate point within the image frame;

[0016] The visual pose is corrected based on the position and pose of the coordinate point in the image frame.

[0017] In one possible implementation, the method provided in this disclosure corrects the visual pose based on the position and pose of the coordinate point in the image frame, including:

[0018] Based on the position of the coordinate point in the image frame and the pose of the coordinate point, determine the reprojection error between the coordinate point and the visual pose;

[0019] Visual pose is corrected using reprojection error.

[0020] In one possible implementation, the method provided in this disclosure further includes:

[0021] Acquire multiple target image frames within a preset time period;

[0022] When multiple target image frames meet preset conditions, the target visual map is updated using multiple target image frames.

[0023] In one possible implementation, the method provided in this disclosure updates the target visual map using multiple target image frames when multiple target image frames meet preset conditions, including:

[0024] When multiple target image frames meet preset conditions, the target visual map is determined to be ready for updating.

[0025] An initial map is obtained by using multiple target image frames and prior knowledge of pose.

[0026] The initial map is corrected using coordinate points to obtain a visual map used to update the target visual map.

[0027] According to a second aspect of this disclosure, a visual positioning device is provided, comprising:

[0028] The acquisition unit is used to acquire image frames for visual positioning.

[0029] The first determining unit is used to perform image retrieval and feature matching on the image frame to determine the target visual map corresponding to the image frame.

[0030] The second determining unit is used to determine the visual pose of the camera coordinate system of the image frame in the visual map coordinate system based on the target visual map.

[0031] The correction unit is used to correct the visual pose using pre-set coordinate points.

[0032] In one possible implementation, the second determining unit in the apparatus provided by this disclosure is specifically used for:

[0033] Based on the camera coordinate system of the image frame, the breadth of the image frame is corrected;

[0034] Based on the target visual map and the corrected image frame, determine the visual pose of the camera coordinate system of the corrected image frame in the visual map coordinate system.

[0035] In one possible implementation, the correction unit in the apparatus provided by this disclosure is specifically used for:

[0036] Determine the pre-set coordinate points corresponding to the image frames based on the target visual map;

[0037] Determine the position of the coordinate point within the image frame;

[0038] The visual pose is corrected based on the position and pose of the coordinate point in the image frame.

[0039] In one possible implementation, the correction unit in the apparatus provided by this disclosure is further configured to:

[0040] Based on the position of the coordinate point in the image frame and the pose of the coordinate point, determine the reprojection error between the coordinate point and the visual pose;

[0041] Visual pose is corrected using reprojection error.

[0042] In one possible implementation, the apparatus provided in this disclosure further includes an updating unit for:

[0043] Acquire multiple target image frames within a preset time period;

[0044] When multiple target image frames meet preset conditions, the target visual map is updated using multiple target image frames.

[0045] In one possible implementation, the updating unit in the apparatus provided by this disclosure is specifically used for:

[0046] When multiple target image frames meet preset conditions, the target visual map is determined to be ready for updating.

[0047] An initial map is obtained by using multiple target image frames and prior knowledge of pose.

[0048] The initial map is corrected using coordinate points to obtain a visual map used to update the target visual map.

[0049] According to a third aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method described in any one of the first aspects.

[0050] According to a fourth aspect of this disclosure, a computer program product is provided, comprising a computer program / instructions that, when executed by a processor, implement the steps of the method described in any one of the first aspects.

[0051] In the embodiments of this disclosure, an image frame for visual positioning is first acquired, then image retrieval and feature matching are performed on the image frame to determine the target visual map corresponding to the image frame, then the visual pose of the camera coordinate system of the image frame in the visual map coordinate system is determined according to the target visual map, and finally the visual pose is corrected using pre-set coordinate points. By using the scheme of this disclosure, the image frame is positioned using the target visual map and the visual pose is corrected using pre-set coordinate points, resulting in higher accuracy of visual positioning and avoiding the situation where the virtual positioning does not overlap when the user moves to the initial position, thus improving the user experience.

[0052] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0053] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0054] Figure 1 is a flowchart illustrating a visual positioning method provided according to an embodiment of the present disclosure;

[0055] Figure 2 is a schematic flowchart illustrating a visual positioning method according to an embodiment of the present disclosure;

[0056] Figure 3 is a schematic diagram illustrating the specific process of updating a map in a visual positioning method according to an embodiment of this disclosure;

[0057] Figure 4 is a block diagram of a visual positioning device provided according to an embodiment of the present disclosure;

[0058] Figure 5 is a block diagram of an electronic device used to implement the visual positioning method of the embodiments of this disclosure.

Detailed Implementation

[0059] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0060] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0061] The acquisition, storage, and application of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0062] In Visual Positioning and Augmentation System (VPAS) tasks, visual positioning and visual-inertial odometry (VIO) are two key technologies. Visual positioning involves uploading images taken by the user's mobile phone to a server and performing 6DOF positioning within a pre-built visual map. VIO, on the other hand, continuously tracks the user's 6DOF pose on the user's mobile phone so that virtual objects can be accurately placed in augmented reality (AR) / virtual reality (VR) scenes. Typically, visual positioning runs at a lower frequency, acquiring user position information that is used to update the visual map. VIO, however, operates at a higher frequency, thus achieving continuous 6DOF pose estimation.

[0063] However, both of these key technologies have problems in VPAS tasks. For visual positioning, some scenes on the map change periodically. If a user takes a picture at these locations, the VPAS service will either fail to obtain a positioning result or will calculate an incorrect one. Therefore, the visual map needs to be updated regularly to deal with this situation. For VIO tasks, the 6DOF pose is defined in a local coordinate system, and trajectory drift occurs after long-term tracking. That is, after the user returns to the original starting point, the estimated 6DOF pose will not be completely consistent with the starting point. This causes virtual objects to be misaligned with real objects in AR / VR scenes, thus affecting the user experience.

[0064] Before providing a detailed description of the visual positioning method provided in this disclosure, let's first introduce the key terms involved, including:

[0065] VPAS: Visual Positioning and Augmentation System. It is a technology that combines a CPU, a camera, and deep learning algorithms to provide users with accurate indoor location information and augmented reality experiences.

[0066] VIO: Visual-Inertial Odometry is a technology that combines a camera and an inertial measurement unit (IMU) to track the motion of a camera or other device in three-dimensional space. VIO technology infers the motion of the camera or other device in three-dimensional space by calculating the transformation matrix between adjacent frames in real time.

[0067] EKF: Extended Kalman Filter, is a commonly used state estimation method. In EKF, the system state consists of multiple variables such as translation vectors, quaternions, and velocities, where the rotation and translation components are nonlinear.

[0068] Camera intrinsic parameter matrix: The focal length $(f_x, f_y)$ represents the focal length of the camera along the x and y axes, in pixels. The optical center coordinates $(c_x, c_y)$ represent the projected coordinates of the camera's optical center on the image plane, in pixels. The camera intrinsic parameter matrix can therefore be written as: $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
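As a minimal sketch of how the intrinsic parameters are used, the following Python snippet maps a 3D point expressed in the camera coordinate system to pixel coordinates with the standard pinhole model. The function name `project_to_pixels` and the numeric values are illustrative assumptions and do not come from the disclosure.

```python
def project_to_pixels(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in the camera frame to pixel coordinates.

    Equivalent to applying K to the normalised point (x/z, y/z, 1).
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v


# Illustrative intrinsics (made-up values, not taken from the disclosure).
u, v = project_to_pixels((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```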

[0069] Pose prior: A pose prior is a prior distribution used to describe the initial or predicted state of a camera or robot, including information such as position, orientation, and velocity. The pose prior is generally obtained by inference from the IMU and previous visual data, and is often used to initialize / reset the VIO system and provide temporary position tracking when there is no stable localization.

[0070] The technical solutions provided by the embodiments of this disclosure are described below with reference to the accompanying drawings.

[0071] Figure 1 is a flowchart illustrating a visual positioning method provided in an embodiment of this disclosure. As shown in Figure 1, the method includes:

[0072] S110, acquire image frames for visual positioning.

[0073] In this embodiment of the disclosure, user visual positioning data, i.e. image frames, are first acquired. These image frames can be captured by a camera or other sensors. The image frames contain not only the captured image information but also parameters obtained by other sensors at that moment, such as velocity, acceleration, and gyroscope bias parameters.

[0074] S120, perform image retrieval and feature matching on the image frame to determine the target visual map corresponding to the image frame.

[0075] In this embodiment of the disclosure, image retrieval and feature matching are first performed for the image frame against a database of multiple pre-stored visual maps, in order to find the target visual map corresponding to the image frame.

[0076] S130, determine the visual pose of the camera coordinate system of the image frame in the visual map coordinate system based on the target visual map.

[0077] In this embodiment of the disclosure, by correcting the width of the image frame, the camera coordinate system positioning of the image frame is made more accurate. Then, using the corrected image frame and the target visual map, the visual pose of the camera coordinate system of the image frame in the visual map coordinate system is determined.

[0078] S140, correct the visual pose using pre-set coordinate points.

[0079] In this embodiment of the disclosure, firstly, a pre-set coordinate point corresponding to the image frame is determined according to the target visual map. Then, the position of the coordinate point in the image frame is determined. Finally, the visual pose is corrected according to the position of the coordinate point in the image frame and the pose of the coordinate point. Specifically, the reprojection error between the coordinate point and the visual pose is determined from the position of the coordinate point in the image frame and the pose of the coordinate point. Then, the visual pose is corrected using the reprojection error.

[0080] In this embodiment of the disclosure, the target visual map can also be updated. First, multiple target image frames within a preset time period are acquired. Then, when the multiple target image frames meet preset conditions, it is determined that the target visual map needs to be updated. Next, the initial map is obtained by using the multiple target image frames through pose priors. Finally, the initial map is corrected using coordinate points to obtain a visual map used to update the target visual map.

[0081] In the embodiments of this disclosure, an image frame for visual positioning is first acquired, then image retrieval and feature matching are performed on the image frame to determine the target visual map corresponding to the image frame, then the visual pose of the camera coordinate system of the image frame in the visual map coordinate system is determined according to the target visual map, and finally the visual pose is corrected using pre-set coordinate points. By using the scheme of this disclosure, the image frame is positioned using the target visual map and the visual pose is corrected using pre-set coordinate points, resulting in higher accuracy of visual positioning and avoiding the situation where the virtual positioning does not overlap when the user moves to the initial position, thus improving the user experience.

[0082] In some embodiments of this disclosure, as shown in Figure 2, a specific implementation of the visual positioning method is as follows:

[0083] S210, acquire image frames for visual positioning.

[0084] In practice, the user's visual positioning data, i.e., image frames, is first acquired. These image frames can be captured by a camera or other sensors. The image frame contains not only the captured image information but also parameters obtained from other sensors at that moment, such as velocity, acceleration, and gyroscope bias parameters. When inputting image frames, one image is sent at regular intervals, and the set of images sent is denoted as $S_{locate}$.

[0085] S220, perform image retrieval and feature matching on the image frame to determine the target visual map corresponding to the image frame.

[0086] In practice, the target visual map corresponding to the image frame is determined through steps such as image retrieval and feature matching.

[0087] S230, determine the visual pose of the camera coordinate system of the image frame in the visual map coordinate system based on the target visual map.

[0088] In practice, the breadth error of the image frame is minimized first, and then the 6DOF pose $T_{CW} = [R_{CW}, p_{CW}]$ of the camera coordinate system of this image frame in the visual map coordinate system, i.e., the visual pose, is calculated. Here $R_{CW}$ is the rotation matrix from the map coordinate system to the camera coordinate system, and $p_{CW}$ represents the position of the camera coordinate system center within the map coordinate system.

[0089] S240, determine the pre-set coordinate points corresponding to the image frame based on the target visual map.

[0090] In practice, a pre-set coordinate point, or 3D point, is determined near the corresponding position of the image frame using a target visual map.

[0091] S250, determine the position of the coordinate point in the image frame.

[0092] In practice, the position of the coordinate point on the image frame is determined by scanning, image retrieval, feature matching, etc., so that the image frame can be associated with the preset coordinate point.

[0093] S260, correct the visual pose based on the position of the coordinate point in the image frame and the pose of the coordinate point.

[0094] In practice, the visual pose can be determined and corrected using the extended Kalman filter (EKF) state estimation method. Generally, the update variable $x_{ekf}$ in the EKF is as follows:

[0095]

[0096] $x_{imu} = [q_{IG}, b_g, v_{GI}, b_a, t_{GI}]$

[0097] Where $q_{IG}$ represents the attitude quaternion (from the world coordinate system to the IMU coordinate system), $v_{GI}$ represents the velocity of the IMU coordinate system in the world coordinate system, $t_{GI}$ represents the position of the IMU coordinate system in the world coordinate system, and $b_g$ and $b_a$ represent the biases of the gyroscope and accelerometer, respectively. The optimization variables in the EKF also include the poses of the past M image frames: for each frame k, an attitude quaternion (from the world coordinate system to the IMU coordinate system of the k-th frame) and the position of the IMU coordinate system in the world coordinate system at frame k. The rotation matrix and translation vector from the IMU coordinate system to the camera coordinate system are $R_{CI}$ and $t_{CI}$, respectively.

[0098] By changing the states in $x_{ekf}$ from the world coordinate system G to the visual map coordinate system W, we obtain the following expression:

[0099]

[0100] $x_{imu} = [q_{IW}, b_g, v_{WI}, b_a, t_{WI}]$

[0101] Then, the camera's intrinsic parameter matrix, the pose in the camera coordinate system of the kth frame, and the position of feature point i on the kth frame image are input to calculate the reprojection error of the coordinate point.

[0102] Camera intrinsic parameter matrix:

[0103] $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$

[0104] The pose in the camera coordinate system at frame k:

[0105]

[0106]

[0107]

[0108] The position of feature point i in the k-th frame:

[0109]

[0110] The calculated residual term is the distance between the projection of the coordinate point position ${}^{W}P_i$ onto the image plane of the k-th frame and the original measurement. Taking the projection onto the k-th frame as an example:

[0111]

[0112]

[0113]

[0114] The reprojection error term can correct the visual pose in the EKF:

[0115] In the embodiments of this disclosure, visual positioning is corrected by adjusting the EKF parameters using pre-set coordinate points. This allows the visual positioning method of this disclosure to increase visual positioning accuracy while reducing computational load and improving user experience.
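The reprojection residual described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pose convention is taken to be $P_C = R_{CW}(P_W - p_{CW})$, with $p_{CW}$ read as the camera center expressed in the map frame, and the function names `reprojection_residual` and `reprojection_error` are hypothetical. The EKF update that consumes the residual is not shown.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def reprojection_residual(K, R_cw, p_cw, point_w, measured_uv):
    """Return (du, dv): projected pixel minus measured pixel for one coordinate point."""
    # Map frame -> camera frame (assumed convention: p_cw is the camera centre in the map frame).
    diff = [point_w[i] - p_cw[i] for i in range(3)]
    x, y, z = mat_vec(R_cw, diff)
    if z <= 0:
        raise ValueError("coordinate point is behind the camera")
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u - measured_uv[0], v - measured_uv[1]


def reprojection_error(K, R_cw, p_cw, point_w, measured_uv):
    """Euclidean pixel distance between the projection and the measurement."""
    du, dv = reprojection_residual(K, R_cw, p_cw, point_w, measured_uv)
    return math.hypot(du, dv)


# Illustrative values only.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = reprojection_error(K, R_identity, [0.0, 0.0, 0.0], [0.5, -0.25, 2.0], (518.0, 141.0))
```

A residual of zero means the current visual pose already explains where the pre-set coordinate point appears in the image; a large residual is the signal the filter uses to pull the pose back.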

[0116] In some embodiments of this disclosure, as shown in Figure 3, a specific implementation of the map update in the visual positioning method is as follows:

[0117] S310: Acquire multiple target image frames within a preset time period.

[0118] In practice, multiple target image frames are acquired within a preset time period, which can be 5 seconds or 1 minute. The specific settings can be customized according to requirements, and this embodiment does not limit them.

[0119] S320: When multiple target image frames meet preset conditions, determine that the target visual map needs to be updated.

[0120] In practice, when multiple target image frames meet the preset conditions, that is, when it is determined that there is a discrepancy between the target visual map and the actual captured content, the target visual map needs to be updated.

[0121] Specific preset conditions can be set by the user. Preferably, an update is considered to be required if the following three conditions are met simultaneously:

[0122] Condition (1):

[0123] Calculate the relative motion of VIO over the period from $t_k$ to $t_{k+N}$:

[0124]

[0125] The following are then checked: the axis angle calculated from the relative rotation matrix > threshold $\alpha_{TH}$; the magnitude of the relative displacement > threshold $t_{TH}$. If so, this indicates that the user is in continuous motion during this period, rather than stationary, and condition (1) is satisfied.
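A minimal Python sketch of the motion check in condition (1) follows. It recovers the axis-angle magnitude from the trace of the relative rotation matrix and compares it, together with the displacement norm, against the two thresholds. How the two comparisons are combined is not recoverable from the text, so the `require_both` flag is an assumption, as are the function names.

```python
import math


def rotation_angle(R):
    """Axis-angle magnitude (radians) of a 3x3 rotation matrix, computed from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def is_moving(R_rel, t_rel, alpha_th, t_th, require_both=True):
    """Condition (1): the relative motion between t_k and t_{k+N} exceeds the thresholds.

    require_both is an assumption; set it to False to accept either test alone.
    """
    angle_ok = rotation_angle(R_rel) > alpha_th
    dist_ok = math.sqrt(sum(c * c for c in t_rel)) > t_th
    return (angle_ok and dist_ok) if require_both else (angle_ok or dist_ok)


# 90-degree rotation about z plus a 5 m displacement (illustrative values).
R_rel = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
moving = is_moving(R_rel, [3.0, 4.0, 0.0], alpha_th=0.5, t_th=1.0)
```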

[0126] Condition (2):

[0127] During the period from $t_k$ to $t_{k+N}$, the query images $[k, \ldots, k+N]$ uploaded by the user's mobile phone are globally indexed against the map database of shopping mall A. The purpose of the global index is to find the M images in the map database that are most similar to the query image. For each image i in the query images $[k, \ldots, k+N]$, calculate the average of its similarity scores with the top M most similar images in the database.

[0128] The average similarity scores of the query frames in (2) are then examined. If at least 50% of these scores are less than the threshold $s_{TH}$, this indicates that there is a significant discrepancy between these positioning query frames and the map database.
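Condition (2) can be sketched as follows in Python, assuming the retrieval step has already produced, for each query frame, the similarity scores of its top M database images. The function names and the score values are illustrative; only the 50% rule and the threshold $s_{TH}$ come from the text.

```python
def mean_topm_similarity(scores_per_query):
    """Average the top-M similarity scores of each query frame."""
    return [sum(scores) / len(scores) for scores in scores_per_query]


def retrieval_mismatch(scores_per_query, s_th, min_fraction=0.5):
    """Condition (2): at least min_fraction of the per-frame averages fall below s_th."""
    means = mean_topm_similarity(scores_per_query)
    low = sum(1 for m in means if m < s_th)
    return low / len(means) >= min_fraction


# Four query frames, M = 2 retrieved images each (made-up scores).
scores = [[0.2, 0.4], [0.9, 0.7], [0.1, 0.3], [0.8, 0.8]]
condition_2 = retrieval_mismatch(scores, s_th=0.5)
```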

[0129] Condition (3):

[0130] Each query image in $[k, \ldots, k+N]$ uploaded by the user's mobile phone during the period from $t_k$ to $t_{k+N}$ is feature-matched with the top M images retrieved in (2), the number of matches with each of them is recorded, and the mean is calculated.

[0131] Calculate the average number of matching points for each image, and count the positioning query frames whose average is less than the threshold $m_{TH}$. Such frames have few matches with the image frames in the map database, indicating that the scene has changed significantly.
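The following Python sketch illustrates condition (3) and the combined decision. The text does not say how many low-match frames are required, so the `min_fraction` parameter (defaulting to the 50% used in condition (2)) is an assumption, as are all function names.

```python
def low_match_frames(match_counts_per_query, m_th):
    """Indices of query frames whose mean match count over the top-M images is below m_th."""
    return [
        i
        for i, counts in enumerate(match_counts_per_query)
        if sum(counts) / len(counts) < m_th
    ]


def scene_changed(match_counts_per_query, m_th, min_fraction=0.5):
    """Condition (3), with an assumed fraction criterion on the low-match frames."""
    low = low_match_frames(match_counts_per_query, m_th)
    return len(low) / len(match_counts_per_query) >= min_fraction


def map_update_required(condition_1, condition_2, condition_3):
    """An update is required only when all three conditions hold simultaneously."""
    return condition_1 and condition_2 and condition_3


# Four query frames, M = 2 matched images each (made-up match counts).
counts = [[10, 20], [200, 300], [5, 15], [30, 10]]
condition_3 = scene_changed(counts, m_th=50)
```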

[0132] S330, use multiple target image frames to obtain an initial map through pose priors.

[0133] In practice, if it is detected that the visual map needs to be updated, the server uses the N images sent by the mobile terminal, together with the pose prior provided by VIO, to compute a local map from these N images. The coordinate system of this local map is defined as $W'$.

[0134] Using the N images sent and their corresponding 6DOF poses, local and global features are extracted from each of the N images. Then, for each image, feature matching is performed with the w images before and after it in time sequence.
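The sequential matching described above can be illustrated by listing the image pairs it produces. In this Python sketch each image is paired with the w images that follow it in time, which covers the "w before and after" rule without duplicating pairs. The function name `sequential_pairs` is hypothetical, and the feature matching itself is not shown.

```python
def sequential_pairs(n_images, w):
    """Return unique index pairs (i, j) with 0 < j - i <= w for time-ordered images."""
    pairs = []
    for i in range(n_images):
        # Pair image i with up to w later images; earlier ones were paired already.
        for j in range(i + 1, min(i + w, n_images - 1) + 1):
            pairs.append((i, j))
    return pairs


print(sequential_pairs(4, 2))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```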

[0135] S340, use coordinate points to correct the initial map and obtain a visual map for updating the target visual map.

[0136] In practice, for all coordinate points, since VIO provides the keyframe poses of these feature points, triangulation is used to calculate the 3D position of the feature points in the world coordinate system. All successfully triangulated map points, along with the 6DOF poses of the N images, constitute the initial local map. A global optimization problem is then established to optimize the map points and the 6DOF poses of the N images.

[0137]

[0138] Input:

[0139] Camera intrinsic parameter matrix

[0140] $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$

[0141] pose of frame i

[0142]

[0143] pose at frame i+1

[0144]

[0145] The position of coordinate point j on the i-th frame image

[0146]

[0147] The position of coordinate point j in the (i+1)th frame of the image

[0148]

[0149] Calculate the camera's projection matrix from the camera's intrinsic parameter matrix and pose matrix:

[0150] The projection matrix of the i-th frame

[0151]

[0152] Projection matrix of frame i+1

[0153]

[0154] Construct a system of linear homogeneous equations:

[0155]

[0156] Solving the above system of linear homogeneous equations yields the position of the coordinate point in the local map coordinate system:

[0157]
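The two-view triangulation outlined above can be sketched in plain Python. Each observation contributes two rows, $u P_3 - P_1$ and $v P_3 - P_2$, to the linear homogeneous system, where $P_1, P_2, P_3$ are the rows of the projection matrix. As a simplification, this sketch fixes the last homogeneous coordinate to 1 and solves the resulting least-squares problem through its normal equations, whereas an SVD-based solution is the more common choice. The pose convention $P_C = R P_W + t$ and all function names are assumptions.

```python
def projection_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t] (assumes P_C = R * P_W + t)."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)] for i in range(3)]


def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def solve3(M, b):
    """Solve the 3x3 system M x = b with Cramer's rule."""
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry: the views do not constrain the point")
    solution = []
    for c in range(3):
        Mc = [[b[r] if j == c else M[r][j] for j in range(3)] for r in range(3)]
        solution.append(det3(Mc) / d)
    return solution


def triangulate(P1, uv1, P2, uv2):
    """Triangulate one point from two pixel observations and their projection matrices."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        rows.append([u * P[2][j] - P[0][j] for j in range(4)])
        rows.append([v * P[2][j] - P[1][j] for j in range(4)])
    # With X = (x, y, z, 1), the system A X = 0 becomes A[:, :3] x = -A[:, 3].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [-sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve3(M, b)


# Two cameras one unit apart along x, both looking down +z (illustrative values).
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P_i = projection_matrix(K, I3, [0.0, 0.0, 0.0])
P_next = projection_matrix(K, I3, [-1.0, 0.0, 0.0])
point = triangulate(P_i, (420.0, 190.0), P_next, (220.0, 190.0))
```

With these values the recovered point is approximately (0.5, -0.25, 4.0), which reprojects exactly onto both observations.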

[0158] Finally, the image set $S_{locate}$ sent to the positioning server in section 1.1 is divided into two categories according to whether positioning succeeded: the set of successfully positioned images and the set of images for which positioning failed.

[0159] For each successfully localized image i, its 6DOF pose in the visual map coordinate system and its 6DOF pose in the new local map coordinate system form a pose pair. Using all successfully localized image pose pairs, calculate the transformation relationship $T_{W'W}$ between the new local map coordinate system $W'$ and the visual map coordinate system $W$. This allows the poses of all image frames in the new map to be transferred to the coordinate system of the old map.

[0160] Using the transformation relationship $T_{W'W}$ between the old and new map coordinate systems, all map point coordinates are transferred from the new map to the old map, and these map points are then added to the old map, thus completing the visual map update.
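As a rough illustration of transferring map points between the two coordinate systems, the Python sketch below derives a rigid transform from a single pose pair and applies it to a map point. It assumes each pose is a 4x4 camera-to-map matrix, and it uses only one pose pair, whereas the text combines all successfully localized pairs. The names `transform_from_pose_pair` and `T_w_from_wp`, and the direction convention, are assumptions.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [R_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transform_from_pose_pair(T_w_cam, T_wp_cam):
    """Transform taking W' coordinates to W coordinates, from one image's two poses.

    T_w_cam maps camera coordinates to the old map W; T_wp_cam maps them to the new map W'.
    """
    return matmul4(T_w_cam, invert_rigid(T_wp_cam))


def transform_point(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(T[i][j] * p[j] for j in range(3)) + T[i][3] for i in range(3)]


# Same camera seen in both maps (illustrative values).
T_w_cam = [[0.0, -1.0, 0.0, 10.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
T_wp_cam = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
T_w_from_wp = transform_from_pose_pair(T_w_cam, T_wp_cam)
# The camera centre in W' must land on the camera centre in W.
moved = transform_point(T_w_from_wp, [1.0, 2.0, 3.0])
```

In a fuller implementation the per-image estimates would be fused, for example by a least-squares alignment over all pose pairs, before the map points are merged into the old map.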

[0161] In this embodiment of the disclosure, a scheme is provided for detecting whether a visual map needs to be updated. At the same time, the visual map that needs to be updated is updated based on the image frames obtained in visual positioning, without having to update the map separately. This enhances the timeliness of visual positioning and saves the cost of updating the visual map.

[0162] Based on the same inventive concept, this disclosure also provides a visual positioning device. As shown in Figure 4, the visual positioning device 400 may include:

[0163] Acquisition unit 401 is used to acquire image frames for visual positioning;

[0164] The first determining unit 402 is used to perform image retrieval and feature matching on the image frame to determine the target visual map corresponding to the image frame.

[0165] The second determining unit 403 is used to determine the visual pose of the camera coordinate system of the image frame in the visual map coordinate system based on the target visual map.

[0166] The correction unit 404 is used to correct the visual pose using pre-set coordinate points.

[0167] In one possible implementation, the second determining unit 403 in the apparatus provided by this disclosure is specifically used for:

[0168] Based on the camera coordinate system of the image frame, the width of the image frame is corrected;

[0169] Based on the target visual map and the corrected image frame, determine the visual pose of the camera coordinate system of the corrected image frame in the visual map coordinate system.

[0170] In one possible implementation, the correction unit 404 in the apparatus provided by this disclosure is specifically used for:

[0171] Determine the pre-set coordinate points corresponding to the image frames based on the target visual map;

[0172] Determine the position of the coordinate point within the image frame;

[0173] Correct the visual pose based on the position of the coordinate point in the image frame and the pose of the coordinate point.

[0174] In one possible implementation, the correction unit 404 in the apparatus provided by this disclosure is further configured to:

[0175] Based on the position of the coordinate point in the image frame and the pose of the coordinate point, determine the reprojection error between the coordinate point and the visual pose;

[0176] Correct the visual pose using the reprojection error.
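The reprojection error referred to above can be illustrated with a pinhole camera model. The sketch below is only an example under stated assumptions: the intrinsics (fx, fy, cx, cy), the map-to-camera pose convention, and the function names are hypothetical and do not come from the disclosure. It only evaluates the error; the EKF-based correction recited in the claims is not implemented here.

```python
import math

def project(point_map, rot_cam_from_map, trans_cam_from_map, fx, fy, cx, cy):
    """Project a 3D map point into pixel coordinates with a pinhole model."""
    pc = [sum(rot_cam_from_map[i][k] * point_map[k] for k in range(3))
          + trans_cam_from_map[i] for i in range(3)]
    if pc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * pc[0] / pc[2] + cx, fy * pc[1] / pc[2] + cy)

def reprojection_error(observed_px, point_map, rot, trans, fx, fy, cx, cy):
    """Pixel distance between the observed and the projected coordinate point."""
    u, v = project(point_map, rot, trans, fx, fy, cx, cy)
    return math.hypot(observed_px[0] - u, observed_px[1] - v)

# Hypothetical values: identity rotation, a coordinate point 2 m ahead.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coordinate_point = [0.2, -0.1, 2.0]
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# The observation is generated with the true pose, so its error is zero.
observed = project(coordinate_point, identity, [0.0, 0.0, 0.0], **intrinsics)
err_true = reprojection_error(observed, coordinate_point, identity,
                              [0.0, 0.0, 0.0], **intrinsics)
# A pose that has drifted 0.1 m along x yields a non-zero pixel error.
err_drift = reprojection_error(observed, coordinate_point, identity,
                               [0.1, 0.0, 0.0], **intrinsics)
```

In this synthetic case the drifted pose shifts the projection by 25 pixels horizontally. A filter or optimizer would use such a residual as the measurement innovation when adjusting the visual pose.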

[0177] In one possible implementation, the apparatus provided in this disclosure further includes an updating unit for:

[0178] Acquire multiple target image frames within a preset time period;

[0179] When multiple target image frames meet preset conditions, the target visual map is updated using multiple target image frames.
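The update trigger can be sketched as a check over the frames collected in a time window. The passage above does not spell out the preset conditions, so the condition used here (a minimum number of frames and a localization failure ratio above a threshold) is purely an assumption for illustration, as are all names and default values.

```python
from dataclasses import dataclass

@dataclass
class FrameResult:
    timestamp: float   # acquisition time in seconds
    localized: bool    # whether visual positioning succeeded for this frame

def frames_in_window(results, now, window_s):
    """Keep only the frames acquired within the preset time period."""
    return [r for r in results if now - window_s <= r.timestamp <= now]

def map_needs_update(results, now, window_s=60.0, min_frames=5,
                     max_failure_ratio=0.5):
    """Hypothetical preset condition, not taken from the disclosure:
    enough frames in the window and a failure ratio above the threshold."""
    recent = frames_in_window(results, now, window_s)
    if len(recent) < min_frames:
        return False
    failures = sum(1 for r in recent if not r.localized)
    return failures / len(recent) > max_failure_ratio
```

For example, ten frames in the window of which eight failed to localize would trigger an update under these default thresholds, while three frames alone would not.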

[0180] In one possible implementation, the updating unit in the apparatus provided by this disclosure is specifically used for:

[0181] When multiple target image frames meet preset conditions, the target visual map is determined to be ready for updating.

[0182] An initial map is obtained by using multiple target image frames and prior knowledge of pose.

[0183] The initial map is corrected using coordinate points to obtain a visual map used to update the target visual map.

[0184] According to embodiments of this disclosure, this disclosure also provides an electronic device, a non-transitory computer-readable storage medium, and a computer program product.

[0185] Figure 5 shows a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0186] As shown in Figure 5, device 500 includes a computing unit 501, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 502 or a computer program loaded from storage unit 508 into random access memory (RAM) 503. RAM 503 may also store various programs and data required for the operation of device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via bus 504. Input / output (I / O) interface 505 is also connected to bus 504.

[0187] Multiple components in electronic device 500 are connected to I / O interface 505, including: input unit 506, such as keyboard, mouse, etc.; output unit 507, such as various types of monitors, speakers, etc.; storage unit 508, such as disk, optical disk, etc.; and communication unit 509, such as network card, modem, wireless transceiver, etc. Communication unit 509 allows device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0188] The computing unit 501 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the visual positioning method. For example, in some embodiments, the visual positioning method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and / or installed on device 500 via ROM 502 and / or communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of the visual positioning method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the visual positioning method by any other suitable means (e.g., by means of firmware).

[0189] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0190] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0191] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0192] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0193] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.

[0194] Computer systems can include clients and servers. Clients and servers are generally geographically separated and typically interact via a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. A server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system that addresses the shortcomings of traditional physical hosts and virtual private server (VPS) services, such as high management difficulty and weak task scalability. A server can also be a server of a distributed system or a server incorporating blockchain technology.

[0195] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0196] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A visual positioning method, characterized by comprising: acquiring an image frame for visual positioning; performing image retrieval and feature matching on the image frame in a database of multiple pre-stored visual maps to determine the target visual map corresponding to the image frame; correcting the width of the image frame, based on the camera coordinate system of the image frame, by minimizing the width error; determining, based on the target visual map and the corrected image frame, the visual pose of the camera coordinate system of the corrected image frame in the visual map coordinate system; determining the pre-set coordinate point corresponding to the image frame based on the target visual map; determining the position of the coordinate point in the image frame; and determining the reprojection error of the coordinate point by an extended Kalman filter (EKF) state estimation method, based on the position of the coordinate point in the image frame and the pose of the coordinate point, and correcting the visual pose using the reprojection error.

2. The method according to claim 1, characterized in that the method further comprises: acquiring multiple target image frames within a preset time period; and, when the multiple target image frames meet the preset conditions, updating the target visual map using the multiple target image frames.

3. The method according to claim 2, characterized in that updating the target visual map using the multiple target image frames when the multiple target image frames meet the preset conditions comprises: when the multiple target image frames meet the preset conditions, determining that the target visual map needs to be updated; obtaining an initial map using the multiple target image frames and prior knowledge of pose; and correcting the initial map using the coordinate points to obtain a visual map used to update the target visual map.

4. A visual positioning apparatus, characterized by comprising: an acquisition unit, used to acquire an image frame for visual positioning; a first determining unit, used to perform image retrieval and feature matching on the image frame in a database of multiple pre-stored visual maps to determine the target visual map corresponding to the image frame; a second determining unit, used to correct the width of the image frame, based on the camera coordinate system of the image frame, by minimizing the width error, and to determine, based on the target visual map and the corrected image frame, the visual pose of the camera coordinate system of the corrected image frame in the visual map coordinate system; and a correction unit, used to determine the pre-set coordinate point corresponding to the image frame based on the target visual map, to determine the position of the coordinate point in the image frame, to determine the reprojection error of the coordinate point by an extended Kalman filter (EKF) state estimation method, based on the position of the coordinate point in the image frame and the pose of the coordinate point, and to correct the visual pose using the reprojection error.

5. The apparatus according to claim 4, characterized in that the apparatus further comprises an updating unit used to: acquire multiple target image frames within a preset time period; and, when the multiple target image frames meet the preset conditions, update the target visual map using the multiple target image frames.

6. The apparatus according to claim 5, characterized in that the updating unit is specifically used to: when the multiple target image frames meet the preset conditions, determine that the target visual map needs to be updated; obtain an initial map using the multiple target image frames and prior knowledge of pose; and correct the initial map using the coordinate points to obtain a visual map used to update the target visual map.

7. An electronic device, characterized by comprising: a processor; and a memory used to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the visual positioning method as described in any one of claims 1 to 3.

8. A computer storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the visual positioning method as described in any one of claims 1 to 3.

9. A computer program product comprising a computer program / instructions that, when executed by a processor, implement the visual positioning method according to any one of claims 1 to 3.