Positioning method and apparatus of mobile device, electronic device, and storage medium
Image frames acquired by a visual camera are used to build a bag-of-words model that assists the laser SLAM algorithm in calculating the pose. This addresses the problem of localizing a mobile device in scenarios where laser features are missing and improves localization accuracy and robustness.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: 重庆中科汽车软件创新中心
- Filing Date: 2022-10-10
- Publication Date: 2026-06-23
AI Technical Summary
Existing technologies struggle to achieve accurate positioning of mobile devices in scenarios where laser features are lacking. In particular, robots find it difficult to rely on laser sensors for accurate positioning when the environment is dynamically changing and laser features are missing.
Image frames are acquired with a visual camera, keyframes are selected and their feature descriptors extracted, and a bag-of-words model and a keyframe image database are generated. The visual bag-of-words model then provides a prior pose of the mobile device that assists the laser SLAM algorithm in calculating the pose information.
This improves the accuracy and robustness of robot positioning in scenarios where laser features are missing and adapts to complex, changing environments.
Figure CN115588045B_ABST
Description
TECHNICAL FIELD
[0001] The present application relates to the technical field of positioning and navigation, and in particular to a positioning method and device for a mobile device, an electronic device, and a storage medium.
BACKGROUND
[0002] With the rapid development of the fields of inspection robots, automated guided vehicles and the like, mobile robots have higher requirements for autonomous navigation.
[0003] In the prior art, simultaneous localization and mapping (SLAM) is an indispensable technology in robot navigation. However, when robots are deployed in real scenes, they often face laser feature loss and dynamic changes in the environment, and in such scenes it is difficult for the robot to rely on the laser sensor for accurate positioning. Therefore, a method is urgently needed that can position a mobile device accurately when laser features are missing.
SUMMARY
[0004] Embodiments of the present application provide a positioning method and device for a mobile device, an electronic device, and a storage medium, to solve the technical problem that the prior art can hardly position a mobile device accurately in scenarios where laser features are missing.
[0005] In a first aspect, embodiments of the present application provide a positioning method for a mobile device, comprising: acquiring image frames through a visual camera installed on the mobile device, and determining a keyframe from the image frames; determining mapping information and a feature descriptor of the keyframe, wherein the mapping information represents a node index corresponding to the keyframe, and the node index is used to calculate pose information based on a laser simultaneous localization and mapping (SLAM) algorithm; obtaining a bag-of-words model and a keyframe image database according to the feature descriptor of the keyframe; acquiring a current image frame, and converting the current image frame into a current frame bag-of-words vector based on the bag-of-words model; determining, from the keyframe image database and according to the current frame bag-of-words vector, a candidate keyframe with the highest matching score to the current image frame, and determining a target keyframe according to the candidate keyframe; and determining a target node index corresponding to the target keyframe according to the mapping information, and calculating pose information of the mobile device according to the target node index to position the mobile device.
[0006] In some embodiments, acquiring image frames through a vision camera installed on the mobile device and determining keyframes from the image frames includes: acquiring a left-view image frame, a front-view image frame, and a right-view image frame through vision cameras installed on the left, front, and right sides of the mobile device, respectively; and, when the front-view image frame is determined to meet a preset condition, determining the front-view image frame as a keyframe and determining both the left-view image frame and the right-view image frame as keyframes as well.
[0007] In some embodiments, after determining that the front view image frame is the key frame, and that the left view image frame and the right view image frame are both key frames, the method further includes: setting the key frame sequence numbers corresponding to the left view image frame, the front view image frame, and the right view image frame in ascending order; determining the mapping information of the key frames includes: determining the node index corresponding to each key frame sequence number.
[0008] In some embodiments, the method further includes: acquiring a current front view image frame, a current left view image frame, and a current right view image frame, and converting them, based on the bag-of-words model, into a current front view frame bag-of-words vector, a current left view frame bag-of-words vector, and a current right view frame bag-of-words vector, respectively. Determining, from the keyframe image database and based on the current frame bag-of-words vector, the candidate keyframe with the highest matching score to the current image frame, and determining the target keyframe based on the candidate keyframe, includes: determining, from the keyframe image database and based on the current front view frame bag-of-words vector, a first candidate keyframe with the highest matching score to the current front view image frame; if the matching score corresponding to the first candidate keyframe is greater than a first preset threshold, determining the first candidate keyframe as the target keyframe; and if the matching score corresponding to the first candidate keyframe is less than a second preset threshold, performing the step of acquiring the current front view image frame, the current left view image frame, and the current right view image frame again.
[0009] In some embodiments, a second candidate keyframe with the highest matching score to the current left view image frame is determined from the keyframe image database based on the bag-of-words vector of the current left view frame, and a third candidate keyframe with the highest matching score to the current right view image frame is determined from the keyframe image database based on the bag-of-words vector of the current right view frame. If the matching score corresponding to the first candidate keyframe is greater than a second preset threshold and less than a first preset threshold, the first, second, and third candidate keyframes are sorted according to their corresponding keyframe numbers. If the keyframe numbers corresponding to the three sorted candidate keyframes are adjacent, the candidate keyframe located in the middle position is determined as the target keyframe. If the keyframe numbers corresponding to the three sorted candidate keyframes are not adjacent, the step of obtaining the current front view image frame, the current left view image frame, and the current right view image frame is performed.
[0010] In some embodiments, determining the front view image frame as a keyframe, and the left view and right view image frames as keyframes, when the front view image frame meets a preset condition includes: acquiring odometer information of the mobile device; and, when the odometer information indicates that the mobile device has translated or rotated by more than a preset value, determining the front view image frame as a keyframe and determining the left view image frame and the right view image frame as keyframes.
[0011] In some embodiments, after determining the mapping information and feature descriptors of the keyframes, the method further includes: saving the mapping information and feature descriptors of the keyframes to a preset file; obtaining the bag-of-words model and the keyframe image database based on the feature descriptors of the keyframes includes: reading the feature descriptors of the keyframes from the preset file, and generating the bag-of-words model and the keyframe image database based on the feature descriptors of the keyframes.
[0012] In a second aspect, embodiments of the present invention provide a positioning device for a mobile device, comprising: an acquisition module, configured to acquire image frames via a vision camera installed on the mobile device and determine keyframes from the image frames; a determination module, configured to determine mapping information and feature descriptors of the keyframes, wherein the mapping information represents the node index corresponding to each keyframe, and the node index is used to calculate pose information based on a laser simultaneous localization and mapping (SLAM) algorithm; a training module, configured to obtain a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes; the acquisition module being further configured to acquire a current image frame and convert it into a current frame bag-of-words vector based on the bag-of-words model; a matching module, configured to determine, from the keyframe image database and based on the current frame bag-of-words vector, a candidate keyframe with the highest matching score to the current image frame, and determine a target keyframe based on the candidate keyframe; and a calculation module, configured to determine the target node index corresponding to the target keyframe based on the mapping information and calculate the pose information of the mobile device based on the target node index, for positioning the mobile device.
[0013] In a third aspect, embodiments of the present invention provide an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement, when executing the program stored in the memory, the steps of the positioning method for a mobile device according to any embodiment of the first aspect.
[0014] In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the positioning method for a mobile device according to any embodiment of the first aspect.
[0015] The mobile device positioning method, apparatus, electronic device, and storage medium provided in this invention acquire image frames using a visual camera installed on the mobile device and determine keyframes from the image frames; determine the mapping information and feature descriptors of the keyframes, wherein the mapping information represents the node index corresponding to each keyframe, and the node index is used to calculate pose information based on the laser simultaneous localization and mapping (SLAM) algorithm; obtain a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes; acquire the current image frame and convert it into a current frame bag-of-words vector based on the bag-of-words model; determine, from the keyframe image database and based on the current frame bag-of-words vector, the candidate keyframe with the highest matching score to the current image frame, and determine the target keyframe based on the candidate keyframe; and determine the target node index corresponding to the target keyframe based on the mapping information and calculate the pose information of the mobile device based on the target node index to position the mobile device. In other words, for scenarios where laser features are missing, this invention uses a visual bag-of-words model to provide a prior pose of the mobile device that assists in positioning it, which improves positioning accuracy, adapts to complex and changing environments, and enhances the robustness of robot positioning.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of a positioning method for a mobile device provided in an embodiment of the present invention;
[0019] Figure 2 is a flowchart of another positioning method for a mobile device provided in an embodiment of the present invention;
[0020] Figure 3a is a flowchart of a mapping mode provided in an embodiment of the present invention;
[0021] Figure 3b is a flowchart of a positioning mode provided in an embodiment of the present invention;
[0022] Figure 4 is a schematic structural diagram of a positioning device for a mobile device provided in an embodiment of the present invention;
[0023] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
DETAILED DESCRIPTION
[0024] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0025] With the rapid development of fields such as inspection robots and automated guided vehicles, mobile robots are placing higher demands on autonomous navigation. SLAM is a key technology for map building and localization. Because laser sensors offer high measurement accuracy, laser SLAM algorithms have become an indispensable technology in robot navigation. Among them, the optimized Cartographer algorithm offers high accuracy and produces maps that can be used for path planning, and it is currently the most mainstream 2D laser SLAM algorithm.
[0026] In real-world scenarios, robots often face situations where laser features are missing or the environment changes dynamically. Examples include activating the re-localization function, switching maps (e.g., changing floors), long corridors whose far end lies beyond the laser's range, open spaces (where few valid LiDAR points are returned), and structurally symmetrical environments (where similar scans can easily lead laser localization to two different poses). In these scenarios, robots struggle to localize accurately using laser sensors.
[0027] To address the above technical problems, the present invention provides a localization method that integrates a laser SLAM algorithm with a visual bag-of-words model (DBoW3). DBoW3 computes bag-of-words representations of visual image information and can provide an initial pose that assists robot localization in scenarios lacking laser features. Furthermore, integrating DBoW3 with the laser SLAM algorithm adapts to complex and changing environments and improves the robustness of robot localization. For example, when the robot activates its re-localization function or switches maps and is uncertain of its pose, DBoW3 can provide prior pose information. Additionally, in long corridors, open spaces, and structurally symmetrical environments, DBoW3 makes otherwise similar scenes easier to distinguish.
[0028] Figure 1 is a flowchart of a positioning method for a mobile device provided in an embodiment of the present invention. As shown in Figure 1, the positioning method for the mobile device includes:
[0029] Step S101: Acquire image frames using a vision camera installed on the mobile device, and determine keyframes from the image frames.
[0030] Specifically, the mobile device is an electronic device with positioning and navigation functions, such as an inspection robot, an automated guided vehicle, a mobile phone, or a tablet; a mobile robot is used as the example below. Generally, a vision camera is installed on the front of the mobile device to capture images of its surroundings. The positioning method in this embodiment is divided into mapping-mode steps (steps S101-S103) and positioning-mode steps (steps S104-S106). During mapping, the vision camera installed on the mobile device captures multiple image frames, and keyframes are extracted from them. Using keyframes reduces the number of frames to be processed and thus improves the efficiency of subsequent processing.
[0031] In some embodiments, determining a keyframe from the image frames in step S101 includes: acquiring odometer information of the mobile device; and determining the image frame as a keyframe if the odometer information indicates that the mobile device has translated or rotated by more than a preset value. Specifically, a motion sensor installed on the mobile device collects odometer information (including displacement, rotation angle, and the like). When the mobile device has undergone a sufficiently large displacement, rotation, or both, the current image frame is determined to be a keyframe. The preset value is set based on the experience of those skilled in the art and is not limited by this invention.
[0032] In some embodiments, a keyframe sequence number Image_ID is assigned to each keyframe, and the vision node publishes the Image_ID on a topic. It should be noted that a node is a concept in ROS (Robot Operating System). For example, a node that implements robot path planning is called a navigation node, a node that performs robot mapping and outputs the robot pose in real time is called a Cartographer node, and a node that implements vision-assisted localization is called a vision node. These nodes run on the development board, and different nodes communicate with each other by publishing and subscribing to messages.
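The odometry-based keyframe test can be sketched as follows. This is a minimal Python illustration, not the patented implementation: it assumes a planar odometry pose (x, y, yaw), and the names `Pose2D`, `is_keyframe`, `select_keyframes` and the default thresholds (0.5 m, 15 degrees) are hypothetical placeholders for the unspecified preset values.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Pose2D:
    """Planar odometry pose: position in metres, heading in radians."""
    x: float
    y: float
    yaw: float


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi) so that 3.1 rad and -3.1 rad count as close."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def is_keyframe(last_kf: Pose2D, current: Pose2D,
                trans_thresh: float = 0.5,
                rot_thresh: float = math.radians(15.0)) -> bool:
    """True if the robot translated OR rotated more than the (hypothetical)
    preset values since the last keyframe."""
    translation = math.hypot(current.x - last_kf.x, current.y - last_kf.y)
    rotation = abs(wrap_angle(current.yaw - last_kf.yaw))
    return translation > trans_thresh or rotation > rot_thresh


def select_keyframes(poses: List[Pose2D], trans_thresh: float = 0.5,
                     rot_thresh: float = math.radians(15.0)) -> List[int]:
    """Indices of frames kept as keyframes; the first frame is always kept."""
    keyframes: List[int] = []
    last = None
    for i, pose in enumerate(poses):
        if last is None or is_keyframe(last, pose, trans_thresh, rot_thresh):
            keyframes.append(i)
            last = pose  # thresholds are measured from the latest keyframe
    return keyframes
```

Measuring motion from the last accepted keyframe, rather than from the previous frame, keeps slow continuous motion from never triggering a keyframe.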
[0033] Step S102: Determine the mapping information and feature descriptors of the keyframe.
[0034] The mapping information represents the node index corresponding to the keyframe, and the node index is used to calculate pose information based on the laser simultaneous localization and mapping (SLAM) algorithm. Preferably, the laser SLAM algorithm is the Cartographer algorithm, implemented through the Cartographer node. The Cartographer node consists of two modules: a front-end local mapping module and a back-end optimization module that eliminates accumulated errors. The process is as follows. The LiDAR sends scan data to the Cartographer front-end local mapping module at a fixed frequency (15 Hz). The front end processes each frame of scan data, matching it against the local map to obtain the robot's current coarse pose P. To reduce the computational load, the front end also selects representative scans and sends them to the back end for optimization to eliminate accumulated errors. Each scan sent from the front end to the back end forms a Node. Each Node contains a node index Node_ID and a precise pose; Node_ID values are assigned in ascending order starting from 0, and Nodes and Node_IDs correspond one-to-one. The back end then matches the Node against the global map, using P as the initial pose, to calculate the precise pose Q of the Node, which is the precise pose of the robot at the current moment.
[0035] In this step, the Image_ID of the keyframe is bound to a Node_ID, so that each Image_ID also corresponds one-to-one to the precise pose Q of that Node_ID. All feature points of each keyframe are extracted, and the feature descriptors of all feature points are calculated and can be saved into a feature descriptor container.
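A minimal sketch of the Image_ID/Node_ID binding, assuming that each new keyframe is bound to the most recently created Node (the text does not state how the Node is chosen). `MappingInfo` and its methods are hypothetical names, and poses are stored as planar (x, y, yaw) tuples standing in for the precise pose Q.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw): stand-in for a Node's precise pose Q


@dataclass
class MappingInfo:
    """Hypothetical container for the Image_ID -> Node_ID -> pose chain."""
    node_poses: Dict[int, Pose] = field(default_factory=dict)      # Node_ID -> Q
    image_to_node: Dict[int, int] = field(default_factory=dict)    # Image_ID -> Node_ID
    descriptors: Dict[int, List[List[float]]] = field(default_factory=dict)  # Image_ID -> descriptors

    def add_node(self, pose: Pose) -> int:
        """Assign the next Node_ID in ascending order starting from 0."""
        node_id = len(self.node_poses)
        self.node_poses[node_id] = pose
        return node_id

    def bind_keyframe(self, image_id: int, descriptors: List[List[float]]) -> None:
        """Bind a keyframe to the latest Node (assumption) and keep its descriptors."""
        if not self.node_poses:
            raise RuntimeError("no SLAM node available to bind to")
        self.image_to_node[image_id] = max(self.node_poses)
        self.descriptors[image_id] = descriptors

    def prior_pose(self, image_id: int) -> Optional[Pose]:
        """Pose of the Node bound to image_id, or None if the keyframe is unknown."""
        node_id = self.image_to_node.get(image_id)
        return None if node_id is None else self.node_poses[node_id]
```

In this sketch the pose is looked up through the Node_ID at query time rather than copied into the keyframe, so any later update to `node_poses` is reflected in the returned prior pose.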
[0036] Step S103: Obtain the bag-of-words model and keyframe image database based on the feature descriptors of the keyframes.
[0037] Specifically, the DBoW3 method is used to train on the feature descriptors of all keyframes, generating a bag-of-words model (Vocabulary) and a keyframe image database (Database), which are then saved. The keyframe image database contains the bag-of-words vector corresponding to each keyframe.
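The training step can be illustrated with a simplified vocabulary. DBoW3 builds a hierarchical vocabulary tree (typically over binary descriptors such as ORB); the sketch below instead substitutes a flat k-means vocabulary over float descriptors plus tf-idf weighting, only to show how keyframe descriptors become visual words and per-keyframe bag-of-words vectors. All function names are hypothetical and this is not the DBoW3 API.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def _sqdist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(desc: Vec, words: List[List[float]]) -> int:
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(words)), key=lambda w: _sqdist(desc, words[w]))


def train_vocabulary(all_descs: List[Vec], k: int, iters: int = 10,
                     seed: int = 0) -> List[List[float]]:
    """Flat k-means over all keyframe descriptors; each centroid is one visual word."""
    rng = random.Random(seed)
    words = [list(d) for d in rng.sample(all_descs, k)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for d in all_descs:
            clusters[nearest_word(d, words)].append(d)
        for w, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                words[w] = [sum(col) / len(members) for col in zip(*members)]
    return words


def build_database(keyframes: Dict[int, List[Vec]], words: List[List[float]]
                   ) -> Tuple[Dict[int, float], Dict[int, Dict[int, float]]]:
    """Return (idf per word, {Image_ID: tf-idf BoW vector as {word_id: weight}})."""
    n = len(keyframes)
    word_ids = {img: [nearest_word(d, words) for d in descs]
                for img, descs in keyframes.items()}
    # document frequency: in how many keyframes each word occurs
    df = Counter(w for ids in word_ids.values() for w in set(ids))
    idf = {w: math.log(n / df[w]) for w in df}
    database = {}
    for img, ids in word_ids.items():
        tf = Counter(ids)
        database[img] = {w: (c / len(ids)) * idf[w] for w, c in tf.items()}
    return idf, database


# Usage: three toy keyframes with 2-D "descriptors" forming two clear clusters.
keyframes = {1: [[0.0, 0.0], [0.0, 1.0]], 2: [[10.0, 10.0]], 3: [[0.0, 0.0], [10.0, 11.0]]}
all_descs = [d for descs in keyframes.values() for d in descs]
words = train_vocabulary(all_descs, k=2)
idf, db = build_database(keyframes, words)
```

A word that appears in every keyframe gets idf = log(1) = 0 and therefore no weight, which is the intended effect of idf: words that do not discriminate between places are ignored.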
[0038] In some embodiments, after step S102, the method further includes: saving the mapping information and feature descriptors of the keyframes to a preset file; step S103 includes: reading the feature descriptors of the keyframes from the preset file, and generating a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes. Specifically, the mapping between keyframes and Node_IDs is saved to a file, and the feature descriptor container of the keyframes is saved to a file. During robot localization, it is first determined whether the Database exists. If it exists, the Database is loaded directly; if it does not exist, the feature descriptors of the keyframes are read from the file, and the Vocabulary and Database are trained and saved.
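The load-or-train behaviour can be sketched with plain files. The JSON format, the file names, and the `build` callback are assumptions for illustration only; a real system would persist the DBoW3 Vocabulary and Database objects rather than JSON, and would save the Image_ID/Node_ID mapping alongside.

```python
import json
import os
from typing import Any, Callable, Dict


def load_or_build_database(db_path: str, descriptor_path: str,
                           build: Callable[[Dict[str, Any]], Any]) -> Any:
    """Load the saved database if it exists; otherwise read the keyframe
    descriptors saved during mapping, build the database, save it, and return it."""
    if os.path.exists(db_path):
        with open(db_path, "r", encoding="utf-8") as f:
            return json.load(f)
    # JSON object keys are strings, so Image_IDs arrive as "1", "2", ...
    with open(descriptor_path, "r", encoding="utf-8") as f:
        descriptors = json.load(f)
    database = build(descriptors)  # hypothetical training callback
    with open(db_path, "w", encoding="utf-8") as f:
        json.dump(database, f)
    return database
```

Only the first localization run pays the training cost; later runs in the same mapped scene load the saved database directly.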
[0039] It should be noted that in this embodiment, feature descriptors are extracted from keyframes during mapping, and the database is generated from those keyframe descriptors; that is, each mapped scene has its own database. Compared with loading a database built offline, this further improves localization accuracy.
[0040] Step S104: Obtain the current image frame and convert the current image frame into a current frame bag-of-words vector based on the bag-of-words model.
[0041] Specifically, during the positioning process of the mobile device, the current image frame captured by the visual camera is acquired in real time, and the current image frame is transformed into a visual bag-of-words vector (BoW) through the bag-of-words model (Vocabulary).
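Converting the current frame into a BoW vector amounts to quantizing each descriptor to its nearest visual word and weighting the word counts. A minimal sketch, assuming a flat vocabulary (a list of centroids) and an idf table like those produced during mapping; DBoW3 instead descends its vocabulary tree, and `to_bow` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Sequence


def to_bow(descriptors: List[Sequence[float]], words: List[Sequence[float]],
           idf: Dict[int, float]) -> Dict[int, float]:
    """Quantize each descriptor to its nearest visual word and return an
    L1-normalized tf-idf bag-of-words vector {word_id: weight}."""
    def nearest(d: Sequence[float]) -> int:
        return min(range(len(words)),
                   key=lambda w: sum((a - b) ** 2 for a, b in zip(d, words[w])))

    counts = Counter(nearest(d) for d in descriptors)
    total = sum(counts.values())
    # words missing from the idf table (never seen during mapping) get weight 0
    bow = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = sum(abs(v) for v in bow.values())
    if norm == 0:
        return {}
    return {w: v / norm for w, v in bow.items() if v != 0.0}
```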
[0042] Step S105: Determine the candidate keyframe with the highest matching score to the current image frame from the keyframe image database based on the bag-of-words vector of the current frame, and determine the target keyframe based on the candidate keyframe.
[0043] Specifically, the BoW vector of the current image frame is used to query the database index, which yields the candidate keyframe from the mapping process with the highest matching degree (highest score); the target keyframe is then determined from the candidate keyframes based on the matching score.
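Scoring can be sketched with the L1 similarity used by the DBoW family of libraries, s(v1, v2) = 1 - 0.5 * |v1/|v1| - v2/|v2||_1, which lies in [0, 1] for non-negative weights. This sketch scans the whole database linearly, whereas DBoW3 uses an inverted index so that only keyframes sharing words with the query are scored; `l1_score` and `query` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

BoW = Dict[int, float]  # {word_id: weight}


def l1_score(v1: BoW, v2: BoW) -> float:
    """1 - 0.5 * L1 distance between the L1-normalized vectors (1 = identical)."""
    n1 = sum(abs(x) for x in v1.values())
    n2 = sum(abs(x) for x in v2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    words = set(v1) | set(v2)
    diff = sum(abs(v1.get(w, 0.0) / n1 - v2.get(w, 0.0) / n2) for w in words)
    return 1.0 - 0.5 * diff


def query(database: Dict[int, BoW], current: BoW) -> Optional[Tuple[int, float]]:
    """(Image_ID, score) of the best-matching keyframe, or None for an empty database."""
    if not database:
        return None
    best = max(database, key=lambda img: l1_score(current, database[img]))
    return best, l1_score(current, database[best])
```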
[0044] Step S106: Determine the target node index corresponding to the target keyframe based on the mapping information, and calculate the pose information of the mobile device based on the target node index to locate the mobile device.
[0045] Specifically, the Node_ID bound to the target keyframe is determined from the mapping information, and the precise pose of that node is taken as an accurate prior pose that assists in positioning the mobile device.
[0046] The mobile device localization method provided in this invention acquires image frames using a vision camera installed on the mobile device and determines keyframes from the image frames; determines the mapping information and feature descriptors of the keyframes, wherein the mapping information represents the node index corresponding to each keyframe, and the node index is used to calculate pose information based on the laser simultaneous localization and mapping (SLAM) algorithm; obtains a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes; acquires the current image frame and converts it into a current frame bag-of-words vector based on the bag-of-words model; determines, from the keyframe image database and based on the current frame bag-of-words vector, the candidate keyframe with the highest matching score to the current image frame, and determines the target keyframe based on the candidate keyframe; and determines the target node index corresponding to the target keyframe based on the mapping information and calculates the pose information of the mobile device based on the target node index to localize the mobile device. That is, for scenarios where laser features are missing, this invention provides a prior pose of the mobile device through a visual bag-of-words model, which assists mobile device localization, improves localization accuracy, adapts to complex and changing environments, and improves the robustness of robot localization.
[0047] Based on the above embodiments, Figure 2 is a flowchart of another positioning method for a mobile device provided in an embodiment of the present invention. As shown in Figure 2, the positioning method for the mobile device includes:
[0048] Step S201: Acquire the corresponding left view image frame, front view image frame and right view image frame respectively through the vision cameras installed on the left, front and right sides of the mobile device.
[0049] Step S202: If the front view image frame meets the preset conditions, determine the front view image frame as the key frame, and determine that the left view image frame and the right view image frame are both key frames.
[0050] Step S203: Set the keyframe numbers corresponding to the left view image frame, front view image frame, and right view image frame in ascending order.
[0051] Step S204: Determine the node index corresponding to each key frame number to form the mapping information, and determine the feature descriptor of the key frame.
[0052] Step S205: Obtain the bag-of-words model and keyframe image database based on the feature descriptors of the keyframes.
[0053] Step S206: Obtain the current front view image frame, the current left view image frame, and the current right view image frame, and based on the bag-of-words model, convert the current front view image frame, the current left view image frame, and the current right view image frame into the corresponding bag-of-words vectors of the current front view frame, the current left view frame, and the current right view frame, respectively.
[0054] Step S207: Determine the first candidate keyframe with the highest matching score to the current front view image frame from the keyframe image database based on the bag-of-words vector of the current front view frame.
[0055] Step S208: Determine the second candidate keyframe with the highest matching score to the current left view image frame from the keyframe image database based on the bag-of-words vector of the current left view frame, and determine the third candidate keyframe with the highest matching score to the current right view image frame from the keyframe image database based on the bag-of-words vector of the current right view frame.
[0056] Based on the matching score corresponding to the first candidate keyframe, execute steps S209 and S214, or execute step S210, or execute steps S211-S214.
[0057] Step S209: If the matching score corresponding to the first candidate keyframe is greater than the first preset threshold, then the first candidate keyframe is determined to be the target keyframe.
[0058] Step S210: If the matching score corresponding to the first candidate keyframe is less than the second preset threshold, then proceed to step S206.
[0059] Step S211: If the matching score corresponding to the first candidate keyframe is greater than the second preset threshold and less than the first preset threshold, then sort the first candidate keyframe, the second candidate keyframe and the third candidate keyframe according to their corresponding keyframe numbers.
[0060] Step S212: Determine whether the keyframe numbers corresponding to the three candidate keyframes after sorting are adjacent.
[0061] If yes, proceed to step S213; otherwise, proceed to step S206.
[0062] Step S213: Determine the candidate keyframe located in the middle position as the target keyframe.
[0063] Step S214: Determine the target node index corresponding to the target keyframe based on the mapping information, and calculate the pose information of the mobile device based on the target node index to locate the mobile device.
[0064] The implementation of steps S205 and S214 in this embodiment is similar to that of steps S103 and S106 in the above embodiment, and will not be repeated here.
[0065] The difference from the above embodiment is that, to further improve positioning accuracy, this embodiment acquires a left view image frame, a front view image frame, and a right view image frame through vision cameras installed on the left, front, and right sides of the mobile device, respectively. When the front view image frame meets the preset condition, the front view image frame is determined to be a keyframe, and the left view and right view image frames are also determined to be keyframes. The keyframe sequence numbers of the left view, front view, and right view image frames are then set in ascending order, the node index corresponding to each keyframe sequence number is determined to form the mapping information, and the feature descriptors of the keyframes are determined. In positioning, the current front view, left view, and right view image frames are obtained and, based on the bag-of-words model, transformed into the corresponding current front view, left view, and right view frame bag-of-words vectors, respectively. Based on the bag-of-words vector of the current front view frame, the first candidate keyframe with the highest matching score to the current front view image frame is determined from the keyframe image database; based on the bag-of-words vector of the current left view frame, the second candidate keyframe with the highest matching score to the current left view image frame is determined; and based on the bag-of-words vector of the current right view frame, the third candidate keyframe with the highest matching score to the current right view image frame is determined.
If the matching score corresponding to the first candidate keyframe is greater than a first preset threshold, the first candidate keyframe is determined as the target keyframe. If it is less than a second preset threshold, the step of obtaining the current front view, left view, and right view image frames is executed again. If it is greater than the second preset threshold and less than the first preset threshold, the first, second, and third candidate keyframes are sorted by their keyframe numbers; if the sorted keyframe numbers are adjacent, the candidate keyframe in the middle position is determined as the target keyframe, and otherwise the step of obtaining the current front view, left view, and right view image frames is executed again.
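The three-camera decision rule can be sketched as a single function. `high` and `low` stand for the first and second preset thresholds, whose values the text leaves open. How a score exactly equal to a threshold is handled is not specified; this sketch (a hypothetical `select_target`) sends such scores to the adjacency check.

```python
from typing import Optional, Tuple

Candidate = Tuple[int, float]  # (Image_ID, matching score) returned by a BoW query


def select_target(front: Candidate, left: Candidate, right: Candidate,
                  high: float, low: float) -> Optional[int]:
    """Return the target keyframe Image_ID, or None to discard the frames and
    wait for the next front/left/right images (back to step S206)."""
    front_id, front_score = front
    if front_score > high:   # strong front-view match: accept it directly (S209)
        return front_id
    if front_score < low:    # weak match: discard and re-acquire (S210)
        return None
    # otherwise (including scores equal to a threshold, an assumption):
    # sort the three Image_IDs and require them to be consecutive (S211-S213)
    ids = sorted((left[0], front_id, right[0]))
    if ids[1] - ids[0] == 1 and ids[2] - ids[1] == 1:
        return ids[1]        # the middle keyframe becomes the target
    return None
```

Because mapping assigns consecutive Image_IDs to each left/front/right keyframe triple, three adjacent IDs indicate that the three cameras agree on one place (or on neighbouring keyframes), which is what makes a moderate front-view score acceptable.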
[0066] Specifically, three vision cameras are installed in front of, to the left and to the right of the mobile device. Compared with using only one vision camera, this embodiment can obtain more information about the surroundings of the mobile device.
[0067] In the mapping mode, the front view image frame, left view image frame, and right view image frame are first acquired from the three vision cameras respectively. Then, based on the odometry information, it is determined whether the front view image frame is a keyframe. If so, the left view image frame and right view image frame are also keyframes. Then, the keyframe sequence number Image_ID corresponding to the left view image frame, front view image frame, and right view image frame is assigned in ascending order, starting from 1. Then, the Image_ID of the keyframe is mapped and bound together with the Node_ID in the Cartographer algorithm to form mapping information, and the feature descriptors of the keyframes are extracted and stored in the feature descriptor container. Then, the feature descriptors of all keyframes are trained using the DBoW3 method to generate Vocabulary and Database, and then saved.
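The mapping-mode bookkeeping in paragraph [0067] (ascending Image_IDs starting from 1, a feature descriptor container, and Image_ID-to-Node_ID binding) might look like the following sketch. The class `KeyframeMap` and its fields are hypothetical, and binding all three keyframes of one capture to the same Cartographer Node_ID is an assumption the source does not state explicitly; descriptors are modeled as raw `bytes` values (for example 32-byte ORB descriptors).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KeyframeMap:
    """Hypothetical container for keyframe IDs, descriptors, and node bindings."""
    next_image_id: int = 1  # Image_IDs start from 1
    image_to_node: Dict[int, int] = field(default_factory=dict)
    descriptors: Dict[int, List[bytes]] = field(default_factory=dict)

    def add_triple(self, node_id: int, left_desc: List[bytes],
                   front_desc: List[bytes], right_desc: List[bytes]) -> Tuple[int, int, int]:
        """Assign ascending Image_IDs in left, front, right order and bind them to node_id."""
        ids = []
        for desc in (left_desc, front_desc, right_desc):
            image_id = self.next_image_id
            self.next_image_id += 1
            self.image_to_node[image_id] = node_id  # assumed: same node for all three
            self.descriptors[image_id] = desc
            ids.append(image_id)
        return (ids[0], ids[1], ids[2])
```

With this ordering, the front-view keyframe of each capture always sits in the middle of three consecutive Image_IDs.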
[0068] In the localization mode, the current front view image frame, current left view image frame, and current right view image frame acquired in real time by the three visual cameras are first converted with the Vocabulary into the bag-of-words vectors of the current front view frame, current left view frame, and current right view frame, respectively. The bag-of-words vector of the current front view frame is then indexed and queried in the Database to find the first candidate keyframe Image_ID, i.e., the keyframe recorded during mapping with the highest matching degree (highest score). Likewise, the bag-of-words vectors of the current left view frame and current right view frame are queried in the Database to find the second and third candidate keyframe Image_IDs with the highest matching scores, respectively. If the matching score corresponding to the first candidate keyframe is greater than the maximum threshold (i.e., ...), ... If the matching score of the first candidate keyframe is less than the minimum threshold (i.e., the second preset threshold), the current image frames are discarded and the process waits for the next image frame, i.e., step S206 is repeated. If the matching score of the first candidate keyframe lies between the minimum threshold and the maximum threshold, the three Image_IDs obtained from the BoW queries (left, middle, and right) are sorted in ascending or descending order. If the three Image_IDs are adjacent, the middle Image_ID is published to the Cartographer node; otherwise, the process waits for the next image frame, i.e., step S206 is repeated. Finally, the Cartographer node receives the Image_ID, determines the Node_ID bound to it, and obtains the pose of that Node_ID, which serves as the robot's prior pose to assist in robot localization.
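The final lookup in paragraph [0068], from the published Image_ID to its bound Node_ID and then to that node's pose, reduces to two dictionary lookups in the sketch below. The dictionaries stand in for whatever containers the Cartographer node actually uses; `prior_pose` and the planar `(x, y, yaw)` pose type are hypothetical, and no Cartographer API is implied.

```python
from typing import Dict, Optional, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, yaw), hypothetical planar pose


def prior_pose(image_id: int, image_to_node: Dict[int, int],
               node_poses: Dict[int, Pose2D]) -> Optional[Pose2D]:
    """Resolve Image_ID -> Node_ID -> pose; None if either binding is missing."""
    node_id = image_to_node.get(image_id)
    if node_id is None:
        return None
    return node_poses.get(node_id)
```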
[0069] It should be noted that there is no restriction on the execution order of step S208. That is, it can be executed before determining the matching score corresponding to the first candidate keyframe. For example, in step S207, while determining the first candidate keyframe with the highest matching score of the current front view image frame, the second candidate keyframe corresponding to the bag-of-words vector of the current left view frame is determined, and the third candidate keyframe corresponding to the bag-of-words vector of the current right view frame is determined. Alternatively, it can be executed after determining that the matching score corresponding to the first candidate keyframe is not greater than the first preset threshold and not less than the second preset threshold.
[0070] In some embodiments, step S202 includes: acquiring odometer information of the mobile device; and, if the odometer information determines that the mobile device has translated or rotated more than a preset value, determining the front view image frame as the keyframe, and determining that the left view image frame and the right view image frame are also keyframes. Specifically, it is determined whether the front view image frame is a keyframe based on the odometer information; if so, the left view image frame and the right view image frame are also determined to be keyframes.
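The odometry-based keyframe test of paragraph [0070] can be sketched as below, assuming planar `(x, y, yaw)` odometry poses measured relative to the pose at the last accepted keyframe. The function name and the separate translation and rotation thresholds are illustrative assumptions, since the source only speaks of a "preset value".

```python
import math
from typing import Tuple

Pose2D = Tuple[float, float, float]  # (x, y, yaw in radians), hypothetical odometry pose


def is_keyframe(last_kf_pose: Pose2D, current_pose: Pose2D,
                trans_threshold: float, rot_threshold: float) -> bool:
    """True if the device translated or rotated more than the thresholds since the last keyframe."""
    dx = current_pose[0] - last_kf_pose[0]
    dy = current_pose[1] - last_kf_pose[1]
    translation = math.hypot(dx, dy)
    dyaw = current_pose[2] - last_kf_pose[2]
    # Wrap the yaw difference to [-pi, pi] so that 3.1 -> -3.1 counts as a small turn.
    rotation = abs(math.atan2(math.sin(dyaw), math.cos(dyaw)))
    return translation > trans_threshold or rotation > rot_threshold
```

If the front view frame passes this test, the left and right view frames captured at the same time are accepted as keyframes too.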
[0071] The mobile device positioning method provided in this invention can acquire more information about the surroundings of the mobile device through visual cameras installed on the left, front, and right sides of the mobile device. During the positioning process, the query results Image_ID of the current image frames of the three visual cameras can be mutually verified to check whether the visual calculation result is correct. If the Image_IDs of the query results of the three camera images are adjacent, it proves that the visual calculation result is correct, thereby improving the accuracy of visual positioning.
[0072] Figure 3a is a flowchart illustrating the mapping mode provided in an embodiment of the present invention, and Figure 3b is a flowchart illustrating the positioning mode provided in an embodiment of the present invention. Three vision cameras are installed on the front, left, and right sides of the mobile device. The embodiments of the present invention are further described below with reference to Figure 3a and Figure 3b.
[0073] As shown in Figure 3a, the mapping mode is entered.
[0074] Step 1: Determine whether to start saving the image. If yes, proceed to the next step; otherwise, discard the captured image.
[0075] Step 2: Select keyframe images based on odometer information. First, determine if the current image from the forward-looking camera (i.e., the forward view image frame) is a keyframe. If it is, then determine the current images from the left and right cameras (i.e., the left view image frame and the right view image frame) as keyframes respectively. If the current image from the forward-looking camera is not a keyframe, discard the image. The determination of whether the current image from the forward-looking camera is a keyframe is based on odometer information. If the translation or rotation of the odometer is sufficiently large (exceeding a preset value), then the forward view image frame is determined to be a keyframe; otherwise, it is not.
[0076] Step 3: Assign Image_IDs to the three keyframes in ascending order. For the left, center, and right keyframes, start assigning values from Image_ID = 1.
[0077] Step 4: Extract feature descriptors for the three keyframes. Extract feature points from the three keyframes, calculate the feature descriptor for each keyframe, and save them into a feature descriptor container.
[0078] Step 5: The visual node publishes the Image_IDs of the three keyframes on a keyframe Image_ID topic; the Cartographer node subscribes to this topic and maps and binds each keyframe Image_ID to a Cartographer Node_ID.
[0079] Step 6: Determine whether to save the map. If yes, save the feature descriptors of all keyframes to a file, and then save the mapping between Image_ID and Node_ID to a file. If no, return to step 2.
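Step 6 (and claim 3) persists the keyframe descriptors and the Image_ID/Node_ID mapping so that the vocabulary can later be trained from file. A minimal sketch, assuming a single JSON file with hex-encoded binary descriptors; the source saves these items to files without specifying a format, and `save_map`/`load_map` are hypothetical names.

```python
import json
from typing import Dict, List, Tuple


def save_map(path: str, descriptors: Dict[int, List[bytes]],
             image_to_node: Dict[int, int]) -> None:
    """Write keyframe descriptors and Image_ID -> Node_ID bindings to one JSON file."""
    payload = {
        "descriptors": {str(k): [d.hex() for d in v] for k, v in descriptors.items()},
        "image_to_node": {str(k): v for k, v in image_to_node.items()},
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_map(path: str) -> Tuple[Dict[int, List[bytes]], Dict[int, int]]:
    """Read back what save_map wrote; JSON object keys are strings, so convert to int."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    descriptors = {int(k): [bytes.fromhex(h) for h in v]
                   for k, v in payload["descriptors"].items()}
    image_to_node = {int(k): int(v) for k, v in payload["image_to_node"].items()}
    return descriptors, image_to_node
```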
[0080] Step 7: Using the DBoW3 method, train the feature descriptors of all keyframes into a Vocabulary and a Database.
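DBoW3 clusters the training descriptors into a vocabulary tree of visual words and, under TF-IDF weighting, gives each word an inverse document frequency weight idf(w) = log(N / n_w), where N is the number of training images and n_w the number of images containing word w. The sketch below shows only this IDF step, assuming each keyframe's descriptors have already been quantized to integer word IDs; the clustering itself is omitted, and `compute_idf` is a hypothetical helper, not a DBoW3 function.

```python
import math
from typing import Dict, List


def compute_idf(keyframe_words: Dict[int, List[int]]) -> Dict[int, float]:
    """Map word ID -> log(N / n_w) over the keyframes (Image_ID -> list of word IDs)."""
    n_docs = len(keyframe_words)
    doc_freq: Dict[int, int] = {}
    for words in keyframe_words.values():
        for w in set(words):  # count each word at most once per keyframe
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: math.log(n_docs / n) for w, n in doc_freq.items()}
```

A word seen in every keyframe gets weight 0 and so cannot help discriminate between places.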
[0081] As shown in Figure 3b, the localization mode is entered.
[0082] Step 8: Determine if the Database exists. If it exists, load the Database directly; if it does not exist, read the feature descriptors of all keyframes from the file, train and save the Vocabulary and Database, and then load the Database.
[0083] Step 9: Acquire real-time images (the current front view image frame, the current left view image frame, and the current right view image frame) and extract feature descriptors from each of the three current image frames. ORB descriptors are used.
[0084] Step 10: Use Vocabulary to convert the current three frames into corresponding bag-of-words (BoW) vectors.
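Step 10 can be illustrated with a flat vocabulary: each binary descriptor is assigned to the word at minimum Hamming distance, and the word counts are turned into an L1-normalized TF-IDF vector. This is a simplified stand-in for DBoW3, which descends a vocabulary tree rather than scanning a flat word list; `to_bow_vector`, `hamming`, and the list-based vocabulary are assumptions for illustration.

```python
from typing import Dict, List


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def to_bow_vector(descriptors: List[bytes], words: List[bytes],
                  idf: List[float]) -> Dict[int, float]:
    """Quantize descriptors to nearest words; return an L1-normalized TF-IDF vector."""
    counts: Dict[int, int] = {}
    for d in descriptors:
        word_id = min(range(len(words)), key=lambda i: hamming(d, words[i]))
        counts[word_id] = counts.get(word_id, 0) + 1
    n = len(descriptors)
    # TF (count / n) times IDF; zero-weight words are dropped.
    vec = {w: (c / n) * idf[w] for w, c in counts.items() if idf[w] > 0}
    total = sum(vec.values())
    return {w: v / total for w, v in vec.items()} if total > 0 else {}
```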
[0085] Step 11: Index and query the BoW vectors of the current three camera images in the Database to obtain, for each image, the Image_ID of the most similar keyframe recorded during mapping (the one with the highest matching score).
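The Database query of step 11 can be sketched as a linear scan that scores the query vector against every keyframe vector with the L1 similarity used by the DBoW family, s(v, w) = 1 - 0.5 * ||v - w||_1 for L1-normalized vectors (1 for identical vectors, 0 for vectors with no words in common). DBoW3 accelerates this with an inverted index; the brute-force loop and the helper names `l1_score` and `query_database` are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

BowVector = Dict[int, float]  # word ID -> L1-normalized weight


def l1_score(v: BowVector, w: BowVector) -> float:
    """1 - 0.5 * ||v - w||_1; assumes both vectors are L1-normalized."""
    keys = set(v) | set(w)
    return 1.0 - 0.5 * sum(abs(v.get(k, 0.0) - w.get(k, 0.0)) for k in keys)


def query_database(query: BowVector,
                   database: Dict[int, BowVector]) -> Optional[Tuple[int, float]]:
    """Return (Image_ID, score) of the best-matching keyframe, or None if empty."""
    best: Optional[Tuple[int, float]] = None
    for image_id, vec in database.items():
        s = l1_score(query, vec)
        if best is None or s > best[1]:
            best = (image_id, s)
    return best
```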
[0086] Step 12: Determine whether the BoW matching score of the front view image frame is greater than the maximum threshold. If yes, directly publish the Image_ID of the front view image frame to Cartographer and execute step 15. If no, execute step 13.
[0087] Step 13: Determine whether the BoW matching score of the front view image frame is greater than the minimum threshold. If yes, proceed to step 14. If no, return to step 9 and wait for the next image frame to arrive.
[0088] Step 14: Sort the Image_IDs obtained from the BoW query of the left, middle, and right camera images. If the three Image_IDs are adjacent, publish the Image_ID in the middle of the sort to the Cartographer node and execute Step 15; if the three Image_IDs are not adjacent, return to Step 9 and wait for the next frame image to arrive.
[0089] Step 15: The Cartographer node queries the bound Node_ID based on the Image_ID, determines the pose of the Node_ID, and publishes the visual positioning result.
[0090] Step 16: Determine whether to exit positioning mode. If yes, exit positioning mode and release memory. If no, return to step 9, wait for the next frame image, and reposition.
[0091] In summary, the embodiments of this invention install three vision cameras on the front, left, and right of the robot and fuse the DBoW3 method with Cartographer localization; the implementation is divided into mapping-mode steps and localization-mode steps. The mapping mode mainly involves selecting keyframe images, publishing the keyframe Image_IDs, binding each keyframe Image_ID to a Cartographer Node_ID, and training and saving the Vocabulary and Database when the map is saved. The localization mode mainly involves loading the keyframe image database and indexing and querying the images acquired by the robot in real time against it. Each query returns the Image_ID of the most similar keyframe in the database together with a similarity score (the higher the score, the more similar the images). The robot's pose is then obtained from the Node_ID bound to that keyframe Image_ID. In other words, this embodiment uses three vision cameras (left, center, and right) to acquire more information about the robot's surroundings, and the Image_IDs returned by the three camera queries corroborate one another, verifying the visual calculation result: if the three Image_IDs are adjacent, the result is taken as correct, which improves the accuracy of visual localization. Compared with solutions that load a pre-trained database offline, this embodiment extracts feature descriptors from keyframes during mapping and generates the database when the map is saved, so each mapped scene has its own database. Compared with Cartographer using only a single laser sensor for mapping and localization, fusing DBoW3 allows the robot to adapt to complex and changing environments, enhancing its environmental adaptability.
[0092] Figure 4 is a schematic structural diagram of a positioning device for a mobile device provided in an embodiment of the present invention. As shown in Figure 4, the positioning device of the mobile device includes an acquisition module 401, a determination module 402, a training module 403, a matching module 404, and a calculation module 405.
[0093] The acquisition module 401 is used to acquire image frames through a vision camera installed on the mobile device and determine keyframes from the image frames; the determination module 402 is used to determine the mapping information and feature descriptors of the keyframes, wherein the mapping information represents the node index corresponding to each keyframe, and the node index is used to calculate pose information based on the laser simultaneous localization and mapping (SLAM) algorithm; the training module 403 is used to obtain a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes; the acquisition module 401 is also used to acquire the current image frame and convert the current image frame into a bag-of-words vector based on the bag-of-words model; the matching module 404 is used to determine the candidate keyframe with the highest matching score to the current image frame from the keyframe image database based on the bag-of-words vector of the current frame, and determine the target keyframe based on the candidate keyframe; the calculation module 405 is used to determine the target node index corresponding to the target keyframe based on the mapping information, and calculate the pose information of the mobile device based on the target node index for mobile device positioning.
[0094] In some embodiments, the acquisition module 401 is specifically configured to: acquire corresponding left-view image frames, front-view image frames, and right-view image frames respectively through visual cameras installed on the left, front, and right sides of the mobile device; and, if the front-view image frame meets preset conditions, determine that the front-view image frame is the key frame, and determine that both the left-view image frame and the right-view image frame are the key frames.
[0095] In some embodiments, the acquisition module 401 is further configured to: set the keyframe numbers corresponding to the left view image frame, the front view image frame, and the right view image frame in ascending order; the determination module 402 is further configured to: determine the node index corresponding to each keyframe number.
[0096] In some embodiments, the acquisition module 401 is specifically used to: acquire the current front view image frame, the current left view image frame, and the current right view image frame, and convert them, based on the bag-of-words model, into the corresponding bag-of-words vectors of the current front view frame, the current left view frame, and the current right view frame, respectively; the matching module 404 is specifically used to: determine, based on the bag-of-words vector of the current front view frame, a first candidate keyframe with the highest matching score to the current front view image frame from the keyframe image database; if the matching score corresponding to the first candidate keyframe is greater than a first preset threshold, determine the first candidate keyframe as the target keyframe; and if the matching score corresponding to the first candidate keyframe is less than a second preset threshold, have the acquisition module 401 perform the step of acquiring the current front view image frame, the current left view image frame, and the current right view image frame.
[0097] In some embodiments, the matching module 404 is further configured to: determine, based on the bag-of-words vector of the current left view frame, a second candidate keyframe with the highest matching score to the current left view image frame from the keyframe image database; determine, based on the bag-of-words vector of the current right view frame, a third candidate keyframe with the highest matching score to the current right view image frame from the keyframe image database; if the matching score corresponding to the first candidate keyframe is greater than a second preset threshold and less than a first preset threshold, then sort the first candidate keyframe, the second candidate keyframe, and the third candidate keyframe according to their corresponding keyframe numbers; if the keyframe numbers corresponding to the sorted three candidate keyframes are adjacent, then determine the candidate keyframe located in the middle position as the target keyframe; if the keyframe numbers corresponding to the sorted three candidate keyframes are not adjacent, then the acquisition module 401 performs the step of acquiring the current front view image frame, the current left view image frame, and the current right view image frame.
[0098] In some embodiments, the acquisition module 401 is specifically used to: acquire the odometer information of the mobile device; and, if it is determined from the odometer information that the translation or rotation of the mobile device is greater than a preset value, determine the front view image frame as the key frame, and determine that the left view image frame and the right view image frame are both key frames.
[0099] In some embodiments, the determining module 402 is further configured to: save the mapping information and feature descriptors of the keyframes to a preset file; the training module 403 is specifically configured to: read the feature descriptors of the keyframes from the preset file, and generate a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes.
[0100] The positioning device for mobile devices provided in the embodiments of the present invention has a similar implementation principle and technical effect to the above embodiments, and will not be described again here.
[0101] As shown in Figure 5, this embodiment of the invention provides an electronic device including a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other via the communication bus 504.
[0102] Memory 503 is used to store computer programs;
[0103] In one embodiment of the present invention, when the processor 501 executes the program stored in the memory 503, it implements the steps of the positioning method for a mobile device provided in any of the foregoing method embodiments.
[0104] The electronic device provided in this embodiment of the invention has a similar implementation principle and technical effect to the above embodiments, and will not be described again here.
[0105] The aforementioned memory 503 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, or ROM. Memory 503 has storage space for program code used to perform any of the method steps described above. For example, the storage space for program code may include individual program codes for implementing the various steps in the methods described above. This program code can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, optical discs (CDs), memory cards, or floppy disks. Such computer program products are typically portable or fixed storage units. The storage unit may have storage segments or storage spaces arranged similarly to memory 503 in the aforementioned electronic device. The program code may be compressed, for example, in a suitable form. Typically, the storage unit includes programs for performing the method steps according to embodiments of the invention, i.e., code that can be read by a processor such as 501, which, when run by the electronic device, causes the electronic device to perform the various steps in the methods described above.
[0106] Embodiments of the present invention also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program, which, when executed by a processor, implements the steps of the positioning method for a mobile device as described above.
[0107] The computer-readable storage medium may be included in the device / apparatus described in the above embodiments; or it may exist independently and not assembled into the device / apparatus. The computer-readable storage medium carries one or more programs that, when executed, implement the method according to the embodiments of the present invention.
[0108] According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as including, but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In the present invention, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
[0109] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0110] The above description is merely a specific embodiment of the present invention, enabling those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.
Claims
1. A positioning method for a mobile device, characterized in that it comprises: Image frames are acquired using a vision camera installed on a mobile device, and keyframes are determined from the image frames. Determine the mapping information and feature descriptor of the key frame, wherein the mapping information is used to represent the node index corresponding to the key frame, and the node index is used to calculate pose information based on the laser simultaneous localization and mapping (SLAM) algorithm; Based on the feature descriptors of the keyframes, a bag-of-words model and a keyframe image database are obtained; Obtain the current image frame and convert the current image frame into a bag-of-words vector based on the bag-of-words model; Based on the bag-of-words vector of the current frame, the candidate keyframe with the highest matching score to the current image frame is determined from the keyframe image database, and the target keyframe is determined based on the candidate keyframe; The target node index corresponding to the target keyframe is determined based on the mapping information, and the pose information of the mobile device is calculated based on the target node index to locate the mobile device. The step of acquiring image frames using a vision camera installed on a mobile device and determining keyframes from the image frames includes: The left-view image frame, front-view image frame, and right-view image frame are acquired by the vision cameras installed on the left, front, and right sides of the mobile device, respectively. Obtain the odometer information of the mobile device; If the odometer information indicates that the mobile device has translated or rotated more than a preset value, the front view image frame is determined to be the key frame, and the left view image frame and the right view image frame are also determined to be the key frames. 
The keyframe numbers corresponding to the left view image frame, the front view image frame, and the right view image frame are set in ascending order. Determining the mapping information of the key frames includes: determining the node index corresponding to each key frame sequence number; The step of obtaining the current image frame and converting the current image frame into a current frame bag-of-words vector based on the bag-of-words model includes: Obtain the current front view image frame, the current left view image frame, and the current right view image frame, and based on the bag-of-words model, convert the current front view image frame, the current left view image frame, and the current right view image frame into the corresponding bag-of-words vectors of the current front view frame, the current left view frame, and the current right view frame, respectively. The step of determining the candidate keyframe with the highest matching score to the current image frame from the keyframe image database based on the bag-of-words vector of the current frame, and determining the target keyframe based on the candidate keyframe, includes: Based on the bag-of-words vector of the current front view frame, the first candidate keyframe with the highest matching score with the current front view image frame is determined from the keyframe image database. If the matching score corresponding to the first candidate keyframe is greater than the first preset threshold, then the first candidate keyframe is determined to be the target keyframe. If the matching score corresponding to the first candidate keyframe is less than the second preset threshold, then the step of obtaining the current front view image frame, the current left view image frame, and the current right view image frame is executed.
2. The method according to claim 1, characterized in that the method further includes: Based on the bag-of-words vector of the current left view frame, the second candidate keyframe with the highest matching score with the current left view image frame is determined from the keyframe image database; based on the bag-of-words vector of the current right view frame, the third candidate keyframe with the highest matching score with the current right view image frame is determined from the keyframe image database. If the matching score corresponding to the first candidate keyframe is greater than the second preset threshold and less than the first preset threshold, then the first candidate keyframe, the second candidate keyframe and the third candidate keyframe are sorted according to their corresponding keyframe numbers. If the keyframe numbers of the three candidate keyframes after sorting are adjacent, then the candidate keyframe in the middle position is determined as the target keyframe. If the keyframe numbers corresponding to the three candidate keyframes after sorting are not adjacent, then the step of obtaining the current front view image frame, the current left view image frame, and the current right view image frame is executed.
3. The method according to claim 1 or 2, characterized in that, after determining the mapping information and feature descriptors of the keyframe, the method further includes: saving the mapping information and feature descriptors of the keyframes to a preset file; and the step of obtaining the bag-of-words model and keyframe image database based on the feature descriptors of the keyframes includes: reading the feature descriptors of the keyframes from the preset file, and generating a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes.
4. A positioning device for a mobile device, characterized in that it comprises: An acquisition module is used to acquire image frames through a vision camera installed on a mobile device and determine keyframes from the image frames; The determination module is used to determine the mapping information and feature descriptor of the key frame, wherein the mapping information is used to represent the node index corresponding to the key frame, and the node index is used to calculate the pose information based on the laser simultaneous localization and mapping (SLAM) algorithm. The training module is used to obtain a bag-of-words model and a keyframe image database based on the feature descriptors of the keyframes; The acquisition module is further configured to acquire the current image frame and convert the current image frame into a current frame bag-of-words vector based on the bag-of-words model; The matching module is used to determine the candidate keyframe with the highest matching score to the current image frame from the keyframe image database based on the bag-of-words vector of the current frame, and to determine the target keyframe based on the candidate keyframe; The calculation module is used to determine the target node index corresponding to the target keyframe based on the mapping information, and to calculate the pose information of the mobile device based on the target node index for mobile device positioning. The acquisition module is specifically used for: The left-view image frame, front-view image frame, and right-view image frame are acquired by the vision cameras installed on the left, front, and right sides of the mobile device, respectively. 
Obtain the odometer information of the mobile device; If the odometer information indicates that the mobile device has translated or rotated more than a preset value, the front view image frame is determined to be the key frame, and the left view image frame and the right view image frame are also determined to be the key frames. The keyframe numbers corresponding to the left view image frame, the front view image frame, and the right view image frame are set in ascending order. The determining module is further configured to: determine the node index corresponding to each keyframe sequence number; The acquisition module is specifically used for: Obtain the current front view image frame, the current left view image frame, and the current right view image frame, and based on the bag-of-words model, convert the current front view image frame, the current left view image frame, and the current right view image frame into the corresponding bag-of-words vectors of the current front view frame, the current left view frame, and the current right view frame, respectively. The matching module is specifically used for: Based on the bag-of-words vector of the current front view frame, the first candidate keyframe with the highest matching score with the current front view image frame is determined from the keyframe image database. If the matching score corresponding to the first candidate keyframe is greater than the first preset threshold, then the first candidate keyframe is determined to be the target keyframe. If the matching score corresponding to the first candidate keyframe is less than the second preset threshold, the acquisition module is used to perform the steps of acquiring the current front view image frame, the current left view image frame, and the current right view image frame.
5. An electronic device, characterized in that it includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor, when executing the program stored in the memory, implements the steps of the positioning method of the mobile device according to any one of claims 1-3.
6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the positioning method for a mobile device as described in any one of claims 1-3.