A dynamic scene autonomous positioning and mapping method based on a fusion bag-of-words model
By integrating RGB cameras, LiDAR, and GPS/IMU sensors, a point cloud map is constructed and dynamic obstacles are removed. Using bag-of-words modeling and NDT registration technology, high-precision autonomous localization and mapping in autonomous driving is achieved, solving the problem of inaccurate localization caused by signal obstruction and cumulative errors.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JILIN UNIVERSITY
- Filing Date
- 2023-11-20
- Publication Date
- 2026-06-23
AI Technical Summary
In autonomous driving, differential GPS signals are lost due to obstructions such as tall buildings and tunnels, IMU accumulated errors lead to low positioning accuracy, visual sensors have poor positioning performance in dark scenes, the SLAM framework cannot obtain absolute pose, and GPS signal obstruction causes a lack of positioning information for autonomous parking.
By integrating RGB cameras, LiDAR, and GPS/IMU sensors, a point cloud map is constructed using the laser-inertial SLAM algorithm. Dynamic obstacles are removed using the YOLOv5 model, bag-of-words vectors are generated and the mapping relationship is saved. The initial pose is provided by combining NDT registration and the bag-of-words model, and the vehicle positioning information is output in real time.
It achieves high-precision autonomous localization and mapping in dynamic scenes, solves the problems of SLAM framework's inability to obtain absolute pose and GPS signal occlusion causing missing localization information, and improves the stability and accuracy of localization.
Smart Images

Figure CN117589151B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of autonomous driving technology, and in particular relates to a dynamic scene autonomous localization and mapping method that integrates a bag-of-words model. Background Technology
[0002] Precise positioning is a prerequisite for autonomous driving, enabling vehicle path planning and motion control. Differential GPS sensors experience prolonged signal loss due to the influence of tall buildings, tunnels, and trees in parking scenarios. When GPS signals are unstable, the integrated navigation and positioning algorithm is essentially operating only with the IMU (Integrated Measurement Unit), and the accumulated error inherent in the IMU results in low positioning accuracy. Positioning and mapping schemes that extract features using visual sensors are ineffective in dark environments and struggle to achieve centimeter-level accuracy, failing to meet the requirements for autonomous parking positioning.
[0003] With the rapid development of the laser industry, the price of LiDAR is decreasing. LiDAR has low requirements for lighting conditions and is not affected by objective factors such as day / night or weather. Furthermore, LiDAR can perform high-precision ranging within a short distance and offers high stability in localization and mapping. Therefore, the application of LiDAR in autonomous driving systems has gained widespread recognition from academia and industry. For example, LiDAR SLAM frameworks such as LeGO-LOAM and LIO-SAM maintain good robustness in dynamic scenes. However, SLAM frameworks cannot obtain absolute pose, and GPS signals are easily blocked, leading to missing localization information for autonomous parking. To address this, we propose a dynamic scene autonomous localization and mapping method that integrates a bag-of-words model. Summary of the Invention
[0004] The purpose of this invention is to provide a dynamic scene autonomous localization and mapping method that integrates the bag-of-words model, aiming to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention provides the following technical solution:
[0006] A dynamic scene autonomous localization and mapping method integrating the bag-of-words model includes the following steps:
[0007] Step 1: Activate the RGB camera, LiDAR, and GPS / IMU sensors to record real-world data of the parking lot environment;
[0008] Step 2: Construct a point cloud map of the parking lot using the laser-inertial SLAM algorithm, and align the point cloud map to the geodetic coordinate system using GPS / IMU sensors; generate visual keyframes during the point cloud map construction process, and save the poses corresponding to the visual keyframes at the same time.
[0009] Step 3: Use the YOLOv5 model to perform object detection, remove movable obstacles from the visual keyframes, generate bag-of-words vectors from the keyframes with removed dynamic features, and map the bag-of-words vectors to the poses, saving them as a bag-of-words model.
[0010] Step 4: During the localization process, the constructed point cloud map is filtered and rasterized, and the preprocessed point cloud map is loaded; image data is acquired using the camera, and the dynamic features of the current frame are removed using the YOLOv5 model to generate bag-of-words vectors. These vectors are then retrieved in the constructed bag-of-words model to map the vehicle's location and initialize it.
[0011] Step 5: After completing the positioning initialization, use NDT registration to register the current point cloud with the point cloud map, and use the position provided by the bag-of-words model and GPS / IMU fusion as the initial value for registration, and output the vehicle's pose information in real time.
[0012] Furthermore, the specific steps of step 1 include:
[0013] Step 11: Open the ROS boot files for the RGB camera, LiDAR, and GPS / IMU;
[0014] Step 12: Record the ROS topic from multiple sensors and save it as a rosbag file.
[0015] Furthermore, the specific steps of step 2 include:
[0016] Step 21: Preprocess the point cloud scan frame to remove outliers, perform voxel filtering, and use IMU pre-integration to distort the point cloud.
[0017] Step 22: Based on the curvature c of the point cloud, extract line features and surface features to perform point cloud registration, obtain the point cloud pose, and construct a laser odometry.
[0018] Step 23: Construct ScanContext loop closure detection factors, fuse odometry information and IMU information for factor graph optimization, and trigger graph optimization again to perform global pose correction when a loop closure is detected;
[0019] Step 24: Align the constructed point cloud map to the geodetic coordinate system using GPS and IMU information;
[0020] Step 25: During the mapping process, select one frame of image data as a visual keyframe at regular intervals and save the visual keyframe and its corresponding pose.
[0021] Furthermore, step 3 specifically includes the following steps:
[0022] Step 31: Import the selected visual keyframe images into the YOLOv5 model for target detection, obtain the pixel range of movable obstacles, and set the range as a dynamic range.
[0023] Step 32: Extract ORB feature points from the visual keyframes, remove feature points in the dynamic range, extract ORB feature points uniformly, and calculate feature descriptors.
[0024] Step 33: Generate bag-of-words vectors from the visual keyframes after removing dynamic features using DBoW3. While constructing the bag-of-words vectors, subscribe to the pose information of the keyframes and save them into the bag-of-words model.
[0025] Furthermore, step 33 specifically includes the following steps:
[0026] Step 331: Generate static bag-of-words vectors from the visual keyframes after removing dynamic features using the DBOW3 bag-of-words library;
[0027] Step 332: Based on the timestamp, establish a mapping relationship between the bag-of-words vector and the pose information corresponding to the visual keyframe;
[0028] Step 333: Save the bag-of-words vectors and their corresponding pose information as a bag-of-words model for relocalization.
[0029] Furthermore, step 4 includes the following specific steps:
[0030] Step 41: Start the localization process, activate the camera, LiDAR and GPS / IMU sensors, and perform voxel filtering on the constructed 3D point cloud map;
[0031] Step 42: Load the preprocessed 3D point cloud map into Rviz;
[0032] Step 43: Use the camera to collect the current image data, input it into the YOLOv5 model, obtain the pixel regions of vehicles and pedestrians, and define them as dynamic regions;
[0033] Step 44: Remove ORB feature points in the dynamic region, homogenize the extracted feature points, calculate feature descriptors, and generate bag-of-words vectors using DBoW3.
[0034] Step 45: Map the bag-of-words vector of the current frame to the pose of the current frame, encapsulate it into a geometry_msgs type, publish the ROS pose topic, and complete the vehicle localization initialization.
[0035] Furthermore, step 5 includes the following specific steps:
[0036] Step 51: After the point cloud map is loaded, initialize it using the initial pose information;
[0037] Step 52: Use IMU pre-integration information as the initial attitude value for NDT point cloud registration, GPS information as the initial position value for point cloud registration, and use the bag-of-words model as compensation for the initial registration value.
[0038] Step 53: Perform scan-to-map registration of point cloud and map frame by frame, and output vehicle positioning data.
[0039] Compared with the prior art, the beneficial effects of the present invention are:
[0040] This dynamic scene autonomous localization and mapping method, which integrates bag-of-words model, uses multi-line LiDAR and depth camera to obtain surrounding environmental information. Then, it uses optical flow to track feature points in the image to mark dynamic points. Finally, it uses point cloud data after removing dynamic and outlier points to estimate vehicle pose and map the environment. This method solves the problems that the SLAM framework cannot obtain absolute pose and that GPS and other signals are easily blocked, resulting in the loss of autonomous parking positioning information. Attached Figure Description
[0041] Figure 1 This is a schematic diagram of the steps of the present invention.
[0042] Figure 2 This is a flowchart of the present invention.
[0043] Figure 3 This is a schematic diagram illustrating the construction of a point cloud map in Rviz according to the present invention.
[0044] Figure 4 This is a schematic diagram of autonomous localization in a parking scenario, which incorporates a semantic bag-of-words model, as described in this invention. Detailed Implementation
[0045] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0046] The specific implementation of the present invention will be described in detail below with reference to specific embodiments.
[0047] like Figure 1 As shown, an embodiment of the present invention provides a dynamic scene autonomous localization and mapping method based on a bag-of-words model, comprising the following steps:
[0048] Step 1: Activate the RGB camera, LiDAR, and GPS / IMU sensors to record real-world data of the parking lot environment;
[0049] In a preferred embodiment of the present invention, step 1 is specifically implemented as follows:
[0050] 1.1 Launch the launch files for the camera, LiDAR, and GPS / IMU sensors;
[0051] 1.2 Open the Rviz visualization tool, import the relevant sensor topics, and visualize the recorded data;
[0052] 1.3 Record rosbag data from multiple sensors using the rosbag record command.
[0053] Step 2: Construct a point cloud map of the parking lot using the laser-inertial SLAM algorithm, and align the point cloud map to the geodetic coordinate system using GPS / IMU sensors. During the construction of the point cloud map, visual keyframes are generated and the poses corresponding to the visual keyframes are saved at the same time.
[0054] In a preferred embodiment of the present invention, step 2 is implemented as follows:
[0055] Step 21: Preprocess the point cloud scan frame to remove outliers, perform voxel filtering, and use IMU pre-integration to distort the point cloud.
[0056] Step 22: Based on the curvature c of the point cloud, extract line and surface features for point cloud registration to obtain the point cloud pose and construct a laser odometry system; Step 22 specifically includes the following steps:
[0057] Step 221: Calculate the curvature c based on 10 points near the point cloud. Set the features with larger curvature c as line features and the features with smaller curvature c as surface features. The specific extraction method is as follows:
[0058]
[0059] Where g k,L i Let S represent the i-th point on the L-th scan line of the k-th frame of the lidar, S represent the set of 10 points with different depths, |S| represents the potential of the set S, and ||.|| represents the L2 norm in space. The smoothness of each point can be obtained through this calculation method.
[0060] In actual calculations, the calculations can usually be simplified because smoothness does not require absolute values, but only relative comparisons. Therefore, in actual calculations, only the L2 norm of the depth difference between the set S and the point to be calculated can be calculated. Then, by sorting them from smallest to largest, the angular features and surface features can be compared. This can reduce the amount of calculation and obtain the smoothness of the feature processing in the later stage.
[0061] Step 222: Extract line features and surface features respectively, perform ICP point cloud registration, and obtain the vehicle's pose transformation. Specific features include:
[0062] (a) NDT has a better initialization effect, runs faster than most point cloud matching algorithms, and is more stable in operation;
[0063] (b) When a relatively accurate initial value is provided, NDT can quickly register point clouds and output accurate pose transformations.
[0064] Step 23: Construct Scan Context loop closure detection factors, fuse odometry information and IMU information for factor graph optimization, and trigger graph optimization again to perform global pose correction when a loop closure is detected;
[0065] Step 24: Align the constructed point cloud map to the geodetic coordinate system using GPS and IMU information;
[0066] The specific alignment method is as follows:
[0067] (a) Record the vehicle attitude angles of the initial mapping frame when the mapping algorithm starts;
[0068] (b) Based on the attitude angle of the GPS-calibrated northeast-sky coordinate system, the constructed global map is converted to the geodetic coordinate system;
[0069] Step 25: During the mapping process, select one frame of image data as a visual keyframe at regular intervals and save the visual keyframe and its corresponding pose.
[0070] Step 3: Use the YOLOv5 model to perform object detection, remove movable obstacles from the visual keyframes, generate bag-of-words vectors from the keyframes with removed dynamic features, and map the bag-of-words vectors to the poses, saving them as a bag-of-words model.
[0071] In a preferred embodiment of the present invention, step 3 is implemented as follows:
[0072] Step 31: Import the selected visual keyframe images into the YOLOv5 model for object detection, obtain the pixel range of movable obstacles, and set the range as a dynamic range; Step 31 specifically includes the following steps:
[0073] Step 311: Convert the images of the visual keyframes into Cv::Mat format, import them into the YOLOv5 model, and perform object detection.
[0074] Step 312: Target detection will determine the type of the target based on the visual features in the image. When the type is detected as a vehicle, pedestrian, or other features that are likely to exist in road conditions, it will be determined as a movable obstacle and the interval will be set as a dynamic interval.
[0075] Step 313: Output the pixel coordinates of the dynamic range in real time as a two-dimensional array for subsequent feature point removal.
[0076] Step 32: Extract ORB feature points from the visual keyframes, remove feature points within the dynamic range, extract ORB feature points uniformly, and calculate feature descriptors. This step removes dynamic features from the visual keyframes, retaining only static features. This ensures high robustness during global bag-of-words vector retrieval, unaffected by vehicles or pedestrians. Step 32 specifically includes the following steps:
[0077] Step 321: Divide each frame of image into an n*n grid, remove the grids in the dynamic range, and obtain the number of static grids.
[0078] Step 322: Using the total number of feature points to be extracted and the static grid, calculate the number of feature points to be extracted for each static grid.
[0079] Step 323: Extract ORB feature points and calculate the corresponding feature descriptors.
[0080] Step 33: Generate bag-of-words vectors from the visual keyframes after removing dynamic features using DBoW3. While constructing the bag-of-words vectors, subscribe to the pose information of the keyframes and save them into the bag-of-words model. Step 33 specifically includes the following steps:
[0081] Step 331: Generate static bag-of-words vectors from the visual keyframes after removing dynamic features using the DBOW3 bag-of-words library;
[0082] Step 332: Based on the timestamp, establish a mapping relationship between the bag-of-words vector and the pose information corresponding to the visual keyframe;
[0083] Step 333: Save the bag-of-words vectors and their corresponding pose information as a bag-of-words model for relocalization.
[0084] Step 4: During the localization process, the constructed point cloud map is filtered and rasterized (this removes noise and invalid information from the map, thereby improving the accuracy of localization. Rasterization converts continuous map data into discrete grids, facilitating subsequent calculations). The preprocessed point cloud map is then loaded. Image data is acquired using a camera, and the YOLOv5 model is used to remove dynamic features from the current frame (this can identify and remove dynamic features in the current frame, such as moving vehicles and pedestrians, helping to filter out dynamic objects unrelated to static environmental features and improving the accuracy of bag-of-words model retrieval). A bag-of-words vector is generated, and the vehicle's location is mapped and initialized in the constructed bag-of-words model.
[0085] In a preferred embodiment of the present invention, step 4 includes the following specific steps:
[0086] Step 41: Start the localization process, activate the camera, LiDAR and GPS / IMU sensors, and perform voxel filtering on the constructed 3D point cloud map;
[0087] Step 42: Load the preprocessed 3D point cloud map into Rviz;
[0088] Step 43: Use the camera to collect the current image data, input it into the YOLOv5 model, obtain the pixel regions of vehicles and pedestrians, and define them as dynamic regions;
[0089] Step 44: Remove ORB feature points from the dynamic region, homogenize the extracted feature points, calculate feature descriptors, and generate bag-of-words vectors using DBoW3; Step 44 specifically includes the following steps:
[0090] Step 441: Divide the current frame image into an n*n grid, remove the grids in the dynamic range, and obtain the number of static grids and the number of feature points to be extracted from each static grid.
[0091] Step 442: Uniformly extract ORB features of the current frame and calculate feature descriptors;
[0092] Step 443: Construct the DBoW3 bag-of-words vector based on the calculated feature descriptors.
[0093] Step 45: Map the bag-of-words vector of the current frame to the pose of the current frame, encapsulate it into a geometry_msgs type, publish the ROS pose topic, and complete the vehicle localization initialization.
[0094] Step 5: After completing the localization initialization, use NDT registration to register the current point cloud with the point cloud map (using NDT registration for scan-to-map registration can obtain the absolute pose). Use the position provided by the fusion of the bag-of-words model and GPS / IMU as the initial value for registration (the bag-of-words model is used to predict the initial pose of the reference point cloud, which can convert image features into high-dimensional vector representations and find the best matching image features through the bag-of-words model, providing accurate initial values in parking scenarios with weak GPS signals). Output the vehicle's pose information in real time.
[0095] In a preferred embodiment of the present invention, step 5 includes the following specific steps:
[0096] Step 51: After the point cloud map is loaded, initialize it using the initial pose information;
[0097] Step 52: Use IMU pre-integration information as the initial attitude value for NDT point cloud registration, and GPS information as the initial position value for point cloud registration. In scenarios with weak GPS, use the bag-of-words model as compensation for the initial registration value.
[0098] Step 53: Perform scan-to-map registration of point cloud and map frame by frame to output high-precision positioning data of vehicle.
[0099] The above are merely preferred embodiments of the present invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention, and these should also be considered within the scope of protection of the present invention. These modifications and improvements will not affect the effectiveness of the implementation of the present invention or the practicality of the patent.
Claims
1. A method for dynamic scene autonomous localization and mapping that integrates a bag-of-words model, characterized in that, Includes the following steps: Step 1: Activate the RGB camera, LiDAR, and GPS / IMU sensors to record real-world data of the parking lot environment; Step 2: Construct a point cloud map of the parking lot using the laser-inertial SLAM algorithm, and align the point cloud map to the geodetic coordinate system using GPS / IMU sensors; generate visual keyframes during the point cloud map construction process, and save the poses corresponding to the visual keyframes at the same time. Step 3: Use the YOLOv5 model to perform object detection, remove movable obstacles from the visual keyframes, generate bag-of-words vectors from the keyframes with removed dynamic features, and map the bag-of-words vectors to the poses, saving them as a bag-of-words model. Step 4: During the localization process, the constructed point cloud map is filtered and rasterized, and the preprocessed point cloud map is loaded; image data is acquired using the camera, and the dynamic features of the current frame are removed using the YOLOv5 model to generate bag-of-words vectors. These vectors are then retrieved in the constructed bag-of-words model to map the vehicle's location and initialize it. Step 5: After completing the positioning initialization, the current point cloud is registered with the point cloud map using NDT registration. The IMU pre-integration information is used as the initial attitude value for NDT point cloud registration, and the GPS information is used as the initial position value for point cloud registration. The bag-of-words model is used as compensation for the initial registration value, and the vehicle's pose information is output in real time.
2. The dynamic scene autonomous localization and mapping method based on the fusion bag-of-words model according to claim 1, characterized in that, The specific steps of step 1 include: Step 11: Open the ROS boot files for the RGB camera, LiDAR, and GPS / IMU; Step 12: Record the ROS topic from multiple sensors and save it as a rosbag file.
3. The dynamic scene autonomous localization and mapping method based on the fusion bag-of-words model according to claim 1, characterized in that, The specific steps of step 2 include: Step 21: Preprocess the point cloud scan frame to remove outliers, perform voxel filtering, and use IMU pre-integration to distort the point cloud. Step 22: Based on the curvature c of the point cloud, extract line features and surface features to perform point cloud registration, obtain the point cloud pose, and construct a laser odometry. Step 23: Construct Scan Context loop closure detection factors, fuse odometry information and IMU information for factor graph optimization, and trigger graph optimization again to perform global pose correction when a loop closure is detected; Step 24: Align the constructed point cloud map to the geodetic coordinate system using GPS and IMU information; Step 25: During the mapping process, select one frame of image data as a visual keyframe at regular intervals and save the visual keyframe and its corresponding pose.
4. The dynamic scene autonomous localization and mapping method based on the fusion bag-of-words model according to claim 1, characterized in that, The specific steps of step 3 include: Step 31: Import the selected visual keyframe images into the YOLOv5 model for target detection, obtain the pixel range of movable obstacles, and set the range as a dynamic range. Step 32: Extract ORB feature points from the visual keyframes, remove feature points in the dynamic range, extract ORB feature points uniformly, and calculate feature descriptors. Step 33: Generate bag-of-words vectors from the visual keyframes after removing dynamic features using DBoW3. While constructing the bag-of-words vectors, subscribe to the pose information of the keyframes and save them into the bag-of-words model.
5. The dynamic scene autonomous localization and mapping method based on the fusion bag-of-words model according to claim 4, characterized in that, The specific steps of step 33 include: Step 331: Generate static bag-of-words vectors from the visual keyframes after removing dynamic features using the DBOW3 bag-of-words library; Step 332: Based on the timestamp, establish a mapping relationship between the bag-of-words vector and the pose information corresponding to the visual keyframe; Step 333: Save the bag-of-words vectors and their corresponding pose information as a bag-of-words model for relocalization.
6. The dynamic scene autonomous localization and mapping method based on the fusion bag-of-words model according to claim 1, characterized in that, The specific steps of step 4 include: Step 41: Start the localization process, activate the camera, LiDAR and GPS / IMU sensors, and perform voxel filtering on the constructed 3D point cloud map; Step 42: Load the preprocessed 3D point cloud map into Rviz; Step 43: Use the camera to collect the current image data, input it into the YOLOv5 model, obtain the pixel regions of vehicles and pedestrians, and define them as dynamic regions; Step 44: Remove ORB feature points in the dynamic region, homogenize the extracted feature points, calculate feature descriptors, and generate bag-of-words vectors using DBoW3. Step 45: Map the bag-of-words vector of the current frame to the pose of the current frame, encapsulate it into a geometry_msgs type, publish the ROS pose topic, and complete the vehicle localization initialization.