A method of updating a map and related devices
By calculating the similarity between the location image and the comparison image, and using a multi-task network architecture to update the high-precision map, the problem of the high-precision map not being updated in a timely manner is solved, thus improving the accuracy and stability of VPS positioning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2021-02-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing high-precision maps cannot be updated in a timely manner when the real-world environment changes, resulting in inaccurate or failed VPS positioning, which affects the stability and accuracy of the positioning system.
By acquiring localized images and their location information, calculating image similarity, determining whether the environment has changed, using a multi-task network architecture to process image similarity, generating comparison images and performing local and global similarity analysis, and updating the high-precision map.
It achieves real-time synchronization between high-precision maps and the real environment, improving the accuracy and stability of VPS positioning and avoiding positioning failures.
Figure CN114969221B_ABST
Description
Technical Field
[0001] This application relates to the field of positioning and image processing technology, and in particular to a method and related equipment for updating maps.
Background Technology
[0002] A Visual Positioning System (VPS) is a technology that uses images captured by an electronic device to locate the device. For example, the electronic device captures a first image, and by matching the first image with images in an image library, a second image with the highest matching degree is obtained from the image library. The current location of the electronic device is determined based on the location where the second image was captured.
[0003] In this system, the shooting location of each image in the image library is known; the shooting location is the location information of the device that captured the image. Therefore, by matching the first image with images in the image library, a second image with the highest similarity to the first image is obtained. Based on the shooting location of the second image, the shooting location of the first image can be determined, thus achieving the purpose of locating the electronic device.
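The retrieval step in [0002] and [0003] can be sketched as follows. This is a minimal illustration only, assuming each library image has already been reduced to a global descriptor vector and that cosine similarity serves as the matching score; the `LibraryImage` type and the `locate` function are hypothetical names, and the descriptor extractor itself is out of scope.

```python
import math
from dataclasses import dataclass


@dataclass
class LibraryImage:
    descriptor: list[float]          # hypothetical global descriptor of the image
    location: tuple[float, float]    # known shooting location, e.g. (x, y) in meters


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate(query: list[float], library: list[LibraryImage]) -> tuple[tuple[float, float], float]:
    """Return the shooting location of the best-matching library image and its score."""
    best = max(library, key=lambda img: cosine_similarity(query, img.descriptor))
    return best.location, cosine_similarity(query, best.descriptor)


library = [
    LibraryImage([1.0, 0.0, 0.0], (0.0, 0.0)),
    LibraryImage([0.0, 1.0, 0.0], (10.0, 5.0)),
]
location, score = locate([0.1, 0.9, 0.0], library)   # best match is the second image
```

The first image's location is thus taken to be the known shooting location of its best match, which is exactly the inference described above.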
[0004] With the widespread application of Augmented Reality (AR) and the continuous development of technologies such as autonomous robots and self-driving cars, and especially as VPS is applied to autonomous robots and self-driving vehicles, higher demands are placed on the positioning accuracy, positioning speed, and stability of the VPS positioning system. High-precision maps are used during VPS positioning to improve accuracy. However, the real-world environment changes; for example, billboards may be added to buildings, and road signs may be added to roads. If the real-world environment changes but the environment reflected in the high-precision map does not, VPS positioning with that map may be inaccurate or may fail. Therefore, updating the high-precision maps used by VPS in real time, so that the environment they reflect stays consistent with the real-world environment, is key to ensuring the stability and sustainability of VPS.
Summary of the Invention
[0005] This application provides a method and related equipment for updating maps, enabling the maps used by VPS to be updated in real time, ensuring that the environment reflected by the map is consistent with the real-world environment, thereby improving the accuracy of VPS positioning.
[0006] To achieve the above technical objectives, this application adopts the following technical solution:
[0007] Firstly, this application provides a method for updating a map, which may include: acquiring a positioning image and corresponding location information. Understandably, the location information may indicate the location of the device that captured the positioning image. The location information is obtained by visually locating the positioning image based on a map. A corresponding comparison image is determined based on the location information, and the similarity between the positioning image and the comparison image is calculated. This similarity can characterize the degree of similarity between the positioning image and the comparison image.
[0008] In other words, in the method provided in this embodiment, the positioning image is the image used for VPS positioning, and the corresponding comparison image is generated from the location information of the positioning image. Because the comparison image and the positioning image share the same location information, the two images should depict the same scene from the same viewpoint if nothing has changed. Comparing them therefore shows, through the similarity, whether the real environment reflected by the positioning image has changed. Specifically, if the similarity indicates that the positioning image and the comparison image are similar, the real environment reflected by the positioning image has not changed, and the map does not need to be updated. If the similarity indicates that the two images are dissimilar, the real environment reflected by the positioning image has changed, and the map can be updated based on the positioning image.
[0009] In one possible design approach of the first aspect, the similarity can include a local similarity and a global similarity. The local similarity characterizes how similar the positioning image and the comparison image are locally, while the global similarity characterizes how similar they are globally.
[0010] Calculating the similarity between the positioning image and the comparison image can include the following. First, the global similarity between the positioning image and the comparison image is calculated. If the global similarity is less than a preset threshold, the positioning image and the comparison image are entirely different images, and localization has most likely failed. In this case the map is not updated, so that a failed localization does not cause the map to be modified and its accuracy to suffer. If the global similarity is greater than the preset threshold, the two images are globally similar. In this case, the local similarity between the positioning image and the comparison image can be further calculated to determine whether to update the high-precision map based on the positioning image.
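The two-stage check in [0010] can be expressed as a small decision function. This is a minimal sketch, assuming similarities normalized to [0, 1]; the threshold values, the per-ROI local threshold, and the names `decide` and `Decision` are illustrative assumptions, not values from this application. The local similarities are passed as a callable so that they are only computed once the global check passes, mirroring the order described above.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    LOCALIZATION_FAILED = "localization_failed"   # globally dissimilar: leave the map untouched
    NO_CHANGE = "no_change"                       # globally and locally similar
    UPDATE_CANDIDATE = "update_candidate"         # globally similar, but some ROIs differ


def decide(
    global_sim: float,
    compute_local_sims: Callable[[], list[float]],
    global_threshold: float = 0.6,   # hypothetical value
    local_threshold: float = 0.8,    # hypothetical value
) -> tuple[Decision, list[int]]:
    """Return the decision and the indices of ROIs judged to have changed."""
    if global_sim < global_threshold:
        return Decision.LOCALIZATION_FAILED, []
    # Local similarities are only computed once the images are globally similar.
    local_sims = compute_local_sims()
    changed = [i for i, s in enumerate(local_sims) if s < local_threshold]
    if not changed:
        return Decision.NO_CHANGE, []
    return Decision.UPDATE_CANDIDATE, changed
```

A global similarity exactly equal to the threshold is treated here as passing, a case the text leaves open.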
[0011] In another possible design of the first aspect, the similarity includes a local similarity and a global similarity. The local similarity characterizes how similar the positioning image and the comparison image are locally, while the global similarity characterizes how similar they are globally.
[0012] Calculating the similarity between the positioning image and the comparison image can specifically include: constructing a multi-task network architecture that processes the comparison image and the positioning image to obtain an image processing result. It should be noted that the multi-task network architecture can perform both tasks simultaneously when processing the positioning image and the comparison image, and output both results. For example, the multi-task network architecture can calculate the global similarity and the local similarity of the comparison image and the positioning image at the same time, and its output includes both.
[0013] Specifically, the positioning image and the comparison image are input into the multi-task network architecture, and the processing result, which includes the global similarity and the local similarity, is obtained. In other words, the image processing tasks of the multi-task network architecture include calculating the global similarity between the comparison image and the positioning image, as well as calculating the local similarity between them.
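The multi-task structure of [0012] and [0013], one shared feature extractor feeding two output heads, can be illustrated with a toy forward pass in plain Python. This is a structural sketch only: the "encoder" below is a hand-written statistic (mean and standard deviation per grid cell) standing in for a trained backbone, and the grid-cell ROIs, the distance-based similarity, and all function names are assumptions.

```python
import math


def shared_encoder(image: list[list[float]], grid: int = 2) -> list[tuple[float, float]]:
    """Stand-in for a shared backbone: one (mean, std) feature per grid cell.
    Assumes the image is at least grid x grid pixels."""
    h, w = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [
                image[y][x]
                for y in range(gy * h // grid, (gy + 1) * h // grid)
                for x in range(gx * w // grid, (gx + 1) * w // grid)
            ]
            mean = sum(cell) / len(cell)
            std = math.sqrt(sum((p - mean) ** 2 for p in cell) / len(cell))
            features.append((mean, std))
    return features


def feature_similarity(f1: tuple[float, ...], f2: tuple[float, ...]) -> float:
    # Map Euclidean distance to (0, 1]; identical features give 1.0.
    return 1.0 / (1.0 + math.dist(f1, f2))


def multitask_forward(positioning, comparison, grid: int = 2) -> dict:
    # Shared stage: both images pass through the same encoder once.
    fa = shared_encoder(positioning, grid)
    fb = shared_encoder(comparison, grid)
    # Head 1 (local similarity): one score per grid cell, treated as an ROI.
    local = [feature_similarity(a, b) for a, b in zip(fa, fb)]
    # Head 2 (global similarity): average-pool the cell features, then compare.
    pooled_a = tuple(sum(f[i] for f in fa) / len(fa) for i in range(2))
    pooled_b = tuple(sum(f[i] for f in fb) / len(fb) for i in range(2))
    return {"global": feature_similarity(pooled_a, pooled_b), "local": local}
```

The point of the shared stage is that feature extraction runs once per image and both similarity outputs come from the same forward pass.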
[0014] In another possible design approach of the first aspect, the location information includes the shooting location and shooting pose, the shooting pose indicates the shooting field of view of the electronic device that generates the positioning image, and the map is a point cloud map.
[0015] Determining the corresponding comparison image based on the location information can specifically include: determining the position coordinates in the point cloud map indicated by the shooting location, and generating a 3D image corresponding to the shooting field of view based on the shooting location and the shooting pose, so that the comparison image and the positioning image have the same field of view. The 3D image is then rendered into a 2D image, which is used as the comparison image.
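The rendering step of [0015] amounts to projecting the map points that fall in the shooting field of view onto the image plane of a virtual camera placed at the shooting location and pose. A minimal sketch, assuming a pinhole camera model, a world-to-camera transform p_cam = R·p_world + t, and per-point colors stored in the point cloud; `Camera` and `render_point_cloud` are hypothetical names. A z-buffer keeps the nearest point per pixel so that occluded points do not overwrite visible ones.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    fx: float   # focal lengths in pixels
    fy: float
    cx: float   # principal point in pixels
    cy: float
    width: int
    height: int


def render_point_cloud(points, colors, R, t, cam: Camera):
    """Project world points into a virtual camera at the shooting pose.

    R (3x3 nested list) and t (length 3) form the world-to-camera transform
    p_cam = R @ p_world + t. Pixels hit by no point stay None.
    """
    image = [[None] * cam.width for _ in range(cam.height)]
    depth = [[float("inf")] * cam.width for _ in range(cam.height)]
    for p, color in zip(points, colors):
        xc, yc, zc = (sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        if zc <= 0:
            continue   # behind the camera: outside the shooting field of view
        u = round(cam.fx * xc / zc + cam.cx)
        v = round(cam.fy * yc / zc + cam.cy)
        if 0 <= u < cam.width and 0 <= v < cam.height and zc < depth[v][u]:
            depth[v][u] = zc          # z-buffer: keep the nearest point per pixel
            image[v][u] = color
    return image
```

A production renderer would splat points or mesh the cloud to avoid holes; per-point projection is used here only because it is the simplest correct form of the idea.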
[0016] In another possible design approach of the first aspect, a preset image library is provided, which includes at least one preset image and the corresponding location information of the preset image. The location information includes the shooting location and shooting pose, and the shooting pose indicates the field of view of the electronic device generating the positioning image.
[0017] The above-mentioned determination of the corresponding comparison image based on the location information may specifically include: obtaining at least one preset image based on the location information of the positioning image, and using an image synthesis method to generate a comparison image from the at least one preset image, so that the location information of the comparison image is the same as the location information of the positioning image.
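The application does not specify the image synthesis method of [0017]. The sketch below covers only the first step, selecting preset images whose location information is close to that of the positioning image, under the assumption that a pose can be summarized by a 3D position and a heading angle; the distance and angle limits and the `select_presets` helper are hypothetical. The selected images would then be passed to a view-synthesis method of the implementer's choosing.

```python
import math
from dataclasses import dataclass


@dataclass
class PresetImage:
    name: str
    position: tuple[float, float, float]   # shooting location
    yaw_deg: float                          # heading, a simplified stand-in for the shooting pose


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def select_presets(presets, position, yaw_deg, k=2, max_dist=20.0, max_yaw=45.0):
    """Return up to k preset images near the positioning image's location and heading."""
    candidates = [
        p for p in presets
        if math.dist(p.position, position) <= max_dist
        and angle_diff(p.yaw_deg, yaw_deg) <= max_yaw
    ]
    candidates.sort(key=lambda p: math.dist(p.position, position))
    return candidates[:k]
```

Filtering on heading as well as position matters because a nearby image taken facing the opposite direction shows a different scene.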
[0018] In another possible design approach of the first aspect, the method may further include: acquiring an original image from an electronic device, the original image being an image captured by the electronic device for its own visual positioning; and using an image processing algorithm to remove dynamic objects from the original image to generate the positioning image. The dynamic objects include vehicles and/or animals.
[0019] Understandably, the original image is a picture of the real environment captured by the electronic device. It may include dynamic objects in the environment such as vehicles, people, and pets. These dynamic objects can interfere with the image comparison. To reduce their interference with the positioning image, an image processing algorithm removes them after the original image is acquired, generating the positioning image.
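A minimal sketch of the dynamic-object removal in [0018] and [0019], assuming a per-pixel semantic label map produced by some segmentation model (the application does not name one); the label names and `remove_dynamic_objects` are illustrative. Masked pixels are set to `None` so that later comparison steps can skip them instead of comparing against a vehicle that has since driven away.

```python
# Hypothetical label names; a real system would use its segmentation model's classes.
DYNAMIC_CLASSES = {"vehicle", "person", "animal"}


def remove_dynamic_objects(image, labels, fill=None):
    """Replace pixels labeled as dynamic objects with `fill`.

    `image` and `labels` are 2D lists of the same shape; `labels` is assumed to
    come from a semantic segmentation model run on the original image.
    """
    return [
        [fill if label in DYNAMIC_CLASSES else pixel for pixel, label in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, labels)
    ]
```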
[0020] In another possible design approach of the first aspect, calculating the local similarity between the positioning image and the comparison image may specifically include: based on a preset ROI (region of interest) partitioning method, dividing the positioning image into at least one ROI and dividing the comparison image into at least one ROI. Each ROI includes a target object, which may be at least one of a building, a road, or a road sign. For each pair of ROIs at the same location in the positioning image and the comparison image, the following operation is performed: the ROI in the positioning image is compared with the ROI in the comparison image, and their similarity is calculated to obtain the local similarity.
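The per-ROI comparison of [0020] and [0021] can be sketched as follows, assuming ROIs are axis-aligned pixel boxes shared by both images, grayscale values in [0, 255], and a similarity of 1 minus the normalized mean absolute difference. Both the box representation and the similarity measure are stand-ins chosen for illustration; pixels masked as `None` (for example by dynamic-object removal) are ignored.

```python
def roi_similarity(img_a, img_b, box, max_val=255.0):
    """1 - normalized mean absolute difference over a pixel box (x0, y0, x1, y1), end-exclusive.
    Pixels that are None in either image are skipped."""
    x0, y0, x1, y1 = box
    diffs = [
        abs(img_a[y][x] - img_b[y][x])
        for y in range(y0, y1)
        for x in range(x0, x1)
        if img_a[y][x] is not None and img_b[y][x] is not None
    ]
    if not diffs:
        return None   # ROI entirely masked, so no evidence either way
    return 1.0 - sum(diffs) / (len(diffs) * max_val)


def local_similarities(positioning, comparison, rois):
    """rois maps an ROI name to a box shared by both images (same location)."""
    return {name: roi_similarity(positioning, comparison, box) for name, box in rois.items()}
```

Returning one score per ROI, rather than a single aggregate, is what allows each ROI to be judged separately as [0021] requires.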
[0021] Understandably, a local similarity is calculated for each ROI, so there is at least one local similarity result between the positioning image and the comparison image. Each local similarity result therefore needs to be evaluated separately to determine whether the corresponding ROI has changed, so that the map can be updated accordingly.
[0022] In another possible design approach of the first aspect, updating the map based on the positioning image when the similarity indicates that the positioning image and the comparison image are not similar can specifically include: if the local similarity indicates that the comparison image and the positioning image are not similar, generating a 3D bounding box based on the ROI in the positioning image. The 3D bounding box includes the ROI region point cloud and the number of times the ROI region point cloud has been accessed. It should be noted that the local similarity is calculated separately for each pair of ROIs at the same location in the two images; a local similarity indicating dissimilarity means that the ROI in the positioning image and the ROI at the same location in the comparison image are not similar.
[0023] Visual localization is then performed based on the map, and the number of times the ROI region point cloud is accessed during visual localization is counted; the map used for visual localization includes the 3D bounding box. If the number of accesses to the ROI region point cloud within the 3D bounding box exceeds a preset limit, the original ROI in the map is updated to the ROI in the positioning image.
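The delayed update in [0022] and [0023] can be modeled as a pending 3D bounding box with an access counter. A minimal sketch, assuming an "access" means that at least one 3D point used during a localization falls inside the box, and that "exceeds a preset limit" means strictly greater than; `PendingROI`, `MapUpdater`, and `access_limit` are hypothetical names. Moving a box to `committed` stands in for replacing the original ROI in the map.

```python
from dataclasses import dataclass, field


@dataclass
class PendingROI:
    """3D bounding box around a changed ROI, awaiting confirmation."""
    min_corner: tuple[float, float, float]
    max_corner: tuple[float, float, float]
    points: list = field(default_factory=list)   # new ROI region point cloud
    access_count: int = 0

    def contains(self, p) -> bool:
        return all(lo <= c <= hi for lo, c, hi in zip(self.min_corner, p, self.max_corner))


class MapUpdater:
    def __init__(self, access_limit: int = 3):
        self.access_limit = access_limit          # hypothetical "preset limit"
        self.pending: list[PendingROI] = []
        self.committed: list[PendingROI] = []     # stands in for "ROI replaced in the map"

    def add_candidate(self, box: PendingROI) -> None:
        self.pending.append(box)

    def record_localization(self, used_points) -> None:
        """Called after each visual localization with the 3D points it used."""
        still_pending = []
        for box in self.pending:
            if any(box.contains(p) for p in used_points):
                box.access_count += 1
            if box.access_count > self.access_limit:
                self.committed.append(box)
            else:
                still_pending.append(box)
        self.pending = still_pending
```

Requiring repeated accesses before committing means a single odd positioning image cannot rewrite the map on its own.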
[0024] In another possible design approach in the first aspect, the above method may further include: generating a corresponding ROI region point cloud based on the ROI region in the positioning image.
[0025] In another possible design approach of the first aspect, the method may further include: if the global similarity indicates that the comparison image and the positioning image are not similar, determining that localization based on the positioning image has failed.
[0026] Secondly, this application provides an apparatus for updating a map, including one or more processors; a memory; and one or more computer programs. The one or more computer programs are stored in the memory, and include instructions that, when executed by the map updating apparatus, cause the processor to perform the following steps:
[0027] The process involves acquiring a positioning image and its corresponding location information. This location information indicates the location of the device that captured the image. The location information is obtained through visual positioning of the image using a map. A comparison image is then determined based on the location information, and the similarity between the positioning image and the comparison image is calculated. This similarity score characterizes the degree of similarity between the two images. If the similarity score indicates that the positioning image and the comparison image are similar, it means that the real environment reflected by the positioning image has not changed, and the map does not need to be updated. If the similarity score indicates that the positioning image and the comparison image are dissimilar, it means that the real environment reflected by the positioning image has changed, and the map can be updated based on the positioning image.
[0028] In one possible design approach of the second aspect, the similarity can include a local similarity and a global similarity. The local similarity characterizes how similar the positioning image and the comparison image are locally, while the global similarity characterizes how similar they are globally.
[0029] When calculating the similarity between the positioning image and the comparison image, the processor performs the following steps. First, it calculates the global similarity between the positioning image and the comparison image. If the global similarity is less than a preset threshold, the two images are entirely different, and localization has most likely failed. If the global similarity is greater than the preset threshold, the two images are globally similar, and the processor then calculates the local similarity between the positioning image and the comparison image to determine whether to update the high-precision map based on the positioning image.
[0030] In another possible design approach, the similarity includes a local similarity and a global similarity. The local similarity characterizes how similar the positioning image and the comparison image are locally, while the global similarity characterizes how similar they are globally.
[0031] When calculating the similarity between the positioning image and the comparison image, the processor specifically performs the following steps: constructing a multi-task network architecture to process the comparison image and the positioning image and obtain the image processing result. The positioning image and the comparison image are input into the multi-task network architecture, and the processing result, including the global similarity and the local similarity, is obtained. In other words, the image processing tasks of the multi-task network architecture include calculating both the global similarity and the local similarity between the comparison image and the positioning image.
[0032] In another possible design approach, the location information includes the shooting location and shooting pose, the shooting pose indicates the shooting field of view of the electronic device that generates the positioning image, and the map is a point cloud map.
[0033] When the processor determines the corresponding comparison image based on the location information, it specifically performs the following steps: determining the position coordinates in the point cloud map indicated by the shooting location; generating a 3D image corresponding to the shooting field of view based on the shooting location and shooting pose; rendering the 3D image to generate a 2D image; and using this 2D image as the comparison image.
[0034] In another possible design approach, the map-updating device can have a preset image library, which includes at least one preset image and the corresponding location information. The location information includes the shooting location and shooting pose, with the shooting pose indicating the field of view of the electronic device generating the positioning image.
[0035] When the processor determines the corresponding comparison image based on the location information, it specifically performs the following: based on the location information of the positioning image, it acquires at least one preset image, and uses an image synthesis method to generate a comparison image from the at least one preset image, so that the location information of the comparison image is the same as the location information of the positioning image.
[0036] In another possible design approach, the processor is further configured to: acquire a raw image from the electronic device, which is captured by the electronic device and used for visual positioning of the electronic device; and employ an image processing algorithm to remove moving objects from the raw image to generate a positioning image. The moving objects include vehicles and/or animals.
[0037] In another possible design approach, when calculating the local similarity between the positioning image and the comparison image, the processor specifically performs the following: based on a preset ROI partitioning method, the positioning image is divided into at least one ROI, and the comparison image is divided into at least one ROI. Each ROI includes a target object, which can be at least one of a building, a road, or a road sign. For each pair of ROIs at the same location in the positioning image and the comparison image, the ROI in the positioning image is compared with the ROI in the comparison image, and their similarity is calculated to obtain the local similarity.
[0038] In another possible design approach, when the similarity indicates that the positioning image and the comparison image are not similar and the processor updates the map based on the positioning image, it can specifically perform the following steps. If the local similarity indicates that the comparison image and the positioning image are not similar, a 3D bounding box is generated based on the ROI in the positioning image. The 3D bounding box includes the ROI region point cloud and the number of times the ROI region point cloud has been accessed. Visual localization is performed based on the map, which includes the 3D bounding box, and the number of times the ROI region point cloud is accessed during visual localization is counted. If the number of accesses to the ROI region point cloud in the 3D bounding box exceeds a preset number, the original ROI in the map is updated to the ROI in the positioning image.
[0039] In another possible design approach, the processor can also be configured to generate a corresponding ROI region point cloud based on the ROI region in the positioning image.
[0040] In another possible design approach, the processor can also be configured to: if the global similarity indicates that the comparison image and the positioning image are not similar, determine that localization based on the positioning image has failed.
[0041] Thirdly, this application also provides an electronic device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, and the one or more computer programs include instructions that, when executed by the electronic device, cause the electronic device to perform the methods described in the first aspect and any of its possible design embodiments.
[0042] Fourthly, embodiments of this application provide a computer-readable storage medium including computer instructions that, when executed on an electronic device, cause the electronic device to perform the methods described in the first aspect and any of its possible designs.
[0043] Fifthly, embodiments of this application provide a computer program product that, when run on a computer, causes the computer to perform the method described in the first aspect and any of its possible designs.
[0044] Sixthly, embodiments of this application provide a chip system applied to an electronic device. The chip system includes one or more interface circuits and one or more processors; the interface circuits and processors are interconnected via lines; the interface circuits are used to receive signals from the electronic device's memory and send signals to the processors, the signals including computer instructions stored in the memory; when the processor executes the computer instructions, it causes the electronic device to perform the methods described in the first aspect and any of its possible designs.
[0045] It is understood that, for the beneficial effects achieved by the map updating apparatus of the second aspect, the electronic device of the third aspect, the computer-readable storage medium of the fourth aspect, the computer program product of the fifth aspect, and the chip system of the sixth aspect provided in this application, reference may be made to the beneficial effects of the first aspect and any of its possible designs; they are not repeated here.
Attached Figure Description
[0046] Figure 1 is a schematic diagram of a map updating process provided in this application;
[0047] Figure 2 is a schematic diagram of a positioning image provided in an embodiment of this application;
[0048] Figure 3 is a flowchart for updating a high-precision map provided in an embodiment of this application;
[0049] Figure 4 is a schematic diagram of an original image provided in an embodiment of this application;
[0050] Figure 5A is a schematic diagram of an image from a preset image library provided in an embodiment of this application;
[0051] Figure 5B is a schematic diagram of an image from another preset image library provided in an embodiment of this application;
[0052] Figure 5C is a schematic diagram of an image from another preset image library provided in an embodiment of this application;
[0053] Figure 6 is a schematic diagram of a comparison image provided in an embodiment of this application;
[0054] Figure 7 is a schematic diagram of another comparison image provided in an embodiment of this application;
[0055] Figure 8 is a flowchart of a method for updating a map provided in an embodiment of this application;
[0056] Figure 9 is a flowchart of another method for updating a high-precision map provided in an embodiment of this application;
[0057] Figure 10A is a schematic diagram of another comparison image provided in an embodiment of this application;
[0058] Figure 10B is a schematic diagram of another positioning image provided in an embodiment of this application;
[0059] Figure 11 is a schematic diagram of a heat map provided in an embodiment of this application.
Detailed Implementation
[0060] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this embodiment, unless otherwise stated, "a plurality of" means two or more.
[0061] In layman's terms, high-precision maps are electronic maps with higher positioning accuracy and more data dimensions. Higher positioning accuracy means that high-precision maps can achieve positioning accuracy down to the centimeter level. More data dimensions mean that high-precision maps also include traffic-related static information, such as lane width, lane line type, lane height restrictions, guardrails, road edge types, and roadside landmarks. In other words, high-precision maps include visual feature description files, which can include traffic-related static information, as well as environmental information related to positioning, such as static information about indoor scenes.
[0062] VPS positioning is based on high-precision maps, providing centimeter-level accuracy. For example, when real-world traffic-related driving assistance information changes, the static information in the high-precision map also needs to be updated accordingly. This ensures the accuracy of VPS positioning during high-precision map navigation. In other words, high-precision maps need to be updated frequently to guarantee accurate VPS positioning.
[0063] Generally, high-precision maps can be maintained through manual image acquisition. Specifically, when changes are detected in traffic-related driving assistance information, relevant personnel collect images of the real-world environment and update the high-precision map based on these images. This method suffers from information lag, is inefficient, and has high manual acquisition costs.
[0064] With the development of network technology, network transmission bandwidth has increased and network latency has decreased. Therefore, high-precision maps can be updated using crowdsourced data through network transmission. For example, map information collection devices can be installed in taxis to collect driving data along the routes the vehicles travel. After a taxi has traveled a certain distance, the collection device can obtain the driving data for that route, and the onboard equipment can upload the driving data to update the high-precision map.
[0065] In the first implementation, the computing system receives a set of region description files from multiple first mobile devices, and the computing system can update the high-precision map based on the set of region description files. The set of region description files includes multiple region description files, where each region description file is a spatial feature point cloud of that region.
[0066] Specifically, the computing system includes a merging module, a positioning module, and a query module. The merging module receives a region description file from the first mobile device and stores it in a first data storage device. The positioning module generates a positioning region description file from the region description file and stores it in a second storage device. The computing system receives a positioning image from the second mobile device, and the query module sends the positioning region description file to the second mobile device. The positioning region description file is a spatial feature point cloud of a region in the positioning image of the second mobile device.
[0067] The computing system updates the positioning region description file based on positioning feedback data from the second mobile device (e.g., data on the accuracy of the positioning result). The positioning region description file includes multiple spatial features within the positioning region. The computing system maintains and scores the spatial features in the positioning region description file. When the computing system detects a spatial feature of the positioning region in the feedback data from the second mobile device, it increases the score of that spatial feature. If the computing system detects that the score of a spatial feature in the positioning region is below a preset threshold, it removes that spatial feature from the positioning region description file.
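The prior-art scoring scheme of [0067] can be sketched as a score table keyed by spatial feature. This is an illustration under stated assumptions: the initial score, the increment, and the identifiers are hypothetical, and the text does not say whether scores also decay, so no decay is modeled here.

```python
class PositioningRegionFile:
    """Spatial-feature scores for one positioning region (prior-art scheme, simplified)."""

    def __init__(self, feature_ids, initial_score: int = 0):
        # initial_score is an assumption; the text does not give one
        self.scores = {fid: initial_score for fid in feature_ids}

    def apply_feedback(self, detected_ids, step: int = 1) -> None:
        """Raise the score of every known feature found in a device's feedback data."""
        for fid in detected_ids:
            if fid in self.scores:
                self.scores[fid] += step

    def prune(self, threshold: int) -> list:
        """Remove and return features whose score is below the threshold."""
        removed = [fid for fid, s in self.scores.items() if s < threshold]
        for fid in removed:
            del self.scores[fid]
        return removed
```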
[0068] In the above implementation, the first mobile device generates a region description file based on detected spatial feature data and statistics. The first mobile device does not have the capability to detect whether the real-world environment has changed. The computing system receives a set of region description files from the first mobile device. A large set of region description files places a significant computational burden on the merging module and a significant storage burden on the first data storage device.
[0069] Secondly, the data acquisition device installed on the first mobile device does not capture images; instead, it generates a region description file based on sensor data. The sensors in the acquisition device include positioning sensors and inertial measurement sensors. This limits the application range of the acquisition device. For example, in indoor environments, there may be situations where the acquisition device cannot receive positioning signals, thus preventing it from obtaining indoor positioning information.
[0070] In the second implementation, a crowdsourced map is constructed based on crowdsourced data, and this crowdsourced map is used to update the high-precision map. The crowdsourced map includes lane markings and other information related to driving routes.
[0071] Specifically, after constructing a crowdsourced map based on crowdsourced data, the accuracy of the crowdsourced map can be improved by comparing the relationships between roads and features in the crowdsourced map with those in the high-precision map. The crowdsourced map and the high-precision map are then compared to obtain their differences. If the difference exceeds a preset threshold, the high-precision map is updated using the crowdsourced map.
[0072] This implementation is used in the field of autonomous driving, specifically for high-definition maps designed for autonomous vehicles, in which road-related information such as lane lines is particularly important. Furthermore, the update process for such maps mainly considers elements with topological relationships, such as lane lines, so the map elements are overly simplistic. Therefore, this map update method is difficult to apply to the high-precision maps used for VPS.
[0073] This application provides a method for updating a map. After VPS positioning is performed using a high-precision map, the positioning image used for VPS positioning is acquired, and a comparison image with the same field of view as the positioning image is generated. Image comparison is then used to determine whether the real environment reflected by the positioning image has changed. If it has changed, the high-precision map is updated based on the positioning image; if it has not changed, the high-precision map does not need to be updated. The positioning image used for VPS positioning is uploaded by the user, so it is crowdsourced data. In other words, the method of this application updates the high-precision map based on crowdsourced data.
[0074] It should be noted that when VPS positioning is used, the positioning information obtained by the electronic device includes a six-degrees-of-freedom (6DOF) pose. The six degrees of freedom comprise three translational degrees (forward/backward, up/down, and left/right) and three rotational degrees (rotation about each of those three axes). In this embodiment, the 6DOF pose represents the displacement and rotation of the electronic device along and about the three axes of a world coordinate system defined in the real world. The world coordinate system can be defined at any location in the real world; for example, the coordinate system in which the electronic device captures the positioning image can serve as the world coordinate system.
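To make the 6DOF pose concrete, the sketch below packs three translations and three rotations into a 4x4 homogeneous transform and uses it to map a point from the device frame into the world frame. The Z-Y-X (yaw, pitch, roll) Euler convention, the device-to-world direction, and the function names are assumptions for illustration; the application does not specify a parameterization.

```python
import math

def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """4x4 homogeneous transform from the device frame to the world frame."""
    r = rotation_zyx(roll, pitch, yaw)
    return [r[0] + [x], r[1] + [y], r[2] + [z], [0.0, 0.0, 0.0, 1.0]]

def transform_point(t, p):
    """Map a 3D point p (device frame) into the world frame using transform t."""
    px, py, pz = p
    return tuple(t[i][0] * px + t[i][1] * py + t[i][2] * pz + t[i][3] for i in range(3))

# Device 1 m along world x, turned 90 degrees about the vertical axis.
T = pose_to_matrix(1.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)
world_point = transform_point(T, (1.0, 0.0, 0.0))
```

A homogeneous matrix is used because it composes rotation and translation into a single multiplication, which is convenient when chaining poses.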
[0075] It's worth mentioning that the VPS uses high-precision maps for positioning, which are accurate to the centimeter level. Therefore, VPS positioning using high-precision maps can achieve centimeter-level accuracy.
[0076] The methods provided in the embodiments of this application will be described below with reference to the accompanying drawings.
[0077] The application scenario of the map updating method provided in this application embodiment is: an electronic device interacts with a server (or cloud device), the server includes a high-precision map, and the server maintains the high-precision map. Specifically, after the electronic device performs VPS positioning, it can send a positioning image and the corresponding 6DOF pose to the server.
[0078] It is understood that the electronic device can be a mobile phone, tablet computer, desktop computer, laptop computer, handheld computer, notebook computer, in-vehicle computer, in-vehicle equipment, ultra-mobile personal computer (UMPC), netbook, as well as cellular phone, personal digital assistant (PDA), augmented reality (AR) / virtual reality (VR) device, etc. The embodiments of this application do not impose any special restrictions on the specific form of the electronic device.
[0079] Please refer to Figure 1, which illustrates the process by which a server acquires raw images from an electronic device and updates a high-precision map based on these raw images. The raw images are those captured by the electronic device.
[0080] In some implementations, an electronic device captures at least one raw image and sends it to a server. The server performs VPS localization on a raw image to determine the corresponding 6DOF pose.
[0081] In other implementations, the electronic device captures a positioning video and sends it to the server. Since the positioning video consists of multiple consecutive frames, the server uses one frame from the positioning video as the original image, performs VPS positioning on that original image, and determines the corresponding 6DOF pose.
[0082] Since the original image may contain moving objects such as vehicles and animals, which can affect the accuracy of image matching, image processing algorithms are needed to remove these moving objects. Specifically, the server uses image processing algorithms to remove moving objects from the original image and generate a localization image.
[0083] Next, if the server includes a preset image library, it can select multiple images from the library based on the 6DOF pose and synthesize them into a comparison image using a view synthesis method. If the server does not include a preset image library, it can render an image based on the 6DOF pose and use this rendered image as the comparison image.
[0084] Furthermore, the localization image and the comparison image are compared to determine whether the real world reflected by the localization image has changed.
[0085] For example, the global similarity between the localization image and the comparison image is calculated. If the global similarity between the localization image and the comparison image is higher than a preset threshold, it indicates that the localization image and the comparison image have the same viewpoint, and VPS localization can be determined to be successful. If the global similarity between the localization image and the comparison image is lower than the preset threshold, it indicates that the localization image and the comparison image have different viewpoints, and VPS localization can be determined to be unsuccessful. In this case, since VPS localization failed and the comparison image and the localization image have different viewpoints, it is impossible to determine whether the real world reflected by the localization image has changed. Therefore, it is not necessary to update the high-precision map based on the localization image.
[0086] For another example, the local similarity between the location image and the comparison image is calculated. Local similarity can characterize the degree of similarity between the comparison image and the location image in local regions of the image. It is worth mentioning that when the global similarity indicates that the VPS location is successful, the server calculates the local similarity to determine whether the high-precision map needs to be updated based on the location image.
[0087] Specifically, the localization image and the comparison image are segmented into multiple Regions of Interest (ROIs). The ROIs of the localization and comparison images are then compared to calculate their local similarity. If the local similarity of an ROI indicates that the comparison image and the localization image are identical, the local region has not changed; that is, the real-world environment matches the environment reflected in the high-precision map, and the high-precision map does not need to be updated. If the local similarity of an ROI indicates that the two images differ, the local region has changed, and the high-precision map is updated based on the localization image.
[0088] In this embodiment of the application, a Region of Interest (ROI) is a region selected from an image during image processing because it is of interest to the image analysis. For example, suppose the processor is processing the image shown in Figure 2, which includes a first building 01, a second building 02, and a street lamp 03. A change in the first building 01, the second building 02, or the street lamp 03 in the image of Figure 2 indicates a change in the real world, so the image processing focuses on the areas containing these three objects. Accordingly, the image shown in Figure 2 is segmented into three ROIs: ROI 11 includes the first building 01, ROI 12 includes the second building 02, and ROI 13 includes the street lamp 03. An ROI outlines the area to be processed and can take the form of a box (as shown in Figure 2), a circle, an ellipse, or an irregular polygon.
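Because the localization image and the comparison image share the same field of view, the same ROI outlines can be applied to both so that ROIs at the same location form pairs. The sketch below handles only axis-aligned boxes (circles, ellipses, and polygons would need masks); images are plain 2D lists of pixel values, and the box coordinates and the names `crop` and `roi_pairs` are hypothetical.

```python
def crop(image, box):
    """Crop an axis-aligned box (x0, y0, x1, y1), end-exclusive, from a 2D pixel list."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def roi_pairs(loc_img, cmp_img, boxes):
    """Apply the same ROI boxes to both images; ROIs at the same location form a pair."""
    return {roi_id: (crop(loc_img, box), crop(cmp_img, box)) for roi_id, box in boxes.items()}

loc = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cmp_img = [[1, 2, 0], [4, 5, 0], [7, 8, 9]]
pairs = roi_pairs(loc, cmp_img, {"ROI 11": (0, 0, 2, 2), "ROI 13": (2, 0, 3, 2)})
```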
[0089] When it is determined that the ROI region of the positioning image has changed, the high-precision map can be updated based on the positioning image.
[0090] For example, 3D reconstruction is performed with the Structure from Motion (SFM) algorithm to obtain an SFM model (or sparse point cloud) corresponding to the localization image. When the ROI region of the localization image has changed, the components of the SFM model related to the ROI region (i.e., its point cloud) are modified. A 3D bounding box, which is a rectangular box enclosing the ROI region point cloud of the localization image, is used to record the number of visits to that point cloud.
[0091] The ROI region point cloud is inserted into the original RGB point cloud to obtain the SFM model RGB point cloud. At this point, the point cloud includes the original ROI region point cloud and the ROI region point cloud in the localized image. The 3D bounding box list includes the regions in the point cloud that have changed, and records the number of times each region point cloud has been accessed.
[0092] For example, the 3D bounding box list includes the original ROI region point cloud (the first point cloud) and the ROI region point cloud from the positioning image (the second point cloud), and records the number of times each of them is accessed during VPS positioning. Within a preset time period, the server counts these accesses while performing VPS positioning. If the first point cloud is accessed more often than the second point cloud, nothing has changed in the real world, and the point cloud is not updated. If the first point cloud is accessed less often than the second point cloud, something has changed in the real world; the second point cloud then replaces the first point cloud, which completes the point cloud map update and thereby updates the high-precision map.
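The decision at the end of the counting window can be sketched as a comparison of two counters stored per bounding box entry. The dataclass and function names are hypothetical, and the point clouds are opaque placeholders here. The text does not say what happens when the counts are equal; this sketch keeps the original point cloud in that case, which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class BoundingBoxEntry:
    """One entry of the 3D bounding box list for a changed ROI region."""
    first_hits: int = 0   # accesses to the original ROI point cloud (first point cloud)
    second_hits: int = 0  # accesses to the positioning-image ROI point cloud (second point cloud)

def resolve_roi_cloud(entry, first_cloud, second_cloud):
    """Return the point cloud to keep in the map after the counting window ends."""
    if entry.second_hits > entry.first_hits:
        return second_cloud  # real world changed: the new cloud replaces the original
    return first_cloud       # no change (ties also keep the original, an assumption)

kept = resolve_roi_cloud(BoundingBoxEntry(first_hits=3, second_hits=7), "old", "new")
```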
[0093] Example 1
[0094] Please refer to Figure 3, which is a flowchart of a map updating method provided in an embodiment of this application. As shown in Figure 3, the method includes steps 301-308.
[0095] It should be noted that in this embodiment, the interaction is between a mobile phone and a cloud device. The mobile phone sends an image to the cloud device, and the cloud device determines the location information of the mobile phone based on the image. The cloud device includes a high-precision map and provides VPS positioning services based on the high-precision map. The cloud device can implement the map updating method provided in this embodiment.
[0096] Step 301: Obtain the localization image and the corresponding 6DOF pose.
[0097] Here, the 6DOF pose represents the location information at which the mobile phone captured the positioning image.
[0098] In some implementations, the mobile phone captures the original image and sends it to a cloud device. The original image may contain moving objects (such as animals or vehicles) that obscure the real-world environment depicted in the image. Therefore, the cloud device acquires the original image and uses image processing algorithms to remove these moving objects to obtain a localization image (or the first localization image).
[0099] For example, cloud devices can use deep learning-related algorithms to process the original image and remove dynamic objects from it. These deep learning-related algorithms include, but are not limited to, semantic segmentation, instance segmentation, and object detection.
[0100] For example, semantic segmentation can be used to process the original image. Semantic segmentation can classify pixels in the original image, grouping pixels of the same type together. If the image contains two people, semantic segmentation will label them with the same color (e.g., red). If the original image contains buildings, streets, vehicles, trees, and pedestrians, semantic segmentation will assign different colors to each type of object, with each type corresponding to a different color. For example, vehicles are blue, pedestrians are red, etc. Dynamic objects in the original image are then removed, i.e., the pixels corresponding to blue and red are deleted, resulting in a localized image.
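Once a semantic segmentation model has produced a per-pixel class map, removing dynamic objects reduces to masking the pixels whose class is dynamic. The sketch below assumes the class map is already available (the segmentation network itself is out of scope) and uses hypothetical label names; "deleted" pixels are replaced with a fill value so the image keeps its shape.

```python
DYNAMIC_CLASSES = {"vehicle", "pedestrian"}  # hypothetical label names

def remove_dynamic_pixels(image, labels, fill=None):
    """Replace pixels whose semantic label is a dynamic class with `fill`.

    image and labels are equally sized 2D lists; labels holds one class name per pixel.
    """
    return [
        [fill if lab in DYNAMIC_CLASSES else px for px, lab in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, labels)
    ]

img = [[10, 20], [30, 40]]
labs = [["building", "vehicle"], ["pedestrian", "road"]]
cleaned = remove_dynamic_pixels(img, labs)
```

The same masking step applies after instance segmentation or object detection; only the way the mask is produced changes.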
[0101] For example, instance segmentation can be used to process the original image. Compared with semantic segmentation, instance segmentation distinguishes different individuals within the same category of objects. Suppose instance segmentation is applied to an image containing two people: it assigns pixels to each individual in the "people" category, so each person is labeled with a different color. If the original image includes buildings, streets, vehicles, trees, and pedestrians, instance segmentation can identify each vehicle and pedestrian, label them with different colors, and remove the labeled vehicles and pedestrians to obtain a localization image.
[0102] For example, object detection can be used to process the original image. Object detection can identify target objects (such as vehicles, pedestrians, etc.) in the original image, as well as the location of the target objects in the original image. Assuming that the target objects are vehicles and pedestrians, by using object detection to process the original image, the location of vehicles and pedestrians in the original image can be located, and the marked vehicles and pedestrians can be removed to obtain the localized image.
[0103] In other implementations, the mobile phone captures a location video, which is then sent to a cloud device. The cloud device uses this video to determine the VPS location. In this latter implementation, the cloud device acquires the location video, selects any frame from it as the original image, and uses image processing algorithms to remove moving objects from the original image to obtain the location image. The specific implementation for obtaining the location image is the same as described above and will not be repeated here.
[0104] It is worth mentioning that when a mobile phone sends raw images or location videos to a cloud device, the raw images or location videos also include camera parameter information such as camera intrinsics and field of view.
[0105] Camera intrinsic parameters are parameters related to the camera's own characteristics, such as its focal length and pixel size.
[0106] For example, a camera includes an optical lens and an optical sensor. Light reflected from an object travels to the optical lens and, after being refracted by the lens, reaches the optical sensor, where it is sensed and an image is formed. Ideally, the optical axis of the lens passes through the center point of the imaging area. In actual imaging, however, it does not, and the principal point offset in the camera intrinsics describes this error.
[0107] For example, during imaging, the image of an object should ideally be scaled down by the same proportion in both the x and y directions. In reality, however, lenses are not perfectly circular and the pixels on the optical sensor are not necessarily perfectly square, so the image is scaled down by different proportions in the x and y directions. Parameters in the camera intrinsics describe the difference between these two proportions and relate a number of pixels in the image to the size of the object (i.e., the correspondence between pixel count and object size), so that, given the object's distance from the camera, its size in three-dimensional space can be determined from the intrinsics.
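The two effects described above correspond to the standard pinhole intrinsics: separate focal lengths `fx` and `fy` (in pixels) capture the different x/y scaling, and the principal point `cx`, `cy` captures the optical-axis offset. The sketch below is a simplification that ignores skew and lens distortion; the function names and numeric values are illustrative only.

```python
def project_to_pixels(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixel (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return fx * x / z + cx, fy * y / z + cy

def size_from_pixels(length_px, depth, focal_px):
    """Metric length of an object spanning length_px pixels at the given depth."""
    return length_px * depth / focal_px

u, v = project_to_pixels((1.0, 2.0, 4.0), fx=800.0, fy=820.0, cx=320.0, cy=240.0)
width_m = size_from_pixels(200.0, depth=4.0, focal_px=800.0)
```

The second function shows why a distance is needed: the same pixel length corresponds to a larger object the farther away it is.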
[0108] The field of view of a camera is the angle whose apex is the optical lens and whose edges are the outermost rays from the photographed scene that can pass through the lens. The size of this angle determines the visual range of the lens: the larger the angle, the wider the range the lens can capture.
[0109] It's important to note that the angle formed by connecting the center point of the camera lens to the two endpoints of the diagonal line of the imaging plane is the lens's angle of view. The angle of view refers to the range of vision that an electronic device can capture, or the width of the field of view. The angle of view is related to the shooting posture of the electronic device.
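The diagonal definition above leads to a closed-form angle: for a rectilinear lens focused at infinity, the angle is 2·atan(d / 2f), where d is the imaging-plane diagonal and f is the focal length. The sketch assumes such a lens and consistent length units; real lenses with distortion or close focus deviate from it.

```python
import math

def diagonal_angle_of_view(focal_length, sensor_width, sensor_height):
    """Diagonal angle of view in degrees for a rectilinear lens focused at infinity.

    All three lengths must use the same unit (e.g. millimetres).
    """
    diagonal = math.hypot(sensor_width, sensor_height)
    return math.degrees(2 * math.atan(diagonal / (2 * focal_length)))

wide = diagonal_angle_of_view(24.0, 36.0, 24.0)
narrow = diagonal_angle_of_view(100.0, 36.0, 24.0)
```

A shorter focal length gives a wider angle, which is why `wide` exceeds `narrow`.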
[0110] Step 302: Generate a comparison image based on the 6DOF pose.
[0111] In the first implementation, the cloud device includes a preset image library containing multiple images (or preset images), each corresponding to a specific 6DOF pose. The phone captures the localization image from the position indicated by the 6DOF pose. The cloud device then generates a comparison image based on the position information indicated by the 6DOF pose and the field of view of the localization image, using the multiple images stored in the preset image library. In other words, the capture position of the comparison image is the same as the position indicated by the 6DOF pose. Theoretically, the field of view of the comparison image is the same as that of the localization image.
[0112] For example, suppose the localization image is the image shown in Figure 4, which is the localization image after moving objects (people) have been removed. When generating the comparison image, the cloud device determines images related to the localization image from the preset image library, such as those shown in Figure 5A, Figure 5B, and Figure 5C. The shooting locations of Figure 5A, Figure 5B, and Figure 5C are close to the shooting location of the localization image, so they are determined to be related to it. An image synthesis method is used to synthesize Figure 5A, Figure 5B, and Figure 5C into an image with the same field of view as the localization image. Figure 6 shows the comparison image obtained by synthesizing Figure 5A, Figure 5B, and Figure 5C.
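The selection step (finding library images whose shooting locations are close to that of the localization image) can be sketched as a radius search over the positions stored with each library image. This sketch compares positions only and ignores orientation, and the radius, `k`, and image IDs are hypothetical; the view synthesis that follows is not shown.

```python
import math

def select_related_images(library, query_position, radius, k=3):
    """Return up to k library image IDs shot within `radius` of query_position, nearest first.

    library maps image ID -> (x, y, z) shooting position taken from its 6DOF pose.
    """
    ranked = sorted((math.dist(pos, query_position), image_id) for image_id, pos in library.items())
    return [image_id for dist, image_id in ranked if dist <= radius][:k]

library = {"5A": (0.0, 0.0, 0.0), "5B": (1.0, 0.0, 0.0), "5C": (0.0, 2.0, 0.0), "far": (50.0, 0.0, 0.0)}
related = select_related_images(library, (0.5, 0.0, 0.0), radius=5.0)
```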
[0113] In the second implementation, the cloud device does not include a pre-set image library; instead, it generates a comparison image based on a point cloud. Specifically, the cloud device can determine the location where the image was captured based on the positional information indicated by the 6DOF pose. Based on the capture location and the field of view of the location image, the cloud device generates a corresponding 3D image from the point cloud, renders this 3D image, and generates a 2D comparison image. Similarly, it can be determined that the field of view of the comparison image is the same as that of the location image.
[0114] For example, again taking the localization image shown in Figure 4, the cloud device acquires the point cloud shown in Figure 7 based on the 6DOF pose and field of view of the localization image. The cloud device converts the point cloud shown in Figure 7 into a 3D image and renders the 3D image to generate a 2D comparison image.
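A very simple form of this rendering is point splatting with a z-buffer: each point is projected with the pinhole intrinsics, and the nearest point wins each pixel. The sketch assumes the points have already been transformed into the camera frame using the inverse of the 6DOF pose, and that each point carries a single intensity value; production renderers would also fill holes and handle color.

```python
import math

def render_points(points, width, height, fx, fy, cx, cy):
    """Splat camera-frame points (X, Y, Z, value) into a width x height image.

    A z-buffer keeps the nearest point per pixel; pixels hit by no point stay 0.
    """
    depth = [[math.inf] * width for _ in range(height)]
    image = [[0] * width for _ in range(height)]
    for x, y, z, value in points:
        if z <= 0:
            continue  # behind the camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = value
    return image

pts = [(0.0, 0.0, 2.0, 100), (0.0, 0.0, 5.0, 50), (0.0, 0.0, -1.0, 7), (10.0, 0.0, 1.0, 9)]
rendered = render_points(pts, 4, 4, 1.0, 1.0, 1.0, 1.0)
```

In the usage lines, the two points at pixel (1, 1) compete and the nearer one (value 100) is kept; the point behind the camera and the one projecting outside the image are dropped.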
[0115] Step 303: Compare the localized image and the comparison image, and calculate the global similarity between the localized image and the comparison image.
[0116] Global similarity characterizes the degree of similarity between the localized image and the comparison image based on the overall image (or understood as the image outline). For example, the localized image includes roads, houses, and streetlights, and the comparison image includes roads, houses, and streetlights. The global similarity between the localized image and the comparison image can be determined based on the number of objects included in the localized image and the similarity of each object.
[0117] It should be noted that the viewing angle is the angle between the lens and the imaging plane in an electronic device, so when images of the same scene are acquired from different viewing angles, the captured images include different target objects. It can be assumed that if the positioning image and the comparison image have the same viewing angle, their global similarity is high. Although the comparison image generated by the cloud device has the same field of view as the positioning image, it cannot yet be determined that the two images have the same viewing angle, and therefore it cannot be assumed that their global similarity is high. Image processing algorithms are therefore still needed to calculate the global similarity between the positioning image and the comparison image.
[0118] The viewing angle of an image affects the number of objects it contains; a larger viewing angle generally captures more objects. The viewing angles of the localization image and the comparison image can therefore be compared based on the number of objects and their similarity. If the two images contain the same number of objects and each object is highly similar, the images have the same viewing angle and a high global similarity. In other words, global similarity can be used to determine whether the localization image and the comparison image have the same viewing angle, and whether they are identical as a whole.
[0119] Understandably, common image processing methods can be used when calculating the global similarity between the localized image and the comparison image. For example, a hash algorithm can be used to calculate the global similarity of the images, the Euclidean distance between the two images can be calculated to obtain the global similarity, and a convolutional neural network can be used to calculate the global similarity of the images.
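Two of the methods named above can be sketched in a few lines: an average hash compared by Hamming distance, and the pixel-wise Euclidean distance. Both are distances, so smaller values mean more similar images. The sketch assumes grayscale images given as 2D lists of equal size; the usual downscaling to 8x8 before hashing is omitted, and the function names are illustrative.

```python
import math

def average_hash(gray):
    """Average hash of a small grayscale image (2D list), e.g. one already resized to 8x8."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(hash_a, hash_b):
    """Number of differing hash bits."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

def euclidean_distance(gray_a, gray_b):
    """Pixel-wise Euclidean distance between two equally sized grayscale images."""
    return math.sqrt(sum((p - q) ** 2
                         for row_a, row_b in zip(gray_a, gray_b)
                         for p, q in zip(row_a, row_b)))

a = [[0, 10], [10, 0]]
b = [[0, 10], [10, 3]]
c = [[10, 0], [0, 10]]
```

The hash tolerates small brightness changes (a and b hash identically) while the Euclidean distance registers them, which is one reason to choose between them per application.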
[0120] For example, taking the Euclidean distance as an example, its value is used as the global similarity. The Euclidean distance measures the spatial distance between objects (or individuals) in an image, and can likewise measure the similarity between two ROIs in an image. Because it is a distance, the smaller its value, the more similar the two images or ROIs are; this is why a value at or below the threshold indicates that the images are the same.
[0121] For example, the ROI regions of the localization image and the comparison image are determined, the similarity of multiple ROI regions is obtained, and the similarity information of multiple ROI regions is used as the global similarity. The method of dividing the ROI regions in the localization image is the same as the method of dividing the ROI regions in the comparison image.
[0122] Specifically, by comparing the similarity of Regions of Interest (ROIs) at the same location, the similarity of each ROI in the two images is calculated, yielding the similarity information for each ROI. The sum of the similarities of all ROIs is then calculated, and the result is used as the global similarity. The degree of similarity between the localized image and the comparison image is determined based on the global similarity.
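The ROI-based global similarity can be sketched as a sum over per-ROI scores. Here the per-ROI score is the mean absolute pixel difference, which is an assumption (any of the distance measures above could be substituted); with this choice, as with the Euclidean distance, smaller totals mean more similar images. ROIs are 2D lists, and the dictionary keys are hypothetical ROI IDs.

```python
def mean_abs_diff(roi_a, roi_b):
    """Per-ROI distance: mean absolute pixel difference (0 means identical)."""
    diffs = [abs(p - q) for row_a, row_b in zip(roi_a, roi_b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def global_from_roi_pairs(roi_pairs):
    """roi_pairs maps ROI ID -> (localization ROI, comparison ROI); returns (sum, per-ROI)."""
    per_roi = {roi_id: mean_abs_diff(loc, ref) for roi_id, (loc, ref) in roi_pairs.items()}
    return sum(per_roi.values()), per_roi

total, per_roi = global_from_roi_pairs({
    "ROI 11": ([[1, 2]], [[1, 2]]),
    "ROI 12": ([[0, 0]], [[4, 2]]),
})
```

Returning the per-ROI scores alongside the sum lets the same computation feed the later local-similarity check.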
[0123] This method employs image processing techniques to calculate the global similarity between the localized image and the comparison image. Furthermore, a preset first threshold can be set to determine the degree of similarity between the localized image and the comparison image.
[0124] If the calculated global similarity is less than or equal to (or less than) the first threshold, it means that the field of view of the localized image and the comparison image are the same, and from the perspective of the image as a whole, the comparison image and the localized image are the same. Proceed to steps 305-308.
[0125] If the calculated global similarity is greater than (or greater than or equal to) the first threshold, it indicates that the field of view of the localized image and the comparison image are different. Proceed to step 304.
[0126] Step 304: If the global similarity indicates that the localization image and the comparison image are different, it is determined that VPS localization has failed, and the high-precision map is not updated.
[0127] Understandably, since the comparison image is obtained from the 6DOF pose of the localization image, theoretically, the localization image and the comparison image should have the same field of view. If the field of view of the localization image and the comparison image are determined to be different, it means that the 6DOF pose of the localization image is not the true location information from which the localization image was captured. Therefore, the 6DOF pose of the localization image is inaccurate, meaning that the localization image and the comparison image reflect different real environments. In this case, it is impossible to determine whether the real environment corresponding to the comparison image has changed based on the localization image, and therefore, it is impossible to update the high-precision map based on the localization image.
[0128] One possible implementation is to calculate the global similarity between the comparison image and the location image to determine whether the VPS location was successful.
[0129] Step 305: If the global similarity indicates that the localization image and the comparison image are the same, VPS localization is confirmed to be successful.
[0130] A global similarity indicating that the localization image and the comparison image are the same means that the two images have the same field of view. The system can then determine whether to update the high-precision map based on the localization image.
[0131] Step 306: Calculate the local similarity between the localized image and the comparison image.
[0132] As is understandable, both the localization image and the comparison image contain at least one object (such as a building, road sign, etc.), and local similarity is the degree of similarity between the objects in the localization image and the comparison image.
[0133] For example, the same location in the localization image and the comparison image represents the first building. The similarity between the localization image and the comparison image for the first building is then compared. If the similarity indicates that the building in the comparison image is the same as the first building in the localization image, it means that the first building in the real environment has not been changed. The comparison image represents the first building marked on the high-precision map, and therefore there is no need to update the first building marked on the high-precision map.
[0134] For example, based on the same ROI region segmentation method, the comparison image and the localization image are divided into multiple ROI regions. ROI regions at the same location in the image are treated as a pair of ROI regions, and the similarity (i.e., local similarity) of the pair of ROI regions is calculated to determine whether the ROI region has changed based on the local similarity.
[0135] The image processing method described above can be used to calculate the similarity between the ROI regions in the localized image and the comparison image. The specific implementation method can be found in the description above and will not be repeated here.
[0136] In some implementations, a preset second threshold can be set to determine whether the ROI region has changed. For example, if the calculated local similarity is less than or equal to (or less than) the second threshold, it means that the ROI region has not changed, and step 307 is executed; if the calculated local similarity is greater than (or greater than or equal to) the second threshold, it means that the ROI region has changed, and step 308 is executed.
[0137] Step 307: If the local similarity indicates that the ROI regions are the same, do not update the high-precision map.
[0138] It is understandable that the high similarity between the ROI regions in the localization image and the comparison image indicates that the real environment reflected by the ROI region in the localization image has not changed. Therefore, there is no need to update the portion of the ROI region in the high-precision map.
[0139] Step 308: If the local similarity indicates that the ROI regions are different, update the high-precision map based on the localization image.
[0140] It should be noted that a low similarity between the ROI regions in the localization image and the comparison image indicates that the real-world environment reflected by the ROI region in the localization image differs from that reflected in the high-precision map; that is, the localization image shows that this ROI region of the real environment has changed. Therefore, the high-precision map can be updated based on the localization image to correct it and improve the accuracy of VPS positioning.
[0141] It is worth mentioning that multiple ROI regions can be segmented in the localization image and the comparison image. The cloud device can calculate the local similarity of each ROI region in turn and execute the above steps 306-308 until the local similarity of each ROI region in the localization image has been calculated.
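Steps 303 to 308 can be summarized as a two-stage threshold check: first the global distance against the first threshold, then each ROI's local distance against the second threshold in turn. The sketch follows the distance convention of steps 303 and 306 (at or below the threshold means "same") and takes precomputed distances as input; the function name and threshold values are hypothetical.

```python
def run_steps_303_to_308(global_distance, roi_distances, first_threshold, second_threshold):
    """Return None if VPS localization failed (step 304), else the IDs of changed ROIs.

    Distances follow the convention of steps 303 and 306: smaller means more similar.
    """
    if global_distance > first_threshold:
        return None  # step 304: views differ, so the map is not updated
    changed = []
    for roi_id, distance in roi_distances.items():  # step 306 for each ROI pair in turn
        if distance > second_threshold:
            changed.append(roi_id)  # step 308: update the map for this ROI
        # otherwise step 307: this ROI is unchanged; nothing to do
    return changed

rois = {"ROI 11": 0.1, "ROI 12": 0.9, "ROI 13": 0.3}
result = run_steps_303_to_308(0.2, rois, first_threshold=0.5, second_threshold=0.4)
```

Returning `None` rather than an empty list keeps "localization failed" distinguishable from "localization succeeded and nothing changed".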
[0142] The following explains the specific implementation of updating the high-precision map based on the localization image.
[0143] The high-precision map can be updated using the method shown in Figure 8. As shown in Figure 8, the method includes steps 308-1 to 308-4.
[0144] Step 308-1: Create a corresponding 3D model based on the ROI region of the localization image. This 3D model is a point cloud model of the object in the ROI region.
[0145] High-precision maps are point cloud maps. Updating a high-precision map based on location images requires creating point cloud models of the Regions of Interest (ROIs) based on the location images. The current high-precision map contains point clouds corresponding to the ROIs. If it can be determined that the real environment reflected by the ROIs has changed, the point clouds of the ROIs in the high-precision map can be updated to achieve the goal of updating the high-precision map.
[0146] In some implementations, high-precision maps can be updated based on existing point cloud generation algorithms. For example, the structure from motion (SFM) algorithm can be used to generate a corresponding point cloud model based on the localization image.
[0147] Step 308-2: Generate an updated point cloud map based on the 3D model. The updated point cloud map includes the point cloud model of the objects in the ROI region. The original local point cloud in the high-precision map is retained; it contains the objects in the ROI region of the comparison image.
[0148] Understandably, in this case the high-precision map includes both the point cloud model of the objects in the ROI region of the comparison image and the point cloud model of the objects in the ROI region of the localization image.
[0149] During the map update process, changes in the real environment can be verified using a large number of positioning images. If the high-precision map were updated every time a single positioning image differed from its comparison image, the map would change frequently, which would affect the accuracy of VPS positioning. Furthermore, because a positioning image reflects a change in the real environment from only one viewing angle, a point cloud model created from it is less accurate than the original high-precision map. Therefore, when the ROI regions of the positioning image and the comparison image are determined to be different, a point cloud model of the ROI region in the positioning image is generated, and the change in the real environment is then verified using images sent by other devices (such as a second positioning image).
[0150] Step 308-3: During VPS positioning, record the number of visits to the updated point cloud map and to the original local point cloud.
[0151] Here, the number of visits is the number of times other localization images are matched to the updated point cloud map and to the original local point cloud, respectively.
[0152] For example, the cloud device receives a VPS positioning service request that includes a second localization image, and determines the 6DOF pose corresponding to the second localization image based on the high-precision map.
[0153] In this case, assuming that the second localization image includes the ROI region, if the morphological structure of the ROI region in the second localization image matches that of the updated point cloud map, the number of visits to the updated point cloud map is increased by one.
[0154] In other words, during VPS positioning, the number of times images containing the ROI region access the updated point cloud map in the high-precision map is counted in order to verify whether the real environment has actually changed.
[0155] In some implementations, the cloud device sets up 3D bounding boxes, each containing an updated point cloud map and a counter. The counter records the number of times the updated point cloud map is accessed. When there are multiple updated point cloud maps, the cloud device maintains a list of 3D bounding boxes containing multiple 3D bounding boxes.
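The bounding-box bookkeeping in [0153]-[0155] might look like the following. This is a hypothetical data structure, not the source's implementation: boxes are assumed to be axis-aligned, the ROI observed in a new query is assumed to be represented by a single 3D point, and whether the query matches the updated or the original cloud (the morphological comparison of [0153]) is passed in as a boolean rather than computed. `BoundingBox3D` and `record_query` are illustrative names.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BoundingBox3D:
    """Hypothetical record for one changed ROI region.

    Holds an axis-aligned box, the original local point cloud from the map,
    the updated point cloud built from the localization image, and visit counters.
    """
    box_min: np.ndarray          # (3,) lower corner of the box
    box_max: np.ndarray          # (3,) upper corner of the box
    original_points: np.ndarray  # (N, 3) original local point cloud
    updated_points: np.ndarray   # (M, 3) updated point cloud map
    original_visits: int = 0
    updated_visits: int = 0

    def contains(self, p):
        return bool(np.all(p >= self.box_min) and np.all(p <= self.box_max))

    def record_visit(self, matches_updated):
        """Count one VPS query according to which point cloud its ROI matched."""
        if matches_updated:
            self.updated_visits += 1
        else:
            self.original_visits += 1


def record_query(boxes, roi_center, matches_updated):
    """Update the counter of every box containing the ROI seen by a new query.

    Returns the number of boxes that were hit.
    """
    hits = 0
    for box in boxes:
        if box.contains(roi_center):
            box.record_visit(matches_updated)
            hits += 1
    return hits


# Usage: a list with one box; two queries fall inside it, one falls outside.
box = BoundingBox3D(
    box_min=np.array([0.0, 0.0, 0.0]),
    box_max=np.array([2.0, 2.0, 3.0]),
    original_points=np.zeros((5, 3)),
    updated_points=np.ones((7, 3)),
)
boxes = [box]
record_query(boxes, np.array([1.0, 1.0, 1.0]), matches_updated=True)
record_query(boxes, np.array([1.0, 1.0, 1.0]), matches_updated=True)
record_query(boxes, np.array([5.0, 5.0, 5.0]), matches_updated=True)  # outside: ignored
print(box.updated_visits, box.original_visits)  # 2 0
```

Keeping both point clouds inside the box means the later replacement decision can swap them as a unit, which matches the regional-continuity goal stated in [0161].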
[0156] It should be noted that a second localization image that increases the visit count of the updated point cloud map may capture richer object detail in the ROI region. Therefore, the updated point cloud map can be further refined based on the second localization image to improve its accuracy.
[0157] Step 308-4: If it is determined that the number of visits to the updated point cloud map exceeds a preset threshold, replace the original local point cloud in the high-precision map with the updated point cloud map to generate the updated high-precision map.
[0158] For example, a third threshold can be set. If the number of visits to the updated point cloud map exceeds (that is, is greater than) the third threshold, it is determined that the ROI region in the real environment has indeed changed, and the original local point cloud is replaced with the updated point cloud map to update the high-precision map.
[0159] In some implementations, the number of visits to the updated point cloud map within a preset time period can be counted. If this number exceeds the third threshold, it is determined that the corresponding ROI region in the real environment has indeed changed, and the original local point cloud is replaced with the updated point cloud map.
[0160] In other implementations, the numbers of visits to both the original local point cloud and the updated point cloud map can be counted. If the updated point cloud map has been visited more times than the original local point cloud, it is determined that the corresponding ROI region in the real environment has changed, and the updated point cloud map replaces the original local point cloud.
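The three replacement rules of [0158]-[0160] can be written as simple predicates. A minimal sketch, assuming each ROI keeps timestamped visit lists for the updated point cloud map and the original local point cloud; `VisitLog`, the function names, and the numeric values in the usage example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisitLog:
    """Hypothetical visit history for one ROI region (timestamps in seconds)."""
    updated_hits: List[float] = field(default_factory=list)   # visits to the updated point cloud map
    original_hits: List[float] = field(default_factory=list)  # visits to the original local point cloud


def exceeds_threshold(log, third_threshold):
    """[0158]: total visits to the updated point cloud map exceed the third threshold."""
    return len(log.updated_hits) > third_threshold


def exceeds_threshold_in_window(log, third_threshold, now, window):
    """[0159]: only visits within the last `window` seconds are counted."""
    recent = [t for t in log.updated_hits if now - t <= window]
    return len(recent) > third_threshold


def updated_outvotes_original(log):
    """[0160]: the updated point cloud map has more visits than the original local point cloud."""
    return len(log.updated_hits) > len(log.original_hits)


# Usage: four visits to the updated cloud, one to the original.
log = VisitLog(updated_hits=[10.0, 20.0, 95.0, 98.0], original_hits=[15.0])
print(exceeds_threshold(log, third_threshold=3))                    # True  (4 > 3)
print(exceeds_threshold_in_window(log, 3, now=100.0, window=10.0))  # False (2 recent visits)
print(updated_outvotes_original(log))                               # True  (4 > 1)
```

When the chosen predicate returns True, the original local point cloud in the map is replaced by the updated point cloud map; the windowed variant additionally ignores stale evidence.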
[0161] First, this invention adds an environmental change detection function at the front end that compares crowdsourced data with the high-precision map; if no environmental change is detected, the map update is abandoned immediately, avoiding redundant computation at the back end. Second, this invention uses a purely vision-based solution, enabling real-time updates of high-precision maps under all weather conditions and in all scenarios. Third, by preserving 3D bounding boxes, this invention maintains the regional continuity of old and new spatial features: taking into account the properties of the real-world object component to which each spatial feature belongs, either all old spatial features or all new spatial features are retained.
[0162] Example 2
[0163] In the foregoing embodiment, the global similarity between the localization image and the comparison image is calculated to determine whether the two images are the same. In this embodiment, a multi-task network architecture can be constructed to process the localization image and the comparison image and obtain processing results, and whether to update the high-precision map is then determined based on those results.
[0164] Figure 9 is a flowchart of a map updating method according to an embodiment of this application. As shown in Figure 9, the method includes steps 901-908.
[0165] Step 901: Obtain the localization image and the corresponding 6DOF pose.
[0166] Step 902: Generate a comparison image based on the 6DOF pose.
[0167] Steps 901-902 are the same as steps 301-302 in the above embodiment. For specific implementation details, please refer to the above implementation method. They will not be repeated here.
[0168] Step 903: Compare the localized image and the comparison image, and calculate the similarity between the localized image and the comparison image. The similarity includes global similarity and local similarity.
[0169] The cloud device uses a multi-task network architecture to calculate both the global similarity and the local similarity between the localization image and the comparison image; that is, the output of the multi-task network architecture includes both results. It should be noted that, after calculating the local similarity, the multi-task network architecture presents it as a heatmap.
[0170] A heatmap is an image that uses colors to mark how similar each region of the localization image is to the corresponding region of the comparison image. For example, red marks areas of low local similarity, where the localization image and the comparison image are completely different, and green marks areas of high local similarity, where the two images are the same. The closer a color is to red, the lower the similarity of that area; the closer it is to green, the higher the similarity.
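The red-to-green coloring described above can be illustrated with a linear ramp. This is an assumption for illustration only: the source does not specify the color scale, and `similarity_to_rgb` is a hypothetical helper that maps local similarity values in [0, 1] to RGB pixels.

```python
import numpy as np


def similarity_to_rgb(sim):
    """Map local similarity values in [0, 1] to RGB: 0 -> red, 1 -> green."""
    s = np.clip(np.asarray(sim, dtype=np.float64), 0.0, 1.0)
    rgb = np.zeros(s.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.round(255 * (1.0 - s)).astype(np.uint8)  # red fades as similarity rises
    rgb[..., 1] = np.round(255 * s).astype(np.uint8)          # green grows with similarity
    return rgb


# Usage: a 2x2 grid of local similarities.
sim_map = np.array([[1.0, 0.9], [0.5, 0.0]])
heat = similarity_to_rgb(sim_map)
print(heat[1, 1].tolist(), heat[0, 0].tolist())  # [255, 0, 0] [0, 255, 0]
```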
[0171] For example, suppose Figure 10A shows the comparison image and Figure 10B shows the localization image. To calculate the local similarity between them, both images are divided into multiple ROI regions: the comparison image in Figure 10A is divided into ROI regions 11, 12, and 13, and the localization image in Figure 10B is divided into ROI regions 21, 22, 23, and 24. Image processing is performed on the ROI regions of each image, and the local similarity is calculated. The multi-task network then outputs the heatmap shown in Figure 11. As shown in Figure 11, ROI region 24 is rendered in a first color, indicating that the similarity between the localization image and the comparison image in this region is low, while the other regions are rendered in a second color, indicating that the similarity in those regions is high.
[0172] It should be noted that the multi-task network can also be called an end-to-end network, that is, a single model that maps the input directly to the output. This end-to-end model outputs both the global similarity result and the local similarity result for the localization image and the comparison image in a single pass.
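To make the two outputs of the end-to-end model concrete, the sketch below returns a global similarity score and a grid of local similarity scores from a single call. It is a non-learned stand-in, not the multi-task network itself: both scores are zero-mean normalized cross-correlations rescaled to [0, 1], the ROI regions are assumed to be cells of a fixed grid, and `dual_similarity` and the 4x4 grid are hypothetical.

```python
import numpy as np


def _centered_unit(v):
    """Subtract the mean and scale to unit length (flat input stays all zeros)."""
    v = v - v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def dual_similarity(loc_img, cmp_img, grid=(4, 4)):
    """Return (global_similarity, local_similarity_grid), both rescaled to [0, 1].

    Each score is a zero-mean normalized cross-correlation mapped from [-1, 1]
    to [0, 1]. Flat cells have undefined correlation and score 0.5.
    """
    a = np.asarray(loc_img, dtype=np.float64)
    b = np.asarray(cmp_img, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    gh, gw = grid
    ch, cw = a.shape[0] // gh, a.shape[1] // gw
    local = np.empty(grid)
    for i in range(gh):
        for j in range(gw):
            cell = (slice(i * ch, (i + 1) * ch), slice(j * cw, (j + 1) * cw))
            pa = _centered_unit(a[cell].ravel())
            pb = _centered_unit(b[cell].ravel())
            local[i, j] = (pa @ pb + 1.0) / 2.0
    global_sim = (_centered_unit(a.ravel()) @ _centered_unit(b.ravel()) + 1.0) / 2.0
    return float(global_sim), local


# Usage: invert the bottom-right cell of an otherwise identical image.
rng = np.random.default_rng(1)
cmp_img = rng.random((64, 64))
loc_img = cmp_img.copy()
loc_img[48:, 48:] = 1.0 - cmp_img[48:, 48:]
g, local = dual_similarity(loc_img, cmp_img)
print(np.argwhere(local < 0.5).tolist())  # [[3, 3]]
```

The global score stays high because most of the image is unchanged, while the local grid isolates the changed cell, which is the behavior Figure 11 illustrates for ROI region 24. A trained network would replace the correlation with learned features.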
[0173] Step 904: If the global similarity indicates that the localization image and the comparison image are different, it is determined that VPS localization has failed, and the high-precision map is not updated.
[0174] The multi-task network outputs both the global similarity and the local similarity results. When determining whether to update the high-precision map based on the localization image, it is first necessary to determine, based on the global similarity result, whether VPS positioning succeeded. If VPS positioning failed, there is no need to check the local similarity result.
[0175] The method for determining global similarity is the same as in step 304 above, and will not be repeated here.
[0176] Step 905: If the global similarity indicates that the localization image and the comparison image are the same, VPS localization is confirmed to be successful, and whether to update the high-precision map is determined based on the local similarity result.
[0177] Step 906: Determine whether the local similarity indicates that the ROI regions are the same. If it does, proceed to step 907; if the local similarity indicates that the ROI regions are different, proceed to step 908.
[0178] In other words, once the global similarity indicates that VPS positioning succeeded, the local similarity result (that is, the heatmap) is used to determine whether the high-precision map needs to be updated based on the positioning image.
[0179] Step 907: Do not update the high-precision map.
[0180] Step 908: Update the high-precision map based on the positioning image.
[0181] If it is determined that the high-precision map is to be updated based on the positioning image, the specific update method is the same as the method shown in Figure 8, and will not be repeated here.
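The decision flow of steps 904 to 908 reduces to two threshold checks. A minimal sketch, assuming the multi-task network has already produced a global score and per-ROI local scores; the threshold values, `Decision`, and `decide` are hypothetical, since the source gives no numeric thresholds.

```python
from enum import Enum


class Decision(Enum):
    LOCALIZATION_FAILED = "step 904: localization failed, map not updated"
    NO_UPDATE = "step 907: ROI regions unchanged, map not updated"
    UPDATE = "step 908: update the map from the localization image"


def decide(global_sim, local_sims, global_threshold=0.6, local_threshold=0.5):
    """Apply steps 904-908 to the multi-task network outputs.

    local_sims maps an ROI id to its local similarity. Thresholds are hypothetical.
    Returns (decision, ids of ROI regions judged to have changed).
    """
    if global_sim < global_threshold:
        return Decision.LOCALIZATION_FAILED, []  # step 904: the local result is not checked
    changed = [roi for roi, s in local_sims.items() if s < local_threshold]
    if changed:
        return Decision.UPDATE, changed          # step 906 -> step 908
    return Decision.NO_UPDATE, []                # step 906 -> step 907


# Usage, loosely mirroring Figure 11 where only ROI region 24 differs.
decision, changed = decide(0.9, {"roi_21": 0.95, "roi_22": 0.9, "roi_23": 0.92, "roi_24": 0.1})
print(decision.name, changed)  # UPDATE ['roi_24']
decision, changed = decide(0.3, {"roi_24": 0.1})
print(decision.name, changed)  # LOCALIZATION_FAILED []
```

Checking the global score first mirrors step 904: a failed localization means the pose, and therefore the comparison image, cannot be trusted, so local differences carry no information about the map.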
[0182] The above explanation uses a cloud-based electronic device as an example. When the electronic device is another type of device, the same method can be used to update the map. This will not be elaborated upon here.
[0183] It is understood that, in order to achieve the aforementioned functions, the electronic device includes corresponding hardware structures and / or software modules for performing each function. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein, the embodiments of this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of this application.
[0184] This application embodiment can divide the above-described electronic device into functional modules based on the method example described above. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing module. The integrated modules can be implemented in hardware or as software functional modules. It should be noted that the module division in this application embodiment is illustrative and only represents one logical functional division; other division methods may be used in actual implementation.
[0185] This application also provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, and the one or more memories are used to store computer program code, including computer instructions. When the one or more processors execute the computer instructions, the electronic device can perform the aforementioned method steps to implement the map updating method in the above embodiments.
[0186] This application also provides a chip system including at least one processor and at least one interface circuit. The processor and the interface circuit are interconnected via lines. For example, the interface circuit can be used to receive signals from other devices (e.g., the memory of an electronic device). As another example, the interface circuit can be used to send signals to other devices (e.g., the processor). Exemplarily, the interface circuit can read instructions stored in the memory and send the instructions to the processor. When the instructions are executed by the processor, the electronic device can perform the steps in the above embodiments. Of course, the chip system may also include other discrete devices, and this application does not specifically limit this.
[0187] This application also provides a computer storage medium including computer instructions. When the computer instructions are run on the electronic device, the electronic device is caused to perform the functions or steps performed by the mobile phone in the above method embodiments.
[0188] This application also provides a computer program product that, when run on a computer, causes the computer to perform the various functions or steps performed by the mobile phone in the above method embodiments.
[0189] Through the above description of the embodiments, those skilled in the art can clearly understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0190] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0191] The units described as separate components may or may not be physically separate. A component shown as a unit can be one or more physical units; that is, it can be located in one place or distributed in multiple different locations. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0192] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit described above can be implemented in hardware or as a software functional unit.
[0193] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, essentially, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.
[0194] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method of updating a map, characterized in that the method includes: acquiring a positioning image and location information corresponding to the positioning image, where the location information indicates the location at which the positioning image was captured and is obtained by visually locating the positioning image based on the map; determining a corresponding comparison image based on the location information, where the comparison image is either an image composed of multiple images selected from a preset image library based on the location information of the positioning image and having the same field of view as the positioning image, or an image generated by rendering point cloud data based on the location information of the positioning image and having the same field of view as the positioning image; calculating the similarity between the positioning image and the comparison image, where the similarity characterizes the degree of similarity between the positioning image and the comparison image; if the similarity indicates that the positioning image and the comparison image are similar, not updating the map; and if the similarity indicates that the positioning image and the comparison image are not similar, updating the map based on the positioning image.
2. The method according to claim 1, wherein the similarity includes local similarity and global similarity, the local similarity characterizing the similarity of local regions in the positioning image and the comparison image, and the global similarity characterizing the overall similarity between the positioning image and the comparison image; and calculating the similarity between the positioning image and the comparison image includes: calculating the global similarity between the positioning image and the comparison image; if the global similarity between the positioning image and the comparison image is less than a preset threshold, not updating the map; and if the global similarity is greater than or equal to the preset threshold, calculating the local similarity between the positioning image and the comparison image.
3. The method according to claim 1, wherein the similarity includes local similarity and global similarity, the local similarity characterizing the degree of local similarity between the positioning image and the comparison image, and the global similarity characterizing the degree of global similarity between the positioning image and the comparison image; and calculating the similarity between the positioning image and the comparison image includes: constructing a multi-task network architecture for processing the comparison image and the positioning image to obtain image processing results; and inputting the positioning image and the comparison image into the multi-task network architecture to obtain the processing result, which includes the global similarity and the local similarity.
4. The method according to any one of claims 1-3, characterized in that the location information includes a shooting location and a shooting pose, the shooting pose indicating the shooting field of view of the electronic device that generated the positioning image, and the map is a point cloud map; and determining the corresponding comparison image based on the location information includes: determining the location coordinates in the point cloud map indicated by the shooting location; generating, based on the shooting location, a three-dimensional image corresponding to the shooting field of view indicated by the shooting pose; and rendering the three-dimensional image into a two-dimensional image, which is used as the comparison image.
5. The method according to any one of claims 1-3, characterized in that a preset image library includes at least one preset image and location information corresponding to each preset image, the location information including a shooting location and a shooting pose, and the shooting pose indicating the shooting field of view of the electronic device that generated the positioning image; and determining the corresponding comparison image based on the location information includes: obtaining at least one preset image based on the location information of the positioning image; and generating the comparison image from the at least one preset image using an image synthesis method, such that the location information of the comparison image is the same as that of the positioning image.
6. The method according to any one of claims 1-3, characterized in that the method further includes: acquiring an original image from an electronic device, the original image being captured by the electronic device and used for visual positioning of the electronic device; and removing dynamic objects from the original image using an image processing algorithm to generate the positioning image, the dynamic objects including vehicles and/or animals.
7. The method according to claim 2, characterized in that calculating the local similarity between the positioning image and the comparison image includes: dividing, according to a preset region of interest (ROI) division method, the positioning image into at least one ROI and the comparison image into at least one ROI, where each ROI includes a target object, and the target object includes at least one of a building, a road, or a road sign; and, for ROIs at the same location in the positioning image and the comparison image, comparing the ROI in the positioning image with the ROI in the comparison image to calculate the similarity of the ROIs, thereby obtaining the local similarity.
8. The method according to claim 7, wherein updating the map based on the positioning image if the similarity indicates that the positioning image and the comparison image are not similar includes: if the local similarity indicates that the ROIs of the comparison image and the positioning image are not similar, generating a 3D bounding box based on the ROI in the positioning image, the 3D bounding box including the ROI point cloud and the number of times the ROI point cloud is accessed; performing visual positioning based on the map, which includes the 3D bounding box, and counting the number of visits to the ROI point cloud during visual positioning; and if it is determined, based on the number of times the ROI point cloud in the 3D bounding box is accessed, that this number exceeds a preset number, updating the original ROI in the map to the ROI in the positioning image.
9. The method according to claim 8, wherein the method further includes: generating a corresponding ROI point cloud based on the ROI of the positioning image.
10. The method according to any one of claims 2-3 and 7-9, characterized in that the method further includes: if the global similarity indicates that the comparison image and the positioning image are not similar, determining that localization based on the positioning image has failed.
11. An apparatus for updating a map, comprising: one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the map updating apparatus, cause the map updating apparatus to perform the method according to any one of claims 1-10.
12. An electronic device, characterized by including: one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the electronic device, cause the electronic device to perform the method according to any one of claims 1-10.
13. A computer-readable storage medium, characterized by including computer instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-10.