A method and image processing apparatus for estimating a possible pose relative to a spatial region
By identifying key feature points of multiple 2D images in image processing and estimating pose using a multi-match likelihood function, the problem of difficult feature point matching and excessive computational resource consumption in existing technologies is solved, achieving more efficient and accurate pose estimation.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: YINWANG INTELLIGENT TECHNOLOGIES CO LTD
- Filing Date: 2021-02-19
- Publication Date: 2026-06-12
AI Technical Summary
Existing technologies struggle with feature point matching when estimating pose relative to a spatial region because of occlusion, lighting differences, and motion blur. They also consume excessive computational resources and have difficulty accurately determining inliers.
By identifying key feature points in multiple 2D images, the pose is estimated using a multi-match likelihood function. Multiple possible 2D feature point matches are combined, and a sampling-based framework searches for the optimal pose, avoiding the fixed correspondence determination of traditional methods and adaptively selecting the closest feature point.
The approach improves the accuracy and efficiency of pose estimation, reduces computational resource consumption, enhances the ability to capture inliers in different image regions, and reduces errors.
Smart Images

Figure CN116964630B_ABST
Abstract
Description
Technical Field
[0001] The present invention generally relates to image processing, and more particularly to a method and image processing apparatus for estimating possible poses relative to a spatial region.
Background Technology
[0002] Mapping, locating within a map, and using maps for planning are critical tasks for autonomous systems such as robots, ADAS, and autonomous driving systems. While the interdependence between mapping and localization is a well-known Simultaneous Localization and Mapping (SLAM) problem, contemporary research increasingly recognizes that planning how autonomous systems map and explore unknown environments (and subsequently operate within those environments) can mitigate degenerate conditions and significantly reduce the complexity of SLAM. Therefore, the task of exploring new environments combines the problems of map building, localization within a map, and planning using maps, as autonomous systems must find ways to reduce uncertainties in map building and localization.
[0003] In location-based map building, one of the most important sensor inputs comes from vision sensors, such as digital cameras. Furthermore, 3D feature points can be determined by detecting and matching 2D feature points in image data captured using stereo cameras or from cameras with known poses of each other. The most successful camera relative pose estimation methods, relative to a set of 3D feature points, rely on detecting 2D feature points in the images captured by the cameras and matching these 2D and 3D feature points to find feature correspondences.
[0004] Typically, matching is based on feature descriptor similarity. Feature descriptors are usually vectors describing the local environment of feature points in an image. Using these matches, the relative pose of the camera can be estimated via the Perspective-n-Point (PnP) method. Furthermore, it is crucial to find enough correct correspondences (called inliers) in different regions of the image to accurately estimate the pose. When the images involved (e.g., captured from cameras facing the same direction and spatially close) are similar in content and lighting conditions (no occlusion, etc.), matching feature points in other images can provide enough inliers.
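The role of inliers can be illustrated with a minimal sketch. It assumes a pinhole camera with made-up intrinsics (`fx`, `fy`, `cx`, `cy`), a pose given as a rotation matrix `R` and translation `t`, and a pixel threshold `threshold_px`; these names and values are illustrative and do not come from the patent text.

```python
import math

def project(R, t, point, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project a 3D point to pixel coordinates with a pinhole model.

    R (3x3, list of rows) and t (3-vector) map world coordinates into the
    camera frame. The intrinsics are assumed values for illustration.
    """
    X, Y, Z = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    if Z <= 0:
        return None  # point lies behind the camera
    return (fx * X / Z + cx, fy * Y / Z + cy)

def count_inliers(R, t, points_3d, points_2d, threshold_px=2.0):
    """Count correspondences whose reprojection error is below the threshold."""
    inliers = 0
    for Q, m in zip(points_3d, points_2d):
        q = project(R, t, Q)
        if q is not None and math.dist(q, m) < threshold_px:
            inliers += 1
    return inliers

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points_3d = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 4.0)]
# Two consistent observations and one gross mismatch (an outlier).
points_2d = [(320.0, 240.0), (420.0, 240.0), (100.0, 100.0)]
print(count_inliers(identity, [0.0, 0.0, 0.0], points_3d, points_2d))
```

A pose hypothesis that explains more correspondences within the threshold is better supported by the image, which is why the number and spread of inliers matter for PnP.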
[0005] Traditional pose estimation typically employs two methods. In the first method, feature points are detected in both the captured reference and target images. Feature point matching is then performed to determine the correspondence between feature points in the two images, and the image pose is calculated. However, during real-time image capture, the captured images may suffer from numerous issues such as occlusion, lighting variations, and motion blur. This makes matching features with feature points extremely difficult. Consequently, multiple mismatches may exist, which in turn complicate pose estimation. For example, if any important feature points are missed due to incorrect matching, the pose cannot be correctly estimated.
[0006] In another pose estimation method, once feature points in both the reference and target images are detected, feature point matching is performed simultaneously to determine correspondences, and then pose calculation is performed. While it is highly desirable to perform feature point matching and pose estimation simultaneously, the existence of a large number of possible matches and the multimodal pose parameter space make such operations very difficult, which in turn requires significant computational resources.
[0007] While it is highly desirable to perform feature point matching and pose estimation simultaneously, the existence of a large number of possible matches and a multimodal pose parameter space makes such operations very difficult, which in turn requires a lot of computational resources.
[0008] Therefore, there is a need to address the aforementioned technical deficiencies of existing methods and to capture inliers efficiently.
Summary of the Invention
[0009] The purpose of this invention is to provide an improved method and an improved image processing apparatus for estimating possible poses relative to a spatial region, while avoiding one or more drawbacks of existing methods.
[0010] This objective is achieved through the features of the independent claim. Other implementations are apparent from the dependent claims, the specification, and the drawings.
[0011] The present invention provides an improved method and image processing apparatus for estimating possible poses relative to a spatial region.
[0012] According to a first aspect, a method is provided for estimating the possible pose of an image processing device relative to a spatial region. The image processing device is coupled to an imaging capture device for capturing one or more 2D images of a scene within the spatial region, wherein the image processing device has spatial coordinates of a plurality of 3D point locations within the spatial region. The method includes: identifying key features present in the one or more 2D images. The method includes: identifying a correspondence between one or more clusters of the plurality of 3D point locations and key feature points present in the one or more 2D images. The method includes: estimating the possible pose of the image processing device relative to the spatial region based on the identified correspondence by using a multi-match likelihood function to find the closest 2D feature point among k possible 2D key feature points for each 3D point location, where k is an integer greater than 1.
[0013] According to the method described herein, multiple optimal 2D feature point matches for reference feature points are determined, rather than a single match for reference 3D feature points. This, in turn, creates a set of multiple matches. Therefore, by using multiple matches and identifying the best match from multiple possible matches, a larger set of inliers can be flexibly captured in different regions of the image, enabling accurate estimation of possible poses.
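Keeping several candidate matches per reference point, rather than one, might look like the following sketch. It assumes Euclidean distance between descriptor vectors as the similarity measure; the text only states that matching is based on descriptor similarity, and the function name `top_k_candidates` is hypothetical.

```python
import heapq
import math

def top_k_candidates(ref_descriptor, target_descriptors, k=3):
    """Return indices of the k target descriptors closest to ref_descriptor.

    Euclidean descriptor distance is an assumption made for illustration.
    """
    scored = ((math.dist(ref_descriptor, d), j) for j, d in enumerate(target_descriptors))
    return [j for _, j in heapq.nsmallest(k, scored)]

ref = (1.0, 0.0, 0.0)
targets = [(0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.1), (0.5, 0.5, 0.0)]
# Indices of the two most similar target descriptors, best first.
print(top_k_candidates(ref, targets, k=2))
```

The final choice among the k candidates is deferred to pose estimation, where geometric consistency can break ties that descriptors alone cannot.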
[0014] Optionally, the method includes: determining the spatial coordinates of multiple 3D point locations within the spatial region based on multiple images captured from mutually different angles.
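One simple way to obtain a 3D point from two views with known poses is midpoint triangulation, sketched below. This is a stand-in chosen for brevity, not necessarily the procedure the patent intends; the camera centres `c1`, `c2` and viewing-ray directions `d1`, `d2` are assumed to be already expressed in a common world frame.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

# Two cameras one unit apart observing the same point along exact rays.
point = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.2, 4.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.2, 4.0))
print(point)
```

With noisy rays the two closest points no longer coincide, and the midpoint serves as a compromise estimate.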
[0015] Optionally, the method includes: implementing the multi-match likelihood function within a sampling-based framework that searches for the optimal pose among the possible poses by calculating the maximum value of the multi-match likelihood function. The sampling-based framework avoids becoming trapped in local minima of the cost function and can find its global minimum.
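The idea of searching by sampling rather than by local descent can be sketched as follows. This is plain random sampling around an initial guess, used here as a simple stand-in; the patent does not specify the sampler, and `sample_best_pose`, `sigma`, `n_samples`, and the toy likelihood are all hypothetical.

```python
import random

def sample_best_pose(log_likelihood, initial_pose, sigma=0.5, n_samples=2000, seed=0):
    """Draw pose samples around initial_pose and keep the most likely one."""
    rng = random.Random(seed)
    best_pose, best_score = list(initial_pose), log_likelihood(initial_pose)
    for _ in range(n_samples):
        candidate = [rng.gauss(p, sigma) for p in initial_pose]
        score = log_likelihood(candidate)
        if score > best_score:
            best_pose, best_score = candidate, score
    return best_pose, best_score

def toy_log_likelihood(theta):
    """Toy stand-in for log L(Θ) with its maximum at (1.0, -0.5)."""
    x, y = theta
    return -((x - 1.0) ** 2 + (y + 0.5) ** 2)

best_pose, best_score = sample_best_pose(toy_log_likelihood, [0.0, 0.0])
print(best_pose, best_score)
```

Because every sample is scored independently, the search does not follow a gradient and so is not confined to the basin around the starting point, at the cost of many likelihood evaluations.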
[0016] Optionally, the method includes using an optimization process to adaptively select the closest 2D key feature points among k best matches to capture more inliers in the 2D image.
[0017] Optionally, the method includes: implementing the multiple-match likelihood function, as shown below:
[0018]
[0019] where Q_i is the i-th 3D feature point; P(Θ) is the inverse camera pose parameterized by Θ; P(Θ)Q_i gives the coordinates of the 3D feature point Q_i projected onto the target image; m_i is the i-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; and ε is a constant of the uniform outlier distance distribution. The pose parameter Θ is chosen to maximize the multi-match likelihood function L(Θ), thereby determining the possible pose.
[0020] The multi-match likelihood function can be implemented as a single-match robust likelihood function, as shown below:
[0021]
[0022] where the index k is usually a small integer, optionally in the range of 0 to 10; Q_i is the i-th 3D feature point; P(Θ) is the inverse camera pose parameterized by Θ; P(Θ)Q_i gives the coordinates of the 3D feature point Q_i projected onto the target image; m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; and ε is a constant of the uniform outlier distance distribution. For each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the spatially closest point m_ij, and then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij. The value of N can be as high as several thousand.
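The two-step evaluation (pick the nearest of the k candidates, then score the pose) can be sketched as below. The formula itself is not reproduced in this text, so the Gaussian-plus-constant form used here, along with the values of `sigma` and `eps`, is an assumption chosen to be consistent with the listed symbols, not the patent's definition.

```python
import math

def multi_match_log_likelihood(projections, candidates, sigma=2.0, eps=1e-3):
    """Sum over i of log(exp(-d_i^2 / (2 sigma^2)) + eps), d_i = min_j |q_i - m_ij|.

    projections[i] is q_i = P(Θ)Q_i for the pose being scored, and
    candidates[i] holds the k possible 2D matches m_ij. The functional
    form, sigma, and eps are assumptions for illustration.
    """
    total = 0.0
    for q, m_list in zip(projections, candidates):
        nearest = min(m_list, key=lambda m: math.dist(q, m))
        d = math.dist(q, nearest)
        # eps keeps a gross mismatch from driving the likelihood to zero.
        total += math.log(math.exp(-d * d / (2.0 * sigma * sigma)) + eps)
    return total

candidates = [[(10.5, 10.0), (30.0, 30.0)], [(80.0, 80.0), (50.0, 21.0)]]
ll_good = multi_match_log_likelihood([(10.0, 10.0), (50.0, 20.0)], candidates)
ll_bad = multi_match_log_likelihood([(20.0, 20.0), (60.0, 40.0)], candidates)
print(ll_good > ll_bad)
```

Note that the nearest candidate is re-selected for every pose being scored, so the effective correspondence set changes as Θ changes.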
[0023] The multi-match likelihood function can be implemented as a multi-match robust likelihood function, as shown below:
[0024] For all i:
[0025]
[0026] where Q_i is the i-th 3D feature point; P(Θ) is the inverse camera pose parameterized by Θ; q_i = P(Θ)Q_i gives the coordinates q_i of the 3D feature point Q_i projected onto the target image; m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε is a constant of the uniform outlier distance distribution; D_q and D_m are the descriptor vectors of q and m, respectively; and f() is a function that determines a distance based on the descriptor similarity and the spatial distance between the projection q of the 3D point Q in the image and the possible correspondence m in the image. For each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the closest point m_ij with respect to f(), and then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij. k is usually a small integer, for example in the range of 0 to 10.
[0027] The multi-match likelihood function can be implemented as a multi-match robust likelihood function, as shown below:
[0028] For all i:
[0029]
[0030] The function f() is defined as follows:
[0031]
[0032] where the position of q in a given image depends on the camera pose parameters Θ and the 3D position Q, that is, q_i = P(Θ)Q_i. For a given feature point q, there are k possible correspondences; m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε is a constant of the uniform outlier distance distribution; and D_q and D_m are the descriptor vectors of q and m, respectively.
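A distance that mixes spatial proximity with descriptor similarity could be sketched as follows. The definition of f() is not reproduced in this text, so the weighted sum below, the `weight` parameter, and both function names are hypothetical illustrations of the idea rather than the patent's formula.

```python
import math

def combined_distance(q, m, d_q, d_m, weight=10.0):
    """Hypothetical f(): spatial distance plus weighted descriptor distance."""
    return math.dist(q, m) + weight * math.dist(d_q, d_m)

def nearest_under_f(q, d_q, candidates, weight=10.0):
    """Pick the (m, D_m) pair among the k candidates that minimises f()."""
    return min(candidates, key=lambda c: combined_distance(q, c[0], d_q, c[1], weight))

q, d_q = (100.0, 50.0), (1.0, 0.0)
candidates = [
    ((101.0, 50.0), (0.0, 1.0)),  # spatially closest, dissimilar descriptor
    ((104.0, 53.0), (0.9, 0.1)),  # farther away, similar descriptor
]
best_point, _ = nearest_under_f(q, d_q, candidates)
print(best_point)
```

With the descriptor term included, the farther but more similar candidate wins; setting `weight=0.0` reduces the selection to the purely spatial nearest point.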
[0033] According to a second aspect, an image processing apparatus is provided for estimating possible poses relative to a spatial region. The image processing apparatus is coupled to an imaging capture device for capturing one or more 2D images of a scene within the spatial region. The image processing apparatus has spatial coordinates of a plurality of 3D point locations within the spatial region. The image processing apparatus is configured to: identify key features present in the one or more 2D images; identify a correspondence between one or more clusters of the plurality of 3D point locations and key feature points present in the one or more 2D images; and estimate the possible pose of the image processing apparatus relative to the spatial region based on the identified correspondences by using a multi-match likelihood function to find the closest 2D feature point among k possible 2D key feature points for each 3D point location. Here, k is an integer greater than 1.
[0034] The image processing apparatus described herein simultaneously determines, for each 3D feature point, multiple feature matches with 2D feature points in an image. This method does not require determining a fixed set of correspondences before performing any optimization process. Furthermore, during the optimization process, the closest 2D feature point can be adaptively selected from the multiple best matches. Because multiple matches are used for each feature point, a larger set of correspondences can be captured efficiently compared with conventional methods.
[0035] Optionally, the image processing device is used to: determine the spatial coordinates of multiple 3D point locations within the spatial region based on multiple images captured from mutually different angles.
[0036] According to a third aspect, a computer program is provided, including instructions that, when executed by a computer, cause the computer to perform the method.
[0037] According to a fourth aspect, a non-transitory computer-readable medium is provided, comprising computer-executable instructions. When executed by a computer, the computer-executable instructions cause the computer to perform the methods described above.
[0038] This invention solves a technical problem in the prior art, namely, how to accurately determine inliers in different regions of an image in order to perform pose estimation.
[0039] Therefore, in contrast to existing technologies, the image processing apparatus and method for estimating the possible pose of the image processing apparatus relative to a spatial region provided by the present invention utilize multiple matching of feature points corresponding to a reference image and a target image to identify the optimal feature match from multiple reasonable matches. The multiple matching function simultaneously finds the closest 2D feature point among k possible 2D feature points for each 3D feature point and calculates the likelihood function of the overall pose. This multiple matching likelihood function is typically used in sampling-based frameworks to search for the optimal pose by maximizing the likelihood function.
[0040] These and other aspects of the invention will become apparent from the implementation described below.
Description of the Drawings
[0041] The implementation of the invention will now be described by way of example only, with reference to the following accompanying drawings, wherein:
[0042] Figure 1 shows a block diagram of an image processing apparatus for estimating possible poses relative to a spatial region, provided by an implementation of the present invention;
[0043] Figure 2 shows an example of feature point mapping provided by an implementation of the present invention;
[0044] Figure 3 shows an exemplary illustration of multi-match feature points for localization, provided by an implementation of the present invention;
[0045] Figure 4 shows a flowchart of a method for estimating the possible pose of an image processing device relative to a spatial region, provided by an implementation of the present invention.
Detailed Implementation
[0046] The present invention provides a method for estimating the possible pose of an image processing device relative to a spatial region; furthermore, the present invention provides an image processing device for estimating the pose of a camera with higher accuracy by identifying appropriate correspondences in different regions of an image.
[0047] To enable those skilled in the art to more easily understand the present invention, the following implementation of the present invention is described in conjunction with the accompanying drawings.
[0048] The terms "first," "second," "third," and "fourth" (if any) in the abstract, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific sequence or order. It should be understood that the terms used are interchangeable where appropriate, and thus, for example, the embodiments of the invention described herein can be implemented in sequences different from those shown or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed, or steps or units inherent to such processes, methods, products, or apparatuses.
[0049] Terminology Explanation:
[0050] Image: An image is defined as a normal two-dimensional picture (RGB or chroma-luminance) captured using a single camera.
[0051] Scene: A scene is a specific area of interest in the real world that is captured by a camera.
[0052] 2D feature points: 2D feature points are points in an image that have (x, y) coordinates.
[0053] 3D feature points: 3D feature points refer to points in a 3D scene that have (X, Y, Z) coordinates.
[0054] Correspondence: A correspondence refers to a pair of feature points. In this document, a correspondence refers to a pair of feature points consisting of a 3D feature point and a related 2D feature point.
[0055] Inlier correspondence, or simply inlier: An inlier correspondence, or simply an inlier, refers to a correctly determined correspondence, in which the 2D feature point is the projection of the corresponding 3D feature point onto the image.
[0056] Pose: Pose refers to a 6D vector consisting of the 3D position coordinates (x, y, z) of the image capturing device and three directional angles.
[0057] Relative pose: Euclidean transformation from one coordinate system / pose to another coordinate system / pose.
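The pose and relative-pose definitions above can be made concrete with a short sketch. It assumes the three angles are roll, pitch, and yaw composed as R = Rz(yaw) Ry(pitch) Rx(roll), and that a pose maps device coordinates into world coordinates; the text fixes neither convention, so both are labeled assumptions, and the function names are hypothetical.

```python
import math

def rotation_from_angles(roll, pitch, yaw):
    """Rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll); the convention is assumed."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def relative_pose(pose_a, pose_b):
    """Euclidean transform (R_ab, t_ab) taking coordinates in frame b to frame a.

    Each pose is a 6D tuple (x, y, z, roll, pitch, yaw).
    """
    Ra = rotation_from_angles(*pose_a[3:])
    Rb = rotation_from_angles(*pose_b[3:])
    Ra_T = [list(row) for row in zip(*Ra)]  # inverse of a rotation is its transpose
    R_ab = [[sum(Ra_T[i][k] * Rb[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
    diff = [pb - pa for pa, pb in zip(pose_a[:3], pose_b[:3])]
    t_ab = [sum(Ra_T[i][k] * diff[k] for k in range(3)) for i in range(3)]
    return R_ab, t_ab

pose_a = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
pose_b = (1.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)
R_ab, t_ab = relative_pose(pose_a, pose_b)
print(t_ab)
```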
[0058] Likelihood function: A function that evaluates how well the model estimate fits the observed values.
[0059] Loss: A scalar that describes the result of the negative likelihood function.
[0060] Figure 1 shows a block diagram of an image processing apparatus 102 for estimating possible poses relative to a spatial region, provided by an implementation of the present invention. The image processing apparatus 102 is coupled to an imaging capture device 104, which is used to capture one or more 2D images of a scene within the spatial region. The image processing apparatus 102 has spatial coordinates of a plurality of 3D point locations within the spatial region. The image processing apparatus 102 is used to: identify key features present in the one or more 2D images. The image processing apparatus 102 is also used to: identify a correspondence between one or more clusters of the plurality of 3D point locations and key feature points present in the one or more 2D images. The image processing apparatus 102 is used to: estimate the possible pose of the image processing apparatus 102 relative to the spatial region based on the identified correspondences, by using a multi-match likelihood function to find the closest 2D feature point among k possible 2D key feature points for each 3D point location, where k is an integer greater than 1. Optionally, k can be a small integer with a value in the range of 0 to 10.
[0061] The image processing device 102 simultaneously determines, for each 3D feature point, multiple feature matches with 2D feature points in an image. This method does not require determining a fixed set of correspondences before performing any optimization process. Furthermore, during the optimization process, the closest 2D feature point can be adaptively selected from the one or more best matches. Because one or more matches are used for each feature point, a larger set of correspondences can be captured efficiently compared with conventional methods.
[0062] Optionally, the image processing device 102 is used to: determine the spatial coordinates of a plurality of 3D point locations within the spatial region based on one or more images captured from mutually different angles.
[0063] Figure 2 shows an example of feature point mapping provided by an implementation of the present invention. According to Figure 2, the image processing device identifies one or more 2D feature points p_11, p_12 in a first image 202 received from a first camera and one or more 2D feature points p_21, p_22 in a second image 204 received from a second camera. The image processing device matches the 2D feature points of the first image 202 with the 2D feature points of the second image 204 (for example, p_11 with p_21) and extracts the corresponding 3D feature points Q_1, Q_2. Furthermore, the image processing device extracts 2D feature points q_1, q_2 from the target image 206, where q_1 = PQ_1 and q_2 = PQ_2. Then, the image processing device matches the 3D feature points Q_1, Q_2 with the 2D feature points q_1, q_2 in the target image 206 to determine the feature correspondences, or simply correspondences, between the feature points in images 202, 204, and 206. Typically, this feature matching is based on the similarity of certain feature descriptors. Based on this feature matching, for example, the poses of the first camera and the second camera can be estimated using the Perspective-n-Point (PnP) method.
[0064] Optionally, the multi-match likelihood function is implemented within a sampling-based framework that searches for the optimal pose among the possible poses by computing the maximum value of the multi-match likelihood function. For each given pose parameter P(Θ), the closest point m_ij in space is first found among the k points m_ij (j = 1, ..., k), and the multi-match likelihood function is then calculated based on that closest point m_ij.
[0065] The image processing device is used to implement a multiple-match likelihood function, as shown below:
[0066]
[0067] where Q_i is the i-th 3D feature point; P(Θ) is the (inverse) camera pose parameterized by Θ; P(Θ)Q_i gives the coordinates of the 3D feature point Q_i projected onto the target image; m_i is the i-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; and ε is a constant of the uniform outlier distance distribution. The pose parameter Θ is chosen to maximize the multi-match likelihood function L(Θ), thereby determining the possible pose. The image processing device determines 3D points by detecting and matching 2D feature points (p_11, p_12, p_21, p_22) in image data captured using a stereo camera or from cameras with known poses relative to each other. Matching 3D and 2D feature points in the image creates feature correspondences, or simply correspondences.
[0068] For each given pose parameter P(Θ), the image processing device first finds, among the k points m_ij (j = 1, ..., k), the closest point m_ij in space, and then calculates the multi-match likelihood function L(Θ) based on that closest point m_ij.
[0069] The multi-match likelihood function can be implemented as a single-match robust likelihood function, as shown below:
[0070]
[0071] where the index k is usually a small integer, optionally in the range of 0 to 10; Q_i is the i-th 3D feature point; P(Θ) is the inverse camera pose parameterized by Θ; P(Θ)Q_i gives the coordinates of the 3D feature point Q_i projected onto the target image; m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; and ε is a constant of the uniform outlier distance distribution.
[0072] Optionally, for each given pose parameter P(Θ), the closest point m_ij (with regard to f()) is first determined among the k points m_ij (j = 1, ..., k), and the multi-match likelihood function is then calculated based on that closest point m_ij. The multi-match robust likelihood function is defined as follows, for all i:
[0073]
[0074] where Q_i is the i-th 3D feature point; P(Θ) is the inverse camera pose parameterized by Θ; q_i = P(Θ)Q_i gives the coordinates q_i of the 3D feature point Q_i projected onto the target image; m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε is a constant of the uniform outlier distance distribution; D_q and D_m are the descriptor vectors of q and m, respectively; and f() is a function that determines a distance based on the descriptor similarity and the spatial distance between the projection q of the 3D point Q in the image and the possible correspondence m in the image.
[0075] The multi-match likelihood function can be implemented as a multi-match robust likelihood function (as shown below), for all i:
[0076]
[0077] The function f() is defined as follows:
[0078] or
[0079]
[0080] where the position of q in a given image depends on the camera pose parameters Θ and the 3D position Q:
[0081] q_i = P(Θ)Q_i
[0082] For a given feature point q, there are k possible correspondences, where m_ij is the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε is a constant of the uniform outlier distance distribution; and D_q and D_m are the descriptor vectors of q and m, respectively.
[0083] Figure 3 shows an exemplary illustration of multi-match feature points for localization, provided by an implementation of the present invention. Figure 3 includes a first image 302 and a second image 304. The first image 302 can be a reference image, and the second image 304 can be a target image. The image processing device detects 3D and 2D feature points in the first image 302 and the second image 304, and matches the detected 3D and 2D feature points to find sufficient correct correspondences (called inliers) in different regions of the image in order to accurately estimate the pose.
[0084] According to Figure 3, instead of determining a single match for a reference 3D feature point, k optimal 2D feature point matches are detected between the first image 302 and the second image 304, creating a set of multiple matches. The multi-match likelihood function simultaneously finds the closest 2D feature point among the k possible 2D feature points for each 3D feature point and calculates the likelihood function of the overall pose. This multi-match likelihood function is typically used in sampling-based frameworks to search for the optimal pose by maximizing the likelihood function. Using the multiple possible matches / correspondences shown in Figure 3 enables the Perspective-n-Point (PnP) method to achieve a higher inlier rate. This improves the accuracy and robustness of the final result. Error statistics 306 after traversing the image sequence indicate that the multi-match PnP results in a smaller pose error 308, that is, a smaller maximum error and a smaller average error.
[0085] The method shown in Figure 3 does not require determining a fixed set of correspondences before performing any optimization process, because during the optimization process the closest 2D feature points can be adaptively selected from the k best matches. Compared to traditional methods, this approach can efficiently capture a larger set of inliers.
[0086] Figure 4 shows a flowchart of a method for estimating the possible pose of an image processing device relative to a spatial region, provided by an implementation of the present invention. The image processing device is coupled to an image capture device for capturing one or more 2D images of a scene within the spatial region. The image processing device has spatial coordinates of a plurality of 3D point locations within the spatial region. In step 402, key features present in the one or more 2D images are identified. In step 404, a correspondence is identified between one or more clusters of the plurality of 3D point locations and key feature points present in the one or more 2D images. In step 406, based on the correspondence, the possible pose of the image processing device relative to the spatial region is estimated by using a multi-match likelihood function to find the closest 2D feature point among k possible 2D key feature points for each 3D point location, where k is an integer greater than 1.
[0087] Optionally, the method includes: determining the spatial coordinates of multiple 3D point locations within the spatial region based on one or more images captured from mutually different angles. Optionally, the method includes: implementing the multi-match likelihood function within a sampling-based framework that searches for the optimal pose among the possible poses by calculating the maximum value of the multi-match likelihood function. The sampling-based framework avoids becoming trapped in local minima of the cost function and can find its global minimum. The multi-match likelihood function is a function that evaluates the degree of fit between the model estimate and the observations.
[0088] Optionally, the method includes: using an optimization process to adaptively select the closest 2D key feature points among k best matches and capture more inliers in the 2D image.
[0089] Optionally, the method includes: implementing the multi-match likelihood function using the following expression:
[0090]
[0091] where
[0092] Q_i := the i-th 3D feature point;
[0093] P(Θ) := the inverse camera pose parameterized by Θ;
[0094] P(Θ)Q_i := the coordinates of the 3D feature point Q_i projected onto the target image;
[0095] m_i := the i-th 2D feature point in the target image that may correspond to the 3D feature point Q_i;
[0096] ε := the constant of the uniform outlier distance distribution.
[0097] The pose parameter Θ is selected to maximize the multi-match likelihood function L(Θ), thereby determining the possible pose.
[0098] Optionally, the multi-match likelihood function is implemented as a single-match robust likelihood function, as shown below:
[0099]
[0100] where the index k is usually a small integer, optionally in the range of 0 to 10;
[0101] Q_i := the i-th 3D feature point; P(Θ) := the inverse camera pose parameterized by Θ;
[0102] P(Θ)Q_i := the coordinates of the 3D feature point Q_i projected onto the target image;
[0103] m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i;
[0104] ε := a constant of the uniform outlier distance distribution. For each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the closest point m_ij in space, and then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij.
[0105] Optionally, the multi-match likelihood function is implemented as a multi-match robust likelihood function, as shown below:
[0106] For all i:
[0107]
[0108] where Q_i := the i-th 3D feature point;
[0109] P(Θ) := the reverse camera pose parameterized by Θ;
[0110] q_i = P(Θ)Q_i := the coordinates q_i obtained by projecting the 3D feature point Q_i onto the target image;
[0111] m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i;
[0112] ε := the constant of the uniform outlier distance distribution;
[0113] D_q, D_m := the descriptor vectors of q and m, respectively;
[0114] f() := a function used to determine a distance based on the descriptor similarity and the spatial distance between the projection q of the 3D point Q in the image and the possible correspondence m in the image.
[0115] For each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the point m_ij that is closest with respect to f(); then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij.
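The selection with respect to f() can be illustrated with the Python sketch below. The form of f used here, a spatial distance plus a weighted descriptor distance, is purely an assumption for illustration, as are the `weight` parameter and the function names. The disclosure states only that f() depends on descriptor similarity and spatial distance.

```python
import math


def f(q, m, d_q, d_m, weight=1.0):
    """Assumed combined distance: Euclidean image distance between q and m
    plus a weighted Euclidean distance between descriptors D_q and D_m."""
    return math.dist(q, m) + weight * math.dist(d_q, d_m)


def closest_by_f(q, d_q, candidates, weight=1.0):
    """candidates is a list of (m, D_m) pairs; return the pair that minimizes f."""
    return min(candidates, key=lambda c: f(q, c[0], d_q, c[1], weight))
```

With `weight=0.0` the choice reduces to the purely spatial nearest point, while a larger weight favours candidates whose descriptors resemble D_q even when they lie farther from the projection.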
[0116] Optionally, the multi-match likelihood function is implemented as a multi-match robust likelihood function, as shown below:
[0117] For all i:
[0118]
[0119] The function f() is defined as follows:
[0120] or
[0121]
[0122] where the position of q in a given image depends on the camera pose parameters Θ and the 3D position Q of the point:
[0123] q_i = P(Θ)Q_i
[0124] For a given feature point q, there are k possible correspondences.
[0125] m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i;
[0126] ε := the constant of the uniform outlier distance distribution;
[0127] D_q, D_m := the descriptor vectors of q and m, respectively.
[0128] A computer program includes instructions that, when executed by a computer, cause the computer to perform the methods described above.
[0129] A non-transitory computer-readable medium includes computer-executable instructions that, when executed by a computer, cause the computer to perform the above-described method.
[0130] It should be understood that the arrangement of components shown in the described figures is exemplary, and other arrangements are possible. It should also be understood that the various system components (and devices) defined by the claims, described below, and shown in the various block figures represent components in some systems configured according to the subject matter disclosed herein. For example, one or more of these system components (and devices) may be implemented wholly or partially by at least some of the components shown in the arrangements illustrated in the described figures.
[0131] Furthermore, although at least one of these components is implemented at least partially as an electronic hardware component and thus constitutes a machine, the other components may be implemented in software that, when included in an execution environment, constitutes a machine, or in hardware, or in a combination of software and hardware.
[0132] Although the invention and its advantages have been described in detail, it should be understood that various changes, substitutions and modifications may be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.
Claims
1. A method for estimating the possible pose of an image processing device (102) relative to a spatial region, characterized in that the image processing device (102) is coupled to an imaging capture device (104), the imaging capture device (104) being used to capture one or more 2D images of a scene within the spatial region, wherein the image processing device (102) has spatial coordinates of multiple 3D point locations within the spatial region, and the method includes: (i) identifying key feature points present in the one or more 2D images; (ii) identifying the correspondence between one or more clusters of the multiple 3D point locations and the key feature points present in the one or more 2D images; (iii) based on the correspondence identified in (ii), estimating the possible pose of the image processing device (102) relative to the spatial region by using a multi-match likelihood function that finds, for each 3D point location, the closest 2D feature point among k possible 2D key feature points, wherein k is an integer greater than 1.
2. The method according to claim 1, characterized in that the method includes: determining the spatial coordinates of multiple 3D points within the spatial region based on multiple images captured from mutually different angles.
3. The method according to claim 1 or 2, characterized in that the method includes: implementing the multi-match likelihood function as a sampling-based framework to search for the optimal pose among the possible poses by calculating the maximum value of the multi-match likelihood function.
4. The method according to claim 1 or 2, characterized in that the method includes: using an optimization process to adaptively select the closest 2D key feature point among the k best matches, so as to capture more inliers in the 2D image.
5. The method according to claim 1 or 2, characterized in that the method includes implementing the multi-match likelihood function as shown below: [formula not reproduced], where Q_i := the i-th 3D feature point; P(Θ) := the reverse camera pose parameterized by Θ; P(Θ)Q_i := the coordinates of the 3D feature point Q_i projected onto the target image; m_i := the i-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε := the constant of the uniform outlier distance distribution; and the pose parameter Θ is selected to maximize the multi-match likelihood function L(Θ), thereby determining the possible pose.
6. The method according to claim 1 or 2, characterized in that the multi-match likelihood function is implemented as a single-match robust likelihood function, as shown below: [formula not reproduced], where the index k is usually a small number, optionally in the range of 0 to 10; Q_i := the i-th 3D feature point; P(Θ) := the reverse camera pose parameterized by Θ; P(Θ)Q_i := the coordinates of the 3D feature point Q_i projected onto the target image; m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε := the constant of the uniform outlier distance distribution; and, for each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the spatially closest point m_ij; then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij.
7. The method according to claim 1 or 2, characterized in that the multi-match likelihood function is implemented as a multi-match robust likelihood function, as shown below: for all i: [formulas not reproduced], where Q_i := the i-th 3D feature point; P(Θ) := the reverse camera pose parameterized by Θ; q_i = P(Θ)Q_i := the coordinates q_i obtained by projecting the 3D feature point Q_i onto the target image; m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε := the constant of the uniform outlier distance distribution; D_q, D_m := the descriptor vectors of q and m, respectively; f() := a function used to determine a distance based on the descriptor similarity and the spatial distance between the projection q of the 3D point Q in the image and the possible correspondence m in the image; and, for each given pose parameter P(Θ), the method includes: first finding, among the k points m_ij (j = 1, ..., k), the point m_ij that is closest with respect to f(); then calculating the multi-match likelihood function L(Θ) based on that closest point m_ij.
8. The method according to claim 1 or 2, characterized in that the multi-match likelihood function is implemented as a multi-match robust likelihood function, as shown below: for all i: [formulas not reproduced]; the function f() is defined as follows: [formula not reproduced] or [formula not reproduced], where the position of q in a given image depends on the camera pose parameters Θ and the 3D position Q of the point: q_i = P(Θ)Q_i; for a given feature point q, there are k possible correspondences; m_ij := the ij-th 2D feature point in the target image that may correspond to the 3D feature point Q_i; ε := the constant of the uniform outlier distance distribution; D_q, D_m := the descriptor vectors of q and m, respectively.
9. An image processing apparatus (102) for estimating possible poses relative to a spatial region, characterized in that the image processing device (102) is coupled to an imaging capture device (104), which is used to capture one or more 2D images of a scene within the spatial region; the image processing device (102) has spatial coordinates of multiple 3D point locations within the spatial region; and the image processing device (102) is used to: (i) identify key feature points present in the one or more 2D images; (ii) identify the correspondence between one or more clusters of the multiple 3D point locations and the key feature points present in the one or more 2D images; (iii) based on the correspondence identified in (ii), estimate the possible pose of the image processing device (102) relative to the spatial region by using a multi-match likelihood function that finds, for each 3D point location, the closest 2D feature point among k possible 2D key feature points, wherein k is an integer greater than 1.
10. The image processing apparatus (102) according to claim 9, characterized in that the image processing device (102) is used to: determine the spatial coordinates of multiple 3D points within the spatial region based on multiple images captured from different angles.
11. A computer program product, characterized in that it includes instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
12. A non-transitory computer-readable medium, characterized in that it includes computer-executable instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.