Point cloud fusion method and device, electronic equipment and storage medium

By acquiring a set of multi-view image frames, their co-view relationship table, and semantic segmentation results, the problems of long processing time and low accuracy in point cloud fusion are solved, achieving more efficient point cloud fusion.

CN116452479B · Active · Publication Date: 2026-06-12 · BEIJING DAJIA INTERNET INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
Filing Date
2023-04-11
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

The point cloud fusion process is time-consuming and produces redundant point clouds, resulting in low image accuracy.

Method used

By acquiring a set of image frames from multiple perspectives, along with their common-view relation tables and semantic segmentation results, and fusing the point clouds according to the common-view relation tables, redundant point cloud computation is reduced and accuracy is improved.

Benefits of technology

It reduces the time spent in the point cloud fusion process and the occurrence of redundant point clouds, thereby improving the accuracy of the fused point cloud image.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116452479B_ABST

Abstract

The present disclosure relates to the technical field of image processing, and particularly relates to a point cloud fusion method and device, electronic equipment and storage medium. The point cloud fusion method comprises: acquiring a multi-view image frame set; acquiring a common view relationship table set corresponding to the image frame set, wherein the common view relationship table set comprises a common view relationship table subset of any image frame in the image frame set and a common view frame corresponding to that image frame; acquiring a semantic segmentation result corresponding to the image frame set, wherein the semantic segmentation result comprises at least one point cloud; and acquiring a point cloud fusion image corresponding to the image frame set according to the common view relationship table set and the at least one point cloud. The present disclosure can reduce the time consumed by the point cloud fusion process while improving the accuracy of the point cloud fusion image.

Description

Technical Field

[0001] This disclosure relates to the field of image processing technology, and in particular to a point cloud fusion method, apparatus, electronic device, and storage medium.

Background Technology

[0002] With the development of science and technology, the continuous advancement of electronic devices has brought great convenience to users' production and lives. Among these advancements, point cloud fusion has significantly improved image fusion effects in image processing. Point cloud fusion technology fuses depth maps acquired from various viewpoints into a complete 3D point cloud. Due to the precision limitations of acquisition equipment and point cloud fusion algorithms, the original depth data from each viewpoint contains certain errors and noise. If these original depth data are directly projected into 3D space, the reconstructed 3D point cloud will be very cluttered, and the same location will be covered by a thick layer of overlapping points. Furthermore, because each pixel of each depth map needs to be calculated individually, the computational load for point cloud fusion is extremely large, resulting in a long fusion process. The presence of redundant point clouds also leads to lower accuracy in the fused image.

Summary of the Invention

[0003] This disclosure provides a point cloud fusion method, apparatus, electronic device, and storage medium to at least solve the problems in related technologies where the point cloud fusion process is time-consuming and the presence of redundant point clouds leads to low accuracy of the fused image. The technical solution of this disclosure is as follows:

[0004] According to a first aspect of the present disclosure, a point cloud fusion method is provided, comprising:

[0005] Obtain a collection of image frames from multiple perspectives;

[0006] Obtain the set of co-view relationship tables corresponding to the set of image frames, wherein the set of co-view relationship tables includes a subset of co-view relationship tables for any image frame in the set of image frames and the co-view frames corresponding to any image frame;

[0007] Obtain the semantic segmentation result corresponding to the image frame set, wherein the semantic segmentation result includes at least one point cloud;

[0008] Based on the common-view relation table set and the at least one point cloud, obtain the point cloud fusion image corresponding to the image frame set.

[0009] Optionally, obtaining the set of co-view relationship tables corresponding to the set of image frames includes:

[0010] Based on the pose information corresponding to each image frame in the image frame set, the first center position corresponding to the first image frame and the second center position corresponding to the second image frame are determined, wherein the first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame.

[0011] Based on the first center position and the second center position, obtain the co-view sparse point cloud set corresponding to the first image frame and the second image frame;

[0012] Based on the number of sparse point clouds corresponding to the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set, the scores corresponding to the first image frame and the second image frame are obtained, wherein the scores are used to indicate the degree of co-view between the first image frame and the second image frame.

[0013] Traverse all image frames in the image frame set except for the first image frame, and obtain at least one score;

[0014] Based on the at least one score, obtain a subset of the co-view relationship table corresponding to the first image frame;

[0015] Traverse the set of image frames to obtain the set of co-view relationship tables corresponding to the set of image frames.
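
The two traversals above (over every second image frame for a given first image frame, then over every first image frame) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: `build_coview_tables`, the `score_fn` callback, and the `top_k` parameter are hypothetical names, and the scoring function is left to the caller because its exact form is not fixed here.

```python
def build_coview_tables(frame_ids, score_fn, top_k=2):
    """Build one co-view subset per frame (hypothetical helper).

    frame_ids: identifiers of all image frames in the set.
    score_fn:  callable (first_id, second_id) -> float, higher means a
               stronger co-view relationship (assumed to be supplied by
               the caller).
    top_k:     how many co-view frames to keep per frame (assumed).
    """
    tables = {}
    for first in frame_ids:
        # Inner traversal: score every other frame against `first`.
        scores = {
            second: score_fn(first, second)
            for second in frame_ids
            if second != first
        }
        # Keep the best-scoring frames as the co-view subset of `first`.
        ranked = sorted(scores, key=scores.get, reverse=True)
        tables[first] = ranked[:top_k]
    return tables


# Toy usage: frames that are close in index are treated as co-visible.
tables = build_coview_tables([0, 1, 2, 3], lambda a, b: -abs(a - b))
```

The result is a dictionary keyed by frame identifier, which is one plausible in-memory form of the co-view relationship table set.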

[0016] Optionally, the method further includes:

[0017] Obtain the first direction vector between any sparse point cloud in the co-view sparse point cloud set and the first center position;

[0018] Obtain the second direction vector between any sparse point cloud and the second center position;

[0019] Based on the first direction vector and the second direction vector, obtain the observation angle corresponding to any sparse point cloud.
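
As an illustration of the three steps above, the observation angle can be computed as the angle between the two direction vectors. The sketch below assumes that each sparse point and each center position is given as a 3D coordinate in a common world frame; the function name `observation_angle` is hypothetical.

```python
import math


def observation_angle(point, first_center, second_center):
    """Angle in degrees between the rays from a sparse point to two camera centers."""
    # First and second direction vectors: from the point to each center.
    d1 = [c - p for c, p in zip(first_center, point)]
    d2 = [c - p for c, p in zip(second_center, point)]
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("point coincides with a camera center")
    cos_theta = sum(a * b for a, b in zip(d1, d2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

A small angle means the two cameras view the point from nearly the same direction; a large angle means a wide baseline relative to the point's distance.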

[0020] Optionally, obtaining the scores corresponding to the first image frame and the second image frame based on the number of sparse point clouds corresponding to the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set includes:

[0021] Obtain the number of sparse point clouds corresponding to the co-view sparse point cloud set;

[0022] The difference in the number of sparse point clouds is obtained based on the difference between the observation angle corresponding to any sparse point cloud in the common-view sparse point cloud set and the preset observation angle.

[0023] The scores corresponding to the first image frame and the second image frame are obtained based on the difference in the number of sparse point clouds and the corresponding score coefficient.
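
The text above does not give a closed-form score, so the following is only one possible reading, stated as an assumption: every co-visible sparse point contributes a weight that decays as its observation angle moves away from the preset observation angle, and the sum is scaled by the score coefficient. The Gaussian weighting, the name `pair_score`, and the default values for `preset_angle_deg`, `sigma_deg`, and `coefficient` are all hypothetical.

```python
import math


def pair_score(angles_deg, preset_angle_deg=5.0, sigma_deg=10.0, coefficient=1.0):
    """Hypothetical co-view score for one (first frame, second frame) pair.

    angles_deg: observation angle of every sparse point seen by both frames.
    The weighting below is an assumed form, not taken from the source text.
    """
    total = 0.0
    for theta in angles_deg:
        diff = theta - preset_angle_deg
        # Points observed near the preset angle count almost fully;
        # points far from it count less.
        total += math.exp(-(diff * diff) / (2.0 * sigma_deg * sigma_deg))
    return coefficient * total
```

Under this reading the score grows with the number of co-visible points and shrinks when those points are observed at angles far from the preset angle, which matches the two inputs named in the steps above.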

[0024] Optionally, obtaining a subset of the co-view relation table corresponding to the first image frame based on the at least one score includes:

[0025] According to the order of scores from high to low, add the second image frames corresponding to a predetermined number of scores from the at least one score to the co-viewing relationship table subset corresponding to the first image frame; or

[0026] Determine, among the at least one score, at least one target score that is greater than a score threshold;

[0027] Add the second image frame corresponding to the at least one target score to the co-view relationship table subset corresponding to the first image frame.
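
The two selection alternatives above (a predetermined number of highest scores, or every score above a threshold) can be sketched as follows. The function name `select_coview_frames` and its parameters are hypothetical, and the scores are assumed to be held in a dictionary keyed by second-image-frame identifier.

```python
def select_coview_frames(scores, top_k=None, threshold=None):
    """Pick the co-view subset for one first image frame.

    scores:    dict mapping second-frame id -> score.
    top_k:     keep this many highest-scoring frames (first alternative).
    threshold: keep frames whose score is strictly greater (second alternative).
    Exactly one of top_k / threshold is expected to be given.
    """
    if (top_k is None) == (threshold is None):
        raise ValueError("pass exactly one of top_k or threshold")
    # Rank second frames from the highest score to the lowest.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        return [frame_id for frame_id, _ in ranked[:top_k]]
    return [frame_id for frame_id, score in ranked if score > threshold]
```

A fixed `top_k` bounds the amount of work per frame, while a threshold lets well-connected frames keep more neighbors; which one is preferable depends on the capture setup.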

[0028] Optionally, obtaining the semantic segmentation result corresponding to the image frame set includes:

[0029] Obtain the original image corresponding to any image frame in the image frame set;

[0030] Semantic segmentation is performed on the original image corresponding to any image frame in the image frame set to obtain the semantic segmentation result corresponding to any image frame.

[0031] Optionally, obtaining the point cloud fusion image corresponding to the image frame set based on the co-view relation table set and at least one point cloud included in the semantic segmentation result includes:

[0032] Obtain a first co-view relationship table set corresponding to the first image frame in the image frame set, wherein the first image frame is any image frame in the image frame set, and the first co-view relationship table set includes at least one second image frame that co-views with the first image frame;

[0033] Obtain any depth point corresponding to the first image frame;

[0034] Project the two-dimensional coordinates of any depth point onto any second image frame to obtain the projected coordinates corresponding to the two-dimensional coordinates;

[0035] If the projected coordinates satisfy the search stopping condition, any one of the depth points is determined as the first depth point, wherein the first depth point is a fused point, or the first depth point is a point outside the field of view of the first image frame.

[0036] If the projection coordinates do not meet the search stopping condition, the projection coordinates are projected to any third image frame until all second depth points corresponding to any depth point in the image frame set are obtained. The third image frame is any image frame in the second co-view relationship table set corresponding to the second image frame, and the second depth point is a point that is not fused and is within the field of view of the first image frame.

[0037] Obtain at least one second depth point that passes the consistency check among all the second depth points;

[0038] Project the arbitrary depth point and the at least one second depth point that has passed the consistency check into a three-dimensional space, obtain the three-dimensional coordinates corresponding to the arbitrary depth point, and determine that the point cloud fusion of the arbitrary depth point is complete.

[0039] Traverse the set of image frames to obtain the point cloud fused image corresponding to the set of image frames.
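
The geometric core of the steps above is projecting a depth point from one image frame into a co-view frame and checking whether the two depth observations agree. The sketch below covers only that core, under stated assumptions: a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a world-to-camera pose of the form X_cam = R · X_world + t, and a relative-depth tolerance as the consistency check. The search stopping condition, the traversal of the co-view tables, and the bookkeeping of already fused points are omitted, and all function names are hypothetical.

```python
def backproject(u, v, depth, intr, R, t):
    """Lift pixel (u, v) with the given depth to a world-space point.

    Assumes intr = (fx, fy, cx, cy) and the pose X_cam = R @ X_world + t.
    """
    fx, fy, cx, cy = intr
    cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    diff = [cam[k] - t[k] for k in range(3)]
    # X_world = R^T (X_cam - t)
    return [sum(R[k][i] * diff[k] for k in range(3)) for i in range(3)]


def project(point, intr, R, t):
    """Project a world-space point; returns (u, v, depth) or None if behind the camera."""
    fx, fy, cx, cy = intr
    cam = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy, cam[2])


def depth_consistent(projected_depth, observed_depth, rel_tol=0.01):
    """Assumed consistency check: relative depth difference within rel_tol."""
    return abs(projected_depth - observed_depth) <= rel_tol * observed_depth


# Toy usage: the second camera sits 0.2 units to the right of the first.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intr = (100.0, 100.0, 50.0, 50.0)
point = backproject(60.0, 70.0, 2.0, intr, IDENTITY, [0.0, 0.0, 0.0])
hit = project(point, intr, IDENTITY, [-0.2, 0.0, 0.0])
```

In a full pipeline, `hit` would be compared against the second frame's depth map at the projected pixel, and only observations that pass `depth_consistent` would contribute to the fused three-dimensional coordinate.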

[0040] According to a second aspect of the present disclosure, a point cloud fusion apparatus is provided, comprising:

[0041] The collection acquisition unit is configured to acquire a collection of image frames from multiple perspectives.

[0042] The set acquisition unit is further configured to acquire the set of co-view relationship tables corresponding to the set of image frames, wherein the set of co-view relationship tables includes a subset of co-view relationship tables of any image frame in the set of image frames and the co-view frames corresponding to any image frame;

[0043] The result acquisition unit is configured to acquire the semantic segmentation result corresponding to the image frame set, wherein the semantic segmentation result includes at least one point cloud;

[0044] The point cloud fusion unit is configured to perform the following: obtain a point cloud fusion image corresponding to the image frame set based on the common view relation table set and the at least one point cloud.

[0045] According to some embodiments, the set acquisition unit includes a location acquisition subunit, a set acquisition subunit, a score acquisition subunit, and a subset acquisition subunit. The set acquisition unit is configured to, when performing the acquisition of the co-view relation table set corresponding to the image frame set:

[0046] The position acquisition subunit is configured to perform the following operations: determining a first center position corresponding to a first image frame and a second center position corresponding to a second image frame based on the pose information corresponding to each image frame in the image frame set. The first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame.

[0047] The position acquisition subunit is configured to perform the following: determine the first center position corresponding to the first image frame and the second center position corresponding to the second image frame based on the center position corresponding to any image frame, wherein the first image frame is any image frame in the image frame set, and the second image frame is any image frame in the image frame set other than the first image frame.

[0048] The score acquisition subunit is configured to acquire scores corresponding to the first image frame and the second image frame based on the number of sparse point clouds corresponding to the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set, wherein the scores are used to indicate the degree of co-view between the first image frame and the second image frame.

[0049] The score acquisition subunit is also configured to perform a process of traversing all image frames in the image frame set except for the first image frame to acquire at least one score;

[0050] The subset acquisition subunit is configured to perform the acquisition of a subset of the co-view relation table corresponding to the first image frame based on the at least one score;

[0051] The set acquisition subunit is configured to traverse the image frame set and acquire the co-view relation table set corresponding to the image frame set.

[0052] According to some embodiments, the set acquisition unit further includes an angle acquisition subunit, which is configured to perform the acquisition of a first direction vector between any sparse point cloud in the co-view sparse point cloud set and the first center position;

[0053] Obtain the second direction vector between any sparse point cloud and the second center position;

[0054] Based on the first direction vector and the second direction vector, obtain the observation angle corresponding to any sparse point cloud.

[0055] According to some embodiments, the score acquisition subunit is configured to, when performing the task of acquiring the scores corresponding to the first image frame and the second image frame based on the number of sparse point clouds corresponding to the common-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the common-view sparse point cloud set, specifically configured to perform the following:

[0056] Obtain the number of sparse point clouds corresponding to the co-view sparse point cloud set;

[0057] The difference in the number of sparse point clouds is obtained based on the difference between the observation angle corresponding to any sparse point cloud in the common-view sparse point cloud set and the preset observation angle.

[0058] The scores corresponding to the first image frame and the second image frame are obtained based on the difference in the number of sparse point clouds and the corresponding score coefficient.

[0059] According to some embodiments, when the subset acquisition subunit is configured to perform the task of acquiring a subset of the co-view relation table corresponding to the first image frame based on the at least one score, it is specifically configured to perform the following:

[0060] According to the order of scores from high to low, add the second image frames corresponding to a predetermined number of scores from the at least one score to the co-viewing relationship table subset corresponding to the first image frame; or

[0061] Determine, among the at least one score, at least one target score that is greater than a score threshold;

[0062] Add the second image frame corresponding to the at least one target score to the co-view relationship table subset corresponding to the first image frame.

[0063] According to some embodiments, when the result acquisition unit is configured to acquire the semantic segmentation result corresponding to the image frame set, it is specifically configured to perform:

[0064] Obtain a first co-view relationship table set corresponding to the first image frame in the image frame set, wherein the first image frame is any image frame in the image frame set, and the first co-view relationship table set includes at least one second image frame that co-views with the first image frame;

[0065] Obtain any depth point corresponding to the first image frame;

[0066] Project the two-dimensional coordinates of any depth point onto any second image frame to obtain the projected coordinates corresponding to the two-dimensional coordinates;

[0067] If the projected coordinates satisfy the search stopping condition, any one of the depth points is determined as the first depth point, wherein the first depth point is a fused point, or the first depth point is a point outside the field of view of the first image frame.

[0068] If the projection coordinates do not meet the search stopping condition, the projection coordinates are projected to any third image frame until all second depth points corresponding to any depth point in the image frame set are obtained. The third image frame is any image frame in the second co-view relationship table set corresponding to the second image frame, and the second depth point is a point that is not fused and is within the field of view of the first image frame.

[0069] Obtain at least one second depth point that passes the consistency check among all the second depth points;

[0070] Project the arbitrary depth point and the at least one second depth point that has passed the consistency check into a three-dimensional space, obtain the three-dimensional coordinates corresponding to the arbitrary depth point, and determine that the point cloud fusion of the arbitrary depth point is complete.

[0071] Traverse the set of image frames to obtain the point cloud fused image corresponding to the set of image frames.

[0072] According to a third aspect of the present disclosure, an electronic device is provided, comprising:

[0073] processor;

[0074] Memory used to store the processor's executable instructions;

[0075] The processor is configured to execute the instructions to implement the point cloud fusion method described in any one of the preceding aspects.

[0076] According to a fourth aspect of this application, a storage medium is provided that, when instructions in the storage medium are executed by a processor of an electronic device, enables the electronic device to perform the point cloud fusion method described in any one of the preceding aspects.

[0077] According to a fifth aspect of this application, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the method described in any one of the preceding aspects.

[0078] The technical solutions provided by the embodiments of this disclosure bring at least the following beneficial effects:

[0079] In some or related embodiments, the following steps are taken: First, a set of image frames from multiple perspectives is acquired. Second, a set of common-view relation tables corresponding to the image frame set is acquired, wherein the common-view relation table set includes a subset of common-view relation tables for any image frame in the image frame set and the common-view frames corresponding to that image frame. Third, a semantic segmentation result corresponding to the image frame set is acquired, wherein the semantic segmentation result includes at least one point cloud. Fourth, a point cloud fusion image corresponding to the image frame set is acquired based on the common-view relation table set and the at least one point cloud. Therefore, acquiring the point cloud fusion image through the common-view relation table set and the semantic segmentation result can reduce the point cloud redundancy caused by directly fusing the image frame set, reduce the computation spent on redundant point clouds, shorten the point cloud fusion process, and avoid the loss of accuracy that redundant point clouds cause in the point cloud fusion image, thereby improving the accuracy of point cloud fusion.

[0080] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0081] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure, and are not intended to unduly limit this disclosure.

[0082] Figure 1 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment;

[0083] Figure 2 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment;

[0084] Figure 3 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment;

[0085] Figure 4 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment;

[0086] Figure 5 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment;

[0087] Figure 6 is an example schematic diagram illustrating a common viewpoint finding method according to an exemplary embodiment;

[0088] Figure 7 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment;

[0089] Figure 8 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment;

[0090] Figure 9 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment;

[0091] Figure 10 is a block diagram illustrating an electronic device according to an exemplary embodiment.

Detailed Implementation

[0092] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings.

[0093] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0094] Figure 1 is a flowchart illustrating a point cloud fusion method according to an exemplary embodiment. As shown in Figure 1, this point cloud fusion method can be used in point cloud fusion scenarios and includes the following steps:

[0095] In step S11, a set of image frames from multiple perspectives is obtained;

[0096] According to some embodiments, an image frame set refers to a collection comprised of at least one image frame. This image frame may, for example, be a depth image corresponding to an original image. The image frame set does not specifically refer to a fixed set. For example, the image frame set may change accordingly when the object being captured by the electronic device changes. For example, the image frame set may also change accordingly when the number of frames corresponding to it changes.

[0097] According to some embodiments, there are many ways to obtain depth maps, such as using depth sensors, LiDAR, structured light scanners, and binocular stereo geometry algorithms to calculate the depth.

[0098] It's easy to understand that multi-viewpoints are used to acquire image frames from different angles. Multi-viewpoints refer to different angles; that is, at least one image frame in the set of image frames is not acquired from the same angle. This multi-viewpoint does not specifically refer to a fixed angle. For example, when one of the viewpoints corresponding to a multi-viewpoint changes, the multi-viewpoint itself can also change accordingly. Similarly, when the number of viewpoints corresponding to a multi-viewpoint changes, the multi-viewpoint itself can also change accordingly.

[0099] According to some embodiments of the present disclosure, when an electronic device performs the point cloud fusion method, it can acquire a set of image frames from multiple perspectives. The electronic device can, for example, control different camera devices to acquire this set of image frames, or it can acquire them from its own storage, from a server, or from other electronic devices. The embodiments of the present disclosure do not limit the acquisition to any particular method.

[0100] In step S12, the set of co-view relationship tables corresponding to the set of image frames is obtained;

[0101] It is easy to understand that the common-view relation table set includes a subset of the common-view relation tables for any image frame in the image frame set and the corresponding common-view frames. This common-view relation table set refers to a collection formed by at least one subset of common-view relation tables. When the subsets of common-view relation tables included in this common-view relation table set change, the common-view relation table set can also change accordingly. For example, when the image frame set changes, the common-view relation table set corresponding to that image frame set can also change accordingly.

[0102] According to some embodiments, a co-view relationship is used to indicate that two image frames share common image content. The co-view relationship table subset of a given image frame refers to the at least one image frame that has a co-view relationship with that image frame. This co-view relationship table subset does not specifically refer to a fixed table; for example, when an image frame changes, the corresponding co-view relationship table subset may also change accordingly.

[0103] According to some embodiments, when an electronic device acquires a set of image frames, the electronic device can acquire a set of co-view relationship tables corresponding to the set of image frames.

[0104] In step S13, the semantic segmentation results corresponding to the image frame set are obtained;

[0105] According to some embodiments, the semantic segmentation result refers to the result obtained by segmenting an image according to its semantics. The semantics may, for example, refer to the image content. "Any image frame" refers to any image frame in the image frame set and does not specifically refer to a fixed image frame.

[0106] It is easy to understand that the semantic segmentation result includes at least one point cloud.

[0107] It is easy to understand that an electronic device can, for example, acquire the semantic segmentation result corresponding to any image frame. An electronic device can also acquire the semantic segmentation result corresponding to the set of image frames.

[0108] In step S14, a point cloud fusion image corresponding to the image frame set is obtained based on the common view relationship table set and at least one point cloud.

[0109] According to some embodiments, when an electronic device obtains a set of common view relation tables corresponding to a set of image frames and at least one point cloud included in the semantic segmentation result, the electronic device can obtain a point cloud fusion image corresponding to the set of image frames.

[0110] It is easy to understand that point cloud fusion can, for example, acquire a 3D image of the acquired object.

[0111] In some or related embodiments, the following steps are taken: First, a set of image frames from multiple perspectives is acquired. Second, a set of common-view relation tables corresponding to the image frame set is acquired, wherein the common-view relation table set includes a subset of common-view relation tables for any image frame in the image frame set and its corresponding common-view frames. Third, a semantic segmentation result corresponding to the image frame set is acquired, wherein the semantic segmentation result includes at least one point cloud. Fourth, a point cloud fusion image corresponding to the image frame set is acquired based on the common-view relation table set and the at least one point cloud. Therefore, acquiring the point cloud fusion image through the common-view relation table set and the semantic segmentation result can reduce the point cloud redundancy caused by directly fusing the image frame set, reduce the computation spent on redundant point clouds, shorten the point cloud fusion process, and avoid the loss of accuracy that redundant point clouds cause in the point cloud fusion image, thereby improving the accuracy of point cloud fusion.

[0112] Figure 2 is a flowchart illustrating a point cloud fusion method according to an exemplary embodiment. As shown in Figure 2, this point cloud fusion method can be used in point cloud fusion scenarios and includes the following steps:

[0113] In step S21, a set of image frames from multiple perspectives is obtained;

[0114] The specific process is as described above and will not be repeated here.

[0115] According to some embodiments, in this disclosure, the electronic device can control at least one camera device to capture images of a target object and obtain a set of image frames from multiple perspectives.

[0116] In step S22, the first center position corresponding to the first image frame and the second center position corresponding to the second image frame are obtained according to the pose information corresponding to each image frame in the image frame set.

[0117] According to some embodiments, the first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame.

[0118] In some embodiments, pose information refers to the pose corresponding to any image frame at the time of acquisition. This pose information does not specifically refer to any fixed information. For example, when any image frame changes, the pose information corresponding to that image frame may also change accordingly. For example, when the acquisition angle corresponding to any image frame changes, that image frame may also change accordingly.

[0119] It is easy to understand that the center position is the center position of the camera device corresponding to any given image frame. The center position of the camera device can be, for example, the center position of the camera device's lens. The center position of the camera device can also be, for example, the geometric center position of the camera device. Different image frames correspond to different center positions.

[0120] According to some embodiments, when an electronic device acquires a set of image frames, it can acquire the pose information corresponding to each image frame in the set. For example, the electronic device can acquire the position information corresponding to all image frames in the set. Based on this position information, the electronic device can determine the first center position corresponding to the first image frame and the second center position corresponding to the second image frame.

[0121] According to some embodiments, the first center position refers to the center position of the camera device corresponding to the first image frame. The first image frame can be any image frame in a set of image frames. The first image frame does not specifically refer to a particular fixed image frame. For example, when the image content corresponding to the first image frame changes, the first image frame can also change accordingly. For example, when the frame identifier corresponding to the first image frame changes, the first image frame can also change accordingly.

[0122] It is easy to understand that the second center position refers to the center position of the camera device corresponding to the second image frame, which is any image frame in the image frame set other than the first image frame. The second image frame does not specifically refer to a particular fixed image frame. For example, when the image content corresponding to the second image frame changes, the second image frame may also change accordingly. For example, when the frame identifier corresponding to the second image frame changes, the second image frame may also change accordingly.

[0123] According to some embodiments, the image frame set includes N image frames, where N is a positive integer. Here, i denotes the first image frame and j denotes the second image frame. From the pose $T_o$ of each image frame, the center position $c_o$ of the camera device corresponding to each image frame is obtained (N in total). Among them, $c_i$ represents the center position of the camera corresponding to the first image frame, and $c_j$ represents the center position of the camera corresponding to the second image frame. The electronic device can pair up the N center positions as $(c_i, c_j)$ and calculate the position difference between the first center and the second center as follows:

[0124] $dist_{i,j} = |c_i - c_j|$
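For illustration only, the pairwise distance between camera centers can be sketched in Python as follows. The function name `pairwise_center_distances` and the sample coordinates are hypothetical, and the sketch assumes the center positions have already been extracted from the poses, since the text does not state the pose convention.

```python
import math
from itertools import combinations


def pairwise_center_distances(centers):
    """Return {(i, j): |c_i - c_j|} for every unordered pair of camera centers."""
    dist = {}
    for i, j in combinations(range(len(centers)), 2):
        # Euclidean distance between the two camera centers.
        dist[(i, j)] = math.dist(centers[i], centers[j])
    return dist


# Hypothetical camera centers for N = 3 image frames.
centers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
dist = pairwise_center_distances(centers)
```

Because the distance is symmetric, the sketch stores each unordered pair once rather than all N x N combinations.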

[0125] In step S23, based on the first center position and the second center position, a set of co-view sparse point clouds corresponding to the first image frame and the second image frame is obtained;

[0126] According to some embodiments, when the electronic device obtains the first center position and the second center position, it can obtain a set of co-view sparse point clouds corresponding to the first image frame and the second image frame based on the first center position and the second center position. Specifically, the electronic device can use a sparse point cloud model to obtain the set of co-view sparse point clouds corresponding to the first image frame and the second image frame.

[0127] It is easy to understand that a shared-view sparse point cloud set refers to a collective consisting of at least one shared-view sparse point cloud. This shared-view sparse point cloud set is the intersection of the point cloud set corresponding to the first image frame and the set corresponding to the second image frame. This shared-view sparse point cloud set does not specifically refer to a fixed set. For example, when the number of point clouds included in the shared-view sparse point cloud set changes, the shared-view sparse point cloud set can also change accordingly. For example, when the specific point clouds included in the shared-view sparse point cloud set change, the shared-view sparse point cloud set can also change accordingly.

[0128] In step S24, the scores corresponding to the first image frame and the second image frame are obtained based on the number of sparse point clouds corresponding to the common sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the common sparse point cloud set.

[0129] In some embodiments, the score is used to indicate the degree of co-view between the first image frame and the second image frame. The score corresponding to the first image frame and the second image frame is not specifically a fixed score. For example, when the degree of co-view between the first image frame and the second image frame changes, the score corresponding to the first image frame and the second image frame may also change accordingly.

[0130] According to some embodiments, when an electronic device acquires a set of shared sparse point clouds, it can acquire the number of sparse point clouds corresponding to the set. The electronic device can also acquire the observation angle corresponding to at least one sparse point cloud in the set. The observation angle is the angle formed at any sparse point cloud by the directions to the first center position and the second center position.

[0131] It is easy to understand that, based on the number of sparse point clouds corresponding to the common sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the common sparse point cloud set, the electronic device can obtain the scores corresponding to the first image frame and the second image frame.

[0132] According to some embodiments, a first direction vector is obtained between any sparse point cloud in the shared-view sparse point cloud set and a first center position; a second direction vector is obtained between any sparse point cloud and a second center position; and the observation angle corresponding to any sparse point cloud is obtained based on the first and second direction vectors. Therefore, determining the observation angle based on the first and second direction vectors can improve the accuracy of the observation angle determination.

[0133] It is easy to understand that the first direction vector refers to the direction vector between any sparse point cloud and the first center position. This first direction vector does not specifically refer to a fixed direction vector. For example, when any sparse point cloud or the first center position changes, this first direction vector can also change accordingly.

[0134] It is easy to understand that any sparse point cloud can be, for example, a 3D point p. The direction vector of point p with respect to the first and second center positions is calculated as follows:

[0135] $v_i = c_i - p$

[0136] $v_j = c_j - p$

[0137] Where $v_i$ represents the first direction vector and $v_j$ represents the second direction vector.

[0138] The electronic device can calculate the angle formed at the 3D point p by the first and second center positions:

[0139] $\theta_p = \arccos\left(\frac{v_i \cdot v_j}{|v_i|\,|v_j|}\right)$

[0140] Where $\theta_p$ is the observation angle.
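For illustration only, the direction vectors and the observation angle can be sketched in Python as follows. The function name `observation_angle_deg` is hypothetical, and the sketch assumes the observation angle is the ordinary angle between the two direction vectors, computed from their normalized dot product and expressed in degrees.

```python
import math


def observation_angle_deg(p, c_i, c_j):
    """Angle in degrees formed at 3D point p by camera centers c_i and c_j."""
    v_i = [a - b for a, b in zip(c_i, p)]  # first direction vector, c_i - p
    v_j = [a - b for a, b in zip(c_j, p)]  # second direction vector, c_j - p
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm = math.hypot(*v_i) * math.hypot(*v_j)
    if norm == 0.0:
        raise ValueError("point coincides with a camera center")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical example: two centers seen at a right angle from p.
theta_p = observation_angle_deg((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```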

[0141] According to some embodiments, when obtaining the scores corresponding to the first image frame and the second image frame based on the number of sparse points corresponding to the shared sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the shared sparse point cloud set, the number of sparse points corresponding to the shared sparse point cloud set can be obtained; the difference in the number of sparse points can be obtained based on the difference between the observation angle corresponding to any sparse point cloud in the shared sparse point cloud set and a preset observation angle; and the scores corresponding to the first image frame and the second image frame can be obtained based on the difference in the number of sparse points and the corresponding score coefficient. Therefore, obtaining the scores corresponding to the first image frame and the second image frame based on the difference in the number of sparse points and the corresponding score coefficient allows for score determination using different difference coefficients, which can improve the accuracy of score determination.

[0142] It is easy to understand that when the observation angle corresponding to any sparse point cloud is less than the preset observation angle, the score coefficient corresponding to the difference between that observation angle and the preset observation angle can be, for example, the first score coefficient; when the observation angle corresponding to any sparse point cloud is greater than or equal to the preset observation angle, the score coefficient corresponding to that difference can be, for example, the second score coefficient, wherein the first score coefficient is less than the second score coefficient.

[0143] In some embodiments, the observation quality of the first and second image frames is scored based on the number of sparse point clouds corresponding to the shared sparse point cloud set and the observation angle corresponding to any sparse point cloud:

[0144]

[0145]

[0146] Where $s_{i,j}$ is the score for the first image frame and the second image frame, $s_p$ is the score corresponding to any sparse point cloud, $\theta_0$ is the preset observation angle, $\sigma_1$ is the score coefficient corresponding to $\theta_p < \theta_0$, and $\sigma_2$ is the score coefficient corresponding to $\theta_p \geq \theta_0$.

[0147] According to some embodiments, for example, $\theta_0 = 5°$, $\sigma_1 = 1$, $\sigma_2 = 10$. The physical meaning of the above formula for $s_{i,j}$ is as follows: $\theta_0$ can be regarded as the optimal observation angle between two frames, at which the depth estimation quality is highest. The scoring function therefore attains its highest value, 1 point, when the observation angle $\theta_p$ equals $\theta_0$. $\sigma_1 < \sigma_2$ means that the penalty is greater for an included angle smaller than $\theta_0$, because when the included angle between the two frames is too small, the triangulation error increases rapidly. Penalizing such angles more heavily improves the accuracy of score determination.
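For illustration only, one scoring function consistent with the properties described above can be sketched in Python as follows. The scoring formulas themselves are not reproduced in this text, so the piecewise Gaussian form used here is an assumption. It was chosen because it peaks at 1 when the observation angle equals the preset angle, penalizes smaller angles more heavily when the first coefficient is smaller than the second, and sums per-point scores over the shared sparse point cloud set. The function names `point_score` and `pair_score` are hypothetical.

```python
import math


def point_score(theta_deg, theta0=5.0, sigma1=1.0, sigma2=10.0):
    """Score of one sparse point; assumed piecewise Gaussian centered on theta0."""
    # Angles below theta0 use the smaller coefficient, so they decay faster.
    sigma = sigma1 if theta_deg < theta0 else sigma2
    return math.exp(-((theta_deg - theta0) ** 2) / (2.0 * sigma ** 2))


def pair_score(angles_deg, theta0=5.0, sigma1=1.0, sigma2=10.0):
    """Score of a frame pair: sum of per-point scores over the shared points."""
    return sum(point_score(t, theta0, sigma1, sigma2) for t in angles_deg)


# Hypothetical observation angles (degrees) of four shared sparse points.
score = pair_score([5.0, 4.0, 6.0, 20.0])
```

Under this assumed form, a point observed at 4° scores lower than one observed at 6°, even though both are 1° away from the preset angle, which matches the asymmetric penalty described in the text.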

[0148] In step S25, all image frames in the image frame set except the first image frame are traversed to obtain at least one score;

[0149] According to some embodiments, when the scores corresponding to the first image frame and the second image frame are obtained, the electronic device can traverse all image frames in the image frame set except for the first image frame to obtain at least one score. For example, the electronic device can obtain multiple scores corresponding to the first image frame and multiple image frames.

[0150] For example, when the set of image frames includes image frame A, image frame B, image frame C, image frame D, and image frame E, the electronic device can obtain the scores corresponding to image frames A and B, image frames A and C, image frames A and D, and image frames A and E.

[0151] In step S26, a subset of the co-view relation table corresponding to the first image frame is obtained based on at least one score;

[0152] According to some embodiments, since the second image frame is any image frame in the image frame set other than the first image frame, the electronic device can obtain at least one score corresponding to the first image frame and the second image frame. That is, regarding the scores of the first image frame and the remaining image frames, the electronic device can obtain at least one score.

[0153] Optionally, the electronic device can obtain a subset of the co-viewing relationship table corresponding to the first image frame based on the scores corresponding to at least one first image frame and one second image frame. For example, the electronic device can obtain all image frames that have a co-viewing relationship with the first image frame.

[0154] According to some embodiments, an electronic device obtains a subset of the co-view relationship table corresponding to a first image frame based on at least one score, including: adding a predetermined number of second image frames corresponding to at least one score to the subset of the co-view relationship table corresponding to the first image frame in descending order of score; or determining at least one target score greater than a score threshold among at least one score; and adding the second image frames corresponding to at least one target score to the subset of the co-view relationship table corresponding to the first image frame. Therefore, obtaining a subset of the co-view relationship table based on scores can reduce the possibility of low co-view relationship between the first and second image frames, and can improve the accuracy of obtaining the subset of the co-view relationship table.

[0155] In some embodiments, the electronic device can acquire N-1 scores between the first image frame and the remaining image frames. The electronic device can sort the N-1 scores, obtain the top M results, and obtain a subset of the co-view relation table corresponding to the first image frame. Here, M is a positive integer.
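For illustration only, the two selection strategies described above (keeping the top M scores, or keeping scores above a threshold) can be sketched in Python as follows. The function names, frame identifiers, and score values are hypothetical.

```python
def covis_subset_top_m(scores, m):
    """Return the ids of the m highest-scoring second image frames."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m]


def covis_subset_threshold(scores, threshold):
    """Return the ids of second image frames whose score exceeds the threshold."""
    return [frame for frame, s in scores.items() if s > threshold]


# Hypothetical scores between first image frame A and the remaining frames.
scores_for_a = {"B": 3.2, "C": 0.4, "D": 7.1, "E": 1.5}
top_two = covis_subset_top_m(scores_for_a, 2)
above_one = covis_subset_threshold(scores_for_a, 1.0)
```

Repeating this for every image frame in the set yields the per-frame subsets that make up the co-view relation table set.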

[0156] In step S27, the image frame set is traversed to obtain the set of co-view relationship tables corresponding to the image frame set;

[0157] The specific process is as described above and will not be repeated here.

[0158] According to some embodiments, when the electronic device obtains a subset of the co-view relation table corresponding to the first image frame, it can traverse the image frame set to obtain the co-view relation table set corresponding to the image frame set. For example, the electronic device can obtain subsets of the co-view relation tables corresponding to all image frames and add the subsets of the co-view relation tables corresponding to all image frames to the co-view relation table set.

[0159] In some embodiments, the common view relation table set includes a common view relation table subset of any image frame in the image frame set and the common view frames corresponding to any image frame.

[0160] In step S28, the semantic segmentation results corresponding to the image frame set are obtained;

[0161] The specific process is as described above and will not be repeated here.

[0162] In step S29, a point cloud fusion image corresponding to the image frame set is obtained based on the common view relation table set and at least one point cloud included in the semantic segmentation result.

[0163] The specific process is as described above and will not be repeated here.

[0164] In some or related embodiments, by acquiring a set of image frames from multiple perspectives; determining the first center position of the first image frame and the second center position of the second image frame based on the pose information corresponding to each image frame in the image frame set; and acquiring a set of shared sparse point clouds corresponding to the first and second image frames based on the first and second center positions, the accuracy of acquiring the shared sparse point cloud set can be improved by reducing the possibility of inaccurate point cloud determination due to directly acquiring the shared sparse point cloud set. Secondly, based on the number of sparse point clouds corresponding to the shared sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the shared sparse point cloud set, scores corresponding to the first and second image frames are acquired; by traversing all image frames in the image frame set except the first image frame, at least one score is acquired; and based on at least one score, a subset of the shared view relation table corresponding to the first image frame is acquired, which can improve the accuracy of acquiring the subset of the shared view relation table and reduce the possibility of the first image frame not being shared with any other image frame. Furthermore, by traversing the image frame set, a common-view relation table set corresponding to the image frame set is obtained; the semantic segmentation result corresponding to the image frame set is obtained, wherein the semantic segmentation result includes at least one point cloud; based on the common-view relation table set and at least one point cloud, the point cloud fusion image corresponding to the image frame set is obtained. 
Obtaining the point cloud fusion image through the common-view relation table set and the semantic segmentation result can reduce the point cloud redundancy caused by directly fusing the image frame set, reduce the calculation of redundant point clouds in the point cloud fusion, reduce the time consumption of the point cloud fusion process, and at the same time, reduce the occurrence of redundant point clouds that lead to low accuracy of the point cloud fusion image, thereby improving the accuracy of point cloud fusion.

[0165] Figure 3 is a flowchart illustrating a point cloud fusion method according to an exemplary embodiment. As shown in Figure 3, this point cloud fusion method can be used in point cloud fusion scenarios, and includes the following steps:

[0166] In step S31, a set of image frames from multiple perspectives is obtained;

[0167] The specific process is as described above and will not be repeated here.

[0168] In step S32, the set of co-view relationship tables corresponding to the set of image frames is obtained;

[0169] The specific process is as described above and will not be repeated here.

[0170] The set of co-view relationship tables includes a subset of the co-view relationship tables for any image frame in the image frame set and the co-view frames corresponding to any image frame.

[0171] In step S33, the original image corresponding to any image frame in the image frame set is obtained;

[0172] According to some embodiments, the image frames included in the image frame set may be depth maps. An electronic device can acquire the original image corresponding to any image frame in the image frame set. The original image is an RGB image. For example, the electronic device can acquire the RGB image corresponding to any image frame in the image frame set. An RGB image may, for example, refer to an image composed of red, blue, and green colors.

[0173] In step S34, semantic segmentation processing is performed on the original image corresponding to any image frame in the image frame set to obtain the semantic segmentation result corresponding to any image frame;

[0174] According to some embodiments, when an electronic device acquires the original image corresponding to any image frame in an image frame set, it can perform semantic segmentation processing on the original image corresponding to any image frame in the image frame set to obtain the semantic segmentation result corresponding to any image frame. The semantic segmentation result includes at least one point cloud.

[0175] It is easy to understand that when an electronic device performs semantic segmentation processing on the original image corresponding to any image frame in the image frame set, the electronic device can use a semantic segmentation network to obtain the semantic segmentation result corresponding to any image frame.

[0176] According to some embodiments, the original image may be, for example, an RGB image. Semantic segmentation processing is performed on the original image corresponding to any image frame in the image frame set. For example, a fully convolutional neural network can be used to perform semantic segmentation on the RGB image to obtain the semantic segmentation result. Alternatively, pixel-level end-to-end semantic segmentation can be performed directly on the RGB image.

[0177] According to some embodiments, the electronic device can, for example, perform sky segmentation. Figure 4 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment. As shown in Figure 4, any image frame can be, for example, Q, and an image frame that only includes ground depth can be W. Semantic segmentation processing is performed on any image frame Q to obtain, for example, the semantic segmentation result corresponding to that image frame.

[0178] According to some embodiments, electronic devices can utilize the consistency of multi-view semantics to obtain cleaner point clouds. During subsequent point cloud fusion, if the number of times a certain position $p_i$ is determined to conform to the semantic segmentation result $G_i$ exceeds a certain threshold, the point cloud at that location is retained; otherwise, the point cloud is deleted.
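For illustration only, the multi-view semantic vote described above can be sketched in Python as follows. The function name `filter_by_semantic_votes`, the point identifiers, and the per-view boolean votes are hypothetical. The sketch assumes each position has already been tested against the segmentation result of every view that observes it.

```python
def filter_by_semantic_votes(votes_per_point, min_votes):
    """Keep the points whose count of conforming views exceeds min_votes."""
    kept = []
    for point_id, votes in votes_per_point.items():
        # Each vote is True when the position conforms to that view's segmentation.
        if sum(votes) > min_votes:
            kept.append(point_id)
    return kept


# Hypothetical votes from three views for three candidate positions.
votes = {
    "a": [True, True, True],
    "b": [True, False, False],
    "c": [True, True, False],
}
kept_points = filter_by_semantic_votes(votes, min_votes=1)
```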

[0179] In step S35, a point cloud fusion image corresponding to the image frame set is obtained based on the common view relationship table set and at least one point cloud.

[0180] The specific process is as described above and will not be repeated here.

[0181] According to some embodiments, Figure 5 is an example schematic diagram illustrating a point cloud fusion method according to an exemplary embodiment. As shown in Figure 5, the image obtained by direct point cloud fusion without using the technical solution of this disclosure can be R, while the image obtained by point cloud fusion using the technical solution of this disclosure can be T.

[0182] According to some embodiments, when an electronic device acquires a point cloud fusion image corresponding to a set of image frames, it can proceed as follows. It acquires a first co-view relation table set corresponding to a first image frame in the image frame set, wherein the first image frame is any image frame in the image frame set, and the first co-view relation table set includes at least one second image frame that shares a view with the first image frame. It acquires any depth point corresponding to the first image frame and projects the two-dimensional coordinates of that depth point onto any second image frame to obtain the corresponding projected coordinates. If the projected coordinates meet the search stopping condition, that depth point is determined to be a first depth point, wherein a first depth point is a point that has already been fused or a point outside the field of view of the first image frame. If the projected coordinates do not meet the search stopping condition, the projected coordinates are projected onto any third image frame, until all second depth points corresponding to that depth point in the image frame set are obtained. Here, any third image frame is any image frame in the second co-view relation table set corresponding to the second image frame, and a second depth point is a point that has not been fused and is within the field of view of the first image frame. At least one second depth point that passes the consistency check is obtained from all second depth points. The depth point and the at least one second depth point that passes the consistency check are fused and projected into three-dimensional (3D) space to obtain the 3D coordinates of the fused point cloud, which completes point cloud fusion for that depth point. The image frame set is then traversed to obtain the point cloud fusion image corresponding to the image frame set.
Therefore, the point cloud fusion in this embodiment is based on three-dimensional space, reducing the inaccuracy caused by point cloud fusion in two-dimensional space. Furthermore, the consistency check improves the accuracy of obtaining the 3D coordinates corresponding to any depth point, thus improving the accuracy of point cloud fusion and reducing the repeated display of the same depth point. At the same time, fusing based on at least one second depth point can reduce the point cloud fusion time and improve the efficiency of point cloud fusion.

[0183] According to some embodiments, the technical solution of this disclosure can perform point cloud fusion in a unified three-dimensional space. The fusion logic always unfolds from the perspective of 3D points, and the program in the electronic device maintains the information of 3D points in three-dimensional space, rather than the two-dimensional depth maps of each frame.

[0184] According to some implementations, a mask for the "fused region" is set for each frame, denoted as $F_i$. Before the algorithm begins, no point on the depth map has been fused, so the initial value of $F_i$ is an all-zero matrix. The electronic device can fuse all image frames one by one, executing the following process sequentially. Suppose the depth map of the i-th frame needs to be processed, which could be, for example, the first image frame. Then each depth point $d_i$ in the depth map $D_i$ of the i-th frame goes through the following steps:

[0185] The electronic device can find all common-view points corresponding to $p_i$ using the following steps:

[0186] According to some embodiments, Figure 6 is an exemplary schematic diagram illustrating a common-view point finding method according to an exemplary embodiment. As shown in Figure 6:

[0187] (1) The two-dimensional coordinates $p_i$ of the point are projected onto all image frames that share a view with it to obtain the corresponding projected coordinates. The projection formula is:

[0188]

[0189] Where $p_i$ is a two-dimensional pixel coordinate on the depth map $D_i$ of the i-th frame;

[0190] $d_i$ is the depth value corresponding to $p_i$ (i.e., $d_i = D_i(p_i)$);

[0191] $K_i$ is the camera intrinsic matrix used when this depth map was captured; $K_i$ can be, for example, a 3x3 matrix;

[0192] $x_i$ is the 3D point corresponding to the depth point $d_i$ in the coordinate system of the current reference frame;

[0193] $T_i$ is the pose when the depth map of the i-th frame was captured;

[0194] $T_j$ is the pose when the depth map of the j-th frame was captured.

[0195] (2) For each target frame j onto which the point is projected, check the mask $F_j$ of that frame:

[0196] i) If the mask value corresponding to the projected point is 1 (i.e., the point has already participated in fusion), or the projected point lies outside the depth map range, terminate the search and confirm that $p_i$ is an invalid point; this invalid point is the first depth point. Otherwise:

[0197] ii) Continue projecting the coordinates of the projected point onto all frames that share a view with frame j:

[0198]

[0199] (3) Repeat steps (1)-(2) until all valid points corresponding to $p_i$ across all depth maps have been found. A valid point is defined as a point whose mask value in $F_j$ is 0 (i.e., the point has not yet been fused) and whose projected point lies within the field of view of the depth map $D_j$ of frame j; such a valid point is the second depth point.
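For illustration only, steps (1) to (3) can be sketched in Python as follows. The projection formulas are not reproduced in this text, so several details are assumptions. Each pose is taken to be a camera-to-world rigid transform stored as a hypothetical pair `(R, t)`, and the intrinsics are stored as `(fx, fy, cx, cy)` for a pinhole camera. Projected pixels are rounded to the nearest integer, and the projected depth is reused when the search continues from frame j. The sketch reads the stopping condition as ending the search along that branch only, and it visits each frame at most once to guarantee termination. All function and key names are hypothetical.

```python
from collections import deque


def backproject(u, v, d, K):
    """Pixel (u, v) with depth d -> 3D point in that camera's coordinate system."""
    fx, fy, cx, cy = K
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)


def transform(T, x):
    """Apply rigid transform T = (R, t) to point x; R is three rows of three."""
    R, t = T
    return tuple(sum(R[r][k] * x[k] for k in range(3)) + t[r] for r in range(3))


def invert(T):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    R, t = T
    Rt = tuple(tuple(R[k][r] for k in range(3)) for r in range(3))
    return Rt, tuple(-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3))


def project(u, v, d, src, dst):
    """Project pixel (u, v, d) of frame src into frame dst (assumed conventions).
    Returns (u, v, depth) in dst, or None if the point is behind the camera."""
    x_world = transform(src["T"], backproject(u, v, d, src["K"]))
    x, y, z = transform(invert(dst["T"]), x_world)
    if z <= 0:
        return None
    fx, fy, cx, cy = dst["K"]
    return round(fx * x / z + cx), round(fy * y / z + cy), z


def find_covisible_points(i, u, v, frames, covis):
    """Collect (frame, u, v) points that are unfused and inside the depth map."""
    valid, visited = [], {i}
    queue = deque([(i, u, v, frames[i]["depth"][v][u])])
    while queue:
        src, su, sv, sd = queue.popleft()
        for j in covis[src]:
            if j in visited:
                continue
            visited.add(j)
            hit = project(su, sv, sd, frames[src], frames[j])
            if hit is None:
                continue
            ju, jv, jd = hit
            rows, cols = len(frames[j]["depth"]), len(frames[j]["depth"][0])
            in_range = 0 <= ju < cols and 0 <= jv < rows
            if not in_range or frames[j]["fused"][jv][ju]:
                continue  # stopping condition: out of range or already fused
            valid.append((j, ju, jv))
            queue.append((j, ju, jv, jd))
    return valid


# Hypothetical two-frame setup: identity rotation, frame 1 shifted by 1 along x.
I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
K = (1.0, 1.0, 0.0, 0.0)


def make_frame(t):
    return {"K": K, "T": (I3, t),
            "depth": [[1.0] * 3 for _ in range(3)],
            "fused": [[0] * 3 for _ in range(3)]}


frames = {0: make_frame((0.0, 0.0, 0.0)), 1: make_frame((1.0, 0.0, 0.0))}
covis = {0: [1], 1: [0]}
found = find_covisible_points(0, 2, 1, frames, covis)
```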

[0200] The electronic device can perform a consistency check on all common-view points $p_j$ corresponding to $p_i$. This consistency check includes, but is not limited to, geometric consistency, projection consistency, and normal consistency.

[0201] In some embodiments, the electronic device can perform the consistency check on all common-view points $p_j$ corresponding to $p_i$, for example, by following these steps:

[0202] (1) Geometric consistency:

[0203] i) Taking the pose of the depth map of the i-th frame as the reference frame, the depth map $D_i$ is projected into the 3D space of the reference frame to generate the 3D point cloud $x_i$ in the coordinate system of the reference frame:

[0204]

[0205] ii) $x_i$ is projected onto the remaining M-1 depth maps, which gives the projected depth of $x_i$ in the reference frame of the j-th image.

[0206]

[0207] $T_i$ is the pose when the depth map of the i-th frame was captured;

[0208] $T_j$ is the pose when the depth map of the j-th frame was captured.

[0209] iii) Calculate the relative error between the two depths:

[0210]

[0211] Where $d_j$ is the original depth at the corresponding location in the j-th image. If $e_{geo} < 0.01$, the geometric consistency is considered high. The threshold corresponding to $e_{geo}$ does not refer to a specific fixed value.

[0212] (2) Projection consistency:

[0213] Ideally, after $p_i$ is projected onto the j-th frame, its position should coincide with the coordinates $p_j$ corresponding to depth $d_j$ (i.e., the same location in 3D space). Therefore, the projection error can be used to represent the consistency of projections across multiple frames:

[0214]

[0215] Where $e_{proj}$ represents the projection error, in pixels. $f_\epsilon$ can take, for example, any value from 1 to 3. If $e_{proj} < f_\epsilon$, the projection consistency is considered high.

[0216] (3) Normal consistency:

[0217] i) The normal map $N_i$ is calculated from the depth map $D_i$, where the normal direction corresponding to $p_i$ is $n_i$.

[0218] ii) $n_i$ is projected onto all common-view frames to obtain the normal values in each frame's view. Ideally, the normals projected onto the same frame from multiple frames should be identical (i.e., representing the same location in 3D space, and with consistent surface orientation). Therefore, normal consistency can be used to represent data quality.

[0219]

[0220] Where $\theta_{i,j}$ represents the angle between the normals. If $\theta_{i,j} < 30°$, the normal consistency is considered high, meaning that it meets the normal consistency requirement.
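For illustration only, the three consistency checks can be sketched in Python as follows. The error formulas are not reproduced in this text, so the forms used here are assumptions: the geometric error is taken as the absolute depth difference divided by the original depth, the projection error as the Euclidean pixel distance, and the normal angle as the ordinary angle between the two normals. The function names and the default projection threshold of 2 pixels (within the 1 to 3 range mentioned above) are hypothetical.

```python
import math


def geometric_error(d_projected, d_j):
    """Assumed relative error between the projected depth and the original depth."""
    return abs(d_projected - d_j) / d_j


def projection_error(p_projected, p_j):
    """Assumed pixel distance between the projected position and p_j."""
    return math.dist(p_projected, p_j)


def normal_angle_deg(n_a, n_b):
    """Angle in degrees between two normals."""
    dot = sum(a * b for a, b in zip(n_a, n_b))
    norm = math.hypot(*n_a) * math.hypot(*n_b)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_consistent(d_projected, d_j, p_projected, p_j, n_a, n_b,
                  geo_max=0.01, proj_max=2.0, normal_max_deg=30.0):
    """True only when all three checks fall below their thresholds."""
    return (geometric_error(d_projected, d_j) < geo_max
            and projection_error(p_projected, p_j) < proj_max
            and normal_angle_deg(n_a, n_b) < normal_max_deg)


# Hypothetical candidate: close in depth, half a pixel off, parallel normals.
ok = is_consistent(2.01, 2.0, (10.0, 10.0), (10.5, 10.0),
                   (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```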

[0221] According to some embodiments, if a common-view point $p_j$ satisfies all three consistency conditions, the depth $d_j$ at point $p_j$ is considered to meet the quality requirements. Assuming there are H points that meet the quality requirements, the depth points that meet the quality requirements are projected into 3D space, and their average value is taken as the final 3D coordinates of the point.

[0222]

[0223]

[0224]

[0225] At this point, point cloud fusion of point $p_i$ is complete, and point cloud fusion of all corresponding points $p_j$ that meet the quality requirements is also complete.

[0226] According to some embodiments, for all fused points $p_i$ and $p_j$, the electronic device can set the corresponding positions of the "fused region" mask F to 1.

[0227] Optionally, the electronic device can repeat the above process until every depth value of each depth map is involved in the fusion. At this point, the point cloud fusion process is complete.
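For illustration only, the averaging of the H quality-checked points and the update of the "fused region" mask can be sketched in Python as follows. The function names `fuse_points` and `mark_fused`, and the layout of the masks as per-frame 2D lists indexed by row and then column, are hypothetical.

```python
def fuse_points(points_3d):
    """Average H quality-checked 3D points into one fused 3D coordinate."""
    h = len(points_3d)
    if h == 0:
        raise ValueError("no points passed the consistency check")
    return tuple(sum(p[k] for p in points_3d) / h for k in range(3))


def mark_fused(masks, pixels):
    """Set the 'fused region' mask to 1 at each (frame, u, v) position."""
    for frame, u, v in pixels:
        masks[frame][v][u] = 1


# Hypothetical example: two consistent 3D points and a 2x2 mask for frame 0.
fused_xyz = fuse_points([(1.0, 2.0, 3.0), (3.0, 2.0, 1.0)])
masks = {0: [[0, 0], [0, 0]]}
mark_fused(masks, [(0, 1, 0)])
```

Marking every contributing pixel as fused is what prevents the same surface point from being emitted again when later frames are processed.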

[0228] In some or related embodiments, the following steps are taken: First, a set of image frames from multiple perspectives is acquired. Then, a set of common-view relation tables corresponding to the image frame set is acquired, where the common-view relation table set includes a subset of common-view relation tables for any image frame in the image frame set and its corresponding common-view frames. Next, the original image corresponding to any image frame in the image frame set is acquired. Then, semantic segmentation processing is performed on the original image corresponding to any image frame in the image frame set to obtain a semantic segmentation result for that image frame. Finally, based on the common-view relation table set and at least one point cloud included in the semantic segmentation result, a point cloud fusion image corresponding to the image frame set is acquired. Therefore, performing semantic segmentation processing on the original image corresponding to any image frame can improve the accuracy of the semantic segmentation result acquisition. Acquiring the point cloud fusion image through the common-view relation table set and the semantic segmentation result can reduce point cloud redundancy caused by directly fusing the image frame set, reduce the computation for redundant point clouds in point cloud fusion, reduce the time consumed in the point cloud fusion process, and simultaneously reduce the occurrence of redundant point clouds that lead to low accuracy of the point cloud fusion image, thereby improving the accuracy of point cloud fusion.

[0229] Figure 7 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment. Referring to Figure 7, the apparatus 700 includes a set acquisition unit 701, a result acquisition unit 702, and a point cloud fusion unit 703.

[0230] The set acquisition unit 701 is configured to acquire a set of image frames from multiple perspectives;

[0231] The set acquisition unit 701 is further configured to acquire the common view relation table set corresponding to the set of image frames, wherein the common view relation table set includes a common view relation table subset of any image frame in the set of image frames and the common view frames corresponding to any image frame;

[0232] The result acquisition unit 702 is configured to acquire the semantic segmentation result corresponding to the set of image frames, wherein the semantic segmentation result includes at least one point cloud;

[0233] The point cloud fusion unit 703 is configured to perform the operation of obtaining a point cloud fusion image corresponding to an image frame set based on a common view relation table set and at least one point cloud.

[0234] According to some embodiments, Figure 8 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment. Referring to Figure 8, the set acquisition unit 701 includes a position acquisition subunit 711, a set acquisition subunit 721, a score acquisition subunit 731, and a subset acquisition subunit 741. When the set acquisition unit 701 is configured to acquire the common view relation table set corresponding to the set of image frames:

[0235] The position acquisition subunit 711 is configured to perform the following: determine the first center position corresponding to the first image frame and the second center position corresponding to the second image frame based on the pose information corresponding to each image frame in the image frame set. The first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame.

[0236] The set acquisition subunit 721 is configured to perform the operation of acquiring the co-view sparse point cloud set corresponding to the first image frame and the second image frame based on the first center position and the second center position;

[0237] The score acquisition subunit 731 is configured to acquire the score corresponding to the first image frame and the second image frame based on the number of sparse point clouds in the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set, wherein the score indicates the degree of common view between the first image frame and the second image frame;

[0238] The score acquisition subunit 731 is also configured to perform a traversal of all image frames in the image frame set except for the first image frame to acquire at least one score;

[0239] The subset acquisition subunit 741 is configured to acquire the co-view relation table subset corresponding to the first image frame based on the at least one score;

[0240] The set acquisition subunit 721 is configured to traverse the set of image frames and obtain the set of co-view relation tables corresponding to the set of image frames.
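The subunits above can be illustrated with a short sketch. It makes two assumptions that the disclosure does not state: that each pose is given as a world-to-camera rotation `R` and translation `t` (so the camera center is C = -Rᵀt), and that scoring and subset selection are supplied as the hypothetical callables `score_fn` and `select_fn`.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_center(R: Mat3, t: Vec3) -> List[float]:
    """Camera center C = -R^T t, assuming x_cam = R @ x_world + t."""
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]


def build_coview_tables(
    frame_ids: Sequence[Hashable],
    score_fn: Callable[[Hashable, Hashable], float],
    select_fn: Callable[[Dict[Hashable, float]], List[Hashable]],
) -> Dict[Hashable, List[Hashable]]:
    """Build one co-view table subset per frame.

    For each first frame, every other frame is scored as a second frame,
    and select_fn keeps the frames that enter the first frame's subset.
    """
    tables: Dict[Hashable, List[Hashable]] = {}
    for first in frame_ids:
        scores = {
            second: score_fn(first, second)
            for second in frame_ids
            if second != first
        }
        tables[first] = select_fn(scores)
    return tables
```

If the poses are stored camera-to-world instead, the translation already is the camera center and `camera_center` is not needed.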

[0241] According to some embodiments, Figure 9 is a block diagram illustrating a point cloud fusion apparatus according to an exemplary embodiment. Referring to Figure 9, the set acquisition unit 701 further includes an angle acquisition subunit 751, which is configured to acquire the first direction vector between any sparse point cloud in the co-view sparse point cloud set and the first center position;

[0242] Acquire the second direction vector between that sparse point cloud and the second center position;

[0243] Based on the first direction vector and the second direction vector, acquire the observation angle corresponding to that sparse point cloud.
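As an illustration of the angle computation, the sketch below takes the observation angle to be the angle between the two direction vectors, obtained from their normalized dot product. The function names and the choice of degrees as the unit are assumptions made for this example.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def direction(point: Vec3, center: Vec3) -> List[float]:
    """Direction vector from a camera center to a sparse 3D point."""
    return [p - c for p, c in zip(point, center)]


def observation_angle(point: Vec3, center1: Vec3, center2: Vec3) -> float:
    """Angle in degrees between the two viewing directions of one point."""
    v1 = direction(point, center1)
    v2 = direction(point, center2)
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("point coincides with a camera center")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))
```

For example, a point at (0, 0, 1) seen from centers (1, 0, 0) and (-1, 0, 0) has perpendicular viewing directions, so the function returns an angle of 90 degrees up to floating-point rounding.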

[0244] According to some embodiments, when the score acquisition subunit 731 is configured to acquire the scores corresponding to the first image frame and the second image frame based on the number of sparse point clouds corresponding to the common-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the common-view sparse point cloud set, it is specifically configured to perform the following:

[0245] Get the number of sparse points corresponding to the shared sparse point cloud set;

[0246] The difference in the number of sparse point clouds is obtained by comparing the observation angle of any sparse point cloud in the common-view sparse point cloud set with the preset observation angle.

[0247] Based on the difference in the number of sparse point clouds and the corresponding score coefficient, the scores corresponding to the first and second image frames are obtained.
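The text does not give the scoring formula, so the following is only one possible reading, offered as a sketch: each co-visible sparse point contributes a weight that shrinks as its observation angle departs from the preset observation angle, and the sum is scaled by a score coefficient. The Gaussian weighting and the parameters `preset_angle_deg`, `sigma_deg`, and `coefficient` are assumptions, not values from the disclosure.

```python
import math
from typing import Iterable


def coview_score(
    angles_deg: Iterable[float],
    preset_angle_deg: float = 10.0,  # assumed preset observation angle
    sigma_deg: float = 5.0,          # assumed tolerance around the preset angle
    coefficient: float = 1.0,        # assumed score coefficient
) -> float:
    """Score a frame pair from the observation angles of its shared points.

    More shared points raise the score; points whose angle is far from
    the preset angle contribute less.
    """
    total = 0.0
    for angle in angles_deg:
        diff = angle - preset_angle_deg
        total += math.exp(-(diff * diff) / (2 * sigma_deg ** 2))
    return coefficient * total
```

Under this reading, a pair with no shared sparse points scores 0, and the score grows with both the number of shared points and how close their observation angles are to the preset angle.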

[0248] According to some embodiments, when the subset acquisition subunit is configured to perform the action of acquiring a subset of the co-view relation table corresponding to the first image frame based on at least one score, it is specifically configured to perform the following:

[0249] In descending order of scores, add the second image frames corresponding to a predetermined number of scores among the at least one score to the co-view relation table subset corresponding to the first image frame; or

[0250] Identify at least one target score that is greater than a score threshold from at least one score.

[0251] Add the second image frame corresponding to at least one target score to the co-view relation table subset corresponding to the first image frame.
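The two selection alternatives above can be sketched as two small functions. The names `select_top_k` and `select_above_threshold`, and the representation of scores as a dictionary from second-frame id to score, are assumptions made for illustration.

```python
from typing import Dict, List


def select_top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Keep the k second frames with the highest scores, best first."""
    ranked = sorted(scores, key=lambda frame: scores[frame], reverse=True)
    return ranked[:k]


def select_above_threshold(scores: Dict[int, float], threshold: float) -> List[int]:
    """Keep every second frame whose score is strictly above the threshold."""
    return [frame for frame, score in scores.items() if score > threshold]


# Example: scores of three candidate second frames for one first frame.
example_scores = {1: 0.2, 2: 0.9, 3: 0.5}
top_two = select_top_k(example_scores, 2)
above = select_above_threshold(example_scores, 0.4)
```

The top-k variant bounds the size of each subset, while the threshold variant lets the subset size vary with how many frames genuinely overlap.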

[0252] According to some embodiments, when the result acquisition unit 702 is configured to acquire the semantic segmentation results corresponding to the image frame set, it is specifically configured to perform:

[0253] Get the original image corresponding to any image frame in the image frame set;

[0254] Semantic segmentation is performed on the original image corresponding to any image frame in the image frame set to obtain the semantic segmentation result corresponding to any image frame.

[0255] According to some embodiments, when the point cloud fusion unit 703 is configured to acquire a point cloud fusion image corresponding to an image frame set based on at least one point cloud included in the co-view relation table set and semantic segmentation results, it is specifically configured to perform the following:

[0256] Obtain the first co-view relationship table set corresponding to the first image frame in the image frame set, wherein the first image frame is any image frame in the image frame set, and the first co-view relationship table set includes at least one second image frame that co-views with the first image frame;

[0257] Obtain any depth point corresponding to the first image frame;

[0258] Project the two-dimensional coordinates of any depth point onto any second image frame to obtain the projected coordinates corresponding to the two-dimensional coordinates;

[0259] If the projected coordinates meet the search stopping condition, any depth point is determined as the first depth point, wherein the first depth point is either a fused point or a point outside the field of view of the first image frame.

[0260] If the projection coordinates do not meet the search stopping condition, the projection coordinates are projected to any third image frame until all the second depth points corresponding to any depth point in the image frame set are obtained. Here, any third image frame is any image frame in the second co-view relation table set corresponding to the second image frame, and the second depth point is a point that is not fused and is within the field of view of the first image frame.

[0261] Obtain at least one second depth point that passes the consistency check among all second depth points;

[0262] Project any depth point and at least one second depth point that has passed the consistency check onto a three-dimensional space, obtain the three-dimensional coordinates corresponding to any depth point, and determine that the point cloud fusion of any depth point is complete;

[0263] Traverse the image frame set to obtain the point cloud fused image corresponding to the image frame set.
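The search over co-view frames described above can be sketched as a walk over the co-view tables. This is a simplified illustration, not the disclosed implementation: `project` (returning the projected pixel, or `None` when it falls outside the target frame's field of view) and `is_fused` (looking up the fused-region mask) are hypothetical callbacks, and each frame is tried at most once.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]
Observation = Tuple[int, Pixel]  # (frame id, pixel in that frame)


def collect_candidates(
    first_frame: int,
    pixel: Pixel,
    coview: Dict[int, List[int]],
    project: Callable[[int, Pixel, int], Optional[Pixel]],
    is_fused: Callable[[int, Pixel], bool],
) -> List[Observation]:
    """Collect unfused, in-view projections of one depth point.

    Starting from the first frame, the point is projected into each
    co-view frame; where the search stop condition does not hold, the
    search continues through that frame's own co-view table.
    """
    visited: Set[int] = {first_frame}
    found: List[Observation] = []
    stack: List[Observation] = [(first_frame, pixel)]
    while stack:
        src, src_px = stack.pop()
        for dst in coview.get(src, []):
            if dst in visited:
                continue
            visited.add(dst)  # simplification: each frame is tried once
            dst_px = project(src, src_px, dst)
            # Search stop condition: outside the view or already fused.
            if dst_px is None or is_fused(dst, dst_px):
                continue
            found.append((dst, dst_px))
            stack.append((dst, dst_px))
    return found
```

The returned candidates would then go through the consistency check, and the depth point together with the surviving candidates would be projected into three-dimensional space; both steps are omitted here.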

[0264] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0265] In summary, in the apparatus provided in this embodiment of the present disclosure, the set acquisition unit 701 is configured to acquire a set of image frames from multiple perspectives; the set acquisition unit 701 is also configured to acquire a set of common-view relation tables corresponding to the set of image frames, wherein the set of common-view relation tables includes a common-view relation table subset of any image frame in the set of image frames and the common-view frames corresponding to that image frame; the result acquisition unit 702 is configured to acquire a semantic segmentation result corresponding to the set of image frames, wherein the semantic segmentation result includes at least one point cloud; and the point cloud fusion unit 703 is configured to acquire a point cloud fusion image corresponding to the set of image frames based on the set of common-view relation tables and the at least one point cloud. Acquiring the point cloud fusion image through the set of common-view relation tables and the semantic segmentation result can therefore reduce the point cloud redundancy caused by directly fusing the set of image frames, reduce the computation spent on redundant point clouds during point cloud fusion, and reduce the time consumed by the point cloud fusion process. It also reduces the cases in which redundant point clouds lower the accuracy of the point cloud fusion image, thereby improving the accuracy of point cloud fusion.

[0266] Figure 10 shows a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present disclosure described and/or claimed herein.

[0267] As shown in Figure 10, the electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 may also store various programs and data required for the operation of the electronic device 1000. The computing unit 1001, ROM 1002, and RAM 1003 are interconnected via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

[0268] Multiple components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard or mouse; an output unit 1007, such as various types of displays and speakers; a storage unit 1008, such as a magnetic disk or optical disk; and a communication unit 1009, such as a network card, modem, or wireless transceiver. The communication unit 1009 allows the electronic device 1000 to exchange information and data with other devices through computer networks such as the Internet and/or various telecommunications networks.

[0269] The computing unit 1001 can be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 performs the various methods and processes described above, such as the point cloud fusion method. For example, in some embodiments, the point cloud fusion method can be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program can be loaded and/or installed on the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the point cloud fusion method described above can be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the point cloud fusion method by any other suitable means (e.g., by means of firmware).

[0270] Various embodiments of the systems and techniques described above can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0271] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0272] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0273] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0274] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.

[0275] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via communication networks. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. A server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product in the cloud computing service ecosystem that addresses the shortcomings of traditional physical hosts and virtual private server (VPS) services, such as high management difficulty and weak business scalability. A server can also be a server of a distributed system or a server incorporating blockchain technology.

[0276] It should be understood that steps can be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed herein.

[0277] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A point cloud fusion method, characterized by comprising:
acquiring a set of image frames from multiple perspectives;
acquiring a co-view relationship table set corresponding to the image frame set, wherein the co-view relationship table set includes a co-view relationship table subset of any image frame in the image frame set and the co-view frames corresponding to that image frame;
acquiring a semantic segmentation result corresponding to the image frame set, wherein the semantic segmentation result includes at least one point cloud; and
acquiring, based on the co-view relationship table set and the at least one point cloud, a point cloud fusion image corresponding to the image frame set;
wherein acquiring the co-view relationship table set corresponding to the image frame set comprises:
determining, based on pose information corresponding to each image frame in the image frame set, a first center position corresponding to a first image frame and a second center position corresponding to a second image frame, wherein the first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame;
acquiring, based on the first center position and the second center position, a co-view sparse point cloud set corresponding to the first image frame and the second image frame;
acquiring, based on the number of sparse point clouds in the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set, a score corresponding to the first image frame and the second image frame, wherein the score indicates the degree of co-view between the first image frame and the second image frame;
traversing all image frames in the image frame set other than the first image frame to acquire at least one score;
acquiring, based on the at least one score, a co-view relationship table subset corresponding to the first image frame; and
traversing the image frame set to acquire the co-view relationship table set corresponding to the image frame set.

2. The method according to claim 1, characterized in that the method further comprises:
acquiring a first direction vector between any sparse point cloud in the co-view sparse point cloud set and the first center position;
acquiring a second direction vector between that sparse point cloud and the second center position; and
acquiring, based on the first direction vector and the second direction vector, the observation angle corresponding to that sparse point cloud.

3. The method according to claim 1, characterized in that acquiring the score corresponding to the first image frame and the second image frame based on the number of sparse point clouds in the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set comprises:
acquiring the number of sparse point clouds in the co-view sparse point cloud set;
acquiring a difference in the number of sparse point clouds based on the difference between the observation angle corresponding to any sparse point cloud in the co-view sparse point cloud set and a preset observation angle; and
acquiring the score corresponding to the first image frame and the second image frame based on the difference in the number of sparse point clouds and the corresponding score coefficient.

4. The method according to claim 1, characterized in that acquiring the co-view relationship table subset corresponding to the first image frame based on the at least one score comprises:
adding, in descending order of scores, the second image frames corresponding to a predetermined number of scores among the at least one score to the co-view relationship table subset corresponding to the first image frame; or
determining at least one target score among the at least one score that is greater than a score threshold, and adding the second image frame corresponding to the at least one target score to the co-view relationship table subset corresponding to the first image frame.

5. The method according to claim 1, characterized in that acquiring the semantic segmentation result corresponding to the image frame set comprises:
acquiring the original image corresponding to any image frame in the image frame set; and
performing semantic segmentation on the original image corresponding to that image frame to obtain the semantic segmentation result corresponding to that image frame.

6. The method according to claim 1, characterized in that acquiring the point cloud fusion image corresponding to the image frame set comprises:
acquiring a first co-view relationship table set corresponding to the first image frame in the image frame set, wherein the first image frame is any image frame in the image frame set, and the first co-view relationship table set includes at least one second image frame that co-views with the first image frame;
acquiring any depth point corresponding to the first image frame;
projecting the two-dimensional coordinates of that depth point onto any second image frame to obtain the projected coordinates corresponding to the two-dimensional coordinates;
if the projected coordinates satisfy the search stopping condition, determining that depth point as a first depth point, wherein the first depth point is a fused point, or the first depth point is a point outside the field of view of the first image frame;
if the projected coordinates do not satisfy the search stopping condition, projecting the projected coordinates onto any third image frame until all second depth points corresponding to that depth point in the image frame set are obtained, wherein the third image frame is any image frame in the second co-view relationship table set corresponding to the second image frame, and the second depth point is a point that is not fused and is within the field of view of the first image frame;
acquiring, among all the second depth points, at least one second depth point that passes the consistency check;
projecting that depth point and the at least one second depth point that has passed the consistency check into three-dimensional space, obtaining the three-dimensional coordinates corresponding to that depth point, and determining that the point cloud fusion of that depth point is complete; and
traversing the image frame set to obtain the point cloud fusion image corresponding to the image frame set.

7. A point cloud fusion device, characterized by comprising:
a set acquisition unit configured to acquire a set of image frames from multiple perspectives;
the set acquisition unit being further configured to acquire a co-view relationship table set corresponding to the image frame set, wherein the co-view relationship table set includes a co-view relationship table subset of any image frame in the image frame set and the co-view frames corresponding to that image frame;
a result acquisition unit configured to acquire a semantic segmentation result corresponding to the image frame set, wherein the semantic segmentation result includes at least one point cloud; and
a point cloud fusion unit configured to acquire a point cloud fusion image corresponding to the image frame set based on the co-view relationship table set and the at least one point cloud;
wherein the set acquisition unit includes a position acquisition subunit, a set acquisition subunit, a score acquisition subunit, and a subset acquisition subunit, and, when the set acquisition unit acquires the co-view relationship table set corresponding to the image frame set:
the position acquisition subunit is configured to determine, based on pose information corresponding to each image frame in the image frame set, a first center position corresponding to a first image frame and a second center position corresponding to a second image frame, wherein the first image frame is any image frame in the image frame set, the second image frame is any image frame in the image frame set other than the first image frame, the first center position is the center position of the camera device corresponding to the first image frame, and the second center position is the center position of the camera device corresponding to the second image frame;
the set acquisition subunit is configured to acquire, based on the first center position and the second center position, a co-view sparse point cloud set corresponding to the first image frame and the second image frame;
the score acquisition subunit is configured to acquire, based on the number of sparse point clouds in the co-view sparse point cloud set and the observation angle corresponding to at least one sparse point cloud in the co-view sparse point cloud set, a score corresponding to the first image frame and the second image frame, wherein the score indicates the degree of co-view between the first image frame and the second image frame;
the score acquisition subunit is further configured to traverse all image frames in the image frame set other than the first image frame to acquire at least one score;
the subset acquisition subunit is configured to acquire, based on the at least one score, a co-view relationship table subset corresponding to the first image frame; and
the set acquisition subunit is configured to traverse the image frame set to acquire the co-view relationship table set corresponding to the image frame set.

8. An electronic device, characterized by comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the point cloud fusion method according to any one of claims 1 to 6.

9. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the point cloud fusion method as described in any one of claims 1 to 6.