Multi-phase scene reconstruction method based on undirected graph, scene reconstruction device, medium and product

By constructing a data association undirected graph and selecting a high-quality multi-view frame as the initial reference, the method addresses the 3D structure distortion caused by poorly chosen initial image pairs in the prior art and improves the accuracy and robustness of 3D scene reconstruction.

CN122199843A · Pending · Publication Date: 2026-06-12 · SHANGHAI GOERTEK TECHNOLOGY DEVELOPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SHANGHAI GOERTEK TECHNOLOGY DEVELOPMENT CO LTD
Filing Date
2026-02-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

When selecting initial image pairs, existing technologies often produce unreliable pairs with extremely low overlap, which makes feature matching ineffective. This in turn causes ambiguous scale estimation and deviations in 3D point positions during triangulation, resulting in distortion of the overall 3D structure of the scene.

Method used

By constructing a data association undirected graph with multi-view frames as graph nodes and the number of matching points between multi-view frames as edge weights, the multi-view frame with the most triangulated points is selected as the initial multi-view frame to generate a local 3D point cloud. A preset number of multi-view frames with the largest association weight with the initial multi-view frame are selected in the data association undirected graph to construct an initial map. Finally, a 3D scene reconstruction map is constructed based on the initial map and the data association undirected graph.

Benefits of technology

It improves the reliability and consistency of the initial reconstruction, reduces triangulation bias caused by incorrect matches, avoids distortion of the 3D scene structure, and improves the accuracy, consistency, and robustness of the overall reconstruction results.

Generated by Eureka AI based on patent content.

  • Figure CN122199843A_ABST

Abstract

The application discloses a multi-camera scene reconstruction method based on an undirected graph, a scene reconstruction device, a medium, and a product, and relates to the technical field of three-dimensional scene reconstruction. The method comprises the following steps: according to an image set collected by a multi-view camera, the image set consisting of multi-view frames, constructing a data association undirected graph with the multi-view frames as graph nodes and the number of matches between multi-view frames as edge weights; determining the multi-view frame with the largest number of triangulated points in the image set as an initial multi-view frame, and generating a local three-dimensional point cloud based on the initial multi-view frame; selecting, from the data association undirected graph, a multi-view frame set with the largest association weights with the initial multi-view frame, and constructing an initial map according to the multi-view frame set and the local three-dimensional point cloud; and constructing a three-dimensional scene reconstruction map corresponding to the image set according to the initial map and the data association undirected graph. Acquiring a reliable initial multi-view frame ensures the numerical stability of the whole reconstruction process.

Description

Technical Field

[0001] This application relates to the field of 3D scene reconstruction technology, and in particular to a multi-camera scene reconstruction method based on undirected graphs, a scene reconstruction device, a medium, and a product.

Background Technology

[0002] When the 3D structure of a scene is recovered from a series of 2D images, the camera poses are usually unknown at the outset, and the camera poses and the 3D point cloud of the scene are recovered iteratively through steps such as feature matching, triangulation, and bundle adjustment (BA).

[0003] However, current processing methods typically select the initial image pair randomly or in acquisition order. Such selection readily yields unreliable initial pairs with extremely low overlap, which makes feature matching ineffective and increases the proportion of false matches; this in turn causes ambiguous scale estimation and deviated 3D point positions during triangulation. These errors propagate through the subsequent camera pose estimation and bundle adjustment steps and ultimately distort the overall 3D structure of the scene.

[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0005] The main purpose of this application is to provide a multi-camera scene reconstruction method based on undirected graphs, a scene reconstruction device, a medium, and a product, aiming to solve the technical problem that the constructed 3D scene structure is distorted because an unreliable initial synchronized frame is selected.

[0006] To achieve the above objectives, this application proposes a multi-camera scene reconstruction method based on undirected graphs, the method comprising: based on an image set acquired by a multi-view camera and consisting of multi-view frames, constructing a data association undirected graph with the multi-view frames as graph nodes and the number of matches between multi-view frames as edge weights; determining the multi-view frame with the most triangulated points in the image set as the initial multi-view frame, and generating a local three-dimensional point cloud based on the initial multi-view frame; selecting, in the data association undirected graph, a preset number of multi-view frames with the highest association weights with the initial multi-view frame as a multi-view frame set, and constructing an initial map based on the multi-view frame set and the local 3D point cloud; and, based on the initial map and the data association undirected graph, constructing a 3D scene reconstruction map corresponding to the image set.

[0007] In one embodiment, the step of constructing a 3D scene reconstruction map corresponding to the image set based on the initial map and the data association undirected graph includes: determining the unregistered multi-view frames and the registered multi-view frames in the data association undirected graph, where the registered multi-view frames include at least the initial multi-view frame and the multi-view frame set; determining, from the unregistered multi-view frames, a multi-view frame to be registered, and registering it to the initial map; and, once all multi-view frames of the data association undirected graph have been registered to the initial map, generating the 3D scene reconstruction map from the 3D point cloud model corresponding to the initial map.

[0008] In one embodiment, the step of determining the multi-view frame to be registered from the unregistered multi-view frames and registering it to the initial map includes: constructing an association weight set from the unregistered multi-view frames that have a matching relationship with registered multi-view frames in the data association undirected graph, the association weight set containing the pairwise association weights between each registered multi-view frame and each such unregistered multi-view frame; deleting the graph nodes corresponding to the registered multi-view frames from the data association undirected graph and generating a target graph node in their place, the target graph node being connected to the nodes to be registered according to the association weight set; determining, among the nodes connected to the target graph node, the multi-view frame corresponding to the node with the highest weight value as the multi-view frame to be registered; and registering that multi-view frame to the initial map.
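The next-frame selection in [0008] can be sketched as a greedy loop, assuming integer frame ids and an adjacency-dict graph of the form `{frame: {neighbor: weight}}`. The application does not say how the weights from several registered frames to the same candidate are combined on the target graph node; this sketch assumes they are summed, and it recomputes the target node's edges on every iteration for clarity rather than efficiency. The function name `registration_order` and the example graph are hypothetical.

```python
def registration_order(graph, initial_registered):
    """Greedy order in which unregistered frames would be registered.

    graph: undirected adjacency dict {frame: {neighbor: weight}}.
    initial_registered: frames already in the initial map.
    """
    registered = set(initial_registered)
    order = []
    while True:
        # Contract all registered nodes into one "target node": its edge to each
        # unregistered neighbor carries the summed association weight (assumption).
        target_edges = {}
        for frame in registered:
            for neighbor, weight in graph.get(frame, {}).items():
                if neighbor not in registered:
                    target_edges[neighbor] = target_edges.get(neighbor, 0) + weight
        if not target_edges:
            break  # no unregistered frame is connected to the map any more
        # The highest-weight candidate is registered next; ties go to the smaller id.
        nxt = max(target_edges, key=lambda f: (target_edges[f], -f))
        order.append(nxt)
        registered.add(nxt)
    return order

association_graph = {
    0: {1: 450, 2: 300, 3: 50},
    1: {0: 450, 2: 200, 4: 80},
    2: {0: 300, 1: 200, 3: 90},
    3: {0: 50, 2: 90, 4: 120},
    4: {1: 80, 3: 120},
    5: {},
}
order = registration_order(association_graph, {0, 1})
print(order)  # [2, 3, 4]
```

Frame 5 has no edges, so it never becomes a candidate; a full pipeline would need a separate policy for frames that are disconnected from the map.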

[0009] In one embodiment, after the step of determining the multi-view frame to be registered and registering it to the initial map, the multi-camera scene reconstruction method based on undirected graphs further includes: if it is determined that other multi-view frames are associated with the multi-view frame to be registered, marking the multi-view frame to be registered and those other multi-view frames as registered multi-view frames.

[0010] In one embodiment, the step of registering the multi-view frame to be registered to the initial map includes: calculating the relative pose between the multi-view frame to be registered and the initial map by 2D-3D point matching, and updating the relative pose; triangulating the overlapping feature points of the multi-view frame to be registered based on the camera extrinsic parameters of the multi-view camera to obtain 3D point cloud data; and converting the 3D point cloud data into the coordinate system of the initial map based on the relative pose, thereby completing the registration of the multi-view frame.

[0011] In one embodiment, after the step of determining the multi-view frame to be registered and registering it to the initial map, the multi-camera scene reconstruction method based on undirected graphs further includes: if the number of currently registered multi-view frames is greater than a preset number, performing global bundle adjustment on the registered multi-view frames of the initial map while keeping the extrinsic parameter relationships within each multi-view frame unchanged.
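Keeping the intra-frame extrinsics fixed during global bundle adjustment can be illustrated by how the camera poses are parameterized: only one pose per multi-view frame is optimized, and each camera's pose is derived from it through the calibrated extrinsic. The sketch below assumes rigid transforms stored as `(R, t)` pairs that map points from one coordinate system into another, assumes each extrinsic maps reference-camera coordinates into camera j coordinates, and uses made-up numbers; all function names are hypothetical, and it shows the parameterization only, not a BA solver.

```python
def compose(a, b):
    """Return the rigid transform 'a after b' (apply b first, then a); each is (R, t)."""
    ra, ta = a
    rb, tb = b
    r = [[sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return r, t

def camera_poses(rig_pose, extrinsics):
    """World-to-camera poses for all cameras of one multi-view frame.

    rig_pose: world -> reference camera, the only per-frame BA variable.
    extrinsics: fixed reference-camera -> camera j transforms from calibration.
    """
    return [compose(ext, rig_pose) for ext in extrinsics]

def ba_pose_parameter_count(num_frames, num_cameras, rig_constrained=True):
    """6 DoF per optimized pose: one pose per frame if rig-constrained, else one per image."""
    return 6 * num_frames * (1 if rig_constrained else num_cameras)

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rig_pose = (I3, [0.0, 0.0, 1.0])           # hypothetical pose of one multi-view frame
extrinsics = [(I3, [0.0, 0.0, 0.0]),       # camera 1 is the reference camera
              (I3, [-0.12, 0.0, 0.0])]     # hypothetical 12 cm baseline to camera 2
poses = camera_poses(rig_pose, extrinsics)
print(poses[1][1])  # [-0.12, 0.0, 1.0]
print(ba_pose_parameter_count(20, 2), ba_pose_parameter_count(20, 2, rig_constrained=False))  # 120 240
```

With a two-camera rig the rig-constrained parameterization halves the number of pose parameters, and the relative geometry between the cameras cannot drift during optimization.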

[0012] In one embodiment, the step of constructing an initial map based on the multi-view frame set and the local 3D point cloud includes: calculating the set of relative poses between each multi-view frame to be processed in the multi-view frame set and the initial multi-view frame; triangulating each multi-view frame to be processed based on the camera extrinsic parameters of the multi-view camera to obtain 3D points, and generating point cloud data from the valid 3D points; and converting the point cloud data into the coordinate system of the local 3D point cloud based on the relative pose of that multi-view frame in the relative pose set, to obtain the initial map.

[0013] In addition, to achieve the above objectives, this application also proposes a scene reconstruction device, which includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the multi-camera scene reconstruction method based on undirected graphs as described above.

[0014] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the multi-camera scene reconstruction method based on undirected graphs as described above.

[0015] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the multi-camera scene reconstruction method based on undirected graphs as described above.

[0016] One or more technical solutions proposed in this application have at least the following technical effects. From an image set consisting of multi-view frames acquired by a multi-view camera, a data association undirected graph is constructed with the multi-view frames as graph nodes and the number of matching points between multi-view frames as edge weights. The multi-view frame with the most triangulated points in the image set is selected as the initial multi-view frame, and a local 3D point cloud is generated from it. A preset number of multi-view frames with the largest association weights with the initial multi-view frame are then selected from the data association undirected graph to construct an initial map, and finally a 3D scene reconstruction map is constructed based on the initial map and the data association undirected graph. Because the graph model built from the multi-view frame data is used to pick the frames most strongly associated with the initial multi-view frame (the frame with the most triangulated points), the effectiveness and consistency of inter-frame matching during initial map construction are ensured. Organizing and managing multi-view geometric data through the graph model places the reconstruction process within a globally associated framework, which provides strong constraints for periodic local or global optimization. As a result, the system directly obtains a scale-consistent and reliable initial reconstruction, reduces triangulation bias caused by incorrect matches, and prevents initial errors from propagating into subsequent reconstruction steps. Ultimately, this improves the accuracy and reliability of 3D scene structure reconstruction, avoids distortion of the 3D scene structure, and improves the overall consistency and robustness of the reconstruction results.

Attached Figure Description

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0018] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of the first embodiment of the multi-camera scene reconstruction method based on undirected graphs in this application;
Figure 2 is a schematic diagram of the data association undirected graph generated by the multi-camera scene reconstruction method based on undirected graphs in this application;
Figure 3 is a flowchart of the second embodiment of the multi-camera scene reconstruction method based on undirected graphs in this application;
Figure 4 is a schematic diagram of the update process of the data association undirected graph in the multi-camera scene reconstruction method based on undirected graphs in this application;
Figure 5 is a simplified flowchart of the multi-camera scene reconstruction method based on undirected graphs provided in the third embodiment of this application;
Figure 6 is a schematic structural diagram of the device in the hardware operating environment involved in the multi-camera scene reconstruction method based on undirected graphs in the embodiments of this application.

[0020] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0021] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0022] The main solution of the embodiments of this application is as follows: based on an image set acquired by a multi-view camera and consisting of multi-view frames, a data association undirected graph is constructed with the multi-view frames as graph nodes and the number of matches between multi-view frames as edge weights; the multi-view frame with the most triangulated points in the image set is determined as the initial multi-view frame, and a local three-dimensional point cloud is generated from it; a preset number of multi-view frames with the highest association weights with the initial multi-view frame are selected in the data association undirected graph, and an initial map is constructed from this multi-view frame set and the local 3D point cloud; finally, a 3D scene reconstruction map corresponding to the image set is constructed based on the initial map and the data association undirected graph.

[0023] In this embodiment, for ease of description, the scene reconstruction device will be used as the execution subject in the following description.

[0024] Incremental 3D reconstruction is a key technology for recovering the 3D structure of a scene from a series of 2D images and is widely used in robot navigation, autonomous driving, virtual reality, cultural relic preservation, industrial inspection, and other fields. Traditional incremental reconstruction methods, such as Structure from Motion (SfM), usually start from unknown camera poses and iteratively recover the camera poses and the 3D point cloud of the scene through steps such as feature matching, triangulation, and bundle adjustment.

[0025] However, current processing methods typically select the initial image pair randomly or in acquisition order. Such selection readily yields unreliable initial pairs with extremely low overlap, which makes feature matching ineffective and increases the proportion of false matches; this in turn causes ambiguous scale estimation and deviated 3D point positions during triangulation. These errors propagate through the subsequent camera pose estimation and bundle adjustment steps and ultimately distort the overall 3D structure of the scene.

[0026] Traditional methods therefore struggle to find reliable initial image pairs, which easily leads to initialization failure. Their reconstruction results also suffer from scale ambiguity, so additional steps are needed to determine the true scale. During incremental reconstruction, errors accumulate as new views are added, causing drift and deformation in the final model. Global bundle adjustment (BA) can alleviate this problem but remains insufficient for large-scale datasets.

[0027] Based on this, this application provides a solution that improves the robustness and accuracy of reconstruction by introducing a graph model to organize and manage multi-view geometric data. The data association undirected graph naturally expresses the observation relationships between views: it explicitly models the relative geometric constraints between camera nodes, supplies rich constraint information to the optimization process, and its global association information can be used to suppress error accumulation and improve numerical stability and scene accuracy. Specifically, a graph model is constructed from the multi-view frame data set acquired by the multi-view camera; based on this graph model, the set of multi-view frames with the largest association weights with the initial multi-view frame (the frame with the most triangulated points) is determined, ensuring the effectiveness and consistency of inter-frame matching during initial map construction. In the initialization phase, the data association undirected graph thus provides a more robust and scale-consistent initial reconstruction. In the incremental expansion phase, the best next view is selected and its pose optimized under graph constraints, and efficient subgraph or global optimization is performed periodically using the graph structure. Finally, a 3D scene reconstruction map is constructed based on the initial map and the data association undirected graph, overcoming the shortcomings of traditional methods and achieving more numerically stable and more accurate 3D reconstruction.

[0028] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or scene reconstruction device capable of performing the above functions. The scene reconstruction device can be a smart device such as a mobile phone or computer, or a multi-view camera equipped with a smart module. The following description uses a scene reconstruction device as an example to illustrate this embodiment and the subsequent embodiments.

[0029] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0030] This application provides a multi-camera scene reconstruction method based on undirected graphs. Referring to Figure 1, Figure 1 is a flowchart of the first embodiment of the multi-camera scene reconstruction method based on undirected graphs in this application.

[0031] In this embodiment, the multi-camera scene reconstruction method based on undirected graphs includes steps S10 to S40.

Step S10: Based on the image set acquired by the multi-view camera and consisting of multi-view frames, construct a data association undirected graph with multi-view frames as graph nodes and the number of matching pairs between multi-view frames as edge weights.

[0032] In this embodiment, a multi-view frame is the group of images of a scene captured simultaneously by multiple cameras from multiple viewpoints. The data association undirected graph is a topological structure that reflects the matching associations between multi-view frames: each graph node uniquely identifies one multi-view frame, and each edge weight is the number of valid feature matching pairs between two frames. Specifically, the first step of scene reconstruction is to acquire the set of synchronized images captured by the multiple cameras, $\mathcal{I} = \{F_1, F_2, \dots, F_N\}$, where $N$ is the number of times the multi-view camera captures data, i.e., the number of multi-view frames. Each frame is $F_i = \{I_i^1, I_i^2, \dots, I_i^M\}$, where $I_i^j$ is the image acquired by the $j$-th camera (eye) at the $i$-th synchronized timestamp and $M$ is the number of cameras. In addition to the image set, the multi-camera system also receives the extrinsic parameter matrices between the cameras obtained through calibration, $\{T_{jk}\}$, where $T_{jk}$ denotes the transformation matrix from camera $j$ to camera $k$ (for example, $T_{12}$ is the transformation from camera 1 to camera 2).

[0033] To construct the data association undirected graph, point features are extracted and matched across all images in the image set using feature extraction and matching methods such as XFeat (an accelerated feature extractor) and SuperGlue (a learned feature-matching network), and the number of matches between multi-view frames is recorded. The data association undirected graph is then constructed from the recorded data. Its graph nodes are the multi-view frames, denoted $V = \{F_1, F_2, \dots, F_N\}$, where $N$ is the number of nodes, and its edge weights represent the number of matches between multi-view frames, denoted $W = \{w_{ij}\}$, where $w_{ij}$ is the number of matches between multi-view frames $F_i$ and $F_j$. A data association undirected graph constructed from the multi-view frames of an image set is shown in Figure 2, where the nodes are named $F_i$ and the edge weights are $w_{ij}$.
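A minimal sketch of the graph construction, assuming the per-pair match counts $w_{ij}$ have already been produced by a matcher (for example XFeat with SuperGlue) and are supplied as a dictionary keyed by frame-index pairs. The adjacency-dict representation, the function name `build_association_graph`, and the `min_matches` cut-off for discarding weak pairs are illustrative assumptions; the application does not state a threshold.

```python
def build_association_graph(frame_ids, match_counts, min_matches=1):
    """Return {frame: {neighbor: weight}} for an undirected weighted graph.

    match_counts maps a frame pair (i, j) to the number of feature matches
    between multi-view frames F_i and F_j, i.e. the edge weight w_ij.
    """
    graph = {f: {} for f in frame_ids}  # every frame is a node, even if isolated
    for (i, j), count in match_counts.items():
        if i == j or count < min_matches:
            continue  # no self-loops; drop pairs with too few matches
        weight = max(count, graph[i].get(j, 0))  # tolerate both (i, j) and (j, i) being given
        graph[i][j] = weight  # undirected: store the edge in both directions
        graph[j][i] = weight
    return graph

graph = build_association_graph(
    [1, 2, 3, 4],
    {(1, 2): 450, (1, 3): 120, (2, 3): 300, (3, 4): 15},  # hypothetical counts
    min_matches=20,  # hypothetical threshold
)
print(graph[1])  # {2: 450, 3: 120}
print(graph[4])  # {}
```

An adjacency dict keeps neighbor lookup for a given frame cheap, which is the access pattern the later steps (picking the strongest neighbors, choosing the next frame to register) rely on.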

[0034] It is understood that the process of extracting and matching point features from all multi-view frames of an image set is existing technology, and this application does not limit it.

[0035] In this embodiment, the multi-view camera captures multi-view images of the scene simultaneously, so a single acquisition already provides rich multi-view constraints. This greatly reduces the number of acquisitions and the complexity of motion path planning required to cover the same scene, improves data acquisition efficiency, and suits rapid reconstruction in dynamic or constrained environments.

[0036] Step S20: Determine the multiview frame with the most triangulated points in the image set as the initial multiview frame, and generate a local 3D point cloud based on the initial multiview frame.

[0037] In computer vision, especially in multi-view 3D scene reconstruction, triangulation is the core algorithm that geometrically recovers the true 3D coordinates of feature points from their 2D image observations, captured either by multiple cameras at the same moment or by one camera at different poses, combined with known camera extrinsic parameters (position and orientation). In essence, it derives a 3D position from multiple 2D projections. The triangulated point count is the total number of feature points whose 3D coordinates are successfully recovered by triangulation after corresponding 2D feature points have been obtained through feature matching within the overlapping fields of view of the cameras.
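The linear (DLT-style) triangulation referred to in [0046] can be sketched for two views as follows. A view with projection matrix $P = K[R \mid t]$ and rows $\mathbf{p}_1^\top, \mathbf{p}_2^\top, \mathbf{p}_3^\top$ that observes pixel $(u, v)$ contributes the equations $u\,\mathbf{p}_3^\top X - \mathbf{p}_1^\top X = 0$ and $v\,\mathbf{p}_3^\top X - \mathbf{p}_2^\top X = 0$; fixing the homogeneous coordinate of $X$ to 1 turns the four equations into a 4x3 system, solved here in the least-squares sense. This inhomogeneous variant is a simplification of the SVD-based DLT, chosen to stay within the Python standard library; the helper names, the 0.12 m baseline, and the test point are hypothetical, while K is taken from [0046].

```python
def project(P, X):
    """Project 3D point X with a 3x4 projection matrix P; returns pixel (u, v)."""
    Xh = list(X) + [1.0]
    x = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return x[0] / x[2], x[1] / x[2]

def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def triangulate(P1, P2, uv1, uv2):
    """Linear two-view triangulation with X = (x, y, z, 1), solved by least squares."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        rows.append([u * P[2][c] - P[0][c] for c in range(4)])
        rows.append([v * P[2][c] - P[1][c] for c in range(4)])
    A = [row[:3] for row in rows]
    b = [-row[3] for row in rows]
    # Normal equations (A^T A) x = A^T b for the 4x3 system.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(4)) for j in range(3)] for i in range(3)]
    Atb = [sum(A[k][i] * b[k] for k in range(4)) for i in range(3)]
    return solve3(AtA, Atb)

# K from [0046]; left camera at the origin, right camera 0.12 m to its right (hypothetical).
P_left = [[1200.0, 0.0, 640.0, 0.0], [0.0, 1200.0, 360.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P_right = [[1200.0, 0.0, 640.0, -144.0], [0.0, 1200.0, 360.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
X_true = (0.3, -0.2, 2.5)
uv_left, uv_right = project(P_left, X_true), project(P_right, X_true)
X_est = triangulate(P_left, P_right, uv_left, uv_right)
```

With noise-free pixels the point is recovered up to floating-point error; with real matches the residual of the least-squares solve is one signal that a match is unreliable.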

[0038] After an image set consisting of multi-view frames is acquired from the multi-view camera, the feature points within the overlapping fields of view of each multi-view frame can be triangulated before, after, or while constructing the data association undirected graph, using the camera extrinsic parameters supplied by the multi-view camera together with the image set. Specifically, each multi-view frame of the synchronized image set can be selected in turn; for a binocular camera, triangulation is performed on the matched feature points of the left-eye and right-eye images within the frame using the pre-calibrated intrinsic parameters. The number of triangulated points of each multi-view frame is counted, and the multi-view frame with the most triangulated points is selected as the initial multi-view frame, denoted $F_p$. The frames can be processed in time order or by randomly picking unprocessed frames, and they can be retrieved either directly from the image set or from the constructed data association undirected graph.
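Selecting the initial multi-view frame then reduces to an arg-max over per-frame triangulation counts. The sketch below assumes rectified binocular frames, so each match triangulates to depth $Z = fB/d$ from its disparity $d = u_L - u_R$. The validity checks (minimum disparity, same image row, bounded depth), their thresholds, the function names, and the frame data are assumptions for illustration; the application does not specify them.

```python
def count_triangulated(matches, focal_px, baseline_m,
                       min_disp=1.0, max_row_diff=1.0, max_depth_m=20.0):
    """Count matches in one rectified stereo frame that triangulate to a valid point.

    matches: list of ((u_left, v_left), (u_right, v_right)) pixel pairs.
    """
    count = 0
    for (ul, vl), (ur, vr) in matches:
        disparity = ul - ur
        if disparity < min_disp or abs(vl - vr) > max_row_diff:
            continue  # mismatch, point near infinity, or not on the same epipolar row
        depth = focal_px * baseline_m / disparity
        if depth <= max_depth_m:
            count += 1
    return count

def select_initial_frame(frames, focal_px, baseline_m):
    """Return (id of the frame with the most triangulated points, per-frame counts)."""
    counts = {fid: count_triangulated(m, focal_px, baseline_m) for fid, m in frames.items()}
    return max(counts, key=counts.get), counts

frames = {  # hypothetical matches per frame
    "F1": [((700, 300), (650, 300)), ((500, 200), (500, 200))],
    "F2": [((700, 300), (628, 300)), ((400, 100), (390, 101)), ((900, 50), (905, 50))],
    "F3": [((600, 300), (599.5, 300))],
}
best, counts = select_initial_frame(frames, focal_px=1200.0, baseline_m=0.12)
print(best, counts)  # F2 {'F1': 1, 'F2': 2, 'F3': 0}
```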

[0039] After the initial multi-view frame is obtained, invalid points can be filtered out and coordinates normalized for its matched feature points to obtain a local 3D point cloud. The point cloud coordinates are expressed in the coordinate system of one camera of the multi-view camera, usually the leftmost camera.

[0040] For example, suppose the initial multi-view frame was captured at the center of a living room. The left-view image clearly shows the sofa, coffee table, and TV cabinet, and the right-view image overlaps it by 90%. After XFeat feature matching, RANSAC removes 20 incorrect matches, leaving 450 valid matching pairs. Triangulating them yields 450 three-dimensional points covering the complete area from the sofa to the TV cabinet, with no obvious flying points.

[0041] Furthermore, after the initial multi-view frame has been processed, it can be marked as processed by adding its index to the tag set, denoted $\mathcal{P} = \{p\}$, where $p \in \{1, \dots, N\}$ is the index of the initial multi-view frame and $N$ is the number of multi-view frames; the data association undirected graph can later be updated based on these processed labels. In this embodiment, an undirected graph reflecting the associations between multi-view frames provides an association reference for selecting the initial reference, and choosing the initial multi-view frame with the most triangulated points ensures the reliability of the initial reference reconstruction. This avoids the scale ambiguity and 3D point deviation caused by insufficient matching pairs or excessively small triangulation angles. It replaces the traditional blind strategy of selecting initial pairs randomly or sequentially, eliminating unreliable frames with low overlap and poor matching quality at the source. The selected initial multi-view frame naturally has a high-quality stereo matching basis and accurate triangulation potential, which determines the accuracy of the subsequent local point cloud generation and initial map construction. The device can therefore directly obtain a scale-consistent and reliable initial reconstruction, completely avoiding the scale drift problem and significantly reducing the risk of initialization failure.

[0042] Step S30: Select a preset number of multi-view frames with the highest association weight with the initial multi-view frame from the data association undirected graph, and construct the initial map based on the multi-view frame set and the local 3D point cloud.

[0043] In this embodiment, the association weight is the weight of an edge in the data association undirected graph, that is, the number of valid feature matching pairs between two multi-view frames. The larger the weight, the higher the overlap between the frames and the more reliable the matching. The initial map is a local reconstruction result containing a 3D point cloud of a certain extent and the poses of the corresponding multi-view frames, and it serves as the reference framework for subsequent global reconstruction. The preset number is usually set according to actual needs; in this embodiment it is preferably 10.
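Choosing the multi-view frame set amounts to taking the k heaviest edges incident to the initial frame. A minimal sketch, assuming the adjacency-dict graph `{frame: {neighbor: weight}}` with integer frame ids and deterministic tie-breaking by the smaller id (both assumptions); k defaults to the preferred value of 10. The function name and example weights are hypothetical.

```python
import heapq

def select_frame_set(graph, initial_frame, k=10):
    """Return up to k neighbors of initial_frame, heaviest association weight first."""
    neighbors = graph.get(initial_frame, {})
    # Sort key: larger weight first; among equal weights, the smaller frame id wins.
    top = heapq.nlargest(k, neighbors.items(), key=lambda item: (item[1], -item[0]))
    return [frame for frame, _weight in top]

graph = {0: {1: 450, 2: 120, 3: 300, 4: 15, 5: 120}}
print(select_frame_set(graph, 0, k=3))  # [1, 3, 2]
print(select_frame_set(graph, 0))       # [1, 3, 2, 5, 4]
```

`heapq.nlargest` avoids sorting every incident edge when the initial frame has many neighbors but only k are needed.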

[0044] As an optional implementation, based on the initial multi-view frame and the local 3D point cloud, the edges connected to the initial multi-view frame are extracted from the data association undirected graph and sorted by weight, and a preset number of multi-view frames with the largest weights (for example, 10) are selected to form the multi-view frame set $S$. Next, the set of relative poses between each multi-view frame to be processed in $S$ and the initial multi-view frame is calculated by two-dimensional to three-dimensional (2D-3D) point matching; that is, for each element of the set, the relative pose between that associated multi-view frame and the initial multi-view frame is computed in turn, yielding $\{T_k\}_{k=1}^{|S|}$, where $T_k$ is the relative pose of the $k$-th associated multi-view frame with respect to the initial multi-view frame. 2D-3D point matching matches the 2D pixels of a multi-view frame to be processed with the 3D points of the initial multi-view frame and then infers the pose of that frame relative to the registered map. Its core is to solve for the spatial transformation from the perspective projection relationship: a 3D point projected onto the camera imaging plane must coincide with its matched 2D pixel.
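The projection constraint behind 2D-3D point matching can be made concrete with the cost that an iterative PnP solver minimizes: the sum of squared pixel residuals between projected map points and their matched 2D pixels. The sketch below only evaluates that cost, it does not solve PnP. It reuses K and the translation value from [0046] but assumes zero skew and an identity rotation for simplicity; the 3D points and function names are hypothetical.

```python
def project(K, R, t, X):
    """Pinhole projection of a 3D point X under pose (R, t): x ~ K (R X + t), zero skew."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return u, v

def reprojection_cost(K, R, t, points_3d, pixels):
    """Sum of squared pixel residuals between projected 3D points and their 2D matches."""
    cost = 0.0
    for X, (u_obs, v_obs) in zip(points_3d, pixels):
        u, v = project(K, R, t, X)
        cost += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return cost

K = [[1200.0, 0.0, 640.0], [0.0, 1200.0, 360.0], [0.0, 0.0, 1.0]]  # from [0046]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_true = [-0.5, 0.2, 0.0]                        # translation value from [0046]
points_3d = [(0.0, 0.0, 5.0), (1.0, -1.0, 4.0)]  # hypothetical map points
pixels = [project(K, I3, t_true, X) for X in points_3d]  # synthetic, noise-free matches
print(reprojection_cost(K, I3, t_true, points_3d, pixels))  # 0.0
print(reprojection_cost(K, I3, [-0.4, 0.2, 0.0], points_3d, pixels))  # about 1476 (24 px and 30 px in u)
```

A PnP solver searches over (R, t) for the pose that drives this cost to its minimum; with at least a handful of well-spread 2D-3D matches that minimum is unique.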

[0045] After the relative pose set is obtained, the overlapping feature points of each multi-view frame to be processed are triangulated using the fixed intra-frame extrinsic parameters, i.e., the camera extrinsic parameters of the multi-view camera, to obtain 3D points, and point cloud data is generated from the valid 3D points. The generated point cloud data then needs to be added to the local point cloud of the initial multi-view frame; to do so, it is transformed into the coordinate system of the local 3D point cloud using the relative pose of the corresponding frame in the relative pose set, yielding the initial map. In this way, the point clouds of all elements of the multi-view frame set are transformed into the coordinate system of the initial multi-view frame. After this, all elements of the multi-view frame set can be marked as processed, and the label set is updated accordingly.
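The transformation and merging step can be sketched as follows, assuming each relative pose $T_k = (R_k, t_k)$ maps points from frame k's coordinate system into that of the initial frame ($X' = R_k X + t_k$), and assuming duplicates are detected by snapping points to a voxel grid. The 1 cm voxel size, the function names, and the sample points are hypothetical; the application does not describe how duplicate points are identified.

```python
def transform_points(R, t, points):
    """Map points into the initial frame's coordinate system: X' = R X + t."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
            for p in points]

def voxel_key(p, voxel):
    """Integer grid cell containing point p (nearest-cell rounding)."""
    return tuple(round(c / voxel) for c in p)

def merge_into_map(map_points, new_points, voxel=0.01):
    """Append new points, skipping any that land in a voxel the map already occupies."""
    occupied = {voxel_key(p, voxel) for p in map_points}
    merged = list(map_points)
    for p in new_points:
        key = voxel_key(p, voxel)
        if key not in occupied:
            occupied.add(key)
            merged.append(p)
    return merged

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
local_cloud = [(0.0, 0.0, 2.0), (0.5, 0.1, 3.0)]  # initial frame's local cloud
frame_cloud = [(0.5, -0.2, 2.0), (1.0, 0.0, 3.0)]  # frame k, in its own coordinates
aligned = transform_points(I3, [-0.5, 0.2, 0.0], frame_cloud)
initial_map = merge_into_map(local_cloud, aligned)
print(len(initial_map))  # 3: the first aligned point duplicates (0, 0, 2)
```

Grid snapping can keep two nearby points that straddle a cell boundary; a radius search over a spatial index is a common, more precise alternative.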

[0046] Taking a stereo camera as an example: in the initial multi-view frame of the data association undirected graph, the left camera of the stereo pair is used as the reference, with intrinsic matrix K = [[1200, 0, 640], [0, 1200, 360], [0, 0, 1]], extrinsic rotation R0 equal to the identity matrix, and translation t0 = [0, 0, 0]. Five associated multi-view frames are selected from the edges directly connected to the initial multi-view frame, forming the set S = {CO1, CO2, CO3, CO4, CO5}. Next, for each multi-view frame in S, 100 pairs of matching feature points with the initial multi-view frame are extracted, and the relative pose is solved with the iterative PnP (Perspective-n-Point) algorithm using the pre-calibrated intrinsic parameters K of the multi-view camera and the intra-frame extrinsic parameters, yielding, for example, the rotation matrix R1 = [[0.998, 0.02, 0], [-0.02, 0.998, 0], [0, 0, 1]] and the translation vector t1 = [-0.5 m, 0.2 m, 0]. Triangulation is then performed on each multi-view frame in S. Taking CO1 as an example, 200 pairs of matching feature points between its left and right cameras are extracted, and a linear equation system is constructed with the DLT algorithm to solve for the coordinates of 200 three-dimensional points. Reprojection error filtering and triangulation angle filtering then remove 30 invalid points, and the remaining 170 valid three-dimensional points form the point cloud data of CO1. Finally, using the relative pose set, the point cloud data of each multi-view frame to be processed is transformed into the local 3D point cloud coordinate system of the initial multi-view frame. After all frames have been transformed in sequence, duplicate points can be removed, finally yielding an initial map that contains the valid 3D points of the initial multi-view frame and the multi-view frame set, with the pose of each frame consistent with the point cloud coordinates.
The process of transformation based on extrinsic parameters and relative pose is existing technology and is not limited here.
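The triangulation and filtering steps in the stereo example can be sketched as follows: DLT triangulation of each left/right match, then rejection of points with a large reprojection error or a small triangulation angle. This is a generic NumPy illustration, not the application's implementation; the 0.12 m baseline, the 1 px and 1 degree thresholds, the test points, and all function names are hypothetical.

```python
import numpy as np


def triangulate_dlt(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one left/right match from two 3x4 projections."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    Xh = np.linalg.svd(A)[2][-1]  # right singular vector of the smallest singular value
    return Xh[:3] / Xh[3]


def reproj_error(P, X, uv):
    x = P @ np.append(X, 1.0)
    return float(np.linalg.norm(x[:2] / x[2] - uv))


def triangulate_frame(K, R, t, uv_left, uv_right, max_err_px=1.0, min_angle_deg=1.0):
    """Triangulate the left/right matches of one multi-view frame, keeping valid points.

    (R, t) are the fixed intra-frame extrinsics mapping left-camera coordinates to
    right-camera coordinates; both thresholds are illustrative, not from the source.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    c2 = -R.T @ t  # right camera centre in left-camera coordinates
    kept = []
    for a, b in zip(uv_left, uv_right):
        X = triangulate_dlt(P1, P2, a, b)
        if X[2] <= 0 or (R @ X + t)[2] <= 0:
            continue  # behind one of the cameras
        err = max(reproj_error(P1, X, a), reproj_error(P2, X, b))
        ray1, ray2 = X, X - c2  # viewing rays from the two camera centres
        cos_ang = ray1 @ ray2 / (np.linalg.norm(ray1) * np.linalg.norm(ray2))
        angle = np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0)))
        if err <= max_err_px and angle >= min_angle_deg:
            kept.append(X)
    return np.array(kept).reshape(-1, 3)


# Rectified pair: intrinsics from the example, hypothetical 0.12 m baseline.
K = np.array([[1200.0, 0.0, 640.0], [0.0, 1200.0, 360.0], [0.0, 0.0, 1.0]])
R_lr, t_lr = np.eye(3), np.array([-0.12, 0.0, 0.0])
good = np.array([[0.2, 0.1, 3.0], [-0.3, 0.2, 3.5], [0.1, -0.2, 4.0], [0.4, 0.0, 4.5], [-0.1, 0.3, 5.0]])
far = np.array([[0.5, 0.1, 30.0]])  # triangulation angle of about 0.23 degrees


def to_pixels(X, R, t):
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:]


pts = np.vstack([good, far])
uv_l = np.vstack([to_pixels(pts, np.eye(3), np.zeros(3)), [[700.0, 400.0]]])
uv_r = np.vstack([to_pixels(pts, R_lr, t_lr), [[650.0, 420.0]]])  # last pair: 20 px row mismatch
valid = triangulate_frame(K, R_lr, t_lr, uv_l, uv_r)
```

Here the far point fails the angle test and the mismatched pair fails the reprojection test, so only the five well-conditioned points survive.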

[0047] As an alternative implementation, the K-nearest neighbor algorithm can also be used. This involves using a KD-tree to perform a K-nearest neighbor search on all multiview frames associated with the initial multiview frame in the undirected graph of the data association. The multiview frames with the highest association weights are selected to form a multiview frame set. Then, the initial pose is estimated using the EPnP algorithm, and local BA optimization is performed using the local point cloud of the initial multiview frame to improve pose estimation accuracy. Finally, the RANSAC algorithm is executed to register the point cloud between the current multiview frame to be processed and the initial multiview frame. After processing all frames in the set, the initial map is obtained.
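The RANSAC registration step in [0047] can be illustrated, assuming 3D-3D correspondences between the two point clouds are already available from feature matches, by RANSAC over minimal three-point samples with a Kabsch (SVD) rigid fit. This is the generic technique rather than the application's specific procedure; the 5 cm inlier threshold, the iteration count, the synthetic data, and the function names are arbitrary choices for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs


def ransac_rigid(src, dst, iters=200, thresh=0.05, seed=0):
    """RANSAC over 3-point samples, then a final Kabsch fit on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    R, t = kabsch(src[best], dst[best])
    return R, t, best


rng = np.random.default_rng(1)
a = np.deg2rad(10.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.4, -0.2, 0.1])
src = rng.uniform(-2.0, 2.0, size=(50, 3))     # points from the frame being processed
dst = src @ R_true.T + t_true                  # the same points in the initial frame
dst[:10] += rng.uniform(0.5, 1.0, size=(10, 3))  # 10 corrupted matches
R_est, t_est, inlier_mask = ransac_rigid(src, dst)
```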

[0048] In this embodiment, the multi-view frames most strongly associated with the initial multi-view frame are selected through the data association undirected graph to expand the reconstruction range. The resulting initial map retains the high accuracy of the initial reference and, through high inter-frame overlap, ensures map continuity, providing a stable framework for subsequent global reconstruction and avoiding the structural distortion that an ambiguous reference would cause in later reconstruction.

[0049] Step S40: Based on the initial map and the data association undirected graph, construct a 3D scene reconstruction map corresponding to the image set.

[0050] In this embodiment, the 3D scene reconstruction map is the final result, containing the complete 3D geometric information of the scene and the motion trajectory of the multi-view frames. It comprises a dense point cloud, a structured mesh, and the main camera motion trajectory, providing a digital restoration of the scene.

[0051] After obtaining the initial map, incremental or batch global reconstruction can be performed. As an optional implementation, incremental global reconstruction can exclude multi-view frames already included in the initial map from the data association undirected graph. The remaining multi-view frames are then sorted from largest to smallest according to their association weight with frames in the initial map, and their poses are estimated sequentially using the PnP+BA algorithm and registered to the initial map. Local BA optimization is performed every time a predetermined number of multi-view frames are registered (e.g., 10 groups of multi-view frames) to correct accumulated errors. Next, dense triangulation is performed on the stereo matching feature points of all registered multi-view frames, and point cloud details are supplemented by combining image pixel information to obtain a dense point cloud. Subsequently, the Poisson reconstruction algorithm is used to reconstruct the surface of the dense point cloud, resulting in a structured mesh model of the scene. Then, the pose information of the main camera in all registered multi-view frames is extracted, and the main camera motion trajectory is generated by sorting by acquisition time. Finally, the dense point cloud, mesh model, and motion trajectory are integrated to form a 3D scene reconstruction map.
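The ordering step of the incremental variant can be sketched as bookkeeping over the graph's edge weights. The sketch assumes edges are stored as a dict keyed by frame-ID pairs and scores each remaining frame by the sum of its edge weights to frames already in the initial map; the text does not say whether a sum or a maximum is used, so that choice, the function name, and the example graph are assumptions. Pose estimation and BA themselves are out of scope: the function only returns the order and the positions after which a local BA pass would run.

```python
from collections import defaultdict


def incremental_schedule(edges, map_frames, ba_every=10):
    """Order frames outside the initial map by association weight (largest first).

    edges: {(frame_a, frame_b): number_of_matches}. Returns the registration
    order and the 1-based counts after which a local BA pass would be run.
    """
    map_frames = set(map_frames)
    nodes, score = set(), defaultdict(int)
    for (a, b), w in edges.items():
        nodes.update((a, b))
        if (a in map_frames) != (b in map_frames):  # edge crosses the map boundary
            score[b if a in map_frames else a] += w
    # Sort by descending weight; ties broken by frame ID for determinism.
    order = sorted(nodes - map_frames, key=lambda f: (-score[f], f))
    ba_after = [k for k in range(1, len(order) + 1) if k % ba_every == 0]
    return order, ba_after


edges = {("F0", "F1"): 300, ("F0", "F2"): 120, ("F1", "F3"): 80, ("F2", "F4"): 50, ("F3", "F4"): 40}
order, ba_after = incremental_schedule(edges, {"F0", "F1"}, ba_every=2)
print(order, ba_after)  # ['F2', 'F3', 'F4'] [2]
```

This static ordering is computed once; the second and third embodiments instead re-select the next frame after every registration.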

[0052] As an alternative implementation, in batch global reconstruction, the poses of all multi-view frames in the data association undirected graph and the 3D point cloud of the initial map are taken as optimization variables, and global BA optimization is performed to minimize the global reprojection error. Dense triangulation is then performed on texture-rich regions, while only key feature points are retained in the remaining regions, producing a semi-dense point cloud. Next, the Alpha shape algorithm is used to construct a mesh model, and Laplacian smoothing is applied to the mesh surface to reduce jagged edges. A sliding window filter is then applied to the main camera's motion trajectory to eliminate minor jitter. Finally, the semi-dense point cloud, the optimized mesh model, and the smoothed trajectory are integrated to form a 3D scene reconstruction map.
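The trajectory smoothing step can be illustrated with a centred moving average over camera positions, one simple form of sliding window filter. The window length of 5, the function name, and the synthetic trajectory are assumptions, and the sketch smooths translations only; rotations would need filtering on SO(3) or in quaternion space.

```python
import numpy as np


def smooth_trajectory(positions, window=5):
    """Centred moving-average (sliding window) filter over N x 3 camera positions.

    The window is truncated at both ends so the output keeps length N.
    Only positions are filtered; orientations would need separate treatment.
    """
    P = np.asarray(positions, dtype=float)
    half = window // 2
    out = np.empty_like(P)
    for i in range(len(P)):
        lo, hi = max(0, i - half), min(len(P), i + half + 1)
        out[i] = P[lo:hi].mean(axis=0)
    return out


# Straight-line trajectory with +/-1 cm alternating jitter on y.
n = 20
traj = np.column_stack([np.arange(n, dtype=float), 0.01 * (-1.0) ** np.arange(n), np.zeros(n)])
smoothed = smooth_trajectory(traj, window=5)
```

On the interior of the line the x coordinate is unchanged, while the alternating y jitter shrinks from 1 cm to 2 mm.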

[0053] Starting from the high-precision reference provided by the initial map, this embodiment completes the reconstruction and optimization of all multi-view frames incrementally or in batch, expanding the local reference into a complete global scene and finally outputting a 3D scene reconstruction map that fully restores the scene geometry and the camera motion trajectory.

[0054] This embodiment provides a multi-camera scene reconstruction method based on undirected graphs. By constructing an undirected graph reflecting the correlation between multiple camera frames, and selecting the multi-camera frame with the most triangulated points as the initial reconstruction reference, the reliability of the reconstruction starting point is ensured. Then, the multi-camera frame with the highest correlation with the initial reference is selected to expand and construct an initial map, enhancing the coherence of the reference extension. Global 3D reconstruction is then completed based on the initial map, effectively avoiding the erroneous matching and triangulation bias problems caused by unreliable initial matching in traditional methods. In the final generated 3D scene reconstruction map, the spatial relationship of the target scene's 3D structure is accurate, with no obvious structural distortion. The scale consistency between the 3D model and the real scene is good, and the camera motion trajectory completely and accurately maps the acquisition path. The overall reconstruction result has excellent accuracy, robustness, and stability.

[0055] In the second embodiment of this application, which is based on the first embodiment, content that is the same as or similar to the first embodiment can be found in the description above and is not repeated here. Referring to Figure 3, step S40 further includes steps S41 to S43. Step S41: Determine the unregistered multi-view frames and the registered multi-view frames in the data association undirected graph.

[0056] In this embodiment, frame registration refers to the process of aligning the pose of the frame to be registered with the initial map through pose estimation, and then fusing the point cloud generated by its triangulation into the initial map. Registered multiview frames refer to multiview frames whose poses have been accurately estimated and integrated into the initial map. Among them, registered multiview frames include at least the initial multiview frame and the set of multiview frames.

[0057] In this embodiment, after a multi-view frame in the data association undirected graph completes registration (including the initial multi-view frame), the device marks its status, that is, adds the information of the multi-view frame to the tag set so that the device can identify unregistered multi-view frames. The multi-view frame ID, registration status, pose parameters, etc., can be recorded in key-value pairs, and the initial multi-view frame and the multi-view frame set are also marked as registered.

[0058] Step S42: Determine the multi-view frames to be registered based on the unregistered multi-view frames, and register the multi-view frames to be registered to the initial map.

[0059] In this embodiment, when registering based on unregistered multi-view frames, a list of association weights between unregistered and registered multi-view frames can be first established based on the edge weights of the data association undirected graph. For example, the weight between unregistered multi-view frame CO4 and registered multi-view frame Lset3 is 300, and the weight between CO4 and Lset8 is 280, etc. After constructing the association weight list, the unregistered multi-view frame with the largest weight value is selected as the multi-view frame to be registered.

[0060] Optionally, instead of directly selecting the unregistered multi-view frame with the largest weight value, the average association weight between each unregistered multi-view frame and all registered multi-view frames can be calculated, and the unregistered multi-view frame with the highest average weight can be selected as the multi-view frame to be registered.
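The tag set of [0057] and the two selection rules of [0059] and [0060] can be sketched with plain dictionaries. The field names in the tag records, the function names, and the extra frame CO6 with weight 310 are hypothetical; the CO4 weights (300 and 280) reuse the figures given above. Treating a registered frame that shares no edge with a candidate as weight 0 in the average is also an assumption.

```python
def association_list(edges, tags):
    """Weights between each unregistered frame and the registered frames.

    edges: {(frame_a, frame_b): matches}; tags: {frame_id: {"registered": bool, ...}}.
    Returns {unregistered_frame: {registered_frame: weight}}.
    """
    reg = {f for f, info in tags.items() if info["registered"]}
    assoc = {}
    for (a, b), w in edges.items():
        if (a in reg) != (b in reg):
            r, u = (a, b) if a in reg else (b, a)
            assoc.setdefault(u, {})[r] = w
    return assoc


def pick_next(assoc, n_registered, mode="max"):
    """'max': largest single weight; 'mean': average over all registered frames."""
    if mode == "max":
        score = {u: max(ws.values()) for u, ws in assoc.items()}
    else:  # registered frames without a shared edge contribute weight 0
        score = {u: sum(ws.values()) / n_registered for u, ws in assoc.items()}
    return max(score, key=score.get)


tags = {"Lset3": {"registered": True, "pose": None}, "Lset8": {"registered": True, "pose": None},
        "CO4": {"registered": False, "pose": None}, "CO6": {"registered": False, "pose": None}}
edges = {("CO4", "Lset3"): 300, ("CO4", "Lset8"): 280, ("CO6", "Lset3"): 310, ("Lset3", "Lset8"): 500}
assoc = association_list(edges, tags)
print(pick_next(assoc, 2, "max"), pick_next(assoc, 2, "mean"))  # CO6 CO4
```

The two rules can disagree: CO6 has the single strongest edge, while CO4 is associated with more of the registered map.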

[0061] To register the multi-view frame to be registered to the initial map, a 2D-to-3D point matching method can be used to calculate the relative pose between the multi-view frame to be registered and the initial map, followed by pose refinement, i.e., updating the relative pose. The overlapping feature points of the multi-view frame to be registered are then triangulated using the camera extrinsic parameters of the multi-view camera to obtain 3D point cloud data. Finally, the 3D point cloud data is transformed into the coordinate system of the initial map based on the relative pose, completing the registration of the multi-view frame. In the refinement, the pose of the main camera of the multi-view frame to be registered is the optimization variable, while the 3D point coordinates of the initial map and the extrinsic parameters of the subordinate cameras relative to the main camera within the multi-view frame are held fixed. A local BA algorithm iteratively minimizes the sum of the reprojection errors of all 2D-3D matching pairs, adjusting the pose parameters until the error converges to within a preset threshold, which yields a more accurate updated relative pose.
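The refinement described above is a motion-only bundle adjustment: only the main-camera pose varies, while the map points and the main-to-subordinate extrinsics stay fixed. The sketch below implements it as plain Gauss-Newton with a forward-difference Jacobian and a left-multiplied axis-angle rotation update. It assumes every camera in the rig shares the intrinsics K; the 0.12 m rig baseline, the function names, the synthetic data, and the stopping tolerance are hypothetical. A production solver would typically use analytic Jacobians, Levenberg-Marquardt damping, and a robust loss.

```python
import numpy as np


def rodrigues(w):
    """Rotation matrix for an axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * Kx @ Kx


def project(R, t, X, K, Rc, tc):
    """Pixel of map point X in a rig camera with fixed extrinsics (Rc, tc) w.r.t. the main camera."""
    x = K @ (Rc @ (R @ X + t) + tc)
    return x[:2] / x[2]


def residuals(R, t, obs, K, rig):
    """Stacked pixel residuals; obs holds (camera_id, map_point, observed_uv)."""
    return np.concatenate([project(R, t, X, K, *rig[cam]) - uv for cam, X, uv in obs])


def refine_pose(R, t, obs, K, rig, iters=20, tol=1e-10, eps=1e-6):
    """Gauss-Newton over the main-camera pose only; map points and rig stay fixed."""
    for _ in range(iters):
        r0 = residuals(R, t, obs, K, rig)
        J = np.empty((r0.size, 6))
        for j in range(6):  # forward-difference Jacobian w.r.t. (rotation, translation)
            d = np.zeros(6)
            d[j] = eps
            J[:, j] = (residuals(rodrigues(d[:3]) @ R, t + d[3:], obs, K, rig) - r0) / eps
        step = np.linalg.lstsq(J, -r0, rcond=None)[0]
        R, t = rodrigues(step[:3]) @ R, t + step[3:]
        if np.linalg.norm(step) < tol:
            break
    return R, t


K = np.array([[1200.0, 0.0, 640.0], [0.0, 1200.0, 360.0], [0.0, 0.0, 1.0]])
rig = {0: (np.eye(3), np.zeros(3)), 1: (np.eye(3), np.array([-0.12, 0.0, 0.0]))}  # hypothetical rig
rng = np.random.default_rng(2)
pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(40, 3))  # fixed map points
R_true, t_true = rodrigues(np.array([0.0, 0.1, 0.0])), np.array([-0.5, 0.2, 0.0])
obs = [(cam, X, project(R_true, t_true, X, K, *rig[cam])) for cam in rig for X in pts]
R0 = rodrigues(np.array([0.02, -0.01, 0.03])) @ R_true  # perturbed initial pose
t0 = t_true + np.array([0.05, -0.03, 0.04])
R_ref, t_ref = refine_pose(R0, t0, obs, K, rig)
```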

[0062] Understandably, after a multi-view frame to be registered is registered to the initial map, it is also marked as a registered multi-view frame.

[0063] Step S43: If all multiview frames of the data association undirected graph have been registered to the initial map, generate a 3D scene reconstruction map based on the 3D point cloud model corresponding to the initial map.

[0064] In this embodiment, after registering the multi-view frames to be registered to the initial map, the registration process is repeated until all multi-view frames corresponding to all nodes are registered. Finally, a dense or semi-dense 3D point cloud model, a mesh model, and the main camera motion trajectory of all multi-camera frames are output.

[0065] This embodiment provides a multi-camera scene reconstruction method based on undirected graphs. By tracking the registration status of each frame and selecting the multi-view frames to be registered according to high association / temporal continuity, the continuity of the registration process and the accuracy of pose estimation are ensured. Once all frames are registered, global optimization and structured modeling are performed to improve reconstruction quality, finally generating a 3D scene reconstruction map whose 3D structure is consistent with the scale of the real scene.

[0066] Based on the second embodiment of this application, in the third embodiment of this application, the content that is the same as or similar to the second embodiment described above can be referred to the above description and will not be repeated hereafter. In addition, step S42 further includes steps S421 to S424: Step S421: Based on the unregistered multi-view frames that have a matching relationship with the registered multi-view frames in the data association undirected graph, construct an association weight set.

[0067] The association weight set contains the pairwise association weights between each group of registered multi-view frames and unregistered multi-view frames.

[0068] In this embodiment, when determining the multi-view frame to be registered, a set of association weights between the registered and unregistered multi-view frames is first constructed, so that new nodes and weighted edges can be generated in the data association undirected graph from this set. The new nodes inherit the association weights between the nodes of the registered multi-view frames and the nodes of the unregistered multi-view frames.

[0069] Specifically, the weight set of the data association undirected graph can be updated based on the initial map and the tag set. That is, the set of association weights between each unprocessed multi-view frame and the registered multi-view frames is recorded; this set contains k+1 weights p, and the aggregated association weight is the sum of the edge weights associated with the registered multi-view frames in the association weight set.

[0070] Step S422: Delete the graph nodes corresponding to the registered multi-view frames in the data association undirected graph, and generate the target graph node in the data association undirected graph.

[0071] In this embodiment, after the association weight set has been constructed, the relevant nodes can be deleted from the data association undirected graph according to the label set of registered multi-view frames, and a new node, i.e., the target graph node, can be generated. New weighted edges are then created between the target graph node and the remaining nodes from the elements of the association weight set; specifically, the target graph node is connected to each node to be registered by an edge weighted with the corresponding association weight.

[0072] The update process of the data association undirected graph is shown in Figure 4. The nodes of the already registered multi-view frames are deleted from the data association undirected graph, and at the same time a target graph node is generated in the undirected graph. This node is connected to the unregistered nodes by weighted edges whose weights are the elements of the association weight set.
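Read together, [0068] to [0073] describe a node-contraction step: registered nodes are removed and replaced by a single target graph node whose edge to each remaining frame aggregates that frame's weights to the registered frames ([0069] describes this aggregate as a sum). A minimal sketch over a dict-of-edges graph follows; the merged node name `MAP`, the function names, and the frames CO6 and CO7 with their weights are hypothetical, while the CO4 weights reuse the figures from [0059].

```python
def contract_registered(edges, registered, target="MAP"):
    """Replace all registered nodes by one target graph node.

    edges: {(frame_a, frame_b): weight}. Edges inside the registered set are
    dropped, edges between two unregistered frames are kept, and each
    unregistered frame's edges to registered frames are summed into a single
    edge to `target` (a hypothetical name for the merged node).
    """
    new_edges = {}
    for (a, b), w in edges.items():
        a_reg, b_reg = a in registered, b in registered
        if a_reg and b_reg:
            continue
        if a_reg or b_reg:
            key = (target, b if a_reg else a)
            new_edges[key] = new_edges.get(key, 0) + w
        else:
            new_edges[(a, b)] = w
    return new_edges


def next_to_register(edges, target="MAP"):
    """Step S423: the frame whose edge to the target node is heaviest."""
    cands = {b: w for (a, b), w in edges.items() if a == target}
    return max(cands, key=cands.get) if cands else None


edges = {("Lset3", "Lset8"): 500, ("CO4", "Lset3"): 300, ("CO4", "Lset8"): 280,
         ("CO6", "Lset3"): 310, ("CO6", "CO7"): 150, ("CO7", "CO4"): 90}
updated = contract_registered(edges, {"Lset3", "Lset8"})
print(next_to_register(updated))  # CO4
```

Only frames adjacent to the merged node are candidates, which matches the requirement in [0075] that the next frame be chosen among the target graph nodes rather than from the whole graph.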

[0073] Step S423: Determine the target multi-view frame corresponding to the graph node with the highest weight value among the target graph nodes, and use it as the multi-view frame to be registered.

[0074] Step S424: Register the multi-view frame to be registered to the initial map.

[0075] Note that the core logic of 3D scene reconstruction is to expand the map progressively under effective inter-frame association constraints. Only associated nodes can provide the 2D-3D matching pairs needed for pose solving and the overlap required for triangulation; unassociated nodes cannot support reliable reconstruction. The updated target graph nodes are, by construction, candidates with substantial association to the registered map, so selecting the one with the highest weight locks in the candidate with the strongest association and the most constraint information, giving subsequent pose solving and triangulation a sufficiently reliable basis. Selecting from the entire undirected graph instead could pick a node without association support and cause structural breaks or distortion in the reconstruction. Therefore, after the data association undirected graph is updated, the graph node with the highest weight should be selected from the target graph nodes rather than from the entire data association undirected graph.

[0076] Meanwhile, the associative extension of 3D scene reconstruction relies on overlapping features between unregistered nodes and the registered map. Each time processed nodes are deleted and the undirected graph is updated, new nodes are generated from the substantial associations between the remaining unregistered nodes and the current registered map (including the most recently processed nodes). Normally there are several unregistered nodes, each associated with different registered regions, for example with different numbers of matching points or different proportions of overlapping area, so several new nodes are generated, each corresponding to a validly associated frame to be registered or to a merged associated unit. Selecting the node with the highest weight among these target graph nodes ensures that reconstruction always advances along the strongest association and the most sufficient constraints, avoiding pose errors caused by weakly associated nodes. If only one new node is generated, then only that node among the remaining unregistered nodes has a valid association with the registered map, and it is the only feasible reconstruction path. The update process of the data association undirected graph shown in Figure 4 is for illustration only and does not limit the update process.

[0077] In this embodiment, based on the latest weights of the target graph nodes in the updated data association undirected graph, the target multi-view frame corresponding to the node with the highest edge weight can be selected as the multi-view frame to be registered, and registration is then performed on it.

[0078] Furthermore, after the multi-view frame to be registered has been selected based on the new graph nodes and registered, the other multi-view frames associated with it also need to be identified, and both the registered frame and these associated multi-view frames are marked as registered. Taking the updated data association undirected graph in Figure 4 as an example, once the node to be registered completes registration, it and its associated nodes are marked as registered. Marking a node as registered means adding its tag to the tag set, so that when the target graph node is regenerated later, all nodes corresponding to registered multi-view frames are deleted.

[0079] This embodiment deletes registered nodes and generates new graph nodes, rather than directly selecting a node from unregistered multiview frames for registration. By merging registered nodes, the new nodes simplify the topology of the undirected graph, reducing variable redundancy and computational burden in subsequent registrations. Simultaneously, the new nodes inherit the association weights of the original nodes and unregistered nodes, avoiding the topological connection breaks caused by deleting registered nodes and ensuring that the region to be registered remains associated with the registered map. Furthermore, existing unregistered nodes inherit early pose errors from registered nodes, while new nodes, being merged and re-registered, can correct these early errors, improving the baseline accuracy of subsequent registrations and preventing error propagation. Moreover, directly selecting frames to be registered often relies only on the association constraints of a single registered node, while new nodes inherit the association weights of multiple original registered nodes, providing more sufficient geometric constraints for subsequent nodes to be registered, improving the robustness of pose estimation, and ultimately allowing map expansion to maintain structural coherence while achieving higher accuracy and efficiency.

[0080] To illustrate the implementation of the multi-camera scene reconstruction method based on undirected graphs obtained by combining this embodiment with the first embodiment, refer to Figure 5, which provides a simplified flowchart of the method. The process includes six stages. In stage S1, system input and data preprocessing, a set of multi-view frame images acquired by the multi-view cameras and the pre-calibrated inter-camera extrinsic parameters are obtained. In stage S2, construction of the multi-view sequence image data association graph, an initial multi-view frame and a set of multi-view frames associated with it are selected through feature matching. In stage S3, multi-camera map initialization, the map is initialized from the selected initial multi-view frame and multi-view frame set, establishing an initial map with correct scale and broad spatial coverage. In stage S4, data association undirected graph update, the nodes of registered multi-view frames are deleted from the data association undirected graph according to the registered and unregistered multi-view frames, and new nodes are generated. In stage S5, local graph optimization and map update under extrinsic parameter constraints, nodes that meet the requirements are selected from the newly generated nodes for association update, achieving local optimization and update of the initial map. It is then determined whether the number of nodes in the data association graph is 1; if not, steps S3 to S5 are repeated until only one node remains in the data association graph. Finally, in stage S6, global optimization and model output, a dense or semi-dense 3D point cloud model, a mesh model, and the main camera motion trajectory of all multi-camera frames are output.
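The S4/S5 loop of Figure 5 can be sketched as repeated contraction and selection until no unregistered frame remains connected to the registered map. The sketch keeps the registered set explicitly and recomputes each unregistered frame's summed weight to it on every pass, which is equivalent to regenerating the target graph node; pose estimation, triangulation, and local optimization are left as a comment. The function name, the graph, and the frame names are hypothetical, and stopping when no connected candidate is left (rather than strictly when one node remains) is an assumption that also covers disconnected graphs.

```python
def registration_order(edges, initial_registered):
    """Repeat graph update (S4) and selection/registration (S5) until nothing is left.

    edges: {(frame_a, frame_b): weight}. Each pass scores every unregistered frame
    by the summed weight of its edges to registered frames, which is the weight of
    its edge to the regenerated target graph node.
    """
    registered = set(initial_registered)
    nodes = {n for pair in edges for n in pair}
    order = []
    while True:
        score = {}
        for (a, b), w in edges.items():
            if (a in registered) != (b in registered):
                frame = b if a in registered else a
                score[frame] = score.get(frame, 0) + w
        if not score:  # only the merged node is left, or the rest is unreachable
            break
        nxt = max(score, key=score.get)
        # Pose estimation, triangulation and local optimization would run here.
        order.append(nxt)
        registered.add(nxt)
    return order, nodes - registered


edges = {("Lset3", "Lset8"): 500, ("CO4", "Lset3"): 300, ("CO4", "Lset8"): 280,
         ("CO6", "Lset3"): 310, ("CO6", "CO7"): 150, ("CO7", "CO4"): 90, ("CO9", "CO10"): 20}
order, unreachable = registration_order(edges, {"Lset3", "Lset8"})
print(order)  # ['CO4', 'CO6', 'CO7']
```

CO7 is only reachable after CO4 and CO6 join the map, and the isolated pair CO9/CO10 is reported instead of being registered without support.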

[0081] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the multi-camera scene reconstruction method based on undirected graphs in this application. Any simple transformations based on this technical concept are within the protection scope of this application.

[0082] This embodiment provides a multi-camera scene reconstruction method based on an undirected graph. The reconstruction process is built on a globally connected undirected graph, with cameras as nodes and the geometric relationships between views as edges. After each update of the graph structure, the best next view for extending the map is selected from the updated graph, providing strong constraints for periodic local or global optimization. Graph optimization is also used to smooth and correct accumulated errors globally, keeping the entire reconstruction process numerically stable and ultimately yielding a more accurate and structurally consistent 3D scene model.

[0083] This application also provides a three-dimensional scene reconstruction system, the system comprising: Multi-camera image acquisition module: used to simultaneously acquire image data from multiple cameras.

[0084] Extrinsic parameter calibration and storage module: used to store and provide the pre-calibrated extrinsic parameters of the multi-camera system.

[0085] Feature extraction and matching module: used to extract image features and perform feature matching within and between frames.

[0086] Multi-camera virtualization module: used to unify multiple camera frames into the main camera coordinate system based on extrinsic parameters.

[0087] Constrained incremental reconstruction module: used to perform initialization, local optimization, and global optimization based on the data association undirected graph.

[0088] 3D Model Output Module: Used to generate and output the final 3D reconstruction results.

[0089] This application provides a scene reconstruction device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the multi-camera scene reconstruction method based on undirected graphs in the first embodiment described above.

[0090] Refer to Figure 6, which is a structural schematic of a scene reconstruction device suitable for implementing the embodiments of this application. The scene reconstruction device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable media players (PMPs), and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The scene reconstruction device shown in Figure 6 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of this application.

[0091] As shown in Figure 6, the scene reconstruction device may include a processing unit 1001 (e.g., a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the scene reconstruction device. The processing unit 1001, the ROM 1002, and the RAM 1004 are interconnected via a bus 1005, to which an input/output (I/O) interface 1006 is also connected. Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, and gyroscope; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 1003 including, for example, magnetic tape and hard disks; and a communication device 1009. The communication device 1009 allows the scene reconstruction device to communicate wirelessly or by wire with other devices to exchange data. Although the figure shows the scene reconstruction device with various components, it is not required to implement or include all of them; more or fewer components may alternatively be implemented.

[0092] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from storage device 1003, or installed from read-only memory 1002. When the computer program is executed by processing device 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0093] The scene reconstruction device provided in this application employs the multi-camera scene reconstruction method based on undirected graphs in the above embodiments, and can therefore solve the technical problem of distorted 3D scene structure caused by selecting an unreliable initial image pair. Compared with the prior art, the beneficial effects of this scene reconstruction device are the same as those of the method provided in the above embodiments, and its other technical features are the same as those disclosed in the method embodiments above, so they are not repeated here.

[0094] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0095] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0096] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, which are used to execute the multi-camera scene reconstruction method based on undirected graphs in the above embodiments.

[0097] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited thereto; it may be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted over any suitable medium, including but not limited to wires, optical cables, and radio frequency (RF), or any suitable combination thereof.

[0098] The aforementioned computer-readable storage medium may be included in the scene reconstruction device; or it may exist independently and not be assembled into the scene reconstruction device.

[0099] The aforementioned computer-readable storage medium carries one or more programs that, when executed by the scene reconstruction device, cause the scene reconstruction device to: construct, based on an image set acquired by the multi-view camera and with multi-view frames as subsets of the image set, a data association undirected graph with the multi-view frames as graph nodes and the number of matching pairs between multi-view frames as edge weights; determine the multi-view frame with the most triangulated points in the image set as the initial multi-view frame, and generate a local three-dimensional point cloud based on the initial multi-view frame; select, in the data association undirected graph, a preset number of multi-view frames with the highest association weight with the initial multi-view frame to form a multi-view frame set, and construct an initial map based on the multi-view frame set and the local 3D point cloud; and construct, based on the initial map and the data association undirected graph, a 3D scene reconstruction map corresponding to the image set.

[0100] Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (e.g., through the Internet using an Internet service provider).

[0101] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

[0102] The modules described in the embodiments of this application may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.

[0103] The readable storage medium provided in this application is a computer-readable storage medium storing computer-readable program instructions (i.e., a computer program) for executing the above-described multi-camera scene reconstruction method based on undirected graphs, thereby solving the technical problem of distorted 3D scene structures caused by selecting an unreliable initial image pair. The beneficial effects of this computer-readable storage medium over the prior art are the same as those of the multi-camera scene reconstruction method based on undirected graphs provided in the above embodiments, and are not repeated here.

[0104] The above description covers only some embodiments of this application and does not limit its patent scope. Any equivalent structural transformation made using the contents of the specification and drawings of this application under its technical concept, and any direct or indirect application in other related technical fields, falls within the scope of patent protection of this application.

Claims

1. A multi-camera scene reconstruction method based on undirected graphs, characterized in that the multi-camera scene reconstruction method based on undirected graphs includes: based on an image set acquired by a multi-view camera, in which each multi-view frame is a subset of the image set, constructing a data association undirected graph with the multi-view frames as graph nodes and the number of matches between multi-view frames as edge weights; determining the multi-view frame with the most triangulated points in the image set as an initial multi-view frame, and generating a local 3D point cloud based on the initial multi-view frame; selecting, in the data association undirected graph, a preset number of multi-view frames having the highest association weight with the initial multi-view frame, and constructing an initial map based on the selected multi-view frame set and the local 3D point cloud; and constructing, based on the initial map and the data association undirected graph, a 3D scene reconstruction map corresponding to the image set.

2. The multi-camera scene reconstruction method based on undirected graphs as described in claim 1, characterized in that the step of constructing a 3D scene reconstruction map corresponding to the image set based on the initial map and the data association undirected graph includes: determining the unregistered multi-view frames and the registered multi-view frames in the data association undirected graph, wherein the registered multi-view frames include at least the initial multi-view frame and the multi-view frame set; determining, based on the unregistered multi-view frames, a multi-view frame to be registered, and registering the multi-view frame to be registered to the initial map; and, if all multi-view frames of the data association undirected graph have been registered to the initial map, generating the 3D scene reconstruction map based on the 3D point cloud model corresponding to the initial map.

3. The multi-camera scene reconstruction method based on undirected graphs as described in claim 2, characterized in that the step of determining the multi-view frame to be registered based on the unregistered multi-view frames and registering the multi-view frame to be registered to the initial map includes: constructing an association weight set from the unregistered multi-view frames that have a matching relationship with the registered multi-view frames in the data association undirected graph, the association weight set containing the pairwise association weight between each registered multi-view frame and each such unregistered multi-view frame; deleting the graph nodes corresponding to the registered multi-view frames from the data association undirected graph and generating a target graph node in the data association undirected graph, the target graph node being connected to the nodes to be registered based on the association weight set; determining, as the multi-view frame to be registered, the target multi-view frame corresponding to the graph node with the highest weight value among the nodes connected to the target graph node; and registering the multi-view frame to be registered to the initial map.

4. The multi-camera scene reconstruction method based on undirected graphs as described in claim 3, characterized in that, after the step of determining the multi-view frame to be registered based on the unregistered multi-view frames and registering the multi-view frame to be registered to the initial map, the multi-camera scene reconstruction method based on undirected graphs further includes: determining that there are other multi-view frames associated with the multi-view frame to be registered; and marking the multi-view frame to be registered and the other multi-view frames as registered multi-view frames.

5. The multi-camera scene reconstruction method based on undirected graphs as described in claim 2, characterized in that the step of registering the multi-view frame to be registered to the initial map includes: calculating, based on a 2D-to-3D point matching method, the relative pose between the multi-view frame to be registered and the initial map, and updating the relative pose; triangulating, based on the camera extrinsic parameters of the multi-view camera, the overlapping feature points of the multi-view frame to be registered to obtain 3D point cloud data; and converting, based on the relative pose, the 3D point cloud data into the coordinate system corresponding to the initial map to complete the registration of the multi-view frame.

6. The multi-camera scene reconstruction method based on undirected graphs as described in claim 2, characterized in that, after the step of determining the multi-view frame to be registered based on the unregistered multi-view frames and registering the multi-view frame to be registered to the initial map, the multi-camera scene reconstruction method based on undirected graphs further includes: if the number of currently registered multi-view frames is greater than a preset number, performing global bundle adjustment on the registered multi-view frames of the initial map, wherein the extrinsic parameter relationships within each multi-view frame are kept fixed.

7. The multi-camera scene reconstruction method based on undirected graphs as described in claim 1, characterized in that the step of constructing an initial map based on the multi-view frame set and the local 3D point cloud includes: calculating a set of relative poses between each multi-view frame to be processed in the multi-view frame set and the initial multi-view frame; triangulating, based on the camera extrinsic parameters of the multi-view camera, each multi-view frame to be processed to obtain 3D points, and generating point cloud data from the valid 3D points; and converting, based on the relative pose of the multi-view frame to be processed in the relative pose set, the point cloud data into the coordinate system corresponding to the local 3D point cloud to obtain the initial map.

8. A scene reconstruction device, characterized in that the scene reconstruction device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the multi-camera scene reconstruction method based on undirected graphs as described in any one of claims 1 to 7.

9. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the multi-camera scene reconstruction method based on undirected graphs as described in any one of claims 1 to 7.

10. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the steps of the multi-camera scene reconstruction method based on undirected graphs as described in any one of claims 1 to 7.
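The incremental registration of claims 2, 3 and 6, together with the coordinate conversion of claims 5 and 7, can be sketched as follows. This is a hedged Python illustration, not the claimed implementation. It assumes that the registered frames collapse into one target node whose weight to each unregistered frame is the sum of that frame's match counts to the registered frames (the claims do not fix this aggregation rule), and that the relative pose (R, t) maps frame coordinates into map coordinates. The names `merged_weights`, `registration_order`, `to_map_frame`, `ba_min_frames` and `bundle_adjust` are hypothetical. Pose estimation, triangulation and the bundle adjustment itself are left as placeholders.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[int, Dict[int, int]]  # graph[u][v] = match count between frames u and v


def merged_weights(graph: Graph, registered: Set[int]) -> Dict[int, int]:
    """Weights from the merged target node to each unregistered neighbor.

    Assumption: the registered nodes collapse into one target node whose weight to
    an unregistered frame is the SUM of that frame's weights to registered frames.
    """
    weights: Dict[int, int] = {}
    for r in registered:
        for v, w in graph.get(r, {}).items():
            if v not in registered:
                weights[v] = weights.get(v, 0) + w
    return weights


def registration_order(graph: Graph, seed: Set[int],
                       ba_min_frames: Optional[int] = None,
                       bundle_adjust: Optional[Callable[[Set[int]], None]] = None) -> List[int]:
    """Greedy order in which unregistered frames would be registered to the map."""
    registered = set(seed)
    order: List[int] = []
    while True:
        weights = merged_weights(graph, registered)
        if not weights:
            break  # no unregistered frame is connected to the registered set
        nxt = min(weights, key=lambda f: (-weights[f], f))  # highest weight, then smallest id
        # Placeholder: 2D-3D pose estimation, triangulation and map update go here.
        registered.add(nxt)
        order.append(nxt)
        if (bundle_adjust is not None and ba_min_frames is not None
                and len(registered) > ba_min_frames):
            bundle_adjust(set(registered))  # placeholder for global bundle adjustment
    return order


def to_map_frame(points: Sequence[Tuple[float, float, float]],
                 R: Sequence[Sequence[float]],
                 t: Sequence[float]) -> List[Tuple[float, ...]]:
    """Apply X_map = R X + t to each point, assuming (R, t) maps frame to map coordinates."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
            for p in points]


# Illustrative graph: frames 0, 1, 2 form the initial map; 3, 4, 5 are unregistered.
edges = {(0, 1): 120, (0, 2): 45, (1, 2): 300, (2, 3): 80,
         (0, 3): 10, (3, 4): 200, (1, 4): 5, (4, 5): 60}
graph: Graph = {}
for (u, v), w in edges.items():
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

ba_calls: List[int] = []
order = registration_order(graph, {0, 1, 2}, ba_min_frames=4,
                           bundle_adjust=lambda reg: ba_calls.append(len(reg)))
print(order, ba_calls)  # [3, 4, 5] [5, 6]
```

With the sample graph, frame 3 is registered first (merged weight 80 + 10 = 90), then frame 4 (200 + 5 = 205), then frame 5, and the bundle adjustment hook fires once more than four frames are registered. A frame with no edge to the registered set is never selected, so for a disconnected graph the "all frames registered" condition of claim 2 would never be met; a practical implementation would need separate handling for that case. The sum rule favors frames co-visible with many registered frames; taking the maximum pairwise weight would be an equally plausible reading of claim 3.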