Virtual reality scene three-dimensional reconstruction method and system based on multi-source data fusion
By using a multi-source data fusion method and system for 3D reconstruction of virtual reality scenes, the problem of data fusion and VR interaction in underground commercial spaces has been solved, achieving high-precision, dynamic 3D reconstruction and immersive display, and improving modeling timeliness and interactive performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG ZHONGKE KAIZER INFORMATION TECH CO LTD
- Filing Date
- 2025-05-09
- Publication Date
- 2026-06-26
Abstract
Description
Technical Field
[0001] This invention relates to the field of three-dimensional reconstruction technology for integrated underground commercial and trade space scenes, specifically to a method and system for three-dimensional reconstruction of virtual reality scenes based on multi-source data fusion.
Background Technology
[0002] With the accelerating pace of urbanization, the development and utilization of urban underground space is becoming increasingly widespread, especially in core urban areas. The integrated construction of underground commercial facilities and transportation hubs is gradually becoming an important means to improve land use efficiency and alleviate surface traffic pressure. These integrated underground commercial and transportation spaces have complex structures and diverse functions, involving multiple key aspects such as pedestrian flow organization, equipment layout, safe evacuation, and signage navigation, which places higher demands on spatial planning, operation and maintenance management, and emergency response.
[0003] Currently, the management and visualization of urban underground spaces mainly rely on two-dimensional drawings, BIM models, or traditional 3D modeling methods. While these methods are feasible for static displays or during the construction phase, they face significant challenges: reliance on a single type of data, high structural complexity, difficulty of updating and maintenance, and a poor interactive experience. In recent years, the rapid development of technologies such as artificial intelligence, 3D reconstruction, and virtual reality has provided new ideas for achieving high-precision, dynamically updated 3D modeling and immersive displays of underground spaces. However, existing technologies still lack a complete system for integrated underground commercial and trade spaces that combines multi-source heterogeneous data fusion, structural semantic modeling, and VR interactive output. In particular, effective solutions have not yet been developed for spatial dynamic recognition, structural status assessment, and multi-role immersive interaction.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention provides a method and system for three-dimensional reconstruction of virtual reality scenes based on multi-source data fusion, in order to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention is implemented through the following technical solution: a method and system for three-dimensional reconstruction of virtual reality scenes based on multi-source data fusion, including a data acquisition module, a data processing module, an AI reconstruction fusion model establishment module, a structural reconstruction monitoring module, a fusion monitoring module, a VR interaction monitoring module, and a real scene output module;
[0006] The data acquisition module is used to synchronously acquire multi-source heterogeneous data through various devices in an underground commercial and trade integrated space scenario, including: point cloud and image information, device pose and positioning data, semantic visual elements and text information;
[0007] The data processing module is used to perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; to fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; and to extract semantic elements and encode and label them using target detection, semantic segmentation, and OCR recognition technologies to establish the third dataset.
[0008] The AI reconstruction fusion model building module is used to train and extract features based on a three-dimensional convolutional neural network, combining spatial structure fusion data and semantic annotation data, to build an AI reconstruction fusion model for three-dimensional model generation and multimodal feature analysis, and to support the subsequent evaluation and strategy output of structural integrity, fusion consistency and VR interaction adaptability.
[0009] The structural reconstruction monitoring module is used to monitor the geometric integrity and density of the three-dimensional reconstruction model, calculate and obtain the structural integrity evaluation coefficient JGPG, and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete. If it is incomplete, a strategy is given.
[0010] The fusion monitoring module is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information, calculate and obtain the fusion consistency evaluation coefficient RHPG, and compare and analyze it with the second threshold Q2 to determine whether the fusion error rate is qualified. If it is not qualified, a strategy is given.
[0011] The VR interaction monitoring module is used to monitor the visualization, interaction performance and rendering smoothness of the 3D model in the virtual reality environment in real time, calculate and obtain the VR interaction adaptability evaluation coefficient VRJH, and compare and analyze it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements. If it does not meet the requirements, a strategy is given.
[0012] The real-world scene output module is used to perform adaptability optimization and process instruction generation on the 3D reconstruction model based on the evaluation results of structural integrity, fusion consistency and VR interaction adaptability and corresponding strategies, so as to realize the final adjustment and output encapsulation of the model.
[0013] Preferably, the data acquisition module includes a spatial structure image acquisition unit, a pose positioning information acquisition unit, and a scene semantic annotation information acquisition unit;
[0014] The spatial structure image acquisition unit is used to acquire point cloud data of the underground commercial space scene by installing a three-dimensional LiDAR device, including the structural outlines of walls, columns, corridors and escalators; to acquire depth maps and synchronous color image data by installing an RGB-D camera device, forming dense structure and texture alignment information; and to acquire wide-angle environmental images and key visual feature area images by using an inspection robot equipped with a high-definition wide-angle camera.
[0015] The pose positioning information acquisition unit is used to acquire the three-axis acceleration, angular velocity and attitude information of the acquisition device by integrating an inertial measurement device, for SLAM synchronization and sensor registration; and to acquire the position coordinates in the underground enclosed space by deploying a UWB positioning device.
[0016] The scene semantic annotation information acquisition unit is used to acquire semantic visual object images of shop fronts, directional signs, and entrance / exit numbers by deploying image recognition cameras; and to acquire text content, route markings, and notices on subway station signs by installing OCR recognition cameras.
[0017] Preferably, the data processing module includes a first data processing unit, a second data processing unit, and a third data processing unit;
[0018] The first data processing unit is used to perform time-stamp alignment and spatial coordinate transformation on LiDAR point cloud, RGB-D depth map and IMU pose data using time synchronization and coordinate unification technology; to remove noise and redundancy from structural point cloud data using point cloud filtering and simplification algorithms; and to combine depth map, color image and point cloud using image depth fusion and voxel stitching algorithms to form a dense three-dimensional structural representation with consistent texture and geometry, and to establish the first dataset.
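The point cloud simplification step described above can be illustrated with a voxel-grid downsample. This is a minimal sketch of one common simplification technique, not necessarily the algorithm used by the invention; the function name `voxel_downsample` and the `voxel_size` parameter are hypothetical.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel with their centroid.

    points: iterable of (x, y, z) tuples in a common coordinate frame.
    voxel_size: edge length of the cubic voxels (hypothetical parameter).
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index of the point along each axis.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))
    simplified = []
    for key in sorted(buckets):  # sorted only to make the output order deterministic
        pts = buckets[key]
        n = len(pts)
        simplified.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return simplified
```

A larger `voxel_size` removes more redundancy at the cost of geometric detail, so in practice it would be tuned to the sensor resolution.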
[0019] The second data processing unit is used to estimate the device trajectory and pose information by using IMU and point cloud data combined with SLAM technology; to construct a continuous and verifiable trajectory line by fusing UWB positioning data; to perform temporal and spatial registration between sensors by using a multi-sensor calibration algorithm; and to establish a second dataset.
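The idea of correcting an IMU-derived trajectory with UWB position fixes can be sketched with a simple complementary filter. This is a stand-in for illustration only: it is not a SLAM implementation, it assumes 2D positions, and it assumes velocities have already been integrated from the IMU accelerations. The names `fuse_trajectory` and `gain` are hypothetical.

```python
def fuse_trajectory(start, velocities, uwb_fixes, dt, gain=0.3):
    """Dead-reckon from IMU-derived velocities and pull toward UWB fixes.

    start: initial (x, y) position.
    velocities: list of (vx, vy), one per time step (assumed pre-integrated from IMU data).
    uwb_fixes: dict mapping step index (1-based) to an absolute (x, y) UWB position.
    dt: time step in seconds.
    gain: fraction of the UWB residual applied at each fix (assumed tuning value).
    """
    x, y = start
    track = [(x, y)]
    for step, (vx, vy) in enumerate(velocities, start=1):
        # Prediction: integrate velocity over one time step.
        x += vx * dt
        y += vy * dt
        # Correction: blend in the UWB fix when one exists for this step.
        fix = uwb_fixes.get(step)
        if fix is not None:
            x += gain * (fix[0] - x)
            y += gain * (fix[1] - y)
        track.append((x, y))
    return track
```

The absolute UWB fixes bound the drift that pure integration accumulates, which is what makes the resulting trajectory checkable against an independent reference.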
[0020] The third data processing unit is used to extract semantic targets such as shop signs, directional signs, and entrance / exit signs using the YOLO-Nano object detection algorithm; to obtain scene semantic masks and label maps using DeepLabv3 semantic segmentation technology; to recognize Chinese or English route instructions, numbers, and notices in the image using Tesseract OCR technology; to perform spatial annotation and category encoding on all semantic information, generate a structured semantic graph and an interactive element index table, and establish a third dataset.
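The category encoding and interactive element index table mentioned above might be organized as follows. The record fields, the category codes, and the function name `build_index` are all assumptions made for illustration; the source does not specify the table layout.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical category codes; the source does not define the encoding.
CATEGORY_CODES = {"shop_sign": 1, "directional_sign": 2, "entrance_exit_sign": 3}


@dataclass(frozen=True)
class SemanticElement:
    element_id: str
    category: str      # one of the CATEGORY_CODES keys
    position: tuple    # (x, y, z) in the unified scene frame
    text: str = ""     # OCR result, if any


def build_index(elements):
    """Group element ids by category code to form a simple index table."""
    index = defaultdict(list)
    for element in elements:
        index[CATEGORY_CODES[element.category]].append(element.element_id)
    return dict(index)
```

For example, two shop signs and one entrance sign would yield an index with the two shop-sign ids under code 1 and the entrance sign id under code 3.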
[0021] Preferably, the AI reconstruction fusion model building module is used to construct an initial 3D convolutional neural network model using a 3D convolutional neural network, and to train and test the initial 3D convolutional neural network model with data from the first dataset, the second dataset, and the third dataset. The trained initial 3D convolutional neural network model is then used as the AI reconstruction fusion model. Simultaneously, the intermediate layer output of the spatial structure fusion data and the scene semantic annotation data is used as a feature vector to identify structural features and semantic information. The acquired feature information is then used to train and test the AI reconstruction fusion model. The trained AI reconstruction fusion model is then used for 3D model construction and multimodal feature analysis to support structural integrity assessment, fusion consistency assessment, and VR interactive adaptive calculation and strategy output.
[0022] Preferably, the structural reconstruction monitoring module includes a first computing unit and a first analysis unit;
[0023] The first computing unit is used to monitor the geometric integrity and density of the 3D reconstructed model. Combined with the first dataset, after dimensionless processing, the structural integrity evaluation coefficient JGPG is calculated, as shown in the following formula:
[0024] (Formula for JGPG; not reproduced in this text.)
[0025] In the formula, one term represents the sum of the areas of the closed structural surfaces and another represents the total surface area of the model; Vt represents the number of reconstructed voxels; and w1 and w2 represent the weighting coefficients.
[0026] Preferably, the first analysis unit is used to pre-set a first threshold Q1 and compare the structural integrity evaluation coefficient JGPG with the first threshold Q1 to obtain a first evaluation result, including:
[0027] When the structural integrity evaluation coefficient JGPG ≥ the first threshold Q1, it indicates that the structural reconstruction is complete, and monitoring continues.
[0028] When the structural integrity evaluation coefficient JGPG < the first threshold Q1, it indicates that the structural reconstruction is incomplete. This triggers the first warning instruction and generates the first strategy: re-collect data for the missing area, and call the hole detection algorithm and depth inference technology to complete the shape.
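The source does not say which hole detection algorithm is called. One widely used approach for triangle meshes, sketched here under that assumption, is to find boundary edges: an edge that belongs to only one triangle lies on the rim of a hole. The function names are hypothetical.

```python
from collections import Counter


def boundary_edges(triangles):
    """Return the sorted edges that are used by exactly one triangle.

    triangles: iterable of (i, j, k) vertex-index triples.
    """
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with ordered endpoints so (u, v) == (v, u).
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n == 1)


def is_closed(triangles):
    """A mesh with no boundary edges has no open holes."""
    return not boundary_edges(triangles)
```

A closed tetrahedron has no boundary edges; removing one of its four faces exposes the three edges of the missing face, which marks the region to re-collect or complete.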
[0029] Preferably, the fusion monitoring module includes a second computing unit and a second analysis unit;
[0030] The second computing unit is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information. Combining the data from the first, second, and third datasets, after dimensionless processing, the fusion consistency evaluation coefficient RHPG is calculated and obtained, as shown in the following formula:
[0031] (Formula for RHPG; not reproduced in this text.)
[0032] In the formula, Oe represents the semantic edge overlap rate, which is the proportion of the overlap length between the semantic boundary line of the image and the geometric edge line of the point cloud; Ef represents the mean fusion error, which is the spatial offset distance after the point cloud and the image are aligned; and w3 and w4 represent the weight coefficients.
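The two inputs defined above can be computed as in the following sketch. It assumes that matched point pairs are already available for Ef and that Oe is normalized by the total semantic boundary length; the source does not state the denominator, and how Oe and Ef are combined into RHPG is not shown here because the formula is not reproduced in the text. The function names are hypothetical.

```python
from math import dist


def mean_fusion_error(cloud_points, image_points):
    """Ef: mean spatial offset between matched point-cloud and image-derived points."""
    if not cloud_points or len(cloud_points) != len(image_points):
        raise ValueError("expected two non-empty, equally long lists of matched points")
    total = sum(dist(p, q) for p, q in zip(cloud_points, image_points))
    return total / len(cloud_points)


def edge_overlap_rate(overlap_length, semantic_boundary_length):
    """Oe: overlap length as a share of the semantic boundary length (assumed denominator)."""
    if semantic_boundary_length <= 0:
        raise ValueError("semantic_boundary_length must be positive")
    return min(1.0, overlap_length / semantic_boundary_length)
```

Since Ef is a distance and Oe is a ratio, Ef would need to be made dimensionless (for example, scaled by a tolerance) before the two are weighted together.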
[0033] The second analysis unit is used to pre-set a second threshold Q2 and compare the fusion consistency evaluation coefficient RHPG with the second threshold Q2 to obtain the second evaluation result, including:
[0034] When the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2, it indicates that the fusion error rate is qualified, the registration is successful, and continuous monitoring is required.
[0035] When the fusion consistency evaluation coefficient RHPG < the second threshold Q2, it indicates that the fusion error rate is unqualified, the registration fails, the second warning instruction is triggered, and the second strategy is generated: automatically correct the image semantic annotation deviation, perform point cloud re-registration and trajectory reconstruction, and recalculate until the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2.
[0036] Preferably, the VR interaction monitoring module includes a third computing unit and a third analysis unit;
[0037] The third computing unit is used to analyze the adaptability of the 3D reconstructed model in the immersive virtual reality scene, and to monitor the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment in real time. Combining the data from the first and third datasets, after dimensionless processing, the VR interaction adaptability evaluation coefficient VRJH is calculated and obtained, as shown in the following formula:
[0038] (Formula for VRJH; not reproduced in this text.)
[0039] (Formula for Fs; not reproduced in this text.)
[0040] In the formula, Rr represents the visualization degree of the interactive viewpoint, Fs represents the frame rate stability coefficient, Rc represents the semantic interactive object recognition response completeness rate, and w5, w6, and w7 represent weight coefficients. In the expression for Fs, the two remaining symbols indicate the standard deviation and the average frame rate;
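The formulas for Fs and VRJH are not reproduced in this text, so the following is only a sketch under two loudly assumed forms: Fs is taken as one minus the ratio of the frame-rate standard deviation to the average frame rate, and VRJH is taken as a weighted sum of Rr, Fs, and Rc. Both forms are guesses consistent with the symbol definitions, not the patent's stated equations.

```python
from statistics import mean, pstdev


def frame_rate_stability(frame_rates):
    """Fs under the ASSUMED form 1 - (standard deviation / average frame rate)."""
    average = mean(frame_rates)
    if average <= 0:
        raise ValueError("average frame rate must be positive")
    # Clamp at zero so extremely unstable traces do not go negative.
    return max(0.0, 1.0 - pstdev(frame_rates) / average)


def vr_adaptability(rr, fs, rc, w5, w6, w7):
    """VRJH under the ASSUMED weighted-sum form w5*Rr + w6*Fs + w7*Rc."""
    return w5 * rr + w6 * fs + w7 * rc
```

Under these assumptions a perfectly steady frame rate gives Fs = 1, and with weights that sum to 1 and inputs in [0, 1] the coefficient also stays in [0, 1], which makes comparison with a fixed threshold Q3 straightforward.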
[0041] The third analysis unit is used to pre-set a third threshold Q3, and compare the VR interaction adaptability evaluation coefficient VRJH with the third threshold Q3 to obtain the third evaluation result, including:
[0042] When the VR interaction adaptability evaluation coefficient VRJH ≥ the third threshold Q3, it means that the 3D model meets the VR interaction performance requirements, there are no potential interaction problems in VR operation, and continuous monitoring is required.
[0043] When the VR interaction adaptability evaluation coefficient VRJH < the third threshold Q3, it indicates that the 3D model does not meet the VR interaction performance requirements and that there are potential interaction problems in VR operation. This triggers the third warning instruction and generates the third strategy: automatically analyze the occluded areas in the user's view path, adjust the POV parameters, and make the invisible areas transparent; apply local textures to interactive hotspot areas, increasing the resolution by 20%, and automatically switch the level of detail (LOD) based on model complexity and device performance, improving frame rate stability by 10% and loading efficiency by 10%; and, for semantic interaction objects in the current model that respond slowly or are insufficiently recognized (including buttons, door signs, and prompts), re-identify them and optimize their interactive areas, improving recognition confidence by 20%.
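Switching the level of detail from model complexity and device performance can be sketched as picking the most detailed level that fits a device budget. The detail ratios, the triangle-count budget, and the function name `select_lod` are hypothetical; the source does not specify how the switch is decided.

```python
def select_lod(triangle_count, device_budget, levels=(1.0, 0.5, 0.25, 0.1)):
    """Return the highest detail ratio whose triangle count fits the device budget.

    triangle_count: triangles in the full-detail model (model complexity).
    device_budget: triangles the target device can render smoothly (assumed metric).
    levels: candidate detail ratios, ordered from most to least detailed.
    """
    for ratio in levels:
        if triangle_count * ratio <= device_budget:
            return ratio
    # Nothing fits: fall back to the coarsest level available.
    return levels[-1]
```

For a model of 1,000,000 triangles and a budget of 600,000, the full model does not fit, so the sketch selects the 0.5 level.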
[0044] Preferably, the real-world scene output module is used to optimize the adaptability of the 3D reconstruction model and generate results based on the judgment results of the structural integrity evaluation coefficient JGPG, the fusion consistency evaluation coefficient RHPG, and the VR interaction adaptability evaluation coefficient VRJH, combined with the corresponding generated strategies, so as to complete the final adjustment and output encapsulation of the model.
[0045] A preferred method for 3D reconstruction of virtual reality scenes based on multi-source data fusion includes the following steps:
[0046] Step 1: In the underground commercial and trade integrated space scenario, complete the synchronous collection of multi-source heterogeneous data through various devices, including: point cloud and image information, device pose and positioning data, semantic visual elements and text information;
[0047] Step 2: Perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; use target detection, semantic segmentation, and OCR recognition technologies to extract semantic elements and encode and label them to establish the third dataset.
[0048] Step 3: Based on a 3D convolutional neural network, train and extract features by combining spatial structure fusion data and semantic annotation data to build an AI reconstruction fusion model for 3D model generation and multimodal feature analysis, and support subsequent evaluation and strategy output for structural integrity, fusion consistency and VR interaction adaptability.
[0049] Step 4: By monitoring the geometric integrity and density of the 3D reconstructed model, calculate the structural integrity evaluation coefficient JGPG and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete. If it is incomplete, a strategy is given.
[0050] Step 5: By monitoring the spatial registration effect and semantic matching degree between the point cloud structure and semantic image information, calculate and obtain the fusion consistency evaluation coefficient RHPG, and compare it with the second threshold Q2 to determine whether the fusion error rate is qualified. If it is not qualified, a strategy is given.
[0051] Step 6: By monitoring the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment in real time, calculate and obtain the VR interaction adaptability evaluation coefficient VRJH, and compare it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements. If it does not meet the requirements, a strategy will be given.
[0052] Step 7: Based on the assessment results of structural integrity, fusion consistency, and VR interaction adaptability, and the corresponding strategies, optimize the 3D reconstruction model for adaptability and generate processing instructions to achieve the final adjustment and output encapsulation of the model.
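Steps 4 to 7 amount to three threshold checks whose failures each contribute a strategy before output. A minimal sketch of that gating logic follows; the function name, the report layout, and the shortened strategy strings are illustrative assumptions, and the coefficient values are taken as already computed.

```python
def evaluate_model(jgpg, rhpg, vrjh, q1, q2, q3):
    """Compare each coefficient with its threshold and collect strategies for failures."""
    checks = [
        ("structure", jgpg, q1, "re-collect missing areas and complete holes"),
        ("fusion", rhpg, q2, "correct annotation deviation and re-register the point cloud"),
        ("vr_interaction", vrjh, q3, "adjust viewpoint, textures, LOD, and semantic hotspots"),
    ]
    report = {}
    for name, value, threshold, strategy in checks:
        passed = value >= threshold
        report[name] = {"passed": passed, "strategy": None if passed else strategy}
    # The model is ready for output encapsulation only when all three checks pass.
    report["ready_for_output"] = all(entry["passed"] for entry in report.values())
    return report
```

Keeping the three checks in one table makes it easy to see which strategy a failed check triggers and to rerun the evaluation after the strategy has been applied.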
[0053] This invention provides a method and system for 3D reconstruction of virtual reality scenes based on multi-source data fusion. It has the following beneficial effects:
[0054] (1) The virtual reality scene three-dimensional reconstruction method and system based on multi-source data fusion can realize high-precision three-dimensional reconstruction of underground commercial and trade integrated space by integrating point cloud, image, IMU, UWB and semantic visual information. It can dynamically reflect structural changes and environmental updates, improve the timeliness and accuracy of spatial modeling, and overcome the technical bottlenecks of traditional two-dimensional drawings and static BIM models that are difficult to update and have poor expression.
[0055] (2) The virtual reality scene 3D reconstruction method and system based on multi-source data fusion has constructed a unified data fusion interface and format conversion module, which can perform standardized preprocessing of heterogeneous data such as point cloud, image, inertial navigation, positioning and semantic label, and adapt to a variety of 3D engines and VR platforms, realize high compatibility output for different terminal devices, and improve the system’s universality and deployment flexibility in different application scenarios.
[0056] (3) The virtual reality scene 3D reconstruction method and system based on multi-source data fusion adopts 3D convolutional neural network and multimodal feature fusion technology, which can deeply integrate structural geometry and semantic annotation information for modeling. Through the fusion consistency evaluation mechanism RHPG, it ensures the high matching degree between semantic boundaries and structural features in the 3D model, providing a clear semantic expression basis for subsequent navigation guidance, safety management and other applications.
[0057] (4) The virtual reality scene 3D reconstruction method and system based on multi-source data fusion dynamically analyzes the visibility, frame rate stability, and semantic response of the model in VR through the VR interaction adaptability evaluation mechanism VRJH, automatically adjusts the interactive perspective and detail rendering strategy, effectively improves the immersive visualization experience, and meets the requirements of different user roles for smooth operation and efficient interaction in VR scenes.
Attached Figure Description
[0058] Figure 1 is a block diagram of the virtual reality scene 3D reconstruction system based on multi-source data fusion according to the present invention.
[0059] Figure 2 is a schematic diagram of the steps of the virtual reality scene 3D reconstruction method based on multi-source data fusion according to the present invention.
Detailed Implementation
[0060] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0061] Embodiment 1
[0062] Referring to Figure 1, this invention provides a virtual reality scene 3D reconstruction system based on multi-source data fusion, including a data acquisition module, a data processing module, an AI reconstruction fusion model establishment module, a structural reconstruction monitoring module, a fusion monitoring module, a VR interaction monitoring module, and a real scene output module.
[0063] The data acquisition module is used to synchronously acquire multi-source heterogeneous data through various devices in an underground commercial and trade integrated space scenario, including: point cloud and image information, device pose and positioning data, semantic visual elements and text information;
[0064] The data processing module is used to perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; to fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; and to extract semantic elements and encode and label them using target detection, semantic segmentation, and OCR recognition technologies to establish the third dataset.
[0065] The AI reconstruction fusion model building module is used to train and extract features based on a three-dimensional convolutional neural network, combining spatial structure fusion data and semantic annotation data, to build an AI reconstruction fusion model for three-dimensional model generation and multimodal feature analysis, and to support the subsequent evaluation and strategy output of structural integrity, fusion consistency and VR interaction adaptability.
[0066] The structural reconstruction monitoring module is used to monitor the geometric integrity and density of the three-dimensional reconstruction model, calculate and obtain the structural integrity evaluation coefficient JGPG, and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete. If it is incomplete, a strategy is given.
[0067] The fusion monitoring module is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information, calculate and obtain the fusion consistency evaluation coefficient RHPG, and compare and analyze it with the second threshold Q2 to determine whether the fusion error rate is qualified. If it is not qualified, a strategy is given.
[0068] The VR interaction monitoring module is used to monitor the visualization, interaction performance and rendering smoothness of the 3D model in the virtual reality environment in real time, calculate and obtain the VR interaction adaptability evaluation coefficient VRJH, and compare and analyze it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements. If it does not meet the requirements, a strategy is given.
[0069] The real-world scene output module is used to perform adaptability optimization and process instruction generation on the 3D reconstruction model based on the evaluation results of structural integrity, fusion consistency and VR interaction adaptability and corresponding strategies, so as to realize the final adjustment and output encapsulation of the model.
[0070] In this embodiment, through modular multi-source data acquisition and processing, and an AI reconstruction fusion and monitoring mechanism, high-precision, real-time 3D reconstruction and VR adaptation of an integrated underground commercial space can be achieved. Specifically, the data acquisition module can simultaneously acquire multiple data sources, and the data processing module performs precise spatiotemporal alignment and fusion processing on the data, ensuring the high precision and high consistency of the reconstructed model. Simultaneously, the structural reconstruction monitoring, fusion monitoring, and VR interaction monitoring modules can evaluate the geometric integrity of the model, data fusion consistency, and interactive adaptability in real time, ensuring the visualization and interactive performance of the 3D reconstructed model in the virtual reality environment. Ultimately, this achieves the output of a virtual scene of an integrated underground commercial space with high precision, high adaptability, and high interactivity.
[0071] Embodiment 2
[0072] This embodiment builds on Embodiment 1. Referring to Figure 1, the data acquisition module specifically includes a spatial structure image acquisition unit, a pose positioning information acquisition unit, and a scene semantic annotation information acquisition unit;
[0073] The spatial structure image acquisition unit is used to acquire point cloud data of the underground commercial space scene by installing a three-dimensional LiDAR device, including the structural outlines of walls, columns, corridors and escalators; to acquire depth maps and synchronous color image data by installing an RGB-D camera device, forming dense structure and texture alignment information; and to acquire wide-angle environmental images and key visual feature area images by using an inspection robot equipped with a high-definition wide-angle camera.
[0074] The pose positioning information acquisition unit is used to acquire the three-axis acceleration, angular velocity and attitude information of the acquisition device by integrating an inertial measurement device, for SLAM synchronization and sensor registration; and to acquire the position coordinates in the underground enclosed space by deploying a UWB positioning device.
[0075] The scene semantic annotation information acquisition unit is used to acquire semantic visual object images of shop fronts, directional signs, and entrance / exit numbers by deploying image recognition cameras; and to acquire text content, route markings, and notices on subway station signs by installing OCR recognition cameras.
[0076] In this embodiment, by integrating multiple advanced data acquisition devices, comprehensive and multi-dimensional data acquisition of the underground commercial space was achieved. The spatial structure image acquisition unit, through 3D LiDAR, RGB-D cameras, and inspection robots, precisely acquired point clouds, depth maps, and environmental images, ensuring the dense structure and texture alignment of the scene and providing rich geometric information for subsequent 3D reconstruction. The pose positioning information acquisition unit, through inertial measurement units and UWB positioning devices, accurately acquired device position and attitude information, effectively supporting SLAM synchronization and sensor registration, ensuring high data accuracy and consistency. The scene semantic annotation information acquisition unit, through image recognition and OCR technology, accurately extracted semantic information such as shop entrances, signs, and entrance / exit numbers, enriching the semantic layers of the scene and further enhancing the semantic understanding and realism of the 3D reconstruction model. This high-precision, multi-layered data acquisition and integration laid a solid foundation for subsequent 3D reconstruction and virtual reality adaptation.
[0077] Embodiment 3
[0078] This embodiment builds on Embodiment 2. Referring to Figure 1, the data processing module specifically includes a first data processing unit, a second data processing unit, and a third data processing unit;
[0079] The first data processing unit is used to perform time-stamp alignment and spatial coordinate transformation on LiDAR point cloud, RGB-D depth map and IMU pose data using time synchronization and coordinate unification technology; to remove noise and redundancy from structural point cloud data using point cloud filtering and simplification algorithms; and to combine depth map, color image and point cloud using image depth fusion and voxel stitching algorithms to form a dense three-dimensional structural representation with consistent texture and geometry, and to establish the first dataset.
[0080] The second data processing unit is used to estimate the device trajectory and pose information by using IMU and point cloud data combined with SLAM technology; to construct a continuous and verifiable trajectory line by fusing UWB positioning data; to perform temporal and spatial registration between sensors by using a multi-sensor calibration algorithm; and to establish a second dataset.
[0081] The third data processing unit is used to extract semantic targets such as shop signs, directional signs, and entrance / exit signs using the YOLO-Nano object detection algorithm; to obtain scene semantic masks and label maps using DeepLabv3 semantic segmentation technology; to recognize Chinese or English route instructions, numbers, and notices in the image using Tesseract OCR technology; to perform spatial annotation and category encoding on all semantic information, generate a structured semantic graph and an interactive element index table, and establish a third dataset.
[0082] In this embodiment, efficient integration and optimization of underground commercial space data are achieved through multi-layered data processing units. The first data processing unit uses time synchronization and coordinate unification technology to align the LiDAR point cloud, RGB-D depth map, and IMU pose data, and removes noise and redundancy through point cloud filtering and simplification. Through image depth fusion and voxel stitching technology, a dense 3D structural representation with consistent texture and geometry is formed, providing high-quality foundational data for subsequent reconstruction. The second data processing unit, through joint IMU and point cloud SLAM technology fused with UWB positioning data, estimates the device trajectory and pose information, ensuring the continuity and verifiability of the entire data acquisition process and providing reliable support for accurate 3D spatial modeling. The third data processing unit, using YOLO-Nano object detection, DeepLabv3 semantic segmentation, and Tesseract OCR text recognition technology, extracts semantic targets, labels, and text information from the scene, performs spatial annotation and category encoding, and generates a structured semantic graph and an interactive element index table, providing complete semantic data support for subsequent 3D model construction and virtual reality adaptation.
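How UWB positioning data can keep an IMU-derived trajectory from drifting is illustrated below. The text does not specify the fusion algorithm, so this sketch substitutes a simple complementary filter in two dimensions; the function name `fuse_trajectory` and the parameter `uwb_weight` are assumptions made for illustration, and a production system would more plausibly use a Kalman filter or pose-graph optimization.

```python
def fuse_trajectory(start, imu_displacements, uwb_fixes, uwb_weight=0.3):
    """Dead-reckon from IMU-derived displacements and pull each pose toward
    the UWB fix when one is available (complementary filter, assumed here)."""
    x, y = start
    track = []
    for (dx, dy), fix in zip(imu_displacements, uwb_fixes):
        # Prediction: integrate the IMU-derived displacement.
        x, y = x + dx, y + dy
        # Correction: blend toward the absolute UWB position, if present.
        if fix is not None:
            x += uwb_weight * (fix[0] - x)
            y += uwb_weight * (fix[1] - y)
        track.append((x, y))
    return track


# Hypothetical run: two 1 m steps along x, with a UWB fix only at the second step.
track = fuse_trajectory(
    start=(0.0, 0.0),
    imu_displacements=[(1.0, 0.0), (1.0, 0.0)],
    uwb_fixes=[None, (2.4, 0.0)],
    uwb_weight=0.5,
)
```

A larger `uwb_weight` trusts the UWB fixes more and suppresses drift faster, at the cost of passing more UWB noise into the trajectory.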
[0083] Example 4
[0084] This embodiment builds on Embodiment 3. Referring to Figure 1, the AI reconstruction and fusion model building module is used to construct an initial 3D convolutional neural network model using a 3D convolutional neural network. The initial 3D convolutional neural network model is trained and tested using data from the first, second, and third datasets. The trained initial 3D convolutional neural network model is then used as the AI reconstruction and fusion model. Simultaneously, the intermediate layer output of the spatial structure fusion data and scene semantic annotation data is used as a feature vector to identify structural features and semantic information. The acquired feature information is then used to train and test the AI reconstruction and fusion model. The trained AI reconstruction and fusion model is then used for 3D model construction and multimodal feature analysis to support structural integrity assessment, fusion consistency assessment, and VR interaction adaptive calculation and strategy output.
[0085] In this embodiment, an AI reconstruction fusion model is constructed using a 3D convolutional neural network (3D CNN), achieving efficient fusion of 3D structural and semantic features. By combining spatial structure fusion data with scene semantic annotation data, the model can extract rich structural features and semantic information from multimodal information. The feature vectors output from intermediate layers further optimize the training process, improving the accuracy and detail of 3D reconstruction. The trained AI reconstruction fusion model not only accurately constructs 3D models but also performs multimodal feature analysis, providing strong support for evaluating structural integrity, fusion consistency, and VR interactive adaptability. Based on the evaluation results, it outputs corresponding strategies, effectively improving the model's adaptability and interactive performance, ensuring high-quality reconstruction and virtual reality adaptation of the underground commercial space.
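The core operation of a 3D convolutional layer, applied to a voxel grid, can be sketched as below. This shows only a single-channel forward pass with a ReLU activation in plain Python; it is not the network described in this embodiment, whose architecture, channel counts, and training procedure are not given in the text. The function name `conv3d_valid` and the example values are hypothetical.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a voxel grid (no padding, stride 1) and apply ReLU.

    `volume` and `kernel` are nested lists indexed as [depth][height][width].
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            acc += volume[z + i][y + j][x + k] * kernel[i][j][k]
                row.append(max(acc, 0.0))  # ReLU activation
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical input: a fully occupied 3x3x3 grid and a 2x2x2 averaging kernel.
occupancy = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
averaging = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]
features = conv3d_valid(occupancy, averaging)
```

The intermediate outputs of stacked layers of this kind are what the embodiment refers to as feature vectors; a real model would use an optimized tensor library rather than nested loops.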
[0086] Example 5
[0087] This embodiment builds on Embodiment 4. Referring to Figure 1, the structure reconstruction monitoring module includes a first computing unit and a first analysis unit;
[0088] The first computing unit is used to monitor the geometric integrity and density of the 3D reconstructed model. Combined with the first dataset, after dimensionless processing, the structural integrity evaluation coefficient JGPG is calculated, as shown in the following formula:
[0089] [formula for JGPG not reproduced in this text];
[0090] In the formula, [symbol not reproduced] represents the sum of the areas of the closed structural surfaces; Δt represents the total surface area of the model; [symbol not reproduced] represents the number of reconstructed voxels; Vt represents the theoretical voxel range; and w1 and w2 represent weighting coefficients [conditions on the coefficients not reproduced].
[0091] In this embodiment, the first computing unit in the structural reconstruction monitoring module can monitor the geometric integrity and density of the 3D reconstructed model in real time, and improve the model's quality control capability by accurately calculating the structural integrity evaluation coefficient JGPG. This calculation method, combined with the first dataset, analyzes the model's surface area and voxel information through dimensionless processing to assess whether the reconstructed model meets the geometric integrity requirements. By adjusting the weighting coefficients w1 and w2, the evaluation accuracy can be optimized according to specific needs, thereby enabling early identification of potential defects in the structural reconstruction process and providing an accurate basis for the generation of subsequent optimization strategies.
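Because the JGPG formula itself does not survive in this text, the sketch below uses one plausible dimensionless form consistent with the stated definitions: JGPG = w1 × (closed-surface area / total surface area) + w2 × (reconstructed voxels / theoretical voxels), with w1 + w2 = 1. That form, the constraint on the weights, the clamping of each ratio to 1, and all names and numbers are assumptions, not the patented formula.

```python
def structural_integrity(closed_area, total_area,
                         voxels_reconstructed, voxels_theoretical,
                         w1=0.5, w2=0.5):
    """Assumed JGPG: weighted sum of a surface-closure ratio and a voxel-density ratio."""
    if total_area <= 0 or voxels_theoretical <= 0:
        raise ValueError("total area and theoretical voxel count must be positive")
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    # Each ratio is dimensionless and clamped to [0, 1].
    closure = min(closed_area / total_area, 1.0)
    density = min(voxels_reconstructed / voxels_theoretical, 1.0)
    return w1 * closure + w2 * density


# Hypothetical model: 90 of 100 square metres closed, 800 of 1000 voxels filled.
jgpg = structural_integrity(90.0, 100.0, 800, 1000, w1=0.6, w2=0.4)
```

Raising w1 makes the coefficient more sensitive to open or unclosed surfaces; raising w2 makes it more sensitive to sparse regions of the reconstruction.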
[0092] Example 6
[0093] This embodiment builds on Embodiment 5. Referring to Figure 1, the first analysis unit is used to pre-set a first threshold Q1, and compare the structural integrity evaluation coefficient JGPG with the first threshold Q1 to obtain a first evaluation result, including:
[0094] When the structural integrity assessment coefficient JGPG ≥ the first threshold Q1, it indicates that the structural reconstruction is complete and continuous monitoring is required.
[0095] When the structural integrity assessment coefficient JGPG < the first threshold Q1, it indicates that the structural reconstruction is incomplete, triggering the first warning instruction and generating the first strategy: re-collecting data of the missing area and calling the hole detection algorithm and depth inference technology to complete the shape.
[0096] In this embodiment, the structural integrity assessment mechanism set by the first analysis unit can automatically assess the structural integrity of the 3D reconstructed model. When the structural integrity assessment coefficient JGPG is lower than the preset first threshold Q1, the system can immediately trigger an early warning mechanism and automatically generate optimization strategies, such as re-collecting data of missing areas and combining hole detection algorithms with depth inference technology for shape completion, thereby effectively solving the structural defect problem in model reconstruction. This mechanism improves the accuracy and automation level of the reconstruction process, ensures the integrity of the 3D reconstructed model, and quickly corrects problems when they are detected, greatly improving the robustness and reconstruction efficiency of the system, as shown in the table below:
[0097] [table not reproduced in this text]
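The comparison performed by the first analysis unit can be expressed as a small decision function. The `Evaluation` record, the function name, and the example threshold of 0.8 are hypothetical; only the two branches (JGPG ≥ Q1 and JGPG < Q1) and the wording of the first strategy come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    passed: bool
    warning: Optional[str]
    strategy: Optional[str]


def evaluate_structure(jgpg, q1):
    """Compare JGPG with the first threshold Q1 and return the first evaluation result."""
    if jgpg >= q1:
        # Reconstruction is complete; monitoring simply continues.
        return Evaluation(passed=True, warning=None, strategy=None)
    return Evaluation(
        passed=False,
        warning="first warning instruction",
        strategy=("re-collect data of the missing area; call the hole detection "
                  "algorithm and depth inference technology to complete the shape"),
    )


# Hypothetical threshold Q1 = 0.8.
ok = evaluate_structure(0.86, q1=0.8)
incomplete = evaluate_structure(0.70, q1=0.8)
```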
[0098] Example 7
[0099] This embodiment builds on Embodiment 4. Referring to Figure 1, the fusion monitoring module includes a second computing unit and a second analysis unit;
[0100] The second computing unit is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information. Combining the data from the first, second, and third datasets, after dimensionless processing, the fusion consistency evaluation coefficient RHPG is calculated and obtained, as shown in the following formula:
[0101] [formula for RHPG not reproduced in this text];
[0102] In the formula, Oe represents the semantic edge overlap rate, which is the proportion of the overlap length between the semantic boundary line of the image and the geometric edge line of the point cloud; Ef represents the mean fusion error, which is the spatial offset distance after the point cloud and the image are aligned; and w3 and w4 represent the weighting coefficients [conditions on the coefficients not reproduced];
[0103] The second analysis unit is used to pre-set a second threshold Q2 and compare the fusion consistency evaluation coefficient RHPG with the second threshold Q2 to obtain the second evaluation result, including:
[0104] When the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2, it indicates that the fusion error rate is qualified, the registration is successful, and continuous monitoring is required.
[0105] When the fusion consistency evaluation coefficient RHPG < the second threshold Q2, it indicates that the fusion error rate is unqualified, the registration fails, the second warning instruction is triggered, and the second strategy is generated: automatically correct the image semantic annotation deviation, perform point cloud re-registration and trajectory reconstruction, and recalculate until the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2.
[0106] In this embodiment, the fusion consistency evaluation mechanism of the second computing unit and the second analysis unit can accurately monitor the spatial registration effect and semantic matching degree of point cloud and semantic image information. When the evaluation coefficient RHPG is lower than the preset second threshold Q2, the system automatically triggers an early warning and generates a strategy, taking measures such as automatically correcting image semantic annotation deviation, performing point cloud re-registration and trajectory reconstruction to ensure accurate fusion of image and point cloud data. This automated correction function improves the accuracy and consistency of model fusion, reduces the need for manual intervention, greatly enhances the stability and adaptability of the system, effectively avoids data deviation caused by registration failure, and ensures high-quality output of 3D reconstruction results, as shown in the table below:
[0107] [table not reproduced in this text]
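The second strategy recalculates RHPG after each correction until it reaches Q2, which is naturally a loop. Since the RHPG formula is not preserved in this text, the sketch assumes RHPG = w3 × Oe + w4 / (1 + Ef), so that a higher overlap rate and a lower mean error both raise the score. That form, the callbacks `measure` and `correct`, and the `max_rounds` safety cap are all assumptions added for illustration.

```python
def fusion_consistency(edge_overlap, mean_error, w3=0.5, w4=0.5):
    """Assumed RHPG: rewards semantic edge overlap (Oe), penalizes fusion error (Ef)."""
    return w3 * edge_overlap + w4 / (1.0 + mean_error)


def correct_until_consistent(measure, correct, q2, max_rounds=10):
    """Apply correction rounds until RHPG >= Q2.

    measure() returns the current (Oe, Ef); correct() performs one round of
    annotation correction, point cloud re-registration, and trajectory
    reconstruction. The cap on rounds is an added safeguard, not from the text.
    """
    rounds = 0
    while True:
        score = fusion_consistency(*measure())
        if score >= q2:
            return score, rounds
        if rounds == max_rounds:
            raise RuntimeError("fusion consistency did not reach Q2")
        correct()
        rounds += 1


# Hypothetical state: each correction halves the error and adds 0.1 overlap.
state = {"Oe": 0.6, "Ef": 1.0}


def measure():
    return state["Oe"], state["Ef"]


def correct():
    state["Oe"] = min(1.0, state["Oe"] + 0.1)
    state["Ef"] = state["Ef"] / 2.0


final_score, rounds_needed = correct_until_consistent(measure, correct, q2=0.75)
```

Without a bound on the number of rounds, a registration that cannot converge would loop indefinitely, which is why the sketch raises an error instead.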
[0108] Example 8
[0109] This embodiment builds on Embodiment 7. Referring to Figure 1, the VR interaction monitoring module includes a third computing unit and a third analysis unit;
[0110] The third computing unit is used to analyze the adaptability of the 3D reconstructed model in the immersive virtual reality scene, and to monitor the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment in real time. Combining the data from the first and third datasets, after dimensionless processing, the VR interaction adaptability evaluation coefficient VRJH is calculated and obtained, as shown in the following formula:
[0111] [formula not reproduced in this text];
[0112] [formula not reproduced in this text];
[0113] In the formula, Rr represents the visualization degree of the interactive viewpoint; Fs represents frame rate stability; Rc represents the semantic interaction object recognition response completeness rate; w5, w6, and w7 represent weight coefficients [conditions on the coefficients not reproduced]; [symbol not reproduced] represents the standard deviation; and [symbol not reproduced] represents the average frame rate;
[0114] The third analysis unit is used to pre-set a third threshold Q3, and compare the VR interaction adaptability evaluation coefficient VRJH with the third threshold Q3 to obtain the third evaluation result, including:
[0115] When the VR interaction adaptability evaluation coefficient VRJH ≥ the third threshold Q3, it means that the 3D model meets the VR interaction performance requirements, there are no potential interaction problems in VR operation, and continuous monitoring is required.
[0116] When the VR interaction adaptability evaluation coefficient VRJH < the third threshold Q3, it indicates that the 3D model does not meet the VR interaction performance requirements and that there are potential interaction problems in VR operation. This triggers the third warning instruction and generates the third strategy: automatically analyze the occluded areas in the user's view path, adjust the point-of-view (POV) parameters, and make the invisible areas transparent; apply local textures to interactive hotspot areas, increase the resolution by 20%, and automatically switch the level of detail (LOD) based on model complexity and device performance, improving frame rate stability by 10% and loading efficiency by 10%; and, for semantic interaction objects in the current model that show delayed response or insufficient recognition, including buttons, door signs, and prompts, re-identify them and optimize their interactive areas, raising re-identification confidence by 20%.
[0117] In this embodiment, the VR interaction adaptability evaluation mechanism of the third computing unit and the third analysis unit can monitor and evaluate the adaptability of the 3D reconstruction model in the virtual reality environment in real time, especially its visualization, interaction performance, and rendering smoothness. When the evaluation coefficient VRJH is lower than the preset third threshold Q3, the system automatically analyzes the occlusion area in the user's view path, adjusts the viewpoint parameters and implements transparency processing, while optimizing the resolution and frame rate stability of the interaction hotspot area. This automated optimization strategy not only improves the interactive experience in the virtual reality environment, but also ensures the adaptability and performance of the system on different devices, significantly improves the interaction smoothness and user experience of the model in VR, and reduces potential problems caused by hardware limitations or insufficient interaction, as shown in the table below:
[0118] [table not reproduced in this text]
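The two VR formulas are not preserved in this text. The definitions mention a standard deviation and an average frame rate next to the frame rate stability term Fs, so the sketch below assumes Fs = 1 − σ/μ (one minus the coefficient of variation of the frame rate) and VRJH = w5 × Rr + w6 × Fs + w7 × Rc. Both forms, the default weights, and the sample values are assumptions rather than the patented formulas.

```python
from statistics import mean, pstdev


def frame_rate_stability(frame_rates):
    """Assumed Fs: 1 minus the coefficient of variation, clamped at 0."""
    mu = mean(frame_rates)
    if mu <= 0:
        raise ValueError("average frame rate must be positive")
    return max(0.0, 1.0 - pstdev(frame_rates) / mu)


def vr_adaptability(rr, fs, rc, w5=0.4, w6=0.3, w7=0.3):
    """Assumed VRJH: weighted sum of viewpoint visibility (Rr), frame rate
    stability (Fs), and recognition response completeness (Rc)."""
    return w5 * rr + w6 * fs + w7 * rc


# Hypothetical session: perfectly steady 90 fps, Rr = 0.8, Rc = 0.9.
fs_steady = frame_rate_stability([90, 90, 90, 90])
vrjh = vr_adaptability(rr=0.8, fs=fs_steady, rc=0.9)

# Hypothetical jittery session alternating between 80 and 100 fps.
fs_jitter = frame_rate_stability([80, 100])
```

Under this assumed form, the 10% frame rate stability improvement named in the third strategy would correspond to a reduction of σ relative to μ.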
[0119] Example 9
[0120] This embodiment builds on Embodiment 8. Referring to Figure 1, the real-world scene output module is used to perform adaptability optimization on the 3D reconstruction model and generate processing instructions, based on the judgment results of the structural integrity evaluation coefficient JGPG, the fusion consistency evaluation coefficient RHPG, and the VR interaction adaptability evaluation coefficient VRJH, combined with the corresponding generated strategies, and to complete the final adjustment and output encapsulation of the model.
[0121] In this embodiment, through the adaptability optimization mechanism of the real-world scene output module, the 3D reconstruction model can be intelligently adjusted and optimized based on the evaluation results of structural integrity, fusion consistency, and VR interaction adaptability, along with their corresponding strategies. This module ensures that the model meets predetermined requirements under different evaluation criteria, automatically adjusts defects or inconsistencies in the model, and optimizes its structure and interactive performance to achieve the best adaptation effect in the virtual reality environment. This process not only improves the overall quality and stability of the 3D model but also significantly enhances the user experience, ensuring the model's usability and smooth interaction in practical applications.
[0122] Example 10
[0123] A method for 3D reconstruction of virtual reality scenes based on multi-source data fusion, referring to Figure 2, includes the following steps:
[0124] Step 1: In the underground commercial and trade integrated space scenario, complete the synchronous collection of multi-source heterogeneous data through various devices, including: point cloud and image information, device pose and positioning data, semantic visual elements and text information;
[0125] Step 2: Perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; use target detection, semantic segmentation, and OCR recognition technologies to extract semantic elements and encode and label them to establish the third dataset.
[0126] Step 3: Based on a 3D convolutional neural network, train and extract features by combining spatial structure fusion data and semantic annotation data to build an AI reconstruction fusion model for 3D model generation and multimodal feature analysis, and support subsequent evaluation and strategy output for structural integrity, fusion consistency and VR interaction adaptability.
[0127] Step 4: By monitoring the geometric integrity and density of the 3D reconstructed model, calculate the structural integrity evaluation coefficient JGPG and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete. If it is incomplete, a strategy is given.
[0128] Step 5: By monitoring the spatial registration effect and semantic matching degree between the point cloud structure and semantic image information, calculate and obtain the fusion consistency evaluation coefficient RHPG, and compare it with the second threshold Q2 to determine whether the fusion error rate is qualified. If it is not qualified, a strategy is given.
[0129] Step 6: By monitoring the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment in real time, calculate and obtain the VR interaction adaptability evaluation coefficient VRJH, and compare it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements. If it does not meet the requirements, a strategy will be given.
[0130] Step 7: Based on the assessment results of structural integrity, fusion consistency, and VR interaction adaptability, and the corresponding strategies, optimize the 3D reconstruction model for adaptability and generate processing instructions to achieve the final adjustment and output encapsulation of the model.
[0131] In this embodiment, a precise seven-step process is employed, comprehensively utilizing multi-source data acquisition, AI reconstruction fusion model training, and multi-dimensional evaluation mechanisms to achieve efficient construction and optimization of a 3D reconstruction model in an integrated underground commercial space. By dynamically monitoring and adjusting the model's structural integrity, fusion consistency, and VR interactivity, this invention can automatically detect and correct potential problems at each stage of model construction, ensuring that the final output 3D model achieves optimal results in terms of accuracy, interactivity, and user experience. This process significantly improves the model's stability and adaptability, particularly in virtual reality applications, optimizes the model's interactive smoothness and compatibility, and greatly enhances its application value in real-world scenarios.
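Steps 4 to 7 amount to three threshold checks whose outcomes gate the final output. The sketch below collects the strategies for every coefficient that falls below its threshold and reports whether the model is ready for output encapsulation. The dictionary layout, the function name `plan_output`, the shortened strategy descriptions, and the threshold values are hypothetical.

```python
# Shortened descriptions of the strategies named in the embodiments.
STRATEGIES = {
    "JGPG": "first strategy: re-collect missing areas and complete the shape",
    "RHPG": "second strategy: correct annotations, re-register, rebuild trajectory",
    "VRJH": "third strategy: adjust viewpoint, textures, LOD, and re-identify objects",
}


def plan_output(scores, thresholds):
    """Return the strategies still pending and whether the model can be output."""
    pending = [
        STRATEGIES[name]
        for name in ("JGPG", "RHPG", "VRJH")
        if scores[name] < thresholds[name]
    ]
    return {"ready": not pending, "strategies": pending}


# Hypothetical scores checked against thresholds Q1 = Q2 = Q3 = 0.8.
thresholds = {"JGPG": 0.8, "RHPG": 0.8, "VRJH": 0.8}
plan = plan_output({"JGPG": 0.9, "RHPG": 0.7, "VRJH": 0.85}, thresholds)
```

Here only the fusion check fails, so the plan lists the second strategy and the model is held back until RHPG is recomputed.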
[0132] The thresholds are set to facilitate comparison. The size of each threshold depends on the amount of sample data and on the base value that those skilled in the art set for each group of sample data; any value is acceptable as long as it does not affect the proportional relationship between the parameter and the quantized value.
[0133] The above formulas are all derived from software simulation using a large amount of data, and are selected to be close to the actual values. The coefficients in the formulas are set by those skilled in the art based on the actual situation. The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or changes made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A virtual reality scene 3D reconstruction system based on multi-source data fusion, characterized in that, It includes a data acquisition module, a data processing module, an AI reconstruction and fusion model establishment module, a structural reconstruction monitoring module, a fusion monitoring module, a VR interactive monitoring module, and a real-world scene output module; The data acquisition module is used to synchronously acquire multi-source heterogeneous data through various devices in an underground commercial and trade integrated space scenario, including: point cloud and image information, device pose and positioning data, semantic visual elements and text information; The data processing module is used to perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; to fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; and to extract semantic elements and encode and label them using target detection, semantic segmentation, and OCR recognition technologies to establish the third dataset. The AI reconstruction fusion model building module is used to train and extract features based on a three-dimensional convolutional neural network, combining spatial structure fusion data and semantic annotation data, to build an AI reconstruction fusion model for three-dimensional model generation and multimodal feature analysis, and to support the subsequent evaluation and strategy output of structural integrity, fusion consistency and VR interaction adaptability. The structural reconstruction monitoring module is used to monitor the geometric integrity and density of the three-dimensional reconstruction model, calculate and obtain the structural integrity evaluation coefficient JGPG, and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete. If it is incomplete, a strategy is given. 
The fusion monitoring module is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information, calculate and obtain the fusion consistency evaluation coefficient RHPG, and compare and analyze it with the second threshold Q2 to determine whether the fusion error rate is qualified. If it is not qualified, a strategy is given. The VR interaction monitoring module is used to monitor the visualization, interaction performance and rendering smoothness of the 3D model in the virtual reality environment in real time, calculate and obtain the VR interaction adaptability evaluation coefficient VRJH, and compare and analyze it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements. If it does not meet the requirements, a strategy is given. The real-world scene output module is used to perform adaptability optimization and process instruction generation on the 3D reconstruction model based on the evaluation results of structural integrity, fusion consistency and VR interaction adaptability and corresponding strategies, so as to realize the final adjustment and output encapsulation of the model.
2. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 1, characterized in that, The data acquisition module includes a spatial structure image acquisition unit, a pose positioning information acquisition unit, and a scene semantic annotation information acquisition unit; The spatial structure image acquisition unit is used to acquire point cloud data of the underground commercial space scene by installing a three-dimensional LiDAR device, including the structural outlines of walls, columns, corridors and escalators; to acquire depth maps and synchronous color image data by installing an RGB-D camera device, forming dense structure and texture alignment information; and to acquire wide-angle environmental images and key visual feature area images by using an inspection robot equipped with a high-definition wide-angle camera. The pose positioning information acquisition unit is used to acquire the three-axis acceleration, angular velocity and attitude information of the acquisition device by integrating an inertial measurement device, for SLAM synchronization and sensor registration; and to acquire the position coordinates in the underground enclosed space by deploying a UWB positioning device. The scene semantic annotation information acquisition unit is used to acquire semantic visual object images of shop fronts, directional signs, and entrance / exit numbers by deploying image recognition cameras; and to acquire text content, route markings, and notices on subway station signs by installing OCR recognition cameras.
3. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 2, characterized in that, The data processing module includes a first data processing unit, a second data processing unit, and a third data processing unit; The first data processing unit is used to perform time stamp alignment and spatial coordinate transformation on LiDAR point cloud, RGB-D depth map and IMU pose data using time synchronization and coordinate unification technology; Point cloud filtering and simplification algorithms are used to remove noise and redundancy from structural point cloud data; image depth fusion and voxel stitching algorithms are used to combine depth maps, color images and point clouds to form a dense three-dimensional structural representation with consistent texture and geometry, and the first dataset is established. The second data processing unit is used to estimate the device trajectory and pose information using IMU-point cloud joint SLAM technology; A continuous and verifiable trajectory line is constructed by fusing UWB positioning data; a multi-sensor calibration algorithm is used to perform temporal and spatial registration between sensors, and a second dataset is established. The third data processing unit is used to extract semantic targets such as shop signs, directional signs, and entrance / exit signs using the YOLO-Nano object detection algorithm; to obtain scene semantic masks and label maps using DeepLabv3 semantic segmentation technology; to recognize Chinese or English route instructions, numbers, and notices in the image using Tesseract OCR technology; to perform spatial annotation and category encoding on all semantic information, generate a structured semantic graph and an interactive element index table, and establish a third dataset.
4. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 3, characterized in that, The AI reconstruction and fusion model building module is used to construct an initial 3D convolutional neural network model using a 3D convolutional neural network. This initial model is then trained and tested using data from the first, second, and third datasets. The trained initial model is used as the AI reconstruction and fusion model. Simultaneously, the intermediate layer output of the spatial structure fusion data and scene semantic annotation data is used as a feature vector to identify structural features and semantic information. The acquired feature information is then used to train and test the AI reconstruction and fusion model. The trained AI reconstruction and fusion model is then used for 3D model construction and multimodal feature analysis, supporting structural integrity assessment, fusion consistency assessment, and VR interactive adaptive calculation and strategy output.
5. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 4, characterized in that the structural reconstruction monitoring module includes a first computing unit and a first analysis unit; the first computing unit is used to monitor the geometric integrity and density of the 3D reconstructed model and, combined with the first dataset, after dimensionless processing, to calculate the structural integrity evaluation coefficient JGPG, as shown in the following formula: [formula for JGPG not reproduced in this text]; in the formula, [symbol not reproduced] represents the total area of closed structural surfaces, [symbol not reproduced] represents the total surface area of the model, [symbol not reproduced] represents the number of reconstructed voxels, Vt represents the theoretical voxel range, and w1 and w2 represent the weighting coefficients.
6. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 5, characterized in that, The first analysis unit is used to pre-set a first threshold Q1, and compare the structural integrity evaluation coefficient JGPG with the first threshold Q1 to obtain a first evaluation result, including: When the structural integrity assessment coefficient JGPG ≥ the first threshold Q1, it indicates that the structural reconstruction is complete and continuous monitoring is required. When the structural integrity assessment coefficient JGPG < the first threshold Q1, it indicates that the structural reconstruction is incomplete, triggering the first warning instruction and generating the first strategy: re-collecting data of the missing area and calling the hole detection algorithm and depth inference technology to complete the shape.
7. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 4, characterized in that the fusion monitoring module includes a second computing unit and a second analysis unit; the second computing unit is used to monitor the spatial registration effect and semantic matching degree between point cloud structure and semantic image information and, combining the data from the first, second, and third datasets, after dimensionless processing, to calculate and obtain the fusion consistency evaluation coefficient RHPG, as shown in the following formula: [formula for RHPG not reproduced in this text]; in the formula, Oe represents the semantic edge overlap rate, which is the proportion of the overlap length between the semantic boundary line of the image and the geometric edge line of the point cloud; Ef represents the mean fusion error, which is the spatial offset distance after the point cloud and the image are aligned; and w3 and w4 represent the weight coefficients. The second analysis unit is used to pre-set a second threshold Q2 and compare the fusion consistency evaluation coefficient RHPG with the second threshold Q2 to obtain the second evaluation result, including: when the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2, it indicates that the fusion error rate is qualified, the registration is successful, and continuous monitoring is required; when the fusion consistency evaluation coefficient RHPG < the second threshold Q2, it indicates that the fusion error rate is unqualified and the registration has failed, the second warning instruction is triggered, and the second strategy is generated: automatically correct the image semantic annotation deviation, perform point cloud re-registration and trajectory reconstruction, and recalculate until the fusion consistency evaluation coefficient RHPG ≥ the second threshold Q2.
8. The virtual reality scene 3D reconstruction system based on multi-source data fusion according to claim 7, characterized in that the VR interactive monitoring module includes a third computing unit and a third analysis unit. The third computing unit is used to analyze the adaptability of the 3D reconstructed model in the immersive virtual reality scene and to monitor, in real time, the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment; combining data from the first and third datasets, after dimensionless processing, it calculates the VR interaction adaptability evaluation coefficient VRJH, as shown in the following formulas: [formulas missing in the source text]. In the formulas, the terms represent, respectively, the visibility of the interactive viewpoint, the frame rate stability factor, and the response completeness rate of semantic interaction object recognition (their symbols are missing in the source text); w5, w6, and w7 represent weight coefficients; the remaining two symbols denote the standard deviation and the average frame rate. The third analysis unit is used to preset a third threshold Q3 and compare the VR interaction adaptability evaluation coefficient VRJH with the third threshold Q3 to obtain the third evaluation result, including: when VRJH ≥ Q3, the 3D model meets the VR interaction performance requirements, there are no potential interaction problems in VR operation, and monitoring continues; when VRJH < Q3, the 3D model does not meet the VR interaction performance requirements, there are potential interaction problems in VR operation, the third warning instruction is triggered, and the third strategy is generated: automatically analyze the occluded areas in the user's view path, adjust the POV parameters, and make the invisible areas transparent; apply local textures to interactive hotspot areas, increase the resolution by 20%, and automatically switch the level of detail (LOD) according to model complexity and device performance, improving frame rate stability by 10% and loading efficiency by 10%; and, for semantic interaction objects in the current model that respond with delay or are insufficiently recognized, including buttons, door signs, and prompts, perform re-identification and optimize the interactive areas, improving confidence by 20%.
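As a rough illustration of the third computing and analysis units, the Python sketch below derives a frame rate stability value from sampled frame rates and maps the threshold comparison to the three strategy actions named in the claim. The defining formulas are not reproduced in this text, so the form `1 - sigma / mu` (one minus the coefficient of variation, clamped to the range 0 to 1) is an assumption chosen only because the claim mentions a standard deviation and an average frame rate. The function names and action strings are likewise hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence


def frame_rate_stability(fps_samples: Sequence[float]) -> float:
    """ASSUMED form of the frame rate stability factor: 1 - sigma / mu.

    sigma is the population standard deviation and mu the average of the
    sampled frame rates; the result is clamped to [0, 1].
    """
    mu = fmean(fps_samples)
    if mu <= 0:
        raise ValueError("average frame rate must be positive")
    sigma = pstdev(fps_samples)
    return max(0.0, min(1.0, 1.0 - sigma / mu))


def third_analysis(vrjh: float, q3: float) -> list[str]:
    """Return the third-strategy actions, or an empty list when VRJH >= Q3."""
    if vrjh >= q3:
        return []  # requirements met; monitoring simply continues
    return [
        "analyze occluded areas, adjust POV parameters, apply transparency",
        "apply local textures to hotspots and switch level of detail",
        "re-identify delayed or poorly recognized interaction objects",
    ]


if __name__ == "__main__":
    print(frame_rate_stability([30, 90]))   # illustrative samples only
    print(third_analysis(vrjh=0.5, q3=0.8))
```

How the stability value enters VRJH (together with w5, w6, and w7) is deliberately left out, since the weighting formula is not available here.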
9. A method for 3D reconstruction of virtual reality scenes based on multi-source data fusion, using the virtual reality scene 3D reconstruction system based on multi-source data fusion according to any one of claims 1 to 8, characterized in that the method includes the following steps:
Step 1: In the integrated underground commercial and trade space scenario, complete the synchronous collection of multi-source heterogeneous data through various devices, including point cloud and image information, device pose and positioning data, and semantic visual elements and text information.
Step 2: Perform spatiotemporal alignment, denoising fusion, and dense modeling on the collected data to establish the first dataset; fuse IMU and UWB data for trajectory estimation and sensor registration to establish the second dataset; use target detection, semantic segmentation, and OCR recognition to extract semantic elements, then encode and label them to establish the third dataset.
Step 3: Based on a 3D convolutional neural network, train and extract features by combining spatial structure fusion data and semantic annotation data to build an AI reconstruction fusion model for 3D model generation and multimodal feature analysis, supporting the subsequent evaluation and strategy output for structural integrity, fusion consistency, and VR interaction adaptability.
Step 4: By monitoring the geometric integrity and density of the 3D reconstructed model, calculate the structural integrity evaluation coefficient JGPG and compare it with the first threshold Q1 to determine whether the structural reconstruction is complete; if it is incomplete, output a strategy.
Step 5: By monitoring the spatial registration effect and the degree of semantic matching between the point cloud structure and the semantic image information, calculate the fusion consistency evaluation coefficient RHPG and compare it with the second threshold Q2 to determine whether the fusion error rate is qualified; if it is unqualified, output a strategy.
Step 6: By monitoring, in real time, the visualization, interaction performance, and rendering smoothness of the 3D model in the virtual reality environment, calculate the VR interaction adaptability evaluation coefficient VRJH and compare it with the third threshold Q3 to determine whether the 3D model meets the VR interaction performance requirements; if it does not, output a strategy.
Step 7: Based on the evaluation results for structural integrity, fusion consistency, and VR interaction adaptability, together with the corresponding strategies, optimize the adaptability of the 3D reconstruction model and generate processing instructions to achieve the final adjustment and output encapsulation of the model.
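Steps 4 to 7 amount to three threshold checks whose failed results are collected into processing instructions. A minimal Python sketch of that aggregation follows. The `Evaluation` class, the `processing_instructions` function, and all numeric values are hypothetical, and treating JGPG ≥ Q1 as "complete" is an assumption made by analogy with the RHPG and VRJH comparisons in claims 7 and 8.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evaluation:
    name: str         # "JGPG", "RHPG", or "VRJH"
    value: float      # computed evaluation coefficient
    threshold: float  # Q1, Q2, or Q3
    strategy: str     # strategy to output when the check fails

    @property
    def passed(self) -> bool:
        # Assumption: a coefficient at or above its threshold passes.
        return self.value >= self.threshold


def processing_instructions(evaluations: Iterable[Evaluation]) -> list[str]:
    """Step 7 sketch: one instruction per failed evaluation, in input order."""
    return [f"{e.name}: {e.strategy}" for e in evaluations if not e.passed]


if __name__ == "__main__":
    checks = [  # illustrative values only
        Evaluation("JGPG", 0.9, 0.8, "first strategy"),
        Evaluation("RHPG", 0.6, 0.7, "correct annotations and re-register point cloud"),
        Evaluation("VRJH", 0.7, 0.7, "adjust POV, textures, and level of detail"),
    ]
    for instruction in processing_instructions(checks):
        print(instruction)
```

An empty result means all three evaluations passed and the model can proceed to output encapsulation without further adjustment.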