A panoramic image construction method and system for VR roaming, a computer device and a medium

By employing panoramic image sequence acquisition, chain calibration, and virtual environment rendering technologies, the problem of insufficient calibration accuracy in VR roaming scenes has been solved, achieving efficient and coherent panoramic image calibration and immersive VR roaming effects.

CN122199797A · Pending · Publication Date: 2026-06-12 · GUANGZHOU MASHI INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGZHOU MASHI INFORMATION TECH CO LTD
Filing Date: 2026-03-10
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing rendering technologies suffer from low rendering efficiency in high-dynamic VR roaming scenarios due to insufficient calibration accuracy, making it difficult to ensure real-time performance while maintaining the realism of the roaming and the smoothness of the interaction.

Method used

Panoramic image sequences are acquired in a virtual 3D scene, and the optimal rotation matrix for panoramic image calibration is calculated using feature point extraction and matching, the Random Sample Consensus (RANSAC) algorithm, and the Kabsch point cloud alignment algorithm. Texture rendering is then performed using shader technology, achieving efficient and coherent calibration of panoramic images and seamless integration with the virtual environment.

Benefits of technology

It enhances the visual continuity and immersion of panoramic VR applications, solves the problem of low rendering efficiency caused by insufficient calibration accuracy, and achieves smooth transitions and a strong sense of immersion.



Abstract

The present application relates to the technical field of VR image processing, and in particular to a panoramic image construction method, system, computer device, and medium for VR roaming. The method comprises: obtaining a panoramic image sequence in a virtual three-dimensional scene, wherein any two adjacent panoramic images in the sequence have an overlapping field of view; selecting, according to the image order, the first panoramic image in the sequence as the initial reference image, and sequentially performing a panoramic image calibration operation on each pair of adjacent panoramic images in the sequence; and mapping the calibrated panoramic image sequence onto the model surface of the virtual three-dimensional scene and performing texture rendering using shader technology to obtain the mapped virtual environment. In this way, the technical problem of low rendering efficiency caused by insufficient calibration accuracy in existing rendering technologies for high-dynamic VR roaming scenes is solved, and the visual coherence, immersion, and construction efficiency of panoramic VR applications are improved.

Description

Technical Field

[0001] This invention relates to the field of VR image processing technology, and in particular to a method, system, computer device, and medium for constructing panoramic images for VR roaming.

Background Technology

[0002] With the rapid development of Virtual Reality (VR) technology, panoramic roaming, as a core application of immersive experiences, has been widely adopted in fields such as digital twins, real estate display, and cultural tourism and education. Panoramic rendering engines such as Three.js (a 3D rendering engine) and skybox technologies provide users with the basic ability to view surrounding scenes by mapping two-dimensional panoramic images to three-dimensional space. In traditional one-way scene construction, these technologies, with their ease of implementation and high rendering efficiency, can meet the visual display needs in static environments. However, as users' demands for realism and interactivity in roaming increase, especially in VR roaming scenarios with dynamic multi-view switching, the limitations of existing rendering methods are becoming increasingly apparent.

[0003] In the field of panoramic image processing and feature matching, a certain body of prior work exists. Prior art document 1 (application publication number CN106780573B) discloses a method and system for optimizing the accuracy of panoramic image feature matching. It refines the matching results layer by layer by building an image pyramid and uses virtual ordinary images as a medium to optimize feature point localization, aiming to improve the matching accuracy of panoramic images in 3D reconstruction. However, this method mainly focuses on optimizing the matching of static image pairs, with the core objective of improving the accuracy of feature point coordinates, and does not fully consider the real-time rendering requirements across multiple panoramic images in a sequence during dynamic roaming. Specifically, existing technologies have two prominent problems. First, rendering methods based on a single panoramic image lack a sense of depth and struggle to simulate the spatial depth of real scenes, leading to visual distortion during user roaming. Second, in scenarios involving switching between multiple panoramic images, existing feature matching technologies can improve static alignment accuracy but cannot efficiently support smooth transitions between consecutive viewpoints, and they rely on costly panoramic texture modeling, making it difficult to quickly achieve virtual-real fusion based on existing 3D models. Together, these issues make it difficult for existing panoramic rendering technologies to guarantee real-time performance while maintaining realism and smooth interaction, so they fail to meet the practical needs of highly immersive VR applications. Therefore, existing rendering technologies suffer from low rendering efficiency in high-dynamic VR roaming scenarios due to insufficient calibration accuracy.

Summary of the Invention

[0004] To address the aforementioned shortcomings or drawbacks, this invention provides a panoramic image construction method, system, computer equipment, and medium for VR roaming, which can solve the technical problem of low rendering efficiency caused by insufficient calibration accuracy in existing rendering technologies in high-dynamic VR roaming scenarios.

[0005] According to a first aspect, this invention provides a method for constructing panoramic images for VR roaming, comprising: acquiring a panoramic image sequence from a virtual 3D scene, where any two adjacent panoramic images in the sequence have overlapping fields of view.

[0006] Based on the image order, the first panoramic image in the panoramic image sequence is selected as the initial reference image, and panoramic image calibration is performed on each pair of adjacent panoramic images in the sequence in turn.

[0007] The calibrated panoramic image sequence is mapped onto the surface of a virtual 3D scene model, and texture rendering is performed using shader technology to obtain the mapped virtual environment.

[0008] Specifically, sequentially performing panoramic image calibration on each pair of adjacent panoramic images in the sequence includes: for the pair of adjacent panoramic images in each round, performing feature point extraction and matching operations to obtain multiple sets of feature point pairs.

[0009] The two-dimensional image coordinates of each feature point pair are converted into three-dimensional spherical coordinates to construct the source spherical point set and the target spherical point set.

[0010] Based on the source spherical point set and the target spherical point set, the optimal rotation matrix between the current pair of adjacent panoramic images is calculated using the random sample consensus (RANSAC) algorithm and the point cloud alignment algorithm. The point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal component of the three-dimensional spherical coordinates, with the vertical component set to a constant value.

[0011] Based on the application of the optimal rotation matrix to the target spherical point set, horizontal calibration is performed on the pair of adjacent panoramic images in each round, and the calibrated image is used as the reference for the next round of calibration.

[0012] According to a second aspect, the present invention provides a panoramic image construction system for VR roaming, comprising: a panoramic image sequence acquisition module, used to acquire a panoramic image sequence in a virtual 3D scene, where any two adjacent panoramic images in the sequence have an overlapping field of view.

[0013] The panoramic image multi-round calibration module is used to select the first panoramic image from the panoramic image sequence as the initial reference image according to the image order, and then perform panoramic image calibration operations on each pair of adjacent panoramic images in the sequence in turn.

[0014] The VR roaming scene rendering module is used to map the calibrated panoramic image sequence onto the model surface of the virtual 3D scene, and perform texture rendering through shader technology to obtain the mapped virtual environment.

[0015] The panoramic image multi-round calibration module is also used to: perform feature point extraction and matching operations on the pair of adjacent panoramic images in each round to obtain multiple sets of feature point pairs; convert the two-dimensional image coordinates in each feature point pair to three-dimensional spherical coordinates, constructing a source spherical point set and a target spherical point set; calculate, based on the source and target spherical point sets, the optimal rotation matrix between the current pair of adjacent panoramic images using the random sample consensus (RANSAC) algorithm and a point cloud alignment algorithm, where the point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal component of the three-dimensional spherical coordinates, with the vertical component set to a constant value; and perform horizontal calibration on the pair of adjacent panoramic images in each round based on the application of the optimal rotation matrix to the target spherical point set, using the calibrated image as the reference for the next round of calibration.

[0016] According to a third aspect, the present invention provides a computer device comprising: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor which, when executed by the at least one processor, enable the at least one processor to perform any of the panoramic image construction methods for VR roaming in the embodiments of the present invention.

[0017] According to another aspect of the present invention, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute any of the panoramic image construction methods for VR roaming in the embodiments of the present invention.

[0018] The present invention provides a panoramic image construction method for VR roaming, which is achieved through four core steps: panoramic image sequence acquisition, chain calibration, rotation matrix calculation, and virtual environment rendering. First, panoramic image sequences are acquired from a virtual 3D scene to provide raw input data with overlapping views. Second, the first image is selected as a reference based on image order, and calibration operations are performed sequentially on adjacent image pairs to establish an ordered process for unifying the global orientation. Third, the optimal rotation matrix is calculated from the 3D spherical coordinates converted from feature point pairs, using the Random Sample Consensus (RANSAC) algorithm (a core algorithm for improving the robustness of automatic panoramic image calibration) and the Kabsch point cloud alignment algorithm, to achieve accurate rotation parameter estimation. The point cloud alignment algorithm is configured to calculate rotation based only on the horizontal component and to set the vertical component to a constant value. The fundamental reason is that during panoramic image acquisition the device is kept horizontal by a bracket, so the main directional differences between images occur only in the horizontal plane, while the rotation component in the vertical direction is redundant and unstable noise; the calibration therefore focuses on the horizontal direction. Fourth, the calibrated sequence is mapped onto the surface of a 3D model and texture rendering is performed using shader technology to generate a virtual environment suitable for immersive roaming.

[0019] In this technical solution, the present invention addresses the problem described in the background art where feature matching is susceptible to noise interference due to significant displacement. It uses the RANSAC algorithm to iteratively sample matching point pairs and filter inliers from them, effectively suppressing the impact of mismatched points on rotation estimation. This improves the robustness of the calibration process to scene changes and remedies the defect of existing methods in which calibration accuracy decreases due to feature matching errors under complex displacements. To address the problem that traditional 3D rotation estimation models may introduce unnecessary rotation components in practical applications due to over-parameterization, the present invention uses a point cloud alignment algorithm (Kabsch) and constrains it to calculate the rotation matrix based only on the horizontal component, setting the vertical component to a constant value. This achieves accurate and efficient estimation of the main horizontal rotation between panoramic images and avoids image distortion caused by estimating redundant rotation components. To address the low efficiency of achieving global orientation consistency across multi-image sequences, a chained calibration operation is employed: the first image serves as the baseline, the images are calibrated sequentially, and each calibrated image serves as the baseline for the next round. This achieves efficient and coherent orientation alignment for image sequences of arbitrary length, overcoming the drawbacks of traditional methods that require tedious alignment of each pair of images individually or cannot guarantee overall sequence consistency. Furthermore, to address the difficulty of integrating calibrated panoramic images with the virtual environment, the calibrated image sequence is mapped as a texture resource onto the surface of the virtual 3D scene model and rendered using shader technology. This achieves seamless integration of the calibration results with the virtual map platform's world model, resolving the technical problem of low rendering efficiency due to insufficient calibration accuracy in high-dynamic VR roaming scenarios, and improves the visual coherence, immersion, and construction efficiency of panoramic VR applications.

Attached Figure Description

[0020] Figure 1 is a flowchart of a panoramic image construction method for VR roaming according to an embodiment of the present invention; Figure 2 is a schematic diagram illustrating the geometric principle of using a spherical coordinate system to model the three-dimensional spatial position of feature points in a panoramic image in one embodiment of the present invention; Figure 3 is a schematic diagram of the structure of a panoramic image construction system for VR roaming according to an embodiment of the present invention; Figure 4 is a block diagram of a computer device for implementing embodiments of the present invention.

Detailed Implementation

[0021] The following description, in conjunction with the accompanying drawings, illustrates exemplary embodiments of the present invention, including various details to aid understanding. These details should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of the invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0022] During the development of this invention, the inventors, through extensive experiments and data analysis, revealed the intrinsic relationship between displacement differences between adjacent panoramic images and orientation calibration accuracy: when displacement differences increase (e.g., 3-10 meters, depending on the specific acquisition scenario of the panoramic images), traditional feature matching methods are easily affected by noise interference from nearby feature points, leading to the accumulation of calibration errors, while distant feature points, although highly stable, are difficult to match. Based on this relationship, the inventors innovatively proposed this technical solution, utilizing the overlapping field-of-view characteristics of panoramic image sequences, through feature point extraction and matching, the transformation from two-dimensional image coordinates to three-dimensional spherical coordinates, combined with the Random Sample Consensus Algorithm (RANSAC) and the point cloud alignment algorithm (Kabsch), thereby achieving efficient and automatic horizontal orientation calibration for multiple sequence panoramic images. This embodies the core concept of ensuring global orientation consistency through chained calibration operations and avoiding redundant distortion by constraining the rotation matrix calculation to be based solely on the horizontal component.

[0023] Specifically, through comparative experiments, the invention team discovered that traditional calibration methods based on single panoramic image rendering or simple 2D matching suffer from insufficient calibration accuracy and reliance on manual intervention. These methods cannot handle the coherent alignment of multiple image sequences and are prone to visual jumps due to feature matching errors in dynamic roaming scenarios. These technical shortcomings result in a stiff VR roaming experience, poor immersion, and high scene construction costs and low efficiency. However, the chain-based panoramic image construction method proposed in this invention improves the robustness of the calibration process and rendering efficiency, achieving a smooth transition and a highly realistic panoramic VR roaming effect.

[0024] Therefore, according to the first aspect, the present invention provides a panoramic image construction method and rendering process for VR roaming, which can be applied to an image processing system (hereinafter referred to as "the system"). The system can be deployed locally or run on a server, embedded device, or mobile computing platform via cloud services to complete automatic orientation calibration of continuous panoramic images with displacement differences.

[0025] Specifically, this system can be deployed in various hardware environments, including but not limited to: cloud servers, edge computing devices, mobile terminals (such as smartphones and panoramic camera built-in processors), and dedicated image processing workstations. This flexible deployment architecture allows the system to meet the needs of large-scale batch processing in the cloud while also adapting to real-time calibration on mobile devices or resource-constrained embedded scenarios. In terms of its operating mechanism, the system implements a complete workflow of panoramic image acquisition, feature analysis, geometric calculation, and image transformation through modular design: first, the panoramic image input is received through the image acquisition module; then, the correspondence between images is established by the feature matching module; next, the pure horizontal rotation parameters are calculated by the coordinate transformation and rotation estimation module; and the robustness is improved through the optimization module. Finally, the calibration output is completed by the image processing module.

[0026] As shown in Figure 1, the method may include: Step S110: Obtain a panoramic image sequence in the virtual 3D scene.

[0027] In a panoramic image sequence, any two adjacent panoramic images have overlapping fields of view. A panoramic image sequence refers to a collection of multiple panoramic images arranged in a predetermined order. Overlapping fields of view refer to the portion of the scene shared by any two adjacent panoramic images from the same shooting angle, ensuring the feasibility of feature matching. A virtual 3D scene refers to a pre-constructed digital 3D model environment used to map panoramic images.

[0028] Specifically, the system can collect data along a predetermined path in a virtual 3D scene by deploying panoramic shooting equipment (such as Insta360 series cameras). The equipment is installed at a uniform height of 1.5 meters to maintain a horizontal viewing angle, and the spacing between the acquisition points is configured to be 3 to 6 meters, ensuring that the images between adjacent acquisition points meet the condition of mutual visibility.

[0029] For example, in an indoor office area scenario, the system sets up a collection point every 5 meters along a straight path, uses an Insta360 camera to capture horizontal panoramic images, and generates a panoramic sequence containing 10 images, with the overlapping area of adjacent images accounting for approximately 30%. During the acquisition process, the system can verify the overlapping area ratio of adjacent images (e.g., ≥30%) through real-time preview or image analysis tools. If the standard is not met, the position of the collection point is adjusted to ensure the feasibility of feature matching.

[0030] Step S120: Select the first panoramic image from the panoramic image sequence as the initial reference image according to the image order, and perform the panoramic image calibration operation on each pair of adjacent panoramic images in the sequence in turn.

[0031] The initial reference image refers to the first panoramic image in the sequence, which serves as the reference for subsequent calibration; the image order refers to the sequence order based on the acquisition time or the file naming order; and an adjacent panoramic image pair refers to the pairing of the Nth and (N+1)th images in the sequence (N is an integer greater than or equal to 1).

[0032] Specifically, the system can use a chained calibration process, starting with the first image and processing adjacent images in pairs, using the calibrated image as the reference for the next pair. For example, when processing a sequence of 10 images, the system first uses image 1 as the reference and forms the first pair with image 2 for calibration; after calibration, image 2 becomes the new reference and forms the second pair with image 3, and so on until image 10.
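The chained process described above can be sketched as a simple loop. This is a minimal illustration, not the patented implementation: the function name `chain_calibrate` and the two callbacks `estimate_rotation` and `apply_rotation` are hypothetical placeholders for the per-pair calibration steps described later in the text.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")
Rotation = TypeVar("Rotation")


def chain_calibrate(
    images: Sequence[Image],
    estimate_rotation: Callable[[Image, Image], Rotation],
    apply_rotation: Callable[[Image, Rotation], Image],
) -> List[Image]:
    """Calibrate a sequence pair by pair, starting from the first image.

    The first image is the initial reference. Each calibrated image then
    becomes the reference for the next adjacent pair.
    """
    if not images:
        return []
    calibrated: List[Image] = [images[0]]
    for image in images[1:]:
        reference = calibrated[-1]  # most recently calibrated image
        rotation = estimate_rotation(reference, image)
        calibrated.append(apply_rotation(image, rotation))
    return calibrated
```

As a toy usage, if each "image" is reduced to a single heading angle in degrees, `chain_calibrate([10.0, 25.0, -40.0], lambda ref, img: ref - img, lambda img, rot: img + rot)` brings every heading to that of the first image. Because each reference has already been calibrated, a sequence of N images needs only N - 1 pairwise estimates.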

[0033] Further, in step S120, sequentially performing the panoramic image calibration operation on each pair of adjacent panoramic images in the sequence includes: for the pair of adjacent panoramic images in each round, performing feature point extraction and matching operations to obtain multiple sets of feature point pairs.

[0034] Feature point extraction and matching refers to the process of detecting significant key points in an image and establishing corresponding relationships using computer vision algorithms; a feature point pair refers to a set of matching feature points in two images.

[0035] Specifically, the system extracts feature points and 128-dimensional feature descriptor vectors from each panoramic image using the Scale-Invariant Feature Transform (SIFT) algorithm. Then, based on the Euclidean distance between the feature descriptors, it uses the K-Nearest Neighbors (KNN) matching algorithm to generate multiple sets of feature point pairs. For example, when processing a pair of adjacent panoramic images, the SIFT algorithm extracts approximately 300 feature points, and KNN matching generates 300 sets of valid feature point pairs, with the matching distance ratio threshold set to 0.7.
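The matching step can be illustrated with a pure-Python sketch of nearest-neighbor matching with a distance ratio threshold of 0.7. It assumes the descriptors have already been extracted (for example by a SIFT implementation in a computer vision library) and are given as plain sequences of floats. The function name `ratio_test_matches` is hypothetical, and interpreting the "matching distance ratio threshold" as a best-to-second-best distance ratio is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def ratio_test_matches(
    desc_ref: Sequence[Descriptor],
    desc_img: Sequence[Descriptor],
    ratio: float = 0.7,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that pass the distance ratio test.

    For each reference descriptor, the two nearest descriptors in the other
    image are found by Euclidean distance (k = 2). The match is kept only if
    the best distance is clearly smaller than the second best.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_img) < 2:
        return matches  # the ratio test needs two neighbors
    for i, d_ref in enumerate(desc_ref):
        neighbors = sorted(
            (math.dist(d_ref, d_img), j) for j, d_img in enumerate(desc_img)
        )
        (best, j_best), (second, _) = neighbors[0], neighbors[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

This brute-force search is quadratic in the number of descriptors, which is acceptable for a few hundred feature points per image. Real 128-dimensional SIFT descriptors work the same way as the short tuples used for testing.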

[0036] The two-dimensional image coordinates of each feature point pair are converted into three-dimensional spherical coordinates to construct the source spherical point set and the target spherical point set.

[0037] Here, two-dimensional image coordinates refer to the pixel positions of feature points in the panoramic image, expressed in normalized coordinate form; three-dimensional spherical coordinates refer to the three-dimensional spatial coordinates obtained by mapping the two-dimensional coordinates onto the surface of a sphere with a fixed radius; the source spherical point set refers to the set of three-dimensional coordinates of feature points in the reference image; and the target spherical point set refers to the set of three-dimensional coordinates of feature points in the image to be calibrated.

[0038] Specifically, the system can achieve this by mapping the normalized coordinates of each feature point onto a sphere with a radius of 1 meter and calculating the corresponding three-dimensional coordinates [formula omitted in the source text]. The feature point set of the reference image is then defined as the source spherical point set, and the feature point set of the image to be calibrated is defined as the target spherical point set.

[0039] For example, for a set of feature point pairs, the system converts the two-dimensional coordinates into three-dimensional coordinates and then constructs a source point set and a target point set containing 150 points each.

[0040] Based on the source spherical point set and the target spherical point set, the optimal rotation matrix between the current pair of adjacent panoramic images is calculated using the random sample consensus (RANSAC) algorithm and the point cloud alignment algorithm. The point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal component of the three-dimensional spherical coordinates, with the vertical component set to a constant value.

[0041] Here, the Random Sample Consensus (RANSAC) algorithm is a robust parameter estimation algorithm used to iteratively select the optimal subset from noisy data; the Kabsch algorithm is a point cloud registration method based on Singular Value Decomposition (SVD); the optimal rotation matrix is the three-dimensional transformation matrix that minimizes the alignment error between the source point set and the target point set; the horizontal component refers to the X and Y axis components of the three-dimensional spherical coordinates, and the vertical component refers to the Z axis component. In this method, since the device is kept horizontal by a bracket during panoramic image acquisition, the main directional changes occur in the horizontal plane. Therefore, before calculating the rotation matrix, the algorithm first performs data dimensionality reduction: it calculates based only on the horizontal component (X and Y axis coordinates), while setting the vertical component (Z axis coordinate) to a preset constant value (such as 0). This processing not only simplifies the calculation but also avoids estimating redundant vertical rotation components, thus accurately focusing on horizontal alignment.

[0042] Specifically, the system can randomly sample feature point pairs 100 times using the RANSAC algorithm, selecting 3 pairs of points each time. For each sample subset, the Kabsch algorithm is used to calculate candidate rotation matrices, including the steps of calculating the covariance matrix and performing SVD decomposition to solve for the rotation matrix. Finally, the candidate matrix that minimizes the sum of errors for all feature point pairs is selected as the optimal rotation matrix. For example, after 100 RANSAC iterations, the system obtains the optimal rotation matrix with a root mean square error (RMSE) of 0.02 radians.
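A minimal sketch of this estimation step follows, under stated assumptions. Because the vertical component is held constant, the alignment reduces to a rotation in the X-Y plane, where the least-squares (Kabsch) solution has a closed form and no explicit SVD is needed. The sketch uses that planar special case rather than a general 3D Kabsch solver, rotates about the origin without centering the point sets, and follows the text in scoring each candidate by its summed error over all pairs. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]  # (x, y, z); only x and y are used here


def planar_kabsch_yaw(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Least-squares yaw (radians) that rotates target onto source in the X-Y plane."""
    cross = sum(t[0] * s[1] - t[1] * s[0] for s, t in zip(source, target))
    dot = sum(t[0] * s[0] + t[1] * s[1] for s, t in zip(source, target))
    return math.atan2(cross, dot)


def rotate_xy(p: Point, yaw: float) -> Tuple[float, float]:
    """Rotate the horizontal components of p by yaw about the vertical axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def total_error(source: Sequence[Point], target: Sequence[Point], yaw: float) -> float:
    """Sum of horizontal distances between rotated target points and source points."""
    err = 0.0
    for s, t in zip(source, target):
        x, y = rotate_xy(t, yaw)
        err += math.hypot(x - s[0], y - s[1])
    return err


def ransac_yaw(
    source: Sequence[Point],
    target: Sequence[Point],
    iterations: int = 100,
    sample_size: int = 3,
    seed: int = 0,
) -> float:
    """Try random 3-pair subsets and keep the yaw with the lowest total error."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_yaw, best_err = 0.0, math.inf
    for _ in range(iterations):
        pick: List[int] = rng.sample(range(len(source)), sample_size)
        yaw = planar_kabsch_yaw([source[i] for i in pick], [target[i] for i in pick])
        err = total_error(source, target, yaw)
        if err < best_err:
            best_yaw, best_err = yaw, err
    return best_yaw
```

Many RANSAC implementations rank candidates by inlier count under a distance threshold instead; the summed-error criterion here simply mirrors the selection rule stated in the paragraph above.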

[0043] Furthermore, the Kabsch algorithm solves for the optimal rotation matrix using singular value decomposition (SVD) by minimizing the root mean square error between point sets. In this embodiment, since the panoramic acquisition device remains horizontal and the vertical component is considered noise, the z-axis coordinate is fixed to a constant value (e.g., 0), so that only the horizontal plane rotation is optimized, which conforms to actual physical constraints. For example, for 3 pairs of points, the SVD of the covariance matrix, combined with the sign correction step of the Kabsch algorithm, produces a rotation matrix with a determinant of 1, ensuring the transformation is a pure rotation.

[0044] Based on the application of the optimal rotation matrix to the target spherical point set, horizontal calibration is performed on the pair of adjacent panoramic images in each round, and the calibrated image is used as the reference for the next round of calibration.

[0045] Horizontal calibration refers to the process of adjusting the orientation of the image to be calibrated using a rotation matrix to align it with the reference image in the horizontal plane; the application of the rotation matrix refers to the specific operation of multiplying the rotation matrix by the coordinates of the target point set.

[0046] Specifically, the system can generate a transformed point set by multiplying the optimal rotation matrix by the coordinates of each point in the target spherical point set, thereby achieving image orientation calibration. For example, after applying the rotation matrix, the average distance error between the feature point set of the image to be calibrated and the inlier set of the reference image is reduced from 0.5 meters to 0.05 meters, and the calibrated image serves as the reference for the next pair.
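The matrix multiplication described above can be sketched as follows, assuming the optimal rotation is a pure yaw about the Z axis (consistent with the horizontal-only constraint). The helper names `yaw_matrix`, `apply_rotation`, and `mean_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = List[List[float]]


def yaw_matrix(yaw: float) -> Matrix3:
    """3x3 rotation about the vertical (Z) axis; its determinant is 1."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rotation(matrix: Matrix3, points: Sequence[Point3]) -> List[Point3]:
    """Multiply the rotation matrix by each point of the target set."""
    return [
        (
            matrix[0][0] * p[0] + matrix[0][1] * p[1] + matrix[0][2] * p[2],
            matrix[1][0] * p[0] + matrix[1][1] * p[1] + matrix[1][2] * p[2],
            matrix[2][0] * p[0] + matrix[2][1] * p[1] + matrix[2][2] * p[2],
        )
        for p in points
    ]


def mean_distance(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Average Euclidean distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

Comparing `mean_distance` before and after `apply_rotation` gives the kind of before/after alignment error quoted in the example above. The Z coordinate of every point is left unchanged by this rotation.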

[0047] Step S130: Map the calibrated panoramic image sequence onto the model surface of the virtual 3D scene, and perform texture rendering using shader technology to obtain the mapped virtual environment.

[0048] Texture rendering refers to the process of applying a calibrated panoramic image as a texture resource to the surface of a 3D model; shader technology refers to a dedicated program executed on a graphics processing unit (GPU) to control rendering effects; and virtual environment refers to an immersive 3D scene that can be used for VR roaming.

[0049] Specifically, the system can use the ShaderMaterial interface of a graphics rendering engine (such as Three.js) to take a calibrated panoramic image sequence as texture input and achieve real-time rendering on the GPU through vertex shaders and fragment shaders: the vertex shader processes the world coordinates of the model's vertices and the panoramic image coordinates, and the fragment shader calculates UV texture coordinates based on the viewing direction and samples colors from the panoramic image. Here, ShaderMaterial refers to a procedural material type in computer graphics rendering engines (such as Three.js) that allows developers to define the visual appearance of a 3D model's surface by directly writing shader code.

[0050] For example, the system uses the Three.js engine to map 10 calibrated panoramic images (2560 pixels × 1280 pixels) onto the surface of a 3D building model containing 50,000 vertices. During VR roaming, to achieve smooth visual transitions between different panoramic images as the viewpoint moves, the system employs a distance-weighted color blending method. Specifically, when the viewpoint is located between two adjacent panoramic capture points in the scene, the fragment shader not only calculates the current gaze direction vector and converts it to UV coordinates, but also dynamically calculates the color contribution weight of each panoramic image to the current pixel based on the spatial distance between the viewpoint and each acquisition point. Ultimately, the pixel's color value is not obtained by sampling a single panoramic image, but by a weighted blend of the sampling results of multiple related panoramic images (such as one panoramic image together with another), thus achieving a seamless VR transition effect.
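The blending rule can be illustrated with a CPU-side reference sketch. The source does not give the exact weighting formula, so normalized inverse-distance weights are assumed here purely for illustration; in the described system this logic would run per pixel inside the fragment shader. The function names and the `eps` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def blend_weights(
    viewpoint: Vec3, capture_points: Sequence[Vec3], eps: float = 1e-6
) -> List[float]:
    """Normalized inverse-distance weights: closer capture points contribute more.

    eps avoids division by zero when the viewpoint sits exactly on a capture point.
    """
    inverse = [1.0 / (math.dist(viewpoint, c) + eps) for c in capture_points]
    total = sum(inverse)
    return [w / total for w in inverse]


def blend_colors(colors: Sequence[Vec3], weights: Sequence[float]) -> Vec3:
    """Weighted mix of the RGB samples taken from each panoramic image."""
    r = sum(w * c[0] for c, w in zip(colors, weights))
    g = sum(w * c[1] for c, w in zip(colors, weights))
    b = sum(w * c[2] for c, w in zip(colors, weights))
    return (r, g, b)
```

With capture points 5 meters apart and the viewpoint halfway between them, both weights are equal, so the blended pixel is the average of the two samples. As the viewpoint approaches one capture point, that panorama's weight approaches 1, which gives a continuous transition.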

[0051] In other embodiments, Figure 2 illustrates the geometric principle of using a spherical coordinate system to model the three-dimensional spatial position of feature points in a panoramic image. Point M in Figure 2 represents the spatial position of a feature point in the panoramic image after it has been projected onto a 3D sphere. This position is uniquely determined by three parameters in the spherical coordinate system: the radial distance (i.e., the length of line segment OM in the figure), the polar angle (i.e., the angle between line segment OM and the positive Z-axis), and the azimuth angle (i.e., the angle between line segment Om, where m is the projection of point M onto the XZ plane, and the positive Z-axis). In the method of the present invention, the transformation from the two-dimensional image coordinates of a feature point to three-dimensional spherical coordinates is based on the geometric relationships shown in Figure 2. Specifically, the two parameters of the two-dimensional coordinates are mapped to the polar angle and the azimuth angle, and the radial distance is set to a fixed value (e.g., 1 meter), thus mapping points on the image plane onto a reference sphere of fixed radius. The two image coordinate parameters are normalized to a fixed interval: the horizontal parameter (longitude mapping) determines the azimuth angle, and the vertical parameter (latitude mapping) determines the polar angle. Projecting the two-dimensional points onto a sphere with a fixed radius in meters ensures geometric consistency.

[0052] For example, when performing a coordinate transformation operation, the system can transform the normalized two-dimensional coordinates of a feature point according to the model shown in Figure 2. Here, one coordinate parameter corresponds to the azimuth angle (in radians), the other corresponds to the polar angle (in radians), and the fixed radius defaults to 1 meter (m). Based on the conversion formula between spherical and rectangular coordinates, the three-dimensional spherical coordinates of this feature point can then be calculated. This transformation process provides an accurate geometric data foundation for subsequent calculations of the rotation matrix based on a 3D point set.

[0053] Therefore, according to the above implementation method, the system achieves its functionality through four core steps: panoramic image sequence acquisition, chain calibration, rotation matrix calculation, and virtual environment rendering. Specifically, acquiring panoramic image sequences from the virtual 3D scene provides raw input data with overlapping views; selecting the first image as a reference based on image order and sequentially performing calibration operations on adjacent image pairs establishes an ordered process for unifying the global orientation; and, based on the 3D spherical coordinates transformed from feature point pairs, the optimal rotation matrix is calculated using the Random Sample Consensus algorithm (RANSAC) and the Kabsch point cloud alignment algorithm to achieve accurate rotation parameter estimation. The point cloud alignment algorithm is configured to calculate rotation based only on the horizontal component and to set the vertical component to a constant value. This is because, during panoramic image acquisition, the device is kept level by a bracket, so the main orientation differences between images occur only in the horizontal plane, while the rotation component in the vertical direction is redundant and unstable noise; the calibration therefore focuses on the horizontal direction. Finally, the calibrated sequence is mapped onto the surface of a 3D model and textured using shader technology to generate a virtual environment suitable for immersive roaming.

[0054] Specifically, in this implementation, to address the issue mentioned in the background technology that feature matching is susceptible to noise interference under significant displacement, the RANSAC algorithm iteratively samples the matching point pairs and selects the inliers, effectively suppressing the impact of mismatched points on rotation estimation. This improves the robustness of the calibration process to scene changes and remedies the defect of existing methods in which calibration accuracy decreases due to feature matching errors under complex displacements. Regarding the problem that traditional 3D rotation estimation models may introduce unnecessary rotation components in practical applications due to over-parameterization, a point cloud alignment algorithm (Kabsch) is used and constrained to calculate the rotation matrix based only on the horizontal component, while setting the vertical component to a constant value. This achieves accurate and efficient estimation of the main horizontal rotations between panoramic images, avoiding image distortion caused by estimating redundant rotation components. To address the low efficiency of achieving global orientation consistency across multi-image sequences, a chain calibration operation is employed: it uses the first image as the baseline, calibrates sequentially, and then uses each calibrated image as the baseline for the next round. This achieves efficient and coherent orientation alignment for image sequences of arbitrary length, overcoming the drawbacks of traditional methods that require tedious alignment of each pair of images individually or cannot guarantee overall sequence consistency. Furthermore, to address the difficulty of integrating calibrated panoramic images with the virtual environment, the calibrated image sequence is mapped as a texture resource onto the surface of the virtual 3D scene model and rendered using shader technology. This achieves seamless integration of the calibration results with the virtual map platform's world model, resolving the technical problem of low rendering efficiency due to insufficient calibration accuracy in high-dynamic VR roaming scenarios. This improves the visual coherence, immersion, and construction efficiency of panoramic VR applications.

[0055] In some embodiments, the feature point extraction and matching operation includes: The feature points and corresponding feature descriptor vectors of adjacent panoramic image pairs are extracted using the scale-invariant feature transform algorithm.

[0056] Among them, the Scale Invariant Feature Transform (SIFT) algorithm is a computer vision algorithm used to extract feature points from images that are robust to scaling, rotation and illumination changes; the feature descriptor vector is a mathematical vector used to quantify the image region around the feature point, usually a high-dimensional real number array.

[0057] Specifically, the system can perform multi-scale spatial analysis on each panoramic image by calling the SIFT algorithm in the Open Source Computer Vision Library (OpenCV): First, an image pyramid is constructed, stable feature points (such as corner points and edge points) are detected in different scale spaces, and a 128-dimensional feature descriptor vector is calculated for each feature point. This vector is generated based on the gradient orientation histogram of the feature point's neighborhood.

[0058] For example, when the system processes a pair of adjacent panoramic images with a resolution of 2560 pixels by 1280 pixels, the scale-invariant feature transform algorithm extracts about 300 feature points from each image. Each feature point generates a 128-dimensional feature descriptor vector, with vector elements being floating-point numbers ranging from 0 to 255.

[0059] Based on the similarity between feature descriptor vectors, the K-nearest neighbor matching algorithm is used to match feature points and generate multiple sets of feature point pairs.

[0060] Among them, the K-Nearest Neighbors (KNN) matching algorithm is a matching method that finds the most similar features based on distance metrics; a feature point pair refers to a pair of feature points in two images that establish a correspondence through descriptor similarity.

[0061] Specifically, the system assesses similarity by calculating the Euclidean distance (a mathematical distance used to measure the difference between vectors) between the feature descriptor vectors of the reference image and the image to be calibrated. For each feature point in the reference image, it finds the K (usually K=2) feature points in the image to be calibrated with the smallest Euclidean distance as candidate matches. Then, it filters out mismatches based on a set distance ratio threshold (i.e., the ratio of the nearest neighbor distance to the second nearest neighbor distance): if the ratio is less than the threshold, the nearest neighbor match is retained; otherwise, it is considered a mismatch and discarded. For example, with K=2 and a distance ratio threshold of 0.7, for 300 feature points in the reference image, 1000 candidate match pairs are initially generated. After filtering by the distance ratio threshold, mismatches are eliminated, and finally, 300 high-confidence feature point pairs are retained, with a matching error rate of less than 5%.
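
The K=2 nearest-neighbor search and distance-ratio filter described above can be sketched as follows. This is a minimal brute-force illustration in pure Python on short toy descriptors, not the OpenCV-based pipeline the source refers to; the function name `ratio_test_matches` is hypothetical.

```python
import math


def ratio_test_matches(ref_descriptors, cal_descriptors, ratio=0.7):
    """Return (ref_index, cal_index) pairs that pass the distance-ratio test."""
    matches = []
    for i, d_ref in enumerate(ref_descriptors):
        # Brute force: Euclidean distance to every descriptor of the other image.
        distances = sorted(
            (math.dist(d_ref, d_cal), j) for j, d_cal in enumerate(cal_descriptors)
        )
        if len(distances) < 2:
            continue  # the ratio test needs two neighbours (K = 2)
        (best, j_best), (second, _) = distances[0], distances[1]
        # Keep the nearest neighbour only if it is clearly closer than the second.
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

A reference descriptor whose two nearest candidates are almost equally distant is treated as ambiguous and discarded, which is how the threshold trades the number of matches against their reliability.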

[0062] The scale-invariant feature transform algorithm is configured to perform multi-scale spatial analysis on each panoramic image, extract target feature points from each panoramic image and generate corresponding feature descriptor vectors; the K-nearest neighbor matching algorithm is configured to calculate the Euclidean distance between each feature descriptor vector, find K maximum likelihood matching points for each feature point, and filter out mismatches according to a set distance ratio threshold to generate high-confidence feature point pairs.

[0063] Specifically, the scale-invariant feature transform algorithm ensures the invariance of feature points to image scaling through Difference of Gaussian (DoG) scale-space extremum detection and ensures rotation invariance through principal direction assignment. The K-nearest neighbor matching algorithm achieves efficient distance calculation through brute-force matching or the Fast Library for Approximate Nearest Neighbors (FLANN) and controls the matching strictness through a distance ratio threshold. For example, in an indoor office scene, the scale-invariant feature transform algorithm successfully extracts significant feature points such as door and window corners; the K-nearest neighbor matching algorithm improves the matching accuracy to over 95% with a distance ratio threshold of 0.7, significantly outperforming fixed threshold methods (such as 70%).

[0064] The SIFT algorithm is preferred for panoramic image feature extraction because of its invariance to image scaling, rotation, and illumination changes; the distance ratio threshold of 0.7 is based on experimental data and balances matching accuracy and recall, reducing false matches. For example, a threshold that is too low may leave too few matching points, affecting calibration, while a threshold that is too high may introduce noise.

[0065] Furthermore, Brute-Force Matcher is a classic algorithm used for feature point matching in computer vision. Its core idea is to find the closest (i.e. most similar) feature point descriptor in the image to be matched by exhaustive search for the descriptor of each feature point in the reference image.

[0066] Specifically, the algorithm's workflow is as follows: First, it calculates the distance (usually Euclidean or Hamming distance) between the descriptor of a feature point in the reference image and the descriptors of all feature points in the image to be matched. Then, it selects the feature point with the smallest distance as the best matching candidate. This process is repeated for each feature point in the reference image, generating an initial set of matching pairs. Finally, post-processing steps such as distance ratio checks are typically combined to filter out unreliable matches, improving the robustness of the matching. For example, when implementing panoramic image feature matching, if the reference image has 1000 feature points and the image to be matched has 800 feature points, the brute-force matcher will perform 1000 × 800 = 800,000 descriptor distance calculations. This method is computationally expensive, but its advantage is that it is guaranteed to find the best match within the current dataset. It is often used in scenarios where the number of feature points is small or where extremely high matching accuracy is required.

[0067] Therefore, according to the above implementation method, the system can automatically and robustly establish accurate feature point correspondences between adjacent panoramic images, providing high-quality input data for subsequent three-dimensional spherical coordinate transformation and rotation matrix calculation, thereby supporting efficient calibration of panoramic image sequences.

[0068] In some embodiments, the two-dimensional image coordinates in each feature point pair are converted into three-dimensional spherical coordinates to construct a source spherical point set and a target spherical point set, including: The two-dimensional image coordinates of each feature point in the feature point pair are mapped onto the surface of a sphere with a fixed radius to obtain the corresponding three-dimensional spherical coordinates.

[0069] Among them, the fixed radius spherical surface refers to a spherical model with a preset radius value, which is used to project two-dimensional coordinates into three-dimensional space; two-dimensional image coordinates refer to the pixel position of feature points in the panoramic image, represented in normalized coordinate form; three-dimensional spherical coordinates refer to the three-dimensional spatial position of feature points on the spherical surface.

[0070] Specifically, the system can convert the two-dimensional image coordinates of each feature point into three-dimensional spherical coordinates, where the two image coordinate parameters are normalized to a fixed interval and the conversion formula uses a fixed radius value expressed in meters (m).

[0071] For example, with the fixed radius set to a preset value in meters (m), the system calculates the three-dimensional spherical coordinates of a feature point from its two-dimensional image coordinates.
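
The mapping from normalized image coordinates to a point on the fixed-radius sphere can be sketched as follows. The exact formula of the source is not reproduced here; the sketch assumes the common equirectangular convention in which both parameters lie in [0, 1], the horizontal parameter `u` spans an azimuth of 0 to 2π, the vertical parameter `v` spans a polar angle of 0 to π, and Z is the polar axis. The function name `uv_to_sphere` is hypothetical.

```python
import math


def uv_to_sphere(u, v, radius=1.0):
    """Project normalized image coordinates (u, v) onto a sphere of fixed radius.

    Assumptions (not from the source): u, v in [0, 1]; azimuth = 2*pi*u;
    polar angle = pi*v, measured from the +Z axis.
    """
    azimuth = 2.0 * math.pi * u   # longitude, from the horizontal parameter
    polar = math.pi * v           # latitude-like angle, from the vertical parameter
    x = radius * math.sin(polar) * math.cos(azimuth)
    y = radius * math.sin(polar) * math.sin(azimuth)
    z = radius * math.cos(polar)
    return (x, y, z)
```

Under these assumptions every projected point lies at distance `radius` from the origin, so the source and target point sets share the same sphere and differ only by a rotation.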

[0072] The set of three-dimensional spherical coordinates of feature points in the panoramic image that serves as the reference image in each group of adjacent panoramic images is defined as the source spherical point set.

[0073] The reference image refers to the panoramic image used as a reference during the calibration process; the source spherical point set refers to the set of three-dimensional spherical coordinates corresponding to all feature points in the reference image.

[0074] Specifically, the system can iterate through the successfully matched feature points in the reference image and aggregate their 3D spherical coordinates into a point set data structure, such as an array or list. For example, for a set of adjacent panoramic images, if the reference image contains 300 matching feature points, the system can store the 3D spherical coordinates of these points as a point set containing 300 elements, defined as the source spherical point set.

[0075] The set of three-dimensional spherical coordinates of feature points of the panoramic image to be calibrated in each group of adjacent panoramic images is defined as the target spherical point set.

[0076] The image to be calibrated refers to a panoramic image whose orientation needs to be adjusted to align with the reference image; the target spherical point set refers to the set of three-dimensional spherical coordinates corresponding to all feature points in the image to be calibrated.

[0077] Specifically, the system can iterate through the successfully matched feature points in the image to be calibrated and aggregate their 3D spherical coordinates into another point set data structure, for example, using the same storage format as the source spherical point set. For example, for the same set of adjacent panoramic images, if the image to be calibrated contains 300 matching feature points, the system can store the 3D spherical coordinates of these points as a point set containing 300 elements, defined as the target spherical point set.

[0078] Therefore, according to the above implementation method, the system can automatically construct point set data for subsequent rotation matrix calculation, providing a basis for panoramic image calibration, ensuring that feature points are accurately mapped from two-dimensional image space to three-dimensional spherical space, thereby supporting efficient chain calibration operations.

[0079] In some embodiments, based on the source spherical point set and the target spherical point set, the optimal rotation matrix between the current adjacent panoramic image pair is calculated using the random sample consensus algorithm and the point cloud alignment algorithm, including: The random sample consensus algorithm is used to randomly sample a subset containing a preset number of feature point pairs from the multiple sets of feature point pairs.

[0080] The preset number of feature point pairs refers to the number of feature point pairs that are set in advance for each sampling operation. This number must meet the minimum computational requirements of the point cloud alignment algorithm (usually 3 pairs of points).

[0081] Specifically, the system can configure the sampling parameters of the Random Sample Consensus Algorithm (RANSAC), setting the number of iterations (e.g., 100 times) and the number of feature point pairs sampled each time (e.g., 3 pairs), to randomly extract a subset from all feature point pairs. For example, the system performs 100 random samplings from 300 sets of feature point pairs, uniformly and randomly selecting 3 pairs of points each time to generate 100 sampling subsets.

[0082] For each subset obtained from sampling, a candidate rotation matrix is calculated using the point cloud alignment algorithm.

[0083] Among them, the point cloud alignment algorithm (Kabsch algorithm) is a three-dimensional point cloud registration method based on singular value decomposition (SVD) to calculate the rotation matrix between two point sets; the candidate rotation matrix refers to the temporary rotation matrix calculated at each sampling.

[0084] Specifically, the system calculates the candidate rotation matrix using the following steps: first, calculate the centroid coordinates of the source and target spherical point sets; translate both point sets so that each is centered on its own centroid; calculate the covariance matrix of the translated point sets; perform singular value decomposition (SVD) on the covariance matrix; and solve for the rotation matrix based on the decomposition results. (It should be noted that in the specific application scenario of this scheme, since all feature points are located on a sphere of fixed radius, their geometric center is the center of the sphere. Therefore, this characteristic can be used to simplify the calculation; for example, in a specific implementation, the origin of the coordinate system can be directly treated as the equivalent centroid.) For example, for a sampling subset containing 3 pairs of points, the covariance matrix after translation is a 3x3 matrix, and the candidate rotation matrix obtained after SVD has a determinant value of 1.0. The candidate rotation matrix is then multiplied by the coordinates of each point in the target spherical point set to generate the transformed target spherical point set, and the sum of errors between the transformed target spherical point set and the source spherical point set over all feature point pairs is calculated.

[0085] Matrix multiplication is a linear transformation operation that multiplies a 3x3 rotation matrix with the coordinates of a 3D point. In this method, when this operation is applied to a point, its vertical component (Z-axis coordinate) remains unchanged, while only its horizontal component (X and Y-axis coordinates) is changed to achieve horizontal alignment. The transformed target spherical point set is the new point set obtained after applying the rotation matrix, and its vertical component of the point coordinates is consistent with that before the rotation. The total error is the sum of the Euclidean distances between the transformed target point and the corresponding source point in all feature point pairs.

[0086] Specifically, the system can represent the point set using homogeneous coordinates, multiply the candidate rotation matrix by the coordinates of each target point to generate a transformed point set; then it iterates through each pair of feature points, calculates the Euclidean distance between the transformed target point and the source point, and sums all distance values. For example, for 300 pairs of feature points, after applying the candidate rotation matrix, the error distance (e.g., 0.1 meters) of each pair is calculated, and the sum of these errors yields a total error of 30 meters.
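
The error evaluation described above (apply a candidate rotation to every target point, then sum the Euclidean distances to the matching source points) can be sketched as follows. For brevity the sketch uses plain 3-vectors instead of homogeneous coordinates and builds the candidate as a rotation about the Z axis, consistent with the horizontal-only constraint; the helper names `yaw_matrix`, `rotate`, and `total_error` are hypothetical.

```python
import math


def yaw_matrix(angle):
    """3x3 rotation about the Z axis; the Z coordinate of a point is unchanged."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotate(matrix, point):
    """Multiply a 3x3 matrix by a 3D point."""
    return tuple(sum(matrix[r][k] * point[k] for k in range(3)) for r in range(3))


def total_error(matrix, source_points, target_points):
    """Sum of distances between each rotated target point and its source point."""
    return sum(
        math.dist(rotate(matrix, q), p)
        for p, q in zip(source_points, target_points)
    )
```

A candidate that aligns the two point sets perfectly yields a total error of zero, while a poor candidate accumulates one distance term per feature point pair.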

[0087] Choose the candidate rotation matrix that minimizes the total error as the optimal rotation matrix.

[0088] The optimal rotation matrix is the rotation matrix among all candidate rotation matrices that minimizes the total error, and is used to achieve optimal alignment of point sets.

[0089] Specifically, the system can select the matrix with the minimum sum of errors from all candidate rotation matrices as the final result. For example, if the system evaluates the sum of errors of 100 candidate rotation matrices and finds that the sum of errors of the 5th matrix is 25 m, which is the minimum, that matrix is selected as the optimal rotation matrix.

[0090] Therefore, according to the above implementation method, the system can automatically and robustly calculate the optimal rotation parameters between adjacent panoramic images through iterative sampling and error optimization, providing high-precision input for horizontal calibration.
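
The sampling-and-selection loop of this embodiment can be sketched as follows. To keep the example free of external libraries, the candidate for each subset is computed with a closed-form least-squares rotation about the Z axis (a horizontal-only stand-in for the SVD-based Kabsch step), which is an assumption of this sketch rather than the source's implementation. All function names and the `seed` parameter are hypothetical; each pair is given as `(source_point, target_point)`.

```python
import math
import random


def estimate_yaw(pairs):
    """Least-squares rotation angle about Z that maps target points onto source points."""
    cross = sum(q[0] * p[1] - q[1] * p[0] for p, q in pairs)
    dot = sum(q[0] * p[0] + q[1] * p[1] for p, q in pairs)
    return math.atan2(cross, dot)


def apply_yaw(angle, point):
    """Rotate a point about the Z axis; its Z coordinate is left unchanged."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = point
    return (c * x - s * y, s * x + c * y, z)


def total_error(angle, pairs):
    """Sum of distances between rotated target points and source points."""
    return sum(math.dist(apply_yaw(angle, q), p) for p, q in pairs)


def ransac_yaw(pairs, iterations=100, sample_size=3, seed=0):
    """Sample subsets, fit a candidate per subset, keep the minimum-error one."""
    rng = random.Random(seed)
    best_angle, best_error = None, math.inf
    for _ in range(iterations):
        subset = rng.sample(pairs, sample_size)   # e.g. 3 pairs per iteration
        angle = estimate_yaw(subset)              # candidate rotation
        error = total_error(angle, pairs)         # evaluated on all pairs
        if error < best_error:
            best_angle, best_error = angle, error
    return best_angle, best_error
```

Because every candidate is scored against all feature point pairs, a subset contaminated by a mismatch produces a rotation that fits the remaining pairs poorly and is therefore not selected.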

[0091] In some embodiments, based on applying the optimal rotation matrix to the target spherical point set, horizontal calibration is performed on the adjacent panoramic image pair in each round, and the calibrated image is used as the reference for the next round of calibration, including: The first panoramic image in the panoramic image sequence is used as the initial reference image.

[0092] The initial reference image refers to the first panoramic image in the panoramic image sequence, which serves as the starting point for calibration and is used as a reference for subsequent image alignment.

[0093] Specifically, the system can designate the first image file in the image sequence as the initial reference image by reading the file naming order or acquisition timestamp, keeping its orientation unchanged, and using it as the reference frame for subsequent calibration. For example, when processing a sequence of 10 panoramic images, the system sets the first image (file named "image_001.jpg") as the initial reference image and initializes its three-dimensional orientation to the identity matrix.

[0094] The initial reference image and the second adjacent panoramic image in the sequence are taken as the first pair of adjacent panoramic images. Feature point extraction and matching, coordinate transformation, and calculation and application of the optimal rotation matrix are performed to complete the calibration of the second panoramic image.

[0095] The first pair of adjacent panoramic images refers to the pairing of the first and second images in the sequence, used to initiate the chain calibration process; the feature point extraction and matching operation refers to the process of detecting significant key points in the image and establishing corresponding relationships through computer vision algorithms; the coordinate transformation operation refers to the mathematical transformation process of mapping the two-dimensional image coordinates of feature points to three-dimensional spherical coordinates; the operation of calculating and applying the optimal rotation matrix refers to the process of solving the rotation parameters through the Random Sample Consensus Algorithm (RANSAC) and the point cloud alignment algorithm (Kabsch) and applying them to image alignment.

[0096] Specifically, the system can perform a complete calibration process on the first image pair by calling a feature point extraction module (such as the scale-invariant feature transform algorithm, SIFT), a coordinate transformation module (mapping two-dimensional coordinates to a sphere), and a rotation matrix calculation module (RANSAC + Kabsch algorithm). First, feature points are extracted and matched to generate multiple sets of feature point pairs. Then, the feature point coordinates are converted to three-dimensional spherical coordinates to construct a source spherical point set (reference image) and a target spherical point set (image to be calibrated). Finally, the RANSAC algorithm is iterated: in each iteration, a subset of feature point pairs is randomly sampled, a candidate rotation matrix is calculated using the Kabsch algorithm, and inliers (i.e., point pairs whose error with the source point after transformation is less than the threshold) are selected based on a preset error threshold. Through multiple iterations, the candidate rotation matrix with the largest number of inliers or the smallest total inlier error is finally selected as the optimal rotation matrix. This optimal matrix is applied to the target spherical point set to align the orientation of the second image with the initial reference image. For example, for the first image pair (Image 1 and Image 2), the system extracts 300 feature point pairs. After calculating the optimal rotation matrix through the above process, the average alignment error between Image 2 and Image 1 is reduced from 0.5 meters to 0.05 meters, completing the calibration. After each calibration round, the system calculates the alignment error (e.g., average distance error). If the error exceeds a threshold (e.g., 0.1 meters), the feature points are re-matched or the sampling strategy is adjusted to ensure the robustness of the chained process.

[0097] The calibrated Nth panoramic image is used as the reference image, and together with the (N+1)th panoramic image in the sequence, a new pair of adjacent panoramic images is formed, where N is an integer greater than or equal to 2.

[0098] The new adjacent panoramic image pair refers to the image pairing dynamically generated during the calibration process, where the Nth image has been calibrated and used as the new benchmark, and the N+1th image is the object to be calibrated; N is the iteration index, which increments from 2.

[0099] Specifically, the system can use a cyclic index N (initially set to 2) to switch the role of the calibrated Nth image from "image to be calibrated" to "reference image" each time, and form a new pairing with the next image in the sequence (the N+1th image). For example, when N=2, the system uses the calibrated image 2 as the reference image and forms a new pairing with image 3; when N=3, it uses image 3 as the reference and pairs it with image 4, and so on.

[0100] The system performs feature point extraction and matching, coordinate transformation, and calculation and application of the optimal rotation matrix on new adjacent panoramic image pairs to complete the calibration of the N+1th panoramic image.

[0101] The calibration operation refers to repeating the same complete process as the first group for each new pair to ensure that the orientation of each image in the sequence is gradually unified.

[0102] Specifically, the system can repeatedly call the same algorithm modules (feature point extraction, coordinate transformation, and rotation matrix calculation) to perform feature matching, coordinate transformation, and rotation matrix calculation for each new pair, and apply the optimal rotation matrix to the point set of the (N+1)th image to achieve orientation alignment. For example, for a new pair with N=2 (image 2 and image 3), the system performs feature point matching to generate 300 point pairs, and after calculating the rotation matrix, the calibration error of image 3 is reduced to 0.06 meters.

[0103] Repeat the previous step until the last panoramic image in the panoramic image sequence is calibrated.

[0104] Here, repeated execution means iteratively processing all image pairs in the sequence until the end of the sequence; the termination condition is to stop iterating when N+1 equals the total number of images in the sequence.

[0105] Specifically, the system can automatically traverse the sequence by setting a loop termination condition (such as when N+1 equals the total number of images in the sequence), ensuring that each image is calibrated. For example, for a sequence of 10 images, the system iterates starting from N=2, processing image pairs sequentially. Once image 10 is calibrated, the process terminates.

[0106] Therefore, according to the above implementation method, the system can automatically and efficiently unify the horizontal direction of the entire panoramic image sequence through a chain calibration process, ensuring the smoothness of perspective switching and visual continuity during VR roaming.
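
The control flow of the chain calibration can be sketched as follows. The pairwise estimation and the application of the rotation are passed in as callables, so the sketch shows only the chaining logic; `chain_calibrate`, `estimate_rotation`, and `apply_rotation` are hypothetical names, and the toy usage below represents each image by a single heading angle purely for illustration.

```python
def chain_calibrate(images, estimate_rotation, apply_rotation):
    """Align every image in a sequence to the first one, pair by pair.

    estimate_rotation(reference, image) returns the rotation that aligns
    `image` with `reference`; apply_rotation(image, rotation) returns the
    calibrated image.
    """
    if not images:
        return []
    calibrated = [images[0]]              # initial reference image, left unchanged
    for image in images[1:]:
        reference = calibrated[-1]        # the last calibrated image is the new reference
        rotation = estimate_rotation(reference, image)
        calibrated.append(apply_rotation(image, rotation))
    return calibrated


# Toy usage: each "image" is just its horizontal heading in degrees.
headings = [10.0, 25.0, -5.0, 40.0]
aligned = chain_calibrate(
    headings,
    estimate_rotation=lambda reference, image: reference - image,
    apply_rotation=lambda image, rotation: image + rotation,
)
```

Since each image is aligned to an already calibrated neighbour, a sequence of length n needs only n - 1 pairwise calibrations to bring every image into the orientation of the first one.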

[0107] In some embodiments, the step of calculating a candidate rotation matrix using a point cloud alignment algorithm for each sampled subset includes: Set the z-axis coordinates of each point in the spherical point set to a preset constant value.

[0108] As a standard procedure in point cloud alignment algorithms (such as the Kabsch algorithm), it is theoretically necessary to calculate the centroid of the point set. The centroid usually refers to the arithmetic mean of the three-dimensional coordinates of all points in the point set, representing the spatial center position of the point set. In the algorithm described in this method, if the centroid is calculated, the z-axis coordinate of each point must be regarded as a preset constant value (e.g., 0). Therefore, the centroid calculation in this case is actually based only on the x-axis and y-axis coordinate components of the point.

[0109] The corresponding theoretical calculation formula is $\bar{p} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i,\ C\right)$, where $n$ is the number of points in the point set, $x_i$ and $y_i$ are the x-axis and y-axis coordinates of the $i$-th point, and C is a preset constant value. For example, for a source spherical point set containing 3 points, the theoretical centroid is obtained by averaging the x-axis and y-axis coordinates of the three points and taking C as the z-axis coordinate. However, in the specific implementation scenario of this invention, since both the source point cloud and the target point cloud originate from a panoramic acquisition device with a known spatial location, their coordinate systems are already consistent during the data acquisition stage. Therefore, when solving the rotation matrix later, the calculation can be performed directly based on the point set, without needing to perform the step of translation to the centroid.
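
The theoretical centroid described in [0108], in which only the x-axis and y-axis components are averaged and the z-axis coordinate is replaced by the preset constant, can be sketched as follows. The function name `horizontal_centroid` and the default constant of 0 are assumptions of this example.

```python
def horizontal_centroid(points, z_constant=0.0):
    """Centroid that averages x and y only and fixes z to a preset constant."""
    n = len(points)
    if n == 0:
        raise ValueError("point set must not be empty")
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return (cx, cy, z_constant)
```

The original z coordinates of the points do not influence the result, which reflects the decision to estimate the rotation from the horizontal components alone.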

[0110] Calculate the covariance matrix between the source spherical point set and the target spherical point set after translation.

[0111] The covariance matrix is a 3x3 matrix used to characterize the linear correlation between two point sets, and its element values are the covariances of the coordinate components of the point sets.

[0112] Specifically, the system can calculate the covariance matrix through a formula in which Σ represents the summation over all pairs of points and T represents the transpose operation.

[0113] For example, for 3 pairs of points, the system calculates a 3x3 covariance matrix, where the element in the first row and first column has a value of 2.5.

[0114] Perform singular value decomposition on the covariance matrix and solve for the candidate rotation matrix based on the results of the singular value decomposition.

[0115] Singular value decomposition (SVD) is a mathematical method that decomposes a matrix into a product of three matrices. The candidate rotation matrix is a temporary rotation matrix calculated from the decomposition results and is used for point set alignment.

[0116] Specifically, the system can perform the decomposition by calling the SVD function of a linear algebra library (such as NumPy, a numerical computing library for Python), which factors the covariance matrix as $H = U S V^{T}$. The rotation matrix is then calculated based on the formula $R = V U^{T}$, while checking that its determinant is 1 (a proper rotation that preserves the right-handed coordinate system). For example, after performing SVD on the above covariance matrix, the rotation matrix R is obtained with a determinant of 1.0, indicating a valid rotation.

[0117] Therefore, according to the above implementation method, the system can automatically solve the candidate rotation matrix through precise mathematical calculations, providing high-precision transformation parameters for panoramic image calibration.
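The covariance calculation, singular value decomposition, and determinant check described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `kabsch_horizontal`, the one-row-per-point array layout, the ordering of the covariance product (target times source transposed, so that the result rotates target points onto source points), and the sample points are all hypothetical. Following the description, the z component is overwritten with the constant 0 and no centroid translation is applied.

```python
import numpy as np


def kabsch_horizontal(source, target, z_const=0.0):
    """Return a 3x3 rotation R such that R @ target_i approximates source_i.

    Assumption: the vertical (z) component of every point is replaced by
    z_const, so only the horizontal components drive the estimate.
    """
    src = np.array(source, dtype=float)  # copy, shape (n, 3)
    tgt = np.array(target, dtype=float)
    src[:, 2] = z_const
    tgt[:, 2] = z_const

    # Cross-covariance H = sum_i target_i * source_i^T (3x3 matrix).
    h = tgt.T @ src
    u, _, vt = np.linalg.svd(h)

    # Reflection guard: flip the last singular direction if det would be -1.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


# Hypothetical check: the source points are the target points turned by 30 degrees.
theta = np.deg2rad(30.0)
rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
target_pts = np.array([[1.0, 0.0, 0.2], [0.0, 1.0, -0.1], [0.6, 0.8, 0.5]])
source_pts = target_pts @ rz.T
rotation = kabsch_horizontal(source_pts, target_pts)
```

With the z component held at 0 the covariance matrix has rank at most 2, so the sign correction on the last singular direction is what keeps the result a proper rotation about the vertical axis.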

[0118] In some embodiments, the calibrated panoramic image sequence is mapped onto the model surface of a virtual 3D scene, and texture rendering is performed using shader technology to obtain the mapped virtual environment, including: The calibrated panoramic image sequence is input as a texture resource into the virtual map platform.

[0119] Among them, the virtual map platform refers to the software system used to load, display and interact with three-dimensional virtual scenes, such as the Three.js graphics rendering engine; texture resources refer to the calibrated panoramic image sequence, stored in the form of image files, and used for texture rendering.

[0120] Specifically, the system can batch load calibrated panoramic image files through file input interfaces (such as the TextureLoader class in Three.js, a utility class built into the Three.js graphics rendering engine specifically for asynchronously loading image files and converting them into Texture Objects), convert them into Texture Objects, and store them in the platform's resource pool. For example, the system inputs 10 calibrated panoramic images (JPEG format, each with a resolution of 2560 pixels by 1280 pixels) as texture resources into the Three.js virtual map platform, generating 10 Texture Objects, each approximately 30MB in size.

[0121] In a virtual map platform, shader technology is used to map the texture of a single panoramic image onto the surface of a 3D model of a world model configured by the virtual map platform.

[0122] The world model refers to a pre-built 3D scene model in the virtual map platform, which consists of vertices, edges, and faces; the 3D model surface refers to the collection of polygons in the world model, used to receive texture mapping.

[0123] Specifically, the system can bind panoramic image textures to 3D models using shader materials: first, the geometric data of the world model (such as vertex coordinates and normal vectors) is passed to the GPU, and then the shader program maps the texture onto the model surface. For example, the virtual map platform configures an indoor office area world model, which is composed of multiple polygonal bodies with complex geometric structures; the system maps a single panoramic image texture (such as the first calibrated image) onto the model surface, covering the entire visible area of the scene.

[0124] Shader technology includes vertex shaders and fragment shaders. Vertex shaders are used to process the world coordinates and panorama coordinates of vertices in the world model, while fragment shaders are used to calculate UV texture coordinates based on the viewing direction and sample colors from the input panorama texture.

[0125] Among them, the vertex shader refers to the GPU program executed for each vertex, which is used for coordinate transformation; the fragment shader refers to the GPU program executed for each pixel fragment, which is used for color calculation; world coordinates refer to the global position of the vertex in the 3D scene; panorama coordinates refer to the position of the panorama image acquisition point in 3D space; UV texture coordinates refer to the normalized coordinates (U and V range from 0 to 1) on the 2D texture image.

[0126] Specifically, the system can be implemented through the following process: the vertex shader receives model vertex data and outputs variables containing world coordinates and panoramic coordinates; the fragment shader calculates the gaze direction vector based on the camera position and vertex world coordinates, converts this vector to spherical coordinates, maps it to UV coordinates, and finally samples the corresponding color value from the panoramic texture. For example, the fragment shader calculates the gaze direction vector, converts it to spherical coordinates (polar angle and azimuth), maps these to UV coordinates, and obtains RGB (Red, Green, Blue) color values by sampling the panoramic image texture at those coordinates.
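The direction-to-UV conversion performed by the fragment shader can be illustrated on the CPU with a short Python sketch. The axis convention (z vertical, azimuth measured in the x-y plane), the equirectangular mapping, and the function name `direction_to_uv` are assumptions made for illustration; the description does not fix them, and a real shader would express the same arithmetic in GLSL.

```python
import math


def direction_to_uv(camera_pos, vertex_pos):
    """Map the gaze direction camera -> vertex to equirectangular UV in [0, 1].

    Assumed convention: z is the vertical axis, the polar angle is measured
    from +z, and the azimuth is measured in the x-y plane from +x.
    """
    dx, dy, dz = (v - c for v, c in zip(vertex_pos, camera_pos))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("camera and vertex must not coincide")

    # Clamp guards against rounding slightly outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, dz / length)))  # 0 .. pi
    azimuth = math.atan2(dy, dx)                         # -pi .. pi

    u = (azimuth + math.pi) / (2.0 * math.pi)
    v = polar / math.pi
    return u, v


# Hypothetical positions: looking along +x and along +y from the origin.
uv_forward = direction_to_uv((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
uv_left = direction_to_uv((0.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```

The resulting pair would then be used as the lookup coordinate into the panoramic texture.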

[0127] During VR roaming, when the viewpoint is located between two or more adjacent panoramic acquisition points, the fragment shader dynamically calculates the distance ratio between the viewpoint and each panoramic acquisition point, and performs weighted mixing of the sampled colors of multiple panoramic images based on the distance ratio, so that the multiple panoramic images can be smoothly transitioned.

[0128] The distance ratio refers to the inverse weight of the spatial distance between the current viewpoint and each panoramic acquisition point, which is calculated through linear interpolation or a nonlinear decay function; the weighted mixing refers to sampling multiple panoramic textures in parallel in the GPU fragment shader and performing fusion calculation on the RGB color values according to the weight coefficients.

[0129] Specifically, the system achieves the transition effect through the following steps: First, the fragment shader calculates the Euclidean distance from the viewpoint to each relevant acquisition point based on the world coordinates of the current viewpoint and the known coordinates of the panoramic acquisition points. Then, it converts the absolute distances into normalized weight coefficients using the inverse of the distance or a Gaussian decay function, ensuring that the sum of all weights is 1. Finally, for each pixel fragment, it samples the color values of multiple panoramic images at the corresponding UV coordinates in parallel, performs linear blending according to the weights, and outputs the final color. This process fully utilizes the parallel computing power of the GPU and is completed in real time in the rendering pipeline, ensuring a transition frame rate of no less than 30fps.

[0130] For example, when viewpoint C is located between panoramic acquisition point A and panoramic acquisition point B, the system first calculates the distances from C to A and from C to B in meters, and then calculates the normalized weights $w_A$ and $w_B$ (with $w_A + w_B = 1$) from these distances. For color mixing, the color $c_A$ is sampled from panoramic image A and the color $c_B$ is sampled from panoramic image B, and the final color is $c = w_A c_A + w_B c_B$.

[0131] In this way, as the viewpoint moves from A to B, the scene texture gradually transitions from being dominated by panoramic image A to being dominated by panoramic image B, avoiding visual jumps.
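The distance-based weighting and linear blending can be sketched as follows. This is only an illustration of the inverse-distance variant: the function names, the `eps` guard against division by zero, the coordinates (two acquisition points 5 meters apart, with the viewpoint 1 meter from A), and the two colors are hypothetical values chosen for the example.

```python
import math


def inverse_distance_weights(viewpoint, capture_points, eps=1e-6):
    """Normalized weights proportional to 1 / distance; the weights sum to 1."""
    inverse = [1.0 / max(math.dist(viewpoint, p), eps) for p in capture_points]
    total = sum(inverse)
    return [w / total for w in inverse]


def blend_colors(colors, weights):
    """Linear blend of RGB tuples with the given weights."""
    return tuple(
        sum(w * color[k] for color, w in zip(colors, weights))
        for k in range(3)
    )


# Hypothetical layout: A and B are 5 m apart, viewpoint C is 1 m from A.
point_a = (0.0, 0.0, 1.5)
point_b = (5.0, 0.0, 1.5)
viewpoint_c = (1.0, 0.0, 1.5)

weights = inverse_distance_weights(viewpoint_c, [point_a, point_b])
color_a = (1.0, 0.0, 0.0)  # color sampled from panoramic image A (made up)
color_b = (0.0, 0.0, 1.0)  # color sampled from panoramic image B (made up)
final_color = blend_colors([color_a, color_b], weights)
```

Here the distances are 1 m and 4 m, so image A receives a weight of 0.8 and image B a weight of 0.2. A Gaussian decay could be substituted for the inverse distance without changing the normalization or the blend.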

[0132] Therefore, according to the above implementation method, the system can achieve high-fidelity fusion of the calibrated panoramic image and the virtual 3D scene to generate a coherent visual environment that can be used for immersive VR roaming.

[0133] In another embodiment of the invention, the system is applied to a virtual 3D scene of an indoor office area, aiming to achieve an immersive VR roaming experience. First, the system acquires a sequence of panoramic images collected along a preset path. This sequence contains 10 panoramic images, with the acquisition point interval set to 5 meters to ensure that there is an overlapping field of view (approximately 30%) between any two adjacent panoramic images. The images are captured using an Insta360 panoramic camera (a universal panoramic shooting device), with a resolution of 2560 pixels multiplied by 1280 pixels. The camera is fixed 1.5 meters above the ground by a bracket to maintain a horizontal viewing angle. Adjacent images in the sequence satisfy the condition of mutual visibility; for example, the first and second images can jointly cover the corridor entrance area.

[0134] For each pair of adjacent panoramic images (e.g., the first and second images), the system performs feature point extraction and matching: Feature points and 128-dimensional feature descriptor vectors for each image are extracted using the Scale Invariant Feature Transform (SIFT) algorithm. Then, the K-Nearest Neighbor (KNN) matching algorithm is used based on descriptor similarity to generate multiple pairs of feature points (e.g., the first image pair generates 300 pairs). Next, the system converts the two-dimensional image coordinates in each pair of feature points into three-dimensional spherical coordinates: the normalized coordinates of each feature point are mapped onto the surface of a sphere with a radius of 1 meter (m), and the corresponding three-dimensional coordinates are calculated. This yields a source spherical point set (belonging to the reference image, such as the first image) and a target spherical point set (belonging to the image to be calibrated, such as the second image).
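The conversion from two-dimensional image coordinates to points on the unit sphere can be sketched as below. The sketch assumes that the panoramic images use an equirectangular projection and that z is the vertical axis; neither is stated explicitly in the description, and the function name `pixel_to_unit_sphere` is hypothetical.

```python
import math


def pixel_to_unit_sphere(px, py, width, height):
    """Map an equirectangular pixel to a point on a sphere of radius 1.

    Assumed convention: the image center looks along +x, the horizontal
    pixel axis spans the azimuth, and the vertical pixel axis spans the
    polar angle measured from +z (the top row of the image).
    """
    u = px / width   # normalized horizontal coordinate, 0 .. 1
    v = py / height  # normalized vertical coordinate, 0 .. 1

    azimuth = (u - 0.5) * 2.0 * math.pi
    polar = v * math.pi

    sin_polar = math.sin(polar)
    return (
        sin_polar * math.cos(azimuth),
        sin_polar * math.sin(azimuth),
        math.cos(polar),
    )


# Center pixel of a 2560 x 1280 panorama, plus an arbitrary feature location.
center_point = pixel_to_unit_sphere(1280, 640, 2560, 1280)
feature_point = pixel_to_unit_sphere(300, 900, 2560, 1280)
```

Applying the function to both coordinates of every matched feature pair gives the source and target spherical point sets used in the calibration step.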

[0135] Based on the source and target spherical point sets, the system calculates the optimal rotation matrix using the Random Sample Consensus Algorithm (RANSAC) and the Kabsch point cloud alignment algorithm: The RANSAC algorithm randomly samples 100 times from 300 feature point pairs, selecting 3 pairs each time; for each sample subset, the Kabsch algorithm calculates a candidate rotation matrix (including centroid translation, covariance matrix calculation, singular value decomposition (SVD), and other steps); the system evaluates the sum of errors of each candidate matrix (i.e., the sum of Euclidean distances between the transformed point set and the source point set), and selects the matrix corresponding to the minimum sum of errors (e.g., 0.02 radians) as the optimal rotation matrix. This matrix is applied to the target spherical point set to complete horizontal calibration (e.g., aligning the second image with the first image, reducing the average distance error from 0.5 meters to 0.05 meters).
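The sampling-and-scoring loop can be sketched in plain Python as follows. To keep the example free of dependencies, the per-sample alignment uses the closed-form planar rotation angle instead of an SVD call; this is a substitution made for the sketch, and it is the solution the Kabsch step reduces to when the vertical component is set to 0. The function names, the fixed random seeds, and the synthetic data (20 consistent pairs and 3 deliberately wrong ones) are hypothetical.

```python
import math
import random


def planar_yaw(pairs):
    """Closed-form yaw (radians) that best rotates target xy onto source xy."""
    cross = sum(t[0] * s[1] - t[1] * s[0] for s, t in pairs)
    dot = sum(t[0] * s[0] + t[1] * s[1] for s, t in pairs)
    return math.atan2(cross, dot)


def rotate_xy(point, yaw):
    """Rotate the horizontal components of a point about the vertical axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])


def total_error(pairs, yaw):
    """Sum of horizontal distances between rotated targets and sources."""
    return sum(math.dist(rotate_xy(t, yaw), s[:2]) for s, t in pairs)


def ransac_yaw(pairs, iterations=100, sample_size=3, seed=0):
    """Keep the candidate yaw with the smallest total error over all pairs."""
    rng = random.Random(seed)
    best_yaw, best_err = 0.0, math.inf
    for _ in range(iterations):
        yaw = planar_yaw(rng.sample(pairs, sample_size))
        err = total_error(pairs, yaw)
        if err < best_err:
            best_yaw, best_err = yaw, err
    return best_yaw, best_err


# Synthetic (source, target) pairs: a true yaw of 20 degrees, 3 bad matches.
true_yaw = math.radians(20.0)
data_rng = random.Random(1)
pairs = []
for i in range(23):
    angle = data_rng.uniform(-math.pi, math.pi)
    target = (math.cos(angle), math.sin(angle), 0.0)
    offset = 2.0 if i < 3 else 0.0  # the first three pairs are outliers
    sx, sy = rotate_xy(target, true_yaw + offset)
    pairs.append(((sx, sy, 0.0), target))

best_yaw, best_err = ransac_yaw(pairs)
```

A candidate computed from a subset that contains a wrong match misaligns all the consistent pairs, so its total error is larger and it is discarded in favor of a candidate drawn from consistent pairs only.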

[0136] After calibration, the system performs a panoramic rendering mapping step: Using the ShaderMaterial interface of the Three.js graphics rendering engine (an open-source JavaScript library for 3D computer graphics rendering), the calibrated panoramic image sequence is mapped as a texture resource onto the surface of the 3D architectural model (the model contains 50,000 vertices). The vertex shader processes each vertex, outputting world coordinates and panoramic coordinates; the fragment shader calculates the view direction vector based on the camera position and vertex world coordinates, converts it to spherical coordinates (polar angle and azimuth), and maps these to UV texture coordinates. Finally, color values are sampled from the panoramic image at those coordinates.

[0137] During VR roaming, when the user's viewpoint is located between two adjacent panoramic points (such as point A and point B), the system does not directly blend the two original panoramic images. Instead, it performs parallel rendering and color fusion based on the model surface: For the same vertex on the 3D model surface (taking position D as an example), the fragment shader calculates its viewing direction (i.e., vector D→A and vector D→B) relative to the panoramic acquisition points A and B respectively. These two direction vectors are converted into spherical coordinates and mapped to their respective independent panoramic texture coordinates (UV coordinates). Then, the shader samples in parallel from the corresponding panoramic textures A and B according to their respective calculated texture coordinates to obtain the color values of the vertex as seen from the two viewpoints A and B. Subsequently, weights are dynamically calculated based on the distance ratio between the current viewpoint C and points A and B (e.g., 30% from point A and 70% from point B), and the two sampled colors are blended with these weights to generate the final display color of the vertex. During this process, because vertex D is mapped to different texture coordinates in the two panoramic coordinate systems A and B, the two sampled colors are essentially extracted from different perspectives (the DA and DB directions). The farther the vertex is from a panoramic acquisition point, the more significant the effect of perspective projection distortion becomes on the color sampled from that point's panoramic image, thus achieving a smooth transition effect that conforms to visual laws. This embodiment demonstrates the entire process from data acquisition and automatic calibration to real-time rendering, showcasing the technical advantages of this invention in improving the realism and efficiency of VR roaming.
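The point that one surface position maps to different texture coordinates in the two panoramas can be checked with a small sketch. The direction is taken here from the acquisition point toward the surface position, the vertical axis is assumed to be z, the projection is assumed to be equirectangular, and all coordinates and the function name are hypothetical.

```python
import math


def uv_from_capture_point(capture_pos, surface_pos):
    """Equirectangular UV of a surface position as seen from a capture point."""
    dx, dy, dz = (s - c for s, c in zip(surface_pos, capture_pos))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    polar = math.acos(max(-1.0, min(1.0, dz / length)))
    azimuth = math.atan2(dy, dx)
    return ((azimuth + math.pi) / (2.0 * math.pi), polar / math.pi)


# Hypothetical layout: A and B are 5 m apart at a height of 1.5 m, and the
# surface position D lies on a wall 3 m to the side, midway between them.
point_a = (0.0, 0.0, 1.5)
point_b = (5.0, 0.0, 1.5)
surface_d = (2.5, 3.0, 1.5)

uv_a = uv_from_capture_point(point_a, surface_d)
uv_b = uv_from_capture_point(point_b, surface_d)
```

In this layout D sits at the same height as both acquisition points, so the V coordinate is 0.5 in both panoramas while the U coordinates differ. Each panorama therefore has to be sampled at its own coordinates before the two colors are blended.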

[0138] Figure 3 is a structural block diagram of a panoramic image construction system for VR roaming according to an embodiment of the present invention.

[0139] As shown in Figure 3, the panoramic image construction system for VR roaming includes the following modules. The panoramic image sequence acquisition module 210 is used to acquire a panoramic image sequence in a virtual 3D scene, wherein there is an overlapping field of view between any two adjacent panoramic images in the panoramic image sequence.

[0140] The panoramic image multi-round calibration module 220 is used to select the first panoramic image from the panoramic image sequence as the initial reference image according to the image order, and perform panoramic image calibration operations on each pair of adjacent panoramic images in the sequence in turn.

[0141] The VR roaming scene rendering module 230 is used to map the calibrated panoramic image sequence onto the model surface of the virtual 3D scene, and perform texture rendering through shader technology to obtain the mapped virtual environment.

[0142] The panoramic image multi-round calibration module 220 is also used for the following. For each pair of adjacent panoramic images in each round, feature point extraction and matching operations are performed to obtain multiple sets of feature point pairs. The two-dimensional image coordinates in each feature point pair are converted to three-dimensional spherical coordinates, constructing a source spherical point set and a target spherical point set. Based on the source and target spherical point sets, the optimal rotation matrix between the current pair of adjacent panoramic images is calculated using a random sample consensus algorithm and a point cloud alignment algorithm. The point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal components of the three-dimensional spherical coordinates, with the vertical component set to a constant value. Based on the result of applying the optimal rotation matrix to the target spherical point set, horizontal calibration is performed on each pair of adjacent panoramic images in each round, and the calibrated image is used as the reference for the next round of calibration.

[0143] The specific functions and examples of each module and submodule of the device in this embodiment of the invention can be found in the relevant descriptions of the corresponding steps in the above method embodiments, and will not be repeated here.

[0144] According to embodiments of the present invention, the above-described method of the present invention can be applied to a computer device and a readable storage medium.

[0145] Figure 4 shows a schematic block diagram of an example computer device 600 that can be used to implement embodiments of the present invention. The computer device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The computer device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.

[0146] As shown in Figure 4, the computer device 600 includes a computing unit 601, which can perform various appropriate actions and processes based on a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the computer device 600. The computing unit 601, ROM 602, and RAM 603 are interconnected via a bus 604. An input / output (I / O) interface 605 is also connected to the bus 604.

[0147] Multiple components in computer device 600 are connected to I / O interface 605, including: input unit 606, such as keyboard, mouse, etc.; output unit 607, such as various types of monitors, speakers, etc.; storage unit 608, such as disk, optical disk, etc.; and communication unit 609, such as network card, modem, wireless transceiver, etc. Communication unit 609 allows computer device 600 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0148] The computing unit 601 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as a panoramic image construction method for VR roaming. For example, in some embodiments, a panoramic image construction method for VR roaming can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program can be loaded and / or installed on the computer device 600 via ROM 602 and / or communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the panoramic image construction method for VR roaming described above can be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform a panoramic image construction method for VR roaming.

[0149] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0150] The program code used to implement the methods of the present invention can be written in any combination of one or more programming languages. This program code can be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code can be executed entirely on the machine, partially on the machine, as a standalone software package partially on the machine and partially on a remote machine, or entirely on a remote machine or server.

[0151] In the context of this invention, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media can include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0152] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0153] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0154] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0155] It should be understood that steps can be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this invention can be achieved, and this is not limited herein.

[0156] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the principles of this invention should be included within the scope of protection of this invention.

Claims

1. A method for constructing panoramic images for VR roaming, characterized in that it comprises: obtaining a panoramic image sequence in a virtual 3D scene, wherein any two adjacent panoramic images in the panoramic image sequence have overlapping fields of view; selecting the first panoramic image from the panoramic image sequence according to the image order as the initial reference image, and sequentially performing a panoramic image calibration operation on each pair of adjacent panoramic images in the sequence; and mapping the calibrated panoramic image sequence onto the model surface of the virtual 3D scene, and performing texture rendering using shader technology to obtain the mapped virtual environment; wherein the step of sequentially performing the panoramic image calibration operation on each pair of adjacent panoramic images in the sequence includes: for each pair of adjacent panoramic images in each round, performing feature point extraction and matching operations to obtain multiple sets of feature point pairs; converting the two-dimensional image coordinates in each of the feature point pairs into three-dimensional spherical coordinates to construct the source spherical point set and the target spherical point set; based on the source spherical point set and the target spherical point set, calculating the optimal rotation matrix between the current pair of adjacent panoramic images by a random sample consensus algorithm and a point cloud alignment algorithm, wherein the point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal components of the three-dimensional spherical coordinates, with the vertical component set to a constant value; and, based on the result of applying the optimal rotation matrix to the target spherical point set, performing horizontal calibration on the pair of adjacent panoramic images in each round, and using the calibrated image as the reference for the next round of calibration.

2. The method according to claim 1, characterized in that, The feature point extraction and matching operation includes: The feature points and corresponding feature descriptor vectors of the adjacent panoramic image pairs are extracted using the scale-invariant feature transform algorithm. Based on the similarity between the feature descriptor vectors, the K-nearest neighbor matching algorithm is used to match the feature points and generate the multiple sets of feature point pairs; The scale-invariant feature transform algorithm is configured to perform multi-scale spatial analysis on each panoramic image, extract target feature points from each panoramic image, and generate corresponding feature descriptor vectors. The K-nearest neighbor matching algorithm is configured to calculate the Euclidean distance between each feature descriptor vector, find K maximum likelihood matching points for each feature point, and filter out false matches according to a set distance ratio threshold to generate high-confidence feature point pairs.

3. The method according to claim 2, characterized in that, The step of converting the two-dimensional image coordinates in each of the feature point pairs into three-dimensional spherical coordinates to construct the source spherical point set and the target spherical point set includes: The two-dimensional image coordinates of each feature point in the feature point pair are mapped onto the surface of a sphere with a fixed radius to obtain the corresponding three-dimensional spherical coordinates. The set of three-dimensional spherical coordinates of feature points of the panoramic image serving as the reference image in each group of adjacent panoramic images is defined as the source spherical point set; The set of three-dimensional spherical coordinates of feature points of the panoramic image that serves as the image to be calibrated in each group of adjacent panoramic images is defined as the target spherical point set.

4. The method according to claim 1, characterized in that the step of calculating the optimal rotation matrix between the current pair of adjacent panoramic images based on the source spherical point set and the target spherical point set using a random sample consensus algorithm and a point cloud alignment algorithm includes: using the random sample consensus algorithm to randomly sample, multiple times, a subset containing a preset number of feature point pairs from the multiple sets of feature point pairs; for each subset obtained from sampling, calculating a candidate rotation matrix using the point cloud alignment algorithm; applying the candidate rotation matrix to the target spherical point set to generate a transformed target spherical point set; determining an inlier set from the multiple sets of feature point pairs based on a preset error threshold; calculating the sum of errors of the inlier set between the transformed target spherical point set and the source spherical point set; and selecting the candidate rotation matrix that minimizes the sum of errors as the optimal rotation matrix.

5. The method according to claim 1, characterized in that the step of performing horizontal calibration on the pair of adjacent panoramic images in each round based on the result of applying the optimal rotation matrix to the target spherical point set, and using the calibrated image as the reference for the next round of calibration, includes: using the first panoramic image in the panoramic image sequence as the initial reference image; taking the initial reference image and the adjacent second panoramic image in the sequence as the first pair of adjacent panoramic images, and performing the feature point extraction and matching operation, the coordinate transformation operation, and the operation of calculating and applying the optimal rotation matrix to complete the calibration of the second panoramic image; using the calibrated Nth panoramic image as the reference image and combining it with the (N+1)th panoramic image in the sequence to form a new pair of adjacent panoramic images, where N is an integer greater than or equal to 2; performing the feature point extraction and matching operation, the coordinate transformation operation, and the operation of calculating and applying the optimal rotation matrix on the new pair of adjacent panoramic images to complete the calibration of the (N+1)th panoramic image; and repeating the previous step until the last panoramic image in the panoramic image sequence is calibrated.

6. The method according to claim 4, characterized in that the step of calculating a candidate rotation matrix using the point cloud alignment algorithm for each sampled subset includes: setting the z-axis coordinates of each point in the spherical point set to a preset constant value; calculating the covariance matrix between the source spherical point set and the target spherical point set after translation; and performing singular value decomposition on the covariance matrix and solving for the candidate rotation matrix based on the results of the singular value decomposition.

7. The method according to claim 1, characterized in that, The process of mapping the calibrated panoramic image sequence onto the model surface of a virtual 3D scene and performing texture rendering using shader technology to obtain the mapped virtual environment includes: The calibrated panoramic image sequence is input as a texture resource into the virtual map platform; In the virtual map platform, the texture of a single panoramic image is mapped onto the surface of a 3D model of a world model configured by the virtual map platform using shader technology; The shader technology includes a vertex shader and a fragment shader. The vertex shader is used to process the world coordinates and panorama coordinates of vertices in the world model, and the fragment shader is used to calculate UV texture coordinates based on the viewing direction and sample colors from the input panorama texture. During VR roaming, when the viewpoint is located between two or more adjacent panoramic acquisition points, the fragment shader dynamically calculates the distance ratio between the viewpoint and each panoramic acquisition point, and performs weighted mixing of the sampled colors of multiple panoramic images based on the distance ratio, so that the multiple panoramic images can be smoothly transitioned.

8. A panoramic image construction system for VR roaming, characterized by comprising: a panoramic image sequence acquisition module, used to acquire a panoramic image sequence in a virtual 3D scene, wherein there is an overlapping field of view between any two adjacent panoramic images in the panoramic image sequence; a panoramic image multi-round calibration module, used to select the first panoramic image from the panoramic image sequence as the initial reference image according to the image order, and to perform panoramic image calibration operations on each pair of adjacent panoramic images in the sequence in turn; and a VR roaming scene rendering module, used to map the calibrated panoramic image sequence onto the model surface of the virtual 3D scene, and to perform texture rendering through shader technology to obtain the mapped virtual environment; wherein the panoramic image multi-round calibration module is further used for: performing, for the adjacent panoramic image pair in each round, feature point extraction and matching operations to obtain multiple sets of feature point pairs; converting the two-dimensional image coordinates in each of the feature point pairs into three-dimensional spherical coordinates to construct the source spherical point set and the target spherical point set; calculating, based on the source spherical point set and the target spherical point set, the optimal rotation matrix between the current adjacent panoramic image pair by a random sample consensus (RANSAC) algorithm and a point cloud alignment algorithm, wherein the point cloud alignment algorithm is configured to calculate the rotation matrix based only on the horizontal component of the three-dimensional spherical coordinates and to set the vertical component to a constant value; and applying the optimal rotation matrix to the target spherical point set, performing horizontal calibration on the adjacent panoramic image pair in each round, and using the calibrated image as the reference for the next round of calibration.
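
The coordinate conversion and robust estimation performed by the calibration module can be sketched as follows. This is only an illustration under stated assumptions: the panoramas are taken to be equirectangular with a z-up convention, the rotation is taken to map source points onto target points, and the compact `yaw_rotation` solver omits the translation step for brevity. The names `pixel_to_sphere`, `yaw_rotation`, `ransac_rotation` and the parameters `iters`, `sample_size`, `thresh`, and `seed` are hypothetical.

```python
import numpy as np


def pixel_to_sphere(px, py, width, height):
    """Convert equirectangular pixel coordinates to a unit-sphere point."""
    lon = (px / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - py / height) * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])


def yaw_rotation(src, tgt):
    """Rotation about the vertical axis from horizontal components via SVD."""
    cov = src[:, :2].T @ tgt[:, :2]
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = np.eye(3)
    rot[:2, :2] = vt.T @ np.diag([1.0, d]) @ u.T
    return rot


def ransac_rotation(src, tgt, iters=200, sample_size=3, thresh=0.02, seed=0):
    """Keep the candidate rotation that agrees with the most point pairs."""
    rng = np.random.default_rng(seed)
    best_rot = np.eye(3)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=sample_size, replace=False)
        rot = yaw_rotation(src[idx], tgt[idx])
        # Residual: distance between rotated source points and their targets.
        err = np.linalg.norm(src @ rot.T - tgt, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_rot, best_inliers = rot, inliers
    # Refit on all inliers of the best candidate for a more stable estimate.
    if best_inliers.sum() >= sample_size:
        best_rot = yaw_rotation(src[best_inliers], tgt[best_inliers])
    return best_rot, best_inliers
```

Sampling small subsets keeps mismatched feature pairs from dominating the estimate, since a candidate fitted to a clean subset will agree with most of the remaining correct matches.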

9. A computer device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

10. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-7.