An artificial intelligence-based metaverse scene rendering optimization system and method

The AI-based metaverse scene rendering optimization system solves the problem of 3D model fitting defects in traditional methods, achieving efficient and stable 3D scene rendering and improving user experience and rendering quality.

CN122289489A · Pending · Publication Date: 2026-06-26 · COLLEGE OF SCI & TECH NINGBO UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
COLLEGE OF SCI & TECH NINGBO UNIV
Filing Date
2026-04-01
Publication Date
2026-06-26


Abstract

This invention relates to the field of scene rendering, specifically to an AI-based metaverse scene rendering optimization system and method, comprising: an entity transformation unit, a model rendering unit, a view planning unit, a scene merging unit, and a computation execution module. The entity transformation unit is used to photograph scene entities and obtain pose transformation matrices; the model rendering unit is used to create 3D models; the view planning unit is used to determine the view roaming path; the scene merging unit is used to adjust the rendering scale; and the computation execution module is used to deploy and schedule rendering tasks. This invention can improve rendering quality and view continuity, maintain stable performance of multi-view scene rendering, enhance the realism and interaction efficiency of VR experiences, increase the rendering speed of new views, increase the computational efficiency of GPU memory, improve the modeling performance of complex model scenes, and enhance visual fidelity and stereoscopic rendering effects.

Description

Technical Field

[0001] This invention relates to the field of scene rendering, specifically to an artificial intelligence-based metaverse scene rendering optimization system and method.

Background Technology

[0002] The metaverse is a digital space constructed using virtual reality technology. Users can enter metaverse scenes through virtual reality or augmented reality devices to gain virtual social, entertainment, and learning experiences. Scene rendering is the core of metaverse technology. Through cropping and coloring, virtual 3D scenes are converted into 2D image sequences to fit the user's planar field of vision. Dedicated metaverse spaces can also use digital modeling to convert physical scenes into virtual scenes, which, after high-quality rendering, achieve the effect of remote scene reconstruction.

[0003] Metaverse applications are characterized by large software size, complex operating environment, and strong real-time interactivity. Traditional 2D image modeling methods, under the constraints of computer hardware resources, are prone to fitting defects in the geometric structure of 3D models. This results in problems such as unintuitive image information, large changes in viewpoint, long interactive roaming time, and uneven paths. Consequently, the construction process of virtual space lacks data support, the 3D reconstruction capability is insufficient, the rendering efficiency of 3D scenes is reduced, and it is difficult to meet the rendering requirements of high-quality scenes.

[0004] Furthermore, as VR scenes become more complex, the number of three-dimensional entities in the virtual space increases, the complexity rises, and the scene scale expands. There is a lack of optimization in the user's viewpoint selection. In scenarios where the user's field of vision moves frequently, there are problems such as lengthy image rendering time and rendering results that do not match the user's observation habits, making it difficult to achieve the ideal scene rendering effect.

Summary of the Invention

[0005] The purpose of this invention is to provide an artificial intelligence-based metaverse scene rendering optimization system and method to solve the problems mentioned in the background art.

[0006] To address the aforementioned technical problems, this invention provides the following technical solution: a metaverse scene rendering optimization system based on artificial intelligence, comprising: an entity transformation unit, a model rendering unit, a view planning unit, a scene merging unit, and a computation execution module;

[0007] The entity transformation unit is used to capture scene entities that need to be digitized using a binocular camera, obtain planar images, calibrate camera intrinsic parameters, synthesize fisheye images from planar images through optical axis transformation, reconstruct the camera movement process based on the real-time camera pose, use data continuity and overlap as constraints, replace the camera architecture with a panoramic architecture, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix.

The model rendering unit is used to create a two-dimensional planar object model in the metaverse space, project the model into a three-dimensional space through a pose transformation matrix, use the RVM geometry generation algorithm to assemble the two-dimensional planar object model according to the image shooting position attributes, generate geometric primitives, particleize the color primitives, establish the topological structure of the image particle space, blur the image particles, establish the homography mapping of pixels within the particles, and create a three-dimensional model composed of discrete geometric primitives.

The view planning unit is used to acquire the user's viewpoint, segment the foreground according to the user's viewing distance, divide the scene into sub-regions, construct axis-aligned bounding boxes for the sub-regions through convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the viewpoint sample geometric metric model based on vertex curvature, filter samples with saliency entropy higher than the threshold, form viewpoint sets for each sub-region, sort the viewpoint sets within the sub-regions through the TSP algorithm, and determine the viewpoint roaming path.

The scene merging unit is used to segment the scene image, calculate the visual perception based on information entropy, observation distance, motion speed and field of view eccentricity, adjust the rendering resolution level through the visual perception, adjust the rendering ratio of edge model, point model and mesh model according to the resolution level, perform texture display and color rendering on entities in the metaverse scene, and output a three-dimensional metaverse scene.

The computation execution module is used to deploy the serialized rendering tasks to different computing platforms for execution. Based on the rendering quality, transmission latency and adjacent time quality smoothness, it calculates the view difference of each level of model under the roaming path, obtains the visual perception error, schedules rendering content of different quality levels, and controls the visual perception error within a preset range.
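The registration of the camera coordinate system to the metaverse world coordinate system can be illustrated with a 4×4 homogeneous pose matrix. The following is a minimal sketch, not the patented procedure: it assumes a pose made of a rotation about the Z axis plus a translation, and the names `make_pose` and `transform_point` are hypothetical.

```python
import math

def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous pose matrix: rotation about Z by yaw_rad, then translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(pose, point):
    """Map a 3D point from camera coordinates to world coordinates."""
    v = (point[0], point[1], point[2], 1.0)  # homogeneous coordinates
    return tuple(sum(pose[r][k] * v[k] for k in range(4)) for r in range(3))

# Hypothetical pose: camera rotated 90 degrees about Z and shifted by (1, 2, 0).
camera_to_world = make_pose(math.pi / 2, (1.0, 2.0, 0.0))
world_point = transform_point(camera_to_world, (1.0, 0.0, 0.0))
```

A full pose would carry an arbitrary 3×3 rotation in the upper-left block; the single-axis rotation here only keeps the example short.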

[0008] Furthermore, the entity transformation unit includes: a camera calibration unit and a coordinate registration unit. The camera calibration unit is used to calibrate the focal length, principal point, distortion coefficient and lens relative position of the binocular camera, perform stereo correction on the image, align the pixel rows of the left and right images, calculate the three-dimensional point cloud and camera pose through triangulation, and solve the new camera pose increment with the PNP (Perspective-n-Point) algorithm to add a new image. The coordinate registration unit is used to generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and a dynamic object, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate a pose transformation matrix.
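Once the left and right images are row-aligned by stereo correction, triangulating a matched pixel pair reduces to the standard disparity relation Z = f·B/d. The sketch below assumes an ideal rectified pinhole pair; the function names and the numeric camera parameters are illustrative and do not come from the patent.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth of a point seen at column x_left (left image) and x_right (right image)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity

def triangulate_rectified(focal_px, baseline_m, cx, cy, x_left, x_right, y):
    """Back-project a rectified stereo match into the left camera frame."""
    z = depth_from_disparity(focal_px, baseline_m, x_left, x_right)
    x = (x_left - cx) * z / focal_px
    y3d = (y - cy) * z / focal_px
    return (x, y3d, z)

# Hypothetical camera: 800 px focal length, 10 cm baseline, principal point (320, 240).
point_3d = triangulate_rectified(800.0, 0.1, 320.0, 240.0, 420.0, 400.0, 240.0)
```

Repeating this for every matched pixel yields the point cloud on which the subsequent pose increment is solved.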

[0009] Furthermore, the model rendering unit includes: an entity creation unit, a primitive connectivity unit, and a 3D rendering unit; The entity creation unit is used to obtain the projection image mesh of the three-dimensional entity through triangular rasterization, project the image texture onto the image mesh, and register the reconstructed three-dimensional model into the global coordinate system of the metaverse space through the pose transformation matrix. The primitive connectivity unit is used to take a two-dimensional plane model as input using a random vector model to generate geometric primitives. A morphological connectivity operator is applied to each colored primitive, and each particle represents an irregular connected region. Topological relationships between particles are established to form a topological graph. The 3D rendering unit is used to perform reachability matrix analysis on directed trees and directed loops in the topology graph, establish homography mapping from 2D image to 3D model surface, and obtain 3D model.
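The reachability analysis on the particle topology graph can be sketched with Warshall's transitive-closure algorithm. This is one common way to compute a reachability matrix, assumed here for illustration; the patent does not name the algorithm, and the particle indices and edges below are made up.

```python
def reachability_matrix(node_count, edges):
    """Return reach[i][j] == True when a directed path of length >= 1 leads from i to j."""
    reach = [[False] * node_count for _ in range(node_count)]
    for src, dst in edges:
        reach[src][dst] = True
    # Warshall's algorithm: allow node k as an intermediate stop, one k at a time.
    for k in range(node_count):
        for i in range(node_count):
            if reach[i][k]:
                for j in range(node_count):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def has_directed_loop(reach):
    """A node that can reach itself lies on a directed loop."""
    return any(reach[i][i] for i in range(len(reach)))

tree_reach = reachability_matrix(3, [(0, 1), (1, 2)])          # directed tree (a chain)
loop_reach = reachability_matrix(3, [(0, 1), (1, 2), (2, 0)])  # directed loop
```

Distinguishing trees from loops this way tells the renderer which particles can be ordered hierarchically and which form cyclic neighbourhoods.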

[0010] Furthermore, the view planning unit includes: a scene segmentation unit and a path roaming unit; The scene segmentation unit is used to constrain the scene structure by surface unbiased estimation, and to use the truncated section of a multidimensional Gaussian function as the frustum sampling region between the near and far planes of the user's frustum. It defines a weight distribution on a plane perpendicular to the line of sight and uses a depth map to segment the foreground and background according to the user's viewing distance. The path roaming unit is used to generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, the curvature change of the visible surface is calculated, and the entropy of the curvature distribution is calculated based on the curvature change as the surface saliency entropy. Viewpoints with surface saliency entropy higher than the threshold are selected to form the viewpoint set of the sub-region. A roaming path is generated by solving the TSP, and the inflection points in the roaming path are smoothed.
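The surface saliency entropy can be read as the Shannon entropy of the curvature distribution seen from a candidate viewpoint. The sketch below assumes a fixed-width histogram with 8 bins and an arbitrary threshold; both are illustrative choices, and the function names are hypothetical.

```python
import math

def curvature_entropy(curvatures, bins=8):
    """Shannon entropy (bits) of a histogram of visible-surface curvature values."""
    if not curvatures:
        return 0.0
    lo, hi = min(curvatures), max(curvatures)
    if hi == lo:
        return 0.0  # a perfectly uniform surface carries no curvature information
    counts = [0] * bins
    for value in curvatures:
        index = min(int((value - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    total = len(curvatures)
    return -sum((n / total) * math.log2(n / total) for n in counts if n)

def select_viewpoints(candidates, threshold):
    """Keep the candidate viewpoints whose saliency entropy exceeds the threshold."""
    return [name for name, values in candidates.items()
            if curvature_entropy(values) > threshold]

candidates = {
    "flat_wall": [0.1] * 8,
    "sculpture": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
}
selected = select_viewpoints(candidates, threshold=1.5)
```

Viewpoints facing geometrically varied surfaces score higher and survive the filter, which matches the stated intent of favouring information-rich views.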

[0011] Furthermore, the scene merging unit includes: a perception hierarchy unit, a transmission rendering unit, and a layer fusion unit. The perception hierarchy unit is used to segment the scene image into region blocks using the SLIC algorithm, calculate the information entropy of each image region block based on the roaming path length, model volume and texture density, calculate the visual perception by weighted summation, and adjust the rendering resource allocation of different level regions. The transmission rendering unit is used to calculate reflection and transmission components through selective amplitude, depth smoothing constraints and asymptotic consistency, and to merge three-dimensional scenes by utilizing GPU parallel rendering layer resources. The layer fusion unit is used to construct an inverse rendering network from the encoder and decoder, predict albedo and normals, calculate light intensity through the lighting model, and input the albedo map, normal map and light intensity map into the neural radiance network to reconstruct the image of each viewpoint within the model.
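The weighted summation that produces the visual perception score, and its mapping to a rendering resolution level, might look like the sketch below. The weights, the normalisation of every input to [0, 1], the sign convention (distance, speed and eccentricity lower the score) and the three-level scale are all assumptions; the patent does not state them.

```python
def visual_perception(entropy, distance, speed, eccentricity,
                      weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of normalised factors, clamped to [0, 1].

    Assumption: information entropy raises perceptual importance, while
    observation distance, motion speed and field-of-view eccentricity lower it.
    """
    w_entropy, w_distance, w_speed, w_ecc = weights
    score = (w_entropy * entropy
             + w_distance * (1.0 - distance)
             + w_speed * (1.0 - speed)
             + w_ecc * (1.0 - eccentricity))
    return max(0.0, min(1.0, score))

def resolution_level(score, levels=3):
    """Map a score in [0, 1] to a level from 0 (coarsest) to levels - 1 (finest)."""
    return min(int(score * levels), levels - 1)

# A detailed, close, static, central region versus a plain, far, fast, peripheral one.
foveal_level = resolution_level(visual_perception(1.0, 0.0, 0.0, 0.0))
peripheral_level = resolution_level(visual_perception(0.0, 1.0, 1.0, 1.0))
```

Regions with a high level would receive mesh-model rendering, while low levels could fall back to point or edge models, in line with the rendering-ratio adjustment described for the scene merging unit.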

[0012] Furthermore, the computation execution module includes: a task allocation unit and a thread adjustment unit; The task allocation unit is used to extract semi-transparent entities in the scene, perform pseudo-supervision of the normal gradient through the depth gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to various computing platforms, including: local GPU, cloud GPU cluster and edge device. The thread adjustment unit is used to fit the difference between the view rendered by the model of different quality level and the highest quality view through the image classification model, output the visual perception error, monitor the visual perception error in real time, and adjust the execution platform or model rendering quality level of the rendering task when the error exceeds the threshold, and generate scheduling decisions.
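The scheduling decision of the thread adjustment unit can be sketched as a simple threshold policy. The policy below (raise the quality level first, then move to a more capable platform) and the ordering of platforms are assumptions made for illustration; `RenderTask` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed ordering from least to most capable platform.
PLATFORMS = ("edge", "local_gpu", "cloud_gpu")

@dataclass
class RenderTask:
    name: str
    quality_level: int
    platform: str

def schedule(task, perception_error, threshold, max_level=3):
    """Return the task unchanged if the error is acceptable, otherwise an adjusted task."""
    if perception_error <= threshold:
        return task
    if task.quality_level < max_level:
        # First remedy: render with a higher-quality model on the same platform.
        return RenderTask(task.name, task.quality_level + 1, task.platform)
    index = PLATFORMS.index(task.platform)
    if index < len(PLATFORMS) - 1:
        # Second remedy: migrate the task to the next more capable platform.
        return RenderTask(task.name, task.quality_level, PLATFORMS[index + 1])
    return task  # nothing left to adjust

task = RenderTask("lobby", quality_level=1, platform="edge")
adjusted = schedule(task, perception_error=0.20, threshold=0.05)
```

Calling `schedule` on every monitoring tick keeps the visual perception error within the preset range as long as a higher level or platform remains available.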

[0013] An AI-based method for optimizing the rendering of metaverse scenes includes the following steps:
Step S1. Use a binocular camera to photograph the entity, obtain a planar image, calibrate the camera intrinsic parameters, synthesize a fisheye image, reconstruct the camera movement process based on the real-time camera pose, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix.
Step S2. Project the two-dimensional planar model into the three-dimensional space through the pose transformation matrix, assemble the two-dimensional model according to the image shooting position, generate geometric primitives, particleize the primitives, establish the topological structure of the image particle space, and create a three-dimensional scene model based on the homography mapping of the pixels within the particles.
Step S3. Divide the scene model into sub-regions according to the user's viewing distance, construct axis-aligned bounding boxes for the sub-regions using convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the box model, filter samples with saliency entropy higher than the threshold, form the viewpoint set for each sub-region, sort the viewpoint sets within the sub-regions using the TSP algorithm, and determine the roaming path.
Step S4. Based on the weighted sum of information entropy, observation distance, motion speed and field of view eccentricity, obtain the visual perception of each sub-region. Adjust the rendering resolution level through the visual perception, perform texture display and shading rendering on the entities, and output a three-dimensional metaverse scene.
Step S5. Deploy the serialized rendering tasks to different computing platforms. Calculate the visual perception error of each level of model under the roaming path based on rendering quality, transmission latency, and adjacent time quality smoothness. Schedule the rendering tasks to control the visual perception error within a preset range.

[0014] Furthermore, step S1 includes: Step S11. Calibrate the focal length, principal point, distortion coefficient and relative position of the lenses of the binocular camera, perform stereo correction on the image, align the pixel rows of the left and right images, calculate the three-dimensional point cloud and camera pose through triangulation, and use the PNP algorithm to solve the new camera pose increment to add the new image. Step S12. Generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and a dynamic object, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate a pose transformation matrix.

[0015] Furthermore, step S2 includes: Step S21. Obtain the projection image mesh of the 3D entity through triangular rasterization, project the image texture onto the image mesh, and register the reconstructed 3D model into the global coordinate system of the metaverse space through the pose transformation matrix. Step S22. Using a random vector model, the two-dimensional planar model is used as input to generate geometric primitives. A morphological connectivity operator is applied to each colored primitive, and each particle represents an irregular connected region. Topological relationships between particles are established to form a topological graph. Reachability matrix analysis is performed on the directed trees and directed loops in the topological graph to establish a homography mapping from the two-dimensional image to the surface of the three-dimensional model, thus obtaining the three-dimensional model.
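A homography maps pixel coordinates to coordinates on a planar patch through a 3×3 matrix and a perspective division. The sketch below assumes each particle corresponds to a planar surface patch parameterised by two surface coordinates; the matrix values are invented for the example.

```python
def apply_homography(h, pixel):
    """Map pixel (u, v) through the 3x3 homography h and return surface coordinates."""
    u, v = pixel
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    return (x / w, y / w)  # perspective division

# Hypothetical homography: scale by 2, then shift by (10, 5).
H = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 5.0],
    [0.0, 0.0, 1.0],
]
surface_uv = apply_homography(H, (3.0, 4.0))
```

In practice the matrix would be estimated from at least four pixel-to-surface correspondences per particle rather than written by hand.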

[0016] Furthermore, step S3 includes: Step S31. Constrain the scene structure by surface unbiased estimation. Between the near and far planes of the user's view frustum, use the truncated section of a multidimensional Gaussian function as the view frustum sampling region. Define the weight distribution on the plane perpendicular to the line of sight. Based on the user's viewing distance, segment the foreground and background using the depth map. Step S32. Generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, calculate the curvature change of the visible surface. Calculate the entropy of the curvature distribution based on the curvature change as the surface saliency entropy. Select viewpoints with surface saliency entropy higher than the threshold to form the viewpoint set of the sub-region. Generate a roaming path by solving the TSP and smooth the inflection points in the roaming path.
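Ordering the selected viewpoints into a roaming path is a travelling-salesman-style problem. The sketch below uses the nearest-neighbour heuristic, which is a fast approximation and not an exact TSP solver; the patent does not say which solver is used, and the viewpoint coordinates are made up.

```python
import math

def nearest_neighbour_path(points, start=0):
    """Greedy visiting order: always move to the closest unvisited viewpoint."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def path_length(points, order):
    """Total Euclidean length of the open path that visits points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

viewpoints = [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
order = nearest_neighbour_path(viewpoints)
length = path_length(viewpoints, order)
```

A local-improvement pass such as 2-opt could be applied afterwards when a shorter path matters more than planning time.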

[0017] Furthermore, step S4 includes: Step S41. Use the SLIC algorithm to segment the scene image into region blocks, calculate the information entropy of each image region block according to the roaming path length, model volume and texture density, calculate the visual perception by weighted summation, adjust the rendering resource allocation of different level regions, calculate the reflection and transmission components through selective amplitude, depth smoothing constraints and asymptotic consistency, use GPU parallel rendering layer resources, and merge the three-dimensional scene. Step S42. Construct an inverse rendering network using the encoder and decoder to predict albedo and normals, calculate illumination intensity using the illumination model, and input the albedo map, normal map, and illumination intensity map into the neural radiance network to reconstruct images from various perspectives within the model.
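The illumination model in step S42 is not specified. As an illustration only, the sketch below uses the Lambertian diffuse model, in which intensity is the albedo scaled by the cosine between the surface normal and the light direction. The function names and the sample values are assumptions.

```python
import math

def normalize(vector):
    """Return the unit-length version of a 3D vector."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return tuple(c / length for c in vector)

def lambert_intensity(albedo, normal, light_dir, light_strength=1.0):
    """Diffuse intensity for one surface point under a single directional light."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))  # back-facing light gives 0
    return albedo * light_strength * cos_theta

facing = lambert_intensity(0.8, (0.0, 0.0, 1.0), (0.0, 0.0, 2.0))
away = lambert_intensity(0.8, (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Evaluating this per pixel over predicted albedo and normal maps gives an illumination intensity map of the kind the step feeds into the downstream network.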

[0018] Furthermore, step S5 includes: Step S51. Extract the semi-transparent entities in the scene, perform pseudo-supervision on the normal gradient through the depth gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to each computing platform, including: local GPU, cloud GPU cluster and edge device. Step S52. Fit the differences between views rendered by models of different quality levels using an image classification model, output the visual perception error, monitor the visual perception error in real time, and when the error exceeds the threshold, adjust the execution platform of the rendering task or the quality level of the model rendering to generate a scheduling decision.
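Serializing a rendering task so that it can be shipped to a local GPU, a cloud GPU cluster or an edge device could be as simple as a JSON payload. The field names and the use of JSON are assumptions; the patent does not describe the wire format.

```python
import json

def serialize_task(task_id, viewpoints, quality_level, platform):
    """Encode a rendering task as a JSON string with a stable key order."""
    payload = {
        "task_id": task_id,
        "viewpoints": [list(p) for p in viewpoints],  # JSON has no tuple type
        "quality_level": quality_level,
        "platform": platform,
    }
    return json.dumps(payload, sort_keys=True)

def deserialize_task(blob):
    """Decode a task produced by serialize_task."""
    return json.loads(blob)

blob = serialize_task("refraction-view-1", [(0.0, 1.5, 2.0)], 2, "cloud_gpu")
restored = deserialize_task(blob)
```

A binary format would be a natural replacement when geometry or textures travel with the task, since JSON is only practical for small control messages.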

[0019] Compared with the prior art, the beneficial effects achieved by the present invention are: 1. This invention uses a camera to capture scene entities, reconstructs the camera movement process, generates a two-dimensional planar object model in the metaverse space, creates a three-dimensional model composed of discrete geometric primitives, digitizes real-world entities and integrates them into the metaverse, improves rendering quality and view continuity, and has the advantages of fast rendering speed, low operation latency, good rendering quality and support for a wide range of data types.

[0020] 2. This invention constrains the scene structure from the user's perspective, segments the foreground according to the user's viewing distance, determines the viewpoint roaming path, and smooths the inflection points in the roaming path. This generates high-quality and personalized 3D scene roaming routes, achieving the goals of high roaming comfort, strong user immersion, and high scene information awareness. It also maintains stable performance in multi-view scene rendering and enhances the realism and interaction efficiency of the VR experience.

[0021] 3. This invention segments scene images, adjusts rendering resolution levels, performs texture display and shading rendering on entities in the metaverse scene, and outputs a three-dimensional metaverse scene. This allows for the scheduling of rendering content of different quality levels, ensuring visual details and perceptual quality of scene modeling, improving the rendering speed of new views, increasing the computational efficiency of GPU memory, enhancing the modeling performance of complex model scenes, and improving visual fidelity and stereoscopic rendering effects.

Attached Figure Description

[0022] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
Figure 1 is a schematic diagram of the structure of a metaverse scene rendering optimization system based on artificial intelligence according to the present invention;
Figure 2 is a schematic diagram illustrating the steps of an artificial intelligence-based metaverse scene rendering optimization method according to the present invention.

Detailed Implementation

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0024] Referring to Figures 1 and 2, the present invention provides a technical solution: a metaverse scene rendering optimization system based on artificial intelligence, comprising: an entity transformation unit, a model rendering unit, a view planning unit, a scene merging unit, and a computation execution module. The entity transformation unit is used to capture scene entities that need to be digitized using a binocular camera, obtain planar images, calibrate camera intrinsic parameters, synthesize fisheye images from planar images through optical axis transformation, reconstruct the camera movement process based on the real-time camera pose, use data continuity and overlap as constraints, replace the camera architecture with a panoramic architecture, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix. The entity transformation unit includes: a camera calibration unit and a coordinate registration unit. The camera calibration unit is used to calibrate the focal length, principal point, distortion coefficient and lens relative position of the binocular camera, perform stereo correction on the image, align the pixel rows of the left and right images, calculate the three-dimensional point cloud and camera pose through triangulation, and solve the new camera pose increment with the PNP algorithm to add a new image. The coordinate registration unit is used to generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and a dynamic object, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate a pose transformation matrix.

[0025] The model rendering unit is used to create a two-dimensional planar object model in the metaverse space, project the model into a three-dimensional space through a pose transformation matrix, use the RVM geometry generation algorithm to assemble the two-dimensional planar object model according to the image shooting position attributes, generate geometric primitives, particleize the color primitives, establish the topological structure of the image particle space, blur the image particles, establish the homography mapping of pixels within the particles, and create a three-dimensional model composed of discrete geometric primitives. The model rendering unit includes: an entity creation unit, a primitive connectivity unit, and a 3D rendering unit; The entity creation unit is used to obtain the projection image mesh of the three-dimensional entity through triangular rasterization, project the image texture onto the image mesh, and register the reconstructed three-dimensional model into the global coordinate system of the metaverse space through the pose transformation matrix. The primitive connectivity unit is used to take a two-dimensional plane model as input using a random vector model to generate geometric primitives. A morphological connectivity operator is applied to each colored primitive, and each particle represents an irregular connected region. Topological relationships between particles are established to form a topological graph. The 3D rendering unit is used to perform reachability matrix analysis on directed trees and directed loops in the topology graph, establish homography mapping from 2D image to 3D model surface, and obtain 3D model.

[0026] The view planning unit is used to acquire the user's viewpoint, segment the foreground according to the user's viewing distance, divide the scene into sub-regions, construct axis-aligned bounding boxes for the sub-regions through convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the viewpoint sample geometric metric model based on vertex curvature, filter samples with saliency entropy higher than the threshold, form viewpoint sets for each sub-region, sort the viewpoint sets within the sub-regions through the TSP algorithm, and determine the viewpoint roaming path. The view planning unit includes: a scene segmentation unit and a path roaming unit. The scene segmentation unit is used to constrain the scene structure by surface unbiased estimation, and to use the truncated section of a multidimensional Gaussian function as the frustum sampling region between the near and far planes of the user's frustum. It defines a weight distribution on a plane perpendicular to the line of sight and uses a depth map to segment the foreground and background according to the user's viewing distance. The path roaming unit is used to generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, the curvature change of the visible surface is calculated, and the entropy of the curvature distribution is calculated based on the curvature change as the surface saliency entropy. Viewpoints with surface saliency entropy higher than the threshold are selected to form the viewpoint set of the sub-region. A roaming path is generated by solving the TSP, and the inflection points in the roaming path are smoothed.
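Constructing the axis-aligned bounding box of a sub-region and drawing candidate viewpoints inside it can be sketched as follows. Because the bounding box of a convex hull equals the bounding box of the underlying points, the hull itself is skipped here. Uniform random sampling and the fixed seed are assumptions, and the function names are hypothetical.

```python
import random

def aabb(points):
    """Return (min_corner, max_corner) of the axis-aligned bounding box of 3D points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def sample_in_aabb(box, count, seed=0):
    """Draw candidate viewpoints uniformly inside the box (seeded for repeatability)."""
    rng = random.Random(seed)
    lo, hi = box
    return [tuple(rng.uniform(lo[axis], hi[axis]) for axis in range(3))
            for _ in range(count)]

sub_region = [(0.0, 0.0, 0.0), (2.0, 1.0, 3.0), (1.0, 4.0, -1.0)]
box = aabb(sub_region)
candidates = sample_in_aabb(box, count=5)
```

Each candidate would then be scored by its surface saliency entropy before the path is planned.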

[0027] The scene merging unit is used to segment the scene image, calculate the visual perception based on information entropy, observation distance, motion speed and field of view eccentricity, adjust the rendering resolution level through the visual perception, adjust the rendering ratio of edge model, point model and mesh model according to the resolution level, perform texture display and color rendering on entities in the metaverse scene, and output a three-dimensional metaverse scene. The scene merging unit includes: a perception hierarchy unit, a transmission rendering unit, and a layer fusion unit. The perception hierarchy unit is used to segment the scene image into region blocks using the SLIC algorithm, calculate the information entropy of each image region block based on the roaming path length, model volume and texture density, calculate the visual perception by weighted summation, and adjust the rendering resource allocation of different level regions. The transmission rendering unit is used to calculate reflection and transmission components through selective amplitude, depth smoothing constraints and asymptotic consistency, and to merge three-dimensional scenes by utilizing GPU parallel rendering layer resources. The layer fusion unit is used to construct an inverse rendering network from the encoder and decoder, predict albedo and normals, calculate light intensity through the lighting model, and input the albedo map, normal map and light intensity map into the neural radiance network to reconstruct the image of each viewpoint within the model.

[0028] The computation execution module is used to deploy the serialized rendering tasks to different computing platforms for execution. Based on the rendering quality, transmission latency and adjacent time quality smoothness, it calculates the view difference of each level of model under the roaming path, obtains the visual perception error, schedules rendering content of different quality levels, and controls the visual perception error within a preset range.

[0029] The computation execution module includes: a task allocation unit and a thread adjustment unit; The task allocation unit is used to extract semi-transparent entities in the scene, perform pseudo-supervision of the normal gradient through the depth gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to various computing platforms, including: local GPU, cloud GPU cluster and edge device. The thread adjustment unit is used to fit the difference between the view rendered by the model of different quality level and the highest quality view through the image classification model, output the visual perception error, monitor the visual perception error in real time, and adjust the execution platform or model rendering quality level of the rendering task when the error exceeds the threshold, and generate scheduling decisions.
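The patent fits the view difference with an image classification model, which is not reproduced here. As a plainly different stand-in, the sketch below measures the gap between a lower-quality view and the highest-quality view with mean squared error and PSNR over flattened grey-level pixels; the sample pixel values are invented.

```python
import math

def mean_squared_error(view, reference):
    """Average squared pixel difference between two equally sized views."""
    if len(view) != len(reference) or not view:
        raise ValueError("views must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(view, reference)) / len(view)

def psnr(view, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the views are identical."""
    mse = mean_squared_error(view, reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)

reference_view = [10.0, 20.0, 30.0, 40.0]
low_quality_view = [13.0, 24.0, 30.0, 40.0]
error = mean_squared_error(low_quality_view, reference_view)
```

Whatever metric is used, the monitored value is compared against the threshold to trigger the platform or quality-level adjustment.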

[0030] An AI-based method for optimizing the rendering of metaverse scenes includes the following steps: Step S1. Use a binocular camera to photograph the entity, obtain a planar image, calibrate the camera intrinsic parameters, synthesize a fisheye image, reconstruct the camera movement process based on the real-time camera pose, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix. Step S1 includes: Step S11. Calibrate the focal length, principal point, distortion coefficient and relative position of the lenses of the binocular camera, perform stereo correction on the image, align the pixel rows of the left and right images, calculate the three-dimensional point cloud and camera pose through triangulation, and use the PNP algorithm to solve the new camera pose increment to add the new image. Step S12. Generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and a dynamic object, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate a pose transformation matrix.

[0031] Step S2. Project the two-dimensional planar model into the three-dimensional space through the pose transformation matrix, assemble the two-dimensional model according to the image shooting position, generate geometric primitives, particleize the primitives, establish the topological structure of the image particle space, and create a three-dimensional scene model based on the homography mapping of the pixels within the particles. Step S2 includes: Step S21. Obtain the projection image mesh of the 3D entity through triangular rasterization, project the image texture onto the image mesh, and register the reconstructed 3D model into the global coordinate system of the metaverse space through the pose transformation matrix. Step S22. Using a random vector model, the two-dimensional planar model is used as input to generate geometric primitives. A morphological connectivity operator is applied to each colored primitive, and each particle represents an irregular connected region. Topological relationships between particles are established to form a topological graph. Reachability matrix analysis is performed on the directed trees and directed loops in the topological graph to establish a homography mapping from the two-dimensional image to the surface of the three-dimensional model, thus obtaining the three-dimensional model.

[0032] Step S3. Divide the scene model into sub-regions according to the user's viewing distance, construct axis-aligned bounding boxes for the sub-regions from their convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the box model, retain the samples whose saliency entropy is higher than the threshold to form the viewpoint set for each sub-region, order the viewpoint sets within the sub-regions by solving the traveling salesman problem (TSP), and determine the roaming path. Step S3 includes: Step S31. Constrain the scene structure by unbiased surface estimation. Between the near and far planes of the user's view frustum, use the truncated section of a multidimensional Gaussian function as the view frustum sampling region. Define the weight distribution on the plane perpendicular to the line of sight. Based on the user's viewing distance, segment the foreground and background using the depth map. Step S32. Generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, calculate the curvature change of the visible surface, and take the entropy of the resulting curvature distribution as the surface saliency entropy. Select viewpoints with surface saliency entropy higher than the threshold to form the viewpoint set of the sub-region. Generate a roaming path by solving the TSP and smooth the inflection points in the roaming path.
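
The viewpoint selection and ordering in Step S32 might look roughly like the sketch below. Several details are assumptions rather than the source's method: the entropy is taken as the Shannon entropy of a fixed-bin curvature histogram, and the TSP is approximated with a greedy nearest-neighbour heuristic instead of an exact solver. All function names and the `bins` parameter are illustrative.

```python
import math


def curvature_entropy(curvatures, bins=8):
    """Shannon entropy (in bits) of a histogram of curvature values."""
    lo, hi = min(curvatures), max(curvatures)
    if hi == lo:
        return 0.0  # identical curvature everywhere: a single occupied bin
    counts = [0] * bins
    for c in curvatures:
        idx = min(int((c - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    total = len(curvatures)
    return -sum((n / total) * math.log2(n / total) for n in counts if n)


def select_viewpoints(candidates, threshold):
    """Keep candidate viewpoints whose surface saliency entropy exceeds the threshold.

    `candidates` maps a viewpoint position to the curvature samples visible from it.
    """
    return [pos for pos, curv in candidates.items()
            if curvature_entropy(curv) > threshold]


def nearest_neighbour_tour(points, start=0):
    """Greedy TSP heuristic: always walk to the closest unvisited viewpoint."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour
```

The greedy tour is not guaranteed to be the shortest path; it is used here only because it is short enough to read at a glance.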

[0033] Step S4. Obtain the visual perception of each sub-region as the weighted sum of information entropy, observation distance, motion speed, and field-of-view eccentricity. Adjust the rendering resolution level according to the visual perception, perform texture display and shading rendering on the entities, and output a three-dimensional metaverse scene. Step S4 includes: Step S41. Use the SLIC algorithm to segment the scene image into region blocks, calculate the information entropy of each image region block according to the roaming path length, model volume, and texture density, calculate the visual perception by weighted summation, adjust the rendering resource allocation of regions at different levels, calculate the reflection and transmission components through selective amplitude, depth smoothing constraints, and asymptotic consistency, use GPU parallel rendering for the layer resources, and merge the three-dimensional scene. Step S42. Construct an inverse rendering network from an encoder and a decoder to predict albedo and normals, calculate the illumination intensity using the illumination model, and input the albedo map, normal map, and illumination intensity map into the neural radiance field network to reconstruct images from the various viewpoints within the model.
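
To make the weighted summation in Step S4 concrete, the sketch below combines the four cues into one score and maps it to a resolution level. The source gives neither the weights nor the sign of each term, so the default weights, the normalization of every input to [0, 1], and the choice to let distance, speed, and eccentricity lower the score are all assumptions; the function names are illustrative.

```python
def visual_perception(entropy, distance, speed, eccentricity,
                      weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of four cues, each pre-normalized to [0, 1].

    Assumption (not from the source): distance, speed, and eccentricity
    reduce perceptibility, so they enter as (1 - value).
    """
    w_e, w_d, w_s, w_c = weights
    return (w_e * entropy
            + w_d * (1.0 - distance)
            + w_s * (1.0 - speed)
            + w_c * (1.0 - eccentricity))


def resolution_level(score, num_levels=4):
    """Map a score in [0, 1] to an integer level; higher means finer rendering."""
    clamped = max(0.0, min(1.0, score))
    return min(int(clamped * num_levels), num_levels - 1)


# A detailed, nearby, static, centrally viewed block gets the finest level.
level_near = resolution_level(visual_perception(1.0, 0.0, 0.0, 0.0))
# A flat, distant, fast-moving, peripheral block gets the coarsest level.
level_far = resolution_level(visual_perception(0.0, 1.0, 1.0, 1.0))
```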

[0034] Step S5. Deploy the serialized rendering tasks to different computing platforms. Calculate the visual perception error of each model quality level along the roaming path based on rendering quality, transmission latency, and quality smoothness between adjacent time steps. Schedule the rendering tasks so that the visual perception error stays within a preset range.

[0035] Step S5 includes: Step S51. Extract the semi-transparent entities in the scene, use the depth gradient as pseudo-supervision for the normal gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to the computing platforms, which include a local GPU, a cloud GPU cluster, and edge devices. Step S52. Use an image classification model to fit the differences between views rendered by models of different quality levels and output the visual perception error. Monitor the visual perception error in real time and, when the error exceeds the threshold, generate a scheduling decision that adjusts the execution platform of the rendering task or the quality level of the model rendering.
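
The threshold-triggered scheduling in Step S52 can be sketched as follows. In the source the visual perception error comes from an image classification model; here it is replaced by a hand-written weighted sum of quality loss, latency, and quality jump between adjacent time steps, purely for illustration. The `Option` fields, the weights, and the platform names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    platform: str        # e.g. "local_gpu", "cloud_gpu", "edge"
    quality: int         # model quality level, higher is better
    quality_loss: float  # rendering quality deficit relative to the best view
    latency_ms: float    # expected transmission latency


def perception_error(opt, prev_quality, w_q=1.0, w_l=0.01, w_s=0.1):
    """Stand-in error: quality loss + latency + jump in quality since the last step."""
    return (w_q * opt.quality_loss
            + w_l * opt.latency_ms
            + w_s * abs(opt.quality - prev_quality))


def schedule(current, options, prev_quality, threshold):
    """Keep the current assignment while its error is within the threshold;
    otherwise switch to the option with the lowest estimated error."""
    if perception_error(current, prev_quality) <= threshold:
        return current
    return min(options, key=lambda o: perception_error(o, prev_quality))


current = Option("edge", quality=1, quality_loss=0.5, latency_ms=40.0)
options = [
    Option("cloud_gpu", quality=3, quality_loss=0.1, latency_ms=30.0),
    Option("local_gpu", quality=2, quality_loss=0.2, latency_ms=5.0),
]
decision = schedule(current, options, prev_quality=1, threshold=0.6)
```

With these made-up numbers the edge assignment exceeds the threshold, so the task moves to the local GPU, which trades a little quality for much lower latency.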

[0036] Example: Images are captured from multiple angles around the entity and corrected to eliminate distortion. Features are matched across all images, a dense point cloud is generated, dynamic and static objects are separated, and the models are integrated and texture-mapped to construct a 3D model. The user specifies an initial viewing position and direction in the metaverse scene, and the geometric structure information of the scene is acquired. The view frustum sampling region is defined, the scene is divided into sub-regions, axis-aligned bounding boxes are constructed, and viewpoints with saliency entropy higher than the threshold are selected. The shortest path through all viewpoints is obtained by solving the TSP and smoothed with spline curves to give the roaming route. The information entropy of each region block on the route is calculated and the rendering resolution level of each region block is adjusted. Finally, advanced shading and lighting calculations are performed, the rendering processes are merged, and rendering is completed.
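
The spline smoothing of the roaming route mentioned in this example could be done in many ways; the source does not name a spline type. The sketch below assumes a uniform Catmull-Rom spline, chosen because it passes through every viewpoint, and the function names and `samples_per_segment` parameter are illustrative.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    return tuple(
        0.5 * (2 * b
               + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t ** 2
               + (-a + 3 * b - 3 * c + d) * t ** 3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def smooth_path(waypoints, samples_per_segment=4):
    """Resample a polyline through its waypoints with Catmull-Rom segments."""
    if len(waypoints) < 2:
        return list(waypoints)
    # Duplicate the end points so the first and last segments have neighbours.
    padded = [waypoints[0]] + list(waypoints) + [waypoints[-1]]
    path = []
    for i in range(1, len(padded) - 2):
        for s in range(samples_per_segment):
            path.append(catmull_rom(padded[i - 1], padded[i], padded[i + 1],
                                    padded[i + 2], s / samples_per_segment))
    path.append(tuple(waypoints[-1]))
    return path


# Hypothetical 2D viewpoints in TSP order.
route = smooth_path([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0)])
```

The resampled route still visits each original viewpoint, while the intermediate samples round off the sharp turns between them.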

[0037] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0038] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An artificial intelligence-based meta-universe scene rendering optimization method, characterized in that the method includes the following steps: Step S1. Use a binocular camera to photograph the entity, obtain a planar image, calibrate the camera intrinsic parameters, synthesize a fisheye image, reconstruct the camera movement process based on the real-time camera pose, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix. Step S2. Project the two-dimensional planar model into the three-dimensional space through the pose transformation matrix, assemble the two-dimensional model according to the image shooting position, generate geometric primitives, particleize the primitives, establish the topological structure of the image particle space, and create a three-dimensional scene model based on the homography mapping of the pixels within the particles. Step S3. Divide the scene model into sub-regions according to the user's viewing distance, construct axis-aligned bounding boxes for the sub-regions using convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the box model, filter samples with saliency entropy higher than the threshold, form the viewpoint set for each sub-region, sort the viewpoint sets within the sub-regions using the TSP algorithm, and determine the roaming path. Step S4. Based on the weighted sum of information entropy, observation distance, motion speed and field of view eccentricity, obtain the visual perception of each sub-region. Adjust the rendering resolution level through the visual perception, perform texture display and shading rendering on the entities, and output a three-dimensional metaverse scene. Step S5. Deploy the serialized rendering tasks to different computing platforms. Calculate the visual perception error of each level of model under the roaming path based on rendering quality, transmission latency, and adjacent time quality smoothness. Schedule the rendering tasks to control the visual perception error within a preset range.

2. The artificial intelligence-based meta-universe scene rendering optimization method according to claim 1, characterized in that: Step S1 includes: Step S11. Calibrate the focal length, principal point, distortion coefficients, and relative lens position of the binocular camera, perform stereo rectification on the images so that the pixel rows of the left and right images are aligned, calculate the three-dimensional point cloud and camera pose through triangulation, and, when a new image is added, solve for the incremental camera pose using the PnP (Perspective-n-Point) algorithm. Step S12. Generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and dynamic objects, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate the pose transformation matrix.

3. The artificial intelligence-based meta-universe scene rendering optimization method according to claim 2, characterized in that: Step S2 includes: Step S21. Obtain the projected image mesh of the 3D entity through triangle rasterization, project the image texture onto the image mesh, and register the reconstructed 3D model into the global coordinate system of the metaverse space through the pose transformation matrix. Step S22. Taking the two-dimensional planar model as input, generate geometric primitives using a random vector model. Apply a morphological connectivity operator to each colored primitive so that each resulting particle represents an irregular connected region. Establish the topological relationships between particles to form a topological graph. Perform reachability matrix analysis on the directed trees and directed loops in the topological graph to establish a homography mapping from the two-dimensional image to the surface of the three-dimensional model, thus obtaining the three-dimensional model. Step S3 includes: Step S31. Constrain the scene structure by unbiased surface estimation. Between the near and far planes of the user's view frustum, use the truncated section of a multidimensional Gaussian function as the view frustum sampling region. Define the weight distribution on the plane perpendicular to the line of sight. Based on the user's viewing distance, segment the foreground and background using the depth map. Step S32. Generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, calculate the curvature change of the visible surface, and take the entropy of the resulting curvature distribution as the surface saliency entropy. Select viewpoints with surface saliency entropy higher than the threshold to form the viewpoint set of the sub-region. Generate a roaming path by solving the TSP and smooth the inflection points in the roaming path.

4. The artificial intelligence-based meta-universe scene rendering optimization method according to claim 3, characterized in that: Step S4 includes: Step S41. Use the SLIC algorithm to segment the scene image into region blocks, calculate the information entropy of each image region block according to the roaming path length, model volume, and texture density, calculate the visual perception by weighted summation, adjust the rendering resource allocation of regions at different levels, calculate the reflection and transmission components through selective amplitude, depth smoothing constraints, and asymptotic consistency, use GPU parallel rendering for the layer resources, and merge the three-dimensional scene. Step S42. Construct an inverse rendering network from an encoder and a decoder to predict albedo and normals, calculate the illumination intensity using the illumination model, and input the albedo map, normal map, and illumination intensity map into the neural radiance field network to reconstruct images from the various viewpoints within the model.

5. The artificial intelligence-based meta-universe scene rendering optimization method according to claim 4, characterized in that: Step S5 includes: Step S51. Extract the semi-transparent entities in the scene, perform pseudo-supervision on the normal gradient through the depth gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to each computing platform, including: local GPU, cloud GPU cluster and edge device. Step S52. Fit the differences between views rendered by models of different quality levels using an image classification model, output the visual perception error, monitor the visual perception error in real time, and when the error exceeds the threshold, adjust the execution platform of the rendering task or the quality level of the model rendering to generate a scheduling decision.

6. A metaverse scene rendering optimization system based on artificial intelligence, characterized in that the system includes the following modules: an entity transformation unit, a model rendering unit, a view planning unit, a scene merging unit, and a computation execution module; The entity transformation unit is used to capture scene entities that need to be digitized using a binocular camera, obtain planar images, calibrate camera intrinsic parameters, synthesize fisheye images from planar images through optical axis transformation, reconstruct the camera movement process based on the real-time camera pose, use data continuity and overlap as constraints, replace the camera architecture with a panoramic architecture, register the camera coordinate system to the metaverse world coordinate system, and obtain the pose transformation matrix. The model rendering unit is used to create a two-dimensional planar object model in the metaverse space, project the model into a three-dimensional space through the pose transformation matrix, use the RVM geometry generation algorithm to assemble the two-dimensional planar object model according to the image shooting position attributes, generate geometric primitives, partition the color primitives into particles, establish the topological structure of the image particle space, blur the image particles, establish the homography mapping of pixels within the particles, and create a three-dimensional model composed of discrete geometric primitives. The view planning unit is used to acquire the user's viewpoint, segment the foreground according to the user's viewing distance, divide the scene into sub-regions, construct axis-aligned bounding boxes for the sub-regions from their convex hulls, sample each axis-aligned bounding box, calculate the surface saliency entropy of the viewpoint sample geometric metric model based on vertex curvature, retain the samples whose saliency entropy is higher than the threshold to form the viewpoint set for each sub-region, order the viewpoint sets within the sub-regions through the TSP algorithm, and determine the viewpoint roaming path. The scene merging unit is used to segment the scene image, calculate the visual perception based on information entropy, observation distance, motion speed, and field-of-view eccentricity, adjust the rendering resolution level according to the visual perception, adjust the rendering ratio of the edge model, point model, and mesh model according to the resolution level, perform texture display and color rendering on entities in the metaverse scene, and output a three-dimensional metaverse scene. The computation execution module is used to deploy the serialized rendering tasks to different computing platforms for execution. Based on the rendering quality, transmission latency, and quality smoothness between adjacent time steps, it calculates the view difference of each model quality level along the roaming path, obtains the visual perception error, schedules rendering content of different quality levels, and keeps the visual perception error within a preset range.

7. The metaverse scene rendering optimization system based on artificial intelligence according to claim 6, characterized in that: The entity transformation unit includes: a camera calibration unit and a coordinate registration unit; The camera calibration unit is used to calibrate the focal length, principal point, distortion coefficients, and relative lens position of the binocular camera, perform stereo rectification on the images so that the pixel rows of the left and right images are aligned, calculate the three-dimensional point cloud and camera pose through triangulation, and, when a new image is added, solve for the incremental camera pose using the PnP (Perspective-n-Point) algorithm. The coordinate registration unit is used to generate a dense point cloud using a stereo matching algorithm, segment the image in each frame, decompose the entity into a static background and dynamic objects, adjust the physical motion trajectory according to rigid body motion constraints, and merge the dynamic object model with the static background into the metaverse scene to generate the pose transformation matrix.

8. The metaverse scene rendering optimization system based on artificial intelligence according to claim 7, characterized in that: The model rendering unit includes: an entity creation unit, a primitive connectivity unit, and a 3D rendering unit; The entity creation unit is used to obtain the projected image mesh of the three-dimensional entity through triangle rasterization, project the image texture onto the image mesh, and register the reconstructed three-dimensional model into the global coordinate system of the metaverse space through the pose transformation matrix. The primitive connectivity unit is used to take the two-dimensional planar model as input and generate geometric primitives using a random vector model, apply a morphological connectivity operator to each colored primitive so that each resulting particle represents an irregular connected region, and establish the topological relationships between particles to form a topological graph. The 3D rendering unit is used to perform reachability matrix analysis on the directed trees and directed loops in the topological graph, establish a homography mapping from the two-dimensional image to the surface of the three-dimensional model, and obtain the three-dimensional model.

9. The metaverse scene rendering optimization system based on artificial intelligence according to claim 8, characterized in that: The view planning unit includes: a scene segmentation unit and a path roaming unit; The scene segmentation unit is used to constrain the scene structure by unbiased surface estimation, and to use the truncated section of a multidimensional Gaussian function as the view frustum sampling region between the near and far planes of the user's view frustum. It defines a weight distribution on a plane perpendicular to the line of sight and uses a depth map to segment the foreground and background according to the user's viewing distance. The path roaming unit is used to generate candidate viewpoints within the convex hull and axis-aligned bounding box of each sub-region. For each candidate viewpoint, the curvature change of the visible surface is calculated, and the entropy of the resulting curvature distribution is taken as the surface saliency entropy. Viewpoints with surface saliency entropy higher than the threshold are selected to form the viewpoint set of the sub-region. A roaming path is generated by solving the TSP, and the inflection points in the roaming path are smoothed. The scene merging unit includes: a perception hierarchy unit, a transmission rendering unit, and a layer fusion unit; The perception hierarchy unit is used to segment the scene image into region blocks using the SLIC algorithm, calculate the information entropy of each image region block based on the roaming path length, model volume, and texture density, calculate the visual perception by weighted summation, and adjust the rendering resource allocation of regions at different levels. The transmission rendering unit is used to calculate reflection and transmission components through selective amplitude, depth smoothing constraints, and asymptotic consistency, and to merge the three-dimensional scene using GPU parallel rendering for the layer resources. The layer fusion unit is used to construct an inverse rendering network from an encoder and a decoder, predict albedo and normals, calculate the illumination intensity through the illumination model, and input the albedo map, normal map, and illumination intensity map into the neural radiance field network to reconstruct the image of each viewpoint within the model.

10. The metaverse scene rendering optimization system based on artificial intelligence according to claim 9, characterized in that: The computation execution module includes: a task allocation unit and a thread adjustment unit; The task allocation unit is used to extract semi-transparent entities in the scene, perform pseudo-supervision of the normal gradient through the depth gradient, calculate the propagation path of light in the semi-transparent entities, render the refraction view, obtain the complete metaverse rendering task, serialize the task, and deploy it to various computing platforms, including: local GPU, cloud GPU cluster and edge device. The thread adjustment unit is used to fit the difference between the view rendered by the model of different quality level and the highest quality view through the image classification model, output the visual perception error, monitor the visual perception error in real time, and adjust the execution platform or model rendering quality level of the rendering task when the error exceeds the threshold, and generate scheduling decisions.