Dynamic scene reconstruction method, apparatus, device, medium, and product

By adaptively optimizing and training a 4D Gaussian representation with a scale-aware residual field, the method addresses the shortcomings of existing dynamic scene reconstruction in rendering quality and real-time rendering speed, achieving high-quality, real-time dynamic scene rendering.

CN119169183B · Active · Publication Date: 2026-06-16 · PEKING UNIV SHENZHEN GRADUATE SCHOOL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PEKING UNIV SHENZHEN GRADUATE SCHOOL
Filing Date
2024-08-21
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing dynamic scene reconstruction methods fall short in rendering quality and real-time rendering speed. They struggle to accurately capture spatial size and temporal changes in dynamic scenes, and they ignore spatiotemporal information when handling complex dynamic scenes.

Method used

A dynamic scene reconstruction method based on a 4D Gaussian representation is adopted. The representation is trained with adaptive optimization: a scale-aware residual field encodes the region occupied by each Gaussian primitive, a multilayer perceptron decodes the residual features, and the learning rate and gradient threshold are dynamically adjusted, achieving high-quality, real-time rendering.

Benefits of technology

It achieves accurate reconstruction of high-frequency time information in dynamic scenes, enables rendering at any time and from any perspective, improves rendering quality, and enables real-time rendering.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a dynamic scene reconstruction method, apparatus, device, medium and product, and relates to the technical field of multimedia. The method comprises: in response to receiving a dynamic scene reconstruction instruction, determining a rendering time and a rendering view angle according to the instruction; and, based on a pre-trained 4D Gaussian representation, rendering according to the rendering time and the rendering view angle to obtain a target rendering view. The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field. The scale-aware residual field encodes the region occupied by each Gaussian primitive, so that more accurate features are generated and remain aligned with the self-splitting behavior of the Gaussian primitives. An adaptive optimization strategy enhances the ability of the resulting 4D Gaussian representation to reconstruct high-frequency temporal information in a dynamic scene. The trained 4D Gaussian representation can be rendered at any time and from any view angle, improving the rendering quality of the dynamic scene.

Description

Technical Field

[0001] This application relates to the field of multimedia technology, and in particular to a dynamic scene reconstruction method, apparatus, device, medium and product.

Background Technology

[0002] Dynamic scene reconstruction is crucial for immersive imaging, driving the development of various multimedia technologies such as VR, AR, and the metaverse. Rendering dynamic scenes from arbitrary time, location, and viewpoint is essential for enhancing the user experience of multimedia products such as free-viewpoint video and bullet-time effects.

[0003] Currently, significant progress has been made in dynamic scene reconstruction methods based on Neural Radiance Fields (NeRF) and 3DGS. NeRF employs implicit fields to simulate static scenes and achieves realistic viewpoint synthesis. Many NeRF extensions for dynamic scenes either utilize deformation and canonical fields to simulate the motion of objects relative to canonical frames over time, or decompose a four-dimensional volume into spatial and spatiotemporal spaces, representing the space by combining dimensionality-reduced features. Despite significant advancements in rendering quality, these methods suffer from a marked disadvantage in rendering speed. The advent of 3DGS has made real-time rendering of dynamic scenes possible. Some methods attempt to model dynamic scenes based on 3DGS. However, they either struggle to simulate temporally complex scenes, such as object appearance and disappearance, or ignore spatiotemporal information within the scene, resulting in a disadvantage when handling temporally complex dynamic scenes.

[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0005] The main purpose of this application is to provide a method, apparatus, device, medium and product for dynamic scene reconstruction, which aims to improve the rendering quality of dynamic scenes.

[0006] To achieve the above objectives, this application provides a dynamic scene reconstruction method, the method comprising:

[0007] In response to receiving a dynamic scene reconstruction instruction, the rendering time and rendering perspective are determined according to the dynamic scene reconstruction instruction;

[0008] Based on a pre-trained 4D Gaussian representation, the target rendered view is obtained by rendering according to the rendering time and rendering view.

[0009] The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field.

[0010] In one embodiment, before the step of obtaining the target rendered view by rendering according to the rendering time and rendering viewpoint based on the pre-trained 4D Gaussian representation, the method further includes:

[0011] Obtain a calibrated multi-view video sequence, wherein the calibrated multi-view video sequence includes sampling time and corresponding real image;

[0012] Based on the calibrated multi-view video sequence, initialize 4D Gaussian primitives and scale-aware residual fields in 4D space;

[0013] Given the sampling time, determine the residual characteristics of the 4D Gaussian elements in the scale-aware residual field;

[0014] Based on the sampling time and residual characteristics, calculate the survival state and attribute residuals of the 4D Gaussian unit at the sampling time;

[0015] By combining the initial properties of the 4D Gaussian primitives with the residuals of those properties, a 3D Gaussian representation in 3D space is obtained.

[0016] The 3D Gaussian representation is projected to obtain a rendered image;

[0017] Gradient backpropagation is performed based on the loss between the rendered image and the real image, and the 4D Gaussian primitives are adaptively optimized and trained to obtain the 4D Gaussian representation.

[0018] In one embodiment, the step of calculating the residuals of the survival state and properties of the 4D Gaussian unit at the sampling time based on the sampling time and residual characteristics includes:

[0019] Based on the sampling time and residual characteristics, the survival status of the 4D Gaussian unit at the sampling time is calculated;

[0020] The residual features of the 4D Gaussian elements at the sampling time are decoded using a multilayer perceptron, and combined with the survival state, the residual of the attribute is obtained.

[0021] In one embodiment, the scale-aware residual field is represented by six planes, which include a spatial plane and a spatiotemporal plane.

[0022] In one embodiment, the step of determining the residual characteristics of the 4D Gaussian elements in the scale-aware residual field includes:

[0023] A multi-scale feature representation framework is established using pyramid texture mapping stacking technology;

[0024] The lowest resolution level is optimized by dynamically calculating and storing pyramid texture mapping levels.

[0025] Based on the projection size and basic spatial scale of the 4D Gaussian primitive, the scaling level of the 4D Gaussian primitive in the multi-scale feature representation framework is determined;

[0026] The residual features are obtained by performing scale-aware feature extraction based on the lowest resolution level and scaling level.

[0027] In one embodiment, the step of adaptively optimizing and training the 4D Gaussian units to obtain the 4D Gaussian representation includes:

[0028] Obtain the sampling probability of the 4D Gaussian unit;

[0029] The cumulative distribution function is obtained by calculating the definite integral of the 4D Gaussian element in the time domain based on the sampling probability.

[0030] The densification threshold and learning rate of the 4D Gaussian unit are dynamically adjusted according to the cumulative distribution function, and iterative training is performed until the preset number of iterations is reached to obtain the 4D Gaussian representation.

[0031] Furthermore, to achieve the above objectives, this application also proposes a dynamic scene reconstruction device, which includes:

[0032] The response module is used to respond to the received dynamic scene reconstruction instruction and determine the rendering time and rendering perspective according to the dynamic scene reconstruction instruction;

[0033] The rendering module is used to render based on a pre-trained 4D Gaussian representation, according to the rendering time and rendering view, to obtain the target rendered view;

[0034] The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field.

[0035] In addition, to achieve the above objectives, this application also proposes a dynamic scene reconstruction device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the dynamic scene reconstruction method as described above.

[0036] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the dynamic scene reconstruction method described above.

[0037] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the dynamic scene reconstruction method described above.

[0038] One or more technical solutions proposed in this application have at least the following technical effects:

[0039] In response to a received dynamic scene reconstruction command, the rendering time and rendering viewpoint are determined according to the command. Based on a pre-trained 4D Gaussian representation, rendering is performed according to the rendering time and viewpoint to obtain the target rendered view. The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian elements in 4D space using a scale-aware residual field. The scale-aware residual field encodes the regions occupied by Gaussian elements, thereby generating more accurate features and aligning with the self-splitting behavior of Gaussian elements. An adaptive optimization strategy is used to enhance the ability of the obtained 4D Gaussian representation to reconstruct high-frequency temporal information in dynamic scenes. The trained 4D Gaussian representation can perform rendering at any time and from any viewpoint, achieving real-time rendering and ensuring high-quality reconstruction, thus improving the rendering quality of dynamic scenes.

Description of the Drawings

[0040] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0041] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0042] Figure 1 is a flowchart illustrating the first embodiment of the dynamic scene reconstruction method of this application;

[0043] Figure 2 is a flowchart illustrating the second embodiment of the dynamic scene reconstruction method of this application;

[0044] Figure 3 is a schematic diagram of the overall process according to the second embodiment of this application;

[0045] Figure 4 is a flowchart illustrating the third embodiment of the dynamic scene reconstruction method of this application;

[0046] Figure 5 is a flowchart illustrating the fourth embodiment of the dynamic scene reconstruction method of this application;

[0047] Figure 6 is a schematic diagram showing a comparison of effects according to the fourth embodiment of this application;

[0048] Figure 7 is a schematic diagram of the module structure of the dynamic scene reconstruction device according to an embodiment of this application;

[0049] Figure 8 is a schematic diagram of the device structure of the hardware operating environment involved in the dynamic scene reconstruction method in this application embodiment.

[0050] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

[0051] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0052] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0053] The main solution of this application embodiment is as follows: in response to receiving a dynamic scene reconstruction instruction, the rendering time and rendering view are determined according to the dynamic scene reconstruction instruction; based on a pre-trained 4D Gaussian representation, rendering is performed according to the rendering time and rendering view to obtain a target rendering view; wherein, the 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian elements in 4D space based on a scale-aware residual field, and the region occupied by the Gaussian elements is encoded by the scale-aware residual field, thereby generating more accurate features and aligning with the self-splitting behavior of the Gaussian elements; through an adaptive optimization strategy, the ability of the obtained 4D Gaussian representation to reconstruct high-frequency temporal information in dynamic scenes is enhanced, and the 4D Gaussian representation obtained by training can perform rendering at any time and any view, realizing real-time rendering and ensuring high-quality reconstruction, thereby improving the rendering quality of dynamic scenes.

[0054] In this embodiment, for ease of description, the dynamic scene reconstruction device will be used as the execution subject in the following description.

[0055] In existing technologies, dynamic scene reconstruction is crucial for immersive imaging, driving the development of various multimedia technologies such as VR, AR, and metaverse. Rendering dynamic scenes from arbitrary time, location, and viewpoint is essential for enhancing the user experience of multimedia products such as free-viewpoint video and bullet-time effects. Our goal is to reconstruct a continuous four-dimensional space from discrete temporal video sequences. However, this effort faces several challenges. First, reconstruction quality, a bottleneck in widespread adoption, requires accurately capturing spatial dimensions and temporal changes in dynamic scenes. Furthermore, the increasing demand for real-time interaction in multimedia products to enhance user engagement underscores the importance of achieving real-time rendering. However, existing methods struggle to simultaneously achieve high-quality reconstruction and real-time rendering, which is precisely the problem our method aims to address.

[0056] Currently, significant progress has been made in dynamic scene reconstruction methods based on NeRF and 3DGS. NeRF employs implicit fields to simulate static scenes and achieves realistic viewpoint synthesis. Many NeRF extensions for dynamic scenes either utilize deformation and canonical fields to simulate the motion of objects relative to canonical frames over time, or decompose the four-dimensional volume into spatial and spatiotemporal spaces, representing the space by combining dimensionality-reduced features. Despite significant advancements in rendering quality, these methods suffer from a marked disadvantage in rendering speed. The advent of 3DGS has made real-time rendering of dynamic scenes possible. Some methods attempt to model dynamic scenes based on 3DGS. However, some methods either struggle to simulate temporally complex scenes, such as object appearance and disappearance, or ignore spatiotemporal information in the scene, resulting in disadvantages when handling temporally complex dynamic scenes.

[0057] This application provides a solution, specifically a dynamic scene reconstruction method based on Gaussian splatting, which can reconstruct medium-to-high quality dynamic scenes with complex temporal information from monocular or multi-view video sequences.

[0058] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or dynamic scene reconstruction device capable of performing the above functions. The following description uses a dynamic scene reconstruction device as an example to illustrate this embodiment and the subsequent embodiments.

[0059] Based on this, embodiments of this application provide a dynamic scene reconstruction method. Refer to Figure 1, which is a flowchart illustrating the first embodiment of the dynamic scene reconstruction method of this application.

[0060] In this embodiment, the dynamic scene reconstruction method includes steps S10 to S20:

[0061] Step S10: In response to receiving a dynamic scene reconstruction instruction, determine the rendering time and rendering perspective according to the dynamic scene reconstruction instruction;

[0062] Step S20: Based on the pre-trained 4D Gaussian representation, render according to the rendering time and rendering view to obtain the target rendering view. The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on the scale-aware residual field.

[0063] Specifically, in this embodiment of the application, a 4D Gaussian representation is obtained by training with a calibrated multi-view video sequence, which can be used for view rendering at any time or any viewpoint corresponding to the video sequence.

[0064] Optionally, in this embodiment, a set of four-dimensional Gaussian primitives and a scale-aware residual field M are optimized jointly. When combined with M, each Gaussian primitive produces a residual feature and a lifetime σ, which together represent the temporal characteristics of the Gaussian primitive. Given a sampling time t0, the survival state g(t0) of the Gaussian can be calculated, and the Gaussian's residual feature at time t0 can be decoded using an MLP to obtain the attribute residuals. These residuals are then combined with the initial properties of the Gaussian in four-dimensional space to obtain a three-dimensional Gaussian representation, from which a rendered image can be generated using Gaussian splatting.
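The per-time evaluation described above can be sketched in NumPy. This is only an illustration under stated assumptions: the text does not give the form of the survival function, so a Gaussian-shaped falloff around a temporal position τ with lifetime σ is assumed here; the two-layer decoder, the gating of the residual by the survival state, and all function names are hypothetical. Only the position residual is shown.

```python
import numpy as np

def survival_state(t, tau, sigma):
    # ASSUMPTION: Gaussian-shaped temporal falloff, equal to 1 at t == tau.
    return np.exp(-((t - tau) / sigma) ** 2)

def tiny_mlp(features, w1, b1, w2, b2):
    # Hypothetical two-layer perceptron with ReLU, standing in for the decoder.
    hidden = np.maximum(features @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def gaussians_at_time(t, mu0, tau, sigma, residual_features, mlp_params):
    """Return per-primitive 3D centers and survival states at time t."""
    g = survival_state(t, tau, sigma)                     # shape (n,)
    delta_mu = tiny_mlp(residual_features, *mlp_params)   # shape (n, 3)
    # ASSUMPTION: the decoded residual is gated by the survival state.
    mu_t = mu0 + g[:, None] * delta_mu
    return mu_t, g

rng = np.random.default_rng(0)
n, f, h = 4, 8, 16
mu0 = rng.normal(size=(n, 3))
tau = np.array([0.2, 0.5, 0.5, 0.9])
sigma = np.full(n, 0.1)
feats = rng.normal(size=(n, f))
params = (rng.normal(size=(f, h)), np.zeros(h), rng.normal(size=(h, 3)), np.zeros(3))
mu_t, g = gaussians_at_time(0.5, mu0, tau, sigma, feats, params)
```

Primitives whose temporal position is far from the query time get a survival state near zero, so they contribute little at that time; covariance and color residuals would be handled analogously.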

[0065] Optionally, this application proposes a scale-aware residual field to encode the region occupied by Gaussian elements, thereby obtaining more accurate features and aligning with the self-splitting behavior of Gaussian elements. Specifically, the 4D space is decomposed into three planes Pso containing only spatial dimensions and three spatiotemporal planes Pst. Since the size of a Gaussian element only affects its projection within the spatial planes Pso, this application applies scale-aware encoding only within these planes. For each plane P containing only spatial dimensions, a MipMap stack can be used to represent features at different spatial scales in the scene. The 0th layer of the MipMap stack is a feature map of shape M×N×N and has the smallest spatial scale of all layers. Each remaining layer is obtained by downsampling the features of the previous layer, halving the width and height. After the MipMap stack is built, appropriate features can be obtained for a Gaussian element based on its geometry.
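A minimal sketch of building such a stack from a level 0 feature map of shape M×N×N follows. The text does not specify the downsampling filter, so 2×2 average pooling is assumed; `build_mipmap_stack` is a hypothetical name.

```python
import numpy as np

def build_mipmap_stack(level0, num_levels):
    """Build a list of feature maps; each level halves the width and height.

    level0 has shape (M, N, N). ASSUMPTION: 2x2 average pooling is the filter.
    """
    levels = [level0]
    for _ in range(1, num_levels):
        prev = levels[-1]
        m, h, w = prev.shape
        if h < 2 or w < 2:
            break  # cannot halve any further
        h2, w2 = h // 2, w // 2
        # Group pixels into 2x2 blocks and average each block.
        pooled = prev[:, : h2 * 2, : w2 * 2].reshape(m, h2, 2, w2, 2).mean(axis=(2, 4))
        levels.append(pooled)
    return levels

level0 = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
stack = build_mipmap_stack(level0, 3)
```

Because every coarser level is a deterministic function of level 0, only level 0 needs to be held as a trainable parameter.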

[0066] Optionally, this application introduces an adaptive optimization strategy to enhance the model's ability to reconstruct high-frequency temporal information in dynamic scenes. This strategy dynamically adjusts the learning rate and the densification gradient threshold based on the sampling probability of each Gaussian element within the observable time range. In this embodiment, the survival function of a Gaussian element is integrated over the observable time range, and the result represents its sampling probability. The larger the integral, the more the element's lifetime overlaps the observable range, and the more likely it is to be sampled. Different learning rates and densification thresholds are then assigned to the element based on its sampling probability.
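The idea can be illustrated with a short sketch. Several things here are assumptions rather than statements from the text: the survival function is taken to be Gaussian-shaped in time (so its integral has a closed form through the error function), the integral is normalized by the length of the observable range, and the scaling rule (raise the learning rate and lower the threshold for rarely sampled primitives) is one plausible choice. All names and the `p_floor` parameter are hypothetical.

```python
import math

def sampling_probability(tau, sigma, t_start, t_end):
    """Integral of exp(-((t - tau) / sigma)**2) over [t_start, t_end],
    divided by the range length so the result lies in (0, 1]."""
    a = (t_start - tau) / sigma
    b = (t_end - tau) / sigma
    integral = sigma * math.sqrt(math.pi) / 2.0 * (math.erf(b) - math.erf(a))
    return integral / (t_end - t_start)

def adapt_hyperparameters(p, base_lr, base_threshold, p_floor=0.05):
    # ASSUMPTION: rarely sampled primitives get a larger learning rate and a
    # lower densification threshold; p_floor bounds the scaling factor.
    p = max(p, p_floor)
    return base_lr / p, base_threshold * p

p_long = sampling_probability(tau=0.5, sigma=10.0, t_start=0.0, t_end=1.0)
p_short = sampling_probability(tau=0.5, sigma=0.01, t_start=0.0, t_end=1.0)
lr_short, thr_short = adapt_hyperparameters(p_short, base_lr=1e-3, base_threshold=2e-4)
```

A long-lived primitive has a probability close to 1 and keeps roughly the base settings, while a short-lived one is seen in few training samples and is compensated accordingly.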

[0067] This embodiment, through the above scheme, specifically responds to a received dynamic scene reconstruction command, determines the rendering time and rendering viewpoint according to the dynamic scene reconstruction command; based on a pre-trained 4D Gaussian representation, rendering is performed according to the rendering time and rendering viewpoint to obtain the target rendered view; wherein, the 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian elements in 4D space based on a scale-aware residual field, using the scale-aware residual field to encode the regions occupied by Gaussian elements, thereby generating more accurate features and aligning with the self-splitting behavior of Gaussian elements; through an adaptive optimization strategy, the ability of the obtained 4D Gaussian representation to reconstruct high-frequency temporal information in dynamic scenes is enhanced, and the trained 4D Gaussian representation can perform rendering at any time and any viewpoint, achieving real-time rendering and ensuring high-quality reconstruction, thus improving the rendering quality of dynamic scenes.

[0068] Based on the first embodiment of this application, a second embodiment of this application is proposed. In this second embodiment, content that is the same as or similar to that in the first embodiment can be found in the above description and is not repeated here. Referring to Figure 2, before step S20, the dynamic scene reconstruction method further includes steps S01 to S07:

[0069] Step S01: Obtain the calibrated multi-view video sequence, wherein the calibrated multi-view video sequence includes the sampling time and the corresponding real image;

[0070] Step S02: Based on the calibrated multi-view video sequence, initialize 4D Gaussian primitives and scale-aware residual fields in 4D space;

[0071] Step S03: Given the sampling time, determine the residual characteristics of the 4D Gaussian unit in the scale-aware residual field;

[0072] Step S04: Based on the sampling time and residual characteristics, calculate the survival state and attribute residuals of the 4D Gaussian unit at the sampling time;

[0073] Step S05: Combine the initial properties of the 4D Gaussian primitives with the residuals of the properties to obtain the 3D Gaussian representation in 3D space;

[0074] Step S06: Project the 3D Gaussian representation to obtain a rendered image;

[0075] Step S07: Perform gradient backpropagation based on the loss between the rendered image and the real image, and adaptively optimize and train the 4D Gaussian primitives to obtain the 4D Gaussian representation.

[0076] Refer to Figure 3, which is a schematic diagram of the overall process according to the second embodiment of this application. In Figure 3, σ represents the lifetime of the 4D Gaussian, τ represents the position of the 4D Gaussian in the time domain, $P_{so}$ represents the spatial planes, $P_{st}$ represents the spatiotemporal planes, and Δμ, ΔΣ, Δc represent the residuals of the Gaussian properties.

[0077] Optionally, first, the set of Gaussian elements to be optimized in four-dimensional space (three-dimensional space plus the time dimension) and the scale-aware residual field M associated with these Gaussian elements are determined. Then, for each Gaussian element, its residual features and lifetime σ are calculated. These parameters describe how the Gaussian element changes over time. For a specific sampling time t0, the survival state g(t0) of each Gaussian element is calculated, which reflects how active the element is at that time.

[0078] Optionally, the scale-aware residual field is represented by six planes, which include a spatial plane and a spatiotemporal plane.

[0079] Optionally, the step of calculating the residuals of the survival state and properties of the 4D Gaussian unit at the sampling time based on the sampling time and residual characteristics includes:

[0080] Based on the sampling time and residual characteristics, the survival status of the 4D Gaussian unit at the sampling time is calculated;

[0081] The residual features of the 4D Gaussian elements at the sampling time are decoded using a multilayer perceptron, and combined with the survival state, the residual of the attribute is obtained.

[0082] Optionally, in this embodiment, a multilayer perceptron (MLP) is used to decode the residual features of each Gaussian element at time t0 to predict its state at that time. The decoded attribute residuals are combined with the initial attributes of the Gaussian element to form a complete three-dimensional Gaussian representation. The final rendered image is then generated from this representation using techniques such as Gaussian splatting.

[0083] Through the above scheme, this embodiment obtains a calibrated multi-view video sequence, wherein the calibrated multi-view video sequence includes a sampling time and the corresponding real image; initializes 4D Gaussian primitives and a scale-aware residual field in 4D space based on the calibrated multi-view video sequence; given the sampling time, determines the residual features of the 4D Gaussian primitives in the scale-aware residual field; calculates the survival state and attribute residuals of the 4D Gaussian primitives at the sampling time based on the sampling time and residual features; combines the initial attributes of the 4D Gaussian primitives with the attribute residuals to obtain a 3D Gaussian representation in 3D space; projects the 3D Gaussian representation to obtain a rendered image; and performs gradient backpropagation based on the loss between the rendered image and the real image, adaptively optimizing and training the 4D Gaussian primitives to obtain the 4D Gaussian representation. A scale-aware residual field is used to encode the regions occupied by the Gaussian primitives, thereby generating more accurate features and aligning with the self-splitting behavior of the Gaussian primitives. An adaptive optimization strategy is employed to enhance the ability of the obtained 4D Gaussian representation to reconstruct high-frequency temporal information in dynamic scenes. The trained 4D Gaussian representation can perform rendering at any time and from any viewpoint, achieving real-time rendering and ensuring high-quality reconstruction, thus improving the rendering quality of dynamic scenes.

[0084] Based on any of the above embodiments of this application, a third embodiment of this application is proposed. In this third embodiment, content that is the same as or similar to any of the above embodiments can be found in the above description and is not repeated here. Referring to Figure 4, the step of determining the residual characteristics of the 4D Gaussian element in the scale-aware residual field includes steps S031 to S034:

[0085] Step S031: Establish a multi-scale feature representation framework using pyramid texture mapping stacking technology;

[0086] Step S032: The lowest resolution level is optimized by dynamically calculating, storing, and processing the pyramid texture mapping levels.

[0087] Step S033: Determine the scaling level of the 4D Gaussian primitive in the multi-scale feature representation framework based on the projection size and basic spatial scale of the 4D Gaussian primitive.

[0088] Step S034: Scale-aware feature extraction is performed based on the lowest resolution level and scaling level to obtain the residual features.

[0089] Optionally, to fully integrate the spatiotemporal information of the scene and save computational resources, this embodiment uses a six-plane representation to depict the scale-aware residual field M, which consists of a spatial-only plane and a spatiotemporal plane. However, ignoring the size of Gaussian primitives and projecting them onto the planes solely based on their 4D positions for feature extraction leads to incorrect residual features. First, Gaussian primitives can be approximated as ellipsoids. Therefore, when projecting Gaussian primitives onto the spatial-only plane, an elliptical region is obtained, rather than a single point as in current NeRF-based methods. Thus, the corresponding features of a Gaussian primitive should be a combination of all the regions it occupies. Second, if the self-splitting strategy of 3DGS is followed and large Gaussian primitives are divided into smaller primitives, they will have different residual features that deviate significantly from the features of their parent primitives, contradicting the original intent. Therefore, finding an appropriate method to encode the regions of the Gaussian projection is crucial.

[0090] Optionally, this application proposes a scale-aware residual field to address the above-mentioned problem, which decomposes 4D space into three spatial-only planes Pso and three spatiotemporal planes Pst. Given that the size of Gaussian elements only affects their projections within the spatial-only planes Pso, this application specifically considers using scale-aware coding only within these planes.
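As a small illustration of this decomposition, the sketch below projects a 4D position (x, y, z, t) onto the three spatial-only planes and the three spatiotemporal planes by selecting coordinate pairs. The plane naming and the `project_to_planes` helper are hypothetical; feature lookup in each plane is not shown.

```python
import numpy as np

AXES = "xyzt"
SPATIAL_PLANES = [(0, 1), (0, 2), (1, 2)]          # (x,y), (x,z), (y,z)
SPATIOTEMPORAL_PLANES = [(0, 3), (1, 3), (2, 3)]   # (x,t), (y,t), (z,t)

def project_to_planes(position_4d):
    """Map a 4D position to its 2D coordinates in each of the six planes."""
    p = np.asarray(position_4d, dtype=float)
    coords = {}
    for i, j in SPATIAL_PLANES + SPATIOTEMPORAL_PLANES:
        coords[AXES[i] + AXES[j]] = p[[i, j]]
    return coords

coords = project_to_planes([1.0, 2.0, 3.0, 0.5])
```

Only the first three entries depend purely on spatial coordinates, which is why the primitive's spatial size matters for those planes and not for the ones that include t.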

[0091] Optionally, for each spatial plane $P_{i,j}$, this embodiment employs a MipMap stack to represent features at different spatial scales in the scene. Level 0 of the MipMap stack is a feature map of shape M×N×N and has the smallest spatial scale $s_0$ among all levels. The remaining levels in the MipMap stack are obtained by downsampling the features of the previous level, with the width and height halved each time. Taking $P_{xy}$ as an example, the relationship between their spatial scales is as follows:

[0092]

[0093] Optionally, $B_{max}$ and $B_{min}$ represent the maximum and minimum values of the scene bounding box, respectively, and $s_l$ is the spatial scale of level $l$ in the MipMap stack. In practice, only the features in level 0 of the MipMap stack are stored and optimized, while the other levels are computed dynamically during forward inference, thus enabling the encoding of features at different spatial scales in the scene.
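The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of this application: the 2×2 average filter used to compute each "thumbnail", the base scale $s_0 = (B_{max} - B_{min})/N$, and the doubling rule $s_l = 2^l s_0$ are assumptions chosen to be consistent with the halving of width and height per level. The function names are hypothetical.

```python
from typing import List

# A feature map is indexed as [channel][row][col]; level 0 has shape M x N x N.
FeatureMap = List[List[List[float]]]


def downsample_2x(level: FeatureMap) -> FeatureMap:
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    out: FeatureMap = []
    for channel in level:
        n = len(channel) // 2
        out.append([
            [
                (channel[2 * r][2 * c] + channel[2 * r][2 * c + 1]
                 + channel[2 * r + 1][2 * c] + channel[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(n)
            ]
            for r in range(n)
        ])
    return out


def build_mipmap_stack(level0: FeatureMap, num_levels: int) -> List[FeatureMap]:
    """Only level0 is a stored, optimizable parameter; the rest is derived on the fly."""
    stack = [level0]
    for _ in range(1, num_levels):
        stack.append(downsample_2x(stack[-1]))
    return stack


def spatial_scales(b_min: float, b_max: float, n: int, num_levels: int) -> List[float]:
    """Assumed relation: s_0 = (B_max - B_min) / N and s_l = 2**l * s_0."""
    s0 = (b_max - b_min) / n
    return [s0 * 2 ** l for l in range(num_levels)]


# Usage: one channel (M = 1), N = 4, three levels.
level0 = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
stack = build_mipmap_stack(level0, 3)
scales = spatial_scales(-1.0, 1.0, 4, 3)
```

Because the coarser levels are pure functions of level 0, gradients from any level flow back into the single stored feature map during training.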

[0094] Optionally, a Gaussian primitive with scaling factor $s$ in 4D space, when projected onto a spatial-only plane, produces a 2D ellipse with axes $(s_x, s_y)$. Therefore, based on the size of the Gaussian primitive's projection onto $P_{xy}$ and the corresponding basic spatial scale, the scaling level $l$ associated with this Gaussian primitive can be determined.

[0095]

[0096] Optionally, to maintain the highest accuracy, this embodiment selects the minimum of the two as the final spatial level, $l = \min(l_x, l_y)$. This yields the two MipMap features closest to that spatial level, from which the representation in $P_{xy}$ of a Gaussian primitive with 4D position $\mu_{4d}$ can be obtained:

[0097]

[0098] where the interpolation operator denotes trilinear interpolation in the space formed by the two MipMap features. The final scale-aware residual representation of a Gaussian primitive is:

[0099]

[0100] $C_{so} = \{(x,y), (x,z), (y,z)\}$, $C_{st} = \{(x,t), (t,z), (y,t)\}$
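The level selection and trilinear lookup on a single spatial-only plane can be sketched as follows. This is a hedged illustration only: the per-axis level $l_x = \log_2(s_x / s_0)$, the clamping of levels to the available range, and the mapping of world coordinates to texel coordinates are assumptions, and all function names are hypothetical. The sketch interpolates bilinearly inside each of the two nearest MipMap levels and then linearly between them.

```python
import math


def bilinear(channel, u, v):
    """Bilinearly sample one square channel at continuous texel coords (u, v)."""
    n = len(channel)
    u = min(max(u, 0.0), n - 1.0)
    v = min(max(v, 0.0), n - 1.0)
    c0, r0 = int(math.floor(u)), int(math.floor(v))
    c1, r1 = min(c0 + 1, n - 1), min(r0 + 1, n - 1)
    fu, fv = u - c0, v - r0
    top = channel[r0][c0] * (1 - fu) + channel[r0][c1] * fu
    bottom = channel[r1][c0] * (1 - fu) + channel[r1][c1] * fu
    return top * (1 - fv) + bottom * fv


def select_level(s_x, s_y, s0, num_levels):
    """Assumed rule: l_axis = log2(s_axis / s0); keep the finer (smaller) one."""
    l_x = math.log2(max(s_x, s0) / s0)
    l_y = math.log2(max(s_y, s0) / s0)
    return min(min(l_x, l_y), num_levels - 1.0)


def sample_scale_aware(stack, x, y, level, b_min, b_max):
    """Trilinear lookup: bilinear in the two nearest levels, linear across them."""
    lo = int(math.floor(level))
    hi = min(lo + 1, len(stack) - 1)
    w = level - lo
    features = []
    for ch in range(len(stack[0])):
        per_level = []
        for l in (lo, hi):
            n = len(stack[l][ch])
            # Assumed mapping of world coordinates onto texel centers.
            u = (x - b_min) / (b_max - b_min) * (n - 1)
            v = (y - b_min) / (b_max - b_min) * (n - 1)
            per_level.append(bilinear(stack[l][ch], u, v))
        features.append(per_level[0] * (1 - w) + per_level[1] * w)
    return features


# Usage: a hand-written two-level stack (level 1 is the 2x2 average of level 0).
stack = [[[[0.0, 1.0], [2.0, 3.0]]], [[[1.5]]]]
level = select_level(1.0, 0.5, 0.5, len(stack))        # finer axis decides
center = sample_scale_aware(stack, 0.5, 0.5, level, 0.0, 1.0)
corner = sample_scale_aware(stack, 0.0, 0.0, 0.5, 0.0, 1.0)
```

Taking the minimum of the two per-axis levels keeps the lookup at the finer resolution, which matches the stated goal of maintaining the highest accuracy.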

[0101] Through the scheme described above, this embodiment establishes a multi-scale feature representation framework using the pyramid texture mapping (MipMap) stack technique; stores and optimizes only the lowest level (level 0) of the MipMap stack while computing the remaining levels dynamically; determines the scaling level of each 4D Gaussian primitive in the multi-scale feature representation framework based on the projection size of the primitive and the basic spatial scale; performs scale-aware feature extraction based on the lowest level and the scaling level to obtain the residual features; and encodes the region occupied by the Gaussian primitives through the scale-aware residual field, thereby generating more accurate features that remain aligned with the self-splitting behavior of the Gaussian primitives.

[0102] Based on any of the above embodiments of this application, a fourth embodiment of this application is proposed. For content in this fourth embodiment that is the same as or similar to any of the above embodiments, refer to the description above; it is not repeated here. On this basis, referring to Figure 5, the step of adaptively optimizing and training the 4D Gaussian primitives to obtain the 4D Gaussian representation includes steps S071 to S073:

[0103] Step S071: Obtain the sampling probability of the 4D Gaussian unit;

[0104] Step S072: Calculate the definite integral of the 4D Gaussian element in the time domain based on the sampling probability to obtain the cumulative distribution function;

[0105] Step S073: Dynamically adjust the densification threshold and learning rate of the 4D Gaussian unit according to the cumulative distribution function, and perform iterative training until the preset number of iterations is reached to obtain the 4D Gaussian representation.

[0106] Optionally, Gaussian primitives have different temporal locations and lifetimes in this 4D space, and each Gaussian primitive is sampled with a different probability over the observation time. To represent the temporal complexity of the scene, dynamic primitives typically have shorter lifetimes and lower sampling probabilities than static primitives. Primitives that are exposed less often over the temporal domain receive smaller gradients when the loss is backpropagated. Gradient values are crucial in the 3DGS framework because a gradient must exceed a threshold for the corresponding primitive to be densified and the imperfectly reconstructed region to be refined. Therefore, directly applying the same optimization and densification strategy to every primitive may lead to imbalanced optimization.

[0107] To address the aforementioned issues, this application proposes an adaptive optimization strategy that dynamically adjusts the learning rate and the gradient densification threshold based on the sampling probability of $g_{4d}$ within the observable time range. Specifically, in the embodiments of this application, the time integral of the $\gamma(t)$ of $g_{4d}$ within the observable range can be used to represent the sampling probability. The larger the integral, the more the lifetime of the Gaussian primitive overlaps the observable range, and the more likely the primitive is to be sampled.

[0108] (a) Definite integral of the time-domain distribution: based on the state function $\gamma(t)$ of $g_{4d}$, this embodiment calculates its definite integral $F(t)$ in the time domain, which represents the CDF (cumulative distribution function) of each Gaussian primitive.

[0109]

[0110] $I = F(t_{end}) - F(t_{start})$

[0111] Optionally, $I$ represents the definite integral over the entire time domain from $t_{start}$ to $t_{end}$, where $t_{start}$ and $t_{end}$ are normalized to 0 and 1, respectively.

[0112] Optionally, since $\gamma(t)$ is a Gaussian-like function, computing the exact value of its definite integral is difficult. This embodiment therefore derives an approximate cumulative distribution function:

[0113]

[0114] where a = 0.070565992 and b = 1.5976. The definite integral of the Gaussian primitive in the time domain can thus be obtained at minimal computational cost.
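The constants a and b coincide with those of a well-known logistic approximation to the standard normal CDF, $\Phi(z) \approx 1 / (1 + e^{-(a z^3 + b z)})$. The sketch below assumes that this is the form intended and that $\gamma(t)$ behaves like a normalized Gaussian with temporal center `mu_t` and temporal spread `sigma_t`; both parameter names and the normalization are assumptions made for illustration, not definitions from this application.

```python
import math

A = 0.070565992
B = 1.5976


def approx_std_normal_cdf(z: float) -> float:
    """Logistic approximation of the standard normal CDF (assumed form)."""
    z = min(max(z, -10.0), 10.0)  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-(A * z ** 3 + B * z)))


def exact_std_normal_cdf(z: float) -> float:
    """Reference value computed with the error function, for comparison only."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def temporal_integral(mu_t: float, sigma_t: float,
                      t_start: float = 0.0, t_end: float = 1.0) -> float:
    """I = F(t_end) - F(t_start) for a primitive centered at mu_t (assumed Gaussian)."""
    z_end = (t_end - mu_t) / sigma_t
    z_start = (t_start - mu_t) / sigma_t
    return approx_std_normal_cdf(z_end) - approx_std_normal_cdf(z_start)


# Usage: a primitive living in the middle of the sequence versus one at its edge.
inside = temporal_integral(mu_t=0.5, sigma_t=0.1)   # lifetime fully observable
edge = temporal_integral(mu_t=0.0, sigma_t=0.1)     # half of the lifetime observable
outside = temporal_integral(mu_t=2.0, sigma_t=0.1)  # lifetime outside the range
```

The approximation needs one exponential per evaluation and no error-function call, which keeps the per-primitive cost low when the integral is recomputed during training.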

[0115] (b) Integral-based Gaussian-wise optimization strategy: After obtaining the time integral of each Gaussian, the learning rate and gradient threshold can be dynamically adjusted according to each primitive to achieve rapid reconstruction of dynamic regions.

[0116]

[0117] Optionally, $k_i$ and $lr_i$ denote the densification threshold and the learning rate of $g_{4d}$, respectively, and $I_i$ and $I_{max}$ denote the time integral of $g_{4d}$ and the maximum time-domain integral among all Gaussian primitives, respectively. The densification threshold is adjusted each time density control is required. Furthermore, in this embodiment, every preset number of iterations (e.g., 50), the learning rates of the parameters of $g_{4d}$ related to 4D position, scaling, rotation, and zero-order SH coefficients are dynamically adjusted.
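The per-primitive adjustment can be illustrated with a short sketch. The scaling rule used here is purely an assumption: the threshold is scaled by $I_i / I_{max}$ and the learning rate by its inverse, capped at a hypothetical `max_lr_scale`. It only shows the intended direction (rarely sampled primitives get a lower densification threshold and a higher learning rate) and should not be read as the formula of this embodiment. All names and default values are hypothetical.

```python
from typing import List, Tuple


def adaptive_hyperparams(integrals: List[float],
                         base_threshold: float,
                         base_lr: float,
                         eps: float = 1e-6,
                         max_lr_scale: float = 10.0) -> List[Tuple[float, float]]:
    """Return (k_i, lr_i) per primitive from its time integral I_i.

    Assumed rule: k_i = k * I_i / I_max and lr_i = lr * min(I_max / I_i, cap).
    """
    i_max = max(max(integrals), eps)
    result = []
    for i_i in integrals:
        ratio = max(i_i, eps) / i_max
        k_i = base_threshold * ratio                       # easier to densify
        lr_i = base_lr * min(1.0 / ratio, max_lr_scale)    # faster to update
        result.append((k_i, lr_i))
    return result


# Usage: a static primitive, a moderately dynamic one, and a short-lived one.
params = adaptive_hyperparams([1.0, 0.5, 0.01], base_threshold=2e-4, base_lr=1e-3)
```

In a training loop, the thresholds would be refreshed whenever density control runs and the learning rates every preset number of iterations, as described above.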

[0118] Referring to Figure 6, which is a schematic diagram comparing rendering results according to the fourth embodiment of this application: when the method proposed in this application and various existing methods are used to render the same scene view (e.g., a scene with a standing person), the rendering quality differs significantly. The GT group shows the ground-truth (real) images, the Ours group shows images rendered using the dynamic scene reconstruction method of this application, and the remaining groups show images rendered using existing methods such as 4DGS, D-NeRF, K-Planes, and HexPlane. As can be seen from the figure, the images rendered using the dynamic scene reconstruction method of this application are closest to the real images, with a particularly significant improvement in facial and hand details. Experimental results under monocular and multi-view settings demonstrate that the method of this application achieves high-quality rendering while maintaining real-time rendering speed.

[0119] This embodiment, through the above scheme, specifically obtains the sampling probability of the 4D Gaussian primitive; calculates the definite integral of the 4D Gaussian primitive in the time domain based on the sampling probability to obtain the cumulative distribution function; dynamically adjusts the densification threshold and learning rate of the 4D Gaussian primitive based on the cumulative distribution function, and performs iterative training until a preset number of iterations is reached to obtain the 4D Gaussian representation. By introducing an adaptive optimization strategy, the model's ability to reconstruct high-frequency temporal information in dynamic scenes is enhanced.

[0120] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the dynamic scene reconstruction method of this application. Any simple transformations based on this technical concept are within the protection scope of this application.

[0121] This application also provides a dynamic scene reconstruction apparatus. Referring to Figure 7, the dynamic scene reconstruction apparatus includes:

[0122] The response module is used to respond to the received dynamic scene reconstruction instruction and determine the rendering time and rendering perspective according to the dynamic scene reconstruction instruction;

[0123] The rendering module is used to render based on a pre-trained 4D Gaussian representation, according to the rendering time and rendering view, to obtain the target rendered view;

[0124] The 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field.
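The division of responsibilities between the two modules can be sketched as follows. Every name here (`RenderRequest`, `ResponseModule`, `RenderingModule`, the instruction keys `"time"` and `"view"`, and the stub renderer) is hypothetical and introduced only for illustration; the actual rasterization of the trained 4D Gaussian representation is replaced by a placeholder callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class RenderRequest:
    time: float               # rendering time
    view: Tuple[float, ...]   # rendering perspective (hypothetical pose encoding)


class ResponseModule:
    """Turns a reconstruction instruction into a rendering time and perspective."""

    def handle(self, instruction: Dict[str, Any]) -> RenderRequest:
        return RenderRequest(time=float(instruction["time"]),
                             view=tuple(float(v) for v in instruction["view"]))


class RenderingModule:
    """Renders a target view from a pre-trained representation."""

    def __init__(self, representation: Any,
                 render_fn: Callable[[Any, float, Tuple[float, ...]], Any]):
        self._representation = representation
        self._render_fn = render_fn   # stands in for the actual rasterizer

    def render(self, request: RenderRequest) -> Any:
        return self._render_fn(self._representation, request.time, request.view)


# Usage with a stub renderer that only echoes its inputs.
def stub_render(rep, t, view):
    return {"representation": rep, "time": t, "view": view}


request = ResponseModule().handle({"time": "0.25", "view": [0, 0, 1]})
image = RenderingModule("trained-4d-gaussians", stub_render).render(request)
```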

[0125] The dynamic scene reconstruction apparatus provided in this application, employing the dynamic scene reconstruction method in the above embodiments, can solve the technical problem of dynamic scene reconstruction. Compared with the prior art, the beneficial effects of the dynamic scene reconstruction apparatus provided in this application are the same as those of the dynamic scene reconstruction method provided in the above embodiments, and other technical features in the dynamic scene reconstruction apparatus are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0126] This application provides a dynamic scene reconstruction device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the dynamic scene reconstruction method in the first embodiment described above.

[0127] Referring now to Figure 8, a structural schematic diagram of a dynamic scene reconstruction device suitable for implementing embodiments of this application is shown. The dynamic scene reconstruction device in these embodiments may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The dynamic scene reconstruction device shown in Figure 7 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0128] As shown in Figure 8, the dynamic scene reconstruction device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the dynamic scene reconstruction device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including, for example, magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 allows the dynamic scene reconstruction device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows a dynamic scene reconstruction device with various components, it should be understood that not all of the components shown need to be implemented or present. More or fewer components may be implemented instead.

[0129] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0130] The dynamic scene reconstruction device provided in this application, employing the dynamic scene reconstruction method in the above embodiments, can solve the technical problem of dynamic scene reconstruction. Compared with the prior art, the beneficial effects of the dynamic scene reconstruction device provided in this application are the same as those of the dynamic scene reconstruction method provided in the above embodiments, and other technical features of the dynamic scene reconstruction device are the same as those disclosed in the method of the previous embodiment, and will not be repeated here.

[0131] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0132] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0133] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the dynamic scene reconstruction method in the above embodiments.

[0134] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited thereto; it may also be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0135] The aforementioned computer-readable storage medium may be included in the dynamic scene reconstruction device; or it may exist independently and not be assembled into the dynamic scene reconstruction device.

[0136] The aforementioned computer-readable storage medium carries one or more programs that, when executed by a dynamic scene reconstruction device, cause the dynamic scene reconstruction device to: in response to receiving a dynamic scene reconstruction instruction, determine a rendering time and a rendering viewpoint according to the dynamic scene reconstruction instruction; and render a target rendering view based on a pre-trained 4D Gaussian representation according to the rendering time and the rendering viewpoint; wherein the 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field.

[0137] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0138] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0139] The modules described in the embodiments of this application can be implemented in software or hardware. The names of the modules do not necessarily limit the functionality of the modules themselves.

[0140] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described dynamic scene reconstruction method, and is capable of solving the technical problem of dynamic scene reconstruction. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as the beneficial effects of the dynamic scene reconstruction method provided in the above embodiments, and will not be repeated here.

[0141] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the dynamic scene reconstruction method described above.

[0142] The computer program product provided in this application can solve the technical problem of dynamic scene reconstruction. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as the beneficial effects of the dynamic scene reconstruction method provided in the above embodiments, and will not be repeated here.

[0143] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A dynamic scene reconstruction method, characterized in that the method includes:
in response to receiving a dynamic scene reconstruction instruction, determining a rendering time and a rendering perspective according to the dynamic scene reconstruction instruction;
based on a pre-trained 4D Gaussian representation, rendering according to the rendering time and the rendering perspective to obtain a target rendered view;
wherein the 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field; the 4D Gaussian primitives have the temporal attributes of temporal location and lifetime; and the scale-aware residual field is represented by six planes, which include spatial planes and spatiotemporal planes;
wherein, before the step of rendering according to the rendering time and the rendering perspective based on the pre-trained 4D Gaussian representation to obtain the target rendered view, the method further includes:
obtaining a calibrated multi-view video sequence, wherein the calibrated multi-view video sequence includes sampling times and corresponding real images;
based on the calibrated multi-view video sequence, initializing 4D Gaussian primitives and a scale-aware residual field in 4D space;
given the sampling time, establishing a multi-scale feature representation framework using the pyramid texture mapping stack technique;
optimizing the lowest resolution level by dynamically calculating and storing the pyramid texture mapping levels;
determining the scaling level of the 4D Gaussian primitive in the multi-scale feature representation framework based on the projection size of the 4D Gaussian primitive and the basic spatial scale;
performing scale-aware feature extraction based on the lowest resolution level and the scaling level to obtain residual features, wherein the feature extraction relies on the feature data of the lowest resolution level and does not require accessing features of other levels or performing inter-level interpolation;
based on the sampling time and the residual features, calculating the survival state and the attribute residuals of the 4D Gaussian primitive at the sampling time;
combining the initial attributes of the 4D Gaussian primitive with the attribute residuals to obtain a 3D Gaussian representation in 3D space;
projecting the 3D Gaussian representation to obtain a rendered image;
performing gradient backpropagation based on the loss between the rendered image and the real image to obtain the sampling probability of the 4D Gaussian primitive;
calculating the definite integral of the 4D Gaussian primitive in the time domain based on the sampling probability to obtain the cumulative distribution function; and
dynamically adjusting the densification threshold and the learning rate of the 4D Gaussian primitive according to the cumulative distribution function, and performing iterative training until a preset number of iterations is reached to obtain the 4D Gaussian representation.

2. The method as described in claim 1, characterized in that the step of calculating the survival state and the attribute residuals of the 4D Gaussian primitive at the sampling time based on the sampling time and the residual features includes:
based on the sampling time and the residual features, calculating the survival state of the 4D Gaussian primitive at the sampling time; and
decoding the residual features of the 4D Gaussian primitive at the sampling time using a multilayer perceptron and combining the result with the survival state to obtain the attribute residuals.

3. A dynamic scene reconstruction device, characterized in that the device includes:
a response module, used to respond to a received dynamic scene reconstruction instruction and determine a rendering time and a rendering perspective according to the dynamic scene reconstruction instruction;
a rendering module, used to render, based on a pre-trained 4D Gaussian representation, according to the rendering time and the rendering perspective to obtain a target rendered view;
wherein the 4D Gaussian representation is obtained by adaptively optimizing and training Gaussian primitives in 4D space based on a scale-aware residual field; the 4D Gaussian primitives have the temporal attributes of temporal location and lifetime; and the scale-aware residual field is represented by six planes, which include spatial planes and spatiotemporal planes;
wherein the device is further used to:
obtain a calibrated multi-view video sequence, wherein the calibrated multi-view video sequence includes sampling times and corresponding real images;
based on the calibrated multi-view video sequence, initialize 4D Gaussian primitives and a scale-aware residual field in 4D space;
given the sampling time, establish a multi-scale feature representation framework using the pyramid texture mapping stack technique;
optimize the lowest resolution level by dynamically calculating and storing the pyramid texture mapping levels;
determine the scaling level of the 4D Gaussian primitive in the multi-scale feature representation framework based on the projection size of the 4D Gaussian primitive and the basic spatial scale;
perform scale-aware feature extraction based on the lowest resolution level and the scaling level to obtain residual features, wherein the feature extraction relies on the feature data of the lowest resolution level and does not require accessing features of other levels or performing inter-level interpolation;
based on the sampling time and the residual features, calculate the survival state and the attribute residuals of the 4D Gaussian primitive at the sampling time;
combine the initial attributes of the 4D Gaussian primitive with the attribute residuals to obtain a 3D Gaussian representation in 3D space;
project the 3D Gaussian representation to obtain a rendered image;
perform gradient backpropagation based on the loss between the rendered image and the real image to obtain the sampling probability of the 4D Gaussian primitive;
calculate the definite integral of the 4D Gaussian primitive in the time domain based on the sampling probability to obtain the cumulative distribution function; and
dynamically adjust the densification threshold and the learning rate of the 4D Gaussian primitive according to the cumulative distribution function, and perform iterative training until a preset number of iterations is reached to obtain the 4D Gaussian representation.

4. A dynamic scene reconstruction device, characterized in that the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the dynamic scene reconstruction method as described in claim 1 or 2.

5. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the dynamic scene reconstruction method as described in claim 1 or 2.

6. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the steps of the dynamic scene reconstruction method as described in claim 1 or 2.