A dynamic scene novel view synthesis method and system based on spiking neurons
A dynamic scene novel view synthesis method based on spiking neurons constructs a spatiotemporal fine-grained mask field and a discontinuous dynamic/static label field. This addresses inaccurate dynamic/static decomposition and gradient blocking, thereby improving the rendering quality and real-time performance of dynamic scene novel view synthesis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for synthesizing new perspectives in dynamic scenes suffer from inaccurate prior knowledge of dynamic and static decomposition masks, unreasonable representation of dynamic and static labels, and gradient blocking issues in discrete label optimization, resulting in low rendering efficiency and insufficient reconstruction accuracy.
A spiking neuron-based approach is adopted: a spatiotemporal fine-grained mask field generates accurate prior supervision for dynamic/static labels, and a discontinuous dynamic/static label field assigns those labels. Label allocation accuracy and rendering quality are improved by combining the 3D Gaussian Splatting algorithm with backpropagation-based optimization.
It improves the rendering quality and real-time performance of novel view synthesis for dynamic scenes, addresses the accuracy problem of dynamic/static decomposition, preserves fine-grained motion details, and improves rendering efficiency.
Figure CN122244273A_ABST
Description
Technical Field
[0001] This invention belongs to the field of computer vision and 3D reconstruction technology, and in particular relates to a method and system for novel view synthesis of dynamic scenes based on spiking neurons.
Background Technology
[0002] Novel View Synthesis (NVS) aims to reconstruct spatiotemporally consistent 3D scene representations from dynamic scene data acquired from multiple perspectives, and generate high-quality rendered images from any target viewpoint. It is a core support for immersive interactive technologies such as virtual reality and augmented reality. With the rise of 3D Gaussian Splatting (3DGS) technology, the rendering efficiency and reconstruction accuracy of dynamic scene reconstruction have been significantly improved, but it still faces many challenges in processing complex dynamic scenes.
[0003] (I) Mainstream technical approaches and limitations of dynamic scene reconstruction.
[0004] Existing dynamic scene reconstruction methods are mainly based on 3DGS technology, and can be broadly categorized into two implementation paths: one explicitly models Gaussian trajectories through deformation fields or tracking fields, treating all Gaussian ellipsoids as dynamic; the other incorporates the time dimension as an inherent dimension into the 3D Gaussian representation, modeling the dynamic scene in a four-dimensional Gaussian form. Both methods have significant drawbacks: the former is prone to severe overfitting and struggles to adapt to complex motion patterns; the latter results in an excessive number of Gaussian ellipsoids, leading to low rendering efficiency, a surge in storage overhead, and an inability to meet the demands of real-time applications.
[0005] (II) The core bottleneck of dynamic and static decomposition.
[0006] To balance reconstruction accuracy and computational efficiency, some works propose dynamic-static decomposition to assist in dynamic scene reconstruction. The core idea is to divide the scene into static and dynamic regions, and represent them using static and dynamic Gaussian ellipsoids, respectively. The key to this technique lies in achieving accurate allocation of dynamic and static labels to the Gaussian ellipsoid. However, existing implementations still suffer from two major bottlenecks that severely restrict the performance of synthesizing new perspectives in dynamic scenes.
[0007] The first bottleneck is the inaccuracy of dynamic and static mask priors. The accuracy of dynamic and static decomposition highly depends on reliable mask prior supervision. Existing methods sometimes rely on pre-trained semantic models to generate segmentation masks independently for each viewpoint, ignoring cross-viewpoint consistency and leading to conflicts in multi-view masks; other methods use time-invariant mask priors for each viewpoint, ignoring the temporal changes in dynamic scenes and easily causing over-segmentation of dynamic regions; still other methods do not introduce explicit decomposition mask priors, resulting in poor decomposition performance. These problems directly lead to misassignment of dynamic and static labels on the Gaussian ellipsoid, resulting in the loss of fine-grained motion details (such as finger movements and clothing swaying) during dynamic reconstruction and exacerbating overfitting of the model to the input viewpoint.
[0008] The second bottleneck is the unreasonable representation of dynamic and static labels. Existing methods typically use continuous floating-point dynamic and static attributes to quantify the dynamic probability of a Gaussian ellipsoid. This requires post-processing techniques, such as fixed thresholds and predefined scales, that rely on hyperparameters to discretize it into dynamic / static labels. However, the dynamic probability of the Gaussian ellipsoid at the boundary of a moving object is often in the middle range, making it highly sensitive to hyperparameters. Post-processing can easily lead to classification errors, resulting in artifacts in boundary region reconstruction. This is especially true in scenarios with large differences between training and testing perspectives, such as side views, where the synthesis effect of new perspectives is significantly degraded. More importantly, the post-processing discretization operation is disconnected from the model optimization process, creating a gap between the dynamic and static attribute optimization target and the actual dynamic and static labels used, further reducing label assignment accuracy.
[0009] To ensure that the optimized dynamic/static labels match the labels actually used, the discrete dynamic/static labels must be trained and optimized directly. However, directly optimizing discrete labels leads to gradient blocking: the discretization by a step function is not differentiable, so the loss function cannot readily optimize the dynamic/static labels during rendering.
Summary of the Invention
[0010] To address the problems of inaccurate mask priors, unreasonable dynamic/static label representations, and gradient blocking in discrete label optimization in existing dynamic scene novel view synthesis methods, this invention provides a dynamic scene novel view synthesis method and system based on spiking neurons, which achieves accurate allocation of dynamic/static attributes and improves the rendering quality and real-time performance of dynamic scene novel view synthesis.
[0011] A novel view synthesis method for dynamic scenes based on spiking neurons includes the following steps:
(1) Acquire video frame data of a dynamic scene from multiple viewpoints and time steps, generate a sparse 3D point cloud with the structure-from-motion (SfM) algorithm, and expand the points into a set of 3D Gaussian ellipsoids; each 3D Gaussian ellipsoid contains geometric attributes and a dynamic/static attribute, and the geometric attributes include spatial position, rotation matrix, scaling matrix, opacity, and color;
(2) Construct a spatiotemporal fine-grained mask field to generate dynamic/static masks for different viewpoints and time steps, providing prior supervision for the dynamic/static label allocation of the 3D Gaussian ellipsoids;
(3) Construct a discontinuous dynamic/static label field based on spiking neurons; with the dynamic/static masks as priors, map the continuous dynamic/static attribute of each 3D Gaussian ellipsoid to a discrete binary dynamic/static label;
(4) Process the 3D Gaussian ellipsoids differently according to their dynamic/static labels: introduce a time-varying deformation field for the dynamic Gaussian ellipsoids to characterize how their attributes change over time, and keep the attributes of the static Gaussian ellipsoids unchanged over time;
(5) Fuse the dynamic Gaussian ellipsoids after this processing with the static Gaussian ellipsoids, complete the novel view synthesis of the dynamic scene with the 3D Gaussian Splatting algorithm, and output the rendered scene image for the target viewpoint;
(6) Based on the task performance feedback of the novel view synthesis, optimize the geometric and dynamic/static attributes of the 3D Gaussian ellipsoids by backpropagation to iteratively improve the scene reconstruction.
[0012] In step (1), the geometric attributes of a 3D Gaussian ellipsoid are represented as $G = \{\mu, R, S, \sigma, c\}$, where $\mu$ denotes the spatial position, $R$ the rotation matrix, $S$ the scaling matrix, $\sigma$ the opacity of the ellipsoid, and $c$ the color of the ellipsoid, represented by spherical harmonic functions.
[0013] In step (2), the spatiotemporal fine-grained mask field is constructed as a 4D mask field $\mathcal{M}$. Given a viewpoint $v$ and a time step $t$, the 2D dynamic/static mask is obtained from the trained 4D mask field: $M_{v,t} = \mathcal{M}(v, t)$, where each mask value $M_{v,t}(i,j) \in \{0, 1\}$; pixels with a value of 0 correspond to static areas, while pixels with a value of 1 correspond to dynamic areas.
[0014] The 4D mask field $\mathcal{M}$ fuses a coarse-grained static mask $M^{res}$, a diffusion mask $M^{diff}$, and a temporal mask $M^{time}$ into a fine-grained static mask, which is then inverted to obtain the dynamic mask.
[0015] In step (3), a discontinuous dynamic/static label field is constructed based on spiking neurons, specifically as follows: a spiking neuron maps the continuous dynamic/static attribute $d$ of a Gaussian ellipsoid to a binary dynamic/static label $l$. The spiking neuron adopts an integrate-and-fire (IF) model with a time step of 1, and the mapping satisfies $l = \Theta(d - V_{th})$, where $\Theta(\cdot)$ is the step function, $V_{th}$ is the membrane potential threshold of the spiking neuron, $d$ is the continuous dynamic/static attribute of the Gaussian ellipsoid, and $l \in \{0, 1\}$ is the binary dynamic/static label, where 1 represents dynamic and 0 represents static. If $d$ exceeds the membrane potential threshold, the label of the Gaussian ellipsoid is output as 1 and the ellipsoid is set as a dynamic Gaussian; otherwise, it is set as a static Gaussian.
[0016] Since the step function is not differentiable, the arctangent function is used as the surrogate gradient function for the step function to achieve differentiable training. The gradient calculation during backpropagation satisfies $\frac{\partial l_i}{\partial d_i} = \frac{\alpha}{2} \cdot \frac{1}{1 + \left(\frac{\pi}{2}\alpha\,(d_i - V_{th})\right)^2}$, where $\alpha$ is a hyperparameter, and $l_i$ and $d_i$ are the binary label and the continuous dynamic/static attribute of the $i$-th Gaussian ellipsoid.
[0017] In step (4), for the dynamic Gaussian ellipsoids, the time-varying deformation field $\mathcal{D}$ models the temporal information of a dynamic Gaussian with a 4D multi-resolution hash encoder, and a multi-head multilayer perceptron then decodes, for a given time step $t$, the change $\Delta G_t$ of the dynamic Gaussian's geometric attributes relative to the Gaussian attributes $G$ in standard (canonical) space. This change only updates the spatial position, rotation matrix, and scaling matrix, while the opacity, color, and dynamic/static attribute remain unchanged. Applying the change $\Delta G_t$ to the standard-space Gaussian attributes $G$ gives the geometric attributes of the dynamic Gaussian at any time step $t$: $G_t = G + \Delta G_t$. For a static Gaussian, the geometric attributes always equal those of the Gaussian in standard space and do not change over time.
[0018] In step (5), the 3D Gaussian Splatting algorithm is implemented through differentiable rendering, computing the color $C$ pixel by pixel to obtain the rendered image $\hat{I}$: $C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$, where $N$ is the set of Gaussian ellipsoids that contribute to the target pixel, $\alpha_i$ is the opacity of the $i$-th Gaussian ellipsoid after 2D projection, and $c_i$ is the color of the $i$-th Gaussian ellipsoid.
[0019] In step (6), the geometric attributes of the Gaussian ellipsoids are optimized through a photometric loss $\mathcal{L}_{photo}$, and the dynamic/static attributes of the Gaussian ellipsoids are optimized through a mask loss $\mathcal{L}_{mask}$.
[0020] A novel view synthesis system for dynamic scenes based on spiking neurons, comprising: a Gaussian ellipsoid initialization module, used to initialize a set of 3D Gaussian ellipsoids; a 4D mask field module, used to generate dynamic/static masks for each viewpoint and time step and to provide prior supervision for dynamic/static label assignment; a discrete dynamic/static labeling module, which uses spiking neurons to map the continuous dynamic/static attribute of each Gaussian ellipsoid to a binary dynamic/static label, achieving accurate segmentation; a novel view synthesis module, used to fuse static Gaussian ellipsoids with time-varying dynamic Gaussian ellipsoids and to generate a rendered image of the target viewpoint through the 3D Gaussian Splatting rendering algorithm; and a joint optimization module, used to compute the loss from the rendering results and to optimize the geometric and dynamic/static attributes of the Gaussian ellipsoids by backpropagation, improving the novel view synthesis result.
[0021] Compared with the prior art, the present invention has the following beneficial effects: 1. This invention generates spatiotemporal fine-grained dynamic mask priors through a 4D mask field, which resolves the multi-view inconsistency and time-invariance of existing mask priors, provides accurate supervision for label allocation, and preserves fine-grained motion details.
[0022] 2. This invention constructs a differentiable discontinuous dynamic and static label field based on spiking neurons, enabling direct training of the discontinuous dynamic and static label field, solving gradient blocking, and improving label allocation accuracy.
[0023] 3. This invention can significantly improve the dynamic/static decomposition of dynamic scenes and the stability and accuracy of novel view synthesis of dynamic scenes, and has strong versatility.
Attached Figure Description
[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0025] Figure 1 is a simplified flowchart of a dynamic scene novel view synthesis method based on spiking neurons, according to an embodiment of the present invention.
[0026] Figure 2 is a detailed flowchart of a dynamic scene novel view synthesis method based on spiking neurons, according to an embodiment of the present invention.
[0027] Figure 3 is a schematic diagram comparing how dynamic/static mask priors are obtained by existing methods and by the method of this invention.
[0028] Figure 4 is a schematic diagram comparing how discrete dynamic/static labels are obtained by existing methods and by the method of this invention.
[0029] Figure 5 is a diagram illustrating the effect of an embodiment of the present invention.
Detailed Implementation
[0030] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0031] It should be noted that, unless otherwise specified, the features in the following embodiments and implementation methods can be combined with each other.
[0032] As shown in Figure 1, a novel view synthesis method for dynamic scenes based on spiking neurons is proposed. First, a spatiotemporal fine-grained dynamic/static mask prior is established. Then, this prior supervises the optimization of the dynamic/static labels of the Gaussian ellipsoids. Based on the dynamic/static labels, the two types of Gaussians are processed separately, and finally the image from the novel viewpoint is rendered.
[0033] As shown in Figure 2, the specific steps are as follows: S01: Acquire multi-view synchronized video data of the dynamic scene and generate a sparse 3D point cloud from the multi-view images at the initial moment of the video using the structure-from-motion algorithm. Expand these points into a set of 3D Gaussian ellipsoids. Each Gaussian ellipsoid includes geometric attributes such as position, opacity, covariance matrix, and color, as well as a dynamic/static attribute.
[0034] In this embodiment, the original multi-view synchronized video data from each viewpoint is split into an image sequence indexed by frame number (time step), giving $\{I_{v,t}\}$, where $I_{v,t}$ denotes the image at viewpoint $v$ and time step $t$, $v \in \{1, \dots, V\}$ ($V$ is the total number of viewpoints), and $t \in \{1, \dots, T\}$ ($T$ is the total number of time steps).
[0035] Based on the first-frame images $I_{v,1}$ from each viewpoint, a sparse initial 3D point cloud is generated using the Structure from Motion (SfM) algorithm. The points of the generated 3D point cloud are then expanded to obtain an initial set of 3D Gaussian ellipsoids.
[0036] The 3D Gaussian ellipsoid is an anisotropic Gaussian whose geometry is defined by its spatial position $\mu$ and its 3D covariance matrix $\Sigma$. The covariance matrix can be decomposed as $\Sigma = R S S^{T} R^{T}$, where $R$ is the rotation matrix represented by a quaternion $q$, and $S$ is the scaling matrix represented by a scaling vector $s$.
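A minimal sketch of this decomposition follows, assuming the factorization $\Sigma = R S S^{T} R^{T}$ commonly used in 3D Gaussian Splatting. The function names `quat_to_rotmat` and `covariance` and the `(w, x, y, z)` quaternion ordering are illustrative assumptions, not part of the disclosed method.

```python
import math


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The (w, x, y, z) ordering is an assumption made for this sketch.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize to a unit quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(q, s):
    """Build Sigma = R S S^T R^T with S = diag(s)."""
    R = quat_to_rotmat(q)
    # M = R S: scale column k of R by s[k].
    M = [[R[i][k] * s[k] for k in range(3)] for i in range(3)]
    # Sigma = M M^T, which is symmetric positive semi-definite by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Identity rotation: the covariance is diagonal with the squared scales.
sigma_identity = covariance((1.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
# A rotated ellipsoid: the axes are permuted but the trace is preserved.
sigma_rotated = covariance((0.5, 0.5, 0.5, 0.5), (1.0, 2.0, 3.0))
```

Storing a quaternion and a scaling vector instead of the covariance entries keeps $\Sigma$ valid under gradient updates, because any $R$ and $S$ yield a symmetric positive semi-definite matrix.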
[0037] The geometric attributes of a 3D Gaussian ellipsoid can be expressed as $G = \{\mu, R, S, \sigma, c\}$, where $\mu$, $R$, and $S$ are the spatial position, rotation matrix, and scaling matrix, $\sigma$ denotes the opacity of the ellipsoid, and $c$ denotes the color of the ellipsoid, represented by spherical harmonic functions. The dynamic/static attributes of all Gaussian ellipsoids are initialized as static ($l = 0$).
[0038] S02: Construct a spatiotemporal fine-grained mask field to generate dynamic/static masks for different viewpoints and time steps, providing prior supervision for the dynamic/static label allocation of the 3D Gaussian ellipsoids.
[0039] The spatiotemporal fine-grained mask field is a 4D mask field $\mathcal{M}$. Given a viewpoint $v$ and a time step $t$, the 2D dynamic/static mask $M_{v,t}$ is obtained from the trained 4D mask field: $M_{v,t} = \mathcal{M}(v, t)$, where each mask value $M_{v,t}(i,j) \in \{0, 1\}$; pixels with a value of 0 correspond to static areas, while pixels with a value of 1 correspond to dynamic areas. The mask serves as the prior supervision for the subsequent dynamic/static label assignment of the Gaussian ellipsoids.
[0040] Specifically, when static reconstruction is performed on the images from all viewpoints and all time steps, dynamic and static regions behave differently in terms of photometric residual and color variance. The static Gaussian ellipsoid set reconstructed from the initialized Gaussian ellipsoid set is called the 4D mask field $\mathcal{M}$.
[0041] Because static regions are more photometrically consistent, they show smaller photometric residuals when reconstructed by the 4D mask field $\mathcal{M}$, so static and dynamic regions can be separated by residual. The pixel-wise residual $r_{v,t} = |\hat{I}_{v,t} - I_{v,t}|$ between the rendered image $\hat{I}_{v,t}$ and the real image $I_{v,t}$ is computed, and a threshold $\tau_{res}$ is applied to obtain the residual-based coarse-grained static mask $M^{res}$, whose value at each pixel $(i,j)$ is $M^{res}(i,j) = 1$ if $r_{v,t}(i,j) \le \tau_{res}$ and $0$ otherwise. However, some high-frequency static regions may also exhibit high residuals similar to dynamic regions. Such high-frequency details lack semantic consistency, that is, their high-residual pixels do not form a continuous region, whereas dynamic regions are semantically consistent, that is, adjacent pixels form a continuous dynamic region.
[0042] Based on this difference in semantic consistency, the diffusion mask $M^{diff}$ is obtained as follows: the coarse-grained static mask $M^{res}$ is diffused with a 3×3 box filter $B_{3\times3}$, and a threshold $\tau_{diff}$ is applied so that high-residual pixels lacking semantic consistency are recovered as static: $M^{diff}(i,j) = 1$ if $(M^{res} * B_{3\times3})(i,j) \ge \tau_{diff}$ and $0$ otherwise. Meanwhile, since lighting conditions in real-world scene datasets may vary across viewpoints, static areas may show photometric inconsistency caused by illumination changes. Therefore, a temporal mask $M^{time}$ is introduced, which marks photometrically consistent areas at a given viewpoint $v$ as static: pixels at viewpoint $v$ whose color variance over time is less than a preset threshold are marked as static regions.
[0043] Fusing the three masks yields a fine-grained static mask $M^{s}_{v,t}$, which is then inverted to obtain the dynamic mask $M_{v,t} = 1 - M^{s}_{v,t}$. The mask $M_{v,t}$ is used to supervise the subsequent optimization of the dynamic/static labels of the 3D Gaussian ellipsoids.
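The mask construction in paragraphs [0041] to [0043] can be sketched as below on single-channel toy images. Several details are assumptions made only for illustration: the text does not state the fusion rule, so a logical OR of the three static masks is assumed; the threshold values, the border handling of the 3×3 box filter (mean over in-bounds neighbors), and all function names are hypothetical.

```python
def residual_static_mask(rendered, real, eps):
    """Coarse static mask: 1 where the per-pixel absolute residual is <= eps."""
    return [[1 if abs(a - b) <= eps else 0 for a, b in zip(row_r, row_g)]
            for row_r, row_g in zip(rendered, real)]


def diffusion_static_mask(static_mask, tau):
    """Apply a 3x3 box filter to a static mask, then threshold at tau.

    Border pixels average only their in-bounds neighbors (an assumption).
    """
    h, w = len(static_mask), len(static_mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [static_mask[ii][jj]
                    for ii in range(max(0, i - 1), min(h, i + 2))
                    for jj in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = 1 if sum(vals) / len(vals) >= tau else 0
    return out


def temporal_static_mask(frames, var_thresh):
    """Static where the per-pixel variance over time is below var_thresh."""
    n, h, w = len(frames), len(frames[0]), len(frames[0][0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [frame[i][j] for frame in frames]
            mean = sum(vals) / n
            var = sum((x - mean) ** 2 for x in vals) / n
            out[i][j] = 1 if var < var_thresh else 0
    return out


def dynamic_mask(*static_masks):
    """Fuse static masks by logical OR (assumed rule), then invert."""
    h, w = len(static_masks[0]), len(static_masks[0][0])
    return [[0 if any(m[i][j] for m in static_masks) else 1 for j in range(w)]
            for i in range(h)]


# Toy 5x5 scene: a 3x3 moving block plus one isolated high-residual pixel.
background = [[0.0] * 5 for _ in range(5)]   # static reconstruction / frame 0
observed = [[0.0] * 5 for _ in range(5)]     # real image / frame 1
observed[0][0] = 1.0                         # isolated pixel, no spatial support
for i in range(2, 5):
    for j in range(2, 5):
        observed[i][j] = 1.0                 # contiguous dynamic region

m_res = residual_static_mask(background, observed, eps=0.1)
m_diff = diffusion_static_mask(m_res, tau=0.5)
m_time = temporal_static_mask([background, observed], var_thresh=0.01)
m_dyn = dynamic_mask(m_res, m_diff, m_time)
```

In this toy case the isolated pixel is recovered as static by the diffusion mask, while the interior of the contiguous block stays dynamic. The boundary of the block is partly eroded by the box filter, which is a property of this simplified sketch.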
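[0044] To further amplify the difference between the residuals of the dynamic and static regions, only the static-region residuals are accumulated, $\mathcal{L}_{\mathcal{M}} = \sum_{(i,j)} M^{s}_{v,t}(i,j)\, r_{v,t}(i,j)$, where $M^{s}_{v,t}$ is the fine-grained static mask and $r_{v,t}$ is the pixel-wise residual, to train and optimize the 4D mask field $\mathcal{M}$.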
[0045] As shown in Figure 3, the 4D mask field $\mathcal{M}$ integrates information from all viewpoints and time steps during mask generation, ensuring multi-view consistency within the same time step and change over time within the same viewpoint. This provides a fine-grained and accurate mask prior, which is a core improvement over existing methods. With this precise dynamic mask prior, the subsequent dynamic/static labeling of the Gaussian ellipsoids is more reliable, giving better dynamic/static decomposition and better reconstruction of dynamic scenes. In addition, the 4D mask field $\mathcal{M}$ effectively suppresses the over-segmentation of dynamic regions that occurs when moving objects occlude static regions, reducing the number of dynamic Gaussian ellipsoids and model parameters and speeding up training and rendering.
[0046] S03: Construct a discontinuous dynamic/static label field based on spiking neurons, using the dynamic/static masks as priors to map the continuous dynamic/static attribute of each 3D Gaussian ellipsoid to a discrete binary dynamic/static label.
[0047] In this embodiment, the spiking neuron adopts an integrate-and-fire (IF) model with a time step of 1, and the mapping satisfies $l = \Theta(d - V_{th})$, where $\Theta(\cdot)$ is the step function, $V_{th}$ is the membrane potential threshold of the spiking neuron, $d$ is the continuous dynamic/static attribute of the Gaussian ellipsoid, and $l \in \{0, 1\}$ is the binary dynamic/static label (1 represents dynamic, 0 represents static). When $d$ exceeds the membrane potential threshold, the label of the Gaussian ellipsoid is output as 1 and the ellipsoid is set as a dynamic Gaussian; otherwise, it is set as a static Gaussian.
[0048] In this embodiment, to avoid the gradient truncation problem caused by the step function, the arctangent function (ATan) is used as the surrogate gradient function. From $d$ to $l$, the step function is used during forward propagation, while from $l$ to $d$ during backpropagation the gradient of the ATan function is treated as the gradient of the step function, thus achieving differentiable training. That is, the backpropagated gradient of $l_i$ with respect to $d_i$ satisfies $\frac{\partial l_i}{\partial d_i} = \frac{\alpha}{2} \cdot \frac{1}{1 + \left(\frac{\pi}{2}\alpha\,(d_i - V_{th})\right)^2}$, where $\alpha$ controls the shape of the surrogate gradient function, and $l_i$ and $d_i$ are the binary label and the continuous dynamic/static attribute of the $i$-th Gaussian ellipsoid.
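The forward step function and the arctangent surrogate gradient can be illustrated with a small scalar sketch. It assumes the widely used ATan surrogate $g(x) = \frac{1}{\pi}\arctan\left(\frac{\pi}{2}\alpha x\right) + \frac{1}{2}$; the threshold `V_TH`, the shape parameter `ALPHA`, the learning rate, and the squared-error toy loss in `update_attribute` are hypothetical choices for illustration, not values taken from the embodiment.

```python
import math

V_TH = 0.5   # assumed membrane potential threshold (hypothetical value)
ALPHA = 2.0  # assumed surrogate shape hyperparameter (hypothetical value)


def spike_label(d, v_th=V_TH):
    """Forward pass: step function of (d - v_th); 1 = dynamic, 0 = static."""
    return 1 if d > v_th else 0


def atan_surrogate(x, alpha=ALPHA):
    """Smooth stand-in for the step function, used only to define a gradient."""
    return math.atan(math.pi / 2.0 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x, alpha=ALPHA):
    """Derivative of atan_surrogate; replaces the step function's gradient."""
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)


def update_attribute(d, target_label, lr=0.1):
    """One gradient step on the toy loss (l - target)^2 with respect to d."""
    label = spike_label(d)                       # discrete label in the forward pass
    dloss_dlabel = 2.0 * (label - target_label)  # gradient of the toy loss
    dlabel_dd = atan_surrogate_grad(d - V_TH)    # surrogate gradient in the backward pass
    return d - lr * dloss_dlabel * dlabel_dd


# A Gaussian that starts static but whose mask prior says "dynamic".
d = 0.3
for _ in range(5):
    d = update_attribute(d, target_label=1)
final_label = spike_label(d)
```

Because the forward pass already produces a binary label, no threshold-based post-processing is needed after training; the surrogate only affects how the continuous attribute `d` is updated.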
[0049] As shown in Figure 4, for a Gaussian ellipsoid set whose dynamic/static label attributes have not yet been optimized, a discontinuous dynamic/static label field is trained directly based on spiking neurons, and the optimized labels are used to partition the Gaussian ellipsoids into dynamic and static sets and process them differently. Existing methods use activation functions to push continuous dynamic/static attributes toward discrete binary states during optimization, and then discretize them in post-processing with hyperparameters such as thresholds to obtain the final labels. In contrast, the spiking neurons used in this invention omit the post-processing step, reduce the dependence on hyperparameters, and avoid the accuracy loss caused by post-processing.
[0050] S04: Based on the dynamic/static labels of the Gaussian ellipsoids, the two types of Gaussians are processed separately. The dynamic Gaussians learn time-varying geometric attributes, while the geometric attributes of the static Gaussians remain unchanged over time.
[0051] In this embodiment, the time-varying deformation field $\mathcal{D}$ models the temporal information of a dynamic Gaussian with a 4D multi-resolution hash encoder, and a multi-head multilayer perceptron then decodes, for a given time step $t$, the change $\Delta G_t$ of the dynamic Gaussian's geometric attributes relative to the Gaussian attributes $G$ in standard (canonical) space: $\Delta G_t = (\Delta\mu_t, \Delta R_t, \Delta S_t) = \mathcal{D}(\mu, t)$. This process updates only the spatial position, rotation, and scaling attributes, while the opacity, color, and dynamic/static attribute remain unchanged. Applying the change to the standard-space Gaussian attributes gives the geometric attributes of the dynamic Gaussian at any time step $t$: $G_t = G + \Delta G_t$. For a static Gaussian, the geometric attributes always equal those of the Gaussian in standard space and do not change over time: $G_t = G$.
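The label-dependent processing can be sketched as follows. Everything named here is hypothetical: `toy_deformation` is a hand-written stand-in for the hash encoder and multi-head perceptron, the dictionary layout of a Gaussian is invented for the example, and applying the change as an additive offset is an assumption.

```python
def toy_deformation(position, t):
    """Hypothetical stand-in for the learned deformation field.

    Returns offsets for position, rotation (quaternion), and scale only.
    """
    return {
        "position": (0.1 * t, 0.0, 0.0),   # drift along x over time
        "rotation": (0.0, 0.0, 0.0, 0.0),
        "scale": (0.0, 0.0, 0.0),
    }


def attributes_at_time(gaussian, t, deform=toy_deformation):
    """Return the Gaussian's attributes at time step t.

    Static Gaussians (label 0) keep their standard-space attributes.
    Dynamic Gaussians (label 1) receive additive offsets (assumed form) on
    position, rotation, and scale; opacity, color, and label are untouched.
    """
    out = dict(gaussian)
    if gaussian["label"] == 1:
        delta = deform(gaussian["position"], t)
        for key in ("position", "rotation", "scale"):
            out[key] = tuple(a + b for a, b in zip(gaussian[key], delta[key]))
    return out


static_g = {"position": (0.0, 0.0, 0.0), "rotation": (1.0, 0.0, 0.0, 0.0),
            "scale": (1.0, 1.0, 1.0), "opacity": 0.8, "label": 0}
dynamic_g = {"position": (1.0, 0.0, 0.0), "rotation": (1.0, 0.0, 0.0, 0.0),
             "scale": (1.0, 1.0, 1.0), "opacity": 0.6, "label": 1}

static_at_3 = attributes_at_time(static_g, t=3)
dynamic_at_3 = attributes_at_time(dynamic_g, t=3)
```

Only the dynamic Gaussians query the deformation field, so the per-frame cost grows with the number of Gaussians labeled dynamic rather than with the size of the whole scene.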
[0052] S05: Fuse the dynamic Gaussian ellipsoids after differentiated processing with the static Gaussian ellipsoids, and use the 3D Gaussian Splatting algorithm to synthesize a novel view of the dynamic scene, outputting the rendered scene image for the target viewpoint.
[0053] In this embodiment, for a given time step $t$, the Gaussian attributes of the dynamic and static Gaussian ellipsoids are obtained, and rendering is performed with the Gaussian Splatting algorithm. 3D Gaussian Splatting (3DGS) rendering is achieved through differentiable rendering, computing the color $C$ pixel by pixel to obtain the rendered image $\hat{I}$: $C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$, where $N$ is the set of Gaussian ellipsoids that contribute to the target pixel, $\alpha_i$ is the opacity of the $i$-th Gaussian ellipsoid after 2D projection, and $c_i$ is its color.
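The per-pixel color accumulation can be written as a short front-to-back loop. This is a sketch of standard alpha compositing as used in 3D Gaussian Splatting; it assumes the contributing Gaussians are already sorted by depth and projected to 2D, and the function name `composite_pixel` is illustrative.

```python
def composite_pixel(colors, alphas):
    """Front-to-back compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).

    `colors` is a list of RGB tuples and `alphas` the matching projected
    opacities, both ordered from nearest to farthest Gaussian.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel


# A half-transparent red Gaussian in front of a half-transparent green one.
pixel = composite_pixel([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5])
# pixel == [0.5, 0.25, 0.0]
```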
[0054] S06: Based on the performance feedback of the novel view synthesis task, optimize the geometric and dynamic/static attributes of the 3D Gaussian ellipsoids by backpropagation, and iteratively improve the scene reconstruction.
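[0055] In this embodiment, the geometric attributes of the Gaussian ellipsoids are optimized through the photometric loss $\mathcal{L}_{photo} = (1 - \lambda)\,\mathcal{L}_{1} + \lambda\,\mathcal{L}_{D\text{-}SSIM}$, where $\mathcal{L}_{1}$ and $\mathcal{L}_{D\text{-}SSIM}$ compare the photometric differences between the rendered and real images and drive the optimization of the geometric attributes, and the hyperparameter $\lambda$ balances the proportions of $\mathcal{L}_{1}$ and $\mathcal{L}_{D\text{-}SSIM}$.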
[0056] In this embodiment, the dynamic/static attributes of the Gaussians are optimized using the mask loss $\mathcal{L}_{mask} = \lVert \hat{D}_{v,t} - M_{v,t} \rVert_{1}$, where $M_{v,t}$ is the dynamic region mask prior obtained from the 4D mask field $\mathcal{M}$, and $\hat{D}_{v,t}$ is a dynamic map obtained by rendering the dynamic/static label attributes of the Gaussian ellipsoids, representing the probability that each pixel belongs to a dynamic region: $\hat{D} = \sum_{i \in N} l_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$. Making $\hat{D}_{v,t}$ and $M_{v,t}$ as close as possible directly optimizes the dynamic/static label attributes of the Gaussian ellipsoids.
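A small sketch of the dynamic map and mask loss is given below. It assumes the dynamic map is rendered with the same compositing weights as color, with each Gaussian's binary label taking the place of its color, and it assumes a mean absolute (L1) difference as the loss form; the text does not confirm the exact loss, and the function names are hypothetical.

```python
def render_dynamic_value(labels, alphas):
    """Composite binary labels front to back: D = sum_i l_i * a_i * prod_{j<i} (1 - a_j)."""
    value = 0.0
    transmittance = 1.0
    for label, alpha in zip(labels, alphas):
        value += label * alpha * transmittance
        transmittance *= 1.0 - alpha
    return value


def mask_loss(dynamic_map, mask_prior):
    """Mean absolute difference between rendered map and mask prior (assumed L1 form)."""
    total, count = 0.0, 0
    for row_d, row_m in zip(dynamic_map, mask_prior):
        for d, m in zip(row_d, row_m):
            total += abs(d - m)
            count += 1
    return total / count


# Two pixels, each covered by two Gaussians with projected opacity 0.5.
# Pixel 0: the front Gaussian is dynamic. Pixel 1: the back Gaussian is dynamic.
dynamic_map = [[render_dynamic_value([1, 0], [0.5, 0.5]),
                render_dynamic_value([0, 1], [0.5, 0.5])]]
prior = [[1, 0]]  # the mask prior says pixel 0 is dynamic and pixel 1 is static
loss = mask_loss(dynamic_map, prior)
```

In this toy case the loss is reduced when the Gaussian behind pixel 1 is relabeled as static, which is the kind of signal that the surrogate gradient passes back to the continuous attribute.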
[0057] As shown in Figure 2, the overall training process is divided into a coarse training phase and a fine training phase. In the coarse training phase, all Gaussian ellipsoids are treated as static Gaussians, static reconstruction is performed on the first video frame from each viewpoint, and the standard-space (canonical) attributes of the Gaussians are trained and optimized. The fine training phase first uses the 4D mask field $\mathcal{M}$ and the mask loss $\mathcal{L}_{mask}$ to guide the training and optimization of the discrete dynamic/static label field, then processes the dynamic and static Gaussians separately according to their labels. A novel view image is obtained through unified rendering, and the photometric loss $\mathcal{L}_{photo}$ is then used to optimize the geometric attributes of the Gaussians, including the standard-space geometric attributes of all Gaussians and the time-varying deformation field $\mathcal{D}$ for the dynamic Gaussians.
[0058] As shown in Figure 5, the present invention improves the performance of novel view synthesis in dynamic scenes, especially in the side-view setting, as well as the reconstruction of fine-grained motion and boundary regions.
[0059] The embodiments described above provide a detailed explanation of the technical solutions and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the present invention. Any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for novel view synthesis of dynamic scenes based on spiking neurons, characterized by comprising the following steps:
(1) Acquire video frame data of a dynamic scene from multiple viewpoints and time steps, generate a sparse 3D point cloud with the structure-from-motion (SfM) algorithm, and expand the points into a set of 3D Gaussian ellipsoids; each 3D Gaussian ellipsoid contains geometric attributes and a dynamic/static attribute, and the geometric attributes include spatial position, rotation matrix, scaling matrix, opacity, and color;
(2) Construct a spatiotemporal fine-grained mask field to generate dynamic/static masks for different viewpoints and time steps, providing prior supervision for the dynamic/static label allocation of the 3D Gaussian ellipsoids;
(3) Construct a discontinuous dynamic/static label field based on spiking neurons; with the dynamic/static masks as priors, map the continuous dynamic/static attribute of each 3D Gaussian ellipsoid to a discrete binary dynamic/static label;
(4) Process the 3D Gaussian ellipsoids differently according to their dynamic/static labels: introduce a time-varying deformation field for the dynamic Gaussian ellipsoids to characterize how their attributes change over time, and keep the attributes of the static Gaussian ellipsoids unchanged over time;
(5) Fuse the dynamic Gaussian ellipsoids after this processing with the static Gaussian ellipsoids, complete the novel view synthesis of the dynamic scene with the 3D Gaussian Splatting algorithm, and output the rendered scene image for the target viewpoint;
(6) Based on the task performance feedback of the novel view synthesis, optimize the geometric and dynamic/static attributes of the 3D Gaussian ellipsoids by backpropagation to iteratively improve the scene reconstruction.
2. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (1), the geometric attributes of a 3D Gaussian ellipsoid are represented as $G = \{\mu, R, S, \sigma, c\}$, where $\mu$ denotes the spatial position, $R$ the rotation matrix, $S$ the scaling matrix, $\sigma$ the opacity of the ellipsoid, and $c$ the color of the ellipsoid, represented by spherical harmonic functions.
3. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (2), the spatiotemporal fine-grained mask field is constructed as a 4D mask field; given a viewpoint and a time step, a 2D dynamic/static mask is obtained by querying the trained 4D mask field; in this mask, pixels with value 0 correspond to static regions and pixels with value 1 correspond to dynamic regions.
4. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 3, characterized in that the 4D mask field obtains a fine-grained static mask by fusing a coarse-grained static mask, a diffusion mask, and a temporal mask, and obtains the dynamic mask by inverting the fine-grained static mask.
5. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (3), the discontinuous dynamic/static label field is constructed based on spiking neurons as follows: a spiking neuron maps the continuous dynamic/static attribute $d$ of a Gaussian ellipsoid to a binary dynamic/static label $l$; the spiking neuron adopts an integrate-and-fire (IF) model with the number of time steps set to 1, and the mapping satisfies $l = \Theta(d - V_{th})$, where $\Theta(\cdot)$ denotes the step function, $V_{th}$ is the membrane potential threshold of the spiking neuron, $d$ is the continuous dynamic/static attribute of the Gaussian ellipsoid, and $l$ is its binary dynamic/static label, with 1 denoting dynamic and 0 denoting static; if the potential exceeds the membrane potential threshold, the label of the Gaussian ellipsoid is output as 1 and the ellipsoid is set as a dynamic Gaussian; otherwise, the ellipsoid is set as a static Gaussian.
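The single-step integrate-and-fire mapping of claim 5 can be sketched as below. This is a minimal illustration, not the applicant's implementation: the function name `if_neuron_labels`, the default threshold of 0.5, the resting potential of 0, and the strict comparison at exactly the threshold are all assumptions made for the example.

```python
from typing import List


def if_neuron_labels(attributes: List[float], v_th: float = 0.5) -> List[int]:
    """Map continuous dynamic/static attributes to binary labels (1 = dynamic).

    With a single time step, the neuron integrates its input once, so the
    membrane potential equals the input attribute (resting potential assumed 0).
    """
    labels = []
    for d in attributes:
        membrane_potential = 0.0 + d  # one integration step (assumption: starts at 0)
        # Fire (label 1) only if the potential exceeds the threshold.
        labels.append(1 if membrane_potential > v_th else 0)
    return labels


# Example: three ellipsoids, the last two are labeled dynamic.
print(if_neuron_labels([0.1, 0.9, 0.6], v_th=0.5))
```

Because the output is a hard 0/1 decision, its true derivative is zero almost everywhere, which is why training needs a substitute gradient.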
6. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 5, characterized in that differentiable training is achieved by using the arctangent function as the surrogate gradient function for the step function, such that, during backpropagation, the gradient of the binary label of the $i$-th Gaussian ellipsoid with respect to its continuous dynamic/static attribute is computed from the derivative of the arctangent surrogate, which is parameterized by a hyperparameter.
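One commonly used arctangent surrogate is sketched below to illustrate the idea in claim 6. The exact expression in the claim is not reproduced here, so the specific form, the hyperparameter name `alpha`, its default value, and the function names are assumptions: the forward pass would still use the hard step, and only the backward pass would use `atan_surrogate_grad`, evaluated at the attribute minus the threshold.

```python
import math


def atan_surrogate(x: float, alpha: float = 2.0) -> float:
    """Smooth stand-in for the step function: (1/pi) * atan(pi/2 * alpha * x) + 1/2."""
    return math.atan(math.pi / 2 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x: float, alpha: float = 2.0) -> float:
    """Derivative of atan_surrogate, used in place of the step function's gradient."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)


# The gradient peaks at x = 0 (attribute equal to the threshold) and decays away from it.
for x in (-1.0, 0.0, 1.0):
    print(x, atan_surrogate_grad(x))
```

A larger `alpha` concentrates the gradient near the threshold, which makes the surrogate closer to the true step at the cost of smaller gradients for attributes far from the threshold.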
7. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (4), for the dynamic Gaussian ellipsoids, the temporal deformation field models the temporal information of the dynamic Gaussians with a 4D multi-resolution hash encoder, and a multi-head multilayer perceptron then decodes, for a given time step, the change in the geometric attributes of each dynamic Gaussian relative to its Gaussian attributes in canonical space; this change updates only the spatial position, rotation matrix, and scaling matrix, while the opacity, color, and dynamic/static attribute remain unchanged; applying the change to the Gaussian attributes in canonical space yields the geometric attributes of the dynamic Gaussian at any time step; for a static Gaussian, the geometric attributes always equal those of the Gaussian in canonical space and do not change over time.
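The differentiated handling in claim 7 can be sketched as follows. This is a toy illustration under stated assumptions: the `Gaussian` dataclass, the `gaussian_at_time` helper, and `toy_deform` are hypothetical names; the deformation is assumed to be applied additively; rotation is stored as a flat tuple for brevity rather than as a matrix; and `toy_deform` is a hand-written stand-in for the hash encoder and multi-head MLP, which are not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

Vec = Tuple[float, ...]
Delta = Tuple[Vec, Vec, Vec]  # (position change, rotation change, scale change)


@dataclass(frozen=True)
class Gaussian:
    position: Vec
    rotation: Vec   # simplification: flat tuple instead of a rotation matrix
    scale: Vec
    opacity: float
    color: Vec
    is_dynamic: int  # binary dynamic/static label (1 = dynamic)


def _add(a: Vec, b: Vec) -> Vec:
    return tuple(x + y for x, y in zip(a, b))


def gaussian_at_time(g: Gaussian, t: float,
                     deform: Callable[[Gaussian, float], Delta]) -> Gaussian:
    """Return the Gaussian at time t; static Gaussians stay at their canonical state."""
    if not g.is_dynamic:
        return g
    d_pos, d_rot, d_scale = deform(g, t)
    # Only position, rotation, and scale change; opacity, color, and label are kept.
    return replace(g,
                   position=_add(g.position, d_pos),
                   rotation=_add(g.rotation, d_rot),
                   scale=_add(g.scale, d_scale))


def toy_deform(g: Gaussian, t: float) -> Delta:
    """Hypothetical deformation: drift along x proportional to time."""
    return (t, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
```

Skipping the deformation query for static Gaussians is what saves computation at render time, since only the dynamic subset needs a per-time-step evaluation.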
8. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (5), the 3D Gaussian splatting algorithm is implemented through differentiable rendering, computing the color $C$ pixel by pixel to obtain the rendered image: $C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$, where $N$ is the set of Gaussian ellipsoids that contribute to the target pixel, $\alpha_i$ is the opacity of the $i$-th Gaussian ellipsoid after 2D projection, and $c_i$ is the color feature of the $i$-th Gaussian ellipsoid.
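The per-pixel color computation in claim 8 can be illustrated with the front-to-back alpha blending used in standard 3D Gaussian splatting: each contributing ellipsoid adds its color weighted by its projected opacity and by the transmittance left over from the ellipsoids in front of it. The sketch below assumes the contributing ellipsoids are already sorted from nearest to farthest; the function name `composite_pixel` and the RGB tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def composite_pixel(colors: Sequence[Tuple[float, float, float]],
                    alphas: Sequence[float]) -> Tuple[float, float, float]:
    """Blend depth-sorted (near to far) Gaussian contributions into one pixel color."""
    transmittance = 1.0          # fraction of light not yet absorbed
    out: List[float] = [0.0, 0.0, 0.0]
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        out = [o + weight * ch for o, ch in zip(out, color)]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


# A half-opaque red Gaussian in front of a half-opaque green one.
print(composite_pixel([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5]))
```

Every operation here is a sum or product of the inputs, so gradients of the pixel color flow back to each ellipsoid's color and opacity, which is what makes the renderer usable inside the optimization loop.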
9. The method for novel view synthesis of dynamic scenes based on spiking neurons according to claim 1, characterized in that, in step (6), the geometric attributes of the Gaussian ellipsoids are optimized through a photometric loss, and the dynamic/static attributes of the Gaussian ellipsoids are optimized through a mask loss.
10. A system for novel view synthesis of dynamic scenes based on spiking neurons, characterized in that it comprises:
a Gaussian ellipsoid initialization module, used to initialize a set of 3D Gaussian ellipsoids;
a 4D mask field module, used to generate dynamic/static masks for each viewpoint and time step and to provide prior supervision for dynamic/static label assignment;
a discrete dynamic/static label module, which uses spiking neurons to map the continuous dynamic/static attributes of the Gaussian ellipsoids to binary dynamic/static labels, achieving accurate dynamic/static decomposition;
a novel view synthesis module, used to fuse the static Gaussian ellipsoids with the time-varying dynamic Gaussian ellipsoids and to generate a rendered image of the target viewpoint through the 3D Gaussian splatting rendering algorithm;
a joint optimization module, used to compute the loss from the rendering results and to optimize the geometric attributes and dynamic/static attributes of the Gaussian ellipsoids through backpropagation, improving the quality of novel view synthesis.