Curve model-based rendering method and device for performing same

The curve model-based rendering method addresses the inefficiencies of existing 3D Gaussian Splatting by modeling object movement in 3D space, enhancing accuracy and reducing computational burden in dynamic scene rendering.

WO2026127513A1 · PCT designated stage · Publication Date: 2026-06-18 · KOREA ADVANCED INST OF SCI & TECH

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
KOREA ADVANCED INST OF SCI & TECH
Filing Date
2025-12-04
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing 3D Gaussian Splatting techniques for rendering dynamic scenes are time-consuming and lack accuracy, especially when dealing with moving objects.

Method used

A curve model-based rendering method that includes learning a continuous function to model changes in position, size, and rotation of Gaussians over time in 3D space, using a spline or Bezier curve model for Gaussian splatting, and updating camera poses based on photometric and geometric consistency.

Benefits of technology

The method efficiently renders dynamic scenes with improved accuracy and reduced computational cost, allowing for natural representation of moving objects without excessive complexity.


Figure KR2025020758_18062026_PF_FP_ABST

Abstract

Disclosed are a rendering method and device. A method for rendering an image corresponding to a target viewpoint according to an embodiment comprises the operations of: acquiring a curve model trained to model a change in scene over time in a three-dimensional space; acquiring a Gaussian set representing a scene corresponding to the target viewpoint by inputting time information corresponding to the target viewpoint to the trained curve model; and rendering the image on the basis of a camera pose corresponding to the target viewpoint and the Gaussian set.

Description

Curve model-based rendering method and device for performing the same

[0001] The following disclosure relates to a curve model-based rendering method and an apparatus for performing the same.

[0002] 3D Gaussian Splatting is a technique for rendering a scene by projecting Gaussians (sets of points distributed in 3D space in the form of a probability distribution) onto the screen. Rendering a scene containing dynamic objects that move over time can be time-consuming and costly, and depending on the algorithm used, the accuracy can be very low.

[0003] The background technology described above is possessed or acquired by the inventor in the process of deriving the content of the disclosure of the present application, and cannot necessarily be considered as prior art disclosed to the general public prior to the filing of this application.

[0004] The present invention was developed with support from the Ministry of Science and ICT (Project No.: RS-2022-00144444, Project Name: Information and Communication / Broadcasting Technology Development Project, Research Project Name: Research on Learning and Rendering Spatial Image Representation of Static and Dynamic Scenes Based on Deep Learning, Lead Institution: Korea Advanced Institute of Science and Technology, Research Management Agency: Korea Institute of Information and Communication Technology Planning and Evaluation).

[0005] A method for rendering an image corresponding to a target view according to one embodiment may include the operation of acquiring a curve model learned to model changes in a scene over time in a three-dimensional space, the operation of acquiring a Gaussian set representing a scene corresponding to the target view by inputting time information corresponding to the target view into the learned curve model, and the operation of rendering the image based on a camera pose corresponding to the target view and the Gaussian set.

[0006] According to one embodiment, the Gaussian set may include a plurality of three-dimensional Gaussians that constitute an image frame corresponding to the target view.

[0007] According to one embodiment, the learned curve model may be a continuous function in which the change in position, size, and rotation of a Gaussian over time in three-dimensional space is modeled.

[0008] A method according to one embodiment may include: estimating a camera pose corresponding to a camera that captured an image frame based on the image frame and depth information of the image frame; modeling a curve model representing a change over time in a 3D space of a Gaussian set corresponding to the image frame based on the image frame and the camera pose; rendering an image corresponding to a target view using the curve model; and updating the parameters of the curve model and the camera pose using a training dataset including the rendered image and ground truth data.

[0009] According to one embodiment, the operation of estimating the camera pose may include the operation of acquiring a plurality of image frames captured using the camera, and the operation of updating the camera pose corresponding to the image frames based on photometric consistency and geometric consistency in three-dimensional space of a scene commonly included in the plurality of image frames.

[0010] According to one embodiment, the operation of modeling the curve model may include the operation of modeling a parameter representing a position change in three-dimensional space of the Gaussian set based on the camera pose and the depth information.

[0011] According to one embodiment, the operation of modeling the curve model may further include the operation of initializing the parameters of the curve model based on the image frame and the depth information of the image frame.

[0012] According to one embodiment, the operation of modeling the curve model may include the operation of modeling the change in position, size, and rotation of a Gaussian set corresponding to the image frame over time as a continuous function.

[0013] According to one embodiment, the operation of updating the camera pose may include updating the camera pose corresponding to the image frame based on depth information of the rendered image.

[0014] According to one embodiment, the curve model may include at least one of a spline curve model and a Bezier curve model for Gaussian splatting.

[0015] According to one embodiment, a computer-readable recording medium storing one or more computer programs may include instructions for performing the method in a processor.

[0016] An apparatus according to one embodiment may include at least one processor comprising a processing circuit and a memory for storing instructions. When the instructions are executed individually or collectively by the at least one processor, the apparatus may be able to perform the method.

[0017] However, technical challenges are not limited to the technical challenges described above, and other technical challenges may exist.

[0018] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.

[0019] FIG. 1 is a drawing for explaining an image rendering device according to one embodiment.

[0020] FIG. 2 is a diagram illustrating the learning process of a model according to one embodiment.

[0021] FIG. 3 is a diagram illustrating a learning process according to one embodiment.

[0022] FIG. 4 is a diagram illustrating an operation for estimating a camera pose according to one embodiment.

[0023] FIG. 5 is a drawing for explaining a curve model according to one embodiment.

[0024] FIG. 6 is a diagram illustrating the operation of updating a camera pose based on a rendered image according to one embodiment.

[0025] FIGS. 7A and 7B are drawings illustrating rendered images according to one embodiment.

[0026] FIG. 8 is a flowchart illustrating a rendering method according to one embodiment.

[0027] FIG. 9 is a flowchart illustrating a learning method according to one embodiment.

[0028] FIG. 10 is a schematic block diagram of an electronic device according to one embodiment.

[0029] Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions included in the technical concept described by the embodiments.

[0030] Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely for the purpose of distinguishing one component from another. For example, the first component may be named the second component, and similarly, the second component may be named the first component.

[0031] When it is stated that a component is "connected" to another component, it should be understood that it may be directly connected to or joined to that other component, or that there may be other components in between.

[0032] Singular expressions include plural expressions unless the context clearly indicates otherwise. In this document, phrases such as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B or C,” “at least one of A, B and C,” and “at least one of A, B, or C” may each include any one of the items listed together with the corresponding phrase, or all possible combinations thereof. In this specification, terms such as “comprising” or “having” are intended to designate the existence of the described feature, number, step, action, component, part, or combination thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0033] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this specification.

[0034] As used herein, the term "module" may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

[0035] As used in this document, the term "part" refers to software or hardware components, such as FPGAs or ASICs, and the "part" performs certain roles. However, the meaning of "part" is not limited to software or hardware. The "part" may be configured to reside in an addressable storage medium or configured to operate one or more processors. For example, the "part" may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and "parts" may be combined into a smaller number of components and "parts" or further separated into additional components and "parts." Furthermore, the components and "parts" may be implemented to operate one or more CPUs within a device or secure multimedia card. Additionally, a "part" may include one or more processors.

[0036] Hereinafter, embodiments will be described in detail with reference to the attached drawings. In the description with reference to the attached drawings, identical components are given the same reference numeral regardless of the drawing number, and redundant descriptions thereof will be omitted.

[0037]

[0038] FIG. 1 is a drawing for explaining an image rendering device according to one embodiment.

[0039] Referring to FIG. 1, according to one embodiment, an image rendering device (100) may include a device for rendering an image corresponding to a target view. The target view may include information about the position and direction from which the scene to be rendered is viewed. For example, the target view may include information about a viewpoint specified by a user in three-dimensional space, even if no image was actually captured from that viewpoint. The target view may include information about a camera pose in space. The camera pose (e.g., camera pose (101)) may include information about the position, direction, and angle of the camera that captures the scene to be rendered at the target view.

[0040] According to one embodiment, the image rendering device (100) can output an image (190) corresponding to a target view using a camera pose (101) and time information (103). The time information (103) is time information corresponding to the target view and may be information indicating the time of the scene to be rendered. The image rendering device (100) can receive the camera pose (101) and the time information (103) and render an image viewed from a specific viewpoint at a specific time.

[0041] According to one embodiment, the image rendering device (100) may acquire a curve model trained to model changes in a scene over time in three-dimensional space. The trained curve model may be a continuous function modeling changes in position, size, and rotation of a Gaussian over time in three-dimensional space. The trained curve model is trained to render images of a dynamic three-dimensional space and may be used to predict the position over time of a moving object within a three-dimensional scene. The curve model may include at least one of a spline curve model and a Bezier curve model for Gaussian splatting. For example, the curve model may be a model that predicts the position of an object using a cubic Hermite spline curve. A cubic Hermite spline curve includes N_c control points and can be represented as a curve by interpolating between two adjacent control points.

[0042] According to one embodiment, the image rendering device (100) can input the time information (103) corresponding to the target view into the trained curve model to obtain a Gaussian set representing the scene corresponding to the target view. The Gaussian set may include a plurality of 3D Gaussians that constitute an image frame corresponding to the target view. That is, the image rendering device (100) can input the time information (103) into the trained curve model to obtain the plurality of 3D Gaussians for that time. The image rendering device (100) can then render an image based on the camera pose (101) corresponding to the target view and the Gaussian set. Based on the camera pose (101), the image rendering device (100) can render the image viewed from the target view (e.g., the image (190) corresponding to the target view) using the obtained Gaussian set.
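The flow described in paragraph [0042] can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the `Gaussian3D` type, the toy `linear_curve_model` (a straight-line stand-in for a trained spline or Bezier curve model), and the pinhole helper `project_mean` are hypothetical names introduced here. The camera is assumed to be axis-aligned so that only its position matters, and the splatting step itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Gaussian3D:
    mean: Vec3  # center of the Gaussian in world coordinates

# Hypothetical interface: a curve model maps a time t in [0, 1]
# to the Gaussian set that represents the scene at that time.
CurveModel = Callable[[float], List[Gaussian3D]]

def linear_curve_model(start: Vec3, end: Vec3) -> CurveModel:
    """Toy stand-in for a trained curve model: one Gaussian moving on a line."""
    def at(t: float) -> List[Gaussian3D]:
        mean = tuple(s + t * (e - s) for s, e in zip(start, end))
        return [Gaussian3D(mean)]
    return at

def project_mean(mean, cam_pos, focal, cx, cy):
    """Pinhole projection for an axis-aligned camera located at cam_pos."""
    x, y, z = (m - c for m, c in zip(mean, cam_pos))
    return (focal * x / z + cx, focal * y / z + cy)

def render_centers(model, t, cam_pos, focal=100.0, cx=50.0, cy=50.0):
    """Query the curve model at time t, then project each Gaussian center."""
    gaussians = model(t)
    return [project_mean(g.mean, cam_pos, focal, cx, cy) for g in gaussians]

model = linear_curve_model((0.0, 0.0, 2.0), (1.0, 0.0, 2.0))
pixels = render_centers(model, t=0.5, cam_pos=(0.0, 0.0, 0.0))
```

The point of the sketch is the order of operations: the time information selects the Gaussian set, and only then does the camera pose determine where those Gaussians land in the image.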

[0043]

[0044] FIG. 2 is a diagram illustrating the learning process of a model according to one embodiment.

[0045] Referring to FIG. 2, according to one embodiment, a curve model can be learned through a learning process (e.g., steps (210–280)). The curve model can be learned by a learning device (not shown) (e.g., the electronic device (1000) of FIG. 10). The curve model may include at least one of a spline curve model and a Bezier curve model for Gaussian splatting.

[0046] According to one embodiment, in the camera pose estimation step (210), the learning device can estimate a camera pose corresponding to the camera that captured the image frame based on the image frame (201) and the depth information (202) of the image frame. The estimated camera pose may include, for example, the position, direction, and angle of the camera that captured the image frame. The learning device can acquire a plurality of image frames captured using a camera. The learning device can update the camera pose corresponding to the image frame based on photometric consistency and geometric consistency in the three-dimensional space of the scene commonly included in the plurality of image frames. The learning device can predict the camera pose corresponding to the image frame using a multilayer perceptron (MLP) network. The step of estimating the camera pose prior to the learning of the curve model will be explained in detail later with reference to FIG. 4.

[0047] According to one embodiment, in the curve model modeling step (230), the learning device may model a curve model representing a change over time in a three-dimensional space of a Gaussian set corresponding to an image frame (201) based on an image frame (201) and a camera pose. The learning device may model a parameter representing a change in position in a three-dimensional space of a Gaussian set based on the camera pose and depth information (202).

[0048] According to one embodiment, in the parameter initialization step (250), the learning device can initialize the parameters of the curve model based on the image frame (201) and the depth information (202) of the image frame.

[0049] According to one embodiment, in the parameter update step (270), the learning device can model the change in position, size, and rotation over time of a Gaussian set corresponding to an image frame (201) as a continuous function. The learning device can render an image corresponding to a target view using the curve model.

[0050] According to one embodiment, in the camera pose update step (280), the learning device may update the parameters of the curve model and the camera pose using a training dataset including the rendered images and ground truth data. The learning device may update the camera pose corresponding to the image frame (201) based on depth information of the rendered images. The process by which the learning device models the curve model, initializes its parameters, and updates them will be described in detail later with reference to FIGS. 5 and 6.

[0051]

[0052] FIG. 3 is a diagram illustrating a learning process according to one embodiment.

[0053] Referring to FIG. 3, according to one embodiment, an image rendering device (e.g., the image rendering device (100) of FIG. 1) can render an image using a learned curve model. The curve model may be learned to model changes in a scene over time in three-dimensional space. A learning device (not shown) may model and train the curve model. The learning device (not shown) may be the same device as the image rendering device (100) or may be implemented as a different device, but is not limited thereto.

[0054] According to one embodiment, a process for creating (e.g., modeling) and training a model that renders an image corresponding to a target view (hereinafter, training process) may include a warm-up stage and a main training stage. The training process may be performed through one or more components (310 to 350). The components (310 to 350) may be software modules implemented on a processor of a training device (not shown) (e.g., the processor (1010) of FIG. 10). The components (310 to 350) are illustrated as an example to explain the modeling and/or training process of a curve model, and may include various variations as long as the operations of the training device (not shown) and/or image rendering device (100) described in this disclosure can be implemented. For example, two or more components may be combined, or one or more components may be added or omitted.

[0055] According to one embodiment, a learning device (not shown) can estimate the camera pose corresponding to the camera that captured an image frame (e.g., I_t) based on the image frame and the depth information of the image frame. The depth information may include three-dimensional information for determining the distance between the camera and/or sensor and an object included in an image or video captured through the camera and/or sensor. The camera pose estimation module (310) can estimate, from the image frame, the camera pose (e.g., camera intrinsic parameters) of the camera that captured the image frame. The camera pose estimation module (310) can estimate the camera pose based on the image frame I_t and the depth information of the image frame. The camera pose estimation module (310) can estimate the camera pose based on photometric consistency and geometric consistency in the three-dimensional space of a scene commonly included in a plurality of image frames captured using a camera. The specific operation of the camera pose estimation module (310) will be explained in detail later with reference to FIG. 4.

[0056] According to one embodiment, a learning device (not shown) can model, based on the image frame I_t and the camera pose, a curve model representing the change over time in three-dimensional space of the Gaussian set corresponding to the image frame. The curve model modeling module (330) can model a curve model representing the change over time of a scene in three-dimensional space. The curve model modeling module (330) can model parameters representing the change in position of a Gaussian set in three-dimensional space based on camera pose and depth information. The parameters of the curve model may be the positions of a plurality of control points that serve as a reference for adjusting the trajectory and/or curvature of the curve. The curve model modeling module (330) can model the curve model by representing the position of a moving object included in the three-dimensional scene as a curve. The curve model may include at least one of a spline curve model and a Bezier curve model for Gaussian splatting. The curve model modeling module (330) can initialize the parameters of the curve model based on the image frame and the depth information of the image frame. The curve model modeling module (330) can model the change in position, size, and rotation over time of the Gaussian set corresponding to the image frame as a continuous function. The specific operation of the curve model modeling module (330) will be explained in detail later with reference to FIG. 5.

[0057] According to one embodiment, a learning device (not shown) can perform pruning to smooth out the curve by reducing the number of control points of the curve model. The curve model pruning module (340) can perform lightweighting so that the curve model does not become excessively complex by reducing the number of control points of the curve model. The curve model pruning module (340) can model a curve model that can closely predict the movement of an actual object within an image by removing unnecessary parameters from the curve model. The specific operation of the curve model pruning module (340) will be explained in detail later with reference to FIG. 5.

[0058] According to one embodiment, a learning device (not shown) can render an image corresponding to a target view using a curve model. The learning device (not shown) can update the parameters of the curve model and the camera pose using a training dataset that includes the rendered image and ground truth data. The learning device (not shown) can compare the rendered image with the ground truth data and update the parameters of the curve model so that the rendered image moves closer to the ground truth data. The learning device (not shown) can calculate a loss between the rendered image and the ground truth data in terms of color information, depth information, and motion mask, and can update the parameters of the curve model to minimize the loss. The camera pose update module (350) can update the camera pose using the depth information of the image rendered using the curve model. The camera pose update module (350) can estimate the camera pose using the same process as the camera pose prediction module (310), except that, when unprojecting the image frame into 3D space, it can use the depth information of the rendered image rather than the depth information of the image frame I_t. The specific operation of the camera pose update module (350) will be explained in detail later with reference to FIG. 6.

[0059]

[0060] FIG. 4 is a diagram illustrating an operation for estimating a camera pose according to one embodiment.

[0061] Referring to FIG. 4, according to one embodiment, the camera pose prediction module (310) can estimate and optimize camera poses (e.g., camera parameters) using the image frame I_t as input. Conventional Structure from Motion (SfM) methods (e.g., COLMAP) cannot reliably estimate the camera poses (e.g., camera parameters) of dynamic scenes in monocular video captured in natural environments. For the image frame I_t at time t, the camera pose prediction module (310) according to the present disclosure can predict the camera pose (e.g., camera parameters) using a learnable multilayer perceptron (MLP) network. The camera pose prediction module (310) can predict the extrinsic parameters of a monocular camera based on a position vector of the image frame I_t, where the position vector is obtained by positional encoding. The focal length, which is an intrinsic parameter of the camera, can be learned as a parameter shared across all image frames.
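The idea of predicting per-frame extrinsics from a positionally encoded time value can be sketched as below. Everything concrete here is an assumption made for illustration: the sine/cosine encoding with doubling frequencies, the one-hidden-layer `TinyPoseMLP`, its six-number output (three rotation and three translation values), and the initial focal length are not taken from the disclosure. The weights are random and untrained.

```python
import math
import random

def positional_encoding(t, num_freqs=4):
    """Sin/cos features of a normalized time t at doubling frequencies."""
    feats = []
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi
        feats.extend([math.sin(freq * t), math.cos(freq * t)])
    return feats

class TinyPoseMLP:
    """One hidden ReLU layer; outputs 6 numbers (3 rotation, 3 translation)."""

    def __init__(self, in_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                   for _ in range(6)]

    def __call__(self, feats):
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feats)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

# A single focal length shared by every frame; in training it would be a
# learnable parameter optimized together with the MLP weights.
focal_length = 500.0

feats = positional_encoding(0.25)
pose = TinyPoseMLP(in_dim=len(feats))(feats)
```

Because the network takes time as its input, nearby frames receive similar poses by construction, which is one plausible reason to prefer it over optimizing an independent pose per frame.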

[0062] According to one embodiment, the camera pose prediction module (310) can estimate the camera pose based on photometric consistency and geometric consistency in three-dimensional space of a scene commonly included in a plurality of image frames captured using a camera. The camera pose prediction module (310) can unproject the image frame I_t into three-dimensional space based on the predicted camera pose, focal length, and depth information for the image frame I_t. The camera pose prediction module (310) can obtain back-projections of a plurality of image frames in three-dimensional space in this way. The camera pose prediction module (310) can determine photometric consistency across the plurality of image frames captured using the camera. Photometric consistency can be defined as the photometric alignment of pixel coordinates projected based on a pre-calculated depth map. Given a pre-calculated metric depth, the optimization process can converge when the colors of the reference frame, once projected, align with the colors of the target frame. The pixel coordinate of the reference frame corresponding to a pixel coordinate of the target frame can be calculated as shown in Mathematical Formula 1 below.

[0063] [Mathematical Formula 1]

[0064]
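The correspondence between a pixel in one frame and a pixel in another, obtained by unprojecting with depth and projecting again, can be sketched as follows. This is a generic pinhole-camera illustration rather than the disclosed formula: the function names, the camera-to-world pose convention (`rot`, `trans`), and the shared focal length and principal point are assumptions introduced here.

```python
def unproject(pixel, depth, focal, center, rot, trans):
    """Pixel + depth -> 3D world point, for a camera-to-world pose (rot, trans)."""
    u, v = pixel
    cam = ((u - center[0]) * depth / focal,
           (v - center[1]) * depth / focal,
           depth)
    return tuple(sum(rot[r][c] * cam[c] for c in range(3)) + trans[r]
                 for r in range(3))

def project(point, focal, center, rot, trans):
    """3D world point -> pixel in a camera with camera-to-world pose (rot, trans)."""
    d = [point[i] - trans[i] for i in range(3)]
    # The inverse of a rotation matrix is its transpose.
    x, y, z = (sum(rot[r][c] * d[r] for r in range(3)) for c in range(3))
    return (focal * x / z + center[0], focal * y / z + center[1])

def reproject(pixel, depth, focal, center, pose_src, pose_dst):
    """Find where a pixel of the source frame appears in the destination frame."""
    world = unproject(pixel, depth, focal, center, *pose_src)
    return project(world, focal, center, *pose_dst)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src = (IDENTITY, (0.0, 0.0, 0.0))
dst = (IDENTITY, (1.0, 0.0, 0.0))  # second camera shifted 1 unit along x
match = reproject((50.0, 50.0), 2.0, 100.0, (50.0, 50.0), src, dst)
```

In the example, a point 2 units in front of the first camera appears 50 pixels to the left in the second camera, which sits 1 unit to the right.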

[0065] Such projection alignment can be defined as photometric consistency, and the loss function L_pc for deriving photometric consistency can be calculated according to Mathematical Formula 2 below.

[0066] [Mathematical Formula 2]

[0067]

[0068] Here, ⊙ is the Hadamard product, and the mask term is a motion mask for excluding dynamic objects from the image frame I_t. This may be intended to eliminate inconsistency caused by dynamic regions.

[0069]

[0070] In addition to photometric consistency, the camera pose prediction module (310) can determine geometric consistency for the optimization of the multilayer perceptron network. Geometric consistency may refer to the geometric consistency of back-projected pixels in three-dimensional space. A loss function L_gc for deriving geometric consistency can be calculated according to Mathematical Formula 3 below.

[0071] [Mathematical Formula 3]

[0072]

[0073] Here, the two position terms can each represent the 3D position corresponding to the respective pixel.

[0074]

[0075] According to one embodiment, the camera pose prediction module (310) can optimize a multilayer perceptron (MLP) network and learnable parameters based on the consistency between the back-projection results of different image frames. The camera pose predicted by the camera pose prediction module (310) can be used in the main learning step. In the warm-up step according to the present disclosure, optimization of camera parameters using photometric consistency and geometric consistency can be performed. The total loss in the warm-up step can be calculated as shown in Mathematical Formula 4 below.

[0076] [Mathematical Formula 4]

[0077]
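The two warm-up terms described above, a masked photometric loss and a geometric loss on back-projected points, can be sketched as follows. This is an assumed form for illustration only: the L1 color difference, the Euclidean point distance, the normalization by the number of static pixels, the convention that the mask is 1 on static pixels, and the weights `w_pc` and `w_gc` are choices made here rather than details recovered from the disclosure.

```python
def photometric_loss(ref_colors, tgt_colors, static_mask):
    """Mean absolute color difference over pixels kept by the mask.

    Multiplying each term by the mask plays the role of the Hadamard product.
    """
    total = sum(abs(r - t) * m
                for r, t, m in zip(ref_colors, tgt_colors, static_mask))
    return total / max(sum(static_mask), 1)

def geometric_loss(ref_points, tgt_points, static_mask):
    """Mean Euclidean distance between matched 3D points on static pixels."""
    total = 0.0
    for p, q, m in zip(ref_points, tgt_points, static_mask):
        total += m * sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return total / max(sum(static_mask), 1)

def warmup_loss(pc, gc, w_pc=1.0, w_gc=1.0):
    """Weighted sum of the two consistency terms (weights are assumptions)."""
    return w_pc * pc + w_gc * gc

mask = [1, 0, 1]  # the middle pixel belongs to a moving object
pc = photometric_loss([0.2, 0.8, 0.5], [0.2, 0.1, 0.9], mask)
gc = geometric_loss([(0, 0, 1), (0, 0, 1), (1, 0, 2)],
                    [(0, 0, 1), (5, 5, 5), (1, 0, 2.5)], mask)
total = warmup_loss(pc, gc)
```

The masked middle pixel has a large color and position mismatch, yet contributes nothing to either term, which is the effect the motion mask is meant to have on dynamic regions.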

[0078] Subsequently, the camera pose can be updated using the loss between the ground truth data and the image rendered using the curve model. This will be explained in detail later with reference to FIG. 6.

[0079]

[0080] FIG. 5 is a drawing for explaining a curve model according to one embodiment.

[0081] Referring to FIG. 5, according to one embodiment, the main learning step may include a curve modeling process and a pruning process. The curve model modeling module (330) can model the curve model. Given a monocular video containing a plurality of image frames, the curve model can be designed and trained to rapidly render spatio-temporal images based on the target viewpoint. For each frame at time t, extrinsic camera parameters and intrinsic camera parameters can be estimated. The curve model modeling module (330) can initialize the parameters of the 3D curve model based on the camera pose coarsely optimized by the camera pose prediction module (310). The curve model may include at least one of a spline curve model and a Bezier curve model for Gaussian splatting. For example, the curve model may be a spline curve-based deformation model, implemented as a Motion-Adaptive Spline (MAS) model in which a cubic Hermite spline curve is modified and trained to suit dynamic objects. The cubic Hermite spline curve may include N_c control points, and the position of each control point is a learnable parameter. Additionally, the curve model pruning module (340) can perform Motion-Adaptive Control Points Pruning (MACP) to prune the MAS. By adaptively adjusting the number of control points according to the type and intensity of the object's movement, it can control the size of the MAS curve model and increase efficiency.

[0082] According to one embodiment, the curve model modeling module (330) can model the change in position, size, and rotation of a Gaussian set corresponding to an image frame over time as a continuous function. The curve model may use a cubic Hermite spline curve to model the mean of each dynamic 3D Gaussian at time t. Here, the set of N_c learnable control points can serve as an additional parameter of the Gaussian. The curve model can be calculated according to Mathematical Formula 5 below.

[0083] [Mathematical Formula 5]

[0084]

[0085] Here, the tangent term can represent the approximated tangent at the control point p_k.
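For reference, the textbook form of one cubic Hermite spline segment between adjacent control points is given below. It is offered as a hedged sketch of the kind of expression the paragraphs above describe, not as the disclosed formula: the local parameter u, the tangent symbol m_k, and the central-difference tangent approximation are standard conventions assumed here.

```latex
% One segment between adjacent control points p_k and p_{k+1}, with local parameter u in [0, 1]
\[
S(u) = (2u^{3} - 3u^{2} + 1)\,p_{k}
     + (u^{3} - 2u^{2} + u)\,m_{k}
     + (-2u^{3} + 3u^{2})\,p_{k+1}
     + (u^{3} - u^{2})\,m_{k+1}
\]
% A common finite-difference approximation of the tangent at p_k
\[
m_{k} \approx \tfrac{1}{2}\left(p_{k+1} - p_{k-1}\right)
\]
```

At u = 0 the segment evaluates to p_k and at u = 1 to p_{k+1}, so the curve passes through every control point, and the shared tangents keep adjacent segments smoothly joined.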

[0086]

[0087] According to one embodiment, the curve model modeling module (330) can initialize the parameters of the curve model based on video frames and depth information of the video frames. The process of initializing the parameters of the curve model can be performed through a control points initialization process. To stably optimize the curve model, the curve model modeling module (330) can use 2D tracks and metric depth for all video frames. By converting the 2D tracks from camera space to image space and back-projecting each pixel into the world coordinate system, a 3D track can be calculated as shown in Mathematical Formula 6 below.

[0088] [Mathematical Formula 6]

[0089]

[0090] Here, the back-projection function maps from image space to the world coordinate system, and the camera parameters may be those coarsely predicted, without ground truth data, by the camera pose prediction module (310) described with reference to FIG. 4.
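For illustration only, the back-projection of a tracked pixel into the world coordinate system may be sketched in Python as follows. Mathematical Formula 6 is not reproduced here, so the pinhole camera model, the intrinsics tuple `(fx, fy, cx, cy)`, and the camera-to-world convention for the rotation and translation are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def back_project(pixel: Tuple[float, float], depth: float,
                 intrinsics: Tuple[float, float, float, float],
                 rotation: Matrix3, translation: Sequence[float]) -> List[float]:
    """Lift one pixel with metric depth to a 3D world point (assumed pinhole model)."""
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    # Point in the camera coordinate system.
    cam = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Camera-to-world transform: X_world = R @ X_cam + t (assumed convention).
    return [sum(rotation[r][c] * cam[c] for c in range(3)) + translation[r]
            for r in range(3)]


def lift_track(track_2d, depths, intrinsics, poses) -> List[List[float]]:
    """Turn a per-frame 2D track into a 3D track using per-frame poses (R, t)."""
    return [back_project(p, d, intrinsics, rot, trans)
            for p, d, (rot, trans) in zip(track_2d, depths, poses)]
```

Because the poses come from a coarse prediction, a 3D track obtained this way would only be an initial estimate that later optimization refines.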

[0091]

[0092] Afterwards, the set of control points P can be initialized using the least-squares (LS) method as shown in Mathematical Formula 7 below.

[0093] [Mathematical Formula 7]

[0094]
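As a hedged sketch of such a least-squares initialization, the Python code below fits control points to a 3D track by solving the normal equations. Mathematical Formula 7 is not reproduced here. The code assumes a cubic Hermite spline with uniformly spaced control points and finite-difference tangents, which makes the curve linear in its control points. The helper names `basis_row`, `solve`, and `fit_control_points` are hypothetical.

```python
from typing import List


def basis_row(t: float, n: int) -> List[float]:
    """Weights w_k with spline(t) = sum_k w_k * p_k (assumed Hermite form, n >= 2)."""
    s = min(max(t, 0.0), 1.0) * (n - 1)
    k = min(int(s), n - 2)
    u = s - k
    h00, h10 = 2 * u**3 - 3 * u**2 + 1, u**3 - 2 * u**2 + u
    h01, h11 = -2 * u**3 + 3 * u**2, u**3 - u**2
    row = [0.0] * n
    row[k] += h00
    row[k + 1] += h01
    # Tangent at index j is (p[min(j+1, n-1)] - p[max(j-1, 0)]) / 2.
    for j, h in ((k, h10), (k + 1, h11)):
        row[min(j + 1, n - 1)] += h / 2.0
        row[max(j - 1, 0)] -= h / 2.0
    return row


def solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: too few or badly placed samples")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_control_points(times, track_3d, n_ctrl: int) -> List[List[float]]:
    """Least-squares control points for a 3D track observed at normalized times."""
    rows = [basis_row(t, n_ctrl) for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n_ctrl)]
           for i in range(n_ctrl)]
    aty = [[sum(r[i] * y[d] for r, y in zip(rows, track_3d)) for d in range(3)]
           for i in range(n_ctrl)]
    return solve(ata, aty)
```

For example, `fit_control_points(times, track_3d, 4)` would return four 3D control points for a track sampled at the normalized times in `times`, provided enough samples fall in every segment.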

[0095] The curve model modeling module (330) can model parameters representing changes in the position of a Gaussian set in three-dimensional space based on camera pose and depth information.

[0096] According to one embodiment, the curve model pruning module (340) can perform pruning that reduces the number of control points, in order to prevent problems such as overfitting or reduced rendering speed caused by an excessive number of control points in the curve model built by the curve model modeling module (330). The curve model pruning module (340) can set the number of required control points differently depending on the type and magnitude of the object's movement.

[0097] According to one embodiment, the curve model pruning module (340) can define a candidate set of control points that has fewer control points (e.g., one fewer) than the existing set of control points P. Based on the error between the two, the curve model pruning module (340) can replace the set of control points P with the candidate set. For example, if the error between them is smaller than a fixed threshold value, the curve model pruning module (340) replaces the set of control points P of the Gaussian with the candidate set. This can be expressed as Mathematical Formula 8 below.

[0098] [Mathematical Formula 8]

[0099]

[0100] The curve model pruning module (340) can increase both the efficiency and accuracy of the curve model by using a minimum number of control points for simple movements and providing more control points for complex movements.
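The threshold-based pruning described above may be sketched, purely as an illustration, by the following Python code. Mathematical Formula 8 is not reproduced here, so several choices are assumptions: the candidate set is obtained by resampling the current curve at one fewer uniformly spaced times, the error is the largest distance between the two curves over sampled times, and the spline uses finite-difference tangents. All function names and the `samples` and `min_points` parameters are hypothetical.

```python
import math
from typing import List


def spline(points: List[List[float]], t: float) -> List[float]:
    """Cubic Hermite spline with finite-difference tangents (assumed form)."""
    n = len(points)
    s = min(max(t, 0.0), 1.0) * (n - 1)
    k = min(int(s), n - 2)
    u = s - k

    def tangent(j: int) -> List[float]:
        a, b = points[max(j - 1, 0)], points[min(j + 1, n - 1)]
        return [(b[d] - a[d]) / 2.0 for d in range(3)]

    h = (2 * u**3 - 3 * u**2 + 1, u**3 - 2 * u**2 + u,
         -2 * u**3 + 3 * u**2, u**3 - u**2)
    m0, m1 = tangent(k), tangent(k + 1)
    return [h[0] * points[k][d] + h[1] * m0[d]
            + h[2] * points[k + 1][d] + h[3] * m1[d] for d in range(3)]


def curve_error(p: List[List[float]], q: List[List[float]], samples: int = 32) -> float:
    """Largest distance between two curves over uniformly sampled times."""
    ts = [i / (samples - 1) for i in range(samples)]
    return max(math.dist(spline(p, t), spline(q, t)) for t in ts)


def prune_control_points(points: List[List[float]], threshold: float,
                         min_points: int = 2) -> List[List[float]]:
    """Drop one control point at a time while the curve changes less than threshold."""
    min_points = max(min_points, 2)   # the spline needs at least two points
    current = [list(p) for p in points]
    while len(current) > min_points:
        n_new = len(current) - 1
        # Candidate set with one fewer control point (assumed: uniform resampling).
        candidate = [spline(current, i / (n_new - 1)) for i in range(n_new)]
        if curve_error(current, candidate) >= threshold:
            break                     # motion is too complex to simplify further
        current = candidate
    return current
```

Under these assumptions, a Gaussian that barely moves collapses to the minimum number of control points, while a Gaussian following a zigzag trajectory keeps all of them.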

[0101]

[0102] FIG. 6 is a diagram illustrating the operation of updating a camera pose based on a rendered image according to one embodiment.

[0103] Referring to FIG. 6, according to one embodiment, a learning device (not shown) may perform a two-stage optimization process consisting of a warm-up stage and a main learning stage in order to jointly and stably learn the camera parameter estimation and a curve model representing the change over time, in three-dimensional space, of a Gaussian set corresponding to an image frame. In the warm-up stage, the camera pose prediction module (310) described with reference to FIG. 4 may optimize the camera parameters based on photometric consistency and geometric consistency. When the warm-up stage is finished, the predicted intrinsic and extrinsic parameters of the camera are obtained for all image frames and can be used for the curve modeling and control point initialization performed by the curve model modeling module (330) described with reference to FIG. 5. In the main learning stage, the camera pose update module (350) can optimize the 3D Gaussians for static objects as well as the 3D Gaussians for dynamic objects while simultaneously estimating the camera parameters. The camera pose update module (350) may compare the color information, depth information, and dynamic region mask (motion mask) of the rendered image with the ground truth data. The camera pose update module (350) can calculate the total loss and update the parameters of the curve model to minimize the loss. The camera pose update module (350) can calculate the total loss using Mathematical Formula 9 below.

[0104] [Mathematical Formula 9]

[0105]

[0106] Here, the terms of the total loss may include a photometric consistency loss based on the depth information of the rendered image and a loss for inducing geometric consistency.

[0107]

[0108] A learning device (not shown) according to the present disclosure can induce a clear separation between the 3D Gaussians for dynamic objects and the 3D Gaussians for static objects by introducing a binary dice loss. The binary dice loss may be a loss that measures the degree of overlap between two binary masks and trains the model to increase the similarity between the two masks. The learning device (not shown) can calculate the binary dice loss between a pre-calculated motion mask and the rendered motion mask as shown in Mathematical Formula 10 below.

[0109] [Mathematical Formula 10]

[0110]

[0111] Here, the constant in Mathematical Formula 10 may be a small value used to prevent numerical issues.
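For reference, a commonly used form of the binary dice loss may be sketched in Python as follows. Mathematical Formula 10 is not reproduced here, so the exact placement of the small constant `eps` and the use of flattened masks are assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def binary_dice_loss(mask_a: Sequence[float], mask_b: Sequence[float],
                     eps: float = 1e-6) -> float:
    """Dice loss between two flattened masks with values in [0, 1] (assumed form)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

In this form the loss approaches 0 when the two masks overlap completely and approaches 1 when they do not overlap at all.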

[0112] The rendered motion mask at each pixel can be calculated through alpha-blending of the 3D Gaussians corresponding to that pixel, as shown in Mathematical Formula 11 below.

[0113] [Mathematical Formula 11]

[0114] Here, the mask value assigned to the i-th 3D Gaussian takes one value if the i-th 3D Gaussian is a static 3D Gaussian and a different value otherwise.
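The alpha-blending of a motion mask at a single pixel may be illustrated by the following Python sketch. Mathematical Formula 11 is not reproduced here, so the front-to-back compositing order and the choice of mask value 1 for a dynamic Gaussian and 0 for a static Gaussian are assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def render_motion_mask_pixel(alphas: Sequence[float],
                             is_dynamic: Sequence[bool]) -> float:
    """Alpha-blend per-Gaussian mask values at one pixel, sorted front to back."""
    transmittance = 1.0   # fraction of light not yet absorbed by nearer Gaussians
    value = 0.0
    for alpha, dynamic in zip(alphas, is_dynamic):
        mask_value = 1.0 if dynamic else 0.0   # assumed: dynamic -> 1, static -> 0
        value += mask_value * alpha * transmittance
        transmittance *= 1.0 - alpha
    return value
```

With this convention, a pixel covered mostly by dynamic Gaussians renders a mask value near 1, which is what a dice loss against a pre-calculated motion mask would encourage.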

[0115] Through the learning process according to the present disclosure, the curve model can be guided to effectively and efficiently model dynamic 3D scenes using only monocular images, and can have the advantage of being able to directly estimate camera parameters without external modules.

[0116]

[0117] FIGS. 7A and 7B are drawings illustrating rendered images according to one embodiment.

[0118] Referring to FIGS. 7A and 7B, according to one embodiment, an image rendered according to the image rendering method of the present disclosure may be compared with images rendered according to other rendering techniques. Images (710 to 795) may represent image rendering performance for a target viewpoint on the NVIDIA dataset.

[0119] According to one embodiment, images (710) and (715) may represent video frames rendered using the D3DGS (Dynamic 3D Gaussian Splatting) technique, images (720) and (725) may represent video frames rendered using the STGS (Spacetime Gaussian Splatting) technique, images (730) and (735) may represent video frames rendered using the DynNeRF (Dynamic Neural Radiance Fields) technique, and images (740) and (745) may represent video frames rendered using the RoDynRF (Robust Dynamic NeRF) technique. Images (790) and (795) may represent video frames rendered according to the rendering method according to the present disclosure. It can be confirmed that the image rendered according to the rendering method according to the present disclosure has less afterimage and reveals a more natural shape of dynamic objects compared to conventional video rendering techniques.

[0120] In addition, Table 1 below may show the Peak Signal-to-Noise Ratio (PSNR), used as an evaluation metric for rendering performance, for various video rendering methods.

[0121]

[0122]
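For reference, PSNR between a rendered image and a reference image is conventionally computed from the mean squared error, as in the Python sketch below. The flattened pixel lists, the peak value `max_value` of 1.0, and the function name are assumptions for illustration and do not describe how the values of Table 1 were produced.

```python
import math
from typing import Sequence


def psnr(rendered: Sequence[float], reference: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two flattened images."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR indicates that the rendered image is closer to the reference image.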

[0123] FIG. 8 is a flowchart illustrating a rendering method according to one embodiment.

[0124] Referring to FIG. 8, according to one embodiment, operations 810 to 850 may be operations performed by the image rendering device (100) described with reference to FIGS. 1 to 7B.

[0125] According to one embodiment, operations 810 to 850 may be understood to be performed by a processor (e.g., processor (1010) of FIG. 10) of the image rendering device (100) described with reference to FIG. 1 (e.g., electronic device (1000) of FIG. 10).

[0126] In operation 810, the image rendering device (100) may obtain a curve model trained to model changes in a scene over time in three-dimensional space. The trained curve model may be a continuous function that models changes in position, size, and rotation of a Gaussian over time in three-dimensional space.

[0127] In operation 830, the image rendering device (100) may input time information corresponding to a target viewpoint into the trained curve model to obtain a Gaussian set representing a scene corresponding to the target viewpoint. The Gaussian set may include a plurality of 3D Gaussians that constitute an image frame corresponding to the target viewpoint.

[0128] In operation 850, the image rendering device (100) can render an image corresponding to a target viewpoint based on a camera pose and a Gaussian set corresponding to a target viewpoint.

[0129] Operations 810 through 850 may be performed sequentially, but are not limited thereto. For example, two or more operations may be performed in parallel.
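The flow of operations 810 to 850 may be summarized, as a minimal sketch only, by the Python code below. The curve model and the renderer are represented by hypothetical callables, and a Gaussian is reduced to a toy 3-tuple. None of these names or types are taken from the present disclosure.

```python
from typing import Callable, List, Sequence, Tuple

Gaussian = Tuple[float, float, float]   # toy stand-in: only a 3D mean
CurveModel = Callable[[float], List[Gaussian]]
Renderer = Callable[[List[Gaussian], Sequence[float]], List[float]]


def render_target_view(curve_model: CurveModel, renderer: Renderer,
                       time: float, camera_pose: Sequence[float]) -> List[float]:
    """Operation 810 supplies curve_model; 830 queries it; 850 renders the result."""
    gaussians = curve_model(time)              # operation 830: time -> Gaussian set
    return renderer(gaussians, camera_pose)    # operation 850: Gaussians + pose -> image
```

Because the curve model is a continuous function of time, the same call could be made for any time value, including times between captured frames.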

[0130]

[0131] FIG. 9 is a flowchart illustrating a learning method according to one embodiment.

[0132] Referring to FIG. 9, according to one embodiment, operations 910 to 970 may be operations performed by the learning device (not shown) described with reference to FIGS. 1 to 8.

[0133] According to one embodiment, operations 910 to 970 may be understood to be performed by a processor (e.g., processor (1010) of FIG. 10) of the learning device (not shown) (e.g., electronic device (1000) of FIG. 10).

[0134] In operation 910, a learning device (not shown) can estimate a camera pose corresponding to a camera that captured an image frame based on an image frame and depth information of the image frame. The learning device (not shown) can estimate a camera pose based on photometric consistency and geometric consistency in three-dimensional space of a scene that is commonly included in a plurality of image frames captured using a camera.

[0135] In operation 930, a learning device (not shown) can model a curve model representing a change over time in a three-dimensional space of a Gaussian set corresponding to an image frame based on an image frame and a camera pose. The learning device (not shown) can model a parameter representing a change in position in a three-dimensional space of a Gaussian set based on camera pose and depth information.

[0136] In operation 950, the learning device (not shown) can render an image corresponding to a target viewpoint using the curve model.

[0137] In operation 970, a learning device (not shown) can update the parameters of a curve model and camera poses using a learning data set containing rendered images and ground truth data.

[0138] Operations 910 through 970 may be performed sequentially, but are not limited thereto. For example, two or more operations may be performed in parallel.

[0139]

[0140] FIG. 10 is a schematic block diagram of an electronic device according to one embodiment.

[0141] Referring to FIG. 10, according to one embodiment, an electronic device (1000) (e.g., image rendering device (100) of FIG. 1, learning device (not shown)) may include a memory (1030) and a processor (1010).

[0142] The memory (1030) can store instructions (or programs) executable by the processor (1010). For example, the instructions may include instructions for executing the operation of the processor (1010) and / or the operation of each component of the processor (1010).

[0143] The memory (1030) may include one or more computer-readable storage media. The memory (1030) may include non-volatile storage devices (e.g., magnetic hard disc, optical disc, floppy disc, flash memory, erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM)).

[0144] The memory (1030) may be a non-transitory medium. The term "non-transitory" may indicate that the storage medium is not implemented by a carrier wave or a propagated signal. However, the term "non-transitory" should not be interpreted as meaning that the memory (1030) is immobile.

[0145] The processor (1010) can process data stored in memory (1030). The processor (1010) can execute computer-readable code (e.g., software) stored in memory (1030) and instructions triggered by the processor (1010).

[0146] The processor (1010) may be a data processing device implemented in hardware having a circuit having a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

[0147] For example, a data processing device implemented in hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an Application-Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).

[0148] The processor (1010) can cause the electronic device (1000) to perform one or more operations by executing code and / or instructions stored in memory (1030). The operations performed by the electronic device (1000) may be substantially the same as the operations performed by the image rendering device (100) or the learning device (not shown) described with reference to FIGS. 1 through 10. Accordingly, redundant descriptions are omitted.

[0149]

[0150] The embodiments described above may be implemented as hardware components, software components, and / or combinations of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and software applications executed on said operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing unit may be described as being used as a single unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and / or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.

[0151] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or instruct the processing unit independently or collectively. Software and / or data may be stored on any type of machine, component, physical device, virtual equipment, computer storage medium, or device so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on computer-readable recording media.

[0152] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, etc., either individually or in combination, and the program instructions recorded on the medium may be those specifically designed and configured for the embodiment or those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.

[0153] The hardware device described above may be configured to operate as one or more software modules to perform the operation of the embodiment, and vice versa.

[0154] Although the embodiments have been described above with reference to the limited drawings, those skilled in the art can apply various technical modifications and variations based thereon. For example, suitable results may be achieved even if the described techniques are performed in a different order than described, and / or if the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from described, or replaced or substituted by other components or equivalents.

[0155] Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims set forth below.

Claims

1. A method for rendering an image corresponding to a target viewpoint, the method comprising: an operation of acquiring a curve model trained to model changes in a scene over time in three-dimensional space; an operation of obtaining a Gaussian set representing a scene corresponding to the target viewpoint by inputting time information corresponding to the target viewpoint into the trained curve model; and an operation of rendering the image based on a camera pose corresponding to the target viewpoint and the Gaussian set.

2. The method of claim 1, wherein the Gaussian set includes a plurality of 3D Gaussians constituting an image frame corresponding to the target viewpoint.

3. The method of claim 1, wherein the trained curve model models changes in position, size, and rotation of a Gaussian over time in three-dimensional space as a continuous function.

4. A method comprising: an operation of estimating a camera pose corresponding to a camera that captured an image frame, based on the image frame and depth information of the image frame; an operation of modeling a curve model representing a change over time, in three-dimensional space, of a Gaussian set corresponding to the image frame, based on the image frame and the camera pose; an operation of rendering an image corresponding to a target viewpoint using the curve model; and an operation of updating parameters of the curve model and the camera pose using a training dataset including the rendered image and ground truth data.

5. The method of claim 4, wherein the operation of estimating the camera pose includes: an operation of acquiring a plurality of image frames captured using the camera; and an operation of updating the camera pose corresponding to the image frame based on photometric consistency and geometric consistency, in three-dimensional space, of a scene commonly included in the plurality of image frames.

6. The method of claim 4, wherein the operation of modeling the curve model includes an operation of modeling a parameter representing a change in position of the Gaussian set in three-dimensional space, based on the camera pose and the depth information.

7. The method of claim 4, wherein the operation of modeling the curve model further includes an operation of initializing the parameters of the curve model based on the image frame and the depth information of the image frame.

8. The method of claim 4, wherein the operation of modeling the curve model includes an operation of modeling changes in position, size, and rotation of the Gaussian set corresponding to the image frame over time as a continuous function.

9. The method of claim 4, wherein the operation of updating the camera pose includes an operation of updating the camera pose corresponding to the image frame based on depth information of the rendered image.

10. The method of claim 4, wherein the curve model includes at least one of a spline curve model and a Bezier curve model for Gaussian splatting.