Gaussian splat data encoding device, Gaussian splat data encoding method, Gaussian splat data decoding device, and Gaussian splat data decoding method
Gaussian splat data transmission and reception devices/methods efficiently encode and decode point cloud data, addressing throughput and complexity issues to provide high-quality services for VR, AR, MR, and autonomous driving.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- LG ELECTRONICS INC
- Filing Date
- 2025-12-10
- Publication Date
- 2026-07-02
AI Technical Summary
Generating, transmitting, and receiving point cloud data is challenging due to the large number of points in 3D space, requiring significant throughput and complexity in encoding and decoding processes.
The method employs Gaussian splat data transmission and reception devices/methods to efficiently encode and decode point cloud data, using Gaussian splat data encoding and decoding processes to reduce latency and complexity.
This approach provides high-quality point cloud services with reduced latency and encoding/decoding complexity, supporting various video codec methods and general-purpose point cloud content including autonomous driving services.
Figure KR2025021281_02072026_PF_FP_ABST
Abstract
Description
Gaussian splat data encoding device, Gaussian splat data encoding method, Gaussian splat data decoding device and Gaussian splat data decoding method
[0001] The embodiments provide a method of providing Point Cloud content that supports various user services, such as VR (Virtual Reality), AR (Augmented Reality), MR (Mixed Reality), and autonomous driving services.
[0002] A point cloud is a collection of points in 3D space. Because a point cloud contains a very large number of points, generating point cloud data is difficult.
[0003] Transmitting and receiving point cloud data also requires a large amount of throughput.
[0004] The technical problem according to the embodiments is to provide a Gaussian splat data transmission device, a transmission method, a Gaussian splat data reception device, and a reception method for efficiently transmitting and receiving a point cloud by configuring it as Gaussian splat data in order to solve the aforementioned problems, etc.
[0005] The technical problem according to the embodiments is to provide a Gaussian splat data transmission device, a transmission method, a Gaussian splat data reception device, and a reception method that reduce latency and encoding / decoding complexity.
[0006] However, the scope of rights of the embodiments is not limited to the technical problems described above, and may be extended to other technical problems that can be inferred by a person skilled in the art based on the entire content of this document.
[0007] A method according to the embodiments may include the step of receiving a bitstream containing Gaussian splat data; and the step of decoding the Gaussian splat data. A method according to the embodiments may include the step of encoding the Gaussian splat data; and the step of generating a bitstream containing the Gaussian splat data.
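Paragraph [0007] describes the method only as encoding Gaussian splat data into a bitstream and decoding it after reception. The sketch below illustrates that round trip under stated assumptions: each Gaussian carries the parameters commonly used in 3D Gaussian Splatting (position, per-axis scale, rotation quaternion, opacity, base color), while the `GSPL` magic, the header, and the float32 record layout are hypothetical and are not the V-GSC bitstream syntax. No compression is applied.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Gaussian:
    # Field set follows the usual 3DGS parameterization; the layout below is hypothetical.
    position: Tuple[float, float, float]         # mean (x, y, z)
    scale: Tuple[float, float, float]            # per-axis scale
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    opacity: float
    color_dc: Tuple[float, float, float]         # base (zeroth-order) color

_HEADER = struct.Struct("<4sI")   # hypothetical magic + number of Gaussians
_RECORD = struct.Struct("<14f")   # 3 + 3 + 4 + 1 + 3 little-endian float32 values

def encode_gaussians(gaussians: List[Gaussian]) -> bytes:
    """Encode Gaussian splat data and generate a bitstream containing it."""
    out = bytearray(_HEADER.pack(b"GSPL", len(gaussians)))
    for g in gaussians:
        out += _RECORD.pack(*g.position, *g.scale, *g.rotation, g.opacity, *g.color_dc)
    return bytes(out)

def decode_gaussians(bitstream: bytes) -> List[Gaussian]:
    """Parse a received bitstream and decode the Gaussian splat data in it."""
    magic, count = _HEADER.unpack_from(bitstream, 0)
    if magic != b"GSPL":
        raise ValueError("not a GSPL bitstream")
    gaussians, offset = [], _HEADER.size
    for _ in range(count):
        v = _RECORD.unpack_from(bitstream, offset)
        offset += _RECORD.size
        gaussians.append(Gaussian(v[0:3], v[3:6], v[6:10], v[10], v[11:14]))
    return gaussians
```

Float32 storage is lossy for arbitrary values, so a real codec would also define quantization and entropy coding; this sketch only shows the encode/decode symmetry.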
[0008] The Gaussian splat data transmission method, transmission device, Gaussian splat data reception method, and reception device according to the embodiments can provide a high-quality Gaussian splat service.
[0009] The Gaussian splat data transmission method, transmission device, Gaussian splat data reception method, and reception device according to the embodiments can support various video codec methods.
[0010] The Gaussian splat data transmission method, transmission device, Gaussian splat data reception method, and reception device according to the embodiments can provide general-purpose point cloud content such as autonomous driving services.
[0011] The accompanying drawings are included to provide a further understanding of the embodiments, and they illustrate the embodiments together with the related description.
[0012] FIG. 1 shows an example of the structure of a transmitting / receiving system for providing Point Cloud content according to embodiments.
[0013] FIG. 2 shows an example of point cloud data capture according to embodiments.
[0014] FIG. 3 shows examples of point clouds, geometry, and texture images according to embodiments.
[0015] FIG. 4 shows a V3C bitstream according to embodiments.
[0016] FIG. 5 illustrates a V-PCC encoding procedure according to embodiments.
[0017] FIG. 6 illustrates a 3D patch generation process according to embodiments.
[0018] FIG. 7 shows examples of a tangent plane and a normal vector of a surface according to embodiments.
[0019] FIG. 8 shows an example of a bounding box of a point cloud according to embodiments.
[0020] FIG. 9 shows an additional projection plane for improving visual quality according to embodiments.
[0021] FIG. 10 shows an example of how the occupancy map differs depending on the size of the occupancy packing block according to embodiments.
[0022] FIG. 11 shows an example of determining the location of individual patches of an occupancy map according to embodiments.
[0023] FIG. 12 shows an example of the relationship between the normal, tangent, and bitangent axes according to the embodiments.
[0024] FIG. 13 shows an example of the configuration of the minimum mode and maximum mode of the projection mode according to the embodiments.
[0025] FIG. 14 shows an example of an EDD code according to the embodiments.
[0026] FIG. 15 shows patch boundaries in point cloud smoothing and a trilinear filter according to embodiments.
[0027] FIG. 16 shows an example of recoloring using color values of adjacent points according to embodiments.
[0028] FIG. 17 shows an example of an attribute interleaving process according to embodiments.
[0029] FIG. 18 shows an example of push-pull background filling according to embodiments.
[0030] FIG. 19 shows examples of possible traversal orders for a 4x4 block according to embodiments.
[0031] FIG. 20 shows an example of a best traversal order according to embodiments.
[0032] FIG. 21 shows an example of a 2D video / image encoder according to embodiments.
[0033] FIG. 22 illustrates a V-PCC decoding procedure according to embodiments.
[0034] FIG. 23 shows an example of a 2D video / image decoder according to embodiments.
[0035] FIG. 24 shows an example of an operation flowchart of a transmitting device according to embodiments.
[0036] FIG. 25 shows an example of an operation flowchart of a receiving device according to embodiments.
[0037] FIG. 26 shows 3DGS (3D Gaussian Splatting) data components according to embodiments.
[0038] FIG. 27 shows a V-GSC (Video-based Gaussian Splat Coding) encoder according to embodiments.
[0039] FIG. 28 shows pre-encoding according to embodiments.
[0040] FIG. 29 shows a V-GSC decoder according to embodiments.
[0041] FIG. 30 shows post-decoding according to embodiments.
[0042] FIG. 31 shows a graphic engine according to embodiments.
[0043] FIG. 32 illustrates an encoding method according to embodiments.
[0044] FIG. 33 illustrates a decoding method according to embodiments.
[0045] Preferred embodiments are described in detail below, and examples thereof are shown in the accompanying drawings. The following detailed description, given with reference to the accompanying drawings, is intended to explain preferred embodiments rather than to show the only embodiments that can be implemented. The following detailed description includes details to provide a thorough understanding of the embodiments. However, it will be apparent to those skilled in the art that the embodiments may be practiced without these details.
[0046] Most terms used in the embodiments are selected from those commonly used in the field, but some terms are chosen at the applicant's discretion, and their meanings are described in detail in the following description as necessary. Accordingly, the embodiments should be understood based on the intended meaning of the terms, rather than their mere names or meanings.
[0047] FIG. 1 shows an example of the structure of a transmitting / receiving system for providing Point Cloud content according to embodiments.
[0048] This document provides a method of providing Point Cloud content that supports various user services, such as VR (Virtual Reality), AR (Augmented Reality), MR (Mixed Reality), and autonomous driving services. In the embodiments, Point Cloud content is data in which objects are represented as points, and may be referred to as a point cloud, point cloud data, point cloud video data, point cloud image data, etc.
[0049] A point cloud data transmission device (Transmission device, 10000) according to embodiments includes a point cloud video acquisition unit (Point Cloud Video Acquisition, 10001), a point cloud video encoder (Point Cloud Video Encoder, 10002), a file / segment encapsulation unit (10003), and / or a transmitter (or communication module, 10004). The transmission device according to embodiments can acquire and process point cloud video (or point cloud content) and transmit it. According to embodiments, the transmission device may include a fixed station, a base transceiver system (BTS), a network, an AI (Artificial Intelligence) device and / or system, a robot, an AR / VR / XR device and / or server, etc. Additionally, according to embodiments, the transmission device (10000) may include a device, robot, vehicle, AR / VR / XR device, mobile device, home appliance, IoT (Internet of Things) device, AI device / server, etc. that communicates with a base station and / or other wireless device using a wireless access technology (e.g., 5G NR (New RAT), LTE (Long Term Evolution)).
[0050] A point cloud video acquisition unit (Point Cloud Video Acquisition, 10001) according to the embodiments acquires a point cloud video through a process such as capturing, synthesizing, or generating a point cloud video.
[0051] A point cloud video encoder (10002) according to embodiments encodes point cloud video data. According to embodiments, the point cloud video encoder (10002) may be referred to as a point cloud encoder, a point cloud data encoder, an encoder, etc. Furthermore, point cloud compression coding (encoding) according to embodiments is not limited to the embodiments described above. The point cloud video encoder may output a bitstream containing encoded point cloud video data. The bitstream may include not only the encoded point cloud video data but also signaling information related to the encoding of the point cloud video data.
[0052] The encoder according to the embodiments may support G-PCC (Geometry-based Point Cloud Compression) encoding and / or V-PCC (Video-based Point Cloud Compression) encoding. Additionally, the encoder may encode a point cloud (referring to both point cloud data and points) and / or signaling data regarding the point cloud. The specific encoding operations according to the embodiments are described below.
[0053] Meanwhile, the term V-PCC used in this document refers to Video-based Point Cloud Compression (V-PCC), and it is synonymous with Visual Volumetric Video-based Coding (V3C); the two terms may be used interchangeably.
[0054] A file / segment encapsulation module (10003) according to the embodiments encapsulates point cloud data in the form of a file and / or segment. A point cloud data transmission method / device according to the embodiments can transmit point cloud data in the form of a file and / or segment.
[0055] A transmitter (or communication module, 10004) according to the embodiments transmits encoded point cloud video data in the form of a bitstream. According to the embodiments, a file or segment may be transmitted to a receiving device via a network or stored on a digital storage medium (e.g., USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc.). The transmitter according to the embodiments can communicate with a receiving device (or receiver) over a wired or wireless network such as 4G, 5G, or 6G. Additionally, the transmitter may perform the data processing operations required by the network system (e.g., a communication network system such as 4G, 5G, or 6G). Additionally, the transmitting device may transmit encapsulated data in an on-demand manner.
[0056] A point cloud data receiving device (Reception device, 10005) according to embodiments includes a receiver (10006), a file / segment decapsulation unit (10007), a point cloud video decoder (Point Cloud Decoder, 10008), and / or a renderer (Renderer, 10009). According to embodiments, the receiving device may include a device, robot, vehicle, AR / VR / XR device, mobile device, home appliance, IoT (Internet of Things) device, AI device / server, etc., which communicates with a base station and / or other wireless device using a wireless access technology (e.g., 5G NR (New RAT), LTE (Long Term Evolution)).
[0057] A receiver (10006) according to the embodiments receives a bitstream containing point cloud video data. According to the embodiments, the receiver (10006) can transmit feedback information to a point cloud data transmission device (10000).
[0058] A file / segment decapsulation module (10007) decapsulates a file and / or segment containing point cloud data. The decapsulation module according to the embodiments can perform the reverse process of the encapsulation process according to the embodiments.
[0059] A point cloud video decoder (Point Cloud Decoder, 10008) decodes received point cloud video data. The decoder according to the embodiments can perform the reverse process of encoding according to the embodiments.
[0060] A renderer (Renderer, 10009) renders decoded point cloud video data. According to embodiments, the renderer (10009) may transmit feedback information obtained at the receiving end to the point cloud video decoder (10008). According to embodiments, the point cloud video decoder may transmit feedback information to the receiver. According to embodiments, feedback information received by the point cloud transmission device may be provided to the point cloud video encoder.
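Each receive-side unit performs the reverse of its transmit-side counterpart (decapsulation undoes encapsulation, decoding undoes encoding). The minimal sketch below shows only that symmetry; every stage function is a hypothetical stand-in, using zlib-compressed JSON in place of a real point cloud codec and a 4-byte length prefix in place of ISOBMFF / DASH encapsulation.

```python
import json
import zlib
from typing import List, Tuple

Point = Tuple[float, float, float]

def encode(points: List[Point]) -> bytes:        # stand-in for the point cloud video encoder (10002)
    return zlib.compress(json.dumps(points).encode("utf-8"))

def encapsulate(bitstream: bytes) -> bytes:       # stand-in for file / segment encapsulation (10003)
    return len(bitstream).to_bytes(4, "big") + bitstream

def decapsulate(segment: bytes) -> bytes:         # stand-in for file / segment decapsulation (10007)
    size = int.from_bytes(segment[:4], "big")
    return segment[4:4 + size]

def decode(bitstream: bytes) -> List[Point]:      # stand-in for the point cloud video decoder (10008)
    return [tuple(p) for p in json.loads(zlib.decompress(bitstream))]

def transmit(points: List[Point]) -> bytes:
    """Transmission device chain: encode, then encapsulate."""
    return encapsulate(encode(points))

def receive(segment: bytes) -> List[Point]:
    """Reception device chain: the same stages applied in reverse order."""
    return decode(decapsulate(segment))
```

Because the receive chain applies the inverse stages in reverse order, `receive(transmit(points))` returns the original points for this lossless stand-in; a real lossy codec would return an approximation.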
[0061] The arrows indicated by dotted lines in the drawing represent the transmission path of feedback information obtained from the receiving device (10005). The feedback information is information intended to reflect interaction with a user consuming point cloud content, and includes user information (e.g., head orientation information), viewport information, etc. In particular, if the point cloud content is content for a service requiring interaction with a user (e.g., autonomous driving service, etc.), the feedback information may be transmitted to the content transmitting side (e.g., the transmitting device (10000)) and / or the service provider. Depending on the embodiments, the feedback information may be used in the receiving device (10005) as well as the transmitting device (10000), or it may not be provided.
[0062] Head orientation information according to the embodiments is information regarding the user's head position, direction, angle, movement, etc. The receiving device (10005) according to the embodiments can calculate viewport information based on the head orientation information. Viewport information is information about the area of the point cloud video that the user is looking at. The viewpoint is the point in the point cloud video at which the user is looking, and may be the exact center of the viewport area. That is, the viewport is an area centered on the viewpoint, and the size and shape of the area can be determined by the FOV (Field Of View). Therefore, the receiving device (10005) can extract viewport information based on the vertical or horizontal FOV supported by the device in addition to the head orientation information. In addition, the receiving device (10005) performs gaze analysis, etc., to check how the user consumes the point cloud, the point cloud video area the user is looking at, the gaze time, etc. According to embodiments, the receiving device (10005) may transmit feedback information including gaze analysis results to the transmitting device (10000). According to embodiments, the feedback information may be obtained during the rendering and / or display process. According to embodiments, the feedback information may be obtained by one or more sensors included in the receiving device (10005). Also, according to embodiments, the feedback information may be obtained by the renderer (10009) or a separate external element (or device, component, etc.). The dotted line in FIG. 1 indicates the process of transmitting the feedback information obtained from the renderer (10009). The point cloud content providing system may process (encode / decode) point cloud data based on the feedback information. Accordingly, the point cloud video decoder (10008) may perform a decoding operation based on the feedback information. Additionally, the receiving device (10005) can transmit feedback information to the transmitting device. The transmitting device (or point cloud video encoder (10002)) can perform an encoding operation based on the feedback information. Thus, the point cloud content providing system can efficiently process only the necessary data (e.g., point cloud data corresponding to the user's head position) based on the feedback information, without processing (encoding / decoding) all point cloud data, and provide point cloud content to the user.
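Paragraph [0062] derives the viewport from the head orientation and the device's horizontal and vertical FOV, then processes only the data the user can see. A minimal sketch of such a viewport test follows. It assumes a y-up, right-handed frame in which yaw rotates about the vertical axis, yaw = pitch = 0 looks along +z, and the viewport is a perspective frustum without near or far planes; `in_viewport` and `cull_to_viewport` are hypothetical names, not part of the source.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def in_viewport(point: Vec3, head_pos: Vec3, yaw_deg: float, pitch_deg: float,
                hfov_deg: float, vfov_deg: float) -> bool:
    """True if `point` lies inside the frustum defined by head orientation and FOV."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Orthonormal head frame: forward (viewing direction), right, up.
    forward = (math.cos(pitch) * math.sin(yaw), math.sin(pitch), math.cos(pitch) * math.cos(yaw))
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    up = _cross(forward, right)
    d = tuple(p - h for p, h in zip(point, head_pos))
    depth = _dot(d, forward)
    if depth <= 0.0:  # behind the viewer
        return False
    h_angle = math.degrees(math.atan2(_dot(d, right), depth))
    v_angle = math.degrees(math.atan2(_dot(d, up), depth))
    return abs(h_angle) <= hfov_deg / 2 and abs(v_angle) <= vfov_deg / 2

def cull_to_viewport(points: List[Vec3], head_pos: Vec3, yaw_deg: float, pitch_deg: float,
                     hfov_deg: float = 90.0, vfov_deg: float = 90.0) -> List[Vec3]:
    """Keep only the points the user can currently see."""
    return [p for p in points
            if in_viewport(p, head_pos, yaw_deg, pitch_deg, hfov_deg, vfov_deg)]
```

For example, with the head at the origin and yaw = pitch = 0, a point at (0, 0, 5) is kept while a point at (0, 0, -5) behind the viewer is discarded.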
[0063] According to embodiments, the transmission device (10000) may be referred to as an encoder, transmission device, transmitter, etc., and the receiving device (10005) may be referred to as a decoder, receiving device, receiver, etc.
[0064] Point cloud data processed in the point cloud content providing system of FIG. 1 according to embodiments (processed through a series of processes of acquisition / encoding / transmission / decoding / rendering) may be referred to as point cloud content data or point cloud video data. According to embodiments, point cloud content data may be used as a concept including metadata or signaling information related to point cloud data.
[0065] The elements of the point cloud content delivery system illustrated in FIG. 1 can be implemented using hardware, software, processors, and / or combinations thereof.
[0066] The embodiments may provide point cloud content that supports various user services, such as VR (Virtual Reality), AR (Augmented Reality), MR (Mixed Reality), and autonomous driving services.
[0067] To provide Point Cloud content services, Point Cloud video may first be acquired. The acquired Point Cloud video is transmitted after undergoing a series of processes, and the receiving end can process the received data back into the original Point Cloud video and render it. Through this, the Point Cloud video can be provided to the user. The embodiments provide the necessary methods to effectively carry out this series of processes.
[0068] The entire process for providing Point Cloud content services (point cloud data transmission method and / or point cloud data reception method) may include an acquisition process, an encoding process, a transmission process, a decoding process, a rendering process and / or a feedback process.
[0069] According to the embodiments, the process of providing point cloud content (or point cloud data) may be referred to as a point cloud compression process. According to the embodiments, the point cloud compression process may mean a geometry-based point cloud compression process.
[0070] Each element of the point cloud data transmission device and the point cloud data receiving device according to the embodiments may mean hardware, software, a processor and / or a combination thereof, etc.
[0071] In order to provide Point Cloud content services, Point Cloud video may first be acquired. The acquired Point Cloud video is transmitted after undergoing a series of processes, and the receiving end can process the received data back into the original Point Cloud video and render it. Through this, the Point Cloud video can be provided to the user. The present invention provides a method necessary to effectively carry out this series of processes.
[0072] The entire process for providing Point Cloud content services may include an acquisition process, an encoding process, a transmission process, a decoding process, a rendering process, and / or a feedback process.
[0073] A Point Cloud Compression system may include a transmission device and a receiving device. The transmission device may encode Point Cloud video to output a bitstream and transmit it to the receiving device via a digital storage medium or network in the form of a file or streaming (streaming segment). The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.
[0074] The transmission device may schematically include a Point Cloud video acquisition unit, a Point Cloud video encoder, a file / segment encapsulation unit, and a transmission unit. The receiving device may schematically include a receiving unit, a file / segment decapsulation unit, a Point Cloud video decoder, and a renderer. The encoder may be referred to as a Point Cloud video / image / picture / frame encoding device, and the decoder may be referred to as a Point Cloud video / image / picture / frame decoding device. The transmitter may be included in the Point Cloud video encoder. The receiver may be included in the Point Cloud video decoder. The renderer may include a display unit, and the renderer and / or the display unit may be composed of separate devices or external components. The transmission device and the receiving device may further include separate internal or external modules / units / components for the feedback process.
[0075] According to the embodiments, the operation of the receiving device may follow the reverse process of the operation of the transmitting device.
[0076] The Point Cloud video acquisition unit can perform a process of acquiring Point Cloud video through the capture, synthesis, or generation process of Point Cloud video. Through the acquisition process, 3D position (x, y, z) / attribute (color, reflectance, transparency, etc.) data for multiple points, such as PLY (Polygon File format or the Stanford Triangle format) files, may be generated. In the case of a video with multiple frames, one or more files may be acquired. Point cloud-related metadata (e.g., metadata related to the capture, etc.) may be generated during the capture process.
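As an illustration of the acquisition output described above, the sketch below writes points with position and color attributes as an ASCII PLY file. It uses the standard PLY header keywords (`ply`, `format ascii 1.0`, `element vertex`, `property`, `end_header`); the function name and the choice of `float` positions with `uchar` RGB colors are assumptions for this example, and other attributes such as reflectance would need additional `property` lines.

```python
from typing import List, Tuple

def to_ascii_ply(points: List[Tuple[float, float, float]],
                 colors: List[Tuple[int, int, int]]) -> str:
    """Serialize positions (x, y, z) and RGB colors as an ASCII PLY document."""
    if len(points) != len(colors):
        raise ValueError("each point needs exactly one color")
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    # One line per vertex, in the same order as the declared properties.
    body = [f"{x} {y} {z} {r} {g} {b}" for (x, y, z), (r, g, b) in zip(points, colors)]
    return "\n".join(header + body) + "\n"
```

For a multi-frame video, one such file could be produced per frame, matching the note that one or more files may be acquired.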
[0077] A point cloud data transmission device according to the embodiments may include an encoder that encodes point cloud data; and a transmitter that transmits the point cloud data. The point cloud data may be transmitted in the form of a bitstream containing the point cloud.
[0078] A point cloud data receiving device according to embodiments may include a receiving unit for receiving point cloud data; a decoder for decoding point cloud data; and a renderer for rendering point cloud data.
[0079] The method / device according to the embodiments represents a point cloud data transmitting device and / or a point cloud data receiving device.
[0080] FIG. 2 shows an example of point cloud data capture according to embodiments.
[0081] Point cloud data according to the embodiments can be acquired by a camera, etc. The capture method according to the embodiments may include, for example, inward-facing and / or outward-facing.
[0082] In inward-facing capture according to the embodiments, one or more cameras photograph an object of the point cloud data from outside the object, pointing inward toward it.
[0083] In outward-facing capture according to the embodiments, one or more cameras photograph an object of the point cloud data from the inside out. For example, according to the embodiments, four cameras may be used.
[0084] The point cloud data or point cloud content according to the embodiments may be a video or still image of an object / environment expressed in various forms of 3D space. According to the embodiments, the point cloud content may include video / audio / image, etc. of an object.
[0085] To capture Point Cloud content, the system may combine camera equipment capable of acquiring depth (a combination of an infrared pattern projector and an infrared camera) with RGB cameras capable of extracting color information corresponding to the depth information. Alternatively, depth information can be extracted using LiDAR, which, in a manner similar to radar, measures the position coordinates of a reflector by emitting a laser pulse and measuring the time it takes for the pulse to be reflected back. From the depth information, the shape of the geometry composed of points in 3D space can be extracted, and from the RGB information, attributes representing the color / reflectance of each point can be extracted. Point Cloud content can consist of position (x, y, z) and color (YCbCr or RGB) or reflectance (r) information for the points. Point Cloud content may be captured with an outward-facing method, which captures the surrounding environment, or an inward-facing method, which captures a central object. When constructing Point Cloud content that allows users to freely view objects (e.g., characters, players, objects, actors, etc.) through 360 degrees in a VR / AR environment, the capture camera configuration may use an inward-facing method. When constructing Point Cloud content of a vehicle's current surroundings, such as in autonomous driving, the capture camera configuration may use an outward-facing method. Since Point Cloud content can be captured through multiple cameras, a camera calibration process may be required before capturing the content to establish a global spatial coordinate system between the cameras.
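The LiDAR measurement described above is a time-of-flight calculation: the pulse travels to the reflector and back, so the range is half the round-trip distance. The sketch below converts one return into a 3D position. The spherical convention (azimuth measured in the x-y plane from +x, elevation measured from that plane toward +z) and the function names are assumptions; real sensors define their own frames and apply their own calibration.

```python
import math
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_range(round_trip_s: float) -> float:
    """Range to the reflector: the pulse covers the distance twice."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

def lidar_return_to_xyz(round_trip_s: float, azimuth_deg: float,
                        elevation_deg: float) -> Tuple[float, float, float]:
    """Turn one return (time of flight + beam direction) into an (x, y, z) point."""
    r = tof_range(round_trip_s)
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (r * math.cos(el) * math.cos(az),
            r * math.cos(el) * math.sin(az),
            r * math.sin(el))
```

A reflector 10 m away returns the pulse after about 66.7 ns, which illustrates why LiDAR timing must be resolved at sub-nanosecond precision for centimetre-level geometry.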
[0086] Point Cloud content may be a video or still image of an object / environment displayed in various forms of 3D space.
[0087] Additionally, regarding methods for acquiring Point Cloud content, arbitrary Point Cloud videos can be synthesized based on captured Point Cloud videos. Alternatively, if the intention is to provide Point Cloud videos for a computer-generated virtual space, capture via a physical camera may not be performed. In this case, the capture process can be substituted by simply generating the relevant data.
[0088] Captured Point Cloud videos may require post-processing to improve content quality. While maximum and minimum depth values can be adjusted within the range provided by the camera equipment during the video capture process, unwanted point data may still be included; therefore, post-processing can be performed to remove unwanted areas (e.g., backgrounds) or to recognize connected spaces and fill in spatial holes. Additionally, Point Clouds extracted from cameras sharing a spatial coordinate system can be integrated into a single piece of content by converting each point to a global coordinate system based on the position coordinates of each camera obtained through a calibration process. This allows for the creation of a single, wide-ranging Point Cloud content or the acquisition of Point Cloud content with a high density of points.
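Integrating point clouds from several calibrated cameras amounts to applying each camera's extrinsic transform and concatenating the results. The sketch below assumes that calibration yields, per camera, a rotation matrix R and a translation t mapping camera-local coordinates into the global frame as p_global = R·p_local + t; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major

def to_global(points: List[Vec3], rotation: Mat3, translation: Vec3) -> List[Vec3]:
    """Map camera-local points into the shared global frame: p_g = R p + t."""
    return [
        tuple(sum(rotation[i][k] * p[k] for k in range(3)) + translation[i] for i in range(3))
        for p in points
    ]

def merge_cameras(captures: List[Tuple[List[Vec3], Mat3, Vec3]]) -> List[Vec3]:
    """Integrate each camera's points (with its extrinsics) into one piece of content."""
    merged: List[Vec3] = []
    for points, rotation, translation in captures:
        merged.extend(to_global(points, rotation, translation))
    return merged
```

Overlapping regions seen by several cameras produce duplicate or near-duplicate points after merging, which is one source of the higher point density mentioned above; deduplication is left out of this sketch.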
[0089] A Point Cloud video encoder can encode input Point Cloud video into one or more video streams. A single video may contain multiple frames, and a single frame may correspond to a still image or picture. In this document, the term "Point Cloud video" may include Point Cloud images, frames, pictures, video, audio, images, etc., and the term Point Cloud video may be used interchangeably with Point Cloud images, frames, or pictures. A Point Cloud video encoder can perform Video-based Point Cloud Compression (V-PCC) procedures. To improve compression and coding efficiency, a Point Cloud video encoder can perform a series of procedures such as prediction, transformation, quantization, and entropy coding. The encoded data (encoded video / video information) can be output in the form of a bitstream. Based on the V-PCC procedure, the Point Cloud video encoder can encode the Point Cloud video by dividing it into geometry video, attribute video, occupancy map video, and auxiliary information as described below. The geometry video may include geometry images, the attribute video may include attribute images, and the occupancy map video may include occupancy map images. The auxiliary information may include auxiliary patch information. The attribute video / image may include texture videos / images.
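To make the split into geometry, attribute, and occupancy map images concrete, the simplified sketch below projects a single patch onto a 2D grid. It skips V-PCC's patch segmentation, plane selection, and packing: points are assumed to have integer coordinates already inside a `width` x `height` patch, projection is orthographic along +z, and the nearest point per pixel is kept (in the spirit of the minimum projection mode). The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]
Color = Tuple[int, int, int]

def project_patch(points: List[Point], colors: List[Color], width: int, height: int):
    """Return (depth image, occupancy map, texture image) for one patch."""
    depth = [[0] * width for _ in range(height)]
    occupancy = [[0] * width for _ in range(height)]
    texture = [[(0, 0, 0)] * width for _ in range(height)]
    for (x, y, z), color in zip(points, colors):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"point {(x, y, z)} falls outside the {width}x{height} patch")
        # Keep the point nearest the projection plane when several share a pixel.
        if not occupancy[y][x] or z < depth[y][x]:
            depth[y][x] = z
            occupancy[y][x] = 1
            texture[y][x] = color
    return depth, occupancy, texture
```

In the full procedure these per-patch images are packed into frames and compressed with a 2D video codec, and the patch placement is signaled as auxiliary patch information.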
[0090] The encapsulation processing unit (file / segment encapsulation module, 10003) can encapsulate encoded point cloud video data and / or point cloud video-related metadata into a file or the like. Here, the point cloud video-related metadata may be received from a metadata processing unit or the like. The metadata processing unit may be included in the point cloud video encoder or may be configured as a separate component / module. The encapsulation processing unit can encapsulate the data into a file format such as ISOBMFF or process it into other forms such as DASH segments. According to the embodiment, the encapsulation processing unit may include point cloud video-related metadata in the file format. Point cloud video metadata may be included, for example, in boxes at various levels within the ISOBMFF file format or as data within a separate track within the file. According to the embodiment, the encapsulation processing unit may encapsulate the point cloud video-related metadata itself into a file. The transmission processing unit may apply processing for transmission to the point cloud video data encapsulated according to the file format. The transmission processing unit may be included in the transmission unit or may be configured as a separate component / module. The transmission processing unit may process point cloud video data according to any transmission protocol. Processing for transmission may include processing for delivery via a broadcast network and processing for delivery via broadband. According to an embodiment, the transmission processing unit may receive point cloud video-related metadata from the metadata processing unit in addition to point cloud video data, and apply processing for transmission to it.
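ISOBMFF files are sequences of boxes, each starting with a 32-bit big-endian size (which includes the 8-byte header) followed by a four-character type. The minimal sketch below wraps an encoded bitstream in an `ftyp` box and an `mdat` box. It is an illustration only: the `isom` brand is chosen for the example, and a conformant V-PCC / V3C file would also need a `moov` box with track and sample descriptions, which this sketch omits.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a basic ISOBMFF box: 32-bit size (header included) + four-character type."""
    if len(box_type) != 4:
        raise ValueError("box type must be a four-character code")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def encapsulate(bitstream: bytes) -> bytes:
    """Place an encoded bitstream in a minimal ftyp + mdat file (no moov, not conformant)."""
    # ftyp payload: major brand, minor version, list of compatible brands.
    ftyp = make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom")
    mdat = make_box(b"mdat", bitstream)
    return ftyp + mdat
```

Metadata placement in boxes at different levels, as described above, would follow the same size-plus-type pattern with nested boxes.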
[0091] The transmission unit (10004) can transmit encoded video / image information or data output in the form of a bitstream to a receiving unit of a receiving device via a digital storage medium or network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmission unit may include elements for creating a media file through a predetermined file format and elements for transmission via a broadcasting / communication network. The receiving unit may extract the bitstream and transmit it to a decoding device.
[0092] The receiver (10006) can receive point cloud video data transmitted by the point cloud video transmission device according to the present invention. Depending on the transmission channel, the receiver may receive point cloud video data through a broadcasting network or through broadband. Alternatively, it may receive point cloud video data through a digital storage medium.
[0093] The receiving processing unit can perform processing on the received point cloud video data according to the transmission protocol. The receiving processing unit may be included in the receiving unit or may be configured as a separate component or module. Corresponding to the processing for transmission performed on the transmitting side, the receiving processing unit may perform the reverse process of the aforementioned transmission processing unit. The receiving processing unit may transmit the acquired point cloud video data to the decapsulation processing unit and the acquired point cloud video-related metadata to the metadata parser. The point cloud video-related metadata acquired by the receiving processing unit may be in the form of a signaling table.
[0094] The decapsulation processing unit (file / segment decapsulation module, 10007) can decapsulate point cloud video data in file form received from the receiving processing unit. The decapsulation processing unit can decapsulate files according to ISOBMFF, etc., to obtain a point cloud video bitstream or point cloud video-related metadata (metadata bitstream). The obtained point cloud video bitstream can be transmitted to a point cloud video decoder, and the obtained point cloud video-related metadata (metadata bitstream) can be transmitted to a metadata processing unit. The point cloud video bitstream may include metadata (metadata bitstream). The metadata processing unit may be included in the point cloud video decoder or may be configured as a separate component / module. The point cloud video-related metadata obtained by the decapsulation processing unit may be in the form of boxes or tracks within the file format. If necessary, the decapsulation processing unit may receive metadata required for decapsulation from the metadata processing unit. Point cloud video-related metadata may be passed to a point cloud video decoder and used in the point cloud video decoding process, or passed to a renderer and used in the point cloud video rendering process.
[0095] A Point Cloud video decoder can receive a bitstream as input and perform an operation corresponding to the operation of a Point Cloud video encoder to decode video / images. In this case, the Point Cloud video decoder can decode the Point Cloud video by dividing it into geometry video, attribute video, occupancy map video, and auxiliary information as described below. Geometry video may include geometry images, attribute video may include attribute images, and occupancy map video may include occupancy map images. Auxiliary information may include auxiliary patch information. Attribute video / image may include texture video / image.
[0096] 3D geometry is restored using the decoded geometry image, occupancy map, and additional patch information, and can subsequently undergo a smoothing process. A color point cloud image / picture can be restored by assigning color values to the smoothed 3D geometry using a texture image. The renderer can render the restored geometry and the color point cloud image / picture. The rendered video / image can be displayed through a display unit. The user can view all or part of the rendered result through a VR / AR display or a standard display.
[0097] The feedback process may include the process of transmitting various feedback information, which can be obtained during the rendering / display process, to the transmitting side or to the decoder of the receiving side. Interactivity in Point Cloud video consumption may be provided through the feedback process. According to an embodiment, head orientation information, viewport information indicating the area the user is currently viewing, etc., may be transmitted during the feedback process. According to an embodiment, the user may interact with elements implemented in a VR / AR / MR / autonomous driving environment, and in this case, information related to such interaction may be transmitted to the transmitting side or the service provider side during the feedback process. According to an embodiment, the feedback process may not be performed.
[0098] Head orientation information can refer to information regarding the user's head position, angle, movement, etc. Based on this information, viewport information—that is, information about the area the user is currently viewing within the Point Cloud video—can be calculated.
[0099] Viewport information may be information about the area currently being viewed by the user in the Point Cloud video. Through this, gaze analysis can be performed to determine how the user consumes the Point Cloud video and which areas of the video they gaze at for how long. Gaze analysis may be performed at the receiving end and transmitted to the transmitting end via a feedback channel. Devices such as VR / AR / MR displays can extract the viewport area based on the user's head position / orientation, the vertical or horizontal FOV supported by the device, etc.
[0100] According to an embodiment, the aforementioned feedback information may not only be transmitted to the transmitting side but may also be consumed at the receiving side. That is, decoding and rendering processes at the receiving side may be performed using the aforementioned feedback information. For example, using head orientation information and / or viewport information, only the Point Cloud video of the area currently viewed by the user may be preferentially decoded and rendered.
[0101] Here, the viewport or viewport area may refer to the area that the user is viewing in the Point Cloud video. The viewpoint is the point that the user is viewing in the Point Cloud video, and may refer to the exact center point of the viewport area. In other words, the viewport is an area centered on the viewpoint, and the size and shape of that area can be determined by the Field of View (FOV).
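To make the viewpoint/viewport relationship concrete, here is a minimal sketch of a check for whether a point falls inside a viewport centred on the viewpoint. It assumes a y-up coordinate system, a view direction given as yaw/pitch in degrees, and a viewport approximated by independent yaw and pitch limits of half the horizontal and vertical FOV. The function names and conventions are hypothetical and not taken from the document.

```python
import math


def direction_to_yaw_pitch(dx, dy, dz):
    """Convert a direction vector to (yaw, pitch) in degrees.

    Assumed convention (hypothetical): y is up, yaw is measured in the x-z plane
    from the +z axis toward +x, and pitch is the elevation above the x-z plane.
    """
    yaw = math.degrees(math.atan2(dx, dz))
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return yaw, pitch


def in_viewport(point, eye, view_yaw, view_pitch, h_fov, v_fov):
    """Return True if `point`, seen from `eye`, lies inside the viewport.

    The viewport is centred on the viewpoint direction (view_yaw, view_pitch)
    and extends h_fov / 2 horizontally and v_fov / 2 vertically.
    """
    dx, dy, dz = (p - e for p, e in zip(point, eye))
    yaw, pitch = direction_to_yaw_pitch(dx, dy, dz)
    # Wrap the yaw difference into [-180, 180) so that e.g. 179 and -179 are close.
    d_yaw = (yaw - view_yaw + 180.0) % 360.0 - 180.0
    d_pitch = pitch - view_pitch
    return abs(d_yaw) <= h_fov / 2 and abs(d_pitch) <= v_fov / 2
```

A receiver could use a test like this to pick out the points or patches to decode and render first, as paragraph [0100] describes. A real device would more likely use its exact rendering frustum.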
[0102] This document relates to Point Cloud video compression as described above. For example, the methods / embodiments disclosed in this document may be applied to the MPEG (Moving Picture Experts Group) PCC (point cloud compression or point cloud coding) standard or next-generation video / image coding standards.
[0103] In this document, "picture" or "frame" generally refers to a unit representing a single image at a specific time.
[0104] A pixel or pel may refer to the smallest unit that constitutes a picture (or image). Additionally, the term 'sample' may be used as a term corresponding to pixel. A sample can generally represent a pixel or a pixel value, and may represent only the pixel / pixel value of the luma component, only the pixel / pixel value of the chroma component, or only the pixel / pixel value of the depth component.
[0105] A unit may represent a basic unit of image processing. A unit may include at least one of a specific area of a picture and information related to that area. Depending on the case, the term unit may be used interchangeably with terms such as block or area. In general, an MxN block may include samples (or sample arrays) or a set (or array) of transform coefficients consisting of M columns and N rows.
[0106] FIG. 3 shows examples of point clouds, geometry, and texture images according to embodiments.
[0107] The point cloud according to the embodiments can be input into the V-PCC encoding process of FIG. 4, which will be described later, to generate a geometry image and a texture image. According to the embodiments, the point cloud can be used with the same meaning as point cloud data.
[0108] As shown in the drawing, the left side shows a point cloud, in which an object is located in 3D space and can be represented by a bounding box, etc. The middle shows the geometry, and the right side shows a texture image (non-padded).
[0109] Video-based Point Cloud Compression (V-PCC) can provide a method for compressing 3D point cloud data based on 2D video codecs such as HEVC and VVC. The following data and information can be generated during the V-PCC compression process.
[0110] Occupancy map: Represents a binary map that indicates whether data exists at a corresponding location on a 2D plane with a value of 0 or 1 when dividing the points forming a point cloud into patches and mapping them onto a 2D plane. The occupancy map represents a 2D array corresponding to an atlas, and the value of the occupancy map can indicate whether each sample position within the atlas corresponds to a 3D point.
[0111] An atlas is a set of 2D bounding boxes located in rectangular frames corresponding to 3D bounding boxes in the 3D space where volumetric data is rendered, and related information.
[0112] An atlas bitstream is a bitstream of one or more atlas frames and associated data that make up an atlas.
[0113] An atlas frame is a 2D rectangular array of atlas samples onto which patches are projected.
[0114] An atlas sample is the position of a rectangular frame onto which patches associated with the atlas are projected.
[0115] An atlas frame can be divided into tiles. A tile is a unit that divides a 2D frame. In other words, a tile is a unit that divides the signaling information of point cloud data called an atlas.
[0116] Patch: A set of points that make up a point cloud, where points belonging to the same patch are adjacent to each other in 3D space and are mapped in the same direction among the 6 bounding box planes during the mapping process to a 2D image.
[0117] Geometry image: Represents an image in the form of a depth map that expresses the positional information (geometry) of each point forming a point cloud in patch units. A geometry image can consist of 1-channel pixel values. Geometry represents a set of coordinates associated with a point cloud frame.
[0118] Texture image: Represents an image that expresses color information of each point forming a point cloud in patch units. The texture image may consist of pixel values of multiple channels (e.g., 3 channels R, G, B). The texture is included in the attribute. Depending on the embodiments, the texture and / or attribute may be interpreted as having the same object and / or inclusion relationship.
[0119] Auxiliary patch info: Represents metadata necessary to reconstruct a point cloud from individual patches. Auxiliary patch info may include information about the location, size, etc. of the patch in 2D / 3D space.
[0120] Point cloud data according to the embodiments, for example, V-PCC components, may include atlases, occupancy maps, geometry, attributes, etc.
[0121] An atlas represents a set of 2D bounding boxes. It can be patches, for example, patches projected onto a rectangular frame. It can also correspond to 3D bounding boxes in 3D space and represent a subset of a point cloud.
[0122] An attribute represents a scalar or vector associated with each point in the point cloud, and may include, for example, color, reflectance, surface normal, time stamps, material ID, etc.
[0123] The point cloud data according to the embodiments represents PCC data based on the V-PCC (Video-based Point Cloud Compression) method. The point cloud data may include a plurality of components. For example, it may include an occupancy map, a patch, geometry and / or texture, etc.
[0124] FIG. 4 shows a V3C bitstream according to embodiments.
[0125] In V-PCC, point cloud content can be encoded into a V3C bitstream structure. Figure 4 illustrates the V3C bitstream structure used when encoding V3C content in the ISO / IEC 23090-5 V3C codec document. A V3C unit consists of a V3C unit header and a payload, and the V3C unit type may be V3C_VPS, V3C_AD, V3C_GVD, V3C_AVD, V3C_OVD, etc. VPS is a V3C parameter set containing V3C and V-GSC parameter information. AD is atlas data containing V3C atlas information. GVD is Geometry Video Data, which includes a geometry video sub-bitstream and related information. AVD is Attribute Video Data, which includes an attribute video sub-bitstream and related information. OVD is Occupancy Video Data, which includes an occupancy video sub-bitstream and related information. PVD is Packed Video Data, which includes a packed video sub-bitstream and related information.
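The following minimal sketch shows how a receiver might tell V3C unit types apart. It reads `vuh_unit_type` as the 5 most significant bits of the first byte of the V3C unit header. The numeric codes in the table reflect our understanding of ISO/IEC 23090-5 (Table 2) and are an assumption to verify against the specification. The function name is hypothetical.

```python
# Assumed vuh_unit_type codes, as we understand ISO/IEC 23090-5;
# verify against the specification before relying on them.
V3C_UNIT_TYPES = {
    0: "V3C_VPS",  # V3C parameter set
    1: "V3C_AD",   # atlas data
    2: "V3C_OVD",  # occupancy video data
    3: "V3C_GVD",  # geometry video data
    4: "V3C_AVD",  # attribute video data
    5: "V3C_PVD",  # packed video data
    6: "V3C_CAD",  # common atlas data
}


def v3c_unit_type(header: bytes) -> str:
    """Return the unit type name encoded in the first byte of a V3C unit header."""
    if not header:
        raise ValueError("empty V3C unit header")
    code = header[0] >> 3  # top 5 bits of the first byte
    return V3C_UNIT_TYPES.get(code, f"reserved({code})")
```

A demultiplexer could use this value to route each unit's payload: atlas data to the atlas decoder, and GVD, AVD and OVD payloads to the corresponding video decoders.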
[0126] FIG. 5 illustrates a V-PCC encoding procedure according to embodiments.
[0127] The V-PCC encoding procedure according to the embodiments includes the steps of generating and compressing an occupancy map, a geometry image, an attribute image, and auxiliary patch information. Hereinafter, each step is described with reference to each figure.
[0128] FIG. 6 illustrates a 3D patch generation process according to embodiments.
[0129] 3D Patch generation
[0130] The 3D patch generation process refers to the process of dividing a point cloud into patches, which are units for performing mapping, in order to map the point cloud onto a 2D image. The 3D patch generation process can be divided into three stages: normal value calculation, segmentation, and patch division.
[0131] FIG. 7 shows examples of a tangent plane and a normal vector of a surface according to embodiments.
[0132] The surface of Fig. 7 is used as follows in the patch generation process of the V-PCC encoding process of Fig. 5.
[0133] Normal calculation regarding patch generation:
[0134] Each point forming a point cloud has a unique orientation, which is represented by a 3D vector called a normal. Using the neighbors of each point, obtained with tools such as a KD tree, the tangent plane and normal vector of each point forming the surface of the point cloud can be determined, as shown in the drawing. The search range used when finding neighbors can be defined by the user.
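The document does not say how the normal is computed from the neighbors. One common approach is principal component analysis (PCA): the normal is the eigenvector of the neighborhood's covariance matrix with the smallest eigenvalue. The minimal sketch below makes that assumption. It takes the neighbor list as already found (for example, by a KD-tree search). To stay dependency-free, it uses power iteration on `trace(C) * I - C`. The sign of the result is ambiguous, and orienting normals consistently is not shown.

```python
import math


def estimate_normal(neighbors, iterations=100):
    """Estimate a unit normal from a point's neighborhood by PCA (assumed method).

    neighbors: list of (x, y, z) points around the current point.
    """
    n = len(neighbors)
    mean = [sum(p[i] for p in neighbors) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in neighbors) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # Eigenvalues of trace*I - C are trace - lambda, so its dominant eigenvector
    # is the eigenvector of C with the smallest eigenvalue (the surface normal).
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]
    start = (1.0, 0.5, 0.25)  # arbitrary start, assumed not orthogonal to the normal
    length = math.sqrt(sum(x * x for x in start))
    v = [x / length for x in start]
    for _ in range(iterations):
        w = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate neighborhood (all points identical)
        v = [x / norm for x in w]
    return tuple(v)
```

For example, neighbors sampled from the plane z = 0 yield a normal close to (0, 0, 1).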
[0135] Tangent plane: Represents a plane that passes through a point on a surface and completely contains the tangent to a curve on the surface.
[0136] FIG. 8 shows an example of a bounding box of a point cloud according to embodiments.
[0137] According to the embodiments, a method / device may, for example, use a bounding box in the patch generation process, which generates patches from point cloud data.
[0138] A bounding box according to the embodiments refers to a unit box that divides point cloud data based on a cuboid in 3D space.
[0139] A bounding box can be used in the process of projecting an object that is the target of point cloud data onto the plane of each cuboid based on a cuboid in 3D space. The bounding box can be generated and processed by the point cloud video acquisition unit (10000) and the point cloud video encoder (10002) of FIG. 1. Additionally, based on the bounding box, patch generation (40000), patch packing (40001), geometry image generation (40002), and texture image generation (40003) of the V-PCC encoding process of FIG. 2 can be performed.
[0140] Segmentation regarding patch generation
[0141] Segmentation consists of two processes: initial segmentation and refine segmentation.
[0142] A point cloud encoder (10002) according to the embodiments projects points onto one side of a bounding box. Specifically, each point forming the point cloud is projected onto one of the six sides of a bounding box that encloses the point cloud as shown in the drawing, and initial segmentation is a process of determining one of the planes of the bounding box on which each point will be projected.
[0143] The normal values corresponding to each of the 6 planes are defined as follows.
[0144] (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0).
[0145] As shown in the following formula, for the normal of each point obtained in the preceding normal calculation process, the plane whose normal gives the maximum dot product with it is determined as the projection plane of that point. That is, the plane whose normal points in the direction most similar to the point's normal is determined as the projection plane of that point.
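[0146] $\max_{p_{idx}} \left\{ \vec{n}_{p_i} \cdot \vec{n}_{p_{idx}} \right\}$, where $\vec{n}_{p_i}$ is the normal of point $p_i$ and $\vec{n}_{p_{idx}}$ is the normal of the bounding-box plane with index $p_{idx}$.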
[0147] The determined plane can be identified by a value in the form of an index (cluster index) of 0 to 5.
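Initial segmentation can be sketched as an argmax over the six plane normals listed in [0143]-[0144]. This minimal sketch assumes that the cluster index is the position of the plane normal in that list. The function name is hypothetical.

```python
# The six bounding-box plane normals, in the order listed in [0144];
# using the list position as the cluster index is an assumption.
PLANE_NORMALS = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0),
]


def initial_cluster_index(normal):
    """Return the index (0-5) of the plane whose normal has the largest dot product with `normal`."""
    dots = [sum(a * b for a, b in zip(normal, pn)) for pn in PLANE_NORMALS]
    return max(range(len(dots)), key=dots.__getitem__)
```

For example, a point normal of (0.1, 0.9, 0.2) is assigned to the +y plane (index 1).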
[0148] Refine segmentation is the process of improving the projection plane of each point forming the point cloud, determined in the preceding initial segmentation process, by considering the projection planes of adjacent points. In this process, a score normal can be considered together with a score smooth. The score normal represents the degree of similarity between the normal of each point (as used to determine the projection plane in the initial segmentation) and the normal of each plane of the bounding box. The score smooth indicates the degree of agreement between the projection plane of the current point and the projection planes of its adjacent points.
[0149] The score smooth can be weighted relative to the score normal, and the weight value can be defined by the user. Refine segmentation can be performed iteratively, and the number of iterations can also be defined by the user.
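The document does not give an exact scoring formula for refine segmentation. The minimal sketch below assumes that the score normal for a candidate plane is the dot product of the point normal with the plane normal, and that the score smooth is the fraction of neighbors currently assigned to that plane. The score smooth is multiplied by the user-defined weight. All points are updated together in each pass, for the user-defined number of passes. The default values and the function name are hypothetical.

```python
PLANE_NORMALS = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0),
]


def refine_segmentation(normals, neighbors, initial, weight=3.0, iterations=10):
    """Refine per-point cluster indices using the assignments of neighboring points.

    normals: one unit normal per point
    neighbors: list of neighbor index lists (e.g. from a KD-tree search)
    initial: cluster indices from initial segmentation
    weight: user-defined weight of score smooth (hypothetical default)
    iterations: user-defined maximum number of passes (hypothetical default)
    """
    labels = list(initial)
    for _ in range(iterations):
        new_labels = []
        for i, n in enumerate(normals):
            nbrs = neighbors[i]
            best, best_score = labels[i], float("-inf")
            for k, pn in enumerate(PLANE_NORMALS):
                score_normal = sum(a * b for a, b in zip(n, pn))
                # Fraction of neighbors currently projected onto plane k.
                score_smooth = (sum(1 for j in nbrs if labels[j] == k) / len(nbrs)
                                if nbrs else 0.0)
                score = score_normal + weight * score_smooth
                if score > best_score:
                    best, best_score = k, score
            new_labels.append(best)
        if new_labels == labels:
            break  # converged: no point changed its projection plane
        labels = new_labels
    return labels
```

With a large enough weight, an isolated point whose normal leans slightly toward another plane is pulled back to its neighbors' plane. With a weight of 0, the initial assignment is kept.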
[0150] Patch division (segment patches) regarding patch generation
[0151] Patch segmentation is the process of dividing the entire point cloud into patches, which are sets of adjacent points, based on the projection plane information of each point forming the point cloud obtained during the preceding initial / refine segmentation process. Patch segmentation can consist of the following steps.
[0152] 1) Calculate the adjacent points of each point forming the point cloud using a KD tree, etc. The maximum number of adjacent points can be defined by the user.
[0153] 2) If adjacent points are projected onto the same plane as the current point (if they have the same cluster index value), the current point and the adjacent points are extracted as a single patch.
[0154] 3) Calculate the geometry values of the extracted patch. The detailed process is explained below.
[0155] 4) Repeat steps 2–4 until no unextracted points remain.
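Steps 1, 2 and 4 amount to grouping points into connected components. Two points are joined when they are neighbors and share a cluster index. The minimal sketch below performs that grouping with a breadth-first search. It assumes the neighbor lists from step 1 are already available. The per-patch geometry calculation of step 3 is not shown, and the function name is hypothetical.

```python
from collections import deque


def segment_patches(cluster_index, neighbors):
    """Split points into patches of adjacent points that share a cluster index.

    cluster_index: projection-plane index per point (from initial/refine segmentation)
    neighbors: list of neighbor index lists per point (e.g. from a KD tree)
    Returns a list of patches, each a sorted list of point indices.
    """
    extracted = [False] * len(cluster_index)
    patches = []
    for seed in range(len(cluster_index)):
        if extracted[seed]:
            continue
        extracted[seed] = True
        queue, patch = deque([seed]), [seed]
        while queue:
            cur = queue.popleft()
            for nb in neighbors[cur]:
                # Grow the patch only through neighbors projected onto the same plane.
                if not extracted[nb] and cluster_index[nb] == cluster_index[cur]:
                    extracted[nb] = True
                    patch.append(nb)
                    queue.append(nb)
        patches.append(sorted(patch))
    return patches
```

For a chain of five points with cluster indices [0, 0, 1, 1, 0], this yields the three patches [0, 1], [2, 3] and [4].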
[0156] Through the patch partitioning process, the size of each patch and the occupancy map, geometry image, texture image, etc., for each patch are determined.
[0157] To encode with better quality, the normals corresponding to each of the 12 edges of the bounding box can additionally be selected, in addition to those of the 6 planes. These normal values are defined as follows.
[0158]
[0159]
[0160]
[0161] FIG. 9 shows an additional projection plane for improving visual quality according to embodiments.
[0167] FIG. 10 shows an example of the difference in occupancy map according to the size of the occupancy packing block according to the embodiments.
[0168] FIG. 11 shows an example of determining the location of individual patches of an occupancy map according to embodiments.
[0169] The point cloud encoder (10002) according to the embodiments can perform patch packing and generate an occupancy map.
[0170] Patch packing and occupancy map generation (40001)
[0171] This process determines the location of individual patches within a 2D image in order to map the previously divided patches onto a single 2D image. An occupancy map is a type of 2D image that is a binary map indicating whether data exists at a given location using a value of 0 or 1. An occupancy map consists of blocks, and its resolution can be determined by the size of the blocks; for example, if the block size is 1x1, it has a resolution in pixel units. The block size (occupancy packing block size) can be determined by the user.
[0172] The process of determining the location of individual patches within an occupancy map can be structured as follows.
[0173] 1) Set all values in the entire occupancy map to 0.
[0174] 2) Place the patch at point (u, v) on the occupancy map plane, where the horizontal coordinates are within the range [0, occupancySizeU - patch.sizeU0) and the vertical coordinates are within the range [0, occupancySizeV - patch.sizeV0).
[0175] 3) Set the current point (x, y) that exists on the patch plane and is within the range of horizontal coordinates [0, patch.sizeU0) and vertical coordinates [0, patch.sizeV0).
[0176] 4) For point (x, y), if the (x, y) coordinate value of the patch occupancy map is 1 (data exists at the corresponding point within the patch) and the (u+x, v+y) coordinate value of the entire occupancy map is 1 (the occupancy map is filled by the previous patch), change the (x, y) position in raster order and repeat the process of 3-4. Otherwise, perform the process of 6.
[0177] 5) Repeat steps 3-5 by changing the (u, v) position in raster order.
[0178] 6) Determine (u, v) as the location of the patch, and copy the patch's occupancy map data to the corresponding part of the entire occupancy map.
[0179] 7) Repeat the process of 2-7 for the next patch.
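Steps 1-7 can be sketched as a first-fit search. For each patch, candidate positions (u, v) are scanned in raster order. The patch is placed at the first position where none of its occupied cells overlaps an occupied cell of the global map, and its occupancy is then copied in. This is one reading of the steps. The sketch uses an inclusive upper bound so that a patch may touch the right or bottom edge of the map, whereas the half-open range in step 2 would exclude that last column or row. All sizes are in occupancy packing blocks. Raising an error when a patch does not fit is a simplification, and the names are hypothetical.

```python
def pack_patches(occ_size_u, occ_size_v, patches):
    """Place patch occupancy maps into a global occupancy map (first fit, raster order).

    occ_size_u, occ_size_v: width and height of the global map (in packing blocks)
    patches: list of patch occupancy maps, each a list of rows (sizeV0 x sizeU0) of 0/1
    Returns (global_map, positions), where positions[i] is the (u, v) of patch i.
    """
    occ = [[0] * occ_size_u for _ in range(occ_size_v)]  # step 1: all zero
    positions = []
    for patch in patches:
        size_v0, size_u0 = len(patch), len(patch[0])
        placed = None
        for v in range(occ_size_v - size_v0 + 1):  # raster order: rows ...
            for u in range(occ_size_u - size_u0 + 1):  # ... then columns
                collision = any(
                    patch[y][x] and occ[v + y][u + x]
                    for y in range(size_v0) for x in range(size_u0)
                )
                if not collision:
                    placed = (u, v)
                    break
            if placed is not None:
                break
        if placed is None:
            raise ValueError("patch does not fit in the occupancy map")
        u, v = placed
        for y in range(size_v0):  # step 6: copy the patch occupancy
            for x in range(size_u0):
                if patch[y][x]:
                    occ[v + y][u + x] = 1
        positions.append(placed)
    return occ, positions
```

Because only occupied cells are tested, a later patch can fill an empty cell inside an earlier patch's bounding box. In the example used for testing, the 1x1 patch lands at (1, 1) for this reason.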
[0180] Occupancy Size U: Represents the width of the occupancy map, and the unit is the occupancy packing block size.
[0181] Occupancy Size V: Represents the height of the occupancy map, and the unit is the occupancy packing block size.
[0182] Patch size U0 (patch.sizeU0): Represents the width of the patch, and the unit is the occupancy packing block size.
[0183] Patch size V0 (patch.sizeV0): Represents the height of the patch, and the unit is the occupancy packing block size.
[0184] Raster order: The process of progressively covering an area one line at a time. It is similar to the direction in which the eyes move when reading a book.
[0185] For example, as shown in FIG. 11, a box corresponding to a patch of the patch size may lie within a box corresponding to the occupancy packing block size, and a point (x, y) can be located within the box.
[0186] FIG. 12 shows an example of the relationship between the normal, tangent, and bitangent axes according to the embodiments.
[0187] A point cloud encoder (10002) according to the embodiments can generate a geometry image. A geometry image refers to image data containing geometry information of a point cloud. The geometry image generation process can utilize the three axes (normal, tangent, and bitangent) of the patch of FIG. 8.
[0188] Geometry image generation (40002)
[0189] In this process, depth values constituting the geometry image of individual patches are determined, and the overall geometry image is generated based on the patch positions determined in the previous patch packing process. The process of determining the depth values constituting the geometry image of individual patches can be structured as follows.
[0190] 1) Parameters related to the location and size of individual patches are calculated. The parameters may include the following information.
[0191] Index representing the normal axis: The normal is obtained during the patch generation process mentioned earlier; the tangent axis is the axis perpendicular to the normal that coincides with the horizontal (u) axis of the patch image; and the bitangent axis is the axis perpendicular to the normal that coincides with the vertical (v) axis of the patch image; the three axes can be represented as shown in the diagram.
[0192] FIG. 13 shows an example of the configuration of the minimum mode and maximum mode of the projection mode according to the embodiments.
[0193] The point cloud encoder (10002) according to the embodiments can perform patch-based projection to generate a geometry image, and the projection modes according to the embodiments include a minimum mode and a maximum mode.
[0194] 3D spatial coordinates of the patch: These can be calculated through a bounding box of the smallest size that encloses the patch. For example, they may include the minimum value in the patch's tangent direction (patch 3d shift tangent axis), the minimum value in the patch's bitangent direction (patch 3d shift bitangent axis), the minimum value in the patch's normal direction (patch 3d shift normal axis), etc.
[0195] 2D size of a patch: Represents the horizontal and vertical dimensions when the patch is packed into a 2D image. The horizontal size (patch 2d size u) can be calculated as the difference between the maximum and minimum values of the bounding box's tangent direction, and the vertical size (patch 2d size v) can be calculated as the difference between the maximum and minimum values of the bounding box's bitangent direction.
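The patch parameters above can be derived from the patch's points once the normal, tangent and bitangent axes are known. In this minimal sketch, each axis is given as a coordinate index (0, 1 or 2). The 2D sizes follow the max-minus-min definition in the text. The dictionary keys and function name are hypothetical labels, not bitstream syntax element names.

```python
def patch_parameters(points, normal_axis, tangent_axis, bitangent_axis):
    """Compute a patch's 3D shift and 2D size from its points.

    points: list of (x, y, z) integer coordinates belonging to one patch
    *_axis: coordinate index (0, 1 or 2) used for each patch axis
    """
    def axis_range(axis):
        values = [p[axis] for p in points]
        return min(values), max(values)

    t_min, t_max = axis_range(tangent_axis)
    b_min, b_max = axis_range(bitangent_axis)
    n_min, _ = axis_range(normal_axis)
    return {
        "patch_3d_shift_tangent_axis": t_min,
        "patch_3d_shift_bitangent_axis": b_min,
        "patch_3d_shift_normal_axis": n_min,
        # Sizes use the max-minus-min definition given in the text.
        "patch_2d_size_u": t_max - t_min,
        "patch_2d_size_v": b_max - b_min,
    }
```

Note that the max-minus-min definition gives a size one less than the number of sample positions spanned. An implementation may need to add 1 when allocating the patch image.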
[0196] 2) Determine the projection mode of the patch. The projection mode can be either a min mode or a max mode. The geometry information of the patch is represented by depth values, and when projecting each point forming the patch in the normal direction of the patch, two layers of images can be generated: one composed of the maximum depth value and the other composed of the minimum depth value.
[0197] When generating two layers of images d0 and d1, in min mode, the minimum depth is configured in d0 as shown in the drawing, and the maximum depth within the surface thickness from the minimum depth can be configured in d1.
[0198] For example, when a point cloud is located in 2D as shown in the drawing, there may be multiple patches containing multiple points. As shown in the drawing, points marked with the same style of shading indicate that they may belong to the same patch. The drawing illustrates the process of projecting a patch of points marked as blank.
[0199] When the points marked as blank are projected in the left / right direction, their depths can be counted from the left side, increasing by 1 toward the right: 0, 1, 2, ..., 6, 7, 8, 9.
[0200] The projection mode can be customized so that the same method is applied to all point clouds, or applied differently per frame or patch. If different projection modes are applied per frame or patch, a projection mode that can increase compression efficiency or minimize missing points can be adaptively selected.
[0201] 3) Calculate the depth values of individual points.
[0202] In min mode, the d0 image is constructed using depth0, which is obtained by subtracting the patch's normal-direction minimum (patch 3d shift normal axis) calculated in step 1 from the minimum normal-axis value of the points at each location. If other depth values exist at the same location within the surface thickness above depth0, the largest of them is set as depth1. If no other value exists, the value of depth0 is assigned to depth1 as well. The d1 image is constructed using the depth1 values.
[0203] For example, when determining the depth of the points of d0, the minimum value may be calculated (4 2 4 4 0 6 0 0 9 9 0 8 0). Also, when determining the depth of the points of d1, the larger value among two or more points may be calculated, or if there is only one point, that value may be calculated (4 4 4 4 6 6 6 8 9 9 8 8 9). Furthermore, some points may be lost during the process of encoding and reconstructing the patch points (for example, 8 points were lost in the drawing).
[0204] In max mode, the d0 image is constructed using depth0, which is obtained by subtracting the patch's normal-direction minimum (patch 3d shift normal axis) calculated in step 1 from the maximum normal-axis value of the points at each location. If other depth values exist at the same location within the surface thickness below depth0, the smallest of them is set as depth1. If none exists, the value of depth0 is assigned to depth1 as well. The d1 image is constructed using the depth1 values.
[0205] For example, when determining the depth of the points of d0, the maximum value may be calculated (4 4 4 4 6 6 6 8 9 9 8 8 9). Also, when determining the depth of the points of d1, the smaller value among two or more points may be calculated, or if there is only one point, that value may be calculated (4 2 4 4 5 6 0 6 9 9 0 8 0). Additionally, some points may be lost during the process of encoding and reconstructing the patch points (for example, 6 points were lost in the drawing).
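The per-pixel depth calculation for both projection modes can be sketched as follows. The function receives the normal-axis coordinates of every point that projects onto one pixel of a patch. It subtracts the patch's normal-direction minimum, then picks depth0 and depth1 within the surface thickness as described in [0202] and [0204]. The function and parameter names are hypothetical.

```python
def layer_depths(normal_coords, shift_normal, surface_thickness, mode="min"):
    """Compute (depth0, depth1) for one pixel of a patch.

    normal_coords: normal-axis coordinates of all points projected onto this pixel
    shift_normal: the patch's normal-direction minimum (patch 3d shift normal axis)
    surface_thickness: user-chosen surface thickness
    mode: "min" or "max" projection mode
    """
    depths = [c - shift_normal for c in normal_coords]
    if mode == "min":
        depth0 = min(depths)
        in_range = [d for d in depths if depth0 <= d <= depth0 + surface_thickness]
        depth1 = max(in_range)  # farthest depth within the surface thickness
    elif mode == "max":
        depth0 = max(depths)
        in_range = [d for d in depths if depth0 - surface_thickness <= d <= depth0]
        depth1 = min(in_range)  # nearest depth within the surface thickness
    else:
        raise ValueError(f"unknown projection mode: {mode}")
    # depth0 is always in in_range, so depth1 == depth0 when no other point qualifies.
    return depth0, depth1
```

With coordinates [12, 14, 20], a shift of 10 and a surface thickness of 4, min mode yields (2, 4). The point at depth 10 is not represented in either layer, which is the kind of loss counted in the examples above.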
[0206] The entire geometry image can be generated by placing the geometry images of individual patches generated through the above process onto the entire geometry image using the patch position information determined in the previous patch packing process.
[0207] The d1 layer of the generated entire geometry image can be encoded in several ways. The first is to encode the depth values of the previously generated d1 image as they are (absolute d1 method). The second is to encode the difference between the depth values of the previously generated d1 image and the depth values of the d0 image (differential method).
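The two d1 coding options can be illustrated with flat per-pixel lists. This is a minimal sketch in which the method names "absolute" and "differential" are labels chosen here, not syntax from the document. In max mode, d1 is never larger than d0, so the differential values come out zero or negative.

```python
def encode_d1(d0, d1, method="absolute"):
    """Return the d1 values to be coded, as-is or as differences from d0."""
    if method == "absolute":
        return list(d1)
    if method == "differential":
        return [b - a for a, b in zip(d0, d1)]
    raise ValueError(f"unknown d1 coding method: {method}")


def decode_d1(d0, coded, method="absolute"):
    """Invert encode_d1 given the decoded d0 values."""
    if method == "absolute":
        return list(coded)
    if method == "differential":
        return [a + c for a, c in zip(d0, coded)]
    raise ValueError(f"unknown d1 coding method: {method}")
```

The differential form tends to produce small values when d1 stays close to d0, which is what makes it attractive for compression.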
[0208] Because encoding with only the two depth layers d0 and d1 loses the geometry information of any points lying between the two depths, an Enhanced-Delta-Depth (EDD) code may be used for lossless coding.
[0209] Referring to Fig. 14, the EDD code is explained in detail.
[0210] FIG. 14 shows an example of an EDD code according to the embodiments.
[0211] A point cloud encoder (10002) and / or part / all of the V-PCC encoding process (e.g., video compression (40009)) can encode geometric information of points based on EDD codes.
[0212] The EDD code is a method of binary encoding the locations of all points within the surface thickness range, including d1, as shown in the drawing. For example, in the case of points included in the second column from the left of the drawing, points exist at the first and fourth positions above D0, while the second and third positions are empty, so they can be represented by the EDD code 0b1001 (=9). If the EDD code is encoded and sent along with D0, the receiving end can restore the geometry information of all points without loss.
[0213] For example, each position above the reference point is coded as 1 if a point exists there and 0 if it does not, so in this example the code can be represented with 4 bits.
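The EDD code in the example can be reproduced with a few bit operations. This minimal sketch assumes that the first position above d0 maps to the most significant bit. The document's example, 0b1001, is symmetric, so it does not settle the bit order. The function names are hypothetical.

```python
def edd_encode(occupied_offsets, surface_thickness):
    """Build the EDD code word for one pixel.

    occupied_offsets: positions above d0 (1 = first position) that hold a point
    Assumed bit order: the first position above d0 is the most significant bit.
    """
    code = 0
    for offset in range(1, surface_thickness + 1):
        code = (code << 1) | (1 if offset in occupied_offsets else 0)
    return code


def edd_decode(code, surface_thickness):
    """Recover the occupied positions above d0 from an EDD code word."""
    return [offset for offset in range(1, surface_thickness + 1)
            if (code >> (surface_thickness - offset)) & 1]
```

Points at the first and fourth positions give 0b1001 = 9, matching the example in [0212], and decoding 9 returns positions 1 and 4.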
[0214] FIG. 15 shows point cloud smoothing at patch boundaries and a trilinear filter according to the embodiments.
[0215] Smoothing (40004)
[0216] Smoothing is a process designed to eliminate discontinuities that may occur at patch boundaries due to image quality degradation during the compression process, and it is used to improve the visual quality of a reconstructed point cloud by filtering the patch boundaries.
[0217] Point cloud smoothing is applied to the edges of each patch as shown in FIG. 15, and the centroid of the decoded points is pre-calculated for each grid cell. Then, after the centroids and the number of points within the 2x2x2 grid are derived, a trilinear filter is applied. If the output of the filter is greater than a set threshold, the point coordinates are moved to that output value; if it is smaller than the threshold, the original position is maintained.
[0218] FIG. 16 illustrates an example of recoloring using color values of adjacent points according to embodiments.
[0219] The point cloud encoder or texture image generator (40003) according to the embodiments can generate a texture image based on recoloring.
[0220] Attribute image generation (40003)
[0221] The process of generating an attribute image is similar to the geometry image generation process described earlier, consisting of generating attribute images of individual patches and placing them at determined locations to generate the entire attribute image. However, in the process of generating attribute images of individual patches, an image is generated that has color values (e.g., R, G, B) of points constituting the point cloud corresponding to the location, instead of depth values for geometry generation.
[0222] In the process of determining the color values of each point constituting the point cloud, the geometry that has undergone the smoothing process mentioned earlier may be used. Since the smoothed point cloud may have shifted the positions of some points compared to the original point cloud, a recoloring process may be necessary to find colors suitable for the changed locations. Recoloring can be performed using the color values of adjacent points. For example, as shown in the drawing, a new color value can be calculated by considering the color value of the nearest point and the color values of adjacent points.
[0223] For example, referring to the drawing, recoloring can calculate a suitable color value for the changed location based on the average attribute value of the original points nearest to the point and / or the average attribute value at the original positions nearest to the point.
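As a hedged illustration of recoloring, the sketch below assigns each reconstructed (possibly moved) point the mean color of its k nearest original points. The value of k, the brute-force neighbour search and the name recolor are assumptions. A practical encoder would typically use a spatial index such as a k-d tree and may combine several averages, as described above.

```python
def recolor(reconstructed, original, k=3):
    """Give each point the mean colour of its k nearest original points.

    reconstructed: list of (x, y, z) positions after smoothing.
    original: list of ((x, y, z), (r, g, b)) tuples from the input point cloud.
    """
    colors = []
    for p in reconstructed:
        # Brute-force nearest-neighbour search by squared Euclidean distance.
        nearest = sorted(
            original,
            key=lambda o: sum((a - b) ** 2 for a, b in zip(p, o[0])),
        )[:k]
        colors.append(tuple(
            sum(o[1][ch] for o in nearest) / len(nearest) for ch in range(3)))
    return colors
```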
[0224] FIG. 17 shows an example of an attribute interleaving process according to embodiments.
[0225] Just as geometry images are generated with two layers, d0 and d1, attribute images can be generated with two layers, c0 and c1. These layers are used in the interleaved attribute image generation process, in which missing attribute values can be predicted by averaging neighboring values in the same attribute layer using the following formula.
[0226]
[0227]
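The formula referred to in [0225] is not reproduced in this text. As a hedged illustration only, the sketch below assumes that each missing sample is predicted as the mean of its available 4-connected neighbours in the same layer. Which samples are missing depends on the interleaving pattern and is taken as given (marked None). The name fill_missing is hypothetical.

```python
def fill_missing(layer):
    """Predict missing samples (None) as the mean of available 4-neighbours."""
    h, w = len(layer), len(layer[0])
    out = [row[:] for row in layer]
    for y in range(h):
        for x in range(w):
            if layer[y][x] is not None:
                continue
            neighbours = [layer[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w and layer[ny][nx] is not None]
            out[y][x] = sum(neighbours) / len(neighbours) if neighbours else None
    return out
```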
[0228] Auxiliary patch info compression (40005)
[0229] A point cloud encoder or auxiliary patch information compressor according to the embodiments can compress auxiliary patch information (additional information regarding the point cloud).
[0230] The auxiliary patch information compressor compresses the additional patch information generated during the patch generation, patch packing, and geometry generation processes described above. The additional patch information may include the following parameters:
[0231] An index (cluster index) that identifies the projection plane (normal)
[0232] Patch's 3D spatial position: Patch's tangent minimum (patch 3d shift tangent axis), Patch's bitangent minimum (patch 3d shift bitangent axis), Patch's normal minimum (patch 3d shift normal axis)
[0233] Patch's 2D spatial position and size: horizontal size (patch 2d size u), vertical size (patch 2d size v), horizontal minimum (patch 2d shift u), vertical minimum (patch 2d shift v)
[0234] Mapping information for each block and patch: candidate index and local patch index. When patches are placed in order based on the 2D spatial position and size information above, multiple patches may be mapped to a single block. The mapped patches constitute a candidate list, and the candidate index indicates which patch's data exists in the corresponding block. The local patch index points to one of all the patches existing in the frame. Table 1-1 is pseudo code representing the block and patch mapping process using the candidate list and the local patch index.
[0235] The maximum number of candidates in a candidate list (max_candidate_count in Table 1-1) can be defined by the user.
[0236] Table 1-1 Pseudo code for block and patch mapping
for (i = 0; i < BlockCount; i++) {
    if (candidatePatches[i].size() == 1) {
        // Only one patch overlaps block i, so no index is signaled.
        blockToPatch[i] = candidatePatches[i][0];
    } else {
        candidate_index;  // parsed from the bitstream
        if (candidate_index == max_candidate_count) {
            // Escape value: the patch is identified by an explicit index.
            blockToPatch[i] = local_patch_index;  // parsed from the bitstream
        } else {
            blockToPatch[i] = candidatePatches[i][candidate_index];
        }
    }
}
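A runnable rendering of the block and patch mapping in Table 1-1 is sketched below. It assumes that the syntax values candidate_index and local_patch_index arrive in bitstream order. A plain list, parsed_values, stands in for a real bitstream parser, and the function name is hypothetical.

```python
def map_blocks_to_patches(candidate_patches, parsed_values, max_candidate_count):
    """Resolve the patch index for every block, following Table 1-1.

    candidate_patches[i]: patch indices whose 2D area overlaps block i.
    parsed_values: syntax values in bitstream order (stand-in for a parser).
    """
    values = iter(parsed_values)
    block_to_patch = []
    for candidates in candidate_patches:
        if len(candidates) == 1:
            # A single candidate needs no signaling.
            block_to_patch.append(candidates[0])
            continue
        candidate_index = next(values)
        if candidate_index == max_candidate_count:
            # Escape value: local_patch_index follows explicitly.
            block_to_patch.append(next(values))
        else:
            block_to_patch.append(candidates[candidate_index])
    return block_to_patch


# Block 0 has one candidate; block 1 picks candidate 1; block 2 uses the escape.
print(map_blocks_to_patches([[3], [1, 2], [0, 4]], [1, 2, 7], 2))  # [3, 2, 7]
```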
[0249] FIG. 18 shows an example of push-pull background filling according to embodiments.
[0250] Image padding and group dilation (40006, 40007, 40008)
[0251] The image padder according to the embodiments can fill the space outside the patch area with meaningless additional data based on a push-pull background filling method.
[0252] Image padding is a process of filling the space outside the patch area with meaningless data to improve compression efficiency. For image padding, the pixel values of the columns or rows at the patch boundary can be copied into the empty space. Alternatively, as shown in the drawing, a push-pull background filling method can be used, in which the resolution of the unpadded image is gradually reduced and then increased again, and the empty space is filled with pixel values from the low-resolution images.
[0253] Group dilation is a method for filling the empty space of geometry and attribute images composed of two layers (d0 / d1 and c0 / c1). The values that image padding placed in the empty space of the two layers are replaced with the average of the two layers' values at the same location.
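A minimal sketch of push-pull background filling is shown below, assuming 2x2 averaging of occupied pixels on the way down ("push") and nearest-neighbour copying on the way up ("pull"). Real implementations may use other down- and up-sampling filters. The name push_pull is hypothetical, and mask marks occupied (patch) pixels with True.

```python
def push_pull(img, mask):
    """Fill unoccupied pixels (mask False) from successively coarser averages."""
    h, w = len(img), len(img[0])
    if (h == 1 and w == 1) or not any(any(row) for row in mask):
        return [row[:] for row in img]
    # Push: halve the resolution, averaging only occupied pixels of each 2x2 group.
    lh, lw = (h + 1) // 2, (w + 1) // 2
    low = [[0.0] * lw for _ in range(lh)]
    low_mask = [[False] * lw for _ in range(lh)]
    for y in range(lh):
        for x in range(lw):
            vals = [img[yy][xx]
                    for yy in (2 * y, 2 * y + 1) for xx in (2 * x, 2 * x + 1)
                    if yy < h and xx < w and mask[yy][xx]]
            if vals:
                low[y][x] = sum(vals) / len(vals)
                low_mask[y][x] = True
    low = push_pull(low, low_mask)  # recursively fill the coarser level
    # Pull: copy coarse values into the unoccupied pixels of this level.
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                out[y][x] = low[y // 2][x // 2]
    return out
```

Occupied pixels are never modified, so the padding changes only the background that the decoder discards using the occupancy map.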
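The group dilation step can be sketched as a per-pixel average over the two layers at unoccupied locations. The names group_dilate and occupancy are hypothetical, and the padded layers are assumed to be given as input.

```python
def group_dilate(layer0, layer1, occupancy):
    """Replace padded (unoccupied) samples of both layers with their per-pixel mean."""
    out0 = [row[:] for row in layer0]
    out1 = [row[:] for row in layer1]
    for y, row in enumerate(occupancy):
        for x, occupied in enumerate(row):
            if not occupied:
                mean = (layer0[y][x] + layer1[y][x]) / 2
                out0[y][x] = out1[y][x] = mean
    return out0, out1
```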
[0254] FIG. 19 shows examples of possible traversal orders for a 4x4 block according to embodiments.
[0255] Occupancy map compression (40012, 40011)
[0256] The occupancy map compressor according to the embodiments can compress the previously generated occupancy map. Specifically, there may be two methods: video compression for lossy compression and entropy compression for lossless compression. Video compression is described below.
[0257] The entropy compression process can be performed as follows.
[0258] 1) For each block constituting the occupancy map, if the block is fully filled, encode 1 and repeat the same process for the next block. Otherwise, encode 0 and perform steps 2-5.
[0259] 2) Determine the best traversal order for performing run-length coding on the filled pixels of the block. The figure shows four possible traversal orders as an example for a 4x4 block.
[0260] FIG. 20 shows an example of a best traversal order according to embodiments.
[0261] As described above, the entropy compressor according to the embodiments can code blocks based on a traversal order method as shown in the drawing.
[0262] Among the possible traversal orders, the best traversal order, which has the minimum number of runs, is selected and its index is encoded. For example, the drawing shows the case where the third of the possible traversal orders shown above is selected; since this order reduces the number of runs to 2, the minimum, it is selected as the best traversal order.
[0263] 3) Encode the number of runs. In the example of FIG. 20, there are 2 runs, so 2 is encoded.
[0264] 4) Encode the occupancy of the first run. In the example of FIG. 20, the first run consists of unfilled pixels, so 0 is encoded.
[0265] 5) Encode the length of each individual run (as many lengths as there are runs). In the example of FIG. 20, the lengths of the first and second runs, 6 and 10, are encoded sequentially.
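Steps 1 to 5 above can be sketched as a function that returns the symbols to be entropy-coded for one block. The four traversal orders used here (row-wise, column-wise and their snake variants) are an assumed set and are not necessarily the orders drawn in the figures, so the selected index may differ from the drawing. The final arithmetic coding of the symbols is omitted, and all names are hypothetical.

```python
def traversal_orders(n=4):
    """Four candidate traversal orders for an n x n block (an assumed set)."""
    row = [(y, x) for y in range(n) for x in range(n)]
    col = [(y, x) for x in range(n) for y in range(n)]
    row_snake = [(y, x if y % 2 == 0 else n - 1 - x) for y in range(n) for x in range(n)]
    col_snake = [(y if x % 2 == 0 else n - 1 - y, x) for x in range(n) for y in range(n)]
    return [row, col, row_snake, col_snake]


def to_runs(values):
    """Run-length representation as a list of [value, length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def encode_block(block):
    """Return the symbols of steps 1-5 for one occupancy block (values 0/1)."""
    if all(all(row) for row in block):
        return [1]                                           # step 1: fully filled
    candidates = [(i, to_runs([block[y][x] for y, x in order]))
                  for i, order in enumerate(traversal_orders(len(block)))]
    best_index, best_runs = min(candidates, key=lambda c: len(c[1]))  # step 2
    return ([0, best_index, len(best_runs), best_runs[0][0]]          # steps 1-4
            + [length for _, length in best_runs])                    # step 5


block = [[0, 0, 0, 0],
         [0, 0, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
print(encode_block(block))  # [0, 0, 2, 0, 6, 10]
```

For this block the best order yields 2 runs whose first run is unfilled, with lengths 6 and 10, matching the numbers of the example above.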
[0266] Video compression (40009, 40010, 40011)
[0267] A video compressor according to the embodiments uses a 2D video codec such as HEVC, VVC, etc. to encode a sequence of geometry images, texture images, occupancy map images, etc., generated by the process described above.
[0268] FIG. 21 shows an example of a 2D video / image encoder according to embodiments.
[0269] The drawing shows a schematic block diagram of a 2D video / image encoder (15000) that encodes a video / image signal, as an embodiment of the video compression (40009, 40010, 40011) or video compressor described above. The 2D video / image encoder (15000) may be included in the point cloud video encoder described above, or may be composed of internal / external components. Each component of FIG. 21 may correspond to software, hardware, a processor, and / or a combination thereof.
[0270] Here, the input image may include the geometry image, texture image (attribute(s) image), occupancy map image, etc. described above. The output bitstream of the point cloud video encoder (i.e., point cloud video / image bitstream) may include output bitstreams for each input image (geometry image, texture image (attribute(s) image), occupancy map image, etc.).
[0271] The inter prediction unit (15090) and the intra prediction unit (15100) may be collectively referred to as the prediction unit. That is, the prediction unit may include the inter prediction unit (15090) and the intra prediction unit (15100). The transformation unit (15030), the quantization unit (15040), the inverse quantization unit (15050), and the inverse transformation unit (15060) may be included in the residual processing unit. The residual processing unit may further include a subtraction unit (15020). The above-described image segmentation unit (15010), subtraction unit (15020), transformation unit (15030), quantization unit (15040), inverse quantization unit (15050), inverse transformation unit (15060), addition unit (155), filtering unit (15070), inter prediction unit (15090), intra prediction unit (15100), and entropy encoding unit (15110) may be configured by a single hardware component (e.g., an encoder or a processor) according to an embodiment. Additionally, the memory (15080) may include a decoded picture buffer (DPB) and may be configured by a digital storage medium.
[0272] The image segmentation unit (15010) can divide an input image (or picture, frame) input to an encoding device (15000) into one or more processing units. For example, a processing unit may be called a coding unit (CU). In this case, the coding unit may be recursively divided from a coding tree unit (CTU) or a largest coding unit (LCU) according to a QTBT (Quad-tree binary-tree) structure. For example, a single coding unit may be divided into multiple coding units of a deeper depth based on a quad-tree structure and / or a binary tree structure. In this case, for example, the quad-tree structure may be applied first and the binary tree structure may be applied later. Or, the binary tree structure may be applied first. A coding procedure according to the present invention may be performed based on the final coding unit that is no longer divided. In this case, based on coding efficiency according to image characteristics, the maximum coding unit may be used directly as the final coding unit, or, if necessary, the coding unit may be recursively divided into lower-depth coding units so that a coding unit of the optimal size is used as the final coding unit. Here, the term "coding procedure" may include procedures such as prediction, transformation, and restoration described below. As another example, the processing unit may further include a prediction unit (PU) or a transformation unit (TU). In this case, the prediction unit and the transformation unit may each be divided or partitioned from the aforementioned final coding unit. The prediction unit may be a unit for sample prediction, and the transformation unit may be a unit for deriving transformation coefficients and / or a unit for deriving a residual signal from transformation coefficients.
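The recursive QTBT split described above can be sketched as follows. The decide callback is a hypothetical stand-in for the encoder's decision based on coding efficiency, the minimum size of 8 is an assumption, and this sketch applies the "quad-tree first, then binary tree" variant: once a binary split is used, quad splits are no longer allowed.

```python
def qtbt_partition(x, y, w, h, decide, min_size=8, allow_quad=True):
    """Recursively split a CTU into final coding units given as (x, y, w, h).

    decide(x, y, w, h) returns 'quad', 'hor', 'ver' or None (no further split).
    """
    mode = decide(x, y, w, h)
    if mode == "quad" and allow_quad and w // 2 >= min_size and h // 2 >= min_size:
        hw, hh = w // 2, h // 2
        cus = []
        for cx, cy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
            cus += qtbt_partition(cx, cy, hw, hh, decide, min_size, True)
        return cus
    if mode == "hor" and h // 2 >= min_size:   # binary split into top/bottom
        return (qtbt_partition(x, y, w, h // 2, decide, min_size, False)
                + qtbt_partition(x, y + h // 2, w, h // 2, decide, min_size, False))
    if mode == "ver" and w // 2 >= min_size:   # binary split into left/right
        return (qtbt_partition(x, y, w // 2, h, decide, min_size, False)
                + qtbt_partition(x + w // 2, y, w // 2, h, decide, min_size, False))
    return [(x, y, w, h)]  # leaf: final coding unit


# Quad-split a 64x64 CTU once, then split its top-left 32x32 CU vertically.
decide = lambda x, y, w, h: "quad" if w == 64 else ("ver" if (x, y, w, h) == (0, 0, 32, 32) else None)
print(qtbt_partition(0, 0, 64, 64, decide))
```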
[0273] The term "unit" may be used interchangeably with terms such as "block" or "area" depending on the context. In general, an MxN block may represent a set of samples or transform coefficients consisting of M columns and N rows. A sample can generally represent a pixel or a pixel value, and may represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chroma component. The term "sample" may be used as a term corresponding to a pixel or pel of a single picture (or image).
[0274] The encoding device (15000) can generate a residual signal (residual block, residual sample array) by subtracting a prediction signal (predicted block, prediction sample array) output from an inter prediction unit (15090) or an intra prediction unit (15100) from an input image signal (original block, original sample array), and the generated residual signal is transmitted to a transformation unit (15030). In this case, as illustrated, the unit that subtracts the prediction signal (predicted block, prediction sample array) from the input image signal (original block, original sample array) within the encoder (15000) may be called a subtraction unit (15020). The prediction unit performs a prediction for a block to be processed (hereinafter referred to as the current block) and can generate a predicted block containing prediction samples for the current block. The prediction unit can determine whether intra prediction or inter prediction is applied at the current block or CU level. The prediction unit can generate various information regarding prediction, such as prediction mode information, as described below in the description of each prediction mode, and transmit it to the entropy encoding unit (15110). The information regarding prediction can be encoded in the entropy encoding unit (15110) and output in the form of a bitstream.
[0275] The intra prediction unit (15100) can predict the current block by referencing samples within the current picture. The referenced samples may be located near the current block or away from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a Planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on the degree of fineness of the prediction direction. However, this is merely an example, and depending on the settings, more or fewer directional prediction modes may be used. The intra prediction unit (15100) may also determine the prediction mode applied to the current block by using the prediction mode applied to the surrounding blocks.
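As an illustration of one non-directional mode, the sketch below implements a simplified DC prediction: every sample of the block is set to the rounded integer mean of the reconstructed samples directly above and to the left. For square blocks this matches the usual DC rule; the exact handling of non-square blocks and any boundary smoothing differ between codecs and are not modeled. The name dc_predict is hypothetical.

```python
def dc_predict(top, left):
    """DC mode: fill the block with the rounded mean of the reference samples.

    top: reconstructed samples above the block (length = block width)
    left: reconstructed samples left of the block (length = block height)
    """
    width, height = len(top), len(left)
    total = sum(top) + sum(left)
    count = width + height
    dc = (total + count // 2) // count  # integer mean with rounding
    return [[dc] * width for _ in range(height)]
```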
[0276] The inter prediction unit (15090) can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. At this time, to reduce the amount of motion information transmitted in the inter prediction mode, motion information can be predicted in blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference picture indices. Motion information may further include information on inter prediction directions (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter prediction, neighboring blocks may include spatial neighboring blocks existing within the current picture and temporal neighboring blocks existing in the reference picture. The reference picture containing the reference blocks and the reference picture containing the temporal neighboring blocks may be the same or different. Temporal surrounding blocks may be referred to by names such as collocated reference block, collocated CU (colCU), etc., and a reference picture containing temporal surrounding blocks may be referred to as a collocated picture (colPic). For example, the inter prediction unit (15090) may construct a list of motion information candidates based on surrounding blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter prediction may be performed based on various prediction modes, for example, in the case of skip mode and merge mode, the inter prediction unit (15090) may use the motion information of surrounding blocks as the motion information of the current block. 
In skip mode, unlike merge mode, a residual signal may not be transmitted. In motion vector prediction (MVP) mode, the motion vector of a neighboring block is used as a motion vector predictor, and the motion vector of the current block can be indicated by signaling a motion vector difference.
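The MVP mode can be sketched as: build a candidate list from neighboring motion vectors, pick the signaled candidate, and add the signaled motion vector difference. The candidate rules used here (scan order, removal of duplicates, a list size of 2, zero-vector padding) are illustrative assumptions; actual codecs define these rules precisely. Both function names are hypothetical.

```python
def build_mvp_candidates(neighbour_mvs, max_candidates=2):
    """Collect distinct motion vectors of available neighbours (None = unavailable)."""
    candidates = []
    for mv in neighbour_mvs:
        if mv is not None and mv not in candidates:
            candidates.append(mv)
        if len(candidates) == max_candidates:
            break
    while len(candidates) < max_candidates:
        candidates.append((0, 0))  # zero-vector padding
    return candidates


def decode_mv(neighbour_mvs, mvp_index, mvd):
    """MVP mode: motion vector = selected predictor + signaled difference."""
    mvp = build_mvp_candidates(neighbour_mvs)[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Only the predictor index and the difference are transmitted, which is how the mode reduces the amount of motion information.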
[0277] The prediction signal generated through the inter prediction unit (15090) and the intra prediction unit (15100) can be used to generate a restoration signal or to generate a residual signal.
[0278] The transformation unit (15030) can generate transform coefficients by applying a transformation technique to a residual signal. For example, the transformation technique may include at least one of a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Karhunen-Loeve Transform (KLT), a Graph-Based Transform (GBT), or a Conditionally Non-linear Transform (CNT). Here, GBT refers to a transformation obtained from a graph when the relationship information between pixels is represented as a graph. CNT refers to a transformation obtained based on a prediction signal generated using all previously reconstructed pixels. Additionally, the transformation may be applied to square pixel blocks of the same size or to non-square blocks of variable size.
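To illustrate the first of the listed transforms, the sketch below applies a floating-point orthonormal 2D DCT-II to a square residual block as C X C^T and inverts it as C^T Y C. Practical codecs use integer approximations of these basis functions, so this shows the principle rather than any codec's exact arithmetic. The function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    m = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        m.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return m


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def dct2d(block):
    """Separable forward transform of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2d(coeffs):
    """Inverse transform: C^T * Y * C (C is orthonormal, so C^-1 = C^T)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

A flat residual block concentrates all of its energy in the DC coefficient, which is why smooth residuals compress well after quantization.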
[0279] The quantization unit (15040) quantizes the transformation coefficients and transmits them to the entropy encoding unit (15110), and the entropy encoding unit (15110) can encode the quantized signal (information regarding the quantized transformation coefficients) and output it as a bitstream. The information regarding the quantized transformation coefficients may be called residual information. The quantization unit (15040) can rearrange the block-shaped quantized transformation coefficients into a one-dimensional vector form based on the coefficient scan order, and can also generate information regarding the quantized transformation coefficients based on the one-dimensional vector-shaped quantized transformation coefficients. The entropy encoding unit (15110) can perform various encoding methods such as, for example, exponential Golomb, CAVLC (context-adaptive variable length coding), CABAC (context-adaptive binary arithmetic coding), etc. The entropy encoding unit (15110) may encode information necessary for video / image restoration (e.g., values of syntax elements) together or separately, in addition to quantized transformation coefficients. The encoded information (e.g., encoded video / image information) may be transmitted or stored in the form of a bitstream in units of NAL (network abstraction layer) units. The bitstream may be transmitted through a network or stored in a digital storage medium. Here, the network may include a broadcasting network and / or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmission unit (not shown) that transmits the signal output from the entropy encoding unit (15110) and / or a storage unit (not shown) that stores it may be configured as internal / external elements of the encoding device (15000), or the transmission unit may be included in the entropy encoding unit (15110).
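The sketch below covers three steps named in [0279]: scalar quantization of transform coefficients, rearranging the 2D block into a 1D vector by a coefficient scan, and 0th-order exponential-Golomb binarization. The step size, the rounding rule and the up-right diagonal scan are illustrative assumptions. Actual codecs use integer scaling, codec-specific scan orders, and CABAC for coefficient levels. The exponential-Golomb construction is the standard 0th-order one.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization, rounding magnitudes half away from zero."""
    levels = []
    for row in coeffs:
        out_row = []
        for c in row:
            mag = int(abs(c) / step + 0.5)
            out_row.append(mag if c >= 0 else -mag)
        levels.append(out_row)
    return levels


def diagonal_scan(block):
    """Reorder a square block into a 1D list along up-right diagonals."""
    n = len(block)
    order = []
    for s in range(2 * n - 1):                 # s = x + y indexes each anti-diagonal
        for y in range(min(s, n - 1), -1, -1):
            x = s - y
            if x < n:
                order.append(block[y][x])
    return order


def exp_golomb(v):
    """0th-order exponential-Golomb codeword of an unsigned value, as a bit string."""
    bits = bin(v + 1)[2:]
    return "0" * (len(bits) - 1) + bits


levels = quantize([[10, -7], [3, 0]], 4)
print(levels, diagonal_scan(levels))      # [[3, -2], [1, 0]] [3, 1, -2, 0]
print([exp_golomb(v) for v in range(4)])  # ['1', '010', '011', '00100']
```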
[0280] Quantized transform coefficients output from the quantization unit (15040) can be used to generate a prediction signal. For example, a residual signal (residual block or residual samples) can be restored by applying inverse quantization and inverse transformation to the quantized transform coefficients through the inverse quantization unit (15050) and the inverse transformation unit (15060). An adder (155) can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the restored residual signal to the prediction signal output from the inter-prediction unit (15090) or the intra-prediction unit (15100). In cases where there is no residual for the block to be processed, such as when a skip mode is applied, the predicted block can be used as the reconstructed block. The adder (155) may be called a reconstruction unit or a reconstruction block generation unit. The generated restoration signal can be used for intra prediction of the next processing target block within the current picture, and can also be used for inter prediction of the next picture after filtering as described below.
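A minimal sketch of the encoder-side reconstruction loop in [0280] follows. It makes the simplifying assumption that the transform is skipped (identity), so only subtraction, quantization, inverse quantization, addition and clipping to the sample range are shown. The step size and bit depth are illustrative parameters, and the name reconstruct is hypothetical.

```python
def reconstruct(original, prediction, step, bit_depth=8):
    """Encoder-side reconstruction of one block, with the transform skipped."""
    max_val = (1 << bit_depth) - 1
    recon = []
    for orig_row, pred_row in zip(original, prediction):
        row = []
        for o, p in zip(orig_row, pred_row):
            residual = o - p                               # subtraction unit
            mag = int(abs(residual) / step + 0.5)          # quantization
            level = mag if residual >= 0 else -mag
            restored = level * step                        # inverse quantization
            row.append(min(max(p + restored, 0), max_val))  # adder + clipping
        recon.append(row)
    return recon


print(reconstruct([[120, 130]], [[118, 118]], 4))  # [[122, 130]]
```

Because the encoder keeps this reconstruction rather than the original samples as its reference, its later predictions match those of the decoder.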
[0281] The filtering unit (15070) can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit (15070) can generate a modified restored picture by applying various filtering methods to the restored picture, and can store the modified restored picture in memory (15080), specifically in the DPB of memory (15080). Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc. The filtering unit (15070) can generate various information regarding filtering and transmit it to the entropy encoding unit (15110), as described below in the description of each filtering method. The information regarding filtering can be encoded in the entropy encoding unit (15110) and output in the form of a bitstream.
[0282] The modified restored picture transmitted to the memory (15080) can be used as a reference picture in the inter-prediction unit (15090). Through this, when inter-prediction is applied, the encoding device can avoid prediction mismatches between the encoding device (15000) and the decoding device, and can also improve encoding efficiency.
[0283] The DPB of the memory (15080) can store the modified restored picture to be used as a reference picture in the inter-prediction unit (15090). The memory (15080) can store motion information of blocks from which motion information is derived (or encoded) within the current picture and / or motion information of blocks within the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit (15090) to be used as motion information of spatially neighboring blocks or motion information of temporally neighboring blocks. The memory (15080) can store restoration samples of the blocks restored within the current picture and transmit them to the intra-prediction unit (15100).
[0284] Meanwhile, at least one of the aforementioned prediction, transformation, and quantization procedures may be omitted. For example, for a block to which pulse code modulation (PCM) mode is applied, the prediction, transformation, and quantization procedures may be omitted, and the value of the original sample may be encoded as is and output as a bitstream.
[0285] FIG. 22 illustrates a V-PCC decoding process according to embodiments.
[0286] FIG. 22 illustrates the configuration or operation of the point cloud video decoder of FIG. 1 in more detail.
[0287] Referring to FIG. 22, an example of the configuration of a decoding and post-processing pipeline for point cloud data according to embodiments is described. As shown in FIG. 22, an input bitstream is separated into multiple streams by a demultiplexer, and one of the separated streams is provided to an SPS parsing module to obtain encoding parameters and signaling information included in a sequence parameter set (SPS).
[0288] The patch sequence decompression module decodes a patch sequence containing patch indices, patch location and size information, etc., and the multiple video streams are input to the corresponding video decompression modules and decoded into multiple video frames used as geometry data, attribute data, occupancy information, etc.
[0289] The decoded patch sequence and multiple video frames are input into the Geometry / Attribute Reconstruction module, where point-based geometric coordinates and attribute values are reconstructed based on each patch and video signal.
[0290] The reconstructed geometry is passed to the Geometry Post-Processing module, where geometric noise can be reduced and continuity improved through post-processing operations such as smoothing, filtering, and hole filling.
[0291] The attribute transfer and smoothing module can improve the visual quality of the entire point cloud by referencing post-processed geometric information to interpolate or filter attribute values between adjacent points and transferring attribute information decoded from different viewpoints or layers to the target point cloud.
[0292] As shown in FIG. 22, a decoding process of V-PCC is performed to reconstruct a point cloud by decoding the compressed occupancy map, geometry image, texture (attribute) image, and auxiliary patch information.
[0293] FIG. 23 shows an example of a 2D video / image decoder according to embodiments.
[0294] The decoder in Fig. 23 can correspond to the decoder in Fig. 22. The 2D video / image decoder can follow the inverse process of the 2D video / image encoder in Fig. 21.
[0295] The 2D video / image decoder of FIG. 23 is an embodiment of the video decompression or video decompressor of FIG. 22, and represents a schematic block diagram of a 2D video / image decoder (17000) in which decoding of a video / image signal is performed. The 2D video / image decoder (17000) may be included in the point cloud video decoder of FIG. 1, or may be composed of internal / external components. Each component of FIG. 23 may correspond to software, hardware, a processor, and / or a combination thereof.
[0296] Here, the input bitstream may include a bitstream for the geometry image, texture image (attribute(s) image), occupancy map image, etc. described above. The reconstructed image (or output image, decoded image) may represent a reconstructed image for the geometry image, texture image (attribute(s) image), and occupancy map image described above.
[0297] Referring to the drawings, the inter prediction unit (17070) and the intra prediction unit (17080) may be collectively referred to as the prediction unit. That is, the prediction unit may include the inter prediction unit (17070) and the intra prediction unit (17080). The inverse quantization unit (17020) and the inverse transform unit (17030) may be collectively referred to as the residual processing unit. That is, the residual processing unit may include the inverse quantization unit (17020) and the inverse transform unit (17030). The above-described entropy decoding unit (17010), inverse quantization unit (17020), inverse transform unit (17030), addition unit (17040), filtering unit (17050), inter prediction unit (17070), and intra prediction unit (17080) may be configured by a single hardware component (e.g., a decoder or a processor) according to the embodiment. In addition, the memory (17060) may include a DPB (decoded picture buffer) and may be configured by a digital storage medium.
[0298] When a bitstream containing video / image information is input, the decoding device (17000) can restore the image in correspondence with the process in which the video / image information is processed by the encoding device. For example, the decoding device (17000) can perform decoding using a processing unit applied by the encoding device. Accordingly, the processing unit for decoding may be, for example, a coding unit, and the coding unit may be divided along a quad tree structure and / or a binary tree structure from a coding tree unit or a maximum coding unit. And, the restored image signal decoded and output through the decoding device (17000) can be played back through a playback device.
[0299] The decoding device (17000) can receive a signal output from the encoding device in the form of a bitstream, and the received signal can be decoded through the entropy decoding unit (17010). For example, the entropy decoding unit (17010) can parse the bitstream to derive information (e.g., video / image information) required for image restoration (or picture restoration). For example, the entropy decoding unit (17010) can decode information within the bitstream based on coding methods such as exponential Golomb coding, CAVLC, or CABAC, and output values of syntax elements required for image restoration and quantized values of transform coefficients for the residual. More specifically, the CABAC entropy decoding method receives a bin corresponding to each syntax element in the bitstream, determines a context model using information on the syntax element to be decoded, decoding information of neighboring and target blocks, or information on symbols / bins decoded in the previous step, predicts the probability of bin occurrence according to the determined context model, and performs arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. At this time, after determining the context model, the CABAC entropy decoding method can update the context model using information on the decoded symbol / bin for the context model of the next symbol / bin. Information regarding prediction among the information decoded in the entropy decoding unit (17010) is provided to the prediction unit (inter prediction unit (17070) and intra prediction unit (17080)), and the residual values on which entropy decoding has been performed in the entropy decoding unit (17010), i.e., quantized transform coefficients and related parameter information, can be input to the inverse quantization unit (17020).
Additionally, information regarding filtering among the information decoded in the entropy decoding unit (17010) can be provided to the filtering unit (17050). Meanwhile, a receiving unit (not shown) that receives a signal output from an encoding device may be further configured as an internal / external element of the decoding device (17000), or the receiving unit may be a component of the entropy decoding unit (17010).
[0300] In the inverse quantization unit (17020), the quantized transform coefficients can be inversely quantized to output transform coefficients. The inverse quantization unit (17020) can rearrange the quantized transform coefficients into a two-dimensional block form. In this case, the rearrangement can be performed based on the coefficient scan order performed by the encoding device. The inverse quantization unit (17020) can perform inverse quantization on the quantized transform coefficients using quantization parameters (e.g., quantization step size information) and obtain transform coefficients.
[0301] In the inverse transform unit (17030), the transform coefficients are inverse-transformed to obtain a residual signal (residual block, residual sample array).
[0302] The prediction unit performs a prediction for the current block and can generate a predicted block containing prediction samples for the current block. Based on the prediction information output from the entropy decoding unit (17010), the prediction unit can determine whether an intra prediction or an inter prediction is applied to the current block and can determine a specific intra / inter prediction mode.
[0303] The intra prediction unit (17080) can predict the current block by referring to samples within the current picture. The referenced samples may be located near the current block or away from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra prediction unit (17080) may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring blocks.
[0304] The inter prediction unit (17070) can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. At this time, to reduce the amount of motion information transmitted in the inter prediction mode, motion information can be predicted in blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. Motion information may include a motion vector and a reference picture index. Motion information may further include information on the inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter prediction, neighboring blocks may include spatial neighboring blocks existing within the current picture and temporal neighboring blocks existing in the reference picture. For example, the inter prediction unit (17070) can construct a motion information candidate list based on the neighboring blocks and derive the motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter-prediction can be performed based on various prediction modes, and information regarding the prediction may include information indicating the mode of inter-prediction for the current block.
[0305] The adder (17040) can generate a restored signal (restored picture, restored block, restored sample array) by adding the acquired residual signal to the prediction signal (predicted block, predicted sample array) output from the inter prediction unit (17070) or the intra prediction unit (17080). When there is no residual for the block being processed, such as when skip mode is applied, the predicted block can be used as the restored block.
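The adder step can be sketched as follows. This is a minimal illustration, not the decoder's actual implementation: it assumes blocks are 2D lists of integer samples and that the sum is clipped to the sample range of a given bit depth (clipping and the 10-bit default are assumptions here, not stated in the text). Passing no residual models the skip-mode case.

```python
def reconstruct_block(pred, residual=None, bit_depth=10):
    """Return pred + residual, clipped to [0, 2**bit_depth - 1].

    residual=None models skip mode: the predicted block is used as the
    restored block unchanged.
    """
    if residual is None:
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1  # assumed valid sample range for this bit depth
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred, residual)
    ]


pred = [[500, 1020], [3, 512]]
res = [[10, 10], [-5, 0]]
restored = reconstruct_block(pred, res)
```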
[0306] The adder (17040) may also be called a restoration unit or a restored block generation unit. The generated restored signal may be used for intra prediction of the next block to be processed within the current picture and, after filtering as described below, for inter prediction of the next picture.
[0307] The filtering unit (17050) can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit (17050) can generate a modified restored picture by applying various filtering methods to the restored picture, and can transmit the modified restored picture to memory (17060), specifically to the DPB of memory (17060). Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc.
[0308] The (modified) restored picture stored in the DPB of the memory (17060) can be used as a reference picture in the inter prediction unit (17070). The memory (17060) can store the motion information of blocks in the current picture from which motion information has been derived (or decoded) and / or the motion information of blocks in a picture that has already been restored. The stored motion information can be transmitted to the inter prediction unit (17070) to be used as motion information of spatially neighboring blocks or of temporally neighboring blocks. The memory (17060) can store restored samples of blocks restored within the current picture and transmit them to the intra prediction unit (17080).
[0309] In this specification, the embodiments described in the filtering unit (160), inter prediction unit (180), and intra prediction unit (185) of the encoding device (100) may be applied to the filtering unit (17050), inter prediction unit (17070), and intra prediction unit (17080) of the decoding device (17000), respectively, in the same or corresponding manner.
[0310] Meanwhile, at least one of the aforementioned prediction, transformation, and quantization procedures may be omitted. For example, for blocks to which pulse code modulation (PCM) mode is applied, the prediction, transformation, and quantization procedures may be omitted, and the decoded sample values may be used as samples of the reconstructed image.
[0311] Occupancy map decompression
[0312] This is the reverse of the occupancy map compression described earlier: the compressed occupancy map bitstream is decoded to restore the occupancy map.
[0313] Auxiliary patch info decompression
[0314] The auxiliary patch info can be restored by performing the reverse process of the auxiliary patch info compression described earlier and decoding the compressed auxiliary patch info bitstream.
[0315] Geometry reconstruction
[0316] This is the inverse process of the geometry image generation described earlier. First, patches are extracted from the geometry image using the restored occupancy map, the 2D position / size information of the patches included in the auxiliary patch info, and the mapping information between blocks and patches. Subsequently, a point cloud is restored in 3D space using the geometry image of the extracted patches and the 3D position information of the patches included in the auxiliary patch info. Let g(u, v) be the geometry value corresponding to an arbitrary point (u, v) existing within a single patch, and let (d0, s0, r0) be the normal, tangent, and bitangent axis coordinate values of the patch's position in 3D space. Then, the normal, tangent, and bitangent axis coordinate values d(u, v), s(u, v), and r(u, v) of the 3D space position mapped to point (u, v) can be expressed as follows.
[0317] d(u, v) = d0 + g(u, v)
[0318] s(u, v) = s0 + u
[0319] r(u, v) = r0 + v
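The three equations above can be applied to every occupied pixel of a patch. The sketch below assumes (hypothetically) that the patch's geometry and occupancy are row-major 2D lists indexed as [v][u] with patch-local coordinates starting at 0; converting (d, s, r) back to (x, y, z) depends on the patch's projection axis and is not shown.

```python
def reconstruct_patch(geometry, occupancy, d0, s0, r0):
    """Map occupied patch pixels (u, v) to (d, s, r) coordinates in 3D.

    geometry[v][u] holds g(u, v); occupancy[v][u] is truthy where a point exists.
    Returns ((d, s, r), (u, v)) pairs so that later stages can look up the
    texture pixel at the same 2D position.
    """
    points = []
    for v, (g_row, o_row) in enumerate(zip(geometry, occupancy)):
        for u, (g, occupied) in enumerate(zip(g_row, o_row)):
            if occupied:
                # d = d0 + g(u, v), s = s0 + u, r = r0 + v
                points.append(((d0 + g, s0 + u, r0 + v), (u, v)))
    return points


pts = reconstruct_patch([[5, 6], [7, 8]], [[1, 0], [0, 1]], d0=100, s0=10, r0=20)
```

Keeping the (u, v) origin of each point is what makes the mapping information between the geometry image and the point cloud available to texture reconstruction.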
[0320] Smoothing (16006)
[0321] It is identical to the smoothing described earlier in the encoding process and is a process designed to eliminate discontinuities that may occur at patch boundaries due to image quality degradation during compression.
[0322] Texture reconstruction
[0323] This is a process of restoring a color point cloud by assigning color values to each point constituting the smoothed point cloud. This can be performed by using the mapping information between the geometry image and the point cloud from the geometry reconstruction process described in 2.4, and assigning color values corresponding to texture image pixels at the same location in the geometry image in 2D space to points in the point cloud at the same location in 3D space.
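The color assignment can be sketched as a lookup of the texture pixel at each point's 2D position. This assumes, for illustration only, that each restored point has kept the (u, v) position it came from and that the texture is a row-major 2D list of (R, G, B) tuples in the same layout as the geometry image.

```python
def assign_colors(points_with_uv, texture):
    """Attach texture[v][u] to each restored point.

    points_with_uv: list of ((d, s, r), (u, v)) pairs retained from geometry
    reconstruction; texture: 2D list of (R, G, B) tuples at the same 2D layout.
    """
    return [(xyz, texture[v][u]) for xyz, (u, v) in points_with_uv]


pts = [((105, 10, 20), (0, 0)), ((108, 11, 21), (1, 1))]
tex = [[(255, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 255, 0)]]
colored = assign_colors(pts, tex)
```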
[0324] Color smoothing
[0325] Similar to the geometry smoothing process described earlier, this is a task designed to eliminate discontinuities in color values that may occur at patch boundaries due to image quality degradation during compression. It can be performed through the following process.
[0326] 1) Calculate the adjacent points of each point constituting the reconstructed color point cloud using a KD tree, etc. Alternatively, the adjacent point information calculated during the geometry smoothing process described in Section 2.5 can be used as is.
[0327] 2) For each point, determine whether the point is located on the patch boundary. The boundary information calculated during the geometry smoothing process described in Section 2.5 may be used as is.
[0328] 3) For the points adjacent to each point on the boundary, the distribution of color values is examined to determine whether smoothing is necessary. For example, if the entropy of the luminance values is below a local entropy threshold (i.e., many of the luminance values are similar), the area is determined to be a non-edge area and smoothing can be performed. As a smoothing method, the color value of the point may be replaced with the average color of its adjacent points.
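Steps 1) to 3) can be sketched as below. Several choices are assumptions made only for illustration: a brute-force k-nearest-neighbor search stands in for the KD tree, boundary flags are supplied by the caller, luminance uses BT.601 weights, and the entropy is computed over an 8-bin histogram with an arbitrary threshold of 1 bit.

```python
import math


def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 luma weights (assumption)


def entropy(values, bins=8, max_value=256):
    """Shannon entropy (bits) of a histogram of the given values."""
    counts = [0] * bins
    for val in values:
        counts[min(int(val * bins / max_value), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def smooth_colors(positions, colors, on_boundary, k=4, threshold=1.0):
    """Replace boundary-point colors by the neighbor mean in low-entropy areas."""
    out = list(colors)
    for i, p in enumerate(positions):
        if not on_boundary[i]:
            continue
        # 1) adjacent points: brute-force kNN (a KD tree would be used in practice)
        order = sorted(range(len(positions)), key=lambda j: math.dist(p, positions[j]))
        nbrs = [j for j in order if j != i][:k]
        # 3) low luminance entropy => non-edge area => smooth with the average color
        if entropy([luminance(colors[j]) for j in nbrs]) < threshold:
            out[i] = tuple(sum(colors[j][c] for j in nbrs) / len(nbrs) for c in range(3))
    return out
```

Colors are read from the unsmoothed input, so the result does not depend on the order in which boundary points are visited.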
[0329] FIG. 24 shows an example of an operation flowchart of a transmitting device according to embodiments.
[0330] A transmitting device according to the embodiments may correspond to the transmitting device of FIG. 1, the encoding process of FIG. 4, and the 2D video / image encoder of FIG. 21, or may perform some or all of their operations. Each component of the transmitting device may correspond to software, hardware, a processor, and / or a combination thereof.
[0331] The operation process of the transmitting end for compressing and transmitting point cloud data using V-PCC can be as shown in the diagram.
[0332] The point cloud data transmission device according to the embodiments may be referred to as a transmission device, etc.
[0333] The patch generation unit (18000) first generates patches for mapping the point cloud onto a 2D image. Additional patch information is generated as a result of patch generation, and this information can be used in geometry image generation, texture image generation, and geometry restoration for smoothing.
[0334] The patch packing unit (18001) performs a patch packing process that maps the generated patches into a 2D image. An occupancy map can be generated as a result of patch packing, and the occupancy map can be used in geometry image generation, texture image generation, and geometry restoration for smoothing.
[0335] The geometry image generation unit (18002) generates a geometry image using additional patch information and an occupancy map, and the generated geometry image is encoded into a single bitstream through video encoding.
[0336] The encoding preprocessing (18003) may include an image padding procedure. The generated geometry image or the geometry image regenerated by decoding the encoded geometry bitstream can be used for 3D geometry restoration and can then undergo a smoothing process.
[0337] The texture image generation unit (18004) can generate a texture image using (smoothed) 3D geometry, a point cloud, additional patch information, and an occupancy map. The generated texture image can be encoded into a single video bitstream.
[0338] The metadata encoding unit (18005) can encode additional patch information into a single metadata bitstream.
[0339] The video encoding unit (18006) can encode the occupancy map into a single video bitstream.
[0340] The multiplexer (18007) multiplexes the video bitstreams of the generated geometry image, texture image, and occupancy map and the additional patch information metadata bitstream into a single bitstream.
[0341] The transmitter (18008) can transmit the bitstream to the receiver. Alternatively, the video bitstreams of the generated geometry image, texture image, and occupancy map and the additional patch information metadata bitstream can be stored as a file with one or more tracks of data or encapsulated into segments, and transmitted to the receiver through the transmitter.
[0342] FIG. 25 shows an example of an operation flowchart of a receiving device according to embodiments.
[0343] A receiving device according to the embodiments may correspond to the receiving device of FIG. 1, the decoding process of FIG. 16, and the 2D video / image decoder of FIG. 23, or may perform some or all of their operations. Each component of the receiving device may correspond to software, hardware, a processor, and / or a combination thereof.
[0344] The operation process of the receiving end for receiving and restoring point cloud data using V-PCC can be as shown in the figure. The operation of the V-PCC receiving end can follow the inverse process of the operation of the V-PCC transmitting end in Fig. 18.
[0345] The point cloud data receiving device according to the embodiments may be referred to as a receiving device, etc.
[0346] After file / segment decapsulation, the received point cloud bitstream is demultiplexed by the demultiplexer (19000) into video bitstreams of the compressed geometry image, texture image, and occupancy map and an additional patch information metadata bitstream. The video decoding unit (19001) and the metadata decoding unit (19002) decode the demultiplexed video bitstreams and metadata bitstream. The geometry restoration unit (19003) restores the 3D geometry using the decoded geometry image, occupancy map, and additional patch information, and the restored geometry then undergoes a smoothing process in the smoother (19004). The texture restoration unit (19005) restores the color point cloud image / picture by assigning color values to the smoothed 3D geometry using the texture image. A color smoothing process may then be additionally performed to improve objective / subjective visual quality, and the resulting modified point cloud image / picture is displayed to the user through a rendering process (e.g., by a point cloud renderer). The color smoothing process may be omitted depending on the case.
[0347] In the structure according to the embodiments, at least one of a server, a robot, an autonomous vehicle, an XR device, a smartphone, a home appliance, and / or an HMD is connected to a cloud network. Here, a robot, an autonomous vehicle, an XR device, a smartphone, or a home appliance (etc.) may be referred to as a device. Additionally, the XR device may correspond to or be linked with a point cloud data (PCC) device according to the embodiments.
[0348] A cloud network may refer to a network that constitutes part of a cloud computing infrastructure or exists within a cloud computing infrastructure. Here, the cloud network may be configured using a 3G network, a 4G or LTE (Long Term Evolution) network, or a 5G network, etc.
[0349] The server is connected to at least one of a robot, autonomous vehicle, XR device, smartphone, home appliance, and / or HMD via a cloud network and can assist in at least some of the processing of the connected devices.
[0350] An HMD (Head-Mounted Display) is one of the forms in which an XR device and / or PCC device according to the embodiments can be implemented. An HMD-type device according to the embodiments includes a communication unit, a control unit, a memory unit, an I / O unit, a sensor unit, a power supply unit, etc.
[0351] Hereinafter, various embodiments of a device to which the above-described technology is applied are described.
[0352] <PCC+XR> XR / PCC devices may be implemented as HMDs (Head-Mounted Displays), HUDs (Head-Up Displays) installed in vehicles, televisions, mobile phones, smartphones, computers, wearable devices, home appliances, digital signage, vehicles, stationary robots, or mobile robots by applying PCC and / or XR (AR+VR) technology.
[0353] The XR / PCC device can analyze 3D point cloud data or image data acquired through various sensors or from an external device to generate position data and attribute data for 3D points, thereby obtaining information about the surrounding space or real-world objects, and can render and output an XR object. For example, the XR / PCC device can output an XR object containing additional information about a recognized object in association with that object.
[0354] <PCC+Autonomous Driving+XR> Autonomous vehicles can be implemented as mobile robots, vehicles, unmanned aerial vehicles, etc., by applying PCC technology and XR technology.
[0355] An autonomous vehicle equipped with XR / PCC technology may refer to an autonomous vehicle equipped with means for providing XR images, or an autonomous vehicle that is the subject of control / interaction within the XR images. In particular, an autonomous vehicle that is the subject of control / interaction within the XR images is distinct from the XR device and can be interconnected with it.
[0356] An autonomous vehicle equipped with means for providing XR / PCC images can acquire sensor information from sensors including cameras and output XR / PCC images generated based on the acquired sensor information. For example, an autonomous vehicle equipped with a HUD can output an XR / PCC image through the HUD, thereby providing the occupant with an XR / PCC object corresponding to a real-world object or to an object on the screen.
[0357] In this case, when an XR / PCC object is displayed on a HUD, at least a portion of the XR / PCC object may be displayed so as to overlap with the actual object to which the occupant's gaze is directed. Conversely, when an XR / PCC object is displayed on a display installed inside the autonomous vehicle, at least a portion of the XR / PCC object may be displayed so as to overlap with an object on the screen. For example, the autonomous vehicle may display XR / PCC objects corresponding to objects such as lanes, other vehicles, traffic lights, traffic signs, motorcycles, pedestrians, and buildings.
[0358] VR (Virtual Reality) technology, AR (Augmented Reality) technology, MR (Mixed Reality) technology and / or PCC (Point Cloud Compression) technology according to the embodiments can be applied to various devices.
[0359] In other words, VR technology is a display technology that provides real-world objects or backgrounds solely as CG images. On the other hand, AR technology refers to a technology that displays virtual CG images alongside images of real objects. Furthermore, MR technology is similar to the aforementioned AR technology in that it mixes and combines virtual objects with the real world. However, it is distinguished from AR technology in that while AR technology maintains a clear distinction between real-world objects and virtual objects created from CG images, using virtual objects to complement real-world objects, MR technology regards virtual objects as having the same nature as real-world objects. To give a more specific example, the aforementioned MR technology is applied in hologram services.
[0360] However, recently, rather than clearly distinguishing between VR, AR, and MR technologies, they are also referred to as XR (extended Reality) technology. Therefore, embodiments of the present invention are applicable to all VR, AR, MR, and XR technologies. Such technology may utilize encoding / decoding based on PCC, V-PCC, and G-PCC technologies.
[0361] The PCC method / device according to the embodiments can be applied to a vehicle providing autonomous driving services.
[0362] Vehicles providing autonomous driving services are connected to PCC devices to enable wired / wireless communication.
[0363] When a point cloud data (PCC) transceiver according to the embodiments is connected to a vehicle for wired or wireless communication, it can receive and process content data related to AR / VR / PCC services that can be provided along with an autonomous driving service, and transmit it to the vehicle. Additionally, when the point cloud data transceiver is mounted on a vehicle, the point cloud transceiver can receive and process content data related to AR / VR / PCC services according to a user input signal received through a user interface device and provide it to the user. A vehicle or a user interface device according to the embodiments can receive a user input signal. The user input signal according to the embodiments may include a signal indicating an autonomous driving service.
[0364] FIG. 26 shows 3DGS (3D Gaussian Splatting) data components according to embodiments.
[0365] The embodiments include a method and device for encoding / decoding 3DGS data based on a video codec.
[0366] 3D Gaussian Splatting (GS) technology has recently been actively researched. 3D Gaussian Splatting represents 3D space as a collection of small Gaussians. It uses the Gaussian distribution, whose values gradually decrease with distance from the center, to represent 3D space and to construct and render scenes. It is more efficient than the previously researched NeRF technology: because 3D Gaussian Splatting effectively represents the details of complex 3D scenes with Gaussian splats, it can achieve similar performance with relatively less data than NeRF. Furthermore, because of the properties of the Gaussian distribution, overlapping splats blend naturally to produce smooth and realistic images, which allows more natural video quality.
[0367] MPEG is also paying attention to 3D GS, and discussions on compression methods for 3D GS data are underway. Because 3D GS uses points in 3D space, it shares many characteristics with point clouds, and in some cases the same attributes are used. MPEG is therefore working to interpret 3D GS data as an extended form of point cloud and to create a compression standard. Of the two point cloud compression methods, G-PCC and V-PCC, the present invention proposes a method for compressing 3D GS based on and extending V-PCC (V3C). That is, the embodiments relate to the V-GSC (Video-based Gaussian Splat Coding) method, which uses an existing 2D video codec to compress 4D GS, the time-axis extension of 3D GS data representing three-dimensional space.
[0368] The embodiments relate to an encoding / decoding method for 3D GS processing and compression in V-PCC encoding / decoding. A method for efficiently compressing attributes and data constituting 3D GS based on V-PCC is proposed.
[0369] The range of data to be compressed according to the embodiments includes not only 3D GS but also 4D GS, which is a time-axis extension concept of 3D GS, wherein 4D GS refers to a data sequence in which each frame consists of 3D GS data. (If applied to a general video codec, 3D GS can be understood as a single video frame, and 4D GS as a video sequence.) That is, 3D GS refers to a single still volumetric image frame representing a three-dimensional space of a specific time period composed of numerous Gaussian splats, and 4D GS refers to a dynamic visual volumetric data sequence composed of a set of dynamic volumetric frames that change along the time axis. Unless otherwise specifically stated, the functions described in this document apply commonly to both 3D GS and 4D GS.
[0370] In addition, in the present invention, V3C is merely one example for utilizing a video codec and is not limited thereto, and can also be described based on general video codecs.
[0371] Existing V-PCC encoding / decoding standards were developed to efficiently encode and decode point cloud data. Such point cloud data stores attribute information along with vertices, and in practice the attribute types have been developed primarily to support color information only. In contrast, GS, which is currently under research, has more components (coefficients / parameters) than existing point cloud data.
[0372] Figure 26 shows the components of GS data. The V-PCC encoder and decoder of Figures 1 to 25 described above can encode and decode point cloud data including the GS data of Figure 26.
[0373] As shown in Fig. 26, 3DGS data is broadly divided into geometry and attributes. In the geometry part, as in the existing V-PCC / V3C, 3 dimensions (x,y,z) are used to represent the positions (of vertices). Additionally, 4 dimensions (x,y,z,w) are used for rotation information, and 3 dimensions are used for scale representation. Furthermore, in the attributes, 1 dimension for opacity, 3 dimensions for DC, and 45 dimensions for SH coefficient are used.
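The per-splat layout just listed can be written down as a small table of dimensions. The sketch below is illustrative only: the dictionary keys and the flattening order are hypothetical names chosen here, while the dimension counts come from the text (3 + 4 + 3 geometry values and 1 + 3 + 45 attribute values, 59 in total). The 45 SH dimensions are consistent with degree-3 spherical harmonics, whose 16 basis functions per color channel leave 15 AC coefficients after the DC term.

```python
# Dimensions per splat as described for FIG. 26 (key names are hypothetical).
GEOMETRY_DIMS = {"position": 3, "rotation": 4, "scale": 3}
ATTRIBUTE_DIMS = {"opacity": 1, "dc": 3, "sh_ac": 45}  # 45 = 15 AC coefficients x 3 channels

TOTAL_DIMS = sum(GEOMETRY_DIMS.values()) + sum(ATTRIBUTE_DIMS.values())  # 10 + 49


def flatten_splat(splat):
    """Concatenate one splat's parameters in a fixed order, checking each size."""
    layout = {**GEOMETRY_DIMS, **ATTRIBUTE_DIMS}
    values = []
    for name, dim in layout.items():
        v = splat[name]
        v = [v] if isinstance(v, (int, float)) else list(v)
        if len(v) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(v)}")
        values.extend(v)
    return values


splat = {"position": (0, 0, 0), "rotation": (0, 0, 0, 1), "scale": (1, 1, 1),
         "opacity": 0.5, "dc": (0.1, 0.2, 0.3), "sh_ac": [0.0] * 45}
flat = flatten_splat(splat)
```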
[0374] Accordingly, the present invention proposes a method based on a 2D video codec, focusing on an embodiment that extends V-PCC (V3C) to compress 3D GS and 4D GS data.
[0375] As illustrated in FIG. 26, each GSplat can be divided into geometry information and attribute information. The geometry information may first include three-dimensional (x, y, z) components as position information to indicate the location of vertices, and four-dimensional (x, y, z, w) components as rotation information to indicate the direction of the Gaussian. Additionally, three-dimensional components may be used as scale information to define the size and spread of the Gaussian. Meanwhile, the attribute information may first include three-dimensional components as DCs (DC components) to indicate the default values of color or luminance components, and 45-dimensional components as spherical harmonic coefficients (SH coefficients) to express the directional lighting characteristics of the object surface. Furthermore, opacity information indicating the transparency or opacity of each GSplat may be included as one-dimensional components. As such, according to the structure illustrated in FIG. 26, a single GSplat can be represented as a set of multidimensional parameters corresponding to geometry and attributes, and high-quality volumetric image data can be represented by efficiently encoding and decoding this set of parameters.
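To show how the rotation and scale parameters define the size, spread, and direction of a Gaussian, the sketch below builds the covariance Sigma = R S S^T R^T commonly used in 3D Gaussian Splatting (R from the unit quaternion, S = diag(scale)) and evaluates the falloff exp(-0.5 d^T Sigma^-1 d). It assumes the quaternion is given in the (x, y, z, w) order named in the text and that scale holds linear standard deviations; some implementations store log-scales instead, which would need an exp() first.

```python
import math


def quat_to_matrix(qx, qy, qz, qw):
    """Rotation matrix of the normalized quaternion (x, y, z, w)."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(rotation, scale):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_matrix(*rotation)
    M = [[R[i][k] * scale[k] for k in range(3)] for i in range(3)]  # M = R S
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def gaussian_weight(point, center, rotation, scale):
    """exp(-0.5 d^T Sigma^-1 d), computed as |S^-1 R^T d|^2 in the splat's local frame."""
    R = quat_to_matrix(*rotation)
    d = [p - c for p, c in zip(point, center)]
    local = [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]  # R^T d
    return math.exp(-0.5 * sum((local[i] / scale[i]) ** 2 for i in range(3)))
```

Computing the falloff in the local frame avoids inverting Sigma explicitly, since Sigma^-1 = R S^-2 R^T.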
[0376] In this document, the term V-PCC (Video-based Point Cloud Compression) may be used interchangeably with V3C (Visual Volumetric Video-based Coding). Therefore, the term V-PCC in this document may be interpreted as V3C.
[0377] The term 3DGS used in this document has the same meaning as Gaussian splatting, Gsplat, GS, etc., and these terms may be used interchangeably.
[0378] As mentioned above, the term 4DGS used in this document is a concept that adds time t (temporal) to the concept of 3DGS. It is a term specifically used for video data in a 3DGS dataset that has one or more varying parameters as time changes. Therefore, dynamic 3DGS and moving 3DGS also have the same meaning.
[0379] Gaussian splatting content can be encoded into a V3C bitstream structure. Figure 2 below shows the V3C bitstream structure used when encoding V3C content according to the V3C codec document (ISO / IEC 23090-5).
[0380] Referring to FIG. 4, a V3C unit in the bitstream consists of a V3C unit header and a payload, and the V3C unit types may include V3C_VPS, V3C_AD, V3C_GVD, V3C_AVD, V3C_OVD, etc.
[0381] VPS: A V3C / V-GSC Parameter Set containing V3C and V-GSC parameter information.
[0382] AD: Atlas Data, which can include V3C atlas information.
[0383] GVD: Geometry Video Data, which includes geometry video sub-bitstreams and related information.
[0384] AVD: Attribute Video Data, which includes attribute video sub-bitstreams and related information.
[0385] OVD: Occupancy Video Data, which includes the occupancy video sub-bitstream and related information.
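A receiver dispatches each V3C unit on its type, carried in the unit header. The sketch below assumes, to our understanding of ISO/IEC 23090-5, that vuh_unit_type occupies the first 5 bits of the header and that the listed types take the values shown; these values should be checked against the edition of the specification in use. Reserved values raise ValueError here.

```python
from enum import IntEnum


class V3CUnitType(IntEnum):
    # vuh_unit_type values as we understand them from ISO/IEC 23090-5 (verify).
    V3C_VPS = 0  # parameter set
    V3C_AD = 1   # atlas data
    V3C_OVD = 2  # occupancy video data
    V3C_GVD = 3  # geometry video data
    V3C_AVD = 4  # attribute video data
    V3C_PVD = 5  # packed video data
    V3C_CAD = 6  # common atlas data


def unit_type(header: bytes) -> V3CUnitType:
    """Read vuh_unit_type, assumed to be the 5 most significant bits of the header."""
    return V3CUnitType(header[0] >> 3)


kind = unit_type(bytes([0x18, 0x00, 0x00, 0x00]))  # 0x18 = 0b00011_000 -> 3
```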
[0386] The definition of the data configuration according to the embodiments is as follows.
[0387] DC component (DC coefficient): "DC component" refers to the coefficient with degree l = 0 and order m = 0 when the color or radiance of the Gaussian splat is expressed in Spherical Harmonics (SH). The DC component is the zero-order component representing a constant base color or brightness regardless of viewing direction, and defines the direction-independent (view-independent) base color information of the Gaussian splat.
[0388] Spherical Harmonic Coefficients (SH Coefficients): "Spherical harmonic coefficients (SH coefficients)" refer to coefficients used to express the color or radiance distribution of a Gaussian splat as a linear combination of spherical harmonic basis functions, and may include a DC component (zero-order coefficient) or higher-order coefficients excluding it. Spherical harmonic coefficients are parameters used to model direction-dependent characteristics, such as changes in color or brightness according to the line of sight, and can be used to ensure that each Gaussian splat has different color values depending on the direction of observation.
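The difference between the view-independent DC term and the direction-dependent higher-order terms can be sketched by evaluating SH up to degree 1. The constants are the standard real-SH normalizations (1 / (2 sqrt(pi)) and sqrt(3 / (4 pi))); the basis ordering and signs (-y, z, -x) follow common 3D Gaussian Splatting code and are an assumption, as is stopping at degree 1 (the full degree-3 case adds 12 more AC terms per channel, for 15 in total).

```python
import math

SH_C0 = 0.28209479177387814  # Y_0^0 = 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3 / (4 * pi)), degree-1 normalization


def eval_sh_deg1(dc, ac, direction):
    """Color from the DC term plus the three degree-1 AC terms per channel.

    dc: (r, g, b); ac: three (r, g, b) degree-1 coefficients;
    direction: viewing direction (normalized here).
    """
    n = math.sqrt(sum(c * c for c in direction))
    x, y, z = (c / n for c in direction)
    basis = (-SH_C1 * y, SH_C1 * z, -SH_C1 * x)  # ordering/signs assumed (see lead-in)
    return tuple(
        SH_C0 * dc[ch] + sum(b * ac[k][ch] for k, b in enumerate(basis))
        for ch in range(3)
    )
```

With all AC coefficients set to zero the result is identical for every direction, which is exactly the view-independent behavior of the DC component.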
[0389] Opacity (α): "Opacity (α)" is a scalar value representing the degree to which each Gaussian splat contributes to pixel color or radiance during rendering, and can be defined, for example, as a value between 0 and 1. When the opacity (α) value is 0, the Gaussian splat is considered transparent and does not make a substantial contribution, and as the opacity (α) value approaches 1, the Gaussian splat can be treated as contributing strongly to the foreground, and multiple Gaussian splats can be combined using alpha blending or volume rendering methods with opacity (α).
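The alpha blending mentioned above can be sketched as front-to-back compositing, C = sum_i c_i a_i prod_{j<i} (1 - a_j). The sketch assumes the splats covering a pixel are already sorted nearest first and that each alpha is the effective per-pixel value (in 3D Gaussian Splatting this is typically the opacity multiplied by the projected Gaussian falloff); the early-termination threshold is an arbitrary choice.

```python
def composite_front_to_back(splats):
    """splats: list of ((r, g, b), alpha) sorted nearest first.

    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for rgb, alpha in splats:
        w = alpha * transmittance  # contribution not already hidden by nearer splats
        for ch in range(3):
            color[ch] += w * rgb[ch]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination (threshold is an assumption)
            break
    return tuple(color), transmittance
```

A splat with alpha 0 contributes nothing, and a splat with alpha 1 hides everything behind it, matching the transparent and fully opaque cases described above.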
[0390] FIG. 27 shows a V-GSC (Video-based Gaussian Splat Coding) encoder according to embodiments.
[0391] The encoder in Fig. 27 corresponds to the point cloud video encoder in Fig. 1, the encoder in Fig. 5, the encoder in Fig. 21, the transmitting device in Fig. 24, etc.
[0392] The embodiments extend the encoder structure of the existing V3C to support Gaussian splatting data, as shown in FIG. 27.
[0393] In the V-GSC encoder, the input 4DGS parameters are pre-encoded into video frames, which are then encoded by video encoders. The resulting sub-bitstreams are encapsulated into V3C units and can be assembled into a V3C-based bitstream structure by a multiplexer.
[0394] The pre-encoding function includes a series of data processing steps executed before video encoding is performed on the input GS parameters, which may include linear / non-linear conversion of the GS parameter values, bit-depth conversion, rotation parameter conversion, clipping, quantization, packing into 2D frames, etc.
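Clipping, quantization, and bit-depth conversion can be illustrated together with a linear min/max mapping to N-bit integers. This is only one possible realization under stated assumptions: the range (lo, hi) is taken from the data unless given, it would have to be signaled to the decoder as side information, and non-linear mappings mentioned in the text are not covered.

```python
def quantize(values, bit_depth, lo=None, hi=None):
    """Clip values to [lo, hi] and map them linearly to integers in [0, 2**bit_depth - 1].

    Returns the integer codes plus (lo, hi), which the decoder needs to invert the mapping.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    levels = (1 << bit_depth) - 1
    scale = levels / (hi - lo) if hi > lo else 0.0
    codes = [round((min(max(v, lo), hi) - lo) * scale) for v in values]
    return codes, (lo, hi)


def dequantize(codes, bit_depth, lo, hi):
    """Inverse linear mapping from integer codes back to the value range."""
    levels = (1 << bit_depth) - 1
    return [lo + c * (hi - lo) / levels for c in codes]


codes, (lo, hi) = quantize([-1.0, -0.25, 0.0, 0.6, 1.0], bit_depth=10)
restored = dequantize(codes, 10, lo, hi)
```

The round trip error is at most half a quantization step, (hi - lo) / (2**bit_depth - 1) / 2.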
[0395] While three existing video encoders (such as HEVC or VVC) are used in the current V3C, the newly proposed video-based GS compression method can utilize six video encoders.
[0396] The atlas data of V3C may initially follow the existing method and not be extended otherwise. The atlas data may be metadata for decoding the V-GSC bitstream and restoring it to GS.
[0397] With reference to FIG. 27, the multiplexing structure of V-GSC bitstreams according to embodiments is schematically described. As shown in FIG. 27, a plurality of geometry video sub-bitstreams encoding geometry parameters such as position, rotation, and scale are each encapsulated into a V3C_GVD unit, and an occupancy video sub-bitstream encoding occupancy information can be encapsulated into a V3C_OVD unit. Additionally, a plurality of attribute video sub-bitstreams encoding attribute parameters such as the DC component, SH coefficients, and opacity are each encapsulated into a V3C_AVD unit. Meanwhile, atlas data including V3C / V-GSC parameters and atlas-related metadata can be configured into a V3C_AD unit. The V3C_AD unit, the V3C_GVD units, the V3C_OVD unit, and the V3C_AVD units are input into a multiplexer and can be multiplexed into a single V3C-based bitstream according to time order and the specification. As such, according to the structure of FIG. 27, multiple video sub-bitstreams corresponding to different geometry and attribute parameter groups can be flexibly combined and transmitted or stored as a single integrated V-GSC bitstream, and the number and types of sub-bitstreams can be varied according to system design or implementation requirements.
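The assignment of parameter groups to unit types described for FIG. 27 can be captured in a lookup table. The sketch below is hypothetical in its names and in the simple ordering rule (parameter set and atlas data first, then one unit per sub-bitstream in insertion order); it follows FIG. 27 in placing opacity in a V3C_AVD unit, whereas another embodiment in this document compresses opacity using OVD.

```python
# Parameter group -> V3C unit type, following the FIG. 27 description.
PARAM_TO_UNIT = {
    "position": "V3C_GVD", "rotation": "V3C_GVD", "scale": "V3C_GVD",
    "occupancy": "V3C_OVD",
    "dc": "V3C_AVD", "sh": "V3C_AVD", "opacity": "V3C_AVD",
}


def build_units(vps, atlas_data, substreams):
    """Wrap each encoded sub-bitstream in a (unit_type, parameter, payload) tuple.

    substreams: dict parameter name -> encoded sub-bitstream (bytes).
    The parameter set and atlas data are placed first (hypothetical ordering).
    """
    units = [("V3C_VPS", None, vps), ("V3C_AD", None, atlas_data)]
    for name, payload in substreams.items():
        units.append((PARAM_TO_UNIT[name], name, payload))
    return units
```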
[0398] FIG. 28 shows pre-encoding according to embodiments.
[0399] FIG. 28 illustrates the pre-encoding process of the encoder of FIG. 27.
[0400] As an example of the processes performed before video encoding of the 4DGS parameters, rotation conversion, GSTF, 2D frame packing, and video encoding may be performed for each 4DGS parameter. Some of these processes may be omitted, or others added, depending on the characteristics of each parameter.
[0401] The X, Y, Z positions (center positions) of the GS can be compressed using the GVD of the existing V3C. In this case, additional projection or transformation may be required to use the video codec efficiently. The three dimensions X, Y, and Z may need to be mapped to the three RGB or YUV channels for video coding. After 2D frame packing, the positions can be compressed as video.
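2D frame packing of a three-component parameter can be sketched by writing one splat per pixel in raster-scan order, with the three components going to the three channels. The frame width and zero padding of unused pixels are assumptions for illustration; a real encoder may use a different padding procedure.

```python
import math


def pack_frame(params, width):
    """Pack per-splat 3-vectors into three planes in raster-scan order.

    params: list of (c0, c1, c2) integer tuples, one per splat.
    Returns three planes (lists of rows); unused pixels are zero-padded.
    """
    height = math.ceil(len(params) / width)
    planes = [[[0] * width for _ in range(height)] for _ in range(3)]
    for i, vec in enumerate(params):
        row, col = divmod(i, width)
        for ch in range(3):
            planes[ch][row][col] = vec[ch]
    return planes


def unpack_frame(planes, count):
    """Inverse of pack_frame for the first `count` splats."""
    width = len(planes[0][0])
    return [tuple(planes[ch][i // width][i % width] for ch in range(3)) for i in range(count)]


planes = pack_frame([(1, 2, 3), (4, 5, 6), (7, 8, 9)], width=2)
```

The decoder needs the splat count (or the occupancy information) to tell real pixels from padding when unpacking.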
[0402] Additionally, a mapping process that represents position values as integers for video compression can be performed prior to video compression. The opacity data of GS is compressed using the OVD of the existing V3C.
[0403] Since opacity is one-dimensional data, it can be video-compressed using the conventional OVD, or it can simply be entropy-coded using arithmetic coding. When the conventional OVD is used, it can be compressed as video after 2D frame packing.
[0404] Additionally, a mapping process that represents opacity values as integers for video compression can be performed prior to video compression. The scale data of GS is 3-dimensional and can be compressed using the GVD of the existing V3C. In this case, additional projection or transformation may be required to use the video codec efficiently.
[0405] For video coding, the three scale dimensions (X, Y, and Z) may need to be mapped to the RGB or YUV channels. After 2D frame packing, the scale data can be compressed as video. Additionally, a mapping process that represents scale values as integers for video compression may be performed prior to video compression. For the rotation data of GS, the rotation representation can be changed by performing a rotation conversion.
[0406] Rotation data expressed in quaternion format can be converted and expressed in Euler angles. The input can be rotation data of GS expressed in quaternion and can be composed of 4 dimensions. The output can be rotation data of GS expressed in Euler angles and can be composed of 3 dimensions. An example of converting rotation data from quaternion format to Euler angle format may be as shown in the following Equation 1.
[0407]
[0408]
[0409]
[0410]
[0411] Here, the input denotes the rotation parameter expressed in quaternion form, and the output denotes the rotation parameter expressed in Euler angle form.
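A conversion of this kind can be sketched with the widely used ZYX (yaw, pitch, roll) convention. This is a standard textbook conversion shown for illustration, not necessarily the exact form of Equation 1; the axis convention, the (x, y, z, w) argument order, and the inverse function for a decoder are assumptions.

```python
import math


def quat_to_euler(qx, qy, qz, qw):
    """Unit quaternion -> (roll, pitch, yaw) in radians, ZYX convention."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))  # clamp rounding error
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw


def euler_to_quat(roll, pitch, yaw):
    """Inverse mapping, e.g. for a decoder restoring the 4-D rotation (x, y, z, w)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (sr * cp * cy - cr * sp * sy,   # x
            cr * sp * cy + sr * cp * sy,   # y
            cr * cp * sy - sr * sp * cy,   # z
            cr * cp * cy + sr * sp * sy)   # w
```

Because q and -q describe the same rotation, the decoder recovers the rotation but not the original sign of the quaternion. Precision also degrades near pitch = ±90 degrees (gimbal lock), which is worth considering when the Euler angles are quantized.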
[0412] The rotation data of GS is 4-dimensional and can be compressed using the GVD of the existing V3C. Video encoders generally take 3-channel data in RGB or YUV form. Therefore, the rotation data can be converted to 3 channels and input into the video encoder; to this end, as described in the above embodiment, the rotation can be converted to Euler angle format and each coefficient input into its own video channel.
[0413] To input 3D-converted rotation data into a video codec, it can be utilized as an input of the existing GVD type by performing a 1:1 mapping between RGB and YUV. Additionally, a mapping process to represent rotation values as integers for video compression can be performed prior to video compression. When 4-channel data in quaternion format is input directly into the video codec without conversion, an embodiment is possible in which rotation parameters belonging to each GS included in the frame are sequentially packed during the 2D frame construction process. (In the case of the YUV 4:2:2 format)
for (i = 0; i < Fmax; i++) {      /* i: Y sample index in raster-scan order */
    j = i / 2;                    /* two Y samples carry one GS: GS index j */
    if (i % 2 == 0) Y(i) = Qw(j);
    else            Y(i) = Qx(j);
    if (i % 2 == 0) {             /* 4:2:2: one U and one V sample per two Y samples */
        U(j) = Qy(j);
        V(j) = Qz(j);
    }
}
[0420] Here, Y(i) is the i-th Y sample, and U(j) and V(j) are the j-th U and V samples of the 2D frame input to the video encoder. Fmax is the total number of Y samples in the frame, i is the raster-scan order of the Y samples, and j = i/2 is the index of the GS (and, for an even frame width, of the co-sited U and V samples). Qw(j), Qx(j), Qy(j), and Qz(j) are the qw, qx, qy, and qz components of the rotation of the j-th GS, each converted to a 10-bit integer.
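The 4:2:2 packing of [0413] can be sketched with flat, raster-ordered sample lists: GS j occupies Y[2j] (Qw), Y[2j+1] (Qx), U[j] (Qy), and V[j] (Qz). The helper names are hypothetical, and an even frame width is assumed so that chroma index i/2 stays on the same row as luma index i; reshaping the lists into frame rows is omitted.

```python
from typing import List, Sequence, Tuple

Quat10 = Tuple[int, int, int, int]  # (Qw, Qx, Qy, Qz), each a 10-bit integer


def pack_quaternions_422(quats: Sequence[Quat10]) -> Tuple[List[int], List[int], List[int]]:
    """Pack per-GS 10-bit quaternions into raster-ordered Y, U, V sample lists (4:2:2)."""
    y: List[int] = []
    u: List[int] = []
    v: List[int] = []
    for qw, qx, qy, qz in quats:
        for comp in (qw, qx, qy, qz):
            if not 0 <= comp <= 1023:
                raise ValueError("components must already be mapped to 10-bit integers")
        y.extend((qw, qx))  # two luma samples per GS
        u.append(qy)        # one U sample shared by those two luma samples
        v.append(qz)        # one V sample shared by those two luma samples
    return y, u, v


def unpack_quaternions_422(y: Sequence[int], u: Sequence[int],
                           v: Sequence[int]) -> List[Quat10]:
    """Inverse of pack_quaternions_422: regroup samples into per-GS quaternions."""
    return [(y[2 * j], y[2 * j + 1], u[j], v[j]) for j in range(len(u))]
```

Because every GS uses exactly two luma samples, no component is overwritten and the decoder can regroup the samples without side information.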
[0421] SH DC coefficient data from GS is 3-dimensional data and can be compressed using the existing V3C AVD. For video utilization, conversion to RGB or YUV may be required. It can be compressed into video after 2D frame packing. Since there is no significant difference between the color information in GS and V3C, the AVD can be utilized as is. Additionally, a mapping process to represent SH DC values as integers for video compression may be performed prior to video compression.
[0422] The SH AC coefficient data of GS is 15 × 3-dimensional and can be compressed using the AVD of existing V3C. Because the data is 15 × 3-dimensional, the existing AVD can be extended to carry multiple 3-dimensional layers (e.g., 15 layers). Alternatively, if the k-th SH AC coefficient of the j-th GS is expressed as (shr(j, k), shg(j, k), shb(j, k)), the 15 coefficients shr(j, 0) to shr(j, 14) that constitute the spherical-harmonics AC coefficients of the R color signal can be mapped, in raster-scan order, to positions j*15 to j*15+14 of the R component of the frame input to the video encoder; the G and B components can be mapped in the same way.
[0423] SH AC coefficient information may be skipped depending on its importance. Because SH AC coefficients are currently regarded as relatively low-importance data within GS data, they may be further optimized in future work. When AVD is used, conversion to RGB or YUV may be required for video utilization, and the data can be compressed as video after 2D frame packing. Additionally, a mapping process that represents SH AC values as integers may be performed prior to video compression.
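The raster-order mapping of [0422] can be sketched as follows: coefficient k of GS j lands at index j*15 + k of the R, G, and B sample lists. The function name and the flat-list representation are illustrative assumptions; splitting the lists into rows of the frame width is omitted.

```python
from typing import List, Sequence, Tuple

NUM_AC = 15  # SH degrees 1 to 3 give 3 + 5 + 7 = 15 AC coefficients per color


def pack_sh_ac(sh_ac: Sequence[Sequence[Tuple[int, int, int]]]
               ) -> Tuple[List[int], List[int], List[int]]:
    """sh_ac[j][k] = (shr(j, k), shg(j, k), shb(j, k)); returns raster-ordered R, G, B lists."""
    r: List[int] = []
    g: List[int] = []
    b: List[int] = []
    for j, coeffs in enumerate(sh_ac):
        if len(coeffs) != NUM_AC:
            raise ValueError(f"GS {j} has {len(coeffs)} AC coefficients, expected {NUM_AC}")
        # Appending in k order places coefficient k of GS j at index j * 15 + k.
        for shr, shg, shb in coeffs:
            r.append(shr)
            g.append(shg)
            b.append(shb)
    return r, g, b
```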
[0424] As an example of the mapping process for the above GS parameters, a Gaussian Splat Transfer Function (GSTF) may be used as the mapping function. The GSTF may be used to map the values of the GS (Gaussian Splats) parameters, expressed as real numbers, to symbol values, expressed as integers. The GSTF may be a monotonically increasing function. As an example of the GSTF, the GSTF may be expressed as a linear function as shown in Equation 2.
[0425] $$s = \operatorname{round}\!\left(\frac{a - a_{\min}}{a_{\max} - a_{\min}}\,(2^{n} - 1)\right)$$
[0426] Here, a can be the real value of one of the GS parameters: position, opacity, scale, rotation, SH DC, or SH AC. If a GS parameter is expressed as a vector, such as position, a represents the value of one element of the vector.
[0427] s can be the GS parameter value expressed as an integer. a_max and a_min are the maximum and minimum values of a, respectively, and n is the number of bits used to express s. The GSTF parameters (a_max, a_min, n) can be determined from user input or by analyzing the GS input to the encoder, and can be included in the bitstream and transmitted to the decoder. The GSTF parameters can take different values for each GS parameter; alternatively, when a set of one or more GS parameters is referred to as a GS parameter group, different GSTF parameters can be used for each GS parameter group. In this case, a may be clipped to the range [a_min, a_max] before being input to the GSTF.
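A minimal sketch of the linear GSTF, assuming round-to-nearest (the text does not fix the rounding rule) and per-element application; the function name is hypothetical. The input is clipped to [a_min, a_max] before mapping, so the output always lies in [0, 2^n − 1].

```python
import math


def gstf(a: float, a_min: float, a_max: float, n_bits: int) -> int:
    """Linear GSTF sketch: map real a in [a_min, a_max] to an integer in [0, 2**n_bits - 1]."""
    if a_max <= a_min:
        raise ValueError("a_max must be greater than a_min")
    a = min(max(a, a_min), a_max)  # clip before mapping
    levels = (1 << n_bits) - 1
    # Round half up to the nearest integer code (an assumption).
    return math.floor((a - a_min) / (a_max - a_min) * levels + 0.5)
```

For a vector parameter such as position, the function would be applied to each element, with either shared or per-element (a_min, a_max) depending on how the GSTF parameters are signaled.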
[0428] The compressed GS data streams can be combined into a V-GSC bitstream by a multiplexer and transmitted.
[0429] FIG. 29 shows a V-GSC decoder according to embodiments.
[0430] FIG. 29 can correspond to the reverse process of the encoder of FIG. 27. The decoder of FIG. 29 can correspond to the point cloud video decoder of FIG. 1, the decoder of FIG. 23, the receiving device of FIG. 25, etc.
[0431] Referring to Fig. 29, the decoding process of the V-GSC system is explained.
[0432] A V-GSC bitstream is received, separated into individual component bitstreams by a demultiplexer, video-decoded, and reconstructed through a post-decoding process. The reconstructed GS data can be stored in a GS data file format and, depending on the embodiment, may consist of position data, opacity, scale, rotation, SH DC/AC coefficients, etc.; some components may be omitted and/or added. The components of the 3D Gaussians stored in the GS data file format may be stored as lists, one for each component constituting the 3D Gaussian.
[0433] An atlas decoder can be used, similar to the AD of V3C. Position data generated using the GVD of V3C can be reconstructed through nominal format conversion after passing through a video decoder (such as HEVC or VVC). Opacity generated using the OVD of V3C can be reconstructed in the same way. If opacity was instead entropy-coded with arithmetic coding, it can be reconstructed with an arithmetic decoder.
[0434] Scale data generated using V3C's GVD can be reconstructed through nominal format conversion after passing through a video decoder (such as HEVC or VVC). Rotation data generated using V3C's GVD can be reconstructed through nominal format conversion after passing through a video decoder (such as HEVC or VVC).
[0435] SH DC data generated using the V3C AVD can be reconstructed through nominal format conversion after passing through a video decoder (such as HEVC or VVC). SH AC data generated using the V3C AVD can be reconstructed through nominal format conversion after passing through a video decoder (such as HEVC or VVC).
[0436] Whereas current V3C uses three video decoders (e.g., HEVC or VVC), the proposed video-based GS compression method is expected to require six.
[0437] Referring to FIG. 29, an example of a structure for decoding a V-GSC bitstream and reconstructing GS parameters according to embodiments is described. As shown in FIG. 29, the received V-GSC bitstream is first separated by a demultiplexer into V3C_AD, a plurality of V3C_GVDs, V3C_OVDs, V3C_AVDs, etc. The V3C_AD unit is decoded through an atlas decoder and provides atlas-related information and parameters used for subsequent GS data recovery. Among the V3C_GVD units, the first V3C_GVD outputs decoded GS position data corresponding to the center position of the GS through decoding and nominal format conversion via a video decoder, and the V3C_OVD outputs decoded GS opacity data corresponding to the opacity of the GS through a video decoder and nominal format conversion. Another V3C_GVD outputs decoded GS scale data corresponding to the scale of the GS through the same video decoding path, and another V3C_GVD outputs decoded GS rotation data corresponding to the rotation of the GS. Meanwhile, one of the V3C_AVD units converts the result decoded through the video decoder into a nominal format to provide decoded GS SH DC data corresponding to the SH DC (color component) of the GS, and another V3C_AVD can provide decoded GS SH AC coeff data corresponding to the SH AC coefficient. In this way, each V3C unit separated from the demultiplexer by the structure according to FIG. 29 is reconstructed into individual GS parameters through the corresponding decoder and format conversion unit, and the reconstructed center position, opacity, scale, rotation, SH DC, and SH AC coefficients can be input into a subsequent GS reconstruction module or GS data file format generation module to be integrated into a final GS data set.
[0438] FIG. 30 shows post-decoding according to embodiments.
[0439] FIG. 30 illustrates the post-decoding of the decoder of FIG. 29.
[0440] This may be an example of a post-decoding process during the V-GSC decoding process.
[0441] The input to the post-decoding process may be the data decoded by the video decoder, and the output of the post-decoding process may be restored 4DGS data. The post-decoding process may consist of 2D frame unpacking, inverse GSTF, rotation conversion, a composer, etc., and some processes may be omitted or added depending on the characteristics of each type of GS data.
[0442] The post-decoding process may be performed independently for each component of the 4DGS data, or it may be performed in combination for all or some of the 4DGS components.
[0443] According to the embodiment, GS data decoded through a 2D video codec can undergo a 2D frame unpacking process. The unpacking of the decoded GS components can basically proceed as follows.
[0444] decoded_GS(i) can be considered as the i-th GS parameter that goes into the renderer as input. In this case, depending on the GS data, these i-th GS parameters may be configured in the image domain (x, y) as follows.
[0445] position (x,y) = decoded position data at (x,y)
[0446] opacity (x,y) = decoded opacity data at (x,y)
[0447] scale (x,y) = decoded scale data at (x,y)
[0448] rotation (x,y) = decoded rotation data at (x,y)
[0449] SH DC coeff (x,y) = decoded color (SH DC coeff) data at (x,y)
[0450] SH AC coeff (x,y) = decoded color (SH AC coeff) data at (x,y)
[0451] Each of position(x, y), ..., SH AC coeff(x, y) above refers to a parameter of the same Gaussian splat at any image-domain position (x, y). That is, since every GS parameter is packed into the image domain (x, y) by the same rule (for example, x + frame_width * y in Equation 3), no additional processing should be required to synchronize the components.
[0452] decoded_GS(i) = {position(x + frame_width * y), opacity(x + frame_width * y), scale(x + frame_width * y), rotation(x + frame_width * y), SH DC(x + frame_width * y), SH AC(x + frame_width * y)}
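The unpacking rule can be sketched by inverting i = x + frame_width * y and reading every component plane at the same (x, y). The plane names, the 2D-list representation, and the `count` argument (the number of valid GSs in the frame, which in practice would be signaled or derived) are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def unpack_frame(planes: Dict[str, Sequence[Sequence[object]]],
                 frame_width: int, count: int) -> List[Dict[str, object]]:
    """Rebuild decoded_GS(i) for i < count by reading all component planes at the same (x, y)."""
    gaussians: List[Dict[str, object]] = []
    for i in range(count):
        y, x = divmod(i, frame_width)  # inverse of i = x + frame_width * y
        gaussians.append({name: plane[y][x] for name, plane in planes.items()})
    return gaussians
```

Since every component shares the packing rule, the same (x, y) lookup yields a consistent splat without extra synchronization data.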
[0453] If necessary, additional parameters, data, etc., may be required for each parameter to synchronize.
[0454] The rotation conversion part may be a process of converting the representation method of the rotation parameter. Rotation data that has undergone unpacking and inverse GSTF processes can undergo the rotation conversion process. If the rotation data is expressed in Euler angle form, it can be converted and expressed in quaternion form.
[0455] The input can be rotation data of a GS expressed in Euler angles and can be composed of three dimensions. The output can be rotation data of a GS expressed in quaternions and can be composed of four dimensions. The process of converting rotation data from Euler angle format to quaternion format can be as shown in the following Equation 4.
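[0456] Equation 4, written here in the same roll-pitch-yaw (Z-Y-X) convention:
$$q_w = \cos\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\cos\tfrac{\psi}{2} + \sin\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\sin\tfrac{\psi}{2}$$
$$q_x = \sin\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\cos\tfrac{\psi}{2} - \cos\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\sin\tfrac{\psi}{2}$$
$$q_y = \cos\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\cos\tfrac{\psi}{2} + \sin\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\sin\tfrac{\psi}{2}$$
$$q_z = \cos\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\sin\tfrac{\psi}{2} - \sin\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\cos\tfrac{\psi}{2}$$
[0460] Here, q = (q_w, q_x, q_y, q_z) denotes the rotation parameter expressed in quaternion form, and (φ, θ, ψ) denotes the rotation parameter expressed in Euler-angle form (roll, pitch, yaw).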
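The Euler-to-quaternion conversion of [0455] can be sketched as follows, assuming the Z-Y-X (roll, pitch, yaw) convention and (w, x, y, z) output order; the function name is hypothetical. The result is a unit quaternion.

```python
import math
from typing import Tuple


def euler_to_quaternion(roll: float, pitch: float,
                        yaw: float) -> Tuple[float, float, float, float]:
    """Convert (roll, pitch, yaw) in radians, Z-Y-X convention, to a unit (w, x, y, z) quaternion."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    qw = cr * cp * cy + sr * sp * sy
    qx = sr * cp * cy - cr * sp * sy
    qy = cr * sp * cy + sr * cp * sy
    qz = cr * cp * sy - sr * sp * cy
    return qw, qx, qy, qz
```

The recovered quaternion may differ in sign from the one that was encoded (q and −q describe the same rotation), and near pitch = ±90° the Euler form is degenerate (gimbal lock), so an encoder using this path may want to check the round-trip rotation error.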
[0461] The Composer part may be a process of merging the decoded 4DGS parameters.
[0462] According to the embodiment, the process may involve aligning the components corresponding to each decoded Gaussian splat. According to the embodiment, the components constituting each 3D Gaussian may be aligned in the order of their elements, and the 3D Gaussian elements packed at co-located pixel positions within the frame may be aligned to correspond to each other by considering the packing information.
[0463] The Inverse GSTF part may be an inverse mapping process for the restored GS parameters. The Inverse GSTF may be a process of converting the restored integer data into a real number for rendering. As an example, a linear function as shown in Equation 5 may be used for the Inverse GSTF.
[0464] $$\hat{a} = a_{\min} + \frac{\hat{s}}{2^{n} - 1}\,(a_{\max} - a_{\min})$$
[0465] Here, ŝ and â denote the restored GS parameter of integer type and of floating-point type, respectively. a_max and a_min are the maximum and minimum values of a, respectively, and n is the number of bits used to express the integer value; these GSTF parameters can be included in the bitstream and received from the encoder. Additionally, ŝ may be clipped to the range [0, 2^n − 1] determined by n.
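A minimal sketch of the linear inverse GSTF, assuming the decoder has received (a_min, a_max, n) from the bitstream; the function name is hypothetical. The integer code is clipped to [0, 2^n − 1] before being mapped back to a real value.

```python
def inverse_gstf(s: int, a_min: float, a_max: float, n_bits: int) -> float:
    """Linear inverse GSTF sketch: map an integer code back to a real GS parameter value."""
    levels = (1 << n_bits) - 1
    s = min(max(s, 0), levels)  # clip to the valid code range for n_bits
    return a_min + s * (a_max - a_min) / levels
```

With round-to-nearest on the encoder side, the reconstruction error is at most half a step, (a_max − a_min) / (2 (2^n − 1)); for a range of 2.0 and n = 10 that is about 0.001.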
[0466] Referring to FIG. 30, a schematic configuration for reconstructing a 4D Gaussian splat from decoded Gaussian splat parameters according to embodiments is described. As shown in FIG. 30, the decoded Gaussian distribution center position, decoded opacity, decoded Gaussian scale, decoded rotation parameter, decoded SH DC coefficients, and decoded SH AC coefficients obtained through the demultiplexing and video decoding processes can be input to a Composer section after being converted into real parameter values through corresponding inverse GSTF and rotation transformation modules. The composer can construct a frame-specific Gaussian set by aligning and merging each of the above parameters belonging to the same Gaussian splat in 3D Gaussian units, and generate a finally reconstructed 4DGS (Reconstructed 4DGS) by sequentially combining these Gaussian sets for multiple frames on the time axis. In this way, according to the structure of FIG. 30, various GS parameters transmitted and stored in bitstream units can be efficiently restored into a 4D Gaussian splat representation that can be directly used in the renderer.
[0467] FIG. 31 shows a graphic engine according to embodiments.
[0468] FIG. 31 may be an example of a graphic engine for rendering 4DGS.
[0469] The image plane tiling process can generate an image plane from the camera parameters for rendering. The image plane can be divided into one or more regions for parallel processing on a graphics device with multiple processing units; each such region can be called a tile.
[0470] In the Gaussian splat culling process, all GSs except those that will be included in a given tile can be removed, leaving only the GSs that contribute to that tile.
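Tiling and per-tile culling can be sketched as follows. This minimal illustration works on already-projected 2D footprints (a center (u, v) and a bounding radius, e.g. three standard deviations), which is how tile assignment is often done in practice; that ordering, the 16-pixel tile size, and the function names are assumptions rather than details from the text.

```python
import math
from typing import List, Sequence, Tuple

TILE = 16  # tile edge in pixels (assumed value)


def tiles_touched(u: float, v: float, radius: float, width: int, height: int,
                  tile: int = TILE) -> List[Tuple[int, int]]:
    """Return the (tx, ty) tiles overlapped by a square footprint of the given radius."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    x0 = max(0, math.floor((u - radius) / tile))
    x1 = min(tiles_x - 1, math.floor((u + radius) / tile))
    y0 = max(0, math.floor((v - radius) / tile))
    y1 = min(tiles_y - 1, math.floor((v + radius) / tile))
    # Empty ranges (x0 > x1 or y0 > y1) mean the splat is off-screen.
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


def cull_for_tile(splats: Sequence[Tuple[float, float, float]], tile_xy: Tuple[int, int],
                  width: int, height: int, tile: int = TILE) -> List[Tuple[float, float, float]]:
    """Keep only splats (u, v, radius) whose footprint overlaps the given tile."""
    return [s for s in splats if tile_xy in tiles_touched(*s, width, height, tile)]
```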
[0471] The process of projecting 3D Gaussian splats to 2D can project the GS contained in the tiles onto a 2D image plane.
[0472] The process of calculating the 2D covariance matrix involves first calculating the 3D covariance matrix from the restored scale and rotation parameters, and then approximating the 3D covariance matrix by a 2D covariance matrix using the camera parameters.
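The covariance step can be sketched with the EWA-splatting formulation used by 3D Gaussian Splatting, a standard technique swapped in here because the text only says the 3D matrix is "approximated" with camera parameters: Σ = R S Sᵀ Rᵀ from the scale vector and a (w, x, y, z) quaternion, then Σ′ = J W Σ Wᵀ Jᵀ, where W is the world-to-camera rotation and J is the Jacobian of the perspective projection at the GS center in camera space. Function names, the pinhole focal lengths fx and fy, and the omission of any low-pass dilation are assumptions.

```python
from typing import List, Sequence

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def quat_to_matrix(qw: float, qx: float, qy: float, qz: float) -> Mat:
    """Rotation matrix of a (w, x, y, z) quaternion, normalized first."""
    n = (qw * qw + qx * qx + qy * qy + qz * qz) ** 0.5
    w, x, y, z = qw / n, qx / n, qy / n, qz / n
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def covariance_3d(scale: Sequence[float], quat: Sequence[float]) -> Mat:
    m = matmul(quat_to_matrix(*quat),
               [[scale[0], 0.0, 0.0], [0.0, scale[1], 0.0], [0.0, 0.0, scale[2]]])  # M = R S
    return matmul(m, transpose(m))  # Sigma = R S S^T R^T


def covariance_2d(cov3d: Mat, t_cam: Sequence[float], view_rot: Mat,
                  fx: float, fy: float) -> Mat:
    """Sigma' = J W Sigma W^T J^T; J is the 2x3 projection Jacobian at camera-space center t_cam."""
    x, y, z = t_cam
    j = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]
    t = matmul(j, view_rot)  # J W, a 2x3 matrix
    return matmul(matmul(t, cov3d), transpose(t))  # 2x2 result
```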
[0473] Convert SH to Color can convert the restored SH DC and AC values into a color (RGB or YUV). To do so, the viewing direction determined by the center position of the GS and the camera parameters can be used.
[0474] In the alpha blending process, the pixel values of the image to be rendered can be calculated using the 2D-projected GSs, their 2D covariance matrices, and the converted colors. Multiple 2D-projected GSs may overlap at a given pixel. In that case, before the pixel value is calculated, the distance of each GS from the camera can be computed from the camera parameters and the GS center position, and the GSs can be sorted by this distance. The color of the pixel can then be calculated by multiplying the color of each GS by its opacity, in order of increasing distance from the camera, and accumulating the results, with each contribution attenuated by the transmittance left by the nearer GSs. Because the accumulated opacity cannot exceed 1, accumulation continues only while the accumulated opacity is less than 1.
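A minimal sketch of "Convert SH to Color", truncated at SH degree 1 (only the first 3 of the 15 AC coefficients per channel are used). The constants, the sign pattern of the degree-1 terms, and the +0.5 offset with clamping at zero follow the convention of the original 3D Gaussian Splatting reference rasterizer; that convention, the AC ordering, and the truncation are assumptions, and a full implementation would also evaluate degrees 2 and 3.

```python
import math
from typing import Sequence, Tuple

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 * sqrt(pi))


def sh_to_color(dc: Sequence[float], ac: Sequence[Sequence[float]],
                center: Sequence[float], cam_pos: Sequence[float]) -> Tuple[float, ...]:
    """dc: (r, g, b) DC coefficients; ac: at least 3 (r, g, b) AC coefficients (degree 1 only)."""
    d = [center[i] - cam_pos[i] for i in range(3)]  # viewing direction camera -> GS center
    n = math.sqrt(sum(c * c for c in d))
    x, y, z = (c / n for c in d)
    rgb = []
    for ch in range(3):
        val = SH_C0 * dc[ch]
        val += -SH_C1 * y * ac[0][ch] + SH_C1 * z * ac[1][ch] - SH_C1 * x * ac[2][ch]
        rgb.append(max(0.0, val + 0.5))  # offset by 0.5 and clamp negatives
    return tuple(rgb)
```

Because the degree-1 terms depend on the viewing direction, the same GS can appear brighter or darker from different camera positions.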
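Front-to-back alpha blending for a single pixel can be sketched as follows. Each entry is (depth, alpha, color), where alpha is assumed to be the GS opacity already weighted by its 2D Gaussian falloff at this pixel; the function name, the background color, and the early-termination threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def blend_pixel(splats: Sequence[Tuple[float, float, Color]],
                bg: Color = (0.0, 0.0, 0.0), t_min: float = 1e-4) -> Color:
    """Composite (depth, alpha, rgb) splats nearest-first into one pixel color."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # 1 - accumulated opacity
    for _depth, alpha, rgb in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = alpha * transmittance
        for c in range(3):
            color[c] += rgb[c] * weight
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # accumulated opacity has (almost) reached 1: stop
            break
    # Whatever transmittance remains lets the background show through.
    return (color[0] + transmittance * bg[0],
            color[1] + transmittance * bg[1],
            color[2] + transmittance * bg[2])
```

Sorting by depth before accumulation is what makes the nearer splats occlude the farther ones, regardless of the order in which they arrive.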
[0475] The rendered image can finally be scaled through a graphics device and provided to the user through a display device.
[0476] Referring to FIG. 31, a flowchart of a 4D Gaussian Splat (4DGS) rendering procedure using a Gaussian splat according to embodiments is described. First, in the image plane tiling step, a rendering target image plane is created using camera parameters, and the image plane is divided into one or more tile units for parallel processing. Next, in the Gaussian splats culling step, GS that do not contribute to each tile are removed from the entire set of GS, leaving only the GS that need to be projected onto the corresponding tile. Subsequently, in the step of projecting 3D Gaussian splats to 2D, the selected 3D GS are mapped to a position on the 2D image plane using a camera projection transformation. Next, in the Calculate 2D covariance matrix step, a 3D covariance matrix is calculated based on the restored scale and rotation parameters, and this is approximated to a 2D covariance matrix using camera parameters. Then, in the Convert SH to Color step, the color value (e.g., RGB or YUV) corresponding to each GS is determined using the restored SH DC and SH AC coefficients. Finally, in the Alpha Blending step, a final rendered image can be generated by performing cumulative transparency and color synthesis for each pixel within the tile using the 2D covariance matrix and color values of the GS projected into 2D.
[0477] FIG. 32 illustrates an encoding method according to embodiments.
[0478] The method according to the embodiments may include the step of encoding Gaussian splat data (S3200); and / or the step of generating a bitstream containing Gaussian splat data (S3210); etc.
[0479] The step of encoding Gaussian splat data (S3200) refers to the descriptions of the encoder in FIG. 1, the encoder in FIG. 5, the encoder in FIG. 21, the transmission device in FIG. 24, the Gaussian splat-based encoding in FIG. 26, the encoder in FIG. 27, the encoder in FIG. 28, etc.
[0480] The step of generating a bitstream (S3210) generates a bitstream from the data produced by the encoding step and can produce a bitstream structure such as that of FIG. 4.
[0481] The step of encoding Gaussian splat data (S3200) may include: a step of encoding atlas data for the Gaussian splat data; a step of encoding position information of the Gaussian splat data; a step of encoding opacity of the Gaussian splat data; a step of encoding scale information of the Gaussian splat data; a step of encoding rotation information of the Gaussian splat data; and a step of encoding SH coefficients of the Gaussian splat data.
[0482] The method of FIG. 32 is performed by a device, the device includes a memory; and at least one processor connected to the memory; and the at least one processor may be configured to: encode Gaussian splat data; and generate a bitstream containing Gaussian splat data.
[0483] FIG. 33 illustrates a decoding method according to embodiments.
[0484] The method according to the embodiments may include the step of receiving a bitstream containing Gaussian splat data (S3300); and / or the step of decoding the Gaussian splat data (S3310); etc.
[0485] In the step of receiving the bitstream (S3300), a bitstream such as that shown in Fig. 4 can be obtained.
[0486] The step of decoding Gaussian splat data (S3310) refers to the descriptions of the decoder in FIG. 1, the decoder in FIG. 22, the decoder in FIG. 23, the receiving device in FIG. 25, the Gaussian splat-based decoding in FIG. 26, the decoder in FIG. 29, the decoder in FIG. 30, etc.
[0487] The methods of FIG. 32 and FIG. 33 can correspond to each other as inverse processes.
[0488] Referring to FIG. 26, regarding the 3DGS configuration, the Gaussian splat includes geometry data and attribute data, the geometry data includes position information, rotation information, and scale information regarding the vertices, and the attribute data may include opacity, DC coefficients, and SH coefficients (Spherical Harmonics coefficients) regarding the vertices.
[0489] Referring together with FIG. 29, with respect to the V-GSC decoder, the step of decoding Gaussian splat data may include: a step of decoding atlas data in a bitstream; a step of decoding position information of the Gaussian splat data in the bitstream; a step of decoding the opacity of the Gaussian splat data; a step of decoding scale information of the Gaussian splat data; a step of decoding rotation information of the Gaussian splat data; and a step of decoding SH coefficients of the Gaussian splat data.
[0490] With reference to FIG. 29, Gaussian splat data can be decoded based on a video codec.
[0491] Referring together with FIG. 30, regarding post-decoding: 2D unpacking, the decoded Gaussian splat data is unpacked from the frame, and the decoded Gaussian splat data can be restored based on at least one of decoded position information, decoded opacity, decoded scale information, decoded rotation information, or decoded SH coefficients.
[0492] Referring to FIG. 30, regarding post-decoding: rotation conversion, the decoded rotation information can be converted based on rotation parameters.
[0493] Referring together with FIG. 31, regarding the graphic engine (rendering), the method of FIG. 33 further comprises the step of rendering the decoded Gaussian splat data, and the rendering step may include: the step of generating an image plane based on parameters for rendering from the decoded Gaussian splat data; the step of deriving the Gaussian splat data contained in one tile; the step of projecting the Gaussian splat data of the tile onto the image plane; and the step of deriving a color based on the SH coefficients of the Gaussian splat data.
[0494] The method of FIG. 33 can be performed by a device. A device according to embodiments includes a memory and at least one processor connected to the memory, and the at least one processor may be configured to: receive a bitstream containing Gaussian splat data; and decode the Gaussian splat data.
[0495] The methods of FIGS. 32 and 33 according to the embodiments provide the following technical effects.
[0496] We propose a method to effectively compress 3DGS data, which is currently the subject of active research, using the existing V-PCC (V3C) method. Through the proposed video-based GS compression method (V-GSC), the 3DGS technology currently under active research can be efficiently compressed using V3C, an existing video codec-based stereoscopic data compression technology. Furthermore, by utilizing the existing video codec infrastructure, the proposed method is expected to significantly reduce the cost of introducing new GS technology and the time required for technology development, thereby enabling the rapid commercialization of GS services.
[0497] Operations according to the embodiments described above may be described in combination with a point cloud data transmission / reception device / method according to the embodiments described below. Operations according to the embodiments described in this document may be performed by a transmission / reception device including a memory and / or a processor according to the embodiments. The memory may store programs for processing / controlling operations according to the embodiments, and the processor may control various operations described in this document. The processor may be referred to as a controller, etc. Operations in the embodiments may be performed by firmware, software, and / or a combination thereof, and the firmware, software, and / or a combination thereof may be stored in the processor or in memory.
[0498] The embodiments have been described in terms of methods and / or devices, and the description of the methods and the description of the devices may be applied complementarily.
[0499] Although the drawings have been described separately for the convenience of explanation, it is also possible to design a new embodiment by combining the embodiments described in each drawing. Furthermore, designing a computer-readable recording medium containing a program for executing the previously described embodiments, as required by a person skilled in the art, falls within the scope of the claims of the embodiments. The apparatus and method according to the embodiments are not limited to the configuration and method of the embodiments described above; rather, the embodiments may be configured by selectively combining all or part of each embodiment to allow for various modifications. Although preferred embodiments have been illustrated and described, the embodiments are not limited to the specific embodiments described above. It is not only possible for a person skilled in the art to make various modifications without departing from the essence of the embodiments claimed in the claims, but such modifications should not be understood individually from the technical concept or perspective of the embodiments.
[0500] Various components of the device of the embodiments may be implemented by hardware, software, firmware, or a combination thereof. Various components of the embodiments may be implemented as a single chip, for example, a single hardware circuit. Depending on the embodiments, the components according to the embodiments may each be implemented as separate chips. Depending on the embodiments, at least one of the components of the device according to the embodiments may be composed of one or more processors capable of executing one or more programs, and one or more programs may include instructions for performing or executing any one or more of the operations / methods according to the embodiments. Executable instructions for performing the methods / operations of the device according to the embodiments may be stored in non-transient CRMs or other computer program products configured to be executed by one or more processors, or may be stored in transient CRMs or other computer program products configured to be executed by one or more processors. Additionally, memory according to the embodiments may be used as a concept that includes not only volatile memory (e.g., RAM, etc.) but also non-volatile memory, flash memory, PROM, etc. In addition, it may also include implementation in the form of carrier waves, such as transmission over the Internet. Furthermore, processor-readable recording media are distributed across networked computer systems, allowing processor-readable code to be stored and executed in a distributed manner.
[0501] In this document, “ / ” and “,” are interpreted as “and / or.” For example, “A / B” is interpreted as “A and / or B,” and “A, B” is interpreted as “A and / or B.” Additionally, “A / B / C” means “at least one of A, B and / or C.” Also, “A, B, C” means “at least one of A, B and / or C.” Additionally, in this document, “or” is interpreted as “and / or.” For example, “A or B” may mean 1) “A” alone, 2) “B” alone, or 3) “A and B.” In other words, “or” in this document may mean “additionally or alternatively.”
[0502] Terms such as "first," "second," etc., may be used to describe various components of the embodiments. However, the interpretation of the various components according to the embodiments should not be limited by these terms. These terms are merely used to distinguish one component from another. For example, the first user input signal may be referred to as the second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. The use of these terms should be interpreted as not departing from the scope of the various embodiments. Although the first user input signal and the second user input signal are both user input signals, they do not imply the same user input signals unless clearly indicated in the context.
[0503] The terms used to describe the embodiments are intended for the purpose of describing specific embodiments and are not intended to limit the embodiments. As used in the description of the embodiments and in the claims, the singular is intended to include the plural unless explicitly indicated in the context. Expressions of and / or are used to mean including all possible combinations between the terms. Expressions of include describe the presence of features, numbers, steps, elements, and / or components and do not imply the exclusion of additional features, numbers, steps, elements, and / or components. Conditional expressions such as "if" or "when" used to describe the embodiments are not limited to being optional. It is intended to be interpreted as "when a specific condition is satisfied," "when a related action is performed in response to a specific condition," or "when a related definition is interpreted."
[0504] Additionally, operations according to the embodiments described herein may be performed by a transmitting and receiving device including memory and / or a processor, depending on the embodiments. The memory may store programs for processing / controlling operations according to the embodiments, and the processor may control various operations described in this document. The processor may be referred to as a controller, etc. Operations in the embodiments may be performed by firmware, software, and / or a combination thereof, and the firmware, software, and / or a combination thereof may be stored in the processor or in memory.
[0505] As described above, the relevant details have been explained in the best mode for carrying out the embodiments.
[0506] As described above, the embodiments may be applied wholly or partially to point cloud data transmission and reception devices and systems.
[0507] Those skilled in the art may make various changes or modifications to the embodiments within the scope of the embodiments.
[0508] The embodiments may include modifications / variations, and such modifications / variations do not exceed the scope of the claims and their equivalents.
Claims
1. A decoding method comprising: receiving a bitstream containing Gaussian splat data; and decoding the Gaussian splat data.
2. The method of claim 1, wherein the Gaussian splat includes geometry data and attribute data, the geometry data includes position information, rotation information, and scale information regarding vertices, and the attribute data includes opacity, DC coefficients, and SH (spherical harmonics) coefficients regarding the vertices.
3. The method of claim 1, wherein decoding the Gaussian splat data comprises: decoding atlas data in the bitstream; decoding position information of the Gaussian splat data in the bitstream; decoding the opacity of the Gaussian splat data; decoding scale information of the Gaussian splat data; decoding rotation information of the Gaussian splat data; and decoding SH coefficients of the Gaussian splat data.
4. The method of claim 3, wherein the Gaussian splat data is decoded based on a video codec.
5. The method of claim 3, wherein the decoded Gaussian splat data is unpacked from a frame, and the decoded Gaussian splat data is restored based on at least one of decoded position information, decoded opacity, decoded scale information, decoded rotation information, or decoded SH coefficients.
6. The method of claim 3, wherein the decoded rotation information is converted based on rotation parameters.
7. The method of claim 1, further comprising rendering the decoded Gaussian splat data, wherein the rendering comprises: generating an image plane based on parameters for rendering from the decoded Gaussian splat data; deriving Gaussian splat data contained in a single tile; projecting the Gaussian splat data of the tile onto the image plane; and deriving a color based on the SH coefficients of the Gaussian splat data.
8. A device comprising: a memory; and at least one processor connected to the memory, wherein the at least one processor is configured to: receive a bitstream containing Gaussian splat data; and decode the Gaussian splat data.
9. A method comprising: encoding Gaussian splat data; and generating a bitstream including the Gaussian splat data.
10. The method of claim 9, wherein the Gaussian splat includes geometry data and attribute data, the geometry data includes position information, rotation information, and scale information regarding vertices, and the attribute data includes opacity, DC coefficients, and SH (spherical harmonics) coefficients regarding the vertices.
11. The method of claim 9, wherein encoding the Gaussian splat data comprises: encoding atlas data for the Gaussian splat data; encoding position information of the Gaussian splat data; encoding the opacity of the Gaussian splat data; encoding scale information of the Gaussian splat data; encoding rotation information of the Gaussian splat data; and encoding SH coefficients of the Gaussian splat data.
12. The method of claim 11, wherein the Gaussian splat data is encoded based on a video codec.
13. A device comprising: a memory; and at least one processor connected to the memory, wherein the at least one processor is configured to: encode Gaussian splat data; and generate a bitstream including the Gaussian splat data.
14. A computer-readable storage medium storing a bitstream generated by the method of claim 9.
15. A method comprising: acquiring a bitstream for Gaussian splat data, wherein the bitstream is generated by encoding atlas data regarding the Gaussian splat data, encoding position information of the Gaussian splat data, encoding opacity of the Gaussian splat data, encoding scale information of the Gaussian splat data, encoding rotation information of the Gaussian splat data, and encoding SH coefficients of the Gaussian splat data; and transmitting data including the bitstream.