Point cloud data transmission method, point cloud data transmission device, point cloud data reception method, and point cloud data reception device
The method addresses the challenge of transmitting 3D multimedia content by encoding and decoding point cloud data adaptively, ensuring efficient and consistent quality through spatiotemporal signaling, overcoming limitations in existing streaming technologies.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- FOUND FOR RES & BUSINESS SEOUL NAT UNIV OF SCI & TECH
- Filing Date
- 2026-01-02
- Publication Date
- 2026-06-18
AI Technical Summary
Current technologies lack the ability to efficiently transmit 3D multimedia content produced with point clouds via streaming, particularly in adapting to user location and environment changes.
A method for transmitting point cloud data involves encoding, encapsulating, and decoding point cloud data using spatiotemporal information to adaptively transmit 3D multimedia content based on user location and environment, utilizing V-PCC (Video-based Point Cloud Compression) with adaptive encoding and signaling.
Enables efficient and adaptive transmission of 3D multimedia content, ensuring consistent quality and bandwidth utilization by grouping objects with the same Node Depth and Threshold Level, and incorporating spatiotemporal information for real-time adjustments.
Abstract
Description
[0001] The embodiments relate to a technology for transmitting 3D multimedia content produced based on V-PCC (Video-based Point Cloud Compression) in a streaming manner.
[0002] With the introduction of digital twin and metaverse concepts, related technologies are growing and establishing themselves as the center of high-tech industries. Accordingly, media services are also expected to evolve into a format that provides 6DoF 360VR video or 3D multimedia content via streaming.
[0003] Meanwhile, one of the methods widely used to describe digital worlds such as the metaverse is to create 3D multimedia content based on V-PCC (Video-based Point Cloud Compression). V-PCC is an efficient way to describe objects because the relationship between the user and other objects in the digital world is not fixed but changes from moment to moment.
[0004] As explained earlier, although 3D multimedia content is expected to evolve into a streaming format, there is currently no technology to transmit 3D multimedia content produced with point clouds via streaming.
[0005] The embodiments were conceived against this technical background and aim to enable the transmission of 3D multimedia content in a streaming manner by designing signaling that includes spatiotemporal information to adaptively transmit 3D multimedia content according to the user's location, environment, etc.
[0006] A method for transmitting point cloud data according to embodiments may include the steps of: encoding point cloud data; encapsulating point cloud data into a file; and transmitting the file. A method for receiving point cloud data according to embodiments may include the steps of: receiving a file containing point cloud data; decapsulating the file; and decoding the point cloud data.
[0007] The method and apparatus according to the embodiments can efficiently encode and transmit point cloud data.
[0008] The method and apparatus according to the embodiments can spatially adaptively encode and transmit point cloud data.
[0009] The drawings are included to aid understanding of the embodiments and illustrate the embodiments together with the related description. For a better understanding of the various embodiments described below, reference should be made to the following description in conjunction with the drawings, in which like reference numerals denote corresponding parts throughout.
[0010] FIG. 1 shows an object within the user's field of view according to embodiments.
[0011] FIG. 2 shows a group of objects according to embodiments.
[0012] FIG. 3 shows a group of objects within the user's field of view according to embodiments.
[0013] FIG. 4 shows a structure for transmitting V3C data according to embodiments.
[0014] FIG. 5 shows the multi-track encapsulation of V-PCC data according to ISOBMFF (ISO Base Media File Format) according to the embodiments.
[0015] FIG. 6 shows DASH signaling according to embodiments.
[0016] FIG. 7 shows group information within the MPD according to the embodiments.
[0017] FIG. 8 shows group information within the MPD according to the embodiments.
[0018] FIG. 9 shows group information within an ISOBMFF-based file according to embodiments.
[0019] FIG. 10 shows group information within an ISOBMFF-based file according to embodiments.
[0020] FIG. 11 shows objects and events according to embodiments.
[0021] FIG. 12 shows an event-based group according to embodiments.
[0022] FIG. 13 illustrates the playback of event-based connected data according to embodiments.
[0023] FIG. 14 illustrates the reconstruction of grouping-based data according to embodiments.
[0024] FIG. 15 illustrates a method for transmitting point cloud data according to embodiments.
[0025] FIG. 16 illustrates a method for receiving point cloud data according to embodiments.
[0026] Preferred embodiments are described in detail below, and examples thereof are shown in the accompanying drawings. The following detailed description, given with reference to the accompanying drawings, is intended to describe preferred embodiments rather than to show the only embodiments that may be implemented. It includes details to provide a thorough understanding of the embodiments. However, it is obvious to those skilled in the art that the embodiments may be practiced without these details.
[0027] Most terms used in the embodiments are selected from those commonly used in the field, but some terms are chosen at the applicant's discretion, and their meanings are described in detail in the following description as necessary. Accordingly, the embodiments should be understood based on the intended meaning of the terms, rather than their mere names or meanings.
[0028] FIG. 1 shows an object within the user's field of view according to embodiments.
[0029] As exemplified in Figure 1, the proportions θ and φ that an object occupies within the user's field of view are compared with a set of reference values to determine the TL (Threshold Level) value. The proportion occupied by an object within the field of view is calculated, for each of θ and φ, as the difference between the largest and smallest values among the polar coordinate representations (r, θ, φ) of the eight points constituting the object.
[0030] The TL value is determined by comparing the θ and φ obtained through this process with the TL reference value sets for θ and φ. For example, if the object's θ is greater than θ1, the TL is 1. If θ is less than θ4, the TL is 5. In this way, the object's θ and φ values are checked and a TL value is designated for each of the two axes. Subsequent grouping or splitting is then performed separately for each direction.
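The TL determination described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the choice of θ as azimuth and φ as elevation, the handling of values that fall exactly on a reference value, and the sample reference set (60, 30, 15, 5) are all assumptions, and angular wrap-around at ±180° is ignored.

```python
import math

def to_spherical(x, y, z):
    """Return (r, theta, phi) in degrees. Assumed convention:
    theta = azimuth in the x-y plane, phi = elevation above it."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.degrees(math.atan2(y, x))
    phi = math.degrees(math.asin(z / r)) if r > 0 else 0.0
    return r, theta, phi

def angular_extent(corners):
    """Extent (largest minus smallest) of theta and of phi over the
    eight points that make up the object."""
    sph = [to_spherical(*c) for c in corners]
    thetas = [s[1] for s in sph]
    phis = [s[2] for s in sph]
    return max(thetas) - min(thetas), max(phis) - min(phis)

def threshold_level(extent, refs):
    """Map an extent to a TL. refs is a descending reference set
    (ref1, ref2, ref3, ref4): extent > ref1 gives TL 1, and an extent
    below ref4 gives TL 5. Boundary handling is an assumption."""
    for level, ref in enumerate(refs, start=1):
        if extent > ref:
            return level
    return len(refs) + 1

# Hypothetical object: a cube of side 2 centred 10 units in front of the user.
cube = [(x, y, z) for x in (9, 11) for y in (-1, 1) for z in (-1, 1)]
ext_theta, ext_phi = angular_extent(cube)
tl_theta = threshold_level(ext_theta, (60, 30, 15, 5))
tl_phi = threshold_level(ext_phi, (60, 30, 15, 5))
```

Each axis gets its own TL, which matches the statement that splitting and grouping are carried out separately per direction.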
[0031] Meanwhile, in one embodiment, based on the aforementioned ND and TL, objects having the same ND and TL can be grouped into the same group and reconstructed into a single file.
[0032] For example, objects with a TL value greater than or equal to a threshold value (e.g., 4 or higher) can be grouped into the same group. The objects in this group are those with a TL of 4 or 5 within the user's field of view (FoV).
[0033] FIG. 2 shows a group of objects according to embodiments.
[0034] Objects grouped into a single group are reorganized into a single file by creating an additional node called a group root node above the root node of the existing node structure, as shown in Fig. 2, and are decoded with the same Node Depth. In this process, if the ND of individual objects differs, there may be a significant gap in the quality of the objects within the user screen.
[0035] Therefore, when grouping using TL, ND must also be considered. Consequently, objects within a single group have the same ND and TL. This means that objects within the group appear to the user with the same quality at the same Node Depth.
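As a rough sketch of this grouping rule, objects can be bucketed by their (ND, TL) pair so that every group shares one Node Depth and one Threshold Level. The dictionary layout and the object identifiers are hypothetical, and a single TL per object is assumed for brevity even though the description assigns one TL per axis.

```python
from collections import defaultdict

def group_objects(objects):
    """Group object ids by (ND, TL). Objects in one group can then be
    reorganized under a common group root node and decoded at the same
    Node Depth."""
    groups = defaultdict(list)
    for obj in objects:
        groups[(obj["nd"], obj["tl"])].append(obj["id"])
    return dict(groups)

# Hypothetical scene description.
scene = [
    {"id": "chair", "nd": 6, "tl": 4},
    {"id": "lamp", "nd": 6, "tl": 4},
    {"id": "statue", "nd": 8, "tl": 4},
    {"id": "tree", "nd": 6, "tl": 5},
]
groups = group_objects(scene)
```

Here "statue" stays out of the chair/lamp group although its TL matches, because its ND differs.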
[0036] In addition, groups must be offered in various combinations for adaptive transmission based on the user's viewing direction, because fixed combinations of groups waste user bandwidth. As shown in Fig. 3, when combining groups, the number of objects grouped is adjusted according to the user's viewing direction so that objects with the same ND and TL are placed in the same group.
[0037] This will be explained in more detail with reference to Fig. 3.
[0038] FIG. 3 shows a group of objects within the user's field of view according to embodiments.
[0039] As previously explained, the server generates various combinations of object groups to provide object groups suited to the user's head direction. Object groups are combined within the user's field of view, and various types of object groups can be generated depending on the user's head direction. Figure 3 illustrates object groups that are combined in various ways within the same TL and ND depending on the user's head direction. The user's head direction can be expressed as θ and φ, which are the latitude and longitude values in the polar coordinate representation of the head direction. Additionally, the user's field of view (FoV) is expressed as (q, j) by the service provider terminal. To generate object groups, it is necessary to determine whether an object group falls within the user's field of view on both the latitude and longitude axes; for ease of understanding, however, the example of Figure 3 is explained for the longitude axis only.
[0040] When the user's head direction is represented by θ and the field of view is defined by q, the user's field of view is represented as (θ - q/2, θ + q/2). Starting at θ = 0, an object group is created by combining the objects included within the field of view, taking TL and ND into account, and a new object group is created whenever the composition of objects within the field of view changes as θ increases.
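The sweep over θ can be illustrated with the sketch below. It assumes that the objects passed in already share the same ND and TL, that each object is reduced to a single azimuth value, and that θ is stepped in fixed increments; the names `objects_in_fov`, `sweep_groups`, and the `azimuth` key are hypothetical.

```python
def objects_in_fov(objects, theta, q):
    """Ids of objects whose azimuth lies inside (theta - q/2, theta + q/2)."""
    def delta(a, b):
        # Signed angular difference folded into [-180, 180).
        return (a - b + 180.0) % 360.0 - 180.0
    return frozenset(
        o["id"] for o in objects if abs(delta(o["azimuth"], theta)) < q / 2
    )

def sweep_groups(objects, q, step=1.0):
    """Increase theta from 0 and record a new group each time the set of
    objects inside the field of view changes. Empty views create no group."""
    groups = []
    previous = None
    theta = 0.0
    while theta < 360.0:
        visible = objects_in_fov(objects, theta, q)
        if visible != previous:
            if visible:
                groups.append((theta, visible))
            previous = visible
        theta += step
    return groups

# Hypothetical objects with equal ND and TL, described by azimuth only.
same_level = [
    {"id": "a", "azimuth": 10.0},
    {"id": "b", "azimuth": 50.0},
    {"id": "c", "azimuth": 200.0},
]
fov_groups = sweep_groups(same_level, q=90.0, step=10.0)
```

Because a group is emitted only when the visible set changes, the number of pre-built files depends on how objects are distributed around the user rather than on the step size alone.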
[0041] Based on the above explanation, the following three elements are broadly required to describe, in the MPD, objects reconfigured to fit the user's location.
[0042] 1. Adaptation Set
[0043] 2. Representation (Resolution)
[0044] 3. SRD (Spatial Relationship Description)
[0045] 1. Adaptation Set
[0046] An Adaptation Set is generally a content element that can be individually decoded. This Adaptation Set is defined as an individual Adaptation Set of objects that have been divided or combined through the previously explained TL and ND.
[0047] 2. Representation (Playback Quality)
[0048] Representation refers to the quality of the content and is adjusted according to the user's network environment. Since V-PCC-based content has various Node Depths and providing all Node Depths as Representations can cause disparities between objects, the number of levels (m) and the depth gap (n) provided by the system are defined. Here, the number of levels refers to the number of playback quality levels of the content provided by the system, and the depth gap refers to the interval between quality levels.
[0049] Together with ND, m and n set the Node Depth of each object assigned to a Representation. ND is the lowest depth level offered as a Representation; from there, up to m levels are created, spaced at least n steps apart, up to the full Node Depth of the object.
[0050] For example, if an object with a Leaf Node Depth of 16 has an ND of 6 and m=5, the object has at most m, that is 5, Representations. Among these, the lowest Representation has a Node Depth of 6.
[0051] The highest Representation has a Node Depth of 16. Node Depths of 9 and 12 become the intermediate Representations so that a gap of at least n is kept between levels. Consequently, although the system can provide 5 resolutions, it provides only 4 Representations because of the relationship between the object's ND and its maximum depth.
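The worked example can be reproduced with a short sketch. The text does not state the value of n; n = 3 is assumed here because it yields the depths 6, 9, 12, and 16 given above. The function name and the rule that a candidate closer than n to the leaf depth is dropped are likewise assumptions.

```python
def representation_depths(nd, leaf_depth, m, n):
    """Node Depths offered as Representations: start at ND, step by n,
    keep at most m levels, always end at the leaf depth, and skip any
    candidate that would sit closer than n to the leaf depth."""
    depths = []
    depth = nd
    while depth < leaf_depth and len(depths) < m - 1:
        if leaf_depth - depth >= n:
            depths.append(depth)
        depth += n
    depths.append(leaf_depth)
    return depths

# Values from the example, with n = 3 assumed.
example_depths = representation_depths(nd=6, leaf_depth=16, m=5, n=3)
```

With these inputs the candidate depth 15 is discarded because it is only one step below 16, which is why four Representations result even though m allows five.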
[0052] 3. SRD (Spatial Relationship Description)
[0053] An SRD (Spatial Relationship Description) is an entry that describes the relationship between a user and an object, as well as the object itself. This entry is written as a sub-item called an Essential Property within an Adaptation Set and provides additional information about the Adaptation Set. The SRD of this system is written with the following fields.
[0054] One field represents the positional relationship from the user to the center of an individual object or group of objects in a polar coordinate system. Another gives the Width, Depth, and Height of the object. A further field is a quaternion value describing the rotation of the object. Since 3-axis-based rotation values, such as yaw, pitch, and roll, have the problem that the final direction changes depending on the order of rotation, the rotation of the object is represented using a quaternion.
[0055] Based on the corresponding SRD value, the user can check information such as the object's position, size, and rotation, and place the object in the space.
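A receiver-side sketch of how such SRD values might be used to place an object is given below. The container `SrdEntry`, its field names, the (w, x, y, z) quaternion ordering, and the angle convention (θ as azimuth, φ as elevation) are assumptions made for illustration; the actual SRD syntax is not reproduced in this text.

```python
import math
from dataclasses import dataclass

@dataclass
class SrdEntry:
    r: float            # distance from the user to the object center
    theta_deg: float    # assumed azimuth
    phi_deg: float      # assumed elevation
    size: tuple         # (width, depth, height)
    rotation: tuple     # unit quaternion, assumed order (w, x, y, z)

def center(entry):
    """Convert the polar position to Cartesian coordinates."""
    theta = math.radians(entry.theta_deg)
    phi = math.radians(entry.phi_deg)
    return (
        entry.r * math.cos(phi) * math.cos(theta),
        entry.r * math.cos(phi) * math.sin(theta),
        entry.r * math.sin(phi),
    )

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def rotate(q, v):
    """Rotate vector v by unit quaternion q using
    v' = v + 2w(u x v) + 2u x (u x v), where u is the vector part of q."""
    w, u = q[0], q[1:]
    uv = cross(u, v)
    uuv = cross(u, uv)
    return tuple(v[i] + 2.0 * (w * uv[i] + uuv[i]) for i in range(3))

# Hypothetical entry: 2 units away at azimuth 90 degrees, turned 90 degrees about z.
half = math.sqrt(0.5)
entry = SrdEntry(2.0, 90.0, 0.0, (1.0, 1.0, 1.0), (half, 0.0, 0.0, half))
placed_center = center(entry)
rotated_x_axis = rotate(entry.rotation, (1.0, 0.0, 0.0))
```

A single quaternion encodes the final orientation directly, so no rotation order has to be agreed between sender and receiver, which is the motivation stated above.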
[0056] FIG. 4 shows a structure for transmitting V3C (visual volumetric video-based coding) data according to embodiments.
[0057] A real-world or synthetic visual scene (A) is captured by a camera set, such as a camera device with multiple lenses and sensors or a virtual camera. The acquired result is source volumetric data (B). One or more volumetric frames are encoded into a coded V3C bitstream containing an atlas bitstream, up to one occupancy bitstream, a geometry bitstream, and zero or more attribute bitstreams (Ev). Then, one or more coded bitstreams are packaged into a media file for local playback (F) or into a sequence of media segments (Fs) and an initialization segment for streaming, according to a specific media container file format. In the embodiments, the media container file format is the ISO base media file format (ISOBMFF) specified in ISO/IEC 14496-12. The file encapsulator may also include metadata in the file or segments. The segments Fs are delivered to a player using a delivery mechanism.
[0058] The file (F) output by the file encapsulator is identical to the file (F') received as input by the file decapsulator. The file decapsulator processes the file (F') or the received segments (F's), extracts the coded bitstream (E'v), and parses the metadata. The V3C bitstream is then decoded into a decoded signal (D'). The decoded volumetric data (D') is reconstructed and rendered to be displayed on the screen of a head-mounted display or other display device according to the current viewing orientation or viewport. The current viewing orientation is determined by head tracking and eye tracking functions. In viewport-dependent transmission, the current viewing orientation is also passed to the strategy module, which determines the track to receive based on the viewing orientation.
[0059] The process described above can be applied to both live and on-demand use cases.
[0060] The interface definition in Fig. 4 is as follows:
[0061] F/F': A media file containing specifications for the track format, which may include constraints on the elementary streams contained in the track samples. Timed V3C content and/or non-timed V3C data can each be encapsulated in file form.
[0062] The system of FIG. 4 according to the embodiments includes a transmission-related interface for DASH transmission.
[0063] The system of FIG. 4 according to the embodiments includes a transmission-related interface for MMT transmission.
[0064] FIG. 5 shows the multi-track encapsulation of V-PCC data according to ISOBMFF (ISO Base Media File Format) according to the embodiments.
[0065] The point cloud data encoding method according to the embodiments can encapsulate the timed V-PCC into an ISOBMFF-based file.
[0066] Single Track Encapsulation:
[0067] Single-track encapsulation of V3C data stores the V3C bitstream in a single track, called the V3C bitstream track.
[0068] Single-track encapsulation of V3C data is used when the V3C bitstream is encapsulated directly in ISOBMFF. The V3C bitstream is stored as a single track without additional processing, and the V3C unit header data structures remain in the bitstream. Single-track encapsulated V3C data can serve as input to further processing such as multi-track file generation, transcoding, or DASH segmentation.
[0069] V3C Bitstream Sample Entry:
[0070] Sample entry types: 'v3e1', 'v3eg'
[0071] Container: SampleDescriptionBox
[0072] Required: 'v3e1' or 'v3eg' sample entry is required
[0073] Quantity: One or more
[0074] V3C bitstream tracks use VolumetricVisualSampleEntry with the 'v3e1' or 'v3eg' sample entry type.
[0075] In the 'v3e1' sample entry, all atlas parameter sets and SEI messages defined in ISO/IEC 23090-5 are in the setup_unit array. In the 'v3eg' sample entry, the atlas parameter sets and SEI messages may be in the setup_unit array or in samples of the V3C bitstream track.
[0076] The V3C bitstream track sample entry includes a V3CConfigurationBox, and the following restrictions apply.
[0077] In the 'v3e1' sample entry, for an array containing an atlas parameter set, the array_completeness value is 1.
[0078] In the 'v3eg' sample entry, for an array containing an atlas parameter set, the array_completeness value is 0.
[0079] The 2D video configuration box of the V3C video component sub-bitstream defined in ISO/IEC 14496-15 is present in the V3C bitstream sample entry to signal the corresponding 2D video decoder configuration and initialization information.
[0080] An optional BitRateBox defined in ISO/IEC 14496-12 may be present in the V3C bitstream sample entry to signal bit rate information of the V3C bitstream track.
[0081] Syntax:
[0082] aligned(8) class V3CBitstreamSampleEntry()
[0083] extends VolumetricVisualSampleEntry (type) {
[0084] // type is 'v3e1' or 'v3eg'
[0085] V3CConfigurationBox v3c_config;
[0086] // additional boxes
[0087] }
[0088] Semantics:
[0089] The compressorname of the base class VolumetricVisualSampleEntry indicates the name of the compressor used, with the recommended value "\012V3C Coding". The first byte is a count of the remaining bytes, written here as \012 (octal 12, that is, decimal 10), which is the number of bytes in the rest of the string.
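The length-prefixed layout of compressorname can be sketched as follows. The fixed 32-byte field width is the usual ISOBMFF convention for compressorname and is assumed to apply here; the helper name `encode_compressorname` is hypothetical.

```python
def encode_compressorname(name="V3C Coding"):
    """Build a compressorname field: one length byte, the name bytes,
    then zero padding up to an assumed fixed width of 32 bytes."""
    raw = name.encode("utf-8")
    if len(raw) > 31:
        raise ValueError("compressorname must fit in 31 bytes")
    return bytes([len(raw)]) + raw + bytes(31 - len(raw))

field = encode_compressorname()
# field[0] is 10, which is octal 12: the "\012" prefix of the recommended value.
```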
[0090] V3C Bitstream Track Sample Type
[0091] A V3C bitstream sample contains one or more V3C units belonging to the same presentation time, i.e., a single V3C configuration unit. The sample may be self-contained (e.g., a synchronization sample) or dependent on other samples in the V3C bitstream track in terms of decoding.
[0092] Syntax
[0093] aligned(8) class V3CBitstreamSample {
[0094] // sample_size is the size of the sample taken from SampleSizeBox
[0095] for (int i=0; i < sample_size; ) {
[0096] unsigned int((v3c_config.unit_size_precision_bytes_minus1 + 1)*8) v3c_unit_size;
[0097] bit(8) ss_v3c_unit[v3c_unit_size];
[0098] i += v3c_unit_size + v3c_config.unit_size_precision_bytes_minus1 + 1;
[0099] }
[0100] }
[0101] Semantics
[0102] v3c_unit_size represents the size of the ss_v3c_unit array in bytes. The size is the same as the sample stream V3C unit size ssvu_v3c_unit_size defined in ISO/IEC 23090-5, Annex C.
[0103] ss_v3c_unit contains a single V3C unit in the V3C unit sample stream format defined in ISO/IEC 23090-5:2021, Annex C.
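The sample layout above is a sequence of length-prefixed V3C units, which a reader could split as in this sketch. Big-endian size fields are assumed, in line with ISOBMFF conventions, and `split_v3c_units` is a hypothetical helper name. The same loop shape would apply to the NAL-unit-based atlas samples.

```python
def split_v3c_units(sample, unit_size_precision_bytes_minus1):
    """Split one V3C bitstream sample into its V3C units. Each unit is
    preceded by a size field of (unit_size_precision_bytes_minus1 + 1) bytes."""
    prefix = unit_size_precision_bytes_minus1 + 1
    units = []
    i = 0
    while i < len(sample):
        if i + prefix > len(sample):
            raise ValueError("truncated v3c_unit_size field")
        size = int.from_bytes(sample[i:i + prefix], "big")
        start = i + prefix
        end = start + size
        if end > len(sample):
            raise ValueError("V3C unit extends past end of sample")
        units.append(sample[start:end])
        i = end  # same as i += v3c_unit_size + prefix in the syntax
    return units

# Two units of 3 and 1 bytes, each with a 2-byte size field.
demo_units = split_v3c_units(b"\x00\x03abc\x00\x01z", 1)
```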
[0104] V3C Bitstream Track Synchronization Sample
[0105] The V3C bitstream synchronization sample satisfies all of the following conditions.
[0106] It can be decoded independently.
[0107] Samples following a synchronization sample in decoding order have no decoding dependency on samples preceding the synchronization sample.
[0108] All samples following the synchronization sample in decoding order can be successfully decoded.
[0109] V3C Bitstream Track Subsample
[0110] A V3C bitstream track subsample is a V3C unit included in a V3C bitstream track sample.
[0111] The V3C bitstream track contains one SubSampleInformationBox, either in its SampleTableBox or in the TrackFragmentBox of each of its MovieFragmentBoxes, which lists the V3C bitstream track subsamples.
[0112] The 32-bit unit header of the V3C unit representing the subsample is copied to the 32-bit codec_specific_parameters field of the subsample entry in SubSampleInformationBox. The V3C unit type of each subsample is identified by parsing the codec_specific_parameters field of the subsample entry in SubSampleInformationBox.
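Identifying the unit type from codec_specific_parameters could look like the sketch below. It assumes, from recollection of ISO/IEC 23090-5 and to be verified against the standard, that vuh_unit_type occupies the five most significant bits of the 32-bit V3C unit header and that the listed numeric codes are correct; the function and table names are hypothetical.

```python
# Assumed vuh_unit_type codes; verify against ISO/IEC 23090-5 before relying on them.
V3C_UNIT_TYPES = {
    0: "V3C_VPS",
    1: "V3C_AD",
    2: "V3C_OVD",
    3: "V3C_GVD",
    4: "V3C_AVD",
    5: "V3C_PVD",
}

def v3c_unit_type(codec_specific_parameters):
    """Read the unit type from the copied 32-bit V3C unit header,
    assuming it sits in the top 5 bits."""
    unit_type = (codec_specific_parameters >> 27) & 0x1F
    return V3C_UNIT_TYPES.get(unit_type, "reserved({})".format(unit_type))

geometry = v3c_unit_type(3 << 27)
```

Copying the header into the subsample entry lets a reader pick out, for example, only geometry subsamples without parsing the sample payload.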
[0113] Multiple track encapsulation
[0114] A multiple-track encapsulated V3C data container may contain three types of tracks: V3C atlas tracks, V3C atlas tile tracks, and V3C video component tracks. A multiple-track encapsulated V3C data container contains one or more V3C atlas tracks that reference zero or more V3C atlas tile tracks or zero or more V3C video component tracks. If there are V3C atlas tile tracks, they reference zero or more V3C video component tracks. The number of V3C video component tracks in a multiple-track encapsulated V3C data container depends on the V3C toolset profile in use, as defined in ISO/IEC 23090-5.
[0115] ISOBMFF track references are utilized to indicate the association between a V3C video component track and a V3C atlas track or V3C atlas tile track, wherein the V3C atlas track or V3C atlas tile track includes a track reference to the V3C video component track.
[0116] Tracks belonging to the same CVS are time-aligned. Samples contributing to the same volumetric frame in different V3C video component tracks, V3C atlas tracks, and V3C atlas tile tracks have the same composition time. The atlas parameter sets used for these samples have a decoding time that is equal to or earlier than the composition time of the volumetric frame. Additionally, all tracks belonging to the same CVS have the same implicit or explicit edit list.
[0117] Referring to Fig. 5, which illustrates a multi-track encapsulated V3C data container, the V3C unit payloads of the V3C bitstream are mapped to individual tracks within the multi-track container file according to their type.
[0118] Multi-track encapsulated V3C data containers include the following:
[0119] One or more V3C atlas tracks, which may include track references to: other tracks carrying the payloads of video-compressed V3C units (i.e., V3C units of type V3C_OVD, V3C_GVD, V3C_AVD, or V3C_PVD as specified in ISO/IEC 23090-5);
[0120] V3C atlas tile tracks; and, if there are multiple atlases in the bitstream, other V3C atlas tracks;
[0121] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for occupancy data (i.e., payloads of V3C units of type V3C_OVD as specified in ISO/IEC 23090-5);
[0122] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for geometry data (i.e., payloads of V3C units of type V3C_GVD as specified in ISO/IEC 23090-5);
[0123] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for attribute data (i.e., payloads of V3C units of type V3C_AVD as specified in ISO/IEC 23090-5);
[0124] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for packed data (i.e., payloads of V3C units of type V3C_PVD as specified in ISO/IEC 23090-5);
[0125] Zero or more V3C atlas tile tracks whose samples contain only ACL NAL units for a subset of atlas tiles. V3C atlas tile tracks may include track references to other tracks carrying the payloads of video-compressed V3C units (i.e., V3C units of type V3C_OVD, V3C_GVD, V3C_AVD, or V3C_PVD) for the specified subset of atlas tiles.
[0126] V3C Atlas Sample Entry
[0127] Sample entry type: 'v3c1', 'v3cg', 'v3cb', 'v3a1' or 'v3ag'
[0128] Container: SampleDescriptionBox
[0129] Required: A 'v3c1', 'v3cg', 'v3cb', 'v3a1', or 'v3ag' sample entry is required.
[0130] Quantity: One or more
[0131] The V3C atlas track uses V3CAtlasSampleEntry, which extends VolumetricVisualSampleEntry using the 'v3c1', 'v3cg', 'v3cb', 'v3a1', or 'v3ag' sample entry types. The restrictions for the V3C atlas track are as follows.
[0132] V3C atlas tracks must not carry ACL NAL units belonging to two or more atlases.
[0133] The V3C atlas track sample entry includes a V3CConfigurationBox and a V3CUnitHeaderBox.
[0134] Depending on the V3C bitstream or sample entry type of the atlas track, the following restrictions apply to the V3C atlas track.
[0135] If the V3C bitstream contains a single atlas, a V3C atlas track with the sample entry type 'v3c1' or 'v3cg' is used.
[0136] If a V3C bitstream contains multiple atlases, each atlas bitstream is stored as a separate V3C atlas track with a sample entry type of 'v3a1' or 'v3ag'. There must be one additional track with a sample entry type of 'v3cb', which serves as an entry point track referencing another atlas track with a sample entry type of 'v3a1' or 'v3ag'.
[0137] In the 'v3a1' and 'v3ag' sample entries, num_of_v3c_parameter_sets is equal to 0. The V3C parameter sets are stored in the sample entry of the atlas track with the 'v3cb' sample entry type.
[0138] V3C atlas tracks with a sample entry type of 'v3cb' do not include ACL NAL units.
[0139] In the 'v3c1' and 'v3a1' sample entries, for arrays containing atlas parameter sets, the array_completeness value is 1.
[0140] In the 'v3cg' and 'v3ag' sample entries, for arrays containing atlas parameter sets, the array_completeness value is 0.
[0141] The parameter set and SEI message in the atlas track with the 'v3cb' sample entry apply to all referenced V3C atlas tracks.
[0142] For tracks with sample entry type 'v3c1', 'v3cg', or 'v3cb', the track_in_movie flag in the track header is set to 1.
[0143] For tracks where the sample entry type is 'v3a1' or 'v3ag', the track_in_movie flag in the track header is set to 0.
[0144] An optional BitRateBox may be present in the V3C atlas sample entry to signal bit rate information of the V3C atlas track.
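The restrictions listed above can be summarized as a lookup table plus a small selection helper. This is only a sketch: the table keys, the helper name, and the reading that array_completeness 0 corresponds to parameter sets that may also appear in samples (by analogy with the 'v3e1'/'v3eg' description earlier) are assumptions.

```python
# Per-type constraints as stated in the restrictions above.
ATLAS_SAMPLE_ENTRY_RULES = {
    "v3c1": {"array_completeness": 1, "track_in_movie": 1, "acl_nal_units": True},
    "v3cg": {"array_completeness": 0, "track_in_movie": 1, "acl_nal_units": True},
    "v3cb": {"array_completeness": None, "track_in_movie": 1, "acl_nal_units": False},
    "v3a1": {"array_completeness": 1, "track_in_movie": 0, "acl_nal_units": True},
    "v3ag": {"array_completeness": 0, "track_in_movie": 0, "acl_nal_units": True},
}

def atlas_track_entry_types(atlas_count, parameter_sets_in_samples):
    """Sample entry types for the atlas tracks of one V3C bitstream.
    A single atlas uses one 'v3c1'/'v3cg' track; multiple atlases use one
    'v3cb' entry-point track plus one 'v3a1'/'v3ag' track per atlas."""
    if atlas_count < 1:
        raise ValueError("a V3C bitstream carries at least one atlas")
    if atlas_count == 1:
        return ["v3cg" if parameter_sets_in_samples else "v3c1"]
    per_atlas = "v3ag" if parameter_sets_in_samples else "v3a1"
    return ["v3cb"] + [per_atlas] * atlas_count

single = atlas_track_entry_types(1, False)
multi = atlas_track_entry_types(3, True)
```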
[0145] Syntax
[0146] aligned(8) class V3CAtlasSampleEntry()
[0147] extends VolumetricVisualSampleEntry (type) {
[0148] // type is 'v3c1', 'v3cg', 'v3cb', 'v3a1' or 'v3ag'
[0149] V3CConfigurationBox config;
[0150] V3CUnitHeaderBox unit_header;
[0151] }
[0152] Semantics
[0153] The compressorname of the base class VolumetricVisualSampleEntry indicates the name of the compressor used, with the recommended value "\012V3C Coding".
[0154] V3C Atlas Tile Sample Entry
[0155] Sample entry type: 'v3t1'
[0156] Container: SampleDescriptionBox
[0157] Required: Yes
[0158] Quantity: One or more
[0159] The V3C Atlas Tile Track uses V3CAtlasTileSampleEntry, which extends VolumetricVisualSampleEntry with the 'v3t1' sample entry type.
[0160] The V3C Atlas Tile Track sample contains only ACL NAL units belonging to the same atlas. The V3C Atlas Tile Track contains the ACL NAL unit of at least one tile indicated by the tile_id of the V3CAtlasTileConfigurationBox.
[0161] V3CAtlasTileSampleEntry does not include V3CConfigurationBox or V3CUnitHeaderBox. The information provided by these boxes can be found in the V3C Atlas Track Sample Entry, which references the V3C Atlas Tile Track. Other optional boxes may be included.
[0162] Syntax
[0163] class V3CAtlasTileConfigurationBox
[0164] extends FullBox('v3tC', version = 0, 0) {
[0165] unsigned int(3) unit_size_precision_bytes_minus1;
[0166] unsigned int(1) spatial_scalability_enabled_flag;
[0167] bit(4) reserved = 0;
[0168] if (spatial_scalability_enabled_flag) {
[0169] unsigned int(8) lod_index;
[0170] }
[0171] unsigned int(16) num_tiles;
[0172] for(int i=0; i < num_tiles; i++){
[0173] unsigned int(16) tile_id;
[0174] }
[0175] }
[0176] aligned(8) class V3CAtlasTileSampleEntry()
[0177] extends VolumetricVisualSampleEntry ('v3t1') {
[0178] V3CAtlasTileConfigurationBox tile_info;
[0179] }
[0180] Semantics
[0181] unit_size_precision_bytes_minus1 plus 1 indicates the precision, in bytes, of the size field of the sample stream NAL units in the samples to which the sample entry containing this configuration box applies. The value of this field is equal to ssnh_unit_size_precision_bytes_minus1 in the sample_stream_nal_header() of the atlas component bitstream.
[0182] spatial_scalability_enabled_flag is a flag indicating whether LoD-based scalability is supported in delivered V3C content.
[0183] lod_index represents the LoD index value associated with the tiles carried in the atlas tile track. An atlas tile track with a specific LoD index (if any) is selected together with all atlas tile tracks that contain the same tile and have a lower lod_index value. The set of LoD tiles associated with the lower lod_index value is processed first.
[0184] num_tiles is the number of tiles included in the track.
[0185] tile_id is the tile ID of a tile in the track. The value of tile_id is equal to the value of the afti_tile_id syntax element of the atlas frame tile information defined in ISO/IEC 23090-5.
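A reader for the body of V3CAtlasTileConfigurationBox, following the syntax and semantics above, might look like this sketch. It assumes the input starts after the FullBox version and flags, that multi-byte fields are big-endian, and that the function name and returned dictionary keys are free to choose.

```python
import struct

def parse_tile_config(payload):
    """Parse the V3CAtlasTileConfigurationBox body: 3-bit precision,
    1-bit scalability flag, 4 reserved bits, optional 8-bit lod_index,
    16-bit num_tiles, then num_tiles 16-bit tile_id values."""
    first = payload[0]
    precision_minus1 = first >> 5
    scalable = (first >> 4) & 1
    pos = 1
    lod_index = None
    if scalable:
        lod_index = payload[pos]
        pos += 1
    (num_tiles,) = struct.unpack_from(">H", payload, pos)
    pos += 2
    tile_ids = list(struct.unpack_from(">{}H".format(num_tiles), payload, pos))
    return {
        "unit_size_precision_bytes_minus1": precision_minus1,
        "spatial_scalability_enabled_flag": scalable,
        "lod_index": lod_index,
        "tile_ids": tile_ids,
    }

# Hypothetical body: precision 3, scalability on, lod_index 2, tiles 7 and 9.
tile_cfg = parse_tile_config(bytes([0x70, 2, 0, 2, 0, 7, 0, 9]))
```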
[0186] V3C Atlas Sample Format
[0187] Each sample of the V3C Atlas Track or V3C Atlas Tile Track corresponds to a single coded atlas access unit and has the following additional description.
[0188] When using 'v3cb' sample entries, each sample in the V3C atlas track corresponds to one or more non-ACL NAL units.
[0189] When using 'v3c1', 'v3cg', 'v3a1', or 'v3ag' sample entries, each sample in the V3C atlas track corresponds to a coded atlas access unit associated with the same vuh_atlas_id indicated in the V3C unit header box of the sample entry.
[0190] Syntax
[0191] aligned(8) class V3CAtlasSample {
[0192] // sample_size is the size of the sample taken from SampleSizeBox
[0193] for (int i=0; i < sample_size; ) {
[0194] unsigned int((v3c_config.unit_size_precision_bytes_minus1 + 1)*8) nal_size;
[0195] bit(8) ss_nal_unit[nal_size];
[0196] i += nal_size + v3c_config.unit_size_precision_bytes_minus1 + 1;
[0197] }
[0198] }
[0199] Semantics
[0200] nal_size represents the size of the ss_nal_unit array in bytes.
[0201] ss_nal_unit is a data array containing a single NAL unit defined in ISO/IEC 23090-5.
[0202] V3C Atlas Track and V3C Atlas Tile Track Synchronization Sample
[0203] A synchronization sample of a V3C atlas track or V3C atlas tile track is a sample containing an atlas access unit coded as an Intra Random Access Point (IRAP) as defined in ISO/IEC 23090-5.
[0204] V3C Video Component Track
[0205] V3C video component tracks carry the 2D video-encoded data of V3C video components. Storage of V3C video component tracks relies on existing features of the ISO base media file format and its derived specifications. For example, ISO/IEC 14496-15 defines a mechanism for carrying V3C video components coded with ISO/IEC 14496-10 and ISO/IEC 23008-2.
[0206] Referring to FIG. 5, the file includes an atlas track, which is the entry point. The file also includes a geometry track, an attribute track, and an occupancy track, which are video component tracks. Each track includes a sample entry and one or more samples. The sample entry of each track may include configuration information and/or parameter sets. The atlas track includes reference information that references the video component tracks. The sample entry of each track may include a unit header. The samples of each track may include units.
[0207] FIG. 6 shows DASH signaling according to embodiments.
[0208] The point cloud data transmission method according to the embodiments may further include MPEG-DASH-based encapsulation.
[0209] Single Track Mode
[0210] DASH's single-track mode enables streaming of V3C ISOBMFF files containing V3C content using single-track encapsulation. DASH's single-track mode is represented as a single adaptation set with one or more representations.
[0211] If the representation consists of two or more media segments, there is an initialization media segment. The initialization segment contains a V3CDecoderConfigurationRecord with a v3c_parameter_set syntax structure defined in (ISO / IEC 23090-5, Clause 7) and a Component Codec Mapping SEI message defined in (ISO / IEC FDIS 23090-5, Appendix E).
[0212] The first sample of the media segment has a stream access point (SAP) of type 1 or 2. That is, each sub-sample of the first sample has a stream access point (SAP) of type 1 or 2.
[0213] V3C Pre-selection
[0214] V3C preselection can be signaled in MPD using a PreSelection element within a Period element or a Preselection descriptor at the Adaptation Set level. The V3C PreSelection element is signaled with a list of IDs for the @preselectionComponents attribute as defined in ISO / IEC 23009-1, which includes the ID of the Main Adaptation Set of the volumetric media followed by the IDs of the Video Component Adaptation Sets. The @codecs attribute for the Preselection is set to 'v3c1', 'v3cg', or 'v3cb' to indicate that the media represented by the Preselection is visual volumetric video-based coding media.
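The ordering rule for @preselectionComponents (main adaptation set ID first, video component adaptation set IDs after it) can be illustrated as follows. This is only a sketch; the helper name split_preselection_components is hypothetical.

```python
from typing import List, Tuple


def split_preselection_components(value: str) -> Tuple[str, List[str]]:
    """Return (main adaptation set ID, video component adaptation set IDs)."""
    ids = value.split()  # whitespace-separated list of adaptation set IDs
    if not ids:
        raise ValueError("empty @preselectionComponents list")
    # The first ID identifies the main adaptation set of the volumetric media.
    return ids[0], ids[1:]
```

For a value of "1 2 3 4", the main adaptation set is "1" and the component adaptation sets are "2", "3", and "4".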
[0215] Figure 7 shows a DASH configuration for grouping V3C components belonging to a single V3C content within an MPEG-DASH MPD file.
[0216] If there are multiple atlases in V3C content, each atlas track is represented as a separate adaptation set considered as an atlas adaptation set. An atlas adaptation set is defined by setting the @codecs attribute to 'v3a1' or 'v3ag'. The representation of an atlas adaptation set is defined by setting the @dependencyId attribute to the ID of the representation of the main adaptation set. Each atlas adaptation set is part of a separate preselection that includes the atlas adaptation set, which is the main adaptation set of the preselection, and the video component adaptation set of that atlas.
[0217] V3C Atlas Tile Pre-selection
[0218] If a V3C atlas tile is transmitted as a separate track, it must be represented as a separate adaptation set considered as an atlas tile adaptation set, and the @codecs attribute of the adaptation set is set to 'v3t1'. The V3C video component track associated with the atlas tile track is also transmitted as a separate adaptation set with the @codecs attribute set to 'resv.vvvc.XXXX', where XXXX corresponds to the 4-character code (4CC) of the video codec (e.g., 'avc1' or 'hvc1').
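A receiver could use the @codecs values listed above to tell the kinds of adaptation sets apart. The following sketch assumes only the codes named in this description; the function name and the returned labels are hypothetical.

```python
def classify_adaptation_set(codecs: str) -> str:
    """Classify an adaptation set from its @codecs attribute value."""
    value = codecs.strip()
    fourcc = value.split(".")[0]
    if fourcc in ("v3c1", "v3cg", "v3cb"):
        return "main"             # V3C preselection / main adaptation set
    if fourcc in ("v3a1", "v3ag"):
        return "atlas"            # atlas adaptation set
    if fourcc == "v3t1":
        return "atlas_tile"       # atlas tile adaptation set
    if value.startswith("resv.vvvc."):
        return "video_component"  # restricted video carrying a V3C component
    return "unknown"
```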
[0219] Atlas Tile Adaptation Sets and associated Video Component Adaptation Sets must be part of a single Atlas Tile Preselection in MPD, and the Atlas Tile Adaptation Set is the primary Adaptation Set of that Preselection (i.e., the ID of the Atlas Tile Adaptation Set is the first ID in the list of Adaptation Sets in the @preselectionComponents attribute of the Preselection element or the @value attribute of the Preselection descriptor). The representation of the Atlas Tile Adaptation Set of the Atlas Tile Preselection has an @dependencyId attribute set to the representation ID of that Atlas Adaptation Set.
[0220] The point cloud data transmission method according to the embodiments can generate and transmit elements and attributes of the V3CVideoComponent descriptor (V3C video component descriptor) of the MPD for MPEG-DASH signaling as follows.
[0221] The V3CVideoComponent descriptor is used to identify the type of a video component adaptation set. The V3CVideoComponent descriptor is an EssentialProperty descriptor with @schemeIdUri set to "urn:mpeg:mpegI:v3c:2020:videoComponent".
[0222] At the adaptation set level, one V3CVideoComponent descriptor is signaled for each V3C video component present in the representations of the video component adaptation set (see the Video Component Adaptation Set in FIG. 7).
[0223] videoComponent@type: Indicates the type of the V3C video component. The value 'geom' indicates a geometry component, 'occp' indicates an occupancy component, and 'attr' indicates an attribute component.
[0224] videoComponent@is_auxiliary: A flag indicating whether the V3C video component information represented in the Adaptation Set with the V3CVideoComponent descriptor is for auxiliary video. A value of true indicates that the video is auxiliary video and contains RAW and / or EOM patches. If equal to false, it indicates that the video may contain RAW and / or EOM patches.
[0225] videoComponent@map_index: Represents the index of one of the maps of components in the Adaptation Set with the V3CVideoComponent descriptor.
[0226] videoComponent@attribute_type: Indicates the type of attribute as defined in Table 3 of ISO / IEC 23090-5:2021.
[0227] videoComponent@attribute_index: Represents the index of the attribute.
[0228] videoComponent@atlas_id: Represents the atlas ID of the component in the Adaptation Set with the V3CVideoComponent descriptor.
[0229] videoComponent@tile_ids: Represents atlas tiles associated with the data included in the Adaptation Set by providing a space-separated list of tile ID values.
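The attributes listed above can be read from an adaptation set with a standard XML parser. The sketch below makes several assumptions that are not stated in this description: the namespace URI bound to the v3c: prefix, a default of false for a missing is_auxiliary attribute, and an MPD fragment without a default DASH namespace. The class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

V3C_NS = "urn:mpeg:mpegI:v3c:2020"  # ASSUMPTION: namespace URI of the v3c: prefix
VIDEO_COMPONENT_SCHEME = "urn:mpeg:mpegI:v3c:2020:videoComponent"


@dataclass
class VideoComponent:
    type: str                           # 'geom', 'occp' or 'attr'
    is_auxiliary: bool = False
    map_index: Optional[int] = None
    attribute_type: Optional[int] = None
    attribute_index: Optional[int] = None
    atlas_id: Optional[int] = None
    tile_ids: List[int] = field(default_factory=list)


def _opt_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_video_components(adaptation_set_xml: str) -> List[VideoComponent]:
    """Collect V3CVideoComponent descriptors from one AdaptationSet element."""
    root = ET.fromstring(adaptation_set_xml)
    components: List[VideoComponent] = []
    for prop in root.iter("EssentialProperty"):
        if prop.get("schemeIdUri") != VIDEO_COMPONENT_SCHEME:
            continue
        for vc in prop.findall("{%s}videoComponent" % V3C_NS):
            components.append(VideoComponent(
                type=vc.get("type", ""),
                is_auxiliary=vc.get("is_auxiliary", "false") == "true",
                map_index=_opt_int(vc.get("map_index")),
                attribute_type=_opt_int(vc.get("attribute_type")),
                attribute_index=_opt_int(vc.get("attribute_index")),
                atlas_id=_opt_int(vc.get("atlas_id")),
                # tile_ids is a space-separated list of tile ID values
                tile_ids=[int(t) for t in vc.get("tile_ids", "").split()],
            ))
    return components
```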
[0230] The point cloud data transmission method according to the embodiments can generate a V3C descriptor of an MPD for MPEG-DASH signaling.
[0231] The SupplementalProperty element with @schemeIdUri "urn:mpeg:mpegI:v3c:2020:v3c" is a V3C descriptor. At most one V3C descriptor may exist in the Main Adaptation Set, Atlas Adaptation Set, Atlas Tile Adaptation Set, V3C Preselection, or Atlas Tile Preselection, as shown in FIG. 7.
[0232] v3c:@vId: The ID of the volumetric media. This attribute is present when multiple versions of the same volumetric media are signaled in separate adaptation sets of the MPD.
[0233] v3c:@atlas_id: Represents the atlas ID for the volumetric media information of the track delivered in the adaptation set.
[0234] v3c:@tile_ids: If present, indicates the atlas tile IDs carried in the atlas tile adaptation set.
[0235] For ISOBMFF, it includes all tile IDs listed in V3CAtlasTileSampleEntry of the V3C Atlas Tile Track.
[0236] The point cloud data transmission method according to the embodiments can generate V3C3DRegions descriptors within the MPD as follows for spatial domain signaling for partial access.
[0237] Static spatial region
[0238] If the 3D spatial regions are static (i.e., the location and size of each region do not change during the presentation time), the characteristics of the spatial regions and the mapping between those regions and V3C tiles are signaled using the V3C3DRegions descriptor. This descriptor is a SupplementalProperty element where @schemeIdUri is "urn:mpeg:mpegI:v3c:2020:v3sr". A single V3C3DRegions descriptor exists at the Adaptation Set level, the Representation level of the Main Adaptation Set, or the Preselection level of the V3C content, as shown in FIG. 7.
[0239] The elements of the V3C3DRegions descriptor are as follows.
[0240] v3sr: A container element whose attributes and elements specify the mapping between 3D space regions and V3C tiles.
[0241] v3sr.spatialRegion: This is an element whose attribute defines a 3D spatial region and provides a mapping between the defined region and multiple V3C tiles.
[0242] v3sr.spatialRegion@id: Identifier of a 3D spatial region.
[0243] The value of this attribute matches the value of the region_id field signaled for the corresponding region of the ISOBMFF container.
[0244] v3sr.spatialRegion@type: This attribute indicates the type of spatial region. A value of 0 indicates a cuboid region. A value of 1 indicates a region corresponding to a viewport.
[0245] v3sr.spatialRegion.cuboid: An element that specifies a cuboid extending from a reference point in a spatial region. This element exists only when the spatialRegion@type attribute is set to 0.
[0246] v3sr.spatialRegion.cuboid@anchor: An attribute containing three pairs of values describing the x, y, and z components of bb_position for the V3CBoundingBox signaled from the corresponding ISOBMFF container. The values in the array are arranged in that order, and the length of the array is 3.
[0247] v3sr.spatialRegion.cuboid@dimensions: An attribute containing three pairs of values describing the x, y, and z dimensions of bb_scale for V3CBoundingBox signaled from the corresponding ISOBMFF container. The values in the array are arranged in that order, and the length of the array is 3.
[0248] v3sr.spatialRegion.viewport: An element that specifies the viewport corresponding to the spatial region. This element exists only when the spatialRegion@type attribute is set to 1.
[0249] v3sr.spatialRegion.viewport@rvIds: A space-separated list of identifiers corresponding to the @viewport_id attribute values of the RV descriptors representing the viewports in this region.
[0250] v3sr.spatialRegion@tile_ids: Represents the atlas tile IDs mapped to this spatial region.
[0251] The value of the @tile_ids attribute is a space-separated list of atlas tile IDs.
[0252] This attribute does not exist in the case of single-track encapsulation of V3C content or when there are one or more lod elements.
[0253] v3sr.spatialRegion.lod: This is a container element whose attribute provides LoD information and the V3C tile corresponding to that LoD.
[0254] v3sr.spatialRegion.lod@idx: An identifier representing the order of LoDs for an associated 3D spatial region.
[0255] The value of this attribute matches the value of the lod_index field signaled for the corresponding LoD of the ISOBMFF container.
[0256] v3sr.spatialRegion.lod@tile_ids: A space-separated list of identifiers corresponding to the values of the atlas tile IDs mapped to this LoD.
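For partial access, a client could use these descriptor fields to decide which atlas tiles it needs for a given position. The sketch below is illustrative only. It assumes that the cuboid anchor is the minimum corner and that the cuboid extends in the positive direction by its dimensions, and it falls back to the smallest lod@idx when no LoD is requested; neither choice is stated in this description. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CuboidRegion:
    region_id: int                        # spatialRegion@id
    anchor: Vec3                          # cuboid@anchor (x, y, z)
    dimensions: Vec3                      # cuboid@dimensions (x, y, z)
    tile_ids: List[int] = field(default_factory=list)                 # spatialRegion@tile_ids
    lod_tile_ids: Dict[int, List[int]] = field(default_factory=dict)  # lod@idx -> lod@tile_ids

    def contains(self, point: Vec3) -> bool:
        # ASSUMPTION: the anchor is the minimum corner of the cuboid.
        return all(a <= p <= a + d
                   for a, d, p in zip(self.anchor, self.dimensions, point))

    def tiles(self, lod_idx: Optional[int] = None) -> List[int]:
        # When lod elements are present, @tile_ids is absent and the
        # tiles are taken from the requested LoD instead.
        if self.lod_tile_ids:
            if lod_idx is None:
                lod_idx = min(self.lod_tile_ids)  # ASSUMPTION: default LoD
            return self.lod_tile_ids[lod_idx]
        return self.tile_ids


def tiles_for_position(regions: Sequence[CuboidRegion], position: Vec3,
                       lod_idx: Optional[int] = None) -> List[int]:
    """Return the atlas tile IDs of every region containing the position."""
    selected: List[int] = []
    for region in regions:
        if region.contains(position):
            for tile in region.tiles(lod_idx):
                if tile not in selected:
                    selected.append(tile)
    return selected
```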
[0257] Dynamic spatial region
[0258] If the 3D spatial regions are dynamic, a timed metadata track must be used to signal the position and size of each 3D region on the presentation timeline. This track is carried in a separate adaptation set with a single representation, which is associated with the representation of the main adaptation set using the @associationId attribute defined in ISO / IEC 23009-1 and an @associationType value containing the 4CC 'cdsc'.
[0259] FIG. 7 shows group information within the MPD according to the embodiments.
[0260] The point cloud data transmission method according to the embodiments can generate group information of point cloud data in groups having event dependencies. The point cloud data reception method can decode point cloud data in groups based on the group information. That is, the V-PCC system can provide group information for simultaneously playing multiple objects (e.g., objects / effects / sound / video, etc.) dependent on a specific event.
[0261] To increase compression and decompression efficiency during encoding or decoding, the V-PCC system can provide group information, which is a new structure between object tiles.
[0262] The point cloud data encoding / decoding method according to the embodiments may utilize group information, which is a new structure between object tiles, to increase compression and restoration efficiency.
[0263] For example, to increase compression / recovery efficiency during the compression / recovery process of multiple objects, tiles of different objects can be combined for compression / recovery, and a set of combined tiles can be represented.
[0264] The point cloud data transmission method according to the embodiments can generate group information within a file according to the ISOBMFF format, and / or generate group information within an MPD.
[0265] Objects in point cloud data can be associated with each other. If different objects exist, an action regarding the first object can trigger an action regarding the second object, and actions of different objects can proceed simultaneously. The embodiments may additionally generate group information regarding these objects.
[0266] For example, related objects can be represented as group information, such as a group of objects (body) and objects (clothing), or an object (switch) and an object (device operated by the switch).
[0267] In addition, various types of objects, such as sound (audio), video, and volumetric video, can be grouped and represented as group information.
[0268] MPD can include information at the group level of objects. The decoder can use MPD to receive relevant information at the group level of objects. The decoder receives a file in ISOBMFF format and can synchronize, decode, and render objects with the same Group ID.
[0269] A file according to the embodiments may include Group ID Class structure information. Additionally, the embodiments may represent the grouping of active objects based on user location or action as file information.
[0270] Group information between object tiles according to the embodiments may be a unit that is reorganized for reasons such as improving compression efficiency, regardless of the object among the multiple tiles of a plurality of objects.
[0271] When each object is compressed based on G-PCC, the number of points contained in individual tiles may differ significantly. In such cases, tiles can be rearranged to improve uniformity and compression efficiency, and the group formed for that purpose can be represented as group information.
[0272] Group information may include information regarding group assignment after tile placement for compression efficiency, groups and locations within the groups, original locations, etc.
[0273] Based on decoding and download priority, group information can be recorded under all Adaptation Sets within the MPD (e.g., the Adaptation Set for Atlas, the Adaptation Set for Occupancy, the Adaptation Set for Geometry, and / or the Adaptation Set for Attributes) to indicate that they belong to the same group.
[0274] When creating group boxes within ISOBMFF, group information can be added to atlas-related tracks and / or boxes because atlas data has a high encoding and / or decoding priority.
[0275] Once the ISOBMFF configuration has been identified, group information can be indicated in the initial header box.
[0276] Referring to FIG. 7, when the file according to the embodiments is a single track, group information within the MPD can be generated as follows.
[0277] <MPD>
[0278] <Period>
[0279] <AdaptationSet mimeType="video/mp4" codecs="v3e1.L2.0.0.1, resv.vvvc.avc1.4D401E" frameRate="30">
[0280] <Group id="1" />
[0281] <SegmentList>
[0282] <Initialization sourceURL="seg-m-init.mp4" />
[0283] </SegmentList>
[0284] <Representation bandwidth="512000">
[0285] <BaseURL>vpcc-512k.mp4</BaseURL>
[0286] </Representation>
[0287] <Representation bandwidth="1024000">
[0288] <BaseURL>vpcc-1024k.mp4</BaseURL>
[0289] </Representation>
[0290] <Representation bandwidth="2048000">
[0291] <BaseURL>vpcc-2048k.mp4</BaseURL>
[0292] </Representation>
[0293] </AdaptationSet>
[0294] </Period>
[0295] </MPD>
[0296] An MPD may include a period element. The period element may represent an adaptation set and a group ID. A list of segments identified by the group ID (initialization address information) and representation bandwidth information for group information identified by the group ID may be included within the period element.
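One way a client might use the representation bandwidth information of a group is to pick the highest-bandwidth representation that fits the measured throughput. This selection rule is a common streaming heuristic and an assumption here, not a rule stated in this description. The function name is hypothetical, and the example MPD omits the DASH XML namespace for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Optional

EXAMPLE_MPD = (
    '<MPD><Period><AdaptationSet><Group id="1" />'
    '<Representation bandwidth="512000"><BaseURL>vpcc-512k.mp4</BaseURL></Representation>'
    '<Representation bandwidth="1024000"><BaseURL>vpcc-1024k.mp4</BaseURL></Representation>'
    '<Representation bandwidth="2048000"><BaseURL>vpcc-2048k.mp4</BaseURL></Representation>'
    '</AdaptationSet></Period></MPD>'
)


def pick_representation(mpd_xml: str, available_bps: int) -> Optional[str]:
    """Return the BaseURL of the best representation that fits available_bps.

    Falls back to the lowest-bandwidth representation when none fits.
    """
    root = ET.fromstring(mpd_xml)
    best = None    # (bandwidth, url) of the best fitting representation
    lowest = None  # (bandwidth, url) of the cheapest representation
    for rep in root.iter("Representation"):
        bandwidth = int(rep.get("bandwidth", "0"))
        url = (rep.findtext("BaseURL") or "").strip()
        if lowest is None or bandwidth < lowest[0]:
            lowest = (bandwidth, url)
        if bandwidth <= available_bps and (best is None or bandwidth > best[0]):
            best = (bandwidth, url)
    chosen = best if best is not None else lowest
    return chosen[1] if chosen is not None else None
```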
[0297] FIG. 8 shows group information within the MPD according to the embodiments.
[0298] Referring to FIG. 8, when the file according to the embodiments is multi-track, group information within the MPD can be generated as follows.
[0299] <MPD> <Period>
[0300] <!-- Main V3C AdaptationSet -->
[0301] <AdaptationSet id="1" codecs="v3c1"> <EssentialProperty schemeIdUri="urn:mpeg:dash:preselection:2016" />
[0302] <Group id="1" />
[0303] <Representation> ... </Representation>
[0304] </AdaptationSet>
[0305] <!-- Occupancy -->
[0306] <AdaptationSet id="2" mimeType="video/mp4" codecs="resv.vvvc.hvc1">
[0307] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="occp" /> <Group id="1" />
[0308] </EssentialProperty>
[0309] <Representation> ... </Representation>
[0310] </AdaptationSet>
[0311] <!-- Geometry -->
[0312] <AdaptationSet id="4" mimeType="video/mp4" codecs="resv.vvvc.hvc1">
[0313] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="geom" /> <Group id="1" />
[0314] </EssentialProperty>
[0315] <Representation> ... </Representation>
[0316] </AdaptationSet>
[0317] <!-- Attribute -->
[0318] <AdaptationSet id="6" mimeType="video/mp4" codecs="resv.vvvc.hvc1">
[0319] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="attr" /> <Group id="1" />
[0320] </EssentialProperty>
[0321] <Representation> ... </Representation>
[0322] </AdaptationSet>
[0323] </Period> </MPD>
[0324] The MPD includes a period element. Within the period element, the MPD includes a main adaptation set (the adaptation set for atlas data, which serves as the entry point), an adaptation set for occupancy data, an adaptation set for geometry data, and an adaptation set for attribute data. The main adaptation set includes a group ID associated with the atlas data. The adaptation set for occupancy data includes a group ID associated with the occupancy data. The adaptation set for geometry data includes a group ID associated with the geometry data. The adaptation set for attribute data includes a group ID associated with the attribute data.
[0325] The point cloud data receiving method according to the embodiments can partially decode point cloud data grouped by the same ID based on the group ID within each adaptation set.
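A receiver could collect the adaptation sets that share a group ID before deciding what to download and decode. The following is a minimal sketch, assuming an MPD in which each AdaptationSet carries a Group element with an id attribute somewhere beneath it and in which no DASH XML namespace is declared; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def adaptation_sets_by_group(mpd_xml: str) -> Dict[str, List[str]]:
    """Map each group ID to the IDs of the adaptation sets that declare it."""
    root = ET.fromstring(mpd_xml)
    groups: Dict[str, List[str]] = defaultdict(list)
    for aset in root.iter("AdaptationSet"):
        aset_id = aset.get("id", "")
        # Group may sit directly under the adaptation set or inside a descriptor.
        for group in aset.iter("Group"):
            group_id = group.get("id", "")
            if aset_id not in groups[group_id]:
                groups[group_id].append(aset_id)
    return dict(groups)
```

With such a mapping, only the adaptation sets listed under the group ID of interest need to be fetched, which is what makes partial decoding per group possible.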
[0326] FIG. 9 shows group information within an ISOBMFF-based file according to embodiments.
[0327] The point cloud data transmission method according to the embodiments can generate group information within the ISOBMFF file structure.
[0328] moov
[0329] ├── mvhd
[0330] └── trak
[0331]     ├── tkhd
[0332]     └── mdia
[0333]         ├── mdhd
[0334]         ├── hdlr
[0335]         └── minf
[0336]             ├── vmhd
[0337]             ├── dinf
[0338]             └── stbl
[0339]                 └── stsd
[0340]                     ├── v3e1 (V3C bitstream sample entry: 'v3e1', 'v3eg', 'v3c1', 'v3cg', 'v3cb', etc.)
[0341]                     └── group_id (indexing group ID info)
[0342] To maintain the same format regardless of the track, v3c bitstream sample entries and group IDs related to the track can be created under stsd.
[0343] moov is a container for ISOBMFF-based metadata.
[0344] trak is a container for individual tracks or streams based on ISOBMFF.
[0345] mdia is a container for media information within an ISOBMFF-based track.
[0346] minf is an ISOBMFF-based media information container.
[0347] stbl is an ISOBMFF-based sample table box, a container for time / space maps.
[0348] stsd is a sample descriptor for ISOBMFF-based codec types and initialization.
[0349] A file according to the embodiments may include a V3C bitstream sample entry and a group ID (identifying information based on the group ID) within a sample descriptor.
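To reach the sample descriptor where the sample entry and group ID are stored, a reader walks the box hierarchy moov, trak, mdia, minf, stbl, stsd. The sketch below uses the generic ISOBMFF box header (32-bit size, 4-character type, optional 64-bit largesize) and follows only the first matching box at each level. The function names are hypothetical, and parsing the contents of stsd is left out.

```python
import struct
from typing import Iterator, Optional, Sequence, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if pos + 16 > end:
                raise ValueError("truncated largesize at offset %d" % pos)
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def find_box(data: bytes, path: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return the payload range of the box reached by following path, or None."""
    start, end = 0, len(data)
    for name in path:
        for box_type, payload_start, payload_end in iter_boxes(data, start, end):
            if box_type == name:
                start, end = payload_start, payload_end
                break
        else:
            return None  # no box of this type at the current level
    return start, end


STSD_PATH = ["moov", "trak", "mdia", "minf", "stbl", "stsd"]
```

Calling find_box(file_bytes, STSD_PATH) returns the byte range of the stsd payload for the first track, from which the sample entries can then be read.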
[0350] FIG. 10 shows group information within an ISOBMFF-based file according to embodiments.
[0351] A group class within a file according to the embodiments may include the following elements: information indicating whether the group is affected by a specific event, information indicating whether the group includes tiles grouped regardless of object, information indicating the tile position during encoding / decoding, etc.
[0352] For example, a specific event dependency may indicate that group classes have the same Group ID.
[0353] In addition, information regarding whether an object includes tiles and the tile location during decoding can be expressed as a subclass of the group class or as additional information.
[0354] The group class of a file according to the embodiments can be expressed as follows:
[0355] group_ID Box {
[0356] unsigned int size;
[0357] unsigned int type = 'grp '; // Box type indicator
[0358] unsigned int version;
[0359] unsigned int flags;
[0360] // Group Type
[0361] unsigned int group_type; // 1 for Event Dependency, 2 for Tile Group
[0362]
[0363] // Group Specific Data
[0364] if (group_type == 1) {
[0365] // Event Dependency Data
[0366] unsigned int event_dependency_count; // for Group ID
[0367] } else if (group_type == 2) {
[0368] // Tile Group Data
[0369] unsigned int tile_count;
[0370] for (i = 0; i < tile_count; i++) {
[0371] unsigned int tile_id;
[0372] unsigned int tile_x_position; unsigned int tile_y_position; unsigned int tile_z_position;
[0373] unsigned int tile_width; unsigned int tile_height; unsigned int tile_depth; } } }
[0374] A group box may include the size of an area identified by a group ID, information indicating the type of the box, version information, flags, etc.
[0375] The group box may include additional information indicating the group type. For example, if the group type is 1, it indicates that the current group is dependent on the event, and if the group type is 2, it indicates that the current group is a tile group independent of the event.
[0376] The group box may include additional group characteristic data. For example, if the group type is 1, it may include an event dependency count for the group ID as event-dependent data. The event dependency count represents the number of event dependencies. If the group type is 2, it may include tile group-related data such as a tile count representing the number of tiles and, for each tile, a tile ID, tile location information, tile width, tile height, tile depth, etc.
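The group box layout can be exercised with a small serializer and parser. This sketch follows the listing literally and assumes that every unsigned int is a 32-bit big-endian field, which the description does not specify (a standard ISOBMFF FullBox would use an 8-bit version and 24-bit flags instead). All class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

EVENT_DEPENDENCY, TILE_GROUP = 1, 2  # group_type values from the listing


@dataclass
class Tile:
    tile_id: int
    position: Tuple[int, int, int]  # tile_x_position, tile_y_position, tile_z_position
    size: Tuple[int, int, int]      # tile_width, tile_height, tile_depth


@dataclass
class GroupBox:
    group_type: int
    event_dependency_count: Optional[int] = None
    tiles: List[Tile] = field(default_factory=list)
    version: int = 0
    flags: int = 0


def pack_group_box(box: GroupBox) -> bytes:
    """Serialize a group box. ASSUMPTION: all fields are 32-bit big-endian."""
    body = struct.pack(">4sIII", b"grp ", box.version, box.flags, box.group_type)
    if box.group_type == EVENT_DEPENDENCY:
        body += struct.pack(">I", box.event_dependency_count or 0)
    elif box.group_type == TILE_GROUP:
        body += struct.pack(">I", len(box.tiles))
        for tile in box.tiles:
            body += struct.pack(">7I", tile.tile_id, *tile.position, *tile.size)
    return struct.pack(">I", 4 + len(body)) + body  # size counts the whole box


def parse_group_box(data: bytes) -> GroupBox:
    """Parse a group box produced with the same assumed field widths."""
    size, box_type, version, flags, group_type = struct.unpack_from(">I4sIII", data, 0)
    if box_type != b"grp " or size > len(data):
        raise ValueError("not a complete group box")
    box = GroupBox(group_type=group_type, version=version, flags=flags)
    pos = 20  # size + type + version + flags + group_type
    if group_type == EVENT_DEPENDENCY:
        box.event_dependency_count = struct.unpack_from(">I", data, pos)[0]
    elif group_type == TILE_GROUP:
        (tile_count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        for _ in range(tile_count):
            values = struct.unpack_from(">7I", data, pos)
            pos += 28
            box.tiles.append(Tile(values[0], values[1:4], values[4:7]))
    return box
```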
[0377] FIG. 11 shows objects and events according to embodiments.
[0378] For example, there may be correlations or dependencies between the objects and events targeted for grouping. The action of a user controlling a switch and the action of a light connected to the switch are related. To increase the efficiency of encoding and decoding when encoding, decoding, and transmitting / receiving point cloud data for these objects and events, the interconnectedness of a specific event and multiple media objects can be represented according to the aforementioned method.
[0379] Accordingly, the point cloud data receiving method according to the embodiments can play events simultaneously.
[0380] FIG. 12 shows an event-based group according to embodiments.
[0381] In order to play events simultaneously, the embodiments can identify the group ID associated with the event through the MPD. Media corresponding to the group ID can be downloaded from the server.
[0382] When multiple events and objects within point cloud content are related to each other, the group IDs associated with the events are parsed from MPD, and the related media can be retrieved from the server based on the group IDs.
[0383] FIG. 13 illustrates the playback of event-based connected data according to embodiments.
[0384] In order to play events simultaneously, the embodiments may pre-download multiple media on the user decoder and / or display side, and when a specific event occurs, play the media connected to that event simultaneously.
[0385] For example, Object 1, Object 2, and Object 4 can be identified as Group ID 1. Object 3 and Object 5 can be identified as Group ID 2. When a specific event occurs during the process of decoding and displaying point cloud data, if the data associated with the specific event consists of objects for Group ID 1, media data identified as Group ID 1 can be played simultaneously.
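The example above can be expressed as a lookup from group ID to the objects that must be played together. This is a sketch with hypothetical names; the object labels are taken from the example, and how an event is mapped to a group ID is left outside the sketch.

```python
from collections import defaultdict
from typing import Dict, List


def build_group_index(object_groups: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert an object -> group ID mapping into group ID -> objects."""
    index: Dict[int, List[str]] = defaultdict(list)
    for obj, group_id in object_groups.items():
        index[group_id].append(obj)
    return dict(index)


def objects_to_play(event_group_id: int, index: Dict[int, List[str]]) -> List[str]:
    """Return every object that should start playing for the event's group."""
    return list(index.get(event_group_id, []))


OBJECT_GROUPS = {
    "Object 1": 1, "Object 2": 1, "Object 3": 2, "Object 4": 1, "Object 5": 2,
}
GROUP_INDEX = build_group_index(OBJECT_GROUPS)
```

When an event tied to Group ID 1 occurs, objects_to_play(1, GROUP_INDEX) yields Object 1, Object 2, and Object 4, which the player can then start at the same time.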
[0386] FIG. 14 illustrates the reconstruction of grouping-based data according to embodiments.
[0387] When compressing point cloud data in tile units to adaptively transmit and receive multiple objects playing simultaneously, tiles with similar colors and / or information can be grouped together during the encoding process to reduce differences between neighboring points. The resulting group IDs can be defined in the MPD and / or file as described above. For example, encoding data containing sky objects by grouping them together increases compression and transmission performance.
[0388] FIG. 15 illustrates a method for transmitting point cloud data according to embodiments.
[0389] A method for transmitting point cloud data may include a step of encoding point cloud data (S1500), a step of encapsulating point cloud data into a file (S1510), and / or a step of transmitting the file (S1520).
[0390] The encoding step (S1500) may include a V3C encoding step as shown in FIG. 4. The V3C encoding step may generate a V3C bitstream by projecting V-PCC (V3C) data obtained by volumetric capture, encoding atlas information of the V-PCC data, encoding occupancy data of the V-PCC data, encoding geometry data of the V-PCC data, and encoding attribute data of the V-PCC data. The encoding operation may include encoding according to the ISO / IEC 23090-5 standard.
[0391] The encapsulating step (S1510) can generate a file based on a single track and / or multi-track V3C bitstream as shown in FIG. 5.
[0392] The transmitting step (S1520) transmits the encapsulated file.
[0393] The encoding step (S1500) can additionally generate MPD as shown in FIG. 6.
[0394] Referring to FIG. 5, the file may include an atlas track containing atlas data of the point cloud data, a geometry track containing geometry data of the point cloud data, an attribute track containing attribute data of the point cloud data, and an occupancy track containing occupancy data of the point cloud data.
[0395] Referring to FIG. 9, at least one of the atlas track, the geometry track, the attribute track, or the occupancy track includes a sample descriptor, and the sample entry of the sample descriptor may include a group ID associated with the sample descriptor.
[0396] Referring to FIG. 6, the method further comprises the step of transmitting an MPD for the point cloud data, wherein the MPD may include a main adaptation set including atlas data of the point cloud data, a geometry adaptation set including geometry data of the point cloud data, an attribute adaptation set including attribute data of the point cloud data, and an occupancy adaptation set including occupancy data of the point cloud data.
[0397] Referring to FIG. 8, regarding the MPD group ID, at least one of the main adaptation set, the geometry adaptation set, the attribute adaptation set, or the occupancy adaptation set includes a group ID, and the group ID may represent identification information by which the point cloud data is grouped.
[0398] Referring to FIG. 10, with respect to the ISOBMFF group ID box, the file further includes a group ID box related to the group ID, the group ID box includes the size of the group or the type of the group, the type of the group indicates a dependency on at least one of a tile group or an event for the group, and when the type of the group is a first value, the group ID box includes the number of events for the group ID, and when the type of the group is a second value, the group ID box includes the number of tiles, and the group ID box may include at least one of a tile ID, location information of a tile, or size information of a tile.
[0399] A method for transmitting point cloud data can be performed by a transmitting device as shown in FIG. 4. The point cloud data transmitting device may include an encoder that encodes point cloud data; an encapsulator that encapsulates the point cloud data into a file; and a transmitter that transmits the file.
[0400] FIG. 16 illustrates a method for receiving point cloud data according to embodiments.
[0401] A method for receiving point cloud data may include the step of receiving a file containing point cloud data (S1600), the step of decapsulating the file (S1610), and / or the step of decoding the point cloud data (S1620).
[0402] The decoding step (S1620) may include: decoding atlas information of the point cloud data, decoding occupancy data of the point cloud data, decoding geometry data of the point cloud data, and decoding attribute data of the point cloud data.
[0403] In the decapsulating step (S1610), the file may include an atlas track containing atlas data of the point cloud data, a geometry track containing geometry data of the point cloud data, an attribute track containing attribute data of the point cloud data, and an occupancy track containing occupancy data of the point cloud data.
[0404] At least one of the atlas track, the geometry track, the attribute track, or the occupancy track includes a sample descriptor, and the sample entry of the sample descriptor may include a group ID associated with the sample descriptor.
[0405] The above method further includes the step of receiving an MPD for the point cloud data, wherein the MPD may include a main adaptation set including atlas data of the point cloud data, a geometry adaptation set including geometry data of the point cloud data, an attribute adaptation set including attribute data of the point cloud data, and an occupancy adaptation set including occupancy data of the point cloud data.
[0406] At least one of the main adaptation set, the geometry adaptation set, the attribute adaptation set, or the occupancy adaptation set includes a group ID, and the group ID may represent identification information by which the point cloud data is grouped.
[0407] In the decapsulating step (S1610), the file further includes a group ID box related to the group ID, the group ID box includes the size of the group or the type of the group, the type of the group indicates a dependency on at least one of a tile group or an event for the group, and when the type of the group is a first value, the group ID box includes the number of events for the group ID, and when the type of the group is a second value, the group ID box includes the number of tiles, and the group ID box may include at least one of a tile ID, location information of a tile, or size information of a tile.
[0408] A method for receiving point cloud data can be performed by a receiving device as shown in FIG. 4. The receiving device may include a receiving unit that receives a file containing point cloud data; a decapsulator that decapsulates the file; and a decoder that decodes the point cloud data.
[0409] As a result, a decoder receiving ISOBMFF and / or MPD can identify groups of V-PCC data and partially decode the groups. Furthermore, this has the effect of enabling partial decoding of related events or tiles in groups.
[0410] The embodiments have been described in terms of methods and / or devices, and the description of the methods and the description of the devices may be applied complementarily.
[0411] Although the drawings have been described separately for convenience of explanation, it is also possible to design a new embodiment by combining the embodiments described in each drawing. Furthermore, designing a computer-readable recording medium containing a program for executing the previously described embodiments, as required by a person skilled in the art, falls within the scope of the claims of the embodiments. The apparatus and method according to the embodiments are not limited to the configuration and method of the embodiments described above; rather, the embodiments may be configured by selectively combining all or part of each embodiment to allow for various modifications. Although preferred embodiments have been illustrated and described, the embodiments are not limited to the specific embodiments described above. A person skilled in the art can make various modifications without departing from the essence of the embodiments claimed in the claims, and such modifications should not be understood separately from the technical concept or perspective of the embodiments.
[0412] Various components of the device of the embodiments may be implemented by hardware, software, firmware, or a combination thereof. Various components of the embodiments may be implemented as a single chip, for example, a single hardware circuit. Depending on the embodiments, the components according to the embodiments may each be implemented as separate chips. Depending on the embodiments, at least one of the components of the device according to the embodiments may be composed of one or more processors capable of executing one or more programs, and the one or more programs may include instructions for performing or executing any one or more of the operations / methods according to the embodiments. Executable instructions for performing the methods / operations of the device according to the embodiments may be stored in non-transitory CRMs or other computer program products configured to be executed by one or more processors, or may be stored in transitory CRMs or other computer program products configured to be executed by one or more processors. Additionally, memory according to the embodiments may be used as a concept that includes not only volatile memory (e.g., RAM, etc.) but also non-volatile memory, flash memory, PROM, etc. In addition, it may also include implementation in the form of carrier waves, such as transmission over the Internet. Furthermore, processor-readable recording media may be distributed across networked computer systems, allowing processor-readable code to be stored and executed in a distributed manner.
[0413] In this document, “/” and “,” are interpreted as “and/or.” For example, “A/B” is interpreted as “A and/or B,” and “A, B” is interpreted as “A and/or B.” Additionally, “A/B/C” means “at least one of A, B, and/or C.” Likewise, “A, B, C” means “at least one of A, B, and/or C.” In this document, “or” is also interpreted as “and/or.” For example, “A or B” may mean 1) “A” alone, 2) “B” alone, or 3) “A and B.” In other words, “or” in this document may mean “additionally or alternatively.”
[0414] Terms such as "first" and "second" may be used to describe various components of the embodiments. However, the interpretation of these components should not be limited by such terms, which are used merely to distinguish one component from another. For example, a first user input signal may be referred to as a second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. Such usage should be interpreted as not departing from the scope of the various embodiments. Although the first user input signal and the second user input signal are both user input signals, they are not the same user input signal unless the context clearly indicates otherwise.
[0415] The terms used to describe the embodiments are intended to describe specific embodiments and are not intended to limit the embodiments. As used in the description of the embodiments and in the claims, the singular is intended to include the plural unless the context explicitly indicates otherwise. The expression "and/or" is used to include all possible combinations of the listed terms. The expression "include" describes the presence of features, numbers, steps, elements, and/or components and does not imply the exclusion of additional features, numbers, steps, elements, and/or components. Conditional expressions such as "if" or "when" used to describe the embodiments are not limited to optional cases. They are intended to be interpreted as "when a specific condition is satisfied," "when a related action is performed in response to a specific condition," or "when a related definition is interpreted."
[0416] Additionally, operations according to the embodiments described herein may be performed by a transmitting and receiving device including memory and/or a processor, depending on the embodiments. The memory may store programs for processing/controlling operations according to the embodiments, and the processor may control various operations described in this document. The processor may be referred to as a controller or the like. Operations in the embodiments may be performed by firmware, software, and/or a combination thereof, and the firmware, software, and/or combination thereof may be stored in the processor or in the memory.
[0417] Meanwhile, the operations according to the embodiments described above may be performed by a transmitting device and/or a receiving device according to the embodiments. The transmitting and receiving device may include a transmitting and receiving unit for transmitting and receiving media data, a memory for storing instructions (program code, algorithms, flowcharts, and/or data) for a process according to the embodiments, and a processor for controlling the operations of the transmitting and receiving device.
[0418] The processor may be referred to as a controller or the like, and may correspond, for example, to hardware, software, and/or a combination thereof. The operations according to the embodiments described above may be performed by the processor. Additionally, the processor may be implemented as an encoder/decoder or the like for the operations of the embodiments described above.
[0419] As described above, the relevant details have been explained in the best mode for carrying out the embodiments.
[0420] As described above, the embodiments may be applied wholly or partially to point cloud data transmission and reception devices and systems.
[0421] Those skilled in the art may make various changes or modifications to the embodiments within the scope of the embodiments.
[0422] The embodiments may include modifications/variations, and such modifications/variations do not exceed the scope of the claims and their equivalents.
Claims
1. A point cloud data transmission method comprising: encoding point cloud data; encapsulating the point cloud data into a file; and transmitting the file.
2. The point cloud data transmission method of claim 1, wherein the encoding comprises generating a bitstream by projecting point cloud data obtained by volumetric capture, encoding atlas information of the point cloud data, encoding occupancy data of the point cloud data, encoding geometry data of the point cloud data, and encoding attribute data of the point cloud data.
3. The point cloud data transmission method of claim 1, wherein the file comprises an atlas track containing atlas data of the point cloud data, a geometry track containing geometry data of the point cloud data, an attribute track containing attribute data of the point cloud data, and an occupancy track containing occupancy data of the point cloud data.
4. The point cloud data transmission method of claim 2, wherein at least one of the atlas track, the geometry track, the attribute track, or the occupancy track comprises a sample descriptor, and a sample entry of the sample descriptor comprises a group ID associated with the sample descriptor.
5. The point cloud data transmission method of claim 1, further comprising transmitting an MPD for the point cloud data, wherein the MPD comprises a main adaptation set including atlas data of the point cloud data, a geometry adaptation set including geometry data of the point cloud data, an attribute adaptation set including attribute data of the point cloud data, and an occupancy adaptation set including occupancy data of the point cloud data.
6. The point cloud data transmission method of claim 5, wherein at least one of the main adaptation set, the geometry adaptation set, the attribute adaptation set, or the occupancy adaptation set includes a group ID, and the group ID represents identification information of a group into which the point cloud data is grouped.
7. The point cloud data transmission method of claim 4, wherein the file further includes a group ID box related to the group ID; the group ID box includes a size of the group or a type of the group; the type of the group indicates a dependency on at least one of a tile group or an event for the group; when the type of the group is a first value, the group ID box includes the number of events for the group ID; and when the type of the group is a second value, the group ID box includes the number of tiles and at least one of a tile ID, tile location information, or tile size information.
8. A point cloud data transmission device comprising: an encoder that encodes point cloud data; an encapsulator that encapsulates the point cloud data into a file; and a transmitter that transmits the file.
9. A point cloud data reception method comprising: receiving a file containing point cloud data; decapsulating the file; and decoding the point cloud data.
10. The point cloud data reception method of claim 9, wherein the decoding comprises decoding atlas information of the point cloud data, decoding occupancy data of the point cloud data, decoding geometry data of the point cloud data, and decoding attribute data of the point cloud data.
11. The point cloud data reception method of claim 9, wherein the file comprises an atlas track containing atlas data of the point cloud data, a geometry track containing geometry data of the point cloud data, an attribute track containing attribute data of the point cloud data, and an occupancy track containing occupancy data of the point cloud data.
12. The point cloud data reception method of claim 10, wherein at least one of the atlas track, the geometry track, the attribute track, or the occupancy track comprises a sample descriptor, and a sample entry of the sample descriptor comprises a group ID associated with the sample descriptor.
13. The point cloud data reception method of claim 9, further comprising receiving an MPD for the point cloud data, wherein the MPD comprises a main adaptation set including atlas data of the point cloud data, a geometry adaptation set including geometry data of the point cloud data, an attribute adaptation set including attribute data of the point cloud data, and an occupancy adaptation set including occupancy data of the point cloud data.
14. The point cloud data reception method of claim 13, wherein at least one of the main adaptation set, the geometry adaptation set, the attribute adaptation set, or the occupancy adaptation set includes a group ID, and the group ID represents identification information of a group into which the point cloud data is grouped.
15. A point cloud data receiving device comprising: a receiver that receives a file containing point cloud data; a decapsulator that decapsulates the file; and a decoder that decodes the point cloud data.
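For illustration only, the conditional layout of the group ID box recited in the claims (a group type that selects either an event count or a tile list) might be sketched as follows. This is a minimal sketch and not the normative syntax: the field widths, the byte order, the numeric type values 1 and 2, the choice to carry both the group size and the group type, the 2D tile coordinates, and the names `GroupIdBox`, `GroupType`, and `Tile` are all assumptions introduced for this example.

```python
import struct
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List

# Hypothetical big-endian layouts; the source does not specify field widths.
_HEADER = ">IIB"   # group_id, group_size, group_type
_COUNT = ">H"      # number of events or number of tiles
_TILE = ">IHHHH"   # tile_id, x, y, width, height


class GroupType(IntEnum):
    EVENT = 1  # assumed "first value": group depends on events
    TILE = 2   # assumed "second value": group depends on a tile group


@dataclass
class Tile:
    tile_id: int
    x: int       # tile location (assumed 2D)
    y: int
    width: int   # tile size
    height: int


@dataclass
class GroupIdBox:
    group_id: int
    group_size: int
    group_type: GroupType
    num_events: int = 0
    tiles: List[Tile] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        """Serialize the payload; the group type selects what follows."""
        out = struct.pack(_HEADER, self.group_id, self.group_size, self.group_type)
        if self.group_type == GroupType.EVENT:
            out += struct.pack(_COUNT, self.num_events)
        else:
            out += struct.pack(_COUNT, len(self.tiles))
            for t in self.tiles:
                out += struct.pack(_TILE, t.tile_id, t.x, t.y, t.width, t.height)
        return out

    @classmethod
    def from_bytes(cls, buf: bytes) -> "GroupIdBox":
        """Parse a payload produced by to_bytes()."""
        group_id, group_size, raw_type = struct.unpack_from(_HEADER, buf, 0)
        offset = struct.calcsize(_HEADER)
        box = cls(group_id, group_size, GroupType(raw_type))
        if box.group_type is GroupType.EVENT:
            (box.num_events,) = struct.unpack_from(_COUNT, buf, offset)
        else:
            (num_tiles,) = struct.unpack_from(_COUNT, buf, offset)
            offset += struct.calcsize(_COUNT)
            for _ in range(num_tiles):
                box.tiles.append(Tile(*struct.unpack_from(_TILE, buf, offset)))
                offset += struct.calcsize(_TILE)
        return box
```

Under these assumptions, a receiver could read the group type first and then decide whether to expect an event count or a tile list, which mirrors the first-value and second-value branches of the claim. A real file would wrap this payload in a box header carrying its size and four-character type, which is omitted here.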