Point cloud data transmission method, point cloud data transmission device, point cloud data reception method, and point cloud data reception device

The method addresses the challenge of streaming 3D multimedia content by encoding and decoding point cloud data adaptively, ensuring efficient transmission and reduced parsing time through MPEG-DASH signaling and ISOBMFF format, optimizing bandwidth and quality based on user location.

WO2026126183A1 · PCT designated stage · Publication Date: 2026-06-18 · FOUND FOR RES & BUSINESS SEOUL NAT UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: FOUND FOR RES & BUSINESS SEOUL NAT UNIV OF SCI & TECH
Filing Date: 2026-01-02
Publication Date: 2026-06-18

Abstract

A point cloud data transmission method according to embodiments may comprise the steps of: encoding point cloud data; encapsulating the point cloud data into a file; and transmitting the file. A point cloud data reception method according to embodiments may comprise the steps of: receiving a file including point cloud data; decapsulating the file; and decoding the point cloud data.

Description

Point cloud data transmission method, point cloud data transmission device, point cloud data reception method, and point cloud data reception device

[0001] The embodiments relate to a technology for transmitting 3D multimedia content produced based on V-PCC (Video-based Point Cloud Compression) in a streaming manner.

[0002] With the introduction of digital twin and metaverse concepts, related technologies are growing and establishing themselves as the center of high-tech industries. Accordingly, media services are also expected to evolve into a format that provides 6DoF 360VR video or 3D multimedia content via streaming.

[0003] Meanwhile, one of the methods widely used to describe digital worlds, such as the metaverse, is to create 3D multimedia content based on V-PCC (Video-based Point Cloud Compression). V-PCC is used because it describes objects efficiently when the relationship between the user and other objects in the digital world is not fixed but changes from moment to moment.

[0004] As explained earlier, although 3D multimedia content is expected to evolve into a streaming format, there is currently no technology to transmit 3D multimedia content produced with point clouds via streaming.

[0005] The embodiments were conceived against this technical background and aim to enable the transmission of 3D multimedia content in a streaming manner by designing signaling that includes spatiotemporal information to adaptively transmit 3D multimedia content according to the user's location, environment, etc.

[0006] A method for transmitting point cloud data according to embodiments may include the steps of: encoding point cloud data; encapsulating point cloud data into a file; and transmitting the file. A method for receiving point cloud data according to embodiments may include the steps of: receiving a file containing point cloud data; decapsulating the file; and decoding the point cloud data.

[0007] The method and apparatus according to the embodiments can efficiently encode and transmit point cloud data.

[0008] The method and apparatus according to the embodiments can spatially adaptively encode and transmit point cloud data.

[0009] The drawings are included to provide a further understanding of the embodiments, and they illustrate the embodiments together with the related description. For a better understanding of the various embodiments described below, reference should be made to the following description in conjunction with the drawings, in which similar reference numerals denote corresponding parts throughout.

[0010] FIG. 1 is a flowchart schematically showing a method for streaming V-PCC-based content via MPEG-DASH according to embodiments.

[0011] FIG. 2 shows an object within the user's field of view according to embodiments.

[0012] FIG. 3 shows a group of objects according to embodiments.

[0013] FIG. 4 shows a group of objects within the user's field of view according to embodiments.

[0014] FIG. 5 shows a structure for transmitting V3C data according to embodiments.

[0015] FIG. 6 shows the multi-track encapsulation of V-PCC data according to ISOBMFF (ISO Base Media File Format) according to the embodiments.

[0016] FIG. 7 shows DASH signaling according to embodiments.

[0017] FIG. 8 shows an IDD (Init Description Document) according to the embodiments.

[0018] FIG. 9 illustrates a method for describing IDD-based spatial information.

[0019] FIG. 10 illustrates an IDD-based spatial signaling method according to embodiments.

[0020] FIG. 11 illustrates a method for signaling dynamic position according to embodiments.

[0021] FIG. 12 illustrates a method for transmitting point cloud data according to embodiments.

[0022] FIG. 13 illustrates a method for receiving point cloud data according to embodiments.

[0023] Preferred embodiments are described in detail, and examples thereof are shown in the accompanying drawings. The following detailed description, with reference to the accompanying drawings, is intended to explain preferred embodiments rather than to show the only embodiments that may be implemented. The following detailed description includes details to provide a thorough understanding of the embodiments. However, it is obvious to those skilled in the art that the embodiments may be practiced without these details.

[0024] Most terms used in the embodiments are selected from those commonly used in the field, but some terms are chosen at the applicant's discretion, and their meanings are described in detail in the following description as necessary. Accordingly, the embodiments should be understood based on the intended meaning of the terms, rather than on their names alone.

[0025] FIG. 1 is a flowchart schematically illustrating a method for streaming V-PCC-based content via MPEG-DASH according to embodiments.

[0026] In step S10, when a user (10) requests data transmission from a server (20) to view specific V-PCC-based content, the server (20) transmits to the user (10) an IDD (Init Description Document) containing information about the media presentation descriptions (MPDs) that divide and describe the entire space.

[0027] When spatial information is provided based on MPEG-DASH, a single MPD contains too many objects and the file becomes large. The IDD is intended to reduce the time the user (10) needs to parse this information. This step may be applied selectively depending on the system environment.

[0028] In step S20, the user (10) selects an MPD that describes the spatial information needed by the user based on their location and IDD, and then requests the related MPD from the server (20) while transmitting their location information.

[0029] In step S30, the server (20) generates an MPD file that describes the relationship between the user and the objects placed in space based on the user's location. This MPD file may be generated in advance based on 3D spatial information, or it may be generated in real time according to the user's requirements if there is a request from the user in step S20.

[0030] In step S40, the server (20) delivers the generated MPD file to the user (10) according to the user's request.

[0031] In step S50, based on the received MPD, the user (10) requests the necessary amount of V-PCC-based content from the server (20), taking into account the user's network environment and the relative distance, angle, etc. between the user and the object.

[0032] In step S60, the server (20) that received the request delivers the requested segment data.

[0033] In the above operation process, since MPEG-DASH specifies that an MPD describing the relationship between the user and the content should be generated and distributed, the server (20) basically generates and distributes an MPD.

[0034] To prevent too much information from being entered into the MPD, the entire space can be divided into multiple sections, and an MPD can be generated that describes only the divided areas based on the user's location information.

[0035] In this case, since the generated MPD cannot represent the entire space, the server (20) additionally generates an IDD containing information about which part of the entire space each MPD describes. To this end, the generated IDD information includes the size of each separated space and a URI from which the corresponding MPD can be retrieved. Table 1 is an example of an IDD.

[0036] Table 1. Example of an IDD
Space                               URI
(0, 0, 0 / 180, 259, 372)           http://www.IDD.kr/first_MPD
(180, 0, 0 / 375, 259, 372)         http://www.IDD.kr/seco_MPD
(0, 259, 0 / 180, 511, 372)         http://www.IDD.kr/first_MPD
(180, 259, 0 / 375, 511, 372)       http://www.IDD.kr/first_MPD
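The MPD selection of step S20 can be illustrated with a minimal Python sketch. It assumes that each Space entry of the IDD is an axis-aligned box given as a minimum corner and a maximum corner, and that a position on a shared boundary belongs to the box whose minimum corner it touches; neither assumption is stated in the source. The names IddEntry and select_mpd_uri are hypothetical, and only the first two rows of Table 1 are used.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


class IddEntry(NamedTuple):
    """One row of the IDD: a box in space and the URI of the MPD describing it."""
    lo: Point   # minimum corner (assumed meaning of the first triple)
    hi: Point   # maximum corner (assumed meaning of the second triple)
    uri: str


# First two rows of Table 1, copied as printed.
IDD = [
    IddEntry((0, 0, 0), (180, 259, 372), "http://www.IDD.kr/first_MPD"),
    IddEntry((180, 0, 0), (375, 259, 372), "http://www.IDD.kr/seco_MPD"),
]


def select_mpd_uri(idd: Sequence[IddEntry], position: Point) -> Optional[str]:
    """Return the URI of the first IDD entry whose box contains the position."""
    for entry in idd:
        # Half-open test (lo <= p < hi) so adjacent boxes do not overlap.
        if all(lo <= p < hi for lo, p, hi in zip(entry.lo, position, entry.hi)):
            return entry.uri
    return None  # the position lies outside every described space


print(select_mpd_uri(IDD, (200, 10, 5)))
```

A client following step S20 would then request the returned URI from the server while sending its location.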

[0037] To reiterate step S20, the user (10) selects an MPD that describes the partitioned space required by the user within the entire space based on their location and IDD, and then requests the relevant MPD by transmitting their location information to the server (20). In step S30, the server (20) generates an MPD corresponding to the space. The reason for generating the MPD after the user request is that various relationships, such as the proportion of the Occlusion Area and Object within the screen, vary depending on the user's location within the space.

[0038] Meanwhile, the MPD is updated based on the length of the content and the speed of the user's movement. If the user moves quickly, more frequent updates are required because the relationship between the object and the user can change frequently.

[0039] Accordingly, in this embodiment, the time to live (TTL) of the MPD is set as shown in Equation 1, taking into account the segment length, the movement speed of the user within the space, etc. The terms of Equation 1 are the duration, the segment length, and the user's movement speed.

[0040]

[0041] At this time, the segment length is the time unit in which the server (20) divides the content. The content is maintained for the segment length, and the space occupied during the segment length time is defined as the object size.

[0042] Content is produced based on V-PCC. This point cloud compression method utilizing geometry is based on a 3D tree structure called an octree. Users must be able to receive objects separately based on their relative position and distance, or receive multiple objects at once.

[0043] The resolution quality of objects compressed by node-based compression algorithms is determined by the relative distance between the user and the object. Therefore, a standard is required when dividing or grouping objects, or when specifying content quality.

[0044] Therefore, the following presents the weight values used to specify these two, and explains how to group them using those values and determine the quality.

[0045] ND: Node Depth

[0046] ND is the Node Depth weight: the resolution required for an object to be consumed at a specific quality. Determining the ND first requires the Required Node Depth (RND), which is the minimum Node Depth value required for the object at a reference distance (e.g., 1 m). The RND is stored in the MPD as additional information for each object.

[0047] According to RND, ND is defined as Equation 2. Here, r is the distance between the object and the user.

[0048]

[0049] TL: Tile Level

[0050] TL is the Tile Level weight. When an object is transmitted to a user, TL indicates whether the object is split and transmitted according to the user's field of view (FoV) or transmitted together with other objects as a group. Here, a tile is a unit for dividing and storing space, and a single object can be divided and represented as multiple tiles depending on the level of division. TL is a level value taking, for example, one of the values 1, 2, 3, 4, and 5, and the meaning of each value is shown in Table 2.

[0051] Table 2. Meaning of TL
TL   Meaning
1    Transmit a single object divided into two or more small pieces
2    Transmit a single object after one stage of splitting
3    Transmit a single object
4    Transmit as a group containing one or more objects
5    Transmit a space containing one or more groups

[0052] The TL value of each object is determined through a set of reference values. The reference values distinguish the TLs and express, in terms of latitude and longitude, how much of the user's field of view the object occupies. Each reference value in the set is the criterion for advancing from one TL stage to the next, that is, the boundary between TL n and TL (n+1). A longitude reference and a latitude reference are defined for each boundary. FIG. 2 shows an object within the user's field of view according to embodiments. As exemplified in FIG. 2, the extent (θ, φ) occupied by the object within the user's field of view is compared with the set of reference values to determine the TL value. This extent is calculated as the difference between the largest and the smallest θ, and between the largest and the smallest φ, among the polar coordinate representations (r, θ, φ) of the eight points constituting the object.

[0053] The TL value is determined by comparing the θ and φ obtained through this process with the TL reference value set of θ and φ. For example, if the object's θ is greater than θ1, the TL is 1. If θ is less than θ4, the TL is 5. In this way, the object's θ and φ are checked, and a TL value is assigned for each of the two axes. Subsequently, splitting or grouping is performed separately for each direction.
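The TL determination described above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment itself: the reference values are taken to be four descending thresholds per axis, equality with a threshold is resolved toward the higher TL, θ is treated as longitude and φ as latitude, and wraparound of longitude at ±180 degrees is ignored. The function names and the sample thresholds are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def angular_extent(corners: Iterable[Point], user: Point) -> Tuple[float, float]:
    """Return (longitude extent, latitude extent) in degrees of the eight
    corner points of an object, as seen from the user's position."""
    lons, lats = [], []
    for x, y, z in corners:
        dx, dy, dz = x - user[0], y - user[1], z - user[2]
        r = math.sqrt(dx * dx + dy * dy + dz * dz)  # assumes no corner at the user
        lons.append(math.degrees(math.atan2(dy, dx)))
        lats.append(math.degrees(math.asin(dz / r)))
    return max(lons) - min(lons), max(lats) - min(lats)


def tile_level(extent: float, refs: Sequence[float]) -> int:
    """Map an angular extent to TL 1..5 using four descending reference values.

    extent > refs[0] gives TL 1 and extent < refs[3] gives TL 5, matching the
    two cases given in the text; the in-between mapping is an assumption.
    """
    return 1 + sum(1 for ref in refs if extent <= ref)


# Hypothetical reference values (degrees), largest first.
REFS = [90.0, 45.0, 20.0, 5.0]

corners = [(x, y, z) for x in (1, 3) for y in (-1, 1) for z in (-1, 1)]
lon_extent, lat_extent = angular_extent(corners, (0.0, 0.0, 0.0))
print(tile_level(lon_extent, REFS), tile_level(lat_extent, REFS))
```

Because a TL is computed per axis, an object may receive different levels for longitude and latitude, which is why splitting or grouping is handled separately for each direction.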

[0054] Meanwhile, in one embodiment, based on the aforementioned ND and TL, objects having the same ND and TL can be grouped into the same group and reconstructed into a single file.

[0055] For example, objects with a TL value greater than or equal to a threshold value (e.g., 4 or higher) can be grouped into the same group. The objects in this group are those with a TL of 4 or 5 within the user's field of view (FoV).

[0056] FIG. 3 shows a group of objects according to embodiments.

[0057] Objects grouped into a single group are reorganized into a single file by creating an additional node called a group root node above the root node of the existing node structure, as shown in Fig. 3, and are decoded with the same Node Depth. In this process, if the ND of individual objects differs, there may be a significant disparity in the quality of the objects within the user screen.

[0058] Therefore, when grouping using TL, ND must also be considered. Consequently, objects within a single group have the same ND and TL. This means that objects within the group appear to the user with the same quality at the same Node Depth.
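A minimal sketch of this grouping rule is given below. It assumes each object is already annotated with its ND and TL, and it uses the example threshold of 4 from the text; the tuple layout and the name group_objects are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_objects(
    objects: Iterable[Tuple[str, int, int]], tl_threshold: int = 4
) -> Tuple[Dict[Tuple[int, int], List[str]], List[str]]:
    """Group objects that share the same (ND, TL) once TL reaches the threshold.

    Each object is a (name, nd, tl) tuple. Objects below the threshold are
    returned separately because they are sent individually or split.
    """
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    ungrouped: List[str] = []
    for name, nd, tl in objects:
        if tl >= tl_threshold:
            groups[(nd, tl)].append(name)  # same ND and TL -> same group
        else:
            ungrouped.append(name)
    return dict(groups), ungrouped


objects = [("chair", 6, 4), ("lamp", 6, 4), ("tree", 8, 4), ("statue", 10, 2)]
print(group_objects(objects))
```

Keying the groups on both ND and TL ensures that every object in a group can be decoded at the same Node Depth without a visible quality disparity.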

[0059] In addition, groups must be offered in various combinations for adaptive transmission based on the user's viewing direction, because fixed combinations of groups waste user bandwidth. As shown in Fig. 4, when combining groups, the number of objects grouped is adjusted according to the user's viewing direction so that objects with the same ND and TL are grouped into the same group.

[0060] This will be explained in more detail with reference to Fig. 4.

[0061] FIG. 4 shows a group of objects within the user's field of view according to embodiments.

[0062] As previously explained, the server generates various combinations of object groups to provide object groups suitable for the user's head direction. Object groups are combined within the user's field of view, and various types of object groups can be generated depending on the user's head direction. Figure 4 illustrates object groups that are combined in various ways within the same TL and ND depending on the user's head direction. The user's head direction can be expressed as θ and φ, which are the latitude and longitude values in the polar coordinate representation of the head direction. Additionally, the user's field of view (FoV) is represented as (q, j) by the service provider terminal. Although it is necessary to determine whether an object group is included within the user's field of view for both the latitude and longitude axes in order to generate object groups, an example based on the longitude axis is explained for ease of understanding, as shown in Figure 4.

[0063] When the user's head direction is represented by θ and the field of view is defined by q, the user's field of view is represented as (θ-q / 2, θ+q / 2). At this time, when θ is 0, an object group is created by combining objects included within the field of view, considering TL and ND, and a new object group is created whenever the composition of objects within the field of view changes while increasing θ.
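The sweep over θ can be sketched as follows for the longitude axis only. The sketch assumes that each object is reduced to the longitude of its center, that all objects passed in already share the same TL and ND, and that θ is advanced in fixed steps; the step size and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def in_fov(lon: float, theta: float, q: float) -> bool:
    """True if lon lies in the open interval (theta - q/2, theta + q/2),
    with angles in degrees and wraparound at 360 handled."""
    diff = (lon - theta + 180.0) % 360.0 - 180.0
    return abs(diff) < q / 2.0


def sweep_groups(
    longitudes: Dict[str, float], q: float, step: float = 1.0
) -> List[Tuple[float, FrozenSet[str]]]:
    """Increase theta from 0 and record a new object group each time the
    set of objects inside the field of view changes."""
    groups: List[Tuple[float, FrozenSet[str]]] = []
    previous: FrozenSet[str] = frozenset()
    for k in range(int(360.0 / step)):
        theta = k * step
        visible = frozenset(n for n, lon in longitudes.items() if in_fov(lon, theta, q))
        if visible and visible != previous:
            groups.append((theta, visible))
        previous = visible
    return groups


print(sweep_groups({"a": 10.0, "b": 30.0}, q=40.0, step=10.0))
```

A full implementation would repeat the same test on the latitude axis and would use the angular extent of each object rather than its center.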

[0064] Based on the above explanation, the following three elements are largely required to describe objects reconfigured to fit the user's location in MPD.

[0065] 1. Adaptation Set

[0066] 2. Representation (Resolution)

[0067] 3. SRD(Spatial Relationship Description)

[0068] 1. Adaptation Set

[0069] An Adaptation Set is generally a content element that can be individually decoded. Here, each object or object group produced by the TL- and ND-based division or combination explained above is defined as its own Adaptation Set.

[0070] 2. Representation (Playback Quality)

[0071] Representation refers to the quality of the content and is adjusted according to the user's network environment. Since V-PCC-based content has various Node Depths and providing all Node Depths as Representations can cause disparities between objects, the number of levels (m) and the depth gap (n) provided by the system are defined. Here, the number of levels refers to the number of playback quality levels of the content provided by the system, and the depth gap refers to the interval between quality levels.

[0072] Together with ND, m and n set the Node Depth of each Representation of an object. ND is the lowest Depth Level among the Representations, and up to m levels are created, spaced at least n steps apart, up to the total Node Depth of the object.

[0073] For example, if an object with a Leaf Node Depth of 16 has an ND of 6 and m=5, the object has at most m, that is, 5, Representations. Among these, the lowest Representation has a Node Depth of 6.

[0074] The highest Representation has a Node Depth of 16. Node Depths of 9 and 12 become Representations so that there is a gap of at least n between them. Consequently, although the system can provide 5 resolutions, it provides only 4 Representations because of the relationship between the object's ND and its maximum Depth.
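The worked example can be reproduced with a short sketch. The text does not state n for this example; the depths 6, 9, 12, and 16 are consistent with n = 3, which is assumed here. The exact selection rule (step upward by n from ND, then finish with the leaf depth if it is at least n above the last level) is also an assumption chosen to match the example, and representation_depths is a hypothetical name.

```python
from typing import List


def representation_depths(leaf_depth: int, nd: int, m: int, n: int) -> List[int]:
    """Return the Node Depth of each Representation, lowest first.

    nd is the lowest level, leaf_depth the highest, at most m levels are
    produced, and neighbouring levels differ by at least n.
    """
    depths = [nd]
    # Add intermediate levels, leaving room for the leaf depth as the top level.
    while len(depths) < m - 1 and depths[-1] + n <= leaf_depth - n:
        depths.append(depths[-1] + n)
    # The highest Representation uses the full depth of the object.
    if len(depths) < m and leaf_depth - depths[-1] >= n:
        depths.append(leaf_depth)
    return depths


print(representation_depths(leaf_depth=16, nd=6, m=5, n=3))
```

With these inputs a level at depth 15 would sit only one step below the leaf depth of 16, so it is dropped and four Representations remain, as in the example.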

[0075] 3. SRD(Spatial Relationship Description)

[0076] An SRD (Spatial Relationship Description) is an item that describes the relationship between a user and an object, as well as the object itself. This item is written as a sub-item called an Essential Property within an Adaptation Set and serves to provide additional explanations about the Adaptation Set. The SRD of this system is written using the following fields.

[0077] One field represents the positional relationship from the user to the center of an individual object or group of objects in a polar coordinate system. Another field gives the Width, Depth, and Height of the object. A further field is a quaternion value describing the rotation of the object. Rotation values based on three axes, such as yaw, pitch, and roll, have the problem that the final direction changes depending on the order of rotation, so the rotation of the object is represented using a quaternion.

[0078] Based on the corresponding SRD value, the user can check information such as the object's position, size, and rotation, and place the object in the space.
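How a client might use these SRD values to place an object can be sketched as follows. The axis convention (θ as longitude in the x-y plane, φ as latitude toward z), the quaternion component order (w, x, y, z), and the function names are assumptions made for illustration; the source does not specify them.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def polar_to_cartesian(r: float, theta_deg: float, phi_deg: float) -> Vec3:
    """Convert the SRD position (r, θ, φ) to an offset from the user."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    return (
        r * math.cos(phi) * math.cos(theta),
        r * math.cos(phi) * math.sin(theta),
        r * math.sin(phi),
    )


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + w*t + u x t, t = 2 (u x v)."""
    w, u = q[0], (q[1], q[2], q[3])
    c = cross(u, v)
    t = (2.0 * c[0], 2.0 * c[1], 2.0 * c[2])
    ut = cross(u, t)
    return (v[0] + w * t[0] + ut[0], v[1] + w * t[1] + ut[1], v[2] + w * t[2] + ut[2])


# A 90-degree rotation about the z axis turns the x axis into the y axis.
s = math.sqrt(0.5)
print(polar_to_cartesian(2.0, 90.0, 0.0), rotate((s, 0.0, 0.0, s), (1.0, 0.0, 0.0)))
```

A single quaternion encodes the final orientation directly, so no rotation order needs to be agreed between server and client.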

[0079] FIG. 5 shows a structure for transmitting V3C (visual volumetric video-based coding) data according to embodiments.

[0080] A real-world or synthetic visual scene (A) is captured by a camera set, such as a camera device with multiple lenses and sensors or a virtual camera. The acquired result is source volumetric data (B). One or more volumetric frames are encoded into a coded V3C bitstream containing an atlas bitstream, up to one occupancy bitstream, a geometry bitstream, and zero or more attribute bitstreams (Ev). Then, one or more coded bitstreams are packaged, according to a specific media container file format, into a media file for local playback (F) or into a sequence of media segments (Fs) and an initialization segment for streaming. In the embodiments, the media container file format is the ISO base media file format specified in ISO/IEC 14496-12. The file encapsulator may also include metadata in the file or segments. The segments Fs are delivered to a player using a delivery mechanism.

[0081] The file (F) output by the file encapsulator is identical to the file (F') received as input by the file decapsulator. The file decapsulator processes the file (F') or the received segments (F's), extracts the coded bitstream (E'v), and parses the metadata. The V3C bitstream is then decoded into a decoded signal (D'). The decoded volumetric data (D') is reconstructed and rendered to be displayed on the screen of a head-mounted display or other display device according to the current viewing orientation or viewport. The current viewing orientation is determined by head tracking and eye tracking functions. In viewport-dependent transmission, the current viewing orientation is also passed to the strategy module, which determines the track to receive based on the viewing orientation.

[0082] The process described above can be applied to both live and on-demand use cases.

[0083] The interface definition in Fig. 5 is as follows:

[0084] F/F': A media file containing specifications for the track format, which may include constraints on the elementary stream included in the track samples. Timed V3C content and/or non-timed V3C data can each be encapsulated in file form.

[0085] The system of FIG. 5 according to the embodiments includes a transmission-related interface for DASH transmission.

[0086] The system of FIG. 5 according to the embodiments includes a transmission-related interface for MMT transmission.

[0087] FIG. 6 shows the multi-track encapsulation of V-PCC data according to ISOBMFF (ISO Base Media File Format) according to the embodiments.

[0088] The point cloud data encoding method according to the embodiments can encapsulate the timed V-PCC into an ISOBMFF-based file.

[0089] Single Track Encapsulation:

[0090] In single-track encapsulation of V3C data, the V3C bitstream is carried in a single track, called the V3C bitstream track.

[0091] Single-track encapsulation of V3C data is used for direct ISOBMFF encapsulation of the V3C bitstream. The V3C bitstream is stored directly as a single track without additional processing. The V3C unit header data structure is stored in the bitstream. Single-track encapsulated V3C data can be passed on for further processing, such as multiple-track file generation, transcoding, or DASH segmentation.

[0092] V3C Bitstream Sample Entry:

[0093] Sample entry types: 'v3e1', 'v3eg'

[0094] Container: SampleDescriptionBox

[0095] Required: 'v3e1' or 'v3eg' sample entry is required

[0096] Quantity: One or more

[0097] V3C bitstream tracks use VolumetricVisualSampleEntry with the 'v3e1' or 'v3eg' sample entry type.

[0098] In the 'v3e1' sample entry, all atlas parameter sets and SEI messages defined in ISO/IEC 23090-5 are in the setup_unit array. In the 'v3eg' sample entry, the atlas parameter sets and SEI messages may be in the setup_unit array or in samples of the V3C bitstream track.

[0099] The V3C bitstream track sample entry includes a V3CConfigurationBox, and the following restrictions apply.

[0100] In the 'v3e1' sample entry, for an array containing an atlas parameter set, the array_completeness value is 1.

[0101] In the 'v3eg' sample entry, for an array containing an atlas parameter set, the array_completeness value is 0.

[0102] The 2D video configuration box of the V3C video component sub-bitstream defined in ISO/IEC 14496-15 is in the V3C bitstream sample entry to signal the corresponding 2D video decoder configuration and initialization information.

[0103] An optional BitRateBox defined in ISO/IEC 14496-12 may be present in the V3C bitstream sample entry to signal bit rate information of the V3C bitstream track.

[0104] Syntax:

[0105] aligned(8) class V3CBitstreamSampleEntry()

[0106] extends VolumetricVisualSampleEntry (type) {

[0107] // type is 'v3e1' or 'v3eg'

[0108] V3CConfigurationBox v3c_config;

[0109] // additional boxes

[0110] }

[0111] Semantics:

[0112] The compressorname of the base class VolumetricVisualSampleEntry indicates the name of the compressor used, with the recommended value "\012V3C Coding". The first byte is a count of the remaining bytes, represented here by \012 (octal 12, that is, decimal 10), which is the number of bytes in the rest of the string.
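The length-prefixed layout of the recommended value can be checked with a small sketch. It assumes that compressorname occupies a fixed 32-byte field padded with zero bytes, as in the visual sample entry of ISO/IEC 14496-12; that field size is not stated in the text above, and encode_compressorname is a hypothetical helper.

```python
def encode_compressorname(name: str, field_size: int = 32) -> bytes:
    """Encode a compressor name as a count byte, the name, and zero padding."""
    raw = name.encode("utf-8")
    if len(raw) > field_size - 1:
        raise ValueError("name does not fit in the compressorname field")
    # First byte: number of name bytes that follow; remainder: zero padding.
    return bytes([len(raw)]) + raw + bytes(field_size - 1 - len(raw))


field = encode_compressorname("V3C Coding")
print(field[0], field[:11])
```

Here "V3C Coding" is 10 bytes long, so the count byte is 10, which is written as \012 in octal notation.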

[0113] V3C Bitstream Track Sample Type

[0114] A V3C bitstream sample contains one or more V3C units belonging to the same presentation time, i.e., a single V3C composition unit. In terms of decoding, the sample may be self-contained (e.g., a synchronization sample) or dependent on other samples in the V3C bitstream track.

[0115] Syntax

[0116] aligned(8) class V3CBitstreamSample {

[0117] // sample_size is the size of the sample from the SampleSizeBox

[0118] for (int i=0; i < sample_size; ) {

[0119] unsigned int((v3c_config.unit_size_precision_bytes_minus1 + 1)*8) v3c_unit_size;

[0120] bit(8) ss_v3c_unit[v3c_unit_size];

[0121] i += v3c_unit_size + v3c_config.unit_size_precision_bytes_minus1 + 1;

[0122] }

[0123] }

[0124] Semantics

[0125] v3c_unit_size represents the size of the ss_v3c_unit array in bytes. The size is the same as the sample stream V3C unit size ssvu_v3c_unit_size defined in ISO/IEC 23090-5, Annex C.

[0126] ss_v3c_unit contains a single V3C unit in the V3C unit sample stream format defined in ISO/IEC 23090-5:2021, Annex C.
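The sample syntax above amounts to a sequence of length-prefixed V3C units. A minimal parsing sketch is shown below; it assumes the size field is big-endian, as is conventional for ISOBMFF integer fields, and split_v3c_sample is a hypothetical name.

```python
from typing import List


def split_v3c_sample(sample: bytes, unit_size_precision_bytes_minus1: int) -> List[bytes]:
    """Split a V3C bitstream sample into its V3C units.

    Each unit is preceded by a size field of
    (unit_size_precision_bytes_minus1 + 1) bytes giving the unit length.
    """
    size_len = unit_size_precision_bytes_minus1 + 1
    units: List[bytes] = []
    i = 0
    while i < len(sample):
        if i + size_len > len(sample):
            raise ValueError("truncated v3c_unit_size field")
        unit_size = int.from_bytes(sample[i:i + size_len], "big")
        start, end = i + size_len, i + size_len + unit_size
        if end > len(sample):
            raise ValueError("v3c_unit_size exceeds the remaining sample data")
        units.append(sample[start:end])
        i = end  # same advance as i += v3c_unit_size + size_len in the syntax
    return units


print(split_v3c_sample(b"\x00\x03abc\x00\x01z", unit_size_precision_bytes_minus1=1))
```

The placeholder payloads in the usage line are not valid V3C units; they only exercise the framing.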

[0127] V3C Bitstream Track Synchronization Sample

[0128] The V3C bitstream synchronization sample satisfies all of the following conditions.

[0129] It can be decoded independently.

[0130] Samples following a synchronization sample (in decoding order) have no decoding dependency on samples preceding the synchronization sample.

[0131] All samples following the synchronization sample (in decoding order) can be successfully decoded.

[0132] V3C Bitstream Track Subsample

[0133] A V3C bitstream track subsample is a V3C unit included in a V3C bitstream track sample.

[0134] The V3C bitstream track contains one SubSampleInformationBox, either in its SampleTableBox or in the TrackFragmentBox of each of its MovieFragmentBoxes, which lists the V3C bitstream track subsamples.

[0135] The 32-bit unit header of the V3C unit representing the subsample is copied to the 32-bit codec_specific_parameters field of the subsample entry in SubSampleInformationBox. The V3C unit type of each subsample is identified by parsing the codec_specific_parameters field of the subsample entry in SubSampleInformationBox.
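Identifying the unit type from codec_specific_parameters can be sketched as follows. The sketch assumes that the V3C unit type occupies the five most significant bits of the 32-bit unit header and that the numeric codes are as listed in the table below; both follow the author of this sketch's reading of ISO/IEC 23090-5 and should be verified against that specification. The function name is hypothetical.

```python
# Assumed numeric codes of the V3C unit types (to be checked against
# ISO/IEC 23090-5); values outside this table are reported as reserved.
V3C_UNIT_TYPES = {
    0: "V3C_VPS",
    1: "V3C_AD",
    2: "V3C_OVD",
    3: "V3C_GVD",
    4: "V3C_AVD",
    5: "V3C_PVD",
    6: "V3C_CAD",
}


def v3c_unit_type(codec_specific_parameters: int) -> str:
    """Return the unit type name encoded in a subsample's 32-bit header copy."""
    # Assumption: the unit type is the top 5 bits of the 32-bit header.
    unit_type = (codec_specific_parameters >> 27) & 0x1F
    return V3C_UNIT_TYPES.get(unit_type, f"reserved({unit_type})")


print(v3c_unit_type(3 << 27))
```

With this lookup, a reader can route each subsample (occupancy, geometry, attribute, or atlas data) without parsing the unit payload itself.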

[0136] Multiple track encapsulation

[0137] A multiple-track encapsulated V3C data container may contain three types of tracks: V3C atlas tracks, V3C atlas tile tracks, and V3C video component tracks. A multiple-track encapsulated V3C data container contains one or more V3C atlas tracks that reference zero or more V3C atlas tile tracks or zero or more V3C video component tracks. If there are V3C atlas tile tracks, they reference zero or more V3C video component tracks. The number of V3C video component tracks in a multiple-track encapsulated V3C data container depends on the V3C toolset profile, defined in ISO/IEC 23090-5, that is used.

[0138] ISOBMFF track references are utilized to indicate the association between a V3C video component track and a V3C atlas track or V3C atlas tile track, wherein the V3C atlas track or V3C atlas tile track includes a track reference to the V3C video component track.

[0139] Tracks belonging to the same CVS are time-aligned. Samples contributing to the same volumetric frame in different V3C video component tracks, V3C atlas tracks, and V3C atlas tile tracks have the same composition time. The atlas parameter set used for these samples has a decoding time that is equal to or earlier than the composition time of the volumetric frame. Additionally, all tracks belonging to the same CVS have the same implicit or explicit edit list.

[0140] Referring to Fig. 6, which illustrates a multi-track encapsulated V3C data container, the V3C unit payloads of the V3C bitstream are mapped to individual tracks within the multi-track container file according to their type.

[0141] Multi-track encapsulated V3C data containers include the following:

[0142] One or more V3C atlas tracks, which may include track references to: other tracks carrying the payloads of video-compressed V3C units (i.e., V3C unit types equal to V3C_OVD, V3C_GVD, V3C_AVD, or V3C_PVD specified in ISO/IEC 23090-5);

[0143] V3C atlas tile tracks; and, if there are multiple atlases in the bitstream, other V3C atlas tracks;

[0144] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for occupancy data (i.e., payloads of V3C units of type V3C_OVD specified in ISO/IEC 23090-5);

[0145] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for geometry data (i.e., payloads of V3C units of type V3C_GVD specified in ISO/IEC 23090-5);

[0146] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for attribute data (i.e., payloads of V3C units of type V3C_AVD specified in ISO/IEC 23090-5);

[0147] Zero or more V3C video component tracks whose samples contain access units of video-coded elementary streams for packed data (i.e., payloads of V3C units of type V3C_PVD specified in ISO/IEC 23090-5);

[0148] Zero or more V3C atlas tile tracks whose samples contain only ACL NAL units for a subset of atlas tiles. V3C atlas tile tracks may include track references to other tracks carrying payloads of video-compressed V3C units for the specified subset of atlas tiles (i.e., V3C unit types V3C_OVD, V3C_GVD, V3C_AVD, and V3C_PVD).

[0149] V3C Atlas Sample Entry

[0150] Sample entry type: 'v3c1', 'v3cg', 'v3cb', 'v3a1' or 'v3ag'

[0151] Container: SampleDescriptionBox

[0152] Required: 'v3c1', 'v3cg', 'v3cb', 'v3a1', or 'v3ag' sample entry is required

[0153] Quantity: One or more

[0154] The V3C atlas track uses V3CAtlasSampleEntry, which extends VolumetricVisualSampleEntry using the 'v3c1', 'v3cg', 'v3cb', 'v3a1', or 'v3ag' sample entry types. The restrictions for the V3C atlas track are as follows.

[0155] V3C atlas tracks must not carry ACL NAL units belonging to two or more atlases.

[0156] The V3C atlas track sample entries include a V3CConfigurationBox and a V3CUnitHeaderBox.

[0157] Depending on the V3C bitstream or sample entry type of the atlas track, the following restrictions apply to the V3C atlas track.

[0158] If the V3C bitstream contains a single atlas, a V3C atlas track with sample entry type 'v3c1' or 'v3cg' is used.

[0159] If a V3C bitstream contains multiple atlases, each atlas bitstream is stored as a separate V3C atlas track with a sample entry type of 'v3a1' or 'v3ag'. There must be one additional track with a sample entry type of 'v3cb', which serves as an entry point track referencing another atlas track with a sample entry type of 'v3a1' or 'v3ag'.

[0160] In the 'v3a1' and 'v3ag' sample entries, num_of_v3c_parameter_sets is equal to 0. The V3C parameter sets are stored in the sample entry of the atlas track containing 'v3cb'.

[0161] V3C atlas tracks with a sample entry type of 'v3cb' do not include ACL NAL units.

[0162] In the 'v3c1' and 'v3a1' sample entries, the array_completeness value is 1 for arrays containing atlas parameter sets.

[0163] In the 'v3cg' and 'v3ag' sample entries, the array_completeness value is 0 for arrays containing atlas parameter sets.

[0164] The parameter sets and SEI messages in the atlas track with the 'v3cb' sample entry apply to all referenced V3C atlas tracks.

[0165] For tracks with sample entry type 'v3c1', 'v3cg', or 'v3cb', the track_in_movie flag in the track header is set to 1.

[0166] For tracks where the sample entry type is 'v3a1' or 'v3ag', the track_in_movie flag in the track header is set to 0.

[0167] The optional BitRateBox in the V3C atlas sample entry can signal the bit rate information of the V3C atlas track.
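
As a minimal sketch of the sample entry restrictions above, the following Python checks whether the atlas tracks of one V3C bitstream use a consistent set of sample entry types. The function names `check_atlas_tracks` and `track_in_movie` are hypothetical, and reading the text as "exactly one 'v3c1'/'v3cg' track for a single atlas" and "exactly one 'v3cb' track for multiple atlases" is an assumption.

```python
from collections import Counter

SINGLE_ATLAS_TYPES = {"v3c1", "v3cg"}
MULTI_ATLAS_TYPES = {"v3a1", "v3ag"}
ENTRY_POINT_TYPE = "v3cb"


def check_atlas_tracks(entry_types):
    """Return a list of rule violations for the atlas tracks of one bitstream.

    entry_types: sample entry types of the atlas tracks,
    e.g. ["v3cb", "v3a1", "v3a1"]. An empty list means no violation found.
    """
    problems = []
    counts = Counter(entry_types)
    known = SINGLE_ATLAS_TYPES | MULTI_ATLAS_TYPES | {ENTRY_POINT_TYPE}
    unknown = set(counts) - known
    if unknown:
        problems.append(f"unknown sample entry types: {sorted(unknown)}")
    n_single = sum(counts[t] for t in SINGLE_ATLAS_TYPES)
    n_multi = sum(counts[t] for t in MULTI_ATLAS_TYPES)
    n_entry = counts[ENTRY_POINT_TYPE]
    if n_single and (n_multi or n_entry):
        problems.append("single-atlas and multi-atlas sample entry types are mixed")
    if n_single > 1:
        problems.append("a single-atlas bitstream uses one 'v3c1'/'v3cg' track")
    if n_multi and n_entry != 1:
        problems.append("multi-atlas content needs one 'v3cb' entry point track")
    if n_entry and not n_multi:
        problems.append("a 'v3cb' track must reference 'v3a1'/'v3ag' atlas tracks")
    return problems


def track_in_movie(entry_type):
    """track_in_movie flag per the text: 1 for v3c1/v3cg/v3cb, 0 for v3a1/v3ag."""
    return 1 if entry_type in SINGLE_ATLAS_TYPES | {ENTRY_POINT_TYPE} else 0
```

For example, `check_atlas_tracks(["v3cb", "v3a1", "v3ag"])` reports no violation, whereas a list of 'v3a1' tracks without a 'v3cb' entry point track is flagged.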

[0168] Syntax

[0169] aligned(8) class V3CAtlasSampleEntry()

[0170] extends VolumetricVisualSampleEntry (type) {

[0171] // type is 'v3c1', 'v3cg', 'v3cb', 'v3a1' or 'v3ag'

[0172] V3CConfigurationBox config;

[0173] V3CUnitHeaderBox unit_header;

[0174] }

[0175] Semantics

[0176] The compressorname of the base class VolumetricVisualSampleEntry represents the name of the compressor used, with the recommended value "\012V3C Coding".

[0177] V3C Atlas Tile Sample Entry

[0178] Sample entry type: 'v3t1'

[0179] Container: SampleDescriptionBox

[0180] Required: Yes

[0181] Quantity: One or more

[0182] The V3C Atlas Tile Track uses V3CAtlasTileSampleEntry, which extends VolumetricVisualSampleEntry with the 'v3t1' sample entry type.

[0183] A V3C Atlas Tile Track sample contains only ACL NAL units belonging to the same atlas. The V3C Atlas Tile Track contains the ACL NAL units of at least one tile indicated by the tile_id of the V3CAtlasTileConfigurationBox.

[0184] V3CAtlasTileSampleEntry does not include V3CConfigurationBox or V3CUnitHeaderBox. The information provided by these boxes can be found in the V3C Atlas Track Sample Entry, which references the V3C Atlas Tile Track. Other optional boxes may be included.

[0185] Syntax

[0186] class V3CAtlasTileConfigurationBox

[0187] extends FullBox('v3tC', version = 0, 0) {

[0188] unsigned int(3) unit_size_precision_bytes_minus1;

[0189] unsigned int(1) spatial_scalability_enabled_flag;

[0190] bit(4) reserved = 0;

[0191] if (spatial_scalability_enabled_flag) {

[0192] unsigned int(8) lod_index;

[0193] }

[0194] unsigned int(16) num_tiles;

[0195] for(int i=0; i < num_tiles; i++){

[0196] unsigned int(16) tile_id;

[0197] }

[0198] }

[0199] aligned(8) class V3CAtlasTileSampleEntry()

[0200] extends VolumetricVisualSampleEntry ('v3t1') {

[0201] V3CAtlasTileConfigurationBox tile_info;

[0202] }

[0203] Semantics

[0204] unit_size_precision_bytes_minus1 plus 1 indicates the precision, in bytes, of the size field of the sample stream NAL units to which the sample entry containing this configuration box applies. The value of this field is equal to ssnh_unit_size_precision_bytes_minus1 in the sample_stream_nal_header() of the atlas component bitstream.

[0205] spatial_scalability_enabled_flag is a flag indicating whether LoD-based scalability is supported in the delivered V3C content.

[0206] lod_index represents the LoD index value associated with the tiles carried in the atlas tile track. An atlas tile track with a specific LoD index (if any) is selected along with all atlas tile tracks containing that tile that have a lower lod_index value. The set of LoD tiles associated with the lower lod_index value is processed first.

[0207] num_tiles is the number of tiles included in the track.

[0208] tile_id is the tile ID of a tile on the track. The value of tile_id is equal to the value of the afti_tile_id syntax element of the atlas frame tile information defined in ISO / IEC 23090-5.
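
To illustrate the V3CAtlasTileConfigurationBox syntax and semantics above, the following Python is a minimal sketch that reads the fields in the order they are declared. It assumes the input starts immediately after the FullBox header (size, type, version, and flags already consumed); the names `AtlasTileConfig` and `parse_atlas_tile_config` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtlasTileConfig:
    unit_size_precision_bytes: int        # unit_size_precision_bytes_minus1 + 1
    spatial_scalability_enabled: bool
    lod_index: Optional[int]              # present only when scalability is enabled
    tile_ids: List[int]


def parse_atlas_tile_config(payload: bytes) -> AtlasTileConfig:
    """Parse the fields that follow the FullBox header of a 'v3tC' box."""
    first = payload[0]
    precision_minus1 = first >> 5          # unsigned int(3), top three bits
    scalable = bool((first >> 4) & 0x1)    # unsigned int(1); low four bits reserved
    pos = 1
    lod_index = None
    if scalable:
        lod_index = payload[pos]           # unsigned int(8)
        pos += 1
    (num_tiles,) = struct.unpack_from(">H", payload, pos)  # unsigned int(16)
    pos += 2
    tile_ids = list(struct.unpack_from(f">{num_tiles}H", payload, pos))
    return AtlasTileConfig(precision_minus1 + 1, scalable, lod_index, tile_ids)
```

A receiver could use the resulting tile_ids and lod_index to decide which atlas tile tracks to fetch for a given level of detail.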

[0209] V3C Atlas Sample Format

[0210] Each sample of the V3C Atlas Track or V3C Atlas Tile Track corresponds to a single coded atlas access unit and has the following additional description.

[0211] When using 'v3cb' sample entries, each sample in the V3C atlas track corresponds to one or more non-ACL NAL units.

[0212] When using 'v3c1', 'v3cg', 'v3a1', or 'v3ag' sample entries, each sample in the V3C atlas track corresponds to a coded atlas access unit associated with the same vuh_atlas_id indicated in the V3C unit header box of the sample entry.

[0213] Syntax

[0214] aligned(8) class V3CAtlasSample {

[0215] // The sample_size value is the sample size from SampleSizeBox

[0216] for (int i=0; i < sample_size; ) {

[0217] unsigned int((v3c_config.unit_size_precision_bytes_minus1 + 1) * 8) nal_size;

[0218] bit(8) ss_nal_unit[nal_size];

[0219] i += nal_size + v3c_config.unit_size_precision_bytes_minus1 + 1;

[0220] }

[0221] }

[0222] Semantics

[0223] nal_size represents the size of the ss_nal_unit array in bytes.

[0224] ss_nal_unit is a data array containing a single NAL unit defined in ISO / IEC 23090-5.
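
The loop in the V3CAtlasSample syntax can be sketched in Python as follows: each NAL unit is preceded by a size field of unit_size_precision_bytes_minus1 + 1 bytes. The function name `split_atlas_sample` is hypothetical, and reading the size field as big-endian is an assumption based on the usual ISOBMFF convention.

```python
from typing import List


def split_atlas_sample(sample: bytes, unit_size_precision_bytes_minus1: int) -> List[bytes]:
    """Split one atlas sample into its length-prefixed NAL units."""
    size_len = unit_size_precision_bytes_minus1 + 1
    nal_units = []
    i = 0
    while i < len(sample):
        if i + size_len > len(sample):
            raise ValueError("truncated nal_size field")
        # nal_size: size of the following ss_nal_unit in bytes
        nal_size = int.from_bytes(sample[i:i + size_len], "big")
        start = i + size_len
        end = start + nal_size
        if end > len(sample):
            raise ValueError("NAL unit runs past the end of the sample")
        nal_units.append(sample[start:end])
        i = end  # same as i += nal_size + size_len in the syntax
    return nal_units
```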

[0225] V3C Atlas Track and V3C Atlas Tile Track Synchronization Sample

[0226] A synchronization sample of a V3C atlas track or V3C atlas tile track is a sample containing an atlas access unit coded as an Intra Random Access Point (IRAP) as defined in ISO / IEC 23090-5.

[0227] V3C Video Component Track

[0228] V3C video component tracks carry 2D video-encoded data of V3C video components. Storage of V3C video component tracks utilizes existing functions of the ISO base media file format and derived specifications. For example, ISO/IEC 14496-15 defines a mechanism for carrying V3C video components coded in ISO/IEC 14496-10 and ISO/IEC 23008-2.

[0229] Referring to FIG. 6, the file includes an atlas track, which is the entry point. The file includes a geometry track, an attribute track, and an occupancy track, which are video component tracks. Each track includes a sample entry and one or more samples. The sample entry of each track may include configuration information and/or parameter sets. The atlas track includes reference information that references the video component tracks. The sample entry of each track may include a unit header. The samples of each track may include units.

[0230] FIG. 7 shows DASH signaling according to embodiments.

[0231] The point cloud data transmission method according to the embodiments may further include MPEG-DASH-based encapsulation.

[0232] Single Track Mode

[0233] DASH's single-track mode enables streaming of V3C ISOBMFF files containing V3C content using single-track encapsulation. DASH's single-track mode is represented as a single adaptation set with one or more representations.

[0234] If the representation consists of two or more media segments, there is an initialization media segment. The initialization segment contains a V3CDecoderConfigurationRecord with a v3c_parameter_set syntax structure defined in (ISO / IEC 23090-5, Clause 7) and a Component Codec Mapping SEI message defined in (ISO / IEC FDIS 23090-5, Appendix E).

[0235] The first sample of the media segment has a stream access point (SAP) of type 1 or 2. That is, each sub-sample of the first sample has a stream access point (SAP) of type 1 or 2.

[0236] V3C Pre-selection

[0237] V3C preselection can be signaled in MPD using a PreSelection element within a Period element or a Preselection descriptor at the Adaptation Set level. The V3C PreSelection element is signaled with a list of IDs for the @preselectionComponents attribute as defined in ISO / IEC 23009-1, which includes the ID of the Main Adaptation Set of the volumetric media followed by the IDs of the Video Component Adaptation Sets. The @codecs attribute for the Preselection is set to 'v3c1', 'v3cg', or 'v3cb' to indicate that the media represented by the Preselection is visual volumetric video-based coding media.
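
As a small illustration of the ID ordering described above, the following Python sketch builds and splits a whitespace-separated @preselectionComponents value in which the Main Adaptation Set ID comes first. Both function names are hypothetical helpers, not part of any DASH library.

```python
def build_preselection_components(main_id, component_ids):
    """Join the main adaptation set ID and the component IDs, main ID first."""
    return " ".join(str(i) for i in [main_id, *component_ids])


def split_preselection_components(value):
    """Return (main_id, component_ids) from a @preselectionComponents string."""
    ids = value.split()
    if not ids:
        raise ValueError("the list must contain at least the main adaptation set ID")
    return ids[0], ids[1:]
```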

[0238] Figure 7 shows a DASH configuration for grouping V3C components belonging to a single V3C content within an MPEG-DASH MPD file.

[0239] If there are multiple atlases in V3C content, each atlas track is represented as a separate adaptation set considered as an atlas adaptation set. An atlas adaptation set is defined by setting the @codecs attribute to 'v3a1' or 'v3ag'. The representation of an atlas adaptation set is defined by setting the @dependencyId attribute to the ID of the representation of the main adaptation set. Each atlas adaptation set is part of a separate preselection that includes the atlas adaptation set, which is the main adaptation set of the preselection, and the video component adaptation set of that atlas.

[0240] V3C Atlas Tile Pre-selection

[0241] If a V3C atlas tile is transmitted as a separate track, it must be represented as a separate adaptation set considered as an atlas tile adaptation set, and the @codecs attribute of the adaptation set is set to 'v3t1'. The V3C video component track associated with the atlas tile track is also transmitted as a separate adaptation set with the @codecs attribute set to 'resv.vvvc.XXXX', where XXXX corresponds to the 4-character code (4CC) of the video codec (e.g., 'avc1' or 'hvc1').

[0242] Atlas Tile Adaptation Sets and associated Video Component Adaptation Sets must be part of a single Atlas Tile Preselection in MPD, and the Atlas Tile Adaptation Set is the primary Adaptation Set of that Preselection (i.e., the ID of the Atlas Tile Adaptation Set is the first ID in the list of Adaptation Sets in the @preselectionComponents attribute of the Preselection element or the @value attribute of the Preselection descriptor). The representation of the Atlas Tile Adaptation Set of the Atlas Tile Preselection has an @dependencyId attribute set to the representation ID of that Atlas Adaptation Set.

[0243] The point cloud data transmission method according to the embodiments can generate and transmit elements and attributes of the V3CVideoComponent descriptor (V3C video component descriptor) of the MPD for MPEG-DASH signaling as follows.

[0244] Use the V3CVideoComponent descriptor to identify the type of video component adaptation set. The V3CVideoComponent descriptor is an EssentialProperty descriptor with @schemeIdUri set to "urn:mpeg:mpegI:v3c:2020:videoComponent".

[0245] At the adaptation set level, one V3CVideoComponent descriptor is signaled for each V3C video component in the representations of the video component adaptation set (Video Component Adaptation Set in FIG. 7).

[0246] videoComponent@type: Indicates the type of the V3C video component. The value 'geom' indicates a geometry component, 'occp' indicates an occupancy component, and 'attr' indicates an attribute component.

[0247] videoComponent@is_auxiliary: A flag indicating whether the V3C video component information represented in the Adaptation Set with the V3CVideoComponent descriptor is for auxiliary video. A value of true indicates that the video is auxiliary video and contains RAW and / or EOM patches. If equal to false, it indicates that the video may contain RAW and / or EOM patches.

[0248] videoComponent@map_index: Represents the index of one of the maps of components in the Adaptation Set with the V3CVideoComponent descriptor.

[0249] videoComponent@attribute_type: Indicates the type of attribute as defined in Table 3 of ISO / IEC 23090-5:2021.

[0250] videoComponent@attribute_index: Represents the index of the attribute.

[0251] videoComponent@atlas_id: Represents the atlas ID of the component in the Adaptation Set with the V3CVideoComponent descriptor.

[0252] videoComponent@tile_ids: Represents atlas tiles associated with the data included in the Adaptation Set by providing a space-separated list of tile ID values.
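
The descriptor attributes listed above can be assembled as in the following Python sketch, which uses the standard xml.etree.ElementTree module. The function name `video_component_descriptor` is hypothetical, and writing the child element as `videoComponent` without a namespace prefix is a simplifying assumption; a real MPD would bind the v3c prefix to a namespace.

```python
import xml.etree.ElementTree as ET

SCHEME = "urn:mpeg:mpegI:v3c:2020:videoComponent"
COMPONENT_TYPES = {"geom", "occp", "attr"}
OPTIONAL_ATTRS = ("is_auxiliary", "map_index", "attribute_type",
                  "attribute_index", "atlas_id")


def video_component_descriptor(comp_type, tile_ids=None, **optional):
    """Build an EssentialProperty element carrying one videoComponent child."""
    if comp_type not in COMPONENT_TYPES:
        raise ValueError(f"unknown component type: {comp_type}")
    unknown = set(optional) - set(OPTIONAL_ATTRS)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    attrs = {"type": comp_type}
    for name in OPTIONAL_ATTRS:
        if name in optional:
            value = optional[name]
            # Booleans are written as lowercase true/false
            attrs[name] = str(value).lower() if isinstance(value, bool) else str(value)
    if tile_ids:
        # tile_ids is a space-separated list of tile ID values
        attrs["tile_ids"] = " ".join(str(t) for t in tile_ids)
    prop = ET.Element("EssentialProperty", {"schemeIdUri": SCHEME})
    ET.SubElement(prop, "videoComponent", attrs)
    return prop
```

Calling `ET.tostring(video_component_descriptor("geom", atlas_id=0), encoding="unicode")` yields the serialized descriptor for a geometry component.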

[0253] The point cloud data transmission method according to the embodiments can generate a V3C descriptor of the MPD for MPEG-DASH signaling.

[0254] The SupplementalProperty element with @schemeIdUri "urn:mpeg:mpegI:v3c:2020:v3c" is a V3C descriptor. At most one V3C descriptor may exist in the Main Adaptation Set, Atlas Adaptation Set, Atlas Tile Adaptation Set, V3C Preselection, or Atlas Tile Preselection, as shown in FIG. 7.

[0255] v3c:@vId: The ID of the volumetric media. This attribute exists when multiple versions of the same volumetric media are signaled in separate adaptation sets of the MPD.

[0256] v3c:@atlas_id: Represents the atlas ID for the volumetric media information of the track delivered in the adaptation set.

[0257] v3c:@tile_ids: If present, indicates the atlas tile IDs carried in the atlas tile adaptation set.

[0258] For ISOBMFF, it includes all tile IDs listed in V3CAtlasTileSampleEntry of the V3C Atlas Tile Track.

[0259] The point cloud data transmission method according to the embodiments can generate V3C3DRegions descriptors within the MPD as follows for spatial domain signaling for partial access.

[0260] Static spatial region

[0261] If the 3D spatial regions are static (i.e., the location and size of each region do not change during the presentation time), the characteristics of the spatial regions and the mapping between those regions and V3C tiles are signaled using the V3C3DRegions descriptor. This descriptor is a SupplementalProperty element whose @schemeIdUri is "urn:mpeg:mpegI:v3c:2020:v3sr". A single V3C3DRegions descriptor exists at the Adaptation Set level, the Representation level of the Main Adaptation Set, or the Preselection level of the V3C content, as shown in FIG. 7.

[0262] The elements of the V3C3DRegions descriptor are as follows.

[0263] v3sr: A container element whose attributes and elements specify the mapping between 3D space regions and V3C tiles.

[0264] v3sr.spatialRegion: This is an element whose attribute defines a 3D spatial region and provides a mapping between the defined region and multiple V3C tiles.

[0265] v3sr.spatialRegion@id: Identifier of a 3D spatial region.

[0266] The value of this attribute matches the value of the region_id field signaled for the corresponding region of the ISOBMFF container.

[0267] v3sr.spatialRegion@type: This attribute indicates the type of the spatial region. A value of 0 indicates a cuboid region. A value of 1 indicates a region corresponding to a viewport.

[0268] v3sr.spatialRegion.cuboid: An element that specifies a cuboid extending from a reference point in a spatial region. This element exists only when the spatialRegion@type attribute is set to 0.

[0269] v3sr.spatialRegion.cuboid@anchor: An attribute containing an array of three values describing the x, y, and z components of bb_position for the V3CBoundingBox signaled in the corresponding ISOBMFF container. The values in the array are arranged in that order, and the length of the array is 3.

[0270] v3sr.spatialRegion.cuboid@dimensions: An attribute containing an array of three values describing the x, y, and z dimensions of bb_scale for the V3CBoundingBox signaled in the corresponding ISOBMFF container. The values in the array are arranged in that order, and the length of the array is 3.

[0271] v3sr.spatialRegion.viewport: An element that specifies the viewport corresponding to the spatial region. This element exists only when the spatialRegion@type attribute is set to 1.

[0272] v3sr.spatialRegion.viewport@rvIds: A space-separated list of identifiers corresponding to the @viewport_id attribute values of the RV descriptors representing the viewports in this region.

[0273] v3sr.spatialRegion@tile_ids: Represents the atlas tile IDs mapped to this spatial region.

[0274] The value of the @tile_ids attribute is a space-separated list of atlas tile IDs.

[0275] This attribute does not exist in the case of single-track encapsulation of V3C content or when there is one or more lod elements.

[0276] v3sr.spatialRegion.lod: This is a container element whose attribute provides LoD information and the V3C tile corresponding to that LoD.

[0277] v3sr.spatialRegion.lod@idx: An identifier representing the order of LoDs for an associated 3D spatial region.

[0278] The value of this attribute matches the value of the lod_index field signaled for the corresponding LoD of the ISOBMFF container.

[0279] v3sr.spatialRegion.lod@tile_ids: A space-separated list of identifiers corresponding to the values of the atlas tile IDs mapped to this LoD.
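
To show how a receiver might use the region-to-tile mapping of the V3C3DRegions descriptor, the following Python is a minimal sketch that looks up the atlas tiles for a 3D point. The class `CuboidRegion` and the function `tiles_for_point` are hypothetical, and the sketch assumes that a cuboid extends from cuboid@anchor by cuboid@dimensions along the positive x, y, and z axes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CuboidRegion:
    region_id: int        # v3sr.spatialRegion@id
    anchor: Vec3          # v3sr.spatialRegion.cuboid@anchor
    dimensions: Vec3      # v3sr.spatialRegion.cuboid@dimensions
    tile_ids: List[int]   # v3sr.spatialRegion@tile_ids

    def contains(self, point: Vec3) -> bool:
        """True if the point lies inside the cuboid (boundaries included)."""
        return all(a <= c <= a + d
                   for a, d, c in zip(self.anchor, self.dimensions, point))


def tiles_for_point(regions: Sequence[CuboidRegion], point: Vec3) -> List[int]:
    """Collect the atlas tile IDs of every region that contains the point."""
    tiles = set()
    for region in regions:
        if region.contains(point):
            tiles.update(region.tile_ids)
    return sorted(tiles)
```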

[0280] Dynamic spatial region

[0281] If the 3D spatial regions are dynamic, a timed metadata track must be used to signal the position and size of each 3D region on the presentation timeline. This track is included in a separate adaptation set with a single representation, which is associated with the representation of the main adaptation set using the @associationId attribute defined in ISO/IEC 23009-1 and an @associationType value containing the 4CC 'cdsc'.

[0282] The point cloud data transmission method according to the embodiments may add information related to the location or size of the point cloud data within an adaptation set for the atlas of the MPD and / or an adaptation set for the geometry. The point cloud data reception method according to the embodiments may decode the entire space and subspace of the point cloud data based on the MPD.

[0283] An adaptation set for the atlas of the MPD and/or an adaptation set for the geometry may include @component@geometry_type. @component@geometry_type may include Anchor Point(x, y, z), Position(x, y, z), Size(x, y, z), Rotation(1, i, j, k), and/or Dynamic/Static.

[0284] @component@geometry_type is component geometry type information. Anchor Point(x, y, z) is anchor point information of the spatial region. Position(x, y, z) is location information of the spatial region and represents the position of the spatial region in the Cartesian coordinate system. Size(x, y, z) is size information of the spatial region. Rotation(1, i, j, k) represents rotation information of the spatial region. Dynamic/Static indicates whether the spatial region is a dynamic region that changes over time or a static region.

[0285] If the file according to the embodiments is a single track, position information within the MPD can be generated as follows.

[0286] <MPD>

[0287] <Period>

[0288] <AdaptationSet mimeType="video/mp4" codecs="v3e1.L2.0.0.1, resv.vvvc.avc1.4D401E" frameRate="30">

[0289] <geom_type Anchor Point="x, y, z" Position="x, y, z" Size="x, y, z" Rotation="1, i, j, k" "dynamic"></geom_type>

[0290]

[0291] <SegmentList>

[0292] <Initialization sourceURL="seg-m-init.mp4"/>

[0293] </SegmentList>

[0294] <Representation bandwidth="512000">

[0295] <BaseURL>vpcc-512k.mp4</BaseURL>

[0296] </Representation>

[0297] <Representation bandwidth="1024000">

[0298] <BaseURL>vpcc-1024k.mp4</BaseURL>

[0299] </Representation>

[0300] <Representation bandwidth="2048000">

[0301] <BaseURL>vpcc-2048k.mp4</BaseURL>

[0302] </Representation>

[0303] </AdaptationSet>

[0304] </Period>

[0305] </MPD>

[0306] A decoder or receiving device according to the embodiments receives the MPD and parses <geom_type Anchor Point="x, y, z" Position="x, y, z" Size="x, y, z" Rotation="1, i, j, k" "dynamic"></geom_type>.

[0307] Anchor Point="x, y, z" represents the location of the bounding box containing the spatial region.

[0308] Position="x, y, z" represents the position of the spatial region.

[0309] Size="x, y, z" represents the size of the spatial region.

[0310] Rotation="1, i, j, k" represents the degree of rotation of the spatial region.

[0311] “Dynamic” indicates that the spatial region is a dynamic region. If it is “static,” it indicates that the spatial region is a static region.
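
As a hedged illustration of how a receiver might hold the parsed geom_type values, the following Python sketch converts the comma-separated strings into numeric tuples. The names `GeomType`, `parse_components`, and `make_geom_type` are hypothetical, and treating Rotation as four numeric components in the order written as 1, i, j, k is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_components(text: str, expected: int) -> Tuple[float, ...]:
    """Convert a string such as "1, 2, 3" into a tuple of floats."""
    values = tuple(float(part) for part in text.split(","))
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values


@dataclass
class GeomType:
    anchor_point: Tuple[float, ...]   # Anchor Point(x, y, z)
    position: Tuple[float, ...]       # Position(x, y, z)
    size: Tuple[float, ...]           # Size(x, y, z)
    rotation: Tuple[float, ...]       # Rotation(1, i, j, k), four components
    dynamic: bool                     # True for "dynamic", False for "static"


def make_geom_type(anchor_point, position, size, rotation, mode) -> GeomType:
    """Build a GeomType from the attribute strings of one geom_type element."""
    mode = mode.strip().lower()
    if mode not in ("dynamic", "static"):
        raise ValueError(f"mode must be 'dynamic' or 'static', got {mode!r}")
    return GeomType(
        parse_components(anchor_point, 3),
        parse_components(position, 3),
        parse_components(size, 3),
        parse_components(rotation, 4),
        mode == "dynamic",
    )
```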

[0312] If the file according to the embodiments is multi-track, position information within the MPD can be generated as follows.

[0313] <MPD> <Period>

[0314]

[0315] <AdaptationSet id="1" codecs="v3c1"> <EssentialProperty schemeIdUri="urn:mpeg:dash:preselection:2016"/>

[0316] <geom_type Anchor Point="x,y,z" Position="x,y,z" Rotation="1, i, j, k" "dynamic"></geom_type>

[0317] <Representation> ...</Representation>

[0318] </AdaptationSet>

[0319]

[0320]

[0321] <AdaptationSet id="2" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0322] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="occp"/> </EssentialProperty>

[0323] <Representation> ...</Representation>

[0324] </AdaptationSet>

[0325]

[0326] <AdaptationSet id="4" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0327] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="geom"/> </EssentialProperty>

[0328]

[0329] <Representation> ...</Representation>

[0330] </AdaptationSet>

[0331]

[0332] <AdaptationSet id="6" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0333] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="attr"/>

[0334] </EssentialProperty>

[0335] <Representation> ...</Representation>

[0336] </AdaptationSet>

[0337] </Period> </MPD>

[0338] As shown above, the main V3C adaptation set of the MPD may include <geom_type Anchor Point="x,y,z" Position="x,y,z" Size="x,y,z" Rotation="1,i,j,k" "dynamic"></geom_type>.

[0339] Alternatively, as shown below, the adaptation set for the geometry of the MPD may include <geom_type Anchor Point="x,y,z" Position="x,y,z" Size="x,y,z" Rotation="1,i,j,k" "dynamic"></geom_type>.

[0340] If the file according to the embodiments is multi-track, position information within the MPD can be generated as follows.

[0341] <MPD> <Period>

[0342]

[0343] <AdaptationSet id="1" codecs="v3c1"> <EssentialProperty schemeIdUri="urn:mpeg:dash:preselection:2016"/>

[0344] <Representation> ...</Representation>

[0345] </AdaptationSet>

[0346]

[0347] <AdaptationSet id="2" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0348] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="occp"/> </EssentialProperty>

[0349] <Representation> ...</Representation>

[0350] </AdaptationSet>

[0351]

[0352] <AdaptationSet id="4" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0353] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="geom"/>

[0354] </EssentialProperty>

[0355] <geom_type Anchor Point="x,y,z" Position="x,y,z" Size="x,y,z" Rotation="1,i,j,k" "dynamic"></geom_type>

[0356] <Representation> ...</Representation>

[0357] </AdaptationSet>

[0358]

[0359] <AdaptationSet id="6" mimeType="video/mp4" codecs="resv.vvvc.hvc1">

[0360] <EssentialProperty schemeIdUri="urn:mpeg:mpegI:v3c:2020:component"> <v3c:videoComponent type="attr"/>

[0361] </EssentialProperty>

[0362] <Representation> ...</Representation>

[0363] </AdaptationSet>

[0364] </Period> </MPD>

[0365] The adaptation set for the geometry may also include <geom_type Anchor Point="x,y,z" Position="x,y,z" Size="x,y,z" Rotation="1,i,j,k" "dynamic"></geom_type>.

[0366] FIG. 8 shows an IDD (Init Description Document) according to the embodiments.

[0367] A point cloud data transmission device according to embodiments may include a point cloud data acquisition unit, a point cloud data encoder, and / or a file / segment encapsulator, as shown in FIG. 5. A point cloud data reception device according to embodiments may include a file / segment decapsulator, a point cloud data decoder, and / or a point cloud data renderer.

[0368] A point cloud data transmitting device according to the embodiments can adaptively transmit V-PCC data, and a point cloud data receiving device according to the embodiments can adaptively receive V-PCC data.

[0369] According to the embodiments, a hierarchical MPD keeps the Descriptor Document efficient when data covering a wide space is transmitted. Data can be transmitted adaptively through an MPD containing object information. Transmission efficiency can be increased by changing the object configuration in units of Group or Tile. In the overall operation, a user information feedback channel can be defined so that the server can push optimal content based on user information.

[0370] Referring to FIG. 8, a transmitting device according to the embodiments may transmit by additionally including IDD information within a file. When the receiving side selects the MPD, it may select necessary information based on the user's location, motion vector, user's field of view, etc. The decoder at the receiving side may receive the MPD based on the user's selection. The decoder may request a segment from the server. The decoder may receive the segment data.

[0371] Referring to FIGS. 3 and 4, multiple objects can be grouped into a single group, and information regarding this can be added to the MPD and / or file. The object group may vary depending on the user's position or field of view. Using the user's position or field of view information, the decoder can obtain the necessary information from the MPD and / or file to decode and render the desired V-PCC data.
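
The selection by user position and field of view described above could look like the following Python sketch, which keeps the objects whose direction from the user falls within half the field-of-view angle of the viewing direction. This is an illustration only; the functions `in_field_of_view` and `visible_objects`, and the use of a simple cone test, are assumptions rather than the method of the embodiments.

```python
import math


def in_field_of_view(user_pos, view_dir, obj_pos, fov_deg):
    """True if obj_pos lies within a viewing cone of fov_deg degrees."""
    to_obj = [o - u for o, u in zip(obj_pos, user_pos)]
    dist = math.sqrt(sum(c * c for c in to_obj))
    if dist == 0:
        return True  # the object is at the user's position
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0:
        raise ValueError("view_dir must be a non-zero vector")
    cos_angle = sum(a * b for a, b in zip(to_obj, view_dir)) / (dist * norm)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= fov_deg / 2


def visible_objects(objects, user_pos, view_dir, fov_deg):
    """Return the names of the objects inside the field of view.

    objects: dict mapping an object name to its (x, y, z) position.
    """
    return sorted(name for name, pos in objects.items()
                  if in_field_of_view(user_pos, view_dir, pos, fov_deg))
```

The resulting list could then serve as the object group for which the decoder looks up information in the MPD and/or file.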

[0372] The transmission method according to the embodiments can perform a scalable encoding method for adaptive transmission.

[0373] The transmission method according to the embodiments can encode an Object by dividing it into stages based on the user's location. For example, the unit of division may include groups and / or tiles, etc.

[0374] The transmission method according to the embodiments can scalably encode an Object that has been divided into stages.

[0375] The transmission method according to the embodiments can generate and transmit tracks of a hierarchical MPD and / or file for wide-space transmission implementation.

[0376] The transmission method according to the embodiments can divide the space into a fixed size and represent it as individual MPDs, and generate a Descriptor Document that explains which location each MPD is responsible for.
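
A minimal Python sketch of this fixed-size partitioning is given below: the space is divided into cubic cells, and a lookup table standing in for the Descriptor Document maps each cell to the MPD responsible for it. The function names, the dictionary layout, and the example file names are hypothetical.

```python
import math


def cell_index(position, cell_size):
    """Return the integer (ix, iy, iz) index of the cell containing position."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    return tuple(math.floor(c / cell_size) for c in position)


def mpd_for_position(descriptor, position, cell_size):
    """Look up the MPD URL for a position; None if no MPD covers that cell.

    descriptor: dict mapping a cell index to an MPD URL, standing in for
    the Descriptor Document (hypothetical layout).
    """
    return descriptor.get(cell_index(position, cell_size))
```

For example, with `descriptor = {(1, 0, 2): "space_1_0_2.mpd"}` and a cell size of 10, the position (12.5, 0.0, 29.9) maps to "space_1_0_2.mpd".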

[0377] The transmission method according to the embodiments can generate object information within the MPD for user location-based adaptive selection. For example, it may include the object's location information, size information, and / or rotation information. Additionally, it may include dynamic-related information.

[0378] The transmission method according to the embodiments can form a feedback channel to support a push operation for a user. For example, by using a feedback channel that allows content to be pushed from a server considering the user's location, direction, speed, etc., track information of the relevant MPD and / or file can be transmitted to a decoder so that the receiving decoder can decode V-PCC content optimized for the user.

[0379] The IDD according to the embodiments refers to an initial description document and may be abbreviated as initial information, etc. The IDD may include the following information:

[0380] Information distinguishing the scope of the entire space, background and / or foreground, etc.: For example, it may include information distinguishing the background and foreground to separately convey background elements, which are elements that must be visualized invariably at the user's location in an open space.

[0381] Information distinguishing whether a space is open and / or closed when defining a space: For example, relevant information may be included to determine whether elements not contained in the corresponding MPD are visible when a user looks in a specific direction. Additionally, if it is an open space, objects may need to be downloaded via the background for that direction or through an additional MPD.

[0382] Definition of a segmented space and the URL of the MPD containing the corresponding space information: For example, the space described by the MPD can be described as accurately as possible by distinguishing between Points and Faces to include spatial elements with curved shapes rather than simple shapes like a rectangular prism. Additionally, the Face Index can be used to describe whether a face is an open space or not. The Face Index can have the same meaning as a Face ID.

[0383] The IDD according to the embodiments can be configured as follows:

[0384] The Global Structure of an IDD contains size information for the entire space. Specifically, the bounding box of an IDD contains range information for the entire space. For example, it may include the location of the bounding box and the size of the bounding box.

[0385] A Spatial Partition in an IDD contains information about the subspaces of the entire space, as follows. Space ID represents the ID value of the space. Name represents the name of the space as a String. Bounding Box provides the vertex information for implementing the space. Face provides information about the faces formed by connecting the vertices. Properties provides additional information about the space: State indicates whether the space is open, and Face Index identifies the faces that are open. Type indicates whether the space is background or foreground. Reference holds the reference to the MPD, and its MPD element contains the URL of the MPD.

[0386] The Bounding Box element may include vertex IDs constituting the bounding box and location information of points corresponding to the vertex IDs.

[0387] The Face element includes a face ID and information about the vertices constituting the face identified by the face ID.

[0388] The Properties element indicates whether the space is open or closed. In this case, the State element can be used to display the open / closed status of the space.

[0389] Figure 9 illustrates a method for describing IDD-based spatial information.

[0390] As shown in FIG. 9, a total space exists, and a sub-space of the total space may exist. The total space can be represented based on a background and a certain range. The space may include open spaces and/or closed spaces. Describing the space with this information enables the decoder to perform spatial access.

[0391] The configuration of the aforementioned IDD can be re-expressed as follows:

[0392] <idd>

[0393] <globalstructure>

[0394] <boundingbox>

[0395] <minx> 0 </minx> <miny> 0 </miny> <minz> 0 </minz>

[0396] <maxx> 1000 </maxx> <maxy> 1000 </maxy> <maxz> 1000 </maxz>

[0397] </boundingbox>

[0398] </globalstructure>

[0399] <spatialpartition>

[0400] <space id="S1">

[0401] <name> MainRoom </name>

[0402] <boundingbox>

[0403] <vertices>

[0404] <vertex id="A" x="0" y="0" z="0"/>

[0405] <vertex id="B" x="2" y="0" z="0"/>

[0406] <vertex id="C" x="2" y="1" z="0"/>

[0407] <vertex id="D" x="3" y="1" z="0"/>

[0408] <vertex id="E" x="3" y="3" z="0"/>

[0409] …

[0410] </vertices> </boundingbox>

[0411] <faces>

[0412] <Face id="1"> <vertexref> ABCDEFGHIJ </vertexref> </Face>

[0413] <Face id="2"> <vertexref> ABBA' </vertexref> </Face> … </faces>

[0414] <Properties state="open">

[0415] <state> 1,2 </state>

[0416] </Properties>

[0417] <Properties state="close">

[0418] <state> 3,4 </state>

[0419] </Properties>

[0420] <type> Background </type>

[0421] <reference>

[0422] <mpd> http://example.com/mainroom.mpd </mpd>

[0423] </reference>

[0424] </space>

[0425] </spatialpartition>

[0426] </idd>

[0427] That is, as described above, IDD information may include global structure information. Global structure information may include bounding box information. Bounding box information may include minimum and maximum values of position information for each axis of the bounding box.

[0428] IDD information may include spatial partition information. Spatial partition information may include identification information of the space corresponding to the spatial partition. Spatial partition information may include name information of the space identified by the identification information of the space. Spatial partition information may include identification information of the vertices of the bounding box of the space identified by the identification information of the space, and location information of the vertices.

[0429] Spatial partition information may include face information. Face information may include information identifying a face. Face information may include reference information of the vertices included in the face identified by the face identification information. Spatial partition information may include face state information. State information may indicate at least one state, such as closed or open, through attribute information. If the attribute (property) is open, the IDs of the faces having the open state may be indicated through state information. For example, if the IDs of the faces whose property state is closed are 3 and 4, the state information may include the IDs of those faces. Spatial partition information may include type information. Type information may indicate attributes of the space, such as whether the space of the spatial partition is background or foreground.

[0430] Spatial partition information may include MPD reference information. MPD reference information may include address information from which the MPD can be obtained.
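
As a non-limiting illustration of how a receiver might read the IDD fields described above, the following Python sketch parses a well-formed variant of the listing with the standard library module xml.etree.ElementTree. The element names follow the listing, but the exact schema (the nesting, the closing tags, the shortened vertex list) and the function name parse_idd are assumptions made for this example and are not defined by the embodiments.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed variant of the IDD listing (schema assumed for illustration).
IDD_XML = """
<idd>
  <globalstructure>
    <boundingbox>
      <minx>0</minx><miny>0</miny><minz>0</minz>
      <maxx>1000</maxx><maxy>1000</maxy><maxz>1000</maxz>
    </boundingbox>
  </globalstructure>
  <spatialpartition>
    <space id="S1">
      <name>MainRoom</name>
      <boundingbox>
        <vertices>
          <vertex id="A" x="0" y="0" z="0"/>
          <vertex id="B" x="2" y="0" z="0"/>
          <vertex id="C" x="2" y="1" z="0"/>
        </vertices>
      </boundingbox>
      <type>Background</type>
      <reference><mpd>http://example.com/mainroom.mpd</mpd></reference>
    </space>
  </spatialpartition>
</idd>
"""


def parse_idd(xml_text):
    """Return the global bounding box and a per-space summary as plain dicts."""
    root = ET.fromstring(xml_text)
    box = root.find("globalstructure/boundingbox")
    # minx/miny/minz/maxx/maxy/maxz become float entries keyed by tag name.
    global_box = {child.tag: float(child.text) for child in box}
    spaces = {}
    for space in root.iter("space"):
        vertices = {
            v.get("id"): (float(v.get("x")), float(v.get("y")), float(v.get("z")))
            for v in space.iter("vertex")
        }
        spaces[space.get("id")] = {
            "name": space.findtext("name").strip(),
            "type": space.findtext("type").strip(),
            "mpd": space.findtext("reference/mpd").strip(),
            "vertices": vertices,
        }
    return global_box, spaces


global_box, spaces = parse_idd(IDD_XML)
print(global_box["maxx"], spaces["S1"]["name"], spaces["S1"]["mpd"])
```

With such a summary, a receiver could look up the MPD address of the space that contains the user's position before requesting any media.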

[0431] The point cloud data transmission method according to the embodiments can carry V-PCC configuration information in a descriptor of the MPD, based on the DASH method.

[0432] The fact that information for V-PCC is being conveyed can be indicated through the schemeIdUri within the EssentialProperty in the MPD: "urn:mpeg:mpegI:VPCC:2020:component".

[0433] To convey V-PCC data, each item within the MPD can include a VPCC Component Descriptor. This descriptor contains information about the corresponding V-PCC data. For example, as in @component@group_id, the descriptor includes a group identifier. As in @component@component_type, the descriptor indicates the V-PCC data type: 'geom' refers to geometry coordinates, and 'attr' refers to attributes such as color. Regarding @component@geometry_type, Anchor Point (X,Y,Z) includes location information of the area's anchor point, Position (X,Y,Z) includes geometry coordinate information, and Dynamic Position (X,Y,Z) includes geometry position information that changes over time. Additionally, Final Position can convey the final position information of a dynamic area point.

[0434] The point cloud data transmission method according to the embodiments can generate and transmit a VPCC Component Descriptor within the MPD. The VPCC Component Descriptor may include a group ID that identifies a group of geometry data of the point cloud data. Additionally, it may include a group ID that identifies a group of attribute data.

[0435] A method for receiving point cloud data according to the embodiments can receive and parse a VPCC Component Descriptor within an MPD. The VPCC Component Descriptor may include a group ID that identifies a group of geometry data of the point cloud data. Additionally, it may include a group ID that identifies a group of attribute data. Through the group ID within the MPD, point cloud data associated with a specific object and / or a specific space can be efficiently partially decoded.
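
As a non-limiting illustration of partial access through the group ID, the following Python sketch selects, for one group, the adaptation sets whose VPCC Component Descriptor carries that group ID. The in-memory class ComponentDescriptor, the function select_group, and the adaptation set identifiers are hypothetical and are introduced only for this example; only the scheme URI and the 'geom' and 'attr' type values come from the description.

```python
from dataclasses import dataclass

# Scheme URI quoted in the description for the VPCC component descriptor.
SCHEME = "urn:mpeg:mpegI:VPCC:2020:component"


@dataclass(frozen=True)
class ComponentDescriptor:
    """Hypothetical in-memory form of one VPCC Component Descriptor."""
    scheme_id_uri: str
    group_id: int
    component_type: str      # 'geom' or 'attr', as in the description
    adaptation_set_id: str   # assumed identifier of the carrying adaptation set


def select_group(descriptors, group_id):
    """Return adaptation set IDs, keyed by component type, for one group."""
    selected = {}
    for d in descriptors:
        if d.scheme_id_uri != SCHEME or d.group_id != group_id:
            continue  # not a VPCC component descriptor, or a different group
        selected.setdefault(d.component_type, []).append(d.adaptation_set_id)
    return selected


descriptors = [
    ComponentDescriptor(SCHEME, 1, "geom", "as-geom-room1"),
    ComponentDescriptor(SCHEME, 1, "attr", "as-attr-room1"),
    ComponentDescriptor(SCHEME, 2, "geom", "as-geom-room2"),
    ComponentDescriptor(SCHEME, 2, "attr", "as-attr-room2"),
]
print(select_group(descriptors, 1))
```

Only the adaptation sets returned for the requested group need to be fetched and decoded; the other groups can be skipped.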

[0436] The method / device according to the embodiments can adaptively transmit and receive point cloud data through the recording of position information within the IDD and / or MPD.

[0437] The MPD according to the embodiments may further include additional elements that define and verify the relationship between the IDD and the MPD. For example, the MPD may further include IDD index information, etc.

[0438] The method/device according to the embodiments can check the IDD in the MPD using the following method. For example, information indicating the ID value can be added to the MPD item itself. The ID of the IDD can be expressed in a form such as <IDD ID="Number"> and <MPD IDD="Number">. The ID of the IDD can also be added to a sub-item of the MPD to indicate whether the two are the same.
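
As a non-limiting illustration, the identity check between an IDD and an MPD could be sketched in Python as follows. The attribute names ID and IDD follow the forms given above; the function name mpd_matches_idd and the use of self-closing root elements are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def mpd_matches_idd(idd_xml, mpd_xml):
    """True when the MPD's IDD attribute equals the IDD's ID attribute."""
    idd_id = ET.fromstring(idd_xml).get("ID")
    mpd_ref = ET.fromstring(mpd_xml).get("IDD")
    # A missing ID on the IDD side is treated as "no match".
    return idd_id is not None and idd_id == mpd_ref


print(mpd_matches_idd('<IDD ID="7"/>', '<MPD IDD="7"/>'))
```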

[0439] FIG. 10 illustrates an IDD-based spatial signaling method according to embodiments.

[0440] The encoding method according to the embodiments can signal point cloud data based on space. For example, objects in the point cloud data may include cars, things, people, etc. The foreground of the point cloud data may include objects. The background of the point cloud data may be a background having a texture such as a wall. The encoding method according to the embodiments can encode the space where the point cloud data is located by dividing it into a plurality of rooms and add related signaling information to the tracks and/or MPDs of the file. The atlas tracks of the file may include location and size information of objects for the plurality of rooms. Each MPD may include location and size information of objects for each room. The encoding method according to the embodiments can generate information regarding the 3D space, which is the entire space where the point cloud data is located, as an IDD.

[0441] A decoder or receiving device according to the embodiments can receive an IDD, parse 3D space information, and parse MPDs associated with the IDD to partially access and decode an object of a specific room.

[0442] FIG. 11 illustrates a method for signaling dynamic position according to embodiments.

[0443] The point cloud data encoding / decoding method according to the embodiments can generate and transmit / receive signaling information regarding a dynamic region (position).

[0444] The aforementioned adaptation set can identify Dynamic / Static information as “dynamic” / “static” respectively, depending on whether the region is dynamic or static.

[0445] If the region-related information within the adaptation set is "dynamic," the adaptation set includes anchor points and "dynamic" information, and the segment item may include additional information values. To efficiently signal dynamic regions, the necessary information can be divided and defined for each of the MPD's adaptation set and segment item. Since a segment transmits media for a specific time interval, the information provided by the segment can represent the entire time interval. Therefore, the additional information provided by the segment can be set to a single static value that encompasses all movement of the object during the segment length. In other words, the box generated through the position, size, and rotation provided by the segment can be a value that includes all of the object during that time interval.

[0446] FIG. 11 illustrates an area signaled through information values defined within a segment. As previously explained, since a segment describes a specific time interval, it can include information that encompasses all objects spanning that time interval (objects according to t1 to t3). For example, when a segment includes time intervals _1, _2, and _3, the object position, size, and rotation for each time interval are applied, resulting in a bounding box similar to that in FIG. 11.

[0447] The encoding method according to the embodiments can generate information related to the position, size, and rotation of a box that encompasses the entire time interval from t1 to t3. Additionally, during actual operation, information related to the position, size, and rotation of each region (box) included within the entire box can be parsed to calculate the maximum and minimum values of each box value, thereby generating position, size, and rotation values.
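
As a non-limiting illustration of the minimum/maximum calculation described above, the following Python sketch computes one box that encloses the object boxes of every time interval in a segment. It assumes that each box is axis-aligned, that its position is the minimum corner, and that rotation has already been applied; the class Box, the function enclosing_box, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: position is the minimum corner, size the extent per axis."""
    position: tuple
    size: tuple


def enclosing_box(boxes):
    """Smallest axis-aligned box containing every box in `boxes`."""
    if not boxes:
        raise ValueError("at least one box is required")
    # Per-axis minimum of the lower corners and maximum of the upper corners.
    mins = [min(b.position[a] for b in boxes) for a in range(3)]
    maxs = [max(b.position[a] + b.size[a] for b in boxes) for a in range(3)]
    return Box(tuple(mins), tuple(hi - lo for lo, hi in zip(mins, maxs)))


# Object boxes at t1, t2, and t3 (illustrative values only).
per_interval = [
    Box((0.0, 0.0, 0.0), (1.0, 1.0, 2.0)),
    Box((2.0, 0.5, 0.0), (1.0, 1.0, 2.0)),
    Box((4.0, 1.0, 0.0), (1.0, 1.0, 2.0)),
]
segment_box = enclosing_box(per_interval)
print(segment_box)
```

The resulting box is the single static value that a segment could carry for the whole interval from t1 to t3.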

[0448] The information values within the Segment for the aforementioned operation are as follows.

[0449] <mpd>

[0450] <period duration="PT10M">

[0451] <adaptationset mimetype="video/mp4" codecs="v3e1.L2.0.0.1, resv.vvvc.avc1.4D401E" framerate="30">

[0452] <geom_type Anchor Point="x,y,z" "dynamic"></geom_type>

[0453] <representation id="720p" bandwidth="3200000" width="1280" height="720">

[0454] <segmentlist timescale="90000" duration="5400000">

[0455] <segmenturl Position="x_1,y_1,z_1" Size="w_1,d_1,h_1" Rotation="1_1,i_1,j_1,k_1" media="segment-1.ts" />

[0456] <segmenturl Position="x_2,y_2,z_2" Size="w_2,d_2,h_2" Rotation="1_2,i_2,j_2,k_2" media="segment-2.ts" />

[0457] <segmenturl Position="x_3,y_3,z_3" Size="w_3,d_3,h_3" Rotation="1_3,i_3,j_3,k_3" media="segment-3.ts" />

[0458] <segmenturl Position="x_4,y_4,z_4" Size="w_4,d_4,h_4" Rotation="1_4,i_4,j_4,k_4" media="segment-4.ts" />

[0459] </segmentlist>

[0460] </representation>

[0461] </adaptationset>

[0462] </period>

[0463] For example, as AdaptationSet geom_type information, it may include an anchor point (Anchor Point="x,y,z") and dynamic type information ("Dynamic") indicating that the region associated with the anchor point is dynamic.

[0464] Furthermore, in order to signal detailed area information as a segment, for example, when signaling four areas that change over time using one or more SegmentURL information, information regarding the location, size, and rotation status of each area can be transmitted as follows.

[0465] <segmenturl Position="x_1,y_1,z_1" Size="w_1,d_1,h_1" Rotation="1_1,i_1,j_1,k_1" media="segment-1.ts" />

[0466] <segmenturl Position="x_2,y_2,z_2" Size="w_2,d_2,h_2" Rotation="1_2,i_2,j_2,k_2" media="segment-2.ts" />

[0467] <segmenturl Position="x_3,y_3,z_3" Size="w_3,d_3,h_3" Rotation="1_3,i_3,j_3,k_3" media="segment-3.ts" />

[0468] <segmenturl Position="x_4,y_4,z_4" Size="w_4,d_4,h_4" Rotation="1_4,i_4,j_4,k_4" media="segment-4.ts" />
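
As a non-limiting illustration, a receiver might read the per-segment region information as in the following Python sketch. The attribute names Position, Size, and Rotation follow the listing; the numeric values, the helper names, and the treatment of Rotation as four comma-separated components are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative SegmentList with concrete numbers in place of x_n, y_n, z_n, and so on.
SEGMENT_LIST = """
<segmentlist timescale="90000" duration="5400000">
  <segmenturl Position="0,0,0" Size="1,1,2" Rotation="1,0,0,0" media="segment-1.ts"/>
  <segmenturl Position="2,0,0" Size="1,1,2" Rotation="1,0,0,0" media="segment-2.ts"/>
</segmentlist>
"""


def _floats(text):
    """Convert a comma-separated attribute value into a tuple of floats."""
    return tuple(float(part) for part in text.split(","))


def parse_regions(xml_text):
    """Return one region record per segmenturl element, in document order."""
    regions = []
    for seg in ET.fromstring(xml_text).iter("segmenturl"):
        regions.append({
            "media": seg.get("media"),
            "position": _floats(seg.get("Position")),
            "size": _floats(seg.get("Size")),
            "rotation": _floats(seg.get("Rotation")),
        })
    return regions


regions = parse_regions(SEGMENT_LIST)
print(regions[1]["media"], regions[1]["position"])
```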

[0469] In addition, the encoding method according to the embodiments can generate segment information as follows.

[0470] When point cloud objects have dynamic characteristics but the area described at the segment level does not change (e.g., composed of common area information at the segment level): MPD can be generated as follows.

[0471] <mpd>

[0472] <period duration="PT10M">

[0473] <adaptationset mimetype="video/mp4" codecs="v3e1.L2.0.0.1, resv.vvvc.avc1.4D401E" framerate="30">

[0474] <geom_type Anchor Point="x,y,z" "dynamic"></geom_type>

[0475] <representation id="720p" bandwidth="3200000" width="1280" height="720">

[0476]

[0477] <segmentlist position="x_1,y_1,z_1" size="w_1,d_1,h_1" rotation="1_1,i_1,j_1,k_1" timescale="90000" duration="5400000">

[0478] <segmenturl media="segment-1.ts" />

[0479] <segmenturl media="segment-2.ts" />

[0480] <segmenturl media="segment-3.ts" />

[0481] <segmenturl media="segment-4.ts" />

[0482] </segmentlist>

[0483] </representation>

[0484] </adaptationset>

[0485] </period>

[0486] That is, the area information of dynamic objects can be signaled through the higher-level element <segmentlist position="x_1,y_1,z_1" size="w_1,d_1,h_1" rotation="1_1,i_1,j_1,k_1" timescale="90000" duration="5400000">, which applies in common to <segmenturl media="segment-1.ts" />, <segmenturl media="segment-2.ts" />, <segmenturl media="segment-3.ts" />, and <segmenturl media="segment-4.ts" />.
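
As a non-limiting illustration, a receiver that supports both forms of signaling might resolve the region of each segment as in the following Python sketch: a value on the segmenturl element is used when present, and otherwise the common value of the enclosing segmentlist element applies. Combining both forms in one listing, the case-insensitive attribute lookup, and the names _attr and resolve_regions are assumptions for this example and are not stated in the embodiments.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: common region on segmentlist, one per-segment override.
MPD_FRAGMENT = """
<segmentlist position="0,0,0" size="5,2,2" rotation="1,0,0,0" timescale="90000" duration="5400000">
  <segmenturl media="segment-1.ts"/>
  <segmenturl media="segment-2.ts" Position="1,0,0"/>
</segmentlist>
"""


def _attr(element, name):
    """Case-insensitive attribute lookup (the listings use both 'Position' and 'position')."""
    for key, value in element.attrib.items():
        if key.lower() == name:
            return value
    return None


def resolve_regions(xml_text):
    """Map each segment's media name to its effective position, size, and rotation."""
    seg_list = ET.fromstring(xml_text)
    resolved = {}
    for seg in seg_list.iter("segmenturl"):
        region = {}
        for name in ("position", "size", "rotation"):
            value = _attr(seg, name)
            if value is None:
                value = _attr(seg_list, name)  # fall back to the common value
            region[name] = value
        resolved[seg.get("media")] = region
    return resolved


resolved = resolve_regions(MPD_FRAGMENT)
print(resolved["segment-1.ts"], resolved["segment-2.ts"])
```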

[0487] Referring to FIG. 9, the method / device according to the embodiments may provide additional information necessary for dynamic region decoding based on IDD when the object of the point cloud is dynamic over time. For example, IDD may represent region division, and MPD may convey descriptive information regarding the divided region. In this process, a signaling method is required for the decoder when the object moves and crosses the region. To this end, when describing the space of the IDD, changes in the object may also be expressed, so that the information of the IDD can be configured to allow the decoder to access the continuous movement of the object. Furthermore, the IDD may further include descriptive information regarding the dynamic object existing within the space and / or information regarding movement details upon entry / exit. Additionally, the IDD may have a structure that allows for the simultaneous creation of an object list and description of the object's entry / exit and movement path.

[0488] For example, the aforementioned IDD can be configured as follows:

[0489] <idd>

[0490] <globalstructure> … </globalstructure>

[0491] <spatialpartition>

[0492] <space id="S1">

[0493] <name>MainRoom</name>

[0494] <boundingbox> …

[0495] <faces> …

[0496] <Properties state="open"> …

[0497] <Properties state="close"> …

[0498] <type>Background</type>

[0499] <Included Object>

[0500] <Object name="person1" App="58s, S2" Dis="88s, S6">

[0501] <Object name="person3" App="24s, S3, 90s, S5" Dis="45s, S4">

[0502] </Included Object>

[0503] <reference>

[0504] <mpd>http://example.com/mainroom.mpd</mpd>

[0505] </reference>

[0506] </faces> </boundingbox> </space>

[0507] <space id="S2"> … </space>

[0508] <space id="S3"> … </space>

[0509] <space id="S4"> … </space>

[0510] <space id="S5"> … </space>

[0511] <space id="S6"> … </space>

[0512] <space id="S7"> … </space>

[0513] <space id="S8"> … </space>

[0514] </spatialpartition>

[0515] </idd>

[0516] <Included Object>: Indicates area information related to the space represented by the IDD.

[0517] <Object name="person1" App="58s, S2" Dis="88s, S6">: Object name represents the name of the object (it can represent the type of object, such as person, thing, or building), App represents the time at which the object appears in the space related to the IDD, and Dis represents the time at which the object disappears from the space related to the IDD. Here, person1 appears in space S2 at 58 seconds and disappears in space S6 at 88 seconds.

[0518] <Object name="person3" App="24s, S3, 90s, S5" Dis="45s, S4">: Object name represents the name of the object (it can represent the type of object, such as person, thing, or building), App represents the time at which the object appears in the space related to the IDD, and Dis represents the time at which the object disappears from the space related to the IDD. Here, person3 appears in space S3 at 24 seconds and disappears in space S4 at 45 seconds.
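
As a non-limiting illustration, the App and Dis attribute values could be split into (time, space) pairs as in the following Python sketch. The pairing of a time in seconds with a space identifier follows the explanation above; the function name parse_events and the error handling are assumptions for this example.

```python
def parse_events(value):
    """Split 'time, space, time, space, ...' into [(seconds, space_id), ...]."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) % 2:
        raise ValueError(f"expected time/space pairs, got {value!r}")
    events = []
    # Even positions hold times such as '58s', odd positions hold space IDs.
    for time_text, space_id in zip(parts[0::2], parts[1::2]):
        if not time_text.endswith("s"):
            raise ValueError(f"time must end with 's': {time_text!r}")
        events.append((float(time_text[:-1]), space_id))
    return events


person1 = {"app": parse_events("58s, S2"), "dis": parse_events("88s, S6")}
person3 = {"app": parse_events("24s, S3, 90s, S5"), "dis": parse_events("45s, S4")}
print(person1, person3)
```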

[0519] FIG. 12 illustrates a method for transmitting point cloud data according to embodiments.

[0520] The method for transmitting point cloud data according to the embodiments may include a step (S1200) of encoding point cloud data.

[0521] The method for transmitting point cloud data according to the embodiments may further include the step (S1210) of encapsulating point cloud data into a file.

[0522] The method for transmitting point cloud data according to the embodiments may further include the step of transmitting a file (S1220).

[0523] The encoding step (S1200) may include a V3C encoding step as shown in FIG. 4. The V3C encoding step may generate a V3C bitstream by projecting V-PCC (V3C) data obtained by volumetric capture, encoding atlas information of the V-PCC data, encoding occupancy data of the V-PCC data, encoding geometry data of the V-PCC data, and encoding attribute data of the V-PCC data. The encoding operation may include encoding according to the ISO/IEC 23090-5 standard.

[0524] Referring to FIG. 6, regarding multi-track, the file may include a first track containing geometry of the point cloud data and a second track containing attributes of the point cloud data.

[0525] Referring to FIG. 7, regarding the sample entry and sample of the track, each track of the file includes a sample entry containing configuration information regarding the point cloud data and a sample containing the point cloud data, and the sample may further include a parameter set regarding the point cloud data.

[0526] Referring to FIG. 8, with respect to MPD, the present method further comprises the step of transmitting MPD information for point cloud data, wherein the MPD includes first adaptation set information for geometry of the point cloud data and second adaptation set information for attributes of the point cloud data, the first adaptation set information may include a component descriptor for geometry, and the second adaptation set information may include a component descriptor for attributes.

[0527] Referring to FIG. 9, with respect to IDD, the present method further comprises the step of transmitting initial information representing information about an entire space including point cloud data, wherein the initial information includes at least one of size information of the entire space or information about a subspace of the entire space, and the information about the subspace may include at least one of an identifier for the subspace, a name for the subspace, or location information of a bounding box for the subspace.

[0528] Referring to FIG. 9, with respect to the face, properties, state, face, type, MPD URL, etc. of the IDD, the initial information may further include at least one of a face ID that identifies a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of the MPD regarding the point cloud data.

[0529] A method for transmitting point cloud data can be performed by a transmitting device of FIG. 5. The point cloud data transmitting device may include an encoder that encodes point cloud data; an encapsulator that encapsulates point cloud data into a file; and a transmitter that transmits the file.

[0530] FIG. 13 illustrates a method for receiving point cloud data according to embodiments.

[0531] The receiving method of FIG. 13 can follow the reverse process of the transmitting method of FIG. 12.

[0532] A method for receiving point cloud data according to embodiments may include the step (S1300) of receiving a file containing point cloud data.

[0533] The method for receiving point cloud data according to the embodiments may further include the step (S1310) of decapsulating a file.

[0534] The method for receiving point cloud data according to the embodiments may further include a step (S1320) of decoding point cloud data.

[0535] The decoding step (S1320) may include: decoding atlas information of the point cloud data, decoding occupancy data of the point cloud data, decoding geometry data of the point cloud data, and decoding attribute data of the point cloud data.

[0536] The received file may include a first track containing the geometry of the point cloud data and a second track containing the attributes of the point cloud data.

[0537] Each track of the received file includes a sample entry containing configuration information regarding the point cloud data and a sample containing the point cloud data, and the sample may further include a set of parameters regarding the point cloud data.

[0538] Referring to the MPD of FIG. 8, the present method further comprises the step of receiving initial information representing information about an entire space including point cloud data, wherein the initial information includes at least one of size information of the entire space or information about a subspace of the entire space, and the information about the subspace may include at least one of an identifier for the subspace, a name for the subspace, or location information of a bounding box for the subspace.

[0539] With respect to the IDD of FIG. 9, the present method further comprises the step of receiving initial information representing information about an entire space including the point cloud data, wherein the initial information includes size information of the entire space, the initial information includes information about a subspace of the entire space, and the information about the subspace may include an identifier for the subspace, a name for the subspace, and location information of a bounding box for the subspace.

[0540] Additionally, the initial information may further include at least one of a face ID that identifies a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of the MPD regarding the point cloud data.

[0541] A method for receiving point cloud data can be performed by a receiving device of FIG. 5. The point cloud data receiving device may include a receiving unit for receiving a file containing point cloud data; a decapsulator for decapsulating the file; and a decoder for decoding the point cloud data.
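
As a non-limiting illustration of the order of the steps, the following Python sketch runs the transmission steps (S1200, S1210) and the reception steps (S1310, S1320) back to back. It shows only that reception mirrors transmission: JSON serialization with zlib compression is a stand-in for V3C encoding, a plain dictionary is a stand-in for the file format, and the network transfer (S1220, S1300) is replaced by passing the object directly. All function names are hypothetical.

```python
import json
import zlib


def encode(points):
    """Stand-in for V3C encoding (S1200); NOT an actual V3C codec."""
    return zlib.compress(json.dumps(points).encode("utf-8"))


def encapsulate(bitstream):
    """Stand-in for file encapsulation (S1210): one track with one sample."""
    return {"tracks": [{"sample_entry": {"codec": "placeholder"}, "samples": [bitstream]}]}


def decapsulate(file):
    """Reverse of encapsulate (S1310): extract the sample payload."""
    return file["tracks"][0]["samples"][0]


def decode(bitstream):
    """Reverse of encode (S1320)."""
    return json.loads(zlib.decompress(bitstream).decode("utf-8"))


points = [[0, 0, 0], [1, 2, 3]]
file = encapsulate(encode(points))      # transmitter side
received = decode(decapsulate(file))    # receiver side
print(received == points)
```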

[0542] The encoding / decoding method according to the embodiments may additionally generate and transmit / receive a region descriptor for signaling a dynamic region. The region descriptor for the dynamic region may be included in an adaptation set for an atlas within the MPD.

[0543] If the regions of the point cloud data change over time, information regarding the dynamic regions can be conveyed via a timed metadata track and can be conveyed in the main adaptation set (atlas) within the MPD. The adaptation set for dynamic regions may further include the @associationId attribute. The adaptation set for dynamic regions may further include the @associationType value. @associationId can identify relationships between other representations that convey information regarding the regions changing over time. @associationType can identify the type of the dynamic region. The representations identified by @associationId may include the following information described above.

[0544] <mpd>

[0545] <period duration="PT10M">

[0546] <adaptationset mimetype="video/mp4" codecs="v3e1.L2.0.0.1, resv.vvvc.avc1.4D401E" framerate="30">

[0547] <geom_type Anchor Point="x,y,z" "dynamic"></geom_type>

[0548] <representation id="720p" bandwidth="3200000" width="1280" height="720">

[0549] <segmentlist timescale="90000" duration="5400000">

[0550] <segmenturl Position="x_1,y_1,z_1" Size="w_1,d_1,h_1" Rotation="1_1,i_1,j_1,k_1" media="segment-1.ts" />

[0551] <segmenturl Position="x_2,y_2,z_2" Size="w_2,d_2,h_2" Rotation="1_2,i_2,j_2,k_2" media="segment-2.ts" />

[0552] <segmenturl Position="x_3,y_3,z_3" Size="w_3,d_3,h_3" Rotation="1_3,i_3,j_3,k_3" media="segment-3.ts" />

[0553] <segmenturl Position="x_4,y_4,z_4" Size="w_4,d_4,h_4" Rotation="1_4,i_4,j_4,k_4" media="segment-4.ts" />

[0554] </segmentlist>

[0555] </representation>

[0556] </adaptationset>

[0557] </period>

[0558] Referring to FIG. 12, a point cloud data transmission method may include the steps of: encoding point cloud data; encapsulating the point cloud data into a file; and transmitting the file.

[0559] Referring to FIGS. 5 and 6, regarding the V-PCC structure and multi-track (atlas), point cloud data is encoded based on a video method, and the file may include at least one of an atlas track containing atlas information regarding the point cloud data, a first track containing geometry of the point cloud data, a second track containing attributes of the point cloud data, or a third track containing occupancy of the point cloud data.

[0560] Referring to FIG. 7, regarding MPD (atlas), the transmission method further comprises the step of generating MPD information for the point cloud data, wherein the MPD includes first adaptation set information containing atlas information regarding the point cloud data, second adaptation set information for the geometry of the point cloud data, third adaptation set information for the attributes of the point cloud data, and fourth adaptation set information for the occupancy of the point cloud data, wherein the second adaptation set information includes a component descriptor for the geometry, the third adaptation set information includes a component descriptor for the attributes, and the fourth adaptation set information includes a component descriptor for the occupancy, and the MPD may further include information identifying initial information representing information about the entire space containing the point cloud data.

[0561] Referring to FIG. 9, with respect to IDD, the transmission method further comprises the step of transmitting initial information representing information about an entire space including the point cloud data, wherein the initial information includes at least one of size information of the entire space or information about a subspace of the entire space, and the information about the subspace may include at least one of an identifier for the subspace, a name for the subspace, or location information of a bounding box for the subspace.

[0562] Referring to FIG. 9, with respect to the face, properties, state, face, type, and MPD URL of the IDD, the initial information may further include at least one of a face ID that identifies a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of the MPD regarding the point cloud data.

[0563] Referring to FIG. 5, a point cloud data transmission device may include an encoder that encodes point cloud data; an encapsulator that encapsulates the point cloud data into a file; and a transmitter that transmits the file.

[0564] Referring to FIG. 13, a method for receiving point cloud data may include the steps of: receiving a file containing point cloud data; decapsulating the file; and decoding the point cloud data.

[0565] The file may include a first track containing the geometry of the point cloud data and a second track containing the attributes of the point cloud data.

[0566] Each track of the file includes a sample entry containing configuration information regarding the point cloud data and a sample containing the point cloud data, and the sample may further include a set of parameters regarding the point cloud data.

[0567] Referring to FIG. 8, with respect to MPD, a receiving method according to embodiments further comprises the step of receiving MPD information for the point cloud data, wherein the MPD includes first adaptation set information for the geometry of the point cloud data and second adaptation set information for the attributes of the point cloud data, wherein the first adaptation set information includes a component descriptor for the geometry and the second adaptation set information includes a component descriptor for the attributes, and the MPD may further include information identifying initial information representing information about the entire space containing the point cloud data.

[0568] Referring to FIG. 9, with respect to IDD, the method according to the embodiments further comprises the step of receiving initial information representing information about an entire space including point cloud data, wherein the initial information includes size information of the entire space, and the initial information includes information about a subspace of the entire space, and the information about the subspace may include an identifier for the subspace, a name for the subspace, and location information of a bounding box for the subspace.

[0569] Referring to FIG. 9, regarding the IDD, the initial information (IDD) may further include at least one of a face ID that identifies a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of the MPD regarding the point cloud data.

[0570] Referring to FIG. 5, a point cloud data receiving device may include: a receiving unit for receiving a file containing point cloud data; a decapsulator for decapsulating the file; and a decoder for decoding the point cloud data.

[0571] The embodiments have been described in terms of methods and / or devices, and the description of the methods and the description of the devices may be applied complementarily.

[0572] Although the drawings have been described separately for the convenience of explanation, it is also possible to design a new embodiment by combining the embodiments described in each drawing. Furthermore, designing a computer-readable recording medium containing a program for executing the previously described embodiments, as required by a person skilled in the art, falls within the scope of the claims of the embodiments. The apparatus and method according to the embodiments are not limited to the configuration and method of the embodiments described above; rather, the embodiments may be configured by selectively combining all or part of each embodiment to allow for various modifications. Although preferred embodiments have been illustrated and described, the embodiments are not limited to the specific embodiments described above. A person skilled in the art may make various modifications without departing from the essence of the embodiments claimed in the claims, and such modifications should not be understood separately from the technical concept or perspective of the embodiments.

[0573] Various components of the device of the embodiments may be implemented by hardware, software, firmware, or a combination thereof. Various components of the embodiments may be implemented as a single chip, for example, a single hardware circuit. Depending on the embodiments, the components according to the embodiments may each be implemented as separate chips. Depending on the embodiments, at least one of the components of the device according to the embodiments may be composed of one or more processors capable of executing one or more programs, and the one or more programs may include instructions for performing or executing any one or more of the operations/methods according to the embodiments. Executable instructions for performing the methods/operations of the device according to the embodiments may be stored in non-transitory computer-readable media (CRMs) or other computer program products configured to be executed by one or more processors, or may be stored in transitory CRMs or other computer program products configured to be executed by one or more processors. Additionally, memory according to the embodiments may be used as a concept that includes not only volatile memory (e.g., RAM) but also non-volatile memory, flash memory, PROM, etc. In addition, it may also include implementation in the form of carrier waves, such as transmission over the Internet. Furthermore, processor-readable recording media may be distributed across networked computer systems, allowing processor-readable code to be stored and executed in a distributed manner.

[0574] In this document, " / " and "," are interpreted as "and / or." For example, "A / B" is interpreted as "A and / or B," and "A, B" is interpreted as "A and / or B." Additionally, "A / B / C" means "at least one of A, B and / or C." Also, "A, B, C" means "at least one of A, B and / or C." Additionally, in this document, "or" is interpreted as "and / or." For example, "A or B" may mean 1) "A" only, 2) "B" only, or 3) "A and B." In other words, "or" in this document may mean "additionally or alternatively."

[0575] Terms such as "first," "second," etc., may be used to describe various components of the embodiments. However, the interpretation of the various components according to the embodiments should not be limited by these terms. These terms are merely used to distinguish one component from another. For example, the first user input signal may be referred to as the second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. The use of these terms should be interpreted as not departing from the scope of the various embodiments. Although the first user input signal and the second user input signal are both user input signals, they do not imply the same user input signals unless clearly indicated in the context.

[0576] The terms used to describe the embodiments are intended for the purpose of describing specific embodiments and are not intended to limit the embodiments. As used in the description of the embodiments and in the claims, the singular is intended to include the plural unless explicitly indicated otherwise in the context. The expression "and/or" is used to mean including all possible combinations of the terms. The expression "include" describes the presence of features, numbers, steps, elements, and/or components and does not imply the exclusion of additional features, numbers, steps, elements, and/or components. Conditional expressions such as "if" or "when" used to describe the embodiments are not limited to optional cases. They are intended to be interpreted as "when a specific condition is satisfied," "when a related action is performed in response to a specific condition," or "when a related definition is interpreted."

[0577] Additionally, operations according to the embodiments described herein may be performed by a transmitting and receiving device including memory and / or a processor, depending on the embodiments. The memory may store programs for processing / controlling operations according to the embodiments, and the processor may control various operations described in this document. The processor may be referred to as a controller, etc. Operations in the embodiments may be performed by firmware, software, and / or a combination thereof, and the firmware, software, and / or a combination thereof may be stored in the processor or in memory.

[0578] Meanwhile, the operation according to the embodiments described above may be performed by a transmitting device and / or a receiving device according to the embodiments. The transmitting and receiving device may include a transmitting and receiving unit for transmitting and receiving media data, a memory for storing instructions (program code, algorithm, flowchart and / or data) for a process according to the embodiments, and a processor for controlling the operations of the transmitting and receiving devices.

[0579] The processor may be referred to as a controller, etc., and may correspond, for example, to hardware, software, and / or a combination thereof. The operation according to the embodiments described above may be performed by the processor. Additionally, the processor may be implemented as an encoder / decoder, etc., for the operation of the embodiments described above.

[0580] As described above, the relevant details have been explained in the best mode for carrying out the embodiments.

[0581] As described above, the embodiments may be applied wholly or partially to point cloud data transmission and reception devices and systems.

[0582] Those skilled in the art may make various changes or modifications to the embodiments within the scope of the embodiments.

[0583] The embodiments may include modifications / variations, and such modifications / variations do not exceed the scope of the claims and their equivalents.

Claims

1. A point cloud data transmission method comprising: encoding point cloud data; encapsulating the point cloud data into a file; and transmitting the file.

2. The method of claim 1, wherein the point cloud data is encoded based on a video-based method, and the file comprises at least one of an atlas track containing atlas information regarding the point cloud data, a first track containing geometry of the point cloud data, a second track containing attributes of the point cloud data, or a third track containing occupancy of the point cloud data.

3. The method of claim 1, further comprising generating MPD information for the point cloud data, wherein the MPD includes first adaptation set information including atlas information regarding the point cloud data, second adaptation set information for the geometry of the point cloud data, third adaptation set information for the attributes of the point cloud data, and fourth adaptation set information for the occupancy of the point cloud data, the second adaptation set information includes a component descriptor for the geometry, the third adaptation set information includes a component descriptor for the attributes, the fourth adaptation set information includes a component descriptor for the occupancy, and the MPD further includes information identifying initial information representing information about an entire space including the point cloud data.

4. The method of claim 1, further comprising transmitting initial information representing information about an entire space including the point cloud data, wherein the initial information includes at least one of size information of the entire space or information regarding a subspace of the entire space, and the information regarding the subspace includes at least one of an identifier for the subspace, a name for the subspace, or location information of a bounding box for the subspace.

5. The method of claim 4, wherein the initial information further comprises at least one of a face ID identifying a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of an MPD regarding the point cloud data.

6. A point cloud data transmission device comprising: an encoder that encodes point cloud data; an encapsulator that encapsulates the point cloud data into a file; and a transmitter that transmits the file.

7. A point cloud data reception method comprising: receiving a file containing point cloud data; decapsulating the file; and decoding the point cloud data.

8. The method of claim 7, wherein the file includes a first track containing the geometry of the point cloud data and a second track containing the attributes of the point cloud data.

9. The method of claim 7, wherein each track of the file includes a sample entry containing configuration information regarding the point cloud data and a sample containing the point cloud data, and the sample further includes a parameter set regarding the point cloud data.

10. The method of claim 7, further comprising transmitting MPD information for the point cloud data, wherein the MPD includes first adaptation set information for the geometry of the point cloud data and second adaptation set information for the attributes of the point cloud data, the first adaptation set information includes a component descriptor for the geometry, the second adaptation set information includes a component descriptor for the attributes, and the MPD further includes information identifying initial information representing information about an entire space including the point cloud data.

11. The method of claim 7, further comprising receiving initial information representing information about an entire space including the point cloud data, wherein the initial information includes size information of the entire space and information regarding a subspace of the entire space, and the information regarding the subspace includes an identifier for the subspace, a name for the subspace, and location information of a bounding box for the subspace.

12. The method of claim 11, wherein the initial information further comprises at least one of a face ID identifying a face generated based on points of the point cloud data, vertex information included in the face identified by the face ID, information indicating whether the subspace is an open space or a closed space, index information for the face, information indicating whether the subspace is a background or a foreground, or address information of an MPD regarding the point cloud data.

13. A point cloud data reception device comprising: a receiver that receives a file containing point cloud data; a decapsulator that decapsulates the file; and a decoder that decodes the point cloud data.
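
As a non-limiting illustration, the initial information recited in the claims above (size of the entire space, plus per-subspace identifier, name, bounding box location, open / closed and background / foreground indications, and MPD address) might be modeled as in the following Python sketch. All class names, field names, and the `subspaces_at` lookup are hypothetical and chosen only for this example; the claims do not define a concrete syntax, and the face ID, vertex, and face index information is omitted for brevity. The lookup by viewer position is one assumed way a receiving device could select which MPD to request.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the coordinate system of the entire space


@dataclass
class BoundingBox:
    """Axis-aligned bounding box of a subspace (hypothetical representation)."""
    min_corner: Point
    max_corner: Point

    def contains(self, p: Point) -> bool:
        # A point is inside when every coordinate lies within the box limits.
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.min_corner, p, self.max_corner))


@dataclass
class Subspace:
    """Information regarding one subspace of the entire space."""
    subspace_id: int                      # identifier for the subspace
    name: str                             # name for the subspace
    bounding_box: BoundingBox             # location information of the bounding box
    is_closed: Optional[bool] = None      # open space (False) or closed space (True)
    is_background: Optional[bool] = None  # background (True) or foreground (False)
    mpd_url: Optional[str] = None         # address information of the MPD


@dataclass
class InitialInfo:
    """Initial information describing the entire space."""
    space_size: Point                     # size information of the entire space
    subspaces: List[Subspace] = field(default_factory=list)

    def subspaces_at(self, position: Point) -> List[Subspace]:
        # Assumed receiver-side helper: find subspaces whose box contains the viewer.
        return [s for s in self.subspaces if s.bounding_box.contains(position)]


# Example with made-up values.
info = InitialInfo(
    space_size=(20.0, 10.0, 3.0),
    subspaces=[
        Subspace(1, "lobby", BoundingBox((0.0, 0.0, 0.0), (10.0, 10.0, 3.0)),
                 is_closed=False, mpd_url="lobby.mpd"),
        Subspace(2, "room", BoundingBox((10.0, 0.0, 0.0), (20.0, 10.0, 3.0)),
                 is_closed=True, mpd_url="room.mpd"),
    ],
)
hits = info.subspaces_at((12.0, 5.0, 1.5))
print([s.name for s in hits])  # ['room']
```

In this sketch, a receiving device that knows the viewer position could read `mpd_url` from the matching subspace and request only that MPD, rather than parsing descriptions for the entire space.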