3D data decoding device and 3D data encoding device

The 3D data decoding and encoding devices address inefficiencies in existing methods by using tile and submesh information to specify codec tiles, enabling efficient and flexible decoding of 3D data with reduced redundancy and improved processing speed.

WO2026134208A1 · PCT designated stage · Publication Date: 2026-06-25 · SHARP KK

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SHARP KK
Filing Date
2025-12-16
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing 3D data encoding and decoding methods require decoding all tiles of a video codec to decode a specific tile, leading to inefficiencies in decoding time and redundant data processing.

Method used

A 3D data decoding device and encoding device that utilize tile information and submesh information to decode specific tiles efficiently by indicating the tile type and index of the codec tile, allowing parallel decoding and reducing redundant processing.

Benefits of technology

Enables high-quality encoding and decoding of 3D data with increased flexibility and reduced decoding time by specifying the correspondence between codec tiles, enhancing the efficiency of 3D data transmission systems.

✦ Generated by Eureka AI based on patent content.


Abstract

The purpose of the present invention is to enable, in order to decode a specific tile, codec tiles necessary for decoding the specific tile to be decoded in parallel, by decoding an index of each codec tile associated with the tile, thereby shortening a time for decoding the specific tile, and efficiently encoding and decoding 3D data without redundancy. A 3D data decoding device that decodes mesh data or point cloud data is characterized in that an atlas information decoding unit that decodes atlas information from encoded data includes a tile information decoding unit that decodes tile information, and an additional information decoding unit that decodes an index indicating a type of a tile and codec tile information included in the tile, wherein the additional information decoding unit decodes index information of a codec tile necessary for decoding the tile according to the index indicating the type of the tile.

Description

3D Data Decoding Device and 3D Data Encoding Device

[0001] Embodiments of the present invention relate to a 3D data encoding device and a 3D data decoding device.

[0002] In order to efficiently transmit or record 3D data, there are a 3D data encoding device that converts 3D data into a 2D image, encodes it using a moving image encoding method, and generates encoded data, and a 3D data decoding device that decodes a 2D image from the encoded data and reconstructs 3D data.

[0003] Specific 3D data encoding methods include, for example, ISO / IEC 23090-5 V3C (Volumetric Video-based Coding) and V-PCC (Video-based Point Cloud Compression) of MPEG-I. V3C can encode and decode a point cloud composed of point positions and attribute information. Furthermore, it is also used for encoding and decoding multi-view video and mesh video by ISO / IEC 23090-12 (MPEG Immersive Video, MIV) and ISO / IEC 23090-29 (Video-based Dynamic Mesh Coding, V-DMC) under standardization. The latest draft document of the V-DMC method is disclosed in Non-Patent Document 1.

[0004] In these 3D data encoding methods, the geometry and attributes constituting the 3D data are encoded and decoded as images using a moving image encoding method such as H.265 / HEVC (High Efficiency Video Coding) or H.266 / VVC (Versatile Video Coding).

[0005] In the case of a point cloud, the geometry image is the depth on the projection plane, and the attribute image is the image obtained by projecting the attributes onto the projection plane.

[0006] 3D data (mesh) like that described in Non-Patent Document 1 consists of a base mesh, mesh displacement, and texture mapping image. Vertex coding schemes such as Draco can be used to encode the base mesh. Mesh displacement can be encoded either by encoding a 2D mesh displacement image using a video codec, or directly by arithmetic coding. The texture mapping image is encoded as an attribute image using a video codec. The aforementioned HEVC and VVC video codecs can be used.

[0007] Non-Patent Document 1: Text of ISO / IEC DIS 23090-29 Video-based mesh coding, ISO / IEC JTC 1 / SC 29 / WG 7 N1027, 2024-12-12

[0008] In the 3D data encoding method described in Non-Patent Document 1, the relationship between the submeshes of a mesh, or the tiles that divide a point cloud, mesh geometry, or attribute picture, and the tiles of the video codec is unclear. Therefore, decoding a specific submesh or a specific tile requires decoding all the tiles of the video codec, which is a problem.

[0009] The present invention aims to decode the correspondence between a specific tile and the codec tiles needed to decode it, so that those codec tiles can be decoded in parallel, thereby shortening the time required to decode the specific tile and allowing 3D data to be encoded and decoded efficiently without redundancy.

[0010] To solve the above problems, a 3D data decoding device according to one aspect of the present invention is a 3D data decoding device that decodes mesh data or point cloud data, comprising a tile information decoding unit that decodes tile information and an additional information decoding unit that decodes tile submesh information, wherein the tile submesh information includes a first syntax element indicating the tile type, and the additional information decoding unit decodes a second syntax element indicating the index of the codec tile included in the tile according to the value of the syntax element.

[0011] To solve the above problems, a 3D data encoding device according to one aspect of the present invention is a 3D data encoding device for encoding mesh data or point cloud data, comprising: a tile information encoding unit for encoding tile information; and an additional information encoding unit for encoding tile submesh information, wherein the tile submesh information includes a first syntax element indicating the tile type, and the additional information encoding unit encodes a second syntax element indicating the index of the codec tile included in the tile according to the value of the first syntax element.

[0012] To solve the above problems, a method for transmitting an encoded stream according to one aspect of the present invention is characterized in that the encoded stream includes tile information and tile submesh information, the tile submesh information includes a first syntax element indicating the tile type, and a second syntax element indicating the index of the codec tile included in the tile, depending on the value of the syntax element.

[0013] According to one aspect of the present invention, the flexibility of tile definitions is increased, and 3D data can be encoded and decoded with high quality.

[0014] This is a schematic diagram showing the configuration of the 3D data transmission system according to this embodiment.
This is a diagram showing the hierarchical structure of the encoded stream data.
This is a functional block diagram showing the schematic configuration of the 3D data decoding device 31.
This is a functional block diagram showing the configuration of the atlas information decoding unit 302.
This is a functional block diagram showing the configuration of the base mesh decoding unit 303.
This is a functional block diagram showing the configuration of the mesh displacement decoding unit 305.
This is a functional block diagram showing the configuration of the mesh reconstruction unit 307.
This is an example of the syntax of ASVE (ASPS Vdmc Extension), which is a sequence-level mesh data extended coding parameter set.
This is an example of the syntax of extended coding parameter information in AFPS, which is a picture / frame-level parameter set.
This is an example of the syntax of atlas tile information in an atlas frame.
This is an example of the syntax of attribute tile information in an atlas frame.
This is a diagram for explaining the operation of the mesh reconstruction unit 307.
This is a functional block diagram showing the schematic configuration of the 3D data encoding device 11.
This is a functional block diagram showing the configuration of the atlas information coding unit 101.
This is a functional block diagram showing the configuration of the base mesh coding unit 103.
This is a functional block diagram showing the configuration of the mesh displacement coding unit 107.
This is a functional block diagram showing the configuration of the mesh separation unit 115.
This is a diagram for explaining the operation of the mesh separation unit 115.
This is a diagram for explaining the relationship between tiles and codec tiles, or between attribute tiles and codec tiles, and the index of the codec tile.
This is a diagram for explaining the relationship between tiles and codec tiles, or between attribute tiles and codec tiles, and the index of the codec tile when tmsm_tile_type_idc == 2.
This is an example of the syntax of tile submesh information SEI.
This is another example of the syntax of tile submesh information SEI.
This is another example of the syntax of tile submesh information SEI.
This is another example of the syntax of tile submesh information SEI.

[0015] Embodiments of the present invention will be described below with reference to the drawings.

[0016] Figure 1 is a schematic diagram showing the configuration of the 3D data transmission system 1 according to this embodiment.

[0017] The 3D data transmission system 1 is a system that transmits an encoded stream containing encoded 3D data to be encoded, decodes the transmitted encoded stream, and displays the 3D data. The 3D data transmission system 1 comprises a 3D data encoding device 11, a network 21, a 3D data decoding device 31, and a 3D data display device 41.

[0018] The 3D data encoding device 11 receives 3D data T as input.

[0019] Network 21 transmits the encoded stream Te generated by the 3D data encoding device 11 to the 3D data decoding device 31. Network 21 is the Internet, a wide area network (WAN), a local area network (LAN), or a combination thereof. Network 21 is not necessarily limited to a bidirectional communication network; it may also be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting or satellite broadcasting. Furthermore, network 21 may be replaced by a storage medium that records the encoded stream Te, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).

[0020] The 3D data decoding device 31 decodes each of the encoded streams Te transmitted by the network 21 and generates one or more decoded 3D data Td.

[0021] The 3D data display device 41 displays all or part of one or more decoded 3D data Td generated by the 3D data decoding device 31. The 3D data display device 41 includes a display device such as a liquid crystal display or an organic EL (electro-luminescence) display. Examples of display forms include stationary, mobile, and HMD (head-mounted display). Furthermore, if the 3D data decoding device 31 has high processing power, it displays high-resolution images, and if it has lower processing power, it displays images that do not require high processing power or display capabilities.

[0022] <Operators> The operators used in this specification are listed below.

[0023] >> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, |= is the OR assignment operator, and || represents logical OR.

[0024] x?y:z is a ternary operator that takes y when x is true (non-zero) and z when x is false (0).

[0025] y..z represents a set of integers from y to z.

[0026] Log2(x) is a function that returns the base 2 logarithm of x.

[0027] Ceil(x) is a function that returns the smallest integer greater than or equal to x.

[0028] Floor(x) is a function that returns the largest integer less than or equal to x.

[0029] The Sign(x) function returns 1 if x is greater than 0, 0 if x is equal to 0, and -1 if x is less than 0.

[0030] Abs(x) is a function that returns the absolute value of x.

[0031] Round(x) is a function that returns x rounded to the nearest integer, with ties rounded away from zero.

[0032] Round( x ) = Sign( x ) * Floor( Abs( x ) + 0.5 ).

[0033] The operator ' / ' is integer division, truncating towards zero. For example, 7 / 4 is truncated to 1, and -7 / 4 is truncated to -1.

[0034] The division operator (÷) performs division without rounding or truncation.
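The operator definitions above differ from the defaults of some programming languages, so a small sketch may help when checking derivations by hand. The following Python helpers are illustrative only; the function names sign, round_half_away, and int_div are assumptions and do not appear in the specification.

```python
import math


def sign(x):
    """Sign(x): 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def round_half_away(x):
    """Round(x) = Sign(x) * Floor(Abs(x) + 0.5)."""
    return sign(x) * math.floor(abs(x) + 0.5)


def int_div(a, b):
    """The '/' operator: integer division with truncation toward zero."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have opposite signs.
    return q if (a >= 0) == (b >= 0) else -q
```

Note that Python's built-in `//` rounds toward negative infinity (so `-7 // 4` is `-2`) and the built-in `round` rounds ties to the nearest even integer, which is why explicit helpers are used here.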

[0035] <Structure of Encoded Stream Te> Before giving a detailed description of the 3D data encoding device 11 and the 3D data decoding device 31 according to this embodiment, the data structure of the encoded stream Te, which is generated by the 3D data encoding device 11 and decoded by the 3D data decoding device 31, will be described. The 3D data may be coded according to ISO / IEC 23090-5 V3C (Volumetric Video-based Coding) and V-PCC (Video-based Point Cloud Compression) of MPEG-I, or according to ISO / IEC 23090-12 (MPEG Immersive Video, MIV) and ISO / IEC 23090-29 (Video-based Dynamic Mesh Coding, V-DMC), which are based on V3C.

[0036] Figure 2 shows the hierarchical structure of data in the encoded stream Te. The encoded stream Te has either a V3C sample stream or a V3C unit stream data structure. The V3C sample stream includes a sample stream header and a V3C unit. The V3C unit stream includes a V3C unit.

[0037] A V3C unit includes a V3C unit header and a V3C unit payload. The V3C unit header contains the Unit Type, an ID indicating the type of V3C unit, which takes values indicated by labels such as V3C_VPS, V3C_AD, V3C_AVD, V3C_GVD, and V3C_OVD.

[0038] When the Unit Type is V3C_VPS (V3C Parameter Set), the V3C unit includes the V3C parameter set.

[0039] When the Unit Type is V3C_AD (Atlas Data), the V3C unit includes the VPS ID, atlasID, sample stream NAL header, and multiple NAL units. The atlasID is the identifier of the atlas and takes a non-negative integer value.

[0040] A NAL unit includes NALUnitType, layerID, TemporalID, and RBSP (Raw byte sequence payload).

[0041] NAL units are identified by NALUnitType and include ASPS (Atlas Sequence Parameter Set), AAPS (Atlas Adaptation Parameter Set), ATL (Atlas Tile layer), SEI (Supplemental Enhancement Information), etc.

[0042] An ATL includes an ATL header and an ATL data unit. The ATL data unit contains patch information data, that is, information such as the location and size of each patch.

[0043] SEI includes payloadType, which indicates the type of SEI; payloadSize, which indicates the size (in bytes) of the SEI; and sei_payload, which contains the SEI data.

[0044] When the Unit Type is V3C_AVD (Attribute Video Data), the V3C unit includes a VPS ID, an atlasID, an attrIdx which is the ID of the attribute image, a partIdx which is the partition ID, a mapIdx which is the map ID, an auxFlag which is a flag indicating whether it is Auxiliary data, and a video stream. The video stream is data encoded with HEVC, VVC, etc. The attribute data corresponds to the texture image in V-DMC. The attrIdx may be an integer between 0 and ai_attribute_count[RecAtlasID] - 1. Here, ai_attribute_count is a syntax element of attribute_information, and RecAtlasID is the ID (atlasID) of the target atlas.

[0045] Here, ai_attribute_count[j] indicates the number of attributes related to the atlas with the atlas ID of index j. If it does not exist, the value of ai_attribute_count[j] is presumed to be 0.

[0046] When the Unit Type is V3C_GVD (Geometry Video Data), the V3C unit includes a VPS ID, an atlasID, a mapIdx, an auxFlag, and a video stream. The geometry data corresponds to the mesh displacement in V-DMC.

[0047] When the Unit Type is V3C_OVD (Occupancy Video Data), the V3C unit includes a VPS ID, an atlasID, and a video stream.

[0048] When the Unit Type is V3C_MD (Mesh data), the V3C unit includes a VPS ID, an atlasID, and a mesh_payload. It corresponds to the base mesh in V-DMC.
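As a reading aid, the contents of each V3C unit type listed in paragraphs [0038] to [0048] can be collected into a lookup table. The Python sketch below is only a summary of those paragraphs; the dictionary and function names, and the snake_case field names, are assumptions chosen for illustration and are not syntax element names from the standard. No bit-level layout is implied.

```python
# Fields carried by each V3C unit type, as listed in [0038]-[0048].
# Field names are illustrative labels, not syntax element names.
V3C_UNIT_FIELDS = {
    "V3C_VPS": ["v3c_parameter_set"],
    "V3C_AD": ["vps_id", "atlas_id", "sample_stream_nal_header", "nal_units"],
    "V3C_AVD": ["vps_id", "atlas_id", "attr_idx", "part_idx", "map_idx",
                "aux_flag", "video_stream"],
    "V3C_GVD": ["vps_id", "atlas_id", "map_idx", "aux_flag", "video_stream"],
    "V3C_OVD": ["vps_id", "atlas_id", "video_stream"],
    "V3C_MD": ["vps_id", "atlas_id", "mesh_payload"],
}

# Role of each component in V-DMC, per [0044], [0046], and [0048].
VDMC_ROLE = {
    "V3C_AVD": "texture image",
    "V3C_GVD": "mesh displacement",
    "V3C_MD": "base mesh",
}


def describe_unit(unit_type):
    """Return the listed fields and the V-DMC role (or None) for a unit type."""
    return {
        "unit_type": unit_type,
        "fields": V3C_UNIT_FIELDS[unit_type],
        "vdmc_role": VDMC_ROLE.get(unit_type),
    }
```

For example, `describe_unit("V3C_GVD")` reports the geometry video fields together with the role "mesh displacement".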

[0049] (Configuration of the 3D data decoding device according to the first embodiment) Figure 3 is a functional block diagram showing the schematic configuration of the 3D data decoding device 31 according to the first embodiment. The 3D data decoding device 31 consists of a demultiplexing unit 301, an atlas information decoding unit 302, a base mesh decoding unit 303, a mesh displacement decoding unit 305, a mesh reconstruction unit 307, an attribute decoding unit 306, and a color space conversion unit 308. The 3D data decoding device 31 receives encoded 3D data as input and outputs additional information, atlas information, mesh, and attribute images.

[0050] The demultiplexing unit 301 receives encoded data multiplexed in a byte stream format, ISOBMFF (ISO Base Media File Format), or the like, demultiplexes it, and outputs an atlas information encoded stream (Atlas Data stream, NAL units in V3C_AD), a base mesh encoded stream (mesh_payload in V3C_MD), a mesh displacement encoded stream (geometry video stream in V3C_GVD), and an attribute video stream (attribute video stream in V3C_AVD). The geometry video stream and attribute video stream are two-dimensional videos, and video codecs such as HEVC and VVC may be used to encode and decode them. In other words, the codecs for the mesh displacement stream and attribute video stream described below may be video codecs.

[0051] The atlas information decoding unit 302 receives the atlas information encoded stream output from the demultiplexing unit 301 and decodes the atlas information.

[0052] The base mesh decoding unit 303 decodes the base mesh encoding stream encoded with vertex coding (3D data compression encoding scheme, e.g., Draco) and outputs the base mesh. The base mesh will be described later. The codec type of the base mesh may be obtained by decoding the syntax elements bmsps_intra_mesh_codec_id and bmsps_inter_mesh_codec_id.

[0053] The mesh displacement decoding unit 305 decodes the mesh displacement encoded stream and outputs the mesh displacement. The type of codec used for encoding is indicated by ptl_profile_codec_group_idc, which is obtained by decoding the V3C parameter set of the encoded data. Alternatively, it may be indicated by the Four CC code (4-character code, 4CC code) indicated by gi_geometry_codec_id[atlasID] in the V3C parameter set. gi_geometry_codec_id[atlasID] indicates the index in the atlas ID corresponding to the codec ID of the decoder used to decode the geometry video stream. Alternatively, the syntax element dsps_codec_id indicating the type of codec may be decoded from the parameter set. The set showing the correspondence between the codec ID (ccm_codec_id) and its 4CC code (ccm_codec_4cc[ccm_codec_id]) may be transmitted using a separate codec mapping SEI (component_codec_mapping SEI).

[0054] The mesh reconstruction unit 307 receives the base mesh and mesh displacement as input and reconstructs the mesh in 3D space.

[0055] The attribute decoding unit 306 decodes the attribute video stream encoded with VVC, HEVC, etc., and outputs an attribute image. The attribute image is a texture image unfolded along the UV axis (a texture mapping image converted using the UV atlas method) and may be in YCbCr format. The type of codec used for encoding is indicated by ptl_profile_codec_group_idc, which is obtained by decoding the V3C parameter set of the encoded data. Alternatively, it may be indicated by the Four CC code indicated by ai_attribute_codec_id[atlasID] in the V3C parameter set. ai_attribute_codec_id[atlasID] indicates the index in the atlas ID that corresponds to the codec ID of the decoder used to decode the attribute video stream.

[0056] The color space conversion unit 308 converts the attribute image from YCbCr format to RGB format. Alternatively, the attribute video stream encoded in RGB format can be decoded, and the color space conversion can be omitted.
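The conversion performed by the color space conversion unit 308 can be sketched per sample as follows. The specification does not state which conversion matrix or signal range is used, so this Python sketch assumes 8-bit full-range samples and BT.709 coefficients purely for illustration; the function name ycbcr_to_rgb is likewise an assumption.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit YCbCr sample to 8-bit RGB.

    Assumptions (not stated in the specification): full-range samples
    and BT.709 luma/chroma coefficients.
    """
    cb_c = cb - 128  # center the chroma components around zero
    cr_c = cr - 128
    r = y + 1.5748 * cr_c
    g = y - 0.1873 * cb_c - 0.4681 * cr_c
    b = y + 1.8556 * cb_c

    def clip(v):
        # Round to the nearest integer and clamp to the 8-bit range.
        return max(0, min(255, int(round(v))))

    return clip(r), clip(g), clip(b)
```

A sample with neutral chroma (Cb = Cr = 128) maps to a gray value equal to Y, which is a quick sanity check for any choice of matrix.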

[0057] (Decoding of Atlas Information) Figure 4 is a functional block diagram showing the configuration of the atlas information decoding unit 302. The atlas information decoding unit 302 consists of a parameter decoding unit 3021, a tile information decoding unit 3022, an extended information decoding unit 3023, and an additional information decoding unit 3024.

[0058] (Decoding and Derivation of Encoded Parameters) The parameter decoding unit 3021 decodes the encoded parameters from the Atlas information encoded stream. The encoded parameters include the sequence-level parameter set ASPS (Atlas Sequence Parameter Set) and the picture / frame-level parameter set AFPS (Atlas Frame Parameter Set).

[0059] Figure 8 shows an example of the syntax for ASVE (ASPS Vdmc Extension), a sequence-level mesh data extended coding parameter set. The semantics of each element of the syntax are as follows:

[0060] asve_subdivision_iteration_count: Indicates the number of iterations for mesh subdivision.

[0061] asve_1d_displacement_flag: This flag indicates whether the mesh displacement is one-dimensional or not. A value of true indicates that the mesh displacement is one-dimensional. A value of false indicates that the mesh displacement is three-dimensional.

[0062] asve_attribute_information_present_flag: If asve_attribute_information_present_flag is equal to 1, it indicates that the atlas contains information related to the attribute. If it is equal to 0, it indicates that the atlas does not contain information related to the attribute.

[0063] asve_consistent_attribute_frame_flag: If asve_consistent_attribute_frame_flag is equal to 1, it indicates that only one nominal frame is used for atlas attributes and is associated with all V3C attribute components of the current atlas. If it is equal to 0, it indicates that there is one attribute nominal frame for each V3C attribute component of the current atlas and that asve_attribute_frame_count exists.

[0064] asve_attribute_frame_count: Indicates the number of nominal frames of the atlas attribute. If present, the value of asve_attribute_frame_count must be equal to the value of VpsAttributeNominalFrameCount[j] (where j is the ID of the current atlas), which is a requirement for V3C bitstream compliance. If not present, the value of asve_attribute_frame_count is assumed to be equal to 1. The variable AspsAttributeNominalFrameCount, which represents the number of nominal frames of the atlas attribute signaled in the Atlas Sequence Parameter Set V-DMC extension, is derived as follows:

[0065]
if( !asve_attribute_information_present_flag ) {
    AspsAttributeNominalFrameCount = 0
} else if( asve_consistent_attribute_frame_flag ) {
    AspsAttributeNominalFrameCount = 1
} else {
    AspsAttributeNominalFrameCount = asve_attribute_frame_count
}

(Decoding and derivation of tile encoding parameters) The tile information (tile selection information, tile division information), which is an encoding parameter that defines the tile to be decoded from the encoded data by the tile information decoding unit 3022, is described below. In the V3C standard, a screen division (partition division) common to atlas frames, occupancy frames, geometry frames, and attribute frames can be defined as a tile.
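Returning to the derivation of AspsAttributeNominalFrameCount in paragraph [0065], the three branches can be written as a small Python function. This is a sketch of the derivation only; the function and parameter names are illustrative, and the default of 1 for the frame count reflects the inferred value stated in [0064] when asve_attribute_frame_count is not present.

```python
def asps_attribute_nominal_frame_count(attribute_information_present_flag,
                                       consistent_attribute_frame_flag,
                                       attribute_frame_count=1):
    """Derive AspsAttributeNominalFrameCount from the ASVE flags.

    attribute_frame_count defaults to 1, the value inferred in [0064]
    when asve_attribute_frame_count is not present.
    """
    if not attribute_information_present_flag:
        return 0  # the atlas carries no attribute information
    if consistent_attribute_frame_flag:
        return 1  # one nominal frame shared by all attribute components
    return attribute_frame_count
```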

[0066] In V-DMC, tiles may be called geometry tiles because they are applied to the displacements, which are coded as geometry. Furthermore, tiles defined specifically for certain components, such as attributes, using the extended syntax of AFPS are called attribute tiles. The unit of a tile is a rectangle, and (common to both tile information and attribute tile information) the tile definition may include the number of columns and rows of tiles that make up the screen, the width of a tile in a given column, and the height of a tile in a given row. Hereinafter, all tiles defined in the Atlas Frame Parameter Set (AFPS), including both tiles (geometry tiles) and attribute tiles, may be referred to as atlas tiles. This specification uses this definition of atlas tiles. Note that only tiles defined using the standard syntax (in this case, geometry tiles) may also be referred to as atlas tiles.

[0067] Figure 10 shows an example of the syntax for tile information in the Atlas Frame Parameter Set (AFPS), which is a picture / frame-level parameter set. The semantics of each syntax element are as follows. When dividing the atlas frame into tiles, the occupancy frame, geometry frame, and attribute frame are also divided into tiles in the same way. However, the attribute frame can be divided into independent attribute tiles as described later.

[0068] afti_single_tile_in_atlas_frame_flag: A flag indicating whether there is only one tile in each atlas frame referencing the AFPS. If the value is true, it indicates that there is only one tile in each atlas frame referencing the AFPS. If the value is false, it indicates that there are multiple (greater than one) tiles in each atlas frame referencing the AFPS.

[0069] afti_single_partition_per_tile_flag: A flag indicating whether each tile referencing an AFPS contains only one tile partition. A value of true indicates that each tile referencing an AFPS contains only one tile partition, while a value of false indicates that each tile referencing an AFPS contains multiple (greater than one) tile partitions. If not present, the value of afti_single_partition_per_tile_flag is assumed to be equal to 1.

[0070] afti_num_tiles_in_atlas_frame_minus1: The value of afti_num_tiles_in_atlas_frame_minus1 plus 1 specifies the number of tiles in each atlas frame referencing the AFPS. The value of afti_num_tiles_in_atlas_frame_minus1 must be in the range of 0 to NumPartitionsInAtlasFrame-1. If it is not present and afti_single_partition_per_tile_flag is equal to 1, the value of afti_num_tiles_in_atlas_frame_minus1 is inferred to be equal to NumPartitionsInAtlasFrame-1.

[0071] afti_signalled_tile_id_flag: A flag indicating whether or not the tile ID of each tile is signaled. If the flag is equal to 1, the tile ID of each tile is signaled. If the flag is equal to 0, the tile ID is not signaled.

[0072] afti_signalled_tile_id_length_minus1: afti_signalled_tile_id_length_minus1+1 specifies the number of bits used to represent the syntax element afti_tile_id[i] (if present) and the syntax element ath_id in the tile header. The value of afti_signalled_tile_id_length_minus1 must be in the range of 0 to 15.

[0073] afti_tile_id[i]: Specifies the tile ID of the i-th tile. If it is not present, the value of afti_tile_id[i] is inferred to be equal to i for each i in the range from 0 to afti_num_tiles_in_atlas_frame_minus1. It is a requirement of bitstream conformance that afti_tile_id[i] is not equal to afti_tile_id[j] for all i != j. The 3D data decoding device 31 decodes a bitstream that satisfies the conformance requirements (the same applies hereinafter).
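The default tile ID assignment and the uniqueness requirement in [0071] to [0073] can be checked with a short routine. The Python sketch below is an illustration under stated assumptions: the function name derive_tile_ids and its parameters are hypothetical, and passing None for the signalled IDs stands in for afti_signalled_tile_id_flag being equal to 0.

```python
def derive_tile_ids(num_tiles_minus1, signalled_tile_ids=None,
                    id_length_minus1=None):
    """Return the list of tile IDs for an atlas frame.

    signalled_tile_ids=None models afti_signalled_tile_id_flag == 0, in
    which case afti_tile_id[i] is inferred to be i. If id_length_minus1
    is given, each ID must fit in id_length_minus1 + 1 bits.
    """
    num_tiles = num_tiles_minus1 + 1
    if signalled_tile_ids is None:
        tile_ids = list(range(num_tiles))
    else:
        tile_ids = list(signalled_tile_ids)
        if len(tile_ids) != num_tiles:
            raise ValueError("expected one tile ID per tile")
    if id_length_minus1 is not None:
        limit = 1 << (id_length_minus1 + 1)
        if any(t < 0 or t >= limit for t in tile_ids):
            raise ValueError("tile ID does not fit in the signalled bit length")
    # Conformance: afti_tile_id[i] != afti_tile_id[j] for all i != j.
    if len(set(tile_ids)) != len(tile_ids):
        raise ValueError("tile IDs must be unique")
    return tile_ids
```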

[0074] The tile information decoding unit 3022 may decode and encode the syntax element afti_num_tiles_in_atlas_frame_minus2, which indicates the number of tiles minus 2, when decoding and encoding afti_single_tile_in_atlas_frame_flag and afti_single_partition_per_tile_flag. Alternatively, the syntax element afti_num_tiles_in_atlas_frame_minus2, which indicates the number of referenced tiles minus 2, may be decoded and encoded only when the value of afti_single_tile_in_atlas_frame_flag is false and the value of afti_single_partition_per_tile_flag is false. The following example semantics may be used.

[0075] afti_num_tiles_in_atlas_frame_minus2: The value of afti_num_tiles_in_atlas_frame_minus2 plus 2 specifies the number of tiles in each atlas frame that references the Atlas Frame Parameter Set (AFPS). The value of afti_num_tiles_in_atlas_frame_minus2 must be in the range of 0 to NumPartitionsInAtlasFrame-2. If it is not present and afti_single_partition_per_tile_flag is equal to 1, the value of afti_num_tiles_in_atlas_frame_minus2 is inferred to be equal to NumPartitionsInAtlasFrame-2.

[0076] In this configuration, the case where there is only one referenced tile can be represented by afti_single_tile_in_atlas_frame_flag. Therefore, by decoding and encoding a syntax element that indicates the number of tiles minus 2, the overhead of the coding amount is reduced.
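The tile count derivation described in [0074] to [0076] can be sketched as follows. This Python function is an illustration, not the normative decoding process; its name and parameter names are assumptions, and passing None for num_tiles_minus2 stands in for the syntax element not being present in the bitstream.

```python
def num_tiles_in_atlas_frame(single_tile_in_atlas_frame_flag,
                             single_partition_per_tile_flag,
                             num_partitions_in_atlas_frame,
                             num_tiles_minus2=None):
    """Derive the number of tiles when the count is coded as 'minus2'."""
    if single_tile_in_atlas_frame_flag:
        # The single-tile case is fully described by the flag itself.
        return 1
    if num_tiles_minus2 is None:
        if not single_partition_per_tile_flag:
            raise ValueError("tile count must be signalled in this case")
        # Inferred value: one tile per partition.
        num_tiles_minus2 = num_partitions_in_atlas_frame - 2
    if not 0 <= num_tiles_minus2 <= num_partitions_in_atlas_frame - 2:
        raise ValueError("afti_num_tiles_in_atlas_frame_minus2 out of range")
    return num_tiles_minus2 + 2
```

Because a count of one tile is already signalled by afti_single_tile_in_atlas_frame_flag, the smallest coded value (0) can stand for two tiles, which is the source of the saving mentioned in [0076].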

[0077] (Decoding and Derivation of Extended Encoding Parameters) The extended encoding parameters decoded from the encoded data by the extended information decoding unit 3023 will be explained below.

[0078] Figure 9 shows an example of the syntax for extended coding parameter information in AFPS, which is a picture / frame-level parameter set.

[0079] afve_overriden_flag: This flag indicates whether or not to update the mesh displacement coordinate system. If this flag is equal to true, the mesh displacement coordinate system will be updated based on the value of mdu_displacement_coordinate_system described below. If this flag is equal to false, the mesh displacement coordinate system will not be updated.

[0080] afve_subdivision_iteration_count: Indicates the number of mesh subdivision iterations.

[0081] (Decoding and Derivation of Attribute Tile Level Encoding Parameters) This section explains the "attribute-level tile definition (attribute tile information)," which is the attribute tile level encoding parameter decoded from the encoded data by the extended information decoding unit 3023. In the V3C standard, a partition division common to all data is defined as a tile, but there may also be a partition division that applies only to attribute data. Such a different tile is called an "attribute tile," and its tile definition is called "attribute tile information." The syntax elements and parameters of attribute tile information are basically the same as those of tile information, but the difference is that the application is limited to attributes, and encoding and decoding are done on an attribute-by-attribute basis (for each attrIdx).

[0082] Figure 11 shows an example of the syntax for attribute tile information coding parameters in AFPS, which is a picture / frame-level parameter set.

[0083] aftai_single_tile_in_atlas_frame_flag[attrIdx]: If aftai_single_tile_in_atlas_frame_flag[attrIdx] is equal to 1, it specifies that there is only one tile for the attribute signaled per attribute video data unit with index attrIdx. If aftai_single_tile_in_atlas_frame_flag[attrIdx] is equal to 0, it specifies that there are two or more tiles for the attribute signaled per attribute video data unit with index attrIdx.

[0084] aftai_uniform_partition_spacing_flag[attrIdx]: If aftai_uniform_partition_spacing_flag[attrIdx] is equal to 1, it specifies that the atlas tiling for the signaled attribute in the attribute video data unit with index attrIdx will use a method that uniformly divides the column and row boundaries across the attribute atlas frame (attribute frame). Information corresponding to these boundaries is signaled using the syntax elements aftai_partition_cols_width_minus1[attrIdx] and aftai_partition_rows_height_minus1[attrIdx], respectively. If aftai_uniform_partition_spacing_flag[attrIdx] is equal to 0, it specifies that the atlas tiling for the signaled attribute in the attribute video data unit with index attrIdx will use a method that may result in column and row boundaries that are not uniformly divided across atlas frames. In this case, these boundaries are signaled using the syntax elements aftai_num_partition_columns_minus1[attrIdx] and aftai_num_partition_rows_minus1[attrIdx], as well as the syntax elements aftai_partition_column_width_minus1[attrIdx][i] and aftai_partition_row_height_minus1[attrIdx][i]. If not present, the value of aftai_uniform_partition_spacing_flag[attrIdx] is inferred to be equal to 1.

[0085] aftai_partition_cols_width_minus1[attrIdx]: The value of aftai_partition_cols_width_minus1[attrIdx] plus 1 specifies the width of the attribute tile partition column (tile column width) of the attribute video data unit with index attrIdx, excluding the rightmost attribute tile partition column of the attribute atlas frame, in units of 64 samples, when aftai_uniform_partition_spacing_flag[attrIdx] is equal to 1. The value of aftai_partition_cols_width_minus1[attrIdx] is in the range of 0 to asve_attribute_frame_width[attrIdx] / 64 - 1. If it is not present, the value of aftai_partition_cols_width_minus1[attrIdx] is inferred to be equal to asve_attribute_frame_width[attrIdx] / 64 - 1.

[0086] aftai_partition_rows_height_minus1[attrIdx]: The value of aftai_partition_rows_height_minus1[attrIdx] plus 1 specifies the height of the attribute tile partition rows (tile row height) of the attribute video data unit with index attrIdx, excluding the bottommost attribute tile partition row of the attribute atlas frame, in units of 64 samples, when aftai_uniform_partition_spacing_flag[attrIdx] is equal to 1. The value of aftai_partition_rows_height_minus1[attrIdx] is in the range of 0 to asve_attribute_frame_height[attrIdx] / 64 - 1. If it is not present, the value of aftai_partition_rows_height_minus1[attrIdx] is inferred to be equal to asve_attribute_frame_height[attrIdx] / 64 - 1.

[0087] aftai_num_partition_columns_minus1[attrIdx]: The value of aftai_num_partition_columns_minus1[attrIdx] plus 1 specifies the number of attribute tile partition columns in the attribute video data with index attrIdx that are used to partition the attribute frame when aftai_uniform_partition_spacing_flag[attrIdx] is equal to 0. The value of aftai_num_partition_columns_minus1[attrIdx] is in the range of 0 to asve_attribute_frame_width[attrIdx] / 64 - 1. If aftai_single_tile_in_atlas_frame_flag[attrIdx] is 1, the value of aftai_num_partition_columns_minus1[attrIdx] is assumed to be equal to 0.

[0088] aftai_num_partition_rows_minus1[attrIdx]: The value of aftai_num_partition_rows_minus1[attrIdx] plus 1 specifies the number of attribute tile partition rows of attribute video data with index attrIdx that are used to partition the attribute atlas frame when aftai_uniform_partition_spacing_flag[attrIdx] is equal to 0. The value of aftai_num_partition_rows_minus1[attrIdx] is in the range of 0 to asve_attribute_frame_height[attrIdx] / 64 - 1. If aftai_single_tile_in_atlas_frame_flag[attrIdx] is 1, the value of aftai_num_partition_rows_minus1[attrIdx] is assumed to be equal to 0.

[0089] aftai_partition_column_width_minus1[attrIdx][i]: The value of aftai_partition_column_width_minus1[attrIdx][i] plus 1 specifies the width of the i-th attribute tile partition column of the attribute video data with index attrIdx in units of 64 samples.

[0090] aftai_partition_row_height_minus1[attrIdx][i]: The value of aftai_partition_row_height_minus1[attrIdx][i] plus 1 specifies the height in 64 samples of the i-th attribute tile partition row of the attribute video data with index attrIdx.

[0091] aftai_single_partition_per_tile_flag[attrIdx]: If aftai_single_partition_per_tile_flag[attrIdx] is equal to 1, it specifies that each attribute tile of the attribute indicated by the attribute video data unit with index attrIdx contains one tile partition. If aftai_single_partition_per_tile_flag[attrIdx] is equal to 0, it specifies that an attribute tile of the attribute video data unit with index attrIdx may contain multiple attribute tile partitions. If it is not present, the value of aftai_single_partition_per_tile_flag[attrIdx] is inferred to be equal to 1.

[0092] aftai_num_tiles_in_atlas_frame_minus1[attrIdx]: The value of aftai_num_tiles_in_atlas_frame_minus1[attrIdx] plus 1 specifies the number of attribute tiles in each attribute atlas frame for the attribute signaled by the attribute video data unit at index attrIdx. The value of aftai_num_tiles_in_atlas_frame_minus1[attrIdx] is in the range of 0 to NumPartitionsInAtlasFrameAtt[attrIdx]-1. If aftai_num_tiles_in_atlas_frame_minus1[attrIdx] does not exist and aftai_single_partition_per_tile_flag[attrIdx] is equal to 1, the value of aftai_num_tiles_in_atlas_frame_minus1[attrIdx] is presumed to be equal to NumPartitionsInAtlasFrameAtt[attrIdx] - 1. Here, the variable NumPartitionsInAtlasFrameAtt[attrIdx] is set to be equal to NumPartitionColumnsAtt[attrIdx] * NumPartitionRowsAtt[attrIdx]. If aftai_single_tile_in_atlas_frame_flag[attrIdx] is equal to 0, then NumPartitionsInAtlasFrameAtt[attrIdx] must be greater than 1.

[0093] aftai_top_left_partition_idx[attrIdx][i]: aftai_top_left_partition_idx[attrIdx][i] specifies the partition index of the attribute tile partition located in the top-left corner of the i-th tile of the attribute video data with index attrIdx. The value of aftai_top_left_partition_idx[attrIdx][i] ranges from 0 to NumPartitionsInAtlasFrameAtt[attrIdx] - 1. The length of the aftai_top_left_partition_idx[attrIdx][i] syntax element is Ceil(Log2(NumPartitionsInAtlasFrameAtt[attrIdx])) bits.
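As an illustration of the length rule above, the following Python sketch computes Ceil(Log2(NumPartitionsInAtlasFrameAtt[attrIdx])) with integer arithmetic. The function name and arguments are hypothetical and are not part of the syntax described here.

```python
def top_left_partition_idx_bits(num_partition_columns: int,
                                num_partition_rows: int) -> int:
    """Bit length of aftai_top_left_partition_idx for one attribute frame.

    Hypothetical helper: NumPartitionsInAtlasFrameAtt is taken as
    columns * rows, and Ceil(Log2(n)) is computed as (n - 1).bit_length(),
    which is exact for every integer n >= 1.
    """
    num_partitions = num_partition_columns * num_partition_rows
    if num_partitions < 1:
        raise ValueError("at least one partition is required")
    return (num_partitions - 1).bit_length()


if __name__ == "__main__":
    # A 4 x 3 partition grid has 12 partitions, which need 4 bits.
    print(top_left_partition_idx_bits(4, 3))
```

With a single partition the computed length is 0 bits, so in that case no bits would be read for the index under this rule.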

[0094] aftai_bottom_right_partition_column_offset[attrIdx][i]: aftai_bottom_right_partition_column_offset[attrIdx][i] specifies the offset between the column position of the attribute tile partition of the attribute video data with index attrIdx located in the bottom-right corner of the i-th attribute tile and the column position of the attribute tile partition with a partition index equal to aftai_top_left_partition_idx[attrIdx][i]. When aftai_single_partition_per_tile_flag[attrIdx] is equal to 1, the value of aftai_bottom_right_partition_column_offset[attrIdx][i] is inferred to be equal to 0.

[0095] aftai_bottom_right_partition_row_offset[attrIdx][i]: aftai_bottom_right_partition_row_offset[attrIdx][i] specifies the offset between the row position of the attribute tile partition of the attribute video data with index attrIdx located in the bottom-right corner of the i-th attribute tile and the row position of the attribute tile partition with a partition index equal to aftai_top_left_partition_idx[attrIdx][i]. When aftai_single_partition_per_tile_flag[attrIdx] is equal to 1, the value of aftai_bottom_right_partition_row_offset[attrIdx][i] is inferred to be equal to 0.

[0096] aftai_signalled_tile_id_flag[attrIdx]: If aftai_signalled_tile_id_flag[attrIdx] is equal to 1, it specifies that the attribute tile ID of each attribute tile in the attribute video data with index attrIdx will be signaled. If aftai_signalled_tile_id_flag[attrIdx] is equal to 0, it specifies that the attribute tile ID will not be signaled.

[0097] aftai_signalled_tile_id_length_minus1[attrIdx]: The value of aftai_signalled_tile_id_length_minus1[attrIdx] plus 1 specifies the number of bits used to represent the syntax element aftai_tile_id[attrIdx][i], if it exists. The value of aftai_signalled_tile_id_length_minus1[attrIdx] is in the range of 0 to 15. If it does not exist, the value of aftai_signalled_tile_id_length_minus1[attrIdx] is estimated to be equal to Ceil( Log2( aftai_num_tiles_in_atlas_frame_minus1[attrIdx] + 1 ) ) - 1.

[0098] aftai_tile_id[attrIdx][i]: Specifies the tile ID of the i-th tile in the atlas attribute nominal frame with attribute index attrIdx. If it is not present, the value of aftai_tile_id[attrIdx][i] is inferred to be the StartId[attrIdx] corresponding to attrIdx plus i. This inference applies to all values of i from 0 to aftai_num_tiles_in_atlas_frame_minus1[attrIdx]. As an atlas bitstream conformance requirement, aftai_tile_id[attrIdx][i] must not be equal to aftai_tile_id[attrIdx][j] for any i != j. The length of the syntax element aftai_tile_id[attrIdx][i] is aftai_signalled_tile_id_length_minus1[attrIdx] + 1 bits. Additionally, as a bitstream conformance requirement, the TileIndexToID array must contain a single-digit value. Furthermore, if aftai_tile_id[attrIdx][i] is not present and afve_consistent_tiling_across_attribute_video_flag is 1, the value of aftai_tile_id[attrIdx][i] is inferred to be aftai_tile_id[refAttrIdx][i].

[0099] (Configuration of Attribute Tile Information Syntax) An attribute video stream frame (attribute frame) can be divided into one or more partitions, and attribute tiles can be constructed from these units (partitions). Typical cases include:
- Using the entire attribute frame as a single attribute tile without dividing it (aftai_single_tile_in_atlas_frame_flag[attrIdx]==1).
- Dividing the attribute frame into multiple partitions and using one partition as one attribute tile (aftai_single_tile_in_atlas_frame_flag[attrIdx]==0 and aftai_single_partition_per_tile_flag[attrIdx]==1).
- Dividing the attribute frame into multiple partitions and using one or more horizontally and vertically consecutive partitions as a single attribute tile (aftai_single_tile_in_atlas_frame_flag[attrIdx]==0 and aftai_single_partition_per_tile_flag[attrIdx]==0).

[0100] An attribute frame can be divided into tile partitions (hereinafter also called partitions) of NumPartitionColumns * NumPartitionRows. When dividing, you can choose to divide the frame at equal intervals or at specified units. NumPartitionColumns and NumPartitionRows are the number of partitions in the horizontal and vertical directions, respectively.

[0101] Note that tiles are not limited to attribute frames; they can also be attributes, geometries, displacements, or meshes. In other words, the following syntax elements and their bitstream conformance conditions can also be used for attribute, geometry, displacement, and mesh tiles.

[0102] Figure 11 is a diagram of the syntax for atlas_frame_attribute_tile_information() tile information for V-DMC.

[0103] The extended information decoding unit 3023 of the atlas information decoding unit 302 decodes the syntax element aftai_single_tile_in_atlas_frame_flag[attrIdx]. aftai_single_tile_in_atlas_frame_flag[attrIdx] is a binary flag that indicates whether the attribute frame consists of a single tile or not, and has a value that indicates the attribute frame consists of a single tile (e.g., 1) or a value that indicates the attribute frame consists of multiple tiles (e.g., 0). If the value of aftai_single_tile_in_atlas_frame_flag[attrIdx] is a value that indicates multiple tiles, the extended information decoding unit 3023 decodes the syntax element aftai_uniform_partition_spacing_flag[attrIdx]. Here, aftai_uniform_partition_spacing_flag[attrIdx] is a binary flag that indicates whether or not to divide the attribute frame into equally spaced partitions. It can take either a value (e.g., 1) to indicate that the attribute frame should be divided into equally spaced partitions, or a value (e.g., 0) to indicate that the attribute frame should not be divided into equally spaced partitions.

[0104] The extended information decoding unit 3023 decodes parameters indicating the position and size of the tiles.

[0105] If aftai_uniform_partition_spacing_flag[attrIdx] is 1, the extended information decoding unit 3023 decodes the syntax elements aftai_partition_cols_width_minus1[attrIdx] and aftai_partition_rows_height_minus1[attrIdx], which indicate the width (column width) and height (row height) of each partition except for the rightmost column and the bottommost row. For each i=0..NumPartitionColumnsAtt[attrIdx]-1 and j=0..NumPartitionRowsAtt[attrIdx]-1, PartitionPosXAtt[attrIdx][i], PartitionPosYAtt[attrIdx][j], PartitionWidthAtt[attrIdx][i], and PartitionHeightAtt[attrIdx][j], which represent the x and y coordinates of the top-left corner, the width, and the height of each partition, are derived as follows.

[0106] widthPartition = ( aftai_partition_cols_width_minus1[attrIdx] + 1 ) * 64
NumPartitionColumnsAtt[attrIdx] = asve_attribute_frame_width[attrIdx] / widthPartition
PartitionPosXAtt[attrIdx][ 0 ] = 0
PartitionWidthAtt[attrIdx][ 0 ] = widthPartition
for( i = 1; i < NumPartitionColumnsAtt[attrIdx] - 1; i++ ) {
  PartitionPosXAtt[attrIdx][ i ] = PartitionPosXAtt[attrIdx][ i - 1 ] + PartitionWidthAtt[attrIdx][ i - 1 ]
  PartitionWidthAtt[attrIdx][ i ] = widthPartition
}
heightPartition = ( aftai_partition_rows_height_minus1[attrIdx] + 1 ) * 64
NumPartitionRowsAtt[attrIdx] = asve_attribute_frame_height[attrIdx] / heightPartition
PartitionPosYAtt[attrIdx][ 0 ] = 0
PartitionHeightAtt[attrIdx][ 0 ] = heightPartition
for( j = 1; j < NumPartitionRowsAtt[attrIdx] - 1; j++ ) {
  PartitionPosYAtt[attrIdx][ j ] = PartitionPosYAtt[attrIdx][ j - 1 ] + PartitionHeightAtt[attrIdx][ j - 1 ]
  PartitionHeightAtt[attrIdx][ j ] = heightPartition
}

If aftai_uniform_partition_spacing_flag[attrIdx] is 0, the extended information decoding unit 3023 decodes the syntax elements aftai_num_partition_columns_minus1[attrIdx] and aftai_num_partition_rows_minus1[attrIdx], which indicate the number of tile partitions in the horizontal and vertical directions.

[0107] For each i=0..NumPartitionColumnsAtt[attrIdx]-1 and j=0..NumPartitionRowsAtt[attrIdx]-1, PartitionPosXAtt[attrIdx][i], PartitionPosYAtt[attrIdx][j], PartitionWidthAtt[attrIdx][i], and PartitionHeightAtt[attrIdx][j], which represent the x and y coordinates of the top-left corner, the width, and the height of each partition, are derived as follows.

[0108] NumPartitionColumnsAtt[attrIdx] = aftai_num_partition_columns_minus1[attrIdx] + 1
PartitionPosXAtt[attrIdx][ 0 ] = 0
PartitionWidthAtt[attrIdx][ 0 ] = ( aftai_partition_column_width_minus1[attrIdx][ 0 ] + 1 ) * 64
for( i = 1; i < NumPartitionColumnsAtt[attrIdx] - 1; i++ ) {
  PartitionPosXAtt[attrIdx][ i ] = PartitionPosXAtt[attrIdx][ i - 1 ] + PartitionWidthAtt[attrIdx][ i - 1 ]
  PartitionWidthAtt[attrIdx][ i ] = ( aftai_partition_column_width_minus1[attrIdx][ i ] + 1 ) * 64
}
NumPartitionRowsAtt[attrIdx] = aftai_num_partition_rows_minus1[attrIdx] + 1
PartitionPosYAtt[attrIdx][ 0 ] = 0
PartitionHeightAtt[attrIdx][ 0 ] = ( aftai_partition_row_height_minus1[attrIdx][ 0 ] + 1 ) * 64
for( j = 1; j < NumPartitionRowsAtt[attrIdx] - 1; j++ ) {
  PartitionPosYAtt[attrIdx][ j ] = PartitionPosYAtt[attrIdx][ j - 1 ] + PartitionHeightAtt[attrIdx][ j - 1 ]
  PartitionHeightAtt[attrIdx][ j ] = ( aftai_partition_row_height_minus1[attrIdx][ j ] + 1 ) * 64
}

Also, if the number of partitions in the horizontal and vertical directions is 2 or more, PartitionPosXAtt[attrIdx][i], PartitionPosYAtt[attrIdx][j], PartitionWidthAtt[attrIdx][i], and PartitionHeightAtt[attrIdx][j], which indicate the x and y coordinates of the top-left corner, the width, and the height of the rightmost and bottommost partitions, are derived as follows:

[0109] PartitionPosXAtt[attrIdx][ NumPartitionColumnsAtt[attrIdx] - 1 ] = PartitionPosXAtt[attrIdx][ NumPartitionColumnsAtt[attrIdx] - 2 ] + PartitionWidthAtt[attrIdx][ NumPartitionColumnsAtt[attrIdx] - 2 ]
PartitionWidthAtt[attrIdx][ NumPartitionColumnsAtt[attrIdx] - 1 ] = asve_attribute_frame_width[attrIdx] - PartitionPosXAtt[attrIdx][ NumPartitionColumnsAtt[attrIdx] - 1 ]
PartitionPosYAtt[attrIdx][ NumPartitionRowsAtt[attrIdx] - 1 ] = PartitionPosYAtt[attrIdx][ NumPartitionRowsAtt[attrIdx] - 2 ] + PartitionHeightAtt[attrIdx][ NumPartitionRowsAtt[attrIdx] - 2 ]
PartitionHeightAtt[attrIdx][ NumPartitionRowsAtt[attrIdx] - 1 ] = asve_attribute_frame_height[attrIdx] - PartitionPosYAtt[attrIdx][ NumPartitionRowsAtt[attrIdx] - 1 ]

Here, the width and height of each partition are set in multiples of 64 samples, but the unit is not limited to 64; it may be replaced with 32, 128, or 256.
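The partition derivation for one axis (columns or rows) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the normative derivation: the function names are hypothetical, the frame size is assumed to be at least one partition unit, and for the non-uniform case the input list is assumed to carry one entry per partition, with the last entry ignored whenever there are two or more partitions.

```python
from typing import List, Tuple

UNIT = 64  # partition granularity in samples, as used in the text


def layout_axis(frame_size: int, sizes: List[int]) -> Tuple[List[int], List[int]]:
    """Return (positions, sizes) of the partitions along one axis.

    Positions accumulate the preceding sizes. When there are two or more
    partitions, the last one takes whatever remains of the frame.
    """
    positions = [0]
    for i in range(1, len(sizes)):
        positions.append(positions[i - 1] + sizes[i - 1])
    result = list(sizes)
    if len(result) >= 2:
        result[-1] = frame_size - positions[-1]
    return positions, result


def uniform_axis(frame_size: int, size_minus1: int) -> Tuple[List[int], List[int]]:
    """Uniform spacing: every partition except the last has the same size."""
    step = (size_minus1 + 1) * UNIT
    count = frame_size // step
    return layout_axis(frame_size, [step] * count)


def explicit_axis(frame_size: int, num_minus1: int,
                  sizes_minus1: List[int]) -> Tuple[List[int], List[int]]:
    """Non-uniform spacing: sizes are given per partition (assumed layout)."""
    count = num_minus1 + 1
    sizes = [(sizes_minus1[i] + 1) * UNIT for i in range(count)]
    return layout_axis(frame_size, sizes)


if __name__ == "__main__":
    print(uniform_axis(320, 1))
    print(explicit_axis(256, 2, [0, 1, 0]))
```

Calling the same helper once with the frame width and once with the frame height yields the column and row layouts; the partition grid is their Cartesian product.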

[0110] The extended information decoding unit 3023 decodes the syntax element aftai_single_partition_per_tile_flag[attrIdx]. Here, aftai_single_partition_per_tile_flag[attrIdx] is a flag that indicates whether each tile consists of only a single partition, and has a value that indicates each tile consists of only a single partition (e.g., 1) or a value that indicates a tile may consist of multiple partitions (e.g., 0). If aftai_single_partition_per_tile_flag[attrIdx] is a value that indicates multiple partitions, the extended information decoding unit 3023 decodes the syntax element aftai_num_tiles_in_atlas_frame_minus1[attrIdx] and performs the following processing to derive the tile parameters from one or more selected partitions. Here, aftai_num_tiles_in_atlas_frame_minus1[attrIdx] plus 1 is the number of tiles that make up the attribute frame.

[0111] The extended information decoding unit 3023 decodes the syntax elements aftai_top_left_partition_idx[attrIdx][i], aftai_bottom_right_partition_column_offset[attrIdx][i], and aftai_bottom_right_partition_row_offset[attrIdx][i] for each i=0..aftai_num_tiles_in_atlas_frame_minus1[attrIdx]. Here, aftai_top_left_partition_idx[attrIdx][i] is the index of the partition where the top-left corner (point) of the i-th tile is located, aftai_bottom_right_partition_column_offset[attrIdx][i] is the horizontal offset of the bottom-right corner of the i-th tile relative to the top-left corner of the i-th tile, and aftai_bottom_right_partition_row_offset[attrIdx][i] is the vertical offset of the bottom-right corner of the i-th tile relative to the top-left corner of the i-th tile.

[0112] Based on the decoded syntax above, the column and row indices of the top-left and bottom-right partitions of each tile i, namely topLeftColumnAtt[attrIdx][i], topLeftRowAtt[attrIdx][i], bottomRightColumnAtt[attrIdx][i], and bottomRightRowAtt[attrIdx][i], are calculated as follows.

[0113] topLeftColumnAtt[attrIdx][ i ] = aftai_top_left_partition_idx[attrIdx][ i ] % NumPartitionColumnsAtt[attrIdx]
topLeftRowAtt[attrIdx][ i ] = aftai_top_left_partition_idx[attrIdx][ i ] / NumPartitionColumnsAtt[attrIdx]
bottomRightColumnAtt[attrIdx][ i ] = topLeftColumnAtt[attrIdx][ i ] + aftai_bottom_right_partition_column_offset[attrIdx][ i ]
bottomRightRowAtt[attrIdx][ i ] = topLeftRowAtt[attrIdx][ i ] + aftai_bottom_right_partition_row_offset[attrIdx][ i ]
if bottomRightColumnAtt[attrIdx][ i ], bottomRightRowAtt[attrIdx][ i ] and if( asve_attribute_frame_width[attrIdx] + 63 ) / 64 - 1 , ( asve_attribute_frame_height[attrIdx] + 63 ) / 64 - 1
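The corner derivation above maps a raster-scan partition index to a column and a row and then applies the two offsets. A minimal Python sketch follows; the function name and return shape are hypothetical, and integer division is assumed for the row computation.

```python
from typing import Tuple


def tile_partition_corners(top_left_partition_idx: int,
                           column_offset: int,
                           row_offset: int,
                           num_partition_columns: int) -> Tuple[int, int, int, int]:
    """Return (topLeftColumn, topLeftRow, bottomRightColumn, bottomRightRow).

    Partitions are indexed in raster-scan order, so the column is the
    remainder and the row is the integer quotient of the index divided by
    the number of partition columns.
    """
    top_left_column = top_left_partition_idx % num_partition_columns
    top_left_row = top_left_partition_idx // num_partition_columns
    bottom_right_column = top_left_column + column_offset
    bottom_right_row = top_left_row + row_offset
    return top_left_column, top_left_row, bottom_right_column, bottom_right_row


if __name__ == "__main__":
    # In a grid with 4 partition columns, partition 5 sits at column 1, row 1.
    print(tile_partition_corners(5, 1, 1, 4))
```

When each tile holds a single partition, both offsets are 0 and the bottom-right corner coincides with the top-left one.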

[0114] The 3D data decoding device 31, which decodes mesh data or point cloud data, has means for decoding syntax elements indicating the position of attribute tiles and for deriving the column topLeftColumnAtt[attrIdx] of the upper-left partition of the attribute tile, the row topLeftRowAtt[attrIdx] of the upper-left partition, the column bottomRightColumnAtt[attrIdx] of the lower-right partition of the attribute tile, and the row bottomRightRowAtt[attrIdx] of the lower-right partition. The 3D data decoding device 31 may decode a bitstream that satisfies specific bitstream conformance conditions for the columns of the partitions of the i-th attribute tile (topLeftColumnAtt[attrIdx][i], bottomRightColumnAtt[attrIdx][i]), the column of the partition of the j-th attribute tile (topLeftColumnAtt[attrIdx][j]), the rows of the partitions of the i-th attribute tile (topLeftRowAtt[attrIdx][i], bottomRightRowAtt[attrIdx][i]), and the row of the partition of the j-th attribute tile (topLeftRowAtt[attrIdx][j]). The 3D data decoding device 31 may decode a bitstream that satisfies the following bitstream conformance conditions.

[0115] (Decoding of additional information) The additional information decoding unit 3024 decodes the additional information (SEI) of the tile submesh information SEI from the encoded data (additional information encoded stream). Furthermore, it may decode SEI for post-processing such as zippering to remove cracks in the mesh, or SEI for extracting mesh displacement for each LoD. The extended information decoding unit 3023 and the extended information encoding unit 1011 decode and encode the syntax elements of the tile submesh information SEI, respectively.

[0116] (Relationship between Tiles and Codec Tiles) The 3D data decoding device 31, which decodes mesh data or point cloud data, decodes the atlas frame (displacement frame gFrame) and attribute frame aFrame in tile units. The atlas frame (displacement frame gFrame) and attribute frame aFrame are encoded with codecs such as HEVC and VVC. Codecs such as HEVC and VVC also partition the frame into rectangular regions called independent decoding units (tiles). Such a tile is a unit that can be decoded independently (in parallel) within that frame. Tiles that are rectangular regions defined in video codecs such as HEVC and VVC are called "codec tiles (independently decodable units, video tiles)" to distinguish them from V3C and V-DMC tiles (geometry tiles, attribute tiles). The definition of a codec tile is called "codec tile (video tile) information". If the displacement sub-bitstream is encoded in ISO/IEC 23008-2 (HEVC), the codec tile is defined as a tile as defined in ISO/IEC 23008-2. If the displacement sub-bitstream is encoded in ISO/IEC 23090-3 (VVC), the codec tile is defined as a tile as defined in ISO/IEC 23090-3.

[0117] (Tile Submesh Information SEI) Figure 21 shows an example of the syntax for Tile Submesh Information SEI.

[0118] This is an example of SEI syntax indicating the index of the submesh associated with a tile. Furthermore, if the frame containing the tile is encoded by a video codec, the SEI may also include syntax indicating the index of each codec tile associated with the tile. The tile submesh information SEI includes the syntax elements tmsm_tile_type_idc, tmsm_num_tiles_minus1, tmsm_tile_id_length_minus1, tmsm_tile_id[i], tmsm_use_single_mesh_flag[i], tmsm_num_submeshes_minus2[i], tmsm_submesh_id_length_minus1[i], tmsm_submesh_id[i][j], tmsm_num_codec_tiles_in_tile_minus1[i], and tmsm_codec_tile_idx[i][j]. The tile submesh information SEI may also include the flags tmsm_persistance_mapping_flag, tmsm_codec_tile_signal_flag, tmsm_tile_type_flag[i], tmsm_geo_codec_tile_alignment_flag, and tmsm_attr_codec_tile_alignment_flag. The semantics of each syntax element are as follows. The persistence scope of this SEI message is the remainder of the bitstream, or until a new tile submesh mapping SEI message with the same value of tmsm_persistance_mapping_flag is received. Mappings defined before the previous SEI message persist if the value of tmsm_persistance_mapping_flag is equal to 1.

[0119] tmsm_persistance_mapping_flag: If tmsm_persistance_mapping_flag is equal to 1, it indicates that the tile submesh mapping is persistent. If it is equal to 0, it indicates that the tile submesh mapping is only valid for the current frame.

[0120] tmsm_tile_type_idc: If tmsm_tile_type_idc is equal to 0, the tile indicated in this SEI message is of type P_TILE or I_TILE (displacement frame tile, geometry tile) and transmits geometry-related information. If it is equal to 1, the tile indicated in this SEI message is of type P_TILE_ATTR or I_TILE_ATTR (attribute tile) and transmits attribute-related information.

[0121] tmsm_num_tiles_minus1: The value of tmsm_num_tiles_minus1 plus 1 indicates the total number of tiles of type P_TILE and I_TILE in the encoded atlas sequence (CAS) if tmsm_tile_type_idc is equal to 0. If tmsm_tile_type_idc is equal to 1, it indicates the total number of tiles of type P_TILE_ATTR and I_TILE_ATTR in the encoded atlas sequence (CAS). tmsm_num_tiles_minus1 must be equal to TotalTileCount - 1. The value assigned to TotalTileCount depends on the value of tmsm_tile_type_idc. If tmsm_tile_type_idc is equal to 0, TotalTileCount is assigned afti_num_tiles_in_atlas_frame_minus1 + 1. If tmsm_tile_type_idc is equal to 1 and asve_attribute_information_present_flag is equal to 1, TotalTileCount is assigned aftai_num_tiles_in_atlas_frame_minus1[i] + 1. Index i takes the range 0..AspsAttributeNominalFrameCount - 1.

[0122] if( tmsm_tile_type_idc == 0 ) {
  TotalTileCount = afti_num_tiles_in_atlas_frame_minus1 + 1
}
if( tmsm_tile_type_idc == 1 ) {
  if( asve_attribute_information_present_flag ) {
    for( i = 0; i < AspsAttributeNominalFrameCount; i++ )
      TotalTileCount = aftai_num_tiles_in_atlas_frame_minus1[ i ] + 1
  }
}

tmsm_tile_id_length_minus1: The value of tmsm_tile_id_length_minus1 plus 1 specifies the number of bits used to represent the syntax element tmsm_tile_id[ i ], if it is present. The value of tmsm_tile_id_length_minus1 is in the range of 0 to 15.

[0123] tmsm_tile_id[i]: Specifies the tile ID of the i-th tile. If tmsm_tile_id[i] does not exist, the value of tmsm_tile_id[i] is presumed to be equal to i for each i in the range from 0 to tmsm_num_tiles_minus1, including tmsm_num_tiles_minus1. A bitstream conformance requirement is that tmsm_tile_id[i] is not equal to tmsm_tile_id[k] for all i != k.

[0124] tmsm_use_single_mesh_flag[i]: If tmsm_use_single_mesh_flag[i] is equal to 1, it indicates that the tile at index i has only one submesh. If tmsm_use_single_mesh_flag[i] is 0, it indicates that the tile at index i may have multiple submeshes.

[0125] tmsm_num_submeshes_minus2[i]: The value of tmsm_num_submeshes_minus2[i] plus 2 specifies the number of submeshes present in the atlas tile.

[0126] tmsm_submesh_id_length_minus1[i]: The value of tmsm_submesh_id_length_minus1[i] plus 1 specifies the number of bits used to represent the syntax element tmsm_submesh_id[i][j]. The value of tmsm_submesh_id_length_minus1[i] is in the range of 0 to 15. If tmsm_submesh_id_length_minus1[i] is not present, its value is inferred to be equal to Ceil(Log2(NumSubmeshes))-1.

[0127] tmsm_submesh_id[i][j]: Specifies the submesh ID of the j-th submesh associated with the tile at index i. If it does not exist, the value of tmsm_submesh_id[i][j] is assumed to be equal to j for each j in the range from 0 to NumSubMeshes-1. The length of tmsm_submesh_id[i][j] is tmsm_submesh_id_length_minus1[i]+1 bits.

[0128] tmsm_num_codec_tiles_in_tile_minus1[i]: The value of tmsm_num_codec_tiles_in_tile_minus1[i] plus 1 indicates the number of codec tiles contained in the i-th tile. If tmsm_num_codec_tiles_in_tile_minus1[i] is not present for i within the range of 0 to tmsm_num_tiles_minus1, it is inferred as follows:

[0129] If tmsm_tile_type_idc is 0, tmsm_num_codec_tiles_in_tile_minus1[i] is inferred to be equal to afti_num_tiles_in_atlas_frame_minus1.

[0130] If tmsm_tile_type_idc is 1, tmsm_num_codec_tiles_in_tile_minus1[i] is inferred to be equal to aftai_num_tiles_in_atlas_frame_minus1[i].

[0131] tmsm_codec_tile_idx[i][j]: tmsm_codec_tile_idx[i][j] indicates the index of the j-th codec tile contained in the i-th tile. If tmsm_codec_tile_idx[i][j] does not exist for the i-th tile, it is assumed to be equal to j for the range of j from 0 to tmsm_num_codec_tiles_in_tile_minus1[i] (including both ends).

[0132] tmsm_codec_tile_idx[ i ][ j ] = j, for j = 0..tmsm_num_codec_tiles_in_tile_minus1[ i ].

[0133] Figure 19 illustrates the relationship between tiles and codec tiles and the values of tmsm_codec_tile_idx[i][j] in the case where tmsm_tile_type_idc == 0 (Figure 19(a)) and tmsm_tile_type_idc == 1 (Figure 19(b)), as an example of a tile submesh information SEI. Codec tiles may be macroblock slices of AVC / H.264, slices or tiles of HEVC / H.265, or slices, tiles or subpictures of VVC / H.266. Codec tiles initialize arithmetic codes such as CABAC at the beginning of each partition (also called a subset or segment) that divides the frame. The partitioning method for codec tiles is derived by decoding the video codec bitstream parameter sets (Sequence Parameter Set (SPS), Picture Parameter Set (PPS)) and slice headers. Tiles represent the partitioning of the atlas frame. The tile division method is obtained by decoding the atlas information (atlas_frame_tile_information()) from the atlas information encoded stream.

[0134] According to the above configuration, the additional information decoding unit 3024 decodes the index tmsm_codec_tile_idx[i][j] of each codec tile associated with the tile from the encoded data. This allows the codec tiles necessary for decoding a certain tile to be identified as the codec tiles indicated by the index tmsm_codec_tile_idx[i][j], and thus only these necessary codec tiles need to be decoded, and they can be decoded in parallel.
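The effect described above can be sketched in Python: given the codec tile indices associated with a tile, only those codec tiles are handed to a decoder, and they are processed concurrently. Everything here is illustrative. The mapping layout, the function names, and the decode_codec_tile callback are assumptions and do not correspond to any particular codec API; when no indices are signalled for a tile, the sketch falls back to the inferred values 0..tmsm_num_codec_tiles_in_tile_minus1[i].

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


def codec_tiles_for_tile(codec_tile_idx: Dict[int, List[int]],
                         tile_index: int,
                         num_codec_tiles_minus1: int) -> List[int]:
    """Return the codec tile indices needed to decode one tile.

    codec_tile_idx maps a tile index i to its signalled list of
    tmsm_codec_tile_idx[i][j] values (hypothetical layout). If nothing is
    signalled for the tile, index j is used for each j.
    """
    signalled: Optional[List[int]] = codec_tile_idx.get(tile_index)
    if signalled is None:
        return list(range(num_codec_tiles_minus1 + 1))
    return list(signalled)


def decode_tile(tile_index: int,
                codec_tile_idx: Dict[int, List[int]],
                num_codec_tiles_minus1: int,
                decode_codec_tile: Callable[[int], T]) -> List[T]:
    """Decode only the codec tiles of one tile, running them in parallel."""
    needed = codec_tiles_for_tile(codec_tile_idx, tile_index, num_codec_tiles_minus1)
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the requested indices.
        return list(pool.map(decode_codec_tile, needed))


if __name__ == "__main__":
    mapping = {0: [0, 1, 2, 3], 1: [4, 5]}
    # Stand-in decoder that only labels the codec tile it was given.
    print(decode_tile(1, mapping, 1, lambda idx: f"codec tile {idx}"))
```

In this sketch, decoding tile 1 touches codec tiles 4 and 5 only; codec tiles 0 to 3 are never passed to the decoder.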

[0135] Figure 19(a) shows an example of an image frame consisting of one geometry tile and the codec tiles contained within that geometry tile. Tiles and codec tiles are numbered (indexed) in raster scan order. In the example in Figure 19(a), tmsm_tile_type_idc is 0 and tmsm_num_tiles_minus1 is 0 (afti_num_tiles_in_atlas_frame_minus1 is 0). The 0th tile contains the 0th, 1st, 2nd, and 3rd codec tiles. Each number shown on the tile / codec tile in Figure 19(a) indicates the value of j in tmsm_codec_tile_idx[0][j], and the values of tmsm_codec_tile_idx[0][j] can be expressed as follows:

[0136] tmsm_codec_tile_idx[ 0 ][ 0 ] = 0
tmsm_codec_tile_idx[ 0 ][ 1 ] = 1
tmsm_codec_tile_idx[ 0 ][ 2 ] = 2
tmsm_codec_tile_idx[ 0 ][ 3 ] = 3

Figure 19(b) shows an example of an image frame composed of two attribute tiles and the codec tiles contained within each attribute tile. Attribute tiles and codec tiles are numbered (indexed) in raster scan order. In the example in Figure 19(b), tmsm_tile_type_idc is 1 and tmsm_num_tiles_minus1 is 1 (aftai_num_tiles_in_atlas_frame_minus1 is 1). The 0th attribute tile contains the 0th, 1st, 2nd, and 3rd codec tiles, and the 1st attribute tile contains the 4th and 5th codec tiles. Each number shown on the attribute tile / codec tile in Figure 19(b) represents the value of j in tmsm_codec_tile_idx[i][j], and the values of tmsm_codec_tile_idx[i][j] can be expressed as follows:

[0137] tmsm_codec_tile_idx[ 0 ][ 0 ] = 0
tmsm_codec_tile_idx[ 0 ][ 1 ] = 1
tmsm_codec_tile_idx[ 0 ][ 2 ] = 2
tmsm_codec_tile_idx[ 0 ][ 3 ] = 3
tmsm_codec_tile_idx[ 1 ][ 0 ] = 4
tmsm_codec_tile_idx[ 1 ][ 1 ] = 5

(Configuration including both tiles and attribute tiles) In another configuration, a single tile submesh information SEI may carry the relationships between tiles and submeshes for both tiles (geometry) and attribute tiles (attribute). The tile indices in the loop i=0..tmsm_num_tiles_minus1 are assigned first to all tiles (geometry tiles), and then to all attribute tiles. For example, if the number of tiles is 2 (afti_num_tiles_in_atlas_frame_minus1 = 1), the number of attributes is 3 (AspsAttributeNominalFrameCount = 3), and the numbers of attribute tiles are 3, 1, and 2 (aftai_num_tiles_in_atlas_frame_minus1[ i ] = {2, 0, 1}), then the tiles at index = 0..1 are tiles (geometry tiles), and the tiles at index = 2..7 are attribute tiles. The syntax elements are the same as those explained for Figure 21. In this configuration, tmsm_tile_type_idc takes one of the values 0, 1, or 2, indicating that the SEI contains information about geometry tiles, attribute tiles, or both geometry and attribute tiles, respectively. The syntax elements of this configuration are explained below. The semantics of syntax elements whose explanations are omitted are assumed to be the same as those already explained.
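The index ordering of the combined configuration, with all geometry tiles first and then the attribute tiles of each attribute in turn, can be illustrated with a short Python sketch. The function name and the tuple representation are hypothetical; the inputs are tile counts (the minus1 syntax values plus 1).

```python
from typing import List, Optional, Tuple


def combined_tile_order(num_geometry_tiles: int,
                        attribute_tile_counts: List[int]
                        ) -> List[Tuple[str, Optional[int], int]]:
    """List the tiles of a combined SEI in index order.

    Each entry is (kind, attribute index or None, tile position within its
    own frame). Geometry tiles come first, followed by the tiles of
    attribute 0, attribute 1, and so on.
    """
    order: List[Tuple[str, Optional[int], int]] = []
    for tile in range(num_geometry_tiles):
        order.append(("geometry", None, tile))
    for attr_idx, count in enumerate(attribute_tile_counts):
        for tile in range(count):
            order.append(("attribute", attr_idx, tile))
    return order


if __name__ == "__main__":
    # Two geometry tiles and three attributes with 3, 1, and 2 tiles.
    for index, entry in enumerate(combined_tile_order(2, [3, 1, 2])):
        print(index, entry)
```

For the example in the text, indices 0 and 1 are geometry tiles and indices 2 to 7 are attribute tiles, eight entries in total.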

[0138] tmsm_tile_type_idc: If tmsm_tile_type_idc is equal to 0, the tile indicated in this SEI message is of type P_TILE or I_TILE (displacement frame tile, geometry tile) and transmits geometry-related information. If it is equal to 1, the tile indicated in this SEI message is of type P_TILE_ATTR or I_TILE_ATTR (attribute tile) and transmits attribute-related information. If it is equal to 2, the tile indicated in this SEI message is of type P_TILE, I_TILE, P_TILE_ATTR, or I_TILE_ATTR and transmits both geometry-related and attribute-related information.

[0139] tmsm_num_tiles_minus1: The value of tmsm_num_tiles_minus1 plus 1 indicates the total number of tiles of type P_TILE and I_TILE in the encoded atlas sequence (CAS) if tmsm_tile_type_idc is equal to 0. If tmsm_tile_type_idc is equal to 1, it indicates the total number of tiles of type P_TILE_ATTR and I_TILE_ATTR in the encoded atlas sequence (CAS). If tmsm_tile_type_idc is equal to 2, it indicates the total number of tiles of type P_TILE, I_TILE, P_TILE_ATTR, and I_TILE_ATTR in the encoded atlas sequence (CAS). tmsm_num_tiles_minus1 must be equal to TotalTileCount - 1. If tmsm_tile_type_idc is equal to 0 or 2, afti_num_tiles_in_atlas_frame_minus1 + 1 is added to TotalTileCount. If tmsm_tile_type_idc is equal to 1 or 2 and asve_attribute_information_present_flag is equal to 1, aftai_num_tiles_in_atlas_frame_minus1[i] + 1 is added to TotalTileCount for each index i in the range 0..AspsAttributeNominalFrameCount - 1.

[0140] TotalTileCount = 0
if( tmsm_tile_type_idc == 0 || tmsm_tile_type_idc == 2 ) {
  TotalTileCount += afti_num_tiles_in_atlas_frame_minus1 + 1
}
if( tmsm_tile_type_idc == 1 || tmsm_tile_type_idc == 2 ) {
  if( asve_attribute_information_present_flag ) {
    for( i = 0; i < AspsAttributeNominalFrameCount; i++ )
      TotalTileCount += aftai_num_tiles_in_atlas_frame_minus1[ i ] + 1
  }
}

If tmsm_tile_type_idc is 0, or if tmsm_tile_type_idc is 2 and index i is less than or equal to afti_num_tiles_in_atlas_frame_minus1, tmsm_num_codec_tiles_in_tile_minus1[i] is inferred to be equal to afti_num_tiles_in_atlas_frame_minus1.

[0141] If tmsm_tile_type_idc is equal to 1, or if tmsm_tile_type_idc is equal to 2 and index i is greater than afti_num_tiles_in_atlas_frame_minus1, then num_codec_tiles_in_tile_minus1[i] is inferred to be equal to aftai_num_tiles_in_atlas_frame_minus1[i].
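
The TotalTileCount derivation above can be sketched as follows. This is only an illustrative sketch: the function name and the list argument attr_num_tiles_minus1 (one "number of tiles minus 1" value per nominal attribute frame) are assumptions introduced here, not names from the syntax. The sample values are those stated for the Figure 20 example (one tile and two attribute tiles).

```python
def total_tile_count(tile_type_idc, afti_num_tiles_minus1,
                     attribute_info_present, attr_num_tiles_minus1):
    """Accumulate TotalTileCount following the derivation in the text.

    attr_num_tiles_minus1 is a hypothetical list with one entry per
    nominal attribute frame (the per-attribute "tiles minus 1" values).
    """
    total = 0
    # Geometry tiles are counted when idc is 0 (geometry only) or 2 (both).
    if tile_type_idc in (0, 2):
        total += afti_num_tiles_minus1 + 1
    # Attribute tiles are counted when idc is 1 or 2 and attributes are present.
    if tile_type_idc in (1, 2) and attribute_info_present:
        for minus1 in attr_num_tiles_minus1:
            total += minus1 + 1
    return total


# Figure 20 example: tmsm_tile_type_idc == 2, one tile, two attribute tiles.
figure20_total = total_tile_count(2, 0, True, [1])
figure20_num_tiles_minus1 = figure20_total - 1
```

With these inputs the count is 3, so tmsm_num_tiles_minus1 would be 2, which agrees with the value given for Figure 20.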

[0142] tmsm_tile_type_flag[i]: If tmsm_tile_type_flag[i] is equal to the first value (0), the tile at index i is of type P_TILE or I_TILE and signals information related to the geometry. If tmsm_tile_type_flag[i] is equal to the second value (1), the tile at index i is of type P_TILE_ATTR or I_TILE_ATTR and signals information related to the attribute.

[0143] Figure 20 is a diagram illustrating the relationship between tiles and codec tiles and the value of tmsm_codec_tile_idx[i][j] in the case of tmsm_tile_type_idc == 2, as an example of a tile submesh SEI.

[0144] Figure 20 shows an example of an image frame consisting of one tile and two attribute tiles, together with the codec tiles contained in each tile and attribute tile. Tiles, attribute tiles, and codec tiles are numbered (indexed) in raster scan order. In the example in Figure 20, i=0 denotes the tile, and i=1 and i=2 denote the attribute tiles; for each i, the figure shows the codec tiles contained in that tile. For example, the 0th tile contains the 0th, 1st, 2nd, and 3rd codec tiles. In this example, tmsm_tile_type_idc is equal to 2, and tmsm_num_tiles_minus1 is 2 (afti_num_tiles_in_atlas_frame_minus1 is 0, and afati_num_tiles_in_atlas_frame_minus1[0] is 1). The numbers shown in each tile / codec tile and attribute tile / codec tile in Figure 20 represent the value of j in tmsm_codec_tile_idx[i][j], and the values of tmsm_codec_tile_idx[i][j] can be expressed as follows:

[0145]
  tmsm_codec_tile_idx[ 0 ][ 0 ] = 0
  tmsm_codec_tile_idx[ 0 ][ 1 ] = 1
  tmsm_codec_tile_idx[ 0 ][ 2 ] = 2
  tmsm_codec_tile_idx[ 0 ][ 3 ] = 3
  tmsm_codec_tile_idx[ 1 ][ 0 ] = 4
  tmsm_codec_tile_idx[ 1 ][ 1 ] = 5
  tmsm_codec_tile_idx[ 1 ][ 2 ] = 6
  tmsm_codec_tile_idx[ 1 ][ 3 ] = 7
  tmsm_codec_tile_idx[ 2 ][ 0 ] = 8
  tmsm_codec_tile_idx[ 2 ][ 1 ] = 9

(Another syntax configuration) In an alternative configuration, the syntax structure shown in Figure 23 may be used. In this configuration, both geometry tiles and attribute tiles are included as tiles. tmsm_num_tiles_minus1, which indicates the total number of tiles (geometry tiles and attribute tiles) minus 1, and tmsm_tile_type_flag[i], which indicates whether the tile at index i is a tile (geometry tile) or an attribute tile, are decoded.

[0146] The following describes the syntax elements of this structure. The semantics of syntax elements whose explanations are omitted are assumed to be the same as those already explained (the same applies hereinafter).

[0147] tmsm_num_tiles_minus1: The value of tmsm_num_tiles_minus1 plus 1 indicates the number of tiles present in the encoded atlas sequence (CAS). tmsm_num_tiles_minus1 must be equal to the variable TotalTileCount-1, which is obtained below.

[0148]
  TotalTileCount = afti_num_tiles_in_atlas_frame_minus1 + 1
  if( asve_attribute_information_present_flag ) {
    for( i = 0; i < AspsAttributeNominalFrameCount; i++ )
      TotalTileCount += aftai_num_tiles_in_atlas_frame_minus1[ i ] + 1
  }

tmsm_tile_type_flag[ i ]: If tmsm_tile_type_flag[ i ] is 0, the tile at index i is of type P_TILE or I_TILE and transmits geometry-related information. If it is 1, the tile at index i is of type P_TILE_ATTR or I_TILE_ATTR and transmits attribute-related information.

[0149] In the above configuration, the following restrictions are imposed on the tile index i.

[0150] The tiles (geometry tiles and attribute tiles) at index i, for i=0..tmsm_num_tiles_minus1, are ordered so that all geometry tiles come first, followed by all attribute tiles.
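
The ordering restriction on the tile index can be checked with a short routine like the one below. This is a minimal sketch; the function name and the representation of tmsm_tile_type_flag[i] as a Python list are assumptions made for illustration.

```python
def tiles_are_ordered(tile_type_flags):
    """Return True if every geometry tile (flag 0) precedes every attribute tile (flag 1)."""
    seen_attribute = False
    for flag in tile_type_flags:
        if flag == 1:
            seen_attribute = True
        elif seen_attribute:
            # A geometry tile appeared after an attribute tile.
            return False
    return True
```

For the one-tile, two-attribute-tile layout of Figure 20 the flag sequence would be [0, 1, 1], which satisfies the restriction, whereas [0, 1, 0] would not.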

[0151] (Configuration of signal flag) In another configuration (Figure 20 + Configuration of signal flag), the flag tmsm_codec_tile_signal_flag, which indicates whether or not to signal the index showing the relationship between the tile and the codec tile, is decoded. If the flag has the first value (e.g., 1), it indicates that the index showing the relationship between the tile and the codec tile, tmsm_codec_tile_idx[i][j], is encoded and decoded using this SEI. If the flag has the second value (e.g., 0), it indicates that the index showing the relationship between the tile and the codec tile is not encoded and decoded using this SEI. The following example syntax may also be used.

[0152]
  tile_submesh_mapping( payload ) {
    tmsm_persistance_mapping_flag
    tmsm_tile_type_idc
    tmsm_num_tiles_minus1
    tmsm_tile_id_length_minus1
    tmsm_codec_tile_signal_flag
    for( i = 0; i < tmsm_num_tiles_minus1 + 1; i++ ) {
      ...
      if( tmsm_codec_tile_signal_flag ) {
        tmsm_num_codec_tiles_in_tile_minus1[ i ]
        for( j = 0; j < tmsm_num_codec_tiles_in_tile_minus1[ i ] + 1; j++ )
          tmsm_codec_tile_idx[ i ][ j ]
      }
    }
  }

Here, the semantics of the syntax element tmsm_codec_tile_signal_flag are as follows.

[0153] tmsm_codec_tile_signal_flag: When tmsm_codec_tile_signal_flag has the first value (e.g., 1), it indicates that the index of the codec tile corresponding to the tile is signaled in this SEI. When it has the second value (e.g., 0), it indicates that the index of the codec tile corresponding to the tile is not signaled in this SEI.

[0154] According to the above configuration, it is possible to decode a flag indicating whether to signal the index indicating the relationship between the tile and the codec tile, and make the relationship between the tile and the codec tile optional (not specified). Also, when the relationship between the tile and the codec tile is not required, it is possible to reduce the amount of code.

[0155] (Configuration of alignment flags) In another configuration, the tile submesh information SEI is used to decode the alignment flag tmsm_geo_codec_tile_alignment_flag, which indicates whether the partitioning (partition structure) of the tile (geometry tile) and the partitioning of the codec tile are the same, and the alignment flag tmsm_attr_codec_tile_alignment_flag, which indicates whether the partitioning of the attribute tile and the partitioning of the codec tile are the same. The syntax may also be as shown in the following example.

[0156]
  tile_submesh_mapping( payload ) {
    tmsm_persistance_mapping_flag
    tmsm_num_tiles_minus1
    tmsm_tile_id_length_minus1
    tmsm_geo_codec_tile_alignment_flag
    tmsm_attr_codec_tile_alignment_flag
    for( i = 0; i < tmsm_num_tiles_minus1 + 1; i++ ) {
      tmsm_tile_id[ i ]
      TileIdxToID[ i ] = tmsm_num_submeshes_min2[ i ]
      tmsm_tile_type_flag[ i ]
      tmsm_use_single_mesh_flag[ i ]
      if( !tmsm_use_single_mesh_flag ) { ...
      if( ( tmsm_tile_type_flag[ i ] == 0 && !tmsm_geo_codec_tile_alignment_flag ) ||
          ( tmsm_tile_type_flag[ i ] == 1 && !tmsm_attr_codec_tile_alignment_flag ) ) {
        tmsm_num_codec_tiles_in_tile_minus1[ i ]
        for( j = 0; j < tmsm_num_codec_tiles_in_tile_minus1[ i ] + 1; j++ )
          tmsm_codec_tile_idx[ i ][ j ]
      }
    }
  }

Here, the semantics of the syntax elements tmsm_geo_codec_tile_alignment_flag and tmsm_attr_codec_tile_alignment_flag are as follows.

[0157] tmsm_geo_codec_tile_alignment_flag: If tmsm_geo_codec_tile_alignment_flag is 1, it indicates that the codec tile partitioning is the same as the geometry tile partitioning. In this case, each i-th tile contains only the region of the i-th codec tile, and the i-th tile does not contain the region of the j-th (j!=i) codec tile. If it is 0, it indicates that the codec tile partitioning may not be the same as the geometry tile partitioning.

[0158] tmsm_attr_codec_tile_alignment_flag: If tmsm_attr_codec_tile_alignment_flag is 1, it indicates that the codec tile partitioning is the same as the attribute tile partitioning. In this case, each i-th attribute tile contains only the region of the i-th codec tile, and the i-th attribute tile does not contain the region of the j-th (j!=i) codec tile. If it is 0, it indicates that the codec tile partitioning may not be the same as the attribute tile partitioning.

[0159] With the above configuration, it is possible to independently determine whether each geometry tile and attribute tile matches the partitioning of the codec tile. Furthermore, it has the effect of reducing the overhead of the encoding.
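
To illustrate how a decoder might use these alignment flags, the sketch below resolves which codec tiles must be decoded for tile i. It is an assumption-laden illustration, not the specified decoding process: the function name, the dictionary signaled_idx (tile index to the explicitly decoded tmsm_codec_tile_idx values), and the assumption that tile and codec tile indices share one numbering in the aligned case are all introduced here.

```python
def codec_tiles_for_tile(i, tile_type_flag, geo_aligned, attr_aligned, signaled_idx):
    """Return the codec tile indices needed to decode tile i.

    signaled_idx is a hypothetical dict {tile index: [codec tile indices]}
    holding the tmsm_codec_tile_idx[i][j] values that were explicitly decoded.
    """
    # Pick the alignment flag that matches the tile type (0: geometry, 1: attribute).
    aligned = geo_aligned if tile_type_flag == 0 else attr_aligned
    if aligned:
        # Aligned partitioning: the i-th tile covers exactly the i-th codec tile.
        return [i]
    # Otherwise the mapping must have been signaled explicitly.
    return list(signaled_idx[i])
```

Because the result lists only the codec tiles that belong to the requested tile, those codec tiles could be handed to parallel decoders without touching the rest of the frame.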

[0160] (Another configuration of the alignment flag) In an alternative configuration of the alignment flag, the alignment flag tmsm_codec_tile_alignment_flag, which indicates whether the partitioning of the decoded tile and the partitioning of the codec tile are the same, is decoded in the tile submesh information SEI. The syntax may be as follows:

[0161]
  tile_submesh_mapping( payload ) {
    tmsm_persistance_mapping_flag
    tmsm_tile_type_idc
    tmsm_num_tiles_minus1
    tmsm_tile_id_length_minus1
    tmsm_codec_tile_alignment_flag
    for( i = 0; i < tmsm_num_tiles_minus1 + 1; i++ ) {
      tmsm_tile_id[ i ]
      TileIdxToID[ i ] = tmsm_num_submeshes_min2[ i ]
      tmsm_use_single_mesh_flag[ i ]
      if( !tmsm_use_single_mesh_flag ) { ...
      if( !tmsm_codec_tile_alignment_flag ) {
        tmsm_num_codec_tiles_in_tile_minus1[ i ]
        for( j = 0; j < tmsm_num_codec_tiles_in_tile_minus1[ i ] + 1; j++ )
          tmsm_codec_tile_idx[ i ][ j ]
      }
    }
  }

According to the above configuration, the flag tmsm_codec_tile_alignment_flag is transmitted, and the syntax elements indicating the relationship between the tile and the codec tile are encoded only when the flag is 0. This has the effect of reducing the overhead in code amount.

[0162] (Configuration of signal and alignment flags) Alternatively, the syntax structure shown in Figure 22 may be used. In this alternative configuration, if tmsm_codec_tile_signal_flag in the tile submesh information SEI has the first value, the relationship between the tile and the codec tile is transmitted. If tmsm_codec_tile_signal_flag has a value other than the first value, the relationship between the tile and the codec tile is not indicated. In addition, the alignment flag tmsm_codec_tile_alignment_flag, which indicates whether the partitioning of the tiles (geometry tiles and attribute tiles) and the partitioning of the codec tiles are the same, is decoded.

[0163] If the alignment flag is false, decode tmsm_num_codec_tiles_in_tile_minus1[i] which indicates the number of codec tiles minus 1, and tmsm_codec_tile_idx[i][j] which is the index of the codec tile contained in tile i.

[0164]
  if( tmsm_codec_tile_signal_flag && !tmsm_codec_tile_alignment_flag ) {
    tmsm_num_codec_tiles_in_tile_minus1[ i ]
    for( j = 0; j < tmsm_num_codec_tiles_in_tile_minus1[ i ] + 1; j++ )
      tmsm_codec_tile_idx[ i ][ j ]
  }

As another configuration of the signal flag and the alignment flag, the syntax structure shown in Figure 24 may be used. In the tile submesh information SEI, an alignment flag tmsm_geo_codec_tile_alignment_flag indicating whether the partitioning of the geometry tile and the partitioning of the codec tile are the same, and an alignment flag tmsm_attr_codec_tile_alignment_flag indicating whether the partitioning of the attribute tile and the partitioning of the codec tile are the same may be decoded.

[0165]
  if( tmsm_codec_tile_signal_flag ) {
    tmsm_geo_codec_tile_alignment_flag
    tmsm_attr_codec_tile_alignment_flag
  }

If the tile at index i is a geometry tile (tmsm_tile_type_flag[ i ] == 0) and the geometry alignment flag is false, or if the tile at index i is an attribute tile (tmsm_tile_type_flag[ i ] == 1) and the attribute alignment flag is false, then tmsm_num_codec_tiles_in_tile_minus1[ i ], indicating the number of codec tiles minus 1, and the index tmsm_codec_tile_idx[ i ][ j ] of each codec tile included in tile i are decoded.

[0166]
  if( tmsm_codec_tile_signal_flag ) {
    if( ( tmsm_tile_type_flag[ i ] == 0 && !tmsm_geo_codec_tile_alignment_flag ) ||
        ( tmsm_tile_type_flag[ i ] == 1 && !tmsm_attr_codec_tile_alignment_flag ) ) {
      tmsm_num_codec_tiles_in_tile_minus1[ i ]
      for( j = 0; j < tmsm_num_codec_tiles_in_tile_minus1[ i ] + 1; j++ )
        tmsm_codec_tile_idx[ i ][ j ]
    }
  }

According to the above configuration, by transmitting the flag tmsm_codec_tile_signal_flag as a syntax element, it is possible to select whether to signal an index indicating the relationship between the tile and the codec tile.

[0167] Furthermore, by transmitting tmsm_codec_tile_alignment_flag, or tmsm_geo_codec_tile_alignment_flag and tmsm_attr_codec_tile_alignment_flag, as syntax elements, it is possible to indicate with a 1-bit flag whether the partition structure of the tile and that of the codec tile are equal. This makes it possible to reduce the overhead in code amount.
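
The condition under which the per-tile codec tile indices are present in the Figure 24 style syntax can be written as a small predicate. This is a sketch only; the function name is hypothetical, and the arguments simply mirror the decoded values of tmsm_codec_tile_signal_flag, tmsm_tile_type_flag[i], tmsm_geo_codec_tile_alignment_flag, and tmsm_attr_codec_tile_alignment_flag.

```python
def codec_tile_idx_present(signal_flag, tile_type_flag, geo_aligned, attr_aligned):
    """Return True if tmsm_codec_tile_idx[i][j] is signaled for a tile of the given type."""
    if not signal_flag:
        # The tile / codec tile relationship is not signaled in this SEI at all.
        return False
    if tile_type_flag == 0:
        # Geometry tile: indices are needed only when partitioning is not aligned.
        return not geo_aligned
    # Attribute tile: same rule, using the attribute alignment flag.
    return not attr_aligned
```

When the predicate is false for every tile, no index list is transmitted, which is where the code amount saving described above comes from.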

[0168] (Decoding of base mesh) Figure 5 is a functional block diagram showing the configuration of the base mesh decoding unit 303. The base mesh decoding unit 303 consists of a mesh decoding unit 3031, a motion information decoding unit 3032, a mesh motion compensation unit 3033, a reference mesh memory 3034, a switch 3035, a switch 3036, and a skip decoding unit 3037. The base mesh decoding unit 303 may also have a configuration that includes a base mesh inverse quantization unit (not shown) before the output of the base mesh. Switches 3035 and 3036 are connected to the mesh decoding unit 3031 side if the base mesh to be decoded is coded without referring to other base meshes, e.g., base meshes that have already been coded and decoded (intra-coded). If the base mesh to be decoded is coded by referring to other base meshes (inter-coded), the switches are connected to the side that performs motion compensation. When motion compensation is performed, the target vertex coordinates are decoded by referring to the already decoded vertex coordinates and motion information. Alternatively, if coding of the base mesh to be decoded is skipped and another base mesh is used in its place as the decoding target (skip coding), the switches are connected to the skip decoding unit 3037.

[0169] Each base mesh consists of one or more submeshes. If multiple submeshes exist, the tile header of the atlas data subbitstream requires an ID to find the submesh corresponding to the tile. Here, a submesh is a subset of a mesh defined by specifying a part of a 3D model, and is a mesh created by dividing a mesh into multiple parts. By dividing a mesh into subsets to finely control a part of a 3D model, it is possible to define meshes of specific ranges individually. Each submesh has its own vertex coordinates, normal vectors, texture coordinates, etc., and can be manipulated and edited individually. The mesh of a given frame is called a mesh frame.

[0170] The mesh decoding unit 3031 decodes the intra-coded base mesh coded stream and outputs the base mesh (base mesh vertex positions, base mesh vertex position vector). The coding method used may include Draco or Edgebreaker.

[0171] The motion information decoding unit 3032 decodes the inter-coded base mesh coded stream and outputs motion information (mesh motion information, mesh motion vector) for each vertex of the reference mesh described later. Entropy coding such as arithmetic coding is used as the coding method.

[0172] The mesh motion compensation unit 3033 performs motion compensation on each vertex of the reference mesh input from the reference mesh memory 3034 based on motion information, and outputs the motion-compensated mesh.

[0173] The reference mesh memory 3034 is a memory that holds the decoded mesh for reference in subsequent decoding processes.

[0174] (Decoding of Mesh Displacement) Figure 6 is a functional block diagram showing the configuration of the mesh displacement decoding unit 305. The mesh displacement decoding unit 305 consists of a CABAC decoding unit (arithmetic decoding unit 3051, multi-level conversion unit 3052, context selection unit 3056, context initialization unit 3057), an inverse quantization unit 3053, an inverse transformation unit 3054, and a coordinate system transformation unit 3055.

[0175] (Coordinate System) The coordinate system used for mesh displacement (3D vector) is one of the following two types of coordinate systems.

[0176] Cartesian coordinate system (canonical): A Cartesian coordinate system defined commonly across the entire 3D space, i.e., the (X,Y,Z) coordinate system. Its axis directions do not change within the same time instant (same frame, same tile).

[0177] Local coordinate system (local): A Cartesian coordinate system defined for each region or vertex in 3D space. Its axis directions can change within the same time instant (same frame, same tile). It is a coordinate system with normal (D), tangent (U), and bi-tangent (V) axes. That is, a Cartesian coordinate system consisting of a first axis (D) indicated by the normal vector n_vec at a certain vertex (or the face containing a certain vertex), and a second axis (U) and a third axis (V) indicated by two tangent vectors t_vec and b_vec that are orthogonal to the normal vector n_vec. n_vec, t_vec, and b_vec are 3-dimensional vectors. The (D, U, V) coordinate system may also be called the (n, t, b) coordinate system.
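
A displacement expressed in the local (D, U, V) system can be mapped to the Cartesian system as a linear combination of the three axis vectors, which is the operation performed later by the coordinate system transformation unit. The sketch below illustrates this with plain tuples; the function name is an assumption, and the axis vectors in the test values are arbitrary sample data rather than values from the specification.

```python
def local_to_cartesian(d, n_vec, t_vec, b_vec):
    """Map a displacement d = (D, U, V) in the local frame to Cartesian (X, Y, Z).

    n_vec, t_vec, b_vec are the local axes expressed in Cartesian coordinates.
    """
    return tuple(
        d[0] * n + d[1] * t + d[2] * b
        for n, t, b in zip(n_vec, t_vec, b_vec)
    )
```

For instance, with the normal along Z, the tangent along X, and the bi-tangent along Y, a local displacement (2, 3, 4) corresponds to the Cartesian displacement (3, 4, 2).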

[0178] (Decoding and Derivation of Sequence-Level Control Parameters) Here, we will explain the sequence-level control parameters that are decoded from the encoded data by the mesh displacement decoding unit 305.

[0179] Figure 8 shows an example of the syntax for ASVE (ASPS Vdmc Extension), a sequence-level mesh data extended coding parameter set. ASVE is one of the NAL units of atlas information and contains syntax elements to be applied to the atlas information coded stream. The semantics of each syntax element are as follows:

[0180] asve_subdivision_iteration_count: Indicates the number of iterations for mesh subdivision.

[0181] asve_displacement_coordinate_system: Coordinate system transformation information indicating the coordinate system of mesh displacement. If the value is equal to a predetermined first value (e.g., 0), it indicates the Cartesian coordinate system. If the value is equal to another second value (e.g., 1), it indicates the local coordinate system.

[0182] asve_1d_displacement_flag: This flag indicates whether the mesh displacement is one-dimensional or not. A value of true indicates that the mesh displacement is one-dimensional. A value of false indicates that the mesh displacement is three-dimensional.

[0183] (Decoding and Derivation of Picture / Frame-Level Control Parameters) Figure 9 shows an example of the syntax of extended coded parameter information in AFPS, which is a picture / frame-level parameter set. AFPS is one of the NAL units of atlas information and contains syntax elements to be applied to the atlas information coded stream. The semantics of each syntax element are as follows. AFPS includes atlas_frame_mesh_information().

[0184] afve_overriden_flag: This flag indicates whether or not to update the mesh displacement coordinate system. If this flag is equal to true, the mesh displacement coordinate system will be updated based on the value of afve_displacement_coordinate_system described below. If this flag is equal to false, the mesh displacement coordinate system will not be updated.

[0185] afve_subdivision_iteration_count: Indicates the number of mesh subdivision iterations.

[0186] afve_displacement_coordinate_system: Coordinate system transformation information indicating the coordinate system of mesh displacement. If the value is equal to the first value (e.g., 0), it indicates the Cartesian coordinate system. If the value is equal to the second value (e.g., 1), it indicates the local coordinate system. If the syntax element is not present, its value is inferred to be the value decoded from the ASPS, so the coordinate system indicated by the ASPS is used by default.

[0187] (Operation of the Mesh Displacement Decoding Unit) The arithmetic decoding unit 3051 decodes the mesh displacement coding stream, which has been arithmetically coded according to the value (context) representing the random variable, and outputs a binary signal. The binary signal may be an alpha code or a k-th order Exp-Golomb code. An Exp-Golomb code consists of a prefix and a suffix. The prefix is an exponentially increasing value, and the suffix is its remainder. When encoding and decoding the variable rem with an Exp-Golomb code, the prefix and suffix of the Exp-Golomb code are also called the prefix and suffix of rem.
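
As an illustration of the prefix / suffix structure, the sketch below implements the widely used k-th order Exp-Golomb binarization on bit strings. This is a generic textbook form written for clarity; the exact bit polarity and binarization used for the mesh displacement syntax elements may differ, and the function names are assumptions.

```python
def exp_golomb_encode(value, k):
    """Encode a non-negative integer as a k-th order Exp-Golomb bit string."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    shifted = value + (1 << k)
    num_bits = shifted.bit_length()
    # Prefix: leading zeros whose count selects an exponentially growing range.
    prefix = "0" * (num_bits - k - 1)
    # The remaining bits carry the marker bit and the offset inside that range.
    return prefix + format(shifted, "b")


def exp_golomb_decode(bits, k):
    """Decode one codeword from the start of a bit string.

    Returns (value, number of bits consumed).
    """
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = zeros + k + 1
    shifted = int(bits[zeros:zeros + length], 2)
    return shifted - (1 << k), zeros + length
```

Each additional leading zero doubles the size of the value range that the following bits address, which is the sense in which the prefix grows exponentially.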

[0188] The multi-level conversion unit 3052 converts the binary signal into a multi-level signal, which is the quantized mesh displacement Qdisp.

[0189] The context selection unit 3056 (context memory) has memory for holding contexts, derives a context used for arithmetic decoding of mesh displacements according to the state, and updates the value as necessary.

[0190] The context initialization unit 3057 initializes the context (the probability of a binary signal occurring).

[0191] (Mesh displacement derivation process) The mesh displacement decoding unit 305 decodes the syntax elements dismu_nz_subBlock, dismu_coeff_abs_level_gt0, dismu_coeff_abs_level_gt1, dismu_coeff_abs_level_gt2, dismu_coeff_abs_level_gt3, dismu_coeff_abs_level_rem, and dismu_coeff_sign by the following process and derives the quantized mesh displacement Qdisp.

[0192] The inverse quantization unit 3053 performs inverse quantization based on the quantization scale value iscale and decodes the mesh displacement Tdisp after the transformation (e.g., wavelet transform). Tdisp may be in a Cartesian coordinate system or a local coordinate system. iscale is a value derived from the quantization parameters of each component of the mesh displacement image. Inverse quantization may be performed on a submesh unit indicated by subMeshID (= displSubMeshID).

  Tdisp[subMeshID][0][] = (Qdisp[subMeshID][0][] * iscale[0] + iscaleOffset) >> iscaleShift
  Tdisp[subMeshID][1][] = (Qdisp[subMeshID][1][] * iscale[1] + iscaleOffset) >> iscaleShift
  Tdisp[subMeshID][2][] = (Qdisp[subMeshID][2][] * iscale[2] + iscaleOffset) >> iscaleShift

Here, iscaleOffset = 1<<(iscaleShift-1). iscaleShift may be a predetermined constant, or it may be encoded at the sequence level, picture / frame level, submesh level (= displSubMeshID), tile / patch level, etc., and the value decoded from the encoded data may be used.
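
The inverse quantization formula above amounts to a fixed-point multiply with rounding. A minimal sketch for one component of one submesh follows; the function name and the flat list representation of Qdisp are assumptions, and iscale_shift is assumed to be at least 1 so that the rounding offset is defined.

```python
def inverse_quantize(qdisp, iscale, iscale_shift):
    """Apply Tdisp = (Qdisp * iscale + iscaleOffset) >> iscaleShift to each coefficient."""
    if iscale_shift < 1:
        raise ValueError("iscale_shift must be at least 1")
    # Rounding offset: half of the divisor implied by the right shift.
    iscale_offset = 1 << (iscale_shift - 1)
    return [(q * iscale + iscale_offset) >> iscale_shift for q in qdisp]
```

Python's right shift on negative integers is an arithmetic (floor) shift, which matches the usual interpretation of >> in codec specifications; a port to another language would need to preserve that behavior.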

[0193] The inverse transformation unit 3054 performs an inverse transformation g (e.g., inverse wavelet transformation) to decode the mesh displacement d.

  d[0][] = g(Tdisp[subMeshID][0][])
  d[1][] = g(Tdisp[subMeshID][1][])
  d[2][] = g(Tdisp[subMeshID][2][])

The coordinate system transformation unit 3055 transforms the mesh displacement (coordinate system of the mesh displacement) to the Cartesian coordinate system based on the value of the coordinate system transformation information displacementCoordinateSystem. Specifically, when displacementCoordinateSystem == 1, it transforms the displacement in the local coordinate system to the displacement in the Cartesian coordinate system. Here, d is a three-dimensional vector indicating the mesh displacement before coordinate system transformation. disp is a three-dimensional vector indicating the mesh displacement after coordinate system transformation and is in the Cartesian coordinate system. n_vec, t_vec, b_vec are three-dimensional vectors (in the Cartesian coordinate system) corresponding to each axis of the local coordinate system of the target region or target vertex.

  if( displacementCoordinateSystem == 0 ) {
    disp = d
  } else if( displacementCoordinateSystem == 1 ) {
    disp = d[0] * n_vec3 + d[1] * t_vec3 + d[2] * b_vec3
  }

Here, n_vec3, t_vec3, b_vec3 are three-dimensional vectors (in the Cartesian coordinate system) corresponding to each axis of the local coordinate system of the target region with fluctuations suppressed. For example, the coordinate system vectors used for decoding are derived from the previous coordinate system and the current coordinate system as follows.

[0194]
  n_vec3 = (w*n_vec3 + (WT-w)*n_vec)>>wShift
  t_vec3 = (w*t_vec3 + (WT-w)*t_vec)>>wShift
  b_vec3 = (w*b_vec3 + (WT-w)*b_vec)>>wShift

Here, for example, wShift = 2, 3, 4, WT = 1<<wShift, w = 1..WT-1. For example, when w = 3 and wShift = 3, the coordinate system vectors are decoded as follows.

[0195]
  n_vec3 = (3*n_vec3 + 5*n_vec)>>3
  t_vec3 = (3*t_vec3 + 5*t_vec)>>3
  b_vec3 = (3*b_vec3 + 5*b_vec)>>3

(Mesh reconstruction) Figure 7 is a functional block diagram showing the configuration of the mesh reconstruction unit 307. The mesh reconstruction unit 307 consists of a mesh division unit 3071 and a mesh deformation unit 3072.

[0196] The mesh division unit 3071 divides the base mesh output from the base mesh decoding unit 303 and generates divided meshes.

[0197] Figure 12(a) shows a part of the base mesh (a triangle), which is composed of vertices v1, v2, and v3. v1, v2, and v3 are 3D vectors. The mesh division unit 3071 generates and outputs a divided mesh by adding new vertices v12, v13, and v23 at the midpoint of each side of the triangle (Figure 12(b)).

  v12 = (v1 + v2) / 2
  v13 = (v1 + v3) / 2
  v23 = (v2 + v3) / 2

Alternatively, the following may also be used:

  v12 = (v1 + v2 + 1) >> 1
  v13 = (v1 + v3 + 1) >> 1
  v23 = (v2 + v3 + 1) >> 1

The mesh deformation unit 3072 takes the divided mesh and mesh displacement as input, and generates and outputs a deformed mesh by adding the mesh displacements d12, d13, and d23 (Figure 12(c)). The mesh displacement is the output of the mesh displacement decoding unit 305 (coordinate system transformation unit 3055). d12, d13, and d23 are the mesh displacements corresponding to the vertices v12, v13, and v23 added by the mesh division unit 3071.

  v12' = v12 + d12
  v13' = v13 + d13
  v23' = v23 + d23

Note that d12 = disp[0][], d13 = disp[1][], and d23 = disp[2][] may also be used.
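
One subdivision and deformation step for a single triangle can be sketched as follows, using the integer rounding variant (a + b + 1) >> 1 for the midpoints. The function names and the tuple representation of vertices and displacements are assumptions made for illustration; connectivity update of the divided mesh is omitted.

```python
def midpoint(a, b):
    """Integer midpoint of two 3D points with rounding, (a + b + 1) >> 1 per component."""
    return tuple((x + y + 1) >> 1 for x, y in zip(a, b))


def add(v, d):
    """Component-wise sum of a vertex position and a displacement."""
    return tuple(x + y for x, y in zip(v, d))


def subdivide_and_displace(v1, v2, v3, d12, d13, d23):
    """Return the displaced edge midpoints (v12', v13', v23') of triangle v1-v2-v3."""
    v12 = midpoint(v1, v2)
    v13 = midpoint(v1, v3)
    v23 = midpoint(v2, v3)
    return add(v12, d12), add(v13, d13), add(v23, d23)
```

Repeating this step the number of times given by the subdivision iteration count would refine the base mesh further, with a displacement applied to every newly created vertex.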

[0198] (Structure of Tile Information Syntax) An atlas frame can be divided into units of one or more partitions, and tiles can be constructed from these units (partitions). Typical cases include:
- Using the entire atlas frame as a single tile without dividing it (afti_single_tile_in_atlas_frame_flag==1).
- Dividing the atlas frame into multiple partitions and using one partition as one tile (afti_single_tile_in_atlas_frame_flag==0 and afti_single_partition_per_tile_flag==1).
- Dividing the atlas frame into multiple partitions and using one or more partitions that are continuous in the horizontal and vertical directions as a single tile (afti_single_tile_in_atlas_frame_flag==0 and afti_single_partition_per_tile_flag==0).

[0199] The atlas frame can be divided into tile partitions (hereinafter also called partitions) of size NumPartitionColumns * NumPartitionRows. When dividing, you can choose to divide the frame at equal intervals or at specified units. NumPartitionColumns and NumPartitionRows are the number of partitions in the horizontal and vertical directions, respectively.

[0200] Note that tiles are not limited to atlas frames; they can also be attributes, geometries, displacements, or meshes. In other words, the following syntax elements and their bitstream conformance conditions can also be used for attribute, geometry, displacement, and mesh tiles.

[0201] Figure 10 shows the syntax for tile information. Tile information may also be provided using the atlas_frame_tile_information() syntax structure defined in the ISO / IEC 23090-5 V3C standard.

[0202] The tile information decoding unit 3022 decodes the syntax element afti_single_tile_in_atlas_frame_flag. afti_single_tile_in_atlas_frame_flag is a binary flag indicating whether the atlas frame consists of a single tile or not, and has a value indicating that the atlas frame consists of a single tile (e.g., 1) or a value indicating that the atlas frame consists of multiple tiles (e.g., 0). If the value of afti_single_tile_in_atlas_frame_flag is a value indicating multiple tiles, the tile information decoding unit 3022 decodes the syntax element afti_uniform_partition_spacing_flag. Here, afti_uniform_partition_spacing_flag is a binary flag indicating whether the atlas frame is divided into equally spaced partitions or not, and has a value indicating that the atlas frame is divided into equally spaced partitions (e.g., 1) or a value indicating that the atlas frame is divided into partitions with different spacings (e.g., 0).

[0203] The tile information decoding unit 3022 decodes parameters indicating the position and size of the tiles.

[0204] If afti_uniform_partition_spacing_flag is equal to 1, the tile information decoding unit 3022 decodes the syntax elements afti_partition_cols_width_minus1 and afti_partition_rows_height_minus1, which indicate the width (column width) and height (row height) of each partition except the rightmost column and the bottommost row. For each i=0..NumPartitionColumns-1 and j=0..NumPartitionRows-1, PartitionPosX[i], PartitionPosY[j], PartitionWidth[i], and PartitionHeight[j], which indicate the x and y coordinates of the top-left corner, the width, and the height of each partition, are calculated as follows.

[0205]
  partitionWidth = ( afti_partition_cols_width_minus1 + 1 ) * 64
  NumPartitionColumns = asps_frame_width / partitionWidth
  PartitionPosX[ 0 ] = 0
  PartitionWidth[ 0 ] = partitionWidth
  for( i = 1; i < NumPartitionColumns - 1; i++ ) {
    PartitionPosX[ i ] = PartitionPosX[ i - 1 ] + PartitionWidth[ i - 1 ]
    PartitionWidth[ i ] = partitionWidth
  }
  partitionHeight = ( afti_partition_rows_height_minus1 + 1 ) * 64
  NumPartitionRows = asps_frame_height / partitionHeight
  PartitionPosY[ 0 ] = 0
  PartitionHeight[ 0 ] = partitionHeight
  for( j = 1; j < NumPartitionRows - 1; j++ ) {
    PartitionPosY[ j ] = PartitionPosY[ j - 1 ] + PartitionHeight[ j - 1 ]
    PartitionHeight[ j ] = partitionHeight
  }

If afti_uniform_partition_spacing_flag is equal to 0, the tile information decoding unit 3022 decodes the syntax elements afti_num_partition_columns_minus1 and afti_num_partition_rows_minus1, which indicate the number of tile partitions in the horizontal and vertical directions.

[0206] For each i=0..NumPartitionColumns-1 and j=0..NumPartitionRows-1, PartitionPosX[i], PartitionPosY[j], PartitionWidth[i], and PartitionHeight[j], which indicate the x and y coordinates of the top-left corner, the width, and the height of each partition, are calculated as follows.

[0207]
  NumPartitionColumns = afti_num_partition_columns_minus1 + 1
  PartitionPosX[ 0 ] = 0
  PartitionWidth[ 0 ] = ( afti_partition_column_width_minus1[ 0 ] + 1 ) * 64
  for( i = 1; i < NumPartitionColumns - 1; i++ ) {
    PartitionPosX[ i ] = PartitionPosX[ i - 1 ] + PartitionWidth[ i - 1 ]
    PartitionWidth[ i ] = ( afti_partition_column_width_minus1[ i ] + 1 ) * 64
  }
  NumPartitionRows = afti_num_partition_rows_minus1 + 1
  PartitionPosY[ 0 ] = 0
  PartitionHeight[ 0 ] = ( afti_partition_row_height_minus1[ 0 ] + 1 ) * 64
  for( j = 1; j < NumPartitionRows - 1; j++ ) {
    PartitionPosY[ j ] = PartitionPosY[ j - 1 ] + PartitionHeight[ j - 1 ]
    PartitionHeight[ j ] = ( afti_partition_row_height_minus1[ j ] + 1 ) * 64
  }

Also, if the number of partitions in the horizontal and vertical directions is 2 or more, PartitionPosX[i], PartitionPosY[j], PartitionWidth[i], and PartitionHeight[j], which indicate the x and y coordinates of the top-left corner, the width, and the height of each partition in the rightmost column and bottommost row, are derived as follows.

[0208]
  PartitionPosX[ NumPartitionColumns - 1 ] = PartitionPosX[ NumPartitionColumns - 2 ] + PartitionWidth[ NumPartitionColumns - 2 ]
  PartitionWidth[ NumPartitionColumns - 1 ] = asps_frame_width - PartitionPosX[ NumPartitionColumns - 1 ]
  PartitionPosY[ NumPartitionRows - 1 ] = PartitionPosY[ NumPartitionRows - 2 ] + PartitionHeight[ NumPartitionRows - 2 ]
  PartitionHeight[ NumPartitionRows - 1 ] = asps_frame_height - PartitionPosY[ NumPartitionRows - 1 ]

Here, the width and height of each partition are set to multiples of 64, but they are not limited to 64; 64 can be replaced with 32, 128, or 256.
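
For the uniform spacing case, the position and size derivation along one axis (columns or rows) can be sketched as below. This is an illustrative sketch under stated assumptions: the function name and the unit parameter (64 by default, replaceable by 32, 128, or 256 as noted above) are introduced here, the partition count uses integer division as written in the text, and frames smaller than one partition are rejected rather than handled.

```python
def uniform_partitions(frame_size, size_minus1, unit=64):
    """Return (positions, sizes) of the partitions along one axis.

    frame_size corresponds to asps_frame_width or asps_frame_height, and
    size_minus1 to afti_partition_cols_width_minus1 or
    afti_partition_rows_height_minus1.
    """
    partition_size = (size_minus1 + 1) * unit
    count = frame_size // partition_size
    if count < 1:
        raise ValueError("frame is smaller than one partition")
    positions = [n * partition_size for n in range(count)]
    sizes = [partition_size] * count
    if count >= 2:
        # The last column / row absorbs whatever remains of the frame.
        sizes[-1] = frame_size - positions[-1]
    return positions, sizes
```

Calling the function once with the frame width and once with the frame height yields the PartitionPosX / PartitionWidth and PartitionPosY / PartitionHeight arrays, respectively.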

[0209] The tile information decoding unit 3022 decodes the syntax element afti_single_partition_per_tile_flag. Here, afti_single_partition_per_tile_flag is a flag indicating whether each tile consists of only a single partition, and has a value (e.g., 1) indicating that each tile consists of only a single partition, or a value (e.g., 0) indicating that each tile consists of multiple partitions. If afti_single_partition_per_tile_flag is a value indicating multiple partitions, the tile information decoding unit 3022 decodes the syntax element afti_num_tiles_in_atlas_frame_minus1 and performs the following processing to decode the tile parameters from one or more selected partitions. Here, afti_num_tiles_in_atlas_frame_minus1 plus 1 is the number of tiles that make up the atlas frame.

[0210] The tile information decoding unit 3022 decodes the syntax elements afti_top_left_partition_idx[i], afti_bottom_right_partition_column_offset[i], and afti_bottom_right_partition_row_offset[i] for each i=0..afti_num_tiles_in_atlas_frame_minus1. Here, afti_top_left_partition_idx[i] is the index of the partition in which the top-left corner of the i-th tile is located, afti_bottom_right_partition_column_offset[i] is the horizontal offset of the partition containing the bottom-right corner of the i-th tile relative to the partition containing its top-left corner, and afti_bottom_right_partition_row_offset[i] is the corresponding vertical offset.

[0211] Based on the syntax elements decoded above, the column and row indices of the top-left and bottom-right partitions of each tile i, namely topLeftColumn[i], topLeftRow[i], bottomRightColumn[i], and bottomRightRow[i], are derived as follows.

[0212] topLeftColumn[ i ] = afti_top_left_partition_idx[ i ] % NumPartitionColumns
topLeftRow[ i ] = afti_top_left_partition_idx[ i ] / NumPartitionColumns
bottomRightColumn[ i ] = topLeftColumn[ i ] + afti_bottom_right_partition_column_offset[ i ]
bottomRightRow[ i ] = topLeftRow[ i ] + afti_bottom_right_partition_row_offset[ i ]
Here, bottomRightColumn[ i ] and bottomRightRow[ i ] may be less than or equal to ( asps_frame_width + 63 ) / 64 - 1 and ( asps_frame_height + 63 ) / 64 - 1, respectively.
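The index arithmetic above can be sketched in Python as follows. The function name tile_partition_corners is hypothetical; the sketch assumes that partition indices are assigned in raster order (row by row), which is what the modulo and integer division by NumPartitionColumns imply.

```python
def tile_partition_corners(top_left_partition_idx, column_offset, row_offset,
                           num_partition_columns):
    """Return (topLeftColumn, topLeftRow, bottomRightColumn, bottomRightRow)."""
    # Raster-order index -> (column, row) of the top-left partition.
    top_left_column = top_left_partition_idx % num_partition_columns
    top_left_row = top_left_partition_idx // num_partition_columns
    # The bottom-right partition is given as an offset from the top-left one.
    bottom_right_column = top_left_column + column_offset
    bottom_right_row = top_left_row + row_offset
    return top_left_column, top_left_row, bottom_right_column, bottom_right_row


# Example: 4 partition columns, tile starting at partition 5,
# extending 1 partition to the right and 2 partitions down.
corners = tile_partition_corners(5, 1, 2, 4)
```

Here partition 5 lies in column 1 and row 1, so the tile spans columns 1 to 2 and rows 1 to 3.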

[0213] The 3D data decoding device 31, which decodes mesh data or point cloud data, has means for decoding syntax elements indicating the position of a tile and for deriving the column topLeftColumn and the row topLeftRow of the upper-left partition of the tile, and the column bottomRightColumn and the row bottomRightRow of the lower-right partition of the tile. The 3D data decoding device 31 may decode a bitstream that satisfies specific bitstream conformance conditions on the partition columns of the i-th tile (topLeftColumn[i] and bottomRightColumn[i]), the partition column of the j-th tile (topLeftColumn[j]), the partition rows of the i-th tile (topLeftRow[i] and bottomRightRow[i]), and the partition row of the j-th tile (topLeftRow[j]). The 3D data decoding device 31 may decode a bitstream that satisfies the following bitstream conformance conditions.

[0214] (Configuration of the 3D data encoding device according to the first embodiment) Figure 13 is a functional block diagram showing the schematic configuration of the 3D data encoding device 11 according to the first embodiment. The 3D data encoding device 11 consists of an atlas information encoding unit 101, a base mesh encoding unit 103, a base mesh decoding unit 104, a mesh displacement update unit 106, a mesh displacement encoding unit 107, a mesh displacement decoding unit 108, a mesh reconstruction unit 109, an attribute update unit 110, a padding unit 111, a color space conversion unit 112, an attribute encoding unit 113, a multiplexing unit 114, and a mesh separation unit 115. The 3D data encoding device 11 receives, as 3D data, additional information, atlas information, a base mesh, mesh displacement, a mesh, and an attribute image, and outputs encoded data.

[0215] The atlas information encoding unit 101 encodes the atlas information and outputs an atlas information encoded stream.

[0216] The base mesh encoding unit 103 encodes the base mesh and outputs a base mesh encoded stream. The encoding method used is typically Draco.

[0217] The base mesh decoding unit 104 is the same as the base mesh decoding unit 303, so its explanation is omitted.

[0218] The mesh displacement update unit 106 adjusts the mesh displacement based on the (original) base mesh and the decoded base mesh, and outputs the updated mesh displacement.

[0219] The mesh displacement encoding unit 107 encodes the updated mesh displacement and outputs a mesh displacement encoded stream. The encoding method used may include VVC or HEVC.

[0220] The mesh displacement decoding unit 108 is the same as the mesh displacement decoding unit 305, so its description is omitted.

[0221] The mesh reconstruction unit 109 is the same as the mesh reconstruction unit 307, so its description is omitted.

[0222] The attribute update unit 110 receives the (original) mesh, the reconstructed mesh output from the mesh reconstruction unit 109 (mesh deformation unit 3072), and the attribute image as input, updates the attribute image to match the position (coordinates) of the reconstructed mesh, and outputs the updated attribute image.

[0223] The padding unit 111 receives an attribute image as input and performs padding on areas where the pixel value is empty.

[0224] The color space conversion unit 112 performs a color space conversion from RGB format to YCbCr format.
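The text does not state which conversion matrix the color space conversion unit 112 uses. As a minimal sketch, the following Python function assumes the ITU-R BT.709 luma coefficients and normalized full-range samples; both are assumptions made for illustration, and the function name is hypothetical.

```python
def rgb_to_ycbcr_bt709(r, g, b):
    """Convert normalized RGB in [0, 1] to (Y, Cb, Cr).

    Assumes BT.709 coefficients; Y is in [0, 1], Cb and Cr are in [-0.5, 0.5].
    """
    kr, kb = 0.2126, 0.0722          # BT.709 luma weights for R and B
    kg = 1.0 - kr - kb               # weight for G
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))  # scaled blue-difference
    cr = (r - y) / (2.0 * (1.0 - kr))  # scaled red-difference
    return y, cb, cr


# Pure red maps to the maximum red-difference value.
y, cb, cr = rgb_to_ycbcr_bt709(1.0, 0.0, 0.0)
```

An actual encoder would additionally quantize the result to the bit depth of the attribute video and, for 4:2:0 output, subsample the chroma planes.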

[0225] The attribute encoding unit 113 encodes the attribute image in YCbCr format output from the color space conversion unit 112 and outputs an attribute video stream. The encoding method used may include VVC or HEVC.

[0226] The multiplexing unit 114 multiplexes the atlas information encoded stream, base mesh encoded stream, mesh displacement encoded stream, and attribute video stream and outputs them as encoded data. The multiplexing method used is a byte stream format, ISOBMFF, etc.

[0227] (Operation of the mesh separation unit) The mesh separation unit 115 generates a base mesh and mesh displacement from the mesh.

[0228] Figure 17 is a functional block diagram showing the configuration of the mesh separation unit 115. The mesh separation unit 115 consists of a mesh thinning unit 1151, a mesh division unit 1152, and a mesh displacement derivation unit 1153.

[0229] The mesh thinning unit 1151 generates a base mesh by thinning out some vertices from the mesh.

[0230] Figure 18(a) shows a portion of the mesh, which consists of vertices v1, v2, v3, v4, v5, and v6. Each of v1, v2, v3, v4, v5, and v6 is a 3D vector. The mesh thinning unit 1151 generates and outputs a base mesh by thinning out vertices v4, v5, and v6 (Figure 18(b)).

[0231] The mesh division unit 1152, like the mesh division unit 3071, divides the base mesh and generates a divided mesh (Figure 18(c)).

[0232] v4' = (v1 + v2) / 2
v5' = (v1 + v3) / 2
v6' = (v2 + v3) / 2
The mesh displacement derivation unit 1153 derives and outputs, as mesh displacements, the displacements d4, d5, and d6 of vertices v4, v5, and v6 relative to vertices v4', v5', and v6', based on the mesh and the divided mesh (Figure 18(d)).
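The separation of a mesh into a base mesh and displacements can be illustrated for a single edge. The sketch below is hypothetical: the helper names and the vertex coordinates are invented for illustration, and it assumes that each displacement is the difference between an original vertex and the corresponding midpoint of the divided mesh.

```python
def midpoint(a, b):
    """Midpoint of two 3D vertices, as used when dividing a base-mesh edge."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def displacement(original, predicted):
    """Displacement of an original vertex relative to its predicted position."""
    return tuple(o - p for o, p in zip(original, predicted))


# Base-mesh vertices kept after thinning (illustrative values).
v1, v2 = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
# Original vertex that was thinned out; it lies near the edge v1-v2.
v4 = (1.0, -0.2, 0.1)

v4_prime = midpoint(v1, v2)        # vertex produced by mesh division
d4 = displacement(v4, v4_prime)    # what the encoder transmits for v4
```

The decoder can recover v4 by dividing the decoded base mesh in the same way and adding d4 to v4'.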

[0233] d4 = v4 - v4'
d5 = v5 - v5'
d6 = v6 - v6'
(Encoding of Atlas Information) Figure 14 is a functional block diagram showing the configuration of the atlas information encoding unit 101. The atlas information encoding unit 101 consists of an extended information encoding unit 1011, a tile information encoding unit 1012, a parameter encoding unit 1013, and an additional information encoding unit 1014.

[0234] The extended information encoding unit 1011 encodes extended encoding parameters related to the mesh data.

[0235] The tile information encoding unit 1012 encodes the number of tiles and tile IDs referenced at the picture / frame level.

[0236] The parameter encoding unit 1013 encodes coding parameters related to the 3D data.

[0237] The additional information encoding unit 1014 encodes additional information such as SEI. The additional information encoding unit 1014 encodes mapping information that shows the relationship between a certain tile and a codec tile, looping through the number of tiles in a certain frame (smtm_num_tiles_minus1) and the number of codec tiles included in a certain displacement subbitstream (smtm_num_codec_tiles_minus1).

[0238] Furthermore, the additional information encoding unit 1014 encodes mapping information that shows the relationship between a certain tile and a codec tile, looping through the number of attribute video data (smatm_attribute_video_minus1), the number of attribute tiles included in that data (smatm_num_tiles_minus1[i]), and the number of codec tiles included in a certain attribute video subbitstream (smatm_num_codec_tiles_minus1).

[0239] In an alternative configuration, the flag smtm_codec_tile_alignment_flag / smatm_codec_tile_alignment_flag / smatm_codec_tile_alignment_flag[i] indicates whether the division of a certain tile / attribute tile is the same as the division of a codec tile, and the additional information encoding unit 1014 may encode a first value (e.g., 1) if the i-th tile / attribute tile includes only the region of the i-th codec tile of the displacement / attribute video subbitstream, and a second value (e.g., 0) otherwise.

[0240] (Base Mesh Encoding) Figure 15 is a functional block diagram showing the configuration of the base mesh encoding unit 103. The base mesh encoding unit 103 consists of a mesh encoding unit 1031, a mesh decoding unit 1032, a motion information encoding unit 1033, a motion information decoding unit 1034, a mesh motion compensation unit 1035, a reference mesh memory 1036, a switch 1037, a switch 1038, and a skip encoding unit 1039. The base mesh encoding unit 103 may also include a base mesh quantization unit (not shown) after the input of the base mesh. Switches 1037 and 1038 are connected to the side that performs neither motion compensation nor skipping when the base mesh is encoded without reference to other base meshes (e.g., already encoded base meshes) (intra encoding). They are connected to the side that performs motion compensation when the base mesh is encoded with reference to other base meshes (inter encoding), and to the side that performs skip encoding when encoding of the base mesh is skipped (skip encoding).

[0241] The mesh encoding unit 1031 has an intra encoding function, intra-encodes the base mesh, and outputs a base mesh encoded stream. The encoding method used is Draco, among others.

[0242] The mesh decoding unit 1032 is the same as the mesh decoding unit 3031, so its description is omitted.

[0243] The motion information encoding unit 1033 has an inter encoding function, inter-encodes the base mesh, and outputs a base mesh encoded stream. Entropy coding such as arithmetic coding is used as the encoding method.

[0244] The motion information decoding unit 1034 is the same as the motion information decoding unit 3032, so its explanation is omitted.

[0245] The mesh motion compensation unit 1035 is the same as the mesh motion compensation unit 3033, so its explanation is omitted.

[0246] The reference mesh memory 1036 is the same as the reference mesh memory 3034, so its description is omitted.

[0247] (Mesh Displacement Encoding) Figure 16 is a functional block diagram showing the configuration of the mesh displacement encoding unit 107. The mesh displacement encoding unit 107 consists of a coordinate system transformation unit 1071, a transformation unit 1072, a quantization unit 1073, and a displacement mapping unit 1074 (image packing unit, displacement encoding unit). As shown in the figure, the mesh displacement encoding unit 107 may also include a video encoding unit 1075. Alternatively, the video encoding unit 1075 may not be included in the mesh displacement encoding unit 107, and the encoding of the displacement image may be performed using an external image encoding device.

[0248] The coordinate system transformation unit 1071 transforms the coordinate system of the mesh displacement from the Cartesian coordinate system to the coordinate system that encodes the displacement (e.g., the local coordinate system) based on the value of the coordinate transformation information displacementCoordinateSystem. Here, disp is a 3D vector representing the mesh displacement before the coordinate system transformation, d is a 3D vector representing the mesh displacement after the coordinate system transformation, and n_vec, t_vec, and b_vec are 3D vectors (in the Cartesian coordinate system) representing each axis of the local coordinate system.

[0249] if (displacementCoordinateSystem == 0) {
  d = disp
} else if (displacementCoordinateSystem == 1) {
  d = (disp * n_vec, disp * t_vec, disp * b_vec)
}
The mesh displacement encoding unit 107 may update the value of displacementCoordinateSystem at the picture / frame level.
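The coordinate system transformation above can be sketched in Python as follows. The sketch assumes that the `*` operator between two 3D vectors in the pseudocode denotes the dot product, so that each output component is the projection of disp onto one local axis; the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def to_displacement_coords(disp, n_vec, t_vec, b_vec, displacement_coordinate_system):
    """Express a Cartesian displacement in the coordinate system used for coding."""
    if displacement_coordinate_system == 0:
        # Cartesian coordinate system: no change.
        return tuple(disp)
    if displacement_coordinate_system == 1:
        # Local coordinate system: project onto the n, t, and b axes.
        return (dot(disp, n_vec), dot(disp, t_vec), dot(disp, b_vec))
    raise ValueError("unsupported displacementCoordinateSystem value")


# Example with a local frame whose axes are a permutation of x, y, z.
d = to_displacement_coords((1, 2, 3), (0, 0, 1), (1, 0, 0), (0, 1, 0), 1)
```

With these axes, the normal component of the result is the z component of disp, followed by its x and y components.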

[0250] When displacementCoordinateSystem is encoded at the sequence level, the syntax shown in Figure 9 is used. asve_displacement_coordinate_system is set to 0 for the Cartesian coordinate system and to 1 for the local coordinate system.

[0251] When displacementCoordinateSystem is changed at the picture / frame level, the syntax shown in Figure 12 is used. afve_overriden_flag is set to 1 if the coordinate system is to be updated, and to 0 otherwise. afve_displacement_coordinate_system is set to 0 for the Cartesian coordinate system and to 1 for the local coordinate system.

[0252] The transformation unit 1072 performs a transformation f (for example, a wavelet transform) and derives the transformed mesh displacement Tdisp. The following is performed for pos=0..NumDisp-1, where NumDisp is the number of mesh vertices.

[0253] The quantization unit 1073 performs quantization based on the quantization scale value scale, which is derived from the quantization parameters of each component of the mesh displacement, and derives the quantized mesh displacement dispQuantCoeffArray.

[0254] vcount0 = 0
for( i = 0; i < subdivisionIterationCount; i++ ) {
  vcount1 = levelOfDetailCounts[ i ]
  for( v = vcount0; v < vcount1; v++ ) {
    for( d = 0; d < DisplacementDim; d++ ) {
      dispQuantCoeffArray[ v ][ d ] = dispCoeffArray[ v ][ d ] / iscale[ i ][ d ]
    }
  }
  vcount0 = vcount1
}
Alternatively, the scale value can be approximated by a power of 2 and dispQuantCoeffArray can be derived using the following formula.

[0255] vcount0 = 0
for( i = 0; i < subdivisionIterationCount; i++ ) {
  scale[ i ] = 1 << scale2[ i ]
  vcount1 = levelOfDetailCounts[ i ]
  for( v = vcount0; v < vcount1; v++ ) {
    for( d = 0; d < DisplacementDim; d++ ) {
      dispQuantCoeffArray[ v ][ d ] = dispCoeffArray[ v ][ d ] >> scale2[ i ][ d ]
    }
  }
  vcount0 = vcount1
}
The displacement mapping unit 1074 generates an image dispQuantCoeffFrame from the quantized mesh displacement dispQuantCoeffArray based on the value of the displacement mapping parameter displacementChromaLocationType.
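The two quantization variants of the quantization unit 1073 described above (division by a per-level scale, and a right shift by a per-level exponent) can be sketched in Python as follows. Several points are assumptions made for illustration: levelOfDetailCounts is taken to hold cumulative vertex counts, the number of levels is taken from the length of that list, the divisor named iscale in the pseudocode is passed as scale, and the division is taken to truncate toward zero because the text does not specify rounding.

```python
def quantize_by_scale(coeffs, level_of_detail_counts, scale):
    """Quantize per-vertex coefficients by dividing by scale[level][component]."""
    out = [list(row) for row in coeffs]
    vcount0 = 0
    for i, vcount1 in enumerate(level_of_detail_counts):
        for v in range(vcount0, vcount1):
            for d in range(len(out[v])):
                # Assumption: truncation toward zero.
                out[v][d] = int(coeffs[v][d] / scale[i][d])
        vcount0 = vcount1
    return out


def quantize_by_shift(coeffs, level_of_detail_counts, scale2):
    """Quantize by an arithmetic right shift of scale2[level][component] bits."""
    out = [list(row) for row in coeffs]
    vcount0 = 0
    for i, vcount1 in enumerate(level_of_detail_counts):
        for v in range(vcount0, vcount1):
            for d in range(len(out[v])):
                # Python's >> floors, so negative values round toward -infinity.
                out[v][d] = coeffs[v][d] >> scale2[i][d]
        vcount0 = vcount1
    return out


coeffs = [[8, -8, 4], [16, 3, -5]]   # vertex 0 is in level 0, vertex 1 in level 1
q_div = quantize_by_scale(coeffs, [1, 2], [[2, 2, 2], [4, 4, 4]])
q_shift = quantize_by_shift(coeffs, [1, 2], [[1, 1, 1], [2, 2, 2]])
```

The two variants agree for non-negative coefficients but can differ for negative ones: in this example -5 becomes -1 under truncating division by 4 and -2 under a right shift by 2.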

[0256] The displacement mapping unit 1074 may map the first component of the (quantized) mesh displacement array, dispQuantCoeffArray[v][0], to the luminance (Y) image component as follows: For an image with width W and height H, the following is applied to (y=0..H-1, x=0..W-1).

[0257] H = origHeight
shift = (1 << bitDepth) >> 1
dispQuantCoeffFrame[x][ y][0] = dispQuantCoeffArray[v][0] + shift
dispQuantCoeffFrame[x][ H+y][0] = dispQuantCoeffArray[v][1] + shift
dispQuantCoeffFrame[x][2*H+y][0] = dispQuantCoeffArray[v][2] + shift
v++
dispQuantCoeffFrame[x / 2][ y / 2][1] = shift
dispQuantCoeffFrame[x / 2][H / 2+y / 2][1] = shift
dispQuantCoeffFrame[x / 2][ H+y / 2][1] = shift
dispQuantCoeffFrame[x / 2][ y / 2][2] = shift
dispQuantCoeffFrame[x / 2][H / 2+y / 2][2] = shift
dispQuantCoeffFrame[x / 2][ H+y / 2][2] = shift
Alternatively, the displacement mapping unit 1074 may encode the mesh displacement for each submesh.
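The 4:2:0 packing above stacks the three displacement components vertically in the luminance plane and fills both chrominance planes with the mid-level value. A minimal Python sketch follows. It stores planes as plane[row][column] rather than the [x][y] order of the pseudocode, assumes a raster scan (y outer, x inner), and assumes that positions without a corresponding vertex keep the mid-level value; the function name is hypothetical.

```python
def pack_displacements_420(disp_quant, width, orig_height, bit_depth):
    """Pack quantized 3-component displacements into Y, Cb, Cr planes (4:2:0)."""
    shift = (1 << bit_depth) >> 1          # mid-level offset, e.g. 128 for 8 bits
    h = orig_height
    # Luma plane is 3*H rows tall: one H-row band per displacement component.
    luma = [[shift] * width for _ in range(3 * h)]
    # Chroma planes are subsampled by 2 in both directions and stay at mid-level.
    chroma_rows, chroma_cols = (3 * h + 1) // 2, (width + 1) // 2
    cb = [[shift] * chroma_cols for _ in range(chroma_rows)]
    cr = [[shift] * chroma_cols for _ in range(chroma_rows)]
    v = 0
    for y in range(h):
        for x in range(width):
            if v < len(disp_quant):
                luma[y][x] = disp_quant[v][0] + shift
                luma[h + y][x] = disp_quant[v][1] + shift
                luma[2 * h + y][x] = disp_quant[v][2] + shift
                v += 1
    return luma, cb, cr


# Two vertices packed into a 2x1 band per component, 8-bit samples.
luma, cb, cr = pack_displacements_420([[1, 2, 3], [-1, -2, -3]], 2, 1, 8)
```

Adding shift moves signed coefficients into the unsigned sample range of the video codec, so a zero displacement maps to mid-grey.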

[0258] Furthermore, the processing may be switched depending on the DecGeoChromaFormat. That is, if DecGeoChromaFormat=1 (4:2:0), the above processing is performed, and if DecGeoChromaFormat=3 (4:4:4), the following processing is performed.

[0259] dispQuantCoeffFrame[x][y][0] = dispQuantCoeffArray[v][0]
dispQuantCoeffFrame[x][y][1] = dispQuantCoeffArray[v][1]
dispQuantCoeffFrame[x][y][2] = dispQuantCoeffArray[v][2]
v++
The mesh displacement encoding unit 107 may update the values of origHeight and origWidth at the picture / frame level.

[0260] The video encoding unit 1075 encodes an image in YCbCr 4:2:0 format, which includes a (quantized) mesh displacement image, and outputs a mesh displacement encoded stream. The encoding method used may include VVC or HEVC.

[0261] The video encoding unit 1075 may divide the mesh displacement image into slices for each origHeight and encode them. Alternatively, the origHeight may be aligned to a predetermined size according to the CTU size.
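The text does not define how origHeight is aligned to the CTU size. One plausible reading, shown here purely as an assumption, is rounding up to the next multiple of the CTU size so that each origHeight-row band starts on a CTU boundary; the function name and the CTU size of 64 are illustrative.

```python
def align_up(value, alignment):
    """Round value up to the nearest multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


ctu_size = 64                                   # illustrative CTU size
aligned_height = align_up(100, ctu_size)        # 100 rows padded to 128
# Starting rows of the three component bands in the luminance plane.
slice_starts = [k * aligned_height for k in range(3)]
```

With such an alignment, the band of each displacement component could be carried in its own slice without a slice boundary crossing a CTU.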

[0262] [Application Examples] The 3D data encoding device 11 and 3D data decoding device 31 described above can be installed and used in various devices that transmit, receive, record, and reproduce 3D data. The 3D data may be natural 3D data captured by a camera or the like, or artificial 3D data (including CG and GUI) generated by a computer or the like.

[0263] (Summary) To solve the above problems, a 3D data decoding device according to one aspect of the present invention is a 3D data decoding device that decodes mesh data or point cloud data, wherein an atlas information decoding unit that decodes atlas information from encoded data comprises a tile information decoding unit that decodes tile information, and an additional information decoding unit that decodes an index indicating the type of a tile and codec tile information included in the tile, and the additional information decoding unit decodes the index information of the codec tiles necessary for decoding the tile according to the index indicating the type of the tile.

[0264] The above additional information includes a flag indicating whether the index of the codec tile corresponding to the tile is signaled, and the above additional information decoding unit decodes a first value when the index is signaled, and decodes a second value when the index is not signaled.

[0265] The above additional information includes a flag indicating whether the partitioning of the codec tile is the same as the partitioning of the tile, and the above additional information decoding unit is characterized in that if a tile with a certain index contains only the region of a codec tile with the same index and does not contain the region of a codec tile with a different index, it decodes the first value, and otherwise it decodes the second value.

[0266] To solve the above problems, a 3D data encoding device according to one aspect of the present invention is a 3D data encoding device that encodes mesh data or point cloud data, wherein an atlas information encoding unit that encodes atlas information comprises a tile information encoding unit that encodes tile information, and an additional information encoding unit that encodes an index indicating the type of a tile and codec tile information included in the tile, and the additional information encoding unit encodes the index information of the codec tiles necessary for decoding the tile according to the index indicating the type of the tile.

[0267] The above additional information includes a flag indicating whether the partitioning of the codec tile is the same as the partitioning of the tile, and the above additional information encoding unit encodes a first value if the tile for a certain index includes only the region of the codec tile with the same index and does not include the region of the codec tile with other indices, and encodes a second value otherwise.

[0268] The above additional information includes a flag indicating whether the index of the codec tile corresponding to the tile is signaled, and the above additional information encoding unit encodes a first value when the index is signaled, and encodes a second value when the index is not signaled.

[0269] The embodiments of the present invention are not limited to those described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means that have been appropriately modified within the scope of the claims are also included in the technical scope of the present invention.

[0270] (Cross-reference of related applications) This application claims priority to Japanese Patent Application No. 2024-225000, filed on 20 December 2024, and by reference thereto, all of its contents are included herein.

[0271] Embodiments of the present invention can be suitably applied to a 3D data decoding device that decodes encoded data obtained by encoding 3D data, and a 3D data encoding device that generates encoded data obtained by encoding 3D data. Furthermore, it can be suitably applied to the data structure of encoded data generated by the 3D data encoding device and referenced by the 3D data decoding device.

[0272] 11 3D data encoding device
101 Atlas information encoding unit
1011 Extended information encoding unit
1012 Tile information encoding unit
1013 Parameter encoding unit
1014 Additional information encoding unit
103 Base mesh encoding unit
1031 Mesh encoding unit
1032 Mesh decoding unit
1033 Motion information encoding unit
1034 Motion information decoding unit
1035 Mesh motion compensation unit
1036 Reference mesh memory
1037 Switch
1038 Switch
1039 Skip encoding unit
104 Base mesh decoding unit
106 Mesh displacement update unit
107 Mesh displacement encoding unit
1071 Coordinate system transformation unit
1072 Transformation unit
1073 Quantization unit
1074 Displacement mapping unit
1075 Video encoding unit
108 Mesh displacement decoding unit
109 Mesh reconstruction unit
110 Attribute update unit
111 Padding unit
112 Color space conversion unit
113 Attribute encoding unit
114 Multiplexing unit
115 Mesh separation unit
1151 Mesh thinning unit
1152 Mesh division unit
1153 Mesh displacement derivation unit
21 Network
31 3D data decoding device
301 Demultiplexing unit
302 Atlas information decoding unit
3021 Parameter decoding unit
3022 Tile information decoding unit
3023 Extended information decoding unit
3024 Additional information decoding unit
303 Base mesh decoding unit
3031 Mesh decoding unit
3032 Motion information decoding unit
3033 Mesh motion compensation unit
3034 Reference mesh memory
3035 Switch
3036 Switch
3037 Skip decoding unit
305 Mesh displacement decoding unit
3051 Video decoding unit
3052 Displacement unmapping unit
3053 Inverse quantization unit
3054 Inverse transformation unit
3055 Coordinate system transformation unit
306 Attribute decoding unit
307 Mesh reconstruction unit
3071 Mesh division unit
3072 Mesh deformation unit
308 Color space conversion unit
41 3D data display device

Claims

1. A 3D data decoding device for decoding mesh data or point cloud data, comprising: a tile information decoding unit for decoding tile information; and an additional information decoding unit for decoding tile submesh information, wherein the tile submesh information includes a first syntax element indicating the tile type, and the additional information decoding unit decodes a second syntax element indicating the index of the codec tile included in the tile according to the value of the syntax element.

2. The 3D data decoding device according to claim 1, wherein the tile submesh information includes a third syntax element indicating whether or not the index of the corresponding codec tile in the tile is signaled, and the third syntax element indicates a first value if the index is signaled and a second value if it is not signaled.

3. The 3D data decoding device according to claim 2, wherein the tile submesh information includes a fourth syntax element indicating whether the partitioning of the codec tile is the same as the partitioning of other tiles when the value of the third syntax element is the first value, the fourth syntax element with the first value indicates that the i-th tile includes only the region of the i-th codec tile and does not include the region of the j-th codec tile, and the fourth syntax element with the second value indicates that the partitioning of the codec tile is not the same as the partitioning of other tiles.

4. A 3D data encoding device for encoding mesh data or point cloud data, comprising: a tile information encoding unit for encoding tile information; and an additional information encoding unit for encoding tile submesh information, wherein the tile submesh information includes a first syntax element indicating the tile type; and the additional information encoding unit encodes a second syntax element indicating the index of the codec tile included in the tile according to the value of the syntax element.

5. A method for transmitting an encoded stream, wherein the encoded stream includes tile information and tile submesh information, the tile submesh information includes a first syntax element indicating the tile type, and the tile submesh information includes a second syntax element indicating the index of the codec tile included in the tile, depending on the value of the syntax element.