Data processing method, apparatus, and device for point cloud media, and medium
By adding multiple attribute headers to the decoding indication information of point cloud frames, the limitation of having only a single way to indicate decoding for point cloud frames is addressed, and the flexibility and efficiency of point cloud media processing are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2022-03-11
- Publication Date
- 2026-06-19
AI Technical Summary
The current technology for decoding point cloud frames is relatively simple and lacks flexibility, failing to meet the diverse needs of point cloud media decoding and transmission.
By adding at least two attribute headers to the decoding indication information of each point cloud frame, each attribute header containing an attribute identifier field, the decoding indication method is enriched, enabling content consumption devices to select the required media files for transmission and decoding as needed.
It enables diverse decoding indication methods for point cloud media, allowing content consumption devices to flexibly select the required media files for transmission and decoding consumption, thus improving the flexibility and efficiency of point cloud media processing.
Figure CN116781676B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and specifically to a data processing method for point cloud media, a data processing apparatus for point cloud media, a data processing device for point cloud media, and a computer-readable storage medium.
Background Technology
[0002] With the continuous development of science and technology, it is now possible to obtain large amounts of highly accurate point cloud data at a relatively low cost and in a short period of time. As large-scale point cloud data continues to accumulate, the efficient storage, transmission, distribution, sharing, and standardization of point cloud data have become hot topics in point cloud application research.
[0003] Currently, the decoding indication information for a point cloud frame includes one attribute header, and the corresponding point cloud attribute patch contains all the attribute data of the point cloud frame. In practice, it has been found that this way of indicating decoding for point cloud frames is limited to a single form.
Summary of the Invention
[0004] This application provides a data processing method, apparatus, device, and computer-readable storage medium for point cloud media, which can enrich the decoding indication methods of point cloud media.
[0005] In one aspect, embodiments of this application provide a data processing method for point cloud media, including:
[0006] Obtain the media file of the point cloud media. The media file includes the bitstream data of the point cloud frame and the decoding indication information of the point cloud frame. The decoding indication information of each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0007] Present the point cloud media based on the bitstream data and the decoding indication information.
[0008] In this embodiment, a media file of point cloud media is obtained. The media file includes bitstream data of point cloud frames and decoding indication information of point cloud frames. The decoding indication information of each point cloud frame includes at least two attribute headers, each of which includes an attribute identifier field. Based on the bitstream data and decoding indication information, the point cloud media is presented. It is evident that the attribute identifier field allows for the differentiation of attribute data of point cloud frames, and the at least two attribute headers provide indication, enriching the decoding indication methods for point cloud media. Content consumption devices can flexibly select the desired point cloud media file for transmission and decoding consumption according to their needs.
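The relationship between a point cloud frame's decoding indication information and its attribute headers can be sketched as a small data structure. This is only an illustrative sketch: the class names, the `attribute_id` field name, and the `select` helper are hypothetical and are not taken from the application; the only properties drawn from the text are that each frame carries at least two attribute headers and that each header carries an attribute identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class AttributeHeader:
    # Hypothetical name for the "attribute identifier field" described above.
    attribute_id: int
    # Placeholder for any other header parameters (not specified here).
    params: Dict[str, int] = field(default_factory=dict)


@dataclass
class DecodingIndication:
    """Decoding indication information of one point cloud frame (sketch)."""
    attribute_headers: List[AttributeHeader]

    def __post_init__(self) -> None:
        # The text requires at least two attribute headers per frame.
        if len(self.attribute_headers) < 2:
            raise ValueError("expected at least two attribute headers")

    def select(self, wanted_ids: Iterable[int]) -> List[AttributeHeader]:
        """Return only the headers whose identifier the consumer asked for."""
        wanted = set(wanted_ids)
        return [h for h in self.attribute_headers if h.attribute_id in wanted]


# Example: a frame with two attribute headers; the consumer wants only id 1.
info = DecodingIndication([AttributeHeader(0), AttributeHeader(1)])
chosen = info.select([1])
```

A content consumption device could use such a lookup to decide which attribute data to request and decode, which is the flexibility the paragraph above describes.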
[0009] In another aspect, embodiments of this application provide a data processing method for point cloud media, including:
[0010] Acquire point cloud frames from point cloud media and encode the point cloud frames to obtain the bitstream data of the point cloud frames;
[0011] Based on the bitstream data, decoding indication information for point cloud frames is generated. The decoding indication information for each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0012] Encapsulate the bitstream data and the decoding indication information to obtain a media file of the point cloud media.
[0013] In this embodiment, point cloud frames of point cloud media are acquired, and the point cloud frames are encoded to obtain bitstream data. Based on the bitstream data, decoding indication information for each point cloud frame is generated. The decoding indication information for each point cloud frame includes at least two attribute headers, each of which includes an attribute identifier field. The bitstream data and decoding indication information are encapsulated to obtain the media file of the point cloud media. It is evident that the attribute identifier field allows for the differentiation of attribute data in point cloud frames, and the at least two attribute headers provide indication, enriching the decoding indication methods for point cloud media. This enables content consumption devices to flexibly select the desired point cloud media file for transmission and decoding consumption according to their needs.
[0014] In another aspect, embodiments of this application provide a data processing apparatus for point cloud media, including:
[0015] The acquisition unit is used to acquire the media file of the point cloud media. The media file includes the bitstream data of the point cloud frame and the decoding indication information of the point cloud frame. The decoding indication information of each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0016] The processing unit is used to present the point cloud media based on the bitstream data and the decoding indication information.
[0017] In another aspect, embodiments of this application provide a data processing apparatus for point cloud media, including:
[0018] The acquisition unit is used to acquire point cloud frames of point cloud media and encode the point cloud frames to obtain the bit stream data of the point cloud frames.
[0019] The processing unit is used to generate decoding indication information for point cloud frames based on the bitstream data. The decoding indication information for each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0020] The processing unit is further used to encapsulate the bitstream data and the decoding indication information to obtain a media file of the point cloud media.
[0021] Accordingly, this application provides a computer device, the device comprising:
[0022] A processor is used to load and execute computer programs;
[0023] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned data processing method for point cloud media.
[0024] Accordingly, this application provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the above-described data processing method for point cloud media.
[0025] Accordingly, this application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the aforementioned point cloud media data processing method.
[0026] In this embodiment, point cloud frames of point cloud media are acquired, and the point cloud frames are encoded to obtain bitstream data. Based on the bitstream data, decoding indication information for each point cloud frame is generated. The decoding indication information for each point cloud frame includes at least two attribute headers, each of which includes an attribute identifier field. The bitstream data and decoding indication information are encapsulated to obtain the media file of the point cloud media. It is evident that the attribute identifier field allows for the differentiation of attribute data in point cloud frames, and the at least two attribute headers provide indication, enriching the decoding indication methods for point cloud media. This enables content consumption devices to flexibly select the desired point cloud media file for transmission and decoding consumption according to their needs.
Attached Figure Description
[0027] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0028] Figure 1a is a schematic diagram of 6DoF provided in an embodiment of this application;
[0029] Figure 1b is a schematic diagram of 3DoF provided in an embodiment of this application;
[0030] Figure 1c is a schematic diagram of 3DoF+ provided in an embodiment of this application;
[0031] Figure 1d is a data processing architecture diagram for point cloud media provided in an embodiment of this application;
[0032] Figure 2 is a flowchart of a data processing method for point cloud media provided in an embodiment of this application;
[0033] Figure 3 is a flowchart of another data processing method for point cloud media provided in an embodiment of this application;
[0034] Figure 4 is a schematic diagram of the structure of a data processing apparatus for point cloud media provided in an embodiment of this application;
[0035] Figure 5 is a schematic diagram of the structure of another data processing apparatus for point cloud media provided in an embodiment of this application;
[0036] Figure 6 is a schematic diagram of the structure of a content consumption device provided in an embodiment of this application;
[0037] Figure 7 is a schematic diagram of the structure of a content creation device provided in an embodiment of this application.
Detailed Implementation
[0038] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0039] The following describes some technical terms used in the embodiments of this application:
[0040] I. Immersive Media:
[0041] Immersive media refers to media files that provide immersive content, allowing viewers to experience visual, auditory, and other sensory sensations reminiscent of the real world. Based on the degree of freedom viewers have when consuming the media content, immersive media can be categorized as: 6DoF (Degree of Freedom) immersive media, 3DoF immersive media, and 3DoF+ immersive media.
[0042] II. Point Clouds:
[0043] A point cloud is a set of randomly distributed discrete points in space that represent the spatial structure and surface properties of a three-dimensional object or scene. Each point in a point cloud has at least three-dimensional positional information and, depending on the application, may also have color, material, or other information. Typically, each point in a point cloud has the same number of additional attributes.
[0044] III. Point Cloud Media:
[0045] Point cloud media is a typical 6DoF immersive media. It can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and is therefore widely used in projects such as Virtual Reality (VR) games, Computer-Aided Design (CAD), Geographic Information Systems (GIS), Autonomous Navigation Systems (ANS), digital cultural heritage, free-viewpoint broadcasting, 3D immersive telepresence, and 3D reconstruction of biological tissues and organs.
[0046] IV. Track:
[0047] A track is a collection of media data during the media file encapsulation process. A media file can consist of one or more tracks. For example, a media file can typically contain a video track, an audio track, and a subtitle track.
[0048] V. Sample:
[0049] A sample is a unit of encapsulation in the media file encapsulation process. A track consists of many samples. For example, a video track can consist of many samples. A sample is usually a video frame.
[0050] VI. ISOBMFF (ISO Base Media File Format): This is a media file encapsulation standard. A typical ISOBMFF file is an MP4 file.
[0051] VII. DASH (Dynamic Adaptive Streaming over HTTP): This is an adaptive bitrate technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers.
[0052] VIII. MPD (Media Presentation Description, the media presentation description signaling in DASH): This is used to describe media segment information in a media file.
[0053] This application relates to immersive media data processing technology. Some concepts in the immersive media data processing process will be introduced below. In particular, the following embodiments of this application will use free-viewpoint video as an example for immersive media.
[0054] Figure 1a is a schematic diagram of 6DoF provided in an embodiment of this application. As shown in Figure 1a, 6DoF is categorized into window 6DoF, omnidirectional 6DoF, and 6DoF. Window 6DoF restricts the viewer's rotational movement about the X and Y axes and translation along the Z axis; for example, the viewer cannot see outside the window frame or pass through the window. Omnidirectional 6DoF restricts the viewer's rotational movement along the X, Y, and Z axes; for example, the viewer cannot freely move through the 3D 360° VR content within the restricted movement area. 6DoF allows the viewer to translate freely along the X, Y, and Z axes; for example, the viewer can move freely within the 3D 360° VR content. Related to 6DoF are the 3DoF and 3DoF+ techniques. Figure 1b is a schematic diagram of 3DoF provided in an embodiment of this application. As shown in Figure 1b, 3DoF means that the viewer of the immersive media is fixed at the center point of a three-dimensional space, and the viewer's head rotates about the X, Y, and Z axes to view the images provided by the media content. Figure 1c is a schematic diagram of 3DoF+ provided in an embodiment of this application. As shown in Figure 1c, 3DoF+ means that, when the virtual scene provided by the immersive media has certain depth information, the viewer's head can additionally move within a limited space on the basis of 3DoF to view the images provided by the media content.
[0055] With the continuous development of science and technology, it is now possible to obtain large amounts of highly accurate point cloud data at a relatively low cost and in a short period of time. Point cloud data acquisition methods include computer generation, 3D laser scanning, and 3D photogrammetry. Specifically, point cloud data can be obtained by capturing visual scenes of the real world through acquisition devices (a set of cameras or a camera device with multiple lenses and sensors). 3D laser scanning can obtain point clouds of static real-world 3D objects or scenes, acquiring millions of points per second; 3D photogrammetry can obtain point clouds of dynamic real-world 3D objects or scenes, acquiring tens of millions of points per second. Furthermore, in the medical field, point cloud data of biological tissues and organs can be obtained through magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information. Additionally, point cloud data can be generated directly by computers from virtual 3D objects and scenes. With the continuous accumulation of large-scale point cloud data, the efficient storage, transmission, publication, sharing, and standardization of point cloud data have become crucial for point cloud applications.
[0056] Figure 1d is a data processing architecture diagram for point cloud media provided in an embodiment of this application. As shown in Figure 1d, the data processing process on the content production device side mainly includes: (1) the acquisition process of the media content of the point cloud data; (2) the encoding and file encapsulation process of the point cloud data. The data processing process on the content consumption device side mainly includes: (3) the file decapsulation and decoding process of the point cloud data; (4) the rendering process of the point cloud data. In addition, the transmission process of point cloud media between the content production device and the content consumption device is involved. This transmission process can be based on various transmission protocols, including but not limited to: DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMT (Smart Media Transport), TCP (Transmission Control Protocol), etc.
[0057] The data processing procedure for point cloud media is described in detail below:
[0058] (1) Obtain the media content of point cloud media.
[0059] In terms of how the media content of point cloud media is acquired, there are two methods: capturing sound and visual scenes from the real world through capture devices, and generating the content by computer. In one implementation, the capture device can be a hardware component installed in the content production equipment, such as a microphone, camera, or sensor on a terminal. In another implementation, the capture device can be a hardware device connected to the content production equipment, such as a camera connected to a server, which provides the content production equipment with a media content acquisition service for point cloud data. The capture device can include, but is not limited to, audio devices, camera devices, and sensing devices. Audio devices can include audio sensors, microphones, etc. Camera devices can include ordinary cameras, stereo cameras, light field cameras, etc. Sensing devices can include laser devices, radar devices, etc. Multiple capture devices can be used, deployed at specific locations in the real space to simultaneously capture audio and video content from different angles within that space, with the captured audio and video content remaining synchronized in both time and space. Because the acquisition methods differ, the compression encoding methods corresponding to the media content of different point cloud data may also differ.
[0060] (2) The process of encoding and encapsulating media content in point cloud media.
[0061] Currently, geometry-based point cloud compression (GPCC) is commonly used to encode acquired point cloud data, resulting in a geometry-based compressed bitstream (including encoded geometric bitstream and attribute bitstream). The encapsulation modes of geometry-based compressed bitstreams include single-track encapsulation and multi-track encapsulation.
[0062] Single-track encapsulation mode refers to encapsulating point cloud bitstreams in the form of a single track. In single-track encapsulation mode, a sample will contain one or more coded content units (such as a geometric coded content unit and multiple attribute coded content units). The advantage of single-track encapsulation mode is that a single-track encapsulated point cloud file can be obtained based on the point cloud bitstream without much processing.
[0063] Multi-track encapsulation mode refers to encapsulating point cloud bitstreams in the form of multiple tracks. In multi-track encapsulation mode, each track contains one component in the point cloud bitstream, namely a geometric component track and one or more attribute component tracks. The advantage of multi-track encapsulation is that encapsulating different components separately allows the client to select the required components for transmission and decoding consumption according to its own needs.
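The difference between the two encapsulation modes can be illustrated with a toy model. This sketch is not an ISOBMFF writer: a "track" is modeled as a plain list of samples, a frame as a list of `(component_name, payload)` pairs, and all names (`encapsulate_single_track`, `encapsulate_multi_track`, `"track0"`, the component labels) are hypothetical.

```python
from typing import Dict, List, Tuple

# One frame is modeled as a list of (component_name, payload) coded units,
# e.g. [("geometry", b"..."), ("attr:color", b"...")]. Names are illustrative.
Frame = List[Tuple[str, bytes]]


def encapsulate_single_track(frames: List[Frame]) -> Dict[str, List[bytes]]:
    """Single track: each sample carries all coded units of one frame."""
    samples = [b"".join(payload for _, payload in frame) for frame in frames]
    return {"track0": samples}


def encapsulate_multi_track(frames: List[Frame]) -> Dict[str, List[bytes]]:
    """Multiple tracks: one track per component, one sample per frame."""
    tracks: Dict[str, List[bytes]] = {}
    for frame in frames:
        for name, payload in frame:
            tracks.setdefault(name, []).append(payload)
    return tracks


frames = [
    [("geometry", b"G0"), ("attr:color", b"C0")],
    [("geometry", b"G1"), ("attr:color", b"C1")],
]
single = encapsulate_single_track(frames)
multi = encapsulate_multi_track(frames)
```

In the multi-track result a client that needs only geometry can fetch the `"geometry"` track alone, whereas in the single-track result every sample must be fetched in full.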
[0064] (3) The process of decapsulating and decoding point cloud media files;
[0065] The content consumption device can obtain the media file resources of the point cloud data and the corresponding media presentation description information from the content production device. The media file resources and media presentation description information of the point cloud data are transmitted from the content production device to the content consumption device via a transmission mechanism (such as DASH or SMT). The file decapsulation process on the content consumption device side is the reverse of the file encapsulation process on the content production device side. The content consumption device decapsulates the media file resources according to the file format requirements of the point cloud media to obtain the encoded bitstream (GPCC bitstream or VPCC bitstream). The decoding process on the content consumption device side is the reverse of the encoding process on the content production device side. The content consumption device decodes the encoded bitstream to reconstruct the point cloud data.
[0066] (4) The rendering process of point cloud media.
[0067] The content consumption device renders the point cloud data obtained by decoding the GPCC bitstream based on the metadata related to rendering and windowing in the media presentation description information, obtains the point cloud frames of the point cloud media, and presents the point cloud media according to the presentation time of the point cloud frames.
[0068] In one embodiment, the content creation device first samples a real-world visual scene using an acquisition device to obtain point cloud data corresponding to the real-world visual scene. Then, it encodes the acquired point cloud data using geometry-based point cloud compression (GPCC) to obtain a GPCC bitstream (including an encoded geometry bitstream and attribute bitstream). Next, it encapsulates the GPCC bitstream to obtain a media file (i.e., point cloud media) corresponding to the point cloud data. Specifically, according to a specific media container file format, the content creation device combines one or more encoded bitstreams into a media file for file playback, or into a sequence of an initialization segment and media segments for streaming. The media container file format refers to the ISO Base Media File Format as specified in International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC) 14496-12. In one embodiment, the content creation device also encapsulates metadata into the media file or into the sequence of initialization / media segments, and transmits the sequence of initialization / media segments to the content consumption device via a transmission mechanism (such as a dynamic adaptive streaming media transmission interface).
[0069] On the content consumption device side: First, it receives point cloud media files sent by the content production device, including media files for playback or initialization segments and media segment sequences for streaming. Then, it decapsulates the point cloud media files to obtain an encoded GPCC bitstream. Next, it parses the encoded GPCC bitstream (i.e., decodes the encoded GPCC bitstream to obtain point cloud data). In specific implementations, the content consumption device determines the media files or media segment sequences required for presenting the point cloud media based on the current viewing position / direction. It then decodes these media files or media segment sequences to obtain the required point cloud data. Finally, based on the current viewing (window) direction, it renders the decoded point cloud data to obtain point cloud frames, and presents the point cloud media on the screen of the head-mounted display or any other display device carried by the content consumption device according to the presentation time of the point cloud frames. It should be noted that the current viewing position / direction is determined by head tracking and possibly visual tracking functions. In addition to using a renderer to render point cloud data of the current object's viewing position / viewing direction, an audio decoder can also be used to decode and optimize the audio in the current object's viewing (viewport) direction.
[0070] Content creation equipment and content consumption equipment can together form a point cloud media system. Content creation equipment refers to the computer equipment used by the provider of point cloud media (e.g., the content creator of point cloud media). This computer equipment can be a terminal (such as a PC, a smart mobile device (such as a smartphone)) or a server. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. Content consumption equipment refers to the computer equipment used by the user of point cloud media (e.g., the viewer of point cloud media). This computer equipment can be a terminal (such as a PC, a smart mobile device (such as a smartphone), VR devices (such as VR headsets, VR glasses), smart home appliances, in-vehicle terminals, aircraft, etc.).
[0071] It is understood that the data processing technology of point cloud media involved in this application can be implemented based on cloud technology; for example, using a cloud server as a content production device. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or local area network to realize the computation, storage, processing, and sharing of data.
[0072] In practical applications, content creation devices can guide content consumption devices in decoding and presenting the bitstream data of point cloud frames by means of the sequence header and the decoding indication information of each point cloud frame. The syntax of the sequence header is shown in Table 1 below:
[0073] Table 1
[0074]
[0075]
[0076] The profile_id field indicates the profile to which the bitstream conforms, and its value is an 8-bit unsigned integer. The level_id field indicates the level to which the bitstream conforms, and its value is an 8-bit unsigned integer.
[0077] The sequence parameter set identifier (sequence_parameter_set_id) field provides a sequence parameter set (SPS) identifier for reference by other syntax elements. This identifier is an integer between 0 and 31.
[0078] The `bounding_box_offset_x_upper` field represents the bits of the bounding box origin's x-coordinate above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_offset_x_lower` field represents the lower 16 bits of the bounding box origin's x-coordinate. Its value is also an unsigned integer. Therefore, the x-coordinate of the bounding box origin is: `bounding_box_offset_x = (bounding_box_offset_x_upper << 16) + bounding_box_offset_x_lower`.
[0079] The `bounding_box_offset_y_upper` field represents the bits of the bounding box origin's y-coordinate above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_offset_y_lower` field represents the lower 16 bits of the bounding box origin's y-coordinate. Its value is also an unsigned integer. Therefore, the y-coordinate of the bounding box origin is: `bounding_box_offset_y = (bounding_box_offset_y_upper << 16) + bounding_box_offset_y_lower`.
[0080] The `bounding_box_offset_z_upper` field represents the bits of the bounding box origin's z-coordinate above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_offset_z_lower` field represents the lower 16 bits of the bounding box origin's z-coordinate. Its value is also an unsigned integer. Therefore, the z-coordinate of the bounding box origin is: `bounding_box_offset_z = (bounding_box_offset_z_upper << 16) + bounding_box_offset_z_lower`.
[0081] The `bounding_box_size_width_upper` field represents the bits of the bounding box width above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_size_width_lower` field represents the lower 16 bits of the bounding box width. Its value is also an unsigned integer. The bounding box width is calculated as: `bounding_box_size_width = (bounding_box_size_width_upper << 16) + bounding_box_size_width_lower`.
[0082] The `bounding_box_size_height_upper` field represents the bits of the bounding box height above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_size_height_lower` field represents the lower 16 bits of the bounding box height. Its value is also an unsigned integer. The bounding box height is calculated as: `bounding_box_size_height = (bounding_box_size_height_upper << 16) + bounding_box_size_height_lower`.
[0083] The `bounding_box_size_depth_upper` field represents the bits of the bounding box depth above the lower 16 bits. Its value is an unsigned integer. The `bounding_box_size_depth_lower` field represents the lower 16 bits of the bounding box depth. Its value is also an unsigned integer. The bounding box depth is calculated as: `bounding_box_size_depth = (bounding_box_size_depth_upper << 16) + bounding_box_size_depth_lower`.
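The upper/lower composition shared by the six bounding box fields can be sketched as one helper. The function name `combine_upper_lower` is hypothetical. Note that operator grouping matters: in Python (as in C), `+` binds more tightly than `<<`, so the shift has to be parenthesized to place the upper part above the lower 16 bits.

```python
def combine_upper_lower(upper: int, lower: int) -> int:
    """Rebuild a value from its upper part and its lower 16 bits."""
    if upper < 0:
        raise ValueError("upper part must be an unsigned integer")
    if not 0 <= lower <= 0xFFFF:
        raise ValueError("lower part must fit in 16 bits")
    # Parentheses are required: `upper << 16 + lower` would be parsed as
    # `upper << (16 + lower)`, which is not the intended composition.
    return (upper << 16) + lower


# Example: upper = 1, lower = 2 gives 1 * 65536 + 2.
bounding_box_offset_x = combine_upper_lower(1, 2)
```

The same helper applies unchanged to the y and z offsets and to the width, height, and depth fields.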
[0084] The quantization step size high-order part (quant_step_upper) field represents the high 16 bits of the 32-bit floating-point quantization step size. The value of the quantization step size high-order part (quant_step_upper) field is a 16-bit number. The quantization step size low-order part (quant_step_lower) field represents the low 16 bits of the 32-bit floating-point quantization step size. The value of the quantization step size low-order part (quant_step_lower) field is a 16-bit number. The quantization step size is: quant_step = (float)((quant_step_upper << 16) + quant_step_lower).
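One way to read the quantization step formula is that the two 16-bit parts are concatenated into a 32-bit pattern, with quant_step_upper in the high half as in the formula, and that pattern is then interpreted as a floating-point number. The sketch below rests on two assumptions that the text does not confirm: that the 32-bit value is an IEEE 754 single-precision number, and that `(float)` denotes a reinterpretation of the bit pattern rather than a numeric conversion of the integer. The function name is hypothetical.

```python
import struct


def quant_step_from_parts(quant_step_upper: int, quant_step_lower: int) -> float:
    """Reassemble a 32-bit float from its high and low 16-bit halves.

    Assumption: IEEE 754 single precision, bit pattern reinterpreted as float.
    """
    bits = ((quant_step_upper & 0xFFFF) << 16) | (quant_step_lower & 0xFFFF)
    # Pack the 32-bit pattern as a big-endian unsigned int, then unpack the
    # same four bytes as a big-endian single-precision float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Example: 0x3F800000 is the single-precision bit pattern of 1.0.
step = quant_step_from_parts(0x3F80, 0x0000)
```

If the cast were instead a numeric conversion, the result would simply be `float((quant_step_upper << 16) + quant_step_lower)`; which reading applies depends on the underlying coding standard.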
[0085] The `geomRemoveDuplicateFlag` field is a binary variable. When the value of `geomRemoveDuplicateFlag` is '1', it means that duplicate points (points with the same geometric position) are removed before geometric encoding; when the value of `geomRemoveDuplicateFlag` is '0', it means that duplicate points are not removed.
[0086] The attribute presence flag (attribute_present_flag) field has a binary value. When the attribute_present_flag field is '1', it means that this bitstream contains attribute encoding; when the attribute_present_flag field is '0', it means that this bitstream does not contain attribute encoding.
[0087] The `maxNumAttributesMinus1` field is an unsigned integer. Adding 1 to the value of `maxNumAttributesMinus1` indicates the maximum number of attribute codes supported by the current standard bitstream. The value of `maxNumAttributesMinus1` is an integer between 0 and 15. When the bitstream does not contain a `maxNumAttributesMinus1` field, its default value is 0.
[0088] The attribute-adaptive prediction flag (attribute_adapt_pred) field has a binary value. When the attribute_adapt_pred field is '0', it indicates that no adaptive prediction method is selected; when the attribute_adapt_pred field is '1', it indicates that switching from a geometry-based prediction method to an attribute-based prediction method is allowed.
[0089] The attribute quantization parameter (attribute_qp) field is used to represent the attribute quantization parameter. The value of the attribute quantization parameter (attribute_qp) field is an unsigned integer.
[0090] Furthermore, ue(v) denotes an unsigned integer syntax element coded with an Exp-Golomb (exponential Golomb) code, left bit first. se(v) denotes a signed integer syntax element coded with an Exp-Golomb code, left bit first. u(n) denotes an n-bit unsigned integer; if n is 'v' in the syntax table, the number of bits is determined by the values of other syntax elements. f(n) denotes an n-bit fixed-pattern bit string.
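A minimal sketch of how these descriptors could be read from a byte buffer is given below. The `BitReader` class is hypothetical, and the signed mapping used for se(v) (odd code numbers positive, even code numbers negative) is the one commonly used in video coding standards; the text does not state which mapping applies, so it is an assumption here.

```python
class BitReader:
    """Reads bits from a byte string, most significant (left) bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """u(n): read an n-bit unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """ue(v): unsigned order-0 Exp-Golomb code."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)

    def se(self) -> int:
        """se(v): signed Exp-Golomb code (assumed mapping, see lead-in)."""
        k = self.ue()
        magnitude = (k + 1) >> 1
        return magnitude if k & 1 else -magnitude
```

With this reader, the bit strings `1`, `010`, `011` and `00100` decode as ue(v) values 0, 1, 2 and 3 respectively.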
[0091] The decoding indication information for point cloud frames includes geometric header information, the syntax of which can be found in Table 2 below:
[0092] Table 2
[0093]
[0094] The geometry_parameter_set_id field provides a geometry parameter identifier for reference by other syntax elements. This identifier is an integer between 0 and 31.
[0095] The `geometry_sequence_parameter_set_id` field is used to identify a Sequence Parameter Set (SPS) for use by the current geometry parameter set. This identifier is an integer between 0 and 31, and the value of the `geometry_sequence_parameter_set_id` field remains consistent across all geometry parameter sets within the same point cloud.
[0096] The node size (gps_lcu_node_size_log2_minus_one) field for the geometric macroblock has an unsigned integer value. When the node size (gps_lcu_node_size_log2_minus_one) field is '0', it indicates that block structure encoding is disabled; when the node size (gps_lcu_node_size_log2_minus_one) field is greater than '0', it indicates that block structure encoding is enabled, and the geometric node size of the macroblock is defined, i.e., gps_lcu_node_size_log2 = gps_lcu_node_size_log2_minus_one + 1.
[0097] The `gps_implicit_geom_partition_flag` field is a binary variable. A value of '0' indicates that implicit geometric partitioning is disabled; a value of '1' indicates that implicit geometric partitioning is enabled.
[0098] The `gps_max_num_implicit_qtbt_before_ot` field represents the maximum number of quadtree / binary tree partitions allowed before octree partitioning in geometric implicit partitioning. The value of this field is an unsigned integer.
[0099] The `gps_min_size_implicit_qtbt` field represents the minimum size allowed for quadtree / binary tree partitions in geometric implicit partitioning. The value of this field is an unsigned integer.
[0100] The `gps_single_mode_flag` field is a binary variable. A value of '0' indicates that the geometric isolation coding mode is disabled; a value of '1' indicates that the geometric isolation coding mode is enabled.
[0101] It should be noted that when the geometric implicit partitioning flag (gps_implicit_geom_partition_flag) field is '1', the maximum number of quadtree / binary tree partitions before octree partitioning (gps_max_num_implicit_qtbt_before_ot) field and the minimum size of quadtree / binary tree partitions (gps_min_size_implicit_qtbt) field need to be limited based on the logarithmic size of the root node. The specific process is shown in Table 3.
[0102] Table 3
[0103]
[0104]
[0105] The `gps_save_stat_flag` field is a binary variable. A value of '0' indicates that the encoding state (i.e., the entropy encoding context and the hash table information of the geometric encoding) is not stored; a value of '1' indicates that the encoding state is stored.
[0106] In addition, ue(v) denotes an unsigned integer syntax element coded with an Exp-Golomb code, left bit first. u(n) denotes an n-bit unsigned integer.
[0107] The decoding indication information for point cloud frames also includes attribute header information. The syntax of the attribute header can be found in Table 4 below:
[0108] Table 4
[0109]
[0110]
[0111]
[0112] The attribute parameter set identifier (attribute_parameter_set_id) field is used to provide an attribute parameter identifier for reference by other syntax elements. This identifier is an integer between 0 and 31.
[0113] The `attribute_sequence_parameter_set_id` field is used to identify a Sequence Parameter Set (SPS) for use by the current attribute parameter set. This identifier is an integer between 0 and 31. The value of the `attribute_sequence_parameter_set_id` field remains consistent across all attribute parameter sets within the same point cloud.
[0114] The attribute presence flag (attributePresentFlag[attrIdx]) field is a binary variable. When the value of the attribute presence flag (attributePresentFlag[attrIdx]) field is '1', it indicates that the current bitstream contains the encoding of the attrIdx-th attribute; when the value is '0', it indicates that the current bitstream does not contain the encoding of the attrIdx-th attribute. attrIdx is an integer between 0 and 15, and its meaning is given by the attribute encoding mapping table shown in Table 5 below:
[0115] Table 5
[0116]
| Value of attr_idx | Attribute description |
| --- | --- |
| 0 | Color |
| 1 | Reflectance |
| 2..15 | Reserved |
[0117] The attribute transformation algorithm flag (transform) field controls whether wavelet transform is used for attribute encoding. This field is a binary variable; a value of '1' indicates that wavelet transform is used, while a value of '0' indicates that prediction is used for attribute encoding.
[0118] The attribute transformation coefficient quantization parameter difference (attrTransformQpDelta) field represents the difference between the attribute transformation coefficient quantization parameter and the attribute residual quantization parameter. The value of this field is an unsigned integer. The attribute transformation coefficient quantization parameter is attrTransformQp = attrQuantParam + attrTransformQpDelta.
[0119] The `attrTransformNumPoints` field indicates the number of points used in the attribute transformation (i.e., the wavelet transform using `attrTransformNumPoints` points). A value of 0 in the `attrTransformNumPoints` field indicates that all points in the slice are used for the wavelet transform. The value of this field is an unsigned integer.
[0120] The field `maxNumOfNeighbour_log2_minus7` (logarithm of the maximum number of neighbors searched minus seven) is used to derive the variable `maxNumOfNeighbour`, which represents the maximum number of encoded neighbors available for searching. This controls the search range of candidate neighbors and the number of points cached by the hardware during attribute prediction. The value of this field is an unsigned integer. `maxNumOfNeighbour` is calculated using the following formula:
[0121] maxNumOfNeighbour = 2^(maxNumOfNeighbour_log2_minus7 + 7)
[0122] Here, maxNumOfNeighbour_log2_minus7 is an integer between 0 and 3.
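As a small worked illustration of this derivation, the sketch below computes maxNumOfNeighbour as two raised to the power of the field value plus seven, and enforces the stated range of 0 to 3. The function name `max_num_of_neighbour` is hypothetical.

```python
def max_num_of_neighbour(max_num_of_neighbour_log2_minus7: int) -> int:
    """Derive maxNumOfNeighbour from maxNumOfNeighbour_log2_minus7."""
    if not 0 <= max_num_of_neighbour_log2_minus7 <= 3:
        raise ValueError("maxNumOfNeighbour_log2_minus7 must be between 0 and 3")
    # 2 to the power of (field value + 7), computed with a left shift.
    return 1 << (max_num_of_neighbour_log2_minus7 + 7)
```

The four permitted field values therefore give search ranges of 128, 256, 512 and 1024 neighbors.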
[0123] The attribute residual secondary prediction (cross_component_pred) field is a binary variable. When the value of the cross_component_pred field is '1', it means that attribute residual secondary prediction is allowed; when the value of the cross_component_pred field is '0', it means that attribute residual secondary prediction is not allowed.
[0124] The value of the half_zero_runlength_enable field is a binary variable. When the value of the half_zero_runlength_enable field is '1', it means that the half-run length is used; when the value of the half_zero_runlength_enable field is '0', it means that the half-run length is not used.
[0125] The chroma channel Cb quantization parameter offset (chromaQpOffsetCb) field controls the quantization parameter of the Cb channel. The value of this field is a signed integer ranging from -16 to 16. If chromaQpOffsetCb is not present in the current attribute header information, its value is taken to be 0. The Cb quantization parameter is chromaQpCb = Clip3(minQP, maxQP, attribute_qp + chromaQpOffsetCb), where the luma channel quantization parameter lumaQp is attribute_qp, the minimum supported quantization parameter is minQP = 0, and the maximum supported quantization parameter is maxQP = 63.
[0126] The chroma channel Cr quantization parameter offset (chromaQpOffsetCr) field controls the quantization parameter of the Cr channel. The value of this field is a signed integer ranging from -16 to 16. If chromaQpOffsetCr is not present in the current attribute header information, its value is taken to be 0. The Cr quantization parameter is chromaQpCr = Clip3(minQP, maxQP, attribute_qp + chromaQpOffsetCr), where the luma channel quantization parameter lumaQp is attribute_qp, the minimum supported quantization parameter is minQP = 0, and the maximum supported quantization parameter is maxQP = 63.
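The chroma quantization parameter derivation for both channels can be sketched as follows. The sketch assumes that Clip3(lo, hi, x) clamps x to the range [lo, hi], which is the usual meaning of that operator; the function names are hypothetical.

```python
MIN_QP = 0   # minQP from the text
MAX_QP = 63  # maxQP from the text


def clip3(lo: int, hi: int, value: int) -> int:
    """Clamp value to the closed range [lo, hi] (assumed meaning of Clip3)."""
    return max(lo, min(hi, value))


def chroma_qp(attribute_qp: int, chroma_qp_offset: int = 0) -> int:
    """Derive chromaQpCb or chromaQpCr from attribute_qp and its offset.

    The offset defaults to 0, matching the case where the field is absent.
    """
    if not -16 <= chroma_qp_offset <= 16:
        raise ValueError("offset must be between -16 and 16")
    return clip3(MIN_QP, MAX_QP, attribute_qp + chroma_qp_offset)
```

For example, an attribute_qp of 60 with an offset of 10 is clamped to 63, while an absent offset leaves the chroma parameter equal to attribute_qp.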
[0127] The nearest neighbor prediction parameter 1 (nearestPredParam1) field is used to control the threshold for nearest neighbor prediction. The value of this field is an unsigned integer.
[0128] The nearest neighbor prediction parameter 2 (nearestPredParam2) field controls the threshold for nearest neighbor prediction; its value is an unsigned integer. Specifically, the threshold for nearest neighbor prediction is:
[0129] attrQuantParam * nearestPredParam1 + nearestPredParam2
[0130] The spatial bias coefficient (axisBias) field controls the offset in the Z direction during attribute prediction calculation. The value of this field is an unsigned integer.
[0131] The `outputBitDepthMinus1` field controls the attribute output bit depth. The value of this field is an unsigned integer, ranging from 0 to 15. The attribute output bit depth `outputBitDepth` = `outputBitDepthMinus1` + 1. It should be noted that if this syntax element is not in the bitstream, the value of the `outputBitDepthMinus1` field is the default value (0).
[0132] The `numOflevelOfDetail` field controls the number of Levels of Detail (LoD) layers used during attribute prediction. This field has an unsigned integer value. Within the current portion of the bitstream, the value of `numOflevelOfDetail` should not exceed 32.
[0133] The `maxNumOfPredictNeighbours` field limits the number of neighbor points selected during attribute prediction. This field has an unsigned integer value. In the current bitstream, the value of `maxNumOfPredictNeighbours` should not exceed 16.
[0134] The intraLodFlag field controls whether intra-layer prediction is enabled. This field is a binary variable; a value of '1' indicates that intra-layer prediction is enabled, and a value of '0' indicates that intra-layer prediction is disabled.
[0135] The `colorReorderMode` field indicates the reordering mode selected for the current color information. The value of this field is an unsigned integer. When the value of `colorReorderMode` is "0", the original point cloud input order is used; when the value of `colorReorderMode` is "1", Hilbert reordering is used; and when the value of `colorReorderMode` is "2", Morton reordering is used.
[0136] The `refReorderMode` field indicates the reordering mode selected for the current reflectance information. The value of this field is an unsigned integer. When the value of `refReorderMode` is "0", the original point cloud input order is used; when the value of `refReorderMode` is "1", Hilbert reordering is used; and when the value of `refReorderMode` is "2", Morton reordering is used.
[0137] The `attrEncodeOrder` field controls the encoding order of attributes when the point cloud contains multiple attribute types. This field is a binary variable. A value of '0' indicates that color is encoded first, followed by reflectance; a value of '1' indicates that reflectance is encoded first, followed by color.
[0138] The `crossAttrTypePred` field indicates whether cross-type attribute prediction is allowed. This field is a binary variable; a value of '1' indicates that cross-type attribute prediction is allowed, while a value of '0' indicates that cross-type attribute prediction is not allowed.
[0139] The crossAttrTypePredParam1 field controls the weighting of geometric and attribute distances in cross-type attribute prediction. The value of this field is a 15-bit unsigned integer.
[0140] The crossAttrTypePredParam2 field controls the weight parameter 2 used to calculate the distance between geometric information and attribute information in cross-type attribute prediction. The value of this field is a 21-bit unsigned integer.
[0141] The reflectance group prediction flag (refGroupPred) field controls whether the reflectance group prediction mode of the prediction transformation is enabled. This field is a binary variable; a value of '1' indicates that group prediction is enabled, and a value of '0' indicates that group prediction is disabled.
[0142] Furthermore, ue(v) denotes an unsigned integer syntax element coded with an Exp-Golomb code, left bit first. se(v) denotes a signed integer syntax element coded with an Exp-Golomb code, left bit first. u(n) denotes an n-bit unsigned integer; if n is 'v' in the syntax table, the number of bits is determined by the values of other syntax elements.
[0143] The decoding indication information for point cloud frames also includes attribute slice headers, the syntax of which can be found in Table 6 below:
[0144] Table 6
[0145]
[0146] The slice_id field indicates the number of the attribute slice; its value is an unsigned integer. ue(v) denotes an unsigned integer syntax element coded with an Exp-Golomb code, left bit first.
[0147] As can be seen from the above sequence header and the decoding indication information of the point cloud frame, although the attribute data has an attribute header to indicate the attribute-related parameters, the following problems exist: (1) Each point cloud frame consists of a geometry header, an attribute header, and one or more point cloud slice data. However, a point cloud attribute slice contains all the attribute data, making it impossible to map different point cloud attributes to different point cloud attribute slices. (2) For a given type of point cloud attribute, only one set of point cloud data of that type can exist.
[0148] Based on this, this application proposes a data processing method for point cloud media. By extending fields at the high-level syntax layer of the bitstream, the method distinguishes the type and the unique identifier of each point cloud attribute, supports multiple attribute headers for indicating attribute-related parameters, and supports multiple sets of point cloud data of the same attribute type. This application can be applied to content production devices, content consumption devices, and intermediate nodes in point cloud media systems. The point cloud sequence is the highest-level syntax structure of the bitstream. A point cloud sequence starts with a sequence header and also includes the bitstream data of one or more point cloud frames. The bitstream data of each point cloud frame corresponds to decoding indication information, which includes a geometric header, an attribute header, and one or more point cloud slice data. Point cloud slices include point cloud geometric slices and point cloud attribute slices: a point cloud geometric slice consists of a geometric slice header and geometric information, and a point cloud attribute slice consists of an attribute slice header and attribute information. The following example illustrates how the high-level syntax information of a point cloud can be indicated by extending the high-level syntax of the Audio Video Coding Standard (AVS) Geometry-based Point Cloud Compression (GPCC) bitstream. The high-level syntax elements involved include the sequence header, the attribute header, and the attribute slice header. In one implementation, the decoding indication information of a point cloud frame contains multiple attribute headers, each corresponding to one or more point cloud attribute slices that carry the same attribute identifier field value as that attribute header. The sequence header extensions are shown in Table 7.
[0149] Table 7
[0150]
[0151]
[0152] The attributeIdentifier field indicates the attribute data of the point cloud frame. Each set of data instances of each type of attribute data in the point cloud frame has a different attributeIdentifier value; that is, each set of data instances corresponds to a unique attribute data identifier. For example, if the bitstream contains two different sets of color data, the two sets of color data instances have different attributeIdentifier values.
[0153] The `attributeSliceDataType` field indicates the type of attribute data in the point cloud slice. A value of 0 indicates that the point cloud slice contains only color attribute data; a value of 1 indicates that the point cloud slice contains only reflectance attribute data; and a value of 2 indicates that the point cloud slice contains both color and reflectance attribute data, and attribute prediction can span both attribute types. The syntax for the remaining fields in the sequence header can be found in Table 1 above and will not be repeated here.
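The three values of attributeSliceDataType described above can be sketched as an enumeration. The class and function names below are hypothetical and serve only to illustrate the mapping; any value other than 0, 1 or 2 is rejected here because the text does not define it.

```python
from enum import IntEnum
from typing import Tuple


class AttributeSliceDataType(IntEnum):
    """Values of attributeSliceDataType described in the text."""
    COLOR = 0
    REFLECTANCE = 1
    COLOR_AND_REFLECTANCE = 2


def attribute_types(value: int) -> Tuple[str, ...]:
    """Return the attribute types carried by a slice with this field value."""
    kind = AttributeSliceDataType(value)  # raises ValueError for other values
    if kind is AttributeSliceDataType.COLOR:
        return ("color",)
    if kind is AttributeSliceDataType.REFLECTANCE:
        return ("reflectance",)
    return ("color", "reflectance")


def allows_cross_type_prediction(value: int) -> bool:
    """Only the combined value lets attribute prediction span both types."""
    return AttributeSliceDataType(value) is AttributeSliceDataType.COLOR_AND_REFLECTANCE
```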
[0154] The extensions of the attribute headers corresponding to Table 7 above are shown in Table 8:
[0155] Table 8
[0156]
[0157]
[0158]
[0159] The syntax of each field in the attribute header (Table 8) can be found in Tables 4 and 7 above, and will not be repeated here.
[0160] The extensions of the attribute slice header corresponding to Tables 7 and 8 above are shown in Table 9:
[0161] Table 9
[0162]
[0163] The syntax of each field in the attribute slice header (Table 9) can be found in Tables 6 and 7 above, and will not be repeated here.
[0164] In another implementation, the decoding indication information of the point cloud frame contains an attribute header. By indicating the attribute identifier field and the attribute slice data type field respectively, the attribute data in the point cloud attribute slice can be distinguished, the correspondence between the point cloud attribute slice and the attribute header can be determined, and multiple sets of point cloud data containing the same point cloud type can be supported. The sequence header extension can be specified in Table 7 above, and the attribute header extension is shown in Table 10:
[0165] Table 10
[0166]
[0167]
[0168] The syntax of each field in the attribute header (Table 10) can be found in Tables 4 and 7 above, and will not be repeated here.
[0169] The extensions of the attribute data bitstream (general_attribute_data_bitstream) corresponding to Tables 7 and 10 above are shown in Table 11:
[0170] Table 11
[0171]
[0172]
[0173] The syntax of each field in the attribute data bitstream can be found in Table 7 above, and will not be repeated here.
[0174] In this embodiment, the attribute data of the point cloud frame is indicated by the attribute identifier field and the attribute slice data type field, so that the attribute data of the point cloud frame can be distinguished. The attribute identifier field also establishes the correspondence between the sequence header, the attribute header, and the attribute slice header or attribute data bitstream. As a result, the decoding indication information of the point cloud frame can include at least two attribute headers, and multiple sets of point cloud data of the same attribute type are supported. This enriches the decoding indication methods of point cloud media, thereby supporting more flexible file encapsulation and transmission methods and more diverse point cloud application forms.
[0175] Figure 2 is a flowchart of a data processing method for point cloud media provided in an embodiment of this application. The method can be executed by a content consumption device in a point cloud media system and includes the following steps S201 and S202:
[0176] S201. Obtain the media files of the point cloud media.
[0177] The media file includes bitstream data of point cloud frames and decoding indication information of point cloud frames. The decoding indication information of each point cloud frame includes at least two attribute headers. Each attribute header includes an attribute identifier field, which is used to indicate the attribute data of the point cloud frame. The value of the attribute identifier field corresponding to each set of data instances of each type of attribute data in the point cloud frame is different (that is, each set of data instances of each type of attribute data corresponds to a unique attribute data identifier). The value of the attribute identifier field is an eight-bit unsigned integer.
[0178] In one implementation, the decoding indication information for each point cloud frame further includes one or more point cloud slice data. Each point cloud slice data includes a point cloud attribute slice, and each point cloud attribute slice includes an attribute slice header, which includes an attribute identifier field. Each point cloud attribute slice can be indexed to the corresponding attribute header through the value of the attribute identifier field in the attribute slice header.
[0179] Each attribute header corresponds to one or more point cloud attribute slices, and the value of the attribute identifier field in each attribute header matches the value of the attribute identifier field in the point cloud attribute slices corresponding to that attribute header. In other words, a content consumption device can determine the one or more point cloud attribute slices corresponding to a given attribute header by using the value of the attribute identifier field in the attribute header. Specific indication methods for attribute headers and attribute slice headers can be found in Tables 8 and 9 above, and will not be repeated here.
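The indexing of point cloud attribute slices to attribute headers through the attribute identifier can be sketched as a simple grouping step. The data classes and the function `slices_per_header` are hypothetical and carry only the fields needed for the illustration; a real attribute header holds many more parameters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttributeHeader:
    attribute_identifier: int


@dataclass
class AttributeSlice:
    slice_id: int
    attribute_identifier: int  # carried in the attribute slice header


def slices_per_header(headers: List[AttributeHeader],
                      slices: List[AttributeSlice]) -> Dict[int, List[int]]:
    """Map each header's identifier to the slice_ids of its attribute slices."""
    grouped: Dict[int, List[int]] = {h.attribute_identifier: [] for h in headers}
    for s in slices:
        if s.attribute_identifier not in grouped:
            raise KeyError(f"no attribute header with identifier {s.attribute_identifier}")
        grouped[s.attribute_identifier].append(s.slice_id)
    return grouped
```

With two headers carrying identifiers 0 and 1, and three slices carrying identifiers 0, 1 and 0, the first header is associated with two slices and the second with one.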
[0180] In one implementation, the bitstream data of the point cloud frame includes an attribute data bitstream, which includes an attribute identifier field.
[0181] The attribute identifier field in the attribute header has N possible values, where N is a positive integer. This attribute header corresponds to M attribute data bitstreams, where M is an integer greater than or equal to N. The value of the attribute identifier field in the attribute header matches the value of the attribute identifier field in the attribute data bitstream. In other words, based on the current value of the attribute identifier field in the attribute header, the content consuming device can determine the attribute data bitstream in the bitstream data of the point cloud frame whose value matches at least one attribute identifier field value in the attribute header. Specific indication methods for the attribute header and attribute bitstreams can be found in Tables 10 and 11 above, and will not be repeated here.
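For the single-header variant, the matching between the N identifier values in the attribute header and the M attribute data bitstreams can be sketched as a filter. Representing each attribute data bitstream as an `(attributeIdentifier, payload)` pair is an assumption made for brevity, and `matching_bitstreams` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

# Assumed representation: (attributeIdentifier, payload bytes).
AttributeBitstream = Tuple[int, bytes]


def matching_bitstreams(header_identifiers: Iterable[int],
                        bitstreams: List[AttributeBitstream]) -> List[AttributeBitstream]:
    """Keep the bitstreams whose identifier matches one listed in the header."""
    wanted = set(header_identifiers)
    return [b for b in bitstreams if b[0] in wanted]
```

In a case with N = 2 identifier values (3 and 7) in the header, M = 3 of the bitstreams below match, and a bitstream carrying identifier 9 is left out: `matching_bitstreams([3, 7], [(3, b"a"), (7, b"b"), (3, b"c"), (9, b"d")])`.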
[0182] In one implementation, the bitstream data of each point cloud frame includes one or more types of attribute data; each type of attribute data includes one or more sets of data instances; the value of the attribute identifier field corresponding to each set of data instances of each type of attribute data is different; for example, if the bitstream data of the point cloud frame contains two different sets of color data, then the value of the attribute identifier field corresponding to each set of color data instances is different.
[0183] In one implementation, each attribute header also includes an attribute slice data type field, which indicates the type of attribute data indicated by the attribute identifier field. The value of the attribute slice data type field is a four-bit unsigned integer.
[0184] Specifically, when the attribute slice data type field is set to a first value (e.g., attributeSliceDataType = 0), it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second value (e.g., attributeSliceDataType = 1), it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; and when the attribute slice data type field is set to a third value (e.g., attributeSliceDataType = 2), it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data. Furthermore, when the attribute slice data type field is set to a third value (e.g., attributeSliceDataType = 2), it indicates that switching between different types of attribute data is allowed during attribute prediction; for example, switching between color type attribute data and reflectance type attribute data.
[0185] In one implementation, the bitstream data may include one or more sets of data instances of attribute data; the media file also includes a sequence header, which indicates the number of sets of data instances of attribute data contained in the bitstream data, the attribute identifier field corresponding to each set of data instances, and the attribute slice data type field corresponding to each set of data instances.
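The information carried by the sequence header for each set of data instances can be sketched as a small record with a consistency check. The names `AttributeInstanceInfo` and `validate_sequence_header` are hypothetical; the 8-bit identifier range follows the text, and the data type field is read here as 4 bits wide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AttributeInstanceInfo:
    attribute_identifier: int       # 8-bit unsigned integer per the text
    attribute_slice_data_type: int  # read here as a 4-bit unsigned integer


def validate_sequence_header(instances: List[AttributeInstanceInfo]) -> int:
    """Check ranges and identifier uniqueness; return the number of sets."""
    seen = set()
    for info in instances:
        if not 0 <= info.attribute_identifier <= 0xFF:
            raise ValueError("attribute identifier does not fit in 8 bits")
        if not 0 <= info.attribute_slice_data_type <= 0xF:
            raise ValueError("attribute slice data type does not fit in 4 bits")
        if info.attribute_identifier in seen:
            raise ValueError("attribute identifiers must be unique per set")
        seen.add(info.attribute_identifier)
    return len(instances)
```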
[0186] In one implementation, the content consuming device acquires all component tracks of the point cloud media. The attribute identifier fields corresponding to the attribute data encapsulated in different attribute component tracks have different values. Based on the decoding indication information of the point cloud frame, the application format of the point cloud frame, or its own decoding capabilities, the content consuming device determines the attribute component tracks required for decoding. After obtaining the required attribute component tracks for decoding, the content consuming device decapsulates these attribute component tracks to obtain the required bitstream data.
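One way a content consumption device might pick the attribute component tracks it needs is sketched below. The text leaves the selection criteria open (decoding indication information, application format, or decoding capability); this sketch assumes the simplest case, where the device keeps only tracks whose attribute type it can decode. The dictionary keys and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def select_attribute_tracks(tracks: List[Dict[str, int]],
                            decodable_types: Iterable[int]) -> List[Dict[str, int]]:
    """Keep the attribute component tracks whose data type the device can decode.

    Each track is described by hypothetical keys: "track_id",
    "attribute_identifier" and "attribute_slice_data_type".
    """
    decodable = set(decodable_types)
    return [t for t in tracks if t["attribute_slice_data_type"] in decodable]
```

A device that only handles color data (type 0) would keep both color tracks of a file that carries two color sets and one reflectance set, and would skip the reflectance track before decapsulation.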
[0187] In another implementation, the content consuming device acquires the transmission signaling file of the point cloud media, and determines the media file required for presenting the point cloud media based on the description information in the transmission signaling file, the application format of the point cloud frames, its own decoding capabilities, or current network conditions (such as network transmission speed). The device then retrieves the determined media file of the point cloud media via streaming. The media file contains attribute component tracks required for decoding. The content consuming device decapsulates these attribute component tracks to obtain the required bitstream data.
[0188] S202. Present the point cloud media based on the bitstream data and the decoding indication information.
[0189] The content consumption device decodes the bitstream data of the point cloud frame according to the decoding indication information to present the point cloud media. The decoding process on the content consumption device side is the reverse of the encoding process on the content production device side: the content consumption device decodes the encoded bitstream according to the decoding indication information to reconstruct the point cloud data. Then, based on the rendering-related and viewport-related metadata in the media presentation description information, it renders the reconstructed point cloud data to obtain the point cloud frames of the point cloud media, and presents the point cloud media according to the presentation time of the point cloud frames. For a detailed implementation of decoding the media file and presenting the point cloud media on the content consumption device, refer to the implementation of decoding and presenting point cloud media in Figure 1d, which will not be elaborated here.
[0190] In this embodiment, a media file of point cloud media is obtained. The media file includes bitstream data of point cloud frames and decoding indication information of point cloud frames. The decoding indication information of each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field. The point cloud media is presented based on the bitstream data and decoding indication information. It can be seen that the attribute identifier field can distinguish the attribute data of the point cloud frames and indicate it through at least two attribute headers. In addition, the attribute slice data type field is used to distinguish the type of attribute data, enriching the decoding indication method of point cloud media. Content consumption devices can flexibly select the required point cloud media files for transmission and decoding consumption according to their needs.
[0191] Figure 3 is a flowchart of another data processing method for point cloud media provided in an embodiment of this application. The method can be executed by a content creation device in a point cloud media system and includes the following steps S301-S303:
[0192] S301. Obtain the point cloud frame of the point cloud media and encode the point cloud frame to obtain the bit stream data of the point cloud frame.
[0193] The bitstream data may include one or more sets of attribute data instances. For a detailed implementation of step S301, refer to the implementations of obtaining the media content of the point cloud media and encoding the media content of the point cloud media in Figure 1d, which will not be described in detail here.
[0194] S302. Generate decoding indication information for point cloud frames based on the bitstream data.
[0195] The decoding indication information for each point cloud frame includes at least two attribute headers, each of which includes an attribute identifier field. This field is used to indicate the attribute data of the point cloud frame. The value of the attribute identifier field is different for each set of data instances of each type of attribute data in the point cloud frame (that is, each set of data instances of each type of attribute data corresponds to a unique attribute data identifier). The value of the attribute identifier field is an eight-bit unsigned integer.
[0196] In one implementation, the content creation device generates a sequence header based on the bitstream data of the point cloud frame. The sequence header indicates the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group. After obtaining the sequence header, the content creation device generates decoding indication information for the point cloud frame based on the sequence header.
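On the content creation side, generating the sequence header information from the encoded data instances can be sketched as follows. The text only requires that every set of data instances receive a unique attribute identifier; assigning identifiers sequentially from 0, as done here, is an assumption, and the dictionary key `numInstances` and the function name are hypothetical.

```python
from typing import Dict, List


def build_sequence_header(instance_types: List[int]) -> Dict[str, object]:
    """Build sequence header entries from the data type of each instance set.

    Identifiers are assigned sequentially (an assumption); they must fit in
    the 8-bit attribute identifier field.
    """
    if len(instance_types) > 256:
        raise ValueError("more sets than 8-bit identifiers available")
    entries = [
        {"attributeIdentifier": identifier, "attributeSliceDataType": data_type}
        for identifier, data_type in enumerate(instance_types)
    ]
    return {"numInstances": len(entries), "instances": entries}
```

For a frame with two color sets and one reflectance set, `build_sequence_header([0, 0, 1])` gives the two color sets different identifiers even though they share the same data type.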
[0197] In one implementation, the decoding indication information for each point cloud frame further includes one or more point cloud slice data. Each point cloud slice data includes a point cloud attribute slice, and each point cloud attribute slice includes an attribute slice header, which includes an attribute identifier field. Each point cloud attribute slice can be indexed to the corresponding attribute header through the value of the attribute identifier field in the attribute slice header.
[0198] Each attribute header corresponds to one or more point cloud attribute slices, and the value of the attribute identifier field in each attribute header matches the value of the attribute identifier field in the point cloud attribute slices corresponding to that attribute header. In other words, a content consumption device can determine the one or more point cloud attribute slices corresponding to a given attribute header by using the value of the attribute identifier field in the attribute header. Specific indication methods for attribute headers and attribute slice headers can be found in Tables 8 and 9 above, and will not be repeated here.
[0199] In one implementation, the bitstream data of the point cloud frame includes an attribute data bitstream, which includes an attribute identifier field.
[0200] The attribute identifier field in the attribute header has N possible values, where N is a positive integer. This attribute header corresponds to M attribute data bitstreams, where M is an integer greater than or equal to N. The value of the attribute identifier field in the attribute header matches the value of the attribute identifier field in the attribute data bitstream. In other words, based on the current value of the attribute identifier field in the attribute header, the content consuming device can determine the attribute data bitstream in the bitstream data of the point cloud frame whose value matches at least one attribute identifier field value in the attribute header. Specific indication methods for the attribute header and attribute bitstreams can be found in Tables 10 and 11 above, and will not be repeated here.
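As an illustrative sketch of this matching, the following assumes that an attribute header lists its N identifier values and that each attribute data bitstream is represented as an (identifier, data) pair. Both representations, and the function name select_attribute_bitstreams, are assumptions made for the example and do not reproduce the syntax of Tables 10 and 11.

```python
def select_attribute_bitstreams(header_identifiers, bitstreams):
    """Return the bitstreams whose identifier equals one of the header's identifier values.

    bitstreams is a list of (attribute_identifier, data) tuples (hypothetical layout).
    """
    wanted = set(header_identifiers)
    return [b for b in bitstreams if b[0] in wanted]


# One header with N = 2 identifier values; M = 3 of the four bitstreams match it.
bitstreams = [(0, b"color-a"), (1, b"color-b"), (2, b"refl"), (1, b"color-b2")]
matched = select_attribute_bitstreams([0, 1], bitstreams)
```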
[0201] In one implementation, the bitstream data of each point cloud frame includes one or more types of attribute data; each type of attribute data includes one or more sets of data instances; the value of the attribute identifier field corresponding to each set of data instances of each type of attribute data is different; for example, if the bitstream data of the point cloud frame contains two different sets of color data, then the value of the attribute identifier field corresponding to each set of color data instances is different.
[0202] In one implementation, each attribute header also includes an attribute slice data type field, which indicates the type of attribute data indicated by the attribute identifier field. The value of the attribute slice data type field is a four-bit unsigned integer.
[0203] Specifically, when the attribute slice data type field is set to a first value (e.g., attributeSliceDataType = 0), it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second value (e.g., attributeSliceDataType = 1), it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; and when the attribute slice data type field is set to a third value (e.g., attributeSliceDataType = 2), it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data. Furthermore, when the attribute slice data type field is set to a third value (e.g., attributeSliceDataType = 2), it indicates that switching between different types of attribute data is allowed during attribute prediction; for example, switching between color type attribute data and reflectance type attribute data.
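The value mapping above can be written as a small enumeration. The numeric values 0, 1, and 2 are the example values given in the paragraph; the Python names AttributeSliceDataType and allows_cross_type_prediction are hypothetical and serve only as a sketch.

```python
from enum import IntEnum


class AttributeSliceDataType(IntEnum):
    # Example values from the description; a real bitstream syntax may define others.
    COLOR = 0
    REFLECTANCE = 1
    COLOR_AND_REFLECTANCE = 2


def allows_cross_type_prediction(value: int) -> bool:
    """True only for the third value, which permits switching types during attribute prediction."""
    return AttributeSliceDataType(value) is AttributeSliceDataType.COLOR_AND_REFLECTANCE
```

In this sketch a value outside the three listed ones raises ValueError when converted to the enumeration, which is one possible way to reject values the description does not define.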
[0204] S303. Encapsulate the bitstream data and decoding indication information to obtain the media file of the point cloud media.
[0205] In one implementation, the content creation device encapsulates the sequence header, bitstream data, and decoding indication information to obtain a media file of point cloud media.
[0206] Optionally, after obtaining the media file of the point cloud media, the content production device slices the media file to obtain multiple media segments; and generates a transmission signaling file for the media file. The transmission signaling file is used to describe the point cloud data encapsulated in the track, so that the content consumption device can flexibly select the required point cloud media file for transmission and decoding consumption according to the description of the transmission signaling file and its own needs.
[0207] The following is a complete example illustrating the data processing method for point cloud media provided in this application:
[0208] Based on the geometric and attribute data contained in the bitstream data of the point cloud frames, the content creation device defines the following high-level syntax information:
[0209] a) The sequence header indicates the number of data instance groups of attribute data contained in the bitstream data of the point cloud frame, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group; including:
[0210] maxNumAttributesMinus1 = 1;
[0211] {attributeIdentifier=0; attributeSliceDataType=0}
[0212] {attributeIdentifier=1; attributeSliceDataType=0}
[0213] Where maxNumAttributesMinus1=1 indicates that the maximum number of attribute codes supported by the current bitstream data is 2 (i.e., the current bitstream data contains two sets of data instances); {attributeIdentifier=0;attributeSliceDataType=0} indicates that the attribute identifier corresponding to one set of data instances in the current bitstream data is 0, and the type of this set of data instances is color; {attributeIdentifier=1;attributeSliceDataType=0} indicates that the attribute identifier corresponding to another set of data instances in the current bitstream data is 1, and the type of this set of data instances is color.
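The example sequence header can be modeled as follows. This is a sketch under two assumptions that go beyond the text: that the number of listed entries equals maxNumAttributesMinus1 + 1, as it does in this example, and that identifiers must be unique across entries. The class names SequenceHeader and AttributeEntry are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeEntry:
    attribute_identifier: int
    attribute_slice_data_type: int  # 0 is the example value for color


@dataclass(frozen=True)
class SequenceHeader:
    max_num_attributes_minus1: int
    entries: tuple

    def __post_init__(self):
        # Assumption: one entry per supported attribute, as in the example.
        if len(self.entries) != self.max_num_attributes_minus1 + 1:
            raise ValueError("entry count does not match maxNumAttributesMinus1 + 1")
        ids = [e.attribute_identifier for e in self.entries]
        # Each set of data instances carries a distinct identifier.
        if len(set(ids)) != len(ids):
            raise ValueError("attribute identifiers must be unique")


# The example above: two sets of color data instances with identifiers 0 and 1.
seq = SequenceHeader(1, (AttributeEntry(0, 0), AttributeEntry(1, 0)))
```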
[0214] b) The content production device defines the geometric header (parameter set) and at least two attribute headers (parameter sets) required for decoding each point cloud frame based on the information in the sequence header; wherein, the bitstream data of each point cloud frame contains two attribute headers, and the values of the attribute identifier field and the attribute slice data type field in the attribute header correspond one-to-one with the information in the sequence header.
[0215] c) The content production device defines the corresponding point cloud geometric slices and point cloud attribute slices based on the information in the sequence header. Each point cloud attribute slice is indexed to the corresponding attribute header (parameter set) through the attributeIdentifier field in the point cloud attribute slice slice header.
[0216] Next, the content production device encapsulates the bitstream data of the point cloud frame into three file tracks based on the high-level syntax information, including one geometry component track and two attribute component tracks. These two attribute component tracks correspond to two sets of data instances with attribute identifier 0 and attribute identifier 1, respectively.
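The track layout of this example can be sketched as a plain data structure. The dictionary keys kind and attribute_identifier and the function build_component_tracks are hypothetical; an actual file encapsulation would follow the media file format rather than this simplified model.

```python
def build_component_tracks(attribute_identifiers):
    """Return one geometry component track plus one attribute component track per identifier."""
    tracks = [{"kind": "geometry"}]
    tracks += [
        {"kind": "attribute", "attribute_identifier": i} for i in attribute_identifiers
    ]
    return tracks


# Two sets of data instances (identifiers 0 and 1) yield three tracks in total.
tracks = build_component_tracks([0, 1])
```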
[0217] In one implementation, the content production device sends all the component tracks of the point cloud media to the content consumption device. After obtaining all the component tracks of the point cloud media, the content consumption device partially decodes the required attribute component tracks based on its own decoding capabilities or the application form of the point cloud media.
[0218] In another implementation, the content production device generates MPD signaling for the point cloud media and transmits it to the content consumption device. After receiving the MPD signaling for the point cloud media, the content consumption device requests geometric component tracks and a specific attribute component track based on the transmission bandwidth, its own decoding capability, or the application form of the point cloud media, and partially transmits and decodes the required attribute component track.
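A consumption-side selection of this kind might look like the sketch below. It assumes the signaling has already been parsed into a list of track descriptions; the dictionary layout and the function choose_tracks are hypothetical simplifications and do not represent actual MPD syntax.

```python
def choose_tracks(track_descriptions, wanted_attribute_identifier):
    """Select all geometry tracks plus the one attribute track with the wanted identifier."""
    selected = [t for t in track_descriptions if t["kind"] == "geometry"]
    matches = [
        t
        for t in track_descriptions
        if t["kind"] == "attribute"
        and t["attribute_identifier"] == wanted_attribute_identifier
    ]
    if not matches:
        raise LookupError(
            f"no attribute track with identifier {wanted_attribute_identifier}"
        )
    selected.append(matches[0])
    return selected


descriptions = [
    {"kind": "geometry"},
    {"kind": "attribute", "attribute_identifier": 0},
    {"kind": "attribute", "attribute_identifier": 1},
]
# The device requests the geometry track and only the attribute track it needs.
requested = choose_tracks(descriptions, 1)
```

Only the geometry track and the attribute track with identifier 1 would then be requested, so the other attribute track is neither transmitted nor decoded.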
[0219] In this embodiment, point cloud frames of point cloud media are acquired, and the point cloud frames are encoded to obtain bitstream data. Based on the bitstream data, decoding indication information for each point cloud frame is generated. The decoding indication information for each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field. The bitstream data and decoding indication information are encapsulated to obtain a media file of the point cloud media. It is evident that the attribute identifier field distinguishes the attribute data in the point cloud frames, and that this attribute data is indicated through at least two attribute headers. Furthermore, the attribute slice data type field distinguishes the type of attribute data, enriching the decoding indication methods for point cloud media. This enables content consumption devices to flexibly select the desired point cloud media file for transmission and decoding consumption according to their needs.
[0220] The methods of the embodiments of this application have been described in detail above. In order to facilitate better implementation of the above solutions of the embodiments of this application, the apparatus of the embodiments of this application is provided below.
[0221] Referring to Figure 4, Figure 4 is a schematic diagram of the structure of a point cloud media data processing device provided in an embodiment of this application. The point cloud media data processing device can be a computer program (including program code) running on a content consumption device; for example, it can be application software in the content consumption device. As shown in Figure 4, the data processing device for point cloud media includes an acquisition unit 401 and a processing unit 402.
[0222] Referring to Figure 4, in one exemplary embodiment, the units are described in detail as follows:
[0223] The acquisition unit 401 is used to acquire the media file of the point cloud media. The media file includes the bitstream data of the point cloud frame and the decoding indication information of the point cloud frame. The decoding indication information of each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0224] The processing unit 402 is used to present point cloud media based on the bitstream data and decoding indication information.
[0225] In one implementation, the decoding indication information for each point cloud frame further includes one or more point cloud slice data, each point cloud slice data including a point cloud attribute slice, the point cloud attribute slice including an attribute slice header, and the attribute slice header including an attribute identifier field.
[0226] In one implementation, each attribute header corresponds to one or more point cloud attribute slices, and the value of the attribute identifier field in each attribute header matches the value of the attribute identifier field in the point cloud attribute slices corresponding to that attribute header.
[0227] In one implementation, the bitstream data of the point cloud frame includes an attribute data bitstream, which includes an attribute identifier field.
[0228] In one implementation, the value of the attribute identifier field in the attribute header matches the value of the attribute identifier field in the attribute data bitstream.
[0229] In one implementation, the bitstream data of each point cloud frame includes one or more types of attribute data; each type of attribute data includes one or more sets of data instances; and the attribute identifier field corresponding to different data instances has different values.
[0230] In one implementation, each attribute header also includes an attribute slice data type field, which indicates the type of attribute data indicated by the attribute identifier field.
[0231] In one implementation, when the value of the attribute slice data type field is a first set value, it is used to indicate that the attribute data indicated by the attribute identifier field is color type attribute data.
[0232] When the attribute slice data type field takes the second set value, it is used to indicate that the attribute data indicated by the attribute identifier field is reflectance type attribute data;
[0233] When the attribute slice data type field takes the third set value, it indicates that the attribute data indicated by the attribute identifier field includes color type attribute data and reflectance type attribute data.
[0234] In one implementation, when the attribute slice data type field takes the value of a third set value, switching between different types of attribute data is allowed during attribute prediction.
[0235] In one implementation, the media file further includes a sequence header, which indicates the number of data instance groups of attribute data contained in the bitstream data, an attribute identifier field corresponding to each data instance group, and an attribute slice data type field corresponding to each data instance group.
[0236] In one embodiment, the acquisition unit 401 is further configured to:
[0237] Obtain the transmission signaling file of the point cloud media, which includes the description information of the point cloud media;
[0238] Based on the description information of the point cloud media, determine the media files required to present the point cloud media;
[0239] The media files of the determined point cloud media are retrieved using streaming transmission.
[0240] According to one embodiment of this application, the steps of the data processing method for point cloud media shown in Figure 2 can be performed by the individual units of the data processing device for point cloud media shown in Figure 4. For example, step S201 shown in Figure 2 can be performed by the acquisition unit 401 shown in Figure 4, and step S202 can be performed by the processing unit 402 shown in Figure 4. The units of the data processing device for point cloud media shown in Figure 4 can be combined, individually or all together, into one or more other units, or some of the units can be further split into multiple functionally smaller units; either arrangement achieves the same operation without affecting the technical effects of the embodiments of this application. The above units are divided on the basis of logical function. In practical applications, the function of one unit can be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the data processing device for point cloud media may also include other units; in practical applications, these functions can also be implemented with the assistance of other units and can be implemented collaboratively by multiple units.
[0241] According to another embodiment of this application, a computer program (including program code) capable of executing each step of the method shown in Figure 2 can be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM), so as to construct the data processing device for point cloud media shown in Figure 4 and to implement the data processing method for point cloud media of the embodiments of this application. The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via the computer-readable recording medium, and executed therein.
[0242] Based on the same inventive concept, the data processing device for point cloud media provided in the embodiments of this application solves the problem in a similar principle and with similar beneficial effects as the data processing method for point cloud media in the embodiments of this application. For details, please refer to the implementation principle and beneficial effects of the method. For the sake of brevity, these will not be repeated here.
[0243] Referring to Figure 5, Figure 5 is a schematic diagram of the structure of another point cloud media data processing device provided in an embodiment of this application. The point cloud media data processing device can be a computer program (including program code) running on a content production device; for example, it can be application software in the content production device. As shown in Figure 5, the data processing device for point cloud media includes an acquisition unit 501 and a processing unit 502. Referring to Figure 5, the units are described in detail as follows:
[0244] The acquisition unit 501 is used to acquire point cloud frames of point cloud media and encode the point cloud frames to obtain the bit stream data of the point cloud frames.
[0245] The processing unit 502 is used to generate decoding indication information for point cloud frames based on the bitstream data. The decoding indication information for each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0246] The processing unit 502 is also used to encapsulate the bitstream data and the decoding indication information to obtain a media file of the point cloud media.
[0247] In one embodiment, the processing unit 502 is configured to generate decoding indication information for point cloud frames based on the bitstream data, specifically for:
[0248] A sequence header is generated based on the bitstream data. The sequence header is used to indicate the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group.
[0249] Based on the sequence header, generate decoding indication information for the point cloud frame;
[0250] The decoding indication information for each point cloud frame also includes one or more point cloud slice data, each point cloud slice data includes a point cloud attribute slice, and the point cloud attribute slice includes an attribute identifier field.
[0251] In one embodiment, the processing unit 502 is used to encapsulate the bitstream data and decoding indication information to obtain a media file of point cloud media, specifically for:
[0252] The sequence header, bitstream data, and decoding indication information are encapsulated to obtain the media file of the point cloud media.
[0253] In one embodiment, the processing unit 502 is further configured to:
[0254] Slicing a media file to obtain multiple media segments; and,
[0255] Generate the transmission signaling file for media files.
[0256] According to one embodiment of this application, the steps of the data processing method for point cloud media shown in Figure 3 can be performed by the individual units of the data processing device for point cloud media shown in Figure 5. For example, step S301 shown in Figure 3 can be performed by the acquisition unit 501 shown in Figure 5, and steps S302 and S303 can be performed by the processing unit 502 shown in Figure 5. The units of the data processing device for point cloud media shown in Figure 5 can be combined, individually or all together, into one or more other units, or some of the units can be further split into multiple functionally smaller units; either arrangement achieves the same operation without affecting the technical effects of the embodiments of this application. The above units are divided on the basis of logical function. In practical applications, the function of one unit can be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the data processing device for point cloud media may also include other units; in practical applications, these functions can also be implemented with the assistance of other units and can be implemented collaboratively by multiple units.
[0257] According to another embodiment of this application, a computer program (including program code) capable of executing each step of the method shown in Figure 3 can be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM), so as to construct the data processing device for point cloud media shown in Figure 5 and to implement the data processing method for point cloud media of the embodiments of this application. The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via the computer-readable recording medium, and executed therein.
[0258] Based on the same inventive concept, the data processing device for point cloud media provided in the embodiments of this application solves the problem in a similar principle and with similar beneficial effects as the data processing method for point cloud media in the embodiments of this application. For details, please refer to the implementation principle and beneficial effects of the method. For the sake of brevity, these will not be repeated here.
[0259] Figure 6 is a schematic diagram of the structure of a content consumption device provided in an embodiment of this application. The content consumption device can be a computer device used by a user of point cloud media, and the computer device can be a terminal, such as a PC, a smart mobile device (for example, a smartphone), or a VR device (for example, a VR headset or VR glasses). As shown in Figure 6, the content consumption device includes a receiver 601, a processor 602, a memory 603, and a display / playback device 604, wherein:
[0260] Receiver 601 is used to enable transmission interaction between the content consumption device and other devices, specifically the transmission of point cloud media between the content production device and the content consumption device. That is, the content consumption device receives, through receiver 601, the relevant media resources of the point cloud media transmitted by the content production device.
[0261] Processor 602 (or CPU (Central Processing Unit)) is the processing core of the content consumption device. Processor 602 is adapted to implement one or more program instructions, specifically to load and execute one or more program instructions so as to implement the flow of the data processing method for point cloud media shown in Figure 2.
[0262] Memory 603 is a memory device in the content consumption device used to store programs and media resources. It is understood that memory 603 here can include the built-in storage medium of the content consumption device, or it can include extended storage media supported by the content consumption device. It should be noted that memory 603 can be high-speed RAM, or non-volatile memory, such as at least one disk storage device; optionally, it can also be at least one memory located remotely from the aforementioned processor. Memory 603 provides storage space for storing the operating system of the content consumption device. Furthermore, this storage space is also used to store computer programs, which include program instructions adapted to be called and executed by the processor to perform the various steps of the point cloud media data processing method. In addition, memory 603 can also be used to store a three-dimensional image of the point cloud media formed after processor processing, the audio content corresponding to the three-dimensional image, and information required for rendering the three-dimensional image and audio content.
[0263] Display / playback device 604 is used to output rendered sound and 3D images.
[0264] Referring again to Figure 6, the processor 602 may include a parser 621, a decoder 622, a converter 623, and a renderer 624, wherein:
[0265] The parser 621 is used to decapsulate the encapsulated file of the point cloud media received from the content production device. Specifically, it decapsulates the media file resources according to the file format requirements of point cloud media to obtain an audio stream and a video stream, and provides them to the decoder 622.
[0266] Decoder 622 decodes the audio stream to obtain audio content, which is then provided to the renderer for audio rendering. Additionally, decoder 622 decodes the video stream to obtain a 2D image. Based on the metadata provided by the media presentation description information, if the metadata indicates that the point cloud media has undergone a region encapsulation process, the 2D image refers to an encapsulated image; if the metadata indicates that the point cloud media has not undergone a region encapsulation process, the 2D image refers to a projected image.
[0267] Converter 623 is used to convert 2D images into 3D images. If the point cloud media has undergone a region encapsulation process, converter 623 first decapsulates the encapsulated image to obtain a projected image and then reconstructs the projected image to obtain a 3D image. If the point cloud media has not undergone a region encapsulation process, converter 623 directly reconstructs the projected image to obtain a 3D image.
[0268] Renderer 624 is used to render the audio content and 3D images of point cloud media. Specifically, it renders the audio content and 3D images based on the metadata related to rendering and viewport in the media presentation description information, and then outputs the rendered content to the display / playback device.
[0269] In one exemplary embodiment, the processor 602 (specifically, the devices included in the processor) executes the steps of the data processing method for point cloud media shown in Figure 2 by calling one or more instructions stored in memory. Specifically, the memory stores one or more first instructions, which are adapted to be loaded by the processor 602 to perform the following steps:
[0270] Obtain the media file of the point cloud media. The media file includes the bitstream data of the point cloud frame and the decoding indication information of the point cloud frame. The decoding indication information of each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0271] Based on the bitstream data and decoding indication information, the point cloud media is presented.
[0272] In one embodiment, the decoding indication information for each point cloud frame further includes one or more point cloud slice data, each point cloud slice data includes a point cloud attribute slice, the point cloud attribute slice includes an attribute slice header, and the attribute slice header includes an attribute identifier field.
[0273] In one implementation, each attribute header corresponds to one or more point cloud attribute slices, and the value of the attribute identifier field in each attribute header matches the value of the attribute identifier field in the point cloud attribute slices corresponding to that attribute header.
[0274] In one implementation, the bitstream data of the point cloud frame includes an attribute data bitstream, and the attribute data bitstream includes an attribute identifier field.
[0275] In one implementation, the value of the attribute identifier field in the attribute header matches the value of the attribute identifier field in the attribute data bitstream.
[0276] In one implementation, the bitstream data of each point cloud frame includes one or more types of attribute data; each type of attribute data includes one or more sets of data instances; and the attribute identifier field corresponding to different data instances has different values.
[0277] In one implementation, each attribute header also includes an attribute slice data type field, which indicates the type of attribute data indicated by the attribute identifier field.
[0278] In one implementation, when the value of the attribute slice data type field is a first set value, it is used to indicate that the attribute data indicated by the attribute identifier field is color type attribute data.
[0279] When the attribute slice data type field takes the second set value, it is used to indicate that the attribute data indicated by the attribute identifier field is reflectance type attribute data;
[0280] When the attribute slice data type field takes the third set value, it indicates that the attribute data indicated by the attribute identifier field includes color type attribute data and reflectance type attribute data.
[0281] In one implementation, when the attribute slice data type field takes the value of a third set value, switching between different types of attribute data is allowed during attribute prediction.
[0282] In one embodiment, the media file further includes a sequence header, which indicates the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group.
[0283] In one embodiment, the computer program in memory 603 is loaded by processor 602 and further performs the following steps:
[0284] Obtain the transmission signaling file of the point cloud media, which includes the description information of the point cloud media;
[0285] Based on the description information of the point cloud media, determine the media files required to present the point cloud media;
[0286] The media files of the determined point cloud media are retrieved using streaming transmission.
[0287] Based on the same inventive concept, the principle and beneficial effects of the content consumption device provided in the embodiments of this application are similar to the principle and beneficial effects of the data processing method for point cloud media in the embodiments of this application. Please refer to the principle and beneficial effects of the method implementation. For the sake of brevity, they will not be repeated here.
[0288] Figure 7 is a schematic diagram of a content creation device provided in an embodiment of this application. The content creation device may be a computer device used by a provider of point cloud media, which may be a terminal (such as a PC or a smart mobile device, for example a smartphone) or a server. As shown in Figure 7, the content creation device includes a capture device 701, a processor 702, a memory 703, and a transmitter 704, wherein:
[0289] The capture device 701 is used to acquire raw data (including audio and video content synchronized in time and space) of point cloud media from real-world sound-visual scenes. The capture device 701 may include, but is not limited to, audio devices, camera devices, and sensing devices. Audio devices may include audio sensors, microphones, etc. Camera devices may include ordinary cameras, stereo cameras, light field cameras, etc. Sensing devices may include laser devices, radar devices, etc.
[0290] Processor 702 (or CPU (Central Processing Unit)) is the processing core of the content production device. Processor 702 is adapted to implement one or more program instructions, specifically to load and execute one or more program instructions so as to implement the flow of the data processing method for point cloud media shown in Figure 3.
[0291] Memory 703 is a memory device in the content creation apparatus used to store programs and media resources. It is understood that memory 703 here can include both the built-in storage medium of the content creation apparatus and extended storage media supported by the content creation apparatus. It should be noted that the memory can be high-speed RAM or non-volatile memory, such as at least one disk storage device; optionally, it can also be at least one memory located remotely from the aforementioned processor. The memory provides storage space for storing the operating system of the content creation apparatus. Furthermore, this storage space is also used to store computer programs, which include program instructions adapted to be called and executed by the processor to perform the various steps of the point cloud media data processing method. In addition, memory 703 can also be used to store point cloud media files formed after processing by the processor, which include media file resources and media presentation description information.
[0292] The transmitter 704 is used to enable transmission and interaction between the content creation device and other devices, specifically to facilitate the transmission of point cloud media between the content creation device and the content playback device. That is, the content creation device uses the transmitter 704 to transmit relevant media resources of the point cloud media to the content playback device.
[0293] Referring again to Figure 7, the processor 702 may include a converter 721, an encoder 722, and an encapsulator 723, wherein:
[0294] Converter 721 performs a series of conversion processes on captured video content to make it suitable for video encoding of point cloud media. The conversion processes may include stitching and projection; optionally, they may also include region encapsulation. Converter 721 can convert captured 3D video content into 2D images and provide them to the encoder for video encoding.
[0295] Encoder 722 is used to encode the captured audio content to form an audio bitstream of point cloud media. It is also used to encode the 2D image obtained by converter 721 to obtain a video bitstream.
[0296] The encapsulator 723 encapsulates audio and video streams into a file container according to the point cloud media file format (such as ISOBMFF) to form a point cloud media file resource. This media file resource can be a media file or a media segment forming a point cloud media file. It also records the metadata of the point cloud media file resource using media presentation description information according to the point cloud media file format requirements. The encapsulated point cloud media file obtained by the encapsulator is stored in memory and provided to the content playback device as needed for point cloud media presentation.
[0297] The processor 702 (specifically, the various components within the processor) implements the steps of the data processing method for point cloud media shown in Figure 4 by calling one or more instructions in the memory. Specifically, the memory 703 stores one or more first instructions, which are adapted to be loaded by the processor 702 to execute the following steps:
[0298] Acquire point cloud frames from point cloud media and encode the point cloud frames to obtain the bitstream data of the point cloud frames;
[0299] Based on the bitstream data, decoding indication information for point cloud frames is generated. The decoding indication information for each point cloud frame includes at least two attribute headers, and each attribute header includes an attribute identifier field.
[0300] The bitstream data and the decoding indication information are encapsulated to obtain the media file of the point cloud media.
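The three steps above (encode, generate decoding indication information, encapsulate) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption and not the format defined by this application: the class names, the `create_media_file` helper, and the text-based stand-in encoder are hypothetical, and the attribute identifiers are passed in directly, whereas a real implementation would derive them from the bitstream data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class AttributeHeader:
    attribute_id: int  # attribute identifier field


@dataclass
class DecodingIndication:
    attribute_headers: List[AttributeHeader]


@dataclass
class MediaFile:
    bitstreams: List[bytes]                # one entry per point cloud frame
    indications: List[DecodingIndication]  # one entry per point cloud frame


def encode_frame(points: Sequence[Point]) -> bytes:
    """Stand-in encoder: serializes points as text instead of compressing them."""
    return ";".join(f"{x},{y},{z}" for x, y, z in points).encode("ascii")


def generate_indication(attribute_ids: Sequence[int]) -> DecodingIndication:
    """Build per-frame indication info with one header per attribute identifier."""
    if len(attribute_ids) < 2:
        raise ValueError("at least two attribute headers are required per frame")
    return DecodingIndication([AttributeHeader(a) for a in attribute_ids])


def create_media_file(frames: Sequence[Sequence[Point]],
                      attribute_ids: Sequence[int]) -> MediaFile:
    """Encode every frame, attach its indication info, and bundle both."""
    bitstreams = [encode_frame(frame) for frame in frames]
    indications = [generate_indication(attribute_ids) for _ in bitstreams]
    return MediaFile(bitstreams, indications)
```

For example, `create_media_file([[(0.0, 0.0, 0.0)]], [0, 1])` yields one encoded frame whose indication information carries two attribute headers, while passing a single attribute identifier is rejected.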
[0301] In one embodiment, when generating the decoding indication information for point cloud frames based on the bitstream data, the processor 702 specifically performs the following steps:
[0302] A sequence header is generated based on the bitstream data. The sequence header is used to indicate the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group.
[0303] Based on the sequence header, generate decoding indication information for the point cloud frame;
[0304] The decoding indication information for each point cloud frame also includes one or more point cloud slice data, each point cloud slice data includes a point cloud attribute slice, and the point cloud attribute slice includes an attribute identifier field.
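A sketch of how a sequence header could drive the per-frame attribute headers is given below in Python. It rests on several assumptions: the class and function names are hypothetical, the numeric values of `AttrSliceDataType` are placeholders (the claims only speak of first, second, and third preset values), and matching slices to headers by equal attribute identifier follows the relationship stated in the claims.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class AttrSliceDataType(IntEnum):
    # Placeholder values: the source names only "preset values".
    COLOR = 0
    REFLECTANCE = 1
    COLOR_AND_REFLECTANCE = 2


@dataclass(frozen=True)
class InstanceGroup:
    attribute_id: int             # attribute identifier field of the group
    data_type: AttrSliceDataType  # attribute slice data type field of the group


@dataclass
class SequenceHeader:
    groups: List[InstanceGroup]

    @property
    def num_groups(self) -> int:
        """Number of data instance groups signalled by the sequence header."""
        return len(self.groups)


@dataclass
class AttributeHeader:
    attribute_id: int
    data_type: AttrSliceDataType


@dataclass
class AttributeSlice:
    attribute_id: int  # identifier carried by the point cloud attribute slice
    payload: bytes


def headers_from_sequence(seq: SequenceHeader) -> List[AttributeHeader]:
    """Derive one attribute header per data instance group."""
    return [AttributeHeader(g.attribute_id, g.data_type) for g in seq.groups]


def slices_for_header(header: AttributeHeader,
                      slices: List[AttributeSlice]) -> List[AttributeSlice]:
    """Select the attribute slices whose identifier matches the header's."""
    return [s for s in slices if s.attribute_id == header.attribute_id]
```

With this layout, a decoder that only needs one kind of attribute data could pick the matching header and skip every slice whose identifier differs.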
[0305] In one embodiment, when encapsulating the bitstream data and the decoding indication information to obtain the media file of the point cloud media, the processor 702 specifically performs the following step:
[0306] The sequence header, bitstream data, and decoding indication information are encapsulated to obtain the media file of the point cloud media.
[0307] In one embodiment, the computer program in memory 703 is loaded by processor 702 and further performs the following steps:
[0308] Slicing a media file to obtain multiple media segments; and,
[0309] Generate the transmission signaling file for media files.
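The slicing and signaling steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions: segments are cut at fixed byte offsets (a real packager would cut on sample boundaries), and the JSON document produced by the hypothetical `build_signaling` function merely stands in for whatever transmission signaling format is actually used.

```python
import json
from typing import List, Sequence


def slice_media_file(data: bytes, segment_size: int) -> List[bytes]:
    """Split a media file into consecutive segments of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_signaling(segments: Sequence[bytes], attribute_ids: Sequence[int]) -> str:
    """Describe the segments so a client can decide which ones to request."""
    description = {
        "attribute_ids": list(attribute_ids),  # hypothetical description field
        "segments": [
            {"index": i, "size": len(seg)} for i, seg in enumerate(segments)
        ],
    }
    return json.dumps(description, indent=2)
```

A content consumption device could parse such a description first and then request only the segments it needs, which is the selection behavior the signaling file is meant to support.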
[0310] Based on the same inventive concept, the content creation device provided in the embodiments of this application solves the problem on a similar principle and with similar beneficial effects as the data processing method for point cloud media in the method embodiments of this application. For details, please refer to the implementation principle and beneficial effects of the method. For the sake of brevity, these will not be repeated here.
[0311] This application also provides a computer-readable storage medium storing one or more instructions, which are adapted to be loaded by a processor to execute the data processing method for point cloud media described in the above method embodiments.
[0312] This application also provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the data processing method for point cloud media described in the above method embodiments.
[0313] This application also provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the aforementioned point cloud media data processing method.
[0314] The steps in the methods of the embodiments of this application can be reordered, combined, or deleted according to actual needs.
[0315] The modules in the devices of the embodiments of this application can be merged, divided, or deleted according to actual needs.
[0316] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, etc.
[0317] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Those skilled in the art will understand that implementations of all or part of the processes of the above embodiments, and equivalent variations made in accordance with the claims of this application, still fall within the scope of this application.
Claims
1. A data processing method for point cloud media, characterized in that the method includes: obtaining a media file of point cloud media, wherein the media file includes bitstream data of point cloud frames and decoding indication information of the point cloud frames, the decoding indication information of each point cloud frame includes at least two attribute headers, each attribute header includes an attribute identifier field and an attribute slice data type field, and the attribute slice data type field is used to indicate the type of attribute data indicated by the attribute identifier field; and presenting the point cloud media based on the bitstream data and the decoding indication information; wherein, when the attribute slice data type field is set to a first preset value, it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second preset value, it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; when the attribute slice data type field is set to a third preset value, it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data, and allows switching between different types of attribute data during attribute prediction.
2. The method of claim 1, wherein the decoding indication information for each point cloud frame also includes one or more point cloud slice data, each point cloud slice data includes a point cloud attribute slice, the point cloud attribute slice includes an attribute slice header, and the attribute slice header includes an attribute identifier field.
3. The method of claim 2, wherein each attribute header corresponds to one or more point cloud attribute slices, and the value of the attribute identifier field in each attribute header matches the value of the attribute identifier field in the point cloud attribute slice corresponding to that attribute header.
4. The method of claim 1, wherein the bitstream data of the point cloud frame includes an attribute data bitstream, and the attribute data bitstream includes an attribute identifier field.
5. The method of claim 4, wherein the value of the attribute identifier field in the attribute header matches the value of the attribute identifier field in the attribute data bitstream.
6. The method of claim 1, wherein the bitstream data of each point cloud frame includes one or more types of attribute data; each type of attribute data includes one or more groups of data instances; and the attribute identifier fields corresponding to different data instances have different values.
7. The method of claim 1, wherein the media file also includes a sequence header, which indicates the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group.
8. The method according to any one of claims 1 to 7, wherein the method further includes: obtaining a transmission signaling file of the point cloud media, wherein the transmission signaling file includes description information of the point cloud media; determining, based on the description information of the point cloud media, the media files required to present the point cloud media; and obtaining the determined media files of the point cloud media by streaming transmission.
9. A data processing method for point cloud media, characterized in that the method includes: obtaining point cloud frames of point cloud media and encoding the point cloud frames to obtain bitstream data of the point cloud frames; generating decoding indication information for the point cloud frames based on the bitstream data, wherein the decoding indication information for each point cloud frame includes at least two attribute headers, each attribute header includes an attribute identifier field and an attribute slice data type field, and the attribute slice data type field is used to indicate the type of attribute data indicated by the attribute identifier field; and encapsulating the bitstream data and the decoding indication information to obtain a media file of the point cloud media; wherein, when the attribute slice data type field is set to a first preset value, it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second preset value, it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; when the attribute slice data type field is set to a third preset value, it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data, and allows switching between different types of attribute data during attribute prediction.
10. The method of claim 9, wherein generating the decoding indication information for the point cloud frames based on the bitstream data includes: generating a sequence header based on the bitstream data, wherein the sequence header is used to indicate the number of data instance groups of attribute data contained in the bitstream data, the attribute identifier field corresponding to each data instance group, and the attribute slice data type field corresponding to each data instance group; and generating the decoding indication information for the point cloud frames based on the sequence header; wherein the decoding indication information for each point cloud frame also includes one or more point cloud slice data, each point cloud slice data includes a point cloud attribute slice, and the point cloud attribute slice includes an attribute identifier field.
11. The method of claim 10, wherein encapsulating the bitstream data and the decoding indication information to obtain the media file of the point cloud media includes: encapsulating the sequence header, the bitstream data, and the decoding indication information to obtain the media file of the point cloud media.
12. The method of claim 9, wherein the method further includes: slicing the media file to obtain multiple media segments; and generating a transmission signaling file for the media file.
13. A data processing device for point cloud media, characterized in that the data processing device for point cloud media includes: an acquisition unit, configured to acquire a media file of point cloud media, wherein the media file includes bitstream data of point cloud frames and decoding indication information of the point cloud frames, the decoding indication information of each point cloud frame includes at least two attribute headers, each attribute header includes an attribute identifier field and an attribute slice data type field, and the attribute slice data type field is used to indicate the type of attribute data indicated by the attribute identifier field; and a processing unit, configured to present the point cloud media based on the bitstream data and the decoding indication information; wherein, when the attribute slice data type field is set to a first preset value, it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second preset value, it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; when the attribute slice data type field is set to a third preset value, it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data, and allows switching between different types of attribute data during attribute prediction.
14. A data processing device for point cloud media, characterized in that the data processing device for point cloud media includes: an acquisition unit, configured to acquire point cloud frames of point cloud media and encode the point cloud frames to obtain bitstream data of the point cloud frames; and a processing unit, configured to generate decoding indication information for the point cloud frames based on the bitstream data, wherein the decoding indication information for each point cloud frame includes at least two attribute headers, each attribute header includes an attribute identifier field and an attribute slice data type field, and the attribute slice data type field is used to indicate the type of attribute data indicated by the attribute identifier field, and further configured to encapsulate the bitstream data and the decoding indication information to obtain a media file of the point cloud media; wherein, when the attribute slice data type field is set to a first preset value, it indicates that the attribute data indicated by the attribute identifier field is color type attribute data; when the attribute slice data type field is set to a second preset value, it indicates that the attribute data indicated by the attribute identifier field is reflectance type attribute data; when the attribute slice data type field is set to a third preset value, it indicates that the attribute data indicated by the attribute identifier field includes both color type attribute data and reflectance type attribute data, and allows switching between different types of attribute data during attribute prediction.
15. A computer device, characterized in that it includes: a memory in which a computer program is stored; and a processor configured to load the computer program to implement the data processing method for point cloud media as described in any one of claims 1-8, or to load the computer program to implement the data processing method for point cloud media as described in any one of claims 9-12.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to execute the data processing method for point cloud media as described in any one of claims 1-8, or to execute the data processing method for point cloud media as described in any one of claims 9-12.
17. A computer program product, characterized in that the computer program product includes a computer program adapted to be loaded by a processor to execute the data processing method for point cloud media as described in any one of claims 1-8, or to execute the data processing method for point cloud media as described in any one of claims 9-12.