Method and apparatus for encapsulating and decapsulating point cloud media file, and storage medium
By adding the characteristic information of attribute instances to point cloud media files, the problem of not being able to determine attribute instances in point cloud media encapsulation technology is solved, and a more efficient decoding process is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2021-09-01
- Publication Date
- 2026-06-23
Abstract
Description
Technical Field
[0001] This application relates to the field of video processing technology, and in particular to a method, apparatus, and storage medium for encapsulating and decapsulating point cloud media files.
Background Technology
[0002] A point cloud is a set of randomly distributed discrete points in space that represent the spatial structure and surface properties of a three-dimensional object or scene. Point cloud media can be categorized into 3DoF (Degree of Freedom) media, 3DoF+ media, and 6DoF media according to the degrees of freedom users have when consuming the content.
[0003] Each point in a point cloud includes geometric information and attribute information. Attribute information includes different types of attributes, such as color and reflectance, and attribute information of the same type can include different attribute instances. For example, the color attribute of a point can include different color types, which are called different attribute instances of the color attribute. In encoding technologies such as Geometry-based Point Cloud Compression (G-PCC), a single bitstream may include multiple attribute instances of the same attribute type.
[0004] However, when multiple attribute instances of the same attribute type exist, current point cloud media encapsulation technology cannot determine which attribute instance should be consumed, which results in low decoding efficiency for point cloud media.
Summary of the Invention
[0005] This application provides a method, apparatus, and storage medium for encapsulating and decapsulating point cloud media files. Because the first feature information of at least one of the M attribute instances is added to the media file, attribute instances can be consumed selectively based on that information, which saves decoding resources and improves decoding efficiency.
[0006] In a first aspect, this application provides a method for encapsulating point cloud media files, applied to a file encapsulation device, the method comprising:
[0007] Acquire a target point cloud and encode the target point cloud to obtain a bitstream of the target point cloud. The target point cloud includes N types of attribute information, and at least one type of attribute information in the N types of attribute information includes M attribute instances. N is a positive integer and M is a positive integer greater than 1.
[0008] The target point cloud's bitstream is encapsulated based on the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, wherein the media file of the target point cloud includes the first feature information of the at least one attribute instance.
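The encapsulation step can be pictured with a small data model. The sketch below is illustrative only: the class names `AttributeInstance` and `MediaFile`, the function `encapsulate`, and the `feature_info` dictionary are hypothetical, since the text does not specify what the first feature information contains or how it is laid out in the file. It simply copies each instance's feature information into file-level metadata so that a decapsulator could inspect it without decoding any attribute payload.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeInstance:
    instance_id: int
    attr_type: str                   # e.g. "color" or "reflectance"
    feature_info: Dict[str, object]  # hypothetical "first feature information"
    payload: bytes                   # encoded attribute data (placeholder)


@dataclass
class MediaFile:
    geometry: bytes
    attributes: List[AttributeInstance]
    # File-level copy of the feature information, readable without decoding.
    metadata: List[Dict[str, object]] = field(default_factory=list)


def encapsulate(geometry: bytes, instances: List[AttributeInstance]) -> MediaFile:
    """Wrap the encoded data and expose each instance's feature information."""
    per_type: Dict[str, int] = {}
    for inst in instances:
        per_type[inst.attr_type] = per_type.get(inst.attr_type, 0) + 1
    # Mirror the precondition in the text: some attribute type has M > 1 instances.
    if not any(count > 1 for count in per_type.values()):
        raise ValueError("expected an attribute type with more than one instance")
    metadata = [
        {"instance_id": i.instance_id, "attr_type": i.attr_type, **i.feature_info}
        for i in instances
    ]
    return MediaFile(geometry=geometry, attributes=list(instances), metadata=metadata)
```

The sketch records feature information for every instance, which is one case of "at least one attribute instance"; treating the M > 1 condition as a hard precondition is a design choice of the sketch, not something the text requires.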
[0009] In a second aspect, this application provides a method for decapsulating point cloud media files, applied to a file decapsulation device, the method comprising:
[0010] Receive the first information sent by the file encapsulation device;
[0011] The first information indicates the first feature information of at least one of M attribute instances, where the M attribute instances are included in at least one of the N types of attribute information of the target point cloud, N is a positive integer, and M is a positive integer greater than 1.
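On the decapsulation side, the first information lets the device pick a target attribute instance before decoding. The following is a minimal sketch, assuming the feature information arrives as a list of dictionaries and that a hypothetical `priority` key (lower is preferred) drives the choice; the function names `select_instance` and `payloads_to_decode` are illustrative and not taken from the source.

```python
from typing import Dict, List, Optional


def select_instance(metadata: List[Dict[str, object]], attr_type: str,
                    key: str = "priority") -> Optional[int]:
    """Return the instance_id of the preferred instance of attr_type, or None.

    Hypothetical rule: the lowest value of `key` wins, and instances that do
    not carry `key` are ranked after those that do.
    """
    candidates = [m for m in metadata if m.get("attr_type") == attr_type]
    if not candidates:
        return None
    best = min(candidates, key=lambda m: (key not in m, m.get(key, 0)))
    return int(best["instance_id"])


def payloads_to_decode(payloads: Dict[int, bytes],
                       metadata: List[Dict[str, object]],
                       attr_type: str) -> Dict[int, bytes]:
    """Keep only the payload of the selected instance; others are never decoded."""
    chosen = select_instance(metadata, attr_type)
    return {} if chosen is None else {chosen: payloads[chosen]}
```

The saving described in the text comes from the second function: instances that are not selected are skipped before the decoder sees them.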
[0012] In a third aspect, this application provides a point cloud media file encapsulation apparatus, applied to a file encapsulation device, the apparatus comprising:
[0013] An acquisition unit, configured to acquire a target point cloud and encode the target point cloud to obtain a bitstream of the target point cloud, where the target point cloud includes N types of attribute information, at least one of the N types of attribute information includes M attribute instances, N is a positive integer, and M is a positive integer greater than 1;
[0014] An encapsulation unit, configured to encapsulate the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, where the media file of the target point cloud includes the first feature information of the at least one attribute instance.
[0015] In a fourth aspect, this application provides a point cloud media file decapsulation apparatus, applied to a file decapsulation device, the apparatus comprising:
[0016] A transceiver unit, configured to receive the first information sent by the file encapsulation device;
[0017] The first information indicates the first feature information of at least one of M attribute instances, where the M attribute instances are included in at least one of the N types of attribute information of the target point cloud, N is a positive integer, and M is a positive integer greater than 1.
[0018] In a fifth aspect, this application provides a file encapsulation device, comprising a processor and a memory, where the memory is configured to store a computer program and the processor is configured to call and run the computer program stored in the memory to perform the method of the first aspect.
[0019] In a sixth aspect, this application provides a file decapsulation device, comprising a processor and a memory, where the memory is configured to store a computer program and the processor is configured to call and run the computer program stored in the memory to perform the method of the second aspect.
[0020] In a seventh aspect, an encoding/decoding system is provided, including the file encapsulation device of the fifth aspect and the file decapsulation device of the sixth aspect.
[0021] In an eighth aspect, a chip is provided for implementing the method of the first aspect or the second aspect, or of any of their implementations. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device on which the chip is mounted performs the method of the first aspect or the second aspect, or of any of their implementations.
[0022] In a ninth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to perform the method of the first aspect or the second aspect, or of any of their implementations.
[0023] In a tenth aspect, a computer program product is provided, comprising computer program instructions that cause a computer to perform the methods of any one of the first to second aspects or their respective implementations.
[0024] In an eleventh aspect, a computer program is provided that, when run on a computer, causes the computer to perform the method of the first aspect or the second aspect, or of any of their implementations.
[0025] In a twelfth aspect, an electronic device is provided, including a processor and a memory, where the memory is configured to store a computer program and the processor is configured to call and run the computer program stored in the memory to perform the method of the first aspect or the second aspect.
[0026] In summary, in this application, the file encapsulation device acquires a target point cloud and encodes it to obtain a bitstream of the target point cloud. The target point cloud includes N types of attribute information, and at least one of these N types includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1. The bitstream is then encapsulated based on the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, and this media file includes the first feature information of the at least one attribute instance. In other words, this application adds the first feature information of attribute instances to the media file, so that the file decapsulation device can determine the specific target attribute instance to decode based on that first feature information, thereby saving bandwidth and decoding resources and improving decoding efficiency.
Attached Figure Description
[0027] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0028] Figure 1 is a schematic diagram of three degrees of freedom;
[0029] Figure 2 is a schematic diagram of three degrees of freedom plus;
[0030] Figure 3 is a schematic diagram of six degrees of freedom;
[0031] Figure 4A is an architecture diagram of an immersive media system provided in an embodiment of this application;
[0032] Figure 4B is a schematic diagram of the content flow of V3C media provided in an embodiment of this application;
[0033] Figure 5 is a flowchart of a point cloud media file encapsulation method provided in an embodiment of this application;
[0034] Figure 6 is an interactive flowchart of a point cloud media file encapsulation and decapsulation method provided in an embodiment of this application;
[0035] Figure 7 is an interactive flowchart of a point cloud media file encapsulation and decapsulation method provided in an embodiment of this application;
[0036] Figure 8 is a schematic structural diagram of a point cloud media file encapsulation device provided in an embodiment of this application;
[0037] Figure 9 is a schematic structural diagram of a point cloud media file decapsulation device provided in an embodiment of this application;
[0038] Figure 10 is a schematic block diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0040] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.
[0041] This application relates to data processing technology for point cloud media.
[0042] Before introducing the technical solution of this application, the following is a brief introduction to relevant knowledge about this application:
[0043] Point cloud: A point cloud is a set of randomly distributed discrete points in space that represent the spatial structure and surface properties of a three-dimensional object or scene. Each point in a point cloud has at least three-dimensional positional information, and depending on the application scenario, may also have color, material, or other information. Typically, each point in a point cloud has the same number of additional attributes.
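As a concrete reading of this definition, each point can be modeled as a 3D position plus a dictionary of additional attributes. The sketch below assumes that representation (the `Point` alias and both function names are hypothetical); it checks the "same number of additional attributes" property, here tightened to the same attribute names, and computes the axis-aligned bounding box of the positions.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]
Point = Tuple[Position, Dict[str, tuple]]  # (x, y, z) plus named attributes


def has_uniform_attributes(points: List[Point]) -> bool:
    """True if every point carries the same set of attribute names."""
    if not points:
        return True
    expected = set(points[0][1])
    return all(set(attrs) == expected for _, attrs in points)


def bounding_box(points: List[Point]) -> Tuple[Position, Position]:
    """Axis-aligned bounds of the point positions (points must be non-empty)."""
    xs, ys, zs = zip(*(pos for pos, _ in points))
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))
```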
[0044] V3C (Visual Volumetric Video-based Coding) media: immersive media that captures visual content in three-dimensional space and provides a 3DoF+ or 6DoF viewing experience. It is coded with conventional video coding and carried in volumetric video tracks in file encapsulation; examples include multi-view video and video-coded point clouds.
[0045] PCC: Point Cloud Compression.
[0046] G-PCC: Geometry-based Point Cloud Compression.
[0047] V-PCC: Video-based Point Cloud Compression, a point cloud compression based on traditional video coding.
[0048] Atlas: indicates region information on 2D planar frames, region information in 3D rendering space, the mapping relationship between the two, and the parameter information required for the mapping.
[0049] Track: a collection of media data in the media file encapsulation process. A media file can consist of multiple tracks; for example, a media file can include a video track, an audio track, and a subtitle track.
[0050] Sample: A sample is a unit of encapsulation in the media file encapsulation process. A media track consists of many samples. For example, a sample in a video track is usually a video frame.
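The track/sample relationship can be sketched as a track holding a time-ordered list of samples. This is a simplified model under stated assumptions: the `Track` and `Sample` classes and `sample_at` are hypothetical, and real ISOBMFF files store sample timing in tables such as the decoding-time box rather than on each sample. The handler codes "vide" and "soun" are the ISOBMFF handler types for video and audio.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    decode_time: int  # in units of the track's timescale
    data: bytes       # e.g. one coded video frame


@dataclass
class Track:
    handler: str      # e.g. "vide" for video, "soun" for audio
    timescale: int    # ticks per second
    samples: List[Sample]

    def sample_at(self, seconds: float) -> Sample:
        """Return the last sample whose decode time is at or before `seconds`."""
        ticks = seconds * self.timescale
        times = [s.decode_time for s in self.samples]
        idx = bisect_right(times, ticks) - 1
        if idx < 0:
            raise ValueError("time precedes the first sample")
        return self.samples[idx]
```

For example, a 30-tick-per-second video track with one frame per tick returns its second frame for a request at 0.05 s.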
[0051] DoF: Degree of Freedom. In a mechanical system, it refers to the number of independent coordinates, including translational degrees of freedom, rotational degrees of freedom, and vibrational degrees of freedom. In the embodiments of this application, it refers to the degrees of freedom of movement and content interaction supported by the user when watching immersive media.
[0052] 3DoF: three degrees of freedom, meaning that the user's head can rotate around the X, Y, and Z axes. Figure 1 is a schematic diagram of three degrees of freedom. As shown in Figure 1, at a fixed location or point, the user can rotate about all three axes: turning the head, tilting it up and down, or tilting it from side to side. Through this three-degrees-of-freedom experience, users can be immersed in a scene in 360 degrees. If the scene is static, it can be understood as a panoramic image; if the panoramic image is dynamic, it is a panoramic video, that is, a VR video. However, VR video has a limitation: users cannot move or choose an arbitrary position from which to view the content.
[0053] 3DoF+: in addition to the three rotational degrees of freedom, users have limited freedom of movement along the X, Y, and Z axes. This is also called restricted six degrees of freedom, and the corresponding media stream can be called a restricted six-degrees-of-freedom media stream. Figure 2 is a schematic diagram of three degrees of freedom plus.
[0054] 6DoF: in addition to the three rotational degrees of freedom, users can move freely along the X, Y, and Z axes. The corresponding media stream can be called a six-degrees-of-freedom media stream. Figure 3 is a schematic diagram of six degrees of freedom. 6DoF media refers to six-degrees-of-freedom video, that is, video that allows users to freely move their viewpoint along the X, Y, and Z axes in three-dimensional space and freely rotate it around the X, Y, and Z axes, providing a high degree of viewing freedom. 6DoF media is a combination of video captured from different spatial perspectives by a camera array. To facilitate the representation, storage, compression, and processing of 6DoF media, the data is represented as a combination of the following information: texture maps captured by multiple cameras, depth maps corresponding to those texture maps, and the corresponding 6DoF media content description metadata. The metadata includes the parameters of the multiple cameras as well as descriptive information such as the stitching layout and edge protection of the 6DoF media. At the encoding end, the texture maps and corresponding depth maps from the multiple cameras are stitched together, and description data of the stitching method is written into the metadata according to the defined syntax and semantics. The stitched multi-camera depth and texture information is encoded using planar video compression and transmitted to the terminal for decoding, after which 6DoF virtual viewpoints are synthesized to provide the user with a 6DoF media viewing experience.
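The difference between 3DoF, 3DoF+, and 6DoF can be illustrated by how a viewer pose responds to head motion. This is a minimal sketch under assumptions: the `Pose` class, `apply_motion`, and especially the `max_offset` radius used to model the "limited" movement of 3DoF+ are hypothetical, since the text does not quantify that limit.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    yaw: float = 0.0    # rotation about the vertical axis (radians)
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0      # translation relative to the nominal viewing position
    y: float = 0.0
    z: float = 0.0


def apply_motion(pose: Pose, d_rot: Vec3, d_pos: Vec3, mode: str,
                 max_offset: float = 0.5) -> Pose:
    """Update a viewer pose under a 3DoF, 3DoF+, or 6DoF model.

    max_offset is a hypothetical radius limiting 3DoF+ translation.
    """
    yaw, pitch, roll = pose.yaw + d_rot[0], pose.pitch + d_rot[1], pose.roll + d_rot[2]
    x, y, z = pose.x, pose.y, pose.z
    if mode == "6dof":
        x, y, z = x + d_pos[0], y + d_pos[1], z + d_pos[2]
    elif mode == "3dof+":
        nx, ny, nz = x + d_pos[0], y + d_pos[1], z + d_pos[2]
        r = math.sqrt(nx * nx + ny * ny + nz * nz)
        scale = 1.0 if r <= max_offset else max_offset / r
        x, y, z = nx * scale, ny * scale, nz * scale
    elif mode != "3dof":
        raise ValueError(f"unknown mode: {mode}")
    # In 3DoF mode, translation is ignored: only rotation changes.
    return Pose(yaw, pitch, roll, x, y, z)
```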
[0055] AVS: Audio Video Coding Standard.
[0056] ISOBMFF: ISO Base Media File Format, a media file format based on standards of the ISO (International Organization for Standardization). ISOBMFF is a media file encapsulation standard, and the most typical ISOBMFF file is the MP4 (Moving Picture Experts Group 4) file.
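An ISOBMFF file is a sequence of boxes, each beginning with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit size follows the type, and a size of 0 means the box extends to the end of the enclosing data. The walker below is a sketch of that header layout only (`iter_boxes` is a hypothetical helper): it does not handle the extended type of `uuid` boxes, and for full boxes the version/flags bytes are simply left in the payload.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(buf: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for consecutive boxes."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        # 32-bit big-endian size followed by a four-character type code.
        size, raw_type = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:    # 64-bit "largesize" follows the type
            (size,) = struct.unpack_from(">Q", buf, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset + header, offset + size
        offset += size
```

Because the function accepts `offset` and `end`, the same walker can be called again on a container box's payload range to descend into nested boxes.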
[0057] DASH: Dynamic Adaptive Streaming over HTTP. Dynamic adaptive streaming over HTTP is an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers.
[0058] MPD: Media Presentation Description, a media presentation description signaling in DASH used to describe information about media segments.
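A DASH client typically reads the MPD, lists the available Representations with their `bandwidth` attribute, and picks one that fits the measured network throughput. The sketch below assumes a throughput-based heuristic with a hypothetical 0.8 safety factor; DASH itself does not mandate any particular adaptation rule, and the helper names are illustrative. The namespace string is the standard MPD schema namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_xml: str) -> List[Tuple[str, int]]:
    """Return (id, bandwidth) for every Representation in the MPD."""
    root = ET.fromstring(mpd_xml)
    return [(rep.get("id"), int(rep.get("bandwidth")))
            for rep in root.iterfind(".//mpd:Representation", MPD_NS)]


def choose_representation(reps: List[Tuple[str, int]], throughput_bps: float,
                          safety: float = 0.8) -> Tuple[str, int]:
    """Pick the highest bitrate within a safety margin of measured throughput,
    falling back to the lowest bitrate when none fits."""
    affordable = [r for r in reps if r[1] <= throughput_bps * safety]
    if affordable:
        return max(affordable, key=lambda r: r[1])
    return min(reps, key=lambda r: r[1])
```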
[0059] HEVC: High Efficiency Video Coding, an international video coding standard HEVC / H.265.
[0060] VVC: Versatile Video Coding, the international video coding standard VVC / H.266.
[0061] Intra (picture) Prediction: intra-frame prediction.
[0062] Inter (picture) Prediction: inter-frame prediction.
[0063] SCC: Screen Content Coding.
[0064] QP: Quantization Parameter.
[0065] Immersive media refers to media content that provides consumers with an immersive experience. Based on the degree of freedom users have when consuming media content, immersive media can be divided into 3DoF media, 3DoF+ media, and 6DoF media. Among them, common 6DoF media includes point cloud media.
[0066] A point cloud is a set of randomly distributed discrete points in space that represent the spatial structure and surface properties of a three-dimensional object or scene. Each point in a point cloud has at least three-dimensional positional information and, depending on the application, may also have color, material, or other information. Typically, each point in a point cloud has the same number of additional attributes.
[0067] Point clouds can flexibly and conveniently represent the spatial structure and surface properties of three-dimensional objects or scenes, and therefore have a wide range of applications, including Virtual Reality (VR) games, Computer Aided Design (CAD), Geographic Information System (GIS), Autonomous Navigation System (ANS), digital cultural heritage, free-viewpoint broadcasting, 3D immersive remote presentation, and 3D reconstruction of biological tissues and organs.
[0068] Point clouds are mainly acquired by computer generation, 3D laser scanning, and 3D photogrammetry. Computers can generate point clouds of virtual 3D objects and scenes. 3D laser scanning can acquire point clouds of static real-world 3D objects or scenes at millions of points per second. 3D photogrammetry can acquire point clouds of dynamic real-world 3D objects or scenes at tens of millions of points per second. Furthermore, in the medical field, point clouds of biological tissues and organs can be obtained from MRI, CT, and electromagnetic positioning information. These technologies have reduced the cost and time required for point cloud data acquisition and improved data accuracy, making the acquisition of massive amounts of point cloud data possible. With the continuous accumulation of large-scale point cloud data, efficient storage, transmission, publication, sharing, and standardization of point cloud data have become crucial for point cloud applications.
[0069] After encoding point cloud media, the encoded data stream needs to be encapsulated and transmitted to the user. Correspondingly, on the point cloud media player side, the point cloud file needs to be decapsulated first, then decoded, and finally the decoded data stream is presented. Therefore, obtaining specific information during the decapsulation stage can improve the efficiency of the decoding stage to a certain extent, thereby bringing a better experience to the presentation of point cloud media.
[0070] Figure 4A is an architecture diagram of an immersive media system provided in an embodiment of this application. As shown in Figure 4A, the immersive media system includes an encoding device and a decoding device. The encoding device may be the computer equipment used by the immersive media provider, which can be a terminal (such as a PC, a smart mobile device, or a smartphone) or a server. The decoding device may be the computer equipment used by the immersive media user, which can be a terminal (such as a PC, a smart mobile device, or a VR device, for example a VR headset or VR glasses). Data processing of immersive media includes data processing on the encoding device side and data processing on the decoding device side.
[0071] The data processing at the encoding device mainly includes:
[0072] (1) The process of acquiring and producing immersive media content;
[0073] (2) The encoding and file encapsulation process of immersive media.
The data processing at the decoding device mainly includes:
[0074] (3) The process of decapsulating and decoding immersive media files;
[0075] (4) The rendering process of immersive media.
[0076] In addition, the transmission of immersive media between the encoding and decoding devices can be based on various transmission protocols, including but not limited to: DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMTP (Smart Media Transport Protocol), TCP (Transmission Control Protocol), etc.
[0077] Each step of immersive media data processing is described in detail below with reference to Figure 4A.
[0078] I. Data processing at the encoding device end:
[0079] (1) The process of acquiring and producing media content for immersive media.
[0080] 1) The process of acquiring media content in immersive media.
[0081] Immersive media content is obtained by capturing sound and visual scenes from the real world using capture devices.
[0082] In one implementation, the capture device can refer to a hardware component located within the encoding device, such as a microphone, camera, or sensor on the terminal. In another implementation, the capture device can also be a hardware device connected to the encoding device, such as a camera connected to a server.
[0083] The capture device may include, but is not limited to, audio devices, camera devices, and sensing devices. Audio devices may include audio sensors, microphones, etc. Camera devices may include ordinary cameras, stereo cameras, light field cameras, etc. Sensing devices may include laser devices, radar devices, etc.
[0084] Multiple capture devices can be deployed at specific locations in the real space to simultaneously capture audio and video content from different angles within that space, ensuring that the captured audio and video content is synchronized in both time and space. The media content acquired through these capture devices is called the raw data of the immersive media.
[0085] 2) The production process of immersive media content.
[0086] The captured audio content itself is suitable for audio encoding for immersive media. The captured video content undergoes a series of processing steps before it becomes suitable for video encoding for immersive media. These processing steps include:
[0087] ① Stitching. Since the captured video content is taken by the capture device from different angles, stitching refers to stitching these video contents taken from various angles into a complete video that can reflect a 360-degree visual panorama of real space. That is, the stitched video is a panoramic video (or spherical video) represented in three-dimensional space.
[0088] ② Projection. Projection refers to the process of mapping a stitched 3D video onto a 2D image. The 2D image formed by projection is called a projected image. Projection methods may include, but are not limited to: latitude and longitude projection and regular hexahedral projection.
[0089] ③ Region Encapsulation. Projected images can be directly encoded, or they can be encapsulated into regions before encoding. In practice, it has been found that encapsulating 2D projected images into regions before encoding significantly improves the video encoding efficiency of immersive media. Therefore, region encapsulation technology is widely used in the video processing of immersive media. Region encapsulation refers to the process of converting a projected image into a region-based image. Specifically, the process includes: dividing the projected image into multiple mapping regions, then converting each mapping region to obtain multiple encapsulated regions, and finally mapping these encapsulated regions into a single 2D image to obtain the encapsulated image. The mapping region refers to the region defined in the projected image before region encapsulation; the encapsulated region refers to the region located in the encapsulated image after region encapsulation.
[0090] Transformation processing can include, but is not limited to: mirroring, rotation, rearrangement, upsampling, downsampling, changing the resolution of the region, and moving the region.
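Steps ② and ③ can be sketched as two coordinate mappings. The first function is one common convention for latitude-longitude (equirectangular) projection; axis orientation and sign conventions differ between specifications, so treat it as an assumption. The second maps a point of a mapping region in the projected image to its encapsulated region, with the transformation applied in normalized region coordinates (image y grows downward). The function names, the `Rect` layout, and the small set of transform labels are hypothetical; a packed region smaller than its mapping region amounts to downsampling.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def sphere_to_erp(lon: float, lat: float, width: int, height: int) -> Tuple[float, float]:
    """Latitude-longitude projection: lon in [-pi, pi), lat in [-pi/2, pi/2]."""
    u = (lon + math.pi) / (2 * math.pi) * width
    v = (math.pi / 2 - lat) / math.pi * height
    return u, v


# Normalized (s, t) transforms applied to a mapping region; image y grows downward.
TRANSFORMS = {
    "none": lambda s, t: (s, t),
    "mirror": lambda s, t: (1 - s, t),      # horizontal mirror
    "rot90ccw": lambda s, t: (t, 1 - s),    # rotate 90 degrees counterclockwise
    "rot180": lambda s, t: (1 - s, 1 - t),
}


def projected_to_packed(pt: Tuple[float, float], proj_rect: Rect,
                        packed_rect: Rect, transform: str) -> Tuple[float, float]:
    """Map a point of a mapping region to its encapsulated (packed) region."""
    px, py, pw, ph = proj_rect
    s, t = (pt[0] - px) / pw, (pt[1] - py) / ph
    s, t = TRANSFORMS[transform](s, t)
    qx, qy, qw, qh = packed_rect
    return qx + s * qw, qy + t * qh
```

In the 90-degree case, the packed rectangle's width corresponds to the mapping region's height, which is why the test region below changes from 100×50 to 50×100.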
[0091] It should be noted that if the capture devices capture only panoramic video, then after the video is processed by the encoding device and transmitted to the decoding device for further processing, users on the decoding side can view the 360-degree video only by performing specific actions (such as head rotation); other actions (such as head movement) produce no corresponding change in the video, resulting in a poor VR experience. It is therefore necessary to provide depth information that matches the panoramic video so as to give users better immersion and a better VR experience, which involves 6DoF (Six Degrees of Freedom) production technology. 6DoF means that users can move relatively freely within a simulated scene. When 6DoF technology is used to produce immersive video content, the capture devices typically include light field cameras, laser devices, and radar devices that capture point cloud data or light field data in space. In addition, the production steps ①-③ described above require specific processing, such as cutting and mapping of the point cloud data and calculation of depth information.
[0092] (2) The process of encoding and encapsulating immersive media.
[0093] The captured audio content can be directly encoded to form the audio stream of the immersive media. After production process ①-② or ①-③, the projected image or encapsulated image is video-encoded to obtain the video stream of the immersive media. For example, the encapsulated image (D) is encoded into an encoded image (Ei) or an encoded video bitstream (Ev), and the captured audio (Ba) is encoded into an audio bitstream (Ea). Then, according to a specific media container file format, the encoded images, video, and/or audio are combined into a media file (F) for file playback or into a sequence of initialization segments and media segments (Fs) for streaming. The encoding device also includes metadata, such as projection and region information, in the file or segments to help present the decoded encapsulated image.
[0094] It should be noted that if 6DoF production technology is used, a specific encoding method (such as point cloud encoding) is required during video encoding. The audio and video streams are encapsulated in a file container according to the immersive media file format (such as ISOBMFF (ISO Base Media File Format)) to form an immersive media file resource, which can be a media file or media segments that form the immersive media file. Furthermore, as required by the immersive media file format, a Media Presentation Description (MPD) is used to record the metadata of this immersive media file resource. Here, metadata refers to all information related to the presentation of immersive media, which may include descriptive information about the media content, descriptive information about the viewport, and signaling information related to the presentation of the media content. As shown in Figure 4A, the encoding device stores the media presentation description information and the media file resources formed by the data processing process.
[0095] Immersive media systems support data boxes, which are data blocks or objects containing metadata; that is, a data box contains the metadata of the corresponding media content. Immersive media can include multiple data boxes, such as the Sphere Region Zooming Box, which contains metadata describing sphere region zooming information; the 2D Region Zooming Box, which contains metadata describing 2D region zooming information; and the Region Wise Packing Box, which contains metadata describing the corresponding information in the region encapsulation process.
[0096] II. Data processing at the decoding device end:
[0097] (3) The process of decapsulating and decoding immersive media files;
[0098] The decoding device can dynamically obtain immersive media file resources and corresponding media presentation description information from the encoding device, either through recommendations from the encoding device or adaptively based on user needs at the decoding device end. For example, the decoding device can determine the user's orientation and position based on head / eye / body tracking information, and then dynamically request the corresponding media file resources from the encoding device based on the determined orientation and position. The media file resources and media presentation description information are transmitted from the encoding device to the decoding device via a transmission mechanism (such as DASH, SMT). The file decapsulation process at the decoding device end is the reverse of the file encapsulation process at the encoding device end. The decoding device decapsulates the media file resources according to the immersive media file format requirements to obtain audio and video streams. The decoding process at the decoding device end is the reverse of the encoding process at the encoding device end. The decoding device performs audio decoding on the audio stream to restore the audio content.
[0099] In addition, the decoding process of the video stream by the decoding device includes the following:
[0100] ① The video stream is decoded to obtain a planar image; based on the metadata provided by the media presentation description information, if the metadata indicates that the immersive media has undergone a region encapsulation process, the planar image refers to the encapsulated image; if the metadata indicates that the immersive media has not undergone a region encapsulation process, the planar image refers to the projected image.
[0101] ② If the metadata indicates that the immersive media has undergone a region encapsulation process, the decoding device will decapsulate the encapsulated image to obtain the projected image. Region decapsulation is the inverse of region encapsulation; it refers to the process of performing inverse transformation processing on the encapsulated image according to its regions, converting the encapsulated image into a projected image. Specifically, the region decapsulation process includes: performing inverse transformation processing on multiple encapsulated regions in the encapsulated image according to the metadata instructions to obtain multiple mapped regions; and mapping these mapped regions onto a 2D image to obtain the projected image. Inverse transformation processing is the process that is the opposite of transformation processing. For example, if transformation processing involves rotating 90 degrees counterclockwise, then inverse transformation processing involves rotating 90 degrees clockwise.
[0102] ③ The projected image is reconstructed based on the media presentation description information to convert it into a 3D image. Here, reconstruction refers to the process of reprojecting the two-dimensional projected image into 3D space.
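The inverse transformation in step ② can be illustrated with a minimal Python sketch. It assumes each encapsulated region is a 2D list of samples. It also assumes the only transformation applied during region encapsulation is the 90-degree counterclockwise rotation from the example above. The function names, the "rot90_ccw" label, and the (region, transform, position) tuples are all hypothetical. Real region-wise packing metadata can also signal other transformations, such as mirroring.

```python
def rotate_ccw_90(region):
    """Rotate a 2D region (list of rows) 90 degrees counterclockwise."""
    # Transposing and then reversing the row order gives a CCW rotation.
    return [list(row) for row in zip(*region)][::-1]


def rotate_cw_90(region):
    """Rotate a 2D region 90 degrees clockwise; this undoes rotate_ccw_90."""
    # Reversing the row order and then transposing gives a CW rotation.
    return [list(row) for row in zip(*region[::-1])]


# Inverse transformation for each (hypothetical) transformation label.
INVERSE_TRANSFORMS = {
    "none": lambda region: [list(row) for row in region],
    "rot90_ccw": rotate_cw_90,
}


def unpack_regions(packed_regions, width, height):
    """Rebuild a projected image from encapsulated regions.

    packed_regions: iterable of (region, transform, (x, y)), where (x, y) is the
    top-left position of the mapped region in the projected image.
    """
    projected = [[None] * width for _ in range(height)]
    for region, transform, (x, y) in packed_regions:
        mapped = INVERSE_TRANSFORMS[transform](region)  # inverse transformation
        for dy, row in enumerate(mapped):
            for dx, value in enumerate(row):
                projected[y + dy][x + dx] = value  # map onto the 2D image
    return projected
```

For example, suppose a 2x3 region [[1, 2, 3], [4, 5, 6]] was rotated counterclockwise during encapsulation, producing [[3, 6], [2, 5], [1, 4]]. Passing that region with the label "rot90_ccw" at position (0, 0) into a 3x2 image restores the original rows.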
[0103] (4) The rendering process of immersive media.
[0104] The decoding device renders two things: the audio content obtained from audio decoding, and the 3D image obtained from video decoding. Rendering is based on the metadata related to rendering and the viewport in the media presentation description information. Once rendering is complete, the 3D image can be played and output. Specifically, if 3DoF or 3DoF+ production techniques are used, the decoding device mainly renders the 3D image based on the current viewpoint, parallax, and depth information. If 6DoF production techniques are used, the decoding device mainly renders the 3D image within the viewport based on the current viewpoint. Here, viewpoint refers to the user's viewing position. Parallax refers to the line-of-sight difference between the user's two eyes, or the difference caused by motion. Viewport refers to the viewing area.
[0105] Immersive media systems support data boxes, which are data blocks or objects containing metadata; that is, a data box contains the metadata of the corresponding media content. Immersive media can include multiple data boxes, for example:
- the Sphere Region Zooming Box, which contains metadata describing sphere region zooming information;
- the 2D Region Zooming Box, which contains metadata describing 2D region zooming information;
- the Region Wise Packing Box, which contains metadata describing the region-wise packing process.
[0106] Figure 4B is a schematic diagram of the content flow of GPCC point cloud media provided in an embodiment of this application. As shown in Figure 4B, the immersive media system includes a file encapsulator and a file decapsulator. In some embodiments, the file encapsulator can be understood as the encoding device described above, and the file decapsulator as the decoding device described above.
[0107] A real-world visual scene (A) is captured by a set of cameras, or by a camera device with multiple lenses and sensors. The acquisition result is source point cloud data (B). One or more point cloud frames are encoded into a G-PCC bitstream, which includes an encoded geometry bitstream and an encoded attribute bitstream (E). The one or more encoded bitstreams are then combined according to a specific media container file format. For file playback, they form a media file (F). For streaming, they form a sequence of initialization segments and media segments (Fs). In this application, the media container file format is the ISO Base Media File Format (ISOBMFF) specified in ISO/IEC 14496-12. The file encapsulator also includes metadata in the file or segments. A delivery mechanism is used to deliver the segments Fs to the player.
[0108] The file (F) output by the file encapsulator is the same as the file (F') input to the file decapsulator. The file decapsulator processes the file (F') or the received segments (F's), extracts the encoded bitstream (E'), and parses the metadata. The G-PCC bitstream is then decoded into a decoded signal (D'), and point cloud data is generated from the decoded signal (D'). Where applicable, the point cloud data is rendered and displayed on the screen of a head-mounted display or any other display device. Rendering follows the current viewing position, viewing direction, or viewport, which are determined by various types of sensors (e.g., head-tracking sensors). Tracking can be performed using position-tracking sensors or eye-tracking sensors. The player uses the current viewing position or viewing direction to access the appropriate portions of the decoded point cloud data, and can also use it for decoding optimization. In viewport-dependent delivery, the current viewing position and viewing direction are also passed to the strategy module, which determines the tracks to be received.
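Because the media file (F) follows ISOBMFF, every data box in it begins with a common header. Per ISO/IEC 14496-12, the header is a 32-bit big-endian size followed by a four-character type. A size of 1 means a 64-bit largesize follows the type. A size of 0 means the box extends to the end of the enclosing data. The sketch below walks a sequence of boxes under those rules using only the standard library. It does not handle 'uuid' extended types or descend into child boxes automatically. The function name iter_boxes is illustrative, not part of any library.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit largesize follows the four-character type.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box runs to the end of the enclosing data.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset + header_len, offset + size
        offset += size
```

To walk the children of a pure container box, call iter_boxes again with offset and end set to that box's payload range.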
[0109] The above process applies to both real-time and on-demand use cases.
[0110] The parameters in Figure 4B are defined as follows:
[0111] E / E': the encoded G-PCC bitstream;
[0112] F / F': a media file that includes the track format specification, which may contain constraints on the elementary stream contained in the track samples.
[0113] Each point in a point cloud includes geometric information and attribute information. Attribute information includes different types, such as color and reflectivity, and the same type of attribute information can include different attribute instances. For example, the color attribute of a point may include different color types, which are called different attribute instances of the color attribute. Encoding techniques such as Geometry-based Point Cloud Compression (GPCC) support multiple attribute instances of the same attribute type in a single bitstream. These instances can be distinguished by an attribute instance ID.
[0114] GPCC coding supports multiple attribute instances of the same attribute type coexisting in a bitstream. However, current point cloud media encapsulation technologies provide no corresponding indication information. As a result, a file decapsulation device cannot determine which attribute instance to consume, which leads to low decoding efficiency of point cloud media.
[0115] To address these technical issues, the file encapsulation device of this application adds first characteristic information to the media file during encapsulation. This information describes at least one of the M attribute instances of the same type of attribute information of the target point cloud. The file decapsulation device can then use this first characteristic information to determine the specific target attribute instance to be decoded. This saves bandwidth and decoding resources and improves decoding efficiency.
[0116] The technical solutions of the embodiments of this application will be described in detail below through some examples. The following embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.
[0117] Figure 5 is a flowchart of a point cloud media file encapsulation method provided in an embodiment of this application. As shown in Figure 5, the method includes the following steps:
[0118] S501, The file encapsulation device acquires the target point cloud, encodes the target point cloud, and obtains the bitstream of the target point cloud.
[0119] In some embodiments, the file encapsulation device is also referred to as a point cloud encapsulation device or a point cloud encoding device.
[0120] In one example, the target point cloud is the overall point cloud.
[0121] In another example, the target point cloud described above is part of the overall point cloud, such as a subset of the overall point cloud.
[0122] In some embodiments, the target point cloud is also referred to as target point cloud data, target point cloud media content, or target point cloud content, etc.
[0123] In this embodiment of the application, the file encapsulation device acquires the target point cloud in ways including but not limited to the following:
[0124] Method 1: The file encapsulation device obtains the target point cloud from a point cloud acquisition device. For example, the file encapsulation device uses the point cloud acquired by the point cloud acquisition device as the target point cloud.
[0125] Method 2: The file encapsulation device obtains the target point cloud from a storage device. For example, after collecting point cloud data, the point cloud acquisition device stores the data in the storage device, and the file encapsulation device obtains the target point cloud from there.
[0126] Method 3: If the target point cloud is a local point cloud (a part of the overall point cloud), the file encapsulation device obtains the overall point cloud using Method 1 or Method 2. It then divides the overall point cloud into blocks and uses one of the resulting blocks as the target point cloud.
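Method 3 does not specify how the overall point cloud is divided into blocks. The sketch below assumes a uniform axis-aligned grid, purely for illustration. Each point is assigned to the cube of edge length block_size that contains it, and one block can then be chosen as the target point cloud. The function name and the block_size parameter are hypothetical.

```python
import math
from collections import defaultdict


def partition_into_blocks(points, block_size):
    """Group (x, y, z) points into axis-aligned cubic blocks of edge block_size.

    Returns a dict mapping each integer block index (bx, by, bz) to the points
    that fall inside that block, in their original order.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = defaultdict(list)
    for x, y, z in points:
        index = (math.floor(x / block_size),
                 math.floor(y / block_size),
                 math.floor(z / block_size))
        blocks[index].append((x, y, z))
    return dict(blocks)
```

For example, partition_into_blocks(overall, 1.0)[(0, 0, 0)] would select the points in the unit cube at the origin as the target point cloud.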
[0127] The target point cloud in this application embodiment includes N types of attribute information, and at least one type of attribute information in the N types of attribute information includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1.
[0128] For example, a target point cloud includes N types of attribute information, such as color, reflectivity, and transparency. The color attribute includes M different attribute instances, for example, color attributes include blue attribute instances, red attribute instances, etc.
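The structure described in [0127] and [0128] can be modelled with a few Python dataclasses. The class names and the label fields are hypothetical; only attr_instance_id comes from this application. The helper encodes the stated constraints: N is at least 1, and at least one attribute type carries M > 1 instances. It adds one assumed constraint, that instances of one type are distinguished by unique attribute instance IDs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeInstance:
    attr_instance_id: int  # distinguishes instances of the same attribute type
    label: str             # hypothetical, e.g. "red" or "blue" for color


@dataclass
class AttributeType:
    attr_type: str         # hypothetical label, e.g. "color" or "reflectivity"
    instances: List[AttributeInstance] = field(default_factory=list)


def multi_instance_types(attribute_types):
    """Validate N >= 1 and unique instance IDs; return the types with M > 1."""
    if not attribute_types:
        raise ValueError("N must be a positive integer")
    for t in attribute_types:
        ids = [inst.attr_instance_id for inst in t.instances]
        if len(ids) != len(set(ids)):
            raise ValueError(f"duplicate attr_instance_id in {t.attr_type}")
    multi = [t.attr_type for t in attribute_types if len(t.instances) > 1]
    if not multi:
        raise ValueError("no attribute type has M > 1 instances")
    return multi
```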
[0129] The target point cloud obtained above is encoded to obtain a bitstream of the target point cloud. In some embodiments, the encoding of the target point cloud includes encoding the geometric information and attribute information of the point cloud separately to obtain a geometric bitstream and an attribute bitstream of the point cloud. In some embodiments, the geometric information and attribute information of the target point cloud are encoded simultaneously, and the obtained point cloud bitstream includes both geometric information and attribute information.
[0130] The embodiments of this application mainly involve encoding the attribute information of the target point cloud.
[0131] S502, the file encapsulation device encapsulates the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, wherein the media file of the target point cloud includes the first feature information of the aforementioned at least one attribute instance.
[0132] The first characteristic information of an attribute instance can be understood as information that distinguishes the attribute instance from the other attribute instances among the M attribute instances, for example, the priority or identifier of the attribute instance.
[0133] This application does not limit the specific content of the first feature information of the attribute instance in the embodiments.
[0134] In some embodiments, the first characteristic information of an attribute instance includes at least one of the following: the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance.
[0135] In one example, the identifier of an attribute instance is represented by the field attr_instance_id, and different values of this field represent the identifier value of the attribute instance.
[0136] In one example, the priority of an attribute instance is represented by the field attr_instance_priority. The smaller the value of this optional field, the higher the priority of the attribute instance.
[0137] Optionally, attr_instance_id can be reused to indicate the priority of an attribute instance. For example, the smaller the value of attr_instance_id, the higher the priority of the attribute instance.
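Suppose a decapsulator can afford only some instances of one attribute type. It could rank them by attr_instance_priority, where a smaller value means a higher priority. When the priority is not signalled, it could fall back to attr_instance_id, as [0137] permits. The sketch below assumes each instance is a dict with those keys; the dict layout and the helper names are hypothetical.

```python
def rank_instances(instances):
    """Order attribute instances from highest to lowest priority.

    Each instance is a dict with "attr_instance_id" and, optionally,
    "attr_instance_priority". A smaller value means a higher priority. If any
    instance lacks a priority, attr_instance_id is reused as the priority.
    """
    if all("attr_instance_priority" in inst for inst in instances):
        return sorted(instances, key=lambda i: (i["attr_instance_priority"], i["attr_instance_id"]))
    return sorted(instances, key=lambda i: i["attr_instance_id"])


def keep_top(instances, k):
    """Keep the k highest-priority instances and discard the rest."""
    return rank_instances(instances)[:k]
```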
[0138] In one example, the type of an attribute instance, also known as the attribute instance selection strategy, is represented by the field attr_instance_type, and different values of this field represent different types of attribute instances.
[0139] The type of an attribute instance can be understood as a strategy that tells the file decapsulation device how to select a target attribute instance from the M attribute instances of the same type. Alternatively, it can be understood as an indication of the consumption scenario of each attribute instance. For example, if an attribute instance is associated with scenario 1, the file decapsulation device can request that attribute instance when it is in scenario 1.
[0140] In some embodiments, the type of the attribute instance includes at least one of the following: an attribute instance associated with the recommended viewport, and an attribute instance associated with user feedback.
[0141] For example, suppose the type of the attribute instance is an attribute instance associated with user feedback. The file decapsulation device can then use the user feedback information to identify the associated attribute instance, and take that attribute instance as the target attribute instance to be decoded.
[0142] For example, suppose the type of the attribute instance is an attribute instance associated with the recommended viewport. The file decapsulation device can then use information about the recommended viewport to identify the associated attribute instance, and take that attribute instance as the target attribute instance to be decoded.
[0143] In one possible implementation, if the value of the attr_instance_type field is the first value, it indicates that the type of the attribute instance is an attribute instance associated with the recommended viewport.
[0144] In one possible implementation, if the value of the attr_instance_type field is the second value, it indicates that the type of the attribute instance is an attribute instance associated with user feedback.
[0145] For example, the possible values for the field attr_instance_type are shown in Table 1:
[0146] Table 1
[0147]
Value of attr_instance_type | Description
First value | Instance associated with the viewport
Second value | Instance associated with user feedback
Other | Reserved
[0148] Optionally, the first value mentioned above is 0.
[0149] Optionally, the second value mentioned above is 1.
[0150] It should be noted that these are only examples: the first and second values are not limited to 0 and 1 and can be chosen according to the actual situation.
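The two selection strategies in [0141] and [0142] can be sketched as a small dispatch on attr_instance_type. This sketch assumes the optional values 0 and 1 from [0148] and [0149], and it ignores reserved values. The function name and its inputs are hypothetical. The inputs are the set of instance IDs associated with the current recommended viewport, and an instance ID derived from user feedback.

```python
from enum import IntEnum


class AttrInstanceType(IntEnum):
    VIEWPORT = 0       # first value: associated with the recommended viewport
    USER_FEEDBACK = 1  # second value: associated with user feedback
    # Other values are reserved.


def select_target_instance(instances, viewport_instance_ids, feedback_instance_id=None):
    """Return the attr_instance_id to decode, or None if nothing matches.

    instances: (attr_instance_id, attr_instance_type) pairs for one attribute type.
    viewport_instance_ids: IDs associated with the current recommended viewport.
    feedback_instance_id: ID derived from user feedback, if any.
    """
    for inst_id, inst_type in instances:
        if inst_type == AttrInstanceType.VIEWPORT and inst_id in viewport_instance_ids:
            return inst_id
        if inst_type == AttrInstanceType.USER_FEEDBACK and inst_id == feedback_instance_id:
            return inst_id
    return None
```

Only the returned instance would then be requested and decoded, which is the bandwidth saving the embodiment describes.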
[0151] In this step, the first feature information of at least one of the M attribute instances belonging to the same type of attribute information is added to the media file of the target point cloud.
[0152] The embodiments of this application do not limit the specific location where the first feature information of the above-mentioned at least one attribute instance is added in the media file. For example, it can be added in the header sample of the track corresponding to at least one attribute instance.
[0153] In some embodiments, the process of encapsulating the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances in S502 above to obtain a media file of the target point cloud (that is, adding the first feature information of at least one of the M attribute instances to the media file of the target point cloud) includes the following cases:
[0154] Case 1: If the geometric information and attribute information of one point cloud frame in the target point cloud are encapsulated in one track or one item, the first feature information of at least one attribute instance is added to the subsample data box corresponding to the M attribute instances.
[0155] In Case 1, the target point cloud is encapsulated with the point cloud frame as the encapsulation unit. A point cloud frame can be understood as the point cloud obtained by the point cloud acquisition device in a single scan, or as a point cloud of a preset size. Suppose the geometric and attribute information of a point cloud frame is encapsulated in one track or item. That track or item then includes geometry subsamples and attribute subsamples. The first feature information of at least one attribute instance is added to the subsample data box corresponding to the M attribute instances.
[0156] In one example, if all N types of attribute information of the target point cloud are encapsulated in one subsample, the first feature information of at least one attribute instance can be added to the data box of that subsample.
[0157] In another example, if each of the N types of attribute information of the target point cloud is encapsulated in a subsample, and if the above M attribute instances are attribute instances of the a-th type of attribute information, then the first characteristic information of at least one of the M attribute instances can be added to the subsample data box of the a-th type of attribute information.
[0158] In some embodiments, if the encapsulation standard of the media file is ISOBMFF, then the data structure of the subsample data box corresponding to case 1 above is as follows:
[0159] The codec_specific_parameters field in the SubsampleInformationBox is defined as follows:
[0160]
[0161] The payloadType is used to indicate the tlv_type data type of the G-PCC unit in the subsample.
[0162] attrIdx is used to indicate the ash_attr_sps_attr_idx of the G-PCC unit that contains attribute data in the subsample.
[0163] A value of 1 for multi_attr_instance_flag indicates that there are multiple instances of the attribute of the current type; a value of 0 indicates that there is only one instance of the attribute of the current type.
[0164] attr_instance_id indicates the identifier of the attribute instance.
[0165] `attr_instance_priority` indicates the priority of an attribute instance. The smaller the value of this field, the higher the priority of the attribute instance. When multiple attribute instances of an attribute type exist, the client can discard the attribute instance with the lower priority.
[0166] The `attr_instance_type` field indicates the type of the attribute instance. This field is used to indicate the consumption scenario of different instances, and the meanings of the field values are as follows:
[0167]
Value of attr_instance_type | Description
0 | Instance associated with the viewport
1 | Instance associated with user feedback
Other | Reserved
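The exact Case 1 syntax of codec_specific_parameters is not reproduced in this text. The bit widths below are therefore assumptions, chosen only so that the fields fit the 32-bit codec_specific_parameters field of the SubsampleInformationBox:
- payloadType: 8 bits
- attrIdx: 8 bits
- multi_attr_instance_flag: 1 bit
- attr_instance_id: 8 bits
- attr_instance_priority: 4 bits
- attr_instance_type: 3 bits

The sketch packs and unpacks these fields most-significant-bit first. Treat the layout and the helper names as hypothetical.

```python
# Hypothetical bit layout, most significant bits first: (field name, width).
# The widths are assumptions; they are chosen only to total 32 bits.
LAYOUT = [
    ("payloadType", 8),
    ("attrIdx", 8),
    ("multi_attr_instance_flag", 1),
    ("attr_instance_id", 8),
    ("attr_instance_priority", 4),
    ("attr_instance_type", 3),
]
assert sum(width for _, width in LAYOUT) == 32


def pack_codec_specific_parameters(values):
    """Pack a dict of field values into one 32-bit integer (missing fields are 0)."""
    word = 0
    for name, width in LAYOUT:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_codec_specific_parameters(word):
    """Split a 32-bit integer back into the fields of LAYOUT."""
    fields = {}
    shift = 32
    for name, width in LAYOUT:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

With the real syntax, only the LAYOUT table would need to change.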
[0168] In Case 1, after obtaining the media file, the file decapsulation device can obtain the first feature information of at least one of the M attribute instances from the aforementioned subsample data box, and then determine the target attribute instance to be decoded based on the first feature information, thereby avoiding the problem of low decoding efficiency caused by decoding all attribute instances.
[0169] Case 2: If each of the M attribute instances is encapsulated in its own track or item, the first feature information of at least one attribute instance is added to the component information data box corresponding to the M attribute instances.
[0170] In Case 2, the geometric information and attribute information of a point cloud frame are encapsulated separately. For example, the geometric information is encapsulated in a geometry track, and each attribute instance of each of the N types of attribute information is encapsulated in a track or item. Specifically, each of the M attribute instances of the same attribute type is encapsulated in a track or item. The first characteristic information of at least one of these instances can then be added to the component information data box corresponding to the M attribute instances.
[0171] In some embodiments, if the encapsulation standard of the media file is ISOBMFF, the data structure of the component information data box corresponding to Case 2 is as follows:
[0172]
[0173]
[0174] Here, gpcc_type indicates the type of the GPCC component; its values are shown in Table 2.
[0175] Table 2 Component types
[0176]
gpcc_type value | Description
1 | Reserved
2 | Geometry data
3 | Reserved
4 | Attribute data
5..31 | Reserved
[0177] attr_index is used to indicate the ordinal number of the attribute indicated in the SPS (Sequence Parameter Set).
[0178] A value of 1 for attr_type_present_flag indicates that the GPCCComponentInfoBox data box indicates attribute type information; a value of 0 indicates that the GPCCComponentInfoBox data box does not indicate attribute type information.
[0179] The attr_type indicates the type of the attribute component, and its values are shown in Table 3.
[0180] Table 3
[0181]
[0182] attr_name indicates the type of the attribute component as human-readable information.
[0183] A value of 1 for multi_attr_instance_flag indicates that there are multiple instances of the attribute of the current type; a value of 0 indicates that there is only one instance of the attribute of the current type.
[0184] attr_instance_id indicates the identifier of the attribute instance.
[0185] `attr_instance_priority` indicates the priority of an attribute instance. The smaller the value of this field, the higher the priority of the attribute instance. When multiple attribute instances of an attribute type exist, the client can discard the attribute instance with the lower priority.
[0186] Optionally, attr_instance_id can be reused to indicate the priority of an attribute instance. The smaller the value of attr_instance_id, the higher the priority of the attribute instance.
[0187] The `attr_instance_type` field indicates the type of the attribute instance. This field is used to indicate the consumption scenario of different instances, and the meanings of the field values are as follows:
[0188]
Value of attr_instance_type | Description
0 | Instance associated with the viewport
1 | Instance associated with user feedback
Other | Reserved
[0189] In Case 2, after obtaining the media file, the file decapsulation device can obtain the first feature information of at least one of the M attribute instances from the component information data box. It can then determine the target attribute instance to be decoded based on that information, which avoids the low decoding efficiency caused by decoding all attribute instances.
[0190] In one example of Case 2, the M attribute instances of the same attribute type are encapsulated one-to-one in M tracks or items, each containing one attribute instance. In this way, the first feature information of an attribute instance can be added directly to the data box of the track or item corresponding to that attribute instance.
[0191] Case 3: Suppose each of the M attribute instances is encapsulated in a track or an item, and either the M tracks form a track group or the M items form an entity group. The first feature information of at least one of the M attribute instances is then added to the track group data box or the entity group data box.
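In Case 2, a decapsulator can read the GPCCComponentInfoBox of each track and keep only the geometry track plus one attribute track per attribute type. The sketch assumes each track has already been summarized as a dict of the fields described above: gpcc_type, attr_type, attr_instance_id and attr_instance_priority. It uses the gpcc_type values 2 and 4 from Table 2. It picks the highest-priority instance unless a preferred attr_instance_id is supplied. The dict layout and the function name are hypothetical.

```python
GEOMETRY, ATTRIBUTE = 2, 4  # gpcc_type values from Table 2


def choose_tracks(tracks, preferred_ids=None):
    """Return the track_ids to request: geometry plus one instance per attribute type.

    tracks: dicts with keys track_id and gpcc_type and, for attribute tracks,
    attr_type, attr_instance_id and attr_instance_priority.
    preferred_ids: optional {attr_type: attr_instance_id} override.
    """
    preferred_ids = preferred_ids or {}
    chosen = [t["track_id"] for t in tracks if t["gpcc_type"] == GEOMETRY]
    by_type = {}
    for t in tracks:
        if t["gpcc_type"] == ATTRIBUTE:
            by_type.setdefault(t["attr_type"], []).append(t)
    for attr_type, candidates in by_type.items():
        wanted = preferred_ids.get(attr_type)
        match = [c for c in candidates if c["attr_instance_id"] == wanted]
        # Smaller attr_instance_priority means higher priority.
        best = match[0] if match else min(candidates, key=lambda c: c["attr_instance_priority"])
        chosen.append(best["track_id"])
    return chosen
```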
[0192] For example, each of the M attribute instances of the same type of attribute information is encapsulated in a track, resulting in M tracks. These M tracks form a track group. In this way, the first characteristic information of at least one of the M attribute instances can be added to the track group data box (AttributeInstanceTrackGroupBox).
[0193] For example, each of the M attribute instances of the same attribute type is encapsulated in an item. The resulting M items form an entity group. In this way, the first characteristic information of at least one of the M attribute instances can be added to the entity group data box (AttributeInstanceEntityToGroupBox).
[0194] It should be noted that the location where the first characteristic information is added in the media file of the target point cloud includes, but is not limited to, the three cases mentioned above.
[0195] In some embodiments, if the type of the attribute instance is an attribute instance associated with the recommended viewport, the method of this application further includes S502-1:
[0196] S502-1. The file encapsulation device adds the second characteristic information of the attribute instance to the metadata track of the recommended viewport associated with the attribute instance.
[0197] In one example, the second characteristic information of the attribute instance is consistent with the first characteristic information of the attribute instance, including at least one of the attribute instance's identifier, attribute instance priority, and attribute instance type.
[0198] In another example, the second characteristic information of the attribute instance includes at least one of the attribute instance's identifier and the attribute type of the attribute instance. For example, the second characteristic information of the attribute instance includes the attribute instance's identifier. As another example, the second characteristic information of the attribute instance includes the attribute instance's identifier and the attribute type of the attribute instance.
[0199] In some embodiments, adding the second characteristic information of an attribute instance to the metadata track of the recommended viewport can be implemented as follows:
[0200]
[0201]
[0202] If the viewport metadata track exists, the camera extrinsic information ExtCameraInfoStruct() shall appear in the sample entry or in the samples. It must not be the case that dynamic_ext_camera_flag is 0 and camera_extrinsic_flag[i] is 0 in all samples.
[0203] num_viewports indicates the number of viewports in the sample.
[0204] viewport_id[i] indicates the identifier of the corresponding viewport.
[0205] A value of 1 for viewport_cancel_flag[i] indicates that the viewport with the identifier viewport_id[i] has been canceled.
[0206] A value of 1 for `camera_intrinsic_flag[i]` indicates that the i-th viewport in the current sample contains camera intrinsics. If `dynamic_int_camera_flag` is 0, this field must also be 0. Similarly, when `camera_extrinsic_flag[i]` is 0, this field must also be 0.
[0207] A value of 1 for `camera_extrinsic_flag[i]` indicates that the i-th viewport in the current sample contains camera extrinsic parameters. If `dynamic_ext_camera_flag` is 0, then this field must be 0.
[0208] A value of 1 for attr_instance_asso_flag[i] indicates that the i-th viewport in the current sample is associated with the corresponding attribute instance. When attr_instance_type is 0, attr_instance_asso_flag must be 1 in at least one sample of the current track.
[0209] attr_type indicates the type of attribute component, and its values are shown in Table 3 above.
[0210] attr_instance_id indicates the identifier of the attribute instance.
[0211] In this embodiment, if the attribute instance is associated with a recommended viewport, its second characteristic information is added to the metadata track of that recommended viewport. When the file decapsulation device requests this metadata track, it can use the second characteristic information in the track to determine the target attribute instance to be decoded. For example, if the second characteristic information includes the identifier of the attribute instance, the file decapsulation device can send this identifier to the file encapsulation device. The file encapsulation device then sends the media file of the corresponding attribute instance to the file decapsulation device for consumption. This prevents the file decapsulation device from requesting unnecessary resources, saving bandwidth and decoding resources and improving decoding efficiency.
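The decapsulator side of [0211] can be sketched as follows. Each timed-metadata sample is modelled as a list of per-viewport entries. Each entry carries viewport_id, an optional viewport_cancel_flag, and attr_instance_asso_flag. When that flag is 1, the entry also carries attr_type and attr_instance_id. The dict representation and function names are hypothetical, and camera parameters are omitted. The validator encodes the [0208] rule: when attr_instance_type is 0, at least one sample in the track must set attr_instance_asso_flag to 1.

```python
def validate_viewport_track(samples, attr_instance_type):
    """Check the [0208] constraint on a recommended-viewport metadata track."""
    if attr_instance_type == 0:
        return any(e["attr_instance_asso_flag"] == 1 for sample in samples for e in sample)
    return True


def instance_for_viewport(sample, viewport_id):
    """Return (attr_type, attr_instance_id) associated with viewport_id, or None."""
    for e in sample:
        # Skip other viewports and viewports that have been cancelled.
        if e["viewport_id"] != viewport_id or e.get("viewport_cancel_flag", 0) == 1:
            continue
        if e["attr_instance_asso_flag"] == 1:
            return e["attr_type"], e["attr_instance_id"]
    return None
```

The attr_instance_id returned for the current viewport is the identifier the file decapsulation device would send to the file encapsulation device when requesting the corresponding media file.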
[0212] In some embodiments, if M attribute instances are encapsulated one-to-one in M attribute instance tracks, the file encapsulation device associates the M attribute instance tracks through a track group data box.
[0213] Specifically, M attribute instances are encapsulated one-to-one in M attribute instance tracks, with each attribute instance track containing one attribute instance. This allows M attribute instances belonging to the same type of attribute information to be associated.
[0214] For example, associating tracks of different attribute instances of the same attribute type using track groups can be achieved by adding identifiers for M attribute instances to the track group data box.
[0215] In one possible implementation, associating M attribute instance tracks through a track group data box can be achieved through the following procedure:
[0216] Attribute instance track group
[0217] Data box type: 'paig'
[0218] Container: TrackGroupBox
[0219] Mandatory: No
[0220] Quantity: 0 or more
[0221]
[0222]
[0223] Here, attr_type indicates the type of the attribute component; its values are shown in Table 3.
[0224] attr_instance_id indicates the identifier of the attribute instance.
[0225] `attr_instance_priority` indicates the priority of an attribute instance. The smaller the value of this field, the higher the priority of the attribute instance. When multiple attribute instances of an attribute type exist, the client can discard the attribute instance with the lower priority.
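Under the track-group approach, a decapsulator groups tracks by their 'paig' TrackGroupTypeBox. In ISOBMFF, tracks whose TrackGroupTypeBox has the same track_group_type and the same track_group_id belong to one group, so here each such group holds the M attribute instances of one attribute type. The sketch below groups the tracks and keeps only the highest-priority instance in each group. It assumes each track has already been parsed into a dict holding track_id and a list of (track_group_type, track_group_id, fields) entries. That dict layout and the function names are hypothetical.

```python
from collections import defaultdict


def group_paig_tracks(tracks):
    """Map track_group_id -> [(track_id, attr_instance_id, attr_instance_priority)]."""
    groups = defaultdict(list)
    for t in tracks:
        for group_type, group_id, fields in t["track_groups"]:
            if group_type == "paig":
                groups[group_id].append(
                    (t["track_id"], fields["attr_instance_id"], fields["attr_instance_priority"])
                )
    return dict(groups)


def keep_highest_priority(groups):
    """For each group, keep the track whose instance has the smallest priority value."""
    return {gid: min(members, key=lambda m: m[2])[0] for gid, members in groups.items()}
```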
[0226] In some embodiments, if the M attribute instances are encapsulated one-to-one in M attribute instance items, the M attribute instance items are associated through an entity group data box.
[0227] Specifically, the M attribute instances are encapsulated one-to-one in M attribute instance items, each containing one attribute instance. This allows the M attribute instance items of the same attribute type to be associated with one another.
[0228] For example, associating items with different attribute instances of the same attribute type using entity groups can be achieved by adding identifiers for M attribute instances to the entity group data box.
[0229] In one possible implementation, associating the M attribute instance items through an entity group data box can be implemented as follows:
[0230] Attribute instance entity to group
[0231] Data box type: 'paie'
[0232] Container: GroupsListBox
[0233] Mandatory: No
[0234] Quantity: 0 or more
[0235]
[0236]
[0237] Here, attr_type indicates the type of the attribute component; its values are shown in Table 3.
[0238] attr_instance_id indicates the identifier of the attribute instance.
[0239] `attr_instance_priority` indicates the priority of an attribute instance. The smaller the value of this field, the higher the priority of the attribute instance. When multiple attribute instances of an attribute type exist, the client can discard the attribute instance with the lower priority.
[0240] In the point cloud media file encapsulation method provided in this application, a file encapsulation device acquires a target point cloud and encodes it to obtain a bitstream of the target point cloud. The target point cloud includes N types of attribute information, at least one of which includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1. The bitstream is encapsulated according to the first feature information of at least one of the M attribute instances, producing a media file of the target point cloud that includes this first feature information. In other words, this application adds the first feature information of attribute instances to the media file. The file decapsulation device can then use it to determine the specific target attribute instance to be decoded, saving bandwidth and decoding resources and improving decoding efficiency.
[0241] Figure 6 is an interaction flowchart of a point cloud media file encapsulation and decapsulation method provided in an embodiment of this application. As shown in Figure 6, this embodiment includes the following steps:
[0242] S601. The file encapsulation device acquires the target point cloud and encodes the target point cloud to obtain the bitstream of the target point cloud.
[0243] The target point cloud includes N types of attribute information, and at least one type of attribute information in the N types of attribute information includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1.
[0244] S602. The file encapsulation device encapsulates the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud. The media file of the target point cloud includes the first feature information of the at least one attribute instance.
[0245] For the implementation of S601 and S602, refer to the detailed description of S501 and S502 above; details are not repeated here.
[0246] After the file encapsulation device encodes and encapsulates the target point cloud according to the above steps to obtain the media file of the target point cloud, it can interact with the file decapsulation device in the following ways:
[0247] Method 1: The file encapsulation device sends the encapsulated media file of the target point cloud directly to the file decapsulation device, which can then selectively consume some of the attribute instances based on the first feature information of the attribute instances in the media file.
[0248] Method 2: The file encapsulation device sends signaling to the file decapsulation device, which, based on that signaling, requests the media files of all or some of the attribute instances from the file encapsulation device for consumption.
[0249] This embodiment describes the case in Method 2 where the file decapsulation device requests and consumes the media files of only some of the attribute instances; see steps S603 to S604 below.
[0250] S603. The file encapsulation device sends the first information to the file decapsulation device.
[0251] The first information is used to indicate the first feature information of at least one of the M attribute instances.
[0252] The first characteristic information of an attribute instance includes at least one of the following: the attribute instance's identifier, the attribute instance's priority, and the attribute instance's type.
[0253] Optionally, the first information is DASH signaling.
[0254] In some embodiments, when the first information is DASH signaling, the semantics of the DASH signaling are as shown in Table 4:
[0255] Table 4
[0260] It should be noted that Table 4 above is one form of the first information, and the first information in this application embodiment includes, but is not limited to, the content shown in Table 4 above.
[0263] S604. The file decapsulation device determines the target attribute instance based on the first feature information of at least one attribute instance.
[0264] In this step, the file decapsulation device can determine the target attribute instance from the first feature information of the at least one attribute instance indicated by the first information in ways that include, but are not limited to, the following:
[0265] Method 1: If the first characteristic information of an attribute instance includes the priority of the attribute instance, then one or more attribute instances with higher priority can be identified as the target attribute instance.
[0266] Method Two: If the first characteristic information of an attribute instance includes its identifier, and the identifier indicates the priority of the attribute instance, then one or more attribute instances can be selected as the target attribute instance based on their identifiers. For example, if a smaller identifier indicates higher priority, then the attribute instances with the smallest identifiers can be selected as the target attribute instances. Conversely, if a larger identifier indicates higher priority, then the attribute instances with the largest identifiers can be selected as the target attribute instances.
[0267] Method 3: If the first characteristic information of the attribute instance includes the type of the attribute instance, the target attribute instance can be determined from the at least one attribute instance based on that type, as described in Example 1 and Example 2 below.
[0268] Example 1: If the attribute instance is of the type associated with user feedback, the file decapsulation device determines the target attribute instance from the at least one attribute instance based on the first characteristic information of at least one of the M attribute instances.
[0269] For example, the target attribute instance is determined from the at least one attribute instance based on the network bandwidth and/or computing power of the file decapsulation device and on the priorities of the attribute instances in the first characteristic information. If the network bandwidth is sufficient and the device has strong computing power, more of the attribute instances can be selected as target attribute instances; if the network bandwidth is insufficient and/or the device's computing power is weak, the attribute instance with the highest priority can be selected as the target attribute instance.
[0270] Example 2: If the attribute instance is of the type associated with the recommendation window, the file decapsulation device obtains the metadata track of the recommendation window and determines the target attribute instance from at least one of the M attribute instances based on the second characteristic information of the attribute instance contained in that metadata track.
[0271] Optionally, the second characteristic information of the attribute instance includes at least one of the attribute instance's identifier and the attribute type of the attribute instance.
[0272] The file decapsulation device obtains the metadata track of the recommendation window as follows: the file encapsulation device sends second information indicating the metadata track of the recommendation window to the file decapsulation device; based on the second information, the file decapsulation device requests the metadata track of the recommendation window from the file encapsulation device; and the file encapsulation device then sends that metadata track to the file decapsulation device.
[0273] Optionally, the second information may be sent before the first information.
[0274] Optionally, the second information may be sent after the first information.
[0275] Optionally, the second information and the first information are sent simultaneously.
[0276] In this embodiment, if the attribute instance is of the type associated with the recommendation window, the metadata track of the recommendation window includes the second characteristic information of the attribute instance. After obtaining the metadata track according to the above steps, the file decapsulation device therefore reads the second characteristic information of the attribute instance from it and determines the target attribute instance accordingly, for example by taking the attribute instance corresponding to the second characteristic information as the target attribute instance.
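The selection rules of Methods 1 to 3 can be sketched in Python as follows. This is an illustrative sketch, not the method's normative behavior: the AttrInstanceInfo structure, the boolean bandwidth_ok and device_capable inputs, the dict used for a recommendation window metadata sample, and the attr_instance_type codes (0 for the recommendation window, 1 for user feedback, following the examples later in this section) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical attr_instance_type codes, mirroring the examples in this section.
TYPE_RECOMMENDATION_WINDOW = 0
TYPE_USER_FEEDBACK = 1


@dataclass
class AttrInstanceInfo:
    attr_instance_id: int
    attr_instance_priority: int  # smaller value = higher priority
    attr_instance_type: int
    attr_type: int = 0           # attribute type (second characteristic information)


def by_priority(instances: List[AttrInstanceInfo], count: int) -> List[AttrInstanceInfo]:
    """Method 1: keep the `count` instances with the highest priority."""
    return sorted(instances, key=lambda i: i.attr_instance_priority)[:count]


def by_identifier(instances: List[AttrInstanceInfo], count: int,
                  smaller_is_higher: bool = True) -> List[AttrInstanceInfo]:
    """Method 2: the identifier itself conveys the priority order."""
    ordered = sorted(instances, key=lambda i: i.attr_instance_id,
                     reverse=not smaller_is_higher)
    return ordered[:count]


def by_type(instances: List[AttrInstanceInfo], bandwidth_ok: bool,
            device_capable: bool,
            window_sample: Optional[Dict[str, int]] = None) -> List[AttrInstanceInfo]:
    """Method 3: dispatch on the attribute instance type."""
    if all(i.attr_instance_type == TYPE_RECOMMENDATION_WINDOW for i in instances):
        # Example 2: match the second characteristic information carried in
        # the recommendation window metadata track.
        if window_sample is None:
            raise ValueError("recommendation window metadata sample required")
        return [i for i in instances
                if i.attr_instance_id == window_sample["attr_instance_id"]
                and i.attr_type == window_sample.get("attr_type", i.attr_type)]
    # Example 1 (user feedback): scale the selection to available resources.
    if bandwidth_ok and device_capable:
        return list(instances)
    return by_priority(instances, 1)
```

Passing a single boolean for bandwidth and computing power keeps the sketch short; a real client would more likely compare measured throughput and decoder load against thresholds.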
[0277] After determining the target attribute instance to be decoded according to the above steps, the file decapsulation device executes the following S605.
[0278] S605. The file decapsulation device sends a first request message to the file encapsulation device, the first request message being used to request the media file of the target attribute instance.
[0279] For example, the first request message includes the identifier of the target attribute instance.
[0280] For example, the first request message includes the first characteristic information of the target attribute instance.
[0281] S606. The file encapsulation device sends the media file of the target attribute instance to the file decapsulation device according to the first request message.
[0282] For example, if the first request message includes the identifier of the target attribute instance, the file encapsulation device can locate the media file corresponding to the target attribute instance within the media file of the target point cloud and send it to the file decapsulation device.
[0283] S607. The file decapsulation device decapsulates and then decodes the media file of the target attribute instance to obtain the attribute information of the target attribute instance.
[0284] Specifically, after receiving the media file of the target attribute instance, the file decapsulation device first decapsulates the media file of the target attribute instance to obtain the decapsulated target attribute instance bitstream, and then decodes the target attribute instance bitstream to obtain the decoded target attribute instance.
[0285] In some embodiments, if the attribute information of the target point cloud is encoded based on the geometric information of the point cloud, the file encapsulation device also sends the media file of the geometric information corresponding to the target attribute instance to the file decapsulation device. The file decapsulation device decodes the geometric information and then performs attribute decoding on the target attribute instance based on the decoded geometric information.
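On the file encapsulation device side, handling a first request message that carries the identifier of the target attribute instance could look like the sketch below. It assumes a simplified in-memory track list and reuses the gpcc_type values 2 (geometry) and 4 (attribute) from the examples in this section; the Track structure, the request dict and the handle_first_request function are hypothetical names. It also returns the geometry track when the attributes were coded based on geometry.

```python
from dataclasses import dataclass
from typing import Dict, List

GEOMETRY, ATTRIBUTE = 2, 4  # gpcc_type values used in the examples of this section


@dataclass
class Track:
    track_id: int
    gpcc_type: int
    attr_instance_id: int = -1  # -1 marks a non-attribute track (sketch convention)


def handle_first_request(tracks: List[Track], request: Dict[str, int],
                         attr_coded_with_geometry: bool = True) -> List[int]:
    """Return the IDs of the tracks to send for a first request message."""
    wanted = request["attr_instance_id"]
    selected = [t.track_id for t in tracks
                if t.gpcc_type == ATTRIBUTE and t.attr_instance_id == wanted]
    if not selected:
        raise KeyError(f"no track carries attribute instance {wanted}")
    if attr_coded_with_geometry:
        # Attribute decoding needs the decoded geometry, so send it as well.
        geometry = [t.track_id for t in tracks if t.gpcc_type == GEOMETRY]
        selected = geometry + selected
    return selected
```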
[0286] To further illustrate the technical solutions of the embodiments of this application, specific examples are provided below.
[0287] Example 1:
[0288] Step 11: Assume that the bitstream of the target point cloud contains two attribute instances of the same attribute type. The different attribute instances are encapsulated in separate tracks (multi-track encapsulation) to obtain the media file F1 of the target point cloud, which includes Track1, Track2, and Track3:
[0289] Track1: GPCCComponentInfoBox: {gpcc_type=2(Geometry)}.
[0290] Track2: GPCCComponentInfoBox: {gpcc_type=4(Attribute); multi_attr_instance_flag=1; attr_instance_id=1; attr_instance_priority=0; attr_instance_type=1}.
[0291] Track3: GPCCComponentInfoBox: {gpcc_type=4(Attribute); multi_attr_instance_flag=1; attr_instance_id=2; attr_instance_priority=1; attr_instance_type=1}.
[0292] Track2 and Track3 are the tracks of two attribute instances.
[0293] Step 12: Based on the attribute instance information in the media file F1 of the target point cloud, generate DASH signaling (i.e., first information) to indicate the first characteristic information of at least one attribute instance. The DASH signaling includes the following:
[0294] Representation1: Corresponds to track1; component@component_type='geom'.
[0295] Representation2: Corresponds to track2; component@component_type='attr'; component@attr_instance_id=1; component@attr_instance_priority=0; component@attr_instance_type=1.
[0296] Representation3: Corresponds to track3; component@component_type='attr'; component@attr_instance_id=2; component@attr_instance_priority=1; component@attr_instance_type=1.
[0297] Send DASH signaling to the file decapsulation device.
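Step 12 can be sketched with Python's standard xml.etree.ElementTree module. The element layout below (one AdaptationSet per track, each Representation carrying a component child whose attributes mirror the component@... fields above) is a simplified, hypothetical structure for illustration; it is not the normative MPD schema or descriptor namespace for G-PCC, and build_period is an assumed helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

GEOMETRY = 2   # gpcc_type value for geometry in Example 1
ATTRIBUTE = 4  # gpcc_type value for attributes in Example 1


def build_period(tracks: List[Dict[str, int]]) -> ET.Element:
    """Describe each track of F1 as one Representation with a component descriptor."""
    period = ET.Element("Period")
    for index, track in enumerate(tracks, start=1):
        adaptation_set = ET.SubElement(period, "AdaptationSet")
        rep = ET.SubElement(adaptation_set, "Representation", id=f"Representation{index}")
        component = ET.SubElement(rep, "component")
        if track["gpcc_type"] == GEOMETRY:
            component.set("component_type", "geom")
        elif track["gpcc_type"] == ATTRIBUTE:
            component.set("component_type", "attr")
            # Copy the first characteristic information from the track.
            for field in ("attr_instance_id", "attr_instance_priority",
                          "attr_instance_type"):
                component.set(field, str(track[field]))
    return period
```

ET.tostring(build_period(tracks), encoding="unicode") gives the serialized text that would be embedded in the signaling sent to the file decapsulation devices.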
[0298] Step 13: File decapsulation devices C1 and C2 request point cloud media files based on network bandwidth and information in DASH signaling.
[0299] Optionally, file decapsulation device C1, which has sufficient network bandwidth, requests Representation1 to Representation3, while file decapsulation device C2, which has limited network bandwidth, requests Representation1 and Representation2.
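The request decision in Step 13 can be expressed as a bandwidth budget. In this sketch, each Representation's bitrate is carried in a hypothetical bandwidth field (in bits per second, by analogy with the DASH Representation@bandwidth attribute), geometry is always requested, and attribute instances are added in priority order until the budget runs out. The function name and all figures are illustrative assumptions.

```python
from typing import Dict, List


def choose_representations(reps: List[Dict], available_bps: int) -> List[str]:
    """Pick the Representations to request under a bandwidth budget (sketch)."""
    geometry = [r for r in reps if r["component_type"] == "geom"]
    chosen = [r["id"] for r in geometry]  # geometry is always needed
    budget = available_bps - sum(r["bandwidth"] for r in geometry)
    attributes = sorted((r for r in reps if r["component_type"] == "attr"),
                        key=lambda r: r["attr_instance_priority"])
    for r in attributes:
        if r["bandwidth"] > budget:
            break  # stop rather than skip, so priority order is respected
        chosen.append(r["id"])
        budget -= r["bandwidth"]
    return chosen
```

With a 2 Mbit/s geometry Representation and two 3 Mbit/s attribute Representations, a 9 Mbit/s budget selects all three (client C1), while a 5.5 Mbit/s budget selects Representation1 and Representation2 (client C2).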
[0300] Step 14: Transfer point cloud media files.
[0301] Step 15: The file decapsulation device receives the point cloud file.
[0302] Specifically, C1: Because attr_instance_type=1, C1 switches between the two received attribute instances according to user interaction, obtaining a more personalized point cloud consumption experience.
[0303] C2: C2 receives only one attribute instance and obtains a basic point cloud consumption experience.
[0304] Example 2:
[0305] Step 21: Assume that the bitstream of the target point cloud contains two attribute instances of the same attribute type. The different attribute instances are encapsulated in separate tracks (multi-track encapsulation) to obtain the media file F1 of the target point cloud, which includes Track1, Track2, and Track3:
[0306] Track1: GPCCComponentInfoBox: {gpcc_type=2(Geometry)}.
[0307] Track2: GPCCComponentInfoBox: {gpcc_type=4(Attribute); multi_attr_instance_flag=1; attr_instance_id=1; attr_instance_priority=0; attr_instance_type=0}.
[0308] Track3: GPCCComponentInfoBox: {gpcc_type=4(Attribute); multi_attr_instance_flag=1; attr_instance_id=2; attr_instance_priority=0; attr_instance_type=0}.
[0309] Track2 and Track3 are the tracks of two attribute instances.
[0310] Step 22: Based on the attribute instance information in the media file F1 of the target point cloud, generate DASH signaling (i.e., first information) to indicate the first characteristic information of at least one attribute instance. The DASH signaling includes the following:
[0311] Representation1: Corresponds to track1; component@component_type='geom'.
[0312] Representation2: Corresponds to track2; component@component_type='attr'; component@attr_instance_id=1; component@attr_instance_priority=0; component@attr_instance_type=0.
[0313] Representation3: Corresponds to track3; component@component_type='attr'; component@attr_instance_id=2; component@attr_instance_priority=0; component@attr_instance_type=0.
[0314] Send DASH signaling to the file decapsulation device.
[0315] Step 23: File decapsulation devices C1 and C2 request point cloud media files based on network bandwidth and information in DASH signaling.
[0316] C1: Network bandwidth is sufficient, so C1 requests both attribute instances.
[0317] C2: Representation2 and Representation3 have the same priority, but because both attribute instances are associated with the recommendation window, C2 can request the corresponding media resources according to the user's viewing position, based on the second characteristic information of the attribute instances in the recommendation window metadata track, requesting only one attribute instance at a time.
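One way C2 could map the user's viewing position to a single attribute instance is sketched below. The assumption here is that each recommendation window sample exposes a recommended viewing position together with the attr_instance_id from its second characteristic information, and that C2 requests the instance whose recommended position is closest to the user's current position. The WindowSample structure, the nearest-position rule and representation_for_viewer are hypothetical choices, not taken from this application.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class WindowSample:
    center: Point           # recommended viewing position (assumed field)
    attr_instance_id: int   # second characteristic information


def representation_for_viewer(samples: List[WindowSample],
                              rep_by_instance: Dict[int, str],
                              viewer_pos: Point) -> str:
    """Return the single Representation to request for the current viewing position."""
    nearest = min(samples, key=lambda s: math.dist(s.center, viewer_pos))
    return rep_by_instance[nearest.attr_instance_id]
```

Re-evaluating this as the viewing position changes keeps C2 at one attribute instance per request, as described for Step 23.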
[0318] Step 24: Transfer point cloud media files.
[0319] Step 25: The file decapsulation device receives the point cloud file.
[0320] C1: Because attr_instance_type=0, after receiving both attribute instances, C1 selects one of them to decode and consume based on the user's viewing window.
[0321] C2: C2 receives only one attribute instance and decodes it for consumption.
[0322] In the point cloud media file encapsulation and decapsulation method provided in this embodiment, the file encapsulation device sends first information to the file decapsulation device, indicating the first feature information of at least one of the M attribute instances. The file decapsulation device can thus select a target attribute instance to consume based on that first feature information and its own capabilities, thereby saving network bandwidth and improving decoding efficiency.
[0323] Figure 7 is an interaction flowchart of a point cloud media file encapsulation and decapsulation method provided in an embodiment of this application. As shown in Figure 7, this embodiment includes the following steps:
[0324] S701. The file encapsulation device acquires the target point cloud and encodes it to obtain the bitstream of the target point cloud.
[0325] The target point cloud includes N types of attribute information, and at least one type of attribute information in the N types of attribute information includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1.
[0326] S702. The file encapsulation device encapsulates the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, wherein the media file of the target point cloud includes the first feature information of the at least one attribute instance.
[0327] For the implementation of S701 and S702, refer to the detailed description of S501 and S502 above; details are not repeated here.
[0328] After the file encapsulation device encodes and encapsulates the target point cloud according to the above steps to obtain the media file of the target point cloud, it can interact with the file decapsulation device in the following ways:
[0329] Method 1: The file encapsulation device sends the encapsulated media file of the target point cloud directly to the file decapsulation device, which can then selectively consume some of the attribute instances based on the first feature information of the attribute instances in the media file.
[0330] Method 2: The file encapsulation device sends signaling to the file decapsulation device, which, based on that signaling, requests the media files of all or some of the attribute instances from the file encapsulation device for consumption.
[0331] This embodiment describes the case in Method 2 where the file decapsulation device requests the complete media file of the target point cloud from the file encapsulation device and then selects which attribute instances to decode and consume; see steps S703 to S704 below.
[0332] S703. The file encapsulation device sends the first information to the file decapsulation device.
[0333] The first information is used to indicate the first feature information of at least one of the M attribute instances.
[0334] The first characteristic information of an attribute instance includes at least one of the following: the attribute instance's identifier, the attribute instance's priority, and the attribute instance's type.
[0335] Optionally, the first information is DASH signaling.
[0336] In some embodiments, if the first information is DASH signaling, the semantic description of the DASH signaling is as shown in Table 4 above.
[0337] S704. The file decapsulation device sends a second request message to the file encapsulation device based on the first information.
[0338] The second request message is used to request the media file of the target point cloud.
[0339] S705. The file encapsulation device sends the media file of the target point cloud to the file decapsulation device according to the second request message.
[0340] S706. The file decapsulation device determines the target attribute instance based on the first feature information of at least one attribute instance.
[0341] S706 is implemented in the same way as S604; refer to the description of S604. For example, if the attribute instance is of the type associated with user feedback, the file decapsulation device determines the target attribute instance from the at least one attribute instance based on the first characteristic information of at least one of the M attribute instances. As another example, if the attribute instance is of the type associated with the recommendation window, the file decapsulation device obtains the metadata track of the recommendation window and determines the target attribute instance from at least one of the M attribute instances based on the second characteristic information of the attribute instances contained in that metadata track.
[0342] S707. The file decapsulation device decapsulates and then decodes the media file of the target attribute instance to obtain the attribute information of the target attribute instance.
[0343] After identifying the target attribute instance to be decoded according to the above steps, the file decapsulation device retrieves the media file corresponding to the target attribute instance from the received media file of the target point cloud. It then decapsulates this media file to obtain the bitstream of the target attribute instance and decodes that bitstream to obtain the decoded target attribute instance.
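When the geometry and all attribute instances of a frame are carried in a single track, retrieving the target attribute instance from the received media file amounts to keeping only the matching sub-samples. The sketch below assumes each sample's sub-sample entries are available as (size, attr_instance_id) pairs in sample order, with None marking geometry; this representation and the extract_instance helper are hypothetical simplifications of the sub-sample data box, and the subsequent G-PCC decoding is out of scope.

```python
from typing import List, Optional, Tuple


def extract_instance(sample: bytes,
                     subsamples: List[Tuple[int, Optional[int]]],
                     target_id: int) -> bytes:
    """Keep the geometry sub-samples and those of the target attribute instance.

    subsamples: (subsample_size, attr_instance_id) in sample order; an
    attr_instance_id of None marks a geometry sub-sample (sketch convention).
    """
    if sum(size for size, _ in subsamples) != len(sample):
        raise ValueError("sub-sample sizes do not cover the sample")
    out, offset = bytearray(), 0
    for size, inst in subsamples:
        if inst is None or inst == target_id:
            out += sample[offset:offset + size]
        offset += size
    return bytes(out)
```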
[0344] In the point cloud media file encapsulation and decapsulation method provided in this application, the file encapsulation device sends first information to the file decapsulation device, indicating the first feature information of at least one of the M attribute instances. After requesting the media file of the entire target point cloud, the file decapsulation device can therefore select a target attribute instance for decoding and consumption based on that first feature information and its own capabilities, thereby saving network bandwidth and improving decoding efficiency.
[0345] It should be understood that Figures 5 to 7 are merely examples and should not be construed as limiting the scope of this application.
[0346] The preferred embodiments of this application have been described in detail above with reference to the accompanying drawings. However, this application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of this application, various simple modifications can be made to the technical solutions of this application, and all of these simple modifications fall within the protection scope of this application. For example, the specific technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the possible combinations are not described separately in this application. Furthermore, the different embodiments of this application can be combined arbitrarily, and as long as such combinations do not depart from the spirit of this application, they should likewise be regarded as content disclosed by this application.
[0347] The method embodiments of this application have been described in detail above with reference to Figures 5 to 7. The apparatus embodiments of this application are described in detail below with reference to Figures 8 to 10.
[0348] Figure 8 is a schematic structural diagram of a point cloud media file encapsulation apparatus according to an embodiment of this application. The apparatus 10 is applied to a file encapsulation device and includes:
[0349] The acquisition unit 11 is configured to acquire a target point cloud and encode it to obtain a bitstream of the target point cloud. The target point cloud includes N types of attribute information, and at least one of the N types of attribute information includes M attribute instances, where N is a positive integer and M is a positive integer greater than 1.
[0350] The encapsulation unit 12 is used to encapsulate the bitstream of the target point cloud according to the first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud. The media file of the target point cloud includes the first feature information of at least one attribute instance.
[0351] In some embodiments, the first characteristic information of the attribute instance includes at least one of the following: the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance.
[0352] In some embodiments, the type of the attribute instance includes at least one of an attribute instance associated with the recommendation window and an attribute instance associated with user feedback.
[0353] In some embodiments, if the type of the attribute instance is an attribute instance associated with a recommendation window, the encapsulation unit 12 is further configured to add second characteristic information of the attribute instance to the metadata track of the recommendation window associated with the attribute instance.
[0354] In some embodiments, the second characteristic information of the attribute instance includes at least one of the identifier of the attribute instance and the attribute type of the attribute instance.
[0355] In some embodiments, the encapsulation unit 12 is specifically configured to: if the geometric information and attribute information of one frame of the target point cloud are encapsulated in one track or one item, add the first feature information of the at least one attribute instance to the sub-sample data boxes corresponding to the M attribute instances; or,
[0356] if each of the M attribute instances is encapsulated in its own track or item, add the first feature information of the at least one attribute instance to the component information data boxes corresponding to the M attribute instances; or,
[0357] if each of the M attribute instances is encapsulated in its own track or item, and the M tracks corresponding to the M attribute instances form a track group, or the M items corresponding to the M attribute instances form an entity group, add the first feature information of the at least one attribute instance to the track group data box or the entity group data box.
[0358] In some embodiments, the encapsulation unit 12 is further configured to: if the M attribute instances are encapsulated one-to-one in M attribute instance tracks, associate the M attribute instance tracks through a track group data box; or,
[0359] if the M attribute instances are encapsulated one-to-one in M attribute instance items, associate the M attribute instance items through an entity group data box.
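The alternative placements of the first feature information described for the encapsulation unit 12 can be summarized as a small decision function. The parameter names and the returned strings are hypothetical, and because the text presents these placements as alternatives, this sketch encodes only one possible choice per encapsulation layout.

```python
def feature_info_container(all_in_one_track_or_item: bool,
                           instances_grouped: bool,
                           uses_items: bool) -> str:
    """Return which data box carries the first feature information (sketch)."""
    if all_in_one_track_or_item:
        # Geometry and all attribute data of a frame share one track or item.
        return "sub-sample data box"
    if instances_grouped:
        # One track or item per instance, associated as a group.
        return "entity group data box" if uses_items else "track group data box"
    # One track or item per instance, without grouping.
    return "component information data box"
```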
[0360] In some embodiments, the apparatus further includes a transceiver unit 13 configured to send first information to a file decapsulation device, the first information being used to indicate first feature information of at least one of the M attribute instances.
[0361] In some embodiments, the transceiver unit 13 is configured to receive a first request message sent by the file decapsulation device, the first request message being used to request a media file of a target attribute instance, and to send the media file of the target attribute instance to the file decapsulation device according to the first request message.
[0362] In some embodiments, the transceiver unit 13 is further configured to receive a second request message sent by the file decapsulation device, the second request message being used to request the media file of the target point cloud, and to send the media file of the target point cloud to the file decapsulation device according to the second request message.
[0363] It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and for similar descriptions reference may be made to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the apparatus 10 shown in Figure 8 can execute the method embodiments corresponding to the file encapsulation device, and the foregoing and other operations and/or functions of the modules in the apparatus 10 respectively implement those method embodiments. For brevity, they are not described in detail here.
[0364] Figure 9 is a schematic structural diagram of a point cloud media file decapsulation apparatus provided in an embodiment of this application. The apparatus 20 is applied to a file decapsulation device and includes:
[0365] Transceiver unit 21 is used to receive the first information sent by the file encapsulation device;
[0366] The first information is used to indicate the first feature information of at least one of M attribute instances, where the M attribute instances are included in at least one of the N types of attribute information of the target point cloud, N is a positive integer, and M is a positive integer greater than 1.
[0367] In some embodiments, the first characteristic information of the attribute instance includes at least one of the following: the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance.
[0368] In some embodiments, the type of the attribute instance includes at least one of an attribute instance associated with the recommendation window and an attribute instance associated with user feedback.
[0369] In some embodiments, if the type of the attribute instance is an attribute instance associated with a recommendation window, then the second characteristic information of the attribute instance is added to the metadata track of the recommendation window associated with the attribute instance.
[0370] In some embodiments, the apparatus further includes a determining unit 22 and a decoding unit 23:
[0371] The determining unit 22 is configured to determine the target attribute instance based on the first feature information of the at least one attribute instance;
[0372] The transceiver unit 21 is configured to send a first request message to the file encapsulation device, the first request message being used to request the media file of the target attribute instance; and to receive the media file of the target attribute instance sent by the file encapsulation device.
[0373] The decoding unit 23 is used to decapsulate and then decode the media file of the target attribute instance to obtain the attribute information of the target attribute instance.
[0374] In some embodiments, the transceiver unit 21 is further configured to send a second request message to the file encapsulation device based on the first information, the second request being used to request a media file of the target point cloud; and to receive the media file of the target point cloud sent by the file encapsulation device;
[0375] The determining unit 22 is configured to determine the target attribute instance based on the first feature information of the at least one attribute instance;
[0376] The decoding unit 23 is used to obtain the media file of the target attribute instance from the media file of the target point cloud; decapsulate the media file of the target attribute instance and then decode it to obtain the attribute information of the target attribute instance.
[0377] In some embodiments, if the first feature information of an attribute instance includes the type of the attribute instance, the determining unit 22 is specifically configured to: if the attribute instance is of the type associated with user feedback, determine the target attribute instance from the at least one attribute instance based on the first feature information of at least one of the M attribute instances; or,
[0378] if the attribute instance is of the type associated with the recommendation window, obtain the metadata track of the recommendation window and determine the target attribute instance from at least one of the M attribute instances according to the second characteristic information of the attribute instance contained in that metadata track.
[0379] In some embodiments, the second characteristic information of the attribute instance includes at least one of the identifier of the attribute instance and the attribute type of the attribute instance.
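The selection rule in paragraphs [0377] to [0379] can be sketched as follows. This is a minimal illustration under several assumptions:

- The field names are hypothetical.
- All M instances signal the same type.
- A smaller priority value means a higher priority. The application does not state the direction of the priority scale here.
- The type labels `"user_feedback"` and `"recommended_window"` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FirstFeatureInfo:
    """Hypothetical fields: identifier, priority and type of an attribute instance,
    plus the attribute type (e.g. color) the instance belongs to."""
    instance_id: int
    priority: int        # assumption: a smaller value means a higher priority
    instance_type: str   # hypothetical labels: "user_feedback" or "recommended_window"
    attribute_type: int


@dataclass(frozen=True)
class SecondFeatureInfo:
    """Hypothetical sample of the recommendation-window metadata track."""
    instance_id: Optional[int] = None
    attribute_type: Optional[int] = None


def select_target_instance(
    instances: Sequence[FirstFeatureInfo],
    user_choice: Optional[int] = None,
    window_sample: Optional[SecondFeatureInfo] = None,
) -> FirstFeatureInfo:
    if not instances:
        raise ValueError("no attribute instances to choose from")
    # Simplifying assumption: all M instances signal the same type.
    if instances[0].instance_type == "recommended_window":
        if window_sample is None:
            raise ValueError("recommendation-window metadata sample is required")
        # Keep instances matching every field the metadata sample actually carries.
        candidates = [
            i for i in instances
            if (window_sample.instance_id is None or i.instance_id == window_sample.instance_id)
            and (window_sample.attribute_type is None
                 or i.attribute_type == window_sample.attribute_type)
        ]
    else:
        # User-feedback type: honour an explicit user selection when one exists.
        candidates = [i for i in instances if user_choice is None or i.instance_id == user_choice]
    if not candidates:
        raise LookupError("no attribute instance matches the selection criteria")
    # Break remaining ties by priority (lower value assumed to win).
    return min(candidates, key=lambda i: i.priority)
```

When the metadata sample carries only an attribute type, several instances can match. The sketch then falls back to priority, which is one possible reading of how the first and second characteristic information combine.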
[0380] In some embodiments, if the geometric information and attribute information of one frame of the target point cloud are encapsulated in one track or one item, the first feature information of the attribute instances is added to the sub-sample data box corresponding to the M attribute instances; or,
[0381] If each of the M attribute instances is encapsulated in its own track or item, the first feature information of the attribute instance is added to the component information data box corresponding to the M attribute instances; or,
[0382] If each of the M attribute instances is encapsulated in its own track or item, and the M tracks corresponding to the M attribute instances constitute a track group or the M items corresponding to the M attribute instances constitute an entity group, the first feature information of the attribute instance is added to the track group data box or the entity group data box.
[0383] In some embodiments, if the M attribute instances are encapsulated one-to-one in M attribute instance tracks, the media file of the target point cloud includes a track group data box used to associate the M attribute instance tracks; or, if the M attribute instances are encapsulated one-to-one in M attribute instance items, the media file of the target point cloud includes an entity group data box used to associate the M attribute instance items.
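The three placement cases in paragraphs [0380] to [0382] amount to a lookup from encapsulation layout to the data box that carries the first feature information. The following is a minimal sketch of that lookup. The enum names are hypothetical labels for the three cases. In the grouped case the application presents the component information data box and the group data box as alternatives; this sketch returns the group data box for illustration.

```python
from enum import Enum, auto


class Layout(Enum):
    """Hypothetical labels for the three encapsulation cases in [0380]-[0382]."""
    FRAME_IN_ONE_TRACK = auto()          # geometry + attributes of a frame in one track/item
    INSTANCE_PER_TRACK = auto()          # each attribute instance in its own track/item
    INSTANCE_PER_TRACK_GROUPED = auto()  # as above, and the M tracks/items form a group


def feature_info_container(layout: Layout, uses_items: bool = False) -> str:
    """Return the data box that carries the first feature information."""
    if layout is Layout.FRAME_IN_ONE_TRACK:
        return "sub-sample data box"
    if layout is Layout.INSTANCE_PER_TRACK:
        return "component information data box"
    # Grouped case: tracks form a track group, items form an entity group.
    return "entity group data box" if uses_items else "track group data box"
```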
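In ISOBMFF, tracks whose track group boxes carry the same group type and the same `track_group_id` belong to one track group. Paragraph [0383] relies on that mechanism to associate the M attribute instance tracks. The sketch below shows how a decapsulator might find the tracks associated with a given attribute instance track. It is a minimal in-memory model, not a box parser. The `Track` class and the group type string `"attr_instances"` are hypothetical placeholders, since this section does not give the actual four-character code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

GroupKey = Tuple[str, int]  # (group type, group id)


@dataclass
class Track:
    track_id: int
    attr_instance_id: int
    track_groups: List[GroupKey] = field(default_factory=list)


def build_groups(tracks: List[Track]) -> Dict[GroupKey, List[int]]:
    """Collect track ids that share the same (group type, group id) pair."""
    groups: Dict[GroupKey, List[int]] = defaultdict(list)
    for t in tracks:
        for key in t.track_groups:
            groups[key].append(t.track_id)
    return dict(groups)


def associated_tracks(track_id: int, tracks: List[Track]) -> List[int]:
    """Return the other tracks grouped with `track_id`, e.g. alternative attribute instances."""
    groups = build_groups(tracks)
    me = next(t for t in tracks if t.track_id == track_id)
    related = {tid for key in me.track_groups for tid in groups[key]}
    related.discard(track_id)
    return sorted(related)
```

The same lookup applies to items associated through an entity group, with entity ids in place of track ids.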
[0384] It should be understood that the apparatus embodiments and the method embodiments correspond to each other, and for similar descriptions, reference may be made to the method embodiments. To avoid repetition, details are not repeated here. Specifically, the apparatus 20 shown in Figure 9 can execute the method embodiment corresponding to the file decapsulation device, and the foregoing and other operations and/or functions of each module in the apparatus 20 are respectively intended to implement the method embodiment corresponding to the file decapsulation device. For brevity, they are not described in detail here.
[0385] The apparatus of this application embodiment has been described above from the perspective of functional modules in conjunction with the accompanying drawings. It should be understood that this functional module can be implemented in hardware, in software instructions, or in a combination of hardware and software modules. Specifically, the steps of the method embodiments in this application can be completed by integrated logic circuits in the processor's hardware and / or by software instructions. The steps of the method disclosed in this application embodiment can be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. Optionally, the software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, etc. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps in the above method embodiments.
[0386] Figure 10 is a schematic block diagram of an electronic device provided in an embodiment of this application. The electronic device can be the file encapsulation device or the file decapsulation device described above, or it can have the functions of both a file encapsulation device and a file decapsulation device.
[0387] As shown in Figure 10, the electronic device 40 may include:
[0388] A memory 41 and a processor 42. The memory 41 is used to store a computer program and to transfer the program code to the processor 42. In other words, the processor 42 can call and run the computer program from the memory 41 to implement the methods in the embodiments of this application.
[0389] For example, the processor 42 can be used to execute the above method embodiments according to instructions in the computer program.
[0390] In some embodiments of this application, the processor 42 may include, but is not limited to:
[0391] General-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0392] In some embodiments of this application, the memory 41 includes, but is not limited to:
[0393] Volatile memory and / or non-volatile memory. Non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
[0394] In some embodiments of this application, the computer program may be divided into one or more modules, which are stored in the memory 41 and executed by the processor 42 to perform the method provided in this application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, and these instruction segments describe the execution process of the computer program in the electronic device 40.
[0395] As shown in Figure 10, the electronic device 40 may further include:
[0396] A transceiver 43, which can be connected to the processor 42 or the memory 41.
[0397] The processor 42 can control the transceiver 43 to communicate with other devices; specifically, it can send information or data to other devices or receive information or data sent by other devices. The transceiver 43 may include a transmitter and a receiver, and may further include one or more antennas.
[0398] It should be understood that the components of the electronic device 40 are connected through a bus system, which includes a data bus, a power bus, a control bus, and a status signal bus.
[0399] This application also provides a computer storage medium storing a computer program thereon, which, when executed by a computer, enables the computer to perform the methods of the above-described method embodiments. Alternatively, embodiments of this application also provide a computer program product containing instructions that, when executed by a computer, cause the computer to perform the methods of the above-described method embodiments.
[0400] When implemented using software, it can be implemented entirely or partially as a computer program product. This computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital video disc (DVD)), or a semiconductor medium (e.g., solid-state disk (SSD)).
[0401] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0402] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or modules may be electrical, mechanical, or other forms.
[0403] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. For example, the functional modules in the various embodiments of this application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
[0404] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for encapsulating point cloud media files, characterized in that the method is applied to a file encapsulation device and includes: acquiring a target point cloud and encoding the target point cloud to obtain a bitstream of the target point cloud, wherein the target point cloud includes N types of attribute information, at least one type of attribute information among the N types includes M attribute instances, N is a positive integer, and M is a positive integer greater than 1; encapsulating the bitstream of the target point cloud based on first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, wherein the media file of the target point cloud includes the first feature information of the at least one attribute instance, the first feature information is added to a data box inside the media file and is used to identify information that distinguishes the current attribute instance from the other attribute instances among the M attribute instances, and the first feature information of an attribute instance includes the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance, the type of the attribute instance being usable to instruct the file decapsulation device to select a target attribute instance from the M attribute instances of the same type, or to indicate the consumption scenario of different attribute instances; sending first information to the file decapsulation device, the first information being used to indicate the first feature information of at least one of the M attribute instances; receiving a second request message sent by the file decapsulation device, the second request message being used to request the media file of the target point cloud; and sending the media file of the target point cloud to the file decapsulation device according to the second request message; wherein, if the attribute instance is of the type associated with the recommendation window, the method further includes: adding second characteristic information of the attribute instance to the metadata track of the recommendation window associated with the attribute instance.
2. The method according to claim 1, characterized in that the type of the attribute instance includes at least one of: an attribute instance associated with the recommendation window, and an attribute instance associated with user feedback.
3. The method according to claim 1, characterized in that the second characteristic information of the attribute instance includes at least one of: the identifier of the attribute instance, and the attribute type of the attribute instance.
4. The method according to any one of claims 1-3, characterized in that encapsulating the bitstream of the target point cloud based on the first feature information of at least one of the M attribute instances to obtain the media file of the target point cloud includes: if the geometric information and attribute information of one frame of the target point cloud are encapsulated in one track or one item, adding the first feature information of the at least one attribute instance to the sub-sample data box corresponding to the M attribute instances; or, if each of the M attribute instances is encapsulated in its own track or item, adding the first feature information of the at least one attribute instance to the component information data box corresponding to the M attribute instances; or, if each of the M attribute instances is encapsulated in its own track or item, and the M tracks corresponding to the M attribute instances constitute a track group or the M items corresponding to the M attribute instances constitute an entity group, adding the first feature information of the at least one attribute instance to the track group data box or the entity group data box.
5. The method according to any one of claims 1-3, characterized in that the method further includes: if the M attribute instances are encapsulated one-to-one in M attribute instance tracks, associating the M attribute instance tracks through a track group data box; or, if the M attribute instances are encapsulated one-to-one in M attribute instance items, associating the M attribute instance items through an entity group data box.
6. The method according to claim 1, characterized in that the method further includes: receiving a first request message sent by the file decapsulation device, the first request message being used to request the media file of the target attribute instance; and sending the media file of the target attribute instance to the file decapsulation device according to the first request message.
7. A method for decapsulating point cloud media files, characterized in that the method is applied to a file decapsulation device and includes: receiving first information sent by a file encapsulation device, wherein the first information is used to indicate first feature information of at least one attribute instance among M attribute instances, the M attribute instances being included in at least one of N types of attribute information of a target point cloud, N being a positive integer and M being a positive integer greater than 1; the first feature information is used to identify information that distinguishes the current attribute instance from the other attribute instances among the M attribute instances, and the first feature information of an attribute instance includes the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance, the type of the attribute instance being used to instruct the file decapsulation device to select the target attribute instance from the M attribute instances of the same type, or to indicate the consumption scenario of different attribute instances; sending, based on the first information, a second request message to the file encapsulation device, the second request message being used to request the media file of the target point cloud; receiving the media file of the target point cloud sent by the file encapsulation device, wherein the first feature information is added to a data box inside the media file; determining the target attribute instance based on the first feature information of the at least one attribute instance; obtaining the media file of the target attribute instance from the media file of the target point cloud; and decapsulating and then decoding the media file of the target attribute instance to obtain the attribute information of the target attribute instance; wherein, if the attribute instance is of the type associated with a recommendation window, second characteristic information of the attribute instance is added to the metadata track of the recommendation window associated with the attribute instance.
8. The method according to claim 7, characterized in that the type of the attribute instance includes at least one of: an attribute instance associated with the recommendation window, and an attribute instance associated with user feedback.
9. The method according to claim 7, characterized in that the method further includes: determining the target attribute instance based on the first feature information of the at least one attribute instance; sending a first request message to the file encapsulation device, the first request message being used to request the media file of the target attribute instance; receiving the media file of the target attribute instance sent by the file encapsulation device; and decapsulating and then decoding the media file of the target attribute instance to obtain the attribute information of the target attribute instance.
10. The method according to claim 7, characterized in that, if the first feature information of the attribute instance includes the type of the attribute instance, determining the target attribute instance based on the first feature information of the at least one attribute instance includes: if the attribute instance is of the type associated with user feedback, determining the target attribute instance from the at least one attribute instance based on the first feature information of at least one of the M attribute instances; or, if the attribute instance is of the type associated with the recommendation window, obtaining the metadata track of the recommendation window, and determining the target attribute instance from at least one of the M attribute instances based on the second characteristic information of the attribute instance included in the metadata track of the recommendation window.
11. The method according to claim 7, characterized in that the second characteristic information of the attribute instance includes at least one of: the identifier of the attribute instance, and the attribute type of the attribute instance.
12. The method according to any one of claims 7-10, characterized in that: if the geometric information and attribute information of one frame of the target point cloud are encapsulated in one track or one item, the first feature information of the attribute instance is added to the sub-sample data box corresponding to the M attribute instances; or, if each of the M attribute instances is encapsulated in its own track or item, the first feature information of the attribute instance is added to the component information data box corresponding to the M attribute instances; or, if each of the M attribute instances is encapsulated in its own track or item, and the M tracks corresponding to the M attribute instances constitute a track group or the M items corresponding to the M attribute instances constitute an entity group, the first feature information of the attribute instance is added to the track group data box or the entity group data box.
13. The method according to any one of claims 7-10, characterized in that: if the M attribute instances are encapsulated one-to-one in M attribute instance tracks, the media file of the target point cloud includes a track group data box used to associate the M attribute instance tracks; or, if the M attribute instances are encapsulated one-to-one in M attribute instance items, the media file of the target point cloud includes an entity group data box used to associate the M attribute instance items.
14. An apparatus for encapsulating point cloud media files, characterized in that the apparatus is applied to a file encapsulation device and includes: an acquisition unit configured to acquire a target point cloud and encode the target point cloud to obtain a bitstream of the target point cloud, wherein the target point cloud includes N types of attribute information, at least one type of attribute information among the N types includes M attribute instances, N is a positive integer, and M is a positive integer greater than 1; an encapsulation unit configured to encapsulate the bitstream of the target point cloud based on first feature information of at least one of the M attribute instances to obtain a media file of the target point cloud, wherein the media file of the target point cloud includes the first feature information of the at least one attribute instance, the first feature information is added to a data box inside the media file and is used to identify information that distinguishes the current attribute instance from the other attribute instances among the M attribute instances, and the first feature information of an attribute instance includes the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance, the type of the attribute instance being usable to instruct the file decapsulation device to select a target attribute instance from the M attribute instances of the same type, or to indicate the consumption scenario of different attribute instances; and a transceiver unit configured to send first information to the file decapsulation device, the first information being used to indicate the first feature information of at least one attribute instance among the M attribute instances; receive a second request message sent by the file decapsulation device, the second request message being used to request the media file of the target point cloud; and send the media file of the target point cloud to the file decapsulation device according to the second request message; wherein, if the attribute instance is of the type associated with the recommendation window, the encapsulation unit is further configured to add second characteristic information of the attribute instance to the metadata track of the recommendation window associated with the attribute instance.
15. An apparatus for decapsulating point cloud media files, characterized in that the apparatus is applied to a file decapsulation device and includes: a transceiver unit configured to receive first information sent by a file encapsulation device, wherein the first information is used to indicate first feature information of at least one attribute instance among M attribute instances, the M attribute instances being included in at least one of N types of attribute information of a target point cloud, N being a positive integer and M being a positive integer greater than 1; the first feature information is used to identify information that distinguishes the current attribute instance from the other attribute instances among the M attribute instances, and the first feature information of an attribute instance includes the identifier of the attribute instance, the priority of the attribute instance, and the type of the attribute instance, the type of the attribute instance being used to instruct the file decapsulation device to select the target attribute instance from the M attribute instances of the same type, or to indicate the consumption scenario of different attribute instances; the transceiver unit being further configured to send, based on the first information, a second request message to the file encapsulation device, the second request message being used to request the media file of the target point cloud, and to receive the media file of the target point cloud sent by the file encapsulation device, wherein the first feature information is added to a data box inside the media file; a determining unit configured to determine the target attribute instance based on the first feature information of the at least one attribute instance; and a decoding unit configured to obtain the media file of the target attribute instance from the media file of the target point cloud, and to decapsulate and then decode the media file of the target attribute instance to obtain the attribute information of the target attribute instance; wherein, if the attribute instance is of the type associated with a recommendation window, second characteristic information of the attribute instance is added to the metadata track of the recommendation window associated with the attribute instance.
16. A file encapsulation device, characterized by comprising: a processor and a memory, the memory being used to store a computer program, and the processor being used to invoke and run the computer program stored in the memory to perform the method according to any one of claims 1 to 6.
17. A file decapsulation device, characterized by comprising: a processor and a memory, the memory being used to store a computer program, and the processor being used to invoke and run the computer program stored in the memory to perform the method according to any one of claims 7 to 13.
18. An electronic device, characterized by comprising: a processor and a memory, the memory being used to store a computer program, and the processor being used to invoke and run the computer program stored in the memory to perform the method according to any one of claims 1 to 6 or any one of claims 7 to 13.
19. A computer-readable storage medium, characterized in that it is used to store a computer program that causes a computer to perform the method according to any one of claims 1 to 6 or any one of claims 7 to 13.