A data processing method of immersive media and related apparatus
By adding a target scaling strategy to the media file format data box, the content playback device only requests the container file at the current resolution, which solves the bandwidth waste caused by user-defined scaling and achieves bandwidth saving and the best viewing experience.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2020-06-04
- Publication Date: 2026-06-19
AI Technical Summary
In existing immersive media transmission solutions, user-initiated scaling behavior leads to bandwidth waste, and the inability to anticipate user scaling behavior results in video content playback devices having to request container files for all scaling resolution versions, thus failing to obtain the best viewing experience.
The media file format data box contains the scaling strategy for the scaling area of immersive media in the target scaling mode. The content playback device requests the container file at the current resolution according to the strategy, performs scaling processing, and avoids requesting all scaling resolution versions.
It saves transmission bandwidth and automatically renders the scaling effect specified by the immersive media content creator in target scaling mode, improving the user viewing experience.
[Figure: CN116347183B_ABST (abstract drawing)]
Description
Technical Field
[0001] This application relates to the fields of computer technology and virtual reality (VR) technology, and in particular to a data processing method and related apparatus for immersive media.
Background Technology
[0002] In immersive media transmission solutions in related technologies, user-defined scaling of immersive media is already supported. For video content with a specific playback time and a specific screen area that supports scaling, the server prepares multiple scaled resolution versions of the container file for that area. When the user performs a scaling operation, the content playback device requests all scaled resolution versions of the container file from the server. Ultimately, the scaling ratio and resolution of the video are determined by the user's scaling behavior in some embodiments. However, the scaling behavior depends entirely on the user's actual scaling operation. Since the user's scaling behavior cannot be known in advance, the content playback device must request all scaled resolution versions of the video before the user performs scaling, inevitably resulting in bandwidth waste.
Summary of the Invention
[0003] This application provides a data processing method and related apparatus for immersive media, which can save transmission bandwidth.
[0004] In one aspect, embodiments of this application provide a data processing method for immersive media, applied to a computer device, including:
[0005] Obtain the media file format data box of the immersive media, which includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0006] Scaling is performed on the i-th scaling region of the immersive media according to the media file format data box.
[0007] This application embodiment obtains the media file format data box of immersive media. This media file format data box includes the scaling strategy for the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer. Scaling processing is performed on the i-th scaling region of the immersive media according to the media file format data box. Therefore, in the target scaling mode, the client does not need to request the container files for all scaling resolution versions, thus saving transmission bandwidth.
[0008] In another aspect, embodiments of this application provide a data processing method for immersive media, applied to a content production device, including:
[0009] Obtain scaling information for immersive media;
[0010] Configure the media file format data box of the immersive media according to the scaling information of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0011] Add the media file format data box of immersive media to the immersive media container file.
[0012] This application embodiment configures a media file format data box based on the immersive media and its scaling information, and adds the media file format data box of the immersive media to the encapsulation file of the immersive media. This enables the content playback device to request the encapsulation file corresponding to the target scaling mode at the current resolution from the server based on the media file format data box and consume it, without needing to request encapsulation files for all scaling resolution versions, thereby saving transmission bandwidth.
[0013] In another aspect, embodiments of this application provide a data processing method for immersive media, applied to a content playback device, including:
[0014] Obtain the encapsulation file of the immersive media, which includes the media file format data box of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0015] Parse the encapsulation file and display the parsed immersive media;
[0016] When displaying the i-th scaling region of the immersive media, perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0017] In this embodiment, the encapsulation file of the immersive media is parsed to obtain the media file format data box of the immersive media, and scaling processing is performed on the i-th scaling region of the immersive media according to the media file format data box. It is evident that, in the target scaling mode, the content playback device (client) does not need to request encapsulation files for all scaling resolution versions, thereby saving transmission bandwidth. Furthermore, while consuming the encapsulation file corresponding to the target scaling mode at the current resolution, the client automatically presents the scaling effect specified by the immersive media content creator according to the target scaling mode, so as to provide the user with the best viewing experience.
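As an illustrative, non-limiting sketch of the bandwidth saving described above, the following Python fragment compares the data requested when every scaled resolution version is fetched with the data requested when only the version for the current resolution is fetched. The `ScaledVersion` type, the `versions_to_request` function, and the file sizes are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledVersion:
    """One scaled-resolution version of a region's file (hypothetical model)."""
    width: int
    height: int
    size_bytes: int  # made-up size, for illustration only


def versions_to_request(versions, target_mode, current_resolution):
    """Return the versions a playback device would request."""
    if not target_mode:
        # Related-art behaviour from the background section: fetch every version.
        return list(versions)
    # Target scaling mode: fetch only the version matching the current resolution.
    return [v for v in versions if (v.width, v.height) == current_resolution]


versions = [
    ScaledVersion(1920, 1080, 4_000_000),
    ScaledVersion(3840, 2160, 12_000_000),
    ScaledVersion(7680, 4320, 40_000_000),
]
all_bytes = sum(v.size_bytes for v in versions_to_request(versions, False, (3840, 2160)))
target_bytes = sum(v.size_bytes for v in versions_to_request(versions, True, (3840, 2160)))
print(all_bytes, target_bytes)
```

With these made-up sizes, the target scaling mode requests a single file instead of three.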
[0018] In another aspect, embodiments of this application provide a data processing apparatus for immersive media, including:
[0019] The acquisition unit is used to acquire the media file format data box of the immersive media, which includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0020] The processing unit is used to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0021] In another aspect, embodiments of this application provide another data processing apparatus for immersive media, including:
[0022] The acquisition unit is used to acquire scaling information of immersive media;
[0023] The processing unit is configured to configure the media file format data box of the immersive media according to the scaling information of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer; and to add the media file format data box of the immersive media to the encapsulation file of the immersive media.
[0024] In another aspect, embodiments of this application provide another data processing apparatus for immersive media, including:
[0025] The acquisition unit is used to acquire the encapsulation file of the immersive media, which includes the media file format data box of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0026] The processing unit is used to parse the encapsulated file and display the parsed immersive media; when displaying the i-th zoomed area of the immersive media, the i-th zoomed area of the immersive media is zoomed according to the media file format data box.
[0027] In another aspect, embodiments of this application provide a data processing device for immersive media, including:
[0028] A processor, adapted to execute computer programs;
[0029] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the data processing method for immersive media described above.
[0030] This application extends the existing media file format data boxes and media presentation description files for immersive media. By supporting target (director) scaling mode, the content production device can formulate different scaling strategies for different resolutions according to the intentions of the immersive media content creator. The client requests the corresponding encapsulated file from the server and consumes it based on the scaling strategy corresponding to the current resolution. It is evident that in target scaling mode, the client does not need to request encapsulated files for all scaling resolution versions, thus saving transmission bandwidth. Furthermore, while consuming the encapsulated file corresponding to the target scaling mode at the current resolution, the client automatically presents the scaling effect specified by the immersive media content creator according to the target scaling mode, providing the user with the best viewing experience.
Attached Figure Description
[0031] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0032] Figure 1a shows an architectural diagram of an immersive media system provided in an exemplary embodiment of this application;
[0033] Figure 1b shows a basic block diagram of video encoding provided in an exemplary embodiment of this application;
[0034] Figure 1c shows a schematic diagram of 6DoF provided in an exemplary embodiment of this application;
[0035] Figure 1d shows a schematic diagram of 3DoF provided in an exemplary embodiment of this application;
[0036] Figure 1e shows a schematic diagram of 3DoF+ provided in an exemplary embodiment of this application;
[0037] Figure 1f shows a schematic diagram of input image partitioning provided in an embodiment of this application;
[0038] Figure 2 shows a schematic diagram of the i-th scaling region provided in an exemplary embodiment of this application;
[0039] Figure 3 shows a flowchart of a data processing method for immersive media provided in an exemplary embodiment of this application;
[0040] Figure 4 shows a flowchart of another data processing method for immersive media provided in an exemplary embodiment of this application;
[0041] Figure 5 shows a flowchart of another data processing method for immersive media provided in an exemplary embodiment of this application;
[0042] Figure 6 shows a schematic structural diagram of a data processing apparatus for immersive media provided in an exemplary embodiment of this application;
[0043] Figure 7 shows a schematic structural diagram of another data processing apparatus for immersive media provided in an exemplary embodiment of this application;
[0044] Figure 8 shows a schematic structural diagram of a content creation device provided in an exemplary embodiment of this application;
[0045] Figure 9 shows a schematic structural diagram of a content playback device provided in an exemplary embodiment of this application.
Detailed Implementation
[0046] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0047] This application relates to data processing technology for immersive media. Immersive media refers to media files that provide immersive media content, enabling users immersed in the content to experience visual, auditory, and other sensory experiences reminiscent of the real world. In some embodiments, immersive media can be 3DoF (Degree of Freedom) immersive media, 3DoF+ immersive media, or 6DoF immersive media. Immersive media content includes video content represented in three-dimensional (3D) space in various forms, such as three-dimensional video content represented in a spherical form. In some embodiments, immersive media content can be VR (Virtual Reality) video content, panoramic video content, spherical video content, or 360-degree video content; therefore, immersive media can also be called VR video, panoramic video, spherical video, or 360-degree video. Additionally, immersive media content also includes audio content synchronized with the video content represented in three-dimensional space.
[0048] Figure 1a shows an architectural diagram of an immersive media system provided in an exemplary embodiment of this application. As shown in Figure 1a, the immersive media system includes a content production device and a content playback device. The content production device can refer to the computer device used by the provider of the immersive media (e.g., the content creator), which can be a terminal (such as a PC or a smart mobile device, e.g., a smartphone) or a server. The content playback device can refer to the computer device used by the consumer of the immersive media (e.g., a user), which can be a terminal (such as a PC, a smart mobile device, or a VR device, e.g., a VR headset or VR glasses). The data processing of immersive media includes data processing on the content production device side and data processing on the content playback device side.
[0049] The data processing on the content production device side mainly includes: (1) the acquisition and production process of immersive media content; (2) the encoding and file encapsulation process of immersive media. The data processing on the content playback device side mainly includes: (3) the file decapsulation and decoding process of immersive media; (4) the rendering process of immersive media. In addition, the transmission process of immersive media between the content production device and the content playback device is involved. This transmission process can be based on various transmission protocols, including but not limited to: the Dynamic Adaptive Streaming over HTTP (DASH) protocol, the HTTP Live Streaming (HLS) protocol, the Smart Media Transport (SMT) protocol, the Transmission Control Protocol (TCP), etc.
[0050] The following sections will provide a detailed introduction to each process involved in the data processing of immersive media.
[0051] Figure 1b shows a basic block diagram of video encoding provided in an exemplary embodiment of this application. With reference to Figure 1a and Figure 1b, each process involved in the data processing of immersive media is described in detail as follows:
[0052] I. Data processing at the content production device end:
[0053] (1) The acquisition process of immersive media content.
[0054] From the perspective of how immersive media content is acquired, it can be divided into two methods: acquiring it through capture devices of real-world sound and visual scenes, and acquiring it through computer generation. In some embodiments, the capture device can refer to a hardware component installed in the content production device, such as a microphone, camera, or sensor on a terminal. In some embodiments, the capture device can also be a hardware device connected to the content production device, such as a camera connected to a server; used to provide the content production device with the acquisition service of immersive media content. The capture device can include, but is not limited to, audio devices, camera devices, and sensing devices. Among them, audio devices can include audio sensors, microphones, etc. Camera devices can include ordinary cameras, stereo cameras, light field cameras, etc. Sensing devices can include laser devices, radar devices, etc. The number of capture devices can be multiple, and these capture devices are deployed at some specific locations in the real space to simultaneously capture audio and video content from different angles in that space, and the captured audio and video content are synchronized in both time and space. Due to the different acquisition methods, the compression encoding methods corresponding to the media content of different immersive media may also differ.
[0055] (2) The production process of immersive media content.
[0056] The captured audio content itself is suitable for audio encoding for immersive media. The captured video content undergoes a series of processing steps before it becomes suitable for video encoding for immersive media. These processing steps include:
[0057] ① Stitching. Since the captured video content is taken by the capture device from different angles, stitching refers to stitching these video contents taken from various angles into a complete video that can reflect a 360-degree visual panorama of real space. That is, the stitched video is a panoramic video (or spherical video) represented in three-dimensional space.
[0058] ② Projection. Projection refers to the process of mapping a stitched three-dimensional video onto a two-dimensional (2D) image. The 2D image formed by projection is called a projected image. Projection methods may include, but are not limited to: equirectangular (latitude-longitude) projection and cube map (regular hexahedron) projection.
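As an illustrative sketch of the latitude-longitude style of projection mentioned above, the following Python fragment maps a viewing direction on the sphere to pixel coordinates on a 2D projected image and back. The angle convention (yaw in radians increasing to the right, pitch in radians increasing upward, image origin at the top-left corner) and the function names are assumptions made for this example and are not specified by this application.

```python
import math


def sphere_to_equirect(yaw, pitch, width, height):
    """Map a direction (yaw, pitch in radians) to (u, v) pixel coordinates.

    Assumed convention: yaw in [-pi, pi), pitch in [-pi/2, pi/2],
    with (0, 0) landing at the centre of the projected image.
    """
    u = (yaw / (2 * math.pi) + 0.5) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v


def equirect_to_sphere(u, v, width, height):
    """Inverse mapping: pixel coordinates back to (yaw, pitch) in radians."""
    yaw = (u / width - 0.5) * 2 * math.pi
    pitch = (0.5 - v / height) * math.pi
    return yaw, pitch


centre = sphere_to_equirect(0.0, 0.0, 3840, 1920)
print(centre)
```

The inverse mapping is the kind of step a playback device would use when reprojecting the 2D projected image into 3D space.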
[0059] It should be noted that since the capture device can only capture panoramic video, after such video is processed by the content production device and transmitted to the content playback device for corresponding data processing, the user on the content playback device side can only view 360-degree video information by performing certain specific actions (such as head rotation). Performing non-specific actions (such as moving the head) will not produce corresponding video changes, resulting in a poor VR experience. Therefore, it is necessary to provide additional depth information that matches the panoramic video to give users a better sense of immersion and a better VR experience. This involves various production techniques, a common one being six degrees of freedom (6DoF) production technology. Figure 1c shows a schematic diagram of 6DoF provided in an exemplary embodiment of this application. 6DoF is divided into window 6DoF, omnidirectional 6DoF, and 6DoF. Window 6DoF means that the user's rotational movement along the X and Y axes and translation along the Z axis are restricted; for example, the user cannot see the scene outside the window frame and cannot pass through the window. Omnidirectional 6DoF means that the user's rotational movement along the X, Y, and Z axes is restricted; for example, the user cannot freely move through 3D 360-degree VR content within a restricted movement area. 6DoF means that the user can freely translate along the X, Y, and Z axes; for example, the user can freely walk within 3D 360-degree VR content. Similar to 6DoF are the 3DoF and 3DoF+ production techniques. Figure 1d shows a schematic diagram of 3DoF provided in an exemplary embodiment of this application. As shown in Figure 1d, 3DoF means that the user is fixed at the center point of a three-dimensional space and views the media content by rotating the head along the X, Y, and Z axes. Figure 1e shows a schematic diagram of 3DoF+ provided in an exemplary embodiment of this application. As shown in Figure 1e, 3DoF+ means that, when the virtual scene provided by the immersive media has certain depth information, the user's head can additionally move within a limited space on the basis of 3DoF to view the images provided by the immersive media.
[0060] (3) The encoding process of media content in immersive media.
[0061] Projected images can be encoded directly, or they can undergo region-wise packing before encoding. Modern mainstream video coding technologies, such as the international video coding standards HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding) and the Chinese national video coding standard AVS (Audio Video Coding Standard), employ a hybrid coding framework that performs the following series of operations and processing on the input raw video signal:
[0062] 1) Block partition structure: The input image is divided into several non-overlapping processing units based on the size of the processing units. A similar compression operation is performed on each processing unit. This processing unit is called a Coding Tree Unit (CTU) or Largest Coding Unit (LCU). The CTU can be further subdivided to obtain one or more basic coding units, called Coding Units (CUs). Each CU is the most basic element in a coding process. Figure 1f shows a schematic diagram of an input image partitioning method provided in an embodiment of this application. The following describes various encoding methods that may be used for each CU.
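As an illustrative sketch of the block partition structure, the following Python fragment divides an image into non-overlapping CTUs and then recursively subdivides one CTU into CUs. The 64-sample CTU size, the quadtree splitting rule, and the `should_split` callback are assumptions chosen for this example; an actual encoder decides the split according to its own criteria.

```python
def split_into_ctus(width, height, ctu_size=64):
    """Return (x, y, w, h) for each CTU covering the image, in raster order.

    CTUs on the right and bottom edges are cropped to the image boundary.
    """
    ctus = []
    for y in range(0, height, ctu_size):
        for x in range(0, width, ctu_size):
            ctus.append((x, y, min(ctu_size, width - x), min(ctu_size, height - y)))
    return ctus


def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block into CUs, returned as (x, y, size)."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return cus


ctus = split_into_ctus(1920, 1080)
# Hypothetical rule: split any block larger than 32 samples exactly once.
cus = quadtree_split(0, 0, 64, 8, lambda x, y, size: size > 32)
print(len(ctus), cus)
```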
[0063] 2) Predictive Coding: This includes intra-frame prediction and inter-frame prediction. The original video signal is predicted by a selected reconstructed video signal to obtain a residual video signal. The content production device needs to determine the most suitable predictive coding mode from among many possible modes for the current CU and inform the content playback device.
[0064] a. Intra-frame prediction: The predicted signal comes from a region within the same image that has already been encoded and reconstructed.
[0065] b. Inter-frame prediction: The predicted signal comes from other encoded images (called reference images) that are different from the current image.
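As an illustrative sketch of predictive coding, the following Python fragment forms a prediction for a block from already reconstructed neighbouring samples, computes the residual that would be passed on to transform coding, and shows that adding the residual back to the prediction recovers the block. The DC-style predictor (mean of the neighbouring samples) and the sample values are assumptions chosen for this example; the standards named above define many more prediction modes.

```python
def intra_dc_predict(left, top):
    """Predict every sample of a block as the rounded mean of its neighbours."""
    neighbours = list(left) + list(top)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * len(top) for _ in range(len(left))]


def residual(block, prediction):
    """Residual signal = original samples minus predicted samples."""
    return [[o - p for o, p in zip(orow, prow)] for orow, prow in zip(block, prediction)]


def reconstruct(prediction, res):
    """Reconstruction = prediction plus residual (lossless in this sketch)."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(prediction, res)]


block = [[100, 102], [101, 103]]
prediction = intra_dc_predict(left=[98, 102], top=[100, 100])
res = residual(block, prediction)
print(res)
```

The residual values are much smaller than the original samples, which is what makes the subsequent transform and entropy coding effective.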
[0066] 3) Transform Coding and Quantization: The residual video signal undergoes a transform operation, such as the Discrete Fourier Transform (DFT) or the Discrete Cosine Transform (DCT), which converts the signal into the transform domain; the resulting values are called transform coefficients. In the transform domain, the signal further undergoes lossy quantization, which discards some information so that the quantized signal is better suited to compression. Some video coding standards may offer more than one transform option; therefore, the content production device needs to select one transform for the CU currently being encoded and inform the content playback device. The fineness of quantization is usually determined by the quantization parameter (QP). A larger QP value means that coefficients over a wider range of values are quantized to the same output, which usually results in greater distortion and a lower bitrate; conversely, a smaller QP value means that coefficients over a narrower range of values are quantized to the same output, which usually results in less distortion and a higher bitrate.
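As an illustrative sketch of transform coding and quantization, the following Python fragment applies a one-dimensional orthonormal DCT to a short residual signal, quantizes the coefficients with a uniform step, and measures the number of non-zero levels and the reconstruction error. The direct use of a step size in place of a QP value, the uniform rounding quantizer, and the sample data are assumptions made for this example; real codecs use two-dimensional integer transforms and standard-specific mappings from QP to step size.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * coeffs[k]
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def code_and_measure(residual, step):
    """Quantize with the given step; return (non-zero level count, mean squared error)."""
    levels = [round(c / step) for c in dct(residual)]
    recon = idct([lv * step for lv in levels])
    mse = sum((a - b) ** 2 for a, b in zip(residual, recon)) / len(residual)
    return sum(1 for lv in levels if lv != 0), mse


residual = [4, 3, 5, 4, -2, -3, -1, -2]
fine = code_and_measure(residual, step=1)    # small step: more levels, less distortion
coarse = code_and_measure(residual, step=8)  # large step: fewer levels, more distortion
print(fine, coarse)
```

A larger step plays the role of a larger QP: fewer non-zero levels remain to be coded, at the cost of greater distortion.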
[0067] 4) Entropy Coding or Statistical Coding: The quantized transform domain signal is statistically compressed and encoded based on the frequency of each value, ultimately outputting a binary (0 or 1) compressed bitstream. Simultaneously, other information generated during encoding, such as the selected mode and motion vectors, also requires entropy coding to reduce the bit rate. Statistical coding is a lossless coding method that effectively reduces the bit rate required to represent the same signal. Common statistical coding methods include Variable Length Coding (VLC) and Context-Adaptive Binary Arithmetic Coding (CABAC).
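As an illustrative sketch of frequency-based variable length coding, the following Python fragment builds a Huffman code for a sequence of quantized levels, so that frequent values receive shorter codewords. Huffman coding is used here only as a simple example of VLC; it is not stated to be the method used by any of the standards named above, and the sample levels are made up.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code mapping each symbol to a bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker id, {symbol: partial codeword}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


levels = [0] * 12 + [1] * 3 + [-1] * 2 + [5]
code = huffman_code(levels)
total_bits = sum(len(code[s]) for s in levels)
fixed_bits = 2 * len(levels)  # four distinct values need 2 bits each at fixed length
print(code, total_bits, fixed_bits)
```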
[0068] 5) Loop Filtering: After the encoded image undergoes inverse quantization, inverse transform, and prediction compensation operations (the reverse of operations 2-4 above), a reconstructed decoded image is obtained. Compared to the original image, the reconstructed image differs in some information due to the influence of quantization, resulting in distortion. Filtering the reconstructed image, such as deblocking, Sample Adaptive Offset (SAO) filters, or Adaptive Loop Filters (ALF), can effectively reduce the distortion caused by quantization. Since these filtered reconstructed images will serve as a reference for subsequent encoded images to predict future signals, the above filtering operations are also called loop filtering, or filtering operations within the coding loop.
[0069] It should be noted here that if six degrees of freedom (6DoF) technology is used (where users can move relatively freely in a simulated scene), a specific encoding method (such as point cloud encoding) is required during the video encoding process.
[0070] (4) The encapsulation process of immersive media.
[0071] The audio and video streams are encapsulated in a file container according to the immersive media file format (such as the ISO Base Media File Format, ISOBMFF) to form an immersive media file resource. This media file resource can be a media file or media clips forming the immersive media file. Metadata of this immersive media file resource is recorded using Media Presentation Description (MPD) according to the immersive media file format requirements. Metadata here refers to all information related to the presentation of immersive media, including descriptive information about the media content, descriptive information about the window, and signaling information related to the presentation of the media content, etc. As shown in Figure 1a, the content production device stores the media presentation description information and the media file resources formed after the data processing process.
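As an illustrative sketch of how a file encapsulated in ISOBMFF is organized, the following Python fragment walks the top-level boxes of a byte buffer using the generic ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 indicating a 64-bit size field and size 0 indicating that the box extends to the end of the container). The function name `iter_boxes` and the sample bytes are hypothetical; the sketch does not parse the zoom-related data boxes of this application.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


# Hypothetical sample: a 16-byte 'ftyp' box followed by an empty 'free' box.
sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0) + struct.pack(">I4s", 8, b"free")
boxes = list(iter_boxes(sample))
print(boxes)
```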
[0072] II. Data processing on the content playback device:
[0073] (1) The process of decapsulating and decoding immersive media files;
[0074] The content playback device can dynamically obtain immersive media file resources and corresponding media presentation description information from the content production device, either through recommendations from the content production device or based on user needs on the playback device side. For example, the content playback device can determine the user's orientation and position based on the user's head / eye / body tracking information, and then dynamically request the corresponding media file resources from the content production device based on the determined orientation and position. The media file resources and media presentation description information are transmitted from the content production device to the content playback device via a transmission mechanism (such as DASH, SMT). The file decapsulation process on the content playback device side is the reverse of the file encapsulation process on the content production device side. The content playback device decapsulates the media file resources according to the immersive media file format requirements to obtain audio and video streams. The decoding process on the content playback device side is the reverse of the encoding process on the content production device side. The content playback device decodes the audio stream to restore the audio content. Additionally, the video stream decoding process on the content playback device includes the following: ① Decoding the video stream to obtain a planar projected image. ② The projected image is reconstructed based on the media presentation description information to convert it into a 3D image. Here, reconstruction refers to the process of reprojecting the two-dimensional projected image into 3D space.
[0075] As can be seen from the above encoding process, at the content playback device end, for each CU, after obtaining the compressed bitstream, the content playback device first performs entropy decoding to obtain various mode information and quantized transform coefficients. Each coefficient undergoes inverse quantization and inverse transform to obtain the residual signal. On the other hand, based on the known encoding mode information, the prediction signal corresponding to that CU can be obtained. After adding the two, the reconstructed signal is obtained. Finally, the reconstructed value of the decoded image needs to undergo a loop filtering operation to generate the final output signal.
[0076] (2) The rendering process of immersive media.
[0077] The content playback device renders the audio content obtained from audio decoding and the 3D images obtained from video decoding based on metadata related to rendering and viewport in the media presentation description information. Once rendering is complete, the 3D image is played back and output. If 3DoF and 3DoF+ production techniques are used, the content playback device mainly renders the 3D image based on the current viewpoint, parallax, and depth information. If 6DoF production techniques are used, the content playback device mainly renders the 3D image within the viewport based on the current viewpoint. Here, viewpoint refers to the user's viewing position, parallax refers to the difference in line of sight between the user's two eyes or due to motion, and viewport refers to the viewing area.
[0078] Immersive media systems support data boxes, which are data blocks or objects containing metadata; that is, data boxes contain the metadata of the corresponding media content. Immersive media can include multiple data boxes, such as rotation data boxes, overlay information data boxes, media file format data boxes, etc. In immersive media system scenarios, to provide users with a better viewing experience, content creators often add more diverse presentation formats to the media content, and scaling is an important one. Scaling strategies can be configured in the media format data boxes of immersive media, such as in the ISOBMFF data box. The description information corresponding to the scaling strategy can be configured in the scaling description signaling file, such as in the spherical region scaling descriptor or the planar region scaling descriptor. According to existing encoding standards for immersive media (such as AVS), the syntax of the media file format data boxes for immersive media can be seen in Table 1 below:
[0079] Table 1
[0080]
[0081]
[0082] The semantics of the syntax shown in Table 1 above are as follows: num_regions indicates the number of zoom regions corresponding to the spherical region or 2D region on the projected image of the same omnidirectional video. zoom_reg_width[i] indicates the width of the i-th zoom region; zoom_reg_height[i] indicates the height of the i-th zoom region; zoom_reg_top[i] indicates the vertical offset of the i-th zoom region; zoom_reg_left[i] indicates the horizontal offset of the i-th zoom region. Figure 2 shows a schematic diagram of the i-th zoom region provided in an exemplary embodiment of this application. As shown in Figure 2, 201 represents the width of the projected image to which the i-th zoom region belongs, 202 represents the height of the projected image to which the i-th zoom region belongs, 203 represents the horizontal offset of the i-th zoom region (zoom_reg_left[i]), 204 represents the vertical offset of the i-th zoom region (zoom_reg_top[i]), 205 represents the height of the i-th zoom region (zoom_reg_height[i]), and 206 represents the width of the i-th zoom region (zoom_reg_width[i]). zoom_ratio indicates the zoom ratio of the i-th zoom region, in units of $2^{-3}$, where i is a positive integer. When zoom_ratio is 0, the size of the i-th zoom region after scaling is the same as its unscaled size; when zoom_ratio is non-zero, it indicates the actual or approximate ratio of the scaled size of the i-th zoom region to its unscaled size (original size). zoom_algorithm_type indicates the scaling algorithm type used when rendering the i-th zoom region. The mapping relationship between the values of zoom_algorithm_type and scaling algorithm types is shown in Table 2.
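As an illustrative sketch of the region fields described above, the following Python fragment models one zoom region and derives its scale factor and scaled size. It assumes that zoom_ratio is expressed in units of 2^-3 (so a stored value of 8 corresponds to a ratio of 1.0), that the ratio applies to the linear dimensions of the region, and that the region lies entirely within the projected image as suggested by Figure 2. The `ZoomRegion` class and its method names are hypothetical and do not reproduce the syntax of Table 1.

```python
from dataclasses import dataclass


@dataclass
class ZoomRegion:
    zoom_reg_width: int
    zoom_reg_height: int
    zoom_reg_top: int    # vertical offset within the projected image
    zoom_reg_left: int   # horizontal offset within the projected image
    zoom_ratio: int      # assumed to be in units of 2**-3; 0 means "unscaled size"

    def scale_factor(self):
        """Ratio of scaled size to unscaled size."""
        return 1.0 if self.zoom_ratio == 0 else self.zoom_ratio / 8

    def scaled_size(self):
        """Scaled (width, height), assuming the ratio applies per dimension."""
        f = self.scale_factor()
        return round(self.zoom_reg_width * f), round(self.zoom_reg_height * f)

    def fits_in(self, proj_width, proj_height):
        """Check that the region lies inside the projected image (assumed constraint)."""
        return (
            self.zoom_reg_left >= 0
            and self.zoom_reg_top >= 0
            and self.zoom_reg_left + self.zoom_reg_width <= proj_width
            and self.zoom_reg_top + self.zoom_reg_height <= proj_height
        )


region = ZoomRegion(zoom_reg_width=640, zoom_reg_height=360,
                    zoom_reg_top=100, zoom_reg_left=200, zoom_ratio=16)
print(region.scale_factor(), region.scaled_size(), region.fits_in(3840, 1920))
```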
[0083] Table 2
[0084]
Value     Description
0         Protruding zoom
1         Spherical zoom (ensuring minimal center distortion)
2         Disc-shaped uniform zoom
3..255    Undefined
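The mapping in Table 2 can be sketched as a small enumeration. This is only an illustration: the class and function names are hypothetical, and the 0..255 range check is inferred from the "3..255 Undefined" row rather than from a stated field width.

```python
from enum import IntEnum
from typing import Optional


class ZoomAlgorithmType(IntEnum):
    """Defined values of zoom_algorithm_type as listed in Table 2."""
    PROTRUDING = 0
    SPHERICAL = 1      # ensures minimal center distortion
    DISC_UNIFORM = 2


def parse_zoom_algorithm_type(value: int) -> Optional[ZoomAlgorithmType]:
    """Return the enum member, or None for the undefined range 3..255."""
    if not 0 <= value <= 255:
        raise ValueError("zoom_algorithm_type must be in 0..255")
    try:
        return ZoomAlgorithmType(value)
    except ValueError:
        return None  # value is in the undefined range
```

Returning None for undefined values lets a player fall back to a default rendering path instead of failing on content that uses a value it does not know.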
[0085] `zoom_symbolization_type` indicates the boundary symbol type of the i-th zoom region; `zoom_area_type` indicates the type of the i-th zoom region. The mapping relationship between the values of `zoom_area_type` and the types of zoom regions is shown in Table 3.
[0086] Table 3
[0087]
[0088] zoom_description carries a text description of the i-th zoom region.
[0089] Description information corresponding to the scaling strategy of the media file format data box of the immersive media is stored in the scaling description signaling file of the immersive media. The scaling description signaling file may include at least one of the Sphere Region Zooming (SRWZ) descriptor and the two-dimensional (2D) Region Zooming (2DWZ) descriptor.
[0090] The Sphere Region Zooming (SRWZ) descriptor is a supplemental property element whose scheme identifier attribute (@schemeIdUri) equals "urn:avs:ims:2018:srwz". The SRWZ descriptor indicates the spherical region of the omnidirectional video in the omnidirectional video track carried by its corresponding representation level, and one or more zoomed regions of the spherical region on the projected image of the omnidirectional video.
[0091] When an SRWZ descriptor exists for the Representation level, and a SphereRegionZoomingBox also exists in the corresponding track for that Representation level, the SRWZ descriptor should carry information equivalent to the SphereRegionZoomingBox. The content playback device can request the container file corresponding to the spherical region zoom operation for omnidirectional video based on the SRWZ descriptor. The SRWZ descriptor should contain the elements and attributes defined in Table 4 below.
[0092] Table 4
[0093]
[0094]
[0095]
[0096] The two-dimensional (2D) region zooming (2DWZ) descriptor corresponding to the media file format data box of immersive media is a supplemental property element with the scheme identifier attribute (@schemeIdUri) equal to "urn:mpeg:mpegI:omaf:2018:2dwz". The 2DWZ descriptor indicates the 2D region on the projected image of the omnidirectional video in the omnidirectional video track carried by its corresponding representation level, as well as one or more zoomed regions of the 2D region on the projected image of the omnidirectional video.
[0097] When a 2DWZ descriptor exists for the Representation level, and a 2D Region Zooming Box also exists in the corresponding track for that Representation level, the 2DWZ descriptor should carry information equivalent to the 2D Region Zooming Box. The content playback device can request a container file corresponding to the 2D region zooming operation on the projected image of the omnidirectional video based on the 2DWZ descriptor. The 2DWZ descriptor should contain the elements and attributes defined in Table 5 below.
[0098] Table 5
[0099]
[0100]
[0101] The media file format data box shown in Table 1, combined with the description information in the spherical region scaling descriptor shown in Table 4 and the 2D region scaling descriptor shown in Table 5, supports only user-defined scaling operations on immersive media at the content playback device. As mentioned above, user-defined scaling behavior wastes bandwidth and cannot provide a better viewing experience. To save bandwidth and improve the user viewing experience, the embodiments of this application extend the existing media file format data box and media presentation description file for immersive media. The syntax of the extended media file format data box can be seen in Table 6 below:
[0102] Table 6
[0103]
[0104]
[0105] The semantics of the new extended syntax in Table 6 compared to Table 1 are as follows: ①-④:
[0106] ① The scaling flag field `auto_zoom_flag` indicates whether target scaling mode (such as director scaling mode) is enabled. When `auto_zoom_flag` has a valid value, it means that target scaling mode is enabled, i.e., the i-th scaling region needs to be scaled under target scaling mode. When `auto_zoom_flag` has an invalid value, it means that target scaling mode is disabled, i.e., the i-th scaling region does not need to be scaled under target scaling mode, where i is a positive integer. Valid and invalid values are set according to the requirements of the encoding standard. Taking the AVS standard as an example, a valid value is 1, and an invalid value is 0.
[0107] ② The zoom_steps field indicates the number m of zoom steps involved in the zooming process of the i-th zoom region in the target zoom mode, where m is a positive integer; it is used to indicate that the i-th zoom region needs to be zoomed m times in the target zoom mode.
[0108] ③ One zoom step corresponds to one zoom ratio field `zoom_ratio`, so m zoom steps correspond to m `zoom_ratio` fields. The j-th `zoom_ratio` indicates the zoom ratio used when the i-th zoomed area of the immersive media is subjected to the j-th zoom step of the zoom processing. `zoom_ratio` is expressed in units of 2^-3, where j is a positive integer and j≤m; when the j-th zoom_ratio is 0, it indicates that the size of the i-th zoomed area of the immersive media after the j-th zoom step in the target zoom mode is the same as the size before the zoom step; when the j-th zoom_ratio is non-zero, it indicates that the ratio between the size of the i-th zoomed area of the immersive media after the j-th zoom step in the target zoom mode and the size before the zoom step is the value of the j-th zoom_ratio.
[0109] ④ One zoom step corresponds to one zoom duration (zoom_duration) and one unit of measurement for the duration (zoom_duration_unit). Therefore, m zoom steps correspond to m zoom_durations and m zoom_duration_units. The j-th zoom_duration indicates the duration of the i-th zoomed area of the immersive media when the j-th zoom step of the zoom process is performed; the value of zoom_duration is non-zero. The j-th zoom_duration_unit indicates the unit of measurement for the duration of the i-th zoomed area of the immersive media when the j-th zoom step of the zoom process is performed; zoom_duration_unit is in seconds, and its value is non-zero.
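The extended fields ① to ④ can be pictured as a small data structure. The following is a minimal sketch, not the box syntax of Table 6: the class names, the validation rules beyond those stated above, and the choice to derive zoom_steps from the list length are assumptions made for illustration, and field bit widths are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZoomStep:
    zoom_ratio: int          # in units of 2^-3; 0 means "size unchanged"
    zoom_duration: int       # non-zero number of duration units
    zoom_duration_unit: int  # non-zero length of one unit, in seconds

    def scale_factor(self) -> float:
        """Decode zoom_ratio into a plain multiplier."""
        return 1.0 if self.zoom_ratio == 0 else self.zoom_ratio / 8

    def duration_seconds(self) -> int:
        return self.zoom_duration * self.zoom_duration_unit


@dataclass
class AutoZoomStrategy:
    auto_zoom_flag: int                       # AVS: 1 = enabled, 0 = disabled
    steps: List[ZoomStep] = field(default_factory=list)

    @property
    def zoom_steps(self) -> int:
        """m, the number of zoom steps."""
        return len(self.steps)

    def validate(self) -> None:
        if self.auto_zoom_flag not in (0, 1):
            raise ValueError("auto_zoom_flag must be 0 or 1")
        if self.auto_zoom_flag == 1 and self.zoom_steps < 1:
            raise ValueError("zoom_steps must be a positive integer")
        for step in self.steps:
            if step.zoom_ratio < 0:
                raise ValueError("zoom_ratio must not be negative")
            if step.zoom_duration <= 0 or step.zoom_duration_unit <= 0:
                raise ValueError("duration fields must be non-zero")
```

For instance, `ZoomStep(16, 3, 60)` decodes to a scale factor of 2.0 held for 180 seconds.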
[0110] The scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor. The semantics of the extended spherical region scaling descriptor syntax can be found in Table 7 below:
[0111] Table 7
[0112]
[0113]
[0114]
[0115] Comparing Tables 7 and 4 above, it can be seen that the extended spherical region scaling descriptor in this application embodiment adds descriptive information about the scaling strategy under the target scaling mode (i.e., director scaling mode) compared to the spherical region scaling descriptor in the existing standard. This includes the elements and attributes in Table 7 above: SphRegionZoom.zoomInfo@auto_zoom_flag, SphRegionZoom.zoomInfo@zoom_ratio, SphRegionZoom.zoomInfo@zoom_duration, and SphRegionZoom.zoomInfo@zoom_duration_unit, as well as the related descriptions of these elements and attributes.
[0116] The semantics of the extended planar region scaling descriptor syntax can be found in Table 8 below:
[0117] Table 8
[0118]
[0119]
[0120]
[0121] Comparing Tables 8 and 5 above, it can be seen that the extended planar region scaling descriptor in this embodiment adds descriptive information about the scaling strategy under the target scaling mode (i.e., director scaling mode) compared to the planar region scaling descriptor in the existing standard. This includes the elements and attributes in Table 8 above: twoDRegionZoom.zoomInfo@auto_zoom_flag, twoDRegionZoom.zoomInfo@zoom_ratio, twoDRegionZoom.zoomInfo@zoom_duration, and twoDRegionZoom.zoomInfo@zoom_duration_unit, as well as the related descriptions of these elements and attributes.
[0122] According to the media file format data boxes shown in Table 6 of this application embodiment, combined with the descriptions of scaling strategies in the spherical region scaling descriptors shown in Table 7 and the 2D region scaling descriptors shown in Table 8, under the target scaling mode (such as director scaling mode), the user on the content playback device can obtain and consume the container file corresponding to the current resolution of the content playback device based on the MPD file, without needing to request container files for all scaling resolution versions, thereby saving transmission bandwidth. Furthermore, while consuming the container file corresponding to the target scaling mode at the current resolution, the content playback device automatically presents the scaling effect specified by the immersive media content creator according to the target scaling mode, so as to provide the user with the best viewing experience.
[0123] Figure 3 is a flowchart illustrating a data processing method for immersive media provided in an exemplary embodiment of this application; the method can be executed by a content creation device or a content playback device in an immersive media system, and the method includes the following steps S301-S302:
[0124] S301, Obtain the media file format data box of the immersive media, which includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer.
[0125] S302, perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0126] In steps S301-S302, the syntax of the media file format data box for immersive media can be found in Table 6 above. The target scaling mode refers to scaling the i-th scaling area according to the scaling strategy when the i-th scaling area in the immersive media meets the scaling conditions (e.g., the playback progress of the immersive media reaches a preset position, or the user's viewpoint turns to a preset area). The scaling strategy is generated based on the scaling information specified by the immersive media content creator; for example, assuming the scaling information specified by the immersive media content creator is: when the user's viewpoint turns to the i-th scaling area, the i-th scaling area is enlarged to twice its original size, then the scaling strategy corresponding to this scaling information carries the position information (e.g., coordinates) of the i-th scaling area, the scaling conditions, the size information (width, height), and the scaling ratio.
[0127] In one implementation, the media file format data box may refer to the ISO Base Media File Format (ISOBMFF) data box, and the target scaling mode may refer to the director scaling mode.
[0128] Before performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box, a scaling description signaling file for the immersive media can be obtained first. This scaling description signaling file includes description information of the scaling strategy and includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor. The spherical region scaling descriptor is encapsulated in a representation level within the media presentation description file of the immersive media, and the number of spherical region scaling descriptors in the representation level is less than or equal to one. The syntax of the spherical region scaling descriptor can be found in Table 7. The planar region scaling descriptor is also encapsulated in a representation level within the media presentation description file of the immersive media, and the number of planar region scaling descriptors in the representation level is less than or equal to one. The syntax of the planar region scaling descriptor can be found in Table 8. After the user opens the target scaling mode, the content playback device presents the immersive media file according to the scaling description signaling file and the media file format data box.
[0129] In this embodiment, a media file format data box of the immersive media is obtained. This media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer. Scaling processing is performed on the i-th scaling region of the immersive media according to the media file format data box. It is evident that in the target scaling mode, the content playback device does not need to request container files for all scaling resolution versions, thereby saving transmission bandwidth.
[0130] Figure 4 is a flowchart illustrating another data processing method for immersive media provided in an exemplary embodiment of this application; the method is performed by a content creation device in an immersive media system, and includes the following steps S401-S403:
[0131] S401, Obtain scaling information for immersive media.
[0132] Scaling information is generated based on the content creator's intent. For example, during the production process, the content creator may perform scaling on the immersive media. In one implementation, the content creator may first perform scaling on the i-th scaling area of the immersive media, such as shrinking the i-th scaling area for a few minutes and then enlarging it for a few minutes; or shrinking it by a factor of several and then enlarging it by a factor of several, etc., and then specify the scaling information based on the scaling effect of the scaling on the i-th scaling area. Alternatively, if the content creator knows the resolution of the immersive media, they may not need to perform scaling on the i-th scaling area first, and can directly specify the scaling information according to the resolution. The scaling information is used to indicate the corresponding scaling parameters when the i-th scaling area is scaled, including but not limited to: the position and size of the i-th scaling area (e.g., width, height, coordinates), the scaling steps performed on the i-th scaling area (e.g., shrinking and then enlarging), the scaling ratio (e.g., shrinking by a factor of several and enlarging by a factor of several), the duration of the scaling steps (e.g., shrinking for a few minutes and then enlarging for a few minutes), etc.
[0133] S402, Configure the media file format data box of the immersive media according to the scaling information of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling area of the immersive media in the target scaling mode, where i is a positive integer.
[0134] Based on Table 6 above, the configuration process in step S402 may include the following (1)-(4):
[0135] (1) The scaling strategy includes the scaling flag field auto_zoom_flag; when the scaling information of the immersive media indicates that the i-th scaling area needs to be scaled in the target scaling mode, the scaling flag field is configured to a valid value, such as configuring the value of auto_zoom_flag to 1.
[0136] (2) The scaling strategy includes the zoom steps field zoom_steps; then, when the scaling information indicates that the i-th zoom area of the immersive media needs to be scaled in the target scaling mode, the value of the zoom steps field is configured to be m, where m is a positive integer.
[0137] (3) One scaling step corresponds to one scaling ratio field (zoom_ratio), so m scaling steps correspond to m zoom_ratios. The j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field (zoom_ratio) in the m scaling ratio fields, where j is a positive integer and j≤m. If the scaling information indicates that the size of the i-th scaling area of the immersive media after the j-th scaling step is the same as the size before the scaling process, then the j-th scaling ratio field is configured as an invalid value; if the scaling information indicates that the size of the i-th scaling area after the j-th scaling step is different from the size before the scaling process, then the scaling ratio field is configured as a valid value. The valid value is the ratio between the size of the i-th scaling area after the j-th scaling step and the size before the scaling process, as indicated by the scaling information. For example, if the scaling information of the immersive media indicates that the j-th scaling step for the i-th scaling area should be scaled up by 2 times, then the value of the j-th scaling ratio field in the m scaling ratio fields can be configured as 16.
[0138] (4) Each zoom step corresponds to a zoom duration field (zoom_duration) and a duration unit field (zoom_duration_unit). Therefore, m zoom steps correspond to m zoom_duration fields and m zoom_duration_unit fields. The j-th zoom step corresponds to the j-th zoom duration field and the j-th zoom duration unit field, where j is a positive integer and j≤m. The value of the duration indicated by the zoom information for the j-th zoom step performed on the i-th zoom area is configured as the value of the j-th zoom duration field, and the unit of measurement of that duration is configured as the value of the j-th zoom duration unit field. For example, if the zoom information of the immersive media indicates that the j-th zoom step performed on the i-th zoom area lasts 3 minutes, then the j-th of the m zoom duration fields is configured as 3 and the j-th of the m zoom duration unit fields is configured as 60.
[0139] In addition, the scaling description signaling file for immersive media can be configured based on the scaling information. The scaling description signaling file includes a description of the scaling strategy. The syntax of the scaling description signaling file can be found in Tables 7 and 8. The configuration method of the extended fields in the scaling description signaling file can be referred to the configuration method of the corresponding fields in the media file format data box mentioned above, and will not be repeated here.
[0140] The following is a detailed description of the scheme of this embodiment through a complete example: The immersive media content creator specifies the following scaling information for video A: From the 10th to the 20th minute of video A (00:10:00-00:20:00), region B is scaled. Specifically, from the 10th to the 13th minute (00:10:00-00:13:00), region B is enlarged to twice its original size; from the 13th to the 17th minute (00:13:00-00:17:00), region B is restored to its original size; and from the 17th to the 20th minute (00:17:00-00:20:00), region B is enlarged to four times its original size. The content production device, based on the scaling information specified by the content creator for video A, configures the value of the scaling flag field to 1 and the value of the scaling step field to 3. For scaling step 1, the scaling ratio field is configured to 16 (16 × 2^-3 = 2), the duration field is configured to 3, and the duration unit field is configured to 60. It can be understood that the duration is calculated as 3 × 60 s = 180 s, or 3 minutes. Similarly, in scaling step 2, the scaling ratio field is configured to 0, the duration field to 4, and the duration unit field to 60; in scaling step 3, the scaling ratio field is configured to 32 (32 × 2^-3 = 4), the duration field to 3, and the duration unit field to 60.
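The configuration in this example can be reproduced with a short sketch. The function names and the dictionary layout are hypothetical; the sketch assumes durations are given in whole minutes (so the unit field is always 60) and that a ratio is rounded to the nearest multiple of 2^-3, which the text permits as an "approximate ratio".

```python
from typing import Dict, List, Tuple


def encode_zoom_ratio(scale: float) -> int:
    """Encode a multiplier as zoom_ratio in units of 2^-3; 1.0 maps to 0."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if scale == 1.0:
        return 0  # 0 signals "same size as the original"
    encoded = round(scale * 8)
    if encoded == 0:
        raise ValueError("scale too small to represent in units of 2^-3")
    return encoded


def build_strategy(plan: List[Tuple[float, int]]) -> Dict:
    """plan is a list of (scale, minutes) pairs, one per zoom step."""
    steps = [
        {
            "zoom_ratio": encode_zoom_ratio(scale),
            "zoom_duration": minutes,
            "zoom_duration_unit": 60,  # one unit = 60 seconds
        }
        for scale, minutes in plan
    ]
    return {"auto_zoom_flag": 1, "zoom_steps": len(steps), "steps": steps}


# Video A: 2x for 3 min, original size for 4 min, 4x for 3 min.
video_a = build_strategy([(2.0, 3), (1.0, 4), (4.0, 3)])
```

Here `video_a` carries zoom_ratio values 16, 0 and 32 with durations 3, 4 and 3, matching the field values given in the example.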
[0141] It should be noted that, based on the scaling information specified by the content creator, the content production device will configure media file format data boxes and corresponding scaling description signaling files for various resolutions for immersive media. For example, based on the scaling information specified by the content creator, the content production device provides media file format data box 1 and scaling description signaling file 1 for video A at 4K resolution (4096×2160 pixels) to indicate that when video A is scaled at 4K resolution, it will present a scaling effect of "2x magnification → original ratio → 4x magnification". In addition, the content production device provides media file format data box 2 and scaling description signaling file 2 for video A at 2K resolution to indicate that when video A is scaled at 2K resolution, it will present a scaling effect of "1.5x magnification → original ratio → 3x magnification".
[0142] S403, Add the media file format data box of the immersive media to the encapsulation file of the immersive media.
[0143] In one implementation, the content production device adds immersive media with the same content but different resolutions and their corresponding media file format data boxes to the encapsulation file of the immersive media.
[0144] In some embodiments, the content production device can package all media file format data boxes at different resolutions of the immersive media and send the packaged file to the content playback device, so that the content playback device can request the corresponding encapsulation file according to the current resolution and the packaged file.
[0145] In this embodiment, the content production device configures a media file format data box based on the immersive media and its scaling information, and adds the media file format data box of the immersive media to the encapsulation file of the immersive media. This enables the content playback device to request the encapsulation file corresponding to the target scaling mode at the current resolution from the server based on the media file format data box and consume it, without needing to request encapsulation files for all scaling resolution versions, thereby saving transmission bandwidth.
[0146] Figure 5 is a flowchart illustrating another data processing method for immersive media provided in an exemplary embodiment of this application; the method is performed by a content playback device in an immersive media system, and the method includes the following steps S501-S503:
[0147] S501, Obtain the encapsulation file of the immersive media, which includes the media file format data box of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer.
[0148] S502, Parse the encapsulation file and display the parsed immersive media.
[0149] In one implementation, the content playback device first decapsulates the encapsulated file to obtain the immersive media's encoded file and the immersive media's media file format data box, and then decodes and displays the immersive media's encoded file.
[0150] S503, when displaying the i-th zoomed area of the immersive media, perform zoom processing on the i-th zoomed area of the immersive media according to the media file format data box.
[0151] Based on Table 6 above, the scaling process in step S503 may include the following (1)-(4):
[0152] (1) The scaling strategy includes the scaling flag field auto_zoom_flag; when the scaling flag field is a valid value, it indicates that the target scaling mode is enabled, and the content playback device performs scaling processing on the i-th scaling region of the immersive media. The scaling processing may involve requesting the video corresponding to the size of the i-th scaling region after scaling processing from the server and playing it.
[0153] (2) The scaling strategy includes the zoom_steps field; when the value of the zoom_steps field is m, m scaling operations are performed on the i-th scaling area of the immersive media in the target scaling mode. Where m is a positive integer. For example, if the value of the zoom_steps field is 3, then in the target scaling mode, the content playback device needs to perform 3 scaling operations on the i-th scaling area of the immersive media.
[0154] (3) One scaling step corresponds to one scaling ratio field (zoom_ratio), so m scaling steps correspond to m zoom_ratios. The j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field (zoom_ratio) in the m scaling ratio fields, where j is a positive integer and j≤m. When the j-th scaling ratio field is invalid, the size of the i-th scaling area is scaled to the size of the i-th scaling area before scaling in the target scaling mode; when the j-th scaling ratio field is valid, the j-th scaling step of scaling is performed on the i-th scaling area of the immersive media according to the valid value in the target scaling mode, so that the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step and the size of the i-th scaling area of the immersive media before scaling reaches the valid value.
[0155] (4) Each zoom step corresponds to a zoom duration (zoom_duration) and a duration measurement unit (zoom_duration_unit). Therefore, m zoom steps correspond to m zoom durations and m duration measurement units. The j-th zoom step corresponds to the j-th zoom duration field and the j-th zoom duration unit field, where j is a positive integer and j≤m. The j-th zoom step is the zoom processing performed on the i-th zoom area of the immersive media in the target zoom mode. The duration of the j-th zoom step is indicated by both the j-th zoom duration field and the j-th zoom duration unit field. It can be understood that during the zoom duration, the content playback device continuously zooms the image in the i-th zoom area of the immersive media until the zoom duration ends. For example, if the immersive media displays 20 frames of images during the zoom duration, the content playback device zooms and displays the i-th zoom area of these 20 frames.
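Steps (1) to (4) above amount to looking up, for any moment after scaling begins, which zoom step is active and what ratio it prescribes. The sketch below illustrates that lookup. It assumes that the steps run back to back and that the ratio is held constant for the whole duration of a step; neither the function name nor these timing assumptions come from Table 6.

```python
from typing import List, Optional, Tuple

# (zoom_ratio, zoom_duration, zoom_duration_unit)
Step = Tuple[int, int, int]


def active_scale(steps: List[Step], elapsed_s: float) -> Optional[float]:
    """Scale factor in effect elapsed_s seconds after scaling starts.

    Returns None before the first step or after the last one has ended.
    """
    if elapsed_s < 0:
        return None
    start = 0
    for zoom_ratio, zoom_duration, zoom_duration_unit in steps:
        end = start + zoom_duration * zoom_duration_unit
        if start <= elapsed_s < end:
            # zoom_ratio is in units of 2^-3; 0 means "original size"
            return 1.0 if zoom_ratio == 0 else zoom_ratio / 8
        start = end
    return None
```

A player could call `active_scale` once per displayed frame with the frame's offset from the start of scaling, so that every frame within a step's duration is shown at that step's ratio.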
[0156] In addition, before the content playback device obtains the encapsulation file of the immersive media, it can first obtain the MPD file of the immersive media. The MPD file includes scaling description signaling files for various resolutions. The content playback device obtains the encapsulation file corresponding to the current resolution of the content playback device based on the MPD file, and presents the scaling effect of the immersive media in the encapsulation file according to the implementation methods of steps (1) to (4) above.
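The bandwidth saving comes from requesting a single encapsulation file instead of every scaled version. A minimal sketch of that selection follows; it works on an already-parsed, hypothetical view of the MPD, and the dictionary keys, identifiers, and file names are illustrative rather than taken from Table 7 or Table 8.

```python
from typing import Dict, List, Optional


def pick_representation(representations: List[Dict],
                        current_resolution: str) -> Optional[Dict]:
    """Return the first representation that matches the device resolution
    and signals target (director) zoom mode, or None if none matches."""
    for rep in representations:
        if rep["resolution"] == current_resolution and rep["auto_zoom_flag"] == 1:
            return rep
    return None


# Hypothetical parsed MPD content: one entry per representation level.
mpd_representations = [
    {"id": "rep-4k", "resolution": "4K", "auto_zoom_flag": 1, "file": "video_a_4k.mp4"},
    {"id": "rep-2k", "resolution": "2K", "auto_zoom_flag": 1, "file": "video_a_2k.mp4"},
]

chosen = pick_representation(mpd_representations, "2K")
# Only one file is requested, not one per scaled resolution version.
to_request = [chosen["file"]] if chosen else []
```

If no representation matches, the list stays empty and the player would need a fallback, which is outside the scope of this sketch.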
[0157] The following is a detailed explanation of the solution in this embodiment through a complete example: Assume that both User 1 and User 2 have selected director scaling mode. User 1's base resolution is 4K, and User 1 requests from the server a video file at the representation level with director scaling mode enabled for 4K resolution. User 2's base resolution is 2K, and User 2 requests a video file at the representation level with director scaling mode enabled for 2K resolution. The server receives the requests from User 1 and User 2, prepares the encapsulation files corresponding to 4K and 2K resolutions respectively, and pushes them to User 1 and User 2. The encapsulation file 1 of the immersive media received by User 1 includes:
[0158] auto_zoom_flag=1; zoom_steps=3;
[0159] step1: zoom_ratio=16; zoom_duration=3; zoom_duration_unit=60;
[0160] step2: zoom_ratio=0; zoom_duration=4; zoom_duration_unit=60;
[0161] step3: zoom_ratio=32; zoom_duration=3; zoom_duration_unit=60;
[0162] The encapsulation file 2 of the immersive media received by User 2 includes:
[0163] auto_zoom_flag=1; zoom_steps=3;
[0164] step1: zoom_ratio=12; zoom_duration=3; zoom_duration_unit=60;
[0165] step2: zoom_ratio=0; zoom_duration=4; zoom_duration_unit=60;
[0166] step3: zoom_ratio=24; zoom_duration=3; zoom_duration_unit=60;
[0167] Furthermore, the immersive media encapsulation files 1 and 2 received by users 1 and 2 may also include the location information, size information, and conditions for performing scaling processing of the scaling area i. Assuming the scaling processing condition is that scaling processing of the scaling area i is performed when the playback progress reaches the 10th minute, then the content playback device 1 used by user 1 will enlarge the scaling area i to twice its original size from the 10th to the 13th minute (00:10:00-00:13:00), restore the scaling area i to its original size from the 13th to the 17th minute (00:13:00-00:17:00), enlarge the scaling area i to four times its original size from the 17th to the 20th minute (00:17:00-00:20:00), and end scaling at the 20th minute (00:20:00). Similarly, the content playback device 2 used by user 2 will enlarge the zoom area i to 1.5 times its original size from the 10th to the 13th minute (00:10:00-00:13:00), restore the zoom area i to its original size from the 13th to the 17th minute (00:13:00-00:17:00), enlarge the zoom area i to 3 times its original size from the 17th to the 20th minute (00:17:00-00:20:00), and end the zooming at the 20th minute (00:20:00).
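The two playback schedules described above can be derived mechanically from the field values in encapsulation files 1 and 2. The sketch below does so under the same assumption as the example, namely that scaling is triggered when playback reaches 00:10:00; the function names are hypothetical.

```python
from typing import List, Tuple

# (zoom_ratio, zoom_duration, zoom_duration_unit)
Step = Tuple[int, int, int]


def fmt(seconds: int) -> str:
    """Format a second count as hh:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timeline(steps: List[Step], trigger_s: int) -> List[Tuple[str, str, float]]:
    """List (start, end, scale) for each zoom step, starting at trigger_s."""
    rows = []
    start = trigger_s
    for zoom_ratio, zoom_duration, zoom_duration_unit in steps:
        end = start + zoom_duration * zoom_duration_unit
        scale = 1.0 if zoom_ratio == 0 else zoom_ratio / 8  # units of 2^-3
        rows.append((fmt(start), fmt(end), scale))
        start = end
    return rows


user1 = timeline([(16, 3, 60), (0, 4, 60), (32, 3, 60)], trigger_s=600)
user2 = timeline([(12, 3, 60), (0, 4, 60), (24, 3, 60)], trigger_s=600)
```

With these inputs, `user1` steps through scales 2.0, 1.0 and 4.0, and `user2` through 1.5, 1.0 and 3.0, over the same three intervals ending at 00:20:00.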
[0168] In this embodiment, the content playback device parses the container file of the immersive media to obtain the media file format data box of the immersive media, and performs scaling processing on the i-th scaling region of the immersive media according to the media file format data box. It is evident that in the target scaling mode, the content playback device does not need to request container files for all scaling resolution versions, thereby saving transmission bandwidth. Furthermore, while consuming the container file corresponding to the target scaling mode at the current resolution, the content playback device automatically presents the scaling effect specified by the immersive media content creator according to the target scaling mode, so as to provide the user with the best viewing experience.
[0169] The methods of the embodiments of this application have been described in detail above. In order to facilitate better implementation of the above solutions of the embodiments of this application, the apparatus of the embodiments of this application is provided below.
[0170] Please refer to Figure 6, which shows a schematic diagram of the structure of a data processing apparatus for immersive media according to an exemplary embodiment of this application; the data processing apparatus for immersive media can be a computer program (including program code) running on a content production device; for example, the data processing apparatus for immersive media can be application software in the content production device. As shown in Figure 6, the data processing apparatus for immersive media includes an acquisition unit 601 and a processing unit 602.
[0171] In one exemplary embodiment, the data processing apparatus for immersive media can be used to perform the corresponding steps in the method shown in Figure 3; in that case:
[0172] The acquisition unit 601 is used to acquire the media file format data box of the immersive media, wherein the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0173] The processing unit 602 is configured to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0174] In one implementation, the media file format data box includes the International Organization for Standardization (ISO) Basic Media File Format Data Box; the target scaling mode includes the director scaling mode.
[0175] In one implementation, the scaling strategy includes a scaling flag field;
[0176] When the scaling flag field is a valid value, the scaling flag field is used to indicate that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode.
[0177] In one implementation, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; the scaling step field is used to indicate that the number of scaling steps included when the i-th scaling region of the immersive media is scaled in the target scaling mode is m.
[0178] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m;
[0179] The j-th scaling ratio field is used to indicate the scaling ratio used when the i-th scaling area of the immersive media is subjected to the j-th scaling step of the scaling process; the scaling ratio is expressed in units of 2⁻³;
[0180] When the j-th scaling ratio field is invalid, the j-th scaling ratio field is used to indicate that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode is the same as the size before the scaling process is performed;
[0181] When the j-th scaling ratio field is a valid value, the j-th scaling ratio field is used to indicate the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode and the size before the scaling process is performed.
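The ratio semantics above can be illustrated with a minimal Python sketch. It assumes that the invalid value of the scaling ratio field is 0; the text only says "invalid" and does not state the value here. The names decode_zoom_ratio, RATIO_UNIT and INVALID_RATIO are hypothetical and are not taken from the data box definition.

```python
from fractions import Fraction

RATIO_UNIT = Fraction(1, 8)  # 2^-3, the unit stated for the scaling ratio field
INVALID_RATIO = 0            # assumption: 0 plays the role of the "invalid" value


def decode_zoom_ratio(field_value: int) -> Fraction:
    """Return (size after the step) / (size before scaling) for one step."""
    if field_value < 0:
        raise ValueError("scaling ratio field must be non-negative")
    if field_value == INVALID_RATIO:
        # Invalid value: the region keeps the size it had before scaling.
        return Fraction(1)
    # Valid value: the field counts multiples of 2^-3.
    return field_value * RATIO_UNIT


def decode_zoom_ratios(fields):
    """Decode the m scaling ratio fields of one scaling region."""
    return [decode_zoom_ratio(v) for v in fields]
```

Under these assumptions, a field value of 16 decodes to a ratio of 2 (the region is shown at twice its original size), and a field value of 4 decodes to a ratio of 1/2.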
[0182] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
[0183] The j-th scaling duration field is used to indicate the value of the duration of the j-th scaling step of the scaling process performed on the i-th scaling area of the immersive media; the scaling duration field is a non-zero value.
[0184] The j-th scaling duration unit field is used to indicate the unit of measurement for the duration of the j-th scaling step when the i-th scaling area of the immersive media is subjected to scaling processing. The unit of measurement is in seconds, and the scaling duration unit field is a non-zero value.
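As a sketch of how the two duration fields might be combined, the following Python function assumes that the duration of a step in seconds is the product of the scaling duration field and the scaling duration unit field. The text states only that the unit is measured in seconds and that both fields are non-zero, so the product rule and the function name step_duration_seconds are assumptions.

```python
def step_duration_seconds(zoom_duration: int, zoom_duration_unit: int) -> int:
    """Duration of one scaling step, in seconds.

    Assumption: duration = duration field * duration unit field,
    where the unit field is itself expressed in seconds.
    """
    if zoom_duration <= 0 or zoom_duration_unit <= 0:
        # Both fields are described as non-zero values.
        raise ValueError("duration and duration unit must be non-zero")
    return zoom_duration * zoom_duration_unit
```

With this reading, a duration field of 3 and a unit field of 2 would describe a step lasting 6 seconds.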
[0185] In one embodiment, the acquisition unit 601 is further configured to:
[0186] Obtain the scaling description signaling file of the immersive media, the scaling description signaling file including description information of the scaling strategy.
[0187] In one embodiment, the scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor;
[0188] The spherical region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the spherical region scaling descriptors in the representation level is less than or equal to 1.
[0189] The planar region scaling descriptor is encapsulated in a representation level within the media presentation description file of the immersive media, and the number of the planar region scaling descriptors in the representation level is less than or equal to 1.
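The constraint that each representation level carries at most one descriptor of each kind can be checked with a short Python sketch. The scheme identifiers and the use of a SupplementalProperty element are placeholders chosen for illustration; the descriptor names and URIs actually used in the media presentation description file are not given in this passage.

```python
import xml.etree.ElementTree as ET

# Hypothetical scheme identifiers, used only for this illustration.
SPHERE_SCHEME = "urn:example:zoom:sphere"
PLANE_SCHEME = "urn:example:zoom:2d"


def descriptor_counts(representation: ET.Element) -> dict:
    """Count spherical and planar scaling descriptors in one representation."""
    counts = {SPHERE_SCHEME: 0, PLANE_SCHEME: 0}
    for child in representation:
        scheme = child.get("schemeIdUri")
        if scheme in counts:
            counts[scheme] += 1
    return counts


def representation_is_valid(representation: ET.Element) -> bool:
    """True if each descriptor kind appears at most once."""
    return all(n <= 1 for n in descriptor_counts(representation).values())


SAMPLE = """<Representation id="r1">
  <SupplementalProperty schemeIdUri="urn:example:zoom:sphere" value="1"/>
  <SupplementalProperty schemeIdUri="urn:example:zoom:2d" value="1"/>
</Representation>"""
```

Parsing SAMPLE with ET.fromstring and passing the result to representation_is_valid returns True, since each descriptor kind occurs once.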
[0190] In another exemplary embodiment, the data processing apparatus for immersive media can be used to perform the corresponding steps in the method shown in Figure 4; then:
[0191] Acquisition unit 601 is used to acquire scaling information of immersive media;
[0192] The processing unit 602 is configured to configure the media file format data box of the immersive media according to the scaling information of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer; and add the media file format data box of the immersive media to the encapsulation file of the immersive media.
[0193] In one implementation, the scaling strategy includes a scaling flag field; when configuring the media file format data box of the immersive media according to the scaling information of the immersive media, the processing unit 602 is specifically configured to:
[0194] When the scaling information indicates that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode, the scaling flag field is configured to a valid value.
[0195] In one implementation, the scaling strategy includes a scaling step field; when configuring the media file format data box of the immersive media according to the scaling information of the immersive media, the processing unit 602 is specifically configured to:
[0196] When the scaling information indicates that m scaling steps need to be performed when the i-th scaling region of the immersive media is scaled in the target scaling mode, the value of the scaling step field is configured as m, where m is a positive integer.
[0197] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m; when configuring the media file format data box of the immersive media according to the scaling information of the immersive media, the processing unit 602 is specifically configured to:
[0198] If the scaling information indicates that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process is the same as the size before the scaling process, then the j-th scaling ratio field is configured as an invalid value;
[0199] If the scaling information indicates that the size of the i-th scaling region after the j-th scaling step is different from the size before the scaling process, then the j-th scaling ratio field is configured as a valid value, wherein the valid value is the ratio between the size of the i-th scaling region after the j-th scaling step and the size before the scaling process, as indicated by the scaling information.
[0200] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; when configuring the media file format data box of the immersive media according to the scaling information of the immersive media, the processing unit 602 is specifically configured to:
[0201] Configure the value of the duration of the j-th scaling step performed on the i-th scaling region, as indicated by the scaling information, as the value of the j-th scaling duration field; and configure the unit of measurement of the duration of the j-th scaling step performed on the i-th scaling region, as indicated by the scaling information, as the value of the j-th scaling duration unit field.
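The configuration steps on the content production side can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (auto_zoom_flag, zoom_steps, zoom_ratio, zoom_duration, zoom_duration_unit), the layout of the scaling_info dictionary, the use of 1 as the valid flag value, and the use of 0 as the invalid ratio value are all hypothetical and are not confirmed by this passage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZoomStep:
    zoom_ratio: int          # units of 2^-3; 0 assumed to be the invalid value
    zoom_duration: int       # non-zero duration value
    zoom_duration_unit: int  # non-zero unit, measured in seconds


@dataclass
class ZoomRegionStrategy:
    auto_zoom_flag: int = 0  # assumption: 1 is the valid value
    zoom_steps: int = 0      # m, the number of scaling steps
    steps: List[ZoomStep] = field(default_factory=list)


def configure_strategy(scaling_info: dict) -> ZoomRegionStrategy:
    """Build the strategy for one scaling region from hypothetical scaling info.

    scaling_info example:
      {"needs_zoom": True,
       "steps": [{"ratio": 1.5, "duration": 3, "unit": 1}, ...]}
    """
    if not scaling_info.get("needs_zoom"):
        return ZoomRegionStrategy()
    steps = []
    for s in scaling_info["steps"]:
        # Same size as before scaling -> invalid value; otherwise store
        # the ratio as a multiple of 2^-3.
        ratio_field = 0 if s["ratio"] == 1 else round(s["ratio"] * 8)
        steps.append(ZoomStep(ratio_field, s["duration"], s["unit"]))
    return ZoomRegionStrategy(auto_zoom_flag=1, zoom_steps=len(steps), steps=steps)
```

For the example input shown in the docstring, a step with ratio 1.5 is stored as the field value 12, and a step that returns the region to its original size is stored as 0.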
[0202] In one embodiment, the processing unit 602 is further configured to:
[0203] Configure the scaling description signaling file of the immersive media according to the scaling information, wherein the scaling description signaling file includes description information of the scaling strategy;
[0204] The scaling description signaling file is encapsulated into the representation level of the media presentation description file of the immersive media.
[0205] According to one embodiment of the present application, the units of the data processing apparatus for immersive media shown in Figure 6 can be separately or entirely combined into one or several other units, or one or more of the units can be further divided into multiple functionally smaller units; this achieves the same operation without affecting the technical effect of the embodiments of the present application. The above units are divided based on logical functions. In practical applications, the function of one unit can be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the data processing apparatus for immersive media may also include other units; in practical applications, these functions can also be implemented with the assistance of other units, and by multiple units working together. According to another embodiment of this application, the data processing apparatus for immersive media described above can be constructed, and the data processing method for immersive media of the embodiments of this application can be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in Figure 3 or Figure 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via the computer-readable recording medium, and run therein.
[0206] Based on the same inventive concept, the principle and beneficial effects of the data processing device for immersive media provided in the embodiments of this application are similar to the principle and beneficial effects of the data processing method for immersive media in the embodiments of this application. For details, please refer to the principle and beneficial effects of the method implementation. For the sake of brevity, these will not be repeated here.
[0207] Referring to Figure 7, Figure 7 shows a schematic structural diagram of another data processing apparatus for immersive media provided in an exemplary embodiment of this application. The data processing apparatus for immersive media can be a computer program (including program code) running on a content playback device; for example, it can be application software in the content playback device. As shown in Figure 7, the data processing apparatus for immersive media includes an acquisition unit 701 and a processing unit 702.
[0208] In one exemplary embodiment, the data processing apparatus for immersive media can be used to perform the corresponding steps in the method shown in Figure 3; then:
[0209] The acquisition unit 701 is used to acquire the media file format data box of the immersive media, wherein the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0210] The processing unit 702 is configured to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0211] In one implementation, the media file format data box includes the International Organization for Standardization (ISO) Basic Media File Format Data Box; the target scaling mode includes the director scaling mode.
[0212] In one implementation, the scaling strategy includes a scaling flag field;
[0213] When the scaling flag field is a valid value, the scaling flag field is used to indicate that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode.
[0214] In one implementation, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; the scaling step field is used to indicate that the number of scaling steps included when the i-th scaling region of the immersive media is scaled in the target scaling mode is m.
[0215] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m;
[0216] The j-th scaling ratio field is used to indicate the scaling ratio used when the i-th scaling area of the immersive media is subjected to the j-th scaling step of the scaling process; the scaling ratio is expressed in units of 2⁻³;
[0217] When the j-th scaling ratio field is invalid, the j-th scaling ratio field is used to indicate that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode is the same as the size before the scaling process is performed;
[0218] When the j-th scaling ratio field is a valid value, the j-th scaling ratio field is used to indicate the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode and the size before the scaling process is performed.
[0219] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
[0220] The j-th scaling duration field is used to indicate the value of the duration of the j-th scaling step of the scaling process performed on the i-th scaling area of the immersive media; the scaling duration field is a non-zero value.
[0221] The j-th scaling duration unit field is used to indicate the unit of measurement for the duration of the j-th scaling step when the i-th scaling area of the immersive media is subjected to scaling processing. The unit of measurement is in seconds, and the scaling duration unit field is a non-zero value.
[0222] In one embodiment, the acquisition unit 701 is further configured to:
[0223] Obtain the scaling description signaling file of the immersive media, the scaling description signaling file including description information of the scaling strategy.
[0224] In one embodiment, the scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor;
[0225] The spherical region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the spherical region scaling descriptors in the representation level is less than or equal to 1.
[0226] The planar region scaling descriptor is encapsulated in a representation level within the media presentation description file of the immersive media, and the number of the planar region scaling descriptors in the representation level is less than or equal to 1.
[0227] In another exemplary embodiment, the data processing apparatus for immersive media can be used to perform the corresponding steps in the method shown in Figure 5; then:
[0228] The acquisition unit 701 is used to acquire the encapsulation file of the immersive media, wherein the encapsulation file includes the media file format data box of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0229] The processing unit 702 is used to parse the encapsulation file and display the parsed immersive media; when the i-th scaling region of the immersive media is displayed, scaling processing is performed on the i-th scaling region of the immersive media according to the media file format data box.
[0230] In one implementation, the scaling strategy includes a scaling flag field; when performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box, the processing unit 702 is specifically configured to:
[0231] When the value of the scaling flag field is valid, scaling processing is performed on the i-th scaling region of the immersive media in the target scaling mode.
[0232] In one implementation, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; when performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box, the processing unit 702 is specifically configured to:
[0233] In the target scaling mode, the scaling processing is performed on the i-th scaling region of the immersive media in m scaling steps.
[0234] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m; when performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box, the processing unit 702 is specifically configured to:
[0235] When the j-th scaling ratio field is invalid, the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media in the target scaling mode, so that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process is performed is the same as the size of the i-th scaling area of the immersive media before the scaling process is performed.
[0236] When the j-th scaling ratio field is a valid value, in the target scaling mode, the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media according to the valid value, so that the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step is performed and the size of the i-th scaling area of the immersive media before the scaling process is performed reaches the valid value.
[0237] In one implementation, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; when performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box, the processing unit 702 is specifically configured to:
[0238] According to the duration jointly indicated by the j-th scaling duration field and the j-th scaling duration unit field, the j-th scaling step of the scaling processing is performed on the i-th scaling area of the immersive media in the target scaling mode.
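On the content playback side, the per-step ratio and duration fields can be turned into a simple playback schedule. The Python sketch below is a hedged illustration: it assumes that the m steps run back to back starting at time 0, that the step duration in seconds is the product of the duration field and the duration unit field, and that a ratio field of 0 is the invalid value. The function name build_zoom_timeline and the tuple layout are hypothetical.

```python
def build_zoom_timeline(steps):
    """Turn per-step fields into (start_s, end_s, ratio) entries.

    steps: list of (ratio_field, duration, duration_unit) tuples, one per
    scaling step, in step order. Each ratio is relative to the size of the
    region before scaling.
    """
    timeline = []
    t = 0
    for ratio_field, duration, unit in steps:
        length = duration * unit  # assumption: seconds = value * unit
        # Invalid value keeps the original size; otherwise units of 2^-3.
        ratio = 1.0 if ratio_field == 0 else ratio_field / 8
        timeline.append((t, t + length, ratio))
        t += length
    return timeline
```

For example, two steps (12, 3, 1) and (0, 2, 2) would yield a first interval from 0 to 3 seconds at 1.5 times the original size, followed by an interval from 3 to 7 seconds at the original size.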
[0239] In one embodiment, the processing unit 702 is further configured to:
[0240] Obtain the scaling description signaling file of the immersive media, wherein the scaling description signaling file includes description information of the scaling strategy;
[0241] When acquiring the encapsulation file of the immersive media, the acquisition unit 701 is specifically configured to:
[0242] Obtain the encapsulation file of the immersive media based on the scaling description signaling file.
[0243] According to one embodiment of the present application, the units of the data processing apparatus for immersive media shown in Figure 7 can be separately or entirely combined into one or several other units, or one or more of the units can be further divided into multiple functionally smaller units; this achieves the same operation without affecting the technical effect of the embodiments of the present application. The above units are divided based on logical functions. In practical applications, the function of one unit can be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the data processing apparatus for immersive media may also include other units; in practical applications, these functions can also be implemented with the assistance of other units, and by multiple units working together. According to another embodiment of this application, the data processing apparatus for immersive media described above can be constructed, and the data processing method for immersive media of the embodiments of this application can be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in Figure 3 or Figure 5 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via the computer-readable recording medium, and run therein.
[0244] Based on the same inventive concept, the principle and beneficial effects of the data processing device for immersive media provided in the embodiments of this application are similar to the principle and beneficial effects of the data processing method for immersive media in the embodiments of this application. For details, please refer to the principle and beneficial effects of the method implementation. For the sake of brevity, these will not be repeated here.
[0245] Figure 8 shows a schematic structural diagram of a content creation device provided in an exemplary embodiment of this application. The content creation device may refer to a computer device used by an immersive media provider, which may be a terminal (such as a PC or a smart mobile device, for example a smartphone) or a server. As shown in Figure 8, the content creation device includes a capture device 801, a processor 802, a memory 803, and a transmitter 804. Wherein:
[0246] The capture device 801 is used to capture real-world audio-visual scenes to obtain the raw data of the immersive media (including audio content and video content synchronized in time and space). The capture device 801 may include, but is not limited to, audio devices, camera devices, and sensing devices. Audio devices may include audio sensors, microphones, etc. Camera devices may include ordinary cameras, stereo cameras, light field cameras, etc. Sensing devices may include laser devices, radar devices, etc.
[0247] The processor 802 (or Central Processing Unit, CPU) is the processing core of the content creation device. The processor 802 is adapted to implement one or more program instructions, specifically to load and execute one or more program instructions to implement the flow of the data processing method for immersive media shown in Figure 3 or Figure 4.
[0248] Memory 803 is a memory device in the content creation apparatus used to store programs and media resources. It is understood that memory 803 here can include the built-in storage medium of the content creation apparatus, or it can include extended storage media supported by the content creation apparatus. It should be noted that the memory can be high-speed RAM, or non-volatile memory, such as at least one disk storage device; optionally, it can also be at least one memory located remotely from the aforementioned processor. The memory provides storage space for storing the operating system of the content creation apparatus. Furthermore, this storage space is also used to store computer programs, which include program instructions adapted to be called and executed by the processor to perform the various steps of the immersive media data processing method. In addition, memory 803 can also be used to store immersive media files formed after processing by the processor, which include media file resources and media presentation description information.
[0249] The transmitter 804 is used to enable transmission and interaction between the content creation device and other devices, specifically to facilitate the transmission of immersive media between the content creation device and the content playback device. That is, the content creation device uses the transmitter 804 to transmit relevant media resources for immersive media to the content playback device.
[0250] Referring again to Figure 8, the processor 802 may include a converter 821, an encoder 822, and an encapsulator 823; wherein:
[0251] Converter 821 performs a series of conversion processes on captured video content to make it suitable for immersive media video encoding. The conversion processes may include stitching and projection; optionally, they may also include region encapsulation. Converter 821 can convert captured 3D video content into 2D images and provide them to the encoder for video encoding.
[0252] Encoder 822 is used to encode the captured audio content to form an audio stream for immersive media. It is also used to encode the 2D image obtained by converter 821 to obtain a video stream.
[0253] The encapsulator 823 encapsulates audio and video streams into a file container according to the immersive media file format (such as ISOBMFF) to form an immersive media file resource. This media file resource can be a media file or a media segment forming an immersive media file. It also records the metadata of the immersive media file resource using media presentation description information according to the immersive media file format requirements. The encapsulated immersive media file obtained by the encapsulator is stored in memory and provided to the content playback device as needed for immersive media presentation.
[0254] In one exemplary embodiment, the processor 802 (i.e., the devices included in the processor) performs the steps of the data processing method for immersive media shown in Figure 3 by calling one or more instructions stored in the memory 803. Specifically, the memory 803 stores one or more first instructions, which are adapted to be loaded by the processor 802 to perform the following steps:
[0255] Obtain the media file format data box of the immersive media, wherein the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0256] Scaling is performed on the i-th scaling region of the immersive media according to the media file format data box.
[0257] In one embodiment, the media file format data box includes the International Organization for Standardization (ISO) Basic Media File Format Data Box; the target scaling mode includes the director scaling mode.
[0258] In one embodiment, the scaling strategy includes a scaling flag field. When the scaling flag field is a valid value, the scaling flag field is used to indicate that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode.
[0259] In one embodiment, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; the scaling step field is used to indicate that the number of scaling steps included when the i-th scaling region of the immersive media is scaled in the target scaling mode is m.
[0260] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m;
[0261] The j-th scaling ratio field is used to indicate the scaling ratio used when the i-th scaling area of the immersive media is subjected to the j-th scaling step of the scaling process; the scaling ratio is expressed in units of 2⁻³;
[0262] When the j-th scaling ratio field is invalid, the j-th scaling ratio field is used to indicate that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode is the same as the size before the scaling process is performed;
[0263] When the j-th scaling ratio field is a valid value, the j-th scaling ratio field is used to indicate the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode and the size before the scaling process is performed.
[0264] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
[0265] The j-th scaling duration field is used to indicate the value of the duration of the j-th scaling step of the scaling process performed on the i-th scaling area of the immersive media; the scaling duration field is a non-zero value.
[0266] The j-th scaling duration unit field is used to indicate the unit of measurement for the duration of the j-th scaling step when the i-th scaling area of the immersive media is subjected to scaling processing. The unit of measurement is in seconds, and the scaling duration unit field is a non-zero value.
[0267] In one embodiment, the computer program in memory 803 is loaded by processor 802 and further performs the following steps:
[0268] Obtain the scaling description signaling file of the immersive media, the scaling description signaling file including description information of the scaling strategy.
[0269] In one embodiment, the scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor;
[0270] The spherical region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the spherical region scaling descriptors in the representation level is less than or equal to 1.
[0271] The planar region scaling descriptor is encapsulated in a representation level within the media presentation description file of the immersive media, and the number of the planar region scaling descriptors in the representation level is less than or equal to 1.
[0272] In another exemplary embodiment, the processor 802 (specifically, the devices included in the processor) performs the steps of the data processing method for immersive media shown in Figure 4 by calling one or more instructions stored in the memory 803. Specifically, the memory 803 stores one or more second instructions, which are adapted to be loaded by the processor 802 to perform the following steps:
[0273] Obtain scaling information for immersive media;
[0274] Configure the media file format data box of the immersive media according to the scaling information of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0275] The media file format data box of the immersive media is added to the encapsulation file of the immersive media. In one embodiment, the scaling strategy includes a scaling flag field; when the one or more second instructions are loaded and executed by the processor 802 to configure the media file format data box of the immersive media according to the scaling information of the immersive media, the following steps are specifically performed:
[0276] When the scaling information indicates that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode, the scaling flag field is configured to a valid value.
[0277] In one embodiment, the scaling strategy includes a scaling step field; the one or more second instructions are adapted to be loaded and executed by the processor 802 when configuring the media file format data box of the immersive media according to the scaling information of the immersive media, specifically performing the following steps:
[0278] When the scaling information indicates that the i-th scaling region of the immersive media needs to undergo m scaling steps when scaled in the target scaling mode, the value of the scaling step field is configured as m, where m is a positive integer.
[0279] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m; the one or more second instructions are adapted to be loaded and executed by the processor 802 to configure the media file format data box of the immersive media according to the scaling information of the immersive media, specifically executing the following steps:
[0280] If the scaling information indicates that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process is the same as the size before the scaling process, then the j-th scaling ratio field is configured as an invalid value;
[0281] If the scaling information indicates that the size of the i-th scaling region after the j-th scaling step is different from the size before the scaling process, then the j-th scaling ratio field is configured as a valid value, wherein the valid value is the ratio, as indicated by the scaling information, between the size of the i-th scaling region after the j-th scaling step and the size before the scaling process.
[0282] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; when the one or more second instructions are loaded and executed by the processor 802 to configure the media file format data box of the immersive media according to the scaling information of the immersive media, the following steps are specifically executed:
[0283] Configure the value of the duration for which the j-th scaling step is performed on the i-th scaling region, as indicated by the scaling information, as the value of the j-th scaling duration field; and configure the unit of measurement of that duration, as indicated by the scaling information, as the value of the j-th scaling duration unit field.
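The production-side configuration described in the embodiments above can be sketched as follows. This is only an illustrative sketch: the class and field names, the use of 1 as the valid flag value, the use of 0 as the invalid scaling ratio value, and the encoding of a ratio as an integer count of 2^-3 units are all assumptions made for the example and are not taken from this section.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

INVALID_RATIO = 0   # assumption: 0 stands for "same size as before scaling"
UNITS_PER_ONE = 8   # assumption: ratio stored as a count of 2^-3 units


@dataclass
class ScalingStep:
    ratio_field: int          # j-th scaling ratio field
    duration_field: int       # j-th scaling duration field (non-zero)
    duration_unit_field: int  # j-th scaling duration unit field (non-zero, seconds)


@dataclass
class ScalingStrategy:
    flag: int = 0             # scaling flag field (assumption: 1 = valid)
    step_count: int = 0       # scaling step field (m)
    steps: List[ScalingStep] = field(default_factory=list)


def configure_strategy(needs_scaling: bool,
                       steps: List[Tuple[float, int, int]]) -> ScalingStrategy:
    """Build the strategy for one scaling region from hypothetical scaling info.

    Each entry of `steps` is (size ratio after the step relative to the size
    before scaling, duration value, duration unit in seconds).
    """
    if not needs_scaling:
        return ScalingStrategy()
    if not steps:
        raise ValueError("m must be a positive integer")
    strategy = ScalingStrategy(flag=1, step_count=len(steps))
    for size_ratio, duration, unit_seconds in steps:
        if duration <= 0 or unit_seconds <= 0:
            raise ValueError("duration and duration unit must be non-zero")
        if size_ratio == 1.0:
            ratio_field = INVALID_RATIO  # size unchanged: invalid value
        else:
            ratio_field = round(size_ratio * UNITS_PER_ONE)
        strategy.steps.append(ScalingStep(ratio_field, duration, unit_seconds))
    return strategy
```

For example, `configure_strategy(True, [(1.5, 3, 1), (1.0, 2, 1)])` describes a region that is enlarged to 1.5 times its size over 3 seconds and then restored to its original size over 2 seconds.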
[0284] In one embodiment, the computer program in memory 803 is loaded by processor 802 and further performs the following steps:
[0285] Configure the scaling description signaling file of the immersive media according to the scaling information, wherein the scaling description signaling file includes description information of the scaling strategy;
[0286] The scaling description signaling file is encapsulated into the representation level of the media presentation description file of the immersive media.
[0287] Based on the same inventive concept, the principle and beneficial effects of the immersive media processing device provided in the embodiments of this application are similar to the principle and beneficial effects of the immersive media processing method in the embodiments of this application. For details, please refer to the principle and beneficial effects of the method implementation. For the sake of brevity, these will not be repeated here.
[0288] Figure 9 shows a schematic diagram of a content playback device provided in an exemplary embodiment of this application. The content playback device may be a computer device used by a user of immersive media, and the computer device may be a terminal, such as a PC, a smart mobile device (such as a smartphone), or a VR device (such as a VR headset or VR glasses). As shown in Figure 9, the content playback device includes a receiver 901, a processor 902, a memory 903, and a display / playback device 904, wherein:
[0289] Receiver 901 is used to enable decoding and transmission interaction with other devices, specifically for the transmission of immersive media between the content production device and the content playback device. That is, the content playback device receives the relevant media resources for immersive media transmitted by the content production device through receiver 901.
[0290] Processor 902 (or Central Processing Unit, CPU) is the processing core of the content playback device. Processor 902 is adapted to implement one or more program instructions, specifically to load and execute one or more program instructions to implement the data processing method for immersive media shown in the flowchart of Figure 3 or Figure 5.
[0291] Memory 903 is a memory device in the content playback device used to store programs and media resources. It is understood that memory 903 here can include the built-in storage medium of the content playback device, or it can include extended storage media supported by the content playback device. It should be noted that memory 903 can be high-speed RAM, or non-volatile memory, such as at least one disk storage device; optionally, it can also be at least one memory located remotely from the aforementioned processor. Memory 903 provides storage space for storing the operating system of the content playback device. Furthermore, this storage space is also used to store computer programs, which include program instructions adapted to be called and executed by the processor to perform the various steps of the immersive media data processing method. In addition, memory 903 can also be used to store the three-dimensional image of the immersive media formed after processor processing, the audio content corresponding to the three-dimensional image, and information required for rendering the three-dimensional image and audio content.
[0292] Display / playback device 904 is used to output rendered sound and 3D images.
[0293] Referring again to Figure 9, the processor 902 may include a parser 921, a decoder 922, a converter 923, and a renderer 924, wherein:
[0294] The parser 921 is used to decapsulate the encapsulation file of the immersive media received from the content production device. Specifically, it decapsulates the media file resources according to the file format requirements of immersive media to obtain an audio stream and a video stream, and provides them to the decoder 922.
[0295] Decoder 922 decodes the audio stream to obtain the audio content and provides it to the renderer 924 for audio rendering. Additionally, decoder 922 decodes the video stream to obtain a 2D image. Based on the metadata provided by the media presentation description information, if the metadata indicates that the immersive media has undergone a region encapsulation process, the 2D image refers to an encapsulated image; if the metadata indicates that the immersive media has not undergone a region encapsulation process, the 2D image refers to a projected image.
[0296] Converter 923 is used to convert 2D images into 3D images. If the immersive media has undergone a region encapsulation process, converter 923 first decapsulates the encapsulated image to obtain a projected image, and then reconstructs the projected image to obtain a 3D image. If the immersive media has not undergone a region encapsulation process, converter 923 directly reconstructs the projected image to obtain a 3D image.
[0297] Renderer 924 is used to render the audio content and 3D images of immersive media. Specifically, it renders the audio content and 3D images based on the metadata related to rendering and viewing in the media presentation description information, and then outputs the rendered content to the display / playback device.
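The branch taken by converter 923 can be illustrated with a short sketch. The function name and the `unpack` and `reconstruct` callables are hypothetical stand-ins for the region decapsulation and 3D reconstruction operations, which this section does not specify in detail.

```python
from typing import Any, Callable


def convert_to_3d(image_2d: Any,
                  region_packed: bool,
                  unpack: Callable[[Any], Any],
                  reconstruct: Callable[[Any], Any]) -> Any:
    """Mirror the converter logic: decapsulate only if region encapsulation was used."""
    # With region encapsulation, the decoded 2D image is an encapsulated image
    # and must first be turned back into a projected image.
    projected = unpack(image_2d) if region_packed else image_2d
    # In both cases the projected image is then reconstructed into a 3D image.
    return reconstruct(projected)
```

Passing the two operations in as parameters keeps the sketch independent of any particular projection or packing format.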
[0298] In one exemplary embodiment, the processor 902 (specifically, the devices included in the processor) executes the steps of the immersive media data processing method shown in Figure 3 by calling one or more instructions stored in memory 903. Specifically, the memory 903 stores one or more first instructions, which are adapted to be loaded by the processor 902 to perform the following steps:
[0299] Obtain the media file format data box of the immersive media, wherein the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0300] Scaling is performed on the i-th scaling region of the immersive media according to the media file format data box.
[0301] In one embodiment, the media file format data box includes the International Organization for Standardization (ISO) Basic Media File Format Data Box; the target scaling mode includes the director scaling mode.
[0302] In one embodiment, the scaling strategy includes a scaling flag field. When the scaling flag field is a valid value, the scaling flag field is used to indicate that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode.
[0303] In one embodiment, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; the scaling step field is used to indicate that the number of scaling steps included when the i-th scaling region of the immersive media is scaled in the target scaling mode is m.
[0304] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m;
[0305] The j-th scaling ratio field is used to indicate the scaling ratio used when the i-th scaling area of the immersive media is subjected to the j-th scaling step of the scaling process; the scaling ratio is expressed in units of 2^-3;
[0306] When the j-th scaling ratio field is invalid, the j-th scaling ratio field is used to indicate that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode is the same as the size before the scaling process is performed;
[0307] When the j-th scaling ratio field is a valid value, the j-th scaling ratio field is used to indicate the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process performed in the target scaling mode and the size before the scaling process is performed.
[0308] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
[0309] The j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media; the scaling duration field is a non-zero value.
[0310] The j-th scaling duration unit field is used to indicate the unit of measurement of the duration of the j-th scaling step when the i-th scaling area of the immersive media is subjected to the scaling process. The unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
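The field semantics above can be made concrete with a small decoding sketch. It assumes that the invalid scaling ratio value is 0, that a valid ratio field is an integer count of 2^-3 units, and that the duration of a step in seconds is the duration value multiplied by the duration unit; none of these encodings is stated explicitly in this section, and the function names are hypothetical.

```python
from fractions import Fraction

INVALID_RATIO = 0  # assumption: 0 is the invalid value


def decode_ratio(ratio_field: int) -> Fraction:
    """Return the size ratio (after step / before scaling) for one ratio field."""
    if ratio_field == INVALID_RATIO:
        # Invalid value: the region has the same size as before scaling.
        return Fraction(1)
    # Valid value: the ratio is expressed in units of 2^-3, i.e. eighths.
    return Fraction(ratio_field, 8)


def decode_duration_seconds(duration_field: int, unit_field: int) -> int:
    """Combine the duration value and its unit (in seconds) into seconds."""
    if duration_field == 0 or unit_field == 0:
        raise ValueError("duration and duration unit fields must be non-zero")
    return duration_field * unit_field
```

Under these assumptions, a ratio field of 12 decodes to 3/2, and a duration of 3 with a unit of 2 decodes to 6 seconds.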
[0311] In one embodiment, the computer program in memory 903 is loaded by processor 902 and further performs the following steps:
[0312] Obtain the scaling description signaling file of the immersive media, the scaling description signaling file including description information of the scaling strategy.
[0313] In one embodiment, the scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor;
[0314] The spherical region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the spherical region scaling descriptors in the representation level is less than or equal to 1.
[0315] The planar region scaling descriptor is encapsulated in a representation level within the media presentation description file of the immersive media, and the number of the planar region scaling descriptors in the representation level is less than or equal to 1.
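The cardinality constraint on the two descriptors can be checked mechanically. The sketch below assumes that the descriptors appear as `SupplementalProperty` elements inside `Representation` elements of a DASH media presentation description, and it uses placeholder `schemeIdUri` values; the real element type and scheme identifiers are not given in this section, so both are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
# Hypothetical scheme identifiers, used only for illustration.
SPHERICAL_SCHEME = "urn:example:zoom:spherical"
PLANAR_SCHEME = "urn:example:zoom:planar"


def find_descriptor_violations(mpd_xml: str):
    """Return (representation id, scheme, count) for each descriptor seen more than once."""
    root = ET.fromstring(mpd_xml)
    violations = []
    for rep in root.iterfind(".//mpd:Representation", NS):
        for scheme in (SPHERICAL_SCHEME, PLANAR_SCHEME):
            count = sum(
                1
                for prop in rep.findall("mpd:SupplementalProperty", NS)
                if prop.get("schemeIdUri") == scheme
            )
            # At most one descriptor of each kind is allowed per representation.
            if count > 1:
                violations.append((rep.get("id"), scheme, count))
    return violations
```

An empty result means every representation carries at most one spherical region scaling descriptor and at most one planar region scaling descriptor.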
[0316] In another exemplary embodiment, the processor 902 (specifically, the devices included in the processor) executes the steps of the immersive media data processing method shown in Figure 5 by calling one or more instructions stored in memory 903. Specifically, memory 903 stores one or more second instructions, which are adapted to be loaded by processor 902 to perform the following steps:
[0317] Obtain the encapsulation file of the immersive media, the encapsulation file including the media file format data box of the immersive media; the media file format data box includes the scaling strategy of the i-th scaling region of the immersive media in the target scaling mode, where i is a positive integer;
[0318] Parse the encapsulation file and display the parsed immersive media.
[0319] When the i-th scaling region of the immersive media is displayed, perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box.
[0320] In one embodiment, the scaling strategy includes a scaling flag field; when the value of the scaling flag field is a valid value, the one or more second instructions are adapted to be loaded and executed by the processor 902 to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box, specifically executing the following steps:
[0321] Scaling is performed on the i-th scaling region of the immersive media in the target scaling mode.
[0322] In one embodiment, the scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; the one or more second instructions are adapted to be loaded and executed by the processor 902 to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box, specifically executing the following steps:
[0323] In the target scaling mode, m scaling operations are performed on the i-th scaling region of the immersive media.
[0324] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling ratio fields; the j-th scaling step among the m scaling steps corresponds to the j-th scaling ratio field among the m scaling ratio fields, where j is a positive integer and j≤m; the one or more second instructions are adapted to be loaded and executed by the processor 902 to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box, specifically executing the following steps:
[0325] When the j-th scaling ratio field is invalid, the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media in the target scaling mode, so that the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process is performed is the same as the size of the i-th scaling area of the immersive media before the scaling process is performed.
[0326] When the j-th scaling ratio field is a valid value, in the target scaling mode, the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media according to the valid value, so that the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step is performed and the size of the i-th scaling area of the immersive media before the scaling process is performed reaches the valid value.
[0327] In one embodiment, the scaling process includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; the one or more second instructions are adapted to be loaded and executed by the processor 902 to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box, specifically executing the following steps:
[0328] According to the joint indication of the j-th scaling duration field and the j-th scaling duration unit field, the j-th scaling step of the scaling process is performed on the i-th scaling area of the immersive media in the target scaling mode.
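On the playback side, the m steps can be turned into a simple schedule before rendering. The sketch below assumes that the steps run back to back starting at time 0, that a ratio field of 0 is the invalid value, that valid ratios are counts of 2^-3 units, and that a step lasts duration value times duration unit seconds; the function name and tuple layout are hypothetical.

```python
from typing import List, Tuple


def build_schedule(steps: List[Tuple[int, int, int]]) -> List[Tuple[int, int, float]]:
    """Map (ratio field, duration field, duration unit field) triples to a timeline.

    Returns one (start second, end second, target size ratio) entry per step,
    where the target ratio is relative to the size before scaling.
    """
    schedule = []
    start = 0
    for ratio_field, duration, unit_seconds in steps:
        if duration == 0 or unit_seconds == 0:
            raise ValueError("duration and duration unit fields must be non-zero")
        length = duration * unit_seconds
        # Invalid value (assumed 0): return to the size before scaling.
        target = 1.0 if ratio_field == 0 else ratio_field / 8
        schedule.append((start, start + length, target))
        start += length
    return schedule
```

For instance, two steps `(16, 3, 1)` and `(0, 2, 2)` describe enlarging the region to twice its size during the first 3 seconds and restoring it to its original size during the following 4 seconds.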
[0329] In one embodiment, the computer program in memory 903 is loaded by processor 902 and further performs the following steps:
[0330] Obtain the scaling description signaling file of the immersive media, wherein the scaling description signaling file includes description information of the scaling strategy;
[0331] When processor 902 obtains the encapsulation file of immersive media through receiver 901, it performs the following steps:
[0332] The encapsulation file of the immersive media is obtained based on the scaling description signaling file.
[0333] Based on the same inventive concept, the principle and beneficial effects of the immersive media processing device provided in the embodiments of this application are similar to the principle and beneficial effects of the immersive media processing method in the embodiments of this application. For details, please refer to the principle and beneficial effects of the method implementation. For the sake of brevity, these will not be repeated here.
[0334] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall still fall within the scope of this application.
Claims
1. A data processing method for immersive media, characterized in that the method comprises: obtaining a media file format data box of immersive media, wherein the media file format data box includes a scaling strategy for the i-th scaling region of the immersive media in a target scaling mode, where i is a positive integer; the scaling strategy includes a scaling ratio field; when the scaling ratio field is an invalid value, it indicates that the size of the i-th scaling region of the immersive media after scaling in the target scaling mode is the same as its size before scaling; when the scaling ratio field is a valid value, it indicates the ratio between the size of the i-th scaling region of the immersive media after scaling in the target scaling mode and its size before scaling; and performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box; wherein the scaling processing includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; the j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media; the scaling duration field is a non-zero value; the j-th scaling duration unit field is used to indicate the unit of measurement of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media, the unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
2. The method as described in claim 1, characterized in that, The media file format data box includes the International Organization for Standardization (ISO) Basic Media File Format Data Box; the target scaling mode includes the director scaling mode.
3. The method as described in claim 1 or 2, characterized in that, The scaling strategy includes a scaling flag field. When the scaling flag field is a valid value, it indicates that the i-th scaling region of the immersive media needs to be scaled in the target scaling mode.
4. The method as described in claim 1 or 2, characterized in that, The scaling strategy includes a scaling step field, the value of which is m, where m is a positive integer; The scaling step field is used to indicate that the number of scaling steps included when the i-th scaling region of the immersive media is scaled in the target scaling mode is m.
5. The method as described in claim 1 or 2, characterized in that the scaling strategy includes m scaling ratio fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling ratio field in the m scaling ratio fields, where j is a positive integer and j≤m; the j-th scaling ratio field is used to indicate the scaling ratio used when the i-th scaling area of the immersive media is subjected to the j-th scaling step of the scaling process; the scaling ratio is expressed in units of 2^-3; when the j-th scaling ratio field is an invalid value, the j-th scaling ratio field is used to indicate that, in the target scaling mode, the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process is the same as the size before the scaling process; when the j-th scaling ratio field is a valid value, the j-th scaling ratio field is used to indicate that, in the target scaling mode, the ratio between the size of the i-th scaling area of the immersive media after the j-th scaling step of the scaling process and the size before the scaling process is the value of the j-th scaling ratio field.
6. The method as described in claim 1, characterized in that, The method further includes: Obtain the scaling description signaling file of the immersive media, the scaling description signaling file including description information of the scaling strategy.
7. The method as described in claim 6, characterized in that the scaling description signaling file includes at least one of the following: a spherical region scaling descriptor and a planar region scaling descriptor; the spherical region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the spherical region scaling descriptors in the representation level is less than or equal to 1; the planar region scaling descriptor is encapsulated in the representation level of the media presentation description file of the immersive media, and the number of the planar region scaling descriptors in the representation level is less than or equal to 1.
8. A data processing method for immersive media, characterized in that the method comprises: obtaining scaling information of immersive media; configuring a media file format data box of the immersive media according to the scaling information of the immersive media, wherein the media file format data box includes a scaling strategy for the i-th scaling region of the immersive media in a target scaling mode, where i is a positive integer; the scaling strategy includes a scaling ratio field; when the scaling ratio field is an invalid value, the scaling ratio field is used to indicate that the size of the i-th scaling region of the immersive media after scaling is performed in the target scaling mode is the same as the size before scaling is performed; when the scaling ratio field is a valid value, the scaling ratio field is used to indicate the ratio between the size of the i-th scaling region of the immersive media after scaling is performed in the target scaling mode and the size before scaling is performed; and adding the media file format data box of the immersive media to the encapsulation file of the immersive media; wherein the scaling processing includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
the j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media; the scaling duration field is a non-zero value; the j-th scaling duration unit field is used to indicate the unit of measurement of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media, the unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
9. A data processing method for immersive media, characterized in that the method comprises: obtaining an encapsulation file of immersive media, wherein the encapsulation file includes a media file format data box of the immersive media; the media file format data box includes a scaling strategy for the i-th scaling region of the immersive media in a target scaling mode, where i is a positive integer; the scaling strategy includes a scaling ratio field; when the scaling ratio field is an invalid value, the scaling ratio field is used to indicate that the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode is the same as the size before scaling processing; when the scaling ratio field is a valid value, the scaling ratio field is used to indicate the ratio between the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode and the size before scaling processing; parsing the encapsulation file and displaying the parsed immersive media; and, when the i-th scaling region of the immersive media is displayed, performing scaling processing on the i-th scaling region of the immersive media according to the media file format data box; wherein the scaling processing includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
the j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media; the scaling duration field is a non-zero value; the j-th scaling duration unit field is used to indicate the unit of measurement of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media, the unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
10. A data processing device for immersive media, characterized in that the device comprises: an acquisition unit, configured to acquire a media file format data box of immersive media, the media file format data box including a scaling strategy for the i-th scaling region of the immersive media in a target scaling mode, where i is a positive integer; the scaling strategy includes a scaling ratio field; when the scaling ratio field is an invalid value, the scaling ratio field is used to indicate that the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode is the same as the size before scaling processing; when the scaling ratio field is a valid value, the scaling ratio field is used to indicate the ratio between the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode and the size before scaling processing; and a processing unit, configured to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box; wherein the scaling processing includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m; the j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media; the scaling duration field is a non-zero value; the j-th scaling duration unit field is used to indicate the unit of measurement of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media, the unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
11. A data processing device for immersive media, characterized in that the device comprises: an acquisition unit, configured to acquire an encapsulation file of immersive media, wherein the encapsulation file includes a media file format data box of the immersive media; the media file format data box includes a scaling strategy for the i-th scaling region of the immersive media in a target scaling mode, where i is a positive integer; the scaling strategy includes a scaling ratio field; when the scaling ratio field is an invalid value, the scaling ratio field is used to indicate that the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode is the same as the size before scaling processing; when the scaling ratio field is a valid value, the scaling ratio field is used to indicate the ratio between the size of the i-th scaling region of the immersive media after scaling processing in the target scaling mode and the size before scaling processing; and a processing unit, configured to parse the encapsulation file and display the parsed immersive media, and, when the i-th scaling region of the immersive media is displayed, to perform scaling processing on the i-th scaling region of the immersive media according to the media file format data box; wherein the scaling processing includes m scaling steps, where m is a positive integer; the scaling strategy includes m scaling duration fields and m scaling duration unit fields; the j-th scaling step in the m scaling steps corresponds to the j-th scaling duration field in the m scaling duration fields and the j-th scaling duration unit field in the m scaling duration unit fields, where j is a positive integer and j≤m;
the j-th scaling duration field is used to indicate the value of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media; the scaling duration field is a non-zero value; the j-th scaling duration unit field is used to indicate the unit of measurement of the duration for which the j-th scaling step of the scaling processing is performed on the i-th scaling region of the immersive media, the unit of measurement is expressed in seconds, and the scaling duration unit field is a non-zero value.
12. A data processing device for immersive media, characterized in that the device comprises: a processor, adapted to execute computer programs; and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the data processing method for immersive media as described in any one of claims 1-9.