Method, apparatus, and system for processing audio scene information

The method encodes and decodes acoustic path information in voxel-based audio scenes using a two-dimensional voxel grid, addressing computational and storage challenges, enhancing efficiency and user experience in immersive audio applications.

JP2026520977A · Pending · Publication Date: 2026-06-25 · DOLBY INTERNATIONAL AB

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
DOLBY INTERNATIONAL AB
Filing Date
2024-06-05
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing techniques for voxel-based audio rendering in VR, AR, and XR environments face challenges in computationally expensive diffraction path recalculation and high storage/bandwidth requirements for acoustic path information, which impact user experience and device performance.

Method used

A method for encoding and decoding acoustic path information in voxel-based audio scenes using a two-dimensional voxel grid, where path information items specify positions, path lengths, and corner voxels, reducing data requirements through sequential coding and reuse of previous information, allowing lossless recovery.

Benefits of technology

The method significantly reduces bitstream and storage needs while maintaining lossless recovery of acoustic path information, improving computational efficiency and user experience in immersive audio applications.



Abstract

This disclosure relates to a method for processing audio scene information. One such method includes the steps of: obtaining a voxel-based audio scene representation of an audio scene; sequentially encoding path information items for first and second positions in a two-dimensional voxel grid, where each path information item specifies the first position, the second position, the path length of the acoustic path, and a corner voxel on the acoustic path; and generating, for a current path information item, an encoded path information item based on that path information item. The encoded path information item includes indications of the respective first and second positions. If the corner voxel specified by the current path information item is different from the corner voxel specified by the preceding path information item, the encoded path information item includes an indication of the corner voxel. If the two corner voxels are the same, the encoded path information item includes, instead of an indication of the corner voxel, an indication that the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item. This disclosure further relates to the corresponding devices, computer programs, and computer-readable storage media.

Description

Technical Field

[0001] [Cross-Reference to Related Applications] This application claims the benefit of priority of U.S. Provisional Application No. 63/508,367, filed on June 15, 2023, the entire disclosure of which is incorporated herein by reference.

[0002] [Technical Field] This disclosure relates to techniques for processing audio scene information, for example, for storage, transmission, and/or audio rendering. In particular, this disclosure is directed to voxel-based scene representations and the coding of path information (e.g., acoustic path information such as diffraction path information) for voxel-based scene representations.

Background Art

[0003] MPEG (Moving Picture Experts Group) is an alliance of working groups jointly established by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and sets standards for media coding, including audio coding. MPEG is organized under ISO/IEC SC 29, and the audio group is currently identified as Working Group (WG) 6. WG6 is currently working on a new audio standard (MPEG-I immersive audio, also known as ISO/IEC 23090-4).

[0004] The new MPEG-I standard enables acoustic experiences from different viewpoints, perspectives, or listening positions by supporting movement within and around scenes with various degrees of freedom, such as three degrees of freedom (3DoF) or six degrees of freedom (6DoF), in virtual reality (VR), augmented reality (AR), mixed reality (MR), and/or extended reality (XR) applications. 6DoF interaction extends the 3DoF spherical video/audio experience, which is limited to head rotations (pitch, yaw, and roll), to include translational movements (forward/backward, up/down, and left/right), and so enables navigation within the virtual environment (e.g., physically walking through a room) in addition to head rotations.

[0005] In audio rendering for VR, AR, MR, and XR applications, an object-based approach is widely used, in which complex auditory scenes are represented as a plurality of individual audio objects, each associated with parameters or metadata that define the position and trajectory of that object within the scene. Alternatively, audio rendering in such environments also uses higher order ambisonics (HOA). However, the use of "voxels" for audio scene rendering is currently being considered for new immersive audio experiences. Voxels for audio rendering are related to media environments implemented in both hardware and software, such as video games and/or VR, AR, MR, and XR environments.

[0006] A voxel is a spatial volume to which acoustic properties or audio rendering commands are assigned. The voxel size can also be an encoder configuration parameter and can be selected (manually or automatically) according to the level of detail of the scene geometry (e.g., in the range of 10 cm to 1 m).

[0007] Voxels for audio rendering can be obtained:
• by voxelization (or conversion) of mesh-based scene representations;
• from the scene representation used for scene generation (or video rendering), for example, by downsampling smaller voxels.
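
As an illustration of the second option, the following minimal Python sketch downsamples a fine occupancy grid into coarser voxels. The function name `downsample`, the boolean occupancy representation, and the rule that a coarse voxel is occupied if any fine voxel inside it is occupied are assumptions made for this sketch; the disclosure does not prescribe them.

```python
from typing import List

Grid = List[List[bool]]  # True = occupied (sound-blocking) voxel


def downsample(fine: Grid, factor: int) -> Grid:
    """Merge factor x factor blocks of fine voxels into one coarse voxel.

    Assumption: a coarse voxel is occupied if ANY fine voxel inside it
    is occupied (a conservative rule chosen only for this sketch).
    """
    rows, cols = len(fine), len(fine[0])
    coarse: Grid = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                fine[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            coarse_row.append(any(block))
        coarse.append(coarse_row)
    return coarse
```

For example, a 4 x 4 grid with a single occupied fine voxel in its top-right corner becomes a 2 x 2 grid whose top-right coarse voxel is occupied.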

[0008] However, traditional approaches to using voxels to provide realistic sound for user experiences (including motion) in VR, AR, MR, and XR environments remain challenging and computationally complex.

[0009] Typical techniques for diffraction modeling in 3D audio scenes, such as in computer-mediated reality applications, require recalculation of diffraction paths and other diffraction information whenever the audio scene, user position, or audio source position changes. For example, diffraction paths may change as the user and/or audio source move through the 3D audio scene. Furthermore, diffraction paths may change when the audio scene itself changes, for example, when doors or windows are opened or closed. Frequent recalculation of diffraction paths is computationally expensive, which requires relatively powerful computing devices to implement computer-mediated reality applications and, in some cases, can negatively impact the user experience. On the other hand, storing pre-calculated acoustic paths (e.g., acoustic diffraction paths) may require large amounts of storage and/or bandwidth.

[0010] For example, US11,606,662B2, US6,313,841B1, US10,275,937B2, and US10,293,259B2 each relate to the (pre)calculation of paths in a given scene or environment. However, the amount of data generated by doing so, especially for a large number of paths, can be relatively large, particularly when lossless transmission or storage of the path information is of interest.

[0011] Therefore, improved techniques are needed for coding path information (e.g., acoustic path information such as diffraction path information) in voxel-based audio scenes. In particular, techniques are needed that can reduce the bandwidth or storage requirements for processing the encoded path information.

Summary of the Invention

[0012] In consideration of this need, the present disclosure provides a method for processing audio scene information (in particular voxel-based audio scene information), an apparatus for processing audio scene information, a computer program, and a computer-readable storage medium, each having the features of an independent claim.

[0013] One aspect of the present disclosure relates to a method for processing audio scene information, for example, a method for encoding audio scene information, particularly acoustic path information. The method may include the step of obtaining a voxel-based audio scene representation of an audio scene. The method may further include the step of sequentially encoding path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid associated with the voxel-based audio scene representation. The voxel grid may be associated with a two-dimensional projection map generated from the voxel-based audio scene representation. Each path information item may specify a first position, a second position, the path length of the acoustic path between the first and second positions in the voxel grid, and a corner voxel on the acoustic path where the acoustic path changes direction. A corner voxel may be a voxel that does not obstruct a straight line to a second voxel. The method may further include the step of generating, for a current path information item, an encoded path information item based on that path information item. The encoded path information item may include an indication of the respective first position and an indication of the respective second position. If the corner voxel specified by the current path information item is different from the corner voxel specified by the preceding path information item, the encoded path information item may include an indication of the corner voxel (e.g., the position of the corner voxel). This indication of the corner voxel may be an absolute or non-differential indication of the corner voxel, for example, using voxel coordinates or a voxel index.
On the other hand, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item may include, instead of an indication of the corner voxel, an indication that the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item.
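
The corner-voxel reuse described in this aspect can be sketched in Python as follows. This is a minimal illustration, not the claimed bitstream syntax: the names `PathItem`, `EncodedItem`, and `encode` are hypothetical, positions are assumed to be (row, column) tuples, and the path length is carried unchanged in every encoded item.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Voxel = Tuple[int, int]  # assumed (row, column) position in the 2D voxel grid


@dataclass(frozen=True)
class PathItem:
    first: Voxel
    second: Voxel
    length: int    # path length, assumed here to be an integer voxel count
    corner: Voxel  # voxel where the acoustic path changes direction


@dataclass(frozen=True)
class EncodedItem:
    first: Voxel
    second: Voxel
    length: int
    same_corner: bool        # True: reuse the corner of the preceding item
    corner: Optional[Voxel]  # present only when same_corner is False


def encode(items: List[PathItem]) -> List[EncodedItem]:
    """Sequentially encode items, sending a corner only when it changes."""
    encoded: List[EncodedItem] = []
    prev_corner: Optional[Voxel] = None
    for item in items:
        same = item.corner == prev_corner
        encoded.append(
            EncodedItem(
                first=item.first,
                second=item.second,
                length=item.length,
                same_corner=same,
                corner=None if same else item.corner,
            )
        )
        prev_corner = item.corner
    return encoded
```

When consecutive items share a corner voxel, only a one-value flag replaces the corner coordinates, which is where the saving comes from in this sketch.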

[0014] By sequentially coding the path information items and reusing information from preceding path information items, the proposed method can reduce the bitstream and storage requirements for processing the encoded path information. The original path information can still be fully restored; that is, the proposed method provides lossless coding of path information in voxel-based scenes.

[0015] In some embodiments, the encoded path information item may further include a path length indication. The path length indication may be an absolute or non-differential indication of the path length, for example, in units of voxels or voxel edge lengths. In some embodiments, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item may further include an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item.

[0016] This further reduces the amount of data required to encode path information, while still allowing for lossless recovery of the original path information.

[0017] In some embodiments, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item may further include an indication of whether the encoded path information item includes an indication of the path length, or an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item. In this situation, the encoded path information item may further include the indication of the path length, or the indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item.

[0018] This allows for a further overall reduction in the data required to encode the path information, while still enabling lossless recovery of the original path information. In some embodiments, the method may further include the step of determining a sequence of second positions for each first position by traversing the voxel grid according to a predetermined pattern. The method may further include the step of sequentially encoding path information items for the determined sequence of second positions for each first position.

[0019] In some embodiments, a predetermined pattern may traverse the voxel grid in a raster scan manner along the rows and columns of the voxel grid.
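
A raster-scan traversal of this kind can be sketched as a small Python generator. The row-major order (left to right within a row, rows from top to bottom) and the name `raster_scan` are assumptions for illustration; the text only states that the pattern follows the rows and columns of the voxel grid.

```python
from typing import Iterator, Tuple


def raster_scan(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, column) positions in row-major raster order.

    Assumption: scanning starts at (0, 0) and moves along each row
    before advancing to the next row.
    """
    for r in range(rows):
        for c in range(cols):
            yield (r, c)
```

Under this assumption, consecutive second positions within a row are adjacent voxels, so their acoustic paths from a fixed first position tend to share a corner voxel and differ only slightly in length.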

[0020] In some embodiments, the difference may be encoded in 2 bits. Alternatively, the difference may take one of four predetermined values (for example, the possible difference values may be limited to a set of four different values).

[0021] By using the above pattern to traverse the voxel grid, if a corner voxel does not change from one path information item to the next, the difference between the respective path lengths can take only one of four values, which can be encoded efficiently using only two bits.
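
The two-bit difference coding can be illustrated with a lookup table of four values. The source does not state which four values occur, so the table `EXAMPLE_DELTAS` below is a placeholder chosen purely for illustration, and the function names are hypothetical; path lengths are assumed to be integers.

```python
from typing import Sequence

# Hypothetical table: the text says only that four difference values are
# possible, not which ones. These numbers are placeholders.
EXAMPLE_DELTAS = (-2, -1, 1, 2)


def encode_delta(prev_len: int, cur_len: int,
                 table: Sequence[int] = EXAMPLE_DELTAS) -> int:
    """Return a 2-bit code (0..3) for the difference cur_len - prev_len."""
    if len(table) != 4:
        raise ValueError("a 2-bit code needs exactly four table entries")
    # list.index/tuple.index raises ValueError if the difference is not
    # representable, in which case an absolute length would be needed.
    return list(table).index(cur_len - prev_len)


def decode_delta(prev_len: int, code: int,
                 table: Sequence[int] = EXAMPLE_DELTAS) -> int:
    """Recover the current path length from the previous one and a code."""
    return prev_len + table[code & 0b11]
```

Because the decoder applies the same table to the same previous length, the original path length is recovered exactly, consistent with the lossless property described above.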

[0022] In some embodiments, each encoded path information item may include an indication of whether a first mode or a second mode is used, an indication of the first position, an indication of the second position, and an indication of whether an acoustic path exists for the first and second positions. When the first mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item. In the first mode, if the two corner voxels are the same, each encoded path information item may further include an indication of the path length; otherwise, it may include an indication of the corner voxel and an indication of the path length. When the second mode is used, each encoded path information item may likewise include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item. In the second mode, if the two corner voxels are the same, each encoded path information item may further include an indication of the difference between the previous path length and the current path length.
Here, the previous path length may be the path length specified by the preceding path information item, and the current path length may be the path length specified by the path information item corresponding to the encoded path information item. Otherwise, in the second mode, each encoded path information item may include an indication of the corner voxel and an indication of the path length.

[0023] In some embodiments, each encoded path information item may include an indication of whether a first mode or a second mode is used, an indication of the first position, an indication of the second position, and an indication of whether an acoustic path exists for the first and second positions. If the first mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item. In the first mode, if the two corner voxels are the same, the encoded path information item may further include an indication of whether it includes an indication of the path length or an indication of the difference between the previous path length and the current path length, along with the indication of the path length or the indication of the difference, as applicable. Here, the previous path length may be the path length specified by the preceding path information item, and the current path length may be the path length specified by the path information item corresponding to the encoded path information item. Otherwise, in the first mode, the encoded path information item may include an indication of the corner voxel and an indication of the path length.
When the second mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item. In the second mode, if the two corner voxels are the same, each encoded path information item may further include an indication of the difference between the previous path length and the current path length, with the previous and current path lengths defined as above. Otherwise, in the second mode, each encoded path information item may include an indication of the corner voxel and an indication of the path length.
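
The field layout of the preceding paragraph can be made concrete with a short Python sketch that lists, in order, the fields an encoded item would carry. The function name `encoded_fields`, the field labels, the use of `None` to mean "no acoustic path", and the choice to emit nothing after a negative path-exists flag are all assumptions for illustration; the source does not define field names, ordering, or bit widths.

```python
from typing import List, Optional, Tuple

Voxel = Tuple[int, int]
Field = Tuple[str, object]


def encoded_fields(mode: int, first: Voxel, second: Voxel,
                   corner: Optional[Voxel], length: Optional[int],
                   prev_corner: Optional[Voxel], prev_length: Optional[int],
                   use_diff: bool = True) -> List[Field]:
    """List the fields of one encoded item for mode 1 or mode 2.

    corner=None is used here (an assumption) to signal that no acoustic
    path exists between the two positions.
    """
    fields: List[Field] = [("mode", mode), ("first", first), ("second", second)]
    exists = corner is not None
    fields.append(("path_exists", exists))
    if not exists:
        return fields  # assumption: nothing else is sent without a path
    same = corner == prev_corner
    fields.append(("same_corner", same))
    if not same:
        # Corner changed: absolute corner and absolute length in both modes.
        fields += [("corner", corner), ("length", length)]
    elif mode == 2:
        # Second mode: always a length difference when the corner repeats.
        fields.append(("length_diff", length - prev_length))
    else:
        # First mode: a selector says whether a length or a difference follows.
        fields.append(("is_diff", use_diff))
        if use_diff:
            fields.append(("length_diff", length - prev_length))
        else:
            fields.append(("length", length))
    return fields
```

In this sketch the first mode spends one extra selector field per repeated corner in exchange for the freedom to fall back to an absolute length, whereas the second mode always sends a difference.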

[0024] In some embodiments, the method may further include the step of outputting the encoded path information items to a bitstream.

[0025] Another aspect of the present disclosure relates to a method for processing audio scene information, for example, a method for decoding audio scene information, particularly acoustic path information. The method may include the step of receiving a bitstream containing a sequence of encoded path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to a voxel-based audio scene representation. Each encoded path information item may correspond to a path information item specifying the first position, the second position, the path length of the acoustic path between the first and second positions in the voxel grid, and a corner voxel on the acoustic path where the acoustic path changes direction. The corner voxel may be a voxel for which a straight line to the second voxel is not obstructed. The method may further include the step of sequentially decoding the encoded path information items to generate corresponding path information items. Here, the step of generating a corresponding path information item for the current encoded path information item may include a step of determining whether the current encoded path information item includes an indication that the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item. If the two corner voxels are the same, the method may include a step of setting the corner voxel specified by the path information item corresponding to the preceding encoded path information item as the corner voxel for the path information item corresponding to the current encoded path information item.
On the other hand, if the corner voxel specified by the corresponding path information item is different from the corner voxel specified by the path information item corresponding to the preceding encoded path information item, the method may include a step of extracting the indication of the corner voxel from the current encoded path information item.
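
The decoding step can be sketched in Python as the mirror image of the encoder. In this minimal sketch, each encoded item is assumed to be a tuple `(first, second, length, corner)` in which `corner` is `None` when the item signals "same corner as the preceding item"; this tuple layout and the name `decode` are illustrative assumptions, not the bitstream format of the disclosure.

```python
from typing import List, Optional, Tuple

Voxel = Tuple[int, int]
# Assumed layout: corner is None when the item reuses the preceding corner.
Encoded = Tuple[Voxel, Voxel, int, Optional[Voxel]]
Decoded = Tuple[Voxel, Voxel, int, Voxel]


def decode(encoded: List[Encoded]) -> List[Decoded]:
    """Sequentially restore the corner voxel of every path information item."""
    decoded: List[Decoded] = []
    prev_corner: Optional[Voxel] = None
    for first, second, length, corner in encoded:
        if corner is None:
            if prev_corner is None:
                raise ValueError("first item cannot reuse a preceding corner")
            corner = prev_corner  # copy the corner of the preceding item
        decoded.append((first, second, length, corner))
        prev_corner = corner
    return decoded
```

Since every omitted corner is copied from the item decoded immediately before it, the original sequence of path information items is restored without loss.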

[0026] In some embodiments, the step of generating the corresponding path information item may further include the step of extracting a path length indication from the current encoded path information item.

[0027] In some embodiments, if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, the step of generating the corresponding path information item may further include the step of extracting an indication of the difference between the previous path length and the current path length, where the previous path length may be the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length may be the path length specified by the path information item corresponding to the current encoded path information item.

[0028] In some embodiments, if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, the step of generating the corresponding path information item may further include a step of extracting an indication of whether the encoded path information item includes an indication of the path length, or an indication of the difference between the previous path length and the current path length. Here, the previous path length may be the path length specified by the path information item corresponding to the preceding encoded path information item. The current path length may be the path length specified by the path information item corresponding to the current encoded path information item. The step of generating the corresponding path information item may further include a step of extracting the indication of the path length, or the indication of the difference between the previous path length and the current path length.

[0029] In some embodiments, the one or more second positions may be positions obtained by traversing the voxel grid according to a predetermined pattern that defines a sequence of second positions for each first position. Furthermore, for each first position, the encoded path information items may be decoded sequentially according to the sequence of second positions.

[0030] In some embodiments, a predetermined pattern may traverse the voxel grid in a raster scan manner along the rows and columns of the voxel grid.

[0031] In some embodiments, the difference may be encoded with 2 bits. Alternatively, the difference may take one of four predetermined values.

[0032] In some embodiments, each encoded path information item may include an indication of whether a first mode or a second mode is used, an indication of the first position, an indication of the second position, and an indication of whether an acoustic path exists for the first and second positions. If the first mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item. If the first mode is used and the two corner voxels are the same, each encoded path information item may further include an indication of the path length. Otherwise, each encoded path information item may further include an indication of the corner voxel and an indication of the path length. When the second mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item.
When the second mode is used and the two corner voxels are the same, each encoded path information item may further include an indication of the difference between the previous path length and the current path length, where the previous path length may be the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length may be the path length specified by the path information item corresponding to the encoded path information item. Otherwise, each encoded path information item may further include an indication of the corner voxel and an indication of the path length.

[0033] In some embodiments, each encoded path information item may include an indication of whether a first mode or a second mode is used, an indication of the first position, an indication of the second position, and an indication of whether an acoustic path exists for the first and second positions. If the first mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item. If the first mode is used and the two corner voxels are the same, each encoded path information item may further include an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between the previous path length and the current path length, along with the indication of the path length or the indication of the difference, as applicable. Here, the previous path length may be the path length specified by the path information item corresponding to the preceding encoded path information item. The current path length may be the path length specified by the path information item corresponding to the encoded path information item. Otherwise, each encoded path information item may further include an indication of the corner voxel and an indication of the path length.
When the second mode is used, each encoded path information item may further include an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item. If the second mode is used and the two corner voxels are the same, each encoded path information item may further include an indication of the difference between the previous path length and the current path length, with the previous and current path lengths defined as above. Otherwise, each encoded path information item may further include an indication of the corner voxel and an indication of the path length.

[0034] In another embodiment, an apparatus for processing audio scene information is provided. The apparatus may include a processor and a memory coupled to the processor for storing instructions for the processor. The processor may be configured to perform all steps of the methods according to the above aspects and their respective embodiments.

[0035] In another embodiment, a computer program is described. The computer program may include executable instructions for performing a method or a step of a method outlined throughout this disclosure when executed by a computing device (e.g., a processor).

[0036] In another embodiment, a computer-readable storage medium is described. The storage medium may store a computer program adapted to run on a computing device (e.g., a processor) and, when run on the computing device, to perform a method or a step of a method outlined throughout this disclosure.

[0037] It should be noted that methods and systems including preferred embodiments outlined herein may be used alone or in combination with other methods and systems disclosed herein. Furthermore, all aspects of the methods and systems outlined herein may be combined in any way. In particular, the features of the claims may be combined with each other in any way.

[0038] It is recognized that the features of the apparatus and the steps of the method are interchangeable in many ways. In particular, as those skilled in the art will recognize, the details of the disclosed method can be realized by the corresponding apparatus, and vice versa. Furthermore, it is understood that any of the above descriptions made relating to the method (and, for example, its steps) is similarly applicable to the corresponding apparatus (and, for example, its blocks, stages, units), and vice versa.

Brief Description of the Drawings

[0039] The present invention will be described illustratively below with reference to the attached drawings.
[Figure 1] Schematically shows an example of a processing chain for handling audio scene information for audio rendering.
[Figure 2] Schematically shows an example of diffraction paths for the sound source and listener positions in a voxel-based 3D audio scene.
[Figure 3] A flowchart showing an example of a method for processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 4] A flowchart showing an example of implementation details of the method shown in Figure 3 according to an embodiment of this disclosure.
[Figure 5] Schematically shows an example of a processing chain for processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 6] Schematically shows an example of a processing chain for processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 7] Schematically shows an example of a processing chain for processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 8] Shows a measure of complexity as a function of time for different operating modes / implementations of processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 9] Outlines examples of possible use cases of the technology according to embodiments of this disclosure.
[Figure 10A] Schematically shows some voxel-based audio scenes according to embodiments of this disclosure.
[Figure 10B] Schematically shows some voxel-based audio scenes according to embodiments of this disclosure.
[Figure 10C] Schematically shows some voxel-based audio scenes according to embodiments of this disclosure.
[Figure 11] Schematically shows an example of a voxel-based audio scene to which embodiments of this disclosure may be applied.
[Figure 12] A flowchart showing an example of a method for processing audio scene information to encode acoustic path information according to an embodiment of this disclosure.
[Figure 13] A flowchart showing an example of a method for processing audio scene information to decode acoustic path information according to an embodiment of this disclosure.
[Figure 14] Shows a measure of complexity as a function of time for different operating modes / implementations of processing audio scene information for audio rendering according to embodiments of this disclosure.
[Figure 15] A schematic block diagram showing an example of an apparatus for carrying out the method according to embodiments of this disclosure.

Modes for Carrying Out the Invention

[0040] Illustrative embodiments of the present disclosure are described below with reference to the accompanying drawings. In the drawings, identical elements may be indicated by the same reference numerals, and their repeated descriptions may be omitted.

[0041] Voxel-based audio scene representation
First, an overview of voxel-related concepts for representing audio scenes is provided.

[0042] What are voxels for audio rendering? A voxel is understood as a spatial volume to which acoustic properties or audio rendering instructions are assigned.

[0043] What is the voxel size for audio rendering? The voxel size may also be an encoder configuration parameter. This may be selected (manually or automatically) according to the level of detail of the scene geometry (e.g., in the range of 10cm to 1m).

[0044] How large an audio scene can be processed? Large audio scenes do not necessarily result in a large number of voxels and high rendering complexity. For example, a large audio scene can be understood as
• a set of independent subscenes (together with a method for "teleporting" between these representations without "restarting" the renderer), or
• a set of scene updates (based on the user position).

[0045] How can we address the discontinuity issues caused by voxel granularity? Any strong discontinuities in sound levels (and jumps in the direction of diffraction signals) can be avoided by applying interpolation (e.g., in time and space).

[0046] How do we represent a voxel-based audio scene? Any voxel-based representation of an audio scene may include an indication of the voxels that are not transmission voxels (e.g., occluder voxels), i.e., voxels through which sound cannot propagate or cannot propagate freely; in other words, a representation of the occluder geometry. This indication may take the form of the coordinates of each such voxel (e.g., center coordinates, corner coordinates, etc.). These voxel coordinates may be represented, for example, by grid indices. Furthermore, voxel-based representations may include indications of material properties of non-transmission voxels, such as absorption coefficients, reflection coefficients, etc. In addition to occluder voxels, voxel-based representations may also indicate transmission voxels (e.g., air voxels), i.e., voxels through which sound can propagate; in other words, a representation of the sound propagation medium. Thus, some implementations of voxel-based representations of audio scenes may include indications of the respective material properties for each voxel within a given section of space (e.g., within the boundary surrounding the audio scene).

[0047] Technology for processing audio scene information
Figure 1 schematically shows a processing chain 100 that can be used to process audio scene information for audio rendering. Specifically, the processing chain 100 can be used to convert voxel-related data into the parameters and signals necessary for auralization (or generally audio rendering). The processing chain 100 may be implemented in software, hardware, or a combination thereof. For example, the processing chain 100 may be implemented by a renderer / decoder coupled to an AR / VR / MR / XR device such as AR / VR / MR / XR goggles. Specific implementations may include game consoles, set-top boxes, personal computers, etc.

[0048] The processing chain receives an audio scene description 20 from the bitstream (or storage / memory) 10. The audio scene description 20 may include a representation of a three-dimensional audio scene and information about the sound source locations of sound sources within the audio scene. The representation of the three-dimensional audio scene may be voxel-based, for example.

[0049] The processing chain 100 further receives an indication of the user's (listener's) position (listener position) 30 within the audio scene. The audio scene description 20 and the user position 30 are provided to a diffraction direction calculation block (diffraction calculation block) 40 for determining (e.g., calculating) diffraction information. The diffraction information may relate to the acoustic diffraction path within the audio scene between the sound source position and the listener position. The diffraction information is then provided to a diffraction modeling tool 50 for applying diffraction modeling and, optionally, occlusion modeling based on the diffraction information. Occlusion modeling calculates the linear attenuation gain between the listener and the audio source. The diffraction modeling tool 50 may output 3DoF audible data (3DoF auralizer data) including, for example, the position, orientation, and frequency-dependent gain of the object to be rendered. The output of the diffraction modeling tool may be further processed by other rendering stages such as Doppler, directivity, distance attenuation, etc. Generally, the diffraction modeling tool 50 outputs diffraction information as detailed below. The 3DoF audible data may then be used, for example, for audio playback.

[0050] In short, the processing chain 100 shown in Figure 1 may be used to convert voxel-related data into parameters and signals for auralization. The diffraction direction calculation block 40 and the diffraction modeling tool 50 may be considered non-limiting examples of rendering tools. Generally, the rendering tools may generate 3DoF audible data.

[0051] As described above, the scene description may include a voxel matrix and associated coefficients (e.g., reflection coefficient, occlusion coefficient, absorption coefficient, transmission coefficient, etc.). These coefficients may represent the material or material properties of each voxel. The rendering tools may include, for example, occlusion and diffraction modeling tools. The 3DoF audible data may include, for example, object position, orientation, and frequency-dependent gain.

[0052] As described above, a voxel-based representation of a 3D audio scene defines psychoacoustically relevant geometric elements and sound propagation media. In some implementations, the scene description may use the following parameters / interfaces (e.g., the following agreed data format or agreed data exchange points) to provide information to the rendering tool.
Scene size:
- Absolute units (e.g., meters)
- Number of voxels and / or voxel size
Scene anchor:
- Coordinate anchor (maps absolute coordinates to voxel indices)
- Scene anchor (maps subscenes to a subset of voxels)
Scene content data:
- Reference to material properties (e.g., transmission coefficient, reflection coefficient, etc.) that approximate the acoustic effects caused by obstructions (sound obstacles) located within the corresponding volume.
- Reference to sound propagation medium characteristics (sound velocity, energy absorption, distance attenuation curve, etc.) that approximate the acoustic effects caused by a medium located within the corresponding volume.
- Rendering control parameters that describe the intended occlusion modeling effect. For example, the "global" or "local" occlusion type determines the length (and shape) of the shadow of the occlusion effect behind this voxel.
- Rendering control parameters that describe the intended sound diffraction modeling effect. For example, voxel types that control / cause changes in the direction of sound (i.e., the path of diffracted sound cannot pass through this volume).
- Content control parameters describing the relationship between audio signals and scene authoring. For example, audio signal ID and / or signal gain determine which signals are perceptually relevant (rendered) within their respective volumes.
- Rendering control parameters that describe the intended reverberation modeling effect. For example, voxel types that control reverberation settings (e.g., RT60, DDR, RIR, etc.).
Scene content update:
- Reference to update trigger events
All data can depend on audio objects (to support the content creator's intent in flexible audio scene authoring).
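The role of the coordinate anchor (mapping absolute coordinates to voxel indices) can be illustrated with a minimal sketch. The function name, the convention that the anchor is the absolute position of the corner of voxel (0, 0, 0), and the use of a single cubic voxel size are assumptions made for illustration only; the disclosure does not prescribe them.

```python
import math

def world_to_voxel(point, anchor, voxel_size, dims):
    """Map absolute coordinates (e.g., metres) to integer voxel indices.

    Assumptions (not from the source): `anchor` is the absolute position of
    the corner of voxel (0, 0, 0), voxels are cubes of edge `voxel_size`,
    and `dims` holds the number of voxels per scene dimension.
    Returns None when the point lies outside the scene.
    """
    index = tuple(math.floor((p - a) / voxel_size) for p, a in zip(point, anchor))
    if all(0 <= i < n for i, n in zip(index, dims)):
        return index
    return None

# A listener at (1.3, 0.2, 2.9) m in a scene of 10 x 4 x 10 voxels of 0.5 m.
listener_voxel = world_to_voxel((1.3, 0.2, 2.9), (0.0, 0.0, 0.0), 0.5, (10, 4, 10))
```

With a 0.5 m voxel size, all positions inside the same 0.5 m cube map to the same index, which is why voxel size directly controls the granularity of the scene state.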

[0053] 3DoF audible data may include the following information:
- Parameters and associated signals for a set of audio objects (and HOAs)
  ○ The parameters include metadata output from the rendering tool (parameters such as position, orientation and gain, reverberation coefficient, and IR that simulate the effects of occlusion, diffraction, and early reflections).
  ○ The associated signals represent the audio output of the rendering tool (i.e., the downmixed or duplicated audio signals).
- Scene state identifier (i.e., metadata that enables mapping of scene description and user input to 3DoF audible data)

[0054] Figure 2 shows possible scene states and examples of diffraction paths for these scene states. It is understood that the scene states are related to or include the listener position 210 and the audio scene description (including the representation of the 3D audio scene and the sound source position 220).

[0055] The example in Figure 2 relates to a voxel-based representation of a three-dimensional audio scene. This voxel-based representation shows "air" voxels or empty voxels (i.e., voxels through which sound can propagate, or transparent voxels) 230 and shielding voxels 240 (i.e., voxels through which sound cannot propagate or is not freely propagated). Thus, shielding voxels may be understood as relating to voxels filled with materials other than air, which can reflect, block, or otherwise alter sound propagation. For shielding voxels 240, the representation may further show the respective transmission coefficient, reflection coefficient, and potentially absorption coefficient related to the material properties of these voxels. These coefficients may be linked to the ID or index of each voxel in the voxel-based representation. In general, voxel-based representations may define psychoacoustically relevant geometric elements and sound propagation media within an audio scene.

[0056] The listener position 210 is indicated by the parameter L_vox, and the sound source position 220 is indicated by another parameter S_vox.

[0057] The diffraction path (or generally the acoustic path) between the sound source position 220 and the listener position 210 may be determined using a pathfinding algorithm that takes the listener position 210, the sound source position 220, and a representation of the three-dimensional audio scene (or a two-dimensional representation, e.g., a 2D projection or 2D matrix derived therefrom in the form of a voxel grid) as input. For example, an algorithm for determining diffraction information may take the listener position 210, the sound source position 220, and a representation of the three-dimensional audio scene as input, and may output the position C_vox of the corner voxel (e.g., diffraction corner) 250 and the length r_in of the diffraction path. For example, diffraction information (or generally acoustic path information) may be determined as follows:
[C_vox, r_in] = DiffractionDirectionCalculation(L_vox, S_vox, VoxDataDiffractionMap)
Here, DiffractionDirectionCalculation indicates an algorithm (the "pathfinding algorithm") for determining diffraction information, and VoxDataDiffractionMap indicates a voxel-based representation of a three-dimensional audio scene or a processed version thereof (e.g., a 2D projection or 2D matrix derived therefrom). C_vox is understood to indicate the coordinates of the diffraction corner (e.g., the coordinates of each voxel containing the diffraction corner, voxel / grid coordinates or voxel / grid indices). The diffraction corner may also be referred to as a corner voxel.

[0058] Here, DiffractionDirectionCalculation may include any of the feasible pathfinding algorithms, such as the fast traversal algorithm for ray tracing (see Amanatides, J. and A. Woo, A Fast Voxel Traversal Algorithm for Ray Tracing. Proceedings of EuroGraphics, 1987. 87) and the JPS algorithm (see Harabor, DD and A. Grastien, Online Graph Pruning for Pathfinding On Grid Maps. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011). Furthermore, a 3D pathfinding algorithm may be directly applied using a voxel-based scene representation to obtain the shortest path between the sound source position 220 and the listener position 210. Alternatively, a 2D pathfinding algorithm may be applied to this task using a suitable 2D projection plane of the 3D voxel-based scene representation. For indoor (e.g., multi-room) sound simulations, the corresponding 2D projection plane may be similar to a floor plan describing the "sound propagation path topology." For outdoor sound simulation scenarios, it may be interesting to consider a second (e.g., vertical) 2D projection plane to describe the diffraction paths through sound obstacles or shielding structures. The pathfinding approach remains the same for all projection planes, but its application provides further paths that can be used for diffraction modeling.
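As an illustration of 2D pathfinding on a projection plane, the following minimal sketch runs a plain breadth-first search on a small occupancy grid. It is a stand-in chosen for brevity, not the fast voxel traversal or JPS algorithms cited above, and it uses 4-connected moves, so the number of steps is only a coarse proxy for the geometric path length. The grid layout (grid[y][x] == 1 for a shielding cell) and the function name are assumptions.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search on a 2D grid with 4-connected moves.

    Assumption: grid[y][x] == 1 marks a cell the path cannot pass through,
    and 0 marks a free ("air") cell. Cells are (x, y) tuples.
    Returns the list of cells from start to goal, or None if no path exists.
    """
    height, width = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and grid[ny][nx] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

# A wall in the middle column forces the path around its lower end.
floor_plan = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
path = shortest_path(floor_plan, start=(0, 0), goal=(2, 0))
```

A production renderer would more likely use an any-angle or jump-point method so that the reported path length matches the geometric length of the diffraction path.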

[0059] The pathfinding algorithm is assumed to connect the sound source position 220 to the listener position and output an acoustic path (e.g., a diffraction path) consisting of multiple voxel designations (e.g., voxel indices or voxel coordinates) that indicate the direction of the acoustic path. For visualization purposes, in some cases, the acoustic path may be seen as relating to multiple linear path segments (line segments) whose ends are continuously linked. Each transition from one path segment to another relates to a change in the direction of the diffraction path.
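The view of an acoustic path as a chain of linear segments can be made concrete with a short sketch that lists the cells at which a cell-by-cell path changes direction. The representation of the path as a list of (x, y) grid cells and the function name are assumptions for illustration.

```python
def turning_points(path):
    """Return the cells at which a cell-by-cell path changes direction.

    Assumption: `path` is a list of (x, y) grid cells in travel order.
    Each returned cell ends one linear segment and starts the next.
    """
    turns = []
    for prev, cur, nxt in zip(path, path[1:], path[2:]):
        step_in = (cur[0] - prev[0], cur[1] - prev[1])
        step_out = (nxt[0] - cur[0], nxt[1] - cur[1])
        if step_in != step_out:
            turns.append(cur)
    return turns

example_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
corners = turning_points(example_path)
```

In this example the path consists of three segments joined at two turning points; candidates for the diffraction corner would be sought among such cells.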

[0060] According to the algorithm for determining diffraction information, the diffraction corner (or corner voxel) C_vox may be determined as a voxel that is located on or near the diffraction path and is adjacent to a shielding voxel (within the set C_set of voxels representing corner voxels on the diffraction map indicated by the voxel-based representation). For example, the diffraction corner C_vox may be selected, from the set of voxels forming the diffraction path (P_set), as a voxel that is close to a shielding voxel (belonging to C_set), at which the path (P_set) changes direction, and which is "visible" from the listener position. If there is more than one such corner, the one furthest from the listener's position along the diffraction path (P_set) is selected.

[0061] Generally speaking, diffraction path algorithms can be said to determine diffraction information regarding the acoustic path (e.g., acoustic diffraction path) within an audio scene between the sound source position and the listener position.

[0062] This diffraction information (or generally acoustic path information) may be sufficient for the renderer to reconstruct / determine the virtual source position of the virtual audio source that encapsulates the effect of acoustic diffraction. This is the case for the coordinates of the diffraction corner C_vox and the diffraction path length r_in. For example, the virtual sound source position may be reconstructed by calculating the direction of the diffraction corner (e.g., azimuth angle, or azimuth and elevation angle) as seen from the listener's position. Using this direction, and taking the path length r_in of the diffraction path as the virtual sound source distance to the listener's position, the virtual sound source position can be determined.
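The reconstruction described above can be sketched for the 2D (azimuth-only) case as follows. The function name and the use of a common unit for positions and path length are assumptions; an implementation using elevation as well would extend the same idea to three coordinates.

```python
import math

def virtual_source_position(listener, corner, path_length):
    """Place the virtual source along the listener-to-corner direction.

    Assumptions: 2D positions (x, y) and `path_length` share one unit.
    The azimuth is the direction of the diffraction corner C_vox as seen
    from the listener; the distance is the diffraction path length r_in.
    Returns ((x, y), azimuth_in_radians).
    """
    azimuth = math.atan2(corner[1] - listener[1], corner[0] - listener[0])
    x = listener[0] + path_length * math.cos(azimuth)
    y = listener[1] + path_length * math.sin(azimuth)
    return (x, y), azimuth

# Corner 5 units away in direction (0.6, 0.8); diffraction path of 10 units.
position, azimuth = virtual_source_position((0.0, 0.0), (3.0, 4.0), 10.0)
```

The virtual source thus lies beyond the corner on the same line of sight, at a distance that reflects the full length of the bent path rather than the direct distance to the corner.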

[0063] Note that diffraction information can be represented in different ways. One option, as described above, is diffraction information that includes / stores the path length r_in and the coordinates (e.g., grid coordinates) of the diffraction corner C_vox. Based on the above, the following data elements may be defined. An example of a scene state N1 may be represented as follows:
N1 = {L_vox, S_vox, VoxDataDiffractionMap}
That is, the scene state may relate to or include the listener position L_vox, the sound source position S_vox, and a voxel-based representation of the audio scene (e.g., VoxDataDiffractionMap). The scene state identifier for scene state N1 may be defined as follows:
SceneStateIdentifier = HASH(N1)
Here, HASH is a hash function that generates a hash value for a scene state N1, for example, by mapping a scene state to a fixed-size value. Generally, a scene state identifier can be said to indicate a particular scene state or to identify a particular scene state. Furthermore, an example of diffraction information N2 may be represented as follows:
N2 = {C_vox, r_in}
Here, r_in is the path length of the diffraction path, and C_vox indicates the position of the diffraction corner (e.g., voxel position) as described above. A quantized version of the diffraction information N2 may be denoted by N3:
N3 = voxSceneDiffractionPreComputedPathData(N1)
N3 (≈ N2) = DiffractionDirectionCalculation(N1)
Here, voxSceneDiffractionPreComputedPathData() is a bitstream syntax that parses the bitstream and extracts pre-calculated (stored and quantized) diffraction information (e.g., generated by processing chain 500 in Figure 6), and DiffractionDirectionCalculation() is a function that performs online calculation of the diffraction information, which may be done, for example, in the diffraction direction calculation block 40 in Figures 5 and 7.
Since the voxel coordinate L_vox of the user's position is fixed, the diffraction information, for example C_vox and r_in, may also be considered as relating to 3DoF audible data.
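One possible way of forming a scene state identifier is sketched below. The disclosure only requires a hash function that maps a scene state to a fixed-size value; the choice of SHA-256, the JSON serialization, and the function and field names are assumptions made for illustration.

```python
import hashlib
import json

def scene_state_identifier(listener_vox, source_vox, diffraction_map):
    """Hash a scene state N1 = {L_vox, S_vox, VoxDataDiffractionMap}.

    Assumptions (not from the source): the state is serialized as canonical
    JSON and hashed with SHA-256; any stable serialization and hash with a
    fixed-size output would serve the same purpose.
    """
    state = {
        "L_vox": list(listener_vox),
        "S_vox": list(source_vox),
        "map": [list(row) for row in diffraction_map],
    }
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

diffraction_map = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
id_a = scene_state_identifier((0, 0), (2, 0), diffraction_map)
id_b = scene_state_identifier((0, 1), (2, 0), diffraction_map)
```

Because the listener position enters the hash as a voxel index, the identifier stays the same while the listener moves within one voxel and changes as soon as the listener enters another voxel or the scene is updated.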

[0064] Table 1 shows an example of the MPEG-I standard syntax element voxSceneDiffractionPreComputedPathData(). This voxel payload data structure may have the following elements:
numberOfVoxDiffractionPathData: This element represents the number of pre-calculated diffraction path datasets.
voxDiffractionPathStartVoxelPacked: This element represents the packed form of the variable voxDiffractionPathStartVoxel, which indicates the voxel index of the path start voxel of the pre-calculated diffraction path (e.g., S_vox or L_vox in Figure 2). voxDiffractionPathStartVoxel may, for example, be a two-dimensional position on the diffraction map indicating the starting position of the diffraction path.
voxDiffractionPathEndVoxelPacked: This element represents the packed form of the variable voxDiffractionPathEndVoxel, which indicates the voxel index of the path end voxel of the pre-calculated diffraction path (e.g., L_vox or S_vox in Figure 2). voxDiffractionPathEndVoxel may, for example, be a two-dimensional position on the diffraction map indicating the end position of the diffraction path.
voxDiffractionPathDataExistFlag: This element indicates whether or not a diffraction path exists.
voxDiffractionSourceDirectionPacked: This element represents the packed form of the variable voxDiffractionSourceDirection, which indicates the voxel index of the voxel used to determine the diffraction source azimuth angle value. This may correspond, for example, to the corner voxel C_vox.
voxDiffractionPathLength: This element represents the length of the diffraction path on the 2D matrix of the diffraction map. This may correspond, for example, to the path length r_in.
Note:
voxSceneDimensions: Number of voxels per scene dimension.
escapedValue(): This element implements a common method for transmitting integer values using a variable number of bits. It features a two-level escape mechanism that allows extending the range of representable values by the sequential transmission of further bits. The syntax of escapedValue() is as defined in ISO/IEC 23003-3.
[Table 1]
The variables extracted from voxSceneDiffractionPreComputedPathData() (e.g., Figure 6) are further processed to output diffraction information N3.
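The two-level escape mechanism can be sketched as follows, based on a reading of escapedValue(nBits1, nBits2, nBits3) in ISO/IEC 23003-3; that standard remains the normative reference. Bit strings are used instead of a real bitstream reader, and the bit widths in the example are hypothetical, since the widths used in Table 1 are not reproduced here.

```python
def write_escaped_value(value, n1, n2, n3):
    """Encode a non-negative integer with a two-level escape (sketch)."""
    esc1 = (1 << n1) - 1  # all-ones in the first field signals "more follows"
    if value < esc1:
        return format(value, f"0{n1}b")
    bits = format(esc1, f"0{n1}b")
    value -= esc1
    esc2 = (1 << n2) - 1  # all-ones in the second field signals a third field
    if value < esc2:
        return bits + format(value, f"0{n2}b")
    bits += format(esc2, f"0{n2}b")
    value -= esc2
    if value >= (1 << n3):
        raise ValueError("value exceeds the representable range")
    return bits + format(value, f"0{n3}b")

def read_escaped_value(bits, n1, n2, n3):
    """Decode one value; returns (value, number_of_bits_consumed)."""
    pos = n1
    value = int(bits[:n1], 2)
    if value == (1 << n1) - 1:
        add = int(bits[pos:pos + n2], 2)
        pos += n2
        value += add
        if add == (1 << n2) - 1:
            value += int(bits[pos:pos + n3], 2)
            pos += n3
    return value, pos

# Hypothetical widths 2/3/4: small values cost 2 bits, large ones up to 9.
encoded = write_escaped_value(10, 2, 3, 4)
decoded, used = read_escaped_value(encoded, 2, 3, 4)
```

The design favors small values, which are sent in the first field alone, while still allowing occasional large values at the cost of extra bits.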

[0065] A technical advantage of the technology disclosed herein is that, if corresponding processing has already been performed for a scene state and diffraction information or 3DoF audible data is available, the application of diffraction modeling tools or rendering tools can be avoided by using a scene state identifier or other information derived from the scene state. In this scenario, the renderer can access diffraction information / 3DoF audible data (for known scene states) without using rendering tools by
- reusing pre-calculated data, or
- applying data calculated by another renderer.

[0066] Therefore, the technical advantages and benefits of the technology disclosed herein relate to lossless functionality intended for low complexity modes (complexity versus bitrate).

[0067] To fully implement such a mechanism, this disclosure proposes providing a processing chain for processing audio scene information for audio rendering (e.g., in a decoder / renderer) that has an interface for providing / outputting diffraction information for later use or for use by a different decoder / renderer. This interface is understood to be a data interface for outputting data in a predetermined format, in particular to enable consistent reuse by other decoders / renderers. The interface may be implemented and / or utilized in software, hardware, or a combination thereof. Specifically, this may relate to providing / outputting a data element containing diffraction information and information about the scene state, such as a scene state identifier. The data element may have a predetermined format, for example, having predetermined data fields. Using this interface, the processing chain can provide the calculated diffraction information or 3DoF audible data, along with the scene state identifier, to other decoders / renderers and / or store it for later reuse.

[0068] In some implementations, the above may relate to providing / outputting (or receiving at the receiving end) pre-calculated acoustic path information for one or more (e.g., all feasible) combinations of sound source position and listener position in a voxel-based scene representation (e.g., a 2D voxel grid).

[0069] Example 1: If a decoder / renderer obtains acoustic path information / diffraction information (e.g., diffraction path) for a given user location (listener location), the decoder / renderer can reuse it until the user leaves the corresponding voxel volume (or the scene description is updated).

[0070] Example 2: If the calculated acoustic path information / diffraction information corresponds to a scene state that is unknown to other decoders, those decoders may reuse the acoustic path information / diffraction information and avoid running their own diffraction modeling or rendering tools.

[0071] The exchange and sharing of acoustic path / diffraction information between different decoders can be done using a database, which can be included in the bitstream (for example, accessed via application requests).

[0072] Figure 3 is a flowchart illustrating an example of a method 300 for processing audio scene information for audio rendering according to an embodiment of the present disclosure. Method 300 may be implemented in software, hardware, or a combination thereof. For example, the processing chain 100 may be implemented by a renderer / decoder coupled to an AR / VR / MR / XR device such as AR / VR / MR / XR goggles. Specific implementations may include game consoles, set-top boxes, personal computers, etc.

[0073] Method 300 includes steps S310 to S350, which can be performed by a decoder / renderer, for example. These steps may be performed, for example, each time the scene state changes. For example, for a scene state that is represented by the above-described scene state N1 and that relates to, or includes, the listener position 210 and the audio scene description (including the representation of the 3D audio scene and the sound source position 220), a change in the scene state may relate to one or more of the following: a change in the listener position 210, a change in the sound source position, and a change in the 3D audio scene (or its representation). Alternatively, steps S310 to S350 may be performed for each of multiple processing cycles of the decoder / renderer. However, if the audio scene description has not changed, step S310 may be omitted. It should also be understood that steps S310 to S350 do not need to be performed in the order shown in Figure 3.

[0074] In step S310, an audio scene description is received. The audio scene description includes a representation of the 3D audio scene and information about the sound source locations of the sound sources within the audio scene. For example, the audio scene description may include the elements S_vox and VoxDataDiffractionMap defined above.

[0075] In step S320, an indication of the listener's position within the audio scene is received. The listener's position may correspond, for example, to the element L_vox defined above.

[0076] In step S330, diffraction information regarding the acoustic diffraction path within the audio scene between the sound source position and the listener position is acquired. The acquired diffraction information may indicate the virtual sound source position of the virtual sound source. For example, the virtual sound source position may have the same direction (e.g., azimuth angle, or azimuth angle and elevation angle) as the diffraction corner C_vox when viewed from the listener position. The virtual sound source distance may correspond to the diffraction path length r_in. Therefore, the diffraction information may include indications of C_vox and r_in as defined above.

[0077] In step S340, audio rendering is performed for the sound source based on the diffraction information. This may include, for example, diffraction modeling.

[0078] Therefore, the virtual sound source position may be determined based on the diffraction information. The virtual sound source may be an audio source that encapsulates the effect of acoustic diffraction between the sound source position and the listener position in a 3D audio scene. For example, the virtual sound source position may be determined based on C_vox and r_in by
• determining the direction (e.g., azimuth angle, or azimuth angle and elevation angle) of the diffraction corner C_vox as seen from the listener's position,
• using the determined direction as the virtual sound source direction of the virtual sound source as seen from the listener's position, and
• using the diffraction path length r_in
  ○ as the virtual sound source distance from the listener's position, or
  ○ for a virtual source gain compensation derived from the diffraction path length and the direct path length.

[0079] Next, audio rendering may include, for example, rendering a virtual sound source at the virtual sound source location.

[0080] In step S350, a representation of the diffraction information is output. For example, outputting a representation of the diffraction information may include outputting a data element that includes the diffraction information and information about the scene state. The scene state may include the audio scene description (e.g., S_vox and VoxDataDiffractionMap) and the listener position (e.g., L_vox).

[0081] The output may be provided in a look-up table (LUT). The LUT contains, as its entries, different diffraction information entries indexed with information about each scene state (e.g., indexed with each scene state identifier). Thus, the LUT can be said to contain diffraction information and information about the scene states. The LUT can be stored and / or provided so that it can be retrieved later, for example, by another decoder from the bitstream, or, for example, by application request, from shared storage (e.g., cloud or server-based). Hash values or scene state identifiers of the scene states can be used to actually retrieve the desired entries from the LUT.

[0082] Furthermore, the representation of diffraction information may be output to a bitstream (e.g., output bitstream) and / or storage (e.g., memory, cache, file, etc.). The storage may be local or shared (e.g., cloud-based). Generally, the representation of diffraction information may be output to a suitable medium for storing digital information or computer-related information. The output may be directed, at least partially, to an external or shared data source or data repository.

[0083] In some implementations, the representation of diffraction information may be output as part of the voxSceneDiffractionPreComputedPathData() syntax element, in accordance with ISO/IEC 23090-4 (Coded representation of immersive media - Part 4: MPEG-I immersive audio, https://www.iso.org/standard/84711.html) or a future standard derived therefrom. For example, the syntax elements of voxSceneDiffractionMap() may be given by Table 2.
[Table 2]
voxSceneDiffractionMap() provides a compact representation of a 2D diffraction map (VoxDataDiffractionMap). This 2D representation is similar to the 3D representation used for voxel-based 3D audio scenes.

[0084] A MapElement is defined by two points (x, y indices) on the diffraction map and their corresponding values. The two points span a rectangle, and all covered grid cells are assigned the value voxDiffractionMapValue.

[0085] The bitstream element numberOfVoxDiffractionMapElements represents the number of MapElements.

[0086] The bitstream element voxDiffractionMapValue represents a binary value that controls the pathfinding algorithm. This value is useful because it indicates whether a path can pass through a grid cell. This value is defined for all entries on the diffraction map.

[0087] The bitstream element voxDiffractionMapPosPackedS represents a packed representation of the two indices of the starting grid cells of the MapElement. This can also be an array representing the set of all starting grid cells.

[0088] The bitstream element voxDiffractionMapPosPackedE represents a packed representation of the two indices of the end grid cells of the MapElement. This could also be an array representing the set of all end grid cells.

[0089] Both voxDiffractionMapPosPackedS and voxDiffractionMapPosPackedE are useful because they allow for a compact representation of data where a single voxDiffractionMapValue is used for all grid cells between these two variables.
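How a list of MapElements expands into the 2D diffraction map can be sketched as follows. The tuple layout (start, end, value), the inclusive treatment of both corner cells, the default value for uncovered cells, and the function name are assumptions; the unpacking of the packed index fields is omitted because its format is not given in this section.

```python
def build_diffraction_map(width, height, map_elements, default=0):
    """Expand MapElements into a 2D grid (sketch).

    Assumptions: each element is ((xs, ys), (xe, ye), value) with already
    unpacked x/y indices, both corners are inclusive, and cells not covered
    by any element keep `default`.
    """
    grid = [[default] * width for _ in range(height)]
    for (xs, ys), (xe, ye), value in map_elements:
        for y in range(min(ys, ye), max(ys, ye) + 1):
            for x in range(min(xs, xe), max(xs, xe) + 1):
                grid[y][x] = value  # every covered cell takes the same value
    return grid

# One rectangle of blocked cells spanning x = 1..2, y = 0..1.
diffraction_map = build_diffraction_map(4, 3, [((1, 0), (2, 1), 1)])
```

A wall or room boundary that covers many grid cells is thus described by two index pairs and one value instead of one entry per cell, which is the source of the compactness noted above.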

[0090] Figure 4 is a flowchart showing an example of method 400, which includes steps that may be performed to carry out steps of method 300. Method 400 includes steps S410 to S460. Of these, steps S410 to S450 may carry out step S330 of method 300. Furthermore, step S460 may correspond to step S350.

[0091] In step S410, the current scene state is determined based on the audio scene description and the listener's position.

[0092] In step S420, it is determined whether the current scene state corresponds to a known scene state for which pre-calculated diffraction information is available (e.g., can be retrieved). The pre-calculated diffraction information may be retrieved, for example, from a bitstream (input bitstream) or storage (in particular, including external or shared storage). Determining whether the current scene state corresponds to a known scene state may include determining a hash value based on the current scene state. This may further include comparing the hash value of the current scene state with the hash values of known (e.g., previously encountered) scene states.

[0093] If it is determined that the current scene state corresponds to a known scene state (YES in step S430), the method proceeds to step S440.

[0094] In step S440, diffraction information is determined by extracting pre-calculated diffraction information for the known scene state from a bitstream or storage. The storage may be local storage (e.g., memory, cache, files, etc.) or shared storage (e.g., cloud storage, server storage).

[0095] Extracting pre-calculated diffraction information for known scene states may involve receiving a lookup table or entries to a lookup table from a bitstream (input bitstream) or storage. The lookup table may be considered a representation of the diffraction information. It may contain multiple pre-calculated diffraction information entries, each associated with a known scene state. The pre-calculated diffraction information and associated known scene states may correspond to the data elements described above. A known scene state may include, or indicate, a known audio scene description and a known listener position.

[0096] Selecting relevant entries from a received lookup table, or selecting relevant entries that should be received (if only those entries are received, and not the entire lookup table), may include using hash values, as described above.

[0097] On the other hand, if it is determined that the current scene state does not correspond to a known scene state (NO in step S430), the method proceeds to step S450.

[0098] In step S450, diffraction information is determined using a pathfinding algorithm based on the sound source position, the listener position, and the representation of the 3D audio scene. This may be done according to the procedure described above, with reference to Figure 2.

[0099] In step S460, the diffraction information acquired via step S440 or step S450 is output. This step may correspond to step S350 described above.
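The control flow of steps S410 to S460 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `scene_state_hash` and `get_diffraction_info`, the use of SHA-256 over a JSON serialization as the hash, and the in-memory dictionary standing in for the bitstream or storage are all assumptions. The scene description is assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical container: (corner voxel, path length)
DiffractionInfo = Tuple[Tuple[int, int], float]


def scene_state_hash(scene_description: dict, listener_pos: Tuple[int, int]) -> str:
    """S410/S420: derive a stable identifier for the current scene state."""
    payload = json.dumps(
        {"scene": scene_description, "listener": listener_pos}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_diffraction_info(
    scene_description: dict,
    listener_pos: Tuple[int, int],
    lookup: Dict[str, DiffractionInfo],
    pathfinder: Callable[[dict, Tuple[int, int]], DiffractionInfo],
) -> DiffractionInfo:
    key = scene_state_hash(scene_description, listener_pos)
    if key in lookup:                                   # S430: known scene state
        return lookup[key]                              # S440: reuse stored result
    info = pathfinder(scene_description, listener_pos)  # S450: run the pathfinding
    lookup[key] = info                                  # keep the result for later reuse
    return info                                         # S460: output


calls = []


def fake_pathfinder(scene: dict, pos: Tuple[int, int]) -> DiffractionInfo:
    """Stand-in for the pathfinding algorithm; records each invocation."""
    calls.append(pos)
    return ((2, 3), 4.5)


lut: Dict[str, DiffractionInfo] = {}
scene = {"source": [0, 0], "walls": [[1, 1], [1, 2]]}
first = get_diffraction_info(scene, (5, 5), lut, fake_pathfinder)
second = get_diffraction_info(scene, (5, 5), lut, fake_pathfinder)
```

In this sketch the second call returns the stored entry, so the pathfinder runs only once for the repeated scene state.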

[0100] In short, the proposed method may include (among other things) the following:
• Check whether the pre-calculated diffraction path information (i.e., pre-calculated diffraction information) can be retrieved from the cache and reapplied to the current scene state and listener position.
• Check whether the pre-calculated diffraction path information (i.e., pre-calculated diffraction information) for the current scene state can be retrieved from the memory cache or bitstream and reapplied.

[0101] Here, the scene state is defined via the input parameters L_vox, S_vox, and VoxDataDiffractionMap of the function DiffractionDirectionCalculation(), which includes the pathfinding algorithm, the selection of the voxel C_vox, and the diffraction path length estimation steps. The diffraction path information (e.g., diffraction information) is defined via the output parameters C_vox and r_in. This diffraction path information can be obtained directly from the bitstream syntax voxSceneDiffractionPreComputedPathData() for the corresponding scene state, if available, thus avoiding the call to DiffractionDirectionCalculation().

[0102] When the diffraction path information C_vox, r_in is obtained for the current scene state L_vox, S_vox, VoxDataDiffractionMap, this information can be cached in memory (and provided outside the renderer) for later reuse by the renderer (or other renderer instances).

[0103] In other words, the "diffraction pathfinding" according to this disclosure (e.g., as embodied by Method 300 and / or Method 400) may include the following processes:
- Check whether the pre-calculated diffraction path information for the current scene state can be retrieved from the memory cache or bitstream and reapplied. The scene state is defined via the input parameters L_vox, S_vox, and VoxDataDiffractionMap of the function DiffractionDirectionCalculation(), which includes the pathfinding algorithm, the selection of the voxel C_vox, and the diffraction path length estimation steps:
[C_vox, r_in] = DiffractionDirectionCalculation(L_vox, S_vox, VoxDataDiffractionMap)
The diffraction path information is defined via the output parameters C_vox and r_in. This diffraction path information can be obtained directly from the bitstream syntax voxSceneDiffractionPreComputedPathData() for the corresponding scene state, if available, thus avoiding the call to DiffractionDirectionCalculation().
- When the diffraction path information C_vox, r_in is obtained for the current scene state L_vox, S_vox, VoxDataDiffractionMap, this information can be cached in memory (and provided outside the renderer) for later reuse by the renderer (or other renderer instances).

[0104] In the above, the bitstream syntax definition may be written in the function() format in the MPEG standard document. This defines how to read / parse data (bitstream elements) from the bitstream. In this case, it is used to obtain the variables / information necessary to reconstruct the diffraction path information.

[0105] Figures 5, 6, and 7 show an example of a processing chain 500 according to the above, which can be used to process audio scene information for audio rendering. Specifically, the processing chain 500 can be used to convert voxel-related data into parameters and signals necessary for audibility.

[0106] Figure 5 illustrates the case where the current scene state is an unknown scene state. Unlike the processing chain 100 in Figure 1, the bitstream / memory 510 further includes data elements containing diffraction information and associated scene states, for example, in the form of a lookup table as described above.

[0107] Similar to processing chain 100, processing chain 500 receives an audio scene description 20 from the bitstream (or storage / memory) 510. Processing chain 500 also receives an indication of the user's (listener's) position (listener position) 30 within the audio scene.

[0108] The diffraction direction calculation block (diffraction calculation block) 40 for determining (e.g., calculating) diffraction information, and the diffraction modeling tool 50 for applying diffraction modeling and optional occlusion modeling based on the diffraction information may be the same as in the case of the processing chain 100.

[0109] However, unlike the processing chain 100 in Figure 1, the audio scene description 20 and listener position 30 are used to determine the scene state 515 or scene state identifier. This scene state 515 (e.g., scene state N1 as defined above) or scene state identifier (e.g., HASH(N1)) is provided / input to the scene state analysis block 520, which determines whether the current scene state 515 corresponds to a known scene state 530 (YES in block 535) or not (NO in block 535). In this example, the current scene state 515 does not correspond to a known scene state (i.e., the current scene state 515 is an unknown scene state). Therefore, the audio scene description 20 and listener position 30 are input to the diffraction direction calculation block 40, similar to the processing chain 100, to generate diffraction information 550. The diffraction information 550 is then used for rendering / diffraction modeling, similar to the case of the processing chain 100. However, diffraction information 550 (for example, diffraction information N2 as described above or its quantized version N3) is output via an interface for later reuse by a renderer or other (external) rendering instance. Specifically, diffraction information 550, along with the corresponding scene state, may be output to a bitstream (or memory / storage) 510 via an output interface 555.

[0110] Figure 6 shows the case where the current scene state 515 is a known scene state.

[0111] Similarly, the current scene state 515 is provided / input to the scene state analysis block 520 to determine whether the current scene state 515 corresponds to a known scene state 530. In this example, the current scene state 515 corresponds to a known scene state. Therefore, instead of inputting the audio scene description 20 and listener position 30 to the diffraction direction calculation block 40 to calculate / generate diffraction information, the diffraction information is extracted / received from the bitstream (or storage / memory) 510 as described above (e.g., via step S440 of method 400). Furthermore, even if the diffraction information is not calculated locally, the diffraction information may be output to the bitstream (or memory / storage) 510, as in the case of Figure 5. This is because, once the diffraction information is obtained from one source (e.g., from a bitstream or storage), it can be made available to each of the other sources.

[0112] Figure 7 shows the complete processing chain 500, including data paths for both known and unknown scene states 515.

[0113] Figure 8 shows a measure of complexity as a function of time for different implementations that process audio scene information for audio rendering, assuming a simple maze as the audio scene. Furthermore, it is assumed that the user moves randomly through the maze and therefore revisits previously visited locations. Graph 810 concerns the case where pre-calculated diffraction information is not available at all (e.g., diffraction information is not provided in a bitstream and memory / cache is disabled). In this case, the rendering load is substantially constant and relatively high. Graph 820 concerns the case where pre-calculated diffraction information is available locally (e.g., diffraction information is not provided in a bitstream and local memory / cache is enabled). In this case, the rendering load decreases over time because the diffraction information is stored locally. In other words, as more and more scene states are encountered, an increasing share of the scene states corresponds to (locally) known scene states. Finally, graph 830 concerns the case where pre-calculated diffraction information is provided externally (e.g., complete diffraction information is provided in a bitstream). In this case, a significant portion of the scene states corresponds to known scene states, and diffraction information can be retrieved externally (e.g., from a bitstream or from external / shared storage on request) without local calculations, so the renderer's computational load is always low.

[0114] Figure 9 schematically illustrates an example of a possible use case of the technology according to embodiments of the present disclosure. Two listeners (users) A and B are shown at different locations within an audio scene (e.g., a house with different areas and floors). Users A and B may be users individually or jointly exploring a VR environment including the audio scene, for example, as part of a game, virtual tour, etc. Users exploring a common VR environment may, for example, be running social VR. Because the listeners are at different locations within the audio scene, users A and B generate different rendering results and different diffraction information. The present disclosure assumes that each user (or their respective device / decoder / renderer) makes their calculated diffraction information available to the other user. When user B enters an area of the audio scene where user A was previously located, user B can benefit from user A's pre-calculated diffraction information, and vice versa. For example, user A's diffraction information may be made available to user B via a LUT that indexes different diffraction information items with corresponding scene states or scene state identifiers. By exchanging diffraction information between different devices / decoders / renderers, the computational load on both users' devices / decoders / renderers can be reduced depending on their motion patterns within the audio scene.

[0115] Furthermore, since users (listeners) tend to behave similarly, diffraction information (diffraction data) accumulates, particularly for relevant (e.g., frequently occurring) scene states. This is extremely difficult to achieve for encoder-side pre-calculation of diffraction information because the encoder does not have access to the actual listener locations and can therefore only make assumptions about them. Moreover, the use of data storage (e.g., physical / shared storage or bitstream bandwidth) is further inefficient for encoder-side pre-calculation in this case, due to some of the pre-calculated diffraction information relating to irrelevant or less relevant scene states.

[0116] For example, the proposed functions and technologies can create LUTs that correspond to the user's actual 6DoF behavior (not those assumed by the encoder), and therefore can be said to be related to smart, user-oriented LUT creation.

Representation and coding of acoustic path information

[0117] The above describes methods for providing, outputting, storing, exchanging, and / or reusing pre-calculated scene state information. In connection with or in addition to this, it may be important to provide an interface for providing, outputting, storing, exchanging, and / or reusing acoustic path information between encoders and decoders / renderers (e.g., encoder-decoder, decoder 1-decoder 2, or decoder 1-decoder 1). To do so, independently of the details of calculating and exchanging acoustic path information, this disclosure provides a mechanism for, for example, an efficient representation (e.g., coding) of acoustic path information for storage or transmission. Thus, this mechanism may be used independently or in combination with the techniques described elsewhere in this disclosure.

[0118] Here, the acoustic path information may relate to, for example, pre-calculated acoustic path information for all feasible pairs of start and end positions (start voxel and end voxel) in a voxel grid (note that for some pairs, no valid path may exist). In this case, the acoustic path information may be provided, for example, by an encoder. Alternatively, the acoustic path information may relate to acoustic path information calculated at runtime by a decoder / renderer, for example, for later reuse by the same decoder or for use by a different decoder. This disclosure provides different modes for representing (e.g., encoding) the acoustic path information, depending on the respective use case, for example, depending on the amount and / or nature of the pre-calculated acoustic path information.

[0119] In a reference model (RM) for representing or encoding acoustic path information, pairs of "from-to" voxel coordinates (e.g., user and object voxel coordinates) are encoded for each acoustic diffraction path.
• If there is an acoustic diffraction path for a given pair of voxel coordinates: the diffraction corner coordinates (which determine the position of the diffraction source) and the diffraction path length (which determines the level / gain of the diffraction source) are explicitly coded and transmitted.
• Otherwise: no further information is coded or transmitted.

[0120] However, the RM approach for representing pre-calculated acoustic path data (acoustic diffraction path data) has high redundancy for the following reasons:
• Repetition of voxel coordinates for static scene states (with respect to either the user or object position), and
• Redundant precision in the representation of path lengths (encoded as IEEE float32).

[0121] This disclosure addresses both issues as follows:
• Avoiding repetition of voxel coordinates (for example, by a nested coding order for pairs of voxels), and
• Applying differential diffraction path length coding (for example, by traversing the voxel grid continuously and representing the path length as a number of voxels or as a difference from a previous value).

[0122] Generally, acoustic path information may include one or more path information items (acoustic path information items), each associated with a specific acoustic path. Each path information item specifies a first position in the voxel grid, a second position, the path length of the acoustic path between the first and second positions in the voxel grid, and a corner voxel. The first position (for example, the first voxel) may relate to the sound source position S_VOX defined above or to the start position of the acoustic path (e.g., the start voxel). The second position (e.g., the second voxel) may relate to the listener position L_VOX defined above or to the end position of the acoustic path (e.g., the end voxel). For example, the start position may be given by the syntax element pcpdStartVoxelPacked defined below, and the end position may be given by the syntax element pcpdEndVoxelPacked defined below. In either case, the assignment of the first and second positions to the start and end positions may be reversed in some implementations, depending on the use case and requirements. A corner voxel may be a voxel on the acoustic path where the acoustic path changes direction. In some implementations, a corner voxel may further be required to be visible from the second position (e.g., the end position) in the sense that a line of sight must exist between the second position and the corner voxel in the voxel grid. If more than one voxel satisfies this definition, the voxel that satisfies the definition and is furthest from the second position (or closest to the first position) may be designated as the corner voxel.

[0123] This disclosure aims to efficiently encode sequences of path information items. Generally, when encoding a given path information item, the technology provided by this disclosure attempts to reuse information related to a preceding path information item. For example, even if one or both of the first and second positions differ from one path information item to the next, the corner voxels and / or path lengths may be the same or similar.

[0124] To increase the probability that information can be reused among path information items, the technology of this disclosure performs encoding and decoding in a nested manner. That is, path information items in a sequence of path information items are grouped in the sequence by their respective first positions. Furthermore, for each first position, path information items are preferably arranged such that the respective second positions of adjacent path information items in the sequence are close to each other, for example, adjacent within the voxel grid.

[0125] One exemplary implementation of the technology described herein attempts to reuse information about at least corner voxels. An example of a corresponding method for encoding acoustic path information is described with reference to Figures 12 and 13.

[0126] Figure 12 is a flowchart of a method 1200 for processing audio scene information, in particular a method 1200 for encoding acoustic path information, which includes multiple path information items, for a given voxel-based audio scene representation.

[0127] In step S1210, a voxel-based audio scene representation of the audio scene is obtained (e.g., received, extracted from a bitstream, read from storage, etc.).

[0128] In step S1220, for one or more first positions and one or more second positions within a two-dimensional voxel grid related to the voxel-based audio scene representation, the path information items for the respective first and second positions are sequentially encoded. That is, the path information items may be arranged in a given sequence (e.g., a predetermined sequence) and may be encoded sequentially according to this sequence. This sequence may be such that path information items specifying the same first position are consecutive, i.e., form a subsequence within the sequence. In this sense, the above sequence can be said to relate to nested processing of the first and second positions and to the grouping of path information items specifying the same first position. Preferably, for each such group, the second positions specified by the path information items within the group are processed according to a predetermined pattern, as described in more detail below.

[0129] As described above, each path information item may specify the first position, the second position, the path length of the acoustic path between the first and second positions in the voxel grid, and the corner voxel on the acoustic path where the acoustic path changes direction. It is understood that the voxel grid may be related to a two-dimensional projection map generated from a voxel-based audio scene representation.

[0130] In step S1230, for the current path information item, an encoded path information item is generated based on that path information item. The generated encoded path information item includes at least an indication of the respective first position and an indication of the respective second position. It may include further encoded information, as detailed below.

[0131] Further encoding steps for the current path information item differ depending on whether the corner voxel specified by the current path information item differs from the corner voxel specified by the preceding path information item (i.e., preceding in the sequence), and therefore the further content of the encoded path information item also differs. It is understood that method 1200 may include a step of determining whether this is the case (not shown).

[0132] Step S1240 relates to the case where the corner voxel specified by the current path information item differs from the corner voxel specified by the preceding path information item. In this case, the encoded path information item includes an indication of the corner voxel (location). In other words, the indication of the corner voxel is included in (or added to) the encoded path information item. This indication of the corner voxel may be an absolute, explicit, and / or non-differential indication of the corner voxel, for example, using the voxel coordinates or voxel index of the corner voxel. The indication may be related, for example, to the syntax element pcpdSourceDirectionPacked as defined below. Furthermore, the encoded path information item may include an indication, for example, in the form of a single-bit flag, that the corner voxel is different from that of the preceding path information item. This bit flag may be associated with the flag pcpdUsePrevSourceDirection=="false" (in the first mode, e.g., the selection mode defined below) or the flag pcpdUsePrevData=="false" (in the second mode, e.g., the full mode defined below).

[0133] Step S1250 relates to the case where the corner voxel specified by the current path information item is the same as (i.e., identical to) the corner voxel specified by the preceding path information item. In this case, the encoded path information item includes, instead of an indication of the corner voxel, an indication that the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item. In other words, this indication is included in (or added to) the encoded path information item. This indication may, for example, relate to a single-bit flag. This bit flag may relate to the flag pcpdUsePrevSourceDirection=="true" defined below (in the first mode, e.g., selection mode) or the flag pcpdUsePrevData=="true" defined below (in the second mode, e.g., full mode).
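Steps S1230 to S1250 can be sketched as a loop that compares each corner voxel with the one from the preceding item. This is a hedged illustration only: the function name `encode_corner_reuse`, the tuple layout of a path information item, and the dictionary keys are hypothetical, and the records are symbolic rather than packed bits. The path length is always written explicitly here, which corresponds to the behavior described for the first mode.

```python
from typing import Dict, List, Optional, Tuple

Voxel = Tuple[int, int]
# Hypothetical layout: (first position, second position, path length, corner voxel)
PathItem = Tuple[Voxel, Voxel, float, Voxel]


def encode_corner_reuse(items: List[PathItem]) -> List[Dict[str, object]]:
    """Emit one symbolic record per item, reusing the previous corner voxel if equal."""
    encoded: List[Dict[str, object]] = []
    prev_corner: Optional[Voxel] = None
    for first, second, length, corner in items:
        # S1230: every record carries both positions (and here the path length).
        record: Dict[str, object] = {
            "first": first,
            "second": second,
            "path_length": length,
        }
        if corner == prev_corner:
            record["use_prev_corner"] = True    # S1250: flag only, no corner voxel
        else:
            record["use_prev_corner"] = False   # S1240: explicit corner voxel
            record["corner"] = corner
        encoded.append(record)
        prev_corner = corner
    return encoded


items = [
    ((0, 0), (3, 0), 3.0, (1, 0)),
    ((0, 0), (3, 1), 4.0, (1, 0)),  # same corner voxel as the preceding item
    ((0, 0), (3, 2), 5.0, (2, 1)),
]
encoded = encode_corner_reuse(items)
```

The first item has no predecessor, so its corner voxel is always written explicitly in this sketch.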

[0134] Method 1200 may further include, for example, the step (not shown in the drawings) of outputting the encoded path information items to a bitstream for transmission or storage.

[0135] Starting from the above techniques, the disclosure provides two different modes (coding modes) that can be selected for encoding acoustic path information: a first mode (e.g., selection mode) that can be used when, for example, only a relatively small number of path information items should be encoded, and a second mode (e.g., full mode) that can be used when, for example, a large number of path information items should be encoded for, for example, all feasible pairs of first and second positions in an acoustic scene. Whether the first mode is used for a path information item (or sequence of path information items) or the second mode is used for a path information item may be signaled by a flag included in the bitstream, a flag valid for the entire sequence, or a flag within the encoded path information item. This flag may be, for example, the flag pcpdFullMode defined below.

[0136] In the first mode (e.g., pcpdFullMode=="false"), the generated encoded path information item further includes a path length indication (e.g., pcpdPathLength as defined below). This path length indication may be an absolute, explicit, and / or non-differential indication of the path length, for example, in units of voxel or voxel edge length. Thus, the first mode may reuse previous indications of corner voxels, but may include path length indications regardless of whether the corner voxel changes from one path information item to the next.

[0137] As an alternative implementation of the first mode, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the generated encoded path information item may include an indication (e.g., a 1-bit flag) of whether it includes a path length indication or an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item. Depending on this indication, the encoded path information item may include a path length indication, or an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item. The latter may, for example, be a 2-bit value.

[0138] In the second mode (for example, pcpdFullMode=="true"), the generated encoded path information item does not necessarily include an indication of the path length. That is, in the second mode, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item further includes an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item. This indication may take the form of the syntax element pcpdPathLengthDelta, for example, as defined below.

[0139] The above difference between path lengths can be encoded very efficiently for a specific order of path information items. Therefore, for encoding in the second mode, for each first position, the voxel grid may be traversed according to a predetermined pattern to determine a sequence of second positions, and for each first position, the path information items are sequentially encoded according to the determined sequence of second positions. That is, the sequence of path information items is determined by traversing the voxel grid, and encoding is performed according to this sequence.

[0140] One example of such a predetermined pattern is traversing a voxel grid in a raster scan manner along the rows and columns of the voxel grid.
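A nested coding order with a raster scan over second positions can be sketched as follows. The helper names `raster_scan` and `nested_pair_order` and the (row, column) convention are assumptions for illustration; the sketch also enumerates every grid cell as a second position, whereas a real encoder would presumably restrict itself to feasible positions.

```python
from typing import Iterator, List, Tuple

Voxel = Tuple[int, int]  # hypothetical (row, column) index


def raster_scan(rows: int, cols: int) -> Iterator[Voxel]:
    """Visit the grid row by row, left to right within each row."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


def nested_pair_order(
    rows: int, cols: int, first_positions: List[Voxel]
) -> List[Tuple[Voxel, Voxel]]:
    """Outer loop over first positions, inner raster scan over second positions."""
    return [
        (first, second)
        for first in first_positions
        for second in raster_scan(rows, cols)
    ]


order = nested_pair_order(2, 3, [(0, 0), (1, 2)])
```

All pairs sharing a first position are consecutive in `order`, and within a row consecutive second positions are grid neighbors, which is what makes reuse of the preceding item likely.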

[0141] When using such patterns, the difference in the above path lengths can be encoded very efficiently by using only 2 bits. In other words, the above difference may take only one of four predetermined values, in the sense that four predetermined values are sufficient to encode the above difference (assuming that the corner voxels between the current path information item and the preceding path information item are the same). Examples of these four predetermined values are given in Table 3 below, assuming that the voxel edge length is 1.

[0142] Accordingly, the bitstream may include an indication (for example, the bit flag pcpdFullMode, defined below), for the sequence of encoded path information items, of whether the first mode or the second mode is used. Furthermore, each encoded path information item may include the following:
• An indication of the first position (for example, pcpdStartVoxelPacked as defined below)
• An indication of the second position (for example, pcpdEndVoxelPacked as defined below), and
• An indication of whether an acoustic path exists for the first and second positions (for example, the bit flag pcpdPathExists as defined below)

[0143] When the first mode (e.g., selection mode) is used (e.g., pcpdFullMode=="false"), the encoded path information item further includes:
• An indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (for example, a 1-bit flag such as the bit flag pcpdUsePrevSourceDirection defined below)
• If the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (for example, pcpdUsePrevSourceDirection=="true"), the path length indication (for example, pcpdPathLength as defined below)
• Otherwise (for example, pcpdUsePrevSourceDirection=="false"), the corner voxel indication (for example, pcpdSourceDirectionPacked as defined below) and the path length indication (for example, pcpdPathLength as defined below)

[0144] In an alternative embodiment of the first mode (e.g., selection mode), the encoded path information item may further include the following instead of the above:
• An indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (for example, a 1-bit flag such as the bit flag pcpdUsePrevSourceDirection defined below)
• If the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (e.g., pcpdUsePrevSourceDirection=="true"), an indication (e.g., a 1-bit flag) of whether the path length is coded non-differentially (e.g., in absolute terms or explicitly) or differentially (e.g., as a difference value in relative terms)
  ○ When the path length is coded non-differentially, the (non-differential, absolute, or explicit) path length indication (for example, pcpdPathLength as defined below)
  ○ When the path length is coded differentially, an indication of the difference between the previous path length and the current path length (for example, a 2-bit value), where the previous path length is the path length specified by the preceding path information item, and the current path length is the path length specified by the path information item corresponding to the encoded path information item
• If the corner voxel specified by the path information item corresponding to the encoded path information item is not the same as the corner voxel specified by the preceding path information item (for example, pcpdUsePrevSourceDirection=="false"), the corner voxel indication (for example, pcpdSourceDirectionPacked as defined below) and the path length indication (for example, pcpdPathLength as defined below)

[0145] If the second mode (e.g., full mode) is used (e.g., pcpdFullMode=="true"), the encoded path information item further includes the following:
• An indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (for example, the bit flag pcpdUsePrevData defined below)
• If the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item (for example, pcpdUsePrevData=="true"), an indication of the difference between the previous path length and the current path length (for example, a 2-bit value such as pcpdPathLengthDelta as defined below), where the previous path length is the path length specified by the preceding path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item
• Otherwise (for example, pcpdUsePrevData=="false"), the corner voxel indication (for example, pcpdSourceDirectionPacked as defined below) and the path length indication (for example, pcpdPathLength as defined below)
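The mode-dependent content of an encoded path information item, as listed above for the first mode and the second mode, can be sketched as a field-selection function. The syntax element names are taken from the text; everything else is an assumption for illustration: the function name `encode_item`, the dictionary layout of an item, the restriction to items for which a path exists, and the use of symbolic values instead of packed bits. In particular, the path length difference is stored as a raw number here, not as the 2-bit code of Table 3.

```python
from typing import Dict, Optional


def encode_item(item: Dict, prev: Optional[Dict], full_mode: bool) -> Dict:
    """Select the syntax elements written for one item (a path is assumed to exist)."""
    fields: Dict[str, object] = {
        "pcpdStartVoxelPacked": item["start"],
        "pcpdEndVoxelPacked": item["end"],
        "pcpdPathExists": True,
    }
    same_corner = prev is not None and prev["corner"] == item["corner"]
    if full_mode:
        fields["pcpdUsePrevData"] = same_corner
        if same_corner:
            # Second mode: only the difference to the previous path length.
            fields["pcpdPathLengthDelta"] = item["length"] - prev["length"]
            return fields
    else:
        fields["pcpdUsePrevSourceDirection"] = same_corner
        if same_corner:
            # First mode: corner voxel is reused, path length stays explicit.
            fields["pcpdPathLength"] = item["length"]
            return fields
    # Corner voxel differs (or there is no predecessor): write both explicitly.
    fields["pcpdSourceDirectionPacked"] = item["corner"]
    fields["pcpdPathLength"] = item["length"]
    return fields


prev_item = {"start": (0, 0), "end": (3, 1), "corner": (1, 0), "length": 4.0}
cur_item = {"start": (0, 0), "end": (3, 2), "corner": (1, 0), "length": 5.0}
full = encode_item(cur_item, prev_item, full_mode=True)
selection = encode_item(cur_item, prev_item, full_mode=False)
```

For the same pair of items, the second mode writes only a difference while the first mode writes the full path length; neither repeats the corner voxel.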

[0146] Figure 13 is a flowchart of a method 1300 for processing audio scene information, in particular a method 1300 for decoding acoustic path information, which includes multiple (encoded) path information items, for a given voxel-based audio scene representation. The decoding method 1300 may include steps that mirror the steps of the corresponding encoding method 1200. Thus, it is understood that the data elements mentioned below (e.g., indications, flags, etc.) may correspond to those defined above in the context of method 1200. In other words, the bitstream received by method 1300 may be the bitstream output by method 1200.

[0147] In step S1310, a bitstream is received. The bitstream contains a sequence of encoded path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to a voxel-based audio scene representation.

[0148] Each encoded path information item corresponds to a respective path information item that specifies the first position, the second position, the path length of the acoustic path between the first and second positions in the voxel grid, and the corner voxel on the acoustic path where the acoustic path changes direction. It is understood that the bitstream received in this step may be the bitstream generated or output by method 1200 described above.

[0149] In step S1320, the encoded path information items are sequentially decoded to generate (for example, restore) the corresponding path information items.

[0150] As described above, the encoded path information items may be arranged in a given sequence (e.g., a predetermined sequence) within the bitstream and may be decoded sequentially according to this sequence. This sequence may be such that encoded path information items specifying the same first position are consecutive to each other, i.e., form a subsequence within the sequence. In this sense, the sequence can be said to relate to nested processing of the first and second positions, and to the grouping of encoded path information items that specify the same first position. Preferably, for each such group, the second positions specified by the encoded path information items within the group are processed according to a predetermined pattern as described above.
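The grouping described above can be sketched as a nested loop: an outer loop over first positions and an inner raster scan over second positions. The function names and the choice of x as the fastest-varying coordinate are assumptions made for illustration; the disclosure only requires a predetermined pattern known to both encoder and decoder.

```python
from typing import Iterable, Iterator, Tuple

Position = Tuple[int, int]


def raster_scan(length: int, width: int) -> Iterator[Position]:
    """Yield the (x, y) positions of a 2D grid row by row, x varying fastest."""
    for y in range(width):
        for x in range(length):
            yield (x, y)


def coding_order(first_positions: Iterable[Position],
                 length: int, width: int) -> Iterator[Tuple[Position, Position]]:
    """Yield (first, second) pairs so that items sharing a first position are consecutive."""
    for first in first_positions:
        for second in raster_scan(length, width):
            yield (first, second)
```

Because consecutive second positions are grid neighbors within a row, neighboring items tend to share a corner voxel and differ only slightly in path length, which the encoded representation exploits.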

[0151] Further decoding steps for the current encoded path information item depend on whether the current encoded path information item contains an indication that the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item.

[0152] Therefore, in step S1330, it is determined whether the current encoded path information item contains an indication that the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item. This indication may be, for example, a 1-bit flag. This bit flag may correspond to the flag pcpdUsePrevSourceDirection (in the first mode, e.g., selection mode) or the flag pcpdUsePrevData (in the second mode, e.g., full mode).

[0153] Step S1340 relates to the case where the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item (for example, pcpdUsePrevSourceDirection=="true" in the first mode, or pcpdUsePrevData=="true" in the second mode). In this case, the corner voxel specified by the path information item corresponding to the preceding encoded path information item is set as the corner voxel for the path information item corresponding to the current encoded path information item. That is, the corner voxel of the preceding decoded path information item is reused as the corner voxel for the current decoded path information item.

[0154] Step S1350 relates to the case where the corner voxel specified by the corresponding path information item differs from the corner voxel specified by the path information item corresponding to the preceding encoded path information item (for example, pcpdUsePrevSourceDirection=="false" in the first mode, or pcpdUsePrevData=="false" in the second mode). In this case, the (absolute, explicit, and / or non-differential) indication of the corner voxel (e.g., pcpdSourceDirectionPacked) is extracted from the current encoded path information item.

[0155] As described above, the present disclosure provides two different modes that can be selected for encoding and decoding acoustic path information: a first mode (e.g., a selection mode) that can be used when, for example, only a relatively small number of path information items is to be encoded, and a second mode (e.g., a full mode) that can be used when a large number of path information items is to be encoded, for example, for all feasible pairs of first and second positions in an acoustic scene.

[0156] In the first mode, the encoded path information item obtained from the bitstream further includes an indication of the path length (e.g., pcpdPathLength). This may be an absolute, explicit, and / or non-differential indication of the path length, for example, in units of voxels or voxel edge lengths. Therefore, generating the path information item corresponding to the current encoded path information item further includes extracting the indication of the path length from the current encoded path information item.

[0157] In the second mode, the encoded path information items do not necessarily include an indication of the path length. If the corner voxel remains unchanged, they may instead include an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item (e.g., pcpdPathLengthDelta).

[0158] Therefore, if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, the decoding in the second mode (e.g., step S1340) may further include extracting an indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the current encoded path information item.
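Steps S1330 to S1350 and the mode-dependent handling of the path length can be sketched as follows. The sketch operates on already parsed fields (a dictionary keyed by the flag names of the disclosure) rather than on raw bits; `decode_item`, the `(corner, length)` tuple, and the code-to-value table `DELTA_VALUES` are hypothetical.

```python
import math
from typing import Optional, Tuple

D = math.sqrt(2) - 1
# Hypothetical mapping from 2-bit codes to the four permitted length differences.
DELTA_VALUES = {0: +1.0, 1: -1.0, 2: +D, 3: -D}


def decode_item(fields: dict,
                prev: Optional[Tuple[int, float]]) -> Tuple[int, float]:
    """Return (corner, length) for the current item.

    prev is the (corner, length) of the preceding decoded item, or None.
    """
    if fields["pcpdFullMode"]:
        # Second (full) mode.
        if fields["pcpdUsePrevData"]:                      # S1330 -> S1340
            corner = prev[0]                               # reuse previous corner
            length = prev[1] + DELTA_VALUES[fields["pcpdPathLengthDelta"]]
        else:                                              # S1330 -> S1350
            corner = fields["pcpdSourceDirectionPacked"]
            length = fields["pcpdPathLength"]
        return corner, length
    # First (selection) mode: the path length is always read explicitly.
    if fields["pcpdUsePrevSourceDirection"]:               # S1340
        corner = prev[0]
    else:                                                  # S1350
        corner = fields["pcpdSourceDirectionPacked"]
    return corner, fields["pcpdPathLength"]
```

The sketch uses floating-point path lengths for brevity; an implementation that needs bit-exact recovery would presumably keep the length in an exact representation, which is outside the scope of this illustration.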

[0159] As described above, for each first position, the one or more second positions may relate to positions obtained by traversing the voxel grid according to a predetermined pattern that defines the sequence of second positions. It is understood that the encoded path information items are then decoded sequentially according to the sequence of second positions.

[0160] For example, the predetermined pattern may traverse the voxel grid in a raster scan manner along the rows and columns of the voxel grid. In this case, the above difference may be encoded with 2 bits, or in other words, it may take only one of four predetermined values.

[0161] Next, an example implementation of the mechanism described above is described.

[0162] This implementation supports the coding of pre-calculated acoustic path information (e.g., acoustic diffraction path data) in accordance with the above, in two scene data modes:
• "Full" scene data mode, in which diffraction data is pre-calculated for all voxels in the scene (i.e., for low-complexity rendering scenarios). This corresponds to the second mode above (e.g., full mode).
• "Selection" scene data mode, in which diffraction data is calculated for a subset of the scene's voxels. This corresponds to the first mode described above (e.g., selection mode).

[0163] These two modes make it possible to support different scenarios for applying coding in accordance with this disclosure, for example, the following:
• "Full" mode: "encoder to renderer" (all data is pre-calculated and used by the renderer), and
• "Selection" mode: "encoder / renderer to renderer" (for example, some data is pre-calculated, but new data can be added or replaced during the rendering process).

[0164] Therefore, "Selection" mode offers more real-time relevant capabilities (for example, one renderer can act as an encoder for another), while "Full" mode can typically offer a smaller coded data size and / or lower rendering complexity.

[0165] In one particular example, the bitstream syntax (data interface) may be defined as follows:

[Bitstream syntax table not reproduced]

[0166] The encoder (or renderer) can select the coding mode for the pre-calculated acoustic diffraction path data (signaled by pcpdFullMode or another appropriate flag) according to the achieved or desired compression performance for the current application scenario or scene. In "selection" coding mode (e.g., pcpdFullMode=="false"), if an acoustic diffraction path exists (e.g., pcpdPathExists=="true") and previous data is unavailable (e.g., pcpdUsePrevSourceDirection=="false"), the corner voxel coordinates (e.g., packed coordinates) and the path length data are read explicitly from the bitstream (similar to RM). On the other hand, if previous data is available (e.g., pcpdUsePrevSourceDirection=="true"), the last transmitted corner voxel coordinate data is used:
pcpdSourceDirectionPacked = pcpdSourceDirectionPacked_Prev
In particular, in "selection" coding mode, the path length must be transmitted for each path (e.g., whenever pcpdPathExists=="true").

[0167] Alternatively, in "selection" coding mode, if previous data is available, the bitstream may include an indication (e.g., a 1-bit flag) of whether the path length is transmitted explicitly or coded differentially with reference to the previous data (e.g., using a 2-bit value).

[0168] In "full" coding mode (e.g., pcpdFullMode=="true"), if an acoustic diffraction path exists (e.g., pcpdPathExists=="true") and previous data is unavailable (e.g., pcpdUsePrevData=="false"), the corner voxel coordinates (e.g., packed coordinates) and the path length data are read explicitly from the bitstream (similar to RM). If previous data is available (e.g., pcpdUsePrevData=="true"), the last transmitted corner voxel and path length values are used for the calculation:
pcpdSourceDirectionPacked = pcpdSourceDirectionPacked_Prev
pcpdPathLength = pcpdPathLength_Prev + Delta(pcpdPathLengthDelta)

[0169] The path length difference (Delta) is encoded with 2 bits (e.g., by pcpdPathLengthDelta) because the difference relative to the previous path length can only be ±1 or ±d, where d = sqrt(2) - 1. This is because the path length is calculated as the sum of horizontal steps and diagonal steps on a uniform voxel grid.

[0170] Next, the evaluation results for the data compression performance of the technology disclosed herein are described.
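One way to read the four permitted values is the following worked relation. The symbols n_s (number of straight steps) and n_d (number of diagonal steps) are introduced here for illustration only, and the case split is an interpretation of the text rather than a statement from the disclosure.

```latex
\begin{align*}
\ell &= n_s + \sqrt{2}\, n_d
  && \text{path length in voxel edge lengths} \\
\Delta\ell &= \pm 1
  && \text{if } n_s \to n_s \pm 1 \text{ with } n_d \text{ unchanged} \\
\Delta\ell &= \pm\left(\sqrt{2} - 1\right) = \pm d \approx \pm 0.414
  && \text{if } n_d \to n_d \pm 1 \text{ and } n_s \to n_s \mp 1
\end{align*}
```

Under this reading, moving the second position to a neighboring voxel either adds or removes one straight step, or exchanges one straight step for one diagonal step, so the set {+1, -1, +d, -d} has exactly four members and fits in 2 bits.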

[0171] Bitrate comparisons for all MPEG-I CfP Test1 scenes (VoxData payload only) are shown in Table 4 and Figure 14. The decoder output remains bit-exact with respect to RM. The overall bitrate savings for Test1 are approximately 38% (full mode) and approximately 16% (selection mode). The conditions are:
- RM: current reference model (v25 bitstream)
- PCPD (Full): pre-calculated path data coding (full mode)
- PCPD (Select): pre-calculated path data coding (selection mode)
[Table 4]

Voxel coordinate / index representation

[0172] The following efficient voxel index representation may be applicable, for example, to the transmission or storage of both voxel grid and diffraction map (VoxDataDiffractionMap) entries. It may replace any fixed-length representation of the voxel index (voxel coordinates).

[0173] In the context of the proposed representation, the following steps may be taken.

[0174] Step 1: Determine the number of bits required for the current grid resolution / diffraction map dimension. For a 3D voxel grid and a 2D diffraction map, these numbers, NbitsVox and NbitsMap, may be determined, for example, as follows:
NbitsVox = ceil(log2(L*W*H-1))
NbitsMap = ceil(log2(L*W-1))
Here, L, W, and H (length, width, and height) are the dimensions of the voxel grid and of the diffraction map, respectively. These values may differ between the voxel grid and the diffraction map. Step 1 may be applied on both the encoder side and the decoder side.
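Step 1 can be sketched in Python as below. The sketch computes the number of bits needed to represent the largest packed index (count minus one) using integer arithmetic. This agrees with the formula above for most grid sizes, but yields one bit more when the voxel count minus one is an exact power of two (for example, 5 voxels with largest index 4); treating that case this way is an assumption of the sketch, and the function names are hypothetical.

```python
def bits_for_indices(count: int) -> int:
    """Number of bits needed to represent every index in 0 .. count - 1."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # int.bit_length() gives the bits needed for the largest index, count - 1.
    return (count - 1).bit_length()


def nbits_vox(length: int, width: int, height: int) -> int:
    """Bits per packed index for a 3D voxel grid of size L x W x H."""
    return bits_for_indices(length * width * height)


def nbits_map(length: int, width: int) -> int:
    """Bits per packed index for a 2D diffraction map of size L x W."""
    return bits_for_indices(length * width)
```

Using integer `bit_length` instead of a floating-point logarithm also avoids rounding issues for large grids.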

[0175] Step 2: The voxel index (x,y,z) and the diffraction map index (x,y) are mapped to a packed representation index (Idx) and encoded using NbitsVox and NbitsMap bits, respectively. In one embodiment, the packed representation index is zero-based and may range from 0 to L*W*H-1 for voxels and from 0 to L*W-1 for diffraction maps. The mapping from the index (x,y,z) to the packed representation index may be, for example, as follows:
Idx(x,y,z) = (x-1) + ((y-1)*L) + ((z-1)*L*W)
Step 2 may be performed only on the encoder side.

[0176] In the above, the packed representation index is an index that can uniquely identify a voxel in a voxel grid or diffraction map. In other words, a voxel in a voxel grid is assigned a unique consecutive index, and as a result, each voxel in the voxel grid can be uniquely identified by a single integer. Therefore, the packed representation index may be used to indicate either a voxel location in a voxel grid or a two-dimensional map. In particular, the packed representation index may be used to indicate any voxel location referred to throughout this disclosure.

[0177] The assignment of unique indices to voxels may follow a predetermined pattern. For example, a voxel grid may be scanned / traversed in the x, y, and z directions in this order to sequentially assign unique indices to each voxel.

[0178] The mapping from the packed representation index back to the voxel and diffraction map indices may be done, for example, as follows:
x = floor(Idx) % L + 1
y = floor(Idx / L) % W + 1
z = floor(Idx / L / W) % H + 1
Here, % represents the modulo operator.
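The forward and inverse mappings can be checked with a short round-trip sketch. It follows the formulas as written, which implies coordinates running from 1 to L, 1 to W, and 1 to H, and a packed index starting at 0. The function names `pack` and `unpack` are hypothetical.

```python
from typing import Tuple


def pack(x: int, y: int, z: int, length: int, width: int) -> int:
    """Map coordinates (x, y, z), each starting at 1, to a packed index starting at 0."""
    return (x - 1) + (y - 1) * length + (z - 1) * length * width


def unpack(idx: int, length: int, width: int, height: int) -> Tuple[int, int, int]:
    """Invert pack(): recover (x, y, z) from the packed index."""
    x = idx % length + 1
    y = (idx // length) % width + 1
    z = (idx // (length * width)) % height + 1
    return x, y, z
```

For a 2D diffraction map the same functions apply with z fixed to 1 and a height of 1, so that the packed index reduces to (x-1) + (y-1)*L.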

Device

[0179] While methods and processing chains have been described above, it should be understood that this disclosure also relates to apparatus (e.g., computer apparatus or apparatus with general processing capabilities) for carrying out these methods and processing chains (or, more generally, these techniques).

[0180] An example of such a device 1500 is schematically shown in Figure 15. The device 1500 includes a processor 1501 and a memory 1502 coupled to the processor 1501. The memory 1502 may store instructions to be executed by the processor 1501. The processor 1501 may be adapted to perform the processing chains described through this disclosure and / or the methods described through this disclosure (e.g., a method for processing audio scene information for audio rendering). The device 1500 may receive inputs (e.g., audio scene descriptions, listener positions, etc.) and generate outputs (e.g., representations of diffraction information, acoustic path information, etc.).

Interpretation

[0181] The systems described herein may be implemented in a suitable computer-based sound processing network environment (e.g., a server or cloud environment) for processing digital or digitized audio files. Parts of these systems may include one or more networks containing any desired number of individual machines, including one or more routers (not shown) that buffer and route data transmitted between computers. Such networks may be built on various different network protocols and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.

[0182] One or more of the components, blocks, processes, or other functional components may be implemented through a computer program that controls the execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and / or as data and / or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and / or other characteristics. Computer-readable media in which such formatted data and / or instructions may be embodied include, but are not limited to, various forms of physical (non-transitory), non-volatile storage media such as optical storage media, magnetic storage media, or semiconductor storage media.

[0183] In particular, embodiments may include hardware, software, and electronic components or modules which, for the purposes of discussion, may be illustrated and described as if the majority of the components were implemented solely in hardware. However, those skilled in the art will recognize, based on reading this detailed description, that in at least one embodiment, the electronic-based aspects may be implemented in software (e.g., stored in a non-transitory computer-readable medium) executable by one or more electronic processors, such as microprocessors and / or application-specific integrated circuits ("ASICs"). Therefore, it should be noted that multiple hardware and software-based devices and multiple different structural components may be used to implement the embodiments. For example, the computer-implemented neural network described herein may include one or more electronic processors, one or more computer-readable medium modules, one or more input / output interfaces, and various connections (e.g., system buses) connecting the various components.

[0184] While one or more implementations are described as examples with respect to specific embodiments, it should be understood that one or more implementations are not limited to the disclosed embodiments. Rather, as will be apparent to those skilled in the art, it is intended to cover a variety of modifications and similar configurations. Therefore, the appended claims should be given the broadest possible interpretation to encompass all such modifications and similar configurations.

[0185] Furthermore, it should be understood that the expressions and terms used herein are for illustrative purposes only and should not be considered limiting. The use of “including,” “comprising,” or “having” and their variations means encompassing the items and their equivalents listed thereafter, as well as any further items. Unless otherwise specified or limited, the terms “attached,” “connected,” “supported,” and “joined” and their variations are used broadly and encompass both direct and indirect attachment, connection, support, and joining.

Enumerated example embodiments

[0186] Various aspects and implementations of the present invention can also be understood from the following enumerated example embodiments (EEEs), which are not claims.

[0187] EEE1. A method for processing audio scene information, the method including:
obtaining a voxel-based audio scene representation of an audio scene;
sequentially encoding path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to the voxel-based audio scene representation, wherein each path information item specifies a respective first position, a respective second position, the path length of the acoustic path between the first position and the second position in the voxel grid, and the corner voxel on the acoustic path where the acoustic path changes direction; and
for a current path information item, generating an encoded path information item based on the path information item,
wherein the encoded path information item includes an indication of the respective first position and an indication of the respective second position,
wherein, if the corner voxel specified by the current path information item differs from the corner voxel specified by the preceding path information item, the encoded path information item includes an indication of the corner voxel, and
wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item includes, instead of the indication of the corner voxel, an indication that the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item.

[0188] EEE2. The method according to EEE1, wherein the encoded path information item further includes an indication of the path length.

[0189] EEE3. The method according to EEE1, wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item further includes an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item.

[0190] EEE4. The method according to EEE1, wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item further includes:
an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item; and
the indication of the path length, or the indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item.

[0191] EEE5. The method according to any one of EEE1 to EEE4, further including:
for each first position, traversing the voxel grid according to a predetermined pattern to determine a sequence of second positions; and
for each first position, sequentially encoding path information items for the determined sequence of second positions.

[0192] EEE6. The method according to EEE5, wherein the predetermined pattern traverses the voxel grid in a raster scan manner along the rows and columns of the voxel grid.

[0193] EEE7. The method according to EEE5 or EEE6 when dependent on EEE3, wherein the difference is encoded with 2 bits, or the difference takes one of four predetermined values.

[0194] EEE8. The method according to any one of EEE1 to EEE7, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item, an indication of the path length; and
otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item, an indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the preceding path information item, and the current path length is the path length specified by the path information item corresponding to the encoded path information item; and
otherwise, an indication of the corner voxel and an indication of the path length.

[0195] EEE9. The method according to any one of EEE1 to EEE7, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item, an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between the previous path length and the current path length, together with the indication of the path length or the indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the preceding path information item, and the current path length is the path length specified by the path information item corresponding to the encoded path information item; and
otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item, an indication of the difference between the previous path length and the current path length; and
otherwise, an indication of the corner voxel and an indication of the path length.

[0196] EEE10. The method according to any one of EEE1 to EEE9, further including outputting the encoded path information items to a bitstream.

[0197] EEE11. A method for processing audio scene information, the method including:
receiving a bitstream containing a sequence of encoded path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to a voxel-based audio scene representation, wherein each encoded path information item corresponds to a respective path information item specifying the first position, the second position, the path length of the acoustic path between the first and second positions in the voxel grid, and the corner voxel on the acoustic path where the acoustic path changes direction; and
sequentially decoding the encoded path information items to generate the corresponding path information items,
wherein, for a current encoded path information item, generating the corresponding path information item includes:
determining whether the current encoded path information item includes an indication that the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item;
if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, setting the corner voxel specified by the path information item corresponding to the preceding encoded path information item as the corner voxel for the path information item corresponding to the current encoded path information item; and
if the corner voxel specified by the corresponding path information item is different from the corner voxel specified by the path information item corresponding to the preceding encoded path information item, extracting an indication of the corner voxel from the current encoded path information item.

[0198] EEE12. The method according to EEE11, wherein generating the corresponding path information item further includes extracting an indication of the path length from the current encoded path information item.

[0199] EEE13. The method according to EEE11, wherein, when the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, generating the corresponding path information item further includes extracting an indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the current encoded path information item.

[0200] EEE14. The method according to EEE11, wherein, when the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, generating the corresponding path information item further includes:
extracting an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the current encoded path information item; and
extracting the indication of the path length or the indication of the difference between the previous path length and the current path length.

[0201] EEE15. The method according to any one of EEE11 to EEE14, wherein for each first position, the one or more second positions relate to positions obtained by traversing the voxel grid according to a predetermined pattern for defining a sequence of second positions, and for each first position, the encoded path information items are decoded sequentially according to the sequence of second positions.

[0202] EEE16. The method according to EEE15, wherein the predetermined pattern is obtained by traversing the voxel grid in a raster scan manner along the rows and columns of the voxel grid.

[0203] EEE17. The method according to EEE15 or EEE16 when dependent on EEE13, wherein the difference is encoded with 2 bits, or the difference takes one of four predetermined values.

[0204] EEE18. The method according to any one of EEE11 to EEE17, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, an indication of the path length; and
otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item;
if the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, an indication of the difference between the previous path length and the current path length, where the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the encoded path information item; and
otherwise, an indication of the corner voxel and an indication of the path length.

[0205] EEE19. The method described in any one of EEE11 to EEE17, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between a previous path length and a current path length, together with the indication of the path length or the indication of the difference, wherein the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item, and otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of the difference between the previous path length and the current path length, and otherwise, an indication of the corner voxel and an indication of the path length.

[0206] EEE20. A device comprising a processor and a memory coupled to the processor and storing instructions for the processor, wherein the processor is adapted to perform the method described in any one of EEE1 to EEE19.

[0207] EEE21. A program including instructions that, when executed by a processor, cause the processor to perform the method described in any one of EEE1 to EEE19.

[0208] EEE22. A computer-readable storage medium storing the program described in EEE21.

Claims

1. A method for processing audio scene information, the method comprising:
obtaining a voxel-based audio scene representation of an audio scene; and
sequentially encoding path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to the voxel-based audio scene representation, wherein each path information item specifies a first position, a second position, a path length of an acoustic path between the first position and the second position in the voxel grid, and a corner voxel on the acoustic path at which the acoustic path changes direction,
wherein the sequential encoding includes, for a current path information item, generating an encoded path information item based on the current path information item,
wherein the encoded path information item includes an indication of the respective first position and an indication of the respective second position,
wherein, if the corner voxel specified by the current path information item differs from the corner voxel specified by a preceding path information item, the encoded path information item includes an indication of the corner voxel, and
wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item includes, instead of the indication of the corner voxel, an indication that the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item.
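The corner-voxel reuse in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed bitstream syntax: the `PathItem` type, the dictionary field names (`same_corner`, `corner`, `length`), and the choice to treat the first item as having no preceding corner voxel are all assumptions made for illustration. The path length is carried as an absolute value here, as in claim 2.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Voxel = Tuple[int, int]  # (row, column) in the two-dimensional voxel grid


class PathItem(NamedTuple):
    """Hypothetical container for one path information item."""
    first: Voxel
    second: Voxel
    length: float
    corner: Voxel


def encode_items(items: List[PathItem]) -> List[Dict]:
    """Encode items in order, replacing a repeated corner voxel with a flag."""
    encoded = []
    previous_corner: Optional[Voxel] = None  # assumption: nothing precedes item 0
    for item in items:
        record = {"first": item.first, "second": item.second, "length": item.length}
        if previous_corner is not None and item.corner == previous_corner:
            # Same corner voxel as the preceding item: send only the flag.
            record["same_corner"] = True
        else:
            # Different corner voxel: send the corner voxel itself.
            record["same_corner"] = False
            record["corner"] = item.corner
        encoded.append(record)
        previous_corner = item.corner
    return encoded


items = [
    PathItem((0, 0), (3, 4), 6.0, (2, 2)),
    PathItem((0, 0), (3, 5), 7.0, (2, 2)),  # reuses corner voxel (2, 2)
    PathItem((0, 0), (4, 0), 5.5, (1, 3)),
]
encoded = encode_items(items)
```

In this example only the first and third records carry a `corner` entry; the second record carries the flag alone, which is where the reduction in data comes from when neighbouring second positions share a corner voxel.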

2. The method according to claim 1, wherein the encoded path information item further includes an indication of the path length.

3. The method according to claim 1, wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item further includes an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item.

4. The method according to claim 1, wherein, if the corner voxel specified by the current path information item is the same as the corner voxel specified by the preceding path information item, the encoded path information item includes: an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between the path length specified by the current path information item and the path length specified by the preceding path information item; and the indication of the path length or the indication of the difference.

5. The method according to claim 1, further comprising: for each first position, traversing the voxel grid according to a predetermined pattern to determine a sequence of second positions; and, for each first position, sequentially encoding path information items for the determined sequence of second positions.

6. The method according to claim 5, wherein the predetermined pattern traverses the voxel grid in a raster scan manner along the rows and columns of the voxel grid.
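A raster-scan traversal of the kind named in claim 6 can be sketched as follows. The function name `raster_scan` and the row-major order (all columns of one row before the next row) are assumptions; the claim only requires a predetermined pattern along the rows and columns of the voxel grid.

```python
from typing import Iterator, Tuple


def raster_scan(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, column) voxel positions row by row, left to right."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


# Sequence of second positions for a hypothetical 2 x 3 voxel grid.
second_positions = list(raster_scan(2, 3))
```

Because encoder and decoder both know the pattern, consecutive second positions are grid neighbours within a row, which makes it likely that consecutive path information items share a corner voxel.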

7. The method according to claim 3, wherein the difference is encoded in 2 bits, or wherein the difference takes one of four predetermined values.
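A 2-bit code can distinguish exactly four difference values, which is how the two alternatives in claim 7 relate. The sketch below assumes a purely hypothetical table `DIFF_TABLE`, integer path lengths in some quantised unit, and the sign convention "current minus previous"; none of these specifics are given in the claims.

```python
from typing import Optional

# Hypothetical table of the four predetermined difference values.
DIFF_TABLE = (-1, 0, 1, 2)


def encode_diff(previous_length: int, current_length: int) -> Optional[int]:
    """Return the 2-bit code (0..3) for the length difference, or None if
    the difference is not one of the four predetermined values."""
    diff = current_length - previous_length
    if diff in DIFF_TABLE:
        return DIFF_TABLE.index(diff)
    return None


def decode_diff(previous_length: int, code: int) -> int:
    """Recover the current path length from the previous length and the code."""
    return previous_length + DIFF_TABLE[code]


code = encode_diff(6, 7)          # difference of +1
bits = format(code, "02b")        # the code written as two bits
restored = decode_diff(6, code)   # lossless round trip
```

When `encode_diff` returns `None`, the difference cannot be represented in 2 bits; a selector such as the one in claim 4 would then let the encoder fall back to the absolute path length, although that pairing is an interpretation rather than something the claims state.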

8. The method according to claim 1, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item; and
if the corner voxels are the same, an indication of the path length, and otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item; and
if the corner voxels are the same, an indication of the difference between a previous path length and a current path length, wherein the previous path length is the path length specified by the preceding path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item, and otherwise, an indication of the corner voxel and an indication of the path length.
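The field layout of claim 8 depends on the mode and on whether the corner voxel repeats. The sketch below lists which fields an encoded item would carry in each case. The field names and the function `fields_in_item` are hypothetical, and the behaviour when no acoustic path exists (no further fields) is an assumption, since the claim does not address that case.

```python
from typing import List


def fields_in_item(mode: str, path_exists: bool, same_corner: bool) -> List[str]:
    """List the fields of one encoded item for mode 'first' or 'second'."""
    if mode not in ("first", "second"):
        raise ValueError("mode must be 'first' or 'second'")
    # Fields present in every encoded item.
    fields = ["mode", "first_position", "second_position", "path_exists"]
    if not path_exists:
        return fields  # assumption: nothing more is sent without a path
    fields.append("same_corner_flag")
    if not same_corner:
        # Both modes: new corner voxel plus absolute path length.
        fields += ["corner_voxel", "path_length"]
    elif mode == "first":
        fields.append("path_length")             # first mode: absolute length
    else:
        fields.append("path_length_difference")  # second mode: difference
    return fields


first_mode_same = fields_in_item("first", True, True)
second_mode_same = fields_in_item("second", True, True)
second_mode_new = fields_in_item("second", True, False)
```

The two modes differ only when the corner voxel repeats: the first mode keeps the absolute path length, while the second mode sends the difference from the previous path length.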

9. The method according to claim 1, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item; and
if the corner voxels are the same, an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between a previous path length and a current path length, together with the indication of the path length or the indication of the difference, wherein the previous path length is the path length specified by the preceding path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item, and otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the preceding path information item; and
if the corner voxels are the same, an indication of the difference between the previous path length and the current path length, and otherwise, an indication of the corner voxel and an indication of the path length.

10. The method according to claim 1, further comprising outputting the encoded path information items to a bitstream.

11. A method for processing audio scene information, the method comprising:
receiving a bitstream containing a sequence of encoded path information items for one or more first positions and one or more second positions in a two-dimensional voxel grid related to a voxel-based audio scene representation, wherein each encoded path information item corresponds to a path information item specifying a first position, a second position, a path length of an acoustic path between the first position and the second position in the voxel grid, and a corner voxel on the acoustic path at which the acoustic path changes direction; and
sequentially decoding the encoded path information items to generate the corresponding path information items,
wherein, for a current encoded path information item, generating the corresponding path information item includes:
determining whether the current encoded path information item includes an indication that the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item;
if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, setting the corner voxel specified by the path information item corresponding to the preceding encoded path information item as the corner voxel for the path information item corresponding to the current encoded path information item; and
if the corner voxel specified by the corresponding path information item is different from the corner voxel specified by the path information item corresponding to the preceding encoded path information item, extracting an indication of the corner voxel from the current encoded path information item.
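The decoding side of claim 11 can be sketched in the same spirit: the decoder keeps the corner voxel of the most recently decoded item and substitutes it whenever the same-corner indication is set. The record layout (`same_corner`, `corner`) and the function `decode_items` are hypothetical and stand in for fields that would be parsed from the bitstream.

```python
from typing import Dict, List, Optional, Tuple

Voxel = Tuple[int, int]  # (row, column) in the two-dimensional voxel grid


def decode_items(encoded: List[Dict]) -> List[Dict]:
    """Rebuild path information items, restoring omitted corner voxels."""
    decoded = []
    previous_corner: Optional[Voxel] = None
    for record in encoded:
        if record["same_corner"]:
            if previous_corner is None:
                raise ValueError("no preceding item to take the corner voxel from")
            corner = previous_corner      # reuse the preceding corner voxel
        else:
            corner = record["corner"]     # extract the transmitted corner voxel
        decoded.append(
            {"first": record["first"], "second": record["second"], "corner": corner}
        )
        previous_corner = corner
    return decoded


received = [
    {"first": (0, 0), "second": (3, 4), "same_corner": False, "corner": (2, 2)},
    {"first": (0, 0), "second": (3, 5), "same_corner": True},
    {"first": (0, 0), "second": (4, 0), "same_corner": False, "corner": (1, 3)},
]
decoded = decode_items(received)
```

Every decoded item ends up with an explicit corner voxel, so the reuse flag costs nothing in fidelity; this is the sense in which the recovery is lossless.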

12. The method according to claim 11, wherein generating the corresponding path information item further includes extracting an indication of the path length from the current encoded path information item.

13. The method according to claim 11, wherein, if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, generating the corresponding path information item further includes extracting an indication of the difference between a previous path length and a current path length, wherein the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the current encoded path information item.

14. The method according to claim 11, wherein, if the corner voxel specified by the corresponding path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item, generating the corresponding path information item further includes: extracting an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between a previous path length and a current path length, wherein the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item, and the current path length is the path length specified by the path information item corresponding to the current encoded path information item; and extracting the indication of the path length or the indication of the difference between the previous path length and the current path length.

15. The method according to claim 11, wherein the one or more second positions relate to positions obtained by traversing the voxel grid according to a predetermined pattern for defining a sequence of second positions for each first position, and for each first position, the encoded path information items are decoded sequentially according to the sequence of second positions.

16. The method according to claim 15, wherein the predetermined pattern traverses the voxel grid in a raster scan manner along the rows and columns of the voxel grid.

17. The method according to claim 13, wherein the difference is encoded in 2 bits, or wherein the difference is one of four predetermined values.

18. The method according to claim 11, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of the path length, and otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of the difference between a previous path length and a current path length, wherein the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item, and otherwise, an indication of the corner voxel and an indication of the path length.

19. The method according to claim 11, wherein each encoded path information item includes:
an indication of whether a first mode or a second mode is used;
an indication of the first position;
an indication of the second position; and
an indication of whether an acoustic path exists for the first position and the second position,
wherein, when the first mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of whether the encoded path information item includes an indication of the path length or an indication of the difference between a previous path length and a current path length, together with the indication of the path length or the indication of the difference, wherein the previous path length is the path length specified by the path information item corresponding to the preceding encoded path information item and the current path length is the path length specified by the path information item corresponding to the encoded path information item, and otherwise, an indication of the corner voxel and an indication of the path length, and
wherein, when the second mode is used, each encoded path information item further includes:
an indication of whether the corner voxel specified by the path information item corresponding to the encoded path information item is the same as the corner voxel specified by the path information item corresponding to the preceding encoded path information item; and
if the corner voxels are the same, an indication of the difference between the previous path length and the current path length, and otherwise, an indication of the corner voxel and an indication of the path length.

20. A device comprising a processor and a memory coupled to the processor and storing instructions for the processor, wherein the processor is adapted to perform the method according to any one of claims 1 to 19.

21. A program including instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 19.

22. A computer-readable storage medium storing the program described in claim 21.