Reverberation of multiple higher order ambisonic signals

The apparatus efficiently selects the optimal spatial audio signal for reverberation based on prominence and proximity, addressing computational inefficiencies and artifacts in existing methods, achieving improved reverberation quality.

WO2026130990A1 · PCT designated stage · Publication Date: 2026-06-25 · NOKIA TECHNOLOGIES OY

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
NOKIA TECHNOLOGIES OY
Filing Date
2025-11-25
Publication Date
2026-06-25


Abstract

There is provided an apparatus, method, and computer program for: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

Description

REVERBERATION OF MULTIPLE HIGHER ORDER AMBISONIC SIGNALS

FIELD

[0001] The present application relates to a method, apparatus, system and computer program for reverberation of multiple Ambisonic signals and in particular but not exclusively to a method, apparatus, system and computer program for reverberation of multiple higher order Ambisonic signals.

BACKGROUND

[0002] Reverberation refers to the persistence of sound in a space after an actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying spatial impression of an environment, reproducing reverberation perceptually accurately is important. Room acoustics are often modelled with an individually synthesized early reflection portion and a statistical model for the diffuse late reverberation. Fig. 1 depicts an example of a synthesized room impulse response where the direct sound 101 is followed by discrete early reflections 103 (or reflection echoes) which have a direction of arrival (DOA) and diffuse late reverberation 107 which can be synthesized without any specific direction of arrival. The delay d1(t) 102 in Fig. 1 can be seen to denote the direct sound arrival delay from the source to the listener. Furthermore the delay d2(t) 104 can denote the delay from the source to the listener for one of the early reflections (in this case the first arriving reflection). Additionally the delay d3(t) 106 can denote the delay from the source to the onset of the diffuse late reverberation.
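The three-part structure of such an impulse response (direct sound, discrete early reflections, diffuse late tail) can be illustrated with a minimal sketch. All numeric values, the function name `synth_rir` and the use of exponentially decaying Gaussian noise for the tail are illustrative assumptions and are not taken from the application.

```python
import random

def synth_rir(fs=8000, direct_delay=0.010,
              reflections=((0.018, 0.6), (0.025, 0.45)),
              late_onset=0.040, rt60=0.5, length_s=0.6, seed=0):
    """Toy room impulse response: direct sound, early reflections, diffuse tail.

    All delays are in seconds; every default value is an illustrative assumption.
    """
    rng = random.Random(seed)
    n = round(length_s * fs)
    h = [0.0] * n
    # Direct sound arrives after the source-to-listener propagation delay.
    h[round(direct_delay * fs)] = 1.0
    # Discrete early reflections, each with its own delay and gain.
    for delay, gain in reflections:
        h[round(delay * fs)] += gain
    # Diffuse late reverberation: noise with an envelope that falls 60 dB in rt60 seconds.
    onset = round(late_onset * fs)
    for i in range(onset, n):
        t = (i - onset) / fs
        h[i] += 0.2 * 10 ** (-3.0 * t / rt60) * rng.gauss(0.0, 1.0)
    return h

if __name__ == "__main__":
    rir = synth_rir()
    print(len(rir), rir[80], rir[144])
```

In this sketch the early reflections carry no direction of arrival; a spatial renderer would additionally pan each reflection to its DOA.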

[0003] One method of reproducing reverberation is to utilize a set of D loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTFs)). The loudspeakers are positioned around the listener somewhat evenly. Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.

[0004] The reverberation produced by the different loudspeakers has to be mutually incoherent. In a simple case the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as reverberation time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio or diffuse-to-total ratio or diffuse-to-source ratio or any other suitable parameter for representing reverberation energy or level). Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a feedback delay network (FDN) reverberator with suitable tuning of the delay line lengths and mixing matrix, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel. In this case, the different reverberant signals effectively have the same features, and the reverberation is typically perceived to be similar in all directions.

SUMMARY
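As a minimal sketch of the noise-based option, each channel below receives its own noise sequence shaped by a shared decay envelope, so the channels share reverberation time and level but remain nearly uncorrelated. The function names, the Gaussian noise source and the parameter values are assumptions made for illustration only.

```python
import math
import random

def noise_reverb_tails(n_channels=4, rt60=0.5, fs=8000, seed=1):
    """One decaying noise tail per channel: same envelope, different noise."""
    n = round(rt60 * fs)
    tails = []
    for ch in range(n_channels):
        rng = random.Random(seed + ch)  # a separate noise sequence for each channel
        # Envelope drops by 60 dB (factor 1e-3) over the length of the tail.
        tails.append([10 ** (-3.0 * i / n) * rng.gauss(0.0, 1.0) for i in range(n)])
    return tails

def normalized_correlation(a, b):
    """Zero-lag correlation coefficient of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

if __name__ == "__main__":
    tails = noise_reverb_tails()
    print(normalized_correlation(tails[0], tails[1]))
```

Convolving the same input signal with each tail would then yield one reverberant output per loudspeaker.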

[0005] According to a first aspect, there is provided an apparatus for applying reverberation to at least one audio signal, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0006] The apparatus caused to perform obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be caused to perform beamforming to the at least one spatial audio signal.
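The application does not fix a particular beamformer. As one hedged illustration, the sketch below forms a virtual cardioid beam from a first-order Ambisonic signal. First order (rather than higher order), ACN channel ordering (W, Y, Z, X), SN3D normalization and the function names are all assumptions chosen to keep the example short.

```python
import math

def encode_foa(sample, azimuth, elevation):
    """Encode one mono sample as first-order Ambisonics (assumed ACN order, SN3D)."""
    return (sample,                                             # W
            sample * math.sin(azimuth) * math.cos(elevation),   # Y
            sample * math.sin(elevation),                       # Z
            sample * math.cos(azimuth) * math.cos(elevation))   # X

def cardioid_beam(foa, azimuth, elevation):
    """Steer a virtual cardioid toward (azimuth, elevation), angles in radians."""
    w, y, z, x = foa
    ux = math.cos(azimuth) * math.cos(elevation)
    uy = math.sin(azimuth) * math.cos(elevation)
    uz = math.sin(elevation)
    # Gain is 1 toward the steering direction and 0 in the opposite direction.
    return 0.5 * (w + ux * x + uy * y + uz * z)

if __name__ == "__main__":
    frame = encode_foa(1.0, math.radians(30), 0.0)
    print(cardioid_beam(frame, math.radians(30), 0.0))
    print(cardioid_beam(frame, math.radians(210), 0.0))
```

Steering the beam at the source direction passes the source signal, while a beam pointed the opposite way rejects it; the beam output can then serve as the reverberator input for that source.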

[0007] The apparatus caused to perform obtaining of the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be caused to perform receiving the at least one audio signal component from a further apparatus.

[0008] The apparatus caused to perform obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be caused to perform: determining whether the at least one audio signal component is available at the apparatus; selecting the at least one audio signal component available at the apparatus; and receiving or obtaining the at least one component from a further apparatus when the at least one audio signal component is not available at the apparatus.
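The availability check with a fallback to a further apparatus can be sketched as follows. The dictionary used as a local store, the `fetch_remote` callable and the component identifiers are hypothetical stand-ins, not interfaces defined by the application.

```python
def obtain_component(component_id, local_store, fetch_remote):
    """Return (component, origin), preferring a locally available component.

    local_store: dict mapping component ids to audio data (hypothetical).
    fetch_remote: callable that requests a component from a further apparatus
    (hypothetical); it is only called when the component is missing locally.
    """
    if component_id in local_store:
        return local_store[component_id], "local"
    return fetch_remote(component_id), "remote"

if __name__ == "__main__":
    store = {"hoa_1/beam_a": [0.1, 0.2]}
    remote = lambda cid: [0.0, 0.0]  # placeholder for a network request
    print(obtain_component("hoa_1/beam_a", store, remote))
    print(obtain_component("hoa_2/beam_b", store, remote))
```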

[0009] The at least one audio signal component may comprise at least one of: at least one multichannel audio signal; at least one audio channel; at least one audio source; and at least one audio beam.

[0010] The apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further caused to perform: determining at least one dominant audio source from the at least one audio source; and identifying the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source.

[0011] The apparatus caused to perform determining at least one dominant audio source may be caused to perform determining a direction of the at least one dominant audio source relative to a position for the at least one spatial audio signal.
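A direction of a source relative to the position of a spatial audio signal can be derived from the two positions, as in the sketch below. The Cartesian convention (x forward, y left, z up), the radian units and the function name are assumptions; the application does not prescribe a coordinate system here.

```python
import math

def direction_from(capture_pos, source_pos):
    """Azimuth, elevation (radians) and distance of a source seen from a capture position.

    Assumed convention: x forward, y left, z up; azimuth measured from +x toward +y.
    """
    dx, dy, dz = (s - c for s, c in zip(source_pos, capture_pos))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    return azimuth, elevation, distance

if __name__ == "__main__":
    print(direction_from((0.0, 0.0, 0.0), (0.0, 2.0, 0.0)))
```

The resulting angles could be used, for example, as the steering direction of a beam formed from the spatial audio signal captured at that position.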

[0012] The apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further caused to perform identifying the at least one audio signal based on the direction of the at least one dominant audio source.

[0013] The apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further caused to perform: determining an availability of at least two audio signal components; and identifying the at least one audio signal component based on the availability of the at least two audio signal components.

[0014] The apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further caused to perform: determining a distance between the at least one spatial audio signal and a listener position; and identifying the at least one spatial audio signal based on the determined distance.

[0015] The apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further caused to perform: obtaining metadata indicating at least one spatial audio signal; and identifying the at least one spatial audio signal based on the metadata.

[0016] The apparatus caused to perform obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be further caused to perform switching from a first audio signal component to a further audio signal component.

[0017] The apparatus caused to perform switching from the first audio signal component to the further audio signal component may be caused to perform transitioning from the first audio signal component to the further audio signal component.

[0018] The apparatus caused to perform transitioning from the first audio signal component to the further audio signal component may be caused to perform interpolating between the first audio signal component and the further audio signal component.
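One simple way to interpolate between two components over a transition frame is a linear crossfade, sketched below. The linear weighting and the frame-based interface are assumptions; other interpolation curves (for example equal-power) would fit the same description.

```python
def crossfade(first, further):
    """Linearly interpolate from `first` to `further` across one frame of samples."""
    n = len(first)
    if n != len(further):
        raise ValueError("both frames must have the same length")
    if n == 1:
        return [float(further[0])]
    out = []
    for i in range(n):
        w = i / (n - 1)  # weight rises from 0 (all first) to 1 (all further)
        out.append((1.0 - w) * first[i] + w * further[i])
    return out

if __name__ == "__main__":
    print(crossfade([1, 1, 1, 1, 1], [0, 0, 0, 0, 0]))
    # prints [1.0, 0.75, 0.5, 0.25, 0.0]
```

Feeding the crossfaded frame to the reverberator avoids the discontinuity that an abrupt switch between components would introduce.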

[0019] According to a second aspect, there is provided an apparatus for applying reverberation to at least one audio signal, the apparatus comprising means configured to: obtain scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determine at least one audio source within the at least one audio scene based on the audio scene information; identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtain at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and render at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0020] The means configured to obtain the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be configured to beamform to the at least one spatial audio signal.

[0021] The means configured to obtain the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be configured to receive the at least one audio signal component from a further apparatus.

[0022] The means configured to perform obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be configured to: determine whether the at least one audio signal component is available at the apparatus; select the at least one audio signal component available at the apparatus; and receive or obtain the at least one component from a further apparatus when the at least one audio signal component is not available at the apparatus.

[0023] The at least one audio signal component may comprise at least one of: at least one multichannel audio signal; at least one audio channel; at least one audio source; and at least one audio beam.

[0024] The means configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further configured to: determine at least one dominant audio source from the at least one audio source; and identify the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source.

[0025] The means configured to determine at least one dominant audio source may be configured to determine a direction of the at least one dominant audio source relative to a position for the at least one spatial audio signal.

[0026] The means configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further configured to identify the at least one audio signal based on the direction of the at least one dominant audio source.

[0027] The means configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may further be configured to: determine an availability of at least two audio signal components; and identify the at least one audio signal component based on the availability of the at least two audio signal components.

[0028] The means configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further configured to: determine a distance between the at least one spatial audio signal and a listener position; and identify the at least one spatial audio signal based on the determined distance.

[0029] The means configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may be further configured to: obtain metadata indicating at least one spatial audio signal; and identify the at least one spatial audio signal based on the metadata.

[0030] The apparatus configured to obtain at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may be further configured to switch from a first audio signal component to a further audio signal component.

[0031] The apparatus configured to switch from the first audio signal component to the further audio signal component may be configured to transition from the first audio signal component to the further audio signal component.

[0032] The apparatus configured to transition from the first audio signal component to the further audio signal component may be configured to interpolate between the first audio signal component and the further audio signal component.

[0033] According to a third aspect, there is provided a method for an apparatus for applying reverberation to at least one audio signal, the method comprising: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0034] Obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may comprise beamforming to the at least one spatial audio signal.

[0035] Obtaining of the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may comprise receiving the at least one audio signal component from a further apparatus.

[0036] Obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may comprise: determining whether the at least one audio signal component is available at the apparatus; selecting the at least one audio signal component available at the apparatus; and receiving or obtaining the at least one component from a further apparatus when the at least one audio signal component is not available at the apparatus.

[0037] The at least one audio signal component may comprise at least one of: at least one multichannel audio signal; at least one audio channel; at least one audio source; and at least one audio beam.

[0038] Identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may comprise: determining at least one dominant audio source from the at least one audio source; and identifying the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source.

[0039] Determining at least one dominant audio source may comprise determining a direction of the at least one dominant audio source relative to a position for the at least one spatial audio signal.

[0040] Identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may further comprise identifying the at least one audio signal based on the direction of the at least one dominant audio source.

[0041] Identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may further comprise: determining an availability of at least two audio signal components; and identifying the at least one audio signal component based on the availability of the at least two audio signal components.

[0042] Identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may further comprise: determining a distance between the at least one spatial audio signal and a listener position; and identifying the at least one spatial audio signal based on the determined distance.

[0043] Identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification may further comprise: obtaining metadata indicating at least one spatial audio signal; and identifying the at least one spatial audio signal based on the metadata.

[0044] Obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source may further comprise switching from a first audio signal component to a further audio signal component.

[0045] Switching from the first audio signal component to the further audio signal component may comprise transitioning from the first audio signal component to the further audio signal component.

[0046] Transitioning from the first audio signal component to the further audio signal component may comprise interpolating between the first audio signal component and the further audio signal component.

[0047] According to a fourth aspect, there is provided a computer readable medium comprising instructions which, when executed by an apparatus, cause the apparatus for applying reverberation to at least one audio signal to perform at least the following: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0048] According to a fifth aspect, there is an apparatus for applying reverberation to at least one audio signal, the apparatus comprising: circuitry configured to obtain scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; circuitry configured to determine at least one audio source within the at least one audio scene based on the audio scene information; circuitry configured to identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; circuitry configured to obtain at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and circuitry configured to render at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0049] According to a sixth aspect, there is an apparatus for applying reverberation to at least one audio signal, the apparatus comprising: means for obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; means for determining at least one audio source within the at least one audio scene based on the audio scene information; means for identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; means for obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and means for rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0050] According to a seventh aspect, there is provided a non-transitory computer readable medium comprising program instructions that, when executed by an apparatus, cause the apparatus to perform at least the method according to any of the preceding aspects.

[0051] According to an eighth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus, for defining a file format carriage for applying reverberation to at least one audio signal to perform at least the following: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

[0052] In the above, many different embodiments have been described. It should be appreciated that further embodiments may be provided by the combination of any two or more of the embodiments described above.

DESCRIPTION OF FIGURES

[0053] Embodiments will now be described, by way of example only, with reference to the accompanying Figures in which:

[0054] Fig.1 shows a model of room acoustics with regard to the room impulse response;

[0055] Fig.2 shows schematically a reverberator which includes an example feedback delay network (FDN) according to some embodiments;

[0056] Fig.3 shows schematically an example apparatus within which the reverberator which includes an example feedback delay network (FDN) as shown in Fig.2 can be implemented according to some embodiments;

[0057] Figs.4a and 4b show a flow diagram of the operation of the example apparatus as shown in Fig.3 with respect to reverberant audio signal rendering;

[0058] Fig.5 shows schematically a further example apparatus within which the reverberator which includes an example feedback delay network (FDN) as shown in Fig.2 can be implemented according to some embodiments;

[0059] Fig.6 shows schematically an example reverberator parameter determiner as shown in Figs.3 and 5 according to some embodiments;

[0060] Fig.7 shows an example audio scene and reflective surfaces to demonstrate example source and image source positions;

[0061] Fig.8 shows an example source distribution and echoes associated with the sources;

[0062] Figs.9a and 9b show an example system within which some embodiments can be implemented; and

[0063] Fig.10 shows an example device suitable for implementing the apparatus shown in previous figures.

DETAILED DESCRIPTION

[0064] The following relates to apparatus, methods and computer programs for establishing rendering of reverberation for audio scenes represented with several spatial audio signals, where the spatial audio signal used for rendering the reverberation of a source can be automatically selected to enable improved quality of reverberation rendering.

[0065] In a virtual acoustics rendering system, reverberation is typically rendered as a combination of a certain number of distinct early reflections (or reflection echoes) and a stochastic model for the late reverberation. The early reflection synthesis is thus typically position-dependent in that it varies with source and listener positions, while the late reverberation synthesis is not. Together these two can create a plausible reverberation rendering for a physical or virtual space. The reverberation rendering is combined (summed) with direct sound rendering, which involves distance gain attenuation, air absorption filtering, and directional reproduction (binaural or loudspeaker) of the direct sound component that directly propagates to the ears of the listener without reflecting or reverberating in the space.

[0066] As discussed above one method of reproducing reverberation is to utilize a set of N loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTF)). The loudspeakers are positioned around the listener somewhat evenly. Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.

[0067] The positioning and the number of the loudspeakers suitable for producing the diffuse perception has been studied, e.g., in K. Hiyama, S. Komiyama, and K. Hamasaki, The Minimum Number of Loudspeakers and Its Arrangement for Reproducing the Spatial Impression of Diffuse Sound Field, AES 113th Convention, 2002, and C. Kirch, J. Poppitz, T. Wendt, S. van der Par, and S. Ewert, Spatial Resolution of Late Reverberation in Virtual Acoustic Environments, submitted to Trends in Hearing (currently available on the Carl von Ossietzky Universität Oldenburg website), 2021. It has been found that somewhere around 6 to 12 loudspeakers are required, depending on the positioning of the loudspeakers.

[0068] The reverberation produced by the different loudspeakers is designed to be mutually incoherent, which can be achieved through optimization by the algorithm designer. In a simple case, e.g., the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as RT60 time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio or source-to-diffuse energy ratio). Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a feedback delay network (FDN) reverberator with suitable tuning of the delay line lengths, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel. In this case, the different reverberant signals basically have the same features, and the reverberation is typically perceived to be similar from all directions.
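A feedback delay network with one output tap per delay line can be sketched in a few lines. The sketch below is not the reverberator of Fig. 2: the four delay lengths, the scaled Hadamard mixing matrix and the broadband (frequency-independent) attenuation are illustrative assumptions. Each line is attenuated so that a signal circulating in the network decays by 60 dB after the chosen RT60.

```python
def fdn_impulse_response(rt60_s=0.5, fs=8000, n_samples=8000,
                         delays=(149, 211, 263, 293)):
    """Impulse response of a 4-line FDN; returns one output list per delay line."""
    # Per-line gain: 60 dB of decay per rt60_s, scaled to each line's length.
    gains = [10 ** (-3.0 * m / (rt60_s * fs)) for m in delays]
    # Orthogonal mixing matrix (4x4 Hadamard scaled by 1/2), an assumed choice.
    mix = [[0.5, 0.5, 0.5, 0.5],
           [0.5, -0.5, 0.5, -0.5],
           [0.5, 0.5, -0.5, -0.5],
           [0.5, -0.5, -0.5, 0.5]]
    lines = [[0.0] * m for m in delays]  # circular buffers, one per delay line
    pos = [0] * len(delays)
    outputs = [[] for _ in delays]
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        # Read the oldest sample of each line and apply that line's attenuation.
        taps = [gains[i] * lines[i][pos[i]] for i in range(4)]
        for i in range(4):
            outputs[i].append(taps[i])
        # Mix the attenuated taps and write them back together with the input.
        for i in range(4):
            feedback = sum(mix[i][j] * taps[j] for j in range(4))
            lines[i][pos[i]] = x + feedback
            pos[i] = (pos[i] + 1) % delays[i]
    return outputs

if __name__ == "__main__":
    outs = fdn_impulse_response()
    print(len(outs), len(outs[0]))
```

Because every output is tapped from a different delay line, the channels share the same decay while differing in their fine structure; how close to uncorrelated they are depends on the tuning of the delay lengths and mixing matrix.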

[0069] A known common approach when reverberating multichannel signals is to reverberate each channel of a multichannel audio representation, such as first order ambisonics (FOA), higher order ambisonics (HOA) or surround (7.1.4, 5.1 or the like), individually.

[0070] However, this technique generates a high computational load, with associated power consumption, battery drain and processor cost, as the number of reverberators that need to be run in parallel is proportional to the number of channels. For example, with HOA the number of channels can be high (for example 16 for 3rd order HOA, 25 for 4th order HOA).
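The channel counts quoted above follow from the standard relation that a full three-dimensional Ambisonic signal of order N has (N + 1)^2 channels. The short sketch below reproduces them; the function name is an illustrative choice.

```python
def hoa_channel_count(order):
    """Number of channels of a full 3D Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

if __name__ == "__main__":
    for order in (1, 3, 4):
        # One reverberator per channel means this many parallel reverberators.
        print(order, hoa_channel_count(order))
```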

[0071] Another known approach, and one which is currently used in the MPEG-I audio Draft International Standard (ISO/IEC 23090-4 DIS), is to apply downmix processing to the multichannel audio signals and then reverberate the downmixed audio signal with a reverberator. This enables efficient reverberation rendering but can lead to audible artefacts, as the downmixing may create comb filtering effects if correlated channel signals are summed, and it does not enable adjustment of the levels of individual sound sources.
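The comb filtering effect can be seen in the simplest correlated case, where a downmix sums a signal with a delayed copy of itself. The magnitude response of that sum is |1 + exp(-j 2 pi f tau)|, which has notches at odd multiples of 1/(2 tau). The sketch below evaluates it; the 1 ms delay is an arbitrary illustrative value.

```python
import cmath
import math

def summed_magnitude(freq_hz, delay_s):
    """Magnitude response of x(t) + x(t - delay_s) at frequency freq_hz."""
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * delay_s))

if __name__ == "__main__":
    delay = 0.001  # 1 ms inter-channel delay (illustrative)
    for f in (0.0, 250.0, 500.0, 1000.0):
        print(f, round(summed_magnitude(f, delay), 3))
```

With a 1 ms delay the sum doubles the amplitude at 0 Hz and 1000 Hz but cancels at 500 Hz, which is the kind of coloration a downmix of correlated channels can introduce before the reverberator.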

[0072] It has also been proposed, for example in US provisional application 63/736,077, to provide a system for rendering reverberation for a HOA array signal such that the reverberation level of sound sources can be individually adjusted. This is achieved with the help of acoustic beamforming.

[0073] In MPEG-I, one potential input audio format, and one which the standard is designed for, is Multiple Point Higher Order Ambisonics (MPHOA), where the input audio contains HOA from two or more positions within the audio scene. The renderer synthesizes the audio scene at a position between or at microphone positions by means of interpolation of spatial parameters. However, reverberation cannot be applied to sources within such MPHOA captures.

[0074] Therefore, there is currently research, the results of which are described in the following embodiments, for apparatus and methods for computationally efficient rendering of reverberation for audio scenes represented with several spatial audio signals, where the spatial audio signal or spatial audio signal component used for rendering the reverberation of a source can be automatically selected to enable improved quality of reverberation rendering. The embodiments thus determine which spatial audio signal is best for capturing a beam of each sound source to be used as input for reverberation (with respect to at least one of early and late reverberation).

[0075] Thus, for example, in some embodiments there can be implemented an apparatus, or suitable method, for applying reverberation to at least one audio signal. In these embodiments the apparatus can comprise at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform (or the method comprises): obtaining a spatial audio information, the spatial audio information comprising: at least one spatial audio signal; a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene. Furthermore the apparatus or method can comprise determining at least one audio source within the at least one audio scene based on the audio scene information. Then the apparatus or method can comprise identifying at least one audio signal component based at least partly on a determination of a prominence of the at least one audio source within the at least one spatial audio signal. Following this the apparatus can be caused to perform or the method can comprise obtaining the identified at least one audio signal component. Then the apparatus can be caused to perform or the method can comprise rendering at least one reverberant audio signal at least based on an application of reverberation on the selected at least one signal component.

[0076] In some embodiments the selection (or identification) of the first audio signal component or spatial audio signal is implemented based on at least one of the following features: prominence of the sound source within the spatial audio signal; availability of the spatial audio signal; proximity of the spatial audio signal to a listener position; and a metadata indication of a preferred spatial audio signal based on listener position.

[0077] The obtaining of the spatial audio information can in some embodiments be from a spatial audio signal bitstream.

[0078] The obtaining of the identified at least one audio signal component can in some embodiments be caused to perform: determining whether the identified at least one audio signal component is one from at least two audio signal components available at the apparatus; selecting the identified at least one audio signal component of the at least two audio signal components available at the apparatus when the identified at least one audio signal component is one from at least two audio signal components available at the apparatus. In some embodiments the apparatus can be caused to perform receiving or obtaining the at least one audio signal component from a further apparatus when the identified at least one audio signal component is not one from at least two audio signal components available at the apparatus.

[0079] The at least one audio signal component can comprise at least one of: at least one multichannel audio signals; at least one audio channel; at least one audio source; and at least one audio beam.

[0080] The identifying of the at least one audio signal component based at least partly on the determination of the prominence of the at least one audio source within the at least one spatial audio signal can in some embodiments be such that the apparatus is further caused to perform: determining at least one dominant audio source; and identifying the at least one audio signal component based on the at least one dominant audio source.

[0081] The apparatus (or method) caused to perform determining at least one dominant audio source can be caused to perform determining a direction of the at least one dominant audio source relative to a position for the at least one spatial audio signal and identifying the at least one audio signal component based on the direction of the at least one dominant audio source.

[0082] The apparatus (or method) caused to perform identifying at least one audio signal component based at least partly on the determination of the prominence of the at least one audio source within the at least one spatial audio signal can be further caused to perform: determining an availability of at least two audio signal components and then identifying the at least one audio signal component based on the availability of the at least two audio signal components.

[0083] The identifying of the at least one audio signal component based at least partly on the determination of the prominence of the at least one audio source within the at least one spatial audio signal can be further caused to perform: determining a distance between the at least one audio signal component and a listener position; and identifying the at least one audio signal component based on the determined distance between the at least one audio signal component and the listener position.

[0084] Furthermore the identifying of the at least one audio signal component based at least partly on the determination of the prominence of the at least one audio source within the at least one spatial audio signal may be further caused to perform: obtaining metadata indicating at least one audio signal component and identifying the at least one signal component based on metadata indicating at least one audio signal component.

[0085] The obtaining of the identified at least one audio signal component can be switching from a first audio signal component to a further audio signal component, and may further comprise transitioning from the first audio signal component to the further audio signal component, and / or interpolating between the first audio signal component and the further audio signal component.

[0086] A determination for a need or requirement to switch the first spatial audio signal to a second spatial audio signal can in some embodiments be obtained before the switch.

[0087] In some embodiments the apparatus and methods may further comprise gradually transitioning from a first beam signal obtained from the first spatial audio signal to a second beam signal obtained from the second spatial audio signal. This gradual transitioning can also comprise interpolating between a first reverberation gain associated with the first beam signal and a second reverberation gain associated with the second beam signal.

[0088] In some embodiments, when a first array is determined to be available at the apparatus, beamforming may be performed at the apparatus to extract the beam. If an array is not available on the apparatus, for example, because of a partial delivery of the multiple spatial audio signals, then a monophonic beam signal (obtained on a server or further apparatus from the optimal spatial audio signal) may be retrieved or obtained from the server or further apparatus and used for reverberation rendering.

[0089] The effect of the application of these embodiments is such that an optimal beam signal to be used for reverberating sources on audio scenes comprising multiple spatial audio signals can be automatically identified and obtained (and in some embodiments selected). This means that reverberation quality can be optimal while the system always selects the best source for the beam signal (or audio component such as channel or channels), instead of attempting to beam from only the available spatial audio signals towards the sources (the available spatial audio signals might be far from the source, for example). Moreover, the embodiments enable obtaining or retrieving the optimal beam signal (or audio signal component) in a bitrate efficient manner (mono signal only obtained from a second apparatus) while naive solutions would be forced to obtain or retrieve complete spatial audio signals which consumes a higher bitrate.

[0090] As described above, for example with respect to a virtual acoustics rendering system, reverberation is typically rendered as a combination of at least two echo-generating components: a so-called early reflection echo synthesis component 103, which generates a certain number of perceptually distinct echoes, and a late reverberation synthesis component 107, which generates a stream of echoes which are relatively indistinct but adhere to the overall decay properties of a stochastic model for the late reverberation.

[0091] Reflection echo synthesis is typically spatially dynamic in that the levels and directions of arrival (DoAs) of the reflections depend on the source and listener positions and orientations. For example a reflection processor (such as shown in Fig.5 as reference 851) can be configured to produce a discrete number of echoes that are precise and independently varied in their intensity and coloration, as determined by attenuation with distance, air absorption filtering and reflection surface absorption filtering, and which are specular relative to features and geometry of the modelled room which in turn determines their encoded DoA. These echoes correspond to early reflections depicted in the impulse response (ref. Fig.1, item 103). The reflection processor (which is shown later in the rendering system of Fig.5 with reference 851) is typically external to and running in parallel with the reverberator. Herein, the echoes produced by the reflection processor can also be referred to as reflection echoes.

[0092] Late reverberation, in contrast to early reflections, is not considered to be spatially dynamic in that the echoes produced in late reverberation synthesis do not vary with source-listener orientation.

[0093] A feedback network FN 200 as shown in Fig.2 is an example of a reverberator configured to produce a decaying stream of many echoes which increase in number (density) while decreasing in intensity (loudness) over time, as characterized by the decay properties of stochastic late reverberation (as shown in Fig.1 by the reference 107). The feedback (FB) 200 reverberator shows an input audio signal which is configured to pass through the network, splitting into numerous paths which form “echoes” which are separated in time by independent delay lines, all of which subsequently recirculate through the network, being further divided among the delay lines, subsequently splitting into more echoes with each recirculation through the network, and so on. These echoes can be designed to have only a loose correspondence to the geometry of the virtual room so do not represent geometrically precise (specular) reflections, nor do they convey the attenuation characteristics of specific reflection surfaces. There can be several reverberators in a virtual acoustics renderer, each of which models the characteristics of a room or Acoustic Environment (AE). The rooms (AEs) can have connections to each other, which means that sound sources in any room can contribute to any reverberator. Also, connected or second order reverberation can be implemented by feeding the output of a reverberator into another reverberator.

[0094] Fig.2 shows an example reverberator 200 which could be employed in some embodiments. The reverberator 200 is configured to receive a beam or input audio signal 201 (which can be designated s_in(t), where t is the sample (time) index). Furthermore, the reverberator 200 is configured based on (received) reverberator parameters. In some embodiments the reverberator 200 is further configured to receive directional configuration (and the room dimensions) that may be used to configure the reverberation.

[0095] In this example embodiment, the reverberator 200 has D (for example D = 15) output channels indexed with d = 1, 2, ..., D. The resulting reverberant audio signals 210 s_rev(t, d) are mutually incoherent, and they have acoustical characteristics according to the reverberator parameters.

[0096] The D uncorrelated outputs are subsequently rendered from different spatial directions defined by the directional configuration.

[0097] In some embodiments the reverberator 200 comprises a pre-delay line z^{-m_pre} 205, configured to receive and delay the input audio signal. The reverberator 200 also comprises a reverberation ratio control filter GEQratio 203 which is configured to receive the pre-delay line output 262. The reverberator 200 further comprises a number D of feedback delay lines z^{-m_d} 251 and corresponding feedback delay line attenuation filters GEQd 253. The signals which are output from the feedback delay line attenuation filters GEQd 253 are sent to inputs of a feedback matrix A 257. The outputs of the feedback matrix A 257 are sent to D signal combiners 254 (adders) to sum the outputs of the feedback matrix A 257 with the output of GEQratio 203 to be used as inputs to each of the feedback delay lines z^{-m_d} 251.

[0098] The outputs of delay line attenuation filters GEQd 253 are routed to D signal multipliers 261 which in turn output the reverberant audio signals 210.
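The signal flow described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the reverberator 200 itself: the graphic EQ filters GEQratio and GEQd are replaced by broadband scalar gains (`ratio_gain`, `loop_gains`), and the feedback matrix is a Householder reflection chosen here only because it is orthogonal and cheap to apply. All names are hypothetical.

```python
from collections import deque
from typing import List, Sequence


def fdn_reverb(
    x: Sequence[float],
    delays: Sequence[int],        # m_d, in samples, one per feedback delay line
    loop_gains: Sequence[float],  # scalar stand-ins for the attenuation filters GEQd
    out_gains: Sequence[float],   # output channel gains g_d
    pre_delay: int = 0,           # m_pre, in samples
    ratio_gain: float = 1.0,      # scalar stand-in for GEQratio
) -> List[List[float]]:
    """Return D output channels, one per feedback delay line."""
    d = len(delays)
    if d == 0 or len(loop_gains) != d or len(out_gains) != d or min(delays) < 1:
        raise ValueError("need D >= 1 delay lines with matching gains and delays >= 1")
    # Each deque holds the last m_d samples written to its delay line.
    lines = [deque([0.0] * m, maxlen=m) for m in delays]
    outputs: List[List[float]] = [[] for _ in range(d)]
    for n in range(len(x)):
        # Pre-delay followed by the ratio control gain.
        u = ratio_gain * (x[n - pre_delay] if n >= pre_delay else 0.0)
        # Read the oldest sample of each delay line and attenuate it.
        taps = [loop_gains[i] * lines[i][0] for i in range(d)]
        # Householder feedback matrix A = I - (2/D) * ones(D, D).
        mean2 = 2.0 * sum(taps) / d
        for i in range(d):
            # Adder: feedback matrix output plus the (shared) input.
            lines[i].append(u + taps[i] - mean2)  # maxlen drops the oldest sample
            outputs[i].append(out_gains[i] * taps[i])
    return outputs


impulse = [1.0] + [0.0] * 19
channels = fdn_reverb(impulse, delays=[3, 5, 7], loop_gains=[0.9] * 3, out_gains=[1.0] * 3)
print([round(v, 3) for v in channels[0]])
```

Mutually prime delay lengths, as in the usage example, are a common choice to avoid coinciding echoes; the actual tuning of the delay lengths is left to the implementation.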

[0099] In some embodiments the reverberator 200 is configured to receive reverberator parameters which comprise a delay length m_pre, in samples, for pre-delay line z^{-m_pre} 205, coefficients of a reverberation ratio control filter GEQratio 203, delay lengths m_d for each of D feedback delay lines z^{-m_d} 251, coefficients for each of D feedback delay line attenuation filters GEQd 253, and coefficients for the feedback matrix A 257. The reverberator parameters also comprise output channel gains g_d which are used to configure D signal multipliers 261.

[0100] In some embodiments the frequency dependent gain elements or attenuation filters GEQd 253 are implemented as graphic equalizer (EQ) filters using M biquad IIR band filters. In the case of octave-band filtering, M = 10. Thus, the reverberator parameters corresponding to each graphic EQ filter comprise the feedforward and feedback coefficients for 10 biquad IIR filters, the gains for the biquad band filters, and the overall gain.

[0101] The feedback delay lines z^{-m_d} 251 can also be referred to as loop delay lines or recirculating delay lines and the feedback delay line attenuation filters GEQd 253 can be referred to as loop filters or recirculating filters. In some embodiments the coefficients of feedback matrix A 257 are hardcoded in software code rather than provided as parameters.

[0102] The reverberator thus comprises multiple recirculating delay lines associated with the feedback network (FN) 250. The feedback matrix A 257 is used to control the recirculation gain and routing within the network. The feedback delay line attenuation filters GEQd 253 can be implemented in some embodiments as graphic EQ filters implemented as cascades of second-order section IIR filters and can facilitate controlling the energy decay rate at different frequencies. The feedback delay line attenuation filters GEQd 253 furthermore are designed such that they attenuate the signal by the desired amount with each pass through the FN such that the desired reverberation time (RT60) is achieved.
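The per-pass attenuation needed to reach a desired RT60 can be worked out as follows. RT60 is the time for the reverberant level to fall by 60 dB, so a delay line of m_d samples at sample rate f_s should attenuate by 60 * m_d / (RT60 * f_s) dB on each pass. The sketch below computes the corresponding linear gain per frequency band; the function names are hypothetical, and the design of a graphic EQ that realizes these band gains is not shown.

```python
from typing import List, Sequence


def loop_gain_for_rt60(delay_samples: int, rt60_s: float, sample_rate_hz: float) -> float:
    """Linear gain applied once per pass through a delay line so that the
    recirculating signal has decayed by 60 dB after rt60_s seconds."""
    if rt60_s <= 0.0 or sample_rate_hz <= 0.0:
        raise ValueError("rt60_s and sample_rate_hz must be positive")
    decay_db = -60.0 * delay_samples / (rt60_s * sample_rate_hz)
    return 10.0 ** (decay_db / 20.0)


def band_gains(delay_samples: int, rt60_per_band_s: Sequence[float],
               sample_rate_hz: float) -> List[float]:
    """Target attenuation-filter gain in each frequency band k for one delay line."""
    return [loop_gain_for_rt60(delay_samples, rt60, sample_rate_hz)
            for rt60 in rt60_per_band_s]


# A band with a shorter RT60 needs more attenuation per pass.
print(band_gains(480, [2.0, 1.0, 0.5], 48000.0))
```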

[0103] The number of delay lines D can be adjusted depending on quality requirements and the desired tradeoff between reverberation quality (e.g. modal density, temporal and spatial diffuseness, diffuse onset time) and computational complexity. In an embodiment, an efficient implementation with D = 15 delay lines is used. This makes it possible to define the coefficients of the feedback matrix A 257 as proposed by Rocchesso in Maximally Diffusive Yet Efficient Feedback Delay Networks for Artificial Reverberation, IEEE Signal Processing Letters, Vol. 4, No. 9, Sep 1997, in terms of a Galois sequence facilitating efficient implementation.

[0104] The feedback network (FN) 258 of the reverberator is a feedback delay network (FDN). Reverberator 200 can thus produce 15 nearly uncorrelated outputs which are subsequently rendered from different spatial directions defined by the directional configuration. The output signals are reproduced using loudspeakers (or alternatively virtual loudspeakers that are convolved with HRTFs or yet alternatively encoded to ambisonics which is then decoded to a binaural or loudspeaker format) that are positioned in the corresponding spatial directions, and the levels of which are controlled with channel gain coefficients. The resulting reverberant audio signals have acoustical characteristics according to the reverberator parameters, namely a desired frequency-dependent rate of decay and level.

[0105] In some embodiments the reverberator 200 parameters are related to the reverberation characteristics of the acoustic environment or room which the reverberator relates to. In a scene with several rooms there can be several reverberator 200 instances implemented.

[0106] Fig.3 shows an example system or apparatus representing a reverberator processing system 300 suitable for rendering late-stage reverberation according to some embodiments and which employs the reverberator 200 as shown in Fig.2. In the following described embodiments the audio signal component is a beam formed audio signal or beam signal, however in some embodiments the audio signal component can be an audio signal channel, a multichannel audio signal, or an audio source and thus the ’beam signal’ or beam signal elements can be more generally designated as audio signal component or audio signal component element.

[0107] The system comprises inputs such as spatial audio signals 303, reverberation configuration specification 302, and directional configuration specification 312. Furthermore the system comprises further inputs such as listener position 306, beam signal provider 328, spatial audio signal configuration specification 324, source configuration specification 318 and beam configuration specification 320.

[0108] The apparatus or reverberator processing system 300 further comprises a spatial audio signal selector 322 (or more generally an audio signal component identifier). The spatial audio selector 322 or audio signal component identifier is configured to receive the spatial audio signals 303, spatial audio signal configuration specification 324 and source configuration specification 318 as input. In some embodiments the spatial audio selector 322 or audio signal component identifier is configured to obtain or receive the listener position 306. The input to the spatial audio signal selector 322 or audio signal component identifier can be generally known as spatial audio information, the spatial audio information comprising: the at least one spatial audio signal; the spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene.

[0109] The spatial audio signal selector 322 can in some embodiments be configured to select and output the best spatial audio signal 301 for the source configured in source configuration specification 318 from the several spatial audio signals in 303. The arrangement of spatial audio signals such as their positions and orientations are described in spatial audio signal configuration specification 324.

[0110] In a simple example, the spatial audio signal selector can select the spatial audio signal 301 whose position is closest to the sound source of interest indicated in the source configuration specification 318 to be used as the output spatial audio signal 301. In other embodiments other criteria for selection, such as described later, can be employed.
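The simple closest-signal selection described above can be sketched as follows. The `SpatialAudioSignal` record and its fields are hypothetical stand-ins for entries of the spatial audio signal configuration specification 324, and positions are assumed to be Cartesian coordinates in a common scene frame.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class SpatialAudioSignal:
    signal_id: str
    position: Position  # capture position of the spatial audio signal


def select_closest_signal(signals: Sequence[SpatialAudioSignal],
                          source_position: Position) -> SpatialAudioSignal:
    """Pick the spatial audio signal whose capture position is closest to the source."""
    if not signals:
        raise ValueError("no spatial audio signals to select from")
    return min(signals, key=lambda s: math.dist(s.position, source_position))


signals = [
    SpatialAudioSignal("hoa_a", (0.0, 0.0, 1.5)),
    SpatialAudioSignal("hoa_b", (4.0, 0.0, 1.5)),
]
print(select_closest_signal(signals, (3.0, 1.0, 1.5)).signal_id)
```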

[0111] In some embodiments, the spatial audio signal selector 322 or audio signal component identifier can be configured to identify the spatial audio signal or audio signal component and ‘select’ or provide this identification to be provided to the beam signal obtainer 316.

[0112] The reverberator processing system 300 further comprises a beam signal receiver 326 (or more generally an identified audio component receiver). The beam signal receiver 326 is configured to receive spatial audio receiving configuration 328 as input and produce a beam signal 201 b as output. The beam signal receiver is configured to communicate with a beam signal provider 328 which typically operates on a different device or further apparatus, such as a server, and hosts all spatial audio signals (and including the spatial audio signals 303 provided to the spatial audio signal selector 322) and their associated metadata such as spatial audio signal configuration specification 324.

[0113] When a spatial audio signal or signal component which is optimal for extracting a beam signal 201 a corresponding to a source identified in the source configuration specification 318 is not available in the spatial audio signals 303 input, the spatial audio signal selector 322 can provide the beam signal receiver 326 information in the spatial audio signal receiving configuration. The beam signal receiver 326 can then be configured to request a suitable beam signal 201 b (or more generally the identified audio signal component) and the beam signal 201 b (or identified audio signal component) can be provided from the beam signal provider 328.

[0114] In other words the beam signal provider 328 is configured not to provide the missing spatial audio signal but the identified audio signal component beam signal 201 b which already has been generated or extracted by the beam signal provider 328 from the optimal spatial audio signal indicated by spatial audio receiving configuration 328.

[0115] The reverberator processing system 300 further comprises a beam signal obtainer 316 (or more generally identified audio component obtainer) which is configured to receive the spatial audio signal 301 (either the signal or identification of the audio signal or audio signal component) and source configuration specification or information (for example the sound source direction, source reverberation gain, and reference distance) 318 and the beam configuration specification or information 320 and produce a beam signal 201 a to be input to the reverberator 200.

[0116] Thus, in some embodiments the beam signal obtainer is configured to generate the beam signal 201 a or audio signal component when the identified audio signal component is available or able to be generated at the apparatus from the spatial audio signal 301 or when the identified audio signal component is not available then the beam signal receiver 326 is configured to retrieve the audio signal component or beam signal 201 b from the beam signal provider 328.

[0117] In some embodiments, the reverberator processing system 300 comprises a reverberator parameter determiner 303 configured to obtain the reverberation configuration specification 302 and directional configuration specification 312. The reverberator parameter determiner 303 can in these embodiments be configured to convert these specifications into suitable reverberator parameters 304 for the reverberator 200.

[0118] Furthermore in some embodiments the reverberator processing system 300 further comprises a binaural renderer 309 configured to render reverberant binaural signals 314 with late reverberation that are perceived according to the reverberant characteristics specified in the reverberation configuration specification 302 and directional characteristics specified in the directional configuration specification 312 and where the reverberation is applied to a source in the spatial audio signal 301 obtained using the source configuration specification 318. The reverberation configuration specification 302 and directional configuration specification 312 and the source configuration specification 318 can, for example, be obtained from a bitstream or from a listening space description format (LSDF) input to the renderer.

[0119] In some embodiments, the reverberation configuration specification 302 comprises suitable parameters for configuring the reverberator 200. Suitable reverberation configuration specification 302 includes, for example, the reverberation times RT60(k) in frequency bands (where k is the frequency band index), reverberant-to-direct ratio RDR(k), pre-delay time t_pre, and / or a virtual space geometry specification. As an alternative to the RDR, the diffuse-to-source energy ratio (DSR) can be used.

[0120] In some embodiments, the directional configuration specification 312 can indicate encoding directions used to render the reverberation by a suitable rendering scheme that creates a perception of enveloping diffuse reverberation, such as ambisonics or amplitude panning rendering, or simply rendered directly to a surrounding (real or virtual) loudspeaker setup. As an example, the directional configuration may specify a spherical design such as a t-design, Lebedev grid, or other suitable (nearly) uniform spherical layout with D points representing encoding directions (and thus the number of reverberator output channels).
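As one possible instance of an "other suitable (nearly) uniform spherical layout", the sketch below generates D encoding directions on a Fibonacci (golden-angle) lattice. This layout is an assumption made for illustration; it is neither a t-design nor a Lebedev grid, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def fibonacci_sphere_directions(num_points: int) -> List[Tuple[float, float]]:
    """Return (azimuth_deg, elevation_deg) pairs spread nearly uniformly
    over the sphere using a golden-angle spiral."""
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(num_points):
        # Heights are equally spaced in (-1, 1), which gives equal-area bands.
        z = 1.0 - (2.0 * i + 1.0) / num_points
        elevation = math.degrees(math.asin(z))
        azimuth = math.degrees((i * golden_angle) % (2.0 * math.pi)) - 180.0
        directions.append((azimuth, elevation))
    return directions


# D = 15 directions, one per reverberator output channel.
for az, el in fibonacci_sphere_directions(15):
    print(f"azimuth {az:8.2f}  elevation {el:7.2f}")
```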

[0121] In some embodiments, the source configuration specification 318 can indicate the spatial direction (azimuth and elevation) of a sound source in sound source direction, a desired reverberation gain for the source as source reverberation gain, and a reference distance for the sound source. The spatial direction can indicate the direction of the sound source as captured by the spatial audio signal 301. The source reverberation gain can indicate a desired reverberation gain for the sound source. A reference distance can indicate a reference distance for the sound source also impacting the reverberant signal level.

[0122] In some embodiments, the beam configuration specification 320 can indicate beamforming configuration data such as beam directions and widths and attenuations associated with the spatial audio signal 301 or which can be applied to or otherwise obtained from the spatial audio signal 301.
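To make the notion of a beam signal concrete, the sketch below forms a first-order virtual cardioid microphone from a first order ambisonic signal, steered towards the sound source direction. It is an illustrative assumption only and is not the beamforming of US provisional 63 / 736,077; it assumes ACN channel order (W, Y, Z, X) with SN3D normalization, azimuth measured counter-clockwise from the front, and a hypothetical function name.

```python
import math
from typing import List, Sequence


def foa_cardioid_beam(w: Sequence[float], y: Sequence[float], z: Sequence[float],
                      x: Sequence[float], azimuth_deg: float,
                      elevation_deg: float) -> List[float]:
    """Mono beam signal from a virtual cardioid pointing at the given direction."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector of the look direction.
    dx = math.cos(az) * math.cos(el)
    dy = math.sin(az) * math.cos(el)
    dz = math.sin(el)
    # Cardioid: half omnidirectional (W) plus half figure-of-eight along the look direction.
    return [0.5 * (wn + dx * xn + dy * yn + dz * zn)
            for wn, yn, zn, xn in zip(w, y, z, x)]


# A plane wave arriving from the left (azimuth 90 degrees), encoded to FOA.
s = [1.0, -0.5, 0.25]
w, y, z, x = s, s, [0.0] * 3, [0.0] * 3
print(foa_cardioid_beam(w, y, z, x, 90.0, 0.0))   # beam towards the source
print(foa_cardioid_beam(w, y, z, x, -90.0, 0.0))  # beam away from the source
```

Higher order signals allow narrower beams, which is where the beam widths and attenuations of the beam configuration specification would come into play.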

[0123] Figs.4a and 4b show an example flow diagram of the operations of the example reverberator processing system shown in Fig.3 according to some embodiments.

[0124] First obtain spatial audio signals, spatial audio signal configuration specification and source configuration specification as shown in Fig.4a by 401.

[0125] Then identify / determine a source of interest based on source configuration specification as shown in Fig.4a by 403. The source of interest is in some embodiments the sound source captured by the spatial audio signals that is to be reverberated.

[0126] Then, identify / determine spatial audio signals capturing the source of interest based on spatial audio signal configuration specification and source configuration specification as shown in Fig.4a by 405. The identification or determining of spatial audio signals capturing the source of interest based on spatial audio signal configuration specification and source configuration specification can in some embodiments be one where the spatial audio signal selector 322 determines the one or more spatial audio signals which capture the source of interest with sufficient quality.

[0127] Then, identify / determine a selected spatial audio signal (or audio signal component) from the spatial audio signals capturing the source of interest by considering at least a prominence of the source of interest within the spatial audio signal as shown in Fig.4a by 407. The determining or identifying of a selected spatial audio signal from the spatial audio signals capturing the source of interest by considering at least prominence of the source of interest within the spatial audio signal can consider the following aspects:

[0128] Prominence of sound source within the spatial audio signal: a spatial audio signal close to the source of interest is likely to capture the source of interest with good signal to noise ratio. Such proximity can be determined by calculating the distance between the point of capture of a spatial audio signal and the location of the source of interest. Directivity and / or orientation of the source of interest can be taken into account in some embodiments so that if source of interest is directed towards the position of spatial audio signal then it is considered more prominent. That is the level of the source of interest is likely to be high compared to other sources. In embodiments where the spatial audio signal selector has a model of the source of interest then it can also analyze the content of the spatial audio signal to determine the prominence of the signal. An example of this can include applying correlation analysis.

[0129] Proximity of the spatial audio signal to a listener position: in some embodiments a spatial audio signal closest to the listener position can be selected.

[0130] Metadata indication of preferred spatial audio signal based on listener position: in some embodiments there can be a bitstream indication indicating the optimal spatial audio signal for each sound source. This indication can change over time.
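The correlation analysis mentioned for the prominence aspect can be sketched as follows, assuming the selector has a reference (model) signal for the source of interest and compares it with one channel, for example the omnidirectional channel, of each candidate spatial audio signal. The zero-lag comparison ignores propagation delay between source and capture point, which is a simplification; all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def normalized_correlation(reference: Sequence[float], captured: Sequence[float]) -> float:
    """Magnitude of the zero-lag normalized correlation, in the range [0, 1]."""
    n = min(len(reference), len(captured))
    num = sum(reference[i] * captured[i] for i in range(n))
    den = math.sqrt(sum(r * r for r in reference[:n]) * sum(c * c for c in captured[:n]))
    return abs(num) / den if den > 0.0 else 0.0


def most_prominent_signal(reference: Sequence[float],
                          candidates: Dict[str, Sequence[float]]) -> str:
    """Return the id of the candidate in which the reference source is most prominent."""
    if not candidates:
        raise ValueError("no candidate spatial audio signals")
    return max(candidates, key=lambda sid: normalized_correlation(reference, candidates[sid]))


source = [1.0, -1.0, 1.0, -1.0]
other = [1.0, 1.0, -1.0, -1.0]  # an unrelated source, orthogonal to the reference
candidates = {
    "near": [0.8 * s + 0.1 * o for s, o in zip(source, other)],
    "far": [0.1 * s + 0.8 * o for s, o in zip(source, other)],
}
print(most_prominent_signal(source, candidates))
```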

[0131] Then, a determination is made whether the identified audio signal component or selected spatial audio signal is available as shown in Fig.4a by 409. This can be, for example, because the apparatus or system of Fig.3 can receive only a subset of the spatial audio signals for rendering at any given time. This means that the optimal spatial audio signal for capturing a source of interest may not always be available for generating the audio signal component, or beam.

[0132] If the identified audio signal component or selected spatial audio signal is available then the spatial audio signal is output as shown in Fig.4a by 411.

[0133] Then from the spatial audio signal the audio signal component, for example the output beam signal is generated or otherwise obtained and output as shown in Fig.4a by 413. The obtaining of the audio signal component or beam signal can be implemented as described in US provisional 63 / 736,077.

[0134] If the identified audio signal component or selected spatial audio signal is not available then information identifying the identified audio signal component (for example the beam) or spatial audio receiving configuration (which would be able to generate the identified audio signal component or signal beam) is provided to the audio component receiver or beam signal receiver as shown in Fig.4a by 415. In other words, if the selected spatial audio signal is not available, the operation of providing the spatial audio receiving configuration to the beam signal receiver is implemented. The spatial audio signal receiving configuration can indicate, for example, the desired spatial audio signal, timestamp, and source of interest.

[0135] Then from the information identifying the identified audio signal component (for example the beam) or spatial audio receiving configuration (which would be able to generate the identified audio signal component or signal beam) the audio signal component, for example the beam audio is requested and received (from a further apparatus or server) as shown in Fig.4a by 417. The request can, for example, indicate the spatial audio signal, source of interest, and timestamp. In response, the beam signal provider 328 provides a beam signal that has been created from the indicated spatial audio signal at the indicated timestamp and which captures the source of interest. A beam signal is provided from beam signal provider 328 instead of the full spatial audio signal as this substantially saves network bandwidth. The beam signals can be created offline and stored at the beam signal provider 328. In the creation the same logic can be applied as in spatial audio signal selector 322 for selecting the spatial audio signal and in beam signal obtainer 316 for obtaining the beam signal. However, the operations of beam signal provider 328 are executed on a further apparatus, for example a network server. The received beam signal can be output as beam signal 201 b.

[0136] Then, the received audio signal component, for example the beam signal, is output as the audio component / beam signal as shown in Fig.4a by 419. In this case the beam signal obtainer 316 does not need to perform the beamforming step anymore but it can just pass the received beam signal 201 b as the output beam signal 201 a.
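As a non-limiting illustration, the fallback of paragraphs [0134] to [0136] can be sketched as follows. The names `BeamRequest`, `obtain_beam_signal`, `beamform` and `request_beam` are hypothetical and do not appear in the description; the two callables merely stand in for local beamforming in the beam signal obtainer 316 and for the request to the beam signal provider 328, and a signal is simplified to a flat list of samples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Signal = List[float]  # simplification: a real spatial audio signal is multichannel


@dataclass(frozen=True)
class BeamRequest:
    """Hypothetical spatial audio receiving configuration (cf. step 415)."""
    spatial_signal_id: str
    source_id: str
    timestamp: float


def obtain_beam_signal(
    local_signals: Dict[str, Signal],
    request: BeamRequest,
    beamform: Callable[[Signal, str], Signal],
    request_beam: Callable[[BeamRequest], Signal],
) -> Signal:
    """Return a beam signal for the source of interest."""
    spatial_signal: Optional[Signal] = local_signals.get(request.spatial_signal_id)
    if spatial_signal is not None:
        # The selected spatial audio signal is available: beamform locally.
        return beamform(spatial_signal, request.source_id)
    # Not available: request the ready-made beam (cf. steps 415 and 417) and
    # pass it through without local beamforming (cf. step 419).
    return request_beam(request)
```

Passing the two operations in as callables keeps the sketch independent of any particular beamformer or network transport.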

[0137] Then, reverberant binaural signals 314 are rendered by processing the reverberated audio signals 210 by the configured binaural renderer 309 as shown in Fig.4 by 415.

[0138] Then, the reverberation configuration specification 302 and directional configuration specification 312 inputs are obtained as shown in Fig.4b by 421.

[0139] Then, the reverberator parameters 304 are determined from the reverberation configuration specification 302 and directional configuration specification 312 inputs as shown in Fig.4b by 423.

[0140] Then, the reverberator 200 is configured using reverberator parameters 304 as shown in Fig.4b by 425.

[0141] Then, the binaural renderer 309 is configured using the directional configuration specification 312 as shown in Fig.4b by 427.

[0142] Then, reverberant audio signals 210 are generated by processing the audio signal with the configured reverberator 200 as shown in Fig.4 by 429.

[0143] Then, reverberant binaural signals 314 are rendered by processing the reverberated audio signals 210 by the configured binaural renderer 309 as shown in Fig.4 by 431.

[0144] Then, reverberant binaural signals 314 are output as shown in Fig.4 by 433.

[0145] In some embodiments, when the source spatial audio signal for a sound source beam signal changes, there can be a period during which two beam signals are obtained from two spatial audio signals (or received from the beam signal receiver). In this case, the beam signal obtainer 316 can be configured to perform cross fading between these beam signals or more generally audio signal components to ensure smooth transition between different signal sources. The cross fading can involve gradually decreasing at least one gain value for the previous beam signal (or audio signal component) and gradually increasing at least one gain value for the next beam signal (or audio signal component). This way transitions can be done in a smooth manner when the optimal source spatial audio changes.
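As a non-limiting illustration, the cross fading described above can be sketched with a linear gain ramp. The linear ramp and the function name `crossfade` are assumptions made for this example; the description only requires that one gain gradually decreases while the other gradually increases.

```python
from typing import List


def crossfade(previous: List[float], upcoming: List[float]) -> List[float]:
    """Linearly fade from `previous` to `upcoming` over their common length."""
    if len(previous) != len(upcoming):
        raise ValueError("both beam signal segments must have the same length")
    n = len(previous)
    out = []
    for i in range(n):
        # Gain of the next beam rises from 0 to 1 across the segment.
        fade_in = i / (n - 1) if n > 1 else 1.0
        # Gain of the previous beam falls from 1 to 0 at the same rate.
        fade_out = 1.0 - fade_in
        out.append(fade_out * previous[i] + fade_in * upcoming[i])
    return out
```

Because the two gains always sum to one, a source that is captured identically in both beam signals keeps a constant level during the transition.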

[0146] In some embodiments, the spatial audio signal selector 322 can provide a confidence value related to how well a source of interest can be obtained into the beam signal. This confidence value can be used to apply an additional gain to the beam signal before it is fed to reverberation or early reflection processing. If the confidence of obtaining good beamforming of a source is low then the gain value can be low so that the reverberation effect created from the source is lower. Correspondingly, if the confidence of obtaining a good beamforming of a source signal is high, this additional gain value can be high so that stronger reverberation effect can be created. This way it is ensured that strong reverberation or reflection effects are created for sources which can be well separated from the spatial audio signals via beamforming.
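As a non-limiting illustration, the confidence-dependent gain can be sketched as below. The linear mapping from confidence to gain, the range of the confidence value (0 to 1) and the parameter names `min_gain` and `max_gain` are assumptions of this example and are not specified in the description.

```python
from typing import List


def apply_confidence_gain(
    beam: List[float],
    confidence: float,
    min_gain: float = 0.0,
    max_gain: float = 1.0,
) -> List[float]:
    """Scale a beam signal before it is fed to reverberation or reflections."""
    # Clamp the confidence to [0, 1] so the gain stays in [min_gain, max_gain].
    c = min(max(confidence, 0.0), 1.0)
    # Low confidence gives a low gain, high confidence a high gain.
    gain = min_gain + c * (max_gain - min_gain)
    return [gain * sample for sample in beam]
```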

[0147] With respect to Fig.5 is shown an example virtual audio scene rendering system 800 comprising the reverberation system 300 as shown in Fig.3 according to some embodiments.

[0148] The virtual audio scene rendering system 800 can further comprise a reflection processor 851. The reverberator and the reflection processor are configured to generate audio signals associated with echoes within the system. For example, the reflection processor is configured to produce a discrete number of echoes which are specular with regard to features and geometry of the modelled room and are correspondingly precise and independently varied in their arrival direction, intensity, and coloration, as characterizes early reflections in a room impulse response (ref. Fig. 1, region 103). The echoes produced by the reflection processor are accordingly referred to as reflection echoes. The reflection processor is external to and runs in parallel with the reverberator 300 which is configured to produce late reverberation.

[0149] Fig.5 further shows an example reflection binaural renderer 859 configured to receive the reflection audio signal 850 from the reflection processor and generate reflection binaural audio signals 854.

[0150] Fig.6 shows an example reflection processor 851 and the associated reflection binaural renderer 859 suitable for use with the embodiments as discussed herein. There are several ways to calculate or simulate early reflections. As an example, the image source method can be used, such as detailed in J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, pp. 943-950, April 1979 and J. Borish, “Extension of the image model to arbitrary polyhedra,” The Journal of the Acoustical Society of America 75.6 (1984): 1827-1836, and as illustrated in Fig.7.

[0151] These parameters, such as delay 906, absorption 908, attenuation 910 and direction of arrival (DoA) 912, can be explained with respect to Fig.7 where an example box or rectangular space is shown with reflecting surfaces 1000, 1002, 1004, 1006. Within the virtual acoustic space is the source 1020 and the listener 1010. The direction of a reflection between the source 1020 and the listener 1010 is shown, where the reflection and / or absorption point 1040 lies on the reflecting surface between the source 1020 and the listener 1010. The mirroring of the source 1020 across the reflecting surface 1006 can be used to establish an image source 1030. The line connecting the image source 1030 to the listener 1010 can then be used to establish the reflection and / or absorption point 1040 and the DoA of the reflection with respect to the listener. The delay to be applied to synthesize a reflection is obtained based on the distance of the reflecting path (the path from the image source 1030 to the listener 1010, which equals the length of the path from the source 1020 via the reflection point 1040 to the listener 1010). The absorption corresponds to the reflecting surface 1006 from which this sound trajectory is reflected (the reflection and / or absorption point 1040). The distance attenuation is set proportional to 1 / r where r equals the length of the reflection path from the source to the listener. In addition, air absorption can be included in the attenuation of the image source. The DoA of a reflection is set based on the angle of arrival from the reflection point to the listener.
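As a non-limiting illustration, the geometry of Fig.7 can be sketched in two dimensions for a single wall lying on the line y = wall_y. The names `Reflection` and `first_order_reflection`, the restriction to an axis-aligned wall, the speed of sound of 343 m/s and the use of an azimuth angle in degrees for the DoA are assumptions of this example; air absorption and the frequency-dependent surface absorption are omitted.

```python
import math
from typing import NamedTuple, Tuple

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the description

Point = Tuple[float, float]


class Reflection(NamedTuple):
    image_source: Point
    reflection_point: Point
    delay_s: float
    attenuation: float
    doa_deg: float


def first_order_reflection(source: Point, listener: Point, wall_y: float) -> Reflection:
    """First-order reflection from a wall on the line y = wall_y (2-D sketch)."""
    sx, sy = source
    lx, ly = listener
    if (sy - wall_y) * (ly - wall_y) <= 0.0:
        raise ValueError("source and listener must be on the same side of the wall")
    # Mirror the source across the wall to obtain the image source.
    image = (sx, 2.0 * wall_y - sy)
    # Length r of the reflection path: image source to listener.
    r = math.hypot(lx - image[0], ly - image[1])
    # Reflection point: where the image-to-listener line crosses the wall.
    t = (wall_y - image[1]) / (ly - image[1])
    point = (image[0] + t * (lx - image[0]), wall_y)
    # DoA: direction from the listener towards the reflection point.
    doa = math.degrees(math.atan2(point[1] - ly, point[0] - lx))
    # Delay from the path length, attenuation proportional to 1 / r.
    return Reflection(image, point, r / SPEED_OF_SOUND, 1.0 / r, doa)
```

For a source at (1, 1), a listener at (3, 1) and a wall on y = 0, the image source lies at (1, -1) and the reflection point at (2, 0).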

[0152] Fig.8 illustrates an example of multiple echo orders, in which the left side of Fig.8 shows an XY plane representation 500 of a virtual “shoebox” room 501 and its virtual image rooms in a grid-like structure. Third-order image sources calculated by the image source method are depicted (one example image source indicated by item 502). The circle described by a radius 503 may determine the time-equivalent value of tpre, and the remaining distance to each image source, exemplified by the distance 504, may correspond to the chosen values of md and mc when converting the distances to time-of-flight and then to samples. In this example, Fig.8 510 shows times of arrival (ToAs) (including tpre) for various echo orders. For example plot 511 shows order 1 echoes, 512 shows order 2 echoes, 513 shows order 3 echoes and 514 the combination of echoes up to (and including) order 3.
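As a non-limiting illustration, the image sources of a two-dimensional shoebox room and their times of arrival in samples can be enumerated as below. The function names, the room with one corner at the origin, the speed of sound of 343 m/s and the definition of the echo order as the sum of the absolute image indices are assumptions of this example; the split of the ToA into tpre and the remaining values is not modelled.

```python
import math
from typing import List, Tuple

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the description


def image_coordinate(n: int, length: float, s: float) -> float:
    """1-D image position for image index n, source at s, room spanning [0, length]."""
    # Even indices are translated copies, odd indices are mirrored copies.
    return n * length + (s if n % 2 == 0 else length - s)


def image_source_toas(
    room: Tuple[float, float],
    source: Tuple[float, float],
    listener: Tuple[float, float],
    max_order: int,
    sample_rate: float,
) -> List[Tuple[int, int]]:
    """Return (echo order, ToA in samples) pairs sorted by time of arrival."""
    result = []
    for nx in range(-max_order, max_order + 1):
        for ny in range(-max_order, max_order + 1):
            order = abs(nx) + abs(ny)  # number of wall reflections
            if order > max_order:
                continue
            x = image_coordinate(nx, room[0], source[0])
            y = image_coordinate(ny, room[1], source[1])
            distance = math.hypot(x - listener[0], y - listener[1])
            # Distance -> time of flight -> samples.
            toa = round(distance / SPEED_OF_SOUND * sample_rate)
            result.append((order, toa))
    return sorted(result, key=lambda item: item[1])
```

In this two-dimensional sketch there are 4 image sources of order 1, 8 of order 2 and 12 of order 3, in addition to the direct path (order 0).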

[0153] In the example early reflection renderer shown in Fig.6, a reflection parameter determiner 901 is configured to receive the inputs of room geometry 906, listener position 900, source position 902, and absorption coefficients 904 and generate control parameters such as delay 906, absorption 908, attenuation 910 and direction of arrival (DoA) 912 and pass these to the processors described hereafter.

[0154] In some embodiments the beam signal 201 is first fed into a delay line 903 which buffers audio signal samples and enables picking segments of past samples of the beam signal 201.

[0155] The reflection signal obtainer 905 can receive the output of the delay line 903 and the delay 906 parameter. The reflection signal obtainer is configured to obtain a past signal sample based on the delay 906 to obtain a delayed signal.

[0156] A reflection absorption processor 907 then can filter the selected past signal sample to apply an equalizer filter to model the frequency-dependent absorption data for the reflection to obtain delayed and absorption-filtered signal.

[0157] A reflection attenuation processor 909 can then attenuate the delayed and absorption-filtered signal by applying a 1 / r attenuation and optionally air absorption to obtain delayed and absorption-filtered and attenuated signal.

[0158] Finally, a reflection spatializer 911 can be configured to spatialize the delayed and absorption-filtered and attenuated signal by HRTF filtering with a left and right HRTF filter corresponding to the desired DoA for this reflection to obtain a reverberant binaural signal 912 containing the synthesized reflection portion. In some situations, the reflection spatializer can be the binauralizer.
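As a non-limiting illustration, the chain of delay line 903, reflection signal obtainer 905, reflection absorption processor 907 and reflection attenuation processor 909 can be sketched for a single reflection as below. The names `DelayLine` and `render_reflection` are hypothetical. As simplifying assumptions, the frequency-dependent equalizer filter is replaced by one broadband gain derived from an energy absorption coefficient, air absorption is omitted, and the HRTF spatialization of the reflection spatializer 911 is left out so that the output is a mono signal.

```python
import math
from collections import deque
from typing import Deque, List


class DelayLine:
    """Fixed-length buffer of past samples (cf. delay line 903)."""

    def __init__(self, max_delay: int) -> None:
        size = max_delay + 1
        self._buffer: Deque[float] = deque([0.0] * size, maxlen=size)

    def push(self, sample: float) -> None:
        self._buffer.append(sample)  # the oldest sample drops out automatically

    def read(self, delay: int) -> float:
        """Return the sample pushed `delay` steps ago (0 = most recent)."""
        return self._buffer[-1 - delay]


def render_reflection(
    beam: List[float],
    delay_samples: int,
    absorption: float,
    path_length_m: float,
) -> List[float]:
    """Delay, absorb and attenuate a beam signal for one reflection."""
    line = DelayLine(delay_samples)
    # Broadband stand-in for the absorption filter: pressure gain sqrt(1 - alpha).
    reflection_gain = math.sqrt(1.0 - absorption)
    # Distance attenuation proportional to 1 / r.
    distance_gain = 1.0 / path_length_m
    out = []
    for sample in beam:
        line.push(sample)
        delayed = line.read(delay_samples)
        out.append(delayed * reflection_gain * distance_gain)
    return out
```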

[0159] The virtual audio scene rendering system can furthermore comprise the reverberation rendering system of Fig.3, that renders the diffuse late reverberation based on the beam signal 201 received from the beam signal obtainer 316. It is noted that early reflections are rendered individually for each of the beam signals 201_1, 201_2, ..., 201_n. This is because early reflections each can have a different direction of arrival and thus need to be rendered separately. For late reverberation, the beam signals 201_1, 201_2, ..., 201_n can be mixed with an adder 870 and fed into the Reverberator 200 as a downmix.
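As a non-limiting illustration, the downmix formed by the adder 870 can be sketched as a sample-wise sum of equally long beam signals. The function name `downmix` and the absence of any normalization are assumptions of this example.

```python
from typing import List, Sequence


def downmix(beams: Sequence[Sequence[float]]) -> List[float]:
    """Sum equally long beam signals sample by sample (cf. adder 870)."""
    if not beams:
        return []
    length = len(beams[0])
    if any(len(beam) != length for beam in beams):
        raise ValueError("all beam signals must have the same length")
    # One output sample per time instant: the sum over all beams.
    return [sum(samples) for samples in zip(*beams)]
```

The individual beam signals remain available for early reflection rendering; only the late reverberation path consumes the summed signal.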

[0160] Figs.9a and 9b show schematically an example system where the embodiments are implemented in an encoder device 1101, which performs part of the functionality, writes data into a bitstream 1121 and transmits it to a renderer device 1141, which decodes the bitstream, performs reverberator processing according to the embodiments and outputs audio for headphone listening. New elements compared to background are indicated with thicker lines.

[0161] The encoder side 1101 of Fig.9a can be performed on content creator computers and / or network server computers. The output of the encoder is the bitstream 1121 which is made available for downloading or streaming. The decoder / renderer 1141 functionality runs on an end-user device, which can be a mobile device, personal computer, sound bar, tablet computer, car media system, home HiFi or theatre system, head mounted display for AR or VR, smart watch, or any suitable system for audio consumption.

[0162] The encoder 1101 is configured to receive the virtual scene description 1100 and the audio signals 1104. The virtual scene description 1100 can be provided in the MPEG-I encoder input format (EIF) or in another suitable format. Generally, the virtual scene description contains an acoustically relevant description of the contents of the virtual scene, and contains, for example, the scene geometry as a mesh or as voxels, acoustic materials, acoustic environments with reverberation parameters, positions of sound sources, positions of spatial audio signals and their orientations and other audio element related parameters such as whether reverberation is to be rendered for an audio element or not. The encoder 1101 in some embodiments comprises a scene encoder 1113 configured to generate scene and reverberation parameters.

[0163] The encoder 1101 further comprises an MPEG-H 3D audio encoder 1114 configured to obtain the audio signals 1104 and MPEG-H encode them and pass them to a bitstream encoder 1115.

[0164] The encoder 1101 further comprises a spatial audio signal selector 322 / 1155 which is configured to obtain spatial audio signals from the audio signals 1104 and positions and orientations of spatial audio signals and sound source positions from virtual scene description 1100 and provide the selected spatial audio signals to a beam signal obtainer 316 / 1156. The beam signal obtainer 316 / 1156 produces the beam signals for each sound source at each time instant from the spatial audio signals, and the beam signals are passed to the MPEG-H 3D audio encoder 1114 for encoding. The encoded beam signals are then stored into the beam signal provider 328 which is configured to output into the bitstream encoder 1115 so that beam signals can be provided via the bitstream when requested by the decoder / renderer 1141.

[0165] The encoder 1101 furthermore in some embodiments comprises a bitstream encoder 1115 which is configured to receive the output of the scene encoder 1113 and the encoded audio signals from the MPEG-H encoder 1114 and the encoded beam signals from the beam signal provider 328 and generate the bitstream 1121 which can be passed to the bitstream decoder 1141. The bitstream 1121 in some embodiments can be streamed to end-user devices or made available for download or stored.

[0166] The decoder 1141 in some embodiments comprises a bitstream decoder 1141 configured to decode the bitstream.

[0167] The decoder 1141 further can comprise a scene payload decoder 1143 configured to obtain the encoded scene parameters and decode these in an opposite or inverse operation to the scene payload encoder 1113.

[0168] The reverberator parameter determiner 303 / 1142 is configured to receive the decoded reverberation configuration specification and room dimensions and reverberation parameters 1140 information and generate the reverberator parameters discussed herein. Note that in some embodiments no reverberation parameters are received but reverberator parameters are obtained from the scene payload decoder 1143.

[0169] Furthermore, the spatial audio signal selector 322 / 1155 is configured to receive input from the scene payload decoder and MPEG-H 3D audio decoder 1154 and produce spatial audio signal to the beam signal obtainer 316 / 1161 or request a beam signal from the beam signal receiver 326 / 1173. The beam signal receiver 326 / 1173 is configured to receive output from the bitstream decoder 1151 and provide the beam signal to the beam signal obtainer 316 / 1161. There is a communication channel from the beam signal receiver 326 / 1173 to the beam signal provider 328 / 1175 such that beam signals can be requested when needed.

[0170] Furthermore, the beam signal obtainer 200 / 1161 is configured to receive source configuration specification 318 from the scene payload and beam configuration specification 320 and provide its output to a reverberator 200 / 1164 and reflection processor 851 / 1162.

[0171] Furthermore, the head pose generator 1147 receives information from a head mounted device 1170 or similar and generates head pose information or parameters which can be passed to the reverberant signal combiner 310, binaural renderer 309 / 1159, the early reflection renderer 990 / 1162 and the direct sound binaural renderer 1163.

[0172] The decoder 1141 comprises an MPEG-H 3D audio decoder 1144 which is configured to decode the audio signals and pass them to the reverberators 201 / 1161 and spatial audio signal selector 322 / 1155 and direct sound processing 1165.

[0173] The decoder 1141 furthermore comprises reverberator 201 / 1164 configured to implement a suitable reverberation of the audio signals from the MPEG-H 3D audio decoder 1144.

[0174] The decoder further comprises a binaural renderer 309 / 1159 configured to generate binaural reverberant audio signals from the reverberant signals output of the Reverberator 201 / 1164 based on the Head pose.

[0175] The decoder furthermore comprises an early reflection renderer 990 / 1162 configured to obtain the output of the Beam signal obtainer and generate early reflections as described above and pass these to an early reflection binaural renderer 1199.

[0176] The decoder further comprises an early reflection (ER) binaural renderer 1199 configured to generate binaural early reflection audio signals from the output of the Reflection processor 851 / 1162.

[0177] Additionally, the decoder / renderer 1141 comprises a direct sound processor 1165 which is configured to receive the decoded audio signals and configured to implement any direct sound processing such as air absorption and distance-gain attenuation and which can be passed to a direct sound binaural renderer 1163 which with the head orientation determination (from a suitable sensor) can generate the direct sound component which with the reverberant component is passed to a binaural signal combiner 1167. The binaural signal combiner 1167 is configured to combine the direct, early reflection, and reverberant parts to generate a suitable output (for example for headphone reproduction).

[0178] Furthermore, in some embodiments the decoder comprises a head orientation determiner which passes the head orientation information to the head pose generator 1147.

[0179] In some embodiments an alternative to transmitting reverberation parameters from the encoder to the renderer can be to transmit reverberator parameters in the bitstream. Reverberator parameters refer to the FDN parameters such as delay line lengths, attenuation filters, reverberation ratio control filters, and so on.
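As a non-limiting illustration of how one such reverberator parameter can relate to a reverberation parameter, the sketch below derives a broadband attenuation gain for one FDN delay line from a reverberation time. This uses the commonly known relation that a signal recirculating through a delay line should have decayed by 60 dB after the reverberation time; the function name `delay_line_gain`, the single broadband gain in place of an attenuation filter, and the use of RT60 as the input are assumptions of this example and are not taken from the description.

```python
import math


def delay_line_gain(delay_samples: int, rt60_s: float, sample_rate: float) -> float:
    """Per-pass gain so that the recirculating signal decays 60 dB in rt60_s."""
    # Each pass through the delay line lasts delay_samples / sample_rate seconds,
    # and the total decay over rt60_s seconds must be 10 ** -3 (i.e. -60 dB).
    return math.pow(10.0, -3.0 * delay_samples / (rt60_s * sample_rate))
```

Longer delay lines receive a smaller per-pass gain, so that all delay lines decay at the same rate in time.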

[0180] In some embodiments the assignment of reverberator outputs to loudspeaker channels happens during configuration of the reverberator. The assignment can be stored during configuration and provided to the reverberant signal router.

[0181] In some embodiments, the output is a multichannel loudspeaker setup (such as 5.1 or 7.1+4 multichannel loudspeaker setup). In that case, the spatial processing can be modified by using the directions of the actual loudspeakers as the directional configuration and omitting the binaural renderers, and reproducing the reverberant audio signals from the corresponding loudspeakers of the loudspeaker setup. In the case of loudspeaker output, instead of binaural renderer 309 / 1159 in Fig.9b there is implemented a loudspeaker renderer (or panner) which in the simplest case will pass through the output signals to a loudspeaker signal combiner which will replace the binaural signal combiner 1167. Correspondingly, the direct sound part and early reflection part are spatialized with a panner such as vector-base amplitude panning instead of the binaural processors.
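As a non-limiting illustration, vector-base amplitude panning for one loudspeaker pair in the horizontal plane can be sketched as below. The function name `vbap_2d`, the restriction to two dimensions and a single pair, and the unit-power normalization of the gains are assumptions of this example; selecting the active pair within a full loudspeaker setup is not shown.

```python
import math
from typing import Tuple


def vbap_2d(source_deg: float, first_deg: float, second_deg: float) -> Tuple[float, float]:
    """Gains for two loudspeakers so that g1 * l1 + g2 * l2 points at the source."""
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    ax, ay = math.cos(math.radians(first_deg)), math.sin(math.radians(first_deg))
    bx, by = math.cos(math.radians(second_deg)), math.sin(math.radians(second_deg))
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Solve the 2 x 2 system p = g1 * a + g2 * b with Cramer's rule.
    g1 = (px * by - py * bx) / det
    g2 = (ax * py - ay * px) / det
    # Normalize so that g1^2 + g2^2 = 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source midway between loudspeakers at +30 and -30 degrees receives equal gains, and a source at a loudspeaker direction is reproduced from that loudspeaker alone.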

[0182] Even though the examples have been described with multiple HOA signals, in other embodiments other audio signal formats can be used where several spatial audio signals capture the audio scene and carry capture position and orientation information. Moreover, in these embodiments beamforming or another technique is used to focus towards sources. Other types of multi-microphone spatial audio signals could also be used for the purpose.

[0183] In some embodiments where the spatial audio signals are represented in the form of transport channels plus metadata, the beamformed signals could be retrieved from the server all the time instead of being obtained on the device. This is because on the server original multi-microphone versions of the spatial audio signals could be stored from which beamformed signals can be obtained with good quality. On the apparatus or device, if the spatial audio signals are in the form of transport channels and metadata the beamforming ability could be limited but is still sufficient for determining from which spatial audio signal the beamformed signals for sources of interest should be obtained. Thus, in this case the transport signals plus metadata representation can still be used for determining which spatial signal to use for beamforming but the beamformed signal is requested from the beam signal provider even if the spatial audio signal is available on the apparatus or device.

[0184] The selected spatial audio signal can in some embodiments be the same for early reflection rendering and late reverberation rendering. In some embodiments a different spatial audio signal can be selected for early and late reverberation rendering. For example, this can be beneficial if different criteria are considered important for early and late reverberation rendering. In this case two different selections of spatial audio signal can be performed, one selecting based on ER related criteria and another selecting on late reverberation related criteria.

[0185] Fig.10 shows an example electronic device 2000 which may be used as any of the apparatus parts of the system as described above. The device may be any suitable electronics device or apparatus. For example, in some embodiments the device 2000 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. The device may for example be configured to implement the encoder or the renderer or any functional block as described above.

[0186] In some embodiments the device 2000 comprises at least one processor or central processing unit 2007. The processor 2007 can be configured to execute various program codes such as the methods described herein.

[0187] In some embodiments the device 2000 comprises a memory 2011. In some embodiments the at least one processor 2007 is coupled to the memory 2011. The memory 2011 can be any suitable storage means. In some embodiments the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007. Furthermore, in some embodiments the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.

[0188] In some embodiments the device 2000 comprises a user interface 2005. The user interface 2005 can be coupled in some embodiments to the processor 2007. In some embodiments the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005. In some embodiments the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad. In some embodiments the user interface 2005 can enable the user to obtain information from the device 2000. For example, the user interface 2005 may comprise a display configured to display information from the device 2000 to the user. The user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000. In some embodiments the user interface 2005 may be the user interface for communicating.

[0189] In some embodiments the device 2000 comprises an input / output port 2009. The input / output port 2009 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and / or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.

[0190] The transceiver can communicate with further apparatus by any suitable known communications protocol. For example, in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).

[0191] The input / output port 2009 may be configured to receive the signals.

[0192] In some embodiments the device 2000 may be employed as at least part of the renderer. The input / output port 2009 may be coupled to headphones (which may be a headtracked or a non-tracked headphones) or similar.

[0193] It should be understood that the apparatuses may comprise or be coupled to other units or modules used in or for transmission and / or reception. Although the apparatuses have been described as one entity, different modules and memory may be implemented in one or more physical or logical entities.

[0194] It is noted that whilst some embodiments have been described in relation to specific communication networks, similar principles can be applied in relation to other networks and communication systems. Therefore, although certain embodiments were described above by way of example with reference to certain example architectures for wireless networks, technologies and standards, embodiments may be applied to any other suitable forms of communication systems than those illustrated and described herein.

[0195] It is also noted herein that while the above describes example embodiments, there are several variations and modifications which may be made to the disclosed solution without departing from the scope of the present invention.

[0196] As used herein, “at least one of the following: ” and “at least one of ” and similar wording, where the list of two or more elements are joined by “and” or “or”, mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.

[0197] In general, the various embodiments may be implemented in hardware or special purpose circuitry, software, logic or any combination thereof. Some aspects of the disclosure may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0198] As used in this application, the term “circuitry” may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and / or digital circuitry); (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and / or digital hardware circuit(s) with software / firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) hardware circuit(s) and / or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.

[0199] This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and / or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.

[0200] The embodiments of this disclosure may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Computer software or program, also called program product, including software routines, applets and / or macros, may be stored in any apparatus-readable data storage medium and they comprise program instructions to perform particular tasks. A computer program product may comprise one or more computer-executable components which, when the program is run, are configured to carry out embodiments. The one or more computer-executable components may be at least one software code or portions of it.

[0201] Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as DVD and the data variants thereof, CD. The physical media is a non-transitory media.

[0202] The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).

[0203] The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may comprise one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), FPGA, gate level circuits and processors based on multi core processor architecture, as non-limiting examples.

[0204] Embodiments of the disclosure may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

[0205] The scope of protection sought for various embodiments of the disclosure is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the disclosure.

[0206] The foregoing description has provided by way of non-limiting examples a full and informative description of the exemplary embodiment of this disclosure. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this disclosure will still fall within the scope of this invention as defined in the appended claims. Indeed, there is a further embodiment comprising a combination of one or more embodiments with any of the other embodiments previously discussed.

Claims

1. An apparatus for applying reverberation to at least one audio signal, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

2. The apparatus as claimed in claim 1, caused to perform obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source is caused to perform beamforming to the at least one spatial audio signal.

3. The apparatus as claimed in claim 1, caused to perform obtaining of the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source is caused to perform receiving the at least one audio signal component from a further apparatus.

4. The apparatus as claimed in any of claims 1 to 3, caused to perform obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source is caused to perform: determining whether the at least one audio signal component is available at the apparatus; selecting the at least one audio signal component available at the apparatus; and receiving or obtaining the at least one component from a further apparatus when the at least one audio signal component is not available at the apparatus.

5. The apparatus as claimed in any of claims 1 to 4, wherein the at least one audio signal component comprises at least one of: at least one multichannel audio signal; at least one audio channel; at least one audio source; and at least one audio beam.

6. The apparatus as claimed in any of claims 1 to 5, caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification is further caused to perform: determining at least one dominant audio source from the at least one audio source; and identifying the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source.

7. The apparatus as claimed in claim 6, caused to perform determining at least one dominant audio source is caused to perform determining a direction of the at least one dominant audio source relative to a position for the at least one spatial audio signal.

8. The apparatus as claimed in claim 7, wherein the apparatus caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification is further caused to perform identifying the at least one audio signal based on the direction of the at least one dominant audio source.

9. The apparatus as claimed in any of claims 1 to 8, caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification is further caused to perform: determining an availability of at least two audio signal components; and identifying the at least one audio signal component based on the availability of the at least two audio signal components.

10. The apparatus as claimed in any of claims 1 to 9, caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification is further caused to perform: determining a distance between the at least one spatial audio signal and a listener position; and identifying the at least one spatial audio signal based on the determined distance.

11. The apparatus as claimed in any of claims 1 to 10, caused to perform identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification is further caused to perform: obtaining metadata indicating at least one spatial audio signal; and identifying the at least one spatial audio signal based on the metadata.

12. The apparatus as claimed in any of claims 1 to 11, caused to perform obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source is further caused to perform switching from a first audio signal component to a further audio signal component.

13. The apparatus as claimed in claim 12, caused to perform switching from the first audio signal component to the further audio signal component is caused to perform transitioning from the first audio signal component to the further audio signal component.

14. The apparatus as claimed in claim 13, caused to perform transitioning from the first audio signal component to the further audio signal component is caused to perform interpolating between the first audio signal component and the further audio signal component.

15. A method for an apparatus for applying reverberation to at least one audio signal, the method comprising at least: obtaining scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determining at least one audio source within the at least one audio scene based on the audio scene information; identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtaining at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and rendering at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

16. The method as claimed in claim 15, wherein obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source comprises beamforming to the at least one spatial audio signal.

17. The method as claimed in claim 15, wherein obtaining of the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source comprises receiving the at least one audio signal component from a further apparatus.

18. The method as claimed in any of claims 15 to 17, wherein obtaining the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source comprises: determining whether the at least one audio signal component is available at the apparatus; selecting the at least one audio signal component available at the apparatus; and receiving or obtaining the at least one component from a further apparatus when the at least one audio signal component is not available at the apparatus.

19. The method as claimed in any of claims 15 to 18, wherein the at least one audio signal component comprises at least one of: at least one multichannel audio signal; at least one audio channel; at least one audio source; and at least one audio beam.

20. The method as claimed in any of claims 15 to 19, wherein identifying at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification further comprises: determining at least one dominant audio source from the at least one audio source; and identifying the at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source.

21. An apparatus for applying reverberation to at least one audio signal, the apparatus comprising means configured to: obtain scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determine at least one audio source within the at least one audio scene based on the audio scene information; identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtain at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and render at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.

22. A computer readable medium comprising instructions which, when executed by an apparatus, cause the apparatus for applying reverberation to at least one audio signal at least to: obtain scene information, the scene information comprising: a spatial audio signal configuration specification; and audio scene information, the audio scene information defining at least one audio scene; determine at least one audio source within the at least one audio scene based on the audio scene information; identify at least one spatial audio signal based at least partly on a distance of the at least one audio source and at least one spatial audio signal based on spatial audio signal configuration specification; obtain at least one audio signal component related to the at least one spatial audio signal containing the determined at least one audio source; and render at least one reverberant audio signal at least based on an application of reverberation on the obtained at least one signal component.
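
By way of a non-limiting illustration only, and forming no part of the claims, the distance-based identification recited in the independent claims and the interpolated transition of claims 13 and 14 might be sketched as follows. The function names `nearest_signal` and `crossfade`, the identifiers `hoa_a` and `hoa_b`, the use of Euclidean distance, and the linear interpolation weights are all assumptions introduced for this sketch; the specification does not prescribe any of them.

```python
import math


def nearest_signal(source_pos, signal_positions):
    """Return the id of the spatial audio signal positioned closest to the source.

    source_pos: (x, y, z) of the audio source (hypothetical representation).
    signal_positions: dict mapping a signal id to its (x, y, z) position.
    Euclidean distance is an assumption made for this sketch.
    """
    return min(
        signal_positions,
        key=lambda sid: math.dist(source_pos, signal_positions[sid]),
    )


def crossfade(first, further):
    """Linearly interpolate from `first` to `further` across one frame.

    Both arguments are equal-length lists of samples. The first output
    sample equals first[0] and the last equals further[-1].
    """
    if len(first) != len(further):
        raise ValueError("frames must have the same length")
    n = len(first)
    last = max(n - 1, 1)  # avoid division by zero for single-sample frames
    return [
        (1.0 - i / last) * a + (i / last) * b
        for i, (a, b) in enumerate(zip(first, further))
    ]


# Hypothetical scene: two spatial audio signals and one source.
positions = {"hoa_a": (1.0, 0.0, 0.0), "hoa_b": (5.0, 0.0, 0.0)}
selected = nearest_signal((0.0, 0.0, 0.0), positions)

# Transition between the components of two signals over a three-sample frame.
blended = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

In this sketch the component of the selected signal would then be fed to a reverberator; the crossfade is one possible way to avoid an audible discontinuity when the selection changes, and other transition shapes would serve equally well.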