Reverberation of higher order ambisonic audio signals
The method addresses the inefficiencies in existing reverberation techniques by using beamforming and a feedback delay network to efficiently render reverberation with adjustable sound source levels, improving computational efficiency and reducing artifacts.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NOKIA TECHNOLOGIES OY
- Filing Date
- 2025-11-25
- Publication Date
- 2026-06-25
AI Technical Summary
Existing methods for rendering reverberation in higher order ambisonic audio signals are computationally intensive and lack flexibility in adjusting the reverberation level of individual sound sources, leading to high computational load and potential audible artifacts.
A method involving beamforming and gain control to generate a monophonic audio signal from multichannel audio signals, applying a reverberation gain based on sound source direction and distance, and using a feedback delay network (FDN) reverberator to produce mutually incoherent reverberant signals for efficient and flexible reverberation rendering.
Enables computationally efficient rendering of reverberation with adjustable sound source levels, reducing computational load and avoiding audible artifacts while maintaining spatial accuracy.
Abstract
Description
[0001] REVERBERATION OF HIGHER ORDER AMBISONIC AUDIO SIGNALS
[0002] Field
[0003] The present application relates to apparatus and methods for rendering of reverberation with higher order ambisonics audio signals, but not exclusively for rendering of reverberation with higher order ambisonics audio signals in augmented reality and / or virtual reality apparatus.
[0004] Background
[0005] Reverberation refers to the persistence of sound in a space after an actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying spatial impression of an environment, reproducing reverberation perceptually accurately is important. Room acoustics are often modelled with an individually synthesized early reflection portion and a statistical model for the diffuse late reverberation. Fig. 1 depicts an example of a synthesized room impulse response where the direct sound 101 is followed by discrete early reflections 103 (or reflection echoes) which have a direction of arrival (DOA) and diffuse late reverberation 107 which can be synthesized without any specific direction of arrival. The delay d1(t) 102 in Fig. 1 can be seen to denote the direct sound arrival delay from the source to the listener. Furthermore, the delay d2(t) 104 can denote the delay from the source to the listener for one of the early reflections (in this case the first arriving reflection). Additionally, the delay d3(t) 106 can denote the delay from the source to the onset of the diffuse late reverberation.
[0006] One method of reproducing reverberation is to utilize a set of D loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTFs)). The loudspeakers are positioned around the listener somewhat evenly. Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.
[0007] The reverberation produced by the different loudspeakers has to be mutually incoherent. In a simple case the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as reverberation time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio or diffuse-to-total ratio or diffuse-to-source ratio or any other suitable parameter for representing reverberation energy or level). Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a feedback delay network (FDN) reverberator with suitable tuning of the delay line lengths and mixing matrix, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel. In this case, the different reverberant signals effectively have the same features, and the reverberation is typically perceived to be similar in all directions.
Summary
[0008] There is provided according to a first aspect an apparatus for applying reverberation to at least one audio signal, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain-applied at least one further audio signal.
[0009] The multichannel audio signal may be a component of a spatial audio signal further comprising spatial metadata associated with the multichannel audio signal.
[0010] The apparatus may be further caused to perform determining the at least one audio source based on the spatial metadata associated with the multichannel audio signal.
[0011] The apparatus caused to perform obtaining at least one further audio signal may be caused to perform beamforming the multichannel audio signal to generate the at least one further audio signal.
[0012] The apparatus caused to perform determining the reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal may be caused to perform: determining a beamforming gain for the direction of the at least one audio source; and performing determining the reverberation gain based on the beamforming gain for the direction of the at least one audio source.
[0013] The at least one further audio signal may be a monophonic signal.
[0014] The at least one multichannel audio signal may be a multichannel loudspeaker signal.
[0015] The apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be caused to perform selecting a loudspeaker signal from the multichannel loudspeaker signal which has a direction closest to the direction of the sound source.
[0016] The at least one audio signal may be a multichannel Ambisonics signal.
[0017] The apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be caused to perform steering a beam towards the audio source by applying one or more gains to the multichannel Ambisonics signal. The apparatus may be caused to perform determining the reverberation gain further based on at least one of: a desired reverberation gain for the audio source; and a distance of the sound source compared to a reference distance.
[0018] The apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be caused to perform at least one of: obtaining a beam direction for the audio source from the spatial metadata indicating a position of the at least one audio source; determining a beam direction for the audio source based on a determined audio level or audio energy associated with at least two spatial directions; and further caused to perform beamforming the at least one multichannel audio signal based on the beam direction.
[0019] The apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be further caused to perform generating a downmix signal from two or more beamformed audio signals, wherein the at least one further audio signal is the downmix signal.
[0020] According to a second aspect there is provided a method for applying reverberation to at least one audio signal, the method comprising: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain-applied at least one further audio signal.
[0021] The multichannel audio signal may be a component of a spatial audio signal further comprising spatial metadata associated with the multichannel audio signal.
[0022] The method may further comprise determining the at least one audio source based on the spatial metadata associated with the multichannel audio signal.
[0023] Obtaining at least one further audio signal may comprise beamforming the multichannel audio signal to generate the at least one further audio signal.
[0024] Determining the reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal may comprise: determining a beamforming gain for the direction of the at least one audio source; and performing determining the reverberation gain based on the beamforming gain for the direction of the at least one audio source.
[0025] The at least one further audio signal may be a monophonic signal.
[0026] The at least one multichannel audio signal may be a multichannel loudspeaker signal. Obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may comprise selecting a loudspeaker signal from the multichannel loudspeaker signal which has a direction closest to the direction of the sound source.
[0027] The at least one audio signal may be a multichannel Ambisonics signal.
[0028] Obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may comprise steering a beam towards the audio source by applying one or more gains to the multichannel Ambisonics signal.
[0029] The method may comprise determining the reverberation gain further based on at least one of: a desired reverberation gain for the audio source; and a distance of the sound source compared to a reference distance.
[0030] Obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may comprise at least one of: obtaining a beam direction for the audio source from the spatial metadata indicating a position of the at least one audio source; determining a beam direction for the audio source based on a determined audio level or audio energy associated with at least two spatial directions; and may further comprise beamforming the at least one multichannel audio signal based on the beam direction.
[0031] Obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may comprise generating a downmix signal from two or more beamformed audio signals, wherein the at least one further audio signal is the downmix signal.
[0032] According to a third aspect there is provided an apparatus for applying reverberation to at least one audio signal, the apparatus comprising means configured to: receive a multichannel audio signal; determine at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determine a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; apply the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and render a reverberated signal based on the gain-applied at least one further audio signal.
[0033] The multichannel audio signal may be a component of a spatial audio signal further comprising spatial metadata associated with the multichannel audio signal.
[0034] The means may be further configured to determine the at least one audio source based on the spatial metadata associated with the multichannel audio signal. The means configured to obtain at least one further audio signal may be configured to perform beamforming the multichannel audio signal to generate the at least one further audio signal.
[0035] The means configured to determine the reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal may be configured to: determine a beamforming gain for the direction of the at least one audio source; and determine the reverberation gain based on the beamforming gain for the direction of the at least one audio source.
[0036] The at least one further audio signal may be a monophonic signal.
[0037] The at least one multichannel audio signal may be a multichannel loudspeaker signal.
[0038] The means configured to obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be configured to select a loudspeaker signal from the multichannel loudspeaker signal which has a direction closest to the direction of the sound source.
[0039] The at least one audio signal may be a multichannel Ambisonics signal.
[0040] The means configured to obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be configured to steer a beam towards the audio source by applying one or more gains to the multichannel Ambisonics signal.
[0041] The means may be configured to determine the reverberation gain further based on at least one of: a desired reverberation gain for the audio source; and a distance of the sound source compared to a reference distance.
[0042] The means configured to obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be configured to, at least one of: obtain a beam direction for the audio source from the spatial metadata indicating a position of the at least one audio source; determine a beam direction for the audio source based on a determined audio level or audio energy associated with at least two spatial directions; and further configured to beamform the at least one multichannel audio signal based on the beam direction.
[0043] The means configured to obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source may be further configured to generate a downmix signal from two or more beamformed audio signals, wherein the at least one further audio signal is the downmix signal.
[0044] According to a fourth aspect there is provided an apparatus for applying reverberation to at least one audio signal, the apparatus comprising: receiving circuitry configured to receive a multichannel audio signal; circuitry configured to determine at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining circuitry configured to obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining circuitry configured to determine a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying circuitry configured to apply the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering circuitry configured to render a reverberated signal based on the gain-applied at least one further audio signal.
[0045] According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus, for applying reverberation to at least one audio signal, to perform at least the following: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain-applied at least one further audio signal.
[0046] According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus, for applying reverberation to at least one audio signal, to perform at least the following: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain-applied at least one further audio signal.
[0047] According to a seventh aspect there is provided an apparatus, for applying reverberation to at least one audio signal, comprising: means for receiving a multichannel audio signal; means for determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; means for obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; means for determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; means for applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and means for rendering a reverberated signal based on the gain-applied at least one further audio signal.
According to an eighth aspect there is provided a computer readable medium comprising instructions for causing an apparatus, for applying reverberation to at least one audio signal, to perform at least the following: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain-applied at least one further audio signal.
[0048] An apparatus comprising means for performing the actions of the method as described above. An apparatus configured to perform the actions of the method as described above.
[0049] A computer program comprising program instructions for causing a computer to perform the method as described above.
[0050] A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
[0051] An electronic device may comprise apparatus as described herein.
[0052] A chipset may comprise apparatus as described herein.
[0053] Embodiments of the present application aim to address problems associated with the state of the art.
[0054] Summary of the Figures
[0055] For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
[0056] Fig.1 shows a model of room acoustics with regard to the room impulse response;
[0057] Fig.2 shows schematically a reverberator which includes an example feedback delay network (FDN) according to some embodiments;
[0058] Fig.3 shows schematically an example apparatus within which the reverberator which includes an example feedback delay network (FDN) as shown in Fig.2 can be implemented according to some embodiments;
[0059] Fig.4 shows a flow diagram of the operation of the example apparatus as shown in Fig.3 with respect to reverberant audio signal rendering;
[0060] Fig.5 shows schematically an example reverberator parameter determiner as shown in Fig.3 in further detail according to some embodiments;
[0061] Fig.6 shows a flow diagram of the operation of the example reverberator parameter determiner as shown in Fig.5 according to some embodiments;
Fig.7 shows schematically an example beam signal obtainer as shown in Fig.3 in further detail according to some embodiments;
[0062] Fig.8 shows a flow diagram of the operation of the example beam signal obtainer as shown in Fig.7 according to some embodiments;
[0063] Fig.9 shows schematically an example binaural renderer as shown in Fig.3 in further detail according to some embodiments;
[0064] Fig.10 shows schematically an example virtual audio scene rendering system according to some embodiments;
[0065] Fig.11 shows schematically an example reflection processor according to some embodiments;
Fig.12 shows an example audio scene and reflective surfaces to demonstrate example source and image source positions;
[0066] Fig.13 shows an example source distribution and echoes associated with the sources;
[0067] Fig.14 shows an example system within which some embodiments can be implemented; and
Fig.15 shows an example device suitable for implementing the apparatus shown in previous figures.
[0068] Embodiments of the Application
[0069] The following describes in further detail suitable apparatus and possible mechanisms for rendering late reverberation for Ambisonic or other spatial audio signals.
[0070] In a virtual acoustics rendering system, reverberation is typically rendered as a combination of a certain number of distinct early reflections (or reflection echoes) and a stochastic model for the late reverberation. The early reflection synthesis is thus typically position-dependent in that it varies with source and listener positions, while the late reverberation synthesis is not. Together these two can create a plausible reverberation rendering for a physical or virtual space. The reverberation rendering is combined (summed) with direct sound rendering, which involves distance gain attenuation, air absorption filtering, and directional reproduction (binaural or loudspeaker) of the direct sound component that directly propagates to the ears of the listener without reflecting or reverberating in the space.
[0071] One method of reproducing reverberation is to utilize a set of N loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTF)). The loudspeakers are positioned around the listener somewhat evenly. Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.
[0072] The positioning and the number of the loudspeakers suitable for producing the diffuse perception has been studied, e.g., in K. Hiyama, S. Komiyama, and K. Hamasaki, The Minimum Number of Loudspeakers and Its Arrangement for Reproducing the Spatial Impression of Diffuse Sound Field, AES 113th Convention, 2002 and C. Kirch, J. Poppitz, T. Wendt, S. van der Par, and S. Ewert, Spatial Resolution of Late Reverberation in Virtual Acoustic Environments. Submitted to Trends in Hearing (currently available in the Carl von Ossietzky Universität Oldenburg website), 2021. It has been found that somewhere around 6 to 12 loudspeakers are required, depending on the positioning of the loudspeakers.
[0073] The reverberation produced by the different loudspeakers is designed to be mutually incoherent, which can be achieved through optimization by the algorithm designer. In a simple case, e.g., the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as RT60 time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio or source-to-diffuse energy ratio). Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a feedback delay network (FDN) reverberator with suitable tuning of the delay line lengths, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel. In this case, the different reverberant signals basically have the same features, and the reverberation is typically perceived to be similar from all directions.
[0074] A common known approach to reverberating a multichannel audio representation, such as first order ambisonics (FOA), higher order ambisonics (HOA) or surround (7.1.4, 5.1 or the like), is to reverberate each channel individually.
[0075] However, this technique generates a high computational load, with associated power consumption / battery drain / processor cost, as the number of reverberators that need to be run in parallel is proportional to the number of channels. For example, with HOA the number of channels can be high (for example 16 for 3rd order HOA, 25 for 4th order HOA).
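The channel counts quoted above follow from the standard relation that a full-sphere ambisonic signal of order N carries (N + 1)^2 channels. The following is a minimal Python sketch of that relation; the function name is illustrative and does not come from the application.

```python
def hoa_channel_count(order: int) -> int:
    """Number of channels in a full-sphere ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


# With one reverberator per channel, the reverberator count grows
# quadratically with the ambisonic order.
channel_counts = {order: hoa_channel_count(order) for order in range(1, 5)}
```

Under this relation, moving from 3rd to 4th order adds nine reverberators when each channel is reverberated individually.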
[0076] Another known approach, currently used in the MPEG-I audio Draft International Standard (ISO / IEC 23090-4 DIS), applies downmix processing to the multichannel audio signals and then reverberates the downmixed audio signal with a reverberator. This enables efficient reverberation rendering but can lead to audible artefacts, as the downmixing may create comb filtering effects if correlated channel signals are summed, and it does not enable adjustment of the levels of individual sound sources.
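The comb filtering effect mentioned above can be illustrated with the simplest correlated case: two channels carrying the same signal, one delayed relative to the other, summed in a downmix. The Python sketch below evaluates the magnitude response of x[n] + x[n - D]; the sample rate and delay values are assumptions chosen only for illustration.

```python
import cmath
import math


def summed_pair_magnitude(freq_hz: float, delay_samples: int, sample_rate: float) -> float:
    """Magnitude response at freq_hz of the downmix x[n] + x[n - delay_samples]."""
    phase = -2.0 * math.pi * freq_hz * delay_samples / sample_rate
    return abs(1.0 + cmath.exp(1j * phase))


SAMPLE_RATE = 48000.0  # assumed sample rate in Hz
DELAY = 24             # assumed inter-channel delay in samples (0.5 ms)

# The first notch of the comb sits where the delayed copy is in anti-phase.
first_notch_hz = SAMPLE_RATE / (2 * DELAY)
notch_level = summed_pair_magnitude(first_notch_hz, DELAY, SAMPLE_RATE)
# One octave above the first notch the two copies are back in phase.
peak_level = summed_pair_magnitude(2 * first_notch_hz, DELAY, SAMPLE_RATE)
```

The response alternates between near-complete cancellation and a doubling of amplitude across frequency, which is the coloration a plain downmix of correlated channels can introduce.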
[0077] Yet another known approach for reverberation of such multichannel audio signals is to select only the omnidirectional component of an ambisonics signal to be reverberated. However, this approach does not allow adjustment of the reverberation level of individual sources.
[0078] The following embodiments therefore describe apparatus and methods for computationally efficient rendering of reverberation from spatial audio signals (including multichannel or ambisonics audio signals) which retain the flexibility of sound source level adjustment.
[0079] The concept as discussed in the following embodiments relates to apparatus and methods for rendering of reverberation for sources in spatial audio signals such that the reverberation level of each source can be controlled while still enabling computationally efficient rendering of the reverberation based on a downmix signal.
[0080] This can be achieved in some embodiments by first receiving a first or spatial audio signal, determining at least one sound source within the first or spatial audio signal, obtaining another or second audio signal containing the same at least one sound source, applying a reverberation related gain to the another or second audio signal, and rendering a reverberated signal using the gain-applied another or second audio signal.
[0081] In some embodiments the another or second audio signal is a monophonic audio signal which can be obtained as a beam signal.
[0082] Furthermore in some embodiments the first or spatial audio signal is a multichannel loudspeaker signal and the beam signal is obtained by selecting a loudspeaker which is towards the direction of the sound source.
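The loudspeaker selection described above can be sketched as a nearest-direction search. The Python example below picks the loudspeaker whose unit direction vector has the largest dot product with the source direction. The layout azimuths, the coordinate convention (azimuth counter-clockwise from the front) and all function names are assumptions made for illustration only.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def unit_vector(azimuth_deg: float, elevation_deg: float) -> Vector:
    """Unit direction vector; azimuth counter-clockwise from front (assumed convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def closest_loudspeaker(source_dir: Vector, speaker_dirs: Sequence[Vector]) -> int:
    """Index of the loudspeaker with the smallest angle to the source direction."""
    def dot(a: Vector, b: Vector) -> float:
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    return max(range(len(speaker_dirs)), key=lambda i: dot(source_dir, speaker_dirs[i]))


# Hypothetical horizontal five-loudspeaker layout (azimuths in degrees).
LAYOUT_AZIMUTHS = [30.0, -30.0, 0.0, 110.0, -110.0]
speakers = [unit_vector(az, 0.0) for az in LAYOUT_AZIMUTHS]

# A source at 20 degrees azimuth is closest to the loudspeaker at 30 degrees.
chosen = closest_loudspeaker(unit_vector(20.0, 0.0), speakers)
```

The signal of the chosen loudspeaker channel would then serve as the beam signal for that source.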
[0083] In another embodiment the first or spatial audio signal is a multichannel Ambisonics signal and the second signal is a monophonic signal obtained by steering a beam towards the sound source by applying one or more gains to the multichannel Ambisonics signals.
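Steering a beam by applying gains to Ambisonics channels can be sketched for the first-order case. The Python example below forms a first-order cardioid beam; it assumes ACN channel ordering (W, Y, Z, X) with SN3D normalization and a plane-wave source model, none of which is specified by the application, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def unit_vector(azimuth_deg: float, elevation_deg: float) -> Vector:
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def encode_foa(sample: float, direction: Vector) -> List[float]:
    """Encode one sample as a plane wave (assumed ACN order W, Y, Z, X; SN3D)."""
    x, y, z = direction
    return [sample, sample * y, sample * z, sample * x]


def cardioid_beam_gains(beam_dir: Vector) -> List[float]:
    """Per-channel gains of a first-order cardioid steered towards beam_dir."""
    x, y, z = beam_dir
    return [0.5, 0.5 * y, 0.5 * z, 0.5 * x]


def beamform(foa_frame: Sequence[float], gains: Sequence[float]) -> float:
    """Monophonic beam sample: gain-weighted sum of the Ambisonics channels."""
    return sum(g * c for g, c in zip(gains, foa_frame))


def beam_gain_towards(beam_dir: Vector, source_dir: Vector) -> float:
    """Gain the cardioid beam applies to a plane wave arriving from source_dir."""
    cos_angle = sum(b * s for b, s in zip(beam_dir, source_dir))
    return 0.5 * (1.0 + cos_angle)


source = unit_vector(45.0, 0.0)
frame = encode_foa(1.0, source)
on_axis = beamform(frame, cardioid_beam_gains(source))
off_axis = beamform(frame, cardioid_beam_gains(unit_vector(225.0, 0.0)))
```

A beam pointed at the source passes it at unity gain, while a beam pointed in the opposite direction suppresses it; higher-order signals would allow narrower beams than this first-order sketch.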
[0084] In another embodiment, the spatial audio signal is accompanied by spatial metadata or spatial metadata is analyzed or otherwise obtained from / of the spatial audio signal. Such spatial metadata can comprise parameters on time-frequency slots of the spatial audio signal. Such parameters can in an embodiment comprise directions of arrival and diffuseness or direct to diffuse ratio parameters.
[0085] In one example embodiment the reverberation related gain depends at least on the direction of the beam signal compared to the direction of the sound source, a desired reverberation gain for the sound source, and a distance of the sound source compared to a reference distance.
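The passage above names three factors but gives no formula at this point. The Python sketch below shows one hypothetical way to combine them: it assumes the source in the beam signal has been attenuated by the beam gain towards its direction and by a 1/distance law relative to the reference distance, and compensates for both before applying the desired gain. The formula, the function name and the parameter names are assumptions, not the method of the application.

```python
def reverberation_gain(
    desired_gain: float,
    source_distance: float,
    reference_distance: float,
    beam_gain: float,
    min_beam_gain: float = 1e-3,
) -> float:
    """Hypothetical reverberation gain for one source in a beam signal.

    beam_gain is the gain the beam applies towards the source direction
    (1.0 when the beam points exactly at the source).
    """
    if source_distance <= 0.0 or reference_distance <= 0.0:
        raise ValueError("distances must be positive")
    # Undo an assumed 1/distance attenuation relative to the reference distance.
    distance_compensation = source_distance / reference_distance
    # Undo the beam attenuation; the floor avoids dividing by a near-zero gain.
    beam_compensation = 1.0 / max(beam_gain, min_beam_gain)
    return desired_gain * distance_compensation * beam_compensation


# Source at twice the reference distance, picked up at half gain by the beam.
example_gain = reverberation_gain(1.0, 2.0, 1.0, 0.5)
```

In this sketch a source that is farther away or further off the beam axis receives a larger gain, so that its contribution to the reverberator input stays tied to the desired reverberation gain rather than to its level in the beam signal.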
[0086] In some embodiments the beam direction for a sound source is obtained either from received metadata indicating source position or determination of audio level (or loudness) towards different spatial directions within the spatial or first audio signal.
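When no position metadata is available, the level-based alternative described above can be sketched as a scan over candidate directions. The Python example below steers a first-order cardioid beam to each candidate azimuth, measures the beam energy, and keeps the loudest direction. The channel order (W, Y, Z, X), the horizontal-only scan, the candidate grid and the synthetic test signal are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

FoaFrame = Tuple[float, float, float, float]  # assumed order: W, Y, Z, X


def beam_energy(foa_frames: Sequence[FoaFrame], azimuth_deg: float) -> float:
    """Energy of a horizontal first-order cardioid beam steered to azimuth_deg."""
    az = math.radians(azimuth_deg)
    ux, uy = math.cos(az), math.sin(az)
    energy = 0.0
    for w, y, _z, x in foa_frames:
        beam_sample = 0.5 * (w + ux * x + uy * y)
        energy += beam_sample * beam_sample
    return energy


def loudest_azimuth(foa_frames: Sequence[FoaFrame], candidates: Sequence[float]) -> float:
    """Candidate azimuth whose beam carries the most energy."""
    return max(candidates, key=lambda az: beam_energy(foa_frames, az))


# Synthetic scene: a 440 Hz tone arriving from 90 degrees azimuth.
source_az = math.radians(90.0)
tone = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(480)]
frames = [(s, s * math.sin(source_az), 0.0, s * math.cos(source_az)) for s in tone]

candidates = [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
estimated_azimuth = loudest_azimuth(frames, candidates)
```

A practical implementation would likely scan elevation as well and smooth the estimate over time; this sketch only shows the energy comparison across directions.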
[0087] Some embodiments further involve creating a downmix signal from two or more beam signals and using the downmix signal as an input to a reverberator for generating a suitable reverberation.
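Creating a downmix from several beam signals can be sketched as a gain-weighted sum, so that a single reverberator input carries every source at its own reverberation level. The Python example below is a minimal illustration; the function name and the per-beam gain list are assumptions.

```python
from typing import List, Sequence


def downmix_beams(beam_signals: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Sum equal-length monophonic beam signals after applying one gain per beam."""
    if len(beam_signals) != len(gains):
        raise ValueError("one gain is required per beam signal")
    if not beam_signals:
        return []
    length = len(beam_signals[0])
    if any(len(signal) != length for signal in beam_signals):
        raise ValueError("beam signals must have equal length")
    mixed = [0.0] * length
    for signal, gain in zip(beam_signals, gains):
        for n, sample in enumerate(signal):
            mixed[n] += gain * sample
    return mixed


# Two beams, each scaled by its own reverberation related gain.
reverb_input = downmix_beams([[1.0, 2.0], [0.5, 0.5]], [2.0, 4.0])
```

Because each beam is scaled before the sum, the level of an individual source in the reverberation can be changed without running a separate reverberator per source.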
[0088] As shown in Fig.1, in a virtual acoustics rendering system, reverberation is typically rendered as a combination of at least two echo-generating components. A first component is a so-called early reflection echo synthesis component 103, which generates a certain number of perceptually distinct echoes, and a second component is a late reverberation synthesis component 107, which generates a stream of echoes which are relatively indistinct but adhere to the overall decay properties of a stochastic model for the late reverberation.
[0089] Reflection echo synthesis is typically spatially dynamic in that the levels and directions of arrival (DoAs) of the reflections depend on the source and listener positions and orientations. For example a reflection processor can be configured to produce a discrete number of echoes that are precise and independently varied in their intensity and coloration, as determined by attenuation with distance, air absorption filtering, reflection surface absorption filtering, and which are specular relative to features and geometry of the modelled room which in turn determines their encoded DoA. These echoes correspond to early reflections depicted in the impulse response (ref. Fig. 1, item 103). The reflection processor (which is shown later in the rendering system of Fig.10 with reference 1051) is typically external to and running in parallel with the reverberator. Herein, the echoes produced by the reflection processor can also be referred to as reflection echoes.
[0090] Late reverberation, in contrast to early reflections, is not considered to be spatially dynamic in that the echoes produced in late reverberation synthesis do not vary with source-listener orientation.
[0091] A feedback network FN 200 as shown in Fig.2 is an example of a reverberator configured to produce a decaying stream of many echoes which increase in number (density) while decreasing in intensity (loudness) over time, as characterized by the decay properties of stochastic late reverberation (as shown in Fig.1 by the reference 107). The feedback (FB) 200 reverberator shows an input audio signal which is configured to pass through the network, splitting into numerous paths which form “echoes” which are separated in time by independent delay lines, all of which subsequently recirculate through the network, being further divided among the delay lines, subsequently splitting into more echoes with each recirculation through the network, and so on. These echoes can be designed to have only a loose correspondence to the geometry of the virtual room so do not represent geometrically precise (specular) reflections, nor do they convey the attenuation characteristics of specific reflection surfaces. There can be several reverberators in a virtual acoustics renderer, each of which models the characteristics of a room or Acoustic Environment (AE). The rooms (AEs) can have connections to each other, which means that sound sources in any room can contribute to any reverberator. Also, connected or second order reverberation can be implemented by feeding the output of a reverberator into another reverberator.
[0092] Fig.2 shows an example reverberator 200 which could be employed in some embodiments. The reverberator 200 is configured to receive a beam or input audio signal 201 (which can be designated sin(t), where t is the sample (time) index). Furthermore, the reverberator 200 is configured based on (received) reverberator parameters. In some embodiments the reverberator 200 is further configured to receive directional configuration (and the room dimensions) that may be used to configure the reverberation.
[0093] In this example embodiment, the reverberator 200 has D (for example D = 15) output channels indexed with d = 1, 2, ..., D.
[0095] The resulting reverberant audio signals 210 srev(t, d) are mutually incoherent, and they have acoustical characteristics according to the reverberator parameters.
[0096] The D uncorrelated outputs are subsequently rendered from different spatial directions defined by the directional configuration. In some embodiments the reverberator 200 comprises a pre-delay line z^-mpre 205, configured to receive and delay the input audio signal. The reverberator 200 also comprises a reverberation ratio control filter GEQratio 203 which is configured to receive the pre-delay line output 262. The reverberator 200 further comprises a number D of feedback delay lines z^-m 251 and corresponding feedback delay line attenuation filters GEQd 253. The signals which are output from the feedback delay line attenuation filters GEQd 253 are sent to inputs of a feedback matrix A 257. The outputs of the feedback matrix A 257 are sent to D signal combiners 254 (adders) to sum the outputs of the feedback matrix A 257 with the output of GEQratio 203 to be used as inputs to each of the feedback delay lines z^-m 251.
[0097] The outputs of delay line attenuation filters GEQd 253 are routed to D signal multipliers 261 which in turn output the reverberant audio signals 210.
[0098] In some embodiments the reverberator 200 is configured to receive reverberator parameters which comprise a delay length mpre, in samples, for the pre-delay line z^-mpre 205, coefficients of a reverberation ratio control filter GEQratio 203, delay lengths md for each of the D feedback delay lines z^-m 251, coefficients for each of the D feedback delay line attenuation filters GEQd 253, and coefficients for the feedback matrix A 257. The reverberator parameters also comprise output channel gains gd which are used to configure the D signal multipliers 261.
[0099] In some embodiments the frequency dependent gain elements or attenuation filters GEQd 253 are implemented as graphic equalizer (EQ) filters using M biquad IIR band filters. In the case of octave-band filtering, M = 10. Thus, the reverberator parameters corresponding to each graphic EQ filter comprise the feedforward and feedback coefficients for 10 biquad IIR filters, the gains for biquad band filters, and the overall gain.
[0100] The feedback delay lines z^-m 251 can also be referred to as loop delay lines or recirculating delay lines and the feedback delay line attenuation filters GEQd 253 can be referred to as loop filters or recirculating filters. In some embodiments the coefficients of feedback matrix A 257 are hardcoded in software code rather than provided as parameters.
[0101] The reverberator thus comprises multiple recirculating delay lines associated with the feedback network (FN) 250. The feedback matrix A 257 is used to control the recirculation gain and routing within the network. The feedback delay line attenuation filters GEQd 253 can be implemented in some embodiments as graphic EQ filters implemented as cascades of second-order section IIR filters and can facilitate controlling the energy decay rate at different frequencies. The feedback delay line attenuation filters GEQd 253 furthermore are designed such that they attenuate the signal by the desired amount with each pass through the FN such that the desired reverberation time (RT60) is achieved.
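As a non-limiting illustration, the recirculation described above can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the description: each attenuation filter GEQd is replaced by a single broadband gain, the feedback matrix A is a Householder matrix rather than the Galois sequence based matrix referred to herein, and the pre-delay, ratio control filter and output gains are omitted. The function name fdn_impulse_response and its arguments are hypothetical.

```python
def fdn_impulse_response(delays, rt60_s, fs, num_samples):
    """Render the impulse response of a small feedback delay network.

    Assumptions (illustrative only): one broadband gain per delay line in
    place of the GEQd filters, and a Householder feedback matrix
    A = I - (2 / D) * ones, which is orthogonal.
    """
    D = len(delays)
    # Broadband loop gain so that the level drops by 60 dB after rt60_s
    # seconds: attenuation in dB = -60 * m_d / (fs * RT60).
    gains = [10.0 ** (-60.0 * m / (fs * rt60_s) / 20.0) for m in delays]
    buffers = [[0.0] * m for m in delays]  # circular delay-line buffers
    pos = [0] * D                          # shared read/write index per line
    out = [[0.0] * num_samples for _ in range(D)]
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0         # unit impulse input
        # Read the delay-line outputs and apply the loop attenuation.
        y = [gains[d] * buffers[d][pos[d]] for d in range(D)]
        mean2 = 2.0 * sum(y) / D
        for d in range(D):
            out[d][t] = y[d]
            # Householder feedback plus the input, written back into the line.
            buffers[d][pos[d]] = (y[d] - mean2) + x
            pos[d] = (pos[d] + 1) % delays[d]
    return out
```

With delays of 7, 11 and 13 samples, the first echo of channel 1 appears at sample 7 with an amplitude equal to the loop gain of that line, after which the echoes recirculate and grow denser while decaying.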
[0102] The number of delay lines D can be adjusted depending on quality requirements and the desired tradeoff between reverberation quality (e.g. modal density, temporal and spatial diffuseness, diffuse onset time) and computational complexity. In an embodiment, an efficient implementation with D = 15 delay lines is used. This makes it possible to define the coefficients of the feedback matrix A 257 as proposed by Rocchesso in Maximally Diffusive Yet Efficient Feedback Delay Networks for Artificial Reverberation, IEEE Signal Processing Letters, Vol. 4. No. 9, Sep 1997, in terms of a Galois sequence facilitating efficient implementation.
[0103] The feedback network (FN) 250 of the reverberator is a feedback delay network (FDN). Reverberator 200 can thus produce 15 nearly uncorrelated outputs which are subsequently rendered from different spatial directions defined by the directional configuration. The output signals are reproduced using loudspeakers (or alternatively virtual loudspeakers that are convolved with HRTFs or yet alternatively encoded to ambisonics which is then decoded to a binaural or loudspeaker format) that are positioned in the corresponding spatial directions, and the levels of which are controlled with channel gain coefficients. The resulting reverberant audio signals have acoustical characteristics according to the reverberator parameters, namely a desired frequency-dependent rate of decay and level.
[0104] In some embodiments the reverberator 200 parameters are related to the reverberation characteristics of the acoustic environment or room which the reverberator relates to. In a scene with several rooms there can be several reverberator 200 instances implemented.
[0105] Fig.3 shows an example system or apparatus representing a reverberator processing system 300 suitable for rendering late-stage reverberation according to some embodiments and which employs the reverberator 200 as shown in Fig.2. The system comprises inputs such as spatial audio signal 301, reverberation configuration specification 302, and directional configuration specification 312. The reverberator processing system 300 further comprises a beam signal obtainer 316 which is configured to receive the spatial audio signal 301 and source configuration specification or information (for example the sound source direction, source reverberation gain, and reference distance) 318 and the beam configuration specification or information 320 and produce a beam signal 201 to be input to the Reverberator 200.
[0106] Furthermore in some embodiments the reverberator processing system 300 further comprises a binaural renderer 309 configured to render reverberant binaural signals 314 with late reverberation that are perceived according to the reverberant characteristics specified in the reverberation configuration specification 302 and directional characteristics specified in the directional configuration specification 312 and where the reverberation is applied to a source in the spatial audio signal 301 obtained using the source configuration specification 318. The reverberation configuration specification 302 and directional configuration specification 312 and the source configuration specification 318 can, for example, be obtained from a bitstream or from a listening space description format (LSDF) input to the renderer.
[0107] In some embodiments, the reverberation configuration specification 302 comprises suitable parameters for configuring the reverberator 200. Suitable reverberation configuration specification 302 includes, for example, the reverberation times RT60(k) in frequency bands (where k is the frequency band index), reverberant-to-direct ratio RDR(k), pre-delay time tpre, and / or a virtual space geometry specification. As an alternative to the RDR, the diffuse-to-source energy ratio (DSR) can be used.
[0108] In some embodiments, the directional configuration specification 312 can indicate encoding directions used to render the reverberation by a suitable rendering scheme that creates a perception of enveloping diffuse reverberation, such as ambisonics or amplitude panning rendering, or simply rendered directly to a surrounding (real or virtual) loudspeaker setup. As an example, the directional configuration may specify a spherical design such as a t-design, Lebedev grid, or other suitable (nearly) uniform spherical layout with D points representing encoding directions (and thus the number of reverberator output channels).
[0109] In some embodiments, the source configuration specification 318 can indicate the spatial direction (azimuth and elevation) of a sound source as a sound source direction, a desired reverberation gain for the source as a source reverberation gain, and a reference distance for the sound source. The spatial direction can indicate the direction of the sound source as captured by the spatial audio signal 301. The source reverberation gain can indicate a desired reverberation gain for the sound source. The reference distance for the sound source also affects the reverberant signal level.
[0110] In some embodiments, the beam configuration specification 320 can indicate beamforming configuration data such as beam directions and widths and attenuations associated with the spatial audio signal 301 or which can be applied to or otherwise obtained from the spatial audio signal 301.
[0111] In some embodiments, the reverberator processing system 300 comprises a reverberator parameter determiner 303 configured to obtain the reverberation configuration specification 302 and directional configuration specification 312. The Reverberator parameter determiner 303 can in these embodiments be configured to convert these specifications into suitable reverberator parameters 304 for the reverberator 200.
[0112] Fig.4 shows an example flow diagram of the operations of the example reverberator processing system shown in Fig.3 according to some embodiments.
[0113] First, the spatial audio signal 301, reverberation configuration specification 302, and directional configuration specification 312 are obtained as shown in Fig.4 by 401.
[0114] Then, the reverberator parameters 304 are determined from the reverberation configuration specification 302 and directional configuration specification 312 inputs as shown in Fig.4 by 403.
[0115] Then, the reverberator 200 is configured using reverberator parameters 304 as shown in Fig.4 by 405.
[0116] Then, the binaural renderer 309 is configured using the directional configuration specification 312 as shown in Fig.4 by 407.
[0117] Then, the beam signal obtainer 316 is configured using the source configuration specification 318 and the beam configuration specification 320 as shown in Fig.4 by 409. Then, a beam signal 201 is generated by processing the spatial audio signal 301 with the beam signal obtainer as shown in Fig.4 by 411.
[0118] Then, reverberant audio signals 210 are generated by processing the beam signal 201 with the configured reverberator 200 as shown in Fig.4 by 413.
[0119] Then, reverberant binaural signals 314 are rendered by processing the reverberated audio signals 210 by the configured binaural renderer 309 as shown in Fig.4 by 415.
[0120] Then, reverberant binaural signals 314 are output as shown in Fig.4 by 417.
[0121] Fig.5 shows schematically an example reverberator parameter determiner 303 which in some embodiments is configured to receive or otherwise obtain the directional configuration specification 312 and the reverberation configuration specification 302 and based on these generate suitable reverberator parameters 304, such as: number of reverberator output channels; feedback delay line lengths via feedback delay line lengths determiner 501; feedback attenuation filter coefficients via feedback attenuation filter parameters determiner 503; reverberation ratio control filter coefficients via reverberation ratio control filter parameter determiner 505; pre-delay line length via pre-delay line length determiner 507.
[0122] For example, in some embodiments, the reverberator parameter determiner 303 comprises a feedback delay line lengths determiner 501 which is configured to determine the feedback delay lengths md for each of the D channels of the reverberator.
[0123] The feedback delay lengths md can be based on a virtual space geometry specification. For example, a bounding box that encloses or is aligned with the walls of the physical or virtual room can be defined with dimensions xDim, yDim, zDim. If the room is not shaped as a shoebox (or cuboid) then a shoebox can be fit inside or around the room and the dimensions of the fitted shoebox can be utilized for the delay line lengths. Alternatively, the dimensions can be obtained as three longest orthogonal dimensions in the non-shoebox shaped room, or by a mesh if the bounding box is provided as a mesh, or by another suitable method. When the method is executed in a renderer then the enclosure vertices are obtained from the bitstream (for VR acoustic environments) or the LSDF (for an AR acoustic environment) and the dimensions can be calculated.
[0124] The feedback delay lengths md can, in some embodiments, be set proportionally to standing wave resonance frequencies in the virtual room or physical room (the acoustic environment).
[0125] The dimensions can further be converted to modified dimensions of a virtual room or enclosure by predetermined ratios which are suited for the generation of preferable room modes.
[0126] The delay line lengths md can further be made to be mutually prime integers. This choice minimizes coherent repetition in the impulse response of the FN. The sieve of Sundaram algorithm can be used to find the prime numbers up to the maximum delay line length. Each delay line length can then be mapped to the closest prime number in the obtained set of prime numbers. In some embodiments the reverberator parameter determiner 303 further comprises a feedback attenuation filter parameter determiner 503 which is configured to determine attenuation filter coefficients for feedback attenuation filters GEQd 253. The filter coefficients can be configured so that the rate of attenuation produced by the recirculation through the dual-stage delay lines results in the desired reverberation time RT60(fc). This determination can be implemented in a frequency-dependent manner to ensure the appropriate rate of decay of signal energy at specified frequencies. For a frequency bin k, the desired attenuation per signal sample is γsamp(k) = -60 / (fs * RT60(k)) dB, where fs is the sampling rate. The attenuation in decibels for a delay line pair of aggregate length md, where md = md,a + md,b, is then
[0128] γGEQd(k) = md * γsamp(k),
[0129] which serves as a target command gain in the design procedure of cascade graphic equalizer filters as described in V. Välimäki and J. Liski, “Accurate cascade graphic equalizer,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 176-180, Feb. 2017, to produce the attenuation filter coefficients for GEQd. The cited design procedure operates in octave bands, although methods for similar graphic EQ structures can support third octave bands, increasing the number of biquad filters to 31 and providing a better match for detailed target responses, such as detailed in J. Rämö, J. Liski, and V. Välimäki, “Third-Octave and Bark Graphic-Equalizer Design with Symmetric Band Filters,” Applied Sciences, vol. 10, no. 4, p. 1222, Feb. 2020.
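As a non-limiting illustration, the mapping of delay line lengths to prime numbers and the calculation of the per-band target attenuation γGEQd(k) = md * γsamp(k) can be sketched as follows. The function names are hypothetical. Choosing a prime that has not already been used for an earlier delay line, and searching primes up to twice the largest requested length, are assumptions added here so that the resulting lengths are distinct; the graphic equalizer design itself is not shown.

```python
def primes_up_to(limit):
    """Sieve of Sundaram: return all prime numbers <= limit."""
    if limit < 2:
        return []
    k = (limit - 1) // 2
    marked = [False] * (k + 1)
    for i in range(1, k + 1):
        j = i
        # Every n = i + j + 2ij corresponds to a composite odd number 2n + 1.
        while i + j + 2 * i * j <= k:
            marked[i + j + 2 * i * j] = True
            j += 1
    return [2] + [2 * i + 1 for i in range(1, k + 1) if not marked[i]]


def to_prime_delays(lengths):
    """Map each requested delay length to the closest unused prime.

    Skipping primes that are already taken is an assumption of this sketch.
    """
    primes = primes_up_to(2 * max(lengths))
    chosen = []
    for m in lengths:
        candidates = [p for p in primes if p not in chosen]
        chosen.append(min(candidates, key=lambda p: abs(p - m)))
    return chosen


def attenuation_db(m_d, rt60_per_band, fs):
    """Target loop attenuation in dB per band for a delay of m_d samples."""
    return [m_d * (-60.0 / (fs * rt60)) for rt60 in rt60_per_band]
```

For example, a 480 sample delay line at fs = 48000 Hz with RT60 values of 1.0 s and 0.5 s gives target attenuations of -0.6 dB and -1.2 dB per pass.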
[0130] In some embodiments the reverberator parameter determiner 303 further comprises a pre-delay line length determiner 505 which is configured to determine a pre-delay length mpre in samples based on tpre that denotes an onset timing of the diffuse state of the reverberator, in which case mpre can be set such that the diffuse onset timing of the FN matches a desired tpre. The diffuse onset timing of the FN can be estimated by any mixing time or diffuseness estimator, or predicted from analytic methods using the virtual space geometry specification provided in the reverberation configuration specification 302.
[0131] In some embodiments, the reverberator parameter determiner 303 further comprises a reverberation ratio control filter parameter determiner 507 configured such that, when the filter GEQratio 203 is applied to the input signal 201, the resultant reverberation will have the desired energy ratio defined by the RDR(k). The input to the design procedure can in some embodiments be the vector of reverberant-to-direct (RDR) energy ratio values RDR(k) obtained from the reverberation configuration specification 302. The generated coefficients of GEQratio are designed to match the reverberator spectrum energy to the target spectrum energy. To do this, an estimate of the RDR of the reverberator output is determined by the following procedure.
[0132] Firstly, a unit impulse is rendered through the reverberator, which has been configured with the parameters produced by the delay line lengths determiner 501 and the feedback attenuation filter parameters determiner 503, and with tpre set to 0. The input to the reverberator can be a buffer of zeros of a sufficient length to capture the reverberation tail, such as the maximum RT60(k) among all frequency bands, with a unit impulse written to the head of the buffer.
[0133] Once rendered, the energy of the reverberator output is measured, along with that of the unit impulse, and the ratio of these energies is calculated. The procedure for measuring signal energy is detailed in the following.
[0134] The monophonic output signal srev(t), which is a function of time t, can be obtained by summation of the outputs of the feedback network 258. An FFT of length NFFT is calculated over srev(t) and its magnitude spectrum can be obtained as
[0135] H(kb) = abs(FFT(srev(t))).
[0136] Here, kb are the FFT bin indices. The positive half spectral energy density is
[0137] Srev(kb) = (1 / NFFT) * H(kb)^2,
[0140] where the energy from the negative frequency indices kb is added into the corresponding positive frequency indices kb. The energy of a unit impulse can be calculated or obtained analytically and is denoted as Sunit(kb).
[0143] In some embodiments the energy of each band with index k is calculated from the positive half spectral energy density of the reverberator Srev(kb) and from the positive half spectral energy density of the unit impulse Sunit(kb). Band energies can be calculated as
[0145] S(k) = ∑_{b=blow}^{bhigh} S(kb),
[0148] where blow and bhigh are the lowest and highest bin indices belonging to band k, respectively. The band bin indices can be obtained by comparing the frequencies of the bins to the lower and upper frequencies of each band.
[0149] The reproduced RDR of the reverberator can then be obtained as
[0150] RDRrev(k) = Srev(k) / Sunit(k)
[0151] The target linear magnitude response for GEQratio can be obtained as
[0152] gGEQratio = sqrt(RDR(k)) / sqrt(RDRrev(k))
[0153] where RDR(k) is the target linear RDR value from the reverberation configuration specification 302. The target response control gain can then be determined by
[0154] γGEQratio(k) = 20 * log10(gGEQratio).
[0155] The RDR target response control gain can also be obtained directly in the logarithmic domain as γGEQratio(k) = 10 * log10(RDR(k)) - 10 * log10(RDRrev(k)).
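As a non-limiting illustration, the estimation procedure above (folded spectral energy density, band energies, reproduced RDR and the resulting control gain in decibels) can be sketched as follows. A direct DFT is used instead of an FFT to keep the sketch free of external dependencies, and the function names and the representation of bands as inclusive bin index pairs are assumptions of the sketch.

```python
import cmath
import math


def half_spectrum_energy(signal):
    """Return S(kb) = |DFT(signal)|^2 / N for the non-negative bins, with
    the energy of each negative-frequency bin added to its positive twin."""
    n = len(signal)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * kb * t / n)
            for t, x in enumerate(signal))
        for kb in range(n)
    ]
    energy = [abs(v) ** 2 / n for v in spectrum]
    half = energy[: n // 2 + 1]
    for kb in range(1, (n + 1) // 2):
        half[kb] += energy[n - kb]
    return half


def band_energies(half, band_edges):
    """Sum bin energies over each (b_low, b_high) inclusive index pair."""
    return [sum(half[b_low:b_high + 1]) for b_low, b_high in band_edges]


def ratio_control_gain_db(target_rdr, s_rev, s_unit):
    """gamma(k) = 10*log10(RDR(k)) - 10*log10(S_rev(k) / S_unit(k))."""
    return [
        10.0 * math.log10(t) - 10.0 * math.log10(r / u)
        for t, r, u in zip(target_rdr, s_rev, s_unit)
    ]
```

If the reverberator output carries a quarter of the unit impulse energy in a band and the target RDR is 1, the control gain for that band is 10 * log10(4), approximately 6.02 dB.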
[0157] γGEQratio(k) is then provided to the graphic equalizer design routine, previously cited in the description of feedback attenuation filter parameters determiner 503, to produce filter coefficients for GEQratio. Fig.6 shows an example flow diagram of the operations of the reverberator parameter determiner 303 of Fig.3 as further shown in Fig.5.
[0158] First, the directional configuration specification and the reverberation configuration specification are obtained as shown in Fig.6 by 601.
[0159] Then, the feedback delay lengths are obtained or determined as shown in Fig.6 by 603. Following this, the feedback attenuation filter parameters are obtained or determined as shown in Fig.6 by 605.
[0160] Then, the pre-delay line length parameters are obtained or determined as shown in Fig.6 by 607. Furthermore, the reverberation ratio control filter parameters are obtained or determined as shown in Fig.6 by 609.
[0161] Then, as shown in Fig.6 by 611, parameters are output, such as: number of reverberator output channels; feedback attenuation filter coefficients; reverberation ratio control filter coefficients; pre-delay line length; feedback delay line lengths.
[0162] Fig.7 shows schematically, in further detail, an example beam signal obtainer 316 as shown in Fig.3.
[0163] The beam signal obtainer 316 is configured to receive or otherwise obtain the spatial audio signal 301, the source configuration specification 318 and the beam configuration specification 320 and based on these generate a beam signal 201 which can be passed to the reverberator.
[0164] In some embodiments the beam signal obtainer 316 comprises a beam determiner 701. The beam determiner 701 is configured to obtain or receive the spatial audio signal 301, the source configuration specification 318 and the beam configuration specification 320 and based on these specifications process the spatial audio signal 301 to generate an initial-beam audio signal to be passed to a gain processor 705.
[0165] In some embodiments the beam direction for a sound source is obtained either from received or obtained metadata indicating source position or from a determination of audio level (or loudness) towards different spatial directions within the spatial audio signal. In some embodiments where metadata or another determination indicates a direction θ, φ (azimuth in degrees, elevation in degrees) for the source, the beam indicated in the beam configuration data that is closest to θ, φ is selected.
[0166] In some embodiments where no such metadata exists, analysis of spatial energy towards spatial directions can be used to determine beams with a largest sound source energy. The beams with the largest energy can then be used as the beams for the top N sound sources. This enables the system to automatically reverberate some of the most dominant sound sources even though their accurate sound source positions are not known.
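As a non-limiting illustration, selecting the beams with the largest energy can be sketched as follows. The function name top_n_beams is hypothetical, and the use of the sum of squared samples over the analysed block as the energy measure is an assumption of the sketch.

```python
def top_n_beams(beam_signals, n):
    """Return the indices of the n beams with the largest signal energy.

    beam_signals[b] is the list of samples of beam b for the analysed block.
    """
    energies = [sum(x * x for x in signal) for signal in beam_signals]
    order = sorted(range(len(energies)), key=lambda b: energies[b],
                   reverse=True)
    return order[:n]
```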
[0167] In some embodiments a beam signal towards a sound source at direction θ, φ is computed or otherwise determined from the spatial audio signal h(n, j) based on beam configuration data. In an embodiment the operation is based on beamforming with Ambisonic signals, formulated as
[0171] h(n) = ∑_{j=1}^{J} wj Yj(θ, φ) h(n, j)
[0174] where Yj are real spherical harmonics ordered with the ACN ambisonic ordering and wj are ambisonic order-dependent weights controlling the shape of the beamformer (e.g., cardioid, hypercardioid, or supercardioid patterns). The index n represents the sample and j the channel, with J channels in total.
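As a non-limiting illustration, the beamforming sum can be sketched for the first-order case (four channels in ACN order). The SN3D normalisation of the spherical harmonics, the use of radians for the angles, and the order weights of 0.5 and 0.5 (which yield a cardioid under these assumptions) are choices made for the sketch and are not taken from the description. The function names are hypothetical.

```python
import math


def sh_first_order(az, el):
    """Real first-order spherical harmonics in ACN order (W, Y, Z, X),
    SN3D normalisation assumed; az and el are in radians."""
    return [
        1.0,
        math.sin(az) * math.cos(el),
        math.sin(el),
        math.cos(az) * math.cos(el),
    ]


def beam_signal(hoa, az, el, order_weights=(0.5, 0.5)):
    """h(n) = sum_j w_j * Y_j(az, el) * h(n, j) for first-order input.

    hoa is a list of four channel sample lists in ACN order.
    order_weights holds one weight per ambisonic order (0 and 1).
    """
    y = sh_first_order(az, el)
    w = [order_weights[0]] + [order_weights[1]] * 3
    num_samples = len(hoa[0])
    return [
        sum(w[j] * y[j] * hoa[j][n] for j in range(4))
        for n in range(num_samples)
    ]
```

Under these assumptions a source encoded at the beam direction passes with unit gain, while a source from the opposite direction is cancelled.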
[0175] In some embodiments the spatial audio signal is in the form of virtual loudspeaker signals sk(n), where k indicates the index of the direction θk, φk of the loudspeaker. In this case beamforming can be achieved by selecting the loudspeaker signal sm(n) as the beam signal such that θm, φm is the direction closest to the beam direction θ, φ.
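As a non-limiting illustration, the selection of the closest direction can be sketched as follows. The same selection could apply to choosing the closest beam from the beam configuration data. Measuring closeness by the angle between unit vectors, and the function names, are assumptions of the sketch.

```python
import math


def unit_vector(az_deg, el_deg):
    """Convert azimuth and elevation in degrees to a Cartesian unit vector."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (
        math.cos(az) * math.cos(el),
        math.sin(az) * math.cos(el),
        math.sin(el),
    )


def closest_direction_index(target, directions):
    """Return the index of the direction with the smallest angle to target.

    target and each entry of directions are (azimuth, elevation) in degrees.
    The largest dot product corresponds to the smallest angle.
    """
    t = unit_vector(*target)

    def cos_angle(direction):
        u = unit_vector(*direction)
        return sum(a * b for a, b in zip(t, u))

    return max(range(len(directions)),
               key=lambda i: cos_angle(directions[i]))
```

Comparing unit vectors rather than raw angle differences handles the wrap-around of azimuth at ±180 degrees without special cases.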
[0176] Furthermore the beam signal obtainer 316 comprises a gain determiner 703. The gain determiner 703 is configured to obtain or receive the spatial audio signal 301, the source configuration specification 318 and the beam configuration specification 320 and based on these generate a gain value to control the gain processor 705.
[0177] In embodiments the reverberation related gain depends on at least the direction of the initial-beam audio signal compared to a direction of the sound source (gain_beam), a desired reverberation gain for the sound source (gain_reverb), and a distance for the sound source compared to a reference distance (gain_refdist).
[0178] For example, in embodiments the gain is calculated as
[0179] gain = gain_beam * gain_reverb * gain_refdist
[0180] In some embodiments where there is a beam in the beam configuration towards the sound source direction then the beam direction dependent gain gain_beam can be set at unity.
[0181] If the beam closest to the sound source direction has an attenuation of 0.9 towards the sound source direction as indicated by the beam configuration, then the beam direction dependent gain gain_beam can be set to 1 / 0.9 to compensate for the attenuation.
[0182] The gain_beam value is therefore employed in order to compensate for the gain that the beam signal has missed from the source signal.
[0183] In some embodiments a desired reverberation gain gain_reverb can indicate a desired reverberation gain for the source. In some embodiments a parameter such as gain_refdist can indicate gain depending on the reference distance of the source.
[0184] The beam signal obtainer 316 also comprises a gain processor 705. The gain processor 705 is configured to obtain or receive the initial-beam audio signal and to apply a gain based on the gain value to generate the beam audio signal 201.
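As a non-limiting illustration, the gain determination and gain application can be sketched as follows. The function names are hypothetical. The description does not give a formula for gain_refdist, so it is passed in as an already determined value.

```python
def reverb_input_gain(beam_attenuation, gain_reverb, gain_refdist):
    """gain = gain_beam * gain_reverb * gain_refdist.

    beam_attenuation is the linear attenuation of the selected beam towards
    the source direction (1.0 when the beam points at the source), so
    gain_beam = 1 / beam_attenuation compensates for it.
    """
    gain_beam = 1.0 / beam_attenuation
    return gain_beam * gain_reverb * gain_refdist


def apply_gain(signal, gain):
    """Scale the initial-beam audio signal to obtain the beam signal."""
    return [gain * x for x in signal]
```

For a beam attenuation of 0.9, a source reverberation gain of 0.5 and a reference distance gain of 1.0, the resulting gain is 0.5 / 0.9, approximately 0.556.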
[0185] Fig.8 shows an example flow diagram of the operations of the beam signal obtainer 316 of Fig.3 as shown in further detail in Fig.7.
[0186] First, the spatial audio signal is obtained as shown in Fig.8 by 801.
[0187] Then, the sound source direction, sound source reverberation gain, and reference distance are obtained as shown in Fig.8 by 803.
[0188] Then, the beam signal is obtained from the spatial audio signal based on the sound source direction and beam configuration as shown in Fig.8 by 805.
[0189] Then, a gain for the beam signal is obtained based on sound source separation in the beam signal, source reverberation gain, and reference distance as shown in Fig.8 by 807.
[0190] Furthermore, the determined gain is applied to the beam signal as shown in Fig.8 by 809.
[0191] Then, the beam signal with gain applied is output as shown in Fig.8 by 811.
[0192] Fig.9 furthermore shows schematically in further detail an example binaural renderer 309 employed in some embodiments. The reverberant audio signals 210 are forwarded to the binaural renderer 309, which also receives directional configuration specification 312 as further inputs. The binaural renderer 309 in some embodiments is configured to render the reverberant audio signals to reverberant binaural signals 314 which can, for example, be reproduced using headphones. These signals are perceived as surrounding and enveloping with acoustical characteristics according to reverberation configuration specification 302 and which have been rendered from the spatial audio signal 301 using the source configuration specification 318 and the beam configuration specification 320.
[0193] The input to the binaural renderer 309 is the reverberant audio signals 210 srev(t, d) and the directional configuration 312 indicating encoding directions for each combined reverberant audio signal. In the example shown in Fig.9 the binaural renderer 309 is organized on a channel-by-channel basis and there is one HRTF processor 901_d per reverberant audio channel. For example, a first channel HRTF processor 901_1 is configured to receive the reverberant audio signal 210_1 (channel one) and the directional configuration 312_1 associated with channel one. A second channel HRTF processor 901_2 is configured to receive the reverberant audio signal 210_2 (channel two) and the directional configuration 312_2 associated with channel two. Also shown is a Dth channel HRTF processor 901_D configured to receive the reverberant audio signal 210_D (channel D) and the directional configuration 312_D associated with channel D. Each of the HRTF processors can comprise an HRTF filter pair hbin(m, i, d), where m is the time index of the filter coefficients, i = 1, 2 is the index of the binaural channel, and d is the reverberator output channel index.
[0194] The operation of the dth HRTF processor 901_d is as follows. Using the HRTF filter pairs hbin(m, i, d), reverberant binaural audio signals sbin(t, i, d) 902_d can be determined for each channel of the reverberant audio signals 210_d by
[0195] sbin(t, i, d) = hbin(m, i, d) ⊛ scomb(t, d)
[0198] where ⊛ denotes convolution (the filtering may also be performed in the frequency domain in some implementations instead of time-domain convolution) and scomb(t, d) is the reverberant signal.
[0199] The reverberant binaural audio signals sbin(t, i, d) 902_d can then be passed to a binaural signal combiner 903.
[0200] The reverberant binaural audio signals sbin(t, i, d) 902_d can then be combined across channels d in the binaural signal combiner 903 by
[0201] sbin(t, i) = ∑_{d=1}^{D} sbin(t, i, d)
[0204] yielding the reverberant binaural signals sbin(t, i) 314 which are the output.
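As a non-limiting illustration, the per-channel HRTF filtering and the summation across channels can be sketched as follows. Direct time-domain convolution is used for clarity, and the function names and the representation of each HRTF filter pair as two lists of coefficients are assumptions of the sketch.

```python
def convolve(h, s):
    """Full linear convolution of filter h with signal s."""
    out = [0.0] * (len(h) + len(s) - 1)
    for m, h_m in enumerate(h):
        for t, s_t in enumerate(s):
            out[m + t] += h_m * s_t
    return out


def render_binaural(reverb_channels, hrir_pairs):
    """Filter each reverberant channel with its HRTF filter pair and sum.

    reverb_channels[d] holds the samples of reverberator output channel d.
    hrir_pairs[d] is (h_left, h_right) for the encoding direction of d.
    Returns [left, right], each the sum over d of the filtered channels.
    """
    length = max(
        len(s) + len(h) - 1
        for s, pair in zip(reverb_channels, hrir_pairs)
        for h in pair
    )
    ears = [[0.0] * length, [0.0] * length]
    for s, pair in zip(reverb_channels, hrir_pairs):
        for i, h in enumerate(pair):
            for t, v in enumerate(convolve(h, s)):
                ears[i][t] += v
    return ears
```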
[0205] Fig.10 shows an example virtual audio scene rendering system 1000 which can comprise a reverberation system 300 as shown in Fig.3 with an additional signal structure of reflection processor 1051 and reflection binaural renderer 1059. The reverberator and the reflection processor 1051 are configured to generate audio signals associated with echoes within the system. For example, the reflection processor 1051 is configured to produce a discrete number of echoes which are specular with regard to features and geometry of the modelled room and are correspondingly precise and independently varied in their arrival direction, intensity, and coloration, as characterizes early reflections in a room impulse response (ref. Fig. 1, region 103). The echoes produced by the reflection processor are accordingly referred to as reflection echoes. The reflection processor is external to and operates in parallel with the reverberator which is configured to produce late reverberation.
[0206] Fig.11 shows an example reflection processor 1051 and the associated reflection binaural renderer 1059 suitable for use with the embodiments discussed herein. There are several ways to calculate or simulate early reflections. As an example, the image source method can be used such as detailed in J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, pp. 943-950, April 1979 and J. Borish. “Extension of the image model to arbitrary polyhedra.” The Journal of the Acoustical Society of America 75.6 (1984): 1827-1836 and as shown in Fig.12.
[0207] The reflection parameters, such as delay 1106, absorption 1108, attenuation 1110 and direction of arrival (DoA) 1112, can be explained with respect to Fig.12, where an example box or rectangular space is shown with reflecting surfaces 1200, 1202, 1204, 1206. Within the virtual acoustic space are the source 1220 and the listener 1210. The path of a reflection between the source 1220 and the listener 1210 is shown, with a reflection and / or absorption point 1240 on the reflecting surface between them. Mirroring the source 1220 across the reflecting surface 1206 establishes an image source 1230. The line connecting the image source 1230 to the listener 1210 can then be used to establish the reflection and / or absorption point 1240 and the DoA of the reflection with respect to the listener. The delay to be applied to synthesize a reflection is obtained based on the length of the reflecting path (the path from the image source to the listener, which equals the length of the path from the source 1220 via the reflection point 1240 to the listener 1210). The absorption corresponds to the reflecting surface 1206 from which this sound trajectory is reflected (at the reflection and / or absorption point 1240). The distance attenuation is set proportional to 1 / r, where r equals the length of the reflection path from the source to the listener. In addition, air absorption can be included in the attenuation of the image source. The DoA of a reflection is set based on the angle of arrival from the reflection point to the listener.
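The geometry described for Fig.12 can be illustrated with a short sketch for a single first-order reflection in two dimensions. The function name `first_order_reflection`, the restriction to a wall lying in the plane x = wall_x, and the speed of sound of 343 m/s are assumptions chosen for illustration. Absorption filtering and air absorption are omitted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_reflection(source, listener, wall_x):
    """First-order reflection off a wall lying in the plane x = wall_x.

    source and listener are (x, y) positions in metres, both on the same
    side of the wall and not on it. Returns the delay in seconds, the 1/r
    distance gain, the DoA azimuth in radians as seen from the listener,
    and the reflection point on the wall.
    """
    sx, sy = source
    lx, ly = listener
    # Mirror the source across the wall to obtain the image source.
    image = (2.0 * wall_x - sx, sy)
    dx, dy = image[0] - lx, image[1] - ly
    r = math.hypot(dx, dy)      # length of the reflected path
    delay = r / SPEED_OF_SOUND
    gain = 1.0 / r              # distance attenuation proportional to 1/r
    doa = math.atan2(dy, dx)    # direction from the listener towards the image
    # Reflection point: where the listener-to-image line crosses the wall.
    frac = (wall_x - lx) / dx
    point = (wall_x, ly + frac * dy)
    return delay, gain, doa, point
```

For a source at (1, 1), a listener at (1, 4) and a wall at x = 3, the image source lies at (5, 1), the reflected path is 5 m long and the reflection point is (3, 2.5).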
[0208] In the example early reflection renderer shown in Fig.11, a reflection parameter determiner 1101 is configured to receive the inputs of room geometry 1106, listener position 1100, source position 1102, and absorption coefficients 1104 and generate control parameters such as delay 1106, absorption 1108, attenuation 1110 and direction of arrival (DoA) 1112 and pass these to the processors described hereafter.
[0209] In some embodiments the input audio signal 201 is first fed into a delay line 1103 which buffers audio signal samples and enables picking segments of past samples of the audio signal 201.
[0210] The reflection signal obtainer 1105 can receive the output of the delay line 1103 and the delay 1106 parameter. The reflection signal obtainer is configured to obtain a past signal sample based on the delay 1106 to obtain a delayed signal.
[0211] A reflection absorption processor 1107 then can filter the selected past signal sample to apply an equalizer filter to model the frequency-dependent absorption data for the reflection to obtain delayed and absorption-filtered signal.
[0212] A reflection attenuation processor 1109 can then attenuate the delayed and absorption-filtered signal by applying a 1 / r attenuation and optionally air absorption to obtain delayed and absorption-filtered and attenuated signal.
[0213] Finally, a reflection spatializer 1111 can be configured to spatialize the delayed and absorption-filtered and attenuated signal by HRTF filtering with a left and right HRTF filter corresponding to the desired DoA for this reflection to obtain a reverberant binaural signal 1112 containing the synthesized reflection portion. In some situations, the reflection spatializer can be the binauralizer.
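The delay, absorption and attenuation stages described above can be sketched for a single reflection as follows. The class name `ReflectionChain` is hypothetical, and the one-pole low-pass filter is only an assumed stand-in for the absorption equalizer. HRTF spatialization is left out of this sketch.

```python
from collections import deque


class ReflectionChain:
    """Delay -> absorption filter -> 1/r attenuation for one reflection."""

    def __init__(self, delay_samples, path_length_m, absorption_coeff):
        # Buffer holding the current sample plus delay_samples past samples.
        size = delay_samples + 1
        self.buffer = deque([0.0] * size, maxlen=size)
        self.gain = 1.0 / path_length_m
        # One-pole low-pass as a stand-in for the absorption equalizer
        # (an assumption; 0.0 means no filtering).
        self.a = absorption_coeff
        self.state = 0.0

    def process(self, x):
        """Process one input sample and return one output sample."""
        self.buffer.append(x)        # newest sample in, oldest sample out
        delayed = self.buffer[0]     # the sample from delay_samples ago
        self.state = (1.0 - self.a) * delayed + self.a * self.state
        return self.gain * self.state
```

Feeding a unit impulse through `ReflectionChain(2, 2.0, 0.0)` yields the impulse two samples later, scaled by 1 / 2.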
[0214] The virtual audio scene rendering system of Fig.10 can furthermore comprise the reverberation rendering system of Fig.3, which renders the diffuse late reverberation based on the beam signal 201 received from the beam signal obtainer 316. It is noted that early reflections are rendered individually for each of the beam signals 201_1, 201_2, ..., 201_n. This is because each early reflection can have a different direction of arrival and thus needs to be rendered separately. For late reverberation, the beam signals 201_1, 201_2, ..., 201_n can be mixed with an adder 1070 and fed into the reverberator 200 as a downmix.
[0215] Fig.13 illustrates an example of multiple echo orders. The left side of Fig.13 shows an XY plane representation 1300 of a virtual “shoebox” room 1301 and its virtual image rooms in a grid-like structure. Third-order image sources calculated by the image source method are depicted (one example image source is indicated by item 1302). The circle described by a radius 1303 may determine the time-equivalent value of tpre, and the remaining distance to each image source, exemplified by the distance 1304, may correspond to the chosen values of md and mc when converting the distances to time-of-flight and then to samples. In this example, item 1310 of Fig.13 shows ToAs (including tpre) for various echo orders. For example, plot 1311 shows order 1 echoes, 1312 shows order 2 echoes, 1313 shows order 3 echoes and 1314 shows the combination of echoes up to (and including) order 3.
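The grid of image rooms and the conversion of distances to time-of-flight and then to samples can be sketched as follows for a two-dimensional shoebox room. The function name `image_source_toas`, the 48 kHz sampling rate and the 343 m/s speed of sound are assumptions made for illustration. The sketch only groups times of arrival by echo order and does not model the parameters tpre, md and mc.

```python
import math


def image_source_toas(room, source, listener, max_order, fs=48000, c=343.0):
    """Times of arrival (in samples) of 2D shoebox image sources, by order.

    room is (Lx, Ly) with walls at x = 0, x = Lx, y = 0 and y = Ly;
    source and listener are (x, y) positions in metres. The order of an
    image source is the number of wall reflections it represents.
    """
    def mirrored(coord, length, m):
        # m-th image along one axis; an odd m flips the source in its cell.
        return m * length + (coord if m % 2 == 0 else length - coord)

    toas = {order: [] for order in range(max_order + 1)}
    for mx in range(-max_order, max_order + 1):
        for my in range(-max_order, max_order + 1):
            order = abs(mx) + abs(my)
            if order > max_order:
                continue
            ix = mirrored(source[0], room[0], mx)
            iy = mirrored(source[1], room[1], my)
            dist = math.hypot(ix - listener[0], iy - listener[1])
            toas[order].append(round(dist / c * fs))
    return {order: sorted(values) for order, values in toas.items()}
```

In two dimensions this yields 4 first-order, 8 second-order and 12 third-order image sources, with order 0 being the direct sound.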
[0216] Fig.14 shows schematically an example system in which the embodiments are implemented with an encoder device 1401, which performs part of the functionality, writes data into a bitstream 1421 and transmits it to a renderer device 1441, which decodes the bitstream, performs reverberator processing according to the embodiments and outputs audio for headphone listening.
[0217] The encoder side 1401 of Fig.14 can be performed on content creator computers and / or network server computers. The output of the encoder is the bitstream 1421 which is made available for downloading or streaming. The decoder / renderer 1441 functionality runs on an end-user-device, which can be a mobile device, personal computer, sound bar, tablet computer, car media system, home HiFi or theatre system, head mounted display for AR or VR, smart watch, or any suitable system for audio consumption.
[0218] The encoder 1401 is configured to receive the virtual scene description 1400 and the audio signals 1404. The virtual scene description 1400 can be provided in the MPEG-I encoder input format (EIF) or in another suitable format. Generally, the virtual scene description contains an acoustically relevant description of the contents of the virtual scene, and contains, for example, the scene geometry as a mesh or as voxels, acoustic materials, acoustic environments with reverberation parameters, positions of sound sources, and other audio element related parameters such as whether reverberation is to be rendered for an audio element or not. The virtual scene description can also contain the source configuration specification as a function of time for each HOA signal and the beam configuration specification (which is time-invariant). The encoder 1401 in some embodiments comprises a scene and reverberation payload encoder 1413 configured to generate reverberation parameters.
[0219] The encoder 1401 further comprises an MPEG-H 3D audio encoder 1414 configured to obtain the audio signals 1404, MPEG-H encode them and pass them to a bitstream encoder 1415. The encoder 1401 furthermore in some embodiments comprises a bitstream encoder 1415, which is configured to receive the output of the scene and reverberation payload encoder 1413 and the encoded audio signals from the MPEG-H encoder 1414 and to generate the bitstream 1421, which can be passed to the bitstream decoder 1441. The bitstream 1421 in some embodiments can be streamed to end-user devices, made available for download, or stored.
[0220] The decoder 1441 in some embodiments comprises a bitstream decoder 1441 configured to decode the bitstream.
[0221] The decoder 1441 further can comprise a scene payload decoder 1453 configured to obtain the encoded scene and reverberation parameters and decode these in an opposite or inverse operation to the scene and reverberation payload encoder 1413. The output of the scene payload decoder 1453 can be to the reverberator parameter determiner 303 / 1452 and the beam signal obtainer 316 / 1461.
[0222] The reverberator parameter determiner 303 / 1452 is configured to receive the decoded reverberation configuration specification and room dimensions and directional configuration specification and generate the reverberator control parameters discussed herein.
[0223] Furthermore, the head pose generator 1457 receives information from a head mounted device 1470 or similar and generates head pose information or parameters which can be passed to the binaural renderer 309 / 1459, the reflection renderer 1051 / 1462 and the direct sound binaural renderer 1069 / 1469.
[0224] The decoder 1441 comprises an MPEG-H 3D audio decoder 1454 which is configured to decode the audio signals and pass them to the beam signal obtainer 316 / 1461 and the direct sound processor 1465.
[0225] The beam signal obtainer 316 / 1461 is configured to receive the decoded beam configuration specification and source configuration specification and the audio signals and generate or select the beam audio signal 201 to be passed to the reverberators 201 / 1464 and reflection processor 1051 / 1462.
[0226] The decoder 1441 furthermore comprises reverberators 201 / 1464 configured to implement a suitable reverberation of the beam audio signals.
[0227] The reverberator 201 / 1464 is configured to output reverberated audio, based on the reverberator parameters, to a binaural renderer 309 / 1459.
[0228] The decoder furthermore comprises a reflection processor 1051 / 1462 configured to receive the beam audio signal and room geometry 1498 and generate the reflection audio signals and pass these to the reflection binaural renderer 1059 / 1499.
[0229] The decoder 1441 can further comprise a (early) reflection renderer 1059 / 1499 configured to obtain the output of the reflection processor 1051 / 1462 and generate binaural early reflection audio signals.
[0230] The decoder further comprises a binaural renderer 309 / 1459 configured to generate binaural reverberant audio signals from the output of the reverberators 201 / 1464. Additionally, the decoder / renderer 1441 comprises a direct sound processor 1465 which is configured to receive the decoded audio signals and to implement any direct sound processing, such as air absorption and distance-gain attenuation. The output of the direct sound processor 1465 can be passed to a direct sound binaural renderer 1463 which, using the head orientation determination (from a suitable sensor), can generate the direct sound component. The direct sound component and the reverberant component are passed to a binaural signal combiner 1467.
[0231] The binaural signal combiner 1467 is configured to combine the direct, early reflection, and reverberant parts to generate a suitable output (for example for headphone reproduction).
[0232] Furthermore, in some embodiments the decoder comprises a head orientation determiner which passes the head orientation information to the head pose generator 1457.
[0233] As an alternative to transmitting reverberation parameters from the encoder to the renderer it is possible in some embodiments to transmit reverberator parameters in the bitstream. Reverberator parameters refer to the FDN parameters such as delay line lengths, attenuation filters, reverberation ratio control filters, and so on.
[0234] In some embodiments the assignment of reverberator outputs to loudspeaker channels happens during configuration of the reverberator. The assignment can be stored during configuration and provided to the reverberant signal router.
[0235] In some embodiments, the output is a multichannel loudspeaker setup (such as 5.1 or 7.1+4 multichannel loudspeaker setup). In that case, the spatial processing proposed in Fig.9 can be modified by using the directions of the actual loudspeakers as the directional configuration and omitting the binaural renderers, and reproducing the reverberant audio signals from the corresponding loudspeakers of the loudspeaker setup. In the case of loudspeaker output, instead of binaural renderer 309 / 1459 in Fig.14 there will be a loudspeaker renderer (or panner) which in the simplest case will pass through the output signals to a loudspeaker signal combiner which will replace the binaural signal combiner 1467. Correspondingly, the direct sound part and early reflection part are spatialized with a panner such as vector-base amplitude panning instead of the binaural processors.
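For the loudspeaker case, the panning of a direct sound or early reflection between two loudspeakers can be illustrated with a minimal two-dimensional pairwise sketch of vector-base amplitude panning. The function name `vbap_2d`, the azimuth convention in degrees and the unit-power normalization are assumptions made for illustration. Selection of the active loudspeaker pair and three-dimensional setups are not covered.

```python
import math


def vbap_2d(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Pairwise 2D VBAP gains for a source between two loudspeakers.

    The two loudspeaker directions must not be collinear, otherwise the
    determinant below is zero.
    """
    def unit(az_deg):
        az = math.radians(az_deg)
        return math.cos(az), math.sin(az)

    px, py = unit(source_az_deg)
    ax, ay = unit(spk1_az_deg)
    bx, by = unit(spk2_az_deg)
    det = ax * by - ay * bx
    # Solve p = g1 * a + g2 * b for the unnormalized gains (Cramer's rule).
    g1 = (px * by - py * bx) / det
    g2 = (ax * py - ay * px) / det
    norm = math.hypot(g1, g2)    # normalize so that g1^2 + g2^2 = 1
    return g1 / norm, g2 / norm
```

A source midway between loudspeakers at +30 and -30 degrees receives equal gains, while a source in the direction of one loudspeaker is reproduced from that loudspeaker alone.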
[0236] Fig.15 shows an example electronic device which may be used as any of the apparatus parts of the system as described above. The device may be any suitable electronics device or apparatus. For example, in some embodiments the device 2000 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. The device may for example be configured to implement the encoder or the renderer or any functional block as described above.
[0237] In some embodiments the device 2000 comprises at least one processor or central processing unit 2007. The processor 2007 can be configured to execute various program codes such as the methods described herein. In some embodiments the device 2000 comprises a memory 2011. In some embodiments the at least one processor 2007 is coupled to the memory 2011. The memory 2011 can be any suitable storage means. In some embodiments the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007. Furthermore, in some embodiments the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.
[0238] In some embodiments the device 2000 comprises a user interface 2005. The user interface 2005 can be coupled in some embodiments to the processor 2007. In some embodiments the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005. In some embodiments the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad. In some embodiments the user interface 2005 can enable the user to obtain information from the device 2000. For example, the user interface 2005 may comprise a display configured to display information from the device 2000 to the user. The user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000. In some embodiments the user interface 2005 may be the user interface for communicating.
[0239] In some embodiments the device 2000 comprises an input / output port 2009. The input / output port 2009 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and / or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
[0240] The transceiver can communicate with further apparatus by any suitable known communications protocol. For example, in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IrDA).
[0241] The input / output port 2009 may be configured to receive the signals.
[0242] In some embodiments the device 2000 may be employed as at least part of the renderer. The input / output port 2009 may be coupled to headphones (which may be head-tracked or non-tracked headphones) or similar. In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
[0243] The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
[0244] The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
[0245] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0246] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication. The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Claims
1. An apparatus for applying reverberation to at least one audio signal, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain applied at least one further audio signal.
2. The apparatus as claimed in claim 1, wherein the multichannel audio signal is a component of a spatial audio signal further comprising spatial metadata associated with the multichannel audio signal.
3. The apparatus as claimed in claim 2, further caused to perform determining the at least one audio source based on the spatial metadata associated with the multichannel audio signal.
4. The apparatus as claimed in any of claims 1 to 3, caused to perform obtaining at least one further audio signal is caused to perform beamforming the multichannel audio signal to generate the at least one further audio signal.
5. The apparatus as claimed in claim 4, caused to perform determining the reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal is caused to perform: determining a beamforming gain for the direction of the at least one audio source; and performing determining the reverberation gain based on the beamforming gain for the direction of the at least one audio source.
6. The apparatus as claimed in claim 4, wherein the at least one further audio signal is a monophonic signal.
7. The apparatus as claimed in any of claims 1 to 6, wherein the at least one multichannel audio signal is a multichannel loudspeaker signal.
8. The apparatus as claimed in claim 7, caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source is caused to perform selecting a loudspeaker signal from the multichannel loudspeaker signal which has a direction closest to the direction of the sound source.
9. The apparatus as claimed in any of claims 1 to 5, wherein the at least one audio signal is a multichannel Ambisonics signal.
10. The apparatus as claimed in claim 9, wherein the apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source is caused to perform steering a beam towards the audio source by applying one or more gains to the multichannel Ambisonics signal.
11. The apparatus as claimed in any of claims 1 to 10, wherein the apparatus is caused to perform determining the reverberation gain further based on at least one of: a desired reverberation gain for the audio source; and a distance of the sound source compared to a reference distance.
12. The apparatus as claimed in any of claims 1 to 11, wherein the apparatus caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source is caused to perform at least one of: obtaining a beam direction for the audio source from the spatial metadata indicating a position of the at least one audio source; determining a beam direction for the audio source based on a determined audio level or audio energy associated with at least two spatial directions; and further caused to perform beamforming the at least one multichannel audio signal based on the beam direction.
13. The apparatus as claimed in any of claims 1 to 12, caused to perform obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source is further caused to perform generating a downmix signal from two or more beamformed at least one audio signal audio signals, wherein the at least one further audio signal is the downmix signal.
14. A method for an apparatus for applying reverberation to at least one audio signal, the method comprising: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain applied at least one further audio signal.
15. The method as claimed in claim 14, wherein the multichannel audio signal is a component of a spatial audio signal further comprising spatial metadata associated with the multichannel audio signal.
16. The method as claimed in claim 15, further comprising determining the at least one audio source based on the spatial metadata associated with the multichannel audio signal.
17. The method as claimed in any of claims 14 to 16, wherein obtaining at least one further audio signal comprises beamforming the multichannel audio signal to generate the at least one further audio signal.
18. The method as claimed in claim 17, wherein determining the reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal comprises: determining a beamforming gain for the direction of the at least one audio source; and performing determining the reverberation gain based on the beamforming gain for the direction of the at least one audio source.
19. The method as claimed in claim 18, wherein the at least one further audio signal is a monophonic signal.
20. The method as claimed in any of claims 14 to 19, wherein the at least one multichannel audio signal is a multichannel loudspeaker signal.
21. An apparatus for applying reverberation to at least one audio signal, the apparatus comprising means configured to: receive a multichannel audio signal; determine at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtain at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determine a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; apply the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and render a reverberated signal based on the gain applied at least one further audio signal.
22. A computer program comprising instructions which, when executed by an apparatus, for applying reverberation to at least one audio signal, cause the apparatus to perform: receiving a multichannel audio signal; determining at least one audio source based on at least one of: the multichannel audio signal; and spatial metadata associated with the multichannel audio signal; obtaining at least one further audio signal, the at least one further audio signal comprising the determined at least one audio source; determining a reverberation gain based at least on a direction of the at least one audio source and a direction associated with the at least one further audio signal; applying the reverberation gain to the at least one further audio signal to control the reverberation of the at least one further audio signal; and rendering a reverberated signal based on the gain applied at least one further audio signal.