Sound field presence reproduction device and sound field presence reproduction method

The sound field presence reproduction device and method effectively isolate and reproduce the audience presence in satellite venues by using an ambisonic microphone and signal processing to remove unwanted sound components, ensuring accurate reproduction of the intended atmosphere.

JP7876153B2 · Active · Publication Date: 2026-06-19 · PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD
Filing Date
2022-09-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing scene-based spatial audio reproduction technologies struggle to accurately reproduce the atmosphere of audience presence in satellite venues, as they mix audience presence with other sound signals from live venues, such as spoken words and background music, making it difficult to isolate and reproduce the sense of presence accurately.

Method used

A sound field presence reproduction device and method using an ambisonic microphone to capture audience presence, employing an encoding unit to encode sound source signals, an erasing unit to remove reference signals, and a generation unit to generate speaker drive signals for accurate reproduction in satellite venues.

Benefits of technology

The method allows for high-accuracy reproduction of the atmosphere of audience presence in satellite venues by isolating and reproducing the intended sound field, excluding other sound components.

✦ Generated by Eureka AI based on patent content.


Abstract

PROBLEM TO BE SOLVED: To reproduce, with high accuracy in at least one satellite site, the atmosphere of audience-side presence in a sound collection space where sound is collected using an ambisonics microphone.

SOLUTION: A sound field presence reproducing device comprises: an acquisition section for acquiring a sound collection signal, which is collected by a sound collection device disposed in a sound collection space, and sound source signals of one or more sound sources in the sound collection space; an erasure section which defines the sound source signal as a reference signal and executes erasure processing for erasing a component of the reference signal included in the sound collection signal; an encoding section which performs encoding processing on the signal after the erasure processing; a generation section which generates, on the basis of the signal after the encoding processing, a speaker drive signal for each of a plurality of speakers disposed in a reproduction space different from the sound collection space, for reproducing in the reproduction space the sound field presence of the sound collection space; and a sound field reproduction section which outputs the speaker drive signal for each speaker from the corresponding one of the plurality of speakers.

SELECTED DRAWING: Figure 5

Description

[Technical Field]

[0001] This disclosure relates to a sound field presence reproduction device and a sound field presence reproduction method.

[Background technology]

[0002] Recently, scene-based spatial audio reproduction technology has been attracting attention for its ability to reproduce (play back) sound fields in real time. This technology captures multi-channel signals using an ambisonic microphone, in which multiple omnidirectional microphone elements are arranged on a rigid sphere or multiple directional microphone elements are arranged on a hollow spherical surface. By applying signal processing to these signals and driving speakers arranged to surround the listening environment (space), it reproduces (plays back) a three-dimensional sound field in real time, as if the listener were actually present at the location where the ambisonic microphone was placed.

[0003] Patent Document 1, for example, is known as prior art relating to sound field reproduction. Patent Document 1 discloses an audio recording device that receives sound pickup signals from a wireless microphone attached to a subject and generates a multi-channel audio signal based on the audio signals picked up by multiple microphones. This audio recording device assigns the sound pickup signals from the wireless microphone to one or more arbitrary channels of the multi-channel audio signal, combines them at an arbitrary mixing ratio, and records them together with the captured image signal on a recording medium.

[Prior art documents]

[Patent Documents]

[0004] [Patent Document 1] Japanese Patent Publication No. 2006-314078

[Summary of the Invention]

[Problems that the invention aims to solve]

[0005] Here, consider using the scene-based spatial audio reproduction technology described above with an ambisonic microphone placed on the audience side of a live venue such as a large concert hall, in order to capture the sense of presence from the audience side (hereinafter sometimes referred to as "audience presence"), such as applause, murmurs, and cheers, during a theatrical performance or other show taking place on the main stage, and then to reproduce that audience presence in one or more satellite venues different from the live venue. Patent Document 1 does not disclose in detail the relationship between the atmosphere of the sound field captured by a microphone and the atmosphere of the sound field captured by a wireless microphone, and it is considered difficult to apply the technology of Patent Document 1 to this use case. Furthermore, even if the ambisonic microphone is placed on the audience side, it is highly likely to capture not only the audience presence but also sound signals that have propagated through the space within the live venue, such as the spoken words of actors on stage, sound effects, background music, and other original sound sources. In this case, sound components other than the audience presence in the live venue are mixed in, making it difficult to accurately reproduce the sound field of the audience presence for listeners in satellite venues. Patent Document 1 does not present a solution for reproducing, with high accuracy in a satellite venue, the sound field of the audience presence captured in the live venue.

[0006] This disclosure was devised in view of the conventional circumstances described above, and aims to provide a sound field presence reproduction device and a sound field presence reproduction method that accurately reproduce, in at least one satellite venue, the atmosphere of audience presence in a sound recording space captured using an ambisonic microphone.

[Means for solving the problem]

[0008] Furthermore, this disclosure provides a sound field presence reproduction device comprising: an acquisition unit that acquires at least one sound pickup signal picked up by a sound pickup device arranged in a sound pickup space; an encoding unit that encodes sound source signals of one or more sound sources in the sound pickup space; an erasing unit that uses the encoded sound source signal as a reference signal and performs an erasing process to erase the component of the reference signal included in the sound pickup signal; a generation unit that generates a speaker drive signal for each of a plurality of speakers arranged in a reproduction space different from the sound pickup space, based on the signal after the erasing process, for reproducing the sound field presence in the sound pickup space within the reproduction space; and a sound field reproduction unit that outputs the speaker drive signal for each of the plurality of speakers from each of the plurality of speakers.

[0011] These general or specific aspects may be implemented as systems, devices, methods, integrated circuits, computer programs, or recording media, or as any combination of systems, devices, methods, integrated circuits, computer programs, and recording media.

[Effects of the Invention]

[0012] According to this disclosure, the atmosphere of the audience seating area within a sound recording space, captured using an ambisonic microphone, can be reproduced with high accuracy in at least one satellite venue.

[Brief explanation of the drawing]

[0013]
[Figure 1] A schematic diagram illustrating the concept from sound field capture to sound field reproduction in scene-based spatial audio reproduction technology using an ambisonic microphone.
[Figure 2] A diagram showing an example of the bases of ambisonic components based on spherical harmonic expansion for order n and degree m.
[Figure 3] A schematic diagram illustrating an example of the operation overview of a sound field presence reproduction system.
[Figure 4] A block diagram showing an example of the system configuration of the sound field presence reproduction system according to Embodiment 1.
[Figure 5] A diagram showing an example of the operation overview from sound field presence sound pickup to sound field presence reproduction in the sound field presence reproduction system of Figure 4.
[Figure 6] A flowchart showing, in chronological order, an example of the operation procedure of sound field presence reproduction by the sound field presence reproduction device according to Embodiment 1.
[Figure 7] A block diagram showing an example of the system configuration of the sound field presence reproduction system according to a modified example of Embodiment 1.
[Figure 8] A diagram showing an example of the operation overview from sound field presence sound pickup to sound field presence reproduction in the sound field presence reproduction system of Figure 7.
[Figure 9] A flowchart showing, in chronological order, an example of the operation procedure of sound field presence reproduction by the sound field presence reproduction device according to a modified example of Embodiment 1.
[Figure 10] A block diagram showing an example of the system configuration of the sound field presence reproduction system according to Embodiment 2.
[Figure 11] A diagram showing an example of the operation overview from sound field presence sound pickup to sound field presence reproduction in the sound field presence reproduction system of Figure 10.
[Figure 12] A flowchart showing, in chronological order, an example of the operation procedure of sound field presence reproduction by the sound field presence reproduction device according to Embodiment 2.
[Figure 13] A block diagram showing an example of the system configuration of the sound field presence reproduction system according to a modified example of Embodiment 2.
[Figure 14] A diagram showing an example of the operation overview from sound field presence sound pickup to sound field presence reproduction in the sound field presence reproduction system of Figure 13.
[Figure 15] A flowchart showing, in chronological order, an example of the operation procedure of sound field presence reproduction by the sound field presence reproduction device according to a modified example of Embodiment 2.

[Modes for Carrying Out the Invention]

[0014] Hereinafter, embodiments specifically disclosing a sound field presence reproduction device and a sound field presence reproduction method according to the present disclosure will be described in detail with appropriate reference to the drawings. However, unnecessarily detailed descriptions may be omitted. For example, detailed descriptions of well-known matters and duplicate descriptions of substantially the same configurations may be omitted. This is to avoid making the following descriptions unnecessarily redundant and to facilitate understanding by those skilled in the art. Note that the accompanying drawings and the following descriptions are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter recited in the claims.

[0015] The following embodiments describe, as an example, a scene-based spatial audio reproduction technique that uses an ambisonic microphone as a sound collection device for collecting sound source signals such as sounds, music, and human voices in a sound collection space (e.g., a live venue). In this technique, signals (sound collection signals) collected by the plurality of microphone elements constituting the ambisonic microphone, or point sound sources that can be expressed as monaural signals, are expressed (encoded) using spherical harmonic functions as an intermediate representation ITMR1 (see Figure 1), also called a B-format signal, so that sound fields arriving from all directions are handled uniformly in the ambisonics signal domain (described later). Decoding this intermediate representation then generates speaker drive signals that achieve the desired sound field reproduction in a reproduction space (e.g., a satellite venue).

[0016] Hereinafter, the "sound field" is defined as a space (including a location) where sound spreads. The sound propagating in the sound field includes sound from one or more sound sources propagating in the target space. Here, the sound source is, for example, not only the sound source of various performances (e.g., band performance, musical drama) being performed on the main stage of a sound collection space such as a live venue LV1, but also sounds that give a sense of presence such as cheers, noises, murmurs, and applause occurring on the audience side away from the main stage in the live venue LV1.

[0017] (Embodiment 1) First, the concept of scene-based spatial audio reproduction technology will be explained with reference to Figure 1. Figure 1 is a schematic diagram illustrating the concept from sound field capture to sound field reproduction in scene-based spatial audio reproduction technology using the Ambisonics Microphone AMB1. The Ambisonics Microphone AMB1 is placed at a predetermined position on the audience side within the sound capture space (e.g., live venue LV1). In live venue LV1, sound signals propagating within that space are captured by the Ambisonics Microphone AMB1. For example, if a band with multiple members is performing on the main stage of live venue LV1, sound signals from various sound sources such as vocals, bass, guitar, and drums are captured. Also, if a musical is being performed, speech signals from one or more actors (sound sources) are captured. On the other hand, sound signals that give a sense of presence to the audience, such as cheers, murmurs, roars, and applause, are also captured by the Ambisonics Microphone AMB1.

[0018] As an example of a sound pickup device, the ambisonic microphone AMB1 is equipped with four microphone elements Mc1, Mc2, Mc3, and Mc4. The microphone elements Mc1 to Mc4 are arranged in a hollow configuration so that, with direction Dr1 as the front direction, each faces one of four vertices of the cube CB1 in Figure 1 as seen from its center, and each has a unidirectional pattern toward its vertex. Microphone element Mc1 faces the front left up (FLU) direction of the ambisonic microphone AMB1 and mainly picks up sound from that direction. Microphone element Mc2 faces the front right down (FRD) direction and mainly picks up sound from that direction. Microphone element Mc3 faces the back left down (BLD) direction and mainly picks up sound from that direction. Microphone element Mc4 faces the back right up (BRU) direction and mainly picks up sound from that direction.

[0019] The sound pickup signals from these four directions (i.e., FLU, FRD, BLD, BRU) are called A-format signals. A-format signals are not used directly; they are converted to B-format signals, which are intermediate representations (ITMR1) with directional characteristics. B-format signals include, for example, the B-format signal W for omnidirectional sound, the B-format signal X for front-back sound, the B-format signal Y for left-right sound, and the B-format signal Z for up-down sound. A-format signals are converted to B-format signals using the following conversion formulas.

[0020]
W = FLU + FRD + BLD + BRU
X = FLU + FRD - BLD - BRU
Y = FLU - FRD + BLD - BRU
Z = FLU - FRD - BLD + BRU
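The conversion in [0020] can be sketched in a few lines of Python. This is a minimal illustration only: the function name `a_to_b_format` and the list-of-samples representation are assumptions made for the example, and no gain normalization is applied beyond what the formulas above state.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert A-format sample lists (FLU, FRD, BLD, BRU) to B-format (W, X, Y, Z).

    Implements the sums and differences of paragraph [0020] sample by sample.
    The name and the data layout are illustrative assumptions.
    """
    capsules = list(zip(flu, frd, bld, bru))
    w = [a + b + c + d for a, b, c, d in capsules]  # omnidirectional
    x = [a + b - c - d for a, b, c, d in capsules]  # front minus back
    y = [a - b + c - d for a, b, c, d in capsules]  # left minus right
    z = [a - b - c + d for a, b, c, d in capsules]  # up minus down
    return w, x, y, z
```

For instance, a sound that reaches all four elements equally contributes only to W, while a sound picked up only by the FLU element contributes positively to X (front), Y (left), and Z (up).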

[0021] By combining B-format signals W, X, Y, and Z, sound signals from all directions—front, back, left, right, up, and down—can be obtained. Furthermore, by changing the signal levels of each of the B-format signals W, X, Y, and Z and combining them, it is possible to generate sound signals with arbitrary directional characteristics from among all directions—front, back, left, right, up, and down. For example, as shown in Figure 1, a total of eight satellite speakers SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, and SPk8 are placed at each vertex of a reproduction space modeled as a cube (e.g., satellite venue STL1), and a three-dimensional coordinate system similar to that of the sound collection space (e.g., live venue LV1) is considered (i.e., the front, back, left, right, and up and down directions are parallel or the same direction). Note that here, for the sake of clarity, the number of satellite speakers is given as eight, but it goes without saying that the number is not limited to eight.

[0022] The position of each of the satellite speakers SPk1 to SPk8 can be specified by a predetermined distance from a reference position (e.g., center position LSP1) in the reproduction space (e.g., satellite venue STL1) and by angles (azimuth angle θ_i and elevation angle φ_i). Here, i is a variable that identifies a satellite speaker located in the reproduction space (e.g., satellite venue STL1); in the example in Figure 1, it takes any integer from 1 to 8.
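As an illustration of how a speaker position reduces to a distance and two angles, the sketch below converts the eight vertices of a cube centered on the reference position into (distance, azimuth, elevation) triples. The axis convention (x front, y left, z up, azimuth measured counterclockwise from the front) and the assignment of the labels SPk1 to SPk8 to particular vertices are assumptions for the example; the source text does not state them.

```python
import math


def to_spherical(x, y, z):
    """Return (distance, azimuth_deg, elevation_deg) of a point seen from the origin.

    Assumed convention: x = front, y = left, z = up.
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(z / r))
    return r, azimuth, elevation


# Vertices of a cube with half-edge 1.0 around the reference position.
CUBE_VERTICES = [(sx, sy, sz) for sx in (1.0, -1.0) for sy in (1.0, -1.0) for sz in (1.0, -1.0)]

# Hypothetical labeling: the mapping of SPk1..SPk8 to vertices is not given in the text.
SPEAKERS = {f"SPk{i}": to_spherical(*v) for i, v in enumerate(CUBE_VERTICES, start=1)}
```

Under these assumptions every speaker sits at the same distance from the center, with azimuths of ±45° or ±135° and elevations of roughly ±35.26°.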

[0023] Assume that the user, the listener, is located at the center position LSP1 of the reproduction space (e.g., satellite venue STL1), facing forward. Under these circumstances, the sound field within the sound pickup space (e.g., live venue LV1) can be freely reproduced within the reproduction space (e.g., satellite venue STL1) based on the B-format signal W, X, Y, Z data obtained by encoding the A-format signal picked up in the sound pickup space (e.g., live venue LV1), and the respective directions of the satellite speakers SPk1 to SPk8 within the reproduction space (e.g., satellite venue STL1). In other words, when the user, the listener, is present in the reproduction space (e.g., satellite venue STL1), the listener's forward direction is used as the reference direction, and it becomes possible to reproduce and output sound in any three-dimensional direction from that reference direction.
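One simple way to turn B-format samples and speaker directions into per-speaker signals is a projection ("sampling") decoder, sketched below. This is a common textbook approach used here for illustration; the source does not say that the device's decoder works this way. The axis convention (x front, y left, z up), the weights `w_weight` and `dir_weight`, and the cube layout `CUBE_ANGLES` are all assumptions.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Direction of a speaker as a unit vector (assumed: x front, y left, z up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def decode_sample(w, x, y, z, speaker_angles, w_weight=1.0, dir_weight=1.0):
    """Project one B-format sample onto each speaker direction.

    The relative weighting of W against X, Y, Z depends on the normalization
    convention in use, so both weights are left as hypothetical parameters.
    """
    count = len(speaker_angles)
    drives = []
    for azimuth, elevation in speaker_angles:
        ux, uy, uz = unit_vector(azimuth, elevation)
        drives.append((w_weight * w + dir_weight * (ux * x + uy * y + uz * z)) / count)
    return drives


# Eight speakers on the vertices of a cube around the listener (assumed layout).
_EL = math.degrees(math.atan(1.0 / math.sqrt(2.0)))
CUBE_ANGLES = [(az, el) for az in (45.0, -45.0, 135.0, -135.0) for el in (_EL, -_EL)]
```

With this sketch, a sample whose X component is positive and whose Y and Z components are zero drives the four front speakers harder than the four rear speakers, which matches the intuition that X carries front-back information.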

[0024] Next, with reference to Figure 2, the bases of ambisonic components based on spherical harmonic expansion for order n and degree m will be described. Figure 2 shows an example of these bases.

[0025] In Figure 2, the horizontal axis (m) represents the degree, and the vertical axis (n) represents the order. The degree m takes values from -n to +n. The spherical harmonics up to order n = N comprise a total of (N+1)^2 bases. For example, when n=N=0, one base is obtained (i.e., an omnidirectional B-format signal W). Also, for example, when n=N=1, four bases are obtained (i.e., an omnidirectional B-format signal W corresponding to (n,m)=(0,0), a forward-backward B-format signal X corresponding to (n,m)=(1,-1), a vertical B-format signal Z corresponding to (n,m)=(1,0), and a left-right B-format signal Y corresponding to (n,m)=(1,1)). The same applies to n=N=2 and beyond, so the explanation is omitted.
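The count of bases can be checked by enumerating the (n, m) pairs directly. The helper name `ambisonic_bases` is an assumption for this sketch.

```python
def ambisonic_bases(max_order):
    """List every (order n, degree m) pair with 0 <= n <= max_order and -n <= m <= n."""
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1)]
```

Each order n contributes 2n + 1 bases, so summing over n = 0 to N gives (N+1)^2: one base for N = 0, four for N = 1, nine for N = 2.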

[0026] Spherical harmonics are known to exhibit increasing spatial periodicity as n and m increase. Therefore, different combinations of n and m can be used to represent B-format signals with different directional patterns (directional characteristics). If the index for order n and degree m is defined as K = n(n+1) + m based on Ambisonics Channel Numbering (ACN), then spherical harmonics can be expressed in vector form as shown in equation (1). In equation (1), the superscript T indicates the transpose.
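The index K = n(n+1) + m and its inverse can be sketched as follows. The function names are assumptions for the example; only the formula itself comes from the text.

```python
import math


def acn_index(n, m):
    """Channel index K = n(n+1) + m for order n and degree m, with -n <= m <= n."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + m


def acn_to_order_degree(k):
    """Recover (n, m) from a channel index K."""
    n = math.isqrt(k)      # indices for order n span n^2 .. (n+1)^2 - 1
    m = k - n * (n + 1)
    return n, m
```

Because the indices of order n occupy the range n^2 to (n+1)^2 - 1, the integer square root of K recovers n, and the components up to order N fill indices 0 to (N+1)^2 - 1 without gaps.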

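[0027] (Equation (1); formula not reproduced in this text)

[0028] (text not recoverable)

[0029] (formula not reproduced in this text)

[0030] (formula not reproduced in this text)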

[0031] Next, with reference to Figure 3, an example of the operation overview of the sound field presence reproduction system will be described. Figure 3 is a schematic diagram showing an example of the operation overview of the sound field presence reproduction system. In Figure 3, the sound pickup space in which the ambisonic microphone AMB1 is placed is explained using, for example, a live venue LV1 where a band is performing with various sound sources such as vocals, bass, guitar, and drums. However, as mentioned above, the live venue LV1, which is the sound pickup space, is not limited to band performances, but may also include a musical performance with one or more actors, a concert with multiple instruments, or an orchestral performance, and the same applies hereafter.

[0032] As shown in Figure 3, the live venue LV1 is equipped with a main stage STG1, on which the band performs. During the band performance, audio signals such as the vocals (an example of a sound source) SS2, bass (an example of a sound source) SS1, and guitar (an example of a sound source) SS3 propagate widely throughout the space within the live venue LV1 and reach the audience. These signals may reach the audience directly from their respective sound source locations through the space, or they may be reproduced through amplification devices such as speakers installed in the live venue LV1 and then reach the audience. The ambisonic microphone AMB1 is positioned at a predetermined location on the audience side of the live venue LV1 (for example, in the center of the audience area) with the primary purpose of capturing sounds that convey the sense of presence to the audience. Therefore, the ambisonic microphone AMB1 primarily captures sounds that give the audience a sense of presence during the band performance, such as cheers, murmurs, commotion, and applause.

[0033] However, as mentioned above, the bass sound signal SS1, the vocal sound signal SS2, and the guitar sound signal SS3 during a band performance propagate within the space of the live venue LV1. As a result, the diffuse sound components DS1, DS2, and DS3 of sound signals SS1, SS2, and SS3 (including reverberation; the same applies hereafter) are picked up as sound signals by the ambisonic microphone AMB1. Consequently, because the ambisonic microphone AMB1 picks up the diffuse sound components DS1, DS2, and DS3, which are not intended to be picked up, it was difficult for conventional scene-based spatial sound reproduction technology to accurately reproduce the sense of presence from the audience side of the live venue LV1 in the satellite venue STL1.

[0034] Therefore, the following embodiment describes an example of a sound field presence reproduction system that accurately reproduces the atmosphere of audience presence in a sound recording space captured using an ambisonic microphone within at least one satellite venue.

[0035] Next, with reference to Figures 4 and 5, the system configuration and operation overview of the sound field presence reproduction system 100 according to Embodiment 1 will be described. Figure 4 is a block diagram showing an example of the system configuration of the sound field presence reproduction system 100 according to Embodiment 1. Figure 5 is a diagram showing an example of the operation overview from sound field presence recording to sound field presence reproduction in the sound field presence reproduction system 100 of Figure 4.

[0036] The sound field presence reproduction system 100 includes a sound field presence sound recording device 10 and a sound field presence reproduction device 20. The sound field presence sound recording device 10 and the sound field presence reproduction device 20 are connected to each other via a network NW1, enabling data communication. The network NW1 may be a wired network or a wireless network. The wired network may include at least one of the following: wired LAN (Local Area Network), wired WAN (Wide Area Network), or power line communication (PLC), or other network configurations capable of wired communication. On the other hand, the wireless network may include at least one of the following: wireless LAN such as Wi-Fi (registered trademark), wireless WAN, short-range wireless communication such as Bluetooth (registered trademark), or mobile cellular communication network such as 4G or 5G, or other network configurations capable of wireless communication.

[0037] The sound field presence sound recording device 10 is placed in a sound recording space (e.g., a live venue LV1) and includes an ambisonic microphone AMB1, an A/D conversion unit 7, and individual sound recording microphones M1, ..., Mn. Here, n represents the number of individual sound sources in the live venue LV1 (e.g., independent sound sources such as vocals, bass, and guitar in the case of a band performance), and is specifically an integer of 2 or more. Note that the sound field presence sound recording device 10 only needs to have an ambisonic microphone AMB1, and the A/D conversion unit 7 may be provided in the sound field presence reproduction device 20.

[0038] The ambisonic microphone AMB1 is equipped with four microphone elements Mc1, Mc2, Mc3, and Mc4. Microphone element Mc1 picks up sound from the front upper left direction (see Figure 1), microphone element Mc2 picks up sound from the front lower right direction (see Figure 1), microphone element Mc3 picks up sound from the rear lower left direction (see Figure 1), and microphone element Mc4 picks up sound from the rear upper right direction (see Figure 1). The ambisonic microphone AMB1 may instead be equipped with more unidirectional microphone elements than the four hollow-arranged microphone elements Mc1, Mc2, Mc3, and Mc4, or with omnidirectional microphone elements arranged on a rigid sphere. Using an ambisonic microphone with a larger number of microphone elements makes it possible to synthesize an ambisonic signal of second order or higher in the encoding unit 22 of the sound field presence reproduction device 20. The signals (sound pickup signals) picked up by each microphone element constituting the ambisonic microphone AMB1 are input to the A/D conversion unit 7.

[0039] At least the A/D conversion unit 7 is implemented by a semiconductor chip on which at least one electronic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array) is mounted, or by dedicated hardware.

[0040] The A/D conversion unit 7 converts the analog sound pickup signals from each microphone element constituting the ambisonic microphone AMB1 into digital sound pickup signals. The converted sound pickup signals are transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound recording device 10 and the network NW1.

[0041] The individual sound pickup microphone M1 picks up the sound of an individual sound source (the first sound source), such as the vocalist of a band performance or a performer in a musical play, during an event such as a band performance or musical play on the main stage STG1 (see Figure 3) of the live venue LV1. The individual sound pickup microphone M1 may be, for example, a headset microphone worn by the vocalist of a band performance or by a performer in a musical play. The individual sound source signal picked up by the individual sound pickup microphone M1 is transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound recording device 10 and the network NW1.

[0042] Similarly, the individual sound pickup microphone Mn picks up the sound of an individual sound source (the nth sound source), such as a guitar in a band performance or sound effects or background music in a musical play, during an event such as a band performance or musical play on the main stage STG1 (see Figure 3) of the live venue LV1. The individual sound pickup microphone Mn may be, for example, a headset microphone worn by a guitarist in a band performance, or a microphone capable of picking up sound effects or background music in a musical play. The individual sound source signal picked up by the individual sound pickup microphone Mn is transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound recording device 10 and the network NW1.

[0043] The sound field presence reproduction device 20 is placed in a reproduction space (for example, a satellite venue STL1) and includes echo cancellation units 21, ..., 2n, an encoding unit 22, a microphone element direction designation unit 23, a speaker direction designation unit 24, a decoding unit 25, a sound field playback unit 26, and satellite speakers SPk1, ..., SPkp. Here, p indicates the number of satellite speakers placed in the satellite venue STL1, and is specifically an integer of 2 or more. Also, n, which indicates the number of echo cancellation units 21 to 2n, and n, which indicates the number of individual sound pickup microphones M1 to Mn, are the same. In other words, the sound field presence reproduction device 20 is provided with the same number of echo cancellation units as the number of types of sound sources picked up by the individual sound pickup microphones.

[0044] The echo cancellation unit 21 receives the sound pickup signals from each microphone element of the ambisonic microphone AMB1 sent from the sound field presence sound recording device 10 (A/D conversion unit 7 side), and further receives the individual sound source signal of the first sound source (see above) sent from the sound field presence sound recording device 10 (individual sound pickup microphone M1 side) as the first reference signal M1S. The echo cancellation unit 21 performs an erasure process (e.g., echo cancellation process) to erase the component of the first reference signal M1S (i.e., the individual sound source signal picked up by the individual sound pickup microphone M1) included in the sound pickup signals from each microphone element of the ambisonic microphone AMB1. The echo cancellation unit 21 outputs the signal after the erasure process (first sound pickup signal) to the encoding unit 22.

[0045] Similarly, the echo cancellation unit 2n receives the sound pickup signals from each microphone element of the ambisonic microphone AMB1 sent from the sound field presence sound recording device 10 (A/D conversion unit 7 side), and further receives the individual sound source signal of the nth sound source (see above) sent from the sound field presence sound recording device 10 (individual sound pickup microphone Mn side) as the nth reference signal MnS. The echo cancellation unit 2n performs an erasure process (e.g., echo cancellation process) to erase the component of the nth reference signal MnS (i.e., the individual sound source signal picked up by the individual sound pickup microphone Mn) included in the sound pickup signals from each microphone element of the ambisonic microphone AMB1. The echo cancellation unit 2n outputs the signal after the erasure process (the nth sound pickup signal) to the encoding unit 22.

[0046] Here, each of the echo cancellation units 21 to 2n may be configured, for example, as an echo canceller using an adaptive filter that operates in the time domain. This echo canceller can be configured as a Single Channel EchoCanceller, for example. Therefore, as shown in Figure 5, each of the echo cancellation units 21 to 2n can be configured using the same number of Single Channel EchoCancellers as the number of microphone elements of the ambisonic microphone AMB1 (for example, 4). This Single Channel EchoCanceller may be configured as disclosed in, for example, the reference non-patent literature. With this configuration, especially when there is no correlation between the reference signals (in other words, when the crosstalk component is below a predetermined threshold, to the extent that crosstalk can be considered substantially absent), the component of the reference signal (i.e., the individual sound source signal picked up by the individual sound pickup microphone) included in the sound picked up by each microphone element of the ambisonic microphone AMB1 can be erased (suppressed) with high accuracy. Furthermore, the echo cancellation units 21 to 2n may instead perform the echo cancellation process using adaptive filters in the frequency domain or subband domain after the time-domain signal has been forward-transformed using a DFT (Discrete Fourier Transform) or the like, and the cancelled signal may then be inverse-transformed back to the time domain using an IFFT (Inverse Fast Fourier Transform) or the like before subsequent processing.
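As an illustration of a time-domain adaptive-filter echo canceller applied to each microphone element, the following is a minimal Python sketch. It assumes a normalized LMS (NLMS) update, which is one common adaptive algorithm and is not named in the text. The function name `nlms_cancel`, the filter length, the step size, and the synthetic signals are all hypothetical.

```python
import numpy as np

def nlms_cancel(main, ref, taps=32, mu=0.5, eps=1e-8):
    """Remove the component of `ref` from `main` with an NLMS adaptive filter."""
    w = np.zeros(taps)      # adaptive estimate of the reference-to-microphone path
    buf = np.zeros(taps)    # most recent reference samples, newest first
    out = np.zeros(len(main))
    for t in range(len(main)):
        buf = np.roll(buf, 1)
        buf[0] = ref[t]
        err = main[t] - w @ buf                   # residual after subtracting the estimate
        w += mu * err * buf / (buf @ buf + eps)   # normalized LMS update (assumed algorithm)
        out[t] = err
    return out

# Synthetic stand-ins (assumptions): one reference signal and four microphone elements.
rng = np.random.default_rng(0)
n = 8000
ref = rng.standard_normal(n)                  # stand-in for reference signal M1S
audience = 0.1 * rng.standard_normal(n)       # stand-in for the audience presence
paths = 0.5 * rng.standard_normal((4, 8))     # hypothetical room paths to each element
mics = np.stack([np.convolve(ref, h)[:n] + audience for h in paths])

# One single-channel canceller per microphone element.
cleaned = np.stack([nlms_cancel(m, ref) for m in mics])
```

With several sound sources, one such canceller per reference signal would be applied in turn to each element, which is why the number of cancellers depends on both the element count and the source count.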

[0047] <Reference Non-Patent Literature> Chapter 5 Acoustic Echo Canceller, "Example of Adaptive Filter Configuration" (see Figure 5.2), p. 4/(17), Institute of Electronics, Information and Communication Engineers, 2012, [Retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

[0048] Each of the echo cancellation units 21 to 2n is provided for the purpose of erasing (suppressing) the sound of individual sound sources that has propagated within the sound pickup space (e.g., live venue LV1). For this reason, the echo cancellation units 21 to 2n may be provided together with the encoding unit 22 on the sound pickup space (e.g., live venue LV1) side, or together with the encoding unit 22 on the reproduction space (e.g., satellite venue STL1) side. If they are provided on the sound pickup space (e.g., live venue LV1) side, only the component of the first-order ambisonic signal (i.e., audience presence) output by the encoding unit 22 is sent to the sound field presence reproduction device 20. On the other hand, if they are provided on the reproduction space (e.g., satellite venue STL1) side, both the component of the first-order ambisonic signal (i.e., audience presence) output by the encoding unit 22 and the individual sound source signals are sent to the sound field presence reproduction device 20. Alternatively, only the echo cancellation units 21 to 2n may be placed in the sound pickup space (e.g., live venue LV1), while the encoding unit 22 is placed in the reproduction space (e.g., satellite venue STL1). In this case, only the output signal components of the echo cancellation units 21 to 2n are sent to the sound field presence reproduction device 20.

[0049] Furthermore, in the reproduction space (for example, satellite venue STL1), the sound field presence reproduction device 20 may output the individual sound source signals picked up by the individual sound pickup microphones M1 to Mn of the sound field presence sound recording device 10, either from the satellite speakers SPk1 to SPkp or from other satellite speakers (not shown) provided for the individual sound source signals, for the purpose of reproducing the sense of sound field presence.

[0050] [ka]

[0051] [ka]

[0052] Here, we will explain the details of the encoding process performed by the encoding unit 22.

[0053] Generally, for any angle (θ, φ) on the spherical surface, the sound pressure p observed (by a microphone) at a position of radius r can be expanded as in Equation (4), using the spherical harmonic functions of Equation (2) as a basis, as the solution of the interior problem of the wave equation in the spherical harmonic domain for the wave number k. In Equation (4), A_n^m is the expansion coefficient and R_n(kr) is the radial function term. The infinite sum over the order n is approximated by truncating it at a finite order N, and the accuracy of sound field reproduction changes with this truncation order. Hereinafter, the truncation order is denoted N.
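For orientation, a truncated spherical harmonic expansion of this kind is commonly written in the form below. This is the usual textbook form, stated here as an assumption; the argument list of p and the normalization of the spherical harmonics Y_n^m may differ from the equations of this disclosure.

```latex
p(\theta, \phi, r, k) \approx \sum_{n=0}^{N} \sum_{m=-n}^{n} A_n^m \, R_n(kr) \, Y_n^m(\theta, \phi)
```

Truncating at order N leaves (N + 1)^2 expansion coefficients, so a first-order representation (N = 1) has four, corresponding to the W, X, Y, and Z components of a B-format signal.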

[0054]

[Equation]

[0055]

[Equation]

[0056]

[Equation]

[0057]

[Equation]

[0058] In Equation (6), i is the imaginary unit, j_n(kr) is the spherical Bessel function of order n, and j'_n(kr) is its derivative. In the present disclosure, the expansion coefficient vector γ_n^m is handled as the B-format signal (intermediate representation) that is output by the encoding process of the encoding unit 22. Hereinafter, this expansion coefficient vector may be referred to as an ambisonics-domain signal, or simply an ambisonics signal, in the ambisonics domain, which is distinct from the time domain.

[0059] More specifically, in the encoding process by the encoding unit 22, the acquired sound signal, which is a time-domain signal after the removal of the reference signal components output from each of the echo cancellation units 21 to 2n, is converted into an ambisonic signal (e.g., a first-order ambisonic signal). This ambisonic signal (e.g., a first-order ambisonic signal) is then decoded by the decoding unit 25 and converted into a speaker drive signal.
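The conversion from microphone element signals to a first-order ambisonic signal can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding equations of this disclosure: it assumes four ideal cardioid elements in a tetrahedral arrangement, ignores the frequency-dependent radial function term, and uses an unnormalized W, X, Y, Z convention. The names `unit`, `A_from_B`, `B_from_A`, and `encode` are hypothetical.

```python
import numpy as np

def unit(az, el):
    """Unit vector for azimuth `az` and elevation `el` (radians)."""
    return np.array([np.cos(az) * np.cos(el), np.sin(az) * np.cos(el), np.sin(el)])

# Hypothetical tetrahedral element directions (azimuth in degrees, elevation in radians).
el0 = np.arctan(1 / np.sqrt(2))
dirs = np.array([unit(np.radians(a), e) for a, e in
                 [(45, el0), (-45, -el0), (135, -el0), (-135, el0)]])

# Assumed element model: ideal cardioids, a_i = 0.5 * (W + u_i . [X, Y, Z]).
A_from_B = 0.5 * np.hstack([np.ones((4, 1)), dirs])   # 4 elements x 4 channels
B_from_A = np.linalg.pinv(A_from_B)                   # encoding matrix from element directions

def encode(a_format):
    """Element signals (4, samples) -> first-order signal (4, samples), ordered W, X, Y, Z."""
    return B_from_A @ a_format

# Check with a plane wave from azimuth 30 degrees, elevation 10 degrees.
d = unit(np.radians(30), np.radians(10))
s = np.sin(2 * np.pi * 0.01 * np.arange(200))
b_true = np.vstack([s, np.outer(d, s)])   # W, X, Y, Z of the plane wave
b_est = encode(A_from_B @ b_true)         # simulate the elements, then encode
```

The encoding matrix depends only on the element directions, which is the role the direction information plays in this sketch.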

[0060] [ka]

[0061]

[Equation]

[0062]

[Equation]

[0063]

[Equation]

[0064] [ka]

[0065] [ka]

[0066] [ka]

[0067] The sound field reproduction unit 26 converts the digital speaker drive signals for each satellite speaker output from the decoding unit 25 into analog speaker drive signals, amplifies the signals, and outputs (plays back) them from the corresponding satellite speakers.

[0068] Satellite speakers SPk1, ..., SPkp are placed at the vertices (see Figure 1) of the reproduction space (e.g., satellite venue STL1), which is modeled as a cube, and reproduce (recreate) the sound field based on the speaker drive signals from the sound field reproduction unit 26. The number of speakers may be varied depending on the sound field to be reproduced. The sound field may be reproduced using fewer than p satellite speakers (p is 8 in the example of Figure 1) by not reproducing a specific direction, or by combining the reproduction with a commonly known virtual sound image generation method such as a transaural system or VBAP (Vector Based Amplitude Panning). Conversely, the sound field may be reproduced using more than p satellite speakers. Furthermore, the speakers need not be placed at the vertices of the reproduction space (e.g., satellite venue STL1) as long as they are placed so as to surround the reference position (e.g., center position LSP1) of the satellite venue STL1. The sound field reproduction unit 26 may output a signal to a binaural playback device, such as headphones or earphones, worn by the listener (user), instead of to the satellite speakers. When supplying a signal to such a binaural playback device, the sound field reproduction unit 26 may generate playback signals corresponding to azimuth angles of ±90° through the decoding process described later. Alternatively, it may generate virtual sound images for multiple directions surrounding the head, and generate the playback signal by applying to the virtual sound image in each direction a transfer characteristic for that direction that lets the user perceive a three-dimensional sound image, such as an HRTF (Head Related Transfer Function), either by multiplication in the frequency domain or by convolution in the time domain. This allows the sound field to be reproduced not only by the satellite speakers SPk1 to SPkp located in the satellite venue STL1, but also by a playback device (for example, the headphones or earphones mentioned above) worn by a listener (user) located in the satellite venue STL1.
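The time-domain variant of this binaural rendering can be sketched as below: each virtual speaker signal is convolved with the head-related impulse response (the time-domain counterpart of the HRTF) for its direction, and the results are summed per ear. The function name `binauralize` is hypothetical, and the impulse responses in the usage lines are unit-impulse placeholders rather than measured HRTF data.

```python
import numpy as np

def binauralize(drive, hrir_left, hrir_right):
    """drive: (P, T) virtual speaker signals; hrir_*: (P, L) impulse responses per direction."""
    left = sum(np.convolve(d, h) for d, h in zip(drive, hrir_left))
    right = sum(np.convolve(d, h) for d, h in zip(drive, hrir_right))
    return np.stack([left, right])            # (2, T + L - 1)

# Placeholder data (assumption): 8 virtual directions, unit-impulse responses.
rng = np.random.default_rng(2)
drive = rng.standard_normal((8, 100))
impulse = np.zeros((8, 16))
impulse[:, 0] = 1.0
out = binauralize(drive, impulse, impulse)
```

With measured responses, the same function would apply direction-dependent filtering; with the placeholders above, each ear simply receives the sum of the virtual speaker signals.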

[0069] Here, we will explain the details of the processing performed by the decoding unit 25.

[0070] [ka]

[0071]

[Equation]

[0072] Next, with reference to Figure 6, the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20 will be described. Figure 6 is a flowchart showing a time-series example of the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20 according to Embodiment 1.

[0073] In Figure 6, the ambisonics microphone AMB1 of the sound field presence sound recording device 10 records sounds occurring around a predetermined position on the audience side within the sound recording space (e.g., live venue LV1), for example sounds that convey the sense of presence on the audience side (step St21). The signals recorded in step St21 by each microphone element of the ambisonics microphone AMB1 are transmitted to the sound field presence reproduction device 20. As mentioned above, however, the sounds recorded in step St21 include not only sounds that convey the sense of presence on the audience side, but also sounds from one or more sound sources, such as performances or plays on the main stage STG1 (see Figure 3) in the sound recording space (e.g., live venue LV1). Furthermore, the individual sound source signals (in other words, the first reference signal M1S to the nth reference signal MnS) captured by the individual microphones M1 to Mn from those one or more sound sources on the main stage STG1 are also transmitted to the sound field presence reproduction device 20 (step St21).

[0074] The sound field presence reproduction device 20 repeatedly performs echo cancellation processing (see above) on the time axis for each reference signal in each of the echo cancellation units 21 to 2n, using the sound pickup signal of each microphone element of the ambisonic microphone AMB1 as the main signal and the sound pickup signals of the individual sound pickup microphones M1 to Mn as reference signals (step St22). More specifically, the echo cancellation unit 21 of the sound field presence reproduction device 20 receives the signals sent in step St21 (specifically, the sound pickup signals of each microphone element of the ambisonic microphone AMB1 and the corresponding first reference signal M1S), and performs erasure processing (e.g., echo cancellation processing) on the time axis to erase the component of the first reference signal M1S (i.e., the individual sound source signal picked up by the individual sound pickup microphone M1) contained in the sound pickup signal of each microphone element of the ambisonic microphone AMB1 (step St22). Similarly, the echo cancellation unit 2n of the sound field presence reproduction device 20 receives the signals sent in step St21 (specifically, the sound pickup signals of each microphone element of the ambisonic microphone AMB1 and the corresponding nth reference signal MnS), and performs erasure processing (e.g., echo cancellation processing) on the time axis to erase the component of the nth reference signal MnS (i.e., the individual sound source signal picked up by the individual sound pickup microphone Mn) contained in the sound pickup signal of each microphone element of the ambisonic microphone AMB1 (step St22).

[0075] [ka]

[0076] [ka]

[0077] As described above, the sound field presence reproduction device 20 according to Embodiment 1 comprises: an acquisition unit (echo cancellation units 21 to 2n) that acquires sound pickup signals (diffuse sound components DS1 to DS3) picked up by a sound pickup device (ambisonic microphone AMB1) placed in a sound pickup space (live venue LV1) and sound source signals (sound signal SS1, voice signal SS2, sound signal SS3) of one or more sound sources in the sound pickup space; an erasure unit (echo cancellation units 21 to 2n) that uses the sound source signals as reference signals and performs an erasure process on the time axis to erase the reference signal components included in the sound pickup signals; an encoding unit 22 that encodes the signal after the erasure process; a generation unit (decoding unit 25) that generates, based on the encoded signal, a speaker drive signal for each of the multiple speakers (satellite speakers SPk1 to SPkp) placed in a reproduction space (satellite venue STL1) different from the sound pickup space; and a sound field reproduction unit 26 that outputs the speaker drive signal from each of the multiple speakers. As a result, by erasing on the time axis the components of one or more individual sound sources (reference signals) in the live venue LV1 that are picked up by the ambisonic microphone AMB1, the sound field presence reproduction device 20 according to Embodiment 1 can reproduce with high accuracy, in at least one satellite venue, the atmosphere of presence on the audience side in the sound pickup space (live venue LV1) that is mainly picked up by the ambisonic microphone AMB1.

[0078] Furthermore, the sound field presence reproduction device 20 further includes a microphone element direction specification unit 23 that specifies the direction information of the multiple microphone elements Mc1 to Mc4 provided in the sound pickup device (ambisonic microphone AMB1). The encoding unit 22 performs encoding processing using the direction information of each of the multiple microphone elements Mc1 to Mc4 and the signal after the erasure process. As a result, by taking into account the direction information of each of the microphone elements Mc1 to Mc4 of the ambisonic microphone AMB1, the sound field presence reproduction device 20 can generate an ambisonic signal that has resolution in multiple directions (see the B-format signal in Figure 1).

[0079] Furthermore, the sound field presence reproduction device 20 includes a speaker direction specification unit 24 that specifies the direction information of multiple speakers (satellite speakers SPk1 to SPkp) within the reproduction space (satellite venue STL1). The generation unit (decoding unit 25) uses the direction information of each of the multiple speakers and the encoded signal to generate speaker drive signals for each of the multiple speakers in the ambisonics domain. As a result, by taking into account the direction of each of the multiple satellite speakers SPk1 to SPkp arranged in the satellite venue STL1 as seen from the reference position (for example, the central position LSP1 corresponding to the listener's position), the sound field presence reproduction device 20 can generate speaker drive signals that reproduce the sense of presence on the audience side within the live venue LV1.
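How speaker directions can enter the generation of drive signals is sketched below. This is a minimal illustration, assuming a pseudo-inverse (mode-matching) decoder for a first-order signal ordered W, X, Y, Z and eight speakers on the vertices of a cube. The decoder type, the layout, and the names `spk_dirs`, `decoder`, and `decode` are assumptions, since the decoding equations are not reproduced in this text.

```python
import numpy as np

# Hypothetical layout: 8 speakers on cube vertices around the reference position.
verts = np.array([[x, y, z] for x in (1, -1) for y in (1, -1) for z in (1, -1)], float)
spk_dirs = verts / np.linalg.norm(verts, axis=1, keepdims=True)

# Row p holds the first-order terms [1, ux, uy, uz] for the direction of speaker p.
Y = np.hstack([np.ones((8, 1)), spk_dirs])    # speakers x channels
decoder = np.linalg.pinv(Y.T)                 # speakers x channels

def decode(b_format):
    """First-order signal (4, samples) -> speaker drive signals (8, samples)."""
    return decoder @ b_format

# A plane wave arriving from the direction of speaker 0.
b = np.concatenate([[1.0], spk_dirs[0]])[:, None]
drive = decode(b)
```

In this sketch the drive signals re-encode exactly to the input signal (`Y.T @ drive` equals `b`), and the speaker aligned with the arrival direction receives the largest gain.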

[0080] Furthermore, the erasure unit (echo cancellation units 21 to 2n) of the sound field presence reproduction device 20 is composed of a number of single-channel echo cancellers (see Figure 5) determined by the number of microphone elements Mc1 to Mc4 of the sound pickup device (ambisonic microphone AMB1) and the number of sound sources. Each single-channel echo canceller receives the sound source signal of the corresponding sound source (for example, a sound signal or voice signal from an individual sound pickup microphone) and performs erasure processing (echo cancellation processing) on the time axis. This makes it possible to erase (suppress) with high precision the component of the reference signal (i.e., the individual sound source signal picked up by the individual sound pickup microphone Mn) included in the sound pickup signal of each microphone element of the ambisonic microphone AMB1, especially when there is no correlation between the reference signals (in other words, when the crosstalk component is below a predetermined threshold, to the extent that crosstalk can be considered substantially absent). The absence of crosstalk components means, for example, that when a vocalist on the main stage STG1 (see Figure 3) sings, the voice is not picked up by the other individual sound pickup microphones, or, even if it is picked up, its sound pressure level is below the predetermined threshold mentioned above.

[0081] (Modification of Embodiment 1) Embodiment 1 describes an example in which each of the echo cancellation units 21 to 2n in the sound field presence reproduction device is configured using Single Channel EchoCancellers. The modification of Embodiment 1 describes an example in which the echo cancellation units 21 to 2n in the sound field presence reproduction device are replaced with a multi-channel echo canceller that handles multiple audio channels. In the modification of Embodiment 1, configurations and contents that overlap with Embodiment 1 are given the same reference numerals and their explanation is simplified or omitted, while differing contents are explained.

[0082] First, with reference to Figures 7 and 8, the system configuration and operation overview of the sound field presence reproduction system 100A according to a modified example of Embodiment 1 will be described. Figure 7 is a block diagram showing an example of the system configuration of the sound field presence reproduction system 100A according to a modified example of Embodiment 1. Figure 8 is a diagram showing an example of the operation overview from sound field presence sound acquisition to sound field presence reproduction in the sound field presence reproduction system 100A of Figure 7.

[0083] The sound field presence reproduction system 100A includes a sound field presence sound recording device 10 and a sound field presence reproduction device 20A. The sound field presence sound recording device 10 and the sound field presence reproduction device 20A are connected to each other via a network NW1, enabling data communication.

[0084] The sound field presence reproduction device 20A is placed in a reproduction space (for example, a satellite venue STL1) and includes a multi-channel echo cancellation unit 21A, an encoding unit 22, a microphone element direction specification unit 23, a speaker direction specification unit 24, a decoding unit 25, a sound field reproduction unit 26, and satellite speakers SPk1, ..., SPkp. In other words, in the modification of Embodiment 1, instead of the echo cancellation units 21 to 2n of Embodiment 1, the sound field presence reproduction device 20A is provided with a multi-channel echo cancellation unit 21A that receives the sound source signals of the sound sources picked up by each of the n individual sound pickup microphones.

[0085] The multi-channel echo cancellation unit 21A receives the sound pickup signals of each microphone element of the ambisonic microphone AMB1 sent from the sound field presence sound recording device 10 (A / D conversion unit 7 side), and further receives the individual sound source signals of the first sound source (see above) to the nth sound source (see above) sent from the sound field presence sound recording device 10 (individual sound recording microphones M1 to Mn side) as the first reference signal M1S to the nth reference signal MnS. The multi-channel echo cancellation unit 21A performs an erasure process (e.g., multi-channel echo cancellation process) in the time domain to erase, from the sound pickup signals of the ambisonic microphone AMB1, the components of the first reference signal M1S (i.e., the individual sound source signal picked up by the individual sound recording microphone M1) through the nth reference signal MnS (i.e., the individual sound source signal picked up by the individual sound recording microphone Mn). The multi-channel echo cancellation unit 21A outputs the signals after the erasure process (the first to nth sound pickup signals) to the encoding unit 22.

[0086] Here, the multi-channel echo cancellation unit 21A may be configured, for example, as an echo canceller using an adaptive filter that operates in the time domain. This echo canceller can be configured as a Multi Channel EchoCanceller, for example. The configuration of this Multi Channel EchoCanceller may be based on the stereo echo canceller shown in the reference non-patent literature, which is an example in which two reference signals are input to the Multi Channel EchoCanceller. Therefore, as shown in Figure 8, the multi-channel echo cancellation unit 21A can be configured using the same number of Multi Channel EchoCancellers or stereo echo cancellers as the number of microphone elements (e.g., 4) of the ambisonic microphone AMB1. This Multi Channel EchoCanceller or stereo echo canceller may have the configuration disclosed in the reference non-patent literature or a configuration derived from it. With this configuration, even if there is a correlation between the reference signals (in other words, when the crosstalk component exceeds the predetermined threshold below which crosstalk can be considered substantially absent), the components of the reference signals (i.e., the individual sound source signals picked up by the individual sound pickup microphones Mn) included in the sound pickup signal of the ambisonic microphone AMB1 can be erased (suppressed) with high precision. Furthermore, the multi-channel echo cancellation unit 21A may instead perform the echo cancellation process using adaptive filters in the frequency domain or subband domain after the time-domain signal has been forward-transformed using a DFT or the like, and the cancelled signal may then be inverse-transformed back to the time domain using an IFFT or the like before subsequent processing.
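A multi-channel canceller differs from a bank of single-channel cancellers in that all reference signals are filtered jointly and adapted from one shared error. The following minimal Python sketch shows this for one microphone element, assuming an NLMS update (the text does not name the adaptive algorithm). The function name `multi_ref_nlms`, the crosstalk mixing matrix, and the synthetic signals are hypothetical; for simplicity the echo is synthesized directly from the reference signals.

```python
import numpy as np

def multi_ref_nlms(main, refs, taps=16, mu=0.5, eps=1e-8):
    """Jointly remove all `refs` (n, T) from `main` (T,) using one shared error signal."""
    n_ref = refs.shape[0]
    w = np.zeros((n_ref, taps))       # one adaptive filter per reference
    buf = np.zeros((n_ref, taps))     # recent samples of each reference, newest first
    out = np.zeros(len(main))
    for t in range(len(main)):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = refs[:, t]
        err = main[t] - np.sum(w * buf)                   # subtract all estimates at once
        w += mu * err * buf / (np.sum(buf * buf) + eps)   # joint normalized update (assumed)
        out[t] = err
    return out

# Synthetic stand-ins (assumptions): two references that leak into each other.
rng = np.random.default_rng(1)
T = 8000
s = rng.standard_normal((2, T))
refs = np.array([[1.0, 0.3], [0.3, 1.0]]) @ s    # correlated references (crosstalk)
audience = 0.1 * rng.standard_normal(T)
paths = 0.5 * rng.standard_normal((2, 8))        # hypothetical room paths
echo = sum(np.convolve(r, h)[:T] for r, h in zip(refs, paths))
cleaned = multi_ref_nlms(echo + audience, refs)
```

One such canceller would be run per microphone element, which matches the element-count-based number of cancellers described above.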

[0087] <Reference Non-Patent Literature> Chapter 5 Acoustic Echo Canceller, "Example of Stereo Echo Canceller Configuration" (see Figure 5.8), Institute of Electronics, Information and Communication Engineers, 2012, [Retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

[0088] Next, with reference to Figure 9, the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20A will be described. Figure 9 is a flowchart showing in chronological order an example of the operation procedure for sound field presence reproduction by the sound field presence reproduction device according to a modified example of Embodiment 1. In the explanation of Figure 9, the same step numbers are assigned to content that overlaps with the explanation of Figure 6, and the explanation is simplified or omitted, while different content is explained.

[0089] In Figure 9, the sound field presence reproduction device 20A performs multi-channel echo cancellation processing (see above) in the time domain in the multi-channel echo cancellation unit 21A, using the sound pickup signal from the ambisonic microphone AMB1 as the main signal and the sound pickup signals from the individual sound pickup microphones M1 to Mn as reference signals (step St22A). More specifically, the multi-channel echo cancellation unit 21A of the sound field presence reproduction device 20A receives various signals sent in step St21 (specifically, the sound pickup signals for each microphone element of the ambisonic microphone AMB1, and the first reference signal M1S to the nth reference signal MnS), and performs an erasure process (for example, a multi-channel echo cancellation process) on the time axis to erase the components of the first reference signal M1S (i.e., the individual sound source sound signal picked up by the individual sound pickup microphone M1) to the nth reference signal MnS (i.e., the individual sound source sound signal picked up by the individual sound pickup microphone Mn) contained in the sound pickup signals for each microphone element of the ambisonic microphone AMB1 (step St22A). The processing from step St22A onward overlaps with Figure 6, so the explanation is omitted.

[0090] As described above, in the sound field presence reproduction system 100A according to the modification of Embodiment 1, the erasure unit (multi-channel echo cancellation unit 21A) of the sound field presence reproduction device 20A is composed of a number of multi-channel echo cancellers determined by the number of microphone elements Mc1 to Mc4 provided in the sound pickup device (ambisonic microphone AMB1) (see Figure 8). Each multi-channel echo canceller receives the sound source signals corresponding to each of the multiple sound sources and performs erasure processing (multi-channel echo cancellation processing). This makes it possible to erase (suppress) with high precision the components of the reference signals (i.e., the individual sound source signals picked up by the individual sound pickup microphones) included in the sound pickup signal of each microphone element of the ambisonic microphone AMB1, even if there is a correlation between the reference signals (in other words, when the crosstalk component exceeds the predetermined threshold below which crosstalk can be considered substantially absent).

[0091] (Embodiment 2) Embodiment 1 describes an example in which, in a sound field presence reproduction device, echo cancellation processing is performed in the time domain using the sound pickup signal for each microphone element of the ambisonics microphone AMB1 and the sound source sound signal (reference signal) for each sound source in the sound pickup space (e.g., live venue LV1) before performing the encoding processing to generate a first-order ambisonics signal. Embodiment 2 describes an example in which echo cancellation processing is performed in the ambisonics domain using a first-order ambisonics signal and sound source sound signals with directional specifications for each sound source in the sound pickup space (e.g., live venue LV1). In Embodiment 2, configurations and contents that overlap with Embodiment 1 are given corresponding common reference numerals to simplify or omit the explanation, while different contents are explained.

[0092] First, with reference to Figures 10 and 11, the system configuration and operation overview of the sound field presence reproduction system 100B according to Embodiment 2 will be described. Figure 10 is a block diagram showing an example of the system configuration of the sound field presence reproduction system 100B according to Embodiment 2. Figure 11 is a diagram showing an example of the operation overview from sound field presence recording to sound field presence reproduction in the sound field presence reproduction system 100B of Figure 10.

[0093] The sound field presence reproduction system 100B includes a sound field presence sound recording device 10B and a sound field presence reproduction device 20B. The sound field presence sound recording device 10B and the sound field presence reproduction device 20B are connected to each other via a network NW1, enabling data communication.

[0094] The sound field presence sound recording device 10B is placed in a sound recording space (e.g., a live venue LV1) and includes an ambisonic microphone AMB1, an A / D conversion unit 7, an encoding unit 8, a microphone element direction specification unit 9, and individual sound recording microphones M1, ..., Mn. The sound field presence sound recording device 10B only needs to have at least the ambisonic microphone AMB1; the A / D conversion unit 7, encoding unit 8, and microphone element direction specification unit 9 may be provided in the sound field presence reproduction device 20B.

[0095] [ka]

[0096] [ka]

[0097] The sound field presence reproduction device 20B is placed in a reproduction space (for example, a satellite venue STL1) and includes echo cancellation units 21B, ..., 2nB, encoding units 31, ..., 3n, sound source position designation units 41, ..., 4n, a speaker direction designation unit 24, a decoding unit 25B, a sound field reproduction unit 26, and satellite speakers SPk1, ..., SPkp. The number n of echo cancellation units 21B to 2nB, the number n of encoding units 31 to 3n, the number n of sound source position designation units 41 to 4n, and the number n of individual sound pickup microphones M1 to Mn are the same. In other words, the sound field presence reproduction device 20B has as many echo cancellation units, encoding units, and sound source position designation units as there are types of sound sources picked up by the individual sound pickup microphones.

[0098] [ka]

[0099] [ka]

[0100] [ka]

[0101] [ka]

[0102] [ka]

[0103] [ka]

[0104] Here, each of the echo cancellation units 21B to 2nB may be configured, for example, as an echo canceller using an adaptive filter that operates in the ambisonics domain. This echo canceller can be configured as a Single Channel EchoCanceller, for example. Therefore, as shown in Figure 11, each of the echo cancellation units 21B to 2nB can be configured using the same number of Single Channel EchoCancellers as the number of microphone elements (e.g., 4) of the ambisonics microphone AMB1. This Single Channel EchoCanceller may be configured as disclosed in, for example, the reference non-patent literature. With this configuration, especially when there is no correlation between the reference signals (in other words, when the crosstalk component is below a predetermined threshold, to the extent that crosstalk can be considered substantially absent), the component of the reference signal (i.e., the individual sound source signal picked up by the individual sound pickup microphone) contained in the signal components based on the sound pickup signals of the microphone elements of the ambisonics microphone AMB1 (e.g., the signals having resolution in each of the W, X, Y, and Z directions shown in Figure 1) can be erased (suppressed) with high accuracy.

[0105] <Reference Non-Patent Literature> Chapter 5 Acoustic Echo Canceller, "Example of Adaptive Filter Configuration" (see Figure 5.2), p. 4/(17), Institute of Electronics, Information and Communication Engineers, 2012, [Retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

[0106] [ka]

[0107] Next, with reference to Figure 12, the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20B will be described. Figure 12 is a flowchart showing a time-series example of the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20B according to Embodiment 2. In the explanation of Figure 12, the same step numbers are assigned to content that overlaps with the explanation of Figure 6, and the explanation is simplified or omitted, while different content is explained.

[0108]

change

[0109]

change

[0110]

change

[0111]

change

[0112] As described above, in the sound field presence reproduction system 100B according to Embodiment 2, the sound field presence reproduction device 20B includes: an acquisition unit (echo cancellation units 21B to 2nB) that acquires at least the sound pickup signal (diffuse sound components DS1 to DS3) picked up by a sound pickup device (ambisonic microphone AMB1) placed in the sound pickup space (live venue LV1); encoding units 31 to 3n that encode the sound source signals (sound signal SS1, voice signal SS2, sound signal SS3) of one or more sound sources in the sound pickup space; an erasure unit (echo cancellation units 21B to 2nB) that uses the encoded sound source signals as reference signals and performs an erasure process to erase the reference signal components contained in the sound pickup signal; a generation unit (decoding unit 25B) that generates, based on the signal after the erasure process, a speaker drive signal for each of multiple speakers (satellite speakers SPk1 to SPkp) placed in a reproduction space (satellite venue STL1) different from the sound pickup space, in order to reproduce in the reproduction space the sound field presence of the sound pickup space; and a sound field reproduction unit 26 that outputs the speaker drive signal from each of the multiple speakers. As a result, by erasing in the ambisonics domain rather than in the time domain the components of one or more individual sound sources (reference signals) within the live venue LV1 that are picked up by the ambisonic microphone AMB1, the sound field presence reproduction device 20B according to Embodiment 2 can reproduce with high accuracy, in at least one satellite venue, the atmosphere of presence on the audience side within the sound pickup space (live venue LV1) that is mainly picked up by the ambisonic microphone AMB1.

[0113] Furthermore, the sound pickup signal acquired by the acquisition unit (echo cancellation units 21B to 2nB) is a signal encoded using the directional information of each of the multiple microphone elements Mc1 to Mc4 provided in the sound pickup device (ambisonic microphone AMB1). As a result, the sound field presence reproduction device 20B can acquire a first-order ambisonic signal with high directional resolution as the input signal to be processed by the erasure unit (each of the echo cancellation units 21B to 2nB).
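The encoding of the microphone element signals into a first-order ambisonic signal can be illustrated with a minimal sketch. It assumes four ideal, coincident cardioid elements pointing at the vertices of a regular tetrahedron and a W, X, Y, Z convention in which W carries the unweighted source signal; the element directions, the function name capsules_to_foa, and the constant CAPSULE_DIRS are illustrative assumptions, not values taken from this disclosure.

```python
import math

# Assumed look directions (unit vectors) of the four microphone elements
# Mc1 to Mc4: the vertices of a regular tetrahedron. Hypothetical values.
_S = 1.0 / math.sqrt(3.0)
CAPSULE_DIRS = [
    (_S, _S, _S),
    (_S, -_S, -_S),
    (-_S, _S, -_S),
    (-_S, -_S, _S),
]


def capsules_to_foa(capsule_signals, dirs=CAPSULE_DIRS):
    """Encode element signals into first-order channels (W, X, Y, Z).

    Assumes ideal cardioid elements, c_m = 0.5 * (W + u_m . (X, Y, Z)),
    whose directions u_m sum to zero and are evenly spread, so that
    W = (2/M) * sum(c_m) and (X, Y, Z) = (6/M) * sum(c_m * u_m).
    """
    m = len(dirs)
    n = len(capsule_signals[0])
    w = [0.0] * n
    x = [0.0] * n
    y = [0.0] * n
    z = [0.0] * n
    for signal, (ux, uy, uz) in zip(capsule_signals, dirs):
        for t, c in enumerate(signal):
            w[t] += 2.0 / m * c
            x[t] += 6.0 / m * c * ux
            y[t] += 6.0 / m * c * uy
            z[t] += 6.0 / m * c * uz
    return w, x, y, z
```

Under these assumptions a single plane wave arriving from a given direction is mapped to W equal to the source signal and to X, Y, Z equal to the source signal weighted by the direction cosines. A real microphone array would also need frequency-dependent correction, which this sketch omits.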

[0114] Furthermore, the sound field presence reproduction device 20B further includes a speaker direction specification unit 24 that specifies the direction information of the multiple speakers (satellite speakers SPk1 to SPkp) within the reproduction space (satellite venue STL1). The generation unit (decoding unit 25B) generates a speaker drive signal for each of the multiple speakers using the direction information of each speaker and the signal after the erasure process. As a result, the sound field presence reproduction device 20B performs the decoding process on the signal after the erasure process in the ambisonic domain while taking into account the direction of each of the satellite speakers SPk1 to SPkp as seen from a reference position (for example, the central position LSP1 corresponding to the listener's position). It can therefore accurately generate speaker drive signals with high directional resolution that reproduce the sense of presence on the audience side of the live venue LV1.
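How the speaker direction information enters the decoding process can be sketched as follows. This is a minimal example that assumes a simple projection decoder forming one virtual cardioid per speaker direction; the decoding method actually used by the decoding unit 25B is not specified in this passage, and the function name decode_to_speakers and the degree-based direction format are hypothetical.

```python
import math


def decode_to_speakers(foa, speaker_dirs_deg):
    """Return one drive signal per speaker from channels (W, X, Y, Z).

    foa: tuple of four equal-length sample lists (W, X, Y, Z).
    speaker_dirs_deg: list of (azimuth, elevation) in degrees, as seen
    from the reference (listener) position.
    Each speaker receives 0.5 * (W + u . (X, Y, Z)), i.e. a virtual
    cardioid pointed at that speaker (assumed decoder).
    """
    w, x, y, z = foa
    drive_signals = []
    for az_deg, el_deg in speaker_dirs_deg:
        az = math.radians(az_deg)
        el = math.radians(el_deg)
        ux = math.cos(az) * math.cos(el)
        uy = math.sin(az) * math.cos(el)
        uz = math.sin(el)
        drive_signals.append([
            0.5 * (wt + ux * xt + uy * yt + uz * zt)
            for wt, xt, yt, zt in zip(w, x, y, z)
        ])
    return drive_signals
```

With this decoder a sound arriving from the front is reproduced at full level by a front speaker, at half level by a side speaker, and not at all by a rear speaker. Other decoders (for example mode-matching designs) would weight the channels differently.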

[0115] Furthermore, the sound field presence reproduction device 20B further includes sound source position specification units 41 to 4n that specify the position information of one or more sound sources within the sound pickup space (live venue LV1). Each of the encoding units 31 to 3n performs the encoding process using the sound source signal and the position information of the corresponding sound source. As a result, by taking into account the direction in which each individual sound source exists within the live venue LV1, the sound field presence reproduction device 20B can generate the highly accurate reference signal required for the erasure process of the erasure unit (echo cancellation units 21B to 2nB).
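The encoding of an individual sound source signal using its position information can be sketched as below. The sketch assumes that the position is given in Cartesian coordinates relative to the ambisonic microphone and that W carries the unweighted source signal; the function names direction_from_position and encode_source and the coordinate convention are hypothetical.

```python
import math


def direction_from_position(source_pos, mic_pos=(0.0, 0.0, 0.0)):
    """Return (azimuth, elevation) in degrees of a source seen from the mic."""
    dx = source_pos[0] - mic_pos[0]
    dy = source_pos[1] - mic_pos[1]
    dz = source_pos[2] - mic_pos[2]
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


def encode_source(samples, azimuth_deg, elevation_deg):
    """Encode a mono source signal into first-order channels (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omnidirectional component
        math.cos(az) * math.cos(el),  # X: front-back component
        math.sin(az) * math.cos(el),  # Y: left-right component
        math.sin(el),                 # Z: up-down component
    )
    return tuple([g * s for s in samples] for g in gains)
```

For example, a source located 2 m to the left of the microphone at the same height yields an azimuth of 90 degrees and an elevation of 0 degrees, so its signal appears in W and Y only.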

[0116] Furthermore, the erasure unit (echo cancellation units 21B to 2nB) is composed of a number of single echo cancellers (e.g., 4n (=4×n)) determined based on the number of microphone elements Mc1 to Mc4 of the sound pickup device (ambisonic microphone AMB1) (e.g., 4) and the number of sound sources (e.g., n). Each single echo canceller receives the signal obtained by encoding the sound source signal of the corresponding sound source and performs the erasure process. As a result, the sound field presence reproduction device 20B can erase (suppress) with high precision the components of the reference signals (i.e., the first-order ambisonic signals based on the individual sound source signals picked up by the individual sound pickup microphones M1 to Mn and on the directions of the individual sound sources) contained in the signal components based on the sound pickup signal of each microphone element of the ambisonic microphone AMB1 (e.g., the signals having resolution in each of the W, X, Y, and Z directions shown in Figure 1). This holds especially when the reference signals are uncorrelated (in other words, when their correlation is below a predetermined threshold, so that crosstalk components can be considered substantially absent).
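The arrangement of 4n single echo cancellers can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) adaptive filter as each canceller and a per-channel cascade over the sound sources; neither the adaptation algorithm nor the cascade order is specified in this passage, and the names nlms_cancel and cancel_sources, together with the parameters taps, mu, and eps, are hypothetical.

```python
def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Single echo canceller: remove the component of ref found in mic.

    An adaptive FIR filter estimates how ref appears in mic; the estimate
    is subtracted and the residual (error signal) is returned.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(wi * hi for wi, hi in zip(weights, history))
        e = d - estimate
        norm = sum(h * h for h in history) + eps
        weights = [wi + mu * e * hi / norm for wi, hi in zip(weights, history)]
        residual.append(e)
    return residual


def cancel_sources(mic_foa, refs_foa, **kwargs):
    """Apply one single canceller per (channel, source) pair: 4 x n in total.

    mic_foa: four sample lists (W, X, Y, Z) from the ambisonic microphone.
    refs_foa: one entry per sound source, each holding four encoded
    reference channels (W, X, Y, Z).
    """
    cleaned = []
    for channel in range(len(mic_foa)):
        signal = mic_foa[channel]
        for source_ref in refs_foa:
            signal = nlms_cancel(signal, source_ref[channel], **kwargs)
        cleaned.append(signal)
    return cleaned
```

Because each canceller adapts to one reference at a time, this arrangement fits the case described above in which the reference signals are uncorrelated with one another.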

[0117] (Modification of Embodiment 2) Embodiment 2 describes an example in which each of the echo cancellation units 21B to 2nB of the sound field presence reproduction device is configured as a single echo canceller. The modification of Embodiment 2 describes an example in which, in place of the echo cancellation units 21B to 2nB, a multi-channel echo canceller is provided that handles multiple audio channels in the ambisonic domain rather than in the time domain. In the modification of Embodiment 2, configurations and contents that overlap with Embodiments 1 and 2 are given the same reference numerals and their explanation is simplified or omitted; only the differences are explained.

[0118] First, with reference to Figures 13 and 14, the system configuration and operation overview of the sound field presence reproduction system 100C according to a modified example of Embodiment 2 will be described. Figure 13 is a block diagram showing an example of the system configuration of the sound field presence reproduction system 100C according to a modified example of Embodiment 2. Figure 14 is a diagram showing an example of the operation overview from sound field presence sound acquisition to sound field presence reproduction in the sound field presence reproduction system 100C of Figure 13.

[0119] The sound field presence reproduction system 100C includes a sound field presence sound recording device 10B (see Figure 10) and a sound field presence reproduction device 20C. The sound field presence sound recording device 10B and the sound field presence reproduction device 20C are connected to each other via a network NW1, enabling data communication.

[0120] [ka]

[0121] [ka]

[0122] Here, the multi-channel echo cancellation unit 21C may be configured, for example, as an echo canceller that uses multiple adaptive filters operating in the ambisonic domain. This echo canceller can be configured, for example, as a multi-channel echo canceller. Its configuration may be based on the stereo echo canceller shown in the reference non-patent document, which is an example of a multi-channel echo canceller that receives two reference signals. Therefore, as shown in Figure 14, the multi-channel echo cancellation unit 21C can be configured using as many multi-channel echo cancellers or stereo echo cancellers as there are microphone elements in the ambisonic microphone AMB1 (e.g., 4). Each multi-channel echo canceller or stereo echo canceller may have the configuration disclosed in the reference non-patent document or a configuration derived from it. With this configuration, even if the reference signals are correlated (in other words, when their correlation exceeds the predetermined threshold below which crosstalk components can be considered substantially absent), the components of the reference signals (i.e., the individual sound source signals picked up by the individual sound pickup microphones M1 to Mn) contained in the signal components based on the sound picked up by the ambisonic microphone AMB1 (for example, the signals having resolution in each of the W, X, Y, and Z directions shown in Figure 1) can be erased (suppressed) with high precision.
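The idea of cancelling several reference signals jointly in one channel can be sketched as follows. This is a minimal example assuming a joint normalized LMS update over all references; it does not reproduce the stereo echo canceller configuration of the reference non-patent document, and the function name multichannel_nlms and the parameters taps, mu, and eps are hypothetical.

```python
def multichannel_nlms(mic, refs, taps=8, mu=0.5, eps=1e-8):
    """Multi-channel echo canceller for one ambisonic channel.

    mic: samples of one channel (e.g., W) from the ambisonic microphone.
    refs: list of encoded reference signals for the same channel,
    one per sound source.
    All adaptive filters share a single error signal and a single
    normalization term, so they are adapted jointly.
    """
    weights = [[0.0] * taps for _ in refs]
    histories = [[0.0] * taps for _ in refs]
    residual = []
    for t, d in enumerate(mic):
        # Shift the newest sample of every reference into its history.
        for r, ref in enumerate(refs):
            histories[r] = [ref[t]] + histories[r][:-1]
        estimate = sum(
            wi * hi
            for w_r, h_r in zip(weights, histories)
            for wi, hi in zip(w_r, h_r)
        )
        e = d - estimate
        norm = sum(h * h for h_r in histories for h in h_r) + eps
        step = mu * e / norm
        for r in range(len(refs)):
            weights[r] = [wi + step * hi
                          for wi, hi in zip(weights[r], histories[r])]
        residual.append(e)
    return residual
```

One such canceller would be run for each of the W, X, Y, and Z channels, matching the count of four given in the text. Because the filters share one error signal, a component common to two references is not cancelled twice, which is the practical difference from cascading single cancellers.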

[0123] <Reference Non-Patent Literature> Chapter 5 Acoustic Echo Canceller, "Example of Stereo Echo Canceller Configuration" (see Figures 5 and 8), Institute of Electronics, Information and Communication Engineers, 2012, [Retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

[0124] [ka]

[0125] Next, with reference to Figure 15, the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20C will be described. Figure 15 is a flowchart showing in chronological order an example of the operation procedure for sound field presence reproduction by the sound field presence reproduction device 20C according to a modified example of Embodiment 2. In the explanation of Figure 15, the same step numbers are assigned to content that overlaps with the explanations of Figures 6, 9, or 12, and the explanation is simplified or omitted, while different content is explained.

[0126] [ka]

[0127] [ka]

[0128] As described above, in the sound field presence reproduction device 20C according to the modification of Embodiment 2, the erasure unit (multi-channel echo cancellation unit 21C) is composed of a number of multi-channel echo cancellers determined based on the number of microphone elements Mc1 to Mc4 provided in the sound pickup device (ambisonic microphone AMB1). Each multi-channel echo canceller receives the signals obtained by encoding the sound source signal of each of the multiple sound sources and performs the erasure process (multi-channel echo cancellation process). As a result, even if the reference signals are correlated (in other words, when their correlation exceeds the predetermined threshold below which crosstalk components can be considered substantially absent), the sound field presence reproduction device 20C can erase (suppress) with high precision, in the ambisonic domain, the components of the reference signals (i.e., the individual sound source signals picked up by the individual sound pickup microphones) contained in the signal components based on the sound pickup signal of each microphone element of the ambisonic microphone AMB1 (for example, the signals having resolution in each of the W, X, Y, and Z directions shown in Figure 1).

[0129] While embodiments have been described above with reference to the attached drawings, this disclosure is not limited to such examples. It is clear to those skilled in the art that various modifications, alterations, substitutions, additions, deletions, and equivalents can be conceived within the scope of the claims, and these are also understood to fall within the technical scope of this disclosure. Furthermore, the components of the embodiments described above can be combined in any way without departing from the spirit of the invention.

[Industrial applicability]

[0130] This disclosure is useful as a sound field presence reproduction device and sound field presence reproduction method for reproducing with high accuracy, within at least one satellite venue, the atmosphere of audience presence in a sound pickup space captured using an ambisonic microphone.

[Explanation of symbols]

[0131]
7 A/D conversion unit
8, 22, 31, 3n Encoding unit
9, 23 Microphone element direction specification unit
10, 10B Sound field presence sound recording device
20, 20A, 20B, 20C Sound field presence reproduction device
21, 2n, 21B, 2nB Echo cancellation unit
21A, 21C Multi-channel echo cancellation unit
24 Speaker direction specification unit
25, 25B, 25C Decoding unit
26 Sound field reproduction unit
41, 4n Sound source position specification unit
100, 100A, 100B, 100C Sound field presence reproduction system
AMB1 Ambisonic microphone
M1, Mn Individual sound pickup microphones
SPk1, SPkp Satellite speakers

Claims

1. A sound field presence reproduction device comprising: an acquisition unit that acquires at least a sound pickup signal picked up by a sound pickup device placed in a sound pickup space; an encoding unit that performs an encoding process on sound source signals of one or more sound sources in the sound pickup space; an erasure unit that uses the sound source signal after the encoding process as a reference signal and performs an erasure process to erase the component of the reference signal included in the sound pickup signal; a generation unit that generates, based on the signal after the erasure process, a speaker drive signal for each of a plurality of speakers arranged in a reproduction space different from the sound pickup space, the speaker drive signals serving to reproduce the sound field presence of the sound pickup space within the reproduction space; and a sound field reproduction unit that outputs the speaker drive signal from each of the plurality of speakers.

2. The sound field presence reproduction device according to claim 1, wherein the sound pickup signal is a signal encoded using the directional information of each of multiple microphone elements provided in the sound pickup device.

3. The sound field presence reproduction device according to claim 1, further comprising a speaker direction specification unit that specifies the direction information of the plurality of speakers within the reproduction space, wherein the generation unit generates the speaker drive signal for each of the plurality of speakers using the direction information of each of the plurality of speakers and the signal after the erasure process.

4. The sound field presence reproduction device according to claim 1, further comprising a sound source position specification unit that specifies the position information of the one or more sound sources within the sound pickup space, wherein the encoding unit performs the encoding process using the sound source signal of the corresponding sound source and the position information.

5. The sound field presence reproduction device according to claim 4, wherein the erasure unit is composed of a number of single echo cancellers determined based on the number of microphone elements provided in the sound pickup device and the number of sound sources, and each single echo canceller receives the signal obtained by performing the encoding process on the sound source signal of the corresponding sound source and performs the erasure process.

6. The sound field presence reproduction device according to claim 4, wherein the erasure unit is composed of a number of multi-channel echo cancellers determined based on the number of microphone elements provided in the sound pickup device, and each multi-channel echo canceller receives the signals obtained by performing the encoding process on the sound source signal of each of the multiple sound sources and performs the erasure process.