The description of a virtual sound image in three-dimensional audio includes direction information (horizontal angle x, height angle y) and distance information z. The sound image reconstructed by traditional stereo and surround sound, by contrast, has degrees of freedom only in the horizontal direction and in height, which does not conform to the definition of three-dimensional audio. As a result, the audio and video experiences are inconsistent when a 3D multimedia system is reconstructed, and the audience is not given a real auditory sense of immersion and envelopment.
[0008] First, the sound image reconstructed by traditional stereo and surround sound has degrees of freedom only in the horizontal direction and in height, which does not conform to the definition of 3D audio. This causes the audio and video experiences to be inconsistent when a 3D multimedia system is reconstructed, so the audience cannot be given a real auditory sense of immersion and envelopment. Most existing research on 3D audio technology focuses on restoring the direction of the sound source, and the few studies on distance restoration concentrate on the free sound field. In a free sound field, however, relying on intensity cues provides only the relative distance of the sound source and cannot give the listener accurate absolute distance information. Research on distance perception in reverberant environments is limited to capturing the sound at the listening position in the reconstructed sound field with a microphone array and then analyzing the relationship between the sound source distance and the factors that affect it. So far there is no general sound source distance restoration model that controls the signals played by the speakers according to the distance of the sound source to be restored. Consequently, virtual sound image positioning parameters and distance parameters consistent with the original sound source cannot be obtained by controlling the speaker signals, and a virtual sound image consistent with the spatial information of the sound source of the target object in the 3D video cannot be constructed.
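The limitation of intensity cues in a free field can be illustrated with the inverse-square law for a point source. The following is a minimal sketch under that idealized assumption; the function names and the example distances are hypothetical and are not taken from the prior art discussed here. It shows that a measured level change determines only the ratio of two distances, not either distance itself.

```python
import math

def level_change_db(d_ref: float, d: float) -> float:
    """Level change (dB) of an ideal point source in a free field
    when the listener distance changes from d_ref to d."""
    return -20.0 * math.log10(d / d_ref)

def distance_ratio_from_level(delta_db: float) -> float:
    """Invert the law: a level change yields only the ratio d / d_ref."""
    return 10.0 ** (-delta_db / 20.0)

# Two different geometries with the same distance ratio (hypothetical values).
near_pair = level_change_db(1.0, 2.0)    # source moves from 1 m to 2 m
far_pair = level_change_db(5.0, 10.0)    # source moves from 5 m to 10 m

print(f"1 m -> 2 m : {near_pair:.2f} dB")
print(f"5 m -> 10 m: {far_pair:.2f} dB")
print(f"recovered ratio: {distance_ratio_from_level(near_pair):.2f}")
```

Both geometries produce the same level change, so the intensity cue alone cannot tell the listener whether the source is 2 m or 10 m away.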
[0009] Second, when speakers are used as playback devices, restoration based on physical sound field reconstruction is more accurate and covers a large restoration area, but it requires a large number of speakers and places strict restrictions on their placement, which makes it difficult to implement in practice.
[0010] Third, the rapid development of the 3D film and television industry and the start of 3D audio standardization by the Moving Picture Experts Group (MPEG) have drawn widespread attention to 3D audio technology. Because 2D audio systems developed from traditional stereo or surround sound cannot express the three-dimensional spatial information of the sound source, they seriously damage the audience's real and complete spatial experience of audio-visual events. Both Ambisonics and WFS in the prior art require a large number of speakers and place strict restrictions on their arrangement. The sound image formed by perception-based sound field reconstruction lies only on the spherical surface where the speakers are located, so the listener cannot perceive sound source distance information outside that spherical surface. HRTF techniques are mainly played back over headphones and depend strongly on individual differences between listeners, which is a major limitation.
[0011] Fourth, if the reverberation effect of a specific room is to be simulated more accurately and comprehensively, the spatial sound effect reverberation model still has the following deficiencies. First, when the model simulates reverberation, the delay parameters and attenuation coefficients of the 19th-order FIR filter are still those used for simulating the Boston Symphony Hall, and no method is proposed for setting these parameters for a specific venue. Second, the feedback gain coefficients of the comb filters are all calculated from the average reverberation time of the room. However, the absorption of sound energy by air depends on the signal frequency, so sound in different frequency bands is attenuated to different degrees in different environments; that is, the corresponding reverberation times differ, especially for high-frequency signals. Although the model adds a low-pass filter that changes the reverberation time of the high-frequency part of the sound, it controls the reverberation time only roughly and cannot accurately reflect the reverberation times of signals at different frequencies.
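The effect of using a single average reverberation time can be sketched with the textbook relation for a feedback comb filter, g = 10^(-3·τ/T60), where τ is the loop delay and T60 the reverberation time. This is a minimal illustration, not the model criticized above; the loop delay and the per-band reverberation times below are hypothetical values chosen only to show the trend.

```python
import math

def comb_feedback_gain(delay_s: float, rt60_s: float) -> float:
    """Feedback gain that makes a comb filter with loop delay delay_s
    decay by 60 dB in rt60_s seconds."""
    return 10.0 ** (-3.0 * delay_s / rt60_s)

def decay_db(gain: float, delay_s: float, seconds: float) -> float:
    """Total attenuation (dB) after the signal has circulated for `seconds`."""
    return 20.0 * math.log10(gain) * (seconds / delay_s)

delay_s = 0.040  # hypothetical comb filter loop delay (40 ms)

# Hypothetical reverberation times per frequency band (Hz -> seconds).
rt60_by_band = {125: 2.2, 1000: 1.8, 8000: 0.9}

gains = {f: comb_feedback_gain(delay_s, t) for f, t in rt60_by_band.items()}

# A single gain derived from the average reverberation time.
avg_rt60 = sum(rt60_by_band.values()) / len(rt60_by_band)
avg_gain = comb_feedback_gain(delay_s, avg_rt60)

for f, g in gains.items():
    print(f"{f:>5} Hz: per-band gain {g:.4f}, average-based gain {avg_gain:.4f}")
```

With these assumed values the average-based gain is too small for the low band and too large for the high band, so the simulated decay is too fast at low frequencies and too slow at high frequencies.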
[0012] Fifth, the prior art lacks restoration of sound image distance in a non-free field, and no effective method has been proposed for controlling the energy ratio of direct sound to reverberant sound. Appropriate reverberation makes the sound heard by the listener clearer and brighter, but excessive reverberation is unpleasant for the listener. Although current VBAP technology can reconstruct the direction information of the original sound source, it does not convey the distance of the sound image, and the listener's perception of distance seriously affects the overall experience of watching 3D video.
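Why VBAP conveys direction but not distance can be seen from a two-speaker, horizontal-plane version of the gain calculation. The sketch below follows the commonly published pairwise formulation (solve g1·l1 + g2·l2 = p for the unit direction vectors, then normalize to constant power). The function names and the ±30° speaker layout are assumptions made for illustration only.

```python
import math

def unit_vector(azimuth_deg: float) -> tuple[float, float]:
    """Unit vector in the horizontal plane for a given azimuth."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))

def vbap_2d(source_az_deg: float, spk1_az_deg: float, spk2_az_deg: float):
    """Pairwise 2D VBAP gains for a virtual source between two speakers."""
    p = unit_vector(source_az_deg)
    l1 = unit_vector(spk1_az_deg)
    l2 = unit_vector(spk2_az_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    # Solve g1*l1 + g2*l2 = p with Cramer's rule.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Constant-power normalization: g1^2 + g2^2 = 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Hypothetical stereo layout with speakers at +30 and -30 degrees.
centre = vbap_2d(0.0, 30.0, -30.0)
at_left_speaker = vbap_2d(30.0, 30.0, -30.0)
print("source at 0 deg :", centre)
print("source at 30 deg:", at_left_speaker)
```

The only input describing the source is its azimuth, and the gains are normalized to constant power, so no distance information enters the calculation. The virtual source is always rendered on the arc through the speakers.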