Sound field adjustment method, electronic device, and storage medium

By separating the stereo signal of a mid-to-low-end car audio system into direct and ambient components, distributing them to different speakers, and using the diffusion phase difference together with head position tracking, the method addresses the insufficient sound layering and spatial immersion of such systems and improves the listening experience.

CN122248324A · Pending · Publication Date: 2026-06-19 · WEIFANG GOERDYNA TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: WEIFANG GOERDYNA TECH CO LTD
Filing Date: 2026-03-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Mid-to-low-end car audio systems cannot effectively separate near-field direct sound and far-field diffused sound, resulting in insufficient sound layering and spatial immersion, which weakens the user's listening experience.

Method used

By analyzing the stereo signals of the left and right channels, the stereo sum signal and stereo difference signal are determined. Ambient sound and direct sound are separated using diffusion phase difference and then distributed to different speakers for playback. Head position tracking and beamforming technology are combined to optimize audio signal transmission.

Benefits of technology

It achieves effective separation of near-field direct sound and far-field diffused sound, enhancing the spatial sense, layering, and immersion of the sound, and providing an auditory experience that is closer to a real acoustic environment.

✦ Generated by Eureka AI based on patent content.

Figure CN122248324A_ABST

Abstract

This application discloses a sound field adjustment method, electronic device, and storage medium, relating to the field of audio data processing technology. The method includes: determining a stereo sum signal and a stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; determining ambient sound signals for the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; determining the direct sound signals for the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals for both channels; and determining the target loudspeakers for the direct sound signals and the ambient sound signals, respectively. This application effectively separates the near-field direct sound in the stereo signal from the ambient diffuse sound used to create a sense of space, enhancing the spatial sense, layering, and immersion of the sound and providing users with a listening experience closer to a real acoustic environment.

Description

Technical Field

[0001] This application relates to the field of audio data processing technology, and in particular to sound field adjustment methods, electronic devices and storage media.

Background Technology

[0002] In-car audio systems are one of the core components for enhancing the driving experience, and their acoustic performance directly affects the user's sensory enjoyment. With the deep integration of the automotive industry and consumer electronics technology, users' quality requirements for in-car audio systems have upgraded from basic sound output to the pursuit of a high-fidelity, immersive listening experience.

[0003] Currently, the in-car audio systems in mid-to-low-end vehicles mainly rely on speaker units located in the doors, dashboard, and other areas for sound reproduction. These units are driven by stereo or surround sound processing technology, and the audio signals are divided and amplified before being sent to the individual physical speakers for reproduction. However, both near-field sounds such as vocals and lead instruments, and far-field diffused sounds such as reverberation, harmony, and ambient sounds, are mixed into a unified electrical signal and played from the same set of physical speakers, weakening the sense of sound layering and spatial immersion.

[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0005] The main purpose of this application is to provide a sound field adjustment method, electronic device and storage medium, which aims to solve the technical problem of how to improve the immersiveness of sound.

[0006] To achieve the above objective, this application proposes a sound field adjustment method, the method comprising: determining a stereo sum signal and a stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; determining the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; determining the direct sound signals of the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals of the left and right channels; and determining the target loudspeakers for the direct sound signal and the ambient sound signal, respectively.

[0007] In one embodiment, the step of determining the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference includes: determining a sign marker for the stereo difference signal based on the diffusion phase difference and a preset sign function; obtaining a target difference signal as the product of the sign marker and the stereo difference signal; determining the ambient sound signal of the left channel based on preset mixing coefficients and a weighted sum of the stereo sum signal and the target difference signal, wherein the mixing coefficients include a sum signal coefficient and a difference signal coefficient, and the sum signal coefficient is greater than the difference signal coefficient; and determining the ambient sound signal of the right channel based on the mixing coefficients and a weighted difference between the stereo sum signal and the target difference signal.

[0008] In one embodiment, the step of determining the ambient sound signal of the left channel based on the preset mixing coefficients and the weighted sum of the stereo sum signal and the target difference signal includes: low-pass filtering the weighted sum to obtain the ambient sound signal of the left channel. The step of determining the ambient sound signal of the right channel based on the mixing coefficients and the weighted difference between the stereo sum signal and the target difference signal includes: low-pass filtering the weighted difference to obtain the ambient sound signal of the right channel.

[0009] In one embodiment, the step of determining the target loudspeakers for the direct sound signal and the ambient sound signal respectively includes: determining the sound image change rate of each frame of the direct sound signal based on the sound image position of the direct sound signal; identifying a direct sound signal whose sound image change rate is less than a preset change rate threshold and whose sound image position is greater than a preset position threshold as a target direct element, and identifying the front loudspeaker as the target loudspeaker of the target direct element; dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal; and determining a ceiling loudspeaker as the target loudspeaker for the far-field ambient sound elements and a headrest loudspeaker as the target loudspeaker for the surround ambient sound elements.

[0010] In one embodiment, prior to the step of dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal, the method further includes: acquiring the transient energy and the steady-state energy of the ambient sound signal; and determining the transient energy ratio of the ambient sound signal based on the ratio between the steady-state energy and the transient energy of the ambient sound signal.

[0011] In one embodiment, after the step of determining the target loudspeakers for the direct sound signal and the ambient sound signal, respectively, the method further includes: determining the head position of the target user with a preset head position tracker; determining the head deflection direction based on the head position, and using the head deflection direction as the beamforming direction of the audio signal of each target loudspeaker; and, based on the relative position between the head position and each target loudspeaker, performing head-related transfer function compensation on the audio signal of each target loudspeaker so that the sound source position of each audio signal perceived by the target user remains unchanged.

[0012] In one embodiment, the step of performing head-related transfer function compensation on the audio signal of each target loudspeaker based on the relative position between the head position and each target loudspeaker includes: for any target loudspeaker, determining the target head-related transfer function filter of that loudspeaker according to a preset mapping between relative position and head-related transfer function filter, and the relative position between the head position and that loudspeaker; and adjusting the time-frequency characteristics of the audio signal of the target loudspeaker with the target head-related transfer function filter, wherein the time-frequency characteristics include delay, gain, and phase.

[0013] In one embodiment, after the step of determining the target loudspeaker for the direct sound signal and the ambient sound signal, respectively, the method further includes: Ambient noise is collected using a pre-set microphone array; The gain of the audio signal of each target loudspeaker is adjusted according to the spectrum and intensity of the ambient noise.

[0014] Furthermore, to achieve the above objective, this application also proposes a sound field adjustment device, the sound field adjustment device comprising: a sum and difference signal determination module, used to determine the stereo sum signal and stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; an ambient sound signal determination module, used to determine the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; a direct sound signal determination module, used to determine the direct sound signals of the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals of the left and right channels; and a loudspeaker allocation module, used to determine the target loudspeakers for the direct sound signal and the ambient sound signal, respectively.

[0015] In addition, to achieve the above objectives, this application also proposes an electronic device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the sound field adjustment method as described above.

[0016] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the sound field adjustment method described above.

[0017] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the sound field adjustment method described above.

[0018] The one or more technical solutions proposed in this application have at least the following technical effects: First, based on the acquired left-channel stereo signal and right-channel stereo signal, the stereo sum signal and stereo difference signal are determined. The stereo sum signal reflects the common characteristics of the audio signals, and the stereo difference signal reflects the difference information between the left and right channels, facilitating the separation of sound field components. Then, based on the stereo sum signal, stereo difference signal, and a preset diffusion phase difference (i.e., the instantaneous phase difference between the stereo signals of the left and right channels), the ambient sound signals of the left and right channels are determined. The phase difference between the left and right channels can identify the diffuse-field components, such as reverberation and reflection, that represent spatial information in the audio, thus achieving the separation of ambient sound. Furthermore, based on the left-channel stereo signal, right-channel stereo signal, and ambient sound signals of the left and right channels, the direct sound signals of the left and right channels are determined, and the target loudspeakers for the direct sound signal and ambient sound signal are determined respectively, so that the direct sound and ambient sound can be played through different loudspeakers, realizing the spatial separation and optimized layout of different acoustic components at the physical sound source, thereby enhancing the immersiveness of the sound. This application effectively separates the near-field direct sound in the stereo signal from the ambient diffuse sound used to create a sense of space, thereby enhancing the spatial sense, layering, and immersion of the sound, and bringing users an auditory experience that is closer to a real acoustic environment.

Attached Figure Description

[0019] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0020] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart illustrating an embodiment of the sound field adjustment method of this application; Figure 2 is a schematic diagram of the overall flow of the sound field adjustment method provided in Embodiment 1 of this application; Figure 3 is a system structure block diagram of the sound field adjustment method provided in Embodiment 2 of this application; Figure 4 is a schematic diagram of the module structure of the sound field adjustment device according to an embodiment of this application; Figure 5 is a schematic diagram of the device structure of the hardware operating environment involved in the sound field adjustment method in the embodiments of this application.

[0022] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0023] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0024] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0025] Currently, the in-car audio systems in mid-to-low-end vehicles mainly rely on speaker units located in the doors, dashboard, and other areas for sound reproduction. These units are driven by stereo or surround sound processing technology, and the audio signals are divided and amplified before being sent to the individual physical speakers for reproduction. However, both near-field sounds such as vocals and lead instruments, and far-field diffused sounds such as reverberation, harmony, and ambient sounds, are mixed into a unified electrical signal and played from the same set of physical speakers, weakening the sense of sound layering and spatial immersion.

[0026] This application provides a sound field adjustment method. Based on the acquired left-channel and right-channel stereo signals, this application determines the stereo sum signal and stereo difference signal. The stereo sum signal reflects the common characteristics of the audio signals, while the stereo difference signal reflects the differences between the left and right channels, facilitating sound field component separation. Furthermore, based on the stereo sum signal, stereo difference signal, and a preset diffusion phase difference (i.e., the instantaneous phase difference between the stereo signals of the left and right channels), the ambient sound signals of the left and right channels are determined. The phase difference between the left and right channels allows identification of diffusion field components representing spatial information, such as reverberation and reflection, in the audio, achieving ambient sound separation. Finally, based on the left-channel stereo signal, right-channel stereo signal, and ambient sound signals of the left and right channels, the direct sound signals of the left and right channels are determined, and the target loudspeakers for the direct sound signals and ambient sound signals are determined respectively. This allows the direct sound and ambient sound to be played through different loudspeakers, achieving spatial separation and optimized layout of different acoustic components at the physical sound source, thereby enhancing the immersive experience of the sound. This application effectively separates the near-field direct sound in the stereo signal from the ambient diffuse sound used to create space, thereby enhancing the spatial sense, layering, and immersion of the sound, and bringing users an auditory experience that is closer to a real acoustic environment.

[0027] It should be noted that the executing entity in this embodiment can be an electronic device with data processing, network communication and program execution functions, such as an in-vehicle terminal, tablet computer, personal computer, mobile phone, etc.

[0028] Based on this, embodiments of this application provide a sound field adjustment method. Refer to Figure 1, which is a flowchart illustrating the first embodiment of the sound field adjustment method of this application.

[0029] In this embodiment, the sound field adjustment method includes steps S10 to S40: Step S10: Based on the acquired left channel stereo signal and right channel stereo signal, determine the stereo sum signal and stereo difference signal. Stereo signals are audio signals containing two or more independent audio channels, used to simulate the distribution of real sound in three-dimensional space. They typically contain two channels: a left channel and a right channel, to simulate the distribution of sound on the user's left and right sides, creating a stereo effect. Specifically, the left channel stereo signal is the set of audio signals corresponding to the left channel in a stereo signal, and the right channel stereo signal is the set of audio signals corresponding to the right channel in a stereo signal.

[0030] The stereo sum signal is a new audio signal sequence obtained by scalar addition of the left channel stereo signal L(t) and the right channel stereo signal R(t) at each sampling point t, that is, S(t) = L(t) + R(t). It mainly retains the information shared by the left and right channels. In the frequency domain, the stereo sum signal can be viewed as the vector sum of the corresponding spectra of L(t) and R(t).

[0031] The stereo difference signal is a new audio signal sequence obtained by scalar subtraction of the right channel stereo signal R(t) from the left channel stereo signal L(t) at each sampling point t, that is, D(t) = L(t) - R(t). It preserves the components that differ between the left and right channels (such as phase differences and amplitude differences). In the frequency domain, the stereo difference signal can be viewed as the vector difference between the corresponding spectra of L(t) and R(t).

[0032] Optionally, a Short Time Fourier Transform (STFT) can be directly performed on the stereo sum signal S(t) and stereo difference signal D(t) to obtain the stereo sum signal S(n,k) and stereo difference signal D(n,k) in the frequency domain, where n is the frame index and k is the frequency index.

[0033] Optionally, a short-time Fourier transform can be performed on the left channel stereo signal and the right channel stereo signal to obtain the left channel frequency domain signal L(n,k) and the right channel frequency domain signal R(n,k); based on this, a sum-difference operation is performed on the left channel frequency domain signal L(n,k) and the right channel frequency domain signal R(n,k) to obtain the stereo sum signal and stereo difference signal in the frequency domain.
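The two optional routes in [0032] and [0033] give the same frequency-domain result, because the STFT is linear. The following is a minimal sketch of that equivalence, not the implementation of this application; the helper names `stft` and `sum_difference`, the Hann window, the 512-sample frame, the 256-sample hop, and the synthetic test tones are all assumptions chosen for illustration.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Minimal STFT: Hann-windowed frames followed by a one-sided FFT.

    Returns an array of shape (frames, bins), indexed as X[n, k].
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def sum_difference(left, right):
    """Stereo sum S = L + R and stereo difference D = L - R."""
    return left + right, left - right

# Synthetic stereo pair (assumption): a shared 440 Hz tone plus a
# 1 kHz tone that is phase-inverted between the two channels.
fs = 48000
t = np.arange(fs // 10) / fs
left = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)
right = np.sin(2 * np.pi * 440 * t) - 0.3 * np.sin(2 * np.pi * 1000 * t)

# Route of [0032]: sum/difference in the time domain, then STFT.
S_t, D_t = sum_difference(left, right)
S_a, D_a = stft(S_t), stft(D_t)

# Route of [0033]: STFT of each channel, then sum/difference per bin.
L_f, R_f = stft(left), stft(right)
S_b, D_b = sum_difference(L_f, R_f)

print(np.allclose(S_a, S_b), np.allclose(D_a, D_b))
```

Either route can therefore be chosen on practical grounds, for example whether the per-channel spectra L(n,k) and R(n,k) are needed later for the phase difference.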

[0034] Step S20: Determine the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal and the preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left channel stereo signal and the right channel stereo signal. Ambient sound signals are the parts of an audio signal that relate to the spatial environment, such as reverberation and ambience. They are usually diffuse and non-directional, and are used to enhance the spatial sense and immersion of the sound.

[0035] The diffusion phase difference is the difference in instantaneous phase angle between the left channel signal and the right channel signal at a specific moment or in a specific frequency band. It can be obtained by performing a short-time Fourier transform on the left channel stereo signal and the right channel stereo signal and then calculating the phase difference at each frequency in the complex frequency domain. The diffusion phase difference changes dynamically with time, frequency, and other factors.
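One common way to compute a per-bin phase difference from two complex spectra is to take the angle of L(n,k) multiplied by the complex conjugate of R(n,k). The sketch below assumes that approach; the function name `diffusion_phase_difference` and the hand-made spectra are hypothetical and serve only to make the result easy to check.

```python
import numpy as np

def diffusion_phase_difference(L_f, R_f):
    """Per-bin phase of L relative to R, in radians, wrapped to [-pi, pi].

    Uses angle(L * conj(R)), which equals angle(L) - angle(R) up to wrapping.
    """
    return np.angle(L_f * np.conj(R_f))

# One frame, four bins (assumed values for illustration).
L_demo = np.array([[1 + 0j, 0 + 1j, -1 + 0j, 1 + 0j]])
R_demo = np.array([[1 + 0j, 1 + 0j, 0 + 1j, 0 + 1j]])

phase_demo = diffusion_phase_difference(L_demo, R_demo)
print(phase_demo)
```

Taking the angle of the product rather than subtracting two separately computed angles avoids an explicit unwrapping step, since the result is already wrapped to a single turn.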

[0036] In one feasible implementation, step S20 includes: Step S21: Determine the sign marker of the stereo difference signal based on the diffusion phase difference and the preset sign function. The sign function takes the instantaneous phase difference (the diffusion phase difference) as input and outputs a scalar value, typically +1, -1, or 0.

[0037] The sign marker indicates the polarity or weighting direction applied to the stereo difference signal in subsequent processing. It is obtained by applying the sign function to the instantaneous phase difference between the left and right channel stereo signals at each sampling point. It is used to retain or invert the stereo difference signal according to the phase relationship, ensuring the decorrelation of the ambient sound and enhancing its diffusion and immersion.

[0038] For example, when the input is greater than 0, the output sign marker is +1; when the input is equal to 0, the sign marker is 0; and when the input is less than 0, the sign marker is -1.

[0039] Step S22: Obtain the target difference signal based on the product of the sign marker and the stereo difference signal. The target difference signal is the new signal obtained by element-wise multiplication of the sign marker with the stereo difference signal D(n,k), and can be represented as $D'(n,k) = \operatorname{sgn}(\varphi_{LR}(n,k)) \cdot D(n,k)$, where $\varphi_{LR}(n,k)$ represents the diffusion phase difference (i.e., the instantaneous phase difference between the stereo signals of the left and right channels).
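Steps S21 and S22 can be sketched together as follows. This is an illustrative sketch only: the function name `target_difference` and the sample values are assumptions, and `np.sign` stands in for the preset sign function with outputs +1, 0, and -1.

```python
import numpy as np

def target_difference(D_f, phase_diff):
    """Apply the sign marker to the stereo difference signal, bin by bin."""
    sign_marker = np.sign(phase_diff)  # +1, 0, or -1 for each bin
    return sign_marker * D_f

# Three bins with the same difference value but different phase differences
# (assumed values for illustration).
D_demo = np.array([2 + 1j, 2 + 1j, 2 + 1j])
phase_demo = np.array([0.5, 0.0, -0.5])

D_target_demo = target_difference(D_demo, phase_demo)
print(D_target_demo)
```

A positive phase difference keeps the bin, a negative one inverts it, and a zero phase difference removes the difference component for that bin.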

[0040] Step S23: Determine the ambient sound signal of the left channel based on the preset mixing coefficients and the weighted sum of the stereo sum signal and the target difference signal, wherein the mixing coefficients include the sum signal coefficient and the difference signal coefficient, and the sum signal coefficient is greater than the difference signal coefficient. The mixing coefficients are a set of parameters that adjust the weight of the different signal components in the synthesis process. In this embodiment, the mixing coefficients include the sum signal coefficient and the difference signal coefficient, which respectively determine the relative contribution of the stereo sum signal and the target difference signal in generating the ambient sound signal. That is, the sum signal coefficient and the difference signal coefficient control the proportion of the central sound image and of the ambient diffuse sound, respectively.

[0041] For example, for any frame and any frequency, the current stereo sum signal and stereo difference signal are acquired. The instantaneous phase difference between the current left channel stereo signal and right channel stereo signal is processed with the preset sign function to determine the sign marker of the stereo difference signal, and the sign marker is multiplied by the stereo difference signal to obtain the target difference signal. Two scalar multiplications are then performed: the stereo sum signal is multiplied by the sum signal coefficient, and the target difference signal is multiplied by the difference signal coefficient, giving the weighted signal components. Finally, the weighted sum signal is added to the weighted difference signal to obtain the ambient sound signal of the left channel. The difference signal coefficient is smaller than the sum signal coefficient.

[0042] In one feasible implementation, step S23 includes: Step S231: Perform low-pass filtering on the weighted sum to obtain the ambient sound signal of the left channel; Low-pass filtering is a signal processing technique that allows signals below a specific cutoff frequency to pass through while attenuating signals above that frequency. In this embodiment, to attenuate high-frequency ambient sound signals, the cutoff frequency can be set between 8kHz and 12kHz.

[0043] For example, the weighted sum of the stereo sum signal and the target difference signal is input into a preset low-pass filter (such as a second-order infinite impulse response filter). The low-pass filter processes the weighted sum according to its cutoff frequency, passing signal components below the cutoff frequency and attenuating the high-frequency components, thereby obtaining the ambient sound signal of the left channel.

[0044] Understandably, using low-pass filtering to simulate the high-frequency attenuation characteristics in a real environment helps to better reproduce the real audio scene and create a more realistic sound field environment.
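As an illustration of the second-order IIR low-pass filter mentioned in [0043], the sketch below builds a biquad from the widely used Audio EQ Cookbook low-pass formulas and applies it sample by sample. The 10 kHz cutoff is picked from the 8 kHz to 12 kHz range given above; the sampling rate, the Butterworth Q of 1/sqrt(2), the function names, and the test tones are assumptions, and the patent does not state which filter design it uses.

```python
import numpy as np

def biquad_lowpass_coeffs(cutoff_hz, fs, q=1 / np.sqrt(2)):
    """Second-order low-pass coefficients (Audio EQ Cookbook form), a[0] = 1."""
    w0 = 2 * np.pi * cutoff_hz / fs
    alpha = np.sin(w0) / (2 * q)
    cos_w0 = np.cos(w0)
    b = np.array([(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2])
    a = np.array([1 + alpha, -2 * cos_w0, 1 - alpha])
    return b / a[0], a / a[0]

def biquad_filter(x, b, a):
    """Direct form I difference equation, processed one sample at a time."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, x0 in enumerate(x):
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[i] = y0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

fs = 48000
t = np.arange(4800) / fs
b, a = biquad_lowpass_coeffs(cutoff_hz=10000.0, fs=fs)

low_out = biquad_filter(np.sin(2 * np.pi * 1000 * t), b, a)    # below cutoff
high_out = biquad_filter(np.sin(2 * np.pi * 20000 * t), b, a)  # above cutoff
step_out = biquad_filter(np.ones(2000), b, a)                  # DC gain check

# Compare steady-state levels using the second half of each output.
print(rms(low_out[2400:]), rms(high_out[2400:]), step_out[-1])
```

The 1 kHz tone passes almost unchanged while the 20 kHz tone is strongly attenuated, which is the high-frequency roll-off that the ambient path is meant to imitate.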

[0045] Step S24: Determine the ambient sound signal of the right channel based on the mixing coefficients and the weighted difference between the stereo sum signal and the target difference signal.

[0046] For example, after the weighted signal components (the weighted sum signal and the weighted difference signal) have been obtained in step S23 above, the weighted difference signal is subtracted from the weighted sum signal to obtain the ambient sound signal of the right channel.

[0047] In one feasible implementation, step S24 includes: Step S241: Perform low-pass filtering on the weighted difference to obtain the ambient sound signal of the right channel.

[0048] For example, the low-pass filtering of the weighted difference between the stereo sum signal and the target difference signal can follow step S231 above, and is not repeated here.

[0049] For example, the formula for determining the ambient sound signals of the left and right channels in the frequency domain is as follows:

[0050]

$$A_L(n,k) = \mathrm{LPF}(k)\,\bigl[\alpha\,S(n,k) + \beta\,\operatorname{sgn}\bigl(\varphi_{LR}(n,k)\bigr)\,D(n,k)\bigr]$$

$$A_R(n,k) = \mathrm{LPF}(k)\,\bigl[\alpha\,S(n,k) - \beta\,\operatorname{sgn}\bigl(\varphi_{LR}(n,k)\bigr)\,D(n,k)\bigr]$$

[0051] Where $A_L(n,k)$ and $A_R(n,k)$ are the ambient sound components extracted from the left and right channels, respectively, n is the frame index, and k is the frequency index; $\mathrm{LPF}(k)$ is a preset filter used to simulate the high-frequency attenuation characteristics of ambient sound; $S(n,k)$ and $D(n,k)$ represent the stereo sum signal and stereo difference signal, respectively; $\alpha$ and $\beta$ are the mixing coefficients, where $\alpha$ is the sum signal coefficient, $0 < \alpha < 1$, $\beta$ is the difference signal coefficient, $0 < \beta < 1$, and $\alpha > \beta$; $\varphi_{LR}(n,k)$ is the instantaneous phase difference between the left and right stereo channels; and $\operatorname{sgn}(\cdot)$ is the sign function, which retains or inverts the difference signal according to the phase relationship, to ensure the decorrelation of the ambient sound and enhance its diffusion and immersion.
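Putting steps S21 to S24 together, the ambient extraction can be sketched per frame and per bin as below. This is a sketch under stated assumptions rather than the filing's implementation: the coefficient values 0.6 and 0.4 merely satisfy the stated constraint that the sum coefficient exceeds the difference coefficient, the Butterworth-style magnitude curve stands in for the preset filter LPF(k), the bin frequencies assume an even FFT length, and the random spectra are test data only.

```python
import numpy as np

def lowpass_gain(n_bins, fs, cutoff_hz=10000.0, order=2):
    """Per-bin magnitude response used here as LPF(k) (assumed shape)."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)  # one-sided bins, even FFT size
    return 1.0 / np.sqrt(1.0 + (freqs / cutoff_hz) ** (2 * order))

def ambient_signals(L_f, R_f, fs, alpha=0.6, beta=0.4):
    """Left and right ambient spectra from left and right channel spectra."""
    if not (0.0 < beta < alpha < 1.0):
        raise ValueError("expected 0 < beta < alpha < 1")
    S_f = L_f + R_f                             # stereo sum signal
    D_f = L_f - R_f                             # stereo difference signal
    phase = np.angle(L_f * np.conj(R_f))        # diffusion phase difference
    D_target = np.sign(phase) * D_f             # target difference signal
    lpf = lowpass_gain(L_f.shape[-1], fs)
    A_L = lpf * (alpha * S_f + beta * D_target)  # weighted sum
    A_R = lpf * (alpha * S_f - beta * D_target)  # weighted difference
    return A_L, A_R

# Random complex spectra as stand-in data: 4 frames, 257 bins.
rng = np.random.default_rng(0)
fs = 48000
L_f = rng.standard_normal((4, 257)) + 1j * rng.standard_normal((4, 257))
R_f = rng.standard_normal((4, 257)) + 1j * rng.standard_normal((4, 257))

A_L, A_R = ambient_signals(L_f, R_f, fs)

# Identical channels carry no difference component, so both ambient
# outputs reduce to the filtered, weighted sum signal.
A_L_mono, A_R_mono = ambient_signals(L_f, L_f, fs)
print(np.allclose(A_L_mono, A_R_mono))
```

Because the two outputs differ only in the sign of the difference term, their sum depends only on the stereo sum signal and their difference only on the target difference signal.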

[0052] In this embodiment, the stereo sum and stereo difference signals are redistributed, and the left and right channels are subjected to sum-difference operations according to the Mid-Side stereo system to obtain the stereo sum signal reflecting the core audio content and the stereo difference signal reflecting environmental information. Then, after weighted adjustment and low-pass filtering of the stereo sum and stereo difference signals, the ambient sound signals for the left and right channels are obtained. The inclusion of the stereo sum signal reflecting the core audio content is to ensure a certain stability of the center sound and prevent pure ambient sound from sounding too hollow.

[0053] Step S30: Determine the direct sound signals of the left and right channels based on the left channel stereo signal, the right channel stereo signal, and the ambient sound signals of the left and right channels. A direct sound signal refers to an audio signal that travels directly from the sound source to the listener's ear without undergoing multiple reflections and attenuation. It typically includes near-field sounds such as lead vocals and lead instruments. In this embodiment, the direct sound signal can be obtained by removing ambient sound signals and other components from the original stereo signal, and it is characterized by its clarity and accurate directional positioning information.

[0054] For example, the direct sound signals of the left and right channels can be obtained by subtracting the corresponding ambient sound signals from the stereo signals of the left and right channels using spectral subtraction.

[0055] For example, the formula for determining the direct sound signal in the frequency domain is as follows:

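[0056]

$$D_L(n,k) = L(n,k) - \gamma\,A_L(n,k)$$

$$D_R(n,k) = R(n,k) - \gamma\,A_R(n,k)$$

[0057] Where $A_L(n,k)$ and $A_R(n,k)$ are the ambient sound components extracted from the left and right channels, respectively; $D_L(n,k)$ and $D_R(n,k)$ are the direct sound components extracted from the left and right channels, respectively; $L(n,k)$ and $R(n,k)$ are the left channel and right channel frequency domain signals; and $\gamma$ is a subtraction factor used to control the degree of separation, whose value is usually close to 1.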
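The subtraction in step S30 can be sketched as a per-bin operation on the complex spectra. This assumes the subtraction is applied directly to complex values, which is one reading of the text; magnitude-domain spectral subtraction with flooring is another common variant and is not shown. The function name `direct_signals`, the value 0.95 for the subtraction factor, and the random data are assumptions.

```python
import numpy as np

def direct_signals(L_f, R_f, A_L, A_R, gamma=0.95):
    """Direct sound spectra: channel spectrum minus scaled ambient spectrum."""
    return L_f - gamma * A_L, R_f - gamma * A_R

# Stand-in spectra: 2 frames, 5 bins, with ambient set to a fraction
# of each channel purely for demonstration.
rng = np.random.default_rng(1)
L_f = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
R_f = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
A_L = 0.3 * L_f
A_R = 0.3 * R_f

gamma = 0.95
D_L, D_R = direct_signals(L_f, R_f, A_L, A_R, gamma=gamma)

# Direct plus scaled ambient recovers the original channel spectra.
print(np.allclose(D_L + gamma * A_L, L_f), np.allclose(D_R + gamma * A_R, R_f))
```

With this formulation nothing is lost in the split: whatever the subtraction factor removes from the direct path is exactly what the scaled ambient path carries.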

[0058] Step S40: Determine the target loudspeakers for the direct sound signal and the ambient sound signal, respectively.

[0059] The target loudspeaker refers to a physical sound-emitting unit or its logical grouping used to play a specific type of signal (such as direct sound or ambient sound). In this embodiment, the routing decision of different signals can be determined based on the physical location of the loudspeaker (such as the front door, ceiling, headrest position, etc.), frequency response characteristics, and preset sound field model, so as to optimize the sound field performance of the audio signals played by each loudspeaker.

[0060] For example, based on the characteristics of the direct sound signal and the ambient sound signal, the direct sound signal can be assigned to the loudspeaker near the audience, and the ambient sound signal can be assigned to the surround sound speaker in the target venue; and the generated direct sound signal and ambient sound signal can be transmitted to the corresponding target loudspeaker for optimized playback.

[0061] Optionally, before each speaker plays its corresponding audio signal, DSP (Digital Signal Processing) technology can be used to perform independent gain, equalization, delay, and phase calibration on the signals corresponding to each speaker, ensuring that when all speakers work together, the sound image positioning is accurate, the frequency response is smooth, and the sound field blending is natural.

[0062] For example, the gain and delay of the audio signal corresponding to each speaker can be adjusted according to the distance between each speaker and the listener. For instance, the closer the speaker is, the smaller the gain and the greater the delay of its corresponding audio signal, while the farther the speaker is, the larger the gain and the smaller the delay of its corresponding audio signal, so that the sound from speakers at different distances is perceived by the listener as synchronized.
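A minimal sketch of such a distance-based alignment follows. It assumes free-field propagation, a speed of sound of 343 m/s, and a simple 1/r level model, none of which are stated in the text; the function name `align_speakers` and the distances are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def align_speakers(distances_m):
    """Return per-speaker delay and gain so that arrivals line up at the listener.

    The farthest speaker is the reference (zero delay, unit gain). Nearer
    speakers are delayed by their head start and attenuated in proportion to
    distance, which is an assumed 1/r level model.
    """
    d_max = max(distances_m.values())
    return {
        name: {
            "delay_s": (d_max - d) / SPEED_OF_SOUND_M_S,
            "gain": d / d_max,
        }
        for name, d in distances_m.items()
    }


# Hypothetical listener-to-speaker distances in metres.
settings = align_speakers({"headrest": 0.2, "ceiling": 0.6, "front": 1.0})
```

In this sketch the nearest speaker receives the largest delay and the smallest gain, and the farthest speaker is left untouched.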

[0063] In one feasible implementation, step S40 includes: Step S41: Determine the rate of change of the sound image of each frame of direct sound signal based on the sound image position of the direct sound signal. Sound image position refers to the spatial position information of an audio signal in a sound field, which can be calculated and represented by factors such as signal strength and phase difference between different channels.

[0064] The rate of change of sound image refers to the rate at which the position of the sound image changes over time, reflecting the speed of the sound source's movement in the audio signal. A lower rate of change of sound image indicates that the sound source is basically stationary or moving slowly, and it is usually the core component of the audio signal.

[0065] For example, in the frequency domain, the formula for determining the sound image position of each frame of direct sound signal is as follows:

[0066] Wherein, DL(n,k) and DR(n,k) are the direct sound components extracted from the left and right channels, respectively, n is the frame index, and k is the frequency index; P(n) represents the sound image position of the direct sound signal in the nth frame, and its value range is [-1,1].

[0067] For example, after obtaining the sound image position of each frame of direct sound signal, the sound image change rate of each frame can be obtained by taking the difference, or first derivative, of the sound image positions of adjacent frames.
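The position and change-rate calculation can be illustrated with a short sketch. The position formula itself is not reproduced in this text, so the energy-based index below, P(n) = (E_R - E_L) / (E_R + E_L), is only one common choice that stays within [-1, 1]; its sign convention, the small constant `eps`, and both function names are assumptions.

```python
def image_position(d_left, d_right, eps=1e-12):
    """Energy-based position index in [-1, 1] for one frame (assumed formula)."""
    e_left = sum(abs(v) ** 2 for v in d_left)
    e_right = sum(abs(v) ** 2 for v in d_right)
    return (e_right - e_left) / (e_right + e_left + eps)


def image_change_rates(positions):
    """Absolute first difference between adjacent frames; frame 0 is set to 0."""
    return [0.0] + [abs(b - a) for a, b in zip(positions, positions[1:])]


# Two hypothetical single-bin frames: balanced, then left channel only.
frames = [([1.0 + 0.0j], [1.0 + 0.0j]), ([1.0 + 0.0j], [0.0 + 0.0j])]
positions = [image_position(left, right) for left, right in frames]
rates = image_change_rates(positions)
```

A source that stays put yields rates near zero, while a jump between frames, as in the example, yields a rate close to the size of the jump.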

[0068] Step S42: Determine the direct sound signal with a sound image change rate less than a preset change rate threshold and a sound image position greater than a preset position threshold as the target direct element, and determine the front loudspeaker as the target loudspeaker of the target direct element. Both the rate of change threshold and the position threshold are preset parameters. The rate of change threshold is used to determine whether the rate of change of the sound image is small enough to identify a relatively stable direct sound signal. The position threshold is used to determine whether the position of the sound image is forward enough to determine whether the sound is suitable for playback through the front speaker.

[0069] The target direct element refers to the audio signal component that is selected from the direct sound signal, has a stable sound source and is located in the foreground. It represents the core sound content that needs to be stabilized and reproduced in the front sound field.

[0070] Front speakers refer to a combination of speakers that are physically located in front of the user, such as the center speaker on the dashboard in a vehicle's cabin.

[0071] For example, the calculated sound image change rate and sound image position of each frame of direct sound signal are compared with preset change rate thresholds and position thresholds, respectively. Direct sound signals with sound image change rate less than the threshold and sound image position greater than the threshold are selected and identified as target direct elements. Based on the preset sound field model (such as playing a relatively stable direct sound signal through a front speaker), the front speaker is identified as the target speaker of the target direct element.
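The two threshold comparisons can be sketched as below. The threshold values, the per-frame granularity, and the name `select_target_direct` are hypothetical; the text only states that both thresholds are preset.

```python
def select_target_direct(positions, rates, rate_threshold, position_threshold):
    """Map each frame that passes both tests to the front speaker group.

    A frame qualifies when its sound image change rate is below the rate
    threshold and its sound image position is above the position threshold.
    """
    routing = {}
    for n, (position, rate) in enumerate(zip(positions, rates)):
        if rate < rate_threshold and position > position_threshold:
            routing[n] = "front"
    return routing


# Hypothetical per-frame values.
positions = [0.80, 0.82, 0.10, 0.85]
rates = [0.00, 0.02, 0.72, 0.75]
routing = select_target_direct(positions, rates, rate_threshold=0.1, position_threshold=0.5)
```

Frames 2 and 3 are excluded here because the sound image is moving quickly, even though frame 3 ends up at a position above the threshold.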

[0072] Understandably, by filtering relatively stable direct sound signals based on the position and changes in the sound image, and then distributing them to the front speakers, interference and distortion that may occur when these sounds are played on other speakers can be avoided, making the sound clearer and purer, and improving the overall audio playback quality.

[0073] Step S43: Based on the transient energy ratio and spectral tilt of the ambient sound signal, divide the ambient sound signal into far-field ambient sound elements and surround ambient sound elements. The transient energy ratio (TER) is the ratio of the energy of transient components (such as sudden sounds) to that of steady-state components (such as continuous sounds) in a signal. It reflects the relative intensity of transient components in an audio signal. Transient components are usually associated with sudden changes and impacts in the sound.

[0074] Spectral tilt is a parameter used to describe the degree of tilt in the frequency spectrum of an audio signal. It reflects how fast the energy of the signal changes in different frequency ranges. A positive value (or a high ratio) indicates that the spectrum is tilted towards lower frequencies. The larger the value, the richer the low frequencies and the more severe the high-frequency attenuation.

[0075] Far-field ambient sound elements refer to components in ambient sound signals that have a large spatial diffusion range. They typically have a low transient energy ratio and a high spectral tilt, representing ambient sounds generated at a greater distance. When these sounds reach the listener, their energy is relatively dispersed, such as wind and thunder.

[0076] Surround ambient sound elements refer to components in an ambient sound signal that have a surround effect. They typically have a relatively high transient energy ratio and a low spectral tilt, such as artificial reverberation.

[0077] For example, ambient sound signals with a TER value less than a preset TER threshold and a spectral tilt greater than a preset tilt threshold (significant high-frequency attenuation) can be identified as far-field ambient sound elements, and ambient sound elements with a TER value greater than or equal to the TER threshold and a spectral tilt less than or equal to the tilt threshold (relatively flat spectrum) can be identified as surround ambient sound elements.
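The classification rule above can be sketched as follows. The spectral tilt measure used here (low-band to high-band energy ratio in dB, so that a positive value means a tilt towards low frequencies), the band split, and the threshold values are assumptions; the text also does not say how frames that satisfy neither rule are handled, so the sketch returns `None` for them.

```python
import math


def spectral_tilt_db(magnitudes, split_bin, eps=1e-12):
    """Low-band to high-band energy ratio in dB (assumed tilt measure)."""
    low = sum(m * m for m in magnitudes[:split_bin])
    high = sum(m * m for m in magnitudes[split_bin:])
    return 10.0 * math.log10((low + eps) / (high + eps))


def classify_ambient(ter, tilt, ter_threshold, tilt_threshold):
    """Apply the two rules from the text; mixed cases are left unclassified."""
    if ter < ter_threshold and tilt > tilt_threshold:
        return "far_field"
    if ter >= ter_threshold and tilt <= tilt_threshold:
        return "surround"
    return None


# Hypothetical frame with strong low bins and weak high bins.
tilt = spectral_tilt_db([2.0, 2.0, 1.0, 1.0], split_bin=2)
label = classify_ambient(ter=0.1, tilt=tilt, ter_threshold=0.5, tilt_threshold=3.0)
```

A practical implementation would need a policy for the unclassified case, for example keeping the previous frame's label.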

[0078] Step S44: Determine the target loudspeaker for the far-field ambient sound element as the ceiling loudspeaker, and determine the target loudspeaker for the surround ambient sound element as the headrest loudspeaker.

[0079] Ceiling speakers are audio playback devices installed on the ceiling (vehicle or room ceiling). They are positioned high and can play audio signals over a large area below.

[0080] Headrest speakers are audio playback devices installed around the user's head (such as in the headrest area of a seat) to play audio signals from behind to the listener's ears. Combined with applied delays and the playback effects of other speakers, they can provide listeners with a surround sound effect.

[0081] For example, far-field ambient sound elements and surround ambient sound elements can be distributed to the corresponding target speakers according to a preset sound field model (e.g., far-field ambient sound elements are played through ceiling speakers and surround ambient sound elements are played through headrest speakers). For each speaker's output channel (e.g., left / right ceiling, left / right headrest), a gain value matching the content being played (e.g., higher gain for far-field elements on ceiling speakers, attenuated or zero gain for surround elements on headrest speakers) and a small delay (for sound image alignment) can be configured.

[0082] Understandably, by dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements, and distributing them to ceiling speakers and headrest speakers respectively, it is possible to better simulate the propagation and distribution of sound in space, creating a more realistic and three-dimensional sound field environment, and enhancing the spatial sense and immersiveness of the audio.

[0083] In one possible implementation, prior to step S43, the method further includes: Step S401: Obtain the transient energy and steady-state energy of the ambient sound signal. Transient energy refers to the energy change of an audio signal over a short period of time (usually within a short time window).

[0084] Steady-state energy refers to the energy possessed by the slow-changing, persistent components of an audio signal, reflecting the relatively stable energy state of the audio signal over a relatively long period of time.

[0085] Optionally, the transient energy of the ambient sound signal in the current frame can be obtained by calculating the energy change of the ambient sound signal between the current frame and the previous frame in the frequency domain, and by performing energy change calculations on the adjacent two frames at all frequencies; at the same time, the steady-state energy of the ambient sound signal in the current frame can be obtained by performing integration calculations on the ambient sound signal at all frequencies of the current frame.

[0086] For example, the formulas for calculating transient energy and steady-state energy are as follows:

[0087]

[0088] Where A(n,k) represents the ambient sound signal, n is the frame index, and k is the frequency index; |A(n,k)| represents the amplitude spectrum of the ambient sound signal in the nth frame at frequency k, that is, the energy intensity of the ambient sound signal, obtained by taking the modulus of its complex spectrum; E_s(n) represents the steady-state energy; and E_t(n) represents the transient energy.

[0089] Step S402: Determine the transient energy ratio of the ambient sound signal based on the ratio between the steady-state energy and the transient energy of the ambient sound signal.

[0090] For example, the formula for calculating the transient energy ratio TER(n) of the ambient sound signal in the nth frame in the frequency domain is as follows:

[0091] In this embodiment, by acquiring the transient energy and steady-state energy of the ambient sound signal and calculating the transient energy ratio, it is possible to objectively distinguish between "calm" ambient sound and "active" ambient sound without relying on the specific signal content, so that far-field ambient sound elements and surround ambient sound elements can be distinguished in the subsequent steps.
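The energy terms and their ratio can be illustrated with a short sketch. The formulas are not reproduced in this text, so the forms below are inferred from the prose description only: transient energy as the summed squared change in bin magnitude between adjacent frames, steady-state energy as the summed squared magnitude of the current frame, and TER as transient over steady-state. The function name and the small constant `eps` are assumptions.

```python
def transient_energy_ratio(prev_bins, cur_bins, eps=1e-12):
    """Return (transient, steady, ter) for the current frame.

    transient: sum over k of (|A(n,k)| - |A(n-1,k)|)^2   (assumed form)
    steady:    sum over k of |A(n,k)|^2                  (assumed form)
    """
    transient = sum((abs(c) - abs(p)) ** 2 for p, c in zip(prev_bins, cur_bins))
    steady = sum(abs(c) ** 2 for c in cur_bins)
    return transient, steady, transient / (steady + eps)


# Hypothetical frames: an unchanged frame, then an onset after silence.
calm = transient_energy_ratio([1.0 + 0.0j, 0.5j], [1.0 + 0.0j, 0.5j])
onset = transient_energy_ratio([0.0j, 0.0j], [1.0 + 0.0j, 1.0j])
```

An unchanged frame gives a ratio of zero ("calm"), while an onset after silence gives a ratio near one ("active").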

[0092] For example, Figure 2 provides a schematic diagram of the overall flow of the sound field adjustment method of this application. First, stereo signals are acquired (S101), including the left-channel stereo signal and the right-channel stereo signal, and a short-time Fourier transform is performed on them (S102). Next, the sum and difference signals of the left-channel and right-channel stereo signals are calculated in the frequency domain (S103), and the instantaneous phase difference between the left-channel and right-channel stereo signals is calculated (S104). Based on the calculated stereo sum signal, stereo difference signal, and instantaneous phase difference, the ambient sound signals of the left and right channels are extracted (S105). The ambient sound signals are then removed from the stereo signals to obtain the direct sound signals (S106), and the sound image position and sound image change rate of the direct sound signals are calculated (S107). Direct sound signals with a sound image change rate less than a preset change rate threshold and a sound image position greater than a preset position threshold are identified as target direct elements and assigned to the front speakers (S108). Simultaneously, the transient energy ratio and spectral tilt of the ambient sound signal are calculated (S109), and far-field ambient sound elements (S110) and surround ambient sound elements (S112) are determined on that basis; far-field ambient sound elements are assigned to ceiling speakers (S111), and surround ambient sound elements are assigned to headrest speakers (S113). Finally, multi-channel information fusion is performed (S114), in which each channel signal undergoes independent gain, equalization, delay, and phase calibration so that when all speakers work together, the sound image positioning is accurate, the frequency response is smooth, and the sound field fusion is natural.

[0093] This embodiment provides a sound field adjustment method. By generating the stereo sum and difference signals and introducing the instantaneous phase difference, the method identifies the reverberation, reflection, and other diffuse-field components that represent spatial information in the audio based on the phase difference between the left and right channels, thereby separating ambient sound from direct sound. Furthermore, based on the sound image position and sound image change rate, the method determines the forward and stable target direct elements and assigns them to the front speakers. At the same time, the ambient sound signal is divided into far-field ambient sound elements and surround ambient sound elements, which are assigned to the ceiling speakers and headrest speakers, respectively, to more accurately simulate the sound diffusion effect in a real environment. This spatialized processing enhances the sense of immersion and envelopment.

[0094] The second embodiment of this application builds on the first embodiment; content that is the same as or similar to that of the first embodiment can be found in the description above and is not repeated here. In addition, after step S40, the method further includes: Step S50: Determine the head position of the target user using a preset head position tracker. A head position tracker is a preset sensor or combination of sensors used to detect and track the head position information of a target user.

[0095] Head position refers to the specific location of the user's head in space, which can be represented by coordinate points in a fixed coordinate system.

[0096] For example, a head position tracker can use various technical principles to obtain the position coordinates of the target user's head in three-dimensional space, such as optical tracking (using a camera to capture head feature points), inertial measurement (measuring the motion state of the head through accelerometers, gyroscopes, etc.) or electromagnetic tracking (using changes in electromagnetic fields to determine the head position), etc. This embodiment does not specifically limit the specific implementation.

[0097] Step S51: Determine the head deflection direction based on the head position, and set the head deflection direction as the beamforming direction of the audio signal of each target speaker. Head deflection direction refers to the angle and direction of rotation of the target user's head relative to the initial position or reference direction (such as directly in front) in the horizontal, vertical, or tilt direction, usually represented by Euler angles or quaternions.

[0098] Beamforming is a technique that adjusts the phase and amplitude of the output signals from multiple loudspeakers to enhance audio signals in a specific direction while attenuating them in other directions. The beamforming direction of an audio signal refers to the direction in which the audio signal energy is concentrated and propagates after beamforming processing.

[0099] For example, after determining the head deflection direction of the target user, the head deflection direction can be converted into a target beam pointing angle applicable to each target speaker according to a pre-stored mapping table or calculation formula, and the filtering coefficient (including delay, phase, etc.) of each target speaker can be determined based on the target beam pointing angle to change the synthesis direction of sound wave radiation.
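As an illustration of turning a beam pointing angle into per-speaker delays, the sketch below uses classic delay-and-sum steering. It assumes an idealized linear array along one axis and far-field propagation, which is a simplification that the text does not claim for a vehicle cabin; the function name, the speaker coordinates, and the speed of sound are hypothetical.

```python
import math


def steering_delays(speaker_x_m, angle_deg, speed_of_sound=343.0):
    """Delay-and-sum steering delays for a linear array (assumed geometry).

    speaker_x_m: speaker positions along the array axis, in metres.
    angle_deg: beam angle from broadside, positive towards +x.
    A speaker at larger x is closer to the target direction, so it is delayed
    more; delays are shifted so that the smallest one is zero.
    """
    theta = math.radians(angle_deg)
    raw = [x * math.sin(theta) / speed_of_sound for x in speaker_x_m]
    offset = min(raw)
    return [t - offset for t in raw]


# Hypothetical three-speaker array steered 30 degrees off broadside.
delays = steering_delays([-0.5, 0.0, 0.5], angle_deg=30.0)
```

At 0 degrees all delays are zero and the array radiates broadside; as the head deflection angle grows, the delay spread across the array grows with it.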

[0100] Understandably, beamforming technology focuses sound towards the direction of the user's head, ensuring the sound is always directed at the user, thus enhancing the stability and immersion of the sound field.

[0101] Step S52: Based on the head position and the relative position between each target speaker, perform head-related transfer function compensation on the audio signal of each target speaker so that the sound source position of each audio signal perceived by the target user remains unchanged.

[0102] The head-related transfer function (HRTF) describes how the frequency response of a sound changes as it travels from a sound source to the ear, reflecting the propagation characteristics of sound around structures such as the head, auricle, and torso. HRTF compensation refers to adjusting the time-frequency characteristics of an audio signal so that it is always perceived as originating from a fixed virtual sound source location when it reaches the user's ears.

[0103] For example, based on the detected head deflection direction, HRTFs at different angles can be used to filter the audio signals of each target speaker, and the filtered audio signals can be played through the target speakers. In this way, even if the user's head moves, the position of the sound source perceived by the user remains unchanged.

[0104] Understandably, HRTF compensation solves the problem of sound image drift in non-ideal listening positions (where passengers are frequently moving), and enhances the spatial sense and immersion of the sound by simulating sound emanating from a fixed virtual position.

[0105] In one feasible implementation, step S52 includes: Step S521: For any target loudspeaker, determine the target head-related transfer function filter of the target loudspeaker based on the preset mapping relationship between the relative position and the head-related transfer function filter, and the relative position between the head position and the target loudspeaker. A Head-Related Transfer Function (HRTF) Filter is a digital filter designed based on the HRTF to simulate the spectral characteristics of sound as it reaches the human ear from a specific direction, thereby enabling virtual audio localization in audio signal processing.

[0106] Step S522: Adjust the time-frequency characteristics of the audio signal of the target speaker by passing through the target head-related transfer function filter, wherein the time-frequency characteristics include delay, gain and phase.

[0107] Time-frequency characteristics refer to the properties of an audio signal in two dimensions: time and frequency. These include delay (representing the time delay of sound), gain (representing the intensity of sound), and phase (representing the phase relationship of sound waveforms). These characteristics together determine the auditory effect of the audio signal.

[0108] For example, the head position of the target user is monitored by a head position tracker, and the relative position between the head position and any target speaker is determined. Then, according to the preset mapping relationship between the relative position and the HRTF filter, a target HRTF filter that matches the current relative position is found, and the audio characteristics such as the delay, gain and phase of the audio signal of the target speaker are adjusted by the target HRTF filter, and the adjusted audio signal is output to the target speaker for playback.
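The lookup-and-filter step can be sketched as follows. The sketch reduces the relative position to a single azimuth angle and stores each filter as a short list of FIR taps; both choices are assumptions, and the tap values are placeholders rather than measured HRTF data.

```python
def nearest_filter(azimuth_deg, filter_table):
    """Pick the table entry whose azimuth is closest, with wrap-around at 360."""
    def angular_distance(a):
        return abs((a - azimuth_deg + 180.0) % 360.0 - 180.0)
    return filter_table[min(filter_table, key=angular_distance)]


def fir_filter(signal, taps):
    """Direct-form FIR filtering; output has the same length as the input."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(taps):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out


# Hypothetical mapping from relative azimuth (degrees) to placeholder taps.
table = {-30.0: [0.0, 0.8], 0.0: [1.0], 30.0: [0.5, 0.3]}
taps = nearest_filter(-25.0, table)
filtered = fir_filter([1.0, 2.0, 3.0], taps)
```

The placeholder filter for -30 degrees applies a one-sample delay and a gain of 0.8, which shows how a single FIR filter can adjust delay, gain, and phase together. A practical system would likely interpolate between neighbouring filters to avoid audible switching as the head moves.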

[0109] In this embodiment, the HRTF filter is dynamically adjusted according to the real-time head position and relative position to adapt to the user's head movement and ensure the continuity of the auditory experience. At the same time, the time-frequency characteristics of the audio signal are adjusted by the HRTF filter to simulate the characteristics of sound reaching the human ear in different directions. Even if the user's head moves, the sound source position remains unchanged, which enhances the immersiveness of the sound.

[0110] In one possible implementation, after step S40, the method further includes: Step S60: Collect ambient noise using a preset microphone array. A microphone array is a device consisting of multiple microphones arranged in a specific geometric shape, used to receive sound signals in space.

[0111] Ambient noise refers to sound signals other than the sound currently being played by the speakers, such as the roar of an engine or the sound of wind outside the car window.

[0112] Step S61: Adjust the gain of the audio signal of each target loudspeaker according to the spectrum and intensity of the ambient noise.

[0113] The spectrum refers to the energy distribution of a sound signal at different frequencies; intensity refers to the amplitude energy of a sound signal, usually measured in decibels (dB).

[0114] For example, ambient noise around the user is collected in real time by a microphone array and its spectrum is analyzed to calculate its energy distribution at different frequencies, while the overall signal strength is calculated. Then, based on the spectrum and intensity of the ambient noise (e.g., the noise intensity in the low-frequency band is greater), the gain adjustment parameter of the audio signal is calculated (increasing the gain in the low-frequency band and decreasing the gain in the high-frequency band), and the audio signal of each target speaker is adjusted according to the parameter to mask the ambient noise or reduce noise interference.
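One way to derive such gain adjustment parameters is sketched below. The rule used here (a per-band gain offset proportional to how far the band's noise level sits above or below the mean noise level, clipped to a limit) is an assumption chosen for illustration; the band names, `slope`, and `limit_db` are hypothetical.

```python
def noise_adaptive_gains_db(noise_levels_db, slope=0.5, limit_db=6.0):
    """Per-band gain offsets in dB derived from measured noise levels.

    Bands whose noise is above the mean level are boosted and bands below it
    are cut, with the offset clipped to +/- limit_db.
    """
    mean_level = sum(noise_levels_db.values()) / len(noise_levels_db)
    return {
        band: max(-limit_db, min(limit_db, slope * (level - mean_level)))
        for band, level in noise_levels_db.items()
    }


# Hypothetical cabin noise dominated by low frequencies.
gains = noise_adaptive_gains_db({"low": 70.0, "mid": 60.0, "high": 50.0})
```

With low-frequency noise dominating, the low band is raised and the high band is lowered, matching the example in the text.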

[0115] In this embodiment, by dynamically adjusting the gain of the audio signal according to the spectrum and intensity of the ambient noise, the interference of ambient noise on the target audio can be effectively suppressed, making the audio signal of the target speaker more prominent, thereby improving the clarity of the audio and enhancing the user's listening experience.

[0116] For example, Figure 3 provides a system structure block diagram for the sound field adjustment method, which includes an audio signal analysis module, a speaker allocation module, a multi-channel signal fusion module, target speakers, a microphone array, and a head tracking module. The audio signal analysis module handles audio input, frequency domain conversion, ambient sound extraction, and signal classification. The audio input can be AUX input, Bluetooth input, USB input, online streaming media input, and so on. First, the audio signal analysis module acquires the input stereo signal and performs a frequency domain transformation (short-time Fourier transform) to obtain the stereo signal in the frequency domain. Then, ambient sound is extracted through sum-difference analysis and the instantaneous phase difference, and the direct sound is separated. The signals are then classified according to the spatial attributes of the ambient sound and direct sound (such as sound image position, sound image change rate, transient energy ratio, and spectral tilt). Next, the speaker allocation module assigns speakers to the target direct elements, far-field ambient sound elements, and surround ambient sound elements obtained from the signal classification. The multi-channel signal fusion module then adjusts the delay, gain, and other properties of the audio signal of each target speaker so that the acoustic information perceived by the user remains consistent. It can also perform noise reduction based on the ambient noise collected by the microphone array. The head tracking module detects the user's head deflection and adjusts the time-frequency characteristics of the audio signal accordingly, so that even if the user's head moves, the perceived sound source position remains unchanged. Finally, the adjusted audio signal is played through the target speakers.

[0117] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the sound field adjustment method of this application. Any simple transformations based on this technical concept are all within the protection scope of this application.

[0118] This application also provides a sound field adjustment device. Referring to Figure 4, the sound field adjustment device includes: a sum and difference signal determination module 10, used to determine the stereo sum signal and stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; an ambient sound signal determination module 20, used to determine the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; a direct sound signal determination module 30, used to determine the direct sound signals of the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals of the left and right channels; and a loudspeaker distribution module 40, used to determine the target loudspeakers for the direct sound signal and the ambient sound signal, respectively.

[0119] Optionally, the ambient sound signal determination module 20 is also used to: determine the sign flag of the stereo difference signal based on the diffusion phase difference and a preset sign function; obtain the target difference signal by multiplying the sign flag and the stereo difference signal; determine the ambient sound signal of the left channel based on preset mixing coefficients and a weighted sum of the stereo sum signal and the target difference signal, wherein the mixing coefficients include a sum signal coefficient and a difference signal coefficient, and the sum signal coefficient is greater than the difference signal coefficient; and determine the ambient sound signal of the right channel based on the mixing coefficients and a weighted difference of the stereo sum signal and the target difference signal.

[0120] Optionally, the ambient sound signal determination module 20 is also used to: low-pass filter the weighted sum to obtain the ambient sound signal of the left channel; and low-pass filter the weighted difference to obtain the ambient sound signal of the right channel.
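The ambient extraction performed by module 20 can be sketched per frame as follows. Several details are assumptions made for illustration: the sum and difference signals are taken as (L+R)/2 and (L-R)/2, the sign function returns +1 for a non-negative phase difference and -1 otherwise, the low-pass filter is a brick-wall cut in the frequency domain, and the coefficient values 0.6 and 0.4 are hypothetical (the text only requires the sum coefficient to exceed the difference coefficient).

```python
import cmath


def extract_ambient(left_bins, right_bins, alpha=0.6, beta=0.4, cutoff_bin=None):
    """Return (ambient_left, ambient_right) for one frame of complex STFT bins.

    alpha: sum signal coefficient, beta: difference signal coefficient.
    cutoff_bin: bins at or above this index are zeroed (assumed low-pass).
    """
    if alpha <= beta:
        raise ValueError("sum coefficient must exceed difference coefficient")
    ambient_left, ambient_right = [], []
    for k, (l, r) in enumerate(zip(left_bins, right_bins)):
        s = 0.5 * (l + r)                             # stereo sum signal
        d = 0.5 * (l - r)                             # stereo difference signal
        phase_diff = cmath.phase(l * r.conjugate())   # instantaneous L/R phase difference
        sign = 1.0 if phase_diff >= 0.0 else -1.0     # assumed sign function
        t = sign * d                                  # target difference signal
        keep = cutoff_bin is None or k < cutoff_bin
        ambient_left.append(alpha * s + beta * t if keep else 0j)
        ambient_right.append(alpha * s - beta * t if keep else 0j)
    return ambient_left, ambient_right


# Hypothetical two-bin frame: bin 0 is in phase, bin 1 is 90 degrees apart.
left = [1.0 + 0.0j, 0.0 + 1.0j]
right = [1.0 + 0.0j, 1.0 + 0.0j]
amb_l, amb_r = extract_ambient(left, right)
```

For the in-phase bin the difference signal is zero, so both ambient outputs reduce to the weighted sum signal; for the out-of-phase bin the two outputs differ, which is what gives the ambient channels their width.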

[0121] Optionally, the loudspeaker distribution module 40 is also used to: determine the sound image change rate of each frame of direct sound signal based on the sound image position of the direct sound signal; identify the direct sound signal with a sound image change rate less than a preset change rate threshold and a sound image position greater than a preset position threshold as the target direct element, and identify the front loudspeaker as the target loudspeaker of the target direct element; divide the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal; and determine the ceiling loudspeaker as the target loudspeaker for the far-field ambient sound element, and the headrest loudspeaker as the target loudspeaker for the surround ambient sound element.

[0122] Optionally, the loudspeaker distribution module 40 is also used to, before dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal: acquire the transient energy and steady-state energy of the ambient sound signal; and determine the transient energy ratio of the ambient sound signal based on the ratio between the steady-state energy and the transient energy of the ambient sound signal.

[0123] Optionally, the sound field adjustment device also includes a multi-channel signal fusion module used to: determine the head position of the target user using a preset head position tracker; determine the head deflection direction based on the head position, and set the head deflection direction as the beamforming direction of the audio signal of each target loudspeaker; and, based on the relative position between the head position and each target loudspeaker, perform head-related transfer function compensation on the audio signal of each target loudspeaker so that the sound source position of each audio signal perceived by the target user remains unchanged.

[0124] Optionally, the multi-channel signal fusion module is also used to: for any target loudspeaker, determine the target head-related transfer function filter of the target loudspeaker according to the preset mapping relationship between the relative position and the head-related transfer function filter, and the relative position between the head position and the target loudspeaker; and adjust the time-frequency characteristics of the audio signal of the target loudspeaker using the target head-related transfer function filter, wherein the time-frequency characteristics include delay, gain, and phase.

[0125] Optionally, the multi-channel signal fusion module is also used to: collect ambient noise using a preset microphone array; and adjust the gain of the audio signal of each target loudspeaker according to the spectrum and intensity of the ambient noise.

[0126] The sound field adjustment device provided in this application, employing the sound field adjustment method described in the above embodiments, can solve the technical problem of how to enhance the immersive experience of sound. Compared with the prior art, the beneficial effects of the sound field adjustment device provided in this application are the same as those of the sound field adjustment method provided in the above embodiments, and other technical features in the sound field adjustment device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0127] This application provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the sound field adjustment method in the first embodiment described above.

[0128] Referring now to Figure 5, a structural schematic of an electronic device suitable for implementing embodiments of this application is shown. The electronic devices in these embodiments may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0129] As shown in Figure 5, the electronic device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory 1002 or a program loaded from a storage device 1003 into a random access memory 1004. The random access memory 1004 also stores various programs and data required for the operation of the electronic device. The processing unit 1001, the read-only memory 1002, and the random access memory 1004 are interconnected via a bus 1005. An input/output interface 1006 is also connected to the bus 1005. Typically, the following devices can be connected to the input/output interface 1006: input devices 1007 including, for example, touchscreens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, etc.; output devices 1008 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 1003 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009. The communication device 1009 allows the electronic device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows an electronic device with various devices, it should be understood that not all of the devices shown are required to be implemented or present. More or fewer devices may alternatively be implemented or present.

[0130] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1003, or installed from the read-only memory 1002. When the computer program is executed by the processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0131] The electronic device provided in this application, employing the sound field adjustment method described in the above embodiments, can solve the technical problem of how to enhance the immersive experience of sound. Compared with the prior art, the beneficial effects of the electronic device provided in this application are the same as those of the sound field adjustment method provided in the above embodiments, and other technical features of the electronic device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0132] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0133] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0134] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, which are used to execute the sound field adjustment method in the above embodiments.

[0135] The computer-readable storage medium provided in this embodiment of the application may be, for example, a USB flash drive, but is not limited thereto; it may also be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0136] The aforementioned computer-readable storage medium may be included in an electronic device or may exist independently without being assembled into an electronic device.

[0137] The aforementioned computer-readable storage medium carries one or more programs that, when executed by an electronic device, cause the electronic device to: determine a stereo sum signal and a stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; determine ambient sound signals for the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; determine direct sound signals for the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals for the left and right channels; and determine the target loudspeakers for the direct sound signal and the ambient sound signal, respectively.
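The program steps listed above can be sketched as follows, using the weighted sum and weighted difference form recited in claim 2. This is a hedged illustration under stated assumptions rather than the applicant's implementation: the function name `separate`, the unscaled definitions of the sum and difference signals, the default coefficient values, the convention that the sign function returns +1 for a non-negative phase difference, and the derivation of the direct sound by subtracting the ambient sound from the input are all assumptions. The phase difference is taken as a caller-supplied input, and the low-pass filtering of claim 3 is omitted for brevity.

```python
def separate(left, right, phase_diff, sum_coeff=0.3, diff_coeff=0.2):
    """Split one stereo frame into ambient and direct parts per channel.

    left, right : sample sequences of equal length
    phase_diff  : per-sample diffusion phase difference in radians
                  (assumed to be supplied by the caller)
    Returns (ambient_left, ambient_right, direct_left, direct_right).
    """
    if sum_coeff <= diff_coeff:
        # Claim 2 requires the sum coefficient to exceed the difference one.
        raise ValueError("sum_coeff must be greater than diff_coeff")

    amb_l, amb_r, dir_l, dir_r = [], [], [], []
    for l, r, phi in zip(left, right, phase_diff):
        s = l + r                          # stereo sum signal (assumed unscaled)
        d = l - r                          # stereo difference signal
        flag = 1.0 if phi >= 0 else -1.0   # assumed sign-function convention
        target = flag * d                  # target difference signal
        a_l = sum_coeff * s + diff_coeff * target   # weighted sum
        a_r = sum_coeff * s - diff_coeff * target   # weighted difference
        amb_l.append(a_l)
        amb_r.append(a_r)
        dir_l.append(l - a_l)              # assumption: direct = input - ambient
        dir_r.append(r - a_r)
    return amb_l, amb_r, dir_l, dir_r
```

With these assumed definitions, flipping the sign of the phase difference swaps the left and right ambient outputs, which is the effect the sign flag has on the target difference signal.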

[0138] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0139] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0140] The modules described in the embodiments of this application can be implemented in software or hardware. The names of the modules do not necessarily limit the functionality of the modules themselves.

[0141] The readable storage medium provided in this application is a computer-readable storage medium, which stores computer-readable program instructions (i.e., computer programs) for executing the above-described sound field adjustment method, and can solve the technical problem of how to improve the immersiveness of sound. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as the beneficial effects of the sound field adjustment method provided in the above embodiments, and will not be repeated here.

[0142] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the sound field adjustment method described above.

[0143] The computer program product provided in this application can solve the technical problem of how to improve the immersiveness of sound. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as the beneficial effects of the sound field adjustment method provided in the above embodiments, and will not be repeated here.

[0144] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A sound field adjustment method, characterized in that the sound field adjustment method includes: determining a stereo sum signal and a stereo difference signal based on the acquired left-channel stereo signal and right-channel stereo signal; determining ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and a preset diffusion phase difference, wherein the diffusion phase difference is the instantaneous phase difference between the left-channel stereo signal and the right-channel stereo signal; determining direct sound signals of the left and right channels based on the left-channel stereo signal, the right-channel stereo signal, and the ambient sound signals of the left and right channels; and determining the target loudspeakers for the direct sound signals and the ambient sound signals, respectively.

2. The sound field adjustment method as described in claim 1, characterized in that the step of determining the ambient sound signals of the left and right channels based on the stereo sum signal, the stereo difference signal, and the preset diffusion phase difference includes: determining a sign flag of the stereo difference signal based on the diffusion phase difference and a preset sign function; multiplying the sign flag by the stereo difference signal to obtain a target difference signal; determining the ambient sound signal of the left channel based on a weighted sum of the stereo sum signal and the target difference signal under preset mixing coefficients, wherein the mixing coefficients include a sum signal coefficient and a difference signal coefficient, and the sum signal coefficient is greater than the difference signal coefficient; and determining the ambient sound signal of the right channel based on a weighted difference of the stereo sum signal and the target difference signal under the mixing coefficients.

3. The sound field adjustment method as described in claim 2, characterized in that the step of determining the ambient sound signal of the left channel based on the weighted sum of the stereo sum signal and the target difference signal under the preset mixing coefficients includes: low-pass filtering the weighted sum to obtain the ambient sound signal of the left channel; and the step of determining the ambient sound signal of the right channel based on the weighted difference of the stereo sum signal and the target difference signal under the mixing coefficients includes: low-pass filtering the weighted difference to obtain the ambient sound signal of the right channel.

4. The sound field adjustment method as described in claim 1, characterized in that the step of determining the target loudspeakers for the direct sound signals and the ambient sound signals respectively includes: determining the sound image change rate of each frame of the direct sound signal based on the sound image position of the direct sound signal; identifying a direct sound signal whose sound image change rate is less than a preset change rate threshold and whose sound image position is greater than a preset position threshold as a target direct element, and identifying the front loudspeaker as the target loudspeaker of the target direct element; dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal; and determining the target loudspeaker for the far-field ambient sound elements to be a ceiling loudspeaker, and the target loudspeaker for the surround ambient sound elements to be a headrest loudspeaker.

5. The sound field adjustment method as described in claim 4, characterized in that, before the step of dividing the ambient sound signal into far-field ambient sound elements and surround ambient sound elements based on the transient energy ratio and spectral tilt of the ambient sound signal, the method further includes: acquiring the transient energy and the steady-state energy of the ambient sound signal; and determining the transient energy ratio of the ambient sound signal based on the ratio between the steady-state energy and the transient energy of the ambient sound signal.

6. The sound field adjustment method as described in claim 1, characterized in that, after the step of determining the target loudspeakers for the direct sound signals and the ambient sound signals respectively, the method further includes: determining the head position of the target user by means of a preset head position tracker; determining the head deflection direction based on the head position, and using the head deflection direction as the beamforming direction of the audio signal of each target loudspeaker; and, based on the relative position between the head position and each target loudspeaker, performing head-related transfer function compensation on the audio signal of each target loudspeaker so that the sound source position of each audio signal perceived by the target user remains unchanged.

7. The sound field adjustment method as described in claim 6, characterized in that the step of performing head-related transfer function compensation on the audio signal of each target loudspeaker based on the relative position between the head position and each target loudspeaker includes: for any target loudspeaker, determining the target head-related transfer function filter of that loudspeaker according to a preset mapping between relative position and head-related transfer function filter, and according to the relative position between the head position and the target loudspeaker; and adjusting the time-frequency characteristics of the audio signal of the target loudspeaker by means of the target head-related transfer function filter, wherein the time-frequency characteristics include delay, gain, and phase.

8. The sound field adjustment method as described in claim 1, characterized in that, after the step of determining the target loudspeakers for the direct sound signals and the ambient sound signals respectively, the method further includes: collecting ambient noise using a preset microphone array; and adjusting the gain of the audio signal of each target loudspeaker according to the spectrum and intensity of the ambient noise.

9. An electronic device, characterized in that the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the sound field adjustment method as described in any one of claims 1 to 8.

10. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the sound field adjustment method as described in any one of claims 1 to 8.
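As an illustration of the selection rule recited in claim 4 for target direct elements, the sketch below picks the frames that would be routed to the front loudspeaker. It is not part of the claims and rests on assumptions: the function name `select_front_frames` is hypothetical, the sound image position is taken as one scalar per frame, and the sound image change rate is assumed to be the absolute difference between consecutive frame positions, with the first frame treated as unchanged.

```python
def select_front_frames(image_positions, rate_threshold, position_threshold):
    """Return indices of frames routed to the front loudspeaker.

    A frame qualifies when its sound image change rate is below
    rate_threshold and its sound image position is above
    position_threshold.
    """
    selected = []
    for i, pos in enumerate(image_positions):
        prev = image_positions[i - 1] if i > 0 else pos
        rate = abs(pos - prev)  # assumed definition of the per-frame change rate
        if rate < rate_threshold and pos > position_threshold:
            selected.append(i)
    return selected
```

For instance, with positions `[0.9, 0.92, 0.3, 0.95, 0.96]`, a rate threshold of 0.1, and a position threshold of 0.8, the frames at indices 0, 1, and 4 are selected; frame 3 is excluded because its image has just jumped back from 0.3.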