Audio signal separation method and apparatus, electronic device, and storage medium
By acquiring frequency domain signals from different channels of an audio device and using phase differences to separate coherent sound and ambient sound, the problem of difficulty in separating coherent sound and ambient sound in existing technologies is solved, resulting in a better listening experience and audio quality.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- IFLYTEK (SUZHOU) TECH CO LTD
- Filing Date
- 2025-03-03
- Publication Date
- 2026-07-02
AI Technical Summary
In existing stereo, multi-channel audio, 3D audio, and surround sound audio formats, coherent sound and ambient sound are often mixed together, making them difficult to separate effectively.
By acquiring frequency domain signals from different channels of an audio device, the signals are decomposed into coherent sound signals and ambient sound signals. Under the condition that the ambient sound signals of the channels have equal energy, the phase difference between them is used to determine the target ambient sound signal of each channel, which is then separated from the audio signal to obtain the coherent sound signal.
It achieves a better listening experience, enhances immersion, improves sound quality, and can adapt to personalized needs, providing a more immersive and realistic listening atmosphere.
Abstract
Description
Audio signal separation methods, apparatus, electronic devices and storage media
[0001] Cross-references to related applications
[0002] This application claims priority to Chinese Patent Application No. 2024119004154, filed on December 23, 2024, entitled “Audio Signal Separation Method, Apparatus, Electronic Device and Storage Medium”, which is incorporated herein by reference in its entirety.
Technical Field
[0003] This disclosure relates to the field of audio signal separation technology, and in particular to an audio signal separation method, apparatus, electronic device and storage medium.
Background Technology
[0004] To provide users with a more realistic sound experience in immersive systems such as virtual reality, spatial sound reproduction technology is indispensable, and the extraction of coherent sound and ambient sound helps to achieve flexible spatial sound reproduction.
[0005] However, due to the limitations of existing stereo audio, multi-channel audio, three-dimensional audio and surround sound audio formats, coherent sound and ambient sound are often mixed together. Therefore, how to effectively separate the two has become a current research challenge.
Summary of the Invention
[0006] This disclosure provides an audio signal separation method, apparatus, electronic device, and storage medium to address the shortcomings in related technologies where coherent sound and ambient sound are often mixed together and difficult to separate due to limitations of existing audio formats such as stereo, multi-channel audio, three-dimensional audio, and surround sound.
[0007] This disclosure provides an audio signal separation method, including the following steps:
[0008] Obtain the frequency domain signals corresponding to the audio signals of different channels of the audio device;
[0009] The frequency domain signal is decomposed to obtain the coherent sound signal and ambient sound signal of each channel;
[0010] When the energy corresponding to the ambient sound signals of different channels is the same, the target ambient sound signal of each channel is determined based on the phase difference between the ambient sound signals of different channels.
[0011] Separate the target ambient sound signal of the corresponding channel from the audio signal of each channel to obtain the target coherent sound signal of that channel.
[0012] According to the audio signal separation method provided in this disclosure, the different channels are a first channel and a second channel;
[0013] When the energy corresponding to the ambient sound signals in different channels is the same, determining the target ambient sound signal for each channel based on the phase difference between the ambient sound signals in different channels includes:
[0014] When the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase difference between the first ambient sound signal and the second ambient sound signal.
[0015] According to an audio signal separation method provided in this disclosure, when the first short-time energy of a first ambient sound signal in the first channel and the second short-time energy of a second ambient sound signal in the second channel are the same, determining a first target ambient sound signal and a second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal includes:
[0016] When the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
[0017] According to an audio signal separation method provided in this disclosure, the step of determining the phase angle difference includes:
[0018] Obtain the cross-correlation coefficient between the first frequency domain signal and the second frequency domain signal; the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel;
[0019] The amplitude difference factor is determined based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal;
[0020] The phase angle difference is determined based on the amplitude difference factor.
[0021] According to an audio signal separation method provided in this disclosure, determining the first target ambient sound signal and the second target ambient sound signal based on the phase angle difference between the first ambient sound signal and the second ambient sound signal includes:
[0022] Based on the first target phase angle of the first ambient sound signal, the first target ambient sound signal is determined; based on the first target phase angle and the phase angle difference, the second target phase angle of the second ambient sound signal is determined; based on the second target phase angle, the second target ambient sound signal is determined.
[0023] Alternatively, based on the second target phase angle of the second ambient sound signal, determine the second target ambient sound signal, and based on the second target phase angle and the phase angle difference, determine the first target phase angle of the first ambient sound signal, and based on the first target phase angle, determine the first target ambient sound signal.
[0024] According to the audio signal separation method provided in this disclosure, the step of determining the first target phase angle includes:
[0025] With the goal of minimizing the first short-time energy of the first ambient sound signal, a first target constraint is constructed between the first short-time energy and the first phase angle of the first ambient sound signal.
[0026] Based on the discrete set of the first phase angle and the first target constraint, the first target phase angle is determined; the first target phase angle minimizes the amplitude of the component of the first coherent sound signal in the first frequency domain signal.
[0027] According to the audio signal separation method provided in this disclosure, the step of determining the second target phase angle includes:
[0028] With the goal of minimizing the second short-time energy of the second ambient sound signal, a second target constraint is constructed between the second short-time energy and the second phase angle of the second ambient sound signal.
[0029] Based on the discrete set of the second phase angle and the second target constraint, the second target phase angle is determined; the second target phase angle minimizes the amplitude of the component of the second coherent sound signal in the second frequency domain signal.
[0030] This disclosure also provides an audio signal separation device, including the following units:
[0031] The first acquisition unit is used to acquire the frequency domain signals corresponding to the audio signals of different channels of the audio device.
[0032] The second acquisition unit is used to decompose the frequency domain signal and acquire the coherent sound signal and ambient sound signal of each channel.
[0033] The determining unit is used to determine the target ambient sound signal for each channel based on the phase difference between the ambient sound signals of the different channels when the energy corresponding to the ambient sound signals of the different channels is the same.
[0034] The separation unit is used to separate the target ambient sound signal of the corresponding channel from the audio signal of each channel and obtain the target coherent sound signal of that channel.
[0035] This disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the audio signal separation method described above.
[0036] This disclosure also provides a non-transitory computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the audio signal separation method as described above.
[0037] This disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the audio signal separation method as described above.
[0038] The audio signal separation method, apparatus, electronic device, and storage medium disclosed herein acquire the frequency domain signals corresponding to the audio signals of different channels of an audio device, and decompose the frequency domain signals to acquire the coherent sound signal and ambient sound signal of each channel. Under the condition that the energy corresponding to the ambient sound signals of different channels is the same, the target ambient sound signal of each channel is determined based on the phase difference between the ambient sound signals of different channels. The target ambient sound signal of the corresponding channel is then separated from the audio signal of each channel to acquire the target coherent sound signal of that channel. The separated target coherent sound signal and target ambient sound signal can be further processed by a mixing engineer or a multi-channel algorithm to create a more immersive and realistic auditory atmosphere and improve the user's auditory experience.
Brief Description of the Drawings
[0039] To more clearly illustrate the technical solutions in this disclosure or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0040] Figure 1 is a first flowchart of the audio signal separation method provided in this disclosure;
[0041] Figure 2 is a flowchart illustrating the steps for determining the phase angle difference provided in this disclosure;
[0042] Figure 3 is a flowchart illustrating the steps for determining the first target phase angle provided in this disclosure;
[0043] Figure 4 is a flowchart illustrating the steps for determining the second target phase angle provided in this disclosure;
[0044] Figure 5 is a second flowchart of the audio signal separation method provided in this disclosure;
[0045] Figure 6 is a schematic diagram of the audio signal separation device provided in this disclosure;
[0046] Figure 7 is a schematic diagram of the structure of the electronic device provided in this disclosure.
Detailed Description
[0047] To make the objectives, technical solutions, and advantages of this disclosure clearer, the technical solutions of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.
[0048] The terms “first,” “second,” etc., used in this disclosure are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of this application can be implemented in orders other than those illustrated or described herein, and that the objects distinguished by “first,” “second,” etc., are generally of the same class.
[0049] Figure 1 is a first flowchart of the audio signal separation method provided in this disclosure. As shown in Figure 1, the method includes steps 110, 120, 130 and 140.
[0050] Step 110: Obtain the frequency domain signals corresponding to the audio signals of different channels of the audio device.
[0051] Specifically, the frequency domain signals corresponding to the audio signals of different channels of the audio device can be acquired. Here, the audio device can be a smartphone, a tablet computer, a zone microphone, a surround microphone, or a three-dimensional sound microphone, etc., and this disclosure does not specifically limit it.
[0052] Accordingly, the audio signals of different channels of the audio device can be stereo audio, multi-channel audio, three-dimensional audio and surround sound audio, virtual reality (VR) audio and augmented reality (AR) audio, etc., and this disclosure does not specifically limit them.
[0053] It is understood that when the audio device corresponds to stereo audio, the different channels of the audio device include two channels, namely the first channel and the second channel, for example, the left channel and the right channel; when the audio device corresponds to multi-channel audio, the different channels of the audio device include multiple channels, and this disclosure does not specifically limit this.
[0054] It should be noted that the vast majority of audio signals in daily life are non-stationary, meaning their frequency components change over time. A time-domain representation alone does not show how the frequency content evolves, so it cannot effectively support the processing of non-stationary signals. The Short-Time Fourier Transform (STFT) captures the time-varying frequency characteristics of the audio signal by dividing the audio signal into multiple short time intervals (windows) and performing a Fourier transform on each interval.
[0055] In the frequency domain, coherent sound signals and ambient sound signals typically have different spectral characteristics. For example, coherent sound signals may be concentrated within certain specific frequency ranges, while ambient sound signals may be distributed across a wider frequency range. By analyzing the frequency domain signals, these spectral differences can be used to separate coherent sound signals from ambient sound signals.
[0056] In summary, in order to better apply the method to non-stationary conditions and situations where multiple sound sources exist simultaneously, the audio signal is processed by Short-Time Fourier Transform (STFT) to obtain the frequency domain signal.
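The windowing and per-frame transform described above can be sketched as follows. This is a minimal illustration, not the implementation of this disclosure: the function name `stft`, the periodic Hann window, the frame length and the hop size are all assumptions made for the example, and a direct DFT is used for clarity rather than speed.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Return a list of frames; each frame holds bins k = 0 .. frame_len // 2.

    Hypothetical helper: window type, frame length and hop are illustrative.
    """
    # Periodic Hann window (an assumed choice; the source does not name one).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment, one-sided spectrum only.
        spectrum = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Usage: a cosine with two cycles per 8-sample frame peaks in bin k = 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
X = stft(tone)  # X[i][k] plays the role of X(i, k) for one channel
```

Applying the same function to each channel yields the per-channel frequency domain signals indexed by time frame i and frequency point k.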
[0057] In the extraction of coherent sound signals and ambient sound signals, the audio signal of each channel is usually represented as a superposition of the coherent sound signal and the ambient sound signal. Based on the characteristics of coherent sound signals and ambient sound signals, it is assumed that the coherent sound signals between channels are perfectly correlated, that the coherent sound signal is uncorrelated with the ambient sound signal of each channel, and that the ambient sound signals are uncorrelated between channels. Here, we illustrate this with the case where the audio from the audio device is stereo audio. The different channels of the audio device include the first channel and the second channel, i.e., the left channel and the right channel. Therefore, the audio signals $x_L$ and $x_R$ in the time domain are defined as:
[0058] $$x_L(n) = s(n) + a_L(n), \qquad x_R(n) = \beta\, s(n) + a_R(n) \tag{1}$$
[0059] Where $x_L(n)$ represents the audio signal of the first channel of the audio device, $x_R(n)$ represents the audio signal of the second channel of the audio device, $s(n)$ represents the coherent sound signal, $a_L(n)$ represents the ambient sound signal of the first channel, $a_R(n)$ represents the ambient sound signal of the second channel, and $\beta$ represents the amplitude difference factor of the coherent sound signals of the first and second channels.
[0060] Step 120: Decompose the frequency domain signal to obtain the coherent sound signal and ambient sound signal of each channel.
[0061] Specifically, after obtaining the frequency domain signal, the frequency domain signal can be decomposed to obtain the coherent sound signal and ambient sound signal of each channel.
[0062] Here, we again take the case where the audio from the audio device is stereo audio, so that the different channels of the audio device include a first channel and a second channel. After the short-time Fourier transform, the frequency domain signals are expressed as:
[0063] $$X_L(i,k) = S(i,k) + A_L(i,k), \qquad X_R(i,k) = B(i,k)\,S(i,k) + A_R(i,k) \tag{2}$$
[0064] Where $X_L(i,k)$ represents the frequency domain signal of the first channel, $X_R(i,k)$ represents the frequency domain signal of the second channel, $S(i,k)$ represents the coherent sound signal in the frequency domain, $A_L(i,k)$ represents the ambient sound signal in the frequency domain of the first channel, $A_R(i,k)$ represents the ambient sound signal in the frequency domain of the second channel, $i$ represents the time frame index, $k$ represents the frequency point index, and $B(i,k)$ represents the amplitude difference factor of the coherent sound signals of the first and second channels. For simplicity, $(i,k)$ can be omitted if necessary.
[0065] It is understood that stereo sound mainly includes two components with different properties: one is a directional sound component, called coherent sound; the other is a diffuse sound component with no definite direction, called ambient sound. Coherent sound can be dialogue between people, a solo performance of an instrument, etc., while ambient sound can be background sound effects, such as distant traffic sounds, wind sounds, rain sounds, etc. This disclosure does not specifically limit these.
[0066] It should be noted that when the audio device corresponds to multi-channel audio, ambient sound signals are mostly generated in the rear left and rear right channels of a 5.1 channel audio system, or in the side left and side right channels of a 7.1 channel audio system. Coherent sound signals are mostly generated in the front left, front right, and center channels. The formulas for the corresponding audio signals and frequency domain signals are similar and will not be repeated here.
[0067] Furthermore, when the audio device corresponds to 3D audio, surround sound audio, virtual reality audio, and augmented reality audio, the formulas for the corresponding audio signals and frequency domain signals are similar, and will not be repeated here.
[0068] Step 130: When the energy corresponding to the ambient sound signals of different channels is the same, determine the target ambient sound signal of each channel based on the phase difference between the ambient sound signals of different channels.
[0069] Specifically, this embodiment first assumes that each channel consists of a coherent sound signal and an ambient sound signal, and that the energy of the ambient sound signals in different channels is the same. Therefore, the ambient sound signals in different channels differ only in phase. Based on this, the amplitude expression of the ambient sound signal can be obtained. The key to this method lies in solving for the phase of the ambient sound signal. In this process, the sparsity constraint of audio is used to solve for the optimal phase of the ambient sound signal, thereby separating the coherent sound signal and the ambient sound signal. This method has low computational complexity and a good separation effect.
[0070] Accordingly, when the audio device corresponds to stereo audio, we first assume that the first channel and the second channel each consist of a coherent sound signal and an ambient sound signal, and that the energy of the ambient sound signals in the first channel and the second channel is the same. Therefore, there is only a phase difference between the ambient sound signals in the first channel and the second channel, so the amplitude expression of the ambient sound signal can be obtained.
[0071] In summary, the general principle is to determine the target ambient sound signal for each channel based on the phase difference between the ambient sound signals from different channels, assuming the ambient sound signals from different channels have the same energy. Here, the target ambient sound signal refers to the ambient sound signal that ultimately needs to be separated from the audio signal.
[0072] Step 140: Separate the target ambient sound signal of the corresponding channel from the audio signal of each channel to obtain the target coherent sound signal of that channel.
[0073] Specifically, after obtaining the target ambient sound signal for each channel, the target ambient sound signal for the corresponding channel can be separated from the audio signal of each channel to obtain the target coherent sound signal for that channel. For example, the target ambient sound signal for the corresponding channel can be subtracted from the audio signal of each channel to obtain the target coherent sound signal for that channel. Here, the target coherent sound signal is the final coherent sound signal obtained.
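The subtraction example above can be sketched as follows, assuming each channel is stored as a list of STFT frames of complex bins and that the target ambient sound signal has the same layout. The function name `separate_coherent` and the data layout are illustrative assumptions, not part of this disclosure.

```python
def separate_coherent(channel_frames, ambient_frames):
    """Subtract the target ambient signal from one channel, bin by bin.

    Both arguments are lists of frames, each frame a list of complex bins
    (an assumed layout). The result has the same shape and holds the
    target coherent sound signal of that channel.
    """
    return [
        [x - a for x, a in zip(x_frame, a_frame)]
        for x_frame, a_frame in zip(channel_frames, ambient_frames)
    ]


# Usage with one frame of two bins.
x_left = [[1 + 1j, 2 + 0j]]
ambient_left = [[0.5j, 1 + 0j]]
coherent_left = separate_coherent(x_left, ambient_left)
```

The same call is made once per channel, each time with that channel's own target ambient sound signal.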
[0074] Understandably, the separated target coherent sound signal and target ambient sound signal can be further processed by mixing engineers or multi-channel algorithms to create a more immersive and realistic auditory atmosphere and achieve better auditory effects.
[0075] Furthermore, the significance of separating coherent sound signals from ambient sound signals lies in: 1. Enhancing immersion: By accurately locating coherent sound sources and finely controlling ambient sound, users can feel as if they are actually there. 2. Improving sound quality: Enhancing the clarity and fidelity of sound. 3. Adapting to personalized needs: Providing personalized customization based on the preferences of different users.
[0076] It is understood that the method provided in this disclosure can be applied to multi-channel audio, where mixing engineers or multi-channel algorithms perform secondary processing on the target coherent sound signal and the target ambient sound signal, providing a more immersive audio experience in home theaters or cinemas; when applied to surround sound audio, this audio signal separation method can help improve the playback effect of surround sound; when applied to 3D audio, it can further provide more accurate sound positioning and spatial sense; when applied to virtual reality and augmented reality audio, these immersive technologies require highly realistic spatial audio effects, and this audio signal separation method can help achieve a more dynamic and realistic audio experience.
[0077] The method provided in this disclosure acquires the frequency domain signals corresponding to the audio signals of different channels of an audio device, and decomposes the frequency domain signals to acquire the coherent sound signal and ambient sound signal of each channel. Under the condition that the energy corresponding to the ambient sound signals of different channels is the same, the target ambient sound signal of each channel is determined based on the phase difference between the ambient sound signals of different channels. The target ambient sound signal of the corresponding channel is then separated from the audio signal of each channel to acquire the target coherent sound signal of that channel. The separated target coherent sound signal and target ambient sound signal can be further processed by a mixing engineer or a multi-channel algorithm to create a more immersive and realistic auditory atmosphere and improve the user's auditory experience.
[0078] Based on the above embodiments, the different channels are the first channel and the second channel;
[0079] Step 130 includes:
[0080] Step 131: When the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, determine the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal.
[0081] Specifically, when the audio device corresponds to stereo audio, the different channels of the audio device include a first channel and a second channel. Here, the first channel is the left channel and the second channel is the right channel.
[0082] When the first short-time energy of the first ambient sound signal in the first channel is the same as the second short-time energy of the second ambient sound signal in the second channel, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase difference between the first ambient sound signal and the second ambient sound signal.
[0083] Here, the first target ambient sound signal is the ambient sound signal finally determined in the first channel, and the second target ambient sound signal is the ambient sound signal finally determined in the second channel.
[0084] Assume that the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel in formula (2) are the same, denoted as $P_A$. Based on the relevant assumptions, formula (2) can be expressed in terms of short-time energies as:
[0085] $$P_{X_L} = P_S + P_A, \qquad P_{X_R} = B^2 P_S + P_A \tag{3}$$
[0086] Where $P_{X_L}$ represents the short-time energy of the frequency domain signal in the first channel, and $P_{X_R}$ represents the short-time energy of the frequency domain signal in the second channel. Both the first and second short-time energies of the ambient sound signals are represented by $P_A$, $P_S$ represents the short-time energy of the coherent sound signal, and $B$ represents the amplitude difference factor.
[0087] Here, $P_{X_L}(i,k) = E\{|X_L(i,k)|^2\}$, where $E\{\cdot\}$ represents the short-time average. The short-time energy of the frequency domain signal in the second channel is obtained using the same method.
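The short-time average can be realized in several ways, and the source does not specify one. A minimal sketch, assuming a moving average of the squared magnitude over the most recent frames of a single frequency point (the function name `short_time_energy` and the parameter `num_frames` are hypothetical):

```python
def short_time_energy(bin_values, num_frames=4):
    """Moving average of |X(i, k)|^2 over up to `num_frames` recent frames.

    `bin_values` holds the complex values of one frequency point k across
    successive time frames i. The averaging scheme is an assumption.
    """
    powers = [abs(x) ** 2 for x in bin_values]
    energies = []
    running = 0.0
    for i, p in enumerate(powers):
        running += p
        if i >= num_frames:
            # Drop the frame that just left the averaging window.
            running -= powers[i - num_frames]
        energies.append(running / min(i + 1, num_frames))
    return energies


# Usage: energy of one bin of the first channel, averaged over 2 frames.
p_xl = short_time_energy([1 + 0j, 1j, 2 + 0j, 0j], num_frames=2)
```

Running the same function on the second channel's bins gives the second short-time energy. A recursive (exponentially weighted) average would be an equally plausible choice.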
[0088] It is understandable that, given the frequency domain signal $X_L(i,k)$ of the first channel and the frequency domain signal $X_R(i,k)$ of the second channel, once the parameters $S(i,k)$, $B(i,k)$, $A_L(i,k)$ and $A_R(i,k)$ are estimated, the coherent sound signals and ambient sound signals can be extracted.
[0089] In this embodiment, the phase of the ambient sound signal is constrained by the signal model and assumptions of formula (1), and the phase of the ambient sound signal is estimated by utilizing the sparsity of the coherent sound signal, thus completing the extraction of coherent sound components.
[0090] Based on the above embodiments, step 131, which involves determining the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal when the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, includes:
[0091] Step 1311: When the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, determine the first target ambient sound signal and the second target ambient sound signal based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
[0092] Specifically, when the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
[0093] Specifically, we assume that the ambient sound signal has the same energy in all channels, that is, we assume that the amplitude $|A|$ of the ambient sound signal is the same:
$$A_L = |A|\,e^{j\theta_L}, \qquad A_R = |A|\,e^{j\theta_R} \tag{4}$$
[0094] Where,
[0095] $\theta_L$ represents the first phase angle of the first ambient sound signal in the first channel, and $\theta_R$ represents the second phase angle of the second ambient sound signal in the second channel.
[0096] Substituting formula (4) into formula (2), we get:
[0097] Since |A| is non-negative, the following constraint relationship can be obtained between the first phase angle and the second phase angle:
[0098] Where $\theta$ represents the phase angle of $X_R - BX_L$, and $B$ represents the amplitude difference factor.
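The relationship between the two phase angles can be worked out from the stated model. The derivation below is a sketch reconstructed from formula (2), the equal-amplitude assumption and the definition of $\theta$; it may not match the exact form or numbering of the original formulas, and the symbol $\rho$ for the magnitude of $X_R - BX_L$ is introduced here only for illustration.

```latex
\begin{align*}
X_R - B X_L &= A_R - B A_L
             = |A|\left(e^{j\theta_R} - B\,e^{j\theta_L}\right), \\
|A| &= \frac{X_R - B X_L}{e^{j\theta_R} - B\,e^{j\theta_L}}
     = \frac{\rho\,e^{j\theta}}{e^{j\theta_R} - B\,e^{j\theta_L}},
     \qquad \rho = |X_R - B X_L| \ge 0 .
\end{align*}
% |A| is real and non-negative only if the denominator has phase theta:
\begin{align*}
\sin(\theta_R - \theta) &= B \sin(\theta_L - \theta), \\
\cos(\theta_R - \theta) - B \cos(\theta_L - \theta) &> 0 .
\end{align*}
```

The first condition removes the imaginary part of the ratio and the second keeps its real part positive, so fixing one of the two phase angles determines the other up to this sign check.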
[0099] It is understandable that, given that the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, if the first target phase angle $\theta_L$ is determined first, the second target phase angle $\theta_R$ can be determined based on the phase angle difference between the first and second ambient sound signals. Once the first target phase angle and the second target phase angle are determined, the first target ambient sound signal and the second target ambient sound signal can be determined based on the first target phase angle and the second target phase angle, respectively.
[0100] That is, in formula (2), the frequency domain signal $X_L(i,k)$ of the first channel, the frequency domain signal $X_R(i,k)$ of the second channel and the amplitude difference factor $B$ have been determined. Once either $\theta_L$ or $\theta_R$ is calculated, the coherent sound signals and ambient sound signals can be extracted.
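A minimal per-bin sketch of this step follows. It assumes the model $X_L = S + A_L$, $X_R = BS + A_R$ with equal ambient amplitude $|A|$, and uses the relation $\sin(\theta_R - \theta) = B\sin(\theta_L - \theta)$ that results from requiring $|A|$ to be real and non-negative, where $\theta$ is the phase angle of $X_R - BX_L$. The function name is hypothetical, and the sketch only handles $0 < B \le 1$, where $\theta_R$ is uniquely determined by $\theta_L$; for $B > 1$ one would start from $\theta_R$ instead, which is not shown.

```python
import cmath
import math


def extract_from_theta_l(x_l, x_r, b, theta_l):
    """Given X_L, X_R, B and a chosen theta_L for one bin, return
    (theta_R, A_L, A_R, S), or None if the bin is degenerate.

    Illustrative sketch under the stated model; assumes 0 < b <= 1.
    """
    d = x_r - b * x_l                      # equals A_R - B * A_L
    theta = cmath.phase(d)
    # sin(theta_R - theta) = B * sin(theta_L - theta); clamp for rounding.
    sin_arg = max(-1.0, min(1.0, b * math.sin(theta_l - theta)))
    theta_r = theta + math.asin(sin_arg)
    w_l = cmath.exp(1j * theta_l)
    w_r = cmath.exp(1j * theta_r)
    # Real part of the denominator after rotating by -theta.
    denom = ((w_r - b * w_l) * cmath.exp(-1j * theta)).real
    if denom <= 1e-12:
        return None
    amplitude = abs(d) / denom             # |A|
    a_l = amplitude * w_l
    a_r = amplitude * w_r
    s = x_l - a_l                          # coherent component of channel 1
    return theta_r, a_l, a_r, s


# Usage: build one bin from known components and recover them.
s_true, b = 1.0 + 0.5j, 0.8
x_l = s_true + 0.3 * cmath.exp(0.7j)
x_r = b * s_true + 0.3 * cmath.exp(-1.1j)
result = extract_from_theta_l(x_l, x_r, b, theta_l=0.7)
```

In practice $\theta_L$ is not known in advance; the sketch only shows that once one phase angle is fixed, the remaining quantities of the bin follow directly.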
[0101] Based on the above embodiments, Figure 2 is a flowchart illustrating the steps for determining the phase angle difference provided in this disclosure. As shown in Figure 2, the steps for determining the phase angle difference include:
[0102] Step 210: Obtain the cross-correlation coefficient between the first frequency domain signal and the second frequency domain signal; the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel;
[0103] Step 220: Determine the amplitude difference factor based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal;
[0104] Step 230: Determine the phase angle difference based on the amplitude difference factor.
[0105] Specifically, firstly, the cross-correlation coefficient between the first frequency domain signal and the second frequency domain signal is obtained, wherein the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel.
[0106] The formula for the cross-correlation coefficient is as follows:
[0107] Substituting equation (2) into equation (8) and combining it with the assumptions about the signal model, we get:
[0108] Because formulas (3) and (9) express $B$, $P_S$ and $P_A$ in terms of the short-time energies and $\varphi$, $B$, $P_S$ and $P_A$ can be solved:
[0109] where
[0110] Therefore, the amplitude difference factor B can be determined based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal.
[0111] Based on formula (7), the phase angle difference can be determined based on the amplitude difference factor.
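The following Python sketch shows one way steps 210 to 220 could be carried out. It is not the closed form of the disclosure, whose formulas (3), (8) and (9) are not reproduced here. It assumes that the coherent and ambient components are mutually uncorrelated, so that the first short-time energy is $P_S + P_A$, the second is $B^2 P_S + P_A$, and the un-normalised cross-correlation is $B P_S$. It also assumes a real, positive cross-correlation coefficient $\varphi$ normalised by the square root of the product of the two energies. The function name `amplitude_factor` and the numeric values are hypothetical.

```python
import math


def amplitude_factor(phi: float, p_l: float, p_r: float):
    """Return (B, P_S, P_A) from the cross-correlation coefficient and energies.

    Assumed model: p_l = P_S + P_A, p_r = B^2 * P_S + P_A,
    phi * sqrt(p_l * p_r) = B * P_S.
    """
    r_lr = phi * math.sqrt(p_l * p_r)      # un-normalised cross-correlation
    if r_lr <= 0.0:
        raise ValueError("the assumed model needs a positive cross-correlation")
    half = (p_r - p_l) / (2.0 * r_lr)      # equals (B^2 - 1) / (2 * B)
    b = half + math.sqrt(half * half + 1.0)
    p_s = r_lr / b                         # coherent energy in the first channel
    p_a = p_l - p_s                        # ambient energy, equal in both channels
    return b, p_s, p_a


# Hypothetical ground truth used to build consistent inputs.
b_true, p_s_true, p_a_true = 1.5, 4.0, 0.25
p_l = p_s_true + p_a_true
p_r = b_true ** 2 * p_s_true + p_a_true
phi = b_true * p_s_true / math.sqrt(p_l * p_r)

b, p_s, p_a = amplitude_factor(phi, p_l, p_r)
```

With inputs built from the assumed model, the function returns the values used to build them, which is a quick consistency check on the algebra.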
[0112] Based on the above embodiments, step 1311 includes:
[0113] Based on the first target phase angle of the first ambient sound signal, the first target ambient sound signal is determined; based on the first target phase angle and the phase angle difference, the second target phase angle of the second ambient sound signal is determined; based on the second target phase angle, the second target ambient sound signal is determined.
[0114] Alternatively, based on the second target phase angle of the second ambient sound signal, determine the second target ambient sound signal, and based on the second target phase angle and the phase angle difference, determine the first target phase angle of the first ambient sound signal, and based on the first target phase angle, determine the first target ambient sound signal.
[0115] Specifically, in formula (2), the frequency domain signal $X_L(i,k)$ of the first channel, the frequency domain signal $X_R(i,k)$ of the second channel, and the amplitude difference factor $B$ are already determined. Once either $\theta_L$ or $\theta_R$ is calculated, the coherent sound signals and the ambient sound signals can be extracted.
[0116] That is, the first target ambient sound signal $A_L$ can be determined based on the first target phase angle $\theta_L$ of the first ambient sound signal; the second target phase angle $\theta_R$ of the second ambient sound signal can be determined based on $\theta_L$ and the phase angle difference; and the second target ambient sound signal $A_R$ can be determined based on $\theta_R$.
[0117] Alternatively, the second target ambient sound signal $A_R$ can be determined based on the second target phase angle $\theta_R$ of the second ambient sound signal; the first target phase angle $\theta_L$ of the first ambient sound signal can be determined based on $\theta_R$ and the phase angle difference; and the first target ambient sound signal $A_L$ can be determined based on $\theta_L$.
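The second alternative above can be sketched in Python for a single time-frequency bin. The sketch assumes the model $X_L = S + A_L$, $X_R = B S + A_R$ with equal ambient magnitudes and $B \ge 1$. Under these assumptions, requiring $A_R - B A_L$ to have phase angle $\theta$ gives $\theta_L = \theta + \pi - \arcsin\left(\sin(\theta_R - \theta)/B\right)$. This relation is derived for the example and may differ in form from formula (7) of the disclosure. The function name `separate_bin` and the numeric values are hypothetical.

```python
import cmath
import math


def separate_bin(x_l, x_r, b, theta_r):
    """Split one time-frequency bin into ambient and coherent parts.

    Assumes X_L = S + A_L, X_R = B*S + A_R, |A_L| = |A_R| and B >= 1.
    Returns (a_l, a_r, s_l, s_r).
    """
    d = x_r - b * x_l                      # equals A_R - B*A_L under the model
    theta = cmath.phase(d)
    # The component of e^{j theta_R} - B e^{j theta_L} orthogonal to d must vanish.
    theta_l = theta + math.pi - math.asin(math.sin(theta_r - theta) / b)
    unit = cmath.exp(1j * theta_r) - b * cmath.exp(1j * theta_l)
    if abs(unit) < 1e-12:
        raise ValueError("degenerate bin: ambient magnitude is undetermined")
    mag = abs(d) / abs(unit)               # shared ambient magnitude
    a_l = mag * cmath.exp(1j * theta_l)
    a_r = mag * cmath.exp(1j * theta_r)
    return a_l, a_r, x_l - a_l, x_r - a_r


# Hypothetical bin, used only to check the algebra.
s = 2.0 * cmath.exp(0.3j)
a_l_true = 0.5 * cmath.exp(1.1j)
a_r_true = 0.5 * cmath.exp(-2.0j)
b = 1.5
a_l, a_r, s_l, s_r = separate_bin(s + a_l_true, b * s + a_r_true, b, theta_r=-2.0)
```

When the true second phase angle is supplied, the sketch recovers the ambient and coherent components used to build the bin. In practice the phase angle is unknown and has to be estimated, for example by the sparsity-based search described in the following embodiments.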
[0118] Based on the above embodiments, Figure 3 is a flowchart illustrating the steps for determining the first target phase angle provided in this disclosure. As shown in Figure 3, the steps for determining the first target phase angle include:
[0119] Step 310: With the goal of minimizing the first short-time energy of the first ambient sound signal, construct a first target constraint between the first short-time energy and the first phase angle of the first ambient sound signal;
[0120] Step 320: Based on the discrete set of the first phase angle and the first target constraint, determine the first target phase angle; the first target phase angle minimizes the amplitude of the component of the first coherent acoustic signal in the first frequency domain signal.
[0121] Specifically, coherent sound signals are sparse, a property that is widely exploited for audio and music signals, so this sparsity can be used to determine the phase angle of an ambient sound signal. That is, with the goal of minimizing the first short-time energy of the first ambient sound signal, a first target constraint is constructed between the first short-time energy and the first phase angle of the first ambient sound signal, as shown in the following formula:
[0122] where $\theta_L$ denotes the first phase angle and the other term denotes the first short-time energy.
[0123] The objective function in formula (10) is non-convex, so in this embodiment a discrete optimization method is used to determine $\theta_L$. Because $\theta_L$ lies within the interval from $-\pi$ to $\pi$, the set of discrete values of the first phase angle is defined as follows:
[0124] where $d \in \{1, 2, \ldots, D\}$ and $D$ is the total number of discrete phase values; in this embodiment, $D = 100$. Among these discrete phase values, the first phase angle that minimizes the amplitude of the first coherent sound signal component in the first frequency domain signal is selected as the first target phase angle.
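The discrete search described above can be sketched in Python as follows. The grid $\theta(d) = (2d/D - 1)\pi$ is an assumption, because the definition of the discrete set is not reproduced in this passage. The callable `ambient_for`, which maps a candidate phase angle to an ambient estimate, and the fixed ambient magnitude in the demonstration are likewise hypothetical simplifications rather than the method of the disclosure.

```python
import cmath
import math


def phase_grid(num_values=100):
    """Candidate angles (2d/D - 1)*pi for d = 1..D (assumed grid)."""
    return [(2.0 * d / num_values - 1.0) * math.pi for d in range(1, num_values + 1)]


def best_phase(x, ambient_for, num_values=100):
    """Return the grid angle whose ambient estimate leaves the smallest
    coherent magnitude |x - ambient_for(angle)| in this bin."""
    return min(phase_grid(num_values), key=lambda angle: abs(x - ambient_for(angle)))


# Hypothetical bin: an ambient magnitude of 0.5 is taken as known for the demo.
x = 2.0 * cmath.exp(0.3j)
theta_hat = best_phase(x, lambda angle: 0.5 * cmath.exp(1j * angle))
```

An exhaustive search over $D$ candidates avoids the local minima that a gradient method could encounter on a non-convex objective, at a cost that grows only linearly with $D$ per bin.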
[0125] Based on the above embodiments, Figure 4 is a flowchart illustrating the steps for determining the second target phase angle provided in this disclosure. As shown in Figure 4, the steps for determining the second target phase angle include:
[0126] Step 410: With the goal of minimizing the second short-time energy of the second ambient sound signal, construct a second target constraint between the second short-time energy and the second phase angle of the second ambient sound signal;
[0127] Step 420: Based on the discrete set of the second phase angle and the second target constraint, determine the second target phase angle; the second target phase angle minimizes the amplitude of the components of the second coherent acoustic signal in the second frequency domain signal.
[0128] Specifically, as described above, the sparsity of coherent sound signals can be used to determine the phase angle of an ambient sound signal. That is, with the goal of minimizing the second short-time energy of the second ambient sound signal, a second target constraint is constructed between the second short-time energy and the second phase angle of the second ambient sound signal, as shown in the following formula:
[0129] where $\theta_R$ denotes the second phase angle and the other term denotes the second short-time energy.
[0130] The objective function in formula (11) is non-convex, so in this embodiment a discrete optimization method is used to determine $\theta_R$. Because $\theta_R$ lies within the interval from $-\pi$ to $\pi$, the set of discrete values of the second phase angle is defined as follows:
[0131] where $d \in \{1, 2, \ldots, D\}$ and $D$ is the total number of discrete phase values; in this embodiment, $D = 100$. Among these discrete phase values, the second phase angle that minimizes the amplitude of the second coherent sound signal component in the second frequency domain signal is selected as the second target phase angle.
[0132] Based on any of the above embodiments, Figure 5 is a second schematic flowchart of the audio signal separation method provided in this disclosure. As shown in Figure 5, the method includes:
[0133] Step 1: Acquire the audio signals of the different channels of the audio device, and perform a short-time Fourier transform on them to obtain the frequency domain signal corresponding to the audio signal of each channel.
[0134] Step 2: Decompose the frequency domain signals to obtain the coherent sound signal and the ambient sound signal of each channel.
[0135] Step 3: Taking the different channels to be the first channel and the second channel, obtain the cross-correlation coefficient between the first frequency domain signal and the second frequency domain signal, where the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel.
[0136] Step 4: Determine the amplitude difference factor based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal.
[0137] Step 5: Determine the phase angle difference based on the amplitude difference factor.
[0138] Step 6: When the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, with the goal of minimizing the first short-time energy of the first ambient sound signal, construct a first target constraint between the first short-time energy and the first phase angle of the first ambient sound signal. Then, based on the discrete value set of the first phase angle and the first target constraint, determine the first target phase angle, wherein the first target phase angle minimizes the amplitude of the component of the first coherent sound signal in the first frequency domain signal.
[0139] Alternatively, if the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, a second target constraint is constructed between the second short-time energy and the second phase angle of the second ambient sound signal, with the goal of minimizing the second short-time energy of the second ambient sound signal. Then, based on the discrete value set of the second phase angle and the second target constraint, the second target phase angle is determined, wherein the second target phase angle minimizes the amplitude of the component of the second coherent sound signal in the second frequency domain signal.
[0140] In summary, the above method uses the sparsity of coherent sound signals to determine the phase of the ambient sound signals, selecting the phase value that minimizes the amplitude of the coherent sound signal component.
[0141] Step 7: Based on the first target phase angle of the first ambient sound signal, determine the first target ambient sound signal; based on the first target phase angle and the phase angle difference, determine the second target phase angle of the second ambient sound signal; based on the second target phase angle, determine the second target ambient sound signal.
[0142] Alternatively, based on the second target phase angle of the second ambient sound signal, determine the second target ambient sound signal, and based on the second target phase angle and the phase angle difference, determine the first target phase angle of the first ambient sound signal, and based on the first target phase angle, determine the first target ambient sound signal.
[0143] Step 8: Separate the first target ambient sound signal from the audio signal of the first channel to obtain the target coherent sound signal of the first channel. Then separate the second target ambient sound signal from the audio signal of the second channel to obtain the target coherent sound signal of the second channel.
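The short-time Fourier transform that opens the flow above can be sketched in plain Python as follows. The frame length, hop size and Hann window are assumptions chosen for the example, since the disclosure does not state them here, and the naive transform is written for readability rather than speed. The name `stft` is a hypothetical helper, not a library function.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Return a list of frames; each frame holds frame_len // 2 + 1 complex bins."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]                 # periodic Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)           # non-negative frequencies
        ]
        frames.append(bins)
    return frames


# Hypothetical test tone with two cycles per 8-sample frame.
tone = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(16)]
spec = stft(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: abs(spec[0][k]))
```

Applying the same transform to each channel yields the per-bin values $X_L(i,k)$ and $X_R(i,k)$ on which the later steps operate, with $i$ indexing frames and $k$ indexing frequency bins in this sketch.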
[0144] The method provided in this disclosure first assumes that the first channel and the second channel are each composed of a coherent sound signal and an ambient sound signal, and that the ambient sound signals in the first and second channels have the same energy, so that the first and second ambient sound signals differ only in phase. The key to the method is therefore solving for the phase of the ambient sound signal, for which the sparsity constraint of audio is used to find the optimal phase. Because the phases of the ambient sound signals in the first and second channels are constrained by each other, solving for one of them determines the other. The method has low computational complexity and good separation performance.
[0145] The audio signal separation apparatus provided in this disclosure is described below. The audio signal separation apparatus described below can be referred to in correspondence with the audio signal separation method described above.
[0146] Based on any of the above embodiments, this disclosure provides an audio signal separation device. Figure 6 is a schematic diagram of the structure of the audio signal separation device provided in this disclosure. As shown in Figure 6, the device includes:
[0147] The first acquisition unit 610 is used to acquire the frequency domain signals corresponding to the audio signals of different channels of the audio device.
[0148] The second acquisition unit 620 is used to decompose the frequency domain signal and acquire the coherent sound signal and ambient sound signal of each channel.
[0149] The determining unit 630 is used to determine the target ambient sound signal for each channel based on the phase difference between the ambient sound signals of the different channels when the energy corresponding to the ambient sound signals of the different channels is the same.
[0150] The separation unit 640 is used to separate the target ambient sound signal of the corresponding channel from the audio signal of each channel and obtain the target coherent sound signal of that channel.
[0151] The apparatus provided in this disclosure acquires the frequency domain signals corresponding to the audio signals of different channels of an audio device, and decomposes the frequency domain signals to obtain the coherent sound signal and ambient sound signal of each channel. Under the condition that the energy corresponding to the ambient sound signals of the different channels is the same, the target ambient sound signal of each channel is determined based on the phase difference between the ambient sound signals of the different channels. Finally, the target ambient sound signal of the corresponding channel is separated from the audio signal of each channel to obtain the target coherent sound signal of that channel. The separated target coherent sound signal and target ambient sound signal can then be further processed by a mixing engineer or a multi-channel algorithm to create a more immersive auditory atmosphere and improve the user's listening experience.
[0152] Based on any of the above embodiments, the different channels are the first channel and the second channel;
[0153] The determining unit 630 specifically includes:
[0154] A determination subunit, used to determine the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal when the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same.
[0155] Based on any of the above embodiments, the determining subunit is specifically used for:
[0156] When the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
[0157] Based on any of the above embodiments, a phase angle difference determination unit is further included, wherein the phase angle difference determination unit is specifically used for:
[0158] Obtain the cross-correlation coefficient between the first frequency domain signal and the second frequency domain signal; the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel;
[0159] The amplitude difference factor is determined based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal;
[0160] The phase angle difference is determined based on the amplitude difference factor.
[0161] Based on any of the above embodiments, the determining subunit is specifically used for:
[0162] Based on the first target phase angle of the first ambient sound signal, the first target ambient sound signal is determined; based on the first target phase angle and the phase angle difference, the second target phase angle of the second ambient sound signal is determined; based on the second target phase angle, the second target ambient sound signal is determined.
[0163] Alternatively, based on the second target phase angle of the second ambient sound signal, determine the second target ambient sound signal, and based on the second target phase angle and the phase angle difference, determine the first target phase angle of the first ambient sound signal, and based on the first target phase angle, determine the first target ambient sound signal.
[0164] Based on any of the above embodiments, a first target phase angle determination unit is further included, wherein the first target phase angle determination unit is specifically used for:
[0165] With the goal of minimizing the first short-time energy of the first ambient sound signal, a first target constraint is constructed between the first short-time energy and the first phase angle of the first ambient sound signal.
[0166] Based on the discrete set of the first phase angle and the first target constraint, the first target phase angle is determined; the first target phase angle minimizes the amplitude of the component of the first coherent acoustic signal in the first frequency domain signal.
[0167] Based on any of the above embodiments, a second target phase angle determination unit is further included, the second target phase angle determination unit being specifically used for:
[0168] With the goal of minimizing the second short-time energy of the second ambient sound signal, a second target constraint is constructed between the second short-time energy and the second phase angle of the second ambient sound signal.
[0169] Based on the discrete set of the second phase angle and the second target constraint, the second target phase angle is determined; the second target phase angle minimizes the amplitude of the component of the second coherent acoustic signal in the second frequency domain signal.
[0170] Figure 7 is a schematic diagram of the structure of the electronic device provided in this disclosure. As shown in Figure 7, the electronic device may include: a processor 710, a communication interface 720, a memory 730, and a communication bus 740. The processor 710, communication interface 720, and memory 730 communicate with each other via the communication bus 740. The processor 710 can call logical instructions in the memory 730 to execute an audio signal separation method. This method includes: acquiring the frequency domain signals corresponding to the audio signals of different channels of the audio device; decomposing the frequency domain signals to acquire the coherent sound signal and ambient sound signal of each channel; when the energy corresponding to the ambient sound signals of different channels is the same, determining the target ambient sound signal of each channel based on the phase difference between the ambient sound signals of different channels; separating the target ambient sound signal of the corresponding channel from the audio signal of each channel to acquire the target coherent sound signal of that channel.
[0171] Furthermore, the logical instructions in the aforementioned memory 730 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0172] On the other hand, this disclosure also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the audio signal separation method provided by the above methods. The method includes: acquiring frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to acquire coherent sound signals and ambient sound signals of each channel; determining the target ambient sound signal of each channel based on the phase difference between the ambient sound signals of different channels when the energy corresponding to the ambient sound signals of different channels is the same; separating the target ambient sound signal of the corresponding channel from the audio signal of each channel to acquire the target coherent sound signal of that channel.
[0173] In another aspect, this disclosure also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the audio signal separation method provided by the above methods. The method includes: acquiring frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to acquire coherent sound signals and ambient sound signals of each channel; when the energy corresponding to the ambient sound signals of different channels is the same, determining the target ambient sound signal of each channel based on the phase difference between the ambient sound signals of different channels; separating the target ambient sound signal of the corresponding channel from the audio signal of each channel to acquire the target coherent sound signal of that channel.
[0174] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0175] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0176] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure.
Claims
1. An audio signal separation method, comprising: acquiring frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to obtain a coherent sound signal and an ambient sound signal of each channel; when the energy corresponding to the ambient sound signals of the different channels is the same, determining a target ambient sound signal of each channel based on the phase difference between the ambient sound signals of the different channels; and separating the target ambient sound signal of the corresponding channel from the audio signal of each channel to obtain a target coherent sound signal of that channel.
2. The audio signal separation method of claim 1, wherein the different channels are a first channel and a second channel; and determining, when the energy corresponding to the ambient sound signals of the different channels is the same, the target ambient sound signal of each channel based on the phase difference between the ambient sound signals of the different channels comprises: when the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, determining the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal.
3. The audio signal separation method of claim 2, wherein determining, when the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal comprises: when the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, determining the first target ambient sound signal and the second target ambient sound signal based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
4. The audio signal separation method of claim 3, wherein determining the phase angle difference comprises: obtaining the cross-correlation coefficient between a first frequency domain signal and a second frequency domain signal, the first frequency domain signal being the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal being the frequency domain signal corresponding to the audio signal of the second channel; determining the amplitude difference factor based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal; and determining the phase angle difference based on the amplitude difference factor.
5. The audio signal separation method of claim 2, wherein determining the first target ambient sound signal and the second target ambient sound signal based on the phase angle difference between the first ambient sound signal and the second ambient sound signal comprises: determining the first target ambient sound signal based on the first target phase angle of the first ambient sound signal, determining the second target phase angle of the second ambient sound signal based on the first target phase angle and the phase angle difference, and determining the second target ambient sound signal based on the second target phase angle; or determining the second target ambient sound signal based on the second target phase angle of the second ambient sound signal, determining the first target phase angle of the first ambient sound signal based on the second target phase angle and the phase angle difference, and determining the first target ambient sound signal based on the first target phase angle.
6. The audio signal separation method of claim 5, wherein determining the first target phase angle comprises: with the goal of minimizing the first short-time energy of the first ambient sound signal, constructing a first target constraint between the first short-time energy and the first phase angle of the first ambient sound signal; and determining the first target phase angle based on the discrete set of the first phase angle and the first target constraint, wherein the first target phase angle minimizes the amplitude of the component of the first coherent sound signal in the first frequency domain signal.
7. The audio signal separation method of claim 5, wherein determining the second target phase angle comprises: with the goal of minimizing the second short-time energy of the second ambient sound signal, constructing a second target constraint between the second short-time energy and the second phase angle of the second ambient sound signal; and determining the second target phase angle based on the discrete set of the second phase angle and the second target constraint, wherein the second target phase angle minimizes the amplitude of the component of the second coherent sound signal in the second frequency domain signal.
8. An audio signal separation device, comprising: The first acquisition unit is used to acquire the frequency domain signals corresponding to the audio signals of different channels of the audio device. The second acquisition unit is used to decompose the frequency domain signal and acquire the coherent sound signal and ambient sound signal of each channel. The determining unit is used to determine the target ambient sound signal for each channel based on the phase difference between the ambient sound signals of the different channels when the energy corresponding to the ambient sound signals of the different channels is the same. The separation unit is used to separate the target ambient sound signal of the corresponding channel from the audio signal of each channel and obtain the target coherent sound signal of that channel.
9. The audio signal separating apparatus according to claim 8, wherein The different channels are the first channel and the second channel; When the energy corresponding to the ambient sound signals in different channels is the same, determining the target ambient sound signal for each channel based on the phase difference between the ambient sound signals in different channels includes: When the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase difference between the first ambient sound signal and the second ambient sound signal.
10. The audio signal separating apparatus according to claim 9, wherein When the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, determining the first target ambient sound signal and the second target ambient sound signal based on the phase difference between the first ambient sound signal and the second ambient sound signal includes: When the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, the first target ambient sound signal and the second target ambient sound signal are determined based on the phase angle difference between the first ambient sound signal and the second ambient sound signal.
11. The audio signal separation apparatus according to claim 10, wherein determining the phase angle difference comprises: obtaining a cross-correlation coefficient between a first frequency domain signal and a second frequency domain signal, the first frequency domain signal being the frequency domain signal corresponding to the audio signal of the first channel and the second frequency domain signal being the frequency domain signal corresponding to the audio signal of the second channel; determining an amplitude difference factor based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal; and determining the phase angle difference based on the amplitude difference factor.
12. The audio signal separation apparatus according to claim 9, wherein determining the first target ambient sound signal and the second target ambient sound signal based on the phase angle difference between the first ambient sound signal and the second ambient sound signal comprises: determining the first target ambient sound signal based on a first target phase angle of the first ambient sound signal, determining a second target phase angle of the second ambient sound signal based on the first target phase angle and the phase angle difference, and determining the second target ambient sound signal based on the second target phase angle; or determining the second target ambient sound signal based on a second target phase angle of the second ambient sound signal, determining a first target phase angle of the first ambient sound signal based on the second target phase angle and the phase angle difference, and determining the first target ambient sound signal based on the first target phase angle.
13. The audio signal separation apparatus according to claim 12, wherein determining the first target phase angle comprises: constructing, with the goal of minimizing the first short-time energy of the first ambient sound signal, a first target constraint between the first short-time energy and a first phase angle of the first ambient sound signal; and determining the first target phase angle based on a discrete set of values of the first phase angle and the first target constraint, wherein the first target phase angle minimizes the amplitude of the component of the first coherent sound signal in the first frequency domain signal.
14. The audio signal separation apparatus according to claim 12, wherein determining the second target phase angle comprises: constructing, with the goal of minimizing the second short-time energy of the second ambient sound signal, a second target constraint between the second short-time energy and a second phase angle of the second ambient sound signal; and determining the second target phase angle based on a discrete set of values of the second phase angle and the second target constraint, wherein the second target phase angle minimizes the amplitude of the component of the second coherent sound signal in the second frequency domain signal.
15. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the audio signal separation method as described in any one of claims 1 to 7.
16. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the audio signal separation method as described in any one of claims 1 to 7.
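
For illustration only, and without limiting the claims, the quantities recited in claims 11 to 14 may be sketched as follows for a single frequency bin observed over a short run of frames. All function names are hypothetical. The sign convention for the phase angle difference (second angle equals first angle plus the difference), the uniform grid used as the discrete set of phase angles, and the toy energy function in the usage example are assumptions made for this sketch. The mapping from the cross-correlation coefficient and short-time energies to the amplitude difference factor is not reproduced here, because its formula is not stated in the claims.

```python
import cmath
import math


def short_time_energy(frames):
    """Mean squared magnitude of one frequency bin over a short run of frames."""
    return sum(abs(x) ** 2 for x in frames) / len(frames)


def cross_correlation_coefficient(x1, x2):
    """Normalized cross-correlation magnitude of two bins, between 0 and 1."""
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2)) / len(x1)
    denom = math.sqrt(short_time_energy(x1) * short_time_energy(x2))
    return abs(cross) / denom if denom > 0 else 0.0


def wrap_phase(theta):
    """Wrap an angle into the interval [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def second_phase_from_first(theta1, delta):
    """Assumed convention: second target angle = first target angle + difference."""
    return wrap_phase(theta1 + delta)


def search_target_phase(energy_of_phase, num_angles=64):
    """Pick, from a uniform discrete set of angles, the one with minimum energy.

    energy_of_phase is a caller-supplied function standing in for the target
    constraint that relates short-time energy to the phase angle.
    """
    candidates = [2 * math.pi * k / num_angles for k in range(num_angles)]
    return min(candidates, key=energy_of_phase)


def ambient_from_phase(amplitude, theta):
    """Build a complex ambient-sound value from an amplitude and a phase angle."""
    return amplitude * cmath.exp(1j * theta)


if __name__ == "__main__":
    # Toy data: two channels of one frequency bin over four frames.
    x1 = [1 + 1j, 2 - 1j, -0.5 + 0.3j, 0.2 + 0.9j]
    x2 = [0.8 + 1.1j, 1.9 - 0.7j, -0.4 + 0.5j, 0.1 + 1.0j]
    print("energies:", short_time_energy(x1), short_time_energy(x2))
    print("cross-correlation:", cross_correlation_coefficient(x1, x2))

    # Hypothetical energy function: residual left after removing a
    # unit-amplitude ambient component with phase t from the value 1 + 1j.
    theta1 = search_target_phase(lambda t: abs((1 + 1j) - cmath.exp(1j * t)) ** 2)
    theta2 = second_phase_from_first(theta1, math.pi / 3)  # assumed difference
    print("target ambient values:",
          ambient_from_phase(1.0, theta1), ambient_from_phase(1.0, theta2))
```

A grid search is used here because the claims recite a discrete set of phase angles; a finer grid trades computation for angular resolution. Under the equal-amplitude condition of claim 10, the same amplitude would be passed to `ambient_from_phase` for both channels, so that the two target ambient values differ only in phase.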