An audio signal processing method, device and storage medium

By collecting the signal-to-noise ratio of audio signals, determining the target gain, and performing adaptive processing, the problem of noise signals being amplified equally in existing technologies is solved, thereby improving the speech intelligibility and loudness of assistive hearing devices.

CN115714948B · Active · Publication Date: 2026-06-16 · BEIJING XIAOMI MOBILE SOFTWARE CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Filing Date
2022-09-30
Publication Date
2026-06-16

Abstract

The present disclosure relates to an audio signal processing method, device and storage medium. The audio signal processing method comprises: collecting an audio signal and determining a signal-to-noise ratio of the audio signal; determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold; and performing gain processing on the audio signal based on the target gain. The present disclosure can improve the gain effect of gain processing on the audio signal.

Description

Technical Field

[0001] This disclosure relates to the field of acoustic and electronic technology, and in particular to an audio signal processing method, apparatus and storage medium.

Background Technology

[0002] The number of people with hearing loss worldwide is rising rapidly. The most commonly used assistive devices for hearing loss are digital hearing aids, whose core technologies include: Wide Dynamic Range Compression (WDRC), speech enhancement, echo suppression, frequency reduction algorithms, scene recognition, and sound source localization. In reality, the proportion of people with mild to moderate hearing loss is relatively high, and hearing aids are more effective for this group.

[0003] In related technologies, adaptive WDRC methods use the discomfort threshold and hearing threshold of the hearing-impaired individual as individualized parameters, so that the gain effect of WDRC more closely matches the wearer. However, WDRC gain curves are generally designed by directly adopting the fitting formulas used in digital hearing aids, commonly including Prescription of Gain and Output (POGO), NAL-RP, FIG6, NAL-NL2, and CAM2. These fitting formulas do not consider the quality of the received signal in the actual environment. Designing hearing aids solely on the basis of these formulas may amplify noise signals by the same amount as speech, and thus fail to improve the hearing-impaired individual's understanding of sound signals.

Summary of the Invention

[0004] To overcome the problems existing in related technologies, this disclosure provides an audio signal processing method, apparatus and storage medium.

[0005] According to a first aspect of the present disclosure, an audio signal processing method is provided, the method comprising:

[0006] Acquire an audio signal and determine the signal-to-noise ratio (SNR) of the audio signal; determine a target gain based on the SNR and a preset SNR threshold; and perform gain processing on the audio signal based on the target gain.

[0007] In one embodiment, determining the target gain based on the signal-to-noise ratio and a preset signal-to-noise ratio threshold includes:

[0008] If the signal-to-noise ratio (SNR) is less than a first SNR threshold, the target gain is determined to include linear gain and wide dynamic range compression (WDRC) gain; if the SNR is greater than a second SNR threshold, the target gain is determined to be a linear gain value; if the SNR is greater than the first SNR threshold and less than the second SNR threshold, the target gain is determined to be a wide dynamic range compression (WDRC) gain; wherein, the second SNR threshold is greater than the first SNR threshold.

[0009] In one embodiment, the target gain includes linear gain and wide dynamic range compression (WDRC) gain;

[0010] Performing gain processing on the audio signal based on the target gain includes:

[0011] Based on the signal-to-noise ratio, determine the adaptive compensation coefficient; determine the target frequency band to which the audio signal belongs, with different frequency bands corresponding to different gain compensation functions, the gain compensation function being used to characterize the relationship between the adaptive compensation coefficient, linear gain, and wide dynamic range compression gain; based on the target gain compensation function corresponding to the target frequency band and the adaptive compensation coefficient, determine the linear gain and wide dynamic range compression gain.

[0012] In one embodiment, determining the adaptive compensation coefficient based on the signal-to-noise ratio includes:

[0013] If the signal-to-noise ratio is less than the third signal-to-noise ratio threshold, the adaptive compensation coefficient is 0; if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and less than the first signal-to-noise ratio threshold, the adaptive compensation coefficient is determined based on the e-function of the signal-to-noise ratio.

[0014] In one embodiment, the wide dynamic range compression gain is determined in the following manner:

[0015] Determine the input sound pressure level of the audio signal; determine the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve; and determine the difference between the output sound pressure level and the input sound pressure level as the wide dynamic range compression gain.

[0016] In one embodiment, determining the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve includes:

[0017] The parameter values of the wide dynamic range compression curve are determined, including a minimum sound pressure input threshold, a minimum sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain, and a second inflection point sound pressure threshold gain, wherein the first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold. If the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, the output sound pressure level is determined to be 0. If the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold. If the input sound pressure level is greater than the minimum sound pressure output threshold and less than the first inflection point sound pressure threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and the compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain; if the input sound pressure level is greater than the second inflection point sound pressure threshold, the output sound pressure level is determined based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

[0018] In one embodiment, the method further includes:

[0019] Based on the range of human hearing threshold, determine the pain threshold of the human ear; based on the pain threshold, adjust the output sound pressure level.

[0020] According to a second aspect of the present disclosure, an audio signal processing apparatus is provided, the apparatus comprising:

[0021] A determining unit is used to acquire audio signals and determine the signal-to-noise ratio (SNR) of the audio signals; and to determine the target gain based on the SNR and a preset SNR threshold.

[0022] The processing unit is used to perform gain processing on the audio signal based on the target gain.

[0023] In one embodiment, the determining unit determines the target gain based on the signal-to-noise ratio and a preset signal-to-noise ratio threshold in the following manner:

[0024] If the signal-to-noise ratio (SNR) is less than a first SNR threshold, the target gain is determined to include linear gain and wide dynamic range compression (WDRC) gain; if the SNR is greater than a second SNR threshold, the target gain is determined to be a linear gain value; if the SNR is greater than the first SNR threshold and less than the second SNR threshold, the target gain is determined to be a wide dynamic range compression (WDRC) gain; wherein, the second SNR threshold is greater than the first SNR threshold.

[0025] In one embodiment, the processing unit performs gain processing on the audio signal based on the target gain in the following manner:

[0026] Based on the signal-to-noise ratio, determine the adaptive compensation coefficient; determine the target frequency band to which the audio signal belongs, with different frequency bands corresponding to different gain compensation functions, the gain compensation function being used to characterize the relationship between the adaptive compensation coefficient, linear gain, and wide dynamic range compression gain; based on the target gain compensation function corresponding to the target frequency band and the adaptive compensation coefficient, determine the linear gain and wide dynamic range compression gain.

[0027] In one embodiment, the determining unit determines the adaptive compensation coefficient based on the signal-to-noise ratio in the following manner:

[0028] If the signal-to-noise ratio is less than the third signal-to-noise ratio threshold, the adaptive compensation coefficient is 0; if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and less than the first signal-to-noise ratio threshold, the adaptive compensation coefficient is determined based on the e-function of the signal-to-noise ratio.

[0029] In one embodiment, the determining unit determines the wide dynamic range compression gain in the following manner:

[0030] Determine the input sound pressure level of the audio signal; determine the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve; and determine the difference between the output sound pressure level and the input sound pressure level as the wide dynamic range compression gain.

[0031] In one embodiment, the determining unit determines the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve in the following manner:

[0032] The parameter values of the wide dynamic range compression curve are determined, including a minimum sound pressure input threshold, a minimum sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain, and a second inflection point sound pressure threshold gain, wherein the first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold. If the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, the output sound pressure level is determined to be 0. If the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold. If the input sound pressure level is greater than the minimum sound pressure output threshold and less than the first inflection point sound pressure threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and the compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain; if the input sound pressure level is greater than the second inflection point sound pressure threshold, the output sound pressure level is determined based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

[0033] In one implementation,

[0034] The determining unit is further configured to determine the pain threshold of the human ear based on the range of human hearing threshold; the device further includes an adjusting unit configured to adjust the output sound pressure level based on the pain threshold.

[0035] According to a third aspect of the present disclosure, an audio signal processing apparatus is provided, comprising:

[0036] A processor; and a memory for storing processor-executable instructions;

[0037] The processor is configured to execute the method described in the first aspect or any embodiment of the first aspect.

[0038] According to a fourth aspect of the present disclosure, a storage medium is provided, the storage medium storing instructions that, when executed by a processor of a terminal, enable the terminal to perform the method described in the first aspect or any embodiment of the first aspect.

[0039] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects: the audio signal is acquired and its signal-to-noise ratio (SNR) is estimated, and the target gain used for gain processing of the audio signal is determined based on the SNR. Because the target gain depends on the SNR, it reflects the actual reception quality of the audio signal and avoids amplifying noise by the same amount as speech. Therefore, performing gain processing on the audio signal based on this target gain can improve the result of the gain processing.

[0040] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0041] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.

[0042] Figure 1 is a schematic diagram of the general structure of the hearing aid provided in the embodiments of this disclosure.

[0043] Figure 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0044] Figure 3 is a schematic diagram of the SNR estimation simulation results provided in the embodiments of this disclosure.

[0045] Figure 4 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0046] Figure 5 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0047] Figure 6 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0048] Figure 7 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0049] Figure 8 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0050] Figure 9 is a schematic diagram of a three-segment WDRC for a certain frequency band provided in an embodiment of this disclosure.

[0051] Figure 10 is a schematic diagram illustrating the processing results of speech by different methods provided in the embodiments of this disclosure.

[0052] Figure 11 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.

[0053] Figure 12 is a schematic diagram of the hearing threshold curve provided in an embodiment of this disclosure.

[0054] Figure 13 is a schematic diagram of an application scenario for hearing aids provided in an embodiment of this disclosure.

[0055] Figure 14 is a schematic flowchart of the audio signal processing method provided in the embodiments of this disclosure.

[0056] Figure 15 is a schematic flowchart of the audio signal processing method provided in the embodiments of this disclosure.

[0057] Figure 16 is a schematic flowchart of the audio signal processing method provided in the embodiments of this disclosure.

[0058] Figure 17 is a block diagram 100 of an audio signal processing apparatus according to an exemplary embodiment.

[0059] Figure 18 is a block diagram 200 illustrating an audio signal processing apparatus according to an exemplary embodiment.

Detailed Implementation

[0060] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure.

[0061] The audio signal processing method provided in this disclosure can be used in various audio signal processing devices, and is particularly suitable for assistive hearing devices that help hearing-impaired patients hear. The most common assistive hearing devices are digital hearing aids, including eyeglass hearing aids, body-worn hearing aids, ear-hook hearing aids, and in-ear hearing aids, as well as wired or wireless headsets and headphones. The core technologies for assistive hearing include: wide dynamic range compression algorithms, speech enhancement, echo suppression, frequency reduction algorithms, scene recognition, and sound source localization. In real life, the proportion of people with mild to moderate hearing loss is relatively high, and assistive hearing headphones are more effective for this group.

[0062] In one embodiment of the audio signal processing method provided in this disclosure, various sound signals in the daily environment are collected by the microphone of the audio processing device, converted into electrical signals and sent to the audio processing device. The internal system of the audio processing device processes and amplifies the incoming electrical signals. The speaker in the audio processing device is a sound output device that converts the electrical signals back into sound signals and finally outputs the sound signals.

[0063] Figure 1 is a schematic diagram of the general structure of the hearing aid provided in the embodiments of this disclosure. As shown in Figure 1, the audio signal is input to the microphone, processed by the amplifier, and then output through headphones.

[0064] The audio signal processing method provided in this disclosure is applicable to scenarios where audio signal gain is achieved using one or more microphones and one or more speakers. A typical application scenario of this disclosure is a hearing aid headset that includes a speaker. The hearing aid headset generally includes a small speaker and a microphone. This disclosure uses a hearing aid headset as an example for illustration.

[0065] Figure 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 2, the method includes the following steps.

[0066] In step S11, an audio signal is acquired, and the signal-to-noise ratio of the audio signal is determined.

[0067] In step S12, the target gain is determined based on the signal-to-noise ratio and a preset signal-to-noise ratio threshold.

[0068] In step S13, the audio signal is subjected to gain processing based on the target gain.

[0069] In this embodiment, the signal-to-noise ratio (SNR) of the audio signal collected by the microphone is determined. Based on the SNR, a gain method and a target gain are determined, and the audio signal is then subjected to gain processing based on the target gain. This method not only improves speech loudness but also speech intelligibility.
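Step S13 can be illustrated with a minimal sketch. It assumes that the target gain is expressed in dB, that it acts on signal amplitude (hence the factor 20 in the conversion), and that it is applied to spectral bins; the text does not state these conventions explicitly, and the function name `apply_gain_db` is hypothetical.

```python
import numpy as np


def apply_gain_db(spectrum, gain_db):
    """Scale spectral bins by a gain given in dB.

    Assumption: the gain is an amplitude gain, so the linear factor
    is 10 ** (gain_db / 20). gain_db may be a scalar (same gain for
    all bins) or an array with one value per bin.
    """
    gain_db = np.asarray(gain_db, dtype=float)
    return np.asarray(spectrum) * 10.0 ** (gain_db / 20.0)
```

A scalar `gain_db` corresponds to applying the same gain to every frequency band, while an array allows a different gain per bin.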

[0070] In this embodiment, a hearing aid headset is used as an example for illustration. Before the sound signal is amplified, there may be steps such as beamforming, blind source separation, and speech enhancement. It is therefore assumed that the sound signal the hearing aid headset ultimately needs to amplify is t(n). The sampling rate f_s of the speech signal processing system is typically 16 kHz, 44.1 kHz, or 48 kHz; due to hardware and computing power limitations, this disclosure uses a 16 kHz sampling rate. Since the speech signal is approximately short-time stationary over 10-40 ms, its second-order statistics and other information can be used. The received signal therefore needs to be subjected to a short-time Fourier transform (STFT), i.e., framing, windowing, and FFT. This disclosure uses a frame duration of 32 ms, so the frame length is L = 512; the window function w(n) is a Hanning window whose length equals the frame length; the frame shift is 50% of the frame length, i.e., inc = 256. The framed and windowed signal st(n,m) is given by formula (1):

[0071] st(n,m)=t((m-1)*inc+n)*w(n),0≤n≤(L-1) (1)

[0072] Here, m represents the frame number index, and n represents the data point index of the m-th frame of audio data. Then, performing an FFT on st(n,m) yields the microphone's spectral data T(k,m), where k represents the frequency index.
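The framing, windowing, and FFT of formula (1) can be sketched as follows. The sketch uses NumPy and the parameters given above (L = 512, inc = 256, Hanning window); the function name `stft_frames` is hypothetical, frames are indexed from 0 rather than 1, and `numpy.fft.rfft` is used as an assumption to keep only the non-redundant half of the spectrum of a real signal.

```python
import numpy as np

FRAME_LEN = 512    # L: 32 ms at a 16 kHz sampling rate
FRAME_SHIFT = 256  # inc: 50% of the frame length


def stft_frames(t, frame_len=FRAME_LEN, inc=FRAME_SHIFT):
    """Return (st, T): windowed frames st(n, m) and their spectra T(k, m).

    Rows are frames. Row index m here is 0-based, so it corresponds
    to (m - 1) in formula (1).
    """
    t = np.asarray(t, dtype=float)
    if len(t) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(t) - frame_len) // inc
    w = np.hanning(frame_len)  # window with the same length as the frame
    # Sample index of element n in frame m: m * inc + n
    idx = inc * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    st = t[idx] * w
    T = np.fft.rfft(st, axis=1)  # frame_len // 2 + 1 bins per frame
    return st, T
```

For one second of audio at 16 kHz this yields 61 frames of 512 samples each, with 257 frequency bins per frame.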

[0073] This disclosure provides an SNR estimation method. Since time-domain energy and frequency-domain energy theoretically differ by a factor of L, the method can be used in either the time domain or the frequency domain. To reduce computational complexity, this disclosure uses a time-domain estimation method that tracks the speech energy, the noise energy, and the average noise energy of the received signal for each frame m. The first 5 frames of the signal are assumed to contain only noise. The average noise energy is calculated starting from frame 6. Starting from frame 7, the noise energy of the current frame is updated to be the weighted sum of the noise energy of the previous frame and the speech energy of the current frame, as shown in formula (2):

[0074]

[0075] In the above formula, the weighting coefficient is the noise estimation coefficient generated by the sigmoid function, where a and T represent the signal-to-noise ratio adjustment parameter and the parameter representing the deviation of the noise estimation coefficient, respectively. In this disclosure, a = 2 is used. The formula also uses the ratio of the speech energy to the average noise energy. The signal-to-noise ratio (SNR) of the current frame can then be calculated, as shown in formula (3).

[0076]

[0077] Figure 3 is a schematic diagram of the SNR estimation simulation results provided in the embodiments of this disclosure, and shows the simulation results of the above SNR estimation method. In Figures 3(a) and 3(b), the background noise is white noise plus Babble (which can be understood as low-frequency noise in an office environment) and white noise plus Cafeteria (which can be understood as noise in a cafeteria environment), respectively. The calculated SNR is relatively close to the actual SNR in most cases, and when the difference is large, the calculated result is always smaller than the actual result. This bias is in the safe direction: since this disclosure uses a larger gain value for high SNR, an overestimated SNR would abnormally amplify the noise signal.
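The time-domain SNR estimation described above can be sketched roughly as follows. Formulas (2) and (3) are not reproduced in this text, so the concrete expressions below are assumptions rather than the disclosed formulas: the sigmoid is taken as 1 / (1 + exp(-a (r - T))) with r the ratio of frame energy to average noise energy, the noise update as a convex combination weighted by that coefficient, and the SNR as 10 log10 of the estimated speech energy over the noise energy. The value T = 1.5 and the function name `estimate_snr_db` are hypothetical.

```python
import numpy as np


def estimate_snr_db(frames, a=2.0, T=1.5, init_frames=5, eps=1e-12):
    """Per-frame SNR estimate in dB (assumed form, see the lead-in).

    frames: 2-D array, one time-domain frame per row.
    The first init_frames frames are treated as noise only, so their
    SNR is left as NaN.
    """
    frames = np.asarray(frames, dtype=float)
    energy = np.mean(frames ** 2, axis=1)  # time-domain energy per frame
    if len(energy) <= init_frames:
        raise ValueError("need more frames than the noise-only prefix")
    avg_noise = max(float(np.mean(energy[:init_frames])), eps)
    noise = avg_noise  # used as-is for the first frame after the prefix
    snr_db = np.full(len(energy), np.nan)
    for m in range(init_frames, len(energy)):
        if m > init_frames:
            # Assumed sigmoid coefficient: close to 1 when the frame is
            # much louder than the average noise, so the noise estimate
            # is held; smaller when the frame looks like noise.
            ratio = energy[m] / avg_noise
            alpha = 1.0 / (1.0 + np.exp(-a * (ratio - T)))
            noise = max(alpha * noise + (1.0 - alpha) * energy[m], eps)
        speech = max(energy[m] - noise, eps)
        snr_db[m] = 10.0 * np.log10(speech / noise)
    return snr_db
```

With this assumed form, loud frames leave the noise estimate almost unchanged, while frames whose energy is close to the average noise energy pull the estimate toward the current frame.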

[0078] In the embodiments of the present disclosure, the gain mode is determined based on the signal-to-noise ratio and the signal-to-noise ratio thresholds. Generally speaking, the larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the played-back sound; the smaller the signal-to-noise ratio, the opposite holds. In the present disclosure, the first signal-to-noise ratio threshold is 10 dB, and the second signal-to-noise ratio threshold is 15 dB.

[0079] Figure 4 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 4, the method includes the following steps.

[0080] In step S21, the target gain is determined.

[0081] In step S22a, if the signal-to-noise ratio is less than the first signal-to-noise ratio threshold, it is determined that the target gain includes linear gain and wide dynamic range compression (WDRC) gain.

[0082] In step S22b, if the signal-to-noise ratio is greater than the second signal-to-noise ratio threshold, it is determined that the target gain is the linear gain value.

[0083] In step S22c, if the signal-to-noise ratio is greater than the first signal-to-noise ratio threshold and less than the second signal-to-noise ratio threshold, it is determined that the target gain is the wide dynamic range compression (WDRC) gain.

[0084] In the embodiments of the present disclosure, the gain mode is chosen after the current SNR is calculated. If SNR > 15 dB, the voice information dominates the current received signal and linear gain can be used directly, that is, the same amount of gain is applied to the signals in all frequency bands. The advantage of linear amplification is that, for input signals with medium sound pressure levels, it provides appropriate gain without distortion, so voice quality is good and intelligibility is high. Its disadvantage is that the gain is insufficient for input signals with low sound pressure levels, while for input signals with high sound pressure levels the gain is excessive and patients often feel uncomfortable. If 10 dB < SNR < 15 dB, the voice information in the current received signal is still dominant, and the WDRC gain value does not need to be compensated. If SNR < 10 dB, the influence of the noise signal needs to be considered, especially in the low frequency band.
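The three-way decision of steps S22a to S22c can be written as a small selection function. This is a sketch using the 10 dB and 15 dB thresholds given above; the names `GainMode` and `select_gain_mode` are hypothetical. The text uses strict inequalities and does not say what happens when the SNR equals a threshold, so assigning both boundary values to the WDRC-only mode is an assumption.

```python
from enum import Enum

SNR_THRESHOLD_1_DB = 10.0  # first SNR threshold
SNR_THRESHOLD_2_DB = 15.0  # second SNR threshold


class GainMode(Enum):
    LINEAR_PLUS_WDRC = "linear+wdrc"  # SNR below the first threshold
    WDRC = "wdrc"                     # SNR between the two thresholds
    LINEAR = "linear"                 # SNR above the second threshold


def select_gain_mode(snr_db, t1=SNR_THRESHOLD_1_DB, t2=SNR_THRESHOLD_2_DB):
    """Choose the gain mode for a frame from its estimated SNR in dB."""
    if t2 <= t1:
        raise ValueError("second threshold must exceed the first")
    if snr_db < t1:
        return GainMode.LINEAR_PLUS_WDRC
    if snr_db > t2:
        return GainMode.LINEAR
    # Assumption: SNR exactly equal to t1 or t2 also falls here.
    return GainMode.WDRC
```

For example, `select_gain_mode(12.0)` selects the WDRC-only mode, while `select_gain_mode(3.0)` selects the combined linear and WDRC mode.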

[0085] Figure 5 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 5, the method includes the following steps.

[0086] In step S31, according to the signal-to-noise ratio, the adaptive compensation coefficient is determined.

[0087] In the embodiments of the present disclosure, γ is the adaptive compensation coefficient based on the SNR result, as shown in formula (4) specifically:

[0088]

[0089] In step S32, the target frequency band to which the audio signal belongs is determined. Different frequency bands correspond to different gain compensation functions, and the gain compensation function is used to characterize the relationship between the adaptive compensation coefficient, the linear gain, and the wide dynamic range compression gain.

[0090] In step S33, based on the target gain compensation function corresponding to the target frequency band and the adaptive compensation coefficient, the linear gain and the wide dynamic range compression gain are determined.

[0091] In the embodiments of the present disclosure, if SNR < 10 dB, the cases SNR < 5 dB and 5 dB < SNR < 10 dB need to be distinguished further. When the sound pressure level of the voice is above the hearing threshold of the hearing-impaired person, linear amplification gives a better speech recognition rate than wide dynamic range compression. When the speech signal-to-noise ratio is low, wide dynamic range compression enables patients with moderate to severe hearing loss to obtain a higher speech recognition rate. When the signal-to-noise ratio is lower than 5 dB, a higher recognition rate can be obtained by compensating the speech with the wide dynamic range compression algorithm. When the signal-to-noise ratio is higher than 5 dB and lower than 10 dB, a higher recognition rate can be obtained by compensating the speech with both the wide dynamic range compression algorithm and linear amplification. Since the human ear has a higher recognition rate for low-frequency speech, the present disclosure combines the respective advantages of linear amplification and traditional wide dynamic range compression in frequency response compensation: it determines the linear gain value and the wide dynamic range compression gain value based on the adaptive compensation coefficient, and from them determines the target gain used for gain processing of the audio signal.

[0092] In the embodiments of the present disclosure, if SNR < 10dB, it indicates that the influence of the noise signal needs to be considered currently, especially in the low-frequency band. Since in the actual environment, most of the environmental noise is concentrated in the low-frequency band, the final gain is as shown in formula (5):

[0093] G = 0.8×(γG_L + (1 - γ)G_W) for f < 0.5 kHz; G = γG_L + (1 - γ)G_W for f ≥ 0.5 kHz (5)

[0094] where G_L and G_W respectively represent the linear gain value and the preset gain value of WDRC. In the present disclosure, G_L = 25.

[0095] In the embodiments of the present disclosure, when the frequency of the audio signal is less than 0.5 kHz, the gain is calculated as 0.8×(γG_L + (1 - γ)G_W), and when the frequency of the audio signal is greater than or equal to 0.5 kHz, the gain is calculated as γG_L + (1 - γ)G_W.
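The band-dependent gain described in the two cases above can be sketched as follows, with the linear gain value of 25 taken from the text. The function name `combined_gain` and the argument names are hypothetical, and γ and the WDRC gain are taken as inputs computed elsewhere.

```python
G_LINEAR = 25.0          # linear gain value G_L given in the text
LOW_BAND_LIMIT_HZ = 500  # 0.5 kHz boundary between the two cases
LOW_BAND_FACTOR = 0.8    # attenuation factor applied below 0.5 kHz


def combined_gain(freq_hz, gamma, g_wdrc, g_linear=G_LINEAR):
    """Blend linear and WDRC gain with weight gamma, per frequency band."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    blended = gamma * g_linear + (1.0 - gamma) * g_wdrc
    if freq_hz < LOW_BAND_LIMIT_HZ:
        return LOW_BAND_FACTOR * blended
    return blended
```

With gamma = 0.5 and a WDRC gain of 15, the blended value is 20, so a 250 Hz component receives 16 and a 1 kHz component receives 20. The lower value below 0.5 kHz reflects the observation that most environmental noise is concentrated in the low-frequency band.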

[0096] The following details how to determine the adaptive compensation coefficient. Figure 6 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 6, the method includes the following steps.

[0097] In step S41, the adaptive compensation coefficient is determined.

[0098] In step S42a, if the signal-to-noise ratio is less than the third signal-to-noise ratio threshold, the adaptive compensation coefficient is 0.

[0099] In step S42b, if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and less than the first signal-to-noise ratio threshold, then the adaptive compensation coefficient is determined based on the e-function of the signal-to-noise ratio.

[0100] In this embodiment, γ is an adaptive compensation coefficient based on the SNR result, as shown in formula (4):

[0101]

[0102] As can be seen from formula (4), the weights of linear amplification and wide dynamic range compression are adaptively adjusted based on the signal-to-noise ratio (SNR) of the speech in a given frequency band. When the SNR is below 5 dB, the adaptive compensation coefficient γ is 0, so the linear gain term has zero weight and the gain compensation method is wide dynamic range compression. When the SNR is in the range [5 dB, 10 dB], γ is an e-function of the SNR with a value in (0, 1), and linear amplification and wide dynamic range compression are combined for adaptive frequency response compensation. The closer the SNR is to 10 dB, the closer γ is to 1 and the larger the share of linear amplification gain; conversely, the closer the SNR is to 5 dB, the closer γ is to 0 and the larger the share of wide dynamic range compression gain.
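The behavior of γ described above can be sketched as follows. Formula (4) itself is not reproduced in this text, so the exponential mapping used here, γ = (e^x - 1) / (e - 1) with x = (SNR - 5) / 5, is only an assumed stand-in chosen to be 0 at 5 dB, to approach 1 at 10 dB, and to increase monotonically in between. The function name `adaptive_compensation_coefficient` is hypothetical.

```python
import math

SNR_THRESHOLD_3_DB = 5.0   # third SNR threshold
SNR_THRESHOLD_1_DB = 10.0  # first SNR threshold


def adaptive_compensation_coefficient(snr_db,
                                      t3=SNR_THRESHOLD_3_DB,
                                      t1=SNR_THRESHOLD_1_DB):
    """Return gamma for an SNR below the first threshold.

    The exponential form is an assumption; only its end-point and
    monotonic behavior follow the description in the text.
    """
    if snr_db >= t1:
        # Above the first threshold the combined mode is not used.
        raise ValueError("gamma is only defined for SNR below t1")
    if snr_db < t3:
        return 0.0  # WDRC gain only
    x = (snr_db - t3) / (t1 - t3)  # normalized position in [0, 1)
    return (math.exp(x) - 1.0) / (math.e - 1.0)
```

Under this assumed mapping, an SNR of 7.5 dB gives γ of roughly 0.38, so WDRC still carries the larger share of the blended gain at the midpoint of the range.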

[0103] Figure 7 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 7, the method includes the following steps.

[0104] In step S51, the input sound pressure level of the audio signal is determined.

[0105] In step S52, the output sound pressure level is determined based on the input sound pressure level and the wide dynamic range compression curve.

[0106] In step S53, the difference between the output sound pressure level and the input sound pressure level is determined as the wide dynamic range compression gain.
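Steps S51 and S53 can be sketched as follows. The sketch assumes the samples are already calibrated in pascals and uses the standard 20 µPa reference pressure for sound pressure level in air; the disclosure does not state how the level is measured, and the function names are hypothetical.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard reference pressure for SPL in air


def sound_pressure_level_db(pressure_samples_pa):
    """RMS sound pressure level in dB SPL of calibrated pressure samples."""
    if not pressure_samples_pa:
        raise ValueError("need at least one sample")
    mean_square = sum(p * p for p in pressure_samples_pa) / len(pressure_samples_pa)
    rms = math.sqrt(mean_square)
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / REFERENCE_PRESSURE_PA)


def wdrc_gain_db(input_spl_db, output_spl_db):
    """Step S53: the WDRC gain is the output level minus the input level."""
    return output_spl_db - input_spl_db
```

In a real device the mapping from digital sample values to pascals depends on microphone sensitivity and converter gain, so a calibration constant would replace the direct use of pascals here.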

[0107] Figure 8 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 8, the method includes the following steps.

[0108] In step S61, the parameter values of the wide dynamic range compression curve are determined. The parameter values include the minimum sound pressure input threshold, the minimum sound pressure output threshold, the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain. The first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold.

[0109] In step S62, the output sound pressure level is determined based on the input sound pressure level and the wide dynamic range compression curve.

[0110] In step S63a, if the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, then the output sound pressure level is determined to be 0.

[0111] In step S63b, if the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold.

[0112] In step S63c, if the input sound pressure level is greater than the minimum sound pressure output threshold and less than the first inflection point sound pressure threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and the compression ratio. The compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain.

[0113] In step S63d, if the input sound pressure level is greater than the second inflection point sound pressure threshold, the output sound pressure level is determined based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

[0114] Figure 9 is a schematic diagram of a three-segment WDRC for a certain frequency band provided in an embodiment of this disclosure. As shown in Figure 9, THi, THo, LK, HK, LKG, and HKG represent the minimum input threshold, minimum output, low inflection point, high inflection point, low inflection point gain, and high inflection point gain, respectively. Since human hearing is complex, the LK-HK segment may be obtained from multiple compression curves. This disclosure only uses the standard three-segment WDRC as an example, which can be described using formula (6):

[0115]

[0116] Here, $SPL_{in}$ and $SPL_{out}$ represent the input and output sound pressure levels, i.e., the original sound pressure level and the amplified sound pressure level, respectively. The gain of the WDRC is $G_W = SPL_{out} - SPL_{in}$. The compression ratio CR means that for every 1 dB increase in input, the output increases by 1/CR dB.

[0117] In this embodiment, since the speech signal is non-stationary, the sound pressure level of the signal in each channel changes with time, and so does the compensation gain in each channel. The input/output curve of the WDRC in one channel is shown in Figure 9. The curve consists of three parts: a linear gain section, used if the input sound pressure level is below the lower knee point (LK); a wide dynamic range compression section, used if the input sound pressure level is between LK and the higher knee point (HK), with a compression ratio of CR:1, meaning that for every 1 dB increase in input, the output increases by 1/CR dB, where the gain at LK is LKG and the gain at HK is HKG; and a limiting compression section, used if the input sound pressure level is higher than HK. If the input sound pressure level is lower than the input threshold THi, the output sound pressure level is 0.
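The three-segment input/output curve can be sketched as a piecewise function. Formula (6) is not reproduced in the text, so this sketch rests on two assumptions: the linear section is the straight line from (THi, THo) to (LK, LK + LKG), and the compression ratio follows from the stated gains at the two knee points, CR = (HK - LK) / ((HK + HKG) - (LK + LKG)). The class name and the parameter values in the usage line are hypothetical, since the disclosure gives no numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WdrcCurve:
    thi: float  # minimum input threshold (dB SPL)
    tho: float  # minimum output (dB SPL)
    lk: float   # low knee point (dB SPL)
    hk: float   # high knee point (dB SPL)
    lkg: float  # gain at the low knee point (dB)
    hkg: float  # gain at the high knee point (dB)

    def compression_ratio(self):
        # ASSUMPTION: derived from the gains at LK and HK given in the text.
        return (self.hk - self.lk) / ((self.hk + self.hkg) - (self.lk + self.lkg))

    def output_spl(self, spl_in):
        if spl_in < self.thi:
            return 0.0  # silent region
        if spl_in <= self.lk:
            # Linear section: line through (THi, THo) and (LK, LK + LKG).
            slope = (self.lk + self.lkg - self.tho) / (self.lk - self.thi)
            return self.tho + slope * (spl_in - self.thi)
        if spl_in <= self.hk:
            # Compression section: output rises 1/CR dB per input dB.
            return self.lk + self.lkg + (spl_in - self.lk) / self.compression_ratio()
        return self.hk + self.hkg  # limiting section: constant output


# Hypothetical parameters, for illustration only.
curve = WdrcCurve(thi=20.0, tho=50.0, lk=50.0, hk=90.0, lkg=30.0, hkg=10.0)
```

With these illustrative values the linear section has a constant 30 dB gain, the compression ratio is 2:1, and every input above 90 dB SPL maps to 100 dB SPL.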

[0118] In this embodiment of the disclosure, THi can also be understood as representing the hearing threshold of a normal person, THo the hearing threshold of a person with hearing loss, LK the optimal hearing threshold of a normal person, LK+LKG the optimal hearing threshold of a person with hearing loss, HK the pain threshold of a normal person, and HK+HKG the pain threshold of a person with hearing loss. When the input sound pressure level $SPL_{in}$ is smaller than the hearing threshold of a normal person, $SPL_{out} = SPL_{in}$. This is because sounds that normal ears cannot hear are not mapped to the hearing threshold of a person with hearing loss, so the person with hearing loss cannot hear them either.

[0119] When the input sound pressure level satisfies THi <= $SPL_{in}$ <= LK, the sound intensity is higher than the hearing threshold but lower than the optimal threshold, so a larger gain is required. The hearing aid therefore applies linear gain amplification, and the hearing-impaired person can hear a faint sound.

[0120] When the input sound pressure level satisfies LK <= $SPL_{in}$ <= HK, the loudness of the sound is greater than the optimal threshold but less than the discomfort threshold. The sound is already relatively strong, so only slight compensation is needed, and the output is kept slightly below the discomfort threshold. The hearing aid therefore applies the WDRC gain processing algorithm, and the hearing-impaired person can hear a relatively strong sound. When $SPL_{in}$ = LK, $SPL_{out}$ = LK+LKG, and the hearing-impaired person reaches the optimal threshold.

[0121] When the input sound pressure level satisfies $SPL_{in}$ >= HK, the sound has reached the discomfort threshold for hearing-impaired individuals. To protect them from further damage caused by excessively loud sounds, such sounds must be suppressed. In this range the hearing aid maintains a stable output; when the sound intensity reaches the discomfort threshold, the hearing-impaired individual experiences discomfort.

[0122] The NAL-RP fitting formula used in this disclosure is an improvement upon the NAL-R fitting formula, and is also based on the half-gain rule, correcting for loudness equalization in compensation for steeply sloping hearing loss. The parameters of the aforementioned WDRC are given by the NAL-RP fitting formula, and specific values are not provided in this disclosure.

[0123] Figure 10 is a schematic diagram illustrating the speech processing results of different methods provided in the embodiments of this disclosure. As shown in Figure 10, panels (a)-(c) show the original speech, the DRC result, and the WDRC result based on the NAL-RP fitting formula, respectively, where DRC denotes applying the same gain to signals in all frequency bands. Compared with DRC, the method used in this disclosure applies a smaller gain to low frequencies than to mid frequencies. This is mainly because noise accounts for a larger proportion of low-frequency content in real-world environments, and low frequencies contribute less to semantic understanding than mid frequencies.

[0124] Figure 11 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in Figure 11, the method includes the following steps.

[0125] In step S71, the pain threshold of the human ear is determined based on the range of human hearing threshold.

[0126] In this embodiment, the pain threshold of the human ear is determined based on the range of hearing thresholds. Hearing loss raises the patient's hearing threshold and reduces the time resolution and frequency resolution of the patient's sound perception. Although the cochlea's perception of sound signals changes and the hearing threshold rises, the discomfort threshold remains basically unchanged. The raised hearing threshold reduces signal audibility, which in turn reduces signal intelligibility. For patients with mild hearing loss, some everyday sounds can be heard while others cannot. For patients with severe or profound hearing loss, almost no everyday sounds can be heard unless they are shouted at close range. This disclosure provides a method for estimating the pain threshold DCL from the hearing threshold, as shown in formula (7):

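[0127] $$DCL = \begin{cases} 105, & HL \le 60 \\ 105 + \dfrac{HL - 60}{2}, & HL > 60 \end{cases} \qquad (7)$$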

[0128] When the hearing threshold HL is less than or equal to 60, the estimated pain threshold DCL is 105 dB. When the hearing threshold is greater than 60, the estimated pain threshold DCL is 105 + (HL - 60) / 2.
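The estimate described above is a two-branch function and can be written directly. The function name is hypothetical; the constants 105, 60, and the divisor 2 come from the text.

```python
def estimate_pain_threshold_db(hearing_level_db):
    """Estimate the pain threshold DCL (dB) from the hearing threshold HL."""
    if hearing_level_db <= 60.0:
        return 105.0
    return 105.0 + (hearing_level_db - 60.0) / 2.0


# A hearing threshold of 80 gives an estimated pain threshold of 115 dB.
dcl = estimate_pain_threshold_db(80.0)
```

The estimate stays flat up to a hearing threshold of 60 and then rises at half the rate of the hearing threshold, which reflects the statement that the discomfort threshold changes far less than the hearing threshold.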

[0129] Figure 12 is a schematic diagram of the hearing threshold curve provided in an embodiment of this disclosure. Figure 12 shows the average hearing threshold curves of the left and right ears of 15 participants. It can be seen that the difference between the left and right ear hearing threshold curves lies mainly in the low and high frequency ranges; this disclosure ultimately uses the results from the right ear.

[0130] In step S72, the output sound pressure level is adjusted according to the pain threshold.

[0131] In this embodiment of the disclosure, since the pain threshold is an unacceptable threshold for the human ear, it is necessary to ensure that the final output sound pressure level of each frequency band is less than the pain threshold. Therefore, formula (8) is used to adjust the final gain:

[0132]

[0133] When the frequency of the audio signal is less than 0.5 kHz, the output sound pressure level should be less than the pain threshold minus 10 dB. When the frequency of the audio signal is greater than or equal to 0.5 kHz, the output sound pressure level should be less than the pain threshold minus 5 dB. In this way, the output sound pressure level is adjusted so that it does not exceed the set pain threshold.
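One way to apply this limit is to cap the gain so that the input level plus the gain stays at or below the band-dependent ceiling. Formula (8) is not reproduced in the text, so the clamping form below is an assumption, and the function and parameter names are hypothetical. Only the 10 dB and 5 dB margins and the 0.5 kHz boundary come from the text.

```python
def limit_gain_to_pain_threshold(freq_hz, input_spl_db, gain_db, pain_threshold_db):
    """Cap gain_db so the output level respects the pain-threshold margin.

    ASSUMPTION: a simple min() clamp; the disclosure's formula (8) may differ.
    """
    margin_db = 10.0 if freq_hz < 500.0 else 5.0
    ceiling_db = pain_threshold_db - margin_db
    return min(gain_db, ceiling_db - input_spl_db)


# A 90 dB SPL input with 20 dB requested gain and a 105 dB pain threshold:
low_band_gain = limit_gain_to_pain_threshold(250.0, 90.0, 20.0, 105.0)   # capped to 5 dB
mid_band_gain = limit_gain_to_pain_threshold(1000.0, 90.0, 20.0, 105.0)  # capped to 10 dB
```

This clamp lets the output reach the ceiling exactly. If the output must stay strictly below it, as the wording "less than" suggests, a small additional margin would be subtracted.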

[0134] Figure 13 is a schematic diagram of an application scenario for hearing aids provided in this embodiment of the disclosure. As shown in Figure 13, hearing aids need to amplify the sound signal being processed to improve the wearer's understanding of the sound signal. First, the signal-to-noise ratio of the received signal is calculated, and different WDRC methods and compensation coefficients are selected. Then, based on the WDRC gain range, the gain values of signals at different frequency bands and different input sound pressure levels are calculated. Finally, the signal is amplified and output.

[0135] Figure 14 is a schematic flowchart of the audio signal processing method provided in the embodiments of this disclosure. As shown in Figure 14, the present disclosure calculates the gains of signals in different frequency bands and at different input sound pressure levels according to a preset WDRC gain curve, and adaptively adjusts the gain of each frequency band using SNR estimation and pain threshold estimation. The basic process is as follows:
1. Data preprocessing. The audio data input by the microphone is subjected to STFT, that is, frame division, windowing, and FFT, to obtain the corresponding frequency-domain signal.
2. SNR estimation. The SNR of the received signal is estimated in real time.
3. WDRC gain calculation. The key technology of the hearing aid earphone is the WDRC algorithm. For a specific frequency band, its core idea is to give different gains to signals with different sound pressure levels. Since the human ear's perception of frequency is non-linear, the bandwidth covered by each frequency band to be processed is different, and can be determined by a Bark, Mel, or Gammatone filter bank. For example, in the present disclosure, a Mel filter bank places a set of triangular band-pass filters on the speech signal in the frequency domain, with center frequencies equally spaced on the Mel scale. Then, the sound pressure level of each frequency band is calculated, and its gain is obtained according to the preset gain curve. Generally, the three-segment WDRC gain curve includes four regions: a silent region, a linear amplification region, a compression region, and a limiting region. In the silent region, the signal is set to 0. In the linear amplification region, the gains of all signals are the same. In the compression region, the greater the input sound pressure level, the smaller the gain. In the limiting region, the sound signal is limited or given a negative gain.
4. Pain threshold judgment and adjustment. The pain threshold is estimated based on the hearing threshold curve, and the gain value obtained in step 3 is finely adjusted.
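The band layout in step 3 can be illustrated by computing center frequencies that are equally spaced on the Mel scale. The disclosure does not state which Mel formula, band count, or frequency range it uses, so the common 2595·log10(1 + f/700) variant and the values in the usage line are assumptions, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    # ASSUMPTION: the widely used 2595 * log10(1 + f / 700) Mel variant.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_bands, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of num_bands filters equally spaced in Mel."""
    if num_bands < 1 or f_max_hz <= f_min_hz:
        raise ValueError("need num_bands >= 1 and f_max_hz > f_min_hz")
    mel_min, mel_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_max - mel_min) / (num_bands + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(num_bands)]


# Illustrative layout: 8 bands between 0 Hz and 8 kHz.
centers_hz = mel_center_frequencies(8, 0.0, 8000.0)
```

Equal spacing in Mel produces bands that widen with frequency in Hz, which is the non-uniform bandwidth the step describes.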

[0136] Figure 15 is a schematic flow chart of an audio signal processing method provided by an embodiment of the present disclosure. As shown in Figure 15, first, the signal is decomposed into multiple channels using a filter bank; second, the envelope of the signal in each channel is extracted and its sound pressure level is calculated; third, the gain to be compensated is determined according to the sound pressure level and the patient's hearing threshold in the channel; finally, the channel signal is compensated.

[0137] Figure 16 is a schematic flow chart of an audio signal processing method provided by an embodiment of the present disclosure. As shown in Figure 16, when an audio signal is input, audio signals in a preset plurality of frequency bands are obtained through a Mel filter, and the signal-to-noise ratio (SNR) is evaluated. If SNR > 15 dB, the speech information in the received signal dominates, and a linear gain can be used directly, that is, the same gain is applied to all frequency band signals. If 10 dB < SNR < 15 dB, the speech information in the received signal is still dominant, and the WDRC gain value does not need to be compensated. If SNR < 10 dB, the influence of the noise signal needs to be considered, especially in the low-frequency band. In that case, the frequency of the audio signal is evaluated: if it is less than 0.5 kHz, the gain is $0.8 \times (\gamma G_L + (1 - \gamma) G_W)$; if it is greater than or equal to 0.5 kHz, the gain is $\gamma G_L + (1 - \gamma) G_W$. If SNR < 5 dB, the adaptive compensation coefficient γ is 0, meaning that the linear gain carries no weight and the gain takes only the WDRC gain value. If 5 dB <= SNR <= 10 dB, γ is an exponential function of the SNR. Finally, based on the range of human hearing thresholds, the pain threshold of the human ear is determined, and the output sound pressure level is adjusted according to the pain threshold before the signal is output.
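The SNR branching in this flow can be sketched as a small selector. The text uses strict inequalities at 10 dB and 15 dB without saying where the boundary values fall, so assigning exactly 15 dB to the WDRC branch and exactly 10 dB to the blended branch is an assumption. The enum and function names are hypothetical.

```python
from enum import Enum


class GainMode(Enum):
    LINEAR = "linear"  # same gain for every band
    WDRC = "wdrc"      # uncompensated WDRC gain
    BLEND = "blend"    # gamma-weighted mix of linear and WDRC gain


def select_gain_mode(snr_db):
    """Pick the gain strategy for the estimated SNR (thresholds from the text)."""
    if snr_db > 15.0:
        return GainMode.LINEAR
    if snr_db > 10.0:
        return GainMode.WDRC  # ASSUMPTION: exactly 15 dB lands here
    return GainMode.BLEND     # ASSUMPTION: exactly 10 dB lands here


mode = select_gain_mode(7.0)  # noisy input, so the blended gain is used
```

Within the blended branch, an SNR below 5 dB gives γ = 0, so the result is numerically the WDRC gain, scaled by 0.8 in bands below 0.5 kHz.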

[0138] This disclosure involves acquiring audio signals and determining their signal-to-noise ratio (SNR); calculating the target gain for signals at different frequency bands and input sound pressure levels based on the SNR and a preset WDRC gain curve; performing gain processing on the audio signals based on the target gain; and adaptively adjusting the gain of each frequency band using pain threshold estimation. This method can improve audio gain, thereby enhancing hearing-impaired individuals' understanding of speech in noisy environments.

[0139] Based on the same concept, embodiments of this disclosure also provide an audio signal processing apparatus.

[0140] It is understood that the audio signal processing apparatus provided in this disclosure includes hardware structures and / or software modules corresponding to each function in order to achieve the above-mentioned functions. In conjunction with the units and algorithm steps of the various examples disclosed in this disclosure, this disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the technical solutions of this disclosure.

[0141] Figure 17 is a block diagram of an audio signal processing apparatus 100 according to an exemplary embodiment. Referring to Figure 17, the device 100 includes a determination unit 101, a processing unit 102, and an adjustment unit 103.

[0142] The determining unit 101 is used to acquire audio signals and determine the signal-to-noise ratio (SNR) of the audio signals; and to determine the target gain based on the SNR and a preset SNR threshold.

[0143] The processing unit 102 is used to perform gain processing on the audio signal based on the target gain.

[0144] In one embodiment, the determining unit 101 determines the target gain used for gain processing of the audio signal based on the signal-to-noise ratio in the following manner:

[0145] If the signal-to-noise ratio (SNR) is less than the first SNR threshold, the target gain is determined to include both linear gain and wide dynamic range compression (WDRC) gain. If the SNR is greater than the second SNR threshold, the target gain is determined to be a linear gain value. If the SNR is greater than the first SNR threshold and less than the second SNR threshold, the target gain is determined to be a wide dynamic range compression (WDRC) gain. The second SNR threshold is greater than the first SNR threshold.

[0146] In one embodiment, the determining unit 101 determines the target gain used for gain processing of the audio signal based on the linear gain value and the wide dynamic range compression gain value in the following manner:

[0147] Based on the signal-to-noise ratio, determine the adaptive compensation coefficients; determine the target frequency band to which the audio signal belongs, with different frequency bands corresponding to different gain compensation functions. The gain compensation function is used to characterize the relationship between the adaptive compensation coefficients, linear gain, and wide dynamic range compression gain; based on the target gain compensation function corresponding to the target frequency band and the adaptive compensation coefficients, determine the linear gain and wide dynamic range compression gain.

[0148] In one embodiment, the determining unit 101 determines the adaptive compensation coefficient based on the signal-to-noise ratio in the following manner:

[0149] If the signal-to-noise ratio (SNR) is less than the third SNR threshold, the adaptive compensation coefficient is 0; if the SNR is greater than the third SNR threshold but less than the first SNR threshold, the adaptive compensation coefficient is determined based on an exponential function (base e) of the SNR.

[0150] In one embodiment, the determining unit 101 determines the wide dynamic range compression gain value in the following manner:

[0151] Determine the input sound pressure level of the audio signal; based on the input sound pressure level and the wide dynamic range compression curve, determine the output sound pressure level; the difference between the output sound pressure level and the input sound pressure level is determined as the wide dynamic range compression gain.

[0152] In one embodiment, the determining unit 101 determines the output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve in the following manner:

[0153] The parameter values for the wide dynamic range compression curve are determined, including the minimum sound pressure input threshold, the minimum sound pressure output threshold, the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain. The first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold. If the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, the output sound pressure level is determined to be 0. If the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold. If the input sound pressure level is greater than the minimum sound pressure output threshold but less than the first inflection point sound pressure threshold, the output sound pressure level is determined based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and the compression ratio. The compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain. If the input sound pressure level is greater than the second inflection point sound pressure threshold, the output sound pressure level is determined based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

[0154] In one embodiment, the determining unit 101 is further configured to determine the pain threshold of the human ear based on the range of human hearing threshold; the device further includes an adjusting unit 103 configured to adjust the output sound pressure level based on the pain threshold.

[0155] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0156] Figure 18 is a block diagram of an audio signal processing apparatus 200 according to an exemplary embodiment. For example, apparatus 200 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.

[0157] Referring to Figure 18, the device 200 may include one or more of the following components: processing component 202, memory 204, power component 206, multimedia component 208, audio component 210, input/output (I/O) interface 212, sensor component 214, and communication component 216.

[0158] Processing component 202 typically controls the overall operation of device 200, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 202 may include one or more processors 220 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 202 may include one or more modules to facilitate interaction between processing component 202 and other components. For example, processing component 202 may include a multimedia module to facilitate interaction between multimedia component 208 and processing component 202.

[0159] Memory 204 is configured to store various types of data to support the operation of device 200. Examples of such data include instructions for any application or method operating on device 200, contact data, phonebook data, messages, pictures, videos, etc. Memory 204 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0160] The power supply component 206 provides power to the various components of the device 200. The power supply component 206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power to the device 200.

[0161] Multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 208 includes a front-facing camera and / or a rear-facing camera. When the device 200 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0162] Audio component 210 is configured to output and / or input audio signals. For example, audio component 210 includes a microphone (MIC) configured to receive external audio signals when device 200 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 204 or transmitted via communication component 216. In some embodiments, audio component 210 also includes a speaker for outputting audio signals.

[0163] I / O interface 212 provides an interface between processing component 202 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0164] Sensor assembly 214 includes one or more sensors for providing status assessments of various aspects of device 200. For example, sensor assembly 214 may detect the on / off state of device 200, the relative positioning of components such as the display and keypad of device 200, changes in the position of device 200 or a component of device 200, the presence or absence of user contact with device 200, the orientation or acceleration / deceleration of device 200, and temperature changes of device 200. Sensor assembly 214 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 214 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.

[0165] Communication component 216 is configured to facilitate wired or wireless communication between device 200 and other devices. Device 200 can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 216 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 216 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0166] In an exemplary embodiment, the apparatus 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0167] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 204 including instructions, which can be executed by a processor 220 of the device 200 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.

[0168] It is understood that in this disclosure, "multiple" refers to two or more, and other quantifiers are similar. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A alone, A and B simultaneously, and B alone. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. The singular forms "a," "said," and "the" are also intended to include the plural forms unless the context clearly indicates otherwise.

[0169] It is further understood that the terms "first," "second," etc., are used to describe various types of information, but this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another, and do not indicate a specific order or degree of importance. In fact, the expressions "first," "second," etc., are completely interchangeable. For example, without departing from the scope of this disclosure, first information can also be referred to as second information, and similarly, second information can also be referred to as first information.

[0170] It can be further understood that, unless otherwise specified, "connection" includes both direct connections where no other components exist between the two parties and indirect connections where other components exist between them.

[0171] It is further understood that although operations are described in a specific order in the accompanying drawings in the embodiments of this disclosure, this should not be construed as requiring these operations to be performed in the specific order or serial order shown, or requiring all of the shown operations to be performed to obtain the desired result. In certain environments, multitasking and parallel processing may be advantageous.

[0172] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein.

[0173] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. An audio signal processing method, characterized in that the method comprises: acquiring an audio signal and determining a signal-to-noise ratio of the audio signal; determining a target gain based on the signal-to-noise ratio and a preset signal-to-noise ratio threshold; and performing gain processing on the audio signal based on the target gain; wherein determining the target gain based on the signal-to-noise ratio and the preset signal-to-noise ratio threshold comprises: if the signal-to-noise ratio is less than a first signal-to-noise ratio threshold, determining that the target gain includes a linear gain and a wide dynamic range compression (WDRC) gain; if the signal-to-noise ratio is greater than a second signal-to-noise ratio threshold, determining that the target gain is a linear gain value, the second signal-to-noise ratio threshold being greater than the first signal-to-noise ratio threshold; and if the signal-to-noise ratio is greater than the first signal-to-noise ratio threshold and less than the second signal-to-noise ratio threshold, determining that the target gain is a wide dynamic range compression (WDRC) gain.

2. The method according to claim 1, characterized in that the target gain includes the linear gain and the wide dynamic range compression (WDRC) gain, and performing gain processing on the audio signal based on the target gain comprises: determining an adaptive compensation coefficient based on the signal-to-noise ratio; determining a target frequency band to which the audio signal belongs, wherein different frequency bands correspond to different gain compensation functions, and each gain compensation function characterizes the relationship among the adaptive compensation coefficient, the linear gain, and the wide dynamic range compression gain; and determining the linear gain and the wide dynamic range compression gain based on the adaptive compensation coefficient and the target gain compensation function corresponding to the target frequency band.

3. The method according to claim 2, characterized in that determining the adaptive compensation coefficient based on the signal-to-noise ratio comprises: if the signal-to-noise ratio is less than a third signal-to-noise ratio threshold, determining that the adaptive compensation coefficient is 0; and if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and less than the first signal-to-noise ratio threshold, determining the adaptive compensation coefficient based on an exponential (base-e) function of the signal-to-noise ratio.

4. The method according to claim 2, characterized in that the wide dynamic range compression gain is determined in the following way: determining an input sound pressure level of the audio signal; determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve; and determining the difference between the output sound pressure level and the input sound pressure level as the wide dynamic range compression gain.

5. The method according to claim 4, characterized in that determining the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve comprises: determining parameter values of the wide dynamic range compression curve, the parameter values including a minimum sound pressure input threshold, a minimum sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain, and a second inflection point sound pressure threshold gain, wherein the first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold; if the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, determining that the output sound pressure level is 0; if the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold; if the input sound pressure level is greater than the minimum sound pressure output threshold and less than the first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain; and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

6. The method according to claim 4, characterized in that the method further comprises: determining a pain threshold of the human ear based on the range of human hearing thresholds; and adjusting the output sound pressure level according to the pain threshold.

7. An audio signal processing apparatus, characterized in that the apparatus comprises: a determining unit configured to acquire an audio signal, determine a signal-to-noise ratio (SNR) of the audio signal, and determine a target gain based on the signal-to-noise ratio and a preset signal-to-noise ratio threshold; and a processing unit configured to perform gain processing on the audio signal based on the target gain; wherein the determining unit determines the target gain based on the signal-to-noise ratio and the preset signal-to-noise ratio threshold in the following manner: if the signal-to-noise ratio is less than a first signal-to-noise ratio threshold, determining that the target gain includes a linear gain and a wide dynamic range compression (WDRC) gain; if the signal-to-noise ratio is greater than a second signal-to-noise ratio threshold, determining that the target gain is a linear gain value, the second signal-to-noise ratio threshold being greater than the first signal-to-noise ratio threshold; and if the signal-to-noise ratio is greater than the first signal-to-noise ratio threshold and less than the second signal-to-noise ratio threshold, determining that the target gain is a wide dynamic range compression (WDRC) gain.

8. The apparatus according to claim 7, characterized in that the processing unit performs gain processing on the audio signal based on the target gain in the following manner: determining an adaptive compensation coefficient based on the signal-to-noise ratio; determining a target frequency band to which the audio signal belongs, wherein different frequency bands correspond to different gain compensation functions, and each gain compensation function characterizes the relationship among the adaptive compensation coefficient, the linear gain value, and the wide dynamic range compression gain value; and determining the target gain used for gain processing of the audio signal based on the target gain compensation function corresponding to the target frequency band, the adaptive compensation coefficient, the linear gain value, and the wide dynamic range compression gain value.

9. The apparatus according to claim 8, characterized in that the determining unit determines the adaptive compensation coefficient based on the signal-to-noise ratio in the following manner: if the signal-to-noise ratio is less than a third signal-to-noise ratio threshold, determining that the adaptive compensation coefficient is 0; and if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and less than the first signal-to-noise ratio threshold, determining the adaptive compensation coefficient based on an exponential (base-e) function of the signal-to-noise ratio.

10. The apparatus according to claim 8, characterized in that the determining unit determines the wide dynamic range compression gain in the following manner: determining an input sound pressure level of the audio signal; determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve; and determining the difference between the output sound pressure level and the input sound pressure level as the wide dynamic range compression gain.

11. The apparatus according to claim 10, characterized in that the determining unit determines the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve in the following manner: determining parameter values of the wide dynamic range compression curve, the parameter values including a minimum sound pressure input threshold, a minimum sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain, and a second inflection point sound pressure threshold gain, wherein the first inflection point sound pressure threshold is less than the second inflection point sound pressure threshold; if the input sound pressure level is greater than 0 and less than the minimum sound pressure input threshold, determining that the output sound pressure level is 0; if the input sound pressure level is greater than the minimum sound pressure input threshold and less than the minimum sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the minimum sound pressure input threshold, and the minimum sound pressure output threshold; if the input sound pressure level is greater than the minimum sound pressure output threshold and less than the first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the minimum sound pressure output threshold, the first inflection point sound pressure threshold gain, and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain; and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.

12. The apparatus according to claim 10, characterized in that the determining unit is further configured to determine a pain threshold of the human ear based on the range of human hearing thresholds; and the apparatus further comprises an adjustment unit configured to adjust the output sound pressure level according to the pain threshold.

13. An audio signal processing device, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the method according to any one of claims 1 to 6.

14. A storage medium, characterized in that the storage medium stores instructions that, when executed by a processor of a device, enable the device to perform the method according to any one of claims 1 to 6.
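
For illustration only, the three-way gain selection recited in claim 1 and the gain definition recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `GainMode`, `select_gain_mode`, and `wdrc_gain_db` are hypothetical, the threshold values 5 dB and 15 dB in the usage lines are arbitrary placeholders, and the handling of a signal-to-noise ratio exactly equal to a threshold (which the claims leave open) is an assumption of this sketch.

```python
from enum import Enum


class GainMode(Enum):
    """Hypothetical labels for the three target-gain cases of claim 1."""
    LINEAR_PLUS_WDRC = "linear gain + WDRC gain"
    WDRC_ONLY = "WDRC gain"
    LINEAR_ONLY = "linear gain"


def select_gain_mode(snr_db: float,
                     first_threshold_db: float,
                     second_threshold_db: float) -> GainMode:
    """Pick the target-gain case from the SNR and the two SNR thresholds."""
    # Claim 1 requires the second threshold to exceed the first.
    if second_threshold_db <= first_threshold_db:
        raise ValueError("second threshold must be greater than first threshold")
    if snr_db < first_threshold_db:
        return GainMode.LINEAR_PLUS_WDRC
    if snr_db > second_threshold_db:
        return GainMode.LINEAR_ONLY
    # Assumption: an SNR exactly equal to either threshold is treated as
    # WDRC-only here; the claims only state the strict inequalities.
    return GainMode.WDRC_ONLY


def wdrc_gain_db(input_spl_db: float, output_spl_db: float) -> float:
    """WDRC gain as output sound pressure level minus input level (claim 4)."""
    return output_spl_db - input_spl_db


# Usage with placeholder thresholds (5 dB and 15 dB are not from the source).
low = select_gain_mode(0.0, 5.0, 15.0)
mid = select_gain_mode(10.0, 5.0, 15.0)
high = select_gain_mode(20.0, 5.0, 15.0)
gain = wdrc_gain_db(50.0, 65.0)
```

The output sound pressure level passed to `wdrc_gain_db` would come from the wide dynamic range compression curve of claim 5, whose exact piecewise formulas are not restated in the claims and are therefore left out of this sketch.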