A hearing compensation method for frequency-selective impairment of the auditory system
By combining deep neural networks and simulated hearing loss models, the shortcomings of traditional hearing aids in frequency-selective hearing loss compensation are addressed, achieving more effective hearing compensation and improving users' speech recognition ability and satisfaction in noisy environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PEKING UNIV
- Filing Date
- 2023-04-14
- Publication Date
- 2026-06-12
AI Technical Summary
Existing hearing aids lack effective hearing compensation algorithms for frequency-selective hearing loss. Traditional methods are insufficient in speech recognition in noisy environments, and manually set gain functions are impractical and difficult to process in real time.
A deep neural network is used to learn nonlinear mappings. Combined with a simulated hearing loss model, the neural network is trained to compensate for speech spectral contrast, replacing the traditional spectral contrast enhancement method. The fully connected neural network and the hearing loss model simulate the auditory system with impaired frequency selection characteristics.
It improves speech recognition capabilities in noisy environments, has better generalizability and real-time processing capabilities, and enhances user satisfaction.
Smart Images
[Figure: CN116582807B_ABST, abstract drawing]
Abstract
Description
Technical Field
[0001] This invention belongs to the field of signal processing technology, and relates to intelligent hearing aid technology, specifically to a hearing compensation method for frequency-selective damage to the auditory system.
Background Technology
[0002] Hearing aids play a crucial role in alleviating communication barriers and restoring the communication abilities of people with hearing loss. People with hearing loss typically experience poorer speech perception in noisy environments than in quiet ones, and one of the key factors affecting their speech perception in noisy environments is the impaired frequency selectivity of their auditory system. Frequency selectivity refers to the auditory system's varying sensitivity and ability to distinguish different frequencies of sound. For hearing-impaired patients, especially those with sensorineural hearing loss, this frequency selectivity is impaired. The outer hair cells of the cochlea lose their ability to increase the cochlea's sensitivity to sound frequencies. Especially in noisy environments, where speech and noise components have similar frequencies, the cochlea experiences excitation in a single, broad area rather than two precisely separated areas, making it difficult for the brain to distinguish the signal from the noise. This results in the auditory system's inability to respond to specific frequency speech signals, impacting speech discrimination ability in noisy environments.
[0003] Traditional methods alleviate this phenomenon by manually defining a gain function to enhance spectral contrast. This involves performing a Fast Fourier Transform on the signal to obtain the corresponding amplitude spectrum, enhancing the frequency components corresponding to higher peaks, and reducing the frequency components corresponding to lower valleys. However, since the specific values for enhancement and reduction are manually set, this approach is often not widely applicable in real-world scenarios, has poor practicality, and is difficult to implement in real time. As a result, few algorithms currently applied in hearing aids specifically address the decline in speech recognition ability caused by impaired frequency selectivity in patients.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention aims to provide a hearing compensation method for frequency-selective hearing loss. This method utilizes deep neural networks to learn complex nonlinear mappings to enhance speech and improve its spectral contrast. Playing the processed audio signal to a user effectively improves user satisfaction and speech recognition ability. Compared to traditional methods for frequency-selective hearing loss compensation, this approach eliminates the need for manually defined gain functions, offering better generalization; neural networks have a stronger ability to characterize nonlinear mappings, enabling more effective compensation for hearing loss; and the method requires less computation, allowing for efficient real-time signal processing. Therefore, this method is a more suitable hearing compensation approach.
[0005] The basic idea behind the hearing compensation method for frequency-selective impairment proposed in this invention is as follows: Compared to a normal auditory system, impaired frequency selectivity in the auditory system alters the frequency analysis of input sound, resulting in reduced speech spectral contrast. By utilizing a neural network to learn a compensation mapping, the spectral contrast of the input speech is enhanced. This ensures that the compensated speech, processed by the impaired auditory system, produces the same effect as the original speech processed by a normal auditory system, thus effectively compensating for the patient's hearing loss. The key innovations of this invention are, firstly, the use of a neural network to replace traditional spectral contrast enhancement methods; and secondly, the introduction of a simulated hearing impairment model to simulate an auditory system with impaired frequency selectivity, which is then used to guide the training of the neural network.
[0006] This invention relates to a hearing compensation method for frequency-selective impairment of the auditory system, the steps of which include:
[0007] 1) A fully connected neural network is used as a compensation module for the input signal, and its initial parameters are randomly generated;
[0008] 2) Introduce a simulated hearing loss model (reference: Baer, T., & Moore, B. C. J. (1993). Effects of spectral smearing on the intelligibility of sentences in noise. The Journal of the Acoustical Society of America, 94(3), 1229-1241) to simulate the impaired frequency selectivity of the damaged auditory system, and set the parameters of the simulated hearing loss model according to the patient's hearing loss status so that it performs the corresponding nonlinear processing of sound;
[0009] 3) Use the neural network from step 1) and the simulated hearing loss model from step 2) to jointly generate paired data (i.e., data-labels);
[0010] 4) Train the neural network described in step 1) using the paired data generated in step 3);
[0011] 5) Repeat steps 3) to 4) until the set termination condition is reached, wherein the termination condition is that the number of training rounds reaches the upper limit of the number of iterations or the error reaches the limit value;
[0012] 6) For a given audio signal, process it using the neural network trained in step 4) to generate the compensated audio, which can then be played to the user.
[0013] Furthermore, the neural network described in step 1) is mainly used to generate the compensated sound signal. Specifically, the input to the network is the original sound signal, and the output is the sound signal after undergoing some nonlinear transformation (i.e., the compensated sound signal).
[0014] Furthermore, the simulated hearing loss model described in step 2) is mainly used to simulate the processing of sound by the damaged auditory system. Its input is a sound signal, and its output is a simulated sound signal heard by a hearing-impaired patient.
[0015] Furthermore, step 3) refers to the joint generation of paired data using the neural network in step 1) and the simulated hearing impairment model in step 2). Given a sound signal x1, the neural network described in step 1) is first used to process the signal to generate a sound signal x2 after a certain nonlinear transformation (i.e., the sound signal after compensation by the current neural network). Then, the hearing impairment simulator described in step 2) is used to process x2 to generate a nonlinear distortion signal x3, thus obtaining the paired data (x3, x2) for neural network training, where x2 is the label corresponding to x3.
[0016] A hearing compensation device for frequency-selective impairment of the auditory system, characterized in that it includes a hearing impairment simulation module, a neural network compensation module, and a paired data generation module; wherein...
[0017] The hearing loss simulation module is used to process sound signals so that the processed signal is consistent with the sound signal after being processed by the damaged hearing system.
[0018] The paired data generation module uses the hearing impairment simulation module and the neural network compensation module to generate paired data.
[0019] The neural network compensation module is used to process sound using a trained neural network to generate a signal that has undergone a certain nonlinear transformation (i.e., the compensated signal).
[0020] This invention focuses on compensating for the impaired frequency selectivity of the auditory system, aiming to enable people to "hear clearly".
[0021] The hearing loss simulator used in this invention includes a set of auditory filters, which utilize the widening of the auditory filter bandwidth to simulate the effects of impaired frequency selectivity.
[0022] This invention achieves the corresponding compensation effect by using a fully connected network in the frequency domain, an approach proposed for the first time in this invention.
[0023] The technical solution of this invention is as follows:
[0024] A hearing compensation method for frequency-selective impairment of the auditory system, comprising the following steps:
[0025] 1) A compensation module is constructed using a fully connected neural network; the compensation module performs nonlinear transformation on each sample sound signal in the training set to obtain a compensated sound signal for the target population with hearing loss.
[0026] 2) Set up a simulated hearing loss model for the target group with the desired degree of hearing loss, and use the simulated hearing loss model as a simulated auditory system for the target group with the desired degree of hearing loss. Perform nonlinear processing on each compensated sound signal to generate a nonlinear distortion signal.
[0027] 3) Use the compensated sound signal corresponding to the sample sound signal as the label of the nonlinear distortion signal corresponding to the sample sound signal to generate paired data;
[0028] 4) Train the neural network using the generated paired data;
[0029] 5) For a given audio signal, process it using the neural network trained in step 4) to generate a compensated audio signal for the target group with hearing loss.
[0030] Furthermore, in step 1), the method for generating the compensated sound signal is as follows: For a sound signal x, the neural network first performs a short-time Fourier transform on the sound signal x to obtain the time-frequency spectrum X of the sound signal x; then, it performs a nonlinear transform on the time-frequency spectrum X to obtain the compensated time-frequency spectrum $X_{NC}$. An inverse short-time Fourier transform is then performed on the compensated time-frequency spectrum $X_{NC}$ to obtain the compensation signal y, which serves as the compensated sound signal for the target group with hearing loss.
[0031] Furthermore, the method for generating nonlinear distortion signals using the simulated hearing impairment model is as follows: For a compensated sound signal y, the simulated hearing impairment model first performs a short-time Fourier transform on the sound signal y to obtain the time-frequency spectrum Y, and then uses the formula ... to process the time-frequency spectrum Y, yielding the time-frequency spectrum Z; where $W_{NH}$ and $W_{HI}$ represent the weight matrix of the auditory filter in the normal auditory system and the weight matrix of the auditory filter in the simulated auditory system, respectively. Then, an inverse short-time Fourier transform is performed on the time-frequency spectrum Z to obtain the time-domain signal as the nonlinear distortion signal z.
[0032] Furthermore, the weight matrix $W_{HI}$ of the auditory filter in the simulated auditory system is calculated using the formula $W(f,g) = (1 + pg)e^{-pg}$, where f is the frequency value corresponding to the i-th row of the output spectrum and g is the energy value corresponding to the frequency value of the j-th row of the input spectrum. The element in the i-th row and j-th column of W is then W(f,g), which represents the response caused by the energy value g at the frequency f in the simulated auditory system; e is the base of the natural logarithm, and p is the parameter that determines the sharpness of the auditory filter.
[0033] Furthermore, f represents the center frequency value corresponding to the auditory filter, ERB represents the equivalent rectangular bandwidth, ERB = 24.7 × (0.00437f + 1); B represents the widening factor parameter.
[0034] Furthermore, $HL_{OHC}$ indicates the degree to which damage to the outer hair cells of the simulated auditory system leads to an increase in the hearing threshold.
[0035] Furthermore, $HL_{OHC} = r \times HL$, where r is the proportional parameter of hearing threshold elevation caused by damage to outer hair cells, and HL is the simulated patient's hearing threshold.
[0036] Furthermore, in step 4), the neural network is trained using the backpropagation gradient descent algorithm; the loss function used for training is ..., where Loss is the value of the loss function, $x_{out}(i)$ represents the signal value at time i of the neural network output, $x_{NC}(i)$ represents the signal value at time i in the label, n is the signal length, and m is the total number of paired data used for training.
[0037] The present invention also provides a server, characterized in that it includes a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program including instructions for performing the steps of the above-described method.
[0038] The present invention also provides a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the above-described method.
[0039] Compared with the prior art, the positive effects of the present invention are as follows:
[0040] By utilizing deep neural networks to learn complex nonlinear mappings for speech compensation, and then playing the processed audio signal back to the user, user satisfaction and speech recognition capabilities can be effectively improved. Compared to traditional frequency-selective hearing compensation methods, this approach does not require manually defined gain functions, making it more generalizable; furthermore, neural networks have a stronger ability to characterize nonlinear mappings, enabling more effective compensation for hearing loss; and the method requires less computation, allowing for efficient real-time signal processing. Therefore, this method is a more suitable hearing compensation approach. The key innovations of this invention are, firstly, the use of neural networks to replace traditional spectral contrast enhancement methods; and secondly, the introduction of a hearing loss model to simulate an auditory system with impaired frequency selectivity, which is then used to guide the training of the neural network.
Attached Figure Description
[0041] Figure 1 is a framework diagram of the hearing compensation method.
[0042] Figure 2 is a schematic diagram of the neural network model structure.
[0043] Figure 3 is a diagram showing the experimental results.
Detailed Implementation
[0044] The specific implementation details of the present invention are described in more detail below. Figure 1 is a schematic diagram of the method framework proposed in this invention. The specific implementation steps of this invention include neural network compensation, hearing loss simulation model processing, paired data generation, and neural network training.
[0045] The specific implementation process of each step is as follows:
[0046] 1. Neural network compensation
[0047] This method uses a neural network to replace the compensation module in traditional hearing compensation techniques. First, a short-time Fourier transform is performed on the sound signal x to obtain the time-frequency spectrum X of the speech signal. The time-frequency spectrum X is then input into a corresponding deep neural network (DNN). The DNN performs a nonlinear transformation on the sound signal and outputs the compensated time-frequency spectrum $X_{NC}$, that is,
[0048] $X_{NC}(n) = \mathrm{DNN}(X(n))$
[0049] where $X_{NC}(n)$ and $X(n)$ represent the respective signals at time n. The network model structure of the DNN is shown in Figure 2; it consists of three ordinary fully connected layers and activation layers. Its initial parameters are randomly generated. After the training described in step 4, the final network parameter settings for different hearing loss groups are learned. Finally, an inverse short-time Fourier transform is performed on the compensated time-frequency spectrum $X_{NC}$ to obtain the compensation signal y.
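The forward pass $X_{NC}(n) = \mathrm{DNN}(X(n))$ can be sketched for a single spectral frame as follows. This is a minimal illustration under stated assumptions, not the patented network: the layer sizes (`n_bins`, `n_hidden`), the ReLU activation, the uniform initialization, and the use of a real-valued magnitude frame are all hypothetical choices that the description does not specify.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Randomly generated initial parameters for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)  # assumed initialization range
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(frame, layer):
    """Affine map: one output value per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, frame)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Assumed activation; the source only says 'activation layer'."""
    return [max(0.0, v) for v in values]


def dnn_forward(frame, layers):
    """Three fully connected layers with an activation after the first two."""
    hidden = frame
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    return dense(hidden, layers[-1])


rng = random.Random(0)
n_bins, n_hidden = 8, 16  # hypothetical sizes, not from the source
layers = [
    init_layer(n_bins, n_hidden, rng),
    init_layer(n_hidden, n_hidden, rng),
    init_layer(n_hidden, n_bins, rng),
]
frame = [rng.random() for _ in range(n_bins)]  # stand-in for X(n)
compensated = dnn_forward(frame, layers)       # X_NC(n) = DNN(X(n))
```

The output has the same number of bins as the input, so the compensated frame can be passed directly to an inverse short-time Fourier transform.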
[0050] 2. Simulation of hearing loss model processing
[0051] The simulated hearing loss model introduced in this method is mainly used to simulate how a real impaired auditory system processes sound. The input signal y is subjected to a short-time Fourier transform to obtain its time-frequency spectrum Y. The time-frequency spectrum of the processed signal is then obtained using the following formula:
[0052]
[0053] where $W_{NH}$ and $W_{HI}$ represent the auditory filter weight matrix for simulating a normal hearing model and the auditory filter weight matrix for simulating a hearing impairment model, respectively. Their calculation formulas are as follows:
[0054] $W(f,g) = (1 + pg)e^{-pg}$
[0055] Let f be the frequency value corresponding to the i-th row of the output spectrum, and g be the energy value corresponding to the frequency value of the j-th row of the input spectrum. Then the element in the i-th row and j-th column of W is W(f,g); e is the base of the natural logarithm, and p is the auditory filter sharpness parameter, which is affected by f and the widening factor parameter B ($W_{HI}$ and $W_{NH}$ correspond to different values of B). The specific calculation formula is as follows:
[0056]
[0057] Where f represents the center frequency value corresponding to the filter, corresponding to f in the previous formula, and ERB represents the equivalent rectangular bandwidth, calculated by the following formula:
[0058] ERB = 24.7 × (0.00437f + 1)
[0059] B represents the widening factor parameter, calculated using the following formula ($W_{HI}$ and $W_{NH}$ correspond to different $HL_{OHC}$):
[0060]
[0061] where $HL_{OHC}$ is the degree to which damage to the outer hair cells of the auditory system leads to an increase in the hearing threshold, calculated as follows:
[0062] $HL_{OHC} = r \times HL$
[0063] where r is the proportional parameter of hearing threshold elevation caused by damage to outer hair cells, which is manually set, and HL is the simulated patient hearing threshold, generally derived from actual measurements. HL, as the hearing threshold, reflects the degree of hearing loss in the patient. The settings of HL and r determine the difference in B, and thus affect the values of the two weight matrices $W_{HI}$ and $W_{NH}$, which allows the model to simulate the auditory system's processing of speech based on the set hearing threshold.
[0064] Finally, by performing an inverse short-time Fourier transform on the output time spectrum Z, the final output time-domain signal z can be obtained as a nonlinear distortion signal.
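The three closed-form expressions stated in this section can be evaluated directly, as in the short sketch below. It only covers the formulas that are written out in the text (the filter weight, the ERB, and $HL_{OHC}$); the expressions for Z, p, and B are not reproduced in the text, so p is simply passed in as a number. The numeric values of p, r, and HL are hypothetical and chosen only for illustration.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz: ERB = 24.7 * (0.00437 f + 1)."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def roex_weight(p, g):
    """Auditory filter weight W = (1 + p g) * exp(-p g)."""
    return (1.0 + p * g) * math.exp(-p * g)


def hl_ohc(r, hl_db):
    """Threshold elevation attributed to outer hair cells: HL_OHC = r * HL."""
    return r * hl_db


erb_1k = erb(1000.0)            # bandwidth at a 1 kHz center frequency
sharp = roex_weight(30.0, 0.1)  # hypothetical p for a sharper filter
broad = roex_weight(15.0, 0.1)  # hypothetical smaller p for a widened filter
ohc_part = hl_ohc(0.8, 50.0)    # hypothetical r = 0.8 and HL = 50 dB
```

A smaller p gives a larger weight at the same g, which is how a widened filter lets more off-frequency energy through and reduces spectral contrast.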
[0065] 3. Paired data generation
[0066] Specifically, as described in 1 and 2 above, given a sound signal x, the neural network (DNN) first processes it to generate a sound signal $x_{NC}$ that has undergone a certain nonlinear transformation (i.e., the sound signal after compensation by the current neural network); the simulated hearing loss model then processes $x_{NC}$ to generate the nonlinear distortion signal x′ that results from processing by the impaired auditory system, which yields the paired data (x′, $x_{NC}$), where x′ is the input of the neural network and $x_{NC}$ is the label corresponding to x′.
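The pairing step can be sketched as a small helper that takes the current network and the hearing loss model as callables. The function name `generate_pairs` and the two toy stand-ins (a gain of 2 and an attenuation of 0.5) are hypothetical; they replace the real DNN and the real simulated hearing loss model only so that the example runs on its own.

```python
def generate_pairs(signals, compensate, simulate_loss):
    """Build (x_prime, x_nc) pairs: x_prime is the input, x_nc its label."""
    pairs = []
    for x in signals:
        x_nc = compensate(x)           # output of the current network
        x_prime = simulate_loss(x_nc)  # distorted by the simulated impairment
        pairs.append((x_prime, x_nc))
    return pairs


def toy_compensate(x):
    """Hypothetical stand-in for the current DNN: a fixed gain of 2."""
    return [2.0 * v for v in x]


def toy_simulate(y):
    """Hypothetical stand-in for the hearing loss model: attenuation by 0.5."""
    return [0.5 * v for v in y]


pairs = generate_pairs([[1.0, -1.0], [0.5, 0.25]], toy_compensate, toy_simulate)
```

Because the label is the network's own output before distortion, training on these pairs pushes the network toward undoing the effect of the hearing loss model.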
[0067] 4. Train neural networks using paired data
[0068] Finally, the above paired data is used to train the neural network DNN. After a certain number of training rounds, the above process 1-3 is repeated to obtain new paired data, which is regarded as one iteration. Then, training continues until the final termination condition is met (i.e., the number of iterations reaches the set value or the loss function value drops to the set value).
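The alternation between pair generation and training, together with the two termination conditions, can be illustrated with a deliberately tiny example. Everything here is a toy assumption: the "network" is a single scalar gain `w`, the "impaired system" halves its input, and the learning rate, rounds per iteration, and limits are hypothetical values. The sketch shows only the control flow, not the patented model.

```python
def mean_sq_error(w, pairs):
    """Mean squared error of the scalar 'network' w over all paired samples."""
    errs = [(w * xp - t) ** 2 for xp, t in pairs]
    return sum(errs) / len(errs)


def train_alternating(samples, w=1.0, lr=1.0, rounds_per_iter=5,
                      max_iters=200, loss_limit=1e-10):
    """Alternate pair generation (processes 1-3) with training (process 4)."""
    def simulate(y):
        return 0.5 * y  # toy impaired system: halves the signal

    loss = float("inf")
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Processes 1-3: regenerate pairs (input, label) with the current w.
        pairs = [(simulate(w * x), w * x) for x in samples]
        # Process 4: a few gradient-descent rounds on the fixed pairs.
        for _ in range(rounds_per_iter):
            grad = sum(2.0 * (w * xp - t) * xp for xp, t in pairs) / len(pairs)
            w -= lr * grad
        loss = mean_sq_error(w, pairs)
        if loss <= loss_limit:  # termination: loss dropped to the set value
            break
    return w, loss, iteration


w_final, loss_final, iters_used = train_alternating([1.0, -0.5, 0.25])
```

In this toy setting the gain settles near 2, the inverse of the 0.5 attenuation, which mirrors the stated goal that compensated speech heard through the impaired system should match the original heard through a normal one.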
[0069] The training algorithm employs the classic backpropagation gradient descent algorithm, with an initial learning rate of 0.00001, which decays to 0.8 of the original rate every 5 epochs. The optimizer uses the common Adam optimization, with the objective of minimizing the following loss function:
[0070]
[0071] where Loss is the loss function value to be minimized, $x_{out}(i)$ represents the signal value output by the network at time i, $x_{NC}(i)$ is the signal value at time i of the label $x_{NC}$ described in process 3, n is the signal length, and m is the total number of paired data used in training.
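The training settings above can be sketched as two small functions. The learning-rate schedule follows the stated numbers (initial rate 0.00001, multiplied by 0.8 every 5 epochs), read here as a step decay. The loss formula itself is not reproduced in the text, so `mse_loss` is an assumption: a squared error averaged over the n signal values and the m pairs, which is consistent with the listed symbols but may differ from the patented expression.

```python
def learning_rate(epoch, initial=1e-5, decay=0.8, step=5):
    """Step decay: multiply the rate by `decay` once every `step` epochs."""
    return initial * decay ** (epoch // step)


def mse_loss(outputs, labels):
    """Assumed loss: squared error averaged over n values and m pairs."""
    m = len(outputs)     # number of paired signals
    n = len(outputs[0])  # signal length
    total = sum((o - t) ** 2
                for out, lab in zip(outputs, labels)
                for o, t in zip(out, lab))
    return total / (m * n)


rates = [learning_rate(e) for e in (0, 4, 5, 10)]
loss = mse_loss([[1.0, 2.0]], [[1.0, 4.0]])
```

With these definitions the rate stays at its initial value for epochs 0 to 4, then drops to 0.8 of it at epoch 5 and to 0.64 of it at epoch 10.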
[0072] The advantages of the present invention will be explained below with reference to specific embodiments.
[0073] This method was used to evaluate the effectiveness of hearing compensation under computer simulation conditions.
[0074] 1. Experimental setup
[0075] To verify the effectiveness of the hearing compensation method proposed in this invention, we conducted a computer simulation experiment. The experiment simulated a common type of hearing loss (moderate hearing loss). Six conditions were set in the experiment: Condition 1 was no background noise; Conditions 2 and 3 added spectral noise as background noise, with a signal-to-noise ratio (SNR) of 0 dB and -3 dB, respectively; Conditions 4, 5, and 6 added competing speaker speech as background noise, with SNRs of -3 dB, -6 dB, and -9 dB, respectively. The results were compared between no processing, processing using the network trained by this method, and processing using traditional compensation methods. Twelve participants were recruited to conduct subjective voting, comparing pairs to select the method with the better subjective listening experience.
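For readers reproducing the noise conditions, the following sketch shows one common way to mix a speech signal with background noise at a target SNR. The description does not state how the mixtures were built, so the helper `mix_at_snr` and the synthetic sine signals are assumptions used only to make the example self-contained.

```python
import math


def power(signal):
    """Mean power of a sampled signal."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so 10*log10(P_speech / P_noise) equals snr_db, then add."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Deterministic stand-ins for a speech recording and a background noise.
speech = [math.sin(0.1 * i) for i in range(400)]
noise = [math.sin(0.37 * i + 1.0) for i in range(400)]

mixture, scaled_noise = mix_at_snr(speech, noise, -3.0)
achieved = 10.0 * math.log10(power(speech) / power(scaled_noise))
```

At negative SNRs such as -3 dB, -6 dB, and -9 dB the scaled noise carries more power than the speech, which is what makes the competing-speaker conditions demanding.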
[0076] 2. Experimental Results
[0077] Figure 3 shows the experimental results. Under every condition, our method outperforms the other two treatments, and the difference is statistically significant, indicating that the proposed method has advantages over traditional methods.
[0078] Although specific embodiments and accompanying drawings of the invention have been disclosed for illustrative purposes to aid in understanding and implementing the invention, those skilled in the art will understand that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the content disclosed in the preferred embodiments and accompanying drawings.
Claims
1. A hearing compensation method for frequency-selective impairment of the auditory system, comprising the following steps: 1) construct a compensation module using a fully connected neural network; the compensation module performs a nonlinear transformation on each sample sound signal in the training set to obtain a compensated sound signal for the target population with hearing loss; 2) set up a simulated hearing loss model for the target group with the desired degree of hearing loss, use the simulated hearing loss model as a simulated auditory system for the target group with the desired degree of hearing loss, and perform nonlinear processing on each compensated sound signal to generate a nonlinear distortion signal; 3) use the compensated sound signal corresponding to the sample sound signal as the label of the nonlinear distortion signal corresponding to the sample sound signal to generate paired data; 4) train the neural network using the generated paired data; 5) for a given audio signal, process it using the neural network trained in step 4) to generate a compensated audio signal for the target group with hearing loss.
2. The method as described in claim 1, characterized in that, in step 1), the method for generating the compensated sound signal is: for a sound signal x, the neural network first performs a short-time Fourier transform on the sound signal x to obtain a time-frequency spectrum X of the sound signal x; then performs a nonlinear transformation on the time-frequency spectrum X to obtain a compensated time-frequency spectrum $X_{NC}$; then performs an inverse short-time Fourier transform on the compensated time-frequency spectrum $X_{NC}$ to obtain the compensation signal y, which serves as the compensated sound signal for the target group with hearing loss.
3. The method as described in claim 2, characterized in that the method for generating nonlinear distortion signals using the simulated hearing impairment model is as follows: for a compensated sound signal y, the simulated hearing impairment model first performs a short-time Fourier transform on the sound signal y to obtain the time-frequency spectrum Y, and then uses the formula ... to process the time-frequency spectrum Y to obtain the time-frequency spectrum Z; wherein $W_{NH}$ and $W_{HI}$ represent the weight matrix of the auditory filter in the normal auditory system and the weight matrix of the auditory filter in the simulated auditory system, respectively; then, an inverse short-time Fourier transform is performed on the time-frequency spectrum Z to obtain the time-domain signal as the nonlinear distortion signal z.
4. The method as described in claim 3, characterized in that the weight matrix $W_{HI}$ of the auditory filter in the simulated auditory system is calculated through the formula $W(f,g) = (1 + pg)e^{-pg}$; wherein f is the frequency value corresponding to the i-th row of the output spectrum and g is the energy value of the frequency corresponding to the j-th row of the input spectrum, so that the element in the i-th row and j-th column of W is W(f,g); W(f,g) indicates the response caused by the energy value g at frequency f in the simulated auditory system; e is the base of the natural logarithm; and p is the parameter that determines the sharpness of the auditory filter.
5. The method as described in claim 4, characterized in that p = ...; wherein f represents the center frequency value corresponding to the auditory filter, and ERB represents the equivalent rectangular bandwidth, ERB = 24.7 × (0.00437f + 1); B represents the widening factor parameter.
6. The method as described in claim 5, characterized in that B = ...; wherein $HL_{OHC}$ indicates the degree to which damage to the outer hair cells of the simulated auditory system leads to an increase in the hearing threshold.
7. The method as described in claim 6, characterized in that $HL_{OHC} = r \times HL$, where r is the proportional parameter of hearing threshold elevation caused by damage to outer hair cells, and HL is the simulated patient's hearing threshold.
8. The method as described in claim 2, characterized in that, in step 4), the neural network is trained using the backpropagation gradient descent algorithm; the loss function used for training is Loss = ..., where Loss is the value of the loss function, $x_{out}(i)$ is the frequency spectrum of the i-th frame output by the neural network, $x_{NC}(i)$ is the frequency spectrum of the i-th frame in the label, n is the number of frames corresponding to the frequency spectrum of the signal, and m is the total number of paired data used for training.
9. A server, characterized in that it includes a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program including instructions for performing each step of the method of any one of claims 1 to 8.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 8.