
60 results about how to "Improve speech enhancement performance" patented technology

Voice enhancement device based on distributed microphone array network

Active | CN105206281A | Tags: Expand the scope of space observation; Improve speech enhancement performance; Speech analysis; Speech enhancement; Single node
The invention discloses a voice enhancement device based on a distributed microphone array network. The method comprises the following steps: building a distributed microphone array network on an ad hoc network; synchronizing the sampling rates of the network nodes; dividing the signals of each node into frames; applying a multi-channel Wiener filter at each node for voice enhancement; transmitting the enhanced voice signal to all other nodes of the network; and, at each node, applying the multi-channel Wiener filter again to the current node's multi-channel microphone array observations together with the single-channel enhanced voice signals received from the other nodes, yielding an updated single-channel enhanced voice signal for the current node. Isolated microphone arrays are connected through a wireless communication network to form a microphone array network, so that the voice enhancement achieved by a single node can be improved.
Owner:胡旻波

Microphone array voice enhancement device with sound source direction tracking function and method thereof

The invention provides a microphone array voice enhancement device with a sound source direction tracking function, and a corresponding method, in the field of voice signal processing. The device comprises a microphone array, an adjustable parallel beamformer group module, a fixed-parameter FIR (Finite Impulse Response) filter module, a fixed-parameter signal blocking module, an adaptive noise canceller module and a sound source direction update module. The method comprises the following steps: initialization, adjustable beamforming, fixed-parameter filtering, signal blocking, adaptive noise cancellation, and sound source direction updating. By combining the adjustable parallel beamformer group with a sidelobe canceller structure, the invention tracks the target sound source direction in real time and embeds the tracking function directly into the generalized sidelobe canceller structure, achieving both sound source direction tracking and voice enhancement and thereby overcoming the sensitivity of algorithm performance to DOA (Direction of Arrival) estimation errors.
Owner:XIAMEN UNIV
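The generalized sidelobe canceller (GSC) structure named in the abstract can be illustrated with a much simpler two-microphone case. The Python sketch below is not the patented device: it assumes the target is already time-aligned across both microphones, replaces the adjustable parallel beamformer group and fixed-parameter FIR filter with a plain average, uses the channel difference as the blocking branch, and adapts a single hypothetical LMS weight `w` as the noise canceller. Direction tracking is omitted.

```python
import math
import random


def gsc_two_mic(mic1, mic2, mu=0.05):
    """Simplified two-microphone GSC with a single-tap LMS noise canceller.

    Assumes the target is time-aligned on both microphones, so the sum
    branch passes it and the difference branch blocks it.
    """
    w = 0.0  # hypothetical single adaptive weight
    out = []
    for x1, x2 in zip(mic1, mic2):
        fixed = 0.5 * (x1 + x2)   # fixed beamformer (delay-and-sum with zero delay)
        blocked = x1 - x2         # blocking branch: target cancels, noise remains
        e = fixed - w * blocked   # output = fixed beam minus estimated residual noise
        w += mu * e * blocked     # LMS update that minimizes output power
        out.append(e)
    return out, w


# Example: a weak sinusoidal target plus noise that reaches mic2 at half level.
# For this mixing the fixed beam holds 0.75 * noise and the blocking branch
# 0.5 * noise, so the ideal canceller weight is 1.5.
random.seed(0)
n = 5000
target = [0.1 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]
mic1 = [s + v for s, v in zip(target, noise)]
mic2 = [s + 0.5 * v for s, v in zip(target, noise)]
out, w = gsc_two_mic(mic1, mic2)
```

Because the target is absent from the blocking branch, the canceller can only remove noise; in a real array, steering errors leak target into that branch, which is the DOA sensitivity the patent addresses.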

Method for estimating priori SAP based on statistical model

The a priori speech absence probability is the probability that speech is absent in a given frame and frequency bin of an input signal. Because it is difficult to estimate, it has traditionally been treated as a constant (generally 0.5), although attempts to estimate it have been made since 2002. A novel method for estimating the a priori speech absence probability using a statistical model is proposed. The method obtains the a priori speech absence probability of input speech data from a local parameter, a global parameter and an average parameter. The local and global parameters are obtained by mapping values smaller than a first threshold to 0, values greater than a second threshold to 1, and values between the two thresholds through a raised cosine function. The average parameter is obtained as a frame average of the a posteriori signal-to-noise ratio on a log scale.
Owner:ELECTRONICS & TELECOMM RES INST
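The threshold mapping used for the local and global parameters can be written as a short function. This is a sketch under assumptions: the function names are hypothetical, the threshold values are placeholders, and the log-scale frame average is interpreted here as the mean of the per-bin a posteriori SNR in dB. How the three parameters are combined into the final probability is not stated in the abstract, so that step is not shown.

```python
import math


def raised_cosine_map(x, t_low, t_high):
    """Map x to 0 at or below t_low, 1 at or above t_high, raised cosine in between."""
    if x <= t_low:
        return 0.0
    if x >= t_high:
        return 1.0
    return 0.5 - 0.5 * math.cos(math.pi * (x - t_low) / (t_high - t_low))


def frame_average_log_snr(posterior_snrs):
    """Mean of the per-bin a posteriori SNRs (linear, > 0) expressed in dB."""
    return sum(10.0 * math.log10(g) for g in posterior_snrs) / len(posterior_snrs)


# Placeholder thresholds 1.0 and 3.0; real values would be tuned.
local_params = [raised_cosine_map(v, 1.0, 3.0) for v in (0.5, 2.0, 4.0)]
```

The raised cosine gives a smooth, monotonic transition between the hard 0 and 1 decisions, avoiding the abrupt jumps a single threshold would cause.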

Voice enhancement method, device, equipment and storage medium

The invention provides a voice enhancement method, device, equipment and storage medium. The method comprises the following steps: acquiring a first voice signal and a second voice signal; acquiring the signal-to-noise ratio of the first voice signal; determining, according to that signal-to-noise ratio, fusion coefficients for the filtered signals corresponding to the first and second voice signals; and fusing the filtered signals according to the fusion coefficients to obtain a voice-enhanced signal. The fusion coefficients of the signals from a non-air-conduction voice sensor and an air-conduction voice sensor are thus adjusted adaptively according to the environmental noise, improving the signal quality after fusion and the voice enhancement effect.
Owner:SHENZHEN GOODIX TECH CO LTD
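A minimal sketch of SNR-dependent fusion, assuming the first voice signal comes from the air-conduction sensor and the second from the non-air-conduction (for example, bone-conduction) sensor. The logistic mapping from SNR to the fusion coefficient, and its `center_db` and `slope` parameters, are hypothetical choices for illustration; the abstract does not specify the mapping or the filters applied before fusion, so filtering is omitted.

```python
import math


def fusion_weight(snr_db, center_db=10.0, slope=0.5):
    """Hypothetical logistic map: high air-conduction SNR gives a weight near 1."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - center_db)))


def fuse(air, non_air, snr_db):
    """Weighted sum of the (already filtered) air- and non-air-conduction signals."""
    a = fusion_weight(snr_db)
    return [a * x + (1.0 - a) * y for x, y in zip(air, non_air)]
```

In quiet conditions the air-conduction signal dominates; as the ambient noise rises, the weight shifts toward the noise-robust non-air-conduction sensor.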

Broadband wave beam forming method and apparatus

The disclosed broadband beamforming method comprises: determining the sub-band signals corresponding to the microphone signals, together with the frequency-domain correlation matrix of the signals; determining a weight vector for every sub-band signal from the 3D spatial transfer vector of the signal source and the aforementioned correlation matrix; and then determining the output signal. The invention combines frequency-domain and spatial-domain processing of speech, improves SNR, and is suitable for a wide range of applications.
Owner:HUAWEI TECH CO LTD +1

Microphone speech enhancement method and microphone speech enhancement device

Inactive | CN105244036A | Tags: High quality pickup; High Noise Estimation Accuracy; Speech analysis; Sound sources; Array element
The invention provides a microphone speech enhancement method and a corresponding device. The method comprises the steps of: acquiring first array speech signals captured by multi-channel digital speech acquisition equipment; calculating, from the first array speech signals, the optimal beam output signal synthesized from them according to a minimum variance adaptive beam optimization model of the first array speech signals; and performing single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal, wherein the minimum variance adaptive beam optimization model includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition equipment. The method and device can process the raw speech captured by acquisition arrays with many array elements and large element spacing.
Owner:ZTE CORP
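A minimum variance model with a spatial steering vector is commonly solved in closed form as the MVDR beamformer, w = R^-1 d / (d^H R^-1 d), for each frequency bin. The pure-Python sketch below assumes that standard MVDR solution, a far-field uniform linear array for the steering vector, and a known covariance matrix `r`; it is illustrative rather than the patented optimization model, and the single-channel post-enhancement stage is not shown.

```python
import cmath
import math


def solve(a, b):
    """Solve a @ x = b (small complex system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def steering_vector(n_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field uniform linear array steering vector (assumed geometry)."""
    tau = spacing_m * math.sin(angle_rad) / c  # inter-element delay in seconds
    return [cmath.exp(-2j * math.pi * freq_hz * k * tau) for k in range(n_mics)]


def mvdr_weights(r, d):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    r_inv_d = solve(r, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, r_inv_d))
    return [xi / denom for xi in r_inv_d]


# Target at broadside, strong interferer at 60 degrees, 1 kHz bin, 5 cm spacing.
d = steering_vector(4, 0.05, 0.0, 1000.0)
u = steering_vector(4, 0.05, math.radians(60), 1000.0)
r = [[(0.1 if i == j else 0.0) + 10.0 * u[i] * u[j].conjugate() for j in range(4)]
     for i in range(4)]
w = mvdr_weights(r, d)
gain_target = sum(wi.conjugate() * di for wi, di in zip(w, d))  # distortionless: 1
gain_interf = sum(wi.conjugate() * ui for wi, ui in zip(w, u))  # strongly attenuated
```

The constraint w^H d = 1 keeps the target undistorted while the minimum variance criterion places a deep null toward the interferer.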

Method and device for estimating noise power spectral density of speech signal

The invention relates to the technical field of speech processing and specifically provides a method and a device for estimating the noise power spectral density of a speech signal. The method comprises the following steps: extracting a temporal context window feature from a noisy speech signal and inputting it to a pre-trained speech presence probability estimator, which outputs an estimated speech presence probability for the current time frame; correcting the estimated probability according to Bayes' rule to determine the speech presence probability; and determining the noise power spectral density of that time frame from the speech presence probability using a recursive smoothing formula. The scheme improves the estimation accuracy of the noise power spectral density with limited computing resources, which helps to eliminate noise effectively, minimize distortion during speech processing and improve speech enhancement performance.
Owner:PING AN TECH (SHENZHEN) CO LTD
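The recursive smoothing step can be sketched as follows. The formula used here, a time-varying smoothing factor alpha + (1 - alpha) * p in the style of MCRA-type estimators, is an assumption: the abstract names a recursive smoothing formula but does not give it. The trained speech presence probability estimator and the Bayes-rule correction are not shown; `speech_prob` is taken as already computed.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha=0.8):
    """One frame of SPP-driven recursive noise PSD smoothing (assumed formula).

    Per frequency bin: a = alpha + (1 - alpha) * p, then
    N_new = a * N_prev + (1 - a) * |Y|^2.
    p -> 1 (speech likely) freezes the estimate; p -> 0 tracks |Y|^2.
    """
    updated = []
    for n_prev, y_pow, p in zip(noise_psd, noisy_power, speech_prob):
        a = alpha + (1.0 - alpha) * p
        updated.append(a * n_prev + (1.0 - a) * y_pow)
    return updated
```

Tying the smoothing factor to the speech presence probability keeps speech energy from leaking into the noise estimate, which is why the accuracy of that probability matters.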

Speech enhancement method based on multi-head self-attention mechanism

The invention relates to a speech enhancement method based on a multi-head self-attention mechanism, in the technical field of speech enhancement. Existing attention-based speech enhancement methods cannot explicitly suppress noise during the attention computation. Drawing on research into the masking effect in human auditory perception, the invention proposes a speech enhancement method based on a multi-head self-attention mechanism in which noise suppression is performed within the attention computation itself, improving speech enhancement performance.
Owner:BEIJING INST OF COMP TECH & APPL

Speech enhancement method, device, apparatus and storage medium

The invention provides a speech enhancement method, device, apparatus and storage medium. The method includes the following steps: obtaining the speech features of the speech to be enhanced; inputting these features into an enhancement model to obtain the ideal ratio mask (IRM) of the speech, wherein the enhancement model is implemented with a generative adversarial network (GAN) and maps speech features to an IRM; and obtaining the enhancement result from the speech features and the IRM. The method, device, apparatus and storage medium can improve the speech enhancement effect.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
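For reference, the IRM that the GAN-based model predicts is commonly defined per time-frequency bin as (S^2 / (S^2 + N^2))^beta, with beta = 0.5 a frequent choice. The sketch below computes that training target from clean and noise magnitudes and applies a mask to a noisy magnitude spectrum; the exponent and the function names are assumptions, and the GAN itself is not shown.

```python
def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    """Per-bin IRM = (S^2 / (S^2 + N^2)) ** beta; bins with S = N = 0 get 0."""
    mask = []
    for s, n in zip(clean_mag, noise_mag):
        power = s * s + n * n
        mask.append((s * s / power) ** beta if power > 0 else 0.0)
    return mask


def apply_mask(noisy_mag, mask):
    """Enhanced magnitude = mask * noisy magnitude (the noisy phase is reused elsewhere)."""
    return [m * y for m, y in zip(mask, noisy_mag)]
```

Because the mask is bounded in [0, 1], predicting it is usually better conditioned than regressing the clean spectrum directly.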

Training method and device of speech enhancement model as well as speech enhancement method and device

Active | CN112927707A | Tags: Improve speech enhancement performance; Improving Scale-Independent Signal-to-Noise Ratio; Speech analysis; Noise; Speech sound
The invention relates to a training method and device of a speech enhancement model as well as a speech enhancement method and device. The training method comprises the steps that: the feature vectors of noisy speech samples and first pure speech samples of a plurality of speakers are obtained, wherein the noisy voice sample of each speaker is obtained by adding noise data to a second pure speech sample corresponding to the speaker; the amplitude spectra of the noisy speech samples are input into a speech enhancement network to obtain an estimated first mask ratio; the estimated first mask ratio and the feature vector are input into an attention mechanism network to obtain an estimated second mask ratio; an estimated amplitude spectrum is determined according to the estimated second mask ratio and the amplitude spectra, and a loss function of the speech enhancement model is determined according to the estimated amplitude spectrum and the amplitude spectra of the second pure speech samples; and the speech enhancement model is trained by adjusting parameters of the speech enhancement network and the attention mechanism network according to the loss function.
Owner:BEIJING DAJIA INTERNET INFORMATION TECH CO LTD

Speech enhancement method based on gated cycle encoding and decoding network

The invention relates to a speech enhancement method based on a gated cycle encoding and decoding network, in the technical field of speech enhancement. Drawing on research into human auditory perception, the method addresses the problem that conventional speech enhancement methods do not exploit the relationship between context information and the current speech frame to be enhanced. An encoder-decoder architecture is introduced into the speech enhancement task: the encoder models multiple adjacent frames of the speech signal to extract context information, and the decoder mines the relationship between the current frame and that context, improving speech enhancement performance.
Owner:BEIJING INST OF COMP TECH & APPL

Speech enhancement method using stacked multiscale modules

The invention discloses an end-to-end speech enhancement method using stacked multiscale modules. The method comprises the following steps: S1, building a cascaded end-to-end speech enhancement framework in which the stacked multiscale modules are spliced into a network structure; S2, in the preprocessing stage, converting the time-domain signal into a two-dimensional feature; S3, enhancing the two-dimensional feature with a speech enhancement module; and S4, in the postprocessing stage, converting the enhanced feature representation back into a one-dimensional time-domain signal by decoding and synthesis. To further improve performance, the speech enhancement evaluation metrics STOI and SDR are incorporated into the loss function through a multi-target joint optimization training strategy. Experiments show that the method markedly improves speech enhancement and is more robust to noise under unknown noise conditions and at low signal-to-noise ratios.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA
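One way to fold SDR into a training loss is to subtract a weighted SDR term from a reconstruction error, so that minimizing the loss raises SDR. The sketch below assumes the plain SDR definition 10 log10(||s||^2 / ||s - s_hat||^2), an MSE reconstruction term and hypothetical weights; the STOI term used in the patent is not implemented here.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))


def joint_loss(reference, estimate, mse_weight=1.0, sdr_weight=0.01):
    """Hypothetical multi-target loss: MSE minus a weighted SDR term."""
    mse = sum((s - e) ** 2 for s, e in zip(reference, estimate)) / len(reference)
    return mse_weight * mse - sdr_weight * sdr_db(reference, estimate)
```

The small `eps` keeps the logarithm finite when the estimate is perfect or the reference is silent.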

Speech enhancement method in music background based on non-negative matrix factorization

The invention discloses a speech enhancement method for a music background based on non-negative matrix factorization (NMF), in the field of speech analysis and synthesis and audio analysis and processing. The method comprises framing and windowing a mixed signal of music and speech and applying NMF to its short-time Fourier transform (STFT) magnitude spectrum, wherein the basis matrix of the background music is obtained by training and is kept fixed during decomposition; the magnitude spectrum of the speech signal is synthesized from the decomposition result, and the enhanced speech signal is then recovered by combining it with the phase spectrum of the original mixed signal. Decomposition can be performed under different speech sparsity constraints and temporal continuity constraints, and strengthening the temporal continuity constraint on the background music effectively improves speech enhancement against a music background.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Phase-sensitive gated multi-scale dilated convolutional network speech enhancing method and system

The invention provides a phase-sensitive gated multi-scale dilated convolutional network speech enhancement method. The method comprises the following steps: using a neural network model to construct a mapping between the complex spectra of speech signals; mapping the real and imaginary spectra of noisy speech obtained by time-frequency analysis to enhanced real and imaginary spectra; and recovering an enhanced time-domain speech signal from the enhanced spectra. The invention also provides a corresponding phase-sensitive gated multi-scale dilated convolutional network speech enhancement system. The method improves the speech enhancement effect, ensures good intelligibility of the enhanced speech, and better avoids speech distortion.
Owner:SHENZHEN INSTITUTE OF INFORMATION TECHNOLOGY

Speech enhancement method and device

The invention discloses a speech enhancement method and device. The method comprises the following steps: acquiring a characteristic quantity of the noise in a silent segment of a speech signal; determining, from multiple preset noise classes and according to that characteristic quantity, the noise class that matches the silent-segment noise, wherein the noise classes are obtained by clustering multiple noise samples according to their characteristic information; determining the noise model corresponding to the matched noise class from a mapping between noise classes and noise models; and performing speech enhancement on the speech signal with that noise model. Because enhancement uses the noise model of the matched noise class, the speech enhancement effect can be improved.
Owner:HUAWEI TECH CO LTD
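The class-matching step amounts to a nearest-centroid lookup followed by a table lookup. In this sketch the centroids, the Euclidean distance, the feature values and the class-to-model table are all hypothetical; the abstract does not say which features or distance measure are used.

```python
import math


def nearest_noise_class(feature, centroids):
    """Return the label of the centroid closest (Euclidean) to the feature vector."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


def select_noise_model(feature, centroids, class_to_model):
    """Match the silent-segment noise to a class, then look up its noise model."""
    return class_to_model[nearest_noise_class(feature, centroids)]


# Hypothetical two-dimensional features and centroids from offline clustering.
centroids = {"babble": [0.8, 0.2], "car": [0.1, 0.9]}
class_to_model = {"babble": "babble_model", "car": "car_model"}
model = select_noise_model([0.2, 0.8], centroids, class_to_model)
```

Clustering offline keeps the online step cheap: only one distance per class is computed for each silent segment.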

Single-channel speech enhancement method based on joint dictionary learning and sparse representation

Active | CN111508518A | Tags: Quality improvement; Increased Time-Frequency Characterization Capabilities; Speech analysis; Complex mathematical operations; Dictionary learning; Frequency spectrum
The invention provides a single-channel speech enhancement method based on joint dictionary learning and sparse representation. A dual-tree complex wavelet transform (DTCWT) is applied to clean speech to obtain a set of sub-band signals, and a short-time Fourier transform (STFT) of each sub-band gives its time-frequency spectrum. A joint dictionary of clean speech is learned from the magnitude, real part, imaginary part and sparsity of the sub-band signals, and a joint dictionary of clean noise is learned likewise. Noisy speech undergoes the same DTCWT and STFT to obtain the time-frequency spectrum of each sub-band; the phase and the signs of the real and imaginary parts are retained, while the magnitude and the absolute values of the real and imaginary parts are extracted and projected onto the clean speech and clean noise joint dictionaries to obtain sparse representation coefficients of the speech and the noise. The final estimate of each sub-band speech spectrum is computed from these coefficients together with the phase, the real and imaginary signs, a mask, weights and so on, and the enhanced speech signal is recovered by the inverse STFT followed by the inverse DTCWT, improving speech enhancement capability.
Owner:UNIV OF SCI & TECH OF CHINA

Speech enhancement method and device, electronic equipment and storage medium

The invention provides a speech enhancement method and device, electronic equipment and a storage medium. The method comprises the steps of: obtaining collected original speech and performing noise reduction on it to obtain noise-reduced speech; determining a speech enhancement mask of the original speech based on the original speech and the noise-reduced speech; and performing speech enhancement on the original speech based on the mask. Because information from both the original speech and the noise-reduced speech is fused into the mask, the mask can accurately learn the mapping from noisy original speech to clean speech, and the speech enhancement effect is improved.
Owner:西安讯飞超脑信息科技有限公司

Voice enhancement network model and single-channel speech enhancement method and system

Pending | CN112509593A | Tags: Learning long-term memory properties; Transfer of control; Speech analysis; Engineering; Network model
The invention provides a single-channel speech enhancement method implemented through a speech enhancement network model comprising an analysis layer, an encoder, a temporal convolution module, a decoder and a synthesis layer. An analysis layer approximating the windowed short-time Fourier transform and a synthesis layer approximating the inverse windowed short-time Fourier transform, both designed from convolution layers, are added so that speech characteristics are better mined in a transform domain. In addition, the encoder and decoder are built from gated convolution layers to expand the receptive field and better control the flow of information through the hierarchical structure, and the temporal convolution module between the encoder and decoder better learns the long-term memory characteristics of speech, so that the speech enhancement effect is improved. The invention also provides a single-channel speech enhancement system and the speech enhancement network model.
Owner:BEIJING TSINGMICRO INTELLIGENT TECH CO LTD
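A gated convolution multiplies a linear convolution branch by a sigmoid gate computed from a second convolution, which lets the layer control how much information passes through. The one-dimensional, single-channel sketch below, with an optional dilation factor (one common way to enlarge the receptive field), only illustrates the gating idea; the function names are hypothetical and the patent's actual layer configuration (channels, kernel sizes, dimensionality) is not given in the abstract.

```python
import math


def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def gated_conv1d(x, kernel_feat, kernel_gate, dilation=1):
    """Gated convolution: conv(x, W_f) * sigmoid(conv(x, W_g))."""
    feat = causal_conv1d(x, kernel_feat, dilation)
    gate = causal_conv1d(x, kernel_gate, dilation)
    return [f / (1.0 + math.exp(-g)) for f, g in zip(feat, gate)]
```

A gate near 0 blocks the corresponding feature and a gate near 1 passes it, which is the information-flow control the abstract attributes to the gated layers.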

Speech enhancement method based on time-frequency domain joint loss function

The invention provides a speech enhancement method based on a time-frequency domain joint loss function. The method comprises the following steps: mixing a clean speech data set and a noise data set from open-source data sets into a noisy speech data set, and converting it into magnitude spectra, phase spectra and waveform data through preprocessing to construct a training set; constructing a CNN model that takes the noisy speech magnitude spectrum as input and the clean speech magnitude spectrum as the label, and training it; reconstructing the time-domain waveform of the estimated speech from the model's estimated magnitude spectrum and the noisy speech phase spectrum by the inverse short-time Fourier transform; calculating a frequency-domain loss from the clean speech magnitude spectrum and the estimated magnitude spectrum; calculating a time-domain loss from the clean speech waveform and the estimated speech waveform; and combining the two into a time-frequency joint loss that guides the weight optimization of the CNN model. This reduces the mismatch between the estimated magnitude spectrum and the phase spectrum and improves the speech enhancement effect.
Owner:WUHAN UNIV
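The joint loss can be sketched as a weighted sum of a frequency-domain term on magnitude spectra and a time-domain term on waveforms. Mean squared error for both terms and the weight `alpha` are assumptions, since the abstract does not specify them; the estimated waveform is taken as already reconstructed by the inverse STFT with the noisy phase.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def time_frequency_joint_loss(clean_mag, est_mag, clean_wave, est_wave, alpha=0.5):
    """alpha * frequency-domain loss + (1 - alpha) * time-domain loss."""
    freq_loss = mse(clean_mag, est_mag)     # magnitude-spectrum term
    time_loss = mse(clean_wave, est_wave)   # waveform term (after inverse STFT)
    return alpha * freq_loss + (1.0 - alpha) * time_loss
```

The time-domain term penalizes errors that only show up once the estimated magnitude is combined with the noisy phase, which is how the joint loss addresses the magnitude/phase mismatch.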

Voice enhancement method and device based on multi-frame spectrum and non-negative matrix decomposition

The invention provides a voice enhancement method and device based on multi-frame spectra and non-negative matrix factorization (NMF), in the field of voice enhancement and NMF. The method comprises the following steps: pre-processing clean speech, noise and noisy speech to obtain short-time spectra, which are converted into multi-frame spectra; factorizing the multi-frame spectra of the noise and the clean speech into products of basis matrices and coefficient matrices, and solving for the basis matrix of each; concatenating the two basis matrices into a basis matrix for the multi-frame spectrum of the noisy speech, factorizing the noisy multi-frame spectrum with it to obtain its coefficient matrix, and thereby obtaining initial estimates of the multi-frame spectra of the noise and the enhanced speech; and obtaining the multi-frame spectrum of the enhanced speech by Wiener filtering, transforming it into a time-domain signal, and finally obtaining the enhanced speech. The method preserves speech-specific information, restores speech more faithfully, and improves the voice enhancement effect.
Owner:北京华控智加科技有限公司
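The decomposition with concatenated, pre-trained bases followed by Wiener filtering can be sketched with fixed-basis multiplicative updates. This pure-Python version assumes Lee-Seung updates for the Euclidean cost and a Wiener-style gain s^2 / (s^2 + n^2) built from the two partial reconstructions; it works on single-frame spectra rather than the patent's multi-frame spectra, and the bases are assumed to be given.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf_activations(v, w, n_iter=500, eps=1e-12):
    """Find H >= 0 with V ~ W H for a fixed basis W (Euclidean multiplicative updates)."""
    n_comp, n_frames = len(w[0]), len(v[0])
    h = [[1.0] * n_frames for _ in range(n_comp)]
    wt = transpose(w)
    wtv, wtw = matmul(wt, v), matmul(wt, w)
    for _ in range(n_iter):
        wtwh = matmul(wtw, h)
        h = [[h[i][j] * wtv[i][j] / (wtwh[i][j] + eps) for j in range(n_frames)]
             for i in range(n_comp)]
    return h


def enhance(v, w_speech, w_noise, n_iter=500, eps=1e-12):
    """Concatenate bases [W_s | W_n], solve H, apply a Wiener-style gain to V."""
    k = len(w_speech[0])
    w = [ws + wn for ws, wn in zip(w_speech, w_noise)]
    h = nmf_activations(v, w, n_iter, eps)
    s_hat = matmul(w_speech, h[:k])  # speech part of the reconstruction
    n_hat = matmul(w_noise, h[k:])   # noise part of the reconstruction
    return [[v[i][j] * s_hat[i][j] ** 2 / (s_hat[i][j] ** 2 + n_hat[i][j] ** 2 + eps)
             for j in range(len(v[0]))] for i in range(len(v))]


# Toy example: 3 frequency bins, 1 frame, one speech and one noise basis vector.
w_speech = [[1.0], [0.0], [1.0]]
w_noise = [[0.0], [1.0], [1.0]]
v = [[2.0], [3.0], [5.0]]  # equals 2 * speech basis + 3 * noise basis
enhanced = enhance(v, w_speech, w_noise)
```

Using the NMF reconstructions only to build a gain, rather than as the output itself, keeps the enhanced spectrum anchored to the observed noisy spectrum.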

Speech enhancement model training method and device and speech enhancement method and device

The invention relates to the technical field of speech processing and provides a speech enhancement model training method and device and a speech enhancement method and device. The training method comprises the following steps: acquiring a speech training set comprising noisy speech samples and clean speech samples; acquiring the magnitude spectrum of a noisy speech sample and inputting it into a generator network to obtain an enhanced speech magnitude spectrum; inputting the magnitude spectrum of the clean speech sample and the enhanced speech magnitude spectrum into a discriminator network to obtain a discrimination result; and adjusting the parameters of the generator and discriminator networks according to the enhanced speech magnitude spectrum, the clean speech magnitude spectrum, the discrimination result and the optimization objective, thereby producing the speech enhancement model. The method improves the performance of the speech enhancement model and hence the speech enhancement effect.
Owner:SHANGHAI WINGTECH INFORMATION TECH CO LTD

Hearing aid speech enhancement method based on depth domain adaptive network

The invention discloses a hearing aid speech enhancement method based on a depth domain adaptive network. The method comprises the steps of: extracting frame-level log power spectrum (LPS) features from noisy speech and clean speech respectively; constructing a deep learning model with an encoder-decoder structure as the baseline speech enhancement model; building on the baseline, constructing a transfer learning speech enhancement model based on the depth domain adaptive network, which introduces a domain adaptation layer and a relative discriminator between the feature encoder and the reconstruction decoder; training the transfer learning model with a domain adversarial loss; and, in the enhancement stage, inputting the frame-level LPS features of noisy speech from the target domain into the trained model to reconstruct the enhanced speech waveform. Domain adversarial training encourages the feature encoder to produce domain-invariant features, which improves the adaptability of the speech enhancement model to unseen noise.
Owner:NANJING INST OF TECH

Speech enhancement method and device thereof, equipment and medium

The invention discloses a speech enhancement method, device, equipment and medium. The method comprises the following steps: acquiring a target noisy speech signal and performing a short-time Fourier transform on it to obtain the corresponding target frequency-domain signal; inputting the target feature of the current signal frame into the encoder of a pre-trained speech noise suppression model to obtain the encoding feature of the current frame; inputting this encoding feature, together with the decoding feature that the model's decoder output for the previous signal frame, into the decoder to obtain the decoding feature of the current frame; and reconstructing the signal from the decoding features of all frames to obtain the target enhanced speech signal. The scheme improves the speech enhancement effect while reducing computation time and cost.
Owner:EVERSEC BEIJING TECH

Voice signal enhancement method, device and equipment

The invention discloses a voice signal enhancement method, device and equipment. The method comprises the following steps: acquiring a voice signal and the geographical position information corresponding to it; matching an environment scene type for the voice field of the voice signal according to the geographical position information; eliminating the environmental noise in the voice signal according to the environment scene type; identifying the required voice data in the noise-eliminated signal; and enhancing the identified voice data. In this way, interference from environmental noise in the voice signal is reduced, the accuracy of recognizing voice from the signal is improved, and the enhancement of the recognized voice is improved.
Owner:XIAMEN KUAISHANGTONG TECH CORP LTD

Environment adaptive voice enhancement algorithm based on attention-driven circulating convolution network

The invention discloses an environment-adaptive voice enhancement algorithm based on an attention-driven circulating convolution network. The algorithm comprises the following steps: 1, selecting a voice enhancement task database and preparing the input data; 2, extracting the amplitude information and environment information of the voice, wherein the environment information is extracted with the weighted prediction error (WPE) method and the amplitude information is the voice spectrum obtained by the Fourier transform; 3, constructing and training a deep model; and 4, reconstructing the voice by converting the voice amplitude predicted in step 3 into a voice waveform. Because the environment information of the voice is taken into account, the algorithm's adaptability and robustness across different environments are improved; and, to preserve the real voice signal, an attention mechanism is fused into the attention-driven circulating convolution network, which characterizes the temporal context of the voice more precisely and effectively improves voice enhancement performance.
Owner:TIANJIN UNIV

Audio noise reduction method and device, equipment and storage medium

The invention relates to artificial intelligence and provides an audio noise reduction method, device, equipment and storage medium. The method comprises the following steps: pre-processing noisy audio to obtain spectrum information; processing the spectrum information with a frequency-domain signal processing network to obtain spectrum mask features; obtaining time-frequency features from the spectrum information and the spectrum mask features; processing the time-frequency features with a time-domain signal processing network to obtain time-frequency mask features; generating predicted audio from the time-frequency features and the time-frequency mask features; adjusting the network parameters of a preset learner based on the predicted audio and the clean audio to obtain a noise reduction model; and acquiring request audio and denoising it with the noise reduction model to obtain target audio. The method improves the accuracy and real-time performance of noise reduction for the request audio. The invention also relates to blockchain technology: the target audio can be stored in a blockchain.
Owner:PING AN TECH (SHENZHEN) CO LTD
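
The two-stage masking flow of this entry (a spectrum mask from a frequency-domain stage, then a time-frequency mask applied to the resulting features, trained against clean audio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the two trained networks are replaced by hypothetical rule-based stand-ins (`freq_gate`, a per-bin threshold, and `frame_gate`, a per-frame energy gate), spectra are nested lists of magnitudes, and mean squared error on spectra is assumed as the training criterion for brevity.

```python
from typing import Callable, List

Spec = List[List[float]]  # magnitudes indexed as [frame][frequency bin]

def apply_mask(features: Spec, mask: Spec) -> Spec:
    """Element-wise product of a feature map and a mask of the same shape."""
    return [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(features, mask)]

def freq_gate(spec: Spec, threshold: float = 0.5) -> Spec:
    """Hypothetical stand-in for the frequency-domain network: keep bins above a threshold."""
    return [[1.0 if v > threshold else 0.0 for v in frame] for frame in spec]

def frame_gate(spec: Spec, threshold: float = 0.1) -> Spec:
    """Hypothetical stand-in for the time-domain network: silence low-energy frames."""
    return [[1.0 if sum(v * v for v in frame) > threshold else 0.0] * len(frame)
            for frame in spec]

def two_stage_denoise(spec: Spec,
                      freq_mask_fn: Callable[[Spec], Spec] = freq_gate,
                      time_mask_fn: Callable[[Spec], Spec] = frame_gate) -> Spec:
    """Spectrum x spectrum mask gives time-frequency features; those x a second mask give the prediction."""
    tf_features = apply_mask(spec, freq_mask_fn(spec))
    return apply_mask(tf_features, time_mask_fn(tf_features))

def mse(pred: Spec, target: Spec) -> float:
    """Assumed training loss between the predicted and the clean spectra."""
    diffs = [(p - t) ** 2 for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)
```

In training, the loss returned by `mse` would drive updates to the two networks' parameters; the fixed gates here only show how the two masks are chained.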

Microphone array speech enhancement method and device, electronic equipment and storage medium

The invention relates to a microphone array speech enhancement method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring, through a microphone array, a speech signal to be enhanced whose sound source direction is known; extracting a spectrum feature and a direction coherence feature of the signal; inputting both features into a pre-trained speech enhancement network to obtain enhanced Fourier coefficients of the signal; and applying the inverse Fourier transform to the enhanced Fourier coefficients to obtain the enhanced speech signal. The filtering operation of beamforming is performed by the speech enhancement network, and the beamforming weight coefficients are learned through data-driven supervised training, so they better match real application scenarios. The speech enhancement effect is therefore improved.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
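
For context, the fixed filtering that this entry's network is trained to replace can be illustrated with a classic frequency-domain delay-and-sum beamformer steered toward the known source direction. This is a swapped-in textbook technique, not the patent's learned beamformer; the uniform linear array geometry, the function names and the speed of sound `c = 343.0` m/s are assumptions. For each frequency bin, the output is Y(f) = sum over m of conj(w_m(f)) X_m(f), with weights w = a / M taken from the steering vector a.

```python
import cmath
import math
from typing import List

def steering_vector(freq_hz: float, num_mics: int, spacing_m: float,
                    angle_rad: float, c: float = 343.0) -> List[complex]:
    """Phase shifts of a plane wave at each mic of a uniform linear array (mic 0 is the reference)."""
    return [cmath.exp(-2j * math.pi * freq_hz * m * spacing_m * math.cos(angle_rad) / c)
            for m in range(num_mics)]

def delay_and_sum(mic_bins: List[complex], freq_hz: float, spacing_m: float,
                  angle_rad: float, c: float = 343.0) -> complex:
    """Fixed beamformer for one frequency bin: Y = sum_m conj(w_m) * X_m with w = a / M."""
    a = steering_vector(freq_hz, len(mic_bins), spacing_m, angle_rad, c)
    weights = [am / len(mic_bins) for am in a]
    return sum(w.conjugate() * x for w, x in zip(weights, mic_bins))
```

A source exactly in the steering direction passes with unit gain (the distortionless property of this baseline), while sources from other directions are attenuated. The learned approach described above instead fits the weights to data.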

Speech enhancement model, electronic device, storage medium and related method

The embodiments of the invention provide a speech enhancement model, electronic equipment, a storage medium and a related method. The speech enhancement method comprises the following steps: converting noisy speech data into time-frequency domain feature data; generating mask values for the noisy speech data according to the long-range correlation of the time-frequency domain feature data along the frequency direction; and generating enhanced speech data from the mask values and the time-frequency domain feature data. The scheme improves the effect of speech enhancement on the speech signal.
Owner:ALIBABA DAMO (HANGZHOU) TECH CO LTD
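
One way to capture long-range correlation along the frequency direction is self-attention across the frequency bins of each frame, followed by a sigmoid that turns the result into a mask in (0, 1). The sketch below assumes this mechanism; the abstract does not name it. It omits learned query/key/value projections (queries, keys and values are all the input features), and the read-out vector `weight` is a hypothetical parameter.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def frequency_attention(frame_feats: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention across the frequency bins of one frame."""
    d = len(frame_feats[0])
    out = []
    for q in frame_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frame_feats]
        weights = softmax(scores)  # how strongly this bin attends to every other bin
        out.append([sum(w * v[i] for w, v in zip(weights, frame_feats)) for i in range(d)])
    return out

def mask_from_context(context: List[List[float]], weight: List[float]) -> List[float]:
    """Hypothetical linear read-out plus sigmoid: one mask value per frequency bin."""
    return [1.0 / (1.0 + math.exp(-sum(c * w for c, w in zip(vec, weight)))) for vec in context]

def enhance_frame(magnitudes: List[float], frame_feats: List[List[float]],
                  weight: List[float]) -> List[float]:
    """Apply the attention-derived mask to the noisy magnitudes of one frame."""
    mask = mask_from_context(frequency_attention(frame_feats), weight)
    return [m * g for m, g in zip(magnitudes, mask)]
```

Because every bin attends to every other bin in the same frame, distant frequencies (for example, harmonics of the same voice) can influence each other's mask values.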

Low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation

The invention provides a low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation. The method comprises the following steps: extracting speech features from the original speech spectrogram to obtain a speech information representation; applying multi-stage information distillation to the speech information representation to obtain a distillation result from which the noise components have been filtered; and reconstructing the spectrogram from the distillation result. During multi-stage distillation, the information on the distillation line at each step, recalibrated according to an attention mechanism and an information distillation mechanism, serves as the input of the self-attention information processing sub-module at the next step; after passing in sequence through N attention information processing sub-modules and N information distillation sub-modules, the noise components are filtered out. The method adapts speech feature extraction to different environments, so the model can adapt to the acoustic characteristics of different noises, and the speech enhancement effect is markedly improved.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA
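
The distill-then-recalibrate loop of this entry can be illustrated with a channel-splitting scheme in the style of information multi-distillation networks: at each stage, the remaining channels are recalibrated, part of them is retained as distilled output, the rest is forwarded, and all retained parts are concatenated at the end. This is an assumption, since the abstract does not specify the split; `recalibrate` is a hypothetical attention-style gate (each channel scaled by the sigmoid of its deviation from the mean) standing in for the patented self-attention sub-modules, and features are plain lists of floats for a single time-frequency position.

```python
import math
from typing import List

Vector = List[float]

def recalibrate(x: Vector) -> Vector:
    """Hypothetical attention-style gate: scale each channel by sigmoid(value - mean)."""
    mean = sum(x) / len(x)
    return [v / (1.0 + math.exp(-(v - mean))) for v in x]

def distill_and_aggregate(x: Vector, num_stages: int, keep: int) -> Vector:
    """Each stage recalibrates the remaining channels, retains the first `keep`, forwards the rest."""
    distilled: Vector = []
    current = list(x)
    for _ in range(num_stages):
        current = recalibrate(current)
        distilled.extend(current[:keep])  # channels retained ("distilled") at this stage
        current = current[keep:]          # channels forwarded to the next stage
        if not current:
            break
    return distilled + current            # aggregation: concatenate all retained parts
```

Since every gate lies strictly between 0 and 1, each stage can only attenuate channels, which mirrors the idea of progressively filtering out noise components; a trained model would learn the gates rather than derive them from the mean.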

Method and device for multi-accent speech recognition

The invention discloses a method and a device for multi-accent speech recognition. The method comprises the following steps: adding, in the encoding stage of a single speech recognition system, an adaptive layer that learns accent-related feature information, where the encoder consists of a plurality of encoder blocks connected in series; feeding an accent representation vector into the adaptive layer of each encoder block as guidance information that steers the transformation function of the adaptive layer; simultaneously inputting accent-independent features into the adaptive layer; and mixing the accent-independent features with the accent representation vector to form accent-dependent features. The embodiments further examine the insertion position of the adaptive layer, the number of accent bases and different types of accent bases, so that better accent adaptation is achieved.
Owner:AISPEECH CO LTD
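
A minimal sketch of an accent-guided adaptive layer, assuming the layer interpolates K hypothetical linear "accent bases" weighted by the accent representation vector and adds the result residually to the accent-independent features: h' = h + sum over k of alpha_k (W_k h). The residual form, the bases and the stand-in encoder blocks are illustrative assumptions, not the patented transformation.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def accent_adaptive_layer(h: Vector, accent_vec: Vector, bases: List[Matrix]) -> Vector:
    """Residual adaptation: h + sum_k accent_vec[k] * (bases[k] @ h)."""
    out = list(h)
    for alpha, basis in zip(accent_vec, bases):
        transformed = matvec(basis, h)
        out = [o + alpha * t for o, t in zip(out, transformed)]
    return out

def encoder(h: Vector, accent_vec: Vector,
            blocks: List[Callable[[Vector], Vector]],
            bases_per_block: List[List[Matrix]]) -> Vector:
    """Serial encoder blocks, each followed by its own accent-guided adaptive layer."""
    for block, bases in zip(blocks, bases_per_block):
        h = accent_adaptive_layer(block(h), accent_vec, bases)
    return h
```

With an all-zero accent vector the layer reduces to the identity, so accent-independent features pass through unchanged; the accent vector therefore acts purely as guidance that selects how strongly each basis transformation is applied.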