59 results about "Time frequency masking" patented technology

The time-frequency masking (TFM) method [1] separates sound sources by suppressing unwanted components in the time-frequency domain. It relies primarily on clustering the time-frequency points of the mixed signals with respect to their amplitudes and time delays.
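As a minimal sketch of the masking idea (toy NumPy arrays stand in for real STFT magnitudes; all names are illustrative):

```python
import numpy as np

# Toy magnitude spectrograms (freq bins x frames) for two sources; in a
# real system these would come from the STFT of each recording.
rng = np.random.default_rng(0)
S1 = rng.random((4, 5))
S2 = rng.random((4, 5))
mix = S1 + S2  # magnitudes are only approximately additive in practice

# Binary time-frequency mask: keep each bin where source 1 dominates,
# zero it where the unwanted source dominates.
mask = (S1 > S2).astype(float)
est1 = mask * mix  # masked mixture approximates source 1
```

In a full system the mask must be estimated from the mixture alone, e.g. by the amplitude/delay clustering mentioned above, since the individual sources are unknown.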

Multi-speaker voice separation method based on convolutional neural network and deep clustering

The invention discloses a multi-speaker voice separation method based on a convolutional neural network and deep clustering. The method comprises the following steps: 1, a training stage: respectively performing framing, windowing and short-time Fourier transform on single-channel multi-speaker mixed voice and the corresponding single-speaker voices, and training a neural network model with the mixed-voice magnitude spectrum and the single-speaker magnitude spectra as inputs; 2, a testing stage: taking the mixed-voice magnitude spectrum as the input of a gated dilated-convolution deep clustering model to obtain a high-dimensional embedding vector for each time-frequency unit of the mixed spectrum; classifying the vectors with a K-means clustering algorithm according to a preset number of speakers, obtaining a time-frequency masking matrix for each sound source from the time-frequency units corresponding to the vectors, and multiplying each matrix with the mixed-voice magnitude spectrum to obtain the corresponding speaker spectrum; and combining each speaker spectrum with the mixed-voice phase spectrum and applying the inverse short-time Fourier transform to obtain the separated voice time-domain waveform signals.
Owner:XINJIANG UNIVERSITY
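The embed-cluster-mask pipeline of the testing stage above can be sketched as follows (a toy NumPy version with random stand-in embeddings; a trained network would supply the real ones):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means: return a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# One D-dimensional embedding per time-frequency unit of an F x T
# mixture spectrogram (stand-in values; normally the network output).
F, T, D, n_spk = 3, 4, 8, 2
emb = np.random.default_rng(1).normal(size=(F * T, D))

labels = kmeans(emb, n_spk)
# One binary masking matrix per speaker, reshaped onto the F x T grid;
# multiplying each with the mixture magnitude spectrum isolates a speaker.
masks = [(labels == j).astype(float).reshape(F, T) for j in range(n_spk)]
```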

Time frequency mask-based single acoustic vector sensor (AVS) target voice enhancement method

Active · CN104103277A · Advantages: reduced complexity; target-direction speech enhancement · Topics: speech analysis; point correlation; acoustic vector sensor
The invention relates to a time-frequency mask-based single acoustic vector sensor (AVS) target voice enhancement method. In the method, the arrival angle of the target voice is assumed known, and target voice enhancement is realized by combining a fixed beamformer with a post-positioned Wiener filter, where calculating the weights of the Wiener filter requires an estimate of the auto-power spectrum of the target voice. Exploiting the time-frequency sparsity of voice signals, the arrival angle associated with each time-frequency point of the received audio signal is estimated by computing the inter-sensor data ratio (ISDR) of the component signals output by two gradient sensors in the AVS; a time-frequency mask is then designed from the error between each point's estimated arrival angle and the target arrival angle, yielding the auto-power spectrum estimate of the target voice. The method requires no prior knowledge of the noise, effectively enhances the target voice in complicated environments with multiple speakers, and suppresses both interfering voices and background noise. In addition, the computational complexity is low and the microphone array is small (about 1 cm³), which makes the method well suited to portable devices.
Owner:SHENZHEN HIAN SPEECH SCI & TECH CO LTD
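A toy sketch of the mask-design step described above (the per-bin angle estimates are random stand-ins; in the patented method they come from the ISDR of the gradient-sensor outputs, and the tolerance is an assumed design parameter):

```python
import numpy as np

rng = np.random.default_rng(2)
doa_est = rng.uniform(0.0, np.pi, size=(4, 6))  # F x T per-bin arrival angles
target = np.pi / 3                              # known target arrival angle
tol = 0.2                                       # rad; assumed error tolerance

# Keep only bins whose estimated angle is close to the target direction.
mask = (np.abs(doa_est - target) < tol).astype(float)

# Masking the mixture's power spectrum gives the target's auto-power
# spectrum estimate used to set the post Wiener filter weights.
mix_psd = rng.random((4, 6))
target_psd = mask * mix_psd
```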

Method for sound source direction estimation based on time frequency masking and deep neural network

The invention discloses a method and device for sound source direction estimation based on time-frequency masking and a deep neural network, along with electronic equipment and a storage medium, and belongs to the field of computer technologies. The method comprises the steps of acquiring a multichannel sound signal; performing framing, windowing and Fourier transform on each channel of the multichannel sound signal to form its short-time Fourier spectrum; performing an iterative operation on the short-time Fourier spectrum through a pre-trained neural network model, calculating the ratio masks corresponding to the target signals in the multichannel sound signal, and fusing the multiple ratio masks into a single ratio mask; and masking and weighting the multichannel sound signal according to the single ratio mask to determine the direction of the target sound source. The method and device remain robust in environments with a low signal-to-noise ratio and strong reverberation, and improve the accuracy and stability of direction estimation for the target sound source.
Owner:ELEVOC TECH CO LTD
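The abstract does not specify how the multiple ratio masks are fused into one; a median across channels is one plausible rule, sketched here with stand-in arrays:

```python
import numpy as np

# Hypothetical ratio masks, one F x T mask per channel, values in [0, 1].
rng = np.random.default_rng(7)
channel_masks = rng.uniform(size=(4, 5, 6))  # 4 channels, 5 x 6 bins each

# Fuse into a single ratio mask; the median is robust to one bad channel.
fused = np.median(channel_masks, axis=0)     # single 5 x 6 mask
```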

Combined model training method and system

The embodiment of the invention provides a combined model training method. The method comprises the following steps: implicitly extracting the phase spectrum and the log-magnitude spectrum of a noisy voice training set; using expanded fragments of the log-magnitude spectrum as the input features of a time-frequency masking network; using the noisy voice training set and a clean voice training set to determine a target masking label for training the time-frequency masking network; training the time-frequency masking network on the input features and the target masking label, and estimating a soft threshold mask; and enhancing the phase spectrum of the noisy voice training set with the soft threshold mask, where the enhanced phase spectrum serves as the input feature for training a DOA (direction of arrival) estimation network. The embodiment of the invention further provides a combined model training system. By setting the target masking label and extracting the input features implicitly, the combined training of the time-frequency masking network and the DOA estimation network becomes better suited to the DOA estimation task.
Owner:AISPEECH CO LTD

Speech enhancement method and system, computer equipment and storage medium

The invention provides a speech enhancement method and system, computer equipment and a storage medium, and relates to the technical field of human-machine speech interaction. The method comprises the following steps: collecting multi-channel acoustic signals through an acoustic vector sensor, preprocessing the multi-channel acoustic signals to acquire a time-frequency spectrum, filtering the time-frequency spectrum and outputting a signal atlas; performing masking processing on the signal atlas through a nonlinear mask and outputting an enhanced single-channel speech spectrogram; inputting the single-channel spectrogram into a deep neural network mask estimation model and outputting a mask spectrogram; performing time-frequency masking enhancement on the signal atlas through the mask spectrogram to acquire an enhanced amplitude speech spectrogram; and reconstructing from the enhanced amplitude speech spectrogram to output an enhanced target speech signal. The method solves the technical problems of high hardware cost, large collection system volume and high operational complexity in multi-channel speech enhancement, and achieves excellent speech enhancement under different interference noise types, strengths and room reverberation conditions.
Owner:PEKING UNIV SHENZHEN GRADUATE SCHOOL

Blind source separation method based on mixed signal local peak value variance detection

The invention discloses a blind source separation method based on local peak variance detection of mixed signals, which improves the DUET method and solves the problem that peaks cannot be effectively detected in existing DUET blind source separation. The method specifically comprises the following steps: finding all N*N grid subregions on the signal-source attenuation-delay histogram; selecting, among all subregions, the subregion whose central value is maximal as a peak subregion; calculating the mean of the three-dimensional coordinates of all data points in the selected peak subregion, computing the distance from each data point to this mean point, and calculating the variances; sorting all variances and extracting the P largest; and transferring the horizontal and vertical coordinates corresponding to the P peaks to an attenuation-delay array, extracting each peak with binary time-frequency masks, separating the signal sources in the time-frequency domain, and transforming back to the time domain to obtain the final separated source signals. The method is applicable to peak detection in general, and in particular to peak detection for DUET blind source separation.
Owner:HARBIN INST OF TECH
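The peak-picking step can be illustrated with a simple greedy local-maximum search on a toy attenuation-delay histogram (the variance-based ranking of the patented method is replaced here by neighborhood suppression for brevity):

```python
import numpy as np

# Toy attenuation-delay histogram; each cell counts time-frequency points
# whose (attenuation, delay) estimate falls in that cell.
rng = np.random.default_rng(3)
hist = rng.random((12, 12))
hist[3, 4] += 5.0  # plant two synthetic source peaks
hist[8, 9] += 4.0

def top_peaks(h, n, win=1):
    """Greedily take the largest cell, suppress its neighborhood, repeat."""
    h = h.copy()
    peaks = []
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(h), h.shape)
        peaks.append((int(i), int(j)))
        h[max(0, i - win):i + win + 1, max(0, j - win):j + win + 1] = -np.inf
    return peaks

print(top_peaks(hist, 2))  # → [(3, 4), (8, 9)]
```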

Voice enhancement method based on DNN-CLSTM network

Pending · CN112735456A · Advantages: stable speech signal; improved fidelity · Topics: speech analysis; short-term memory; noise
The invention relates to a speech enhancement method based on a deep neural network and a residual long short-term memory (DNN-CLSTM) network. In the method, voice amplitude features obtained through spectral subtraction and voice Mel-frequency cepstral coefficient (MFCC) features obtained through the fast Fourier transform are input into a DNN-CLSTM network model to achieve voice enhancement. The method comprises the following steps: firstly, performing time-frequency masking and windowed framing on the noisy speech, obtaining its amplitude and phase characteristics via the fast Fourier transform, and estimating the noise amplitude of the noisy speech; secondly, subtracting the estimated noise signal amplitude from the noisy voice amplitude to obtain the spectral-subtracted voice signal amplitude, taken as the first input feature of the neural network; then performing the fast Fourier transform (FFT) on the noisy voice and computing the spectral line energy of the voice signal to obtain its MFCC features as the second input feature; inputting the two features into the DNN-CLSTM network for training to obtain a network model, and evaluating the effectiveness of the model with a minimum mean square error (MMSE) loss function; and finally, inputting the actual noisy voice set into the trained voice enhancement network model, predicting the enhanced amplitude and MFCC estimates, and obtaining the final enhanced voice signal by the inverse Fourier transform. The method yields voice with high fidelity.
Owner:XIAN UNIV OF POSTS & TELECOMM
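The spectral-subtraction step described above can be sketched as follows (toy arrays; the assumption that the first few frames are noise-only is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
noisy_mag = rng.random((5, 10)) + 0.5       # F x T noisy magnitude spectrum

# Estimate the noise magnitude, here by averaging frames assumed noise-only.
noise_est = noisy_mag[:, :3].mean(axis=1, keepdims=True)

# Subtract and half-wave rectify: negative magnitudes are not physical.
clean_mag = np.maximum(noisy_mag - noise_est, 0.0)
```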

Multi-mode voice separation method and system

The invention provides a multi-modal voice separation method and system. The method comprises the following steps: receiving the mixed voice of the objects to be recognized and their facial visual information; performing face detection with the Dlib library to obtain the number of speakers; and processing the information to obtain the mixed-speech spectrogram and the face images of the speakers, transmitting them to a multi-modal voice separation model, and dynamically adjusting the structure of the model according to the number of speakers. In training the multi-modal voice separation model, the complex ideal ratio mask is used as the training target: it is defined as the ratio between the clean-sound spectrogram and the mixed-sound spectrogram in the complex domain, consists of a real part and an imaginary part, and carries both the amplitude and phase information of the sound. The model outputs as many time-frequency masks as there are faces; each output mask is multiplied (as complex numbers) with the spectrogram of the mixed sound to obtain a clean-sound spectrogram, and the inverse short-time Fourier transform of that spectrogram yields the time-domain signal of the clean sound, completing the voice separation. The model is suitable for most application scenarios.
Owner:SHANDONG UNIV
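The complex ideal ratio mask used as the training target above can be written down directly (toy complex spectrograms; in training, `clean` and `mix` would come from the STFTs of the clean and mixed recordings):

```python
import numpy as np

rng = np.random.default_rng(5)
clean = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))
noise = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))
mix = clean + noise

# Complex ratio of clean to mixture spectrogram: the real and imaginary
# parts jointly encode both amplitude and phase corrections.
cirm = clean / mix

# Complex multiplication of mask and mixture recovers the clean spectrogram.
recovered = cirm * mix
```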

Beam forming method and system based on time-frequency masking value estimation

The invention belongs to the technical field of speech enhancement, and particularly relates to a beamforming method and system based on time-frequency masking value estimation. The method comprises the steps of obtaining a multi-channel speech sequence and extracting amplitude spectrum features and spatial-domain features through the Fourier transform; applying a logarithmic transformation to the amplitude spectrum features to obtain a multi-channel voice spectral feature sequence, and sending it to a pre-trained and optimized neural network model to obtain a complex-valued time-frequency mask; converting the complex-valued mask into a voice presence probability, and obtaining the time-frequency masking value from a probability model; calculating the voice signal covariance matrix from the time-frequency masking value and the multi-channel voice feature sequence, and performing eigenvalue decomposition on the covariance matrix to obtain the beamforming filter coefficients; and, with these coefficients, filtering the multi-channel voice sequence with a beamforming filter to obtain the enhanced voice signal. The invention integrates the neural network with spatial clustering to estimate the time-frequency masking value, improving the performance of both beamforming and speech recognition.
Owner:PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU +1
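The covariance-and-eigendecomposition step can be sketched for a single frequency bin (random stand-in data; a real system repeats this per bin with network-estimated masks):

```python
import numpy as np

rng = np.random.default_rng(6)
C, T = 4, 50                       # channels, frames (one frequency bin)
X = rng.normal(size=(C, T)) + 1j * rng.normal(size=(C, T))
mask = rng.uniform(size=T)         # per-frame speech-presence mask

# Mask-weighted spatial covariance of the speech: frames where speech is
# likely contribute more.
cov = (mask * X) @ X.conj().T / mask.sum()

# Eigendecomposition of the Hermitian covariance; the top eigenvector
# serves as the beamforming filter coefficients.
w, V = np.linalg.eigh(cov)
f = V[:, -1]
enhanced = f.conj() @ X            # beamformed single-channel output
```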

Feature extraction method for blind source separation

The invention belongs to the technical field of communication, and particularly relates to a feature extraction method for blind source separation. The method mainly comprises the following steps: preparing materials; preprocessing the mixed blind source signal; obtaining a time-frequency map and inputting the data as training data into a neural network; fitting a neural network objective function by a deep learning method with the following property: when minimizing the objective function to convergence, the sum of Euclidean distances between time-frequency points of the same source signal reaches a minimum, while the sum of Euclidean distances between time-frequency points of different sources reaches a maximum; and inputting the mixed blind source signals into the trained neural network, clustering the signals of different sources according to the network output, constructing a time-frequency masking matrix from the feature set, calculating the spectrum, and obtaining the separated time-domain signals. The beneficial effect of the invention is that the method can separate mixtures of a plurality of unknown source signals.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA