132 results about "Mel-frequency cepstrum" patented technology

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

Voiceprint identification method based on Gaussian mixture model and system thereof

The invention provides a voiceprint identification method based on a Gaussian mixture model (GMM), and a system thereof. The method comprises the following steps: voice signal acquisition; voice signal preprocessing; voice signal characteristic parameter extraction, employing Mel Frequency Cepstrum Coefficients (MFCCs), where the MFCC order is usually 12-16; model training, employing the EM algorithm to train a Gaussian mixture model on a speaker's voice characteristic parameters, with the k-means algorithm selected to initialize the model parameters; and voiceprint identification, comparing the characteristic parameters of a collected voice signal to be identified against the established speaker voice models and deciding by the maximum posterior probability method: the speaker whose model gives the voice characteristic vector X to be identified the maximum posterior probability is taken as the identified speaker. Because the method employs a Gaussian mixture model grounded in probability statistics, it reflects the speaker's feature distribution in feature space well, its probability density function is general-purpose, its model parameters are easy to estimate and train, and it offers good identification performance and noise robustness.
Owner:LIAONING UNIVERSITY OF TECHNOLOGY
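
Below is a minimal sketch of this pipeline, assuming librosa for MFCC extraction and scikit-learn's GaussianMixture (EM with k-means initialisation) as a stand-in for the patent's trainer; the 13-coefficient order and 16 mixtures are illustrative choices, not values from the patent.

```python
# Hedged sketch of the MFCC + GMM voiceprint pipeline; file paths,
# coefficient order, and mixture count are illustrative assumptions.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def extract_mfcc(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=None)
    # Frames x coefficients; the patent suggests a 12-16 order MFCC.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_model(wav_path, n_components=16):
    feats = extract_mfcc(wav_path)
    # EM training with k-means parameter initialisation, as described.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',
                          init_params='kmeans', max_iter=200)
    gmm.fit(feats)
    return gmm

def identify(wav_path, speaker_models):
    feats = extract_mfcc(wav_path)
    # With equal priors, maximum a posteriori reduces to picking the
    # model with the highest average log-likelihood.
    scores = {name: m.score(feats) for name, m in speaker_models.items()}
    return max(scores, key=scores.get)
```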

Environment noise identification classification method based on convolutional neural network

CN109767785A (Inactive) | Benefits: universal; avoids falling into a local optimum | Topics: speech analysis, Mel-frequency cepstrum, environmental noise
The invention relates to an environmental noise identification and classification method based on a convolutional neural network. The method comprises the following steps: S1, extracting natural environment noise and editing it into noise segments with durations of 300 ms to 30 s at a sampling rate of 44.1 kHz; S2, carrying out short-time Fourier transformation on the noise segments, converting the one-dimensional time-domain signal into a two-dimensional time-frequency representation to obtain a spectrogram; S3, extracting the MFCC (Mel Frequency Cepstrum Coefficient) features of the signal; S4, forming a training set from 80% of the noise segments and a testing set from the remaining 20%; S5, carrying out noise classification with a convolutional neural network model; and S6, training the classification model on the training set and verifying its accuracy on the testing set, thereby completing environmental noise identification and classification based on the convolutional neural network. According to the invention, sound segments are input, sound feature information is extracted, the output is a classification result, and extraction of the sound feature information is automatic.
Owner:HEBEI UNIV OF TECH
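
The sketch below illustrates the classification stage with a small PyTorch CNN over fixed-size MFCC patches; the architecture and layer sizes are assumptions, not the patented network.

```python
# Illustrative (not patented) CNN for MFCC-patch noise classification.
import torch
import torch.nn as nn

class NoiseCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),   # infers flattened size
            nn.Linear(128, n_classes),
        )

    def forward(self, x):          # x: (batch, 1, n_mfcc, n_frames)
        return self.classifier(self.features(x))

model = NoiseCNN(n_classes=10)
logits = model(torch.randn(8, 1, 40, 128))   # dummy MFCC batch
```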

Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)

The invention discloses a music separation method combining an MFCC (Mel Frequency Cepstrum Coefficient) multi-repetition model with HPSS (Harmonic/Percussive Sound Separation), and relates to the technical field of signal processing. Considering that soft sound sources are easily overlooked and that music varies over time, the sound source types are first analyzed by the harmonic/percussive sound separation (HPSS) method to separate out the harmonic source; MFCC characteristic parameters are then extracted from the remaining sound sources, and a similarity computation over these sources yields a similarity matrix, from which a multi-repetition structural model of the sound source, suited to melody variation, is established; a mask matrix is thereby obtained, and finally the time-domain waveforms of the song and the background music are recovered through an ideal binary mask (IBM) and an inverse Fourier transform. The method separates different types of sound source signals effectively, improving separation precision; it is also low in complexity, fast, and stable, and has broad application prospects in fields such as singer retrieval, song retrieval, melody extraction, and voice recognition against a musical instrument background.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
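
A hedged sketch of the front end using librosa: split off the harmonic source with HPSS, then build an MFCC self-similarity matrix on the residual. The multi-repetition modelling and masking stages are omitted, and the file name is a placeholder.

```python
# Front-end sketch: HPSS split, then MFCC cosine self-similarity.
import librosa
import numpy as np

y, sr = librosa.load('song.wav', sr=None)     # 'song.wav' is a placeholder
y_harm, y_perc = librosa.effects.hpss(y)      # harmonic / percussive split

# MFCCs of the remaining (non-harmonic) component.
mfcc = librosa.feature.mfcc(y=y_perc, sr=sr, n_mfcc=20)

# Frame-by-frame cosine similarity matrix used to locate repetitions.
unit = mfcc / (np.linalg.norm(mfcc, axis=0, keepdims=True) + 1e-12)
similarity = unit.T @ unit                    # (n_frames, n_frames)
```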

Method for identifying optical fibre sensing vibration signal

The invention relates to a method for identifying an optical fibre sensing vibration signal. The method comprises the following specific steps: (1) acquiring the signal to obtain a discrete digital signal s(n); (2) windowing and framing to obtain the kth-frame windowed signal sk(n); (3) calculating the kth-frame energy signal e(k); (4) obtaining an energy signal e'(k) after moving-average processing; (5) extracting a disturbance event: comparing e'(k) with dynamic threshold values Th1 and Th2 and intercepting a continuous signal as the disturbance event signal; if e'(k) does not exceed Th1, it is determined that no disturbance event has happened; (6) solving the MFCC (Mel Frequency Cepstrum Coefficient) parameters of the disturbance event, obtaining a feature set for Y types of events, and establishing a pattern base of the Y types of disturbance events; (7) performing SVDD (Support Vector Data Description) training; and (8) matching the feature parameter set of an event to be detected against the SVDD training models in the pattern base and judging which type the event belongs to, or judging that it is an unknown event. The method improves the identification accuracy of optical fibre sensing signals, reduces false alarms, allows pattern training on a single disturbance event, and reduces the complexity of establishing the database.
Owner:NO 34 RES INST OF CHINA ELECTRONICS TECH GRP +2
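
The sketch below illustrates steps (3)-(5) and (7) under stated assumptions: short-time energy with moving-average smoothing, dual-threshold event extraction, and scikit-learn's OneClassSVM as a commonly used stand-in for SVDD (the two are closely related for RBF kernels). Thresholds and window sizes are illustrative.

```python
# Energy-based disturbance detection plus an SVDD-like boundary.
import numpy as np
from sklearn.svm import OneClassSVM

def frame_energy(frames):                 # frames: (n_frames, frame_len)
    return np.sum(frames.astype(float) ** 2, axis=1)

def smooth(e, win=5):
    # Moving-average smoothing of the per-frame energy e(k).
    return np.convolve(e, np.ones(win) / win, mode='same')

def detect_events(e_smooth, th1, th2):
    """Return (start, end) pairs: an event opens when the smoothed
    energy exceeds th1 and closes when it drops below th2."""
    events, start = [], None
    for k, v in enumerate(e_smooth):
        if start is None and v > th1:
            start = k
        elif start is not None and v < th2:
            events.append((start, k))
            start = None
    return events

# One-class model per event type, trained on that type's MFCC sets.
svdd = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale')
# svdd.fit(event_features)   # event_features: (n_samples, n_mfcc)
```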

Isolated word speech recognition method based on HRSF and improved DTW algorithm

The invention discloses an isolated word speech recognition method based on an HRSF (Half Raised Sine Function) and an improved DTW (Dynamic Time Warping) algorithm. The method comprises the following steps: (1) a received analog voice signal is preprocessed, where preprocessing comprises pre-filtering, sampling, quantization, pre-emphasis, windowing, short-time energy analysis, short-time average zero-crossing-rate analysis, and end-point detection; (2) the power spectrum X(n) of each frame signal is obtained by FFT (Fast Fourier Transform) and converted into a power spectrum on the Mel frequency scale; the MFCC (Mel Frequency Cepstrum Coefficient) parameters are calculated, their first-order and second-order differences are computed, and the MFCC parameters are then subjected to HRSF cepstral liftering; and (3) the improved DTW algorithm is adopted to match test templates with reference templates, and the reference template with the maximum matching score serves as the identification result. The method achieves recognition of single Chinese characters through the improved DTW algorithm and increases both the recognition rate and the recognition speed for single Chinese characters.
Owner:SOUTH CHINA NORMAL UNIVERSITY +1
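
For reference, this is the textbook DTW that step (3) builds on; the patent's specific improvement is not described here, so only the classical recursion is shown.

```python
# Classical DTW between two MFCC sequences.
import numpy as np

def dtw_distance(a, b):
    """a: (n, d), b: (m, d) MFCC feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Recognition: pick the reference template with the smallest distance.
# best_word = min(templates, key=lambda w: dtw_distance(test, templates[w]))
```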

Digital stethoscope and method for filtering heart sounds and extracting lung sounds

The invention provides a digital stethoscope and a method for filtering heart sounds and extracting lung sounds. The method comprises the steps of: acquiring heart and lung sound signals over a scheduled time; processing the acquired signals and selecting the valid frames containing heart and lung sounds according to a discrete entropy value; calculating the average amplitude of the valid frames and removing noise frames with a threshold to obtain lung sound frames that still contain heart sounds; carrying out a wavelet transform on the obtained lung sound frames, thresholding the wavelet coefficients, and filtering out the heart sounds to obtain pure lung sound frames; carrying out MFCC (Mel Frequency Cepstrum Coefficient) characteristic parameter extraction on the lung sound frames; and judging from the obtained MFCC characteristic parameter matrix whether the lung sound signals are normal or whether the lungs are most likely affected by one or more respiratory diseases. With this method, lung sound signals can be rapidly extracted from the acquired heart and lung sound signals and then assessed so as to pre-diagnose respiratory diseases.
Owner:CHENGDU JISEN TECHNOLOGY CO LTD
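
A minimal sketch of the wavelet filtering step using PyWavelets, assuming soft thresholding with the universal threshold; the wavelet family ('db4') and decomposition level are illustrative, not taken from the patent.

```python
# Wavelet-domain suppression of heart-sound energy in a lung-sound frame.
import numpy as np
import pywt

def filter_heart_sounds(frame, wavelet='db4', level=5):
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    # Noise scale estimated from the finest detail band (MAD estimator).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(frame)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[:len(frame)]
```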

Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM)

The invention discloses an isolated digit speech recognition and classification system and method combining principal component analysis (PCA) with a restricted Boltzmann machine (RBM). First, Mel frequency cepstrum coefficients (MFCCs) are combined with first-order difference MFCCs to capture a preliminary dynamic characteristic of the isolated digit's voice; then linear dimension reduction is carried out on the combined MFCC features by the PCA, and the dimensions of the resulting features are unified; nonlinear dimension reduction is subsequently performed on the new features by the RBM; and finally, recognition and classification of the dimension-reduced digit voice features is completed with a Softmax classifier. By combining PCA linear dimension reduction, unification of the feature dimensions, and RBM nonlinear dimension reduction, the characteristic representation and classification capabilities of the model are greatly improved, the recognition accuracy for isolated digit voice is increased, and an efficient solution is provided for high-accuracy recognition of isolated digit voice.
Owner:CHANGAN UNIV
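
The chain maps naturally onto a scikit-learn pipeline; the sketch below is a hedged reconstruction in which the component counts and learning rate are assumptions, and min-max scaling is added because BernoulliRBM expects inputs in [0, 1].

```python
# PCA (linear) -> RBM (nonlinear) -> softmax classification chain.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

clf = Pipeline([
    ('pca', PCA(n_components=40)),
    ('scale', MinMaxScaler()),
    ('rbm', BernoulliRBM(n_components=100, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    ('softmax', LogisticRegression(max_iter=1000)),
])
# clf.fit(X_train, y_train)   # X: stacked MFCC + delta-MFCC features
# clf.score(X_test, y_test)
```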

Method for identifying speakers based on a protective kernel Fisher discriminant method

The invention relates to a method for identifying speakers based on a protective kernel Fisher discriminant method. The method comprises the following steps: (1) preprocessing the voice signals; (2) extracting characteristic parameters: after framing and end-point detection of the voice signals, extracting Mel frequency cepstrum coefficients as the speakers' characteristic vectors; (3) creating a speaker discrimination model; (4) calculating the model's optimal projection vectors: using the optimal solution of the LWFD method, calculating the optimal projection vector group; (5) identifying speakers: projecting the original data x_i to y_i ∈ R^r (1 ≤ r ≤ d) according to the optimal projection classification vectors φ, where r is the reduced dimensionality; the optimal projection classification dimensionality of an original c-class data space is c-1; then solving the central value of each class's data after projection and normalizing; after projecting the data to be classified into the subspace and normalizing, calculating the Euclidean distance from the normalized projected data to the centre of each class in the subspace, and taking the nearest class as the identification result. The invention has a high identification rate, simple model construction, and good speed.
Owner:ZHEJIANG UNIV OF TECH
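
The sketch below approximates steps (4)-(5) with scikit-learn's LinearDiscriminantAnalysis standing in for the LWFD projection (which the abstract does not specify): project the MFCC vectors to at most c-1 dimensions, normalise, and classify by nearest class centroid.

```python
# Fisher-style projection plus nearest-centroid identification.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train(X, y, r):                       # r <= n_classes - 1
    lda = LinearDiscriminantAnalysis(n_components=r).fit(X, y)
    Z = lda.transform(X)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)      # normalise
    centroids = {c: Z[y == c].mean(axis=0) for c in np.unique(y)}
    return lda, centroids

def identify(x, lda, centroids):
    z = lda.transform(x.reshape(1, -1))[0]
    z /= np.linalg.norm(z)
    # Nearest class centre in Euclidean distance wins.
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
```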

Method and device for recognizing short speech speaker

CN108281146A (Active) | Benefits: overcomes the defect of degraded recognition performance; improves recognition accuracy | Topics: speech analysis, feature vector, Mel-frequency cepstrum
The invention discloses a method and device for recognizing a short-speech speaker. The method comprises the following steps: an input training short speech signal is preprocessed, Mel frequency cepstral coefficients are extracted as the training feature vectors, and an adaptive kernel likelihood fuzzy C-means clustering algorithm is used to perform cluster analysis and establish a speaker speech reference model; the input test short speech signal is then preprocessed, Mel frequency cepstral coefficients are extracted as the test feature vectors, the distance between the test feature vectors and the speaker speech reference model is calculated, and the identity of the short-speech speaker is identified according to this distance. In the method and device disclosed in the present embodiment, Mel frequency cepstral coefficients are extracted as features and, together with the adaptive kernel likelihood fuzzy C-means clustering algorithm, used to perform the cluster analysis and establish the speaker speech reference model; after model matching, the identity of the short-speech speaker is recognized, recognition accuracy is improved, and practical application requirements are met.
Owner:GEER TECH CO LTD
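
For illustration, here is a compact standard fuzzy C-means in NumPy (not the patent's adaptive kernel likelihood variant) that could build per-speaker codebooks from MFCC vectors; m is the usual fuzziness exponent and all values shown are assumptions.

```python
# Standard fuzzy C-means over MFCC feature vectors.
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, eps=1e-5, seed=0):
    """X: (n_samples, n_features); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # memberships sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                # avoid division by zero
        U_new = 1.0 / (d ** (2 / (m - 1)))
        U_new /= U_new.sum(axis=0)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return centers, U

# Identification: score a test utterance by its total distance to each
# speaker's cluster centres and pick the closest reference model.
```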

Speech emotion recognition method based on fuzzy support vector machine

The invention relates to a speech emotion recognition technology, in particular to a speech emotion recognition method based on a fuzzy support vector machine. In the method, input speech signals carrying emotions are preprocessed, the preprocessing comprising pre-emphasis filtering and windowed framing; Mel frequency cepstrum coefficients (MFCCs) are extracted as feature information from the processed speech signals; dimension reduction is carried out on the extracted MFCCs using kernel principal component analysis (KPCA); classification and recognition are carried out on the dimension-reduced MFCC feature information; and the recognition results are output. The classification and recognition step adopts the FSVM algorithm. The method has the advantage that KPCA reduces the dimensionality of the MFCC emotion features and thereby the redundant information, so the recognition effect is better than using the MFCC features directly: recognition efficiency is higher, the effect is better, and recognition is faster. The method is especially suitable for intelligent speech emotion recognition.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA
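
A hedged sketch of the recognition chain with scikit-learn: KernelPCA followed by an SVM, with per-sample weights standing in for FSVM fuzzy memberships (which down-weight noisy samples); all parameter values are assumptions.

```python
# KPCA dimension reduction followed by a (weighted) SVM classifier.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

emo_clf = Pipeline([
    ('kpca', KernelPCA(n_components=30, kernel='rbf', gamma=0.01)),
    ('svm', SVC(kernel='rbf', C=10.0)),
])
# Fuzzy memberships enter as sample weights during fitting:
# emo_clf.fit(X_train, y_train, svm__sample_weight=memberships)
```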

Method for recognizing emotion points of Chinese pronunciation based on vocal-tract modulating signals MFCC (Mel Frequency Cepstrum Coefficient)

The invention provides a method capable of increasing the average recognition rate of emotion points. The method comprises the following steps: specifying the emotion data for an electroglottography and voice database; collecting the electroglottography emotion data and the voice data; carrying out subjective evaluation on the collected data and selecting one data subset as the study object; preprocessing the electroglottography signals and the voice signals, and extracting short-time characteristics and corresponding statistical characteristics of the voice signals together with the Mel frequency cepstrum coefficients SMFCC; carrying out fast Fourier transforms on the electroglottography signals and the voice signals, dividing the two spectra, and obtaining the Mel frequency cepstrum coefficients TMFCC from the quotient; and running experiments with different characteristic combinations, solving the average recognition rate over 28 emotion points under each combination in both the speaker-dependent and speaker-independent cases. The experimental results show that adopting the TMFCC characteristics increases the average recognition rate of the emotion points.
Owner:BEIHANG UNIV
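
Reading "dividing" as spectral division of the voice spectrum by the electroglottograph spectrum to isolate the vocal-tract contribution, a sketch of TMFCC extraction might look like the following; the filterbank size and cepstral order are assumptions.

```python
# Hedged TMFCC sketch: spectral division, mel filterbank, DCT.
import numpy as np
import librosa
from scipy.fftpack import dct

def tmfcc(speech_frame, egg_frame, sr, n_mels=26, n_ceps=13):
    S = np.abs(np.fft.rfft(speech_frame))
    E = np.abs(np.fft.rfft(egg_frame)) + 1e-10
    vocal_tract = S / E                      # spectral division
    fb = librosa.filters.mel(sr=sr, n_fft=2 * (len(S) - 1), n_mels=n_mels)
    log_mel = np.log(fb @ vocal_tract + 1e-10)
    return dct(log_mel, norm='ortho')[:n_ceps]
```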

Collaborative filtering-based real-time voice-driven human face and lip synchronous animation system

The invention discloses a collaborative-filtering-based real-time voice-driven human face and lip synchronization animation system. As voice is input in real time, a human head model produces lip animation synchronized with the input voice. The system comprises an audio/video coding module, a collaborative filtering module, and an animation module. The audio/video coding module performs Mel frequency cepstrum parameter coding on the acquired voice and MPEG-4 (Moving Picture Experts Group) standard facial animation parameter coding on the acquired three-dimensional facial feature-point motion, yielding a multimodal synchronous library of Mel frequency cepstrum parameters and facial animation parameters; the collaborative filtering module solves for facial animation parameters synchronous with the voice by combining the Mel frequency cepstrum parameter coding of newly input voice with the multimodal synchronous library through collaborative filtering; and the animation module drives the human face model with the facial animation parameters to render the animation. The system offers greater realism, real-time operation, and a wider range of application environments.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
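
A sketch of the collaborative-filtering lookup under stated assumptions: for each incoming MFCC frame, retrieve its nearest neighbours in the synchronous library and blend their facial animation parameters with inverse-distance weights (k and the weighting scheme are illustrative).

```python
# Neighbour-based MFCC -> facial-animation-parameter (FAP) lookup.
import numpy as np
from sklearn.neighbors import NearestNeighbors

class LipSyncFilter:
    def __init__(self, mfcc_lib, fap_lib, k=5):
        # mfcc_lib: (n, d_audio) and fap_lib: (n, d_fap), row-aligned.
        self.nn = NearestNeighbors(n_neighbors=k).fit(mfcc_lib)
        self.fap_lib = fap_lib

    def predict_fap(self, mfcc_frame):
        dist, idx = self.nn.kneighbors(mfcc_frame.reshape(1, -1))
        w = 1.0 / (dist[0] + 1e-6)           # inverse-distance weights
        w /= w.sum()
        # Weighted blend of the neighbours' animation parameters.
        return w @ self.fap_lib[idx[0]]
```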