
183 results about "Formant" patented technology

In speech science and phonetics, a formant is the spectral shaping that results from an acoustic resonance of the human vocal tract. In acoustics the definition differs slightly: a formant is a peak, or local maximum, in the spectrum, so for harmonic sounds it is the harmonic partial that a resonance augments. The two definitions differ in whether "formants" characterise the production mechanism of a sound or the produced sound itself. In practice, the frequency of a spectral peak can differ from the associated resonance frequency, for instance when no harmonic falls exactly on the resonance frequency. In most cases this subtle difference is irrelevant, and in phonetics "formant" can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the resonance frequency may lie between the widely spaced harmonics, and hence no corresponding peak is visible.
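The gap between the two definitions can be illustrated numerically. The following is a minimal sketch, not a measurement procedure: it assumes a single resonance at 500 Hz with a simple Lorentzian-shaped gain (the shape, the 80 Hz bandwidth, and the function names are illustrative assumptions) and reports which harmonic is amplified most for a low and a high fundamental frequency.

```python
import math


def resonance_gain(freq_hz, center_hz=500.0, bandwidth_hz=80.0):
    """Gain of an assumed single resonance with a Lorentzian-like shape."""
    half_bw = bandwidth_hz / 2.0
    return 1.0 / math.sqrt(1.0 + ((freq_hz - center_hz) / half_bw) ** 2)


def strongest_harmonic(f0_hz, center_hz=500.0, max_hz=4000.0):
    """Return the harmonic of f0 that the resonance amplifies the most."""
    harmonics = [k * f0_hz for k in range(1, int(max_hz // f0_hz) + 1)]
    return max(harmonics, key=lambda f: resonance_gain(f, center_hz))


# Harmonics at 100, 200, ..., so one of them lands exactly on the resonance.
low_pitch_peak = strongest_harmonic(100.0)
# Harmonics at 220, 440, 660, ..., so none lands on the resonance.
high_pitch_peak = strongest_harmonic(220.0)
```

With a 100 Hz fundamental a harmonic falls exactly on the resonance, so the spectral peak sits at 500 Hz. With a 220 Hz fundamental the nearest harmonic is at 440 Hz, so the measured peak lies 60 Hz below the resonance.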

Method for improving speaker identification by determining usable speech

Inactive | US7177808B2 | Overcome limitations | Enhance a target speaker | Speech recognition | Dependability | Formant
Degraded speech is preprocessed in a speaker identification (SID) process to produce SID-usable and SID-unusable segments. Features are extracted and analyzed to produce a matrix of optimum classifiers for detecting SID-usable and SID-unusable speech segments. Optimum classifiers possess a minimum distance from a speaker model. A decision tree based on fixed thresholds indicates the presence of a speech feature in a given speech segment. Following preprocessing, degraded speech is measured in one or more time, frequency, cepstral, or SID usable/unusable domains. Each measurement result is multiplied by a weighting factor whose value is proportional to the reliability of the corresponding time, frequency, or cepstral measurement. The measurements are fused as information, and usable speech segments are extracted for further processing. Such further processing of co-channel speech may include speaker identification, where a segment-by-segment decision is made on each usable speech segment to determine whether it corresponds to speaker #1 or speaker #2. Further processing of co-channel speech may also include constructing the complete utterance of speaker #1 or speaker #2. Speech features such as pitch and formants may be extended back into the unusable segments to form a complete utterance from each speaker.
Owner:THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SEC OF THE AIR FORCE
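The weighting and fusion step in the abstract can be sketched as a reliability-weighted average. This is only an illustration: the abstract does not state how the weighted measurements are combined, so the normalisation, the 0.5 threshold, the score values, and all names below are assumptions.

```python
def fuse_measurements(measurements):
    """Combine per-domain usability scores into one reliability-weighted score.

    `measurements` maps a domain name to (score, reliability). Scores are
    assumed to lie in [0, 1] and reliabilities to be positive weights.
    """
    total_weight = sum(weight for _, weight in measurements.values())
    weighted_sum = sum(score * weight for score, weight in measurements.values())
    return weighted_sum / total_weight


def is_usable(measurements, threshold=0.5):
    """Label a segment SID usable when the fused score reaches the threshold."""
    return fuse_measurements(measurements) >= threshold


# Hypothetical scores for one speech segment (not taken from the patent).
segment = {
    "time": (0.8, 1.0),
    "frequency": (0.4, 0.5),
    "cepstral": (0.9, 2.0),
}
fused_score = fuse_measurements(segment)
segment_usable = is_usable(segment)
```

In this example the low-reliability frequency measurement pulls the fused score down only slightly, because its weight is the smallest of the three.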

Method for improving speaker identification by determining usable speech

Inactive | US20050027528A1 | Enhance a target speaker | Prevent degradation | Speech recognition | Formant | Dependability
Degraded speech is preprocessed in a speaker identification (SID) process to produce SID-usable and SID-unusable segments. Features are extracted and analyzed to produce a matrix of optimum classifiers for detecting SID-usable and SID-unusable speech segments. Optimum classifiers possess a minimum distance from a speaker model. A decision tree based on fixed thresholds indicates the presence of a speech feature in a given speech segment. Following preprocessing, degraded speech is measured in one or more time, frequency, cepstral, or SID usable/unusable domains. Each measurement result is multiplied by a weighting factor whose value is proportional to the reliability of the corresponding time, frequency, or cepstral measurement. The measurements are fused as information, and usable speech segments are extracted for further processing. Such further processing of co-channel speech may include speaker identification, where a segment-by-segment decision is made on each usable speech segment to determine whether it corresponds to speaker #1 or speaker #2. Further processing of co-channel speech may also include constructing the complete utterance of speaker #1 or speaker #2. Speech features such as pitch and formants may be extended back into the unusable segments to form a complete utterance from each speaker.
Owner:THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SEC OF THE AIR FORCE

Method for determining system time delay in acoustic echo cancellation and acoustic echo cancellation method

The invention relates to a method for determining the system time delay in acoustic echo cancellation, and to an acoustic echo cancellation method that employs it. The delay is determined in the following steps. First, the collected original signals and reference signals are each split into overlapping segments, windowed, and transformed into the frequency domain with a fast Fourier transform (FFT), giving original and reference frequency-domain signals. Next, for each segment of both signals, the frequency values of the n highest-energy peaks are found; these frequency values are the formant characteristic values. Finally, the formant frequency sequence of the reference signal is shifted forwards step by step, each time by an integral multiple of the segmentation time t1, and the two sets of formant characteristic values are compared at each shift; the shift that yields the most matches is the system time delay. Compared with the prior art, the method can determine dynamic and very large system time delays with a very small amount of calculation, has a wide application range, and gives a stable effect.
Owner:宁波菊风系统软件有限公司
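The steps in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: it uses a naive DFT in place of an FFT so that it needs only the standard library, and the frame length, hop size, number of peaks, exact-match comparison, and all function names are assumptions.

```python
import cmath
import math


def frame_peaks(frame, n_peaks):
    """Return the bins of the n strongest spectral peaks of one frame."""
    size = len(frame)
    # Hann window, then a naive DFT magnitude for bins 0..size/2.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (size - 1)))
                for i, x in enumerate(frame)]
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / size)
                    for i, x in enumerate(windowed)))
            for k in range(size // 2 + 1)]
    if max(mags) < 1e-9:  # silent frame: no usable feature
        return ()
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:n_peaks]
    return tuple(sorted(strongest))


def peak_sequence(signal, frame_len=32, hop=16, n_peaks=2):
    """Peak features for every overlapping frame of the signal."""
    return [frame_peaks(signal[s:s + frame_len], n_peaks)
            for s in range(0, len(signal) - frame_len + 1, hop)]


def estimate_delay_frames(mic, ref, max_delay):
    """Return the shift, in hops, at which the most frame features match."""
    mic_seq = peak_sequence(mic)
    ref_seq = peak_sequence(ref)

    def matches(shift):
        return sum(1 for i, feat in enumerate(ref_seq)
                   if feat and i + shift < len(mic_seq)
                   and mic_seq[i + shift] == feat)

    return max(range(max_delay + 1), key=matches)


# Synthetic test signal: a new tone every 16 samples, echoed 48 samples later.
tone_bins = [3, 9, 5, 12, 7, 14, 4, 10, 6, 13, 8, 11]
ref = [math.sin(2 * math.pi * b * n / 32) for b in tone_bins for n in range(16)]
mic = [0.0] * 48 + ref
delay_frames = estimate_delay_frames(mic, ref, max_delay=8)
```

For this synthetic input the reference was delayed by three hops, and `delay_frames` recovers that value (48 samples at a hop of 16). Because only a few peak indices per frame are compared, the search is cheap even over long delay ranges, which matches the low-computation claim in the abstract.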

Resonance peak automatic matching method for voiceprint identification

The invention provides an automatic resonance peak (formant) matching method for voiceprint identification. The method comprises the following steps: phoneme boundary positions in the inspection material and the sample are automatically marked using forced alignment (FA) based on continuous speech recognition; for identical vowel phoneme segments of the inspection material and the sample, fundamental frequency, resonance peak, and power spectral density parameters are used to judge automatically whether the current phoneme is a valid, analysable phoneme; and a dynamic time warping (DTW) algorithm automatically produces deviation ratios for the corresponding resonance peak time-frequency areas, which serve as the analysis basis for the final manual voiceprint identification. Because phoneme boundaries are marked automatically and the validity of each phoneme's pronunciation is judged automatically, processing efficiency can be greatly improved; and because the resonance peak deviation alignment algorithm is applied automatically to valid phoneme pairs, the accuracy of resonance peak alignment can be improved.
Owner:ANHUI IFLYTEK INTELLIGENT SYST
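The DTW step can be illustrated with a short sketch. It assumes each vowel segment has already been reduced to a track of resonance peak (formant) frequencies in Hz, and it takes the deviation ratio to be the mean relative frequency difference along the warping path. That definition, the example tracks, and the function names are assumptions, since the abstract does not define them.

```python
def dtw_path(a, b):
    """Classic dynamic time warping; returns the aligned index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])
            cost[i][j] = dist + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def mean_deviation_ratio(test_track, sample_track):
    """Mean relative formant difference along the DTW alignment."""
    path = dtw_path(test_track, sample_track)
    return sum(abs(test_track[i] - sample_track[j]) / sample_track[j]
               for i, j in path) / len(path)


# Hypothetical first-formant tracks (Hz) for the same vowel.
inspection = [500.0, 520.0, 560.0, 600.0]
sample_slow = [500.0, 500.0, 520.0, 560.0, 560.0, 600.0]  # same track, slower
sample_shifted = [f * 1.1 for f in sample_slow]           # 10% higher formant

same_speaker_like = mean_deviation_ratio(inspection, sample_slow)
different_like = mean_deviation_ratio(inspection, sample_shifted)
```

The slower pronunciation of the same track aligns with zero deviation, because DTW absorbs the timing difference, whereas the shifted track leaves a nonzero deviation ratio that an examiner could then inspect.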

Method for determining alcohol consumption, and recording medium and terminal for carrying out same

Disclosed are a method for determining whether a person is drunk on the basis of a difference among a plurality of formant energies, which are generated by applying linear predictive coding (LPC) with a plurality of linear prediction orders, and a recording medium and a terminal for carrying out the method. The alcohol consumption determining terminal comprises: a voice input unit that receives voice signals, converts them into voice frames, and outputs the voice frames; a voiced/unvoiced sound analysis unit that extracts the voice frames corresponding to voiced sound; an LPC processing unit that calculates a plurality of formant energies by applying linear predictive coding with the plurality of linear prediction orders to the voiced frames; and an alcohol consumption determining unit that determines whether the person is drunk on the basis of a difference among the formant energies calculated by the LPC processing unit. Drunkenness is thus determined from the change in formant energy that results when linear predictive coding of different orders is applied to the voice signal.
Owner:FOUND OF SOONGSIL UNIV IND COOP
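The idea of comparing formant energies across prediction orders can be sketched as follows. The LPC analysis itself (autocorrelation method with the Levinson-Durbin recursion) is standard, but the abstract does not say how formant energy is computed, so this sketch assumes it is the summed power of the LPC spectral envelope at its local maxima. The orders 4 and 10, the synthetic frame, and the function names are likewise assumptions, and no decision threshold is implied.

```python
import cmath
import math
import random


def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns (a, error)."""
    r = [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k                  # prediction error shrinks
    return a, err


def formant_energy(frame, order, n_points=256):
    """Assumed definition: summed LPC envelope power at its spectral peaks."""
    a, err = lpc(frame, order)
    env = []
    for p in range(n_points):
        w = math.pi * p / n_points
        denom = abs(sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))) ** 2
        env.append(err / denom)
    peaks = [env[p] for p in range(1, n_points - 1)
             if env[p - 1] < env[p] >= env[p + 1]]
    return float(sum(peaks))


# Synthetic voiced-like frame: two damped resonances plus a little noise.
rng = random.Random(0)
fs = 8000
frame = []
for n in range(240):
    t = n / fs
    voiced = (math.exp(-60 * t) * math.sin(2 * math.pi * 700 * t)
              + 0.5 * math.exp(-90 * t) * math.sin(2 * math.pi * 1200 * t))
    hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / 239)
    frame.append(hamming * (voiced + 0.01 * rng.gauss(0.0, 1.0)))

energies = {order: formant_energy(frame, order) for order in (4, 10)}
energy_difference = energies[10] - energies[4]
```

A low order gives a smooth envelope and a high order resolves individual resonances more sharply, so the difference between the two energies reflects how pronounced the formants are in the frame.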