
2-D processing of speech

Status: Inactive · Publication Date: 2007-04-10
MASSACHUSETTS INST OF TECH
Cites: 3 · Cited by: 0

AI Technical Summary

Benefits of technology

[0005]A method of processing an acoustic signal is provided that prepares a frequency-related representation of the acoustic signal over time (e.g., spectrogram, wavelet transform or auditory transform) and computes a two-dimensional transform, such as a 2-D Fourier transform, of the frequency-related representation to provide a compressed frequency-related representation. The compressed frequency-related representation is then processed. The acoustic signal can be a speech signal, and the processing may determine a pitch of the speech signal. The pitch can be determined by computing the inverse of the distance between a peak of the impulses and the origin. Windowing (e.g., Hamming windows) of the spectrogram can be used to further improve the pitch estimate; likewise, multiband analysis can be performed for further improvement.
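The pitch rule in [0005] can be illustrated with a short NumPy sketch. It is not the patented procedure: the function name `gct_pitch`, the 64 ms Hamming analysis window, 10 ms hop, the 16-frame by 128-bin patch, the 60 to 400 Hz search range and the zero-padding are all illustrative assumptions, and the multiband analysis is omitted. The sketch also assumes the pitch is roughly constant over the patch, so the harmonic lines are horizontal and the dominant impulse lies on the frequency-axis line of the 2-D transform. Measured in cycles per Hz, the inverse of that impulse's distance from the origin is the harmonic spacing in Hz, i.e., the pitch.

```python
import numpy as np


def gct_pitch(patch, df, fmin=60.0, fmax=400.0, pad=1024):
    """Hypothetical pitch estimate (Hz) from a (time x frequency) spectrogram patch.

    Assumes roughly constant pitch over the patch, so the dominant impulse sits
    on (or next to) the zero temporal-spatial-frequency row and its distance
    from the origin is its frequency-axis coordinate, in cycles per Hz.
    """
    n_t, n_f = patch.shape
    # Remove the mean so the DC term does not swamp the grating impulses,
    # then taper both axes with Hamming windows.
    p = (patch - patch.mean()) * np.outer(np.hamming(n_t), np.hamming(n_f))
    # 2-D DFT, zero-padded along the frequency axis for a finer grid.
    G = np.abs(np.fft.fft2(p, s=(n_t, pad)))
    # Spatial frequency along the frequency axis, in cycles per Hz.
    nu = np.fft.fftfreq(pad, d=df)
    # Only search spatial frequencies that correspond to plausible pitches.
    G[:, (nu < 1.0 / fmax) | (nu > 1.0 / fmin)] = 0.0
    _, j = np.unravel_index(np.argmax(G), G.shape)
    # Pitch is the inverse of the impulse's distance from the origin.
    return 1.0 / nu[j]


# Synthetic voiced-like signal: 19 equal-amplitude harmonics of 150 Hz.
fs, f0 = 8000, 150.0
t = np.arange(fs) / fs
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 20))

# Narrowband spectrogram: 512-sample Hamming window, 80-sample hop.
win_len, hop, nfft = 512, 80, 512
win = np.hamming(win_len)
frames = np.stack([x[s:s + win_len] * win
                   for s in range(0, len(x) - win_len + 1, hop)])
S = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
df = fs / nfft                      # 15.625 Hz per bin
patch = S[20:36, :128]              # 16 frames, about 0 to 2 kHz
print(f"estimated pitch: {gct_pitch(patch, df):.1f} Hz")
```

The search range matters: the grating's second harmonic would otherwise be read as a pitch of f0/2, so a practical estimator needs some rule for choosing among the candidate impulses.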
[0006]Processing of the compressed frequency-related representation may filter noise from the acoustic signal. Processing of the compressed frequency-related representation may distinguish plural sources (e.g., separate speakers) within the acoustic signal by filtering the compressed frequency-related representation and performing an inverse transform.
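Source separation as described in [0006] (filter the compressed representation, then invert) can be sketched as masking in the 2-D DFT domain. This is a simplified illustration under stated assumptions, not the patent's filter: two constant-pitch "speakers" are modeled as zero-mean horizontal gratings with integer cycle counts, so each maps to an exact impulse pair, and the hypothetical helper `gct_mask_filter` keeps only a small neighborhood around one speaker's impulses. Real spectrogram patches also carry a large DC term and smeared impulses, which a practical filter would have to handle.

```python
import numpy as np


def gct_mask_filter(patch, keep, radius=1):
    """Zero every 2-D DFT coefficient of `patch` except those within `radius`
    bins (circular distance) of each (row, col) in `keep` and of its
    conjugate-symmetric partner, then invert back to the spectrogram domain."""
    n_t, n_f = patch.shape
    G = np.fft.fft2(patch)
    rows = np.arange(n_t)[:, None]
    cols = np.arange(n_f)[None, :]
    mask = np.zeros(patch.shape, dtype=bool)
    for r0, c0 in keep:
        # Keep both (r0, c0) and (-r0, -c0) so the output stays real.
        for r, c in ((r0 % n_t, c0 % n_f), (-r0 % n_t, -c0 % n_f)):
            dr = np.minimum((rows - r) % n_t, (r - rows) % n_t)
            dc = np.minimum((cols - c) % n_f, (c - cols) % n_f)
            mask |= (dr <= radius) & (dc <= radius)
    return np.real(np.fft.ifft2(G * mask))


# Two constant-pitch "speakers": horizontal gratings with harmonic spacings of
# 16 bins (8 cycles per 128 bins) and about 9.8 bins (13 cycles), plus noise.
n_t, n_f = 16, 128
k = np.arange(n_f)[None, :] * np.ones((n_t, 1))
speaker_a = np.cos(2 * np.pi * 8 * k / n_f)
speaker_b = np.cos(2 * np.pi * 13 * k / n_f)
rng = np.random.default_rng(0)
mixture = speaker_a + speaker_b + 0.5 * rng.standard_normal((n_t, n_f))

# Keep only speaker A's impulse pair and invert.
estimate_a = gct_mask_filter(mixture, keep=[(0, 8)])
rms_error = np.sqrt(np.mean((estimate_a - speaker_a) ** 2))
print(f"RMS error vs. speaker A: {rms_error:.3f}")
```

Because the mask keeps only 18 of the 2048 coefficients, speaker B is removed entirely and most of the broadband noise goes with it. This is the same mechanism that lets filtering in the compressed domain act as a noise filter.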
[0007]An embodiment of the present invention produces pitch estimates on par with conventional sinewave-based pitch estimation techniques and outperforms those techniques in noisy environments. This embodiment also performs well with high-pitch (e.g., women's) speech.

Problems solved by technology

Conventional pitch estimation techniques often degrade in noisy environments or with high-pitch (e.g., women's) speech.



Embodiment Construction

[0025]A description of preferred embodiments of the invention follows.

[0026]Human speech produces a vibration of air that creates a complex sound wave composed of a fundamental frequency and its harmonics. The signal can be processed over successive time segments using a frequency transform (e.g., Fourier transform) to produce a one-dimensional (1-D) representation of each segment in a frequency/magnitude plane. Concentrations of magnitudes can be compressed, and the signal can then be represented in a time/frequency plane (e.g., a spectrogram).
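The time/frequency representation in [0026] can be produced with a short-time Fourier transform. The NumPy sketch below computes a Hamming-windowed STFT magnitude. The function name `spectrogram`, the window length, hop and FFT size are illustrative assumptions, not values from the patent.

```python
import numpy as np


def spectrogram(x, win_len=256, hop=64, nfft=256):
    """Magnitude of a Hamming-windowed short-time Fourier transform.

    Rows are time frames (hop samples apart); columns are the
    nfft // 2 + 1 non-negative frequency bins, spaced fs / nfft Hz apart.
    """
    if len(x) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))


fs = 8000
t = np.arange(fs) / fs                 # one second of samples
x = np.cos(2 * np.pi * 1000 * t)       # 1 kHz tone
S = spectrogram(x)
print(S.shape)                         # (122, 129)
print(np.argmax(S[10]) * fs / 256)     # 1000.0
```

For pitch work, a longer window (narrowband spectrogram) is usually preferred so that individual harmonics are resolved as separate horizontal lines.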

[0027]Two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane is used to estimate pitch and provide a basis for noise filtering and speaker separation in voiced speech. Patterns in a 2-D spatial domain map to dots (concentrated entities) in a 2-D spatial frequency domain (“compressed frequency-related representation”) through the use of a 2-D Fourier transform. Analysis of the “compressed frequ...
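The mapping of patterns to "dots" in [0027] follows from a basic property of the 2-D Fourier transform: a 2-D sinusoidal grating transforms to a conjugate pair of impulses. The NumPy sketch below checks this for integer cycle counts, where the impulses are exact. In a windowed speech spectrogram they are smeared instead. The helper names `grating` and `dominant_impulses` are hypothetical.

```python
import numpy as np


def grating(n_time, n_freq, u0, v0):
    """Sine-wave grating: u0 cycles along time (rows), v0 along frequency (cols)."""
    n = np.arange(n_time)[:, None]
    k = np.arange(n_freq)[None, :]
    return np.cos(2 * np.pi * (u0 * n / n_time + v0 * k / n_freq))


def dominant_impulses(image, count=2):
    """(row, col) positions of the `count` largest 2-D DFT magnitudes."""
    mag = np.abs(np.fft.fft2(image))
    order = np.argsort(mag, axis=None)[::-1][:count]
    return [tuple(int(i) for i in np.unravel_index(f, mag.shape)) for f in order]


# A tilted grating maps to a pair of impulses off both axes ...
print(dominant_impulses(grating(32, 64, 3, 10)))
# ... and a horizontal one (constant harmonic spacing of 64 / 10 = 6.4 bins)
# maps to a pair on the zero temporal-frequency row.
print(dominant_impulses(grating(32, 64, 0, 10)))
```

In the horizontal case, the impulse's column (10 cycles per 64 bins) is the reciprocal of the harmonic spacing in bins. This is the relationship the pitch estimate exploits.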



Abstract

Acoustic signals are analyzed by two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. The short-space 2-D Fourier transform of a frequency-related representation (e.g., spectrogram) of the signal is obtained. The 2-D transformation maps harmonically related signal components to a concentrated entity in the new 2-D plane (compressed frequency-related representation). The series of operations that produces the compressed frequency-related representation is referred to as the “grating compression transform” (GCT), because sine-wave grating patterns in the frequency-related representation are reduced to smeared impulses. The GCT provides for speech pitch estimation. The operations may, for example, determine pitch estimates of voiced speech or provide noise filtering or speaker separation in a multiple-speaker acoustic signal.
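The "short-space" 2-D Fourier transform in the abstract applies the 2-D transform to local, windowed regions of the spectrogram rather than to the whole image. The NumPy sketch below illustrates that tiling. The region size, hop, 2-D Hamming taper, mean removal and the name `short_space_gct` are assumptions chosen for illustration, not parameters from the patent.

```python
import numpy as np


def short_space_gct(spec, patch=(16, 64), step=(8, 32)):
    """Magnitude of a short-space 2-D DFT of a spectrogram (illustrative).

    spec  : array (n_frames, n_bins), time along rows, frequency along columns
    patch : (frames, bins) size of each local region
    step  : hop between neighboring regions in each direction
    Returns an array (n_regions_t, n_regions_f, patch[0], patch[1]).
    """
    pt, pf = patch
    st, sf = step
    window = np.outer(np.hamming(pt), np.hamming(pf))
    t_starts = range(0, spec.shape[0] - pt + 1, st)
    f_starts = range(0, spec.shape[1] - pf + 1, sf)
    out = np.empty((len(t_starts), len(f_starts), pt, pf))
    for i, t0 in enumerate(t_starts):
        for j, k0 in enumerate(f_starts):
            region = spec[t0:t0 + pt, k0:k0 + pf]
            # Mean removal and a 2-D Hamming taper, then the 2-D DFT magnitude.
            out[i, j] = np.abs(np.fft.fft2((region - region.mean()) * window))
    return out


# Toy "spectrogram": harmonic lines every 8 bins that do not move in time.
spec = 1.0 + np.cos(2 * np.pi * np.arange(256) / 8)[None, :] * np.ones((40, 1))
gct = short_space_gct(spec)
print(gct.shape)                       # (4, 7, 16, 64)
r, c = np.unravel_index(np.argmax(gct[1, 2]), gct.shape[2:])
print(r, c)
```

Each region's peak lands on the zero temporal-frequency row at column 8 or its mirror 56, i.e., 8 cycles per 64 bins. This matches the 8-bin harmonic spacing, so every local region yields its own compressed representation.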

Description

RELATED APPLICATION(S)

[0001]This application claims the benefit of U.S. Provisional Application titled “2-D PROCESSING OF SPEECH” by Thomas F. Quatieri, Jr., Ser. No. 60/409,095, filed Sep. 6, 2002. The entire teaching of the above application is incorporated herein by reference.

GOVERNMENT SUPPORT

[0002]The invention was supported, in whole or in part, by the United States Government's Technical Support Working Group under Air Force Contract No. F19628-00-C-0002. The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003]Conventional processing of acoustic signals (e.g., speech) analyzes a one-dimensional frequency signal in a frequency-time domain. Sinewave-based techniques (e.g., the sine-wave-based pitch estimator described in R. J. McAulay and T. F. Quatieri, “Pitch estimation and voicing detection based on a sinusoidal model,” Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Albuquerque, N.Mex., pp. 249–252, 1990) have been used to estimate t...


Application Information

IPC(8): G10L11/04; G10L21/00
Inventor QUATIERI, JR., THOMAS F.
Owner MASSACHUSETTS INST OF TECH