Method and apparatus for generating acoustic models for speaker independent speech recognition of foreign words uttered by non-native speakers

Technology for recognizing speech from non-native speakers, applied in the field of speech recognition apparatus and methods. It addresses the low recognition performance of NL acoustic models for FL words (even when uttered by FL speakers), the less accurate modeling of words, and the high cost of recording a speech database for every language one wants to support.

Status: Inactive
Publication Date: 2005-09-08
OPTIS WIRELESS TECH LLC
4 Cites, 50 Cited by
AI Technical Summary

Benefits of technology

[0011] Using this mapping, a new training can be started in which the phonetic lexicon contains both native language words and foreign language words described by NL phonemes. Both kinds of training material, foreign language utterances and native language utterances, can then be used, resulting in robust acoustic models covering foreign words.
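
As a rough illustration of the combined lexicon described in [0011], the sketch below rewrites FL pronunciations with NL phonemes and merges them with an NL lexicon. The phoneme symbols, the `FL_TO_NL` table, and both function names are hypothetical and chosen only for illustration; the patent text shown here does not specify them. Letting unmapped symbols pass through unchanged is also an assumption.

```python
from typing import Dict, List

# Hypothetical FL-to-NL phoneme mapping (symbols are illustrative only).
FL_TO_NL: Dict[str, str] = {"th": "s", "w": "v", "ae": "e"}


def map_pronunciation(fl_phonemes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Describe an FL pronunciation with NL phonemes.

    Assumption: symbols without a mapping entry are shared by both
    inventories and are passed through unchanged.
    """
    return [mapping.get(p, p) for p in fl_phonemes]


def build_combined_lexicon(
    nl_lexicon: Dict[str, List[str]],
    fl_lexicon: Dict[str, List[str]],
    mapping: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return one lexicon holding NL words and FL words, all in NL phonemes."""
    combined = dict(nl_lexicon)
    for word, phones in fl_lexicon.items():
        # Assumption: on a spelling collision the NL entry is kept.
        combined.setdefault(word, map_pronunciation(phones, mapping))
    return combined


nl_lexicon = {"haus": ["h", "a", "u", "s"]}
fl_lexicon = {"weather": ["w", "e", "th", "er"]}
combined = build_combined_lexicon(nl_lexicon, fl_lexicon, FL_TO_NL)
```

With a lexicon of this shape, FL utterances and NL utterances can both be fed to a single training run, because every word is described in the same phoneme inventory.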

Problems solved by technology

The recording of such a speech database for every language one wants to support is very costly and time consuming.
The phonetic realization, i.e., the pronunciation of foreign words uttered by a non-native speaker, is the main problem of a multi-language approach to obtain good recognition results for these foreign words.
Moreover, NL acoustic models for FL words are inaccurate and result in low recognition performance even if uttered by FL speakers.
Additionally, even when the foreign word can be transcribed within the NL phoneme inventory, problems may arise from the “phono-tactics” of FL (foreign language) words, which do not correspond to the phono-tactics of NL words.
In particular, when context dependent acoustic models (like triphones) are utilized for the recognition, different phono-tactics may result in missing triphones and thus in a less accurate modeling of words and a reduced recognition performance.
Basically, even applying a multi-language recognition engine that supports several (native) languages to recognize application words spoken by non-native speakers will not yield the best results.
This is because the non-native speaker colors the foreign words with the speaker's mother tongue, so the description of the foreign words with the FL phoneme inventory and FL acoustic models is usually not accurate enough and will not necessarily give the best recognition results.
The best solution, training these FL words with non-native speech from many speakers, is usually not feasible because large, appropriate training databases are rarely available, i.e., speech recordings of non-native pronunciations (from many NL speakers) of FL words from the specific target language(s).
Typically, the speaker is not at all familiar with the FL inventory and describing newly added foreign words with NL phonemes is a serious undertaking.
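
The missing-triphone problem mentioned above can be made concrete with a small sketch. It expands a phoneme sequence into triphones (left context, center, right context) and reports which ones never occurred in the training inventory. The `sil` boundary symbol, the function names, and the toy phoneme sequences are assumptions for illustration and are not taken from the patent.

```python
from typing import List, Set, Tuple

Triphone = Tuple[str, str, str]  # (left context, center phoneme, right context)


def triphones(phones: List[str], boundary: str = "sil") -> List[Triphone]:
    """Expand a phoneme sequence into triphones, padding with a boundary symbol."""
    padded = [boundary] + phones + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def missing_triphones(word_phones: List[str], trained: Set[Triphone]) -> List[Triphone]:
    """List the triphones of a word that have no trained context dependent model."""
    return [t for t in triphones(word_phones) if t not in trained]


# Triphone inventory seen in (toy) NL training material.
trained = set(triphones(["h", "a", "u", "s"]))

# An FL word transcribed with NL phonemes but with a different phoneme order.
missing = missing_triphones(["h", "a", "s"], trained)
```

Here every phoneme of the FL word exists in the NL inventory, yet two of its three triphones were never observed, which is the situation the text describes as leading to less accurate word models.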


Examples

Embodiment Construction

[0017] FIGS. 1 and 2, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any suitably arranged speech recognition system.

[0018] FIG. 1 depicts a high-level block diagram of a phoneme-to-phoneme mapping system in accordance with a preferred embodiment of the present invention. Database 130 (FL training material) contains recorded utterances, the corresponding transliterations, and a phonetic lexicon for a foreign language. Database (FL) 100 contains trained acoustic models of this same foreign language. These models can be used to derive a time alignment between FL phonemes and analysis frames for the utterances from database 130. Usually such a time alignment is the byproduct of a phoneme-based training algo...
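
The excerpt above breaks off before it explains how the time alignment is turned into a phoneme-to-phoneme mapping. The sketch below shows one plausible approach, stated purely as an assumption: expand the FL alignment to per-frame labels, pair each frame with an NL phoneme label for the same frame (for example from an NL phoneme recognizer), and map each FL phoneme to the NL phoneme it co-occurs with most often. The segment format, function names, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (phoneme, first frame, end frame exclusive); segments are assumed contiguous.
Segment = Tuple[str, int, int]


def to_frame_labels(segments: List[Segment]) -> List[str]:
    """Expand a time alignment into one phoneme label per analysis frame."""
    labels: List[str] = []
    for phone, start, end in segments:
        labels.extend([phone] * (end - start))
    return labels


def majority_mapping(fl_frames: List[str], nl_frames: List[str]) -> Dict[str, str]:
    """Map each FL phoneme to the NL phoneme sharing the most frames with it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for fl, nl in zip(fl_frames, nl_frames):
        counts[fl][nl] += 1
    return {fl: c.most_common(1)[0][0] for fl, c in counts.items()}


# Toy FL alignment for one utterance and hypothetical NL labels for the same frames.
fl_alignment = [("th", 0, 3), ("ae", 3, 7)]
nl_frames = ["s", "s", "f", "e", "e", "e", "a"]

fl_frames = to_frame_labels(fl_alignment)
mapping = majority_mapping(fl_frames, nl_frames)
```

A simple majority vote ignores context and ties; it is used here only because it keeps the idea of frame-level co-occurrence easy to follow.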

Abstract

Acoustic models for speech recognition are automatically generated utilizing trained acoustic models from a native language and a foreign language. A phoneme-to-phoneme mapping is utilized to enable the description of foreign language words with native language phonemes. The phoneme-to-phoneme mapping is used for training foreign language words, described by native language phonemes, on foreign language speech material. A new phonetic lexicon is created containing foreign language words and native language words transcribed by native language phonemes. Robust native language acoustic models can be derived utilizing foreign language and native language training material. The mapping may be used for training a grapheme to phoneme transducer (i.e., foreign language to native language) to generate native language pronunciations for new foreign language words.

Description

TECHNICAL FIELD OF THE INVENTION

[0001] The present invention is directed, in general, to a speech recognition method and apparatus, and more particularly, to speech recognition apparatus and method for recognizing speech uttered by a non-native speaker.

BACKGROUND OF THE INVENTION

[0002] Speech-enabled applications using speaker-independent speech recognition technology are characterized by a vocabulary utilizing a language dependent phonetic description. And typically, the vocabulary uses language specific acoustic models for such phonetic symbols. The applications therefore utilize a native language phonetic inventory or a foreign language phonetic inventory to recognize and transcribe the vocabulary to be recognized.

[0003] Current speech recognition systems support only individual languages. If words of another language need to be recognized, acoustic models associated with that language must be used. An acoustic model is a set of acoustic parameters generated during training, ...

Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/18
CPC: G10L15/187, G10L15/063, G10L2015/025
Inventor: REINHARD, KLAUS; JUNKAWITSCH, JOCHEN; KIESSLING, ANDREAS; KLISCH, RAINER
Owner OPTIS WIRELESS TECH LLC