Unlock instant, AI-driven research and patent intelligence for your innovation.

Acoustic model conditioning on sound features

An acoustic model and sound-feature technology, applied in speech analysis, speech recognition, instruments, etc. It addresses accuracy problems that speech recognition suffers under noise, background sound, or music, with the effect of improving market competitiveness and overall profitability.

Pending Publication Date: 2021-11-12
SOUNDHOUND INC
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0006] For speech in unusual accents, unusual speech types, in unusual environmental conditions (such as noise, background sounds, or music), using unusual devices, and in other unusual scenarios, traditional speech recognition systems suffer from accuracy issues, making them suitable only for narrow uses such as playing music in a quiet home.




Embodiment Construction

[0036] Various design choices for relevant aspects of the conditioned acoustic model are described below. Design choices for the various aspects are independent of each other and work together in any combination, except where noted.

[0037] Acoustic model

[0038] An ASR's acoustic model takes input including a speech audio segment and produces as output inferred probabilities for one or more phonemes. Some models infer senone probabilities, which are a type of phoneme probability. In some applications, the output of an acoustic model is a softmax distribution of probabilities over the recognizable phonemes or senones.
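The acoustic model's output described above can be sketched minimally as follows. This is a hedged illustration, not the patent's implementation: the single linear layer, the random weights, and the sizes (40 input features, 48 phoneme classes) are all assumptions standing in for a trained neural network.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes: 40 spectral features per frame, 48 phoneme classes.
N_FEATURES, N_PHONEMES = 40, 48
rng = np.random.default_rng(0)
W = rng.normal(size=(N_PHONEMES, N_FEATURES))  # stand-in for trained weights
b = np.zeros(N_PHONEMES)

def acoustic_model(frame_features):
    # One linear layer + softmax as a stand-in for a real neural acoustic model;
    # the output is a probability distribution over phonemes for this frame.
    return softmax(W @ frame_features + b)

probs = acoustic_model(rng.normal(size=N_FEATURES))
```

Whatever the internal architecture, the key property is the one the paragraph states: for each input segment, the model emits one probability per phoneme (or senone), summing to 1.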

[0039] Some ASR applications run acoustic models on spectral components computed from frames of audio. The spectral components are, for example, mel-frequency cepstral coefficients (MFCCs) calculated over a window of 25 milliseconds of audio samples. Acoustic model inference may then repeat at intervals of 10 milliseconds.
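The 25 ms window / 10 ms hop framing described above can be sketched as below. The 16 kHz sample rate is an assumption (it is common in ASR but not stated in the source); a real front end would additionally apply a window function and compute MFCCs per frame.

```python
import numpy as np

SAMPLE_RATE = 16_000              # assumed sampling rate (not stated in the source)
WIN = int(0.025 * SAMPLE_RATE)    # 25 ms window -> 400 samples
HOP = int(0.010 * SAMPLE_RATE)    # 10 ms hop   -> 160 samples

def frame_signal(audio):
    # Slice audio into overlapping frames; each frame would then be windowed
    # and converted to spectral features (e.g. MFCCs) by a feature extractor.
    n = 1 + max(0, (len(audio) - WIN) // HOP)
    return np.stack([audio[i * HOP : i * HOP + WIN] for i in range(n)])

audio = np.zeros(SAMPLE_RATE)     # 1 second of (silent) audio
frames = frame_signal(audio)      # 98 full 25 ms frames at a 10 ms hop
```

With a 10 ms hop, consecutive frames overlap by 15 ms, which is why inference repeats ~100 times per second even though each window covers 25 ms.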

[0040] The spec...



Abstract

The invention relates to acoustic model conditioning on sound features. Systems and methods of speech recognition capture segments of speech audio having a key phrase shortly followed by an utterance. An encoder uses the key phrase segment to compute a sound embedding, which is stored. An acoustic model for speech recognition infers phonemes from the utterance audio signal using a model that is conditioned on the sound embedding as an input. The sound embedding may be held until another key phrase is captured or a session ends. The acoustic model and encoder may be jointly trained from speech data recordings that may be mixed with noise, with the same noise profile mixed into both the key phrase segment and the utterance segment.
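The conditioning mechanism the abstract describes can be sketched as follows. This is a minimal illustration under assumptions: mean-pooling for the encoder, concatenation as the conditioning mechanism, random stand-in weights, and all dimensions are hypothetical — the patent does not commit to these specifics here.

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT_DIM, EMB_DIM, N_PHONEMES = 40, 16, 48

# Stand-in weights for a (jointly trained) encoder and conditioned acoustic model.
W_enc = rng.normal(size=(EMB_DIM, FEAT_DIM))
W_am = rng.normal(size=(N_PHONEMES, FEAT_DIM + EMB_DIM))

def encoder(key_phrase_frames):
    # Mean-pool the key-phrase frames, then project to a fixed-size sound
    # embedding. A real encoder would be a trained neural network.
    return W_enc @ key_phrase_frames.mean(axis=0)

def conditioned_acoustic_model(frame, sound_embedding):
    # Condition the model by concatenating the sound embedding onto each
    # utterance frame's features before inferring phoneme probabilities.
    x = np.concatenate([frame, sound_embedding])
    logits = W_am @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

key_phrase = rng.normal(size=(30, FEAT_DIM))   # frames from the key-phrase segment
utterance_frame = rng.normal(size=FEAT_DIM)    # one frame of the following utterance

emb = encoder(key_phrase)                      # computed once, held for the session
probs = conditioned_acoustic_model(utterance_frame, emb)
```

The design intent in the abstract is that the embedding captures speaker/channel/noise characteristics from the key phrase, so holding it fixed for the rest of the session lets every utterance frame be decoded with that context.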

Description

[0001] This application claims priority to U.S. Provisional Application No. 62/704,202, filed April 27, 2020, entitled "Acoustic Model Conditioning on Sound Features."

Technical field

[0002] This application is in the field of neural networks, in particular conditioning based on sound embeddings.

Background

[0003] We are at an inflection point in history where natural language voice interfaces are about to take off as a new type of human-machine interface. Their ability to transcribe speech will soon replace keyboards as the fastest and most accurate way to enter text. Their ability to support natural language commands will soon replace mice and touchscreens as the way to manipulate non-text controls. In all directions, they will provide humans with clean, sterile ways to control machines for work, play, education, relaxation, and assistance with menial tasks.

[0004] However, the ability of natural language voice interfaces to provide all of these benef...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/06; G10L15/16; G10L15/183; G10L15/26; G10L19/16
CPC: G10L15/063; G10L15/183; G10L15/16; G10L19/16; G10L15/065; G10L2015/088; G10L2015/025; G10L25/30; G10L15/02; G10L15/04; G10L15/22; G10L15/08
Inventors: 高孜哲, 莫轲文
Owner SOUNDHOUND INC