
Method for outputting phoneme probability by speech deep neural network model

A deep neural network model output technology, applied in the field of computing, that addresses the low reliability of the phoneme probabilities output by a DNN

Active Publication Date: 2021-04-16
HANGZHOU NATCHIP SCI & TECH
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for outputting phoneme probabilities from a speech deep neural network model. It addresses the shortcoming that, in real usage scenarios with a low signal-to-noise ratio, the phoneme probabilities output by existing DNNs are not reliable.


Examples


Embodiment Construction

[0011] The present invention will be further described below in conjunction with the examples. The following examples are only specific examples of the present invention, but the design concept of the present invention is not limited thereto, and any insubstantial changes made to the present invention by using this concept should fall within the scope of protection of the present invention.

[0012] A deep neural network (DNN) outputs the probability of each original phoneme once per time interval. A phoneme is the smallest unit of pronunciation, and the pronunciation of each character is composed of multiple phonemes.

[0013] The method for outputting the phoneme probability of the speech deep neural network model first adds confidence information Z_i to each original phoneme. The confidence information is a number from 0 to 1. K is a phoneme category parameter: if the original phoneme is a vowel, then K=1; if the original phoneme is a consonant, then K=0. When the phon...



Abstract

The invention discloses a method for outputting phoneme probabilities from a speech deep neural network model. In existing real usage scenarios with a low signal-to-noise ratio, the phoneme probabilities output by a speech deep neural network model are not reliable. The method comprises the following steps: first, confidence information is added to each original phoneme according to its phoneme category; the confidence information is then compared with a set threshold. When the confidence information is greater than or equal to the threshold, the probability of the original phoneme is kept unchanged; when the confidence information is less than the threshold, a correction value is added to the uncorrected probability of the original phoneme, and the result is used as the output phoneme probability. Finally, the deep neural network outputs the phoneme probabilities as the basis for decoding. With this method, phoneme confidence correction raises the speech probability of consonants, which improves character recognition, while keeping the speech probability of vowels unchanged, which reduces false activations.
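The thresholding step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `correct_phoneme_probs`, the threshold of 0.5, the correction value of 0.1, the example confidence values, and the clipping of corrected probabilities at 1.0 are all assumptions made for illustration, since the page does not give concrete values or say how confidence is derived from the phoneme category.

```python
def correct_phoneme_probs(probs, confidence, threshold=0.5, correction=0.1):
    """Apply a confidence-based correction to one frame of phoneme probabilities.

    probs      -- dict mapping phoneme -> probability output by the DNN
    confidence -- dict mapping phoneme -> confidence value in [0, 1]
                  (assumed to be given; how it is computed is not shown here)
    threshold  -- hypothetical confidence threshold
    correction -- hypothetical correction value added to low-confidence phonemes
    """
    corrected = {}
    for phoneme, p in probs.items():
        if confidence[phoneme] >= threshold:
            # Confidence at or above the threshold: keep the probability unchanged.
            corrected[phoneme] = p
        else:
            # Confidence below the threshold: add the correction value.
            # Clipping at 1.0 is an assumption, not stated in the source.
            corrected[phoneme] = min(1.0, p + correction)
    return corrected


# Hypothetical frame: one vowel with high confidence, two consonants with low confidence.
frame = {"a": 0.6, "s": 0.1, "t": 0.05}
conf = {"a": 0.9, "s": 0.3, "t": 0.2}
print(correct_phoneme_probs(frame, conf))
```

In this sketch the vowel keeps its probability while the two low-confidence consonants are raised, which mirrors the effect the abstract describes.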

Description

Technical field

[0001] The invention belongs to the technical field of computing, in particular to speech deep neural network processing, and relates to a method for outputting phoneme probabilities from a speech deep neural network model.

[0002] Technical background

[0003] Deep neural networks (DNNs) are widely used in speech processing. The input of a speech DNN is the speech features, and its output is the phoneme probabilities. The DNN outputs the probabilities of all phonemes once per time interval, and a decoding algorithm decodes according to these output probabilities. Decoding converts the phoneme probabilities into words. Commonly used decoding algorithms include beam search and CTC (connectionist temporal classification). Beam search is a heuristic graph search algorithm. In order to reduce the space and time used for searching, some nodes with lower proba...
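To make the role of the per-frame phoneme probabilities concrete, the following is a simplified sketch of beam search over a sequence of frames. It is a generic textbook-style illustration, not the decoder used in the patent: the function name `beam_search`, the beam width, and the example frames are assumptions, and the sketch omits the lexicon, language model, and CTC-style merging that a real decoder would use.

```python
import math


def beam_search(frames, beam_width=2):
    """Keep only the `beam_width` best phoneme sequences after each frame.

    frames -- list of dicts, each mapping phoneme -> probability for one time interval
    Returns a list of (sequence, log_probability) pairs, best first.
    """
    beams = [((), 0.0)]  # start with one empty hypothesis with log-probability 0
    for frame in frames:
        candidates = []
        for seq, score in beams:
            for phoneme, p in frame.items():
                if p > 0.0:  # skip zero probabilities to avoid log(0)
                    candidates.append((seq + (phoneme,), score + math.log(p)))
        # Prune: hypotheses with lower scores are discarded to save time and space.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Hypothetical two-frame example with two phonemes.
frames = [{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}]
for seq, score in beam_search(frames):
    print(seq, round(math.exp(score), 2))
```

Because hypotheses are ranked purely by accumulated phoneme probability, an unreliable probability in a noisy frame can cause the correct hypothesis to be pruned, which is why the reliability of the DNN output matters to the decoder.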


Application Information

Patent Type & Authority Applications (China)
IPC IPC(8): G06F40/216, G06F40/279, G06F3/06, G06N3/04, G06N3/08
CPC Y02T10/40
Inventor 梁骏, 汪文轩, 王坤鹏, 陈谢, 姚欢, 卢燕
Owner HANGZHOU NATCHIP SCI & TECH