
Voice recognition method, device and system and terminal

A speech recognition technology, applicable to speech recognition, speech analysis, instruments, and related fields, that addresses the problem of low recognition accuracy and achieves higher recognition accuracy.

Active Publication Date: 2017-09-22
ALIBABA GRP HLDG LTD
10 Cites, 71 Cited by

AI Technical Summary

Problems solved by technology

[0006] The embodiments of the present application provide a speech recognition method, device, terminal and system, which address the low recognition accuracy of prior-art speech recognition methods when they are applied to speech in which multiple languages are mixed.



Examples


Embodiment 1

[0030] Embodiment 1 of the present application describes the whole process of creating a WFST (Weighted Finite-State Transducer) according to the embodiment of the present application.
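As a rough illustration of what a weighted finite-state transducer is, the following minimal sketch maps an input symbol sequence to an output sequence along the cheapest accepting path. The class `ToyWFST`, its methods, the toy arcs, and the costs are all hypothetical and are not taken from the application; weights are treated as additive costs (the tropical semiring), and input-epsilon arcs are not supported.

```python
import math


class ToyWFST:
    """A tiny weighted finite-state transducer (hypothetical sketch).

    Weights are costs: a path's cost is the sum of its arc costs,
    and the best path is the one with the minimum total cost.
    """

    def __init__(self, start, finals):
        self.start = start
        self.finals = dict(finals)  # state -> final cost
        self.arcs = {}              # state -> [(in_sym, out_sym, cost, next_state)]

    def add_arc(self, src, in_sym, out_sym, cost, dst):
        # An empty out_sym ("") means the arc emits nothing.
        self.arcs.setdefault(src, []).append((in_sym, out_sym, cost, dst))

    def best_path(self, inputs):
        """Return (cost, outputs) for the cheapest accepting path,
        or (inf, None) if no path accepts `inputs`."""
        frontier = {self.start: (0.0, [])}  # state -> (cost so far, outputs so far)
        for sym in inputs:
            nxt = {}
            for state, (cost, outs) in frontier.items():
                for in_sym, out_sym, w, dst in self.arcs.get(state, []):
                    if in_sym != sym:
                        continue
                    cand = (cost + w, outs + ([out_sym] if out_sym else []))
                    if dst not in nxt or cand[0] < nxt[dst][0]:
                        nxt[dst] = cand
            frontier = nxt
        best = (math.inf, None)
        for state, (cost, outs) in frontier.items():
            if state in self.finals and cost + self.finals[state] < best[0]:
                best = (cost + self.finals[state], outs)
        return best


# Toy lexicon-like transducer: phoneme symbols in, word symbols out.
t = ToyWFST(start=0, finals={2: 0.0})
t.add_arc(0, "m", "", 0.0, 1)
t.add_arc(1, "ai", "mai", 0.5, 2)   # cheaper reading
t.add_arc(0, "m", "", 0.0, 3)
t.add_arc(3, "ai", "my", 1.5, 2)    # more expensive competing reading

cost, words = t.best_path(["m", "ai"])
```

Here `best_path` keeps only the cheapest hypothesis per state at each step, which is enough to find the single best path in this toy setting.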

[0031] As shown in Figure 1, creating a WFST according to the embodiment of this application includes the following steps:

[0032] S101, creating an acoustic model.

[0033] The acoustic model is one of the important components of the speech recognition model, which can be used to describe the correspondence between speech features and phoneme states, and is generally modeled and represented by a statistical model. The language model is one of the important components of the speech recognition model, which can be used to describe the probabilistic connection relationship between words.
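The "probabilistic connection relationship between words" that a language model describes can be illustrated with a bigram model. The following is a minimal sketch only: the function `train_bigram`, the toy corpus, and the choice of a maximum-likelihood bigram estimate are assumptions for illustration, not the model used in the application.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(next | prev) by maximum likelihood from tokenized sentences."""
    pair_counts = Counter()
    prev_counts = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}


# Hypothetical mixed-language toy corpus (romanized tokens, illustrative only).
corpus = [
    ["wo", "yao", "mai", "iphone"],
    ["wo", "yao", "mai", "shouji"],
]
lm = train_bigram(corpus)
```

With this toy corpus, `lm[("mai", "iphone")]` and `lm[("mai", "shouji")]` share the probability mass that follows `"mai"`, while `lm[("wo", "yao")]` is the only continuation observed after `"wo"`.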

[0034] During specific implementation, the acoustic model can be created in the following manner: determine each phoneme of the first language and the second l...

Embodiment 2

[0093] Figure 5 shows the flow of the speech recognition method according to Embodiment 2 of the present application. As shown in Figure 5, the method includes the following steps:

[0094] S501. Receive speech to be recognized.

[0095] In a specific implementation, step S501 may be preceded by a step of prompting the user to input speech. Specifically, a voice input sign can be displayed to prompt the user. The voice input sign can be an icon, such as a microphone icon or a sound wave icon, or a text prompt, such as "Please input voice" or "Please say the name of your favorite item aloud"; this application does not limit its form.

[0096] Specifically, the voice input sign can be displayed at a specific position relative to the input box, such as in front of, behind, in the middle of, or below the input box, or at a specific position on the input screen, such as in the middle of the screen, etc....

Embodiment 3

[0124] Figure 7 shows a schematic structural diagram of a speech recognition device according to Embodiment 3 of the present application. As shown in Figure 7, the speech recognition device 700 of Embodiment 3 includes: a receiving module 701 for receiving speech to be recognized; a feature extraction module 702 for performing feature extraction on the speech to be recognized to obtain feature information; and a module 703 for inputting the feature information into the weighted finite-state transducer (WFST) for recognition. The WFST is obtained by combining the pre-created acoustic model, pronunciation dictionary, and language model. In the acoustic model, each first-language phoneme has a corresponding relationship with a second-language phoneme, and in the pronunciation dictionary, each first-language word is phonetically annotated with second-language phonemes.
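The division into modules 701, 702, and 703 can be sketched structurally as follows. This is an illustration under stated assumptions, not the device of the application: the class name, the frame length and hop size, the per-frame log-energy feature, and the `decoder` callable standing in for the WFST are all hypothetical.

```python
import math


class SpeechRecognitionDevice:
    """Structural sketch of the three modules described above (hypothetical)."""

    def __init__(self, decoder, frame_len=400, hop=160):
        self.decoder = decoder      # hypothetical callable standing in for the WFST
        self.frame_len = frame_len  # samples per frame (assumed value)
        self.hop = hop              # samples between frame starts (assumed value)

    def receive(self, samples):
        # Receiving module 701: accept the speech to be recognized.
        return list(samples)

    def extract_features(self, samples):
        # Feature extraction module 702: one log-energy value per frame,
        # a simple stand-in for richer acoustic features.
        feats = []
        for start in range(0, len(samples) - self.frame_len + 1, self.hop):
            frame = samples[start:start + self.frame_len]
            energy = sum(x * x for x in frame) / self.frame_len
            feats.append(math.log(energy + 1e-10))
        return feats

    def recognize(self, samples):
        # Module 703: pass the feature information to the decoder.
        return self.decoder(self.extract_features(self.receive(samples)))


# Usage with a dummy decoder that only reports how many frames it received.
device = SpeechRecognitionDevice(decoder=lambda feats: f"{len(feats)} frames")
result = device.recognize([0.0] * 1600)
```

Keeping the decoder as an injected callable mirrors the description: the recognition module only forwards feature information, and the WFST itself is built beforehand.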

[0125] In a specific implementation, the speech reco...


Abstract

The embodiments of the invention provide a voice recognition method, device, system and terminal. The method comprises: receiving voice to be recognized; performing feature extraction on the voice to be recognized to obtain feature information; and inputting the feature information into a weighted finite-state transducer (WFST) for recognition. The WFST is obtained by combining a pre-created acoustic model, pronunciation dictionary and language model; each first-language phoneme in the acoustic model has a corresponding relationship with a second-language phoneme, and each first-language word in the pronunciation dictionary is phonetically annotated with second-language phonemes. Applying this scheme can improve voice recognition accuracy.
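The idea of annotating first-language words with second-language phonemes can be sketched as a dictionary rewrite. Everything concrete below is an assumption for illustration: the phoneme symbols, the correspondence table `PHONEME_MAP`, the sample word, and the function name are hypothetical, and the source does not specify how the correspondence is determined.

```python
# Hypothetical correspondence: first-language phoneme -> second-language phoneme(s).
PHONEME_MAP = {
    "HH": ["h"],
    "AH": ["a"],
    "L": ["l"],
    "OW": ["ou"],
}

# Hypothetical first-language pronunciation dictionary.
FIRST_LANG_LEXICON = {"hello": ["HH", "AH", "L", "OW"]}


def annotate_with_second_language(lexicon, phoneme_map):
    """Rewrite each first-language word's pronunciation in second-language phonemes."""
    annotated = {}
    for word, phones in lexicon.items():
        missing = [p for p in phones if p not in phoneme_map]
        if missing:
            raise KeyError(f"no correspondence for {missing} in {word!r}")
        # Replace every first-language phoneme by its mapped counterpart(s).
        annotated[word] = [q for p in phones for q in phoneme_map[p]]
    return annotated


mixed_lexicon = annotate_with_second_language(FIRST_LANG_LEXICON, PHONEME_MAP)
```

After the rewrite, every entry in `mixed_lexicon` uses only second-language phoneme symbols, so words of both languages can share one phoneme inventory in this sketch.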

Description

Technical field

[0001] The present application relates to speech recognition technology, in particular to a speech recognition method, device, terminal and system.

Background

[0002] Speech recognition refers to a technology that recognizes the corresponding text content from speech waveforms, and is one of the important technologies in the field of artificial intelligence.

[0003] Current speech recognition methods generally include three parts: an acoustic model, a pronunciation dictionary, and a language model. The acoustic model is trained through a deep neural network, the language model is generally a statistical language model, and the pronunciation dictionary records the correspondence between words and phonemes, which is the link between the acoustic model and the language model.

[0004] For the mixed speech of multiple languages, the speech recognition method in the prior art directly inputs the phonemes of multiple languages into the deep neural ne...


Application Information

IPC(8): G10L15/02, G10L15/08, G10L15/10, G10L15/14, G10L15/183
CPC: G10L15/02, G10L15/08, G10L15/10, G10L15/14, G10L15/183
Inventor 李宏言, 李晓辉
Owner ALIBABA GRP HLDG LTD