A speech recognition method, device, terminal and system

A speech recognition technology, applied in the fields of speech recognition, speech analysis, instruments, etc., that addresses the problem of low recognition accuracy.

Active Publication Date: 2021-05-04
ALIBABA GRP HLDG LTD
10 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0006] The embodiments of the present application provide a speech recognition method, device, terminal and system to solve the problem that prior-art speech recognition methods have low recognition accuracy when applied to speech recognition scenarios in which multiple languages are mixed.

Examples

Embodiment 1

[0030] The first embodiment of the present application describes the whole process of creating a WFST (weighted finite-state transducer).

[0031] As shown in Figure 1, creating a WFST according to the embodiment of this application includes the following steps:

[0032] S101, creating an acoustic model.

[0033] The acoustic model is one of the important components of the speech recognition model, which can be used to describe the correspondence between speech features and phoneme states, and is generally modeled and represented by a statistical model. The language model is one of the important components of the speech recognition model, which can be used to describe the probabilistic connection relationship between words.
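
The "probabilistic connection relationship between words" can be illustrated with a bigram model. The following is a minimal sketch, not the model described in the patent: the corpus, the romanized tokens, and the function name `train_bigram_model` are all hypothetical, and the estimate is an unsmoothed maximum-likelihood count ratio.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(next_word | word) by maximum likelihood from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Pad with sentence-boundary markers so the first and last words are modeled too.
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1

    def prob(prev, nxt):
        # Unsmoothed estimate: an unseen history yields probability 0.0.
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, nxt)] / history_counts[prev]

    return prob


# Hypothetical toy corpus mixing romanized Mandarin tokens with English product names.
corpus = [
    ["wo", "xiang", "mai", "iphone"],
    ["wo", "xiang", "mai", "ipad"],
    ["wo", "xiang", "ting", "ge"],
]
bigram_prob = train_bigram_model(corpus)
print(bigram_prob("xiang", "mai"))
print(bigram_prob("mai", "iphone"))
```

A production language model would add smoothing or back-off so that unseen word pairs do not receive zero probability.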

[0034] During specific implementation, the acoustic model can be created in the following manner: determine each phoneme of the first language and ...

Embodiment 2

[0093] Figure 5 shows the flow of the speech recognition method according to Embodiment 2 of the present application. As shown in Figure 5, the method includes the following steps:

[0094] S501. Receive speech to be recognized.

[0095] In a specific implementation, a step of prompting the user to input voice may also be included before step S501. Specifically, a voice input sign can be displayed to prompt the user to input voice. The voice input sign can be, for example, a microphone icon or a sound wave icon, or a text prompt such as "Please input voice" or "Please speak loudly your favorite baby"; this application does not limit this.

[0096] Specifically, the voice input sign can be displayed at a specific position of the input box, such as the front, back, middle, or bottom of the input box, or at a specific position of the input screen, such as in the middle of the screen, e...

Embodiment 3

[0124] Figure 7 shows a schematic structural diagram of a speech recognition device according to Embodiment 3 of the present application. As shown in Figure 7, the speech recognition device 700 includes: a receiving module 701 for receiving speech to be recognized; a feature extraction module 702 for performing feature extraction on the speech to be recognized to obtain feature information; and a module 703 for inputting the feature information into the weighted finite-state transducer (WFST) for recognition, wherein the WFST is obtained by combining the pre-created acoustic model, pronunciation dictionary, and language model, each first-language phoneme in the acoustic model has a corresponding relationship with a second-language phoneme, and each first-language word in the pronunciation dictionary is phonetically annotated with second-language phonemes.
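
The idea of annotating first-language words with second-language phonemes can be sketched as follows. This is an illustration under stated assumptions only: the text does not say which languages are involved, so the English-to-Mandarin direction, the phoneme symbols, the table `PHONEME_MAP`, and the function `annotate_with_second_language` are all hypothetical.

```python
# Hypothetical correspondence table: first-language phoneme (English, ARPAbet-style)
# -> second-language phoneme sequence (Mandarin, pinyin-style). Illustrative only.
PHONEME_MAP = {
    "AY": ["ai"],
    "P": ["p"],
    "AE": ["ai"],
    "D": ["d"],
    "F": ["f"],
    "OW": ["ou"],
    "N": ["n"],
}

# Hypothetical first-language pronunciation dictionary.
FIRST_LANGUAGE_LEXICON = {
    "ipad": ["AY", "P", "AE", "D"],
    "iphone": ["AY", "F", "OW", "N"],
}


def annotate_with_second_language(lexicon, phoneme_map):
    """Rewrite each word's pronunciation using second-language phonemes."""
    annotated = {}
    for word, phonemes in lexicon.items():
        converted = []
        for phoneme in phonemes:
            if phoneme not in phoneme_map:
                # Fail loudly rather than silently dropping an unmapped phoneme.
                raise KeyError(f"no second-language correspondence for {phoneme!r}")
            converted.extend(phoneme_map[phoneme])
        annotated[word] = converted
    return annotated


mixed_lexicon = annotate_with_second_language(FIRST_LANGUAGE_LEXICON, PHONEME_MAP)
for word, phonemes in mixed_lexicon.items():
    print(word, " ".join(phonemes))
```

With such a dictionary, words of both languages are described by a single phoneme inventory, so the acoustic model only needs to distinguish second-language phonemes.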

[0125] In a specific implementation, the speech r...

Abstract

An embodiment of the present application provides a speech recognition method, device, terminal and system. The method comprises: receiving the speech to be recognized; performing feature extraction on the speech to be recognized to obtain feature information; and inputting the feature information into a weighted finite-state transducer (WFST) for recognition, wherein the WFST is obtained by combining a pre-created acoustic model, a pronunciation dictionary, and a language model, each first-language phoneme in the acoustic model has a corresponding relationship with a second-language phoneme, and each first-language word in the pronunciation dictionary is phonetically annotated with second-language phonemes. By adopting the solution in this application, the accuracy of speech recognition can be improved.
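
How a pronunciation dictionary and a language model can be combined into one transducer may be easier to see in a toy example. The sketch below is not the construction used in the patent: it is plain Python over the tropical semiring, it omits the acoustic model entirely, it assumes the second transducer has no epsilon inputs, and every name (`Wfst`, `compose`, `decode`, `build_lexicon`) and every pronunciation is hypothetical.

```python
import math
from collections import defaultdict

EPS = "<eps>"


class Wfst:
    """Minimal WFST; weights are costs (negative log probabilities), lower is better."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = dict(finals)     # state -> final cost
        self.arcs = defaultdict(list)  # state -> [(in_label, out_label, cost, next_state)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))


def build_lexicon(pronunciations):
    """Phonemes -> words; the word is emitted on the first phoneme of each entry."""
    lex = Wfst(0, {0: 0.0})
    next_state = 1
    for word, phones in pronunciations.items():
        src = 0
        for i, phone in enumerate(phones):
            last = i == len(phones) - 1
            dst = 0 if last else next_state  # return to state 0 after the last phoneme
            if not last:
                next_state += 1
            lex.add_arc(src, phone, word if i == 0 else EPS, 0.0, dst)
            src = dst
    return lex


def compose(a, b):
    """Compose a with b, assuming b has no epsilon input labels."""
    start = (a.start, b.start)
    result = Wfst(start, {})
    stack, seen = [start], {start}
    while stack:
        sa, sb = state = stack.pop()
        if sa in a.finals and sb in b.finals:
            result.finals[state] = a.finals[sa] + b.finals[sb]
        for ilabel, olabel, cost, na in a.arcs.get(sa, []):
            if olabel == EPS:
                successors = [(EPS, cost, (na, sb))]  # only a advances
            else:
                successors = [(b_out, cost + b_cost, (na, nb))
                              for b_in, b_out, b_cost, nb in b.arcs.get(sb, [])
                              if b_in == olabel]
            for out, total, nxt in successors:
                result.add_arc(state, ilabel, out, total, nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return result


def decode(wfst, inputs):
    """Viterbi search: cheapest (cost, words) accepting `inputs`, or None."""
    best = {wfst.start: (0.0, ())}
    for symbol in inputs:
        next_best = {}
        for state, (cost, outs) in best.items():
            for ilabel, olabel, arc_cost, dst in wfst.arcs.get(state, []):
                if ilabel != symbol:
                    continue
                cand = (cost + arc_cost, outs if olabel == EPS else outs + (olabel,))
                if dst not in next_best or cand[0] < next_best[dst][0]:
                    next_best[dst] = cand
        best = next_best
    finished = [(c + wfst.finals[s], o) for s, (c, o) in best.items() if s in wfst.finals]
    return min(finished) if finished else None


# Hypothetical dictionary annotated with a single (second-language) phoneme set.
lexicon = build_lexicon({"ipad": ["ai", "p", "ai", "d"],
                         "iphone": ["ai", "f", "ou", "n"],
                         "mai": ["m", "ai"]})
# Hypothetical grammar: "mai" followed by either product name with probability 0.5.
grammar = Wfst(0, {2: 0.0})
grammar.add_arc(0, "mai", "mai", 0.0, 1)
grammar.add_arc(1, "ipad", "ipad", -math.log(0.5), 2)
grammar.add_arc(1, "iphone", "iphone", -math.log(0.5), 2)

decoder = compose(lexicon, grammar)
result = decode(decoder, ["m", "ai", "ai", "p", "ai", "d"])
print(result)
```

Real systems typically rely on a dedicated WFST toolkit and also compose acoustic-model transducers, then determinize and minimize the result; those steps are left out here.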

Description

Technical field

[0001] The present application relates to speech recognition technology, in particular to a speech recognition method, device, terminal and system.

Background art

[0002] Speech recognition refers to a technology that recognizes the corresponding text content from speech waveforms, and is one of the important technologies in the field of artificial intelligence.

[0003] Current speech recognition methods generally include three parts: an acoustic model, a pronunciation dictionary, and a language model. The acoustic model is trained through a deep neural network, the language model is generally a statistical language model, and the pronunciation dictionary records the correspondence between words and phonemes, which is the link between the acoustic model and the language model.

[0004] For the mixed speech of multiple languages, the speech recognition method in the prior art directly inputs the phonemes of multiple languages into the deep neural ne...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/02, G10L15/08, G10L15/10, G10L15/14, G10L15/183
CPC: G10L15/02, G10L15/08, G10L15/10, G10L15/14, G10L15/183
Inventor: 李宏言, 李晓辉
Owner: ALIBABA GRP HLDG LTD