
A Chinese-English mixed speech recognition method and device

A mixed-speech recognition technology, applied in speech recognition, speech analysis, and instruments, that addresses poor speech recognition performance and large numbers of network parameters. Its effects are fewer output classes, fewer parameters, and better recognition performance.

Active Publication Date: 2022-02-15
SHENZHEN UNIV
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to propose a Chinese-English mixed speech recognition method and device that overcomes two problems of prior-art methods for building mixed-language acoustic models: their models have a large number of network parameters, and their speech recognition performance is poor.

Examples

Embodiment 1

[0047] As shown in Figure 1, an embodiment of the present invention provides a Chinese-English mixed speech recognition method, including but not limited to the following steps:

[0048] S101: Acquire speech training samples.

[0049] In step S101, the speech training samples are sampled from Chinese and English corpora, which include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus.

[0050] In this embodiment, the Chinese and English corpora can serve as a data set. The speech training samples are drawn from it proportionally as a training set or a validation set, which are used to estimate the model, determine the network structure, and determine the model parameters.

[0051] In practical applications, a test set can also be extracted from the data set to assess the robustness, in general application sce...
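Paragraphs [0050] and [0051] describe drawing training, validation, and test sets proportionally from the pooled corpora. The following is a minimal sketch that makes that split concrete. It rests on three assumptions not found in the patent text: each utterance carries a corpus tag (the hypothetical labels "zh", "en", "mixed"), the ratios are 80/10/10, and each corpus is split separately. That last choice keeps the Chinese, English, and mixed material in the same proportions in every subset.

```python
import random
from collections import defaultdict


def split_corpus(utterances, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (utt_id, corpus_tag) pairs into train/validation/test lists.

    Hypothetical helper: each corpus tag is split on its own with the same
    ratios, so every subset keeps roughly the same language mix.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_tag = defaultdict(list)
    for utt_id, tag in utterances:
        by_tag[tag].append(utt_id)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, val, test = [], [], []
    for tag in sorted(by_tag):
        ids = sorted(by_tag[tag])
        rng.shuffle(ids)
        n_train = int(len(ids) * ratios[0])
        n_val = int(len(ids) * ratios[1])
        train += [(i, tag) for i in ids[:n_train]]
        val += [(i, tag) for i in ids[n_train:n_train + n_val]]
        test += [(i, tag) for i in ids[n_train + n_val:]]  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    utts = ([(f"zh{i}", "zh") for i in range(10)]
            + [(f"en{i}", "en") for i in range(10)]
            + [(f"mix{i}", "mixed") for i in range(10)])
    train, val, test = split_corpus(utts)
    print(len(train), len(val), len(test))  # 24 3 3
```

Stratifying by corpus matters most when the mixed corpus is small. A purely random split could leave the validation or test set with almost no code-switched utterances.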

Embodiment 2

[0081] An embodiment of the present invention provides a Chinese-English mixed speech recognition device 20, including:

[0082] A speech sample acquisition module 21, configured to obtain speech training samples sampled from Chinese and English corpora;

[0083] The Chinese and English corpora include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus;

[0084] A model training module 22, configured to train the LSTM-CTC end-to-end network on the speech training samples and to modify the softmax layer of the LSTM-CTC end-to-end network so that the characters output by the softmax layer are Unicode-encoded;

[0085] A speech recognition network model acquisition module 23, configured to obtain the speech recognition network model according to the characters output by the softmax layer;

[0086] A speech recognition module 24, configured to input the speech to be recognized into the speech recognition network model and to process the output of the speech recognition networ...
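The visible text does not say how the softmax outputs are Unicode-encoded. One hypothetical reading is consistent with the stated effect of reducing the number of output classes: a byte-level output layer. Each transcript character is written as its UTF-8 bytes, so Chinese characters and English letters share at most 256 classes plus the CTC blank, instead of one class per Chinese character. The sketch below shows that assumed encoding, together with standard best-path (greedy) CTC decoding of the frame-wise softmax outputs back to text. The blank-at-index-0 convention and all function names are assumptions for illustration, not the patent's design.

```python
from typing import List, Sequence

BLANK = 0              # assumption: CTC blank is class 0
NUM_CLASSES = 256 + 1  # assumption: 256 UTF-8 byte values + blank


def text_to_labels(text: str) -> List[int]:
    """Encode a transcript as UTF-8 bytes, shifted by 1 to reserve the blank."""
    return [b + 1 for b in text.encode("utf-8")]


def labels_to_text(labels: Sequence[int]) -> str:
    """Invert text_to_labels; invalid byte runs become U+FFFD instead of raising."""
    return bytes(k - 1 for k in labels if k != BLANK).decode("utf-8", errors="replace")


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for k in best_path:
        if k != prev and k != BLANK:
            decoded.append(k)
        prev = k
    return decoded


if __name__ == "__main__":
    labels = text_to_labels("打开 WiFi")
    print(len(labels))  # 11: 3 bytes per Chinese character, 1 per ASCII character

    def one_hot(k: int) -> List[float]:
        row = [0.0] * NUM_CLASSES
        row[k] = 1.0
        return row

    # Fake network output: each label held for two frames, then a blank frame.
    frames = []
    for k in labels:
        frames += [one_hot(k), one_hot(k), one_hot(BLANK)]
    print(labels_to_text(ctc_greedy_decode(frames)))  # 打开 WiFi
```

This hypothetical encoding has a cost. Each Chinese character becomes three output labels, so label sequences grow longer, and CTC needs at least as many frames as labels (plus one blank frame between equal neighbours).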

Abstract

The present invention is applicable to the technical field of speech recognition and provides a Chinese-English mixed speech recognition method and device. The method includes the following steps:

- Obtain speech training samples sampled from Chinese and English corpora, which include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus.
- Train an LSTM-CTC end-to-end network on the speech training samples, and modify the network's softmax layer so that the characters it outputs are Unicode-encoded.
- Obtain a speech recognition network model according to the characters output by the softmax layer.
- Input the speech to be recognized into the speech recognition network model, and process the model's output with an RNN-LM language model to obtain the recognition result. The RNN-LM language model is trained on the text of the speech training samples.

The invention can effectively improve the decoding efficiency of CTC when a speech recognition network model is built on the LSTM-CTC end-to-end network, and it improves recognition performance.
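The abstract says an RNN-LM trained on the training-sample text processes the acoustic model's output, but it does not give the combination rule. A common scheme, assumed here rather than taken from the patent, rescores an n-best list by adding a weighted language-model log-probability to each hypothesis's acoustic log-probability. To keep the sketch self-contained, a toy add-one-smoothed character bigram model stands in for the RNN-LM. The `lm_weight` value and the example scores are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


class BigramLM:
    """Toy add-one-smoothed character bigram LM (a stand-in for the RNN-LM)."""

    def __init__(self, sentences: List[str]):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for s in sentences:
            chars = ["<s>"] + list(s) + ["</s>"]
            self.context_counts.update(chars[:-1])
            self.bigram_counts.update(zip(chars[:-1], chars[1:]))
        # Possible next tokens: every training character plus </s>.
        self.vocab_size = len({c for s in sentences for c in s}) + 1

    def logprob(self, sentence: str) -> float:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        return sum(
            math.log((self.bigram_counts[(a, b)] + 1)
                     / (self.context_counts[a] + self.vocab_size))
            for a, b in zip(chars[:-1], chars[1:])
        )


def rescore(nbest: List[Tuple[str, float]], lm: BigramLM, lm_weight: float = 0.5) -> str:
    """Pick the hypothesis maximizing acoustic_logprob + lm_weight * lm_logprob."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm.logprob(h[0]))[0]


if __name__ == "__main__":
    lm = BigramLM(["打开WiFi", "关闭WiFi"])
    # Hypothetical acoustic log-probabilities: the acoustic model slightly prefers "Wifi".
    nbest = [("打开WiFi", -5.0), ("打开Wifi", -4.8)]
    print(rescore(nbest, lm, lm_weight=0.0))  # 打开Wifi (acoustics only)
    print(rescore(nbest, lm, lm_weight=0.5))  # 打开WiFi (LM corrects the casing)
```

The weight trades acoustic evidence against text statistics, and in practice it is tuned on the validation set. A real RNN-LM would replace `BigramLM.logprob` with the network's summed per-token log-probabilities.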

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a Chinese-English mixed speech recognition method and device.

Background

[0002] With globalization, communicating in a mix of languages has become common. Statistically, more people speak multiple languages than speak only one. The acoustic differences between the mixed languages, and the complexity of switching between them, make such speech difficult to recognize. The study of mixed-language acoustic models is therefore an important research direction.

[0003] At present, LSTM (Long Short-Term Memory) is a recurrent neural network over time. The CTC (Connectionist Temporal Classification) algorithm marginalizes over and collapses all possible frame-by-frame output symbol sequences. This approach has achieved a good recognition rate on the TIMIT dataset, ...
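Paragraph [0003] says CTC "marginalizes over all possible frame-by-frame output symbol sequences." Concretely, CTC defines P(y | x) as the sum, over every frame-level path that collapses to y (merge repeats, then drop blanks), of the product of per-frame softmax outputs. The standard forward (alpha) recursion computes that sum without enumerating paths. The sketch below implements the recursion in the probability domain for readability, and checks it against brute-force enumeration on a tiny input. Production implementations work in log space. The function names are illustrative only.

```python
import itertools
from typing import List, Sequence


def collapse(path: Sequence[int], blank: int = 0) -> List[int]:
    """CTC collapsing map: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_forward_prob(probs: Sequence[Sequence[float]], labels: Sequence[int],
                     blank: int = 0) -> float:
    """P(labels | x) via the CTC forward recursion; probs[t][k] is the softmax output."""
    ext = [blank]                       # blank-interleaved labels: _ l1 _ l2 _ ...
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(probs)

    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]         # start in the leading blank...
    if S > 1:
        alpha[1] = probs[0][ext[1]]     # ...or directly in the first label
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid paths end in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_brute_force_prob(probs: Sequence[Sequence[float]], labels: Sequence[int],
                         blank: int = 0) -> float:
    """Reference: sum path probabilities over all K**T paths (tiny inputs only)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


if __name__ == "__main__":
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
    for labels in ([1, 2], [1, 1], [2], []):
        print(labels, ctc_forward_prob(probs, labels), ctc_brute_force_prob(probs, labels))
```

The forward pass costs O(T · |y|) operations, while enumeration costs O(K^T). That gap is why the marginalization is practical for real utterances.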

Application Information

Patent type & authority: Patents (China)
IPC (8): G10L15/06; G10L15/16
CPC: G10L15/063; G10L15/16
Inventors: 郑能恒 (Zheng Nengheng), 容韦聪 (Rong Weicong), 史裕鹏 (Shi Yupeng)
Owner: SHENZHEN UNIV