A Chinese-English mixed speech recognition method and device

A mixed speech recognition technology, applied in speech recognition, speech analysis, instruments, etc., that addresses problems such as poor speech recognition performance and a large number of network parameters, and achieves a smaller number of classification classes, fewer parameters, and improved recognition performance.

Active Publication Date: 2022-02-15

SHENZHEN UNIV

4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to provide a Chinese-English mixed speech recognition method and device that solve the problems of prior-art methods for implementing mixed-language acoustic models, namely a large number of network parameters and poor speech recognition performance.

Examples

Embodiment 1

[0047] As shown in Figure 1, an embodiment of the present invention provides a Chinese-English mixed speech recognition method, including but not limited to the following steps:

[0048] S101. Acquire voice training samples.

[0049] In step S101, the speech training samples are sampled from Chinese and English corpora. The Chinese and English corpora include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus.

[0050] In this embodiment, the Chinese and English corpora can be used as a data set. The speech training samples serve as a training set or a verification set drawn proportionally from the data set, and are used to estimate the model, determine the model's network structure, and determine the model parameters.

[0051] In practical applications, a test set can also be extracted from the data set to test the robustness of the network model constructed from the training set or verification set in general application sce...
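
The proportional draw of training, verification, and test sets described in [0050] and [0051] can be illustrated with a minimal sketch. The 80/10/10 ratio, the function name `split_dataset`, and the utterance IDs are assumptions made for illustration; the source does not state the proportions or the sampling procedure.

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle samples and split them into training, verification, and test sets.

    The fractions are illustrative assumptions; whatever is left after the
    training and verification shares becomes the test set.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical utterance IDs tagged with the corpus they were sampled from.
utterances = [(f"utt{i:03d}", corpus)
              for i, corpus in enumerate(["zh", "en", "mixed"] * 10)]
train_set, val_set, test_set = split_dataset(utterances)
print(len(train_set), len(val_set), len(test_set))
```

Pooling the three corpora before shuffling is only one option; splitting each corpus separately would keep the language mix identical across the three sets.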

Embodiment 2

[0081] An embodiment of the present invention provides a Chinese-English mixed speech recognition device 20, including:

[0082] A voice sample acquisition module 21, configured to obtain voice training samples, where the voice training samples are sampled from Chinese and English corpora;

[0083] The Chinese and English corpora include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus;

[0084] A model training module 22, configured to train the LSTM-CTC end-to-end network with the voice training samples and to modify the softmax layer of the LSTM-CTC end-to-end network so that the characters output by the softmax layer are Unicode-encoded;

[0085] A speech recognition network model acquisition module 23, configured to obtain the speech recognition network model according to the characters output by the softmax layer;

[0086] A speech recognition module 24, configured to input the voice to be recognized into the speech recognition network model and to process the output of the speech recognition networ...
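
One way to read the "Unicode encoding" output of the softmax layer is as UTF-8 byte values, which gives Chinese and English a single shared label set. The sketch below rests on that assumption: the source does not say which Unicode unit is used, and the blank index, class count, and function names here are hypothetical. It shows a mixed transcript turned into byte labels and a frame-level CTC path collapsed back into text.

```python
BLANK = 256        # assumed index for the CTC blank, placed after the 256 byte values
NUM_CLASSES = 257  # 256 UTF-8 byte values + 1 blank (an assumption, not from the source)

def text_to_labels(text):
    """Encode a transcript as a sequence of UTF-8 byte values (0-255)."""
    return list(text.encode("utf-8"))

def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks (standard CTC collapse)."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

def labels_to_text(labels):
    """Decode byte labels back to text; invalid byte runs become U+FFFD."""
    return bytes(labels).decode("utf-8", errors="replace")

transcript = "打开 WiFi"
targets = text_to_labels(transcript)

# Simulate a frame-level best path: each label held for two frames, then a blank.
frames = []
for label in targets:
    frames += [label, label, BLANK]

decoded = labels_to_text(ctc_collapse(frames))
print(len(targets), decoded)
```

Under this assumption the output layer has 257 classes regardless of how many distinct Chinese characters or English words occur in the corpora, at the cost of longer label sequences for Chinese text (three bytes per common character).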

Abstract

The present invention is applicable to the technical field of speech recognition, and provides a Chinese-English mixed speech recognition method and device. The method includes: obtaining speech training samples, where the speech training samples are sampled from Chinese and English corpora, and the Chinese and English corpora include a Chinese corpus, an English corpus, and a Chinese-English mixed corpus; training the LSTM-CTC end-to-end network with the speech training samples, and modifying the softmax layer of the LSTM-CTC end-to-end network so that the characters output by the softmax layer are Unicode-encoded; obtaining a speech recognition network model according to the characters output by the softmax layer; and inputting the speech to be recognized into the speech recognition network model, and processing the output of the speech recognition network model through the RNN-LM language model to obtain a speech recognition result for the speech to be recognized, where the RNN-LM language model is obtained by training on the text of the speech training samples. In the process of establishing a speech recognition network model based on the LSTM-CTC end-to-end network, the present invention can effectively improve the decoding efficiency of CTC and improve recognition performance.
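
The abstract does not say how the RNN-LM processes the network output. One common arrangement, assumed here purely for illustration, is to rescore candidate transcripts by adding a weighted language-model log-probability to each acoustic score. In the sketch below a toy character-bigram model stands in for the RNN-LM, and the training texts, hypothesis scores, and `lm_weight` are made-up values.

```python
import math
from collections import Counter

def train_bigram_lm(texts):
    """Count character bigrams with start/end markers (toy stand-in for an RNN-LM)."""
    bigrams, unigrams = Counter(), Counter()
    vocab = set()
    for text in texts:
        chars = ["<s>"] + list(text) + ["</s>"]
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, len(vocab)

def lm_log_prob(text, lm):
    """Add-one smoothed log-score of text under the bigram counts."""
    bigrams, unigrams, vocab_size = lm
    chars = ["<s>"] + list(text) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )

def rescore(hypotheses, lm, lm_weight=0.5):
    """Return the text maximizing acoustic log-score + lm_weight * LM log-score."""
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0], lm))
    return best[0]

# The LM is trained on transcript text, as the abstract describes for the RNN-LM.
lm = train_bigram_lm(["打开 WiFi", "关闭 WiFi", "打开蓝牙"])

# (text, acoustic log-score) pairs; the scores are invented for this example.
hypotheses = [("打开 WiFy", -1.0), ("打开 WiFi", -1.2)]
best = rescore(hypotheses, lm)
print(best)
```

The acoustically preferred hypothesis contains a character sequence the language model has never seen, so the combined score favors the other candidate.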

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a Chinese-English mixed speech recognition method and device.

Background

[0002] With globalization, communicating in mixed languages has become common. Statistically, there are more people who speak multiple languages than monolingual speakers. The acoustic differences and linguistic complexities between the mixed languages pose challenges for speech recognition. Therefore, the study of mixed-language acoustic models is an important research direction.

[0003] At present, LSTM (Long Short-Term Memory) is a type of recurrent neural network, and the CTC (Connectionist Temporal Classification) algorithm marginalizes over and collapses all possible frame-by-frame output symbol sequences, and a good recognition rate has been achieved on the TIMIT dataset, ...
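
The marginalization that [0003] attributes to CTC can be made concrete with a brute-force sketch: the probability of a label sequence is the sum, over every frame-by-frame path that collapses to it, of the product of per-frame probabilities. The function names and the two-frame probability table are illustrative assumptions. Practical implementations compute the same sum with a forward-backward dynamic program instead of enumerating paths.

```python
from itertools import product

def collapse(path, blank):
    """Merge consecutive repeated symbols, then drop blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return tuple(out)

def ctc_prob_bruteforce(probs, target, blank=0):
    """Sum the probabilities of all frame-level paths that collapse to target.

    probs[t][s] is the softmax probability of symbol s at frame t.
    Enumeration is exponential in the number of frames, so this is only
    suitable for tiny examples.
    """
    num_symbols = len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=len(probs)):
        if collapse(path, blank) == tuple(target):
            p = 1.0
            for t, symbol in enumerate(path):
                p *= probs[t][symbol]
            total += p
    return total

# Two frames, two symbols: index 0 is the blank, index 1 is a single label.
probs = [[0.4, 0.6],
         [0.3, 0.7]]
p_label = ctc_prob_bruteforce(probs, [1])   # paths (1,1), (1,0), (0,1)
p_empty = ctc_prob_bruteforce(probs, [])    # path (0,0) only
print(p_label, p_empty)
```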

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/06, G10L15/16

CPC: G10L15/063, G10L15/16

Inventor: 郑能恒, 容韦聪, 史裕鹏

Owner: SHENZHEN UNIV
