
Voice recognition model training method and system, mobile terminal and storage medium

A speech recognition model training technology, applied in speech recognition, speech analysis, and natural language data processing. It addresses time-consuming, low-efficiency model training; its effects are improved training efficiency, shorter model training time, and lower labor costs.

Active Publication Date: 2020-05-26
XIAMEN KUAISHANGTONG TECH CORP LTD
Cites: 5; Cited by: 14

AI Technical Summary

Problems solved by technology

[0004] The purpose of the embodiments of the present invention is to provide a speech recognition model training method, system, mobile terminal and storage medium that solve the low training efficiency and long training time of existing speech recognition model training methods.



Examples


Embodiment 1

[0054] Referring to Figure 1, a flow chart of the speech recognition model training method provided by the first embodiment of the present invention, the method includes the following steps:

[0055] Step S10: acquiring a sample speech and a sample text corresponding to the sample speech, and performing feature extraction on the sample speech to obtain speech features.

[0056] The sample speech is in the language to be recognized by the speech recognition model, such as Cantonese or Hokkien, while the sample text is written in Mandarin. Each sample speech corresponds one-to-one to a sample text.

[0057] Specifically, in this step, a data set is constructed from the acquired sample speech and sample text, and 20% of the data in the data set is randomly selected as a test set. Preferably, the speech features are 80-dimensional fbank features, with a frame length of 25 ms and a frame shift of 10 ms.
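The 20% test split and the framing parameters in this step can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the fixed random seed, the file names and the 16 kHz sample rate are all assumptions made for illustration, and the fbank computation itself is left out.

```python
import random

def split_dataset(pairs, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of (speech, text) pairs as a test set."""
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def num_frames(num_samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Count the full frames produced by a 25 ms window moved in 10 ms steps."""
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    shift = sample_rate * shift_ms // 1000       # samples per frame shift
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift

# Hypothetical data set of ten (speech file, Mandarin text) pairs.
pairs = [(f"utt{i}.wav", f"text {i}") for i in range(10)]
train_set, test_set = split_dataset(pairs)

# One second of audio at the assumed 16 kHz sample rate; each frame would
# yield one 80-dimensional fbank vector.
frames = num_frames(16000)
```

With these assumptions, the number of frames is also the "number of features" of an utterance, which is the quantity the later grouping step works with.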

[0058] Step S20, inputting the speech f...

Embodiment 2

[0072] Referring to Figure 2, a flow chart of the speech recognition model training method provided by the second embodiment of the present invention, the method includes the following steps:

[0073] Step S11: acquiring a sample speech and a sample text corresponding to the sample speech, adding noise and reverberation to the sample speech, and performing feature extraction on the processed sample speech.

[0074] Adding noise and reverberation to the sample speech effectively expands the data and improves the robustness of the speech recognition model, so that the model can adapt to more complex environments.
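As a rough illustration of this kind of augmentation, the sketch below adds white Gaussian noise at a chosen signal-to-noise ratio and applies reverberation by convolving with an impulse response. The source does not say which noise type, SNR or impulse responses are used, so the function names, the 20 dB SNR, the toy impulse response and the 16 kHz test tone are all assumptions.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]

def add_reverb(signal, impulse_response):
    """Convolve the signal with an impulse response, keeping the input length."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

rng = random.Random(0)
# 0.1 s of a 440 Hz tone at an assumed 16 kHz sample rate, standing in for speech.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
toy_rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # hypothetical room impulse response
augmented = add_noise(add_reverb(clean, toy_rir), snr_db=20, rng=rng)
```

Each augmented copy keeps the original sample text as its label, which is how such processing expands the data set without extra annotation.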

[0075] Specifically, in this step, the speech features are 80-dimensional fbank features, with a frame length of 25 ms and a frame shift of 10 ms.

[0076] Step S21: grouping the utterances in the sample speech according to the number of speech features, and setting the maximum number of features in each group as the target speech length; ...
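The grouping described in step S21 resembles length bucketing. The following sketch shows one way it could look; since the rest of the step is truncated in the source, the sort-then-chunk grouping rule, the fixed group size and the zero-padding up to the target length are assumptions, as are all names and the example feature counts.

```python
def group_by_length(feature_counts, group_size):
    """Sort utterances by feature count and split them into fixed-size groups.

    Returns a list of (target_length, utterance_ids) tuples, where
    target_length is the largest feature count found in the group.
    """
    order = sorted(feature_counts, key=feature_counts.get)
    groups = []
    for start in range(0, len(order), group_size):
        ids = order[start:start + group_size]
        target_length = max(feature_counts[i] for i in ids)
        groups.append((target_length, ids))
    return groups

def pad_to_length(features, target_length, dim=80):
    """Zero-pad a list of feature frames up to target_length (an assumption)."""
    padding = [[0.0] * dim for _ in range(target_length - len(features))]
    return features + padding

# Hypothetical utterance ids mapped to their number of 80-dimensional frames.
counts = {"a": 120, "b": 98, "c": 300, "d": 310, "e": 105}
groups = group_by_length(counts, group_size=2)
padded = pad_to_length([[1.0] * 80 for _ in range(3)], target_length=5)
```

Grouping utterances of similar length keeps the per-group target length close to the actual lengths, so little computation is spent on padding.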

Embodiment 3

[0117] Referring to Figure 4, a schematic structural diagram of the speech recognition model training system 100 provided by the third embodiment of the present invention, the system includes a feature extraction module 10, a feature encoding and decoding module 11, a loss calculation module 12 and a parameter update module 13, wherein:

[0118] The feature extraction module 10 is configured to acquire a sample speech and a sample text corresponding to the sample speech, and to perform feature extraction on the sample speech to obtain speech features.

[0119] The feature extraction module 10 is further configured to: add noise and reverberation to the sample speech, and perform feature extraction on the processed sample speech;

[0120] group the utterances in the sample speech according to the number of features in their speech features, and set the maximum number of features in each group as the target speech length;

[0121] The speech feature...


Abstract

The invention provides a voice recognition model training method and system, a mobile terminal and a storage medium. The method comprises: obtaining a sample voice and a sample text corresponding to the sample voice, and performing feature extraction on the sample voice to obtain voice features; inputting the voice features into an encoder in a voice recognition model for encoding to obtain a feature vector, and decoding with a decoder in the voice recognition model according to the feature vector and the sample text to obtain a probability vector; performing loss calculation according to the probability vector and the sample text to obtain the total loss of the model; and propagating the total loss of the model through the voice recognition model, and controlling the encoder and the decoder to update their parameters simultaneously until the voice recognition model converges. The method requires no pronunciation dictionary, which reduces labor costs and shortens model training time. Because all parameters are updated simultaneously in a sequence-to-sequence architecture, both model training efficiency and subsequent voice recognition efficiency are improved.
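To make the loss step concrete, here is a minimal sketch of a loss computed from per-step probability vectors and the target token ids of the sample text. The abstract does not name the loss function, so cross-entropy is an assumption here, as are the function name, the three-token vocabulary and the example values.

```python
import math

def cross_entropy(prob_vectors, target_ids):
    """Mean negative log-likelihood of the target tokens.

    prob_vectors: one probability distribution over the vocabulary per
    decoding step; target_ids: the index of the correct token at each step.
    """
    total = 0.0
    for probs, target in zip(prob_vectors, target_ids):
        total += -math.log(probs[target])
    return total / len(target_ids)

# Hypothetical decoder outputs for a two-token sample text.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
loss = cross_entropy(probs, targets)
```

In a sequence-to-sequence model a single scalar of this kind can be back-propagated through both the decoder and the encoder, which is what lets all parameters be updated in the same step.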

Description

Technical field

[0001] The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition model training method, system, mobile terminal and storage medium.

Background

[0002] Speech recognition research has a history of several decades. Speech recognition technology mainly includes four parts: acoustic modeling, language modeling, pronunciation dictionary construction, and decoding. Each part can be a separate research direction, and speech data is much harder to collect and label than images or text. Building a complete speech recognition model training system is therefore a very time-consuming and extremely difficult task, which greatly hinders the development of speech recognition technology. With the research and development of artificial intelligence technology, especially deep learning, some end-to-end speech recognition algorithms have been proposed. Compa...


Application Information

IPC(8): G10L15/00, G10L15/02, G10L15/06, G10L15/08, G10L15/26, G06F40/279
CPC: G10L15/02, G10L15/063, G10L15/005, G10L15/083, G10L15/26
Inventor: 徐敏, 肖龙源, 李稀敏, 蔡振华, 刘晓葳, 谭玉坤
Owner: XIAMEN KUAISHANGTONG TECH CORP LTD