Voice recognition model training method and system, mobile terminal and storage medium
A speech recognition model training method, applied in speech recognition, speech analysis, natural language data processing, etc. It addresses the problem that model training is time-consuming and inefficient. Its effects are improved training efficiency, reduced model training time, and reduced labor costs.
Examples
Embodiment 1
[0054] Referring to Figure 1, a flow chart of the speech recognition model training method provided by the first embodiment of the present invention, the method includes the following steps:
[0055] Step S10, acquiring a sample speech and a sample text corresponding to the sample speech, and performing feature extraction on the sample speech to obtain speech features;
[0056] Here, the sample speech is in the language to be recognized by the speech recognition model, such as Cantonese or Hokkien, the sample text is written in Mandarin, and each sample speech corresponds one-to-one to a sample text;
[0057] Specifically, in this step, a data set is constructed from the acquired sample speech and sample text, and 20% of the data in the data set is randomly selected as a test set. Preferably, the speech features are 80-dimensional fbank features, with a frame length of 25 ms and a frame shift of 10 ms;
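The framing parameters and the 20% test split described in step S10 can be illustrated with a minimal sketch. The 16 kHz sample rate, the fixed random seed, and the function names `num_frames` and `split_dataset` are assumptions made for illustration; the source specifies only the 25 ms frame length, the 10 ms frame shift, and the 20% random test set.

```python
import random


def num_frames(num_samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Count the full frames obtained from a waveform of num_samples samples.

    The 16 kHz default sample rate is an assumption; the source gives only
    the frame length (25 ms) and the frame shift (10 ms).
    """
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift


def split_dataset(pairs, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of (speech, text) pairs as a test set."""
    rng = random.Random(seed)        # fixed seed is an illustrative choice
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


if __name__ == "__main__":
    # Hypothetical file names and transcripts, for demonstration only.
    pairs = [(f"utt{i}.wav", f"text {i}") for i in range(10)]
    train, test = split_dataset(pairs)
    print(len(train), len(test))
    print(num_frames(16000))  # frames in one second of 16 kHz audio
```

Each frame would then be mapped to an 80-dimensional fbank vector, so an utterance with `num_frames(n)` frames yields a feature matrix of that many rows and 80 columns.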
[0058] Step S20, inputting the speech f...
Embodiment 2
[0072] Referring to Figure 2, a flow chart of the speech recognition model training method provided by the second embodiment of the present invention, the method includes the following steps:
[0073] Step S11, acquiring a sample speech and a sample text corresponding to the sample speech, performing noise addition and reverberation processing on the sample speech, and performing feature extraction on the processed sample speech;
[0074] Here, adding noise and reverberation to the sample speech effectively expands the data and improves the robustness of the speech recognition model, so that the model can adapt to more complex environments;
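The noise addition and reverberation processing of step S11 can be sketched as follows. The source does not state the noise type, the signal-to-noise ratio, or how reverberation is produced, so the Gaussian noise, the `snr_db` parameter, and the convolution with an impulse response are all assumptions chosen as common ways to perform this kind of augmentation.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target SNR (assumed noise model)."""
    if not signal:
        return []
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def add_reverb(signal, impulse_response):
    """Simulate reverberation by convolving with a room impulse response.

    The output is truncated to the input length so that the augmented
    utterance stays aligned with its transcript.
    """
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # A 440 Hz tone at an assumed 16 kHz sample rate stands in for speech.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    # Hypothetical two-tap impulse response: direct path plus one echo.
    augmented = add_noise(add_reverb(clean, [1.0, 0.0, 0.3]), snr_db=20, rng=rng)
    print(len(clean), len(augmented))
```

Because each augmented copy keeps the original transcript, every processed utterance adds a new training pair without any extra labeling.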
[0075] Specifically, in this step, the speech features are 80-dimensional fbank features, with a frame length of 25 ms and a frame shift of 10 ms;
[0076] Step S21, grouping the utterances in the sample speech according to their number of speech features, and setting the maximum number of features in each group as the target voice length; ...
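The grouping in step S21 can be sketched as below. The source text is truncated after this step, so the grouping rule (sorting by feature count and cutting into fixed-size groups), the `group_size` parameter, and the zero-padding helper `pad_to` are assumptions for illustration only; the source confirms just that utterances are grouped by feature count and that each group's maximum count becomes its target voice length.

```python
def group_by_length(features, group_size):
    """Group utterances of similar length and record each group's target length.

    features: list of utterances, each a list of per-frame feature vectors.
    Returns a list of (utterance indices, target length) tuples.
    """
    order = sorted(range(len(features)), key=lambda i: len(features[i]))
    groups = []
    for start in range(0, len(order), group_size):
        idx = order[start:start + group_size]
        target_len = max(len(features[i]) for i in idx)
        groups.append((idx, target_len))
    return groups


def pad_to(feature, target_len, dim):
    """Zero-pad one utterance to target_len frames (assumed padding scheme)."""
    padding = [[0.0] * dim for _ in range(target_len - len(feature))]
    return feature + padding


if __name__ == "__main__":
    dim = 80  # fbank dimension given in the source
    lengths = [5, 2, 9, 3]
    feats = [[[0.1] * dim for _ in range(n)] for n in lengths]
    for idx, target_len in group_by_length(feats, group_size=2):
        batch = [pad_to(feats[i], target_len, dim) for i in idx]
        print(idx, target_len, [len(u) for u in batch])
```

Grouping utterances of similar length keeps the amount of padding per group small, which is one plausible reason for choosing the group maximum as the target length.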
Embodiment 3
[0117] Referring to Figure 4, a schematic structural diagram of the speech recognition model training system 100 provided by the third embodiment of the present invention, the system includes a feature extraction module 10, a feature encoding and decoding module 11, a loss calculation module 12 and a parameter update module 13, wherein:
[0118] The feature extraction module 10 is configured to acquire a sample speech and a sample text corresponding to the sample speech, and to perform feature extraction on the sample speech to obtain speech features.
[0119] The feature extraction module 10 is further configured to: perform noise addition and reverberation processing on the sample speech, and perform feature extraction on the processed sample speech;
[0120] group the utterances in the sample speech according to their number of speech features, and set the maximum number of features in each group as the target voice length;
[0121] The speech feature...