A language model training method, training device and testing method
A language model training technology, applied in the computer field, that addresses problems such as increased computing time and cost.
Examples
Embodiment 1
[0041] Figure 1 is a schematic flowchart of a language model training method provided by Embodiment 1 of the present invention. The method includes steps S11 to S12, as follows:
[0042] S11: Initialize the character table and the word table with specific characters and words.
[0043] S12: Train the language model using the initialized character table and word table together with the original corpus to generate a trained language recognition model.
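Steps S11 and S12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: the function names `init_tables` and `train_unigram`, the whitespace tokenization, and the unigram frequency count that stands in for language model training are all hypothetical choices made for the example.

```python
from collections import Counter


def init_tables(specific_chars, specific_words):
    """S11 (sketch): seed the character table and word table with specific entries.

    Seed entries receive the lowest indices, in the order given.
    """
    char_table = {c: i for i, c in enumerate(dict.fromkeys(specific_chars))}
    word_table = {w: i for i, w in enumerate(dict.fromkeys(specific_words))}
    return char_table, word_table


def train_unigram(char_table, word_table, corpus):
    """S12 (stand-in): extend the seeded tables from the corpus and
    estimate unigram word probabilities. A real system would train a
    full language model here; unigram counting is only a placeholder.
    """
    counts = Counter()
    for sentence in corpus:
        for word in sentence.split():  # assumption: whitespace-separated words
            word_table.setdefault(word, len(word_table))
            for ch in word:
                char_table.setdefault(ch, len(char_table))
            counts[word] += 1
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}
```

Because the tables are seeded before the corpus is read, the seed entries keep stable indices no matter what the corpus contains.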
[0044] In this embodiment, the specific characters and words may be the most frequently used characters and words, obtained automatically from the user's frequency of use, or they may be the characters and words in an internal repository; this embodiment does not limit the choice. The internal repository contains the commonly used words officially published by the state. After initializing the character table and the word table with the specific characters and words in this way, the probability that the old...
Embodiment 2
[0047] Figure 2 is a schematic flowchart of a language model training method provided by Embodiment 2 of the present invention. This embodiment builds on Embodiment 1 by performing incremental training on the trained language recognition model. Specifically, when a new corpus is received, the amount of new corpus is counted and its character error rate and word error rate are statistically analyzed.
[0048] Further, when the amount of new corpus is not less than a set threshold, or when the character error rate or the word error rate of the new corpus is not less than a set threshold, the language recognition model is incrementally trained.
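The trigger condition above can be written as a single predicate. The sketch below is illustrative only: the name `should_retrain` and its parameters are hypothetical, reading the two error rates as a character error rate and a word error rate is an assumption, and so is the use of one threshold for the corpus amount and a separate shared threshold for both error rates (the text only says "set threshold").

```python
def should_retrain(new_count, char_error_rate, word_error_rate,
                   count_threshold, error_threshold):
    """Return True when incremental training should be triggered.

    "Not less than" in the text maps to >= here. Any one of the three
    conditions is sufficient on its own.
    """
    return (new_count >= count_threshold
            or char_error_rate >= error_threshold
            or word_error_rate >= error_threshold)
```

For example, `should_retrain(99, 0.10, 0.25, 100, 0.2)` triggers training because the word error rate reaches the threshold even though the corpus amount does not.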
[0049] Further, either a randomly selected part of the existing corpus or all of the existing corpus is used for the incremental training of the language recognition model.
[0050] Further, let m be the total number of new corpus items; randomly select α*m old corpus items, and mix the m new items with the α*m old items to generate a mixed ...
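The sampling and mixing step can be sketched as below. This is a hedged illustration, not the patent's implementation: the function name `build_mixed_corpus` and the `seed` parameter are hypothetical, and three details are assumptions the text does not specify: α*m is rounded down to an integer, the sample size is capped at the size of the old corpus, and "mix" is interpreted as concatenating and shuffling.

```python
import random


def build_mixed_corpus(new_corpus, old_corpus, alpha, seed=None):
    """Mix all m new items with about alpha * m randomly chosen old items."""
    rng = random.Random(seed)          # local RNG so the result is reproducible
    m = len(new_corpus)
    k = min(int(alpha * m), len(old_corpus))  # assumption: round down, cap at old size
    sampled_old = rng.sample(old_corpus, k)   # sampling without replacement
    mixed = list(new_corpus) + sampled_old
    rng.shuffle(mixed)                 # assumption: "mix" means shuffle together
    return mixed
```

Replaying a sample of old data alongside the new data is a common way to limit forgetting during incremental training; α controls how much old data is replayed relative to the new data.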
Embodiment 3
[0061] Figure 3 is a schematic flowchart of a language model training method provided by Embodiment 3 of the present invention. This embodiment builds on Embodiment 1 by performing incremental training on the trained language recognition model. Specifically, when a new corpus is received, it is first analyzed to determine whether the source of the new corpus is the same.
[0062] If the source of the new corpus is the same, enter the following process:
[0063] Count the amount of new corpus and statistically analyze its character error rate and word error rate.
[0064] Further, when the amount of new corpus is not less than a set threshold, or when the character error rate or the word error rate of the new corpus is not less than a set threshold, the language recognition model is incrementally trained.
[0065] Further, either a randomly selected part of the existing corpus or all of the existing corpus is used for the incremental training of the language rec...