Word-frequency-based skip language model training method
A language model training method, applied in the field of Chinese statistical language model training, that addresses problems such as the lack of out-of-vocabulary (OOV) handling in statistical language models.
Examples
Example 1
[0077] Example 1: the longest word in the dictionary has L = 5 characters, and forward maximum matching word segmentation is performed on the string "Ministry of Land and Resources plan developer circle":
[0078] Round 1 search: "Ministry of Land and Resources"; scan the dictionary; the word is found.
[0079] Round 2 search: "plan developer"; scan the dictionary; no such word.
[0080] "planned development"; scan the dictionary; no such word.
[0081] "scheduled to open"; scan the dictionary; no such word.
[0082] "plan"; scan the dictionary; the word is found.
[0083] Round 3 search: "developer circle"; scan the dictionary; no such word.
[0084] "developer"; scan the dictionary; the word is found.
[0085] Round 4 search: "circle"; scan the dictionary; the word is found.
[0086] The forward maximum matching segmentation result Rf is "Ministry of Land and Resources / plan / developer / circle".
[0...
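The forward maximum matching rounds in Example 1 can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name `forward_max_match`, the placeholder dictionary, and the Latin-letter input string are all assumptions, with each letter standing in for one Chinese character. The single-character fallback for unmatched text is also an assumption, since the source does not show that case.

```python
def forward_max_match(text, dictionary, max_len):
    """Segment text greedily from left to right (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Start with the longest candidate that fits in the remaining text.
        size = min(max_len, len(text) - pos)
        # Drop one character from the right until the candidate is in the
        # dictionary; a single character is accepted as a fallback (assumption).
        while size > 1 and text[pos:pos + size] not in dictionary:
            size -= 1
        tokens.append(text[pos:pos + size])
        pos += size
    return tokens


# Placeholder words with lengths 5, 2, 3 and 1, mirroring the four rounds above.
dictionary = {"abcde", "fg", "hij", "k"}
tokens = forward_max_match("abcdefghijk", dictionary, max_len=5)
print(" / ".join(tokens))
```

With `max_len=5` the second round tries the 5-, 4- and 3-character candidates before matching the 2-character word, which is the same shrinking pattern as paragraphs [0079] to [0082].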
Example 2
[0097] Example 2: take a sentence from the corpus as an example. After the word segmentation in step S2, the sentence is: "Yili, located on the northwestern border, is an important window for our country's opening to the outside world. Since the reform and opening up, Yili's economy has developed rapidly, and people's living standards have rapidly improved."
[0098] The example sentence is then processed in step S3, and the result is: "Yili, located on the northwestern border, is an important window for our country's opening to the outside world. Since the reform and opening up, Yili's economy has achieved rapid development and people's living standards have improved rapidly."
[0099] S4. Count the words and word frequencies in the learning set corpus and generate the Chinese vocabulary wt. Traverse the sentences processed in step S3: each time a new word a is encountered, write a into the Chinese vocabulary and initialize its number of occurrences c to 1; when traversin...
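The counting in step S4 can be sketched as follows. This is a hedged illustration only: the function name `build_vocabulary` and the sample tokens are hypothetical, and because the source text is cut off after the new-word case, the increment for a word that has already been seen is an assumption about the missing remainder.

```python
def build_vocabulary(sentences):
    """Map each word to its occurrence count (hypothetical helper).

    `sentences` is an iterable of already-segmented sentences,
    each given as a list of word strings.
    """
    vocab = {}
    for sentence in sentences:
        for word in sentence:
            if word not in vocab:
                # New word: add it to the vocabulary with count c = 1.
                vocab[word] = 1
            else:
                # Assumption: a repeated word increments its count by 1.
                vocab[word] += 1
    return vocab


# Placeholder segmented corpus, not taken from the patent.
corpus = [["Yili", "economy", "develops"], ["Yili", "border"]]
wt = build_vocabulary(corpus)
print(wt["Yili"], wt["economy"])
```

The standard library's `collections.Counter` would give the same counts; the explicit loop is used here only because it follows the wording of the step.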