Corpus expansion method and apparatus
A corpus expansion method and device, applied in the field of corpus expansion. The technology addresses problems such as missing sentence-formation paths and a low probability of forming sentences, which degrade effectiveness in use. It improves practical application, completes the paths for forming sentences, and raises the probability of forming sentences.
Examples
Embodiment 1
[0041] Referring to Figure 1, which is a flow chart of a corpus expansion method according to an embodiment of the present invention, the method may specifically include the following steps:
[0042] Step 101: train an n-gram language model and a neural network language model using first corpus data, where the first corpus data is sparse corpus data.
[0043] In this embodiment of the present invention, after corpus data is obtained, the data in the corpus is preprocessed (for example, by cleaning and word segmentation) to obtain corpus data in units of phrases. An n-gram language model training tool and a neural network language model training tool are then applied to the preprocessed sparse corpus to train the n-gram language model and the neural network language model, respectively.
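A minimal sketch of the preprocessing described in [0043], assuming English-like input: the cleaning rule (replace punctuation with spaces, collapse whitespace) and the whitespace-based segment() are illustrative stand-ins, and all function names are hypothetical. Chinese text would need a dedicated word segmenter in place of segment().

```python
import re


def clean(text: str) -> str:
    """Hypothetical cleaning step: replace anything that is not a word
    character or whitespace with a space, then collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> list:
    """Placeholder word segmentation: lowercase and split on whitespace.
    A real Chinese pipeline would call a proper word segmenter here."""
    return text.lower().split()


def preprocess(raw_corpus: list) -> list:
    """Return one token list per non-empty input line."""
    sentences = []
    for line in raw_corpus:
        tokens = segment(clean(line))
        if tokens:  # skip lines that are empty after cleaning
            sentences.append(tokens)
    return sentences


raw = ["I want to go today!", "  ", "I want   tea, today."]
print(preprocess(raw))
# → [['i', 'want', 'to', 'go', 'today'], ['i', 'want', 'tea', 'today']]
```

The resulting token lists are the form both training tools would consume; the blank line is dropped rather than kept as an empty sentence.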
[0044] Specifically, taking the n-gram language model as an example, the appearance of the nth word depends only on the preceding n-1 words and not on any other words (this is the Markov assumption, which also underlies hidden Markov models). The pr...
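Under the assumption in [0044], an n-gram model can estimate P(w_n | w_1 ... w_{n-1}) by maximum likelihood as C(w_1 ... w_n) / C(w_1 ... w_{n-1}), where C counts occurrences in the corpus, and score a sentence by multiplying these conditional probabilities (the chain rule). The sketch below is a generic count-based implementation, not the patent's training tool; the function names and the boundary tokens <s> and </s> are assumptions.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary tokens


def train_ngram(sentences, n=2):
    """Count n-grams and their (n-1)-word contexts over padded sentences."""
    ngrams, contexts = Counter(), Counter()
    for tokens in sentences:
        padded = [BOS] * (n - 1) + tokens + [EOS]
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1  # context = first n-1 words of the gram
    return ngrams, contexts


def cond_prob(ngrams, contexts, gram):
    """MLE estimate P(w_n | w_1..w_{n-1}) = C(w_1..w_n) / C(w_1..w_{n-1})."""
    context_count = contexts[gram[:-1]]
    return ngrams[gram] / context_count if context_count else 0.0


def sentence_prob(ngrams, contexts, tokens, n=2):
    """Multiply conditional probabilities under the (n-1)-order Markov assumption."""
    padded = [BOS] * (n - 1) + tokens + [EOS]
    p = 1.0
    for i in range(len(padded) - n + 1):
        p *= cond_prob(ngrams, contexts, tuple(padded[i:i + n]))
    return p


corpus = [["i", "want", "tea"], ["i", "want", "to", "go"]]
ng, ctx = train_ngram(corpus, n=2)
print(cond_prob(ng, ctx, ("want", "tea")))          # → 0.5
print(sentence_prob(ng, ctx, ["i", "want", "tea"]))  # → 0.5
print(sentence_prob(ng, ctx, ["i", "want", "go"]))   # → 0.0
```

In this toy corpus "i want go" scores 0 because the bigram (want, go) never occurs. One way to read the "missing paths" problem is exactly this: a sparse corpus leaves plausible word transitions unseen, which is what corpus expansion aims to fill.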
Embodiment 2
[0067] Referring to Figure 2, which is a flow chart of a corpus expansion method according to an embodiment of the present invention, the method may specifically include the following steps:
[0068] Step 201: train an n-gram language model and a neural network language model using the first corpus data, where the first corpus data is sparse corpus data.
[0069] This step is the same as step 101 and will not be described in detail here.
[0070] Step 202: sort the predicted word data by the occurrence probability of each word in it.
[0071] In practical applications, given any input word, or a sentence composed of multiple words, a trained neural network language model can compute the probability distribution of the words that may appear after that word or phrase. For example, if a phrase such as "I want today" is input as the starting words, the language model assigns higher probabilities to words that are likely to appear next, and gives...
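Step 202 can be sketched as follows, assuming the neural network language model exposes raw scores (logits) for candidate next words: a softmax turns them into the probability distribution described in [0071], and the words are then sorted by probability, highest first. The candidate words, their scores, and the function names are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw next-word scores into probabilities (max-shifted for stability)."""
    m = max(logits.values())
    exps = {w: math.exp(s - m) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def rank_predictions(probs: Dict[str, float],
                     top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Sort predicted words by probability, highest first; optionally keep the top k."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scores for words following the prefix "I want today".
logits = {"to": 2.0, "go": 1.0, "eat": 0.5, "blue": -1.0}
probs = softmax(logits)
print([w for w, _ in rank_predictions(probs)])           # → ['to', 'go', 'eat', 'blue']
print([w for w, _ in rank_predictions(probs, top_k=2)])  # → ['to', 'go']
```

Because softmax is monotonic, sorting by probability gives the same order as sorting by raw score; the probabilities matter when a later step needs comparable values, for example a threshold.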
Embodiment 3
[0109] Refer to Figure 3, which is a structural block diagram of a corpus expansion device according to an embodiment of the present invention.
[0110] The language model training module 301 is configured to train an n-gram language model and a neural network language model using the first corpus data, where the first corpus data is sparse corpus data;
[0111] The second corpus data generation module 302 is configured to use the neural network language model to predict the word data following each character or word in the first corpus data, and to generate second corpus data;
[0112] The third corpus data generation module 303 is configured to input the second corpus data into the n-gram language model and to generate third corpus data after filtering;
[0113] The first corpus data updating module 304 is configured to add the third corpus data to the first corpus data to generate updated first corpus data.
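To show how modules 302 to 304 pass data to one another, here is a hedged sketch in which the two trained models from module 301 are represented by plain callables. The prefix-extension strategy in generate_second_corpus, the score threshold used for filtering, and all function names are assumptions; the source does not specify how candidates are generated or what criterion module 303 applies.

```python
from typing import Callable, List, Tuple

Sentence = List[str]
Predictor = Callable[[Sentence], List[Tuple[str, float]]]  # prefix -> ranked (word, prob)
Scorer = Callable[[Sentence], float]                       # sentence -> n-gram score


def generate_second_corpus(first: List[Sentence], predict: Predictor,
                           top_k: int = 2) -> List[Sentence]:
    """Module 302 (sketch): extend each prefix of each sentence with top-k predicted words."""
    candidates: List[Sentence] = []
    for sent in first:
        for i in range(1, len(sent) + 1):
            prefix = sent[:i]
            for word, _ in predict(prefix)[:top_k]:
                cand = prefix + [word]
                if cand not in first and cand not in candidates:
                    candidates.append(cand)
    return candidates


def filter_third_corpus(second: List[Sentence], score: Scorer,
                        threshold: float) -> List[Sentence]:
    """Module 303 (sketch): keep candidates whose n-gram score reaches the threshold."""
    return [s for s in second if score(s) >= threshold]


def update_first_corpus(first: List[Sentence], third: List[Sentence]) -> List[Sentence]:
    """Module 304: append filtered sentences that are not already in the corpus."""
    return first + [s for s in third if s not in first]


def expand_corpus(first, predict, score, threshold, top_k=2):
    second = generate_second_corpus(first, predict, top_k)
    third = filter_third_corpus(second, score, threshold)
    return update_first_corpus(first, third)


# Hypothetical stand-ins for the trained models.
def demo_predict(prefix: Sentence) -> List[Tuple[str, float]]:
    if prefix[-1] == "want":
        return [("tea", 0.6), ("coffee", 0.3), ("rocks", 0.1)]
    return []


def demo_score(sentence: Sentence) -> float:
    return 0.1 if "rocks" in sentence else 0.9


result = expand_corpus([["i", "want", "tea"]], demo_predict, demo_score,
                       threshold=0.5, top_k=3)
print(result)  # → [['i', 'want', 'tea'], ['i', 'want', 'coffee']]
```

Keeping the models behind callables lets the same loop run with any real neural language model and n-gram scorer, and the updated first corpus can be fed back into module 301 for another round.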
[0114] Refer to Figure 4, which is a schematic diagram of the relationship between modules in t...