Method and device for constructing language model
Relates to language models and language technology, applied in natural language data processing, speech analysis, speech recognition, etc. It can solve problems such as the increased difficulty of obtaining dictionaries, inaccurate language model recognition results, and the increased difficulty of language models, to achieve the effect of increasing the number
Examples
Embodiment 1
[0041] This embodiment provides a method for constructing a language model; see Figure 1. The specific process of the method provided in this embodiment is as follows:
[0042] 101: Obtain data samples, classify and mine the sentences in the data samples, and use the mined sentences as the result of data mining;
[0043] 102: Obtain classification training samples according to the results of data mining, and construct a text classifier based on the classification training samples;
[0044] Further, obtaining classification training samples according to the results of data mining includes: performing frequency statistics on the mined sentences and sorting the mined sentences by frequency; and selecting classification training samples from the mined sentences according to the sorting results.
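The frequency-based selection in [0044] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the mined sentences arrive as a dict mapping each category to a list of sentences (duplicates allowed), and the cut-off `top_k` is a hypothetical parameter, since the source only says samples are selected "according to the sorting results".

```python
from collections import Counter


def select_training_samples(mined, top_k):
    """Pick the top_k most frequent mined sentences for each category.

    mined: dict mapping a category name to the list of sentences mined for it.
    top_k: hypothetical cut-off, not specified in the source.
    """
    samples = {}
    for category, sentences in mined.items():
        # Frequency statistics over the mined sentences of this category.
        counts = Counter(sentences)
        # Sort by descending frequency; ties are broken alphabetically so the
        # result is deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        samples[category] = [sentence for sentence, _ in ranked[:top_k]]
    return samples


if __name__ == "__main__":
    mined = {
        "weather": ["will it rain today", "will it rain today", "how hot is it", "is it windy"],
        "music": ["play a song", "play a song", "play a song", "next track"],
    }
    print(select_training_samples(mined, top_k=2))
```

Keeping the most frequent sentences favours typical phrasings of each category, which is one plausible reason to sort by frequency before selecting samples.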
[0045] Further, constructing a text classifier based on the classification training samples includes: performing TF-IDF (Term Frequency-Inverse Document Frequency) ...
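Paragraph [0045] is truncated, so exactly how TF-IDF feeds the classifier is not stated. The sketch below is one assumed design: a nearest-centroid classifier over TF-IDF vectors, written in pure Python. Whitespace tokenization is also an assumption (Chinese text would first need word segmentation), and the class name `CentroidTfidfClassifier` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for whitespace-tokenized docs, plus the idf table."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf so that terms occurring in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidTfidfClassifier:
    """Hypothetical classifier: assigns a sentence to the closest category centroid."""

    def fit(self, samples):
        """samples: dict mapping category -> list of training sentences."""
        docs, labels = [], []
        for category, sentences in samples.items():
            docs.extend(sentences)
            labels.extend([category] * len(sentences))
        vectors, self.idf = tfidf_vectors(docs)
        counts = Counter(labels)
        sums = {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, {})
            for t, w in vec.items():
                acc[t] = acc.get(t, 0.0) + w
        # Centroid = mean TF-IDF vector of each category's training sentences.
        self.centroids = {
            label: {t: w / counts[label] for t, w in acc.items()}
            for label, acc in sums.items()
        }
        return self

    def predict(self, sentence):
        tokens = sentence.split()
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Terms unseen in training have no idf entry and are ignored; if no term
        # is known, every score is 0 and the first category is returned.
        vec = {t: (c / total) * self.idf[t] for t, c in tf.items() if t in self.idf}
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))
```

Usage: `CentroidTfidfClassifier().fit(samples).predict("will it rain tomorrow")`, where `samples` could be the output of a frequency-based selection step. A production system would more likely use a library vectorizer and a trained model such as naive Bayes or an SVM.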
Embodiment 2
[0091] Example two
[0092] This embodiment provides a method for constructing a language model; see Figure 2. The specific process of the method provided in this embodiment is as follows:
[0093] 201: Obtain data samples, classify and mine the sentences in the data samples, and use the mined sentences as the result of data mining;
[0094] For this step, each domain is regarded as a category: the acquired data samples are classified and mined by domain, and the sentences mined for each category are taken as the data mining result of that category. Specific methods for obtaining data samples include, but are not limited to, using web crawling technology to crawl articles or paragraphs in various domains on the Internet and using the crawled articles or paragraphs as the obtained data samples. This embodiment also does not limit the specific principle by which the data are classified into domains. For example, the field is...
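Treating each domain as a category, as [0094] describes, amounts to grouping mined sentences under their domain label. The sketch below assumes the crawler already yields `(domain, text)` pairs; crawling itself is out of scope, and the naive punctuation-based sentence splitter (ASCII and full-width Chinese marks) is an illustrative assumption, not the source's method.

```python
import re
from collections import defaultdict

# Split right after sentence-final punctuation (ASCII and full-width Chinese marks).
# Naive by design: abbreviations such as "e.g." or decimals will be split too.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def mine_sentences_by_domain(crawled):
    """crawled: iterable of (domain, text) pairs, e.g. articles fetched by a crawler.

    Returns a dict mapping domain -> list of sentences; each domain is one category.
    """
    mined = defaultdict(list)
    for domain, text in crawled:
        for sentence in _SENTENCE_END.split(text):
            sentence = sentence.strip()
            if sentence:
                mined[domain].append(sentence)
    return dict(mined)
```

Splitting on empty matches with `re.split` requires Python 3.7 or later.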
Embodiment 3
[0161] This embodiment provides a device for constructing a language model, which is used to execute the method for constructing a language model provided in Embodiment 1 or Embodiment 2 above; see Figure 5. The device includes:
[0162] The first acquisition module 501 is configured to obtain data samples;
[0163] The first mining module 502 is configured to classify and mine sentences in the data samples, and to use the mined sentences as the result of data mining;
[0164] The second acquisition module 503 is configured to acquire classification training samples according to the results of data mining;
[0165] The first construction module 504 is configured to construct a text classifier based on the classification training samples;
[0166] The classification module 505 is configured to classify the data samples using the text classifier;
[0167] The third acquisition module 506 is used to acquire the classification vocabulary and classification corpus according to the classification result;
[0168] The...
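The role of the third acquisition module 506 can be illustrated with a small sketch that turns classifier output into a per-category corpus and vocabulary. It assumes the classification result is a sequence of `(sentence, category)` pairs and that words are whitespace-separated; `build_classified_resources` and the `min_count` threshold are hypothetical names, not taken from the source.

```python
from collections import Counter, defaultdict


def build_classified_resources(classified, min_count=1):
    """classified: iterable of (sentence, category) pairs output by the text classifier.

    Returns (corpora, vocabularies):
      corpora maps category -> list of sentences assigned to it;
      vocabularies maps category -> sorted words seen at least min_count times.
    min_count is a hypothetical frequency threshold.
    """
    corpora = defaultdict(list)
    word_counts = defaultdict(Counter)
    for sentence, category in classified:
        corpora[category].append(sentence)
        word_counts[category].update(sentence.split())
    vocabularies = {
        category: sorted(w for w, c in counts.items() if c >= min_count)
        for category, counts in word_counts.items()
    }
    return dict(corpora), vocabularies
```

A per-category corpus and vocabulary of this kind is the natural input for training one language model per category, though the truncated text does not show how the device uses them downstream.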