Method and device for constructing language model

A language model construction technology, applied in natural language data processing, speech analysis, speech recognition, etc. It addresses problems such as the increased difficulty of obtaining dictionaries, inaccurate language model recognition results, and the increased difficulty of building language models, to achieve the effect of increasing the number...

Active Publication Date: 2016-11-23
TENCENT TECH (SHENZHEN) CO LTD +1
Cites: 3; Cited by: 0
AI Technical Summary

Problems solved by technology

[0005] Because the initial language training samples are obtained by data mining a dictionary, making them cover more domains requires enlarging the dictionary so that it covers more domains. This makes the dictionary harder to obtain, which in turn makes the language model harder to build.
[0006] In addition, when the initial language training samples are expanded by fixed-weight interpolation for each vertical field, the interpolated training samples rarely include the professional terms, uncommon words, and rare words of that vertical field. As a result, the language model produces inaccurate recognition results for that vertical field, which reduces the accuracy of speech recognition.


Examples

Embodiment 1

[0041] This embodiment provides a method for constructing a language model; see Figure 1. The specific process of the method provided in this embodiment is as follows:

[0042] 101: Obtain data samples, classify and mine the sentences in the data samples, and use the mined sentences as the result of data mining;

[0043] 102: Obtain classification training samples according to the results of data mining, and construct a text classifier based on the classification training samples;

[0044] Further, obtaining classification training samples according to the results of data mining includes: performing frequency statistics on the mined sentences, sorting the mined sentences by frequency, and selecting classification training samples from the mined sentences according to the sorting results.
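
The frequency-based selection in [0044] can be sketched roughly as follows. This is an illustrative sketch only, not the patent's implementation: the function name `select_training_samples`, the `top_k` cutoff, and the tie-breaking rule are assumptions, since the source does not say how many sentences are kept or how ties are ordered.

```python
from collections import Counter


def select_training_samples(mined_sentences, top_k):
    """Count each mined sentence, sort by descending frequency,
    and keep the top_k most frequent distinct sentences.

    top_k is a hypothetical cutoff; the source does not specify one.
    """
    counts = Counter(mined_sentences)
    # Ties are broken alphabetically so the result is deterministic (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [sentence for sentence, _ in ranked[:top_k]]


mined = [
    "play some music",
    "what is the weather today",
    "play some music",
    "navigate to the airport",
    "what is the weather today",
    "play some music",
]
print(select_training_samples(mined, top_k=2))
# ['play some music', 'what is the weather today']
```

In practice the selection would be run separately for each category, so that each category contributes its own most frequent sentences as training samples.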

[0045] Further, constructing a text classifier based on the classification training sample includes: performing tf-idf (Term Frequency-Inverse Document Frequency, term freque...
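
Paragraph [0045] is cut off in this record, so the exact feature computation is not visible. As a hedged illustration of what TF-IDF weighting generally looks like, the sketch below uses one common textbook definition (term frequency times the logarithm of the inverse document frequency). The function name `tfidf_vectors`, the whitespace-free pre-tokenized input, and the absence of smoothing are all assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * log(N / document frequency).
    This is a common textbook variant, assumed here for illustration.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


samples = [
    ["play", "music"],
    ["play", "video"],
    ["weather", "today"],
]
vectors = tfidf_vectors(samples)
# "music" occurs in one sample and "play" in two, so "music" is weighted higher.
print(vectors[0]["music"] > vectors[0]["play"])
```

The resulting weight dictionaries could serve as feature vectors for whichever text classifier is trained on the classification training samples.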

Embodiment 2

[0092] This embodiment provides a method for constructing a language model; see Figure 2. The specific process of the method provided in this embodiment is as follows:

[0093] 201: Obtain data samples, classify and mine the sentences in the data samples, and use the mined sentences as the result of data mining;

[0094] In this step, each domain is treated as a category, the acquired data samples are classified and mined by domain, and the sentences mined for each category are taken as the data mining result for that category. Specific methods for obtaining data samples include, but are not limited to, using web crawling technology to crawl articles or paragraphs in various fields on the Internet and using the crawled articles or paragraphs as the obtained data samples. This embodiment also does not limit the specific principle by which fields are classified. For example, the field is...

Embodiment 3

[0161] This embodiment provides a device for constructing a language model, which is used to execute the method for constructing a language model provided in Embodiment 1 or Embodiment 2 above; see Figure 5. The device includes:

[0162] The first acquisition module 501 is used to obtain data samples;

[0163] The first mining module 502 is used to classify and mine sentences in the data samples, and use the mined sentences as the result of data mining;

[0164] The second acquisition module 503 is configured to acquire classification training samples according to the results of data mining;

[0165] The first construction module 504 is configured to construct a text classifier based on the classification training samples;

[0166] The classification module 505 is used to classify data samples through a text classifier;

[0167] The third acquisition module 506 is used to acquire the classification vocabulary and classification corpus according to the classification result;

[0168] The...

Abstract

A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples.
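
The abstract does not spell out how the high-frequency language templates are mined. One plausible reading, sketched below purely as an assumption, is that words from a category's class vocabulary are replaced with a slot label and the resulting patterns are counted, keeping those that recur. The function name `mine_templates`, the `<ENTITY>` slot label, the `min_count` threshold, and single-word whitespace tokenization are all hypothetical.

```python
from collections import Counter


def mine_templates(corpus, class_vocabulary, slot="<ENTITY>", min_count=2):
    """Replace class-vocabulary words with a slot label and return the
    templates that occur at least min_count times, most frequent first.

    Assumes single-word vocabulary entries and whitespace tokenization.
    """
    vocab = set(class_vocabulary)
    counts = Counter()
    for sentence in corpus:
        template = [slot if token in vocab else token for token in sentence.split()]
        # Only sentences that contain a vocabulary word yield a template.
        if slot in template:
            counts[" ".join(template)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


music_corpus = ["play yesterday", "play imagine", "stop imagine", "play loudly"]
music_vocabulary = ["yesterday", "imagine"]
print(mine_templates(music_corpus, music_vocabulary))
# [('play <ENTITY>', 2)]
```

Multi-word vocabulary entries and languages without whitespace word boundaries would need a proper segmenter and phrase matching, which this sketch leaves out.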

Description

Technical field

[0001] The present invention relates to the technical field of speech recognition, and in particular to a method and device for constructing a language model.

Background technique

[0002] With the development of speech recognition technology and mobile Internet technology in recent years, speech recognition has been applied ever more widely. To implement speech recognition, a speech recognition decoder is usually used to decode the speech data, and during decoding the decoder uses an acoustic model and a language model to convert speech to text. How the language model is constructed is therefore key to improving the accuracy of speech recognition.

[0003] At present, a language model is built by data mining a dictionary to obtain initial language training samples covering multiple fields, and the language training samples are used for training to obtain a la...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G10L15/06, G10L15/08, G06F40/00
CPC: G06F40/284, G06F40/289, G10L15/063, G10L15/183
Inventor: 饶丰, 卢鲤, 陈波, 张翔, 岳帅, 李露
Owner: TENCENT TECH (SHENZHEN) CO LTD