Method and apparatus for determining language model

A technique for determining a language model, applicable to natural language data processing, speech analysis, speech recognition, and related fields. It addresses problems such as low corpus performance.

Active Publication Date: 2019-02-26
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0011] Embodiments of the present invention provide a method and a device for determining a language model, to at least solve the technical problem that corpus performance is low because prior-art language models are obtained only by piling up corpora.



Examples


Embodiment 1

[0028] An embodiment of the present invention provides a system for determining a language model. Figure 1 is a schematic diagram of such a system. As shown in Figure 1, the language model determining system 100 includes an input device 102, a processor 104, and an output device 106.

[0029] The input device 102 is configured to input a first corpus to the processor 104, wherein the first corpus is a language text selected in a preset context.

[0030] The corpus serves as the training set for the language model to be used. It can come from various everyday sources, such as corpus from information annotation, corpus from web crawling, corpus from open source libraries, and valid corpus in a particular field provided by users, so its sources are broad and its volume is large. The corpus may correspond to a number of task fields. The first corpus may be ...

Embodiment 2

[0054] According to an embodiment of the present invention, an embodiment of a method for determining a language model is provided. Note that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in a different order from the one shown or described here.

[0055] Figure 2 is a flowchart of a method for determining a language model according to an embodiment of the present invention. As shown in Figure 2, the method includes the following steps:

[0056] Step S21, acquiring the first corpus, wherein the first corpus is the language text selected in the preset context.

[0057] In the above step, the corpus serves as the training set for the language model to be used and can come from various everyday sources, such as corpu...

Embodiment 3

[0132] According to an embodiment of the present invention, a device for determining a language model, which implements the above method, is also provided. Figure 4 is a schematic diagram of such a device. As shown in Figure 4, the language model determining device 400 includes a first acquiring module 402, a first training module 404, and a processing module 406, wherein:

[0133] The first acquiring module 402 is configured to acquire the first corpus, wherein the first corpus is a language text selected in a preset context.

[0134] The first training module 404 is configured to obtain a first language model by training on the first corpus.

[0135] The processing module 406 is configured to use the first language model to screen the target text to obtain a second corpus, and obtain the second language model by training the second corpus, wherein the tar...
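
The three modules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the add-one-smoothed unigram model, the perplexity threshold used for screening, the toy data, and all names (`train_unigram_lm`, `perplexity`, `screen`, `max_ppl`) are introduced here for illustration. The text above does not say which model type or screening criterion is used.

```python
import math
from collections import Counter


def train_unigram_lm(corpus):
    """Train an add-one-smoothed unigram model (an assumed, deliberately simple model type)."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab)

    return prob


def perplexity(lm, sentence):
    """Per-token perplexity of a sentence under the model; lower means a closer match."""
    tokens = sentence.split()
    if not tokens:
        return float("inf")
    log_prob = sum(math.log(lm(tok)) for tok in tokens)
    return math.exp(-log_prob / len(tokens))


def screen(lm, target_text, max_ppl):
    """Keep only target sentences whose perplexity is at most max_ppl (hypothetical criterion)."""
    return [s for s in target_text if perplexity(lm, s) <= max_ppl]


# First acquiring module: in-domain text selected in a preset context (toy data).
first_corpus = ["play the next song", "pause the song", "play some music"]
# First training module: train the first language model.
first_lm = train_unigram_lm(first_corpus)
# Processing module: screen the target text, then train the second model on what survives.
target_text = ["play the song again", "quarterly revenue rose sharply"]
second_corpus = screen(first_lm, target_text, max_ppl=10.0)
second_lm = train_unigram_lm(second_corpus)
```

With this toy data the in-domain sentence passes the threshold and the off-domain one does not, so the second model is trained only on text that resembles the first corpus. A real system would likely use a stronger model and a tuned threshold.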


Abstract

The invention discloses a method and a device for determining a language model. The method comprises the following steps: acquiring a first corpus, wherein the first corpus is a language text selected in a preset context; obtaining a first language model by training on the first corpus; and using the first language model to filter the target text to obtain a second corpus, and obtaining a second language model by training on the second corpus, wherein the target text is retrieved using a keyword set extracted from the first corpus as an index. The invention solves the technical problem that corpus performance is low because prior-art language models are obtained only by piling up corpora.
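
The retrieval step, in which a keyword set extracted from the first corpus serves as the index into a larger text collection, might look like the sketch below. The frequency-based keyword choice, the stopword list, the inverted index, the toy data, and every name (`extract_keywords`, `build_index`, `retrieve`, `STOPWORDS`) are assumptions for illustration. The abstract does not say how keywords are extracted or how the index is implemented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "to", "of", "some"}  # hypothetical stopword list


def extract_keywords(corpus, top_k):
    """Pick the top_k most frequent non-stopword tokens (an assumed extraction rule)."""
    counts = Counter(
        tok for sent in corpus for tok in sent.lower().split() if tok not in STOPWORDS
    )
    return {tok for tok, _ in counts.most_common(top_k)}


def build_index(documents):
    """Inverted index mapping each token to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for tok in doc.lower().split():
            index[tok].add(doc_id)
    return index


def retrieve(index, documents, keywords):
    """Return, in original order, the documents that contain at least one keyword."""
    hits = set()
    for kw in keywords:
        hits |= index.get(kw, set())
    return [documents[i] for i in sorted(hits)]


first_corpus = ["play the next song", "pause the song", "play some music"]
large_collection = [
    "please play that song again",
    "quarterly revenue rose sharply",
    "the song was released in spring",
]
keywords = extract_keywords(first_corpus, top_k=2)
target_text = retrieve(build_index(large_collection), large_collection, keywords)
```

Keyword retrieval is a coarse filter: the third sentence is retrieved because it contains "song" even though it is not a playback command, which is why the method then screens the retrieved target text with the first language model.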

Description

Technical field

[0001] The present invention relates to the field of language models, and in particular to a method and a device for determining a language model.

Background

[0002] At present, the language model is an important link in the speech recognition process, and even in natural language understanding, and it has a profound impact on speech recognition performance. However, performance is very sensitive to how well the corpus matches the target data. For example, in a specific domain, whether the corpus matches will seriously limit the performance of the language model, and thereby the performance of the entire system.

[0003] Traditional language model training often relies on piling up corpus. When corpus is insufficient, the quantity of corpus affects language model performance far more than its quality does. When the amount of corpu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9535, G06F16/955, G06F16/35, G06F17/27
CPC: G10L15/063, G06F40/216
Inventors: 郑昊, 鄢志杰
Owner: ALIBABA GRP HLDG LTD