
Language model obtaining method and device

A language model acquisition method and related technology, applied to semantic tool creation, unstructured text data retrieval, and similar fields, that addresses problems such as low language model performance.

Active Publication Date: 2019-07-16
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0011] Embodiments of the present invention provide a method and device for acquiring a language model that at least solve the technical problem of low language model performance.



Examples


Embodiment 1

[0024] An embodiment of the present invention provides a language model acquisition system. Figure 1 is a schematic diagram of a language model acquisition system according to an embodiment of the present invention. As shown in Figure 1, the language model acquisition system 100 includes an input device 102, a processor 104, and an output device 106.

[0025] The input device 102 is configured to input a first corpus and a second corpus to the processor 104, where the first corpus is randomly collected language text and the second corpus is language text selected under a preset context.

[0026] A corpus, that is, a body of language text, can be randomly collected or selected under a preset context. The training set corpus of the language model to be used can be drawn from many sources and channels in daily life and can cover many aspects of life, for example, information...

Embodiment 2

[0050] According to an embodiment of the present invention, an embodiment of a method for acquiring a language model is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

[0051] Figure 2 is a flowchart of a method for acquiring a language model according to the present invention. As shown in Figure 2, the method for acquiring the language model includes the following steps:

[0052] Step S202, acquiring the first corpus and the second corpus.

[0053] In the technical solution provided by the above step S202 of the present invention, the first corpus and the second corpus are acquired, wherein the first corpus is a randomly collected lang...

Embodiment 3

[0100] The technical solutions of the present invention are described below in conjunction with preferred embodiments. Specifically, in this example the first corpus is the training set corpus, the second corpus is the development set corpus, the first language model is the development set language model, the second language model is the base language model, and the third language model is the screening language model.

[0101] Figure 3 is a schematic flowchart of a method for acquiring a language model according to an embodiment of the present invention. As shown in Figure 3, the method for acquiring the language model includes the following steps:

[0102] Step S301, acquiring training set corpus.

[0103] In the process of acquiring the language model, a large amount of text corpus from various sources is collected and organized as the training set corpus. The training set corpus consists of randomly collected language texts.

[0104] Step S302, acquiring development set corpus...
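The remaining steps are truncated in this excerpt. As a rough sketch only, the procedure summarized in the abstract can be illustrated with the role names of this embodiment (training set, development set, development set language model, base language model, screening language model). Everything concrete below is an assumption made for illustration and is not taken from the patent: the toy sentences, the add-one smoothed unigram model standing in for a real n-gram model, the threshold value, the interpolation weight, and the helper names `train_unigram`, `perplexity`, and `interpolate`.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model (a toy stand-in for a real language model)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, sentence):
    """Per-token perplexity of one sentence under a unigram model."""
    tokens = sentence.split()
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def interpolate(model_a, model_b, weight):
    """Linear interpolation (assumed form): weight * model_a + (1 - weight) * model_b."""
    return {w: weight * model_a[w] + (1 - weight) * model_b[w] for w in model_a}


# Hypothetical toy corpora: the training set is "randomly collected" text,
# the development set is text selected for a preset context (music control).
training_set = [
    "play some jazz music",
    "the stock market fell today",
    "turn up the music volume",
    "rain is expected tomorrow",
]
development_set = ["play music", "turn the volume up", "play some music"]
vocab = {tok for s in training_set + development_set for tok in s.split()}

# Development set language model (first language model).
dev_lm = train_unigram(development_set, vocab)

# Screen the training set: keep sentences whose perplexity is below the threshold.
threshold = 15.0  # assumed value, chosen only for this toy data
screened = [s for s in training_set if perplexity(dev_lm, s) < threshold]

# Base language model (second) and screening language model (third).
base_lm = train_unigram(training_set, vocab)
screening_lm = train_unigram(screened, vocab)

# Language model to be used: interpolation of the base and screening models.
final_lm = interpolate(base_lm, screening_lm, weight=0.5)
```

On this toy data the two music-related sentences pass the screen, so in-domain words such as "music" receive more probability in `final_lm` than in `base_lm`. In practice the threshold and the interpolation weight would need to be tuned, for example on held-out in-domain text.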


Abstract

The invention discloses a method and device for acquiring a language model. The method comprises: acquiring a first corpus and a second corpus, wherein the first corpus is randomly collected language text and the second corpus is language text selected under a preset context; computing perplexity (confusion degree) on the first corpus using a first language model trained on the second corpus, so as to screen out a third corpus whose perplexity with respect to the second corpus is smaller than a preset threshold; and interpolating a second language model trained on the first corpus with a third language model trained on the third corpus to obtain the language model to be used. This solves the technical problem of low language model performance.
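
The perplexity (confusion degree) screening and the interpolation described above can be written compactly. This is a sketch under assumptions rather than the patent's own notation: the symbols $C_1$, $C_3$, $P_1$, $P_2$, $P_3$, $\theta$, and $\lambda$ are introduced here for illustration, perplexity is taken in its standard per-token form, and the interpolation is assumed to be linear because the abstract does not state its form. Here $s = w_1 \dots w_N$ is a sentence of the first corpus $C_1$, $P_1$ is the first language model (trained on the second corpus), $\theta$ is the preset threshold, and $h$ is the word history.

```latex
\begin{aligned}
\mathrm{PPL}_{1}(s) &= \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log P_{1}(w_i \mid w_1,\dots,w_{i-1})\right) \\
C_3 &= \{\, s \in C_1 : \mathrm{PPL}_{1}(s) < \theta \,\} \\
P(w \mid h) &= \lambda\,P_{2}(w \mid h) + (1-\lambda)\,P_{3}(w \mid h), \qquad 0 \le \lambda \le 1
\end{aligned}
```

A lower $\mathrm{PPL}_{1}(s)$ means the sentence looks more like the preset-context text, so $C_3$ collects the part of the randomly collected corpus that best matches the target context.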

Description

Technical field

[0001] The present invention relates to the field of language models, and in particular to a method and device for acquiring a language model.

Background

[0002] At present, the language model is an important part of the overall speech recognition process, and even of natural language understanding, and it has a profound impact on speech recognition performance. However, the language model is very sensitive to how well the corpus data matches. For example, for a specific domain, whether the corpus matches will seriously restrict the performance of the language model, and thereby the performance of the entire system.

[0003] Traditional language model training often simply piles up corpus data. When the corpus is insufficient, the impact of corpus quantity on language model performance far exceeds the impact of corpus quality. When the amount of corpus c...


Application Information

IPC(8): G06F16/36
CPC G06F16/36
Inventor 郑昊, 唐璐
Owner ALIBABA GRP HLDG LTD