Language model training method, query method and corresponding device

A language model training technology, applied in speech analysis, speech recognition, instruments, and related fields, addressing problems such as large training corpora, slow training speed, and the resulting difficulty of rapidly updating the language model of a voice search system.

Active Publication Date: 2014-06-18
BEIJING BAIDU NETCOM SCI & TECH CO LTD

Problems solved by technology

However, this method can only train the language model on the training corpus in a serial manner. When the training corpus is large, or the language model itself is too large, training becomes slow, which prevents rapid updating of the voice search system's language model.

Examples

Embodiment 1

[0072] Figure 1 is a flow chart of the language model training method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method includes the following steps:

[0073] Step 101: Divide the training corpus into blocks to obtain N sets of training corpus, where N is a positive integer greater than 1.

[0074] To improve the update speed of the language model, the embodiment of the present invention changes the original serial processing of the training corpus into parallel processing. The training corpus is therefore first divided into blocks to obtain multiple groups of training corpus, so that the groups can be processed in parallel.
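As a minimal sketch of this blocking-and-parallelism step (the names split_corpus, train_on_block, and parallel_train are hypothetical, the use of Python's multiprocessing is an assumption for illustration, and train_on_block merely stands in for the per-block suffix sorting and tree building the patent describes):

```python
from multiprocessing import Pool

def split_corpus(sentences, n):
    """Divide the corpus into n roughly equal groups (any strategy works)."""
    size = max(1, (len(sentences) + n - 1) // n)  # ceiling division
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def train_on_block(block):
    # Placeholder: in the patent, each block is sorted via a recursive
    # suffix tree and then used to build n-ary word order trees.
    return {"num_sentences": len(block)}

def parallel_train(sentences, n=4):
    blocks = split_corpus(sentences, n)
    with Pool(processes=n) as pool:
        # The N groups are processed in parallel, one worker per block.
        return pool.map(train_on_block, blocks)
```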

[0075] Here, the division of the training corpus can be performed according to any strategy, as long as the training corpus can be divided into N groups. In addition, the training corpus used in this step can be the user input information of all time periods in the search text duri...

Embodiment 2

[0119] Figure 3 is a flow chart of the language model query method provided by Embodiment 2 of the present invention. As shown in Figure 3, the query method includes the following steps:

[0120] Step 301: Obtain the word sequence to be queried, and execute step 302 with the word sequence to be queried as the currently input word sequence.

[0121] Step 302: Adjust the currently input word sequence into a preset word order structure, so that the adjusted word sequence is ordered as follows: the penultimate word, the last word, and then the other words of the currently input word sequence arranged in reverse order.

[0122] The word order adjustment in this step makes the input word sequence match the word order structure of the Trie tree that stores the probability information.
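A minimal sketch of the reordering described in step 302 (the function name is hypothetical; the ordering follows the description above):

```python
def adjust_word_order(words):
    """Reorder a word sequence to match the Trie's preset structure:
    penultimate word, last word, then the remaining words reversed."""
    if len(words) < 2:
        return list(words)  # behavior for short sequences is an assumption
    return [words[-2], words[-1]] + list(reversed(words[:-2]))

# Example: ["w1", "w2", "w3", "w4", "w5"] -> ["w4", "w5", "w3", "w2", "w1"]
```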

[0123] Step 303: Query the adjusted word sequence on the Trie tree storing forward probability information obtained in Embodiment 1.
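As an illustration of step 303, a dictionary-based Trie walk over the adjusted sequence (the node layout and field names are assumptions; the excerpt only specifies that the Trie stores forward probability information):

```python
class TrieNode:
    # Hypothetical node layout: children keyed by word, with a forward
    # probability stored at nodes where a stored word sequence ends.
    def __init__(self):
        self.children = {}
        self.forward_prob = None

def lookup(root, adjusted_words):
    """Walk the Trie along the adjusted word sequence; return the stored
    forward probability, or None if the path does not exist."""
    node = root
    for word in adjusted_words:
        node = node.children.get(word)
        if node is None:
            return None
    return node.forward_prob
```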

[0124] Step 304: Determine whet...

Embodiment 3

[0137] Figure 4 is a structural diagram of the language model training device provided by Embodiment 3 of the present invention. As shown in Figure 4, the training device includes: a block processing unit 400, N recursive processing units 410, N word order tree building units 420, and a merge processing unit 430, where N is a positive integer greater than 1.

[0138] The block processing unit 400 divides the training corpus into blocks to obtain N groups of training corpus, and provides one group to each recursive processing unit 410.
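A rough sketch of how these units might be wired together (all names are assumptions and the per-unit work is stubbed; the excerpt does not disclose the units' internals):

```python
def sort_block(block):
    # Stub for a recursive processing unit 410: in the patent this sorts
    # a recursive suffix tree for its group of training corpus.
    return block

def build_word_order_trees(sorted_block):
    # Stub for a word order tree building unit 420.
    return {"tree_size": len(sorted_block)}

def train_device(sentences, n):
    # Block processing unit 400: divide the corpus into N groups.
    size = max(1, (len(sentences) + n - 1) // n)
    blocks = [sentences[i:i + size] for i in range(0, len(sentences), size)]
    # Units 410 and 420 operate on the N groups (in parallel in the device).
    trees = [build_word_order_trees(sort_block(b)) for b in blocks]
    # Merge processing unit 430 would merge same-root trees into one Trie.
    return trees
```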

[0139] In the embodiment of the present invention, the original serial processing of the training corpus is changed into parallel processing, so the block processing unit 400 first divides the training corpus to obtain multiple groups of training corpus, so that subsequent processing can be performed on the multiple groups in parallel. The training corpus used by the block pr...


Abstract

The invention provides a language model training method, a query method, and a corresponding device. The training method comprises the following steps: partitioning the training corpus to obtain N groups of training corpus, where N is a positive integer greater than 1; performing, in parallel on the N groups obtained by partitioning, recursive suffix tree sorting, so as to obtain for each group a sorting result reflecting the reversed-order position of each word in each sentence; based on the sorting results, building n-ary word order trees according to a preset first word order structure with the penultimate word of each sentence as the root node, where n is one or more preset positive integers greater than 1; and merging the word order trees with the same root node and converting the word order, so as to obtain a Trie tree storing forward probability information. The word order of the Trie tree from root to leaf is as follows: the penultimate word of the sentence, the last word, and then the other words arranged in reverse order. With the method and device, the language model can be updated quickly.

Description

【Technical field】 [0001] The invention relates to the technical field of speech recognition in computer applications, and in particular to a language model training method, a query method, and a corresponding device. 【Background art】 [0002] Speech recognition refers to enabling machines to accurately recognize the content of speech in different situations, so as to carry out various human intentions based on the recognized information, for example performing voice searches. At present, with the continuous development of speech recognition technology, statistical language models have been widely used in fields such as speech recognition, information retrieval, and spoken language understanding. For large-vocabulary continuous speech recognition, the language model is a critical link in the recognition system and directly affects the performance and recognition results of the entire system. [0003] In technical applications such as voice search, l...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/06
Inventors: 贾磊, 万广鲁
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD