Training method and system for language model

This language model training method and system addresses three problems of existing approaches: a language model trained on cut-down big data easily loses the original statistical distribution of that data, the speech recognition rate falls as a result, and training on the full data consumes a large amount of computing resources. The stated effects are a higher speech recognition rate, lower computing resource usage, and more reasonable model parameters.

Active Publication Date: 2015-04-29
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

Existing language model training methods usually train directly on the big data to obtain the language model. However, because the big data is so large, training on it directly not only consumes a large amount of hard disk space and memory but also takes a long time. In other words, existing language model training methods occupy a large amount of computing resources and are time-consuming.
[0005] To overcome these problems, some language model training methods cut down the big data and then train on the reduced data. However, a language model trained this way easily loses the original statistical distribution of the big data, which lowers the speech recognition rate.

Examples

Embodiment 1

[0054] Figure 2 shows a flow chart of the steps of Embodiment 1 of a language model training method of the present invention, which may specifically include the following steps:

[0055] Step 201: obtain a seed corpus for each domain;

[0056] In this embodiment of the present invention, a domain refers to an application scenario of the data; common domains include news, place names, website addresses, personal names, map navigation, chat, short messages, Q&A, and Weibo. In practice, the seed corpus for a specific domain can be obtained through specialized crawling or through cooperation. Cooperation may mean working with a website operator to obtain the seed corpus from the website's log files, for example from the log files of a microblog website. This embodiment of the present invention does not limit the specific method for obtaining the seed corpus...

Embodiment 2

[0121] Figure 3 shows a flow chart of the steps of Embodiment 2 of a language model training method of the present invention, which may specifically include the following steps:

[0122] Step 301: obtain a seed corpus for each domain, and train a seed model for the corresponding domain from each seed corpus;

[0123] Step 302: screen the big data corpus according to the vector space model of the seed corpus of each domain, and obtain the seed screening corpus of the corresponding domain;

[0124] Step 303: train on the seed screening corpus of each domain to obtain the screening model of the corresponding domain;

[0125] Step 304: fuse the screening models of all domains to obtain a corresponding screening fusion model;

[0126] Step 305: fuse the seed models of all domains to obtain a corresponding seed fusion model;

[0127] Step 306: fuse the screening fusion model and the seed fusion model to obtain a corresponding general ...
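
The steps above can be sketched as follows. The text here does not say how the models are trained or fused, so this sketch assumes unigram models and fusion by linear interpolation with hand-picked weights, and it supplies the screened corpora directly as a stand-in for the output of Step 302. All function names, weights, and sample sentences are hypothetical.

```python
from collections import Counter


def train_unigram(sentences):
    """Train a unigram model (word -> probability) from whitespace-tokenized text.

    Assumption: unigrams keep the sketch short; the source does not name a model order.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def fuse(models, weights):
    """Fuse models by linear interpolation: P(w) = sum_i weight_i * P_i(w).

    Assumption: interpolation is one common fusion technique, not necessarily
    the one used in the patent.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    fused = {}
    for model, weight in zip(models, weights):
        for word, prob in model.items():
            fused[word] = fused.get(word, 0.0) + weight * prob
    return fused


# Step 301: a seed corpus per domain, and a seed model trained from each.
seed_corpora = {
    "navigation": ["navigate to the airport"],
    "chat": ["how are you today"],
}
seed_models = {d: train_unigram(c) for d, c in seed_corpora.items()}

# Step 302 (stand-in): screened corpora are given here rather than computed.
screened_corpora = {
    "navigation": ["navigate to the train station"],
    "chat": ["how is your day"],
}

# Step 303: a screening model per domain.
screening_models = {d: train_unigram(c) for d, c in screened_corpora.items()}

# Steps 304 and 305: fuse across domains (equal weights are an assumption).
screening_fusion = fuse(list(screening_models.values()), [0.5, 0.5])
seed_fusion = fuse(list(seed_models.values()), [0.5, 0.5])

# Step 306: fuse the two fusion models (the 0.7 / 0.3 split is an assumption).
final_model = fuse([screening_fusion, seed_fusion], [0.7, 0.3])
```

Because each input model is a probability distribution and the weights sum to 1, the fused result is again a distribution. In practice the interpolation weights would typically be tuned on held-out data rather than fixed by hand.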

Abstract

An embodiment of the invention provides a training method and system for a language model. The method specifically comprises the following steps: acquiring a seed corpus for each domain; screening the big data corpus according to the vector space model of the seed corpus of each domain to obtain the seed screening corpus of the corresponding domain; training on the seed screening corpus of each domain to obtain the screening model of the corresponding domain; and fusing the screening models of all domains to obtain a corresponding screening fusion model. The disclosed training method and system can make the parameters of the language model more reasonable while reducing the computational burden and saving time.
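
As an illustration of the screening step, the following minimal Python sketch filters a big data corpus by similarity to a seed corpus in a bag-of-words vector space. The abstract does not specify the term weighting, the similarity measure, or the threshold, so raw term frequency, cosine similarity against a seed centroid, the function names, and the threshold of 0.3 are all assumptions made for the example, as are the sample sentences.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_corpus(seed_sentences, big_data_sentences, threshold):
    """Keep big-data sentences whose similarity to the seed centroid meets the threshold.

    Assumption: the seed corpus is represented by the sum of its sentence vectors.
    """
    centroid = Counter()
    for sentence in seed_sentences:
        centroid.update(tf_vector(sentence))
    return [
        sentence
        for sentence in big_data_sentences
        if cosine(tf_vector(sentence), centroid) >= threshold
    ]


# Hypothetical seed corpus for a "map navigation" domain.
seed_navigation = [
    "navigate to the nearest gas station",
    "navigate to the airport",
]

# Hypothetical big data corpus mixing several domains.
big_data = [
    "navigate to the train station",
    "stock prices fell sharply today",
    "what is the fastest route to the airport",
]

selected = screen_corpus(seed_navigation, big_data, threshold=0.3)
```

With these inputs the two navigation-related sentences pass the threshold and the finance sentence, which shares no terms with the seed corpus, is dropped. Running the same function once per domain yields one seed screening corpus per domain.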

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a language model training method and system.

Background

[0002] In the field of natural language processing, speech recognition is a technology in which a machine converts speech signals into the corresponding text or commands through a process of recognition and understanding.

[0003] A speech recognition system is essentially a pattern recognition system, usually comprising basic units such as feature extraction, pattern matching, and a reference model. Figure 1 shows a schematic structural diagram of an existing speech recognition system. The input speech signal is first analyzed by the feature extraction unit to form feature vectors, which then enter the word-level matching unit; the word-level matching unit uses the word models, formed by concatenating the set of subword models according to the dictionary, to recog...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27, G06F17/30
Inventor: 郑晓明, 李健, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD