System and iterative method for lexicon, segmentation and language model joint optimization

A language model and dictionary technology, applied in natural language data processing, speech analysis, speech recognition, etc. It addresses problems such as error-prone language models, limits on the accuracy and predictive properties of language models, and poor language model prediction quality.

Status: Inactive. Publication Date: 2002-12-25
MICROSOFT TECH LICENSING LLC
Cites: 0; Cited by: 22

AI Technical Summary

Problems solved by technology

Thus the model cannot accurately predict smaller words contained within a semantically acceptable larger string.
[0014] As a result of the above limitations, language models built on state-of-the-art dictionaries and segmentation algorithms tend to be error-prone.
That is, any errors made in the lexicon or segmentation stages propagate throughout the language model, limiting its accuracy and predictive properties.
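To make the first limitation concrete, the following Python sketch uses greedy forward maximum matching, a common dictionary-based segmenter. The function name `forward_max_match` and the toy lexicon are hypothetical illustrations, not taken from the patent. Once a longer string is in the lexicon, the greedy match always consumes it whole, so the smaller words inside it never surface as tokens and any n-gram statistics built on the output inherit that choice.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted so the loop makes progress.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


lexicon = {"new", "york", "newyork", "is", "big"}
print(forward_max_match("newyorkisbig", lexicon))
# -> ['newyork', 'is', 'big']   ("new" and "york" are never produced)
```

Backward (right-to-left) maximum matching is a common variant; because it is also greedy, it shares the same blind spot.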
[0015] Finally, limiting the model to at most two preceding words of context (as in a 3-gram language model) is also restrictive, since more context may be required to accurately predict the likelihood of a word.
Together, these three limitations usually lead to poor prediction quality for the language model.
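The context limit is visible directly in a maximum-likelihood trigram estimate, P(w3 | w1, w2) = c(w1 w2 w3) / c(w1 w2). The sketch below is a plain unsmoothed trigram counter; the class name `TrigramModel` and the toy corpus are illustrative assumptions, not from the patent. Any word earlier than w1 has no effect on the estimate.

```python
from collections import Counter


class TrigramModel:
    """Unsmoothed maximum-likelihood trigram model over tokenized sentences."""

    def __init__(self, sentences):
        self.tri = Counter()  # counts of (w1, w2, w3)
        self.ctx = Counter()  # counts of (w1, w2) used as a history
        for tokens in sentences:
            padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
            for k in range(2, len(padded)):
                self.tri[tuple(padded[k - 2:k + 1])] += 1
                self.ctx[tuple(padded[k - 2:k])] += 1

    def prob(self, w1, w2, w3):
        # Only the two preceding words condition the estimate.
        n = self.ctx[(w1, w2)]
        return self.tri[(w1, w2, w3)] / n if n else 0.0


corpus = [
    ["a", "loan", "from", "the", "bank", "was", "approved"],
    ["a", "fish", "near", "the", "bank", "was", "caught"],
]
model = TrigramModel(corpus)
# "loan" vs. "fish" lies outside the two-word window, so both outcomes tie.
print(model.prob("bank", "was", "approved"))  # 0.5
print(model.prob("bank", "was", "caught"))    # 0.5
```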

Embodiment Construction

[0026] The invention relates to a system and an iterative method for the joint optimization of dictionaries, segmentation and language models. In describing the present invention, reference is made to an innovative language model, the Dynamic Ordering Markov Model (DOMM). A detailed description of DOMM is given in co-pending U.S. Patent Application No. 09/XXXXXX, "A Method and Apparatus for Generating and Managing a Language Model Data Structure" by Lee et al., the disclosure of which is incorporated herein by reference.

[0027] In the discussion herein, the invention is described in the general context of computer-executable instructions, such as program modules, being executed by one or more conventional computers. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Additionally, those skilled in the art will recognize that other computer system architectu...

Abstract

A method for optimizing a language model is provided. An initial language model is built from a lexicon and a segmentation obtained from a received corpus using a maximum matching technique. The initial language model is then iteratively improved by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles, until a threshold of predictive power is reached.
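The iterative loop in the abstract can be illustrated with a small Python sketch. It is an assumed instantiation, not the patented procedure: the initial segmentation uses forward maximum matching as the abstract states, but the lexicon-update rule (proposing adjacent token pairs seen at least `min_count` times as new words), the Viterbi re-segmentation under a unigram model, the training-set bits-per-character score used as the measure of predictive power, and the stopping parameter `min_gain` are all hypothetical choices. A held-out corpus would give a more honest stopping signal.

```python
import math
from collections import Counter

UNK_LOGP = math.log(1e-6)  # assumed floor for unseen single characters


def max_match(text, lexicon):
    """Greedy forward maximum matching; builds only the initial segmentation."""
    longest = max(map(len, lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


def bits_per_char(segmented, n_chars):
    """Unigram cross-entropy of a segmentation, normalized by character count
    so that segmentations with different token counts stay comparable."""
    counts = Counter(t for sent in segmented for t in sent)
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values()) / n_chars


def viterbi_segment(text, logp, longest):
    """Most probable segmentation of `text` under unigram log-probs `logp`."""
    best = [0.0] + [-math.inf] * len(text)
    back = [0] * (len(text) + 1)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - longest), i):
            piece = text[j:i]
            score = logp.get(piece, UNK_LOGP if len(piece) == 1 else -math.inf)
            if best[j] + score > best[i]:
                best[i], back[i] = best[j] + score, j
    tokens, i = [], len(text)
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


def joint_optimize(corpus, lexicon, min_count=2, min_gain=0.01, max_iter=10):
    """Hypothetical loop: segment, grow lexicon, re-segment, stop on small gain."""
    segmented = [max_match(s, set(lexicon)) for s in corpus]
    n_chars = sum(map(len, corpus))
    best_bpc = bits_per_char(segmented, n_chars)
    for _ in range(max_iter):
        counts = Counter(t for sent in segmented for t in sent)
        # Assumed lexicon-update rule: propose frequent adjacent pairs as words,
        # crediting each candidate with its pair count.
        pairs = Counter(p for sent in segmented for p in zip(sent, sent[1:]))
        for (a, b), c in pairs.items():
            if c >= min_count:
                counts[a + b] += c
        total = sum(counts.values())
        logp = {w: math.log(c / total) for w, c in counts.items()}
        longest = max(map(len, logp))
        candidate = [viterbi_segment(s, logp, longest) for s in corpus]
        bpc = bits_per_char(candidate, n_chars)
        if best_bpc - bpc < min_gain:  # predictive gain below threshold: stop
            break
        segmented, best_bpc = candidate, bpc
    # Updated lexicon = words actually used by the final segmentation.
    lexicon = {t for sent in segmented for t in sent}
    return lexicon, segmented, best_bpc


lex, seg, bpc = joint_optimize(["thecat", "thedog", "thecat"], {"the", "cat", "dog"})
print(seg)            # [['thecat'], ['the', 'dog'], ['thecat']]
print(round(bpc, 3))  # 0.333
```

In this toy run the frequent pair "the" + "cat" becomes a lexicon entry after one pass, the rare "the" + "dog" does not, and the second pass yields no further gain, so the loop stops.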

Description

[0001] This application claims priority to Provisional Patent Application No. 60/163850, "An iterative method for lexicon, word segmentation and language model joint optimization," filed on November 5, 1999 by the inventors of this application.

Technical Field

[0002] The present invention relates to language modeling and, more specifically, to a system and an iterative method for the joint optimization of dictionaries, text segmentation and language models.

Background

[0003] Recent advances in computing power and related technologies have enabled the development of a new generation of powerful application software, including web browsers, word processing and speech recognition applications. For example, the latest generation of web browsers anticipates a Uniform Resource Locator (URL) address after the user enters the first two or three characters of a domain name. Word processors offer improved spelling and grammar checking, word prediction, and language conversion. Newer ...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/28; G06F17/27; G10L15/065; G10L15/18; G10L15/183; G10L15/187; G10L15/197
CPC: G10L15/197; G06F40/253
Inventors: 王海峰, 黄常宁, 李凯夫, 狄硕, 蔡东峰, 秦立峰, 郭建峰
Owner: MICROSOFT TECH LICENSING LLC