Uygur language sentence boundary recognition method

A Uyghur sentence boundary recognition technology, applied in special data processing applications, instruments, and electronic digital data processing. It addresses two problems: sentence boundary errors strongly affect the accuracy of later analysis, and existing models cannot be applied directly to Uyghur sentence boundary recognition. Its effects are improved accuracy, high processing power, and robustness.

Inactive | Publication Date: 2014-07-02
新疆电力信息通信有限责任公司
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

According to the error amplification principle of natural language processing, the performance of the sentence boundary recognition algorithm, which sits at the lowest level of the processing pipeline, directly affects the accuracy of every subsequent analysis step, and the impact is relatively large.
[0006] Through long-term research on the features of English, some foreign scholars have established models and methods for English sentence boundary recognition. These models cannot be applied directly to Uyghur sentence boundary recognition, because the two languages give rise to different sentence boundary ambiguities, the objects to be disambiguated differ, and the features that contribute to recognition are quite different.

Examples

Embodiment Construction

[0015] A Uyghur sentence boundary recognition method comprising:
1. Recognition rules for unambiguous punctuation marks in Uyghur sentence recognition.
2. A Uyghur paragraph classification algorithm, which effectively reduces the scale of the statistical space and quickly improves efficiency.
3. A statistically built feature space for Uyghur sentence boundary recognition, used to identify ambiguous punctuation marks in Uyghur sentences efficiently.
4. High-performance Uyghur sentence boundary recognition for undifferentiated corpora.
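The first two steps above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's rule base: the two mark sets, `classify_paragraph`, and `split_by_rules` are hypothetical names and values chosen for the example, since the actual rules are not given in this text.

```python
from typing import List

# Hypothetical mark sets; the patent's actual rule base is not given in this text.
UNAMBIGUOUS_MARKS = {"\u061f", "!"}  # assumed to always end a sentence (U+061F is the Arabic question mark)
AMBIGUOUS_MARKS = {".", ":"}         # assumed to need statistical disambiguation


def classify_paragraph(paragraph: str) -> str:
    """Label a paragraph 'ambiguous' if it contains any mark from AMBIGUOUS_MARKS."""
    if any(ch in AMBIGUOUS_MARKS for ch in paragraph):
        return "ambiguous"
    return "unambiguous"


def split_by_rules(paragraph: str) -> List[str]:
    """Split a paragraph after every mark in UNAMBIGUOUS_MARKS."""
    sentences: List[str] = []
    current: List[str] = []
    for ch in paragraph:
        current.append(ch)
        if ch in UNAMBIGUOUS_MARKS:
            sentences.append("".join(current).strip())
            current = []
    # Keep any trailing text that has no closing mark.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

In this sketch, only paragraphs labelled `ambiguous` would need to reach the statistical stage, which is one way to read the claim that paragraph classification reduces the scale of the statistical space.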

[0016] As shown in Figure 1, the process and functional modules involved in the present invention are: a paragraph classification rule base, a test corpus, a paragraph classifier, a sentence boundary recognition rule base, a training corpus, and a maximum entropy model module. The main process is as follows: first, with the support of the rule base, the Uyghur text is divided into unambiguous paragraphs and ambiguous paragraphs thr...
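As a rough illustration of what a maximum entropy model module might do with an ambiguous mark, the sketch below trains a two-class maximum entropy model (equivalent to logistic regression) on context features. Everything specific here is an assumption made for the example: the feature names, `extract_features`, `predict`, `train`, and the training settings are hypothetical, and the patent's real feature space is not described in this text.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def extract_features(tokens: List[str], i: int) -> Features:
    """Context features for the candidate mark at tokens[i] (all hypothetical)."""
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "bias": 1.0,
        "prev_is_digit": float(prev_tok.isdigit()),
        "next_is_digit": float(next_tok.isdigit()),
        "prev_is_short": float(0 < len(prev_tok) <= 2),
        "at_paragraph_end": float(next_tok == ""),
    }


def predict(weights: Features, feats: Features) -> float:
    """P(boundary | features) under a binary maximum entropy (logistic) model."""
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def train(data: List[Tuple[Features, int]], epochs: int = 200, lr: float = 0.5) -> Features:
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights: Features = {}
    for _ in range(epochs):
        for feats, label in data:
            error = label - predict(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights
```

A usage example with placeholder tokens: label the period in `["3", ".", "5"]` as 0 (not a boundary) and the period in `["word", "."]` as 1, pass the resulting `(features, label)` pairs to `train`, then call `predict` on the features of a new candidate mark and treat a probability above 0.5 as a sentence boundary.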

Abstract

The invention discloses a Uyghur sentence boundary recognition method. The method includes: 1, proposing recognition rules for unambiguous punctuation marks in Uyghur sentence recognition; 2, proposing a Uyghur paragraph classification algorithm that effectively reduces the scale of the statistical space and quickly improves efficiency; 3, using statistics to build a Uyghur sentence boundary recognition feature space in order to recognize ambiguous punctuation marks in Uyghur sentences efficiently; 4, realizing high-performance Uyghur sentence boundary recognition for undifferentiated corpora. The method effectively improves the accuracy of sentence boundary recognition and provides a basic analysis service for subsequent natural language processing work such as part-of-speech tagging and syntactic analysis.

Description

Technical field

[0001] The invention relates to language information processing technology, in particular to a Uyghur sentence boundary recognition method.

Background art

[0002] With the rapid development of Internet technology, information of all kinds keeps growing, and massive amounts of it are generated, stored, and disseminated on the Internet every day. People face an unprecedented expansion of information. Natural language processing technology is widely used to process large volumes of online text. Automatic and efficient Uyghur text analysis has become a key technology for information processing and understanding, and it has significant theoretical and practical value for research in language information processing and related application fields.

[0003] The gradual development and maturity of large-scale natural language text acquisition technology, machine learning methods and models,...

Application Information

IPC(8): G06F17/27
Inventor: 尼加提·纳吉米, 买合木提·买买提, 帕肉克·司地克, 马斌
Owner 新疆电力信息通信有限责任公司