Chinese homonym error auto-proofreading method

An automatic proofreading technique for homophone errors, applied in special data processing applications, instruments, and electrical digital data processing, that addresses problems such as dependence on multiple knowledge sources and resources, sparse data, and true-word errors.

Active Publication Date: 2015-11-11
JIANGSU UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] 2) True word errors will interfere with the grammar and semantics of the entire sentence, so finding true word errors r...

Embodiment Construction

[0060] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0061] The Chinese homonym error automatic proofreading method provided by the present invention proofreads Chinese homonym errors by combining a homonym confusion set with a weighted judgment over local adjacency NGram models. The method comprises the following steps: 1) Establish a homonym confusion set: for each Chinese word, use the pinyin of its Chinese characters to build the set of words that share its pronunciation.

[0062] As shown in Figure 1, a confusion set of homophones is generated from the pinyin table of Chinese characters and a Chinese dictionary:

[0063] CSet(W_i) = { W_{i1}, W...
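The confusion-set step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the tiny pinyin table, the word list, the choice of toneless pinyin, the decision to leave a word out of its own confusion set, and the names `PINYIN`, `DICTIONARY`, `word_pinyin`, and `build_confusion_sets` are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy toneless pinyin table and word list (illustrative assumptions only).
PINYIN = {
    "权": "quan", "全": "quan",
    "力": "li", "利": "li",
    "国": "guo", "家": "jia",
}
DICTIONARY = ["权力", "权利", "全力", "国家"]


def word_pinyin(word, table):
    """Return the pinyin of a word as a tuple of per-character syllables."""
    return tuple(table[ch] for ch in word)


def build_confusion_sets(words, table):
    """Map each word to the other dictionary words with identical pinyin."""
    by_pinyin = defaultdict(list)
    for w in words:
        by_pinyin[word_pinyin(w, table)].append(w)
    return {
        w: [other for other in by_pinyin[word_pinyin(w, table)] if other != w]
        for w in words
    }


csets = build_confusion_sets(DICTIONARY, PINYIN)
print(csets["权力"])  # the homophones of 权力 found in the toy dictionary
```

Grouping words by their pinyin key first keeps the construction linear in the dictionary size, rather than comparing every pair of words.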

Abstract

The present invention discloses a Chinese homonym error auto-proofreading method. The method comprises: first, generating a confusion set of Chinese homonyms; training on a large-scale Web corpus to collect statistics for a left-adjacent bigram model, a right-adjacent bigram model, and an adjacent trigram model; obtaining a local adjacent NGram model by using the confusion set of Chinese homonyms and a probability estimation algorithm; using a weighted combination method to calculate the sentence context support degree of a word in a sentence and of each homonym in the homonym confusion set corresponding to that word, and thereby determining whether a homonym error exists; and marking the homonym error and providing a correction suggestion list, so as to implement Chinese homonym auto-proofreading. The method responds quickly, is efficient and accurate, and meets the precision requirements of practical applications.
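The weighted combination described in the abstract can be sketched roughly as follows. The source does not give the probability estimation formula or the weights in this excerpt, so everything specific here is an assumption: the share-of-count estimate over the candidate set, the weights `(0.25, 0.25, 0.5)`, the toy counts, and the names `local_prob`, `context_support`, and `check_word` are hypothetical.

```python
# Toy confusion sets and n-gram counts (illustrative assumptions only).
CONFUSION = {
    "全力": ["权力", "权利"],
    "权利": ["权力", "全力"],
}
COUNTS = {
    "left": {("公民", "权利"): 50, ("公民", "权力"): 5},         # (prev, word)
    "right": {("权利", "和"): 30, ("权力", "和"): 20, ("全力", "和"): 1},  # (word, next)
    "tri": {("公民", "权利", "和"): 10},                        # (prev, word, next)
}


def local_prob(count_of, candidates):
    """Estimate each candidate's probability as its share of the total count."""
    total = sum(count_of(c) for c in candidates)
    return {c: (count_of(c) / total if total else 0.0) for c in candidates}


def context_support(prev, word, nxt, confusion, counts, weights=(0.25, 0.25, 0.5)):
    """Weighted sum of left-bigram, right-bigram, and trigram estimates."""
    candidates = [word] + confusion.get(word, [])
    left = local_prob(lambda c: counts["left"].get((prev, c), 0), candidates)
    right = local_prob(lambda c: counts["right"].get((c, nxt), 0), candidates)
    tri = local_prob(lambda c: counts["tri"].get((prev, c, nxt), 0), candidates)
    wl, wr, wt = weights
    return {c: wl * left[c] + wr * right[c] + wt * tri[c] for c in candidates}


def check_word(prev, word, nxt, confusion, counts):
    """Return homophones with higher support than the word, best first.

    An empty list means no homonym error is flagged at this position.
    """
    scores = context_support(prev, word, nxt, confusion, counts)
    better = [c for c in scores if c != word and scores[c] > scores[word]]
    return sorted(better, key=lambda c: -scores[c])


suggestions = check_word("公民", "全力", "和", CONFUSION, COUNTS)
print(suggestions)
```

In this toy setup the word is flagged only when some homophone has strictly higher support, and the returned list doubles as the correction suggestion list.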

Description

Technical field

[0001] The invention relates to natural language processing within the field of computer artificial intelligence, and in particular to the automatic proofreading of Chinese texts.

Background technique

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic texts such as e-books, e-newspapers, e-mails, office documents, blogs, and microblogs have all become part of people's daily lives. However, these texts contain more and more errors, which poses a great challenge to proofreading. Traditional manual proofreading is inefficient, labor-intensive, and slow, and clearly cannot meet the demand for text proofreading.

[0003] Automatic text proofreading is one of the main applications of natural language processing, and it is also a difficult problem in natural language understanding. Chinese is entered into the computer through th...

Application Information

IPC(8): G06F17/27
Inventors: 吴健康, 严熙, 刘亮亮
Owner JIANGSU UNIV OF SCI & TECH