Automatic Chinese real word error proofreading method

An automatic proofreading technology for real-word errors, applied in natural language data processing and special data processing applications. It addresses problems such as the reliance on multiple kinds of knowledge and resources and the way real-word errors interfere with the grammar and semantics of a sentence.

Active Publication Date: 2016-08-03
JIANGSU UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] 3) True word errors will interfere with the grammar and semantics of the entire sentence, so finding true word errors r...



Examples


Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings.

[0039] The method proposed by the present invention automatically proofreads Chinese real-word errors by combining a Chinese real-word confusion set, an N-gram model, and a Bayesian model into a joint judgment, and it uses synonyms to substantially alleviate the data sparseness problem. As shown in Figure 1, the method includes the following steps:

[0040] 1) Generate the Chinese real-word confusion sets from a dictionary of correct words and a Chinese character confusion set;

[0041] The confusion set C(W) of a Chinese word W is the group of words in the Chinese dictionary that are similar to W in sound, shape, or meaning, such that W and the words in C(W) are easily confused with one another in everyday use.

[0042] 11) Get all the correct words in the Chinese dictionary;

[0043] 12) For a...
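The confusion-set generation in step 1) can be illustrated with a minimal Python sketch. The source text for sub-step 12) is truncated, so the strategy used here (replace one character at a time with a confusable character and keep the result only if it is itself a dictionary word) is an assumption, not the patent's stated procedure. The function name `build_word_confusion_sets` and the toy dictionary and character confusion data are hypothetical.

```python
from typing import Dict, Set


def build_word_confusion_sets(
    dictionary: Set[str],
    char_confusions: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each dictionary word W to a confusion set C(W).

    Assumed strategy: substitute one character of W with a confusable
    character and keep the candidate only if it is also a correct word.
    """
    confusion: Dict[str, Set[str]] = {}
    for word in dictionary:
        candidates: Set[str] = set()
        for i, ch in enumerate(word):
            for alt in char_confusions.get(ch, ()):
                candidate = word[:i] + alt + word[i + 1:]
                # A real-word error must itself be a valid dictionary word.
                if candidate != word and candidate in dictionary:
                    candidates.add(candidate)
        if candidates:
            confusion[word] = candidates
    return confusion


# Toy data, invented for illustration only.
dictionary = {"权利", "权力", "全力", "必须", "必需"}
char_confusions = {
    "利": {"力"}, "力": {"利"},
    "权": {"全"}, "全": {"权"},
    "须": {"需"}, "需": {"须"},
}
confusion_sets = build_word_confusion_sets(dictionary, char_confusions)
```

Filtering candidates against the dictionary is what separates real-word confusions from ordinary misspellings: a substitution that yields a non-word would be caught by a plain dictionary lookup and needs no confusion set.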


Abstract

The invention discloses a method for automatically proofreading Chinese real-word errors. The method comprises the following steps: first, generating a Chinese real-word confusion set from a dictionary of correct words and a Chinese character confusion set; second, verifying the current word using statistical knowledge; third, generalizing context features with synonyms to ease the data sparseness problem of the corpus; finally, using a Bayesian model to estimate the probability that the current word occurs in the text, judging from that estimate whether the current word is a real-word error, marking the error, and giving a list of suggested corrections. The method addresses problems in the prior art such as data sparseness and low efficiency in judging and proofreading correct words, and it achieves relatively high efficiency and accuracy.
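The last two steps of the abstract can be sketched in Python as follows. The source only says "a Bayesian model", so several choices here are assumptions made for illustration: a naive Bayes scorer with independent context features, add-alpha smoothing, and a synonym table that maps context words to a shared class label. The class `BayesRealWordChecker`, its methods, and the toy training data are hypothetical, and the N-gram verification step is omitted.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


class BayesRealWordChecker:
    def __init__(self, synonym_class: Dict[str, str], alpha: float = 1.0) -> None:
        self.synonym_class = synonym_class  # context word -> synonym class label
        self.alpha = alpha                  # add-alpha smoothing constant
        self.word_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = {}
        self.features: Set[str] = set()

    def _generalize(self, context: Iterable[str]) -> List[str]:
        # Replace a context word by its synonym class so that rare words
        # share statistics with their synonyms (eases data sparseness).
        return [self.synonym_class.get(w, w) for w in context]

    def train(self, samples: Iterable[Tuple[str, List[str]]]) -> None:
        for word, context in samples:
            self.word_counts[word] += 1
            counts = self.feature_counts.setdefault(word, Counter())
            for feature in self._generalize(context):
                counts[feature] += 1
                self.features.add(feature)

    def log_score(self, word: str, context: Iterable[str]) -> float:
        # log P(word) + sum of log P(feature | word), both smoothed.
        total = sum(self.word_counts.values())
        classes = max(len(self.word_counts), 1)
        score = math.log(
            (self.word_counts[word] + self.alpha) / (total + self.alpha * classes)
        )
        counts = self.feature_counts.get(word, Counter())
        denom = sum(counts.values()) + self.alpha * (len(self.features) + 1)
        for feature in self._generalize(context):
            score += math.log((counts[feature] + self.alpha) / denom)
        return score

    def check(
        self, word: str, context: Iterable[str], confusion_set: Set[str]
    ) -> Tuple[bool, List[str]]:
        # Flag the word if any confusion candidate fits the context better,
        # and return those candidates ordered from best to worst.
        context = list(context)
        current = self.log_score(word, context)
        scored = [(self.log_score(c, context), c) for c in confusion_set]
        better = sorted((sc for sc in scored if sc[0] > current), reverse=True)
        suggestions = [c for _, c in better]
        return bool(suggestions), suggestions


# Toy training data, invented for illustration only.
checker = BayesRealWordChecker({"公民": "PERSON", "人民": "PERSON", "群众": "PERSON"})
checker.train([
    ("权利", ["公民", "享有"]),
    ("权利", ["人民", "行使"]),
    ("权力", ["政府", "滥用"]),
    ("权力", ["机关", "掌握"]),
])
flagged, suggestions = checker.check("权力", ["群众", "享有"], {"权利"})
accepted_flag, accepted_suggestions = checker.check("权利", ["公民", "享有"], {"权力"})
```

In this toy run the context word 群众 never appears in the training samples, but it shares the `PERSON` class with 公民 and 人民, so the counts collected for those words still contribute to the score. A production system would estimate the counts from a large corpus and would likely combine this score with the N-gram check that the method also describes.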

Description

Technical field

[0001] The invention relates to natural language processing within computer-based artificial intelligence, and in particular to the automatic proofreading of Chinese texts.

Background art

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic text publications such as e-books, e-newspapers, emails, and office documents keep emerging, and the number of errors in texts keeps growing. At present, most proofreading is still done manually. The work is monotonous, labor-intensive, and inefficient, and manual proofreading can no longer meet the demand for text proofreading. The study of automatic text proofreading is therefore significant for both theory and application.

[0003] Automatic text proofreading is one of the main applications of natural language processing, and it is also a difficult problem in natura...


Application Information

IPC(8): G06F17/27
CPC: G06F40/232, G06F40/289
Inventor: 顾德之, 刘亮亮, 吴健康, 刘海波, 张再跃, 张晓如
Owner JIANGSU UNIV OF SCI & TECH