Automatic construction method and device of text proofreading error word library

A technology for automatically constructing an error word library for text proofreading, applicable to character and pattern recognition, natural language data processing, special data processing applications, and similar fields. It addresses the reliance on manual collection of error vocabularies, the limited size of such word libraries, and their narrow coverage. Its effects are a shorter construction period, good scalability, and improved construction efficiency.

Inactive Publication Date: 2018-02-06
李晓妮

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and device for automatically constructing an error lexicon for text proofreading. It addresses shortcomings of the prior art, in which the collection of error lexicons relies too heavily on manual work, is inefficient, has narrow coverage, and yields lexicons of limited size. It thereby further improves the accuracy of automatic text proofreading.


Examples


Embodiment Construction

[0059] The specific implementation of the present invention will be described below in conjunction with the accompanying drawings.

[0060] As shown in Figure 1, a method for automatically constructing an error lexicon for text proofreading comprises the following steps:

[0061] S101. First, construct a large-scale table of correct words (the correct lexicon), and number each word according to its position in that table.
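Step S101 can be illustrated with a minimal sketch. The function name `build_correct_lexicon`, the 1-based numbering, and the skipping of duplicate entries are assumptions made for illustration; the source only states that each word is numbered according to its order in the table.

```python
def build_correct_lexicon(words):
    """Map each distinct word to a number following its first appearance.

    Assumptions (not stated in the source): numbering starts at 1 and
    duplicate entries keep the number of their first occurrence.
    """
    lexicon = {}
    for word in words:
        if word not in lexicon:
            lexicon[word] = len(lexicon) + 1
    return lexicon


# Tiny hypothetical word list; a real table would merge several dictionaries.
sample_words = ["校对", "文本", "词库", "文本"]
lexicon = build_correct_lexicon(sample_words)
```

A dictionary keyed by word keeps the number lookup cheap when later steps enumerate the table.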

[0062] The correct lexicon draws on the Xinhua Dictionary, Chinese word segmentation lexicons, idiom dictionaries, classical poems and verses, and lexicons for specific professional fields such as diplomacy, computing, and medicine.

[0063] S102. Construct a series of character tables for each Chinese character in the computer system's character library.

[0064] The character tables constructed include a pinyin code table, a radical table, and a Wubi code table.

[0065] a. Create a pinyin code table for all Chinese characters, each of which has one or more pi...
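As a rough illustration of a pinyin code table, the sketch below stores one or more toneless readings per character and inverts the table to find characters that share a reading. The sample data, the function names, and the homophone lookup are all assumptions; the source text is truncated at this point and does not say how the table is stored or queried.

```python
from collections import defaultdict

# Hypothetical hand-written sample (toneless pinyin). A real table would
# cover every character in the system character library.
PINYIN_TABLE = {
    "行": ["xing", "hang"],  # a character with more than one reading
    "形": ["xing"],
    "型": ["xing"],
    "航": ["hang"],
}


def build_homophone_index(pinyin_table):
    """Invert the table: pinyin reading -> set of characters with that reading."""
    index = defaultdict(set)
    for char, readings in pinyin_table.items():
        for reading in readings:
            index[reading].add(char)
    return index


def homophones(char, pinyin_table, index):
    """Return the other characters that share at least one reading with char."""
    result = set()
    for reading in pinyin_table.get(char, []):
        result |= index[reading]
    result.discard(char)
    return result


index = build_homophone_index(PINYIN_TABLE)
```

With this sample, `homophones("行", PINYIN_TABLE, index)` collects the characters from both of its readings.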


Abstract

The invention relates to a method and device for automatically constructing an error word library for text proofreading. The method comprises: constructing a large-scale correct word library table and numbering each word according to its order in that table; constructing a series of character tables for each Chinese character in the character library of a computer system; creating a word relevancy system matrix table; enumerating each word in the correct word library table in turn, replacing each Chinese character in the word with other Chinese characters one at a time, and calculating the word matching similarity between the correct word and each error word produced by replacing one character; and sorting the word matching similarities in descending order, setting a similarity threshold for word matching, and adding the words whose similarity exceeds the threshold to the error word library. The method overcomes the prior-art defects that error word table collection relies too heavily on manual work, is inefficient, has narrow coverage, and produces word libraries of limited scale, and it can improve the accuracy of automatic text proofreading.
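The enumerate, replace, score, and threshold loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented similarity calculation: the `CHAR_SIMILARITY` scores are hypothetical stand-ins for the character tables and relevancy matrix, the word similarity is taken to be the similarity of the single replaced character, and candidates that are themselves correct words are skipped.

```python
# Hypothetical character-pair similarity scores. In the described method
# these would be derived from the character tables, not written by hand.
CHAR_SIMILARITY = {
    ("校", "效"): 0.9,
    ("校", "较"): 0.8,
    ("对", "兑"): 0.9,
    ("对", "树"): 0.2,
}


def char_similarity(a, b):
    """Symmetric lookup; unknown pairs score 0.0 (an assumption)."""
    return CHAR_SIMILARITY.get((a, b), CHAR_SIMILARITY.get((b, a), 0.0))


def generate_error_words(correct_words, charset, threshold):
    """Replace one character at a time and keep candidates above the threshold.

    Returns (error_word, correct_word, score) tuples sorted by descending score.
    """
    correct = set(correct_words)
    candidates = []
    for word in correct_words:
        for pos, original in enumerate(word):
            for replacement in charset:
                if replacement == original:
                    continue
                candidate = word[:pos] + replacement + word[pos + 1:]
                if candidate in correct:
                    continue  # assumption: a correct word is never an error word
                score = char_similarity(original, replacement)
                if score > threshold:
                    candidates.append((candidate, word, score))
    # Descending order of similarity; ties keep their enumeration order.
    candidates.sort(key=lambda item: item[2], reverse=True)
    return candidates


errors = generate_error_words(["校对"], ["效", "较", "兑", "树"], threshold=0.5)
```

Filtering before sorting gives the same result as sorting first and then cutting at the threshold, while keeping the sorted list small.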

Description

Technical field

[0001] The invention belongs to the field of word processing and relates to automatic text proofreading technology, in particular to a method and device for automatically constructing an error lexicon for text proofreading.

Background

[0002] With the rapid development of modern laser phototypesetting technology and the electronic publishing industry, ensuring that the information conveyed is correct has become an important research topic. When people use computers for writing, editing, and typesetting, some text errors inevitably occur, such as extra words, missing words, transposed words, English spelling mistakes, and irregular punctuation. A dedicated proofreading system is therefore required to proofread manuscripts. In the long run, informatization is the trend of future social development. People are facing more and more electronic information and manusc...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/32, G06K9/72
CPC: G06F40/232, G06V20/62, G06V10/768, G06V30/287
Inventor: 李晓妮
Owner: 李晓妮