A method of constructing typo word knowledge base based on fuzzy matching and statistics

A technology combining fuzzy matching with a construction method, applied in computing, electrical digital data processing, special data processing applications, and related fields. It addresses unsatisfactory text proofreading, low proofreading efficiency, and long proofreading cycles, and achieves high accuracy, high practicability, and high effectiveness.

Active Publication Date: 2018-04-06
南方电网互联网服务有限公司 (China Southern Power Grid Internet Service Co., Ltd.)
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic texts such as e-books, e-newspapers, e-mails, office documents, blogs, and microblogs have all become part of people's daily lives. However, these texts contain more and more errors, which poses great challenges for proofreading. Traditional manual proofreading, with its low efficiency, high labor intensity, and long turnaround, clearly cannot meet the need for text proofreading.



Examples


Embodiment Construction

[0033] The present invention will be further described below in conjunction with the accompanying drawings.

[0034] As shown in Figure 1, the typo word knowledge base construction method based on fuzzy matching and statistics of the present invention comprises the following steps:

[0035] (1) Segment the corpus sentences into word strings, keeping them in their original order within each sentence; merge the scattered word strings according to preset scattered-string merging rules to obtain merged word strings; and obtain the similar-word candidate set of each merged word string using a Chinese dictionary and a fuzzy matching algorithm;

[0036] (2) For a given merged word string, use the scattered word strings above to obtain the adjacent-unit set of that merged word string and the adjacent-unit sets of all similar words in its similar-word candidate set;

[0037] (3) Determine whether a merged word string is a typo word string according to ...
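The patent does not spell out its merge rules, its fuzzy matching algorithm, or exactly what an adjacent-unit set contains, so the following Python is only a minimal sketch of steps (1) and (2) under stated assumptions. It starts from already-segmented tokens; the hypothetical merge rule joins every run of two or more single-character tokens (one common reading of "scattered strings"); fuzzy matching is swapped for plain Levenshtein edit distance of at most 1 against the dictionary; and the adjacent units of a word are assumed to be that word joined with its left and right neighbors in the sentence. The names merge_scattered, edit_distance, similar_candidates, and adjacent_units are illustrative, not from the source.

```python
from typing import List, Set


def merge_scattered(tokens: List[str]) -> List[str]:
    """Hypothetical merge rule: join each maximal run of two or more
    single-character tokens ("scattered strings") into one merged string."""
    merged: List[str] = []
    run: List[str] = []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if len(tok) == 1:
            run.append(tok)
            continue
        if len(run) >= 2:
            merged.append("".join(run))
        else:
            merged.extend(run)  # a lone single character is kept as-is
        run = []
        if tok:
            merged.append(tok)
    return merged


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similar_candidates(s: str, dictionary: Set[str], max_dist: int = 1) -> Set[str]:
    """Assumed fuzzy match: dictionary words within max_dist edits of s."""
    return {w for w in dictionary if w != s and edit_distance(s, w) <= max_dist}


def adjacent_units(merged: List[str], i: int, word: str) -> Set[str]:
    """Assumed adjacent-unit set: `word` placed at position i and joined
    with its left and right neighbors in the merged sentence."""
    units: Set[str] = set()
    if i > 0:
        units.add(merged[i - 1] + word)
    if i + 1 < len(merged):
        units.add(word + merged[i + 1])
    return units


tokens = ["我们", "按", "装", "软件"]  # assumed segmenter output
merged = merge_scattered(tokens)
print(merged)  # ['我们', '按装', '软件']
dictionary = {"我们", "安装", "按钮", "软件"}
print(sorted(similar_candidates("按装", dictionary)))  # ['安装', '按钮']
```

Both 安装 and 按钮 come back as candidates for the merged string 按装, which is why the method needs corpus statistics to decide whether 按装 is a typo and which candidate is the correction.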



Abstract

The invention discloses a method for constructing a typo word knowledge base based on fuzzy matching and statistics. The method performs word segmentation on corpus sentences to obtain scattered word strings, merges the scattered strings according to scattered-string merging rules to obtain merged word strings, and uses a fuzzy matching algorithm over a Chinese dictionary to obtain the similar-word candidate set of each merged word string. It then obtains the adjacent-element set of the merged word string and the adjacent-element sets of all similar words in its candidate set. Whether a merged word string is a typo word string is judged from the co-occurrence frequency in the corpus of each element of its adjacent-element set; if it is a typo word string, an erroneous-word pair for the merged word string is established according to the frequency with which the elements of the similar words' adjacent-element sets occur in the corpus. The method solves the prior-art problem of low correction accuracy caused by sparse data and by judging erroneous words only against a Chinese dictionary. The system responds quickly, its accuracy meets the requirements of practical applications, and it offers high effectiveness and accuracy.
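The abstract describes the statistical judgment only qualitatively. The Python below is a minimal sketch under explicit assumptions, not the patented procedure. It assumes an adjacent-element set is a word joined with its left and right neighbors in the sentence, so each element is a short string that can be counted in the corpus. A merged string is judged a typo when none of its adjacent elements occurs at least a hypothetical min_freq times. The correction is then the similar word whose adjacent elements have the highest total corpus frequency. The threshold, the scoring rule, and the names unit_frequency and build_error_pair are assumptions.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def unit_frequency(unit: str, corpus: Sequence[str]) -> int:
    """Non-overlapping occurrences of `unit` summed over all corpus sentences."""
    return sum(sentence.count(unit) for sentence in corpus)


def build_error_pair(
    word: str,
    word_units: Set[str],
    candidate_units: Dict[str, Set[str]],
    corpus: Sequence[str],
    min_freq: int = 2,  # hypothetical threshold; the source gives no value
) -> Optional[Tuple[str, str]]:
    """Return (typo, correction) if `word` is judged a typo, else None."""
    if not word_units:
        return None  # no context available to judge from
    # Judgment (assumed rule): not a typo if any adjacent element is well attested.
    if any(unit_frequency(u, corpus) >= min_freq for u in word_units):
        return None
    # Score each similar word by the total corpus frequency of its adjacent elements.
    scores = {
        cand: sum(unit_frequency(u, corpus) for u in units)
        for cand, units in candidate_units.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.__getitem__)
    return (word, best) if scores[best] >= min_freq else None


corpus = ["我们安装软件", "请先安装软件", "我们安装了系统", "点击按钮"]
pair = build_error_pair(
    "按装",
    {"我们按装", "按装软件"},
    {"安装": {"我们安装", "安装软件"}, "按钮": {"我们按钮", "按钮软件"}},
    corpus,
)
print(pair)  # ('按装', '安装')
```

Each returned (typo, correction) pair would become one entry in the typo word knowledge base. In this toy corpus the contexts 我们安装 and 安装软件 are attested, while no context of 按装 is, so 按装 is paired with 安装 rather than 按钮.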

Description

Technical field

[0001] The invention relates to natural language processing in the field of artificial intelligence and computers, in particular to the automatic proofreading of Chinese texts, and more specifically to a method for constructing a typo word knowledge base based on fuzzy matching and statistics.

Background art

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic texts such as e-books, e-newspapers, e-mails, office documents, blogs, and microblogs have all become part of people's daily lives. However, these texts contain more and more errors, which poses great challenges for proofreading. Traditional manual proofreading, with its low efficiency, high labor intensity, and long turnaround, clearly cannot meet the need for text proofreading.

[0003] Automatic text proofreading is one of the main applications of natural language processing, and it i...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/232, G06F40/242, G06F40/258, G06F40/279
Inventors: 刘亮亮 (Liu Liangliang), 刘海波 (Liu Haibo), 吴健康 (Wu Jiankang), 顾德之 (Gu Dezhi), 张再跃 (Zhang Zaiyue), 张晓如 (Zhang Xiaoru)
Owner: 南方电网互联网服务有限公司 (China Southern Power Grid Internet Service Co., Ltd.)