A Seed-Based Method for Generating Typo Confusion Sets
A technology for generating typo confusion sets, applied in the field of natural language processing, that addresses the problems of unreasonable confusion sets, large workload, and the high false positive rate of automatic proofreading systems.
Embodiment Construction
[0053] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.
[0054] As shown in Figure 1 and Figure 2, the present invention is a confusion set generation method based on seed typos, comprising the following steps:
[0055] Step 1) Create a typo confusion set graph. A typo confusion set graph is built from the seed typo confusion sets.
[0056] Step 2) Automatically add typo confusion sets. The graph created in Step 1 is used to discover the rules relating typos to one another, and new typo confusion sets are added automatically on that basis.
[0057] Step 3) Automatically generate the homophone typos in the typo confusion sets. Homophone typos of Chinese characters are added automatically.
[0058] Step 4) Automatically generate the non-homophone typos in the typo confusion sets. Non-homophone typos of Chinese characters are added automatically, according to features such as shape similarity and the typo confusion set graph.
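The four steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the text does not specify the rules discovered in Step 2, so the shared-neighbor rule used here is an assumption, as are all function names, the tiny pinyin table (tones ignored), the shape-similarity scores, and the 0.8 threshold.

```python
from collections import defaultdict
from itertools import combinations


def build_confusion_graph(seed_sets):
    """Step 1: undirected graph linking each character to its seed typos."""
    graph = defaultdict(set)
    for correct, typos in seed_sets.items():
        for typo in typos:
            if typo != correct:
                graph[correct].add(typo)
                graph[typo].add(correct)
    return graph


def propose_by_shared_neighbors(graph, min_shared=1):
    """Step 2 (assumed rule): pair characters that are not yet linked
    but share at least `min_shared` confusable neighbors."""
    pairs = set()
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a] and len(graph[a] & graph[b]) >= min_shared:
            pairs.add(frozenset((a, b)))
    return pairs


def homophone_pairs(pinyin_of):
    """Step 3: pair characters whose pronunciation is identical."""
    by_sound = defaultdict(list)
    for char, sound in pinyin_of.items():
        by_sound[sound].append(char)
    return {frozenset(pair)
            for group in by_sound.values()
            for pair in combinations(group, 2)}


def shape_pairs(shape_score, pinyin_of, threshold=0.8):
    """Step 4: keep visually similar pairs that are not homophones."""
    kept = set()
    for pair, score in shape_score.items():
        a, b = sorted(pair)
        if score >= threshold and pinyin_of[a] != pinyin_of[b]:
            kept.add(pair)
    return kept


def add_pairs(graph, pairs):
    """Merge proposed pairs into the graph as undirected edges."""
    for a, b in map(sorted, pairs):
        graph[a].add(b)
        graph[b].add(a)


# Toy data, purely illustrative (hypothetical seed set, sounds, and scores).
seed = {"的": ["地", "得"]}
pinyin = {"在": "zai", "再": "zai", "己": "ji", "已": "yi"}
shape = {frozenset("己已"): 0.9, frozenset("在再"): 0.3}

graph = build_confusion_graph(seed)
for new_pairs in (propose_by_shared_neighbors(graph),
                  homophone_pairs(pinyin),
                  shape_pairs(shape, pinyin)):
    add_pairs(graph, new_pairs)
```

In this sketch each step only proposes character pairs, and `add_pairs` merges them into the graph, so any single rule can be swapped out or reviewed manually before its proposals are accepted.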
[...