Spelling error correction method and device based on Chinese character pronunciation and shape similarity and electronic equipment

An error correction method and device, applied in the field of text error correction, that addresses problems such as poor performance, an enlarged retrieval range, and performance degradation, thereby improving error correction efficiency and reducing the amount of computation.

Active Publication Date: 2021-06-01
HUNDSUN TECH
AI Technical Summary

Problems solved by technology

[0004] (1) The BK tree structure mainly supports scenarios with natural word segmentation. Word segmentation is easy to obtain in general search scenarios, but correct segmentation results are almost impossible to obtain in other scenarios, so the scope of use is limited;
[0005] (2) BK tree error correction requires real-time calculation of the Chinese character conversion cost between two Chinese character strings, and performance is poor when the tree is deep;
[0006] (3) As the domain dictionary grows and the BK tree becomes deeper, performance drops sharply;
[0007] (4) The retrieval range of the BK tree grows sharply as the threshold on the number of wrong Chinese characters increases, and performance drops sharply as well.
Although BK tree similarity retrieval is a reasonable approach to correcting Chinese misspellings, it suffers from insufficient performance and a limited scope of use.
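Problems (2) and (4) can be illustrated with a minimal BK-tree sketch. It is not the patent's implementation: plain Levenshtein distance stands in for the Chinese character conversion cost, and the class name `BKTree`, the `distance_calls` counter, and the five-word lexicon are assumptions made for the example. The point it shows is that every visited node costs one distance computation at query time, and that a larger threshold widens the range of child edges that must be followed.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a stand-in for the conversion cost)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


Node = Tuple[str, Dict[int, "Node"]]  # (word, {distance to parent: child})


class BKTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.distance_calls = 0  # distance computations in the last search

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, threshold: int) -> List[Tuple[int, str]]:
        self.distance_calls = 0
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)  # computed in real time per node
            self.distance_calls += 1
            if d <= threshold:
                results.append((d, word))
            # Triangle inequality: only edges within [d - t, d + t] can match.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return sorted(results)


# Hypothetical domain lexicon, for illustration only.
tree = BKTree()
for w in ["股票", "基金", "债券", "股份", "期货"]:
    tree.add(w)
```

Searching this toy tree for a misspelled string with thresholds 0 and 1 and comparing `tree.distance_calls` after each search shows the widening of the retrieval range directly.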

Examples

Embodiment 2

[0130] According to a second aspect of the embodiments of the present disclosure, this embodiment of the present application also proposes a spelling error correction device 2 based on the pronunciation and shape similarity of Chinese characters, as shown in Figure 3, including:

[0131] A Chinese character set generation unit 21, configured to construct a sample Chinese character set containing the pronunciation and shape information of Chinese characters from a standard Chinese character database;

[0132] A Chinese character set matching unit 22, configured to calculate, based on the Chinese character conversion cost in the sample Chinese character set, the similarity of the pronunciation and shape information between any two Chinese characters, and to construct, from the resulting similarities, a similar Chinese character set for each Chinese character in the sample Chinese character set;

[0133] Chinese charact...
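The work of units 21 and 22 can be sketched as follows. The visible text does not define the conversion cost, so everything specific here is an assumption for illustration: the five-character `SAMPLE` table with hand-written pinyin and component decompositions, the equal weighting of sound and shape, and the names `conversion_cost` and `build_similar_sets`.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Toy sample character set: character -> (pinyin, set of shape components).
# Values are hand-written for illustration, not taken from a standard database.
SAMPLE: Dict[str, Tuple[str, Set[str]]] = {
    "票": ("piao", {"覀", "示"}),
    "漂": ("piao", {"氵", "覀", "示"}),
    "飘": ("piao", {"覀", "示", "风"}),
    "栗": ("li", {"覀", "木"}),
    "粟": ("su", {"覀", "米"}),
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two pinyin strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def conversion_cost(c1: str, c2: str) -> float:
    """Assumed cost in [0, 1]: mean of a sound distance and a shape distance."""
    p1, s1 = SAMPLE[c1]
    p2, s2 = SAMPLE[c2]
    sound = levenshtein(p1, p2) / max(len(p1), len(p2))
    shape = 1.0 - len(s1 & s2) / len(s1 | s2)  # Jaccard distance on components
    return 0.5 * sound + 0.5 * shape


def build_similar_sets(min_similarity: float) -> Dict[str, Dict[str, float]]:
    """Compare every pair once, offline, and keep the sufficiently similar ones."""
    similar: Dict[str, Dict[str, float]] = {c: {} for c in SAMPLE}
    for a, b in combinations(SAMPLE, 2):
        sim = 1.0 - conversion_cost(a, b)
        if sim >= min_similarity:
            similar[a][b] = sim
            similar[b][a] = sim
    return similar


similar_sets = build_similar_sets(min_similarity=0.6)
```

Because the pairwise comparison runs once when the table is built, later lookups of `similar_sets[char]` involve no cost calculation at all.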

Embodiment 3

[0185] According to a third aspect of the embodiments of the present disclosure, this embodiment provides an electronic device, including:

[0186] a processor; and

[0187] a memory for storing executable instructions of the processor;

[0188] Wherein, the processor is configured to execute the steps of the spelling error correction method based on the phonetic-form similarity of Chinese characters by executing the executable instructions.

[0189] According to a fourth aspect of the embodiments of the present disclosure, this embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the spelling error correction method based on the pronunciation and shape similarity of Chinese characters.

Abstract

The embodiment of the invention provides a spelling error correction method and device based on Chinese character pronunciation and shape similarity, and electronic equipment. The spelling error correction method comprises the following steps: constructing a sample Chinese character set containing Chinese character pronunciation and shape information according to a standard Chinese character database; calculating, based on the Chinese character conversion cost in the sample Chinese character set, the similarity of the pronunciation and shape information between any two Chinese characters, and constructing a similar Chinese character set for each Chinese character in the sample Chinese character set according to the obtained similarity results; and acquiring candidate words associated with a target Chinese character, and screening them by comparing the similarity between the target Chinese character and each candidate word against a threshold value, to obtain the replacement Chinese character for the target Chinese character after error correction. The Chinese character conversion cost is calculated only during the initial pronunciation and shape editing stage; no level-by-level calculation is involved during error correction itself, where the conversion cost is simply looked up in a preloaded mapping dictionary of Chinese characters. The method effectively avoids the large amount of computation caused by repeated comparisons from the top of the tree to the bottom in the BK tree Chinese error correction algorithm, and improves error correction efficiency.
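The final screening step described in the abstract might look like the sketch below. It assumes a preloaded mapping `SIMILAR` (character to similar characters with precomputed similarity), a small `LEXICON` of correct domain words, and that a candidate passes when its similarity is at least the threshold; the abstract only states that a numerical relationship with a threshold is used, so the direction of the comparison, the function name `correct_char`, and all data values are hypothetical.

```python
from typing import Dict, Set

# Preloaded mapping dictionary: similarities were computed ahead of time,
# so correction only performs lookups. Values are illustrative.
SIMILAR: Dict[str, Dict[str, float]] = {
    "漂": {"票": 0.83, "飘": 0.75},
    "飘": {"票": 0.83, "漂": 0.75},
}

# Hypothetical lexicon of correct domain words.
LEXICON: Set[str] = {"股票", "基金"}


def correct_char(word: str, index: int, threshold: float) -> str:
    """Return the replacement for word[index], or the original character.

    A similar character is accepted only if (a) its precomputed similarity
    to the target is at least `threshold` and (b) substituting it yields a
    word found in the lexicon. The most similar accepted character wins.
    """
    target = word[index]
    best_char, best_sim = target, 0.0
    for cand, sim in SIMILAR.get(target, {}).items():
        candidate_word = word[:index] + cand + word[index + 1:]
        if sim >= threshold and candidate_word in LEXICON and sim > best_sim:
            best_char, best_sim = cand, sim
    return best_char
```

With these toy values, `correct_char("股漂", 1, 0.8)` selects `票`, while raising the threshold to 0.9 leaves the character unchanged. No conversion cost is computed inside the function, which is the efficiency argument the abstract makes against the BK tree.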

Description

Technical field

[0001] The present application relates to the field of text error correction, and in particular to a spelling error correction method, device, and electronic equipment based on the pronunciation and shape similarity of Chinese characters.

Background

[0002] In scenarios such as Chinese Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR), recognition errors may occur because Chinese characters are similar in sound and shape. Generally, a large number of these errors can be resolved by adding a post-processing module (Chinese spelling error correction) to the output of models such as ASR and OCR.

[0003] In order to reduce the number of search traversals, a BK (Burkhard-Keller) tree structure is introduced. The BK tree builds a tree structure from the conversion costs between the Chinese character strings in the correct lexicon, and then quickly searches for similar (in terms of Chinese character conversion cost) Chinese character strings based on t...

Application Information

IPC(8): G06F40/232, G06F40/242, G06K9/62
CPC: G06F40/232, G06F40/242, G06F18/22
Inventor: 林金曙, 娄东方, 王炯亮, 陈哲, 陈春旭
Owner HUNDSUN TECH