Spelling error correction method, device and electronic device based on similarity of Chinese character sound and shape

An error correction method based on character similarity, applied in the field of text error correction. It addresses poor performance, an expanding retrieval range, and performance degradation, improving error correction efficiency and reducing the amount of computation.

Active Publication Date: 2021-10-22
HUNDSUN TECH

AI Technical Summary

Problems solved by technology

[0004] (1) The BK tree structure mainly supports scenarios with natural word segmentation. Word segmentation is easy to obtain in general search scenarios, but correct segmentation results are almost impossible to obtain in other scenarios, so the scope of use is limited;
[0005] (2) BK tree error correction requires real-time calculation of the Chinese character conversion cost between two Chinese character strings, and performance is poor when the tree is deep;
[0006] (3) As the domain dictionary grows and the BK tree becomes deeper, performance drops sharply;
[0007] (4) The retrieval range of the BK tree grows sharply as the threshold on the number of wrong Chinese characters increases, and performance drops sharply as well.
Although BK tree similarity retrieval is a reasonable approach to correcting Chinese misspellings, it suffers from insufficient performance and a limited scope of use.
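
The prior-art behaviour listed above can be illustrated with a minimal BK tree sketch. This is not the patent's code: plain Levenshtein distance stands in for the Chinese character conversion cost, and the names BKTree, add and search are hypothetical. The search returns the number of distance computations it performed, which makes it visible that every visited node costs a real-time distance calculation and that a larger threshold can only widen the set of branches explored.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; an assumed stand-in for the conversion cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is a (word, children) pair; children maps distance -> node."""

    def __init__(self) -> None:
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child  # descend along the edge labelled d

    def search(self, query: str, threshold: int) -> Tuple[List[str], int]:
        """Return (words within threshold, number of distance computations)."""
        if self.root is None:
            return [], 0
        matches: List[str] = []
        computations = 0
        stack = [self.root]
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)  # computed at query time, per node
            computations += 1
            if d <= threshold:
                matches.append(word)
            # Triangle inequality: only edges in [d - t, d + t] can hold matches.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return matches, computations
```

Because the edge window is [d - threshold, d + threshold], raising the threshold never prunes more branches, which is the growth in retrieval range described in item (4).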


Examples

Embodiment 2

[0130] According to a second aspect of the embodiments of the present disclosure, this embodiment also proposes a spelling error correction device 2 based on the similarity of the sound and shape of Chinese characters, as shown in Figure 3, including:

[0131] a Chinese character set generation unit 21, for constructing a sample Chinese character set that contains Chinese character sound and shape information according to a standard Chinese character database;

[0132] a Chinese character set matching unit 22, for calculating, based on the Chinese character conversion cost, the similarity between the sound and shape information of any two Chinese characters in the sample Chinese character set, and for constructing, according to the obtained similarity results, a similar Chinese character set for each Chinese character in the sample Chinese character set;

[0133] Chinese charact...
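
The work of units 21 and 22 can be sketched as a one-time precomputation. The excerpt does not say how sound and shape are encoded or how the conversion cost is defined, so everything concrete here is an assumption: the tiny SAMPLE_SET table (toneless pinyin plus an illustrative component string per character, not taken from any real database), the equal weighting of sound and shape, and the names similarity and build_similar_sets.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical sample set: character -> (pinyin, component string).
# The values are illustrative only.
SAMPLE_SET: Dict[str, Tuple[str, str]] = {
    "清": ("qing", "氵青"),
    "情": ("qing", "忄青"),
    "请": ("qing", "讠青"),
    "睛": ("jing", "目青"),
    "金": ("jin", "金"),
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed conversion cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_cost(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def similarity(c1: str, c2: str, table: Dict[str, Tuple[str, str]],
               sound_weight: float = 0.5) -> float:
    """1 minus the weighted sound/shape cost (the weighting is an assumption)."""
    sound1, shape1 = table[c1]
    sound2, shape2 = table[c2]
    cost = (sound_weight * normalized_cost(sound1, sound2)
            + (1.0 - sound_weight) * normalized_cost(shape1, shape2))
    return 1.0 - cost


def build_similar_sets(table: Dict[str, Tuple[str, str]],
                       min_similarity: float = 0.5
                       ) -> Dict[str, List[Tuple[str, float]]]:
    """For every character, list the other characters at or above min_similarity."""
    similar: Dict[str, List[Tuple[str, float]]] = {ch: [] for ch in table}
    for c1, c2 in combinations(table, 2):  # each unordered pair once
        score = similarity(c1, c2, table)
        if score >= min_similarity:
            similar[c1].append((c2, score))
            similar[c2].append((c1, score))
    for entries in similar.values():
        entries.sort(key=lambda item: -item[1])  # most similar first
    return similar
```

The pairwise loop is quadratic in the size of the character set, but it runs once, ahead of time, so no conversion cost has to be computed while text is being corrected.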

Embodiment 3

[0185] According to a third aspect of the embodiments of the present disclosure, this embodiment provides an electronic device, including:

[0186] a processor; and

[0187] a memory for storing executable instructions of the processor;

[0188] wherein the processor is configured to perform, by executing the executable instructions, the steps of the spelling error correction method based on the sound and shape similarity of Chinese characters.

[0189] According to a fourth aspect of the embodiments of the present disclosure, this embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs spelling error correction based on the sound and shape similarity of Chinese characters.

Abstract

The embodiments of the present application propose a spelling error correction method, device, and electronic device based on the similarity of Chinese character sound and shape. The method includes: constructing a sample Chinese character set containing Chinese character sound and shape information according to a standard Chinese character database; calculating, based on the Chinese character conversion cost, the similarity between the sound and shape information of any two Chinese characters in the sample Chinese character set, and constructing, according to the obtained similarity results, a similar Chinese character set for each Chinese character in the sample Chinese character set; and obtaining candidate words associated with a target Chinese character, and screening them according to the numerical relationship between the similarity of the target Chinese character to each candidate word and a threshold, to obtain the replacement Chinese character that corrects the target Chinese character. Because the Chinese character conversion cost is calculated only during the initial sound and shape editing process, the correction process itself involves no calculation between tree levels; values only need to be fetched from the pre-loaded mapping dictionary between Chinese characters and dictionaries. This effectively avoids the large amount of calculation that the BK tree Chinese error correction algorithm incurs by comparing continuously from the top of the tree, and improves error correction efficiency.
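
The final screening step of the abstract can be sketched as a pure lookup. This is an illustration under assumptions rather than the claimed method: the SIMILAR mapping is a hypothetical pre-loaded dictionary with made-up scores, the candidates are assumed to be supplied by some earlier step that the excerpt does not describe, and pick_replacement is a hypothetical name. The point it shows is that no conversion cost is computed at correction time; only dictionary lookups and a threshold comparison remain.

```python
from typing import Dict, Iterable, Optional

# Hypothetical pre-loaded mapping: character -> {similar character: similarity}.
# Scores are illustrative and would be computed once, ahead of time.
SIMILAR: Dict[str, Dict[str, float]] = {
    "清": {"情": 0.75, "请": 0.75, "睛": 0.625},
    "情": {"清": 0.75, "请": 0.75, "睛": 0.625},
}


def pick_replacement(target: str,
                     candidates: Iterable[str],
                     similar: Dict[str, Dict[str, float]],
                     threshold: float) -> Optional[str]:
    """Return the candidate most similar to target, if it reaches threshold.

    Returns None when no candidate passes the screen. Ties keep the
    candidate that appears first.
    """
    scores = similar.get(target, {})
    best: Optional[str] = None
    best_score = -1.0
    for cand in candidates:
        score = scores.get(cand, 0.0)  # lookup only, no distance computation
        if score >= threshold and score > best_score:
            best, best_score = cand, score
    return best
```

For example, pick_replacement("清", ["睛", "情"], SIMILAR, 0.7) selects "情", because "睛" falls below the threshold in this made-up table. With a threshold of 0.9 the same call returns None and the target character is left unchanged.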

Description

Technical field

[0001] The present application relates to the field of text error correction, in particular to a spelling error correction method, device and electronic device based on the sound and shape similarity of Chinese characters.

Background

[0002] In scenarios such as Chinese Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR), recognition errors may occur because Chinese characters are similar in sound and shape. Generally, many of these errors can be resolved by adding a post-processing module (Chinese spelling error correction) to the output of models such as ASR and OCR.

[0003] In order to reduce the number of search traversals, a BK (Burkhard-Keller) tree structure is introduced. The BK tree constructs a tree structure based on the conversion cost between the Chinese character strings of the correct lexicon, and then quickly searches for similar (in terms of Chinese character conversion cost) Chinese character strings based on t...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/232, G06F40/242, G06K9/62
CPC: G06F40/232, G06F40/242, G06F18/22
Inventor: 林金曙, 娄东方, 王炯亮, 陈哲, 陈春旭
Owner: HUNDSUN TECH