Similar Chinese character classification method combining stroke codes with Chinese character dot matrixes

A classification method for similar-shaped Chinese characters, applicable to electrical digital data processing, character and pattern recognition, and special-purpose data processing. It addresses the heavy, time-consuming, and laborious manual workload of collecting similar characters, improving classification efficiency and saving time and effort.

Active Publication Date: 2017-04-26
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

At present, similar characters are mostly collected manually for recognition, which involves a large workload and is time-consuming and labor-intensive.

Examples

Embodiment Construction

[0022] The technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0023] As shown in Figure 1, the present invention provides a language processing method for classifying similar-shaped Chinese characters, divided into the following three steps:

[0024] 1. Download the Unicode Chinese character stroke coding table from the Internet. It is a stroke order table covering all 20902 Chinese characters in the range U+4E00 to U+9FA5; part of it is shown in Table 1.

[0025] Table 1: Part of the Unicode Chinese character stroke encoding table

[0028] ...

[0029]

Chinese character   Ordinal value   Unicode encoding   Stroke order
求 (beg)             01499           6C42               1241344
忑 (nervous)         01500           5FD1               1244544
孛 (Bo)              01501           5B5B               1245521
車 (car)             01502           8ECA               1251112
甫 (just)            01503           752B               1251124
匣 (box)             01504           5323               1251125
更 (even)            01505           66F4...
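The rows of Table 1 could be loaded and grouped by shared stroke component roughly as follows. This is only a minimal sketch under stated assumptions, not the patent's implementation: it assumes each row is whitespace-separated as "ordinal, hexadecimal code point, stroke digits", and it treats a "stroke component" as any contiguous run of stroke digits of a fixed length. The names `parse_row` and `build_component_index` are hypothetical.

```python
from collections import defaultdict

# The range quoted in the text, U+4E00 to U+9FA5, holds 20902 code points.
CJK_COUNT = 0x9FA5 - 0x4E00 + 1

# Complete sample rows copied from Table 1: ordinal, code point, stroke order.
ROWS = """\
01499 6C42 1241344
01500 5FD1 1244544
01501 5B5B 1245521
01502 8ECA 1251112
01503 752B 1251124
01504 5323 1251125
"""

def parse_row(line):
    """Return (character, stroke_code) for one 'ordinal codepoint strokes' row."""
    _ordinal, codepoint, strokes = line.split()
    return chr(int(codepoint, 16)), strokes

def build_component_index(table, length):
    """Map every stroke substring of `length` digits to the characters containing it.

    Assumption: a 'stroke component' is a contiguous run of stroke digits.
    """
    index = defaultdict(set)
    for char, strokes in table.items():
        for start in range(len(strokes) - length + 1):
            index[strokes[start:start + length]].add(char)
    return index

table = dict(parse_row(line) for line in ROWS.splitlines())
index = build_component_index(table, length=6)

# Only components shared by two or more characters yield candidate sets.
candidates = {comp: chars for comp, chars in index.items() if len(chars) > 1}
```

With these six rows and a component length of 6, the only shared component is `125112`, which groups 甫 and 匣. The component length is a free parameter of this sketch; the source does not state which lengths the method uses.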

Abstract

The invention provides a similar Chinese character classification method combining stroke codes with Chinese character dot matrixes. The method first collects statistics on the stroke codes of Chinese characters and classifies the characters by the occurrence frequency of stroke structures, generating a data table in which each stroke component corresponds to the set of Chinese characters that include that component. The sets are then screened to filter out the sets having shorter and longer stroke components, and the sets having longer stroke components are added to a similar Chinese character database. The filtered sets are processed further with a Chinese character dot matrix comparison method: the dot matrixes of the characters within the same set are compared, characters with low similarity are filtered out, and the processed sets are added to the similar Chinese character database. The result is a similar Chinese character database covering most Chinese characters, in which the similar characters of a given Chinese character can be obtained by querying that character's table. The method improves the efficiency of similar Chinese character classification, reduces classification time, and yields relatively accurate similar Chinese character data.
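The dot matrix comparison step could look something like the sketch below. The visible text does not say how similarity is measured, so this example assumes a simple measure (the fraction of cells that agree between two equal-sized 0/1 matrices) and an arbitrary threshold. The function names and the toy 5x5 bitmaps are hypothetical and hand-drawn for illustration; they are not real font data.

```python
def to_matrix(rows):
    """Convert rows of '#' and '.' into a 0/1 dot matrix."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def similarity(a, b):
    """Fraction of cells holding the same value in two equal-sized dot matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("dot matrices must have the same shape")
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total

def filter_similar(reference, glyphs, threshold):
    """Return the glyph names at least `threshold` similar to the reference glyph."""
    ref = glyphs[reference]
    return {
        name
        for name, bitmap in glyphs.items()
        if name != reference and similarity(ref, bitmap) >= threshold
    }

# Toy bitmaps standing in for character dot matrixes (assumption: 5x5 grid).
GLYPHS = {
    "A": to_matrix(["#####", "#...#", "#####", "#...#", "#####"]),
    "B": to_matrix(["#####", "#...#", "#####", "#...#", "#...#"]),
    "C": to_matrix(["..#..", "..#..", "#####", "..#..", "..#.."]),
}

kept = filter_similar("A", GLYPHS, threshold=0.8)
```

Here "A" and "B" differ in 3 of 25 cells, so "B" passes the 0.8 threshold, while "C" agrees with "A" in only 11 of 25 cells and is dropped. A cell-agreement ratio is just one possible measure; the choice of measure and threshold would need to follow the full patent text.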

Description

Technical field

[0001] The invention belongs to the field of language processing, and in particular relates to a method for classifying Chinese characters.

Background

[0002] Chinese characters are composed of a few simple strokes, but because the strokes are arranged and combined in two-dimensional space, they form a wide variety of Chinese characters with complex structures. The specific points and lines that make up the glyphs of Chinese characters are the smallest structural units of Chinese characters. According to the writing requirements of regular script, the mark made from putting the pen down to lifting it is called one stroke; these marks are collectively called strokes, and the specific shape of a stroke is called its stroke form. The various radicals formed from these strokes produce many Chinese characters with similar shapes, which are called similar-shaped characters.

[0003] The recognition of similar-shaped characters involves glyph recognition. Glyph recognition serves all aspects of life, s...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06K9/62
CPC: G06F40/129, G06F18/241
Inventors: 邵玉斌, 王逍翔
Owner: KUNMING UNIV OF SCI & TECH