Printed font character identification method based on Arabic character set

A character recognition and character set technology, applied in the field of character recognition, that addresses problems such as inconsistent character width and height, large differences between fonts, and insufficient stroke structure information.

Status: Inactive; Publication Date: 2005-04-13
TSINGHUA UNIV
Cites: 0; Cited by: 21

AI Technical Summary

Problems solved by technology

[0006] Characters in the Arabic character set have few strokes, and those strokes consist mainly of arcs, so stroke structure information is sparse and difficult to extract. The character set contains many subsets of extremely similar characters, character width and height vary, and the left and right boundaries of characters are indeterminate. In addition, different fonts differ greatly, some fonts are close to cursive handwriting, and the commonly used font sizes are small. Together, these properties pose great challenges for character recognition research based on Arabic character sets.

Examples

Embodiment 1

[0453] Embodiment 1: Multi-font and multi-font-size printed type character recognition system based on Web Arabic character set

[0454] The multi-font, multi-font-size printed character recognition system of the present invention is shown in Figure 14. The experimental hardware platform consisted of a scanner (model: Uniscan 1248US) and an ordinary PC (CPU: Intel Pentium 4, 2.40 GHz; memory: 512 MB RAM; OS: Microsoft Windows XP). Experiments were carried out on a collection of 1600 sets of printed documents in Uyghur, Kazakh, Kirgiz and Arabic. Most were collected from today's major U/K/K (Uyghur/Kazakh/Kirgiz) publishing systems and Arabic publishing systems; a small number were printed directly from Windows TrueType fonts. The fonts include most of the most commonly used fonts, some less commonly used fonts, and a few rarely used fonts. Each type of sample contains at least 6 fonts. Font sizes range from No. 5 to No. 1 (Chinese font size designations). The ...

Abstract

The invention provides a printed character recognition method based on the Arabic character set. The method extracts region information, font information, and component information unique to the Arabic character set and uses them for pre-classification, which determines the subset of character classes to which the input character belongs. It then extracts directional features that reflect the stroke composition of the character and applies a two-step feature optimization: feature dressing, followed by a feature transformation that integrates linear discriminant analysis (LDA) and the Karhunen-Loève (K-L) transform. Finally, the classification decision is made by a modified quadratic discriminant function (MQDF) statistical classifier.
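The final step, MQDF classification, can be illustrated with a minimal sketch. It assumes the widely used MQDF2 form: for each class, only the k largest covariance eigenvalues are kept, and the remaining d - k minor eigenvalues are replaced by a single constant delta. The abstract does not give the patent's exact parameterization, so `k`, `delta`, the floor value, and the class name `MQDF` are hypothetical choices made here for illustration. The input `X` is assumed to already be the optimized feature vectors (after dressing and the LDA/K-L transform), which this sketch does not implement.

```python
import numpy as np


class MQDF:
    """Sketch of an MQDF2 classifier (hypothetical parameterization).

    k:     number of dominant eigenvectors kept per class (assumed hyperparameter).
    delta: constant replacing the d - k minor eigenvalues; if None, each class
           uses the mean of its own minor eigenvalues (an assumed default).
    """

    def __init__(self, k, delta=None, floor=1e-6):
        self.k = k
        self.delta = delta
        self.floor = floor  # guards against log(0) and division by zero

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        d = X.shape[1]
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False, bias=True)  # maximum-likelihood covariance
            evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
            order = np.argsort(evals)[::-1]             # re-sort to descending
            evals, evecs = evals[order], evecs[:, order]
            lam = np.maximum(evals[: self.k], self.floor)
            phi = evecs[:, : self.k]
            if self.delta is not None:
                delta = self.delta
            elif d > self.k:
                delta = max(evals[self.k:].mean(), self.floor)
            else:
                delta = self.floor
            self.params_.append((mu, lam, phi, delta))
        return self

    def _g(self, x, mu, lam, phi, delta):
        """MQDF2 distance from x to one class; smaller means more likely."""
        diff = x - mu
        proj = phi.T @ diff                          # coordinates on the k principal axes
        major = np.sum(proj ** 2 / lam)              # Mahalanobis term in the principal subspace
        residual = diff @ diff - np.sum(proj ** 2)   # squared distance outside that subspace
        return (major + residual / delta
                + np.sum(np.log(lam)) + (diff.size - self.k) * np.log(delta))

    def predict(self, X):
        scores = np.array([[self._g(x, *p) for p in self.params_] for x in X])
        return self.classes_[np.argmin(scores, axis=1)]


# Toy usage on synthetic 4-D features standing in for transformed direction features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(5.0, 1.0, (200, 4))])
y = np.array([0] * 200 + [1] * 200)
clf = MQDF(k=2).fit(X, y)
pred = clf.predict(np.array([[0.0] * 4, [5.0] * 4]))
```

Replacing the minor eigenvalues with one constant is the usual motivation for MQDF over a full quadratic discriminant: small eigenvalues estimated from limited samples are unreliable, and the constant also reduces storage per class.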

Description

Technical Field

[0001] The present invention, a printed character recognition method based on the Arabic character set, belongs to the field of character recognition.

Background Art

[0002] The scripts of the Uyghur, Kazakh, Kirgiz and other ethnic minorities in China are written with characters of the Arabic character set system, and their composition rules and writing forms are consistent with Arabic. Therefore, Uyghur, Kazakh, Kirgiz, Arabic and similar characters can be recognized with a unified method. In the present invention, Uyghur, Kazakh, Kirgiz and Arabic character recognition are collectively referred to as character recognition based on Arabic character sets. Uyghur, Kazakh, Kirgiz, Arabic and other scripts written in the Arabic character set are each composed of 30 to 40 basic letters. Depending on where it appears in the word, each basic letter has 1 to 4 different written forms: initial, medial, final and isolated. Therefore, in actu...
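The positional-form rule in [0002] can be sketched in a few lines. The sketch assumes an undiacritized word given as a string of Arabic-script letters, and uses deliberately incomplete, hypothetical tables of "right-joining" letters (which connect only to the preceding letter) and "non-joining" letters (which connect to neither neighbor); all other letters are treated as dual-joining. A right-joining letter can only appear in isolated or final form, and hamza only in isolated form, which is why a basic letter has between 1 and 4 forms rather than always 4.

```python
# Hypothetical, deliberately incomplete tables. A real system would need the
# full joining-type data for every letter of Uyghur, Kazakh, Kirgiz and Arabic.
RIGHT_JOINING = {
    "\u0622", "\u0623", "\u0625", "\u0627",  # alef variants
    "\u0629",                                # teh marbuta
    "\u062F", "\u0630",                      # dal, thal
    "\u0631", "\u0632", "\u0698",            # reh, zain, jeh
    "\u0648",                                # waw
}
NON_JOINING = {"\u0621"}  # hamza


def joins_next(ch):
    """True if ch can connect to the letter that follows it."""
    return ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_prev(ch):
    """True if ch can connect to the letter that precedes it."""
    return ch not in NON_JOINING


def positional_forms(word):
    """Label each letter of an undiacritized word as isolated/initial/medial/final."""
    forms = []
    for i, ch in enumerate(word):
        link_prev = i > 0 and joins_next(word[i - 1]) and joins_prev(ch)
        link_next = i + 1 < len(word) and joins_next(ch) and joins_prev(word[i + 1])
        if link_prev and link_next:
            forms.append("medial")
        elif link_next:
            forms.append("initial")
        elif link_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms
```

For example, in the word kitab ("book", letters kaf, teh, alef, beh), the alef cannot connect forward, so the trailing beh is written in its isolated form even though it ends the word: `positional_forms("\u0643\u062A\u0627\u0628")` yields initial, medial, final, isolated.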

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00; G06K9/72
Inventors: Ding Xiaoqing (丁晓青), Wang Hua (王华), Jin Jianming (靳简明), Peng Liangrui (彭良瑞), Liu Changsong (刘长松), Fang Chi (方驰)
Owner TSINGHUA UNIV