Printed font character identification method based on Arabic character set

A character recognition technology based on the Arabic character set, applied in the field of character recognition. It addresses problems such as uncertain left and right character boundaries, inconsistent character width and height, and stroke information that is difficult to extract.

Status: Inactive. Publication Date: 2006-07-26
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0006] Characters have few strokes, and those strokes consist mainly of arcs, so stroke structure information is scarce and difficult to extract. The character set contains many subsets of similar characters whose members are extremely alike. Character width and height are not uniform, and the left and right boundaries of characters are uncertain. Fonts differ greatly from one another, some fonts come close to cursive handwriting, and the commonly used font sizes are small. Together, these characteristics pose great challenges to research on character recognition based on the Arabic character set.

Examples

Embodiment 1

[0455] Embodiment 1: Multi-font, multi-size printed character recognition system based on the Arabic character set

[0456] Using the multi-font, multi-size printed character recognition system of the present invention shown in Figure 14, experiments were carried out on 1,600 sets of printed documents collected in Uyghur, Kazakh, Kirgiz and Arabic. The hardware platform was a scanner (model: Uniscan 1248US) and an ordinary PC (CPU: Intel Pentium 4, 2.40 GHz; memory: 512 MB RAM; OS: Microsoft Windows XP). Besides the collected documents, a small number of samples were generated by printing Windows TrueType fonts directly. The fonts include most of the commonly used fonts, some less commonly used ones and a few rarely used ones, with at least 6 fonts in each type of sample. Font sizes range from Chinese size Small Five (9 pt) to size One (26 pt). Sample quality varies: the ratio of normal, broken and touching characters is about 2:1:1. After scanning input, te...

Abstract

The invention provides a printed character recognition method based on the Arabic character set. The method extracts region information, character form information and component information unique to the Arabic character set and uses them for pre-classification, which determines the candidate character class subset of the input character. It then extracts directional features that reflect the stroke composition of the character, and applies two steps of feature optimization: feature dressing, followed by a feature transformation that combines linear discriminant analysis (LDA) with the Karhunen-Loève (K-L) transform. Finally, a modified quadratic discriminant function (MQDF) statistical classifier makes the classification decision.
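The abstract names an MQDF statistical classifier as the final decision stage. What follows is a minimal pure-Python sketch of the MQDF distance as it is usually written in the literature. It assumes that each class's mean vector, the k largest eigenvalues of its covariance matrix with their unit eigenvectors, and a single constant delta standing in for the remaining minor eigenvalues have already been estimated from the optimized training features. The patent text here does not state k, delta, or how they are chosen. `mqdf_distance`, `classify` and the toy two-class model are hypothetical illustrations, not the inventors' implementation.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical per-class parameters: (mean, k largest eigenvalues,
# their unit eigenvectors, constant delta replacing the minor eigenvalues)
ClassModel = Tuple[Vector, Sequence[float], Sequence[Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mqdf_distance(x: Vector, mean: Vector, eigvals: Sequence[float],
                  eigvecs: Sequence[Vector], delta: float) -> float:
    """MQDF distance of feature vector x to one class; smaller is closer."""
    n, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Projections of (x - mean) onto the k principal axes of the class.
    proj = [dot(phi, diff) for phi in eigvecs]
    major = sum(p * p / lam for p, lam in zip(proj, eigvals))
    # Squared distance left over outside the k-dimensional principal subspace.
    residual = dot(diff, diff) - sum(p * p for p in proj)
    # Log-determinant term: k true eigenvalues plus (n - k) copies of delta.
    log_det = sum(math.log(lam) for lam in eigvals) + (n - k) * math.log(delta)
    return major + residual / delta + log_det


def classify(x: Vector, models: Dict[str, ClassModel]) -> str:
    """Return the class label with the smallest MQDF distance to x."""
    return min(models, key=lambda label: mqdf_distance(x, *models[label]))


# Toy 3-D example with k = 1; labels and numbers are made up for illustration.
models: Dict[str, ClassModel] = {
    "class_a": ([0.0, 0.0, 0.0], [4.0], [[1.0, 0.0, 0.0]], 0.5),
    "class_b": ([3.0, 3.0, 0.0], [4.0], [[0.0, 1.0, 0.0]], 0.5),
}
print(classify([0.5, 0.2, 0.1], models))  # class_a
```

Replacing the n - k smallest eigenvalues with one constant is what separates MQDF from the plain quadratic discriminant: it avoids inverting poorly estimated small eigenvalues, which matters when many similar character classes each have limited training samples.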

Description

Technical field

[0001] A printed character recognition method based on the Arabic character set, belonging to the field of character recognition.

Background

[0002] The scripts of Uyghur, Kazakh, Kirgiz and other ethnic minorities in China are written with the letters of the Arabic character set, and the rules by which their characters are composed and change their written forms are consistent with Arabic. Uyghur, Kazakh, Kirgiz and Arabic text can therefore be recognized with a unified method. In the present invention, Uyghur, Kazakh, Kirgiz and Arabic character recognition are collectively referred to as character recognition based on the Arabic character set. Uyghur, Kazakh, Kirgiz, Arabic and other scripts written in the Arabic character set consist of 30 to 40 basic letters. Each basic letter has 1 to 4 different written forms depending on where it occurs in the word: initial form, middle form, final form, independ...
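The background notes that each basic letter takes 1 to 4 written forms depending on its position in the word. A minimal sketch of how the form follows from each letter's joining behaviour, assuming the usual simplified model: dual-joining letters connect on both sides, right-joining letters (such as alef, dal, reh, waw) connect only to the preceding letter, and non-joining letters connect to neither. In this model those three groups have four, two and one distinct forms respectively. The `JOINING` table is a deliberately tiny, hypothetical subset of Arabic letters, and the sketch ignores diacritics, ligatures such as lam-alef, and the additional letters used in Uyghur, Kazakh and Kirgiz; it is not taken from the patent.

```python
from enum import Enum
from typing import Dict, List


class Joining(Enum):
    DUAL = "D"   # joins both the preceding and the following letter
    RIGHT = "R"  # joins only the preceding letter
    NONE = "U"   # joins neither neighbour


# Hypothetical, deliberately tiny subset of Arabic letters and their joining types.
JOINING: Dict[str, Joining] = {
    "\u0627": Joining.RIGHT,  # alef
    "\u062F": Joining.RIGHT,  # dal
    "\u0631": Joining.RIGHT,  # reh
    "\u0648": Joining.RIGHT,  # waw
    "\u0628": Joining.DUAL,   # beh
    "\u062A": Joining.DUAL,   # teh
    "\u0633": Joining.DUAL,   # seen
    "\u0643": Joining.DUAL,   # kaf
    "\u0644": Joining.DUAL,   # lam
    "\u0645": Joining.DUAL,   # meem
    "\u0646": Joining.DUAL,   # noon
    "\u0621": Joining.NONE,   # hamza
}


def positional_forms(word: str) -> List[str]:
    """Return the contextual form of each letter of a word given in logical order."""
    types = [JOINING[ch] for ch in word]
    forms = []
    for i, jt in enumerate(types):
        prev_jt = types[i - 1] if i > 0 else None
        next_jt = types[i + 1] if i + 1 < len(types) else None
        # Linked to the previous letter only if that letter joins forward
        # (DUAL) and this letter accepts a link from behind (DUAL or RIGHT).
        joins_prev = prev_jt is Joining.DUAL and jt is not Joining.NONE
        # Linked to the next letter only if this letter joins forward (DUAL)
        # and a next letter exists that accepts a link from behind.
        joins_next = (jt is Joining.DUAL and next_jt is not None
                      and next_jt is not Joining.NONE)
        if joins_prev and joins_next:
            forms.append("medial")    # the "middle form"
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")  # the independent form
    return forms


# kaf, teh, alef, beh: the alef breaks the connection, so beh stands alone.
print(positional_forms("\u0643\u062A\u0627\u0628"))
# ['initial', 'medial', 'final', 'isolated']
```

The example also hints at why left and right character boundaries are hard to pin down: one word can mix joined and detached letters, so a recognizer cannot assume a fixed connection pattern.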

Application Information

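Patent type & authority: Patents (China)
IPC(8): G06K9/00; G06K9/72
Inventors: 丁晓青 (Ding Xiaoqing), 王华 (Wang Hua), 靳简明 (Jin Jianming), 彭良瑞 (Peng Liangrui), 刘长松 (Liu Changsong), 方驰 (Fang Chi), 哈力木拉提 (Halimulati)
Owner: TSINGHUA UNIV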