Optical character recognition
An optical font recognition method, applied in the field of optical character recognition. It addresses the problems that existing approaches cannot use contextual font information, cannot identify the font information of a single word, and cannot handle many different font types.
Examples
Embodiment 1
[0192] Embodiment 1 accurately calculates the X-height value of an English word, and reliably identifies the interval type (ascending letter, descending letter, full height or X-height) of an English word.
[0193] This embodiment improves the projection method and the connected component (CC) method, and combines them to improve the accuracy of the X-height calculation.
[0194] It is well known that the vertical projection profile method disclosed in "Optical font recognition from projection profiles" (Electronic Publishing, Vol. 6(3), 249-260, September 1993) can only handle words or lines of text with ideal vertical projection profiles, and that its performance deteriorates when dealing with short words and when distinguishing X-height characters from uppercase characters. The method may also fail when the text lines are skewed. In addition, it is difficult to identify the interval types of English words from the vertical projection distribution alone, because sometimes the projection distributions of words...
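To make the baseline projection approach concrete, the following is a minimal sketch of estimating an X-height from a row-wise ink-count profile. It is not the improved method of this embodiment, whose details are not given here. The function names, the binary-image representation and the `ratio` threshold are assumptions made for illustration only.

```python
from typing import List

# Assumed representation: a binary word image as rows of 0/1, where 1 = ink.
Image = List[List[int]]


def row_profile(word: Image) -> List[int]:
    """Count the ink pixels in each row of the word image."""
    return [sum(row) for row in word]


def estimate_x_height(word: Image, ratio: float = 0.5) -> int:
    """Estimate the X-height as the span of rows whose ink count reaches
    `ratio` times the peak of the profile.

    The threshold rule is a hypothetical heuristic, not taken from the source.
    """
    profile = row_profile(word)
    peak = max(profile, default=0)
    if peak == 0:
        return 0  # blank image: nothing to measure
    dense = [y for y, count in enumerate(profile) if count >= ratio * peak]
    return dense[-1] - dense[0] + 1


# Toy word: 3 sparse ascender rows, 5 dense x-height rows, 2 empty rows.
word = (
    [[1, 1] + [0] * 18 for _ in range(3)]
    + [[1] * 20 for _ in range(5)]
    + [[0] * 20 for _ in range(2)]
)
print(estimate_x_height(word))
```

A heuristic of this kind depends on the dense band standing out clearly in the profile, which is consistent with the limitation noted above for short words, where the profile is far from ideal.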
Embodiment 2
[0303] The second embodiment pertains to a priori font recognition at the word level. It uses interval type information to classify English words into four types, each with its own dictionary. It can identify the font information (glyph, serif, point, slope and spacing) of English words with higher accuracy and speed.
[0304] It supports recognition of at least 10 glyphs, many more than currently popular OCR software with a document layout recovery function, such as Omnipage and FineReader.
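The four-way split by interval type can be sketched as a small dispatch step. The mapping from ascender and descender presence to each type, and the dictionary names, are assumptions for illustration; the source does not specify how the types are derived or how the dictionaries are stored.

```python
from enum import Enum


class IntervalType(Enum):
    ASCENDING = "ascending letter"
    DESCENDING = "descending letter"
    FULL_HEIGHT = "full height"
    X_HEIGHT = "X-height"


def interval_type(has_ascender: bool, has_descender: bool) -> IntervalType:
    """Assumed rule: classify a word by which zones its ink occupies."""
    if has_ascender and has_descender:
        return IntervalType.FULL_HEIGHT
    if has_ascender:
        return IntervalType.ASCENDING
    if has_descender:
        return IntervalType.DESCENDING
    return IntervalType.X_HEIGHT


# Hypothetical placeholder: one dictionary identifier per interval type.
DICTIONARIES = {t: f"dictionary_{t.name.lower()}" for t in IntervalType}


def select_dictionary(has_ascender: bool, has_descender: bool) -> str:
    """Pick the dictionary that matches the word's interval type."""
    return DICTIONARIES[interval_type(has_ascender, has_descender)]


print(select_dictionary(True, False))
```

Splitting the dictionary this way means a word is only compared against entries of the same interval type, which is one plausible reason for the speed gain claimed above.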
[0305] Figure 11 shows the main flow chart of Embodiment 2.
[0306] In step 100 of Figure 11, the height of each word image is normalized to WordHeight (here WordHeight=35) by bilinear interpolation, so that words of any size can be processed. This is better than normalizing by X-height, because the X-height value sometimes cannot be obtained very accurately, which would affect the subsequent feature extraction and separation processing.
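The height normalization in step 100 can be sketched as follows. This is an illustrative pure-Python version, assuming a grayscale image stored as a list of rows, a width scaled to preserve the aspect ratio, and corner-aligned sampling; none of these details are stated in the source, and `normalize_height` is a hypothetical name.

```python
from typing import List

WORD_HEIGHT = 35  # WordHeight value given in the text

Image = List[List[float]]


def _source_coord(i: int, n_out: int, n_in: int) -> float:
    """Map output index i to a source coordinate (corner-aligned, assumed)."""
    return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)


def normalize_height(img: Image, target_h: int = WORD_HEIGHT) -> Image:
    """Resize img to target_h rows with bilinear interpolation.

    The width is scaled by the same factor (an assumption).
    """
    h, w = len(img), len(img[0])
    target_w = max(1, round(w * target_h / h))
    out: Image = []
    for r in range(target_h):
        y = _source_coord(r, target_h, h)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0  # vertical blend weight
        row = []
        for c in range(target_w):
            x = _source_coord(c, target_w, w)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0  # horizontal blend weight
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# A 4x5 horizontal gradient is stretched to 35 rows.
gradient = [[0.0, 1.0, 2.0, 3.0, 4.0] for _ in range(4)]
resized = normalize_height(gradient)
print(len(resized), len(resized[0]))
```

Because the target height is a fixed constant, the step does not depend on any measured quantity such as the X-height, which matches the rationale given in the paragraph above.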
[...
Embodiment 3
[0334] Embodiment 3 applies a word pair mechanism to the optical font recognition (OFR) of a line image. The word pair mechanism builds on the font classification method for English words and also takes into account how fonts are distributed in real English text. It uses a two-stage result conditioning technique based on contextual font information, which gives greater precision and neater output within lines of text. This embodiment also uses an accurate size recognition method.
[0335] It supports the recognition of 10 glyphs, many more than ordinary OCR software with a file layout recovery function (such as Omnipage and FineReader). It can achieve much higher accuracy, in both font style and font size, than other software.
[0336] Figure 15 shows the main flow chart of Embodiment 3.
[0337] In step 100 of Figure 15, the X-height values and interval types of all words in the line are first calculated. Preferably use the X heigh...