Optical character recognition

An optical font recognition method and device in the field of optical character recognition. It addresses three problems: the inability to use contextual font information, the inability to identify the font information of individual words, and the mixing of many different font types within one analyzed region.

Status: Inactive. Publication date: 2009-10-14
CANON KK
Cites 8 documents; cited by 0.

AI Technical Summary

Problems solved by technology

For a text line or block set in a single font, the first method (global features extracted from words, lines, or paragraphs) readily obtains the font information, but it cannot identify the font of an individual word, even though different words in a line can have different fonts (font style, font size, weight, slant, etc.).
The second method (local features) can identify separate font information for each word in a text line, but it cannot exploit contextual font information, even though the fonts of neighboring words are usually not completely independent.
Choosing the size of the analyzed region is a trade-off: if the region is too small, it may not contain enough information for classification; if it is too large, too many different font types may be mixed within it.

Examples

Embodiment 1

[0192] Embodiment 1 calculates the X-height of an English word accurately and reliably identifies the word's interval type (ascending, descending, full height, or X-height).
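The four interval types can be illustrated with a small classifier. This is a minimal sketch, not the patent's procedure: it assumes the word's top and bottom ink rows, the x-line, and the baseline are already known, and the tolerance `tol` and all names are hypothetical.

```python
from enum import Enum


class IntervalType(Enum):
    X_HEIGHT = "x-height"      # only x-height letters, e.g. "one"
    ASCENDING = "ascending"    # ascenders or capitals, e.g. "hot"
    DESCENDING = "descending"  # descenders, e.g. "gap"
    FULL_HEIGHT = "full"       # both, e.g. "played"


def interval_type(top, bottom, x_line, baseline, tol=2):
    """Classify a word from its ink extent relative to the x-line and baseline.

    All values are image row indices (y grows downward). `tol` is a
    hypothetical pixel tolerance that absorbs small overshoots.
    """
    has_ascender = top < x_line - tol        # ink rises clearly above the x-line
    has_descender = bottom > baseline + tol  # ink drops clearly below the baseline
    if has_ascender and has_descender:
        return IntervalType.FULL_HEIGHT
    if has_ascender:
        return IntervalType.ASCENDING
    if has_descender:
        return IntervalType.DESCENDING
    return IntervalType.X_HEIGHT
```

Once the x-line and baseline are known, the classification itself is cheap; the hard part, which this embodiment addresses, is estimating those two lines reliably.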

[0193] This embodiment improves both the projection method and the connected-component (CC) method and combines them to calculate the X-height more accurately.

[0194] It is well known that the vertical-projection-profile method disclosed in "Optical font recognition from projection profiles" (Electronic Publishing, Vol. 6(3), pp. 249-260, September 1993) can only handle words or text lines with ideal vertical projection profiles; its performance deteriorates on short words and when distinguishing X-height characters from uppercase characters, and it may fail when text lines are skewed. In addition, it is difficult to identify the interval type of an English word from the vertical projection distribution alone, because sometimes the projection distributions of words...
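To show what a projection profile measures, here is a simplified, projection-only estimate of the X-height band. It is not the combined projection and connected-component method of this embodiment: it assumes a deskewed binary word image, takes the profile as the ink count per image row, and uses a hypothetical density `ratio`. As noted above, such a profile becomes unreliable for short words and skewed lines.

```python
def row_profile(image):
    """Ink count per row of a binary image given as a list of rows (1 = ink)."""
    return [sum(row) for row in image]


def estimate_x_band(image, ratio=0.5):
    """Return (x_line, baseline, x_height) estimated from the row profile.

    Rows whose ink count reaches `ratio` of the peak count are treated as
    the dense x-height band; sparser rows above and below are assumed to
    hold only ascender or descender strokes. `ratio` is hypothetical.
    """
    profile = row_profile(image)
    peak = max(profile)
    if peak == 0:
        raise ValueError("empty word image")
    dense = [y for y, count in enumerate(profile) if count >= ratio * peak]
    x_line, baseline = dense[0], dense[-1]
    return x_line, baseline, baseline - x_line + 1
```

For a tiny "hp"-like bitmap whose rows contain 1, 1, 5, 4, 4, 3, 1, 1 ink pixels, the dense band is rows 2 to 5, so the estimated X-height is 4 rows.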

Embodiment 2

[0303] Embodiment 2 performs a priori font recognition at the word level. It uses interval-type information to divide English words into four types, each with its own dictionary, and identifies the font information of English words (typeface, serif, weight, slant, and spacing) with higher accuracy and speed.
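Per-type dictionaries can be pictured as separate sets of font prototypes, with each word compared only against the dictionary for its interval type. The sketch below assumes feature vectors and font labels (both hypothetical; the source does not specify them) and uses nearest-neighbor matching by Euclidean distance as a stand-in classifier.

```python
import math


def recognize_font(features, itype, dictionaries):
    """Return the label of the prototype nearest to `features`.

    Only the dictionary for the word's interval type `itype` is searched,
    which keeps each comparison set small. `dictionaries` maps an interval
    type to {font_label: prototype_vector}; its contents are hypothetical.
    """
    candidates = dictionaries[itype]
    return min(candidates, key=lambda label: math.dist(features, candidates[label]))


# Hypothetical two-feature prototypes for two of the four interval types.
DICTIONARIES = {
    "x-height": {"serif-regular": [0.2, 0.5], "sans-bold": [0.6, 0.9]},
    "ascending": {"serif-regular": [0.3, 0.4], "sans-bold": [0.7, 0.8]},
}
```

Splitting by interval type means that words with different vertical shapes never compete in the same comparison, which is one plausible reason separate dictionaries help both accuracy and speed.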

[0304] It supports recognition of at least 10 typefaces, many more than popular OCR software with document-layout recovery, such as OmniPage and FineReader.

[0305] Figure 11 shows the main flowchart of Embodiment 2.

[0306] In step 100 of Figure 11, the height of each word image is normalized to WordHeight (here, WordHeight = 35) by bilinear interpolation, so that words of any size can be processed. This is preferable to normalizing by the X-height, because the X-height sometimes cannot be obtained accurately, and such errors would affect the subsequent feature extraction and separation steps.
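Step 100 can be sketched as a height normalization by bilinear interpolation. This sketch assumes a grayscale image stored as a list of rows of floats and scales the width by the same factor to keep the aspect ratio (an assumption; the source only fixes the height). The function names are hypothetical.

```python
WORD_HEIGHT = 35  # target height from step 100


def bilinear_resize(image, new_h, new_w):
    """Resize a 2-D grayscale image by bilinear interpolation.

    Output corners map exactly onto input corners (align-corners style).
    """
    h, w = len(image), len(image[0])
    out = []
    for i in range(new_h):
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def normalize_word_height(image, word_height=WORD_HEIGHT):
    """Scale a word image to `word_height` rows, preserving aspect ratio."""
    h, w = len(image), len(image[0])
    new_w = max(1, round(w * word_height / h))
    return bilinear_resize(image, word_height, new_w)
```

A 10 x 20 word image, for example, becomes 35 x 70, so every word reaches feature extraction at the same height regardless of its original point size.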

[...]

Embodiment 3

[0334] Embodiment 3 applies a word-pair mechanism to the optical font recognition (OFR) of text-line images. The mechanism builds on the font classification method for English words while also taking into account how fonts are distributed in real English text. It uses a two-stage result-conditioning technique based on contextual font information, which yields more precise and more consistent output within a text line. This embodiment also uses an accurate font-size recognition method.
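The two-stage conditioning can be illustrated with a simplified sketch. The rules here are assumptions, not the patented procedure: adjacent words are grouped into pairs, a confidently recognized longer word overrides a low-confidence shorter partner, and any word that is still uncertain then takes the most frequent trusted label in the line. The `Word` type, its confidence scores, and `threshold` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    font: str          # per-word font label from the classifier (hypothetical format)
    confidence: float  # classifier confidence in [0, 1] (assumed available)


def condition_line(words, threshold=0.6):
    """Return conditioned font labels for the words of one text line."""
    fonts = [w.font for w in words]
    resolved = [w.confidence >= threshold for w in words]

    # Stage 1: word pairs (0, 1), (2, 3), ...; a trailing odd word stays unpaired.
    for i in range(0, len(words) - 1, 2):
        a, b = i, i + 1
        if len(words[a].text) >= len(words[b].text):
            longer, shorter = a, b
        else:
            longer, shorter = b, a
        # Longer words carry more evidence, so a trusted longer word wins.
        if resolved[longer] and not resolved[shorter]:
            fonts[shorter] = fonts[longer]
            resolved[shorter] = True

    # Stage 2: line-level vote among trusted labels for the remaining words.
    trusted = Counter(f for f, ok in zip(fonts, resolved) if ok)
    if trusted:
        majority = trusted.most_common(1)[0][0]
        for i, ok in enumerate(resolved):
            if not ok:
                fonts[i] = majority
    return fonts
```

Confidently recognized words keep their own labels, so a genuinely different font (for example, an italic word in a serif line) survives the line-level vote.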

[0335] It supports recognition of 10 typefaces, many more than ordinary OCR software with document-layout recovery (such as OmniPage and FineReader), and achieves much higher accuracy than other software for both font style and font size.

[0336] Figure 15 shows the main flowchart of Embodiment 3.

[0337] In step 100 of Figure 15, the X-height values and interval types of all words in the line are calculated first. Preferably use the X heigh...

Abstract

The invention relates to an optical character recognition method and device. First, it provides an optical font recognition method and device that divide the words of an input text image into word pairs, recognize the longer and the shorter word of each pair separately, adjust each word's font information according to that of adjacent words, apply a rough adjustment step that adjusts the font information of a line according to the font information within that line, and additionally recognize the sizes of the words in the line. The invention also provides a font recognition method and device based on a classification process, and a method and device that identify interval types and calculate the X-height by combining the projection method with the connected-component method.

Description

Technical field
[0001] The present invention relates to an optical character recognition method and equipment, and more specifically to a method and equipment for identifying character font types in an optical character recognition system.
Background
[0002] Optical character recognition (OCR) systems have been widely used. Font information such as font style, slant, point weight, and font size has been used in conventional OCR systems to improve OCR performance, and font information also benefits document structure analysis and information recovery.
[0003] Two methods are currently available for font recognition:
[0004] - Extract global features from textual entities (words, lines, paragraphs). This approach is suitable for a priori font recognition, where fonts are identified without any knowledge of letter classes.
[0005] - Extract local features from individual letters. This approach can benefit substantially from knowledg...

Application Information

Patent type and authority: Patent (China)
IPC (8th edition): G06K9/68; G06K9/54
Inventors: 伊晓晶 (Yi Xiaojing), 谢文俊 (Xie Wenjun), 李献 (Li Xian)
Owner: CANON KK