Segmentation of a word bitmap into individual characters or glyphs during an OCR process

A technique for dividing word bitmaps into individual character or glyph fields during the OCR process. It addresses the difficulty of segmenting words into individual symbols that arises from poor image quality, font weight, italic text, and character shapes.

Active Publication Date: 2013-01-09
MICROSOFT TECH LICENSING LLC

AI Technical Summary

Problems solved by technology

In many cases it is difficult to segment words into individual symbols because of poor image quality, font weight, italicized text, character shapes, and similar factors.
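To make the difficulty concrete, the sketch below applies a simple baseline that is not taken from the patent: cut a word wherever a pixel column contains no ink. The function name `blank_columns` and both toy bitmaps are illustrative assumptions (1 = ink, 0 = background). Upright strokes leave an empty column to cut along, while slanted, italic-like strokes overlap in every column even though a clear diagonal gap separates them.

```python
def blank_columns(bitmap):
    """Return the column indices that contain no ink (1 = ink, 0 = background)."""
    width = len(bitmap[0])
    return [x for x in range(width) if all(row[x] == 0 for row in bitmap)]


# Two upright strokes: column 2 is empty, so a straight vertical cut works.
upright = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Two slanted strokes, each two pixels wide and shifting one column left per row.
# A diagonal band of background separates them, yet every column holds some ink.
slanted = [
    [0, 0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0, 0],
]

print(blank_columns(upright))  # straight cut positions for the upright strokes
print(blank_columns(slanted))  # straight cut positions for the slanted strokes
```

A straight-column test of this kind finds no cut position for the slanted strokes, which is one way the italic case defeats naive segmentation.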

Embodiment Construction

[0016] Figure 1 shows an illustrative example of a system 5 for performing optical character recognition (OCR) of text images. System 5 includes a data capture device (e.g., scanner 10) that generates an image of document 15. Scanner 10 may be an imager-based scanner that uses a charge-coupled device as an image sensor to generate the image. Scanner 10 processes the image to generate input data and sends the input data to a processing device (e.g., OCR engine 20) for use in recognizing characters within the image. In this particular example, OCR engine 20 is incorporated into scanner 10. In other examples, however, OCR engine 20 may be a separate unit, such as a stand-alone unit or a unit incorporated in another device such as a PC or a server.

[0017] The OCR engine 20 receives the text image as a bitmap of lines of text. The image may be a scanned image of text or a digital document such as a PDF or Microsoft Word document where input data is already...

Abstract

An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines that divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking the place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is then filtered using various heuristics in order to preserve those lines that do separate a word's glyphs and to minimize the number of those that do not.
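The abstract does not say how the chop-lines are computed or which heuristics filter them. As a rough illustration only, the sketch below swaps in a generic technique, a minimum-ink top-to-bottom path found by dynamic programming, followed by a deliberately simple filter. The names `chop_line_candidates` and `keep_clean_lines`, the rule that a line may shift at most one column per row, and the "crosses no ink" filter are all assumptions made for this example, not details from the patent.

```python
def chop_line_candidates(bitmap):
    """For each bottom-row column, find the top-to-bottom path ending there
    that crosses the fewest ink pixels (1 = ink, 0 = background).

    Returns a list of (ink_crossed, path) tuples, where path[y] is the column
    visited in row y and adjacent rows differ by at most one column.
    """
    height, width = len(bitmap), len(bitmap[0])
    cost = [list(bitmap[0])]   # cost[y][x]: least ink crossed to reach (y, x)
    back = [[None] * width]    # back[y][x]: column chosen in row y - 1
    for y in range(1, height):
        row_cost, row_back = [], []
        for x in range(width):
            # Try straight down first so ties keep the line vertical.
            prev = min((px for px in (x, x - 1, x + 1) if 0 <= px < width),
                       key=lambda px: cost[y - 1][px])
            row_cost.append(cost[y - 1][prev] + bitmap[y][x])
            row_back.append(prev)
        cost.append(row_cost)
        back.append(row_back)

    candidates = []
    for x_end in range(width):
        path, x = [x_end], x_end
        for y in range(height - 1, 0, -1):  # walk back up to the top row
            x = back[y][x]
            path.append(x)
        path.reverse()
        candidates.append((cost[-1][x_end], path))
    return candidates


def keep_clean_lines(candidates):
    """Hypothetical filter: keep only lines that cross no ink at all."""
    return [path for ink, path in candidates if ink == 0]


# Two slanted strokes with a diagonal background gap and no empty column.
slanted_word = [
    [0, 0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0, 0],
]

for line in keep_clean_lines(chop_line_candidates(slanted_word)):
    print(line)  # one column index per row, top to bottom
```

On this toy bitmap the surviving lines slant leftward through the gap between the two strokes, which is the kind of non-straight cut the abstract describes. A practical filter would also need to merge near-duplicate lines and reject lines that split a single glyph; those steps are left out here.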

Description

Background

[0001] Optical character recognition (OCR) is the computer-based conversion of an image of text into digital form as machine-editable text, generally in a standard encoding scheme. This process eliminates the need to enter documents into a computer system manually. Many different problems can arise from poor image quality, defects introduced during scanning, and similar causes. For example, a conventional OCR engine may be coupled to a flatbed scanner that scans pages of text. Because the page lies flat against the scanning surface, the images the scanner produces typically exhibit uniform contrast and brightness, little skew and distortion, and high resolution. The OCR engine can therefore easily convert the text in the image into machine-editable text. However, when the image is of low quality in terms of contrast, brightness, skew, etc., the performance of the OCR engine will decrease and the processing time will increase due to processing all ...

Application Information

IPC(8): H04N1/387, H04N1/04, H04N1/00, G06K9/00, G06V30/10
CPC: G06K2209/01, G06K9/342, G06V30/10, G06V30/15
Inventor: D·尼耶姆切维奇
Owner: MICROSOFT TECH LICENSING LLC