Extraction method for characters in form document image

A method for extracting characters from form document images, applied in the fields of computer vision and image processing. It addresses the inflexibility, low recognition rate, and low versatility of existing character extraction, improving flexibility and reducing the impact of character adhesion and overlap on recognition.

Active Publication Date: 2013-08-21
SICHUAN UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for extracting characters in a form document image, which solves


Examples


Embodiment Construction

[0031] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0032] Figure 1 shows an embodiment of the method of the present invention for extracting characters in a form document image. The method includes the following steps:

[0033] Step 1: convert the captured color form image into a grayscale image, preferably with 256 gray levels, and apply Gaussian smoothing to the grayscale image to remove noise;
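
As a rough illustration of Step 1, the sketch below converts an RGB image to 256 gray levels and smooths it. The patent text does not state a grayscale formula or a kernel size, so the BT.601 luma weights, the 3x3 binomial kernel, and the function names `to_gray` and `gaussian_smooth` are all assumptions chosen for the example.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 8-bit gray levels (0..255)."""
    # Assumption: ITU-R BT.601 luma weights; the source names no formula.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


# Assumption: 3x3 binomial approximation of a Gaussian; weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_smooth(gray):
    """Smooth a grayscale image, replicating edge pixels at the border."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = (acc + 8) // 16  # divide by 16, rounding to nearest
    return out
```

A flat region is left unchanged by the smoothing, while an isolated noisy pixel is pulled toward its neighbours, which is the noise-removal effect the step describes.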

[0034] Step 2: convolve the image with an edge detection operator, then binarize the image using the maximum inter-class variance (Otsu) method, ...
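
The visible part of Step 2 can be sketched as follows. The text says only "edge detection operator", so the choice of the Sobel operator here is an assumption, as are the function names and the |gx| + |gy| magnitude approximation. The threshold search is the standard maximum inter-class variance (Otsu) computation over a 256-level histogram.

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |gx| + |gy|, clipped to 0..255.

    Assumption: Sobel is used as the edge operator; border pixels stay 0.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


def otsu_threshold(image, levels=256):
    """Return the threshold t that maximizes the inter-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the lower class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # proportional to inter-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image, t):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [[1 if v > t else 0 for v in row] for row in image]
```

A possible use is `edges = sobel_magnitude(gray)` followed by `binarize(edges, otsu_threshold(edges))`, which keeps only the strong edge responses without a hand-tuned threshold.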


Abstract

The invention relates to the fields of image processing and computer vision, and in particular to a method for extracting characters from a form document image. The method includes: a first step of extracting line segments from the image using edge detection and the Hough transform; a second step of estimating the skew angle of the whole image from the direction distribution of the line segments and correcting the skew; a third step of connecting the line segments in the horizontal and vertical directions and locating the table cells of the form; a fourth step of binarizing the image and segmenting whole lines of characters within the table cells using the maximum between-class variance (Otsu) method, and extracting the individual characters with a sliding-window method; and a fifth step of repairing missing character strokes according to statistical features of the table cell frame lines. The method is flexible, effectively handles adhesion between characters and overlap between characters and form lines, and greatly reduces the influence of such adhesion and overlap on optical character recognition (OCR).
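
To make the second step of the abstract more concrete, here is one way a skew angle might be estimated from the direction distribution of detected line segments. The abstract does not say how the distribution is evaluated, so the length-weighted histogram, the 0.5 degree bin width, and the names `segment_deviation` and `estimate_skew` are assumptions for illustration. Segments are hypothetical `(x1, y1, x2, y2)` tuples such as a Hough-based detector might return.

```python
import math


def segment_deviation(x1, y1, x2, y2):
    """Signed deviation in degrees from the nearest axis, folded into [-45, 45).

    Horizontal and vertical form lines both map to the same deviation,
    so both kinds of line vote for the same skew angle.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return (angle + 45.0) % 90.0 - 45.0


def estimate_skew(segments, bin_width=0.5):
    """Estimate skew as the mean deviation inside the heaviest histogram bin.

    Each segment votes with a weight equal to its length, so long table
    lines dominate short strokes that belong to characters.
    """
    votes = {}  # bin index -> (sum of deviation * length, sum of length)
    for (x1, y1, x2, y2) in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue  # degenerate segment carries no direction
        dev = segment_deviation(x1, y1, x2, y2)
        key = round(dev / bin_width)
        total, weight = votes.get(key, (0.0, 0.0))
        votes[key] = (total + dev * length, weight + length)
    if not votes:
        return 0.0
    total, weight = max(votes.values(), key=lambda tw: tw[1])
    return total / weight
```

The sign of the result depends on the coordinate convention: with image coordinates where y grows downward, a positive value corresponds to a clockwise tilt on screen. Rotating the image by the negative of the estimate would then correct the skew.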

Description

Technical field

[0001] The invention relates to the technical fields of image processing and computer vision, and in particular to a method for extracting characters from a form document image.

Background

[0002] Tables are a common way of representing information and are widely used in daily life and work. At present, most form documents are still carried on paper. Paper documents have the advantage of confidentiality, but the information in them is difficult to manage and analyze. With the development of information technology, using computers to digitize large volumes of paper documents is an inevitable trend in the development of modern society.

[0003] Photographing or scanning images that contain form data, and then using digital image processing to extract and recognize the information in the form, is the main research direction of form document processing at home and abroad. The form recognition system usually i...

Application Information

IPC(8): G06K9/20, G06K9/54
Inventor: 王俊峰, 高琳, 姬郁林, 李虹
Owner: SICHUAN UNIV