Document image binarization method based on support vector machine

The technology combines a support vector machine with document image processing and applies to computer components, instruments, and character and pattern recognition. It addresses the high computational complexity of existing methods and their inability to binarize low-quality document images.

Status: inactive. Publication date: 2016-11-09
HUBEI UNIV OF TECH
Cites: 1. Cited by: 22.

AI Technical Summary

Problems solved by technology

[0005] Researchers in China and abroad have also proposed many other methods, such as background estimation, local contrast, stroke edge detection, gradient normalization and saliency maps, texture analysis, Laplacian energy, error diffusion, spectral clustering, and hybrid algorithms. Most of these methods have relatively high computational complexity and either cannot adequately binarize low-quality document images affected by degradation factors such as ink infiltration, page stains, and background textures, or apply only to specific scenarios (such as uneven lighting).



Embodiment Construction

[0067] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

[0068] Referring to Figure 1, the support vector machine-based document image binarization method provided by the invention comprises the following steps:

[0069] Step 1: Convert the color image to grayscale (this step can be skipped for grayscale images);

[0070] At present, researchers mainly convert color images to grayscale using methods such as the component weighted average, the mean value, and the maximum value. These methods are based largely on models of human visual characteristics.
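To make the conventional graying methods named in [0070] concrete (these are not the minimum mean value method the invention itself uses), here is a minimal Python sketch on images stored as nested lists of (R, G, B) tuples. The weights in the weighted-average variant are the common ITU-R BT.601 luma coefficients, an assumption since the text gives none; the function names are hypothetical.

```python
# Hypothetical helpers illustrating conventional color-to-gray conversions.
# Images are nested lists of (R, G, B) tuples with channel values in 0..255.


def gray_weighted(rgb):
    """Component weighted average; the BT.601 weights are an assumed choice."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def gray_average(rgb):
    """Plain mean of the three channels."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb]


def gray_maximum(rgb):
    """Brightest channel per pixel."""
    return [[max(r, g, b) for (r, g, b) in row] for row in rgb]


image = [[(255, 0, 0), (0, 0, 255)], [(90, 90, 90), (255, 255, 255)]]
print(gray_average(image))  # [[85.0, 85.0], [90.0, 255.0]]
print(gray_maximum(image))  # [[255, 255], [90, 255]]
```

In this example the average and maximum methods map pure red and pure blue to the same gray level, while the weighted average keeps them apart.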

[0071] The present invention uses the minimum mean value method to grayscale the c...


Abstract

The invention discloses a document image binarization method based on a support vector machine (SVM). The method comprises eight steps: color image graying, division of the document image into blocks, local contrast enhancement of the image blocks, feature parameter extraction, SVM threshold classification, image block splicing, stroke width estimation, and local binarization. The color image is converted to grayscale with a minimum mean method, so the resulting gray image is independent of color. The defined local contrast not only compensates for the effect of changes in image brightness but also takes into account the normalized contribution of every pixel in the neighborhood to the local contrast. The SVM threshold classification method gives high accuracy and reliability. Stroke width estimation by line-by-line scanning is highly robust to changes in document image resolution. The method preserves character stroke details better and suppresses ink infiltration, page stains, textured backgrounds, and uneven illumination while still segmenting the character foreground effectively.
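The abstract names several stages without giving their formulas. The Python sketch below illustrates three of them under stated assumptions: splitting the gray image into fixed-size blocks, a normalized max-min local contrast (a common stand-in, not the patent's own definition, which additionally weights each neighbor's normalized contribution), and stroke width estimated by scanning rows and taking the most frequent length of horizontal ink runs. The block size, window radius, `eps`, the mode-of-runs rule, and all function names are hypothetical; the feature extraction, SVM classification, splicing, and local binarization steps are not shown.

```python
from collections import Counter

# Hypothetical sketches of three stages named in the abstract.
# Gray images are nested lists of intensities; binary images use 1 for ink.


def split_blocks(gray, size):
    """Cut the image into size x size tiles (edge tiles may be smaller), row-major."""
    h, w = len(gray), len(gray[0])
    return [[row[x:x + size] for row in gray[y:y + size]]
            for y in range(0, h, size) for x in range(0, w, size)]


def local_contrast(gray, radius=1, eps=1e-6):
    """Stand-in contrast: (max - min) / (max + min + eps) over a square window.

    Dividing by max + min makes the value insensitive to a uniform brightness
    scaling; the patent's definition is not reproduced here.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            hi, lo = max(window), min(window)
            out[y][x] = (hi - lo) / (hi + lo + eps)
    return out


def estimate_stroke_width(binary):
    """Scan row by row, count horizontal ink runs, return the most common length."""
    runs = Counter()
    for row in binary:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                runs[run] += 1
                run = 0
        if run:  # a run that touches the right border
            runs[run] += 1
    return runs.most_common(1)[0][0] if runs else 0
```

Taking the mode of row run lengths is only one simple reading of "line-by-line scanning"; the exact rule the invention uses is not given in this excerpt.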

Description

Technical field

[0001] The invention belongs to the technical fields of digital image processing, pattern recognition, and machine learning. It relates to document image binarization, in particular to a support vector machine (SVM)-based binarization method for low-quality document images.

Background art

[0002] Document Analysis and Recognition (DAR) technology has been widely used in printed character and formula recognition, handwritten character recognition, document image segmentation, video subtitle extraction, text information retrieval, and other fields. A DAR system mainly comprises image acquisition, preprocessing, binarization, layout analysis, character recognition, and indexing. Image binarization is one of the key processing steps and directly affects the performance of the DAR system. However, binarizing low-quality document images is extremely challenging because of factors such as low image contrast, ink smearing, page stains, and uneven lighting. [00...


Application Information

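Patent type & authority: Application (China)
IPC (8th edition): G06K9/38; G06K9/62
CPC: G06V10/28; G06F18/214
Inventors: 熊炜 (Xiong Wei), 赵诗云 (Zhao Shiyun), 徐晶晶 (Xu Jingjing), 赵楠 (Zhao Nan), 刘敏 (Liu Min), 王改华 (Wang Gaihua), 李敏 (Li Min), 刘小镜 (Liu Xiaojing), 吴俊驰 (Wu Junchi)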
Owner HUBEI UNIV OF TECH