Stroke width figure based method for extracting Chinese character data from image

A stroke width and data extraction technology, applied in the fields of character and pattern recognition, instruments, and computer components. It addresses the difficulty of extracting stroke width information, the instability of existing methods, and their inability to calculate stroke width accurately at stroke corners, among other issues.

Active Publication Date: 2015-05-06
TONGJI UNIV
Cites: 3; Cited by: 13

AI Technical Summary

Problems solved by technology

Most methods for extracting stroke width information scan the image horizontally and vertically. When a pair of abrupt color-value transitions is found along a scan line, the distance between the two transition pixels is taken as the stroke width. This approach is unstable for text extraction in complex scenes and often leads to false or missed detections.
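A minimal sketch of the scan-line approach described above, assuming the text has already been binarized (foreground = nonzero). The function name `scanline_stroke_widths` is hypothetical. Every horizontal and vertical foreground run between a pair of transitions is recorded as one width sample, which also exposes the instability noted above: scanning a vertical stroke vertically measures the stroke's length, not its width.

```python
import numpy as np


def scanline_stroke_widths(binary: np.ndarray) -> list[int]:
    """Collect horizontal and vertical run lengths of foreground pixels.

    Each run starts at a background-to-foreground transition and ends at the
    next foreground-to-background transition; its length is one width sample.
    """
    fg = (binary > 0).astype(np.int8)
    widths: list[int] = []
    for lines in (fg, fg.T):  # scan rows first, then columns
        for line in lines:
            # Pad with background so runs touching the border are closed.
            padded = np.concatenate(([0], line, [0]))
            diff = np.diff(padded)
            starts = np.flatnonzero(diff == 1)   # 0 -> 1 transitions
            ends = np.flatnonzero(diff == -1)    # 1 -> 0 transitions
            widths.extend((ends - starts).tolist())
    return widths


# A vertical bar two pixels wide and three pixels tall.
img = np.zeros((3, 5))
img[:, 1:3] = 1
print(scanline_stroke_widths(img))  # [2, 2, 2, 3, 3]
```

The row scans return the true width (2), while the column scans return the bar's height (3), so a naive pooling of both directions mixes width and length samples.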
Another method detects text with the stroke width transform (SWT) operator, which finds the stroke width at each stroke edge point by casting a ray along the gradient direction. This method cannot accurately calculate stroke width at stroke corners; it yields only approximate stroke width information, making it difficult to extract the true stroke width.
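The following is a simplified sketch of the stroke width transform idea on a binary mask, not the patent's improved method. Assumptions: the mask marks text as 1, edge pixels are foreground pixels with a background 4-neighbour, the gradient comes from `np.gradient`, and a ray is accepted only when it ends at an edge pixel whose gradient is within 30° of the opposite direction. `swt_sketch` and `max_len` are hypothetical names. In the solid-bar example below, rays cast from the corner pixels are rejected, so those pixels receive no width, which illustrates the corner problem mentioned above.

```python
import numpy as np


def swt_sketch(fg_mask: np.ndarray, max_len: int = 50) -> np.ndarray:
    """Simplified stroke width transform; unmeasured pixels stay at inf."""
    fg = (fg_mask > 0).astype(float)
    h, w = fg.shape
    gy, gx = np.gradient(fg)  # on a 0/1 mask the gradient points into the stroke
    padded = np.pad(fg, 1)
    # Edge pixel: foreground pixel with at least one background 4-neighbour.
    interior = (padded[:-2, 1:-1] * padded[2:, 1:-1]
                * padded[1:-1, :-2] * padded[1:-1, 2:])
    edges = (fg > 0) & (interior == 0)
    swt = np.full((h, w), np.inf)
    cos_tol = np.cos(np.pi / 6)  # 30 degree tolerance for "opposite" gradients
    for y, x in zip(*np.nonzero(edges)):
        norm = np.hypot(gx[y, x], gy[y, x])
        if norm == 0:
            continue
        dx, dy = gx[y, x] / norm, gy[y, x] / norm
        ray = [(y, x)]
        for t in range(1, max_len):
            cy, cx = int(round(y + t * dy)), int(round(x + t * dx))
            if not (0 <= cy < h and 0 <= cx < w) or fg[cy, cx] == 0:
                break  # left the stroke without meeting an opposite edge
            if (cy, cx) != ray[-1]:
                ray.append((cy, cx))
            if edges[cy, cx]:
                qn = np.hypot(gx[cy, cx], gy[cy, cx])
                cos_angle = (dx * gx[cy, cx] + dy * gy[cy, cx]) / qn if qn else 0.0
                if cos_angle < -cos_tol:
                    # Width counts both edge pixels, so a 3-pixel bar gives 3.
                    width = np.hypot(cy - y, cx - x) + 1
                    for ry, rx in ray:
                        swt[ry, rx] = min(swt[ry, rx], width)
                break  # any edge hit ends the ray; non-opposite hits are discarded
    return swt


# A bar 3 pixels wide and 5 pixels tall with a background margin.
bar = np.zeros((7, 7))
bar[1:6, 2:5] = 1
widths = swt_sketch(bar)
print(widths[3, 3], widths[1, 2])  # 3.0 inf  (centre measured, corner not)
```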

Method used



Examples


Embodiment

[0047] Select a low-quality image I that contains English characters and uneven illumination. The color feature space is RGB; the brightness threshold is set to $tr_c = 0.9$, the number of clusters for Euclidean-distance color clustering to $k_E = 3$, and the number of clusters for cosine-similarity color clustering to $k_C = 3$.
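The parameters in [0047] can be collected in a small configuration object. The patent states that the clustering distance is chosen from the image brightness, but the excerpt does not give the rule, so the selection below is a hypothetical stand-in: brightness is taken as the mean HSV value channel (max(R, G, B)/255), and images at or above $tr_c$ use Euclidean clustering with $k_E$ clusters, otherwise cosine-similarity clustering with $k_C$ clusters. `ExtractionParams` and `choose_metric` are illustrative names only.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ExtractionParams:
    tr_c: float = 0.9  # brightness threshold from [0047]
    k_e: int = 3       # clusters for Euclidean-distance color clustering
    k_c: int = 3       # clusters for cosine-similarity color clustering


def choose_metric(image_rgb: np.ndarray, params: ExtractionParams) -> tuple[str, int]:
    """Pick a clustering distance and cluster count from image brightness.

    HYPOTHETICAL rule: the source does not specify how brightness is measured
    or which side of tr_c maps to which distance.
    """
    # Brightness as the mean of the HSV value channel, scaled to [0, 1].
    brightness = float((image_rgb.astype(float).max(axis=2) / 255.0).mean())
    if brightness >= params.tr_c:
        return "euclidean", params.k_e
    return "cosine", params.k_c


params = ExtractionParams()
white = np.full((4, 4, 3), 255, dtype=np.uint8)
print(choose_metric(white, params))  # ('euclidean', 3)
```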

[0048] As shown in Figures 1 to 3, the method for extracting character data from an image based on a stroke width map comprises the following steps:

[0049] S1. Read in the color image I, cluster its colors with the mean clustering algorithm, extract connected domains from the clustered image, and obtain the binary image of each connected domain, forming a first binary image sequence indexed by $i_C = 1, \ldots, n_C$, where $n_C$ is the number of connected domains;
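Step S1 can be sketched as follows, assuming that the "mean clustering algorithm" is standard k-means (Lloyd iterations) and that connected domains use 4-connectivity; neither detail is confirmed by the excerpt. For the cosine case the sketch swaps in spherical k-means on unit-normalized colors. All function names are hypothetical.

```python
from collections import deque

import numpy as np


def kmeans_colors(pixels: np.ndarray, k: int, metric: str = "euclidean",
                  iters: int = 20, seed: int = 0) -> np.ndarray:
    """Lloyd k-means on an (N, 3) color array; returns one label per pixel."""
    X = pixels.astype(float)
    if metric == "cosine":
        # Spherical k-means: on unit vectors the dot product is the cosine
        # similarity. Pure black has no direction and maps to the zero vector.
        X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-9)
    # Initialize from distinct colors so no two centers start out identical.
    uniq = np.unique(X, axis=0)
    rng = np.random.default_rng(seed)
    k = min(k, len(uniq))
    centers = uniq[rng.choice(len(uniq), size=k, replace=False)]
    for _ in range(iters):
        if metric == "cosine":
            labels = np.argmax(X @ centers.T, axis=1)
        else:
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # empty clusters keep their previous center
                centers[j] = members.mean(axis=0)
                if metric == "cosine":
                    centers[j] /= max(np.linalg.norm(centers[j]), 1e-9)
    return labels


def connected_components(mask: np.ndarray) -> list[np.ndarray]:
    """4-connected components of a boolean mask, each as its own mask."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    comps = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        comp = np.zeros(mask.shape, dtype=bool)
        queue = deque([(sy, sx)])
        seen[sy, sx] = True
        while queue:  # breadth-first flood fill
            y, x = queue.popleft()
            comp[y, x] = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        comps.append(comp)
    return comps


def first_binary_sequence(image_rgb: np.ndarray, k: int,
                          metric: str = "euclidean") -> list[np.ndarray]:
    """Cluster colors, then split every cluster into connected domains."""
    h, w, _ = image_rgb.shape
    labels = kmeans_colors(image_rgb.reshape(-1, 3), k, metric).reshape(h, w)
    seq: list[np.ndarray] = []
    for j in range(k):
        seq.extend(connected_components(labels == j))
    return seq  # len(seq) plays the role of n_C
```

For a white 5 × 5 image with two isolated black pixels and k = 2, the sketch yields three connected domains: the white background and the two black dots.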

[0050] The specific steps for clustering the colors with the mean clustering algorithm in step S1 are as follows:

[0051] 11) The image correspondi...



Abstract

The invention relates to a stroke-width-map-based method for extracting Chinese character data from an image. The method comprises the following steps: reading a color image; clustering its colors with the mean clustering algorithm to obtain a first binary image sequence; applying an edge detection algorithm and morphological connected-domain analysis to obtain a second binary image sequence; filtering the combined sequence a first time with a geometric filter to obtain a third binary image sequence; calculating a stroke width map for the third binary image sequence; filtering the third binary image sequence a second time according to the stroke width map to obtain a fourth binary image sequence; and superimposing all images in the fourth binary image sequence to obtain the extracted characters. Compared with the prior art, the method has the following advantages: the distance measure used by the color clustering algorithm is selected adaptively from the image brightness; non-uniform illumination and other degradations are handled well; and the traditional stroke width calculation is improved, which raises the performance of character extraction.
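The second half of the pipeline (geometric filtering, stroke-width filtering, and superimposition) can be sketched as below. This is not the patent's improved stroke width calculation, which the excerpt does not describe: as a stand-in, each pixel's stroke width is the smaller of its horizontal and vertical run lengths, and a component is kept when the coefficient of variation of its widths is small. The thresholds (`min_area`, `max_aspect`, `max_cv`) and all function names are hypothetical, and `seq` stands for the combined first and second binary image sequences.

```python
import numpy as np


def run_length_map(mask: np.ndarray) -> np.ndarray:
    """Per-pixel length of the horizontal foreground run containing it."""
    out = np.zeros(mask.shape, dtype=float)
    for r, row in enumerate(mask.astype(np.int8)):
        d = np.diff(np.concatenate(([0], row, [0])))
        for s, e in zip(np.flatnonzero(d == 1), np.flatnonzero(d == -1)):
            out[r, s:e] = e - s
    return out


def stroke_width_map(mask: np.ndarray) -> np.ndarray:
    # Stand-in estimate: min of horizontal and vertical run length per pixel.
    return np.minimum(run_length_map(mask), run_length_map(mask.T).T)


def passes_geometric_filter(mask: np.ndarray, min_area: int = 4,
                            max_aspect: float = 10.0) -> bool:
    ys, xs = np.nonzero(mask)
    if len(ys) < min_area:
        return False  # too small to be a character
    hgt, wid = ys.max() - ys.min() + 1, xs.max() - xs.min() + 1
    return max(hgt, wid) / min(hgt, wid) <= max_aspect


def passes_stroke_filter(mask: np.ndarray, max_cv: float = 0.5) -> bool:
    widths = stroke_width_map(mask)[mask]
    if widths.size == 0:
        return False
    # Text strokes tend to have nearly constant width: low std / mean.
    return widths.std() / widths.mean() <= max_cv


def extract(seq: list[np.ndarray]) -> np.ndarray:
    """First filter, second filter, then OR the survivors together."""
    masks = [m.astype(bool) for m in seq]
    third = [m for m in masks if passes_geometric_filter(m)]
    fourth = [m for m in third if passes_stroke_filter(m)]
    result = np.zeros(masks[0].shape, dtype=bool)
    for m in fourth:
        result |= m
    return result
```

In this sketch a 5 × 2 bar survives both filters, while a single pixel (too small) and a 1 × 12 line (too elongated) are removed by the geometric filter.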

Description

Technical field

[0001] The invention relates to the technical fields of image processing and computer vision, in particular to a method for extracting character data from an image based on a stroke width map.

Background art

[0002] Text in an image plays an important role in understanding the image content, and the accuracy of text extraction directly affects the downstream results of an automatic text processing system. Text extraction from images has made great progress in recent years, but it still faces many problems, such as blurred images, uneven illumination, and complex backgrounds. These problems are bottlenecks restricting the practical application of automatic text extraction from images, and they are also the focus and main difficulty of research in this area.

[0003] In recent decades, many researchers at home and abroad have...

Claims


Application Information

Patent type & authority: Applications (China)
IPC(8): G06K9/46; G06K9/20
CPC: G06V10/267; G06V30/287; G06F18/23213
Inventor: 刘春梅 (Liu Chunmei)
Owner: TONGJI UNIV