Printed and handwritten mixed text line extraction system

A technology for extracting text lines, applied in the field of printed and handwritten mixed text line extraction systems. It addresses problems such as the loss of extracted stroke information and achieves the effects of improving robustness, reducing pixels, and improving versatility.

Active Publication Date: 2018-09-14
WUYI UNIV
4 Cites, 36 Cited by

AI Technical Summary

Problems solved by technology

[0005] In view of this, the present invention provides a printed and handwritten mixed text line extraction system that overcomes a defect of the prior art: when handwritten text lines, or printed and handwritten mixed text lines, are skewed and touch one another, extraction either loses stroke information or introduces redundant stroke information. The system improves the controllability of pixels during text line extraction, so that document image text lines can be extracted quickly and efficiently.

Embodiment Construction

[0029] An embodiment of the present invention provides a system for extracting printed and handwritten mixed text lines.

[0030] As shown in Figures 1-6, the printed and handwritten mixed text line extraction system comprises two stages: text block area preprocessing and text line extraction.

[0031] The text block area preprocessing proceeds as follows:

[0032] Step S11: acquire a document image. A digital document image is obtained by photographing a paper document or scanning it with a scanner; for example, case records and engineering technology archives can be photographed to obtain the corresponding document images. The document images to be processed may be heterogeneous, non-Manhattan layout images, and the layout may contain basic elements such as formulas, illustrations, and tables.

[0033] Step S12: perform skew correction on the document image. The LSD algorithm is used to detect line segments in the document image and extract the text line reference lin...
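The skew-correction idea can be illustrated with a minimal sketch. It assumes the line segments have already been detected (for example by an LSD detector, not shown) and are passed as hypothetical `(x1, y1, x2, y2)` tuples. The angle rule used here, a length-weighted median of near-horizontal segment angles, is a simple stand-in chosen for illustration; the excerpt is truncated before the patent's own rule for deriving reference lines and the skew angle.

```python
import math

def estimate_skew_deg(segments, max_abs_deg=45.0):
    """Estimate page skew in degrees from (x1, y1, x2, y2) line segments.

    Uses the length-weighted median angle of near-horizontal segments, so long
    text-line-like segments dominate and short noisy ones matter less.
    """
    weighted = []
    for x1, y1, x2, y2 in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        angle = math.degrees(math.atan2(dy, dx))
        # Fold into (-90, 90] so a segment and its reverse give the same angle.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        if abs(angle) <= max_abs_deg:  # ignore near-vertical segments
            weighted.append((angle, length))
    if not weighted:
        return 0.0
    weighted.sort()
    half = sum(w for _, w in weighted) / 2
    running = 0.0
    for angle, w in weighted:
        running += w
        if running >= half:
            return angle
    return weighted[-1][0]

def rotate_point(x, y, cx, cy, skew_deg):
    """Rotate (x, y) about (cx, cy) by -skew_deg, undoing the estimated skew."""
    t = math.radians(-skew_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))
```

Applying `rotate_point` to every pixel coordinate (or the equivalent affine warp in an image library) deskews the page. The 45° cutoff separating near-horizontal from near-vertical segments is an arbitrary choice for this sketch.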

Abstract

The invention discloses a printed and handwritten mixed text line extraction system comprising text block area preprocessing and text line extraction. First, in the text block area preprocessing stage, a document image containing printed, handwritten, or mixed text is skew-corrected, and text block areas are obtained through layout analysis. Next, connected-component clustering is performed on each text block area and each component is assigned the label of the text line it belongs to; the pixels of touching characters are then clustered and segmented, and their text line labels are reassigned. Finally, the document image text lines are extracted according to the text line labels. This overcomes a defect of the prior art, in which skewed, touching text lines in handwritten or printed and handwritten mixed text cause extracted stroke information to be lost or redundant stroke information to be introduced. It also improves the controllability of pixels during text line extraction, so that text lines can be extracted quickly and efficiently.
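The label-based pipeline described in the abstract (connected-component clustering, text line label assignment, re-segmentation of touching character pixels, and extraction by label) can be sketched as follows. This is a simplified stand-in, not the patented clustering method: the excerpt does not say how components are clustered. The sketch assumes a binary image given as a list of rows, estimates line centers by grouping the vertical centroids of ordinary-sized components, and treats any component taller than a hypothetical `max_height` threshold as touching characters whose pixels are reassigned one by one to the nearest line center. `max_height` and `gap` are illustrative parameters only.

```python
from collections import deque

def connected_components(img):
    """Return 8-connected components of foreground (nonzero) pixels as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def height(pixels):
    rows = [y for y, _ in pixels]
    return max(rows) - min(rows) + 1

def centroid_row(pixels):
    return sum(y for y, _ in pixels) / len(pixels)

def line_centers(comps, max_height, gap):
    """Group centroid rows of ordinary-sized components; each group is one text line."""
    rows = sorted(centroid_row(p) for p in comps if height(p) <= max_height)
    groups = []
    for y in rows:
        if groups and y - groups[-1][-1] <= gap:
            groups[-1].append(y)
        else:
            groups.append([y])
    return [sum(g) / len(g) for g in groups]

def extract_lines(img, max_height=3, gap=2):
    """Return one binary mask per text line, ordered top to bottom."""
    comps = connected_components(img)
    centers = line_centers(comps, max_height, gap)
    if not centers:  # no ordinary-sized components to anchor any line
        return []

    def nearest(y):
        return min(range(len(centers)), key=lambda k: abs(y - centers[k]))

    masks = [[[0] * len(img[0]) for _ in img] for _ in centers]
    for pixels in comps:
        if height(pixels) <= max_height:
            k = nearest(centroid_row(pixels))  # whole component gets one line label
            for y, x in pixels:
                masks[k][y][x] = 1
        else:
            for y, x in pixels:  # touching component: relabel pixel by pixel
                masks[nearest(y)][y][x] = 1
    return masks
```

Because a component that spans two lines is split pixel by pixel instead of being assigned wholesale to one line, neither line loses its share of the joined stroke. A production system would need a more robust clustering step than this fixed-threshold grouping.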

Description

Technical Field

[0001] The invention relates to the technical field of text line extraction, and more specifically to a system for extracting text lines that mix printed and handwritten text.

Background Art

[0002] Digital document processing is an important field of image processing and pattern recognition research. Its task is to convert paper documents into digital images by photographing or scanning them, and then to analyze, understand, and reconstruct the layout of the digital document images so that they become fully editable and retrievable digital documents. It has important application prospects in the digitization of photographed documents, notes, archives, bills, manuscripts, and similar materials.

[0003] Segmenting text blocks into independent text lines is an important basic step in document image digitization, and it strongly affects subsequent tasks such as character recognition, text localization, and keyword retrieval. Compared with printed text line...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/00, G06K9/36, G06N3/04
CPC: G06V30/32, G06V30/414, G06V10/247, G06V10/20, G06V30/10, G06N3/045, Y02D10/00
Inventors: 应自炉 (Ying Zilu), 朱健菲 (Zhu Jianfei), 陈鹏飞 (Chen Pengfei), 陈俊娟 (Chen Junjuan), 甘俊英 (Gan Junying), 翟懿奎 (Zhai Yikui)
Owner: WUYI UNIV