The invention discloses a system for extracting text lines from printed, handwritten, and mixed printed-handwritten text. The system comprises text block area preprocessing and text line extraction. In preprocessing, a document image containing printed, handwritten, or mixed text is first skew-corrected, and a text block area is obtained through layout analysis. The connected components of the text block area are then clustered and each is assigned the label of the text line it belongs to; the pixels of touching characters are clustered and segmented, and their text line labels are reassigned. Finally, the text lines of the document image are extracted according to these labels. The invention overcomes a defect of the prior art in which text lines that touch at an incline in handwritten or mixed printed-handwritten text cause the extracted lines either to lose stroke information or to include redundant stroke information. It improves pixel-level control during text line extraction, so that text lines can be extracted quickly and efficiently.
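As a rough illustration of the label assignment and redistribution steps, the following Python sketch clusters the connected components of an already skew-corrected, binarized text block into lines, then splits components that touch across lines by reassigning each of their pixels to the nearest line. It is not the patented method. The 8-connectivity, the rule that a component taller than `touch_ratio` times the median component height contains touching characters, the grouping by vertical center, and the function names `label_components` and `extract_text_lines` are all assumptions made for illustration.

```python
from collections import deque
from statistics import median

# Hypothetical sketch: img is a skew-corrected binary text block,
# given as a list of rows with 1 = ink pixel and 0 = background.

def label_components(img):
    """Return the 8-connected foreground components as lists of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def row_span(pixels):
    ys = [y for y, _ in pixels]
    return min(ys), max(ys)


def center_row(pixels):
    top, bottom = row_span(pixels)
    return (top + bottom) / 2


def extract_text_lines(img, touch_ratio=1.5):
    """Map each text line label (0, 1, ...) to the ink pixels assigned to it."""
    comps = label_components(img)
    if not comps:
        return {}
    heights = [row_span(p)[1] - row_span(p)[0] + 1 for p in comps]
    med_h = median(heights)
    # Assumption: a component much taller than the median spans several
    # lines, i.e. it contains touching characters from adjacent lines.
    regular = [p for p, hh in zip(comps, heights) if hh <= touch_ratio * med_h]
    touching = [p for p, hh in zip(comps, heights) if hh > touch_ratio * med_h]

    # Step 1: cluster ordinary components into lines by vertical center,
    # giving each component the label of the line it belongs to.
    lines = []  # each entry: [sum of centers, component count, pixels]
    for p in sorted(regular, key=center_row):
        c = center_row(p)
        if lines and abs(c - lines[-1][0] / lines[-1][1]) <= med_h:
            lines[-1][0] += c
            lines[-1][1] += 1
            lines[-1][2].extend(p)
        else:
            lines.append([c, 1, list(p)])
    centers = [total / count for total, count, _ in lines]
    result = {label: pixels for label, (_, _, pixels) in enumerate(lines)}

    # Step 2: split touching components pixel by pixel, reassigning each
    # pixel to the line whose center row is nearest (ties go to the upper line).
    for p in touching:
        for y, x in p:
            label = min(range(len(centers)), key=lambda i: abs(y - centers[i]))
            result[label].append((y, x))
    return result


if __name__ == "__main__":
    # Two lines of 2x2 "characters" plus one vertical stroke touching both lines.
    img = [[0] * 12 for _ in range(10)]
    for y in (1, 2, 6, 7):
        for x in (0, 1, 4, 5):
            img[y][x] = 1
    for y in range(1, 8):
        img[y][9] = 1
    lines = extract_text_lines(img)
    print({label: len(pixels) for label, pixels in lines.items()})  # {0: 12, 1: 11}
```

The pixel-level split in step 2 is what keeps the stroke shared by both lines from being dropped or copied whole into one line: in the demo, its upper four pixels go to line 0 and its lower three to line 1. A nearest-center rule is only a simple stand-in for the clustering segmentation of touching pixels described above; a practical system would likely need a more robust criterion for slanted or overlapping handwriting.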