Tibetan historical document text line segmentation method based on baseline estimation

A baseline-estimation technique for historical documents, applied in the field of image processing. It addresses the problems of existing methods, namely inaccurate positioning and segmentation, inability to handle curved text lines, and estimation of only the approximate line position, and it achieves high segmentation accuracy.

Active Publication Date: 2018-02-23
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

The traditional projection-based segmentation method has two disadvantages when applied to Tibetan historical documents:
(1) It can only estimate the approximate position of a text line in the document, and it cannot handle the curved text lines that occur in large numbers in Tibetan historical documents.
(2) It cannot accurately locate and segment the touching (adhered) parts of Tibetan historical documents.
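
To make the second limitation concrete, the following is a minimal sketch of projection-based line splitting, not code from the patent. The function names `row_projection` and `split_by_projection`, the binary-image convention (1 = ink) and the threshold are all assumptions chosen for illustration.

```python
def row_projection(img):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def split_by_projection(profile, threshold=0):
    """Return (start, end) row ranges, end exclusive, where profile > threshold."""
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > threshold and start is None:
            start = y                  # a run of inked rows begins
        elif count <= threshold and start is not None:
            lines.append((start, y))   # the run ends at an empty row
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


# Two text lines separated by an empty row are split correctly.
straight = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
# A single touching stroke in row 1 fills the gap, so the lines merge.
touching = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
print(split_by_projection(row_projection(straight)))
print(split_by_projection(row_projection(touching)))
```

In the second toy image a single stroke bridging the gap is enough to merge the two lines, which is the kind of failure described in (2).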



Examples


Embodiment Construction

[0047] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0048] The flowchart of the method of the present invention is shown in Figure 1. The method includes the following steps:

[0049] Step 1, extract the left partial image of the input image.

[0050] Extract the left quarter of the input Tibetan historical document image; it is used to analyze and extract the baseline positions and the number of text lines. This image is referred to as image A.
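
Step 1 can be sketched as a simple column crop. This is an illustrative sketch only: the helper name `left_quarter`, the list-of-rows image representation, and rounding the quarter width down are assumptions that the text does not specify.

```python
def left_quarter(img):
    """Return the left quarter of an image given as a list of pixel rows."""
    if not img:
        return []
    quarter = len(img[0]) // 4  # assumption: round the quarter width down
    return [row[:quarter] for row in img]


# Toy 2 x 8 "page"; the pixel values are just indices to make the crop visible.
page = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [8, 9, 10, 11, 12, 13, 14, 15],
]
image_a = left_quarter(page)  # "image A" in the terminology of Step 1
print(image_a)
```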

[0051] Step 2, remove Tibetan vowel nodes and some prominent strokes.

[0052] Divide the input image into image blocks with a sliding window of size N*M, where the width N is the width of the Tibetan character D in the image and the length M is twice the width N. As shown in Figure 2, select 80 image blocks with baselines at the top as templates, and use principal component analysis (PCA) to obtain their 13-dimensional feature...
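
The block extraction in Step 2 might look like the sketch below. It covers only the sliding window, not the template selection or the PCA step. Several points are assumptions: M is read as the window height, the stride defaults to the window size (the text does not state a stride), and the name `sliding_blocks` is hypothetical.

```python
def sliding_blocks(img, n, stride_x=None, stride_y=None):
    """Yield (x, y, block) for every full window n wide and 2*n tall."""
    m = 2 * n                   # "the length M is twice the width N"
    stride_x = stride_x or n    # assumption: non-overlapping by default
    stride_y = stride_y or m
    height = len(img)
    width = len(img[0]) if img else 0
    for y in range(0, height - m + 1, stride_y):
        for x in range(0, width - n + 1, stride_x):
            block = [row[x:x + n] for row in img[y:y + m]]
            yield x, y, block


# Toy 4 x 4 binary image with alternating columns.
image = [[c % 2 for c in range(4)] for _ in range(4)]
blocks = list(sliding_blocks(image, n=2))
for x, y, block in blocks:
    print((x, y), block)
```

Passing a smaller `stride_x` or `stride_y` produces overlapping windows, which may be closer to what a template-matching step needs.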


Abstract

The invention relates to a text line segmentation method for Tibetan historical documents. The method comprises the following steps: the image of the left part of a Tibetan historical document is extracted; Tibetan vowel nodes and certain prominent strokes are removed; the starting positions of the baselines of the Tibetan text lines and the number of text lines are acquired; starting from these positions, the baselines are established from left to right, and during this process each baseline is dynamically adjusted according to the pixel values of the surrounding points; using the estimated baselines, connected component analysis is applied to locate the touching (adhered) regions between two baselines and segment them; and finally the text lines are separated. The baseline-estimation method is better suited to segmenting the text lines of Tibetan historical documents than the traditional projection-based technique, and it segments the text lines more accurately.
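
The connected component step mentioned in the abstract could be sketched as follows. This is a rough illustration under stated assumptions, not the patented procedure: it uses 8-connectivity, represents each baseline as one row index per column, and treats a component as touching when it has ink on both baselines. The names `connected_components` and `touching_components` are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Group 8-connected ink pixels (value 1); return lists of (x, y)."""
    height = len(img)
    width = len(img[0]) if img else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if img[y0][x0] != 1 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components


def touching_components(img, upper, lower):
    """Return components that have ink on both baselines.

    upper and lower hold the baseline row for each column.
    """
    hits = []
    for pixels in connected_components(img):
        on_upper = any(y == upper[x] for x, y in pixels)
        on_lower = any(y == lower[x] for x, y in pixels)
        if on_upper and on_lower:
            hits.append(pixels)
    return hits


# Toy image: a stroke in column 1 joins the line at row 0 to the line at row 3.
page = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
upper = [0, 0, 0, 0]  # assumed straight baselines for the example
lower = [3, 3, 3, 3]
hits = touching_components(page, upper, lower)
print(len(connected_components(page)), len(hits))
```

Because the baselines are given per column, the same check works for curved baselines; the straight ones here only keep the example small.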

Description

Technical field

[0001] The invention relates to an image processing method, in particular to a text line segmentation method for Tibetan text images.

Background technique

[0002] Writing is an important carrier of human development, one of the main media for transmitting information, and one of the important ways in which people record history. Tibetan is China's first minority script with an international standard, and it is also one of the oldest scripts in the world. Tibetan historical documents preserve the essence of Tibetan cultural thought and are a precious part of humanity's cultural heritage. To protect this ancient historical and cultural heritage and to let people consult the documents by their textual content, converting images of ancient Tibetan books into text is an important means of preserving Tibetan historical documents.

[0003] Generally speaking, the transformation of ancient book images into computer-readable text needs to go th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06T7/11, G06T7/194, G06T3/00
CPC: G06T3/0012, G06T7/11, G06T7/194, G06T2207/30176
Inventor: 段立娟, 李颜兴
Owner: BEIJING UNIV OF TECH