Digital book layout analysis method

A digital book layout analysis technology, applied in image analysis, image enhancement, image data processing, and related fields. It addresses problems such as unrecognized flowcharts, faulty formula recognition, and incomplete illustration recognition, and so improves the results of downstream software processing.

Active Publication Date: 2016-11-09
ZHEJIANG UNIV
6 cites, 14 cited by

AI Technical Summary

Problems solved by technology

[0007] (1) Formula recognition is faulty: a complete formula cannot be extracted, or the formula is extracted as a text block.
[0008] (2) Illustrations such as flowcharts are not recognized, or are recognized incompletely.
[0010] (4) The descriptive text next to a figure is not properly separated.

Examples

Embodiment

[0199] The concrete steps of this example are described in detail below, following the method of the present invention. A page from an e-book scanned by the CADAL digital library, shown in Figure 5, is used to walk through the full process of Figure 3.

[0200] 1) Read the original image, convert the color image to grayscale, and read in the grayscale image.
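The grayscale conversion in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which conversion is used, so the ITU-R BT.601 luma weights and the function name `to_grayscale` are assumptions.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of 8-bit gray values."""
    gray = []
    for row in rgb_image:
        gray.append([
            # BT.601 luma weights: an assumption, the source names no formula.
            min(255, round(0.299 * r + 0.587 * g + 0.114 * b))
            for (r, g, b) in row
        ])
    return gray


# A tiny 2x2 "page": white, black, pure red, pure blue.
page = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(page)
```

White and black map to 255 and 0, while saturated colors land in between according to their perceived brightness, which is why red comes out lighter than blue.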

[0201] 2) Extract the edges of the picture. Create an 8*8 square structuring element to dilate the image. The image background is estimated using the morphological opening operation imopen and the closing operation imclose. The opening operation smooths the image contour and breaks narrow connections, which removes small protrusions; a disc-shaped structuring element with a radius of 5 is constructed for the opening. The closing operation acts as the reverse of the opening operation, connecting the narrow gaps to make it an integr...
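The opening and closing operations described in step 2 can be sketched in plain Python. The names `imopen` and `imclose` in the text match MATLAB functions; the helpers below (`disc`, `erode`, `dilate`, `opening`, `closing`) are hypothetical stand-ins written for illustration, the disc is a plain Euclidean disc that may differ from the one the authors used, and ignoring out-of-image pixels at the border is an assumption. The example uses radius 1 on a 7x7 image to stay readable, whereas the text uses radius 5.

```python
def disc(radius):
    """Offsets (dy, dx) of a flat, disc-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def _morph(image, offsets, pick):
    """Take the min (erosion) or max (dilation) over the element's footprint."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbors that fall outside the image are skipped (assumed border rule).
            out[y][x] = pick(image[y + dy][x + dx]
                             for dy, dx in offsets
                             if 0 <= y + dy < h and 0 <= x + dx < w)
    return out


def erode(image, offsets):
    return _morph(image, offsets, min)


def dilate(image, offsets):
    return _morph(image, offsets, max)


def opening(image, offsets):
    """Erosion then dilation: removes bright details smaller than the element."""
    return dilate(erode(image, offsets), offsets)


def closing(image, offsets):
    """Dilation then erosion: fills dark gaps smaller than the element."""
    return erode(dilate(image, offsets), offsets)


# A single bright speck on a dark background disappears under opening.
speck = [[0] * 7 for _ in range(7)]
speck[3][3] = 255
opened = opening(speck, disc(1))

# A single dark hole in a bright region is filled by closing.
hole = [[255] * 7 for _ in range(7)]
hole[3][3] = 0
closed = closing(hole, disc(1))
```

Applying opening with an element larger than the text strokes leaves mostly the slowly varying background, which is one common way such operations are used to estimate it.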

Abstract

The invention discloses a digital book layout analysis method. The method is based on region segmentation: in essence, it segments page images stored in JPEG or TIF format and classifies the resulting regions. First, image edges are extracted, and over-segmented regions are merged using a morphological algorithm combined with a Gabor linear filter, which segments the book page into regions. Next, the segmented image blocks undergo region filling and ordering to reconstruct the reading order. Finally, features are extracted from each image region and a classifier is trained to obtain the attributes of each region. This separates the regions of the book page and improves both the recognition accuracy of an OCR engine and the accuracy of book queries.
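For readers unfamiliar with the Gabor linear filter mentioned in the abstract, the sketch below samples the real part of a standard 2-D Gabor kernel: a Gaussian envelope multiplied by a cosine carrier. The abstract gives no parameters, so the function name `gabor_kernel` and every value used here (kernel size, wavelength, orientation, sigma, aspect ratio) are assumptions chosen only for illustration.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; gamma squeezes it across the carrier direction.
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            # Cosine carrier with the given wavelength and phase offset.
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Illustrative parameters only; the source does not specify any.
kernel = gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0)
```

Convolving a page image with kernels at several orientations responds strongly to the regular stroke texture of text lines, which is one reason such filters are often used to tell text regions from pictures.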

Description

Technical field

[0001] The invention relates to layout analysis technology for pages stored as images in a digital library, and in particular to layout analysis by region segmentation and classification.

Background

[0002] With advances in computer and network technology, digital libraries are moving from information-based processing and simple human-machine interfaces toward knowledge-based processing and broad understanding between machines. This lets people use computers and networks to extend intellectual activity on a larger scale, and it plays an important role in every field that needs to communicate, disseminate, store, and use knowledge, including e-commerce, education, and telemedicine.

[0003] Since the books in the CADAL digital library exist in the form of images, they must be processed by OCR to recognize the text and analyze the layout before they can provide in-depth services....

Application Information

IPC(8): G06K9/00, G06T7/00, G06T5/30, G06T5/50, G06K9/46
CPC: G06T5/30, G06T5/50, G06T2207/10008, G06T2207/20024, G06T2207/30176, G06T2207/20212, G06V30/414, G06V10/40
Inventor: 鲁伟明, 刘佳卉, 庄越挺, 吴飞, 魏宝刚
Owner: ZHEJIANG UNIV