Device and method for extracting composite graph in format document

A device and a method for extracting composite graphs in a layout document, applied in the field of format conversion of electronic documents. The technology addresses the redundancy of layout documents, the poor display of composite graphs, and the difficulty of meeting practical needs, and it achieves accurate extraction of composite graphs.

Active Publication Date: 2015-02-11
NEW FOUNDER HLDG DEV LLC +2
6 cites, 21 cited by

AI Technical Summary

Problems solved by technology

In a layout document, a composite graph consists of multiple sub-images, a large number of path operations, text primitives and other sub-objects, so the layout structure analysis used in reverse engineering cannot correctly extract it as one complete composite graph.
As a result, the layout document needs a large number of paths to describe the graph, which causes heavy redundancy. Composite graphs also do not display properly when the layout document is converted to a reflowable (streaming) form, which makes it difficult to meet the growing practical needs of digital reading.


Embodiment Construction

[0036] To make the above purposes, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with each other.

[0037] Many specific details are set forth in the following description to give a full understanding of the present invention. The invention can, however, also be implemented in ways other than those described here, so it is not limited to the specific embodiments disclosed below.

[0038] Figure 1 shows a block diagram of a device for extracting a composite graph in a layout document according to an embodiment of the present invention.

[0039] As shown in Figure 1, the device 100 for extracting a composit...

Abstract

An extraction device for composite graphs in a fixed-layout document, comprising: a document parsing unit, for parsing the fixed-layout document and determining its primitives and their types; a layer generation unit, for extracting the text primitives to form a text layer and using the remaining non-text primitives to form a non-text layer; a page analysis unit, for performing page analysis on the text layer and the non-text layer separately; a block generation unit, for generating text blocks in the text layer and graph blocks in the non-text layer; a correlation block determination unit, for determining the text blocks correlated with each graph block and merging each graph block with its correlated text blocks into a composite graph block; and an identifier storage unit, for storing the identifiers of all the primitives contained in the composite graph block.
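The units listed in the abstract can be read as a small pipeline. The following is a minimal Python sketch of that reading, not the patented implementation. The names `Primitive`, `Block`, `split_layers`, `make_blocks` and `extract_composites` are hypothetical, primitives are assumed to arrive already parsed with a bounding box, and bounding-box proximity (the `gap` parameter) is an assumed stand-in for both the page analysis and the correlation test, which the text shown here does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    pid: int      # identifier of the primitive (what the storage unit keeps)
    kind: str     # assumed type labels: "text", "path" or "image"
    bbox: tuple   # (x0, y0, x1, y1) in page coordinates


@dataclass
class Block:
    bbox: tuple
    pids: list


def split_layers(primitives):
    """Layer generation: text primitives vs. all remaining primitives."""
    text = [p for p in primitives if p.kind == "text"]
    non_text = [p for p in primitives if p.kind != "text"]
    return text, non_text


def union(a, b):
    """Smallest box that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def near(a, b, gap):
    """True if the boxes overlap or lie within `gap` of each other (assumed test)."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def make_blocks(layer, gap):
    """Block generation: merge nearby primitives until no pair is near."""
    blocks = [Block(p.bbox, [p.pid]) for p in layer]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if near(blocks[i].bbox, blocks[j].bbox, gap):
                    a, b = blocks[i], blocks.pop(j)
                    blocks[i] = Block(union(a.bbox, b.bbox), a.pids + b.pids)
                    changed = True
                    break
            if changed:
                break
    return blocks


def extract_composites(primitives, gap=5.0):
    """Merge each graph block with the text blocks near it; keep primitive ids."""
    text_layer, non_text_layer = split_layers(primitives)
    text_blocks = make_blocks(text_layer, gap)
    graph_blocks = make_blocks(non_text_layer, gap)
    composites = []
    for g in graph_blocks:
        bbox, pids = g.bbox, list(g.pids)
        for t in text_blocks:
            if near(g.bbox, t.bbox, gap):  # assumed correlation criterion
                bbox = union(bbox, t.bbox)
                pids += t.pids
        composites.append(Block(bbox, sorted(pids)))
    return composites


if __name__ == "__main__":
    page = [
        Primitive(1, "path", (0, 0, 100, 100)),
        Primitive(2, "image", (50, 50, 120, 120)),
        Primitive(3, "text", (10, 10, 30, 20)),       # label inside the figure
        Primitive(4, "text", (300, 300, 400, 310)),   # body text far from the figure
    ]
    for block in extract_composites(page):
        print(block.bbox, block.pids)
```

Storing only primitive identifiers per composite block, as the abstract describes, means the composite graph can later be treated as one object without copying its paths, images and text.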

Description

Technical field

[0001] The present invention relates to the technical field of format conversion of electronic documents, and in particular to a device and a method for extracting composite graphs in layout documents.

Background

[0002] Paper documents are mostly converted into electronic documents by scanning them or photographing them to obtain digital images. After a series of image processing steps, the characters are segmented and fed into an OCR (Optical Character Recognition) system. Layout documents generated directly by document processing software, such as typesetting software, are replacing image documents converted from paper as the main source of documents for digital publications.

[0003] The automatic extraction of structural information mainly includes layout analysis and layout understanding, and its research is limited to th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/46
CPC: G06F17/211; G06V30/413; G06V30/414
Inventor: 许灿辉, 汤帜, 陶欣, 史操
Owner: NEW FOUNDER HLDG DEV LLC