Method and device for extracting form from portable electronic document

A technique for extracting tables from portable electronic documents, applicable to fields such as electronic digital data processing, instruments, and computing. It addresses the difficulty of extracting components such as tables from these documents.

Inactive Publication Date: 2010-09-15
RICOH KK

AI Technical Summary

Problems solved by technology

Extracting components from portable electronic documents remains difficult.
For example, the PDF format specification has no concept of a table or of table components. A table is simply drawn from line segments and text, which makes extracting tables from a PDF very difficult.


Embodiment Construction

[0020] Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. If it is considered that the detailed description of some related prior art may obscure the gist of the present invention, the detailed description thereof will not be provided here.

[0021] Figure 1 shows a block diagram of an apparatus 100 for extracting tables from portable electronic documents according to an embodiment of the present invention. As shown in Figure 1, the apparatus 100 for extracting a table may include: a command acquisition unit 110, configured to parse the content of a portable electronic document to acquire commands related to the table; a line extraction unit 120, configured to extract lines and line positions by processing these commands; and a table extraction unit 130, configured to analyze the positional relationship of the lines to extract the table.
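The three units in [0021] can be illustrated with a minimal sketch. It is not the patented method, whose details are not given in this excerpt. It assumes the document content is available as an uncompressed PDF content stream, that the table-related commands are the standard PDF path operators `m` (move to), `l` (line to), and `re` (rectangle), and that every ruling line spans the whole table. The function names `extract_segments` and `find_table_grid` and the tolerance `tol` are hypothetical. Coordinate transforms and stroke state are ignored.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, float, float]  # x0, y0, x1, y1


def extract_segments(content: str) -> List[Segment]:
    """Command acquisition + line extraction (simplified).

    Scans a whitespace-separated content stream and collects straight
    segments produced by the path operators m, l, and re.
    """
    segments: List[Segment] = []
    operands: List[float] = []
    current = None  # current point of the path, if any
    for token in content.split():
        try:
            operands.append(float(token))
            continue  # numeric operand: keep collecting
        except ValueError:
            pass  # not a number, so treat it as an operator
        if token == "m" and len(operands) >= 2:
            current = (operands[-2], operands[-1])
        elif token == "l" and len(operands) >= 2 and current is not None:
            end = (operands[-2], operands[-1])
            segments.append((*current, *end))
            current = end
        elif token == "re" and len(operands) >= 4:
            x, y, w, h = operands[-4:]
            segments += [
                (x, y, x + w, y),
                (x + w, y, x + w, y + h),
                (x + w, y + h, x, y + h),
                (x, y + h, x, y),
            ]
        operands = []  # operands are consumed by every operator
    return segments


def find_table_grid(segments: List[Segment], tol: float = 0.5) -> Optional[dict]:
    """Table extraction (simplified).

    Assumes a regular grid: distinct horizontal line positions bound rows
    and distinct vertical line positions bound columns.
    """
    ys = sorted({round(y0, 1) for x0, y0, x1, y1 in segments
                 if abs(y0 - y1) <= tol and abs(x0 - x1) > tol})
    xs = sorted({round(x0, 1) for x0, y0, x1, y1 in segments
                 if abs(x0 - x1) <= tol and abs(y0 - y1) > tol})
    if len(ys) < 2 or len(xs) < 2:
        return None  # not enough ruling lines to enclose a cell
    return {
        "rows": len(ys) - 1,
        "cols": len(xs) - 1,
        "bbox": (xs[0], ys[0], xs[-1], ys[-1]),
    }


# Hypothetical content stream: three horizontal and three vertical rules.
stream = ("0 0 m 100 0 l S 0 20 m 100 20 l S 0 40 m 100 40 l S "
          "0 0 m 0 40 l S 50 0 m 50 40 l S 100 0 m 100 40 l S")
grid = find_table_grid(extract_segments(stream))
```

A real extractor would also need to handle merged cells, lines drawn as thin filled rectangles, and the text placed inside each cell. The regular-grid assumption here only keeps the example short.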

[0022] Figure 2 shows that according to an embodiment of the p...


Abstract

The invention provides a device and a method for extracting a table from a portable electronic document. The device comprises a command acquisition unit, a line extraction unit, and a table extraction unit. The command acquisition unit parses the content of the portable electronic document to acquire commands related to the table; the line extraction unit extracts lines and their positions by processing these commands; and the table extraction unit analyzes the positional relationship of the lines to extract the table. The device and the method can be used to extract tables from portable electronic documents automatically.

Description

Technical Field

[0001] The present invention relates generally to document processing and document understanding, and in particular to extracting tables from portable electronic documents.

Background Art

[0002] Portable electronic documents, such as PDF and PS files, keep their display format and attributes unchanged across system platforms. That is, they are portable, and they are widely used in everyday office work. However, extracting components from portable electronic documents remains difficult. For example, the PDF format specification has no concept of a table or of table components. A table is simply drawn from line segments and text, which makes extracting tables from a PDF very difficult. Table extraction from electronic documents can be expected to find wide use in document reuse and document retrieval.

[0003] US Patent 6801673 B2 extracts words in PDF documents. This patent extract...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/24
Inventor: 杜成, 长谷川史裕, 井上浩一
Owner: RICOH KK