Method and device for extracting hidden forms

A hidden-form extraction technology, applied in the field of data processing, that addresses the lack of processing methods for form data in PDF documents.

Active Publication Date: 2021-02-12
ZHONGKE DINGFU BEIJING TECH DEV

AI Technical Summary

Problems solved by technology

[0005] To solve the problem that existing PDF document extraction technology lacks a corresponding processing method for extracting form data from PDF documents, embodiments of the present invention provide a method and device for extracting hidden forms.

Examples

Embodiment Construction

[0028] To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0029] Referring to Figure 1A, which shows a flow chart of a hidden form extraction method provided by an embodiment of the present invention, the method may include the following steps:

[0030] Step 101, parsing the target document to obtain each character in the target document and the coordinates corresponding to each character.

[0031] Optionally, the target document is a PDF document or a picture.

[0032] The target document is parsed page by page in page-number order. Each page is traversed to obtain every character on that page together with the coordinates corresponding to that character.
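The output of Step 101 can be pictured as a flat list of characters, each tagged with its page and coordinates. The following is a minimal sketch only: it assumes some parser has already produced, for each page, tuples of the form (text, x0, y0, x1, y1). That tuple layout and the names `PositionedChar` and `collect_chars` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Assumed raw record from a parser: (text, x0, y0, x1, y1) in page coordinates.
RawGlyph = Tuple[str, float, float, float, float]


@dataclass(frozen=True)
class PositionedChar:
    """One character plus the page and bounding box it was found at."""
    page: int
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def collect_chars(pages: Sequence[Iterable[RawGlyph]]) -> List[PositionedChar]:
    """Traverse pages in page-number order and record every character."""
    chars: List[PositionedChar] = []
    for page_no, glyphs in enumerate(pages, start=1):
        for text, x0, y0, x1, y1 in glyphs:
            chars.append(PositionedChar(page_no, text, x0, y0, x1, y1))
    return chars
```

Keeping the page number on each record lets later steps compare coordinates only among characters of the same page.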

[0033] Step 102, according to the coordinates correspon...

Abstract

The invention discloses a hidden table extraction method and device, belonging to the technical field of data processing. The method comprises: determining, according to the coordinates corresponding to each character, that characters whose distance satisfies a preset proximity condition belong to the same hidden table, and placing the characters of the same hidden table into the same character set; determining the table cell range corresponding to each character set according to the coordinates of the characters in that set; and generating explicit tables according to the characters contained in each character set, the coordinates corresponding to each character, and the table cell range corresponding to each character set. This solves the problem that existing PDF (Portable Document Format) document extraction technology lacks a corresponding processing method for extracting table data from PDF documents. The effect achieved is that the table cell ranges of hidden tables are determined from the coordinates of the characters in the hidden tables of the target document, and explicit tables are generated from the determined table cell ranges.
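The first two steps of the abstract (grouping nearby characters, then deriving a range from their coordinates) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not define the "preset proximity condition", so the sketch assumes two characters are near when the horizontal and vertical gaps between their boxes are within the thresholds `max_dx` and `max_dy`, and it uses a plain bounding box as a stand-in for the range of each character set. All function names and the (text, x0, y0, x1, y1) record layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed character record: (text, x0, y0, x1, y1).
Char = Tuple[str, float, float, float, float]
Box = Tuple[float, float, float, float]


def gap(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Distance between two 1-D intervals; 0 when they overlap."""
    return max(0.0, max(a_lo, b_lo) - min(a_hi, b_hi))


def is_near(a: Char, b: Char, max_dx: float, max_dy: float) -> bool:
    """Assumed proximity condition: both axis gaps within the thresholds."""
    return (gap(a[1], a[3], b[1], b[3]) <= max_dx
            and gap(a[2], a[4], b[2], b[4]) <= max_dy)


def group_chars(chars: List[Char], max_dx: float, max_dy: float) -> List[List[Char]]:
    """Union-find grouping: chains of near characters end up in one set."""
    parent = list(range(len(chars)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if is_near(chars[i], chars[j], max_dx, max_dy):
                parent[find(j)] = find(i)

    groups: Dict[int, List[Char]] = {}
    for i, ch in enumerate(chars):
        groups.setdefault(find(i), []).append(ch)
    return list(groups.values())


def bounding_box(group: List[Char]) -> Box:
    """Smallest box covering every character in the set."""
    return (min(c[1] for c in group), min(c[2] for c in group),
            max(c[3] for c in group), max(c[4] for c in group))
```

The pairwise comparison is quadratic in the number of characters, which keeps the sketch short; a spatial index or a sort by coordinate would be a natural refinement for large pages.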

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method and device for extracting hidden tables.

Background technique

[0002] With the rapid development of computer and Internet technology, the application of Portable Document Format (PDF) is becoming more and more extensive.

[0003] Since the original design purpose of PDF is only to display documents and print documents, it does not have the function of communicating and interacting with other computer programs. Therefore, the data contained in the PDF document can only be used by other computer programs through the corresponding extraction technology of the PDF document.

[0004] PDF documents are mainly composed of data such as images, tables, and characters. The existing PDF document extraction technology can basically extract the character data in the PDF document accurately, but for the extraction of the table data in the PDF document, there is no correspon...

Application Information

Patent Type & Authority Patents (China)
IPC IPC(8): G06F40/18, G06F40/12
Inventor 于闪闪, 张青, 程剑华, 蒋宏飞, 晋耀红, 杨凯程
Owner ZHONGKE DINGFU BEIJING TECH DEV