Table extraction method and device based on PDF (Portable Document Format) files
A table extraction method and technology, applied in the information field, that addresses the low accuracy of extracted tables and improves extraction accuracy.
Embodiment Construction
[0026] Figure 1 is a schematic flow chart of a method for extracting tables from PDF files provided by an embodiment of the present invention. As shown in Figure 1, the method includes:
[0027] 101. Analyze the PDF file to obtain the text information of each character and the line information of each line in the PDF file.
[0028] Wherein, the text information includes text character information and text position information; the line information includes line position information, line width and line length; the line position information includes line horizontal axis position and line vertical axis position.
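The text information and line information listed above can be pictured as simple value types. The following is a minimal sketch only: the type and field names (`PdfElements`, `TextInfo`, `LineInfo`, `ParseResult`) are hypothetical and are not taken from the patent or from PDFBox.

```java
import java.util.List;

/** Hypothetical holder types for the information obtained in step 101. */
final class PdfElements {
    /** One character: the text character itself plus its position on the page. */
    record TextInfo(String character, double x, double y) {}

    /** One line: horizontal-axis and vertical-axis position, line width, and line length. */
    record LineInfo(double x, double y, double width, double length) {}

    /** Everything step 101 produces for one PDF file. */
    record ParseResult(List<TextInfo> texts, List<LineInfo> lines) {}

    private PdfElements() {}
}
```

Keeping the two kinds of information in separate types mirrors the description, which treats characters and lines as distinct outputs of the parsing step.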
[0029] Specifically, PDFBox software is used to parse the PDF file and obtain the text information in the PDF file; the line information in the PDF file is extracted according to the operator that marks the end of a line in the PDF file.
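The operator-based line extraction can be illustrated on a simplified content stream. This sketch does not use PDFBox. It assumes that the operator marking the end of a line is the PDF stroke operator `S`, which the source does not confirm, and it handles only the standard operators `m` (move to), `l` (line to), `w` (set line width), `S` (stroke), and `n` (end path without painting). The names `LineOperatorSketch`, `Segment`, and `extract` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy scanner that collects stroked straight segments from a whitespace-separated stream. */
final class LineOperatorSketch {
    /** A stroked segment together with the line width in effect when it was stroked. */
    record Segment(double x1, double y1, double x2, double y2, double width) {
        double length() { return Math.hypot(x2 - x1, y2 - y1); }
    }

    static List<Segment> extract(String contentStream) {
        Deque<Double> operands = new ArrayDeque<>();
        List<double[]> pending = new ArrayList<>(); // segments built but not yet painted
        List<Segment> result = new ArrayList<>();
        double curX = 0, curY = 0;
        double width = 1.0; // default line width in the PDF graphics state

        for (String token : contentStream.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            switch (token) {
                case "w" -> { width = operands.removeLast(); operands.clear(); }
                case "m" -> {
                    curY = operands.removeLast();
                    curX = operands.removeLast();
                    operands.clear();
                }
                case "l" -> {
                    double y = operands.removeLast();
                    double x = operands.removeLast();
                    pending.add(new double[] {curX, curY, x, y});
                    curX = x;
                    curY = y;
                    operands.clear();
                }
                case "S" -> { // assumed end-of-line operator: emit what was built so far
                    for (double[] p : pending) {
                        result.add(new Segment(p[0], p[1], p[2], p[3], width));
                    }
                    pending.clear();
                    operands.clear();
                }
                case "n" -> { pending.clear(); operands.clear(); }
                default -> {
                    try {
                        operands.addLast(Double.parseDouble(token));
                    } catch (NumberFormatException e) {
                        operands.clear(); // unsupported operator: drop its operands
                    }
                }
            }
        }
        return result;
    }
}
```

For instance, `LineOperatorSketch.extract("0.5 w 72 700 m 272 700 l S")` yields a single horizontal segment from which the position, width, and length can be read. A real implementation would also need to handle rectangles, curves, coordinate transforms, and malformed input, all of which this sketch ignores.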
[0030] For example: in the PDFBox software, the text and lines in the PDF file have been re-processed and encapsulated. Both tex...