Method for parsing PDF table data and storage medium
A method for parsing PDF table data, and a corresponding storage medium, applied in the field of data analysis. It addresses the difficulty of judging how data rows relate to one another, the impracticality of segmenting by characters alone, and the difficulty of matching data to its titles, and thereby improves the accuracy, convenience, and degree of automation of parsing.
Examples
Embodiment 1
[0077] This embodiment provides a method for parsing PDF table data. It is suitable for parsing tables in PDF data and obtaining the corresponding table data, which makes subsequent editing easier. For example, in front-end data cleaning, a large share of the bills provided by customers are PDF files in table form. With this embodiment, a table-form PDF can be extracted into the corresponding CSV format and automatically imported into a database for analysis.
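The last step described above (table rows to CSV, then into a database) can be sketched as follows. This is a minimal illustration and not the patent's implementation. It assumes the rows have already been extracted, that the first row holds the column titles, and that SQLite stands in for the unspecified database; the function names `rows_to_csv` and `import_csv` are hypothetical.

```python
import csv
import io
import sqlite3


def rows_to_csv(rows):
    """Serialize a list of rows (lists of strings) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def import_csv(conn, table_name, csv_text):
    """Create a table from the CSV header row and insert the remaining rows.

    Sketch only: identifiers are wrapped in double quotes but not otherwise
    sanitized, so titles containing double quotes are not handled.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: first row holds the column titles
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({cols})')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', reader)
    conn.commit()


# Hypothetical extracted table; the blank cell is kept as an empty string.
rows = [["date", "amount"], ["2024-01-01", "12.50"], ["2024-01-02", ""]]
csv_text = rows_to_csv(rows)

conn = sqlite3.connect(":memory:")
import_csv(conn, "bills", csv_text)
```

Keeping blank cells as empty strings, rather than dropping them, means every value stays in the column of its title when the CSV is loaded.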
[0078] As shown in Figures 1-4, there are several common kinds of PDF table. Specifically, Figure 1 corresponds to a single table; Figure 2 to a table with random blank cells; Figure 3 to a table with cells spanning two pages; and Figure 4 to tables with multi-layer watermarks and similar. Because existing PDF table parsers are mostly closed source and treat this kind of table data purely as characters, it is difficult to achieve the correspondence between d...
Embodiment 2
[0114] This embodiment corresponds to Embodiment 1 and provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, all the steps of Embodiment 1 can be carried out.
[0115] In summary, the method and storage medium for parsing PDF table data provided by the present invention enable accurate, convenient, and automatic parsing of PDF tables. They can accurately parse not only a single table or multiple tables but also tables with random blank cells, double-page cells, and multi-layer watermarks, so they are highly practical and widely applicable. Furthermore, the present invention parses on the basis of character coordinates and line-segment coordinates, unlike existing purely character-based processing. This not only makes parsing more accurate and convenient but also ensures the correspondence between data and titles; at...
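To illustrate the general idea of parsing from character coordinates and line-segment coordinates, here is a minimal sketch. It is not the procedure claimed in the patent, whose detailed steps are not reproduced on this page. It assumes the ruling lines and character centers have already been read from the PDF by some extraction library, that segments are `(x0, y0, x1, y1)` tuples, that characters are `(text, x, y)` tuples, and that y grows downward from the top of the page. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_grid(h_segments, v_segments):
    """Return sorted column boundaries (xs) and row boundaries (ys).

    Horizontal rulings contribute their y position, vertical rulings their x.
    Coordinates are rounded so nearly coincident lines collapse into one.
    """
    ys = sorted({round(y0, 1) for (_, y0, _, _) in h_segments})
    xs = sorted({round(x0, 1) for (x0, _, _, _) in v_segments})
    return xs, ys


def assign_chars(chars, xs, ys):
    """Place each character in the grid cell that contains its center."""
    n_rows, n_cols = len(ys) - 1, len(xs) - 1
    cells = defaultdict(list)
    for text, cx, cy in chars:
        col = bisect_right(xs, cx) - 1
        row = bisect_right(ys, cy) - 1
        if 0 <= row < n_rows and 0 <= col < n_cols:
            cells[(row, col)].append((cx, text))
    # Cells that received no characters stay as empty strings.
    table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for (row, col), items in cells.items():
        table[row][col] = "".join(text for _, text in sorted(items))
    return table


# Hypothetical 2x2 grid: three horizontal and three vertical rulings.
h_segments = [(0, 0, 100, 0), (0, 20, 100, 20), (0, 40, 100, 40)]
v_segments = [(0, 0, 0, 40), (50, 0, 50, 40), (100, 0, 100, 40)]
chars = [("B", 18, 10), ("A", 10, 10), ("7", 60, 30)]

xs, ys = build_grid(h_segments, v_segments)
table = assign_chars(chars, xs, ys)
```

Because each cell is located by the ruling lines rather than by the character stream, a blank cell still occupies its position in the row, which keeps later values aligned with their column titles. Cells spanning pages and watermark layers are not handled in this sketch.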