Nested table extraction method and device, and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Current Assignee / Owner
- SUZHOU AUNBOX SOFTWARE CO LTD
- Publication Date
- 2021-04-16
Abstract
Description
Technical Field
[0001] The embodiments of the present application relate to office and network data acquisition technologies, and in particular to a method and device for extracting nested tables, and a storage medium.
Background Art
[0002] At present, when text recognition is performed on non-editable text such as PDF documents, the methods for recognizing and extracting the text portion are relatively mature and relatively accurate. However, when the non-editable text contains tables or other table-like content, recognition of the table structure itself is quite poor: for example, the lines of the recognized table may be broken or uneven. This seriously degrades the recognition experience for non-editable text, and users waste considerable time repairing the recognized table structure, which results in low processing efficiency and a poor user experience.
Summary of the Invention
[0003] In view of this, embodiments of the pr...