A method for analyzing the reading order of formatted documents in electronic files
A reading-order analysis technique for formatted (fixed-layout) documents, applied in the field of information technology, which addresses problems such as ambiguous block division.
Detailed Description of the Embodiments
[0030] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with embodiments and drawings.
[0031] Figure 1 shows the flow of the method for analyzing the reading order of formatted electronic files, which includes the following steps:
[0032] Extract the original information in the PDF file;
[0033] Identify the header and footer, and merge adjacent text content to obtain text line content;
[0034] Perform block merging on the text line content to obtain text block content;
[0035] Merge adjacent images to obtain image block content;
[0036] Analyze the path information to obtain the dividing lines in the horizontal direction;
[0037] Project the text block content and the image block content in the X direction to obtain horizontally separated block content;
[0038] Take text block content, image block content, horizontal dividi...
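The X-direction projection in step [0037] can be illustrated with a minimal Python sketch. It is not the patented implementation, only one plausible reading of the step: each block's bounding box is projected onto the X axis, overlapping extents are merged into disjoint intervals, and blocks that fall in the same interval are grouped together. The `Box` type, the function names, and the `min_gap` tolerance are all assumptions introduced for illustration; none of them appear in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box of a text or image block (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def project_onto_x(boxes: List[Box], min_gap: float = 0.0) -> List[Tuple[float, float]]:
    """Merge the X extents of all boxes into disjoint, sorted intervals.

    Two extents are merged when the gap between them is at most
    min_gap (an assumed tolerance, not taken from the source).
    """
    intervals: List[Tuple[float, float]] = []
    for box in sorted(boxes, key=lambda b: b.x0):
        if intervals and box.x0 - intervals[-1][1] <= min_gap:
            start, end = intervals[-1]
            intervals[-1] = (start, max(end, box.x1))
        else:
            intervals.append((box.x0, box.x1))
    return intervals


def group_by_x_projection(boxes: List[Box], min_gap: float = 0.0) -> List[List[Box]]:
    """Group boxes by the merged X interval that contains them."""
    intervals = project_onto_x(boxes, min_gap)
    groups: List[List[Box]] = [[] for _ in intervals]
    for box in boxes:
        for i, (start, end) in enumerate(intervals):
            if start <= box.x0 and box.x1 <= end:
                groups[i].append(box)
                break
    return groups
```

Under these assumptions, a two-column page whose blocks span x = 0 to 100 and x = 150 to 250 would yield two intervals and therefore two groups. A single-column page would yield one.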