Method for extracting structured information of continuous page format document
Patent Information
- Authority / Receiving Office
- CN · China
- Current Assignee / Owner
- 北京众信博雅科技有限公司
- Publication Date
- 2020-01-17
Abstract
Description
Technical field
[0001] The invention relates to the field of information extraction from format documents, and in particular to a method for extracting structured information from continuous-page format documents.
Background art
[0002] A layout document format is an electronic document format with a fixed layout rendering effect. The presentation of a layout document is independent of the device: whether the document is read on screen or printed on various devices, the layout rendering result is the same. Layout documents (format documents) are mainly used for the release, dissemination and archiving of written documents. Common layout document formats include PDF, CEBX and OFD. A layout document format defines the presentation data of multiple pages, including the position, color, font size and other information of the objects on each page (text, images, graphics, etc.), and presents the document content in a form suited to human reading. The layout document stores unstructured data, without rec...