Method and device for extracting document structure

A document structure extraction technology, applied in special data processing applications, instruments, and electronic digital data processing, that addresses the problem of cumbersome manual operation and improves extraction efficiency.
CN102855243A · Inactive · Publication Date: 2013-01-02 · PEKING UNIV FOUNDER GRP CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
PEKING UNIV FOUNDER GRP CO LTD
Publication Date
2013-01-02
Estimated Expiration
Not applicable · inactive patent

Smart Images

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention provides a method for extracting a document structure. The method comprises the following steps: acquiring an object of a document; converting the object into a predefined standard format; identifying and marking items in the object in the standard format; and extracting the contents of the matched items to form structured data relevant to the document. The invention also provides a device for extracting the document structure. The device comprises an acquisition module for acquiring the object of the document, a conversion module for converting the object into the predefined standard format, a marking module for identifying and marking the items in the object in the standard format, and an extraction module for extracting the contents of the matched items to form structured data relevant to the document. The method and the device improve the efficiency of document structure extraction.
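The four steps named in the abstract (acquire, convert, mark, extract) can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the patented implementation: it assumes each non-empty line of a plain-text document is one "object", that the "standard format" is whitespace-normalized text, and that items are recognized by regular-expression rules. All names (`DocObject`, `StandardObject`, `RULES`, `extract_structure`) are hypothetical and do not come from the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DocObject:
    """A raw object taken from the document (assumption: one text line)."""
    raw: str


@dataclass
class StandardObject:
    """An object in the assumed standard format, plus the items marked in it."""
    text: str
    marks: List[Tuple[str, int, int]] = field(default_factory=list)


# Hypothetical matching rules: item label -> pattern whose group 1 is the content.
RULES = {
    "title": re.compile(r"^Title:\s*(.+)$"),
    "author": re.compile(r"^Author:\s*(.+)$"),
}


def acquire(document: str) -> List[DocObject]:
    """Step 1: acquire the objects of the document (non-empty lines here)."""
    return [DocObject(line) for line in document.splitlines() if line.strip()]


def convert(obj: DocObject) -> StandardObject:
    """Step 2: convert an object to the standard format (collapse whitespace)."""
    return StandardObject(text=" ".join(obj.raw.split()))


def mark(obj: StandardObject) -> StandardObject:
    """Step 3: identify items and record (label, start, end) for each match."""
    for label, pattern in RULES.items():
        match = pattern.match(obj.text)
        if match:
            obj.marks.append((label, match.start(1), match.end(1)))
    return obj


def extract(objs: List[StandardObject]) -> Dict[str, List[str]]:
    """Step 4: pull out the marked contents to form structured data."""
    data: Dict[str, List[str]] = {}
    for obj in objs:
        for label, start, end in obj.marks:
            data.setdefault(label, []).append(obj.text[start:end])
    return data


def extract_structure(document: str) -> Dict[str, List[str]]:
    """Run the four steps in order on a plain-text document."""
    marked = [mark(convert(obj)) for obj in acquire(document)]
    return extract(marked)


sample = "Title:   A  Study\nAuthor: Li Wei\nBody text here."
result = extract_structure(sample)
```

Keeping each step as a separate function mirrors the four modules of the device described in the abstract, so a different object type or rule set could be swapped in without touching the other steps.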

Description

Technical field

[0001] The present invention relates to the field of digital publishing, and in particular to a method and device for extracting document structure.

Background art

[0002] In traditional publishing, the document formats of books and newspapers serve only the needs of printing. Content is described solely through visual elements such as text, graphics, image outlines, color, and position, with no representation of the document's logical content or internal relationships. Digital publishing, by contrast, pays more attention to the logical content, the relationships between items, and the content granularity of documents. Structuring documents is therefore a prerequisite for reusing digital content.

[0003] At present, structured processing of document content is done mainly by hand. Following predefined rules, operators visually identify the content in the document that conforms to...

Claims
