Method and device for extracting document structure

A document structure extraction technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses the error-prone, time-consuming, and labor-intensive extraction of document structure.

Status: Inactive; Publication Date: 2012-07-04
PEKING UNIV FOUNDER GRP CO LTD +1
1 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

[0008] The present invention aims to provide a method and device for extracting document structure, so as to solve the problem that extracting document structure in related technologies is time-consuming, labor-intensive, and error-prone.

Embodiment Construction

[0015] The present invention will be described in detail below with reference to the accompanying drawings and in combination with embodiments.

[0016] Figure 1 is a flowchart of a method for extracting a document structure according to an embodiment of the present invention. The method includes:

[0017] Step S10, obtaining the book catalog and layout content of the book document;

[0018] Step S20, identifying chapters from the layout content according to the book catalog;

[0019] Step S30, extracting chapters.

[0020] Related technologies rely on manual analysis of book documents to extract the document structure, which is error-prone and time-consuming. By contrast, the method of this embodiment uses catalog information to identify and extract the document structure. Book documents generally have a book catalog, and automatic identification of catalog information by a computer is easy to implement. Therefore, this method can automatically identify and extract most books by compu...
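
Steps S10 to S30 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the book catalog and the layout content are already available as lists of plain-text lines, that every catalog entry ends with a page number, and that a chapter heading in the body matches its catalog title after whitespace and case normalization. The names `Chapter`, `parse_catalog`, `identify_chapters`, and `extract_chapters` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chapter:
    title: str
    start: int        # index of the heading line in the layout content
    end: int          # index one past the last line of the chapter
    lines: list[str]  # body lines between this heading and the next


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so catalog and body titles compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def parse_catalog(catalog_lines: list[str]) -> list[str]:
    """Part of S10: strip dot leaders and the trailing page number from each entry.

    Assumption: every catalog entry ends with a page number.
    """
    titles = []
    for line in catalog_lines:
        title = re.sub(r"[.\s]*\d+\s*$", "", line).strip()
        if title:
            titles.append(title)
    return titles


def identify_chapters(titles: list[str], layout_lines: list[str]) -> list[tuple[str, int]]:
    """S20: find each catalog title, in catalog order, among the layout lines."""
    positions = []
    cursor = 0
    for title in titles:
        target = normalize(title)
        for i in range(cursor, len(layout_lines)):
            if normalize(layout_lines[i]) == target:
                positions.append((title, i))
                cursor = i + 1  # later titles must appear after this one
                break
    return positions


def extract_chapters(positions: list[tuple[str, int]], layout_lines: list[str]) -> list[Chapter]:
    """S30: slice the layout content between consecutive identified headings."""
    chapters = []
    for n, (title, start) in enumerate(positions):
        end = positions[n + 1][1] if n + 1 < len(positions) else len(layout_lines)
        chapters.append(Chapter(title, start, end, layout_lines[start + 1:end]))
    return chapters


# Invented sample data, for illustration only.
catalog = ["Introduction ..... 1", "Methods ..... 5"]
layout = ["Introduction", "Books have structure.", "Methods", "Match titles.", "Then slice."]

titles = parse_catalog(catalog)
chapters = extract_chapters(identify_chapters(titles, layout), layout)
```

In this sketch the catalog supplies the chapter titles, so no per-book layout rules have to be written by hand. A catalog title that never appears in the body is simply skipped.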

Abstract

The invention provides a method for extracting a document structure. The method comprises the following steps of: acquiring the book catalogue and format contents of a book document; identifying chapter sections from the format contents according to the book catalogue; and extracting the chapter sections. The invention also provides a device for extracting the document structure. The device comprises an acquiring module used for acquiring the book catalogue and format contents of the book document, an identifying module used for identifying the chapter sections from the format contents according to the book catalogue, and an extracting module used for extracting the chapter sections. The method and the device can be used for improving the efficiency and accuracy of document structure extraction.

Description

Technical field

[0001] The present invention relates to the field of digital typesetting, in particular to a method and device for extracting document structure.

Background technique

[0002] In the current field of structured content processing, chapter structured content is extracted by analyzing the text content and format of books. Specifically, the position of the chapter is identified by analyzing information such as the font, font size, and the definition of the layout symbol of the chapter. Such a method needs to first analyze the layout characteristics of the book, summarize the layout rules of the chapter content, and then manually define the chapter mapping rules to extract. The concrete steps of this method are as follows:

[0003] Step 1. Analyze the text content and format of the book, and determine the hierarchical mapping rules for extracting chapter content. For example, the first-level title mapping rule of a chapter can be set as: chapter level: first l...
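
For contrast, the background approach of manually defined mapping rules can be sketched as follows. The source text is cut off before it states the actual rule, so the font names and sizes here are invented for illustration, and `TextRun`, `MappingRule`, and `heading_level` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextRun:
    text: str
    font: str
    size_pt: float


@dataclass(frozen=True)
class MappingRule:
    level: int      # chapter level the rule maps to (1 = first-level title)
    font: str
    size_pt: float


# Hypothetical rules that an operator would write by hand for one specific book.
RULES = [
    MappingRule(level=1, font="SimHei", size_pt=16.0),
    MappingRule(level=2, font="SimHei", size_pt=14.0),
]


def heading_level(run: TextRun, rules: list[MappingRule]) -> Optional[int]:
    """Return the chapter level whose rule matches the run's font and size, else None."""
    for rule in rules:
        if run.font == rule.font and run.size_pt == rule.size_pt:
            return rule.level
    return None


# Invented sample runs: a heading typeset slightly off the rule is missed.
runs = [
    TextRun("Chapter 1", "SimHei", 16.0),
    TextRun("Body text.", "SimSun", 10.5),
    TextRun("Chapter 2", "SimHei", 15.0),
]
levels = [heading_level(r, RULES) for r in runs]
```

Here the second heading is not recognized because its size differs from the handwritten rule, which illustrates why this approach needs per-book analysis and is prone to errors.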

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06F17/27
Inventor: 黄冶, 田寄远, 陈长刚, 翟因为
Owner: PEKING UNIV FOUNDER GRP CO LTD