Method and device for document content structuring

A document content structuring technology, applied in word processing, special data processing applications, instruments, etc. It solves the problems of low structuring efficiency and high error rate, and achieves the effects of improving the matching rate and reducing the structuring error rate.

Status: Inactive; Publication Date: 2014-06-25
PEKING UNIV FOUNDER GRP CO LTD +1
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] The embodiments of the present application provide a method and device for structuring document content, to solve the technical problems of low structuring efficiency and high error rate in the prior art.

Embodiment Construction

[0071] By providing a method and device for structuring document content, the embodiments of the present application solve the technical problems of low efficiency and high error rate in the prior art when structuring discontinuous content.

[0072] The technical solution in the embodiments of this application addresses the low structuring efficiency and high error rate for the above-mentioned discontinuous content. The general idea is as follows:

[0073] Based on a first schema file of a preset style in a first document and a first XML file whose rule is a first structuring rule, generate a first instantiation rule corresponding to the first document; based on a first tag structure tree of structured first content in the first document, obtain a first tag list corresponding to the first content; from the discontinuous content corresponding to the first tag list, obtain M texts matched by...
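The first two steps can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation: the test-paper markup (`paper`, `question`, the `id` attribute), the idea that the "preset style" contributes a numbering separator, and the helper names `tag_list_from_tree` and `instantiation_rule` are all hypothetical assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical structured "first content": a test paper whose questions are
# already tagged. Tag names and the id attribute are illustrative assumptions.
FIRST_CONTENT = """
<paper>
  <question id="1">What is 2 + 2?</question>
  <question id="2">Name the capital of France.</question>
  <question id="3">Define XML.</question>
</paper>
"""


def tag_list_from_tree(xml_text: str) -> list[tuple[str, str]]:
    """Walk the tag structure tree in document order and return a
    (tag, id) pair for every element that carries an id attribute."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("id")) for el in root.iter() if el.get("id") is not None]


def instantiation_rule(separator: str) -> re.Pattern:
    """Instantiate a hypothetical structuring rule ("an item starts with its
    number followed by the preset style's separator") as a concrete regex.
    Group 1 captures the item number, group 2 the item text."""
    return re.compile(rf"^(\d+){re.escape(separator)}\s*(.+)$", re.MULTILINE)
```

Here `tag_list_from_tree(FIRST_CONTENT)` yields one entry per tagged question, and `instantiation_rule(".")` produces a pattern that recognizes lines such as `2. Paris` in the separated content.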

Abstract

The invention provides a method and a device for document content structuring. The method includes: generating a first instantiation rule corresponding to a first document, based on a first schema file of a preset style in the first document and a first XML (Extensible Markup Language) file whose rule is a first structuring rule; acquiring a first tag list corresponding to structured first content in the first document, based on a first tag structure tree of the first content; acquiring, from discontinuous content corresponding to the first tag list, M texts matched with the first instantiation rule, wherein the discontinuous content is unstructured content excluded from the structured first content; determining, among the M tags corresponding to the M texts, N tags that can be matched with the structured first content; and, based on the N tags, structuring the N texts corresponding to the N tags to acquire a second tag structure tree.
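The last three steps of the abstract (collect M matching texts, keep the N whose tags match the structured content, build a second tag structure tree) can be illustrated with the minimal Python sketch below. It is a sketch under stated assumptions, not the patented method: it assumes the instantiation rule is a numbered-item regex, that a text's tag is "matchable" when its number equals the `id` of an already structured question, and that answers are nested as hypothetical `answer` elements. The names `structure_discontinuous`, `FIRST_TAGS`, `ANSWER_RULE`, and `DISCONTINUOUS` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag list of the structured first content (test questions).
FIRST_TAGS = [("question", "1"), ("question", "2"), ("question", "3")]

# Hypothetical instantiation rule: "<number>. <text>" at the start of a line.
ANSWER_RULE = re.compile(r"^(\d+)\.\s*(.+)$", re.MULTILINE)

# Hypothetical discontinuous content: an answer section separated from the
# questions, including one numbered line that belongs to no question.
DISCONTINUOUS = """Answers
1. 4
2. Paris
7. An unrelated note numbered like an answer
3. Extensible Markup Language
"""


def structure_discontinuous(first_tags, rule, text, child_tag="answer"):
    """Return (m_texts, n_texts, second_tree) under the assumptions above."""
    # Step 1: the M candidate texts matched by the instantiation rule.
    m_texts = [(key, body.strip()) for key, body in rule.findall(text)]
    known = {tag_id: tag for tag, tag_id in first_tags}
    # Step 2: the N texts whose tag matches the structured first content;
    # the remaining candidates are left unstructured.
    n_texts = [(key, body) for key, body in m_texts if key in known]
    # Step 3: structure the N texts into a second tag structure tree.
    root = ET.Element("paper")
    for key, body in n_texts:
        parent = ET.SubElement(root, known[key], id=key)
        ET.SubElement(parent, child_tag).text = body
    return m_texts, n_texts, root
```

With these inputs, four lines match the rule (M = 4), but only three correspond to existing questions (N = 3), so the stray item numbered 7 is not forced into the tree. Rejecting unmatched candidates rather than guessing a parent reflects the abstract's stated aim of reducing the structuring error rate.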

Description

Technical Field

[0001] The invention relates to the field of printing, in particular to a method and device for structuring document content.

Background

[0002] When a publishing house receives a large number of manuscripts and needs to turn them into printed products such as books or periodicals, it must devote considerable effort to organizing the content structure of the manuscripts. Some content in a document is discontinuous. For example, when test questions and answers are separated, the answer part of the test paper is discontinuous content relative to the test paper; when general content is separated from specific content, the specific content is discontinuous content relative to the whole document. When these documents are organized, the separated parts must be matched back: each answer to the structure of its test question, and the specific information to the structure of the...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06F40/143
CPC: G06F17/2247; G06F40/14; G06F40/154; G06F40/117; G06F40/143
Inventor: 孙明明 (Sun Mingming)
Owner: PEKING UNIV FOUNDER GRP CO LTD