Word document fragmentation method and device

A document fragmentation technology, applied in word processing, instruments, computing, etc., that addresses the heavy workload, long processing time, and low retrieval efficiency of searching long Word documents.

Active Publication Date: 2018-11-09
ZHONGKE DINGFU BEIJING TECH DEV

AI Technical Summary

Problems solved by technology

Because Word documents differ in content, they also differ in length. When a Word document is long, retrieving across the entire document involves a heavy workload, takes a long time, and yields low retrieval efficiency.

Examples

Embodiment 1

[0025] An embodiment of the present invention provides a Word document fragmentation method. Figure 1 is a flow chart of a Word document fragmentation method provided by an embodiment of the present invention. As shown in Figure 1, the method may include the following steps:

[0026] Step S110, obtaining all the paragraphs of the Word document.

[0027] As is well known to those skilled in the art, the content of a Word document consists of multiple paragraphs: each paragraph ends with a line break, the next paragraph starts on a new line, and a title occupies a paragraph of its own. In this embodiment, all the paragraphs of the Word document are obtained, so that when the user searches, retrieval is performed with the paragraph as the retrieval unit, which helps to improve retrieval efficiency. Using the paragraph as the unit of operation also makes it possible to identify titles among the paragraphs.
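
Step S110 can be sketched as follows, assuming the Word document is a .docx file, that is, an Office Open XML zip archive whose main text is stored in word/document.xml. The source does not say how the paragraphs are read, so this is only one possible approach; `get_paragraphs` and `make_demo_docx` are hypothetical names, and only the Python standard library is used.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def get_paragraphs(docx_file):
    """Return the text of every paragraph (w:p element) in document order."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over its w:t (text run) elements.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_demo_docx(texts):
    """Build a minimal in-memory archive for demonstration (hypothetical helper).

    It contains only word/document.xml, which is all get_paragraphs reads;
    it is not a complete .docx that Word itself would open.
    """
    body = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_docx(["1 Introduction", "Body text."])
    print(get_paragraphs(demo))
```

Because a title occupies a paragraph of its own, the returned list already separates title paragraphs from body paragraphs; telling them apart is left to the later steps.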

[0028] Figure 2 is a flow chart of step S11...

Embodiment 2

[0121] An embodiment of the present invention provides a Word document fragmentation device. Figure 7 is a block diagram of a Word document fragmentation device provided by an embodiment of the present invention. As shown in Figure 7, the device includes:

[0122] An obtaining module 210, configured to obtain all paragraphs of the Word document;

[0123] A first generation module 220, configured to obtain the paragraph attributes of the paragraphs in the order in which the paragraphs appear in the Word document, and to extract every paragraph attribute at its first appearance in the Word document to generate the paragraph attribute set of the Word document;

[0124] A second generation module 230, configured to use the paragraph attribute recognition model to extract all title paragraph attributes from the paragraph attribute set to generate the title paragraph attribute set; the recognition model is based on the manually marked paragraph attributes of the title segment / text segment gen...
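
The work of the two generation modules can be illustrated with a minimal sketch. The fields of `ParagraphAttr` (font size and bold) are assumptions, since the visible text does not list which paragraph attributes are used, and `toy_model` is only a stand-in rule, not the trained recognition model the source describes. All function names are hypothetical.

```python
from typing import NamedTuple


class ParagraphAttr(NamedTuple):
    # Hypothetical attribute fields; the source does not enumerate them.
    font_size: float
    bold: bool


def build_attribute_set(attrs_in_order):
    """Collect each distinct attribute once, in order of first appearance."""
    seen = set()
    attribute_set = []
    for attr in attrs_in_order:
        if attr not in seen:
            seen.add(attr)
            attribute_set.append(attr)
    return attribute_set


def select_title_attributes(attribute_set, is_title):
    """Keep the attributes that the given model labels as title attributes."""
    return [attr for attr in attribute_set if is_title(attr)]


def toy_model(attr):
    # Stand-in for the recognition model (an assumption for illustration only).
    return attr.bold and attr.font_size >= 14


if __name__ == "__main__":
    attrs = [
        ParagraphAttr(16, True),     # e.g. a chapter title
        ParagraphAttr(10.5, False),  # body text
        ParagraphAttr(14, True),     # e.g. a section title
        ParagraphAttr(10.5, False),  # body text again, already seen
    ]
    attribute_set = build_attribute_set(attrs)
    print(select_title_attributes(attribute_set, toy_model))
```

Classifying the deduplicated attribute set rather than every paragraph means the model is consulted once per distinct attribute, however long the document is.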

Abstract

The embodiment of the invention provides a Word document fragmentation method and device, which address the low retrieval efficiency in the prior art when target content is retrieved in a Word document. The Word document fragmentation method comprises the following steps: first, all paragraphs of the Word document are obtained; second, the paragraph attributes of the paragraphs are obtained in the order in which the paragraphs appear in the Word document, every paragraph attribute is extracted at its first appearance in the Word document, and a paragraph attribute set of the Word document is generated; third, a paragraph attribute recognition model is used to extract all title paragraph attributes from the paragraph attribute set, and a title paragraph attribute set is generated; fourth, all titles in the Word document are recognized according to the title paragraph attribute set, and a title tree of the Word document is generated, which achieves the fragmentation of the Word document. A user can therefore directly retrieve the document paragraphs containing the target content in the fragmented Word document, or retrieve the target content in the title tree of the Word document, which improves retrieval efficiency.
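
The fourth step, generating a title tree from the recognized titles, might look like the sketch below. It assumes that the title attributes are ordered from the highest title level to the lowest; the abstract does not state how levels are determined, so this ordering, the `TitleNode` type, and `build_title_tree` are all hypothetical. Attributes are represented by plain strings for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TitleNode:
    text: str
    level: int
    body: list = field(default_factory=list)      # body paragraphs under this title
    children: list = field(default_factory=list)  # lower-level titles


def build_title_tree(paragraphs, title_attrs):
    """Build a title tree.

    paragraphs:  list of (text, attribute) pairs in document order.
    title_attrs: title attributes, assumed to be ordered highest level first.
    """
    level_of = {attr: i + 1 for i, attr in enumerate(title_attrs)}
    root = TitleNode("ROOT", 0)
    stack = [root]  # path from the root to the most recent title
    for text, attr in paragraphs:
        level = level_of.get(attr)
        if level is None:
            # Not a title: attach the body text to the nearest preceding title.
            stack[-1].body.append(text)
            continue
        # Close titles at the same or a lower level before opening a new one.
        while stack[-1].level >= level:
            stack.pop()
        node = TitleNode(text, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


if __name__ == "__main__":
    doc = [
        ("1 Intro", "H1"), ("intro text", "B"),
        ("1.1 Scope", "H2"), ("scope text", "B"),
        ("2 Method", "H1"),
    ]
    tree = build_title_tree(doc, ["H1", "H2"])
    print([child.text for child in tree.children])
```

Each node keeps the body paragraphs that follow its title, so a search can either walk the titles alone or descend to the paragraph fragments under a matching title.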

Description

Technical field

[0001] The present invention relates to the field of text information processing, in particular to a Word document fragmentation method and device.

Background

[0002] A Word document is the document format of the Microsoft Word software. Because Microsoft Word holds a dominant position among existing word processing software, the Word document has in effect become an internationally common document format. Therefore, in the prior art, most of the documents that people deal with in work and study are Word documents.

[0003] Generally speaking, the content of a Word document is an article, a report, a thesis, etc., and these Word documents include titles and body text. In a document, a title is usually a summary of a section of text and reflects its theme, while the body text is a specific description of the subject and reflects the specific content corresponding to the topic; in a docum...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/21, G06F17/27, G06F17/30
CPC: G06F16/322, G06F40/10, G06F40/258
Inventor: 房平会, 李德彦, 徐龙
Owner: ZHONGKE DINGFU BEIJING TECH DEV