Word document fragmentation method and device
A document fragmentation technology, applied in the fields of word processing, special data processing applications, and instruments, that addresses problems such as heavy workload, long processing time, and low retrieval efficiency.
Examples
Embodiment 1
[0025] An embodiment of the present invention provides a Word document fragmentation method. Figure 1 is a flow chart of the Word document fragmentation method provided by this embodiment. As shown in Figure 1, the method may include the following steps:
[0026] Step S110, obtaining all the paragraphs of the Word document.
[0027] As those skilled in the art know, the content of a Word document consists of multiple paragraphs: each paragraph ends with a line break, the next paragraph starts on the following line, and a title occupies a paragraph of its own. In this embodiment, all paragraphs of the Word document are obtained. When a user searches, retrieval uses the paragraph as the retrieval unit, which helps improve retrieval efficiency. Using the paragraph as the unit of operation also makes it possible to identify titles among the paragraphs.
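Step S110 can be illustrated with a minimal sketch. It assumes the Word document is a .docx file, that is, a ZIP archive whose main part is word/document.xml, and it reads the paragraph elements directly with the Python standard library. The function names get_paragraphs and make_sample_docx are hypothetical and do not come from the patent; the sample file is a stripped-down archive built only for demonstration, not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def get_paragraphs(docx_file):
    """Return the text of every paragraph (w:p element) in document order.

    docx_file may be a path or a binary file-like object.
    Paragraphs nested inside tables are included as well.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml.

    Assumption for demonstration: this is enough for get_paragraphs,
    but it is not a complete, openable Word file.
    """
    texts = ("1 Overview", "This is the body text.")
    body = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for index, paragraph in enumerate(get_paragraphs(make_sample_docx())):
        print(index, paragraph)
```

Each returned string corresponds to one paragraph, so a title such as "1 Overview" arrives as its own list entry, which matches the observation that a title occupies a single paragraph.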
[0028] Figure 2 is a flow chart of step S11...
Embodiment 2
[0121] An embodiment of the present invention provides a Word document fragmentation device. Figure 7 is a block diagram of the Word document fragmentation device provided by this embodiment. As shown in Figure 7, the device includes:
[0122] An obtaining module 210, configured to obtain all paragraphs of the Word document;
[0123] A first generation module 220, configured to obtain the paragraph attributes of the paragraphs in the order in which the paragraphs appear in the Word document, and to extract every paragraph attribute at its first appearance in the Word document to generate the paragraph attribute set of the Word document;
[0124] A second generation module 230, configured to use a paragraph attribute recognition model to extract all title paragraph attributes from the paragraph attribute set to generate the title paragraph attribute set; the recognition model is based on the manually labeled paragraph attributes of title paragraphs / body-text paragraphs gen...
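The behavior described for modules 220 and 230 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the attribute tuple (font name, font size, bold) is a hypothetical choice of paragraph attributes, and the is_title_attr callable is only a stand-in for the paragraph attribute recognition model, whose details are not given in this excerpt.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical paragraph attribute: (font name, font size in points, bold).
ParagraphAttr = Tuple[str, float, bool]


def build_attribute_set(attrs_in_order: Iterable[ParagraphAttr]) -> List[ParagraphAttr]:
    """Keep each paragraph attribute at its first appearance, in document order."""
    seen = set()
    attribute_set = []
    for attr in attrs_in_order:
        if attr not in seen:
            seen.add(attr)
            attribute_set.append(attr)
    return attribute_set


def build_title_attribute_set(
    attribute_set: Iterable[ParagraphAttr],
    is_title_attr: Callable[[ParagraphAttr], bool],
) -> List[ParagraphAttr]:
    """Select the attributes that the supplied classifier marks as title attributes.

    is_title_attr is a placeholder for the recognition model (assumption).
    """
    return [attr for attr in attribute_set if is_title_attr(attr)]


if __name__ == "__main__":
    # Attributes of five consecutive paragraphs (made-up sample data).
    attrs = [
        ("SimHei", 16.0, True),
        ("SimSun", 10.5, False),
        ("SimHei", 16.0, True),
        ("SimHei", 14.0, True),
        ("SimSun", 10.5, False),
    ]
    attribute_set = build_attribute_set(attrs)
    # Toy rule standing in for the model: treat bold attributes as titles.
    print(build_title_attribute_set(attribute_set, lambda attr: attr[2]))
```

Deduplicating before classification means the classifier runs once per distinct attribute rather than once per paragraph, which is one plausible reason for building the attribute set first.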