Blank region processing method and system for electronic document
A blank-area processing technology for electronic documents, applied in the fields of electronic digital data processing, special data processing applications, and natural language data processing. It addresses the problem that redundant blank areas cannot be reduced: it retains typesetting interval information, reduces page-turning operations, and keeps the document compact.
Example Embodiment
[0052] Example 1:
[0053] This embodiment provides a method for processing blank areas in an electronic document, which includes the following processes:
[0054] S1: Extract the blank lines in the document. They can be obtained by parsing the document and extracting its file attribute information. For example, an empty line or a carriage return found in the file attributes can be treated as a blank line.
[0055] As an optimized implementation of this step, the chapter information of the entire document can be obtained first, and the blank-line information of each chapter file can then be obtained in turn. The specific steps are as follows:
[0056] S11: Obtain the chapter files and the catalog file of the document; these can be obtained by parsing the attribute information of the document.
[0057] S12: Determine the traversal order of the chapter files according to the catalog file.
[0058] S13: Detect the paragraph tags of each chapter file in turn, and...
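The paragraph-tag detection in S13 can be illustrated with a minimal sketch, assuming that a paragraph counts as blank when it holds only whitespace (including non-breaking spaces) or line-break tags. The class name BlankParagraphFinder and the function blank_paragraph_flags are hypothetical names introduced for this illustration; the embodiment does not prescribe a parser.

```python
from html.parser import HTMLParser


class BlankParagraphFinder(HTMLParser):
    """Records, for each <p> element in document order, whether it is blank."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.flags = []            # one bool per <p>: True means blank
        self._in_p = False
        self._text = []
        self._other_child = False  # True if the <p> holds a tag other than <br>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._text = []
            self._other_child = False
        elif self._in_p and tag != "br":
            # Assumption: any child other than <br> (an image, say) is content.
            self._other_child = True

    def handle_data(self, data):
        if self._in_p:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._text)
            # str.strip() also removes the non-breaking space from &nbsp;.
            self.flags.append(not self._other_child and text.strip() == "")
            self._in_p = False


def blank_paragraph_flags(markup):
    """Return a list of booleans, one per paragraph tag in the markup."""
    finder = BlankParagraphFinder()
    finder.feed(markup)
    finder.close()
    return finder.flags
```

Calling blank_paragraph_flags on the text of one chapter file yields one flag per paragraph, which later steps can use to count consecutive blank lines.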
[0069] Example 2:
[0070] This embodiment provides a method for processing blank areas in electronic documents that analyzes documents in ePub format. A schematic diagram of the entire process is shown in Figure 2.
[0071] The text elements of the html text in an ePub document may contain consecutive empty paragraph tags. These produce a blank page area, and if that area is large, it degrades the user's reading experience.
[0072] The solution in this embodiment includes the following specific steps:
[0073] 1. An ePub document is internally a zip package. Decompress it first to get the directory (OPS or OEBPS) where the chapter files are located, and then determine the traversal order of the xhtml/html chapter files according to the order described in the catalog file.
[0074] 2. Detect the html tags of all chapter files in turn, and determine the tags that may cause blank lines, such as paragraph tags whose values are spaces, newlines, and the like.
[007...
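Step 1 above can be sketched as follows. This sketch assumes that the catalog file is the OPF package document named in META-INF/container.xml and that its spine gives the reading order; the embodiment does not say which file it means, so this is one plausible reading. The function name chapter_paths_in_reading_order is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def chapter_paths_in_reading_order(epub):
    """Return chapter paths inside the zip package, in spine order.

    epub may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package document (the assumed catalog file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # Chapter hrefs are relative to the directory holding the package document,
    # typically OPS or OEBPS.
    base = posixpath.dirname(opf_path)
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    order = []
    for itemref in package.findall("opf:spine/opf:itemref", OPF_NS):
        href = manifest.get(itemref.get("idref"))
        if href is not None:
            order.append(posixpath.join(base, href))
    return order
```

The returned paths can be passed to ZipFile.read to load each chapter in turn. Percent-encoded hrefs are not decoded here, which a fuller implementation would need to handle.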
[0080] Example 3:
[0081] This embodiment provides a specific application example for processing blank areas in ePub documents. Assume that the visible area of the target device has height H and that each line occupies height h, where h is the sum of the line height and the line spacing.
[0082] In the first step, decompress the ePub document and traverse the html/xhtml files corresponding to all chapters; there are m chapters in total.
[0083] In the second step, starting from the first chapter, traverse the html tags in the chapter file and set up a blank-line counter sum with an initial value of 0.
[0084] In the third step, within the chapter file, increase sum by 1 each time another consecutive blank line is detected. If the line following a blank line is not blank, reset sum to 0.
[0085] In the fourth step, when sum*h >= H*p, replace the sum consecutive blank lines with a single blank line and reset sum to 0. Here p is a scale factor that is set as needed.
[0086] The fifth step, cont...
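The counter logic of the second to fourth steps can be sketched as below, for a single chapter already reduced to a list of text lines. This is a minimal sketch under stated assumptions: a blank line is taken to be one containing only whitespace, the counter is named pending instead of sum to avoid shadowing the Python builtin, and runs that never reach the threshold are kept unchanged. The function name collapse_blank_runs is hypothetical. The steps are read literally, so the counter restarts as soon as the threshold is hit, and a very long run can therefore leave more than one blank line.

```python
def collapse_blank_runs(lines, H, h, p):
    """Collapse long runs of blank lines in one chapter.

    lines: list of str, one entry per rendered line
    H:     height of the visible area of the target device
    h:     height of one line (line height plus line spacing)
    p:     scale factor, chosen as needed
    """
    out = []
    pending = 0  # the counter called sum in the text
    for line in lines:
        if line.strip() == "":
            pending += 1
            if pending * h >= H * p:
                # Threshold reached: the run so far becomes one blank line.
                out.append("")
                pending = 0
        else:
            # The run ended below the threshold: keep it as it was.
            out.extend([""] * pending)
            pending = 0
            out.append(line)
    # Keep a short trailing run as well.
    out.extend([""] * pending)
    return out
```

With H = 100, h = 10, and p = 0.5, the threshold is 5 lines: a run of 2 blank lines is left alone, while a run of 5 is replaced by one.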