Blank region processing method and system for electronic document

A method and system for processing blank areas in electronic documents, applied in electronic digital data processing, special data processing applications, natural language data processing, and related fields. It addresses the problem that redundant blank areas cannot be reduced, retains typesetting interval information, reduces page-turning operations, and keeps the document display compact.

Active Publication Date: 2016-03-02
新方正控股发展有限责任公司 +1

AI Technical Summary

Problems solved by technology

[0006] Therefore, the technical problem to be solved by the present invention is that the prior art cannot reduce redundant blank areas.

Examples

Example Embodiment

[0052] Example 1:

[0053] This embodiment provides a method for processing blank areas in an electronic document, which includes the following processes:

[0054] S1: Extract the blank lines in the document. They can be obtained by parsing the document and extracting its file attribute information; for example, an empty line or a carriage return recorded in the file attributes can be treated as a blank line.

[0055] As an optimized implementation of this embodiment, this step can first obtain the chapter information of the entire document and then obtain the blank-line information of each chapter file in turn. The specific steps are as follows:

[0056] S11: Obtain the chapter files and the catalog file of the document by parsing the attribute information of the document.

[0057] S12: Determine the traversal order of the chapter files according to the catalog file.

[0058] S13: Detect the paragraph tags of each chapter file in turn, and...
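
Steps S11 and S12 can be sketched as follows. This is a minimal illustration only, assuming the document is an ePub whose catalog file is the OPF package file named in META-INF/container.xml; the embodiment itself does not fix a format at this point. The names chapter_order and build_demo_epub, and the demo file names, are hypothetical.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical demo content: two chapters listed in reverse order in the spine.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

PACKAGE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c2"/>
    <itemref idref="c1"/>
  </spine>
</package>"""


def build_demo_epub():
    """Build a tiny in-memory ePub-like archive for the demo (assumption)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/content.opf", PACKAGE_OPF)
    buffer.seek(0)
    return buffer


def chapter_order(epub_file):
    """Return chapter paths inside the archive in reading (spine) order."""
    with zipfile.ZipFile(epub_file) as archive:
        # S11: container.xml points at the catalog (package) file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

    base_dir = posixpath.dirname(opf_path)  # e.g. "OEBPS" or "OPS"
    # The manifest maps item ids to file names; the spine lists ids in order.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # S12: the spine order is the traversal order of the chapter files.
    return [
        posixpath.join(base_dir, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


print(chapter_order(build_demo_epub()))
```

The traversal order comes from the spine rather than from the order of files in the archive, because a zip archive does not guarantee that its entries are stored in reading order.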

[0069] Example 2:

[0070] This embodiment provides a method for processing blank areas in electronic documents that analyzes documents in ePub format. A schematic diagram of the overall process is shown in Figure 2.

[0071] The HTML text of an ePub document may contain consecutive empty paragraph tags, which produce blank page areas. If such an area is large, it degrades the user's reading experience.

[0072] The solution in this embodiment includes the following specific steps:

[0073] 1. An ePub document is internally a zip package. Decompress it first to obtain the directory (OPS or OEBPS) where the chapter files are located, and then determine the traversal order of the xhtml/html chapter files according to the order described in the catalog file.

[0074] 2. Detect the html tags of all chapter files in turn, and identify the tags that may cause blank lines, such as paragraph tags whose values are spaces, newlines, and the like.

[007...
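
The detection in step 2 can be sketched as follows. This is a hedged illustration, not the patent's implementation: it assumes that a blank line corresponds to a p element containing only whitespace (including non-breaking spaces), that an image inside a paragraph counts as content, and that paragraphs are properly closed. The names BlankParagraphFinder and blank_paragraph_flags are hypothetical.

```python
from html.parser import HTMLParser


class BlankParagraphFinder(HTMLParser):
    """Record, for each <p> element in document order, whether it is blank."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.flags = []            # one bool per <p>; True means blank
        self._in_p = False
        self._has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._has_content = False
        elif self._in_p and tag == "img":
            # Assumption: an image makes the paragraph visible without text.
            self._has_content = True

    def handle_data(self, data):
        # str.strip() also removes non-breaking spaces produced by &nbsp;.
        if self._in_p and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.flags.append(not self._has_content)
            self._in_p = False


def blank_paragraph_flags(html_text):
    """Return a list of booleans, one per paragraph, True where it is blank."""
    finder = BlankParagraphFinder()
    finder.feed(html_text)
    finder.close()
    return finder.flags


sample = "<p>Title</p><p> </p><p>\n</p><p>&nbsp;</p><p>Text</p>"
print(blank_paragraph_flags(sample))
```

Runs of consecutive True values in the returned list correspond to the consecutive empty paragraph tags described in [0071].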

[0080] Example 3:

[0081] This embodiment provides a specific application example of processing blank areas in ePub documents. Assume that the height of the visible area of the target device is H and that the height occupied by each line is h, where h equals the sum of the line height and the line spacing.

[0082] In the first step, decompress the ePub document and traverse the html/xhtml files corresponding to all chapters; there are m chapters in total;

[0083] In the second step, starting from the first chapter, traverse the html tags in the chapter file and set up a blank line counter sum with an initial value of 0;

[0084] In the third step, within the chapter file, sum is increased by 1 each time another consecutive blank line is detected. If the line following a blank line is not blank, sum is reset to 0;

[0085] In the fourth step, when sum*h >= H*p, the sum consecutive blank lines are replaced with a single blank line and sum is reset to 0; p is a scale factor that is set as needed;

[0086] The fifth step, cont...
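
The counter logic of the third and fourth steps can be sketched as follows. This is one possible reading of the steps, stated as an assumption: each run of consecutive blank lines is measured in full, and the comparison sum*h >= H*p is made when the run ends. The function name merge_blank_runs and its parameter names are hypothetical, and a chapter is modelled as a plain list of text lines in which an empty or whitespace-only string stands for a blank line.

```python
def merge_blank_runs(lines, line_height, viewport_height, scale):
    """Collapse each run of blank lines whose height reaches H * p.

    line_height is h, viewport_height is H, and scale is p.
    """
    result = []
    blank_count = 0  # the counter called "sum" in the steps above

    def flush():
        # Decide what to do with the run of blank lines that just ended.
        nonlocal blank_count
        if blank_count == 0:
            return
        if blank_count * line_height >= viewport_height * scale:
            result.append("")                   # whole run becomes one blank line
        else:
            result.extend([""] * blank_count)   # short run is kept as typeset
        blank_count = 0

    for line in lines:
        if line.strip() == "":
            blank_count += 1
        else:
            flush()
            result.append(line)
    flush()  # handle a run at the very end of the chapter
    return result


# With h = 20, H = 400 and p = 0.2, a run must span at least 4 lines to be merged.
chapter = ["a", "", "", "b", "", "", "", "", "", "c"]
print(merge_blank_runs(chapter, line_height=20, viewport_height=400, scale=0.2))
```

In this example the run of two blank lines (height 40) stays untouched, which preserves ordinary typesetting intervals, while the run of five blank lines (height 100) is reduced to a single blank line.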

Abstract

Embodiments of the invention disclose a blank region processing method for an electronic document. The method first extracts the blank lines from the document, then obtains the height of each blank region from those blank lines, next determines from that height whether the blank region needs to be processed, and finally merges the blank lines in each blank region that needs processing. The scheme adopts a dedicated blank-page filtering method, so that redundant user operations are reduced, the normal typesetting interval information of the original document is retained as far as possible, and the blank regions in the document are effectively reduced. In addition, by choosing reasonable conditions, the extent of the blank regions is minimized, a compact document display is ensured, the screen utilization of the terminal device is improved, unnecessary page-turning operations are reduced, and the user experience is improved.

Description

Technical field

[0001] The invention relates to the field of electronic data processing, in particular to a method and system for processing blank areas in electronic documents.

Background technique

[0002] With the development of digital information, electronic documents have become one of the ways in which people obtain information. Electronic documents can be read on devices such as mobile phones, computers, tablets, and e-book readers. Electronic documents also come in various formats, such as PDF documents and ePub documents. PDF is a commonly used electronic document format, but documents in PDF format cannot be edited. ePub documents internally use XHTML or DTBook to display text and store the document content in zip-compressed form. ePub is widely used as a common e-book standard. An ePub document is a simple ZIP-format file containing documents arranged in a predefined way, so that their text content can be displa...

Application Information

IPC(8): G06F17/25, G06F40/189
CPC: G06F40/189
Inventor: 时志芳, 贾丽
Owner: 新方正控股发展有限责任公司