
Blank region processing method and system for electronic document

Technology relating to blank areas and electronic documents, applied in electronic digital data processing, special data processing applications, natural language data processing, etc. It solves the problem that redundant blank areas cannot be reduced, and achieves the effects of retaining typesetting interval information, reducing page-turning operations, and ensuring a compact display.

Active Publication Date: 2016-03-02
新方正控股发展有限责任公司 +1
Cites: 8 · Cited by: 4

AI Technical Summary

Problems solved by technology

[0006] Therefore, the technical problem to be solved by the present invention is that the prior art cannot reduce redundant blank areas without affecting how the document content is expressed; to address this, the invention proposes a method and system for processing blank areas in electronic documents.



Examples


Embodiment 1

[0053] This embodiment provides a method for processing blank areas in an electronic document, comprising the following steps:

[0054] S1: Extract the blank lines in the document. They can be obtained by parsing the document and extracting its file attribute information; for example, a space or a carriage return in the file attributes can be regarded as a blank line.

[0055] As a preferred implementation in this embodiment, this step can first obtain the chapter information of the entire document and then obtain the blank-line information of each chapter file in turn, as follows:

[0056] S11. Obtain the chapter files and the directory file of the document by parsing the document's attribute information.

[0057] S12. Determine the traversal order of the chapter files according to the directory file.

[0058] S13. Detect the paragraph tags of each chapter file in turn, and obtain t...
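For an ePub input, the chapter files and directory file of S11 and the traversal order of S12 can plausibly be taken from the package document (the `.opf` file) that `META-INF/container.xml` points to, whose spine lists the chapters in reading order. The sketch below assumes that mapping, which the text does not state explicitly; the function name `chapter_files_in_reading_order` is hypothetical, while the two XML namespaces are the ones defined by the ePub container and package specifications.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespaces defined by the ePub OCF (container) and OPF (package) specs.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def chapter_files_in_reading_order(epub):
    """Return the ZIP paths of the chapter files in spine (reading) order.

    `epub` may be a filesystem path or a binary file object.
    Hypothetical helper: assumes the spine is the "directory file" order.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package document, typically under OPS/ or OEBPS/.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf_dir = posixpath.dirname(opf_path)

        package = ET.fromstring(zf.read(opf_path))
        # Manifest: item id -> href, relative to the package document.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }
        # Spine: the reading order, expressed as references to manifest ids.
        return [
            posixpath.normpath(posixpath.join(opf_dir, manifest[ref.get("idref")]))
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Reading the spine rather than sorting file names keeps chapters in the publisher's intended order even when a name such as `ch10.xhtml` would sort before `ch2.xhtml`.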

Embodiment 2

[0070] This embodiment provides a method for processing blank areas in electronic documents, applied to the analysis of documents in ePub format. A schematic diagram of the entire process is shown in Figure 2.

[0071] The HTML text elements of an ePub document may contain consecutive empty paragraph tags, which produce blank areas on the page; if such a blank area is large, it degrades the user's reading experience.

[0072] The scheme in this embodiment comprises the following specific steps:

[0073] 1. An ePub document is internally a ZIP package. First decompress it to obtain the directory (OPS or OEBPS) containing the chapter files, then determine the traversal order of the XHTML/HTML chapter files according to the order described in the directory file.

[0074] 2. Examine the HTML tags of all chapter files in turn and identify tags that may produce blank lines, such as paragraph tags whose values are spaces, newlines, etc.

[0075]...
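The tag check in step 2 might look like the following minimal sketch, built on Python's standard `html.parser`. It assumes that paragraph tags are `<p>` elements in well-formed XHTML (each `<p>` explicitly closed), that a paragraph counts as blank when its text consists only of spaces, tabs, newlines, or non-breaking spaces (`&nbsp;`), and that embedded media such as an image makes it non-blank. These rules and the names `ParagraphScanner` and `blank_paragraph_flags` are illustrative, not taken from the patent.

```python
from html.parser import HTMLParser

# Text characters treated as blank; counting U+00A0 (&nbsp;) is an assumption.
BLANK_CHARS = " \t\r\n\u00a0"
# Tags whose presence makes a paragraph non-blank even without text (assumed).
CONTENT_TAGS = {"img", "svg", "video", "audio", "object"}


class ParagraphScanner(HTMLParser):
    """Record one flag per <p> element: True if the paragraph is blank."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp; etc. to text
        self.flags = []
        self._in_p = False
        self._has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._has_content = False
        elif self._in_p and tag in CONTENT_TAGS:
            self._has_content = True

    def handle_endtag(self, tag):
        # Assumes well-formed XHTML: every <p> has an explicit </p>.
        if tag == "p" and self._in_p:
            self.flags.append(not self._has_content)
            self._in_p = False

    def handle_data(self, data):
        # Any character outside BLANK_CHARS means the paragraph has content.
        if self._in_p and data.strip(BLANK_CHARS):
            self._has_content = True


def blank_paragraph_flags(html_text):
    """Return one boolean per paragraph, in document order (True = blank)."""
    scanner = ParagraphScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.flags
```

For example, `blank_paragraph_flags('<p>Hi</p><p> </p><p><br/></p>')` yields `[False, True, True]`: a paragraph holding only a `<br/>` is treated as blank. A flag list like this is the kind of per-line blank/non-blank sequence that the counter described in Embodiment 3 consumes.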

Embodiment 3

[0081] This embodiment gives a specific application example of processing blank areas in an ePub document. Assume that the height of the visible area of the target device is H and the height of each line is h, where h equals the line height plus the line spacing.

[0082] Step 1: decompress the ePub document and traverse the HTML/XHTML files corresponding to all chapters, m chapters in total;

[0083] Step 2: starting from the first chapter, traverse the HTML tags in the chapter file, using a blank-line counter sum with an initial value of 0;

[0084] Step 3: within the chapter file, increase sum by 1 for each consecutive blank line detected; if the line following a blank line is not blank, reset sum to 0;

[0085] Step 4: when sum·h ≥ H·p, replace the sum consecutive blank lines with a single blank line and reset sum to 0, where p is a proportional coefficient, set as re...
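Steps 2 to 4 amount to a single pass over each chapter's lines with a run counter. Because `sum·h ≥ H·p` is equivalent to `sum ≥ H·p/h`, and sum is an integer, a run collapses once it reaches ⌈H·p/h⌉ lines. The Python sketch below follows the steps literally: a run that ends before reaching the threshold is left unchanged, and a run that reaches it is replaced by one blank line, after which counting restarts (so a very long run can leave several blank lines). The function name `collapse_blank_runs` and its index-based return value are hypothetical, and since the recommended value of `p` is cut off in the available text, `p` is left as a caller-supplied parameter.

```python
def collapse_blank_runs(is_blank, H, h, p):
    """Apply the counter rule of steps 2-4 to one chapter (hypothetical helper).

    is_blank -- sequence of booleans, one per line (True = blank line)
    H        -- height of the device's visible area
    h        -- height of one line (line height + line spacing)
    p        -- proportional coefficient, chosen by the caller

    Returns the indices of the lines to keep, in order.
    """
    kept = []
    run = []  # indices of the current run of blank lines; sum == len(run)
    for i, blank in enumerate(is_blank):
        if blank:
            run.append(i)
            if len(run) * h >= H * p:
                # sum*h >= H*p: keep one blank line in place of the whole run.
                kept.append(run[0])
                run = []  # reset sum to 0
        else:
            # A non-blank line ends the run; a short run is kept unchanged.
            kept.extend(run)
            run = []
            kept.append(i)
    kept.extend(run)  # trailing blank run that never reached the threshold
    return kept
```

For example, with `H = 12`, `h = 2` and `p = 0.5` the threshold is 3 lines, so for lines `[text, blank, blank, text, blank, blank, blank, blank, text]` the call returns `[0, 1, 2, 3, 4, 7, 8]`: the two-line run is kept and the four-line run shrinks to two blank lines.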



Abstract

Embodiments of the invention disclose a blank region processing method for an electronic document. The method first extracts blank lines from the document; then obtains the height of the blank region from the blank lines; next, determines from that height whether the blank region needs to be processed; and finally merges the blank lines in each blank region that needs processing. The scheme uses a dedicated blank-page filtering method that reduces redundant user operations, preserves the normal typesetting interval information of the original document as far as possible, and effectively reduces blank regions in the document. In addition, by choosing reasonable conditions, it minimizes the extent of blank regions, keeps the document display compact, improves the screen utilization of the terminal device, reduces unnecessary page turning, and improves the user experience.

Description

Technical field

[0001] The invention relates to the field of electric digital data processing, and in particular to a method and system for processing blank areas in electronic documents.

Background

[0002] With the development of digital information, electronic documents have become one of the ways people obtain information. Electronic documents can be read on devices such as mobile phones, computers, tablets, and e-book readers, and they come in various formats, such as PDF and ePub. PDF is a commonly used electronic document format, but PDF documents cannot be edited. ePub documents use XHTML or DTBook internally to present text and store the document content in ZIP-compressed form; ePub is widely used as a common e-book standard. An ePub document is a simple ZIP-format file, which includes pre-defined … The documents are arranged in such a way that their text content can be displa...


Application Information

IPC (8): G06F17/25; G06F40/189
CPC: G06F40/189
Inventors: 时志芳 (Shi Zhifang), 贾丽 (Jia Li)
Owner: 新方正控股发展有限责任公司