Method and system for converting file from portable document format (PDF) to electronic publication (EPUB) format

A file format conversion technology, applied in the field of document processing, that addresses problems such as resolution degradation and difficult text recognition

Active Publication Date: 2012-01-25
WONDERSHARE TECH CO LTD
4 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

Text is more difficult to read on small devices due to resolution loss when taking screenshots

Examples

Embodiment 1

[0066] Figure 1 is a flow chart of the method for converting a PDF format file into the EPUB format according to Embodiment 1 of the present invention. As shown in Figure 1, the method includes the following steps:

[0067] S101: Identify text elements and image elements in the PDF format file;

[0068] Because text elements and image elements have different properties, the data stream of a text element and the data stream of an image element carry different identifiers when the PDF format file is read. The text elements and image elements in the PDF file can therefore be identified from the identifiers in the data stream.
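The identification in S101 can be sketched as follows. The source does not name the identifiers, so this sketch assumes they are the standard PDF content-stream operators: `BT` ... `ET` delimiting a text object and `/Name Do` painting an external object. The `Element` class, the `identify_elements` function and the sample stream are hypothetical names introduced for illustration only, and the stream is assumed to be already decompressed.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    kind: str     # "text" or "image"
    payload: str  # raw text operators, or the XObject name for an image


# Simplifying assumptions (not from the patent):
#  - every XObject painted with `Do` is an image (it could also be a form),
#  - no string literal inside a text object contains the token "ET".
TOKEN = re.compile(
    r"\bBT\b(?P<text>.*?)\bET\b|/(?P<image>\S+)\s+Do\b",
    re.DOTALL,
)


def identify_elements(content_stream: str) -> list[Element]:
    """Split a decoded content stream into text and image elements."""
    elements = []
    for match in TOKEN.finditer(content_stream):
        if match.group("text") is not None:
            elements.append(Element("text", match.group("text").strip()))
        else:
            elements.append(Element("image", match.group("image")))
    return elements


# A tiny hand-written stream: one text object followed by one painted image.
sample = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET q 200 0 0 150 72 500 cm /Im1 Do Q"
for element in identify_elements(sample):
    print(element.kind, "->", element.payload)
```

A production converter would use a real PDF parser rather than a regular expression; the point here is only that the two element kinds are distinguishable by markers in the stream.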

[0069] S102: Obtain the coordinates of the text element and the coordinates of the image element;

[0070] S103: According to the coordinates of the text element and the coordinates of the image element, determine the position of the text element and the image element in the newly generated HTML format file, so that th...

Embodiment 2

[0081] Figure 2 is a flow chart of the method for converting a PDF format file into the EPUB format according to Embodiment 2 of the present invention. This embodiment illustrates the practical application of the present invention in more detail. As shown in Figure 2, the method includes the following steps:

[0082] S201: Identify text elements and image elements in the PDF format file;

[0083] S202: Obtain the coordinates of the text element and the coordinates of the image element;

[0084] S203: Determine whether the ordinate of the lower right point of the text element is smaller than the ordinate of the upper left point of the image element;

[0085] If yes, execute step S204; otherwise, execute step S205;

[0086] S204: Position the text element above the image element;

[0087] S205: Determine whether the abscissa of the lower right point of the text element is smaller than the abscissa of the upper left point of the image element;

[0088] If yes, ex...
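The comparisons in S203 to S205 can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the `Box` type and `place_text` function are hypothetical, ordinates are assumed to increase downward (which is what makes the S203 test consistent with "above"), and the outcome of S205 is elided in the text above, so the `"left"` label is only the geometric reading of that test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float    # abscissa of the upper left point
    top: float     # ordinate of the upper left point
    right: float   # abscissa of the lower right point
    bottom: float  # ordinate of the lower right point


def place_text(text: Box, image: Box) -> str:
    """Decide where the text element goes relative to the image element."""
    # S203/S204: the text ends before the image starts vertically.
    if text.bottom < image.top:
        return "above"
    # S205: the text ends before the image starts horizontally.
    # Assumption: the elided follow-up step places the text to the left.
    if text.right < image.left:
        return "left"
    # Later steps are truncated in the source, so no decision is made here.
    return "undetermined"


heading = Box(left=0, top=0, right=100, bottom=20)
figure = Box(left=0, top=30, right=100, bottom=90)
print(place_text(heading, figure))
```

Here the heading's lower edge (20) lies above the figure's upper edge (30), so the S203 test succeeds and the text is placed above the image.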

Embodiment 3

[0101] Compared with the second embodiment, this embodiment adopts another way of determining the positions of the text elements and the image elements in the newly generated HTML format file.

[0102] Figure 3 is a flow chart of the method for converting a PDF format file into the EPUB format according to Embodiment 3 of the present invention.

[0103] As shown in Figure 3, the method includes the following steps:

[0104] S301: Identify text elements and image elements in the PDF format file;

[0105] S302: Obtain the coordinates of the text element and the coordinates of the image element;

[0106] S303: Determine whether the ordinate of the upper left point of the text element is greater than the ordinate of the lower right point of the image element;

[0107] If yes, execute step S304; otherwise execute step S305;

[0108] S304: Position the text element below the image element;

[0109] S305: Determine whether the abscissa of the upper left point of the text element ...
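As a rough illustration of how the S303 test could drive the order of elements in the generated HTML, consider the sketch below. The function names, the flat `<img>` plus `<p>` markup, and the downward-increasing ordinate are assumptions made for this example. Because S305 is truncated in the text above, the sketch returns `None` whenever S303 fails instead of guessing the remaining steps.

```python
from html import escape
from typing import Optional


def text_is_below(text_top: float, image_bottom: float) -> bool:
    """S303: is the text's upper left ordinate greater than the image's
    lower right ordinate? (Ordinates assumed to grow downward.)"""
    return text_top > image_bottom


def render_fragment(text: str, image_src: str,
                    text_top: float, image_bottom: float) -> Optional[str]:
    """Emit the image first and the text second when S304 applies."""
    if not text_is_below(text_top, image_bottom):
        return None  # S305 onward is not reproduced here
    img = f'<img src="{escape(image_src)}" alt=""/>'
    paragraph = f"<p>{escape(text)}</p>"
    return img + "\n" + paragraph  # S304: text positioned below the image


print(render_fragment("Caption & notes", "img/fig1.png",
                      text_top=320.0, image_bottom=300.0))
```

Placing the image before the paragraph in document order is one simple way to keep the "below" relation in a reflowable layout, where absolute coordinates are no longer available.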

Abstract

The invention discloses a method for converting a file from the Portable Document Format (PDF) to the electronic publication (EPUB) format. The method comprises the following steps: identifying the text elements and image elements in a PDF file; acquiring the coordinates of the text elements and of the image elements; determining, from those coordinates, the positions of the text elements and image elements in a newly generated Hypertext Markup Language (HTML) file; generating the HTML file according to those positions; and generating an EPUB file from the HTML file. The invention also discloses a system for converting a file from PDF to EPUB. With the disclosed method and system, the converted EPUB file contains both text and images, and the positional relation between the text elements and image elements of the original PDF file is preserved.
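The final step, generating an EPUB file from the HTML file, is not detailed in this excerpt. The sketch below shows one common way to package a single page with Python's standard `zipfile` module, assuming the usual EPUB container layout: a `mimetype` entry stored first and uncompressed, a `META-INF/container.xml` pointing at the package file, and an OPF package listing the page. The `build_epub` function, the `OEBPS/` paths and the sample identifiers are hypothetical, and a navigation file (NCX or nav document) is omitted, so the result is a skeleton rather than a fully valid publication.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="page1" href="page1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page1"/>
  </spine>
</package>
"""


def build_epub(xhtml: str, title: str, book_id: str) -> bytes:
    """Wrap one XHTML page in a minimal EPUB container (no navigation file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and must not be compressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        opf = OPF_TEMPLATE.format(title=escape(title), book_id=escape(book_id))
        archive.writestr("OEBPS/content.opf", opf,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/page1.xhtml", xhtml,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Demo</title>'
        '</head><body><p>Hello</p></body></html>')
epub_bytes = build_epub(page, "Demo", "urn:uuid:demo-0001")
print(len(epub_bytes) > 0)
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects; the returned bytes can be saved with an `.epub` extension.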

Description

Technical field

[0001] The invention relates to the technical field of document processing, in particular to a method and system for converting PDF format files into the EPUB format.

Background

[0002] PDF (Portable Document Format) is an electronic document format. With its excellent characteristics, the PDF file format has become an ideal format for electronic document distribution and for disseminating formatted information on the Internet. Currently, most scientific papers published on the Internet are submitted in PDF format. However, because PDF files are typeset using coordinates, and absolute positioning is difficult on small devices, PDF pages cannot adapt to the screens of small or mobile devices. In the prior art, in order to better display the content of a PDF file on a small or mobile device, the PDF file is usually converted into the EPUB format.

[0003] The EPUB format is an ...

Application Information

IPC(8): G06F17/30, G06F17/21
CPC: G06F17/21, G06F17/30, G06F17/30179, G06F16/1794
Inventor: 王峰, 晏检平
Owner: WONDERSHARE TECH CO LTD