Method and system for document image layout deconstruction and redisplay system

A document image layout technology, applied in the field of document image layout deconstruction and redisplay, that addresses the high cost of manual keying and/or manual tagging and the loss of meaningful or aesthetically pleasing typeface and type size choices, with the aim of a high degree of legibility while retaining the original document's look and feel.

Status: Inactive
Publication Date: 2004-10-14
XEROX CORP
Cites: 10; Cited by: 133

AI Technical Summary

Benefits of technology

The invention provides methods and systems for converting page-image documents, such as scanned hardcopy documents, into a form that can be displayed on screens of arbitrary size. This is done by automatically reformatting, or "reflowing", the document content to fit the display. The invention also includes methods for analyzing the document's layout and identifying structures such as text lines and column boundaries. The resulting document can be conveniently viewed on handheld devices. Stated advantages over existing methods include finding text lines that would otherwise be missed and accurately matching text lines to their corresponding column boundaries.
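
The reflowing idea can be illustrated with a minimal sketch. It assumes the page image has already been segmented into word boxes in reading order, and it uses a simple greedy line-filling rule; the `WordBox` type, the `reflow` function, and the `gap` parameter are hypothetical names chosen for illustration and are not taken from the patent, which may use a different layout procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """A cropped word image, reduced here to a label and a pixel size (hypothetical type)."""
    label: str   # stand-in for the image data, so the demo is readable
    width: int
    height: int


def reflow(words: List[WordBox], display_width: int, gap: int = 4) -> List[List[WordBox]]:
    """Greedily pack word boxes, in reading order, into lines that fit display_width.

    A word wider than the display is placed alone on its own line.
    """
    lines: List[List[WordBox]] = []
    current: List[WordBox] = []
    used = 0  # pixels consumed on the current line
    for word in words:
        needed = word.width if not current else used + gap + word.width
        if current and needed > display_width:
            lines.append(current)          # close the full line
            current, needed = [], word.width
        current.append(word)
        used = needed
    if current:
        lines.append(current)
    return lines


# Demo: the same word sequence laid out for a narrow and a wide display.
words = [WordBox(w, 10 * len(w), 12)
         for w in "the quick brown fox jumps over the lazy dog".split()]
narrow = reflow(words, display_width=120)
wide = reflow(words, display_width=400)
for line in narrow:
    print(" ".join(w.label for w in line))
```

The same input produces more, shorter lines on the narrow display and a single line on the wide one, which is the behavior the summary describes as fitting content to screens of arbitrary size.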

Problems solved by technology

Existing systems for rendering page-image versions of documents on display screens have required manual activities to improve the rendering, or clumsy panning mechanisms to view direct display of page images on wrong-sized surfaces.
Problems with existing systems include: (a) the high expense of manual keying and/or correcting of OCR results and manual tagging; (b) the risk of highly visible and disturbing errors in the text resulting from OCR mistakes; (c) the loss of meaningful or aesthetically pleasing typeface and type size choices, graphics and other non-text elements; and (d) the loss of proper placement of elements on the page.
Such problems are significant, for example, because book publishers are increasingly creating page-image versions of books currently being published, as well as books from their backlists.
While print-on-demand images can be re-targeted to slightly larger or slightly smaller formats by scaling the images, they cannot currently be re-used for most electronic book purposes without either re-keying the book into XML format, or scanning the page images using OCR and manually correcting the re-keyed and scanned images.

Examples

Embodiment Construction

[0033] FIG. 1 illustrates a detailed example of an intermediate data structure 260 for a page image 300. In FIG. 1 the intermediate data structure 260 is expressed using XHTML as an example of an intermediate data structure format. The page image 300 is shown schematically having a first text area 310 which functions as a title, a second area 320 which functions as an author list, third text areas 330 which function as paragraphs, and a fourth text area 340 which functions as a page number. The structures represented by these text areas 310-340 are usually significant to both the author and the reader, and so are detected and preserved in the intermediate data structure 260. For example, the intermediate data structure 260 preserves the title text area 310 by noting the position of this title text area 310 at the top of the page image, that the text area 310 is centered, and the large typeface used in this text area 310. The position is preserved in the intermediate data structure 2...
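
As a rough illustration of how detected text areas might be recorded in an XHTML-style intermediate structure, the sketch below maps each area's role, alignment, and relative type size onto an element. The `TextArea` type, the role labels, the tag mapping, and the use of inline `style` attributes are all assumptions made for this example; the actual markup shown in FIG. 1 is not reproduced here and may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class TextArea:
    """One detected region of the page image (hypothetical type)."""
    role: str                # assumed labels: "title", "authors", "paragraph", "page-number"
    text: str                # placeholder for the area's content
    align: str = "left"      # detected horizontal alignment
    font_scale: float = 1.0  # type size relative to body text


# Assumed mapping from detected role to an XHTML element name.
TAG_FOR_ROLE = {"title": "h1", "authors": "p", "paragraph": "p", "page-number": "div"}


def to_xhtml(areas: List[TextArea]) -> str:
    """Serialize text areas, in page order, into an XHTML-style <body> fragment."""
    body = ET.Element("body")
    for area in areas:
        style = f"text-align: {area.align}; font-size: {area.font_scale:g}em"
        element = ET.SubElement(body, TAG_FOR_ROLE[area.role],
                                {"class": area.role, "style": style})
        element.text = area.text
    return ET.tostring(body, encoding="unicode")


# Demo page mirroring the four kinds of area described above.
page = [
    TextArea("title", "A Title", align="center", font_scale=2.0),
    TextArea("authors", "A. Author", align="center"),
    TextArea("paragraph", "Body text."),
    TextArea("page-number", "1", align="center", font_scale=0.8),
]
xhtml = to_xhtml(page)
print(xhtml)
```

Keeping the title's centering and larger type size as attributes on the element is one way to let a later rendering step restore those features on a display of any size.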

Abstract

The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display by reformatting, or "re-flowing", the document to fit the display device.

Description

[0001] 1. Field of Invention
[0002] The invention relates generally to the problem of making an arbitrary document conveniently readable on an arbitrarily sized display.
[0003] 2. Description of Related Art
[0004] Existing systems for rendering page-image versions of documents on display screens have required manual activities to improve the rendering, or clumsy panning mechanisms to view direct display of page images on wrong-sized surfaces. In particular, it has been necessary to either (1) key in the entire text manually, or (2) process the page images through an optical character recognition (OCR) system and then manually tag the resulting text in order to preserve visually important layout features.
[0005] Problems with existing systems include: (a) high expense of manual keying and/or correcting of OCR results and manual tagging; (b) the risk of highly visible and disturbing errors in the text resulting from OCR mistakes; and (c) the loss of meaningful or aesthetically pleasing t...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F3/14, G06F17/21, G06T11/60, G09G5/00, G09G5/22
CPC: G06F17/211, G06K9/00463, G06F40/103, G06V30/414, G06F40/151, G06F40/131, G06F40/166
Inventors: BREUEL, THOMAS M.; BAIRD, HENRY S.; JANSSEN, WILLIAM C.; POPAT, ASHOK C.; BLOOMBERG, DAN S.
Owner: XEROX CORP