Method and system for document image layout deconstruction and redisplay system

A document image layout technology, applied to document image layout deconstruction and redisplay systems. It addresses the loss of meaningful or aesthetically pleasing typeface and type size choices and the high cost of manual keying and/or manual tagging, aiming for a high degree of legibility while retaining the look and feel of the original document.

Status: Inactive. Publication Date: 2004-10-14

XEROX CORP

Cites: 10. Cited by: 133.


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Benefits of technology

The invention provides methods and systems for converting page-image documents, such as scanned hardcopy documents, into a form that can be displayed on screens of arbitrary size. This is done by automatically reformatting, or "reflowing", the document content. The invention also includes methods for analyzing the layout of the document and identifying structures such as text lines and column boundaries. The resulting document can be conveniently viewed on handheld devices. The stated advantages over existing methods include finding text lines that would otherwise be missed and accurately matching text lines to their corresponding column boundaries.
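As a rough illustration of what reflowing to an arbitrary display width involves, the sketch below greedily packs word-sized image boxes, kept in reading order, into lines that fit a given width. It is not the patent's algorithm: the `WordBox` type, the `reflow` function, and the `gap` parameter are hypothetical names introduced only for this example, and a simple greedy line-breaking strategy is assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """A word image cut from the page, in pixels (hypothetical type)."""
    label: str   # identifier used only for this illustration
    width: int
    height: int


def reflow(words: List[WordBox], display_width: int, gap: int = 4) -> List[List[WordBox]]:
    """Greedily pack word boxes, in reading order, into lines that fit display_width.

    A word wider than the display is still placed, alone, on its own line.
    """
    lines: List[List[WordBox]] = []
    current: List[WordBox] = []
    used = 0  # width consumed on the current line, including gaps
    for w in words:
        needed = w.width if not current else used + gap + w.width
        if current and needed > display_width:
            # The word does not fit: close the current line and start a new one.
            lines.append(current)
            current, used = [w], w.width
        else:
            current.append(w)
            used = needed
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    page = [WordBox("a", 30, 10), WordBox("b", 40, 10),
            WordBox("c", 50, 10), WordBox("d", 20, 10)]
    for width in (80, 200):
        print(width, [[w.label for w in line] for line in reflow(page, width)])
```

The same word sequence yields more, shorter lines on a narrow display and fewer, longer lines on a wide one, which is the behavior the summary describes for handheld devices.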

Problems solved by technology

Existing systems for rendering page-image versions of documents on display screens have required manual activities to improve the rendering, or clumsy panning mechanisms to view direct display of page images on wrong-sized surfaces.

Problems with existing systems include: (a) the high expense of manual keying and/or correcting of OCR results and manual tagging; (b) the risk of highly visible and disturbing errors in the text resulting from OCR mistakes; (c) the loss of meaningful or aesthetically pleasing typeface and type size choices, graphics and other non-text elements; and (d) the loss of proper placement of elements on the page.

Such problems are significant, for example, because book publishers are increasingly creating page-image versions of books currently being published, as well as books from their backlists.

While print-on-demand images can be re-targeted to slightly larger or slightly smaller formats by scaling the images, they cannot currently be re-used for most electronic book purposes without either re-keying the book into XML format, or scanning the page images using OCR and manually correcting the re-keyed and scanned images.



Embodiment Construction

[0033] FIG. 1 illustrates a detailed example of an intermediate data structure 260 for a page image 300. In FIG. 1 the intermediate data structure 260 is expressed using XHTML as an example of an intermediate data structure format. The page image 300 is shown schematically having a first text area 310 which functions as a title, a second area 320 which functions as an author list, third text areas 330 which function as paragraphs, and a fourth text area 340 which functions as a page number. The structures represented by these text areas 310-340 are usually significant to both the author and the reader, and so are detected and preserved in the intermediate data structure 260. For example, the intermediate data structure 260 preserves the title text area 310 by noting the position of this title text area 310 at the top of the page image, that the text area 310 is centered, and the large typeface used in this text area 310. The position is preserved in the intermediate data structure 2...
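To make the idea of an XHTML intermediate data structure more concrete, the sketch below serializes a list of detected text areas (title, author list, paragraphs, page number) to XHTML while recording alignment and type size. The patent's actual markup is not reproduced in this excerpt, so everything here is an assumption: the `TextArea` type, the `to_xhtml` function, the role names, the choice of `h1` and `p` elements, and the use of inline `style` attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class TextArea:
    """One detected text area of a page image (hypothetical type)."""
    role: str          # "title", "authors", "paragraph", or "page-number"
    text: str          # placeholder content; a real system might reference word images
    align: str         # e.g. "center" or "left"
    font_size_pt: int  # type size to preserve on redisplay


# Assumed mapping from layout role to XHTML element, for illustration only.
TAG_FOR_ROLE = {"title": "h1", "authors": "p", "paragraph": "p", "page-number": "p"}


def to_xhtml(areas: List[TextArea]) -> str:
    """Serialize text areas, in reading order, to an XHTML string."""
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(html, "body")
    for area in areas:
        element = ET.SubElement(body, TAG_FOR_ROLE[area.role], {
            "class": area.role,
            "style": f"text-align: {area.align}; font-size: {area.font_size_pt}pt",
        })
        element.text = area.text
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    page = [
        TextArea("title", "Example Title", "center", 24),      # cf. text area 310
        TextArea("authors", "A. Author, B. Author", "center", 12),  # cf. 320
        TextArea("paragraph", "First paragraph.", "left", 10),  # cf. 330
        TextArea("page-number", "1", "center", 10),             # cf. 340
    ]
    print(to_xhtml(page))
```

Keeping the role, alignment, and type size on each element is one way the structure that matters to author and reader could survive a later reflow to a different screen size.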


Abstract

The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or "re-flowing" of the document to fit an arbitrarily sized display device.

Description

[0001] 1. Field of Invention

[0002] The invention relates generally to the problem of making an arbitrary document conveniently readable on an arbitrarily sized display.

[0003] 2. Description of Related Art

[0004] Existing systems for rendering page-image versions of documents on display screens have required manual activities to improve the rendering, or clumsy panning mechanisms to view direct display of page images on wrong-sized surfaces. In particular, it has been necessary to either (1) key in the entire text manually, or (2) process the page images through an optical character recognition (OCR) system and then manually tag the resulting text in order to preserve visually important layout features.

[0005] Problems with existing systems include: (a) high expense of manual keying and/or correcting of OCR results and manual tagging; (b) the risk of highly visible and disturbing errors in the text resulting from OCR mistakes; and (c) the loss of meaningful or aesthetically pleasing t...


Application Information


Patent Type & Authority: Applications (United States)

IPC(8): G06F3/14, G06F17/21, G06T11/60, G09G5/00, G09G5/22

CPC: G06F17/211, G06K9/00463, G06F40/103, G06V30/414, G06F40/151, G06F40/131, G06F40/166

Inventor: BREUEL, THOMAS M.; BAIRD, HENRY S.; JANSSEN, WILLIAM C.; POPAT, ASHOK C.; BLOOMBERG, DAN S.

Owner: XEROX CORP
