File searching and reading method and apparatus

A document image processing technology, applied in special data processing applications, instruments, electrical digital data processing, and similar fields, that addresses the poor performance and efficiency of handling image files.

Inactive Publication Date: 2005-05-04
HITACHI LTD

AI Technical Summary

Problems solved by technology

Processing document structure data separately from image files is inefficient for document management.
This is because the document structure data includes featur...




Embodiment Construction

[0041] Taking Figure 1 as an example, the difference between the existing method and the proposed method is roughly explained. Figure 1 is a diagram that models the difference between conventional document processing using OCR and document processing using the method proposed in this patent.

[0042] First, in the conventional flow, a group of paper documents, indicated by 0101, is placed on the OCR device indicated by 0102 and read. The OCR output, shown as 0103, consists of the document image digitized from the paper and a text file containing the OCR reading result. Document processing is then performed on the device shown as 0104. Because the OCR output consists of the reading-result text and the document image, text retrieval and document image browsing can be performed during document processing.
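The conventional flow in [0042] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `OcrOutput` record, its field names, and the `search_documents` function are hypothetical, and plain substring matching stands in for whatever text retrieval the document processing device actually performs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrOutput:
    """Hypothetical record for item 0103: one image plus one reading-result text."""
    image_path: str  # digitized document image (assumed to be stored as a file)
    text: str        # the single reading result produced by OCR


def search_documents(outputs: List[OcrOutput], query: str) -> List[str]:
    """Return the images whose OCR text contains the query (assumed substring match)."""
    return [o.image_path for o in outputs if query in o.text]


outputs = [
    OcrOutput("scan_001.png", "invoice for cot food"),  # "cat" misread as "cot"
    OcrOutput("scan_002.png", "cat show ticket"),
]
hits = search_documents(outputs, "cat")  # only scan_002.png is found
```

Because only one reading result is kept per document, a single misrecognized character (as in the first record) makes that document unreachable by text retrieval even though its image is still available for browsing.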

[0043] In contrast, in the processing flow proposed in this patent application, there is first a group of paper documents indicated by 0105, and they are...


Abstract

The invention provides a method that enables a search and browse of a document image group through the application of a document structure analysis technique and a character recognition technique as searching/browsing means for paper documents and document images. A highly functional document image search/browse system separates an OCR and a document processing apparatus, adopts as OCR output formats data (reading hypothesis data) holding multiple hypotheses of character line extraction, character segmentation and character recognition, and document structure data having ruled line information, frame information, character line information, browse attribute information and the like about a document image, and provides a function of important keyword extraction and document search from typed and handwritten character strings using OCR-added data, and of document display intended by a browser using the document structure data.
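To illustrate why keeping multiple recognition hypotheses can help search, the sketch below models reading hypothesis data as a list of character positions, each holding several candidate characters, and matches a keyword against any combination of candidates. The class names, the confidence scores, and the averaging rule are assumptions made for illustration; the text above does not specify a data layout, and the multiple hypotheses for character line extraction and character segmentation that it mentions are omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    char: str     # one recognition hypothesis for a character
    score: float  # hypothetical confidence in [0, 1]


@dataclass
class CharSlot:
    candidates: List[Candidate]  # alternative readings for one character position


def match_keyword(line: List[CharSlot], keyword: str) -> Optional[float]:
    """Return the best average candidate score over all matches, or None if no match."""
    if not keyword:
        return None
    best: Optional[float] = None
    for start in range(len(line) - len(keyword) + 1):
        total = 0.0
        for offset, wanted in enumerate(keyword):
            scores = [c.score for c in line[start + offset].candidates
                      if c.char == wanted]
            if not scores:
                break  # no hypothesis at this position supports the keyword
            total += max(scores)
        else:
            avg = total / len(keyword)
            if best is None or avg > best:
                best = avg
    return best


# The top-ranked reading of this line is "cot", but "a" survives as a second hypothesis.
line = [
    CharSlot([Candidate("c", 0.9)]),
    CharSlot([Candidate("o", 0.6), Candidate("a", 0.4)]),
    CharSlot([Candidate("t", 0.8)]),
]
top1 = "".join(max(s.candidates, key=lambda c: c.score).char for s in line)
cat_score = match_keyword(line, "cat")
```

In this example the single best reading would miss the keyword "cat", while the hypothesis-based match still finds it with a lower score, which a search system could use for ranking.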

Description

Technical field

[0001] The present invention relates to a device that applies document analysis technology to obtain, from groups of paper documents or document images, the information needed to search and browse those document groups on a computer, and to a storage medium that records a document analysis program.

Background technique

[0002] Even today, with digital information technology widespread, paper documents are still widely used as a medium for delivering information. However, because storing paper documents takes up space and the required information is hard to retrieve, paper documents are increasingly stored as electronic images, and there is high demand for using computers to search and browse these electronically imaged documents (hereinafter, document images).

[0003] The most basic method of paper document retrieval is to convert the paper document into a text file through OCR (Optical Character Recognition), and retri...


Application Information

IPC(8): G06K9/72, G06F17/30, G06K9/62
Inventors: 永崎健, 丸川胜美, 竹内沙弥香
Owner HITACHI LTD