Blockwise extraction of document metadata

A metadata and document image technology in the field of document processing that addresses problems affecting general productivity, among other issues.

Pending Publication Date: 2020-08-07
IBM CORP
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

Given the amount of information represented in traditional paper forms and scanned document images, extracting data from such document images can greatly benefit industry, as well as general productivity in many areas of society.

Embodiment Construction

[0027] FIG. 1 depicts a system 100 for cognitively digitizing document images, in accordance with one or more embodiments set forth herein.

[0028] Extracting computational data from document images often fails because of custom formats, individual styles, varying alignments, and non-text content. As a result, much of the information represented in document images cannot be accessed the way fully digital documents can. Document images that have not been digitized have limited uses, such as visual inspection and archiving. The alternative, digitizing such document images manually, would be prohibitively time-consuming and costly given the number of documents that would benefit from digitization.

[0029] Digital documents are generally preferred for the convenience of performing calculations using the data represented in the document. When a pen-on-paper document is scanned, the document is a series of visual images of pages but is not computati...

Abstract

Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image, wherein the document image includes a plurality of objects; identifying a plurality of macroblocks within the document image; performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining the content of microblocks within a macroblock to extract key-value pairs, the examining including performing an ontological analysis of the microblocks, and wherein the microblock processing includes associating confidence levels with the extracted key-value pairs; and outputting metadata based on the microblock processing performed within the macroblocks of the plurality of macroblocks.
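The abstract describes the pipeline only in claim language. The following is a minimal, hypothetical Python sketch of that flow, not IBM's implementation. It assumes text objects with bounding boxes are already available (for example, from an OCR step), forms macroblocks with a simple vertical-gap heuristic, treats each text object as a microblock, and substitutes a toy synonym table scored by token overlap (Jaccard similarity) for the ontological analysis that produces confidence levels. All names (`TextObject`, `ONTOLOGY`, `identify_macroblocks`, `match_key`, `process_microblocks`, `extract_metadata`) and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TextObject:
    """Hypothetical text object recovered from a document image (e.g. by OCR)."""
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    w: float
    h: float


@dataclass
class Macroblock:
    """A group of spatially related objects; each object is treated as a microblock."""
    objects: List[TextObject] = field(default_factory=list)


# Toy ontology (assumption): canonical key -> surface forms that may appear in documents.
ONTOLOGY: Dict[str, set] = {
    "invoice_number": {"invoice no", "invoice number", "inv #"},
    "date": {"date", "invoice date", "issued"},
    "total": {"total", "amount due", "balance due"},
}


def identify_macroblocks(objects: List[TextObject], gap: float = 20.0) -> List[Macroblock]:
    """Scan objects top to bottom; a vertical gap larger than `gap` starts a new macroblock."""
    blocks: List[Macroblock] = []
    for obj in sorted(objects, key=lambda o: (o.y, o.x)):
        if blocks:
            prev = blocks[-1].objects[-1]
            if obj.y - (prev.y + prev.h) <= gap:
                blocks[-1].objects.append(obj)
                continue
        blocks.append(Macroblock([obj]))
    return blocks


def match_key(raw_key: str) -> Tuple[Optional[str], float]:
    """Stand-in for ontological analysis: exact synonym -> 1.0, else best token Jaccard score."""
    norm = raw_key.strip().lower()
    tokens = set(norm.split())
    best, best_conf = None, 0.0
    for canonical, synonyms in ONTOLOGY.items():
        for syn in synonyms:
            if norm == syn:
                return canonical, 1.0
            syn_tokens = set(syn.split())
            score = len(tokens & syn_tokens) / len(tokens | syn_tokens)
            if score > best_conf:
                best, best_conf = canonical, score
    return best, best_conf


def process_microblocks(block: Macroblock, min_conf: float = 0.5) -> List[Tuple[str, str, float]]:
    """Extract (key, value, confidence) triples from microblocks of the form 'key: value'."""
    pairs = []
    for obj in block.objects:
        if ":" not in obj.text:
            continue  # this sketch only handles inline 'key: value' microblocks
        raw_key, value = obj.text.split(":", 1)
        key, conf = match_key(raw_key)
        if key is not None and conf >= min_conf and value.strip():
            pairs.append((key, value.strip(), round(conf, 2)))
    return pairs


def extract_metadata(objects: List[TextObject]) -> Dict[str, dict]:
    """Run microblock processing in every macroblock; keep the most confident value per key."""
    metadata: Dict[str, dict] = {}
    for block in identify_macroblocks(objects):
        for key, value, conf in process_microblocks(block):
            if key not in metadata or conf > metadata[key]["confidence"]:
                metadata[key] = {"value": value, "confidence": conf}
    return metadata


if __name__ == "__main__":
    sample = [
        TextObject("Invoice No: 4711", 10, 10, 120, 12),
        TextObject("Date: 2020-08-07", 10, 26, 120, 12),
        TextObject("Amount: 99.00", 10, 200, 100, 12),
        TextObject("Ref: ABC", 10, 216, 100, 12),
    ]
    print(extract_metadata(sample))
```

With the sample objects, the two header lines form one macroblock and the two lines further down form another. "Amount" only partially matches the `total` entry, so that pair is kept with confidence 0.5. "Ref" matches nothing in the toy ontology and is dropped. The gap heuristic and the synonym table are deliberately simple placeholders for whatever layout analysis and ontology a real system would use.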

Description

Technical Field

[0001] The present disclosure relates to document processing techniques, and more particularly to methods, computer program products, and systems for cognitively digitizing data from document images.

Background

[0002] In traditional document processing, an inked paper document is scanned page by page to produce a corresponding viewable image of each page. The document file that results from scanning paper is therefore usually a series of visual images of the pages. The content of each page image is not directly accessible; existing document processing applications can digitize certain visual image patterns into data that a corresponding computer program application can access and manipulate. Such digital manipulation of data from visual images is often referred to as extraction or data extraction. Given the amount of information represented in traditional paper forms and scanned document images, the extraction of such document images...

Application Information

IPC (8): G06K9/00; G06V30/10
CPC: G06F40/163; G06F40/154; G06F40/117; G06F40/109; G06V30/412; G06V30/413; G06V30/414; G06V30/10; G06F40/30; G06F16/93; G06F16/367; G06F16/35
Inventors: K. 诺思拉普, C. 特里姆, T. 希基, T. 加武
Owner: IBM CORP