Cognitive document image digitization

A technology based on document images and adjustable parameters, applied in the field of document processing, that addresses the effect of document image extraction on general productivity.

Pending Publication Date: 2020-07-10
IBM CORP

AI Technical Summary

Problems solved by technology

Given the amount of information represented in traditional paper forms and scanned document images, the extraction of such document images could greatly affect general productivity in many areas of industry as well as society.

Embodiment Construction

[0025] Figure 1 depicts a system 100 for cognitively digitizing document images, in accordance with one or more embodiments set forth herein.

[0026] Extracting computational data from document images is often unsuccessful due to various custom formats, separate styles, different alignments, and non-text content. As a result, the vast amount of information represented in document images cannot be accessed like fully digital documents. Document images that have not been digitized have limited uses, such as visual observation and archival purposes. In the alternative, the time and cost required to manually digitize such document images would be prohibitive given the number of documents that would benefit from digitization.

[0027] Digital documents are generally preferred for the convenience of performing calculations using the data represented in the document. When a pen-on-paper document is scanned, the document is a series of visual images of pages but is not computat...

Abstract

Methods, computer program products and systems are presented. The methods include, for instance: obtaining a document image containing objects; identifying the microblock corresponding to each object; and analyzing the position of each microblock for collinearity with another microblock, based on their respective positional characteristics and adjustable collinearity parameters. Collinear microblocks are grouped into a macroblock, and computational data in the form of a key-value pair is created from the macroblock. A heuristic confidence level is associated with the key-value pair, and, based on how data clusters form, a table may be classified and its data extracted.
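The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Microblock` fields, the `y_tol` and `max_gap` thresholds (standing in for the adjustable collinearity parameters), the "first microblock is the key" rule, and the confidence formula are all assumptions chosen for illustration. Table classification is not covered.

```python
from dataclasses import dataclass


@dataclass
class Microblock:
    """A recognized text fragment with its bounding box (hypothetical layout)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center_y(self) -> float:
        return self.y + self.height / 2

    @property
    def right(self) -> float:
        return self.x + self.width


def are_collinear(a, b, y_tol=5.0, max_gap=200.0):
    """Assumed rule: two microblocks are horizontally collinear when their
    vertical centers differ by at most y_tol and the horizontal gap between
    them is at most max_gap."""
    left, right = (a, b) if a.x <= b.x else (b, a)
    gap = right.x - left.right
    return abs(a.center_y - b.center_y) <= y_tol and gap <= max_gap


def group_macroblocks(blocks, y_tol=5.0, max_gap=200.0):
    """Greedily merge collinear microblocks into macroblocks (lists)."""
    macroblocks = []
    for block in sorted(blocks, key=lambda b: (b.center_y, b.x)):
        for macro in macroblocks:
            if any(are_collinear(block, m, y_tol, max_gap) for m in macro):
                macro.append(block)
                break
        else:
            macroblocks.append([block])
    # Order each macroblock left to right so the key candidate comes first.
    return [sorted(m, key=lambda b: b.x) for m in macroblocks]


def extract_key_value_pairs(blocks, y_tol=5.0, max_gap=200.0):
    """Return (key, value, confidence) tuples. The confidence is an
    illustrative heuristic (assumes y_tol > 0): tighter vertical alignment
    within the macroblock gives a higher score."""
    pairs = []
    for macro in group_macroblocks(blocks, y_tol, max_gap):
        if len(macro) < 2:
            continue  # a lone microblock cannot form a key-value pair
        key = macro[0].text.rstrip(":").strip()
        value = " ".join(b.text for b in macro[1:])
        centers = [b.center_y for b in macro]
        spread = max(centers) - min(centers)
        confidence = max(0.0, 1.0 - spread / y_tol)
        pairs.append((key, value, round(confidence, 3)))
    return pairs


# Example: two form lines, each a label followed by a value.
blocks = [
    Microblock("Name:", 10, 100, 50, 12),
    Microblock("Alice", 80, 101, 40, 12),
    Microblock("Date:", 10, 140, 50, 12),
    Microblock("2020-07-10", 80, 139, 70, 12),
]
pairs = extract_key_value_pairs(blocks)
for key, value, confidence in pairs:
    print(key, value, confidence)
```

Loosening `y_tol` or `max_gap` merges more microblocks into each macroblock, which is one way to think about the role adjustable collinearity parameters could play when a form is skewed or loosely aligned.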

Description

Technical field

[0001] The present disclosure relates to document processing techniques, and more particularly to methods, computer program products and systems for cognitively digitizing data from document images.

Background technique

[0002] In traditional document processing, an ink-on-paper document is scanned page by page as corresponding viewable images in preparation. The resulting document file of a scanned paper is usually a series of visual images of the pages. Each visual image of a page does not have accessible content, and existing document processing applications can digitize certain visual image patterns into digitized data that can be accessed and manipulated using a corresponding computer program application. This digital manipulation of data from visual images is often referred to as extraction or data extraction. Given the amount of information represented in traditional paper forms and scanned document images, the extraction of such document images can...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/72, G06V30/10
CPC: G06V30/413, G06V30/414, G06V30/10, G06V30/43, G06F18/217
Inventor: K.诺思拉普, C.特里姆, B.哈米斯, K.塞加尔, C.帕多勒, A.阿德尼兰
Owner: IBM CORP