Systems and methods for training document analysis system for automatically extracting data from documents

A document analysis and data extraction technology, applied to systems and methods for training document analysis systems to automatically extract data from documents. It addresses the problems that Electronic Data Interchange relies on cumbersome software and bloated standards and is perceived as too expensive, and that XML, XBRL and other computer-readable document files see quite limited use.

Inactive Publication Date: 2011-10-20
GRUNTWORX

AI Technical Summary

Benefits of technology

[0039] If the extracted text features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding electronic document category, the method further includes storing the extracted text features as the data contained in the corresponding electronic document. If, however, the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding electronic document category, the method further includes submitting the unrecognized te

Problems solved by technology

Electronic Data Interchange (EDI) is known for custom computer systems, cumbersome software and bloated standards that defeated its rapid spread throughout the supply chain.
Perceiving EDI as too expensive, the vast majority of businesses have avoided implementing it.
Similarly, applications of XML, XBRL and other computer-readable document files are quite limited compared to the use of documents in paper and digital image formats (such as PDF and TIFF).
Such manual data extraction is complex, time-consuming and error-prone.
As a result, the cost of data extraction is often quite high; numerous studies estimate the cost of processing invoices at more than ten dollars each.
The cost is especially high when the data extraction is performed by accountants, lawyers, physicians and other highly paid professionals as part of their work.
Despite the potential productivity gains that workflow software enables in the form of improved labor utilization, manual document processing remains a fundamentally expensive process.
Since outsourcing is manual, just as conventional data extraction is, it is also complex, time-consuming and error-prone.
Quality problems with offshore data extraction work have been reported by many customers.
These measures reduce the cost savings expected from offshore outsourcing.
Outsourcing and offshoring are accompanied by concerns over security risks associated with fraud and identity theft.
Although the transmission of scanned image files to the data extraction organization may be secured by cryptographic techniques, the sensitive data and personal identifying information are in the clear, i.e., unencrypted, when read by data extraction workers prior to entry in the appropriate computer systems.
Many data extraction organizations claim to strictly limit physical access to the rooms in which th



Examples


Embodiment Construction

[0092] While the prior art attempts to reduce the cost of data extraction through the use of low-cost labor and partial automation, none of the above methods of data extraction (1) eliminates the human labor and its accompanying requirements of education, domain expertise, training, software knowledge and/or cultural understanding, (2) minimizes the time spent entering and quality checking the data, (3) minimizes errors, (4) protects the privacy of the owners of the data without being dependent on the security systems of data extraction organizations and (5) eliminates the cost of significant up-front engineering efforts. What is needed, therefore, is a method of performing data extraction that overcomes the above-mentioned limitations and that includes the features enumerated above.

[0093] Preferred embodiments of the present invention provide a method and system for extracting data from paper and digital documents into a format that is searchable, editable and manageable.

[0094]FIG....


Abstract

A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted text features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
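The comparison and routing step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CategoryModel` and `route_extracted_features` are hypothetical, features are assumed to be plain strings matched exactly, and the choice to store nothing when any feature is unrecognized is an assumption, since the abstract does not say what happens to the recognized features in that case.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CategoryModel:
    """Hypothetical holder for the text features known for one document category."""
    name: str
    known_features: Set[str] = field(default_factory=set)


def route_extracted_features(
    extracted: List[str], model: CategoryModel
) -> Tuple[List[str], List[str]]:
    """Return (features_to_store, features_for_training).

    If every extracted feature belongs to the category's known set, all of
    them are returned for storage. Otherwise only the unrecognized features
    are returned, to be submitted to a training phase.
    Assumption: exact string matching, no normalization or fuzzy matching.
    """
    unrecognized = [f for f in extracted if f not in model.known_features]
    if not unrecognized:
        return list(extracted), []
    # Assumption: nothing is stored until the unrecognized features are resolved.
    return [], unrecognized


# Usage with a made-up "invoice" category.
invoice = CategoryModel("invoice", {"Invoice", "Total", "Amount Due"})
stored, to_train = route_extracted_features(["Invoice", "Tota1"], invoice)
```

In this sketch the misread token `Tota1` is the only feature routed to training; once the training phase adds it to (or corrects it against) the category's feature set, the same document would pass the check.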

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 61/295,210, filed Jan. 15, 2010, which is hereby incorporated by reference herein in its entirety.

[0002] This application is also related to the following applications filed concurrently herewith on Jan. 14, 2011:

[0003] U.S. patent application Ser. No. ______, entitled “Systems and methods for automatically extracting data from electronic documents containing multiple layout features;”

[0004] U.S. patent application Ser. No. ______, entitled “Systems and methods for automatically extracting data from electronic documents using external data;”

[0005] U.S. patent application Ser. No. ______, entitled “Systems and methods for automatically correcting data extracted from electronic documents using known constraints for semantics of extracted data elements;”

[0006] U.S. patent application Ser. No. ______, entitled “Systems and methods for automatica...


Application Information

IPC(8): G06F15/18, G06V30/10, G06V30/182, G06V30/224, G06V30/262, G06V30/40
CPC: G06K9/00442, G06K2209/01, G06K9/72, G06V30/40, G06V30/10, G06V30/182, G06V30/262
Inventors: NEOGI, DEPANKAR; LADD, STEVEN K.; WELLING, GIRISH; KUMAR, ARJUN; SINGH, VARTIKA; DUGGAN, MATTHEW; MAHATA, TUSHAR; YANG, XIAOBIN; XU, JIAN-WU; O'NEIL, JANICE; SARKAR, NIRUPAM; KRISHNA, GOPAL
Owner: GRUNTWORX