Electronic Data Interchange (EDI) is known for custom computer systems, cumbersome software and bloated standards that have defeated its rapid spread throughout the supply chain.
Because EDI is perceived as too expensive, the vast majority of businesses have avoided implementing it.
Similarly, applications of XML, XBRL and other computer-readable document files are quite limited compared to the use of documents in paper and digital image formats (such as PDF and TIFF).
As a result, the cost of data extraction is often quite high; numerous studies estimate the cost of processing a single invoice at more than ten dollars.
The cost is especially high when the data extraction is performed by accountants, lawyers, physicians and other highly paid professionals as part of their work.
Despite the potential productivity gains that workflow software enables in the form of improved labor utilization, manual document processing remains a fundamentally expensive process.
Because outsourced data extraction is manual, just like conventional data extraction, it is also complex, time-consuming and error-prone.
Quality problems with offshore data extraction work have been reported by many customers.
The measures required to address these problems reduce the cost savings expected from offshore outsourcing.
Outsourcing and offshoring are accompanied by concerns over security risks associated with fraud and identity theft.
Although the transmission of scanned image files to the data extraction organization may be secured by cryptographic techniques, the sensitive data and personally identifying information are in the clear, i.e., unencrypted, when data extraction workers read them before entering them into the appropriate computer systems.
Many data extraction organizations claim to strictly limit physical access to the rooms in which the employees enter the data; further, such rooms may be isolated.
Since such seemingly comprehensive security precautions are primarily physical in nature, they are imperfect.
The owners, managers, staff, guards and contractors of data extraction organizations may misuse some or all of the unencrypted confidential information in their care.
Further, breaches of physical and information system security by external parties can occur.
Because data extraction organizations are increasingly located in foreign countries, there is often little or no recourse for American citizens victimized in this manner.
Because such customization projects often cost hundreds of thousands of dollars or more, data extraction automation is usually limited to large organizations that can afford significant capital investments.
Many documents are neither clean nor high quality, suffering from being folded or marred before scanning, distorted during scanning and degraded during post-scanning binarization.
As a result, some of the labels needed to identify data are often not recognizable; therefore, some of the data cannot be automatically extracted.
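The dependence on recognizable labels can be illustrated with a minimal sketch. It assumes a simple label-anchored approach in which each field is located by the printed label next to it; the field names, label patterns, sample text and the `extract_fields` helper are all hypothetical and stand in for whatever form templates a real system would use.

```python
import re

# Hypothetical label patterns for illustration only; a real system would
# carry form-specific templates rather than two hard-coded labels.
LABEL_PATTERNS = {
    "wages": re.compile(
        r"Wages, tips, other compensation\s*[:\-]?\s*([\d,]+\.\d{2})"
    ),
    "federal_tax": re.compile(
        r"Federal income tax withheld\s*[:\-]?\s*([\d,]+\.\d{2})"
    ),
}


def extract_fields(ocr_text):
    """Locate each value by its label.

    Returns (found, missing): a dict of values whose labels were
    recognized, and a list of field names whose labels were not.
    """
    found, missing = {}, []
    for name, pattern in LABEL_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[name] = match.group(1)
        else:
            missing.append(name)
    return found, missing


# Recognized text from a clean scan (made-up sample values).
clean = (
    "Wages, tips, other compensation 52,340.00\n"
    "Federal income tax withheld 6,120.50"
)

# Simulated damage: a fold has obscured characters in the second label.
degraded = (
    "Wages, tips, other compensation 52,340.00\n"
    "Fede al inc me tax w thheld 6,120.50"
)

clean_found, clean_missing = extract_fields(clean)
degraded_found, degraded_missing = extract_fields(degraded)
```

In this sketch the amount next to the damaged label is still present in the recognized text, but because the label itself no longer matches, the value is reported as missing and would have to be keyed in by a person.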
When a wide range of forms exists, such as the more than 10,000 variations of W-2, 1099, K-1 and other personal income tax forms, automated data extraction is quite limited.
Because automated extraction still requires human inspection, source documents with sensitive information are exposed in their entirety to data extraction workers.