Systems and methods to automatically classify electronic documents using extracted image and text features and using a machine learning subsystem

Inactive Publication Date: 2009-05-07
GRUNTWORX

AI Technical Summary

Benefits of technology

[0013] Systems and methods to automatically organize electronic jobs by automatically classifying electronic documents using extracted image and text features and using a machine-learning recognition subsystem are provided. In some embodiments, a document analysis system that automatically classifies documents by recognizing in each document distinctive features that have been automatically learned by the system, so that the system may organize jobs according to the categories of documents the job contains, is provided. The document analysis system includes a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs from a plurality of users, each job containing at least one electronic document having at least one page that includes image aspects and text. The document feature recognition system automatically extracts image and text features from each received electronic document. The document classification system automatically classifies recognized electronic documen
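The job-organization idea in paragraph [0013], ordering a job according to the categories of documents it contains, can be pictured with a minimal sketch. The names `organize_job` and `category_order`, the category labels, and the choice to place unclassified documents last are assumptions made for illustration; the source does not specify how the ordering is defined.

```python
from typing import List, Optional, Tuple

# A classified document: (document id, category name or None if unrecognized).
Doc = Tuple[str, Optional[str]]


def organize_job(documents: List[Doc], category_order: List[str]) -> List[Doc]:
    """Group a job's documents by category, in a caller-supplied order.

    Assumption: documents whose category is None or not listed go last.
    sorted() is stable, so scan order is preserved within each category.
    """
    rank = {category: i for i, category in enumerate(category_order)}
    return sorted(documents, key=lambda doc: rank.get(doc[1], len(rank)))


# Hypothetical job scanned in arbitrary order.
job = [("p1", "1099-INT"), ("p2", None), ("p3", "W-2"), ("p4", "1099-INT")]
organized = organize_job(job, ["W-2", "1099-INT"])
print(organized)
```

Because the sort is stable, pages of the same category keep the order in which they were scanned, which matters when a multi-page document was scanned in sequence.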

Problems solved by technology

In many instances, however, the paper documents are scanned in a random, unorganized sequence, which makes it difficult and time-consuming to find a particular page within the electronic document.
One solution can be to manually organize the paper documents prior to scanning; however, the individual organizing the paper documents or performing the scanning may not have the skill, knowledge or time needed to correctly organize the paper documents.
Additionally, organizing the paper documents prior to scanning can be very time-consuming and expensive.
Further, organizing the pages prior to scanning might properly ord

Embodiment Construction

[0028] While the prior art attempts to reduce the cost of electronic document organization through the use of software, none of the above methods of document organization (1) eliminates the human labor and accompanying requirements of education, domain expertise, training, and/or software knowledge, (2) minimizes time spent entering and quality checking page categorization, (3) minimizes errors, and (4) protects the privacy of the owners of the data on the electronic documents being organized. What is needed, therefore, is a method of performing electronic document organization that overcomes the above-mentioned limitations and that includes the features enumerated above.

[0029] Preferred embodiments of the present invention provide a method and system for converting paper and digital documents into well-organized electronic documents that are indexed, searchable and editable. The resulting organized electronic documents support more rapid and accurate data entry, retrieval and review th...

Abstract

A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, each job containing at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, wherein the training system, using extracted features of unrecognized documents, automatically modifies the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.
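The classification and training steps in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the patented method: features are reduced to lowercase word tokens (no image features), the "best match" is a simple weighted-overlap score, and the names `CategoryModel`, `extract_features`, `score`, `classify`, `train`, the 0.5 threshold, and the sample categories are hypothetical. How an unrecognized document gets assigned to a category before training is also not stated in the source; the sketch simply assumes a label is available.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CategoryModel:
    """Hypothetical feature set for one document category (feature -> weight)."""
    name: str
    weights: Counter = field(default_factory=Counter)


def extract_features(text: str) -> Set[str]:
    """Stand-in for feature extraction: lowercase word tokens only."""
    tokens = (tok.strip(".,:;()").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def score(features: Set[str], model: CategoryModel) -> float:
    """Share of the category's total weight covered by the document's features."""
    total = sum(model.weights.values())
    if total == 0:
        return 0.0
    return sum(model.weights[f] for f in features) / total


def classify(features: Set[str], models: List[CategoryModel],
             threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching category name, or None if the match is too weak."""
    if not models:
        return None
    best = max(models, key=lambda m: score(features, m))
    return best.name if score(features, best) >= threshold else None


def train(model: CategoryModel, features: Set[str]) -> None:
    """Fold one document's features into the category's feature set."""
    model.weights.update(features)


# Demo with made-up sample text and category names.
w2 = CategoryModel("W-2")
train(w2, extract_features("Wage and Tax Statement employer wages"))
interest = CategoryModel("1099-INT")
train(interest, extract_features("Interest Income payer interest"))
models = [w2, interest]

print(classify(extract_features("Wage and Tax Statement 2007"), models))  # W-2

# An unrecognized document: no category scores above the threshold.
unknown = extract_features("Mortgage Interest Statement lender")
print(classify(unknown, models))  # None

# Assumed: the document is later labeled, and its features train a category.
mortgage = CategoryModel("1098")
train(mortgage, unknown)
models.append(mortgage)
print(classify(unknown, models))  # 1098
```

The threshold is what separates "recognized" from "unrecognized" documents in this sketch; a real system would presumably tune such a cutoff and combine text features with image features.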

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Ser. No. 60/985,851, filed on Nov. 6, 2007, which is hereby incorporated by reference herein in its entirety.

[0002] This application is related to the following applications filed concurrently herewith, the entire contents of which are incorporated by reference:

[0003] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Classifying Electronic Documents by Extracting and Recognizing Text and Image Features Indicative of Document Categories;”

[0004] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Training a Document Classification System Using Documents from a Plurality of Users;”

[0005] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Parallel Processing of Document Recognition and Classification Using Extracted Image and Text Features;”

[0006] U.S. patent application Ser. No. (TBA)...

Application Information

IPC(8): G06K9/62, G06V30/40
CPC: G06K9/6885, G06K9/00442, G06V30/40, G06V30/1985
Inventor: NEOGI, DEPANKAR; LADD, STEVEN K.; AHMED, DILNAWAJ; KUMAR, ARJUN; MAHATA, TUSHAR
Owner: GRUNTWORX