Data classification based on point-of-view dependency

A data classification technology based on point-of-view dependency, applied in the field of automated data classification. In the context of increasingly complex data classification systems, it addresses problems such as hard classification decisions and machine learning algorithms whose efficiency is non-linear in the number of distinct features, with the aim of improving accuracy.

Inactive Publication Date: 2011-05-26
BIZ360
Cites: 24; Cited by: 19

AI Technical Summary

Benefits of technology

The present invention provides a method and system for data classification that uses feature vectors and a pattern discriminator: relevant features are identified in an input item, and the item is classified according to a specified point-of-view. This approach improves accuracy and can be applied to new points-of-view without re-training the model. The system uses a mathematical engine, such as a support vector machine, to perform feature weighting. The invention also includes data classifiers that classify received data sets based on patterns observed during training.

Problems solved by technology

Arguably, feature selection is performed primarily for efficiency reasons, as the cost of many machine learning algorithms grows non-linearly with the number of distinct features.
More complex data classification systems have been developed.
Data classification systems might make hard decisions as to how to classify a given input document.
One problem with existing data classification systems is that real-world cases are often more involved: the same item might need to be classified differently depending on other considerations.
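The efficiency argument for feature selection can be made concrete with a small sketch that shrinks the feature set before any learning takes place. It ranks terms by document frequency, one common selection heuristic; that heuristic is an assumption here, not a method stated in this patent, and the function name select_features_by_df and the sample documents are hypothetical.

```python
from collections import Counter

def select_features_by_df(documents, k):
    """Keep the k terms that occur in the most documents (document frequency).

    Hypothetical helper for illustration. Ties are broken alphabetically so the
    result is deterministic.
    """
    df = Counter()
    for doc in documents:
        # Count each term at most once per document.
        df.update(set(doc.lower().split()))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

docs = [
    "the market rallied today",
    "the market fell today",
    "the team won the match",
]
print(select_features_by_df(docs, 3))  # ['the', 'market', 'today']
```

Raw document frequency ranks "the" first, which is why practical systems usually remove stopwords before ranking; a downstream learner then only ever sees k features instead of the full vocabulary.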

Embodiment Construction

(a) Definitions and General Parameters

[0026]The following definitions are set forth to illustrate and define the meaning and scope of the various terms used herein.

[0027]The terms “input item” and “document” are interchangeably used herein and refer to any item that can be used in conjunction with the present classification method. For example, an input item may include, but is not limited to, a word processing document, a file of a particular format (e.g., ASCII file, XML file, UTF-8 file, etc.), a collection of documents with some structural organization, an image, text, a combination of images and text, media, spreadsheet data, a collection of bytes, or other organizations of data or data streams.

[0028]The term “relevant feature” refers to a uniquely identifiable attribute that could affect the detection of patterns within a corpus. Relevant features might be domain specific, for example, in the case of English text classification, a relevant feature might be the presence of a un...
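As a minimal sketch of how an input item might be turned into a feature vector, the code below assumes plain English text, lowercase word tokens as the relevant features, and binary presence values. The tokenizer, the function name feature_vector, and the vocabulary are hypothetical illustrations, not the patent's own feature definitions.

```python
import re

def feature_vector(text, vocabulary):
    """Map a plain-text input item to a binary word-presence vector.

    Position i is 1.0 if vocabulary[i] occurs as a word in the text, else 0.0.
    No stemming is done, so "markets" does not match "market".
    """
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return [1.0 if term in tokens else 0.0 for term in vocabulary]

vocab = ["market", "rally", "team", "win"]  # hypothetical feature list
print(feature_vector("Market rally lifts shares", vocab))  # [1.0, 1.0, 0.0, 0.0]
```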

Abstract

Data classification is used to classify input items by associating each input item with one or more classes from a set of one or more classes in a data classification system. The method includes identifying relevant features in an input item to form a feature vector for the input item; receiving, at the data classification system, an indication of a point-of-view; either adjusting the feature vector according to the point-of-view indication or modifying a pattern discriminator (e.g., trainer and classifier) to process feature vectors inline depending on the provided point-of-view (e.g., SVM custom kernels); and classifying the input item into the set of classes according to the point-of-view. The point-of-view data can be introduced either as a pre-processing step before the data is passed to the pattern discrimination algorithm, or incorporated directly into the pattern discrimination algorithm where applicable. The pattern discrimination algorithms can detect arbitrary patterns, provided the dataset is prepared in the same way during both training and subsequent classification of unclassified documents.
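The abstract names two places where point-of-view data can enter: as a pre-processing adjustment of the feature vector, or inside the pattern discriminator through a custom SVM kernel. The sketch below assumes the point-of-view is expressed as a vector of non-negative per-feature weights, which is an assumption since the abstract does not specify its form. It shows that a weighted linear kernel gives the same value as an ordinary linear kernel applied to vectors scaled by the square roots of the weights, and that changing the weights can flip an SVM-style decision for the same item. The support vectors, coefficients, weights, and function names are hypothetical; the coefficients are held fixed only for illustration, whereas a real trainer would learn them under the same point-of-view kernel.

```python
import math

def linear_kernel(x, y):
    """Ordinary dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def weighted_kernel(x, y, pov_weights):
    """Inline option: custom kernel K_w(x, y) = sum_i w_i * x_i * y_i."""
    return sum(w * a * b for w, a, b in zip(pov_weights, x, y))

def apply_pov(x, pov_weights):
    """Pre-processing option: scale feature i by sqrt(w_i); requires w_i >= 0."""
    return [math.sqrt(w) * a for w, a in zip(pov_weights, x)]

def decide(x, support_vectors, coeffs, bias, pov_weights):
    """SVM-style decision value f(x) = sum_j c_j * K_w(s_j, x) + b.

    Each c_j stands for alpha_j * y_j; the sign of f(x) picks the class.
    """
    return sum(c * weighted_kernel(s, x, pov_weights)
               for s, c in zip(support_vectors, coeffs)) + bias

# Both routes give the same kernel value (up to floating-point rounding).
x = [1.0, 0.0, 2.0]
y = [3.0, 1.0, 0.5]
pov = [0.5, 2.0, 4.0]  # hypothetical point-of-view weights
inline = weighted_kernel(x, y, pov)
preprocessed = linear_kernel(apply_pov(x, pov), apply_pov(y, pov))
print(inline, math.isclose(inline, preprocessed))

# The same item can fall on different sides under different points-of-view.
svs = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical support vectors
coeffs = [1.0, -1.0]            # hypothetical alpha_j * y_j
item = [1.0, 1.0]
print(decide(item, svs, coeffs, 0.0, [2.0, 1.0]))  # positive: 1.0
print(decide(item, svs, coeffs, 0.0, [1.0, 3.0]))  # negative: -2.0
```

The equivalence explains why the two integration points in the abstract are interchangeable for kernels of this form: the pre-processing route works with any off-the-shelf linear learner, while the inline route leaves stored feature vectors untouched.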

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This application is a Continuation of U.S. application Ser. No. 10/931,291, filed Aug. 30, 2004, entitled "DATA CLASSIFICATION BASED ON POINT-OF-VIEW DEPENDENCY" (Attorney Docket No. 021389-000410US), now allowed, which claims priority from co-pending U.S. Provisional Patent Application No. 60/499,196, filed Aug. 28, 2003, entitled "DATA CLASSIFICATION BASED ON POINT-OF-VIEW DEPENDENCY", all of which are hereby incorporated by reference, as if set forth in full in this document, for all purposes.

FIELD OF THE INVENTION

[0002]The present invention relates to automated data classification in general, and to content-based data classifiers of documents in particular.

BACKGROUND OF THE INVENTION

[0003]Data classification systems are useful in many applications. One application is in filtering data, as might be done as part of a search over a corpus of data. While many data structures might be used with a data classification system, a typical example is ...

Application Information

Patent Type & Authority: Applications (United States)
IPC (8): G06F17/30
CPC: G06F17/30707, G06F16/353
Inventors: GARTUNG, DANIEL; CHAN, PHILIP; ROTHERHAM, JOHN
Owner: BIZ360