
Machine Learning System

Active Publication Date: 2008-11-13
PANSCIENT

AI Technical Summary

Benefits of technology

[0012]Clearly, the ranked set of elements provided by this method will include those elements which incorporate features that the classifier identifies as being relevant to the classification problem. These elements are therefore more likely to yield positive labeling candidates, thereby overcoming a significant problem with prior art methods in cases where positive labeling candidates are relatively rare in the data set.
[0019]As the new labeled subset now contains more examples relevant to training the classifier, the classifier is better able to determine the relative importance of different features for the classification problem under consideration.
[0037]This improves the classifier's ability to take into account the type of prior elements in the sequence when attempting to label a given element in a sequence.
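As one illustration of the sequence point in [0037], the following is a minimal sketch in Python, assuming the prior element's predicted type is simply appended as an extra feature; this is a common way of exposing sequence context to a per-element classifier and is not necessarily the exact construction used here. The feature names and labels are hypothetical.

    def sequence_features(element_features, prior_label):
        # Augment an element's feature dictionary with the predicted type of the
        # prior element in the sequence, so the classifier can condition on it.
        features = dict(element_features)
        features["prev_label=" + prior_label] = 1
        return features

    # Example: labeling the second element of a sequence, given that the first
    # element was predicted to be a "heading" (illustrative only).
    print(sequence_features({"contains_digits": 1, "bold": 0}, "heading"))
    # {'contains_digits': 1, 'bold': 0, 'prev_label=heading': 1}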

Problems solved by technology

There are a large number of computational problems that are too complex for a human to explicitly determine and code a solution.
This requirement leads to one serious disadvantage of these types of algorithms in the case where the positive class is under-represented in the natural distribution over the data of interest.
As the labeling procedure is performed by humans, it can be a labour-intensive and hence expensive process.
However, Active Learning algorithms of this type do not address the often fundamentally limiting practical problem of how to efficiently search the total data set for the proposed better labeling candidates.
As most practical problems of any utility usually involve extremely large data sets, this can seriously reduce the effectiveness of an Active Learning system.



Embodiment Construction

[0066]Referring now to FIG. 1, there is illustrated a prior art Active Learning system for the classification of documents. Whilst the present invention is described with reference to the classification of documents, it will be clear to those skilled in the art that the system described herein is equally applicable to other machine learning applications where elements of a data set must be reliably classified according to a characteristic of that element.

[0067]Corpus 100 is a data set consisting of a plurality of text documents such as web pages. In practice each document is represented by a vector d=[ω1, . . . , ωN], which is an element of the high-dimensional vector space consisting of all terms. In this representation ωi is non-zero for document d only if the document contains term ti. The numerical value of ωi can be set in a variety of ways, ranging from simply setting it to 1, regardless of the frequency of ti in d, through to the use of more sophisticated weighting schemes such as tf...
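By way of illustration only, a small Python sketch of this representation follows, assuming a toy term list; it shows the simplest presence/absence weighting alongside a raw term-frequency weighting (the more sophisticated schemes alluded to above are not reproduced here).

    # Illustrative term set t1..tN; real corpora would have a much larger vocabulary.
    terms = ["machine", "learning", "classifier", "index", "document"]

    def to_vector(doc_text, binary=True):
        # Build d = [w1, ..., wN]: wi is non-zero only if the document contains term ti.
        tokens = doc_text.lower().split()
        vector = []
        for t in terms:
            count = tokens.count(t)
            vector.append((1 if count > 0 else 0) if binary else count)
        return vector

    doc = "the classifier reads the document and each document is scored"
    print(to_vector(doc))         # presence/absence weights: [0, 0, 1, 0, 1]
    print(to_vector(doc, False))  # raw term-frequency weights: [0, 0, 1, 0, 2]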



Abstract

A method for training a classifier to classify elements of a data set according to a characteristic is described. The data set includes N elements, each characterized by at least one feature. The method includes the steps of: forming a first labeled subset of elements from the data set, with each element of the first labeled subset labeled according to whether it includes the characteristic; training an algorithmic classifier to classify for the characteristic according to the first labeled subset, thereby determining which at least one feature is relevant to classifying for the characteristic; and then querying with the classifier an inverted index, formed over the at least one feature and generated from the data set, thereby generating a ranked set of elements from the data set.
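For concreteness, a minimal self-contained sketch of these steps in Python follows, using a toy corpus and scikit-learn's LogisticRegression as a stand-in for the algorithmic classifier; the data, feature choices and ranking rule are illustrative assumptions rather than the patented implementation.

    from collections import defaultdict
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy corpus standing in for the data set of N elements (documents).
    corpus = [
        "contact us at our office",        # labeled positive
        "email and phone number listed",   # labeled positive
        "annual financial report",         # labeled negative
        "company history and milestones",  # labeled negative
        "reach us by phone or email",      # unlabeled
        "quarterly earnings summary",      # unlabeled
    ]
    labeled_idx = [0, 1, 2, 3]
    labels = [1, 1, 0, 0]

    # Represent each document as a term-feature vector.
    vec = CountVectorizer(binary=True)
    X = vec.fit_transform(corpus)

    # Train the classifier on the first labeled subset; its per-feature weights
    # indicate which features are relevant to classifying for the characteristic.
    clf = LogisticRegression().fit(X[labeled_idx], labels)
    weights = clf.coef_[0]

    # Build an inverted index over the features (feature id -> documents containing it).
    inverted = defaultdict(set)
    for doc_id, feat_id in zip(*X.nonzero()):
        inverted[feat_id].add(doc_id)

    # Query the index with the classifier's most positively weighted features and
    # rank the retrieved documents by the sum of their matching feature weights.
    top_feats = np.argsort(weights)[::-1][:3]
    scores = defaultdict(float)
    for f in top_feats:
        for doc_id in inverted[f]:
            scores[doc_id] += weights[f]
    ranked = [d for d in sorted(scores, key=scores.get, reverse=True) if d not in labeled_idx]
    print([corpus[d] for d in ranked])  # candidates most likely to be positive labeling examples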

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This is a national stage application under 35 USC 371 of International Application No. PCT/AU2005/001488, filed Sep. 29, 2005, which claims priority from Australian Patent Application No. 2004-905602, filed Sep. 29, 2004, the entire disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002]There are a large number of computational problems that are too complex for a human to explicitly determine and code a solution. Examples of such problems include machine recognition of human facial characteristics, speech recognition, the classification of a corpus of documents into a taxonomy and the extraction of information from documents. In an attempt to solve these problems a class of algorithms has been developed that effectively train a computer to perform a specific task by providing example data. This class of algorithms comes under the broad heading of Machine Learning as the computer running such algorithms ...


Application Information

IPC(8): G06F17/30, G06F15/18, G06N20/00
CPC: G06K9/6256, G06K9/629, G06N99/005, G06N20/00, G06F18/214, G06F18/253
Inventor: BAXTER, JONATHAN
Owner: PANSCIENT