Machine Learning System

Active Publication Date: 2008-11-13
PANSCIENT
Cites: 2; Cited by: 70

AI Technical Summary

Benefits of technology

[0037] This improves the classifier's ability to take into account the type of prio…

Problems solved by technology

There are a large number of computational problems that are too complex for a human to explicitly determine and code a solution.
This requirement leads to one serious disadvantage of these types of algorithms in the case where the positive class is under-represented in the natural distribution over the data of interest.
As the labeling procedure is performed by humans, it can be a labour intens…



Examples


Embodiment Construction

[0066] Referring now to FIG. 1, there is illustrated a prior art Active Learning system for the classification of documents. Whilst the present invention is described with reference to the classification of documents, it will be clear to those skilled in the art that the system described herein is equally applicable to other machine learning applications where elements of a data set must be reliably classified according to a characteristic of that element.
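FIG. 1 is not reproduced on this page, so the following is only a generic sketch of pool-based active learning with uncertainty sampling, a common variant of the technique; it is not claimed to be the loop shown in FIG. 1. The classifier (a plain perceptron), the one-query-per-round policy and all function names (`train_perceptron`, `active_learning`, `oracle`) are illustrative assumptions. The `oracle` callable stands in for the human labeler.

```python
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]  # sparse term -> weight, as in the vector-space model


def dot(w: Vector, x: Vector) -> float:
    """Sparse dot product between weight vector w and document vector x."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_perceptron(examples: List[Tuple[Vector, int]], epochs: int = 10) -> Tuple[Vector, float]:
    """Placeholder linear classifier: a plain perceptron with labels in {+1, -1}."""
    w: Vector = {}
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + y * v
                b += y
    return w, b


def active_learning(
    pool: List[Vector],
    seed: Dict[int, int],
    oracle: Callable[[int], int],
    rounds: int,
) -> Tuple[Tuple[Vector, float], Dict[int, int]]:
    """Pool-based active learning with uncertainty sampling (one query per round)."""
    labeled = dict(seed)  # element index -> label supplied by the (human) oracle
    for _ in range(rounds):
        w, b = train_perceptron([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the element closest to the decision boundary, i.e. the one
        # the current classifier is least certain about.
        query = min(unlabeled, key=lambda j: abs(dot(w, pool[j]) + b))
        labeled[query] = oracle(query)
    model = train_perceptron([(pool[i], y) for i, y in labeled.items()])
    return model, labeled
```

Each round costs one human label, which is why the choice of which element to query matters when labeling is labour intensive.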

[0067] Corpus 100 is a data set consisting of a plurality of text documents such as web pages. In practice each document is represented by a vector $d = [\omega_1, \ldots, \omega_N]$, which is an element of the high-dimensional vector space whose dimensions correspond to all terms. In this representation $\omega_i$ is non-zero for document $d$ only if the document contains term $t_i$. The numerical value of $\omega_i$ can be set in a variety of ways, ranging from simply setting it to 1, regardless of the frequency of $t_i$ in $d$, through to the use of more sophisticated weighting schemes such as tf...
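A minimal sketch of this representation, assuming lower-cased whitespace tokenization. It shows the binary weighting described above and raw term counts as one simple frequency weighting; the more sophisticated scheme the text goes on to name is elided here and is not reconstructed. Only non-zero weights are stored, which matches the observation that $\omega_i$ is non-zero only for terms the document contains. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # only non-zero weights are stored


def tokenize(text: str) -> List[str]:
    # Assumption: lower-cased whitespace tokenization; real systems differ.
    return text.lower().split()


def binary_vector(text: str) -> Vector:
    # omega_i = 1 if term t_i occurs in the document, regardless of frequency.
    return {t: 1.0 for t in set(tokenize(text))}


def tf_vector(text: str) -> Vector:
    # omega_i = raw number of occurrences of t_i (one simple frequency weighting).
    return {t: float(c) for t, c in Counter(tokenize(text)).items()}


def to_dense(vec: Vector, vocabulary: List[str]) -> List[float]:
    # Expand the sparse form into the full vector [omega_1, ..., omega_N]
    # over an ordered vocabulary t_1, ..., t_N; absent terms get 0.
    return [vec.get(t, 0.0) for t in vocabulary]
```

For example, `tf_vector("web page web")` gives `{"web": 2.0, "page": 1.0}`, while `binary_vector` on the same text gives 1.0 for both terms.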


Abstract

A method for training a classifier to classify elements of a data set according to a characteristic is described. The data set includes N elements, each characterized by at least one feature. The method includes the steps of: forming a first labeled subset of elements from the data set, each element of the first labeled subset being labeled according to whether it includes the characteristic; training an algorithmic classifier on the first labeled subset to classify for the characteristic, thereby determining which of the features are relevant to classifying for the characteristic; and then querying, with the classifier, an inverted index formed over the at least one feature and generated from the data set, thereby generating a ranked set of elements from the data set.
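The abstract does not say what kind of classifier is used or how relevant features are chosen, so the following is only a sketch under stated assumptions: the trained classifier is taken to be linear (a weight per feature), "relevant" features are assumed to be the `k` with the largest absolute weight, and the query scores each element by summing weight times feature value over the posting lists. The names `relevant_features`, `build_inverted_index` and `query` are hypothetical. One plausible benefit of querying an index rather than scanning all N elements is that only the posting lists of the relevant features are read.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]
Postings = Dict[str, List[Tuple[int, float]]]  # feature -> [(element id, value)]


def relevant_features(weights: Vector, k: int) -> Vector:
    # Assumption: keep the k features with the largest |weight|.
    top = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    return dict(top)


def build_inverted_index(elements: List[Vector], features: Vector) -> Postings:
    # Index only the features the classifier found relevant.
    index: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for elem_id, vec in enumerate(elements):
        for feature, value in vec.items():
            if feature in features:
                index[feature].append((elem_id, value))
    return dict(index)


def query(index: Postings, weights: Vector) -> List[Tuple[int, float]]:
    # Accumulate a linear score per element from the posting lists,
    # then rank by descending score (ties broken by element id).
    scores: Dict[int, float] = defaultdict(float)
    for feature, w in weights.items():
        for elem_id, value in index.get(feature, []):
            scores[elem_id] += w * value
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With elements `[{"ml": 1.0, "learn": 1.0}, {"cook": 1.0}, {"ml": 1.0, "cook": 1.0}, {"the": 1.0}]` and weights `{"ml": 2.0, "learn": 1.0, "cook": -1.0, "the": 0.1}`, keeping the top 3 features drops `"the"`, and the query ranks element 0 (score 3.0), then element 2 (1.0), then element 1 (-1.0). Element 3 shares no relevant feature and is not returned at all.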

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is a national stage application under 35 USC 371 of International Application No. PCT/AU2005/001488, filed Sep. 29, 2005, which claims priority from Australian Patent Application No. 2004-905602, filed Sep. 29, 2004, the entire disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] There are a large number of computational problems that are too complex for a human to explicitly determine and code a solution. Examples of such problems include machine recognition of human facial characteristics, speech recognition, the classification of a corpus of documents into a taxonomy and the extraction of information from documents. In an attempt to solve these problems a class of algorithms has been developed that effectively train a computer to perform a specific task by providing example data. This class of algorithms comes under the broad heading of Machine Learning as the computer running such algorithms ...


Application Information

IPC(8): G06F17/30; G06F15/18; G06N20/00
CPC: G06K9/6256; G06K9/629; G06N99/005; G06N20/00; G06F18/214; G06F18/253
Inventor: BAXTER, JONATHAN
Owner: PANSCIENT