System and method for scalable cost-sensitive learning

A cost-sensitive learning technology applied in the field of inductive learning. It addresses the problems that challenging and investigating fraud cases is not free, that each fraud case investigation carries an intrinsic cost, and that a second cost of developing the model arises from the computer time and resources it consumes. Its effects are to improve the development of the learning model, to improve the accuracy of the ensemble, and to reduce the time needed to develop it.

Publication Date: 2005-06-09 (Inactive)
IBM CORP

AI Technical Summary

Benefits of technology

[0019] In view of the foregoing exemplary problems, drawbacks, and disadvantages of the conventional methods, an exemplary feature of the present invention is to provide a structure and method for an inductive learning technique that significantly increases the accuracy of the basic inductive learning model.
[0020] It is another exemplary feature of the present invention to provide a technique in which throughput is increased by at least ten to twenty times the throughput of the basic inductive learning model.
[0026] In a sixth exemplary aspect of the present invention, also described herein is a method of at least one of increasing a speed of development of a learning model for a dataset of examples and increasing an accuracy of the learning model, including dividing the dataset into N subsets of data and developing an estimated learning model for the dataset by developing a learning model for a first subset of the N subsets.
[0028] With the above and other exemplary aspects, the present invention provides a method to improve learning model development by increasing accuracy of the ensemble, by decreasing time to develop a sufficiently accurate ensemble, and by providing quantitative measures by which a user (e.g., one developing the model or implementing an application based on the model) can decide when to terminate the model development because the ensemble is predicted as being sufficiently accurate.
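
The termination decision described in paragraph [0028] can be pictured with a small sketch. The following is a minimal illustration, not the patent's actual measure: it assumes the process tracks the measured accuracy of the current ensemble alongside a statistical estimate of the final ensemble's accuracy, and stops once the gap falls below a user-chosen tolerance; the function name and tolerance value are hypothetical.

```python
def sufficiently_accurate(current_accuracy: float,
                          estimated_final_accuracy: float,
                          tolerance: float = 0.005) -> bool:
    """Hypothetical stopping rule: terminate model development once the
    ensemble built so far is within `tolerance` of the statistically
    estimated accuracy of the full N-subset ensemble."""
    return (estimated_final_accuracy - current_accuracy) <= tolerance


# Example: a 4-subset ensemble measuring 0.862 against an estimated final
# accuracy of 0.865 suggests that processing further subsets adds little.
print(sufficiently_accurate(0.862, 0.865))  # True
```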

Problems solved by technology

Because there is a cost associated with the letters, and each individual will either donate a different amount of money or not donate at all, this model is cost-sensitive.
Challenging and investigating fraud is likewise not free.
There is an intrinsic cost associated with each fraud case investigation.
As should be apparent, there is also a second cost associated with developing the model: the computer time and resources necessary to develop a model over a database, particularly when the database contains a large amount of data.
A problem recognized by the present inventors is that, in current learning model methods, the entire database must be evaluated before the effects of the hypothetical parameters for the test model are known.
Depending upon the size of the database, each such test scenario can require considerable computer time (sometimes many hours or even days) and expense, and it can become prohibitive to spend so much effort developing an optimal model for the intended task.
Hence, there is currently no method that efficiently models the cost-benefit tradeoff short of spending the time and computer resources to analyze the entire database and predict the accuracy of the model whose parameters are undergoing evaluation.
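
For the charity-donation scenario above, the cost-sensitive decision itself can be stated compactly. The sketch below is illustrative only: the probability of donating, the estimated donation amount, and the mailing cost are hypothetical placeholders rather than values taken from the patent.

```python
def expected_benefit(p_donate: float, est_donation: float, mailing_cost: float) -> float:
    """Expected net gain from mailing one individual: the chance they donate
    times the estimated donation amount, minus the fixed cost of the letter."""
    return p_donate * est_donation - mailing_cost


def should_mail(p_donate: float, est_donation: float, mailing_cost: float) -> bool:
    # Cost-sensitive rule: send a letter only when the expected gain is positive.
    return expected_benefit(p_donate, est_donation, mailing_cost) > 0


# A 5% chance of a $20 donation outweighs a $0.68 letter; a 2% chance does not.
print(should_mail(0.05, 20.0, 0.68))  # True
print(should_mail(0.02, 20.0, 0.68))  # False
```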

Examples

Embodiment Construction

[0056] Referring now to the drawings, and more particularly to FIGS. 1-14, exemplary embodiments for a new framework of scalable cost-sensitive learning are now presented. The illustrative scenario of a charity donation database, from which is to be selected a subset of individuals to whom to send campaign letters, will continue to be used for teaching the concepts of the present invention.

[0057] As an introduction, disclosed herein is a method and structure for learning a model using ensembles of classifiers. First, the original, potentially large dataset is partitioned into multiple subsets. Base classifiers are learned from these data subsets one by one, sequentially. At any point in the processing, the accuracy of the current ensemble, comprising the models computed so far, is reported to the user.

[0058] At the same time, the overall accuracy of the final ensemble, comprising every model computed from every data subset, is statistically estimated and also reported to the end user....
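
The partition-and-train loop of paragraphs [0057]-[0058] might look roughly like the sketch below. It is a simplified reading rather than the patented method: the choice of base learner (a scikit-learn decision tree), the equal random partition, the probability-averaging combination rule, and the use of a held-out validation set to measure the current ensemble's accuracy are all assumptions, and the statistical estimate of the final ensemble's accuracy is indicated only by a comment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def learn_ensemble(X, y, X_val, y_val, n_subsets=10, seed=0):
    """Partition the training data into n_subsets disjoint subsets, learn one
    base classifier per subset sequentially, and report the accuracy of the
    growing ensemble on a held-out validation set after each step."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(X)), n_subsets)

    ensemble = []
    for k, idx in enumerate(subsets, start=1):
        # Base classifier learned from the k-th data subset only.
        base = DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
        ensemble.append(base)

        # Current ensemble: average the class-probability estimates of the
        # models computed so far (assumes every subset contains every class,
        # so the probability columns of the base classifiers line up).
        proba = np.mean([m.predict_proba(X_val) for m in ensemble], axis=0)
        y_pred = ensemble[0].classes_[np.argmax(proba, axis=1)]
        acc = accuracy_score(y_val, y_pred)
        print(f"after subset {k}/{n_subsets}: current ensemble accuracy = {acc:.4f}")
        # Here the method would also report a statistical estimate of the
        # accuracy of the final n_subsets-model ensemble, letting the user
        # terminate early once the current ensemble is judged close enough.

    return ensemble
```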


Abstract

A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset into N subsets of data and developing an estimated learning model for the dataset by developing a learning model for a first subset of the N subsets.

Description

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention generally relates to a technique of inductive learning. More specifically, an inductive model is built both “accurately” and “efficiently” by dividing a database of examples into N disjoint subsets of data, and a learning model (base classifier), including a prediction of accuracy, is sequentially developed for each subset and integrated into an evolving aggregate (ensemble) learning model for the entire database. The aggregate model is incrementally updated by each completed subset model. The prediction of accuracy provides a quantitative measure upon which to judge whether to continue processing the remaining subsets in the database or to terminate at an intermediate stage.

[0003] 2. Description of the Related Art

[0004] Modeling is a technique to learn a model from a set of given examples of the form {(x1, y1), (x2, y2), ..., (xn, yn)}. Each example (xi, yi) is a feature vector, xi....
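
As a toy illustration of the example form {(x1, y1), (x2, y2), ..., (xn, yn)} in paragraph [0004], the snippet below learns a classifier from a handful of hand-made examples. The feature values and the choice of a decision-tree learner are hypothetical and serve only to make the notation concrete.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy examples of the form {(x1, y1), ..., (xn, yn)}: each xi is a feature
# vector and yi its label (here 1 = "donor", 0 = "non-donor"; made-up values).
examples = [
    ([52.0, 3, 1], 1),
    ([0.0, 5, 0], 0),
    ([10.0, 1, 1], 1),
    ([0.0, 2, 0], 0),
]
X = [x for x, _ in examples]
y = [label for _, label in examples]

model = DecisionTreeClassifier(random_state=0).fit(X, y)  # learn a model from the examples
print(model.predict([[25.0, 2, 1]]))                      # predict the label of a new feature vector
```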

Application Information

IPC(8): G06N20/20; G06F7/00
CPC: G06N99/005; G06N20/00; G06N20/20
Inventors: FAN, WEI; WANG, HAIXUN; YU, PHILIP S.
Owner: IBM CORP