Distributed decision tree training

A decision tree and distributed control technology in the computer field that addresses problems such as the difficulty, impracticality, or impossibility of obtaining improved decision tree classification capabilities.

Status: Inactive. Publication date: 2012-01-25
MICROSOFT CORP
Cites: 0. Cited by: 15.

AI Technical Summary

Problems solved by technology

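However, one disadvantage of decision tree training using large data sets is that such training can overwhelm the computing system's processor or memory resources, making training of decision trees impractical or impossible. As a result, computer scientists and software developers are limited by the size and complexity of the datasets they can use to train decision trees.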

Embodiment Construction

[0013] FIG. 1 shows a computerized decision tree training system 10. Computerized decision tree training system 10 includes a machine learning computer system 12 configured to receive training data 14, process training data 14, and output a trained decision tree 16. The training data 14 includes numerous examples, each of which has been classified into one of a predetermined number of classes. Decision tree 16 may be trained on training data 14 according to the procedure described below.
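The output of this system is a trained decision tree that assigns each datum to one of a predetermined number of classes. As a minimal sketch of what applying such a tree looks like, assuming a binary tree in which every internal node holds one split function and every leaf holds a single class label (the `Node` and `classify` names are hypothetical, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


# Hypothetical node layout: an internal node holds a split function that
# routes a datum left (True) or right (False); a leaf holds a class label.
@dataclass
class Node:
    split: Optional[Callable[[Sequence[float]], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None  # set only on leaf nodes


def classify(node: Node, datum: Sequence[float]) -> int:
    """Walk from the root to a leaf and return that leaf's class label."""
    while node.label is None:
        node = node.left if node.split(datum) else node.right
    return node.label


# A two-level tree over 2-feature data with three classes (0, 1, 2).
tree = Node(
    split=lambda x: x[0] < 0.5,
    left=Node(label=0),
    right=Node(
        split=lambda x: x[1] < 0.5,
        left=Node(label=1),
        right=Node(label=2),
    ),
)

print(classify(tree, [0.2, 0.9]))  # 0
print(classify(tree, [0.8, 0.1]))  # 1
print(classify(tree, [0.8, 0.7]))  # 2
```

Training, in this picture, amounts to choosing the split function stored at each internal node; classification cost is proportional to tree depth rather than to the size of the training data.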

[0014] Training data 14 may include various data types and is generally organized into data units 18 . In one particular example, data unit 18 may contain an image 20 or image region, which in turn includes pixel data 22 . Alternatively or additionally, the data unit 18 may comprise audio data, video sequences, 3D medical scans or other data.
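For image data units, a split function must test some property of the pixel data 22, but this excerpt does not say which tests are used. Purely as an illustration, the sketch below assumes a hypothetical split that compares two pixels at fixed offsets from a reference pixel against a threshold; `make_pixel_split`, its parameters, and the row-major image layout are all invented for this example:

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # assumed layout: rows of pixel intensities


def make_pixel_split(
    offset_a: Tuple[int, int], offset_b: Tuple[int, int], threshold: float
) -> Callable[[Image, Tuple[int, int]], bool]:
    """Hypothetical split function: reads two pixels at fixed offsets from a
    reference pixel and tests whether their difference is below a threshold.
    Pixels outside the image read as 0.0."""

    def pixel(img: Image, r: int, c: int) -> float:
        if 0 <= r < len(img) and 0 <= c < len(img[0]):
            return img[r][c]
        return 0.0

    def split(img: Image, ref: Tuple[int, int]) -> bool:
        r, c = ref
        a = pixel(img, r + offset_a[0], c + offset_a[1])
        b = pixel(img, r + offset_b[0], c + offset_b[1])
        return a - b < threshold

    return split


img = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6],
       [0.7, 0.8, 0.9]]
split = make_pixel_split(offset_a=(0, 1), offset_b=(1, 0), threshold=0.0)
print(split(img, (1, 1)))  # True:  0.6 - 0.8 < 0.0
print(split(img, (2, 0)))  # False: 0.8 - 0.0 (out of bounds) is not < 0.0
```

Each choice of offsets and threshold yields a different candidate split function, so a family like this can supply the plurality of split functions that training evaluates at each node.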

[0015] After the decision tree 16 has been trained by the computerized decision tree training system 10, the decision tree 16 may be inst...

Abstract

A computerized decision tree training system may include a distributed control processing unit configured to receive input of training data for training a decision tree. The system may further include a plurality of data batch processing units, each configured to evaluate each of a plurality of split functions of the decision tree on a respective data batch of the training data, thereby computing a partial histogram for each split function for each datum in the data batch. The system may further include a plurality of node batch processing units configured to aggregate the associated partial histograms for each split function to form an aggregated histogram for each split function for each of a subset of frontier tree nodes, and to determine a selected split function for each frontier tree node by computing the split function that produces the highest information gain for that frontier tree node.
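The two-stage scheme in the abstract can be sketched in a single process as follows. This is a minimal illustration, not the patented implementation: it assumes each datum is a `(features, class_label, frontier_node_id)` tuple, that split functions are boolean predicates, and that information gain is measured with Shannon entropy. `partial_histograms` stands in for one data batch processing unit and `select_splits` for one node batch processing unit; all names are hypothetical. In a distributed setting each call would run on a separate worker, and only the histograms would move between workers.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed datum layout: (features, class label, id of the frontier node it has reached).
Datum = Tuple[Sequence[float], int, int]
SplitFn = Callable[[Sequence[float]], bool]
# Histogram key: (frontier node id, split index, split outcome) -> class-label counts.
Hist = Dict[Tuple[int, int, bool], Counter]


def partial_histograms(batch: List[Datum], splits: List[SplitFn]) -> Hist:
    """Data-batch step: evaluate every split function on every datum in one batch."""
    hist: Hist = defaultdict(Counter)
    for features, label, node in batch:
        for k, split in enumerate(splits):
            hist[(node, k, split(features))][label] += 1
    return hist


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of a class-label histogram."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def information_gain(left: Counter, right: Counter) -> float:
    """Parent entropy minus the size-weighted entropy of the two children."""
    parent = left + right
    n = sum(parent.values())
    if n == 0:
        return 0.0
    children = sum(sum(side.values()) / n * entropy(side) for side in (left, right) if side)
    return entropy(parent) - children


def select_splits(partials: List[Hist], nodes: List[int], n_splits: int) -> Dict[int, int]:
    """Node-batch step: aggregate the partial histograms for a subset of frontier
    nodes, then pick, per node, the split index with the highest information gain."""
    wanted = set(nodes)
    agg: Hist = defaultdict(Counter)
    for hist in partials:
        for key, counts in hist.items():
            if key[0] in wanted:
                agg[key].update(counts)
    best: Dict[int, int] = {}
    for node in nodes:
        gains = [information_gain(agg[(node, k, True)], agg[(node, k, False)])
                 for k in range(n_splits)]
        best[node] = max(range(n_splits), key=lambda k: gains[k])
    return best


# Two data batches, all data at frontier node 0; the class depends on feature 0.
splits = [lambda x: x[0] < 0.5, lambda x: x[1] < 0.5]
batch_a = [([0.1, 0.9], 0, 0), ([0.2, 0.1], 0, 0)]
batch_b = [([0.8, 0.2], 1, 0), ([0.9, 0.7], 1, 0)]
partials = [partial_histograms(b, splits) for b in (batch_a, batch_b)]
print(select_splits(partials, nodes=[0], n_splits=2))  # {0: 0}
```

Here split 0 separates the two classes perfectly (gain of 1.0 bit), while split 1 leaves each side evenly mixed (gain of 0), so frontier node 0 selects split 0. Because histograms are keyed by frontier node, one pass over each batch serves every frontier node at once, and aggregation only has to add counts.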

Description

Technical Field

[0001] The invention relates to the computer field, and in particular to decision tree training.

Background

[0002] Machine learning techniques can be employed to enable computers to process experimental data and draw conclusions about it. One example machine learning technique is to train a decision tree on example data and apply the trained decision tree to classify unknown data into one of several classes. In many applications, training a decision tree on the largest possible dataset can yield more accurate results. However, one disadvantage of decision tree training using large data sets is that such training can overwhelm the processor or memory resources of a computing system, rendering the training of decision trees impractical or impossible. Consequently, computer scientists and software developers are limited by the size and complexity of the datasets they can use to train decision trees, and it is difficu...

Application Information

IPC (8): G06F17/30; G06N20/00
CPC: G06K9/6282; G06N20/00; G06N5/01; G06F18/24323
Inventors: J. Shotton, M-D. Budiu, A. W. Fitzgibbon, M. Finocchio, R. E. Moore, D. Robertson
Owner: MICROSOFT CORP