Horizontal Decision Tree Learning from Very High Rate Data Streams

A decision tree learning technique for data processing systems, applied in the field of improved data processing apparatus

Active Publication Date: 2016-12-14
INT BUSINESS MASCH CORP

AI Technical Summary

Problems solved by technology

The method further includes sending, by the plurality of model update processing instances, candidate leaf splitting actions to a plurality of conflict resolve processing instances.
The method further includes identifying, by the plurality of conflict resolve processing instances, conflicting leaf splitting actions.

Examples

Embodiment Construction

[0023] Real-world applications of big data stream processing present several challenges. First, the data arrival rate is high. For example, in a small-scale connected vehicle platform, a Global Positioning System (GPS) application handles one million GPS data instances per second. Second, the number of data attributes (the feature size) can be large. For example, real-time text analysis considers ten thousand or more attributes. Third, with data arriving twenty-four hours a day, seven days a week, the amount of data to process is effectively unbounded.

[0024] The illustrative embodiments provide mechanisms that enable horizontal decision tree learning from very high rate data streams. In some applications, such as connected car or vehicle-to-vehicle communication scenarios, the number of attributes is small, but the data rate is very high. The mechanisms of the illustrative embodiments horizontally parallelize the most computationally intensive part of horizontal decision tree learni...
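As an illustration of what splitting a stream "horizontally" can mean, the following minimal Python sketch partitions records by row across workers and merges per-worker counts. It is a sketch under stated assumptions, not the patented mechanism: the round-robin routing, the `(attribute, value, label)` count statistics, and all function names are hypothetical, since the paragraph above is truncated before it names the parallelized step.

```python
from collections import Counter
from itertools import cycle


def distribute_round_robin(records, num_workers):
    """Assign each record to one worker in turn (row-wise partitioning).

    Assumption: round-robin routing; the source does not state the policy.
    """
    shards = [[] for _ in range(num_workers)]
    for worker, record in zip(cycle(range(num_workers)), records):
        shards[worker].append(record)
    return shards


def local_counts(shard):
    """Count (attribute, value, label) occurrences seen by one worker."""
    counts = Counter()
    for features, label in shard:
        for attribute, value in features.items():
            counts[(attribute, value, label)] += 1
    return counts


def merge_counts(partials):
    """Sum per-worker counts into global counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical connected-vehicle records: few attributes, many rows.
records = [
    ({"speed": "high", "lane": "left"}, "brake"),
    ({"speed": "low", "lane": "left"}, "cruise"),
    ({"speed": "high", "lane": "right"}, "brake"),
    ({"speed": "low", "lane": "right"}, "cruise"),
    ({"speed": "high", "lane": "left"}, "cruise"),
]

shards = distribute_round_robin(records, num_workers=3)
merged = merge_counts(local_counts(shard) for shard in shards)
```

Because the counts are additive, the merged result matches what a single worker would compute over the whole stream, which is why row-wise partitioning suits streams with few attributes and a very high arrival rate.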

Abstract

The invention relates generally to horizontal decision tree learning from very high rate data streams. A mechanism is provided in a data processing system for distributed tree learning. A source processing instance distributes data record instances to a plurality of model update processing instances. The model update processing instances determine, in parallel, candidate leaf splitting actions in a decision tree based on the data record instances, and send the candidate leaf splitting actions to a plurality of conflict resolve processing instances. The conflict resolve processing instances identify conflicting leaf splitting actions and apply tree structure changes to the decision tree in the model update processing instances.
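The conflict-resolution step described in the abstract can be sketched roughly as follows. This is only an illustration under assumptions: the `SplitProposal` fields, the routing of proposals by leaf id, and the tie-break rule (highest gain wins, then lowest proposer id) are all hypothetical, because the abstract does not say how a conflict is decided.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitProposal:
    """A candidate leaf splitting action from one model updater (hypothetical)."""
    leaf_id: int
    attribute: str
    gain: float
    proposer: int


def route_to_resolver(proposal, num_resolvers):
    """Send all proposals for one leaf to the same resolver (assumed policy)."""
    return proposal.leaf_id % num_resolvers


def resolve(proposals):
    """Keep one winning proposal per leaf.

    Assumed rule: highest gain wins; ties go to the lowest proposer id.
    """
    by_leaf = defaultdict(list)
    for proposal in proposals:
        by_leaf[proposal.leaf_id].append(proposal)
    return {
        leaf_id: max(group, key=lambda p: (p.gain, -p.proposer))
        for leaf_id, group in by_leaf.items()
    }


def apply_changes(replica, winners):
    """Apply the agreed splits to one updater's copy of the tree.

    A replica is modelled as a dict: leaf id -> split attribute (or None).
    """
    for leaf_id, winner in winners.items():
        replica[leaf_id] = winner.attribute
    return replica


proposals = [
    SplitProposal(leaf_id=7, attribute="speed", gain=0.30, proposer=0),
    SplitProposal(leaf_id=7, attribute="heading", gain=0.50, proposer=1),
    SplitProposal(leaf_id=9, attribute="lane", gain=0.20, proposer=2),
]
winners = resolve(proposals)
replicas = [apply_changes({7: None, 9: None}, winners) for _ in range(3)]
```

Applying the same winners to every replica keeps the model updaters' trees identical, which is the property that conflict resolution is meant to preserve when several updaters propose different splits for the same leaf.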

Description

Technical field

[0001] The present application relates generally to an improved data processing apparatus and method, and more particularly to mechanisms for enabling horizontal decision tree learning from very high rate data streams.

Background

[0002] Big data is the term used for data collections that are too large or complex for traditional data processing applications to handle. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers to the use of predictive analytics or other advanced methods to extract value from data, and seldom to a data collection of a particular size.

[0003] Stream computing is a key theme of big data. Stream computing is affected by the velocity, volume, veracity, and variety of data. Stream computing applications must address low processing latency, high...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/20
CPC: G06N20/20, G06F18/24, G06F8/35, G06N20/00, G06N5/045
Inventor: 董维山, 高鹏, 胡国强, 李长升, 李旭良, 马春洋, 王植, 张欣
Owner INT BUSINESS MASCH CORP