Parallel generation method and device of decision tree on the basis of layered strategy

A layered-strategy decision tree technology, applied in the field of data communication, that addresses problems such as the performance degradation of decision tree parallelization algorithms.

Inactive Publication Date: 2016-07-27
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Because each map task in the MapReduce parallelization algorithm operates at the attribute level, and the intermediate results of each map task must be written to disk, the performance of the decision tree parallelization algorithm degrades severely on high-dimensional, massive training data as the number of nodes in the decision tree increases, possibly to the point of being unusable.

Examples

Embodiment Construction

[0070] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0071] The decision tree is a main technique for classification and prediction. It focuses on inferring classification rules, expressed as a decision tree, from an unordered set of training data. In the existing technology, the tree is generated in a top-down recursive manner. For each node, determine the splitting attribute of the node, det...

Abstract

An embodiment of the invention provides a method and device for parallel generation of a decision tree based on a layered strategy. The decision tree is constructed layer by layer from the top down. When the splitting attribute and the optimal split point of each node of the nth layer are calculated, the training data set needs to be scanned only once: each piece of training data is assigned to a node of the nth layer according to the filter condition of each node of that layer. The splitting attribute and the optimal split point of each node are then calculated independently to generate the node information of each node, and the decision tree is constructed from the node information of each node of each layer. In this method, when the splitting attribute and the optimal split point of each node of a layer are calculated, the number of training records of each category in each splitting interval of each attribute in each node is counted in parallel according to the filter condition of each node. The splitting attribute and the optimal split point of all nodes of the same layer can therefore be computed in parallel, and the performance of decision tree generation is improved.
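The layer-by-layer procedure in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation: it assumes numeric attributes, binary threshold splits, and Gini impurity as the split criterion (the text above does not name a criterion), and all function and field names are hypothetical. The scan here is sequential; in a parallel setting the per-node statistics could be accumulated by separate workers and merged.

```python
from collections import Counter, defaultdict

def gini(counts):
    """Gini impurity of a Counter of class labels (assumed criterion)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def matches(features, conditions):
    """A filter condition is a list of (attr, threshold, go_left) tests."""
    return all((features[a] <= t) == left for a, t, left in conditions)

def scan_layer(data, layer):
    """One pass over the data: stats[node][attr][value] -> class counts."""
    stats = [defaultdict(lambda: defaultdict(Counter)) for _ in layer]
    for features, label in data:
        for i, node in enumerate(layer):
            if matches(features, node["filter"]):
                for attr, value in enumerate(features):
                    stats[i][attr][value][label] += 1
                break  # each record belongs to exactly one node of the layer
    return stats

def best_split(node_stats):
    """Pick (attr, threshold) with the lowest weighted Gini for one node."""
    best_attr, best_thr, best_score = None, None, float("inf")
    class_counts = Counter()
    for attr, by_value in node_stats.items():
        total = Counter()
        for c in by_value.values():
            total += c
        class_counts = total
        n = sum(total.values())
        left = Counter()
        for v in sorted(by_value)[:-1]:  # candidate split points
            left += by_value[v]
            right = total - left
            nl = sum(left.values())
            score = (nl * gini(left) + (n - nl) * gini(right)) / n
            if score < best_score:
                best_attr, best_thr, best_score = attr, v, score
    return best_attr, best_thr, class_counts

def build_tree(data, max_depth=3):
    """Grow the tree one layer at a time, scanning the data once per layer."""
    root = {"filter": []}
    layer = [root]
    for depth in range(max_depth + 1):
        if not layer:
            break
        stats = scan_layer(data, layer)
        next_layer = []
        for node, node_stats in zip(layer, stats):  # nodes are independent
            attr, thr, counts = best_split(node_stats)
            node["label"] = counts.most_common(1)[0][0] if counts else None
            if attr is None or len(counts) < 2 or depth == max_depth:
                continue  # leaf: pure, unsplittable, or depth limit reached
            node["attr"], node["threshold"] = attr, thr
            node["left"] = {"filter": node["filter"] + [(attr, thr, True)]}
            node["right"] = {"filter": node["filter"] + [(attr, thr, False)]}
            next_layer += [node["left"], node["right"]]
        layer = next_layer
    return root

def predict(node, features):
    """Walk from the root to a leaf and return its majority label."""
    while "attr" in node:
        go_left = features[node["attr"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]
```

In this sketch the loop over `zip(layer, stats)` touches no shared state between nodes, which is the property that would let the split computation for nodes of the same layer run in parallel.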

Description

Technical field

[0001] Embodiments of the present invention relate to data communication technologies, and in particular to a method and device for parallel generation of decision trees based on a layered strategy.

Background

[0002] A decision tree is a main method for classification and prediction. It is a tree that represents the correspondence between decisions and their results. Each non-leaf node in the tree represents a decision whose value leads to a different decision result (a leaf node) or affects subsequent decision choices. Each path from the root node to a leaf node corresponds to one rule, and the whole tree corresponds to a set of rules.

[0003] The existing decision tree generation process is recursive: the split attribute of each node of the decision tree is calculated sequentially from top to bottom, and the calculation of the split att...

Application Information

IPC(8): G06F17/30
Inventor 曹莉金晓明
Owner HUAWEI TECH CO LTD