Decision tree incremental learning method oriented to information big data

A technology combining incremental learning and decision trees, applied in machine learning, computing models, computing and related fields, which can solve problems such as unacceptably long processing times, reduced classification accuracy, and excessively large decision trees.

Inactive Publication Date: 2017-09-22
HARBIN ENG UNIV


Problems solved by technology

[0004] Early decision tree algorithms, typified by ID3 and C4.5, are generally not suited to incremental learning. As data volumes grow rapidly, following the traditional approach makes the whole process unacceptably time-consuming, so a number of incremental decision tree learning algorithms, such as ID5R, were later developed.
In these algorithms, the number of branches created at a node corresponds exactly to the number of values of the chosen splitting attribute. If the decision tree is always split in this way, it may grow too large, which limits its use in practical applications; too many branches can also cause overfitting and reduce classification accuracy. A sketch of the binary-grouping alternative follows.
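To make the contrast concrete, the following is a minimal Python sketch, not taken from the patent, of the binary-grouping idea described in the abstract: instead of creating one branch per attribute value, the values of a candidate attribute are partitioned into two groups, and the grouping with the highest information gain defines a two-way split. The helper names (entropy, best_binary_grouping) and the dict-based record format are illustrative assumptions.

```python
# Sketch: binary splits via two-way value grouping (illustrative, not the
# patent's verbatim algorithm). Records are dicts like
# {"outlook": "sunny", "play": "no"}.
from itertools import combinations
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_binary_grouping(rows, attr, label):
    """Return (gain, group_a, group_b) for the best two-way grouping
    of attr's values at this node."""
    values = sorted({r[attr] for r in rows})
    base = entropy([r[label] for r in rows])
    best = (0.0, None, None)
    # Enumerate non-empty proper subsets; fixing values[0] in group_a
    # avoids counting each grouping twice.
    rest = values[1:]
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            group_a = {values[0], *combo}
            group_b = set(values) - group_a
            if not group_b:
                continue
            split = [[r[label] for r in rows if r[attr] in g]
                     for g in (group_a, group_b)]
            gain = base - sum(len(s) / len(rows) * entropy(s) for s in split)
            if gain > best[0]:
                best = (gain, group_a, group_b)
    return best
```

For an attribute with m values this enumerates 2^(m-1) - 1 groupings, so it is only practical for modest m; the point is that the resulting split is always binary, regardless of how many distinct values the attribute has.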




Embodiment Construction

[0024] The present invention is described in more detail below with reference to Figure 1, using an example.

[0025] Step 1: Take node n0 as the root node of the decision tree T and calculate its node splitting metric SC(n0); if n0 is a separable node, put n0 into the set Q of nodes to be split. In the node splitting metric, |ni| refers to the number of records belonging to node ni, and MG(ni) is the maximum information gain obtained when node ni is split into two branches.
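The formula for SC is an image in the source and does not survive in this text; only its ingredients are described: |ni|, the record count at node ni, and MG(ni), the best two-way information gain. The sketch below therefore assumes a simple weighting, SC(n) = |n| × MG(n), which favours large, impure nodes; this is an assumption, not the patent's verbatim formula. It reuses best_binary_grouping from the earlier sketch.

```python
# Sketch of the node splitting metric SC. ASSUMPTION: the patent's exact
# formula is not recoverable from this text, so we take SC(n) = |n| * MG(n):
# the best two-way information gain at n, weighted by its record count.
def splitting_metric(rows, attrs, label):
    if not rows:
        return 0.0
    # MG(n): the largest information gain over all candidate attributes,
    # each evaluated at its best two-way value grouping.
    mg = max((best_binary_grouping(rows, a, label)[0] for a in attrs),
             default=0.0)
    return len(rows) * mg
```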

[0026] Step 2: While the number of leaf nodes in the decision tree T is less than the specified maximum number of leaf nodes and the set Q is not empty, repeat the following operations on the nodes in the set Q:

[0027] Step 3: From the candidate splitting node set Q, select the node nb with the largest splitting metric value, and delete nb from the set Q.

[0028] Step 4: Split node nb into two branches, and compute the node splitting metrics of the two child nodes generated by splitting nb...
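Putting Steps 1 through 4 together, here is a hedged sketch of the best-first growth loop, reusing splitting_metric and best_binary_grouping from the earlier sketches. The dict-based node representation and the heap are implementation choices of this sketch, not details given in the patent.

```python
# Sketch of Steps 1-4: best-first binary tree growth with a leaf budget.
import heapq
from itertools import count

def grow_tree(rows, attrs, label, max_leaves):
    """Always split the candidate node with the largest splitting metric SC."""
    tie = count()                       # tie-breaker so the heap never compares dicts
    root = {"rows": rows, "children": None}
    q = []                              # max-heap on SC via negated keys (Step 1)
    sc = splitting_metric(rows, attrs, label)
    if sc > 0:                          # only separable nodes enter Q
        heapq.heappush(q, (-sc, next(tie), root))
    leaves = 1
    while q and leaves < max_leaves:    # Step 2: leaf budget and Q non-empty
        _, _, node = heapq.heappop(q)   # Step 3: pop node n_b with largest SC
        # Step 4: split n_b on the attribute/grouping with the highest gain.
        best = None
        for a in attrs:
            gain, ga, gb = best_binary_grouping(node["rows"], a, label)
            if best is None or gain > best[0]:
                best = (gain, a, ga, gb)
        _, attr, ga, gb = best
        kids = [{"rows": [r for r in node["rows"] if r[attr] in g],
                 "children": None} for g in (ga, gb)]
        node["children"] = (attr, ga, kids)
        leaves += 1                     # one leaf became two leaves
        for kid in kids:                # compute SC for each child; requeue if separable
            sc = splitting_metric(kid["rows"], attrs, label)
            if sc > 0:
                heapq.heappush(q, (-sc, next(tie), kid))
    return root
```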



Abstract

The invention provides a decision tree incremental learning method oriented to information big data. Before a node is split, the attribute values of each candidate attribute at the node are independently grouped into two sets, and the candidate attribute with the highest information gain is selected to divide the node into two branches. When selecting the next node to split, node splitting metric values are calculated for all candidate nodes, and the candidate node with the largest node splitting metric value is always chosen as the next splitting node. IID5R adds a function for evaluating the quality of classification attributes. The method combines NOLCDT with IID5R into a hybrid classifier algorithm, HCS, which consists of two main stages: constructing an initial decision tree and carrying out incremental learning. NOLCDT builds the initial decision tree, and IID5R then performs the incremental learning. The HCS algorithm combines the advantages of decision trees and incremental learning: it is easy to understand and well suited to incremental learning.
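As a reading aid, the two-stage control flow described in the abstract might look like the following sketch; grow_tree stands in for NOLCDT (from the sketch above), and the leaf update is only a placeholder for IID5R's incremental restructuring, which the abstract does not spell out.

```python
def hcs_train(initial_rows, attrs, label, max_leaves, stream):
    """Sketch of the two-stage HCS flow: build an initial tree, then
    fold streamed records in one at a time."""
    tree = grow_tree(initial_rows, attrs, label, max_leaves)  # stage 1: NOLCDT
    for record in stream:                                     # stage 2: incremental
        node = tree
        while node["children"] is not None:    # route the record to its leaf
            attr, group_a, kids = node["children"]
            node = kids[0] if record[attr] in group_a else kids[1]
        node["rows"].append(record)            # placeholder: IID5R would also
                                               # re-evaluate and restructure splits
    return tree
```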

Description

technical field

[0001] The invention relates to a decision tree incremental learning method.

Background technique

[0002] With the rapid development of database technology, the volume of business intelligence data is also growing rapidly. These data contain a great deal of information that is not yet known; mining it would be very helpful to people's work and lives. Therefore, to make use of the information implicit in the data, some analytical processing of the data is required. A large amount of knowledge is locked in the data, that is, knowledge that may be important but has not yet been extracted. Current databases implement the common data manipulation functions, but they cannot identify whether the data are related or what rules they contain, and they offer no way to estimate future trends from current data. It is precisely for this reason that there will be a situation that seems to be unreasonable, that i...

Claims


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06N99/00
CPCG06N20/00
Inventor 周连科宋奎勇何鸣王红滨王念滨孙静王瑛琦朱洪瑞苏畅张海斌
Owner HARBIN ENG UNIV