A decision tree generation method based on an ID3 algorithm

A decision tree generation technique, classified under computing, computer parts, and instruments, that addresses open questions about discretizing continuous attributes in decision trees and aims at reasonable feature selection while avoiding overfitting.

Inactive Publication Date: 2019-03-29
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

Using a clustering algorithm to discretize the continuous attribute values in a decision tree has been proposed, but the specific implementation, prediction accuracy, applicable scenarios, and limitations of this approach still need further analysis and discussion.


Embodiment Construction

[0014] The basic idea of the present invention is as follows: use the K-means++ algorithm to discretize the continuous attribute values in the data set, then calculate the importance SGA(a, P, A) of each conditional attribute and select the attribute with the greatest importance as the split point. This is repeated until all conditional attributes have been used as split nodes. Finally, the result is pruned to form the decision tree.
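The attribute-selection step can be illustrated with a minimal sketch. The importance measure SGA(a, P, A) is not defined in the text available here, so the sketch below swaps in the information gain used by the standard ID3 algorithm as a stand-in; it is not the patent's measure. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the samples on `attr`.

    Stand-in for SGA(a, P, A), which the available text does not define.
    """
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Return the attribute with the highest score (first one wins on ties)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: every attribute is already discrete.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
split_on = best_attribute(rows, labels, ["outlook", "windy"])
```

In this toy data "outlook" separates the two classes perfectly while "windy" carries no information, so "outlook" is chosen as the split attribute. A different importance measure would only require replacing `information_gain`.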

[0015] As shown in Figure (1), the specific steps are as follows:

[0016] 1) Data initialization: count the number of samples in the training set. Assuming the training set D has K classes in total, count the number of samples in each class D1...DK.

[0017] 2) Determine whether the attribute is discrete. If it is, go to step 3. Otherwise, determine the number of values after discretization, apply the K-means++ algorithm to discretize the attribute, and replace the original continuous values with the discrete values.

[0018] 3) Calculate the importance of active conditional attribu...
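Steps 1 and 2 above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: the source does not say how the number of discrete values is chosen, so `k` is simply passed in, and the function names, the fixed seed, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def nearest(value, centres):
    """Index of the centre closest to `value`."""
    return min(range(len(centres)), key=lambda i: (value - centres[i]) ** 2)


def kmeanspp_init(values, k, rng):
    """K-means++ seeding: each further centre is drawn with probability
    proportional to its squared distance from the nearest centre so far."""
    centres = [rng.choice(values)]
    while len(centres) < k:
        d2 = [min((v - c) ** 2 for c in centres) for v in values]
        centres.append(rng.choices(values, weights=d2, k=1)[0])
    return centres


def discretize(values, k, seed=0, max_iter=100):
    """Replace continuous values by cluster codes 0..k-1, ordered by centre."""
    if k > len(set(values)):
        raise ValueError("k must not exceed the number of distinct values")
    rng = random.Random(seed)
    centres = kmeanspp_init(values, k, rng)
    for _ in range(max_iter):  # Lloyd iterations
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centres)].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    # Renumber clusters so that a smaller code means a smaller centre.
    order = sorted(range(k), key=lambda i: centres[i])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[nearest(v, centres)] for v in values]


# Step 1: count the samples in each class (hypothetical labels).
labels = ["a", "a", "b", "a", "b", "b", "c", "c", "c"]
class_counts = Counter(labels)

# Step 2: discretize one continuous attribute into three values.
values = [1.0, 1.1, 0.9, 50.0, 51.0, 49.0, 100.0, 101.0, 99.0]
codes = discretize(values, k=3)
```

The codes are ordered by cluster centre so that the discrete values keep the ordering of the original continuous attribute; this is a design choice of the sketch, not something the source specifies.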


Abstract

The invention relates to a decision tree generation method based on the ID3 algorithm, which improves the ID3 algorithm. The K-means++ algorithm is used to discretize the values of the continuous attributes in the data set; the importance SGA(a, P, A) of each conditional attribute is then calculated, and the attribute with the greatest importance is selected as the split point. This is repeated until all conditional attributes have been used as split nodes, and finally pruning is performed to form the decision tree.

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and data mining.

Background art

[0002]-[0003] Data mining is the analysis of observed data sets (often huge) with the aim of discovering unknown relationships and summarizing the data in novel ways that the data owner can understand and value. The "observed data" in this definition is contrasted with data "obtained in a laboratory". Generally speaking, the data processed by data mining has been collected for some other purpose, not for the data analysis itself. This means that data mining plays no role in the data acquisition strategy. This is a feature that distinguishes data mining from most statistical tasks, where efficient strategies are often used to collect data to answer specific questions. Data mining seeks relationships in the data set, that is, to find the representation of a certain feature of the data that is accurate, convenient ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/24323, G06F18/214
Inventors: 王宝亮, 马明杰
Owner: TIANJIN UNIV