A decision tree generation method based on an ID3 algorithm

A decision tree algorithm technology, applied in computing, computer parts, instruments, etc., that addresses open questions about discretizing continuous attributes and achieves reasonable feature selection while avoiding overfitting.

Inactive Publication Date: 2019-03-29

TIANJIN UNIV



AI Technical Summary

Problems solved by technology

The idea of using a clustering algorithm to discretize continuous attribute values in a decision tree has been proposed, but the specific implementation, prediction accuracy, applicable scenarios, and limitations of this approach still need further analysis and discussion.

Embodiment Construction

[0014] The basic idea of the present invention is to use the K-means++ algorithm to discretize the continuous attribute values in the data set, then calculate the importance SGA(a, P, A) of each conditional attribute and select the attribute with the highest importance as the split node. This is repeated until all conditional attributes have been used as split nodes. Finally, the result is pruned to form the decision tree.
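The discretization step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a single continuous attribute held in a Python list, and the function names `kmeanspp_init`, `assign`, and `discretize`, as well as the `seed` and `max_iter` parameters, are hypothetical. It runs one-dimensional K-means with K-means++ seeding and replaces each continuous value with the index of its cluster.

```python
import random


def kmeanspp_init(values, k, rng):
    """Pick up to k initial centers using K-means++ seeding."""
    centers = [rng.choice(values)]
    while len(centers) < k:
        # Squared distance from each value to its nearest chosen center.
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(d2) == 0:  # fewer distinct values than k: stop early
            break
        # Far-away values are proportionally more likely to be picked.
        centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers


def assign(values, centers):
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda j: (v - centers[j]) ** 2)
            for v in values]


def discretize(values, k, seed=0, max_iter=100):
    """Replace continuous values with integer bin labels (0 = lowest bin)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(values, k, rng)
    for _ in range(max_iter):
        labels = assign(values, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [v for v, l in zip(values, labels) if l == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    labels = assign(values, centers)
    # Renumber clusters so that labels increase with the cluster center.
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[l] for l in labels]


print(discretize([0.9, 1.0, 1.2, 9.8, 10.0, 10.3], k=2))  # [0, 0, 0, 1, 1, 1]
```

The number of bins `k` corresponds to the "number of values after discretization" that the method asks the user to determine. How the patent chooses that number is not stated in this excerpt.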

[0015] As shown in Figure 1, the specific steps are as follows:

[0016] 1) Data initialization: count the samples in the training set. Assuming the training set D contains K classes in total, count the number of samples in each class subset D1, ..., DK.

[0017] 2) Determine whether the attribute is discrete. If it is, go to step 3. Otherwise, choose the number of discrete values, apply the K-means++ algorithm to discretize the attribute, and replace the original continuous values with the discrete values.

[0018] 3) Calculate the importance of active conditional attribu...
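Step 3 is cut off in this excerpt and the importance measure SGA(a, P, A) is not defined here, so it cannot be reproduced. As a rough illustration of the class counting in step 1 and of attribute selection in general, the sketch below substitutes the information gain used by classic ID3. That substitution is an assumption, not the patent's measure, and the names `entropy`, `information_gain`, and `best_attribute` and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the discrete attribute attr."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute with the largest information gain (stand-in for SGA)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Toy training set: income_bin is an already-discretized attribute.
rows = [
    {"income_bin": 0, "student": "no"},
    {"income_bin": 0, "student": "yes"},
    {"income_bin": 1, "student": "no"},
    {"income_bin": 1, "student": "yes"},
]
labels = ["no", "no", "yes", "yes"]

class_counts = Counter(labels)  # step 1: number of samples in each class
print(class_counts)
print(best_attribute(rows, labels, ["income_bin", "student"]))  # income_bin
```

In the toy data, `income_bin` separates the two classes perfectly (gain of 1 bit) while `student` carries no information (gain of 0), so `income_bin` is selected as the split node.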

Abstract

The invention relates to a decision tree generation method based on an ID3 algorithm, which improves the ID3 algorithm. The K-means++ algorithm is used to discretize the values of the continuous attributes in the data set; the importance SGA(a, P, A) of each conditional attribute is then calculated, and the attribute with the highest importance is selected as the split node. This is repeated until all conditional attributes have been used as split nodes, and finally pruning is performed to form the decision tree.

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and data mining.

Background art

[0002]-[0003] Data mining is the analysis of observed data sets (often huge) with the aim of discovering unknown relationships and summarizing the data in novel ways that the data owner can understand and value. The "observed data" in this definition stands in contrast to data "obtained in the laboratory". Generally speaking, the data processed by data mining has been collected for some other purpose, not for data analysis itself. This means that the data acquisition strategy plays no part in the goals of data mining, a feature that distinguishes data mining from most statistical tasks. In statistics, efficient strategies are often used to collect data in order to answer specific questions. Data mining seeks relationships in the data set, that is, to find the representation of a certain feature of the data that is accurate, convenient ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/23213, G06F18/24323, G06F18/214

Inventor: 王宝亮, 马明杰

Owner: TIANJIN UNIV
