Data classification method based on ID3 algorithm

A data classification technique based on the ID3 algorithm, applied in computing, computer components, and instruments, that addresses the problem of multi-valued bias and improves prediction accuracy.

Pending Publication Date: 2019-07-12
XI'AN POLYTECHNIC UNIVERSITY
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

[0012] The purpose of the present invention is to provide a data classification method based on the ID3 algorithm that solves the problem of multi-valued bias in the prior art.
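Multi-valued bias can be illustrated with a small hypothetical example (the data below is invented for illustration and is not from the patent). An attribute that takes a distinct value in every row splits the training set into pure one-row subsets, so its conditional entropy is 0 and its information gain equals the full entropy I, the largest gain possible, even though such an attribute has no predictive value. The sketch assumes the standard base-2 ID3 definitions of entropy and information gain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of splitting `labels` by the parallel list `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy: size-weighted entropy of each value's subset.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

# Hypothetical toy data (not from the patent): 6 samples, 2 classes.
labels = ["buy", "buy", "no", "no", "buy", "no"]
useful = ["high", "high", "low", "low", "high", "high"]  # 2 distinct values here
row_id = ["r1", "r2", "r3", "r4", "r5", "r6"]            # unique per row, meaningless

print(round(info_gain(useful, labels), 3))  # 0.459
print(round(info_gain(row_id, labels), 3))  # 1.0
```

The meaningless row identifier wins the comparison, which is exactly the preference for many-valued attributes that plain ID3 exhibits.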



Examples


Embodiment 1

[0074] The commercial car purchase customer database (Table 1) is used as the training set D, and the sample set is obtained after the data are selected, preprocessed, and transformed. The set contains 4 conditional attributes: favorite season (4 values: spring, summer, autumn, winter), whether the customer is a businessperson (2 values: yes, no), income (3 values: high, medium, low), and driving level (2 values: good, average). The samples are classified by the class attribute "whether to buy a car" (2 values: buy, not buy).

[0075]

[0076] The data classification method of the present invention is applied to the attributes of training set D as follows:

[0077] Step 1. Obtain the information entropy I, the conditional entropy E(A_i), and the information gain Gain(A_i):

[0078] Step 1.1. Calculate the information entropy I of the classification...
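Step 1 computes the standard ID3 quantities. The patent's own formulas are not reproduced in this excerpt, so the sketch below assumes the usual textbook definitions: I = -Σ p_k log2 p_k over the class proportions, E(A_i) = Σ_v (|D_v|/|D|) I(D_v) over the subsets D_v where A_i = v, and Gain(A_i) = I - E(A_i). The rows are hypothetical and reuse only Embodiment 1's attribute names; they are not the data of Table 1.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """I = -sum_k p_k * log2(p_k) over the class proportions p_k."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, attr, target):
    """E(A) = sum_v |D_v|/|D| * I(D_v), where D_v are the rows with attr == v."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    return sum(len(g) / n * entropy(g) for g in groups.values())

def gains(rows, attrs, target):
    """Gain(A) = I - E(A) for every attribute in attrs."""
    i = entropy([r[target] for r in rows])
    return {a: i - conditional_entropy(rows, a, target) for a in attrs}

# Hypothetical rows using Embodiment 1's attribute names; NOT the Table 1 data.
rows = [
    {"season": "spring", "business": "yes", "income": "high",   "driving": "good",    "buy": "yes"},
    {"season": "summer", "business": "no",  "income": "low",    "driving": "average", "buy": "no"},
    {"season": "autumn", "business": "yes", "income": "medium", "driving": "good",    "buy": "yes"},
    {"season": "winter", "business": "no",  "income": "medium", "driving": "average", "buy": "no"},
    {"season": "spring", "business": "no",  "income": "high",   "driving": "average", "buy": "yes"},
    {"season": "summer", "business": "yes", "income": "low",    "driving": "average", "buy": "no"},
]
attrs = ["season", "business", "income", "driving"]
g = gains(rows, attrs, "buy")
root = max(g, key=g.get)  # plain ID3: split on the attribute with the largest gain
print(root)  # season
```

On this toy data the four-valued "season" attribute happens to separate the classes perfectly, so plain ID3 picks it as the root; this is the kind of preference for many-valued attributes that the method's equalization coefficient R(A_i) is intended to counteract.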

Embodiment 2

[0125] The Benxi Formation database of the Sulige Gas Field (Table 2) is used as the training set Y. Based on the single-layer gas test data of Block X in the Sulige Gas Field over the years, four conditional attributes are used: "effective thickness", "shale content", "matrix permeability", and "gas saturation".

[0126]

[0127]

[0128] The k-means cluster analysis method was used to select, preprocess, and transform the data. Taking the effective thickness as an example:

[0129] Step 1. Randomly pick three of the effective-thickness values of the 15 wells as the initial centroids: μ_1 = 3.3, μ_2 = 5.5, μ_3 = 6.6;

[0130] Step 2. Use formula (11) to determine the cluster to which each well's effective thickness belongs, giving 3 clusters;

[0131] Step 3. For each cluster, use formula (12) to recalculate the centroid: μ_1 = 4, μ_2 = 6.
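Steps 1 to 3 follow the usual k-means (Lloyd) iteration. Formulas (11) and (12) are not reproduced in this excerpt, so the sketch assumes they are the standard nearest-centroid assignment and cluster-mean update. The initial centroids 3.3, 5.5, and 6.6 come from Step 1; the thickness values are hypothetical placeholders, not the 15 wells of Table 2.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain 1-D k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step (assumed to be formula (11)):
        # each value joins the cluster whose centroid is nearest.
        clusters = [[] for _ in centroids]
        for x in values:
            j = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            clusters[j].append(x)
        # Update step (assumed to be formula (12)): centroid = mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, centroids)]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, clusters

# Hypothetical effective-thickness values (NOT the Table 2 data).
thickness = [2.0, 3.0, 4.0, 5.0, 5.5, 6.0, 7.0, 8.0]
centroids, clusters = kmeans_1d(thickness, [3.3, 5.5, 6.6])
print(centroids)  # [3.0, 5.5, 7.5]
```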

[0132] The effective thickness can be divided into three intervals {y<4, 4≤y≤6, 6
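Once the interval boundaries are fixed, each continuous value becomes a categorical attribute value that ID3 can split on. A minimal sketch, using the boundaries 4 and 6 from the recalculated centroids in Step 3 and assuming (the interval list above is truncated) that the third interval is y > 6; the function name and label strings are hypothetical.

```python
def thickness_interval(y, low=4.0, high=6.0):
    """Map an effective thickness y to one of three interval labels."""
    if y < low:
        return f"y<{low:g}"
    if y <= high:
        return f"{low:g}<=y<={high:g}"
    return f"y>{high:g}"  # assumption: the truncated third interval is y > 6

labels = [thickness_interval(y) for y in [2.0, 4.5, 7.0]]
print(labels)  # ['y<4', '4<=y<=6', 'y>6']
```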


Abstract

The invention discloses a data classification method based on the ID3 algorithm. The method comprises the following steps: optimizing the information gain Gain(A_i) obtained by the ID3 algorithm with an equalization coefficient R(A_i) to obtain an optimized information gain Gain(A_i)new; obtaining the root node and branch nodes of a decision tree according to Gain(A_i)new; and classifying the attributes A_i. Introducing the equalization coefficient R(A_i) and an attribute deviation threshold T makes it possible to measure and control the degree of multi-valued bias, avoiding the tendency of information gain to favor attributes with more values, and further improves the prediction accuracy, practicability, and effectiveness of the ID3 algorithm.
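The abstract describes building the root and branch nodes of the tree from the optimized gain. This excerpt does not give the formulas for R(A_i), Gain(A_i)new, or the threshold T, so the sketch below uses plain ID3 information gain as a stand-in and exposes the split criterion as a hypothetical `score` parameter where an optimized gain could be plugged in. The data is hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Plain ID3 gain; a stand-in for the patent's optimized Gain(A_i)new."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - cond

def build_tree(rows, attrs, target, score=info_gain):
    """Recursive ID3-style construction; `score` (hypothetical) picks each split.

    Returns a class label (leaf) or a node tuple (attr, branches, majority).
    """
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure node, or no attributes left to split on
    best = max(attrs, key=lambda a: score(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        branches[v] = build_tree(subset, rest, target, score)
    return (best, branches, majority)

def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        # Unseen attribute values fall back to the node's majority class.
        tree = branches.get(row[attr], majority)
    return tree

# Hypothetical training rows (not from the patent).
rows = [
    {"income": "high",   "business": "yes", "buy": "yes"},
    {"income": "high",   "business": "no",  "buy": "yes"},
    {"income": "low",    "business": "yes", "buy": "no"},
    {"income": "low",    "business": "no",  "buy": "no"},
    {"income": "medium", "business": "yes", "buy": "yes"},
    {"income": "medium", "business": "no",  "buy": "no"},
]
tree = build_tree(rows, ["income", "business"], "buy")
print(tree[0])  # income
print(classify(tree, {"income": "medium", "business": "yes"}))  # yes
```

Keeping the split criterion as a parameter leaves the recursion unchanged whichever gain measure is used, which matches the abstract's description of deriving the root and branch nodes from the optimized gain.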

Description

Technical field

[0001] The invention belongs to the technical field of data classification methods, and relates to a data classification method based on the ID3 algorithm.

Background

[0002] Owing to the rapid development of software and Internet technology, we are now in an era of information explosion. With the development of database and data mining technology, people can efficiently collect and store large amounts of data, discover latent relationships and rules in the data, and predict future trends, thereby providing decision makers with valuable information to support their decisions. Classification is the most commonly used data analysis method in data mining; its function is to accurately determine, from the data set, the category to which each item belongs. The current main classification techniques and methods are: Bayesian classification, rule induction, decis...


Application Information

Patent type and authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/24
Inventors: Meng Yalei (孟雅蕾), Wang Yu (王予)
Owner: XI'AN POLYTECHNIC UNIVERSITY