Data classification method based on ID3 algorithm

A data classification technique, applied in computing, computer components, instruments and similar fields, that addresses the multi-valued bias problem and improves prediction accuracy.

Pending Publication Date: 2019-07-12

XI'AN POLYTECHNIC UNIVERSITY

Problems solved by technology

[0012] The purpose of the present invention is to provide a data classification method based on the ID3 algorithm that solves the multi-valued bias problem of the prior art.

Examples

Embodiment 1

[0074] A commercial car-purchase customer database (shown in Table 1) is used as the training set D; the sample set is obtained after the data is selected, preprocessed and converted. The set contains four conditional attributes: favorite season (4 values: spring, summer, autumn, winter), whether the customer is a business person (2 values: yes, no), income (3 values: high, medium, low) and driving level (2 values: good, average). The sample set is divided according to the category attribute "whether to buy a car" (2 values: buy, not buy).

[0075]

[0076] The data classification method of the present invention is used to classify each attribute in the training set D, as follows:

[0077] Step 1. Obtain the information entropy I, the conditional entropy E(Ai) and the information gain Gain(Ai):

[0078] Step 1.1, calculate the information entropy I of the classification...
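
The three quantities named in Step 1 can be sketched with the standard ID3 definitions. This is a minimal illustration, not the patent's own code: the function names and the six sample rows are hypothetical, and Table 1 is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy I of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, labels):
    """Conditional entropy E(Ai): entropy of the labels inside each
    attribute value, weighted by how often that value occurs."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(values, labels):
    """Information gain Gain(Ai) = I - E(Ai)."""
    return entropy(labels) - conditional_entropy(values, labels)


# Hypothetical rows, NOT taken from Table 1 of the patent.
income = ["high", "high", "medium", "low", "low", "medium"]
buys = ["buy", "buy", "buy", "no", "no", "no"]

print(entropy(buys))
print(conditional_entropy(income, buys))
print(information_gain(income, buys))
```

In plain ID3 the attribute with the largest gain becomes the split node; the patent then adjusts this gain before choosing the node.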

Embodiment 2

[0125] The Benxi Formation database of the Sulige Gas Field (shown in Table 2) is used as the training set Y. Based on the single-layer gas test data of Block X in the Sulige Gas Field over the years, four conditional attributes are used: "effective thickness", "shale content", "matrix permeability" and "gas saturation".

[0126]

[0127]

[0128] The k-means cluster analysis method was used to select, preprocess and transform the data. Take the effective thickness as an example:

[0129] Step 1. Randomly pick three values among the effective thicknesses of these 15 wells: μ1 = 3.3, μ2 = 5.5, μ3 = 6.6;

[0130] Step 2. Use formula (11) to determine the cluster to which the effective thickness of each well belongs; there are 3 clusters;

[0131] Step 3. For each cluster, use formula (12) to recalculate the centroid: μ1 = 4, μ2 = 6.

[0132] The effective thickness can be divided into three intervals {y<4, 4≤y≤6, 6
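
The clustering steps above can be sketched as a one-dimensional k-means loop. Formulas (11) and (12) are not reproduced in this excerpt, so the sketch assumes they are the usual nearest-centroid assignment and cluster-mean update. The function name `kmeans_1d` and the 15 thickness values are hypothetical; only the three starting centroids come from the text.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain 1-D k-means: assign each value to its nearest centroid,
    then move each centroid to the mean of its cluster, until stable."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical effective thicknesses for 15 wells (NOT the Table 2 data).
thickness = [2.1, 2.8, 3.3, 3.6, 4.0, 4.5, 5.0, 5.5, 5.8, 6.1,
             6.6, 7.0, 7.4, 8.2, 9.0]

# Starting centroids quoted in Step 1 of the text.
centroids, clusters = kmeans_1d(thickness, [3.3, 5.5, 6.6])

# Each cluster index can then serve as a discrete attribute value for ID3.
for index, members in enumerate(clusters):
    print(index, round(centroids[index], 2), members)
```

Discretizing a continuous attribute this way matters because ID3 only splits on categorical values.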

Abstract

The invention discloses a data classification method based on the ID3 algorithm. The method comprises the following steps: the information gain Gain(Ai) obtained by the ID3 algorithm is optimized with an equalization coefficient R(Ai) to obtain an optimized information gain Gain(Ai)new; the root node and branch nodes of a decision tree are determined from Gain(Ai)new; and the attributes Ai are classified. By introducing the equalization coefficient R(Ai) and an attribute deviation threshold T, the degree of multi-valued bias can be measured and controlled, the bias of information gain toward attributes with more values is avoided, and the prediction accuracy, practicability and effectiveness of the ID3 algorithm are further improved.
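
The multi-valued bias that the abstract refers to can be seen on a toy table. The formulas for R(Ai) and the threshold T are not given in this excerpt, so the sketch below does not implement them. As a clearly labeled stand-in it uses the gain ratio from C4.5, a different and well-known correction, only to show how adjusting the gain can change the chosen root node. The table, the attribute names and `pick_root` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy of a list of discrete items, in bits."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain(values, labels):
    """Plain ID3 information gain of one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 gain ratio: a stand-in correction, NOT the patent's R(Ai)."""
    split_info = entropy(values)
    return gain(values, labels) / split_info if split_info > 0 else 0.0


def pick_root(table, labels, score):
    """Return the attribute name with the highest score."""
    return max(table, key=lambda name: score(table[name], labels))


# Hypothetical data: customer_id is unique per row, so it has many values.
table = {
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "income": ["high"] * 4 + ["low"] * 4,
}
labels = ["buy", "buy", "buy", "buy", "buy", "no", "no", "no"]

print(pick_root(table, labels, gain))        # plain gain favors the many-valued attribute
print(pick_root(table, labels, gain_ratio))  # the corrected score favors income
```

An identifier column splits the data into single-row groups, so its plain gain equals the full class entropy even though it cannot generalize to new customers; any correction for multi-valued bias has to penalize that kind of attribute.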

Description

Technical field

[0001] The invention belongs to the technical field of data classification methods and relates to a data classification method based on the ID3 algorithm.

Background art

[0002] Owing to the rapid development of software and Internet technology, we are in an era of information explosion. With the development of database and data mining technology, people can efficiently collect and store large amounts of data, discover potential relationships and rules in the data, and predict future trends, so as to provide decision makers with valuable information to support their decisions. Classification is the most commonly used data analysis method in data mining. The function of a classification algorithm is to accurately determine, from the data set, the category to which a sample belongs. The current main classification techniques and methods are: Bayesian classification, rule induction, decis...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/23213, G06F18/24

Inventor: 孟雅蕾, 王予

Owner: XI'AN POLYTECHNIC UNIVERSITY
