Data classification method based on ID3 algorithm
A data classification algorithm technology, applied in computing, computer components, instruments, etc., that addresses problems such as multi-valued bias and improves prediction accuracy
Pending Publication Date: 2019-07-12
XI'AN POLYTECHNIC UNIVERSITY
0 Cites, 3 Cited by
AI Technical Summary
Problems solved by technology
[0012] The purpose of the present invention is to provide a data classification method based on the ID3 algorithm that solves the multi-valued bias problem of the prior art.
Examples
Embodiment 1
[0074] A commercial car-purchase customer database (shown in Table 1) is used as the training set D; the sample set is obtained after the data is selected, preprocessed, and converted. The set contains 4 conditional attributes: favorite season (4 values: spring, summer, autumn, winter), whether the customer is a business person (2 values: yes, no), income (3 values: high, medium, low), and driving level (2 values: good, average). The sample set is divided according to the category attribute "whether to buy a car" (2 values: buy, not buy).
[0076] The data classification method of the present invention is used to classify each attribute in the training set D, as follows:
[0077] Step 1. Obtain the information entropy I, the conditional entropy E(A_i), and the information gain Gain(A_i):
[0078] Step 1.1, calculate the information entropy I of the classification...
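The quantities named in Step 1 can be sketched as follows. This is a minimal illustration that assumes the standard ID3 definitions of entropy, conditional entropy, and information gain; the patent's own formulas are cut off in this excerpt, and Table 1 is not reproduced, so the four sample rows and the function names below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy I of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, attr, target):
    """E(A_i): entropy of the target inside each value of attr, weighted by subset size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def information_gain(rows, attr, target):
    """Gain(A_i) = I - E(A_i), as in standard ID3."""
    return entropy([r[target] for r in rows]) - conditional_entropy(rows, attr, target)


# Hypothetical rows using two attribute names from Embodiment 1 (not the patent's Table 1).
rows = [
    {"income": "high", "business": "yes", "buy": "buy"},
    {"income": "high", "business": "no", "buy": "buy"},
    {"income": "low", "business": "yes", "buy": "not buy"},
    {"income": "low", "business": "no", "buy": "not buy"},
]

for attr in ("income", "business"):
    print(attr, information_gain(rows, attr, "buy"))
```

In this toy sample "income" separates the two classes completely while "business" carries no information, so plain ID3 would pick "income" as the root node.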
Embodiment 2
[0125] The Benxi Formation database of the Sulige Gas Field (shown in Table 2) is used as the training set Y. Based on the single-layer gas test data of Block X in the Sulige Gas Field over the years, 4 conditional attributes are selected: "effective thickness", "shale content", "matrix permeability", and "gas saturation".
[0128] The k-means cluster analysis method was used to select, preprocess and transform the data. Take the effective thickness as an example:
[0129] Step 1. Randomly pick three values among the effective thicknesses of these 15 wells: μ_1 = 3.3, μ_2 = 5.5, μ_3 = 6.6;
[0130] Step 2. Use formula (11) to determine the cluster to which each well's effective thickness belongs, giving 3 clusters;
[0131] Step 3. For each cluster, use formula (12) to recalculate the centroid: μ_1 = 4, μ_2 = 6.
[0132] The effective thickness can be divided into three intervals {y<4, 4≤y≤6, 6
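The three steps above follow the usual k-means loop, which can be sketched in one dimension as below. This is a sketch under assumptions: formulas (11) and (12) are not reproduced in this excerpt, so the code assumes they are the standard nearest-centroid assignment and cluster-mean update; Table 2 is also missing, so the 15 thickness values are hypothetical. Only the initial centroids 3.3, 5.5, and 6.6 come from the text.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Standard 1-D k-means: assign each value to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step (assumed to correspond to formula (11)).
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step (assumed to correspond to formula (12)); an empty cluster keeps its centroid.
        updated = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if updated == centroids:  # assignments are stable, so the loop has converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical effective-thickness values for 15 wells (not the patent's Table 2).
thickness = [2.1, 2.8, 3.0, 3.5, 3.9, 4.2, 4.8, 5.0, 5.4, 5.9, 6.3, 6.8, 7.2, 7.5, 8.0]

centroids, clusters = kmeans_1d(thickness, [3.3, 5.5, 6.6])
print(centroids)
print(clusters)
```

Each continuous attribute can then be replaced by the index of the cluster it falls in, which gives ID3 the discrete attribute values it needs.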
Abstract
The invention discloses a data classification method based on the ID3 algorithm. The method comprises the following steps: optimizing the information gain Gain(A_i) obtained by the ID3 algorithm with an equalization coefficient R(A_i) to obtain an optimized information gain Gain(A_i)_new; obtaining the root node and branch nodes of a decision tree according to Gain(A_i)_new; and classifying the attributes A_i. By introducing the equalization coefficient R(A_i) and an attribute deviation threshold T, the degree of multi-valued bias can be measured and controlled, the bias of information gain toward attributes with more values can be avoided, and the prediction accuracy, practicability, and effectiveness of the ID3 algorithm are further improved.
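The multi-valued bias that the abstract refers to can be seen from a short worked derivation. It assumes the standard ID3 definitions, which are taken here to match the patent's I, E(A_i), and Gain(A_i); the patent's expressions for R(A_i) and T do not appear in this excerpt and are therefore not shown.

```latex
I(D) = -\sum_{j} p_j \log_2 p_j, \qquad
E(A_i) = \sum_{v \in \mathrm{values}(A_i)} \frac{|D_v|}{|D|}\, I(D_v), \qquad
\mathrm{Gain}(A_i) = I(D) - E(A_i).

% Since E(A_k) \ge 0 for every attribute, \mathrm{Gain}(A_k) \le I(D).
% If A_i takes a distinct value on every sample, each subset D_v holds one sample, so I(D_v) = 0:
E(A_i) = 0 \;\Longrightarrow\; \mathrm{Gain}(A_i) = I(D) \ge \mathrm{Gain}(A_k) \quad \text{for all } k.
```

An attribute with many distinct values therefore tends toward the maximum possible gain regardless of its predictive value, which is the behavior a correction such as R(A_i) is meant to counteract.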
Description
Technical field
[0001] The invention belongs to the technical field of data classification methods and relates to a data classification method based on the ID3 algorithm.
Background
[0002] With the rapid development of software and Internet technology, we are in an era of information explosion. Advances in database and data mining technology let people efficiently collect and store large amounts of data, discover potential relationships and rules in the data, and predict future trends, providing decision makers with valuable information to support their decisions. Classification is the most commonly used data analysis method in data mining. A classification algorithm assigns each record in a data set to the category it belongs to. The main classification techniques currently in use are: Bayesian classification, rule induction, decis...