Classification tree construction method and data processing method based on classification tree

Classification tree construction technology, applied in the computer field, with the effect of improving performance

Pending Publication Date: 2022-03-08
CHINA INST OF WATER RESOURCES & HYDROPOWER RES

AI Technical Summary

Problems solved by technology

[0004] In summary, conventional classification tree algorithms can only handle discrete and continuous feature attributes.



Examples


Embodiment 1

[0043] A classification tree construction method, comprising the following steps:

[0044] 1) Initialize the classification tree: collect sample data, generate a root node, assign all samples to the root node, take the mode of the samples' class labels as the class label of the root node, and at the same time mark the root node as the current (and only) leaf node.

[0045] 2) Traverse the nodes and judge whether they can be split:

[0046] 2-1 Traverse all nodes to find the leaf nodes on the deepest layer of the current classification tree;

[0047] 2-2 Calculate the Gini index of a node. Assuming that the sample subset contained in the node is $D$, the Gini index is calculated as:

[0048] $$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2$$

[0049] In the formula, $K$ is the number of sample classes contained in $D$, and $p_k$ is the proportion of samples belonging to the $k$-th class.

[0050] 2-3 Set a threshold $Th_{min}$, where $Th_{min}$ is the minimum number of samples required for node splitti...
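Steps 1) and 2-1) to 2-3) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` class and the function names are hypothetical, each sample is represented only by its class label, and because the definition of $Th_{min}$ is truncated in this excerpt, `is_divisible` assumes that a node with fewer than $Th_{min}$ samples is not split and, by the usual convention, that a pure node (Gini index 0) is not split either.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node holding the class labels of its samples."""
    labels: list
    depth: int = 0
    children: list = field(default_factory=list)

    @property
    def class_label(self):
        # Step 1): the node's class label is the mode of its samples' labels
        return Counter(self.labels).most_common(1)[0][0]

    @property
    def is_leaf(self):
        return not self.children


def init_tree(labels):
    """Step 1): assign every sample to the root, initially the only leaf."""
    return Node(labels=list(labels))


def deepest_leaves(root):
    """Step 2-1): traverse all nodes and return the leaves on the deepest layer."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.is_leaf:
            leaves.append(node)
        stack.extend(node.children)
    max_depth = max(n.depth for n in leaves)
    return [n for n in leaves if n.depth == max_depth]


def gini(labels):
    """Step 2-2): Gini(D) = 1 - sum_k p_k^2 over the K classes present in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def is_divisible(node, th_min):
    """Step 2-3), assumed reading: split only nodes that hold at least
    th_min samples and are not already pure."""
    return len(node.labels) >= th_min and gini(node.labels) > 0.0
```

For example, `gini(["a", "a", "b", "b"])` returns `0.5`, the maximum for two equally frequent classes.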

Embodiment 2

[0072] A classification tree constructed by the method of Embodiment 1 is used to predict a subject's risk of heart disease from physical measurement indexes of the human body.

[0073] Construct a heart disease data set. The features of the data set include age, gender, type of angina pectoris, resting blood pressure, cholesterol, fasting blood sugar, and electrocardiogram; the label is whether the subject has heart disease.

[0074] In the data set, age, resting blood pressure, cholesterol, and fasting blood sugar are continuous feature attributes; gender and angina pectoris type are discrete feature attributes; and the electrocardiogram is a vector feature attribute.

[0075] By training a classification tree with the constructed data set, it is possible to predict whether a new case is at risk of heart disease.
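To make the three attribute kinds in [0074] concrete, the sketch below records each feature's kind and checks and partitions one sample accordingly. It assumes a sample is a Python dict keyed by feature name and that an electrocardiogram is stored as a list of numbers; the snake_case keys, `AttrKind`, `HEART_SCHEMA`, and `group_by_kind` are hypothetical names, and the source does not specify how the ECG signal is sampled or encoded.

```python
from enum import Enum


class AttrKind(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"
    VECTOR = "vector"


# Kinds as stated in [0074]; the snake_case keys are hypothetical names.
HEART_SCHEMA = {
    "age": AttrKind.CONTINUOUS,
    "resting_blood_pressure": AttrKind.CONTINUOUS,
    "cholesterol": AttrKind.CONTINUOUS,
    "fasting_blood_sugar": AttrKind.CONTINUOUS,
    "gender": AttrKind.DISCRETE,
    "angina_type": AttrKind.DISCRETE,
    "ecg": AttrKind.VECTOR,
}


def _is_number(x):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def group_by_kind(record, schema):
    """Check each feature against its declared kind and group features by kind.

    Assumed encodings: continuous -> a number, discrete -> a str or int
    category code, vector -> a non-empty list or tuple of numbers.
    """
    groups = {kind: {} for kind in AttrKind}
    for name, kind in schema.items():
        value = record[name]
        if kind is AttrKind.CONTINUOUS:
            ok = _is_number(value)
        elif kind is AttrKind.DISCRETE:
            ok = isinstance(value, (str, int))
        else:
            ok = (isinstance(value, (list, tuple)) and len(value) > 0
                  and all(_is_number(x) for x in value))
        if not ok:
            raise TypeError(f"feature {name!r} is not a valid {kind.value} value")
        groups[kind][name] = value
    return groups
```

A record whose `ecg` entry is a single number rather than a list raises `TypeError`, which catches a mis-encoded vector attribute before training.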

Embodiment 3

[0077] A classification tree constructed by the method of Embodiment 1 is used to determine the type of a vertebrate from its feature attributes.

[0078] Construct a vertebrate data set. The features of the data set include body temperature, epidermis coverage, whether it is viviparous, whether it is aquatic, whether it flies, whether it has legs, whether it hibernates, body length, and gene sequence; the label is the animal class: mammal, reptile, fish, amphibian, or bird.

[0079] In the data set, body temperature and body length are continuous feature attributes; epidermis coverage, viviparous, aquatic, flying, legged, and hibernating are discrete feature attributes; and the gene sequence is a vector feature attribute.

[0080] By training a classification tree with the constructed data set, the type of a newly discovered animal species can be determined.
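The source states that gene sequences are vector feature attributes but does not say how a sequence is turned into a vector. One common choice, assumed here and not taken from the patent, is a k-mer count vector: count every length-k substring over the alphabet A, C, G, T, which gives a vector of fixed length 4^k regardless of how long the sequence is. `kmer_vector` is a hypothetical helper.

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Map a nucleotide sequence to counts of each of the len(alphabet)**k k-mers.

    Windows containing characters outside the alphabet (e.g. 'N') are skipped.
    """
    # Enumerate all k-mers in a fixed order so every vector has the same layout
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    return counts
```

For example, `kmer_vector("ACGT", k=1)` returns `[1, 1, 1, 1]`. The fixed length means vectors from sequences of different lengths can be compared element by element.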


Abstract

The invention discloses a classification tree construction method and a data processing method based on the classification tree. Construction of the classification tree comprises the following steps: 1) initializing the classification tree; 2) traversing nodes and judging whether each node can be split; 3) calculating the splitting attribute; 4) splitting the node and recording the splitting information; 5) marking leaf nodes; and 6) growing the classification tree. A conventional classification tree supports only discrete and continuous feature attributes; the invention provides a classification tree construction method that also supports vector feature attributes, enabling more accurate classification prediction on sample data.
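This excerpt does not describe how step 3) selects the splitting attribute, and in particular how vector attributes are split. For the two conventional attribute kinds only, a standard CART-style choice, shown here as an assumed stand-in rather than the patent's criterion, is to pick the binary split that minimizes the size-weighted Gini index of the two child nodes: midpoints between sorted distinct values for continuous attributes, and one-value-versus-rest for discrete attributes. `best_split` and its argument layout are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted Gini index of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(columns, kinds, labels):
    """Return (score, attribute, test) for the best binary split, or None.

    columns: attribute name -> list of values (one per sample)
    kinds:   attribute name -> "continuous" or "discrete"
    test is ("<=", threshold) or ("==", value). Vector attributes are not handled.
    """
    best = None
    for name, values in columns.items():
        if kinds[name] == "continuous":
            distinct = sorted(set(values))
            # candidate thresholds: midpoints between consecutive distinct values
            candidates = [("<=", (a + b) / 2) for a, b in zip(distinct, distinct[1:])]
        else:
            # one category versus the rest, in a deterministic order
            candidates = [("==", v) for v in sorted(set(values), key=str)]
        for op, t in candidates:
            goes_left = [(v <= t) if op == "<=" else (v == t) for v in values]
            left = [y for y, g in zip(labels, goes_left) if g]
            right = [y for y, g in zip(labels, goes_left) if not g]
            if not left or not right:
                continue  # skip splits that leave a child empty
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, name, (op, t))
    return best
```

In a full tree, `columns` would be restricted to the samples of the node being split, and the winning test would be recorded as that node's splitting information in step 4).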

Description

Technical Field

[0001] The invention belongs to the field of computer technology and relates to data processing technology, in particular to classification algorithms, and specifically to an improved classification tree construction method and a data processing method based on the classification tree.

Background

[0002] The decision tree is a typical classification algorithm. Starting from labeled samples, it applies a splitting criterion to build a tree structure from the top down, producing intuitive classification rules. It is a simple and practical nonlinear classifier.

[0003] Quinlan proposed the ID3 algorithm in 1979. Its procedure draws on CLS, the top-down inductive learning method proposed by Hunt et al. ID3 introduces an information-driven evaluation function in place of the cost-driven approach of CLS: it uses information entropy to represent node purity and uses information gain as the optimal criterion for s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/214, G06F18/24323
Inventors: Wang Fan, Xiang Liyun, Zhang Dawei, Zhang Dan, Bai Yu, Jiang Xiaoming (王帆, 向立云, 张大伟, 张丹, 白钰, 姜晓明)
Owner: CHINA INST OF WATER RESOURCES & HYDROPOWER RES