Improved multidimensional scaling heterogeneous cost-sensitive decision tree building method

A cost-sensitive decision tree construction method, applied in structured data retrieval, special data processing applications, instruments, etc., that lowers the test cost, reduces the cost of misclassification, improves efficiency, and strengthens classification ability.

Status: Inactive
Publication Date: 2017-05-03
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

[0003] To solve the problem of constructing a multi-dimensional scale decision tree while considering test cost, misclassification cost and waiting time cost at the same time, and to make the tes...



Examples


Embodiment Construction

[0031] The invention addresses the problem of constructing a multi-dimensional scale decision tree that considers test cost, misclassification cost and waiting time cost at the same time, so that the test cost is lower, the decision tree scales better, the problem of differing cost-unit mechanisms is handled, and the final decision tree better avoids overfitting. The invention is described in detail below with reference to Figure 1, and its specific implementation steps are as follows:

[0032] Step 1: Suppose the training set contains X samples and n attributes (S_1, S_2, ..., S_n), and that splitting attribute S_i corresponds to m classes L, where L_r ∈ (L_1, L_2, ..., L_m), i ∈ (1, 2, ..., n), r ∈ (1, 2, ..., m). Users in the relevant field set the misclassification cost matrix C, the test cost cost_i of attribute S_i, the relative waiting time cost value wc(S_i), the correction coefficient β, a...
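The user-supplied quantities listed in Step 1 could be grouped as in the following minimal sketch. The class name CostSettings, its field names and the validate helper are illustrative assumptions and do not come from the patent text; the remaining Step 1 inputs are truncated in this excerpt and are therefore not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CostSettings:
    """Hypothetical container for the user-set inputs named in Step 1."""
    classes: List[str]                            # class labels L_1 ... L_m
    misclass_cost: Dict[Tuple[str, str], float]   # C[(true class, predicted class)]
    test_cost: Dict[str, float]                   # cost_i for each attribute S_i
    wait_cost: Dict[str, float]                   # wc(S_i) for each attribute S_i
    beta: float                                   # correction coefficient beta

    def validate(self) -> None:
        # Every attribute that has a test cost also needs a waiting time cost.
        if set(self.test_cost) != set(self.wait_cost):
            raise ValueError("test_cost and wait_cost must cover the same attributes")
        # The cost matrix needs an entry for every (true, predicted) class pair.
        for true_cls in self.classes:
            for pred_cls in self.classes:
                if (true_cls, pred_cls) not in self.misclass_cost:
                    raise ValueError(f"missing cost for {true_cls!r} -> {pred_cls!r}")
```

Keeping the three cost sources in separate fields mirrors the text's point that test cost, misclassification cost and waiting time cost are distinct factors that may use different units.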


Abstract

The invention provides an improved multidimensional scaling heterogeneous cost-sensitive decision tree building method. A split attribute splitS_i is selected from the candidate attributes according to the objective function f(S_i) of attribute S_i, and the branches that satisfy splitS = splitS_i are extended from the node. If k branches satisfy the condition, a blank node is also added to the node, so the current node has k+1 branches. Leaf nodes are pruned with a pre-pruning technique, so pruning takes place while the tree is being built. Tree building stops under two conditions: (1) let Y_i be the set of training samples that satisfy splitS = splitS_i; if Y_i is empty, a leaf node is added and labeled with the most common class in the training dataset; (2) all examples in the node belong to the same class. The method improves classification accuracy, addresses the bias problem in the classification process, and takes multiple cost factors into account. Because the branches of the decision tree include a blank node, the next classification step can continue through the blank node when an unknown classification result does not conform to the current model.
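The branching and stopping rules in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the objective function f(S_i) is not given in this excerpt, so the sketch takes it as a caller-supplied objective argument; the BLANK key, the dictionary-based node layout, the domains argument and the fallback when no attributes remain are all hypothetical choices; and the pre-pruning step is omitted because its criterion is not stated here.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]   # (attribute values, class label)
BLANK = "<blank>"                     # hypothetical key for the extra blank node


def majority(samples: List[Sample]) -> str:
    """Most common class label among the given samples."""
    return Counter(label for _, label in samples).most_common(1)[0][0]


def build_tree(samples: List[Sample],
               attributes: List[str],
               domains: Dict[str, List[str]],
               objective: Callable[[str, List[Sample]], float],
               default: str):
    """Return a class label (leaf) or a dict node with k + 1 branches.

    objective stands in for f(S_i); default is the most common class
    in the full training dataset.
    """
    # Condition (1): an empty sample set becomes a leaf labeled with the
    # most common class in the training dataset.
    if not samples:
        return default
    # Condition (2): all examples in the node belong to the same class.
    labels = {label for _, label in samples}
    if len(labels) == 1:
        return next(iter(labels))
    # Assumption (not in the source): with no attributes left, fall back
    # to the majority class of the samples at this node.
    if not attributes:
        return majority(samples)

    # Select the split attribute with the highest objective value.
    best = max(attributes, key=lambda a: objective(a, samples))
    remaining = [a for a in attributes if a != best]
    node = {"split": best, "branches": {}}
    # Extend one branch per value of the chosen attribute (k branches).
    for value in domains[best]:
        subset = [s for s in samples if s[0][best] == value]
        node["branches"][value] = build_tree(
            subset, remaining, domains, objective, default)
    # Add the blank node, so the node has k + 1 branches in total.
    node["branches"][BLANK] = None
    return node
```

In this sketch the blank branch is left as None so that it can be filled in later when a case arrives that does not fit any of the k existing branches, which is one possible reading of how classification "continues through the blank node".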

Description

Technical field

[0001] The invention relates to the fields of machine learning, artificial intelligence and data mining.

Background art

[0002] Decision trees are an important and active research topic in data mining and machine learning, and decision tree algorithms have been applied widely and successfully to practical problems. Classic decision tree algorithms such as ID3, CART and C4.5 mainly study the problem of accuracy, and the decision trees they generate have higher accuracy. Among the existing algorithms, some consider only the test cost and some consider only the misclassification cost. This type is called one-dimensional scale cost-sensitive, and the decision trees it constructs cannot solve the combined problems found in real cases. For example, in cost-sensitive learning, in addition to the impact of test cost and misclassification cost on classification, the impact of waiting time cost on classification prediction also needs to be considere...


Application Information

IPC(8): G06F17/30
CPC: G06F16/285, G06F16/2246
Inventors: 金平艳, 胡成华
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD