Method for extracting sensitive data from unbalanced data based on SVM-forest

A technique for extracting sensitive data from unbalanced data, applicable to instruments, adaptive control, control/regulation systems, etc. It addresses the large classification errors caused by unbalanced data and reduces the degree of unbalance.

Active Publication Date: 2018-02-23
ZHEJIANG UNIV
Cites: 6 | Cited by: 2

AI Technical Summary

Problems solved by technology

But the reality is often not the case. When a certain type of data is very large, or a certain type of data is rare, th...


Examples


Embodiment Construction

[0038] The method for extracting sensitive data from unbalanced data based on SVM-forest of the present invention will be further described below in conjunction with specific embodiments.

[0039] A method for extracting sensitive data from unbalanced data based on SVM-forest, comprising the following steps:

[0040] Step 1: Collect labeled samples for modeling, then preprocess and normalize them. The labeled samples include data from normal working conditions in the industrial process and data from various fault conditions, divided into C fault working condition categories and 1 normal working condition category. From each category, take 10%~20% of the samples to form the temporary test sample set Q; the remaining 80%~90% form the training sample set, that is, X_l = [X_1; X_2; ...; X_i; ...; X_{C+1}], where X_i represents the sample set of the i-th category, n_i is its number of training samples, m is the number of process variables, ...
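The per-category hold-out in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the toy data, the 20% fraction, and the choice of z-score normalization fitted on the training portion are all assumptions (the source only says the samples are "preprocessed and normalized").

```python
import random
from statistics import mean, pstdev

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hold out test_fraction of each category as the temporary test set Q."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(test_fraction * len(idxs)))  # at least one per category
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def zscore_fit(rows):
    """Per-variable mean and standard deviation (assumed normalization)."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, stds

def zscore_apply(rows, means, stds):
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]

# Hypothetical toy data: label 0 = normal condition (majority), 1 and 2 = faults.
labels = [0] * 50 + [1] * 10 + [2] * 5
samples = [[float(i), float(i % 7)] for i in range(len(labels))]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
means, stds = zscore_fit([samples[i] for i in train_idx])
X_train = zscore_apply([samples[i] for i in train_idx], means, stds)
Q = zscore_apply([samples[i] for i in test_idx], means, stds)
```

Splitting per category rather than over the pooled data keeps every fault category represented in Q, which matters when some categories have only a handful of samples.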


Abstract

The invention discloses a method for extracting sensitive data from unbalanced data based on SVM-forest. The method comprises the following steps: a portion of the labeled samples is taken as test samples, and the rest are used as training samples; k-Means is used to divide the normal working condition class into subclasses, and the subclasses are mixed with the fault working condition data to form N training subsets; an SVM-tree method is used to train the SVM-forest, and the test samples are used to test the SVM-forest; the L trees with the highest fault working condition misclassification rate are selected; the data with a great influence on the classification effect are kept; according to a selected classification algorithm, a classifier T is trained on the minority classes and the remaining majority classes in a test set; and a temporary test sample is used to test the classification effect of T until the effect meets requirements. Through multiple iterations, the method selects the samples in the majority-class sample set that have a great influence on the classification effect, which reduces the degree of unbalance; the resulting classification effect approaches or matches that of balanced classification under the same conditions.
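The subset-construction step of the abstract (k-Means on the normal class, each subclass mixed with fault data) can be sketched as below. This is an illustrative sketch under stated assumptions: the plain Lloyd's k-means written here, the function names, the toy data, and the choice to copy all fault samples into every subset are assumptions, since the source does not say how the mixing is done. Training the SVM-trees themselves is omitted.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def build_training_subsets(normal, faults, n_subsets, seed=0):
    """Split the normal class into n_subsets subclasses and pair each
    subclass with all fault samples (assumed mixing rule)."""
    assign = kmeans(normal, n_subsets, seed=seed)
    subsets = []
    for c in range(n_subsets):
        part = [p for p, a in zip(normal, assign) if a == c]
        X = part + [p for p, _ in faults]
        y = [0] * len(part) + [label for _, label in faults]
        subsets.append((X, y))
    return subsets

# Hypothetical toy data: 60 normal samples in three blobs, 6 fault samples.
rng = random.Random(1)
normal = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
faults = [([10.0 + i, 10.0], 1 + i % 2) for i in range(6)]

subsets = build_training_subsets(normal, faults, n_subsets=3)
```

Each of the N subsets then has a much smaller normal-to-fault ratio than the original training set, so each tree in the forest would be trained on less unbalanced data.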

Description

Technical field

[0001] The invention belongs to the field of industrial process control, and in particular relates to a method for extracting sensitive data from unbalanced data based on SVM-forest.

Background

[0002] In industrial fault classification, some commonly used classification methods rest on a prerequisite: the classes in the training set must contain comparable amounts of data. In practice this often does not hold. When one class of data is very large, or one class is rare, the data are unbalanced, and directly applying a traditional classification method produces a large classification error.

Contents of the invention

[0003] Aiming at the deficiencies of the prior art, the present invention proposes a method for extracting sensitive data from unbalanced data based on SVM-forest. This method mainly improves the traditional classification method at the sampling level, and selects from the majority sample set through multiple iterations. S...
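The background's point about unbalanced data can be made concrete with a small worked example. The numbers are hypothetical: 95 normal samples, 5 fault samples, and a degenerate classifier that always predicts "normal".

```python
def accuracy_and_recall(y_true, y_pred, positive):
    """Overall accuracy and recall for the class labeled `positive`."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(t == p for t, p in positives) / len(positives)
    return correct / len(y_true), recall

# Hypothetical unbalanced set: label 0 = normal, label 1 = fault.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # classifier that ignores the minority class entirely

acc, fault_recall = accuracy_and_recall(y_true, y_pred, positive=1)
print(f"accuracy={acc:.2f}, fault recall={fault_recall:.2f}")
```

The overall accuracy looks high while every fault is missed, which is why fault misclassification rate, rather than overall accuracy, is the more informative measure on unbalanced data.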


Application Information

IPC(8): G05B13/04
CPC: G05B13/042
Inventors: 葛志强, 陈革成
Owner: ZHEJIANG UNIV