Imbalanced data industrial fault classification method based on k-means

A fault classification technique, applied in the fields of instruments, character and pattern recognition, and computer parts. It addresses the drawbacks of existing sampling methods, such as overfitting and unsatisfactory results in practice, so as to better handle the classification of imbalanced data, increase classification accuracy, and reduce overfitting.

Inactive Publication Date: 2017-10-10
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

The sampling-based improvements fall into two main categories. One is oversampling, that is, resampling the minority class to balance the data. A major drawback of this method is that it places an added burden on the system and causes overfitting, so its results in practice are not ideal...



Embodiment Construction

[0013] The present invention addresses the problem of fault classification in industrial processes. The method first applies k-means to the class with more data: according to the degree of imbalance, it divides the majority class into N subclasses. These subclasses are then combined with the M minority classes to form an (M+N)-class classification problem, which is finally learned with a naive Bayes classifier.
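The pipeline in [0013] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rule for choosing N from the imbalance degree (`imbalance_subclass_count`), the Gaussian form of the naive Bayes model, the mapping of predicted subclasses back to their parent class, and the synthetic data are all hypothetical choices made here for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature (an assumption)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-6
        self.log_prior_ = np.log([(y == c).mean() for c in self.classes_])
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]).sum(axis=2)
        return self.classes_[(log_lik + self.log_prior_).argmax(axis=1)]

def imbalance_subclass_count(y, majority_label):
    """Hypothetical rule: N = majority size / mean minority size, rounded."""
    n_major = np.sum(y == majority_label)
    minority_sizes = [np.sum(y == c) for c in np.unique(y) if c != majority_label]
    return max(1, int(round(n_major / np.mean(minority_sizes))))

def fit_subclass_nb(X, y, majority_label, n_sub):
    """Split the majority class into subclasses, then fit an (M+N)-class model."""
    y_new = np.empty(len(y), dtype=int)
    parent = []  # parent[new_label] -> original class label
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        if c == majority_label:
            # Compact the cluster ids so that empty clusters are dropped.
            _, sub = np.unique(kmeans(X[idx], n_sub), return_inverse=True)
        else:
            sub = np.zeros(len(idx), dtype=int)
        y_new[idx] = len(parent) + sub
        parent.extend([c] * (sub.max() + 1))
    return GaussianNaiveBayes().fit(X, y_new), np.array(parent)

# Synthetic example: one large normal class (0) and two small fault classes (1, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(600, 2)),
               rng.normal(5.0, 1.0, size=(60, 2)),
               rng.normal(-5.0, 1.0, size=(40, 2))])
y = np.concatenate([np.zeros(600, dtype=int), np.ones(60, dtype=int), np.full(40, 2)])

n_sub = imbalance_subclass_count(y, majority_label=0)
model, parent = fit_subclass_nb(X, y, majority_label=0, n_sub=n_sub)
y_pred = parent[model.predict(X)]  # map each predicted subclass to its original class
accuracy = float((y_pred == y).mean())
```

After splitting, each subclass of the majority class is closer in size to the minority classes, so the class priors of the (M+N)-class problem are less skewed, and no majority samples are discarded and no minority samples are duplicated.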

[0014] The main steps of the technical solution adopted in the present invention are as follows:

[0015] Step 1: Use the system to collect data from the normal operating conditions of the process, together with data for the various faults, to form a labeled training sample set for modeling. Assume there are C fault categories; adding one normal class gives C+1 modeling categories in total. The data for category i is a matrix in $\mathbb{R}^{n_i \times m}$, i = 1, 2, ..., C+1, where $n_i$ is the number of training samples in that category, m is the number of process variables, and $\mathbb{R}$ is the set of real numbers. So the complete labeled tr...
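As an illustration of the data layout in Step 1, the per-category matrices can be stacked into one sample matrix with a matching label vector. The sizes, the variable names, and the random data below are made up for the example; the source text is truncated before it defines its own notation for the complete training set.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3                      # number of process variables (hypothetical value)
n = [500, 40, 25]          # n_i for the normal class and C = 2 fault classes (made-up sizes)

# One n_i-by-m matrix per category, filled with synthetic data.
class_data = [rng.normal(loc=i, size=(n_i, m)) for i, n_i in enumerate(n)]

# Stack the matrices and label category i with the integer i (1 to C+1).
X = np.vstack(class_data)
y = np.concatenate([np.full(len(X_i), i + 1) for i, X_i in enumerate(class_data)])
```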


Abstract

The invention discloses an imbalanced data industrial fault classification method based on k-means. The method first uses k-means to cluster the class with more data according to the degree of imbalance, dividing the majority class into N subclasses. It then combines the N subclasses with the M minority classes to form an (M+N)-class classification problem, and finally performs learning with a naive Bayes classifier. Compared with existing methods in the prior art, the method of the invention preserves the information in the original data to the greatest extent and better handles the classification of class-imbalanced data while preventing overfitting. As a result, classification accuracy is increased and overfitting is reduced.

Description

Technical field

[0001] The invention belongs to the field of industrial process control, and in particular relates to an industrial process fault classification method for imbalanced data.

Background art

[0002] In industrial fault classification, some commonly used classification methods have a prerequisite: the amount of data in each class of the training set must be roughly equal. In reality this is often not the case. When one class has a very large amount of data, or one class is rare, that is, when the data is imbalanced, directly applying a traditional classification method produces a large classification error.

[0003] In recent years, research on imbalanced data has been a hot topic. Existing methods approach the problem from two directions: the algorithm level and the sampling level. The present invention mainly improves the traditional classification method at the sampling level. The improvement methods for sampling...


Application Information

IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/24, G06F18/214
Inventor: 葛志强, 陈革成
Owner: ZHEJIANG UNIV