Semi-supervised classification method for unbalanced data

A semi-supervised classification method applied in the field of data processing. It addresses the low classification accuracy of minority classes, the misclassification of minority samples into the majority class, and over-learning, so as to avoid over-learning, improve generalization ability, and improve classification performance.

Inactive Publication Date: 2011-02-23
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0005] The difficulty of learning from unbalanced data sets comes mainly from the characteristics of the data sets themselves: the minority class has too few samples, so the sample distribution does not reflect the actual distribution of the whole class well; and the classes are usually mixed with noisy data, so that the samples of the two classes tend to overlap to varying degrees.
In addition, when traditional classification methods in the field of machine learning are directly applied to unbalanced dat




Embodiment Construction

[0026] Referring to Figure 1, the specific implementation steps of the present invention are as follows:

[0027] Step 1. Select an initial labeled sample set and an initial unlabeled sample set.

[0028] Given an unbalanced data set, divide its samples into two classes according to their characteristics and attributes, and label the two classes as the minority class and the majority class according to their sample counts. Randomly select part of the unbalanced data set as the initial labeled sample set {x_i}, and use the remaining samples as the initial unlabeled sample set {x_j}.
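Step 1 can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: the function name `split_labeled_unlabeled`, the `labeled_fraction` parameter, and the fixed random seed are all assumptions introduced here, and the sketch uses a plain random draw without any stratification.

```python
import random
from collections import Counter


def split_labeled_unlabeled(labels, labeled_fraction=0.1, seed=0):
    """Return (labeled_idx, unlabeled_idx, minority_label) for a two-class data set.

    `labeled_fraction` and `seed` are hypothetical parameters; the source only
    says that "a part" of the data set is selected at random.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # The class with fewer samples is recorded as the minority class.
    minority_label = min(counts, key=counts.get)

    n_labeled = max(1, round(labeled_fraction * len(labels)))
    rng = random.Random(seed)
    labeled_idx = sorted(rng.sample(range(len(labels)), n_labeled))

    # Every sample that was not drawn becomes part of the unlabeled set.
    labeled_set = set(labeled_idx)
    unlabeled_idx = [i for i in range(len(labels)) if i not in labeled_set]
    return labeled_idx, unlabeled_idx, minority_label


# Toy data: 5 minority samples (label 1) and 45 majority samples (label 0).
labels = [1] * 5 + [0] * 45
labeled_idx, unlabeled_idx, minority = split_labeled_unlabeled(labels, 0.2, seed=0)
```

A purely random draw may pick no minority samples at all when the imbalance is severe, in which case the mean center of the minority class in Step 2 would be undefined; a caller may want to check for that case and redraw.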

[0029] Step 2, initialize the cluster centers of the unbalanced data set.

[0030] (2a) Within the current labeled sample set {x_i}, average the minority class samples and the majority class samples separately to obtain the mean center set M = {m+, m-}, where m+ is the mean center of the minority class samples and m- is the mean center of the majority class s...
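The per-class averaging in (2a) can be illustrated with a short sketch. The helper names `mean_center` and `mean_center_set` are hypothetical, and samples are assumed to be plain lists of floats of equal length.

```python
def mean_center(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def mean_center_set(samples, labels, minority_label):
    """Return (m_plus, m_minus): the mean centers of the minority and majority classes."""
    minority = [x for x, y in zip(samples, labels) if y == minority_label]
    majority = [x for x, y in zip(samples, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes need at least one labeled sample")
    return mean_center(minority), mean_center(majority)


# Toy labeled set: two minority samples (label 1), three majority samples (label 0).
samples = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 10.0], [14.0, 10.0]]
labels = [1, 1, 0, 0, 0]
m_plus, m_minus = mean_center_set(samples, labels, minority_label=1)
# m_plus is [1.0, 1.0] and m_minus is [12.0, 10.0]
```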


Abstract

The invention discloses a semi-supervised classification method for unbalanced data, mainly intended to solve the prior-art problem of low classification accuracy on minority-class data when labeled samples are few and the degree of imbalance is high. The method comprises the following steps: (1) initializing a labeled sample set and an unlabeled sample set; (2) initializing the cluster centers; (3) performing fuzzy clustering; (4) updating the labeled and unlabeled sample sets according to the clustering result; (5) performing self-training based on a support vector machine (SVM) classifier; (6) updating the labeled and unlabeled sample sets according to the self-training result; (7) classifying with Biased-SVM, a support vector machine based on penalty parameters; and (8) evaluating and outputting the classification result. For unbalanced data with few labeled samples, the method improves the classification accuracy of the minority class, and it can be used to classify and identify unbalanced data with few training samples.
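To give a feel for the fuzzy clustering in step (3), the sketch below computes membership degrees with the standard fuzzy c-means formula. The excerpt available here does not state which fuzzy clustering variant the patent uses, so the formula, the function name `fuzzy_memberships`, and the `fuzzifier` parameter (commonly written m, here set to 2) are assumptions for illustration only.

```python
import math


def fuzzy_memberships(samples, centers, fuzzifier=2.0):
    """Standard fuzzy c-means memberships: one row per sample, one column per center.

    u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (fuzzifier - 1)),
    where d_ik is the Euclidean distance from sample k to center i.
    """
    exponent = 2.0 / (fuzzifier - 1.0)
    memberships = []
    for x in samples:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A sample lying exactly on a center belongs fully to that center
            # (shared equally if several centers coincide with it).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


# Two centers standing in for the minority and majority mean centers.
centers = [[0.0, 0.0], [4.0, 0.0]]
u = fuzzy_memberships([[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]], centers)
# u[0] is approximately [0.9, 0.1]; u[1] is [0.5, 0.5]; u[2] is [1.0, 0.0]
```

Each row sums to 1, so a membership can be read as a soft assignment to each cluster, which a later step could threshold when deciding which unlabeled samples to move into the labeled set.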

Description

Technical field

[0001] The invention belongs to the field of data processing and relates to the classification of unbalanced data. It applies pattern recognition and machine learning to data mining; specifically, it is a method for classifying unbalanced data based on fuzzy clustering and semi-supervised learning, which can be used for the classification and identification of unbalanced data with few training samples.

Background

[0002] With the rapid development of global information technology, powerful computers, data collection equipment, and storage equipment provide people with large amounts of data for transaction management, information retrieval, and data analysis. Although the amount of data obtained is very large, the data that is actually useful often accounts for only a small part of the total. A data set in which the number of samples of one class is significantly smaller than that of the other classes is called an un...


Application Information

IPC(8): G06F17/30
Inventors: 王爽, 焦李成, 冯吭雨, 钟桦, 侯彪, 缑水平, 马文萍, 张青
Owner XIDIAN UNIV