Semi-supervised classification method of unbalance data

A classification method based on semi-supervised learning, applied in the field of data processing. It addresses low classification accuracy on the minority class, misclassification of minority-class samples into the majority class, and over-learning. The method avoids over-learning, improves generalization ability, and improves classification performance.

Status: Inactive; Publication Date: 2011-02-23

XIDIAN UNIV

Cites: 0; Cited by: 88


AI Technical Summary

Problems solved by technology

[0005] The difficulty of learning from unbalanced data sets stems mainly from the characteristics of the data sets themselves: the minority class has too few samples, so their distribution does not reflect the true distribution of the whole class well; moreover, the classes are usually mixed with noisy data, so the samples of the two classes tend to overlap to varying degrees.

In addition, when traditional machine-learning classification methods are applied directly to unbalanced data sets without accounting for the imbalance, minority-class samples are easily misclassified into the majority class: the overall classification accuracy is relatively high, but the classification accuracy on the minority class is very low. Conversely, if the influence of the imbalance on the classifier is overcompensated, over-learning easily occurs: the classifier achieves high accuracy on the training set but performs poorly when the data set is updated or changes.
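The accuracy trap described above can be shown with a little arithmetic. The sketch below uses hypothetical counts (95 majority and 5 minority samples, not taken from the patent) and a degenerate classifier that always predicts the majority class. It scores high overall accuracy while correctly classifying none of the minority samples (minority recall, the fraction of minority samples classified as minority, is zero).

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_recall(y_true, y_pred, label):
    """Fraction of the samples of class `label` that are predicted as `label`."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    return sum(y_pred[i] == label for i in idx) / len(idx)


# Hypothetical data: 95 majority samples (-1) and 5 minority samples (+1).
y_true = [-1] * 95 + [+1] * 5
# A classifier that ignores the imbalance and always predicts the majority class.
y_pred = [-1] * 100

print(f"overall accuracy: {overall_accuracy(y_true, y_pred):.2f}")  # overall accuracy: 0.95
print(f"minority recall:  {class_recall(y_true, y_pred, +1):.2f}")  # minority recall:  0.00
```

Per-class measures such as minority recall expose the failure that overall accuracy hides.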



Embodiment Construction

[0026] Referring to Figure 1, the specific implementation steps of the present invention are as follows:

[0027] Step 1. Select an initial labeled sample set and an initial unlabeled sample set.

[0028] Given an unbalanced data set, its samples are divided into two classes according to their characteristics and attributes, and the two classes are designated the minority class and the majority class according to the number of samples in each. A part of the unbalanced data set is randomly selected as the initial labeled sample set {x_i}, and the remaining samples form the initial unlabeled sample set {x_j}.
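Step 1 can be sketched as follows, assuming exactly two classes labeled +1 and -1 and NumPy arrays as input. The function name `split_initial_sets` and the `labeled_fraction` and `seed` parameters are hypothetical, because the excerpt does not say how large the labeled part is. The sketch also samples within each class so that the labeled set contains at least one minority sample, which Step (2a) needs. Whether the original random selection is stratified this way is an assumption.

```python
import numpy as np


def split_initial_sets(X, y, labeled_fraction=0.1, seed=0):
    """Name the classes by size, then randomly split the data into an
    initial labeled set {x_i} and an initial unlabeled set {x_j}.
    Hypothetical helper; labeled_fraction and seed are assumptions."""
    labels, counts = np.unique(y, return_counts=True)
    assert len(labels) == 2, "this sketch assumes a two-class data set"
    order = np.argsort(counts)
    minority, majority = labels[order[0]], labels[order[1]]

    rng = np.random.default_rng(seed)
    labeled_idx = []
    for c in (minority, majority):
        idx = np.flatnonzero(y == c)
        # take roughly labeled_fraction of each class, and at least one sample
        k = max(1, int(round(labeled_fraction * idx.size)))
        labeled_idx.extend(rng.choice(idx, size=k, replace=False))
    labeled_idx = np.array(labeled_idx)

    unlabeled = np.ones(len(y), dtype=bool)
    unlabeled[labeled_idx] = False
    # labels of the unlabeled part are dropped, as in a semi-supervised setting
    return minority, majority, (X[labeled_idx], y[labeled_idx]), X[unlabeled]


# Usage on synthetic data: 20 minority (+1) and 180 majority (-1) samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.array([+1] * 20 + [-1] * 180)
minority, majority, (X_l, y_l), X_u = split_initial_sets(X, y, labeled_fraction=0.1)
print(minority, majority, len(y_l), len(X_u))  # 1 -1 20 180
```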

[0029] Step 2. Initialize the cluster centers of the unbalanced data set.

[0030] (2a) In the current labeled sample set {x_i}, the minority-class samples and the majority-class samples are averaged separately to obtain the mean-center set M = {m+, m-}, where m+ is the mean center of the minority-class samples and m- is the mean center of the majority class s...
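Step (2a) reduces the labeled set to two mean centers. The excerpt is cut off before the remaining sub-steps, and the abstract lists fuzzy clustering as step (3). The sketch below therefore assumes that M seeds a standard fuzzy c-means (FCM) run over all samples with fuzzifier 2. FCM is a swapped-in textbook choice: the patent's exact clustering variant, parameters, and stopping rule are not given here, and the function names are hypothetical.

```python
import numpy as np


def mean_centers(X_l, y_l, minority, majority):
    """Step (2a): M = {m+, m-}, the per-class means of the labeled samples."""
    m_plus = X_l[y_l == minority].mean(axis=0)
    m_minus = X_l[y_l == majority].mean(axis=0)
    return np.vstack([m_plus, m_minus])  # row 0 is m+, row 1 is m-


def fuzzy_c_means(X, init_centers, fuzzifier=2.0, n_iter=100, tol=1e-6):
    """Textbook fuzzy c-means started from init_centers (assumed step (3)).
    Returns the centers and the membership matrix U from the last iteration,
    where U[k, i] is the membership of sample i in cluster k (columns sum to 1)."""
    C = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # d[k, i] = ||x_i - c_k||, floored to avoid division by zero
        d = np.fmax(np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2), 1e-12)
        # u[k, i] = 1 / sum_j (d[k, i] / d[j, i]) ** (2 / (fuzzifier - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (fuzzifier - 1.0))
        U = 1.0 / ratio.sum(axis=1)
        # new centers are membership-weighted means of all samples
        W = U ** fuzzifier
        C_new = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.linalg.norm(C_new - C) < tol
        C = C_new
        if converged:
            break
    return C, U


# Usage: two well-separated synthetic blobs, a few labeled samples from each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([4.0, 4.0], 0.5, (10, 2)),   # minority, +1
               rng.normal([0.0, 0.0], 0.5, (90, 2))])  # majority, -1
y = np.array([+1] * 10 + [-1] * 90)
lab = [0, 1, 10, 11, 12]  # hypothetical labeled indices
M = mean_centers(X[lab], y[lab], minority=+1, majority=-1)
C, U = fuzzy_c_means(X, M)
```

Seeding FCM with the class means ties cluster 0 to the minority class and cluster 1 to the majority class, so memberships can later be read as soft class labels.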


Abstract

The invention discloses a semi-supervised classification method for unbalanced data, mainly intended to solve the prior-art problem of low classification precision on the minority class when few labeled samples are available and the degree of imbalance is high. The method is implemented in the following steps: (1) initializing a labeled sample set and an unlabeled sample set; (2) initializing the cluster centers; (3) performing fuzzy clustering; (4) updating the labeled and unlabeled sample sets according to the clustering result; (5) performing self-training based on a support vector machine (SVM) classifier; (6) updating the labeled and unlabeled sample sets according to the self-training result; (7) performing classification with a Biased-SVM, a support vector machine based on penalty parameters; and (8) evaluating the classification result and outputting it. For unbalanced data with few labeled samples, the method improves the classification precision of the minority class, and it can be used to classify and identify unbalanced data with few training samples.
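Step (7) classifies with a Biased-SVM based on penalty parameters. The sketch below assumes the usual cost-sensitive formulation, in which a margin violation by a minority sample is penalized by C+ and one by a majority sample by C-, with C+ > C-. It trains a linear version by full-batch subgradient descent on the primal objective for a fixed number of iterations. This solver is a swapped-in simplification rather than the quadratic-programming solver an SVM library would use, and no kernel is applied. The names `train_biased_svm`, `predict`, `c_pos`, and `c_neg`, and the choice C+/C- = (majority size)/(minority size), are illustrative assumptions, not values from the patent.

```python
import numpy as np


def train_biased_svm(X, y, c_pos, c_neg, lr=0.01, n_iter=2000):
    """Linear SVM with separate penalties: c_pos for y = +1 (minority),
    c_neg for y = -1 (majority). Minimizes
        0.5 * ||w||^2 + sum_i c_i * max(0, 1 - y_i * (w . x_i + b))
    by full-batch subgradient descent with step lr / sqrt(t + 1)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    cost = np.where(y == 1, c_pos, c_neg)  # per-sample penalty c_i
    for t in range(n_iter):
        margin = y * (X @ w + b)
        active = margin < 1               # samples inside the margin or misclassified
        coef = cost * y * active          # c_i * y_i for active samples, else 0
        grad_w = w - coef @ X
        grad_b = -coef.sum()
        step = lr / np.sqrt(t + 1)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def predict(X, w, b):
    """Label +1 (minority) on the non-negative side of the hyperplane, else -1."""
    return np.where(X @ w + b >= 0, 1, -1)


# Usage: 10 minority and 90 majority synthetic samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.5, (10, 2)),
               rng.normal([-2.0, -2.0], 0.5, (90, 2))])
y = np.array([1] * 10 + [-1] * 90)
w, b = train_biased_svm(X, y, c_pos=90 / 10, c_neg=1.0)
y_hat = predict(X, w, b)
```

Raising `c_pos` relative to `c_neg` tends to move the boundary toward the majority class, trading some majority accuracy for minority recall.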

Description

Technical Field

[0001] The invention belongs to the field of data processing and relates to unbalanced data classification. It is an application of pattern recognition and machine learning in the field of data mining, specifically a method for unbalanced data classification based on fuzzy clustering and semi-supervised learning, which can be used to classify and identify imbalanced data with few training samples.

Background Art

[0002] With the rapid development of global information technology, powerful computers, data collection equipment, and storage equipment provide large amounts of data for transaction management, information retrieval, and data analysis. Although the amount of data obtained is very large, the useful data often account for only a small part of the total. A data set in which the number of samples of one class is significantly smaller than that of the other classes is called an un...


Application Information


IPC(8): G06F17/30

Inventors: 王爽, 焦李成, 冯吭雨, 钟桦, 侯彪, 缑水平, 马文萍, 张青

Owner: XIDIAN UNIV
