Imbalanced data classification method based on clustering and distance weighting

A data classification technique based on distance weighting, applied in fields such as character and pattern recognition, instruments, and computation. It addresses the low recognition rate on imbalanced data by adjusting sample importance (increasing it for some samples and reducing it for others), which improves generalization performance.

Pending Publication Date: 2022-08-02
GUILIN UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose an imbalanced data classification method based on clustering and distance weighting (CDW-SVM). It targets the low recognition rate that class imbalance and the discontinuous distribution of minority class samples cause when imbalanced data is classified with a support vector machine, and aims to improve minority class classification accuracy while maintaining overall classification accuracy.

Examples


Embodiment Construction

[0026] In order to make the objectives, technical solutions and technical effects of the present invention clearer, the present invention will be further described in detail below with reference to specific embodiments and simulation experiments.

[0027] As shown in Figure 1, an imbalanced data classification method based on clustering and distance weighting specifically includes the following steps:

[0028] Step 1. Data collection and preprocessing

[0029] Collect a real-world imbalanced dataset from the UCI machine learning database, $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $y_i \in \{+1, -1\}$, $i = 1, 2, \dots, N$, and $N$ is the total number of samples in the dataset. Divide the dataset, according to the target class label, into a minority class sample set $X_{MN}$ and a majority class sample set $X_{MX}$. The dataset imbalance rate is calculated by the following formula:

[0030]

[0031] In the above formula, $N^-$ is the number of samples of the majority class (negativ...
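The class split and imbalance rate of Step 1 can be sketched as follows. The formula is not reproduced in this text, so the sketch assumes the common definition of the imbalance rate (majority class count divided by minority class count). The function names `split_by_class` and `imbalance_rate` and the toy dataset are illustrative and do not come from the patent.

```python
def split_by_class(samples):
    """Split (x, y) pairs into minority (+1) and majority (-1) sample sets."""
    x_mn = [x for x, y in samples if y == +1]  # minority (positive) class
    x_mx = [x for x, y in samples if y == -1]  # majority (negative) class
    return x_mn, x_mx


def imbalance_rate(samples):
    """Assumed definition: majority count divided by minority count."""
    x_mn, x_mx = split_by_class(samples)
    if not x_mn:
        raise ValueError("dataset contains no minority class samples")
    return len(x_mx) / len(x_mn)


# Toy dataset T: two minority samples and four majority samples.
T = [
    ((0.1, 0.2), +1),
    ((0.9, 0.8), -1),
    ((1.0, 0.7), -1),
    ((0.8, 0.9), -1),
    ((0.2, 0.1), +1),
    ((1.1, 1.0), -1),
]

X_MN, X_MX = split_by_class(T)
IR = imbalance_rate(T)
print(len(X_MN), len(X_MX), IR)
```

With this toy data the split yields 2 minority and 4 majority samples, so the assumed imbalance rate is 2.0.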

Abstract

The invention discloses an imbalanced data classification method based on clustering and distance weighting. It mainly addresses the low classification accuracy that prior-art methods achieve on imbalanced data whose minority class is discontinuously distributed. The method comprises the following steps: (1) collecting an imbalanced dataset and dividing it into a majority class sample set and a minority class sample set; (2) partitioning the majority class and minority class sample sets into non-overlapping class clusters; (3) calculating sample weights based on the distances between class clusters; (4) reducing the weights of majority class boundary samples; (5) training a weighted support vector machine classifier with the samples and their weights; and (6) classifying the samples to be tested. The method can effectively describe the relative distribution of the two classes of samples in the feature space and assigns each sample a weight according to its relative importance. On this basis it can construct correct classification boundaries and improve the classification accuracy of the minority class, and it can be used to classify imbalanced data with complex distributions.
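Steps (3) and (4) can be illustrated with a minimal sketch. The text shown here does not give the weighting formulas, so everything in the block is an assumption: cluster centers are taken as already computed in step (2), a cluster's weight is the hypothetical rule 1 / (1 + d), where d is the distance from its center to the nearest opposite-class center, and a majority sample counts as a boundary sample when it lies within a hypothetical `radius` of a minority center. All function and parameter names are illustrative.

```python
import math


def nearest_distance(point, centers):
    """Euclidean distance from point to the closest of the given centers."""
    return min(math.dist(point, c) for c in centers)


def cluster_weights(own_centers, other_centers):
    """Hypothetical rule: clusters nearer the opposite class weigh more."""
    return [1.0 / (1.0 + nearest_distance(c, other_centers))
            for c in own_centers]


def sample_weights(cluster_ids, weights_per_cluster):
    """Each sample inherits the weight of the cluster it belongs to."""
    return [weights_per_cluster[k] for k in cluster_ids]


def reduce_boundary_weights(points, weights, minority_centers,
                            radius, shrink=0.5):
    """Hypothetical step (4): shrink weights of majority samples that lie
    within `radius` of any minority cluster center."""
    return [w * shrink if nearest_distance(p, minority_centers) < radius else w
            for p, w in zip(points, weights)]


# Toy setup: two separated minority clusters with one majority cluster
# between them, mimicking a discontinuously distributed minority class.
minority_centers = [(0.0, 0.0), (4.0, 0.0)]
majority_centers = [(2.0, 0.0)]
majority_points = [(1.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
majority_ids = [0, 0, 0]  # every majority sample is in cluster 0

w_cluster = cluster_weights(majority_centers, minority_centers)
w_major = sample_weights(majority_ids, w_cluster)
w_major = reduce_boundary_weights(majority_points, w_major,
                                  minority_centers, radius=1.5)
print(w_major)
```

Only the first majority sample lies within the radius of a minority center, so only its weight is halved. For step (5), per-sample weights of this kind could be passed to a weighted SVM, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. That is one possible realization and not necessarily the classifier the patent uses.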

Description

Technical field

[0001] The invention belongs to the fields of data mining and machine learning, and in particular relates to an imbalanced data classification method based on clustering and distance weighting.

Background

[0002] Traditional classification algorithms usually assume that the class distribution in the dataset is balanced or that all classes have the same misclassification cost. If the number of samples of a certain class in a dataset is much smaller than the number of samples of the other classes, the dataset is called an imbalanced dataset. The class with more samples is called the majority class (negative class), and the class with fewer samples is called the minority class (positive class). For imbalanced datasets, traditional classification algorithms devoted to learning data generalization rules neglect or even ignore minority classes due to th...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/231, G06F18/23213, G06F18/2413, G06F18/2411
Inventor: 张奕, 蔡钢生
Owner: GUILIN UNIVERSITY OF TECHNOLOGY