
Unbalanced data classification method based on boundary upsampling

An upsampling and balancing technique, applied in the field of pattern recognition, that addresses problems such as limited performance and the inability to process the data

Status: Inactive; Publication Date: 2016-09-28
TIANJIN UNIV
5 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

Algorithm-level methods used on their own do not process the data itself, so their performance is limited.



Examples


Embodiment Construction

[0014] The present invention is inspired by the boundary upsampling algorithm and by the Bagging algorithm shown in Figure 1; the two are combined to form an ensemble classifier. The invention is described in further detail below with reference to the accompanying drawings.
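As a rough illustration of the Bagging half of this combination, the sketch below trains several base learners on bootstrap resamples and combines them by majority vote. It is a minimal sketch under stated assumptions: the function names `bagging_fit` and `bagging_predict`, the default of 11 estimators, and the toy `NearestCentroid` base learner are all hypothetical and are not taken from the patent, and labels are assumed to be 0 (negative) and 1 (positive).

```python
import numpy as np

class NearestCentroid:
    """Toy base learner used only for illustration (an assumption, not the source's classifier)."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Distance from every query point to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[np.argmin(d, axis=1)]

def bagging_fit(X, y, n_estimators=11, seed=0):
    """Train n_estimators base learners, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # draw with replacement
        models.append(NearestCentroid().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Majority vote over the base learners, assuming binary labels 0 and 1."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

In the method described here, the training set passed to the ensemble would be the balanced data set produced by boundary upsampling rather than the raw unbalanced data.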

[0015] (1) Obtaining test and training data: the present invention uses the yeast data set in the KEEL database, which mainly characterizes the location of a protein within the yeast cell. The positive class indicates that the protein is located on the cell membrane about to be lysed, and the negative class indicates that the protein is located in the cytoplasm or cytoskeleton. The data set contains 514 samples in total, 51 positive and 463 negative, that is, n_p = 51 and n_n = 463. Each sample has 8 features, which describe the yeast from multiple aspects such as the pH value of the cell fluid and the morphology of the cell membrane. The C4.5 decis...
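The class counts above fix how much upsampling is needed. The short calculation below is only a worked example using the stated values of n_p and n_n; the variable names are illustrative.

```python
n_p, n_n = 51, 463            # positive and negative sample counts from the text

total = n_p + n_n             # 514 samples in total
imbalance_ratio = n_n / n_p   # roughly 9 negatives per positive
n_synthetic = n_n - n_p       # 412 new positives needed to balance the classes
```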


Abstract

The invention relates to an unbalanced data classification method based on boundary upsampling. For each positive sample in the unbalanced data set, the K nearest sample points in the data set are found using Euclidean distance as the distance metric, their class labels are compared, and the number ki of negative samples among the K points is counted. If ki >= K / 2, the positive sample lies near the real decision boundary between positive and negative samples; otherwise it lies far from the boundary. One positive sample is then randomly selected from the K nearest positive samples, and a new positive sample is generated between the positive sample in question and the selected one. This process is repeated until the numbers of positive and negative samples are equal, which yields a balanced data set. The balanced data set is then used to train the final classification model with the Bagging algorithm. According to the invention, a better classification effect can be achieved on unbalanced data sets.
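The upsampling step in the abstract can be sketched roughly as follows. This is a minimal sketch, not the patented implementation, and it rests on several assumptions that the abstract does not spell out: new samples are generated only from positives flagged as near the boundary (inferred from the method's name), the new point is placed at a uniformly random position on the segment between the two positives, labels are 1 for positive and 0 for negative, and the function name `boundary_upsample` and its parameters are hypothetical.

```python
import numpy as np

def boundary_upsample(X, y, K=5, seed=0):
    """Return a balanced copy of (X, y); label 1 = positive (minority), 0 = negative."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos_idx = np.flatnonzero(y == 1)
    n_needed = int(np.sum(y == 0)) - len(pos_idx)

    # Euclidean distance from every positive sample to every sample in the data set.
    d = np.linalg.norm(X[pos_idx][:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(pos_idx)), pos_idx] = np.inf  # a point is not its own neighbour

    # k_i: number of negatives among the K nearest neighbours of positive sample i.
    nn = np.argsort(d, axis=1)[:, :K]
    k_neg = np.sum(y[nn] == 0, axis=1)
    boundary = np.flatnonzero(k_neg >= K / 2)  # positions within pos_idx
    if n_needed <= 0 or len(boundary) == 0 or len(pos_idx) < 2:
        return X, y

    # Nearest positive neighbours of each positive, used as interpolation partners.
    k_pos = min(K, len(pos_idx) - 1)
    pos_nn = np.argsort(d[:, pos_idx], axis=1)[:, :k_pos]

    new_points = []
    for _ in range(n_needed):
        i = rng.choice(boundary)       # assumption: seed only from boundary positives
        j = rng.choice(pos_nn[i])      # random partner among its nearest positives
        gap = rng.random()             # assumption: uniform position on the segment
        a, b = X[pos_idx[i]], X[pos_idx[j]]
        new_points.append(a + gap * (b - a))

    X_bal = np.vstack([X, np.array(new_points)])
    y_bal = np.concatenate([y, np.ones(n_needed, dtype=y.dtype)])
    return X_bal, y_bal
```

The balanced arrays returned here would then be handed to the Bagging stage. The pairwise distance matrix keeps the sketch short but costs memory proportional to the number of positives times the data set size, so a neighbour index would be preferable for large data.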

Description

Technical field

[0001] The invention relates to pattern recognition technology, in particular to a classifier for unbalanced data sets.

Background art

[0002] With the development of society and advances in science and technology, automatic classification by computer, based on machine learning and pattern recognition, plays an increasingly important role in daily life. Establishing an appropriate data classification model and setting credible performance evaluation standards have therefore become a major research focus.

[0003] However, current mainstream classifiers such as support vector machines, decision trees, and extreme learning machines are designed with the overall misclassification rate on the training data as the main indicator, and this approach is effective only when the classes in the sample are basically balanced (that is, the number of samples in each class is roughly equal). But...
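One way to see why an overall misclassification rate depends on class balance is to score a trivial classifier that always predicts the negative class. The example below is only an illustration; it reuses the class counts quoted in the embodiment (51 positive, 463 negative), and the helper names are hypothetical.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label=1):
    """Fraction of samples of the given class that are predicted as that class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    actual = sum(1 for t in y_true if t == label)
    return hits / actual

y_true = [1] * 51 + [0] * 463   # 1 = positive, 0 = negative
y_pred = [0] * 514              # always predict the majority (negative) class

acc = overall_accuracy(y_true, y_pred)   # about 0.90
pos_recall = recall(y_true, y_pred)      # 0.0: no positive sample is found
```

The overall accuracy looks high even though every positive sample is misclassified, which is the failure mode that motivates rebalancing the data.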


Application Information

IPC(8): G06K9/62
CPC: G06F18/2411, G06F18/214
Inventor: 李喆, 吕卫, 褚晶辉
Owner: TIANJIN UNIV