A hierarchical nearest neighbor undersampling method based on clustering

A clustering and sampling technique, applied in fields such as instruments, character and pattern recognition, and computer parts, that addresses problems such as the loss of useful information from majority class samples.

Inactive Publication Date: 2019-03-26
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Random undersampling improves computational efficiency by reducing the number of majority class samples, but it selects samples blindly and can discard useful information carried by the majority class.
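For contrast, the random undersampling baseline criticized here can be sketched as follows. This is a minimal illustration, not part of the patent text; the function name `random_undersample` and the choice to keep as many majority samples as there are minority samples are assumptions.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Keep a uniformly random subset of the majority class (hypothetical helper).

    Assumption: the target size is the number of minority samples.
    The draw ignores how the majority samples are distributed, which is
    why informative samples can be lost.
    """
    rng = random.Random(seed)
    n_keep = min(len(minority), len(majority))
    kept = rng.sample(list(majority), k=n_keep)
    return kept + list(minority)
```

Because the draw is uniform, a small but informative region of the majority class may end up with no representatives at all.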




Embodiment Construction

[0024] In order to better understand the technical solutions of the present invention, the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0025] It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0026] An embodiment of the present invention provides a hierarchical nearest neighbor undersampling method based on clustering. Figure 1 is a schematic flow chart of the method, which includes the following steps:

[0027] Step 101, using the Kmeans clustering algorithm to obtain the elbow diag...
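The elbow diagram mentioned in Step 101 can be illustrated with a short sketch. It assumes scikit-learn's `KMeans` and uses its `inertia_` attribute (the sum of squared distances of samples to their nearest center) as the summed distortion. The source does not say how the elbow is read off the curve, so the largest-second-difference rule in `pick_elbow` is an assumption, as are both function names.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_distortions(X_majority, k_values, seed=0):
    """Return the summed within-cluster distortion for each candidate k."""
    distortions = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_majority)
        distortions.append(float(model.inertia_))
    return distortions

def pick_elbow(k_values, distortions):
    """Pick k at the sharpest bend of the curve.

    Assumption: the bend is located by the largest second difference of
    the distortion curve; the source only says k is chosen from the
    relationship between cluster count and summed distortion.
    Requires at least three consecutive candidate values of k.
    """
    d = np.asarray(distortions, dtype=float)
    second_diff = d[:-2] - 2.0 * d[1:-1] + d[2:]
    # second_diff[i] is centered on k_values[i + 1]
    return k_values[int(np.argmax(second_diff)) + 1]
```

In practice the curve is often plotted and inspected by eye; the automatic rule is only a convenience for well-separated data.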


Abstract

An embodiment of the invention provides a hierarchical nearest neighbor undersampling method based on clustering, comprising the following steps. First, the Kmeans clustering algorithm is used to obtain an elbow diagram of the majority class samples, and the optimal number of clusters k is selected according to the relationship between the number of clusters and the sum of the distortions of the individual clusters. Second, the Kmeans clustering algorithm is used to cluster the majority class samples into k clusters, yielding the center point and the number of sample points of each cluster. Third, stratified sampling is carried out according to the number of sample points in each cluster, and the nearest neighbors of each cluster's center point are combined with the minority class samples to form the sampling result. The technical solution makes full use of the distribution characteristics of the majority class samples, better retains their useful information, and can effectively improve the classification performance of the subsequent classification algorithm.
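The clustering and sampling steps of the abstract can be sketched as below, taking k as given. This is one possible reading rather than the patented implementation: it assumes scikit-learn's `KMeans`, allocates each cluster a quota proportional to its size (largest-remainder rounding), and by default keeps as many majority samples as there are minority samples. The function name `cluster_nn_undersample` and the `n_keep` parameter are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_nn_undersample(X_maj, X_min, k, n_keep=None, seed=0):
    """Cluster the majority class, then keep the points nearest each center.

    Assumptions: quotas are proportional to cluster size, and the default
    target size equals the number of minority samples.
    Returns (X, y, kept) where y is 0 for majority and 1 for minority, and
    kept holds the indices of the retained majority samples.
    """
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)
    if n_keep is None:
        n_keep = len(X_min)
    n_keep = min(n_keep, len(X_maj))

    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_maj)
    sizes = np.bincount(km.labels_, minlength=k)

    # Stratified allocation: floor of the proportional share, then hand the
    # leftover slots to the clusters with the largest fractional parts.
    shares = n_keep * sizes / sizes.sum()
    quotas = np.floor(shares).astype(int)
    leftover = int(n_keep - quotas.sum())
    for c in np.argsort(-(shares - quotas))[:leftover]:
        quotas[c] += 1

    kept = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        dist = np.linalg.norm(X_maj[members] - km.cluster_centers_[c], axis=1)
        # Nearest neighbors of the cluster center, up to this cluster's quota
        kept.extend(members[np.argsort(dist)[:quotas[c]]].tolist())
    kept = np.array(sorted(kept), dtype=int)

    X = np.vstack([X_maj[kept], X_min])
    y = np.concatenate([np.zeros(len(kept), dtype=int),
                        np.ones(len(X_min), dtype=int)])
    return X, y, kept
```

The returned `X` and `y` can be passed directly to a downstream classifier, and `kept` records which majority samples survived so the selection can be inspected.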

Description

Technical Field

[0001] The invention relates to an undersampling method in the field of machine learning, and in particular to a hierarchical nearest neighbor undersampling method based on clustering.

Background

[0002] When machine learning methods are used to solve classification problems, data sets are often imbalanced; that is, the number of samples of one class is far smaller than the number of samples of the other classes. Addressing this between-class imbalance with a suitable resampling algorithm, so as to improve the classification model's recognition rate for minority class samples, is a current research hotspot. Techniques commonly used for classifying imbalanced data sets fall into two main groups: algorithm-based methods and data-based methods. Algorithm-based methods refer to the shortcomings of classification algorithms in solving the imbalanc...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/231
Inventor: 高欣梁跃何杨刘鑫井潇刁新平
Owner: BEIJING UNIV OF POSTS & TELECOMM