Hierarchical empowerment classification method and system for unbalanced data

A hierarchical weighting classification technique for unbalanced data, applied in the fields of pattern recognition and machine learning. It addresses three problems: poor classifier recognition of sub-concept clusters, blind increases to the learning weight of few-sample classes, and a high false positive rate for few-sample classes. The intended effects are a higher recall rate, a lower false positive rate, and better overall performance.

Pending Publication Date: 2021-11-26
山东产业技术研究院智能计算研究院

AI Technical Summary

Problems solved by technology

Existing approaches can alleviate the imbalance problem to a certain extent, but they apply the same processing to all samples of a category, ignore the distribution within the category, and blindly increase the learning weight of the few-sample class. As a result, the rebalanced classifier has a higher false positive rate for the few-sample class.
Moreover, as noted in the literature [Weiss G M, Provost F. Learning when training data are costly: The effect of class distribution on tree induction [J]. Journal of Artificial Intelligence Research, 2003, 19: 315-354.], because of the "within-class imbalance" phenomenon, classifiers usually recognize sub-concept clusters with fewer samples poorly.


Examples

Embodiment 1

[0047] This embodiment discloses a hierarchical weighting classification method for unbalanced data that effectively addresses the data imbalance problem. The method is called the Hierarchical Weighting Machine (HWM).

[0048] As shown in Figure 1, the method mainly includes two stages: intra-class sub-concept cluster weight learning and global weight normalization.

[0049] In the intra-class sub-concept cluster weight learning stage, the training samples are first hierarchically clustered until the number of samples in the smallest cluster is less than 10% of the number of samples. The quantity weight Wn of the samples in each cluster is then calculated from the number of samples the cluster contains, so that the sum of the quantity weights of every sub-concept cluster remains consistent. Further, the distance between each sub-concept cluster and the opposite class is calculated, and the distance weight Wd is determined accordi...
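The quantity weight can be illustrated with a minimal Python sketch. The source does not give the exact formula for Wn, so the inverse-cluster-size rule below is an assumption chosen only because it makes every cluster's weights sum to the same total. The function name `quantity_weights` and the sample data are hypothetical, and the cluster assignments are taken as given rather than produced by hierarchical clustering.

```python
from collections import Counter

def quantity_weights(cluster_ids):
    """Return one weight per sample so that every cluster's weights sum to the same total.

    Assumption (not from the source): each cluster receives an equal share of a
    total weight of 1, split evenly among the samples in that cluster.
    """
    sizes = Counter(cluster_ids)          # number of samples in each cluster
    share = 1.0 / len(sizes)              # equal share per sub-concept cluster
    return [share / sizes[c] for c in cluster_ids]

# Hypothetical cluster assignments for ten samples of one class:
# cluster 0 has six samples, cluster 1 has three, cluster 2 has one.
cluster_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
wn = quantity_weights(cluster_ids)

# Sum of weights inside each cluster, to check that the sums are consistent.
cluster_sums = {}
for w, c in zip(wn, cluster_ids):
    cluster_sums[c] = cluster_sums.get(c, 0.0) + w
```

Under this assumption, samples in small sub-concept clusters receive larger individual weights, while each cluster as a whole contributes equally.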

Embodiment 2

[0100] The purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, and the processor implements the steps of the above method when executing the program.

Embodiment 3

[0102]The purpose of this embodiment is to provide a computer-readable storage medium.

[0103] A computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps of the above-mentioned method are executed.

Abstract

The invention provides a hierarchical weighting classification method and system for unbalanced data. The method comprises two steps. In the intra-class sub-concept cluster weight learning step, the training samples are hierarchically clustered, and the sample weight of each cluster is calculated from the number of samples it contains in the clustering result, so that the sum of the quantity weights of every sub-concept cluster is kept consistent; the intra-class weight of each sub-concept cluster is then calculated. In the global weight normalization step, the intra-class weights are normalized to global weights according to a certain inter-class weight ratio, a weighted support vector machine is constructed as the classifier, and the classifier is used to classify and recognize the unbalanced input data. The method increases the learning weight of sub-concepts that have few samples and lie close to the classification surface, so that the recall rate of minority classes is improved, the false positive rate of minority classes is reduced, and overall performance is maintained.
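The global weight normalization step can be sketched in Python as follows. The source only says that intra-class weights are normalized "according to a certain inter-class weight ratio", so the rescaling rule, the function name `normalize_global`, the `class_ratio` parameter, and the sample values are all assumptions made for illustration.

```python
def normalize_global(intra_weights, labels, class_ratio):
    """Rescale intra-class weights so each class's total matches its share of class_ratio.

    Assumption (not from the source): the global weights sum to 1 overall, and
    class y receives the fraction class_ratio[y] / sum(class_ratio.values()).
    """
    # Total intra-class weight currently held by each class.
    totals = {}
    for w, y in zip(intra_weights, labels):
        totals[y] = totals.get(y, 0.0) + w
    ratio_sum = sum(class_ratio.values())
    return [w / totals[y] * class_ratio[y] / ratio_sum
            for w, y in zip(intra_weights, labels)]

# Hypothetical data: two minority samples (label 1) and four majority samples (label 0).
labels = [1, 1, 0, 0, 0, 0]
intra_weights = [0.5, 0.5, 0.1, 0.2, 0.3, 0.4]
class_ratio = {0: 1.0, 1: 1.0}   # assumed 1:1 inter-class weight ratio

global_weights = normalize_global(intra_weights, labels, class_ratio)

# Total global weight per class, to check that the requested ratio holds.
class_sums = {}
for w, y in zip(global_weights, labels):
    class_sums[y] = class_sums.get(y, 0.0) + w
```

Weights of this kind could then be supplied to any support vector machine implementation that accepts per-sample weights, for example through the `sample_weight` argument of `fit` in scikit-learn's `SVC`.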

Description

Technical field

[0001] The invention belongs to technical fields such as machine learning and pattern recognition, and in particular relates to a hierarchical weighting classification method and system for unbalanced data.

Background

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] Class imbalance is very common when machine learning is used to solve real-life problems. For example, in spam identification, the number of spam emails is much smaller than that of normal emails; in medical diagnosis, the number of patients is much smaller than the number of healthy people. Many traditional classification methods assume that the amount of training data in each category is balanced, so when the trained classifier faces unbalanced data, its classification results are biased towards the many-sample categories. Although from the perspective of ov...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/231, G06F18/2411
Inventor: 杨晓东, 陈益强
Owner: 山东产业技术研究院智能计算研究院