Oversampling method based on angle and direction clustering

An oversampling technique based on angle and direction clustering, applied in character and pattern recognition, instruments, computer parts, and related fields. It addresses problems of existing oversampling methods, such as poor results on some data sets, synthetic samples that easily fall among majority class samples, and neglect of the importance of boundary samples, and so avoids a worsening of the classification effect.
CN113139595A · Pending · Publication Date: 2021-07-20 · HUNAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUNAN UNIV
Publication Date
2021-07-20


Abstract

The invention discloses an oversampling method based on angle and direction clustering. The method comprises: obtaining an imbalanced data set; clustering the data set with a clustering algorithm to generate a cluster label, an angle variance, and a sorted neighbor set for each sample; filtering each sample whose cluster label is noise to obtain the filtered samples; calculating a first oversampling weight, a second oversampling weight, and an optimal interpolation neighbor set for each minority class sample from the cluster label, angle variance, and sorted neighbor set of each sample; and calculating the oversampling weight of each cluster, and the number of new samples each cluster needs to synthesize, from the first oversampling weights of all minority class samples. The method addresses the technical problem that existing oversampling methods ignore the importance of boundary samples in classification.
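The last step of the abstract, turning per-sample weights into a per-cluster count of new samples, can be sketched as follows. This is only an illustration under stated assumptions, not the patented formula: the function name `allocate_per_cluster`, the noise label `-1` (borrowed from the DBSCAN convention), the rule that a cluster's weight is the normalized sum of its minority samples' first oversampling weights, and the largest-remainder rounding are all hypothetical choices. The abstract does not say how the weights themselves are computed, so they are taken as given inputs here.

```python
import numpy as np

NOISE = -1  # assumed noise label (DBSCAN convention), not from the patent


def allocate_per_cluster(labels, is_minority, first_weights, n_total):
    """Split n_total synthetic samples across clusters.

    Assumption: a cluster's oversampling weight is the sum of the first
    oversampling weights of its minority samples, normalized over clusters.
    """
    labels = np.asarray(labels)
    is_minority = np.asarray(is_minority, dtype=bool)
    first_weights = np.asarray(first_weights, dtype=float)

    # Filtering step: ignore samples labeled as noise and majority samples.
    keep = is_minority & (labels != NOISE)
    clusters = np.unique(labels[keep])

    sums = np.array([first_weights[keep & (labels == c)].sum() for c in clusters])
    if sums.size == 0 or sums.sum() <= 0:
        raise ValueError("no minority samples with positive weight")
    cluster_weights = sums / sums.sum()

    # Largest-remainder rounding so the counts add up to n_total exactly.
    raw = cluster_weights * n_total
    counts = np.floor(raw).astype(int)
    remainder = n_total - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:remainder]] += 1
    return {int(c): int(n) for c, n in zip(clusters, counts)}
```

For example, with two clusters whose minority samples carry equal total weight, a request for 10 new samples is split 5 and 5; the noise sample and the majority samples do not influence the split.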

Description

Technical field

[0001] The invention belongs to the technical field of data mining and, more particularly, relates to an oversampling method based on angle and direction clustering.

Background art

[0002] The development of machine learning and deep learning provides powerful support for classification and prediction. Classification is a form of supervised learning, so a data set must be provided to train the classification model; in most cases, however, that data set is imbalanced. Samples whose label makes up a higher proportion of the data set are called majority class samples, and samples whose label makes up a lower proportion are called minority class samples. Imbalance means that the majority class samples in the data set often far outnumber the minority class samples. Because the minority class has so little data, the classifier cannot learn minority class samples effectively, and then it is difficult to m...
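To make the majority/minority definitions concrete, the sketch below counts labels and reports the ratio between the largest and smallest class. The function name `imbalance_ratio` and the toy labels `"neg"` and `"pos"` are illustrative assumptions, not part of the patent text.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority_label, minority_label, majority/minority count ratio)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    ordered = counts.most_common()  # sorted from most to least frequent
    majority_label, n_major = ordered[0]
    minority_label, n_minor = ordered[-1]
    return majority_label, minority_label, n_major / n_minor


# Toy data set: 95 majority class samples, 5 minority class samples.
labels = ["neg"] * 95 + ["pos"] * 5
print(imbalance_ratio(labels))  # ('neg', 'pos', 19.0)
```

A ratio of 19 means the classifier sees 19 majority class samples for every minority class sample, which is the situation oversampling is meant to correct.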

Claims
