A multi-classification-oriented unbalanced data preprocessing method, device, and apparatus

A data preprocessing and multi-classification technology applied in the field of big data processing. It addresses three problems: existing methods cannot process certain samples effectively, sample processing cannot comprehensively consider every class, and the accuracy of the multi-classification algorithm cannot be effectively improved. The stated effect is improved classification accuracy.

Inactive Publication Date: 2018-12-18
GUANGZHOU UNIVERSITY

AI Technical Summary

Problems solved by technology

[0008] 2. A given sample may play different roles in different binary classification problems. For example, it may be noise in one binary classification, where it needs to be deleted, and an important boundary sample in another, where it needs to be retained. Existing methods cannot handle this effectively.
[0009] In short, if a multi-classification problem is treated as multiple binary classification problems, the processing of samples cannot comprehensively consider the different situations in each category, and the accuracy of the multi-classification algorithm cannot be effectively improved.

Examples

Embodiment Construction

[0068] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0069] Referring to Figure 3, the first embodiment of the present invention provides a multi-classification-oriented unbalanced data preprocessing method, which can be performed by a multi-classification-oriented unbalanced data preprocessing device (hereinafter referred to as the device), and which includes at least the following steps:

[0070] S101. Read an original sample set; wherein, the original sample set includes sample sets of at least two categories.

[0071] In...

Abstract

The invention discloses a multi-classification-oriented unbalanced data preprocessing method, device, and apparatus. The method comprises the following steps: receiving the final sample set size and the imbalance ratio of the sample set, and obtaining the ideal number of samples for each class; determining the minority-class and majority-class sample sets according to the ideal and actual numbers of samples; for each sample in a minority-class sample set, counting the other-class samples and minority-class samples among its k nearest neighbors in order to mark the sample; deleting, retaining, copying, or synthesizing samples in the minority-class sample set according to their marks to obtain the final minority-class sample set; for each sample in a majority-class sample set, counting the majority-class samples and other-class samples among its k nearest neighbors in order to mark the sample; deleting or retaining samples in the majority-class sample set according to their marks to obtain the final majority-class sample set; and generating the final sample set. The resulting final sample set effectively improves the accuracy of the multi-classification algorithm.
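Two of the steps in the abstract can be sketched in code: splitting classes into minority and majority by comparing actual and ideal counts, and counting same-class and other-class samples among the k nearest neighbours. This is a minimal sketch under stated assumptions, not the patented procedure. The abstract does not give the formula that derives the ideal counts from the final size and imbalance ratio, so the ideal counts are passed in as hypothetical values. The comparison direction (fewer than ideal means minority), the mark names, and all function names are also assumptions.

```python
import numpy as np

def split_classes(actual_counts, ideal_counts):
    """Assumed rule: fewer samples than ideal -> minority, more -> majority."""
    minority = [c for c, n in actual_counts.items() if n < ideal_counts[c]]
    majority = [c for c, n in actual_counts.items() if n > ideal_counts[c]]
    return minority, majority

def knn_same_other_counts(X, y, k):
    """For every sample, count same-class and other-class samples
    among its k nearest neighbours (Euclidean distance)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)         # exclude each sample from its own neighbours
    nn = np.argsort(dist, axis=1, kind="stable")[:, :k]
    same = (y[nn] == y[:, None]).sum(axis=1)
    return same, k - same

def mark_samples(same, other):
    """Hypothetical marks; the patent's own marking rule is not shown in this excerpt."""
    return np.where(same == 0, "noise",
                    np.where(other == 0, "safe", "boundary"))

# Toy data: class 0 has four samples, class 1 has three.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [0.5, 0.4]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1])

# Hypothetical ideal counts; the real ones come from the final size and imbalance ratio.
minority, majority = split_classes({0: 4, 1: 3}, {0: 3, 1: 4})
same, other = knn_same_other_counts(X, y, k=2)
marks = mark_samples(same, other)
print(minority, majority, marks)
```

The counts are computed for all samples at once; a caller would then select the rows belonging to a minority or majority class and apply the delete, retain, copy, or synthesize decision that the abstract describes for that group.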

Description

Technical field

[0001] The present invention relates to the field of big data processing, in particular to a multi-classification-oriented unbalanced data preprocessing method, device, and apparatus.

Background

[0002] With the continuous advancement of technology, including the improvement of Internet speed, the upgrading of the mobile Internet, the continuous development of hardware technology, and the rapid development of data acquisition, storage, and processing technology, data is growing at an unprecedented rate, and we have entered the big data era. The characteristics of big data, such as huge data volume (volume), high speed (velocity), variety (variety), and data uncertainty (veracity), mean that traditional data analysis and mining technologies encounter unprecedented challenges when applied to the field of big data.

[0003] Data classification is a basic algorithm in data analysis and mining, which has a wide range of applications and is also the b...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 韩伟红, 李树栋, 王乐, 方滨兴, 贾焰, 黄子中, 周斌, 殷丽华, 田志宏
Owner: GUANGZHOU UNIVERSITY