Data classification method and device, equipment and computer readable storage medium

A data classification technique for imbalanced classes, applied in the field of information processing, that addresses problems such as unbalanced samples, unsatisfactory model training results, and unsatisfactory prediction results.

Status: Inactive
Publication Date: 2018-09-04
PING AN TECH (SHENZHEN) CO LTD
0 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0002] In data modeling and classification, and especially in multi-class problems, the classes are often imbalanced. When the number of training samples differs greatly between classes and the unbalanced samples are used directly to train a classification model, the training result may be very poor because of that imbalance. Predictions made with the trained model are then unsatisfactory, or may even be the opposite of the correct result.
[0003] A common remedy is to generate new samples for the classes that have few samples, increasing their number until it is balanced with that of the classes that have many. The new samples need to be as close as possible to real samples, but they are still not real samples, and a model trained on them can adversely affect the prediction results. If the new samples are combined with the original samples for a single round of modeling and that one-time prediction is wrong, the error is irreversible.
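As a minimal sketch of the common practice described in [0003], new minority samples can be generated by interpolating between pairs of real minority samples. The interpolation scheme, the function name `interpolate_samples`, and the toy data are assumptions made for illustration; the source does not say how new samples are generated.

```python
import random

def interpolate_samples(minority, n_new, seed=0):
    """Create n_new synthetic points, each lying on the segment
    between two distinct real minority samples (an assumed scheme)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct real samples
        t = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical toy data: three real minority samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.2]]
new_points = interpolate_samples(minority, n_new=5)
```

Every generated point stays inside the range spanned by the real samples, which keeps it close to them, yet none of the points is a real observation. That is the limitation [0003] points out.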




Embodiment Construction

[0028] The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

[0029] Figure 1 is a schematic flowchart of a data classification method provided by an embodiment of the present application. The method can run on terminals such as smartphones (for example, Android or iOS phones), tablet computers, notebook computers, and smart devices. As shown in Figure 1, the method includes steps S101 to S108.

[0030] S101. Obtain a sample set, where the sample set includes a majority sample set and a minority samp...


Abstract

The invention provides a data classification method and device, equipment, and a computer-readable storage medium. When two classes of samples are unbalanced, several sample sets of the same class are generated by downsampling the class with the large number of samples, and new samples are generated by upsampling the class with the small number of samples. The new samples are mixed with the original minority samples to form a larger minority sample set, so that the number of samples in the minority sample sets is balanced with the number in the majority sample sets. The minority and majority samples are then used to model and predict the data multiple times, and the prediction result that holds the numerical majority is taken as the classification result. Upsampling, downsampling, repeated modeling, and repeated prediction together improve the accuracy of data prediction.
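The flow in the abstract can be sketched roughly as follows. This is one reading of the abstract, not the patent's implementation: the split into `k` disjoint majority subsets, the interpolation-based upsampling, the nearest-centroid stand-in model, the labels 0 (majority) and 1 (minority), and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def split_majority(majority, k, rng):
    """Downsample: shuffle the majority class and deal it into k disjoint subsets."""
    shuffled = majority[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def upsample(minority, target, rng):
    """Upsample: keep the real minority samples and add interpolated
    ones until the set holds at least `target` samples."""
    out = list(minority)
    while len(out) < target:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        out.append([x + t * (y - x) for x, y in zip(a, b)])
    return out

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def train(neg, pos):
    """Stand-in model: nearest-centroid classifier (an assumption,
    the abstract does not name a model type)."""
    c0, c1 = centroid(neg), centroid(pos)
    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0
    return predict

def classify(majority, minority, x, k=3, seed=0):
    """Train one model per balanced pair of sets and return the majority vote."""
    rng = random.Random(seed)
    votes = []
    for subset in split_majority(majority, k, rng):
        balanced_minority = upsample(minority, len(subset), rng)
        model = train(subset, balanced_minority)
        votes.append(model(x))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: 30 majority points near the origin, 3 minority points near (5, 5).
majority = [[i * 0.1, j * 0.1] for i in range(6) for j in range(5)]
minority = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
near_minority = classify(majority, minority, [4.9, 5.0])
near_majority = classify(majority, minority, [0.1, 0.1])
```

An odd `k` avoids tied votes in the two-class case. Because each model sees a different majority subset, a single poorly balanced training run cannot decide the final label on its own.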

Description

Technical field

[0001] This application relates to the field of information processing technology, and in particular to a data classification method, device, equipment, and computer-readable storage medium.

Background technique

[0002] In data modeling and data classification, and especially in multi-class problems, the various classes of samples are often imbalanced. When the number of training samples differs greatly between classes and the unbalanced samples are used directly to train a classification model, the training result may be very poor because of that imbalance. The prediction obtained from the trained model is then unsatisfactory, or even the opposite of the correct result.

[0003] At present, the more common practice is to increase the number of samples by generating new samples for the classes with a smaller number of samples, to achieve a level that is bala...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/906
Inventor: 伍文岳
Owner: PING AN TECH (SHENZHEN) CO LTD