Data classification method, device, electronic device and computer readable medium

A data classification technology, applied in the field of data processing, that addresses problems such as degraded performance on majority class samples, models that are hard to extend to other scenarios, and poor coverage and accuracy.

Inactive Publication Date: 2017-09-15
JINGDONG TECH HLDG CO LTD
0 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

[0009] Disadvantage 1: Increasing the proportion of minority class samples can alleviate data imbalance, but a model trained on resampled data is ultimately applied to predict and classify new samples, whose distribution does not change: the proportion of minority class samples in them remains low. The training data and the actual application data are therefore not independent and identically distributed, and the model results lack a sound basis.
[0010] Disadvantage 2: For imbalanced classification problems characterized by the minority class proportion, there is little guidance on how to adjust that proportion or how to design the resampled training data; this requires careful trial and error, and the workload is heavy.
[0011] Disadvantage 3: The model transfers poorly across different businesses and scenarios, and the model obtained through training depends on the establish
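
The mismatch described in Disadvantage 1 can be illustrated with a small sketch. The label lists, the 5% minority share, and the helper names `minority_share` and `oversample_minority` are all hypothetical and chosen only for illustration; they do not come from the patent.

```python
from typing import List


def minority_share(labels: List[int]) -> float:
    """Fraction of samples labeled 1 (the minority class)."""
    return sum(labels) / len(labels)


def oversample_minority(labels: List[int], target_share: float) -> List[int]:
    """Hypothetical resampling: append minority labels until the
    minority share reaches target_share."""
    resampled = list(labels)
    while minority_share(resampled) < target_share:
        resampled.append(1)
    return resampled


# Assumed data: training and newly arriving samples both have 5% minority.
train_labels = [1] * 5 + [0] * 95
new_labels = [1] * 5 + [0] * 95

resampled_labels = oversample_minority(train_labels, target_share=0.5)

# Resampling changes only the training distribution, not the new data.
train_share_after = minority_share(resampled_labels)
new_share = minority_share(new_labels)
print(train_share_after, new_share)
```

After resampling, half of the training labels are minority, while the new samples still contain 5%, which is the gap between training and application data that the paragraph above points out.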



Embodiment Construction

[0061] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus repeated descriptions thereof will be omitted.

[0062] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the present disclosure. However, those skilled in ...


Abstract

The disclosure provides a data classification method, device, electronic device and computer-readable medium. The data classification method includes: using a machine learning method to model the full training data, which contains minority class samples, to obtain an original model; screening new training data from the full training data based on a minority class proportion threshold, which is a critical value of the proportion of minority class samples in the full training data; using the machine learning method to model the new training data to obtain a newly trained model; applying the original model and the newly trained model to classify the new training data, obtaining an original classification result and a newly trained classification result; and comparing the accuracy of the two results and using the one with the higher accuracy as the final classification result. The model is retrained on new training data with a higher proportion of minority class samples and the original model's result is updated, with the aim of improving the accuracy of sample classification.
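
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say which machine learning method is used or how the screening is performed, so the one-feature threshold classifier `train_stump`, the screening rule in `screen` (keep every minority sample plus just enough majority samples to reach the threshold), the toy data, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]      # (feature value, label); label 1 = minority class
Model = Callable[[float], int]


def train_stump(data: List[Sample]) -> Model:
    """Stand-in 'machine learning method': pick the cut that maximizes
    training accuracy for the rule 'predict 1 if x >= cut'."""
    best_cut, best_correct = float("inf"), -1
    for cut in sorted({x for x, _ in data}) + [float("inf")]:
        correct = sum((1 if x >= cut else 0) == y for x, y in data)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return lambda x: 1 if x >= best_cut else 0


def accuracy(model: Model, data: List[Sample]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


def screen(data: List[Sample], min_ratio: float) -> List[Sample]:
    """Assumed screening rule: keep all minority samples and the first
    majority samples that still leave the minority share >= min_ratio."""
    minority = [s for s in data if s[1] == 1]
    majority = [s for s in data if s[1] == 0]
    max_majority = int(len(minority) * (1 - min_ratio) / min_ratio)
    return minority + majority[:max_majority]


def classify(full_data: List[Sample], min_ratio: float):
    original = train_stump(full_data)            # model on full training data
    new_data = screen(full_data, min_ratio)      # screened new training data
    retrained = train_stump(new_data)            # model on new training data
    acc_original = accuracy(original, new_data)  # both models scored on new data
    acc_retrained = accuracy(retrained, new_data)
    chosen = retrained if acc_retrained > acc_original else original
    return chosen, acc_original, acc_retrained


# Toy data (assumed): 8 majority samples and 3 minority samples.
FULL_DATA: List[Sample] = (
    [(x, 0) for x in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)]
    + [(0.55, 1), (0.65, 1), (0.9, 1)]
)

chosen, acc_original, acc_retrained = classify(FULL_DATA, min_ratio=0.5)
print(acc_original, acc_retrained)
```

On this toy data the retrained model scores higher on the screened data than the original model, so it is the one selected. With a different screening rule or learner the comparison could go the other way, in which case the original model's result is kept.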

Description

Technical field

[0001] The present disclosure generally relates to the technical field of data processing, and in particular to a data classification method, device, electronic device, and computer-readable medium.

Background

[0002] At present, machine learning is widely used to classify samples. Commonly used algorithm models include logistic regression, decision trees, random forests, support vector machines and neural networks. When training a model, most algorithms assume that the number of samples in each category of the training data is roughly balanced and that the cost of a prediction error is the same for every category. When the amounts of data in the different categories do not differ much, machine learning can usually achieve good classification results. In practice, however, the requirement of balanced sample data is often not satisfied, and the data volume of each classification data may have a la...


Application Information

IPC(8): G06K9/62, G06F17/30, G06N99/00
CPC: G06F16/254, G06N20/00, G06F18/23, G06F18/24, G06F18/214
Inventor: 解鹏, 曲洪涛
Owner: JINGDONG TECH HLDG CO LTD