
Training data resampling method and device, storage medium and electronic device

A training data resampling technology, applied in database models, visual data mining, structured data retrieval, etc., that addresses unbalanced training data and classification models that are unfriendly to small categories, with the effect of improving classification accuracy and user experience.

Active Publication Date: 2019-04-16
BEIJING BYTEDANCE NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of this disclosure is to provide a training data resampling method, device, storage medium and electronic equipment. To handle unbalanced training data, the training data is resampled according to the proportions of the different classifications in the actual original data, which addresses the problem that the classification model is unfriendly to small categories.



Examples


Embodiment Construction

[0046] Specific embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

[0047] Unbalanced training data is often encountered when building a classification model. A classification model trained on such data tends to be biased towards the categories with a large proportion, and may over-fit those categories. Therefore, when building a classification model, those skilled in the art usually adopt strategies such as improving the classification algorithm or balancing the classes of the training data (data preprocessing). The latter, that is, resampling the unbalanced training data, is more commonly used because of its wide range of applications.
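As an illustration of class balancing as a data preprocessing step, the following is a minimal sketch of generic random oversampling. It is not the resampling method of this disclosure; the function name `oversample_to_balance` and its parameters are hypothetical and chosen only for this example.

```python
import random


def oversample_to_balance(samples, labels, seed=0):
    """Generic random oversampling (illustrative only, not the disclosed method).

    Every class is topped up, by drawing with replacement from its own
    samples, until it has as many samples as the largest class.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    # The largest class sets the target size for all classes.
    target = max(len(pool) for pool in by_label.values())

    balanced_samples, balanced_labels = [], []
    for label, pool in by_label.items():
        # rng.choices returns an empty list when k == 0.
        extra = rng.choices(pool, k=target - len(pool))
        for sample in pool + extra:
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels
```

Oversampling to equal class sizes ignores how often each category actually occurs in real data, which is the gap the disclosure addresses by deriving sampling ratios from the original data.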

...



Abstract

The disclosure relates to a training data resampling method and device, a storage medium and an electronic device. The method comprises: obtaining first original data in a first time period; calculating the first proportion that each of a plurality of preset classifications occupies in the first original data; sorting the preset classifications by the size of their first proportions according to a preset rule to obtain a first sorting result; determining a sampling ratio for each preset classification according to its ranking and a preset correspondence between rankings and sampling ratios; and resampling the training data used for modeling according to the sampling ratios of the preset classifications. This addresses the problem that the classification model is unfriendly to small classes, improves the classification accuracy of classification models trained on the training data for different applications, and thereby improves the user experience.
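The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the descending-proportion sort used as the "preset rule", the example `rank_to_ratio` mapping, and sampling with replacement are all assumptions made for this example.

```python
import random
from collections import Counter


def rank_based_sampling_ratios(original_labels, categories, rank_to_ratio):
    """Map each preset category to a sampling ratio via its rank.

    original_labels: category labels observed in the original data.
    categories:      the preset classifications.
    rank_to_ratio:   assumed preset correspondence, e.g. {1: 0.5, 2: 0.3, 3: 0.2}.
    """
    if not original_labels:
        raise ValueError("original data must not be empty")
    counts = Counter(original_labels)
    total = len(original_labels)
    # First proportion: share of each preset category in the original data.
    proportions = {c: counts.get(c, 0) / total for c in categories}
    # Assumed preset rule: descending proportion, rank 1 = largest share.
    ranked = sorted(categories, key=lambda c: proportions[c], reverse=True)
    return {c: rank_to_ratio[rank] for rank, c in enumerate(ranked, start=1)}


def resample(training_data, ratios, n_samples, seed=0):
    """Draw about n_samples (sample, label) pairs so each label matches its ratio.

    Sampling is done with replacement (an assumption), so a small category
    can be drawn more often than it occurs in training_data.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample, label in training_data:
        by_label.setdefault(label, []).append((sample, label))

    resampled = []
    for label, ratio in ratios.items():
        pool = by_label.get(label, [])
        if not pool:
            continue  # no training samples available for this category
        resampled.extend(rng.choices(pool, k=round(n_samples * ratio)))
    rng.shuffle(resampled)
    return resampled
```

For example, with original data that is 90% "a", 7% "b" and 3% "c" and the assumed mapping `{1: 0.5, 2: 0.3, 3: 0.2}`, the categories receive sampling ratios of 0.5, 0.3 and 0.2 respectively, so the small categories are represented more strongly than in the raw data while the ordering observed in the original data is preserved.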

Description

Technical field

[0001] The present disclosure relates to the field of data mining, and in particular to a training data resampling method, device, storage medium and electronic equipment.

Background

[0002] In machine learning, the number of samples in the training data often varies greatly across the classifications of a classification model. For example, among N training data, the numbers of samples belonging to the first, second and third categories may be very different (for example, the samples belonging to the first category may account for 90% of the N training data, while the samples belonging to the second and third categories may account for only 10% of the N training data), so that when the training data with an unbalanced number of samples is directly used to train the classificat...


Application Information

IPC(8): G06F16/26, G06F16/28
CPC: G06F16/26, G06F16/28
Inventors: 李伟健, 王长虎
Owner: BEIJING BYTEDANCE NETWORK TECH CO LTD