Training data resampling method, device, storage medium and electronic equipment

A training data resampling technology, applied in database models, visual data mining, structured data browsing and similar fields, that addresses problems such as unbalanced training data and classification models being unfriendly to small categories, with the effect of improving classification accuracy and user experience.

Active Publication Date: 2020-03-03
BEIJING BYTEDANCE NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of this disclosure is to provide a training data resampling method, device, storage medium and electronic equipment that, when handling unbalanced training data, resample the training data according to the proportions of the different classifications in the actual original data, so as to solve the problem that the classification model is unfriendly to small categories.


Examples

Embodiment Construction

[0046] Specific embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

[0047] Unbalanced training data is often encountered when building a classification model. A model trained on such data tends to be biased towards the categories with a large proportion, and may overfit them. Therefore, when building a classification model, those skilled in the art usually adopt strategies such as improving the classification algorithm or balancing the classes of the training data (data preprocessing). The latter, that is, resampling the unbalanced training data, is more commonly used because of its wide applicability.
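As a point of reference for the class-balancing strategy mentioned above, the following is a minimal sketch of plain random oversampling, a generic baseline and not the method of this disclosure. The function name `oversample_to_balance` and the `(features, label)` sample layout are assumptions made for illustration.

```python
import random
from collections import defaultdict


def oversample_to_balance(samples, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.

    `samples` is a list of (features, label) pairs (assumed layout).
    """
    if not samples:
        return []
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))

    # Every class is topped up to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing samples with replacement (k may be 0).
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

Equalizing every class in this way ignores how the classes are distributed in real data, which is the gap the disclosure's proportion-based resampling is meant to address.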

...

Abstract

Disclosed are a training data re-sampling method and apparatus, and a storage medium and an electronic device. The method comprises: acquiring first original data within a first time period (S101); calculating respective first proportions of multiple pre-set classifications in the first original data (S102); sorting the multiple pre-set classifications according to a size relationship of the first proportions and a pre-set rule so as to obtain a first sorting result (S103); determining, according to the first sorting result of the pre-set classifications and a pre-set correlation, a sampling proportion corresponding to each pre-set classification (S104), wherein the pre-set correlation is a correlation between the first sorting result and the sampling proportion; and re-sampling training data for modeling according to the sampling proportions respectively corresponding to the multiple pre-set classifications (S105). The problem of a classification model being unfriendly to a small category can be solved, and the classification accuracy of a classification model, obtained through training by means of training data, for different applications can be improved, thereby improving user experience.
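Steps S101 to S105 can be sketched roughly as follows. This is an illustrative reading of the abstract, not the patented implementation: the function names, the descending sort used as the "pre-set rule", and the rank-to-ratio list standing in for the "pre-set correlation" are all assumptions, since the abstract does not specify them.

```python
import random
from collections import Counter


def class_proportions(labels):
    """S102: proportion of each classification in the original data."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def rank_classes(proportions):
    """S103: sort classifications by proportion.

    Assumed rule: descending proportion, ties broken by class name.
    """
    return sorted(proportions, key=lambda cls: (-proportions[cls], cls))


def sampling_proportions(ranking, rank_to_ratio):
    """S104: map each rank position to a sampling proportion.

    `rank_to_ratio` is a hypothetical stand-in for the pre-set
    correlation between sorting result and sampling proportion.
    """
    return {cls: rank_to_ratio[i] for i, cls in enumerate(ranking)}


def resample(training_samples, ratios, size, seed=0):
    """S105: draw about `size` training samples according to `ratios`.

    `training_samples` is a list of (features, label) pairs. Sampling is
    done with replacement; classes absent from the pool are skipped.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in training_samples:
        by_class.setdefault(label, []).append((features, label))

    out = []
    for cls, ratio in ratios.items():
        pool = by_class.get(cls, [])
        if pool:
            out.extend(rng.choices(pool, k=round(size * ratio)))
    rng.shuffle(out)
    return out


# S101: labels observed in the original data for one time period
# (made-up values for illustration only).
original_labels = ["a"] * 90 + ["b"] * 7 + ["c"] * 3
ranking = rank_classes(class_proportions(original_labels))
ratios = sampling_proportions(ranking, rank_to_ratio=[0.5, 0.3, 0.2])
```

Because the sampling proportion depends on a class's rank rather than its raw share, a small category keeps a usable share of the resampled set while the ordering seen in the original data is preserved.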

Description

Technical field

[0001] The present disclosure relates to the field of data mining, and in particular to a training data resampling method, device, storage medium and electronic equipment.

Background

[0002] In machine learning, the number of training samples can vary greatly across the classifications of a classification model. For example, among N training samples, the number belonging to the first category may differ greatly from the numbers belonging to the second and third categories (the first category may account for 90% of the N training samples, while the second and third categories may account for only 10%), so that when the training data with an unbalanced number of samples is directly used to train the classificat...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/26, G06F16/28
CPC: G06F16/26, G06F16/28
Inventor: 李伟健, 王长虎
Owner: BEIJING BYTEDANCE NETWORK TECH CO LTD