Method and device for processing data category imbalance, electronic equipment and storage medium

A technique for handling data category imbalance, applied in kernel methods, character and pattern recognition, instruments, etc.; it addresses problems such as imbalanced data categories.

Active Publication Date: 2021-04-09
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0006] This application provides a method, device, electronic equipment and storage medium for dealing with imbalanced data categories ...




Embodiment Construction

[0034] The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained from them by persons of ordinary skill in the art without creative effort fall within the protection scope of this application.

[0035] The solution provided by this application may involve cloud technology.

[0036] Cloud computing refers to a delivery and usage model for IT infrastructure, in which the required resources are obtained over a network in an on-demand, easily scalable manner. In a broad sense, cloud computing refers to a delivery and usage model for services, in which the required services are obtained over a network in an on-demand, easily scalable manner. Such services can be IT and software, ...


Abstract

The invention provides a method and device for processing data category imbalance, electronic equipment and a storage medium, and relates to the field of big data processing in cloud technology. The method comprises: determining M nearest-neighbor samples of a minority class sample Xi based on the mutual information between Xi and each of its neighbor samples; determining a mutual information weight for each nearest-neighbor sample Xij(near) based on the mutual information between Xi and Xij(near); determining the weight Wij(near) of Xij(near) based on the class of Xij(near) and its mutual information weight; determining the number Nj of minority class samples to be inserted between Xi and Xij(near) based on Wij(near) and the class imbalance ratio N; and inserting Nj new samples between Xi and Xij(near). By fusing mutual information with SMOTE, the method addresses data category imbalance and can improve the classification performance of SMOTE.
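The steps listed in the abstract can be sketched for a single minority sample as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how mutual information between two samples is estimated, how the class of a neighbor enters the weight, or how Nj is derived from Wij(near) and N. The histogram-based estimator, the `majority_factor` discount, the normalization and the rounding rule below are all assumptions, and every function and parameter name is hypothetical.

```python
import numpy as np

def mutual_information(a, b, bins=4):
    """Plug-in MI estimate between two feature vectors (ASSUMED estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    mi = np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz]))
    return max(0.0, float(mi))  # clip tiny negative rounding errors

def mi_smote_for_sample(x_i, candidates, is_minority, n_ratio,
                        m=3, majority_factor=0.5, seed=0):
    """Generate new samples around one minority sample x_i.

    candidates      : neighbor samples of x_i (any class)
    is_minority     : boolean flag per candidate
    n_ratio         : class imbalance ratio N
    majority_factor : ASSUMED discount for majority-class neighbors
    """
    rng = np.random.default_rng(seed)
    x_i = np.asarray(x_i, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    is_minority = np.asarray(is_minority, dtype=bool)

    # Step 1: the M candidates with the highest MI are the nearest neighbors.
    mi = np.array([mutual_information(x_i, c) for c in candidates])
    nearest = np.argsort(-mi)[:m]

    # Step 2: MI weight, here the MI normalized over the M neighbors.
    mi_sel = mi[nearest]
    total = mi_sel.sum()
    if total > 0:
        mi_weight = mi_sel / total
    else:
        mi_weight = np.full(len(nearest), 1.0 / len(nearest))

    # Step 3: combine the MI weight with the neighbor's class.
    type_factor = np.where(is_minority[nearest], 1.0, majority_factor)
    w = mi_weight * type_factor
    w = w / w.sum()

    # Step 4: number of samples per neighbor from the weight and N.
    counts = np.rint(w * n_ratio).astype(int)

    # Step 5: insert N_j points on the segment between x_i and each neighbor.
    new = [x_i + rng.random((n_j, 1)) * (candidates[j] - x_i)
           for j, n_j in zip(nearest, counts)]
    return np.vstack(new), counts
```

Because each count is rounded independently, the total number of generated samples can differ slightly from N; a real implementation would need an explicit rule for distributing the remainder.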

Description

Technical field

[0001] The embodiments of the present application relate to the field of cloud technology, in particular to big data processing in cloud technology, and more specifically to a method, device, electronic device, and storage medium for processing data category imbalance.

Background

[0002] Data category imbalance is a common problem that degrades the performance of classification models.

[0003] At present, a widely used method for addressing data imbalance is the Synthetic Minority Oversampling Technique (SMOTE). Unlike general oversampling techniques, SMOTE does not obtain new minority class samples by repeated sampling; instead, it synthesizes each new sample by interpolating between two minority class samples. That is, new samples are generated within the distribution boundary of the minority class and added to the minority class, thereby balancing the classes. The samples gen...
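The interpolation described in [0003] can be illustrated with a minimal sketch of plain SMOTE. It assumes Euclidean distance for neighbor search and a uniformly random interpolation factor, which is the usual formulation but is not spelled out in the text above; the function name `smote` and its parameters are illustrative, and the input must contain at least two minority samples.

```python
import numpy as np

def smote(minority, n_per_sample, k=5, seed=0):
    """Synthesize n_per_sample new points for each minority sample."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i, x_i in enumerate(minority):
        # k nearest minority neighbors of x_i, excluding x_i itself
        dist = np.linalg.norm(minority - x_i, axis=1)
        dist[i] = np.inf
        neighbors = np.argsort(dist)[:k]
        for _ in range(n_per_sample):
            x_nn = minority[rng.choice(neighbors)]
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(x_i + gap * (x_nn - x_i))
    return np.array(synthetic)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new samples stay inside the region already covered by the minority class.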


Application Information

IPC(8): G06K9/62, G06N20/10
CPC: G06N20/10, G06F18/24147, G06F18/24155, G06F18/2411, G06F18/214
Inventor: 刘志煌
Owner: TENCENT TECH (SHENZHEN) CO LTD