Unbalanced data processing method based on integrated feature selection in machine learning

An integrated feature selection technique from machine learning, applied in the field of data processing. It addresses the low model accuracy and high computational complexity of existing approaches, as well as the complex and variable nature of unbalanced data, so as to improve the ability to handle unbalanced data, model accuracy, and training efficiency.

Pending Publication Date: 2019-12-13
NANJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides an unbalanced data processing method based on integrated feature selection, intended to solve the problem of class imbalance in a given data set. To this end, the method uses integrated feature selection to address both the reduced training efficiency caused by redundant features and the biased decision boundaries caused by unbalanced data, thereby improving model accuracy and training efficiency.
[0006] Compared with existing processing methods, the beneficial effects of the present invention are as follows. Existing solutions greatly increase computational complexity, easily lead to model over-fitting, and result in low model accuracy. In addition, unbalanced data in the current big data era is more complex and changeable. The method of this design is well suited to such data: it handles unbalanced data better, improves model accuracy and training efficiency, and is more practical.

Examples

Embodiment Construction

[0009] As shown in Figure 1:

[0010] 1. A method for processing unbalanced data based on integrated feature selection, characterized in that the method comprises the following steps:

[0011] Step 1: Obtain the data, organize it, and analyze the initial characteristics of the original data set; compute the various feature values to obtain the feature set, then proceed to Step 2;

[0012] Step 2: Process the data further by applying an integrated feature selection method to the obtained feature set, finding the optimal feature subset so as to remove redundant features, address the data imbalance, and improve classification efficiency; then proceed to Step 3;

[0013] Step 3: Use the obtained optimal feature subset as input and train a machine learning classification algorithm to construct a classifier model, using positive class accuracy (acc+) and negative class accuracy (acc-) as performance evaluation indicators. ...
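The two evaluation indicators named in Step 3 can be illustrated with a short sketch. The excerpt does not give their formulas, so this assumes the definitions commonly used for imbalanced classification: acc+ is the fraction of positive (minority) samples classified correctly, and acc- is the fraction of negative (majority) samples classified correctly. The function name `class_accuracies` and the toy labels are hypothetical.

```python
import numpy as np

def class_accuracies(y_true, y_pred, positive_label=1):
    """Return (acc_pos, acc_neg) for a binary problem.

    Assumed definitions (not stated in the excerpt):
      acc+ = correctly predicted positives / all true positives
      acc- = correctly predicted negatives / all true negatives
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive_label
    neg = ~pos
    if not pos.any() or not neg.any():
        raise ValueError("both classes must be present in y_true")
    acc_pos = float(np.mean(y_pred[pos] == positive_label))
    acc_neg = float(np.mean(y_pred[neg] != positive_label))
    return acc_pos, acc_neg

# Toy example: 2 positive and 6 negative samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
acc_pos, acc_neg = class_accuracies(y_true, y_pred)
overall = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(f"acc+ = {acc_pos:.3f}, acc- = {acc_neg:.3f}, overall = {overall:.3f}")
```

Reporting the two values separately matters on unbalanced data because overall accuracy is dominated by the majority class: here the overall figure looks acceptable even though half of the positive samples are misclassified.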

Abstract

The invention provides an unbalanced data processing method based on integrated feature selection in machine learning. The method comprises the following steps: first, the majority class samples and minority class samples are built into a plurality of bagged subsets by combining the Bagging method with the synthetic minority oversampling technique (SMOTE); then, a feature subset is screened out from each bagged subset using a feature selection algorithm based on correlation measurement, and a threshold is set to obtain the final feature set; finally, a classification model is constructed using a machine learning algorithm to carry out classification.
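The pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the excerpt does not specify how the bags are drawn, which correlation measure is used, or what the threshold applies to. Here each bag is a bootstrap sample balanced with a hand-written SMOTE, features are ranked by absolute Pearson correlation with the label, and the threshold is the fraction of bags that must select a feature. All function names and parameters (`n_bags`, `top_k`, `vote_threshold`) are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise distances inside the minority class; ignore self-distance.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def balanced_bags(X, y, n_bags=10, minority_label=1, rng=None):
    """Yield bootstrap bags whose minority class is topped up with SMOTE."""
    rng = np.random.default_rng(rng)
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    for _ in range(n_bags):
        maj = X_maj[rng.integers(0, len(X_maj), size=len(X_maj))]
        mino = X_min[rng.integers(0, len(X_min), size=len(X_min))]
        synth = smote(mino, max(0, len(maj) - len(mino)), rng=rng)
        Xb = np.vstack([maj, mino, synth])
        # Inside a bag: 0 = majority, 1 = minority (real + synthetic).
        yb = np.r_[np.zeros(len(maj)), np.ones(len(mino) + len(synth))]
        yield Xb, yb

def select_by_correlation(Xb, yb, top_k):
    """Indices of the top_k features by |Pearson correlation| with the label."""
    Xc = Xb - Xb.mean(axis=0)
    yc = yb - yb.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get correlation 0 instead of a division error.
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.argsort(corr)[::-1][:top_k]

def ensemble_feature_selection(X, y, n_bags=10, top_k=3,
                               vote_threshold=0.5, seed=0):
    """Keep features chosen in at least vote_threshold of the bags."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1])
    for Xb, yb in balanced_bags(X, y, n_bags, rng=rng):
        votes[select_by_correlation(Xb, yb, top_k)] += 1
    return np.flatnonzero(votes / n_bags >= vote_threshold)

# Toy data: 200 majority vs 20 minority samples, 6 features,
# of which only columns 0 and 1 carry class information.
rng = np.random.default_rng(42)
y = np.r_[np.zeros(200), np.ones(20)]
X = rng.normal(size=(220, 6))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y
selected = ensemble_feature_selection(X, y, top_k=2)
print("selected feature indices:", selected.tolist())
```

On this toy data the two shifted columns should dominate the vote. The selected columns would then be passed to whichever classifier is used for the final step, which the excerpt leaves unspecified.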

Description

Technical field

[0001] The invention relates to the field of data processing, and specifically provides an unbalanced data processing method based on integrated feature selection in machine learning.

Background technique

[0002] With the rapid development of mobile Internet technology, data collection and data transmission have become more convenient and faster. The difficulty in information technology no longer lies in data acquisition and transmission, but in how to efficiently process and utilize known data and mine the valuable information it contains, which has promoted the rapid development of data mining. In data mining and machine learning research, however, a serious challenge is often encountered: how to deal with data whose classes are unbalanced.

[0003] When classifying unbalanced data sets, the performance of traditional machine learning classification algorithms is greatly hindered. The current proce...

Application Information

IPC(8): G06K9/62
CPC: G06F18/211, G06F18/24, G06F18/214
Inventor: 帅仁俊, 郭汉, 李文煜
Owner: NANJING UNIV OF TECH