Prediction method for unbalanced data set based on isolated forest learning

A prediction method for unbalanced data sets, applied in data processing applications, genetic models, genetic laws, and related fields. It addresses problems such as low accuracy, overlapping prediction results, and sample instability, and achieves stable prediction results with high prediction precision.

Pending Publication Date: 2020-12-11
XIAN UNIV OF TECH
Cites: 0; Cited by: 24

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a prediction method for unbalanced data sets based on isolated forest learning, which solves the probl

Examples

Embodiment

[0125] In order to test the effect of the method proposed in this application in dealing with unbalanced data sets, this application uses the bank telemarketing data set as unbalanced data for testing.

[0126] The method is tested as follows: MWMOTE and isolated forest are applied to the original (unbalanced) data set to obtain a balanced data set; the balanced data set is divided into a training set and a test set; the GA-SVM model is trained on the training set; and the trained GA-SVM model is then used to predict the effectiveness of bank telemarketing campaigns. In particular, this application compares the proposed method (GA-SVM with isolated forest) against GA-SVM without isolated forest, to illustrate the effectiveness and feasibility of the proposed method. The test steps are as follows:

[0127] 1. Receive the bank telemarketing forecast request, wherein the request is to predict whether the customer will book...
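The outlier-removal stage of the process described in [0126] can be illustrated with a minimal sketch of an isolated forest (commonly called Isolation Forest) written against the Python standard library only. It is not the implementation used in this application. The function names, the number of trees, the subsample size, and the `contamination` fraction of rows to drop are all assumptions made for illustration; the text above does not state them. The sketch also assumes the rows are already numeric, that oversampling has already been applied, and that there are at least two rows.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(row[feature] for row in rows)
    high = max(row[feature] for row in rows)
    if low == high:
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node):
    """Depth of the leaf that `row` reaches, plus a correction for unsplit points."""
    depth = 0
    while "size" not in node:
        node = node["left"] if row[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Scores in (0, 1]; values near 1 mark rows that are isolated quickly."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    return [2.0 ** (-sum(path_length(row, tree) for tree in trees) / (n_trees * norm))
            for row in rows]


def drop_outliers(rows, labels, contamination=0.05, **kwargs):
    """Remove the `contamination` fraction of rows with the highest scores (assumed rule)."""
    scores = anomaly_scores(rows, **kwargs)
    n_drop = int(contamination * len(rows))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    keep = [i for i in range(len(rows)) if i not in dropped]
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Calling `drop_outliers(rows, labels, contamination=0.01)` on the balanced data would return the rows and labels that survive filtering. Rows that are separated from the rest after only a few random splits receive scores close to 1 and are removed first.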

Abstract

The invention discloses a prediction method for an unbalanced data set based on isolated forest learning. The prediction method comprises the following steps: receiving a prediction request; collecting data, and defining the features and labels in the data set and the number of minority class samples and majority class samples; converting the non-numerical feature columns and the label column in the data set into categorical numerical values; synthesizing minority class samples by using the majority weighted minority oversampling technique (MWMOTE) to form a balanced data set; identifying and removing abnormal points in the balanced data set by using an isolated forest algorithm; then standardizing the data and dividing it into a training set and a test set; constructing and training a support vector machine classifier model by using the training set; adjusting the hyper-parameters of the support vector machine classifier model through a genetic algorithm, and obtaining a prediction model after training is completed; and inputting the test set into the prediction model to obtain a prediction result. The prediction method has the characteristics of stable prediction results and high prediction precision.
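The hyper-parameter adjustment step in the abstract can be sketched as a small real-coded genetic algorithm. This is a minimal illustration under stated assumptions, not the procedure claimed in the invention: the excerpt does not give the encoding, the operators, or which hyper-parameters are tuned. The sketch assumes two genes, log10(C) and log10(gamma) of an RBF-kernel support vector machine, and uses tournament selection, arithmetic crossover, Gaussian mutation, and elitism. `stand_in_fitness` is a hypothetical placeholder; in a real run it would be replaced by the validation accuracy of an SVM trained with the decoded hyper-parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Maximize `fitness` over real-valued genes constrained to `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_score, gen_best = max(scored, key=lambda pair: pair[0])
        if gen_score > best_score:
            best_score, best = gen_score, list(gen_best)

        children = [list(best)]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover weight
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = children

    return best, best_score


def stand_in_fitness(genes):
    """Hypothetical placeholder peaking at C = 10, gamma = 0.01 (not real SVM accuracy)."""
    log_c, log_gamma = genes
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


# Assumed search ranges: C in [1e-2, 1e3], gamma in [1e-4, 1e1].
SEARCH_BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]
```

A call such as `genetic_search(stand_in_fitness, SEARCH_BOUNDS)` returns the best gene vector found and its fitness; `10 ** genes[0]` and `10 ** genes[1]` would then give the candidate C and gamma values to pass to the classifier.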

Description

Technical field

[0001] The invention belongs to the technical field of prediction methods for class-unbalanced data sets, and in particular relates to a prediction method for unbalanced data sets based on isolated forest learning.

Background technique

[0002] With the rapid development of sensor, computer, communication, and data storage technologies, the Internet, the process industry, and other fields generate and store large amounts of data. Machine learning is a mainstream intelligent data processing technology, and classification is one of its key techniques: it can use big data to build a classification model with strong generalization ability and extract useful information from the data, and it has therefore attracted wide attention. Traditional classification methods usually assume that each category in the data set contains the same number of samples and that every misclassification has the same cost. However, data in the real world often h...

Application Information

IPC(8): G06K9/62, G06N3/12, G06Q30/02, G06Q40/02
CPC: G06N3/126, G06Q30/0242, G06Q40/02, G06F18/2433, G06F18/2411, G06F18/10, G06F18/214
Inventor: 王竹荣, 牛亚邦, 黑新宏
Owner: XIAN UNIV OF TECH