Unbalanced concept drift data flow classification method based on G-mean dynamic weighting

A concept drift and dynamic weighting technology, applied to database models, structured data retrieval, instruments, etc., that addresses concept drift and low classification accuracy on minority classes.

Pending Publication Date: 2021-04-20
JIANGNAN UNIV

AI Technical Summary

Problems solved by technology

[0006] To address concept drift in data stream classification and the low classification accuracy on minority classes, the present invention targets binary-class data streams. It introduces an online update mechanism into the data-block-based ensemble approach and proposes OGUEIL, an online unbalanced data stream classification method based on G-mean weighting and built on an ensemble framework. Each time a new instance arrives, every base classifier and its weight are incrementally updated, and minority-class instances are randomly over-sampled. Historical data are saved and new candidate classifiers are added periodically.
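As an illustration of the random over-sampling mentioned above, the following minimal Python sketch decides how many times a single minority-class instance might be presented to a learner. The function name `oversample_repeats`, the use of the majority-to-minority count ratio, and the random-rounding rule are all assumptions made for illustration; the text here does not state how OGUEIL chooses its over-sampling rate.

```python
import random


def oversample_repeats(n_minority, n_majority, rng):
    """Number of times to present one minority instance (hypothetical helper).

    The expected value equals the current majority/minority ratio, so the
    two classes receive roughly equal training mass over time.
    """
    if n_minority == 0:
        return 1  # nothing to balance against yet
    ratio = n_majority / n_minority          # e.g. 3.5 for a 7:2 imbalance
    base = int(ratio)                        # whole number of repeats
    # Random rounding: add one more repeat with probability equal to the
    # fractional part of the ratio.
    extra = 1 if rng.random() < ratio - base else 0
    return max(1, base + extra)


rng = random.Random(0)                       # seeded only for repeatability
print(oversample_repeats(2, 7, rng))         # either 3 or 4
print(oversample_repeats(5, 5, rng))         # balanced stream: 1
```

With a balanced stream the function returns 1, so over-sampling only takes effect once the class counts diverge.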

Examples

Embodiment 1

[0069] This embodiment provides a method for classifying unbalanced concept drift data streams based on G-mean dynamic weighting. The method includes:

[0070] S1: Initialize the current ensemble model and the adaptive sliding window as empty; the ensemble model is composed of base classifiers;

[0071] S2: Each time a new instance $x_t$ arrives in the current data stream, use the current ensemble model to predict its class;

[0072] S3: Incrementally count the positive-class and negative-class instances in the current data stream, and determine which class is the minority and which is the majority;

[0073] S4: Update each classifier in the current ensemble model and its weight;

[0074] S5: Periodically train a new candidate classifier on the data in the current sliding window and add it to the current ensemble model to obtain a new ensemble model.
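The bookkeeping in steps S1, S3 and S5 can be sketched in Python as follows. This is a simplified illustration, not the patented procedure: the class name `StreamState`, the parameters `window_size`, `period` and `max_members`, the fixed-size window (the text calls for an adaptive one), the tie rule in `minority_label`, and the drop-the-oldest eviction are all assumptions. The caller supplies `train_candidate`, a hypothetical function that builds a classifier from the windowed data.

```python
from collections import Counter, deque


class StreamState:
    """Bookkeeping for steps S1, S3 and S5 (all names are hypothetical)."""

    def __init__(self, window_size=100, period=50, max_members=5):
        self.window = deque(maxlen=window_size)  # S1: window starts empty
        self.ensemble = []                       # S1: ensemble starts empty
        self.counts = Counter()                  # S3: running class counts
        self.period = period
        self.max_members = max_members
        self.seen = 0

    def minority_label(self):
        # S3: the label seen less often so far (a tie returns +1 by assumption).
        return 1 if self.counts[1] <= self.counts[-1] else -1

    def observe(self, x, y, train_candidate):
        """Record one labelled instance; labels are assumed to be +1 or -1."""
        self.seen += 1
        self.counts[y] += 1
        self.window.append((x, y))
        # S5: every `period` instances, train a candidate on the window.
        if self.seen % self.period == 0:
            self.ensemble.append(train_candidate(list(self.window)))
            if len(self.ensemble) > self.max_members:
                self.ensemble.pop(0)  # placeholder eviction: drop the oldest


state = StreamState(window_size=3, period=2, max_members=2)
for i, label in enumerate([-1, -1, 1, -1]):
    # The "classifier" here is just the window length, to keep the demo tiny.
    state.observe(i, label, train_candidate=len)
print(state.minority_label(), state.ensemble)
```

Steps S2 and S4 (prediction and weight updates) are deliberately left out of this sketch so that it stays focused on the counting and scheduling logic.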

Embodiment 2

[0076] This embodiment provides a method for classifying unbalanced concept drift data streams based on G-mean dynamic weighting. As shown in Figure 1, the method includes:

[0077] Step 1: Initialize the ensemble model and the adaptive sliding window as empty; the ensemble model consists of base classifiers.

[0078] Step 2: Each time a new instance arrives in the current data stream, use the current ensemble model to predict its class.

[0079] Each incoming instance is taken as input and classified according to the weighted majority voting principle. At time t, the current ensemble model predicts instance $x_t$ from the weight $w_i$ and the prediction $C_i(x_t)$ of each member classifier $C_i$, i = 1, 2, ..., m, as shown in formula (1):

[0080] $$\hat{y}_t = \operatorname{sign}\left(\sum_{i=1}^{m} w_i \, C_i(x_t)\right) \tag{1}$$

[0081] where i = 1, 2, ..., m, and m is the preset maximum number of base classifiers in the ensemble model;

[0082] sign( ) is a sign function, if the result i...
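The weighted majority vote described above can be sketched in a few lines of Python. The sketch assumes that class labels are encoded as +1 and -1 and that a zero score maps to +1; both are assumptions, since the definition of the sign function is cut off in the text above. The function name `weighted_vote` is hypothetical.

```python
def weighted_vote(weights, predictions):
    """Return +1 or -1 from the sign of the weighted sum of member votes.

    weights     -- one non-negative weight w_i per member classifier
    predictions -- the members' outputs C_i(x_t), each assumed to be +1 or -1
    """
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1  # zero score -> +1 is an assumption


# One heavily weighted member outvotes two lighter ones ...
print(weighted_vote([0.9, 0.4, 0.4], [1, -1, -1]))
# ... but not when its weight drops below their combined weight.
print(weighted_vote([0.5, 0.4, 0.4], [1, -1, -1]))
```

Because the vote depends only on the weights and the members' outputs, the weights can be changed after every instance without retraining any member.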

Abstract

The invention discloses an unbalanced concept drift data stream classification method based on G-mean dynamic weighting, belonging to the field of data stream classification. Through an online weighting mechanism, the weights of all base classifiers are updated each time a new instance, rather than a complete data block, arrives, and the update is not affected by the class distribution. When a base classifier is updated, both its creation time and its G-mean performance on the latest p instances are considered, which improves the classification accuracy of the base classifiers. Because G-mean is insensitive to the class distribution of the data, it balances the importance of the majority and minority classes and improves classification accuracy on the minority class. The TPR and TNR are computed incrementally using a time decay factor, so no historical data needs to be stored. In addition, two elimination mechanisms control the size of the ensemble model, keeping decisions efficient and accurate.
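One way to compute TPR, TNR and G-mean incrementally with a decay factor is sketched below in Python. The abstract does not give the update rule, so the exponential-forgetting form used here, the class name `DecayedGMean`, the `decay` parameter, the +1/-1 label encoding, and the choice to return 0.0 until both classes have been seen are all assumptions. G-mean is taken as the square root of TPR times TNR, its standard definition.

```python
import math


class DecayedGMean:
    """Running TPR/TNR with exponential forgetting (illustrative sketch)."""

    def __init__(self, decay=0.9):
        self.decay = decay  # assumed decay factor, between 0 and 1
        self.tpr = None     # stays None until a positive instance arrives
        self.tnr = None     # stays None until a negative instance arrives

    def _blend(self, old, hit):
        # The first observation sets the rate; later ones are blended in.
        return hit if old is None else self.decay * old + (1 - self.decay) * hit

    def update(self, y_true, y_pred):
        hit = 1.0 if y_true == y_pred else 0.0
        if y_true == 1:
            self.tpr = self._blend(self.tpr, hit)  # positives affect only TPR
        else:
            self.tnr = self._blend(self.tnr, hit)  # negatives affect only TNR

    def gmean(self):
        if self.tpr is None or self.tnr is None:
            return 0.0  # assumption: undefined until both classes appear
        return math.sqrt(self.tpr * self.tnr)


tracker = DecayedGMean(decay=0.5)
for y_true, y_pred in [(1, 1), (-1, -1), (1, -1), (-1, 1)]:
    tracker.update(y_true, y_pred)
print(tracker.tpr, tracker.tnr, tracker.gmean())
```

Only two numbers are kept per classifier, which is consistent with the statement that no historical data needs to be stored. Because each rate is updated only by instances of its own class, a long run of majority-class instances leaves the minority-class rate untouched.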

Description

Technical field

[0001] The invention relates to a method for classifying unbalanced concept drift data streams based on G-mean dynamic weighting, and belongs to the field of data stream classification.

Background technique

[0002] In the era of big data, the explosive growth of information has led to the widespread appearance of data streams in many fields, such as wireless sensor data streams and bank transaction data streams. Classifying data streams accurately is a problem that must be solved. Compared with the classification of traditional static data, data stream classification faces the following problems:

[0003] On the one hand, the underlying distribution or target concept of the data in a data stream changes over time, a phenomenon commonly referred to as concept drift. Concept drift leads to a significant drop in the performance of classification models trained on past data, and the classification accuracy rate will be ...

Application Information

IPC(8): G06F16/28, G06K9/62
Inventors: 李光辉, 梁斌
Owner: JIANGNAN UNIV