Double-degree integrated unbalanced data stream classification algorithm

A technique for classifying unbalanced data streams, applied in the fields of electrical digital data processing, special data processing applications, and computing. It addresses the problem that improving minority-class accuracy reduces the classification accuracy of the majority class.

Active Publication Date: 2014-02-19
HENAN UNIVERSITY

AI Technical Summary

Problems solved by technology

Although existing classification algorithms can improve the classification accuracy of minority


Examples

Embodiment Construction

[0029] As shown in Figure 1, the double-degree integrated unbalanced data stream classification algorithm uses the pane data stream model to cut the data stream sequentially into record blocks of equal size, so that every block holds the same number of data records. The main parameters used in this patent are:
  • b: the number of data records in each data stream record block.
  • s: the sample size used for sampling, with s < b; s is also the size of the minority record container.
  • n: the number of data stream record blocks that the pane data stream model can hold.
The algorithm comprises the following steps:

[0030] A: Training phase for the balanced and unbalanced data stream classification models: each newly arrived data stream record block is divided into two parts, a training data set Tr and a validation data set Va, at a ratio of 90% to 10%, and a balanced data stream classification model and an unbalanced data stream classification ...
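
The block cutting and the 90%/10% split described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names `iter_blocks`, `split_block`, and `pane_windows` are hypothetical, and three behaviors are assumptions the text does not confirm (a trailing partial block is held back, records are shuffled before the split, and the oldest block is evicted once n blocks are held). Model training is left out because the source text is truncated at that point.

```python
import random
from collections import deque
from itertools import islice


def iter_blocks(stream, b):
    """Cut a stream into consecutive blocks of exactly b records.

    Assumption: a trailing partial block is held back (not yielded).
    """
    it = iter(stream)
    while True:
        block = list(islice(it, b))
        if len(block) < b:
            return
        yield block


def split_block(block, train_fraction=0.9, seed=0):
    """Split one block into (Tr, Va) at roughly 90% / 10%.

    Assumption: records are shuffled first; the source does not say
    whether the split is random or sequential.
    """
    records = list(block)
    random.Random(seed).shuffle(records)
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


def pane_windows(stream, b, n):
    """Yield the blocks currently held after each new block arrives.

    Assumption: once n blocks are held, the oldest one is evicted.
    """
    window = deque(maxlen=n)
    for block in iter_blocks(stream, b):
        window.append(block)
        yield list(window)


if __name__ == "__main__":
    # Toy stream of 25 integer "records", b = 10 records per block, n = 2.
    for window in pane_windows(range(25), b=10, n=2):
        tr, va = split_block(window[-1])  # split only the latest block
        print(len(window), len(tr), len(va))
```

Only the latest block is split in the usage loop, matching the statement that the split is applied to each newly arrived block.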


Abstract

The invention discloses a double-degree integrated unbalanced data stream classification algorithm. The double-degree integrated unbalanced data stream classification algorithm includes a balanced data stream classification model prediction stage, a classification reliability evaluation stage and an unbalanced data stream classification model prediction stage. In the balanced data stream classification model prediction stage, firstly, a balanced data stream classification model predicts the classification of each data record. In the classification reliability evaluation stage, reliability evaluation is conducted on the classification results obtained in the balanced data stream classification model prediction stage, the classification results of the records with high reliability are directly sent back to a user, and the data records with low reliability need to be classified again in the unbalanced data stream classification model prediction stage. The method embodied in the double-degree integrated unbalanced data stream classification algorithm can be widely applied to applications such as computer-assisted clinical diagnosis and real-time intrusion detection, and the invention belongs to the field of artificial intelligence applications.
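
The three-stage flow in the abstract can be sketched as a simple cascade. This is a hedged illustration only: the text available here does not say how reliability is computed, so the sketch assumes each model returns a `(label, reliability)` pair and that a hypothetical `threshold` separates high from low reliability. The toy models and all names are invented for the example.

```python
def classify_cascade(record, balanced_model, unbalanced_model, threshold=0.8):
    """Return (label, stage) for one record.

    Stage 1: the balanced model predicts a label.
    Stage 2: its reliability is compared with a threshold (assumed rule).
    Stage 3: low-reliability records are re-classified by the unbalanced model.
    """
    label, reliability = balanced_model(record)
    if reliability >= threshold:
        return label, "balanced"
    label, _ = unbalanced_model(record)
    return label, "unbalanced"


def toy_balanced(score):
    """Hypothetical stand-in: reliable when the score is far from 0.5."""
    label = "B" if score >= 0.5 else "A"
    return label, abs(score - 0.5) * 2


def toy_unbalanced(score):
    """Hypothetical stand-in with a lower cut-off for the minority class B."""
    return ("B" if score >= 0.4 else "A"), 1.0


if __name__ == "__main__":
    for score in (0.05, 0.45, 0.95):
        print(score, classify_cascade(score, toy_balanced, toy_unbalanced))
```

Records that the balanced model classifies with high reliability never reach the second model, which is the property the abstract describes; the threshold value itself would need to be chosen by whatever reliability measure the full patent text defines.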

Description

Technical field

[0001] The invention relates to a data stream classification algorithm, in particular to a double-degree integrated unbalanced data stream classification algorithm.

Background

[0002] In recent years, data mining technology has been increasingly used across industries, including computer-aided clinical diagnosis, Internet-based recommendation and advertising systems, customer classification, financial data analysis, and abnormal transaction monitoring. Such industry-oriented intelligent analysis and decision-making systems have been widely accepted.

[0003] In many practical applications the data distribution is unbalanced, also described as skewed. For example, 90% of the data records may belong to category A, which is called the majority class, while only 10% belong to category B, which is called the minority class. For example, in the application of financial data analysis, ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/24568
Inventor 张重生
Owner HENAN UNIVERSITY