Information entropy-based self-adaptive integrated classification method of data streams

A classification method and adaptive technology, applied in special data processing applications, electrical digital data processing, instruments, etc., which addresses problems such as existing methods not considering the recurrence of concepts

Inactive Publication Date: 2018-06-15
XINYANG NORMAL UNIVERSITY
3 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0006] However, most of the above algorithms do

Embodiment Construction

[0022] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0023] (1) Concept detection algorithm based on information entropy

[0024] In information theory, relative entropy, also known as Kullback-Leibler divergence, measures the difference between two probability distributions defined on the same event space X. The relative entropy of two probability distributions p(x) and q(x) is defined as:

[0025] $$D_{KL}(p \,\|\, q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}$$

[0026] However, the Kullback-Leibler divergence does not satisfy symmetry and thus is not a strict notion of distance. Jensen-Shannon divergence is a distance measure based on Kullback-Leibler divergence, which solves the asymmetry problem of Kullback-Leibler divergence. The Jensen-Shannon divergence in information theory can well represent the relationship between two data distributions, so the present invention proposes a concept detection algorithm based...
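The asymmetry of the Kullback-Leibler divergence and the symmetry of the Jensen-Shannon divergence can be checked numerically. The following is a minimal sketch, not the patent's implementation: it assumes discrete distributions given as equal-length lists of probabilities, uses base-2 logarithms, and the function names `kl_divergence` and `js_divergence` are chosen here for illustration.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in bits.

    Assumes p and q are probability lists of equal length and that
    q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q
    to their midpoint distribution m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print("KL(p||q):", kl_divergence(p, q))
    print("KL(q||p):", kl_divergence(q, p))  # differs from KL(p||q)
    print("JS(p,q): ", js_divergence(p, q))
    print("JS(q,p): ", js_divergence(q, p))  # equals JS(p,q)
```

With base-2 logarithms the Jensen-Shannon divergence is bounded between 0 (identical distributions) and 1 (distributions with disjoint support), which makes it convenient to compare against a fixed threshold.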

Abstract

The invention discloses an information entropy-based self-adaptive integrated classification method of data streams. The method can detect concept drift and can also identify recurring concepts. In the system, a new classifier is built and put into a classifier pool only when a new concept is detected. This prevents the repeated training that recurring concepts would otherwise cause, reduces the model update frequency, and improves the real-time classification ability and classification performance of the model. Performance comparisons with classical data stream algorithms on a synthetic dataset and a real dataset show that the method can cope with multiple types of concept drift, improves the noise resistance of the classification model, and has a lower time cost while maintaining high classification accuracy. The method can be applied to many practical problems such as sensor network anomaly detection, credit card fraud detection, weather forecasting and electricity price prediction.
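The pool behaviour described in the abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented procedure: it assumes each data window is summarised by a discrete histogram, that a concept counts as recurring when its Jensen-Shannon divergence to a stored histogram is at or below a threshold, and the names `ConceptPool`, `classifier_for`, `train_fn` and `threshold` are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


class ConceptPool:
    """Stores one classifier per detected concept (hypothetical design)."""

    def __init__(self, train_fn, threshold=0.05):
        self.train_fn = train_fn    # assumed callable: window -> classifier
        self.threshold = threshold  # assumed JS cut-off for "same concept"
        self.entries = []           # list of (histogram, classifier) pairs

    def classifier_for(self, histogram, window):
        """Return (classifier, is_new) for the current window."""
        best = min(
            self.entries,
            key=lambda entry: js_divergence(histogram, entry[0]),
            default=None,
        )
        if best is not None and js_divergence(histogram, best[0]) <= self.threshold:
            # Recurring concept: reuse the stored classifier, no retraining.
            return best[1], False
        # New concept: train once and add the classifier to the pool.
        classifier = self.train_fn(window)
        self.entries.append((histogram, classifier))
        return classifier, True


if __name__ == "__main__":
    pool = ConceptPool(train_fn=lambda window: object(), threshold=0.05)
    _, new_a = pool.classifier_for([0.5, 0.5], window=[])
    _, new_b = pool.classifier_for([0.9, 0.1], window=[])
    _, new_c = pool.classifier_for([0.52, 0.48], window=[])
    print(new_a, new_b, new_c, len(pool.entries))
```

The design choice illustrated is that training happens only on the new-concept branch, so a concept that reappears costs one divergence comparison per stored entry rather than a retraining pass.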

Description

Technical field

[0001] The invention belongs to the technical field of data mining and machine learning. It relates to an ensemble classification method for data streams in concept drift environments, and in particular proposes a detection system capable of handling recurring concepts. Experimental results show that the proposed method has clear advantages in average classification accuracy, consumes less time than other ensemble algorithms, is suitable for various types of concept drift environments, and has high noise immunity. The system can be applied to many practical problems such as sensor network anomaly detection, credit card fraud detection, weather forecasting and electricity price forecasting.

Background technique

[0002] In many real-world applications, data is continuously generated in the form of streams. Such a fast-arriving, real-time, continuous and unbounded data sequence is called a data stream. In a real data flow environmen...

Application Information

IPC(8): G06F17/30
CPC: G06F16/24568
Inventors: 孙艳歌, 卲罕, 刘宏兵, 冯岩, 王淑礼, 姚建峰
Owner: XINYANG NORMAL UNIVERSITY