Data flow concept drift detection method and system

A concept drift detection technology, applied in the computer field, that addresses incomplete detection, the inability to analyze drift comprehensively, and the need for manual participation.

Active Publication Date: 2013-10-09
SOUTH CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

This method has an obvious disadvantage: it requires manual participation.
The disadvantage of this scheme is that data originally belonging to one cluster is split across different grids, so classifying each grid separately may capture cluster information incompletely and prevent comprehensive analysis.



Examples


Embodiment Construction

[0059] Figure 1 is a flow chart of the steps of the data flow concept drift detection method of the present invention. The method includes the following steps:

[0060] A. According to the cluster set, the old data set, and the data set to be detected, calculate the sum of the squared distances of the cluster tolerance point set of the old data set and the sum of the squared distances of the cluster tolerance point set of the data set to be detected;

[0061] B. Calculate the cluster evolution value of each cluster in the cluster set according to the decay function and the data set to be detected;

[0062] C. Analyze the data set to be detected to obtain its cluster intolerable point set, partition those data points to form a new cluster set, and then calculate the new cluster acceptance value of each cluster in the new cluster set;

[0063] D. Calculate the concept drift level value based on ...
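
Step A and the first half of step C can be illustrated with a minimal sketch. It rests on assumptions that the excerpt does not confirm: a point is treated as "tolerated" by its nearest cluster center when it lies within a fixed `radius`, and all other points form the intolerable point set. The names `split_by_tolerance`, `sum_squared_distances`, and `radius` are hypothetical. Steps B and D are not sketched because the excerpt does not give the decay function or the drift level formula.

```python
from math import dist


def split_by_tolerance(points, centers, radius):
    """Split points into per-cluster tolerated sets and an intolerable set.

    Assumption (not from the patent text): a point is tolerated by its
    nearest center if it lies within `radius` of that center.
    """
    tolerated = {i: [] for i in range(len(centers))}
    intolerable = []
    for p in points:
        # Index of the nearest cluster center (ties go to the lowest index).
        nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
        if dist(p, centers[nearest]) <= radius:
            tolerated[nearest].append(p)
        else:
            intolerable.append(p)
    return tolerated, intolerable


def sum_squared_distances(tolerated, centers):
    """Per-cluster sum of squared distances from tolerated points to the center."""
    return {
        i: sum(dist(p, centers[i]) ** 2 for p in pts)
        for i, pts in tolerated.items()
    }


# Hypothetical toy data: two cluster centers, an old batch and a new batch.
centers = [(0.0, 0.0), (10.0, 10.0)]
old_data = [(1.0, 0.0), (0.0, 1.0), (10.0, 11.0)]
new_data = [(1.0, 1.0), (10.0, 10.0), (5.0, 5.0)]

old_tol, old_out = split_by_tolerance(old_data, centers, radius=2.0)
new_tol, new_out = split_by_tolerance(new_data, centers, radius=2.0)

old_ssd = sum_squared_distances(old_tol, centers)
new_ssd = sum_squared_distances(new_tol, centers)
```

Comparing `old_ssd` with `new_ssd` per cluster gives one possible signal of how far points have moved relative to each center, while `new_out` holds the candidates that step C would partition into new clusters.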


Abstract

The invention discloses a data flow concept drift detection method and system. The method analyzes cluster centers, cluster disappearance, and newly added clusters to detect the level of concept drift. The system comprises a cluster center analyzing unit, a cluster disappearance analyzing unit, a new cluster analyzing unit, and a concept drift level analyzing unit. The method and system recognize concept drift from multiple aspects, accurately quantify the concept drift evaluation indexes of the data set to be detected, comprehensively analyze concept drift situations, and accurately detect the level of concept drift. They are applied to the detection of data evolution.

Description

Technical field

[0001] The invention relates to the field of computers, in particular to a method and system for detecting data flow concept drift.

Background technique

[0002] Concept drift is the second largest research problem in data stream processing. Data concept drift can be divided into two types according to the intensity of the evolution process: gradual and sudden. Currently, there are several schemes for data flow concept drift detection: statistics-based, classifier-based, and partition-based. The statistics-based scheme uses a density-based evaluation technique for binary-represented data; the classifier-based scheme evaluates the average margin of a linear classifier; and the third scheme detects concept drift from the average error rate of classifiers on the data.

[0003] However, for a purely statistically based detection scheme, this method cannot well reflect the cluster...


Application Information

Patent Type & Authority Applications (China)
IPC IPC(8): G06F19/00
Inventor 赵淦森, 虞海, 王维栋, 卓超
Owner SOUTH CHINA NORMAL UNIVERSITY