Methods and apparatus for data stream clustering for abnormality monitoring

A data stream clustering and abnormality monitoring technology, applied in the field of data stream clustering. It addresses abnormality and outlier monitoring under the particular challenges of evolving data streams, and supports better understanding and analysis of data streams.

Status: Inactive
Publication Date: 2005-09-22
IBM CORP
8 Cites, 56 Cited by

AI Technical Summary

Benefits of technology

[0009] Thus, a framework may be provided in which select statistical data may be stored at regular intervals. This results in a technique which is able to analyze different characteristics of the clusters in an effective manner. Advantageously, the inventive techniques may be useful for clustering different kinds of categorical data sets, and adapting to the rapidly evolving nature of a data stream.
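The idea of storing select statistics at regular intervals can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the `SnapshotStore` class, the `count_growth` helper, the fixed point-count interval, and the choice of per-cluster counts as the stored statistic are all hypothetical, since this excerpt does not say which statistics are kept.

```python
import copy


class SnapshotStore:
    """Keeps copies of per-cluster statistics taken every `interval` points.

    Hypothetical helper: the interval and the stored statistic are assumptions.
    """

    def __init__(self, interval):
        self.interval = interval
        self.snapshots = []  # list of (points_seen, copy_of_stats)

    def maybe_store(self, points_seen, stats):
        # Store a deep copy so later updates do not alter past snapshots.
        if points_seen % self.interval == 0:
            self.snapshots.append((points_seen, copy.deepcopy(stats)))


def count_growth(store, cluster_id):
    """Points added to one cluster between the two most recent snapshots."""
    if len(store.snapshots) < 2:
        return None
    (_, older), (_, newer) = store.snapshots[-2:]
    return newer.get(cluster_id, 0) - older.get(cluster_id, 0)


store = SnapshotStore(interval=3)
stats = {}  # cluster id -> number of points assigned so far
stream = ["a", "a", "b", "a", "b", "b", "b"]  # toy cluster assignments
for seen, cluster_id in enumerate(stream, start=1):
    stats[cluster_id] = stats.get(cluster_id, 0) + 1
    store.maybe_store(seen, stats)

print(count_growth(store, "b"))
```

Comparing two stored snapshots, rather than only the running totals, is what lets an analysis look at how a cluster changed over a recent window of the stream.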
[0010] Additional advantages of the inventive techniques of the present invention

Problems solved by technology

However, these techniques cannot be utilized for clustering data streams, since they do not naturally scale well with increasing data size.
Clustering and outlier monitoring present a number of unique challenges in an evolving data stream environment.
In the data stream environment, outlier and abno



Examples


Embodiment Construction

[0019] The following description will illustrate the invention using an exemplary data processing system architecture. It should be understood, however, that the invention is not limited to use with any particular system architecture. The invention is instead more generally applicable to any data processing system in which it is desirable to perform efficient and effective data stream clustering. It is to be understood that the phrase “data point,” illustratively used herein, is one example of a data “object.”

[0020] As will be illustrated in detail below, the present invention introduces techniques for clustering a data stream and, more particularly, techniques for monitoring data abnormalities in the stream through the clustering of the data stream. An abnormality, as referred to herein, is defined as an outlier cluster or outlier data point of the data stream having specifically defined values in the stored statistical data of the data point or cluster. The stored statistical data...



Abstract

Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
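The flow described in the abstract (receive objects, form clusters, keep statistics per cluster, decide from the statistics whether an abnormality exists) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Cluster` class, the nearest-centroid assignment with a fixed `radius`, numeric points, and the rule that a cluster with fewer than `min_count` points counts as abnormal. The excerpt does not state the actual statistics or decision rule.

```python
import math


class Cluster:
    """Per-cluster statistics: point count, per-dimension sum, creation time."""

    def __init__(self, point, created_at):
        self.count = 1
        self.linear_sum = list(point)
        self.created_at = created_at

    def centroid(self):
        return [s / self.count for s in self.linear_sum]

    def add(self, point):
        self.count += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def cluster_stream(stream, radius):
    """Assign each point to the nearest cluster within `radius`, else start a new one."""
    clusters = []
    for t, point in enumerate(stream):
        nearest = min(
            clusters,
            key=lambda c: math.dist(c.centroid(), point),
            default=None,
        )
        if nearest is not None and math.dist(nearest.centroid(), point) <= radius:
            nearest.add(point)
        else:
            clusters.append(Cluster(point, t))
    return clusters


def find_abnormal(clusters, min_count):
    """Assumed rule: a cluster holding fewer than `min_count` points is abnormal."""
    return [c for c in clusters if c.count < min_count]


stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.1, 0.1)]
clusters = cluster_stream(stream, radius=1.0)
abnormal = find_abnormal(clusters, min_count=2)
print([(c.count, c.created_at) for c in abnormal])
```

Because only a count and a running sum are kept per cluster, each point is processed once and then discarded, which suits a stream too large to revisit.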

Description

FIELD OF THE INVENTION

[0001] The present invention is related to techniques for clustering a data stream and, more particularly, techniques for monitoring data abnormalities in the stream through the clustering of the data stream.

BACKGROUND OF THE INVENTION

[0002] In general, large volumes of continuously evolving data, which may be stored, are referred to as a data stream. Data streams have received increased attention in recent years due to technological innovations, which have facilitated the creation, maintenance and storage of such data. A number of data mining studies have been conducted in the data stream context in recent years, see, e.g., C. C. Aggarwal, “A Framework for Diagnosing Changes in Evolving Data Streams,” ACM SIGMOD Conference, 2003; B. Babcock et al., “Models and Issues in Data Stream Systems,” ACM PODS Conference, 2002; P. Domingos et al., “Mining High-Speed Data Streams,” ACM SIGKDD Conference, 2000; S. Guha et al., “ROCK: A Robust Clustering Algorithm for Ca...


Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06K9/6284, G06F18/2433
Inventors: AGGARWAL, CHARU C.; YU, PHILIP SHI-LUNG
Owner: IBM CORP