Distributed data stream clustering method and system

A data stream clustering technology, applied in digital transmission systems, transmission systems, electrical digital data processing, etc., that addresses the poor extensibility and poor running-time efficiency of existing approaches, achieving good scalability, improved efficiency, and high performance.

Active Publication Date: 2013-02-06
CHINA INFORMATION TECH SECURITY EVALUATION CENT +1
Cites: 4 | Cited by: 45

AI Technical Summary

Problems solved by technology

[0010] The technical problem to be solved by the present invention is to overcome the shortcomings that most current data stream clustering algorithms cannot run in a distributed cloud environment, cannot be easily extended, and have poor running-time efficiency.

Examples

Embodiment Construction

[0057] The implementation of the present invention is described in detail below with reference to the accompanying drawings and examples, so that the way the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and reproduced. The embodiments of the present application, and any combination of their features that does not conflict, fall within the protection scope of the present invention.

[0058] Additionally, the steps shown in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps shown or described may be performed in a different order.

[0059] Current clustering algorithms are generally divided into two categories: one is the partitioning clustering algorithm, and the other is the hie...

Abstract

The invention discloses a distributed data stream clustering method and system that overcome the defects of most existing data stream clustering algorithms: they cannot run in a distributed cloud environment, cannot be easily extended, and have low running-time efficiency. The method includes: summarizing data streams to obtain a plurality of eigenvectors of the data streams; applying a locality-sensitive hashing algorithm to obtain a plurality of clusters, each comprising at least one eigenvector, and selecting at least one cluster as a candidate cluster; and periodically using the candidate cluster to cluster the eigenvectors of newly arrived data streams. Because the method and system are based on locality-sensitive hashing, they provide better real-time performance than the prior art.
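The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how streams are summarized, which hash family is used, or how candidate clusters are chosen. The sketch assumes a per-dimension mean as the summary, sign-random-projection hashing as the locality-sensitive hash, and "most populated buckets" as the candidate rule. All function names (`summarize`, `make_hyperplanes`, `lsh_signature`, `build_clusters`, `select_candidates`, `assign`) are hypothetical.

```python
import random
from collections import defaultdict


def summarize(window):
    """Hypothetical summary: the per-dimension mean of a window of points."""
    dim = len(window[0])
    return [sum(p[i] for p in window) / len(window) for i in range(dim)]


def make_hyperplanes(num_bits, dim, seed=0):
    """Random hyperplanes for sign-random-projection LSH (an assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_clusters(vectors, hyperplanes):
    """Group feature vectors by signature; each bucket is treated as one cluster."""
    clusters = defaultdict(list)
    for vec in vectors:
        clusters[lsh_signature(vec, hyperplanes)].append(vec)
    return dict(clusters)


def select_candidates(clusters, top_k):
    """Assumed selection rule: keep the top_k most populated buckets."""
    ranked = sorted(clusters, key=lambda sig: len(clusters[sig]), reverse=True)
    return set(ranked[:top_k])


def assign(vec, candidates, hyperplanes):
    """Return the matching candidate cluster's signature for a new vector, or None."""
    sig = lsh_signature(vec, hyperplanes)
    return sig if sig in candidates else None
```

In this sketch, vectors pointing in similar directions tend to share a signature, so a newly arrived eigenvector is assigned with one hash computation instead of a distance comparison against every cluster. A periodic job would rebuild the buckets and call `select_candidates` again as the stream evolves.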

Description

Technical field

[0001] The invention relates to data stream clustering technology, in particular to a distributed data stream clustering method and system.

Background

[0002] In recent years, with the wide application of computer technology and network technology in industrial production, information processing and other fields, data is no longer limited to traditional static forms such as files and databases. Continuous, unbounded, variable-speed streaming data has appeared in more and more application fields. These application areas are usually systems with multiple data sources, such as intrusion detection systems, e-commerce, telecommunications, distributed sensor networks, meteorological monitoring, real-time analysis of scientific data, peer-to-peer (P2P) computing, and other application scenarios. In these applications, a large amount of high-dimensional data flows to the data collection center at a high speed, and clustering such data in real t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: H04L9/3236
Inventor: 吴世忠, 曲武, 李世贤, 王君鹤, 偰赓, 陈巍
Owner: CHINA INFORMATION TECH SECURITY EVALUATION CENT