A distributed data stream clustering method and system

Data stream clustering technology, applied in digital transmission systems, transmission systems, electrical digital data processing, etc., with the effect of high performance
CN102915347B · Active · Publication Date: 2016-10-12 · CHINA INFORMATION TECH SECURITY EVALUATION CENT +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: CHINA INFORMATION TECH SECURITY EVALUATION CENT
Publication Date: 2016-10-12

Abstract

The invention discloses a distributed data stream clustering method and system, and overcomes the defects that most existing data stream clustering algorithms cannot run in a distributed cloud environment, are difficult to scale, and have low run-time efficiency. The method includes: summarizing data streams to obtain a plurality of eigenvectors of the data streams; applying a locality-sensitive hashing algorithm to the eigenvectors to obtain a plurality of clusters, each comprising at least one eigenvector, and selecting at least one cluster as a candidate cluster; and periodically using the candidate cluster to cluster the eigenvectors of newly arrived data streams. Because the method and system are based on the locality-sensitive hashing algorithm, they guarantee better real-time performance than the prior art.
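The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify how streams are summarized, which locality-sensitive hash family is used, or how candidate clusters are chosen. The sketch assumes a per-dimension mean as the summary, sign-random-projection (random hyperplane) hashing as the LSH family, and a minimum bucket size as the candidate criterion. All names (`summarize`, `HyperplaneLSH`, `build_clusters`, `select_candidates`, `assign`) are hypothetical.

```python
from collections import defaultdict

import numpy as np


def summarize(window: np.ndarray) -> np.ndarray:
    """Summarize one window of stream records (one record per row).

    Assumption: the summary is the per-dimension mean.
    """
    return window.mean(axis=0)


class HyperplaneLSH:
    """Sign-random-projection LSH (an assumed hash family).

    Vectors separated by a small angle are likely to get the same key.
    """

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One random hyperplane (normal vector) per output bit.
        self.planes = rng.standard_normal((n_bits, dim))

    def key(self, v: np.ndarray) -> tuple:
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple((self.planes @ v >= 0).astype(int).tolist())


def build_clusters(lsh: HyperplaneLSH, vectors) -> dict:
    """Group feature vectors by hash key; each bucket is one cluster."""
    clusters = defaultdict(list)
    for v in vectors:
        clusters[lsh.key(v)].append(v)
    return clusters


def select_candidates(clusters: dict, min_size: int = 2) -> set:
    """Assumed criterion: keep clusters with at least min_size members."""
    return {k for k, members in clusters.items() if len(members) >= min_size}


def assign(lsh: HyperplaneLSH, candidates: set, v: np.ndarray):
    """Map a newly arrived feature vector to a candidate cluster, or None."""
    k = lsh.key(v)
    return k if k in candidates else None


if __name__ == "__main__":
    lsh = HyperplaneLSH(dim=4)
    v = np.array([1.0, 2.0, 3.0, 4.0])
    # Initial batch: two vectors in the same direction, one opposite.
    clusters = build_clusters(lsh, [v, 2 * v, -v])
    candidates = select_candidates(clusters, min_size=2)
    # A later arrival in the same direction as v, and one opposite to it.
    print(assign(lsh, candidates, 4 * v))
    print(assign(lsh, candidates, -v))
```

In a periodic setting, `assign` would be called on each batch of newly summarized vectors, and `build_clusters` and `select_candidates` would be rerun whenever the candidate set needs refreshing. Because hashing a vector is independent of all other vectors, this step can be spread across worker nodes, which fits the distributed setting the abstract describes.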

Description

Technical field

[0001] The invention relates to data stream clustering technology, in particular to a distributed data stream clustering method and system.

Background art

[0002] In recent years, with the wide application of computer and network technology in industrial production, information processing, and other fields, data is no longer limited to traditional static forms such as files and databases. Continuous, unbounded, variable-rate streaming data has appeared in more and more application fields. These applications usually involve systems with multiple data sources, such as intrusion detection systems, e-commerce, telecommunications, distributed sensor networks, meteorological monitoring, real-time analysis of scientific data, and peer-to-peer (P2P) computing. In these applications, a large amount of high-dimensional data flows to the data collection center at high speed, and clustering such data in real t...
