A method, apparatus, device and medium for improving the CluStream algorithm

A technique for algorithm and module configuration, applied in computing, computer parts, and character and pattern recognition. It addresses the difficulty of choosing the k value and the effect of random initial centers on the quality and efficiency of macro clustering, and it achieves high accuracy.

Active Publication Date: 2022-07-05
INSPUR SUZHOU INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

Choosing the value of k is difficult for users who lack domain knowledge of the data stream, and randomly selecting the initial K-means cluster centers degrades the quality and efficiency of macro clustering.
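The abstract names the Canopy algorithm as the way to obtain both the number of clusters and the initial cluster centers. The following is a minimal sketch of a textbook Canopy pass, not the patent's own implementation. The function name `canopy_centers`, the thresholds `t1` and `t2`, and the choice of the canopy mean as the center are all assumptions made for illustration.

```python
import math


def canopy_centers(points, t1, t2):
    """Group points into canopies and return one center per canopy.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (points this close to a seed stop being seed candidates).
    len(result) can serve as k, and result as the initial centers,
    for a subsequent K-means run.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")

    remaining = list(points)
    centers = []
    while remaining:
        # Take the next unassigned point as the seed of a new canopy.
        seed = remaining.pop(0)
        members = [seed] + [p for p in remaining if math.dist(seed, p) < t1]

        # Assumption: use the mean of the canopy members as its center.
        dim = len(seed)
        center = tuple(
            sum(p[d] for p in members) / len(members) for d in range(dim)
        )
        centers.append(center)

        # Points within the tight threshold cannot seed another canopy.
        remaining = [p for p in remaining if math.dist(seed, p) >= t2]
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
initial_centers = canopy_centers(points, t1=2.0, t2=1.0)
k = len(initial_centers)
```

With this approach the user supplies two distance thresholds instead of k, and the resulting centers replace the random initialization.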


Embodiment approach

[0079] According to some embodiments of the method for improving the CluStream algorithm of the present invention, the method further comprises:

[0080] Using an HTTP interface as the entry point for data from the data source, and sending the data to the Kafka cluster through the HTTP interface.

[0081] In some embodiments of the present invention, the platform uses a common HTTP interface as the entry point for the data source, which makes it compatible with the data systems of various data sources. The main process, from data entering through the HTTP interface to the completion of statistics, is as follows:

[0082] (1) The data source establishes an HTTP connection with the platform and sends the data to the platform.

[0083] (2) The load balancing server receives the HTTP request and allocates it to an HTTP server according to the load balancing algorithm.

[0084] (3) The HTTP server receives the HTTP request and sends the data to the Kafka cluster.

[0085](4) Spark Streaming reads st...
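The ingestion path in the steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the platform's code: `InMemoryTopic` stands in for a Kafka topic, `HttpServerStub` for an HTTP server, round-robin is assumed as the load balancing algorithm (the source does not name one), and `drain_micro_batch` only imitates a streaming consumer reading a bounded batch from the topic.

```python
from collections import deque
from itertools import cycle


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: a FIFO queue of records."""

    def __init__(self):
        self.records = deque()

    def send(self, value):
        self.records.append(value)


class HttpServerStub:
    """Hypothetical HTTP server that forwards each payload to the topic."""

    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.handled = 0

    def handle(self, payload):
        self.handled += 1
        self.topic.send(payload)


class RoundRobinBalancer:
    """Allocates requests to servers in turn (assumed algorithm)."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def dispatch(self, payload):
        server = next(self._servers)
        server.handle(payload)
        return server.name


def drain_micro_batch(topic, max_records):
    """Read up to max_records from the topic, oldest first."""
    batch = []
    while topic.records and len(batch) < max_records:
        batch.append(topic.records.popleft())
    return batch


topic = InMemoryTopic()
servers = [HttpServerStub("http-1", topic), HttpServerStub("http-2", topic)]
balancer = RoundRobinBalancer(servers)
routed = [balancer.dispatch({"id": i}) for i in range(4)]
batch = drain_micro_batch(topic, max_records=3)
```

In a real deployment the queue would be replaced by a Kafka producer and consumer, and the batch reader by the streaming framework's Kafka source.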

Abstract

The invention discloses a method for improving the CluStream algorithm, comprising: replacing the landmark time window model of the CluStream algorithm with a decay time window model, and introducing a data decay rate as a decay factor to extract micro-clusters; when the CluStream algorithm is configured to store micro-cluster snapshots, configuring restriction rules for the pyramid time model to eliminate repeated calculation of micro-cluster snapshots at different levels; introducing the Canopy algorithm into the CluStream algorithm to determine the number of clusters and the initial cluster centers, and using the Canopy-Kmeans algorithm to optimize the offline macro clustering operation; implementing the CluStream algorithm in parallel on Spark Streaming; and reading the data received from the data source from the Kafka cluster through Spark Streaming for real-time processing and online analysis. The invention also discloses an apparatus, a device and a medium. The invention enables fast, efficient, easy-to-use and highly accurate real-time statistics and analysis of streaming data.
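To illustrate what a decay factor does to a micro-cluster, here is a minimal sketch of a micro-cluster whose statistics fade over time. The abstract does not give the decay formula, so the exponential form 2^(-λ·Δt), the class name `DecayedMicroCluster`, and the parameter name `decay_rate` are assumptions borrowed from common damped-window stream clustering, not the patent's definition.

```python
class DecayedMicroCluster:
    """Micro-cluster summary whose statistics fade as time passes."""

    def __init__(self, dim, decay_rate):
        self.decay_rate = decay_rate       # assumed decay rate (lambda)
        self.weight = 0.0                  # decayed count of points
        self.linear_sum = [0.0] * dim      # decayed sum of coordinates
        self.square_sum = [0.0] * dim      # decayed sum of squares
        self.last_update = 0.0

    def _fade(self, now):
        """Scale all statistics by 2^(-decay_rate * elapsed time)."""
        if now < self.last_update:
            raise ValueError("timestamps must be non-decreasing")
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.square_sum = [v * factor for v in self.square_sum]
        self.last_update = now

    def insert(self, point, now):
        """Fade the old statistics, then absorb the new point at full weight."""
        self._fade(now)
        self.weight += 1.0
        for d, x in enumerate(point):
            self.linear_sum[d] += x
            self.square_sum[d] += x * x

    def center(self):
        """Weighted mean of the absorbed points (requires weight > 0)."""
        return [s / self.weight for s in self.linear_sum]


mc = DecayedMicroCluster(dim=1, decay_rate=1.0)
mc.insert([0.0], now=0.0)
mc.insert([3.0], now=1.0)
```

Under a landmark window both points would count equally and the center would sit at 1.5. With the assumed decay the older point has half the weight of the newer one, so the center moves toward recent data.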

Description

Technical field

[0001] The present invention relates to the field of data mining and cluster analysis, and more particularly to a method, apparatus, device and medium for improving the CluStream algorithm.

Background technique

[0002] With the rapid development of information technology, data in many fields now appears in the form of streams. Such data evolves over time and keeps growing in scale. Traditional clustering techniques based on static data cannot meet the processing requirements of data streams, and a large data stream exceeds the computing power of traditional technologies.

[0003] To meet the requirement of online real-time clustering, the present invention builds on the distributed stream computing framework Spark Streaming and improves the traditional CluStream algorithm to overcome the fact that its micro-cluster feature vector cannot reflect the evolution characteristics of data stream in real tim...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2455, G06K9/62
CPC: G06F16/24568, G06F18/2321
Inventor: 熊战磊
Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD