Dynamic streaming data clustering method

A clustering method for dynamic streaming data, applied in the field of clustering. It addresses the poor clustering results of existing methods and their failure to reflect the classification characteristics of the data, and it aims for high clustering quality.

Pending Publication Date: 2017-10-20
CHENGDU SEFON SOFTWARE CO LTD

AI Technical Summary

Problems solved by technology

[0005] However, these methods treat time as a separate field or dimension and merge it into the original data. In effect, they only add one dimension to the original data before running the clustering calculation. This causes a problem: in some business scenarios the data inherently changes over time, so when time is treated as just another ordinary dimension, the clustering results are poor and do not properly reflect the classification characteristics of the data.
[0006] In summary, when data is added or removed over time, or existing data changes partway through, traditional methods such as K-means and X-means cannot cluster the data effectively across these different situations, and the industry currently has no better solution for real-time streaming data.
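
One way the problem in [0005] can show up is sketched below. The records, the choice of Unix timestamps, and the use of Euclidean distance are all assumptions made for illustration; the source does not specify them. When the raw timestamp is appended as an ordinary coordinate, its scale can dominate the distance that a centroid-based method relies on.

```python
import math

# Hypothetical records: (feature value, Unix timestamp in seconds).
a = (10.0, 1_500_000_000)
b = (10.5, 1_500_086_400)  # nearly the same value as a, one day later
c = (90.0, 1_500_000_060)  # very different value, one minute after a

def distance(p, q):
    """Euclidean distance with the timestamp treated as an ordinary dimension."""
    return math.dist(p, q)

d_ab = distance(a, b)  # dominated by the 86,400-second gap
d_ac = distance(a, c)  # sqrt(80**2 + 60**2) = 100

# a ends up "closer" to c than to b, although a and b have similar values.
print(d_ab, d_ac)
```

Rescaling the time axis changes the numbers but still leaves time as one more coordinate, which is the limitation the paragraph describes.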

Examples

Embodiment 1

[0037] As shown in Figure 1, a clustering method for dynamic streaming data includes the following steps:

[0038] S1: Extract the time field: convert the data into time-field data and extract the time field separately;

[0039] S2: Construct time slices: sort the time field, then construct the time slices;

[0040] S3: Determine data points: locate and identify each data item;

[0041] S4: Take the union of time slices and data, and mark the time slices that have no corresponding data;

[0042] S5: Build a training model: build an HMM to predict the missing data;

[0043] S6: Check data validity: add time slices for repeated data points;

[0044] S7: Eliminate abnormal data: check across all time slices for data with abnormal fluctuations;

[0045] S8: Perform centroid-based data clustering.
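
Steps S1 to S4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record layout, the one-hour slice width, and the use of `None` as the "no data" mark are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical structured records: (point id, timestamp string, value).
records = [
    ("p1", "2017-01-01 00:00", 1.0),
    ("p2", "2017-01-01 00:00", 5.0),
    ("p1", "2017-01-01 01:00", 1.2),
    ("p1", "2017-01-01 03:00", 1.1),
    ("p2", "2017-01-01 03:00", 5.3),
]

FMT = "%Y-%m-%d %H:%M"
SLICE = timedelta(hours=1)  # assumed slice width

# S1: extract the time field from each record.
parsed = [(pid, datetime.strptime(ts, FMT), v) for pid, ts, v in records]

# S2: sort the time field and build consecutive time slices.
times = sorted(t for _, t, _ in parsed)
start, end = times[0], times[-1]
n_slices = (end - start) // SLICE + 1
slices = [start + i * SLICE for i in range(n_slices)]

# S3: identify each data point by its id.
point_ids = sorted({pid for pid, _, _ in parsed})

# S4: join slices with data; None marks a slice without corresponding data.
observed = {(pid, (t - start) // SLICE): v for pid, t, v in parsed}
series = {
    pid: [observed.get((pid, i)) for i in range(n_slices)]
    for pid in point_ids
}

for pid, values in series.items():
    print(pid, values)
```

Each data point ends up with one value per time slice, and the `None` entries are the gaps that step S5 would later fill.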

[0046] The data to be analyzed in step S1 should be structured data, with clear and valid data fields and structure. The specific implementa...

Abstract

The invention discloses a dynamic streaming data clustering method comprising the following steps: converting structured data into time-field streaming data, sorting it by time field to obtain time slices, and taking the union of the time slices and the data; building a training model and using an HMM to predict the missing data; checking data validity and adding time slices for repeated data points; eliminating abnormal data by checking across all time slices for data with abnormal fluctuations; and performing centroid-based data clustering. The method is optimized for the characteristics of the data: an HMM predicts the missing data, and repeated data that carries the same identifier within the same time slice is handled explicitly. As a result, the time-varying characteristics of the data are reflected more accurately, abnormal data can be distinguished, the number of cluster categories is optimized automatically, and a high-quality clustering result is obtained.
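
The HMM-based prediction of missing data mentioned above might look like the following sketch. Everything concrete here is an assumption: the two hidden states, the two discretized symbols, and the fixed probabilities are hypothetical, and the training step (for example Baum-Welch) is omitted. The code only shows how a forward pass can predict the most likely symbol for a missing time slice.

```python
# Hypothetical 2-state HMM over discretized values: 0 = "low", 1 = "high".
# In practice these parameters would come from the training step.
start_p = [0.6, 0.4]
trans_p = [[0.9, 0.1],
           [0.2, 0.8]]
emit_p = [[0.8, 0.2],
          [0.3, 0.7]]

def predict_next(observations):
    """Return P(next symbol | observations) via the forward algorithm."""
    n_states = len(start_p)
    if observations:
        # Forward pass, normalized at each step to avoid underflow.
        alpha = [start_p[s] * emit_p[s][observations[0]] for s in range(n_states)]
        for obs in observations[1:]:
            alpha = [
                sum(alpha[r] * trans_p[r][s] for r in range(n_states)) * emit_p[s][obs]
                for s in range(n_states)
            ]
            total = sum(alpha)
            alpha = [a / total for a in alpha]
        total = sum(alpha)
        belief = [a / total for a in alpha]
        # Advance the state belief by one transition.
        state = [
            sum(belief[r] * trans_p[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    else:
        state = list(start_p)  # nothing observed yet: use the prior
    n_symbols = len(emit_p[0])
    return [
        sum(state[s] * emit_p[s][k] for s in range(n_states))
        for k in range(n_symbols)
    ]

def fill_missing(series):
    """Replace each None with the most probable symbol given the prefix."""
    filled = []
    for value in series:
        if value is None:
            probs = predict_next(filled)
            value = max(range(len(probs)), key=probs.__getitem__)
        filled.append(value)
    return filled

print(fill_missing([0, 0, None, 0]))
print(fill_missing([1, 1, None]))
```

This sketch conditions only on the slices before the gap; using later slices as well would require a smoothing pass, which the source does not describe.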

Description

Technical field

[0001] The invention relates to a clustering method, in particular to a clustering method for dynamic streaming data.

Background art

[0002] Clustering is one of the major methods in the field of data mining, alongside classification, regression, and factor analysis. In the era of big data, clustering algorithms are used to analyze massive data sets to support better decision-making. Their advantage is that they support unsupervised machine learning and can group unlabeled data on their own. As research on clustering has deepened, researchers have proposed more and more clustering algorithms, including partition-based, grid-based, and hierarchy-based clustering. These algorithms target data sets of different dimensions, scales, and types. When different clustering algorithms are applied to the same data set, the results may vary greatly. [0003]...
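
As a reference point for the partition-based (centroid) clustering mentioned in the background, here is a minimal K-means sketch in plain Python. The input vectors, the fixed `k`, and the deterministic initialization from the first `k` points are assumptions for illustration; the method's own way of choosing the number of clusters is not reproduced here.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic initialization from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical per-point vectors, one value per time slice.
vectors = [
    [1.0, 1.2, 1.1],
    [5.0, 5.1, 5.3],
    [0.9, 1.1, 1.0],
    [5.2, 5.0, 5.4],
]

labels, centers = kmeans(vectors, k=2)
print(labels)
print(centers)
```

Here each point is represented by its values across time slices rather than by a single row with time as an extra column, so points with similar trajectories fall into the same cluster.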

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/232
Inventor: 蓝科, 王纯斌, 王勇, 覃进学
Owner: CHENGDU SEFON SOFTWARE CO LTD