Mixed data stream clustering method based on merging and pruning

A data stream clustering technique applied in the fields of cluster analysis and data stream mining. It addresses problems such as the time and space cost of maintaining summary information, which restricts the application of data stream clustering, and the tendency of existing methods to handle only numerical data or only categorical data.
CN112685569A · Inactive · Publication Date: 2021-04-20 · ZHEJIANG GONGSHANG UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG GONGSHANG UNIVERSITY
Publication Date
2021-04-20
Estimated Expiration
Not applicable · inactive patent


Abstract

The invention discloses a mixed data stream clustering method based on merging and pruning. The method comprises the following steps: converting categorical attribute values into numerical attributes using an importance measurement criterion, normalizing the data, and then reducing its dimensionality by principal component analysis. The method adopts an online/offline two-stage processing framework. In the online stage, a new micro-cluster feature vector serves as the data structure for storing data stream summary information; a micro-cluster merging algorithm and a micro-cluster pruning algorithm dynamically maintain the summary information required by the offline stage, so that the evolution of the data stream is accurately reflected. In the offline stage, density peak clustering is applied with the micro-clusters treated as virtual objects, yielding the final clustering result.
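The offline stage can be illustrated with a minimal sketch of generic density peak clustering applied to micro-cluster summaries. This is not the patent's own procedure: the function name `density_peak_clustering`, the representation of each micro-cluster as a center plus a point count (`centers`, `weights`), the hard cutoff distance `d_c`, and the fixed number of clusters `k` are all assumptions made for illustration.

```python
import numpy as np

def density_peak_clustering(centers, weights, d_c, k):
    """Cluster micro-cluster centers with a basic density-peak procedure.

    centers: (n, d) micro-cluster centers (assumed representation)
    weights: (n,) number of points summarized by each micro-cluster (assumed)
    d_c:     cutoff distance used for the local density estimate
    k:       number of final clusters to extract
    """
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(centers)

    # Pairwise Euclidean distances between micro-cluster centers.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)

    # Local density: total weight of micro-clusters within d_c (self included).
    rho = (weights[None, :] * (dist < d_c)).sum(axis=1)

    # Visit micro-clusters from densest to sparsest; ties are broken by index,
    # so an equally dense micro-cluster earlier in the order counts as "higher".
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)                  # distance to nearest denser micro-cluster
    nearest_higher = np.full(n, -1)      # index of that denser micro-cluster
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()     # densest one: use its largest distance
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Peaks are the k micro-clusters with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf             # the densest micro-cluster is always a peak
    peaks = np.argsort(-gamma, kind="stable")[:k]

    labels = np.full(n, -1)
    labels[peaks] = np.arange(k)
    # Every other micro-cluster inherits the label of its nearest denser
    # neighbor, which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Weighting the density by the point count lets a micro-cluster that summarizes many stream objects count for more than a sparse one; an unweighted variant would simply set every weight to 1.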

Description

Technical Field

[0001] The invention relates to the technical fields of data stream mining and cluster analysis, in particular to a clustering method for mixed data streams based on merging and pruning.

Background Art

[0002] Today, people generate a wide variety of data streams through their use of the Internet. Data streams are typically unbounded, continuous, and fast-arriving, and they exhibit concept drift; these characteristics make data stream mining very challenging. In practical applications, the data to be analyzed is often unlabeled, and obtaining class labels for a data stream is very costly. Data stream clustering, as an unsupervised learning approach, has therefore attracted extensive attention from researchers and has become a research hotspot.

[0003] So far, most studies on data stream clustering have only focused on numerical data or categorical data, but not both. Mixed data streams a...
