Data stream adaptive clustering method for mixed attributes

An adaptive clustering technique for data streams, applied in the field of clustering. It addresses the inability of existing methods to process mixed-attribute data sets effectively, the large variation in clustering quality across data sets, and the need to determine cluster centers manually. Its effects are reduced memory overhead, good applicability and scalability, and good clustering results.

Inactive Publication Date: 2017-07-07
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] Most existing data stream clustering methods share several deficiencies: cluster centers must be determined manually, clustering accuracy is low, mixed-attribute data sets cannot be processed effectively, clustering quality varies widely across data sets, and results depend heavily on parameters. To overcome these deficiencies, the present invention provides a data stream adaptive clustering method for mixed attributes that can process mixed-attribute data sets with fast processing speed and high accuracy.



Examples


Embodiment Construction

[0031] The present invention is further described below with reference to the accompanying drawings.

[0032] Referring to Figures 1 to 5, a data stream adaptive clustering method for mixed attributes includes the following steps:

[0033] 1) Data preprocessing and grid initialization, as follows:

[0034] 1.1 For d-dimensional data, each dimension falls into one of two categories according to the nature of its attribute: numerical attribute dimensions and categorical attribute dimensions. Categorical attribute data can be further divided into binary data and ordinal data. For a data stream object, whether a dimension is numerical or categorical can be determined by querying the definition of that dimension's attribute. A categorical attribute is then further classified as binary or ordinal. After determining the attribute properties of each dimension of t...
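The attribute typing in step 1.1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the schema layout (a dict with hypothetical "type" and "levels" keys), the names AttrKind, classify_dimension and classify_schema, and the rule that a categorical attribute with exactly two levels is treated as binary are all assumptions made for illustration.

```python
from enum import Enum


class AttrKind(Enum):
    NUMERICAL = "numerical"
    BINARY = "binary"
    ORDINAL = "ordinal"


def classify_dimension(definition):
    """Classify one dimension from its attribute definition.

    `definition` is a hypothetical dict, e.g. {"type": "numeric"} or
    {"type": "categorical", "levels": ["low", "medium", "high"]}.
    """
    if definition["type"] == "numeric":
        return AttrKind.NUMERICAL
    # Assumption: two distinct levels means binary, anything else ordinal.
    if len(definition["levels"]) == 2:
        return AttrKind.BINARY
    return AttrKind.ORDINAL


def classify_schema(schema):
    """Return the attribute kind of every dimension, in order."""
    return [classify_dimension(d) for d in schema]


# Example: a 3-dimensional stream object with one dimension of each kind.
schema = [
    {"type": "numeric"},
    {"type": "categorical", "levels": ["no", "yes"]},
    {"type": "categorical", "levels": ["low", "medium", "high"]},
]
kinds = classify_schema(schema)
```

Typing each dimension once from its definition, rather than per record, keeps the per-object cost of the online stage low. How the method uses these types afterwards is not recoverable from the truncated text above.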


Abstract

The invention discloses a data stream adaptive clustering method for mixed attributes. The method comprises the following steps: 1) the data are preprocessed and the grid is initialized, which determines the partition granularity of the grid in each attribute dimension and the similarity measure between grid objects used during offline clustering; 2) the grid is maintained online; 3) when a user sends a clustering request, the process switches from the online stage to the offline stage: the grids are divided into dense grids and sparse grids according to their density information, the dense grids are clustered with an improved DBSCAN algorithm, the sparse grids are clustered with the CCFD algorithm, which is based on the density-distance distribution, and the final clustering result is output. The method has good applicability and scalability, processes the relevant data sets effectively, and obtains good clustering results.
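The online maintenance and the dense/sparse split in steps 2) and 3) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the fixed cell_width, the use of a raw point count as density, the single threshold, and the function names grid_key, update_grids and split_by_density are all hypothetical. The improved DBSCAN and CCFD clustering stages are not implemented here, since the excerpt does not describe them.

```python
from collections import defaultdict


def grid_key(point, kinds, cell_width):
    """Map a mixed-attribute point to the key of the grid cell it falls in.

    Numerical values are bucketed by an assumed fixed cell width;
    categorical values are used as-is.
    """
    key = []
    for value, kind in zip(point, kinds):
        if kind == "numerical":
            key.append(int(value // cell_width))
        else:
            key.append(value)
    return tuple(key)


def update_grids(counts, point, kinds, cell_width):
    """Online stage: record one arriving point in its grid cell."""
    counts[grid_key(point, kinds, cell_width)] += 1


def split_by_density(counts, threshold):
    """Offline stage: separate dense grids from sparse grids.

    Assumption: density is the raw point count of the cell.
    """
    dense = {k for k, c in counts.items() if c >= threshold}
    sparse = set(counts) - dense
    return dense, sparse


# Example stream with one numerical and one categorical dimension.
kinds = ["numerical", "categorical"]
counts = defaultdict(int)
for p in [(0.1, "a"), (0.2, "a"), (0.3, "a"), (5.0, "b")]:
    update_grids(counts, p, kinds, cell_width=1.0)

dense, sparse = split_by_density(counts, threshold=3)
```

Storing only per-cell summaries instead of the raw points is what allows a grid-based method to bound memory use on a stream. In the method described above, the dense set would then go to the improved DBSCAN step and the sparse set to the CCFD step.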

Description

Technical field

[0001] The invention belongs to the field of clustering methods and relates to a data stream clustering method for mixed attributes.

Background

[0002] With the development of big data technology, the amount of data generated has increased rapidly, and cluster analysis, as an important technology for analyzing all kinds of data, has once again become a research hotspot. Cluster analysis is widely used in fields such as finance, marketing, information retrieval, information filtering, scientific observation and engineering. Mixed-attribute data stream clustering targets mixed-attribute data streams. The original data arrive as a huge data stream, and most such streams have mixed attribute types: both numerical attributes with continuous values and categorical attributes representing categories or states. Mixed-attribute data require preprocessing, clustering and knowledge extraction on the data stre...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/30
CPC: G06F16/24568, G06F16/285, G06F18/23
Inventors: 陈晋音, 林翔, 郑海斌
Owner: ZHEJIANG UNIV OF TECH