Mixed attribute data flow clustering method for automatically determining clustering center based on density

A data stream clustering technology that automatically determines cluster centers, applied in the field of data clustering. It addresses the poor outlier handling and low clustering accuracy of existing methods, and achieves reduced parameter sensitivity, good clustering results, and good applicability and scalability.

Status: Inactive. Publication date: 2015-12-09
ZHEJIANG UNIV OF TECH
Cites: 0; Cited by: 19

AI Technical Summary

Problems solved by technology

[0005] To overcome the shortcomings of existing mixed-attribute data stream clustering algorithms, such as low clustering accuracy and poor handling of outliers, the present invention provides a mixed-attribute data stream clustering method that automatically determines cluster centers based on density, offering high precision and good handling of outliers.


Embodiment Construction

[0035] The present invention will be further described below in conjunction with the accompanying drawings.

[0036] Referring to Figures 1 to 6, a mixed-attribute data stream clustering method that automatically determines cluster centers based on density comprises the following steps:

[0037] 1) Initialization. Use the New-FSFDP algorithm to cluster the first $N_{init}$ data objects in the data stream, generating the initial dense micro-clusters that initialize the whole online processing phase. Take the average radius of all generated dense micro-clusters as the initial ε;
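Step 1 states only that the initial ε is the average radius of the dense micro-clusters that New-FSFDP produces from the first $N_{init}$ objects. The excerpt does not define a micro-cluster's radius, so the minimal Python sketch below assumes numeric attributes only and takes the radius to be the mean Euclidean distance of a micro-cluster's members to its centroid. New-FSFDP itself is not reproduced: the hypothetical helpers `centroid`, `radius`, and `initial_epsilon` simply take the resulting micro-clusters (lists of points) as input.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of numeric points."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def radius(points: List[Point]) -> float:
    """ASSUMED radius definition: mean Euclidean distance of members to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def initial_epsilon(micro_clusters: List[List[Point]]) -> float:
    """Initial epsilon = average radius over all dense micro-clusters (step 1)."""
    if not micro_clusters:
        raise ValueError("need at least one dense micro-cluster")
    return sum(radius(mc) for mc in micro_clusters) / len(micro_clusters)
```

For example, micro-clusters `[(0, 0), (2, 0)]` (radius 1) and `[(0, 0), (0, 4)]` (radius 2) give an initial ε of 1.5. Other radius definitions, such as the root-mean-square distance, would change the value but not the structure of the step.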

[0038] 1.1 Determine the distance measure for data set D according to the results of the mixed-attribute proportion analysis. Then use the formula $\rho_i = \sum_j f(d_{ij} - d_c)$ and the corresponding formula for $\delta_i$ to calculate $\rho_i$ and $\delta_i$ for each data object $i$, where $\rho_i$ and $\delta_i$ represent the density an...
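The quantities $\rho_i$ and $\delta_i$ come from FSFDP (clustering by fast search and find of density peaks). Because the excerpt omits the $\delta_i$ formula and the exact form of $f$, the sketch below assumes the definitions of the original FSFDP: $f(x) = 1$ when $x < 0$ and $0$ otherwise, and $\delta_i$ is the distance from $i$ to the nearest object of strictly higher density (the largest distance, for the densest object). The function `mixed_distance`, which weights a numeric Euclidean term and a categorical mismatch-rate term by their share of the attributes, is a hypothetical stand-in for the patent's proportion-based choice of distance measure; `rho_delta` and `cutoff_kernel` are likewise illustrative names.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MixedPoint = Tuple[Sequence[float], Sequence[str]]  # (numeric part, categorical part)


def mixed_distance(a: MixedPoint, b: MixedPoint) -> float:
    """HYPOTHETICAL mixed-attribute distance (not taken from the patent).

    Euclidean distance on the numeric attributes and mismatch rate on the
    categorical attributes, each weighted by its share of all attributes.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    n_num, n_cat = len(num_a), len(cat_a)
    total = n_num + n_cat
    d_num = math.dist(num_a, num_b) if n_num else 0.0
    d_cat = sum(x != y for x, y in zip(cat_a, cat_b)) / n_cat if n_cat else 0.0
    return (n_num / total) * d_num + (n_cat / total) * d_cat


def cutoff_kernel(x: float) -> int:
    """Assumed f(x): 1 if x < 0 else 0, as in the original FSFDP."""
    return 1 if x < 0 else 0


def rho_delta(points: List[T], dist: Callable[[T, T], float],
              d_c: float) -> Tuple[List[int], List[float]]:
    """Local density rho_i and distance delta_i for every object."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # rho_i = sum_j f(d_ij - d_c); j == i is skipped so a point does not count itself.
    rho = [sum(cutoff_kernel(d[i][j] - d_c) for j in range(n) if j != i)
           for i in range(n)]
    delta: List[float] = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Distance to the nearest object of strictly higher density.
            delta.append(min(higher))
        else:
            # Densest object(s): FSFDP convention uses the largest distance.
            delta.append(max((d[i][j] for j in range(n) if j != i), default=0.0))
    return rho, delta
```

On the one-dimensional points 0, 1, 2, 10 with `math.dist` and $d_c = 1.5$, the sketch gives $\rho = [1, 2, 1, 0]$ and $\delta = [1, 9, 1, 8]$: the point at 1 has both high density and high δ (a center candidate), while the point at 10 has zero density but large δ, the signature of an outlier.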

Abstract

The invention discloses a mixed-attribute data stream clustering method that automatically determines cluster centers based on density. The method comprises the following steps: 1) initialization: the first $N_{init}$ data objects in the data stream are clustered with the New-FSFDP algorithm to generate initial dense micro-clusters, which initialize the whole online process, and the mean radius of all generated initial dense micro-clusters is used as the initial ε; 2) online maintenance; 3) offline clustering. The method achieves higher precision and handles outliers well.
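The abstract names the online maintenance and offline clustering phases without detailing them. As a hedged illustration of how the initial ε could serve online, the sketch below assumes a rule common in micro-cluster stream clustering (DenStream-style absorption): each arriving numeric object joins the nearest micro-cluster if it lies within ε of that micro-cluster's center, and otherwise starts a new micro-cluster. `MicroCluster` and `online_update` are hypothetical names; the patent's own maintenance rules, including its treatment of categorical attributes, decay, and outliers, are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Summary of a micro-cluster: per-dimension linear sum and member count."""
    linear_sum: List[float]
    count: int = 1

    @property
    def center(self) -> List[float]:
        return [s / self.count for s in self.linear_sum]

    def absorb(self, x: Sequence[float]) -> None:
        """Add one object to the summary without storing the object itself."""
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.count += 1


def online_update(micro_clusters: List[MicroCluster], x: Sequence[float],
                  eps: float) -> MicroCluster:
    """ASSUMED rule: absorb x into the nearest micro-cluster whose center is
    within eps, otherwise start a new micro-cluster around x."""
    if micro_clusters:
        nearest = min(micro_clusters, key=lambda mc: math.dist(mc.center, x))
        if math.dist(nearest.center, x) <= eps:
            nearest.absorb(x)
            return nearest
    mc = MicroCluster(linear_sum=list(x))
    micro_clusters.append(mc)
    return mc
```

Storing only the linear sum and count keeps each update proportional to the number of dimensions and lets the center be recomputed without revisiting past objects, which matters because a data stream cannot be stored in full.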

Description

Technical field

[0001] The invention relates to a data clustering method.

Background art

[0002] With the continuous development of communication technology and hardware, data stream mining has broad application prospects in real-time monitoring systems, meteorological satellite remote sensing, and network traffic monitoring. Traditional clustering algorithms cannot be applied directly to data streams, which impose the following new requirements on a clustering algorithm: (1) it must not need to assume the number of natural clusters in advance; (2) it must be able to find clusters of arbitrary shape; (3) it must be able to handle outliers. Moreover, most real-world data streams are mixed-attribute data streams, containing both numerical and categorical attributes, so effectively mining valuable information from such streams has become particularly important.

[0003] In recent years, data clustering...

Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06K9/62
CPC: G06F18/23211
Inventors: 陈晋音 (Chen Jinyin), 何辉豪 (He Huihao), 陈军敢 (Chen Jungan), 杨东勇 (Yang Dongyong)
Owner ZHEJIANG UNIV OF TECH