Classified variable clustering method based on attribute weight similarity

A clustering method for categorical variables, applied in special data processing applications, instruments, and electrical digital data processing, that achieves low time and space complexity, high efficiency, and highly accurate clustering results.

Inactive Publication Date: 2014-09-10
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, this method is generally slower than ROCK due to the fac


Examples

Embodiment 1

[0051] In this embodiment, the mushroom data set is selected for testing. The mushroom data set contains 8124 records in total, of which 4208 describe edible mushrooms and 3916 describe poisonous mushrooms. The data set can be downloaded from the UCI official website. ROCK, Squeezer, DNNS, and the clustering method of the present invention (CABAS) are implemented in Java. The experimental environment runs Windows 7 (version 6.1.7600) on an Intel Core i3-2310M CPU at 2.1 GHz with 4 GB of memory. ROCK, Squeezer, and CABAS require parameters, so each method is run multiple times with different parameter values and the best result is taken. For DNNS, it suffices to stop when the cohesion metric function drops suddenly.

[0052] Implementation steps:

[0053] 1) Apply formula (1) to calculate the similarity between each pair of data points in the mushroom data set. Table 3 is...

Embodiment 2

[0069] In this embodiment, the Hayes-Roth data set is selected for testing. The Hayes-Roth data set contains 132 records in total. It can be downloaded from the UCI official website, and it is clustered with the same methods as in Embodiment 1, again implemented in Java.

[0070] Implementation steps:

[0071] 1) Apply formula (1) to calculate the similarity between each pair of data points in the Hayes-Roth data set. Table 7 lists the five attribute values of two data points extracted from the Hayes-Roth data set, and Table 8 lists the corresponding weight |V_i| of each attribute. For the five attributes, the function in formula (1) yields the values 2, 1, 0, 2, 0. Therefore, Sim(1,2) = 5/7 ≈ 0.7143.
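The arithmetic in paragraph [0071] can be checked with a few lines of Python. This is only a sketch of the final division: the per-attribute values and the denominator 7 are taken as quoted above, formula (1) itself is not reproduced in this excerpt, and the variable names are illustrative.

```python
from fractions import Fraction

# Per-attribute values quoted in paragraph [0071] for data points 1 and 2.
per_attribute_values = [2, 1, 0, 2, 0]

# Denominator quoted in the same paragraph; how formula (1) derives it
# is not shown in this excerpt, so it is taken here as given.
denominator = 7

sim_1_2 = Fraction(sum(per_attribute_values), denominator)
print(sim_1_2, round(float(sim_1_2), 4))
```

Keeping the value as an exact fraction avoids any ambiguity between truncating and rounding the decimal expansion of 5/7.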

[0072] Table 7 Two data points in the Hayes dataset

[0073]

[0074] Table 8 The weight of each attribute in the Hayes dataset

[0075]

[0076] 2) Construct an undirected graph, and use each data point in the Hayes-Roth data set as a node in the graph. If the similarity between two data ...

Abstract

The invention discloses a clustering method for categorical variables based on attribute weight similarity. Using the attribute weight similarity, the clustering process is converted into a search for the connected components of a graph. The data points of the data set serve as nodes. When the attribute weight similarity of two data points is greater than or equal to theta (a parameter given in advance), an edge is considered to exist between them; when the similarity is smaller than theta, no edge exists between them. Once the undirected graph is determined, each of its connected components is a cluster, and the records in a cluster are the vertices of that connected component. The method therefore amounts to finding the vertices contained in each connected component of the undirected graph, so the clustering process can be guided by a graph traversal algorithm. As a result, the time and space complexity is low and the clustering result is highly accurate.
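The procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and `matching_fraction` is a stand-in similarity (the fraction of attributes on which two records agree) used only because formula (1), the attribute weight similarity, is not reproduced in this excerpt. Any similarity function can be passed in its place.

```python
from collections import deque
from typing import Callable, List, Sequence

Record = Sequence[str]


def matching_fraction(a: Record, b: Record) -> float:
    """Stand-in similarity: fraction of attributes on which a and b agree.

    Assumption for illustration only; this is NOT formula (1) of the patent.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_components(
    records: Sequence[Record],
    theta: float,
    similarity: Callable[[Record, Record], float] = matching_fraction,
) -> List[List[int]]:
    """Return clusters as sorted lists of record indices."""
    n = len(records)

    # Build the undirected graph: an edge joins i and j
    # when their similarity is at least theta.
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(records[i], records[j]) >= theta:
                adjacency[i].append(j)
                adjacency[j].append(i)

    # Breadth-first traversal: each connected component is one cluster.
    visited = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        members = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adjacency[node]:
                if not visited[neighbour]:
                    visited[neighbour] = True
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters


# Toy records with three categorical attributes each (made-up values).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("green", "flat", "tiny"),
]
print(cluster_by_components(records, theta=0.6))
```

With theta = 0.6, only pairs agreeing on at least two of the three attributes are linked, so the toy records split into three clusters. Lowering theta adds edges and merges components, which is why the parameter has to be tuned per data set. The pairwise similarity loop in this sketch is quadratic in the number of records.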

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing methods, and relates to a categorical variable clustering method based on attribute weight similarity.

Background

[0002] Clustering is an important research topic in data mining. Early clustering methods, such as k-means and DBSCAN, used distance to measure the dissimilarity between two records. A categorical variable data set can be converted into interval-scale variables with existing standardization methods so that these traditional methods can be applied. However, the attribute values of categorical variables usually have no quantitative relationship to one another, so the standardization is largely arbitrary. Using traditional methods on categorical variables therefore degrades the clustering result.

[0003] The ROCK clustering method proposed by Guha S et al. introduces the concept of link. The introduction...

Application Information

IPC(8): G06F17/30
CPC: G06F16/285
Inventor: 周红芳, 段文聪, 周扬
Owner XIAN UNIV OF TECH