Classified variable clustering method based on attribute weight similarity

A clustering method for categorical variables, applied in special data processing applications, instruments, and electrical digital data processing, that achieves low time and space complexity, high efficiency, and high clustering accuracy.

Status: Inactive. Publication date: 2014-09-10

XIAN UNIV OF TECH

Cites: 3. Cited by: 9.

AI Technical Summary

Problems solved by technology

However, DNNS is generally slower than ROCK because it considers more neighbors during execution.

Examples

Embodiment 1

[0051] In this embodiment, the mushroom data set is selected for testing. The mushroom data set contains 8124 mushroom records in total, including 4208 poisonous and 3916 non-toxic mushrooms. The data set can be downloaded from the UCI official website. ROCK, Squeezer, DNNS, and the clustering method of the present invention (CABAS) are implemented in Java. The experimental environment runs Windows 7 (version 6.1.7600) on a Core i3-2310M CPU at 2.1 GHz with 4 GB of memory. ROCK, Squeezer, and CABAS require parameters, so each method is run multiple times with different parameters and the best result is taken. For DNNS, it is sufficient to stop when the cohesion metric function drops suddenly.

[0052] Implementation steps:

[0053] 1) Apply formula (1) to calculate the similarity between each pair of data points in the mushroom data set. Table 3 is...

Embodiment 2

[0069] In this embodiment, the Hayes-Roth data set is selected for testing. The Hayes-Roth data set contains 132 records in total. It can be downloaded from the UCI official website and is clustered with the same methods as in Embodiment 1, again implemented in Java.

[0070] Implementation steps:

[0071] 1) Apply formula (1) to calculate the similarity between each pair of data points in the Hayes-Roth data set. Table 7 lists the five attribute values of two data points extracted from the Hayes-Roth data set, and Table 8 lists the corresponding weight |V 1 | of each attribute. The values of the function in formula (1) are 2, 1, 0, 2, 0. Therefore, Sim(1,2) = 5/7 = 0.7142.

[0072] Table 7: Two data points in the Hayes-Roth data set

[0073]

[0074] Table 8: The weight of each attribute in the Hayes-Roth data set

[0075]

[0076] 2) Construct an undirected graph, and use each data point in the Hayes-Roth data set as a node in the graph. If the similarity between two data ...

Abstract

The invention discloses a categorical variable clustering method based on attribute weight similarity. Using attribute weight similarity, the clustering process is converted into a search for the connected components of a graph. The data points in the data set serve as nodes. When the attribute weight similarity of two data points is greater than or equal to theta (a parameter given in advance), an edge is considered to exist between them; when it is less than theta, no edge exists. Once the undirected graph is determined, each of its connected components is a cluster, and the records in a cluster are the vertices of that connected component. The method therefore amounts to finding the vertices contained in each connected component of the undirected graph, so a graph traversal algorithm can guide the clustering process. As a result, time and space complexity are low and the clustering result is highly accurate.
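The procedure in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions. The patent's formula (1) is not reproduced on this page, so `matching_similarity` is only a stand-in (the fraction of attributes on which two records agree) and is not the attribute weight similarity itself. The names `matching_similarity` and `cluster_by_components` are hypothetical, and breadth-first search is just one possible graph traversal.

```python
from collections import deque
from typing import Callable, Hashable, List, Sequence

Record = Sequence[Hashable]


def matching_similarity(a: Record, b: Record) -> float:
    """Stand-in similarity (ASSUMPTION): fraction of attributes that agree.

    This is NOT the patent's formula (1); swap in the real attribute
    weight similarity where it is available.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_components(
    records: Sequence[Record],
    theta: float,
    similarity: Callable[[Record, Record], float] = matching_similarity,
) -> List[List[int]]:
    """Return clusters as sorted lists of record indices.

    An edge joins records i and j when similarity(i, j) >= theta;
    each connected component of the resulting graph is one cluster.
    """
    n = len(records)
    # Build adjacency lists for the undirected threshold graph.
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(records[i], records[j]) >= theta:
                adjacency[i].append(j)
                adjacency[j].append(i)

    # Breadth-first search from every unvisited node collects one component.
    visited = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if not visited[neighbor]:
                    visited[neighbor] = True
                    queue.append(neighbor)
        clusters.append(sorted(component))
    return clusters
```

In this sketch the pairwise similarity loop dominates the cost, since it compares every pair of records once, while the traversal itself is linear in the number of nodes and edges. Raising theta removes edges and tends to split clusters; lowering it merges them.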

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing methods and relates to a categorical variable clustering method based on attribute weight similarity.

Background

[0002] Clustering is an important research topic in data mining. Early clustering methods, such as k-means and DBSCAN, use distance to measure the dissimilarity between two records. A categorical data set can be converted into interval-scale variables with existing standardization methods so that these traditional methods can be applied. However, the attribute values of categorical variables usually have no quantitative relationship, so the conversion is largely arbitrary, and clustering categorical variables with traditional methods degrades the clustering result.

[0003] The ROCK clustering method proposed by Guha S et al. introduces the concept of link. The introduction...

Application Information

IPC(8): G06F17/30

CPC: G06F16/285

Inventors: ZHOU Hongfang (周红芳), DUAN Wencong (段文聪), ZHOU Yang (周扬)

Owner: XIAN UNIV OF TECH
