Set characteristic vector-based quick clustering method and device

A clustering technology based on set feature vectors, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of large differences in cluster sizes and of clustering results that vary strongly with parameter settings and input order, and achieves support for richer data types, high clustering stability, and high clustering efficiency.

Inactive Publication Date: 2013-05-01

UNIV OF SCI & TECH BEIJING

5 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

However, the CABOSFV algorithm also has some deficiencies: ① the clustering results are uneven: cluster sizes differ greatly, and the algorithm tends to produce large clusters; ② the clustering results are strongly affected by the upper limit b of the difference, so different values of b may yield quite different clustering results; ③ the clustering results depend on the order of data input: even when exactly the same data is fed to the CABOSFV algorithm in a different order, different clustering results are obtained.

The above defects make the quality of CABOSFV algorithm clustering results unstable, which seriously restricts the development and application of the algorithm.

In addition, the CABOSFV algorithm can only be applied to data with binary attributes; it cannot handle more general categorical attributes or mixed-attribute data.

Embodiment Construction

[0042] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. First, a fast clustering method based on set feature vectors according to an embodiment of the present invention will be described.

[0043] Figure 1 shows a schematic flowchart of a fast clustering method based on set feature vectors according to an embodiment of the present invention, and Figure 2 shows a detailed flowchart. In general, the method includes a data attribute conversion step, a data sorting step, a primary clustering step and a secondary clustering step.

[0044] In step 101, the input mixed attribute data is converted into binary attribute data. For convenience of description, this step is referred to as a data attribute conversion step hereinafter.

[0045] In the data attribute conversion step, both the categorical attributes and the interval attributes in the data need to be converted into binary attributes. The metho...
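
As an illustration of the data attribute conversion step, the following Python sketch shows one common way to turn categorical and interval attributes into binary attributes. The scheme (one binary attribute per category value, equal-width binning of interval attributes) and the names `binarize`, `equal_width_bin`, and `n_bins` are assumptions made for this example; the conversion the patent actually prescribes is not reproduced in this excerpt.

```python
from typing import Any, Dict, List, Sequence, Tuple


def equal_width_bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Return the index of the equal-width bin that contains value."""
    if hi == lo:  # constant column: everything falls into bin 0
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def binarize(
    records: Sequence[Dict[str, Any]],
    categorical: Sequence[str],
    interval: Sequence[str],
    n_bins: int = 3,
) -> Tuple[List[str], List[List[int]]]:
    """Turn mixed-attribute records into 0/1 rows (hypothetical helper)."""
    names: List[str] = []
    rows: List[List[int]] = [[] for _ in records]

    for key in categorical:
        # Assumption: one binary attribute per distinct category value.
        for cat in sorted({str(r[key]) for r in records}):
            names.append(f"{key}={cat}")
            for row, r in zip(rows, records):
                row.append(int(str(r[key]) == cat))

    for key in interval:
        # Assumption: discretize into equal-width bins, one binary attribute per bin.
        values = [float(r[key]) for r in records]
        lo, hi = min(values), max(values)
        bins = [equal_width_bin(v, lo, hi, n_bins) for v in values]
        for b in range(n_bins):
            names.append(f"{key}:bin{b}")
            for row, got in zip(rows, bins):
                row.append(int(got == b))

    return names, rows


records = [
    {"color": "red", "age": 20},
    {"color": "blue", "age": 35},
    {"color": "red", "age": 50},
]
names, rows = binarize(records, categorical=["color"], interval=["age"], n_bins=3)
print(names)
print(rows)
```

Each input record becomes a row with exactly one 1 per original attribute, so the output can be fed to a method that expects binary attribute data.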

Abstract

The invention provides a fast clustering method and device based on set feature vectors. The method comprises the following steps: (1) converting the input mixed-attribute data into binary attribute data; (2) sorting the objects according to an object sparsity index or a non-interference sequence index; (3) placing the first sorted object in a category of its own and obtaining the set feature vector of that category, then scanning the remaining objects to be clustered in order and, by comparing the set difference that would result from incorporating the object into an established category with the set difference upper limit b1, deciding whether the currently scanned object joins an established category or forms a new category; and (4) performing secondary clustering on the primary clustering result obtained in step (3), and then removing isolated points from the clustering result to obtain the final clustering result. With this method and device, clustering is completed by sorting and scanning the data only once, the time required for clustering is greatly shortened while clustering quality is maintained, and the clustering result is not affected by the data input order.
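
The sorting and primary clustering steps (2) and (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the excerpt does not define the set difference, the sparsity index, or the contents of the set feature vector, so `set_difference` is a placeholder measure, the sort key (number of 1-valued attributes, ascending) is a guess, and the `Cluster` summary fields are hypothetical. Choosing the established category with the smallest difference is also an assumption. Secondary clustering and isolated-point removal are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass
class Cluster:
    members: List[int]      # indices of the objects in this category
    common: FrozenSet[int]  # attributes equal to 1 in every member
    union: FrozenSet[int]   # attributes equal to 1 in at least one member


def set_difference(n: int, common: FrozenSet[int], union: FrozenSet[int]) -> float:
    """Placeholder dissimilarity of a set of n binary objects (an assumption)."""
    if not common:
        return float("inf")  # nothing shared: treat as maximally different
    return (len(union) - len(common)) / (n * len(common))


def primary_clustering(rows: Sequence[Sequence[int]], b1: float) -> List[Cluster]:
    # Represent each object by the set of attributes whose value is 1.
    ones = [frozenset(j for j, v in enumerate(r) if v) for r in rows]
    # Assumed sparsity index: count of 1-valued attributes, with a
    # deterministic tie-break on the attribute set itself.
    order = sorted(range(len(rows)), key=lambda i: (len(ones[i]), sorted(ones[i])))

    clusters: List[Cluster] = []
    for i in order:
        best, best_diff = None, float("inf")
        for c in clusters:
            # Difference the category would have if object i were added.
            d = set_difference(len(c.members) + 1, c.common & ones[i], c.union | ones[i])
            if d < best_diff:
                best, best_diff = c, d
        if best is not None and best_diff <= b1:
            # Update the category summary without revisiting its members.
            best.members.append(i)
            best.common &= ones[i]
            best.union |= ones[i]
        else:
            clusters.append(Cluster([i], ones[i], ones[i]))
    return clusters


rows = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
clusters = primary_clustering(rows, b1=0.5)
print([c.members for c in clusters])  # → [[0, 2], [1, 3]]
```

Because each category is summarized by a small record that can be updated when an object joins, the data is scanned only once after sorting. A lower `b1` makes merging stricter and yields more, smaller categories.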

Description

Technical field

[0001] The invention relates to technical fields such as data mining, cluster analysis, and high-dimensional data clustering, and in particular to a fast clustering method and device based on set feature vectors.

Background art

[0002] Clustering is one of the most common tasks in data mining; it is used to discover unknown object classes in a data set.

[0003] The ability to process high-dimensional data is an important topic in clustering research. Many clustering algorithms generate high-quality clustering results when the dimensionality is relatively low but are difficult to apply to high-dimensional data, and may sometimes even produce wrong clustering results.

[0004] Before proposing the present invention, we proposed an effective algorithm, the CABOSFV clustering algorithm, for high-dimensional data mining, especially high-dimensional sparse data mining.

[0005] The CABOSFV algorithm d...

Application Information

IPC(8): G06F17/30

Inventor: 武森, 姜敏, 魏桂英, 鄂旭

Owner: UNIV OF SCI & TECH BEIJING
