A fast clustering method and device based on set feature vector

A clustering technique based on set feature vectors, applied in special data processing applications, instruments, electrical digital data processing, and related fields. It addresses clustering results that are sensitive to parameter settings and that differ widely in cluster size, and aims to achieve high clustering stability, support for richer data types, and high clustering efficiency.

Status: Inactive
Publication Date: 2016-03-02
UNIV OF SCI & TECH BEIJING

AI Technical Summary

Problems solved by technology

The CABOSFV algorithm has several deficiencies: ① the clustering results are not uniform: cluster sizes differ greatly, and the algorithm tends to produce very large clusters; ② the clustering results are strongly affected by the upper limit b on the difference, so different values of b can give quite different results; ③ the clustering results depend on the order of data input, so exactly the same data fed to CABOSFV in a different order can produce different clustering results.
These defects make the quality of CABOSFV clustering results unstable, which seriously restricts the development and application of the algorithm.
In addition, the CABOSFV algorithm can only be applied to data with binary attributes, not to more general categorical attributes or mixed data.

Embodiment Construction

[0042] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. First, a fast clustering method based on set feature vectors according to an embodiment of the present invention will be described.

[0043] Figure 1 shows a schematic flowchart of a fast clustering method based on set feature vectors according to an embodiment of the present invention, and Figure 2 shows a detailed flowchart. In general, the method includes a data attribute conversion step, a data sorting step, a primary clustering step, and a secondary clustering step.

[0044] In step 101, the input mixed attribute data is converted into binary attribute data. For convenience of description, this step is referred to as a data attribute conversion step hereinafter.

[0045] In the data attribute conversion step, both the categorical attributes and the interval attributes in the data need to be converted into binary attributes. The metho...
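The text above is cut off before it states how the conversion is carried out, so the following is only a minimal sketch under assumptions, not the patented procedure. It assumes one-hot encoding for categorical attributes and equal-width binning for interval attributes; the function names `one_hot_categorical` and `bin_interval` and the parameter `n_bins` are hypothetical.

```python
from typing import List, Sequence, Tuple


def one_hot_categorical(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn one categorical attribute into one binary attribute per category.

    Assumption: one-hot encoding; the source does not confirm this choice.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def bin_interval(values: Sequence[float], n_bins: int) -> List[List[int]]:
    """Turn one interval attribute into n_bins binary attributes.

    Assumption: equal-width bins over [min, max]; exactly one bin is set per value.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    rows = []
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        rows.append([1 if i == idx else 0 for i in range(n_bins)])
    return rows


if __name__ == "__main__":
    cats, colour_bits = one_hot_categorical(["red", "blue", "red"])
    size_bits = bin_interval([0.0, 5.0, 10.0], n_bins=2)
    # Concatenate the binary columns of both attributes for each object.
    binary_rows = [c + s for c, s in zip(colour_bits, size_bits)]
    print(cats, binary_rows)
```

Each object ends up as a row of 0/1 values, which is the input form that the later sorting and clustering steps expect.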


Abstract

The invention provides a fast clustering method and device based on set feature vectors. The method comprises the following steps: (1) converting the input mixed-attribute data into binary-attribute data; (2) sorting the objects according to an object sparsity index or a non-interference sequence index; (3) placing the first sorted object in a class of its own to obtain the set feature vector of that class, then scanning the remaining objects to be clustered in order, and deciding whether the currently scanned object is merged into an existing class or starts a new class by comparing the set difference that would result from merging the object into an existing class with the set difference upper limit b1; and (4) performing secondary clustering on the primary clustering result obtained in step (3), and then removing isolated points from the clustering result to obtain the final clustering result. With the method and device, clustering is completed with a single sort and a single scan of the data, which greatly shortens the clustering time while maintaining clustering quality, and the clustering result is not affected by the order in which the data are input.
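Steps (2) and (3) of the abstract can be illustrated with a short sketch. The abstract does not define the sparsity index or the set difference, so both are assumptions here: objects are sorted by their number of 1-valued attributes, and the set difference of a class is taken as e / (n · a), where n is the class size, a is the number of attributes equal to 1 in every member, and e is the number of attributes equal to 1 in some but not all members. The names `ClusterSummary` and `primary_clustering` are hypothetical, and secondary clustering and isolated-point removal are not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class ClusterSummary:
    """Compact summary of a class; plays the role of a set feature vector (assumed form)."""
    size: int
    common: FrozenSet[int]  # attributes equal to 1 in every member
    union: FrozenSet[int]   # attributes equal to 1 in at least one member
    members: List[int]      # indices of member objects

    def merged_with(self, idx: int, obj: FrozenSet[int]) -> "ClusterSummary":
        # The summary is additive: no member has to be revisited on a merge.
        return ClusterSummary(self.size + 1, self.common & obj,
                              self.union | obj, self.members + [idx])

    def difference(self) -> float:
        # Assumed set difference: e / (n * a); infinite if nothing is shared.
        a = len(self.common)
        if a == 0:
            return float("inf")
        e = len(self.union) - a
        return e / (self.size * a)


def primary_clustering(objects: List[FrozenSet[int]], b1: float) -> List[List[int]]:
    """Sort once, scan once; each object is a set of its 1-valued attribute indices."""
    # Assumed sparsity ordering: fewest 1-valued attributes first, ties by content.
    order = sorted(range(len(objects)),
                   key=lambda i: (len(objects[i]), sorted(objects[i])))
    clusters: List[ClusterSummary] = []
    for i in order:
        obj = objects[i]
        best = None  # (cluster position, summary after a tentative merge)
        for k, c in enumerate(clusters):
            cand = c.merged_with(i, obj)
            if best is None or cand.difference() < best[1].difference():
                best = (k, cand)
        if best is not None and best[1].difference() <= b1:
            clusters[best[0]] = best[1]  # merge into the closest existing class
        else:
            clusters.append(ClusterSummary(1, obj, obj, [i]))  # start a new class
    return [c.members for c in clusters]


if __name__ == "__main__":
    data = [frozenset({0, 1, 2}), frozenset({0, 1, 2, 3}),
            frozenset({7, 8}), frozenset({7, 8, 9})]
    print(primary_clustering(data, b1=0.5))
```

Because the scan order in this sketch is derived from the objects themselves rather than from their arrival order, shuffling the input changes only the reported indices, not which objects are grouped together. Lowering `b1` makes merging stricter and yields more, smaller classes.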

Description

Technical field

[0001] The invention relates to the technical fields of data mining, cluster analysis, and high-dimensional data clustering, and in particular to a fast clustering method and device based on set feature vectors.

Background art

[0002] Clustering is one of the most common tasks in data mining and is used to discover unknown object classes in a data set.

[0003] The ability to process high-dimensional data is an important topic in clustering research. Many clustering algorithms generate high-quality results when the dimensionality is relatively low but are difficult to apply to high-dimensional data, and may sometimes even produce wrong clustering results.

[0004] Before proposing the present invention, we proposed the CABOSFV clustering algorithm, an effective algorithm for high-dimensional data mining, especially high-dimensional sparse data mining.

[0005] The CABOSFV algorithm d...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 武森, 姜敏, 魏桂英, 鄂旭
Owner: UNIV OF SCI & TECH BEIJING