Density peak value clustering method and system based on principal component analysis and nearest neighbor graph

Principal component analysis and density peak clustering technology, applied in character and pattern recognition, instruments, computer parts and related fields. It addresses the curse of dimensionality, undetected clusters and poor performance on high-dimensional data, and achieves strong robustness and generalization ability.

Inactive Publication Date: 2018-01-09
CHINA UNIV OF MINING & TECH
Cites: 0, Cited by: 4

AI Technical Summary

Problems solved by technology

First, the original DPC algorithm does not consider the local structure of the data, so it cannot detect all clusters. Second, the algorithm performs poorly on high-dimensional data, because DPC relies heavily on the distances between pairs of data points, which are affected by the "curse of dimensionality".
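The effect of dimensionality on pairwise distances can be illustrated with a small experiment. The sketch below is not taken from the patent: the function name `relative_contrast`, the uniform random data and the chosen dimensions are assumptions made purely for illustration. It measures how much the largest and smallest pairwise distances differ, relative to the smallest one.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min over all pairwise Euclidean distances
    of n_points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Illustrative comparison: a low-dimensional and a high-dimensional sample.
low_dim_contrast = relative_contrast(50, 2)
high_dim_contrast = relative_contrast(50, 500)
print(low_dim_contrast, high_dim_contrast)
```

In the high-dimensional sample the distances are much closer to one another, so any density estimate built only on pairwise distances has less information to separate dense regions from sparse ones.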



Examples


Embodiment 1

[0035] As shown in Figure 1, this embodiment includes the following steps:

[0036] Input: data set χ = {x_1, x_2, ..., x_n} (x_i ∈ R^d), parameter d_c.

[0037] Output: divided data classes.

[0038] Step 1: Data preprocessing. Transform the original data into a data set χ′ = {x′_1, x′_2, ..., x′_n} (x′_i ∈ R^d) whose features have equal mean and variance.

[0039] Step 2: Calculate the covariance matrix. Calculate the covariance matrix Σ of the transformed data according to formula (1).

[0040] Step 3: Find the eigenvectors and eigenvalues of the covariance matrix. Solve for the eigenvalues λ_i and the eigenvectors u_i of the covariance matrix Σ, and stack the eigenvectors into a matrix, denoted by U.

[0041] Step 4: Compute the rotated data. According to formula (2), calculate the rotated data point x_rot,i for each sample.

[0042] Step 5: Compute the dimensionally reduced data. According to formula (4), reduce the rotated data x_rot,i to the ...
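Steps 1 to 5 above follow the usual outline of principal component analysis. Formulas (1), (2) and (4) are not reproduced in this excerpt, so the sketch below substitutes the textbook forms as assumptions: standardization to zero mean and unit variance, Σ = (1/n) X′ᵀX′, rotation by the eigenvector matrix U, and truncation to the first k components. The function name `pca_reduce` and the parameter `k` are hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Standardize X (n x d), rotate it onto its principal axes, keep k components.

    Returns the reduced data (n x k) and all eigenvalues in descending order.
    """
    X = np.asarray(X, dtype=float)

    # Step 1 (assumed form): give every feature zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Xs = (X - mean) / std

    # Step 2 (assumed form of formula (1)): covariance matrix of the transformed data.
    sigma = Xs.T @ Xs / Xs.shape[0]

    # Step 3: eigenvalues and eigenvectors; eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order, so reorder to descending.
    eigvals, U = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]

    # Step 4 (assumed form of formula (2)): rotate each sample onto the eigenvectors.
    X_rot = Xs @ U

    # Step 5 (assumed form of formula (4)): keep the first k rotated coordinates.
    return X_rot[:, :k], eigvals


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[:, 1] = 2.0 * data[:, 0] + 0.1 * rng.normal(size=200)  # correlated feature
reduced, eigenvalues = pca_reduce(data, k=2)
print(reduced.shape)
```

How many components to keep is not stated in the visible text; a common choice is the smallest k whose eigenvalues account for a chosen fraction of the total variance.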


Abstract

The invention provides a density peak clustering method and system based on principal component analysis and a nearest neighbor graph. The method first uses principal component analysis to perform feature transformation and feature extraction on the original data, that is, to reduce its dimensionality. It then computes the local density with an improved formula that uses the nearest neighbor graph in place of the original calculation. Finally, it follows the remaining steps of the original algorithm to find the cluster centers and complete the clustering. The invention takes into account the effects that high-dimensional data and local structures in the data have on the algorithm, and the method and system have strong robustness and generalization ability.
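The clustering stage described in the abstract can be sketched as follows. The patent's improved local density formula is not shown in this excerpt, so the density used here (a sum of exp(-distance) over the k nearest neighbors) is only a stand-in assumption. The function name `knn_density_peaks` and the parameters `k` and `n_clusters` are hypothetical. The remaining steps follow the standard density peak procedure: for each point, find the distance to the nearest point of higher density, choose as centers the points where density times that distance is largest, and assign every other point to the cluster of its nearest higher-density neighbor.

```python
import math


def knn_density_peaks(points, k, n_clusters):
    """Density peak clustering with a k-nearest-neighbor local density (sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density from the k nearest neighbors (assumed stand-in formula).
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(sum(math.exp(-d) for d in nearest))

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Centers have both high density and a large distance to denser points.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf  # the densest point is always a center
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # In density order, every parent is labeled before its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
sample = blob + [(x + 10.0, y + 10.0) for x, y in blob]
labels, centers = knn_density_peaks(sample, k=3, n_clusters=2)
print(labels, centers)
```

In the full method the input to this stage would be the dimensionally reduced data from the principal component analysis step rather than the raw points.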

Description

Technical field

[0001] The invention relates to the field of pattern recognition and machine learning, in particular to a density peak clustering method and system based on principal component analysis and a nearest neighbor graph.

Background technique

[0002] The goal of cluster analysis is to discover the internal organization of a data set by finding the structure that exists in it in the form of "clusters". The term "cluster" refers to an isolated group of similar data points. Intuitively, a partition into clusters is characterized by similarity within clusters and dissimilarity between clusters. The data are therefore decomposed into many groups, each composed of similar objects, while different groups contain different elements. This methodology is widely used in multivariate statistics and machine learning.

[0003] Traditional clustering is roughly divided into four categories: partition clustering, hierarchical clustering, de...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/62
Inventor: 丁世飞; the other inventors requested that their names not be disclosed
Owner: CHINA UNIV OF MINING & TECH