Nonnegative matrix factorization-based dimensionality reducing method used for clustering

A dimensionality reduction technique for clustering, based on non-negative matrix factorization. It addresses the curse of dimensionality, long computation times, and high storage and computing costs, and it yields a mapping matrix with good sparsity.

Inactive Publication Date: 2010-10-06
FUDAN UNIV
3 Cites, 31 Cited by

AI Technical Summary

Problems solved by technology

Data features are generally high-dimensional, and high-dimensional features usually bring two problems: 1) high storage and computation costs, and 2) the curse of dimensionality.
First, existing modifications complicate the original NNMF method to varying degrees, so their final update rules require more computation time than the original method.
Second, the iterative update rules show that the matrices C and M are each functions of themselves, so both must be initialized at the same time; having this many initial values may not lead to a more effective matrix decomposition.



Examples


Embodiment 1

[0053] Embodiment 1 illustrates the characteristics of the matrices X, C, and M more intuitively. Consider the matrix

[0054] $$A = \begin{pmatrix} 2 & 0 & 0 & 0 & 1 \\ 0 & 3 & 2 & 1 & 0 \\ 0 & 2 & 3 & 2 & 0 \\ 2 & 0 & 0 & 0 & 2 \end{pmatrix}$$

[0055] It is a simple data matrix. For text data, each column corresponds to a different word, and each row is a vector counting how often each word occurs in a given text. Our task is to group similar texts together in a compressed low-dimensional space. To simplify the problem, the entries of the data matrix A can first be binarized, that is,

[0056] $$A_{ij} = \begin{cases} 1 & \text{if } A_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

[0057] On this basis, NCMF yields the following decomposition:

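[0058] $$\begin{pmatrix} 1/2 & 0 & 0 & 0 & 1/2 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \\ 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{pmatrix} \tag{8}$$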

[0059] Denoting the three matrices in the above decomposition by X, C, and M respectively gives X = CM. The matrix X consists of 4 sample row vectors, each with 5 dimensions (columns), and the...
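The worked example in Embodiment 1 can be checked numerically. The sketch below binarizes A as in [0056], rescales it, and confirms that the product of C and M from equation (8) reproduces X exactly. The column normalization step is an assumption: it is inferred from the fact that every column of X in (8) sums to 1, and the excerpt does not spell the rule out. The helper names (`binarize`, `normalize_columns`, `matmul`) are hypothetical.

```python
from fractions import Fraction

# Data matrix A from paragraph [0054]: 4 texts (rows) x 5 words (columns).
A = [
    [2, 0, 0, 0, 1],
    [0, 3, 2, 1, 0],
    [0, 2, 3, 2, 0],
    [2, 0, 0, 0, 2],
]


def binarize(matrix):
    """Apply [0056]: an entry becomes 1 if it is positive, otherwise 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def normalize_columns(matrix):
    """Scale every column to sum to 1 (assumed rule; it reproduces X in (8))."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return [[Fraction(value, total) for value, total in zip(row, column_sums)]
            for row in matrix]


def matmul(left, right):
    """Plain matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]


half = Fraction(1, 2)
X = normalize_columns(binarize(A))
C = [[half, 0], [0, half], [0, half], [half, 0]]  # 4 samples x 2 components
M = [[1, 0, 0, 0, 1], [0, 1, 1, 1, 0]]            # 2 components x 5 dimensions

print(matmul(C, M) == X)
```

Exact fractions are used instead of floats so that the equality check is not affected by rounding. In C, texts 1 and 4 load only on the first component and texts 2 and 3 only on the second, which matches the grouping of similar texts that the embodiment aims for.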

Embodiment 2

[0060] Embodiment 2 presents the experimental results on the gait data set [17]. The original data in this data set are video recordings, and the frames extracted from them with the background removed are shown in attached Figure 2. To use these gait data, the information is further extracted to form a data matrix with several sample feature vectors for each ID number (each ID corresponds to one person). To illustrate the dimensionality reduction characteristics of the NCMF algorithm, the present invention takes the 6 ID numbers with the most samples in the galData data and performs cluster analysis on the corresponding data. The results are evaluated with three measures: running time, sparsity, and clustering accuracy, measured in seconds, information entropy, and purity, respectively.

[0061] The data after dimensionality reduction are clustered using SIB [18], a k-means-type algorithm, and the clustering accuracy is measured by purity. If t...
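The excerpt breaks off before it defines purity, so the sketch below uses the commonly cited definition as an assumption: each cluster is credited with its most frequent true label, and purity is the fraction of samples covered by those majority labels. The function name `purity` and the toy labels are hypothetical and are not taken from the gait experiment.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of samples that carry the majority true label of their cluster."""
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must be non-empty and of equal length")
    # Count how often each true label occurs inside each cluster.
    label_counts = {}
    for label, cluster in zip(true_labels, cluster_ids):
        label_counts.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: 6 samples, 3 person IDs, 3 clusters.
person_ids = [1, 1, 1, 2, 2, 3]
clusters = [0, 0, 1, 1, 1, 2]
print(purity(person_ids, clusters))
```

Under this definition a value of 1.0 means every cluster contains samples from a single ID, and lower values mean more mixing.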


Abstract

The invention belongs to the technical field of statistical pattern recognition and machine learning, and in particular discloses a dimensionality reduction method for clustering based on non-negative matrix factorization. The method comprises the following steps: adopting the KL distance; adding a data normalization constraint; directly discovering the internal relations among the data dimensions by minimizing an objective error function between data compression and reconstruction; obtaining a mapping matrix; and projecting the high-dimensional data onto a low-dimensional subspace with the mapping matrix, so that data analysis such as clustering can be performed effectively. The method yields a simpler iterative formula than the original factorization method, and the normalization is maintained naturally in each iterative update. The normalization gives the final mapping matrix higher sparsity than that of the original factorization method. Clustering results in the resulting low-dimensional space show that the method obtains more effective low-dimensional data features, and that the algorithm is simple and effective.
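For orientation, the baseline that the abstract compares against can be sketched as follows. This is the widely used multiplicative update scheme for non-negative matrix factorization under the generalized KL divergence (commonly attributed to Lee and Seung), written for X ≈ CM. It is not the simplified, normalization-preserving rule claimed by the invention, which this excerpt does not reproduce. The function names, the random initialization range, and the iteration count are assumptions chosen for illustration.

```python
import math
import random


def matmul(left, right):
    """Plain matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]


def kl_divergence(x, r, eps=1e-12):
    """Generalized KL divergence: sum of x*log(x/r) - x + r over all entries."""
    total = 0.0
    for x_row, r_row in zip(x, r):
        for xv, rv in zip(x_row, r_row):
            if xv > 0:
                total += xv * math.log(xv / (rv + eps))
            total += rv - xv
    return total


def kl_nmf(x, rank, iters=200, seed=0, eps=1e-12):
    """Baseline multiplicative updates for X ~ C M (not the patented rule)."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    # In this baseline both factors need a positive starting point.
    c = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    m = [[rng.uniform(0.1, 1.0) for _ in range(d)] for _ in range(rank)]
    history = [kl_divergence(x, matmul(c, m))]
    for _ in range(iters):
        # Update M using the element-wise ratio X / (C M).
        r = matmul(c, m)
        q = [[x[i][j] / (r[i][j] + eps) for j in range(d)] for i in range(n)]
        c_sums = [sum(c[i][a] for i in range(n)) for a in range(rank)]
        m = [[m[a][j] * sum(c[i][a] * q[i][j] for i in range(n)) / (c_sums[a] + eps)
              for j in range(d)] for a in range(rank)]
        # Update C using the ratio recomputed with the new M.
        r = matmul(c, m)
        q = [[x[i][j] / (r[i][j] + eps) for j in range(d)] for i in range(n)]
        m_sums = [sum(m[a]) for a in range(rank)]
        c = [[c[i][a] * sum(m[a][j] * q[i][j] for j in range(d)) / (m_sums[a] + eps)
              for a in range(rank)] for i in range(n)]
        history.append(kl_divergence(x, matmul(c, m)))
    return c, m, history


# The 4 x 5 matrix X from Embodiment 1, factorized with 2 components.
X = [
    [0.5, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.5],
]
C, M, history = kl_nmf(X, rank=2)
print(history[0], history[-1])
```

The sketch makes one point from the summary concrete: each update of C depends on the current C and M, and likewise for M, so both factors have to be initialized before the first iteration. The small `eps` only guards against division by zero.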

Description

Technical field

[0001] The invention belongs to the technical field of statistical pattern recognition and machine learning, and in particular relates to a dimensionality reduction method for clustering based on non-negative matrix factorization.

Background art

[0002] Clustering is one of the most fundamental research tasks in the field of machine learning. In practical applications, each dimension of the data represents a feature. It is usually difficult to judge in advance which features are useful for clustering, so a common approach is to collect as many data features as possible and then perform clustering. As a result, data features are generally high-dimensional, and high-dimensional features usually bring two problems: 1) high storage and computation costs, and 2) the curse of dimensionality. In practice, the curse of dimensionality is one of the main problems faced by many pattern recognition methods, such as gait recognition, im...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F15/18
CPC: G06K9/6232, G06F18/213
Inventor: 郭跃飞, 朱真峰, 薛向阳
Owner: FUDAN UNIV