Nonnegative matrix factorization-based dimensionality reducing method used for clustering

A non-negative matrix factorization and clustering technique, applied to dimensionality reduction based on non-negative matrix factorization. It addresses the curse of dimensionality, long computation time, and high storage and computation costs, and achieves good sparsity.

Inactive Publication Date: 2010-10-06

FUDAN UNIV

Cites: 3; Cited by: 31


Problems solved by technology

Data features are therefore generally high-dimensional, and high-dimensional data features usually bring two problems: 1) high storage and computation costs, and 2) the curse of dimensionality.

First, existing improved methods complicate the original NNMF method to varying degrees, so their final update rules require more computation time than the original method.

Second, the iterative update rules show that the matrices C and M are each functions of themselves, so C and M must both be initialized at the same time; with so many initial values, the result may not be a more effective matrix decomposition.


Examples


Embodiment 1

[0053] Embodiment 1 illustrates the characteristics of the matrices X, C and M more intuitively. The matrix

[0054] $$A = \begin{pmatrix} 2 & 0 & 0 & 0 & 1 \\ 0 & 3 & 2 & 1 & 0 \\ 0 & 2 & 3 & 2 & 0 \\ 2 & 0 & 0 & 0 & 2 \end{pmatrix}$$

[0055] It is a simple data matrix. For text data, each column corresponds to a different word, and each row is a vector counting how many times each word occurs in one text. The task is to group similar texts together in a compressed low-dimensional space. To simplify the problem, the data in matrix A can first be binarized, that is,

[0056] $$A_{ij} = \begin{cases} 1 & \text{if } A_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

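[0057] Based on this, NCMF yields the following decomposition:

[0058] $$\begin{pmatrix} 1/2 & 0 & 0 & 0 & 1/2 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \\ 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{pmatrix} \qquad (8)$$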

[0059] Denoting the three matrices in the above decomposition by X, C, and M respectively gives X = CM. The matrix X consists of 4 sample row vectors, each with 5 dimensions (columns), and the...
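The decomposition in Embodiment 1 can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the patent. It assumes that X is the binarized matrix with each column scaled to sum to 1, which matches the left-hand side of equation (8); the variable names B, reconstruction_ok and groups are illustrative only.

```python
import numpy as np

# Count matrix A from Embodiment 1: 4 texts (rows) x 5 words (columns).
A = np.array([
    [2, 0, 0, 0, 1],
    [0, 3, 2, 1, 0],
    [0, 2, 3, 2, 0],
    [2, 0, 0, 0, 2],
], dtype=float)

# Binarize: 1 where a word occurs in a text, 0 otherwise.
B = (A > 0).astype(float)

# Assumption: X is B with every column scaled to sum to 1.
# This reproduces the left-hand side of equation (8).
X = B / B.sum(axis=0, keepdims=True)

# Factors C (4 x 2) and M (2 x 5) as given in equation (8).
C = np.array([[0.5, 0.0],
              [0.0, 0.5],
              [0.0, 0.5],
              [0.5, 0.0]])
M = np.array([[1, 0, 0, 0, 1],
              [0, 1, 1, 1, 0]], dtype=float)

# Check that X = CM holds for these matrices.
reconstruction_ok = np.allclose(X, C @ M)

# Illustrative grouping: assign each text to its largest entry in C.
groups = C.argmax(axis=1)

print(reconstruction_ok, groups)
```

In this sketch, texts 1 and 4 land in one group and texts 2 and 3 in the other, which agrees with the block structure visible in the binarized matrix.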

Embodiment 2

[0060] Embodiment 2 presents experimental results on the gait data set [17]. The original data are videos; frames extracted from them with the background removed are shown in attached Figure 2. To use these gait data, further information is extracted to form a data matrix containing several sample feature vectors for each ID number (each ID corresponds to one person). To illustrate the dimensionality-reduction characteristics of the NCMF algorithm, the present invention performs cluster analysis on the data of the 6 ID numbers with the most samples in the galData data. The results are evaluated with three measures: running time (in seconds), sparsity (measured by information entropy), and clustering accuracy (measured by purity).

[0061] The data after dimensionality reduction are clustered using the k-means-like algorithm SIB [18], and the clustering accuracy is measured by purity. If t...
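The patent's own definition of purity is cut off in this excerpt. As a hedged illustration, the sketch below uses the commonly used definition: each cluster is credited with its most frequent true label, and the credited counts are divided by the number of samples. The function name purity and the toy labels are hypothetical and are not taken from the patent or from the gait data set.

```python
import numpy as np

def purity(true_labels, cluster_labels):
    """Commonly used purity: fraction of samples that carry the
    majority true label of the cluster they were assigned to."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    matched = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        # Count how often each true label occurs inside this cluster.
        _, counts = np.unique(members, return_counts=True)
        matched += counts.max()
    return matched / len(true_labels)

# Hypothetical toy data: 6 samples, 3 person IDs, 2 clusters.
ids = [0, 0, 0, 1, 1, 2]
clusters = [0, 0, 1, 1, 1, 1]
score = purity(ids, clusters)
print(score)
```

A purity of 1.0 means every cluster contains samples of a single ID; lower values indicate mixed clusters.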


Abstract

The invention belongs to the technical field of statistical pattern recognition and machine learning, and in particular discloses a dimensionality reduction method for clustering based on non-negative matrix factorization. The method adopts the KL distance (Kullback-Leibler divergence) and adds a data normalization constraint; by minimizing an objective error function between data compression and reconstruction, it directly discovers the internal relations among the data dimensions and obtains a mapping matrix; the mapping matrix then projects the high-dimensional data onto a low-dimensional subspace for effective data analysis such as clustering. The method yields a simpler iterative formula than the original factorization method, and normalization is maintained naturally in each iterative update. The normalization makes the final mapping matrix sparser than that of the original factorization method. Clustering results in the resulting low-dimensional space show that the method obtains more effective low-dimensional data features, and the algorithm is simple and effective.
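For orientation, the sketch below shows the original KL-based non-negative factorization that the abstract compares against, using the standard Lee-Seung multiplicative updates in the X ≈ CM notation of Embodiment 1. It is not the patent's NCMF update rule, which is not reproduced in this excerpt, and it does not impose the normalization constraint. The function names kl_divergence and kl_nmf, the iteration count, the seed and the small constant eps are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(X, R, eps=1e-12):
    """Generalized KL divergence D(X || R) = sum(X log(X/R) - X + R)."""
    mask = X > 0
    log_term = np.sum(X[mask] * np.log(X[mask] / (R[mask] + eps)))
    return float(log_term - X.sum() + R.sum())

def kl_nmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Standard Lee-Seung multiplicative updates for X ~ C @ M.
    This is the baseline factorization, not the patent's NCMF rule."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Both factors need a strictly positive starting point.
    C = rng.random((n, k)) + 0.1
    M = rng.random((k, d)) + 0.1
    history = []
    for _ in range(n_iter):
        R = C @ M + eps
        M *= (C.T @ (X / R)) / (C.sum(axis=0)[:, None] + eps)
        R = C @ M + eps
        C *= ((X / R) @ M.T) / (M.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(X, C @ M))
    return C, M, history

# The column-normalized binary matrix X from Embodiment 1.
X = np.array([
    [0.5, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.5],
])
C, M, history = kl_nmf(X, k=2)
print(history[0], history[-1])
```

In this sketch the rows of C serve as the 2-dimensional representation of the 4 samples, and those rows are what a clustering step would consume.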

Description

Technical field

[0001] The invention belongs to the technical field of statistical pattern recognition and machine learning, and in particular relates to a dimensionality reduction method for clustering based on non-negative matrix factorization.

Background

[0002] Clustering is one of the most fundamental research tasks in the field of machine learning. In practical applications, each dimension of the data represents a related feature. It is usually difficult to judge directly which features are conducive to clustering, so a common approach is to collect as many data features as possible and then perform clustering. Data features are therefore generally high-dimensional, and high-dimensional data features usually bring two problems: 1) high storage and computation costs, and 2) the curse of dimensionality. In practical applications, the curse of dimensionality is one of the main problems faced by many pattern recognition methods, such as gait recognition, im...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F15/18

CPC: G06K9/6232, G06F18/213

Inventors: 郭跃飞, 朱真峰, 薛向阳

Owner: FUDAN UNIV
