A Fuzzy Clustering Method Based on Sparse Mean

A fuzzy clustering method based on a sparse mean, applied in the field of computing. It addresses two problems: the similarity between sample points and classes cannot be measured effectively, and traditional fuzzy clustering performs poorly on high-dimensional sparse data.

Active Publication Date: 2019-10-18
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] The technical problem addressed by the present invention is as follows. In the prior art, the number of samples required to correctly estimate the underlying probability distribution in a vector space grows exponentially with the dimension. At the same time, the traditional fuzzy k-means algorithm uses the Euclidean distance to measure the distance from a sample point to the class center point. Without any constraint, the mean of high-dimensional sparse data is not sparse, so the Euclidean distance between a sample point (a high-dimensional sparse vector) and the mean (a high-dimensional non-sparse vector) cannot effectively measure the similarity between the sample point and the class. As a result, traditional fuzzy clustering performs poorly on high-dimensional data such as text data. The invention therefore provides an optimized fuzzy clustering method based on a sparse mean.
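
The claim that the unconstrained mean of sparse vectors is not sparse can be checked on a toy example. The sketch below uses hypothetical data (four "documents" over a six-term vocabulary) and a helper named `density` that is not part of the patent; it only illustrates the effect described in [0005].

```python
import numpy as np

def density(v):
    """Fraction of non-zero entries in a vector."""
    return float(np.mean(v != 0))

# Hypothetical toy data: each row is a document that uses only 2 of 6 terms.
X = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])

sample_density = [density(x) for x in X]   # each sample is one-third non-zero
mean_density = density(X.mean(axis=0))     # the plain mean has no zero entries

print(sample_density, mean_density)
```

Every sample has only two non-zero entries, yet the plain arithmetic mean is non-zero in every coordinate, so it no longer resembles any individual sample.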



Examples


Embodiment Construction

[0026] The present invention will be described in further detail below in conjunction with the examples, but the protection scope of the present invention is not limited thereto.

[0027] As shown in the figure, the present invention relates to a fuzzy clustering method based on a sparse mean, and the method includes the following steps:

[0028] Step 1.1: Express the documents to be clustered as high-dimensional sparse vectors $X = \{x_1, x_2, \ldots, x_n\}$, where each sample point is an $s$-dimensional vector, i.e. $x_i \in \mathbb{R}^s$, $s > 0$, $1 \le i \le n$; $n$ is the total number of samples, $n > 0$;
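
As an illustration of Step 1.1, the following minimal sketch maps a few documents to term-count vectors over a shared vocabulary. The function name `to_term_vectors`, the whitespace tokenization, and the use of raw counts (rather than, say, TF-IDF weights) are assumptions made for the example; the source does not specify the term weighting.

```python
import numpy as np

def to_term_vectors(docs):
    """Map each document to a term-count vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok in toks:
            X[i, index[tok]] += 1.0   # count occurrences of each term
    return X, vocab

# Hypothetical example documents.
docs = [
    "stocks fall as markets slide",
    "team wins final match",
    "markets rally as stocks rise",
]
X, vocab = to_term_vectors(docs)
print(X.shape)   # n documents by s vocabulary terms
```

Here n = 3 and s is the vocabulary size; with a realistic corpus s is in the tens of thousands while each row has only a handful of non-zero entries, which is the high-dimensional sparse setting the method targets.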

[0029] Step 1.2: Set the parameters, which include the number of classes $k$, the fuzzy coefficient $m$, the initial weight of the regularization term $\beta_0$, the termination parameter $\varepsilon$, and the maximum number of iterations $T$; [parameter constraints illegible in the source] $> 0$; set the objective function to be minimized, which carries an $l_1$-norm regularizer on the mean: [equation not reproduced in the source]. Here $u_{ci}$ denotes the degree of membership of the $i$-th sample in the $c$-th class, $\delta_c$ indi...
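
The objective function itself does not survive in this copy of the text. For orientation only, a standard fuzzy k-means objective with an $l_1$ penalty on the class means would take the following form. This is an assumption built from the surviving description (memberships $u_{ci}$, fuzzy coefficient $m$, an $l_1$-norm regularizer on the mean), reading $\delta_c$ as the mean of class $c$ and $\beta$ as the current regularization weight; the patent's exact formula may differ.

```latex
\min_{U,\,\delta}\; J(U, \delta)
  = \sum_{c=1}^{k} \sum_{i=1}^{n} u_{ci}^{m}\, \lVert x_i - \delta_c \rVert_2^{2}
  + \beta \sum_{c=1}^{k} \lVert \delta_c \rVert_1
\quad \text{s.t.} \quad \sum_{c=1}^{k} u_{ci} = 1,\;\; u_{ci} \ge 0 .
```

Under this assumed form, the first term is the usual fuzzy k-means distortion and the second term pushes small coordinates of each mean to exactly zero.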


Abstract

The present invention relates to a fuzzy clustering method based on a sparse mean. The documents to be clustered are expressed as high-dimensional sparse vectors using a vector space model; the parameters are set and the means are initialized; all membership degrees are updated based on the current means; the weights are updated; and the corresponding means are then updated based on the membership degrees. The iteration ends and the clustering result is output when the means no longer change or the maximum number of iterations is reached; otherwise the update steps are repeated. By using a sparse mean, the invention gives the mean, that is, the class center point, the same local sparsity as the sample points, which makes the Euclidean distance between a sample point and the mean a more effective description of the similarity between the sample point and the class. The method is also more time-efficient, and the sparse means let the class centers represent the characteristics of sparse sample points more naturally. To strengthen control over the sparsity of the mean, a regularization term on the norm of the mean is added to the objective function, yielding a new objective function to minimize that allows a faster solution.
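
The iteration described in the abstract can be sketched as follows. This is not the patent's exact procedure: it assumes an objective of the form "fuzzy k-means distortion plus an $l_1$ penalty on the means", for which the mean update with fixed memberships reduces to soft-thresholding the membership-weighted average. The function names, the `init_means` argument, and the fixed weight `beta` are all hypothetical; in particular, the abstract mentions updating the weights during the iteration, and that schedule is not recoverable here.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink v toward zero by t; entries with |v| <= t become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_mean_fuzzy_kmeans(X, init_means, m=2.0, beta=0.1, eps=1e-6, max_iter=100):
    """Sketch: fuzzy k-means with l1-regularized (sparse) means, beta held fixed."""
    means = np.array(init_means, dtype=float)
    for _ in range(max_iter):
        # Squared Euclidean distances, shape (n, k); dense for clarity only.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                    # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # memberships, rows sum to 1
        W = U ** m
        mass = W.sum(axis=0)                          # per-class total weight
        weighted_mean = (W.T @ X) / mass[:, None]
        # Minimizer of sum_i w_i (x_i - d)^2 + beta * |d| in each coordinate.
        new_means = soft_threshold(weighted_mean, beta / (2.0 * mass[:, None]))
        shift = np.abs(new_means - means).max()
        means = new_means
        if shift < eps:                               # means stopped changing
            break
    return U, means

# Hypothetical toy data: two groups using disjoint terms.
X = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.9, 1.0],
])
U, means = sparse_mean_fuzzy_kmeans(X, init_means=X[[0, 2]])
labels = U.argmax(axis=1)
print(labels, means.round(3))
```

In this toy run the tiny cross-group memberships would give each plain weighted mean small non-zero values in the other group's coordinates; the soft-threshold step sets those coordinates to exactly zero, so each class center keeps the sparsity pattern of its own samples.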

Description

Technical field

[0001] The invention belongs to the technical field of calculation, estimation and counting, and in particular relates to a fuzzy clustering method based on a sparse mean designed for high-dimensional sparse data.

Background

[0002] Many practical problems across many fields require effective clustering methods to group the objects in high-dimensional sparse data sets, in order to analyze the internal structure of the data and mine useful knowledge that supports further decisions, for example grouping news documents to detect the topics they contain.

[0003] Fuzzy clustering analysis clusters objects by establishing a fuzzy similarity relation based on their characteristics, closeness, and similarity. Its advantage over hard clustering is that it introduces the concept of fuzzy membership from fuzzy set theory, which can naturally describe the overlap between classes.

[0004] Howe...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/23
Inventor: 梅建萍
Owner: ZHEJIANG UNIV OF TECH