Fuzzy clustering method based on sparse mean values

A fuzzy clustering method based on sparse means, applied in the computing field, that addresses problems such as poor clustering results and the inability to effectively measure the similarity between sample points and classes.

Active Publication Date: 2017-01-04
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] The technical problem addressed by the present invention is as follows. In the prior art, the number of samples required to correctly estimate the underlying probability distribution in a vector space grows exponentially with the dimension. At the same time, the traditional fuzzy k-means algorithm measures the distance from a sample point to a class center by Euclidean distance without any constraint, so the mean of high-dimensional sparse data is not itself sparse. As a result, traditional fuzzy clustering performs poorly on high-dimensional data such as text: the Euclidean distance between a sample point (a high-dimensional sparse vector) and a mean (a high-dimensional non-sparse vector) cannot effectively measure the similarity between the sample point and the class. The invention therefore provides an optimized fuzzy clustering method based on sparse means.
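The claim that the mean of sparse data is not sparse can be checked on a small example. The sketch below uses hypothetical toy data (five "documents" over a ten-term vocabulary, not taken from the patent) and hypothetical helper names `density` and `mean_vector`; it only illustrates the effect and is not part of the claimed method.

```python
# Hypothetical toy data: each "document" uses only 2 of 10 terms.
docs = [
    {0: 1.0, 1: 2.0},
    {2: 1.0, 3: 1.0},
    {4: 3.0, 5: 1.0},
    {6: 1.0, 7: 2.0},
    {8: 1.0, 9: 1.0},
]
dim = 10


def density(vec, dim):
    """Fraction of coordinates that are non-zero in a sparse dict vector."""
    return sum(1 for v in vec.values() if v != 0.0) / dim


def mean_vector(vectors):
    """Plain arithmetic mean of sparse dict vectors, returned as a sparse dict."""
    total = {}
    for vec in vectors:
        for j, v in vec.items():
            total[j] = total.get(j, 0.0) + v
    return {j: v / len(vectors) for j, v in total.items()}


mean = mean_vector(docs)
sample_density = [density(d, dim) for d in docs]
mean_density = density(mean, dim)
print(sample_density)  # every sample has 2 of 10 coordinates non-zero
print(mean_density)    # the unconstrained mean has all 10 coordinates non-zero
```

Every sample touches two coordinates, yet the unconstrained mean touches all ten, which is the mismatch between sparse samples and a dense class center that the paragraph above describes.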

Examples

Embodiment Construction

[0026] The present invention will be described in further detail below in conjunction with the examples, but the protection scope of the present invention is not limited thereto.

[0027] As shown in the figure, the present invention relates to a fuzzy clustering method based on a sparse mean, and the method includes the following steps:

[0028] Step 1.1: Represent the documents to be clustered as a set of high-dimensional sparse vectors $X = \{x_1, x_2, \ldots, x_n\}$, where each sample point is an $s$-dimensional vector, i.e. $x_i \in \mathbb{R}^s$, $s > 0$, $1 \le i \le n$; $n$ is the total number of samples, $n > 0$;
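As an illustration of Step 1.1, the following minimal sketch turns a hypothetical toy corpus into sparse term-frequency vectors. The whitespace tokenizer, the raw term-count weighting, and the helper names `build_vocabulary` and `to_sparse_vector` are assumptions made for this example; the source does not say which term weighting the vector space model uses.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in documents:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_sparse_vector(text, vocab):
    """Term-frequency vector stored as {column index: count}; zeros are omitted."""
    counts = Counter(text.lower().split())
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}


documents = [  # hypothetical toy corpus
    "stocks fall as markets react",
    "markets rally as stocks recover",
    "team wins the final match",
]
vocab = build_vocabulary(documents)
X = [to_sparse_vector(d, vocab) for d in documents]
s = len(vocab)  # dimension of the vector space
n = len(X)      # total number of samples
print(n, s, X[0])
```

Each vector stores only its non-zero coordinates, so the storage per document grows with the document's length rather than with the vocabulary size s.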

[0029] Step 1.2: Set the parameters: the number of classes $k$, the fuzzy coefficient $m$, the initial weight of the regularization term $\beta_0$, the termination threshold $\varepsilon$, and the maximum number of iterations $T$; […] > 0; set the objective function to be minimized, which includes an $l_1$-norm regularizer on the means: […] where $u_{ci}$ denotes the degree of membership of the i-th sample in the c-th class, $\delta_c$ Ind...
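The objective function itself is not reproduced in the text above, so the following is only one plausible form, written down as an assumption: the standard fuzzy k-means loss plus an $l_1$ penalty on each mean. It takes $\delta_c$ to be the mean (center) of class c, which is also an assumption because the definition is cut off, and it uses a single weight $\beta$ without modelling how the method updates that weight from $\beta_0$.

```latex
% Assumed objective (not taken from the source):
% fuzzy k-means loss plus an l1 penalty on each class mean
J(U,\Delta) = \sum_{c=1}^{k}\sum_{i=1}^{n} u_{ci}^{m}\,\lVert x_i - \delta_c\rVert_2^{2}
            + \beta \sum_{c=1}^{k} \lVert \delta_c \rVert_1,
\qquad \text{subject to } \sum_{c=1}^{k} u_{ci} = 1 \ \text{for every } i.

% With U fixed, minimizing over \delta_c separates per coordinate j
% and yields a soft-thresholded weighted mean:
W_c = \sum_{i=1}^{n} u_{ci}^{m}, \qquad
\mu_{cj} = \frac{1}{W_c}\sum_{i=1}^{n} u_{ci}^{m}\, x_{ij}, \qquad
\delta_{cj} = \operatorname{sign}(\mu_{cj})\,
              \max\!\left(\lvert\mu_{cj}\rvert - \frac{\beta}{2W_c},\, 0\right).
```

Under this assumed form, any coordinate whose weighted mean is smaller in magnitude than $\beta / (2W_c)$ is set exactly to zero, which is how an $l_1$ regularizer would make the class means sparse.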

Abstract

The invention relates to a fuzzy clustering method based on sparse means. The documents to be clustered are represented as high-dimensional sparse vectors using a vector space model; the parameters are set; the means are initialized; all membership values are updated based on the current means; the weights are updated; the corresponding means are updated based on the memberships; when the means no longer change or the maximum number of iterations is reached, the iteration ends and the clustering result is output; otherwise the update steps are repeated. By adopting sparse means, the method gives the means (the class center points) the same local sparsity as the sample points, so that the Euclidean distance between sample points and means describes the similarity between sample points and classes more effectively. The method is more time-efficient; because the generated means are sparse, the class center points naturally represent the characteristics of the sparse sample points; moreover, the sparsity of the means can be controlled. A regularization term on the norm of the means is added to the objective function, giving a new objective function to be minimized that can be solved more quickly.
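The loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: it assumes the objective is the usual fuzzy k-means loss plus beta times the l1 norm of each mean, initializes the means from the first k samples, and keeps beta fixed instead of updating the weights, because the source does not give those details. All function names and the toy data are hypothetical.

```python
import math


def soft_threshold(a, t):
    """Shrink a toward zero by t; values with |a| <= t become exactly zero."""
    return math.copysign(max(abs(a) - t, 0.0), a)


def sq_dist(x, d):
    """Squared Euclidean distance between two dense vectors."""
    return sum((xj - dj) ** 2 for xj, dj in zip(x, d))


def update_memberships(X, means, m):
    """Standard fuzzy k-means membership update; U[i][c] sums to 1 over c."""
    U = []
    for x in X:
        d2 = [max(sq_dist(x, mu), 1e-12) for mu in means]  # avoid division by zero
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U


def update_means(X, U, k, m, beta):
    """Membership-weighted mean per class, then soft-thresholded (assumed l1 step)."""
    n, s = len(X), len(X[0])
    means = []
    for c in range(k):
        w = [U[i][c] ** m for i in range(n)]
        W = sum(w)
        mu = [sum(w[i] * X[i][j] for i in range(n)) / W for j in range(s)]
        means.append([soft_threshold(v, beta / (2.0 * W)) for v in mu])
    return means


def sparse_mean_fuzzy_kmeans(X, k, m=2.0, beta=0.5, eps=1e-6, T=100):
    means = [list(x) for x in X[:k]]                 # assumption: first k samples
    for _ in range(T):                               # at most T iterations
        U = update_memberships(X, means, m)
        new_means = update_means(X, U, k, m, beta)   # beta kept fixed in this sketch
        shift = max(sq_dist(a, b) for a, b in zip(means, new_means))
        means = new_means
        if shift < eps:                              # means no longer change
            break
    U = update_memberships(X, means, m)
    labels = [max(range(k), key=lambda c: U[i][c]) for i in range(len(X))]
    return means, U, labels


# Hypothetical toy data: two groups with disjoint dominant coordinates.
X = [
    [3.0, 2.0, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 2.0, 3.0, 0.0, 0.0],
]
means, U, labels = sparse_mean_fuzzy_kmeans(X, k=2)
print(labels)
print(means)
```

The soft-thresholding step is what sets small coordinates of each mean exactly to zero; with beta set to 0 the sketch reduces to ordinary fuzzy k-means with dense means.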

Description

Technical field

[0001] The invention belongs to the technical field of calculation, estimation and counting, and in particular relates to a fuzzy clustering method based on sparse means designed for high-dimensional sparse data.

Background technique

[0002] In practical problems across many fields, effective clustering methods are needed to group the objects in high-dimensional sparse data sets, in order to analyze the internal structure of the data and mine useful knowledge that helps people make further decisions; one example is grouping news documents to detect the topics they contain.

[0003] Fuzzy clustering analysis clusters objective things by establishing a fuzzy similarity relation based on their characteristics, closeness, and similarity. Its advantage over hard clustering is that it introduces the concept of fuzzy membership from fuzzy set theory, which can naturally describe the overlap between classes.

[0004] Howe...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23
Inventor: 梅建萍
Owner: ZHEJIANG UNIV OF TECH