Adaptive Text Clustering Algorithm Based on Center Method

An adaptive text clustering technique, applied in the field of information retrieval, that addresses problems such as poor algorithm performance.

Inactive Publication Date: 2017-02-01
JILIN UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] Partitioning clustering algorithms have two problems: the number of clusters must be specified manually before the algorithm runs, and performance is poor when the data set contains many categories. The purpose of the present invention is to provide a method that does not require the number of clusters to be specified manually in advance (that is, the number of clusters is determined adaptively according to the data set and the operation of the algorithm) and that performs better when the data set contains many categories.



Examples


Embodiment Construction

[0030] The following examples are used to illustrate the present invention, but are not intended to limit the scope of the present invention. The present invention will be further described in detail through the accompanying drawings and examples.

[0031] The premise of the embodiment of the present invention is that the text data set has been obtained.

[0032] Figure 1 is a schematic flow chart of an adaptive text clustering algorithm based on the center method provided by the embodiment of the present invention. As shown in Figure 1, this embodiment mainly includes the following steps:

[0033] Step 1: Initialize relevant parameters

[0034] First, initialize the parameter b used to calculate each cluster's CFC vector and the base of the log function. Second, set the initial cluster size Im used in the algorithm's random partition step, and set the restart frequency Fm and restart range Rm of the algorithm's restart strategy. Finally, set the max...
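The parameters named in Step 1 can be grouped as in the following minimal Python sketch. The class name, field names, default values, and the `random_partition` helper are all hypothetical; the text only names b, the log base, Im, Fm, and Rm. The visible text does not say how Fm and Rm are used, so they are stored but not interpreted here, and the truncated final parameter is left out.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ClusteringParams:
    """Hypothetical container for the Step 1 parameters (defaults are placeholders)."""
    b: float = math.e              # CFC vector calculation parameter b
    log_base: float = 10.0         # base of the log function
    initial_cluster_size: int = 5  # Im, used by the random partition
    restart_frequency: int = 10    # Fm, restart strategy (usage not shown in the text)
    restart_range: int = 3         # Rm, restart strategy (usage not shown in the text)


def random_partition(n_docs, cluster_size, seed=None):
    """Shuffle document indices and cut them into chunks of `cluster_size`.

    The last chunk is smaller when n_docs is not a multiple of cluster_size.
    """
    rng = random.Random(seed)
    indices = list(range(n_docs))
    rng.shuffle(indices)
    return [indices[i:i + cluster_size] for i in range(0, n_docs, cluster_size)]


params = ClusteringParams(initial_cluster_size=5)
partition = random_partition(10, params.initial_cluster_size, seed=0)
```

With 10 documents and Im = 5, the partition holds two clusters of five indices each, and every document index appears exactly once.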


Abstract

The invention relates to an adaptive text clustering algorithm based on the centroid method, which is an iterative partitioning clustering algorithm. Before iterating, the algorithm first initializes its parameters, then randomly partitions the data set into a group of clusters of the same size and calculates the CFC (Class-Feature-Centroid) vector of each cluster. It then iterates, and each iteration mainly comprises the following steps: reassigning each text according to its similarity to the CFC vectors of the different clusters, so as to obtain a new group of clusters; recalculating the CFC vector of each non-empty cluster once the texts have been reassigned; and checking whether the termination condition is met, terminating if so and otherwise continuing to iterate. The algorithm has the following advantages: (1) the method is simple and easy to implement; (2) the method is adaptive.
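The iteration described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation. The CFC weight used here, b^(DF/|C_j|) × log(|C|/CF), follows the usual Class-Feature-Centroid weighting and is assumed rather than taken from this text. The similarity measure (a sum of centroid weights over a text's terms), the stopping rule (assignments stop changing), and the `max_iter` safety cap are likewise hypothetical, as are all function names. The restart strategy is not modeled.

```python
import math
from collections import Counter


def cfc_vectors(docs, clusters, b=math.e, log_base=10.0):
    """Return one unit-length CFC vector (dict term -> weight) per non-empty cluster."""
    n_clusters = len(clusters)
    # df[j][t]: number of texts in cluster j that contain term t
    df = [Counter(t for i in members for t in set(docs[i])) for members in clusters]
    # cf[t]: number of clusters that contain term t
    cf = Counter(t for counts in df for t in counts)
    vectors = []
    for members, counts in zip(clusters, df):
        vec = {t: b ** (n / len(members)) * math.log(n_clusters / cf[t], log_base)
               for t, n in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else vec)
    return vectors


def adaptive_cluster(docs, initial_clusters, max_iter=50, b=math.e, log_base=10.0):
    """Reassign texts to the most similar CFC vector until assignments stop changing."""
    clusters = [list(c) for c in initial_clusters if c]
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        vectors = cfc_vectors(docs, clusters, b, log_base)
        new_clusters = [[] for _ in clusters]
        for i, doc in enumerate(docs):
            terms = set(doc)
            sims = [sum(v.get(t, 0.0) for t in terms) for v in vectors]
            best = max(range(len(sims)), key=sims.__getitem__)  # ties go to the first cluster
            new_clusters[best].append(i)
        # Only non-empty clusters survive, so the cluster count can shrink.
        new_clusters = [c for c in new_clusters if c]
        if new_clusters == clusters:
            break
        clusters = new_clusters
    return clusters


docs = [["apple", "banana"], ["apple", "cherry"], ["banana", "apple"],
        ["car", "engine"], ["engine", "wheel"], ["car", "wheel"]]
result = adaptive_cluster(docs, [[0, 3], [1, 4], [2, 5]])
```

Because clusters left empty by reassignment are dropped before the CFC vectors are recalculated, the final number of clusters can be smaller than the initial number, which is one way to read the adaptive behavior the abstract describes.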

Description

Technical field

[0001] The invention belongs to the field of information retrieval, and in particular relates to a text clustering algorithm that is based on a center method and adaptively determines the number of clusters.

Background technique

[0002] Text clustering is one of the main text data mining methods in machine learning and information retrieval, and one of the main ways of coping with the overload of text information on the Internet. Its purpose is to organize Internet text collections according to the principle that "like gathers with like", yielding a series of meaningful text subsets in which the texts within each subset are as similar as possible and the texts of different subsets are as different as possible. A good text clustering algorithm gathers texts of the same topic and type into a meaningful subset, which helps Internet users find the content that interests them more easily in massive amounts of text. Researching and using text clustering alg...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 欧阳继红, 周晓堂, 李熙铭, 马超, 王旭
Owner: JILIN UNIV