Full-coverage granular computing based K-medoids text clustering method

A text clustering technology, applied in the fields of full-coverage granular computing and text clustering, that addresses problems such as low clustering accuracy.

Status: Inactive
Publication Date: 2018-04-13
TAIYUAN UNIV OF TECH
8 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0004] To address the random selection of cluster centers in traditional clustering methods and the low accuracy of existing text clustering methods, the present invention provides a K-medoids text clustering method based on full-coverage granular computing.

Examples

Detailed Description of the Embodiments

[0056] To further explain the technical means adopted by the present invention to achieve its intended purpose, and the effects of those means, the specific implementation, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

[0057] As shown in Figure 1, the overall flow of the present invention is described in detail as follows:

[0058] Step 1: Segment the Chinese text with the jieba word segmenter. Consolidate several stop word lists, such as the "Harbin Institute of Technology Stop Words Thesaurus", the "Sichuan University Machine Learning Intelligence Laboratory Stop Words" list, and the Baidu Stop Words List, into a new Chinese stop word list.
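Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `merge_stop_words` and `segment_and_filter` are hypothetical, and the stop word lists are assumed to be already loaded as collections of strings. The only library call used is `jieba.lcut`, which returns the segmented words as a list.

```python
from typing import Callable, Iterable, List, Optional, Set


def merge_stop_words(*word_lists: Iterable[str]) -> Set[str]:
    """Merge several stop word lists into one deduplicated set (hypothetical helper)."""
    merged: Set[str] = set()
    for words in word_lists:
        # Strip surrounding whitespace and skip empty entries.
        merged.update(w.strip() for w in words if w.strip())
    return merged


def segment_and_filter(
    text: str,
    stop_words: Set[str],
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Segment text, then drop stop words and whitespace-only tokens."""
    if tokenizer is None:
        # jieba is a third-party package; it is imported lazily so that
        # another tokenizer can be supplied without installing it.
        import jieba
        tokenizer = jieba.lcut
    return [t for t in tokenizer(text) if t.strip() and t not in stop_words]
```

In practice each stop word list would be read from its own file before merging; the source does not give file names or formats, so loading is left out here.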

[0059] Step 2: Perform TF-IDF feature extraction on the word segmentation results from Step 1, after stop word removal. TF-IDF is a statistical weighting method; its formula is:

[0060]

[0061]

[0062]
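The formulas in paragraphs [0060] to [0062] are not available here. For orientation only, a common textbook form of TF-IDF is tf(t, d) = (count of t in d) / (number of terms in d), idf(t) = log(N / df(t)), and weight(t, d) = tf(t, d) × idf(t), where N is the number of documents and df(t) is the number of documents containing t. The patent's own variant may differ (for example in smoothing or normalization). The sketch below implements this textbook form; the `min_df` and `max_df` parameters are hypothetical stand-ins for the low-frequency and high-frequency word thresholds mentioned in the abstract, applied here to document frequency as an assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(
    docs: List[List[str]], min_df: int = 1, max_df: int = 10**9
) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Textbook TF-IDF, not necessarily the patent's formula.
    min_df / max_df are assumed document-frequency thresholds.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Keep only terms whose document frequency lies within the thresholds.
    vocab = {t for t, c in df.items() if min_df <= c <= max_df}
    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(t for t in doc if t in vocab)
        total = len(doc)  # term frequency is normalized by document length
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights
```

A term that occurs in every document receives weight 0 under this form, which is one reason smoothed variants are often used instead.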

...

Abstract

The invention discloses a K-medoids text clustering method based on full-coverage granular computing. The method comprises the steps of: 1) preprocessing the texts, including Chinese word segmentation and stop word removal; 2) extracting features from the texts: setting a high-frequency word threshold and a low-frequency word threshold, filtering out high-frequency words with insufficient discriminative power and low-frequency words with weak representativeness, and then building a word vector space model using the TF-IDF algorithm; and 3) clustering the texts: first performing coarse clustering with the single-pass algorithm and computing a candidate set of initial cluster centers using the concept of granularity importance from full-coverage granular computing theory, then computing the initial cluster centers based on density and the maximum-minimum distance algorithm, and finally clustering the texts with the K-medoids algorithm. The method solves the problems of increased iteration counts and relatively large fluctuation in clustering results in the traditional K-medoids algorithm, in which the initial cluster centers are selected randomly, and also solves the problem that, in existing improved K-medoids algorithms, initial cluster centers can fall within the same cluster.
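The last two stages of step 3 can be illustrated with a small sketch over a precomputed distance matrix. Several things here are assumptions rather than the patent's method: the candidate set (which the patent derives from single-pass clustering and granularity importance) is simply passed in as `candidates`; the density criterion is replaced by a simple stand-in that picks the candidate with the smallest total distance to the other candidates; and the K-medoids loop is the plain alternating assign-and-update variant. The names `max_min_init` and `k_medoids` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def max_min_init(dist: Matrix, candidates: Sequence[int], k: int) -> List[int]:
    """Pick k initial medoids from candidates by the maximum-minimum distance rule."""
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    # Assumed stand-in for a density score: smallest total distance to candidates.
    centers = [min(candidates, key=lambda i: sum(dist[i][j] for j in candidates))]
    while len(centers) < k:
        rest = [i for i in candidates if i not in centers]
        # Next center: the candidate whose nearest chosen center is farthest away.
        centers.append(max(rest, key=lambda i: min(dist[i][c] for c in centers)))
    return centers


def assign(dist: Matrix, medoids: Sequence[int]) -> List[int]:
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(
    dist: Matrix, medoids: Sequence[int], max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    current = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, current)
        updated = []
        for c, old in enumerate(current):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: member with the smallest total distance to its cluster;
            # an empty cluster keeps its previous medoid.
            updated.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
                if members
                else old
            )
        if updated == current:
            break
        current = updated
    return current, assign(dist, current)
```

For example, for six points on a line at 0, 1, 2, 10, 11, and 12 with absolute difference as the distance, seeding with `max_min_init` places the two initial medoids in different groups, and `k_medoids` then settles on the middle point of each group.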

Description

Technical Field

[0001] The invention relates to full-coverage granular computing and text mining technology, and in particular to granulation in full-coverage granular computing and a method of text clustering.

Background Art

[0002] The rapid development of the Internet has brought problems such as information overload and lack of structure, which make it difficult for people to quickly and accurately find interesting and potentially useful content in the massive amount of available information, and processing this information manually is infeasible. At present, the vast majority of online information is in the form of text. As unstructured data, text is not as easy to process as structured data, so its utilization rate is greatly reduced, and most traditional information retrieval technologies cannot handle massive text data. Data mining is an effective technology for mining hidden information from a large amount of valid data. Text mining is the p...

Application Information

IPC(8): G06F17/27, G06K9/62
CPC: G06F40/289, G06F18/23213
Inventors: 谢珺, 邹雪君, 杨云云, 续欣莹
Owner: TAIYUAN UNIV OF TECH