Full-coverage granular computing based K-medoids text clustering method

A text clustering and full-coverage technology, applied in the fields of full-coverage granular computing and text clustering, that addresses problems such as low clustering accuracy.

Status: Inactive; publication date: 2018-04-13

TAIYUAN UNIV OF TECH

Citations: cites 8, cited by 7


AI Technical Summary

Problems solved by technology

[0004] To solve the problems that traditional clustering methods select cluster centers randomly and that text clustering accuracy is low, the present invention provides a K-medoids text clustering method based on full-coverage granular computing. The method comprises the following steps:


Embodiment Construction

[0056] To further explain the technical means the present invention adopts to achieve its intended purpose, and the effects of those means, the specific implementation, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

[0057] As shown in Figure 1, the overall flow of the present invention is as follows:

[0058] Step 1: Segment the Chinese text with the jieba word segmenter. Merge several stop-word lists, such as the "Harbin Institute of Technology Stop Words Thesaurus", the "Sichuan University Machine Learning Intelligence Laboratory Stop Words", and the Baidu stop-word list, into a new consolidated Chinese stop-word list, and remove the stop words from the segmentation results.
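The stop-word part of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the text has already been segmented (for example with jieba's `jieba.lcut(text)`, which returns a list of tokens), and the two stop-word lists and the token list are tiny hypothetical stand-ins for the real HIT, SCU, and Baidu files and real segmenter output.

```python
from __future__ import annotations

from typing import Iterable


def merge_stopword_lists(*lists: Iterable[str]) -> set[str]:
    """Union several stop-word lists into one de-duplicated set."""
    merged: set[str] = set()
    for words in lists:
        # Strip surrounding whitespace and skip blank entries.
        merged.update(w.strip() for w in words if w.strip())
    return merged


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Drop stop words and whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stopwords]


# Hypothetical miniature lists; real lists would be loaded from the published files.
hit_list = ["的", "了", "是"]
baidu_list = ["的", "在"]
stopwords = merge_stopword_lists(hit_list, baidu_list)

# Tokens as a segmenter might return them (illustrative, not real jieba output).
tokens = ["文本", "的", "聚类", " ", "是", "一种", "方法"]
print(remove_stopwords(tokens, stopwords))  # → ['文本', '聚类', '一种', '方法']
```

Merging into a set makes overlapping entries across lists (here "的") harmless and keeps membership tests constant-time.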

[0059] Step 2: Perform TF-IDF feature extraction on the segmentation results from Step 1 after stop-word removal. TF-IDF is a statistical weighting method; its formula is as follows:

[0060]

[0061]

[0062]
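The patent's own TF-IDF formula in [0060] to [0062] is not reproduced in this excerpt, so the sketch below uses one common textbook form, tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)), which may differ from the patent's exact definition. It also applies the high- and low-frequency cut-offs that the abstract describes; interpreting "frequency" as document frequency, and the parameter names `min_df` and `max_df`, are assumptions for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_vectors(
    docs: list[list[str]], min_df: int, max_df: int
) -> tuple[list[str], list[list[float]]]:
    """Build TF-IDF vectors for tokenized documents.

    Terms occurring in fewer than `min_df` documents (weakly representative)
    or in more than `max_df` documents (poorly discriminative) are dropped.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, c in df.items() if min_df <= c <= max_df)

    vectors: list[list[float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)  # term frequency is normalized by document length
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


docs = [["a", "b", "a"], ["a", "c"], ["a", "b", "d"]]
vocab, vectors = tfidf_vectors(docs, min_df=2, max_df=2)
print(vocab)  # → ['b']  ("a" occurs in every document; "c" and "d" in only one)
```

Filtering before weighting keeps the vector space small; a term that occurs in every document would get idf = 0 under this form anyway, so the upper cut-off mainly saves space and noise.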

...


Abstract

The invention discloses a K-medoids text clustering method based on full-coverage granular computing. The method comprises the following steps: 1) preprocessing the texts, including Chinese word segmentation and stop-word removal; 2) extracting features from the texts: setting a high-frequency word threshold and a low-frequency word threshold, filtering out high-frequency words with insufficient discriminative power and low-frequency words with weak representativeness, and then building a word vector space model with the TF-IDF algorithm; and 3) clustering the texts: first performing coarse clustering with the single-pass algorithm and computing a candidate set of initial cluster centers using the concept of granularity importance from full-coverage granular computing theory, then determining the initial cluster centers based on density and the max-min distance algorithm, and finally clustering the texts with the K-medoids algorithm. The method solves the problems of the traditional K-medoids algorithm, which selects its initial cluster centers randomly and therefore needs more iterations and produces clustering results that fluctuate considerably. It also solves the problem in existing improved K-medoids algorithms that several initial cluster centers may fall within the same cluster.
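The clustering stage of step 3 can be sketched as follows, under several loud assumptions. Euclidean distance stands in for whatever measure the patent uses; the granularity-importance computation from full-coverage granular computing is not defined in this excerpt, so the hypothetical `coarse_candidates` simply takes the densest member of each single-pass cluster as a placeholder; and `k_medoids` uses the simple alternating (Voronoi-iteration) update rather than the PAM swap search. The threshold, radius, and all function names are illustrative, not values or names from the patent.

```python
from __future__ import annotations

import math

Vector = list[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_pass(points: list[Vector], threshold: float) -> list[list[int]]:
    """Coarse clustering: join the nearest cluster if its centroid lies within
    `threshold`, otherwise open a new cluster. Returns lists of point indices."""
    clusters: list[list[int]] = []
    centroids: list[Vector] = []
    for i, p in enumerate(points):
        dists = [dist(p, c) for c in centroids]
        if dists and min(dists) <= threshold:
            best = dists.index(min(dists))
            clusters[best].append(i)
            members = [points[j] for j in clusters[best]]
            centroids[best] = [sum(col) / len(members) for col in zip(*members)]
        else:
            clusters.append([i])
            centroids.append(list(p))
    return clusters


def density(points: list[Vector], radius: float) -> list[int]:
    """Number of other points within `radius` of each point."""
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
            for i, p in enumerate(points)]


def coarse_candidates(clusters: list[list[int]], dens: list[int]) -> list[int]:
    """HYPOTHETICAL placeholder for the granularity-importance candidate set:
    the densest member of each coarse cluster."""
    return [max(cluster, key=lambda i: dens[i]) for cluster in clusters]


def max_min_init(points: list[Vector], candidates: list[int],
                 dens: list[int], k: int) -> list[int]:
    """Densest candidate first, then repeatedly the candidate whose distance
    to its nearest already-chosen center is largest (max-min distance rule)."""
    if k > len(candidates):
        raise ValueError("need at least k candidates")
    centers = [max(candidates, key=lambda i: dens[i])]
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        centers.append(max(remaining, key=lambda i: min(
            dist(points[i], points[c]) for c in centers)))
    return centers


def assign(points: list[Vector], medoids: list[int]) -> list[int]:
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda m: dist(p, points[medoids[m]]))
            for p in points]


def k_medoids(points: list[Vector], medoids: list[int],
              max_iter: int = 100) -> tuple[list[int], list[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for m, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # keep a medoid whose cluster emptied out
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(members, key=lambda c: sum(
                dist(points[c], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
coarse = single_pass(points, threshold=3.0)
dens = density(points, radius=1.5)
init = max_min_init(points, coarse_candidates(coarse, dens), dens, k=2)
medoids, labels = k_medoids(points, init)
print(medoids, labels)  # → [0, 3] [0, 0, 0, 1, 1, 1]
```

Because the initial medoids are chosen deterministically and spread apart by the max-min rule, repeated runs give the same result; this only illustrates the idea the abstract describes and makes no claim about the patent's measured iteration counts or accuracy.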

Description

Technical Field

[0001] The invention relates to full-coverage granular computing and text mining, in particular to granulation based on full-coverage granular computing and to text clustering methods.

Background

[0002] The rapid development of the Internet has brought problems such as information overload and lack of structure, which make it difficult for people to obtain interesting and potentially useful content quickly and accurately from massive amounts of information, and make it impossible to process this information manually. At present, the vast majority of online information is text. As unstructured data, text is not as easy to process as structured data, which greatly reduces its utilization, and most traditional information retrieval techniques cannot handle massive text data. Data mining is an effective technology for extracting hidden information from large amounts of valid data. Text mining is the p...


Application Information


IPC(8): G06F17/27; G06K9/62

CPC: G06F40/289; G06F18/23213

Inventors: 谢珺 (Xie Jun), 邹雪君 (Zou Xuejun), 杨云云 (Yang Yunyun), 续欣莹 (Xu Xinying)

Owner: TAIYUAN UNIV OF TECH
