Big data text clustering method and system based on parallel improved K-means algorithm

A K-means algorithm and text clustering technology, applied in the field of text clustering. It addresses the low accuracy and efficiency that result when the K-means algorithm has no optimization or only local optimization, and it improves both the accuracy and the efficiency of clustering.

Status: Inactive
Publication Date: 2020-05-15
INNER MONGOLIA UNIV OF TECH
Cites: 2; Cited by: 3

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a big data text clustering method and system based on a parallel improved K-means algorithm, to solve the problem in the prior art that the K-means algorithm has no optimization or only local optimization processing, which leads to low clustering accuracy and efficiency.

Examples

Embodiment 1

[0056] Embodiment 1: As shown in Figure 1, the big data text clustering method based on the parallel improved K-means algorithm includes:

[0057] S101: Perform unstructured text data preprocessing on the big data text in the text storage system;

[0058] S102: Calculate the text feature word weights of the preprocessed big data text using the word2Vec feature word weight algorithm, which is based on trained word vectors;

[0059] S103: Cluster the low-dimensional big data text data using the SWCK-means text clustering algorithm, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm.
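Steps S101 and S102 can be illustrated with a minimal Python sketch. The source does not specify the preprocessing rules or the exact weighting formula, so everything here is an assumption made for illustration: the stop-word list, the regular-expression tokenizer, the hypothetical helper names `preprocess` and `document_vector`, the term-frequency weighting, and the toy two-dimensional vectors that stand in for trained word2vec output.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def preprocess(text):
    """S101 (assumed form): lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def document_vector(tokens, word_vectors):
    """S102 (assumed form): term-frequency-weighted mean of word vectors.

    Returns None when no token has a known vector.
    """
    counts = Counter(t for t in tokens if t in word_vectors)
    total = sum(counts.values())
    if total == 0:
        return None
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, count in counts.items():
        weight = count / total  # assumed weighting: relative term frequency
        for i, x in enumerate(word_vectors[word]):
            vec[i] += weight * x
    return vec

# Toy 2-dimensional vectors standing in for trained word2vec output.
TOY_VECTORS = {"cluster": [1.0, 0.0], "text": [0.0, 1.0]}

tokens = preprocess("The text of the cluster and the text")
print(tokens)
print(document_vector(tokens, TOY_VECTORS))
```

Each document becomes one low-dimensional vector, which is the kind of input a distance-based clustering step such as S103 operates on.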

[0060] The SWCK-means text clustering processing, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm, includes:

[0061] Performing parallel Canopy clustering on the big data text carrying text feature word weights to obtain the cluster center points, using the cluster center poin...
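The general idea of seeding K-means with Canopy centers can be sketched as follows. This is not the patented SWCK-means procedure, whose details are truncated above. It is a sequential, single-machine illustration under stated assumptions: Euclidean distance, a single tight threshold `t2` (the loose Canopy threshold is omitted because only the centers are needed here), and the hypothetical function names `canopy_centers` and `kmeans`. The parallel execution described in the source is not reproduced.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t2):
    """Pick canopy centers: take the next remaining point as a center,
    then drop every remaining point within t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if distance(p, center) > t2]
    return centers

def kmeans(points, initial_centers, max_iter=100):
    """Standard K-means (Lloyd's iteration) starting from given centers."""
    centers = [list(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(len(old))]
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(points, t2=3.0)
centers, clusters = kmeans(points, seeds)
print(len(seeds), centers)
```

Using the Canopy pass to choose both the number of clusters and their starting positions removes the need to guess k or to pick random initial centers, which is the usual motivation for combining the two algorithms.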

Abstract

The invention belongs to the technical field of text clustering, and in particular relates to a big data text clustering method and system based on a parallel improved K-means algorithm. In the method, low-dimensional big data text data is clustered by the SWCK-means text clustering algorithm, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm. The invention solves the problem in the prior art that the K-means algorithm has no optimization or only local optimization processing. Its beneficial technical effects are that the clustering accuracy and efficiency of the K-means algorithm are improved, the dimensionality of the text is reduced, the clustering effect is improved, and a parallel design is realized.

Description

Technical field

[0001] The invention belongs to the technical field of text clustering, and in particular relates to a big data text clustering method and system based on a parallel improved K-means algorithm.

Background technique

[0002] In recent years, with the rapid increase of Internet information, a large amount of network text data has been generated. Text data is a kind of unstructured data, characterized by high dimensionality, large data volume, and low value density. How to effectively process massive network text information and mine its value has become one of the research hotspots in Chinese information processing today. Classifying large quantities of text is one of the important research fields. Currently, clustering can be applied in large-scale text information mining and processing on the Internet. In the preprocessing stage, text semantic analysis, document similarity analysis, corpus classification analysis and topic analys...

Application Information

IPC(8): G06F16/35
CPC: G06F16/35
Inventors: 李雷孝, 周成栋, 王慧, 马志强, 王永生
Owner: INNER MONGOLIA UNIV OF TECH