Big data text clustering method and system based on parallel improved K-means algorithm

A K-means algorithm and text clustering technology, applied in the field of text clustering, that addresses the low accuracy and efficiency caused by the K-means algorithm having no optimization or only partial optimization, and thereby improves clustering accuracy and efficiency.

Inactive Publication Date: 2020-05-15

INNER MONGOLIA UNIV OF TECH

2 Cites, 3 Cited by

Problems solved by technology

[0005] The present invention provides a big data text clustering method and system based on a parallel improved K-means algorithm, to solve the problem in the prior art that the K-means algorithm has no optimization or only local optimization processing, which leads to low clustering accuracy and efficiency.

Examples

Embodiment 1

[0056] Embodiment 1: As shown in Figure 1, the big data text clustering method based on the parallel improved K-means algorithm includes:

[0057] S101: Preprocess the unstructured big data text held in the text storage system;

[0058] S102: Compute the text feature word weights of the preprocessed big data text using the Word2Vec feature word weight algorithm, which is based on training word vectors;

[0059] S103: Cluster the low-dimensional big data text data using the SWCK-means text clustering algorithm, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm.
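Step S103 can be illustrated with a minimal sketch. It is not the patent's SWCK-means implementation: it assumes that the documents have already been turned into low-dimensional numeric vectors by S101 and S102, that the Canopy centers are used as the initial K-means centers (so that the number of canopies fixes K), and that Euclidean distance is the distance measure. The function names and the thresholds `t1` and `t2` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Return a list of (center, members) canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (points this close to a center cannot seed another canopy).
    """
    if not t1 > t2 >= 0:
        raise ValueError("thresholds must satisfy t1 > t2 >= 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in remaining if euclidean(p, center) <= t1]
        canopies.append((center, members))
        # The center itself has distance 0, so the list shrinks every pass.
        remaining = [p for p in remaining if euclidean(p, center) > t2]
    return canopies


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iter=100):
    """Distance-based K-means started from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Toy 2-D vectors standing in for low-dimensional text features (assumption).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
seeds = [center for center, _ in canopy(points, t1=3.0, t2=1.5)]
centers, clusters = kmeans(points, seeds)
print(len(seeds), [len(c) for c in clusters])
```

In this sketch the Canopy pass replaces a hand-picked K and random initial centers with data-driven ones, which is one plausible reading of how a center point selection step would be combined with K-means.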

[0060] The SWCK-means text clustering processing, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm, includes:

[0061] Performing parallel Canopy clustering on the big data text with text feature word weights to obtain the cluster center points, using the cluster center poin...
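The visible text does not say how the Canopy step is parallelized, so the following is only a sketch of one common partition-and-merge scheme, stated here as an assumption: each data partition selects local Canopy centers independently, and a final Canopy pass over the collected local centers yields the global ones. A thread pool stands in for whatever parallel framework the system actually uses, and the names `local_canopy_centers`, `parallel_canopy_centers`, and `n_partitions` are hypothetical. Only the tight threshold `t2` appears because the sketch selects centers and does not build canopy memberships.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_canopy_centers(points, t2):
    """Select Canopy centers: any point within t2 of a center is dropped."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if euclidean(p, center) > t2]
    return centers


def parallel_canopy_centers(points, t2, n_partitions=4):
    """Partition the data, pick local centers in parallel, then merge."""
    if not points:
        return []
    # "Map" phase: each non-empty partition is processed independently.
    partitions = [points[i::n_partitions] for i in range(n_partitions)]
    partitions = [part for part in partitions if part]
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local = list(pool.map(lambda part: local_canopy_centers(part, t2), partitions))
    # "Reduce" phase: a second Canopy pass removes near-duplicate local centers.
    merged = [c for centers in local for c in centers]
    return local_canopy_centers(merged, t2)


# Toy 2-D vectors standing in for weighted text features (assumption).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
global_centers = parallel_canopy_centers(points, t2=1.5, n_partitions=2)
print(global_centers)
```

The merge pass matters because two partitions can each pick a center from the same dense region; without it, the number of centers would grow with the number of partitions rather than with the structure of the data.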

Abstract

The invention belongs to the technical field of text clustering and relates in particular to a big data text clustering method and system based on a parallel improved K-means algorithm. In the method, low-dimensional big data text data is clustered by the SWCK-means text clustering algorithm, which combines the Canopy center point selection algorithm with the distance-based K-means clustering algorithm. The invention solves the prior-art problem that the K-means algorithm has no optimization or only local optimization processing. Its beneficial technical effects are that the clustering accuracy and efficiency of the K-means algorithm are improved, the dimensionality of the text is reduced, the clustering effect is improved, and a parallel design is realized.

Description

Technical field

[0001] The invention belongs to the technical field of text clustering, and in particular relates to a big data text clustering method and system based on a parallel improved K-means algorithm.

Background

[0002] In recent years, with the rapid growth of Internet information, a large amount of network text data has been generated. Text data is unstructured data characterized by high dimensionality, large data volume, and low value density. How to process massive network text information effectively and mine its value has become one of the research hotspots in Chinese information processing today. Classifying large quantities of text is one of the important research fields. Currently, clustering can be applied in large-scale text information mining and processing on the Internet. In the preprocessing stage, text semantic analysis, document similarity analysis, corpus classification analysis and topic analys...

Application Information

IPC(8): G06F16/35

CPC: G06F16/35

Inventor: 李雷孝, 周成栋, 王慧, 马志强, 王永生

Owner: INNER MONGOLIA UNIV OF TECH
