Clustering method and system based on big data parallel computation

A parallel computing and clustering technology, applied in computing, relational databases, database models, and related areas. It addresses problems such as unstable initial-point selection, heavy computational load, and a tendency to fall into locally optimal solutions.

Status: Inactive. Publication date: 2017-12-08

GUANGZHOU TEDAO INFORMATION TECH CO LTD

Cites: 5; Cited by: 14


AI Technical Summary

Problems solved by technology

[0002] Existing article clustering technologies such as k-means, hierarchical clustering, SOM, and FCM all classify and integrate articles based on word frequency and probability, and they suffer from uncontrollable errors. The initial points of the k-means algorithm are selected at random, so the selection is unstable and the clustering results are unstable as well. Hierarchical clustering does not require the number of categories to be fixed in advance, but once a split or merge has been performed it cannot be corrected, which limits clustering quality. The FCM clustering algorithm is sensitive to the initial cluster centers, requires the number of clusters to be set manually, and tends to fall into locally optimal solutions. The SOM clustering algorithm has a strong theoretical connection with actual brain processing, but its processing time is long, and further research is needed to adapt it to large databases.
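The claim about k-means can be made concrete with a small experiment. The sketch below is plain Lloyd's k-means on one-dimensional data written for illustration only; the data points and starting centers are made up. It shows that the same data converges to very different results depending on where the centers start, which is exactly what random initial-point selection exposes.

```python
import random

def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's k-means on 1-D data, starting from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean; an empty cluster keeps its center.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    # SSE of each point against its nearest final center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse

if __name__ == "__main__":
    points = [0, 1, 10, 11, 20, 21]
    print(kmeans_1d(points, [0, 10, 20]))  # a good start
    print(kmeans_1d(points, [0, 1, 15]))   # a poor start that gets stuck
    # Random initialisation, as in plain k-means: the result depends on the seed.
    for seed in range(3):
        init = random.Random(seed).sample(points, 3)
        print(seed, init, kmeans_1d(points, init))
```

Starting from [0, 10, 20], the algorithm reaches centers 0.5, 10.5, 20.5 with SSE 1.5. Starting from [0, 1, 15], it stops at centers 0, 1, 15.5 with SSE 101: two centers split one natural group while a single center covers two groups, and no further Lloyd iteration can escape this local optimum.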

Moreover, existing article clustering technologies are not parallel-computing versions. They obtain weights from a general probability model, which gives a relatively high error, and they perform association on global feature vectors, which requires a huge amount of computation.
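The abstract mentions a "text fingerprint" and simple dimension reduction but gives no details. One well-known way to avoid comparing full global feature vectors is SimHash (Charikar), sketched below as an assumption, not as the patented method: each document's tokens are folded into a fixed 64-bit fingerprint, so comparing two documents costs one XOR and a bit count regardless of vocabulary size. The MD5-based token hash and whitespace tokenisation are illustrative choices.

```python
import hashlib

def token_hash(token: str, bits: int = 64) -> int:
    # Stable per-token hash (first 8 bytes of MD5); Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:bits // 8], "big")

def simhash(text: str, bits: int = 64) -> int:
    """Reduce a document's tokens to a fixed-width fingerprint (SimHash sketch)."""
    weights = [0] * bits
    for token in text.lower().split():
        h = token_hash(token, bits)
        # Each token votes +1 or -1 on every bit position; repeated tokens vote repeatedly.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a: int, b: int) -> int:
    # Number of differing bits: small distances suggest similar token content.
    return bin(a ^ b).count("1")
```

Because the fingerprint depends only on the multiset of tokens, word order and letter case do not change it; similar documents tend to produce fingerprints with a small Hamming distance.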

Method used



Embodiment Construction

[0054] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0055] Referring to Figure 1, the present invention provides a clustering method based on big data parallel computing, comprising the following steps:

[0056] S100: receive the data to be aggregated, which is collected in parallel by multiple threads of the large cluster.

[0057] In this embodiment, referring to Figure 2, the first data collection end, the second data collection end and the third data collection end carry out t...
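Step S100 specifies only that the data is collected in parallel by multiple threads, and [0057] mentions three data collection ends. A minimal sketch of that receiving step, assuming one thread per collection end and a thread-safe queue as the hand-off point (the names `collection_end` and `receive_in_parallel` are hypothetical):

```python
import queue
import threading

def collection_end(name, records, sink):
    # Hypothetical data collection end: pushes every record it gathers onto the shared sink.
    for record in records:
        sink.put((name, record))

def receive_in_parallel(sources):
    """S100 sketch: run one collector thread per source and gather everything they produce."""
    sink = queue.Queue()  # thread-safe FIFO shared by all collectors
    threads = [threading.Thread(target=collection_end, args=(name, records, sink))
               for name, records in sources.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # after all joins, no producer can add more items
    received = []
    while not sink.empty():
        received.append(sink.get())
    return received
```

For example, `receive_in_parallel({"end1": ["a", "b"], "end2": ["c"], "end3": ["d", "e", "f"]})` returns all six records tagged with their source; the interleaving between sources depends on thread scheduling.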



Abstract

The invention discloses a clustering method based on big data parallel computation. The method comprises the following steps: receiving data to be aggregated that has been collected in parallel by multiple threads of a large cluster; saving the data to be aggregated in a first database; extracting the data features of the data to be aggregated, having multiple threads call clustering models in parallel at the same time, independently calculating and analyzing in a distributed manner the class to which the data belongs, and aggregating data of the same class; saving the same-class aggregated data in a second database; and storing the same-class aggregated data in memory and establishing a cluster data index. The invention also discloses a clustering system based on big data parallel computation. With the invention, text fingerprints are located accurately, dimension reduction is simple, and the accuracy of clustering topics is improved.
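The abstract's pipeline can be sketched end to end under heavy assumptions. In the sketch below, plain dictionaries stand in for the first and second databases, a hypothetical keyword-set `MODEL` with Jaccard scoring stands in for the unspecified clustering model, and a `ThreadPoolExecutor` plays the role of the multiple threads. Because each `assign_class` call depends only on its own input, the calls can run independently and in parallel, which is the property the abstract relies on.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clustering model: each class is described by a keyword set.
MODEL = {
    "finance": {"stock", "market", "shares", "bank"},
    "sports": {"football", "match", "goal", "team"},
}

def extract_features(text):
    # Stand-in for "extracting the data features": the set of lowercase tokens.
    return set(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_class(features):
    # Independent per-document call; ties go to the alphabetically first class.
    scores = {label: jaccard(features, keywords) for label, keywords in MODEL.items()}
    best = max(sorted(scores), key=scores.get)
    return best if scores[best] > 0 else "unclustered"

def cluster_pipeline(docs, workers=4):
    first_db = dict(docs)  # stand-in for the first database: raw data keyed by id
    ids = list(first_db)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        features = list(pool.map(extract_features, (first_db[i] for i in ids)))
        labels = list(pool.map(assign_class, features))
    # Same-class aggregation, saved to a stand-in for the second database.
    second_db = defaultdict(list)
    for doc_id, label in zip(ids, labels):
        second_db[label].append(doc_id)
    # In-memory cluster data index: doc id -> class, for fast lookup.
    index = {doc_id: label for label, members in second_db.items() for doc_id in members}
    return dict(second_db), index
```

In CPython, threads do not speed up CPU-bound scoring because of the global interpreter lock; a deployment on a large cluster would more likely distribute this work across processes or machines, with the same independent per-document structure.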

Description

Technical Field

[0001] The invention relates to the field of text mining and automatic clustering, and in particular to a clustering method and system based on big data parallel computing.

Background

[0002] Existing article clustering technologies such as k-means, hierarchical clustering, SOM, and FCM all classify and integrate articles based on word frequency and probability, and they suffer from uncontrollable errors. The initial points of the k-means algorithm are selected at random, so the selection is unstable and the clustering results are unstable as well. Hierarchical clustering does not require the number of categories to be fixed in advance, but once a split or merge has been performed it cannot be corrected, which limits clustering quality. The FCM clustering algorithm is sensitive to the initial cluster centers, requires the number of clusters to be set manually, and tends to fall into locally optimal solutions. The SOM clustering al...

Claims


Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06F17/30

CPC: G06F16/27; G06F16/2272; G06F16/285

Inventors: 晋彤 (Jin Tong), 李永康 (Li Yongkang)

Owner: GUANGZHOU TEDAO INFORMATION TECH CO LTD
