Method for achieving clustering mining by employing parallel weighted affinity propagation big data

An affinity propagation cluster mining technique, applied in the field of big data processing, that addresses the problems of insufficient data comprehensiveness and insufficient processing time, with the effect of guaranteeing data...

Status: Inactive; Publication Date: 2017-04-19

INSPUR GROUP CO LTD

Problems solved by technology

This approach breaks down in the face of big data. It may be possible to mine part of the data within a limited time, but the result then lacks comprehensiveness. The problem...

Examples

Embodiment 1

[0026] As shown in Figure 1, the method of this embodiment for cluster mining of big data using parallelized weighted affinity propagation includes the following steps:

[0027] 1. Build a Hadoop cluster platform.

[0028] 2. Divide the large data set into K subsets, and assign the K subsets to data nodes with similar performance in the Hadoop cluster.

[0029] 3. Use the AP (Affinity Propagation) algorithm to cluster the subsets. Because the decomposed subsets are relatively small, the center point set Ei = {ei, ni} of the classes in each subset can be obtained quickly. Each Map task is responsible for the AP clustering of one subset, and the clustering results are stored on the local disk. This yields K center point sets for subsequent processing.

[0030] 4. Use the WAP (weighted affinity propagation clustering) algorithm to perform weighted clustering on these center point sets. At this time, the number of data items in each class, that is, the number ni of points represented by the...
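
The AP and WAP steps above can be sketched as follows. This is a minimal pure-Python illustration, not the patent's implementation. All function names are hypothetical, and similarity is assumed to be negative squared Euclidean distance. Because the text is cut off before it explains how ni is used, the weighting rule shown here (off-diagonal similarities scaled by ni, plus an optional bonus of (ni - 1) * eps on the self-similarity) is borrowed from one published formulation of weighted AP and should be read as an assumption.

```python
import math

def neg_sq_dist(p, q):
    """Negative squared Euclidean distance, the usual AP similarity."""
    return -sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_similarity(centers, counts, eps=0.0):
    """Similarity matrix for centers e_i that each stand for n_i points.

    Assumed WAP rule (not confirmed by the source text):
      s'(i, j) = n_i * s(i, j)                 for i != j
      s'(i, i) = preference + (n_i - 1) * eps
    The shared preference is the median off-diagonal similarity.
    """
    n = len(centers)
    S = [[counts[i] * neg_sq_dist(centers[i], centers[j]) for j in range(n)]
         for i in range(n)]
    off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    preference = off_diag[len(off_diag) // 2] if off_diag else 0.0
    for i in range(n):
        S[i][i] = preference + (counts[i] - 1) * eps
    return S

def affinity_propagation(S, damping=0.7, iterations=200):
    """Plain AP message passing; returns the exemplar index chosen for each item."""
    n = len(S)
    if n <= 1:
        return list(range(n))
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                target = S[i][k] - (runner_up if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * target
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    target = others
                else:
                    target = min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # messages did not settle: fall back to the best candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    chosen = set(exemplars)
    return [i if i in chosen else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

# Hypothetical 1-D centers and the number of points each one represents.
centers = [(0.0,), (0.2,), (10.0,), (10.3,)]
counts = [5, 1, 1, 7]
labels = affinity_propagation(weighted_similarity(centers, counts))
print(labels)  # labels[i] is the index of the exemplar chosen for center i
```

Passing a count of 1 for every point turns weighted_similarity into the ordinary AP similarity matrix, so the same two functions can serve both the per-subset step and the weighted merge in this sketch.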

Abstract

The invention relates to the technical field of big data processing, in particular to a method for cluster mining of big data using parallel weighted affinity propagation. The method first decomposes a large original data set and distributes the resulting subsets to nodes with similar performance on a big data platform. It then runs the affinity propagation clustering algorithm on each of the small subsets, and uses the weighted affinity propagation clustering algorithm to further integrate the center representative points obtained, yielding a final representative set of data points. The method provided by the invention enables fast and accurate cluster mining of big data.
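
The decompose, cluster, and gather flow described in the abstract can be sketched as below. This is an illustration under stated assumptions, not the patented implementation. A thread pool stands in for Hadoop Map tasks, all function names are hypothetical, and the subsets are equal-sized contiguous slices, which is reasonable only if the nodes really do perform alike. The function medoid_stub is a deliberately simple placeholder for the per-subset clustering step and is not affinity propagation.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_subsets(data, k):
    """Split data into k contiguous subsets whose sizes differ by at most one.

    Assumes 1 <= k <= len(data), so that no subset is empty.
    """
    base, extra = divmod(len(data), k)
    subsets, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        subsets.append(data[start:start + size])
        start += size
    return subsets

def medoid_stub(subset):
    """Placeholder for per-subset clustering (NOT affinity propagation).

    Returns one (center, count) pair: the point with the smallest total
    squared distance to the rest of the subset, and the subset size.
    """
    center = min(subset, key=lambda p: sum((p - q) ** 2 for q in subset))
    return [(center, len(subset))]

def map_phase(subsets, cluster_fn, workers=4):
    """Run cluster_fn on each subset in parallel and gather all (center, count) pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cluster_fn, subsets))
    return [pair for part in results for pair in part]

# Hypothetical 1-D data set split across three workers.
data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 9.1, 9.2]
subsets = split_into_subsets(data, 3)
centers = map_phase(subsets, medoid_stub)
print(centers)  # (center, count) pairs to pass to the weighted merge step
```

Keeping the count next to each center is what lets a later weighted step account for how many original points every representative stands for. The counts always sum to the size of the original data set.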

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a method for cluster mining of big data using parallelized weighted affinity propagation.

Background

[0002] Data mining is the process of discovering information and knowledge in large-scale, incomplete, noisy, fuzzy, and random data sets. Its tasks include association analysis, cluster analysis, classification, prediction, and deviation analysis. Among these, clustering is an unsupervised learning process that divides a data set into several classes according to similarity, so that data in different classes differ from each other. Cluster analysis can establish a macroscopic view of the data: after clustering, the distribution pattern of the data can be presented intuitively, and correlations between data attributes can be found from the data classes.

[0003] In traditional commercial data mining, people u...


Application Information

IPC(8): G06F17/30, G06K9/62

CPC: G06F16/182, G06F16/27, G06F2216/03, G06F18/23213

Inventors: 王俊杰, 戴鸿君, 于治楼

Owner: INSPUR GROUP CO LTD
