Method for achieving clustering mining by employing parallel weighted affinity propagation big data

An affinity propagation clustering and mining technique, applied in the field of big data processing, that addresses the problems of incomplete data coverage and insufficient processing time, and achieves the effect of guaranteeing data...

Inactive Publication Date: 2017-04-19
INSPUR GROUP CO LTD

AI Technical Summary

Problems solved by technology

This approach breaks down in the face of big data. It may be possible to mine part of the data within a limited time, but the results then suffer from incomplete data coverage. The problem...


Examples

Embodiment 1

[0026] As shown in Figure 1, the method of this embodiment for cluster mining of big data using parallelized weighted affinity propagation includes the following steps:

[0027] 1. Build Hadoop cluster platform.

[0028] 2. Divide the large data set into K subsets, and assign the K subsets to data nodes with similar performance in Hadoop.

[0029] 3. Cluster each subset with the AP (Affinity Propagation) algorithm. Because the decomposed subsets are relatively small, the center point set Ei={ei, ni} of the classes can be obtained quickly. Each Map task performs AP clustering on one subset and stores the clustering result on local disk. This yields K center point sets for subsequent processing.

[0030] 4. Use the WAP (Weighted Affinity Propagation) clustering algorithm to perform weighted clustering on these center point sets. At this point, the number of data items in each class, that is, the number ni of points represented by the...
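
The steps above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the patented implementation: there is no Hadoop here, and each "Map task" is simply a function call. The function names (sqdist, affinity_propagation, parallel_wap), the damping factor, the iteration count, the median preference, and the negative squared Euclidean similarity are all assumptions. In particular, the weighting rule (scaling the similarity of point i by its count ni) is only one plausible scheme, since the source text is truncated before it states the exact WAP formula.

```python
from itertools import chain

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def affinity_propagation(points, weights=None, damping=0.5, iterations=100):
    """Return a center point set [(e_i, n_i), ...] for the given points.

    weights[i] is the number of original items that point i stands for.
    ASSUMPTION: weighting multiplies s(i, k) by weights[i]; the patent's
    exact WAP rule is not visible in the source.
    """
    n = len(points)
    weights = list(weights) if weights is not None else [1] * n
    if n <= 1:
        return list(zip(points, weights))
    # Similarity: negative squared distance, scaled by the weight of point i.
    s = [[-weights[i] * sqdist(points[i], points[k]) for k in range(n)]
         for i in range(n)]
    # Preference (self-similarity): median off-diagonal similarity (assumption).
    off_diag = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    preference = off_diag[len(off_diag) // 2]
    for i in range(n):
        s[i][i] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):  # responsibility update
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            runner_up = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                target = s[i][k] - (runner_up if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * target
        for k in range(n):  # availability update
            pos = [max(0.0, r[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]  # positive responsibilities from i' != k
            for i in range(n):
                if i == k:
                    target = others
                else:
                    target = min(0.0, r[k][k] + others - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fallback if no point qualified as an exemplar
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Assign every point to its most similar exemplar and total the weights.
    counts = dict.fromkeys(exemplars, 0)
    for i in range(n):
        owner = i if i in counts else max(exemplars, key=lambda k: s[i][k])
        counts[owner] += weights[i]
    return [(points[k], counts[k]) for k in exemplars]

def parallel_wap(data, k):
    """Steps 2 to 4: split into K subsets, run AP on each, merge with weights."""
    subsets = [data[j::k] for j in range(k)]             # step 2: K subsets
    center_sets = [affinity_propagation(sub)             # step 3: one "Map
                   for sub in subsets if sub]            # task" per subset
    centers = list(chain.from_iterable(center_sets))     # K center point sets
    return affinity_propagation([e for e, _ in centers],  # step 4: weighted
                                [n for _, n in centers])  # clustering
```

Calling parallel_wap(data, 4) on a list of coordinate tuples returns the final center point set as (point, count) pairs, where the counts add up to the number of input points. On a real cluster the per-subset calls would run as independent Map tasks, which is where the speed-up described in the text would come from.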

Abstract

The invention relates to the technical field of big data processing, in particular to a method for achieving clustering mining by employing parallel weighted affinity propagation big data. The method comprises the steps of firstly decomposing a large original data set, distributing the decomposed subsets to nodes with similar performance on a big data platform, running an affinity propagation clustering algorithm on the decomposed small data sets, and then carrying out further integration on center representative points obtained by the affinity propagation clustering algorithm by using the weighted affinity propagation clustering algorithm to obtain a final data point set with representativeness. According to the method for achieving clustering mining by employing the parallel weighted affinity propagation big data provided by the invention, fast and accurate clustering mining work of the big data can be achieved.
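
For reference, the message-passing updates of standard affinity propagation, as commonly stated in the literature rather than quoted from this patent, can be written as follows. The notation is an assumption: s(i,k) is the similarity of point i to candidate center k, r(i,k) the responsibility, and a(i,k) the availability. The weighted variant used in the integration step modifies the similarities, and its exact form is not given in the visible text.

```latex
\begin{align}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\}
\end{align}
```

After the messages settle, point i is assigned to the center k that maximizes a(i,k) + r(i,k).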

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a method for cluster mining of big data using parallelized weighted affinity propagation.

Background

[0002] Data mining is the process of discovering information and knowledge from large-scale, incomplete, noisy, fuzzy, and random data sets. Its tasks include association analysis, cluster analysis, classification, prediction, and deviation analysis. Clustering is an unsupervised learning process in which the data set is divided into several classes according to similarity, so that data in different classes differ from each other. Cluster analysis can establish a macroscopic view of the data: after clustering, the distribution pattern of the data can be presented intuitively, and correlations between data attributes can be found according to the data classes.

[0003] In traditional commercial data mining, people u...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/182, G06F16/27, G06F2216/03, G06F18/23213
Inventors: 王俊杰, 戴鸿君, 于治楼
Owner: INSPUR GROUP CO LTD