Parallel clustering method based on memory calculation

A memory-computing clustering method, applied in computing, computer components and instruments. It addresses the low efficiency and high time consumption of clustering algorithms, with the effect of improving processing efficiency and meeting user needs.

Active Publication Date: 2016-12-07
UNIV OF ELECTRONIC SCI & TECH OF CHINA
Cites: 4; Cited by: 11

AI Technical Summary

Problems solved by technology

If the data set is not loaded into memory, frequent I/O operations make the algorithm inefficient.
Therefore, the traditional DBSCAN algorithm cannot be applied to cluster analysis of large-scale data sets.
[0007] When existing parallel DBSCAN algorithms partition data, they usually divide the original database into several disjoint partitions and ensure the load b...

Example Embodiment

[0033] A detailed description of one or more embodiments of the present invention is provided below, together with the accompanying figure that illustrates its principle. The invention is described in conjunction with such embodiments, but it is not limited to any of them. Its scope is limited only by the claims, and it covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are given by way of example, and the invention can be practiced according to the claims without some or all of them.

[0034] As described above, the parallel clustering method based on memory computing provided by the present invention effectively addresses the efficiency problem of clustering large-scale data sets. The distribute...

Abstract

The invention provides a parallel clustering method based on memory calculation, aimed at improving the efficiency of the DBSCAN clustering algorithm when processing massive data. The method comprises the following steps. S1) Partition the data by simple random sampling: with <ID,Raw_data> pairs as the input of this stage, the original data is divided by simple random sampling and the resulting partitions are stored in different RDDs. S2) Use the memory calculation model to run the DBSCAN algorithm in parallel on different computing nodes, clustering the original data in each RDD to generate local clusters. S3) Merge all local clusters based on their centers of gravity, using the memory calculation model, to generate the global clustering result. By dividing the original data with a simple partitioning scheme on top of the memory calculation model, the invention greatly improves the processing efficiency of the algorithm; by merging local clusters according to the distance between their centers of gravity to construct global clusters, it meets users' requirements for processing large-scale data.
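Steps S1 and S3 of the abstract can be illustrated with a minimal, single-machine Python sketch. It is not the inventors' implementation: plain Python lists stand in for the RDDs, the local clustering of step S2 is assumed to have already produced lists of points, and the names `random_partition`, `merge_by_centroid` and the `threshold` parameter are hypothetical. The abstract does not say whether merging is transitive; this sketch assumes it is and uses union-find.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[int, Point]  # an <ID, Raw_data> pair


def random_partition(records: Sequence[Record], k: int, seed: int = 0) -> List[List[Record]]:
    """S1 (sketch): shuffle the records and deal them into k partitions,
    so each partition is a simple random sample of the data.
    Assumption: plain lists stand in for the RDDs."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def centroid(points: Sequence[Point]) -> Point:
    """Center of gravity (arithmetic mean) of a non-empty local cluster."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def merge_by_centroid(local_clusters: List[List[Point]], threshold: float) -> List[List[Point]]:
    """S3 (sketch): merge local clusters whose centroids are within the
    hypothetical `threshold` distance; union-find makes merges transitive."""
    n = len(local_clusters)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cents = [centroid(c) for c in local_clusters]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(cents[i], cents[j]) <= threshold:
                parent[find(i)] = find(j)

    # Collect the points of all local clusters that share a root.
    groups: Dict[int, List[Point]] = {}
    for i, cluster in enumerate(local_clusters):
        groups.setdefault(find(i), []).extend(cluster)
    return list(groups.values())


if __name__ == "__main__":
    data = [(i, (float(i % 3), 0.0)) for i in range(9)]
    print([len(p) for p in random_partition(data, k=3)])  # [3, 3, 3]
    local = [[(0.0, 0.0), (0.2, 0.0)], [(0.1, 0.1)], [(10.0, 0.0), (10.2, 0.0)]]
    print(len(merge_by_centroid(local, threshold=1.0)))  # 2
```

In the usage example, the first two local clusters have centroids 0.1 apart and are merged, while the third is far away and stays separate.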

Description

Technical field
[0001] The invention relates to the field of parallelizing data mining algorithms, in particular to a parallel clustering method based on memory calculation.
Background
[0002] With the continuous innovation of information technology, data is growing at an explosive rate, and processing large-scale data effectively has become a serious challenge.
[0003] To extract regular patterns from massive data and discover the differences and connections between data items, data mining has emerged as a new discipline and plays an important role in many industries.
[0004] Cluster analysis occupies a pivotal position in data mining and has received widespread attention. Clustering is usually based on some similarity measure, so that data with high similarity are grouped together.
[0005] The DBSCAN algorithm is a density clustering method based on high-density connected regions proposed by ...
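For reference, the density-based clustering that each node runs locally in step S2 can be sketched as textbook DBSCAN with a neighbourhood radius `eps` and a minimum neighbourhood size `min_pts`. This is a brute-force O(n²) illustration assuming Euclidean distance; the function names are hypothetical and the code is not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reachable from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Each point's neighbourhood scan touches every other point, which is why, as noted above, a data set that does not fit in memory forces repeated I/O and makes the serial algorithm impractical at scale.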

Application Information

IPC(8): G06K9/62
CPC: G06F18/23
Inventors: 田玲 (Tian Ling), 罗光春 (Luo Guangchun), 陈爱国 (Chen Aiguo), 殷光强 (Yin Guangqiang)
Owner UNIV OF ELECTRONIC SCI & TECH OF CHINA