A Parallelized Clustering Method Based on Memory Computing

A memory computing and clustering technology, applied in computing, computer parts, instruments, etc. It addresses the low efficiency and long running time of existing algorithms, improving processing efficiency and meeting user needs.

Active Publication Date: 2019-12-13
UNIV OF ELECTRONICS SCI & TECH OF CHINA
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

If the data set is not loaded into memory, frequent I/O operations make the algorithm inefficient.
Therefore, the traditional DBSCAN algorithm cannot be applied to cluster analysis of large-scale data sets.
[0007] When partitioning data, existing parallel DBSCAN algorithms usually divide the original database into several disjoint partitions and use some strategy to keep the load balanced across them. Partitioning the dimensional space in this way consumes a lot of time.
In addition, when merging partition boundaries, boundary data must be located in 2m directions for every partition in order to make the boundary judgment, where m is the dimension of the data. This also consumes a lot of time and makes the algorithm inefficient.



Embodiment Construction

[0033] A detailed description of one or more embodiments of the invention is provided below, together with the accompanying Figure 1 illustrating the principles of the invention. The invention is described in connection with such examples, but it is not limited to any example. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. These details are provided as examples, and the invention may be practiced according to the claims without some or all of them.

[0034] As mentioned above, the parallel clustering method based on memory computing provided by the present invention effectively addresses the clustering efficiency problem for large-scale data sets. Using a distributed programming model, the or...


Abstract

The present invention provides a parallel clustering method based on memory computing, which aims to solve the efficiency problem of the DBSCAN clustering algorithm when processing massive data. The scheme is:

S1: Data division based on simple random sampling. With <id, raw_data> as the input of this stage, the data is split by performing simple random sampling on the original data, and the resulting partitions are saved to different RDDs.

S2: Using the memory computing model, the DBSCAN algorithm is executed in parallel on each computing node, clustering the original data in each RDD to generate local clusters.

S3: All local clusters are merged based on their centers of gravity, again using the memory computing model, to generate the global clustering result.

Based on the memory computing model, the present invention splits the original data with a simple data division method, which greatly improves the processing efficiency of the algorithm. At the same time, merging local clusters based on the distance between their centers of gravity quickly constructs the global clusters, which meets the needs of users dealing with large-scale data.
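The three stages S1 to S3 can be sketched in plain Python on a single machine. This is only an illustrative sketch, not the patented implementation: the function names (`partition`, `dbscan`, `merge`, `parallel_dbscan_sketch`) are hypothetical, the partitions are ordinary lists standing in for RDDs, and the merge rule (union two local clusters when their centers of gravity lie within a threshold `merge_dist`) is an assumption, since the abstract does not state the exact merge criterion.

```python
import math
import random
from collections import defaultdict


def partition(records, num_parts, seed=0):
    """S1: send each (id, point) record to a randomly chosen partition."""
    rng = random.Random(seed)
    parts = [[] for _ in range(num_parts)]
    for rec in records:
        parts[rng.randrange(num_parts)].append(rec)
    return parts


def dbscan(records, eps, min_pts):
    """S2: textbook O(n^2) DBSCAN on one partition; returns local clusters."""
    points = [p for _, p in records]
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
        cluster += 1
    clusters = defaultdict(list)
    for rec, lab in zip(records, labels):
        if lab >= 0:
            clusters[lab].append(rec)
    return list(clusters.values())


def centroid(cluster):
    """Center of gravity (coordinate-wise mean) of a cluster's points."""
    pts = [p for _, p in cluster]
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def merge(local_clusters, merge_dist):
    """S3 (assumed rule): union local clusters whose centroids are close."""
    cents = [centroid(c) for c in local_clusters]
    parent = list(range(len(local_clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(cents)):
        for b in range(a + 1, len(cents)):
            if math.dist(cents[a], cents[b]) <= merge_dist:
                parent[find(a)] = find(b)
    merged = defaultdict(list)
    for idx, c in enumerate(local_clusters):
        merged[find(idx)].extend(c)
    return list(merged.values())


def parallel_dbscan_sketch(records, num_parts, eps, min_pts, merge_dist, seed=0):
    """Run S1, S2, S3 in sequence; each loop iteration stands in for one node."""
    local = []
    for part in partition(records, num_parts, seed):
        local.extend(dbscan(part, eps, min_pts))
    return merge(local, merge_dist)
```

Because random sampling thins every partition, `eps` and `min_pts` would likely need to be chosen with the per-partition density in mind. In a real deployment the loop in `parallel_dbscan_sketch` would be replaced by a per-partition operation of the distributed framework.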

Description

Technical field

[0001] The invention relates to the parallelization of data mining algorithms, and in particular to a parallel clustering method based on memory computing.

Background

[0002] Today, with the continuous innovation of information technology, data is growing at an explosive rate. How to process large-scale data effectively has become a serious challenge.

[0003] To extract regular patterns from massive data and to find the differences and connections within it, data mining has emerged as a new discipline and plays an important role in various industries.

[0004] Cluster analysis occupies a pivotal position in data mining and has received widespread attention. Clustering is usually based on a similarity measure, so that data items with high similarity are grouped together.

[0005] The DBSCAN algorithm is a density clustering method based on high-density connected regions proposed by ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/23
Inventors: 田玲, 罗光春, 陈爱国, 殷光强
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA