Parallel density clustering mining method based on MapReduce

A density clustering technology, applied in special data processing applications, instruments, and electrical digital data processing, that addresses the difficulty of determining grid unit size, the lack of parallelization, and low computational efficiency.

Status: Pending
Publication Date: 2020-08-28
Owner: JIANGXI UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, the algorithm has two clear problems. First, when the grid is divided uniformly, the appropriate grid unit size is difficult to determine, and because the clustering result depends heavily on that size, the clustering effect is poor. Second, merging local clusters incrementally is still computationally inefficient.
However, the algorithm still has two clear deficiencies. First, when it partitions the data by bisection, it still requires a grid side-length threshold as input, and different threshold values affect the accuracy of the clustering results. Second, local clustering has high computational complexity, local clusters are merged without parallelization, and the overall parallel efficiency of the algorithm needs further improvement.

Examples

Embodiment Construction

[0063] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0064] Based on the spatial distribution of the data points, the algorithm first proposes an adaptive grid partitioning strategy (ADG) to divide grid units adaptively. Second, for each data partition, it proposes a neighbor grid expansion strategy (NE) to construct weighted grids, which strengthens the correlation between grids and improves the clustering effect. At the same time, it proposes a weighted grid information entropy strategy (WGIE) to calculate the grid density and the ε neighborhood and core objects of the density clustering a...
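To make the notions of ε neighborhood and core object concrete, here is a minimal Python sketch of how a grid can restrict the neighborhood search to nearby cells. It is not the patented ADG, NE, or WGIE strategy: it assumes a uniform grid with a fixed `cell_size`, plain Euclidean distance, and the usual DBSCAN-style `min_pts` threshold, and all function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import ceil, dist, floor


def cell_of(point, cell_size):
    """Integer grid coordinates of the cell that contains `point`."""
    return tuple(floor(c / cell_size) for c in point)


def build_grid(points, cell_size):
    """Map each occupied cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, cell_size)].append(idx)
    return grid


def eps_neighborhood(points, grid, cell_size, idx, eps):
    """Indices of points within `eps` of points[idx] (the point itself included).

    Only cells at most ceil(eps / cell_size) steps away are scanned,
    which is enough to cover every point within distance eps.
    """
    p = points[idx]
    home = cell_of(p, cell_size)
    reach = ceil(eps / cell_size)
    neighbors = []
    for offset in product(range(-reach, reach + 1), repeat=len(p)):
        cell = tuple(h + o for h, o in zip(home, offset))
        for j in grid.get(cell, ()):
            if dist(p, points[j]) <= eps:
                neighbors.append(j)
    return neighbors


def core_objects(points, cell_size, eps, min_pts):
    """Indices of points whose eps neighborhood holds at least `min_pts` points."""
    grid = build_grid(points, cell_size)
    return [
        i
        for i in range(len(points))
        if len(eps_neighborhood(points, grid, cell_size, i, eps)) >= min_pts
    ]
```

For example, `core_objects([(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)], 1.0, 1.0, 3)` treats the three points near the origin as core objects and leaves the isolated point out. In a MapReduce setting, a computation of this kind would run independently inside each data partition.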

Abstract

The invention provides a parallel density clustering mining method based on MapReduce, comprising the following steps: S1, adaptively dividing grid units according to the spatial distribution of the data points; S2, for each data partition, constructing the correlation between weighted grids; S3, calculating the grid density; S4, using the MapReduce computing model to compute local clusters in parallel; and S5, obtaining the global clusters by using a local cluster merging algorithm based on union-find (disjoint sets) and the MapReduce computing model. The method significantly improves both running efficiency and clustering accuracy.
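Step S5 can be illustrated with a small union-find (disjoint-set) sketch in Python. The abstract does not say how the method decides that two local clusters belong together, so the sketch assumes that an earlier stage has already produced pairs of local cluster identifiers to be merged. The `(partition, local_id)` identifier format and every name below are hypothetical, and the MapReduce job itself is not modeled.

```python
class DisjointSet:
    """Union-find over hashable, mutually comparable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of the set containing x."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets of a and b; the smaller root becomes the representative."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def merge_local_clusters(local_labels, mergeable_pairs):
    """Relabel points with global cluster ids.

    local_labels:    dict point_id -> local cluster id (assumed format:
                     (partition, local_id)).
    mergeable_pairs: iterable of (cluster_id, cluster_id) pairs that an
                     earlier stage has judged to be the same cluster.
    """
    ds = DisjointSet()
    for a, b in mergeable_pairs:
        ds.union(a, b)
    return {point: ds.find(cluster) for point, cluster in local_labels.items()}
```

Given `{"a": ("p0", 0), "b": ("p1", 0), "c": ("p1", 1)}` and the single pair `(("p0", 0), ("p1", 0))`, points `a` and `b` end up in global cluster `("p0", 0)`, while `c` keeps `("p1", 1)`. Union operations can be applied in any order, which is one reason the structure suits merging results that arrive from parallel reducers.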

Description

Technical field

[0001] The invention relates to the technical field of big data mining, in particular to a parallel density clustering mining method based on MapReduce.

Background

[0002] Data mining, also known as knowledge discovery in databases (KDD), aims to discover useful information in large data sets. Common data mining tasks include clustering, classification, and association rule mining. Clustering is an unsupervised learning technique that groups similar objects into one category according to the characteristics of the data objects and separates objects with large differences into different categories. Because clustering can discover latent distribution patterns in sample data, it is widely used in fields such as text analysis, biology, medicine, and satellite image analysis. Among clustering algorithms, density-based clustering algorit...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458, G06F16/28
CPC: G06F16/2465, G06F16/285
Inventors: 毛伊敏, 徐锴滨
Owner: JIANGXI UNIV OF SCI & TECH