
Parallel spectral clustering method based on KD tree and chaos mayfly naiad optimization algorithm

The method applies an optimization algorithm and KD-tree technology, and is classified under chaos models, computing, and computational models. It addresses unbalanced load, large time overhead, and redundant computation that degrades search performance, and it improves parallel efficiency and clustering quality while providing good data and system scalability.

Pending Publication Date: 2021-07-16
JIANGXI UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Although parallelizing spectral clustering can achieve good speedup in big data environments, the algorithm still has four problems:
(1) Data is allocated with the default MapReduce partitioning strategy, which assigns data randomly. This does not take the distribution characteristics of the data into account, so load imbalance easily occurs when the nodes execute their tasks.
(2) When constructing the sparse matrix, some scholars have proposed using KD-tree indexing to reduce computation. However, the KD tree is only suitable for low-dimensional data. On high-dimensional data, backtracking through the tree to find the optimal solution can take a long time and still produces a large number of redundant calculations, which degrades search performance.
(3) When normalizing the Laplacian matrix, the matrix multiplication is distributed across the nodes, and the time overhead is large.
(4) When the k-means algorithm is used for the final clustering, it is simply parallelized. This does not solve the sensitivity to initial centers caused by selecting the initial cluster centers at random, which may lead to unstable clustering results.
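
Problem (2) concerns redundant distance calculations during neighbor search. As a rough illustration of how the triangle inequality can avoid some of them, the sketch below skips a candidate whenever a cheap lower bound on its distance already exceeds the current t-th best distance. It is a generic pivot-based technique, not the patent's two specific KD-tree pruning strategies, which the text here does not detail. The function name `t_nearest_with_pruning` and the single `pivot` parameter are assumptions made for illustration.

```python
import math

def t_nearest_with_pruning(query, points, pivot, t):
    """Find the t nearest points to `query`, pruning with the triangle inequality.

    For any point x: dist(query, x) >= |dist(query, pivot) - dist(x, pivot)|.
    If that lower bound is already no better than the current t-th best
    distance, the exact distance does not need to be computed.
    Returns (list of (distance, point), number of exact distance computations).
    """
    d_query_pivot = math.dist(query, pivot)
    # Assumption: in practice these pivot distances would be precomputed once.
    pivot_dists = [math.dist(x, pivot) for x in points]

    best = []       # sorted (distance, index) pairs, at most t entries
    computed = 0    # how many exact distances were evaluated
    for i, x in enumerate(points):
        lower_bound = abs(d_query_pivot - pivot_dists[i])
        if len(best) == t and lower_bound >= best[-1][0]:
            continue  # cannot beat the current t-th neighbor, skip it
        d = math.dist(query, x)
        computed += 1
        best.append((d, i))
        best.sort()
        del best[t:]  # keep only the t closest seen so far
    return [(d, points[i]) for d, i in best], computed
```

The number of skipped candidates depends heavily on the choice of pivot and on the data distribution, so the savings here are illustrative only.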

Embodiment Construction

[0073] Embodiments of the present invention are described in detail below. Examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions. The embodiments described below are exemplary and are intended to explain the invention, not to be construed as limiting it.

[0074] Data partition

[0075] Currently, parallel spectral clustering algorithms divide the data with the default MapReduce data partitioner. This often fails to take the distribution characteristics of the data into account, which easily produces data skew and therefore uneven node load. In response to this problem, a sampling-based KD-tree data partitioning strategy, DPS, is proposed to obtain the data partitions on Map. The strategy comprises five main steps: sampling, support point selection, mapping, spatial division, and data division.

[0076] (1) Sampling....
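
The general idea of partitioning by a KD tree built on a sample can be sketched as follows. The text above is truncated and does not give the details of support point selection or mapping, so this sketch omits them and covers only sampling, spatial division, and data division. The split rule (median of the highest-variance dimension) and all names (`build_splits`, `partition_id`, `partition`, `sample_size`, `depth`) are assumptions for illustration, not the patent's definitions.

```python
import random
from statistics import median, pvariance

def build_splits(sample, depth):
    """Recursively split the sample at the median of its highest-variance axis.

    Returns None for a leaf, or (axis, cut, left_subtree, right_subtree).
    """
    if depth == 0 or len(sample) < 2:
        return None
    dims = len(sample[0])
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in sample]))
    cut = median(p[axis] for p in sample)
    left = [p for p in sample if p[axis] <= cut]
    right = [p for p in sample if p[axis] > cut]
    if not left or not right:
        return None  # all sampled values equal on this axis, stop splitting
    return (axis, cut,
            build_splits(left, depth - 1),
            build_splits(right, depth - 1))

def partition_id(tree, point):
    """Walk the split tree and return the leaf path, e.g. 'LR', as partition id."""
    path = ""
    node = tree
    while node is not None:
        axis, cut, left, right = node
        if point[axis] <= cut:
            path, node = path + "L", left
        else:
            path, node = path + "R", right
    return path

def partition(data, sample_size, depth, seed=0):
    """Sample the data, build splits on the sample, then assign every point."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    tree = build_splits(sample, depth)
    parts = {}
    for p in data:
        parts.setdefault(partition_id(tree, p), []).append(p)
    return parts
```

Because the cuts follow the medians of the sample, dense regions are split more finely than sparse ones, which is what lets partition sizes stay closer to each other than under random or hash partitioning. How close depends on how representative the sample is.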

Abstract

The invention provides a parallel spectral clustering method based on a KD tree and a chaos mayfly naiad optimization algorithm, characterized by comprising the following steps: S1, partitioning the data with a sampling-based KD-tree data partitioning strategy, DPS, to obtain the data partitions on Map; S2, in the process of constructing the sparse similarity matrix, performing cross-partition t-nearest-neighbor search with an optimized partition allocation strategy, OPA, and two KD-tree pruning strategies based on the triangle inequality; S3, applying a normalization theorem that replaces matrix multiplication with element-wise multiplication, so as to optimize the Laplacian matrix normalization process; S4, using a chaos mayfly naiad optimization algorithm, CMO, to obtain optimal positions as the initial cluster centers, and then performing parallel k-means clustering on the feature space; and S5, obtaining and outputting the final clustering result. The method markedly improves the clustering quality and the parallel efficiency, and achieves good data and system scalability on large-scale data sets.
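
Step S3 can be illustrated with a small sketch. The abstract does not state the normalization theorem itself, so the code below assumes the standard symmetric normalization D^(-1/2) W D^(-1/2), where D is the diagonal degree matrix of the similarity matrix W. Under that assumption, each entry can be computed directly as W[i][j] / sqrt(d[i] * d[j]), with no matrix product. Both function names are hypothetical.

```python
import math

def normalize_elementwise(W):
    """Compute D^(-1/2) W D^(-1/2) entry by entry, with no matrix product.

    Assumption: symmetric normalization, with d[i] the sum of row i of W.
    Rows with zero degree are mapped to zero.
    """
    n = len(W)
    degrees = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [[W[i][j] * inv_sqrt[i] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def normalize_matmul(W):
    """Reference version: build D^(-1/2) explicitly and multiply three matrices."""
    n = len(W)
    degrees = [sum(row) for row in W]
    D = [[(1.0 / math.sqrt(degrees[i]) if i == j and degrees[i] > 0 else 0.0)
          for j in range(n)] for i in range(n)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    return matmul(matmul(D, W), D)
```

The element-wise version needs O(n^2) multiplications for a dense n-by-n matrix, whereas the explicit products need O(n^3). On a sparse similarity matrix only the stored entries have to be scaled, and each node needs just the degree vector rather than a full matrix operand.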

Description

Technical field

[0001] The present invention relates to the field of big data mining, and more particularly to a parallel spectral clustering method based on a KD tree and a chaotic mayfly optimization algorithm.

Background technique

[0002] Cluster analysis, a form of unsupervised learning, plays a vital role in data mining and machine learning. It clusters a data set according to the characteristics of the data objects, maximizing the similarity within a class and minimizing the similarity between classes, thereby discovering the intrinsic relationships between the objects and extracting the value behind the data. Among clustering methods, the spectral clustering algorithm is a novel approach that converts the clustering problem into an optimal graph partitioning problem and can cluster sample spaces of arbitrary shape. It overcomes the tendency of traditional clustering algorithms (such as k-means) to fall into a local optimum on non-convex sample spaces, and has been widely use...

Application Information

IPC(8): G06K9/62, G06N3/00, G06N7/08
CPC: G06N3/006, G06N7/08, G06F18/23213
Inventor: 毛伊敏, 刘祥敏
Owner: JIANGXI UNIV OF SCI & TECH