
Big data clustering method based on cloud computing platform

A cloud computing platform and clustering technology, applied in computing, data mining, computer components and other fields, that addresses the slow convergence of spectral clustering on large data sets, its high running-speed requirements, and its unsuitability for general use. Effects: faster data processing, faster clustering, and improved clustering accuracy.

Pending Publication Date: 2021-06-11
苏州数海长云数据信息科技有限公司

AI Technical Summary

Problems solved by technology

[0005] However, given the number of clusters k, the traditional spectral clustering algorithm solves for the first k eigenvalues of the constructed Laplacian matrix and their corresponding eigenvectors, builds an eigenvector space from them, and then clusters the vectors in that space with the K-means algorithm. In practical applications, spectral clustering converges very slowly as the data set grows. The traditional algorithm therefore runs slowly and demands a higher hardware configuration, which hinders its general use.
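The traditional pipeline described in [0005] (Laplacian, first k eigenvectors, K-means on the eigenvector space) can be sketched on a single machine as follows. This is a minimal NumPy sketch, not the patent's implementation: it assumes the unnormalized Laplacian L = D - W, in which case "the first k eigenvalues" are taken to be the k smallest, and it uses plain Lloyd's K-means with farthest-first initialization. The names `spectral_clustering` and `kmeans` are hypothetical.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's K-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: start from a random row, then repeatedly
    # add the row farthest from all centroids chosen so far.
    centroids = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    """Unnormalized spectral clustering of a symmetric weight matrix W (assumption)."""
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # unnormalized graph Laplacian
    # eigh sorts eigenvalues in ascending order, so the first k columns are the
    # eigenvectors of the k smallest eigenvalues; they span the eigenvector space.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]              # n x k: row i is the embedding of sample i
    return kmeans(U, k)


# Two tightly connected groups of three samples, weakly linked to each other.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(spectral_clustering(W, k=2))
```

The dense eigendecomposition costs O(n³) time and the n × n matrices need O(n²) memory, which is the scaling problem that [0005] attributes to the traditional algorithm.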

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.


Examples


Embodiment Construction

[0044] The technical solution of the present invention will be described clearly and completely below. Exemplary embodiments are described here in detail. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of approaches consistent with aspects of the invention as recited in the appended claims.

[0045] Referring to Figure 1, a big data clustering method based on a cloud computing platform comprises:

[0046] Step 1. First, clean the real-world data by filling in missing values, smoothing noisy data, and identifying and deleting outliers; standardize the data from different data sources and convert them into a standard format. After the data have been collected and organized, the collected data set is cut into multiple data blocks, which are stored in the cloud platform's distributed file system, HDFS, and Hadoop is...
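Step 1 can be illustrated with a short NumPy sketch. The specific choices are assumptions, since the excerpt does not state them: missing values are filled with the column mean, a record is treated as an outlier if any of its features has |z| above a threshold, standardization is z-scoring, and blocks are fixed-size runs of rows. Noise smoothing and the upload of each block to HDFS are not shown. `preprocess`, `split_into_blocks`, `z_thresh`, and `block_rows` are hypothetical names.

```python
import numpy as np


def preprocess(X, z_thresh: float = 3.0) -> np.ndarray:
    """Clean and standardize a 2-D array whose rows are records (NaN = missing)."""
    X = np.array(X, dtype=float)              # work on a float copy
    # 1) Fill missing values with the column mean (assumed imputation rule).
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    # 2) Flag outliers by z-score and drop every record containing one.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0                   # constant columns: avoid division by zero
    keep = (np.abs((X - mu) / sigma) <= z_thresh).all(axis=1)
    X = X[keep]
    # 3) Standardize each column to zero mean and unit variance.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma


def split_into_blocks(X: np.ndarray, block_rows: int) -> list:
    """Cut the cleaned data set into consecutive blocks of at most block_rows rows."""
    return [X[i:i + block_rows] for i in range(0, len(X), block_rows)]


raw = [[1.0], [2.0], [np.nan], [1.5], [100.0], [2.5]]
clean = preprocess(raw, z_thresh=2.0)      # the 100.0 record is dropped as an outlier
blocks = split_into_blocks(clean, block_rows=2)
print(clean.shape, [len(b) for b in blocks])
```

In a real deployment each element of `blocks` would be written as a separate file (or left to HDFS's own block splitting); the in-memory list only shows the partitioning.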



Abstract

The invention discloses a big data clustering method based on a cloud computing platform. The method comprises the following steps: (1) deploying a data set X = (x1, x2, ..., xn) to HDFS; (2) reading each record i in sequence and calculating the similarity between sample i and every sample to obtain the similarity matrix S of the data set, where the similarity calculations for the ith sample and for the jth sample can be performed in parallel; (3) obtaining the weighted connection matrix W and the degree matrix D of the data set from S; and (4) calculating the Laplacian matrix L. Compared with the traditional spectral clustering method, this parallel spectral clustering method improves clustering precision. At the same time, the MapReduce computing framework used for the parallel computation raises data processing speed; this speed depends mainly on the number of compute nodes in the Hadoop cluster, that is, the number of TaskTrackers, so the overall speed of spectral clustering is greatly improved.
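Steps 2 to 4 of the abstract can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: similarity is a Gaussian (RBF) kernel with bandwidth `sigma`; W is built by keeping each sample's `n_neighbors` most similar samples and symmetrizing; D is the diagonal matrix of the row sums of W; and L = D - W. The row-parallel computation of S is mimicked with a Python thread pool, where each task plays the role of a MapReduce map task keyed by the row index i. All function and parameter names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def similarity_row(X: np.ndarray, i: int, sigma: float):
    """Map task: Gaussian similarity between sample i and every sample, keyed by i."""
    sq_dist = ((X - X[i]) ** 2).sum(axis=1)
    return i, np.exp(-sq_dist / (2.0 * sigma ** 2))


def similarity_matrix(X: np.ndarray, sigma: float = 1.0, workers: int = 4) -> np.ndarray:
    """Step 2: build S row by row; rows are independent, so they run concurrently."""
    n = len(X)
    S = np.empty((n, n))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The thread pool stands in for Hadoop TaskTrackers in this sketch.
        for i, row in pool.map(lambda i: similarity_row(X, i, sigma), range(n)):
            S[i] = row                      # "reduce": place each keyed row into S
    return S


def knn_weights(S: np.ndarray, n_neighbors: int) -> np.ndarray:
    """Step 3a: weighted connection matrix W from a k-nearest-neighbour graph (assumed)."""
    candidates = S.copy()
    np.fill_diagonal(candidates, -np.inf)   # a sample is not its own neighbour
    W = np.zeros_like(S)
    for i in range(len(S)):
        nbrs = np.argsort(candidates[i])[-n_neighbors:]   # most similar samples
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)               # keep an edge if either endpoint chose it


def degree_and_laplacian(W: np.ndarray):
    """Steps 3b and 4: degree matrix D and unnormalized Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    return D, D - W


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
S = similarity_matrix(X, sigma=1.0)
W = knn_weights(S, n_neighbors=2)
D, L = degree_and_laplacian(W)
print(np.round(np.linalg.eigvalsh(L), 3))
```

Because each row of S depends only on X and i, rows can be distributed across workers without coordination, which is the property the abstract relies on when it ties speed to the number of TaskTrackers. The thread pool here only demonstrates that decomposition and makes no performance claim.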

Description

Technical Field

[0001] The invention relates to the field of data mining, and in particular to a big data clustering method based on a cloud computing platform.

Background

[0002] Cloud computing is a delivery model for computing resources that usually virtualizes those resources. Simply put, cloud computing is the provision of computing services (including servers, storage, databases, networking, software, analytics, and intelligence) over the Internet, offering rapid innovation, elastic resources, and economies of scale.

[0003] Big data, a term in the IT industry, refers to collections of data that cannot be captured, managed, and processed with conventional software tools within a given period of time. It denotes massive, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making, insight discovery, and process optimization capabilities.

[0004] The spectral clustering algorithm is based on the spectral...

Claims


Application Information

IPC(8): G06F16/182; G06K9/62
CPC: G06F16/182; G06F2216/03; G06F18/23213; G06F18/22
Inventor: 梁杰
Owner: 苏州数海长云数据信息科技有限公司