Distributed-structure-based big data clustering method and device

A distributed-structure-based clustering technology, applied in the field of data mining, that addresses problems such as hard division of intervals, uneven data distribution, and the failure to consider the different effects of individual data points in big data on knowledge discovery tasks.

Active Publication Date: 2015-07-29
品尚电子商务有限公司
Cites: 2; Cited by: 26

AI Technical Summary

Problems solved by technology

For big data processing, methods based on sampling probability are generally adopted. However, sampling does not consider the overall relative distances between data points or intervals, nor the uneven distribution of the data, which results in hard division of intervals.
Clustering, fuzzy concepts, and cloud models were later introduced to improve interval division and achieved good results, but these methods still do not consider the different effects of individual data points in big data on knowledge discovery tasks.
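To make "hard division of intervals" concrete: with a crisp partition, every value belongs to exactly one interval, so two nearby values can fall on opposite sides of a boundary while distant values share an interval. The sketch below uses equal-width intervals as a simple stand-in; this is an assumption for illustration only, since the source does not say which discretization the sampling-based methods use. The function name `equal_width_interval` is hypothetical.

```python
def equal_width_interval(x: float, lo: float, hi: float, k: int) -> int:
    """Return the index of the equal-width interval (k intervals over [lo, hi]) containing x."""
    width = (hi - lo) / k
    idx = int((x - lo) // width)
    # Clamp so that x == hi (or values slightly outside the range) still map to a valid interval.
    return min(max(idx, 0), k - 1)


values = [4.9, 5.1, 9.9]
print([equal_width_interval(v, 0.0, 10.0, 2) for v in values])  # [0, 1, 1]
```

Here 4.9 and 5.1 are only 0.2 apart but land in different intervals, while 5.1 and 9.9 are 4.8 apart and share one. This boundary effect is the problem that the later clustering, fuzzy, and cloud-model approaches were introduced to improve.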



Embodiment Construction

[0098] The technical solutions of the present invention will be clearly and completely described below in conjunction with the accompanying drawings of the present invention. Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with aspects of the invention as recited in the appended claims.

[0099] Referring to Figure 1, the distributed-structure-based big data clustering method proposed by the present invention comprises:

[0100] Step S100: big data preprocessing, i.e. cleaning real-world data by fill...
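The description of step S100 is cut off in this excerpt, so the exact cleaning procedure is not recoverable here. Purely as an illustrative assumption, one common cleaning operation is filling missing attribute values. The hypothetical helper `fill_missing_with_mean` below replaces each missing value (`None`) with the mean of the observed values in the same column; it is not presented as the patent's procedure.

```python
from typing import List, Optional


def fill_missing_with_mean(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Hypothetical cleaning step: replace None with the mean of its column's observed values.

    A column with no observed values at all is filled with 0.0 (an arbitrary assumption).
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    # Build a new table rather than mutating the input rows.
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


print(fill_missing_with_mean([[1.0, None], [3.0, 4.0], [None, 6.0]]))
# [[1.0, 5.0], [3.0, 4.0], [2.0, 6.0]]
```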



Abstract

The invention provides a distributed-structure-based big data clustering method comprising the following steps: S100, preprocessing the big data; S200, segmenting and managing the big data; S300, establishing a clustering hypergraph model; S400, mapping the big data, i.e. mapping each segmented data block to its own hypergraph H = (V, E); S500, clustering each data block using its hypergraph; and S600, reclustering the per-block clustering results obtained in step S500 to obtain the final clustering result. By combining a cloud platform with hypergraph theory to mine and cluster big data, the method achieves fast, real-time, and accurate analysis and processing of big data.
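The S200 to S600 pipeline can be sketched end to end. The abstract does not specify the hypergraph construction, the per-block clustering algorithm, or the reclustering rule, so every concrete choice below is an assumption: blocks are fixed-size slices of the input (S200); each block's hypergraph has the block's points as vertices V and, for every point, one hyperedge in E holding all points within distance `eps` (S300/S400); per-block clusters are the connected components of that hypergraph (S500); and S600 reclusters the per-block centroids with the same procedure and merges their member points. All function names and the parameters `block_size` and `eps` are hypothetical, and S100 preprocessing is omitted.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def segment(data: Sequence[Point], block_size: int) -> List[List[Point]]:
    """S200 (assumed): split the data set into fixed-size blocks."""
    return [list(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def build_hypergraph(block: List[Point], eps: float) -> Tuple[List[int], List[Set[int]]]:
    """S300/S400 (assumed model): V = point indices; one hyperedge per point,
    containing every point within distance eps of it (including itself)."""
    vertices = list(range(len(block)))
    edges = [{j for j in vertices if math.dist(block[i], block[j]) <= eps} for i in vertices]
    return vertices, edges


def cluster_hypergraph(vertices: List[int], edges: List[Set[int]]) -> List[Set[int]]:
    """S500 (assumed): clusters are connected components; two vertices are
    connected when they share a hyperedge. Uses union-find."""
    parent = list(vertices)  # vertices are 0..n-1

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        members = list(edge)
        for v in members[1:]:
            parent[find(v)] = find(members[0])
    groups: Dict[int, Set[int]] = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def centroid(points: List[Point]) -> Point:
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def distributed_cluster(data: Sequence[Point], block_size: int, eps: float) -> List[List[Point]]:
    local: List[Tuple[Point, List[Point]]] = []  # (centroid, member points) per local cluster
    for block in segment(data, block_size):  # each block could run on a separate worker
        vertices, edges = build_hypergraph(block, eps)
        for component in cluster_hypergraph(vertices, edges):
            members = [block[i] for i in component]
            local.append((centroid(members), members))
    # S600 (assumed): recluster the local centroids with the same hypergraph procedure.
    centroids = [c for c, _ in local]
    vertices, edges = build_hypergraph(centroids, eps)
    return [[p for i in component for p in local[i][1]]
            for component in cluster_hypergraph(vertices, edges)]


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.1), (5.05, 5.1)]
print(len(distributed_cluster(data, block_size=3, eps=0.5)))  # 2
```

In this example the point (0.05, 0.1) sits in a different block from (0.0, 0.0) and (0.1, 0.0), so the two groups are only joined in the S600 reclustering step. The loop over blocks is where a distributed implementation would dispatch work to separate nodes; here it runs sequentially. Reclustering only centroids keeps S600 cheap, at the cost of ignoring cluster shape and size.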

Description

Technical field
[0001] The invention relates to the field of data mining, and in particular to a big data clustering method and device based on a distributed structure.
Background art
[0002] Over the past half century, as computer technology has become fully integrated into social life, the information explosion has accumulated to the point where it has begun to trigger change. The world is not only flooded with more information than ever before, but that information is also growing at an accelerating rate. Disciplines at the center of the information explosion, such as astronomy and genetics, coined the concept of "big data", and today the concept is applied to almost every area of human intellectual activity and development. The 21st century is an era of great development of data and information: the mobile Internet, social networks, e-commerce, and similar services have greatly expanded the boundaries and application scope of the Internet, and data of all kinds is growing rapidly. Internet (social, search, e-commerce), m...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/285; G06F16/35
Inventor: 马泳宇
Owner: 品尚电子商务有限公司