Unsupervised rapid clustering method and system suitable for big data

An unsupervised clustering method, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the inapplicability of existing methods to big data, their high space complexity, and their large memory consumption. Its effects are reduced space complexity and memory overhead, strong robustness, and improved operating efficiency.

Inactive Publication Date: 2018-04-20
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] Common clustering methods for large-scale data sets, such as k-means, fall short in one of two ways. Either they have high space complexity, large memory consumption, and insufficient efficiency, which makes them unsuitable for big data, or they require manual participation and so cannot achieve truly unsupervised clustering. Neither kind meets the requirements of a big data environment. An unsupervised fast clustering method suitable for big data is therefore needed to help extract potentially useful information from big data.
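To make the memory concern concrete, the short sketch below estimates the size of a dense pairwise similarity matrix, which methods such as affinity propagation keep in memory. The function name and the 8-byte value size are assumptions chosen for illustration; they do not come from the patent.

```python
def dense_matrix_bytes(n_points, bytes_per_value=8):
    """Bytes needed for a dense n_points x n_points matrix.

    bytes_per_value=8 assumes 64-bit floats (an illustrative assumption).
    """
    return n_points * n_points * bytes_per_value


if __name__ == "__main__":
    # Memory grows with the square of the number of points.
    for n in (10_000, 100_000, 1_000_000):
        print(n, "points:", dense_matrix_bytes(n) / 1e9, "GB")
```

Because the cost grows quadratically, reducing the number of points that reach the clustering step has a large effect on memory use.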


Examples

Embodiment

[0065] As shown in Figure 1, an unsupervised fast clustering system suitable for big data includes a data acquisition module, a data preprocessing module, a data clustering module, and a data analysis module. The data preprocessing module can be subdivided into a data cleaning sub-module, a data integration sub-module, a data transformation sub-module, and a data reduction sub-module. The data clustering module can be subdivided into a hypergrid sampling sub-module, an MP-AP clustering sub-module, and a mapping restoration sub-module.
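The preprocessing module described above can be pictured as a chain of stages applied in order. The sketch below is only an illustration of that layout: the stage functions `clean` and `min_max_scale` and the driver `preprocess` are hypothetical names, and the specific operations (dropping rows with missing values, min-max scaling) are assumed stand-ins for the cleaning and transformation sub-modules, not the patent's actual procedures.

```python
def clean(rows):
    """Cleaning stand-in: drop rows that contain a missing value (None)."""
    return [r for r in rows if all(v is not None for v in r)]


def min_max_scale(rows):
    """Transformation stand-in: rescale each column to the range [0, 1]."""
    if not rows:
        return []
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r, lows, highs)
        )
        for r in rows
    ]


def preprocess(rows, stages=(clean, min_max_scale)):
    """Run each preprocessing stage in order and return the result."""
    for stage in stages:
        rows = stage(rows)
    return rows


if __name__ == "__main__":
    raw = [(0, 10), (None, 5), (2, 20), (1, 15)]
    print(preprocess(raw))
```

Keeping each sub-module as a separate function means integration or reduction stages could be added to the `stages` tuple without changing the driver.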

[0066] The specific method is:

[0067] First, the data acquisition module identifies and collects data from the data sources that generate information. Data can generally be obtained through web crawlers, public website APIs, and specific system interfaces provided by enterprises. Unstructured data is extracted and stored in a structured way as unified local data files.

[0068] Then, the collected data is tr...


Abstract

The invention discloses an unsupervised rapid clustering method and system suitable for big data. The method comprises three steps: dividing a preprocessed large-scale data set into a hypergrid and sampling it to obtain a new, smaller data set; clustering the new data set with an improved affinity propagation method to obtain a preliminary clustering result; and remapping the preliminary result back onto the original data set to obtain the final clustering result, which creates the conditions for further analysis. The method is highly robust and is suitable for both low-dimensional and high-dimensional data sets.
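The three steps in the abstract (hypergrid sampling, clustering the sampled set, remapping to the original set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the fixed `cell_size`, the use of each cell's centroid as its representative, and all function names are hypothetical, and the clustering step is passed in as `cluster_fn` because the source does not describe the improved affinity propagation in enough detail to reproduce it.

```python
from collections import defaultdict
from math import floor


def hypergrid_sample(points, cell_size):
    """Group points into grid cells; return one representative per cell
    plus the indices of the original points that fall in each cell."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(idx)
    reps, members_per_rep = [], []
    for key in sorted(cells):
        members = cells[key]
        dim = len(points[members[0]])
        # Assumption: the cell centroid stands in for all points in the cell.
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(dim)
        )
        reps.append(centroid)
        members_per_rep.append(members)
    return reps, members_per_rep


def remap(rep_labels, members_per_rep, n_points):
    """Give every original point the label of its cell representative."""
    labels = [None] * n_points
    for label, members in zip(rep_labels, members_per_rep):
        for i in members:
            labels[i] = label
    return labels


def cluster_big_data(points, cell_size, cluster_fn):
    """Sample, cluster the (smaller) sampled set, then restore the mapping."""
    reps, members = hypergrid_sample(points, cell_size)
    rep_labels = cluster_fn(reps)
    return remap(rep_labels, members, len(points))


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (9.1, 9.0), (9.3, 9.2)]
    # Placeholder clusterer, NOT affinity propagation: split on the first axis.
    toy_cluster = lambda reps: [0 if r[0] < 5 else 1 for r in reps]
    print(cluster_big_data(pts, 1.0, toy_cluster))
```

Only the cell representatives reach `cluster_fn`, so any memory cost that depends on the number of clustered points scales with the number of occupied cells rather than with the size of the original data set.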

Description

Technical field

[0001] The invention belongs to the technical field of big data analysis, mining, and application, and in particular relates to an unsupervised fast clustering method and system suitable for big data.

Background

[0002] With the rapid development of information technology and the development and utilization of information resources, the world's demand for information is growing rapidly. At the same time, the world is entering the era of big data, and the rapid development of sensor, Internet of Things, and smart mobile terminal technology makes it easy for people to obtain various types of data through the network. In many cases, however, the information cannot be obtained directly and must be analyzed and extracted from massive data. How to extract useful information from big data is a global research hotspot, now and in the future.

[0003] Clustering is an important part of the data preprocessing process and an effective method to simplif...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/30
CPC: G06F16/215, G06F18/23
Inventor: 陈均健, 俞祝良, 顾正晖, 余天佑
Owner: SOUTH CHINA UNIV OF TECH