Unsupervised rapid clustering method and system suitable for big data

An unsupervised clustering method, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses the high space complexity, large memory consumption, and resulting unsuitability for big data of existing methods: it reduces space complexity and memory overhead, is strongly robust, and improves operating efficiency.

Inactive Publication Date: 2018-04-20

SOUTH CHINA UNIV OF TECH

0 Cites, 7 Cited by

Problems solved by technology

[0004] To date, the common clustering methods applied to large-scale data sets, such as k-means, either have high space complexity, large memory consumption, and insufficient efficiency, which makes them unsuitable for big data, or require manual participation and therefore cannot achieve truly unsupervised clustering. Neither kind meets the requirements of a big data environment. An unsupervised fast clustering method suitable for big data is therefore needed to assist in extracting potentially useful information from big data.

Embodiment

[0065] As shown in Figure 1, an unsupervised fast clustering system suitable for big data includes a data acquisition module, a data preprocessing module, a data clustering module, and a data analysis module. The data preprocessing module is subdivided into a data cleaning sub-module, a data integration sub-module, a data transformation sub-module, and a data reduction sub-module; the data clustering module is subdivided into a hypergrid sampling sub-module, an MP-AP clustering sub-module, and a mapping restoration sub-module.
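The chaining of the four preprocessing sub-modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`clean`, `integrate`, `transform`, `reduce_rows`, `preprocess`) are hypothetical, the fourth sub-module is read here as data reduction, and the concrete behavior chosen for each step (dropping rows with missing values, concatenating sources, min-max scaling, removing duplicate rows) is a placeholder, since this excerpt does not say what each sub-module does internally.

```python
from typing import List, Optional

Row = List[Optional[float]]


def clean(rows: List[Row]) -> List[List[float]]:
    """Data cleaning (assumed behavior): drop rows containing missing values."""
    return [r for r in rows if all(v is not None for v in r)]


def integrate(sources: List[List[List[float]]]) -> List[List[float]]:
    """Data integration (assumed behavior): concatenate all sources into one table."""
    return [row for src in sources for row in src]


def transform(rows: List[List[float]]) -> List[List[float]]:
    """Data transformation (assumed behavior): min-max scale each column to [0, 1]."""
    if not rows:
        return []
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in rows
    ]


def reduce_rows(rows: List[List[float]]) -> List[List[float]]:
    """Data reduction (assumed behavior): remove exact duplicates, keep first occurrence."""
    seen = set()
    out = []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def preprocess(sources: List[List[Row]]) -> List[List[float]]:
    """Run the four sub-modules in the order the text lists them."""
    cleaned = [clean(src) for src in sources]
    return reduce_rows(transform(integrate(cleaned)))
```

Keeping each sub-module a plain function makes it straightforward to swap in a different rule for any single step without touching the others.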

[0066] The specific method is:

[0067] First, the data acquisition module identifies and collects data from the data sources that generate information. Data can generally be obtained through web crawlers, public website APIs, and system-specific interfaces provided by enterprises; unstructured data is extracted and stored in a structured form as unified local data files.
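As a rough sketch of the last part of this step, the snippet below normalizes loosely structured records into a single tabular file with fixed columns. Everything specific here is an assumption for illustration: the excerpt does not name a file format, so the input is taken to be JSON lines and the output CSV, and `records_to_csv` and its `fields` parameter are hypothetical names.

```python
import csv
import io
import json
from typing import Iterable, List


def records_to_csv(raw_lines: Iterable[str], fields: List[str]) -> str:
    """Parse JSON-lines input and return CSV text with a fixed set of columns.

    Lines that are blank, not valid JSON, or not JSON objects are skipped;
    fields absent from a record are written as empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input rather than abort the whole run
        if isinstance(record, dict):
            # Keep only the agreed columns so every row has the same shape.
            writer.writerow({f: record.get(f, "") for f in fields})
    return buf.getvalue()
```

In practice the returned text would be written to a local file; returning a string keeps the sketch free of file-system side effects.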

[0068] Then, the collected data is tr...

Abstract

The invention discloses an unsupervised rapid clustering method and system suitable for big data. The method comprises: performing hypergrid division and sampling on a preprocessed large-scale data set to obtain a new data set; clustering the new data set with an improved affinity propagation (neighbor propagation) method to obtain a preliminary clustering result; and remapping and restoring the preliminary clustering result to the original data set to obtain the final clustering result, thereby creating conditions for further analysis. The method is highly robust and is suitable for both low-dimensional and high-dimensional data sets.
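The three-stage flow in the abstract (grid sampling, clustering the sample, remapping to the full data set) can be sketched as below. This is an illustration under assumptions, not the patented method: the excerpt does not give the sampling rule, so one representative per non-empty cell (the member nearest the cell's centroid) is assumed, and the improved propagation-based clustering is not described here, so a simple distance-threshold clusterer stands in for it. The names `hypergrid_sample`, `threshold_cluster`, `cluster_big_data`, `cell_width`, and `radius` are all hypothetical.

```python
from collections import defaultdict
from math import dist, floor
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def hypergrid_sample(points: List[Point], cell_width: float) -> Tuple[List[int], List[int]]:
    """Bucket points into axis-aligned grid cells and pick one representative per cell.

    Returns (reps, owner): reps holds the index of each cell's representative,
    owner[i] is the position in reps of the representative for point i.
    """
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(floor(v / cell_width) for v in p)].append(i)
    reps: List[int] = []
    owner = [0] * len(points)
    for members in cells.values():
        dims = range(len(points[members[0]]))
        centroid = [sum(points[i][d] for i in members) / len(members) for d in dims]
        # Assumed rule: the representative is the member closest to the centroid.
        rep = min(members, key=lambda i: dist(points[i], centroid))
        for i in members:
            owner[i] = len(reps)
        reps.append(rep)
    return reps, owner


def threshold_cluster(samples: List[Point], radius: float) -> List[int]:
    """Stand-in clusterer, NOT the patent's method: samples linked by a chain of
    hops no longer than radius receive the same label."""
    labels = [-1] * len(samples)
    current = 0
    for start in range(len(samples)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            a = stack.pop()
            for b in range(len(samples)):
                if labels[b] == -1 and dist(samples[a], samples[b]) <= radius:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels


def cluster_big_data(points: List[Point], cell_width: float,
                     cluster_fn: Callable[[List[Point]], List[int]]) -> List[int]:
    """Sample by grid, cluster only the representatives, then map labels back."""
    reps, owner = hypergrid_sample(points, cell_width)
    rep_labels = cluster_fn([points[i] for i in reps])
    return [rep_labels[owner[i]] for i in range(len(points))]
```

Because the clusterer only ever sees one point per occupied cell, any pairwise similarity structure it builds scales with the number of occupied cells rather than with the size of the original data set. Any clusterer that maps a list of samples to a list of labels can be passed as `cluster_fn`, for example `lambda s: threshold_cluster(s, radius=2.0)`.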

Description

Technical field

[0001] The invention belongs to the technical field of big data analysis, mining, and application, and in particular relates to an unsupervised fast clustering method and system suitable for big data.

Background

[0002] With the rapid development of information technology and the development and utilization of information resources, the world's demand for information is growing rapidly. At the same time, the world is entering the era of big data, and the rapid development of sensor, Internet of Things, and smart mobile terminal technology makes it easy for people to obtain various types of data through the network. In many cases, however, the information cannot be obtained directly and must be analyzed and extracted from massive data. How to extract useful information from big data is a global research hotspot now and for the future.

[0003] Clustering is an important part of the data preprocessing process and an effective method to simplif...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06K9/62, G06F17/30

CPC: G06F16/215, G06F18/23

Inventor: 陈均健, 俞祝良, 顾正晖, 余天佑

Owner: SOUTH CHINA UNIV OF TECH
