Subspace clustering method and subspace clustering device for high-dimensional big data

A clustering method based on subspace technology, applied in the field of data processing, that addresses problems such as low operating efficiency.

Active Publication Date: 2017-06-30
BEIJING UNIV OF POSTS & TELECOMM
AI Technical Summary

Problems solved by technology

However, high-dimensional big data at the TB or PB scale and above involves a huge volume of data, and the number of data rows may reach tens of thousands. Even after interval division, the data handled by the Mafia subspace clustering algorithm is still very large, which leads to low operating efficiency when the algorithm processes such data.

Examples

Embodiment Construction

[0048] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0049] In order to improve the operating efficiency of high-dimensional big data clustering, embodiments of the present invention provide a subspace clustering method and device for high-dimensional big data.

[0050] A subspace clustering method for high-dimensional big data provided by an embodiment of the present invention is introduced first.

[0051] It should be noted that the execution subject of the high-dimensional big data-oriented subsp...

Abstract

The embodiments of the invention provide a subspace clustering method and a subspace clustering device for high-dimensional big data. The method comprises the following steps: building a first Map task for each row of the acquired high-dimensional big data, and splitting the data in each first Map task by dimension to obtain the characteristic value of each dimension in each first Map task; in a first Reduce node, acquiring the data area of all the characteristic values of each dimension, the default number of windows, the default window merging threshold and the default window density threshold, and obtaining the one-dimensional dense subspaces of each dimension from these four quantities; determining a (k+1)-dimensional candidate subspace from every two k-dimensional dense subspaces; building a second Map task for each k-dimensional dense subspace, and obtaining all sample points distributed in each k-dimensional dense subspace; and in a second Reduce node, obtaining the (k+1)-dimensional dense subspaces after clustering. With this scheme, the operating efficiency of high-dimensional big data clustering can be improved.
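The steps in the abstract can be illustrated with a small single-process Python sketch. It is not the patented implementation: the abstract does not define how the merging and density thresholds are applied, so the rules below (merge adjacent windows whose counts differ by at most a relative `merge_threshold`; call a window dense when its count exceeds `density_threshold` times the count expected under a uniform distribution) are assumptions borrowed from the general MAFIA idea. The join rule for candidates (two k-dimensional units that share k-1 entries and together span k+1 distinct dimensions) is likewise an assumption, and all function names and default values are hypothetical. The second Map/Reduce density check on the candidates is left out.

```python
from collections import defaultdict
from itertools import combinations


def map_row(row):
    """First Map (sketch): split one row by dimension into (dimension, value) pairs."""
    return [(dim, value) for dim, value in enumerate(row)]


def dense_intervals(values, n_windows=4, merge_threshold=0.2, density_threshold=1.5):
    """First Reduce for one dimension (sketch): return dense (low, high) intervals.

    The merge and density rules are assumptions, not taken from the patent text.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_windows
    if width == 0:  # all values identical: treat the single point as one interval
        return [(lo, hi)]
    # Histogram over equal-width windows covering the observed range.
    counts = [0] * n_windows
    for v in values:
        counts[min(int((v - lo) / width), n_windows - 1)] += 1
    # Merge adjacent windows whose counts are within the relative merge threshold.
    bins = [[0, 1, counts[0]]]  # [first window, last window + 1, total count]
    for i in range(1, n_windows):
        prev = counts[i - 1]
        if abs(counts[i] - prev) <= merge_threshold * max(counts[i], prev):
            bins[-1][1] = i + 1
            bins[-1][2] += counts[i]
        else:
            bins.append([i, i + 1, counts[i]])
    # Keep merged windows holding clearly more points than a uniform spread would give.
    result = []
    for start, end, total in bins:
        expected = len(values) * (end - start) / n_windows
        if total > density_threshold * expected:
            result.append((lo + start * width, lo + end * width))
    return result


def one_dim_dense_units(rows, **params):
    """Run the first Map/Reduce round in-process and return 1-D dense units."""
    by_dim = defaultdict(list)
    for row in rows:
        for dim, value in map_row(row):
            by_dim[dim].append(value)  # stands in for the shuffle-by-key step
    return [
        frozenset({(dim, interval)})
        for dim in sorted(by_dim)
        for interval in dense_intervals(by_dim[dim], **params)
    ]


def candidates(dense_units):
    """Join every two k-D dense units into (k+1)-D candidates (assumed join rule)."""
    out = set()
    for a, b in combinations(dense_units, 2):
        union = a | b
        dims = {dim for dim, _ in union}
        # Units must share k-1 entries and cover k+1 distinct dimensions.
        if len(union) == len(a) + 1 and len(dims) == len(union):
            out.add(frozenset(union))
    return out
```

Each dense unit is represented as a frozenset of `(dimension, (low, high))` pairs, which makes the pairwise join a plain set union. In an actual MapReduce deployment the grouping done by `by_dim` would be performed by the framework's shuffle phase, with one reduce call per dimension key.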

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a subspace clustering method and device for high-dimensional big data.

Background technique

[0002] Clustering is the process of dividing a collection of physical or abstract objects into classes of similar objects. A cluster generated by clustering is a collection of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters. Cluster analysis, also known as group analysis, is a statistical analysis method for studying classification problems. With the advent of the big data era, data sets are growing ever larger and their dimensionality ever higher. Because of irrelevant features and the curse of dimensionality, traditional clustering algorithms are no longer suitable for high-dimensional data.

[0003] In order to solve the problem that the traditional clustering algorithm...


Application Information

IPC(8): G06K9/62, G06F17/30
CPC: G06F16/285, G06F18/23
Inventor: 高志鹏, 范译丹, 牛琨, 赵旸, 邓楠洁, 杨杨, 邱雪松, 李文璟
Owner BEIJING UNIV OF POSTS & TELECOMM