Big data clustering method based on decomposition and composition

A big data clustering method, applicable to database models, relational databases, electronic digital data processing and related fields, that addresses the large volume and high dimensionality of big data and the resulting difficulty of mining its internal model.

Active Publication Date: 2014-09-24
广东唯审信息科技有限公司

AI Technical Summary

Problems solved by technology

Big data is both large in volume and high in dimensionality, which leaves traditional data analysis methods ineffective when applied to big data.


Example Embodiment

[0049] A decomposition-and-combination clustering method for big data. First, the big data is split horizontally (by samples) and vertically (by attributes). Next, a category label is obtained for each data subset. Finally, a combined clustering method is used to obtain the category labels of the entire data set. The specific implementation steps are as follows:

[0050] 1) Horizontal split. Use random sampling to split the big data horizontally: randomly select 10% of the samples to obtain a data subset D_i. Repeat this sampling with replacement r = 100 times, so that the 100 data subsets together form the full set D.

[0051] 2) Vertical split. Using random sampling, split each data subset D_i vertically: randomly select 10% of the attributes to obtain a data subset D_ij. Repeat this sampling with replacement c = 100 times, so that the 100 data subsets D_ij together form D_i.

[0052] 3) Obtain the category labels of the data subsets. Use K-means for each data subset ...
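
Steps 1) to 3) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `kmeans_labels` and `decompose_and_label`, the parameter names, and the plain Lloyd-style K-means are all hypothetical. It also assumes that each draw picks distinct rows (or attributes) and that only the repeated draws are independent of one another, which is one possible reading of "sampling with replacement" in the text.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd-style K-means (illustrative); one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def decompose_and_label(X, k, r=100, c=100, frac=0.10, seed=0):
    """Split X by rows, then by columns, and cluster every piece D_ij."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    n_rows = min(n, max(k, int(round(frac * n))))  # at least k rows per subset
    n_cols = max(1, int(round(frac * m)))
    results = []
    for i in range(r):
        # Step 1: horizontal split, 10% of the samples -> D_i.
        rows = rng.choice(n, size=n_rows, replace=False)
        for j in range(c):
            # Step 2: vertical split, 10% of the attributes -> D_ij.
            cols = rng.choice(m, size=n_cols, replace=False)
            # Step 3: base clustering of D_ij.
            labels = kmeans_labels(X[np.ix_(rows, cols)], k, rng)
            results.append((i, j, rows, cols, labels))
    return results
```

With the values given in the text (r = 100, c = 100), the sketch produces 10,000 base clusterings. Each (i, j) piece is clustered independently of the others, so the inner loop could be distributed across workers.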


Abstract

The invention discloses a big data clustering method based on decomposition and composition. The method includes the steps of transversely segmenting a data set to obtain a plurality of transverse data subsets; longitudinally segmenting each transverse data subset to obtain a plurality of longitudinal data subsets; obtaining category labels for the data subsets produced by the transverse and longitudinal segmentation by using a basic clustering algorithm; compositing and clustering the category labels of the longitudinal data subsets; and compositing and clustering the category labels of the transverse data subsets again to obtain the category labels of the complete data set. The method converts the big data clustering problem into a composition clustering problem, and it has the advantages of efficiency, robustness and suitability for parallel execution. The method is suitable for big data clustering and is particularly suitable for file classification, customer segmentation, information retrieval and other fields.
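
The abstract does not state how the category labels are composited. As a rough illustration only, the sketch below swaps in a common stand-in: a co-association matrix that records how often two samples receive the same label when both appear in a subset, followed by thresholding and connected components. The function name `combine_labels`, the `threshold` parameter and the choice of technique are assumptions, not the patent's method.

```python
import numpy as np

def combine_labels(n, partial_labelings, threshold=0.5):
    """Combine partial labelings into one label per sample (illustrative).

    partial_labelings: iterable of (rows, labels) pairs, where rows holds
    distinct sample indices in [0, n) and labels has one entry per row.
    """
    together = np.zeros((n, n))  # times a pair got the same label
    seen = np.zeros((n, n))      # times a pair appeared in the same subset
    for rows, labels in partial_labelings:
        rows = np.asarray(rows)
        labels = np.asarray(labels)
        same = (labels[:, None] == labels[None, :]).astype(float)
        seen[np.ix_(rows, rows)] += 1.0
        together[np.ix_(rows, rows)] += same
    # Fraction of co-appearances in which the pair shared a label.
    ratio = np.divide(together, seen, out=np.zeros_like(together), where=seen > 0)
    linked = ratio > threshold
    # Connected components of the thresholded graph become final clusters.
    final = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if final[start] != -1:
            continue
        final[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(linked[u] & (final == -1)):
                final[v] = next_label
                stack.append(v)
        next_label += 1
    return final

# Three overlapping partial labelings over six samples.
parts = [([0, 1, 2, 3], [0, 0, 1, 1]),
         ([2, 3, 4, 5], [5, 5, 7, 7]),
         ([0, 1, 4, 5], [1, 1, 0, 0])]
print(combine_labels(6, parts).tolist())
```

Under these assumptions the same function could be applied twice, mirroring the two composition stages in the abstract: once over the longitudinal subsets of each transverse subset, and once more over the resulting transverse labelings. A sample that never appears in any subset ends up in a cluster of its own. The n-by-n matrices also limit this sketch to small n, so it does not address scale.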

Description

Technical field

[0001] The invention belongs to the field of data mining and relates to a clustering method based on data division, in particular to a combined clustering method for big data.

Background

[0002] Big data has brought unprecedented impact and challenges. Its characteristics are Volume (mass), Velocity (high speed), Variety (variety) and Veracity (authenticity). How to mine the potentially valuable information contained in big data has become a hot issue in industry and academia. Big data is both large in volume and high in dimensionality, which leaves traditional data analysis methods ineffective; the presence of noise attributes and noise sample points makes it even more difficult to mine the internal model of big data.

Contents of the invention

[0003] In view of the massive high-dimensional problems in big data clustering, the purpose of the present invention is to pro...


Application Information

IPC(8): G06F17/30
CPC: G06F16/2219, G06F16/285
Inventor: 吴俊杰, 伍之昂, 曹杰
Owner: 广东唯审信息科技有限公司