Big data clustering method based on decomposition and composition

A clustering method for big data, applicable to database models, relational databases, and electronic digital data processing. It addresses the large volume and high dimensionality of big data and the difficulty of mining its internal model.

Active Publication Date: 2014-09-24

广东唯审信息科技有限公司



Problems solved by technology

Big data is characterized by large volume and high dimensionality, which leaves traditional data analysis methods ill-equipped to handle it. Noise attributes and noisy sample points make it harder still to mine the internal model of big data.

Embodiment Construction

[0049] A decomposition-combination clustering method for big data. First, the big data is split horizontally and vertically. Next, the category labels of each data subset are obtained. Finally, the category labels of the entire data set are obtained with a combination clustering method. The specific implementation steps are as follows:

[0050] 1) Horizontal segmentation. Split the big data horizontally by random sampling: randomly draw 10% of the samples to obtain a data subset D_i, and repeat this sampling with replacement r = 100 times, so that the collection of the 100 data subsets is D.

[0051] 2) Vertical segmentation. Split each data subset D_i vertically by random sampling: randomly draw 10% of the attributes to obtain a data subset D_ij, and repeat this sampling with replacement c = 100 times, so that the collection of the 100 data subsets D_ij is D_i.

[0052] 3) Obtain category labels for subsets of data. Use K-means for...
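The horizontal and vertical segmentation in steps 1) and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `decompose` and its parameters are hypothetical, the data is assumed to be a NumPy array with samples as rows and attributes as columns, and each single draw is assumed to pick distinct rows or attributes while the draws themselves are independent, so the same row or attribute can recur across subsets.

```python
import numpy as np

def decompose(X, r=100, c=100, row_frac=0.10, col_frac=0.10, seed=0):
    """Return index sets for the subsets D_i and D_ij (hypothetical helper).

    Each entry of the result is (rows, col_sets): `rows` indexes the samples
    of one horizontal subset D_i, and `col_sets` holds c arrays of attribute
    indices, one per vertical subset D_ij of that D_i.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_attrs = X.shape
    n_rows = max(1, int(round(row_frac * n_samples)))
    n_cols = max(1, int(round(col_frac * n_attrs)))

    blocks = []
    for _ in range(r):
        # Assumption: rows are distinct within one subset; draws are
        # independent across subsets, so subsets may overlap.
        rows = rng.choice(n_samples, size=n_rows, replace=False)
        col_sets = [rng.choice(n_attrs, size=n_cols, replace=False)
                    for _ in range(c)]
        blocks.append((rows, col_sets))
    return blocks

# Toy usage: 200 samples, 50 attributes, small r and c for illustration.
X = np.zeros((200, 50))
blocks = decompose(X, r=4, c=3)
rows, col_sets = blocks[0]
D_00 = X[np.ix_(rows, col_sets[0])]  # one vertical subset D_ij as a matrix
```

Storing index arrays instead of copied sub-matrices keeps memory use low and lets each D_ij be materialized only when a base clusterer needs it, which also suits parallel execution.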


Abstract

The invention discloses a big data clustering method based on decomposition and composition. The method comprises: transversely segmenting a data set to obtain a plurality of transverse data subsets; longitudinally segmenting each transverse data subset to obtain a plurality of longitudinal data subsets; obtaining classification labels for the data subsets produced by transverse and longitudinal segmentation using a basic clustering algorithm; composite clustering the classification labels of the longitudinal data subsets; and composite clustering the classification labels of the transverse data subsets again to obtain the complete classification labels of the data set. The method converts the big data clustering problem into a composite clustering problem, and it is efficient, robust, and parallelizable. It is suitable for big data clustering, particularly in fields such as document classification, customer segmentation, and information retrieval.
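To illustrate what compositing classification labels from overlapping subsets can look like, here is a hedged sketch. The text available here does not specify the composite clustering algorithm, so this example substitutes a simple co-association (evidence accumulation) scheme: two samples are grouped when more than half of the partitions that contain both of them agree. The function names, the 0.5 threshold, and the toy partitions are all assumptions made for illustration.

```python
import numpy as np

def coassociation(partitions, n):
    """Fraction of partitions containing both i and j that co-cluster them.

    `partitions` is a list of (indices, labels) pairs; `indices` are distinct
    sample positions in range(n) and `labels` their cluster labels.
    """
    together = np.zeros((n, n))
    seen = np.zeros((n, n))
    for idx, labels in partitions:
        idx = np.asarray(idx)
        labels = np.asarray(labels)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        seen[np.ix_(idx, idx)] += 1.0
    return np.divide(together, seen, out=np.zeros((n, n)), where=seen > 0)

def consensus_labels(coassoc, threshold=0.5):
    """Label connected components of the graph with edges coassoc > threshold."""
    n = coassoc.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            unvisited = (coassoc[i] > threshold) & (labels == -1)
            for j in np.flatnonzero(unvisited):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Toy partitions over 6 samples; each covers only a subset of the samples,
# and label values are arbitrary per partition.
partitions = [
    ([0, 1, 3, 4], [0, 0, 1, 1]),
    ([1, 2, 4, 5], [1, 1, 0, 0]),
    ([0, 2, 3, 5], [0, 0, 1, 1]),
]
labels = consensus_labels(coassociation(partitions, n=6))
```

The same two functions could be applied at both levels: first to the labels of the longitudinal subsets of one transverse subset (which all share the same samples), then to the resulting labels of the transverse subsets, which cover different samples. Only pairwise agreement is used, so label values need not match across partitions.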

Description

Technical field

[0001] The invention belongs to the field of data mining and relates to a clustering method based on data partitioning, in particular to a combination clustering method for big data.

Background

[0002] Big data has brought unprecedented impact and challenges. Its characteristics are Volume (massive scale), Velocity (high speed), Variety, and Veracity (authenticity). How to mine the potentially valuable information contained in big data has become a hot topic in industry and academia. Big data is characterized by large volume and high dimensionality, which leaves traditional data analysis methods ill-equipped to handle it. Noise attributes and noisy sample points make it harder still to mine the internal model of big data.

Contents of the invention

[0003] In view of the massive high-dimensional problems in big data clustering, the purpose of the present invention is to pro...


Application Information


IPC(8): G06F17/30

CPC: G06F16/2219, G06F16/285

Inventors: 吴俊杰, 伍之昂, 曹杰

Owner: 广东唯审信息科技有限公司
