Big data stream type cluster processing system and method for on-demand clustering

A data stream processing system technology in the field of data processing. It addresses problems such as the inability to effectively improve resource utilization efficiency, and achieves fast processing with enhanced scalability and sensitivity.

Active Publication Date: 2013-10-16
SOUTH CHINA NORMAL UNIVERSITY
Cites: 0; Cited by: 12

AI Technical Summary

Problems solved by technology

Secondly, existing frameworks essentially process data continuously: clustering is applied to the data of every sliding window (also called a data segment), which cannot ...



Examples


First embodiment

[0031] Referring to figure 1, the first embodiment of the present invention is a big data streaming cluster processing system for on-demand clustering. The system includes a fast calculation module, a data concept drift detection module, and a clustering module. The output terminal of the fast calculation module is connected to the first input terminal of the clustering module through the data concept drift detection module, and the clustering module is also connected to the fast calculation module.
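The module wiring described in [0031] can be sketched as follows. This is a minimal sketch under stated assumptions: the class names, the `process`/`receive_result`/`feed` methods, and the use of plain callables for each module's logic are hypothetical, because the patent names the modules and their connections but not their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical type for the fast module's intermediate result.
Summary = List[float]


@dataclass
class FastCalculationModule:
    """Receives raw input, produces intermediate results, and serves clustering results."""
    extract: Callable[[Sequence[float]], Summary]
    latest_result: Optional[List[int]] = None

    def process(self, segment: Sequence[float]) -> Summary:
        return self.extract(segment)

    def receive_result(self, labels: List[int]) -> None:
        # Clustering module -> fast calculation module link: results flow back for output.
        self.latest_result = labels


@dataclass
class DriftDetectionModule:
    is_drift: Callable[[Summary], bool]


@dataclass
class ClusteringModule:
    cluster: Callable[[Summary], List[int]]


@dataclass
class OnDemandPipeline:
    fast: FastCalculationModule
    drift: DriftDetectionModule
    clusterer: ClusteringModule
    runs: int = 0  # how many times clustering was actually triggered

    def feed(self, segment: Sequence[float]) -> Optional[List[int]]:
        summary = self.fast.process(segment)
        # The fast module's output reaches the clusterer only through the drift detector.
        if self.drift.is_drift(summary):
            labels = self.clusterer.cluster(summary)
            self.runs += 1
            self.fast.receive_result(labels)
        # Without drift, the most recent clustering result keeps being served.
        return self.fast.latest_result
```

Passing the detector and clusterer in as callables keeps the clustering module independent, so an existing clustering algorithm could be plugged in without changing the wiring.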

[0032] The fast calculation module receives the input data and outputs the clustering results. It performs fast, lightweight processing of the data stream and produces intermediate results for the other modules to process further. There are two main approaches to fast processing: data filtering and data feature extraction. The former reduces computation by reducing the amount of data in the data stream, such as data fil...
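To illustrate the two fast-processing approaches in [0032], the sketch below filters a stream and then replaces each surviving data point with a fixed-size feature vector. The filtering rule (drop empty points and points containing values beyond `max_abs`) and the [mean, min, max] features are hypothetical choices for illustration, not the patent's filtering strategy.

```python
from statistics import mean
from typing import Iterable, Iterator, List, Sequence


def filter_stream(points: Iterable[Sequence[float]], max_abs: float) -> Iterator[Sequence[float]]:
    """Data filtering: lazily drop empty points and points with out-of-range values."""
    for p in points:
        if p and all(abs(v) <= max_abs for v in p):
            yield p


def extract_features(point: Sequence[float]) -> List[float]:
    """Feature extraction: replace a possibly large point with a fixed-size summary."""
    return [mean(point), min(point), max(point)]


def fast_calculate(points: Iterable[Sequence[float]], max_abs: float = 100.0) -> List[List[float]]:
    """Intermediate results for downstream modules: filter first, then summarize."""
    return [extract_features(p) for p in filter_stream(points, max_abs)]
```

Filtering before extraction means feature computation is spent only on points that survive, which matches the goal of reducing the data volume before further processing.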

Second embodiment

[0046] The second embodiment of the present invention is a big data stream clustering processing method for on-demand clustering. The method includes the following steps:

[0047] A. Perform data filtering and data feature extraction on the input data stream to obtain intermediate processing results;

[0048] The method performs the corresponding data filtering operations according to the data filtering strategy, reducing the amount of data in the processed data stream through data filtering, data sampling, and load shedding. Data feature extraction extracts summary information from the data. The data in a data stream is complex, and each data point may be large; to obtain better clustering results, the most important summary information must be extracted from the data. Extracting the features of the data reduces the storage required for each data point, so that the subsequent pro...
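For the data sampling step in [0048], one standard one-pass technique is reservoir sampling (Algorithm R). It is shown here as a swapped-in example, since the patent does not say which sampling method it uses. It maintains a uniform random sample of `k` items from a stream of unknown length while storing only `k` items.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform sample of k items from a stream, read in a single pass."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint bounds are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing an explicit `random.Random` instance makes runs reproducible, which is useful when comparing clustering results on sampled data.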


Abstract

The invention discloses a big data stream type cluster processing system for on-demand clustering. The system comprises a fast computation module, a data concept drift detection module, and a clustering module; the output end of the fast computation module is connected to the first input end of the clustering module through the data concept drift detection module, and the clustering module is connected to the fast computation module. Targeting the massive, similar, and repetitive nature of big data, the invention provides an on-demand clustering model based on data concept drift detection that uses a trigger-based clustering mode, which guarantees accuracy while providing on-demand clustering and real-time clustering result services. In addition, a resource monitoring module and an independent module for clustering processing are provided, so that existing traditional clustering algorithms can be used effectively, the scalability and sensitivity of the system are enhanced, and fast processing of data streams in a big data environment is achieved efficiently. The system can be widely applied in the field of data processing.
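The trigger-based mode described in the abstract re-clusters only when concept drift is detected, rather than clustering every sliding window. The sketch below assumes a simple mean-shift test as the drift detector (a stand-in, not the patent's detection method) and a caller-supplied `cluster` function; `mean_shift_drift`, `triggered_clustering`, and the threshold `z` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def mean_shift_drift(reference: Sequence[float], current: Sequence[float], z: float = 3.0) -> bool:
    """Flag drift when the current window's mean is more than z standard errors
    away from the reference window's mean (a simple stand-in detector)."""
    sd = pstdev(reference)
    if sd == 0:
        # A constant reference: any change in the mean counts as drift.
        return mean(current) != mean(reference)
    se = sd / len(current) ** 0.5
    return abs(mean(current) - mean(reference)) > z * se


def triggered_clustering(
    windows: List[Sequence[float]],
    cluster: Callable[[Sequence[float]], object],
) -> Tuple[List[object], int]:
    """Cluster the first window, then re-cluster only when drift is detected.

    Returns the result served for each window and the number of clustering runs.
    Assumes `windows` is non-empty.
    """
    reference = windows[0]
    result = cluster(reference)
    runs = 1
    results = [result]
    for w in windows[1:]:
        if mean_shift_drift(reference, w):
            reference = w  # the drifted window becomes the new reference
            result = cluster(w)
            runs += 1
        results.append(result)  # without drift, the previous result is reused
    return results, runs
```

For five windows whose mean jumps once, for example `[[1, 2, 1, 2], [2, 1, 2, 1], [1, 2, 2, 1], [10, 11, 10, 11], [11, 10, 11, 10]]`, clustering runs twice instead of once per window.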

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to an on-demand clustering big data streaming clustering processing system and method.

Background art

[0002] Explanation of terms:

[0003] Big Data: data whose volume is so large that current mainstream software tools cannot capture, manage, process, and organize it within a reasonable time into information that helps enterprises make more proactive business decisions. The 4V characteristics of big data are Volume, Velocity, Variety, and Veracity.

[0004] Data Streams: a data stream is a sequence of timestamped data that arrives in a certain order and can only be appended to. An ordered sequence of data points (x1, x2, ..., xn) must be accessed in order and can be read only once or a very small number of times; this sequential reading of the data is also called a linear scan or one-pass processing.

[0005] Data Mining: Refers to the non-trivial process of revealing implicit, previous...
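One-pass processing in the sense of [0004] means each data point is read once, in arrival order, using bounded state. Welford's online algorithm for the mean and variance is a well-known example of such a linear scan; it is included only as an illustration of the definition, and the `RunningStats` and `scan` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """One-pass (linear-scan) mean and population variance via Welford's algorithm."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean; this keeps the update numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 before any data arrives.
        return self.m2 / self.n if self.n else 0.0


def scan(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for x in stream:  # each point is read exactly once, in arrival order
        stats.update(x)
    return stats
```

The state is three numbers regardless of stream length, which is what makes this kind of computation suitable when points cannot be revisited.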


Application Information

IPC(8): G06F17/30
Inventors: 赵淦森 (Zhao Gansen), 虞海 (Yu Hai), 王维栋 (Wang Weidong), 卓超 (Zhuo Chao)
Owner SOUTH CHINA NORMAL UNIVERSITY