Multi-partition clustering preprocessing method of stream data

The method applies streaming-data and multi-partition techniques to the DBSCAN clustering algorithm. It addresses poor clustering quality, large distances between clusters, and declining algorithm efficiency, and aims to improve clustering quality, make the data distribution more uniform, and reduce the workload.

Active Publication Date: 2017-04-19
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

The general-purpose density-based clustering algorithm DBSCAN has two weaknesses. First, the entire database must be loaded into memory during clustering, so the efficiency of the algorithm drops sharply when the amount of data is large.
Second, when the density of the spatial clusters is not uniform and the distance between clusters varies greatly, the clustering quality is poor.
Improved grid-based and density-based clustering algorithms still have to process a very large range of data, which results in low clustering efficiency.

Examples

Embodiment Construction

[0047] In order to better understand the present invention, the content of the present invention will be further described below in conjunction with the accompanying drawings.

[0048] With reference to Figure 1, the multi-partition clustering preprocessing method for streaming data of the present invention improves the partitioning scheme of the density-based clustering algorithm and comprises the following steps:

[0049] Step 1. Determine the scope of the situational factors of the streaming data, and screen these factors according to their correlation with the network security situation;

[0050] The situational factors of streaming data are multi-source heterogeneous observation data obtained from intrusion detection logs, the running status of host equipment, node traffic monitoring equipment, and real-time alarm systems. Many factors affect the security situation. The present invention utilizes the gray relational degree analysis method in the gray theory t...
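The source text breaks off before giving its formula, so the following is only a minimal sketch of how gray relational degree analysis is commonly computed (Deng's formulation), not the patent's confirmed procedure. The function names `grey_relational_degrees` and `screen_factors`, the distinguishing coefficient `rho = 0.5`, the min-max normalisation, and the screening `threshold` are all assumptions made for illustration.

```python
def grey_relational_degrees(reference, factors, rho=0.5):
    """Return one grey relational degree per factor series.

    Assumed formulation (Deng): min-max normalise every series, take the
    absolute difference to the reference at each time step, convert the
    differences to relational coefficients, and average them.
    """
    def normalise(series):
        lo, hi = min(series), max(series)
        span = hi - lo
        # A constant series carries no shape information; map it to zeros.
        return [(x - lo) / span if span else 0.0 for x in series]

    ref = normalise(reference)
    normalised = [normalise(f) for f in factors]
    deltas = [[abs(r - x) for r, x in zip(ref, f)] for f in normalised]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every factor matches the reference exactly after normalisation.
        return [1.0] * len(factors)
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


def screen_factors(reference, named_factors, threshold=0.6):
    """Keep only the factors whose relational degree reaches the threshold."""
    names = list(named_factors)
    degrees = grey_relational_degrees(reference, [named_factors[n] for n in names])
    return {n: d for n, d in zip(names, degrees) if d >= threshold}
```

Here `reference` would be a series describing the network security situation and each entry of `named_factors` a candidate situational factor sampled at the same time steps. Factors that pass the threshold are the high-correlation ones; where to set the threshold is a tuning decision that the truncated source does not specify.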

Abstract

The invention relates to a multi-partition clustering preprocessing method for streaming data. The method includes the following steps: the trend factors of the streaming data are screened and a correlation degree is calculated; statistical analysis is performed on the streaming data, a low-dimensional database partitioning method is adopted for high-correlation factors in a high-dimensional database, and isometric triangle partitioning is performed on low-correlation factors; a distribution-based partitioning method is adopted for a low-dimensional database; the DBSCAN (density-based spatial clustering of applications with noise) algorithm is used to perform clustering within each rule partition; and the local clustering results are merged. The method applies multi-partition improvements to the clustering preprocessing of streaming data, so that the data distribution is more uniform, the clustering results are more accurate, and distributed parallel processing of the data is realized. This alleviates the low preprocessing efficiency caused by large numbers of data sequences that arrive sequentially, rapidly, and continuously.
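The partition, cluster, and merge steps named in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: equal-count strips along the first coordinate stand in for the distribution-based partitioning, each strip is widened by `eps` so that neighbourhood counts of its own points are exact, and local clusters are merged with a union-find whenever they share a point that is a core point in the strip that owns it. All names (`dbscan`, `quantile_bounds`, `partitioned_dbscan`) are hypothetical.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, core_flags)."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:                      # expand the cluster through core points
            p = queue.popleft()
            for j in neighbours[p]:
                if labels[j] == NOISE:
                    labels[j] = cluster
                    if core[j]:
                        queue.append(j)
        cluster += 1
    return labels, core


def quantile_bounds(points, k):
    """Cut the first coordinate into k strips with roughly equal point counts."""
    xs = sorted(p[0] for p in points)
    cuts = [xs[(len(xs) * j) // k] for j in range(1, k)]
    return [-math.inf] + cuts + [math.inf]


def partitioned_dbscan(points, eps, min_pts, k):
    """Cluster each strip independently, then merge the local results."""
    bounds = quantile_bounds(points, k)
    parent = {}                           # union-find over (strip, local_label)

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    true_core = [False] * len(points)
    seen = [[] for _ in points]           # local cluster memberships per point
    for s in range(k):
        lo, hi = bounds[s], bounds[s + 1]
        # Owned points plus an eps-wide halo on both sides of the strip.
        idx = [i for i, p in enumerate(points) if lo - eps <= p[0] < hi + eps]
        labels, core = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab, is_core in zip(idx, labels, core):
            if is_core and lo <= points[i][0] < hi:
                true_core[i] = True       # core status is exact for owned points
            if lab != NOISE:
                seen[i].append((s, lab))
    for i, memberships in enumerate(seen):
        if true_core[i]:                  # a shared core point joins local clusters
            for c in memberships[1:]:
                parent[find(c)] = find(memberships[0])
    ids, result = {}, []
    for memberships in seen:
        if not memberships:
            result.append(NOISE)
        else:
            result.append(ids.setdefault(find(memberships[0]), len(ids)))
    return result
```

Because each strip is clustered without reference to the others, the loop over strips could be handed to separate workers, which is one way to read the abstract's claim of distributed parallel processing. Only the merge step needs the combined results.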

Description

Technical field

[0001] The invention relates to the technical field of the DBSCAN clustering algorithm, in particular to a multi-partition clustering preprocessing method for streaming data.

Background technique

[0002] With the advent of the era of big data, data is increasingly transmitted in the form of data streams. Streaming data has the following four characteristics: 1) the data arrives in real time; 2) the order of data arrival is independent and not controlled by the application system; 3) the data scale is huge and its maximum value cannot be predicted; 4) once the data has been processed, unless it is specially saved, it cannot be retrieved and processed again, or retrieving it again is expensive. These characteristics of streaming data also lead to the expansion of security audit data. Data mining is a common knowledge discovery technology whose purpose is to extract the data information (knowledge) we are interested in from massive...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/24568, G06F16/285
Inventors: 王烁, 李千目, 戚湧, 王印海
Owner: NANJING UNIV OF SCI & TECH