Hadoop-based k-means clustering analysis system and method of network security log

A network security and cluster analysis technology, applied in the field of computer information processing. It addresses the poor scalability of traditional data warehouses and their inability to mine the intrinsic value of massive heterogeneous data, with the effects of improved computing power, more efficient query analysis, and mining of potential value.

Inactive Publication Date: 2015-12-09
NORTHWEST UNIV(CN)
5 Cites, 35 Cited by

AI Technical Summary

Problems solved by technology

[0008] To overcome the above deficiencies of the prior art, the purpose of the present invention is to provide a Hadoop-based k-means clustering analysis system and method for network security logs. While making rational use of the traditional data warehouse that has already been built, it integrates a big data platform to establish a unified data storage and data processing architecture. This overcomes the shortcomings of traditional data warehouses: poor scalability, suitability only for structured data, and inability to mine the intrinsic value of massive heterogeneous data.


Examples


Embodiment

[0099] First, build a Hadoop distributed cluster environment consisting of 5 PCs: one master server and four slave servers. Configure Hadoop on each machine, then install and configure Sqoop, Hive, and MySQL on the NameNode. This embodiment uses the log records of all security devices in Shaanxi Li'an Electric Supermarket; the file size is 16 GB. The logs are updated daily as required, and query statistics are computed as part of the update process.

[0100] This method enables fast statistical queries through Hive. Its advantages are a low learning cost and the ability to implement simple MapReduce statistics quickly with SQL-like statements, without developing a dedicated MapReduce application, which makes it well suited to statistical analysis in data warehouses. Partitioning restricts a query to the relevant data shards, which speeds up queries and improves efficiency. Implement the k-means algorithm through MapReduce, and evaluate the securit...
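The MapReduce formulation of k-means mentioned in this paragraph can be sketched in plain Python. This is a minimal single-process illustration, not the patent's implementation: the function names (`map_phase`, `reduce_phase`, `kmeans`), the two-dimensional feature vectors, and the stopping tolerance are all assumptions made for the example. In a real Hadoop job the map and reduce steps would run as distributed tasks over HDFS blocks, and the feature extraction from log records would have to be defined separately.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every record."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: replace each centroid with the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20, tol=1e-6):
    """Repeat map + reduce until the centroids move less than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids


# Hypothetical feature vectors, e.g. (events per minute, distinct source IPs).
sample = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
print(kmeans(sample, [(0.0, 0.0), (10.0, 10.0)]))
```

Each iteration corresponds to one MapReduce job: the mapper needs only the current centroids and its own records, so the assignment step parallelizes across the cluster, while the reducer handles one centroid key at a time.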



Abstract

The invention provides a Hadoop-based k-means clustering analysis system and method for network security logs. The system comprises a log data acquisition subsystem, a log data hybrid storage management subsystem, and a log data analysis subsystem. In the method, the data storage layer stores log data using a hybrid storage mechanism in which Hadoop cooperates with a traditional data warehouse. The data access layer provides a Hive operation interface; the data storage layer and the computing layer receive instructions from the Hive engine, and HDFS works together with MapReduce to achieve efficient query analysis of the data. For mining analysis of log data, MapReduce is used to run clustering analysis on the network security logs with the k-means algorithm. The architecture in which Hadoop cooperates with the traditional data warehouse overcomes the defects of the traditional data warehouse in mass data processing, storage, and the like, while making full use of the existing traditional data warehouse. Clustering analysis with the MapReduce-based k-means algorithm allows timely security grade evaluation and early warning on log data.

Description

Technical field

[0001] The invention belongs to the technical field of computer information processing, and in particular relates to a Hadoop-based k-means clustering analysis system and method for network security logs.

Background art

[0002] With the explosion of data and the sharp increase in the amount of information, the existing traditional data warehouses of enterprises can no longer keep up with the growth rate of data. Traditional data warehouses are usually built on high-performance all-in-one machines, which are costly and scale poorly, and they are only good at processing structured data. This limitation hinders traditional data warehouses from mining the intrinsic value of massive heterogeneous data. This is the biggest difference between Hadoop and traditional data processing methods. We need to make reasonable use of the existing traditional data warehouse of the enterprise, and at the same time integrate the exist...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/182, G06F16/2457, G06F16/25
Inventor: 高岭, 苏蓉, 高妮, 王帆, 杨建锋, 雷艳婷, 申元
Owner: NORTHWEST UNIV(CN)