Hierarchical clustering-based log audit method

Hierarchical clustering combined with log analysis, applied in the field of network security, addresses the inability of existing methods to audit massive logs effectively. It avoids heavy manual audit work and improves the user experience.

Active Publication Date: 2017-02-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] To solve the problem that existing log audit methods cannot effectively audit massive logs or find abnormal data in them, the present invention proposes a log audit method based on hierarchical clustering.



Examples


Embodiment Construction

[0018] The invention performs log audit with a hierarchical clustering method: it clusters the logs and mines abnormal log information from the result.

[0019] The invention is further described below with reference to the accompanying drawings and an embodiment.

[0020] The log audit process is shown in Figure 1.

[0021] Once the original log file has been obtained, the logs must be preprocessed. The present invention preprocesses logs using tf-idf weights. Taking web logs as an example, a common web log is shown in Figure 2.

[0022] This log is clearly semi-structured data containing categorical fields, timestamps, and strings. The timestamps and categorical fields can be extracted directly as separate attribute items. For the variable description strings, tf-idf weights are used to select the keywords of each log entry. The formula is as follows:

[0023]

[0024]

[0025] tf-idf...
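
The keyword selection described in [0022] can be illustrated with a minimal sketch. It assumes the common textbook weighting, where the term frequency is a token's count divided by the message length and the inverse document frequency is the natural log of the number of messages divided by the number of messages containing the token. The patent's own formula may differ in detail. The whitespace tokenizer, the function names, and the `top_k` parameter are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def tokenize(message):
    """Lowercase a log message and split it on whitespace (assumed tokenizer)."""
    return message.lower().split()


def tfidf_keywords(messages, top_k=2):
    """Return the top_k highest-weighted tokens for each log message."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: number of messages that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: tf = count / message length, idf = ln(N / df).
        weights = {
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        }
        # Sort by descending weight, then alphabetically for a stable order.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        keywords.append(ranked[:top_k])
    return keywords


if __name__ == "__main__":
    sample = [
        "GET /index.html 200",
        "GET /admin.php 404",
        "GET /index.html 200",
    ]
    for message, kws in zip(sample, tfidf_keywords(sample)):
        print(message, "->", kws)
```

A token that appears in every message, such as `get` in the sample, receives weight zero and is never preferred over rarer tokens, which is the property that makes the weighting useful for picking out distinguishing keywords.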


Abstract

The invention discloses a hierarchical clustering-based log audit method. A system reads log information, analyzes the keywords in the log information it has read, transforms the log information into word vectors according to the obtained keywords, performs cluster analysis on the resulting word vectors, and displays the small amount of abnormal information obtained by the analysis. The method avoids the heavy work of manual log audit and filters abnormal log information automatically, which improves the user experience.
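
The vectorize-then-cluster steps of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the bag-of-words vectors, cosine distance, average linkage, the distance `threshold`, and the rule that clusters of at most `max_size` members count as abnormal are all choices made here, since the text does not specify them.

```python
import math
from collections import Counter


def vectorize(keyword_lists):
    """Turn per-log keyword lists into bag-of-words count vectors (assumed encoding)."""
    vocab = sorted({kw for kws in keyword_lists for kw in kws})
    vectors = []
    for kws in keyword_lists:
        counts = Counter(kws)
        vectors.append([counts[term] for term in vocab])
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity; all-zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering over log indices.

    Merging stops once the closest pair of clusters is farther apart
    than threshold (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        # Merge cluster b into cluster a and drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def small_clusters(clusters, max_size=1):
    """Return clusters with at most max_size members (assumed anomaly criterion)."""
    return [c for c in clusters if len(c) <= max_size]


if __name__ == "__main__":
    keyword_lists = [
        ["get", "index", "200"],
        ["get", "index", "200"],
        ["get", "index", "200"],
        ["post", "admin", "500"],
    ]
    clusters = agglomerate(vectorize(keyword_lists), threshold=0.5)
    print("clusters:", clusters)
    print("candidate anomalies:", small_clusters(clusters))
```

The pairwise search makes this sketch cubic in the number of logs, so it only suits small samples; a library implementation of hierarchical clustering would be the practical choice for massive logs.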

Description

Technical field

[0001] The invention relates to the field of network security, in particular to a log audit method based on hierarchical clustering.

Background art

[0002] With the development of informatization, network security issues have become increasingly prominent. Logs, as a means of security recording, still play an important role in meeting current security needs. However, traditional log audit methods cannot cope with massive log information. Taking intrusion detection as an example, according to Julisch's investigation, as early as 2000, before network traffic had grown to its present scale, a system typically triggered at least 3 alarm logs per minute. Logs now clearly fall into the category of big data. Massive data makes decision analysis very difficult, and manual analysis is both labor-intensive and error-prone. Using clustering methods from data mining to mine network data has become mainstream. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 孙康, 李千目
Owner: NANJING UNIV OF SCI & TECH