A Log Clustering Method Based on Graph Structure

A graph-structure-based clustering method, applied in the field of text clustering, addresses three problems of existing approaches: the number of log categories cannot be identified automatically, the amount of computation is large, and the clustering is not guaranteed to yield the real number of categories.

Active Publication Date: 2019-11-19
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

AI Technical Summary

Problems solved by technology

Traditional clustering algorithms cannot meet the needs of massive log clustering.
For example, the traditional K-Means and K-Medoid clustering algorithms require the number of clusters to be specified in advance and cannot automatically identify the appropriate number of categories for logs.
To obtain a good clustering effect, the traditional Denclue clustering algorithm needs repeated experiments to find the appropriate number of clusters. Its parameters are difficult to control, the amount of computation is too large, and the clustering cannot guarantee the real number of categories.

Embodiment Construction

[0043] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0044] As shown in Figure 1, the log clustering method based on a graph structure includes: clustering logs based on text word segmentation, vector similarity, and the maximum connected subgraph to obtain a feature library; and labeling massive logs with categories according to the category features in the feature library.
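The two stages in [0044] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that "maximum connected subgraph" refers to the connected components of a graph whose edges join logs with cosine similarity at or above a threshold, it uses a simple regular-expression tokenizer as a stand-in for word segmentation, and the function names, the `threshold` value, and the use of summed term counts as category features are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase word tokens; a stand-in for the word segmentation step."""
    return re.findall(r"[A-Za-z_]+", message.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_logs(messages, threshold=0.6):
    """Group message indices by connected components of the similarity graph."""
    vectors = [Counter(tokenize(m)) for m in messages]
    parent = list(range(len(messages)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge (merge two components) whenever two logs are similar enough.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(messages)):
        groups[find(i)].append(i)
    return list(groups.values())


def build_feature_library(messages, clusters):
    """Assumed feature form: one summed term-count vector per cluster."""
    return [
        sum((Counter(tokenize(messages[i])) for i in cluster), Counter())
        for cluster in clusters
    ]


def label(message, library, threshold=0.6):
    """Return the index of the most similar category, or None if none matches."""
    vector = Counter(tokenize(message))
    scores = [cosine(vector, feature) for feature in library]
    if not scores or max(scores) < threshold:
        return None
    return scores.index(max(scores))
```

The number of clusters is never passed in: it falls out of how many components the similarity graph has. The pairwise comparison here is quadratic in the number of logs, so this form only suits a small sample.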

[0045] 1. Obtaining the feature library includes the following steps:

[0046] (1) Structure the original log to generate structured log data. This includes: inputting the original log, splitting the semi-structured original log into columns, and outputting the structured log data.

[0047] For example, Linux syslog logs have the form shown in Table 1.1; they are structured by column into fields such as Timestamp, Level, Source, and Message. The original syslog becomes the format in Ta...
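The column structuring in step (1) might look like the sketch below. The field names Timestamp, Level, Source, and Message come from the text; the line layout matched by the pattern is an assumption made for illustration, since Table 1.1 is not reproduced here, and `LINE_RE` and `structure_line` are hypothetical names.

```python
import re

# Assumed layout (not taken from Table 1.1):
#   "<Mon> <day> <HH:MM:SS> <LEVEL> <source>: <message>"
LINE_RE = re.compile(
    r"^(?P<Timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s+"
    r"(?P<Level>[A-Z]+)\s+"
    r"(?P<Source>[^:\s]+):\s*"
    r"(?P<Message>.*)$"
)


def structure_line(line):
    """Split one semi-structured log line into named columns.

    Returns a dict keyed by field name, or None if the line does not
    match the assumed layout.
    """
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None
```

Only the Message column would normally feed the later segmentation and similarity steps; the other columns can be kept alongside it as metadata.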

Abstract

The invention provides a log clustering method based on a graph structure. The method comprises: clustering logs based on text segmentation, vector similarity, and a maximum connected subgraph to obtain a feature library; and labelling massive logs by class according to the class features in the feature library. The method automatically identifies the most appropriate number of classes in massive logs without a manually specified cluster count. It also classifies logs precisely, laying a foundation for mining massive log data.

Description

Technical field

[0001] The invention relates to the field of text clustering, in particular to a log clustering method based on a graph structure.

Background

[0002] With the rapid development of information technology and the continuous expansion of cluster scale, massive log data is generated, but this data has not been effectively analyzed and mined. Log data records the operating information of the system, so mining it is of great significance. For example, by analyzing log data, an intelligent operation and maintenance system can be built to perform functions such as fault location and fault warning. Accurate category labeling of logs is an important direction of log data mining. Based on this, we automatically identify the appropriate number of categories for logs by clustering massive logs. By extracting the features of each category, a log category feature library is generated, and new logs are marked according to the category of the feature libr...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/36
CPC: G06F16/355, G06F16/36
Inventor: 吕雁飞王树鹏张鸿丁煜樊冬进肖东方郑亚松周晓阳何慧虹史亮
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT