BERT anomaly detection method and equipment based on template sequence or word sequence

An anomaly detection technology based on template sequences, applied in the field of log detection, addresses the problem that an anomaly detection model cannot achieve good detection results, and it reduces training costs while improving anomaly detection performance.

Pending Publication Date: 2021-07-13
CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0005] To sum up, the applicant found that, in the field of log anomaly detection, a large data set is required to train the anomaly detection model; without it, the model cannot achieve a good detection effect.



Examples


Example 1

[0040] Referring to Figures 1 to 4, an embodiment of the present invention provides a BERT anomaly detection method based on a word sequence, comprising the following steps:

[0041] Step S101, obtaining a plurality of original log messages;

[0042] Log messages are collected through plug-ins, such as log4net on the .NET platform, and log4j and slf4j on the Java platform. As shown in Figure 2, the first block from the left contains 9 original log messages; for example, the third log message is "-1117848119 2005.06.03R16-M1-N2-C:J17-U012005-06-03-18.21.59.871925 R16-M1-N2-C: J17-U01 RAS KERNEL INFO CE SYM2, AT 0X0B85EEE0, MASK0X05".

[0043] Step S102, performing log parsing on each original log message to obtain the log event corresponding to each original log message;

[0044] In this embodiment, the Drain log parsing tool is used to perform log parsing on all the obtained original log messages, and obtain the log e...
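The idea of turning raw messages into log events can be illustrated with a much simplified stand-in for a log parser. The sketch below is not the Drain algorithm (Drain groups messages with a fixed-depth parse tree); it only masks tokens that contain a digit with the wildcard `<*>` and assigns an event ID to each distinct template. The names `to_template` and `parse_logs`, the masking rule, and the `E1`, `E2`, ... ID format are assumptions made for illustration.

```python
import re

# Assumption for this sketch: any token containing a digit is a variable
# parameter. Drain itself decides this with a fixed-depth parse tree.
_VARIABLE = re.compile(r"\d")


def to_template(message: str) -> str:
    """Replace variable-looking tokens with the wildcard <*>."""
    tokens = message.split()
    return " ".join("<*>" if _VARIABLE.search(tok) else tok for tok in tokens)


def parse_logs(messages):
    """Map each message to (event_id, template).

    Event IDs (E1, E2, ...) are assigned in first-seen order, so messages
    that share a template also share an event ID.
    """
    template_ids = {}
    events = []
    for msg in messages:
        template = to_template(msg)
        event_id = template_ids.setdefault(template, f"E{len(template_ids) + 1}")
        events.append((event_id, template))
    return events
```

A production pipeline would use a real parser such as Drain; the sketch only shows the shape of the output that the later steps consume.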

Example 2

[0067] Referring to Figures 2 to 5, an embodiment of the present invention provides a BERT anomaly detection method based on a template sequence, comprising the following steps:

[0068] Step S201, obtaining a plurality of original log messages;

[0069] Step S202, performing log parsing on each original log message to obtain the log event corresponding to each original log message;

[0070] For the detailed introduction of step S201 and step S202, reference may be made to the first embodiment, and details are not repeated here.

[0071] Step S203, dividing all log events into a corresponding number of template sequences by using a window division method;

[0072] After the original semi-structured log messages are converted into structured log events, step S203 uses a fixed-window technique to divide the logs into log sequences. Note that, in this field, a log sequence denotes the log messages that fall within the same window. In t...
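A minimal sketch of fixed-window division follows. The source does not say whether the window is defined by a number of events or by a time span; this sketch assumes a window of a fixed number of consecutive log events, and the function name `fixed_windows` and the parameter `window_size` are hypothetical.

```python
from typing import List, Sequence


def fixed_windows(event_ids: Sequence[str], window_size: int) -> List[List[str]]:
    """Split an ordered list of event IDs into non-overlapping windows.

    Each window becomes one template sequence. The last window may be
    shorter than window_size if the events do not divide evenly.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        list(event_ids[start:start + window_size])
        for start in range(0, len(event_ids), window_size)
    ]
```

A time-based variant would group events by timestamp intervals instead of by count; the output shape (a list of event-ID sequences) would be the same.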

Example 3

[0084] Referring to Figures 6 to 8, this embodiment uses the BGL data set generated by the BlueGene/L supercomputer system at Lawrence Livermore National Labs (LLNL). Table 2 below shows some basic information about the BGL data set. The BGL data set contains 4,747,963 original log messages, of which 348,460 are abnormal, so the number of normal log messages is 4,399,503. All experiments are run on the Google Colab cloud platform (https://colab.research.google.com), which provides an online deep learning server with an 8-core Gold 6148 CPU, a Tesla K80 GPU and 25.51 GB of RAM. Three performance evaluation indicators commonly used in machine learning are used to evaluate the quality of the model, namely accuracy rate, recall rate, and F1-score.

[0085]

System | Time span | Data size | Number of log messages | Number of abnormal log messages
BGL | 7 months | 708M | 4,747,963 | 348,460

[0086] Table 2
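The data set arithmetic and the evaluation indicators can be sketched as follows. The message counts come from Table 2; the function `evaluate` and its confusion-matrix arguments are hypothetical names. Because F1-score is defined from precision and recall, the sketch reports precision alongside accuracy; which of the two the source means by "accuracy rate" is not stated.

```python
# Counts taken from Table 2 of the source.
TOTAL_MESSAGES = 4_747_963
ABNORMAL_MESSAGES = 348_460
NORMAL_MESSAGES = TOTAL_MESSAGES - ABNORMAL_MESSAGES


def evaluate(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts.

    tp: abnormal sequences flagged as abnormal
    fp: normal sequences flagged as abnormal
    fn: abnormal sequences missed
    tn: normal sequences correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With roughly 7% of the messages abnormal, accuracy alone can look high even for a weak detector, which is one reason recall and F1-score are reported alongside it.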

[0087] In order to verify the rationality and advancement of the present...


Abstract

The invention discloses a BERT anomaly detection method and equipment based on a template sequence or a word sequence. The method first converts original log messages into template sequences or word sequences, then uses these sequences as the input for training a BERT model, and finally uses the trained BERT model to detect anomalies in the template sequences or word sequences under test. A good anomaly detection effect can be achieved with fewer training labels; compared with the prior art, the training cost is greatly reduced and the anomaly detection effect is improved.
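How a template sequence might be prepared as BERT-style input can be sketched in plain Python. The source does not describe the tokenization, so everything here is an assumption: each event ID is treated as one token, the special tokens follow the usual BERT convention (`[PAD]`, `[UNK]`, `[CLS]`, `[SEP]`), and the names `build_vocab` and `encode` are hypothetical. Training the model itself is out of scope for the sketch.

```python
from typing import Dict, List, Sequence, Tuple

# Conventional BERT special tokens; their use here is an assumption.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer ID to every special token and every event ID seen."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for seq in sequences:
        for event_id in seq:
            vocab.setdefault(event_id, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int],
           max_len: int) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both of length max_len.

    The sequence is wrapped in [CLS] ... [SEP], truncated to fit, and
    padded with [PAD]. Event IDs not in the vocabulary map to [UNK].
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    tokens = ["[CLS]"] + list(seq)[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * pad, mask + [0] * pad
```

For the word-sequence variant, the same encoding would apply with the words of each log event in place of event IDs, typically using a pretrained BERT tokenizer rather than a hand-built vocabulary.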

Description

Technical field

[0001] The invention relates to the technical field of log detection, in particular to a BERT anomaly detection method and equipment based on a template sequence or a word sequence.

Background

[0002] In the past, supervised methods required a large amount of labeled data to train the model, so in order to obtain a good classification model, the amount of labeled data is particularly important. Within a certain range, the larger the amount of labeled data, the better the trained classification model. Therefore, an ideal supervised classification model must be trained from a large amount of labeled data.

[0003] Both unsupervised classification models and supervised classification models require a large amount of training data. When the training data is small, the effect of the trained classification model is not ideal. Therefore, the effect of the unsupervised classification model is also determined by the amount of training data. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/186, G06F40/284, G06N3/08
CPC: G06F40/284, G06F40/186, G06N3/08, G06F18/214
Inventor: 王进, 唐杨宁, 何施茗, 曹敦, 张经宇
Owner: CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY