Abnormity detection method and device for large-scale log data and storage medium

An anomaly detection technology for large-scale log data, applied in neural learning methods, electrical digital data processing, error detection/correction, etc. It aims to make anomaly detection more effective and more efficient.

Pending Publication Date: 2020-10-16
昆山伊莱智能软件科技有限公司 +1

AI Technical Summary

Problems solved by technology

[0005] The present invention primarily aims to solve the problem that current anomaly detection does not take into account the probability distribution of each log's occurrence during large-scale log detection, which results in low anomaly detection efficiency.

Examples

Embodiment 1

[0030] Embodiment 1. An anomaly detection method for large-scale log data, as shown in Figure 1, includes the following steps: input a selected log sequence of a set length into a pre-built machine learning prediction model, and output the conditional probability of each log template appearing at the current position; filter the log templates according to their conditional probabilities to obtain a set of candidate log templates;

[0031] Parse the log to be detected to obtain its log template; determine whether that log template belongs to the set of candidate log templates. If so, the log is judged normal; if not, the log is judged abnormal.
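The steps in [0030] and [0031] can be sketched as follows. This is a minimal illustration, not the patent's implementation: `toy_model` is a hypothetical lookup table standing in for the pre-built prediction model, and the `min_prob` threshold is an assumed filtering rule, since the text here does not state how templates are screened.

```python
from typing import Callable, Dict, Sequence, Set

Model = Callable[[Sequence[str]], Dict[str, float]]


def toy_model(window: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the trained model.

    Maps a window of log templates to the conditional probability of each
    template appearing at the next (current) position.
    """
    table = {
        ("open", "read"): {"read": 0.55, "close": 0.40, "error": 0.05},
    }
    return table.get(tuple(window), {})


def candidate_templates(probs: Dict[str, float], min_prob: float) -> Set[str]:
    """Keep templates whose probability reaches min_prob (assumed rule)."""
    return {template for template, p in probs.items() if p >= min_prob}


def is_normal(
    window: Sequence[str],
    observed_template: str,
    model: Model = toy_model,
    min_prob: float = 0.1,
) -> bool:
    """A log is normal if its template is in the candidate set."""
    return observed_template in candidate_templates(model(window), min_prob)


print(is_normal(["open", "read"], "close"))  # template is a candidate
print(is_normal(["open", "read"], "error"))  # probability below threshold
```

A template the model never predicts for a window is simply absent from the candidate set, so it is reported as abnormal.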

[0032] Observation and study of log sequences show that, in real environments, the number of follow-up logs varies greatly between log sequences. Some log sequences may have many logs following them, and some may only have on...
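To illustrate why this observation matters, the sketch below compares a fixed-size candidate set with one whose size follows the predicted distribution. The source text is truncated before it states its own selection rule, so `by_cumulative_mass` is purely an assumption for illustration, and the two distributions are made-up examples.

```python
from typing import Dict, List


def top_k(probs: Dict[str, float], k: int) -> List[str]:
    """Fixed-size selection: always the k most probable templates."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def by_cumulative_mass(probs: Dict[str, float], mass: float) -> List[str]:
    """Assumed adaptive rule: add templates in descending probability
    until their combined probability reaches `mass`."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for template, p in ranked:
        if total >= mass:
            break
        chosen.append(template)
        total += p
    return chosen


# Made-up distributions: one sequence with a single likely successor,
# one with several plausible successors.
peaked = {"A": 0.97, "B": 0.02, "C": 0.01}
flat = {"A": 0.30, "B": 0.25, "C": 0.25, "D": 0.20}

print(top_k(peaked, 2), top_k(flat, 2))
print(by_cumulative_mass(peaked, 0.9), by_cumulative_mass(flat, 0.9))
```

With a fixed `k`, the peaked case admits an unlikely template while the flat case excludes plausible ones; a distribution-aware rule avoids both.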

Embodiment 2

[0044] Embodiment 2. On the basis of Embodiment 1, this embodiment provides an anomaly detection method for large-scale log data, which further includes reordering the pre-collected, parsed original logs according to the order in which task identifiers appear. The principle of anomaly detection based on sequence prediction is that a model trained on normal logs can mine and identify the normal behavior patterns in the logs, and can therefore predict subsequent logs. However, since many jobs execute concurrently in the system, a single task (uniquely identified by session_id) produces multiple logs. These logs share the same session_id, but they are not consecutive in the original log. To get a better training effect, the original logs need to be reordered according to the order in which task session_ids appear: that is, to arrange multiple logs generated by concurrent execution of a task together, and to sort multiple logs generated by mu...
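The reordering described in [0044] can be sketched as a grouping step. This is a minimal sketch under stated assumptions: each log is represented as a hypothetical `(session_id, message)` pair, and the only ordering applied is the order in which each session_id first appears, with the original order kept inside each session.

```python
from typing import Dict, Iterable, List, Tuple

LogEntry = Tuple[str, str]  # (session_id, message); assumed representation


def reorder_by_session(logs: Iterable[LogEntry]) -> List[LogEntry]:
    """Place all logs of one session together.

    Sessions are emitted in order of first appearance, because Python
    dicts preserve insertion order.
    """
    groups: Dict[str, List[LogEntry]] = {}
    for session_id, message in logs:
        groups.setdefault(session_id, []).append((session_id, message))
    return [entry for entries in groups.values() for entry in entries]


# Interleaved logs from two concurrently running tasks.
raw = [
    ("s1", "start"),
    ("s2", "start"),
    ("s1", "read"),
    ("s2", "read"),
    ("s1", "end"),
]
print(reorder_by_session(raw))
```

After reordering, the three `s1` logs are contiguous and precede the two `s2` logs, so a fixed-length window no longer mixes templates from unrelated tasks.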

Embodiment 4

[0053] Embodiment 4. The present invention provides an anomaly detection device for large-scale log data (structure shown in Figure 5), including a log parsing module, a log template candidate set determination module and a log anomaly detection module:

[0054] The log parsing module is used to parse the log to be detected to obtain its log template;

[0055] The log template candidate set determination module is used to input a selected log sequence of a set length into the pre-built machine learning prediction model, and output the conditional probability of each log template appearing at the current position; according to the conditional probability of each log template, the log templates are filtered to obtain a set of candidate log templates;

[0056] The log anomaly detection module is used to determine whether the log template that the log parsing module obtained for the log to be detected belongs to the set of candidate log templates, and if so, it is judged that the lo...
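The three-module structure can be sketched as three small classes wired together. All class and method names are hypothetical. The number-masking parser and the `min_prob` threshold are simplifying assumptions, and the `model` argument is a placeholder for the pre-built prediction model.

```python
import re
from typing import Callable, Dict, Sequence, Set

Model = Callable[[Sequence[str]], Dict[str, float]]


class LogParsingModule:
    """Derives a template by masking digit runs (assumed, simplistic parser)."""

    _number = re.compile(r"\d+")

    def parse(self, line: str) -> str:
        return self._number.sub("<*>", line)


class CandidateSetModule:
    """Queries the model and filters templates by an assumed threshold."""

    def __init__(self, model: Model, min_prob: float) -> None:
        self.model = model
        self.min_prob = min_prob

    def candidates(self, window: Sequence[str]) -> Set[str]:
        probs = self.model(window)
        return {t for t, p in probs.items() if p >= self.min_prob}


class AnomalyDetectionModule:
    """Flags a log whose template is outside the candidate set."""

    def __init__(self, parser: LogParsingModule, selector: CandidateSetModule) -> None:
        self.parser = parser
        self.selector = selector

    def is_abnormal(self, window: Sequence[str], line: str) -> bool:
        return self.parser.parse(line) not in self.selector.candidates(window)


def placeholder_model(window: Sequence[str]) -> Dict[str, float]:
    # Made-up probabilities; a real model would depend on the window.
    return {"block <*> received": 0.9, "block <*> deleted": 0.02}


detector = AnomalyDetectionModule(
    LogParsingModule(), CandidateSetModule(placeholder_model, min_prob=0.1)
)
print(detector.is_abnormal(["block <*> allocated"], "block 42 received"))
print(detector.is_abnormal(["block <*> allocated"], "block 42 deleted"))
```

Keeping the parser and the candidate selector behind separate classes lets either be swapped (for example, a different template extractor) without touching the detection logic.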

Abstract

The invention discloses an anomaly detection method and device for large-scale log data, and a storage medium. The method comprises the steps of: inputting a selected log sequence of a set length into a pre-constructed machine learning prediction model, and outputting the conditional probability that each log template appears at the current position; screening the log templates according to the conditional probability of each log template to obtain a candidate log template set; parsing the log to be detected to obtain its log template; and judging whether the log template corresponding to the log to be detected belongs to the candidate log template set: if yes, the log is judged normal, and if not, the log is judged abnormal. The method considers the probability distribution of each log's occurrence during large-scale log detection, so that the efficiency of anomaly detection for large-scale log data is remarkably improved.

Description

Technical field

[0001] The invention belongs to the technical field of data security detection, and in particular relates to an anomaly detection method, device and storage medium for large-scale log data.

Background

[0002] During the operation of a Hadoop cluster, a large amount of log information is generated, such as business logs and audit logs. These logs record the system's operating status, security events and the relationships among them, so the security event information contained in the system's operation can be mined from the logs. Existing log anomaly detection methods are based on rule bases, mathematical statistics, machine learning algorithms, and deep learning neural networks. The rule-base method mainly uses rule matching. Its advantage is high accuracy. Its disadvantage is that it is limited to specific scenarios, can only target specific log types, and it is difficult to analyze unknown se...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/30, G06F8/658, G06N3/04, G06N3/08
CPC: G06F11/3065, G06F8/658, G06N3/08, G06N3/044, G06N3/045
Inventors: 李颉, 徐荣, 李德宇, 王欢
Owner: 昆山伊莱智能软件科技有限公司