Method for anomaly detection and diagnosis in distributed iterative data processing programs

A technology for data processing and program anomalies, applied in electrical digital data processing, special data processing applications, error detection/correction, etc. It addresses problems such as anomalies in distributed iterative data processing programs, with the effects of reducing program diagnosis and tuning time, lowering the threshold of use, and enabling efficient matching and mining.

Active Publication Date: 2016-09-28
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose a method for anomaly detection and diagnosis of distributed iterative data processing programs in vi...



Examples


Detailed Description of the Embodiments

[0029] The method for anomaly detection and diagnosis in distributed iterative data processing programs proposed by the present invention is further described below with reference to the accompanying drawings and specific embodiments.

[0030] The method proposed by the present invention comprises two stages: model training, and detection and diagnosis.

[0031] The workflow of the model training stage is shown in Figure 1 and includes the following steps:

[0032] (1) Input logs as the training set (these are known logs generated by previous program runs) and determine whether a log template library exists. If the log template library does not exist, go to step (2) to construct it; if it exists, go to step (3) to match the logs against it and extract features. The log template library is made up ...
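The branch in step (1) can be illustrated with a minimal Python sketch. It only shows the "load the template library if it exists, otherwise build it" decision. The JSON file format, the function names `build_template_library` and `load_or_build_library`, and the idea of passing format strings directly are all assumptions made for illustration; the patent text shown here does not specify how the library is stored or how step (2) builds it.

```python
import json
from pathlib import Path


def build_template_library(format_strings):
    """Hypothetical stand-in for step (2): assign an id to each log format string.

    In the patent the templates come from source code analysis; here they are
    simply passed in as a list.
    """
    return {str(i): fmt for i, fmt in enumerate(format_strings)}


def load_or_build_library(path, format_strings):
    """Step (1): reuse the template library if it exists, otherwise build it."""
    path = Path(path)
    if path.exists():
        # Library already exists: the caller can go straight to step (3).
        return json.loads(path.read_text(encoding="utf-8")), "loaded"
    # Library is missing: run the (hypothetical) step (2) and persist the result.
    library = build_template_library(format_strings)
    path.write_text(json.dumps(library), encoding="utf-8")
    return library, "built"
```

On a first call with a fresh path the function reports `"built"`; a later call with the same path reports `"loaded"` and returns the same templates.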



Abstract

The invention relates to a method for detecting and diagnosing anomalies in distributed iterative data processing programs and belongs to the technical field of computer data management. The method comprises two stages: model training, and detection and diagnosis. First, a log template library is extracted by source code analysis. Feature vectors corresponding to data partitions and task units are then extracted from massive log data according to the template library and used to train an anomaly detection model, and anomaly types are labeled using domain knowledge. The trained model is used to detect anomalies in future computing tasks, and a visualization interface displays the anomaly analysis. Feature units are associated with code location information, so that an abnormal feature unit can finally be mapped back to the program code, achieving program anomaly detection and diagnosis. The method detects and diagnoses the main anomalies of distributed iterative data processing programs from multiple angles, is intuitive and easy to use, and offers good interactivity, thus greatly reducing the time users spend on program anomaly detection and diagnosis.
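As a rough illustration of the pipeline in the abstract, the following Python sketch matches log messages against templates, counts template occurrences per task unit to form feature vectors, and flags outlying tasks. Several parts are assumptions: the three templates in `TEMPLATES` are hand-written examples rather than the output of source code analysis, the `(task_id, message)` record format is invented, and the z-score rule is a simple stand-in because the text shown here does not name the anomaly detection model.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical log templates; the patent derives these from source code analysis.
TEMPLATES = {
    "task_start": "Starting task %d in stage %d",
    "task_finish": "Finished task %d in stage %d in %d ms",
    "spill": "Spilling %d MB to disk",
}


def template_to_regex(template):
    """Turn a printf-style template into a regex that matches whole messages."""
    pattern = ""
    for part in re.split(r"(%d|%s)", template):
        if part == "%d":
            pattern += r"-?\d+"
        elif part == "%s":
            pattern += r"\S+"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


COMPILED = {name: template_to_regex(t) for name, t in TEMPLATES.items()}


def match_template(message):
    """Return the name of the first template that matches, or None."""
    for name, regex in COMPILED.items():
        if regex.fullmatch(message):
            return name
    return None


def feature_vectors(records):
    """Count template occurrences per task; records are (task_id, message) pairs."""
    counts = {}
    for task_id, message in records:
        name = match_template(message)
        if name is not None:
            counts.setdefault(task_id, Counter())[name] += 1
    return {tid: [c[n] for n in TEMPLATES] for tid, c in counts.items()}


def detect_anomalies(vectors, threshold=2.0):
    """Flag tasks whose count for any template deviates strongly (z-score stand-in)."""
    task_ids = list(vectors)
    flagged = set()
    for col in range(len(TEMPLATES)):
        values = [vectors[t][col] for t in task_ids]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every task has the same count, so nothing stands out
        for tid, value in zip(task_ids, values):
            if abs(value - mu) / sigma > threshold:
                flagged.add(tid)
    return sorted(flagged)
```

Counting per template keeps each feature traceable to one logging statement, which is what would allow a flagged feature to be mapped back to a code location as the abstract describes; that mapping itself is not sketched here.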

Description

Technical field

[0001] The invention belongs to the technical field of computer data management, and in particular relates to a method for detecting and diagnosing anomalies in distributed iterative data processing programs.

Background

[0002] With the development of the big data era, more and more service applications run on distributed systems, and the machine clusters on which these systems are deployed keep growing in scale. In a complex distributed system, when a program's performance is abnormal, how to detect and diagnose the anomaly quickly and effectively, and thereby help developers optimize the program, has become an important issue in the field of distributed systems. As a popular distributed iterative system (a type of distributed system), Spark has been widely used in industry. For example, Baidu, Alibaba, and Tencent in China have deployed large-scale Spark for machine learning, graph computing, stream processing and other...


Application Information

IPC(8): G06F17/30, G06F11/07
CPC: G06F11/0751, G06F16/1815, G06F16/182
Inventors: 王建民, 龙明盛, 唐亚腾, 黄向东
Owner: TSINGHUA UNIV