
A Spark Straggler Task Diagnosis Method Based on Statistical Analysis

This diagnostic method applies statistical analysis in the fields of computing and electrical digital data processing. It addresses two problems: existing diagnostic procedures are difficult to extend, and the cause of a delayed task is difficult to explain. It achieves wide applicability, accurate analysis results, and improved analysis efficiency.

Inactive Publication Date: 2019-01-18
JIANGSU HOPERUN SOFTWARE CO LTD

AI Technical Summary

Problems solved by technology

Existing Straggler mitigation strategies only deal with the symptoms as they occur, without diagnosing the cause of the problem.
Although the large number of task execution traces generated in a data center can help diagnose Stragglers, it is very difficult to extract and explain the cause of the problem from such a complex data set.
Existing diagnostics rely on system domain knowledge and applied best practices and are difficult to scale (Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective Straggler Mitigation: Attack of the Clones. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. 185–198.) (Jeffrey Dean and Luiz André Barroso. 2013. The tail at scale. Communications of the ACM 56(2): 74–80.)




Embodiment Construction

[0016] The present invention and the method flow of its embodiments are described in detail below with reference to specific embodiments and the drawings:

[0017] The operating environment for the method of this embodiment is shown in Figure 1: five physical hosts are deployed to form a cluster. Each server is configured with a 16-core Intel Xeon E5-5620 CPU, 32 KB of first-level cache, 256 KB of second-level cache, 12 MB of third-level cache, and 16 GB of memory. The servers are connected by a 1 Gbps network. Spark 2.2 and HDFS 2.2 are deployed on the cluster. One server is the Master node, three servers are Slave nodes, and the remaining server is the monitoring data collection and analysis node. The operating system is CentOS 6.5. The benchmarking software HiBench is deployed on the platform to simulate various loads. An agent of the open-source monitoring software Zabbix (https://www.zabbix.com) is deployed on each physical server to collect...
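As an illustration of how per-task timing could be gathered on the analysis node in such a cluster, the following Python sketch reads task durations from a Spark event log. It assumes event logging is enabled and that the log is in Spark's JSON-lines format with `SparkListenerTaskEnd` events; the patent text shown here does not say which log file it parses, so this is an assumption, and the function name `parse_task_durations` and the sample lines are hypothetical.

```python
import json

def parse_task_durations(lines):
    """Return (stage_id, task_id, host, duration_ms) for each finished task.

    Assumes each line is one JSON event from a Spark event log
    (an assumption; the source does not name the log format).
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-completion events carry launch and finish timestamps.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event["Task Info"]
        duration_ms = info["Finish Time"] - info["Launch Time"]
        rows.append((event["Stage ID"], info["Task ID"], info["Host"], duration_ms))
    return rows

# Hypothetical sample lines, trimmed to the fields used above.
sample_log = [
    '{"Event":"SparkListenerJobStart","Job ID":0}',
    '{"Event":"SparkListenerTaskEnd","Stage ID":0,'
    '"Task Info":{"Task ID":1,"Host":"slave1","Launch Time":1000,"Finish Time":1400}}',
    '{"Event":"SparkListenerTaskEnd","Stage ID":0,'
    '"Task Info":{"Task ID":2,"Host":"slave2","Launch Time":1000,"Finish Time":2900}}',
]

task_rows = parse_task_durations(sample_log)
```

Grouping the resulting rows by stage gives the per-stage duration lists that a comparison of task execution times would need; host-level metrics from a monitoring agent could then be joined on the `host` field.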


Abstract

The invention relates to a Spark straggler task diagnosis method based on statistical analysis, which monitors the tasks that a data center executes in parallel and infers the cause of task execution delays from the monitoring data. Within the same stage, the same task running on different nodes is monitored: the monitoring data of the physical servers is collected, and the Spark log files are analyzed to obtain monitoring data on task execution. Task execution times are compared to detect delayed tasks, and the deviation of the delayed tasks' feature values from those of normal tasks is analyzed to locate abnormal features and thereby diagnose the causes of task delays.
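The two steps named in the abstract (comparing execution times to detect delayed tasks, then measuring how far their feature values deviate from normal tasks) can be sketched in Python as follows. This is a minimal illustration, not the patented procedure: the threshold of 1.5 times the stage median, the use of z-scores as the deviation measure, and the feature names `cpu_util` and `disk_io_wait` are all assumptions introduced here.

```python
from statistics import mean, median, pstdev

def detect_stragglers(durations, factor=1.5):
    """Return ids of tasks in one stage slower than factor * median duration.

    The 1.5x-median rule is an assumed threshold, not taken from the source.
    """
    med = median(durations.values())
    return sorted(t for t, d in durations.items() if d > factor * med)

def rank_abnormal_features(features, stragglers):
    """For each straggler, rank features by |z-score| against normal tasks.

    features: {task_id: {feature_name: value}}. Requires at least one
    normal (non-straggler) task to compute a baseline.
    """
    normal = [t for t in features if t not in stragglers]
    ranked = {}
    for s in stragglers:
        scores = {}
        for name, value in features[s].items():
            baseline = [features[t][name] for t in normal]
            mu, sigma = mean(baseline), pstdev(baseline)
            # A constant baseline has no spread, so report zero deviation.
            scores[name] = 0.0 if sigma == 0 else (value - mu) / sigma
        ranked[s] = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked

# Hypothetical data for four tasks of the same stage.
durations = {"t1": 10.0, "t2": 11.0, "t3": 9.0, "t4": 30.0}
features = {
    "t1": {"cpu_util": 40.0, "disk_io_wait": 2.0},
    "t2": {"cpu_util": 45.0, "disk_io_wait": 3.0},
    "t3": {"cpu_util": 42.0, "disk_io_wait": 2.5},
    "t4": {"cpu_util": 44.0, "disk_io_wait": 20.0},
}

stragglers = detect_stragglers(durations)
diagnosis = rank_abnormal_features(features, stragglers)
```

In this made-up data set, `t4` is the only task above the threshold, and its disk I/O wait deviates far more from the normal tasks than its CPU utilization does, so disk I/O would be the first candidate cause to investigate.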

Description

Technical field

[0001] The invention relates to a Spark straggler task diagnosis method based on statistical analysis, and belongs to the technical field of software.

Background technique

[0002] Spark is a parallel data processing model, and Stragglers are extremely slow tasks in Spark's data processing jobs. They delay the completion of the entire job and are common in data centers that run multiple jobs. The data center decomposes a data processing job into many tasks, which are executed in parallel on multiple machines, and the results are aggregated when the last task is completed. Stragglers threaten the parallel computing performance of the data center, and their impact grows with the number of tasks and the scale of the system. Studies have shown that in Google's data centers, delayed tasks cause 20% of jobs to take more than 1.5 times as long to complete. In Facebook's and Microsoft's data centers, Stragglers increased the average time to complete a job by 47% ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/30, G06F11/34
CPC: G06F11/3051, G06F11/3452
Inventor: 刘延新, 李亚琼, 吴昊, 李守超
Owner: JIANGSU HOPERUN SOFTWARE CO LTD