Data parallel computing-oriented fault-tolerant method

A data-oriented parallel computing technology, applied in the field of data error detection and error response generation. It addresses the waste of computing resources and the reduction of parallel efficiency, with the effect of improving efficiency, preventing errors from spreading, and shortening fault-tolerance time.

Status: Inactive
Publication Date: 2013-01-30
NANJING NORMAL UNIVERSITY
Cites: 3; Cited by: 1

AI Technical Summary

Problems solved by technology

Current mainstream software fault-tolerance strategies rely on time redundancy: the failed node itself re-executes the task. Because recovery takes longer than the interval between the previous checkpoint and the moment of failure, a large share of the remaining computing resources sits idle in the meantime. The result is reduced parallel efficiency and wasted computing resources.
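The cost described above can be made concrete with a rough back-of-the-envelope model. The sketch below is an illustration only, not taken from the patent: it assumes that recovery time is a fixed restart overhead plus the lost work, that all other nodes wait during recovery, and that lost work divides evenly among helper nodes. The function names and parameters are hypothetical.

```python
def idle_node_seconds(nodes, lost_work_s, restart_overhead_s):
    """Node-seconds wasted while healthy nodes wait for one failed node.

    Assumed model: the failed node alone redoes the work lost since the
    last checkpoint, after paying a fixed restart overhead.
    """
    recovery_s = restart_overhead_s + lost_work_s
    return (nodes - 1) * recovery_s


def parallel_recovery_s(lost_work_s, restart_overhead_s, helpers):
    """Recovery time if the lost work is split evenly across `helpers` nodes.

    Assumed model: perfect division of work, no extra communication cost.
    """
    return restart_overhead_s + lost_work_s / helpers


# 16 nodes, 120 s of lost work, 30 s restart overhead (made-up figures).
wasted = idle_node_seconds(16, 120.0, 30.0)
faster = parallel_recovery_s(120.0, 30.0, 4)
```

Under these assumptions, putting otherwise idle nodes to work on the lost computation shortens recovery, which is the gap the method targets.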

Examples

Detailed Description of the Embodiments

[0041] Embodiments of the present invention provide an error detection and recovery mechanism for faulty tasks. As shown in Figure 1, it includes the following steps:

[0042] Step 101: Read in the data and apply dual-redundancy processing to the slope algorithm in digital terrain analysis, according to the redundancy strategy RI;

[0043] Step 102: The arbiter compares the calculation results of the original node and the copy node. If no fault has occurred, go directly to step 104; if there is an error, execute step 103;

[0044] Step 103: Recalculate in parallel the data block corresponding to the faulty task, based on the secondary division and secondary scheduling strategy;

[0045] Step 104: The system recovers and continues with the subsequent tasks.
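Steps 101 to 104 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: threads stand in for the original node, the copy node, and the idle nodes; `slope_block` is a toy row-wise slope (central differences along each row) chosen so that row blocks are independent, whereas a real 3x3 slope kernel would need overlapping halo rows; and the names `arbiter`, `run_task`, and `idle_workers` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def slope_block(rows, cell=1.0):
    """Toy slope task: slope in degrees along each row (central differences)."""
    return [
        [math.degrees(math.atan(abs(r[j + 1] - r[j - 1]) / (2 * cell)))
         for j in range(1, len(r) - 1)]
        for r in rows
    ]


def arbiter(primary, replica, tol=1e-9):
    """Step 102: the two results agree if shapes match and cells are within tol."""
    if len(primary) != len(replica):
        return False
    for p_row, q_row in zip(primary, replica):
        if len(p_row) != len(q_row):
            return False
        if any(abs(p - q) > tol for p, q in zip(p_row, q_row)):
            return False
    return True


def run_task(block, primary_fn, replica_fn, idle_workers=2):
    """Return (result, recovered) for one data block."""
    # Step 101: run the task on the original node and on the copy node.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_primary = pool.submit(primary_fn, block)
        f_replica = pool.submit(replica_fn, block)
        primary, replica = f_primary.result(), f_replica.result()

    # Step 102: compare; on agreement go straight to step 104.
    if arbiter(primary, replica):
        return primary, False

    # Step 103: split the block by rows and recompute the parts in parallel.
    size = max(1, math.ceil(len(block) / idle_workers))
    parts = [block[i:i + size] for i in range(0, len(block), size)]
    with ThreadPoolExecutor(max_workers=idle_workers) as pool:
        results = list(pool.map(slope_block, parts))

    # Step 104: merged result is handed back so the caller can continue.
    return [row for part in results for row in part], True
```

With dual redundancy a mismatch shows that one result is wrong but not which one, so the sketch recomputes the whole block rather than trusting either copy.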

[0046] The details of the embodiments of the present invention are described further below.

[0047] The data involved in the present invention are all grid DEM data with un...

Abstract

The invention discloses a fault-tolerant method for data-parallel computing, belonging to the technical field of parallel system fault tolerance. It is an error recovery strategy that performs secondary division and secondary scheduling on the data block corresponding to a faulty task. The method comprises the following steps: performing dual-redundant or triple-redundant computation and result arbitration on critical computing tasks; constructing a data composition structure based on a memory page scheduling strategy; and secondarily dividing the fault-tolerant data block based on the number of idle nodes and the minimum data block size. The method is suited to fault-tolerant processing in high-performance parallel digital terrain analysis of large-scale data, such as terrain factor extraction with regular-grid parallel interpolation, parallel slope and aspect computation, and parallel depression filling. It can be applied to high-performance computing for geographic information processing, and also to spatial decision analysis and data mining based on geographic information, to improve processing efficiency.
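The secondary division step can be illustrated with a short sketch. It assumes the faulty block is a band of grid rows, that the "minimum data block" is expressed as a minimum number of rows, and that sub-blocks should be as even as possible; none of these details are confirmed by the text above, and the memory-page-based data structure is not modelled. The function name and parameters are hypothetical.

```python
def secondary_division(n_rows, idle_nodes, min_rows):
    """Split a faulty block of n_rows rows into contiguous (start, stop) ranges.

    Assumed rule: use at most one sub-block per idle node, and never create
    a sub-block smaller than min_rows (unless the whole block is smaller).
    """
    if n_rows <= 0 or idle_nodes <= 0 or min_rows <= 0:
        raise ValueError("n_rows, idle_nodes and min_rows must be positive")

    parts = max(1, min(idle_nodes, n_rows // min_rows))
    base, extra = divmod(n_rows, parts)

    blocks, start = [], 0
    for k in range(parts):
        # The first `extra` sub-blocks take one additional row each.
        stop = start + base + (1 if k < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


print(secondary_division(100, 4, 10))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
print(secondary_division(100, 8, 30))  # [(0, 34), (34, 67), (67, 100)]
```

In the second call only three of the eight idle nodes are used, because a finer split would push sub-blocks below the assumed 30-row minimum.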

Description

Technical field

[0001] The invention belongs to the technical field of parallel system fault tolerance. It relates to error detection and recovery for key computing tasks by means of redundant computing, and in particular proposes a fault recovery strategy based on secondary division and secondary scheduling of fault-tolerant data blocks.

Background

[0002] Fault tolerance is a problem that no computer system can ignore. A system is fault-tolerant in the sense that its programs can still run correctly despite logical failures.

[0003] In recent years, as system structures have grown more complex and semiconductor manufacturing has advanced (narrower line widths, higher integration), power consumption and reliability problems have become increasingly prominent, from user desktop systems to distributed computing environments and even large-scale parallel computer systems. The reliability of a c...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14
Inventor: 窦万峰, 杨坤, 许敏, 宋效东, 汤国安
Owner: NANJING NORMAL UNIVERSITY