Cascading fault-tolerant processing method adaptive to CPU and GPU heterogeneous cluster

A cascading fault-tolerant processing method for heterogeneous CPU and GPU clusters, applied in the computer field. It addresses long running times, heavy data access, and demanding process fault-tolerance requirements, and it reduces losses by allowing services to be reset quickly.

Active Publication Date: 2017-12-22
北京丁牛科技有限公司

AI Technical Summary

Problems solved by technology

Large-scale data analysis and processing applications have high computational complexity, consume large-scale computing resources, run for a long time, and require highly reliable computing resources. Some application services access large amounts of data and need large-scale read-write storage devices; some are communication-intensive and place high demands on the system's network transmission; some occupy a large amount of GPU memory, perform frequent read and write operations, and require high GPU stability; and some have complex processes with many intermediate steps, where an error in a single step affects the entire service process, so their fault-tolerance requirements are high.



Examples


Embodiment Construction

[0043] The exemplary embodiments will be described in detail here, and examples thereof are shown in the accompanying drawings. When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present disclosure. Rather, they are only examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

[0044] In the following, the present invention will be described in detail with reference to specific implementation examples. The following implementation examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that for those of ordinary skill in the art, several changes and i...


Abstract

The invention relates to a cascading fault-tolerant processing method adapted to heterogeneous CPU and GPU clusters. The method comprises the following steps: a data transmission consistency detection model is constructed to detect the consistency of data transmission; a data access consistency model is constructed to keep data access consistent between the CPU and the GPU; a data operation result correctness detection model is constructed to detect the correctness of thread data operation results; a service backup model is constructed at the application layer to back up the historical records of service runs; and a service job information backup model is constructed at the system layer to back up the job information of service runs. In this way, when a fault that does not involve physical damage occurs in the heterogeneous CPU and GPU cluster, the service fault can be located quickly for service personnel, the pre-fault state can be extracted, the service can be reset quickly, and losses are reduced.
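
The abstract does not say how any of these models is built, so the following is only a minimal sketch of two of the ideas under stated assumptions: transmission consistency is checked by comparing a SHA-256 digest computed before and after a transfer, and job information is backed up as a JSON file that is replaced atomically. All function names, the message layout, and the file format are hypothetical and are not taken from the patent.

```python
import hashlib
import json
import os
import tempfile


def digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest of the payload (assumed consistency check)."""
    return hashlib.sha256(payload).hexdigest()


def send(payload: bytes) -> dict:
    """Hypothetical sender: attach the digest to the data before transfer."""
    return {"payload": payload, "sha256": digest(payload)}


def is_consistent(message: dict) -> bool:
    """Hypothetical receiver: recompute the digest and compare it."""
    return digest(message["payload"]) == message["sha256"]


def save_checkpoint(path: str, job_state: dict) -> None:
    """Back up job information; write to a temp file, then replace atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(job_state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Restore the last saved job information (the pre-fault state)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    message = send(b"block-0")
    print(is_consistent(message))      # digest matches the payload
    message["payload"] = b"block-1"    # simulate corruption in transit
    print(is_consistent(message))      # digest no longer matches

    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "job.json")
        save_checkpoint(checkpoint, {"job_id": "demo", "step": 3})
        print(load_checkpoint(checkpoint))
```

Writing to a temporary file and then calling `os.replace` means a crash during the backup leaves the previous checkpoint intact, which is the property a reset-after-fault scheme depends on.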

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and in particular to a cascading fault-tolerant processing method adapted to heterogeneous clusters of CPUs and GPUs.

Background

[0002] Since the birth of the first computer, computer technology has developed rapidly, taking various forms from personal computers to supercomputers. The demand for computing is endless, and computers are built to be ever faster in order to meet it. In 2016, China's "Sunway TaihuLight" and "Tianhe-2" supercomputers took first and second place in the world's supercomputer rankings, and computing capacity reached the order of 100 petaflops for the first time, reflecting the country's comprehensive capabilities. But at the same time, supercomputing power is accompanied by super complex and huge computer architectur...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14, G06F9/54
CPC: G06F9/546, G06F11/1443, G06F11/1458
Inventor: 姜海, 王忠儒, 李海磊
Owner: 北京丁牛科技有限公司